Re: [CODE4LIB] code4lib services and https

2015-09-06 Thread stuart yeates

SSL is security theatre unless people start doing it better.

SSL adds a layer of complexity; it's easy to get wrong, and the library 
community is systematically getting it wrong (picking on some big names 
because they're tough enough to take it, not because they noticeably do 
it any better or worse than anyone else):


https://www.ssllabs.com/ssltest/analyze.html?d=viaf.org
https://www.ssllabs.com/ssltest/analyze.html?d=code4lib.org
https://www.ssllabs.com/ssltest/analyze.html?d=loc.gov

I'd implore you to check a couple of sites local to you and ping the 
administrators of any that don't get the all-clear.
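
If you want to automate that check, here is a minimal Python sketch using the 
public SSL Labs API (the field names are taken from its v3 documentation, so 
treat them as assumptions if the API has changed since):

import json, time, urllib.request

def ssllabs_grades(host):
    # Poll the SSL Labs assessment for a host until it finishes, then
    # return the grade reported for each endpoint in the DNS rotation.
    url = "https://api.ssllabs.com/api/v3/analyze?host=" + host
    while True:
        with urllib.request.urlopen(url) as response:
            report = json.load(response)
        if report.get("status") in ("READY", "ERROR"):
            return [(e.get("ipAddress"), e.get("grade"))
                    for e in report.get("endpoints", [])]
        time.sleep(30)   # a fresh assessment can take a few minutes

print(ssllabs_grades("code4lib.org"))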


In some cases there are reasons why security might be lagging on a 
particular site (third-party hosting, third-party clients connecting 
using out-of-date SSL libraries, the need to support browsers that are 
many years out of the patch cycle, etc.), but that's the kind of thing 
that needs to be an explicit policy.


cheers
stuart


Re: [CODE4LIB] Open Journal Systems - experiences?

2014-12-16 Thread stuart yeates

On 17/12/14 04:23, Tania Fersenheim wrote:

I have some staff interested in a pilot of Open Journal Systems.
http://openjournalsystems.com/

Anyone here have experiences with the software they'd like to share, either
installed locally or hosted by the OJS folks?

I'm especially interested in how responsive the developers and the user
community are.



The software is OJS and can be found at https://pkp.sfu.ca/ojs/

The link you used points to a third-party host of that software.

My experience with a locally-installed, 'out-of-the-box' OJS hosting 
half a dozen journals (some doing retrospective digitisation, so 
reasonably large) has been great. Two upgrades have gone very smoothly.


As with most open source software, most of the development is left to 
individual users to do (or to pay third parties to do). For example, I 
reported that the MARCXML output is completely useless and got this 
response: http://pkp.sfu.ca/bugzilla/show_bug.cgi?id=9019
On the positive side, I have the tools to provide better MARCXML; on the 
downside, I have to wrangle the time to do it.


cheers
stuart


Re: [CODE4LIB] MARC reporting engine

2014-11-03 Thread Stuart Yeates
Apologies, I should have used plain English for an international audience. 
'Sundry' means 'miscellaneous' or 'other'.

Ideally, for each person I'd generate a date range for mentions and a check to 
see whether they have obituaries in the index. I'd also generate URLs into the 
search engines of various external systems (WorldCat, VIAF, ORCID, DigitalNZ, 
etc.), because these are useful to the editor who makes the decision about using 
the content to make the Wikipedia stub.

cheers
stuart

--
I have a new phone number: 04 463 5692


From: Code for Libraries  on behalf of Jean-Claude 
Dauphin 
Sent: Tuesday, 4 November 2014 7:40 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARC reporting engine

Hi Stuart,

I made some experiments with the innz-metadata in J-ISIS software, and you
may be interested to read the summary which is attached. Thank you for
informing the CODE4LIB list about the innz-metadata dataset, this is very
useful for testing and improving J-ISIS.

But now, I would like to see if it's easy to do what you wish to achieve
with J-ISIS. Please excuse my ignorance, but could you please explain on
which MARC fields or subfields you wish to extract the person authorities,
and explain to me what the sundry metadata are and how they relate to
MARC records. I googled "sundry metadata" but didn't find any
satisfactory information.

Best wishes,

Jean-Claude

On Mon, Nov 3, 2014 at 4:24 PM, Brian Kennison  wrote:

> On Nov 2, 2014, at 9:29 PM, Stuart Yeates <stuart.yea...@vuw.ac.nz> wrote:
>
> Do any of these have built-in indexing? 800k records isn't going to fit in
> memory and if building my own MARC indexer is 'relatively straightforward'
> then you're a better coder than I am.
>
>
>
> I think the XMLDB idea is the way to go, but I’d use BaseX (
> http://basex.org). BaseX has query and indexing capabilities. If you
> know XSLT (and SQL) then you’d at least have a start with XQuery.
>
> —Brian
>



--
Jean-Claude Dauphin

jc.daup...@gmail.com
jc.daup...@afus.unesco.org

http://kenai.com/projects/j-isis/
http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org


Re: [CODE4LIB] MARC reporting engine

2014-11-03 Thread Stuart Yeates
Thank you to all who responded with software suggestions. 
https://github.com/ubleipzig/marctools is looking like the most promising 
candidate so far. The more I read through the recommendations, the more it 
dawned on me that I don't want to have to configure yet another Java toolchain 
(yes, I know, that may be personal bias). 

Thank you to all who responded about the challenges of authority control in 
such collections. I'm aware of these issues. The current project is about 
marshalling resources for editors to make informed decisions, rather than 
automating the creation of articles; because there is human judgement involved 
in the last step, I can afford to take a few authority-control 'risks'.

cheers
stuart

--
I have a new phone number: 04 463 5692


From: Code for Libraries  on behalf of raffaele 
messuti 
Sent: Monday, 3 November 2014 11:39 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARC reporting engine

Stuart Yeates wrote:
> Do any of these have built-in indexing? 800k records isn't going to fit in 
> memory and if building my own MARC indexer is 'relatively straightforward' 
> then you're a better coder than I am.

you could try marcdb[1] from marctools[2]

[1] https://github.com/ubleipzig/marctools#marcdb
[2] https://github.com/ubleipzig/marctools


--
raffaele


Re: [CODE4LIB] MARC reporting engine

2014-11-02 Thread Stuart Yeates
Do any of these have built-in indexing? 800k records isn't going to fit in 
memory and if building my own MARC indexer is 'relatively straightforward' then 
you're a better coder than I am. 

cheers
stuart

--
I have a new phone number: 04 463 5692


From: Code for Libraries  on behalf of Jonathan 
Rochkind 
Sent: Monday, 3 November 2014 1:24 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARC reporting engine

If you are, can become, or know a programmer, that would be relatively 
straightforward in any programming language using the open-source MARC 
processing library for that language (ruby marc, pymarc, perl marc, whatever).

Although you might find more trouble than you expect around authorities, with 
them being less standardized in your corpus than you might like.
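
To make 'relatively straightforward' concrete, here is a minimal pymarc sketch 
that tallies personal-name headings and prints them by reference count; which 
fields actually matter for the innz data (100/600/700 here) and the file name 
are assumptions, not a finished tool:

from collections import Counter
from pymarc import MARCReader

counts = Counter()
with open("innz.mrc", "rb") as handle:           # hypothetical file name
    for record in MARCReader(handle):
        if record is None:                       # permissive readers yield None for bad records
            continue
        for field in record.get_fields("100", "600", "700"):
            name = field.get_subfields("a")
            if name:
                counts[name[0]] += 1

for name, count in counts.most_common(50):
    print(f"* {count} [[{name}]]")               # crude wikimedia-style list line

The same loop is where you would hang the date ranges, obituary checks and 
external search URLs mentioned elsewhere in the thread.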

From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Stuart Yeates 
[stuart.yea...@vuw.ac.nz]
Sent: Sunday, November 02, 2014 5:48 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] MARC reporting engine

I have ~800,000 MARC records from an indexing service 
(http://natlib.govt.nz/about-us/open-data/innz-metadata CC-BY). I am trying to 
generate:

(a) a list of person authorities (and sundry metadata), sorted by how many 
times they're referenced, in wikimedia syntax

(b) a view of a person authority, with all the records by which they're 
referenced, processed into a wikipedia stub biography

I have established that this is too much data to process in XSLT or multi-line 
regexps in vi. What other MARC engines are out there?

The two options I'm aware of are learning multi-line processing in sed or 
learning enough Koha to write reports in whatever its reporting engine is.

Any advice?

cheers
stuart
--
I have a new phone number: 04 463 5692


[CODE4LIB] MARC reporting engine

2014-11-02 Thread Stuart Yeates
I have ~800,000 MARC records from an indexing service 
(http://natlib.govt.nz/about-us/open-data/innz-metadata CC-BY). I am trying to 
generate:

(a) a list of person authorities (and sundry metadata), sorted by how many 
times they're referenced, in wikimedia syntax

(b) a view of a person authority, with all the records by which they're 
referenced, processed into a wikipedia stub biography

I have established that this is too much data to process in XSLT or multi-line 
regexps in vi. What other MARC engines are out there?

The two options I'm aware of are learning multi-line processing in sed or 
learning enough Koha to write reports in whatever its reporting engine is.

Any advice?

cheers
stuart
--
I have a new phone number: 04 463 5692


Re: [CODE4LIB] content inventory of mediawiki site?

2014-10-30 Thread Stuart Yeates
All but the cats should be available through the standard API, I believe.

https://www.mediawiki.org/wiki/API:Main_page
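
As a minimal sketch (assuming Python with the requests library, and a wiki whose 
api.php sits at the usual path; the site URL is hypothetical), this walks every 
page and reports its title, most recent editor and last-modified time. The 
creation date would need a second, per-page revisions query:

import requests

API = "https://example.org/w/api.php"   # hypothetical wiki; point this at yours

def page_inventory():
    # Walk every page via the allpages generator and ask for the most
    # recent revision's editor and timestamp.
    params = {
        "action": "query",
        "format": "json",
        "generator": "allpages",
        "gaplimit": "50",
        "prop": "revisions",
        "rvprop": "user|timestamp",
    }
    while True:
        data = requests.get(API, params=params).json()
        for page in data.get("query", {}).get("pages", {}).values():
            rev = (page.get("revisions") or [{}])[0]
            yield page["title"], rev.get("user"), rev.get("timestamp")
        if "continue" not in data:
            break
        params.update(data["continue"])   # API-supplied continuation tokens

for title, editor, modified in page_inventory():
    print(title, editor, modified, sep="\t")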

cheers
stuart

--
I have a new phone number: 04 463 5692


From: Code for Libraries  on behalf of Shearer, 
Timothy 
Sent: Friday, 31 October 2014 4:49 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] content inventory of mediawiki site?

Hi Folks,

My google fu isn't working.

Does anyone know of an extension, something native, or a methodology to
get a content inventory of a MediaWiki site?

We're trying to get a report that includes

Pagename, most recent editor, date created, date last modified, categories

Thanks for any advice,
Tim


Re: [CODE4LIB] Subject: Re: Why learn Unix?

2014-10-28 Thread Stuart Yeates
'alias' is a non-portable bash-ism. 

Of course, this matters less now that Oracle has declared Solaris dead.

cheers
stuart

--
I have a new phone number: 04 463 5692


From: Code for Libraries  on behalf of Alex Berry 

Sent: Wednesday, 29 October 2014 1:11 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Subject: Re: Why learn Unix?

And that is why alias rm='rm -I' was invented.

Quoting Roy Tennant :

> I agree. I've done serious damage to my own server this way. Anyone
> who knows me knows that I'm completely capable of this. Unlike
> others, who are both more intelligent and more cautious. Down the
> path of the wild carded, recursive delete command lies DANGER. Having
> a little bit of knowledge is more dangerous, in most cases, than none
> at all. In Unix and in whitewater rafting.
> Roy
>
>
>> On Oct 28, 2014, at 6:46 PM, Cary Gordon  wrote:
>>
>> Well you can do a lot of damage quickly using very short commands. Deleting
>> the master boot record can be quite effective, but I will demur from
>> giving specific examples.
>>
>>
>>
>> On Tue, Oct 28, 2014 at 3:22 PM, Stuart Yeates 
>> wrote:
>>
>>>> -- Because you can delete everything on the system with a very short
>>>> command.
>>>
>>> This is actually a misconception.
>>>
>>> The very short command doesn't delete everything on the system. The
>>> integrity of files which are currently open (including things like the
>>> kernel image, executable files for currently-running programs, etc) is
>>> protected until they are closed (or the next reboot, whichever is first).
>>> These files vanish from the directory structure on the filesystem but can
>>> still be accessed by interacting with the running processes which have them
>>> open (or /proc/ for the very desperate).
>>>
>>> This is the POSIX alternative to the windows "That file is currently in
>>> use" scenario and explains why, when a runaway log file fills up a disk,
>>> you have to both delete the log file and restart the service to get the
>>> disk back.
>>>
>>> cheers
>>> stuart
>>
>>
>>
>> --
>> Cary Gordon
>> The Cherry Hill Company
>> http://chillco.com
>


Re: [CODE4LIB] Subject: Re: Why learn Unix?

2014-10-28 Thread Stuart Yeates
> -- Because you can delete everything on the system with a very short
> command.

This is actually a misconception. 

The very short command doesn't delete everything on the system. The integrity 
of files which are currently open (including things like the kernel image, 
executable files for currently-running programs, etc) is protected until they 
are closed (or the next reboot, whichever is first). These files vanish from 
the directory structure on the filesystem but can still be accessed by 
interacting with the running processes which have them open (or /proc/ for the 
very desperate). 

This is the POSIX alternative to the Windows "That file is currently in use" 
scenario, and explains why, when a runaway log file fills up a disk, you have to 
both delete the log file and restart the service to get the disk space back.
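
For the curious, here is a minimal, Linux-specific Python sketch (assuming you 
have permission to read other processes' /proc entries) that lists files which 
have been deleted but are still held open:

import os

def deleted_but_open():
    # Walk /proc/<pid>/fd and report descriptors whose target has been
    # unlinked; the kernel marks these targets with a "(deleted)" suffix.
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except (PermissionError, FileNotFoundError):
            continue                  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(f"{fd_dir}/{fd}")
            except OSError:
                continue
            if target.endswith(" (deleted)"):
                yield pid, fd, target

for pid, fd, target in deleted_but_open():
    print(pid, fd, target)

Copying the relevant /proc/<pid>/fd/<fd> entry somewhere safe is the 'very 
desperate' recovery route mentioned above.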

cheers
stuart


Re: [CODE4LIB] Why learn Unix?

2014-10-27 Thread Stuart Yeates
Learning UNIX is a dreadful idea. 

If you think you want to learn UNIX, you probably should learn POSIX.

Implementations are transient; if we're lucky, standards are durable.

cheers
stuart

--
I have a new phone number: 04 463 5692


From: Code for Libraries  on behalf of Siobhain 
Rivera 
Sent: Tuesday, 28 October 2014 3:02 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Why learn Unix?

Hi everyone,

I'm part of the ASIS&T Student Chapter at Indiana University, and we're
putting together a series of workshops on Unix. We've noticed that a lot of
people don't seem to have a good idea of why they should learn Unix,
particularly the reference/non-technology types. We're going to do some
more research to make a fact sheet about the uses of Unix, but I thought
I'd pose the question to the list - what do you think are reasons
librarians need to know Unix, even if they aren't in particularly tech
heavy jobs?

I'd appreciate any input. Have a great week!

Siobhain Rivera
Indiana University Bloomington
Library Science, Digital Libraries Specialization
ASIS&T-SC, Webmaster


Re: [CODE4LIB] Linux distro for librarians

2014-10-21 Thread Stuart Yeates
Turning this question on its head:

Is there any group / page / etc. doing coordination of which library software is 
packaged for which distros and chasing distro-level bugs?

At least some interoperability issues would be mitigated if all the appropriate 
libraries installed and worked reliably out of the box on whatever platform our 
colleagues were being forced by local precedent to use.

cheers
stuart 


[CODE4LIB] Wikipedia in teaching

2014-10-20 Thread Stuart Yeates
Some of you may know of teaching staff using, or looking to use, Wikipedia in 
their courses; if you do, I implore you to forward them 
https://en.wikipedia.org/wiki/Wikipedia:Education_program
Wikipedia has active assistance that can be provided in such cases, but 
assistance is less useful once egg has connected with face.

Alternatively, to see whether we're already providing assistance to courses at 
your institution, you can go to https://en.wikipedia.org/wiki/Special:Courses

cheers
stuart

--
I have a new phone number: 04 463 5692


[CODE4LIB] ISSN lists?

2014-10-16 Thread Stuart Yeates
My understanding is that there is no universal ISSN list, but that WorldCat 
allows querying of its database by ISSN. 

Which method of sampling the ISSN namespace is going to cause the least pain? 
http://www.worldcat.org/ISSN/ seems to be the one talked about, but is there 
another that's less resource-intensive? Maybe someone's already exported this 
data?
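
For what it's worth, here is a minimal Python sketch of the brute-force 
approach: generate random but checksum-valid ISSNs and probe the 
http://www.worldcat.org/ISSN/ pattern mentioned above for each one. Whether a 
200 is a reliable signal is an assumption, and the sleep and tiny sample size 
are deliberate politeness:

import random
import time
import urllib.request

def random_issn():
    # Seven random digits plus the ISSN check digit (weighted sum mod 11,
    # with 10 written as 'X').
    digits = [random.randint(0, 9) for _ in range(7)]
    remainder = sum(d * w for d, w in zip(digits, range(8, 1, -1))) % 11
    check = (11 - remainder) % 11
    body = "".join(map(str, digits))
    return f"{body[:4]}-{body[4:]}{'X' if check == 10 else check}"

def exists_in_worldcat(issn):
    # A 200 from the ISSN URL pattern suggests WorldCat knows the ISSN.
    request = urllib.request.Request(f"http://www.worldcat.org/ISSN/{issn}",
                                     method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status == 200
    except Exception:
        return False

for _ in range(10):            # tiny sample; scale up with care
    issn = random_issn()
    print(issn, exists_in_worldcat(issn))
    time.sleep(5)              # be polite to someone else's servers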

cheers
stuart
--
I have a new phone number: 04 463 5692


Re: [CODE4LIB] Digitization Project from Scratch

2014-10-14 Thread Stuart Yeates
Once the physical embodiments of books become self-aware, they might seriously 
look at building a consortium. 

That may or may not be what triggers the transition to a post-apocalyptic world.

cheers
stuart

--
I have a new phone number: 04 463 5692


From: Code for Libraries  on behalf of Cary Gordon 

Sent: Wednesday, 15 October 2014 1:54 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digitization Project from Scratch

OK

I am now obsessed with the idea of a post-apocalyptic consortium of 100,000 
libraries, each with one book.

Cary

On Oct 14, 2014, at 4:57 PM, Stuart Yeates  wrote:

> Others in this thread have all made useful comments, but I think it would pay 
> to take a step back first and ask yourself some questions about your 
> situation:
>
> (*) what's your volume of material? Do you have a single book? a shelf of 
> content? a room of content? a multi-site organisation full of content?
> (*) what are your resources? Do you have techies? Do you have cataloguers? Do 
> you have volunteers? Do you have machine-readable catalog records for the 
> books?  Is there good authority control for the people in the archive? Do you 
> have existing finding aids? Do you have a book scanner?
> (*) Are you working as part of an enduring institution with a demonstrated 
> commitment to archives?
> (*) Have you looked around for possible consortia to join?
> (*) Have you looked around to see who else has already digitised 
> closely-related materials?
> (*) Which languages are the archives in?
> (*) Do you have a collections policy?
> ...
>
> The more detailed the answers, the better we'll be able to give you advice 
> rather than just push our prejudices at you...
>
> cheers
> stuart
>
>
> --
> I have a new phone number: 04 463 5692
>
> 
> From: Code for Libraries  on behalf of P.G. 
> 
> Sent: Wednesday, 15 October 2014 9:55 a.m.
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Digitization Project from Scratch
>
> Hello,
>
> Does anyone have experience in digitizing archival materials? I need your
> recommendations/suggestions on how we can start with our digitization. We
> need to build a searchable website so the public can access our materials:
> images, publications and media files.
>
> What platform did you use? Open-source or fee-based? What is your experience
> using it?
>
> Basically, we started using SharePoint but at this point, I believe it is
> only good for sharing internal documents. We are on a limited budget so
> we may need to host it on our own server as well.
>
> Any feedback or persons to contact for more info is highly appreciated.
> Thanks.
>
> Chris


Re: [CODE4LIB] Digitization Project from Scratch

2014-10-14 Thread Stuart Yeates
Others in this thread have all made useful comments, but I think it would pay 
to take a step back first and ask yourself some questions about your situation:

(*) what's your volume of material? Do you have a single book? a shelf of 
content? a room of content? a multi-site organisation full of content?
(*) what are your resources? Do you have techies? Do you have cataloguers? Do 
you have volunteers? Do you have machine-readable catalog records for the 
books?  Is there good authority control for the people in the archive? Do you 
have existing finding aids? Do you have a book scanner?
(*) Are you working as part of an enduring institution with a demonstrated 
commitment to archives?
(*) Have you looked around for possible consortia to join?
(*) Have you looked around to see who else has already digitised 
closely-related materials? 
(*) Which languages are the archives in?
(*) Do you have a collections policy?
...

The more detailed the answers, the better we'll be able to give you advice 
rather than just push our prejudices at you...

cheers
stuart


--
I have a new phone number: 04 463 5692


From: Code for Libraries  on behalf of P.G. 

Sent: Wednesday, 15 October 2014 9:55 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Digitization Project from Scratch

Hello,

Does anyone have experience in digitizing archival materials? I need your
recommendations/suggestions on how we can start with our digitization. We
need to build a searchable website so the public can access our materials:
images, publications and media files.

What platform did you use? Open-source or fee-based? What is your experience
using it?

Basically, we started using SharePoint but at this point, I believe it is
only good for sharing internal documents. We are on a limited budget so
we may need to host it on our own server as well.

Any feedback or persons to contact for more info is highly appreciated.
Thanks.

Chris


Re: [CODE4LIB] wget archiving for dummies

2014-10-06 Thread Stuart Yeates
A number of others have suggested other approaches, but since you started with 
wget, here are the two wget commands I recently used to archive a 
wordpress-behind-ezproxy site. The first logs into EZproxy and saves the login 
as a cookie. The second uses the cookie to access a site through EZproxy:

wget --no-check-certificate --keep-session-cookies --save-cookies cookies.txt \
  --post-data 'user=yeatesst&pass=PASSWORD&auth=d1&url' \
  https://login.EZPROXYMACHINE/login

wget --restrict-file-names=windows --default-page=index.php -e robots=off \
  --mirror --user-agent="" --ignore-length --keep-session-cookies \
  --save-cookies cookies.txt --load-cookies cookies.txt --recursive \
  --page-requisites --convert-links --backup-converted \
  "http://WORDPRESSMACHINE.EZPROXYMACHINE/BLOGNAME"

cheers
stuart


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric 
Phetteplace
Sent: Monday, 6 October 2014 7:44 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] wget archiving for dummies

Hey C4L,

If I wanted to archive a Wordpress site, how would I do so?

More elaborate: our library recently got a "donation" of a remote Wordpress 
site, sitting one directory below the root of a domain. I can tell from a 
cursory look it's a Wordpress site. We've never archived a website before and I 
don't need to do anything fancy, just download a workable copy as it presently 
exists. I've heard this can be as simple as:

wget -m $PATH_TO_SITE_ROOT

but that's not working as planned. Wget's convert links feature doesn't seem to 
be quite so simple; if I download the site, disable my network connection, then 
host locally, some 20 resources aren't available. Mostly images which are under 
the same directory. Possibly loaded via AJAX. Advice?

(Anticipated) pertinent advice: I shouldn't be doing this at all, we should 
outsource to Archive-It or similar, who actually know what they're doing.
Yes/no?

Best,
Eric


Re: [CODE4LIB] REST vs ODBC

2014-09-22 Thread Stuart Yeates

On 23/09/14 10:01, Fitchett, Deborah wrote:

Morning, all,

We have a small dilemma:

1.   Our brand new Alma system provides access to a bunch of data via 
RESTful API. It’s on The Cloud so we’re not going to be getting direct access 
to the database anytime soon.


Is there a reason that you can't take your nightly backups of 
MARC data, suck them into a disposable install of Koha, and use Koha's ODBC 
connection?


cheers
stuart


[CODE4LIB] Wikipedia notability (was Re: [CODE4LIB] Official #teamharpy Statement on the case of Joseph Hawley Murphy vs. nina de jesus and Lisa Rabey)

2014-09-22 Thread Stuart Yeates
I'm currently spending a chunk of time attempting to balance 
https://en.wikipedia.org/wiki/Category:New_Zealand_academics for 
recentism, gender imbalance and racial imbalance, having created >100 
biographies so far.


I can tell you that Google Scholar is a really crappy measure once you 
move outside the modern hard sciences, particularly in fields where the 
monograph is still revered.


I can also tell you that there are people who have won the Hector Medal 
(for a long time the highest science prize in the country) who are in 
neither Google Scholar nor VIAF (and yes, we're a country hooked up to 
the feeder system for VIAF).


As much as we like to think that libraries are central to academia and 
research, sometimes the world is not as it appears from our hallowed 
windows.


[Anyone struggling to prove notability for a non-straight-white-male is 
welcome to ping me on wikipedia for help.]


cheers
stuart


On 21/09/14 06:12, Karen Coyle wrote:


I also was interested because I've recently joined the hardworking group
of Wikipedians who work to distinguish between notable persons and able
self-promoters. In doing so, I've learned a lot about how self-promotion
works, especially in social media. In Wikipedia, to be considered
notable, there needs to be some reliable proof - that is, third-party
references, not provided by the individual in question. In terms of
accomplishments, for example for academics, there is a list of
"measures", albeit not measurable in the scientific sense. [1]

Just for a lark, look at the Google scholar profiles for Joe Murphy,
RoyT, and for myself:

http://scholar.google.com/citations?user=zW1lb04J&hl=en&oi=ao
http://scholar.google.com/citations?user=LJw73cAJ&hl=en
http://scholar.google.com/citations?user=m4Tx73QJ&hl=en&oi=ao

The "h-index", while imprecise, is about as close as you get to
something one can cite as a measure. It's not a decision, but it is an
indication.

I put this forward not as proof of anything, but to offer that
reputation is extremely hard to quantify, but should be looked at with a
critical eye and not taken for granted. It also fits in with what we
already know, which is that men promote themselves in the workplace more
aggressively than women do. In fact, in the Wikipedia group, we mainly
find articles about men whose notability is over-stated. (You can see my
blog post on the problems of notability for women. [2])

I greatly admire your stand for free speech. Beyond this, I will contact
you offline with other thoughts.

kc
[1] http://en.wikipedia.org/wiki/Wikipedia:Notability_%28academics%29
[2] http://kcoyle.blogspot.com/2014/09/wpnotability-and-women.html

On 9/20/14, 9:16 AM, Lisa Rabey wrote:

Friends:


I know many of you have already been boosting the signal, and we thank
you profusely for the help.

For those who do not know, Joe Murphy is currently suing nina and I in
$1.25M defamation case because

 From our official statement
(http://teamharpy.wordpress.com/why-are-we-being-sued/)

"Mr. Murphy claims that Ms. Rabey “posted the following false,
libelous and highly damaging tweet accusing the plaintiff of being a
‘sexual predator'”3. He further claims that Ms. de jesus wrote a blog
post that “makes additional false, libelous, highly damaging,
outrageous, malicious statements against the plaintiff alleging the
commission of sexual harassment and sexual abuse of women and other
forms of criminal and unlawful behaviour”4.

Both Ms. Rabey and Ms. de jesus maintain that our comments are fair
and are truthful, which we intend to establish in our defense. Neither
of us made the claims maliciously nor with any intent to damage Mr.
Murphy’s reputation."

Right now we need the following most importantly:

1. We have a call out for additional witnesses
(http://teamharpy.wordpress.com/call-for-witnesses/), which have
started to filter in more accounts of harrassment. Please, PLEASE, if
you know/seen/heard anything about the plaintiff, or know someone who
might -- please have them get in touch.

2. Share our site (http://teamharpy.wordpress.com) which includes
details of the case and updates. Please help us get the word out to as
many people as possible about the plaintiff's attempt to silence those
speaking up against sexual harassment and why you won't stand for it.

3.
onations: Many, many of you have asked to help donate to fund our
mounting legal costs. We will have a donation page up soon. Even if
you cannot help financially, please share across your social networks.

We will not be silenced. We will not be shamed.

Thank you again. The outpouring of support that has been happening has
made this all very much worth while.

Best,
Lisa










[CODE4LIB] handle and doi HTTPS infrastructure

2014-09-11 Thread Stuart Yeates
First up, I've got to say that I'm unaware of anyone using these over 
HTTPS in production, so these issues are forward-looking and largely hypothetical.


The good news is that both use DNSSEC:

http://dnssec-debugger.verisignlabs.com/hdl.handle.net
http://dnssec-debugger.verisignlabs.com/dx.doi.org

The bad news is that some servers in the dx.doi.org DNS rotation don't 
appear to be listening on port 443 at all, and that those that do have variable 
configurations that earn them a 'C':


https://www.ssllabs.com/ssltest/analyze.html?d=dx.doi.org

Further, a number of doi.org-native links redirect from HTTPS to HTTP 
without warning. For example https://dx.doi.org/ links to 
https://dx.doi.org/help.html but that's just a redirect to 
http://www.doi.org/factsheets/DOIProxy.html and www.doi.org isn't listening 
on port 443.


Testing DOI resolution over HTTPS gives occasional very long timeouts 
(presumably those non-443 servers?).
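
If you want to see that for yourself, here is a minimal Python sketch that times 
a few resolutions over HTTPS; the DOI used is just a well-known example (it 
points at the DOI Handbook), not anything special:

import time
import urllib.request

def check_doi_https(doi, timeout=10):
    # Resolve a DOI over HTTPS, following redirects, and report how long
    # it took and where we ended up; timeouts surface as errors.
    url = f"https://dx.doi.org/{doi}"
    started = time.time()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return round(time.time() - started, 2), response.status, response.geturl()
    except Exception as error:
        return round(time.time() - started, 2), "error", repr(error)

for attempt in range(5):                      # repeat: the DNS rotation varies
    print(check_doi_https("10.1000/182"))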


All of the servers in the hdl.handle.net DNS rotation are listening on 
443, but again with variable security configs and low scores:


https://www.ssllabs.com/ssltest/analyze.html?d=hdl.handle.net

Note that some of the servers have 'test' in their server name, which 
makes me wonder...


Again, the home site and help pages are HTTP-only and there are 
HTTPS->HTTP redirects.


Handle resolution over HTTPS seemed to work reliably when I tested it.


Anyone have ideas as to who needs to lobby who to get this improved?

cheers
stuart


Re: [CODE4LIB] metadata for free ebook repositories

2014-08-19 Thread Stuart Yeates
> Authors in OL have already been linked to Wikipedia, and Wikipedia has
> been linked to VIAF, and the OCLC number, when present, has been taken
> from the MARC record. Therefore the OL record in some cases already has
> these connections.

It's not just about authors. It's also about the work (+manifestation, +... ), 
the subject (particularly when the subject is a separate work), the publisher, 
the illustrator, etc, etc.

cheers
stuart


Re: [CODE4LIB] metadata for free ebook repositories

2014-08-18 Thread Stuart Yeates

I think what I'm looking for is a crowd-sourcing platform to add:

https://en.wikipedia.org/wiki/Willa_Cather
http://viaf.org/viaf/182113193/

https://en.wikipedia.org/wiki/My_%C3%81ntonia
http://www.worldcat.org/title/my-antonia/oclc/809034

...

to

https://archive.org/download/myantonia00cathrich/myantonia00cathrich_marc.xml 



cheers
stuart


On 19/08/14 11:57, Karen Coyle wrote:

About 1/3 of the 1M ebooks on OpenLibrary.org have full MARC records,
and you can retrieve the record via the API. There is also a "secret"
record format that returns not the full MARC for the hard copy (which is
what the records represent because these are digitized books) but a
record that has been modified to represent the ebook.

The MARC records for the hard copy follow the pattern:

https://archive.org/download/[archive identifier]/[archive
identifier]_marc.[xml|mrc]

Download MARC XML
https://archive.org/download/myantonia00cathrich/myantonia00cathrich_marc.xml

Download MARC binary
https://www.archive.org/download/myantonia00cathrich/myantonia00cathrich_meta.mrc



To get the one that represents the ebook, do:

https://archive.org/download/[archive identifier]/[archive
identifier]_archive_marc.xml

https://archive.org/download/myantonia00cathrich/myantonia00cathrich_archive_marc.xml


This one has an 007, the 245 $h, and a few other things.
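
Following that pattern, a minimal Python sketch that pulls down the 
ebook-flavoured MARCXML for an Internet Archive item (the identifier is just the 
example above; substitute your own):

import urllib.request

def fetch_archive_marcxml(identifier):
    # Build the _archive_marc.xml URL from the item identifier, per the
    # pattern described above, and return the MARCXML as a string.
    url = (f"https://archive.org/download/{identifier}/"
           f"{identifier}_archive_marc.xml")
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

print(fetch_archive_marcxml("myantonia00cathrich")[:500])   # first 500 characters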

Tom Morris did some code that helps you search for books by author and
title and retrieve a MARC record. I don't recall where his github
archive is, but I'll find out and post it here. The code is open source.
We used it for a project that added ebook records to a public library
catalog.

You can also use the OpenLibrary API to select all open access ebooks.
What I'd like to see is a way to create a list or bibliography in OL
that then is imported into a program that will find MARC records for
those books. The list function is still under development, though.

kc

On 8/18/14, 3:04 PM, Stuart Yeates wrote:

There are a stack of great free ebook repositories available on the
web, things like https://unglue.it/ http://www.gutenberg.org/
https://en.wikibooks.org/wiki/Main_Page http://www.gutenberg.net.au/
https://www.smashwords.com/books/category/1/newest/0/free/any etc, etc

What there doesn't appear to be, is high-quality AACR2 / RDA records
available for these. There are things like
https://ebooks.adelaide.edu.au/meta/pg/ which are elaborate dublin
core to MARC converters, but these lack standardisation of names,
authority control (people, entities, places, etc), interlinking, etc.

It seems to me that quality metadata would greatly increase the value
/ findability / use of these projects and thus their visibility and
available sources.

Are there any projects working in this space already? Are there
suitable tools available?

cheers
stuart




[CODE4LIB] metadata for free ebook repositories

2014-08-18 Thread Stuart Yeates
There are a stack of great free ebook repositories available on the web, 
things like https://unglue.it/ http://www.gutenberg.org/ 
https://en.wikibooks.org/wiki/Main_Page http://www.gutenberg.net.au/ 
https://www.smashwords.com/books/category/1/newest/0/free/any etc, etc


What there doesn't appear to be is high-quality AACR2 / RDA records 
available for these. There are things like 
https://ebooks.adelaide.edu.au/meta/pg/ which are elaborate Dublin Core 
to MARC converters, but these lack standardisation of names, authority 
control (people, entities, places, etc), interlinking, etc.


It seems to me that quality metadata would greatly increase the value / 
findability / use of these projects and thus their visibility and 
available sources.


Are there any projects working in this space already? Are there suitable 
tools available?


cheers
stuart


Re: [CODE4LIB] EZProxy ssl security

2014-08-13 Thread Stuart Yeates

Thank you, that helped greatly.

cheers
stuart

On 13/08/14 10:09, Will Martin wrote:

I can't offer a comprehensive guide, but I can give you some tips
gleaned from the EZ Proxy mailing list and my own experimentation.

There are some configuration settings you can adjust to improve its
security.  Here are the ones from mine:

# Disable old, insecure SSL methods
Option DisableSSL56bit
Option DisableSSL40bit
Option DisableSSLv2

Those go before setting the LoginPortSSL -- in my config.txt, they're
the first thing after the Name directive at the top of the file.

Doing that will help a good bit.  Here's the report for my server on SSL
Labs:

https://www.ssllabs.com/ssltest/analyze.html?d=ezproxy.library.und.edu

A marked improvement.  Not perfect, but much better.

EZ Proxy embeds a statically linked copy of the SSL libraries, so SSL
upgrades to it only happen when you update EZ Proxy itself.  I'm on
version 5.7.32, which still suffers from some old security
vulnerabilities, as you can see in the SSL labs report.

I believe the next version of EZ Proxy is supposed to update the SSL to
support newer protocols.  But I'm not sure, and I'm unlikely to find out
on my own.  OCLC recently changed their pricing model to a yearly
subscription fee if you want to receive continued updates, and my
university has not chosen to pay for that at this time.  So we won't be
getting any further updates until we can find the money for the yearly fee.

Hope this helps.

Will Martin

On 2014-08-12 16:38, Stuart Yeates wrote:

So I just ran my EZproxy through an SSL checker and was shocked by the
outcome:

https://www.ssllabs.com/ssltest/analyze.html?d=login.helicon.vuw.ac.nz

Finding other EZproxy installs in google and checking them gave a
range of answers, some MUCH better, some MUCH worse. Clearly secure
EZproxy is possible, but patchy.

Is there a decent guide to securing EZproxy anywhere?

I'm hoping that it might be as simple as dropping a new openssl
library into a directory within the EZproxy install?

cheers
stuart


[CODE4LIB] EZProxy ssl security

2014-08-12 Thread Stuart Yeates
So I just ran my EZproxy through an SSL checker and was shocked by the 
outcome:


https://www.ssllabs.com/ssltest/analyze.html?d=login.helicon.vuw.ac.nz

Finding other EZproxy installs via Google and checking them gave a range 
of answers, some MUCH better, some MUCH worse. Clearly a secure EZproxy is 
possible, but coverage is patchy.


Is there a decent guide to securing EZproxy anywhere?

I'm hoping that it might be as simple as dropping a new openssl library 
into a directory within the EZproxy install?


cheers
stuart



Re: [CODE4LIB] Bandwidth control

2014-08-05 Thread Stuart Yeates
We had complaints from students about other students using the limited 
resource (in this case student computers) to do Facebook / YouTube.


We negotiated with the students' union that certain sites would be 
blocked from those machines for a certain busy period during the day. 
Negotiation with the students' union appeared to be hugely important in 
deflating any protests.


cheers
stuart

On 05/08/14 02:20, Carol Bean wrote:

A quick and dirty search of the list archives turned up this topic from 5
years ago.  I am wondering what libraries (especially those with limited
resources) are doing today to control or moderate bandwidth, e.g., where
viewing video sites uses up excessive amounts of bandwidth?

Thanks for any help,
Carol

Carol Bean
beanwo...@gmail.com



Re: [CODE4LIB] Community anti-harassment policy

2014-07-02 Thread Stuart Yeates

There exists a code at:

https://github.com/code4lib/antiharassment-policy/blob/master/code_of_conduct.md

I believe it applies here.

cheers
stuart

On 07/03/2014 12:54 PM, Coral Sheldon-Hess wrote:

I was under the impression that we had a code of conduct/anti-harassment
policy in place for IRC and the mailing lists. Was this an incorrect
impression?

I am definitely in favor of adopting one, if there isn't one in place!

Logistically, Geek Feminism is also not a formal organization--they were
recently described as an anarchist collective--so I think we could follow
their lead pretty easily. We could make a mail alias that goes to a
ROTATING team/committee (this is very important; people burn out, dealing
with these things for too long), for reporting purposes. IRC aliases are a
thing, too, right?

-coral



[CODE4LIB] ANNOUNCEMENT REMINDER (Was: Re: [CODE4LIB] NOW AVAILABLE: VIVO Release 1.7

2014-07-02 Thread Stuart Yeates
I'd just like to remind posters of announcements that it's not really 
helpful to post announcements that don't actually say what the 
product / service / event you're announcing is.


The following case is particularly egregious, since the project home 
page at https://wiki.duraspace.org/display/VIVO/VIVO+Main+Page also 
fails to contain this important information.


For the record "VIVO is an open source, semantic-web tool for research 
discovery -- finding people and the research they do." 
https://wiki.duraspace.org/display/VIVO/Short+Tour%3A+VIVO+Overview


cheers
stuart


On 07/03/2014 02:53 AM, Carol Minton Morris wrote:

FOR IMMEDIATE RELEASE

July 2, 2014
Contact: Layne Johnson 
Read it online: http://bit.ly/1sXbXHE

VIVO Release 1.7 is Now Available!
Key Features Include Enhanced ORCID Functionality and Simplified Data Handling

Winchester, MA  The VIVO Project is pleased to announce the release of VIVO 
1.7.  The software can be installed by downloading either a zip or tar.gz file 
located on the download page at VIVOweb.org and deploying it to your web server 
for production use. Installation Instructions and an Upgrade Guide v1.6 to 1.7 
are also available. VIVO is a DuraSpace project.

The VIVO 1.7 release combines new features with improvements to existing 
features and services and continues to leverage the VIVO-Integrated Semantic 
Framework (VIVO-ISF) ontology introduced in VIVO 1.6.  No data migration or 
changes to local data ingest procedures, visualization, or analysis tools 
drawing directly on VIVO data will be required to upgrade to VIVO 1.7.

VIVO 1.7 notably includes the results of an ORCID Adoption and Integration 
Grant to support the creation and verification of ORCID iDs. VIVO now offers 
the opportunity for a researcher to add and/or confirm his or her global, 
unique researcher identifier directly with ORCID without the necessity of 
applying through other channels and re-typing the 16-digit ORCID identifier.  
We anticipate that this facility will help promote ORCID iDs more widely and 
expand adoption for the benefit of the entire research community.

VIVO 1.7 also incorporates several updates to key software libraries in VIVO, 
including the Apache Jena libraries that provide the default VIVO triple store 
from Jena 2.6.4 to Jena 2.10.1.  This Jena upgrade does require existing VIVO 
sites to run an automated migration procedure for user accounts prior to 
upgrading VIVO itself.

The Apache Solr search library used by VIVO has been updated to Solr 4.7.2 and 
the programming interface to Solr has been modularized to allow substitution of 
alternative search indexing libraries to benefit from specific desired features.

The SPARQL web services introduced in VIVO 1.6 have been extended to support full 
read-write capability and content negotiation through a single interface. The ability to 
export or "dump" the entire VIVO knowledge base for analysis by external tools 
has also been improved to scale better with triple store size, as has the ability to 
request lists of RDF by type to facilitate linked data applications.

The VIVO 1.7 release also reflects feedback from the VIVO Leadership Group 
requesting a predictable pattern of one minor release and one major release 
each year. We anticipate releases in late spring/early summer and again in late 
fall to help adopters plan for release schedules and new features, and 
anticipate any changes that may affect local data ingest processes, 
visualizations, reporting, and/or data analysis.

Learn More at the VIVO Conference

There’s still time to register for the upcoming VIVO Conference that will be 
held in Austin, TX August 6-8, 2014. The program is designed to help you 
harness the full potential of research networking, discovery, and open research.

• Program available here
• Register here

How Does DuraSpace Help?

VIVO is a DuraSpace project. The DuraSpace (http://duraspace.org) organization is an 
independent 501(c)(3) not-for-profit providing leadership and innovation for open 
technologies that promote durable, persistent access and discovery of digital data. Our 
values are expressed in our organizational byline, "Committed to our digital 
future."

DuraSpace works collaboratively with organizations that use VIVO to advance the 
design, development and sustainability of the project. As a non-profit, 
DuraSpace provides technical leadership, sustainability planning, fundraising, 
community development, marketing and communications, collaborations and 
strategic partnerships, and administration.




Re: [CODE4LIB] Is ISNI / ISO 27729:2012 a name identifier or an entity identifier?

2014-06-19 Thread Stuart Yeates
In Wikipedia, the principal representation of alternative names for 
entities is the 'redirect'. The redirect from "Catherine Sefton" to 
"Martin Waddell" can be found at 
https://en.wikipedia.org/w/index.php?title=Catherine_Sefton&redirect=no 
(and yes, being a wiki, it's editable).


That redirect is annotated as a redirect "From an alternative 
name" (as opposed to a common spelling mistake or something else) and 
"From a printworthy page title" (which says to use this redirect when 
building (cross-)indexes etc.).


To create a link from the "Catherine Sefton" to an authority control 
system (as distinct from the "Martin Waddell" link), the redirect can be 
editted include an Authority control template (see 
https://en.wikipedia.org/wiki/Template:Authority_control ), which is the 
same template used for full articles.


cheers
stuart



On 06/19/2014 08:53 PM, Owen Stephens wrote:

An aside but interesting to see how some of this identity stuff seems to be 
playing out in the wild now. Google for Catherine Sefton:

https://www.google.co.uk/search?q=catherine+sefton

The Knowledge Graph displays information about Martin Waddell. Catherine Sefton 
is a pseudonym of Martin Waddell. It is impossible to know, but the most likely 
source of this knowledge is Wikipedia which includes the ISNI for Catherine 
Sefton in the Wikipeda page for Martin Waddell 
(http://en.wikipedia.org/wiki/Martin_Waddell) (although oddly not the ISNI for 
Martin Waddell under his own name).

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 18 Jun 2014, at 23:28, Stuart Yeates  wrote:


My reading of that suggests that http://isni-url.oclc.nl/isni/000122816316 shouldn't have both 
"Bell, Currer" and "Brontë, Charlotte", which it clearly does...

Is this a case of one of our sources of truth not distinguishing between 
identities and entities, and of us allowing it to pollute our data?

If that source of truth is wikipedia, we can fix that.

cheers
stuart

On 06/19/2014 12:11 AM, Richard Wallis wrote:

Hi all,

Seeing this thread I checked with the ISNI team and got the following
answer from Janifer Gatenby who asked me to post it on her behalf:

ISNI identifies “public identities”. The scope as stated in the standard is



“This International Standard specifies the International Standard name
identifier (ISNI) for the identification of public identities of parties;
that is, the identities used publicly by parties involved throughout the
media content industries in the creation, production, management, and
content distribution chains.”



The relevant definitions are:



*3.1*

*party*

natural person or legal person, whether or not incorporated, or a group of
either

*3.3*

*public identity*

Identity of a *party *(3.1) or a fictional character that is or was
presented to the public

*3.4*

*name*

character string by which a *public identity *(3.3) is or was commonly
referenced



A party may have multiple public identities and a public identity may have
multiple names (e.g. pseudonyms)



ISNI data is available as linked data.  There are currently 8 million ISNIs
assigned and 16 million links.



Example:



[image: ]

~Richard.


On 16 June 2014 10:54, Ben Companjen  wrote:


Hi Stuart,

I don't have a copy of the official standard, but from the documents on
the ISNI website I remember that there are name variations and 'public
identities' (as the lemma on Wikipedia also uses). I'm not sure where the
borderline is or who decides when different names are different identities.

If it were up to me: pseudonyms are definitely different public
identities, name changes after marriage probably not, name change after
gender change could mean a different public identity. Different public
identities get different ISNIs; the ISNI organisation says the ISNI system
can keep track of connected public identities.

Discussions about name variations or aliases are not new, of course. I
remember the discussions about 'aliases' vs 'Artist Name Variations' that
are/were happening on Discogs.com, e.g. 'is J Dilla an alias or a ANV of
Jay Dee?' It appears the users on Discogs finally went with aliases, but
VIAF put the names/identities together: http://viaf.org/viaf/32244000 -
and there is no ISNI (yet).

It gets more confusing when you look at Washington Irving who had several
pseudonyms: they are just listed under one ISNI. Maybe because he is dead,
or because all other databases already know and connected the pseudonyms
to the birth name? (I just sent a comment asking about the record at
http://isni.org/isni/000121370797 )


[Here goes the reference list…]

Hope this helps :)

Groeten van Ben

On 15-06-14 23:11, "Stuart Yeates"  wrote:


Could someone with access to the o

Re: [CODE4LIB] Does 'Freedom to Read' require us to systematically privilege HTTPS over HTTP?

2014-06-18 Thread Stuart Yeates
Anyone thinking about these things is encouraged to read the thread 
"[CODE4LIB] EZProxy changes / alternatives ?" in the archives of this list.


cheers
stuart

On 06/19/2014 05:28 AM, Andrew Anderson wrote:

EZproxy already handles HTTPS connections for HTTPS enabled services today, and 
on modern hardware (i.e. since circa 2005), cryptographic processing far 
surpasses the speed of most network connections, so I do not accept the “it’s 
too heavy” argument against it supporting the HTTPS to HTTP functionality.  
Even embedded systems with 500MHz CPUs can terminate SSL VPNs at over 100Mb/s 
these days.

All I am saying is that the model where you expose HTTPS to the patron and 
still continue to use HTTP for the vendor is not possible with EZproxy today, 
and there is no technical reason why it could not do so, but rather a policy 
decision.  While HTTPS to HTTP translation would not completely solve the 
entire point of the original posting, it would be a step in the right direction 
until the rest of the world caught up.

As an aside, the lightweight nature of EZproxy seems to be becoming its 
Achilles Heel these days, as modern web development methods seem to be pushing 
the boundaries of its capabilities pretty hard.  The stance that EZproxy only 
supports what it understands is going to be a problem when vendors adopt 
HTTP/2.0, SDCH encoding, web sockets, etc., just as AJAX caused issues 
previously.  Most vendor platforms are Java based, and once Jetty starts 
supporting these features, the performance chasm between dumbed-down proxy 
connections and direct connections is going to become even more significant 
than it is today.



Re: [CODE4LIB] Is ISNI / ISO 27729:2012 a name identifier or an entity identifier?

2014-06-18 Thread Stuart Yeates
My reading of that suggests that 
http://isni-url.oclc.nl/isni/000122816316 shouldn't have both "Bell, 
Currer" and "Brontë, Charlotte", which it clearly does...


Is this a case of one of our sources of truth not distinguishing between 
identities and entities, and of us allowing it to pollute our data?


If that source of truth is wikipedia, we can fix that.

cheers
stuart

On 06/19/2014 12:11 AM, Richard Wallis wrote:

Hi all,

Seeing this thread I checked with the ISNI team and got the following
answer from Janifer Gatenby who asked me to post it on her behalf:

ISNI identifies “public identities”. The scope as stated in the standard is



“This International Standard specifies the International Standard name
identifier (ISNI) for the identification of public identities of parties;
that is, the identities used publicly by parties involved throughout the
media content industries in the creation, production, management, and
content distribution chains.”



The relevant definitions are:



*3.1*

*party*

natural person or legal person, whether or not incorporated, or a group of
either

*3.3*

*public identity*

Identity of a *party *(3.1) or a fictional character that is or was
presented to the public

*3.4*

*name*

character string by which a *public identity *(3.3) is or was commonly
referenced



A party may have multiple public identities and a public identity may have
multiple names (e.g. pseudonyms)



ISNI data is available as linked data.  There are currently 8 million ISNIs
assigned and 16 million links.



Example:



[image: ]

~Richard.


On 16 June 2014 10:54, Ben Companjen  wrote:


Hi Stuart,

I don't have a copy of the official standard, but from the documents on
the ISNI website I remember that there are name variations and 'public
identities' (as the lemma on Wikipedia also uses). I'm not sure where the
borderline is or who decides when different names are different identities.

If it were up to me: pseudonyms are definitely different public
identities, name changes after marriage probably not, name change after
gender change could mean a different public identity. Different public
identities get different ISNIs; the ISNI organisation says the ISNI system
can keep track of connected public identities.

Discussions about name variations or aliases are not new, of course. I
remember the discussions about 'aliases' vs 'Artist Name Variations' that
are/were happening on Discogs.com, e.g. 'is J Dilla an alias or a ANV of
Jay Dee?' It appears the users on Discogs finally went with aliases, but
VIAF put the names/identities together: http://viaf.org/viaf/32244000 -
and there is no ISNI (yet).

It gets more confusing when you look at Washington Irving who had several
pseudonyms: they are just listed under one ISNI. Maybe because he is dead,
or because all other databases already know and connected the pseudonyms
to the birth name? (I just sent a comment asking about the record at
http://isni.org/isni/000121370797 )


[Here goes the reference list…]

Hope this helps :)

Groeten van Ben

On 15-06-14 23:11, "Stuart Yeates"  wrote:


Could someone with access to the official text of ISO 27729:2012 tell me
whether an ISNI is a name identifier or an entity identifier? That is,
if someone changes their name (adopts a pseudonym, changes their name by
marriage, transitions gender, etc), should they be assigned a new
identifier?

If the answer is 'No' why is this called a 'name identifier'?

Ideally someone with access to the official text would update the
article at
https://en.wikipedia.org/wiki/International_Standard_Name_Identifier
With a brief quote referenced to the standard with a page number.

[The context of this is ORCID, which is being touted as an entity
identifier, while not being clear on whether it's a name or entity
identifier.]

cheers
stuart








Re: [CODE4LIB] Is ISNI / ISO 27729:2012 a name identifier or an entity identifier?

2014-06-18 Thread Stuart Yeates

Thank you (and Janifer Gatenby) for this answer.

My reading of this is that people who change their name when they marry 
don't get a new ISNI, but those who change it when they transition 
gender do, because that's a new identity.


That's useful to know.

cheers
stuart

On 06/19/2014 12:11 AM, Richard Wallis wrote:

Hi all,

Seeing this thread I checked with the ISNI team and got the following
answer from Janifer Gatenby who asked me to post it on her behalf:

ISNI identifies “public identities”. The scope as stated in the standard is



“This International Standard specifies the International Standard name
identifier (ISNI) for the identification of public identities of parties;
that is, the identities used publicly by parties involved throughout the
media content industries in the creation, production, management, and
content distribution chains.”



The relevant definitions are:



*3.1*

*party*

natural person or legal person, whether or not incorporated, or a group of
either

*3.3*

*public identity*

Identity of a *party *(3.1) or a fictional character that is or was
presented to the public

*3.4*

*name*

character string by which a *public identity *(3.3) is or was commonly
referenced



A party may have multiple public identities and a public identity may have
multiple names (e.g. pseudonyms)



ISNI data is available as linked data.  There are currently 8 million ISNIs
assigned and 16 million links.



Example:



[image: ]

~Richard.


On 16 June 2014 10:54, Ben Companjen  wrote:


Hi Stuart,

I don't have a copy of the official standard, but from the documents on
the ISNI website I remember that there are name variations and 'public
identities' (as the lemma on Wikipedia also uses). I'm not sure where the
borderline is or who decides when different names are different identities.

If it were up to me: pseudonyms are definitely different public
identities, name changes after marriage probably not, name change after
gender change could mean a different public identity. Different public
identities get different ISNIs; the ISNI organisation says the ISNI system
can keep track of connected public identities.

Discussions about name variations or aliases are not new, of course. I
remember the discussions about 'aliases' vs 'Artist Name Variations' that
are/were happening on Discogs.com, e.g. 'is J Dilla an alias or a ANV of
Jay Dee?' It appears the users on Discogs finally went with aliases, but
VIAF put the names/identities together: http://viaf.org/viaf/32244000 -
and there is no ISNI (yet).

It gets more confusing when you look at Washington Irving who had several
pseudonyms: they are just listed under one ISNI. Maybe because he is dead,
or because all other databases already know and connected the pseudonyms
to the birth name? (I just sent a comment asking about the record at
http://isni.org/isni/000121370797 )


[Here goes the reference list…]

Hope this helps :)

Groeten van Ben

On 15-06-14 23:11, "Stuart Yeates"  wrote:


Could someone with access to the official text of ISO 27729:2012 tell me
whether an ISNI is a name identifier or an entity identifier? That is,
if someone changes their name (adopts a pseudonym, changes their name by
marriage, transitions gender, etc), should they be assigned a new
identifier?

If the answer is 'No' why is this called a 'name identifier'?

Ideally someone with access to the official text would update the
article at
https://en.wikipedia.org/wiki/International_Standard_Name_Identifier
With a brief quote referenced to the standard with a page number.

[The context of this is ORCID, which is being touted as an entity
identifier, while not being clear on whether it's a name or entity
identifier.]

cheers
stuart








Re: [CODE4LIB] Does 'Freedom to Read' require us to systematically privilege HTTPS over HTTP?

2014-06-17 Thread Stuart Yeates

On 06/18/2014 12:36 PM, Brent E Hanner wrote:

Stuart Yeates wrote:


Compared to other contributors to this thread, I appear to be (a) less
worried about state actors than our commercial partners and (b) keener
to see relatively straight forward technical fixes that just work 'for
free' across large classes of library systems. Things like:

* An ILS module that pulls the HTTPS Everywhere ruleset from
https://gitweb.torproject.org/https-everywhere.git/tree/HEAD:/src/chrome/content/rules
and applies those rules as a standard data-cleanup step on all
imported data (MARC, etc).

* A plugin to the CMS that drives the library's websites / blogs /
whatever and uses the same rulesets to default all links to HTTPS.

* An EzProxy plugin (or howto) for silently redirecting users to the HTTPS
versions of HTTP sites.


So let me see if I understand this.  Your concern is that commercial
partners are putting HTTP links in their systems rather than HTTPS.
Because HTTPS only protects from a third party so the partner will still
have access to all the information about what the user read.  IP6 will
improve the HTTPS issue but something like HTTPS Everywhere (
https://www.eff.org/https-everywhere ) is actually the simplest
solution, especially as you can't be sure every link will have HTTPS.


My concern is that by referring users to resources and services via HTTP 
rather than HTTPS, we are encouraging users to leak more personal 
information (reading habits, location, language settings, etc) to third 
parties.


These third parties include our networking providers, our hosting 
providers, our content providers, the next person who uses the users' 
public computer, etc., etc.


HTTPS protects in multiple ways. Firstly it protects the data 'on the 
wire' (but that is rarely a problem in practice). Secondly HTTPS 
protects from web caching attacks. Thirdly the fact that a connection is 
HTTPS causes almost all tools and applications to use a more secure set 
of options and preferences, covering everything from cookie handling, to 
not remembering passwords, not storing local caches, using shorter 
timeouts, etc. This last category is where the real protection is.


There are lots of privacy breaches that HTTPS won't deter (a thorough 
compromise of the users' machine, a thorough compromise of the content 
provider's machine, etc.), but it raises the bar and protects against a 
significant number of breaches that become impossible or much, much 
harder / less likely.


My understanding is that HTTPS and EzProxy can potentially protect 
readers' identities very effectively (assuming the library systems are 
secure and no one turns up with a warrant).



And having just read the Freedom to Read Statement, this issue has no
bearing on it.  Freedom to Read is about accessibility to materials, not
privacy.  While no doubt there is some statement somewhere about that,
Freedom to Read is a statement about diversity of materials and not the
ability to read them without anyone knowing about it.


If materials are only available at the cost of personal privacy, are 
they really available? In repressive regimes all across the world people 
are actively discriminated against (or worse) for reading the wrong book, 
being in the wrong place or communicating in the wrong language.


How many of us live in countries where currently (or in living memory) 
people have been derided for speaking a non-English language?


cheers
stuart


Re: [CODE4LIB] Does 'Freedom to Read' require us to systematically privilege HTTPS over HTTP?

2014-06-17 Thread Stuart Yeates

On 06/17/2014 08:49 AM, Galen Charlton wrote:

On Sun, Jun 15, 2014 at 4:03 PM, Stuart Yeates  wrote:

As I read it, 'Freedom to Read' means that we have to take active steps to
protect the rights of our readers to read what they want and in private.

[snip]

* building HTTPS Everywhere-like functionality into LMSs (such functionality
may already exist, I'm not sure)


Many ILSs can be configured to require SSL to access their public
interfaces, and I think it would be worthwhile to encourage that as a
default expectation for discovery interfaces.

However, I think that's only part of the picture for ILSs.  Other
parts would include:

* staff training on handling patron and circulation data
* ensuring that the ILS has the ability to control (and let users
control) how much circulation and search history data gets retained
* ensuring that the ILS backup policy strikes the correct balance
between having enough for disaster recovery while not keeping
individually identifiable circ history forever
* ensuring that contracts with ILS hosting providers and services that
access patron data from the ILS have appropriate language concerning
data retention and notification of subpoenas.


Compared to other contributors to this thread, I appear to be (a) less 
worried about state actors than our commercial partners and (b) keener 
to see relatively straight forward technical fixes that just work 'for 
free' across large classes of library systems. Things like:


* An ILS module that pulls the HTTPS Everywhere ruleset from 
https://gitweb.torproject.org/https-everywhere.git/tree/HEAD:/src/chrome/content/rules 
and applies those rules as a standard data-cleanup step on all imported 
data (MARC, etc); a minimal sketch of this step follows below.


* A plugin to the CMS that drives the library's websites / blogs / 
whatever and uses the same rulesets to default all links to HTTPS.


* An EzProxy plugin (or howto) for silently redirecting users to the HTTPS 
versions of HTTP sites.
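
To make the first of those bullets concrete, here is a minimal sketch of 
the rule-application step. It assumes a ruleset XML file downloaded from 
the repository above; the file name ('JSTOR.xml') and the sample URL are 
invented, and hooking this into a particular ILS import routine is the 
part that would vary.

#!/usr/bin/env python
# Sketch: apply HTTPS Everywhere-style rules to URLs found in imported data.
import re
import xml.etree.ElementTree as ET

def load_rules(ruleset_file):
    """Return a list of (compiled 'from' regex, 'to' replacement) pairs."""
    rules = []
    for rule in ET.parse(ruleset_file).getroot().iter('rule'):
        # HTTPS Everywhere uses $1-style backreferences; convert them for re.sub().
        to = re.sub(r'\$(\d+)', r'\\\1', rule.get('to'))
        rules.append((re.compile(rule.get('from')), to))
    return rules

def upgrade_url(url, rules):
    """Rewrite url with the first matching rule, or return it unchanged."""
    for pattern, replacement in rules:
        if pattern.search(url):
            return pattern.sub(replacement, url)
    return url

if __name__ == '__main__':
    rules = load_rules('JSTOR.xml')                                  # invented file name
    print(upgrade_url('http://www.jstor.org/stable/123456', rules))  # e.g. an 856$u

The same upgrade_url() step could sit behind a CMS plugin or an EzProxy 
rewrite just as easily; the rulesets are the valuable part.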


cheers
stuart


[CODE4LIB] Does 'Freedom to Read' require us to systematically privilege HTTPS over HTTP?

2014-06-15 Thread Stuart Yeates
As I read it, 'Freedom to Read' means that we have to take active steps 
to protect the rights of our readers to read what they want and in 
private.


Triggered by discussions at a bar-camp on NLNZ on Friday I'm thinking 
that in a digital world this means systematically privileging HTTPS over 
HTTP. Things like:

* serving our websites and content over HTTPS
* installing HTTPS Everywhere on public-access desktops
* preferring HTTPS links in EZProxy / MARC / etc (basically in our 
catalogued materials)
* building HTTPS Everywhere-like functionality into LMSs (such 
functionality may already exist, I'm not sure)

* providing user-education materials.

Thoughts?

cheers
stuart


[CODE4LIB] Is ISNI / ISO 27729:2012 a name identifier or an entity identifier?

2014-06-15 Thread Stuart Yeates
Could someone with access to the official text of ISO 27729:2012 tell me 
whether an ISNI is a name identifier or an entity identifier? That is, 
if someone changes their name (adopts a pseudonym, changes their name by 
to marriage, transitions gender, etc), should they be assigned a new 
identifier?


If the answer is 'No' why is this called a 'name identifier'?

Ideally someone with access to the official text would update the 
article at 
https://en.wikipedia.org/wiki/International_Standard_Name_Identifier 
with a brief quote referenced to the standard with a page number.


[The context of this is ORCID, which is being touted as an entity 
identifier, while not being clear on whether it's a name or entity 
identifier.]


cheers
stuart


Re: [CODE4LIB] orcid

2014-06-10 Thread Stuart Yeates
On a similar note, https://en.wikipedia.org/wiki/User:Pigsonthewing has 
just (today / yesterday depending on timezone) been appointed Wikipedian 
in residence at ORCID. He has tons of experience in museums, galleries 
and archives and is a great person to get in touch with in this kind of 
area.


cheers
stuart

On 06/11/2014 08:11 AM, todd.d.robb...@gmail.com wrote:

On a related-new-things-at-Wikipedia note:
https://en.wikipedia.org/wiki/Wikipedia:ORCID


On Tue, Jun 10, 2014 at 2:07 PM, Masamitsu, Pam  wrote:


http://orcid.org/faq-page#n110
ORCID is an acronym, short for Open Researcher and Contributor ID.
pam

Pam Masamitsu
Reference and Systems
Phone:  925.424.4299
Email:  masamit...@llnl.gov
Lawrence Livermore National Laboratory Main Library





-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Eric Lease Morgan
Sent: Tuesday, June 10, 2014 1:05 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] orcid

   Is ORCID an acronym, and if it is then what does it stand for? -ELM







[CODE4LIB] Selenium testing for outsourced library services?

2014-06-09 Thread Stuart Yeates
Has anyone had any success using Selenium or other web testing systems 
for testing and monitoring of complex outsourced web services?


I'm thinking of a system that is the integration of a website, an 
authentication service, a discovery service and a repository and 
monitoring that end-to-end use of the system is working.
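
As a sketch of what I have in mind (not a working monitor): the URL, the 
search box name ("q") and the result CSS class ("result-item") below are 
invented and would need to match the actual discovery layer.

#!/usr/bin/env python
# Sketch: end-to-end check that the discovery layer answers a search.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
try:
    driver.get("https://discovery.example.ac.nz/")        # invented URL
    box = driver.find_element(By.NAME, "q")               # invented field name
    box.send_keys("colonial newspapers", Keys.RETURN)
    # Fails (TimeoutException) if no result appears within 30 seconds.
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, "result-item")))
    print("OK: search returned results")
finally:
    driver.quit()

Run from cron, something like this at least tells you when the glue 
between the pieces breaks; scripting the authentication hop would mean 
driving the login form the same way.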


cheers
stuart


Re: [CODE4LIB] orcid and researcherid and scopus, oh my

2014-06-05 Thread Stuart Yeates

On 06/06/2014 12:51 AM, Gary Thompson wrote:


I also hope to convince our campus Shibboleth IdP to add ORCID as a new
attribute.


If I understand correctly, what we need is ISNI added to the next 
release of EduPerson, per 
http://software.internet2.edu/eduperson/internet2-mace-dir-eduperson-201310.html


Then non-trivial numbers of service providers and identity providers can 
interoperate using ORCID and/or ISNI.


cheers
stuart


Re: [CODE4LIB] orcid and researcherid and scopus, oh my

2014-06-05 Thread Stuart Yeates
If there is really an appetite to continue DAIs going forward, the 
Wikipedia support for identifiers is modular and there's no reason not to 
add more identifiers.


cheers
stuart

On 06/05/2014 11:06 PM, Ben Companjen wrote:

Hi,

Of course there are more identifier systems (or domains, if you will).

Most/many authors in The Netherlands have a Digital Author Identifier
(DAI), which is the record number in the GGC (Gemeenschappelijk
Geautomatiseerd Catalogiseersysteem), or Shared Automated Catalogue
system.
The DAIs are assigned by (university) libraries and in the case of
university libraries assigning/finding DAIs for their researchers, the DAI
is usually linked to the employee in the repository. Following
"EduStandaard" agreements [0] among all Dutch universities and some
service providers like my employer DANS and the National Library of the
Netherlands (KB), we can harvest the IRs and link publications to
researcher profiles and show them in NARCIS [1].

[0]:
http://www.edustandaard.nl/afspraken%20en%20architectuur/beheerde-afspraken
/
[1]: http://www.narcis.nl

Setup as a service by a company called Pica, the GGC is now hosted by OCLC
after Pica merged into OCLC [2]. The authority files for authors together
are called the NTA ([Dutch Thesaurus Author names]).

[2]: http://www.oclc.org/nl-NL/ggc.html

OCLC is also hosting the ISNI database and VIAF (of course). VIAF, as you
know, was setup as a crosswalk of authority files (including the NTA).
OCLC are working on crosswalking identifiers, AFAIK.

Please be aware that ISNI is a /name/ identifier. Pseudonyms and birth
names for the same person (should) get different ISNIs. And, as said
before, not only people can get ISNIs. Also, the business models for ORCID
and ISNI are different.


As a Linked Data aside, Eric, be aware of what an identifier identifies -
and then how you make assertions using them.
For example, ORCID doesn't use the hash or 303 pattern, so if you resolve
http://orcid.org/-0002-9952-7800 you get a webpage, i.e.
http://orcid.org/-0002-9952-7800 identifies a webpage (the same goes
for DOIs, btw). That is why I say about myself (in Turtle):

 <http://companjen.name/id/BC> dct:identifier
"http://orcid.org/-0002-7023-9047"; .

instead of

 <http://companjen.name/id/BC> owl:sameAs
<http://orcid.org/-0002-7023-9047> .


… for I am not a website.

Linking me to things I make is done like so (Qualified DC):

 <#thing> dct:creator <http://companjen.name/id/BC> .


In your example you used the identifiers as names for the creator(s); it
is as meaningful as saying (in unqualified/simple DC):

 <#thing> dc:creator "Eric Lease Morgan" .

Hope this helps :)

Groeten van Ben


On 05-06-14 00:14, "Stuart Yeates"  wrote:


Others have made excellent contributions to this thread, which I won't
repeat, but I feel it's worth asking the question:

Who is systematically crosswalking these identifiers?

The only party I'm aware of doing this in a large-scale fashion is
Wikipedia, via https://en.wikipedia.org/wiki/Template:Authority_control

cheers
stuart


On 06/05/2014 06:34 AM, Eric Lease Morgan wrote:

ORCID and ResearcherID and Scopus, oh my!

Is it just me, or are there an increasing number of unique identifiers
popping up in Library Land? A person can now be identified with any one
of a number of URIs such as:

* ORCID - http://orcid.org/-0002-9952-7800
* ResearcherID - http://www.researcherid.com/rid/F-2062-2014
* Scopus -
http://www.scopus.com/authid/detail.url?authorId=25944695600
* VIAF - http://viaf.org/viaf/26290254
* LC - http://id.loc.gov/authorities/names/n94036700
* ISNI - http://isni.org/isni/35290715

At least these identifiers are (for the most part) “cool”.

I have a new-to-me hammer, and these identifiers can play a nice role
in linked data. For example:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://dx.doi.org/10.1108/07378831211213201> dc:creator
  "http://orcid.org/-0002-9952-7800"; ,
  "http://id.loc.gov/authorities/names/n94036700"; ,
  "http://isni.org/isni/35290715"; ,
  "http://viaf.org/viaf/26290254"; .

How have any of y’all used these sorts of identifiers, and what
problems do you think you will be able to solve by doing so? For
example, I know of a couple of instances where these sorts of identifiers
are being put into MARC records.

—
Eric Morgan





Re: [CODE4LIB] orcid and researcherid and scopus, oh my

2014-06-04 Thread Stuart Yeates
Others have made excellent contributions to this thread, which I won't 
repeat, but I feel it's worth asking the question:


Who is systematically crosswalking these identifiers?

The only party I'm aware of doing this in a large-scale fashion is 
Wikipedia, via https://en.wikipedia.org/wiki/Template:Authority_control


cheers
stuart


On 06/05/2014 06:34 AM, Eric Lease Morgan wrote:

ORCID and ResearcherID and Scopus, oh my!

Is it just me, or are there an increasing number of unique identifiers popping 
up in Library Land? A person can now be identified with any one of a number of 
URIs such as:

   * ORCID - http://orcid.org/-0002-9952-7800
   * ResearcherID - http://www.researcherid.com/rid/F-2062-2014
   * Scopus - http://www.scopus.com/authid/detail.url?authorId=25944695600
   * VIAF - http://viaf.org/viaf/26290254
   * LC - http://id.loc.gov/authorities/names/n94036700
   * ISNI - http://isni.org/isni/35290715

At least these identifiers are (for the most part) “cool”.

I have a new-to-me hammer, and these identifiers can play a nice role in linked 
data. For example:

   @prefix dc: <http://purl.org/dc/elements/1.1/> .
   <http://dx.doi.org/10.1108/07378831211213201> dc:creator
 "http://orcid.org/-0002-9952-7800" ,
 "http://id.loc.gov/authorities/names/n94036700" ,
 "http://isni.org/isni/35290715" ,
 "http://viaf.org/viaf/26290254" .

How have any of y’all used these sorts of identifiers, and what problems do 
you think you will be able to solve by doing so? For example, I know of a 
couple of instances where these sorts of identifiers are being put into MARC 
records.

—
Eric Morgan



[CODE4LIB] 2014 National Digital Forum conference

2014-05-19 Thread Stuart Yeates
I'm sending this on the off-chance that people happen to have the travel 
budget for it: http://www.ndf.org.nz/programme/


It's a great cross-GLAM event hosted by the Museum of New Zealand Te 
Papa Tongarewa, which you may be familiar with for their kōiwi tangata 
Māori repatriation program.


If I were a foreign visitor looking to have a paper accepted, I'd (a) 
get in touch with the organisers and (b) present on an indigenous topic.


cheers
stuart


Re: [CODE4LIB] statistics for image sharing sites?

2014-05-13 Thread Stuart Yeates

On 05/14/2014 01:39 PM, Joe Hourcle wrote:

On May 13, 2014, at 9:04 PM, Stuart Yeates wrote:


We have been using google analytics since October 2008 and by and large we're 
pretty happy with it.

Recently I noticed that we're getting >100 hits a day from the "Pinterest/0.1 
+http://pinterest.com/" bot which I understand is a reasonably reliable indicator of 
activity from that site. Much of this activity is pure-jpeg, so there is no HTML and no 
opportunity to execute javascript, so google analytics doesn't see it.

pinterest.com is absent from our referrer logs.

My main question is whether anyone has an easy tool to report on this kind of 
use of our collections?


Set your webserver logs to include user agent (I use 'combined' logs), then use:

grep Pinterest /path/to/access/logs

You could also use any analytic tools that work directly off of your log files. 
 It might not have all of the info that the javascript analytics tools pull 
(window size, extensions installed, etc.), but it'll work for anything, not 
just HTML files.
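
For instance, a minimal sketch (the log path is invented) that tallies the 
Pinterest bot's requests per day from a 'combined' format log:

#!/usr/bin/env python
# Sketch: count Pinterest bot requests per day from an Apache 'combined' log.
import re
from collections import Counter

LOG = "/var/log/httpd/access_log"              # invented path; adjust locally
day_re = re.compile(r'\[(\d{2}/\w{3}/\d{4})')  # e.g. [13/May/2014:...

hits = Counter()
with open(LOG) as fh:
    for line in fh:
        if "Pinterest" in line:                # matches the bot's user-agent string
            m = day_re.search(line)
            if m:
                hits[m.group(1)] += 1

for day, count in sorted(hits.items()):
    print(day, count)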


When I visit http://www.pinterest.com/search/pins/?q=nzetc I see a whole 
lot of our images, but absolutely zero traffic in my log files, because 
those images are cached by pinterest.




My secondary question is whether any httpd gurus have recipes for redirecting by agent 
string from low quality images to high quality. So when AGENT =  "Pinterest/0.1 
+http://pinterest.com/" and the URL matches a pattern redirect to a different 
pattern. For example:

http://nzetc.victoria.ac.nz/etexts/MakOldT/MakOldTP022a%28w100%29.jpg

to

http://nzetc.victoria.ac.nz/etexts/MakOldT/MakOldTP022a.jpg



Perfectly possible w/ Apache's mod_rewrite, but you didn't say what http server 
you're using.

If Apache, you'd do something like:

RewriteCond %{HTTP_USER_AGENT} ^Pinterest
RewriteRule (^/etexts/MakOldT/.*)\(.*\)\.jpg $1.jpg [L]


That is pretty much exactly what I was after, thanks. As discussed 
elsewhere on the thread, I plan on using it judiciously.


cheers
stuart


Re: [CODE4LIB] statistics for image sharing sites?

2014-05-13 Thread Stuart Yeates

On 05/14/2014 01:23 PM, Barnes, Hugh wrote:

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stuart 
Yeates
Sent: Wednesday, 14 May 2014 1:04 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] statistics for image sharing sites?

[snip]


My secondary question is whether any httpd gurus have recipes for redirecting by agent 
string from low quality images to high quality. So when AGENT =  "Pinterest/0.1 
+http://pinterest.com/" and the URL matches a pattern redirect to a different 
pattern. For example:



http://nzetc.victoria.ac.nz/etexts/MakOldT/MakOldTP022a%28w100%29.jpg



to



http://nzetc.victoria.ac.nz/etexts/MakOldT/MakOldTP022a.jpg


This sounds totally doable, but what are you trying to achieve?

To my mind, it has unintended consequences and chaos writ all over it.


My naive thought was to make the images that appear in pinterest as 
high-quality as possible, rather than the thumbnails that we often use 
on search pages, etc.


cheers
stuart


[CODE4LIB] statistics for image sharing sites?

2014-05-13 Thread Stuart Yeates
We have been using google analytics since October 2008 and by and large 
we're pretty happy with it.


Recently I noticed that we're getting >100 hits a day from the 
"Pinterest/0.1 +http://pinterest.com/"; bot which I understand is a 
reasonably reliable indicator of activity from that site. Much of this 
activity is pure-jpeg, so there is no HTML and no opportunity to execute 
javascript, so google analytics doesn't see it.


pinterest.com is absent from our referrer logs.

My main question is whether anyone has an easy tool to report on this 
kind of use of our collections?


My secondary question is whether any httpd gurus have recipes for 
redirecting by agent string from low quality images to high quality. So 
when AGENT =  "Pinterest/0.1 +http://pinterest.com/" and the URL matches 
a pattern redirect to a different pattern. For example:


http://nzetc.victoria.ac.nz/etexts/MakOldT/MakOldTP022a%28w100%29.jpg

to

http://nzetc.victoria.ac.nz/etexts/MakOldT/MakOldTP022a.jpg

cheers
stuart


Re: [CODE4LIB] Extracting Text From .tiff Files

2014-05-12 Thread Stuart Yeates
Your first step is to pin down the format. TIFF is a container format (like zip) 
and can contain pretty much anything. Likely candidates for your format include 
https://en.wikipedia.org/wiki/IPTC_Information_Interchange_Model and 
https://en.wikipedia.org/wiki/Extensible_Metadata_Platform 

Your second step is to find a library / tool for your platform that supports 
your format. 
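
For what it's worth, here's a minimal sketch of that first step using 
Pillow (a common Python imaging library); the file name is invented. XMP 
packets normally live in TIFF tag 700, and IPTC/IIM records in tag 33723.

#!/usr/bin/env python
# Sketch: list which metadata blocks a TIFF actually carries.
from PIL import Image, IptcImagePlugin

im = Image.open("page001.tif")                 # invented file name

print("TIFF tags present:", sorted(im.tag_v2.keys()))

xmp = im.tag_v2.get(700)                       # XMP packet (XML), if any
if xmp is not None:
    print("XMP packet found, %d bytes" % len(xmp))

iptc = IptcImagePlugin.getiptcinfo(im)         # IPTC/IIM records, if any
if iptc:
    print("IPTC/IIM keys:", list(iptc.keys()))

Once you know which block the text is actually sitting in, picking the 
right extraction tool becomes much easier than grepping the raw bytes.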

Cheers
stuart

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Gavin 
Spomer
Sent: Tuesday, 13 May 2014 10:01 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Extracting Text From .tiff Files

Hello folks, 

I'm in the process of migrating a student newspaper collection, currently 
implemented with ResCarta, into our new bepress institutional repository. 
ResCarta has each page of a newspaper stored as a tiff file. Not only does the 
tiff file contain the graphics data, but it has some metadata in xml format and 
the fulltext of the page. I know this because I opened up some of the tiffs 
with a plain-text editor (Vim). 

Although I can see the text in the file, I've only been about 90% accurate in 
extracting it with a script. Some of those "weird" characters seem to do some 
wonky things when doing file IO for some reason. Is there a more reliable way 
to extract text stored in a tiff file? I've Googled and Googled and have pulled 
up almost nothing. But there's got to be a way, since ResCarta stores it there 
and can extract it. 

Any ideas? 
Gavin Spomer
Systems Programmer
Brooks Library
Central Washington University


Re: [CODE4LIB] Withdraw my post was: Re: [CODE4LIB] separate list for jobs

2014-05-11 Thread Stuart Yeates
Context: http://www.youtube.com/watch?v=dIYvD9DI1ZA

Cheers
stuart

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
Fitchett, Deborah
Sent: Monday, 12 May 2014 9:53 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Withdraw my post was: Re: [CODE4LIB] separate list for 
jobs

I can't help with the Python, but a test case for the script would obviously be 
"You know I can't subscribe to your ghost jobs list".

Deborah

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Susan 
Kane
Sent: Friday, 9 May 2014 2:44 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Withdraw my post was: Re: [CODE4LIB] separate list for jobs

Obviously, we must now task someone in CODE4LIB with writing a Python script to 
convert New Zealand English to International English.

Or, I guess we could solve this on the user side with a sarcasm filter or a 
humor pipe, but you might lose some data that way.

:-)

-- Susan Kane
Boston(ish), MA


P Please consider the environment before you print this email.
"The contents of this e-mail (including any attachments) may be confidential 
and/or subject to copyright. Any unauthorised use, distribution, or copying of 
the contents is expressly prohibited. If you have received this e-mail in 
error, please advise the sender by return e-mail or telephone and then delete 
this e-mail together with all attachments from your system."


Re: [CODE4LIB] how to post jobs (was Re: [CODE4LIB] separate list for Jobs)

2014-05-08 Thread Stuart Yeates

On 05/09/2014 10:04 AM, Jodi Schneider wrote:

On Thu, May 8, 2014 at 9:54 PM, Coral Sheldon-Hess wrote:


I have another, maybe minor, point to add to this: I've posted a job to
Code4Lib, and I did it wrong. I have no idea how I'm supposed to make a job
show up correctly, and now that I have realized I've done it wrong, I
probably won't send another job to this list. (Or maybe I'll look it up in
... where? the wiki?)


You post them at
http://jobs.code4lib.org/


Could that information please be added to the footer that's added when 
posting jobs?


cheers
stuart


Re: [CODE4LIB] Withdraw my post was: Re: [CODE4LIB] separate list for jobs

2014-05-08 Thread Stuart Yeates

On 05/09/2014 02:44 AM, Susan Kane wrote:

Obviously, we must now task someone in CODE4LIB with writing a Python
script to convert New Zealand English to International English.


Yes, because tasking people with AI-complete programming tasks (see 
https://en.wikipedia.org/wiki/AI-complete ) is only slightly worse than 
systematically malfunctioning sarcasm filters.



Or, I guess we could solve this on the user side with a sarcasm filter or a
humor pipe, but you might lose some data that way.


Or we could acknowledge code4lib's role as a safe place for people to 
tune their sarcasm detectors.


cheers
stuart


[CODE4LIB] Withdraw my post was: Re: [CODE4LIB] separate list for jobs

2014-05-07 Thread Stuart Yeates
The fact that the only person who has given any acknowledgement of 
understanding my message was someone else in .ac.nz suggests that 
despite my best efforts my message content was effectively shredded by 
the implicit conversion from New Zealand English to International English.


My apologies; I withdraw my original email.

To translate explicitly into International English, my point was:

"I have observed that an individuals position on mail filtering vs 
separate mailing lists appears to be an implicit marker of group 
membership in this group (i.e. a shibboleth)."


Note that I do not endorse this or any other marker of group membership, 
but my understanding of the psychology of groups suggests that all functional 
groups have markers of group membership and that attempting to eliminate 
markers of group membership in an attempt at inclusiveness (a) can in 
itself be a marker of group membership and (b) is only likely to drive a 
shift from relatively explicit markers to relatively implicit markers.


cheers
stuart

On 05/08/2014 10:17 AM, David Friggens wrote:

This is a pretty terrible reply.


I thought it was a great reply.


obscure words (seriously, shibboleth?)


Somewhat obscure, but not so much in Code4Lib.
http://en.wikipedia.org/wiki/Shibboleth
http://en.wikipedia.org/wiki/Shibboleth_(Internet2)


Unless you're trying to be sarcastic...in which case ignore this.


He most definitely was.

I believe Stuart's point was to suggest that when the multiple
requests for a separate list for job notices get immediately shot down
with "no - use an email filter, or are you stupid?" [1] it doesn't
help to create an "inclusive" and "good learning environment".

[1] NB the respondents aren't explicitly "are you stupid" but that's
how it may be taken by some people.


And to answer the original question - job listings help more people than they 
annoy so they should be kept as-is.


My view is that it would make more sense to have separate discussion
and job notice lists, as I see in other places. But I'm not that
bothered personally, as I would subscribe to both and filter them into
the same folder in my mail client. :-)

Cheers
David



Re: [CODE4LIB] separate list for jobs

2014-05-06 Thread Stuart Yeates

On 05/07/2014 04:59 AM, Richard Sarvas wrote:

Not to be a jerk about this, but why is the answer always "No"? There seem to 
be more posts on this list relating to job openings than there are relating to code 
discussions. Are job postings a part of why this list was originally created? If so, I'll 
stop now.


The answer is always "no" because we are collectively using the 
possession of an email client with filtering capability and the personal 
knowledge of how to use it as a Shibboleth for group membership. Those 
who find it easier to complain than write a filter mark themselves as 
members of the outgroup intruding on the ingroup.


cheers
stuart


Re: [CODE4LIB] barriers to open metadata?

2014-04-29 Thread Stuart Yeates

On 04/30/2014 09:38 AM, David Friggens wrote:

Hi Laura


I'd like to find out from as many people as are interested what barriers
you feel exist right now to you releasing your library's bibliographic
metadata openly.


One issue is that we pay for enrichments (tables of contents etc) for
records, and I believe the licence restricts us from giving them to
other people. We send our records to the national union catalogue and
OCLC before adding the enrichments, and we'd need to take them out
before we could "release" records elsewhere.


Note that this is primarily a problem because MARC assumes that all 
versioning is done at the record level; there's no easy way to say "the 
core bib item is from X, the TOC is from Y and the cover image is from Z".


cheers
stuart


Re: [CODE4LIB] New Zealand Chapter

2014-04-09 Thread Stuart Yeates

Nice.

The real question is whether that's U+2163, like it should be.

cheers
stuart

On 04/10/2014 07:17 AM, Jay Gattuso wrote:

Hi all,

Long time listener, first time caller.

We don't have a C4L chapter over here in New Zealand, and I wondered what we 
would need to do to align the small group of Lib  / GLAM coders with the 
broader C4L group.

One of my colleagues did make this: http://i.imgur.com/XgGP9vX.jpg

We are  also setting up a two day code/hack fest, focusing on our Digital 
Preservation concerns, in June.

I'd also really like to "run" the hackfest under a C4L banner.

Any thoughts?

J

Jay Gattuso | Digital Preservation Analyst | Preservation, Research and 
Consultancy
National Library of New Zealand | Te Puna Mātauranga o Aotearoa
PO Box 1467 Wellington 6140 New Zealand | +64 (0)4 474 3064
jay.gatt...@dia.govt.nz



[CODE4LIB] Cataloguing Telugu

2014-04-07 Thread Stuart Yeates
Currently there is a funding proposal for cataloguing Telugu works up 
before the Wikimedia foundation. If anyone has experience with Telugu or 
knows of any tools that are likely to be useful, please give your input:


https://meta.wikimedia.org/wiki/Grants:IEG/Making_telugu_content_accessible

cheers
stuart


Re: [CODE4LIB] Open Publication Distribution System

2014-02-09 Thread stuart yeates

We use OPDS at http://nzetc.victoria.ac.nz and have for a while now.

Basically it's an extra namespace that you can use in your RSS with 
extra information for ebook readers / consumers. Other RSS readers / 
consumers silently ignore the namespace, so done right you only need one 
RSS feed.


We do ours on the back of our solr search, so that suddenly you can 
browse by anything you can facet or search by. At the bottom of every 
search page is the result set as RSS / OPDS.
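
For anyone wondering what the extra information looks like, here is a 
minimal invented sketch of a feed entry carrying an OPDS acquisition link 
(OPDS 1.x is Atom-based); this is an illustration rather than our 
production code, and all the identifiers and URLs are made up.

#!/usr/bin/env python
# Sketch: build an Atom entry with the OPDS acquisition link ebook readers look for.
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

entry = ET.Element("{%s}entry" % ATOM)
ET.SubElement(entry, "{%s}title" % ATOM).text = "An Example Text"
ET.SubElement(entry, "{%s}id" % ATOM).text = "http://example.org/etexts/Example"

# The OPDS-specific part: an acquisition link pointing at the EPUB download.
ET.SubElement(entry, "{%s}link" % ATOM, {
    "rel": "http://opds-spec.org/acquisition",
    "type": "application/epub+zip",
    "href": "http://example.org/etexts/Example.epub",
})

print(ET.tostring(entry, encoding="unicode"))

Ordinary feed readers just see a link element they don't care about, which 
is why a single feed can serve both audiences.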


cheers
stuart


On 06/02/14 11:06, Bigwood, David wrote:

I recently became aware of Open Publication Distribution System (OPDS) Catalog 
format, a syndication format for e-pubs based on Atom & HTTP. It is something 
like an RSS feed for e-books. People are using it to find and acquire books. It 
sounds like a natural fit for library digitization projects. An easy way for folks 
to know what's new and grab a copy if they like.

So is anyone using this? Is it built into Omeka, Greenstone, DSpace or any of 
our tools? If you do use it do you have separate feeds for different projects. 
Say, one for dissertations, another for the local history project and another 
for books by state authors. Or do you have just one large feed? Is it being 
used by the DPLA or Internet Archive? How's it working for you?

We have plenty of documents we have scanned as well as our own publications. 
Might this be a good way to make them more discoverable? Or is this just a tool 
no one is using?

Thanks,
David Bigwood
dbigw...@hou.usra.edu<mailto:dbigw...@hou.usra.edu>
Lunar and Planetary Institute

https://twitter.com/Catalogablog




--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] EZProxy changes / alternatives ?

2014-02-03 Thread stuart yeates

On 04/02/14 05:09, Andrew Anderson wrote:

There exists a trivial DoS attack against EZproxy that I reported to OCLC about 
2 years ago, and has not been addressed yet.


... and as soon as that gets a CVE (see http://cve.mitre.org/), 
corporate IT departments will force libraries to upgrade to the latest 
version or turn the software off.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Southeastern Library Association

2014-02-02 Thread Stuart Yeates
> [and yes, the article is still in need of secondary sources]

Thanks to User:Eveross1 (who may or may not be a code4libber) for stepping up 
and fixing this.

cheers
stuart


Re: [CODE4LIB] Southeastern Library Association

2014-02-02 Thread stuart yeates

On 03/02/14 13:35, BWS Johnson wrote:

[This also serves to illustrate why wikipedia has issues as an authority
control system.]


 I went ahead and strongarmed the templates away. Feel free to add your 
thoughts on the talk page. :)

 Wikimedians are very cool in person, and there's acknowledgement inside of 
the community that there are several bad actors that end up making for lots of 
bad experiences. So any time you run into this, revert the changes, add more 
sources if possible, and add to the talk page so that editors that aren't in 
the know should be able to read the whys of things.


I'm the wikimedian who added the templates there in the first place to 
give the newbie author some guidance as to what needed to happen; when 
the newbie editor ran out of steam I appealed for input from here.


Wikipedia is in many ways as structured as cataloguing, but you can get 
away with pretty much everything if you have secondary sources.


The fact that anyone on this list thinks that a single-column 
contemporary eye-witness account qualifies as a secondary source 
staggers me. Maybe that makes me a bad actor.


[and yes, the article is still in need of secondary sources]

cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] EZProxy changes / alternatives ?

2014-02-02 Thread stuart yeates

It's worse than that.

The price we were quoted for hosting seems to have been picked so it can 
be offered with a 90% discount when bundled with a package deal with 
other OCLC products; buying into the on-going balkanization of the industry.


cheers
stuart

On 01/02/14 16:24, Roy Tennant wrote:

When it comes to hedging bets, I'd sure rather hedge my $50,000 bet than my
$500 one. Just sayin'.
Roy


On Fri, Jan 31, 2014 at 6:04 PM, BWS Johnson wrote:


Salvete!

   Tisn't necessarily Socialist to hedge one's bets. Look at what Wall
St. experts advise when one is unsure of whether to hold or sell. Monopoly
is only ever in the interest of those that hold it.

Short term the aquarium is enticing, but do you enjoy your
collapsed dorsal fin?

Cheers,
Brooke

--
On Fri, Jan 31, 2014 6:10 PM EST Salazar, Christina wrote:


I think though that razor thin budgets aside, the EZProxy using community

is vulnerable to what amounts to a monopoly. Don't get any ideas, OCLC
peeps (just kiddin') but now we're so captive to EZProxy, what are our
options if OCLC wants to gradually (or not so gradually) jack up the price?


Does being this captive to a single product justify community developer

time?


I think so but I'm probably just a damn socialist.

On Jan 31, 2014, at 1:36 PM, "Tim McGeary"  wrote:


Even with razor thin budgets, this is a no brainer.  May they need

decide

between buying 10 new books or license EZProxy?  Possibly, but if they

have

a need for EZProxy, that's still a no brainer - until a solid OSS
replacement that includes as robust a developer /support community comes
around.  But again, at $500/year, I don't see a lot of incentive to

invest

in such a project.


On Fri, Jan 31, 2014 at 3:55 PM, Riley Childs 
wrote:


But there are places on a razor thin budget, and things like this throw
them off balance

Sent from my iPhone


On Jan 31, 2014, at 3:32 PM, "Tim McGeary" 

wrote:


So what's the price point that EZProxy needs to climb to make it more
realistic to put resources into an alternative.  At $500/year, I don't

even

have to think about justifying it.  At 1% (or less) of the cost of

position

with little to no prior experience needed, it doesn't make a lot of

sense

to invest in an open source alternative, even on a campus that heavily

uses

Shibboleth.

Tim


On Fri, Jan 31, 2014 at 1:36 PM, Ross Singer 

wrote:


Not only that, but it's also expressly designed for the purpose of

reverse

proxying subscription databases in a library environment.  There are

tons

of things vendors do that would be incredibly frustrating to get

working

properly in Squid, nginx, or Apache that have already been solved by
EZProxy.  Which is self-fulfilling: vendors then cater to what EZProxy

does

(rather than improving access to their resources).

Art Rhyno used to say that the major thing that was inhibiting the
widespread adoption of Shibboleth was how simple and cheap EZProxy was.

I

think there is a lot of truth to that.

-Ross.


On Fri, Jan 31, 2014 at 1:23 PM, Kyle Banerjee <

kyle.baner...@gmail.com

wrote:



EZproxy is a self-installing statically compiled single binary

download,

with a built-in administrative interface that makes most common
administrative tasks point-and-click, that works on Linux and Windows
systems, and requires very little in the way of resources to run.  It
also
has a library of a few hundred vendor stanzas that can be copied and
pasted
and work the majority of the time.

To successfully replace EZproxy in this setting, it would need to be
packaged in such a way that it is equally easy to install and

maintain,

and
the library of vendor stanzas would need to be developed as apache

conf.d

files.

This. The real gain with EZProxy is that configuring it is crazy easy.

You

just drop it in and run it -- it's feasible for someone with no

experience

in proxying or systems administration to get it operational in a few
minutes. That is why I think virtualizing a system that makes

accessing

the

more powerful features of EZProxy easy is a good alternative.

kyle




--
Tim McGeary
timmcge...@gmail.com
GTalk/Yahoo/Skype/Twitter: timmcgeary
484-294-7660 (cell)




--
Tim McGeary
timmcge...@gmail.com
GTalk/Yahoo/Skype/Twitter: timmcgeary
484-294-7660 (cell)







--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


[CODE4LIB] Southeastern Library Association

2014-02-02 Thread stuart yeates
Someone has put a great deal of time into 
https://en.wikipedia.org/wiki/Southeastern_Library_Association but it's 
going to get deleted unless it acquires some independent secondary sources.


[This also serves to illustrate why wikipedia has issues as an authority 
control system.]


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] EZProxy changes / alternatives ?

2014-02-02 Thread stuart yeates

On 01/02/14 08:34, Mosior, Benjamin wrote:

Does anyone have any thoughts on how to move forward with organizing the 
development and adoption of an alternative proxy solution?

A collaborative Google Doc? Perhaps a LibraryProxy GitHub Organization?


I'd say that more than anything else what is needed is for techies to do 
experiments, document and share the results. These could either follow 
on from the example of Andrew Anderson earlier in this thread or strike 
out in different directions.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] EZProxy changes / alternatives ?

2014-01-29 Thread stuart yeates
 better way 
to do this, but this is what I threw together for testing:


ProxyHTMLEnable Off
SetOutputFilter INFLATE;dummy-html-to-plain
ExtFilterOptions LogStdErr Onfail=remove

ExtFilterDefine dummy-html-to-plain mode=output intype=text/html outtype=text/plain cmd="/bin/cat -"

So what’s currently missing in the Apache HTTPd solution?

- Services that use an authentication token (predominantly ebook vendors) need 
special support written.  I have been entertaining using mod_lua for this to 
make this support relatively easy for someone who is not hard-core technical to 
maintain.

- Services that are not IP authenticated, but use one of the Form-based 
authentication variants.  I suspect that an approach that injects a script tag 
into the page pointing to javascript that handles the form fill/submission 
might be a sane approach here.  This should also cleanly deal with the ASP.net 
abominations that use __PAGESTATE to store sessions client-side instead of 
server-side.

- EZproxy’s built-in DNS server (enabled with the “DNS” directive) would need 
to be handled using a separate DNS server (there are several options to choose 
from).

- In this setup, standard systems-level management and reporting tools would be 
used instead of the /admin interface in EZproxy

- In this setup, the functionality of the EZproxy /menu URL would need to be 
handled externally.  This may not be a real issue, as many academic sites 
already use LMS or portal systems instead of the EZproxy to direct students to 
resources, so this feature may not be as critical to replicate.

- And of course, extensive testing.  While the above ProQuest stanza works for 
the main ProQuest search interface, it won’t work for everyone, everywhere just 
yet.

Bottom line: Yes, Apache HTTPd is a viable EZproxy alternative if you have a 
system administrator who knows their way around Apache HTTPd, and are willing 
to spend some time getting to know your vendor services intimately.

All of this testing was done on Fedora 19 for the 2.4 version of HTTPd, which 
should be available in RHEL7/CentOS7 soon, so about the time that hard 
decisions are to be made regarding EZproxy vs something else, that something 
else may very well be Apache HTTPd with vendor-specific configuration files.




--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] EZProxy changes / alternatives ?

2014-01-29 Thread stuart yeates
The text I've seen talks about "[e]xpanded reporting capabilities to 
support management decisions" in forthcoming versions and encourages 
a move towards the hosted solution.


Since we're in .nz, they'd put our hosted proxy server in .au, but the 
network connection between .nz and .au is via the continental .us, which 
puts an extra trans-pacific network loop in 99% of our proxied network 
connections.


cheers
stuart

On 30/01/14 03:14, Ingraham Dwyer, Andy wrote:

OCLC announced in April 2013 the changes in their license model for North 
America.  EZProxy's license moves from requiring a one-time purchase of US$495 
to a *annual* fee of $495, or through their hosted service, with the fee 
depending on scale of service.  The old one-time purchase license is no longer 
offered for sale as of July 1, 2013.  I don't have any details about pricing 
for other parts of the world.

An important thing to recognize here, is that they cannot legally change the 
terms of a license that is already in effect.  The software you have purchased 
under the old license is still yours to use, indefinitely.  OCLC has even 
released several maintenance updates during 2013 that are available to current 
license-holders.  In fact, they released V5.7 in early January 2014, and made 
that available to all license-holders.  However, all updates after that version 
are only available to holders of the yearly subscription.  The hosted product 
is updated to the most current version automatically.

My recommendation is:  If your installation of EZProxy works, don't change it.  
Yet.  Upgrade your installation to the last version available under the old 
license, and use that for as long as you can.  At this point, there are no 
world-changing new features that have been added to the product.  There is 
speculation that IPv6 support will be the next big feature-add, but I haven't 
heard anything official.  Start planning and budgeting for a change, either to 
the yearly fee, or the cost of hosted, or to some as-yet-undetermined 
alternative.  But I see no need to start paying now for updates you don't need.

-Andy



Andy Ingraham Dwyer
Infrastructure Specialist
State Library of Ohio
274 E. 1st Avenue
Columbus, OH 43201
library.ohio.gov


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of stuart 
yeates
Sent: Tuesday, January 28, 2014 10:03 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?

I probably should have been more specific.

Does anyone have experience switching from EzProxy to anything else?

Is anyone else aware of the coming OCLC changes and considering switching?

Does anyone have a worked example like: "My EzProxy config for site Y looked like A; 
after the switch, my X config for site Z looked like B"?

I'm aware of this good article:
http://journal.code4lib.org/articles/7470

cheers
stuart


On 29/01/14 15:24, stuart yeates wrote:

We've just received notification of forthcoming changes to EZProxy,
which will require us to pay an arm and a leg for future versions to
install locally and/or host with OCLC AU with a ~ 10,000km round trip.

What are the alternatives?

cheers
stuart



--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/




--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] EZProxy changes / alternatives ?

2014-01-28 Thread stuart yeates

I probably should have been more specific.

Does anyone have experience switching from EzProxy to anything else?

Is anyone else aware of the coming OCLC changes and considering switching?

Does anyone have a worked example like: "My EzProxy config for site Y 
looked like A; after the switch, my X config for site Z looked like B"?


I'm aware of this good article:
http://journal.code4lib.org/articles/7470

cheers
stuart


On 29/01/14 15:24, stuart yeates wrote:

We've just received notification of forthcoming changes to EZProxy,
which will require us to pay an arm and a leg for future versions to
install locally and/or host with OCLC AU with a ~ 10,000km round trip.

What are the alternatives?

cheers
stuart



--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] EZProxy changes / alternatives ?

2014-01-28 Thread stuart yeates
EZProxy is a proxy for use with vendors that have products gateway'd by 
IP address. It allows users who are off-campus to access resources that 
are locked down by IP address as though the user was on campus. It does 
deep-packet inspection to rewrite URLs and javascript, handle DNS stuff, etc.


It's a product from OCLC, see http://www.oclc.org/en-US/ezproxy.html

cheers
stuart

On 29/01/14 15:05, Riley Childs wrote:

Ok, what exactly is EZProxy, I could never figure that out, if I knew I could 
help :)

Sent from my iPhone


On Jan 28, 2014, at 9:04 PM, "stuart yeates"  wrote:

We've just received notification of forthcoming changes to EZProxy,
which will require us to pay an arm and a leg for future versions to
install locally and/or host with OCLC AU with a ~ 10,000km round trip.

What are the alternatives?

cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/





--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


[CODE4LIB] EZProxy changes / alternatives ?

2014-01-28 Thread stuart yeates
We've just received notification of forthcoming changes to EZProxy, 
which will require us to pay an arm and a leg for future versions to 
install locally and/or host with OCLC AU with a ~ 10,000km round trip.


What are the alternatives?

cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Expressing negatives and similar in RDF

2013-09-17 Thread stuart yeates

On 13/09/13 23:32, Meehan, Thomas wrote:


However, it would be more useful, and quite common at least in a bibliographic context, to say "This book does not 
have a title". Ideally (?!) there would be an ontology of concepts like "none", "unknown", or 
even "something, but unspecified":

This book has no title:
example:thisbook dc:title hasobject:false .

It is unknown if this book has a title (sounds undesirable but I can think of 
instances where it might be handy[2]):
example:thisbook dc:title hasobject:unknown .

This book has a title but it has not been specified:
example:thisbook dc:title hasobject:true .


The root of the cure here is having a model that defines the exact 
semantics of the RDF tags you're using.


For example, in the FRBRoo model, asserting that an F1 (Work) exists 
logically implies the existence of an E39 (Creator), an F27 (Work 
Conception), an F28 (Expression Creation), an F4 (Manifestation 
Singleton) and an F2 (Expression), as well as two E52 (TimeSpan)s and two 
E53 (Place)s. See 
http://www.cidoc-crm.org/frbr_graphical_representation/graphical_representation/work_time.html


The bibliographer / cataloguer need not mention any of these, unless 
they wish to use them to add metadata to the F1 or to connect them with 
other items in the collection.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Subject Terms in Institutional Repositories

2013-09-02 Thread stuart yeates

That's handled by staff with cataloguing training and disposition.

cheers
stuart

On 03/09/13 05:24, McAulay, Elizabeth wrote:

Hi Stuart,

For bullet point #2 below, how do you manage the workflow of the "creative 
spelling" correction. Is the correction handled manually or automatically, or 
somewhere in between?

Thanks,
Lisa

-
Elizabeth "Lisa" McAulay
Librarian for Digital Collection Development
UCLA Digital Library Program
http://digital.library.ucla.edu/
email: emcaulay [at] library.ucla.edu

From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of stuart yeates 
[stuart.yea...@vuw.ac.nz]
Sent: Sunday, September 01, 2013 1:36 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Subject Terms in Institutional Repositories

I run the techie side of http://researcharchive.vuw.ac.nz/ and we use
dc.subject:

(*) We ask for at least three depositor-supplied keywords

(*) When a depositor uses creative spelling in any of the
depositor-supplied fields, we add standard spelling as a dc.subject

(*) When any field uses non-English language terms we add an English
term as a dc.subject

(*) When any field uses English language terms to refer to non-English
subjects, we add a dc.subject with the native-language term

(*) We have some hacky stuff in vuwschema.subject.* which the DSpace
development team have told us to keep hacky while they migrate to 
http://dublincore.org/documents/dcmi-terms/ in the next couple of releases.

We'd love to have the resources to do proper subject classification,
because it would be a huge enabler of deep interoperability.

cheers
stuart

On 31/08/13 01:36, Matthew Sherman wrote:

Sorry, I probably should have provided a bit more depth.  It is a
University Institutional Repository so we have a rather varied collection
of materials from engineering to education to computer science to
chiropractic to dental to some student theses and posters.  So I guess I
need to find something that is extensible.  Does that provide a better idea
or should I provide more info?


On Fri, Aug 30, 2013 at 9:32 AM, Jacob Ratliff wrote:


Hi Matt,

It depends on the subject area of your repository. There are dozens of
controlled vocabularies that exist (not including specific Enterprise
Content Management controlled vocabularies). If you can describe your
collection, people might be able to advise you better.

Jacob Ratliff
Archivist/Taxonomy Librarian
National Fire Protection Association


On Fri, Aug 30, 2013 at 9:26 AM, Matthew Sherman
wrote:


Hello Code4Libbers,

I am working on cleaning up our institutional repository, and one of the
big areas of improvement needed is the list of terms from the subject
fields.  It is messy and I want to take the subject terms and place them
into a much better order.  I was contemplating using Library of Congress
Subject Headings, but I wanted to see what others have done in this area

to

see if there is another good controlled vocabulary that could work

better.

Any insight is welcome.  Thanks for your time everyone.

Matt Sherman
Digital Content Librarian
University of Bridgeport








--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/




--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Subject Terms in Institutional Repositories

2013-09-01 Thread stuart yeates
I run the techie side of http://researcharchive.vuw.ac.nz/ and we use 
dc.subject:


(*) We ask for at least three depositor-supplied keywords

(*) When a depositor uses creative spelling in any of the 
depositor-supplied fields, we add standard spelling as a dc.subject


(*) When any field uses non-English language terms we add an English 
term as a dc.subject


(*) When any field uses English language terms to refer to non-English 
subjects, we add a dc.subject with the native-language term


(*) We have some hacky stuff in vuwschema.subject.* which the DSpace 
development team have told us to keep hacky while they migrate to 
http://dublincore.org/documents/dcmi-terms/ in the next couple of releases.


We'd love to have the resources to do proper subject classification, 
because it would be a huge enabler of deep interoperability.


cheers
stuart

On 31/08/13 01:36, Matthew Sherman wrote:

Sorry, I probably should have provided a bit more depth.  It is a
University Institutional Repository so we have a rather varied collection
of materials from engineering to education to computer science to
chiropractic to dental to some student theses and posters.  So I guess I
need to find something that is extensible.  Does that provide a better idea
or should I provide more info?


On Fri, Aug 30, 2013 at 9:32 AM, Jacob Ratliff wrote:


Hi Matt,

It depends on the subject area of your repository. There are dozens of
controlled vocabularies that exist (not including specific Enterprise
Content Management controlled vocabularies). If you can describe your
collection, people might be able to advise you better.

Jacob Ratliff
Archivist/Taxonomy Librarian
National Fire Protection Association


On Fri, Aug 30, 2013 at 9:26 AM, Matthew Sherman
wrote:


Hello Code4Libbers,

I am working on cleaning up our institutional repository, and one of the
big areas of improvement needed is the list of terms from the subject
fields.  It is messy and I want to take the subject terms and place them
into a much better order.  I was contemplating using Library of Congress
Subject Headings, but I wanted to see what others have done in this area

to

see if there is another good controlled vocabulary that could work

better.

Any insight is welcome.  Thanks for your time everyone.

Matt Sherman
Digital Content Librarian
University of Bridgeport








--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] The Wikipedia Library

2013-08-29 Thread stuart yeates
If I understand what you're saying, what you need is an EZproxy 
http://www.oclc.org/ezproxy.en.html install configured to authenticate 
against Unified login https://meta.wikimedia.org/wiki/Help:Unified_login 
and specific user rights 
https://www.mediawiki.org/wiki/Manual:User_rights or some other user 
grouping mechanism.


EZproxy is the de facto standard for making paywalled resources available 
to institutional users from off campus. The only downsides are (a) that 
it requires you have full control over DNS because a proxy at 
https://proxy.wikimedia.org/ would also answer 
https://some.subscription.resource.example.com.proxy.wikimedia.org/ and 
forward to https://some.subscription.resource.example.com/ and (b) as a 
proxy, it touches all traffic, so streaming video and very large PDFs will be 
redirected through the proxy.


Alternatively, there is a slight chance of joining a Shibboleth 
federation http://en.wikipedia.org/wiki/Shibboleth_%28Internet2%29 but 
that's a big ball of policy that is likely to be incompatible. 
Shibboleth is preferred for streaming because there is no proxying.


cheers
stuart
http://en.wikipedia.org/wiki/User:Stuartyeates

On 29/08/13 03:49, Jake Orlowitz wrote:

Hi folks,

My name is Jake Orlowitz and I coordinate Wikipedia's open research hub,
The Wikipedia Library.  Wikimedia Foundation board member Phoebe Ayers
recommended that I reach out to you to see if we might be able to
collaborate in some way.

The Wikipedia Library has several different platforms, several of which
would benefit from better technical integration.  One of our primary goals
is to get active, experienced Wikipedia editors access to paywalled sources
and university libraries.  We have received donations from several
publishers and interest from several libraries.  The challenge for us is
managing those partnerships at scale and in a secure fashion.

We're also working towards more functional research desks, programs that
let reference librarians field research queries from editors or the public,
remote 'visiting scholar' or 'research affiliate' positions at
institutional libraries, University partnerships with online library
access, open access awareness programs, and other related activities.

I'd love to talk more about these projects with you, either through email
or voice chat.

Best,

Jake Orlowitz
   Wikipedia: Ocaasi <http://enwp.org/User:Ocaasi>
   Facebook: Jake Orlowitz <http://www.facebook.com/jorlowitz>
   Twitter: JakeOrlowitz <https://twitter.com/JakeOrlowitz>
   LinkedIn: Jake Orlowitz<http://www.linkedin.com/profile/view?id=197604531>
   Email: jorlow...@yahoo.com
   Skype: jorlowitz
   Cell: (484) 684-2104
   Home: (484) 380-3940




--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] text mining software

2013-08-27 Thread stuart yeates
There have been some great software recommendations in this thread that 
I really don't want to quibble with. What I'd like to quibble with is 
the software-first approach. We've all tried the software-first 
approach; how many of us were happy with it?


There is a standard in this area and that standard appears to have at 
least two non-trivial implementations, including from one software 
distributor whose name we all recognise.


SPEC: http://docs.oasis-open.org/uima/v1.0/uima-v1.0.html
APACHE UIMA: http://uima.apache.org/
GATE: http://gate.ac.uk/

Anyone have experience using the standard or these two implementations?

cheers
stuart

--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Way to record usage of tables/rooms/chairs in Library

2013-08-15 Thread stuart yeates
Many buildings have IR sensors already installed for burglar alarms / 
fire detection. If you can get a read-only feed from that system you may 
be able to piggyback.


Of course, these kinds of sensors are tripped by staff making regular 
rounds of all spaces and similar non-patron activity.


cheers
stuart

On 16/08/13 06:33, Brian Feifarek wrote:

Motion sensors might be the ticket.  For example, 
https://www.sparkfun.com/products/8630

Brian
- Original Message -
From: "Andreas Orphanides" 
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Thursday, August 15, 2013 11:12:02 AM
Subject: Re: [CODE4LIB] Way to record usage of tables/rooms/chairs in Library

Oh, that's a much better idea than light sensors. One challenge with that
might be difficulty in determining what "vacant" looks like
authoritatively, especially if people move chairs, walk through room, etc.
But much more accessible than actually bolting stuff to the table, I would
think.

On Thu, Aug 15, 2013 at 1:03 PM, Schwartz, Raymond wrote:


Hey Dre, Perhaps a video camera with some OpenCV?

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Andreas Orphanides
Sent: Thursday, August 15, 2013 8:55 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Way to record usage of tables/rooms/chairs in
Library

If I were feeling really ambitious -- and fair warning, I'm a big believer
that any solution worth engineering is worth over-engineering -- I'd come
up with something involving light sensors (a la a gate counter) mounted on
the table legs, just above seat height. Throw in some something something
Arduino or Raspberry Pi, and Bob's your uncle.

I find myself more intimidated by the practicality of maintaining such a
system (batteries, cord management etc) than about the practicality of this
implementation, actually.

-dre.

On Wed, Aug 14, 2013 at 7:59 PM, Thomas Misilo  wrote:


Hi,

I was wondering if anyone has been asked before to come up with a way
to record usage of tables.

The ideal solution would be a web app, that we can create floor plans
with where all the tables/chairs are and select the "reporting time",
say 9PM at night. Go around the library and select all the
seats/tables/rooms that are currently being used/occupied for
statistical data.


We would be wanting to go around probably multiple times a day.

The current solution I have seen is a pen and paper task, and then
someone will have to manually put the data into a spreadsheet for
analysis.


Thanks!

Tom








--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] TemaTres 1.7 released: now with meta-terms and SPARQL endpoint

2013-08-13 Thread stuart yeates

I'm glad to see development of this continuing.

It's been on my list of potential stand-alone authority control systems.

cheers
stuart

On 14/08/13 13:35, diego ferreyra wrote:

We are glad to announce the public release of TemaTres 1.7.

Here the changelog:
- Now you can have a SPARQL Endpoint for your TemaTres vocabulary. Many
thanks to Enayat Rajabi!!!
- Capability to create and manage meta-terms. A meta-term is a term to
describe other terms (e.g. guide terms, facets, categories, etc.). It can't be
used in the indexing process.
- New standard reports: all the terms with his UF terms and all the terms
with his RT terms.
- Capability to define custom fields in alphabetical export
- New capabilities for TemaTres API: suggest & suggestDetails,
- Fixed bugs and improved several functional aspects.

Many thanks to the feedback provided by TemaTres community :)

Some HOWTO:
How to update to Tematres 1.7:
- Login as admin and go to: Menu -> Administration -> Database maintance ->
Update 1.6 to 1.7

How to enable SPARQL endpoint:
1) Login as admin and go to Menu -> Administration -> Configuration ->
Click in your vocabulary: Set as ENABLE SPARQL endpoint (by default is
disable).

2) Login as admin and Goto: Menu -> Administration -> Database maintance ->
Update SPARQL endpoint.


Best Regards and apologies for cross-posting


diego ferreyra
temat...@r020.com.ar
http://www.vocabularyserver.com



--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] LibGuides: I don't get it

2013-08-11 Thread stuart yeates

On 12/08/13 12:20, Andrew Darby wrote:

I don't get this argument at all.  Why is it "counter productive to try to
look at open source alternatives" if the vendor's option is relatively
cheap?  Why wouldn't you investigate all options?


If you have no in-house technical capability, the cost of looking at an 
open source alternative can easily outweigh the multi-year licensing fee.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Lightweight Autocomplete Application

2013-07-08 Thread stuart yeates

On 09/07/13 02:37, Anderson, David (NIH/NLM) [E] wrote:

I'm looking for a lightweight autocomplete application for data entry. Here's 
what I'd like to be able to do:


* Import large controlled vocabularies into the app

* Call up the app with a macro wherever I'm entering data

* Begin typing in a term from the vocabulary, get a list of suggestions 
for terms

* Select a term from the list and have it paste automatically into my 
data entry field

Ideally it would load and suggest terms quickly. I've looked around, but 
nothing really stands out. Anyone using anything like this?


There's a worked example doing this a couple of ways using wikipedia at:

http://stackoverflow.com/questions/7834174/jquery-autocomplete-plugin-to-jquery-ui-autocomplete

In both cases note the 'http://en.wikipedia.org/w/api.php' URI. The 'en' 
part of that is a language code, switch it out for whatever natural 
language you're expecting people to type.
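
If you want to see what that endpoint returns before wiring it into the
jQuery autocomplete, here's a quick sketch from the command line (the
search term is just an example):

curl 'http://en.wikipedia.org/w/api.php?action=opensearch&search=dublin&format=json'

The second element of the JSON array that comes back is the list of
matching titles, which is what you'd feed into the suggestion list.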


If you're working in a bi-lingual or multi-lingual environment, there's 
a category of redirects 
http://en.wikipedia.org/wiki/Category:Redirects_from_alternative_languages 
which allows you to autocomplete across languages. The rules around 
redirects only allow such redirects from languages with a direct 
connection to the subject matter. In theory wikidata could be used to 
build something more complete.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] WorldCat Implements Content-Negotiation for Linked Data

2013-06-03 Thread stuart yeates

On 04/06/13 11:18, Karen Coyle wrote:

Ta da! That did it, Kyle. Why on earth do we call them "smart quotes"?!


Because they look damn sexy when printed on pulp-of-murdered-tree, which 
we all know is the authoritative form of any communication.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] On-going support for DL projects

2013-05-19 Thread stuart yeates

On 18/05/13 01:51, Tim McGeary wrote:

There is no easy answer for this, so I'm looking for discussion.

- Should we begin considering a cooperative project that focuses on
emulation, where we could archive projects that emulate the system
environment they were built?
- Do we set policy that these types of projects last for as long as they
can, and once they break they are pulled down?
- Do we set policy that supports these projects for a certain period of
time and then deliver the application, files, and databases to the faculty
member to find their own support?
- Do we look for a solution like the Way Back Machine of the Internet
Archive to try to present some static / flat presentation of these project?


Actually, there is an easy answer to this.

Make sure that the collection is aligned with broader institutional 
priorities, to ensure that, if/when staff and funding priorities move 
elsewhere, there is some group / community with a clear interest 
and/or mandate in keeping the collection at least on life support, if 
not thriving.


Google "collections policy" for what written statements of this might 
look like.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Tool to highlight differences in two files

2013-04-23 Thread stuart yeates
Automating your favourite browser to load and screenshot each version 
and then using http://www.imagemagick.org/Usage/compare/ should work.
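
Once you have a pair of screenshots (the filenames here are
placeholders), ImageMagick's compare tool gives you both a visual diff
and a number; a minimal sketch:

compare -metric AE old.png new.png diff.png

The AE metric is just the count of differing pixels (printed to stderr),
and diff.png highlights where they are, which is handy for showing a
supervisor.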


Note that this will also catch the scenario where someone has changed 
the page by changing an image on the page.


cheers
stuart

On 24/04/13 10:18, Wilhelmina Randtke wrote:

That helps a lot, because it's for websites which is what I want to compare.

I am looking for changes in a site, and I have some archives, but tools for
merging code are too labor intensive and don't give a good visual report
that I can show to a supervisor.  This is good moving forward, but doesn't
cover historical pages.

I was hoping for something where I could call up two pages and get a visual
display of differences for the display version of html, not the code.

-Wilhelmina

On Tue, Apr 23, 2013 at 5:14 PM, Pottinger, Hardy J. <
pottinge...@missouri.edu> wrote:


Hi, I'm not sure if you're really looking for a diff tool, so I'll just
shout an answer to a question that I think you might be asking. I use a
variation of the script posted here:

http://stackoverflow.com/questions/1494488/watch-a-web-page-for-changes


for watching a web page for changes. I mostly only ever use this for
watching for new artifacts to appear in Maven Central (because refreshing
a web page is pretty dull "work").

Hope this helps.

--
HARDY POTTINGER 
University of Missouri Library Systems
http://lso.umsystem.edu/~pottingerhj/
https://MOspace.umsystem.edu/
"Do you love it? Do you hate it? There it is, the way you made it."
--Frank Zappa





On 4/23/13 3:24 PM, "Wilhelmina Randtke"  wrote:


I would like to compare versions of a website scraped at different times to
see what paragraphs on a page have changed.  Does anyone here know of a
tool for holding two files side by side and noting what is the same and
what is different between the files?

It seems like any simple script to note differences in two strings of text
would work, but I don't know a tool to use.

-Wilhelmina Randtke







--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


[CODE4LIB] RDA software for managing authorities

2013-03-05 Thread stuart yeates

I'm looking for recommendations for software for managing authorities.

Currently we're using a somewhat antiquated version of EATS 
https://code.google.com/p/eats/ but we're looking for something 
different. Our needs / wants are:


(*) Sane import/export to RDA (leaning towards RDA native)
(*) Sane import from legacy formats
(*) Sane export to sundry RDF formats + legacy formats
(*) Web based
(*) Out of the box rather than highly customised software
(*) Good support for bi-lingual / multi-lingual entries
(*) Ability to host multiple entirely separate authorities groups with 
separate policies and practices.

(*) Explicit support for VIAF / wikidata / LoC

It occurs to me that conceivably the best software for the job is 
actually an LMS with all the item-level stuff suppressed in favour of 
work-level and authority records, in which case the question becomes "is 
there an RDA-based LMS that can be customised to remove all the 
item-level stuff?"


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] wiki page about the chode4lib irc bot created

2013-01-24 Thread stuart yeates

On 25/01/13 09:47, Bohyun Kim wrote:

Hi all~

I was not familiar with the code4lib IRC bot (or irc bot in general for that 
matter), and the recent discussion on the listserv made me curious.

BTW I fully support the idea of removing offensive content, and big thanks to 
those who have been working on cleaning up those stuff.

In any case, I figured there might be others who are new to code4lib and were 
somewhat aware of zoia but not sure what exactly it does or will do. So I 
created a wiki page with a bunch of examples today morning. It's far from 
comprehensive but I think it would be cool if others -who care about the bot - 
add more content to this page.
http://wiki.code4lib.org/index.php/Zoia_or_the_Code4Lib_IRC_bot


Looking at that, the only absolutely library-specific content there 
appears to be the MARC plugin (which isn't documented in detail).


cheers
stuart

--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Metrics for measuring digital library production

2012-12-17 Thread stuart yeates

On 18/12/12 10:20, Kyle Banerjee wrote:

Howdy all,

Just wondering who might be willing to share what kind of stats they
produce to justify their continued existence? Of course we do the normal
(web activity, items and metadata records created, stuff scanned, etc), but
I'm trying to wrap my mind around ways to describe work where there's not a
built in assumption that more is better.


I recently gave a seven minute rant at NDF about what statistics we 
aren't collecting. The video requires silverlight, alas:


http://webcast.gigtv.com.au/Mediasite/Catalog/catalogs/NDF/ (second 
page, 'Lightning talks Session 2')


Capsule summary: we claim to value user engagement. Making that claim 
and then failing to attempt to measure it is unprofessional.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] basic IRC question/comments

2012-12-10 Thread Stuart Yeates
> If you decide you'll be hanging out on #libtechwomen a lot (or on 
> irc.freenode.net in general), it might be a good idea to register your nick 
> as explained here http://freenode.net/faq.shtml#userregistration because the 
> short answer is, sometimes people are assholes.

Another important option for complete newbies is to pick a guest username 
(these are often generated automatically by different clients, Guest1234, etc) 
so that you can try it out completely anonymously. 

I, for one, am certainly happy to interact with people on an anonymous basis if 
that's what they want.

Cheers
stuart


Re: [CODE4LIB] Gender Survey Summary and Results

2012-12-05 Thread stuart yeates

On 06/12/12 09:05, Sara Amato wrote:

I'd been staying out of this discussion, but the thought occurs to me that 
someone with access to the list of subscribers might run that against a list of 
traditional boy/girl names, and be able to make some guesses….


That idea runs into problems both with non-western names (there is more 
than one kind of diversity) and with those people whose experience of gender 
in the workplace has led them to use non-gender-specific identifiers.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] EPUB and ILS indexing update : Question on CIP Usage for e-books

2012-11-11 Thread Stuart Yeates
> > But who is deciding the LCC or Dewey Classification code ? Should it 
> > be the publisher's initiative ? Is there a way to get those 
> > information automatically ?
> 
> For books published in the US except for those categories listed at 
> http://www.loc.gov/publish/cip/about/ineligible.html , the Library 
> of Congress creates all of the CIP data based on information provided 
> before publication by the publisher.  It's not up to the publisher to 
> suggest these.  If you want to find classification data for existing 
> titles (as above, mostly only those published in print), you could 
> query the Library of Congress catalog or WorldCat.

Conveniently, those ineligible  categories include: 

"Books published in electronic format"

Cheers
stuart


Re: [CODE4LIB] Google Analytics/Do Not Track

2012-10-30 Thread stuart yeates

On 31/10/12 09:51, Nathan Tallman wrote:

After all the hoopla this year it looks as if all the major browsers plan
to implement a "do not track" feature that users can enable. Does anyone
know if this will block Google Analytics? It's probably too early to tell,
but my guess is yes...


Let me get this right. You want to track users after they have expressed 
an explicit desire not to be tracked? The link you're after is 
http://www.ala.org/offices/oif/statementspols/ftrstatement/freedomreadstatement


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Seeking examples of outstanding discovery layers

2012-09-20 Thread stuart yeates

On 21/09/12 12:52, Penelope Campbell wrote:

It may not be what you are thinking of, but see

http://trove.nla.gov.au/

the best way to see it in action is to do a search.


http://www.digitalnz.org/ and its skins such as 
http://nzresearch.org.nz/ are also pretty good, not that I'm trying to 
encourage trans-Tasman rivalry.


cheers
stuart

--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] visualize website

2012-09-02 Thread stuart yeates

On 03/09/12 12:59, David Friggens wrote:

If you don't have a GUI on the server, it looks like philesight will
provide a ring chart view through the web server (haven't tried it
myself).


If you don't have X11 on the server, pipe (or copy) the output of 'du' 
via 'ssh' to 'xdu' on your desktop.
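
A rough sketch of what I mean, with a placeholder hostname and path:

ssh user@webserver du -k /var/www | xdu

xdu reads du's output on stdin and draws the clickable box chart locally,
so nothing graphical needs to be installed on the server.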


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Corrections to Worldcat/Hathi/Google

2012-08-29 Thread stuart yeates

On 29/08/12 19:46, Michael Hopwood wrote:

Thanks for this pointer Owen.

It's a nice illustration of the fact that what users actually want (well, I know I did back when I actually 
worked in large information services departments!) is something more like an intranet where the content I 
find is weighted towards me, the "audience" e.g. the intranet knows I'm a 2nd year medical student 
and one of my registered preferred languages is Mandarin already, or it "knows" that I'm a rare 
books cataloguer and I want to see what "nine out of ten" other cataloguers recorded for this 
obscure and confusing title.


Yet another re-invention of content negotiation, AKA RFC 2295.

These attempts fail because, in the first instance, 99% of data
publishers care about the single use before them and, in the second
instance, the precedent has already been set.


The exceptions, of course, are legally mandated multi-lingual 
bureaucracies (the Canadian government for en/fr; EU organs for various 
languages, etc.) and on-the-wire formatting (for which it works very well).



However, this stuff is quite intense for linked data, isn't it? I understand 
that it would involve lots of quads, named graphs or whatever...

In a parallel world, I'm currently writing up recommendations for aggregating ONIX for 
Books records. ONIX data can come from multiple sources who potentially assert different 
things about a given "book" (i.e. something with an ISBN to keep it simple).

This is why *every single ONIX data element* can have option attributes of

@datestamp
@sourcename
@sourcetype [e.g. publisher, retailer, data aggregator... library?]

...and the ONIX message as a whole is set up with "header" and "product record" 
 segments that each include some info about the sender/recipient/data record in question.


Do you have any stats for how many ONIX data elements in the wild 
actually use these attributes in non-trivial ways? I've never seen any.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Corrections to Worldcat/Hathi/Google

2012-08-27 Thread stuart yeates

On 28/08/12 12:07, Peter Noerr wrote:

 They are not descendents of the same original, they are independent entities, 
whether they are recorded as singular MARC records or collections of LD triples.


That depends on which end of the stick one grasps.

Conceptually these are descendants of the abstract work in question; 
textually these are independent (or likely to be).


In practice it doesn't matter: since git/svn/etc are all textual in 
nature, they're not good at handling these.


The reconciliation is likely to be a good candidate for temporal 
versioning.


It's interesting to ponder which of the many datasets is going to prove 
to be the hub for reconciliation. My money is on librarything, because 
their merge-ist approach to cataloguing means they have lots and lots of 
different versions of the work information to match against. See for 
example:  https://www.librarything.com/work/683408/editions/11795335 
Wikipedia / dbpedia have redirects which tend in the same direction, but 
only for titles and not ISBNs.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Corrections to Worldcat/Hathi/Google

2012-08-27 Thread stuart yeates
These have to be named graphs, or at least collections of triples which 
can be processed through workflows as a single unit.


In terms of LD, their versioning needs to be defined in terms of:

(a) synchronisation with the non-bibliographic real world (i.e. Dataset 
Z version X was released at time Y)


(b) correction/augmentation of other datasets (i.e Dataset F version G 
contains triples augmenting Dataset H versions A, B, C and D)


(c) mapping between datasets (i.e. Dataset I contains triples mapping 
between Dataset J version K and Dataset L version M (and vice versa))
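
To make (a) concrete, here's a minimal sketch in TriG (all the URIs are
made up for illustration): the named graph carries the bibliographic
triples, and the default graph carries the version metadata about that
graph:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/> .

ex:datasetZ-versionX {
    ex:work123 dcterms:title "An example work" .
}

# default graph: statements about the release itself
ex:datasetZ-versionX dcterms:issued "2012-08-01" .

The same pattern extends to (b) and (c): a graph of corrections or
mappings is itself a named graph, with default-graph triples saying which
dataset versions it applies to.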


Note that a 'Dataset' here could be a bibliographic dataset (records of 
works, etc), a classification dataset (a version of the Dewey Decimal 
Scheme, a version of the Māori Subject Headings, a version of Dublin 
Core Scheme, etc), a dataset of real-world entities to do authority 
control against (a dbpedia dump, an organisational structure in an 
institution, etc), or some arbitrary mapping between some arbitrary 
combination of these.


Most of these are going to be managed and generated using current 
systems with processes that involve periodic dumps (or drops) of data 
(the dbpedia drops of wikipedia data are a good model here). git makes 
little sense for this kind of data.


github is most likely to be useful for smaller niche collaborative 
collections (probably no more than a million triples) mapping between 
the larger collections, and scripts for integrating the collections into 
a sane whole.


cheers
stuart

On 28/08/12 08:36, Karen Coyle wrote:

Ed, Corey -

I also assumed that Ed wasn't suggesting that we literally use github as
our platform, but I do want to remind folks how far we are from having
"people friendly" versioning software -- at least, none that I have seen
has felt "intuitive." The features of git are great, and people have
built interfaces to it, but as Galen's question brings forth, the very
*idea* of versioning doesn't exist in library data processing, even
though having central-system based versions of MARC records (with a
single time line) is at least conceptually simple.

Therefore it seems to me that first we have to define what a version
would be, both in terms of data but also in terms of the mind set and
work flow of the cataloging process. How will people *understand*
versions in the context of their work? What do they need in order to
evaluate different versions? And that leads to my second question: what
is a version in LD space? Triples are just triples - you can add them or
delete them but I don't know of a way that you can version them, since
each has an independent T-space existence. So, are we talking about
named graphs?

I think this should be a high priority activity around the "new
bibliographic framework" planning because, as we have seen with MARC,
the idea of versioning needs to be part of the very design or it won't
happen.

kc

On 8/27/12 11:20 AM, Ed Summers wrote:

On Mon, Aug 27, 2012 at 1:33 PM, Corey A Harper 
wrote:

I think there's a useful distinction here. Ed can correct me if I'm
wrong, but I suspect he was not actually suggesting that Git itself be
the user-interface to a github-for-data type service, but rather that
such a service can be built *on top* of an infrastructure component
like GitHub.

Yes, I wasn't saying that we could just plonk our data into Github,
and pat ourselves on the back for a good days work :-) I guess I was
stating the obvious: technologies like Git have made once hard
problems like decentralized version control much, much easier...and
there might be some giants shoulders to stand on.

//Ed





--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Wikis

2012-07-24 Thread Stuart Yeates
The wiki software with the largest user base is undoubtedly MediaWiki (i.e. 
wikipedia).

We're moving to it as a platform precisely to leverage the skills that 
implies.

We're not far enough into our roll-out to tell whether it's going to be a 
success.

cheers
stuart

Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nathan 
Tallman
Sent: Wednesday, 25 July 2012 8:34 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Wikis

There are a plethora of options for wiki software. Does anyone have any
recommendations for a platform that's easy-to-use and has a low-learning
curve for users? I'm thinking of starting a wiki for internal best
practices, etc. and wondered what people who've done the same had success
with.

Thanks,
Nathan


Re: [CODE4LIB] Best way to process large XML files

2012-06-10 Thread stuart yeates

On 09/06/12 06:36, Kyle Banerjee wrote:


How do you guys deal with large XML files?


There have been a number of excellent suggestions from other people, but 
it's worth pointing out that sometimes low tech is all you need.


I frequently use sed to do things such as replace one domain name with 
another when a website changes their URL.


Short for Stream EDitor, sed is a core part of POSIX and should be 
available on pretty much every UNIX-like platform imaginable. For 
non-trivial files it runs at disk speed (i.e. it works about as fast 
as a naive file copy). Full regexp support is available.


sed 's/www.example.net/example.com/gI' < IN_FILE > OUT_FILE

Will stream IN_FILE to OUT_FILE replacing all instances of 
"www.example.net" with "example.com"

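In the same low-tech spirit, if the question is just "is this
multi-gigabyte XML file well-formed?", xmllint can answer without
building the document in memory (the filename is a placeholder):

xmllint --stream --noout big-file.xml

--stream uses the streaming reader instead of building a tree, and
--noout suppresses normal output, so all you see are errors (and the exit
code tells a script whether the file passed).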

cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] The history of Code4Lib and MediaWiki development.

2012-06-10 Thread stuart yeates

On 09/06/12 06:18, Klein,Max wrote:

I was just wondering if there have been any efforts from Code4Lib into
MediaWiki development? I know that there have been some Wikipedia
templates and bots designed to interface with library services. Yet what
about cold hard MediaWiki extensions? Has there been any discussion on
this, any ideas raised?


Do you have any specific examples of things that can't be done with 
templates (good for holding information), bots (good for adding, 
curating and maintaining information) or CSS (good for displaying 
information) that can be done using a MediaWiki extension?


The road to get a MediaWiki extension stable enough, tested enough, 
trusted enough and needed enough for it to get rolled out on Wikipedia 
is long and hard and I wouldn't recommend it unless you have the most 
cast-iron of use-cases.


If the ISBN support were proposed today, I strongly suspect that it 
wouldn't make it in. Of course it's now grandfathered in and removing 
support seems very, very unlikely.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Studying the email list

2012-06-05 Thread stuart yeates

On 06/06/12 06:11, Doran, Michael D wrote:

Without asking permission of the list, I hereby assign this new category of things 
requiring OCLC oversight as "salami" on the charcuterie spectrum.

   Bologna == Seal of Disapproval


There appears to be a typo here:

Soylent Green == Seal of Disapproval

cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


[CODE4LIB] OCLC / British Library / VIAF / Wikipedia

2012-06-01 Thread Stuart Yeates
There's a discussion going on on Wikipedia that may be of interest to 
subscribers of this list:

https://en.wikipedia.org/wiki/Wikipedia_talk:Authority_control#More_VIAF_Integration

cheers
stuart


Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread stuart yeates

On 24/05/12 07:14, Ford, Kevin wrote:

I finally had occasion today (read: remembered) to see if the *nix "file" 
command would recognize a MARC record file.  I haven't tested extensively, but it did 
identify the file as MARC21 Bibliographic record.  It also correctly identified a MARC21 
Authority Record.  I'm running the most recent version of Ubuntu (12.04 - precise 
pangolin).

I write because the inclusion of a "file" MARC21 specification rule in the 
magic.db stems from a Code4lib exchange that started in March 2011 [1] (it ends in April 
if you want to go crawling for the entire thread).


A couple of warnings about the unix file command

(a) it only looks at the start of the file. This is great because it 
works fast on big files. This is dreadful because it can't warn you that 
everything after the first 10k of a 2GB file is corrupt or a 1k MARC 
file is pre-pended to a 400GB astronomy data file.


(b) it is not uncommon for a file to match multiple file types. This can 
cause problems when using file to check whether inputs to a program are 
actually the type the program is expecting.


(c) some platforms have been notoriously slow to add new definitions, 
ubuntu is not such a platform.
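
On point (b), GNU file has a flag that helps a little: -k (keep going)
reports every match rather than stopping at the first one, so at least
the ambiguity is visible (the filename here is a placeholder):

file -k records.mrc

It doesn't help with (a); file still only reads the head of the file.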


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] whimsical homepage idea

2012-05-02 Thread stuart yeates
The catalog is also a good reference for how many books there are 
available as fuel. Hopefully the records contain information on which are 
printed on clean-burning paper.


cheers
stuart

On 03/05/12 10:32, Genny Engel wrote:

The number of currently available cardigans could then be displayed along with 
the temperature gauges.  Now you also have to interface this whole thing with 
the item status in the catalog, which will of course have to contain cardigan 
records.  You could use NCIP to grab the status, but I'm not sure what the 
standard cardigan metadata would include.

Genny Engel


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Maryann 
Kempthorne
Sent: Tuesday, May 01, 2012 9:56 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] whimsical homepage idea

Why not a cardigan checkout?
Maryann

On Tue, May 1, 2012 at 6:23 PM, Kyle Banerjee  wrote:

[stuff on where to get sensors deleted]

Depending on how many you need, wireless sensors for weather stations could
make more sense (you can run them on different channels to prevent
interference). Plus you can use the weather software to generate graphs,
upload data, etc.

kyle

--
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.999.9787



--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


[CODE4LIB] Alternatives to MARC (was: Re: [CODE4LIB] NON-MARC ILS?)

2012-03-14 Thread stuart yeates
MARC is a pain to work with; this is a truism which most of us should be 
familiar with.


Blindly moving away from MARC is not the solution; indeed, history 
suggests that path leads us back to an even more complex version of MARC.


MARC is complex (and thus a pain) for three reasons: (a) the inherent 
complexity of the bibliographic content it deals with; (b) the fact that 
there are many MARC-using groups who have different sets of motivations 
and ideas as to what MARC is for; and (c) MARC's long and complicated 
history.


Throwing out MARC doesn't solve any of these except the last, and then 
only if you throw away all your data and make no efforts to migrate it. 
Obtaining new data from a consortium or company almost certainly buys you 
not only MARC's history, but some tasty local decisions on top.




A far more productive discussion is to explore potential replacements 
for MARC. This, of course, is only productively conducted with a sound 
understanding of the causes of the complexity in MARC. I'll leave it to 
the reader to consider whether various proponents' arguments are 
persuasive on this point.


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Preserving hyperlinks in conversion from Excel/googledocs/anything to PDF (was Any ideas for free pdf to excel conversion?)

2012-03-06 Thread stuart yeates

Sounds like a job for LaTeX and a short bash script to me.
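
Something along these lines, as a rough sketch only (assuming the ~200
links have been exported to links.txt, one URL per line, and that you
have hyperref and pdflatex installed):

(printf '%s\n' '\documentclass{article}' '\usepackage{hyperref}' '\begin{document}'
 while read -r url; do printf '\\url{%s}\n\n' "$url"; done < links.txt
 printf '%s\n' '\end{document}') > links.tex
pdflatex links.tex

hyperref's \url typesets each address and embeds it as a live link in the
resulting PDF.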

cheers
stuart

On 07/03/12 07:55, Bill Dueber wrote:

What exactly are you trying to do? Take a list of links and turn them
into...a list of hot links in a PDF file?

On Mon, Mar 5, 2012 at 8:46 AM, Matt Amory  wrote:


Does anyone know of any script library that can convert a set of (~200)
hyperlinks into Acrobat's goofy protocol?  I do own Acrobat Pro.

Thanks

On Wed, Dec 14, 2011 at 1:08 PM, Matt Amory  wrote:


Just looking to preserve column structure.

--
Matt Amory
(917) 771-4157
matt.am...@gmail.com
http://www.linkedin.com/pub/matt-amory/8/515/239





--
Matt Amory
(917) 771-4157
matt.am...@gmail.com
http://www.linkedin.com/pub/matt-amory/8/515/239








--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Lift the Flap books

2012-02-14 Thread stuart yeates

On 15/02/12 13:43, Sara Amato wrote:

If you were to have a 'lift the flap' type book that you wanted to digitize, 
for web display and use, what technology would you use for markup and display?

Visually I like the Internet Archive BookReader ( 
http://openlibrary.org/dev/docs/bookreader ), which says it can do 'foldouts', 
though I haven't found an example of HOW to do that ... nor exactly what the 
metadata schema is.


Sounds like an ideal use for HTML, javascript and image transparency, 
using OnMouseOver as the trigger.


cheers
stuart

--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] Detecting DRM in files

2012-01-19 Thread stuart yeates

On 19/01/12 10:39, Farrell, Larry D wrote:

Does anyone know of a Java, Ruby, Python, etc. package to detect digital rights 
management features in files?


This is really a part of detecting file types. There are a number of 
systems for detecting file types, an overview can be found at: 
http://www.forensicswiki.org/wiki/File_Format_Identification


cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/

