[CODE4LIB] IR+ 2.0 now available for download

2010-10-25 Thread Sarr, Nathan
The University of Rochester is pleased to announce the open source
release of its institutional repository, IR+.  Following a successful
production launch on Tuesday, October 12th, IR+ 2.0 is now available to
the entire community.

 

IR+ was born from user research. With portfolios, personal workspaces,
and publication listings, it offers useful tools for researchers and
extends the role of the repository into the authoring process.  It is a
fully featured digital repository management solution that is easy for
users to understand and manage. Its goal is to serve any
organization that needs to author, publish and preserve digital
information.

 

The download and documentation can be found at http://www.irplus.org

 

The new version has many new features and updates.  These include:

 

-  OAI-PMH harvestable

-  Dublin Core mapping features for identifiers and contributors

-  Improved batch metadata manipulation: automated re-indexing
enhancements (changing a control list forces re-indexing of all items
that use the changed data)

-  Sponsor browsing / statistics

-  Paging and Sorting for contributor pages

-  Improved Search Engine Optimization (SEO) for better indexing
of researcher pages and content within the repository

-  Researcher page interface enhancements

-  Content type listing and filtering at the repository and
collection levels

-  Content type counts at the repository and collection levels

-  Increased download information and removal options for more
accurate download counts

-  Updated Help, Installation and User manuals

-  RSS feeds for Collections/Contributor Pages

-  Upgraded PDF/Word/Excel/PowerPoint text extraction libraries

-  Updated user account management features

-  Submission performance enhancements

-  Improved home page module placement

-  Improved change tracking

We are pleased with the current faculty and student interest in our IR+
installation, UR Research, which has over two thousand registered users,
over one million downloads and thirty-six public researcher pages.
Please feel free to contact me if you have any questions.

 

 

 

 

Nathan Sarr

Senior Software Engineer

River Campus Libraries

University of Rochester

Rochester, NY  14627

(585) 275-0692

 


[CODE4LIB] VPN vs. Proxy - Quick Question

2010-10-25 Thread Tim McGeary

Hi all,

I realize that some of you may not directly deal with this issue, but I 
was wondering if I could get some quick replies about how your 
institutions are handling access to off-campus resources via VPN and Proxy.


Do you offer a VPN service?  If so, do you split-tunnel the traffic so 
that the VPN only handles traffic to inside your campus IP?  If you 
split-tunnel, do users complain about not being able to connect to 
external library resources (databases, journals, etc)?


Do you offer a Proxy service?  Will your proxy service work for users 
already connected to VPN?


Do you know an estimated ratio of Proxy:VPN users?

Thanks,
Tim

--
Tim McGeary
Team Leader, Library Technology
Lehigh University
610-758-4998
tim.mcge...@lehigh.edu

timmcge...@gmail.com
GTalk/Yahoo/Skype: timmcgeary


Re: [CODE4LIB] Simple Flexible ILS written in Django

2010-10-25 Thread Jonathan Rochkind
Neat. If you put this into production at a public URL at any point, do let us 
know.


Elliot Hallmark wrote:

Re: simple, flexible ILS for small library

hello all,


Just wanted to mention that I did decide to code an ILS for a book
sharing library. Tweaking conventional ILS or bartering software would
not accomplish what I want, and programmatically the problem isn't very
difficult.  Modern programming frameworks make building something like
this very quick.

The program currently has all the basic functionality I need, including
user front ends for checkin/out, adding new items from downloaded MARC
records, and a powerful backend for fixing anything that a user
(librarian) did accidentally or shouldn't have permission to easily do.

I am doing this in Django because Python is awesome.  There are more
defined reasons people use Django, but I will admit that I would never
have done any substantial programming if I had not found Python.
Every time a project comes up, Python just happens to have a set of
tools that go well beyond what I need from them.

The code is available for anyone:
http://bitbucket.org/permafacture/django-ils/

I'll gladly explain/document more if anyone cares to hear

-Elliot Hallmark

PS: I am now using a cleaner email address.  Previously I was using
an address that I don't mind getting a little spammy because I was just
poking around.  I am the same person as offonoffoffonoff at gmail.

  


[CODE4LIB] Django

2010-10-25 Thread Junior Tidal
Hello Code4Lib,

Does anyone have any recommendations for learning Django? Books, websites, 
video tutorials, etc. ...

thanks,

Junior Tidal
Assistant Professor
Web Services and Multimedia Librarian
New York City College of Technology, CUNY 
300 Jay Street
Brooklyn, NY 11210
718.260.5481
 
http://library.citytech.cuny.edu


Re: [CODE4LIB] Django

2010-10-25 Thread Andrew Hankinson
There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
revised edition for 1.0)
The Django docs, with some intro tutorials: 
http://docs.djangoproject.com/en/1.2/

Did you try those already?




Re: [CODE4LIB] Django

2010-10-25 Thread Michael J. Giarlo
I'd start here:

   http://docs.djangoproject.com/en/1.2/

There are some tutorials in there as well.

-Mike






Re: [CODE4LIB] VPN vs. Proxy - Quick Question

2010-10-25 Thread Thomas Bennett
We have VPN and proxy (III WAM) available here, although for our online 
resources VPN doesn't get you anything special; you still go through the proxy.  
The regular URLs and proxy URLs are in a PostgreSQL database, and the page with 
the links to online resources is dynamically fed based on your IP (HTTP 
variable HTTP_X_FORWARDED_FOR).  Apache forwards all requests to the Zope server, 
so that's why I'm not checking the REMOTE_ADDR variable.  If your IP is not in our 
domain, that is, if the first two octets don't match, then you get a proxy link 
which goes to our III authentication page.  Online resources that are free get 
the same URL on campus and off campus, not a proxy link.

I use a simple Python script to check the HTTP variable 
'HTTP_X_FORWARDED_FOR' and return 0 or 1 in the variable 'hostname' to a Zope 
(Python-based web server) page. A simple IF conditional statement determines 
which URL to display based on the return value of the script.

# call the Python script ip_add_flag and set the return value to the variable 
# hostname
<dtml-call "REQUEST.set('hostname', ip_add_flag())">

<dtml-if hostname>
   <a href="http://<dtml-var vdb_local_url>"><dtml-var vdb_title></a>
<dtml-else>
   <a href="http://<dtml-var vdb_proxy_url>" target="new"><dtml-var vdb_title></a>
   <font size="-2">(opens in a new window)</font>
</dtml-if>
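
The ip_add_flag script itself is not shown in this message. A minimal Python sketch of the check it describes might look like the following. The campus prefix 10.20 is a placeholder, the function takes the header value as an argument instead of reading it from the Zope request, and using the last X-Forwarded-For entry assumes Apache is the only proxy in front of Zope and appends the address it saw to any value the client sent.

```python
# Hypothetical placeholder: substitute the real first two octets of the campus range.
CAMPUS_PREFIX = ("10", "20")


def ip_add_flag(forwarded_for, campus_prefix=CAMPUS_PREFIX):
    """Return 1 if the forwarded client IP is on campus, else 0.

    forwarded_for is the raw HTTP_X_FORWARDED_FOR value, which may be
    empty, a single address, or a comma-separated list of addresses.
    """
    if not forwarded_for:
        return 0
    # Assumption: the last entry is the one added by our own Apache proxy,
    # so it is the only one that was not supplied by the client.
    client_ip = forwarded_for.split(",")[-1].strip()
    octets = client_ip.split(".")
    if len(octets) != 4:
        return 0
    # Compare whole octets, so that 10.2.x.x does not match a 10.20 prefix.
    return 1 if tuple(octets[:2]) == tuple(campus_prefix) else 0
```

Comparing whole octets instead of using a string prefix test avoids treating 10.200.x.x as part of 10.20.x.x.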

The campus offers a VPN service, but you don't get the usual campus domain IP, so 
we handle it the same as any other off-campus IP. Our vendors are not 
given this range either, so it is not in the group of IPs for licensing certain 
databases.

As far as user complaints go, we have a form; a small group of people here 
receives those submissions, puts them into Trac, and works through the 
issues individually.

I don't know the ratio of Proxy:VPN users; I don't have a definitive range of VPN 
IPs to work with.  The campus VPN is used to access certain 
servers that are not normally accessible off campus because of the VLAN they are 
in.

Thomas  





-- 
==
Thomas McMillan Grant Bennett   Appalachian State University
Operations  Systems Analyst     P O Box 32026
University Library              Boone, North Carolina 28608
(828) 262 6587

Library Systems Help Desk: https://www.library.appstate.edu/help/
==


Re: [CODE4LIB] Django

2010-10-25 Thread Nate Vack
On Mon, Oct 25, 2010 at 9:19 AM, Junior Tidal jti...@citytech.cuny.edu wrote:

> Does anyone have any recommendations for learning Django? Books, websites, 
> video tutorials, etc. ...

For resources, searching Google for "learn django" shows a bunch of promising hints.

Methodology-wise: Start with a fairly concrete, well-defined problem.
Have a product in mind before you start. Work hard with the tool you
choose to make your product. Don't stress about whether you've chosen
the best tools (you haven't) or whether you're doing it perfectly (you
aren't). Make the thing.

You can spend months looking over example code and tutorials and blog
posts and not learn nearly as much as you would attacking the problem.
Plus, you've gotten closer to solving the problem as you've learned.

Or, DHH says it a bit better:

http://37signals.com/svn/posts/2582-how-do-i-learn-to-program

Cheers,
-Nate


[CODE4LIB] (LC) call number searching in Solr

2010-10-25 Thread Naomi Dushay
I recently set up a testing framework allowing me to twiddle Solr  
knobs until I met acceptance criteria for LC call number searching.  I  
came up with two Solr field types that worked for my criteria.


You can read all about it here:

http://discovery-grindstone.blogspot.com/2010/10/lc-call-number-searching-in-solr.html

- Naomi


[CODE4LIB] testing testing testing - Solr indexing software

2010-10-25 Thread Naomi Dushay
I just finished a bunch of blog posts about the sorts of tests to  
write for Solr indexing software.  Comments are welcome.  Try not to  
drool when you fall asleep on your keyboard.


Start with this one:

http://discovery-grindstone.blogspot.com/2010/10/testing-solr-indexing-software.html

- Naomi


Re: [CODE4LIB] Help with DLF-ILS GetAvailability

2010-10-25 Thread Emily Lynema
I agree with Jonathan and David. The only reason there are no examples 
of including dlf:simpleavailability within dlf:holdingsrec is 
because no one thought of a use case for why you would do that. The xsd 
for dlf:holdingsrec explicitly states that it is simply "Metadata must 
be expressed in XML that complies with another XML Schema 
(namespace=#other). Metadata must be explicitly qualified in the 
response." So the only restriction is that it's some kind of 
standardized metadata! While we had envisioned using something like 
MARCXML or ISO Holdings here to express things like serial runs, there 
is no reason that simpleavailability could not be employed to describe a 
different kind of collection of items. The dlf:holdingset and 
dlf:holdingsrec are after all intended to represent a collection of 
items, and as David points out, the ISO Holdings schema explicitly 
allows for collection-level availability summary. And I will also note 
that ISO Holdings certainly does express availability in addition to 
'holdings'; they are really one and the same thing.

I guess I should note that I was a member of the original DLF group, so 
I suppose this is a fairly authoritative perspective on the original 
intent of the elements. :)

-emily

--
Date: Thu, 21 Oct 2010 16:26:54 -0400
From: Jonathan Rochkind rochk...@jhu.edu
Subject: Re: Help with DLF-ILS GetAvailability

I don't think that's an abuse. I consider dlf:holdings to be for 
information about a holdingset, or some collection of items, while 
dlf:item is for information about an individual item.

I think regardless of what you do you are being over-optimistic in 
thinking that if you just do dlf, your stuff will be interchangeable 
with any other clients or servers doing dlf. The spec is way too 
open-ended for that; it leaves a whole bunch of details not specified 
and up to the implementer. For better or worse. I made more comments 
about this in the blog post I referenced earlier.

Jonathan

Owen Stephens wrote:



 Thanks Dave,

 Yes - my reading was that dlf:holdings was for pure 'holdings' as opposed to
 'availability'. We could put the simpleavailability in there I guess but as
 you say since we are controlling both ends then there doesn't seem any point
 in abusing it like that. The downside is we'd hoped to do something that
 could be taken by other sites - the original plan was to use the Juice
 framework - developed by Talis using jQuery to parse a standard availability
 format so that this could then be applied easily in other environments.
 Obviously we can still achieve the outcome we need for the immediate
 requirements of the project by using a custom format.

 Thanks again

 Owen


 On Thu, Oct 21, 2010 at 4:28 PM, Walker, David dwal...@calstate.edu wrote:

   
  

 Hey Owen,

 Seems like the you could use the dlf:holdings element to hold this kind
 of individual library information.

 The DLF-ILS documentation doesn't seem to think that you would use
 dlf:simpleavailability here, though, but rather MARC or ISO holdings
 schemas.

 But if you're controlling both ends of the communication, I don't know if
 it really matters.

 --Dave

 ==
 David Walker
 Library Web Services Manager
 California State University
 http://xerxes.calstate.edu
 
 From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Owen
 Stephens [o...@ostephens.com]
 Sent: Wednesday, October 20, 2010 12:22 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] Help with DLF-ILS GetAvailability

 I'm working with the University of Oxford to look at integrating some
 library services into their VLE/Learning Management System (Sakai). One of
 the services is something that will give availability for items on a reading
 list in the VLE (the Sakai 'Citation Helper'), and I'm looking at the
 DLF-ILS GetAvailability specification to achieve this.

 For physical items, the availability information I was hoping to use is
 expressed at the level of a physical collection. For example, if several
 college libraries within the University I have aggregated information that
 tells me the availability of the item in each of the college libraries.
 However, I don't have item level information.

 I can see how I can use simpleavailability to say over the entire
 institution whether (e.g.) a book is available or not. However, I'm not
 clear I can express this in a more granular way (say availability on a
 library by library basis) except by going to item level. Also although it
 seems you can express multiple locations in simpleavailability, and multiple
 availabilitymsg, there is no way I can see to link these, so although I
 could list each location OK, I can't attach an availabilitymsg to a specific
 location (unless I only express one location).

 Am I missing something, or is my interpretation correct?

 Any other suggestions?

 Thanks,

 Owen

 PS also looked at DAIA which I like, but this (as far as I 

Re: [CODE4LIB] Help with DLF-ILS GetAvailability

2010-10-25 Thread Jonathan Rochkind

Emily Lynema wrote:
> standardized metadata! While we had envisioned using something like 
> MARCXML or ISO Holdings here to express things like serial runs, there 


Kind of a side note, but please consider ONIX Serial Holdings for 
expressing serial runs!   It is by far the best schema I've seen for 
doing this -- simple for simple cases, flexible for other cases, 
actually DOES express things in a machine-interpretable way. Everything 
else I've seen is both way too complicated, even for simple cases, and 
often ends up expressing holdings in a way that a machine can't act upon 
anyway.


[CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Nate Vack
Hi all,

I've just spent the last couple of weeks delving into and decoding a
binary file format. This, in turn, got me thinking about MARCXML.

In a nutshell, it looks like it's supposed to contain the exact same
data as a normal MARC record, except in XML form. As in, it should be
round-trippable.

What's the advantage to this? I can see using a human-readable format
for poorly-documented file formats -- they're relatively easy to read
and understand. But MARC is well, well-documented, with more than one
free implementation in cursory searching. And once you know a binary
file's format, it's no harder to parse than XML, and the data's
smaller and processing faster.
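
For reference, binary MARC (ISO 2709) has a simple layout: a 24-byte leader, a directory of 12-byte entries (3-character tag, 4-digit field length, 5-digit start offset), then the field data. The following is a minimal sketch of a reader, not a production parser. The build_record helper is hypothetical and exists only to produce a test record, and the sketch ignores character sets and malformed input.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_record(fields):
    """Hypothetical helper: assemble a binary MARC record from (tag, bytes) pairs."""
    directory = b""
    body = b""
    for tag, data in fields:
        chunk = data + FIELD_TERMINATOR
        # Directory entry: tag, field length (with terminator), start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(chunk), len(body))
        body += chunk
    base = 24 + len(directory) + 1      # leader + directory + its terminator
    total = base + len(body) + 1        # plus the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + FIELD_TERMINATOR + body + RECORD_TERMINATOR


def parse_record(record):
    """Return a list of (tag, raw field bytes) from one binary MARC record."""
    base = int(record[12:17])           # leader bytes 12-16: base address of data
    directory = record[24:base - 1]     # drop the terminator that ends the directory
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Slice out the field and drop its trailing field terminator.
        data = record[base + start:base + start + length - 1]
        fields.append((tag, data))
    return fields
```

Data fields can then be split on SUBFIELD_DELIMITER to separate the indicators from the subfields.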

So... why the XML?

Curious,
-Nate


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Tim Spalding
MARC records break parsing far too frequently. Apart from requiring no
truly specialized tools, MARCXML should—should!—eliminate many of
those problems. That's not to mention that MARC character sets vary a
lot (DanMARC anyone?), and more even in practice than in theory.

From my perspective the problem is simply that MARCXML isn't as
ubiquitous as MARC. For what we do, at least, there's no point. We'd
need to parse non-XML MARC data anyway. So if we're going to do it, we
might as well do it for everything.

Best,
Tim





-- 
Check out my library at http://www.librarything.com/profile/timspalding


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Andrew Hankinson
I'm not a big user of MARCXML, but I can think of a few reasons off the top of 
my head:

- Existing libraries for reading, manipulating and searching XML-based 
documents are very mature.
- Documents can be validated for their well-formedness using these existing 
tools and a pre-defined schema (a validator for MARC would need to be 
custom-coded)
- MARCXML can easily be incorporated into XML-based meta-metadata schemas, like 
METS.
- It can be parsed and manipulated in a web service context without sending a 
binary blob over the wire.
- XML is self-describing, binary is not.

There's nothing stopping you from reading the MARCXML into a binary blob and 
working on it from there. But when sharing documents from different 
institutions around the globe, using a wide variety of tools and techniques, 
XML seems to be the lowest common denominator.

-Andrew



Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Patrick Hochstenbach
Dear Nate,

There is a trade-off: do you want very fast processing of data? Go for binary 
data. Do you want to share your data globally and easily in many (not per se 
library-related) environments? Go for XML/RDF. 
Open your data and do both :-)

Pat

Sent from my iPhone



Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread MJ Suhonos
It's helpful to think of MARCXML as a sort of lingua franca.

> - Existing libraries for reading, manipulating and searching XML-based 
> documents are very mature.
Including XSLT and XPath; very powerful stuff.

> There's nothing stopping you from reading the MARCXML into a binary blob and 
> working on it from there. But when sharing documents from different 
> institutions around the globe, using a wide variety of tools and techniques, 
> XML seems to be the lowest common denominator.

Assuming it's also round-trippable, MARC-in-JSON would accomplish this as well.

Not to mention it's nice to be able to read and edit MARC records in any 
(any!!) text editor for those of us who are comfortable looking at JSON or XML 
but can't handle staring at binary bytestreams without having an aneurysm.

MJ



Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Tim Spalding
> - XML is self-describing, binary is not.

Not to quibble, but that's only in a theoretical sense here. Something
like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
At least MARC records kinda imitate catalog cards.
:)

Tim





-- 
Check out my library at http://www.librarything.com/profile/timspalding


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bryan Baldus
On  Monday, October 25, 2010 1:50 PM, Andrew Hankinson wrote:
> - Documents can be validated for their well-formedness using these existing 
> tools and a pre-defined schema (a validator for MARC would need to be 
> custom-coded)

In Perl, MARC::Lint might be an example of such a validator (though I need to 
update it with the most recent MARC updates at some point soon). MarcEdit also 
includes a validator.

Bryan Baldus
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Eric Hellman
I think you'd have a very hard time demonstrating any speed advantage to MARC 
over MARCXML. XML parsers have been speed-optimized out the wazoo; if there 
exists a MARC parser that has ever been speed-optimized without serious 
compromise, I'm sure someone on this list will have a good story about it.


Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA

e...@hellman.net 
http://go-to-hellman.blogspot.com/
@gluejar


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Nate Vack
On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding t...@librarything.com wrote:
>> - XML is self-describing, binary is not.
>
> Not to quibble, but that's only in a theoretical sense here. Something
> like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
> At least MARC records kinda imitate catalog cards.

Yeah -- this is kinda the source of my confusion. In the case of the
files I'm reading, it's not that it's hard to find out where the
nMeasurement field lives (it's six short ints starting at offset 64),
but what the field means, and whether or not I care about it.

Switching to an XML format doesn't help with that at all.
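
To make the offset example concrete, here is a minimal sketch of reading six short ints at offset 64 with Python's struct module. The little-endian byte order and the 16-bit signed width are assumptions, since the file format in question is not named here, and read_measurements is a hypothetical name.

```python
import struct


def read_measurements(blob, offset=64, count=6):
    """Unpack `count` 16-bit signed ints starting at `offset`.

    Assumes little-endian data ("<"); use ">" for big-endian files.
    """
    return struct.unpack_from("<%dh" % count, blob, offset)
```

Extracting the values takes one call. Knowing what they mean is the part that no serialization format supplies.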

WRT character encoding issues and validation: if MARC and MARCXML are
round-trippable, a solution in one environment is equivalent to a
solution in the other.

And I think we've all seen plenty of unvalidated, badly-formed XML,
and plenty with Character Encoding Problems™ ;-)

Thanks for the input!
-Nate


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Alexander Johannesen
Hiya,

On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote:
> Switching to an XML format doesn't help with that at all.

I'm willing to take it further and say that MARCXML was the worst
thing the library world ever did. Some might argue it was a good first
step, and that something was better than nothing, to
which I respond:

Poppycock!

MARCXML is nothing short of evil. Not only does it go against every
principle of good XML anywhere (don't rely on whitespace, structure
over code, namespace conventions, identity management, document
control, separation of entities and properties, and on and on), it
breaks the ontological commitment that a better treatment of the MARC
data could bring, deterring people from actually a) using the darn
thing as anything but a bare minimal crutch, and b) expanding it to be
actually useful and interesting.

The quicker the library world can get rid of this monstrosity, the
better, although I doubt that will ever happen; it will hang around
like a foul stench for as long as there is MARC in the world. A long
time. A long sad time.

A few extra notes;
   http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html

Can you tell I'm not a fan? :)


Kind regards,

Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Andrew Hankinson
I guess what I meant is that in MARCXML, you have a datafield element with 
subsequent subfield elements each with fairly clear attributes, which, while 
not my idea of fun Sunday-afternoon reading, requires less specialized tools to 
parse (hello Textmate!) and is a bit easier than trying to count INT positions. 
One quick XPath query and you can have all 245 fields, regardless of their 
length or position in the record.
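
As a rough illustration of that query, the sketch below uses Python's standard xml.etree.ElementTree, which supports a limited XPath subset. The sample record is invented for the example, title_fields is a hypothetical helper name, and the namespace is the MARC21 slim namespace used by MARCXML.

```python
import xml.etree.ElementTree as ET

# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">demo-1</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title :</subfield>
    <subfield code="b">a subtitle /</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Libraries</subfield>
  </datafield>
</record>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def title_fields(xml_text):
    """Return the subfields of every 245 field as lists of (code, text) pairs."""
    root = ET.fromstring(xml_text)
    results = []
    # One path expression selects every 245 datafield, wherever it sits.
    for field in root.findall(".//marc:datafield[@tag='245']", NS):
        results.append(
            [(sf.get("code"), sf.text) for sf in field.findall("marc:subfield", NS)]
        )
    return results
```

The same path expression works unchanged on a collection of records, since it matches datafield elements at any depth.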




Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread MJ Suhonos
I'll just leave this here:

http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records

That trade-off ought to offend both camps, though I happen to think it's quite 
clever.

MJ

On 2010-10-25, at 3:22 PM, Eric Hellman wrote:

 I think you'd have a very hard time demonstrating any speed advantage to MARC 
 over MARCXML. XML parsers have been speed optimized out the wazoo; If there 
 exists a MARC parser that has ever been speed-optimized without serious 
 compromise, I'm sure someone on this list will have a good story about it.
 
 On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote:
 
 Dear Nate,
 
 There is a trade-off: do you want very fast processing of data - go for 
 binary data. do you want to share your data globally easily in many (not per 
 se library related) environments - go for XML/RDF. 
 Open your data and do both :-)
 
 Pat
 
 Sent from my iPhone
 
 On 25 Oct 2010, at 20:39, Nate Vack njv...@wisc.edu wrote:
 
 Hi all,
 
 I've just spent the last couple of weeks delving into and decoding a
 binary file format. This, in turn, got me thinking about MARCXML.
 
 In a nutshell, it looks like it's supposed to contain the exact same
 data as a normal MARC record, except in XML form. As in, it should be
 round-trippable.
 
 What's the advantage to this? I can see using a human-readable format
 for poorly-documented file formats -- they're relatively easy to read
 and understand. But MARC is well, well-documented, with more than one
 free implementation in cursory searching. And once you know a binary
 file's format, it's no harder to parse than XML, and the data's
 smaller and processing faster.
 
 So... why the XML?
 
 Curious,
 -Nate
 
 Eric Hellman
 President, Gluejar, Inc.
 41 Watchung Plaza, #132
 Montclair, NJ 07042
 USA
 
 e...@hellman.net 
 http://go-to-hellman.blogspot.com/
 @gluejar


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Kyle Banerjee
On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com wrote:

 Does processing speed of something matter anymore? You'd have to be
 doing a LOT of processing to care, wouldn't you?


Data migrations and data dumps are a common use case. Needing to break or
make hundreds of thousands or millions of records is not uncommon.

kyle


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Tim Spalding
Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?

Tim

On Mon, Oct 25, 2010 at 3:35 PM, MJ Suhonos m...@suhonos.ca wrote:
 I'll just leave this here:

 http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records

 That trade-off ought to offend both camps, though I happen to think it's 
 quite clever.

 MJ

 On 2010-10-25, at 3:22 PM, Eric Hellman wrote:

 I think you'd have a very hard time demonstrating any speed advantage to 
 MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If 
 there exists a MARC parser that has ever been speed-optimized without 
 serious compromise, I'm sure someone on this list will have a good story 
 about it.

 On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote:

 Dear Nate,

 There is a trade-off: do you want very fast processing of data - go for 
 binary data. do you want to share your data globally easily in many (not 
 per se library related) environments - go for XML/RDF.
 Open your data and do both :-)

 Pat

 Sent from my iPhone

 On 25 Oct 2010, at 20:39, Nate Vack njv...@wisc.edu wrote:

 Hi all,

 I've just spent the last couple of weeks delving into and decoding a
 binary file format. This, in turn, got me thinking about MARCXML.

 In a nutshell, it looks like it's supposed to contain the exact same
 data as a normal MARC record, except in XML form. As in, it should be
 round-trippable.

 What's the advantage to this? I can see using a human-readable format
 for poorly-documented file formats -- they're relatively easy to read
 and understand. But MARC is well, well-documented, with more than one
 free implementation in cursory searching. And once you know a binary
 file's format, it's no harder to parse than XML, and the data's
 smaller and processing faster.

 So... why the XML?

 Curious,
 -Nate

 Eric Hellman
 President, Gluejar, Inc.
 41 Watchung Plaza, #132
 Montclair, NJ 07042
 USA

 e...@hellman.net
 http://go-to-hellman.blogspot.com/
 @gluejar




-- 
Check out my library at http://www.librarything.com/profile/timspalding


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Kyle Banerjee
On Mon, Oct 25, 2010 at 12:22 PM, Eric Hellman e...@hellman.net wrote:

 I think you'd have a very hard time demonstrating any speed advantage to
 MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If
 there exists a MARC parser that has ever been speed-optimized without
 serious compromise, I'm sure someone on this list will have a good story
 about it.


I'll take MarcEdit over a XML parser for MARCXML any day. For a benchmark
test, try roundtripping a million records. Unless I've been messing with the
wrong stuff, the differences are dramatic.

kyle


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Jonathan Rochkind
Yes, it is designed to be a round-trippable expression of ordinary marc 
in XML. Some reasons this is useful:


1. No maximum record length, unlike actual marc which tops out at 99,999 
bytes (~100k), because the leader's record length field is five digits.
2. You can use XSLT and other XML tools to work with it, and store it in 
stores optimized for XML (or that only accept XML), etc.
3. You can embed it inside XML schemas that allow arbitrary embeddable 
XML.
4. (Of much lesser importance than these others, but still ends up being 
important to me -- saving the time of the developer does matter) it's a 
lot easier to debug the raw data, doesn't require me to open up a hex 
editor and count bytes.
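
For anyone who has not had to count those bytes, here is a rough sketch of the binary (ISO 2709) layout: a 24-byte leader, a directory of 12-byte entries, then the field data. The helper names and the sample field are invented, and real records need more care (character encodings, validation) than this shows.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """fields: list of (tag, body_bytes); body excludes the terminator."""
    directory = b""
    data = b""
    for tag, body in fields:
        field = body + FT
        # Directory entry: 3-byte tag, 4-digit length, 5-digit start offset.
        directory += tag.encode("ascii") + b"%04d" % len(field) + b"%05d" % len(data)
        data += field
    base = 24 + len(directory) + 1          # leader + directory + terminator
    total = base + len(data) + 1            # + record terminator
    if total > 99999:
        raise ValueError("record does not fit the five-digit length field")
    leader = b"%05d" % total + b"nam a22" + b"%05d" % base + b" a 4500"
    return leader + directory + FT + data + RT


def parse_record(raw):
    """Return (declared_length, [(tag, body_bytes), ...])."""
    length = int(raw[0:5])                  # leader positions 0-4
    base = int(raw[12:17])                  # leader positions 12-16
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        flen = int(entry[3:7])
        start = int(entry[7:12])
        fields.append((tag, raw[base + start:base + start + flen - 1]))
    return length, fields


record = build_record([("245", b"10" + SF + b"aAn invented title")])
print(parse_record(record))
```

The `ValueError` branch is where point 1 bites: the length has to fit in five ASCII digits.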


Nate Vack wrote:

Hi all,

I've just spent the last couple of weeks delving into and decoding a
binary file format. This, in turn, got me thinking about MARCXML.

In a nutshell, it looks like it's supposed to contain the exact same
data as a normal MARC record, except in XML form. As in, it should be
round-trippable.

What's the advantage to this? I can see using a human-readable format
for poorly-documented file formats -- they're relatively easy to read
and understand. But MARC is well, well-documented, with more than one
free implementation in cursory searching. And once you know a binary
file's format, it's no harder to parse than XML, and the data's
smaller and processing faster.

So... why the XML?

Curious,
-Nate

  


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Jonathan Rochkind
MODS was an attempt to mostly-but-not-entirely-roundtrippably represent 
data in MARC in a format that's more 'normal' XML, without packed bytes 
in elements, with element names that are more or less self-documenting, 
etc.  It's caught on even less than MARCXML though, so if you find 
MARCXML under-adopted (I disagree), you won't like MODS.


Personally I think MODS is kind of the worst of both worlds. The only 
reason to stick with something that looks anything like MARC is to be 
round-trippable with legacy MARC, which MODS is not.  But if you're 
going to give that up, you really want more improvements than MODS 
supplies, it's still got a lot of the unfortunate legacy of MARC in it.


Nate Vack wrote:

On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding t...@librarything.com wrote:
  

- XML is self-describing, binary is not.

Not to quibble, but that's only in a theoretical sense here. Something
like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
At least MARC records kinda imitate catalog cards.



Yeah -- this is kinda the source of my confusion. In the case of the
files I'm reading, it's not that it's hard to find out where the
nMeasurement field lives (it's six short ints starting at offset 64),
but what the field means, and whether or not I care about it.

Switching to an XML format doesn't help with that at all.

WRT character encoding issues and validation: if MARC and MARCXML are
round-trippable, a solution in one environment is equivalent to a
solution in the other.

And I think we've all seen plenty of unvalidated, badly-formed XML,
and plenty with Character Encoding Problems™ ;-)

Thanks for the input!
-Nate

  


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Jonathan Rochkind
Marc in JSON can be a nice middle-ground, faster/smaller than MarcXML 
(although still probably not as fast/small as binary), based on a standard 
low-level data format so easier to work with using existing tools (and 
developers' eyes) than binary, no maximum record length. 

There have been a couple competing attempts to define a 
marc-expressed-in-json 'standard', none have really caught on yet. I 
like Ross's latest attempt:  
http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
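
As a rough illustration of what a marc-in-json record can look like, here is a sketch using only Python's json module. The key names (leader, fields, ind1, ind2, subfields) are an assumption loosely modelled on that style of proposal, not a quotation of it, and the record content is invented.

```python
import json

# Invented record; the dictionary layout is an assumed marc-in-json shape.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "ocm00000001"},
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "An invented title :"},
                               {"b": "a sample subtitle."}]}},
    ],
}


def to_line(rec):
    """Serialize one record as a single compact line (newline-delimited JSON)."""
    return json.dumps(rec, separators=(",", ":"))


def subfield_values(rec, tag, code):
    """Collect every value of subfield `code` in fields tagged `tag`."""
    values = []
    for field in rec["fields"]:
        for field_tag, body in field.items():
            if field_tag != tag or not isinstance(body, dict):
                continue  # control fields carry a plain string
            for sf in body["subfields"]:
                if code in sf:
                    values.append(sf[code])
    return values


line = to_line(record)
restored = json.loads(line)
print(subfield_values(restored, "245", "a"))
```

Nothing here needs a MARC-specific library, and there is no length field to overflow.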


Patrick Hochstenbach wrote:

Dear Nate,

There is a trade-off: do you want very fast processing of data - go for binary data. do you want to share your data globally easily in many (not per se library related) environments - go for XML/RDF. 
Open your data and do both :-)


Pat

Sent from my iPhone

On 25 Oct 2010, at 20:39, Nate Vack njv...@wisc.edu wrote:

  

Hi all,

I've just spent the last couple of weeks delving into and decoding a
binary file format. This, in turn, got me thinking about MARCXML.

In a nutshell, it looks like it's supposed to contain the exact same
data as a normal MARC record, except in XML form. As in, it should be
round-trippable.

What's the advantage to this? I can see using a human-readable format
for poorly-documented file formats -- they're relatively easy to read
and understand. But MARC is well, well-documented, with more than one
free implementation in cursory searching. And once you know a binary
file's format, it's no harder to parse than XML, and the data's
smaller and processing faster.

So... why the XML?

Curious,
-Nate



  


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Jonathan Rochkind

Tim Spalding wrote:

Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?
  


Yes, which sometimes you are. Say, when you're indexing 2 or 3 or 10 
million marc records into, say, solr.


Which is faster depends on what language and what libraries you are 
using for both binary marc and marcxml. But in many of our experiences, 
parsing and serializing binary marc _is_ significantly faster than 
parsing and serializing marcxml.  That is of course just one of the 
various criteria that come into play when choosing a format.


Here's Bill Dueber's benchmarks comparing MarcXML, marc binary, and a 
marc-in-json format; in ruby, using various library alternatives.  I 
rather like the marc-in-json format for being a happy medium.  Whether 
it's standard or not doesn't necessarily matter when you're dealing 
with your own records, passing them through several stops on a 
toolchain, and have tools available that can do it. Who cares if 
any/everyone else uses it.


http://robotlibrarian.billdueber.com/sizespeed-of-various-marc-serializations-using-ruby-marc/


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread MJ Suhonos
JSON++

I routinely re-index about 2.5M JSON records (originally from binary MARC), and 
it's several orders of magnitude faster than XML (measured in single-digit 
minutes rather than double-digit hours).  I'm not sure if it's in the same 
range as binary MARC, but as Tim says, it's plenty fast enough for pragmatic 
purposes.

Unfortunately JSON doesn't have as many mature tools for manipulation as XML 
(yet?), but I'd be inclined to call it the best of both worlds rather than a 
middle-ground or compromise.

MJ

 Marc in JSON can be a nice middle-ground, faster/smaller than MarcXML 
 (although still probably not as fast/small as binary), based on a standard 
 low-level data format so easier to work with using existing tools (and 
 developers' eyes) than 
 binary, no maximum record length. 
 There have been a couple competing attempts to define a 
 marc-expressed-in-json 'standard', none have really caught on yet. I like 
 Ross's latest attempt:  
 http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
 
 Patrick Hochstenbach wrote:
 Dear Nate,
 
 There is a trade-off: do you want very fast processing of data - go for 
 binary data. do you want to share your data globally easily in many (not per 
 se library related) environments - go for XML/RDF. Open your data and do 
 both :-)
 
 Pat
 
 Sent from my iPhone
 
 On 25 Oct 2010, at 20:39, Nate Vack njv...@wisc.edu wrote:
 
  
 Hi all,
 
 I've just spent the last couple of weeks delving into and decoding a
 binary file format. This, in turn, got me thinking about MARCXML.
 
 In a nutshell, it looks like it's supposed to contain the exact same
 data as a normal MARC record, except in XML form. As in, it should be
 round-trippable.
 
 What's the advantage to this? I can see using a human-readable format
 for poorly-documented file formats -- they're relatively easy to read
 and understand. But MARC is well, well-documented, with more than one
 free implementation in cursory searching. And once you know a binary
 file's format, it's no harder to parse than XML, and the data's
 smaller and processing faster.
 
 So... why the XML?
 
 Curious,
 -Nate

 
  


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Stephen Meyer

Kyle Banerjee wrote:

On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com wrote:


Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?



Data migrations and data dumps are a common use case. Needing to break or
make hundreds of thousands or millions of records is not uncommon.

kyle


To make this concrete, we process the MARC records from 14 separate 
ILS's throughout the University of Wisconsin System. We extract, sort on 
OCLC number, dedup and merge pieces from any campus that has a record 
for the work. The MARC that we then index and display here


 http://forward.library.wisconsin.edu/catalog/ocm37443537?school_code=WU

is not identical to the version of the MARC record from any of the 4 
schools that hold it.


We extract 13 million records and dedup down to 8 million every week. 
Speed is paramount.
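
The extract / sort / dedup / merge steps described above can be sketched roughly as follows. This is not the UW pipeline, just an illustration with invented dictionaries standing in for MARC records; the field names and the second school code are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def merge_by_oclc(records):
    """Sort on OCLC number, then fold each group into one merged record."""
    ordered = sorted(records, key=itemgetter("oclc"))
    merged = []
    for oclc, group in groupby(ordered, key=itemgetter("oclc")):
        group = list(group)
        merged.append({
            "oclc": oclc,
            "title": group[0]["title"],  # take the first copy's title
            "schools": sorted({r["school"] for r in group}),
        })
    return merged


# Invented extract: two campuses hold the same work.
extracted = [
    {"oclc": "ocm00000002", "school": "WU", "title": "Second invented title"},
    {"oclc": "ocm00000001", "school": "WU", "title": "First invented title"},
    {"oclc": "ocm00000001", "school": "GZ", "title": "First invented title"},
]
merged = merge_by_oclc(extracted)
print(merged)
```

The sort dominates the cost here, which is one reason per-record parse time adds up quickly at 13 million records a week.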


-sm
--
Stephen Meyer
Library Application Developer
UW-Madison Libraries
436 Memorial Library
728 State St.
Madison, WI 53706

sme...@library.wisc.edu
608-265-2844 (ph)


Just don't let the human factor fail to be a factor at all.
- Andrew Bird, Tables and Chairs


Re: [CODE4LIB] Django

2010-10-25 Thread Junior Tidal
Thanks for the suggestions everyone. I haven't actively looked for resources 
since I'm busy doing collection development. However, I came across an 
advertisement for a Django book and figured it would be a useful language to 
learn. I already know php, so it seems logical that django is the next step?

Best,  

Junior Tidal
Assistant Professor
Web Services and Multimedia Librarian
New York City College of Technology, CUNY 
300 Jay Street
Brooklyn, NY 11210
718.260.5481
 
http://library.citytech.cuny.edu


 Andrew Hankinson andrew.hankin...@gmail.com 10/25/2010 10:23 AM 
There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
revised edition for 1.0)
The Django docs, with some intro tutorials: 
http://docs.djangoproject.com/en/1.2/ 

Did you try those already?


On 2010-10-25, at 10:19 AM, Junior Tidal wrote:

 Hello Code4Lib,
 
 Does anyone have any recommendations for learning Django? Books, websites, 
 video tutorials, etc. ...
 
 thanks,
 
 Junior Tidal
 Assistant Professor
 Web Services and Multimedia Librarian
 New York City College of Technology, CUNY 
 300 Jay Street
 Brooklyn, NY 11210
 718.260.5481
 
 http://library.citytech.cuny.edu


Re: [CODE4LIB] Django

2010-10-25 Thread Gabriel Farrell
Agreed on the docs at the website. If you can't figure something out
from those, dig into the source. Happy hacking!

On Mon, Oct 25, 2010 at 10:25 AM, Michael J. Giarlo
leftw...@alumni.rutgers.edu wrote:
 I'd start here:

   http://docs.djangoproject.com/en/1.2/

 There are some tutorials in there as well.

 -Mike



 On Mon, Oct 25, 2010 at 10:19, Junior Tidal jti...@citytech.cuny.edu wrote:
 Hello Code4Lib,

 Does anyone have any recommendations for learning Django? Books, websites, 
 video tutorials, etc. ...

 thanks,

 Junior Tidal
 Assistant Professor
 Web Services and Multimedia Librarian
 New York City College of Technology, CUNY
 300 Jay Street
 Brooklyn, NY 11210
 718.260.5481

 http://library.citytech.cuny.edu




Re: [CODE4LIB] Django

2010-10-25 Thread Gabriel Farrell
If you already know PHP you might want to check out Symfony or another
PHP framework to get the hang of web frameworks, then move onto other
languages from there.

On Mon, Oct 25, 2010 at 4:25 PM, Junior Tidal jti...@citytech.cuny.edu wrote:
 Thanks for the suggestions everyone. I haven't actively looked for resources 
 since I'm busy doing collection development. However, I came across an 
 advertisement for a Django book and figured it would be a useful language to 
 learn. I already know php, so it seems logical that django is the next step?

 Best,

 Junior Tidal
 Assistant Professor
 Web Services and Multimedia Librarian
 New York City College of Technology, CUNY
 300 Jay Street
 Brooklyn, NY 11210
 718.260.5481

 http://library.citytech.cuny.edu


 Andrew Hankinson andrew.hankin...@gmail.com 10/25/2010 10:23 AM 
 There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
 revised edition for 1.0)
 The Django docs, with some intro tutorials: 
 http://docs.djangoproject.com/en/1.2/

 Did you try those already?


 On 2010-10-25, at 10:19 AM, Junior Tidal wrote:

 Hello Code4Lib,

 Does anyone have any recommendations for learning Django? Books, websites, 
 video tutorials, etc. ...

 thanks,

 Junior Tidal
 Assistant Professor
 Web Services and Multimedia Librarian
 New York City College of Technology, CUNY
 300 Jay Street
 Brooklyn, NY 11210
 718.260.5481

 http://library.citytech.cuny.edu



Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Ray Denenberg, Library of Congress
It really is possible to make your point without being quite so obnoxious.
Everyone else seems to be able to do so. --Ray

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Alexander Johannesen
Sent: Monday, October 25, 2010 3:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?

Hiya,

On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote:
 Switching to an XML format doesn't help with that at all.

I'm willing to take it further and say that MARCXML was the worst thing the
library world ever did. Some might argue it was a good first step, and that
it was better with something rather than nothing, to which I respond ;

Poppycock!

MARCXML is nothing short of evil. Not only does it go against every
principle of good XML anywhere (don't rely on whitespace, structure over
code, namespace conventions, identity management, document control,
separation of entities and properties, and on and on), it breaks the
ontological commitment that a better treatment of the MARC data could bring,
deterring people from actually a) using the darn thing as anything but a
bare minimal crutch, and b) expanding it to be actually useful and
interesting.

The quicker the library world can get rid of this monstrosity, the better,
although I doubt that will ever happen; it will hang around like a foul
stench for as long as there is MARC in the world. A long time. A long sad
time.

A few extra notes;
   http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html

Can you tell I'm not a fan? :)


Kind regards,

Alex
--
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---


Re: [CODE4LIB] Django

2010-10-25 Thread Andrew Hankinson
Django is a web framework; Python is the language.

If you don't know the difference, I'd suggest sticking with PHP and going with 
one of the frameworks available to you there.


On 2010-10-25, at 4:25 PM, Junior Tidal wrote:

 Thanks for the suggestions everyone. I haven't actively looked for resources 
 since I'm busy doing collection development. However, I came across an 
 advertisement for a Django book and figured it would be a useful language to 
 learn. I already know php, so it seems logical that django is the next step?
 
 Best,  
 
 Junior Tidal
 Assistant Professor
 Web Services and Multimedia Librarian
 New York City College of Technology, CUNY 
 300 Jay Street
 Brooklyn, NY 11210
 718.260.5481
 
 http://library.citytech.cuny.edu
 
 
 Andrew Hankinson andrew.hankin...@gmail.com 10/25/2010 10:23 AM 
 There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
 revised edition for 1.0)
 The Django docs, with some intro tutorials: 
 http://docs.djangoproject.com/en/1.2/ 
 
 Did you try those already?
 
 
 On 2010-10-25, at 10:19 AM, Junior Tidal wrote:
 
 Hello Code4Lib,
 
 Does anyone have any recommendations for learning Django? Books, websites, 
 video tutorials, etc. ...
 
 thanks,
 
 Junior Tidal
 Assistant Professor
 Web Services and Multimedia Librarian
 New York City College of Technology, CUNY 
 300 Jay Street
 Brooklyn, NY 11210
 718.260.5481
 
 http://library.citytech.cuny.edu


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Alexander Johannesen
Ray Denenberg, Library of Congress r...@loc.gov wrote:
 It really is possible to make your point without being quite so obnoxious.

Obnoxious?


Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bill Dueber
I know there are two parts of this discussion (speed on the one hand,
applicability/features on the other), but for the former, running a little
benchmark just isn't that hard. Aren't we supposed to, you know, prefer to
make decisions based on data?

Note: I'm only testing deserialization because there isn't, as of now, a
fast serialization option for ruby-marc. It uses REXML, and it's dog-slow. I
already looked at marc-in-json vs marc binary at
http://robotlibrarian.billdueber.com/sizespeed-of-various-marc-serializations-using-ruby-marc/

Benchmark Source: http://gist.github.com/645683

18,883 records as either an XML collection or newline-delimited json.
Open the file, read every record, pull out a title. Repeat 5 times for a
total of 94,415 records (i.e., just under 100K records total).
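
For anyone who wants to try the same shape of test outside Ruby, here is a rough Python analogue using only the standard library. It is not the benchmark in the gist: the records are synthetic, the JSON layout is an assumed marc-in-json shape, and the function names are invented. No timings are claimed here; run it on your own data.

```python
import json
import time
import xml.etree.ElementTree as ET

NS = "{http://www.loc.gov/MARC21/slim}"


def make_xml(n):
    """Build a synthetic MARCXML collection of n one-field records."""
    rec = ('<record><datafield tag="245" ind1="1" ind2="0">'
           '<subfield code="a">Title %d</subfield></datafield></record>')
    body = "".join(rec % i for i in range(n))
    return '<collection xmlns="http://www.loc.gov/MARC21/slim">%s</collection>' % body


def make_ndjson(n):
    """Build the same records as newline-delimited JSON."""
    return "\n".join(
        json.dumps({"fields": [{"245": {"ind1": "1", "ind2": "0",
                                         "subfields": [{"a": "Title %d" % i}]}}]})
        for i in range(n))


def titles_from_xml(text):
    root = ET.fromstring(text)
    return [sf.text
            for df in root.iter(NS + "datafield") if df.get("tag") == "245"
            for sf in df.iter(NS + "subfield") if sf.get("code") == "a"]


def titles_from_ndjson(text):
    out = []
    for line in text.splitlines():
        for field in json.loads(line)["fields"]:
            for sf in field.get("245", {}).get("subfields", []):
                if "a" in sf:
                    out.append(sf["a"])
    return out


def timed(fn, text, repeats=5):
    """Run fn(text) `repeats` times; return (elapsed_seconds, last_result)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(text)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    n = 2000
    xml_secs, _ = timed(titles_from_xml, make_xml(n))
    json_secs, _ = timed(titles_from_ndjson, make_ndjson(n))
    print("xml: %.3fs  json: %.3fs" % (xml_secs, json_secs))
```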

Under ruby-marc, using the libxml deserializer is the fastest option. If
you're using the REXML parser, well,  god help us all.

ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-darwin9.8.0]. User time
reported in seconds.

  xml w/libxml 227 seconds
  marc-in-json w/yajl  130 seconds


So... quite a bit faster (more than 40%). For a million records (assuming I
can just say 10*these_values) you're talking about a difference of 16
minutes due to just reading speed. Assuming, of course, you're running your
code on my desktop. Today.

For the 8M records I have to deal with, that'd be roughly 8M * ((227-130)
/ 94,415)  = 8219 seconds, or about 137 minutes. So... a lot.
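
The extrapolation can be reproduced with a few lines of Python. The two timings and the record count are the ones reported above; the function name and the assumption that cost scales linearly with record count are illustrative only.

```python
def extra_seconds(n_records, slow_secs=227.0, fast_secs=130.0, sample=94415):
    """Scale the measured time difference linearly to n_records."""
    return n_records * (slow_secs - fast_secs) / sample


# Fraction of the libxml time saved by the yajl run.
speedup = (227 - 130) / 227
print("time saved: %.1f%%" % (100 * speedup))

for n in (1_000_000, 8_000_000):
    secs = extra_seconds(n)
    print("%d records: %.0f s (%.0f min)" % (n, secs, secs / 60))
```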

Of course, if you're using a slower XML library or a slower JSON library,
your numbers will vary quite a bit. REXML is unforgivingly slow, and
json/pure (and even 'json') are quite a bit slower than yajl. And don't
forget that you need to serialize these things from your source somehow...

 -Bill-



On Mon, Oct 25, 2010 at 4:23 PM, Stephen Meyer sme...@library.wisc.edu wrote:

 Kyle Banerjee wrote:

 On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com
 wrote:

  Does processing speed of something matter anymore? You'd have to be
 doing a LOT of processing to care, wouldn't you?


 Data migrations and data dumps are a common use case. Needing to break or
 make hundreds of thousands or millions of records is not uncommon.

 kyle


 To make this concrete, we process the MARC records from 14 separate ILS's
 throughout the University of Wisconsin System. We extract, sort on OCLC
 number, dedup and merge pieces from any campus that has a record for the
 work. The MARC that we then index and display here

  http://forward.library.wisconsin.edu/catalog/ocm37443537?school_code=WU

 is not identical to the version of the MARC record from any of the 4
 schools that hold it.

 We extract 13 million records and dedup down to 8 million every week. Speed
 is paramount.

 -sm
 --
 Stephen Meyer
 Library Application Developer
 UW-Madison Libraries
 436 Memorial Library
 728 State St.
 Madison, WI 53706

 sme...@library.wisc.edu
 608-265-2844 (ph)


 Just don't let the human factor fail to be a factor at all.
 - Andrew Bird, Tables and Chairs




-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] Django

2010-10-25 Thread Junior Tidal
I know the difference. 

 Andrew Hankinson andrew.hankin...@gmail.com 10/25/2010 4:40 PM 
Django is a web framework; Python is the language.

If you don't know the difference, I'd suggest sticking with PHP and going with 
one of the frameworks available to you there.


On 2010-10-25, at 4:25 PM, Junior Tidal wrote:

 Thanks for the suggestions everyone. I haven't actively looked for resources 
 since I'm busy doing collection development. However, I came across an 
 advertisement for a Django book and figured it would be a useful language to 
 learn. I already know php, so it seems logical that django is the next step?
 
 Best,  
 
 Junior Tidal
 Assistant Professor
 Web Services and Multimedia Librarian
 New York City College of Technology, CUNY 
 300 Jay Street
 Brooklyn, NY 11210
 718.260.5481
 
 http://library.citytech.cuny.edu 
 
 
 Andrew Hankinson andrew.hankin...@gmail.com 10/25/2010 10:23 AM 
 There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
 revised edition for 1.0)
 The Django docs, with some intro tutorials: 
 http://docs.djangoproject.com/en/1.2/ 
 
 Did you try those already?
 
 
 On 2010-10-25, at 10:19 AM, Junior Tidal wrote:
 
 Hello Code4Lib,
 
 Does anyone have any recommendations for learning Django? Books, websites, 
 video tutorials, etc. ...
 
 thanks,
 
 Junior Tidal
 Assistant Professor
 Web Services and Multimedia Librarian
 New York City College of Technology, CUNY 
 300 Jay Street
 Brooklyn, NY 11210
 718.260.5481
 
 http://library.citytech.cuny.edu


Re: [CODE4LIB] Django

2010-10-25 Thread Luciano Ramalho
On Mon, Oct 25, 2010 at 6:33 PM, Gabriel Farrell gsf...@gmail.com wrote:
 If you already know PHP you might want to check out Symfony or another
 PHP framework to get the hang of web frameworks, then move onto other
 languages from there.

I've been using Django for a couple of years now, and have been tasked
to introduce Django to a team in my current employer. Two of the
developers here, both experienced in PHP but just learning Python,
told me that they've found Django much simpler and easier to learn
than Symfony.

Besides the original Django Book, my colleagues have also enjoyed
Python Web Development with Django, which includes half a dozen
simple and diverse example applications.

http://www.amazon.com/Python-Development-Django-Jeff-Forcier/dp/0132356139

-- 
Luciano Ramalho
programador repentista || stand-up programmer
Twitter: @luciano


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Walker, David
 b) expanding it to be actually useful and interesting.

But here I think you've missed the very utility of MARC-XML.

Let's say you have a binary MARC file (the kind that comes out of an ILS) and 
want to transform that into MODS, Dublin Core, or maybe some other XML schema.  

How would you do that?  

One way is to first transform the MARC into MARC-XML.  Then you can use XSLT to 
crosswalk the MARC-XML into that other schema.  Very handy.
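
A sketch of that middle step, with one substitution stated plainly: Python's standard library has no XSLT processor, so this walks the MARC-XML with ElementTree instead of applying a stylesheet. The three-row crosswalk table, the wrapper element name, and the sample record are illustrative assumptions, not a complete MARC to Dublin Core mapping.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative crosswalk: (MARC tag, subfield code) -> Dublin Core element.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
}

# Invented sample record.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Author, Invented</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An invented title</subfield>
  </datafield>
</record>"""


def marcxml_to_dc(xml_text):
    """Copy mapped subfields of one MARC-XML record into DC elements."""
    record = ET.fromstring(xml_text)
    ET.register_namespace("dc", DC_NS)
    dc = ET.Element("metadata")  # hypothetical wrapper element
    for field in record.iter("{%s}datafield" % MARC_NS):
        for sf in field.iter("{%s}subfield" % MARC_NS):
            name = CROSSWALK.get((field.get("tag"), sf.get("code")))
            if name:
                el = ET.SubElement(dc, "{%s}%s" % (DC_NS, name))
                el.text = sf.text
    return dc


dc = marcxml_to_dc(SAMPLE)
print(ET.tostring(dc, encoding="unicode"))
```

An XSLT stylesheet would express the same table declaratively, which is the usual argument for going through MARC-XML first.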

Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal, the 
end point in the process.  But MARC-XML is really better seen as a utility, a 
middle step between binary MARC and the real goal, which is some other useful 
and interesting XML schema.

--Dave

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander 
Johannesen [alexander.johanne...@gmail.com]
Sent: Monday, October 25, 2010 12:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?

Hiya,

On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote:
 Switching to an XML format doesn't help with that at all.

I'm willing to take it further and say that MARCXML was the worst
thing the library world ever did. Some might argue it was a good first
step, and that it was better with something rather than nothing, to
which I respond ;

Poppycock!

MARCXML is nothing short of evil. Not only does it go against every
principle of good XML anywhere (don't rely on whitespace, structure
over code, namespace conventions, identity management, document
control, separation of entities and properties, and on and on), it
breaks the ontological commitment that a better treatment of the MARC
data could bring, deterring people from actually a) using the darn
thing as anything but a bare minimal crutch, and b) expanding it to be
actually useful and interesting.

The quicker the library world can get rid of this monstrosity, the
better, although I doubt that will ever happen; it will hang around
like a foul stench for as long as there is MARC in the world. A long
time. A long sad time.

A few extra notes;
   http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html

Can you tell I'm not a fan? :)


Kind regards,

Alex
--
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Eric Lease Morgan
On Oct 25, 2010, at 8:56 PM, Walker, David wrote:

 Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal, 
 the end point in the process.  But MARC-XML is really better seen as a 
 utility, a middle step between binary MARC and the real goal, which is some 
 other useful and interesting XML schema.

Exactly.

-- 
Eric Morgan


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Alexander Johannesen
On Tue, Oct 26, 2010 at 11:56 AM, Walker, David dwal...@calstate.edu wrote:
 Your criticisms of MARC-XML all seem to presume that MARC-XML is the
 goal, the end point in the process.  But MARC-XML is really better seen as a
 utility, a middle step between binary MARC and the real goal, which is some
 other useful and interesting XML schema.

How do you create an ontological commitment in a community to an
expanding and useful set of tools and vocabularies? I think I need to
remind people of what MARCXML is supposed to be ;

"a framework for working with MARC data in a XML environment. This
framework is intended to be flexible and extensible to allow users to
work with MARC data in ways specific to their needs. The framework
itself includes many components such as schemas, stylesheets, and
software tools."

I'm not assuming MARCXML is a goal, no matter how we define that. I'm
poo-pooing MARCXML for the semantics we, as a community, have been
given by a process I suspect had goals very different from reality.
Very few people would work with MARC through MARCXML; they would use
it to convert it, filter it, hack around it to something else
entirely. And I'm afraid lots of people are missing the point of
stubbing the developments in a community by embracing tools that
push a packet that inhibits innovation. So, here's the point, in
paraphrased form;

   "Here's our new thing. And we did it by simply converting all our
MARC into MARCXML that runs on a cron job every midnight, and a bit of
horrendous XSLT that's impossible to maintain."

   "But it looks just like the old thing using MARC and some templates?"

   "Ah yes, but now we're doing it in XML!"

   (Yeah, yeah, your mileage will vary)

I'm sorry if I'm overly pessimistic about the XML goodness in the
world, not for the XML itself, but the consequences of the named
entities involved. I've been a die-hard XML wonk for far too many
years, and the tools in that tool-chest don't automatically solve
hard problems better by wrapping stuff up in angle brackets, and -
dare I say it? - perhaps introduce a whole fleet of other problems
rarely talked about when XML is the latest buzz-word, like using a
document model on what's a traditional records model, character
encodings, whitespace issues, unicode, size and efficiencies (the
other part of this thread), and so on.

But let me also be a bit more specific about that hard semantic
problem I'm talking about;

Lots of people around the library world infra-structure will think
that since your data is now in XML it has taken some important step
towards being inter-operable with the rest of the world, that library
data now is part of the real world in *any* meaningful way, but this
is simply demonstrably deceivingly not true. Having our data in XML
has killed a few good projects where people have gone "A new project
to convert our MARC into useful XML? Aha! LoC has already solved that
problem for us."

Btw, to those who find me so obnoxious, at no point do I say it was
intentionally evil, just evil none the same. The road to hell is, as
always, paved with good intentions.


Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bill Dueber
On Mon, Oct 25, 2010 at 9:32 PM, Alexander Johannesen 
alexander.johanne...@gmail.com wrote:

> Lots of people around the library world infra-structure will think
> that since your data is now in XML it has taken some important step
> towards being inter-operable with the rest of the world, that library
> data now is part of the real world in *any* meaningful way, but this
> is simply demonstrably deceivingly not true.


Here, I think you're guilty of radically underestimating lots of people
around the library world. No one thinks MARC is a good solution to our
modern problems, and no one who actually knows what MARC is has trouble
understanding MARC-XML as an XML serialization of the same old data --
certainly not anyone capable of meaningful contribution to work on an
alternative.

You seem to presuppose that there's an enormous pent-up energy poised to
sweep in changes to an obviously-better data format, and that the existence
of MARC-XML somehow defuses all that energy. The truth is that a high
percentage of people that work with MARC data actively think about (or
curse) things that are wrong with it and gobs and gobs of ridiculously-smart
people work on a variety of alternate solutions (not the least of which is
RDA) and get their organizations to spend significant money to do so. The
problem we're dealing with is *hard*. Mind-numbingly hard.

The library world has several generations of infrastructure built around
MARC (by which I mean AACR2), and devising data structures and standards
that are a big enough improvement over MARC to warrant replacing all
that infrastructure is an engineering and political nightmare. I'm happy to
take potshots at the RDA stuff from the sidelines, but I never forget that
I'm on the sidelines, and that the people active in the game are among the
best and brightest we have to offer, working on a problem that invariably
seems more intractable the deeper in you go.

If you think MARC-XML is some sort of an actual problem, and that people
just need to be shouted at to realize that and do something about it, then,
well, I think you're just plain wrong.

  -Bill-

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Alexander Johannesen
On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber b...@dueber.com wrote:
> Here, I think you're guilty of radically underestimating lots of people
> around the library world. No one thinks MARC is a good solution to
> our modern problems, and no one who actually knows what MARC
> is has trouble understanding MARC-XML as an XML serialization of
> the same old data -- certainly not anyone capable of meaningful
> contribution to work on an alternative.

Slow down, Tex. "Lots of people in the library world" is not the same
as developers, or even good developers, or even good XML developers,
or even good XML developers who know what the document model imposes
on a data-centric approach.

> The problem we're dealing with is *hard*. Mind-numbingly hard.

This is no justification for not doing things better. (And I'd love to
know what the hard bits are; always interesting to hear from various
people as to what they think are the *real* problems of library
problems, as opposed to any other problem they have)

> The library world has several generations of infrastructure built
> around MARC (by which I mean AACR2), and devising data
> structures and standards that are a big enough improvement over
> MARC to warrant replacing all that infrastructure is an engineering
> and political nightmare.

Political? For sure. Engineering? Not so much. This is just that whole
"blinded by MARC" issue that keeps cropping up from time to time, and
rightly so; it is truly a beast - at least the way we have come to
know it through AACR2 and all its friends and its death-defying focus
on all things bibliographic - that has paralyzed library innovation,
probably to the point of making libraries almost irrelevant to the
world.

> I'm happy to take potshots at the RDA stuff from the sidelines, but I never
> forget that I'm on the sidelines, and that the people active in the game are
> among the best and brightest we have to offer, working on a problem that
> invariably seems more intractable the deeper in you go.

Well, that's a pretty scary sentence, for all sorts of reasons, but I
think I shall not go there.

> If you think MARC-XML is some sort of an actual problem

What, because you don't agree with me the problem doesn't exist? :)

> and that people
> just need to be shouted at to realize that and do something about it, then,
> well, I think you're just plain wrong.

Fair enough, although you seem to be under the assumption that all of
the stuff I'm saying is a figment of my imagination (I've been
involved in several projects lambasted because managers think MARCXML
is solving some imaginary problem; this is not bullshit, but pain and
suffering from the battlefields of library development), that I'm not
one of those developers (or one of you, although judging from this
discussion it's clear that I am not), that the things I say somehow
don't apply because you don't agree with, umm, what I'm assuming is
my somewhat direct approach to stating my heretic opinions.


Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Dana Pearson
i'm not a coder but i undertook a study of XML some years after it
came onto the scene and with a likely confused notion that it would be
the next significant technology, I learned some XSL and later was able
to weave PubMed Central journal information (CSV transformed into XML)
together with Dublin Core metadata of journal articles into MARCXML
during harvest with MarcEdit (which the inestimable Terry Reese
continues to tweak).  Also used the same XML journal data to augment
NLM journal records with PubMed Central holdings and other data with
a transform in my IDE, though it took me weeks to get right... so, no
aspirations to become a coder.
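
The CSV-to-MARCXML step described above can be sketched roughly as
follows. This is not the MarcEdit/XSL workflow itself, only a small
Python illustration of the same idea. The column names (title, issn),
the sample row and the helper names are assumptions made up for the
example, and the output omits the leader and control fields a complete
MARCXML record would carry.

```python
import csv
import io
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace
ET.register_namespace("marc", MARC_NS)

# Made-up journal list standing in for the real CSV export; the ISSN is a dummy.
SAMPLE_CSV = "title,issn\nExample Journal of Testing,1234-5678\n"


def add_datafield(record, tag, code, value):
    """Append one datafield holding a single subfield, with blank indicators."""
    datafield = ET.SubElement(
        record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
    )
    subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code})
    subfield.text = value


def row_to_marcxml(row):
    """Build a skeletal MARCXML record from one CSV row (hypothetical columns)."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    add_datafield(record, "022", "a", row["issn"])   # 022 $a: ISSN
    add_datafield(record, "245", "a", row["title"])  # 245 $a: title
    return record


records = [row_to_marcxml(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for record in records:
    print(ET.tostring(record, encoding="unicode"))
```

Getting the tags into the record is the easy part; the indicators,
punctuation and cataloging rules are where the weeks go.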

Probably did not get all of the MARC cataloging rules right and I can
empathize with those who come to MARC and cataloging standards without
cataloging training or experience. My library experience was primarily
as library director...my expertise on library specializations would
always be under question.

regards,
dana








-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bill Dueber
On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen 
alexander.johanne...@gmail.com wrote:

> Political? For sure. Engineering? Not so much.


Ok. Solve it. Let us know when you're done.


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bill Dueber
Sorry. That was rude, and uncalled for. I disagree that the problem is
easily solved, even without the politics. There've been lots of attempts to
try to come up with a sufficiently expressive toolset for dealing with
biblio data, and we're still working on it. If you do think you've got some
insight, I'm sure we're all ears, but try to frame it in terms of the existing
work if you can (RDA, some of the Dublin Core stuff, etc.) so we have a
frame of reference.

On Mon, Oct 25, 2010 at 10:18 PM, Bill Dueber b...@dueber.com wrote:

> On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen
> alexander.johanne...@gmail.com wrote:
>
>> Political? For sure. Engineering? Not so much.
>
>
> Ok. Solve it. Let us know when you're done.







-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library