[CODE4LIB] Fedora Workshop in Knoxville - Registration Open
Hi all, I'm pleased to announce that, thanks to the efforts of my colleague Christina Harlow, the University of Tennessee Libraries will be hosting a Fedora Workshop on Friday, June 26th. Please see the following link [1] to learn more and register. Space is limited! Register soon! Please feel free to send us/me/her questions. Warmly, Bridger -- Bridger Dyson-Smith Digital Initiatives University of Tennessee Libraries "Trust the Computer. The Computer is Your Friend." [1] http://cmh2166.github.io/Fedora4Knox/
Re: [CODE4LIB] Drupal code club
On Wed, Mar 4, 2015 at 6:31 AM, Brown, Bryan wrote: > With the overwhelming response to the recent post about a Python/PyMARC > code club on the Code4Lib list, I was wondering if anyone would be > interested in something similar for Drupal. As a Drupal user and a wannabe > module developer, I'm finding that reading other people's modules > (especially core) is one of the best ways to really understand what's going > on under the hood. Much like Python, Drupal has its own unique idioms and > ways of looking at problems, and reading modules helps one get into this > mindset. > > If anyone else is interested in this, please let me know either through a > reply to this thread or a direct email (bjbr...@fsu.edu). If there is > sufficient interest, I'll try to set up something similar to what the > Python/PyMARC crowd is doing. > > - Bryan Brown > +1 Thanks, Bridger
Re: [CODE4LIB] OAI Crosswalk XSLT
And just because I'm drinking too much coffee... If you're using an XSLT 2.0 processor for this you can do some things with variables that might make things a little easier; e.g. You can call those variables in a concat() function and you won't have to deal with wonky spacing in your output. There are almost certainly a bunch of much better ways to do this - I'll never be anything better than an XSLT apprentice - but it might be a good starting point. See the second for the difference in output. Cheers. --> cat sherman.xml 1 Quarterly Review of Economics and Finance 47 --> cat sherman.xsl http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0"> Vol. Issue --> saxon -s:./sherman.xml -xsl:./sherman.xsl Quarterly Review of Economics and Finance Vol. 47 Issue 1 Quarterly Review of Economics and Finance Vol. 47 Issue 1 On Fri, Jul 11, 2014 at 1:33 PM, Matthew Sherman wrote: > Thanks, that is very helpful. > > > On Fri, Jul 11, 2014 at 1:30 PM, Bridger Dyson-Smith < > bdysonsm...@gmail.com> > wrote: > > > Hi Matthew, > > > > That looks good to me. The only thing I might suggest -- depending on > your > > needs -- is to add <xsl:text> around your literals; e.g. > > > > Vol. > > select="dcvalue[@qualifier='volume']"/> > > Issue > > > > If the processor you are using does something weird with white space, > > you'll avoid it by having the white space in a text element. You may need a > > more precise XPath, depending on the context of your template, but the > > initial statement didn't look too bad. > > > > Hope that helps. 
> > Best, > > Bridger > > > > > > On Fri, Jul 11, 2014 at 11:24 AM, Matthew Sherman < > > matt.r.sher...@gmail.com> > > wrote: > > > > > Given the DSpace Dublin Core formatting I would like to be able to take > > > this: > > > > > > > > language="">1 > > > > language="">Quarterly > > > Review of Economics and Finance > > > 47 > > > > > > And turn during a OAI harvest turn it into: > > > > > > Quarterly Review of Economics and > > > Finance Vol. 47 Issue 1 > > > > > > I am thinking I can just add > > > > > > Vol. > > > Issue > > > > > > in the identifier section of the cross walk, but I am not 100% sure. > > Also > > > I am not sure if I will need to use the excessively complex XPath to > > > reference my source values. Can anyone tell me if I am on the right > > track? > > > > > > > > > On Fri, Jul 11, 2014 at 11:13 AM, Matthew Sherman < > > > matt.r.sher...@gmail.com> > > > wrote: > > > > > > > Ok, that makes sense. While I knew of OAI-PMH this is my first time > > > > really getting my hands dirty with it so I wasn't sure if this > > > > exceptionally detailed formatting was a function of the OAI protocols > > or > > > a > > > > function of DSpace. I also extracted a metadata record from DSpace > to > > > see > > > > how they are formatting it and this I what I found for the type > field: > > > > > > > > Poster > > > > > > > > > > > > On Fri, Jul 11, 2014 at 11:08 AM, Dunn, Katie > wrote: > > > > > > > >> Matt said: "I guess it is the "doc:element/doc:element/doc:field" > > thing > > > >> that is mostly what it throwing me." > > > >> > > > >> More DSpacey people than I can probably comment more knowledgeably > on > > > >> this, but this seems like less of an OAI-PMH thing than a DSpace > > thing. 
> > > It > > > >> looks like maybe DSpace stores metadata internally in a generic > > > >> metadata/element/field structure like Bridger showed (with doc > > > namespace): > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> ...and the select is pulling the information it needs for the > > >
Re: [CODE4LIB] OAI Crosswalk XSLT
Hi Matthew, That looks good to me. The only thing I might suggest -- depending on your needs -- is to add <xsl:text> around your literals; e.g. <xsl:text> Vol. </xsl:text> <xsl:value-of select="dcvalue[@qualifier='volume']"/> <xsl:text> Issue </xsl:text> If the processor you are using does something weird with white space, you'll avoid it by having the white space in a text element. You may need a more precise XPath, depending on the context of your template, but the initial statement didn't look too bad. Hope that helps. Best, Bridger On Fri, Jul 11, 2014 at 11:24 AM, Matthew Sherman wrote: > Given the DSpace Dublin Core formatting I would like to be able to take > this: > > language="">1 > Quarterly > Review of Economics and Finance > 47 > > And during an OAI harvest turn it into: > > Quarterly Review of Economics and > Finance Vol. 47 Issue 1 > > I am thinking I can just add > > Vol. > Issue > > in the identifier section of the cross walk, but I am not 100% sure. Also > I am not sure if I will need to use the excessively complex XPath to > reference my source values. Can anyone tell me if I am on the right track? > > > On Fri, Jul 11, 2014 at 11:13 AM, Matthew Sherman < > matt.r.sher...@gmail.com> > wrote: > > > Ok, that makes sense. While I knew of OAI-PMH this is my first time > > really getting my hands dirty with it so I wasn't sure if this > > exceptionally detailed formatting was a function of the OAI protocols or > a > > function of DSpace. I also extracted a metadata record from DSpace to > see > > how they are formatting it and this is what I found for the type field: > > > > Poster > > > > > > On Fri, Jul 11, 2014 at 11:08 AM, Dunn, Katie wrote: > > > >> Matt said: "I guess it is the "doc:element/doc:element/doc:field" thing > >> that is mostly what is throwing me." > >> > >> More DSpacey people than I can probably comment more knowledgeably on > >> this, but this seems like less of an OAI-PMH thing than a DSpace thing. 
> It > >> looks like maybe DSpace stores metadata internally in a generic > >> metadata/element/field structure like Bridger showed (with doc > namespace): > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> ...and the select is pulling the information it needs for the /> > >> element in the OAI-PMH output out of the internal DSpace structure. > >> > >> Katie > >> > >> > >> -Original Message- > >> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of > >> Bridger Dyson-Smith > >> Sent: Friday, July 11, 2014 10:56 AM > >> To: CODE4LIB@LISTSERV.ND.EDU > >> Subject: Re: [CODE4LIB] OAI Crosswalk XSLT > >> > >> Hi Matt, > >> > >> Michael Kays' XSLT 2.0 and XPath 2.0 is a great reference and is > >> available as an eBook. Mulberry Technologies has some quick reference > >> guides [1] that might be helpful. > >> > >> Cheers, > >> Bridger > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> [1] http://www.mulberrytech.com/quickref/ > >> > >> > >> > >> On Fri, Jul 11, 2014 at 10:38 AM, Matthew Sherman < > >> matt.r.sher...@gmail.com> > >> wrote: > >> > >> > Hi Code4Lib folks, > >> > > >> > I have a question for those of you who have worked with OAI-PMH. I am > >> > currently editing our DSpace OAI crosswalk to include a few custom > >> > metadata field that exist in our repository for publication > >> > information and port them into a more standard format. The problem I > >> > am running into is the select statements they use are not the typical > >> > XPath statements I am used to. For example: > >> > > >> > >> > > >> > select="doc:metadata/doc:element[@name='dc']/doc:element[@name='type'] > >> > /doc:element/doc:element/doc:field[@name='value']"> > >> > > >> > > >> > I know what the "." does, but the other select statement is a bit > >> > foreign to me. So my question is, does anyone know of some reference > >> > material that can help me make sense of this select? I need to > >> > understand what it is doing so I can make my own. 
Thanks for any > >> insight you can provide. > >> > > >> > Matt Sherman > >> > > >> > > > > >
Re: [CODE4LIB] OAI Crosswalk XSLT
Hi Matt, Michael Kay's XSLT 2.0 and XPath 2.0 is a great reference and is available as an eBook. Mulberry Technologies has some quick reference guides [1] that might be helpful. Cheers, Bridger [1] http://www.mulberrytech.com/quickref/ On Fri, Jul 11, 2014 at 10:38 AM, Matthew Sherman wrote: > Hi Code4Lib folks, > > I have a question for those of you who have worked with OAI-PMH. I am > currently editing our DSpace OAI crosswalk to include a few custom metadata > fields that exist in our repository for publication information and port > them into a more standard format. The problem I am running into is the > select statements they use are not the typical XPath statements I am used > to. For example: > > > select="doc:metadata/doc:element[@name='dc']/doc:element[@name='type']/doc:element/doc:element/doc:field[@name='value']"> > > > > I know what the "." does, but the other select statement is a bit foreign > to me. So my question is, does anyone know of some reference material that > can help me make sense of this select? I need to understand what it is > doing so I can make my own. Thanks for any insight you can provide. > > Matt Sherman >
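A note on the select statement quoted above: it can be read as a chain of steps, each one descending a level and, where a predicate is given, filtering on the name attribute. The Python sketch below evaluates the same path against a made-up record to show what it picks out. The namespace URI bound to doc, the sample record, and the inner name values (none, en_US) are assumptions for illustration only; check the xmlns:doc declaration in your own crosswalk and a real record from your repository. select_values and the "document" wrapper are hypothetical helpers, not part of DSpace.

```python
import xml.etree.ElementTree as ET

# Assumed binding for the "doc" prefix; verify against your crosswalk stylesheet.
NS = {"doc": "http://www.lyncode.com/xoai"}

# Hypothetical record shaped like the nested element/field structure
# discussed in the thread. Only the "type" branch is shown.
SAMPLE = """<metadata xmlns="http://www.lyncode.com/xoai">
  <element name="dc">
    <element name="type">
      <element name="none">
        <element name="en_US">
          <field name="value">Poster</field>
        </element>
      </element>
    </element>
  </element>
</metadata>"""

# The select expression from the quoted message, unchanged.
SELECT = (
    "doc:metadata/doc:element[@name='dc']/doc:element[@name='type']"
    "/doc:element/doc:element/doc:field[@name='value']"
)


def select_values(xml_text):
    """Return the text of every field the select expression matches."""
    # ElementTree paths start at an element's children, so the record is
    # wrapped in a stand-in for the XPath document node.
    document = ET.Element("document")
    document.append(ET.fromstring(xml_text))
    return [field.text for field in document.findall(SELECT, NS)]
```

Calling select_values(SAMPLE) walks metadata, then the element named dc, then the element named type, then two further element levels of any name, and finally the field named value. The two unnamed doc:element steps match whatever sits at that depth, which is why the same path keeps working when the intermediate names differ between records.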
Re: [CODE4LIB] extracting tiff info
Hi Kyle, +1 for Exiftool, but as Nick mentioned, it depends on what information you're wanting to extract. Best, Bridger PS exiftool -a -G1 -s image-name.tif > image-exif.txt has come in very handy for us. HTH. On Mon, Nov 19, 2012 at 4:31 PM, Kyle Banerjee wrote: > Howdy all, > > I need to extract all the metadata from a few thousand images on a network > drive and put it into spreadsheet. Since the files are huge (each is > 100MB+) and my connection isn't that fast, I strongly prefer to not move > them before working on them -- i.e. I'm using cygwin and/or windows. > > Just eyeballing these things, I see the headers contain everything I need > in purty rdf. What's the best way to extract this? I thought tiffinfo would > do the trick, but it's just giving me technical info. Of course I can just > parse the files with perl but I'm thinking there just has to be a slicker > way to do this. What's my best option? Thanks, > > kyle >
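Since the goal in Kyle's message is a spreadsheet, one possible approach is to let exiftool write CSV directly instead of parsing its text output. The sketch below keeps the -a -G1 -s options mentioned above and adds -csv, -r and -ext, which are standard exiftool options. The function names are hypothetical, exiftool is assumed to be on the PATH, and any directory or output path you pass in is your own choice.

```python
import subprocess


def exiftool_csv_command(image_dir):
    """Build an exiftool call that emits one CSV row per TIFF under image_dir."""
    return [
        "exiftool",
        "-csv",          # CSV output: one row per file, one column per tag
        "-a",            # allow duplicate tags to be extracted
        "-G1",           # label each tag with its group name
        "-s",            # use tag names rather than descriptions
        "-r",            # recurse into subdirectories
        "-ext", "tif",   # only process files with these extensions
        "-ext", "tiff",
        str(image_dir),
    ]


def write_metadata_csv(image_dir, csv_path):
    """Run exiftool and save its CSV output; returns exiftool's exit status."""
    with open(csv_path, "wb") as out:
        result = subprocess.run(exiftool_csv_command(image_dir), stdout=out)
    return result.returncode
```

Calling write_metadata_csv with the scan directory and an output file name should leave a CSV that opens in a spreadsheet program, with the files themselves left where they are.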
Re: [CODE4LIB] PDF Compression
Have you tried ghostscript? It should be available for any *nix-like OS or Windows [1]. Cheers, Bridger [1] http://www.ghostscript.com/download/ On Wed, Oct 24, 2012 at 10:59 AM, Paul Butler (pbutler3) wrote: > Have you looked into Irfanview's [www.irfanview.com] batch conversion > settings and plugins? Might be something there that is useful. > Cheers, Paul > +-+-+-+-+-+-+-+-+-+-+-+-+ > Paul R Butler > Assistant Systems Librarian > Simpson Library > University of Mary Washington > 1801 College Avenue > Fredericksburg, VA 22401 > 540.654.1756 > libraries.umw.edu > > Sent from the mighty Dell Vostro 230. > > > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of > Nathan Tallman > Sent: Wednesday, October 24, 2012 10:29 AM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: [CODE4LIB] PDF Compression > > Can anyone recommend some good PDF compression software? Preferably > open-source or low-cost. We're scanning archival collections and the PDFs > can be quite large for a single folder. The folder may be thick or thin, > and contain a mix of text and images. We've fiddled with various Acrobat > settings for getting the file size down, but we haven't found a good > balance between quality and file size. (Plus, these need to be OCR'ed; so > far we've been doing that in Acrobat.) > > We were looking at LuraTech PDF Compressor, but the cost for an enterprise > license is pretty high. It did do an excellent job though. > > Thanks, > Nathan >
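For anyone wanting to try the Ghostscript route, the sketch below assembles a typical pdfwrite call. The -dPDFSETTINGS presets (/screen, /ebook, /printer, /prepress, /default) are standard Ghostscript options; which one gives an acceptable balance of quality and size for archival scans is something to test on your own files. The function name is hypothetical, and the executable is assumed to be gs (on Windows it is usually gswin64c or gswin32c).

```python
PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def ghostscript_compress_command(src_pdf, dst_pdf, preset="ebook", executable="gs"):
    """Build a Ghostscript command that rewrites src_pdf into dst_pdf."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        executable,
        "-sDEVICE=pdfwrite",          # write a new PDF
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",   # quality/size preset
        "-dNOPAUSE",                  # do not prompt between pages
        "-dBATCH",                    # exit when the input is processed
        "-dQUIET",                    # suppress routine messages
        f"-sOutputFile={dst_pdf}",
        str(src_pdf),
    ]
```

The returned list can be passed to subprocess.run(). Starting with the ebook preset and comparing the result against printer on a few representative folders is one way to look for the balance Nathan describes.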
Re: [CODE4LIB] OCR To ALTO without ABBYY
You might take a look at Tesseract [1]. On a typical Linux box: $ tesseract input.tif outputName hocr renders html with some coordinate information. You might be able to process from that output to ALTO. Cheers, Bridger [1] http://code.google.com/p/tesseract-ocr/ On Thu, Sep 6, 2012 at 8:29 AM, Michael Beccaria wrote: > I inadvertently purchased ABBYY Finereader 11 Corporate thinking that it > would be capable of outputting to ALTO XML. I was wrong. ABBYY Finereader > Engine does. :/ > > Ultimately, I want to OCR some newspaper images and export them to ALTO > XML and, until the proof of concept is done, I want to try to do it on the > cheap. My plan this morning was to write some scripts to OCR them using > Microsoft Office Document Imaging (MODI) and then export the results to > ALTO XML which could be a big project. Has anyone done this before or know > of a quick and dirty way to get some OCR data? > Thanks, > Mike Beccaria > Systems Librarian > Paul Smith's College > 518.327.6376 >
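As a rough idea of what processing hOCR toward ALTO could involve: hOCR records each word's position in a title attribute of the form "bbox x0 y0 x1 y1", while an ALTO String element carries CONTENT, HPOS, VPOS, WIDTH and HEIGHT. The Python sketch below converts one to the other. It is only a starting point under stated assumptions: the input must parse as well-formed XML, the sample markup is made up, the function name is hypothetical, and a valid ALTO file also needs the surrounding Layout, Page, TextBlock and TextLine structure, which is not built here.

```python
import re
import xml.etree.ElementTree as ET

# Matches the bounding box inside an hOCR title attribute.
BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

# Made-up hOCR fragment for illustration.
SAMPLE_HOCR = """<html><body>
<span class='ocr_line' title='bbox 36 92 580 122'>
<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
<span class='ocrx_word' title='bbox 109 92 204 116; x_wconf 93'><strong>world</strong></span>
</span></body></html>"""


def hocr_words_to_alto_strings(hocr_text):
    """Return one ALTO-style String element per hOCR word."""
    root = ET.fromstring(hocr_text)
    strings = []
    # Checking the class attribute on every element avoids depending on
    # whether the hOCR file declares the XHTML namespace.
    for element in root.iter():
        if element.get("class") != "ocrx_word":
            continue
        match = BBOX.search(element.get("title", ""))
        if match is None:
            continue
        x0, y0, x1, y1 = map(int, match.groups())
        strings.append(ET.Element("String", {
            "CONTENT": "".join(element.itertext()).strip(),
            "HPOS": str(x0),            # left edge
            "VPOS": str(y0),            # top edge
            "WIDTH": str(x1 - x0),
            "HEIGHT": str(y1 - y0),
        }))
    return strings
```

Each returned element can be serialized with ET.tostring() and placed inside a TextLine. The coordinates are passed through in the same units hOCR reports, so the ALTO measurement unit would need to be declared to match.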
Re: [CODE4LIB] Transport options from Charlotte to Asheville for c4l2010
Hello - an FYI: if you're planning on flying into Knoxville and making a drive to Asheville via I-40, be advised that there has been a rather large rockslide and 40 is closed. Here's a link to the official update -- http://www.ncdot.org/traffictravel/. Safe travels to all. cheers, Bridger -- Bridger Dyson-Smith Digital Library Initiatives University of Tennessee Libraries On Thu, Nov 12, 2009 at 8:29 AM, John Fereira wrote: > Ross Singer wrote: > >> Likewise, Knoxville is also ~1.5 hours from Asheville. Between >> Greenville, Charlotte and Knoxville you might be able to catch a >> special deal. >> >> > A bit closer is the Tri-Cities airport (Johnson City, Bristol, Kingsport). > I've flown in there a couple of times when my in-laws lived in Johnson > City. It's a real nice drive, about an hour and 20 minutes, from there to > Asheville. > > -- > John Fereira > Cornell University > Twitter: @john_fereira > Google Wave: fere...@googlewave.com >
Re: [CODE4LIB] OCR PDFs
If you haven't already, take a look at tesseract (http://code.google.com/p/tesseract-ocr/). There's some discussion of using tesseract and shell scripting to work with tiffs to pdfs to ocr'd text, which isn't exactly what you're wanting to do, I know, but may prove helpful (http://www.groklaw.net/articlebasic.php?story=20061210115516438). Cheers! Bridger Dyson-Smith On Fri, Oct 17, 2008 at 8:28 AM, Terry Harrison <[EMAIL PROTECTED]> wrote: > You might want to look at ABBYY Fine Reader 9.0 Professional, which can be > driven from the command line. Fine Reader is used at the Library of > Congress. Here is an info link to get you started (search "command"): > > > http://www.scanstore.com/Scanning/Document_Imaging/Software/OCR_Software/Nuance/omnipage_review.asp > > Regards, > Terry > > > Terry Harrison > Project Manager > CACI > 5505 Robin Hood Road, Suite F > Norfolk, Va. 23508 > Ph: 757.321.9120 x232 > Fax: 757.321.8797 > [EMAIL PROTECTED] >
[CODE4LIB] Job Posting: IT Administrator, Digital Library Initiatives
and Diversity. --- Bridger Dyson-Smith Research Assistant Professor Digital Library Initiatives University of Tennessee Libraries John C. Hodges Library 865-974-0012 [EMAIL PROTECTED]