[CODE4LIB] Fedora Workshop in Knoxville - Registration Open

2015-05-12 Thread Bridger Dyson-Smith
Hi all,

I'm pleased to announce that, thanks to the efforts of my colleague
Christina Harlow, the University of Tennessee Libraries will be hosting a
Fedora Workshop on Friday, June 26th. Please see the following link [1] to
learn more and register.

Space is limited! Register soon!

Please feel free to send us/me/her questions.

Warmly,
Bridger
--
Bridger Dyson-Smith
Digital Initiatives
University of Tennessee Libraries

"Trust the Computer. The Computer is Your Friend."

[1] http://cmh2166.github.io/Fedora4Knox/


Re: [CODE4LIB] Drupal code club

2015-03-05 Thread Bridger Dyson-Smith
On Wed, Mar 4, 2015 at 6:31 AM, Brown, Bryan  wrote:

> With the overwhelming response to the recent post about a Python/PyMARC
> code club on the Code4Lib list, I was wondering if anyone would be
> interested in something similar for Drupal. As a Drupal user and a wannabe
> module developer, I'm finding that reading other people's modules
> (especially core) is one of the best ways to really understand what's going
> on under the hood. Much like Python, Drupal has its own unique idioms and
> ways of looking at problems, and reading modules helps one get into this
> mindset.
>
> If anyone else is interested in this, please let me know either through a
> reply to this thread or a direct email (bjbr...@fsu.edu). If there is
> sufficient interest, I'll try to set up something similar to what the
> Python/PyMARC crowd is doing.
>
> - Bryan Brown
>

+1

Thanks,
Bridger


Re: [CODE4LIB] OAI Crosswalk XSLT

2014-07-11 Thread Bridger Dyson-Smith
And just because I'm drinking too much coffee...

If you're using an XSLT 2.0 processor for this, you can do some things with
variables that might make things a little easier; e.g.





You can use those variables in a concat() function and you won't have to
deal with wonky spacing in your output. There are almost certainly a bunch
of much better ways to do this - I'll never be anything better than an XSLT
apprentice - but it might be a good starting point.

See the second  for the difference in output.
Cheers.

--> cat sherman.xml


1
Quarterly
Review of Economics and Finance
47


--> cat sherman.xsl

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">







 Vol.

Issue 







--> saxon -s:./sherman.xml -xsl:./sherman.xsl

Quarterly
Review of Economics and Finance Vol. 47 Issue 1
Quarterly Review of Economics and Finance Vol. 47 Issue
1
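The variables-plus-concat() trick boils down to cleaning each value once and gluing the literals in exactly once. Here's a minimal Python sketch of that idea, assuming the journal, volume, and issue strings have already been pulled out of the record and that the variables trim them the way XPath's normalize-space() does (normalize_space and citation are made-up helper names, not anything from the stylesheet):

```python
import re

# XPath's normalize-space() only treats space, tab, CR, and LF as whitespace.
_XPATH_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(value: str) -> str:
    """Collapse whitespace runs to one space and trim both ends, like XPath."""
    return _XPATH_WS.sub(" ", value).strip(" ")


def citation(journal: str, volume: str, issue: str) -> str:
    """Hypothetical helper, roughly concat($journal, ' Vol. ', $volume, ' Issue ', $issue)."""
    return "".join([
        normalize_space(journal), " Vol. ",
        normalize_space(volume), " Issue ",
        normalize_space(issue),
    ])


if __name__ == "__main__":
    raw_journal = "Quarterly\nReview of Economics and Finance"
    # Naive gluing keeps the embedded newline and stray whitespace.
    print(repr(raw_journal + " Vol. " + "47" + " Issue " + " 1\n"))
    # Normalized values come out on one clean line.
    print(citation(raw_journal, "47", " 1\n"))
```

The newline inside the journal title is the same kind of thing that splits "Quarterly" onto its own line in the first line of output above.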


On Fri, Jul 11, 2014 at 1:33 PM, Matthew Sherman 
wrote:

> Thanks, that is very helpful.

Re: [CODE4LIB] OAI Crosswalk XSLT

2014-07-11 Thread Bridger Dyson-Smith
Hi Matthew,

That looks good to me. The only thing I might suggest -- depending on your
needs -- is to add <xsl:text> around your literals; e.g.

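<xsl:text> Vol. </xsl:text>
<xsl:value-of select="dcvalue[@qualifier='volume']"/>
<xsl:text> Issue </xsl:text>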

If the processor you are using does something weird with white space,
you'll avoid it by having the white space in a text element. You may need a
more precise XPath, depending on the context of your template, but the
initial statement didn't look too bad.

Hope that helps.
Best,
Bridger
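To see why the XPath might need to be more precise depending on the template's context, here's a small Python sketch using ElementTree, whose limited XPath support includes [@attr='value'] predicates. The record is hypothetical: only qualifier="volume" comes from this thread; the dublin_core wrapper and the element="identifier" / qualifier="issue" values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical DSpace-style record; only qualifier="volume" comes from the thread.
RECORD = """<dublin_core>
  <dcvalue element="identifier" qualifier="issue" language="">1</dcvalue>
  <dcvalue element="identifier" qualifier="volume" language="">47</dcvalue>
</dublin_core>"""


def texts(context, path):
    """Text of every element matched by a path evaluated relative to context."""
    return [node.text for node in context.findall(path)]


root = ET.fromstring(RECORD)

# Evaluated from <dublin_core>, the short relative path matches.
from_record = texts(root, "dcvalue[@qualifier='volume']")

# Evaluated from a single <dcvalue> (say, inside a template matching dcvalue),
# the same path looks for dcvalue *children* and finds nothing.
from_field = texts(root.find("dcvalue"), "dcvalue[@qualifier='volume']")

# Chained predicates narrow the match when qualifiers are reused across elements.
precise = texts(root, "dcvalue[@element='identifier'][@qualifier='volume']")

print(from_record, from_field, precise)
```

XSLT resolves a select against the current node in the same way, so a path that works in a template matching the whole record can quietly return nothing in a template matching a single field.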


On Fri, Jul 11, 2014 at 11:24 AM, Matthew Sherman 
wrote:

> Given the DSpace Dublin Core formatting I would like to be able to take
> this:
>
>  language="">1
>  Quarterly
> Review of Economics and Finance
> 47
>
> And, during an OAI harvest, turn it into:
>
> Quarterly Review of Economics and
> Finance Vol. 47 Issue 1
>
> I am thinking I can just add
>
>  Vol. 
> Issue 
>
> in the identifier section of the crosswalk, but I am not 100% sure.  I am
> also not sure whether I will need to use the excessively complex XPath to
> reference my source values.  Can anyone tell me if I am on the right track?
>
>
> On Fri, Jul 11, 2014 at 11:13 AM, Matthew Sherman <
> matt.r.sher...@gmail.com>
> wrote:
>
> > OK, that makes sense. While I knew of OAI-PMH, this is my first time
> > really getting my hands dirty with it, so I wasn't sure if this
> > exceptionally detailed formatting was a function of the OAI protocol or
> > a function of DSpace. I also extracted a metadata record from DSpace to
> > see how they are formatting it, and this is what I found for the type
> > field:
> >
> > Poster
> >
> >
> > On Fri, Jul 11, 2014 at 11:08 AM, Dunn, Katie  wrote:
> >
> >> Matt said: "I guess it is the 'doc:element/doc:element/doc:field' thing
> >> that is mostly what is throwing me."
> >>
> >> More DSpacey people than I can probably comment more knowledgeably on
> >> this, but this seems like less of an OAI-PMH thing than a DSpace thing.
> >> It looks like maybe DSpace stores metadata internally in a generic
> >> metadata/element/field structure like Bridger showed (with doc
> >> namespace):
> >>
> >> 
> >>  
> >> 
> >>  
> >> 
> >> 
> >> 
> >>  
> >>
> >> ...and the select is pulling the information it needs for the <dc:type/>
> >> element in the OAI-PMH output out of the internal DSpace structure.
> >>
> >> Katie


Re: [CODE4LIB] OAI Crosswalk XSLT

2014-07-11 Thread Bridger Dyson-Smith
Hi Matt,

Michael Kay's XSLT 2.0 and XPath 2.0 Programmer's Reference is a great
resource and is available as an eBook. Mulberry Technologies has some quick
reference guides [1] that might be helpful.

Cheers,
Bridger


 

 



 

[1] http://www.mulberrytech.com/quickref/



On Fri, Jul 11, 2014 at 10:38 AM, Matthew Sherman 
wrote:

> Hi Code4Lib folks,
>
> I have a question for those of you who have worked with OAI-PMH.  I am
> currently editing our DSpace OAI crosswalk to include a few custom metadata
> fields that exist in our repository for publication information and port
> them into a more standard format.  The problem I am running into is that the
> select statements they use are not the typical XPath statements I am used
> to.  For example:
>
> 
> select="doc:metadata/doc:element[@name='dc']/doc:element[@name='type']/doc:element/doc:element/doc:field[@name='value']">
> 
> 
>
> I know what the "." does, but the other select statement is a bit foreign
> to me.  So my question is, does anyone know of some reference material that
> can help me make sense of this select?  I need to understand what it is
> doing so I can make my own.  Thanks for any insight you can provide.
>
> Matt Sherman
>
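One way to read that long select is to run it against a record shaped like DSpace's internal doc:metadata/doc:element/doc:field structure. Below is a minimal Python/ElementTree sketch. The record is made up, and both the namespace URI bound to doc and the names of the two unpredicated levels (none for the qualifier, en for the language) are assumptions, so check the xmlns:doc declaration in your own crosswalk.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the doc: prefix; copy the real one from your crosswalk.
NS = {"doc": "http://www.lyncode.com/xoai"}

# Hypothetical record: schema "dc" > element "type" > qualifier > language > field.
RECORD = """<metadata xmlns="http://www.lyncode.com/xoai">
  <element name="dc">
    <element name="type">
      <element name="none">
        <element name="en">
          <field name="value">Poster</field>
        </element>
      </element>
    </element>
  </element>
</metadata>"""

# The select from the crosswalk, minus its leading doc:metadata step,
# because findall() below starts *at* the <metadata> element.
SELECT = (
    "doc:element[@name='dc']"        # metadata schema
    "/doc:element[@name='type']"     # element
    "/doc:element"                   # any qualifier (no predicate)
    "/doc:element"                   # any language (no predicate)
    "/doc:field[@name='value']"      # the stored value
)

root = ET.fromstring(RECORD)
types = [field.text for field in root.findall(SELECT, NS)]
print(types)
```

Because the two middle doc:element steps have no [@name=...] predicate, they match whatever qualifier and language a value happens to have. That's why the path looks longer than the data it returns.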


Re: [CODE4LIB] extracting tiff info

2012-11-19 Thread Bridger Dyson-Smith
Hi Kyle,
+1 for ExifTool, but as Nick mentioned, it depends on what information
you want to extract.

Best,
Bridger

PS: this has come in very handy for us:

    exiftool -a -G1 -s image-name.tif > image-exif.txt

HTH.
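If you end up with one such text dump per image, here's a rough Python sketch for merging them into a single CSV spreadsheet. It assumes each line looks like "[Group]   TagName   : value", which is the layout -G1 -s is meant to produce, but check a sample of your own output first. parse_exiftool and to_csv are hypothetical helpers, not part of ExifTool, and the sample lines are invented. ExifTool also has a -csv output option that may make this unnecessary.

```python
import csv
import io
import re

# Assumed line shape for `exiftool -a -G1 -s` output:
#   [Group]         TagName                         : value
LINE_RE = re.compile(r"^\[(?P<group>[^\]]*)\]\s+(?P<tag>[^\s:]+)\s*:\s?(?P<value>.*)$")


def parse_exiftool(text):
    """Map 'Group:Tag' to value for one image's text dump."""
    record = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unexpected lines
        key = f"{match['group']}:{match['tag']}"
        # -a allows duplicate tags; keep the first value seen.
        record.setdefault(key, match["value"])
    return record


def to_csv(records):
    """One CSV row per image; columns are the union of all tag keys."""
    fieldnames = sorted({key for record in records for key in record})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Invented sample lines for illustration.
SAMPLE = """\
[System]        FileName                        : image-name.tif
[IFD0]          ImageWidth                      : 6000
[XMP-dc]        Title                           : Mill on the river: 1910
"""

if __name__ == "__main__":
    print(to_csv([parse_exiftool(SAMPLE)]))
```

The regex splits on the first colon after the tag name, so values that contain colons (like the title above) stay intact.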


On Mon, Nov 19, 2012 at 4:31 PM, Kyle Banerjee wrote:

> Howdy all,
>
> I need to extract all the metadata from a few thousand images on a network
> drive and put it into a spreadsheet. Since the files are huge (each is
> 100MB+) and my connection isn't that fast, I strongly prefer to not move
> them before working on them -- i.e. I'm using cygwin and/or windows.
>
> Just eyeballing these things, I see the headers contain everything I need
> in purty rdf. What's the best way to extract this? I thought tiffinfo would
> do the trick, but it's just giving me technical info. Of course I can just
> parse the files with perl but I'm thinking there just has to be a slicker
> way to do this. What's my best option? Thanks,
>
> kyle
>


Re: [CODE4LIB] PDF Compression

2012-10-24 Thread Bridger Dyson-Smith
Have you tried ghostscript? It should be available for any *nix-like OS or
Windows [1].

Cheers,
Bridger
[1] http://www.ghostscript.com/download/
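For batch work, Ghostscript's pdfwrite device can rewrite a PDF with downsampled images. The Python sketch below only assembles and runs a gs command line. The -dPDFSETTINGS presets are real Ghostscript options, but you'd need to test which one gives the right quality/size balance for your scans, and it's worth confirming that the OCR text layer survives. gs_compress_args and compress are hypothetical helper names. On Windows the executable is usually gswin32c or gswin64c rather than gs.

```python
import subprocess

# Ghostscript's built-in distiller presets, roughly smallest to largest output.
PRESETS = {"/screen", "/ebook", "/printer", "/prepress", "/default"}


def gs_compress_args(src, dst, preset="/ebook", gs="gs"):
    """Build a Ghostscript pdfwrite command that rewrites src into dst."""
    if preset not in PRESETS:
        raise ValueError(f"unknown PDFSETTINGS preset: {preset}")
    return [
        gs,                          # e.g. "gswin64c" on 64-bit Windows
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",   # lower presets downsample images harder
        "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


def compress(src, dst, preset="/ebook", gs="gs"):
    """Run Ghostscript; raises CalledProcessError if gs exits non-zero."""
    subprocess.run(gs_compress_args(src, dst, preset, gs), check=True)
```

Something like compress("folder-042.pdf", "folder-042-small.pdf", preset="/ebook") on a few representative folders would show whether the savings are worth it before running a whole collection.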

On Wed, Oct 24, 2012 at 10:59 AM, Paul Butler (pbutler3)
wrote:

> Have you looked into IrfanView's [www.irfanview.com] batch conversion
> settings and plugins?  Might be something there that is useful.
> Cheers, Paul
> +-+-+-+-+-+-+-+-+-+-+-+-+
> Paul R Butler
> Assistant Systems Librarian
> Simpson Library
> University of Mary Washington
> 1801 College Avenue
> Fredericksburg, VA 22401
> 540.654.1756
> libraries.umw.edu
>
> Sent from the mighty Dell Vostro 230.
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Nathan Tallman
> Sent: Wednesday, October 24, 2012 10:29 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] PDF Compression
>
> Can anyone recommend some good PDF compression software? Preferably
> open-source or low-cost. We're scanning archival collections and the PDFs
> can be quite large for a single folder. The folder may be thick or thin,
> and contain a mix of text and images. We've fiddled with various Acrobat
> settings for getting the file size down, but we haven't found a good
> balance between quality and file size. (Plus, these need to be OCR'ed; so
> far we've been doing that in Acrobat.)
>
> We were looking at LuraTech PDF Compressor, but the cost for an enterprise
> license is pretty high. It did do an excellent job though.
>
> Thanks,
> Nathan
>


Re: [CODE4LIB] OCR To ALTO without ABBYY

2012-09-06 Thread Bridger Dyson-Smith
You might take a look at Tesseract [1]. On a typical Linux box:

$ tesseract input.tif outputName hocr

renders hOCR, an HTML format that carries bounding-box coordinates for the
recognized text. You might be able to transform that output into ALTO.

Cheers,
Bridger
[1] http://code.google.com/p/tesseract-ocr/
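Going from hOCR to ALTO is mostly a matter of reading each word's bounding box out of the HTML. The Python sketch below assumes words are marked with class="ocrx_word" and a title attribute containing "bbox x0 y0 x1 y1" (the hOCR convention). It emits a bare-bones, un-namespaced ALTO-like skeleton that puts every word in a single TextLine, so it would still need real line/block structure and the ALTO namespace before it validated. HocrWordParser, hocr_words, and words_to_alto are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

BBOX_RE = re.compile(r"\bbbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word element currently open
        self._depth = 0    # tag nesting depth inside that word element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1  # e.g. <strong> inside a word
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if "ocrx_word" in classes and match:
            self._bbox = tuple(int(n) for n in match.groups())
            self._depth = 1
            self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, *self._bbox))
            self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def hocr_words(hocr):
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


def words_to_alto(words):
    """Minimal ALTO-like tree: one String per word with HPOS/VPOS/WIDTH/HEIGHT."""
    alto = ET.Element("alto")
    page = ET.SubElement(ET.SubElement(alto, "Layout"), "Page")
    block = ET.SubElement(ET.SubElement(page, "PrintSpace"), "TextBlock")
    line = ET.SubElement(block, "TextLine")
    for text, x0, y0, x1, y1 in words:
        ET.SubElement(line, "String", CONTENT=text, HPOS=str(x0), VPOS=str(y0),
                      WIDTH=str(x1 - x0), HEIGHT=str(y1 - y0))
    return ET.tostring(alto, encoding="unicode")
```

Usage would be along the lines of words_to_alto(hocr_words(open(path).read())), where path is whatever file Tesseract wrote for outputName (the extension depends on the Tesseract version).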


On Thu, Sep 6, 2012 at 8:29 AM, Michael Beccaria
wrote:

> I inadvertently purchased ABBYY FineReader 11 Corporate thinking that it
> would be capable of outputting to ALTO XML. I was wrong. ABBYY FineReader
> Engine does. :/
>
> Ultimately, I want to OCR some newspaper images and export them to ALTO
> XML and, until the proof of concept is done, I want to try to do it on the
> cheap. My plan this morning was to write some scripts to OCR them using
> Microsoft Office Document Imaging (MODI) and then export the results to
> ALTO XML which could be a big project. Has anyone done this before or know
> of a quick and dirty way to get some OCR data?
> Thanks,
> Mike Beccaria
> Systems Librarian
> Paul Smith's College
> 518.327.6376
>


Re: [CODE4LIB] Transport options from Charlotte to Asheville for c4l2010

2009-11-12 Thread Bridger Dyson-Smith
Hello -
an FYI: if you're planning on flying into Knoxville and making a drive to
Asheville via I-40, be advised that there has been a rather large rockslide
and 40 is closed. Here's a link to the official update --
http://www.ncdot.org/traffictravel/.

Safe travels to all.
cheers,
Bridger

--
Bridger Dyson-Smith
Digital Library Initiatives
University of Tennessee Libraries

On Thu, Nov 12, 2009 at 8:29 AM, John Fereira  wrote:

> Ross Singer wrote:
>
>> Likewise, Knoxville is also ~1.5 hours from Asheville.  Between
>> Greenville, Charlotte and Knoxville you might be able to catch a
>> special deal.
>>
>>
> A bit closer is the Tri-Cities airport (Johnson City, Bristol, Kingsport).
>  I've flown in there a couple of times when my in-laws lived in Johnson
> City.  It's a real nice drive, about an hour and 20 minutes, from there to
> Asheville.
>
> --
> John Fereira
> Cornell University
> Twitter: @john_fereira
> Google Wave: fere...@googlewave.com
>


Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread Bridger Dyson-Smith
If you haven't already, take a look at tesseract
(http://code.google.com/p/tesseract-ocr/). There's some discussion of using
tesseract and shell scripting to go from TIFFs to PDFs to OCR'd text, which
isn't exactly what you're wanting to do, I know, but may prove helpful
(http://www.groklaw.net/articlebasic.php?story=20061210115516438).
Cheers!
Bridger Dyson-Smith
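For the TIFF-to-text part of that kind of workflow, a batch run is mostly a loop over files. Here's a minimal Python sketch, assuming the tesseract executable is on your PATH and that its basic `tesseract image outbase` form (which writes outbase.txt) is what you want. tesseract_commands and run_all are hypothetical helper names, and the PDF-assembly step from the linked discussion isn't covered.

```python
import subprocess
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def tesseract_commands(folder, exe="tesseract"):
    """Build one `tesseract <image> <outbase>` command per TIFF in folder."""
    images = sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in TIFF_SUFFIXES
    )
    # Tesseract appends .txt to the output base, so page1.tif -> page1.txt.
    return [[exe, str(path), str(path.with_suffix(""))] for path in images]


def run_all(folder, exe="tesseract"):
    """Run each command in turn, stopping at the first failure."""
    for command in tesseract_commands(folder, exe):
        subprocess.run(command, check=True)
```

Building the command lists separately from running them makes it easy to print them first and check that the right files will be processed.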


On Fri, Oct 17, 2008 at 8:28 AM, Terry Harrison <[EMAIL PROTECTED]> wrote:

> You might want to look at ABBYY FineReader 9.0 Professional, which can be
> driven from the command line. FineReader is used at the Library of
> Congress. Here is an info link to get you started (search "command"):
>
>
> http://www.scanstore.com/Scanning/Document_Imaging/Software/OCR_Software/Nuance/omnipage_review.asp
>
> Regards,
> Terry
>
> 
> Terry Harrison
> Project Manager
> CACI
> 5505 Robin Hood Road, Suite F
> Norfolk, Va. 23508
> Ph: 757.321.9120 x232
> Fax: 757.321.8797
> [EMAIL PROTECTED]
>


[CODE4LIB] Job Posting: IT Administrator, Digital Library Initiatives

2008-09-17 Thread Bridger Dyson-Smith
 and Diversity.

---
Bridger Dyson-Smith
Research Assistant Professor
Digital Library Initiatives
University of Tennessee Libraries
John C. Hodges Library
865-974-0012
[EMAIL PROTECTED]