Re: [CODE4LIB] bagit java version

2016-03-23 Thread Terry Brady
John,

Thank you for the TAR library recommendations.

I think the example folder would be very helpful.

Thanks,

Terry

On Wed, Mar 23, 2016 at 9:24 AM, Scancella, John wrote:

> Terry,
>
> Thanks for responding! There are libraries that already handle creating
> TAR (and other archive format) files much better than bagit-java ever could,
> simply because they have more resources for maintenance and new features.
> After a quick search, here are two that you could use:
> http://commons.apache.org/proper/commons-compress/tar.html
> https://github.com/kamranzafar/jtar
>
> Would it help to create an examples folder and show how you would create a
> tar yourself when using bagit-java?
>
> Thanks
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Terry Brady
> Sent: Wednesday, March 23, 2016 12:09 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] bagit java version
>
> John,
>
> I am glad to hear that the bagit library will be enhanced.
>
> At Georgetown, I have integrated the bagit java library into our
> FileAnalyzer application.  We use this application for a number of
> digitization related tasks.
>
>
> https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation
>
> Our real use case is to prepare bags for the APTrust preservation
> repository.
>
>
> https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation-for-Academic-Preservation-Trust-(APTrust)
>
> APTrust requires bags to be packaged as TAR files.  This code creates some
> APTrust tag files and then packages the bag as a tar file.  It would be
> useful to have a tar option built into the bagit library.
>
> Since we have a number of applications that are already in Java, we will
> continue to work with the Java version of the library.
>
> Terry
>
> On Wed, Mar 23, 2016 at 6:36 AM, Scancella, John wrote:
>
> > Hi All,
> >
> > I am currently rewriting the library so that it better conforms to the
> > spec, and to make it easier to extend and use.  I was wondering if
> > people would be so kind as to provide any feedback on:
> > * How they currently use the bagit-java library
> > * Do you use the command line?
> > * If so what is stopping you from using the python version instead?
> > * Do you use the library in a java application?
> > * If so what functionality do you use? What do you wish it did?
> > * Trying out the latest version (you can see examples here
> > https://github.com/LibraryOfCongress/bagit-java/blob/master/README.md#examples
> > on how to use it)
> >
> > Thanks
> >
> > John
> >
> > Please note, all opinions expressed in this email are my own.
> >
>
>
>
> --
> Terry Brady
> Applications Programmer Analyst
> Georgetown University Library Information Technology
> https://www.library.georgetown.edu/lit/code
> 425-298-5498 (Seattle, WA)
>



-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)


Re: [CODE4LIB] bagit java version

2016-03-23 Thread Terry Brady
John,

I am glad to hear that the bagit library will be enhanced.

At Georgetown, I have integrated the bagit java library into our
FileAnalyzer application.  We use this application for a number of
digitization related tasks.

https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation

Our real use case is to prepare bags for the APTrust preservation
repository.

https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation-for-Academic-Preservation-Trust-(APTrust)

APTrust requires bags to be packaged as TAR files.  This code creates some
APTrust tag files and then packages the bag as a tar file.  It would be
useful to have a tar option built into the bagit library.
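
As a rough illustration of the packaging step, here is a minimal sketch in
Python (not bagit-java, and not the File Analyzer code) that wraps an
already-built bag directory in an uncompressed TAR file using the standard
tarfile module. The function name tar_bag and the bagit.txt check are
assumptions made for the example; APTrust's own naming and tag file rules are
not covered.

```python
import tarfile
from pathlib import Path


def tar_bag(bag_dir, tar_path):
    """Package an existing bag directory as an uncompressed TAR file.

    Returns the list of member names written, for logging or checking.
    """
    bag = Path(bag_dir)
    # Assumption: bag_dir already holds a complete bag (bagit.txt, data/, manifests).
    if not (bag / "bagit.txt").is_file():
        raise ValueError(f"{bag} does not look like a bag: bagit.txt is missing")
    with tarfile.open(tar_path, "w") as tar:
        # Keep the bag's directory name as the single top-level entry.
        tar.add(bag, arcname=bag.name)
    # Re-open the archive read-only to report what was stored.
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

Write the tar file outside the bag directory so that it does not end up
inside its own archive.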

Since we have a number of applications that are already in Java, we will
continue to work with the Java version of the library.

Terry

On Wed, Mar 23, 2016 at 6:36 AM, Scancella, John wrote:

> Hi All,
>
> I am currently rewriting the library so that it better conforms to the
> spec, and to make it easier to extend and use.  I was wondering if people
> would be so kind as to provide any feedback on:
> * How they currently use the bagit-java library
> * Do you use the command line?
> * If so what is stopping you from using the python version instead?
> * Do you use the library in a java application?
> * If so what functionality do you use? What do you wish it did?
> * Trying out the latest version (you can see examples here
> https://github.com/LibraryOfCongress/bagit-java/blob/master/README.md#examples
> on how to use it)
>
> Thanks
>
> John
>
> Please note, all opinions expressed in this email are my own.
>



-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)


Re: [CODE4LIB] Software to track scheduled processes

2015-10-05 Thread Terry Brady
We are running Sierra.

On Mon, Oct 5, 2015 at 9:44 AM, Cornel Darden Jr. wrote:

> Hello,
>
> What ILS are you currently using?
>
> Thanks,
>
>
> Sent from my iPhone
>
> > On Oct 5, 2015, at 11:19 AM, Terry Brady wrote:
> >
> > Our ILS (library system) has a mechanism to run tasks on a schedule.
> > Unfortunately, it is very difficult to track the success & failure of
> > these scheduled tasks.  We also have some cron-initiated processes that
> > perform additional processing on the files generated by the ILS scheduler.
> >
> > We would like to acquire or develop a simple reporting system to track
> > the execution of these processes.  I would envision that this system
> > would allow scheduled tasks to post the following information at job
> > start up and at job completion.
> >
> >   - Task name
> >   - Task Step (Start, Export, Import, FTP, Complete)
> >   - Date/Time
> >   - Status: Success/Failure
> >   - Data relevant to the process (number of items processed)
> >
> > Ideally, we would like to have a centralized view of the status of these
> > tasks in a spreadsheet-like view.  We do not need the system to initiate
> > tasks or to attempt to restart tasks.
> >
> > Before we consider building a simple system to perform this function, are
> > you aware of any existing software or open source projects that might
> > accomplish this goal?
> >
> > Thanks, Terry
> >
> > --
> > Terry Brady
> > Applications Programmer Analyst
> > Georgetown University Library Information Technology
> > https://www.library.georgetown.edu/lit/code
> > 425-298-5498 (Seattle, WA)
>



-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)


[CODE4LIB] Software to track scheduled processes

2015-10-05 Thread Terry Brady
Our ILS (library system) has a mechanism to run tasks on a schedule.
Unfortunately, it is very difficult to track the success & failure of these
scheduled tasks.  We also have some cron-initiated processes that perform
additional processing on the files generated by the ILS scheduler.

We would like to acquire or develop a simple reporting system to track the
execution of these processes.  I would envision that this system would
allow scheduled tasks to post the following information at job start up and
at job completion.

   - Task name
   - Task Step (Start, Export, Import, FTP, Complete)
   - Date/Time
   - Status: Success/Failure
   - Data relevant to the process (number of items processed)

Ideally, we would like to have a centralized view of the status of these
tasks in a spreadsheet-like view.  We do not need the system to initiate
tasks or to attempt to restart tasks.
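
To make the idea concrete, here is a minimal sketch in Python of what such a
reporting log could look like, using the standard sqlite3 module. The table
name job_events, the column names, and the helper functions are all
hypothetical; they only mirror the fields listed above and are not an
existing product or project.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema mirroring the fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_events (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    task      TEXT NOT NULL,
    step      TEXT NOT NULL,
    logged_at TEXT NOT NULL,
    status    TEXT NOT NULL,
    detail    TEXT
)
"""


def open_log(path=":memory:"):
    """Open (or create) the event log database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def post_event(conn, task, step, status, detail=""):
    """Record one step of a scheduled task with a UTC timestamp."""
    conn.execute(
        "INSERT INTO job_events (task, step, logged_at, status, detail) "
        "VALUES (?, ?, ?, ?, ?)",
        (task, step, datetime.now(timezone.utc).isoformat(), status, detail),
    )
    conn.commit()


def latest_status(conn):
    """Return one row per task (its most recent event) for a table-like view."""
    return conn.execute(
        """
        SELECT task, step, logged_at, status, detail
        FROM job_events
        WHERE id IN (SELECT MAX(id) FROM job_events GROUP BY task)
        ORDER BY task
        """
    ).fetchall()
```

A cron-initiated script would call post_event at start up and at completion;
latest_status gives the centralized view, and nothing here initiates or
restarts tasks.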

Before we consider building a simple system to perform this function, are
you aware of any existing software or open source projects that might
accomplish this goal?

Thanks, Terry

-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)


Re: [CODE4LIB] 'automation' tools

2014-07-07 Thread Terry Brady
I learned about Open Refine <http://openrefine.org/> at the Code4Lib
conference, and it looks like it would be a great tool for normalizing
data.  I worked on a few projects in the past in which this would have been
very helpful.

Bohyun Kim wrote a great article about how to query Google Spreadsheet data
from a web page: http://www.bohyunkim.net/blog/archives/2831.  I have found
this approach very helpful for parsing Google Form data.

I have created an application that has been very useful for our library:
http://georgetown-university-libraries.github.io/File-Analyzer/.  We use
this application within our digitization and ingest workflows.  We have
written some custom code to convert files exported from our ILS.  If you
check out the wiki pages
<https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki>,
you might find some tasks that would be useful to recommend.

Good luck with your workshop.

Terry


On Fri, Jul 4, 2014 at 1:51 AM, Owen Stephens wrote:

> I'm doing a workshop in the UK at a library tech unconference-style event
> (Pi and Mash http://piandmash.info) on automating computer based tasks.
> I want to cover tools that are usable by non-programmers and that would
> work in a typical library environment. The types of tools I'm thinking of
> are:
>
> MacroExpress
> AutoHotKey
> iMacros for Firefox
>
> While I'm hoping workshop attendees will bring ideas about tasks they
> would like to automate, the types of things I have in mind are:
>
> Filling out a set of standard data on a GUI or Web form (e.g. standard set
> of budget codes for an order)
> Processing a list of item barcodes from a spreadsheet and doing something
> with them on the library system (e.g. change loan status, check for holds)
> Similarly for User IDs
> Navigating to a web page and doing some task
>
> Clearly some of these tasks would be better automated with appropriate
> APIs and scripts, but I want to try to introduce those without programming
> skills to some of the concepts and tools and essentially how they can work
> around problems themselves to some extent.
>
> What tools do you use for this kind of automation task, and what kind of
> tasks do they best deal with?
>
> Thanks,
>
> Owen
>
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: o...@ostephens.com
> Telephone: 0121 288 6936
>



-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498


Re: [CODE4LIB] Need a Win7 checksum creator and verifer for files in a folder

2014-06-27 Thread Terry Brady
Georgetown Libraries has released an updated and improved version of the
File Analyzer application.

http://georgetown-university-libraries.github.io/File-Analyzer/

Information about the checksum tool is available at
https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Core-File-Test-Rules#sort-by-checksum
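
For readers who want to see the underlying idea, here is a minimal
command-line sketch in Python (it is not the File Analyzer's implementation
and has no GUI) that writes a SHA-256 checksum file for a folder and verifies
it later. The function names and the "checksum, two spaces, relative path"
line layout are assumptions chosen for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_path):
    """Write one 'checksum  relative/path' line per file; return the file count."""
    folder = Path(folder)
    lines = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        lines.append(f"{sha256_of(path)}  {path.relative_to(folder).as_posix()}")
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(folder, manifest_path):
    """Return the relative paths that are missing or whose checksum changed."""
    folder = Path(folder)
    problems = []
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = folder / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

Keep the manifest file outside the folder being hashed, otherwise it would be
listed in itself.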

Terry


On Fri, Jun 27, 2014 at 5:38 PM, Levy, Michael wrote:

> You might look at NARA File Analyzer
> http://www.archives.gov/social-media/github.html
> Some docs here:
> http://www.mnhs.org/preserve/records/docs_pdfs/NARAFileAnalyzer.pdf
>
> On Fri, Jun 27, 2014 at 5:21 PM, Nathan Tallman wrote:
>
> > Perhaps Fixity from AV Preserve <
> > http://www.avpreserve.com/avpsresources/tools/>?
> >
> > Nathan
> >
> >
> > On Fri, Jun 27, 2014 at 4:43 PM, Kari R Smith wrote:
> >
> > > Looking for a free, Windows 7 sha-256 checksum creator and verifier
> > > tool that runs as a GUI and will produce a file of the checksums (that
> > > can then be later verified.)
> > >
> > > Thoughts?
> > >
> > > Thank you,
> > > Kari
> > >
> >
>



-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498


Re: [CODE4LIB] Excel to XML

2014-06-13 Thread Terry Brady
The current version of Excel offers a save as XML option.

It will produce something like this.  There is other wrapping metadata, but
the table is pretty easy to parse.

  
   
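  <Table>
   <Row>
    <Cell><Data ss:Type="String">row 1</Data></Cell>
    <Cell><Data ss:Type="String">question 1</Data></Cell>
    <Cell><Data ss:Type="String">answer 1</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">row 2</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">row 3</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 3</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">row 4</Data></Cell>
    <Cell><Data ss:Type="String">question 2</Data></Cell>
    <Cell><Data ss:Type="String">answer 1</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">row 5</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">row 6</Data></Cell>
    <Cell><Data ss:Type="String">quest</Data></Cell>
    <Cell><Data ss:Type="String">answer 3</Data></Cell>
   </Row>
  </Table>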
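
As a sketch of how that table could be parsed, the Python below reads a small
file in Excel's XML Spreadsheet 2003 layout (Table, Row, Cell and Data
elements) and nests each answer under the most recent question. The column
positions (questions in column 2, answers in column 3), the output element
names questions/question/answer, and the trimmed SAMPLE document are
assumptions for illustration; a real export carries more wrapping metadata.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

# Trimmed, hypothetical sample; a real export has more wrapping metadata.
SAMPLE = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1">
  <Table>
   <Row>
    <Cell><Data ss:Type="String">row 1</Data></Cell>
    <Cell><Data ss:Type="String">question 1</Data></Cell>
    <Cell><Data ss:Type="String">answer 1</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">row 2</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">row 4</Data></Cell>
    <Cell><Data ss:Type="String">question 2</Data></Cell>
    <Cell><Data ss:Type="String">answer 1</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>
"""


def row_values(row):
    """Map 1-based column number to cell text, honouring ss:Index gaps."""
    values, column = {}, 0
    for cell in row.findall("ss:Cell", NS):
        index = cell.get(f"{{{SS}}}Index")
        # Empty cells are skipped in the file; ss:Index gives the real column.
        column = int(index) if index else column + 1
        data = cell.find("ss:Data", NS)
        if data is not None and data.text:
            values[column] = data.text.strip()
    return values


def questions_to_xml(spreadsheet_xml):
    """Group answers (column 3) under the latest question (column 2)."""
    root = ET.fromstring(spreadsheet_xml)
    out = ET.Element("questions")
    current = None
    for row in root.iterfind(".//ss:Row", NS):
        values = row_values(row)
        if 2 in values:  # a non-empty question cell starts a new group
            current = ET.SubElement(out, "question", text=values[2])
        if 3 in values and current is not None:
            ET.SubElement(current, "answer").text = values[3]
    return ET.tostring(out, encoding="unicode")
```

Carrying the last seen question forward is what handles the one-to-many
relationship between questions and answers.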


On Fri, Jun 13, 2014 at 2:28 PM, Ryan Engel wrote:

> Hello -
>
> I have an Excel spreadsheet that, for the purposes of an easy import into
> a Drupal site, I'd like to convert to XML.  I know people more
> knowledgeable than I could code up something in Python or Perl to convert a
> CSV version of the data to XML (and I have a colleague who offered to do
> just that for me), but I am looking for recommendations for something more
> immediately accessible.
>
> Here's an idea of how the spreadsheet is structured:
>
> Row1    Question1    Q1Answer1
> Row2                 Q1Answer2
> Row3                 Q1Answer3
> Row4    Question2    Q2Answer1
> Row5                 Q2Answer2
> Row6    Question3    Q3Answer1
> etc.
>
> How do other people approach this?  Import the data to an SQL database,
> write some clever queries, and then export that to XML?  Work some wizardry
> in GoogleRefine/OpenRefine?  Are scripting languages really the best all
> around solution?  Excel's built in XML mapping function wasn't able to
> process the one-to-many relationship of questions to answers, though maybe
> I just don't know how to build the mapping structure correctly.
>
> In the interest of imminent deadlines, I have handed the spreadsheet off to
> my Perl-writing colleague.  But as a professional growth opportunity, I'm
> interested in suggestions from Libraryland about ways others have
> approached this successfully.
>
> Thanks!
>
> Ryan Engel
> Web Stuff
> UW-Madison
>



-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498


Re: [CODE4LIB] Tool to highlight differences in two files

2013-04-23 Thread Terry Brady
WinMerge is a great diff tool, and it is easy to use.
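
For a scripted alternative to a GUI tool, here is a minimal sketch in Python
using the standard difflib module to report which paragraphs differ between
two versions of a page's text. It is not how WinMerge works internally; the
function name and the assumption that paragraphs are separated by blank lines
are made up for the example.

```python
import difflib


def changed_paragraphs(old_text, new_text):
    """Compare two versions paragraph by paragraph (blank-line separated).

    Returns a list of (tag, old_paragraphs, new_paragraphs) tuples, where tag
    is 'replace', 'delete' or 'insert'; unchanged runs are omitted.
    """
    old = [p.strip() for p in old_text.split("\n\n") if p.strip()]
    new = [p.strip() for p in new_text.split("\n\n") if p.strip()]
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes
```

The text of each scraped page would need to be extracted from its HTML
before being passed in.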


On Tue, Apr 23, 2013 at 4:29 PM, Jim DelRosso wrote:

> The one resource that came immediately to mind was Juxta:
> http://www.juxtasoftware.org/juxta-commons/
>
> Thanks!
>
> Jim
>
> *Jim DelRosso, MPA, MSLIS
> Digital Projects Coordinator*
> *Hospitality, Labor, and Management Library*
> Catherwood Library
> ILR School
> Cornell University
> 239D Ives Hall
> Ithaca, NY 14853
> p 607.255.8688
> f 607.255.9641
> e jd...@cornell.edu
> www.ilr.cornell.edu
> *Advancing the World of Work*
>
>
> On Tue, Apr 23, 2013 at 4:24 PM, Wilhelmina Randtke wrote:
>
> > I would like to compare versions of a website scraped at different times
> > to see what paragraphs on a page have changed.  Does anyone here know of a
> > tool for holding two files side by side and noting what is the same and
> > what is different between the files?
> >
> > It seems like any simple script to note differences in two strings of
> > text would work, but I don't know a tool to use.
> >
> > -Wilhelmina Randtke
> >
>



-- 
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053


Re: [CODE4LIB] Image de-duping and file identification

2013-03-19 Thread Terry Brady
Carmen,

The following code may be able to help.

https://github.com/Georgetown-University-Libraries/File-Analyzer

This application can scan a file system and report counts of files by type.

The application can also report on files by checksum.  If you are trying to
find exact file duplicates, the checksum report will identify exact
duplicates found across a file system.
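
The general technique behind a checksum report can be sketched in a few lines
of Python (this is an illustration, not the File Analyzer's code). The
function names are hypothetical, and grouping by file size first is an
optimization added here so that files with a unique size are never hashed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of relative paths whose contents are byte-for-byte equal."""
    root = Path(root)
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        for path in paths:
            by_hash[sha256_file(path)].append(path)

    groups = [
        sorted(p.relative_to(root).as_posix() for p in group)
        for group in by_hash.values()
        if len(group) > 1
    ]
    return sorted(groups)
```

This only finds exact duplicates; resized or re-encoded copies of the same
image have different checksums and would not be grouped.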

I will be presenting an overview of this application during the virtual
lightning talks session on April 3.

If this looks useful to you, I will be glad to give you an overview of the
application.

Terry


On Tue, Mar 19, 2013 at 4:51 PM, Carmen Mitchell
wrote:

> Hello Code4Libbers,
>
> I'm working with a faculty member and trying to help them to formalize
> their data collection practices. Part of this process is also going through
> old data and trying to assess what they currently have. This particular
> faculty member has been doing research for 10 years without any kind of
> structure or regular method. So far we have over 2 TB of data in various
> states. (With more to come.)
>
> I've got a programmer working with me to:
> a) identify file types
> b) count how many files of each type
>
> We are now working on de-duping and assessing file size, focusing on the
> JPEGs first. With over 300,000 of them...it might take a while. (Of
> course they aren't following any kind of file naming structure,
> either...It's a mess.)
>
> Any tips or tricks or tools that you might know of to help speed up this
> process? Is there a good image recognition tool that you could suggest that
> would help us with automation?
>
>  Thanks,
>
> Carmen Mitchell
> Institutional Repository Librarian
> Cal State San Marcos
>



-- 
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053


Re: [CODE4LIB] Code4lib 2013 Presentation Election now open!

2012-11-13 Thread Terry Brady
Ross, I submitted 2 proposals.  I noticed that on my second proposal
"Quality Assurance Reports for DSpace Collections" my name and affiliation
are not present.

http://wiki.code4lib.org/index.php/2013_talks_proposals#Quality_Assurance_Reports_for_DSpace_Collections

Thanks for setting this up.

Terry


On Tue, Nov 13, 2012 at 12:03 AM, Ross Singer wrote:

> http://vote.code4lib.org/election/24
>
> Vote early, vote often, but most importantly, vote soon: the polls close
> sometime on the night of Monday the 19th of November (looking at the host
> that runs the diebold-o-tron, I think it will be around 11 PM EST, but when
> they close, they close!).
>
> -Ross.
> p.s. given the new design, let me know if there are any voting problems.
>



-- 
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053


Re: [CODE4LIB] clarification about file visualization

2012-08-31 Thread Terry Brady
The following application may be useful for your task.  I created this
application at the National Archives.  The team that I was on used this
application for a number of file system analysis tasks.

https://github.com/usnationalarchives/File-Analyzer

This application will allow you to select a recipe to use when crawling a
file system.  The recipe that you select will determine the type of report
that will be generated.  Once the report is generated, you can filter and
sort for information of interest.  Essentially, the application converts
the tree structure of the file system into a table structure.  The table
structure seemed to simplify decisions about a complex file hierarchy.
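
The tree-to-table idea can be sketched in Python as follows. This is an
illustration of the concept only, not the File Analyzer's recipes; the
function names and the choice of columns (directory, name, extension, size)
are assumptions for the example.

```python
import os
from pathlib import Path


def file_table(root):
    """Flatten a directory tree into rows: (directory, name, extension, bytes)."""
    root = Path(root)
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        rel_dir = Path(dirpath).relative_to(root).as_posix()
        for name in sorted(filenames):
            full = Path(dirpath) / name
            rows.append((rel_dir, name, full.suffix.lower(), full.stat().st_size))
    return rows


def count_by_extension(rows):
    """Summarize the table: how many files of each type were found."""
    counts = {}
    for _directory, _name, extension, _size in rows:
        counts[extension] = counts.get(extension, 0) + 1
    return counts
```

Once the files are rows, they can be filtered and sorted like any other
table, for example to hide image, css, and js files.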

Terry

On Thu, Aug 30, 2012 at 6:02 PM, Shearer, Timothy J
wrote:

> Hi Folks,
>
> My query may have been poorly expressed...
>
> What we have is a webserver with 64,665 files (html, css, js, jpg, you get
> the idea) and lots of directories with subdirectories.
>
> The goal is to be able to conveniently take all that in in a way that
> makes it pretty simple to see/navigate (say for a public services staff
> member tasked with doing a survey of the old content) so that we can get a
> handle on what's there (prior to say, moving from a php+html template
> approach to a CMS).  It's about exploring the website from under the hood.
>
> In my limited imagination it might look like: the document tree
> represented in xml as viewed through a web browser.  Expanding/contracting
> nodes (and being able to recursively explode the view at any node).
> Maybe choose to hide things like image, css, and js files.  Annotation
> would be lovely (say at a subdirectory be able to say: "this one's old and
> needs to go", "this one we keep as is", "this one needs to be reworked
> entirely").  And in an ideal world state could be preserved...if you'd
> expanded/contracted chunks as you were exploring, you could come back
> later and be where you were in your exploration.
>
> tree expresses the file system as (strangely enough) a tree, but the
> output is not interactive and it's huge and unwieldy to deal with.  If you
> find a subdirectory that's full of thousands of files that are irrelevant
> to the task of getting a handle on the overall content, they're on the
> screen and you page and page down and eventually lose track of where they
> are in the directory hierarchy.
>
> I'm more interested in how other shops help users understand a huge old
> webserver's content than focusing on a specific tool such as the one my
> brain imagines.
>
> Thanks for the feedback so far!
>
> Tim
>



-- 
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053