Re: [CODE4LIB] bagit java version

2016-03-23 Thread Terry Brady
John,

Thank you for the TAR library recommendations.

I think the example folder would be very helpful.

Thanks,

Terry

On Wed, Mar 23, 2016 at 9:24 AM, Scancella, John <j...@loc.gov> wrote:

> Terry,
>
> Thanks for responding! There are libraries that already handle creating
> tar (and other archive format) files much better than bagit-java ever could,
> simply because they have more resources to maintain them and add new features.
> After a quick search, here are two that you could use:
> http://commons.apache.org/proper/commons-compress/tar.html
> https://github.com/kamranzafar/jtar
>
> Would it help to create an examples folder and show how you would create a
> tar yourself when using bagit-java?
>
> Thanks
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Terry Brady
> Sent: Wednesday, March 23, 2016 12:09 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] bagit java version
>
> John,
>
> I am glad to hear that the bagit library will be enhanced.
>
> At Georgetown, I have integrated the bagit java library into our
> FileAnalyzer application.  We use this application for a number of
> digitization-related tasks.
>
>
> https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation
>
> Our real use case is to prepare bags for the APTrust preservation
> repository.
>
>
> https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation-for-Academic-Preservation-Trust-(APTrust)
>
> APTrust requires bags to be packaged as TAR files.  This code creates some
> APTrust tag files and then packages the bag as a tar file.  It would be
> useful to have a tar option built into the bagit library.
>
> Since we have a number of applications that are already in Java, we will
> continue to work with the Java version of the library.
>
> Terry
>
> On Wed, Mar 23, 2016 at 6:36 AM, Scancella, John <j...@loc.gov> wrote:
>
> > Hi All,
> >
> > I am currently rewriting the library so that it better conforms to the
> > spec, and to make it easier to extend and use.  I was wondering if
> > people would be so kind as to provide any feedback on:
> > * How they currently use the bagit-java library
> > * Do you use the command line?
> > * If so, what is stopping you from using the python version instead?
> > * Do you use the library in a java application?
> > * If so what functionality do you use? What do you wish it did?
> > * Trying out the latest version (you can see examples here
> > https://github.com/LibraryOfCongress/bagit-java/blob/master/README.md#examples
> > on how to use it)
> >
> > Thanks
> >
> > John
> >
> > Please note, all opinions expressed in this email are my own.
> >



-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)


Re: [CODE4LIB] bagit java version

2016-03-23 Thread Terry Brady
John,

I am glad to hear that the bagit library will be enhanced.

At Georgetown, I have integrated the bagit java library into our
FileAnalyzer application.  We use this application for a number of
digitization-related tasks.

https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation

Our real use case is to prepare bags for the APTrust preservation
repository.

https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation-for-Academic-Preservation-Trust-(APTrust)

APTrust requires bags to be packaged as TAR files.  This code creates some
APTrust tag files and then packages the bag as a tar file.  It would be
useful to have a tar option built into the bagit library.
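
For illustration only, the packaging step might look roughly like the
following Python sketch, which uses the standard tarfile module.  It is not
the File Analyzer code and not part of bagit-java.  The function name tar_bag
and the choice to write the tar file next to the bag directory are
assumptions, and the sketch does not create any APTrust tag files.

```python
import tarfile
from pathlib import Path


def tar_bag(bag_dir, tar_path=None):
    """Package an existing bag directory as an uncompressed tar file.

    The bag's top-level directory name is kept as the single root entry
    inside the archive. (Hypothetical helper, for illustration only.)
    """
    bag_dir = Path(bag_dir)
    if tar_path is None:
        # Assumption: write "<bag name>.tar" next to the bag directory.
        tar_path = bag_dir.with_name(bag_dir.name + ".tar")
    tar_path = Path(tar_path)
    # Mode "w" writes a plain tar with no compression.
    with tarfile.open(tar_path, "w") as tar:
        tar.add(bag_dir, arcname=bag_dir.name)
    return tar_path
```

Calling tar_bag("/path/to/my_bag") would write /path/to/my_bag.tar with
my_bag/ as the root entry.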

Since we have a number of applications that are already in Java, we will
continue to work with the Java version of the library.

Terry

On Wed, Mar 23, 2016 at 6:36 AM, Scancella, John <j...@loc.gov> wrote:

> Hi All,
>
> I am currently rewriting the library so that it better conforms to the
> spec, and to make it easier to extend and use.  I was wondering if people
> would be so kind as to provide any feedback on:
> * How they currently use the bagit-java library
> * Do you use the command line?
> * If so what is stopping you from using the python version instead?
> * Do you use the library in a java application?
> * If so what functionality do you use? What do you wish it did?
> * Trying out the latest version (you can see examples here
> https://github.com/LibraryOfCongress/bagit-java/blob/master/README.md#examples
> on how to use it)
>
> Thanks
>
> John
>
> Please note, all opinions expressed in this email are my own.
>



-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)


[CODE4LIB] Software to track scheduled processes

2015-10-05 Thread Terry Brady
Our ILS (library system) has a mechanism to run tasks on a schedule.
Unfortunately, it is very difficult to track the success & failure of these
scheduled tasks.  We also have some cron-initiated processes that perform
additional processing on the files generated by the ILS scheduler.

We would like to acquire or develop a simple reporting system to track the
execution of these processes.  I envision a system that would allow
scheduled tasks to post the following information at job startup and
at job completion:

   - Task name
   - Task Step (Start, Export, Import, FTP, Complete)
   - Date/Time
   - Status: Success/Failure
   - Data relevant to the process (number of items processed)

Ideally, we would like to have a centralized view of the status of these
tasks in a spreadsheet-like view.  We do not need the system to initiate
tasks or to attempt to restart tasks.
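
To make the idea concrete, here is a minimal sketch of such a log in Python
with the standard sqlite3 module.  It is one possible shape, not an existing
product.  The table name task_event, the column names, and the helper
functions are all assumptions.  Each scheduled job would call post_event at
each step, and latest_status returns one row per task for a
spreadsheet-like view.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per event posted by a scheduled task.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_event (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    task_name TEXT NOT NULL,
    step      TEXT NOT NULL,   -- Start, Export, Import, FTP, Complete
    logged_at TEXT NOT NULL,   -- ISO 8601 date/time
    status    TEXT NOT NULL,   -- Success or Failure
    detail    TEXT             -- e.g. number of items processed
)
"""


def open_log(path):
    """Open (and if needed create) the event log database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def post_event(conn, task_name, step, status, detail=None, when=None):
    """Record one event for a task step."""
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO task_event (task_name, step, logged_at, status, detail) "
        "VALUES (?, ?, ?, ?, ?)",
        (task_name, step, when.isoformat(), status, detail),
    )
    conn.commit()


def latest_status(conn):
    """Return the most recent event for each task, ordered by task name."""
    return conn.execute(
        """
        SELECT task_name, step, logged_at, status, detail
        FROM task_event
        WHERE id IN (SELECT MAX(id) FROM task_event GROUP BY task_name)
        ORDER BY task_name
        """
    ).fetchall()
```

A cron job could post its events through a small command-line wrapper around
post_event.  The log only records what tasks report and never starts or
restarts anything.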

Before we consider building a simple system to perform this function, are
you aware of any existing software or open source projects that might
accomplish this goal?

Thanks, Terry

-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)


Re: [CODE4LIB] Software to track scheduled processes

2015-10-05 Thread Terry Brady
We are running Sierra.

On Mon, Oct 5, 2015 at 9:44 AM, Cornel Darden Jr. <corneldarde...@gmail.com>
wrote:

> Hello,
>
> What ILS are you currently using?
>
> Thanks,
>
>
> Sent from my iPhone
>
> > On Oct 5, 2015, at 11:19 AM, Terry Brady <terry.br...@georgetown.edu>
> > wrote:
> >
> > Our ILS (library system) has a mechanism to run tasks on a schedule.
> > Unfortunately, it is very difficult to track the success & failure of
> > these scheduled tasks.  We also have some cron-initiated processes that
> > perform additional processing on the files generated by the ILS scheduler.
> >
> > We would like to acquire or develop a simple reporting system to track
> > the execution of these processes.  I envision a system that would allow
> > scheduled tasks to post the following information at job startup and
> > at job completion:
> >
> >   - Task name
> >   - Task Step (Start, Export, Import, FTP, Complete)
> >   - Date/Time
> >   - Status: Success/Failure
> >   - Data relevant to the process (number of items processed)
> >
> > Ideally, we would like to have a centralized view of the status of these
> > tasks in a spreadsheet-like view.  We do not need the system to initiate
> > tasks or to attempt to restart tasks.
> >
> > Before we consider building a simple system to perform this function, are
> > you aware of any existing software or open source projects that might
> > accomplish this goal?
> >
> > Thanks, Terry



-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)


Re: [CODE4LIB] 'automation' tools

2014-07-07 Thread Terry Brady
I learned about OpenRefine (http://openrefine.org/) at the Code4Lib
conference, and it looks like it would be a great tool for normalizing
data.  I worked on a few projects in the past in which this would have been
very helpful.

Bohyun Kim wrote a great article about how to query Google Spreadsheet data
from a web page: http://www.bohyunkim.net/blog/archives/2831.  I have found
this approach very helpful for parsing Google Form data.

I have created an application that has been very useful for our library:
http://georgetown-university-libraries.github.io/File-Analyzer/.  We use
this application within our digitization and ingest workflows.  We have
written some custom code to convert files exported from our ILS.  If you
check out the wiki pages
https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki,
you might find some tasks that would be useful to recommend.

Good luck with your workshop.

Terry


On Fri, Jul 4, 2014 at 1:51 AM, Owen Stephens o...@ostephens.com wrote:

 I'm doing a workshop in the UK at a library tech unconference-style event
 (Pi and Mash http://piandmash.info) on automating computer based tasks.
 I want to cover tools that are usable by non-programmers and that would
 work in a typical library environment. The types of tools I'm thinking of
 are:

 MacroExpress
 AutoHotKey
 iMacros for Firefox

 While I'm hoping workshop attendees will bring ideas about tasks they
 would like to automate, the kinds of things I have in mind are:

 Filling out a set of standard data on a GUI or Web form (e.g. standard set
 of budget codes for an order)
 Processing a list of item barcodes from a spreadsheet and doing something
 with them on the library system (e.g. change loan status, check for holds)
 Similarly for User IDs
 Navigating to a web page and doing some task

 Clearly some of these tasks would be better automated with appropriate
 APIs and scripts, but I want to try to introduce those without programming
 skills to some of the concepts and tools and essentially how they can work
 around problems themselves to some extent.

 What tools do you use for this kind of automation task, and what kind of
 tasks do they best deal with?

 Thanks,

 Owen

 Owen Stephens
 Owen Stephens Consulting
 Web: http://www.ostephens.com
 Email: o...@ostephens.com
 Telephone: 0121 288 6936




-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498


Re: [CODE4LIB] Need a Win7 checksum creator and verifier for files in a folder

2014-06-27 Thread Terry Brady
Georgetown Libraries has released an updated and improved version of the
File Analyzer application.

http://georgetown-university-libraries.github.io/File-Analyzer/

Information about the checksum tool is available at
https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Core-File-Test-Rules#sort-by-checksum
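
For readers who want to see what "create a checksum file, then verify it
later" amounts to, here is a minimal command-line sketch in Python using the
standard hashlib module.  It is not how File Analyzer works internally and it
is not a GUI.  The function names and the manifest layout (hex digest, two
spaces, relative path) are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_path):
    """Write one 'digest  relative/path' line per file; return the file count."""
    folder = Path(folder)
    manifest_path = Path(manifest_path)
    lines = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        if path.resolve() == manifest_path.resolve():
            continue  # do not checksum the manifest itself
        lines.append(f"{sha256_of(path)}  {path.relative_to(folder).as_posix()}")
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(folder, manifest_path):
    """Return relative paths that are missing or whose checksum changed."""
    folder = Path(folder)
    problems = []
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = folder / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

An empty list from verify_manifest means every listed file is present and
unchanged.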

Terry


On Fri, Jun 27, 2014 at 5:38 PM, Levy, Michael ml...@ushmm.org wrote:

 You might look at NARA File Analyzer
 http://www.archives.gov/social-media/github.html
 Some docs here:
 http://www.mnhs.org/preserve/records/docs_pdfs/NARAFileAnalyzer.pdf

 On Fri, Jun 27, 2014 at 5:21 PM, Nathan Tallman ntall...@gmail.com
 wrote:

  Perhaps Fixity from AV Preserve 
  http://www.avpreserve.com/avpsresources/tools/?
 
  Nathan
 
 
  On Fri, Jun 27, 2014 at 4:43 PM, Kari R Smith smit...@mit.edu wrote:
 
   Looking for a free Windows 7 SHA-256 checksum creator and verifier tool
   that runs as a GUI and will produce a file of the checksums (that can
   then be verified later).
  
   Thoughts?
  
   Thank you,
   Kari
  
 




-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498


Re: [CODE4LIB] Excel to XML

2014-06-13 Thread Terry Brady
The current version of Excel offers a save as XML option.

It will produce something like this.  There is other wrapping metadata, but
the table is pretty easy to parse.

  <Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="7" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="15">
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 1</Data></Cell>
    <Cell><Data ss:Type="String">question 1</Data></Cell>
    <Cell><Data ss:Type="String">answer 1</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 2</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 3</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 3</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 4</Data></Cell>
    <Cell><Data ss:Type="String">question 2</Data></Cell>
    <Cell><Data ss:Type="String">answer 1</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 5 </Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 6</Data></Cell>
    <Cell><Data ss:Type="String">quest </Data></Cell>
    <Cell><Data ss:Type="String">answer 3</Data></Cell>
   </Row>
   <Row>
    <Cell ss:StyleID="s62"/>
   </Row>
  </Table>
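
As a rough sketch of how a table like this could be parsed, the following
Python uses the standard xml.etree.ElementTree module to nest each answer
under the most recent question.  The namespace URI is the one Excel's XML
Spreadsheet format uses.  The output element names (questions, question,
answer) and the three-column assumption are made up for illustration and would
need to match whatever the Drupal import expects.

```python
import xml.etree.ElementTree as ET

# Namespace used for both elements and ss: attributes in Excel's XML Spreadsheet format.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def row_values(row, width=3):
    """Return a row's cell texts as a fixed-width list, honoring ss:Index."""
    values = [None] * width
    position = 0  # zero-based column of the next cell
    for cell in row.findall("ss:Cell", NS):
        index = cell.get(f"{{{SS}}}Index")
        if index is not None:
            position = int(index) - 1  # ss:Index is one-based and skips empty cells
        data = cell.find("ss:Data", NS)
        if data is not None and position < width:
            values[position] = (data.text or "").strip()
        position += 1
    return values


def questions_to_xml(spreadsheet_xml):
    """Build <questions><question text=...><answer>...</answer></question></questions>."""
    source = ET.fromstring(spreadsheet_xml)
    out = ET.Element("questions")  # hypothetical output format
    current = None
    for row in source.iter(f"{{{SS}}}Row"):
        _label, question, answer = row_values(row)
        if question:
            # A non-empty second column starts a new question.
            current = ET.SubElement(out, "question", {"text": question})
        if answer and current is not None:
            ET.SubElement(current, "answer").text = answer
    return out
```

ET.tostring(questions_to_xml(text), encoding="unicode") would then give the
nested XML as a string.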


On Fri, Jun 13, 2014 at 2:28 PM, Ryan Engel rten...@wisc.edu wrote:

 Hello -

 I have an Excel spreadsheet that, for the purposes of an easy import into
 a Drupal site, I'd like to convert to XML.  I know people more
 knowledgeable than I could code up something in Python or Perl to convert a
 CSV version of the data to XML (and I have a colleague who offered to do
 just that for me), but I am looking for recommendations for something more
 immediately accessible.

 Here's an idea of how the spreadsheet is structured:

 Row1    Question1    Q1Answer1
 Row2                 Q1Answer2
 Row3                 Q1Answer3
 Row4    Question2    Q2Answer1
 Row5                 Q2Answer2
 Row6    Question3    Q3Answer1
 etc.

 How do other people approach this?  Import the data to an SQL database,
 write some clever queries, and then export that to XML?  Work some wizardry
 in GoogleRefine/OpenRefine?  Are scripting languages really the best
 all-around solution?  Excel's built-in XML mapping function wasn't able to
 process the one-to-many relationship of questions to answers, though maybe
 I just don't know how to build the mapping structure correctly.

 In the interest of imminent deadlines, I have handed the spreadsheet off to
 my Perl-writing colleague.  But as a professional growth opportunity, I'm
 interested in suggestions from Libraryland about ways others have
 approached this successfully.

 Thanks!

 Ryan Engel
 Web Stuff
 UW-Madison




-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498


Re: [CODE4LIB] Tool to highlight differences in two files

2013-04-23 Thread Terry Brady
WinMerge is a great diff tool, and it is easy to use.
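
If a script is preferred over a GUI tool, a paragraph-level comparison can be
sketched with Python's standard difflib module.  This is only an
illustration, not a substitute for WinMerge.  It assumes the scraped pages
have already been reduced to plain text with blank lines between paragraphs,
and the function name changed_paragraphs is made up.

```python
import difflib


def changed_paragraphs(old_text, new_text):
    """Return (tag, old_paragraphs, new_paragraphs) for each changed region.

    tag is one of 'replace', 'delete', or 'insert', as reported by
    difflib.SequenceMatcher; unchanged regions are omitted.
    """
    old = [p.strip() for p in old_text.split("\n\n") if p.strip()]
    new = [p.strip() for p in new_text.split("\n\n") if p.strip()]
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Each entry pairs the paragraphs removed from the older scrape with the
paragraphs that replaced them in the newer one.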


On Tue, Apr 23, 2013 at 4:29 PM, Jim DelRosso jd...@cornell.edu wrote:

 The one resource that came immediately to mind was Juxta:
 http://www.juxtasoftware.org/juxta-commons/

 Thanks!

 Jim

 *Jim DelRosso, MPA, MSLIS
 Digital Projects Coordinator*
 *Hospitality, Labor, and Management Library*
 Catherwood Library
 ILR School
 Cornell University
 239D Ives Hall
 Ithaca, NY 14853
 p 607.255.8688
 f 607.255.9641
 e jd...@cornell.edu
 www.ilr.cornell.edu
 *Advancing the World of Work*


 On Tue, Apr 23, 2013 at 4:24 PM, Wilhelmina Randtke rand...@gmail.com
 wrote:

  I would like to compare versions of a website scraped at different times
  to see which paragraphs on a page have changed.  Does anyone here know of a
  tool for holding two files side by side and noting what is the same and
  what is different between the files?
 
  It seems like any simple script to note differences in two strings of
  text would work, but I don't know a tool to use.
 
  -Wilhelmina Randtke
 




-- 
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053


Re: [CODE4LIB] Image de-duping and file identification

2013-03-19 Thread Terry Brady
Carmen,

The following code may be able to help.

https://github.com/Georgetown-University-Libraries/File-Analyzer

This application can scan a file system and report counts of files by type.

The application can also report on files by checksum.  If you are trying to
find exact file duplicates, the checksum report will identify exact
duplicates found across a file system.
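
The general technique behind a checksum report can be sketched in a few lines
of Python.  This is an illustration of the idea rather than the File Analyzer
implementation.  The function name find_duplicates and the size pre-filter
(only files that share a size are hashed) are assumptions.  It finds
byte-identical files only, so resized or re-encoded copies of an image will
not match.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    # Pass 1: group by size, since files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with at least one other file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Each returned list is one set of exact duplicates.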

I will be presenting an overview of this application during the virtual
lightning talks session on April 3.

If this looks useful to you, I will be glad to give you an overview of the
application.

Terry


On Tue, Mar 19, 2013 at 4:51 PM, Carmen Mitchell
carmenmitch...@gmail.com wrote:

 Hello Code4Libbers,

 I'm working with a faculty member and trying to help them to formalize
 their data collection practices. Part of this process is also going through
 old data and trying to assess what they currently have. This particular
 faculty member has been doing research for 10 years without any kind of
 structure or regular method. So far we have over 2 TB of data in various
 states. (With more to come.)

 I've got a programmer working with me to:
 a) identify file types
 b) count how many files of each type

 We are now working on de-duping and assessing file size, focusing on the
 JPEGs first. With over 300,000 of them... it might take a while. (Of
 course they aren't following any kind of file naming structure,
 either...It's a mess.)

 Any tips or tricks or tools that you might know of to help speed up this
 process? Is there a good image recognition tool that you could suggest that
 would help us with automation?

  Thanks,

 Carmen Mitchell
 Institutional Repository Librarian
 Cal State San Marcos




-- 
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053


Re: [CODE4LIB] Code4lib 2013 Presentation Election now open!

2012-11-13 Thread Terry Brady
Ross, I submitted 2 proposals.  I noticed that on my second proposal,
"Quality Assurance Reports for DSpace Collections", my name and affiliation
are not present.

http://wiki.code4lib.org/index.php/2013_talks_proposals#Quality_Assurance_Reports_for_DSpace_Collections

Thanks for setting this up.

Terry


On Tue, Nov 13, 2012 at 12:03 AM, Ross Singer rossfsin...@gmail.com wrote:

 http://vote.code4lib.org/election/24

 Vote early, vote often, but most importantly, vote soon:  the polls close
 sometime on the night of Monday the 19th of November (looking at the host
 that the diebold-o-tron runs on, I think it will be around 11 PM EST, but when they
 close, they close!).

 -Ross.
 p.s. given the new design, let me know if there are any voting problems.




-- 
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053


Re: [CODE4LIB] clarification about file visualization

2012-08-31 Thread Terry Brady
The following application may be useful for your task.  I created this
application at the National Archives.  The team that I was on used this
application for a number of file system analysis tasks.

https://github.com/usnationalarchives/File-Analyzer

This application will allow you to select a recipe to use when crawling a
file system.  The recipe that you select will determine the type of report
that will be generated.  Once the report is generated, you can filter and
sort for information of interest.  Essentially, the application converts
the tree structure of the file system into a table structure.  The table
structure seemed to simplify decisions about a complex file hierarchy.
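
The tree-to-table idea can be illustrated with a short Python sketch.  It is
not one of the File Analyzer recipes.  The function names and the choice of
columns (folder, name, extension, size) are assumptions.  The resulting CSV
can be filtered and sorted in a spreadsheet.

```python
import csv
import os


def file_table(root):
    """Return one row per file: relative folder, name, extension, size in bytes."""
    rows = []
    for folder, dirs, files in os.walk(root):
        dirs.sort()  # visit subdirectories in a predictable order
        rel_folder = os.path.relpath(folder, root)  # "." for the root itself
        for name in sorted(files):
            path = os.path.join(folder, name)
            ext = os.path.splitext(name)[1].lower()
            rows.append((rel_folder, name, ext, os.path.getsize(path)))
    return rows


def write_csv(rows, csv_path):
    """Write the rows with a header line so a spreadsheet can filter and sort them."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["folder", "name", "extension", "bytes"])
        writer.writerows(rows)
```

Sorting the table by the extension or folder column is one way to set aside
thousands of image, css, and js files without paging through a tree listing.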

Terry

On Thu, Aug 30, 2012 at 6:02 PM, Shearer, Timothy J
tshea...@email.unc.edu wrote:

 Hi Folks,

 My query may have been poorly expressed...

 What we have is a webserver with 64,665 files (html, css, js, jpg, you get
 the idea) and lots of directories with subdirectories.

 The goal is to be able to conveniently take all that in in a way that
 makes it pretty simple to see/navigate (say for a public services staff
 member tasked with doing a survey of the old content) so that we can get a
 handle on what's there (prior to say, moving from a php+html template
 approach to a CMS).  It's about exploring the website from under the hood.

 In my limited imagination it might look like: the document tree
 represented in xml as viewed through a web browser.  Expanding/contracting
 nodes (and being able to recursively explode the view at any node).
 Maybe choose to hide things like image, css, and js files.  Annotation
 would be lovely (say at a subdirectory be able to say: this one's old and
 needs to go, this one we keep as is, this one needs to be reworked
 entirely).  And in an ideal world state could be preserved...if you'd
 expanded/contracted chunks as you were exploring, you could come back
 later and be where you were in your exploration.

 tree expresses the file system as (strangely enough) a tree, but the
 output is not interactive and it's huge and unwieldy to deal with.  If you
 find a subdirectory that's full of thousands of files that are irrelevant
 to the task of getting a handle on the overall content, they're on the
 screen and you page and page down and eventually lose track of where they
 are in the directory hierarchy.

 I'm more interested in how other shops help users understand a huge old
 webserver's content than focusing on a specific tool such as the one my
 brain imagines.

 Thanks for the feedback so far!

 Tim




-- 
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053