Re: [CODE4LIB] bagit java version
John,

Thank you for the TAR library recommendations. I think the examples folder would be very helpful.

Thanks, Terry

On Wed, Mar 23, 2016 at 9:24 AM, Scancella, John wrote:

> Terry,
>
> Thanks for responding! There are libraries that already deal with making
> tar files (and other archive formats) much better than bagit-java ever
> could, simply because they have more resources to maintain them and add
> new features. After a quick search, here are two that you could use:
> http://commons.apache.org/proper/commons-compress/tar.html
> https://github.com/kamranzafar/jtar
>
> Would it help to create an examples folder and show how you would create a
> tar yourself when using bagit-java?
>
> Thanks
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Terry Brady
> Sent: Wednesday, March 23, 2016 12:09 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] bagit java version
>
> John,
>
> I am glad to hear that the bagit library will be enhanced.
>
> At Georgetown, I have integrated the bagit java library into our
> FileAnalyzer application. We use this application for a number of
> digitization-related tasks.
>
> https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation
>
> Our real use case is to prepare bags for the APTrust preservation
> repository.
>
> https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation-for-Academic-Preservation-Trust-(APTrust)
>
> APTrust requires bags to be packaged as TAR files. This code creates some
> APTrust tag files and then packages the bag as a tar file. It would be
> useful to have a tar option built into the bagit library.
>
> Since we have a number of applications that are already in Java, we will
> continue to work with the Java version of the library.
>
> Terry
>
> On Wed, Mar 23, 2016 at 6:36 AM, Scancella, John wrote:
>
> > Hi All,
> >
> > I am currently rewriting the library so that it better conforms to the
> > spec, and to make it easier to extend and use. I was wondering if
> > people would be so kind as to provide any feedback on:
> > * How they currently use the bagit-java library
> >   * Do you use the command line?
> >     * If so what is stopping you from using the python version instead?
> >   * Do you use the library in a java application?
> >     * If so what functionality do you use? What do you wish it did?
> > * Trying out the latest version (you can see examples here
> >   https://github.com/LibraryOfCongress/bagit-java/blob/master/README.md#examples
> >   on how to use it)
> >
> > Thanks
> >
> > John
> >
> > Please note, all opinions expressed in this email are my own.
>
> --
> Terry Brady
> Applications Programmer Analyst
> Georgetown University Library Information Technology
> https://www.library.georgetown.edu/lit/code
> 425-298-5498 (Seattle, WA)

--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)
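The idea of creating the tar yourself, outside the bagit library, can be sketched briefly. This is only an illustration in Python's standard library, not the Java libraries recommended in this thread and not the File Analyzer code; the function name tar_bag and the directory names are hypothetical, and nothing here reflects APTrust's own naming or layout rules.

```python
import tarfile
from pathlib import Path


def tar_bag(bag_dir: str, tar_path: str) -> list[str]:
    """Package an existing bag directory as an uncompressed tar file.

    The bag's own directory name becomes the single top-level folder in
    the archive. Returns the sorted member names that were written.
    """
    bag = Path(bag_dir)
    if not bag.is_dir():
        raise NotADirectoryError(bag_dir)
    # Mode "w" writes a plain tar with no compression.
    with tarfile.open(tar_path, "w") as tar:
        tar.add(bag, arcname=bag.name)
    # Re-open the archive to report what it actually contains.
    with tarfile.open(tar_path, "r") as tar:
        return sorted(tar.getnames())
```

Write the tar file outside the bag directory so the archive does not try to include itself.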
Re: [CODE4LIB] bagit java version
John,

I am glad to hear that the bagit library will be enhanced.

At Georgetown, I have integrated the bagit java library into our FileAnalyzer application. We use this application for a number of digitization-related tasks.

https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation

Our real use case is to prepare bags for the APTrust preservation repository.

https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Bagit-automation-for-Academic-Preservation-Trust-(APTrust)

APTrust requires bags to be packaged as TAR files. This code creates some APTrust tag files and then packages the bag as a tar file. It would be useful to have a tar option built into the bagit library.

Since we have a number of applications that are already in Java, we will continue to work with the Java version of the library.

Terry

On Wed, Mar 23, 2016 at 6:36 AM, Scancella, John wrote:

> Hi All,
>
> I am currently rewriting the library so that it better conforms to the
> spec, and to make it easier to extend and use. I was wondering if people
> would be so kind as to provide any feedback on:
> * How they currently use the bagit-java library
>   * Do you use the command line?
>     * If so what is stopping you from using the python version instead?
>   * Do you use the library in a java application?
>     * If so what functionality do you use? What do you wish it did?
> * Trying out the latest version (you can see examples here
>   https://github.com/LibraryOfCongress/bagit-java/blob/master/README.md#examples
>   on how to use it)
>
> Thanks
>
> John
>
> Please note, all opinions expressed in this email are my own.

--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)
Re: [CODE4LIB] Software to track scheduled processes
We are running Sierra.

On Mon, Oct 5, 2015 at 9:44 AM, Cornel Darden Jr. wrote:

> Hello,
>
> What ILS are you currently using?
>
> Thanks,
>
> Sent from my iPhone
>
> > On Oct 5, 2015, at 11:19 AM, Terry Brady wrote:
> >
> > Our ILS (library system) has a mechanism to run tasks on a schedule.
> > Unfortunately, it is very difficult to track the success & failure of
> > these scheduled tasks. We also have some cron-initiated processes that
> > perform additional processing on the files generated by the ILS
> > scheduler.
> >
> > We would like to acquire or develop a simple reporting system to track
> > the execution of these processes. I envision that this system would
> > allow scheduled tasks to post the following information at job startup
> > and at job completion:
> >
> > - Task name
> > - Task Step (Start, Export, Import, FTP, Complete)
> > - Date/Time
> > - Status: Success/Failure
> > - Data relevant to the process (number of items processed)
> >
> > Ideally, we would like to have a centralized view of the status of these
> > tasks in a spreadsheet-like view. We do not need the system to initiate
> > tasks or to attempt to restart tasks.
> >
> > Before we consider building a simple system to perform this function, are
> > you aware of any existing software or open source projects that might
> > accomplish this goal?
> >
> > Thanks, Terry
> >
> > --
> > Terry Brady
> > Applications Programmer Analyst
> > Georgetown University Library Information Technology
> > https://www.library.georgetown.edu/lit/code
> > 425-298-5498 (Seattle, WA)

--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)
[CODE4LIB] Software to track scheduled processes
Our ILS (library system) has a mechanism to run tasks on a schedule. Unfortunately, it is very difficult to track the success & failure of these scheduled tasks. We also have some cron-initiated processes that perform additional processing on the files generated by the ILS scheduler.

We would like to acquire or develop a simple reporting system to track the execution of these processes. I envision that this system would allow scheduled tasks to post the following information at job startup and at job completion:

- Task name
- Task Step (Start, Export, Import, FTP, Complete)
- Date/Time
- Status: Success/Failure
- Data relevant to the process (number of items processed)

Ideally, we would like to have a centralized view of the status of these tasks in a spreadsheet-like view. We do not need the system to initiate tasks or to attempt to restart tasks.

Before we consider building a simple system to perform this function, are you aware of any existing software or open source projects that might accomplish this goal?

Thanks, Terry

--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498 (Seattle, WA)
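To make the requirement concrete, here is a minimal sketch of the kind of status log described above, using Python's built-in sqlite3 module. It is not an existing product or anything tied to the ILS; the table name task_log, the function names, and the example task names are all hypothetical. It records the five fields listed in the message and returns the latest row per task as a simple table.

```python
import sqlite3
from datetime import datetime, timezone

# Step names taken from the list in the message above.
STEPS = ("Start", "Export", "Import", "FTP", "Complete")
STATUSES = ("Success", "Failure")


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the status log; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS task_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               task TEXT NOT NULL,
               step TEXT NOT NULL,
               logged_at TEXT NOT NULL,
               status TEXT NOT NULL,
               detail TEXT
           )"""
    )
    return conn


def post_status(conn, task, step, status, detail=""):
    """Record one step of a scheduled task; called by the job itself."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    conn.execute(
        "INSERT INTO task_log (task, step, logged_at, status, detail) "
        "VALUES (?, ?, ?, ?, ?)",
        (task, step, datetime.now(timezone.utc).isoformat(), status, detail),
    )
    conn.commit()


def latest_status(conn):
    """Return the most recent row per task, for a spreadsheet-like view."""
    return conn.execute(
        """SELECT task, step, status, detail FROM task_log
           WHERE id IN (SELECT MAX(id) FROM task_log GROUP BY task)
           ORDER BY task"""
    ).fetchall()
```

A cron-initiated script would call post_status at startup and again at completion; the log only records what jobs report and does not start or restart anything.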
Re: [CODE4LIB] 'automation' tools
I learned about Open Refine <http://openrefine.org/> at the Code4Lib conference, and it looks like it would be a great tool for normalizing data. I worked on a few projects in the past in which this would have been very helpful.

Bohyun Kim wrote a great article about how to query Google Spreadsheet data from a web page: http://www.bohyunkim.net/blog/archives/2831. I have found this approach very helpful for parsing Google Form data.

I have created an application that has been very useful for our library: http://georgetown-university-libraries.github.io/File-Analyzer/. We use this application within our digitization and ingest workflows. We have written some custom code to convert files exported from our ILS. If you check out the wiki pages <https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki>, you might find some tasks that would be useful to recommend.

Good luck with your workshop.

Terry

On Fri, Jul 4, 2014 at 1:51 AM, Owen Stephens wrote:

> I'm doing a workshop in the UK at a library tech unconference-style event
> (Pi and Mash http://piandmash.info) on automating computer based tasks.
> I want to cover tools that are usable by non-programmers and that would
> work in a typical library environment. The types of tools I'm thinking of
> are:
>
> MacroExpress
> AutoHotKey
> iMacros for Firefox
>
> While I'm hoping workshop attendees will bring ideas about tasks they
> would like to automate the type of thing I have in mind are things like:
>
> Filling out a set of standard data on a GUI or Web form (e.g. standard set
> of budget codes for an order)
> Processing a list of item barcodes from a spreadsheet and doing something
> with them on the library system (e.g. change loan status, check for holds)
> Similarly for User IDs
> Navigating to a web page and doing some task
>
> Clearly some of these tasks would be better automated with appropriate
> APIs and scripts, but I want to try to introduce those without programming
> skills to some of the concepts and tools and essentially how they can work
> around problems themselves to some extent.
>
> What tools do you use for this kind of automation task, and what kind of
> tasks do they best deal with?
>
> Thanks,
>
> Owen
>
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: o...@ostephens.com
> Telephone: 0121 288 6936

--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498
Re: [CODE4LIB] Need a Win7 checksum creator and verifer for files in a folder
Georgetown Libraries has released an updated and improved version of the File Analyzer application.

http://georgetown-university-libraries.github.io/File-Analyzer/

Information about the checksum tool is available at
https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki/Core-File-Test-Rules#sort-by-checksum

Terry

On Fri, Jun 27, 2014 at 5:38 PM, Levy, Michael wrote:

> You might look at NARA File Analyzer
> http://www.archives.gov/social-media/github.html
> Some docs here:
> http://www.mnhs.org/preserve/records/docs_pdfs/NARAFileAnalyzer.pdf
>
> On Fri, Jun 27, 2014 at 5:21 PM, Nathan Tallman wrote:
>
> > Perhaps Fixity from AV Preserve
> > <http://www.avpreserve.com/avpsresources/tools/>?
> >
> > Nathan
> >
> > On Fri, Jun 27, 2014 at 4:43 PM, Kari R Smith wrote:
> >
> > > Looking for a free, Windows 7 sha-256 checksum creator and verifier
> > > tool that runs as a GUI and will produce a file of the checksums (that
> > > can then be later verified.)
> > >
> > > Thoughts?
> > >
> > > Thank you,
> > > Kari

--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498
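For readers who want to see what such a checksum file does, the following is a rough sketch in Python's standard library. It is a command-line illustration, not a GUI, and it is not how File Analyzer, Fixity, or the NARA tool work internally; the function names and the "digest, two spaces, relative path" line format are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder: str, manifest_path: str) -> int:
    """Write one '<sha256>  <relative path>' line per file; return the count."""
    root = Path(folder)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    with open(manifest_path, "w", encoding="utf-8") as out:
        for path in files:
            out.write(f"{sha256_of(path)}  {path.relative_to(root).as_posix()}\n")
    return len(files)


def verify_manifest(folder: str, manifest_path: str) -> list[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    root = Path(folder)
    problems = []
    with open(manifest_path, encoding="utf-8") as manifest:
        for line in manifest:
            expected, name = line.rstrip("\n").split("  ", 1)
            target = root / name
            if not target.is_file() or sha256_of(target) != expected:
                problems.append(name)
    return problems
```

Keep the manifest file outside the folder being checked, so that it is not itself listed on a later run.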
Re: [CODE4LIB] Excel to XML
The current version of Excel offers a save as XML option. It will produce something like this. There is other wrapping metadata, but the table is pretty easy to parse.

row 1    question 1    answer 1
row 2                  answer 2
row 3                  answer 3
row 4    question 2    answer 1
row 5                  answer 2
row 6    quest         answer 3

On Fri, Jun 13, 2014 at 2:28 PM, Ryan Engel wrote:

> Hello -
>
> I have an Excel spreadsheet that, for the purposes of an easy import into
> a Drupal site, I'd like to convert to XML. I know people more
> knowledgeable than I could code up something in Python or Perl to convert a
> CSV version of the data to XML (and I have a colleague who offered to do
> just that for me), but I am looking for recommendations for something more
> immediately accessible.
>
> Here's an idea of how the spreadsheet is structured:
>
> Row1    Question1    Q1Answer1
> Row2                 Q1Answer2
> Row3                 Q1Answer3
> Row4    Question2    Q2Answer1
> Row5                 Q2Answer2
> Row6    Question3    Q3Answer1
> etc.
>
> How do other people approach this? Import the data to an SQL database,
> write some clever queries, and then export that to XML? Work some wizardry
> in GoogleRefine/OpenRefine? Are scripting languages really the best all
> around solution? Excel's built in XML mapping function wasn't able to
> process the one-to-many relationship of questions to answers, though maybe
> I just don't know how to build the mapping structure correctly.
>
> In the interest of imminent deadlines, I have handed the spreadsheet off to
> my Perl-writing colleague. But as a professional growth opportunity, I'm
> interested in suggestions from Libraryland about ways others have
> approached this successfully.
>
> Thanks!
>
> Ryan Engel
> Web Stuff
> UW-Madison

--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498
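The one-to-many relationship in the spreadsheet can also be handled with a short script working from a CSV export. The sketch below is one possible approach, assuming a two-column CSV (question, answer) in which a blank question cell means "another answer to the question above"; the element names questions, question, and answer are invented for the example and are not a format Drupal is known to expect.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(csv_text: str) -> str:
    """Nest each answer under the most recent non-blank question."""
    root = ET.Element("questions")
    current = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip completely empty lines
        # Pad short rows so unpacking always finds two cells.
        question, answer = (row + ["", ""])[:2]
        if question.strip():
            current = ET.SubElement(root, "question", text=question.strip())
        if current is None:
            raise ValueError("the first data row must name a question")
        if answer.strip():
            ET.SubElement(current, "answer").text = answer.strip()
    return ET.tostring(root, encoding="unicode")
```

The function keeps a pointer to the current question and attaches each answer to it, so the grouping that a flat XML mapping cannot express falls out of reading the rows in order.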
Re: [CODE4LIB] Tool to highlight differences in two files
WinMerge is a great diff tool, and it is easy to use.

On Tue, Apr 23, 2013 at 4:29 PM, Jim DelRosso wrote:

> The one resource that came immediately to mind was Juxta:
> http://www.juxtasoftware.org/juxta-commons/
>
> Thanks!
>
> Jim
>
> *Jim DelRosso, MPA, MSLIS
> Digital Projects Coordinator*
> *Hospitality, Labor, and Management Library*
> Catherwood Library
> ILR School
> Cornell University
> 239D Ives Hall
> Ithaca, NY 14853
> p 607.255.8688
> f 607.255.9641
> e jd...@cornell.edu
> www.ilr.cornell.edu
> *Advancing the World of Work*
>
> On Tue, Apr 23, 2013 at 4:24 PM, Wilhelmina Randtke wrote:
>
> > I would like to compare versions of a website scraped at different times
> > to see what paragraphs on a page have changed. Does anyone here know of a
> > tool for holding two files side by side and noting what is the same and
> > what is different between the files?
> >
> > It seems like any simple script to note differences in two strings of
> > text would work, but I don't know a tool to use.
> >
> > -Wilhelmina Randtke

--
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053
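For the "simple script" route mentioned in the question, Python's standard difflib module can compare two versions paragraph by paragraph. This is a small sketch and not a substitute for WinMerge or Juxta; it assumes the scraped pages have already been reduced to plain text with blank lines between paragraphs, and the function name is made up for the example.

```python
import difflib


def changed_paragraphs(old_text: str, new_text: str):
    """List paragraph-level changes between two versions of a page.

    Each result is (kind, old_paragraphs, new_paragraphs), where kind is
    "replace", "delete", or "insert" as reported by difflib.
    """
    old = [p.strip() for p in old_text.split("\n\n") if p.strip()]
    new = [p.strip() for p in new_text.split("\n\n") if p.strip()]
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":  # unchanged runs are not reported
            changes.append((kind, old[i1:i2], new[j1:j2]))
    return changes
```

Because whole paragraphs are compared as units, a one-word edit shows up as a "replace" of that paragraph; running a second, word-level diff on each replaced pair would narrow it down further.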
Re: [CODE4LIB] Image de-duping and file identification
Carmen,

The following code may be able to help.

https://github.com/Georgetown-University-Libraries/File-Analyzer

This application can scan a file system and report counts of files by type. The application can also report on files by checksum. If you are trying to find exact file duplicates, the checksum report will identify exact duplicates found across a file system.

I will be presenting an overview of this application during the virtual lightning talks session on April 3. If this looks useful to you, I will be glad to give you an overview of the application.

Terry

On Tue, Mar 19, 2013 at 4:51 PM, Carmen Mitchell wrote:

> Hello Code4Libbers,
>
> I'm working with a faculty member and trying to help them to formalize
> their data collection practices. Part of this process is also going through
> old data and trying to assess what they currently have. This particular
> faculty member has been doing research for 10 years without any kind of
> structure or regular method. So far we have over 2 TB of data in various
> states. (With more to come.)
>
> I've got a programmer working with me to:
> a) identify file types
> b) count how many files of each type
>
> We are now working on de-duping and assessing file size, focusing on the
> JPEGs first. With over 300,000 of them...it might take a while. (Of
> course they aren't following any kind of file naming structure,
> either...It's a mess.)
>
> Any tips or tricks or tools that you might know of to help speed up this
> process? Is there a good image recognition tool that you could suggest that
> would help us with automation?
>
> Thanks,
>
> Carmen Mitchell
> Institutional Repository Librarian
> Cal State San Marcos

--
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053
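The checksum approach to exact duplicates can be sketched in a few lines of Python. This illustrates the general technique only and is not File Analyzer's implementation; the function name is hypothetical. Comparing file sizes first is an added shortcut that avoids hashing files that cannot have a twin.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder: str) -> list[list[str]]:
    """Group files under `folder` whose contents are byte-for-byte identical."""
    root = Path(folder)
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_digest = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have an exact duplicate
        for path in same_size:
            # read_bytes loads the whole file; fine for images, but very
            # large files would be better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.relative_to(root).as_posix())

    return sorted(sorted(group) for group in by_digest.values() if len(group) > 1)
```

This finds exact copies regardless of file name. It will not match images that look alike but differ in bytes, such as a resized or re-saved JPEG.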
Re: [CODE4LIB] Code4lib 2013 Presentation Election now open!
Ross,

I submitted 2 proposals. I noticed that on my second proposal, "Quality Assurance Reports for DSpace Collections", my name and affiliation are not present.

http://wiki.code4lib.org/index.php/2013_talks_proposals#Quality_Assurance_Reports_for_DSpace_Collections

Thanks for setting this up.

Terry

On Tue, Nov 13, 2012 at 12:03 AM, Ross Singer wrote:

> http://vote.code4lib.org/election/24
>
> Vote early, vote often, but most importantly, vote soon: the polls close
> sometime on the night of Monday the 19th of November (looking at the host
> that the diebold-o-tron, I think it will be around 11 PM EST, but when they
> close, they close!).
>
> -Ross.
> p.s. given the new design, let me know if there are any voting problems.

--
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053
Re: [CODE4LIB] clarification about file visualization
The following application may be useful for your task. I created this application at the National Archives. The team that I was on used this application for a number of file system analysis tasks.

https://github.com/usnationalarchives/File-Analyzer

This application will allow you to select a recipe to use when crawling a file system. The recipe that you select will determine the type of report that will be generated. Once the report is generated, you can filter and sort for information of interest.

Essentially, the application converts the tree structure of the file system into a table structure. The table structure seemed to simplify decisions about a complex file hierarchy.

Terry

On Thu, Aug 30, 2012 at 6:02 PM, Shearer, Timothy J wrote:

> Hi Folks,
>
> My query may have been poorly expressed...
>
> What we have is a webserver with 64,665 files (html, css, js, jpg, you get
> the idea) and lots of directories with subdirectories.
>
> The goal is to be able to conveniently take all that in in a way that
> makes it pretty simple to see/navigate (say for a public services staff
> member tasked with doing a survey of the old content) so that we can get a
> handle on what's there (prior to say, moving from a php+html template
> approach to a CMS). It's about exploring the website from under the hood.
>
> In my limited imagination it might look like: the document tree
> represented in xml as viewed through a web browser. Expanding/contracting
> nodes (and being able to recursively explode the view at any node).
> Maybe choose to hide things like image, css, and js files. Annotation
> would be lovely (say at a subdirectory be able to say: "this one's old and
> needs to go", "this one we keep as is", "this one needs to be reworked
> entirely"). And in an ideal world state could be preserved...if you'd
> expanded/contracted chunks as you were exploring, you could come back
> later and be where you were in your exploration.
>
> tree expresses the file system as (strangely enough) a tree, but the
> output is not interactive and it's huge and unwieldy to deal with. If you
> find a subdirectory that's full of thousands of files that are irrelevant
> to the task of getting a handle on the overall content, they're on the
> screen and you page and page down and eventually lose track of where they
> are in the directory hierarchy.
>
> I'm more interested in how other shops help users understand a huge old
> webserver's content than focusing on a specific tool such as the one my
> brain imagines.
>
> Thanks for the feedback so far!
>
> Tim

--
Terry Brady
Applications Programmer Analyst
Lauinger Information Technology
202-687-7053
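The tree-to-table idea can be shown with a short script. This is a sketch of the general approach in Python, not a description of how File Analyzer's recipes work; the function name and the hide_ext parameter are hypothetical. It produces one row per directory, with a file count and a summary by extension, and can leave out file types that are not of interest.

```python
import os
from collections import Counter


def directory_table(root: str, hide_ext: tuple[str, ...] = ()):
    """Return one row per directory: (relative path, file count, summary).

    Extensions listed in hide_ext (lower case, with the dot) are left out
    of both the count and the summary.
    """
    rows = []
    for current, dirs, files in os.walk(root):
        dirs.sort()  # visit subdirectories in a predictable order
        counts = Counter(
            os.path.splitext(name)[1].lower() or "(none)" for name in files
        )
        for ext in hide_ext:
            counts.pop(ext, None)
        rel = os.path.relpath(current, root).replace(os.sep, "/")
        summary = ", ".join(f"{ext}:{n}" for ext, n in sorted(counts.items()))
        rows.append((rel, sum(counts.values()), summary))
    return rows
```

The rows can be written out with the csv module and opened in a spreadsheet, where sorting, filtering, and a free-text notes column cover much of the annotation wish list above.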