Re: [CODE4LIB] Processing Circ data

2015-08-06 Thread Harrison G. Dekker
Hi Cynthia,

R would be ideal for the types of data manipulation you describe and would
allow you to automate the entire process. If you can share a sample of your
data and examples of the types of queries you're running, I'd be glad to
help you get started.

If you'd like to keep a relational database in your workflow, check out
SQLite. It's a file format rather than a database server, so it won't be an
issue for your IT staff. There's a Firefox plug-in that provides basic
client functionality, and you can also easily access the tables from R
(directly) or, for that matter, with Access or Excel via ODBC (not what I'd
recommend, but it's possible!).
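
A minimal sketch of the SQLite route, written in Python rather than R only
so that it runs with the standard library alone. The file name circ.sqlite,
the table name items, and the columns bib_id, item_id and checkouts are all
hypothetical stand-ins for whatever the real export contains.

```python
import csv
import io
import os
import sqlite3
import tempfile

# Hypothetical export: one row per item record, with its bib record number
# and total checkouts. A real export would be read from a file instead.
SAMPLE_CSV = """bib_id,item_id,checkouts
b1001,i1,3
b1001,i2,0
b1002,i3,0
b1003,i4,5
b1003,i5,2
"""

# SQLite keeps the whole database in one ordinary file; no server is needed.
db_path = os.path.join(tempfile.mkdtemp(), "circ.sqlite")
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE items (bib_id TEXT, item_id TEXT, checkouts INTEGER)"
)

# Load the CSV rows; the named placeholders match the CSV header names.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
conn.executemany(
    "INSERT INTO items VALUES (:bib_id, :item_id, :checkouts)",
    reader,
)
conn.commit()

# Roll the item rows up to one row per bib record.
totals = conn.execute(
    "SELECT bib_id, COUNT(*) AS n_items, SUM(checkouts) AS total "
    "FROM items GROUP BY bib_id ORDER BY bib_id"
).fetchall()
conn.close()

for row in totals:
    print(row)
```

The same file could then be opened from R or any other SQLite client,
since the database is just that one file on disk.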

Harrison Dekker
Head, Library Data Lab
UC Berkeley Libraries

On Thu, Aug 6, 2015 at 6:05 AM, Harper, Cynthia char...@vts.edu wrote:

 I have compacted the database, and I'm using the Group By SQL query. I
 think I actually am hitting the 2GB limit, because of all the data I have
 for each row. I'm wondering whether the field I added for reserves history
 notes is treated as a fixed-length field for every record, rather than as a
 variable-length field that appears only for the small number of records
 that have been put on reserve.  I suppose if I exported my data in two
 tables - bib and item data, the database would be much more efficient than
 the flat-file approach I've been using.  Time to turn the mind back on,
 rather than just taking the lazy approach every time...
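
The two-table idea can be sketched as follows, assuming a hypothetical
layout: a bib table with one row per bib record and an item table with one
row per item. All table and column names here are made up for illustration,
and the sketch uses SQLite from Python rather than Access, so it says
nothing about how Access itself stores the notes field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bib-level data is stored once per bib record instead of being repeated
# on every item row, as it is in a flat file.
conn.executescript("""
CREATE TABLE bib (
    bib_id TEXT PRIMARY KEY,
    title  TEXT
);
CREATE TABLE item (
    item_id       TEXT PRIMARY KEY,
    bib_id        TEXT NOT NULL REFERENCES bib(bib_id),
    checkouts     INTEGER NOT NULL DEFAULT 0,
    reserve_notes TEXT  -- NULL for items never placed on reserve
);
""")

conn.executemany(
    "INSERT INTO bib VALUES (?, ?)",
    [("b1001", "Title A"), ("b1002", "Title B")],
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?, ?)",
    [
        ("i1", "b1001", 3, None),
        ("i2", "b1001", 0, "On reserve, fall term"),
        ("i3", "b1002", 0, None),
    ],
)

# Join the two tables back together only when bib-level detail is needed.
per_bib = conn.execute(
    "SELECT b.bib_id, b.title, COUNT(i.item_id), SUM(i.checkouts) "
    "FROM bib AS b JOIN item AS i ON i.bib_id = b.bib_id "
    "GROUP BY b.bib_id, b.title ORDER BY b.bib_id"
).fetchall()

for row in per_bib:
    print(row)
```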

 Cindy

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Kevin Ford
 Sent: Wednesday, August 05, 2015 5:16 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Processing Circ data

 On the surface, your difficulties suggest you may need to look at a few
 optimization tactics. Apologies if these are things you've already
 considered and addressed - just offering a suggestion.

 This page [1] is for Access 2003 but the items under "Improve query
 performance" should apply - I think - to newer versions also.  I'll draw
 specific attention to 1) compacting the database; 2) making sure you have
 an index set up on the bib record number field and number of circs field;
 and 3) making sure you are using the GROUP BY SQL syntax [2].
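
Points 2 and 3 can be sketched as below, using SQLite from Python rather
than Access, so the exact syntax and performance behavior in Access may
differ. The table and column names (items, bib_id, circs_this_year) are
hypothetical, and the query assumes that "duplicates with 0 circs" means
bib records with more than one item and no checkouts across all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (bib_id TEXT, item_id TEXT, circs_this_year INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("b1001", "i1", 0),
        ("b1001", "i2", 0),
        ("b1002", "i3", 0),
        ("b1003", "i4", 4),
        ("b1003", "i5", 0),
    ],
)

# Indexes on the grouping field and on the field used in the criterion.
conn.execute("CREATE INDEX idx_items_bib ON items (bib_id)")
conn.execute("CREATE INDEX idx_items_circs ON items (circs_this_year)")

# GROUP BY collapses items to one row per bib record; HAVING then keeps
# only the bibs with more than one item and zero total circs.
dupes_no_circs = conn.execute(
    "SELECT bib_id, COUNT(*) AS copies "
    "FROM items "
    "GROUP BY bib_id "
    "HAVING COUNT(*) > 1 AND SUM(circs_this_year) = 0 "
    "ORDER BY bib_id"
).fetchall()

for row in dupes_no_circs:
    print(row)
```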

 Now, I'm not terribly familiar with Access so I can't actually help you
 with point/click instructions, but the above are common 'gotchas' that
 could be a problem regardless of RDBMS.

 Yours,
 Kevin

 [1] https://support.microsoft.com/en-us/kb/209126
 [2] http://www.w3schools.com/sql/sql_groupby.asp



 On 8/5/15 4:01 PM, Harper, Cynthia wrote:
  Well, I guess it could be bad data, but I don't know how to tell. I
 think I've done more than this before.
 
  I have a Find duplicates query that groups by bib record number.  That
 query seemed to take about 40 minutes to process. Then I added a criterion
 to limit to only records that had 0 circs this year. That query displays
 the rotating cursor, then says "Not Responding", then the cursor, and loops
 through that for hours.  Maybe I can find the bad data in Access, but I'd be
 glad to find more modern data analysis software.  My db is 136,256 KB.
 But adding that extra query will probably put it over the 2GB mark.  I've
 tried extracting to a CSV, and that didn't work. Maybe I'll try a Make
 Table query to a separate db.
 
  Or the OpenRefine suggestion sounds good too.
 
  Cindy Harper
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
  Of Kevin Ford
  Sent: Wednesday, August 05, 2015 4:23 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] Processing Circ data
 
  Hi Cindy,
 
  This doesn't quite address your issue, but, unless you've hit the 2 GB
 Access size limit [1], Access can handle a good deal more than the 250,000
 item records (rows, yes?) you cited.
 
  What makes you think you've hit the limit?  Slowness, something else?
 
  All the best,
  Kevin
 
  [1] https://support.office.com/en-us/article/Access-2010-specifications-1e521481-7f9a-46f7-8ed9-ea9dff1fa854
 
 
 
 
 
  On 8/5/15 3:07 PM, Harper, Cynthia wrote:
  Hi all. What are you using to process circ data for ad-hoc queries?  I
 usually extract csv or tab-delimited files - one row per item record, with
 identifying bib record data, then total checkouts over the given time
 period(s).  I have been importing these into Access then grouping them by
 bib record. I think that I've reached the limits of scalability for Access
 for this project now, with 250,000 item records.  Does anyone do this in
 R?  My other go-to software for data processing is the free version of
 RapidMiner.  Or do you just use MySQL or another SQL database?  I was looking
 into doing it in R with RSQLite (just read about this and sqldf
 http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because ...  I'm
 rusty enough in R that if anyone will give me some start-off data import
 code, that would be great.
 
  Cindy Harper
  E-services and periodicals librarian
  Virginia Theological Seminary
  

Re: [CODE4LIB] Job: Associate Director of Library Technology and Digital Initiatives at Colgate University

2014-09-11 Thread Harrison G. DEKKER
Hi Francis,

Although I'm not sure the timing of this is great for me, this is the type
of job (small college, seemingly forward-looking library, etc.) that I'm
contemplating as a next step in my career. I'd be interested to hear what
you have to say about it. I met you at code4lib in Chicago, by the way.
Have you worked at Colgate?

My main question would be, is Colgate an institution that would be amenable
to the Library being a key provider of research technology instruction and
services (e.g., statistical computing, data viz, data management, etc.)?

Harrison Dekker


On Wed, Sep 10, 2014 at 8:22 AM, Francis Kayiwa kay...@pobox.com wrote:

 If you have any questions on this job I'm happy to talk about it.

 Cheers,
 ./fxk



 On 09/10/2014 09:33 AM, j...@code4lib.org wrote:

 Associate Director of Library Technology and Digital Initiatives
 Colgate University
 Hamilton, NY

 Come join the team at Colgate!


 The Colgate University Libraries seek a collegial and thoughtful
 individual to provide forward-thinking, collaborative, and
 results-oriented leadership for the Colgate University Libraries (CUL)
 through planning and implementation of new technology and the management
 and support of library legacy technologies involving information systems
 and digital infrastructure and programs. Reporting to the University
 Librarian, this individual serves on the Libraries' senior management
 team and participates in the development and implementation of a shared
 vision for CUL's future that supports the mission of the university.
 Using highly effective communication and interpersonal skills, this
 individual will respond to the changing information needs of the Colgate
 community by participating in system-wide planning, policy development,
 and resource and personnel management and build and sustain effective
 working relationships within the Libraries and across and beyond the
 Colgate community. This individual will lead, manage, and plan for the
 Libraries Systems unit, and supervise, evaluate and provide backup for
 the Systems Librarian to oversee, develop and support the integrated
 library system (Innovative Interfaces) particularly its interface with
 the Dematic ASRS.


 Qualifications: Master's degree, such as an MLS or MIS from an
 ALA-accredited program, MS in computer science or other relevant degree.
 Minimum of five years of progressively responsible experience in
 information technology, including experience leading and managing
 information technology or systems operations; successful experience
 supervising, developing, and mentoring information technology
 professionals. Substantive knowledge of digital assets and the technical
 infrastructure required for their life-cycle management, including
 metadata requirements, migration strategies, best practices in digital
 preservation, and relevant national and international standards.


 Additional information about Colgate University, the Colgate Libraries,
 and the full job description can be found at
 http://exlibris.colgate.edu/joinus.html


 Application instructions can be found at
 https://academicjobsonline.org/ajo/jobs/4433. Candidates will need to
 upload a letter of application, curriculum vita, and provide email
 addresses for three references, including current supervisor. Official
 transcripts will be required of candidates selected for an on-campus
 interview.


 Review of application materials will begin on October 10, 2014, and
 continue until the position is filled.


 Colgate is a highly selective private liberal arts university located in
 Hamilton, NY, and is an EO/AA employer. Developing and sustaining a
 diverse faculty, staff, and student body further the university's
 educational mission. Women and candidates from historically
 underrepresented groups are encouraged to apply. Applicants with dual
 career considerations can find postings of other employment
 opportunities at http://www.upstatenyherc.org



 Brought to you by code4lib jobs: http://jobs.code4lib.org/job/16607/
 To post a new job please visit http://jobs.code4lib.org/


 --
 You single-handedly fought your way into this hopeless mess.



Re: [CODE4LIB] Anyone working with iPython?

2013-12-19 Thread Harrison G. DEKKER
Hi Roy,

iPython is huge at UC Berkeley and its creator, Fernando Perez, is
part of the team that will be launching the Berkeley Institute for
Data Science, which incidentally will be based in Doe Library when it
opens in a few months. Here's a blog post about the project:
http://blog.fperez.org/2013/11/an-ambitious-experiment-in-data-science.html

Also of interest, my colleague Raymond Yee uses iPython when he
teaches his Open Data class in the UC Berkeley School of Information.
The class actually publishes their final projects in iPython Notebook
format. You can see their work here:
http://nbviewer.ipython.org/github/fperez/blog/blob/master/130507-Berkeley-iSchool-OpenData.ipynb

I'm sure there are other cool examples of how it's being used in
teaching and science. Seems to me like something that's going to be
around for a while, but admittedly, my perspective is from iPython
ground zero!

-Harrison




On Thu, Dec 19, 2013 at 9:48 AM, Roy Tennant roytenn...@gmail.com wrote:
 Our Wikipedian in Residence, Max Klein brought iPython [1] to my attention
 recently and even in just the little exploration I've done with it so far
 I'm quite impressed. Although you could call it interactive Python that
 doesn't begin to put across the full range of capabilities, as when I first
 heard that I thought Great, a Python shell where you enter a command, hit
 the return, and it executes. Great. Just what I need. NOT. But I was SO
 WRONG.

 It certainly can and does do that, but also so much more. You can enter
 blocks of code that then execute. Those blocks don't even have to be
 Python. They can be Ruby or Perl or bash. There are built-in functions of
 various kinds that it (oddly) calls "magic". But perhaps the killer bit is
 the idea of Notebooks that can capture all of your work in a way that is
 also editable and completely web-ready. This last part is probably
 difficult to understand until you experience it.

 Anyway, I was curious if others have been working with it and if so, what
 they are using it for. I can think of all kinds of things I might want to
 do with it, but hearing from others can inspire me further, I'm sure.
 Thanks,
 Roy

 [1] http://ipython.org/