Re: [CODE4LIB] Processing Circ data
Hi Cynthia,

R would be ideal for the types of data manipulation you describe and would allow you to automate the entire process. If you can share a sample of your data and examples of the types of queries you're running, I'd be glad to help you get started. If you'd like to keep a relational database in your workflow, check out SQLite. It's a file format rather than a database server, so it won't be an issue for your IT staff. There's a Firefox plug-in that provides basic client functionality, and you can also easily access the tables from R (directly) or, for that matter, with Access or Excel via ODBC (not what I'd recommend, but it's possible!).

Harrison Dekker
Head, Library Data Lab
UC Berkeley Libraries

On Thu, Aug 6, 2015 at 6:05 AM, Harper, Cynthia char...@vts.edu wrote:

I have compacted the database, and I'm using the GROUP BY SQL query. I think I actually am hitting the 2GB limit, because of all the data I have for each row. I'm wondering if, having added a field for reserves history notes, that field is treated as fixed-length for every record rather than variable-length, even though it only appears for the small number of records that have been put on reserve. I suppose if I exported my data in two tables - bib and item data - the database would be much more efficient than the flat-file approach I've been using. Time to turn the mind back on, rather than just taking the lazy approach every time...

Cindy

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kevin Ford
Sent: Wednesday, August 05, 2015 5:16 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Processing Circ data

On the surface, your difficulties suggest you may need to look at a few optimization tactics. Apologies if these are things you've already considered and addressed - just offering a suggestion. This page [1] is for Access 2003, but the items under "Improve query performance" should apply - I think - to newer versions also.
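To make the SQLite suggestion concrete: the same GROUP BY workflow described in this thread can be run against a SQLite file from Python's standard library (it would look much the same from R via RSQLite). A minimal sketch - the table and column names here are made up for illustration, not taken from Cynthia's actual data:

```python
import sqlite3

# Hypothetical schema: one row per item record, with its bib record
# number and total checkouts (names are illustrative only).
con = sqlite3.connect(":memory:")  # use a filename, e.g. "circ.db", to persist
con.execute("CREATE TABLE items (item_id TEXT, bib_num TEXT, checkouts INTEGER)")
con.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("i1", "b100", 3), ("i2", "b100", 0), ("i3", "b200", 0)],
)
# An index on the grouping column - one of the optimizations Kevin suggests.
con.execute("CREATE INDEX idx_bib ON items (bib_num)")

# Total checkouts per bib record, mirroring the Access GROUP BY query.
totals = dict(
    con.execute("SELECT bib_num, SUM(checkouts) FROM items GROUP BY bib_num")
)
print(totals)  # {'b100': 3, 'b200': 0}
```

Because the database is a single file, it can be handed around or opened from R, Python, or a GUI client with no server to administer.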
I'll draw specific attention to: 1) compacting the database; 2) making sure you have an index set up on the bib record number field and the number-of-circs field; and 3) making sure you are using the GROUP BY SQL syntax [2]. Now, I'm not terribly familiar with Access, so I can't actually help you with point-and-click instructions, but the above are common 'gotchas' that could be a problem regardless of RDBMS.

Yours,
Kevin

[1] https://support.microsoft.com/en-us/kb/209126
[2] http://www.w3schools.com/sql/sql_groupby.asp

On 8/5/15 4:01 PM, Harper, Cynthia wrote:

Well, I guess it could be bad data, but I don't know how to tell. I think I've done more than this before. I have a Find Duplicates query that groups by bib record number. That query seemed to take about 40 minutes to process. Then I added a criterion to limit to only records that had 0 circs this year. That query displays the rotating cursor, then says "Not Responding", then the cursor, and loops through that for hours. Maybe I can find the Access bad data, but I'd be glad to find more modern data analysis software. My db is 136,256 KB. But adding that extra query will probably put it over the 2GB mark. I've tried extracting to a CSV, and that didn't work. Maybe I'll try a Make Table query to a separate db. Or the OpenRefine suggestion sounds good too.

Cindy Harper

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kevin Ford
Sent: Wednesday, August 05, 2015 4:23 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Processing Circ data

Hi Cindy,

This doesn't quite address your issue, but, unless you've hit the 2 GB Access size limit [1], Access can handle a good deal more than the 250,000 item records (rows, yes?) you cited. What makes you think you've hit the limit? Slowness, something else?

All the best,
Kevin

[1] https://support.office.com/en-us/article/Access-2010-specifications-1e521481-7f9a-46f7-8ed9-ea9dff1fa854

On 8/5/15 3:07 PM, Harper, Cynthia wrote:

Hi all.
What are you using to process circ data for ad-hoc queries? I usually extract CSV or tab-delimited files - one row per item record, with identifying bib record data, then total checkouts over the given time period(s). I have been importing these into Access and then grouping them by bib record. I think that I've reached the limits of scalability for Access for this project now, with 250,000 item records. Does anyone do this in R? My other go-to software for data processing is the free version of RapidMiner. Or do you just use MySQL or another SQL database? I was looking into doing it in R with RSQLite (just read about this and sqldf: http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because... I'm rusty enough in R that if anyone will give me some start-off data import code, that would be great.

Cindy Harper
E-services and periodicals librarian
Virginia Theological Seminary
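The import-then-group-by-bib workflow Cindy describes doesn't strictly need a database at all for ad-hoc runs. A comparable starting point, sketched here in Python's standard library with made-up column names and inline data standing in for the extracted file:

```python
import csv
import io
from collections import defaultdict

# Stand-in for the extracted tab-delimited file: one row per item
# record, with bib record data and total checkouts (columns are
# hypothetical, not from the actual export).
data = io.StringIO(
    "bib_num\ttitle\tcheckouts\n"
    "b100\tExample Title\t3\n"
    "b100\tExample Title\t0\n"
    "b200\tAnother Title\t0\n"
)

# Group by bib record and total the checkouts, as the Access query does.
totals = defaultdict(int)
for row in csv.DictReader(data, delimiter="\t"):
    totals[row["bib_num"]] += int(row["checkouts"])

# Bibs with zero circs over the period - the criterion that was slow in Access.
zero_circ = sorted(bib for bib, n in totals.items() if n == 0)
print(zero_circ)  # ['b200']
```

Swapping `io.StringIO(...)` for `open("export.tsv", newline="")` reads a real file; 250,000 rows is a trivial volume for this kind of streaming pass.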
Re: [CODE4LIB] Job: Associate Director of Library Technology and Digital Initiatives at Colgate University
Hi Francis,

Although I'm not sure the timing of this is great for me, this is the type of job (small college, seemingly forward-looking library, etc.) that I'm contemplating as a next step in my career. I'd be interested to hear what you have to say about it. I met you at code4lib in Chicago, by the way. Have you worked at Colgate? My main question would be: is Colgate an institution that would be amenable to the Library being a key provider of research technology instruction and services (e.g. statistical computing, data viz, data management, etc.)?

Harrison Dekker

On Wed, Sep 10, 2014 at 8:22 AM, Francis Kayiwa kay...@pobox.com wrote:

If you have any questions on this job, I'm happy to talk about it.

Cheers,
./fxk

On 09/10/2014 09:33 AM, j...@code4lib.org wrote:

Associate Director of Library Technology and Digital Initiatives
Colgate University
Hamilton, NY

Come join the team at Colgate! The Colgate University Libraries seek a collegial and thoughtful individual to provide forward-thinking, collaborative, and results-oriented leadership for the Colgate University Libraries (CUL) through planning and implementation of new technology and the management and support of library legacy technologies involving information systems and digital infrastructure and programs. Reporting to the University Librarian, this individual serves on the Libraries' senior management team and participates in the development and implementation of a shared vision for CUL's future that supports the mission of the university. Using highly effective communication and interpersonal skills, this individual will respond to the changing information needs of the Colgate community by participating in system-wide planning, policy development, and resource and personnel management, and will build and sustain effective working relationships within the Libraries and across and beyond the Colgate community.
This individual will lead, manage, and plan for the Libraries' Systems unit, and will supervise, evaluate, and provide backup for the Systems Librarian to oversee, develop, and support the integrated library system (Innovative Interfaces), particularly its interface with the Dematic ASRS.

Qualifications: Master's degree, such as an MLS or MIS from an ALA-accredited program, an MS in computer science, or other relevant degree. Minimum of five years of progressively responsible experience in information technology, including experience leading and managing information technology or systems operations; successful experience supervising, developing, and mentoring information technology professionals. Substantive knowledge of digital assets and the technical infrastructure required for their life-cycle management, including metadata requirements, migration strategies, best practices in digital preservation, and relevant national and international standards.

Additional information about Colgate University, the Colgate Libraries, and the full job description can be found at http://exlibris.colgate.edu/joinus.html. Application instructions can be found at https://academicjobsonline.org/ajo/jobs/4433. Candidates will need to upload a letter of application and curriculum vitae, and provide email addresses for three references, including their current supervisor. Official transcripts will be required of candidates selected for an on-campus interview. Review of application materials will begin on October 10, 2014, and continue until the position is filled.

Colgate is a highly selective private liberal arts university located in Hamilton, NY, and is an EO/AA employer. Developing and sustaining a diverse faculty, staff, and student body furthers the university's educational mission. Women and candidates from historically underrepresented groups are encouraged to apply.
Applicants with dual career considerations can find postings of other employment opportunities at http://www.upstatenyherc.org

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/16607/
To post a new job please visit http://jobs.code4lib.org/

--
You single-handedly fought your way into this hopeless mess.
Re: [CODE4LIB] Anyone working with iPython?
Hi Roy,

IPython is huge at UC Berkeley, and its creator, Fernando Perez, is part of the team that will be launching the Berkeley Institute for Data Science, which, incidentally, will be based in Doe Library when it opens in a few months. Here's a blog post about the project: http://blog.fperez.org/2013/11/an-ambitious-experiment-in-data-science.html

Also of interest: my colleague Raymond Yee uses IPython when he teaches his Open Data class in the UC Berkeley School of Information. The class actually publishes their final projects in IPython Notebook format. You can see their work here: http://nbviewer.ipython.org/github/fperez/blog/blob/master/130507-Berkeley-iSchool-OpenData.ipynb

I'm sure there are other cool examples of how it's being used in teaching and science. Seems to me like something that's going to be around for a while, but admittedly, my perspective is from IPython ground zero!

-Harrison

On Thu, Dec 19, 2013 at 9:48 AM, Roy Tennant roytenn...@gmail.com wrote:

Our Wikipedian in Residence, Max Klein, brought IPython [1] to my attention recently, and even in just the little exploration I've done with it so far I'm quite impressed. Although you could call it "interactive Python", that doesn't begin to put across the full range of capabilities, as when I first heard that I thought "Great, a Python shell where you enter a command, hit the return, and it executes. Great. Just what I need. NOT." But I was SO WRONG. It certainly can and does do that, but also so much more. You can enter blocks of code that then execute. Those blocks don't even have to be Python. They can be Ruby or Perl or bash. There are built-in functions of various kinds that it (oddly) calls "magics". But perhaps the killer bit is the idea of Notebooks, which can capture all of your work in a way that is also editable and completely web-ready. This last part is probably difficult to understand until you experience it.
Anyway, I was curious if others have been working with it and, if so, what they are using it for. I can think of all kinds of things I might want to do with it, but hearing from others can inspire me further, I'm sure.

Thanks,
Roy

[1] http://ipython.org/
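For readers who haven't seen it, the shell escapes and "magics" Roy describes look roughly like this in an interactive session. This is a sketch of IPython's cell syntax, not a runnable plain-Python script, and the filename and shell output shown are purely illustrative:

```
In [1]: total = sum(range(10))

In [2]: total
Out[2]: 45

In [3]: !ls *.csv          # a leading "!" runs a shell command inline
circ_data.csv

In [4]: %%bash             # a cell "magic": the whole cell runs under bash
   ...: wc -l circ_data.csv
250001 circ_data.csv
```

A Notebook saves exactly this kind of session - inputs, outputs, and any narrative text - as a single shareable, re-runnable document, which is what makes the class-projects link above possible.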