I did just bring in my own laptop to see if my problem is unique to my work 
computer.  I actually have used Amazon AWS, and yes, that might be the best 
option.  I've been looking into why my MSAccess job is limited to 25% of my CPU 
time - Maybe Access just can't use multiprocessors.  I'm going to investigate 
SLQite and OpenRefine on my presonal laptop.

Thanks all!
Cindy Harper

-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Kyle 
Banerjee
Sent: Thursday, August 06, 2015 12:34 PM
To: [email protected]
Subject: Re: [CODE4LIB] Processing Circ data

On Wed, Aug 5, 2015 at 1:07 PM, Harper, Cynthia <[email protected]> wrote:

> Hi all. What are you using to process circ data for ad-hoc queries.  I 
> usually extract csv or tab-delimited files - one row per item record, 
> with identifying bib record data, then total checkouts over the given 
> time period(s).  I have been importing these into Access then grouping 
> them by bib record. I think that I've reached the limits of 
> scalability for Access for this project now, with 250,000 item 
> records.  Does anyone do this in R?  My other go-to- software for data 
> processing is RapidMiner free version.  Or do you just use MySQL or 
> other SQL database?  I was looking into doing it in R with RSQLite 
> (just read about this and sqldf 
> http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because I'm sure my 
> IT department will be skeptical of letting me have MySQL on my desktop.
> (I've moved into a much more users-don't-do-real-computing kind of 
> environment).  I'm rusty enough in R that if anyone will give me some 
> start-off data import code, that would be great.
>

As has been mentioned already, it's worth investigating whether OpenRefine or 
sqllite are options for you. If not, I'd be inclined to explore solutions that 
don't rely on your local IT dept.

It's so easy to spend far more time going through approval, procurement, and 
then negotiating local IT security/policies than actually working that it pays 
to do a lot of things on the cloud. There are many services out there, but I 
like Amazon for occasional need things because you can provision anything you 
want in minutes and they're stupid cheap. If all you need is mysql for a few 
minutes now and then, just pay for Relational Database Services. If you'd 
rather have a server and run mysql off it, get an EBS backed EC2 instance (the 
reason to go this route rather than instance store is improved IO and your data 
is all retained if you shut off the server without taking a snapshot). 
Depending on your usage, bills of less than a buck a month are very doable. If 
you need something that runs 24x7, other routes will probably be more 
attractive. Another option is to try the mysql built into cheapo web hosting 
accounts like bluehost, though you might find that your disk IO gets yo!
 u throttled. But it might be worth a shot.

If doing this work on your desktop is acceptable (i.e. other people don't need 
access to this service), you might seriously consider just doing it on a 
personal laptop that you can install anything you want on. In addition to 
mysql, you can also install VirtualBox which is a great environment for 
provisioning servers that you can export to other environments or even carry 
around on your cell phone.

With regards to some of the specific issues you bring up, 40 minutes for a 
query on a database that size is insane which indicates the tool you have is 
not up for the job. Because of the way databases store info, performance 
degrades on a logarthmic (rather than linear) basis on indexed data. In plain 
English, this means even queries on millions of records take surprisingly 
little power. Based on what you've described, changing a field from variable to 
fixed might not save you any space and could even increase it depending on what 
you have. In any case, the difference won't be worth worrying about.

Whatever solution you go with, I'd recommend learning to provision yourself 
resources when you can find some time. Work is hard enough when you can't get 
the resources you need. When you can simply assign them to yourself, the tools 
you need are always at hand so life gets much easier and more fun.

kyle

Reply via email to