Hello, I'd go with a full database solution over sqlite. No limitations.
Thanks, Sent from my iPhone > On Aug 6, 2015, at 11:49 AM, Harper, Cynthia <[email protected]> wrote: > > I did just bring in my own laptop to see if my problem is unique to my work > computer. I actually have used Amazon AWS, and yes, that might be the best > option. I've been looking into why my MSAccess job is limited to 25% of my > CPU time - Maybe Access just can't use multiprocessors. I'm going to > investigate SLQite and OpenRefine on my presonal laptop. > > Thanks all! > Cindy Harper > > -----Original Message----- > From: Code for Libraries [mailto:[email protected]] On Behalf Of Kyle > Banerjee > Sent: Thursday, August 06, 2015 12:34 PM > To: [email protected] > Subject: Re: [CODE4LIB] Processing Circ data > >> On Wed, Aug 5, 2015 at 1:07 PM, Harper, Cynthia <[email protected]> wrote: >> >> Hi all. What are you using to process circ data for ad-hoc queries. I >> usually extract csv or tab-delimited files - one row per item record, >> with identifying bib record data, then total checkouts over the given >> time period(s). I have been importing these into Access then grouping >> them by bib record. I think that I've reached the limits of >> scalability for Access for this project now, with 250,000 item >> records. Does anyone do this in R? My other go-to- software for data >> processing is RapidMiner free version. Or do you just use MySQL or >> other SQL database? I was looking into doing it in R with RSQLite >> (just read about this and sqldf >> http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because I'm sure my >> IT department will be skeptical of letting me have MySQL on my desktop. >> (I've moved into a much more users-don't-do-real-computing kind of >> environment). I'm rusty enough in R that if anyone will give me some >> start-off data import code, that would be great. > > As has been mentioned already, it's worth investigating whether OpenRefine or > sqllite are options for you. If not, I'd be inclined to explore solutions > that don't rely on your local IT dept. > > It's so easy to spend far more time going through approval, procurement, and > then negotiating local IT security/policies than actually working that it > pays to do a lot of things on the cloud. There are many services out there, > but I like Amazon for occasional need things because you can provision > anything you want in minutes and they're stupid cheap. If all you need is > mysql for a few minutes now and then, just pay for Relational Database > Services. If you'd rather have a server and run mysql off it, get an EBS > backed EC2 instance (the reason to go this route rather than instance store > is improved IO and your data is all retained if you shut off the server > without taking a snapshot). Depending on your usage, bills of less than a > buck a month are very doable. If you need something that runs 24x7, other > routes will probably be more attractive. Another option is to try the mysql > built into cheapo web hosting accounts like bluehost, though you might find > that your disk IO gets ! yo! > u throttled. But it might be worth a shot. > > If doing this work on your desktop is acceptable (i.e. other people don't > need access to this service), you might seriously consider just doing it on a > personal laptop that you can install anything you want on. In addition to > mysql, you can also install VirtualBox which is a great environment for > provisioning servers that you can export to other environments or even carry > around on your cell phone. > > With regards to some of the specific issues you bring up, 40 minutes for a > query on a database that size is insane which indicates the tool you have is > not up for the job. Because of the way databases store info, performance > degrades on a logarthmic (rather than linear) basis on indexed data. In plain > English, this means even queries on millions of records take surprisingly > little power. Based on what you've described, changing a field from variable > to fixed might not save you any space and could even increase it depending on > what you have. In any case, the difference won't be worth worrying about. > > Whatever solution you go with, I'd recommend learning to provision yourself > resources when you can find some time. Work is hard enough when you can't get > the resources you need. When you can simply assign them to yourself, the > tools you need are always at hand so life gets much easier and more fun. > > kyle
