Hello,

I'd go with a full database solution over SQLite. No limitations.

Thanks,

> On Aug 6, 2015, at 11:49 AM, Harper, Cynthia <[email protected]> wrote:
> 
> I did just bring in my own laptop to see if my problem is unique to my work 
> computer.  I actually have used Amazon AWS, and yes, that might be the best 
> option.  I've been looking into why my MSAccess job is limited to 25% of my 
> CPU time - Maybe Access just can't use multiprocessors.  I'm going to 
> investigate SQLite and OpenRefine on my personal laptop.
> 
> Thanks all!
> Cindy Harper
> 
> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of Kyle 
> Banerjee
> Sent: Thursday, August 06, 2015 12:34 PM
> To: [email protected]
> Subject: Re: [CODE4LIB] Processing Circ data
> 
>> On Wed, Aug 5, 2015 at 1:07 PM, Harper, Cynthia <[email protected]> wrote:
>> 
>> Hi all. What are you using to process circ data for ad-hoc queries?  I 
>> usually extract csv or tab-delimited files - one row per item record, 
>> with identifying bib record data, then total checkouts over the given 
>> time period(s).  I have been importing these into Access then grouping 
>> them by bib record. I think that I've reached the limits of 
>> scalability for Access for this project now, with 250,000 item 
>> records.  Does anyone do this in R?  My other go-to software for data 
>> processing is RapidMiner free version.  Or do you just use MySQL or 
>> other SQL database?  I was looking into doing it in R with RSQLite 
>> (just read about this and sqldf 
>> http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because I'm sure my 
>> IT department will be skeptical of letting me have MySQL on my desktop.
>> (I've moved into a much more users-don't-do-real-computing kind of 
>> environment).  I'm rusty enough in R that if anyone will give me some 
>> start-off data import code, that would be great.
> 
> As has been mentioned already, it's worth investigating whether OpenRefine or 
> SQLite are options for you. If not, I'd be inclined to explore solutions 
> that don't rely on your local IT dept.
> 
> It's so easy to spend far more time going through approval, procurement, and 
> then negotiating local IT security/policies than actually working that it 
> pays to do a lot of things on the cloud. There are many services out there, 
> but I like Amazon for occasional need things because you can provision 
> anything you want in minutes and they're stupid cheap. If all you need is 
> mysql for a few minutes now and then, just pay for Relational Database 
> Services. If you'd rather have a server and run mysql off it, get an EBS 
> backed EC2 instance (the reason to go this route rather than instance store 
> is improved IO and your data is all retained if you shut off the server 
> without taking a snapshot). Depending on your usage, bills of less than a 
> buck a month are very doable. If you need something that runs 24x7, other 
> routes will probably be more attractive. Another option is to try the mysql 
> built into cheapo web hosting accounts like bluehost, though you might find 
> that your disk IO gets you throttled. But it might be worth a shot.
> 
> If doing this work on your desktop is acceptable (i.e. other people don't 
> need access to this service), you might seriously consider just doing it on a 
> personal laptop that you can install anything you want on. In addition to 
> mysql, you can also install VirtualBox which is a great environment for 
> provisioning servers that you can export to other environments or even carry 
> around on your cell phone.
> 
> With regards to some of the specific issues you bring up, 40 minutes for a 
> query on a database that size is insane which indicates the tool you have is 
> not up for the job. Because of the way databases store info, performance 
> degrades on a logarithmic (rather than linear) basis on indexed data. In plain 
> English, this means even queries on millions of records take surprisingly 
> little power. Based on what you've described, changing a field from variable 
> to fixed might not save you any space and could even increase it depending on 
> what you have. In any case, the difference won't be worth worrying about.
> 
> Whatever solution you go with, I'd recommend learning to provision yourself 
> resources when you can find some time. Work is hard enough when you can't get 
> the resources you need. When you can simply assign them to yourself, the 
> tools you need are always at hand so life gets much easier and more fun.
> 
> kyle

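For the start-off import code asked about earlier in the thread, here is a minimal sketch of the load-then-group-by-bib workflow using SQLite. It is written in Python rather than R, and it makes several assumptions: the column names (item_id, bib_id, title, checkouts), the table name, and the sample rows are all hypothetical stand-ins for whatever the real export contains. The index on bib_id is there to illustrate the point made above about indexed lookups.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for an exported tab-delimited circ file:
# one row per item record, with bib data and a checkout total.
SAMPLE = (
    "item_id\tbib_id\ttitle\tcheckouts\n"
    "i1\tb1\tMoby Dick\t3\n"
    "i2\tb1\tMoby Dick\t5\n"
    "i3\tb2\tEmma\t2\n"
)


def load_items(conn, handle):
    """Load tab-delimited item rows into an SQLite table and index bib_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "item_id TEXT PRIMARY KEY, bib_id TEXT, title TEXT, checkouts INTEGER)"
    )
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [
        (r["item_id"], r["bib_id"], r["title"], int(r["checkouts"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    # Index the grouping column so lookups by bib record do not scan every row.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_items_bib ON items (bib_id)")
    conn.commit()


def checkouts_by_bib(conn):
    """Return (bib_id, title, item_count, total_checkouts) per bib record."""
    return conn.execute(
        "SELECT bib_id, MIN(title), COUNT(*), SUM(checkouts) "
        "FROM items GROUP BY bib_id ORDER BY bib_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
load_items(conn, io.StringIO(SAMPLE))
totals = checkouts_by_bib(conn)
for row in totals:
    print(row)
```

For a real export, the in-memory pieces would be swapped for files, e.g. sqlite3.connect("circ.db") and open("items.tsv", newline=""); both file names are placeholders.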