Re: [CODE4LIB] Processing Circ data

Kevin Ford Wed, 05 Aug 2015 14:17:19 -0700

On the surface, your difficulties suggest you may need look at a fewoptimization tactics. Apologies if these are things you've alreadyconsidered and addressed - just offering a suggestion.

This page [1] is for Access 2003 but the items under "Improve queryperformance" should apply - I think - to newer versions also. I'll drawspecific attention to 1) Compacting the database; 2) making sure youhave an index set up on the bib record number field and number of circsfield; and 3) make sure you are using hte "Group by" sql syntax [2].

Now, I'm not terribly familiar with Access so I can't actually help youwith point/click instructions, but the above are common 'gotchas' thatcould be a problem regardless of RDBMS.


Yours,
Kevin

[1] https://support.microsoft.com/en-us/kb/209126
[2] http://www.w3schools.com/sql/sql_groupby.asp



On 8/5/15 4:01 PM, Harper, Cynthia wrote:

Well, I guess it could be bad data, but I don't know how to tell. I think I've 
done more than this before.

I have a "Find duplicates" query that groups by bib record number.  That query seemed to 
take about 40 minutes to process. Then I added a criterion to limit to only records that had >0 
circs this year. That query displays the rotating cursor, then says "Not Responding", then 
the cursor, and loops through that for hours.  Maybe I can find the Access bad data, but I'd be glad 
to find a more modern data analysis software.  My db is 136,256 kb.  But adding that extra query will 
probably put it over the 2GB mark.  I've tried extracting to a csv, and that didn't work. Maybe I'll 
try a Make table to a separate db.

Or the OpenRefine suggestion sounds good too.

Cindy Harper

-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Kevin 
Ford
Sent: Wednesday, August 05, 2015 4:23 PM
To: [email protected]
Subject: Re: [CODE4LIB] Processing Circ data

Hi Cindy,

This doesn't quite address your issue, but, unless you've hit the 2 GB Access size limit 
[1], Access can handle a good deal more than 250,000 item records ("rows," 
yes?) you cited.

What makes you think you've hit the limit?  Slowness, something else?

All the best,
Kevin

[1]
https://support.office.com/en-us/article/Access-2010-specifications-1e521481-7f9a-46f7-8ed9-ea9dff1fa854





On 8/5/15 3:07 PM, Harper, Cynthia wrote:

Hi all. What are you using to process circ data for ad-hoc queries.  I usually 
extract csv or tab-delimited files - one row per item record, with identifying 
bib record data, then total checkouts over the given time period(s).  I have 
been importing these into Access then grouping them by bib record. I think that 
I've reached the limits of scalability for Access for this project now, with 
250,000 item records.  Does anyone do this in R?  My other go-to- software for 
data processing is RapidMiner free version.  Or do you just use MySQL or other 
SQL database?  I was looking into doing it in R with RSQLite (just read about 
this and sqldf  http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) 
because ...  I'm rusty enough in R that if anyone will give me some start-off 
data import code, that would be great.

Cindy Harper
E-services and periodicals librarian
Virginia Theological Seminary
Bishop Payne Library
3737 Seminary Road
Alexandria VA 22304
[email protected]<mailto:[email protected]>
703-461-1794

Re: [CODE4LIB] Processing Circ data

Reply via email to