Hi,

Continuing the tangent:

We found it confusing when we compared item views for the top items in our 
repository: the numbers on the site-wide stats don't match the item-level 
stats.  Immediately this reduced our confidence in both stats pages: what 
exactly is being counted and are these reliable sources to report on?  

After some digging, we were surprised to learn that: 1) DSpace generates and 
displays stats from two different sources, 2) what's termed "legacy" stats is 
actually the current source for site-level stats, 3) traffic from bots inflates 
both the Solr-based and the log-based stats.  None of this is obvious from the 
UI or the documentation.  

At our institution, the repository manager is a non-technical person who 
(quite reasonably) took the stats at face value.  She did not expect to need a 
long explanation from the system administrator (me) of how the stats 
actually work.

Moving forward, it would be preferable if the same source were used for all 
stats displayed in the UI (site-wide and item-level).  Further, the site-wide 
stats could be reviewed and brought up to date (log processing time? no thanks).

Cheers,

Anthony

-----Original Message-----
From: Mark H. Wood [mailto:mw...@iupui.edu] 
Sent: Wednesday, August 19, 2015 8:55 AM
To: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Administrative Statistics

On Tue, Aug 18, 2015 at 02:12:18PM -0500, Tim Donohue wrote:
> The "Administrative Statistics" are the (very old) legacy DSpace 
> statistics pulled from log files, which pre-dated the Usage Statistics 
> (based on Solr).  They are only generated by running these commandline
> options:
> 
> [dspace]/bin/dspace stat-initial
> [dspace]/bin/dspace stat-general
> [dspace]/bin/dspace stat-monthly
> 
> The only reason they still exist is that the Usage Statistics don't 
> provide all the same information (yet).  The Usage Statistics are much 
> more accurate in providing usage information (as these legacy, 
> log-based stats do not filter out spiders or similar).  But, the 
> legacy, log-based stats do provide some unique administrative 
> statistics, like the counts of the number of actions performed in your 
> DSpace, etc.

[tangent]

I don't find the situation confusing at all.  Service administrators have 
different needs than contributors and editors.  While the mechanism for 
gathering and storing sitewide admin. statistics might be improvable, I think 
we ought to look at bringing them up to date and fleshing them out with other 
information that admin.s would want.

As an example, people interested in the content will appreciate having robot 
accesses filtered out, but admin.s might profit from seeing filtered and 
unfiltered counts side by side.  Even more so if they can sample these counts 
mechanically and accumulate them for visualization.
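
To make the side-by-side idea concrete, here is a minimal Python sketch.  It is 
not DSpace code: the (item_id, user_agent) event records, the count_views 
helper and the BOT_PATTERN regular expression are all assumptions made for 
illustration, and a real deployment would rely on a maintained spider list 
rather than a three-word pattern.

```python
import re
from collections import Counter

# Hypothetical bot pattern, for illustration only.
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def count_views(events):
    """Return {item_id: (unfiltered, filtered)} for (item_id, user_agent) pairs."""
    unfiltered = Counter()
    filtered = Counter()
    for item_id, user_agent in events:
        unfiltered[item_id] += 1
        # Count the view as "filtered" only if the agent does not look like a bot.
        if not BOT_PATTERN.search(user_agent):
            filtered[item_id] += 1
    return {item: (unfiltered[item], filtered[item]) for item in unfiltered}


# Made-up sample events; real data would come from logs or the usage store.
sample_events = [
    ("item-1", "Mozilla/5.0 (X11; Linux x86_64)"),
    ("item-1", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("item-2", "Mozilla/5.0 (Windows NT 10.0)"),
    ("item-1", "Mozilla/5.0 (compatible; bingbot/2.0)"),
]

for item, (raw, human) in sorted(count_views(sample_events).items()):
    print(f"{item}: unfiltered={raw} filtered={human}")
```

Running something like this on a schedule and appending the output to a file 
would give the accumulated samples mentioned above.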

Some other stuff I get asked for includes simple counts of how much stuff we 
have:  how many Items, how many Bitstreams, how many image Bitstreams.  End 
users don't care about such things, but senior administrators do.

For the future, another thing that our biggest statistical consumers want very 
much is views aggregated by *author*.  I'm looking forward to first-class 
support for author identities so that we can do this well.

[tired old refrain]

"Statistics" doesn't mean the same thing to everyone.  It may not mean the same 
thing to *anyone*.

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
