Hi all,

I'm working on DSpace 4->5 upgrades and I've run into a problem with the 
data in the solr-based usage statistics. Before I open a Jira issue, I 
thought I'd check whether anyone else has seen the same thing; we've 
heavily customised the usage statistics, so perhaps the problem affects 
only "my" repositories.

In DSpace 5, the fields for geographical information in the usage stats 
were changed in the solr schema to have docValues="true" 
(https://github.com/DSpace/DSpace/commit/8e2f87e75548b48ad44c6257b47bf45af3e5b4ef).
 
What I'm now seeing is that the statistics pages only show geographical 
information for visits from the upgrade onwards. Manually running the 
corresponding facet queries via curl shows the same, so the issue is 
definitely in what is coming out of solr, not in the DSpace code.
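
For anyone who wants to run the same check, here is a minimal sketch of the kind of facet query I mean. The core URL and the field name `countryCode` are assumptions about a typical setup and the helper names are made up for illustration, so adjust them to your installation. The last helper is only meaningful for single-valued fields.

```python
from urllib.parse import urlencode

# Assumed location of the statistics core; adjust for your installation.
SOLR_SELECT_URL = "http://localhost:8080/solr/statistics/select"


def facet_query_url(field, query="*:*", base_url=SOLR_SELECT_URL):
    """Build a select URL that returns only facet counts for one field."""
    params = [
        ("q", query),
        ("rows", "0"),            # no documents needed, only the counts
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", "-1"),    # return every facet value
        ("facet.mincount", "1"),
    ]
    return base_url + "?" + urlencode(params)


def facet_counts(response, field):
    """Turn solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def docs_without_facet_value(response, field):
    """Matching documents that do not show up under any facet value."""
    total = response["response"]["numFound"]
    return total - sum(facet_counts(response, field).values())
```

Fetch `facet_query_url("countryCode")` with curl, parse the JSON and pass it to `docs_without_facet_value`. A large number there, on data that you know has country information, would point to the same problem.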

I believe the issue is that once docValues is enabled for a field, 
facet query results only include those solr documents that actually 
have docValues data for that field (see 
http://www.signaldump.org/solr/qpod/15072/adding-docvalues-after-or-in-the-middle-of-indexing).
 
The pre-upgrade documents don't have docValues, so they are omitted from 
the facets used by that part of the statistics pages. Setting 
docValues="false" and _then_ doing the upgrade retains all geographical 
data in the stats (but of course throws away the performance 
improvements gained by enabling docValues).

I'm assuming I might be able to fix the docValues problem by re-indexing 
all documents in the index, and perhaps the solr index auto-upgrade 
tries to do just that. However, quite a few documents in the index don't 
have a uid (the required unique key field). I'm wondering whether I'm seeing 
this issue only because we've strung along our data/customisations 
across several updates. Back when the solr-based usage stats first 
became a part of DSpace, the usage stats solr documents didn't have any 
unique identifiers. It looks like this was added for DSpace 3: 
https://github.com/DSpace/DSpace/commit/808bc6fc5d169f4996523c20a101a30e3e8c6a43#diff-c4dd1d4c13c979500d59399da9c0e861R325
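
To get a feel for how many documents are affected, something like the following sketch could be used. Again, the core URL is an assumption and the helper names are made up. The second helper shows what preparing a document for re-indexing might look like, assuming that a random UUID string is an acceptable uid value. I haven't verified that this is what the auto-upgrade does.

```python
import uuid
from urllib.parse import urlencode

# Assumed location of the statistics core; adjust for your installation.
SOLR_SELECT_URL = "http://localhost:8080/solr/statistics/select"


def missing_uid_count_url(base_url=SOLR_SELECT_URL):
    """Build a select URL whose numFound is the number of docs without a uid."""
    params = [
        ("q", "*:*"),
        ("fq", "-uid:[* TO *]"),  # negative range query: uid has no value
        ("rows", "0"),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


def with_uid(doc):
    """Return a copy of doc that is ready to be sent back to solr."""
    copy = dict(doc)
    # Drop solr's internal version field so it doesn't clash on re-add.
    copy.pop("_version_", None)
    if not copy.get("uid"):
        # Assumption: any unique string will do as the uid.
        copy["uid"] = str(uuid.uuid4())
    return copy
```

Documents that already have a uid keep it, so running `with_uid` over an export of the whole index should only touch the old, pre-3.0 records.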
 


So -- has anyone upgraded a repository with pre-3.0 usage stats data to 
5.x? Did the geographical information in the usage stats make it along? 
I'm talking about the "Top country views" / "Top city views" sections on 
the item stats pages, e.g. 
http://demo.dspace.org/xmlui/handle/10673/3/statistics

cheers,
Andrea

-- 
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand


_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
