Hi all, I'm working on DSpace 4->5 upgrades and I've run into a problem with the data in the solr-based usage statistics. Before I open a Jira issue, I thought I'd check whether anyone else has seen the same thing; we've heavily customised the usage statistics, so perhaps the problem affects only "my" repositories.
In DSpace 5, the fields for geographical information in the usage stats were changed in the solr schema to have docValues="true" (https://github.com/DSpace/DSpace/commit/8e2f87e75548b48ad44c6257b47bf45af3e5b4ef). What I'm now seeing is that the statistics pages only show geographical information for visits from the upgrade onwards. Manually running the corresponding facet queries via curl shows the same, so the issue is definitely in what is coming out of solr, not in the DSpace code.

I believe the cause is that once docValues are enabled for a field, facet query results only include those solr documents that actually have docValues for it (see http://www.signaldump.org/solr/qpod/15072/adding-docvalues-after-or-in-the-middle-of-indexing). The pre-upgrade documents don't have docValues, so they are omitted from the facets used by that part of the statistics pages. Setting docValues="false" and _then_ doing the upgrade retains all geographical data in the stats (but of course throws away the performance improvements gained by enabling docValues).

I'm assuming I might be able to fix the docValues problem by re-indexing all documents in the index, and perhaps the solr index auto-upgrade tries to do just that. However, quite a few documents in the index don't have a uid (the required unique key field).

I'm wondering whether I'm seeing this issue only because we've strung along our data/customisations across several updates. Back when the solr-based usage stats first became a part of DSpace, the usage stats solr documents didn't have any unique identifiers. It looks like the uid was added for DSpace 3: https://github.com/DSpace/DSpace/commit/808bc6fc5d169f4996523c20a101a30e3e8c6a43#diff-c4dd1d4c13c979500d59399da9c0e861R325

So -- has anyone upgraded a repository with pre-3.0 usage stats data to 5.x? Did the geographical information in the usage stats make it along?
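
For anyone who wants to run the same checks against their own index, here is a minimal sketch in Python that only builds the query URLs (to paste into curl or a browser). Everything specific in it is an assumption rather than something taken from the DSpace code: the core URL `http://localhost:8080/solr/statistics/select`, the field names `countryCode` and `city`, and the helper names `facet_url` and `missing_field_url` are all illustrative and need adjusting to the local setup. Only `uid` is the field name mentioned above.

```python
from urllib.parse import urlencode

# Assumption: location of the usage statistics core; adjust host/port/path.
SOLR_SELECT = "http://localhost:8080/solr/statistics/select"


def facet_url(field, base=SOLR_SELECT):
    """Build a URL that facets over `field` across all documents.

    rows=0 suppresses the documents themselves, so the response
    contains only numFound and the facet counts.
    """
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 1,
        "wt": "json",
    }
    return base + "?" + urlencode(params)


def missing_field_url(field, base=SOLR_SELECT):
    """Build a URL that counts documents with no value for `field`.

    The query '*:* -field:[* TO *]' matches every document and then
    excludes those that have any value in `field`.
    """
    params = {
        "q": "*:* -%s:[* TO *]" % field,
        "rows": 0,
        "wt": "json",
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed names of the geographical fields in the statistics schema.
    for geo_field in ("countryCode", "city"):
        print(facet_url(geo_field))
    # Documents without the unique key.
    print(missing_field_url("uid"))
```

If the facet counts returned for a geographical field add up to far fewer documents than the index holds for that field, that would be consistent with the older documents being left out of the facets; numFound from the `uid` query gives the number of documents that have no unique key.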

I'm talking about the "Top country views" / "Top city views" sections on the item stats pages, e.g. http://demo.dspace.org/xmlui/handle/10673/3/statistics

cheers,
Andrea

-- 
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel