[
https://jira.duraspace.org/browse/DS-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=23063#comment-23063
]
Kevin Van de Velde commented on DS-892:
---------------------------------------
This issue was discussed in the DSpace developers meeting on 2011-11-16, adding
meeting transcripts below.
[20:02] <tdonohue> we kick off JIRA review with DS-892
[20:02] <kompewter> [ https://jira.duraspace.org/browse/DS-892 ] - [#DS-892]
Performance issues in update enabling the StatisticsLoggingConsumer - DuraSpace
JIRA
[20:03] <kshepherd> morning!
[20:03] <mhwood> Afternoon!
[20:03] <PeterDietz> hi
[20:04] <kshepherd> ok, 892 looks like a discussion more than a patch ;)
[20:04] <kshepherd> but it sounds like a useful discussion
[20:04] <tdonohue> true, now that I read it I see it is more a discussion (even
though it's marked as a "bug")
[20:04] <mhwood> It's both?
[20:04] <kshepherd> maybe ask the folks with the biggest stats indexes to chip
in with their experiences in the comments (experiences of autocommit impact,
how long batch updates take, etc)
[20:05] <KevinVdV> Well my experience is the fastest way to update is how it is
done in DS-599
[20:05] <kompewter> [ https://jira.duraspace.org/browse/DS-599 ] - [#DS-599]
SOLR statistics file download displays all files and not only those in the
Bundle Original - DuraSpace JIRA
[20:05] <kshepherd> KevinVdV's patch to remove non-original bitstream entries
will help get some index sizes down
[20:06] <PeterDietz> What does StatsLogConsumer not enabled by default mean?
[20:06] <tdonohue> mhwood -- actually you are right. It is a "bug" cause it is
a problem in existing code (even though it's not enabled by default).
[20:06] <mhwood> If it was just slow, I'd say: find out why it's slow. But if
auto-commit interferes, then we need to step back and look at the whole process.
[20:06] <kshepherd> hm, well, ok
[20:07] <KevinVdV> @ PeterDietz: if an item moves from the collection the stats
records would NOT be updated, the consumer takes care of this
[20:07] <KevinVdV> Updating all the records individually
[20:07] * stuartlewis ([email protected]) has joined
#duraspace
[20:07] <KevinVdV> Which can take a long time
[20:07] <mhwood> Pull it out into another thread.
[20:07] <kshepherd> stuartlewis: DS-892
[20:07] <kompewter> [ https://jira.duraspace.org/browse/DS-892 ] - [#DS-892]
Performance issues in update enabling the StatisticsLoggingConsumer - DuraSpace
JIRA
[20:09] <PeterDietz> so if you didn't want to pay penalty during changes, you'd
need a way to invoke statslogConsumer by cron
[20:09] <kshepherd> PeterDietz: i think the suggestion is to do batch updates
from logs rather than use consumer at all
[20:10] <kshepherd> or did i misunderstand?
[20:10] <KevinVdV> That could be doable...
[20:10] <PeterDietz> and then the other issue is stat activity that occurs when
I might be restarting tomcat to change in input-form or something.. the last
entries, are not committed to solr
[20:10] <mhwood> So we need a listener to catch container shutdown and flush?
[20:10] <PeterDietz> hmm. okay, well then if you want to disable the real-time
stat logger, then everything is already built
[20:11] <PeterDietz> stats-log-converter, stats-log-importer
[20:11] <PeterDietz> no loss
[20:12] <PeterDietz> mhwood: Right, mdiggory had hinted at something like that,
I haven't seen any traction
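
Editor's sketch of the "catch shutdown and flush" idea raised by mhwood and PeterDietz above. This is a minimal illustration only, not DSpace code: `BufferedStatsLogger`, `commit`, `max_pending`, and `install_shutdown_flush` are hypothetical names, and the `commit` callable merely stands in for whatever actually writes to Solr. In a servlet container the equivalent hook would presumably live in a `ServletContextListener.contextDestroyed` implementation rather than in Python's `atexit`.

```python
import atexit
from typing import Callable, List


class BufferedStatsLogger:
    """Buffers usage events and hands them to `commit` in batches (hypothetical)."""

    def __init__(self, commit: Callable[[List[str]], None], max_pending: int = 100):
        self._commit = commit          # stand-in for the real index writer
        self._pending: List[str] = []  # events logged but not yet committed
        self._max_pending = max_pending

    def log(self, event: str) -> None:
        """Record one event; commit automatically once the buffer is full."""
        self._pending.append(event)
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self) -> None:
        """Commit whatever is pending; a no-op when the buffer is empty."""
        if self._pending:
            self._commit(list(self._pending))
            self._pending.clear()


def install_shutdown_flush(logger: BufferedStatsLogger) -> None:
    """Flush pending events at interpreter exit so the last entries are not lost."""
    atexit.register(logger.flush)
```

Because `flush` does nothing on an empty buffer, calling it both from a periodic commit and from the shutdown hook is harmless.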
> Performance issues in update enabling the StatisticsLoggingConsumer
> -------------------------------------------------------------------
>
> Key: DS-892
> URL: https://jira.duraspace.org/browse/DS-892
> Project: DSpace
> Issue Type: Bug
> Components: Solr
> Affects Versions: 1.6.0, 1.6.1, 1.6.2, 1.7.0, 1.7.1
> Reporter: Andrea Bollini
> Assignee: Kevin Van de Velde
> Priority: Critical
>
> We have found that when the StatisticsLoggingConsumer is enabled to keep
> statistics data up to date after item changes (metadata edits or collection
> moves/mappings), item update operations become slow and the system becomes
> unusable.
> NOTE: the StatisticsLoggingConsumer is NOT enabled out of the box in
> dspace.cfg. This implies that your statistics data could be inconsistent (item
> accesses assigned to incorrect communities/collections).
> We noticed problems when there is a large amount of statistics data (> 20M
> records); for a small repository (< 1M statistics records) the overhead is
> acceptable.
> Finally, since the introduction of the autocommit patch, the
> StatisticsLoggingConsumer can no longer guarantee data consistency,
> because statistics data collected between two auto-commits is not
> processed by the class.
> Our current idea is to discard the consumer approach in favour of implementing
> a batch tool that periodically analyzes the statistics data and fixes it as
> appropriate.
> This issue is a placeholder for that feature and the discussion around it.
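
Editor's sketch of the batch approach proposed in the description above: periodically walk the statistics records and correct any whose owning collection no longer matches the item's current one. Everything here is an assumption made for illustration. `StatRecord`, `owning_collection`, `current_owner`, and `reconcile` are hypothetical names, the records are plain in-memory objects, and nothing below reflects the actual DSpace Solr schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StatRecord:
    """One usage-statistics record (hypothetical, simplified shape)."""
    item_id: int
    owning_collection: int


def reconcile(records: List[StatRecord], current_owner: Dict[int, int]) -> int:
    """Fix records whose owning collection is stale.

    `current_owner` maps item id -> the collection that owns the item now.
    Records for items missing from the map are left untouched.
    Returns the number of records that were changed.
    """
    fixed = 0
    for record in records:
        owner = current_owner.get(record.item_id)
        if owner is not None and record.owning_collection != owner:
            record.owning_collection = owner
            fixed += 1
    return fixed
```

Running such a pass from a scheduler keeps the cost out of the item-update path, and a second pass over already-corrected data changes nothing, so overlapping or repeated runs are safe.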