[ 
https://jira.duraspace.org/browse/DS-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=23063#comment-23063
 ] 

Kevin Van de Velde commented on DS-892:
---------------------------------------

This issue was discussed in the DSpace developers meeting on 2011-11-16. The 
meeting transcript is included below.


[20:02] <tdonohue> we kick off JIRA review with DS-892
[20:02] <kompewter> [ https://jira.duraspace.org/browse/DS-892 ] - [#DS-892] 
Performance issues in update enabling the StatisticsLoggingConsumer - DuraSpace 
JIRA
[20:03] <kshepherd> morning!
[20:03] <mhwood> Afternoon!
[20:03] <PeterDietz> hi
[20:04] <kshepherd> ok, 892 looks like a discussion more than a patch ;)
[20:04] <kshepherd> but it sounds like a useful discussion
[20:04] <tdonohue> true, now that I read it I see it is more a discussion (even 
though it's marked as a "bug")
[20:04] <mhwood> It's both?
[20:04] <kshepherd> maybe ask the folks with the biggest stats indexes to chip 
in with their experiences in the comments (experiences of autocommit impact, 
how long batch updates take, etc)
[20:05] <KevinVdV> Well my experience is the fastest way to update is how it is 
done in DS-599
[20:05] <kompewter> [ https://jira.duraspace.org/browse/DS-599 ] - [#DS-599] 
SOLR statistics file download displays all files and not only those in the 
Bundle Original - DuraSpace JIRA
[20:05] <kshepherd> KevinVdV's patch to remove non-original bitstream entries 
will help get some index sizes down
[20:06] <PeterDietz> What does StatsLogConsumer not enabled by default mean?
[20:06] <tdonohue> mhwood -- actually you are right. It is a "bug" cause it is 
a problem in existing code (even though it's not enabled by default).
[20:06] <mhwood> If it was just slow, I'd say: find out why it's slow. But if 
auto-commit interferes, then we need to step back and look at the whole process.
[20:06] <kshepherd> hm, well, ok
[20:07] <KevinVdV> @ PeterDietz: if an item moves from the collection the stats 
records would NOT be updated, the consumer takes care of this
[20:07] <KevinVdV> Updating all the records individually
[20:07] * stuartlewis (~stuart...@gendiglt02.lbr.auckland.ac.nz) has joined 
#duraspace
[20:07] <KevinVdV> Which can take a long time
[20:07] <mhwood> Pull it out into another thread.
[20:07] <kshepherd> stuartlewis: DS-892
[20:07] <kompewter> [ https://jira.duraspace.org/browse/DS-892 ] - [#DS-892] 
Performance issues in update enabling the StatisticsLoggingConsumer - DuraSpace 
JIRA
[20:09] <PeterDietz> so if you didn't want to pay penalty during changes, you'd 
need a way to invoke statslogConsumer by cron
[20:09] <kshepherd> PeterDietz: i think the suggestion is to do batch updates 
from logs rather than use consumer at all
[20:10] <kshepherd> or did i misunderstand?
[20:10] <KevinVdV> That could be doable...
[20:10] <PeterDietz> and then the other issue is stat activity that occurs when 
I might be restarting tomcat to change an input-form or something.. the last 
entries are not committed to solr
[20:10] <mhwood> So we need a listener to catch container shutdown and flush?
[20:10] <PeterDietz> hmm. okay, well then if you want to disable the real-time 
stat logger, then everything is already built
[20:11] <PeterDietz> stats-log-converter, stats-log-importer
[20:11] <PeterDietz> no loss
[20:12] <PeterDietz> mhwood: Right, mdiggory had hinted at something like that, 
I haven't seen any traction
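
The "catch container shutdown and flush" idea raised at 20:10 can be sketched as 
follows. This is only a minimal illustration under stated assumptions: 
PendingStatsBuffer is a hypothetical class, not part of DSpace or Solr, and 
flush() merely stands in for a real Solr commit. The sketch uses a plain JVM 
shutdown hook; inside a servlet container the equivalent place to call flush() 
would more likely be ServletContextListener.contextDestroyed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer of pending statistics events; not a DSpace or Solr class. */
final class PendingStatsBuffer {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    /** Records an event that has not been committed yet. */
    synchronized void log(String event) {
        pending.add(event);
    }

    /** Moves all pending events to the committed list; stands in for a Solr commit. */
    synchronized int flush() {
        int flushed = pending.size();
        committed.addAll(pending);
        pending.clear();
        return flushed;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    synchronized int committedCount() {
        return committed.size();
    }

    /** Registers a JVM shutdown hook that flushes whatever is still pending. */
    void flushOnShutdown() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::flush));
    }
}
```

With something like this in place, entries logged just before a restart would 
be flushed instead of being lost between two auto-commits.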
                
> Performance issues in update enabling the StatisticsLoggingConsumer
> -------------------------------------------------------------------
>
>                 Key: DS-892
>                 URL: https://jira.duraspace.org/browse/DS-892
>             Project: DSpace
>          Issue Type: Bug
>          Components: Solr
>    Affects Versions: 1.6.0, 1.6.1, 1.6.2, 1.7.0, 1.7.1
>            Reporter: Andrea Bollini
>            Assignee: Kevin Van de Velde
>            Priority: Critical
>
> We have found that when the StatisticsLoggingConsumer is enabled to keep 
> statistics data up to date after item changes (metadata edits or collection 
> moves/mappings), item update operations become slow and the system becomes 
> unusable.
> NOTE: the StatisticsLoggingConsumer is NOT enabled out of the box in 
> dspace.cfg. This implies that your statistics data could be inconsistent 
> (item accesses assigned to incorrect communities/collections).
> We noticed problems when there is a large amount of statistics data (> 20M 
> records); for a small repository (< 1M statistics records) the overhead is 
> acceptable.
> Finally, since the introduction of the autocommit patch, the 
> StatisticsLoggingConsumer can no longer guarantee data consistency, 
> because statistics data collected between two auto-commits is not 
> processed by the class.
> Our current idea is to discard the consumer approach in favour of 
> implementing a batch tool that periodically analyzes the statistics data and 
> fixes it as appropriate.
> This issue is a placeholder for that feature and the discussion around it.
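
To make the batch idea in the description concrete, here is a rough sketch of 
what one reconciliation pass might look like. Everything in it is an 
assumption for illustration: StatRecord and StatsReconciler are hypothetical 
names, the records live in memory rather than in Solr, and the map of current 
collections stands in for a lookup against the repository. It is not the 
approach taken in any DSpace patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one usage-statistics record; not a DSpace class. */
final class StatRecord {
    final String itemId;
    String owningCollection;

    StatRecord(String itemId, String owningCollection) {
        this.itemId = itemId;
        this.owningCollection = owningCollection;
    }
}

/** Hypothetical batch job that realigns stale collection ids in a single pass. */
final class StatsReconciler {

    /**
     * Updates every record whose collection differs from the item's current one.
     * Returns the number of records that were changed.
     */
    static int reconcile(List<StatRecord> records,
                         Map<String, String> currentCollectionByItem) {
        int fixed = 0;
        for (StatRecord record : records) {
            String current = currentCollectionByItem.get(record.itemId);
            // Skip items missing from the map and records that are already correct.
            if (current != null && !current.equals(record.owningCollection)) {
                record.owningCollection = current;
                fixed++;
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        List<StatRecord> records = new ArrayList<>();
        records.add(new StatRecord("item-1", "col-A"));
        records.add(new StatRecord("item-2", "col-B"));

        // item-1 has been moved from col-A to col-C since its hit was logged.
        Map<String, String> current = new HashMap<>();
        current.put("item-1", "col-C");
        current.put("item-2", "col-B");

        System.out.println("records fixed: " + reconcile(records, current));
    }
}
```

Run periodically (for example from cron), a pass like this would move the 
update cost out of the item edit path, at the price of statistics being 
briefly out of date between runs.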

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://jira.duraspace.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
