[ https://jira.duraspace.org/browse/DS-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=23063#comment-23063 ]
Kevin Van de Velde commented on DS-892:
---------------------------------------

This issue was discussed in the DSpace developers meeting on 2011-11-16; the meeting transcript is included below.

[20:02] <tdonohue> we kick off JIRA review with DS-892
[20:02] <kompewter> [ https://jira.duraspace.org/browse/DS-892 ] - [#DS-892] Performance issues in update enabling the StatisticsLoggingConsumer - DuraSpace JIRA
[20:03] <kshepherd> morning!
[20:03] <mhwood> Afternoon!
[20:03] <PeterDietz> hi
[20:04] <kshepherd> ok, 892 looks like a discussion more than a patch ;)
[20:04] <kshepherd> but it sounds like a useful discussion
[20:04] <tdonohue> true, now that I read it I see it is more a discussion (even though it's marked as a "bug")
[20:04] <mhwood> It's both?
[20:04] <kshepherd> maybe ask the folks with the biggest stats indexes to chip in with their experiences in the comments (experiences of autocommit impact, how long batch updates take, etc)
[20:05] <KevinVdV> Well my experience is the fastest way to update is how it is done in DS-599
[20:05] <kompewter> [ https://jira.duraspace.org/browse/DS-599 ] - [#DS-599] SOLR statistics file download displays all files and not only those in the Bundle Original - DuraSpace JIRA
[20:05] <kshepherd> KevinVdV's patch to remove non-original bitstream entries will help get some index sizes down
[20:06] <PeterDietz> What does StatsLogConsumer not enabled by default mean?
[20:06] <tdonohue> mhwood -- actually you are right. It is a "bug" cause it is a problem in existing code (even though it's not enabled by default).
[20:06] <mhwood> If it was just slow, I'd say: find out why it's slow. But if auto-commit interferes, then we need to step back and look at the whole process.
[20:06] <kshepherd> hm, well, ok
[20:07] <KevinVdV> @ PeterDietz: if an item moves from the collection the stats records would NOT be updated, the consumer takes care of this
[20:07] <KevinVdV> Updating all the records individually
[20:07] * stuartlewis (~stuart...@gendiglt02.lbr.auckland.ac.nz) has joined #duraspace
[20:07] <KevinVdV> Which can take a long time
[20:07] <mhwood> Pull it out into another thread.
[20:07] <kshepherd> stuartlewis: DS-892
[20:07] <kompewter> [ https://jira.duraspace.org/browse/DS-892 ] - [#DS-892] Performance issues in update enabling the StatisticsLoggingConsumer - DuraSpace JIRA
[20:09] <PeterDietz> so if you didn't want to pay penalty during changes, you'd need a way to invoke statslogConsumer by cron
[20:09] <kshepherd> PeterDietz: i think the suggestion is to do batch updates from logs rather than use consumer at all
[20:10] <kshepherd> or did i misunderstand?
[20:10] <KevinVdV> That could be doable...
[20:10] <PeterDietz> and then the other issue is stat activity that occurs when I might be restarting tomcat to change in input-form or something.. the last entries, are not committed to solr
[20:10] <mhwood> So we need a listener to catch container shutdown and flush?
[20:10] <PeterDietz> hmm. okay, well then if you want to disable the real-time stat logger, then everything is already built
[20:11] <PeterDietz> stats-log-converter, stats-log-importer
[20:11] <PeterDietz> no loss
[20:12] <PeterDietz> mhwood: Right, mdiggory had hinted at something like that, I haven't seen any traction

> Performance issues in update enabling the StatisticsLoggingConsumer
> -------------------------------------------------------------------
>
>                 Key: DS-892
>                 URL: https://jira.duraspace.org/browse/DS-892
>             Project: DSpace
>          Issue Type: Bug
>          Components: Solr
>    Affects Versions: 1.6.0, 1.6.1, 1.6.2, 1.7.0, 1.7.1
>            Reporter: Andrea Bollini
>            Assignee: Kevin Van de Velde
>            Priority: Critical
>
> We have found that when the StatisticsLoggingConsumer is enabled to keep
> statistics data up to date after item changes (metadata edits or collection
> moves/mappings), item update operations become slow and the system becomes
> unusable.
> NOTE: the StatisticsLoggingConsumer is NOT enabled out of the box in
> dspace.cfg; this implies that your statistics data could be inconsistent
> (item accesses assigned to incorrect communities/collections).
> We noticed problems when there is a large amount of statistics data (> 20M
> records); for small repositories (< 1M statistics records) the overhead is
> acceptable.
> Finally, after the introduction of the autocommit patch, the
> StatisticsLoggingConsumer is no longer able to ensure data consistency,
> because the statistics data collected between two auto-commits is not
> processed by the class.
> Our current idea is to discard the consumer approach in favour of
> implementing a batch tool that periodically analyzes the statistics data and
> fixes it as appropriate.
> This issue is a placeholder for that feature and the discussion around it.
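
As a rough illustration of the batch idea in the issue description, the sketch below compares the collection stored on each statistics record with the item's current collection and groups the stale records by item, so that each moved item could be corrected in one bulk update rather than record by record. Every name here (StatsBatchFixSketch, StatRecord, findStaleRecords) is hypothetical, and nothing in it is DSpace or Solr API; it assumes the records and the item-to-collection mapping have already been loaded into memory, and it skips records whose item is no longer known.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: these types are NOT part of DSpace or Solr.
class StatsBatchFixSketch {

    /** Illustrative stand-in for one usage-statistics record. */
    static final class StatRecord {
        final String recordId;
        final String itemId;
        final String owningCollection;

        StatRecord(String recordId, String itemId, String owningCollection) {
            this.recordId = recordId;
            this.itemId = itemId;
            this.owningCollection = owningCollection;
        }
    }

    /**
     * Returns, per item, the ids of records whose stored collection differs
     * from the item's current collection. Records for items missing from the
     * map are skipped (an assumption; a real tool would need a policy here).
     */
    static Map<String, List<String>> findStaleRecords(
            List<StatRecord> records,
            Map<String, String> currentCollectionByItem) {
        Map<String, List<String>> staleByItem = new TreeMap<>();
        for (StatRecord r : records) {
            String current = currentCollectionByItem.get(r.itemId);
            if (current != null && !current.equals(r.owningCollection)) {
                staleByItem
                    .computeIfAbsent(r.itemId, k -> new ArrayList<>())
                    .add(r.recordId);
            }
        }
        return staleByItem;
    }

    public static void main(String[] args) {
        List<StatRecord> records = Arrays.asList(
            new StatRecord("r1", "item-1", "col-A"),
            new StatRecord("r2", "item-1", "col-A"),
            new StatRecord("r3", "item-2", "col-B"));

        // item-1 has moved from col-A to col-C; item-2 has not moved.
        Map<String, String> current = new HashMap<>();
        current.put("item-1", "col-C");
        current.put("item-2", "col-B");

        // One entry per moved item, i.e. one bulk update per item.
        findStaleRecords(records, current).forEach(
            (item, ids) -> System.out.println(item + " -> " + ids));
    }
}
```

Run from cron, a job along these lines would move the update cost out of the item-edit path, which is the trade-off discussed in the transcript; how the stale records would actually be rewritten in the statistics index is left open here.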
_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel