Hi Tim,

As I understand it, this fix does not change the main design: first create a 
potentially very large list in memory, then update the DB in one single 
transaction. The fix only promises to make that single transaction 
shorter/faster. One needs a big repository to test the fix and compare the 
results. My only big repository is the production one, and I cannot run 
tests on it.
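
To illustrate the kind of design change I mean, here is a minimal sketch, 
not DSpace code. All names in it (BatchedUpdateSketch, FakeSession, 
addMissingRows) are hypothetical, and the "session" only counts rows instead 
of talking to a database. It assumes the bitstream ids can be read lazily 
through an iterator and that committing after every N rows is acceptable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

public class BatchedUpdateSketch {

    /** Hypothetical stand-in for a DB session: it counts rows and commits, nothing more. */
    static class FakeSession {
        final List<Integer> pending = new ArrayList<>();
        int committedRows = 0;
        int commits = 0;
        int maxPending = 0; // largest number of uncommitted rows held at once

        void insert(int bitstreamId) {
            pending.add(bitstreamId);
            maxPending = Math.max(maxPending, pending.size());
        }

        void commit() {
            committedRows += pending.size();
            pending.clear();
            commits++;
        }
    }

    /** Reads ids one at a time and commits after every batchSize inserts. */
    static void addMissingRows(Iterator<Integer> ids, FakeSession session, int batchSize) {
        int inBatch = 0;
        while (ids.hasNext()) {
            session.insert(ids.next());
            if (++inBatch == batchSize) {
                session.commit();
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            session.commit(); // flush the final partial batch
        }
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        // 2500 fake bitstream ids, produced lazily rather than collected into a List first
        Iterator<Integer> ids = IntStream.range(0, 2500).iterator();
        addMissingRows(ids, session, 1000);
        System.out.println(session.committedRows + " rows in " + session.commits
                + " commits, at most " + session.maxPending + " uncommitted at once");
    }
}
```

With 2500 ids and a batch size of 1000 this commits three times and never 
holds more than 1000 uncommitted rows, so an interrupted run loses at most 
one batch of work.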

Best regards
Evgeni


On Wednesday, August 15, 2018 at 10:17:57 PM UTC+3, Tim Donohue wrote:
>
> Just a belated follow-up to this thread.  If you are still hitting issues 
> with the Checksum checker in 6.x, I'd recommend looking at the recently 
> logged bug (and proposed fix): https://jira.duraspace.org/browse/DS-3975
>
> Here's the proposed fix (which could use some testers to help us verify): 
> https://github.com/DSpace/DSpace/pull/2169   If you install the fix, 
> please report back on your findings by adding a comment to either the 
> GitHub PR or the JIRA ticket.  Reviews & reports back from the community 
> can help us to approve & merge fixes more quickly.
>
> Thanks,
>
> Tim
>
> On Thu, Jul 5, 2018 at 9:20 AM Evgeni Dimitrov <dimitr...@gmail.com> wrote:
>
>> Thank you Mark,
>>
>> I am afraid that the checker first tries to update the 
>> most_recent_checksum table comparing with the bitstream table and only 
>> after that it looks at the options (in my case -c 1000).
>>
>> This means that it will always first try to add 500 000 rows to the 
>> most_recent_checksum table, regardless of the options.
>>
>>
>> On Thursday, July 5, 2018 at 4:10:19 PM UTC+3, Mark H. Wood wrote:
>>>
>>> On Thursday, July 5, 2018 at 8:20:28 AM UTC-4, Evgeni Dimitrov wrote:
>>>>
>>>>
>>>> Judging by MostRecentChecksumServiceImpl, first a simple List was 
>>>> created in memory with (in this case) 500 000 bitstreams.
>>>>
>>>> Now for every element in this list a row is being added to the 
>>>> most_recent_checksum table. Perhaps the transaction will be committed when 
>>>> all 500 000 rows are added . . . in two weeks' time . . .
>>>>
>>>>
>>> I believe that you are correct.  There's a very unfortunate collision of 
>>> Java culture (it's easy to create elastic Collections, don't worry about 
>>> the 500 000 members case), ORM (of course you always want all 500 000 rows 
>>> trapped in one transaction until you are done consuming them sequentially), 
>>> and layers of service/DAO/holder, which makes it difficult to do 
>>> large-scale operations efficiently. We may need to augment the storage 
>>> layer with explicit support for bulk operations, such as the ability to 
>>> pass an arbitrary instance of a bulk-operator interface to be applied 
>>> iteratively to each result of a query, so that code which understands the 
>>> storage model can use it well to avoid memory bloat.  The business logic 
>>> layer does not and should not have access to these storage details.  There 
>>> are a number of places in DSpace which might be made less resource-hungry 
>>> by such means.
>>>
>>> Until this changes, you may wish to look over the checksum checker's 
>>> options for limiting the amount of work that it does.  It can be run for a 
>>> given amount of time or over a specific count of bitstreams, for example, 
>>> and continue where it stopped when run again.
>>>
> -- 
> Tim Donohue
> Technical Lead for DSpace & DSpaceDirect
> DuraSpace.org | DSpace.org | DSpaceDirect.org
>

-- 
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.
