Re: [dspace-tech] Checksum checker "Un-Checked Bitstream" report

2016-11-28 Thread Monika Mevenkamp
I rewrote the checker so that I can select the bitstreams to check based on 
their last-checked date or their last check result. A run can be limited to a 
maximum number of bitstreams. I also added a CLI utility that lists the check 
status of bitstreams or counts the bitstreams in a given status, e.g. DELETED 
or NOT_FOUND. The output of the two CLIs is easy to grep through, which makes 
it easy to find the relevant info. I use them in my daily cron job to go round 
robin through the bitstreams, so that all my bitstreams are checked every 3 
weeks. 
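
For sizing such a round-robin job, the arithmetic can be sketched roughly as follows: to cover every bitstream within a fixed number of days, each daily run has to check at least the total divided by the number of days, rounded up. The function name and the figures are illustrative assumptions, not part of the rewritten checker.

```shell
# Hypothetical helper: smallest per-run count that covers `total` bitstreams
# in `days` daily runs (integer ceiling division).
daily_batch_size() {
    total=$1
    days=$2
    echo $(( (total + days - 1) / days ))
}

# Illustrative numbers only: 84000 bitstreams spread over 21 days.
daily_batch_size 84000 21   # prints 4000
```

The result would be the value to pass as the max count (`-c`) of the daily check run.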

Here are two example usages with the corresponding output: 

> $DSPACE_HOME/bin/dspace checksum -d check -c 4000 -x BITSTREAM_MARKED_DELETED
# org.dspace.checker.CheckBitstreamIterator(without_result=[BITSTREAM_MARKED_DELETED])
# Action check
# Max-Count 4000
# Printing m for CHECKSUM_MATCH, d for BITSTREAM_MARKED_DELETED, and E in all other cases


….

# worked on 4000 bitstreams

> $DSPACE_HOME/bin/dspace checksum -d print -c 1 -x BITSTREAM_MARKED_DELETED | 
> egrep -v '^#'
1 BITSTREAM.39318 CHECKSUM_MATCH internalId=9128636048098563653161844066534785665  delete=false  lastDate=2016-11-06 01:02:03.83 
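
As an illustration of the kind of filtering this output allows, here is a minimal sketch that tallies records per status. It assumes each record occupies a single line in the raw output, that the status is the third whitespace-separated field, and that header lines start with `#`; the function name is hypothetical.

```shell
# Hypothetical helper: tally `print` records per check status, reading stdin.
# Assumes one record per line, status in the third whitespace-separated
# field, and comment/header lines starting with '#'.
count_by_status() {
    awk '!/^#/ && NF >= 3 { n[$3]++ } END { for (s in n) print n[s], s }' | sort -k2
}

# Possible usage, reusing the flags shown above:
# $DSPACE_HOME/bin/dspace checksum -d print -c 4000 -x BITSTREAM_MARKED_DELETED | count_by_status
```

Each output line holds a count followed by a status name, sorted by status.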


See documentation HERE 

There is a JIRA HERE 
And a PR THERE 

I’d be happy to help if you want to try this out 

Monika

 
Monika Mevenkamp
Digital Repository Infrastructure Developer
Princeton University
Phone: 609-258-4161
Skype: mo-meven



> On Nov 24, 2016, at 11:58 AM, do...@uoguelph.ca wrote:
> 
> Hi all, 
> 
> We'll occasionally get bitstreams showing up in this report, but I'm really 
> unclear on why or what it is that we should be doing to address it. 
> 
> What conditions cause a bitstream to be skipped and included in this report?
> 
> As an administrator, what actions should I take when this occurs? 
> 
> The report itself says "To add these bitstreams to be checked run the 
> checksum checker with the -u option" but the checker doesn't seem to have a 
> -u option. Running the checker with the -u option doesn't seem to make any 
> noticeable difference.
> 
> Digging into the most recent report where this occurred, I can see that the 
> item associated with the unchecked bitstreams has finished its workflow and 
> has been accepted into our repository. 
> 
> Any help or guidance that you folks could provide would be greatly 
> appreciated!
> 
> Thanks,
> -Adam
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to dspace-tech+unsubscr...@googlegroups.com.
> To post to this group, send email to dspace-tech@googlegroups.com.
> Visit this group at https://groups.google.com/group/dspace-tech.
> For more options, visit https://groups.google.com/d/optout.



[dspace-tech] Checksum checker (un-checked bitstream report) discrepancy

2016-02-01 Thread Feed My Lambs Esq.
After a mass ingest of content into a fresh system (which I also confirmed 
with freshly created records), if the `checker-email` task runs without the 
`checker` task having run first, the outcome is alarmist, surprising, and 
(I believe) incorrect.

Every item that is 'yet to be checked' is added to the report with this 
header message:

> The following is a UN-CHECKED BITSTREAM REPORT report for2/1/16 [*note the 
> lack of space before the date*]
> To add these bitstreams to be checked run the checksum checker with the -u 
> option [*I believe this line to be incorrect*]


The main problem
There does not appear to be a "-u" option for the `checker` routine. 
Attempting to run `dspace checker -u` generates a FATAL log message. There 
does exist a "-u" flag for the `checker-email` task, but it simply generates 
the email with the un-checked report by itself -- *not* what is described in 
the report. I believe a more accurate message would communicate the 
instructions for running the checker.

I realize it's a little nebulous and tough to condense those instructions 
but the checker does need to hit those records one way or another and there 
is no "-u" flag to make it easier. Maybe it is sufficient to point to the 
documentation or mention running "checker -l" to make sure everything gets 
seen.

Lack of Success / All is Well Confirmation
Now that I have processed every item, the checksum checker sends no 
message. Is there an option to generate an "all-clear" kind of message with 
the `checker-email` task? An "email anyway" option? The dspace log file 
even seems silent on `checker-email` completion. If no email is going to be 
sent, I recommend leaving an INFO note to reassure anyone that looks for 
one. An INFO note might be nice either way such as "INFO sending checker 
email results to ___" and "INFO no need to send an email for the checksum 
checker results. All is well."

A Dream Combo - email results immediately.
It would be really neat to create a flag that allows you to run the checker 
immediately followed by the `checker-email` routine rather than having to 
schedule two tasks separately and hope the one completes before the second 
begins. That stacking might be easier in Linux Cron job management but I 
don't see a way to chain them together in Windows' task scheduler.
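
Short of a new flag, the chaining can be approximated with a small wrapper that the scheduler runs as a single job. The sketch below assumes the checker exits with a non-zero status on failure, which is not verified here. The install path, the function name, and the script name are placeholders.

```shell
# Placeholder install path; override DSPACE to match your system.
DSPACE="${DSPACE:-/dspace/bin/dspace}"
# The DSpace documentation names the email task `checker-emailer`
# (written `checker-email` in the text above); override if yours differs.
EMAIL_TASK="${EMAIL_TASK:-checker-emailer}"

# Hypothetical wrapper: run the checker over all bitstreams with pruning,
# then send the report only if the checker returned success.
check_then_email() {
    "$DSPACE" checker -l -p && "$DSPACE" "$EMAIL_TASK"
}

# Example crontab entry (weekly, Sunday 01:00), assuming this file is saved
# as a script that ends by calling check_then_email:
# 0 1 * * 0 /path/to/check-then-email.sh
```

On Windows the same idea should carry over to a batch file, since cmd.exe also supports `&&`; Task Scheduler would then run that one batch file instead of two separate tasks.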

Bonus question
Finally, for my sanity, on a Windows server running DSpace 5, is it safe to 
run the checker as "dspace checker -l -p" and do the flags operate the same 
as "dspace checker -lp"? I intend to run it weekly over every item and have 
the pruning happen automatically.
