Hi,

It's difficult to help because the full error message does not appear to be 
included.  There should be more details in your DSpace log files 
regarding the "RuntimeException" (the message "Unexpected RuntimeException 
from org.apache.tika.parser.pdf.PDFParser@65054901" appears to be cut 
off... it doesn't say *why* it failed).

I'd recommend looking more closely at your logs to see if they provide more 
details as to why these files are not indexing successfully.  These 
indexing failures may also be the cause of the different search results 
between DSpace 5 and 7... though it's difficult to say for certain.
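
As a rough starting point for that log search, here is a minimal Python sketch. It assumes you have saved the filter-media console output and a copy of your DSpace log as plain text. The function names, the "Item Handle:" pattern (taken from the output quoted below), and the 20-line window are illustrative assumptions, not part of DSpace itself.

```python
import re

# Assumption: the failing item appears on the first "Item Handle: ..." line
# after "ERROR filtering", as in the output quoted in this thread.
HANDLE_RE = re.compile(r"Item Handle:\s*(\S+)")


def failed_handles(output_text):
    """Return handles reported after 'ERROR filtering' lines, without duplicates."""
    handles = []
    pending = False  # True once an "ERROR filtering" line has been seen
    for line in output_text.splitlines():
        if "ERROR filtering" in line:
            pending = True
            continue
        match = HANDLE_RE.search(line)
        if pending and match:
            if match.group(1) not in handles:
                handles.append(match.group(1))
            pending = False
    return handles


def exception_context(log_text, needle="RuntimeException", after=20):
    """Return each line containing `needle` plus the `after` lines that follow it."""
    lines = log_text.splitlines()
    blocks = []
    for i, line in enumerate(lines):
        if needle in line:
            # The following lines usually hold the stack trace, including
            # any "Caused by:" entries that explain the failure.
            blocks.append("\n".join(lines[i:i + after + 1]))
    return blocks
```

Calling failed_handles() on the saved filter-media output gives the list of items to investigate, and exception_context() on the log text prints the lines after each "RuntimeException", which is where the underlying cause would normally appear.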

I'd strongly recommend looking at our troubleshooting guide to see if you can 
find more information on the errors.  It's not obvious from what you shared 
what may be happening, and I suspect there are more errors occurring for some 
reason:
https://wiki.lyrasis.org/display/DSPACE/Troubleshoot+an+error#Troubleshootanerror-DSpace7.x(orabove)

Tim

On Wednesday, December 7, 2022 at 2:04:55 PM UTC-6 crims...@gmail.com wrote:

> Hi,
>
> When I run *filter-media -f* after data migration, I get many of the 
> following errors:
>
> ------------------------------------------------------
> Unexpected RuntimeException from 
> org.apache.tika.parser.pdf.PDFParser@65054901
> ERROR filtering, skipping bitstream:
>         Item Handle: 10292/4198 Bundle Name: ORIGINAL   File Size: 364491 
>       Checksum: b4926549648c79076633915c5f85944a (MD5)        Asset Store: 0
> Unexpected RuntimeException from 
> org.apache.tika.parser.pdf.PDFParser@4ff81ac8
> ERROR filtering, skipping bitstream:
>         Item Handle: 10292/7000 Bundle Name: ORIGINAL   File Size: 7123084 
>      Checksum: 2a470bf352e35fea5b89771ce6de7629 (MD5)        Asset Store: 0
> ---------------------------------------------------------------------
>
> Could you please let me know how to fix these errors? Below are some 
> findings from my investigation:
>
> - The command completes successfully and I ran index-discovery -b 
> afterwards. 
> - When I run an empty search, I get 3000+ items in DSpace 5.8 but only 
> 2000+ in DSpace 7.3.
> - When I search with a keyword, I get fewer results in DSpace 7.3 than in 5.8. 
> I can manually open, using the uuid, the items that are found in 5.8 but not 
> in 7.3, and I can find the keyword in the item multiple times.
> - I cannot find the missing items in the filter-media log.
> - I can confirm the configuration settings found here - 
> https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content
>  
> - are correctly set.
> - Below are set in local.cfg.
> textextractor.max-chars = 1000000
> textextractor.use-temp-file = true
>
> Examples - 
>
> 1 article 
> https://openrepositorystage.aut.ac.nz/search?query=noodle&spc.page=1&f.dateIssued.min=2002&f.dateIssued.max=2009
>
>  14 articles 
> https://openrepository.aut.ac.nz/handle/10292/3/discover?query=noodle&rpp=10&filtertype=dateCopyright&filter_relational_operator=equals&filter=%5B2000+TO+2009%5D
>

-- 
All messages to this mailing list should adhere to the Code of Conduct: 
https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/b3dfe397-f44c-4904-a740-7eacd8e9658an%40googlegroups.com.
