Hi Daniel,

On 08/03/13 10:57, Daniel Sifton wrote:

We’ve uploaded a limited number of OCR'd PDF documents. If we wanted to edit the OCR bitstream (.pdf.txt), does anyone have any advice on how to get the bitstream out and then get it back in? Or perhaps I’m coming at this from the wrong angle?


There's nothing special about .pdf.txt files other than the name. Just download the .pdf.txt file, make the edits you want, delete the .pdf.txt file from the DSpace item and upload the edited one. As long as you don't change the filename of the .pdf.txt file, all should be well. You'll have to update your index(es) to include the new text:
[dspace]/bin/dspace index-update -f
[dspace]/bin/dspace update-discovery-index -f (if you're using Discovery)
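
If you end up doing this for several items, the reindexing step could be wrapped in a small script. The following is only a rough sketch: the DSPACE_DIR, DRY_RUN and USE_DISCOVERY variables and the run/reindex function names are made up for illustration and are not part of DSpace, and /dspace is just an assumed install path. Only the two dspace commands themselves come from the instructions above. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical wrapper around the two reindex commands above.
# DSPACE_DIR, DRY_RUN and USE_DISCOVERY are illustrative variables, not DSpace settings.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"   # assumed install path; adjust to your [dspace]
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Force a reindex so the edited .pdf.txt content is picked up.
reindex() {
    run "$DSPACE_DIR/bin/dspace" index-update -f
    # Only needed if you're using Discovery.
    if [ "${USE_DISCOVERY:-0}" = "1" ]; then
        run "$DSPACE_DIR/bin/dspace" update-discovery-index -f
    fi
}

reindex
```

Once the printed commands look right for your installation, running the script with DRY_RUN=0 (and USE_DISCOVERY=1 if applicable) would execute them for real.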

cheers,
Andrea
-- 
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand


_______________________________________________
Dspace-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-general