Re: Risks of DJVU/lossy compression - Re: If you OCR, always archive the bitmaps too

Toby Thain Sun, 27 Sep 2015 13:19:12 -0700

On 2015-09-27 4:14 PM, Toby Thain wrote:

On 2015-09-27 2:33 PM, Fred Cisin wrote:

On Sun, 27 Sep 2015, Pontus Pihlgren wrote:

It seems to me that a better tool could solve the issue. One that
could display the OCR'd content only, and the scanned content
only when desired, for instance when you suspect an error.
Is there such a reader? Is the content organised to make it
possible?


I haven't seen one.


I did start trying to write an heuristic probabilistic OCR one 25 years
ago.  The idea being to overlay the OCR'd (displayed with matching
fonts) over the scanned content. ...


DJVU compression is somewhat analogous to this process, ...

There was a somewhat scary case study on the web a few years ago (not
sure if it's still out there, haven't been able to find it)


Here it is.
https://news.ycombinator.com/item?id=6156238

The compression method was apparently JBIG2, but it could also affect DJVU.

--Toby

... The risks are obvious(*).

--Toby


* - Hat tip to PGN. comp.risks digest.
