Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-18 Thread Monica
Yes, I agree. I have tried that, but the quality of the result is not good enough. Is there another way to OCR PDFs with little or no loss of quality? On Mon, Sep 17, 2018 at 11:41 PM Jeff Breidenbach wrote: > Tesseract produces searchable PDF directly. If you really

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Jeff Breidenbach
Tesseract produces searchable PDF directly. If you really want to use hOCR as an intermediate format, you can, but you will need external software. There are a couple of "hocr2pdf" programs floating around, and "OCRmyPDF" does an admirable job tying things together. That said, going direct should

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Shree Devi Kumar
I think PDF creation adds only a text layer, and there isn't an option to add hOCR to it. @jbreiden can confirm. On Mon, Sep 17, 2018 at 6:10 PM, Monica wrote: > I have tried this, but this is showing the default behaviour. I think the > default output is overlaying on pdf instead of hocr out.

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Monica
I have tried this, but it shows the default behaviour. I think the default output is overlaid on the PDF instead of the hOCR output. On Mon, Sep 17, 2018 at 5:47 PM Monica wrote: > Thanks Zdenko for your response. > will "tesseract scannedFile.png scanned.pdf -l eng hocr pdf" overlay on > pdf

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Monica
Thanks Zdenko for your response. Will "tesseract scannedFile.png scanned.pdf -l eng hocr pdf" overlay on the PDF file? On Mon, Sep 17, 2018 at 5:44 PM Zdenko Podobny wrote: > Something like this? > > tesseract scannedFile.png scanned.pdf -l eng hocr pdf > > Zdenko > > > On Mon, 17 Sep 2018 at 14:12

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Zdenko Podobny
Something like this? tesseract scannedFile.png scanned.pdf -l eng hocr pdf Zdenko On Mon, 17 Sep 2018 at 14:12, monica kumari wrote: > For OCRing a scanned PDF, > first it is converted to image format, then OCRed, which gives a temporary > file of pdf/text format and overlays on original scanned
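For readers following along, here is a minimal Python sketch of how the command quoted above is put together. It only builds the argument list and does not run Tesseract. The helper names build_tesseract_command and expected_outputs are hypothetical, made up for this illustration. It assumes the usual "tesseract imagename outputbase [options] [configfile...]" layout, and a Tesseract version in which the hocr config writes a ".hocr" file. Because Tesseract appends the extension to the output base, an output base of "scanned.pdf" would be expected to produce "scanned.pdf.hocr" and "scanned.pdf.pdf", so a bare base such as "scanned" is probably what is wanted.

```python
from typing import List, Sequence


def build_tesseract_command(
    image: str,
    output_base: str,
    lang: str = "eng",
    configs: Sequence[str] = ("hocr", "pdf"),
) -> List[str]:
    """Build (but do not run) a tesseract argument list.

    Hypothetical helper for illustration; layout assumed to be
    tesseract IMAGE OUTPUTBASE -l LANG CONFIG...
    """
    return ["tesseract", image, output_base, "-l", lang, *configs]


def expected_outputs(output_base: str, configs: Sequence[str]) -> List[str]:
    """Guess the output file names, assuming each config appends its own
    name as the extension (true for hocr and pdf in recent versions)."""
    return [f"{output_base}.{config}" for config in configs]


if __name__ == "__main__":
    cmd = build_tesseract_command("scannedFile.png", "scanned")
    print(" ".join(cmd))
    print(expected_outputs("scanned", ("hocr", "pdf")))
    # To actually run it (requires Tesseract on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Requesting both configs in one run should give two separate files, an hOCR file and a searchable PDF. It does not embed the hOCR markup inside the PDF.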

[tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread monica kumari
For OCRing a scanned PDF, first it is converted to image format, then OCRed, which gives a temporary file of pdf/text format and overlays it on the original scanned PDF. I want the output format to be hOCR. For this, I ran the command "convert scannedFile.pdf scannedFile.png" and then "tesseract
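If hOCR is used as the intermediate format, whatever tool does the overlay needs each word's text and bounding box. The sketch below shows one way to pull those out using only the Python standard library. It assumes the hOCR marks words as span elements with class "ocrx_word" and a title attribute containing "bbox x0 y0 x1 y1", which is how Tesseract's hOCR output is normally laid out. The class name HocrWordParser and the sample markup are made up for this illustration, and placing the text into a PDF is left to external software.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from ocrx_word spans (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.words: List[Tuple[str, BBox]] = []
        self._bbox: Optional[BBox] = None  # set while inside a word span
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attr.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr: str) -> List[Tuple[str, BBox]]:
    parser = HocrWordParser()
    parser.feed(hocr)
    return parser.words


# Made-up sample in the shape of Tesseract hOCR output.
SAMPLE = (
    "<span class='ocr_line' title='bbox 36 92 580 122'>"
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> "
    "<span class='ocrx_word' title='bbox 109 92 173 116; x_wconf 94'>world</span>"
    "</span>"
)

if __name__ == "__main__":
    for text, bbox in parse_hocr_words(SAMPLE):
        print(text, bbox)
```

The boxes are pixel coordinates in the image that was OCRed, so an overlay tool would still have to scale them to the PDF page size using the resolution of the rendered image.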