Tesseract produces searchable PDF directly. If you really want to use hOCR
as an intermediate format, you can, but you will need external software.
There are a couple of "hocr2pdf" programs floating around, and "OCRmyPDF"
does an admirable job of tying things together. That said, going direct should
I use it as follows and it works. Please check that you are using the correct
paths for the files.
combine_lang_model \
--input_unicharset ./layersan/san.unicharset \
--script_dir ~/langdata \
--words ~/langdata/san/san.wordlist \
--numbers ~/langdata/san/san.numbers \
--puncs ~/langdata/san/san.punc
I think pdf creation adds a text layer only and there isn't an option to
add HOCR to it.
@jbreiden can confirm.
On Mon, Sep 17, 2018 at 6:10 PM, Monica wrote:
I have tried this, but it shows the default behaviour. I think the
default output is overlaid on the PDF instead of being written out as hOCR.
On Mon, Sep 17, 2018 at 5:47 PM Monica wrote:
Thanks, Zdenko, for your response.
Will "tesseract scannedFile.png scanned.pdf -l eng hocr pdf" overlay the text
on the PDF file?
On Mon, Sep 17, 2018 at 5:44 PM Zdenko Podobny wrote:
Something like this?
tesseract scannedFile.png scanned.pdf -l eng hocr pdf
Zdenko
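For what it's worth, here is a minimal sketch of the command above that asks for both outputs in one run. It assumes tesseract treats its second argument as an output base and appends the extension itself, so an extension-free base is passed here; with "scanned.pdf" as the base the files would likely come out as scanned.pdf.hocr and scanned.pdf.pdf. The helper output_base is a hypothetical name used only for illustration, and OCR is attempted only if tesseract and the input image are actually present.

```shell
#!/bin/sh
# Hypothetical helper: derive an output base by dropping the file extension.
output_base() {
    printf '%s\n' "${1%.*}"
}

img="scannedFile.png"            # input name taken from the thread
base="$(output_base "$img")"     # -> scannedFile

# Run OCR only when tesseract is installed and the image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f "$img" ]; then
    # "hocr" and "pdf" are config names; each one enables its own output file.
    tesseract "$img" "$base" -l eng hocr pdf
fi

echo "expected outputs: $base.hocr $base.pdf"
```

The hOCR file and the searchable PDF are written as two separate files; the hOCR markup is not embedded in the PDF.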
On Mon, Sep 17, 2018 at 14:12, monica kumari wrote:
To OCR a scanned PDF, it is first converted to an image format and then OCRed,
which gives a temporary file in PDF/text format that is overlaid on the
original scanned PDF.
I want the output format to be hOCR. For this, I ran the command
"convert scannedFile.pdf scannedFile.png" and then "tesseract