Re: [tesseract-ocr] Re: Tesstrain.sh fails when provided > 7 tif/box pairs

2019-01-05 Thread bohdan . moskalevskyi
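Change the loop inside phase_I_generate_image() in tesstrain_utils.sh to:

local counter=0
for font in "${FONTS[@]}"; do
  sleep 1
  generate_font_image "${font}" &
  # Count the jobs launched so far.
  counter=$((counter + 1))
  # Once par_factor jobs have been started, wait for any one of them
  # to finish before launching the next (wait -n needs bash 4.3+).
  if (( counter >= par_factor )); then
    wait -n
  fi
done

The current version has a bash error. Moreover, it wastes time by calling
wait (which blocks until all background jobs finish) instead of wait -n
(which returns as soon as any one job finishes).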
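
For anyone who wants to try the pattern outside of tesstrain, here is a minimal, self-contained sketch of the same idea. It is only an illustration: generate_font_image below is a stand-in stub (not the real function from the training scripts), run_bounded is a hypothetical helper name, and the font names are made up. It assumes bash 4.3 or newer, since that is where wait -n was introduced.

```shell
#!/usr/bin/env bash
# Sketch: keep at most par_factor background jobs running by calling
# `wait -n` once the limit has been reached.

# Stand-in stub for the real work; it just sleeps briefly and reports.
generate_font_image() {
  sleep 0.1
  echo "done: $1"
}

# Hypothetical helper: first argument is the job limit, the rest are fonts.
run_bounded() {
  local par_factor=$1
  shift
  local counter=0
  local font
  for font in "$@"; do
    generate_font_image "${font}" &
    counter=$((counter + 1))
    if (( counter >= par_factor )); then
      # Block until any single background job exits, freeing one slot.
      wait -n
    fi
  done
  # Collect whatever is still running after the loop.
  wait
}

output=$(run_bounded 2 "Arial" "Courier New" "Times New Roman" "Verdana")
printf '%s\n' "${output}"
```

The order of the "done:" lines can differ between runs, because the jobs finish independently. The trailing wait after the loop is there so the function does not return while jobs are still running.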

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/34819121-f864-4d94-b5ac-730eb1e6587a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] What do iteration numbers mean in the train logging?

2019-01-01 Thread bohdan . moskalevskyi
OK, it says they are the learning iteration, training iteration, and sample
iteration, respectively. But what do those terms mean? How can one deduce an
epoch from them?



[tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-12-18 Thread bohdan . moskalevskyi
I am getting this for the combining acute accent (in any language):
https://github.com/tesseract-ocr/tesseract/issues/1012#issuecomment-448306794

On Sunday, 14 January 2018 at 09:01:17 UTC+2, Sumedhe Dissanayake wrote:
>
> I tried lstmtraining with the Sinhala language, but I always get this error.
>
> Command:
>
> lstmtraining --traineddata ~/tesstutorial/sintrain/sin/sin.traineddata \
>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c155]' \
>   --debug_interval 0 --max_iterations 50 --max_image_MB 6 \
>   --learning_rate 20e-4 \
>   --model_output ~/tesstutorial/sinoutput/base \
>   -U ~/tesstutorial/sintrain/sin/sin.unicharset \
>   --traineddata ~/tesstutorial/sintrain/sin/sin.traineddata \
>   --train_listfile ~/tesstutorial/sintrain/sin.training_files.txt
>
>
> Error:
> Can't encode transcription: 'වැනි නිර්භීත දැන් පියඹා මෙන්ම හා' in language ''
>
> I also tried with English, and it worked well.
>
> How can I resolve this issue?
>
> Platform:
> Linux Ubuntu 16.04 LTS
>
> Tesseract Version: 
> tesseract 4.00.00alpha
>  leptonica-1.74.4
>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
>



[tesseract-ocr] Re: Extract Header and Footer text separately from document image

2018-11-26 Thread bohdan . moskalevskyi
Same here. I’m surprised this issue isn’t more common. Any solutions?

On Monday, 9 April 2018 at 15:43:41 UTC+3, Mohit Jain wrote:
>
> Is there a way to extract the header and footer content on a document page 
> separately using Tesseract OCR? I tried the hOCR output but it doesn't seem 
> to have any such tags associated with the output.
>
> Regards,
> Mohit
>



Re: [tesseract-ocr] How recognize footnotes

2018-11-26 Thread bohdan . moskalevskyi
hOCR doesn't help. See also:
https://groups.google.com/forum/#!searchin/tesseract-ocr/footer%7Csort:date/tesseract-ocr/YY4jMNmSoTM/KAMTzkc5AQAJ

On Tuesday, 30 May 2017 at 17:57:43 UTC+3, shree wrote:
>
> Try the `hocr` output and see if it provides some of what you need.
>
> I don't think tesseract will link to footnotes though it may recognize the 
> text.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, May 30, 2017 at 7:20 PM, Felipe Ghiardo wrote:
>
>> Hi all.
>>
>> With other OCR engines (ABBYY, for example), the process recognizes the
>> footnotes and links them. It also recognizes the header and footer. The
>> question is how I can do the same with Tesseract, at least for the
>> footnotes. Is it something that one can train? And how do you do it?
>> Thanks for the help (and sorry for my English).
>>
>
>
