Re: [tesseract-ocr] Trying to add chars to tesseract 4.0

2017-12-11 Thread ShreeDevi Kumar

You can add
  --debug_interval -1
to your lstmtraining command to get debug info for each training iteration
on the console.
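
As a rough sketch of where the flag goes, the wrapper below puts `--debug_interval -1` ahead of whatever other arguments the training run already uses. The function name `train_verbose`, the `LSTMTRAINING` override variable, and every path and value in the usage comment are placeholders for illustration, not taken from the script discussed in this thread.

```sh
#!/bin/sh
# Hypothetical wrapper: run lstmtraining with per-iteration debug output.
# LSTMTRAINING (an assumed convenience variable) may name a specific binary,
# e.g. "$tesstrain_dir/lstmtraining"; it defaults to whatever is on PATH.
train_verbose() {
  "${LSTMTRAINING:-lstmtraining}" --debug_interval -1 "$@"
}

# Example call (all paths and the iteration count are placeholders):
#   train_verbose \
#     --continue_from "$train_output_dir/eng.lstm" \
#     --traineddata "$train_output_dir/eng/eng.traineddata" \
#     --model_output "$train_output_dir/pluschars" \
#     --train_listfile "$train_output_dir/eng.training_files.txt" \
#     --max_iterations 3600 2>&1 | tee training.log
```

Piping through `tee` as in the comment keeps a copy of the verbose output in a log file, which is handy because the per-iteration text scrolls past quickly.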

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWwe2bcXuv%2B1bCV3c5kgfro-U_Q3jWHQFjAQd_YvaStmg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [tesseract-ocr] Trying to add chars to tesseract 4.0

2017-12-11 Thread ShreeDevi Kumar

Your script looks OK.

The line

--U  $train_output_dir/eng/eng.unicharset \   # not sure if this is necessary; doesn't make a difference

is NOT required.

I suggest that you remove the files from any earlier run before running
the script.

Take a look at the $train_output_dir/eng directory and review the unicharset
there to see whether your new characters are included.

Take a look at the log file, especially the initial portion, to see
whether it shows an increase in the number of characters.
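
The unicharset review suggested above can be scripted. This is a minimal sketch that assumes the plain-text unicharset layout (an entry count on the first line, then one entry per line beginning with the character itself); the function names and the example characters are made up for illustration.

```sh
#!/bin/sh
# in_unicharset FILE CHAR: succeed if CHAR has an entry in FILE.
# Assumes line 1 is the entry count and every later line starts with the
# character followed by a space. The value is passed through the
# environment so awk does not interpret backslashes in it.
in_unicharset() {
  WANT="$2" awk 'NR > 1 && ($1 "") == (ENVIRON["WANT"] "") { found = 1 }
                 END { exit !found }' "$1"
}

# report_chars FILE CHAR...: print one status line per character.
report_chars() {
  file=$1; shift
  for ch in "$@"; do
    if in_unicharset "$file" "$ch"; then
      echo "found:   $ch"
    else
      echo "missing: $ch"
    fi
  done
}

# Example call (the characters are placeholders for the ones being added):
#   report_chars "$train_output_dir/eng/eng.unicharset" '±' '§'
```

If any character is reported missing from the unicharset in the starter traineddata directory, the training text probably did not reach the unicharset extraction step, which is worth ruling out before looking at the model itself.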

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com



Re: [tesseract-ocr] Trying to add chars to tesseract 4.0

2017-12-11 Thread J Klein


On Thursday, December 7, 2017 at 11:55:53 PM UTC-5, shree wrote:
>
> Please check the last section on
>  https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>
 
Thank you for this tip. I'm getting farther than before. I thought
--traineddata was my final traineddata output file.
I have now made the final eng.traineddata with 'lstmtraining --stop_training'
as follows:

$tesstrain_dir/lstmtraining \
--stop_training \
--continue_from $train_output_dir/pluschars_checkpoint \
--traineddata $train_output_dir/eng/eng.traineddata \
--U  $train_output_dir/eng/eng.unicharset \   # not sure if this is necessary; doesn't make a difference
--model_output $final_trained_data_file

And I get a $final_trained_data_file that I can use to replace 
/usr/local/share/tessdata/eng.traineddata and it doesn't fail on init3() 
any more.  But it doesn't recognize any of the new chars either.
However, in running
  
  /usr/local/bin/tesseract-training/lstmeval \
--model ./trained_plus_chars/pluschars_checkpoint  \
--traineddata ./trained_plus_chars/eng/eng.traineddata \
--eval_listfile ./trained_plus_chars/eng.training_files.txt 

it DID recognize the new chars most of the time.  So I think there may
still be something wrong with the construction of the --model_output
$final_trained_data_file.
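
For comparison, here is a hedged sketch of the same conversion step with only the four arguments the training wiki uses for `--stop_training`. The function name `finalize_model` and the `LSTMTRAINING` override are assumptions for illustration. One shell detail is worth noting: a `#` comment placed after a trailing backslash ends the command on that line, so if the real script carries the inline comment exactly as pasted above, the `--model_output` line would not be part of the lstmtraining call. The sketch keeps comments on their own lines for that reason.

```sh
#!/bin/sh
# finalize_model CHECKPOINT STARTER OUTPUT
# Convert a training checkpoint into a traineddata file that tesseract can load.
# STARTER is the traineddata that was given to lstmtraining during training.
# LSTMTRAINING (assumed variable) may name a specific binary.
finalize_model() {
  "${LSTMTRAINING:-lstmtraining}" \
    --stop_training \
    --continue_from "$1" \
    --traineddata "$2" \
    --model_output "$3"
}

# Example call, reusing the variable names from the message above:
#   LSTMTRAINING="$tesstrain_dir/lstmtraining" finalize_model \
#     "$train_output_dir/pluschars_checkpoint" \
#     "$train_output_dir/eng/eng.traineddata" \
#     "$final_trained_data_file"
```

Quoting the variables also guards against paths that contain spaces, which the unquoted form in the original command would split into separate arguments.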

My entire training sequence bash script is here:  
https://pastebin.com/gNLvXkiM

Can you tell if there is anything obviously wrong?


Thanks




Re: [tesseract-ocr] Trying to add chars to tesseract 4.0

2017-12-07 Thread ShreeDevi Kumar

It is possible that you are treating the 'starter' traineddata file as the
final one. Please read the training wiki page fully as the training process
has been changed by Ray in his last update.



Re: [tesseract-ocr] Trying to add chars to tesseract 4.0

2017-12-07 Thread ShreeDevi Kumar

Please check the last section of

 https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00

on combining files, to see the correct syntax for building the new
traineddata file.




Re: [tesseract-ocr] Trying to add chars to tesseract 4.0

2017-12-07 Thread J Klein



On Thursday, December 7, 2017 at 9:02:11 PM UTC-5, shree wrote:
>
> Re smaller traineddata size, it could possibly be related to the word list 
> dictionary size.
>
> You can unpack the original traineddata and compare the word list size 
> with the one you used.
>


Thank you for the hint.

I ran the following (-u is 'unpack all' I think), 

  combine_tessdata  -u /usr/local/share/tessdata/eng.traineddata eng.

and I got:

-rw-r--r--  1 klein  staff  11689099 Dec  7 21:22 eng.lstm
-rw-r--r--  1 klein  staff      4738 Dec  7 21:22 eng.lstm-number-dawg
-rw-r--r--  1 klein  staff      4322 Dec  7 21:22 eng.lstm-punc-dawg
-rw-r--r--  1 klein  staff      1012 Dec  7 21:22 eng.lstm-recoder
-rw-r--r--  1 klein  staff      6360 Dec  7 21:22 eng.lstm-unicharset
-rw-r--r--  1 klein  staff   3694794 Dec  7 21:22 eng.lstm-word-dawg
-rw-r--r--  1 klein  staff        80 Dec  7 21:22 eng.version

The content of eng.version is
4.00.00alpha:eng:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]


Now I tried to unpack the one I created by adding the characters, and I get


x eng.lstm is missing!
-rw-r--r--  1 klein  staff     3506 Dec  7 21:26 eng.lstm-number-dawg
-rw-r--r--  1 klein  staff     4322 Dec  7 21:26 eng.lstm-punc-dawg
-rw-r--r--  1 klein  staff     1030 Dec  7 21:26 eng.lstm-recoder
-rw-r--r--  1 klein  staff     9379 Dec  7 21:26 eng.lstm-unicharset
-rw-r--r--  1 klein  staff  4153402 Dec  7 21:26 eng.lstm-word-dawg
-rw-r--r--  1 klein  staff       12 Dec  7 21:26 eng.version

The content of eng.version is '4.00.00alpha'

So you're right that the word-list is different. 

But more importantly, it seems that eng.lstm isn't in the final
eng.traineddata.  Do I not understand something about how the process
works?  Is this my mistake, or a glitch?
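
The manual comparison above can be turned into a quick check. The sketch below looks for the LSTM components by the file names shown in the listings; the function name `check_unpacked` and the directory used in the usage comment are illustrative assumptions.

```sh
#!/bin/sh
# check_unpacked DIR PREFIX
# After `combine_tessdata -u FILE DIR/PREFIX.` has unpacked a traineddata
# file, report whether the LSTM parts are present and non-empty.
# Returns non-zero if any of them is missing.
check_unpacked() {
  dir=$1; prefix=$2; status=0
  for part in lstm lstm-unicharset lstm-recoder; do
    if [ -s "$dir/$prefix.$part" ]; then
      echo "ok      $prefix.$part"
    else
      echo "missing $prefix.$part"
      status=1
    fi
  done
  return "$status"
}

# Example (the directory name is a placeholder):
#   mkdir -p unpacked
#   combine_tessdata -u "$final_trained_data_file" unpacked/eng.
#   check_unpacked unpacked eng || echo "not a complete LSTM traineddata"
```

A file that fails this check has no network in it, so it cannot work as the final model no matter how complete its unicharset and dawg files are.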

Thanks for helping me to make progress.





Re: [tesseract-ocr] Trying to add chars to tesseract 4.0

2017-12-07 Thread ShreeDevi Kumar

Regarding the smaller traineddata size: it could be related to the size of
the word list dictionary.

You can unpack the original traineddata and compare its word list size with
the one you used.
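
One way to do that comparison, sketched under the assumption that both traineddata files have already been unpacked into separate directories with `combine_tessdata -u`; the helper names and directory names are hypothetical.

```sh
#!/bin/sh
# file_size FILE: size in bytes. The arithmetic expansion strips the
# padding that some wc implementations put in front of the number.
file_size() { echo $(( $(wc -c < "$1") )); }

# compare_sizes ORIGINAL MINE: print both sizes and the difference.
compare_sizes() {
  a=$(file_size "$1"); b=$(file_size "$2")
  echo "original=$a mine=$b diff=$((b - a))"
}

# Example (directory names are placeholders):
#   compare_sizes orig/eng.lstm-word-dawg mine/eng.lstm-word-dawg
```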



[tesseract-ocr] Trying to add chars to tesseract 4.0

2017-12-05 Thread J Klein

[this might be a repost; the first attempt didn't show up]

I'm using the C API of tesseract 4.0 on OS X, and I tried to add some more
characters. (4.0 seems much better than 3.x, I should add - thanks to
everyone who made this possible!)

I used this manual section:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
as a guide to construct the following script: https://pastebin.com/4n2mRSpq

Before running, I modified langdata/eng/eng.training_text with the extra
chars, maybe 15 instances of each, as instructed.

I'm using only a subset of the original training fonts, but I figure it is
OK, since I'm adding only a few distinctive chars.

The NN optimizer lstmtraining ran, and gave a bunch of checkpoints, and a
final file $train_output_dir/eng/eng.traineddata

But this eng.traineddata was 5MB when the original one was 15.4MB. And
when I tried to copy it over the pre-loaded 'best' eng.traineddata and run
tesseract, it failed in TessBaseAPIInit3 with error=-1.

Does anyone know 1) why my eng.traineddata is so much smaller and 2) why it
fails to even load in API init()?

Thanks for any tips!
