[tesseract-ocr] Re: some questions about lstm training

2019-01-24 Thread Marziye Rahmati
Hi,
I had this problem before.
I think you made a mistake in the traineddata path. You must give the path of 
the traineddata that was created by tesstrain.sh.
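
In the quoted message, the only tesstrain.sh run shown writes to 
~/tesstutorial/engeval, while --traineddata points into ~/tesstutorial/engtrain. 
A quick check along these lines can confirm whether the file is really there 
before lstmtraining is started. This is only a sketch: check_traineddata is a 
made-up helper, not part of Tesseract, and the path is simply the one copied 
from the quoted command.

```shell
#!/bin/sh
# Hypothetical helper (not a Tesseract tool): succeed only if the given
# traineddata file exists and is not empty.
check_traineddata() {
  path=$1
  if [ -s "$path" ]; then
    echo "found: $path"
    return 0
  fi
  echo "missing or empty: $path" >&2
  return 1
}

# Path copied from the --traineddata argument in the quoted message.
TRAINEDDATA="$HOME/tesstutorial/engtrain/eng/eng.traineddata"

if check_traineddata "$TRAINEDDATA"; then
  echo "OK to pass --traineddata $TRAINEDDATA"
else
  echo "Check which --output_dir tesstrain.sh wrote to and use that path instead."
fi
```

If the check fails, listing the directory given as --output_dir to tesstrain.sh 
should show where the traineddata actually ended up.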
Good luck.

On Friday, January 25, 2019 at 7:04:36 AM UTC+3:30, 易鑫 wrote:
>
> Hello, everyone:
>  I am a new user of Tesseract 4.0. I am following the instructions 
> (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00) 
> to train an LSTM model.
>
> By the way, my environment is Ubuntu 16.04, and I compiled Tesseract 4.0 
> myself. I ran into some problems.
>
> I followed these steps.
> 1. I ran this command:
>
> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>   --noextract_font_properties --langdata_dir ../langdata \
>   --tessdata_dir ./tessdata \
>   --fontlist "Impact Condensed" --output_dir ~/tesstutorial/engeval
>
>
> It worked.
>
> 2. I ran this command:
>
> mkdir -p ~/tesstutorial/engoutput
> training/lstmtraining --debug_interval 100 \
>   --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>   --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
>   --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>   --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>   --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
>
> Here I am confused: I am currently in the tesseract directory, but I cannot 
> find a training folder under it.
>
> I assumed that once Tesseract is installed successfully, the system would 
> recognize the lstmtraining command, so I used this command instead:
>
> lstmtraining --debug_interval 100 \
>   --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>   --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
>   --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>   --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>   --max_iterations 5000
>
> It fails with this error:
>
> mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110
> Segmentation fault (core dumped)
>
> I looked at the source code in lstmtrainer.h:
>
> 102   // assumed that the character set is to be re-mapped from old_traineddata to
> 103   // the new, with consequent change in weight matrices etc.
> 104   bool TryLoadingCheckpoint(const char* filename, const char* old_traineddata);
> 105
> 106   // Initializes the character set encode/decode mechanism directly from a
> 107   // previously setup traineddata containing dawgs, UNICHARSET and
> 108   // UnicharCompress. Note: Call before InitNetwork!
> 109   void InitCharSet(const std::string& traineddata_path) {
> 110     ASSERT_HOST(mgr_.Init(traineddata_path.c_str()));
> 111     InitCharSet();
> 112   }
> 113   void InitCharSet(const TessdataManager& mgr) {
> 114     mgr_ = mgr;
> 115     InitCharSet();
> 116   }
>
> I don't know how to solve this problem. Can anyone help me? Thanks in 
> advance, and sorry for my poor English.
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/9e042051-4fcf-4658-8bda-07f0023214b4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Analyze output from the OCR tutorial

2018-11-26 Thread Marziye Rahmati
Hello all,
Can anyone help me understand the output from training Tesseract OCR version 4? 
For example, what does "delta" mean in this line?
At iteration 3052/5000/5102, Mean rms=0.85%, delta=0.98%, char train=3.846%, word train=5.917%, skip ratio=2.1%,  New worst char error = 3.846 wrote checkpoint.
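
To look at the numbers one at a time, the log line above can be split into 
its labelled fields. The sketch below only separates the values; it does not 
say what "delta" means, which is still the open question here. get_field and 
get_iterations are made-up helper names, and the sample line is the one from 
this message joined onto a single line.

```shell
#!/bin/sh
# Sample line copied from this message, rejoined onto one line.
line='At iteration 3052/5000/5102, Mean rms=0.85%, delta=0.98%, char train=3.846%, word train=5.917%, skip ratio=2.1%,  New worst char error = 3.846 wrote checkpoint.'

# Hypothetical helper: print the number between "<label>=" and the next "%".
# $1 is the label as it appears in the log, $2 is the log line.
get_field() {
  printf '%s\n' "$2" | sed -n "s/.*$1=\([0-9.]*\)%.*/\1/p"
}

# Hypothetical helper: print the three slash-separated iteration counters,
# separated by spaces.
get_iterations() {
  printf '%s\n' "$1" | sed -n 's|^At iteration \([0-9]*\)/\([0-9]*\)/\([0-9]*\),.*|\1 \2 \3|p'
}

echo "iterations: $(get_iterations "$line")"
for label in 'Mean rms' 'delta' 'char train' 'word train' 'skip ratio'; do
  echo "$label: $(get_field "$label" "$line")"
done
```

Running the same helpers over every "At iteration" line of a training log 
gives one column per field, which makes it easier to see how each value 
changes as training goes on.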
