[ocropus] Re: Using a "dictionary" for recognition?

Amrit Mon, 30 May 2011 12:14:33 -0700

As an update: on the images I shared there seems to be a scaling
issue. I am getting significantly better results by increasing the
scaling factor to the range of 1000x350 (keeping an aspect ratio of ~1:3
appears to have a significant effect). Are there any guidelines that
can be followed in this regard?
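One way to make this scaling experiment repeatable across a corpus is to compute, for each line image, the size that fits it inside the 1000x350 box mentioned above while preserving its own aspect ratio. This is a minimal sketch in plain Python. The fitting rule is an assumption for illustration, not an OCRopus guideline. The resulting size could then be passed to an image library (for example, Pillow's `Image.resize`) before recognition.

```python
def fit_within_box(width: int, height: int, max_w: int = 1000, max_h: int = 350):
    """Return (new_w, new_h): (width, height) scaled uniformly, up or down,
    so the image fits inside max_w x max_h with its aspect ratio preserved.

    The 1000x350 default mirrors the size reported in this thread; it is an
    illustrative assumption, not a documented OCRopus setting.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Use the smaller of the two ratios so neither side exceeds the box.
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within_box(500, 100))   # (1000, 200): width is the limiting side
    print(fit_within_box(2000, 700))  # (1000, 350): already ~1:3, scaled down
```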


Tom,
        I wanted to confirm the OCRopus line decoding results using
the following method:
I ran ocropus-lattice and ocropus-linerec on a line image output by
page segmentation and found that the 1-best path from the lattice differs
from the output of ocropus-linerec.
Should this be expected?
Is there a difference between how the lattice is constructed and how
ocropus-linerec decodes? If so, how should the two be used
for recognition?

Regards,
Amrit.

On May 26, 11:14 pm, Amrit <[email protected]> wrote:
> Hi Tom,
>              Continuing our earlier correspondence on using a
> language model with the OCRopus line recognizer, I would appreciate your
> insights on the following things I have been trying:
>
> 1. According to the ocropus-align reference, the language model can be
> supplied as a plain-text file with one entry per line, e.g.:
>     ocropus-align -s gt --langmod=../IMAGE-Corpus/Test/SampleList.txt book/0001/010003.fst
>  where SampleList.txt contains the following:
> HARTFORD CT 06120
> HARTFORD CT 06105
> HARTFORD CT 06106
> NEWINGTON CT 06111
> WEST HARTFORD CT 06121
> EAST HARTFORD CT 06108
> WETHERSFIELD CT 06109
>
> and 010003.fst contains the recognition lattice for the line image (ground
> truth: Newington CT 06111).
>
> I understand that internally it uses fstutils.add_line_to_fst to
> construct an FST from the text file and then composes it with the
> recognition lattice. For some reason it does not obtain the correct output.
>
> I also tried increasing the beam width in the hope of
>
> I even tried building an FST LM by extending the dict2linefst example
> under pyopenfst, without any success.
>
> Am I missing something about how to create an LM from a list like the one
> above?
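To illustrate what a line-list language model amounts to, here is a hypothetical plain-Python stand-in. It does not use fstutils or pyopenfst, and every function name in it is illustrative. The list is turned into a deterministic prefix-tree acceptor with one path per line. A simplified "lattice" (one `{character: cost}` dict per position, with no segmentation alternatives, which real OCRopus lattices do have) is then searched jointly with that acceptor. Enumerating paths through the product of the two is what composition followed by a shortest-path search reduces to in this simplified setting. One thing the sketch makes visible: the ground truth for 010003 is given as "Newington CT 06111", while the list spells it "NEWINGTON CT 06111". An exact, case-sensitive acceptor rejects the mixed-case form. Whether OCRopus's composition is case-sensitive is an assumption here, but the mismatch seems worth ruling out.

```python
# Hypothetical sketch: a plain-Python stand-in for a dictionary FST.
# It does NOT use fstutils or pyopenfst; all names here are illustrative only.
from typing import Dict, List, Optional, Set, Tuple

Acceptor = Tuple[Dict[int, Dict[str, int]], Set[int]]


def build_dictionary_acceptor(lines: List[str]) -> Acceptor:
    """Build a deterministic prefix-tree acceptor with one path per line.

    State 0 is the start state. A string is accepted only if its path ends
    in a final state, so a bare prefix such as "HARTFORD" is rejected.
    """
    transitions: Dict[int, Dict[str, int]] = {0: {}}
    finals: Set[int] = set()
    for line in lines:
        state = 0
        for ch in line:
            nxt = transitions[state].get(ch)
            if nxt is None:
                nxt = len(transitions)
                transitions[nxt] = {}
                transitions[state][ch] = nxt
            state = nxt
        finals.add(state)
    return transitions, finals


def accepts(fsa: Acceptor, text: str) -> bool:
    """Exact, case-sensitive membership test."""
    transitions, finals = fsa
    state = 0
    for ch in text:
        nxt = transitions[state].get(ch)
        if nxt is None:
            return False
        state = nxt
    return state in finals


def best_dictionary_entry(
    fsa: Acceptor, sausage: List[Dict[str, float]], missing_cost: float = 10.0
) -> Tuple[float, Optional[str]]:
    """Return (cost, entry) for the cheapest dictionary line under a simplified
    lattice: one {character: cost} dict per position. A character absent at a
    position costs missing_cost (a hypothetical penalty). Only entries whose
    length equals len(sausage) can reach a final state at the end.
    """
    transitions, finals = fsa
    best: Tuple[float, Optional[str]] = (float("inf"), None)
    stack = [(0, 0, 0.0, "")]  # (acceptor state, position, cost so far, prefix)
    while stack:
        state, pos, cost, prefix = stack.pop()
        if pos == len(sausage):
            if state in finals and cost < best[0]:
                best = (cost, prefix)
            continue
        for ch, nxt in transitions[state].items():
            step = sausage[pos].get(ch, missing_cost)
            stack.append((nxt, pos + 1, cost + step, prefix + ch))
    return best


if __name__ == "__main__":
    entries = ["HARTFORD CT 06120", "NEWINGTON CT 06111", "WETHERSFIELD CT 06109"]
    fsa = build_dictionary_acceptor(entries)
    print(accepts(fsa, "NEWINGTON CT 06111"))          # True
    print(accepts(fsa, "Newington CT 06111"))          # False: case differs
    print(accepts(fsa, "Newington CT 06111".upper()))  # True
```

In this sketch the lattice is built once and the dictionary is applied afterward, so trying a different list only reruns the search. That mirrors Tom's explanation, in the quoted Mar 3 reply, of why ocropus-lattice and ocropus-align are separate programs.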
>
> 2. The recognition rates of out-of-the-box OCRopus 0.4.4, compared to
> Tesseract 3.0, are very poor on the set of images I am testing.
>    I am sharing a few samples. These are made-up postal labels, and I am only
> interested in the last line (the address). They also give an idea of the
> range of image quality/size in my corpus.
>
> Any suggestions on how to configure OCRopus for better results?
>
> Thanks for the help; I hope to scale up and start contributing soon.
>
> Regards,
> Amrit.
> ____________________________________
> Amriteshwar Singh
> Graduate Student(MS) - Computer Science
> The University Of Texas at Dallas
> [email protected]
>
> On Thu, Mar 3, 2011 at 4:54 AM, Tom <[email protected]> wrote:
> >   >>My apologies, I should have explicitly stated the error
> > I encountered. Below is the logged output that I see:
> > $ ocropus-calign -x .gt.txt -m 2m2-reject.cmodel 010004.png
> > loading ../../models/2m2-reject.cmodel
> > *** ('010004.png', None)
> > [[[
> > load 010004.png
> > lraw   0.00 12 s-u\v,|ok9aa
> > gt 010004.gt.txt
> > ERROR 010004.gt.txt failed to load
> > amrit@amrit:/media/Data/OCR/images/images/IMAGE-Results/TEST$ cat 010004.gt.txt
> > SOUTHBURY, CT 06488
>
> >  >>This is just one example where I am using a single image; it starts
> > decoding and then fails to read the gt file, which is identified by the
> > .gt.txt extension. The same error occurs in bulk mode as well. 010004.png
> > contains only the stated ground truth as an image. Do let me know if my
> > usage is at fault.
>
> > Not sure; try running it with strace or look at the Python code (it's not
> > that complicated).
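Following Tom's suggestion to look at what the loader actually does, a small stand-alone check can rule out common file-level problems before digging into ocropus-calign itself. This is a hypothetical diagnostic in plain Python, not part of OCRopus. It assumes the ground-truth path is formed by replacing the image extension with the `-x` suffix. That is consistent with the `gt 010004.gt.txt` line in the log, but it has not been checked against the script. The checks for a byte-order mark, DOS line endings, and a single text line are guesses at what a loader might reject, not known OCRopus requirements.

```python
import os
import sys
from typing import List


def gt_path_for(image_path: str, gt_ext: str = ".gt.txt") -> str:
    # Assumption: the ground-truth file sits next to the image and shares its
    # base name, with the image extension replaced by gt_ext (the -x value).
    base, _ = os.path.splitext(image_path)
    return base + gt_ext


def check_gt(image_path: str, gt_ext: str = ".gt.txt") -> List[str]:
    """Return a list of suspected problems with the ground-truth file
    (an empty list means none of these heuristic checks fired)."""
    path = gt_path_for(image_path, gt_ext)
    if not os.path.exists(path):
        return [f"missing: {path}"]
    with open(path, "rb") as f:
        raw = f.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as e:
        return [f"not valid UTF-8: {e}"]
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 byte-order mark")
    if "\r" in text:
        problems.append("contains carriage returns (DOS line endings)")
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != 1:
        problems.append(f"expected one non-empty line, found {len(lines)}")
    return problems


if __name__ == "__main__":
    # Usage: python check_gt.py 010004.png [more images...]
    for img in sys.argv[1:]:
        print(img, check_gt(img) or "ok")
```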
>
> > >>As suggested, I will try this with the revamped ocropus-lattice +
> > ocropus-align implementation as well. Are there any changes to the steps
> > for training a cmodel compared to the ones described for ocropus-calign?
>
> > No, other than that it's the same.  The two programs were separated because
> > you often want to try different language models with the same lattices.
>
> > Tom
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "ocropus" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group at
> >http://groups.google.com/group/ocropus?hl=en.
>
>  sample1.tif (33K)
>
>  Sample2.tif (315K)

