Re: [gentoo-user] multi-region OCR

Francisco Ares Wed, 30 Nov 2016 10:42:26 -0800

2016-11-30 16:28 GMT-02:00 Michael Mol <mike...@gmail.com>:

> On Wednesday, November 30, 2016 05:34:25 PM J. Roeleveld wrote:
> > On November 30, 2016 6:03:36 PM GMT+01:00, Michael Mol <mike...@gmail.com> wrote:
> > >On Wednesday, November 30, 2016 10:43:13 AM J. Roeleveld wrote:
> > >> On Tuesday, November 29, 2016 11:18:36 PM k...@aspodata.se wrote:
> > >> > Michael Mol:
> > >> > ...
> > >> >
> > >> > > xsane would have let me do it during the scan process if I'd thought of
> > >> > > it then, but the scans are done, drives aren't there any more. Something
> > >> > ...
> > >> >
> > >> > If xsane solves your need why don't you just print your scans so xsane
> > >> > can do its job ?
> > >>
> > >> There has to be a way to do this without killing an entire forest...
> > >
> > >And big chunks of ink cartridges. The scans stretched the contrast so I can
> > >clearly read the drive labels through the translucent anti-static bags, which
> > >means a huge chunk of the image (what's outside the labels) is pure black.
> > >
> > >Which I could get around by spending fifteen minutes munging things in the Gimp
> > >before printing, but at that point, I may as well just transcribe things
> > >manually at that point.
> > >
> > >Looking for something reasonably simple to improve the general workflow. I'd
> > >have hoped something would have already been available on Linux; it'd be easy
> > >enough to copy the scans to my phone and feed them through Google Goggles for
> > >the desired output, but then I'm deliberately filtering company data through an
> > >outside entity.
> >
> > Did you manage to use that link I sent?
>
> I did. tesseract almost worked, even separating the regions cleanly in its
> output, but it seems, sadly, that the 300dpi scans were insufficient to get a
> good read; lots of clear corruption of the text, so things like serial
> numbers, model numbers, version numbers--everything you'd care about--would
> be highly suspect.
>
> The next tool that looked like it might work, gscan2pdf, wasn't in portage,
> and with the semi-garbled output from tesseract suggesting the scans were too
> poor quality, I didn't pursue further.
>
> --
> :wq



Well, I've had a similar issue. I used GIMP to scale the image to double its
size (both width and height), filtered it a bit (edge enhancement), and
split it into several images, one per region of interest.
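
For anyone who would rather script those steps than click through GIMP, here
is a rough sketch, assuming Python with the Pillow library installed. The
region coordinates, file names, and the helper names scale_box and preprocess
are all made up for illustration, and the grayscale conversion is an extra
step that was not part of the GIMP workflow described above.

```python
from pathlib import Path

# Hypothetical label regions as (left, upper, right, lower) pixel boxes,
# measured on the ORIGINAL scan. These values are placeholders.
REGIONS = [(100, 150, 900, 400), (100, 500, 900, 750)]


def scale_box(box, factor):
    """Scale a crop box so it still matches the image after resizing."""
    return tuple(int(round(v * factor)) for v in box)


def preprocess(scan_path, out_dir, regions=REGIONS, factor=2):
    """Upscale a scan, enhance edges, and save one image per region."""
    # Pillow is imported here so scale_box stays usable without it.
    from PIL import Image, ImageFilter

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with Image.open(scan_path) as img:
        # Assumption: grayscale is good enough for label text; it also
        # avoids palette-mode images, which Pillow cannot filter.
        gray = img.convert("L")
        big = gray.resize((gray.width * factor, gray.height * factor),
                          Image.LANCZOS)

    sharp = big.filter(ImageFilter.EDGE_ENHANCE)

    paths = []
    for i, box in enumerate(regions):
        out = out_dir / f"region_{i}.png"
        # The boxes were measured on the original, so scale them as well.
        sharp.crop(scale_box(box, factor)).save(out)
        paths.append(out)
    return paths
```

Calling preprocess("scan.png", "regions") would write region_0.png,
region_1.png and so on, and each of those could then be handed to tesseract
separately (for example "tesseract region_0.png region_0"). Whether doubling
the size is enough to rescue a 300dpi scan will depend on the scan.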

Of course, there might be an easier way ;-)

Francisco
