As mentioned on the Google Code issue page, a simple (but probably incorrect) workaround can be used for content that touches the left edge of the page. It lets the first coordinate (x0) of a rectangle carry a minus sign:

--- /usr/share/ocropus/scripts/lib/hocr.lua.orig        2010-04-27 22:05:29.000000000 +0200
+++ /usr/share/ocropus/scripts/lib/hocr.lua     2010-07-28 17:08:14.000000000 +0200
@@ -24,8 +24,8 @@
hocr = {}

function hocr.parse_rectangle(s, h)
-    local x0, y0, x1, y1 = s:match('^%s*(%d*)%s*(%d*)%s*(%d*)%s*(%d*)%s*$')
-    assert(x0 and y0 and x1 and y1, "rectangle parsing error")
+    local x0, y0, x1, y1 = s:match('^%s*(-?%d*)%s*(%d*)%s*(%d*)%s*(%d*)%s*$')
+    assert(x0 and y0 and x1 and y1, "rectangle parsing error " .. s)
    return rectangle(x0+0, h-1-y1, x1+0, h-1-y0)
end
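
To see what the change above accepts and rejects, here is a rough Python rendering of the two Lua patterns. It is only a sketch: the names ORIGINAL, PATCHED and the Python parse_rectangle are mine, not part of ocropus, and Python's re is assumed to behave like Lua's matcher for these simple patterns.

```python
import re

# Python renderings of the Lua patterns from the patch (assumed equivalent).
ORIGINAL = re.compile(r'^\s*(\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*$', re.ASCII)
PATCHED = re.compile(r'^\s*(-?\d*)\s*(\d*)\s*(\d*)\s*(\d*)\s*$', re.ASCII)


def parse_rectangle(s, h, pattern=PATCHED):
    """Parse 'x0 y0 x1 y1' and flip the y axis for a page of height h."""
    m = pattern.match(s)
    if m is None:
        raise ValueError("rectangle parsing error " + s)
    # int() raises ValueError on an empty capture, e.g. for "10 20 30".
    x0, y0, x1, y1 = (int(v) for v in m.groups())
    return (x0, h - 1 - y1, x1, h - 1 - y0)


if __name__ == "__main__":
    s = "-1 0 100 200"
    print(ORIGINAL.match(s))        # the unpatched pattern does not match
    print(parse_rectangle(s, 300))  # the patched pattern does
```

Note that only x0 may be negative with this patch; a string such as "0 -1 100 200" is still rejected, which is one reason the workaround is probably not the right fix.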


If ocroscript is being called from ocrodjvu, this "fix" triggers an assertion failure in ocrodjvu:

Exception in thread Thread-1:
Traceback (most recent call last):
 File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
   self.run()
 File "/usr/lib/python2.6/threading.py", line 484, in run
   self.__target(*self.__args, **self.__kwargs)
 File "/usr/share/ocrodjvu/lib/_ocrodjvu.py", line 445, in page_thread
   result = self.process_page(page)
 File "/usr/share/ocrodjvu/lib/_ocrodjvu.py", line 425, in process_page
   page_size=size
 File "/usr/share/ocrodjvu/lib/hocr.py", line 462, in extract_text
   scan_result = scan(doc.find('/body'), settings)
 File "/usr/share/ocrodjvu/lib/hocr.py", line 428, in scan
   _rotate(zone, settings.rotation)
 File "/usr/share/ocrodjvu/lib/hocr.py", line 324, in _rotate
   assert obj.bbox[:2] == (0, 0)
AssertionError

Another dodgy workaround for that is to comment out the failing assertion:

--- /usr/share/ocrodjvu/lib/hocr.py.orig        2010-07-28 17:20:17.000000000 +0200
+++ /usr/share/ocrodjvu/lib/hocr.py     2010-07-28 17:20:50.000000000 +0200
@@ -321,7 +321,7 @@
    if xform is None:
        assert isinstance(obj, Zone)
        assert obj.type == const.TEXT_ZONE_PAGE
-        assert obj.bbox[:2] == (0, 0)
+        # assert obj.bbox[:2] == (0, 0)
        page_size = obj.bbox[2:]
        if (rotation // 90) & 1:
            xform = decode.AffineTransform((0, 0) + tuple(reversed(page_size)), (0, 0) + page_size)
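
For illustration, the two checks in that hunk can be tried in isolation. This is a sketch only: the helper names and the page bounding box values are made up, and I am assuming (not having verified it) that the negative x0 from the patched ocropus output ends up in the page zone's bounding box.

```python
def starts_at_origin(bbox):
    """The condition behind the assertion that gets commented out."""
    return tuple(bbox[:2]) == (0, 0)


def swaps_page_size(rotation):
    """True for an odd number of quarter turns (90, 270), as in _rotate."""
    return bool((rotation // 90) & 1)


if __name__ == "__main__":
    # Hypothetical page boxes; the real values were not captured.
    print(starts_at_origin((0, 0, 2480, 3508)))
    print(starts_at_origin((-1, 0, 2480, 3508)))
    print([swaps_page_size(r) for r in (0, 90, 180, 270)])
```

With the assertion disabled, page_size is still taken from obj.bbox[2:], so a page box that does not start at (0, 0) is silently treated as if it did.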
