Launchpad has imported 8 comments from the remote bug at
https://bugs.freedesktop.org/show_bug.cgi?id=53925.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2012-08-22T12:10:51+00:00 Chpe wrote:

(Originally filed as https://bugzilla.gnome.org/show_bug.cgi?id=682451 )

Using poppler from git master and evince from git master (both updated
today to check the bug still occurs), when trying to print pages 3-4 of
http://www.unicode.org/Public/6.2.0/charts/blocks/U25A0.pdf I get two

Internal Error: cairo context error: input string not valid UTF-8

errors on console (and no output from the printer...).

Breaking on _cairo_error in gdb, this is the first error:

Breakpoint 1, _cairo_error (status=CAIRO_STATUS_INVALID_STRING) at cairo.c:171
171     {
(gdb) where
#0  _cairo_error (status=CAIRO_STATUS_INVALID_STRING) at cairo.c:171
#1  0xb60c5119 in _cairo_validate_text_clusters (
    utf8=utf8@entry=0x82c7480 "\355\240\275\355\264\273@\b\020", utf8_len=utf8_len@entry=6,
    glyphs=glyphs@entry=0x84b2500, num_glyphs=num_glyphs@entry=1,
    clusters=clusters@entry=0x87d13e8, num_clusters=num_clusters@entry=1,
    cluster_flags=cluster_flags@entry=(unknown: 0)) at cairo-misc.c:319
#2  0xb60ad0cb in cairo_show_text_glyphs (cr=0xb614b460,
    utf8=0x82c7480 "\355\240\275\355\264\273@\b\020", utf8_len=6, glyphs=0x84b2500,
    num_glyphs=1, clusters=0x87d13e8, num_clusters=1, cluster_flags=(unknown: 0))
    at cairo.c:3593
#3  0xa022937f in CairoOutputDev::endString (this=0x83a9d48, state=0x8748718)
    at CairoOutputDev.cc:1222
#4  0x9f7b2f0d in Gfx::doShowText (this=this@entry=0x874a400, s=0x87d41d8) at Gfx.cc:4036
#5  0x9f7b3ef9 in Gfx::opShowText (this=0x874a400, args=0xa00fe784, numArgs=1) at Gfx.cc:3737
#6  0x9f7a4366 in Gfx::execOp (this=this@entry=0x874a400, cmd=cmd@entry=0xa00fe764,
    args=args@entry=0xa00fe784, numArgs=numArgs@entry=1) at Gfx.cc:857
#7  0x9f7aba5e in Gfx::go (this=this@entry=0x874a400, topLevel=topLevel@entry=false)
    at Gfx.cc:716
#8  0x9f7abf05 in Gfx::display (this=this@entry=0x874a400, obj=obj@entry=0xa00feb84,
    topLevel=topLevel@entry=false) at Gfx.cc:682
#9  0x9f7ac30b in Gfx::drawForm (this=this@entry=0x874a400, str=str@entry=0xa00feb84,
    resDict=resDict@entry=0x83f72f0, matrix=matrix@entry=0xa00feb00,
    bbox=bbox@entry=0xa00feae0, transpGroup=transpGroup@entry=false,
    softMask=softMask@entry=false, blendingColorSpace=blendingColorSpace@entry=0x0,
    isolated=isolated@entry=false, knockout=knockout@entry=false, alpha=alpha@entry=false,
    transferFunc=transferFunc@entry=0x0, backdropColor=backdropColor@entry=0x0) at Gfx.cc:4830
#10 0x9f7ad3e5 in Gfx::doForm (this=this@entry=0x874a400, str=str@entry=0xa00feb84)
    at Gfx.cc:4753
#11 0x9f7b0123 in Gfx::opXObject (this=0x874a400, args=0xa00feca4, numArgs=1) at Gfx.cc:4127
#12 0x9f7a4366 in Gfx::execOp (this=this@entry=0x874a400, cmd=cmd@entry=0xa00fec84,
    args=args@entry=0xa00feca4, numArgs=numArgs@entry=1) at Gfx.cc:857
#13 0x9f7aba5e in Gfx::go (this=this@entry=0x874a400, topLevel=topLevel@entry=true)
    at Gfx.cc:716
#14 0x9f7abf05 in Gfx::display (this=0x874a400, obj=0xa00fef34, topLevel=true) at Gfx.cc:682
#15 0x9f7f0b93 in Page::displaySlice (this=0x834fd80, out=0x83a9d48, hDPI=72, vDPI=72,
    rotate=0, useMediaBox=false, crop=true, sliceX=-1, sliceY=-1, sliceW=-1, sliceH=-1,
    printing=true, abortCheckCbk=0, abortCheckCbkData=0x0,
    annotDisplayDecideCbk=0xa021da60 <poppler_print_annot_cb(Annot*, void*)>,
    annotDisplayDecideCbkData=0x1) at Page.cc:520
#16 0xa021e3f5 in _poppler_page_render (page=0x833b700, cairo=0xb614b460, printing=true,
    print_flags=POPPLER_PRINT_MARKUP_ANNOTS) at poppler-page.cc:358
#17 0xa0247167 in pdf_document_print_print_page (document=0x8316b50, page=0x833bc30,
    cr=0xb614b460) at ev-poppler.cc:1934
#18 0xb7f719cb in ev_document_print_print_page (document_print=0x8316b50, page=0x833bc30,
    cr=0xb614b460) at ev-document-print.c:40
#19 0xb7f274ed in ev_job_print_run (job=0x8669c60) at ev-jobs.c:1866
#20 0xb7f232ba in ev_job_run (job=0x8669c60) at ev-jobs.c:215
#21 0xb7f27c2b in ev_job_thread (job=0x8669c60) at ev-job-scheduler.c:184
#22 0xb7f27d38 in ev_job_thread_proxy (data=0x0) at ev-job-scheduler.c:217
#23 0xb5c4e47f in g_thread_proxy (data=0x832d720) at gthread.c:801
#24 0xb5bc4adf in start_thread (arg=0xa00ffb40) at pthread_create.c:309
#25 0xb5ab754e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133

Reply at:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/1096685/comments/0

------------------------------------------------------------------------
On 2012-08-24T13:46:37+00:00 Adrian Johnson wrote:

The problem is that surrogate pairs are not decoded before converting to
utf8. The patch https://bugs.freedesktop.org/attachment.cgi?id=58178
(bug 46603 "convert utf-16 to ucs-4 when reading ToUnicode") fixes this
issue by moving all instances of the surrogate pair handling to where
the UTF-16 characters are read to ensure that the internal Unicode type
contains only UTF-32 values.
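
For illustration only, here is a minimal Python sketch of the failure and
of the kind of fix described above. It is not poppler's code: the function
names are hypothetical, and replacing unpaired surrogates with U+FFFD is an
assumption of this sketch. The six bytes in the backtrace
("\355\240\275\355\264\273") are what results when the two UTF-16 code
units 0xD83D 0xDD3B are each encoded to UTF-8 on their own, which is not
valid UTF-8.

```python
def utf16_units_to_code_points(units):
    """Combine UTF-16 surrogate pairs into single code points (UCS-4)."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Valid pair: high surrogate followed by low surrogate.
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: substitute U+FFFD (assumption of this sketch).
            out.append(0xFFFD)
            i += 1
        else:
            out.append(u)
            i += 1
    return out


def encode_utf8(code_points):
    """Encode a list of code points as UTF-8."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


units = [0xD83D, 0xDD3B]

# Broken path: each UTF-16 unit is encoded separately, surrogates included.
naive = b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)
# naive is ed a0 bd ed b4 bb, the bytes seen in the backtrace.

# Fixed path: decode the pair first, then encode the single code point.
fixed = encode_utf8(utf16_units_to_code_points(units))
# fixed is f0 9f 94 bb, the UTF-8 encoding of U+1F53B.
```

A strict UTF-8 validator rejects the first byte string because it encodes
code points in the surrogate range, while the second one is accepted.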

Reply at:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/1096685/comments/1

------------------------------------------------------------------------
On 2012-08-26T22:40:29+00:00 Albert Astals Cid wrote:

I'll have a look to see if integrating that patch Adrian mentions breaks
something, "soon" by some definition of "soon" :D

Reply at:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/1096685/comments/2

------------------------------------------------------------------------
On 2012-08-27T20:47:36+00:00 Albert Astals Cid wrote:

Regression in pdftotext output in

https://bugs.freedesktop.org/attachment.cgi?id=58045

-In our disk model ̃
-𝑃 of the projective plane, we have obtained four bundles of half
+̃ of the projective plane, we have obtained four bundles of half
+In our disk model 𝑃

It is true that the original is not perfect, but at least it is in the
correct order; your new one swaps the order of the text (i.e. "In
our disk model" has to be before "of the projective plane", not after)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/1096685/comments/3

------------------------------------------------------------------------
On 2012-08-28T13:05:04+00:00 Adrian Johnson wrote:

Created attachment 66222
increase tolerance for overlapping glyphs

This patch fixes the regression.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/1096685/comments/4

------------------------------------------------------------------------
On 2012-08-28T13:10:01+00:00 Adrian Johnson wrote:

Created attachment 66223
move text to unicode conversion to a separate function

As a result of the first patch, ActualText also needs to convert UTF-16
to UCS-4. This patch (from bug 46603 with a small fix) factors out the
duplicated code in ActualText and pdfinfo for converting text to
unicode.
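
As a rough illustration of what such a shared helper might look like, here
is a Python sketch. The function name is hypothetical and this is not the
patch itself. It relies on the PDF convention that a text string starting
with the byte order mark FE FF is UTF-16BE; anything else is
PDFDocEncoding, which this sketch approximates with Latin-1 (a
simplification, since the two differ for some byte values).

```python
def text_string_to_code_points(raw):
    """Convert a PDF text string (bytes) to a list of Unicode code points."""
    if raw.startswith(b"\xfe\xff"):
        # UTF-16BE with BOM. Python's decoder combines surrogate pairs,
        # so the result holds whole code points (UCS-4 values).
        text = raw[2:].decode("utf-16-be")
    else:
        # Assumption: Latin-1 stands in for PDFDocEncoding in this sketch.
        text = raw.decode("latin-1")
    return [ord(ch) for ch in text]


# A surrogate pair after the BOM becomes one code point, not two.
paired = text_string_to_code_points(b"\xfe\xff\xd8\x3d\xdd\x3b")
plain = text_string_to_code_points(b"abc")
```

Keeping the conversion in one place means every caller gets the same
surrogate handling instead of each reimplementing it.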

Reply at:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/1096685/comments/5

------------------------------------------------------------------------
On 2012-08-30T20:37:41+00:00 Albert Astals Cid wrote:

I've committed these patches, but only to master (i.e. 0.22.0) since they
change pdftotext output for a lot of files (around 400 in my test
suite). It is true that most are improvements, but with such a huge
change I don't feel like putting it in 0.20.x

P.S: My eyes bleed after looking at the diffs of all those pdftotext
outputs

Reply at:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/1096685/comments/6

------------------------------------------------------------------------
On 2012-08-31T07:05:31+00:00 Carlos Garcia Campos wrote:

(In reply to comment #6)
> I've committed these patches, but only to master (i.e. 0.22.0) since they
> change pdftotext output for a lot of files (around 400 in my test suite). It
> is true that most are improvements, but with such a huge change I don't feel
> like putting it in 0.20.x
>
> P.S: My eyes bleed after looking at the diffs of all those pdftotext outputs

Thanks both!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/1096685/comments/7


** Changed in: poppler
       Status: Unknown => Fix Released

** Changed in: poppler
   Importance: Unknown => High

** Bug watch added: GNOME Bug Tracker #682451
   https://bugzilla.gnome.org/show_bug.cgi?id=682451

-- 
You received this bug notification because you are a member of Ubuntu
Desktop Bugs, which is subscribed to poppler in Ubuntu.
https://bugs.launchpad.net/bugs/1096685

Title:
  poppler feeds invalid UTF-8 to cairo

To manage notifications about this bug go to:
https://bugs.launchpad.net/poppler/+bug/1096685/+subscriptions

-- 
desktop-bugs mailing list
desktop-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/desktop-bugs
