Created attachment 114485
Combine base characters and diacritical marks

My attempt to improve this.

When you typeset a character with a diacritic in LaTeX, ü for example,
the resulting PDF contains separate u and ¨ characters drawn over each
other.  This patch detects that case and converts the pair to a
combining character sequence, so that pdftotext and the search function
see a ü rather than two separate characters.  It also refactors some
code (TextWord::ensureCapacity and TextWord::setInitialBounds) to avoid
duplication.
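
To illustrate the idea (this is not the patch itself, which is C++
inside poppler), here is a minimal Python sketch.  The mapping table and
the name combine_overdrawn are hypothetical, and the final NFC
normalization step is an assumption; the patch is only described as
producing a combining character sequence.

```python
import unicodedata

# Hypothetical table: spacing diacritic -> Unicode combining mark.
SPACING_TO_COMBINING = {
    "\u00a8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
    "\u00b4": "\u0301",  # ACUTE ACCENT -> COMBINING ACUTE ACCENT
    "\u0060": "\u0300",  # GRAVE ACCENT -> COMBINING GRAVE ACCENT
    "\u02c6": "\u0302",  # MODIFIER LETTER CIRCUMFLEX ACCENT -> COMBINING CIRCUMFLEX ACCENT
    "\u02dc": "\u0303",  # SMALL TILDE -> COMBINING TILDE
}


def combine_overdrawn(base, diacritic):
    """Merge a base character and a diacritic drawn over it.

    Returns the pair unchanged if the second character is not a
    known spacing diacritic.
    """
    mark = SPACING_TO_COMBINING.get(diacritic)
    if mark is None:
        return base + diacritic
    # base + combining mark is already a valid combining sequence;
    # NFC (an assumption here) folds it into a precomposed character
    # when Unicode defines one.
    return unicodedata.normalize("NFC", base + mark)
```

With this sketch, combine_overdrawn("u", "\u00a8") gives the single
character ü, so a search for "über" would match.  A real implementation
would also have to check that the two glyphs actually overlap on the
page, which this sketch leaves out.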

Limitations:

It doesn't handle some of LaTeX's diacritic commands, such as \b for bar
under letter or \d for dot under letter, because they are positioned
differently and \d would be easy to confuse with a period.  They don't
seem to be used very often though.

If the base character is unusual, such as a math symbol or number,
adding a combining character can make the result of pdftotext look a bit
odd.  I think this is because if the font or rendering engine doesn't
know how to draw the character sequence, it will place the diacritic in
a strange position, such as to the right of the letter.  In these cases,
the output of pdftotext is technically correct; it just looks odd when
drawn on screen.
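
One way to see why unusual base characters behave differently is to
check whether Unicode has a precomposed form for the sequence.  This is
a small sketch using Python's unicodedata module; the helper name
describe is made up for illustration.

```python
import unicodedata


def describe(seq):
    """Return the Unicode names of the characters left after NFC."""
    composed = unicodedata.normalize("NFC", seq)
    return [unicodedata.name(ch) for ch in composed]


# u + COMBINING DIAERESIS has a precomposed form, so one name comes back.
print(describe("u\u0308"))
# 2 + COMBINING DIAERESIS has none, so the base and the mark stay separate
# and the renderer has to position the mark itself.
print(describe("2\u0308"))
```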

When selecting text in evince, you can separately select the character
and diacritic.  If that's a problem, I think I could fix it by adding
clustering support so that a group of glyphs and characters is treated
as a single unit.  It would make this a much more invasive change, but
maybe I should try it anyway.  It would be nice to also fix the
assumption that one glyph always maps to exactly one character.
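
The clustering idea can be sketched roughly as follows.  This is only an
illustration in Python, not poppler code: the function name clusters is
hypothetical, and attaching every combining mark to the preceding
character is a simplification of full Unicode grapheme cluster rules.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    out = []
    for ch in text:
        # unicodedata.combining() returns a nonzero class for combining marks.
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch  # attach the mark to the previous cluster
        else:
            out.append(ch)  # start a new cluster
    return out
```

Selection would then operate on the clusters rather than on individual
characters, so u and its combining diaeresis could only be selected
together.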

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to poppler in Ubuntu.
https://bugs.launchpad.net/bugs/116453

Title:
  evince can not find ü in attached PDF

Status in Poppler:
  Confirmed
Status in poppler package in Ubuntu:
  Triaged

Bug description:
  Binary package hint: evince

  1) lsb_release -rd
  Description:  Ubuntu Vivid Vervet (development branch)
  Release:      15.04

  2) apt-cache policy evince
  evince:
    Installed: 3.14.1-0ubuntu1
    Candidate: 3.14.1-0ubuntu1
    Version table:
   *** 3.14.1-0ubuntu1 0
          500 http://us.archive.ubuntu.com/ubuntu/ vivid/main amd64 Packages
          100 /var/lib/dpkg/status
   
  3) What is expected to happen with the attached document is when one searches for:
  über

  it is found:
  https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/116453/+attachment/102979/+files/example.pdf

  4) What happens instead is it does not return any matches.

  WORKAROUND: Use the built-in PDF viewer+search with chromium-browser
  or chrome (doesn't work in Firefox).

  apt-cache policy chromium-browser
  chromium-browser:
    Installed: 39.0.2171.65-0ubuntu0.14.04.1.1064
    Candidate: 39.0.2171.65-0ubuntu0.14.04.1.1064
    Version table:
   *** 39.0.2171.65-0ubuntu0.14.04.1.1064 0
          500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/universe amd64 Packages
          500 http://security.ubuntu.com/ubuntu/ trusty-security/universe amd64 Packages
          100 /var/lib/dpkg/status
       34.0.1847.116-0ubuntu2 0
          500 http://us.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

  apt-cache policy google-chrome-stable:i386
  google-chrome-stable:i386:
    Installed: 39.0.2171.95-1
    Candidate: 39.0.2171.95-1
    Version table:
   *** 39.0.2171.95-1 0
          500 http://dl.google.com/linux/chrome/deb/ stable/main i386 Packages
          100 /var/lib/dpkg/status

  ProblemType: Bug
  Architecture: i386
  Date: Wed May 23 18:22:27 2007
  DistroRelease: Ubuntu 7.04
  ExecutablePath: /usr/bin/evince
  Package: evince 0.8.1-0ubuntu1
  PackageArchitecture: i386
  ProcEnviron:
   LANGUAGE=en_US:en
   PATH=~/local/bin:~/local/lib:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11:/usr/games
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  SourcePackage: evince
  Uname: Linux copper 2.6.20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux

To manage notifications about this bug go to:
https://bugs.launchpad.net/poppler/+bug/116453/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp