Repository: pdfbox-docs
Updated Branches:
  refs/heads/asf-site 093b2e4ce -> d49a1632d
Site checkin for project Apache PDFBox Website

Project: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/commit/d49a1632
Tree: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/tree/d49a1632
Diff: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/diff/d49a1632

Branch: refs/heads/asf-site
Commit: d49a1632d844591f327f9e07e6b928b52fd673fc
Parents: 093b2e4
Author: Maruan Sahyoun <sahy...@fileaffairs.de>
Authored: Thu Mar 3 22:28:43 2016 +0100
Committer: Maruan Sahyoun <sahy...@fileaffairs.de>
Committed: Thu Mar 3 22:28:43 2016 +0100

----------------------------------------------------------------------
 content/2.0/migration.html | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/d49a1632/content/2.0/migration.html
----------------------------------------------------------------------
diff --git a/content/2.0/migration.html b/content/2.0/migration.html
index 1882a09..10f81ce 100644
--- a/content/2.0/migration.html
+++ b/content/2.0/migration.html
@@ -331,6 +331,20 @@ tree are now represented by the <code>PDNonTerminalField</code> class.</p>
 </code></pre></div>
 
 <p>Most <code>PDField</code> subclasses now accept Java generic types such as <code>String</code> as parameters instead of the former <code>COSBase</code> subclasses.</p>
+<h3 id="why-was-the-replacetext-example-removed">Why was the ReplaceText example removed?</h3>
+
+<p>The ReplaceText example has been removed as it gave the incorrect illusion that text can be replaced easily. 
+Words are often split, as seen by this excerpt of a content stream:</p>
+<div class="highlight"><pre><code class="language-" data-lang="">[ (Do) -29 (c) -1 (umen) 30 (tation) ] TJ
+</code></pre></div>
+<p>Other problems will appear with font subsets: for example, if only the glyphs for a, b and c are used,
+these would be encoded as hex 0, 1 and 2, so you won't find "abc". Additionally, you can't replace "c" with "d" because it isn't part of the subset.</p>
+
+<p>You could also have problems with ligatures, e.g. "ff", "fl", "fi", "ffi", "ffl", which can be represented by a single code in many fonts.
+To understand this yourself, view any file with PDFDebugger and have a look at the "Contents" entry of a page.</p>
+
+<p>See also <a href="https://stackoverflow.com/questions/35420609/pdfbox-2-0-rc3-find-and-replace-text">https://stackoverflow.com/questions/35420609/pdfbox-2-0-rc3-find-and-replace-text</a></p>
+
 </div>
 </div>
 </div>
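
The split-word problem described in the added section can be illustrated with a small, self-contained sketch. It is not PDFBox code and not part of this commit: the class TjArrayDemo and its methods fragments and joined are hypothetical names introduced only for illustration. The sketch assumes a simple single-byte encoding in which string bytes map directly to characters (which, as the section notes, does not hold for subset fonts), and it ignores nested parentheses, octal escapes and hex strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of PDFBox.
class TjArrayDemo {

    /**
     * Collects the literal string fragments, e.g. "(Do)", from the operand
     * of a TJ operator. Simplified: a backslash keeps the next character
     * as-is; nested parentheses and octal escapes are not handled.
     */
    public static List<String> fragments(String tjArray) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null; // non-null while inside ( ... )
        for (int i = 0; i < tjArray.length(); i++) {
            char ch = tjArray.charAt(i);
            if (current == null) {
                if (ch == '(') {
                    current = new StringBuilder();
                }
                // numbers between strings are kerning adjustments; skipped
            } else if (ch == '\\' && i + 1 < tjArray.length()) {
                current.append(tjArray.charAt(++i));
            } else if (ch == ')') {
                result.add(current.toString());
                current = null;
            } else {
                current.append(ch);
            }
        }
        return result;
    }

    /** Joins all fragments into the text a reader would see. */
    public static String joined(String tjArray) {
        return String.join("", fragments(tjArray));
    }

    public static void main(String[] args) {
        String line = "[ (Do) -29 (c) -1 (umen) 30 (tation) ] TJ";
        // A naive search on the raw content stream misses the word.
        System.out.println(line.contains("Documentation")); // false
        System.out.println(fragments(line)); // [Do, c, umen, tation]
        System.out.println(joined(line));    // Documentation
    }
}
```

Even when the word is found this way, replacing it would still require rewriting the fragments and kerning values, and the replacement glyphs would have to exist in the font.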