[ 
https://issues.apache.org/jira/browse/PDFBOX-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622864#comment-14622864
 ] 

John Hewson commented on PDFBOX-2530:
-------------------------------------

There's a lot of data associated with fonts in PDF, and then there are the font 
files themselves. Don't worry about covering everything.

For PDSimpleFont subclasses, we'd want to view the Encoding entry in the Font 
dictionary. Ideally you could retrieve the 
org.apache.pdfbox.pdmodel.font.encoding.Encoding object from the PDSimpleFont. 
The encoding is then simply a map from character codes to glyph names, which 
you can display in a table. The character codes always range from 0 to 255. 
Perhaps you could display missing glyphs in red by calling 
PDSimpleFont#hasGlyph(name) for each glyph name.
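
As a rough illustration of the table described above, here is a minimal sketch in plain Java. It does not call PDFBox: the codeToName and hasGlyph parameters are hypothetical stand-ins for the Encoding lookup and for PDSimpleFont#hasGlyph(name), and the class name, Row type and ".notdef" fallback are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.IntFunction;
import java.util.function.Predicate;

// Hypothetical helper, not part of PDFBox: builds the rows of an encoding table.
final class EncodingTableSketch {

    /** One table row: character code, glyph name, and whether the font has that glyph. */
    static final class Row {
        final int code;
        final String glyphName;
        final boolean present;

        Row(int code, String glyphName, boolean present) {
            this.code = code;
            this.glyphName = glyphName;
            this.present = present;
        }
    }

    /**
     * codeToName stands in for the encoding lookup, hasGlyph for the font's
     * glyph check. Both are assumptions so the sketch runs without PDFBox.
     */
    static List<Row> buildRows(IntFunction<String> codeToName, Predicate<String> hasGlyph) {
        List<Row> rows = new ArrayList<>(256);
        for (int code = 0; code <= 255; code++) {
            String name = codeToName.apply(code);
            if (name == null) {
                name = ".notdef"; // assumed placeholder for unmapped codes
            }
            rows.add(new Row(code, name, hasGlyph.test(name)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Toy encoding and glyph set, invented for the demonstration.
        Map<Integer, String> encoding = new HashMap<>();
        encoding.put(65, "A");
        encoding.put(66, "B");
        Set<String> glyphsInFont = new HashSet<>(Arrays.asList("A"));

        for (Row row : buildRows(code -> encoding.get(code), glyphsInFont::contains)) {
            if (!".notdef".equals(row.glyphName)) {
                // A GUI would render rows with present == false in red.
                System.out.println(row.code + "\t" + row.glyphName
                        + (row.present ? "" : "\t(missing)"));
            }
        }
    }
}
```

In the debugger, the rows could back a JTable whose cell renderer colors the rows where present is false.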

For PDType0Font, you'll need to look at the descendant font (the 
DescendantFonts array always contains exactly one). When this is a 
PDCIDFontType2, we want to display the CIDToGIDMap as a table. You can copy 
the code from readCIDToGIDMap to do this (I don't want to make that method 
public). This is simply a table of CID -> GID.
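
For orientation, a minimal sketch of how such a CID -> GID table can be decoded from the raw bytes of a CIDToGIDMap stream. It is not the readCIDToGIDMap code from PDFBox. It assumes the stream form of the entry (the entry may also be the name Identity, which needs no table), with the glyph index for CID c stored in bytes 2c and 2c+1, high-order byte first. The class and method names are made up for this example.

```java
// Hypothetical helper, not the PDFBox implementation.
final class CidToGidMapSketch {

    /**
     * Decodes an already-unfiltered CIDToGIDMap stream.
     * The array index of the result is the CID, the value is the GID.
     * A trailing odd byte, if any, is ignored.
     */
    static int[] decode(byte[] data) {
        int count = data.length / 2;
        int[] cidToGid = new int[count];
        for (int cid = 0; cid < count; cid++) {
            int high = data[2 * cid] & 0xFF;    // mask to treat the byte as unsigned
            int low = data[2 * cid + 1] & 0xFF;
            cidToGid[cid] = (high << 8) | low;
        }
        return cidToGid;
    }

    public static void main(String[] args) {
        // Three made-up entries: CID 0 -> GID 0, CID 1 -> GID 3, CID 2 -> GID 256.
        byte[] sample = {0, 0, 0, 3, 1, 0};
        int[] table = decode(sample);
        for (int cid = 0; cid < table.length; cid++) {
            System.out.println("CID " + cid + " -> GID " + table[cid]);
        }
    }
}
```

The resulting array maps directly onto a two-column table, one row per CID.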


> Improve PDFDebugger
> -------------------
>
>                 Key: PDFBOX-2530
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2530
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.8.8, 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: khyrul bashar
>              Labels: gsoc2015
>         Attachments: Avoiding_NPE_for_null_Field_Type.diff, 
> BracketsColorChooser.png, DeviceNCS.diff, FlagBitsPane-26-06-2015.diff, 
> Flag_bits_showing_feature-redesigned.diff, Flag_bits_showing_feature.diff, 
> K4SystemFontsNotEmbeded218.pdf, PDFDebugger_StatusBar.png, 
> PDFDebugger_StatusBar_01.png, 
> Parent_dictionary_type_checking_for__f__and__flags.diff, indexedcs.diff, 
> openSelectedPath.diff, parent_node_redirect.diff, 
> parent_node_redirect_expand_disabled.diff, removed_redundant_codes.patch, 
> separationCS.diff, sonarqube_warning_resolve.diff, tree.diff, 
> treestatus.diff, treestatuspane.diff
>
>
> (This is an idea for the [Google Summer of Code 
> 2015|https://www.google-melange.com/])
> Our command line utility PDFDebugger (part of the command line pdfbox-app; 
> get it [here|https://pdfbox.apache.org/downloads.html], read the description 
> [here|https://pdfbox.apache.org/commandline/], see the source code 
> [here|https://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFDebugger.java?view=markup&sortby=date])
>  needs some improvements:
>    - hex view
>    - view of non printable characters
>    - saving streams
>    - binary copy & paste
>    - ✓ Create a status line that shows where we are in the tree. (Like in the 
> Windows REGEDIT)
>    - ✓ Copy the current tree string into the clipboard (useful in discussions 
> about details of a PDF)
>    - ✓ (Optional, not sure if easy) Jump to specific place in the tree by 
> entering tree string
>    - ability to search in streams (very useful for content streams and meta 
> data)
>    - show images that are streams
>    - ✓ show PDIndexed color lookup table, show the index value, the base and 
> RGB color value sets when the mouse moves
>    - ✓ show PDSeparation color
>    - ✓ show PDDeviceN colors
>    - optional, idea should be developed a bit: show meaningful explanation on 
> some attributes, e.g. "appearance stream" when hovering over /AP
>    - show font encodings and characters
>    - ✓ display flag bits (e.g. Annotation flags) in a way that is easy to 
> understand. There are probably others, I assume that the main work needs to 
> be done only once
>    - edit attributes (should be possible to enter values as decimal, hex or 
> binary)
>    - edit streams, while keeping or changing the compression filter
>    - save altered PDF 
>    - color mark of certain PDF operators, especially q...Q and text operators 
> (BT...ET). Ideally, it should help the user understand the "bracketing" of 
> these operators, i.e. understand where a sequence starts and where it ends. 
> (See "operator summary" in the PDF Spec) Other "important" operators I can 
> think of are the matrix, font and color operators. A cool advanced thing 
> would be to show the current color or the font in a popup when hovering above 
> such an operator.
> To see a product with a similar purpose that is better than PDFDebugger, 
> watch [this video|https://www.youtube.com/watch?v=g-QcU9B4qMc].
> I'm not asking to implement a clone of that product (I don't use it, all I 
> know is that video), but we at PDFBox really need something that makes PDF 
> debugging easier. As an example of how the current PDFDebugger prevented me 
> from finding a bug quickly, see PDFBOX-2401 and search for "PDFDebugger".
> Prerequisites:
> - Java programming, especially the GUI components
> - the ability to understand existing source code
> Using external software components is possible (must have Apache License or a 
> compatible one), but should be decided on a case-by-case basis, we don't want 
> to get too big.
> Development strategy: go from the easy to the difficult. The requested 
> features are already sorted this way (mostly).
> Get introduced: [download the source code with 
> svn|https://pdfbox.apache.org/downloads.html#scm] and build it with Maven. 
> Run PDFDebugger and view some PDFs to see the components of a PDF. Start with 
> the file of PDFBOX-2401. Read up on the structure of PDF on the web or in 
> the [PDF 
> Specification|https://www.adobe.com/devnet/pdf/pdf_reference.html].
> Mentor: Tilman Hausherr (European timezone, languages: German, English, 
> French). To see the GSoC2014 project I mentored, go to PDFBOX-1915.


