[
https://issues.apache.org/jira/browse/PDFBOX-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388430#comment-17388430
]
flywire edited comment on PDFBOX-5247 at 7/28/21, 4:00 AM:
-----------------------------------------------------------
I ran the WriteDecodedDoc command-line tool (https://pdfbox.apache.org/2.0/commandline.html#writedecodeddoc) on the PDF.
All the confidential text is in one of the 35 objects:
{code:java}
// BUY CONFIRMATION
BT
/YLFOZY+Arial-BoldMT 21.00000 Tf
425.00 608.00 Td
1.00000 0.00000 0.00000 rg
-0.06673 Tc
( % X \\ & R Q I L U P D W L R Q) Tj
ET
Q
Q
q
q
{code}
Hex dump of the Tj line (viewed in Notepad++ using UTF-8 encoding):
20 28 00 25 00 58 00 5c 5c 00 03 00 26 00 52 00 51 00 49 00 4c 00 55 00 50 00
44 00 57 00 4c 00 52 00 51 29 20 20 54 6a
The two-byte code 00 03 is the space character. How do I find out which Unicode value it is mapped to?
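As a rough sketch of how the dump above can be read: the operand appears to consist of two-byte codes (as with an Identity-H encoded font, which is an assumption here), and the byte 5c is doubled because a backslash is escaped inside a PDF literal string. The hypothetical helper below (`TjCodes`, not part of PDFBox) splits the bytes into codes. For this one string, adding 29 to each code happens to give readable ASCII; that offset is only an observation about this sample, not a rule. The Unicode values that text extraction reports come from the font's ToUnicode CMap, which this sketch does not read.

```java
// Hypothetical helper, not part of PDFBox: reads the raw bytes of a Tj operand
// as 2-byte codes. Assumes every parenthesis inside the string is escaped and
// ignores escapes such as \n or octal \ddd.
class TjCodes {

    // Parse a hex dump such as "00 25 00 58" into byte values.
    static int[] parseHex(String dump) {
        String[] parts = dump.trim().split("\\s+");
        int[] bytes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = Integer.parseInt(parts[i], 16);
        }
        return bytes;
    }

    // Take the bytes between '(' and ')', undo backslash escapes,
    // and pair the remaining bytes into big-endian 2-byte codes.
    static int[] codes(int[] raw) {
        int[] unescaped = new int[raw.length];
        int n = 0;
        boolean inString = false;
        for (int i = 0; i < raw.length; i++) {
            int b = raw[i];
            if (!inString) {
                if (b == '(') {
                    inString = true;
                }
                continue;
            }
            if (b == ')') {
                break;
            }
            if (b == '\\' && i + 1 < raw.length) {
                b = raw[++i]; // escaped byte is taken literally
            }
            unescaped[n++] = b;
        }
        int[] out = new int[n / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (unescaped[2 * i] << 8) | unescaped[2 * i + 1];
        }
        return out;
    }

    // Add a fixed offset to each code. The offset is an assumption that
    // happens to fit this sample; it is not how PDF text decoding works.
    static String shift(int[] codes, int offset) {
        StringBuilder sb = new StringBuilder();
        for (int c : codes) {
            sb.append((char) (c + offset));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dump =
            "20 28 00 25 00 58 00 5c 5c 00 03 00 26 00 52 00 51 00 49 00 4c 00 55 00 50 00 "
          + "44 00 57 00 4c 00 52 00 51 29 20 20 54 6a";
        int[] c = codes(parseHex(dump));
        System.out.println(shift(c, 29)); // prints: Buy Confirmation
    }
}
```

The fourth code is 0x0003, the one drawn as a space; what it maps to in extracted text depends on the font's ToUnicode entry for that code.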
> Space in pdf returns c2 a0 characters instead of 20
> ---------------------------------------------------
>
> Key: PDFBOX-5247
> URL: https://issues.apache.org/jira/browse/PDFBOX-5247
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Environment: Portfolio Performance
> Version: 0.54.0 (Jul. 2021)
> Platform: win32, x86_64
> Java: 11.0.4+11-LTS, Azul Systems, Inc.
> Locale: AU
> Reporter: flywire
> Priority: Minor
>
> *pdf containing:*
> SelfWealth Limited ABN: 52 154 324 428 AFSL 421789 W: www.selfwealth.com.au
> E: [email protected]
> This trade was executed and cleared by OpenMarkets Australia Ltd ABN 38 090
> 472 012,
> AFSL 246 705, Market Particpant of ASX, CHIX and NSX.
> Buy Confirmation
>
> *Gives (see hex on right side):*
> !https://user-images.githubusercontent.com/11288701/126945391-18c0ccb4-289d-49cd-85a8-8714e145df3f.png!
>