What is the problem? The text is correctly decoded. There is just a Unicode string in the stream.
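For reference, FlateDecode is zlib/deflate compression, so the decoding step can be sketched with `java.util.zip` alone. This is only a minimal sketch and not the poster's code: the class name `FlateSketch`, its `inflate` and `deflate` helpers, and the sample content stream are all made up for illustration, and it assumes the raw stream bytes have already been cut out of the PDF file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, for illustration only.
class FlateSketch {

    // Inflate the raw bytes of a FlateDecode (zlib) stream.
    static byte[] inflate(byte[] compressed) throws IOException {
        try (InflaterInputStream in =
                     new InflaterInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Compress bytes in the same zlib format; used here only to make test data.
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up content stream, not taken from a real PDF.
        byte[] content = "BT\r/F1 1 Tf\r(Hello) Tj\rET\r"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] decoded = inflate(deflate(content));
        // A content stream is a byte sequence; ISO-8859-1 maps each byte to
        // exactly one char, so it is a lossless way to print it for inspection.
        System.out.println(
                new String(decoded, StandardCharsets.ISO_8859_1).replace("\r", "\n"));
    }
}
```

If the inflated output looks like a sequence of operators such as `re`, `cm`, `Tf` and `TJ`, the decompression itself worked.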
You wrote:
I am trying to convert PDF to text using Java. I got as far as reading and decoding the stream objects. Some are still a bit strange even after decoding, e.g. (this example was FlateDecode-encoded and is shown decoded here):
"1 g\r/GS2 gs\r0 792 m\r0 792 l\rf\rq\r1 i \r-1 793 614 -794 re\r0 792
m\rW
n\r0 792.36 612 -792 re\rW n\r0 0 0 0 k\r0 738 612 -402.1 re\r0 792
m\rf*\r/EmbeddedDocument /MC1 BDC\rQ\rq\r1 i \r0 738 612 -402.1 re\rW*
n\r/GS1 gs\rq\r624.4401 0 0 412.72 0.31 331.6501 cm\r/Im1
Do\rQ\rEMC\rQ\rq\r1 i \r-1 793 614 -794 re\r0 792 m\rW n\r0 792.36 612 -792
re\rW n\r/Cs9 cs 1 scn\r/GS1 gs\r-0.16 334.41 611.25 -29.64 re\rf*\r0 0 0 0
K\r0 J 0 j 0.911 w 10 M []0 d\r-0.61 334.87 612.15 -30.56 re\rS\rBT\r/F1 1
Tf\r15.78 0 0 15.79 21.5978 313.7892 Tm\r/Cs10 cs 1 scn\r0.0713 Tc\r0
Tw\r[([1]
\b)-34.3(\t\\012 \f
\\015
)-34.3(
\f
�
)]TJ\rET\r0 0
0 0 k\r444.86 161.15 167.14 -161.15 re\r346.693 313.789
m\rf*\r/EmbeddedDocument /MC2 BDC\rQ\rq\r1 i \r444.86 161.15 167.14 -161.15
re\rW* n\r-1 793 614 -794 re\r0 792 m\rW n\r0 792.36 612 -792 re\rW n\r/GS1
gs\rq\r94.3378 0 0 86.1608 446.88 70.83 cm\r/Im2 Do\rQ\rEMC\rQ\r"
Could it be a character encoding problem?
Perhaps someone has had the problem before...