Instead of extracting just the text from the PDF, you should extract the
content in a format that maintains attribute information (even if
only a limited set), such as HTML or RTF. Then you could attempt to
reconstruct something akin to the PDF from that information.
Of course, such formats are still quite lossy when extracted from PDF,
since they can't represent the same color fidelity, vector graphics,
etc.
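As a rough illustration of what "maintaining attribute information" could
look like, here is a minimal sketch in plain Java. The TextRun class and
the toHtml method are hypothetical names made up for this example; they
are not part of iText or PDFBox, and the sketch assumes some extractor has
already given you each piece of text together with its font name and size.

```java
import java.util.Arrays;
import java.util.List;

class StyledRuns {
    // Hypothetical holder for one piece of text and the attributes we kept.
    static final class TextRun {
        final String text;
        final String fontName;
        final float fontSize; // in points

        TextRun(String text, String fontName, float fontSize) {
            this.text = text;
            this.fontName = fontName;
            this.fontSize = fontSize;
        }
    }

    // Escape the characters that are special in HTML text and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Emit one paragraph with a styled span per run.
    static String toHtml(List<TextRun> runs) {
        StringBuilder sb = new StringBuilder("<p>");
        for (TextRun r : runs) {
            sb.append("<span style=\"font-family:'")
              .append(escape(r.fontName))
              .append("'; font-size:")
              .append(r.fontSize)
              .append("pt\">")
              .append(escape(r.text))
              .append("</span>");
        }
        return sb.append("</p>").toString();
    }

    public static void main(String[] args) {
        List<TextRun> runs = Arrays.asList(
                new TextRun("Title ", "Helvetica-Bold", 18f),
                new TextRun("body text", "Times-Roman", 11f));
        System.out.println(toHtml(runs));
    }
}
```

Even this keeps only font name and size. Position, color, and graphics are
not captured, which is the lossiness described above.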
Perhaps if you explained your task more completely, we could point
you in the correct direction...
Leonard
On Apr 16, 2008, at 2:49 AM, Nhia Nhia wrote:
OK, I know it sounds a bit strange, but I have to do it, and do it
automatically.
My question was whether I can extract any information about font and
size from the PDF file. I suppose you are going to say that it's
impossible, but I want to be sure. I thought maybe I could use a
tokenizer to look at the content stream and read /F1 12 Tf (for
example), but I couldn't: if I couldn't extract the text using iText,
I would have the same problem here (I am talking about the example I
used from chapter 18 of the book, "climb the tree").
Do you agree with me?
Do you believe it is possible with another library? And could I
find out whether there is a picture or a table on a page of the PDF?
Sorry, I'm a bit lost. I only want to know if it's possible.
Laura
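For reference, the "/F1 12 Tf" idea from the message above can be sketched
in plain Java. This is only a sketch under stated assumptions: it takes a
content stream that has already been decompressed into a String (how you
obtain that depends on your library), and TfScanner and FontSelection are
hypothetical names, not iText or PDFBox classes. The regular expression is
deliberately naive and would also match text that merely looks like a Tf
operator inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TfScanner {
    // One font selection: the resource name (e.g. "F1") and the size operand.
    static final class FontSelection {
        final String resourceName;
        final double size;

        FontSelection(String resourceName, double size) {
            this.resourceName = resourceName;
            this.size = size;
        }

        @Override
        public String toString() {
            return "/" + resourceName + " " + size;
        }
    }

    // Matches "/Name number Tf", the PDF operator that selects font and size.
    private static final Pattern TF = Pattern.compile(
            "/([^\\s/\\[\\]()<>{}%]+)\\s+([+-]?(?:\\d+\\.?\\d*|\\.\\d+))\\s+Tf\\b");

    // Returns every font selection found, in order of appearance.
    static List<FontSelection> scan(String contentStream) {
        List<FontSelection> found = new ArrayList<FontSelection>();
        Matcher m = TF.matcher(contentStream);
        while (m.find()) {
            found.add(new FontSelection(m.group(1), Double.parseDouble(m.group(2))));
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed sample; a real stream comes from the page's /Contents entry.
        String stream = "BT /F1 12 Tf (Hello) Tj ET BT /F2 9.5 Tf (World) Tj ET";
        for (FontSelection fs : scan(stream)) {
            System.out.println(fs);
        }
    }
}
```

Note that /F1 is only a resource name. The actual font has to be looked up
in the /Font entry of the page's resource dictionary, and the size you see
on the page also depends on the text matrix and the current transformation
matrix, so the Tf operand alone is not the whole story.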
2008/4/16, Bruno Lowagie <[EMAIL PROTECTED]>:
Nhia Nhia wrote:
> Hello,
>
>
> My problem is that I have extracted the content from a PDF (using
> PDFBox, as Bruno recommends in the book) to a .txt,
Well, it appears that you have succeeded in doing so.
> but now I want to make the
> PDF again from this text.
Er... OK...
> The question is that I don't know whether it's
> possible to write a class that reads my .txt line by line and,
Certainly: see the Caesar examples in chapter 7.
> at the same time, knows what the format of that line was in the
> original PDF, so that I can reproduce it.
No way!!! You have a text file. You can look at every byte
of that file. Do you see any formatting information???
I don't think so. In the process of making a 'plain text'
file from your PDF, plenty of information was lost. The
chance that you'll ever be able to reproduce the PDF in
its original state using only a plain text file is one in
a billion.
br,
Bruno
-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
Do you like iText?
Buy the iText book: http://www.1t3xt.com/docs/book.php
Or leave a tip: https://tipit.to/itexttipjar