Re: [iText-questions] Save PDF as plain text

2011-11-17 Thread WMJ
Hello,


Thanks for pointing this out.


I have previously come across a PDF in which each compressed page stream was 
about 1 megabyte. It did not take too long to parse them and convert them to 
the primitive command model.


If an ET does not come after a BT, the DOM model might assume that the 
remaining commands belong to the text object. If enclosing commands interleave 
with each other, it is an error. The DOM model should not tolerate this, and 
the current event model can't cope with it either. We can't expect too much 
from the first design. Handling correct documents is at least the initial goal.
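
As a rough illustration of the behaviour described above, here is a minimal 
sketch in plain Java. It is not iText code, and the names CommandTreeSketch, 
Node, closerFor and build are made up for this example. It assumes the page 
content has already been split into operator names, treats BT/ET and q/Q as 
the only scopes, lets a scope that is never closed take the rest of the 
stream, and rejects interleaved scopes with an exception.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not part of iText: nests a flat operator list into a tree.
final class CommandTreeSketch {

    /** One tree node: an operator name plus any commands nested inside it. */
    static final class Node {
        final String operator;
        final List<Node> children = new ArrayList<>();

        Node(String operator) {
            this.operator = operator;
        }

        @Override
        public String toString() {
            return children.isEmpty() ? operator : operator + children;
        }
    }

    /** Returns the operator that closes the given scope opener, or null if it opens no scope. */
    static String closerFor(String operator) {
        switch (operator) {
            case "BT": return "ET";
            case "q":  return "Q";
            default:   return null;
        }
    }

    /** Builds the tree; throws if a closing operator does not match the innermost open scope. */
    static Node build(List<String> operators) {
        Node root = new Node("page");
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        for (String op : operators) {
            if (closerFor(op) != null) {
                // BT or q: start a nested scope under the current one.
                Node scope = new Node(op);
                open.peek().children.add(scope);
                open.push(scope);
            } else if (op.equals("ET") || op.equals("Q")) {
                // ET or Q: must close the innermost scope, otherwise scopes interleave.
                Node current = open.peek();
                if (!op.equals(closerFor(current.operator))) {
                    throw new IllegalStateException(
                            "Unexpected " + op + " inside " + current.operator);
                }
                open.pop();
            } else {
                // Any other operator is a leaf in the current scope.
                open.peek().children.add(new Node(op));
            }
        }
        // Scopes still open here are left as they are: the rest of the
        // stream has already been attached to them ("the end never comes").
        return root;
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("q", "BT", "Tf", "Tj", "ET", "Q")));
        System.out.println(build(Arrays.asList("BT", "Tf", "Tj"))); // missing ET
    }
}
```

With this sketch, BT Tf Tj without a closing ET still yields a BT node holding 
Tf and Tj, while q BT Q ET is rejected. A more tolerant design could recover 
from the second case instead of throwing; that choice is left open here.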


Yes, where the styles go is a problem. It would be good to discuss this and 
find a way to handle them.


I think it is useful to have a PDF command model, just like the XML DOM. The 
XML DOM is not very efficient when it encounters huge XML documents, but it is 
still a very popular tool for ordinary-sized documents. The real problem is 
not whether we should have one, but how to design the command model to make it 
useful and easy to use. Afterwards, we can provide both models and let 
developers choose their favorite.
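
To make the "easy to use" point concrete, a traversal over such a command tree 
might look like the sketch below. It is plain Java with hypothetical names 
(TextWalkSketch, Node, collectText); the PdfShowTextCommand class mentioned 
elsewhere in this thread is not used because its API has not been posted. The 
sketch assumes only three operators matter: Tf selects a font, Tj shows a 
string, and q opens a scope whose font changes are undone when it ends, since 
the font belongs to the graphics state that q saves and Q restores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of iText: collects shown text with the font in effect.
final class TextWalkSketch {

    /** Tree node: operator name, one simplified operand, and nested commands. */
    static final class Node {
        final String operator;
        final String operand;
        final List<Node> children = new ArrayList<>();

        Node(String operator, String operand) {
            this.operator = operator;
            this.operand = operand;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Walks the children of parent depth-first, appending "font: text" for
     * every Tj, and returns the font in effect after the last child.
     */
    static String collectText(Node parent, String font, List<String> out) {
        for (Node child : parent.children) {
            switch (child.operator) {
                case "Tf":
                    font = child.operand;              // font change in this scope
                    break;
                case "Tj":
                    out.add(font + ": " + child.operand);
                    break;
                case "q":
                    collectText(child, font, out);     // result dropped: Q restores the font
                    break;
                default:
                    font = collectText(child, font, out); // e.g. BT: font persists after ET
                    break;
            }
        }
        return font;
    }

    public static void main(String[] args) {
        Node page = new Node("page", null)
                .add(new Node("BT", null)
                        .add(new Node("Tf", "F1"))
                        .add(new Node("Tj", "Hello")))
                .add(new Node("q", null)
                        .add(new Node("BT", null)
                                .add(new Node("Tf", "F2"))
                                .add(new Node("Tj", "World"))))
                .add(new Node("BT", null)
                        .add(new Node("Tj", "Again")));
        List<String> out = new ArrayList<>();
        collectText(page, null, out);
        System.out.println(out); // [F1: Hello, F2: World, F1: Again]
    }
}
```

Here the style is computed during the walk instead of being stored on each 
node. Storing it on the nodes is the other obvious option, and which one is 
better is exactly the open question about where the styles go.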


WMJ.




>
>From: Leonard Rosenthol 
>Subject: Re: [iText-questions] Save PDF as plain text
>
>
>Find some vector-heavy documents such as those in prepress/publishing or CAD 
>drawings.  Those will give you the heaviest content streams for your DOM.  
>I've enclosed TWO PAGES from a REAL WORLD document to demonstrate my point.  
>This will give you something fairly normal to implement against. When you 
>think you're "done", I'll share my favorite REAL WORLD sample that blows out 
>every DOM implementation UNTIL they prepare for it :).
>
>
>But my point is NOT to dissuade you – you are correct.  A DOM model for PDF 
>page content is a good thing and very useful.  However, it's NOT trivial to 
>implement.  You should be prepared to throw away your first implementation and 
>rewrite it after beginning to run it against stuff in the real world.  Adobe 
>Acrobat/Reader allow for LOTS of crap, because there is a LOT of crap out 
>there.  If your implementation assumes perfection, it's going to fail when 
>faced with reality.  A perfect example is your comment below about nesting – 
>what happens when "the end never comes"??   The other big thing you need to 
>work out in your DOM model is where attributes/styling goes – separate 
>objects?  Attributes on the DOM nodes?  Other?   And then how you relate them 
>from stream->DOM.
>
>
>Oh – and then once you get it working on a single page, you'll need to think 
>about how to handle recursion!  (aka how do you walk from the main page into a 
>Form Xobject?)
>
>
>Have fun!!
>
>
>Leonard
>
>From:  WMJ 
>Reply-To:  WMJ , Post here 
>
>Date:  Thu, 17 Nov 2011 01:36:52 -0800
>To:  Post here 
>Subject:  Re: [iText-questions] Save PDF as plain text
>
>
>
>Hello,
>
>
>
>Firstly I agree that it is easy to convert the current event model to a DOM 
>model. And I've already implemented a very basic model with one or two 
>days' work.
>
>
>
>So far I've processed quite a few PDF files and I think huge page command 
>trees are rare. Few PDF documents contain more than 100KB of page content per 
>page, so a DOM model is quite affordable. Not to mention the fact that there 
>are already quite a lot of PDF editors and processors out there. They do have 
>their internal structures for those PDF objects to support content editing.
>
>
>
>With the DOM model mentioned above, developers who want to extract and 
>analyze text can traverse the DOM tree and grab all PdfShowTextCommand 
>objects. By inspecting a PdfShowTextCommand object, they immediately know the 
>font, size, position, and color of that piece of text. A PDF rendering 
>processor named MuPDF appears to have a similar API for extracting text.
>
>
>
>
>
>
>
>And...
>
>
>
>
>
>Although we all know that PDF commands are linear, according to the PDF 
>specification there are de facto "multi-level" structures. For example, text 
>commands must be placed within a pair of BT and ET commands, and a pair of q 
>and Q commands encompasses the graphics commands within a scope. In the DOM 
>model, we don't need to worry about "whether I've added an ET command after 
>the BT or not". A PdfTextAreaCommand denotes the BT and implies an ET after 
>all its sub-commands. Sub-commands of the PdfTextAreaCommand can be 
>PdfTextMatrixCommand, PdfShowTextCommand, PdfFontCommand, etc.
>
>
>
>
>
>We might need to listen to other people's opinions and requirements on PDF 
>content processing.
>
>Just because a job is easy to do doesn't mean that it is pointless. If 
>integrating it into iText can save other programmers days of work, doing this 
>kind of low-tech job may be meaningful indeed.
>
>
>I am currently experimenting with the PDF page command DOM model (I need 
>support for font encoding, font subsetting, and many more aspects that iText 
>lacks). A good thing about the DOM model is that we don't have to create 
>many s

Re: [iText-questions] What is required to make a file PDF/A?

2011-11-17 Thread Leonard Rosenthol
2.x.  Maybe it's setPDFAConformance.

However, as you know, 2.x is no longer supported.

Leonard

From: David Thielen <da...@windward.net>
Reply-To: Post here <itext-questions@lists.sourceforge.net>
Date: Thu, 17 Nov 2011 09:33:17 -0800
To: Post here <itext-questions@lists.sourceforge.net>
Subject: Re: [iText-questions] What is required to make a file PDF/A?

Hi;

Is setConformance() in iText 5 only? We’re on iText 2 and I can’t find it 
anywhere. We do use iText to create the PDF so we should be ok on that part.

Thanks – dave


From: Leonard Rosenthol [mailto:lrose...@adobe.com]
Sent: Thursday, November 17, 2011 9:20 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?

If you are creating the PDF ENTIRELY with iText – then you can just use the 
setConformance() API and it will take care of the details for you.  If you are 
starting with an existing PDF – then there aren't any options for iText at this 
time.

But that will ONLY get you PDF/A-1b.

PDF/A-1a requires that you properly structure & tag your content – something 
that iText will not (currently) do for you.  So you will need to do all that 
work yourself if you want full conformance.

And then there's PDF/A-2, which iText doesn't currently support either – but 
most folks aren't there just yet…

Leonard

From: David Thielen <da...@windward.net>
Reply-To: Post here <itext-questions@lists.sourceforge.net>
Date: Thu, 17 Nov 2011 08:12:38 -0800
To: Post here <itext-questions@lists.sourceforge.net>
Subject: Re: [iText-questions] What is required to make a file PDF/A?

I didn’t know about pre-flight, that’s cool.

Ok, ran my iText generated document through it and got the following:

· Convert to PDF/A-1a (sRGB)

· Convert to PDF/A-1b (sRGB)

I then double clicked on “Verify compliance with PDF/A-1a” and got a lot:

· Author mismatch between Document Info and XMP Metadata

· CIDset in subset font missing (238 matches on 4 pages)

· Creation date mismatch between Document Info and XMP Metadata

· Device process color used but no PDF/A OutputIntent (253 matches on 4 
pages)

· Last Modification Date mismatch between Document Info and XMP Metadata

· MarkInfo missing

· Metadata missing (XMP)

· PDF/A entry missing

· Producer mismatch between Document Info and XMP Metadata

· Structured PDF: Structure tree root entry missing

The biggies seem to be the CIDset for the fonts and colors stored correctly. 
I’m guessing this is not a simple couple of hours to add in.

Is there a setting in iText to set these values or does this require a 3rd 
party app?

Thanks – dave



From: Leonard Rosenthol [mailto:lrose...@adobe.com]
Sent: Thursday, November 17, 2011 8:44 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] What is required to make a file PDF/A?

> no external referencing
>
Careful with that phrase as it’s led to misunderstanding by non-technical 
people.

What you really mean to say is “no externally referenced resources/assets”.

Leonard

From: TvT [mailto:tvtre...@nepatec.de]
Sent: Thursday, November 17, 2011 7:12 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] What is required to make a file PDF/A?

1. That depends which PDF/A you mean:
PDF/A-1a or PDF/A-1b or PDF/A-2?

2. Even if you take the simplest, PDF/A-1b, there is a lot to consider. It is 
best to read the PDF/A spec (ISO 19005-1:2005 or ISO 19005-2:2011): no 
JavaScript, no external referencing, colors, etc.

3. What Acrobat is probably looking at is the PDF/A tag in the metadata. If you 
set that one, Acrobat will probably say it's PDF/A. A better check is the PDF/A 
preflight check that Acrobat Professional offers. It shows you which parts of 
the spec you are missing. If all tests pass, then you probably have a 95% 
compliant PDF/A document.

Regards,
ToM
2011/11/17 David Thielen <da...@windward.net>
I thought it was just embedding fonts but when we do that Acrobat says it is 
not PDF/A.

thanks - dave

--
All the data continuously generated in your IT infrastructure
contains a definitive record of customers, application performance,
security threats, fraudulent activity, and more. Splunk takes this
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/

Re: [iText-questions] What is required to make a file PDF/A?

2011-11-17 Thread David Thielen
Hi;

Is setConformance() in iText 5 only? We're on iText 2 and I can't find it 
anywhere. We do use iText to create the PDF so we should be ok on that part.

Thanks - dave



Re: [iText-questions] What is required to make a file PDF/A?

2011-11-17 Thread Leonard Rosenthol
If you are creating the PDF ENTIRELY with iText – then you can just use the 
setConformance() API and it will take care of the details for you.  If you are 
starting with an existing PDF – then there aren't any options for iText at this 
time.

But that will ONLY get you PDF/A-1b.

PDF/A-1a requires that you properly structure & tag your content – something 
that iText will not (currently) do for you.  So you will need to do all that 
work yourself if you want full conformance.

And then there's PDF/A-2, which iText doesn't currently support either – but 
most folks aren't there just yet…

Leonard


Re: [iText-questions] What is required to make a file PDF/A?

2011-11-17 Thread David Thielen
I didn't know about pre-flight, that's cool.

Ok, ran my iText generated document through it and got the following:

* Convert to PDF/A-1a (sRGB)

* Convert to PDF/A-1b (sRGB)

I then double clicked on "Verify compliance with PDF/A-1a" and got a lot:

* Author mismatch between Document Info and XMP Metadata

* CIDset in subset font missing (238 matches on 4 pages)

* Creation date mismatch between Document Info and XMP Metadata

* Device process color used but no PDF/A OutputIntent (253 matches on 4 
pages)

* Last Modification Date mismatch between Document Info and XMP Metadata

* MarkInfo missing

* Metadata missing (XMP)

* PDF/A entry missing

* Producer mismatch between Document Info and XMP Metadata

* Structured PDF: Structure tree root entry missing

The biggies seem to be the CIDset for the fonts and colors stored correctly. 
I'm guessing this is not a simple couple of hours to add in.

Is there a setting in iText to set these values or does this require a 3rd 
party app?

Thanks - dave
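
Several of the preflight findings above concern XMP: the packet is missing, it 
lacks the PDF/A identification entry, or its values disagree with the Document 
Info dictionary. As a sketch only, the following plain Java builds a minimal 
XMP packet whose author, producer and dates are meant to come from the same 
values that go into Document Info, plus the pdfaid:part and pdfaid:conformance 
properties. The class and method names (PdfAXmpSketch, buildXmp, escape) are 
invented for this example, it does not use iText, and embedding the packet as 
the document's Metadata stream is left out. It does nothing about fonts, the 
OutputIntent or tagging.

```java
// Hypothetical sketch, not part of iText: builds a minimal XMP packet for PDF/A.
final class PdfAXmpSketch {

    /** Escapes the characters that are not allowed as-is in XML text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Dates must already be in XMP (ISO 8601) form, e.g. 2011-11-17T08:12:38-08:00,
     * and must denote the same instants as CreationDate and ModDate in Document Info.
     */
    static String buildXmp(String author, String producer, String createDate,
                           String modifyDate, int part, String conformance) {
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
                + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
                + " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
                + "  <rdf:Description rdf:about=\"\"\n"
                + "    xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n"
                + "    xmlns:xmp=\"http://ns.adobe.com/xap/1.0/\"\n"
                + "    xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\"\n"
                + "    xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n"
                // Author in Document Info corresponds to dc:creator in XMP.
                + "   <dc:creator><rdf:Seq><rdf:li>" + escape(author)
                + "</rdf:li></rdf:Seq></dc:creator>\n"
                + "   <xmp:CreateDate>" + escape(createDate) + "</xmp:CreateDate>\n"
                + "   <xmp:ModifyDate>" + escape(modifyDate) + "</xmp:ModifyDate>\n"
                + "   <pdf:Producer>" + escape(producer) + "</pdf:Producer>\n"
                // The "PDF/A entry": part 1 with conformance B means PDF/A-1b.
                + "   <pdfaid:part>" + part + "</pdfaid:part>\n"
                + "   <pdfaid:conformance>" + escape(conformance) + "</pdfaid:conformance>\n"
                + "  </rdf:Description>\n"
                + " </rdf:RDF>\n"
                + "</x:xmpmeta>\n"
                + "<?xpacket end=\"w\"?>";
    }

    public static void main(String[] args) {
        System.out.println(buildXmp("dave", "iText", "2011-11-17T08:12:38-08:00",
                "2011-11-17T08:12:38-08:00", 1, "B"));
    }
}
```

Feeding the same source values into both Document Info and this packet is one 
way to avoid the "mismatch" findings. Setting pdfaid:part and 
pdfaid:conformance only declares PDF/A; the file still has to meet the rest of 
the requirements for the preflight check to pass.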




Re: [iText-questions] What is required to make a file PDF/A?

2011-11-17 Thread TvT
>What you really mean to say is “no externally referenced resources/assets”.
Yes, exactly :-)


Re: [iText-questions] What is required to make a file PDF/A?

2011-11-17 Thread Leonard Rosenthol
> no external referencing
>
Careful with that phrase as it's led to misunderstanding by non-technical 
people.

What you really mean to say is "no externally referenced resources/assets".

Leonard


Re: [iText-questions] What is required to make a file PDF/A?

2011-11-17 Thread Leonard Rosenthol
If only it were that simple...

Embedded Fonts that meet the requirements (CIDSet, CharSet, Width matching, 
etc.)
Calibrated Colors including an OutputIntent
Limited Actions
Limited Annots
And the list goes on

From: David Thielen [mailto:da...@windward.net]
Sent: Thursday, November 17, 2011 6:20 AM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] What is required to make a file PDF/A?

I thought it was just embedding fonts but when we do that Acrobat says it is 
not PDF/A.

thanks - dave

Re: [iText-questions] What is required to make a file PDF/A?

2011-11-17 Thread TvT
1. That depends which PDF/A you mean:
PDF/A-1a or PDF/A-1b or PDF/A-2?

2. Even if you take the simplest, PDF/A-1b, there is a lot to consider. It is
best to read the PDF/A spec (ISO 19005-1:2005 or ISO 19005-2:2011): no
JavaScript, no external referencing, colors, etc.

3. What Acrobat is probably looking at is the PDF/A tag in the metadata. If
you set that one, Acrobat will probably say it's PDF/A. A better check is the
PDF/A preflight check that Acrobat Professional offers. It shows you which
parts of the spec you are missing. If all tests pass, then you probably have a
95% compliant PDF/A document.

Regards,
ToM


2011/11/17 David Thielen 

>   I thought it was just embedding fonts but when we do that Acrobat says
> it is not PDF/A.
>
> thanks - dave

[iText-questions] What is required to make a file PDF/A?

2011-11-17 Thread David Thielen
I thought it was just embedding fonts but when we do that Acrobat says it is 
not PDF/A.

thanks - dave

Re: [iText-questions] Saving the PDF created on SUSE Linux Server (updated)

2011-11-17 Thread Siddhartha Rathi
Oops Michael,

That's a silly copy-paste mistake: the lines were copied twice by mistake. They
appear only once in my code. In fact, the duplicate would not even let me
compile, let alone run it.

Thanks,
Siddhartha

--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/Saving-the-PDF-created-on-SUSE-Linux-Server-updated-tp4075377p4080043.html
Sent from the iText - General mailing list archive at Nabble.com.



[iText-questions] appearance problem with Acrofields and Chinese

2011-11-17 Thread mercoSygest
Hi.
I'm using iTextSharp 5.1.2 to fill some AcroFields with Chinese text.
These fields are set to the "Arial Unicode MS" font.

I set up the substitution font like this:
 BaseFont.AddToResourceSearch(Parametri.AppPath & "iTextAsian.dll")
 BaseFont.AddToResourceSearch(Parametri.AppPath & "iTextAsianCmaps.dll")

FontArial = BaseFont.CreateFont("MSung-Light", "UniCNS-UCS2-H",
BaseFont.NOT_EMBEDDED)
...
_Dest.AcroFields.AddSubstitutionFont(FontArial)


All the fields have the correct values, but some of them still appear blank,
as you can see here: http://www.sygest.it/upgrade/itext/campi.pdf

The field named "DB_D6_XXX" shows its value, but the field named
"#_ALBERO" does not.

The value becomes visible only after I change the field object in Acrobat.

Using Preflight I can see that the AP=>N=>Resources entries are different:
the working field has the "Font" resource, which is missing in the other field.

Is this an iText issue?

Thank you


--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/appearance-problem-with-Acrofields-and-Chinese-tp4079317p4079317.html
Sent from the iText - General mailing list archive at Nabble.com.



Re: [iText-questions] Save PDF as plain text

2011-11-17 Thread WMJ
Hello,


First, I agree that it is easy to convert the current event model to a DOM 
model. I have already implemented a very basic model in one or two days' work.


So far I have processed quite a few PDF files, and I think huge page command 
trees are rare. Few PDF documents contain more than 100 KB of page content per 
page, so a DOM model is quite affordable. Not to mention that there are already 
quite a lot of PDF editors and processors out there, and they must have their 
own internal structures for PDF objects to support content editing.


With the DOM model mentioned above, developers who want to extract and 
analyze text can traverse the DOM tree and grab all PdfShowTextCommand objects. 
By inspecting a PdfShowTextCommand object, they immediately know the font, size, 
position, and color of that piece of text. A PDF renderer named MuPDF 
appears to have a similar API for extracting text.
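
A minimal sketch of such a traversal, assuming a very simple tree design. All
classes here (PdfCommand, PdfTextAreaCommand, PdfShowTextCommand) are
hypothetical stand-ins for the proposed model and do not exist in iText. The
fields on PdfShowTextCommand are only one guess at how the resolved state
could be exposed.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandTreeDemo {
    /** Hypothetical DOM node: one page command plus its sub-commands. */
    public static class PdfCommand {
        public final List<PdfCommand> children = new ArrayList<>();
        public PdfCommand add(PdfCommand child) { children.add(child); return this; }
    }

    /** Hypothetical container for everything between BT and ET. */
    public static class PdfTextAreaCommand extends PdfCommand { }

    /** Hypothetical show-text node carrying the state that applies to it. */
    public static class PdfShowTextCommand extends PdfCommand {
        public final String text, font;
        public final float size, x, y;
        public PdfShowTextCommand(String text, String font, float size, float x, float y) {
            this.text = text; this.font = font; this.size = size; this.x = x; this.y = y;
        }
    }

    /** Depth-first walk that gathers every PdfShowTextCommand in document order. */
    public static List<PdfShowTextCommand> collectText(PdfCommand root) {
        List<PdfShowTextCommand> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(PdfCommand node, List<PdfShowTextCommand> out) {
        if (node instanceof PdfShowTextCommand) out.add((PdfShowTextCommand) node);
        for (PdfCommand child : node.children) collect(child, out);
    }

    public static void main(String[] args) {
        // Hand-built tree; a real one would come from parsing a content stream.
        PdfCommand page = new PdfCommand()
            .add(new PdfTextAreaCommand()
                .add(new PdfShowTextCommand("Hello", "Helvetica", 12f, 72f, 720f))
                .add(new PdfShowTextCommand("world", "Helvetica", 12f, 110f, 720f)));
        for (PdfShowTextCommand t : collectText(page)) {
            System.out.println(t.text + " " + t.font + " " + t.size + " @" + t.x + "," + t.y);
        }
    }
}
```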




And...



We all know that PDF commands are linear. However, according to 
the PDF specification, there are de facto "multi-level" structures. For 
example, text commands must be placed between a BT and an ET command, and a 
pair of q and Q commands scopes the graphics commands between them. In the DOM 
model, we don't need to worry about "whether I've added an ET command after the 
BT or not". A PdfTextAreaCommand denotes the BT and implies an ET after all its 
sub-commands. Sub-commands of the PdfTextAreaCommand can be 
PdfTextMatrixCommand, PdfShowTextCommand, PdfFontCommand, etc.
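
The nesting described above can be sketched as follows: a flat operator list
is folded into a tree in which BT..ET and q..Q become container nodes, and
interleaved or unclosed pairs are rejected. The Node class and the build
method are assumptions made for this illustration, operands are left out, and
the sketch does not enforce further rules such as q/Q not being allowed inside
a text object.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class NestingDemo {
    /** Hypothetical tree node: an operator name plus the commands nested in it. */
    public static class Node {
        public final String op;
        public final List<Node> children = new ArrayList<>();
        Node(String op) { this.op = op; }
    }

    /** Folds a flat operator list into a tree; BT..ET and q..Q become containers. */
    public static Node build(List<String> operators) {
        Node root = new Node("page");
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        for (String op : operators) {
            if (op.equals("BT") || op.equals("q")) {
                Node container = new Node(op);
                open.peek().children.add(container);
                open.push(container);
            } else if (op.equals("ET") || op.equals("Q")) {
                String expected = op.equals("ET") ? "BT" : "q";
                // The closing operator must match the innermost open container.
                if (!open.peek().op.equals(expected)) {
                    throw new IllegalArgumentException("unbalanced " + op);
                }
                open.pop();
            } else {
                open.peek().children.add(new Node(op));
            }
        }
        if (open.size() != 1) {
            throw new IllegalArgumentException("unclosed " + open.peek().op);
        }
        return root;
    }

    public static void main(String[] args) {
        Node page = build(Arrays.asList("q", "BT", "Tf", "Tm", "Tj", "ET", "re", "f", "Q"));
        Node q = page.children.get(0);
        System.out.println(q.op + " has " + q.children.size() + " children");
    }
}
```

Because the closing operator is implied by the container node, code that
writes the tree back out cannot forget an ET or a Q.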




We might need to listen to other people's opinions and requirements on PDF 
content processing.

That a job is easy to do does not mean it is pointless. If integrating it 
into iText can save other programmers days of work, doing this kind of low-tech 
job may be meaningful indeed.

I am currently experimenting with the PDF page command DOM model (for which I 
need support for font encoding, font subsetting, and many more aspects that 
iText lacks). A good thing about the DOM model is that we don't have to create 
many small classes to consume the PDF command events. A single class can do a 
variety of jobs against the same content. I am trying to write an 
application that filters out unwanted parts, or batch-modifies parts, of PDF 
pages. The event model is not sufficient or effective for this. I may try 
to find out more and improve the design.
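
As a rough illustration of the filtering use case, here is a sketch that
prunes unwanted subtrees from a command tree and writes the remainder back
out as a flat operator sequence. The Node class and the prune and flatten
methods are hypothetical names for this example only, and operands are
omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class FilterDemo {
    /** Hypothetical tree node: an operator name plus nested commands. */
    public static class Node {
        public final String op;
        public final List<Node> children = new ArrayList<>();
        public Node(String op) { this.op = op; }
        public Node add(Node child) { children.add(child); return this; }
    }

    /** Removes every node that matches `unwanted`, together with its subtree. */
    public static void prune(Node node, Predicate<Node> unwanted) {
        node.children.removeIf(unwanted);
        for (Node child : node.children) prune(child, unwanted);
    }

    /** Writes the tree back out as a flat sequence, re-adding ET and Q. */
    public static void flatten(Node node, List<String> out) {
        boolean container = node.op.equals("BT") || node.op.equals("q");
        if (!node.op.equals("page")) out.add(node.op);
        for (Node child : node.children) flatten(child, out);
        if (container) out.add(node.op.equals("BT") ? "ET" : "Q");
    }

    public static void main(String[] args) {
        Node page = new Node("page")
            .add(new Node("q").add(new Node("re")).add(new Node("f")))
            .add(new Node("BT").add(new Node("Tf")).add(new Node("Tj")));
        // Drop all text objects and keep the vector graphics.
        prune(page, n -> n.op.equals("BT"));
        List<String> ops = new ArrayList<>();
        flatten(page, ops);
        System.out.println(ops); // prints [q, re, f, Q]
    }
}
```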

WMJ.




>
>From: Kevin Day 
>
>h...  Well that was certainly part of the original design consideration. 
>But when you are processing a stack based operator stream, and you have the
>potential for huge streams, an event based handler makes the most sense from
>an implementation perspective.  As others are sure to point out, creating a
>DOM from the event model is actually not that hard to do.  Heck,
>LocationTextExtractionStrategy effectively does this as it accumulates text
>operations (it's a pretty flat DOM, but that could be extended).
>
>At the end of the day, what we have heard from users is that they want to
>get text extracted from the page.  Not access to every single draw
>operation...  But there certainly could be use cases that aren't being
>considered.
>
>I think that it's also important to recognize that the PDF format doesn't
>lend itself to rich, multi-level data structures.  For example, you outline
>the concept of sub-nodes in your sample code.  What exactly would those
>sub-nodes contain?  If you are expecting to see a DOM that consists of
>pages, paragraphs, sentences and words, I think you may be asking for
>something that PDF doesn't support.
>
>
>So, how do you envision using the information that is in the DOM structure
>that you describe?  And how much state do you want to capture in every node?
>
>I could absolutely see an enhancement to LocationTextExtractionStrategy that
>would return a DOM of some sort (or at least a "rich" string, which would
>effectively be a DOM, instead of just a string). This has been the intent of
>the *Strategy objects from day one.
>
>
>
>
>
>--
>View this message in context: 
>http://itext-general.2136553.n4.nabble.com/Save-PDF-as-plain-text-tp4041246p4073263.html
>Sent from the iText - General mailing list archive at Nabble.com.
>

Re: [iText-questions] Saving the PDF created on SUSE Linux Server (updated)

2011-11-17 Thread mkl
Siddhartha,

Siddhartha Rathi wrote:
> As for writing the code twice I found it using the net and that's why I
> have wrote it twice.

Just to be sure we are talking about the same thing. You initially posted
this as your code:


> String path = "/local/notesdata/appln/Reports/" + strDate +
> ".pdf";
> File PdfAtt = new File(path);
>  
> Document pdfDoc = new Document(PageSize.A4.rotate());
> PdfWriter writer = PdfWriter.getInstance(pdfDoc, new
> FileOutputStream(path));
>  
> Document pdfDoc = new Document(PageSize.A4.rotate());
> PdfWriter writer = PdfWriter.getInstance(pdfDoc, new
> FileOutputStream(path));
> 
> pdfDoc.open();
>  
> //-- my other code to fill up the PDF
>  
> pdfDoc.close();

Is this really the current version of your code? I ask because it will not
compile properly as the local variables pdfDoc and writer are defined twice
in the same scope. Depending on your setup you might, therefore, in the
compiled classes still have a former version of that class in which you had
not yet added the iText code. This would easily explain why you don't get
any output there.

Regards,   Michael

--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/Saving-the-PDF-created-on-SUSE-Linux-Server-updated-tp4075377p4079324.html
Sent from the iText - General mailing list archive at Nabble.com.
