Re: [Podofo-users] AdjustByteRange throws useful information away

2022-03-16 Thread A. Massad
Hi all,

Please find enclosed a patch against SVN rev 2033 which fixes a bug with PdfDocument::FillXObjectFromPage().

When pContents->IsArray() is true, the streams from the array must be joined with a delimiter. The patch adds this delimiter.

Without the patch, the last operator of the first stream (e.g. "Q") is incorrectly "glued" to the first operand of the following stream (e.g. "1.000"), creating a syntax error (e.g. "Q1.000").

Best regards,
Amin

___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users


[Podofo-users] Patch for PdfDocument::FillXObjectFromPage()

2021-03-26 Thread A . Massad
Hi all,

Please find enclosed a patch against SVN rev 2033 which fixes a bug with 
PdfDocument::FillXObjectFromPage().

When pContents->IsArray() is true, the streams from the array must be joined 
with a delimiter. The patch adds this delimiter.
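
The joining step can be sketched on its own. This is only an illustration and not the code from the patch: joinContentStreams is a hypothetical helper, the streams are assumed to be already decoded into strings, and a newline is assumed as the delimiter (any PDF whitespace character would do).

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of PoDoFo): concatenates decoded content
// streams, inserting one whitespace delimiter between neighbours so that
// the last token of one stream cannot merge with the first token of the next.
std::string joinContentStreams(const std::vector<std::string>& streams)
{
    std::string joined;
    bool first = true;
    for (const std::string& stream : streams)
    {
        if (!first)
        {
            joined += '\n'; // assumed delimiter; a space would work as well
        }
        joined += stream;
        first = false;
    }
    return joined;
}
```

Joining "q 1 0 0 1 0 0 cm Q" and "1.000 0 0 rg" this way keeps "Q" and "1.000" as separate tokens.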

Without the patch, the last operator of the first stream (e.g. "Q") is 
incorrectly "glued" to the first operand of the following stream (e.g. 
"1.000"), creating a syntax error (e.g. "Q1.000").

Best regards,
Amin



patch_FillXObjectFromPage.diff
Description: Binary data







Re: [Podofo-users] Bug in PdfPagesTree::GetPageNode() / PdfPagesTree::GetPageNodeFromArray()

2018-09-06 Thread A. Massad
Anyone?

This patch includes important fixes, e.g. for CVE-2017-8054 (not really fixed 
up to r1937!).

Greetings,
Amin

> On 28.08.2018 at 08:55, A. Massad wrote:
> 
> Signed PGP part
> Hi all,
> 
> Please find enclosed a patch against SVN rev 1937 which fixes three important 
> issues with PdfPagesTree::GetPageNode().
> 
> To demonstrate the issues the unit test PagesTreeTest has been extended by 
> three new tests which all fail for r1937 and are fixed by this patch.
> 
> The patch includes:
> 1) A real fix of CVE-2017-8054 (not really fixed up to r1937!) for handling of 
> cyclic trees, see testCyclicTree()
> 2) A fix for handling of subtrees with "/Kids []" and "/Count 0", which are 
> completely valid according to the PDF spec, see testEmptyKidsTree()
> 3) A changed behavior for trees with nested kids arrays, which are not valid 
> according to the PDF spec and now yield a NULL pointer, see testNestedArrayTree()
> 
> Please note that this patch supersedes my former patch named 
> "patch_getpagenode_cyclic_trees.diff" against r1935, which only covered issue 
> 1.
> 
> I am looking forward to your feedback!
> 
> Best regards,
> Amin
> 
> 
> 
> 
>> On 22.08.2018 at 16:32, a.mas...@gmx.de wrote:
>> 
>> Hello again,
>> 
>> Haven’t received any feedback on this issue, yet. So, I started to „dive" 
>> into the code of PdfPagesTree::GetPageNode(). Now, I am even more concerned 
>> that for the sake of correctness and security this function needs a rewrite 
>> especially with the removal of GetPageNodeFromArray().
>> 
>> Please find enclosed a small patch against SVN rev 1935 for another problem 
>> of GetPageNode(): It fixes a DoS vulnerability similar to CVE-2017-8054 
>> which may cause infinite recursion on cyclic trees. For clarity, I have 
>> also extended the unit test.
>> 
>> I am looking forward to your feedback!
>> 
>> Best regards,
>> Amin
>> 
>> 
>> 
>>> On 20.08.2018 at 16:29, A. Massad wrote:
>>> 
>>> Hi Everyone,
>>> 
>>> There is a problem with PdfPagesTree::GetPageNode() which yields NULL for 
>>> valid PDFs.
>>> 
>>> E.g. GetPageNode() for nPageNum=1 fails for this 3 page PDF:
>>> https://eur-lex.europa.eu/legal-content/DE/TXT/PDF/?uri=CELEX:52018XC0810(05)&from=DE
>>> 
>>> This PDF is an example for a strange but valid page tree containing 
>>> "/Pages“-Nodes with "/Count 0“ and „/Kids [ ]“.
>>> According to the PDF Spec "Section 7.7.3 Page Tree / 7.7.3.1 General" this 
>>> tree should be handled:
>>> [...]
>>> Closer inspection of the code in GetPageNode() and GetPageNodeFromArray() 
>>> shows that there is considerable code duplication and a lot of special 
>>> cases, even for malformed PDFs. In fact, I would like to propose the 
>>> complete removal of GetPageNodeFromArray() because it’s not needed, the 
>>> condition for calling it is currently wrong and not easy to correct, and it 
>>> introduces unclean code. There is another call to GetPageNodeFromArray() 
>>> which also is unsure about its results and tries at least to correct this 
>>> by checking the result for NULL.
>>> 
>>> Rather the full tree traversal in GetPageNode() would be sufficient and 
>>> correct for all cases. This end clearly needs further inspection of a 
>>> PoDoFo expert.
>>> 
>>> Best regards,
>>> Amin
> 
> 
> 





Re: [Podofo-users] Bug in PdfPagesTree::GetPageNode() / PdfPagesTree::GetPageNodeFromArray()

2018-08-27 Thread A. Massad
Hi all,

Please find enclosed a patch against SVN rev 1937 which fixes three important 
issues with PdfPagesTree::GetPageNode().

To demonstrate the issues the unit test PagesTreeTest has been extended by 
three new tests which all fail for r1937 and are fixed by this patch.

The patch includes:
1) A real fix of CVE-2017-8054 (not really fixed up to r1937!) for handling of 
cyclic trees, see testCyclicTree()
2) A fix for handling of subtrees with "/Kids []" and "/Count 0", which are 
completely valid according to the PDF spec, see testEmptyKidsTree()
3) A changed behavior for trees with nested kids arrays, which are not valid 
according to the PDF spec and now yield a NULL pointer, see testNestedArrayTree()

Please note that this patch supersedes my former patch named 
"patch_getpagenode_cyclic_trees.diff" against r1935, which only covered issue 1.

I am looking forward to your feedback!

Best regards,
Amin



patch_getpagenode_rev1937.diff
Description: Binary data



> On 22.08.2018 at 16:32, a.mas...@gmx.de wrote:
> 
> Hello again,
> 
> Haven’t received any feedback on this issue, yet. So, I started to „dive" 
> into the code of PdfPagesTree::GetPageNode(). Now, I am even more concerned 
> that for the sake of correctness and security this function needs a rewrite 
> especially with the removal of GetPageNodeFromArray().
> 
> Please find enclosed a small patch against SVN rev 1935 for another problem 
> of GetPageNode(): It fixes a DoS vulnerability similar to CVE-2017-8054 which 
> may cause infinite recursion on cyclic trees. For clarity, I have also 
> extended the unit test.
> 
> I am looking forward to your feedback!
> 
> Best regards,
> Amin
> 
> 
> 
>> On 20.08.2018 at 16:29, A. Massad wrote:
>> 
>> Hi Everyone,
>> 
>> There is a problem with PdfPagesTree::GetPageNode() which yields NULL for 
>> valid PDFs.
>> 
>> E.g. GetPageNode() for nPageNum=1 fails for this 3 page PDF:
>> https://eur-lex.europa.eu/legal-content/DE/TXT/PDF/?uri=CELEX:52018XC0810(05)&from=DE
>> 
>> This PDF is an example for a strange but valid page tree containing 
>> "/Pages“-Nodes with "/Count 0“ and „/Kids [ ]“.
>> According to the PDF Spec "Section 7.7.3 Page Tree / 7.7.3.1 General" this 
>> tree should be handled:
>> [...]
>> Closer inspection of the code in GetPageNode() and GetPageNodeFromArray() 
>> shows that there is considerable code duplication and a lot of special 
>> cases, even for malformed PDFs. In fact, I would like to propose the 
>> complete removal of GetPageNodeFromArray() because it’s not needed, the 
>> condition for calling it is currently wrong and not easy to correct, and it 
>> introduces unclean code. There is another call to GetPageNodeFromArray() 
>> which also is unsure about its results and tries at least to correct this by 
>> checking the result for NULL.
>> 
>> Rather the full tree traversal in GetPageNode() would be sufficient and 
>> correct for all cases. This end clearly needs further inspection of a PoDoFo 
>> expert.
>> 
>> Best regards,
>> Amin





[Podofo-users] PdfPagesTree::GetPageNode() - Why does it handle nested arrays?

2018-08-24 Thread A. Massad
Hi,

The current implementation of PdfPagesTree::GetPageNode() has a questionable 
branch for nested kids arrays:

// We have to traverse the tree
while( it != rKidsArray.end() )
{
    if( (*it).IsArray() )
    { // Fixes PDFs broken by having trees with arrays nested once
        ...
    }

Does anyone know what the relevance of nested kids arrays is? Where do such 
broken PDFs occur, and why should PoDoFo handle them? This is not in 
accordance with the PDF spec, and I have not yet found a single PDF tool 
(including Adobe products) which handles such broken PDFs.

I think this case is meant to handle PDFs containing /Pages nodes of this form:

3 0 obj<>
endobj

However, if there is no really good reason for it, this branch should be 
completely removed from GetPageNode() to open the way for further improvements 
of the current code.

Best regards,
Amin




Re: [Podofo-users] Bug in PdfPagesTree::GetPageNode() / PdfPagesTree::GetPageNodeFromArray()

2018-08-22 Thread a . massad
Hello again,

I haven’t received any feedback on this issue yet. So, I started to "dive" into 
the code of PdfPagesTree::GetPageNode(). Now, I am even more concerned that for 
the sake of correctness and security this function needs a rewrite, especially 
with the removal of GetPageNodeFromArray().

Please find enclosed a small patch against SVN rev 1935 for another problem of 
GetPageNode(): It fixes a DoS vulnerability similar to CVE-2017-8054 which may 
cause infinite recursion on cyclic trees. For clarity, I have also extended 
the unit test.
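
To illustrate the idea of guarding a recursive traversal against cyclic trees, here is a minimal sketch on a simplified model. It is not the code from the patch: KidsMap, hasCycle and pageTreeHasCycle are hypothetical names, and each node is assumed to be identified by its object number with its /Kids given as a list of object numbers.

```cpp
#include <map>
#include <set>
#include <vector>

// Simplified, hypothetical model: object number -> object numbers in /Kids.
using KidsMap = std::map<int, std::vector<int>>;

// Depth-first walk that remembers the nodes on the current path.
// Meeting a node that is already on the path means the tree is cyclic,
// so the walk stops instead of recursing forever.
bool hasCycle(const KidsMap& kids, int id, std::set<int>& ancestors)
{
    if (!ancestors.insert(id).second)
    {
        return true; // id is its own ancestor
    }
    const auto it = kids.find(id);
    if (it != kids.end())
    {
        for (int kid : it->second)
        {
            if (hasCycle(kids, kid, ancestors))
            {
                return true;
            }
        }
    }
    ancestors.erase(id); // leaving this node: it is no longer on the path
    return false;
}

bool pageTreeHasCycle(const KidsMap& kids, int root)
{
    std::set<int> ancestors;
    return hasCycle(kids, root, ancestors);
}
```

Tracking only the current path (rather than every visited node) means that a node referenced from two different parents is not misreported as a cycle.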

I am looking forward to your feedback!

Best regards,
Amin



patch_getpagenode_cyclic_trees.diff
Description: Binary data


> On 20.08.2018 at 16:29, A. Massad wrote:
> 
> Hi Everyone,
> 
> There is a problem with PdfPagesTree::GetPageNode() which yields NULL for 
> valid PDFs.
> 
> E.g. GetPageNode() for nPageNum=1 fails for this 3 page PDF:
> https://eur-lex.europa.eu/legal-content/DE/TXT/PDF/?uri=CELEX:52018XC0810(05)&from=DE
> 
> This PDF is an example for a strange but valid page tree containing 
> "/Pages“-Nodes with "/Count 0“ and „/Kids [ ]“.
> According to the PDF Spec "Section 7.7.3 Page Tree / 7.7.3.1 General" this 
> tree should be handled:
> 
>> Conforming products shall be prepared to handle any form of tree structure 
>> built of such nodes.
> 
> In fact, Adobe products have no problems with the PDF and Preflight checks 
> show no problem either. However, PoDoFo cannot handle this tree:
> 
>> 372 0 obj
>> <<
>> /Type /Pages
>> /Count 3
>> /Kids [ 373 0 R 374 0 R 375 0 R ]
>> >>
>> endobj
>> 373 0 obj
>> <<
>> /Type /Pages
>> /Count 3
>> /Kids [ 380 0 R 1 0 R 6 0 R ]
>> /Parent 372 0 R
>> >>
>> endobj
>> 374 0 obj
>> <<
>> /Type /Pages
>> /Count 0
>> /Kids [ ]
>> /Parent 372 0 R
>> >>
>> endobj
>> 375 0 obj
>> <<
>> /Type /Pages
>> /Count 0
>> /Kids [ ]
>> /Parent 372 0 R
>> >>
>> endobj
>> ...
>> 379 0 obj
>> <<
>> /Type /Catalog
>> /Lang (de)
>> /MarkInfo <<
>> /Marked true
>> >>
>> /Metadata 21 0 R
>> /OpenAction [ 380 0 R /XYZ null null null ]
>> /OutputIntents [ 376 0 R ]
>> /Pages 372 0 R
>> /StructTreeRoot 39 0 R
>> >>
>> endobj
> 
> The problem stems from this part of GetPageNode() where it calls 
> GetPageNodeFromArray():
> 
>> if( numDirectKids == numKids && static_cast(nPageNum) < numDirectKids )
>> {
>>     // This node has only page nodes as kids,
>>     // so we can access the array directly
>>     rLstParents.push_back( pParent );
>>     return GetPageNodeFromArray( nPageNum, rKidsArray, rLstParents );
>> }
> 
> The condition of the if-statement is true for this tree. However, 
> GetPageNodeFromArray() cannot handle the tree layout in rKidsArray correctly.
> 
> Closer inspection of the code in GetPageNode() and GetPageNodeFromArray() 
> shows that there is considerable code duplication and a lot of special cases, 
> even for malformed PDFs. In fact, I would like to propose the complete 
> removal of GetPageNodeFromArray() because it’s not needed, the condition for 
> calling it is currently wrong and not easy to correct, and it introduces 
> unclean code. There is another call to GetPageNodeFromArray() which also is 
> unsure about its results and tries at least to correct this by checking the 
> result for NULL. 
> 
> Rather the full tree traversal in GetPageNode() would be sufficient and 
> correct for all cases. This end clearly needs further inspection of a PoDoFo 
> expert.
> 
> Best regards,
> Amin
> 
> 


[Podofo-users] Bug in PdfPagesTree::GetPageNode() / PdfPagesTree::GetPageNodeFromArray()

2018-08-20 Thread A . Massad
Hi Everyone,

There is a problem with PdfPagesTree::GetPageNode() which yields NULL for valid 
PDFs.

E.g. GetPageNode() for nPageNum=1 fails for this 3 page PDF:
https://eur-lex.europa.eu/legal-content/DE/TXT/PDF/?uri=CELEX:52018XC0810(05)&from=DE

This PDF is an example of a strange but valid page tree containing 
"/Pages" nodes with "/Count 0" and "/Kids [ ]".
According to the PDF Spec "Section 7.7.3 Page Tree / 7.7.3.1 General", this tree 
should be handled:

> Conforming products shall be prepared to handle any form of tree structure 
> built of such nodes.

In fact, Adobe products have no problems with the PDF and Preflight checks show 
no problem either. However, PoDoFo cannot handle this tree:

> 372 0 obj
> <<
> /Type /Pages
> /Count 3
> /Kids [ 373 0 R 374 0 R 375 0 R ]
> >>
> endobj
> 373 0 obj
> <<
> /Type /Pages
> /Count 3
> /Kids [ 380 0 R 1 0 R 6 0 R ]
> /Parent 372 0 R
> >>
> endobj
> 374 0 obj
> <<
> /Type /Pages
> /Count 0
> /Kids [ ]
> /Parent 372 0 R
> >>
> endobj
> 375 0 obj
> <<
> /Type /Pages
> /Count 0
> /Kids [ ]
> /Parent 372 0 R
> >>
> endobj
> ...
> 379 0 obj
> <<
> /Type /Catalog
> /Lang (de)
> /MarkInfo <<
> /Marked true
> >>
> /Metadata 21 0 R
> /OpenAction [ 380 0 R /XYZ null null null ]
> /OutputIntents [ 376 0 R ]
> /Pages 372 0 R
> /StructTreeRoot 39 0 R
> >>
> endobj

The problem stems from this part of GetPageNode() where it calls 
GetPageNodeFromArray():

> if( numDirectKids == numKids && static_cast(nPageNum) < numDirectKids )
> {
>     // This node has only page nodes as kids,
>     // so we can access the array directly
>     rLstParents.push_back( pParent );
>     return GetPageNodeFromArray( nPageNum, rKidsArray, rLstParents );
> }

The condition of the if-statement is true for this tree. However, 
GetPageNodeFromArray() cannot handle the tree layout in rKidsArray correctly.

Closer inspection of the code in GetPageNode() and GetPageNodeFromArray() shows 
that there is considerable code duplication and a lot of special cases, even 
for malformed PDFs. In fact, I would like to propose the complete removal of 
GetPageNodeFromArray(): it is not needed, the condition for calling it is 
currently wrong and not easy to correct, and it introduces unclean code. There 
is another call to GetPageNodeFromArray() which does not trust its result 
either and tries to compensate by checking the result for NULL.

Instead, the full tree traversal in GetPageNode() would be sufficient and correct 
for all cases. This clearly needs further inspection by a PoDoFo expert.
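
As a rough illustration of why a plain traversal copes with this tree, here is a sketch on a simplified model. It is not PoDoFo code: Node, PageTree, findPage and getPageNode are hypothetical names, nodes are keyed by object number, and the sketch ignores /Count entirely and simply counts page leaves in document order. It also assumes an acyclic tree.

```cpp
#include <map>
#include <vector>

// Simplified, hypothetical model of a page tree node.
struct Node
{
    bool isPage;           // true for /Type /Page, false for /Type /Pages
    std::vector<int> kids; // object numbers listed in /Kids (empty for pages)
};
using PageTree = std::map<int, Node>;

// Depth-first search for the page with the given zero-based index.
// 'remaining' counts how many pages still have to be skipped.
// Returns the object number of the page, or -1 if it was not found.
int findPage(const PageTree& tree, int id, int& remaining)
{
    const auto it = tree.find(id);
    if (it == tree.end())
    {
        return -1; // dangling reference: nothing to visit
    }
    if (it->second.isPage)
    {
        if (remaining == 0)
        {
            return id;
        }
        --remaining;
        return -1;
    }
    // A /Pages node with "/Kids [ ]" simply contributes no pages.
    for (int kid : it->second.kids)
    {
        const int found = findPage(tree, kid, remaining);
        if (found != -1)
        {
            return found;
        }
    }
    return -1;
}

int getPageNode(const PageTree& tree, int root, int pageIndex)
{
    int remaining = pageIndex;
    return findPage(tree, root, remaining);
}
```

Built from the objects listed above (root 372, kids 373, 374 and 375, with pages 380, 1 and 6 below 373), a lookup of page index 1 in this model returns object 1, and the two empty /Pages nodes need no special case.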

Best regards,
Amin




[Podofo-users] PATCH for PdfXRefStreamParserObject.cpp (Repost)

2015-01-13 Thread A. Massad
Hello PoDoFo developers,

I think that I’ve fixed a bug in PdfXRefStreamParserObject which occurred if 
the first entry in the "W" array has a zero value:

Current SVN version r1665 treated this case as "type 0" (free object). However, 
according to the PDF spec, this case should default to "type 1".
Please see the spec "PDF 32000-1:2008", Section "7.5.8.2 Cross-Reference Stream 
Dictionary", "Table 17 – Additional entries specific to a cross-reference 
stream dictionary". The description of the key "W" states:
> A value of zero for an element in the W array indicates that the 
> corresponding field shall not be present in the stream, and the default value 
> shall be used, if there is one. If the first element is zero, the type field 
> shall not be present, and shall default to type 1.
> 

I have managed to fix this issue with the following patch to 
PdfXRefStreamParserObject.cpp. The if-statement added just before the 
switch-statement fixes the bug:

> Index: podofo-src-r1665/src/base/PdfXRefStreamParserObject.cpp
> ===
> --- podofo-src-r1665/src/base/PdfXRefStreamParserObject.cpp   (revision 7630)
> +++ podofo-src-r1665/src/base/PdfXRefStreamParserObject.cpp   (working copy)
> @@ -228,6 +228,7 @@
>  
>  //printf("OBJ=%i nData = [ %i %i %i ]\n", nObjNo, 
> static_cast(nData[0]), static_cast(nData[1]), 
> static_cast(nData[2]) );
>  (*m_pOffsets)[nObjNo].bParsed = true;
> +if (lW[0]==0) nData[0]=1; // If the first element is zero, the type 
> field shall not be present, and shall default to type 1.
>  switch( nData[0] ) // nData[0] contains the type information of this 
> entry
>  {
>  case 0:
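
The defaulting rule quoted from the spec can also be shown in isolation. The following is a minimal sketch and not PoDoFo's code: XRefEntry and decodeEntry are hypothetical names, and the fields are assumed to be stored as big-endian integers whose byte widths are given by the W array.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation of one decoded cross-reference stream entry.
struct XRefEntry
{
    std::uint64_t type;   // 0 = free, 1 = in use, 2 = compressed
    std::uint64_t field2;
    std::uint64_t field3;
};

// Decodes one entry whose three field widths (in bytes) are given by w.
// Throws std::out_of_range (via at()) if the data is too short.
XRefEntry decodeEntry(const std::vector<unsigned char>& bytes,
                      const std::array<int, 3>& w)
{
    std::uint64_t fields[3] = {0, 0, 0};
    std::size_t pos = 0;
    for (std::size_t i = 0; i < 3; ++i)
    {
        for (int j = 0; j < w[i]; ++j)
        {
            fields[i] = (fields[i] << 8) | bytes.at(pos++);
        }
    }
    if (w[0] == 0)
    {
        fields[0] = 1; // type field absent: defaults to type 1
    }
    return {fields[0], fields[1], fields[2]};
}
```

With W = [0 3 0], the three bytes 00 01 2C decode to an in-use entry (type 1) whose second field is 300.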

Without this patch, I could not create a PdfMemDocument for a certain PDF 
sample file which contains an XRef stream with a W key of the form [0 3 0]. The 
symptom looked like this:
> PoDoFo encounter an error. Error: 15 ePdfError_NoObject
>   Error Description: A object was expected but not found.
>   Callstack:
>   #0 Error Source: 
> /Users/amin/podofo/podofo-svn/podofo-src/src/doc/PdfMemDocument.cpp:182
>   Information: Catalog object not found!
The error occurred in the function PdfMemDocument::InitFromParser( PdfParser* 
pParser ) at the call
PdfObject* pCatalog = pTrailer->GetIndirectKey( "Root" );
which failed because the indirect object in "Root" could not be dereferenced: 
this->GetObjects().GetSize() yielded zero, since all PdfObjects had been 
parsed as type "free".

After the mentioned patch the problematic PDF file could be parsed without 
problems... 

Could you please check this patch and add it to the SVN version? Thank you very 
much!

Best regards,
Amin



[Podofo-users] PATCH for PdfXRefStreamParserObject.cpp

2014-11-29 Thread A. Massad
Hello PoDoFo developers,

I think that I’ve fixed a bug in PdfXRefStreamParserObject which occurred if 
the first entry in the "W" array has a zero value:

Current SVN version r1665 treated this case as "type 0" (free object). However, 
according to the PDF spec, this case should default to "type 1".
Please see the spec "PDF 32000-1:2008", Section "7.5.8.2 Cross-Reference Stream 
Dictionary", "Table 17 – Additional entries specific to a cross-reference 
stream dictionary". The description of the key "W" states:
> A value of zero for an element in the W array indicates that the 
> corresponding field shall not be present in the stream, and the default value 
> shall be used, if there is one. If the first element is zero, the type field 
> shall not be present, and shall default to type 1.
> 

I have managed to fix this issue with the following patch to 
PdfXRefStreamParserObject.cpp. The if-statement added just before the 
switch-statement fixes the bug:

> Index: podofo-src-r1665/src/base/PdfXRefStreamParserObject.cpp
> ===
> --- podofo-src-r1665/src/base/PdfXRefStreamParserObject.cpp   (revision 7630)
> +++ podofo-src-r1665/src/base/PdfXRefStreamParserObject.cpp   (working copy)
> @@ -228,6 +228,7 @@
> 
>  //printf("OBJ=%i nData = [ %i %i %i ]\n", nObjNo, 
> static_cast(nData[0]), static_cast(nData[1]), 
> static_cast(nData[2]) );
>  (*m_pOffsets)[nObjNo].bParsed = true;
> +if (lW[0]==0) nData[0]=1; // If the first element is zero, the type 
> field shall not be present, and shall default to type 1.
>  switch( nData[0] ) // nData[0] contains the type information of this 
> entry
>  {
>  case 0:

Without this patch, I could not create a PdfMemDocument for a certain PDF 
sample file which contains an XRef stream with a W key of the form [0 3 0]. The 
symptom looked like this:
> PoDoFo encounter an error. Error: 15 ePdfError_NoObject
>   Error Description: A object was expected but not found.
>   Callstack:
>   #0 Error Source: 
> /Users/amin/csci/svn-src/extern/podofo/podofo-svn/podofo-src/src/doc/PdfMemDocument.cpp:182
>   Information: Catalog object not found!
The error occurred in the function PdfMemDocument::InitFromParser( PdfParser* 
pParser ) at the call
PdfObject* pCatalog = pTrailer->GetIndirectKey( "Root" );
which failed because the indirect object in "Root" could not be dereferenced: 
this->GetObjects().GetSize() yielded zero, since all PdfObjects had been 
parsed as type "free".

After the mentioned patch the problematic PDF file could be parsed without 
problems...

Could you please check this patch and add it to the SVN version? Thank you very 
much!

Best regards,
Amin





[Podofo-users] Suggested patch for PdfParser.cpp and Mingw-64

2013-07-31 Thread A. Massad
Hello,

The current PoDoFo svn-version 1558 does not work under Mingw-64: PdfParser.cpp 
uses "%I" (upper-case "i", do not misread as "L") in the scanf format string 
which seems to be undefined with GCC and completely breaks parsing of PDF 
files. For example podofouncompress does not work anymore and yields an 
exception "ePdfError_NoObject".

The following patch seems to fix everything by using the default scanf format 
string in the #else branch of the #ifdef:
> svn diff
> Index: PdfParser.cpp
> ===
> --- PdfParser.cpp (revision 1558)
> +++ PdfParser.cpp (working copy)
> @@ -757,7 +757,7 @@
>  if( !m_offsets[objID].bParsed )
>  {
>  m_offsets[objID].bParsed = true;
> -#ifdef _WIN64
> +#if defined(_WIN64) && defined(_MSC_VER)
>  sscanf( m_buffer.GetBuffer(), "%10I64d %5ld %c%c%c", 
>  &(m_offsets[objID].lOffset), 
>  &(m_offsets[objID].lGeneration), 
> &(m_offsets[objID].cUsed), &empty1, &empty2 );
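
As a possible alternative to compiler-specific format strings, the standard <cinttypes> macros could be used. This is only a sketch under assumptions and not part of the patch above: XRefLine and parseXRefLine are hypothetical names, and the offset is assumed to be stored in a std::int64_t, which may differ from the type PoDoFo actually uses.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Hypothetical holder for one classic xref table line.
struct XRefLine
{
    std::int64_t offset;
    long generation;
    char used; // 'n' (in use) or 'f' (free)
};

// Parses a line such as "0000000017 00000 n \n".
// SCNd64 is meant to expand to the conversion specifier that the target
// C library expects for a 64-bit integer, so no #ifdef is needed here.
bool parseXRefLine(const char* line, XRefLine& out)
{
    char eol1 = 0;
    char eol2 = 0;
    const int matched = std::sscanf(line, "%10" SCNd64 " %5ld %c%c%c",
                                    &out.offset, &out.generation,
                                    &out.used, &eol1, &eol2);
    return matched == 5;
}
```

Whether this behaves correctly on a given MinGW-w64 setup depends on the C runtime in use, so it would need testing there before replacing the #if.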


Best regards,
Amin



[Podofo-users] Bugfix for PdfObjectStreamParserObject::ReadObjectsFromStream()

2012-10-12 Thread A. Massad
Dear PoDoFo-Team,

I have found and (hopefully) fixed a bug in 
PdfObjectStreamParserObject::ReadObjectsFromStream() which yielded a parse 
error for some PDFs with ObjectStreams. Could you please check the proposed 
patch and add it to the recent SVN version of PoDoFo? Thanks a lot!

Here comes the patch against SVN rev 1507:

> Index: podofo-src-r1507/src/base/PdfObjectStreamParserObject.cpp
> ===
> --- podofo-src-r1507/src/base/PdfObjectStreamParserObject.cpp (revision 5315)
> +++ podofo-src-r1507/src/base/PdfObjectStreamParserObject.cpp (working copy)
> @@ -105,6 +105,7 @@
>   }
>  
>  // move back to the position inside of the table of contents
> +device.Device()->Clear();
>  device.Device()->Seek( pos );
>  
>  ++i;


The background of this patch is as follows.

SYMPTOMS: 
Got a PDF created by Adobe Illustrator 10 which yielded a parse error in 
PoDoFo, e.g. calling podofopdfinfo yielded the following output:

> ./build/bin/podofopdfinfo sample.pdf
> Error: An error 5 ocurred during uncompressing the pdf file.
> 
> 
> PoDoFo encounter an error. Error: 5 ePdfError_UnexpectedEOF
>   Error Description: End of file was reached unxexpectedly.
>   Callstack:
>   #0 Error Source: podofo-svn/podofo-src/src/base/PdfParser.cpp:213
>   Information: Unable to load objects from file.
>   #1 Error Source: podofo-svn/podofo-src/src/base/PdfTokenizer.cpp:340
>   Information: Expected number


CAUSE: 
The PDF contains an ObjectStream whose last three objects are of type "number". 
The *END* of the object stream looks like this:
18272 19199 11818

There is a problem in the function 
PdfObjectStreamParserObject::ReadObjectsFromStream() because it uses *TWO* 
tokenizers (line 75 and line 89) on *ONE* InputDevice "device". Now, the call 
of variantTokenizer.GetNextVariant() in line 90 for the third-to-last number, 
i.e. "18272" in the example above, triggers EOF on the device because 
GetNextToken() always reads three tokens. 

After that we want to "move back to the position inside of the table of 
contents" (line 107f) by calling device.Device()->Seek( pos ). This no longer 
works because EOF was already reached, so reading the next object number 
lObj in line 81 from the TOC of the object stream fails. Hence, we get an error 
before we can reach the second-to-last number "19199" of the object stream.
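
The same effect can be reproduced with a plain standard C++ stream, which serves here only as an analogy: it is an assumption that the PoDoFo input device behaves comparably. In this sketch, rereadAfterEof is a hypothetical helper that reads past the end of the data, seeks back to the start and tries to read the first number again.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: consumes all numbers in 'data' until extraction
// fails at the end, then seeks back to the start and re-reads the first
// number. Returns true if that re-read succeeds.
bool rereadAfterEof(const std::string& data, bool clearFirst, long long& first)
{
    std::istringstream in(data);
    long long value = 0;
    while (in >> value)
    {
        // keep reading; the final failed extraction sets eofbit and failbit
    }
    if (clearFirst)
    {
        in.clear(); // reset the error state before seeking
    }
    in.seekg(0);    // has no effect while failbit is still set
    return static_cast<bool>(in >> first);
}
```

For the data "18272 19199 11818", the re-read fails when clearFirst is false and yields 18272 again when it is true.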

> 72 void PdfObjectStreamParserObject::ReadObjectsFromStream( char* pBuffer, 
> pdf_long lBufferLen, long long lNum, long long lFirst, ObjectIdList const & 
> list)
>  73 {
>  74 PdfRefCountedInputDevice device( pBuffer, lBufferLen );
>  75 PdfTokenizer tokenizer( device, m_buffer );
>  76 PdfVariant   var;
>  77 int  i = 0;
>  78 
>  79 while( static_cast(i) < lNum )
>  80 {
>  81 const long long lObj = tokenizer.GetNextNumber();
>  82 const long long lOff = tokenizer.GetNextNumber();
>  83 const std::streamoff pos = device.Device()->Tell();
>  84 
>  85 // move to the position of the object in the stream
>  86 device.Device()->Seek( static_cast(lFirst + 
> lOff) );
>  87 
>  88 // use a second tokenizer here so that anything that 
> gets dequeued isn't left in the tokenizer that reads the offsets and lengths
>  89 PdfTokenizer variantTokenizer( device, m_buffer );
>  90 variantTokenizer.GetNextVariant( var, m_pEncrypt );
>  91 bool should_read = std::find(list.begin(), 
> list.end(), lObj) != list.end();
>  92 #if defined(PODOFO_VERBOSE_DEBUG)
>  93 std::cerr << "ReadObjectsFromStream STREAM=" << 
> m_pParser->Reference().ToString() <<
>  94 ", OBJ=" << lObj <<
>  95 ", " << (should_read ? "read" : "skipped") << 
> std::endl;
>  96 #endif
>  97 if (should_read)
>  98 {
>  99 if(m_vecObjects->GetObject(PdfReference( 
> static_cast(lObj), 0LL ))) 
> 100 {
> 101 PdfError::LogMessage( eLogSeverity_Warning, "Object: 
> %li 0 R will be deleted and loaded again.\n", lObj );
> 102 delete m_vecObjects->RemoveObject(PdfReference( 
> static_cast(lObj), 0LL ),false);
> 103 }
> 104 m_vecObjects->insert_sorted( new PdfObject( PdfReference( 
> static_cast(lObj), 0LL ), var ) );
> 105 }
> 106 
> 107 // move back to the position inside of the table of contents
> 108 device.Device()->Clear(); // ** NEWLY ADDED ** 
> 109 device.Device()->Seek( pos );
> 110 
> 111 ++i;
> 112 }
> 113 }

SOLUTION:
To clear EOF, I have add line 108 to so

[Podofo-users] How to make annotations editable in AcroRead?

2011-08-12 Thread A. Massad
Hello,

The function PdfPage::CreateAnnotation() allows creating PDF annotations. They 
are displayed in AcroRead and Acrobat (and Mac OS X preview.app).

However, you need Acrobat to edit annotations and change their state (e.g. to 
"accepted"). If you open the same PDF in AcroRead, none of the annotations are 
editable and each has a small lock icon in the annotation list.

It is possible to "unlock" the annotations in Acrobat with the menu item 
"Comments > Enable for Commenting and Analysis in Adobe Reader". Is it possible 
to accomplish the same unlocking of annotations for AcroRead by means of PoDoFo 
or any other command line tool?

This seems to be a proprietary feature based on some signatures added by 
Acrobat to the dictionaries of /AcroForm + /Perms + /UR3...

Thank you for your help!

Best regards,
Amin 


[Podofo-users] Improvements of PdfContentsTokenizer::ReadInlineImgData()

2010-08-31 Thread A. Massad
Hi,

I have encountered two problems in PdfContentsTokenizer::ReadInlineImgData():

1) Parsing expects a whitespace *before* the EI operator (end of image data) 
whereas it should expect a whitespace *after* the EI.
2) Buffer for image data has a fixed size of 4096 bytes.

The patch (against svn rev. 1298) included in this E-Mail provides a solution 
for both issues.

Some further details:

To 1) Unfortunately, the PDF spec does not clearly define how the EI operator 
should be detected in the data following the ID operator. The size of the data 
is not specified, and there seems to be no "escaping" mechanism if the 
sequence EI should occur in the image data. However, other PDF parsers use a 
"heuristic" approach and expect a whitespace *after* the EI operator. See this 
discussion: 
http://www.planetpdf.com/forumarchive/134376.asp

> Topic: Re: parsing inline images (Via Email)
> Conf: (P-PDF) Developers, Msg: 134376
> From: LeonardR 
> Date: 6/13/2005 10:58 PM
> 
> At 06:38 PM 6/13/2005, p-pdf-developers Listmanager wrote:
> >The image data contains "EI " where the
> >white space is a space (0x20).
> 
> The actual image data, or the encoded version of the data? Are 
> you decoding and then looking or grabbing the inline image data till you 
> find the "EI" and then decoding?
> 
> 
> >our parser detects either a space or cr lf.
> 
> I've looked at the sources to a few content stream parsers (my 
> own, Xpdf, Multivalent, etc.) and they all also support "EI" followed by at 
> least one whitespace character (specifically space, CR or LF).
> 

Prior to the patch, PoDoFo expects to find a whitespace *before* the EI 
operator and fails to detect the end of image data for some PDFs created by a 
common PDF workflow software.

To 2) The PDF spec states that inlined images *should* not be larger than 4K. 
However, it does not forbid images to be larger. Again, some common PDF outputs 
contained inlined images larger than 4K. In that case, PoDoFo should not fail 
but rather resize the buffer.
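
The two points above can be sketched roughly as follows. This is only an 
illustration and not the attached patch: ReadInlineImageData and 
IsPdfWhitespace are hypothetical stand-alone helpers that work on a 
std::string instead of PoDoFo's device and buffer classes. The sketch stops 
at an "EI" that is followed by whitespace (or by the end of input) and 
collects the data in a std::string, which grows as needed instead of being 
limited to 4096 bytes.

```cpp
#include <cstddef>
#include <string>

// PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
static bool IsPdfWhitespace(unsigned char c)
{
    return c == 0x00 || c == 0x09 || c == 0x0A ||
           c == 0x0C || c == 0x0D || c == 0x20;
}

// Hypothetical helper (not a PoDoFo function).
// Scans `stream` from `pos`, which is assumed to point at the first data byte
// after "ID" and its single separating whitespace. On success, `data` holds
// the raw image bytes, `pos` points at the 'E' of the terminating "EI", and
// the function returns true. Returns false if no terminator is found.
bool ReadInlineImageData(const std::string& stream, std::size_t& pos,
                         std::string& data)
{
    data.clear();
    for (std::size_t i = pos; i < stream.size(); ++i) {
        if (stream[i] == 'E' && i + 1 < stream.size() && stream[i + 1] == 'I') {
            const bool atEnd = (i + 2 == stream.size());
            if (atEnd ||
                IsPdfWhitespace(static_cast<unsigned char>(stream[i + 2]))) {
                pos = i;      // leave "EI" unread for the caller
                return true;  // EI followed by whitespace or EOF => stop
            }
        }
        data.push_back(stream[i]); // std::string grows beyond 4K if needed
    }
    return false;
}
```

Like the heuristic itself, this can still be fooled by image data that 
happens to contain "EI" followed by a whitespace byte.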

Hopefully, this patch will be helpful for other users, too. Many thanks to all 
developers for this great project!

Best regards,
Amin

> Index: podofo-src-r1298/src/PdfContentsTokenizer.cpp
> ===
> --- podofo-src-r1298/src/PdfContentsTokenizer.cpp (revision 1298)
> +++ podofo-src-r1298/src/PdfContentsTokenizer.cpp (working copy)
> @@ -202,40 +202,43 @@
>  PODOFO_RAISE_ERROR( ePdfError_InvalidHandle );
>  }
>  
> -// cosume the only whitespace between ID and data
> +// consume the only whitespace between ID and data
>  c = m_device.Device()->Look();
>  if( PdfTokenizer::IsWhitespace( c ) )
>  {
>  c = m_device.Device()->GetChar();
>  }
>  
> -while( (c = m_device.Device()->Look()) != EOF
> -   && counter < static_cast(m_buffer.GetSize()) )
> -{
> -if (PdfTokenizer::IsWhitespace(c))
> -{
> -// test if end-of-image-data is reached (hit EI keyword)
> -c = m_device.Device()->GetChar(); // skip the white space
> -char e = m_device.Device()->GetChar();
> -char i = m_device.Device()->GetChar();
> -m_device.Device()->Seek(-2, std::ios::cur);
> -if (e == 'E' && i == 'I')
> -{
> -m_buffer.GetBuffer()[counter] = '\0';
> -rVariant = PdfData(m_buffer.GetBuffer(), 
> static_cast(counter));
> -reType = ePdfContentsType_ImageData;
> -m_readingInlineImgData = false;
> -return true;
> -}
> -m_buffer.GetBuffer()[counter] = c;
> -++counter;
> -}
> -else
> -{
> -c = m_device.Device()->GetChar();
> -m_buffer.GetBuffer()[counter] = c;
> -++counter;
> -}
> +while((c = m_device.Device()->Look()) != EOF) {
> +  c = m_device.Device()->GetChar(); 
> +  if (c=='E' &&  m_device.Device()->Look()=='I') {
> + char i = m_device.Device()->GetChar();
> + char w = m_device.Device()->Look();
> +if (w==EOF || PdfTokenizer::IsWhitespace(w)) {
> +   // EI is followed by whitespace => stop
> +   m_device.Device()->Seek(-2, std::ios::cur); // put back "EI" 
> +   m_buffer.GetBuffer()[counter] = '\0';
> +   rVariant = PdfData(m_buffer.GetBuffer(), 
> static_cast(counter));
> +   reType = ePdfContentsType_ImageData;
> +   m_readingInlineImgData = false;
> +   return true;
> + }
> + else {
> +   // no whitespace after EI => do not stop
> +   m_device.Device()->Seek(-1, std::ios::cur); // put back "I" 
> +   m_buffer.GetBuffer()[counter] = c;
> +   ++counter;
> + }
> +  }
> +  else {
> + m_buffer.GetBuffer()[counter] = c;
> + ++counter;
> +  }
> +  
> +  if (counter ==  static_cast(m_buffer.GetSize())) {
> + // image

[Podofo-users] Compiling PoDoFo without libpng

2010-06-11 Thread A. Massad
Hello,

Is there a flag to compile PoDoFo (svn-version) without libpng support? The 
library exists on my build system, i.e. it is found by cmake, but I don't want 
libpng to be used by libpodofo.

I have found a work-around by commenting out the line
   ### FIND_PACKAGE(PNG)
in CMakeLists.txt.

However, maybe someone knows a better solution without editing the source.

Thank you for your help!

Best regards,
Amin




Re: [Podofo-users] PdfFilterFactory::CreateFilterList() should allow references as FILTER name

2010-05-12 Thread A. Massad
Dominik Seichter wrote on 05/12/2010 at 19:10:
> Hi,
> 
> Where did you find the information in the PDF reference that references are 
> allowed in the filters array? 
> From my understanding, Table 3.4 in the PDF Reference 1.7 states clearly 
> that the /Filter key is either of type name or array.

This part of the spec is quite "tricky". Indeed, table 34 states that the type 
should be name. However, an indirect object (i.e. reference) is allowed to 
replace *any* object - in that case it has to *point* to an object which is a 
name.

I refer to section 7.3.10 of the spec:
> 7.3.10 Indirect Objects
> Any object in a PDF file may be labelled as an indirect object. This gives 
> the object a unique object identifier by which other objects can refer to it 
> (for example, as an element of an array or as the value of a dictionary 
> entry).

This statement even refers to an example which matches the replacement of an 
array element by a reference.

> I added a patch to SVN. Could you please check if this fixes your problem?
> 

GREAT! It fixes the problem. Thank you for your excellent work!!!

Greetings,
Amin


[Podofo-users] PdfFilterFactory::CreateFilterList() should allow references as FILTER name

2010-05-12 Thread A. Massad
Hi,

Found a problem with podofo (svn r1231 and older versions) which I cannot fix 
by myself: It's located in PdfFilter.cpp (line 366) where it is required that 
the array elements of /Filter have to be names:

> if (! (*it).IsName() )
> {
>PODOFO_RAISE_ERROR_INFO( ePdfError_InvalidDataType, "Filter array 
> contained unexpected non-name type" );
>...

However, according to the PDF spec it is not forbidden to have references in 
the filter array. For example, the following PDF code with the reference "487 0 
R" is valid by the spec (and by Acrobat preflight for syntax errors):

> 485 0 obj
> <<
> /Range [0 1 0 1 0 1 0 1]
> /Filter [487 0 R]
> /Domain [0 1 0 1 0 1 0 1]
> /FunctionType 4
> /Length 28
> >>
> stream
> x<9c>«V0PP0U0T(ÊÏÉ!<8a>]<90>_<80><82>k^A
> endstream
> endobj
> 
> 487 0 obj
> /FlateDecode
> endobj

Still, this PDF code is rejected by podofo! To fix it, the file 
PdfFilter.cpp:366 has to be extended by some kind of dereferencing in case of 
(*it).IsReference(). This is usually done by code fragments like this:
 
> const PdfObject* pElem = &(*it);
> while ( pElem && pElem->IsReference() )
> {
>pElem = pParent->GetObject( pElem->GetReference() );
> }

However, how do I get the parent object pParent into the PdfFilter class? It is 
required in order to call GetObject() on it, isn't it? Could you please help me 
to fix this?
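
To make the idea concrete, here is a minimal, self-contained sketch of such a 
dereferencing step. All types in it (ToyReference, ToyObject, ToyObjectTable) 
and the function Resolve are hypothetical stand-ins invented for this 
illustration; they are not PoDoFo's classes, and the object table merely 
plays the role of whatever owner object would be passed in. Unlike the 
fragment above, the sketch also guards against dangling references and 
reference cycles.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; NOT PoDoFo's classes.
struct ToyReference { int objNum; int genNum; };

inline bool operator<(const ToyReference& a, const ToyReference& b)
{
    return std::make_pair(a.objNum, a.genNum) <
           std::make_pair(b.objNum, b.genNum);
}

struct ToyObject {
    enum Kind { Name, Reference } kind;
    std::string  name; // meaningful if kind == Name
    ToyReference ref;  // meaningful if kind == Reference
};

using ToyObjectTable = std::map<ToyReference, ToyObject>;

// Follows a chain of references until a non-reference object is reached.
// Returns nullptr for a dangling reference or when the chain is longer than
// maxDepth (which also terminates reference cycles).
const ToyObject* Resolve(const ToyObject& start, const ToyObjectTable& table,
                         int maxDepth = 32)
{
    const ToyObject* p = &start;
    while (p && p->kind == ToyObject::Reference) {
        if (maxDepth-- <= 0)
            return nullptr;
        ToyObjectTable::const_iterator it = table.find(p->ref);
        p = (it == table.end()) ? nullptr : &it->second;
    }
    return p;
}
```

With a table that maps "487 0" to the name FlateDecode, resolving the array 
element "487 0 R" yields that name object, so a subsequent name check on the 
resolved object would pass.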

Thank you,
Amin


Re: [Podofo-users] Printing PdfStrings with escape sequences

2009-12-18 Thread A. Massad
Hi Dominik,

your latest changes in the SVN (rev 1173) introduced a serious bug into the 
PoDoFo code.
Your access to m_escMap with m_escMap[static_cast(*pBuf)] is wrong because 
this cast may yield negative indices (for chars > 0x7f). In that case, cEsc 
takes an arbitrary nonzero value, which leads to unexpected escape sequences 
such as "\1".

Could you please check my suggested diffs (against rev 1174) below and commit 
these changes to the SVN - if they are correct. 

Thank you!
Amin

PS: In my opinion, the additional sizeof(char) should be added to the malloc() 
because it's used in the memset()-call, too. 
PPS: Please note that a cast unsigned(*pBuf) does not correct the problem and 
yields very large numbers for "negative" chars. Use "unsigned(*pBuf)&0xff" to 
prevent this. 
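
The effect can be reproduced in isolation with the small sketch below. The 
function names are made up for this illustration, and it assumes a platform 
where int is wider than 8 bits and uses two's complement. The archived text 
has lost the type inside static_cast, so the "naive" variant here simply 
assumes a cast to int.

```cpp
#include <cstddef>

// Safe index into a 256-entry lookup table: convert to unsigned char first,
// so the result is always in the range 0..255.
inline std::size_t TableIndex(char c)
{
    return static_cast<unsigned char>(c);
}

// Assumed form of the problematic cast: a signed char holding a byte above
// 0x7f is negative, and converting it to int keeps it negative.
inline int NaiveIndex(signed char c)
{
    return static_cast<int>(c);
}

// Casting straight to unsigned converts the negative value modulo 2^N, which
// gives a very large number; masking with 0xff recovers the byte value.
inline unsigned WideIndex(signed char c)
{
    return static_cast<unsigned>(c);
}
```

For the byte 0xE4, TableIndex gives 228, NaiveIndex gives a negative value, 
and WideIndex gives a value far above 255 unless it is masked with 0xff.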


Index: src/PdfString.cpp
===
--- src/PdfString.cpp   (revision 1174)
+++ src/PdfString.cpp   (working copy)
@@ -48,7 +48,7 @@
 static const char* genEscMap()
 {
 const long lAllocLen = 256;
-char* map = static_cast<char*>(malloc(lAllocLen));
+char* map = static_cast<char*>(malloc(sizeof(char)*lAllocLen));
 memset( map, 0, sizeof(char) * lAllocLen );
 
 map['\n'] = 'n'; // Line feed (LF)
@@ -376,7 +376,7 @@
 
 while( lLen-- ) 
 {
-const char & cEsc = m_escMap[static_cast(*pBuf)];
+const char & cEsc = m_escMap[static_cast(*pBuf)];
 if( cEsc != 0 ) 
 {
 pDevice->Write( "\\", 1 );
Index: src/PdfTokenizer.cpp
===
--- src/PdfTokenizer.cpp        (revision 1174)
+++ src/PdfTokenizer.cpp        (working copy)
@@ -53,7 +53,7 @@
 {
 int   i;
 const long lAllocLen = 256;
-char* map = static_cast<char*>(malloc(lAllocLen));
+char* map = static_cast<char*>(malloc(sizeof(char)*lAllocLen));
 memset( map, 0, sizeof(char) * lAllocLen );
 for (i = 0; i < PoDoFo::s_nNumDelimiters; ++i)
 map[static_cast(PoDoFo::s_cDelimiters[i])] = 1;
@@ -67,7 +67,7 @@
 {
 int   i;
 const long lAllocLen = 256;
-char* map = static_cast<char*>(malloc(lAllocLen));
+char* map = static_cast<char*>(malloc(sizeof(char)*lAllocLen));
 memset( map, 0, sizeof(char) * lAllocLen );
 for (i = 0; i < PoDoFo::s_nNumWhiteSpaces; ++i)
 map[static_cast(PoDoFo::s_cWhiteSpaces[i])] = 1;
@@ -78,7 +78,7 @@
 const char* genEscMap()
 {
 const long lAllocLen = 256;
-char* map = static_cast<char*>(malloc(lAllocLen));
+char* map = static_cast<char*>(malloc(sizeof(char)*lAllocLen));
 memset( map, 0, sizeof(char) * lAllocLen );
 
 map['n'] = '\n'; // Line feed (LF)
@@ -646,7 +646,7 @@
 else
 {
 // Handle plain escape sequences
-const char & code = m_escMap[m_device.Device()->GetChar()];
+const char & code = m_escMap[(unsigned char)(m_device.Device()->GetChar())];
 if( code )
 m_vecBuffer.push_back( code );

Am 14.12.2009 um 15:55 schrieb Dominik Seichter:

> Thanks for the clarifications. I committed a fix to SVN; could you please 
> check that the behaviour is now correct for you?
> 
> Best regards,
>   Dom
> 
> Am Samstag 12 Dezember 2009 schrieb A. Massad:
>> Am 29.11.2009 um 19:21 schrieb Dominik Seichter:
>>> Hi,
>>> 
>>> I do not see how this is a problem. It is true that PoDoFo writes
>>> (Hello\nWorld) as (Hello
>>> World) into the PDF. But the PDF is read as sequence of bytes and the
>>> byte for the linebreak is still there. If I understand the PDF reference
>>> correctly, the behaviour of PoDoFo is correct. Escaping is optional and
>>> not required.
>>> 
>>> Please correct me if I am wrong here!
>> 
>> Sorry for the late reply, it took me some time to investigate this issue: I
>> came to the conclusion that your statement does not agree with the PDF
>> spec. The following is a quotation from section "7.3.4.2 Literal Strings":
>> 
>> "An end-of-line marker appearing within a literal string without a
>> preceding REVERSE SOLIDUS shall be treated as a byte value of (0Ah),
>> irrespective of whether the end-of-line marker was a CARRIAGE RETURN
>> (0Dh), a LINE FEED (0Ah), or both."
>> 
>> That means: If PoDoFo expands \n to a single code 0Ah and \r to 0Dh, they
>> lose the "REVERSE SOLIDUS" ("\") and become an end-of-line marker. Now,
>> if you read in such a PDF with the Adobe tools, they treat this
>> end-of-line marker as 0Ah. This is exactly the behaviour I have observed.
>> 
>> I am pretty sure that the output of PoDoFo is 

Re: [Podofo-users] Printing PdfStrings with escape sequences

2009-12-16 Thread A. Massad
Am 14.12.2009 um 15:55 schrieb Dominik Seichter:

> Thanks for the clarifications. I committed a fix to SVN; could you please 
> check that the behaviour is now correct for you?

Dominik, thank you for your fix! I have just tried rev. 1173 and it works fine 
for my examples :-)

Now, "\n", "\r" and all similar escape sequences are preserved by PdfString.

However, I am not quite sure what your code does with octal codes "\ddd":
For example, \015 which is equal to \r (Carriage Return 0Dh). This should be 
preserved as well - otherwise it would yield the same problem that it will be 
interpreted as 0Ah if escaping is expanded to a single byte.

Greetings,
Amin


Re: [Podofo-users] Printing PdfStrings with escape sequences

2009-12-12 Thread A . Massad
Am 29.11.2009 um 19:21 schrieb Dominik Seichter:

> Hi,
> 
> I do not see how this is a problem. It is true that PoDoFo writes 
> (Hello\nWorld) as (Hello
> World) into the PDF. But the PDF is read as sequence of bytes and the byte 
> for 
> the linebreak is still there. If I understand the PDF reference correctly, 
> the 
> behaviour of PoDoFo is correct. Escaping is optional and not required. 
> 
> Please correct me if I am wrong here!

Sorry for the late reply, it took me some time to investigate this issue: I 
came to the conclusion that your statement does not agree with the PDF spec. 
The following is a quotation from section "7.3.4.2 Literal Strings":

"An end-of-line marker appearing within a literal string without a preceding 
REVERSE SOLIDUS shall be treated as a byte value of (0Ah), irrespective of 
whether the end-of-line marker was a CARRIAGE RETURN (0Dh), a LINE FEED (0Ah), 
or both."

That means: If PoDoFo expands \n to a single code 0Ah and \r to 0Dh, they lose 
the "REVERSE SOLIDUS" ("\") and become an end-of-line marker. Now, if you read 
in such a PDF with the Adobe tools, they treat this end-of-line marker as 0Ah. 
This is exactly the behaviour I have observed.

I am pretty sure that the output of PoDoFo is wrong: Due to the expansion of 
the escape sequences of \r and \n, the hex codes 0Ah and 0Dh become 
indistinguishable for PDF readers. This might be OK if they just represent 
end-of-lines. However, due to Character Encodings with "Differences"-Mappings, 
the hex codes 0Dh and 0Ah might be mapped to different printable characters. In 
that case, the output of PoDoFo leads to serious errors!
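
As an illustration of the behaviour I would expect, here is a minimal sketch 
of a writer that keeps such bytes escaped. ToPdfLiteral is a hypothetical 
stand-alone function and not part of PoDoFo. It writes the named escapes for 
LF, CR, TAB, BS and FF, escapes parentheses and the backslash, and writes any 
other control byte as a three-digit octal code. That way 0Dh never appears 
as a raw end-of-line marker that a reader would normalize to 0Ah.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not a PoDoFo function): serializes raw string bytes
// as a PDF literal string and keeps control characters escaped.
std::string ToPdfLiteral(const std::string& raw)
{
    std::string out = "(";
    for (unsigned char c : raw) {
        switch (c) {
        case '\n': out += "\\n";  break; // Line feed (LF)
        case '\r': out += "\\r";  break; // Carriage return (CR)
        case '\t': out += "\\t";  break; // Horizontal tab
        case '\b': out += "\\b";  break; // Backspace
        case '\f': out += "\\f";  break; // Form feed
        case '(':  out += "\\(";  break;
        case ')':  out += "\\)";  break;
        case '\\': out += "\\\\"; break;
        default:
            if (c < 0x20 || c == 0x7F) {
                // Remaining control bytes: three-digit octal escape \ddd
                char buf[5];
                std::snprintf(buf, sizeof buf, "\\%03o",
                              static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
            break;
        }
    }
    out += ")";
    return out;
}
```

With this sketch, the string "Hello", LF, "World" is written as 
(Hello\nWorld), and a single byte 01h is written as (\001).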

Best regards,
Amin

> Am Montag 23 November 2009 schrieb A. Massad:
>> Hello,
>> 
>> Maybe this is a bug in PoDoFo - or just wrong usage of the library
>> functions:
>> 
>> Reading/parsing PDF-Files which contain strings with escape sequences, e.g.
>> (\r) or (\b), causes problems when writing these strings: the functions
>> PdfVariant::Write() and PdfString::Write() yield a strange output - that
>> is: (\r)
>> becomes
>> (
>> )
>> and
>> (\b)
>> becomes
>> )
>> respectively.
>> 
>> That means, the escape sequences are resolved in the output to a CR or
>> BACKSPACE instead of maintaining the escaping with the Backslash "\".
>> 
>> Due to this behavior, the written PDFs are corrupt (esp. due to malformed
>> syntax produced by the \b).
>> 
>> Is there a special function or flag to find a work-around for this
>> behavior? I could not fix the problem with PoDoFo functions but rather had to
>> use PdfVariant::ToString() and rewrite the std::string manually to hex
>> codes...
>> 
>> Thank you for your help!
>> 
>> Greetings,
>> Amin
>> 
>> PS: The strange encodings with low code numbers occur in a PDF where a Font
>> Encoding remaps all present characters by a "Differences"-Mapping to codes
>> starting at 1 - i.e. the non-printable chars will be mapped to printable
>> characters by this encoding (cf. Pdf Spec Sect 9.6.6 Character Encoding).




[Podofo-users] Printing PdfStrings with escape sequences

2009-11-23 Thread A. Massad
Hello,

Maybe this is a bug in PoDoFo - or just wrong usage of the library functions:

Reading/parsing PDF files which contain strings with escape sequences, e.g. 
(\r) or (\b), causes problems when writing these strings: the functions 
PdfVariant::Write() and PdfString::Write() yield a strange output - that is:
(\r)
becomes
(
)
and
(\b)
becomes
)
respectively.

That means, the escape sequences are resolved in the output to a CR or 
BACKSPACE instead of maintaining the escaping with the Backslash "\".

Due to this behavior, the written PDFs are corrupt (esp. due to malformed 
syntax produced by the \b).

Is there a special function or flag to find a work-around for this behavior? I 
could not fix the problem with PoDoFo functions but rather had to use 
PdfVariant::ToString() and rewrite the std::string manually to hex codes...

Thank you for your help!

Greetings,
Amin

PS: The strange encodings with low code numbers occur in a PDF where a Font 
Encoding remaps all present characters by a "Differences"-Mapping to codes 
starting at 1 - i.e. the non-printable chars will be mapped to printable 
characters by this encoding (cf. Pdf Spec Sect 9.6.6 Character Encoding).


Re: [Podofo-users] PdfContentsTokenizer position is reset with multiple streams

2009-08-26 Thread A. Massad
Hi Mike,

If you change the behavior of PdfContentsTokenizer::GetNextToken() to  
span across streams, could you please provide a flag to toggle this  
behavior? For some users (like me) it might be important to change  
back to the "old" behavior which DOES NOT span across streams.

I have got an application which parses through streams and replaces  
the content of each single stream without changing the overall  
structure of the streams. I think that this wouldn't be possible any  
longer if PdfContentsTokenizer::GetNextToken() did not detect stream  
boundaries anymore.
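
To illustrate what I mean by such a flag, here is a toy sketch. 
ToyContentsTokenizer is a hypothetical class that splits on whitespace only; 
it is neither PoDoFo's PdfContentsTokenizer nor a proposal for its actual 
interface. With spanStreams set to true, GetNextToken() continues silently 
into the next stream. With false, it returns false at each stream boundary 
and the caller moves on explicitly with NextStream().

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy class for illustration; NOT PoDoFo's tokenizer.
class ToyContentsTokenizer {
public:
    ToyContentsTokenizer(std::vector<std::string> streams, bool spanStreams)
        : m_streams(std::move(streams)), m_span(spanStreams),
          m_index(0), m_pos(0) {}

    // Returns false when no further token is available: at the end of the
    // current stream (non-spanning mode) or of all streams (spanning mode).
    bool GetNextToken(std::string& token)
    {
        while (m_index < m_streams.size()) {
            const std::string& s = m_streams[m_index];
            while (m_pos < s.size() && IsSpace(s[m_pos]))
                ++m_pos;                       // skip leading whitespace
            if (m_pos < s.size()) {
                const std::size_t start = m_pos;
                while (m_pos < s.size() && !IsSpace(s[m_pos]))
                    ++m_pos;                   // consume one token
                token = s.substr(start, m_pos - start);
                return true;
            }
            if (!m_span)
                return false;                  // stop at the stream boundary
            ++m_index;                         // spanning: go to next stream
            m_pos = 0;
        }
        return false;
    }

    // Moves to the next stream; returns false if there is none.
    bool NextStream()
    {
        if (m_index + 1 >= m_streams.size())
            return false;
        ++m_index;
        m_pos = 0;
        return true;
    }

    // Index of the stream currently being read.
    std::size_t CurrentStream() const { return m_index; }

private:
    static bool IsSpace(char c)
    {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    std::vector<std::string> m_streams;
    bool        m_span;
    std::size_t m_index;
    std::size_t m_pos;
};
```

In non-spanning mode the caller always knows which stream a token came from, 
which is what an application that rewrites each stream separately needs.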

Thanks in advance!

Greetings,
Amin

On 26.08.2009, at 17:17, Mike Slegeir wrote:

> I've discovered another related issue.  PdfTokenizer is unable to  
> reach into the next content stream in order to get a token.  So any  
> objects which are split across Contents have an UnexpectedEOF  
> raised.  My suggested solution to the problem is to either  
> concatenate all the Content streams before doing any tokenization or  
> to make PdfTokenizer::GetNextToken virtual and move the stream  
> switching logic into PdfContentsTokenizer::GetNextToken such that it  
> will try the parents version, attempt to move to the next stream (if  
> it exists) on failure, then retry.  Attached is a very basic example  
> of an array split between two streams.
>
> - Mike Slegeir
>
> Mike Slegeir wrote:
>> I've resolved this issue in an admittedly hacky way.  This may be  
>> sufficient for this problem though.  Attached is a patch which  
>> fixes the issue.  I've only done limited testing, but it does at  
>> least correct the issue.
>>
>> - Mike Slegeir
>>
>>
>>> When using PdfContentsTokenizer with a PDF with an array for  
>>> Contents
>>> rather than a single stream, the tokenizer will reset its position  
>>> to
>>> the beginning of the first stream upon exhausting a stream. An  
>>> Contents
>>> array with contents X Y Z will appear as X X Y X Y Z to a user of  
>>> the
>>> PdfContentsTokenizer. Attached is a PDF which has a Contents  
>>> array. I
>>> can provide example code and output if necessary.
>>>




[Podofo-users] PdfContentsTokenizer stopped working

2009-08-13 Thread A. Massad

Hi,

PdfContentsTokenizer in recent PoDoFo versions (e.g. rev 1132) stopped  
working. I have created a minimal source which shows the problem: The  
ReadNext() function immediately returns FALSE and the while-loop  
creates no output at all. Up to rev. 1069 the output was as expected:

v v k
v v v v k
v k

Here is the source to reproduce the behavior:

#include "podofo.h"
#include <iostream>
#include <string>

int main(int argc, char **argv) {
  std::string str=
    "/Layer /MC0 BDC\n"   // variant variant keyword
    "0.543 0 0.937 0 k\n" // variant variant variant variant keyword
    "/GS0 gs\n";          // variant keyword
  const char *buf=str.c_str();
  // for older lib-versions use pdf_long instead of PoDoFo::pdf_long
  const PoDoFo::pdf_long buflen=str.size();

  PoDoFo::PdfContentsTokenizer tokenizer(buf, buflen);
  PoDoFo::EPdfContentsType type;
  const char* keyword=NULL;
  PoDoFo::PdfVariant variant;

  // print "k" for each keyword and "v" for each variant
  while (tokenizer.ReadNext(type, keyword, variant)) {
    switch (type) {
    case PoDoFo::ePdfContentsType_Keyword:
      std::clog << "k" << std::endl;
      break;

    case PoDoFo::ePdfContentsType_Variant:
      std::clog << "v ";
      break;
    }
  }

  return(0);
}

What is going wrong? Unfortunately, I cannot find the exact svn-revision 
where it stopped working:

- as mentioned before, it worked with 1069
- between rev. 1070 and 1110 the library does not compile on my system 
(Mac OS X)
- rev. 1120 builds fine and already produces no output

Thank you very much,
Amin

PS: Strangely, the use of the other constructor  
PdfContentsTokenizer( PdfCanvas* pCanvas ) seems to work: It is called  
by TextExtractor::ExtractText in tools/podofotxtextract. And this tool  
produces output on my system.



[Podofo-users] Podofobrowser treats Real as Integer

2009-03-05 Thread A . Massad
Hi,

I have the following problem with podofobrowser (SVN Rev. 968):

Real numbers are treated like Integers (the fraction is always .0).
Still, the type in the tree is correctly depicted as Real.
Maybe the problem only occurs in conjunction with arrays.

Example:

If you open a file input.pdf which contains this data...

7 0 obj
<<
/Type /Page
/ArtBox [ 27.673200 27.673200 566.256000 651.295000 ]
/BleedBox [ 13.50 13.50 580.429000 665.469000 ]
/Contents 54 0 R
/CropBox [ 0.00 0.00 593.929000 678.969000 ]
/MediaBox [ 0.00 0.00 593.929000 678.969000 ]

... then saving this file (without editing) to output.pdf yields (the  
same data as displayed in the browser):

7 0 obj
<<
/Type /Page
/ArtBox [ 27.00 27.00 566.00 651.00 ]
/BleedBox [ 13.00 13.00 580.00 665.00 ]
/Contents 54 0 R
/CropBox [ 0.00 0.00 593.00 678.00 ]
/MediaBox [ 0.00 0.00 593.00 678.00 ]


Does anyone know how to fix this behaviour? Is it a problem of 
podofobrowser itself or of the included PoDoFo version (in 
externals/required_libpodofo)?

Thank you very much in advance!

Regards,
Amin
