[jira] [Commented] (FOP-2937) [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are not immediately garbage collected leading to excessive memory usage.

2020-05-18 Thread Piyush Khandelwal (Jira)


[ 
https://issues.apache.org/jira/browse/FOP-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110107#comment-17110107
 ] 

Piyush Khandelwal commented on FOP-2937:


Apart from the change that I suggested in the description; I also changed 
entries HashMap in PDFDictionary class that is extended by PDFPage, 
PDFPageLabels to WeakHashMap - though this change in itself didn't bring much 
benefit but in combined to above some improvements was seen because of this 
well. 

Adding patch for that as well.

> [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are 
> not immediately garbage collected leading to excessive memory usage.
> 
>
> Key: FOP-2937
> URL: https://issues.apache.org/jira/browse/FOP-2937
> Project: FOP
>  Issue Type: Improvement
>Affects Versions: 2.3, 2.4
>Reporter: Piyush Khandelwal
>Priority: Major
> Attachments: pdfreference.patch
>
>
> PDFReference object holds a SoftReference of PDFObject (PDFPage, PDFLabel, 
> PDFName etc.).
> If we generate a huge PDF ; *I tried with a PDF having around 150 thousand 
> pages with 12 GB of RAM;* lots of these references linger around waiting for 
> the garbage collector to collect them. 
> But GC wont collect them as long as JVM is able to recover enough memory 
> without throwing out of memory.
> Here are few metadata from my testing for further understanding of the issue 
> - 
> Stats for generating 1 PDF - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 11.3GB
> Residual memory after forced GC: 9 GB
> The FO mainly contains tabular data with each pages sequence having max of 
> 500 rows.
> On analyzing the memory dump; found lots of reference for PDFPage, PDFName 
> etc.
> *Question - * Is there any specific reason for using SoftReference in 
> PDFReference class  instead of WeakReference.
> Testing by changing SoftReference  to WeakReference in PDFReference shows 
> following improvements without any issue in the generation whatsoever - 
> Stats for Generating 5 PDF in parallel - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 4GB
> Residual memory after forced GC: 300 MB
> So, by changing SoftReference to WeakReference, I was able to generate 5 PDF 
> having 150K pages in parallel with max  4GB Ram; without any generation 
> issues.
> You can clearly see the performance benefits of changing to WeakReference. 
> But as I dont understand the complete internal details of how FOP works, I 
> would like to understand  if we can target this change and if not what is the 
> reason behind using SoftReference?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FOP-2937) [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are not immediately garbage collected leading to excessive memory usage.

2020-05-18 Thread Chris Bowditch (Jira)


[ 
https://issues.apache.org/jira/browse/FOP-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110253#comment-17110253
 ] 

Chris Bowditch commented on FOP-2937:
-

I think we need to keep these references until entire PDF is generated because 
bookmarks or other links might refer back to objects on the first page. So 
clearing them at page 2 onwards would prevent that. We also need to 
de-duplicate resources and clearing information about earlier pages could also 
lead to larger file size with duplicated resources. I am therefore closing this 
suggestion but feel free to disprove my hypothesis and re-open if you can

> [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are 
> not immediately garbage collected leading to excessive memory usage.
> 
>
> Key: FOP-2937
> URL: https://issues.apache.org/jira/browse/FOP-2937
> Project: FOP
>  Issue Type: Improvement
>Affects Versions: 2.3, 2.4
>Reporter: Piyush Khandelwal
>Priority: Major
> Attachments: PDFDictionary.patch, pdfreference.patch
>
>
> PDFReference object holds a SoftReference of PDFObject (PDFPage, PDFLabel, 
> PDFName etc.).
> If we generate a huge PDF ; *I tried with a PDF having around 150 thousand 
> pages with 12 GB of RAM;* lots of these references linger around waiting for 
> the garbage collector to collect them. 
> But GC wont collect them as long as JVM is able to recover enough memory 
> without throwing out of memory.
> Here are few metadata from my testing for further understanding of the issue 
> - 
> Stats for generating 1 PDF - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 11.3GB
> Residual memory after forced GC: 9 GB
> The FO mainly contains tabular data with each pages sequence having max of 
> 500 rows.
> On analyzing the memory dump; found lots of reference for PDFPage, PDFName 
> etc.
> *Question - * Is there any specific reason for using SoftReference in 
> PDFReference class  instead of WeakReference.
> Testing by changing SoftReference  to WeakReference in PDFReference shows 
> following improvements without any issue in the generation whatsoever - 
> Stats for Generating 5 PDF in parallel - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 4GB
> Residual memory after forced GC: 300 MB
> So, by changing SoftReference to WeakReference, I was able to generate 5 PDF 
> having 150K pages in parallel with max  4GB Ram; without any generation 
> issues.
> You can clearly see the performance benefits of changing to WeakReference. 
> But as I dont understand the complete internal details of how FOP works, I 
> would like to understand  if we can target this change and if not what is the 
> reason behind using SoftReference?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FOP-2937) [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are not immediately garbage collected leading to excessive memory usage.

2020-05-18 Thread Piyush Khandelwal (Jira)


[ 
https://issues.apache.org/jira/browse/FOP-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110890#comment-17110890
 ] 

Piyush Khandelwal commented on FOP-2937:


[~cbowditch] My suggestion was not to clear the pages - it's to change the 
SoftReference to WeakReference. 
If the pages have bookmarks or other links/resources - then for sure the FOP 
will be holding references to it; and if the reference are held and reachable 
in program; JVM won't GC them.

Also, My test case included bookmarks; forward references - there were no 
change whatsoever.
Also there was no change in the size of the PDF

> [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are 
> not immediately garbage collected leading to excessive memory usage.
> 
>
> Key: FOP-2937
> URL: https://issues.apache.org/jira/browse/FOP-2937
> Project: FOP
>  Issue Type: Improvement
>Affects Versions: 2.3, 2.4
>Reporter: Piyush Khandelwal
>Priority: Major
> Attachments: PDFDictionary.patch, pdfreference.patch
>
>
> PDFReference object holds a SoftReference of PDFObject (PDFPage, PDFLabel, 
> PDFName etc.).
> If we generate a huge PDF ; *I tried with a PDF having around 150 thousand 
> pages with 12 GB of RAM;* lots of these references linger around waiting for 
> the garbage collector to collect them. 
> But GC wont collect them as long as JVM is able to recover enough memory 
> without throwing out of memory.
> Here are few metadata from my testing for further understanding of the issue 
> - 
> Stats for generating 1 PDF - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 11.3GB
> Residual memory after forced GC: 9 GB
> The FO mainly contains tabular data with each pages sequence having max of 
> 500 rows.
> On analyzing the memory dump; found lots of reference for PDFPage, PDFName 
> etc.
> *Question - * Is there any specific reason for using SoftReference in 
> PDFReference class  instead of WeakReference.
> Testing by changing SoftReference  to WeakReference in PDFReference shows 
> following improvements without any issue in the generation whatsoever - 
> Stats for Generating 5 PDF in parallel - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 4GB
> Residual memory after forced GC: 300 MB
> So, by changing SoftReference to WeakReference, I was able to generate 5 PDF 
> having 150K pages in parallel with max  4GB Ram; without any generation 
> issues.
> You can clearly see the performance benefits of changing to WeakReference. 
> But as I dont understand the complete internal details of how FOP works, I 
> would like to understand  if we can target this change and if not what is the 
> reason behind using SoftReference?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FOP-2937) [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are not immediately garbage collected leading to excessive memory usage.

2020-05-22 Thread Piyush Khandelwal (Jira)


[ 
https://issues.apache.org/jira/browse/FOP-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113892#comment-17113892
 ] 

Piyush Khandelwal commented on FOP-2937:


[~cbowditch] Is this being looked into?

> [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are 
> not immediately garbage collected leading to excessive memory usage.
> 
>
> Key: FOP-2937
> URL: https://issues.apache.org/jira/browse/FOP-2937
> Project: FOP
>  Issue Type: Improvement
>Affects Versions: 2.3, 2.4
>Reporter: Piyush Khandelwal
>Priority: Major
> Attachments: PDFDictionary.patch, pdfreference.patch
>
>
> PDFReference object holds a SoftReference of PDFObject (PDFPage, PDFLabel, 
> PDFName etc.).
> If we generate a huge PDF ; *I tried with a PDF having around 150 thousand 
> pages with 12 GB of RAM;* lots of these references linger around waiting for 
> the garbage collector to collect them. 
> But GC wont collect them as long as JVM is able to recover enough memory 
> without throwing out of memory.
> Here are few metadata from my testing for further understanding of the issue 
> - 
> Stats for generating 1 PDF - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 11.3GB
> Residual memory after forced GC: 9 GB
> The FO mainly contains tabular data with each pages sequence having max of 
> 500 rows.
> On analyzing the memory dump; found lots of reference for PDFPage, PDFName 
> etc.
> *Question - * Is there any specific reason for using SoftReference in 
> PDFReference class  instead of WeakReference.
> Testing by changing SoftReference  to WeakReference in PDFReference shows 
> following improvements without any issue in the generation whatsoever - 
> Stats for Generating 5 PDF in parallel - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 4GB
> Residual memory after forced GC: 300 MB
> So, by changing SoftReference to WeakReference, I was able to generate 5 PDF 
> having 150K pages in parallel with max  4GB Ram; without any generation 
> issues.
> You can clearly see the performance benefits of changing to WeakReference. 
> But as I dont understand the complete internal details of how FOP works, I 
> would like to understand  if we can target this change and if not what is the 
> reason behind using SoftReference?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FOP-2937) [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are not immediately garbage collected leading to excessive memory usage.

2023-06-06 Thread Peter Radomski (Jira)


[ 
https://issues.apache.org/jira/browse/FOP-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729746#comment-17729746
 ] 

Peter Radomski commented on FOP-2937:
-

we have the same issues with EOP 2.8 and large files. 

> [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are 
> not immediately garbage collected leading to excessive memory usage.
> 
>
> Key: FOP-2937
> URL: https://issues.apache.org/jira/browse/FOP-2937
> Project: FOP
>  Issue Type: Improvement
>Affects Versions: 2.3, 2.4
>Reporter: Piyush Khandelwal
>Priority: Major
> Attachments: PDFDictionary.patch, pdfreference.patch
>
>
> PDFReference object holds a SoftReference of PDFObject (PDFPage, PDFLabel, 
> PDFName etc.).
> If we generate a huge PDF ; *I tried with a PDF having around 150 thousand 
> pages with 12 GB of RAM;* lots of these references linger around waiting for 
> the garbage collector to collect them. 
> But GC wont collect them as long as JVM is able to recover enough memory 
> without throwing out of memory.
> Here are few metadata from my testing for further understanding of the issue 
> - 
> Stats for generating 1 PDF - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 11.3GB
> Residual memory after forced GC: 9 GB
> The FO mainly contains tabular data with each pages sequence having max of 
> 500 rows.
> On analyzing the memory dump; found lots of reference for PDFPage, PDFName 
> etc.
> *Question - * Is there any specific reason for using SoftReference in 
> PDFReference class  instead of WeakReference.
> Testing by changing SoftReference  to WeakReference in PDFReference shows 
> following improvements without any issue in the generation whatsoever - 
> Stats for Generating 5 PDF in parallel - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 4GB
> Residual memory after forced GC: 300 MB
> So, by changing SoftReference to WeakReference, I was able to generate 5 PDF 
> having 150K pages in parallel with max  4GB Ram; without any generation 
> issues.
> You can clearly see the performance benefits of changing to WeakReference. 
> But as I dont understand the complete internal details of how FOP works, I 
> would like to understand  if we can target this change and if not what is the 
> reason behind using SoftReference?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FOP-2937) [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are not immediately garbage collected leading to excessive memory usage.

2023-06-15 Thread Peter Radomski (Jira)


[ 
https://issues.apache.org/jira/browse/FOP-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733023#comment-17733023
 ] 

Peter Radomski commented on FOP-2937:
-

Also OutOfMemory exceptions in different classes. We can see, that the 
framework is using like more than 32 GB RAM memory to generate a 30 MB PDF.

> [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are 
> not immediately garbage collected leading to excessive memory usage.
> 
>
> Key: FOP-2937
> URL: https://issues.apache.org/jira/browse/FOP-2937
> Project: FOP
>  Issue Type: Improvement
>Affects Versions: 2.3, 2.4
>Reporter: Piyush Khandelwal
>Priority: Major
> Attachments: PDFDictionary.patch, pdfreference.patch
>
>
> PDFReference object holds a SoftReference of PDFObject (PDFPage, PDFLabel, 
> PDFName etc.).
> If we generate a huge PDF ; *I tried with a PDF having around 150 thousand 
> pages with 12 GB of RAM;* lots of these references linger around waiting for 
> the garbage collector to collect them. 
> But GC wont collect them as long as JVM is able to recover enough memory 
> without throwing out of memory.
> Here are few metadata from my testing for further understanding of the issue 
> - 
> Stats for generating 1 PDF - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 11.3GB
> Residual memory after forced GC: 9 GB
> The FO mainly contains tabular data with each pages sequence having max of 
> 500 rows.
> On analyzing the memory dump; found lots of reference for PDFPage, PDFName 
> etc.
> *Question - * Is there any specific reason for using SoftReference in 
> PDFReference class  instead of WeakReference.
> Testing by changing SoftReference  to WeakReference in PDFReference shows 
> following improvements without any issue in the generation whatsoever - 
> Stats for Generating 5 PDF in parallel - 
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory that reached while generation - 4GB
> Residual memory after forced GC: 300 MB
> So, by changing SoftReference to WeakReference, I was able to generate 5 PDF 
> having 150K pages in parallel with max  4GB Ram; without any generation 
> issues.
> You can clearly see the performance benefits of changing to WeakReference. 
> But as I dont understand the complete internal details of how FOP works, I 
> would like to understand  if we can target this change and if not what is the 
> reason behind using SoftReference?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)