Bug#287371: xsltproc: Probable memory leak (when using document()?)

2005-02-09 Thread Mike Hommey
On Mon, Dec 27, 2004 at 01:11:49PM +0100, Vincent Lefevre <[EMAIL PROTECTED]> 
wrote:
> Package: xsltproc
> Version: 1.1.8-5
> Severity: important
> 
> Here xsltproc takes up to 138 MB, making the whole system slow down
> due to swapping. This problem occurs when generating my blog page,
> where a document() is used for each blog item (this will change in
> the future, but the current behavior shouldn't occur). The sources
> are in a DocBook-based DTD that can be downloaded from
> 
>   http://www.vinc17.org/DTD/website.dtd
> 
> I'm not including the XML sources since this is quite complicated
> (lots of inclusions and dependencies). But if the bug is not known,
> I could try to build a simpler example.

How big is the document you load with document()? How many times does it
get loaded? Could you provide the files?

Mike


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#287371: xsltproc: Probable memory leak (when using document()?)

2005-02-09 Thread Vincent Lefevre
On 2005-02-09 17:12:21 +0100, Mike Hommey wrote:
> How big is the document you load with document()? How many times does it
> get loaded? Could you provide the files?

The documents are small, but the DTD is very big (this is a DTD based
on DocBook + MathML). Currently, about 50 documents are included.

I wanted to post a followup, but hadn't had the time yet. FYI, I had
a discussion with Daniel on the LibXSLT mailing-list 10 days ago. In
short, for some reason, the DTD structures are not reused each time
a new document is parsed. IMHO, this could be solved by some form of
cache (corresponding to the DTD + internal subset if any).

Technically, this bug could be regarded as a wishlist. But using so
much memory should be regarded as a bug IMHO, unless the other XSLT
processors have the same problem.

The title of the bug should be changed to something like "DTD
structures should be shared/cached in case of multiple inclusions"
(when possible, of course).

-- 
Vincent Lefèvre <[EMAIL PROTECTED]> - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / SPACES project at LORIA





Bug#287371: xsltproc: Probable memory leak (when using document()?)

2005-02-09 Thread Mike Hommey
retitle 287371 DTD should be cached when included several times
severity 287371 wishlist
tag 287371 upstream
thanks

On Wed, Feb 09, 2005 at 05:38:54PM +0100, Vincent Lefevre <[EMAIL PROTECTED]> 
wrote:
> On 2005-02-09 17:12:21 +0100, Mike Hommey wrote:
> > How big is the document you load with document()? How many times does it
> > get loaded? Could you provide the files?
> 
> The documents are small, but the DTD is very big (this is a DTD based
> on DocBook + MathML). Currently, about 50 documents are included.
> 
> I wanted to post a followup, but hadn't had the time yet. FYI, I had
> a discussion with Daniel on the LibXSLT mailing-list 10 days ago. In
> short, for some reason, the DTD structures are not reused each time
> a new document is parsed. IMHO, this could be solved by some form of
> cache (corresponding to the DTD + internal subset if any).
> 
> Technically, this bug could be regarded as a wishlist. But using so
> much memory should be regarded as a bug IMHO, unless the other XSLT
> processors have the same problem.
> 
> The title of the bug should be changed to something like "DTD
> structures should be shared/cached in case of multiple inclusions"
> (when possible, of course).

Thanks for the feedback.
Note that such "optimization" bugs are not really *that* important, so I
downgraded this bug to wishlist, even if a huge amount of memory is
used. Also note that 138MB is not *that* much considering the number of
documents and the DTD size.

Mike





Bug#287371: xsltproc: Probable memory leak (when using document()?)

2005-02-09 Thread Vincent Lefevre
On 2005-02-09 17:52:31 +0100, Mike Hommey wrote:
> retitle 287371 DTD should be cached when included several times

To be more accurate: it is the internal structure built from the DTD
(and internal subset) that should be cached, so that it can be reused
when the DTD and internal subset are the same. Processing a second
document would then take no additional memory for the DTD.
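
As an illustration of the caching idea described above, here is a minimal
Python sketch. It is not libxml2 or libxslt code: the names ParsedDtd,
DtdCache and fake_parse are hypothetical, and the parser is a stand-in.
The only point it shows is that the cache key combines the DTD identifier
with the internal subset, so identical pairs share one parsed structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class ParsedDtd:
    """Hypothetical stand-in for the in-memory structures built from a DTD."""
    system_id: str
    internal_subset: Optional[str]


@dataclass
class DtdCache:
    """Cache keyed by (DTD identifier, internal subset), as suggested above."""
    parse: Callable[[str, Optional[str]], ParsedDtd]
    _entries: Dict[Tuple[str, Optional[str]], ParsedDtd] = field(default_factory=dict)
    parse_count: int = 0

    def get(self, system_id: str, internal_subset: Optional[str] = None) -> ParsedDtd:
        key = (system_id, internal_subset)
        if key not in self._entries:
            # Only the first document with this DTD + internal subset pays
            # the cost of building the structures.
            self._entries[key] = self.parse(system_id, internal_subset)
            self.parse_count += 1
        return self._entries[key]


def fake_parse(system_id: str, internal_subset: Optional[str]) -> ParsedDtd:
    # Assumption: a real parser would build the (large) DTD structures here.
    return ParsedDtd(system_id, internal_subset)


cache = DtdCache(parse=fake_parse)
dtd_url = "http://www.vinc17.org/DTD/website.dtd"

# About 50 documents referencing the same DTD, with no internal subset.
shared = [cache.get(dtd_url) for _ in range(50)]

# A document with its own internal subset gets a separate entry.
custom = cache.get(dtd_url, "<!ENTITY foo 'bar'>")

print(cache.parse_count)
```

In this sketch the 50 lookups with the same key return the same object,
and only the document with a different internal subset triggers a second
parse.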

> Note that such "optimization" bugs are not really *that* important,

Well, it is important on machines that don't have enough memory.

> so I downgraded this bug to wishlist, even if a huge amount of
> memory is used. Also note that 138MB is not *that* much considering
> the number of documents and the DTD size.

By caching the DTD structures, one could gain something like a
factor of 1000 in asymptotic memory usage with small documents
(3 KB vs 3 MB for the DTD itself). This is quite significant.
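
The estimate above can be checked with a short back-of-the-envelope
calculation. This Python sketch assumes that "3 KB" is the in-memory size
of one small document, that "3 MB" is the size of the DTD structures, and
that 50 documents are loaded (the figure given earlier in the thread). It
uses decimal units. The function total_memory is hypothetical and models
nothing beyond these numbers.

```python
KB = 1_000
MB = 1_000 * KB

dtd_structures = 3 * MB   # "3 MB for the DTD itself"
small_document = 3 * KB   # assumed size of one small document
documents = 50            # "about 50 documents are included"


def total_memory(n: int, cached: bool) -> int:
    """Bytes used for n documents, with or without sharing the DTD structures."""
    dtd_copies = 1 if cached else n
    return dtd_copies * dtd_structures + n * small_document


uncached = total_memory(documents, cached=False)
cached = total_memory(documents, cached=True)

print(f"uncached: {uncached / MB:.2f} MB")  # uncached: 150.15 MB
print(f"cached:   {cached / MB:.2f} MB")    # cached:   3.15 MB

# Cost of each additional document, without and with the cache.
per_doc_ratio = (dtd_structures + small_document) / small_document
print(f"asymptotic ratio: {per_doc_ratio:.0f}")  # asymptotic ratio: 1001
```

Under these assumptions the uncached total (about 150 MB) is of the same
order as the 138 MB reported, and the per-document cost ratio is roughly
the factor of 1000 mentioned above.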






Bug#287371: xsltproc: Probable memory leak (when using document()?)

2005-02-09 Thread Mike Hommey
On Thu, Feb 10, 2005 at 12:44:20AM +0100, Vincent Lefevre <[EMAIL PROTECTED]> 
wrote:
> > Note that such "optimization" bugs are not really *that* important,
> 
> Well, it is important on machines that don't have enough memory.

Machines that don't have enough memory can't run OpenOffice.Org. Will
you file an important bug there as well?

Mike





Bug#287371: xsltproc: Probable memory leak (when using document()?)

2005-02-09 Thread Vincent Lefevre
On 2005-02-10 01:29:38 +0100, Mike Hommey wrote:
> Machines that don't have enough memory can't run OpenOffice.Org. Will
> you file an important bug there as well?

No, because OpenOffice.Org doesn't waste memory (it's quite memory
hungry, but this is expected, as it's complex software). With
xsltproc, if one considers the sum of the sizes of all source data,
the required memory for the processing may be something like 1000
times larger, without any theoretical reason.






Bug#287371: [xml/sgml-pkgs] Bug#287371: xsltproc: Probable memory leak (when using document()?)

2005-01-10 Thread Vincent Lefevre
On 2004-12-31 14:15:42 +0900, Mike Hommey wrote:
> On Fri, Dec 31, 2004 at 02:40:54AM +0100, Vincent Lefevre <[EMAIL PROTECTED]> 
> wrote:
> > On 2004-12-30 14:05:06 +0900, Mike Hommey wrote:
> > > Can you try with xsltproc from the experimental distribution? I know
> > > several memleaks have been fixed there and in libxml2.
> > 
> > Unfortunately, there's no package for PowerPC yet.
> 
> Can't you try to build it?

I could try on an x86 machine where I've installed the experimental
libxml2 package (version 2.6.16-1). The problem is still there.


