[Maposmatic-dev] Size of PDF when splitting a PDF surface

Thomas Petazzoni Wed, 18 Aug 2010 01:30:26 -0700

Hello,

I am one of the developers of MapOSMatic (http://www.maposmatic.org), a
Web service that generates printable maps and street indexes using
OpenStreetMap data. We heavily use the Cairo backend of Mapnik to
generate our PDF, SVG and PNG maps, and we also use Cairo to draw and
render the street index.


In the new version of MapOSMatic we're developing, we are implementing
a "booklet" rendering mode: instead of rendering the city map into
a single, large PDF file that is hard to print on common printers, we
will split the map across several (A5, A4, etc.) pages. To do this, we
ask Mapnik to render the full map into a single large Cairo PDFSurface,
and then create another Cairo PDFSurface of the destination size (A5,
A4, etc.), on which we use Context.set_source_surface(),
Context.rectangle() and Context.fill() to render a part of the
original Cairo PDFSurface on each page.
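
As a rough sketch of the page layout described above, the source offsets
passed to Context.set_source_surface() for each page can be computed as
follows. The helper name page_offsets() is hypothetical (it is not part
of Cairo or MapOSMatic), and the sketch assumes pages are laid out on a
regular grid with all dimensions in points.

```python
import math

def page_offsets(full_w, full_h, page_w, page_h):
    """Return one (x, y) source offset per page, column by column.

    Hypothetical helper: the offsets are negative because the large
    surface is shifted up and left so that the wanted part falls
    inside the page rectangle.
    """
    cols = int(math.ceil(full_w / float(page_w)))
    rows = int(math.ceil(full_h / float(page_h)))
    offsets = []
    for i in range(cols):
        for j in range(rows):
            offsets.append((-i * page_w, -j * page_h))
    return offsets

# An 8*72 x 8*72 surface split into 2*72 x 2*72 pages gives 16 pages.
offsets = page_offsets(8 * 72, 8 * 72, 2 * 72, 2 * 72)
print(len(offsets))
```

Each offset would then be used with set_source_surface(), rectangle(),
fill() and show_page(), in that order, for one page.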

It works well. However, we are facing a size problem in the resulting
PDF file. If the original large map takes, say, 3 MB, and we split it
across 16 pages, the final PDF takes 3 * 16 = 48 MB. If the original
large map takes 5 MB, and we split it across 48 pages, the final PDF
takes 240 MB. It appears that the full contents of the original PDF get
replicated on every page of the resulting PDF, even though each page
only displays a small part of the original PDF.

To highlight this problem, I've created a simple test case in which
I've replaced the complicated Mapnik rendering by some simple Cairo
drawings. The test case (attached, using Python Cairo) does the
following:

 * creates an 8*72 x 8*72 PDF surface in which we draw some stuff,
   renders it to a PDF file and then displays its size

 * creates the same 8*72 x 8*72 PDF surface with the same contents, and
   then a 2*72 x 2*72 PDF surface on which, on each page, we render
   1/16th of the original surface. We then display the size of this
   final PDF file. This file always turns out to be about 16 times
   bigger than the original PDF.

A "complexity" argument makes the initial drawing more complex, which
increases the size of the original PDF surface. Whatever the
"complexity" is, the final PDF is always about 16 times bigger
than the original PDF surface. Some numbers:

 Complexity 20, original PDF  65209 bytes, final PDF 1050035 bytes
 Complexity 30, original PDF  97072 bytes, final PDF 1559843 bytes
 Complexity 50, original PDF 160927 bytes, final PDF 2581507 bytes

Of course, those final PDF sizes are acceptable, but in our real
application (MapOSMatic), we get PDFs of up to 200 MB.
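
As a quick check of the numbers reported above, the size ratios can be
recomputed with a few lines of Python. This is only a sketch over the
three measurements listed in this message; the variable names are
illustrative.

```python
# (complexity, original PDF size, final PDF size), sizes in bytes,
# copied from the measurements listed in this message.
measurements = [
    (20, 65209, 1050035),
    (30, 97072, 1559843),
    (50, 160927, 2581507),
]

ratios = {}
for complexity, orig_size, final_size in measurements:
    ratios[complexity] = float(final_size) / orig_size
    # Whatever is left after 16 full copies of the original file.
    overhead = final_size - 16 * orig_size
    print("Complexity %d: ratio %.2f, overhead %d bytes"
          % (complexity, ratios[complexity], overhead))
```

In all three cases the final file is 16 times the original plus a few
kilobytes, which is consistent with one full copy of the original
contents per page.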

Is our way of extracting parts of a surface into another surface
incorrect? Is there a way to make sure that Cairo includes the contents
of the original surface only once in the final PDF?

Thanks for your feedback!

Thomas
-- 
Thomas Petazzoni                         http://thomas.enix.org
MapOSMatic                               http://www.maposmatic.org

#!/usr/bin/python

import cairo
import os
import sys

# Auxiliary function to draw some crap inside a surface
def draw_stuff(ctx, it):
    for l in range(0, 8):
        for i in range(0, 8):
            for j in range(0, 8):
                ctx.set_source_rgb(i / 10., j / 10., 0.3 * it / 500.)
                ctx.move_to(i * 72 + j * (72 / 8), l * 72 - it / 50.)
                ctx.line_to((i + 1) * 72 + j * (72 / 8), (l + 1) * 72 + it / 50.)
                ctx.set_line_width(4)
                ctx.stroke()

# Draw some crap inside a surface.
def draw_my_complex_stuff(ctx, complexity):
    ctx.set_source_rgb (0.3, 0.3, 0.3)
    ctx.paint()

    for i in range(0, complexity):
        draw_stuff(ctx, i)

if len(sys.argv) != 2:
    print("error: need complexity as argument")
    sys.exit(1)

complexity = int(sys.argv[1])

# First step: draw a single PDF file with the random crap on a (8*72)
# x (8*72) PDF surface
orig = cairo.PDFSurface("original.pdf", 8 * 72, 8 * 72)
ctx = cairo.Context(orig)
draw_my_complex_stuff(ctx, complexity)
orig.finish()

orig_size = os.stat("original.pdf").st_size

# Now, redraw the same thing in an anonymous PDF surface
orig = cairo.PDFSurface(None, 8 * 72, 8 * 72)
ctx = cairo.Context(orig)
draw_my_complex_stuff(ctx, complexity)

# Create a new surface whose width and height are a quarter of the
# original PDF surface. The goal is to display 1/16th of the original
# PDF surface into each page of the new surface.

new = cairo.PDFSurface("splitted.pdf", 2 * 72, 2 * 72)
ctx = cairo.Context(new)

for i in range(0, 4):
    for j in range(0, 4):
        # Each page is a part of the original PDF surface
        x_in_orig = - (i * 2 * 72)
        y_in_orig = - (j * 2 * 72)
        ctx.set_source_surface(orig, x_in_orig, y_in_orig)
        ctx.rectangle(0, 0, 2 * 72, 2 * 72)
        ctx.fill()
        ctx.show_page()

new.finish()

splitted_size = os.stat("splitted.pdf").st_size

# Show the size of the original file
print("Orig surface size once rendered in PDF is : %d bytes" %
      orig_size)

# Show the size of the 'splitted' file
print("Splitted surface size once rendered in PDF is : %d bytes" %
      splitted_size)

print("Size ratio: %f" % (float(splitted_size) / float(orig_size)))
