Hi Michal, Jozef and Martin,

For a very long time, I have wanted a program that would automatically resize and rescale a PDF page so that it fits on a standard-size sheet of paper. After many bad hacks and after receiving a lot of help, I finally have something that works. (See attached).

I created a Perl script (called "pdfcrop") that:

   *   finds the borders of the PDF's bounding box (using ghostscript),
   *   determines the page orientation (by "grepping" the PDF file),
   *   calculates the optimal page metrics and scale factor, and
   *   calls PDFedit to resize and rescale the pages.
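
The first three steps above can be sketched in a few lines of Perl. This is only an illustration, not the attached script: the bounding box line, the MediaBox line and the letter-size target are made-up sample values, and "fit_scale" is a hypothetical helper name.

```perl
#!/usr/bin/env perl
use strict ;
use warnings ;

## sample inputs (made up for illustration):
## a line in the style printed by ghostscript's bbox device, and a MediaBox entry
my $bbox_line  = "%%HiResBoundingBox: 100.000000 200.000000 400.000000 600.000000" ;
my $media_line = "/MediaBox [0 0 612 792]" ;

## hypothetical helper: the largest scale at which the box still fits on the page
sub fit_scale {
    my ( $hl , $vb , $hr , $vt , $pgwidth , $pgheight ) = @_ ;
    my $horiz = $pgwidth  / ( $hr - $hl ) ;
    my $vert  = $pgheight / ( $vt - $vb ) ;
    return ( $horiz < $vert ) ? $horiz : $vert ;
}

## step 1: borders of the bounding box (left, bottom, right, top)
my ( $hl , $vb , $hr , $vt ) =
    $bbox_line =~ /HiResBoundingBox: (\S+) (\S+) (\S+) (\S+)/ ;

## step 2: orientation, taken from the MediaBox rather than the bounding box,
## so that a "short portrait page" is not mistaken for landscape
my ( $ml , $mb , $mr , $mt ) =
    $media_line =~ /\[\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*\]/ ;
my $orientation = ( $mr - $ml > $mt - $mb ) ? "landscape" : "portrait" ;

## step 3: scale factor for letter size paper (612 by 792 points)
my ( $pgwidth , $pgheight ) =
    ( $orientation eq "portrait" ) ? ( 612 , 792 ) : ( 792 , 612 ) ;
my $scale = fit_scale( $hl , $vb , $hr , $vt , $pgwidth , $pgheight ) ;

print "$orientation $scale\n" ;
```

The attached script then divides this scale by 1.10 to leave white margins and passes the resulting crop box and scale to PDFedit.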

The script is less than ideal, but it works.

Could you help me create a pure PDFedit implementation of this script? There's no good reason for using Perl and there's no good reason why my script should use ghostscript and grep to obtain the bounding box and page orientation. Everything should be done entirely within PDFedit.

If I knew the names of the PDFedit functions that obtain the bounding box and media box of each page, I could create a pure PDFedit implementation.

Could you tell me the names of those functions?

Thanks,
- Eric

#!/usr/bin/env perl

##    PDFCrop  version 0.3
##    -------  ------- ---
## 
##  Copyright 2011 -- Eric Doviak <[email protected]>
##  Copyright 2011 -- Tomas Janousek <[email protected]>
## 
##  Eric Doviak wrote most of this Perl script, but Tomas Janousek wrote 
##  the template for the PDFedit script. Eric Doviak's work is copyrighted 
##  under the terms of the GPL v3. Janousek's work is copyrighted under 
##  the terms of the BSD license. The BSD licensed portions are denoted 
##  in the text of this script and the PDFedit script that this script
##  generates.


##  Syntax:   pdfcrop input.pdf [papersize] [output.pdf]
##  
##  Function: 
##    calculates the page metrics and scale factor of a PDF file, 
##    then crops and scales the PDF, so that it neatly fits on a 
##    standard size sheet of paper
##  
##  Options:
##    -h  --help  help      print usage and exit
##  
##    letter  Letter        if you want letter size paper
##    a4      A4            if you want A4 size paper 
##    legal   Legal         if you want legal size paper 
##  
##  Details:
##    pdfcrop attempts to determine your desired paper size by
##    reading /etc/papersize. If that file cannot be read, it 
##    assumes that you want letter size pages.
##  
##    If an output file name is not given, then pdfcrop crops and
##    scales the input file in place (the input file is overwritten).
##  
##  Examples:
##    pdfcrop input.pdf
##    pdfcrop input.pdf A4 output.pdf


use strict ; 
use warnings ; 
no warnings qw( uninitialized ) ; 

## read arguments
my ( $infile , $pagesize , $otfile ) = read_arguments( @ARGV ) ; 

## create a randomly named directory in /tmp/
my $random = sprintf( "%04d" , int( 10000*rand() ) )  ; 
my $tmpdir = "/tmp/pdfcrop_$random/" ; 
mkdir( $tmpdir ) || die "could not create $tmpdir" ; 

## if a different output file is desired, then create it
## all scripts will then operate on that output file, 
## so reset the input file name
if ( $infile ne $otfile ) {
    use File::Copy ; 
    copy( $infile , $otfile ) || die "could not copy $infile to $otfile" ; 
    $infile = $otfile ; 
}

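## names of script files
## use only the base name, so that an input like "dir/input.pdf" still lands in $tmpdir
use File::Basename ; 
my $basename = basename( $infile ) ; 
my $gsfile = $tmpdir . $basename . ".gs.out" ; 
my $gpfile = $tmpdir . $basename . ".gp.out" ; 
my $qsfile = $tmpdir . $basename . ".fix.qs" ; 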

## run ghostscript
open( BASH , "|/bin/bash") || die "could not open BASH" ; 
print BASH "gs -dSAFER -dNOPAUSE -dBATCH -q -r72 -sDEVICE=bbox -f $infile 2> $gsfile" ; 
close BASH ; 

## we also want to get the original orientation of each page 
## so that we do not accidentally convert a "short portrait page" to landscape
open( BASH , "|/bin/bash") || die "could not open BASH" ; 
print BASH "grep --binary-files=text MediaBox $infile > $gpfile" ; 
close BASH ; 

## keep the orientations in an array
## fortunately, edits are tacked onto the end, so the first <number of pages>
## elements will contain the "orientation information"
my @page_orients = get_orientation( $gpfile ) ; 

## print the prologue
open( OVERWRITE , ">$qsfile" ) || die "could not overwrite $qsfile" ; 
print OVERWRITE make_prologue() ; 

## initialize page numbers
my $pagenumber = 1 ; 

## read the ghostscript file and create the script for "pdfedit"
open( GSFILE , $gsfile ) || die "could not read $gsfile" ; 
while (<GSFILE>) {

    chomp ; 
    my $line = $_ ; 

    if ( $line =~ /HiResBoundingBox: (\S+) (\S+) (\S+) (\S+)/ ) {

        ## what are the borders? 
        my $boxborders = "$1 $2 $3 $4" ; 
        
        my @borders = get_borders( $boxborders ) ;
        my ( $hl , $vb , $hr , $vt ) = @borders ; 
        
        ## check for correctness
        should_quit( @borders ) ; 

        ## get page size, orientation etc. 
        my @size_orient = get_size( @borders , $page_orients[$pagenumber-1] , $pagesize ) ;

        ## get the new page metrics
        my ( $nhl , $nvb , $nhr , $nvt , $scale ) = get_new_metrics( @borders , @size_orient ) ; 

        ## print it all out
        print OVERWRITE "setCrop( doc, $pagenumber, $nhl, $nvb, $nhr, $nvt );\n";
        print OVERWRITE "setScale( doc, $pagenumber, $scale, $scale );\n\n";

        ## increment the page number
        $pagenumber++ ; 
    }
}
close GSFILE ; 

## print the epilogue
print OVERWRITE make_epilogue() ; 
close OVERWRITE ; 

## crop and resize the pages
open( BASH , "|/bin/bash") || die "could not open BASH" ; 
print BASH "pdfedit -console -run $qsfile $infile 2> /dev/null" ; 
close BASH ; 

## get rid of junk
unlink( $gsfile , $gpfile , $qsfile ) ; 
rmdir( $tmpdir ) ; 


##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  
  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  
##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  

## SUBROUTINES
## ===========

## read arguments

sub read_arguments {

    my @args = @_ ; 

    ## do you need help?
    if ( ! @args) {
        print_usage() ; 
    }

    foreach my $arg (@args) {
        if ( $arg =~ /^-h$|^--help$|^help$/ ) {
            print_usage() ; 
        }
    }

    ## which arguments are PDF files? which are not
    my @pdffiles ; 
    my $pagesize ;

    foreach my $arg (@args) {
        if ( $arg =~ /\.pdf$/i ) {
            push( @pdffiles , $arg ) ; 

        } elsif ( $arg =~ /^letter$|^legal$|^a4$/i ) {
            $pagesize = lc( $arg ) ; 
        }
    }

    ## first PDF is assumed to be the input file
    my $infile = $pdffiles[0] ; 
    my $otfile = ( ! $pdffiles[1] ) ? $infile : $pdffiles[1] ;

    ## if $infile is not a file, then there's an error
    if ( ! -f $infile ) {
        print "\n\tError! $infile not found.\n\n";
        print_usage() ; 
    }
    
    ## what is desired papersize? -- attempt to read "/etc/papersize"
    my $etc_papersize ; 
    if ( -f "/etc/papersize" ) {
        open( PAPERSIZE , "/etc/papersize" ) ; 
        chomp( $etc_papersize = <PAPERSIZE> ) ; 
        $etc_papersize = lc( $etc_papersize ) ; 
        close PAPERSIZE ; 
    } 

    ## if $pagesize not found, then check /etc/papersize
    ## if still not found, then assume "letter"    
    ## note: "papersize" vs. "pagesize" ... <roll eyes>
    if ( ! $pagesize ) {
        $pagesize = ( $etc_papersize !~ /^letter$|^legal$|^a4$/ ) ? "letter" : $etc_papersize ; 
    }
    
    my @otarray = ( $infile , $pagesize , $otfile ) ; 
    return @otarray ; 
}

## we need to get the input for the borders 
## it's in the form: HL VB HR VT

sub get_borders {

    my $line = $_[0] ; 
    $line =~ s/ {1,}/ /g ; 
    $line =~ s/^ // ; 
    $line =~ s/ $// ; 

    my @args = split( / / , $line ) ; 
    
    my $hl = $args[0] ; ##  HL is Horizontal Left 
    my $vb = $args[1] ; ##  VB is Vertical   Bottom 
    my $hr = $args[2] ; ##  HR is Horizontal Right
    my $vt = $args[3] ; ##  VT is Vertical   Top    

    my @otarray = ( $hl , $vb , $hr , $vt ) ; 
    return @otarray ; 
}

## what's the orientation?
## if horizontal width longer than vertical, then landscape
## otherwise assume portrait

sub get_orientation {

    my $infile = $_[0] ; 

    ## fortunately, edits are tacked onto the end, so the first <number of pages>
    ## elements of an array will contain the "orientation information"

    my @page_orients ; 
    
    open( GPFILE , $infile ) || die "could not read $infile" ; 
    while (<GPFILE>) {

        chomp ; 
        
        my $line = $_ ; 
        $line =~ s/^.*\[// ; 
        $line =~ s/\].*$// ; 
        $line =~ s/ {1,}/ /g ; 
        $line =~ s/^ // ; 
        $line =~ s/ $// ; 
        
        my ( $hl , $vb , $hr , $vt ) = split( / / , $line ) ;
        my $orientation = ( $hr - $hl > $vt - $vb ) ? "landscape" : "portrait" ; 

        push( @page_orients , $orientation ) ; 
        
    }
    close GPFILE ; 

    return @page_orients ; 
}


## we also need the dimensions of the desired paper size (i.e. "A4" or "letter")
## in the orientation of the page
## all dimensions are in points (72 points per inch)

sub get_size {

    ## what are the borders? 
    my ( $hl , $vb , $hr , $vt , $orientation , $pagesize ) = @_[0..5] ;
        
    ## there are 72 points per inch, therefore
    ## letter size paper is 612 by 792 points = 8.5 by 11 inches
    my $pgwidth ; 
    my $pgheight ; 

    ## set page size in points 
    if ( $orientation eq "portrait" && $pagesize eq "letter" ) { 
        $pgwidth  =  612 ; 
        $pgheight =  792 ;

    } elsif ( $orientation eq "landscape" && $pagesize eq "letter" ) { 
        $pgwidth  =  792 ; 
        $pgheight =  612 ;

    } elsif ( $orientation eq "portrait" && $pagesize eq "legal" ) { 
        $pgwidth  =  612 ; 
        $pgheight = 1008 ;

    } elsif ( $orientation eq "landscape" && $pagesize eq "legal" ) { 
        $pgwidth  = 1008 ; 
        $pgheight =  612 ;

    } elsif ( $orientation eq "portrait" && $pagesize eq "a4" ) { 
        $pgwidth  = 21.0 * 28.3464567 ; 
        $pgheight = 29.7 * 28.3464567 ;

    } elsif ( $orientation eq "landscape" && $pagesize eq "a4" ) { 
        $pgwidth   = 29.7 * 28.3464567 ;
        $pgheight  = 21.0 * 28.3464567 ; 

    } else {
        die "\n\tSorry. I do not understand the desired page size and orientation.\n\n" ;
    }

    my @otarray = ($pagesize , $orientation , $pgwidth , $pgheight ) ; 
    return @otarray ; 
}


## define function to get the proper scale factor 
sub getscale { 

    my ($hl , $vb , $hr , $vt , $pagesize , $orientation , $pgwidth , $pgheight ) = @_[0..7] ; 

    my $horiz = $pgwidth  / ($hr-$hl) ;
    my $vert  = $pgheight / ($vt-$vb) ;
    
    if ( $horiz < $vert ) { 
        return $horiz ; 
    } else {
        return $vert  ;
    }
}


## define a function that gets new page metrics and converts 
## to the units of measure 

sub get_new_metrics {

    ## arguments
    my @inarray = @_[0..7] ; 
    my ($hl , $vb , $hr , $vt , $pagesize , $orientation , $pgwidth , $pgheight ) = @inarray ; 

    ## dividing the scale by 1.10 leaves roughly 5 percent white margin on each side
    ## we should make this an argument in the future
    my $margins = 1.10 ; 
    my $scale = round( getscale( @inarray ) / $margins , 6 ) ; 

    ## find midpoints on the page 
    my $midh = (($hr - $hl) / 2 ) + $hl ;
    my $midv = (($vt - $vb) / 2 ) + $vb ;

    ## get integer values of page width and height
    $pgwidth  = round( $pgwidth  , 0 ) ; 
    $pgheight = round( $pgheight , 0 ) ; 

    ## here are the new page metrics
    my $nhl = round( (( $midh * $scale ) - ( $pgwidth  / 2 )) , 0 ) ;
    my $nhr = $nhl + $pgwidth   ; 
    ## my $nvb = round( (( $midv * $scale ) - ( $pgheight / 2 )) , 0 ) ;
    ## my $nvt = $nvb + $pgheight  ; 

    ## alternative is to top-align the pages
    my $nvt = round( $vt * $scale * (1+ (($margins -1)/2 )) , 0 ) ;
    my $nvb = $nvt - $pgheight  ; 
    
    
    my @otarray = ( $nhl , $nvb , $nhr , $nvt , $scale ) ; 
    return @otarray ; 
}



## need a function that evaluates correctness of arguments 
sub should_quit {

    ## die if page has zero width or height
    my ( $hl , $vb , $hr , $vt ) = @_[0..3] ; 
    
    if ( $hl == $hr || $vb == $vt ) { 
        
        print "\n" ; 
        if ( $hl == $hr ) { 
            print "\tError! The arguments imply page width of zero.\n";
        }
        if ( $vb == $vt ) { 
            print "\tError! The arguments imply page height of zero.\n";
        }
        print "\n" ; 
        die ; 
    }
}



## these two are modified versions from:
## http://stackoverflow.com/questions/12647/how-do-i-tell-if-a-variable-has-a-numeric-value-in-perl

sub is_integer {
    my $inval = $_[0] ; 
    defined $inval && $inval =~ /^[+-]?\d+$/;
}

sub is_float {
    my $inval = $_[0] ; 
    defined $inval && $inval =~ /^[+-]?\d+(\.\d+)?$/;
}


## define a function to do rounding because the integer function truncates toward 0 
## 
## if the argument is a float, then leading zeroes are automatically removed 

sub round_int { 

    my $inval = $_[0] ; 

    ## get rid of commas 
    $inval =~ s/,//g ;

    if ( is_float( $inval ) ) {

        my $down = int( $inval ) ;
        $down-- if $down > $inval ;  ## int() truncates toward 0, so step down for negatives
        my $up = $down + 1 ;
        
        my $ui = $up - $inval ;
        my $id = $inval - $down ; 
        
        if ( $ui <= $id ) { 
            return $up ;
        } else { 
            return $down ; 
        }
    } else { 
        return undef ; 
    }
}


sub round {

    my $inval = $_[0] ;
    $inval =~ s/,//g ;

    if ( is_float( $inval ) ) {

        if ( ! $_[1] ) {
            my $place = 1 ;
            return round_int( $place * $inval ) / $place ;
        } else {
            my $place = 10**($_[1]) ;
            return round_int( $place * $inval ) / $place ;
        }
    } else { 
        return undef ; 
    }
}


## we need prologue and epilogue

sub make_prologue {

    my $text = <<END;
margin = 0;

/* Beginning of BSD licensed code from Tomas Janousek. */
function setProp4( dict, p, a, b, c, d ) {
    if ( !dict.exist( p ) ) {
        var n = createArray();
        n.add( createReal( 0 ) );
        n.add( createReal( 0 ) );
        n.add( createReal( 0 ) );
        n.add( createReal( 0 ) );
        dict.add( p, n );
    }

    x = dict.property( p );
    x.property( 0 ).set( a );
    x.property( 1 ).set( b );
    x.property( 2 ).set( c );
    x.property( 3 ).set( d );
}

function setCrop( doc, pagenum, a, b, c, d ) {
    /* Ignore blank pages. */
    if ( a >= c || b >= d ) return;

    dict = doc.getPage( pagenum ).getDictionary();

    /* Adjust to bottom left corner of MediaBox. */
    media = dict.property( "MediaBox" );
    x = media.property( 0 ).value();
    y = media.property( 1 ).value();
    a += x; c += x;
    b += y; d += y;

    /* Add margin. */
    a -= margin; b -= margin; c += margin; d += margin;

    /* Set CropBox, MediaBox and TrimBox. */
    setProp4( dict, "CropBox", a, b, c, d );
    setProp4( dict, "MediaBox", a, b, c, d );
    setProp4( dict, "TrimBox", a, b, c, d );
}
/* End of BSD licensed code from Tomas Janousek. */

function setScale( doc , pagenum , sx , sy ) {
    var pg = doc.getPage( pagenum ) ; 
    pg.setTransformMatrix([sx,0,0,sy,0,0]);
}

doc = loadPdf( takeParameter(), false );
END

    return $text ; 
}

sub make_epilogue {

    my $text = <<END;
doc.save( false );
exit( 0 );
END

    return $text ; 
}

## print usage of this script

sub print_usage {

    my $program = "pdfcrop" ;

    my $usage = <<"END_OF_USAGE";
Syntax:   \L$program\E input.pdf [papersize] [output.pdf]

Function: 
  calculates the page metrics and scale factor of a PDF file, 
  then crops and scales the PDF, so that it neatly fits on a 
  standard size sheet of paper

Options:
  -h  --help  help      print usage and exit

  letter  Letter        if you want letter size paper
  a4      A4            if you want A4 size paper 
  legal   Legal         if you want legal size paper 

Details:
  $program attempts to determine your desired paper size by
  reading /etc/papersize. If that file cannot be read, it 
  assumes that you want letter size pages.

  If an output file name is not given, then $program crops and
  scales the input file in place (the input file is overwritten).

Examples:
  \L$program\E input.pdf
  \L$program\E input.pdf A4 output.pdf
END_OF_USAGE


    ## now print it out and quit 
    print "\n";
    print $usage ;
    die "\n"; 
}