Re: count pages in a pdf

2007-11-27 Thread Andreas Lobinger
Tim Golden wrote:
> [EMAIL PROTECTED] wrote:
> 
>> is it possible to parse a pdf file in python?  for starters, i would
>> like to count the number of pages in a pdf file.  i see there is a
>> project called ReportLab, but it seems to be a pdf generator... i
>> can't tell if i would be able to parse a pdf file programmically.

http://groups.google.de/group/comp.lang.python/msg/6f304970b4ff40ce
and following.

> Well the simple expedient of putting "python count pages pdf" into
> Google turned up the following link:
> 
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496837

There is a non-vanishing possibility that this pattern matching
gives you false positives -> not reliable.
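
To illustrate the concern, here is a minimal sketch (Python 3) of a
pattern-matching page counter. The regular expression and the two
sample byte strings are my own assumptions for illustration, not the
code from the linked recipe. The second sample contains the same
pattern inside a text string, so the count goes wrong:

```python
import re

# Hypothetical pattern: "/Type /Page" not followed by a letter,
# so that "/Type /Pages" (the page tree node) is not counted.
PAGE_PATTERN = re.compile(rb"/Type\s*/Page(?![a-zA-Z])")

def count_pages_naive(data):
    """Count page objects by pattern matching on the raw bytes."""
    return len(PAGE_PATTERN.findall(data))

# Made-up fragment with exactly one page object.
one_page = (
    b"1 0 obj << /Type /Pages /Kids [2 0 R] /Count 1 >> endobj\n"
    b"2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n"
)

# Same fragment plus a string that merely mentions the pattern.
tricky = one_page + b"3 0 obj (this text mentions /Type /Page literally) endobj\n"

print(count_pages_naive(one_page))  # 1
print(count_pages_naive(tricky))    # 2, although there is still one page
```

The opposite failure is possible as well, since page objects can sit
inside compressed parts of the file where no pattern will find them.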

Wishing a happy day,
LOBI
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: expat error, help to debug?

2007-08-28 Thread Andreas Lobinger
Aloha,

Andreas Lobinger wrote:
> Andreas Lobinger wrote:
>> Lawrence D'Oliveiro wrote:
>>> In message <[EMAIL PROTECTED]>, Andreas Lobinger wrote:
>>>> Anyone any idea where the error is produced?
> The registered Handler has to return an (integer) value.
> Would have been nice if this had been mentioned in the documentation.

Delete last line, it is mentioned in the documentation.


Re: expat error, help to debug?

2007-08-28 Thread Andreas Lobinger
Aloha,

Andreas Lobinger wrote:
> Lawrence D'Oliveiro wrote:
>> In message <[EMAIL PROTECTED]>, Andreas Lobinger wrote:
>>> Anyone any idea where the error is produced?

... to share my findings with you:

    def ex(self,context,baseid,n1,n2):
        print "x",context,n1,n2
        return 1

The registered Handler has to return an (integer) value.
Would have been nice if this had been mentioned in the documentation.
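
For reference, a self-contained sketch of the same fix written for
Python 3. The names DOC and parse_events and the inline test document
are assumptions made up for this example; only the handler attributes
and the return-an-integer rule come from xml.parsers.expat itself.

```python
import xml.parsers.expat

# Small made-up document that references one external entity.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book [
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter id="technicalDescription"><title>technical description</title></chapter>
</book>
"""

def parse_events(data):
    """Parse data and return the start-element and entity events seen."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        events.append(("s", name))

    def external_entity(context, base, system_id, public_id):
        events.append(("x", context, system_id, public_id))
        # Must be an integer: non-zero means "handled, keep parsing",
        # zero makes expat report an external entity handling error.
        return 1

    parser.StartElementHandler = start_element
    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(data, True)
    return events

for event in parse_events(DOC):
    print(event)
```

The handler here only records the reference; it does not open and
parse bookinfo.xml.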

Wishing a happy day,
LOBI


Re: expat error, help to debug?

2007-08-27 Thread Andreas Lobinger
Aloha,

Lawrence D'Oliveiro wrote:
> In message <[EMAIL PROTECTED]>, Andreas Lobinger wrote:
>>Anyone any idea where the error is produced?

> Do you want to try adding an EndElementHandler as well, just to get more
> information on where the error might be happening?

Yes, I do.

With an EndElementHandler added (left as an exercise for the reader),
the output looks like this:
[42] scylla(scylla)> python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
e title
s 10 para {}
e para
e chapter
e book
Traceback (most recent call last):
  File "pbxml.py", line 29, in ?
    fromxml(sys.argv[1])
  File "pbxml.py", line 24, in fromxml
    p.ParseFile(file(fname))
TypeError: an integer is required

which shows me that the error is raised after the closing /book has
been parsed, BUT still within p.ParseFile (expat internal), so i can't
look into it.

The example here may be misleading. It was stripped down from
a quite large docbook.xml, and there the error happened in the
middle of the document, not at the end.

Wishing a happy day,
LOBI


expat error, help to debug?

2007-08-23 Thread Andreas Lobinger
Aloha,

i'm trying to write an xml filter that extracts some info about
an .xml document (with external entities), esp. start elements and
external entities. The document is DocBook XML and, afaics,
well-formed, and it passes our docbook toolchain (dblatex etc.).

My parser is (very simple):
[115] scylla(scylla)> more pbxml.py

class xmlhandle:
    def __init__(self):
        self.parser_stack = []
        self.parser = None

    def se(self,name,attr):
        print "s", self.parser.CurrentLineNumber, name, attr

    def ex(self,context,baseid,n1,n2):
        print "x",context,n1,n2

def fromxml(fname):
    import xml.parsers.expat
    p = xml.parsers.expat.ParserCreate()
    xl = xmlhandle()
    p.StartElementHandler = xl.se
    p.ExternalEntityRefHandler = xl.ex
    xl.parser = p
    p.ParseFile(file(fname))
    return

if __name__ == "__main__":
    import sys
    fromxml(sys.argv[1])

my document (in 2 parts):

[116] scylla(scylla)> more s3.xml


]>

&bookinfo;
technical description
 
 This chapter includes specification of the main simulation loop.
 



[118] scylla(scylla)> more bookinfo.xml

   BookTitle
   
 
 A
 B
 
   


The run produces:

[120] scylla(scylla)> python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
s 10 para {}
Traceback (most recent call last):
  File "pbxml.py", line 25, in ?
    fromxml(sys.argv[1])
  File "pbxml.py", line 20, in fromxml
    p.ParseFile(file(fname))
TypeError: an integer is required

Anyone any idea where the error is produced?
Anyone any idea how to debug this (if it's really a bug and not
a misunderstanding of expat on my side)?

Hoping for an answer and wishing a happy day,
LOBI


Re: SVG rendering with Python

2005-12-15 Thread Andreas Lobinger
Aloha,

richard wrote:
> Dennis Benzinger wrote:
>>Does anybody know of a SVG rendering library for Python?
> Google "python svg"

... to find what?

Wishing a happy day
LOBI


Re: Addressing the last element of a list

2005-11-08 Thread Andreas Lobinger
Aloha,

[EMAIL PROTECTED] wrote:
> Isn't there an easier way than
> lst[len(lst) - 1] = ...
lst[-1] = ...
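
Spelled out as a short example (the list contents are made up):

```python
lst = [10, 20, 30]

# A negative index counts from the end: -1 is the last element.
lst[-1] = 99

# It addresses the same slot as the longer spelling.
same_slot = lst[-1] is lst[len(lst) - 1]
print(lst, same_slot)
```

Note that both spellings raise an IndexError on an empty list.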

Wishing a happy day
LOBI


Re: extract PDF pages

2005-10-13 Thread Andreas Lobinger
Aloha,

David Isaac wrote:
> I am looking for a Python solution.
> Just for PDF page extraction.
> Any hope?

With python, there's always hope.
http://sourceforge.net/projects/pdfplayground

In the CVS (sorry no distribution at the time) you'll find
an example page-extract.
http://cvs.sourceforge.net/viewcvs.py/pdfplayground/ppg/Exp/page-extract.py?rev=1.1&view=markup

pdfplayground is limited at the moment to PDF <= 1.4.
If you want to do more with .pdfs you'll probably need at least
a basic understanding of the PDF specification. pdfplayground
is focused on low-level .pdf (due to implementation resources...).

Thomas Lotze is also preparing a pdf reader/writer project:
http://svn.thomas-lotze.de/PDFSpec/

So is David Boddie:
http://www.boddie.org.uk/david/Projects/Python/pdftools

Wishing a happy day
LOBI


zlib written in python

2005-09-20 Thread Andreas Lobinger
Aloha,

is a pure _python_ implementation of the zlib available?
I have broken zlib streams and need to patch the decoder to
get them back.
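
To make clear what "get them back" means here, a minimal sketch
(Python 3) of how far the standard zlib module goes without patching:
it feeds the stream to a decompress object in small pieces and keeps
whatever came out before the decoder gave up. This is not a pure
python inflate, and the function name salvage and the chunk size are
made up for the example.

```python
import zlib

def salvage(data, chunk_size=64):
    """Return the bytes that decode cleanly before the stream breaks."""
    decoder = zlib.decompressobj()
    recovered = []
    for start in range(0, len(data), chunk_size):
        try:
            recovered.append(decoder.decompress(data[start:start + chunk_size]))
        except zlib.error:
            # Corrupt data reached: stop and keep what we have so far.
            break
    return b"".join(recovered)

# Made-up test stream: compress some text, then cut it in half.
original = b"".join(str(i).encode() for i in range(5000))
broken = zlib.compress(original)[: len(zlib.compress(original)) // 2]
partial = salvage(broken)
print(len(partial), "of", len(original), "bytes recovered")
```

Anything beyond the first damaged block is out of reach this way,
which is why a decoder that can be patched would be needed.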

Wishing a happy day
LOBI



Re: encryption with python

2005-09-07 Thread Andreas Lobinger
Aloha,

[EMAIL PROTECTED] wrote:
> I was wondering if someone can recommend a good encryption algorithm
> written in python. 
> It would be great if there exists a library already written to do this,
> and if there is, can somebody please point me to it??

M2Crypto, interface to OpenSSL
http://sandbox.rulemaker.net/ngps/m2

Wishing a happy day
LOBI


using hotshot for timing and coverage analysis

2005-07-15 Thread Andreas Lobinger
Aloha,

hotshot.Profile has flags for recording timing per line and line
events. Even with both set to 1 i still get only the
standard data (time per call).

Is there any document available that has examples of how to use
hotshot for coverage analysis and how to display timing
per line?

Hoping for an answer and wishing a happy day
LOBI


Re: Frankenstring

2005-07-14 Thread Andreas Lobinger
Aloha,

Thomas Lotze wrote:
>>A string, and a pointer on that string. If you give up the boundary
>>condition to tell backwards, you can start to eat up the string via f =
>>f[p:]. There was a performance difference with that, in fact it was faster
>>~4% on a python2.2.

> When I tried it just now, it was the other way around. Eating up the
> string was slower, which makes sense to me since it involves creating new
> string objects all the time.

I also expected the f[p:] to be slower; the 4% i only measured on one
platform. Most probably the GC and memory management aren't the same.

>>I don't expect any iterator solution to be faster than that.

> It's not so much an issue of iterators, but handling Python objects
> for every char. Iterators would actually be quite helpful for searching: I
> wonder why there doesn't seem to be an str.iterfind or str.itersplit
> thing. And I wonder whether there shouldn't be str.findany and
> str.iterfindany, which takes a sequence as an argument and returns the
> next match on any element of it.

There is a finditer in the re module. I'm currently rewriting a few
pattern matching things and find it quite valuable.

 >>> import re
 >>> pat = re.compile('[57]')
 >>> f = "754356184756046104564"
 >>> for a in pat.finditer(f):
...  print a.start(),f[a.start()]
...
0 7
1 5
4 5
9 7
10 5
18 5
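
The wished-for str.iterfindany can be approximated on top of finditer.
A minimal sketch for Python 3; the function name iterfindany is taken
from the wish above and is not an existing str method, and building the
pattern from escaped alternatives is my own choice:

```python
import re

def iterfindany(text, needles, start=0):
    """Yield (position, match) for the next occurrence of any needle."""
    # Longest needles first, so "ab" wins over "a" at the same position.
    ordered = sorted(needles, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    for match in pattern.finditer(text, start):
        yield match.start(), match.group()

for pos, found in iterfindany("754356184756046104564", ["5", "7"]):
    print(pos, found)
```

Matches do not overlap, the same as with finditer itself.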

Wishing a happy day
LOBI


Re: Frankenstring

2005-07-13 Thread Andreas Lobinger
Aloha,

Thomas Lotze wrote:
> I think I need an iterator over a string of characters pulling them out
> one by one, like a usual iterator over a str does. At the same time the
> thing should allow seeking and telling like a file-like object:
> >>> f = frankenstring("0123456789")
> >>> for c in f:
> ...     print c
> ...     if c == "2":
> ...         break
> ...
> 0
> 1
> 2
> >>> f.tell()
> 3L
> >>> f.seek(7)
> >>> for c in f:
> ...     print c
> ...
> 7
> 8
> 9
> I can think of more than one clumsy way to implement the desired
> behaviour in Python; I'd rather like to know whether there's an
> implementation somewhere that does it fast. (Yes, it's me and speed
> considerations again; this is for a tokenizer at the core of a library,
> and I'd really like it to be fast.) 

You can probably guess my answer already, because i'm doing this
at the core of a similar library, but to give others
the chance to discuss:

 >>> f = "0123456789"
 >>> p = 0
 >>> t2 = f.find('2')+1
 >>> for c in f[p:t2]:
...  print c
...
0
1
2
 >>> p = 7
 >>> for c in f[p:]:
...  print c
...
7
8
9

A string, and a pointer on that string. If you give up the
boundary condition to tell backwards, you can start to eat up
the string via f = f[p:]. There was a performance difference
with that, in fact it was faster ~4% on a python2.2.

I don't expect any iterator solution to be faster than
that.
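
For comparison, the "string plus pointer" idea wrapped into the
requested interface, as a minimal Python 3 sketch. The class is my own
illustration; it makes no claim about speed and will most likely be
slower than the plain slicing shown above.

```python
class Frankenstring:
    """A string with a position pointer: iterable, with tell and seek."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.data):
            raise StopIteration
        char = self.data[self.pos]
        self.pos += 1
        return char

    def tell(self):
        return self.pos

    def seek(self, pos):
        self.pos = pos

f = Frankenstring("0123456789")
seen = []
for c in f:
    seen.append(c)
    if c == "2":
        break
position = f.tell()   # pointer sits just behind the "2"
f.seek(7)
rest = list(f)
print(seen, position, rest)
```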

Wishing a happy day
LOBI


Re: Obtaining glyph width in Python

2005-07-04 Thread Andreas Lobinger
Aloha,

Charlie wrote:
> Hi, I'm looking for a way to obtain the width of a string, either in actual
> inches/centimeters, or pixels will also work.  Unfortunately this seems
> difficult as I'd like to keep things as close to the stock Python install as
> possible, and I'm not working with Graphics or X at all. 

So you need both: metrics for single characters/glyphs and
concatenated glyphs and words.

> PIL = Huge for only using one function.  I'm not working with any graphics.
> PyFT = Everyone uses FreeType2 now, and PyFT seems dead anyhow.
> PyFT2 = Does not exist.
> tkinter.text() = Works with X, creates windows no matter what you do.
> t1lib = Separate package, no TTF support.
> t1python = Same thing as t1lib?

For glyph metrics and related information, the ttx/fonttools
project is available on sourceforge. Afair fonttools only needs a
Numeric installation.

> Ultimately, I'm looking to take a stream of text, and break it up into lines
> based on page width... and I need to know how wide (and ultimately how tall,
> for page breaks) the individual glyphs are so I can break properly.  If there's
> an easier way to do this than calculating individual glyph width, I'm open to
> that too.

It looks a little bit like you're redeveloping TeX (in python)...

> I was really just looking to see if there was anything out there that wasn't
> too large or too obscure/dated.  Maybe there's something lower level that could
> be done to achieve this?  Is there metadata in the font that holds this
> information that could be extracted?

Actually there is not only meta but real data included in the font,
speaking of Type1, TrueType and OpenType scalable outline fonts.
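
Once the advance widths are read from the font, the line breaking
itself is small. A minimal greedy sketch in Python 3, far from what TeX
does: the function name, the widths table and the default width are
made up, and real widths would have to come from the font's metrics.

```python
def break_lines(text, widths, page_width, default_width=500):
    """Greedy line breaking using per-character advance widths."""

    def word_width(word):
        return sum(widths.get(ch, default_width) for ch in word)

    space = widths.get(" ", default_width)
    lines, current, current_width = [], [], 0
    for word in text.split():
        w = word_width(word)
        # A word that is not first on the line also costs one space.
        extra = w if not current else space + w
        if current and current_width + extra > page_width:
            lines.append(" ".join(current))
            current, current_width = [word], w
        else:
            current.append(word)
            current_width += extra
    if current:
        lines.append(" ".join(current))
    return lines

# Made-up metrics in font units; "i" is narrow, "m" is wide.
widths = {"i": 200, "m": 800, " ": 250}
print(break_lines("mim im mi iii", widths, page_width=3000))
```

Kerning and hyphenation are ignored here, and a single word wider than
the page is simply put on a line of its own.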

Wishing a happy day
LOBI




Re: Writing a bytecode interpreter (for TeX dvi files)

2005-05-27 Thread Andreas Lobinger
Aloha,

Jonathan Fine wrote:
> I'm writing some routines for handling dvi files.
> In case you didn't know, these are TeX's typeset output.
> These are binary files containing opcodes.
> I wish to write one or more dvi opcode interpreters.
> Are there any tools or good examples to follow for
> writing a bytecode interpreter?

As far as i know, dvi is a very straightforward format, commands
followed by parameters, no conditionals, no loops.
For similar designs i used something like the following approach:

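import struct

s = open('a.dvi', 'rb').read()  # read complete file into a byte string

while s:
    command = ord(s[0:1])  # opcode byte (the slice works on python 2 and 3)

    if command < 128:
        # typeset command (character code = opcode, no parameters)
        s = s[1:]
    elif command == 139:
        # bop command: ten 4-byte counters plus a 4-byte back pointer
        param = s[1:45]
        # interpret param (big-endian signed 32-bit integers)
        c = struct.unpack('>10i', param[:40])
        # consume s: one opcode byte plus 44 parameter bytes
        s = s[45:]
    else:
        # undefined command
        s = s[1:]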

You can work directly on strings, or convert to a list. If you don't
want long if/elif lists, you can use a dict as a dispatcher (python
cookbook has an example?). For most of the commands you can use
a lookup table for the parameter list length.
TeX §591 states that dvi is strictly interpretable from front to end.
The description in TeX §585 ff. can easily be transcribed into struct
definitions.
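
A minimal sketch of the dict dispatcher with a parameter length lookup
table, written for Python 3. The table covers only a handful of
fixed-length opcodes, and the handler names, the log list and the test
page are made up for illustration. It also moves a position pointer
over the data instead of slicing the string.

```python
import struct

# Parameter byte counts for a few fixed-length opcodes (partial table):
# nop, bop, eop, push, pop.
PARAM_LENGTH = {138: 0, 139: 44, 140: 0, 141: 0, 142: 0}

def on_bop(params, log):
    counts = struct.unpack(">10i", params[:40])  # c0..c9
    log.append(("bop", counts[0]))

def on_eop(params, log):
    log.append(("eop",))

# Opcodes without an entry here are skipped using the length table.
DISPATCH = {139: on_bop, 140: on_eop}

def interpret(data):
    """Walk the byte string front to end and log what was seen."""
    log = []
    pos = 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        if op < 128:
            log.append(("set_char", op))
            continue
        length = PARAM_LENGTH.get(op)
        if length is None:
            raise ValueError("opcode %d is not covered by this sketch" % op)
        params = data[pos:pos + length]
        pos += length
        handler = DISPATCH.get(op)
        if handler is not None:
            handler(params, log)
    return log

# Made-up page: bop with c0 = 1, the characters "Hi", push, pop, eop.
page = (bytes([139]) + struct.pack(">10i", 1, *([0] * 9))
        + struct.pack(">i", -1) + b"Hi" + bytes([141, 142, 140]))
print(interpret(page))
```

The variable-length commands (xxx, fnt_def) do not fit a plain length
table and would need handlers that read their own length fields.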

Wishing a happy day
LOBI



Re: searching pdf files for certain info

2005-02-22 Thread Andreas Lobinger
Aloha,

rbt wrote:
> Thanks guys... what if I convert it to PS via printing it to a file or
> something? Would that make it easier to work with?

Not really...
The classical PS drivers (e.g. Acroread4-Unix print -> ps) simply
define the pdf graphics and text operators as PS commands and
copy the pdf content directly.

Wishing a happy day
LOBI


Re: searching pdf files for certain info

2005-02-22 Thread Andreas Lobinger
Aloha,

rbt wrote:
> Not really a Python question... but here goes: Is there a way to read
> the content of a PDF file and decode it with Python? I'd like to read
> PDF's, decode them, and then search the data for certain strings.

First of all,
http://groups.google.de/groups?selm=400CF2E3.29506EAE%40netsurf.de&output=gplain
still applies here.

If you can deal with a very basic implementation of a pdf-lib you
might be interested in
http://sourceforge.net/projects/pdfplayground

In the CVS (or the current snapshot) you can find in
ppg/Doc/text_extract.txt an example for text extraction.

 >>> import pdffile
 >>> import pages
 >>> import pdftool
 >>> import zlib
 >>> pf = pdffile.pdffile('../pdf-testset1/a.pdf')
 >>> pp = pages.pages(pf)
 >>> c = zlib.decompress(pf[pp.pagelist[0]['/Contents']].stream)
 >>> op = pdftool.parse_content(c)
 >>> sop = [x[1] for x in op if x[0] in ["'", "Tj"]]
 >>> for a in sop:
 ...     print a[0]

Wishing a happy day
LOBI


OT: Re: PDF count pages

2004-12-09 Thread Andreas Lobinger
Aloha,

[EMAIL PROTECTED] wrote:
> Andreas Lobinger wrote:
>> >>> import pdffile
> I browsed the code in CVS and it looks like a pretty comprehensive
> implementation. Maybe we should join forces.

I have problems contacting you via the given e-mail address.

Wishing a happy day
LOBI


Re: PDF count pages

2004-12-06 Thread Andreas Lobinger
Aloha,

Jose Benito Gonzalez Lopez wrote:
> Does anyone know how I could do in order
> to get/count the number of pages of a PDF file?

Like this?

Python 2.2.2 (#3, Apr 10 2003, 17:06:52)
[GCC 2.95.2 19991024 (release)] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdffile
>>> pf = pdffile.pdffile('../rfc1950.pdf')
>>> import pages
>>> pp = pages.pages(pf)
>>> len(pp.pagelist)
10
>>>

This is an example of the usage of pdfplayground. pdfplayground
is available via sourceforge. There is no package at the
moment, but you should be able to check out via anon-cvs.

Wishing a happy day
LOBI