Don't you hate the *.ps/*.pdf texts which are arranged in columns as if they were a newspaper? Especially when you want to email a section after running 'pdftotext'.
I'm guessing that an algorithm to extract columns could work like this [assume 2 columns, but 3, 4, ... should be similar; remember that the RHS-colm of pageN continues at the LHS-colm of pageN+1]:

    Initialise;
    Repeat  (* NextBlok or exit DO *)
      BeginBloks:-
        Mark the TopLeftCorner   -> get(StartRow, StartColm);
        Mark the BotmRightCorner -> get(EndRow, EndColm);
      Extract the Blok's text:-
        For Row = StartRow to EndRow
          For Colm = StartColm to EndColm
            PutCharToBufr;
          DoLineTerminator;
    Until ExitBloks.

Obviously the nesting is: Bloks > Rows > Colms.

Could it then be morphed to clean up the ">>>>" quoting in newsgroup threads, where the lines get too long because of the extra ">" markers?

Thanks for any input,

== Chris Glur.

--
http://mail.python.org/mailman/listinfo/python-list
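
P.S. A rough Python sketch of the Bloks > Rows > Colms idea, in case it helps the discussion. It assumes the text has already been split into pages (each a list of lines), that the column boundaries are known in advance and are the same on every page, and that each block spans the full page height. The names extract_block, decolumnize and boundaries are made up for illustration; finding the block corners automatically is not attempted here.

```python
def extract_block(lines, start_row, start_col, end_row, end_col):
    """Return the text inside one rectangular block, one string per row.

    Rows and columns are 0-based; end_row and end_col are inclusive.
    """
    block = []
    for row in lines[start_row:end_row + 1]:
        # Slicing past the end of a short line simply yields less text.
        block.append(row[start_col:end_col + 1].rstrip())
    return block


def decolumnize(pages, boundaries):
    """Flatten multi-column pages into a single column of lines.

    pages: list of pages, each a list of text lines.
    boundaries: (start_col, end_col) pairs, ordered left to right
                (assumed identical on every page).
    """
    out = []
    for lines in pages:
        if not lines:
            continue
        # Left column first, then the next one; the last column of page N
        # is followed by the first column of page N+1.
        for start_col, end_col in boundaries:
            out.extend(extract_block(lines, 0, start_col,
                                     len(lines) - 1, end_col))
    return out


if __name__ == "__main__":
    pages = [["aaa  ccc", "bbb  ddd"], ["eee  ggg", "fff"]]
    for line in decolumnize(pages, [(0, 2), (5, 7)]):
        print(line)
```

String slicing stands in for the inner "For Colm" loop, so no character-by-character PutCharToBufr is needed.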