Pythonise this algorithm ?

news Wed, 18 Jan 2006 16:15:40 -0800

Don't you hate *.ps/*.pdf texts that are laid out in columns,
as if they were a newspaper? Especially when you want to email
a section after running 'pdftotext'.


I'm guessing that an algorithm to extract columns could work
like this: [assume 2 columns, but 3, 4... should be similar; remember
that the RHS column of page N continues into the LHS column of page N+1]

Initialise;
Repeat (* NextBlok or exit DO *)
BeginBloks:-
   Mark the TopLeftCorner -> get(StartRow,StartColm);
   Mark the BotmRightCorner -> get(EndRow,EndColm);
   Extract the Blok's text :-
      For Row = StartRow to EndRow;
         For Colm = StartColm to EndColm
            PutCharToBufr;
         DoLineTerminator;
Until ExitBloks.

Obviously the nesting is: Bloks > Rows > Colms.
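
A rough Python sketch of that Bloks > Rows > Colms nesting, under some
loud assumptions: the text was produced with the column layout preserved
as spaces, the input has already been split into one string per page,
and the column boundary (split_col) is known in advance. The names
extract_block, uncolumn and split_col are made up for illustration;
detecting the boundary automatically is not attempted here.

```python
def extract_block(lines, start_row, start_col, end_row, end_col):
    """Return the text of one rectangular block (end bounds exclusive)."""
    out = []
    for row in lines[start_row:end_row]:
        # Slicing replaces the inner per-character loop; slicing past
        # the end of a short line is safe and just yields less text.
        out.append(row[start_col:end_col].rstrip())
    return "\n".join(out)


def uncolumn(pages, split_col):
    """Emit each page's left column, then its right column, in page order.

    Assumes exactly two columns separated at character offset split_col.
    """
    pieces = []
    for page in pages:
        lines = page.splitlines()
        width = max((len(line) for line in lines), default=0)
        pieces.append(extract_block(lines, 0, 0, len(lines), split_col))
        pieces.append(extract_block(lines, 0, split_col, len(lines), width))
    return "\n".join(pieces)
```

Because the right column of each page is appended just before the left
column of the next page, the RHS-to-LHS continuation across pages falls
out of the loop order. More columns would mean a list of boundaries
instead of the single split_col.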

Could the same approach then be adapted to clean up the ">>>>" quoting
in newsgroup threads, where lines get too long with each extra ">"?
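
For the quoting case, one possible sketch (not the column algorithm
itself, but a swapped-in technique): strip the ">" prefix, join
consecutive lines of the same quote depth, and re-wrap them with the
standard textwrap module. The names split_quote and rewrap_quoted and
the default width of 72 are assumptions for illustration only.

```python
import itertools
import re
import textwrap


def split_quote(line):
    """Return (quote depth, text without the leading '>' markers)."""
    m = re.match(r"(?:>\s?)+", line)
    if not m:
        return 0, line.strip()
    return m.group().count(">"), line[m.end():].strip()


def rewrap_quoted(text, width=72):
    """Re-wrap quoted text so each line, prefix included, fits in width."""
    out = []
    parsed = (split_quote(line) for line in text.splitlines())
    # Consecutive lines with the same depth form one paragraph;
    # blank lines are kept as separators at their own depth.
    by_depth = itertools.groupby(parsed, key=lambda p: (p[0], not p[1]))
    for (depth, blank), group in by_depth:
        prefix = ">" * depth + (" " if depth else "")
        if blank:
            out.extend(prefix.rstrip() for _ in group)
        else:
            body = " ".join(part for _, part in group)
            out.extend(textwrap.wrap(body, width=width,
                                     initial_indent=prefix,
                                     subsequent_indent=prefix))
    return "\n".join(out)
```

Note that this re-flows every quoted paragraph, so quoted code or
tables would lose their line breaks.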

Thanks for any input,

== Chris Glur.

-- 
http://mail.python.org/mailman/listinfo/python-list
