Hello gov,

> Here's an example of the raw text that I have to work with:
> 
> 
> ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:
> ****************************
> 
> FOR/POUR AL/LA:  20
>   CORR TYP:  A1B 2C3      P:3 CHNGD/CHANG
>   LANG: E CONS/REGR:             #######
>   MRS XXX X XXXXXXX
>   ### XXXXXXXXX ST                      DD   TYP:               P:6
> CHNGD/CHANG
>   MONCTON NB                            LANG: E CONS/REGR:
> #######
>                                         MRS XXX X          XXXXXXX
>                                         #####
>                                         ####
>                                         ###-###-#
> 
> ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:
> ****************************
> 
> FOR/POUR AL/LA:  30
>   BOTH TYP:  A1B 2D3      P:3 CHNGD/CHANG
>   LANG: E CONS/REGR:             #######
>   MISS XXXX XXXXX
>   ### XXXXXXXX ST
>   MONCTON NB
> 
> EARNINGS VITAL INFORMATION/RENSEIGNEMENTS ESSENTIELS SUR LES GAINS:
> ***********
> 
> (the # = any number, and the X's are just regular text)
> I would like to extract the address information, but the two different
> text objects on the right hand side are difficult to remove.  I think
> it would be easier if I could just extract a fixed square of
> information, but I don't have a clue as to how to go about it.
> 
> If anyone could give me suggestions as to methods in sorting this type
> of data, it would be appreciated.
Maybe regular expressions are too difficult for this. I'd try one of the
parsing toolkits (such as PLY or PyParsing); one of them might be more
suitable for the job.
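
If you want to try the "fixed square" idea from your mail first, plain
string slicing may already be enough. Here is a minimal sketch, assuming
the right-hand block always starts at column 40 (a guess from the sample
above, so measure it on the real file) and that the real lines are not
wrapped the way they are in the mail. The names left_block and width are
made up for this example.

```python
def left_block(text, width=40):
    """Keep only the first `width` columns of every line.

    `width` is an assumption: the column where the right-hand
    block appears to start in the sample data.
    """
    kept = []
    for line in text.splitlines():
        # Cut the line at the fixed column, then drop padding spaces.
        kept.append(line[:width].rstrip())
    return "\n".join(kept)


# A few lines modelled on the sample; the padding is chosen so that
# the right-hand text starts at column 40.
sample = "\n".join([
    "FOR/POUR AL/LA:  20",
    "  MRS XXX X XXXXXXX",
    "  ### XXXXXXXXX ST" + " " * 22 + "DD   TYP:               P:6",
    "  MONCTON NB" + " " * 28 + "LANG: E CONS/REGR:",
    " " * 40 + "MRS XXX X          XXXXXXX",
])

print(left_block(sample))
```

Lines that only contain right-hand text come out empty, so you can filter
those afterwards. If the column is not really fixed, this breaks down and
a parsing toolkit is probably the better route.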

HTH.
--
------------------------------------------------------------------------
Miki Tebeka <[EMAIL PROTECTED]>
http://tebeka.bizhat.com
The only difference between children and adults is the price of the toys
