Re: String parsing

Gabriel Genellina Tue, 08 May 2007 18:45:24 -0700

On Tue, 08 May 2007 22:09:52 -0300, HMS Surprise <[EMAIL PROTECTED]>  
wrote:


> The string below is a piece of a longer string of about 20000
> characters returned from a web page. I need to isolate the number at
> the end of the line containing 'LastUpdated'. I can find
> 'LastUpdated'  with .find but not sure about how to isolate the
> number. 'LastUpdated' is guaranteed to occur only once. Would
> appreciate it if one of you string parsing whizzes would take a stab
> at it.

> <input type="hidden" name="RFP"                               value="-1"/>
> <!--<input type="hidden" name="EnteredBy"             value="johnxxxx"/>-->
> <input type="hidden" name="EnteredBy"         value="john"/>
> <input type="hidden" name="ServiceIndex"      value="1"/>
> <input type="hidden" name="LastUpdated"       value="1178658863"/>
> <input type="hidden" name="NextPage"          value="../active/active.php"/>
> <input type="hidden" name="ExistingStatus"    value="10" ?>
> <table width="98%" cellpadding="0" cellspacing="0" border="0"
> align="center"

You really should use an HTML parser here. But assuming the page's  
structure will not change much, you could use a regular expression like  
this:

import re

expr = re.compile(r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"',
                  re.IGNORECASE)
number = expr.search(text).group(1)
(Handling of "not found" and "duplicate" cases is left as an exercise for  
the reader)
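
For what it's worth, one possible way to do that exercise is sketched  
below. It reuses the same pattern; the function name find_last_updated and  
the choice to return None when nothing matches and raise ValueError on  
duplicates are only assumptions for illustration, so adapt them to whatever  
your program needs.

```python
import re

# Same pattern as in the message; everything else here is illustrative.
EXPR = re.compile(r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"',
                  re.IGNORECASE)


def find_last_updated(text):
    """Return the LastUpdated value as a string, or None if it is absent.

    Raises ValueError if the field appears more than once.
    """
    # With a single group in the pattern, findall returns the captured strings.
    matches = EXPR.findall(text)
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError(
            "expected one LastUpdated field, found %d" % len(matches))
    return matches[0]
```

The result is still a string; call int() on it if you need a number.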

Note that <input value="1178658863" type="hidden" name="LastUpdated" /> is  
just as valid as your HTML, but because its attributes come in a different  
order it won't match the expression.
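
A parser-based version does not care about attribute order. The following  
is only a minimal sketch using the standard library's html.parser module  
(its Python 3 name; in Python 2 the module is called HTMLParser). The class  
HiddenInputFinder and the helper last_updated_values are hypothetical names  
made up for this example.

```python
from html.parser import HTMLParser


class HiddenInputFinder(HTMLParser):
    """Collect the value of every <input> whose name attribute matches."""

    def __init__(self, wanted_name):
        super().__init__()
        self.wanted_name = wanted_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs in whatever order the page wrote them.
        if tag != "input":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == self.wanted_name:
            self.values.append(attr_map.get("value"))


def last_updated_values(text):
    """Return a list with the value of each LastUpdated input in text."""
    finder = HiddenInputFinder("LastUpdated")
    finder.feed(text)
    finder.close()
    return finder.values
```

Self-closing tags such as <input ... /> also reach handle_starttag, and  
inputs inside <!-- ... --> comments are not reported as tags, so a  
commented-out field is ignored. An empty list means "not found" and more  
than one element means "duplicate".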

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list
