On Wednesday, November 25, 2015 at 6:34:00 PM UTC-5, Grobu wrote:
> On 25/11/15 23:48, ryguy7272 wrote:
> >> re.findall( r'\<a[^>]+title="(.+?)"', html )
> [ ... ]
> > Thanks!! Is that regex? Can you explain exactly what it is doing?
> > Also, it seems to pick up a lot more than just the list I wanted, but
> > that's ok, I can see why it does that.
> >
> > Can you just please explain what it's doing???
> >
>
> Yes it's a regular expression. Because RegEx's use the backslash as an
> escape character, it is advisable to use the "raw string" prefix (r
> before the single/double/triple quote). To illustrate it with an example :
>
> >>> print "1\n2"
> 1
> 2
> >>> print r"1\n2"
> 1\n2
>
> As the backslash escape character is "neutralized" by the raw string,
> you can use the usual RegEx syntax at leisure :
>
> \<a[^>]+title="(.+?)"
>
> \< was a mistake on my part, a single < is perfectly enough
> [^>] is a class definition, and the caret (^) character indicates
> negation. Thus it means : any character other than >
> + indicates repetition : one or more of the previous element
> . will match any character (except a newline, by default)
> .+" is a _greedy_ pattern that would match anything until it encountered
> a double quote
>
> The problem with a greedy pattern is that it doesn't stop at the first
> match. To illustrate :
>
> >>> a = re.search( r'".+"', 'title="this is a test" class="test"' )
> >>> a.group()
> '"this is a test" class="test"'
>
> It matches the first quote up to the last one.
> On the other hand, you can use the "?" modifier to specify a non-greedy
> pattern :
>
> >>> b = re.search( r'".+?"', 'title="this is a test" class="test"' )
> >>> b.group()
> '"this is a test"'
>
> It matches the first quote and stops looking for further matches after
> the second quote.
>
> Finally, the parentheses are used to indicate a capture group :
>
> >>> a = re.search( r'"this (is) a (.+?)"', 'title="this is a test" class="test"' )
> >>> a.groups()
> ('is', 'test')
>
> You can find detailed explanations about Python regular expressions at
> this page : https://docs.python.org/2/howto/regex.html
>
> HTH,
>
> -Grobu-
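For anyone reading along, here is a minimal sketch that puts the pieces above together in one script. It is written for Python 3 (so print is a function, unlike the Python 2 session quoted above), and the HTML snippet is a made-up sample, not the page from this thread. The third pattern, which uses a negated class instead of the non-greedy "?", is my own addition and not something Grobu suggested.

```python
import re

# Hypothetical sample HTML, made up purely for illustration.
html = (
    '<a href="/wiki/Alpha" title="Alpha" class="link">Alpha</a>\n'
    '<a href="/wiki/Beta" title="Beta page">Beta</a>\n'
    '<a href="/plain">no title here</a>'
)

# Non-greedy capture group: (.+?) stops at the first closing double quote.
titles = re.findall(r'<a[^>]+title="(.+?)"', html)
print(titles)   # ['Alpha', 'Beta page']

# Greedy capture group: (.+) runs on to the last double quote on the line.
greedy = re.findall(r'<a[^>]+title="(.+)"', html)
print(greedy)   # ['Alpha" class="link', 'Beta page']

# Alternative (an assumption of mine, not from the thread): a negated
# class [^"] cannot cross a double quote, so no "?" is needed.
negated = re.findall(r'<a[^>]+title="([^"]+)"', html)
print(negated)  # ['Alpha', 'Beta page']
```

Note that findall returns only the captured group when the pattern contains exactly one pair of parentheses, and that any <a> tag carrying a title attribute will match, which is probably why the original pattern picked up more than the list that was wanted.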
Wow! Awesome! I bookmarked that link! Thanks for everything!!!