Re: [Tutor] How do I make pattern to find only '.html' file using Python Regular Expression?

2015-04-01 Thread Ben Finney
Abdullah Al Imran writes:

> How to do it using Python Regular Expression?

Don't assume which tool you must use; instead, ask how best the problem
can be solved.

In the case of parsing HTML, regular expressions are a poor fit
<http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454>
because they're not doing the job of parsing.

Use a parser which better understands HTML, like Beautiful Soup
<https://pypi.python.org/pypi/BeautifulSoup>.

-- 
 \  “An expert is a man who has made all the mistakes which can be |
  `\ made in a very narrow field.” —Niels Bohr |
_o__)  |
Ben Finney

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How do I make pattern to find only '.html' file using Python Regular Expression?

2015-04-01 Thread Alan Gauld

On 01/04/15 20:22, Abdullah Al Imran wrote:

> I have some HTML content where there are many links as the following pattern:
>
> <a href="http://example.com/2013/01/problem1.html">Problem No-1</a>
>
> I want to filter all the links into a list as:
> ['http://example.com/2013/01/problem1.html',
> 'http://example.com/2013/02/problem2.html']
>
> How to do it using Python Regular Expression?


You can try, but regular expressions are not a reliable way
to parse HTML.
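
For the very regular markup shown in the question, a pattern along
these lines may be enough. This is only a sketch: the sample string and
the names LINK_RE, links and link_text are made up for illustration,
and the pattern assumes that href is the only attribute on the tag and
that it is double-quoted, which real pages often violate.

```python
import re

# Hypothetical sample input, modelled on the link quoted in this thread.
html = '''
<a href="http://example.com/2013/01/problem1.html">Problem No-1</a>
<a href="http://example.com/2013/02/problem2.html">Problem No-2</a>
<a href="http://example.com/about.php">About</a>
'''

# Group 1: an href value ending in ".html"; group 2: the link text.
LINK_RE = re.compile(
    r'<a\s+href="([^"]+\.html)"\s*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

pairs = LINK_RE.findall(html)            # list of (url, text) tuples
links = [url for url, text in pairs]     # just the URLs, in document order
link_text = dict(pairs)                  # url -> link text
```

An extra attribute, single quotes, or a nested tag inside the link is
enough to make this pattern miss or mangle a match, which is the
unreliability referred to above.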

You are much better off using a dedicated HTML parser such
as html.parser in the standard library (HTMLParser in Python 2;
the older htmllib was removed in Python 3) or
a third party tool like BeautifulSoup.

These recognise the different tag types and separate the tags and
their attributes from the text content for you. You can then just
ask the parser to find <a> tags and fetch the data from each tag.
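
A rough sketch of that approach, using html.parser from the Python 3
standard library. The LinkCollector class and the sample markup are
hypothetical, written for this illustration rather than taken from the
original post; it keeps only hrefs ending in ".html" and records the
text of each link.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href -> link text for <a> tags whose href ends in .html."""

    def __init__(self):
        super().__init__()
        self.links = {}          # href -> text (insertion-ordered in Python 3.7+)
        self._current = None     # href of the <a> tag we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.endswith(".html"):
                self._current = href
                self.links[href] = ""

    def handle_data(self, data):
        # Text found while inside a matching <a> tag belongs to that link.
        if self._current is not None:
            self.links[self._current] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


# Hypothetical sample input, modelled on the link quoted in this thread.
sample = (
    '<a href="http://example.com/2013/01/problem1.html">Problem No-1</a>\n'
    '<a href="http://example.com/2013/02/problem2.html">Problem No-2</a>\n'
    '<a href="http://example.com/about.php">About</a>\n'
)

parser = LinkCollector()
parser.feed(sample)
parser.close()

urls = list(parser.links)    # the list form asked for
url_to_text = parser.links   # the dictionary form asked for
```

Because the parser hands over attributes already split into name and
value, this does not depend on attribute order or quoting style the way
a hand-written pattern does.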

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




[Tutor] How do I make pattern to find only '.html' file using Python Regular Expression?

2015-04-01 Thread Abdullah Al Imran
I have some HTML content where there are many links as the following pattern:

<a href="http://example.com/2013/01/problem1.html">Problem No-1</a>

I want to filter all the links into a list as:
['http://example.com/2013/01/problem1.html',
'http://example.com/2013/02/problem2.html']

How to do it using Python Regular Expression?

If I want to filter all the links into a dictionary as:
{'http://example.com/2013/01/problem1.html': 'Problem No-1',
 'http://example.com/2013/02/problem2.html': 'Problem No-2'}

How do I do it?