Re: [Tutor] How do I make pattern to find only '.html' file using Python Regular Expression?
Abdullah Al Imran writes:

> How to do it using Python Regular Expression?

Don't assume which tool you must use; instead, ask how best the problem can be solved.

In the case of parsing HTML, regular expressions are a poor fit <http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454> because they're not doing the job of parsing. Use a parser which better understands HTML, like Beautiful Soup <https://pypi.python.org/pypi/BeautifulSoup>.

-- 
“An expert is a man who has made all the mistakes which can be made in a very narrow field.” —Niels Bohr
Ben Finney

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How do I make pattern to find only '.html' file using Python Regular Expression?
On 01/04/15 20:22, Abdullah Al Imran wrote:

> I have some HTML content where there are many links as the following pattern:
>
> <a href="http://example.com/2013/01/problem1.html">Problem No-1</a>
>
> I want to filter all the links into a list as:
> ['http://example.com/2013/01/problem1.html', 'http://example.com/2013/02/problem2.html']
>
> How to do it using Python Regular Expression?

You can try, but regular expressions are not a reliable way to parse HTML. You are much better off using a dedicated HTML parser, such as html.parser in the standard library (HTMLParser in Python 2; the older htmllib was removed in Python 3), or a third-party tool like BeautifulSoup. These recognise the different tag types and separate the tag attributes from the text content for you. You can then ask the parser to find the tags you want and fetch the data from each one.

HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
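
As a rough illustration of the parser approach described above, here is a minimal sketch using html.parser from the Python 3 standard library. The class name LinkCollector and the sample markup are made up for this example, and it assumes each link is a plain <a href="...">text</a> element, which is a guess at what the real content looks like. It keeps only hrefs ending in '.html', then builds both a list of URLs and a URL-to-title dictionary.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for <a> tags whose href ends in '.html'.

    Hypothetical helper written for this example; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.links = []      # (url, text) tuples, in document order
        self._href = None    # href of the matching <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.endswith(".html"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only record text while inside a matching <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Sample markup assumed for illustration; the third link is not a .html file.
sample = (
    '<a href="http://example.com/2013/01/problem1.html">Problem No-1</a>\n'
    '<a href="http://example.com/2013/02/problem2.html">Problem No-2</a>\n'
    '<a href="http://example.com/about.php">About</a>\n'
)

parser = LinkCollector()
parser.feed(sample)
parser.close()

urls = [url for url, _ in parser.links]   # list of matching URLs
titles = dict(parser.links)               # maps each URL to its link text
print(urls)
print(titles)
```

Beautiful Soup would make this shorter, but it is a third-party install; the standard-library version is shown here only so the sketch runs without extra packages.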
[Tutor] How do I make pattern to find only '.html' file using Python Regular Expression?
I have some HTML content containing many links that follow this pattern:

<a href="http://example.com/2013/01/problem1.html">Problem No-1</a>

I want to collect all the links into a list like this:
['http://example.com/2013/01/problem1.html', 'http://example.com/2013/02/problem2.html']

How to do it using Python Regular Expression?

And if I want to collect all the links into a dictionary like this:
{'http://example.com/2013/01/problem1.html': 'Problem No-1', 'http://example.com/2013/02/problem2.html': 'Problem No-2'}

How do I do it?