Re: Generic web parser
On Sat, May 16, 2009 at 2:18 PM, S.Selvam s.selvams...@gmail.com wrote:

> Hi all,
>
> I have to design a web parser that will visit a given list of websites and fetch a particular set of details. It has to be generic enough that, even if we add new websites, it still fetches those details wherever they are available. So it must be something like a framework.
>
> I have written some parsers before, but they parse a given format (e.g. getting the data from the title tag). Here, each website may have a different format, and the information may be available within any tag.
>
> I know it's a tough task for me, but I feel that with Python it should be possible. My request is: if such a thing is already available, please let me know. Your suggestions are also welcome.
>
> Note: I plan to use BeautifulSoup for parsing.
>
> --
> Yours,
> S.Selvam

I'd recommend mechanize in combination with BeautifulSoup - it greatly simplifies most web-scraping tasks.

-- http://mail.python.org/mailman/listinfo/python-list
Re: Generic web parser
http://groups.google.com/group/beautifulsoup/browse_thread/thread/d416dd19fdaa43a6
http://jjinux.blogspot.com/2008/10/python-some-notes-on-lxml.html

andrew
Re: Generic web parser
On Mon, May 18, 2009 at 1:59 PM, Jeremiah Dodds jeremiah.do...@gmail.com wrote:

> I'd recommend mechanize in combination with BeautifulSoup - it greatly simplifies most web-scraping tasks.

Thank you all for your responses.

I have started to develop my design based on BeautifulSoup. I plan to write a separate module for each piece of information I would like to extract from a website, and throw the URL at it. Each module has to extract the required information if it is available: it tries pattern matching and returns the result. I plan to write it in a generic way. I welcome your suggestions.

--
Yours,
S.Selvam
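The design described above (one extractor per piece of information, each tried with pattern matching against the page regardless of which tag holds the data) could be sketched roughly as follows. This is only an illustration under assumptions: the `email` and `phone` fields, their regular expressions, and every function name are hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup so that the snippet runs without extra installs.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from a page, ignoring which tags it sits in."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# One extractor per piece of information. Fields and patterns are
# hypothetical examples, not anything specified in the thread.
def extract_email(text):
    match = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
    return match.group(0) if match else None


def extract_phone(text):
    match = re.search(r"\+?\d[\d\s().-]{7,}\d", text)
    return match.group(0) if match else None


EXTRACTORS = {"email": extract_email, "phone": extract_phone}


def extract_all(html):
    """Run every registered extractor; missing details come back as None."""
    text = page_text(html)
    return {name: func(text) for name, func in EXTRACTORS.items()}
```

Adding a new detail then means adding one function and one entry in `EXTRACTORS`; details that a site does not provide come back as `None`.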
Re: Generic web parser
I don't see any issue with using urllib and SQLite for everything you mention here.
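As a rough sketch of the urllib plus SQLite suggestion above: download a page with the standard library and keep whatever details were extracted in a small SQLite table. Everything named here is an assumption made for illustration (the `details` table, its columns, and the function names), and the code targets Python 3 (`urllib.request`, `sqlite3`) rather than the Python 2 `urllib` that was current when this thread was written.

```python
import sqlite3
import urllib.request


def fetch(url, timeout=10):
    """Download a page and return it as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def open_store(path=":memory:"):
    """Open (or create) the database; the schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS details ("
        " url TEXT NOT NULL,"
        " field TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (url, field))"
    )
    return conn


def save_details(conn, url, details):
    """Store one row per extracted field, replacing any earlier value."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO details (url, field, value) VALUES (?, ?, ?)",
            [(url, field, value) for field, value in details.items()],
        )


def load_details(conn, url):
    """Return the stored fields for a URL as a dict."""
    rows = conn.execute(
        "SELECT field, value FROM details WHERE url = ?", (url,)
    )
    return dict(rows.fetchall())
```

A crawl loop would call `fetch(url)` for each site, run whatever extraction is in place, and pass the resulting dict to `save_details`; re-running the crawl simply overwrites the previous values for that URL.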
Generic web parser
Hi all,

I have to design a web parser that will visit a given list of websites and fetch a particular set of details. It has to be generic enough that, even if we add new websites, it still fetches those details wherever they are available. So it must be something like a framework.

I have written some parsers before, but they parse a given format (e.g. getting the data from the title tag). Here, each website may have a different format, and the information may be available within any tag.

I know it's a tough task for me, but I feel that with Python it should be possible. My request is: if such a thing is already available, please let me know. Your suggestions are also welcome.

Note: I plan to use BeautifulSoup for parsing.

--
Yours,
S.Selvam