Re: Generic web parser

2009-05-18 Thread Jeremiah Dodds
On Sat, May 16, 2009 at 2:18 PM, S.Selvam s.selvams...@gmail.com wrote:

 Hi all,

 I have to design a web parser that will visit a given list of websites and
 fetch a particular set of details.
 It has to be generic enough that, even if we add new websites, it still
 fetches those details wherever they are available.
 So it must be something like a framework.

 I have written some parsers before, but they parse a fixed
 format (for example, taking the data from the title tag). Here, each website
 may have a different format, and the information may be within any tag.

 I know it's a tough task for me, but I feel it should be possible with
 Python.
 My request: if such a thing is already available, please let me know. Your
 suggestions are also welcome.

 Note: I plan to use BeautifulSoup for parsing.

 --
 Yours,
 S.Selvam

 --
 http://mail.python.org/mailman/listinfo/python-list


I'd recommend mechanize in combination with BeautifulSoup - it greatly
simplifies most web-scraping tasks.


Re: Generic web parser

2009-05-18 Thread andrew cooke

http://groups.google.com/group/beautifulsoup/browse_thread/thread/d416dd19fdaa43a6

http://jjinux.blogspot.com/2008/10/python-some-notes-on-lxml.html

andrew




Re: Generic web parser

2009-05-18 Thread S.Selvam

Thank you all for your responses.

I have started developing my design based on BeautifulSoup. I plan to
write a separate module for each piece of information I want to extract
from a website, and to pass the URL to it. Each module has to extract the
required information if it is available.

Each module tries pattern matching and returns the result.

I plan to write it in a generic way. I welcome your suggestions.
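
One way to read the "one module per piece of information" idea is sketched below. This is only an illustration, not the poster's design: it swaps BeautifulSoup for the standard-library html.parser so that it runs without extra packages, and the names TextCollector, page_text, EXTRACTORS, extract_details and the two regular expressions are all hypothetical. The point it shows is that the extractors look at the page's visible text rather than at a specific tag, so a new website needs no new code as long as the detail appears somewhere in the text.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the visible text of a page, regardless of which tag holds it."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return all visible text of an HTML document as one string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical, deliberately simple patterns; real data needs stricter ones.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_emails(text):
    return EMAIL_RE.findall(text)


def extract_phones(text):
    return [match.strip() for match in PHONE_RE.findall(text)]


# One extractor per detail; supporting a new detail means adding one entry.
EXTRACTORS = {
    "email": extract_emails,
    "phone": extract_phones,
}


def extract_details(html):
    """Run every registered extractor over the page text."""
    text = page_text(html)
    return {name: extractor(text) for name, extractor in EXTRACTORS.items()}
```

For example, extract_details("<div>Mail <b>sales@example.com</b></div>") would report the address under "email" and an empty list under "phone". Matching on flattened text trades precision for generality, so each extractor should be prepared to return false positives.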
-- 
Yours,
S.Selvam


Re: Generic web parser

2009-05-17 Thread James Matthews
I don't see any issue with using urllib and SQLite for everything you mention
here.
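
A minimal sketch of that combination follows, assuming Python 3, where the fetching function lives in urllib.request (in 2009 it would have been urllib or urllib2). The table layout and the helper names fetch, open_store, save_details and load_details are assumptions made for illustration; the thread does not specify any schema. Extracted details are assumed to arrive as a dict mapping a field name to a list of values.

```python
import sqlite3
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a page and decode it, replacing undecodable bytes."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def open_store(path=":memory:"):
    """Open (or create) the SQLite database that holds extracted details."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS details ("
        " url TEXT NOT NULL,"
        " field TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " UNIQUE (url, field, value))"
    )
    return conn


def save_details(conn, url, details):
    """Store a {field: [values]} dict for one URL, skipping duplicates."""
    rows = [
        (url, field, value)
        for field, values in details.items()
        for value in values
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO details (url, field, value) VALUES (?, ?, ?)",
            rows,
        )


def load_details(conn, url):
    """Return the stored (field, value) pairs for one URL, sorted."""
    cursor = conn.execute(
        "SELECT field, value FROM details WHERE url = ? ORDER BY field, value",
        (url,),
    )
    return cursor.fetchall()
```

The UNIQUE constraint together with INSERT OR IGNORE means a site can be revisited without piling up duplicate rows.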



Generic web parser

2009-05-16 Thread S.Selvam
Hi all,

I have to design a web parser that will visit a given list of websites and
fetch a particular set of details.
It has to be generic enough that, even if we add new websites, it still
fetches those details wherever they are available.
So it must be something like a framework.

I have written some parsers before, but they parse a fixed format (for
example, taking the data from the title tag). Here, each website may have a
different format, and the information may be within any tag.

I know it's a tough task for me, but I feel it should be possible with Python.
My request: if such a thing is already available, please let me know. Your
suggestions are also welcome.

Note: I plan to use BeautifulSoup for parsing.

-- 
Yours,
S.Selvam