Re: [Tutor] Question about scraping

2014-05-31 Thread Joel Goldstick
On May 30, 2014 10:12 PM, Matthew Ngaha chigga...@gmail.com wrote:

 Thanks for the response Alan. I forgot to reply to the tutor list with my
 second comment. Just in case someone wants to see it, here it is:

 Okay, I think learning how to scrape (library or framework) is not
 worth the trouble, especially if some people consider it illegal.
 Thanks for the input.
 ___

Check out Beautiful Soup.

 Tutor maillist  -  Tutor@python.org
 To unsubscribe or change subscription options:
 https://mail.python.org/mailman/listinfo/tutor


[Tutor] Question about scraping

2014-05-30 Thread Matthew Ngaha
Hey all. I've been meaning to get into web scraping and was pointed in
the direction of lxml (a library) and Scrapy (a framework). Can I ask, in
terms of web scraping, what's the difference between a library and a
framework? Surely everyone should use a framework, but I get the idea
that more people use the available libraries like lxml than Scrapy. Are
there any advantages or disadvantages to each? And which of the two do
you use?

I also have another question due to reading this: [Tutor] HTML
Parsing. It seems some experienced coders don't find scraping as
useful, since web sites offer APIs for their data. Is the idea/concept
here the same as scraping? And is there any use for scraping anymore
when sites are now offering their data?


Re: [Tutor] Question about scraping

2014-05-30 Thread Alan Gauld

On 30/05/14 18:25, Matthew Ngaha wrote:

 Hey all. I've been meaning to get into web scraping and was pointed in
 the direction of lxml (a library) and Scrapy (a framework). Can I ask, in
 terms of web scraping, what's the difference between a library and a
 framework?


I don't know of anything web-specific. A framework tends to be a much
bigger thing than a library. It dictates the architecture of the
solution rather than just providing a few functions/classes.
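To make the distinction concrete, here is a toy sketch in Python. It is not based on lxml or Scrapy, and every name in it (`extract_titles`, `MiniCrawler`, `TitleCrawler`) is hypothetical. With a library you stay in charge and call its functions when you want; with a framework you fill in hooks inside a structure it owns, and it decides when to call them (often called inversion of control).

```python
from typing import Iterable, List


# Library style: a plain helper. Your code decides when (and whether) to call it.
def extract_titles(lines: Iterable[str]) -> List[str]:
    """Return the text after 'Title:' on every line that starts with it."""
    return [line.split(":", 1)[1].strip() for line in lines if line.startswith("Title:")]


# Framework style: a hypothetical mini crawler that owns the control flow.
# Users subclass it and fill in the parse() hook; run() decides when it is called.
class MiniCrawler:
    start_pages: List[str] = []

    def parse(self, page: str) -> Iterable[str]:
        raise NotImplementedError("subclasses supply parse()")

    def run(self) -> List[str]:
        results: List[str] = []
        for page in self.start_pages:          # the framework drives the loop...
            results.extend(self.parse(page))   # ...and calls your hook for each page
        return results


class TitleCrawler(MiniCrawler):
    start_pages = ["Title: First\nBody text", "Title: Second\nMore text"]

    def parse(self, page: str) -> Iterable[str]:
        # Inside a framework hook you can still use library-style helpers.
        return extract_titles(page.splitlines())


print(extract_titles(["Title: Hello", "Body text"]))  # ['Hello']
print(TitleCrawler().run())                           # ['First', 'Second']
```

Real frameworks such as Scrapy wrap that same idea in scheduling, retries and output pipelines, which is where both the convenience and the extra complexity come from.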



 Surely everyone should use a framework


Why?
A framework is usually the fastest way to get started from zero, but if
you are integrating with an existing solution then a framework can add
layers of unneeded complexity. As always, the correct solution depends on
the problem.



 I also have another question due to reading this: [Tutor] HTML
 Parsing. It seems some experienced coders don't find scraping as
 useful, since web sites offer APIs for their data. Is the idea/concept
 here the same as scraping?


No, it's completely different. Scraping means trying to decipher a public
web page that is designed for display in a browser. Web pages are prone
to frequent change, and the data often moves around within the page,
meaning constant updates to your scraper. Also, web pages are
increasingly dynamically generated, which makes scraping much harder.
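A minimal scraping sketch using only the standard library's `html.parser`, assuming a hypothetical page that marks prices with `<span class="price">`. It also shows the fragility described above: after a redesign, the same data in new markup is silently missed. Pages built by JavaScript in the browser would not even contain the data in the HTML this parser sees.

```python
from html.parser import HTMLParser
from typing import List


class PriceScraper(HTMLParser):
    """Collect the text of every <span class="price"> element.

    This works only as long as the site keeps using that exact tag and class.
    """

    def __init__(self) -> None:
        super().__init__()
        self.prices: List[str] = []
        self._in_price = False  # True while inside a matching <span>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs from the start tag.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def scrape_prices(html: str) -> List[str]:
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    return parser.prices


old_page = '<ul><li>Tea <span class="price">2.50</span></li></ul>'
# The same data after a hypothetical redesign: new markup, same information.
new_page = '<ul><li>Tea <b class="cost">2.50</b></li></ul>'

print(scrape_prices(old_page))  # ['2.50']
print(scrape_prices(new_page))  # [] (the scraper silently finds nothing)
```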


An API is relatively stable and returns just the data elements of
the page. As such it's usually easier to use, more secure,
more stable, faster (lower bandwidth required) and has much
less impact on the provider's network/servers, thus improving
performance for everyone.
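For contrast, a sketch of consuming an API's JSON response with the standard `json` module. The response body and its `items`/`name`/`price` fields are hypothetical; a real API documents its own format, and the body would normally come from an HTTP request (for example via `urllib.request.urlopen`) rather than a hard-coded string.

```python
import json
from typing import Dict

# A hypothetical API response body (illustration only).
response_body = '{"items": [{"name": "Tea", "price": 2.5}, {"name": "Coffee", "price": 3.0}]}'


def prices_from_api(body: str) -> Dict[str, float]:
    """Map item names to prices using the API's documented JSON keys."""
    data = json.loads(body)
    return {item["name"]: item["price"] for item in data["items"]}


print(prices_from_api(response_body))  # {'Tea': 2.5, 'Coffee': 3.0}
```

Nothing here depends on tags, classes or page layout, so a visual redesign of the site does not affect it; only a change to the API's own format would.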


 And is there any use for scraping anymore
 when sites are now offering their data?


If a site offers an API that returns the data you need, then use it.
If not, you have few alternatives to scraping (although scraping
may be 'illegal' anyway due to the impact on other users). But scraping,
whether of a web page, a GUI or an old mainframe terminal,
is always a fragile and unsatisfactory solution. An API will
always be better in the long term if it exists.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos



Re: [Tutor] Question about scraping

2014-05-30 Thread ALAN GAULD
 On Fri, May 30, 2014 at 7:20 PM, Alan Gauld alan.ga...@btinternet.com wrote:

  If a site offers an API that returns the data you need, then use it.
  If not, you have few alternatives to scraping (although scraping
  may be 'illegal' anyway due to the impact on other users). But scraping,
  whether of a web page, a GUI or an old mainframe terminal,
  is always a fragile and unsatisfactory solution.

 Okay, I think learning how to scrape (library or framework) is not worth
 the trouble, especially if some people consider it illegal. Thanks for
 the input.


As I say, sometimes you have no choice but to scrape.
It's only 'illegal' if the site owner says so, in other words if their terms
of use prohibit web scraping. If they have gone to the effort (and cost) of
providing an API, then it probably means scraping is prohibited. But many
(most!) sites don't offer APIs, and most smaller sites don't prohibit
scraping, so it is still a valid technique. But before you try, it's always
worth checking whether an API exists and whether scraping is permitted.
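One machine-readable place to start is the site's `robots.txt`, which Python's standard `urllib.robotparser` can read. This is a sketch with hypothetical rules and a hypothetical user-agent name (`my-scraper`); note that `robots.txt` is a convention for crawlers and is not the same thing as a site's terms of use, so it complements rather than replaces reading them.

```python
from urllib import robotparser

# Hypothetical robots.txt content. Against a live site you would instead call
# rp.set_url("https://example.com/robots.txt") and then rp.read().
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 5",
]

rp = robotparser.RobotFileParser()
rp.parse(robots_lines)

print(rp.can_fetch("my-scraper", "https://example.com/products"))      # True
print(rp.can_fetch("my-scraper", "https://example.com/private/data"))  # False
print(rp.crawl_delay("my-scraper"))  # 5 (seconds the site asks crawlers to wait)
```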

And by 'illegal' I mean you are unlikely to be prosecuted in a court, but
you are likely to find your IP address blocked and/or your account closed.
The systems generally monitor activity, and if an account is navigating
through pages too quickly to be a human, they often close the account down.
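A common courtesy (and a way to look less like a runaway bot) is to space out requests. This is a minimal throttling sketch; the `Throttle` class and the page names are hypothetical, the interval is shortened so the example runs quickly, and it does not guarantee that a site won't block you.

```python
import time
from typing import Optional


class Throttle:
    """Enforce a minimum gap (in seconds) between successive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None  # monotonic time of the previous request

    def wait(self) -> None:
        """Sleep just long enough to respect min_interval, then record the time."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# A short interval for demonstration; a polite scraper might wait several
# seconds between pages, or whatever the site's crawl delay asks for.
throttle = Throttle(min_interval=0.2)
start = time.monotonic()
for page in ["page1", "page2", "page3"]:  # hypothetical page names
    throttle.wait()
    print("fetching", page)  # a real scraper would download the page here
elapsed = time.monotonic() - start  # includes two enforced gaps of about 0.2s
print(f"elapsed: {elapsed:.2f}s")
```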

Alan g.


Re: [Tutor] Question about scraping

2014-05-30 Thread Matthew Ngaha
Thanks for the response Alan. I forgot to reply to the tutor list with my
second comment. Just in case someone wants to see it, here it is:

Okay, I think learning how to scrape (library or framework) is not
worth the trouble, especially if some people consider it illegal.
Thanks for the input.