On 06/08/2018 06:02 PM, Brad Roberts wrote:

Essentially (if not actually) everything on GitHub is available through their APIs.  No need for scraping or other heroics to gather it.

That does make things a little bit simpler, but web scraping really isn't all that much more complicated.

Whether you use a web API or web scraping, you still have to submit an HTTP request, parse the results in whatever format the server has chosen to spit out, and possibly follow up with additional HTTP requests. The main differences are just these: web scraping can occasionally get thwarted by changes in the webapp's presentation layer, whereas a web API can occasionally get thwarted by business rules changing what is/isn't accessible via the API (this has been known to happen).

I.e., scraping needs to deal with UI changes, but unlike an API, it cannot be selectively hindered/disabled (unless the primary website itself is hindered/disabled, too).

Thus, a robust tool will support both published web API and web scraping, and select the answers from whichever one works.
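
To make that concrete, here is a rough sketch of the "try the API, fall back to scraping" idea. Everything specific in it is made up for illustration: the example.com URLs, the "title" JSON field, the <h1> markup, and the function names are all assumptions, not any real site's layout. The fetch argument is any function that takes a URL and returns the response body as text, so the sketch does not commit to a particular HTTP library.

```python
import json
import re
from typing import Callable, Sequence

# Assumption: any callable that takes a URL and returns the body as text.
Fetch = Callable[[str], str]


def title_via_api(fetch: Fetch, item_id: str) -> str:
    # Hypothetical JSON endpoint with a hypothetical "title" field.
    body = fetch(f"https://example.com/api/items/{item_id}")
    return json.loads(body)["title"]


def title_via_scraping(fetch: Fetch, item_id: str) -> str:
    # Hypothetical HTML page. The regex stands in for a real HTML parser.
    html = fetch(f"https://example.com/items/{item_id}")
    match = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.S)
    if match is None:
        raise ValueError("page layout changed: no <h1> found")
    return match.group(1).strip()


def get_title(
    fetch: Fetch,
    item_id: str,
    strategies: Sequence[Callable[[Fetch, str], str]] = (
        title_via_api,
        title_via_scraping,
    ),
) -> str:
    # Try each strategy in order and return the first answer that works.
    errors = []
    for strategy in strategies:
        try:
            return strategy(fetch, item_id)
        except Exception as exc:  # blocked API, changed markup, network error
            errors.append(f"{strategy.__name__}: {exc}")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))
```

Note that both strategies have the same shape (request, parse, return), which is the point above: the only real difference is which kind of change breaks them. In real use, fetch could be a thin wrapper around whatever HTTP client the tool already has.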
