Hi all,

I would like to learn how to scrape a password-protected web site. For practice I am using my own Delicious web site. I will obey all applicable rules and legislation.
The Delicious export API was shut down, and I assume the web site itself will be shut down in the foreseeable future. In my Coursera course I learned that it is possible to scrape web sites and extract the information in them. I would like to use this to download my bookmark pages and extract the bookmarks with their accompanying tags, as an alternative to the non-existent export API. I started with:

-- cut --
url_base <- "https://del.icio.us/gmaubach?&page="
date_created <- as.character(Sys.Date())
filename_base <- paste0(date_created, "_Delicious_Page_")
page_start <- 1
page_end <- 670

# seq(page_start, page_end) rather than seq_along(page_start:page_end),
# which would always start at 1 regardless of page_start
for (page in seq(page_start, page_end)) {
  download.file(
    url = paste0(url_base, page),
    destfile = paste0(filename_base, page))
}
-- cut --

This way approx. 1000 bookmarks are not downloaded, because only the public bookmarks are shown. I know that it is possible to authenticate with httr using something like:

-- cut --
library(httr)
page <- GET("https://del.icio.us", authenticate("user", "password"))
-- cut --

To avoid having to authenticate over and over again, it is possible to use handles:

-- cut --
delicious <- handle("https://del.icio.us")
-- cut --

I do not know how to put it all together. What would be a statement sequence for getting all stored bookmarks on pages 1..670 using authentication?

Kind regards
Georg
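P.S. My own (untested) guess at putting the pieces together is below. I am assuming that authenticate() is the right login mechanism here; it sends HTTP basic authentication, and if Delicious expects a login form instead, a POST to the login page would presumably be needed. The user name, password, and the "gmaubach" path are placeholders for my own account details.

-- cut --
library(httr)

# One handle, so the connection and cookies are reused across all requests
delicious <- handle("https://del.icio.us")

date_created <- as.character(Sys.Date())
filename_base <- paste0(date_created, "_Delicious_Page_")

for (page in seq(1, 670)) {
  resp <- GET(
    handle = delicious,
    path = "gmaubach",
    query = list(page = page),
    authenticate("user", "password"))
  stop_for_status(resp)  # fail early on a bad login or a missing page
  writeLines(
    content(resp, as = "text", encoding = "UTF-8"),
    paste0(filename_base, page, ".html"))
}
-- cut --

Is this roughly correct?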