On Tue, Aug 25, 2015 at 10:33 PM, Martin Morgan <mtmor...@fredhutch.org> wrote:
>
> actually I don't know that it does -- it addresses the symptom but I think
> there should be an error from libcurl on the 403 / 404 rather than from
> read.dcf on error page...

Indeed, the only correct behavior is to turn the protocol error code into an R exception. When the server returns a status code >= 400, the request was unsuccessful and the response body does not contain the content the client requested; it should instead be interpreted as an error message or error page. Ignoring this fact and parsing the body as usual is incorrect and leads to all kinds of strange errors downstream.

The other download methods did this correctly; it is unclear why the current implementation of the "libcurl" method does not. Not only does this lead to hard-to-interpret downstream parsing errors, it also makes the behavior of R ambiguous, because it depends on which download method is in use. It is certainly not a limitation of the libcurl library: the 'curl' package has alternative implementations of url() and download.file() which exhibit the correct behavior.

I can only speculate, but if the motivation is to explicitly support retrieval of error pages, perhaps the download.file() and url() functions could gain an argument 'stop_on_error' or something similar which gives the user an option to ignore server errors. However, this behavior should certainly not be the default.

When a function or script contains a line like this:

  download.file("https://someserver.com/mydata.csv", "mydata.csv")

then in the next line of code we must be able to expect that the file "mydata.csv" we have downloaded to our disk is in fact the file "mydata.csv" that was requested from the server. An implementation that instead saves an error page (likely HTML content) to the "mydata.csv" file is simply incorrect and will lead to obvious problems, even with a warning.

[1] https://www.opencpu.org/posts/cran-https/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
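
P.S. To make the proposal above concrete, here is a minimal sketch of what "turn the status code into an R exception" could look like at the R level. The names check_http_status() and download_checked() are hypothetical and exist only for illustration; 'stop_on_error' is the argument name floated above, not an existing argument of download.file(). The sketch assumes the 'curl' package is installed and relies on curl::curl_fetch_memory(), which to my knowledge returns the response (including its status_code) without itself raising an error on 4xx/5xx. It is not how base R implements any of its download methods.

```r
# Hypothetical helper: signal an R error condition for HTTP status >= 400.
# The condition carries the status so callers can inspect it in a handler.
check_http_status <- function(status, url) {
  if (status >= 400L) {
    cond <- structure(
      class = c("http_error", "error", "condition"),
      list(
        message = sprintf("HTTP status %d when requesting '%s'", status, url),
        call = NULL,
        status = status
      )
    )
    stop(cond)
  }
  invisible(status)
}

# Hypothetical wrapper: fetch the response first, check the status, and only
# then write the body to 'destfile'. With stop_on_error = FALSE the body is
# written regardless, which is the opt-in "retrieve the error page" case.
download_checked <- function(url, destfile, stop_on_error = TRUE) {
  res <- curl::curl_fetch_memory(url)  # assumes the 'curl' package is installed
  if (stop_on_error) check_http_status(res$status_code, url)
  writeBin(res$content, destfile)
  invisible(res$status_code)
}
```

With something along these lines, a script can rely on "mydata.csv" either being the requested file or not being written at all, and can handle the failure explicitly, e.g. tryCatch(download_checked(url, "mydata.csv"), http_error = function(e) message(conditionMessage(e))).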