Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-27 Thread Kevin Ushey
Thanks for looking into this so promptly!

Should users expect the behaviour to be congruent across all of the
supported external programs (curl, wget) as well? E.g.

URL <- "http://cran.rstudio.org/no/such/file/here.tar.gz"
download <- function(file, method, ...)
  print(download.file(file, destfile = tempfile(), method = method, ...))

download(URL, method = "internal") ## error
download(URL, method = "curl") ## status code 0
download(URL, method = "wget") ## warning (status code 8)
download(URL, method = "libcurl") ## status code 0

It seems unfortunate that the behaviour differs across each method; at
least in my mind `download.file()` should be a unified interface that
tries to do the 'same thing' regardless of the chosen method.

FWIW, one can force 'curl' to fail on HTTP error codes (-f) and this
can be passed down by R, e.g.

download(URL, method = "curl", extra = "-f") ## warning (status code 22)

but I still think this should be promoted to an error rather than a
warning. (Of course, changing that would imply a backwards
incompatible change; however, I think it would be the correct change).
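
As a rough illustration of what "promoted to an error" could look like at
the user level, here is a minimal sketch. strict_download() is a
hypothetical wrapper, not part of R, and its 'downloader' argument is an
assumption added only so the logic can be tried without network access. It
re-signals any warning as an error and also stops on a non-zero return
value. It cannot help where a method reports status 0 for a missing file,
as method = "curl" does above when -f is not passed.

```r
# Hypothetical wrapper, not part of base R: treat warnings and non-zero
# status codes from a download function as errors.
strict_download <- function(url, destfile, ...,
                            downloader = utils::download.file) {
  status <- tryCatch(
    downloader(url, destfile = destfile, ...),
    # Re-signal any warning (e.g. "status code 22") as an error.
    warning = function(w) stop(conditionMessage(w), call. = FALSE)
  )
  # download.file() returns 0 on success and a non-zero value otherwise.
  if (!identical(as.integer(status), 0L)) {
    stop(sprintf("download of '%s' failed with status %d",
                 url, as.integer(status)), call. = FALSE)
  }
  invisible(status)
}
```

With this in place, strict_download(URL, tempfile(), method = "curl",
extra = "-f") would be expected to stop rather than warn.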

(PS: I just tested r69197 and method = "libcurl" does indeed report an
error now in the above test case on my system [OS X]; thanks!)

Kevin


On Thu, Aug 27, 2015 at 10:27 AM, Martin Morgan  wrote:
> R-devel r69197 returns appropriate errors for the cases below; I know of a
> few rough edges
>
> - ftp error codes are not reported correctly
> - download.file creates destfile before discovering that http fails, leaving
> an empty file on disk
>
> and am happy to hear of more.
>
> Martin
>
>
> On 08/27/2015 08:46 AM, Jeroen Ooms wrote:
>>
>> On Thu, Aug 27, 2015 at 5:16 PM, Martin Maechler
>>  wrote:
>>>
>>> Probably I'm confused now...
>>> Both R-patched and R-devel give an error (after a *long* wait!)
>>> for
>>> download.file("https://someserver.com/mydata.csv", "mydata.csv")
>>>
>>> So that problem is I think  solved now.
>>
>>
>> I'm sorry for the confusion, this was a hypothetical example.
>> Connection failures are different from http status errors. Below some
>> real examples of servers returning http errors. For each example the
>> "internal" method correctly raises an R error, whereas the "libcurl"
>> method does not.
>>
>> # File not found (404)
>> download.file("http://httpbin.org/data.csv", "data.csv", method = "internal")
>> download.file("http://httpbin.org/data.csv", "data.csv", method = "libcurl")
>> readLines(url("http://httpbin.org/data.csv", method = "internal"))
>> readLines(url("http://httpbin.org/data.csv", method = "libcurl"))
>>
>> # Unauthorized (401)
>> download.file("https://httpbin.org/basic-auth/user/passwd",
>>               "data.csv", method = "internal")
>> download.file("https://httpbin.org/basic-auth/user/passwd",
>>               "data.csv", method = "libcurl")
>> readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "internal"))
>> readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "libcurl"))
>>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-27 Thread Martin Morgan
R-devel r69197 returns appropriate errors for the cases below; I know of a few 
rough edges


- ftp error codes are not reported correctly
- download.file creates destfile before discovering that http fails, leaving an 
empty file on disk


and am happy to hear of more.
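
Until the second rough edge is addressed, a user-level workaround can be
sketched as follows. download_then_move() is a hypothetical helper, not
part of R, and the 'downloader' argument is an assumption that exists only
so the behaviour can be checked offline. The idea is to download into a
temporary file next to the target and move it into place only on success,
so a failed transfer leaves nothing at destfile.

```r
# Hypothetical helper: download to a temporary file first and only move it
# into place on success, so a failed transfer leaves no file at 'destfile'.
download_then_move <- function(url, destfile, ...,
                               downloader = utils::download.file) {
  tmp <- tempfile(tmpdir = dirname(destfile))
  # Remove the temporary file on any exit; after a successful rename the
  # file no longer exists under this name and unlink() does nothing.
  on.exit(unlink(tmp), add = TRUE)
  status <- downloader(url, destfile = tmp, ...)
  if (!identical(as.integer(status), 0L)) {
    stop(sprintf("download of '%s' failed with status %d",
                 url, as.integer(status)), call. = FALSE)
  }
  if (!file.rename(tmp, destfile)) {
    stop(sprintf("could not move download to '%s'", destfile), call. = FALSE)
  }
  invisible(destfile)
}
```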

Martin

On 08/27/2015 08:46 AM, Jeroen Ooms wrote:

On Thu, Aug 27, 2015 at 5:16 PM, Martin Maechler
 wrote:

Probably I'm confused now...
Both R-patched and R-devel give an error (after a *long* wait!)
for
download.file("https://someserver.com/mydata.csv", "mydata.csv")

So that problem is I think  solved now.


I'm sorry for the confusion, this was a hypothetical example.
Connection failures are different from http status errors. Below some
real examples of servers returning http errors. For each example the
"internal" method correctly raises an R error, whereas the "libcurl"
method does not.

# File not found (404)
download.file("http://httpbin.org/data.csv", "data.csv", method = "internal")
download.file("http://httpbin.org/data.csv", "data.csv", method = "libcurl")
readLines(url("http://httpbin.org/data.csv", method = "internal"))
readLines(url("http://httpbin.org/data.csv", method = "libcurl"))

# Unauthorized (401)
download.file("https://httpbin.org/basic-auth/user/passwd",
              "data.csv", method = "internal")
download.file("https://httpbin.org/basic-auth/user/passwd",
              "data.csv", method = "libcurl")
readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "internal"))
readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "libcurl"))




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-27 Thread Jeroen Ooms
On Thu, Aug 27, 2015 at 5:16 PM, Martin Maechler
 wrote:
> Probably I'm confused now...
> Both R-patched and R-devel give an error (after a *long* wait!)
> for
>    download.file("https://someserver.com/mydata.csv", "mydata.csv")
>
> So that problem is I think  solved now.

I'm sorry for the confusion, this was a hypothetical example.
Connection failures are different from http status errors. Below some
real examples of servers returning http errors. For each example the
"internal" method correctly raises an R error, whereas the "libcurl"
method does not.

# File not found (404)
download.file("http://httpbin.org/data.csv", "data.csv", method = "internal")
download.file("http://httpbin.org/data.csv", "data.csv", method = "libcurl")
readLines(url("http://httpbin.org/data.csv", method = "internal"))
readLines(url("http://httpbin.org/data.csv", method = "libcurl"))

# Unauthorized (401)
download.file("https://httpbin.org/basic-auth/user/passwd",
              "data.csv", method = "internal")
download.file("https://httpbin.org/basic-auth/user/passwd",
              "data.csv", method = "libcurl")
readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "internal"))
readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "libcurl"))



Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-27 Thread Martin Morgan

On 08/27/2015 08:16 AM, Martin Maechler wrote:

"DM" == Duncan Murdoch 
 on Wed, 26 Aug 2015 19:07:23 -0400 writes:


 DM> On 26/08/2015 6:04 PM, Jeroen Ooms wrote:
 >> On Tue, Aug 25, 2015 at 10:33 PM, Martin Morgan wrote:
 >>>
 >>> actually I don't know that it does -- it addresses the symptom but I
 >>> think there should be an error from libcurl on the 403 / 404 rather
 >>> than from read.dcf on error page...
 >>
 >> Indeed, the only correct behavior is to turn the protocol error code
 >> into an R exception. When the server returns a status code >= 400, it
 >> indicates that the request was unsuccessful and the response body does
 >> not contain the content the client had requested, but should instead
 >> be interpreted as an error message/page. Ignoring this fact and
 >> proceeding with parsing the body as usual is incorrect and leads to
 >> all kind of strange errors downstream.

 DM> Yes.  I haven't been following this long thread.  Is it only in R-devel,
 DM> or is this happening in 3.2.2 or R-patched?

 DM> If the latter, please submit a bug report.  If it is only R-devel,
 DM> please just be patient.  When R-devel becomes R-alpha next year, if the
 DM> bug still exists, please report it.

 DM> Duncan Murdoch

Probably I'm confused now...
Both R-patched and R-devel give an error (after a *long* wait!)
for
download.file("https://someserver.com/mydata.csv", "mydata.csv")

So that problem is, I think, solved now.
Ideally, it would be nice to set the *timeout* as an R function
argument ourselves, though.

Kevin Ushey's original problem however is still in R-patched and
R-devel:

ap <- available.packages("http://www.stats.ox.ac.uk/pub/RWin", method="libcurl")
ap

giving


> ap <- available.packages("http://www.stats.ox.ac.uk/pub/RWin", method="libcurl")
Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin:
  Line starting '
> ap
     Package Version Priority Depends Imports LinkingTo Suggests Enhances
     License License_is_FOSS License_restricts_use OS_type Archs
     MD5sum NeedsCompilation File Repository




and the resulting 'ap' is the same as e.g., with the default
method which also gives a warning and then an empty list (well
"data.frame") of packages.


I don't see a big problem with the above.
It would be better if the warning did not contain the extra
"Line starting '

In Kevin's original post, he was using an earlier version of R, and the code in 
available.packages was returning an error.


The code had been updated (by me) in the version that you are using to return a 
warning, which was the original design and intention (to convert errors during 
repository queries into warnings, so other repositories could be queried; this 
was Kevin's original point).
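
The design described above (turn a failure for one repository into a
warning and carry on with the rest) can be sketched roughly as follows.
This is not the code in available.packages(); query_repos() and its
'fetch_index' argument are hypothetical names used only for illustration.

```r
# Hypothetical sketch: query each repository in turn; a failure for one
# repository becomes a warning so the remaining ones are still queried.
query_repos <- function(repos, fetch_index) {
  results <- list()
  for (name in names(repos)) {
    results[[name]] <- tryCatch(
      fetch_index(repos[[name]]),
      error = function(e) {
        warning(sprintf("unable to access index for repository %s:\n  %s",
                        repos[[name]], conditionMessage(e)), call. = FALSE)
        NULL  # assigning NULL means no entry is stored for this repository
      }
    )
  }
  results
}
```

This only works as intended if the download step raises an error for a 403
or 404, which is the underlying problem discussed next.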


The fix I provided does not address the underlying problem, which is that

  download.file("http://www.stats.ox.ac.uk/pub/RWin/PACKAGES.gz",
                fl <- tempfile(), method="libcurl")

actually downloads the error file, without throwing an error

> download.file("http://www.stats.ox.ac.uk/pub/RWin/PACKAGES.gz",
+               fl <- tempfile(), method="libcurl")

trying URL 'http://www.stats.ox.ac.uk/pub/RWin/PACKAGES.gz'
Content type 'text/html; charset=iso-8859-1' length 302 bytes
==
downloaded 302 bytes

> cat(paste(readLines(fl), collapse="\n"))


404 Not Found

Not Found
The requested URL /pub/RWin/PACKAGES.gz was not found on this server.

Apache/2.2.22 (Debian) Server at www.stats.ox.ac.uk Port 80
>


I do have a patch for this, which I will share off-list before committing.

Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-27 Thread Martin Maechler
> "DM" == Duncan Murdoch 
> on Wed, 26 Aug 2015 19:07:23 -0400 writes:

DM> On 26/08/2015 6:04 PM, Jeroen Ooms wrote:
>> On Tue, Aug 25, 2015 at 10:33 PM, Martin Morgan wrote:
>>>
>>> actually I don't know that it does -- it addresses the symptom but I
>>> think there should be an error from libcurl on the 403 / 404 rather than
>>> from read.dcf on error page...
>> 
>> Indeed, the only correct behavior is to turn the protocol error code
>> into an R exception. When the server returns a status code >= 400, it
>> indicates that the request was unsuccessful and the response body does
>> not contain the content the client had requested, but should instead
>> be interpreted as an error message/page. Ignoring this fact and
>> proceeding with parsing the body as usual is incorrect and leads to
>> all kind of strange errors downstream.

DM> Yes.  I haven't been following this long thread.  Is it only in R-devel,
DM> or is this happening in 3.2.2 or R-patched?

DM> If the latter, please submit a bug report.  If it is only R-devel,
DM> please just be patient.  When R-devel becomes R-alpha next year, if the
DM> bug still exists, please report it.

DM> Duncan Murdoch

Probably I'm confused now...
Both R-patched and R-devel give an error (after a *long* wait!) 
for
   download.file("https://someserver.com/mydata.csv", "mydata.csv")

So that problem is, I think, solved now.
Ideally, it would be nice to set the *timeout* as an R function
argument ourselves, though.
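
For what it is worth, ?download.file documents a global 'timeout' option
that applies to many parts of the transfer, so a per-call timeout can be
approximated today by setting that option temporarily. A minimal sketch,
assuming a hypothetical helper with_timeout() (not part of R); whether a
given download method honours the option should be checked against its
documentation:

```r
# Hypothetical helper: evaluate 'expr' with options(timeout = seconds) in
# effect, restoring the previous value afterwards.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = seconds)
  on.exit(options(old), add = TRUE)
  # 'expr' is a promise, so it is evaluated here, after the option is set.
  force(expr)
}

# Intended use (not run here, needs network access):
# with_timeout(10, download.file("https://someserver.com/mydata.csv",
#                                "mydata.csv"))
```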

Kevin Ushey's original problem however is still in R-patched and
R-devel:

ap <- available.packages("http://www.stats.ox.ac.uk/pub/RWin", method="libcurl")
ap

giving

> ap <- available.packages("http://www.stats.ox.ac.uk/pub/RWin", method="libcurl")
Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin:
  Line starting '
> ap
     Package Version Priority Depends Imports LinkingTo Suggests Enhances
     License License_is_FOSS License_restricts_use OS_type Archs
     MD5sum NeedsCompilation File Repository
>

and the resulting 'ap' is the same as e.g., with the default
method which also gives a warning and then an empty list (well
"data.frame") of packages.


I don't see a big problem with the above.
It would be better if the warning did not contain the extra
   "Line starting '


Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-26 Thread Duncan Murdoch
On 26/08/2015 6:04 PM, Jeroen Ooms wrote:
> On Tue, Aug 25, 2015 at 10:33 PM, Martin Morgan  
> wrote:
>>
>> actually I don't know that it does -- it addresses the symptom but I think 
>> there should be an error from libcurl on the 403 / 404 rather than from 
>> read.dcf on error page...
> 
> Indeed, the only correct behavior is to turn the protocol error code
> into an R exception. When the server returns a status code >= 400, it
> indicates that the request was unsuccessful and the response body does
> not contain the content the client had requested, but should instead
> be interpreted as an error message/page. Ignoring this fact and
> proceeding with parsing the body as usual is incorrect and leads to
> all kind of strange errors downstream.

Yes.  I haven't been following this long thread.  Is it only in R-devel,
or is this happening in 3.2.2 or R-patched?

If the latter, please submit a bug report.  If it is only R-devel,
please just be patient.  When R-devel becomes R-alpha next year, if the
bug still exists, please report it.

Duncan Murdoch

> 
> The other download methods did this correctly, it is unclear why the
> current implementation of the "libcurl" method does not. Not only does
> it lead to hard to interpret downstream parsing errors, it also makes
> the behavior of R ambiguous as it is dependent on which download
> method is in use. It is certainly not a limitation of the libcurl
> library: the 'curl' package has alternative implementations of url()
> and download.file() which exercise the correct behavior.
> 
> I can only speculate, but if the motivation is to explicitly support
> retrieval of error pages, perhaps the download.file() and url()
> functions can gain an argument 'stop_on_error' or something similar
> which give the user an option to ignore server errors. However this
> behavior should certainly not be the default. When a function or
> script contains a line like this:
> 
>   download.file("https://someserver.com/mydata.csv", "mydata.csv")
> 
> Then in the next line of code we must be able to expect that the file
> "mydata.csv" we have downloaded to our disk is in fact the file
> "mydata.csv" that was requested from the server. An implementation
> that instead saves an error page (likely html content) to the
> "mydata.csv" file is simply incorrect and will lead to obvious
> problems, even with a warning.
> 
> 
> [1] https://www.opencpu.org/posts/cran-https/
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-26 Thread Jeroen Ooms
On Tue, Aug 25, 2015 at 10:33 PM, Martin Morgan  wrote:
>
> actually I don't know that it does -- it addresses the symptom but I think 
> there should be an error from libcurl on the 403 / 404 rather than from 
> read.dcf on error page...

Indeed, the only correct behavior is to turn the protocol error code
into an R exception. When the server returns a status code >= 400, it
indicates that the request was unsuccessful and the response body does
not contain the content the client had requested, but should instead
be interpreted as an error message/page. Ignoring this fact and
proceeding with parsing the body as usual is incorrect and leads to
all kind of strange errors downstream.

The other download methods did this correctly, it is unclear why the
current implementation of the "libcurl" method does not. Not only does
it lead to hard to interpret downstream parsing errors, it also makes
the behavior of R ambiguous as it is dependent on which download
method is in use. It is certainly not a limitation of the libcurl
library: the 'curl' package has alternative implementations of url()
and download.file() which exercise the correct behavior.

I can only speculate, but if the motivation is to explicitly support
retrieval of error pages, perhaps the download.file() and url()
functions can gain an argument 'stop_on_error' or something similar
which give the user an option to ignore server errors. However this
behavior should certainly not be the default. When a function or
script contains a line like this:

  download.file("https://someserver.com/mydata.csv", "mydata.csv")

Then in the next line of code we must be able to expect that the file
"mydata.csv" we have downloaded to our disk is in fact the file
"mydata.csv" that was requested from the server. An implementation
that instead saves an error page (likely html content) to the
"mydata.csv" file is simply incorrect and will lead to obvious
problems, even with a warning.
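
To make the suggestion concrete, here is a minimal sketch of the proposed
semantics. download_checked(), the 'stop_on_error' argument and the 'fetch'
function are all hypothetical and are not existing R API; 'fetch' is
assumed to return a list with an integer status_code and a raw content
vector, which is the shape I believe curl::curl_fetch_memory() returns.
The point is that nothing is written to the destination file unless the
status indicates success, or the caller has explicitly asked for the error
page.

```r
# Sketch of the suggested semantics; 'fetch' is a hypothetical function that
# returns list(status_code = <integer>, content = <raw vector>).
download_checked <- function(url, destfile, fetch, stop_on_error = TRUE) {
  res <- fetch(url)
  if (stop_on_error && res$status_code >= 400L) {
    # Nothing has been written to 'destfile' at this point.
    stop(sprintf("cannot download '%s': HTTP status %d",
                 url, as.integer(res$status_code)), call. = FALSE)
  }
  # Reached on success, or when the caller opted in to saving error pages.
  writeBin(res$content, destfile)
  invisible(res$status_code)
}
```

With stop_on_error = FALSE the error page is saved, as the "libcurl"
method currently does, but only because the caller opted in.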


[1] https://www.opencpu.org/posts/cran-https/



Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-25 Thread Kevin Ushey
(final post, sorry to be spamming everyone all day...)

As kindly pointed out by Martin off-list, I was in fact using an old
version of R-devel (it looks like the binaries provided at
http://r.research.att.com/ are currently stale -- although the page
lists r69167 as the current version, the binaries being distributed
are for r69078).

Building R locally with trunk (r69180) and testing confirms that
errors no longer clobber the whole `install.packages()` process;
having the various download methods respect HTTP status / error codes
when using `libcurl` is still an issue but one I imagine that R-core
is aware of.

Thanks, and apologies again for the spam,
Kevin

On Tue, Aug 25, 2015 at 2:41 PM, Kevin Ushey  wrote:
> In fact, this does reproduce on R-devel:
>
> > options(download.file.method = "libcurl")
> > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
> + "http://www.stats.ox.ac.uk/pub/RWin"))
> > install.packages("lattice") ## could be any package
> Installing package into ‘/Users/kevinushey/Library/R/3.3/library’
> (as ‘lib’ is unspecified)
> Error: Line starting '
> > sessionInfo()
> R Under development (unstable) (2015-08-14 r69078)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X 10.10.4 (Yosemite)
>
> I think this could be problematic for users with custom CRAN
> repositories. For example, if I have a CRAN repository that only
> serves source packages (no binary packages), this implies that any R
> session configured to download binary packages would fail to download
> any packages at all (as it would barf on attempting to read the
> non-existent PACKAGES file for the 'binary' branch of the custom
> repository).
>
> This can also be seen by attempting to install a package using current
> R-devel (since no binaries are made available for R 3.3):
>
> > options(download.file.method = "libcurl")
> > options(repos = c(CRAN = "https://cran.rstudio.com/"))
> > print(getOption("pkgType"))
> [1] "both"
> > install.packages("lattice")
> Installing package into ‘/Users/kevinushey/Library/R/3.3/library’
> (as ‘lib’ is unspecified)
> Error in install.packages : Line starting ' ...' is malformed!
>
> The same error (with a different, XML response) is returned when using
> e.g. `https://cran.fhcrc.org`.
>
> Kevin
>
> On Tue, Aug 25, 2015 at 1:33 PM, Martin Morgan  wrote:
>> On 08/25/2015 01:30 PM, Kevin Ushey wrote:
>>>
>>> Hi Martin,
>>>
>>> Indeed it does (and I should have confirmed myself with R-patched and
>>> R-devel
>>> before posting...)
>>
>>
>> actually I don't know that it does -- it addresses the symptom but I think
>> there should be an error from libcurl on the 403 / 404 rather than from
>> read.dcf on error page...
>>
>> Martin
>>
>>
>>>
>>> Thanks, and sorry for the noise.
>>> Kevin
>>>
>>>
>>> On Tue, Aug 25, 2015, 13:11 Martin Morgan wrote:
>>>
>>> On 08/25/2015 12:54 PM, Kevin Ushey wrote:
>>>  > Hi all,
>>>  >
>>>  > The following fails for me (on OS X, although I imagine it's the
>>> same
>>>  > on other platforms using libcurl):
>>>  >
>>>  >  options(download.file.method = "libcurl")
>>>  >  options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
>>>  > "http://www.stats.ox.ac.uk/pub/RWin"))
>>>  >  install.packages("lattice") ## could be any package
>>>  >
>>>  > gives me:
>>>  >
>>>  >  > options(download.file.method = "libcurl")
>>>  >  > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra
>>>  > = "http://www.stats.ox.ac.uk/pub/RWin"))
>>>  >  > install.packages("lattice") ## could be any package
>>>  >  Installing package into ‘/Users/kevinushey/Library/R/3.2/library’
>>>  >  (as ‘lib’ is unspecified)
>>>  >  Error: Line starting '
>>>  >
>>>  > This seems to come from a call to `available.packages()` to a URL
>>> that
>>>  > doesn't exist on the server (likely when querying PACKAGES on the
>>>  > CRANextra repo)
>>>  >
>>>  > Eg.
>>>  >
>>>  >  > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
>>>  >  > available.packages(URL, method = "internal")
>>>  >  Warning: unable to access index for repository
>>>  > http://www.stats.ox.ac.uk/pub/RWin
>>>  >   Package Version Priority Depends Imports LinkingTo Suggests
>>>  > Enhances License License_is_FOSS
>>>  >  License_restricts_use OS_type Archs MD5sum NeedsCompilation
>>>  > File Repository
>>>  >  > available.packages(URL, method = "libcurl")
>>>  >  Error: Line starting '
>>>  >
>>>  > It looks like libcurl downloads and retrieves the 403 page itself,
>>>  > rather than reporting that it was actually forbidden, e.g.:
>>>  >
>>>  >  >
>>>
>>> download.file("http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks

Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-25 Thread Kevin Ushey
In fact, this does reproduce on R-devel:

> options(download.file.method = "libcurl")
> options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
+ "http://www.stats.ox.ac.uk/pub/RWin"))
> install.packages("lattice") ## could be any package
Installing package into ‘/Users/kevinushey/Library/R/3.3/library’
(as ‘lib’ is unspecified)
Error: Line starting '
> sessionInfo()
R Under development (unstable) (2015-08-14 r69078)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

I think this could be problematic for users with custom CRAN
repositories. For example, if I have a CRAN repository that only
serves source packages (no binary packages), this implies that any R
session configured to download binary packages would fail to download
any packages at all (as it would barf on attempting to read the
non-existent PACKAGES file for the 'binary' branch of the custom
repository).

This can also be seen by attempting to install a package using current
R-devel (since no binaries are made available for R 3.3):

> options(download.file.method = "libcurl")
> options(repos = c(CRAN = "https://cran.rstudio.com/"))
> print(getOption("pkgType"))
[1] "both"
> install.packages("lattice")
Installing package into ‘/Users/kevinushey/Library/R/3.3/library’
(as ‘lib’ is unspecified)
Error in install.packages : Line starting ' ...' is malformed!

The same error (with a different, XML response) is returned when using
e.g. `https://cran.fhcrc.org`.

Kevin

On Tue, Aug 25, 2015 at 1:33 PM, Martin Morgan  wrote:
> On 08/25/2015 01:30 PM, Kevin Ushey wrote:
>>
>> Hi Martin,
>>
>> Indeed it does (and I should have confirmed myself with R-patched and
>> R-devel
>> before posting...)
>
>
> actually I don't know that it does -- it addresses the symptom but I think
> there should be an error from libcurl on the 403 / 404 rather than from
> read.dcf on error page...
>
> Martin
>
>
>>
>> Thanks, and sorry for the noise.
>> Kevin
>>
>>
>> On Tue, Aug 25, 2015, 13:11 Martin Morgan wrote:
>>
>> On 08/25/2015 12:54 PM, Kevin Ushey wrote:
>>  > Hi all,
>>  >
>>  > The following fails for me (on OS X, although I imagine it's the
>> same
>>  > on other platforms using libcurl):
>>  >
>>  >  options(download.file.method = "libcurl")
>>  >  options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
>>  > "http://www.stats.ox.ac.uk/pub/RWin"))
>>  >  install.packages("lattice") ## could be any package
>>  >
>>  > gives me:
>>  >
>>  >  > options(download.file.method = "libcurl")
>>  >  > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra
>>  > = "http://www.stats.ox.ac.uk/pub/RWin"))
>>  >  > install.packages("lattice") ## could be any package
>>  >  Installing package into ‘/Users/kevinushey/Library/R/3.2/library’
>>  >  (as ‘lib’ is unspecified)
>>  >  Error: Line starting '
>>  >
>>  > This seems to come from a call to `available.packages()` to a URL
>> that
>>  > doesn't exist on the server (likely when querying PACKAGES on the
>>  > CRANextra repo)
>>  >
>>  > Eg.
>>  >
>>  >  > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
>>  >  > available.packages(URL, method = "internal")
>>  >  Warning: unable to access index for repository
>>  > http://www.stats.ox.ac.uk/pub/RWin
>>  >   Package Version Priority Depends Imports LinkingTo Suggests
>>  > Enhances License License_is_FOSS
>>  >  License_restricts_use OS_type Archs MD5sum NeedsCompilation
>>  > File Repository
>>  >  > available.packages(URL, method = "libcurl")
>>  >  Error: Line starting '
>>  >
>>  > It looks like libcurl downloads and retrieves the 403 page itself,
>>  > rather than reporting that it was actually forbidden, e.g.:
>>  >
>>  >  > download.file("http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz",
>>  > tempfile(), method = "libcurl")
>>  >  trying URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz'
>>  >  Content type 'text/html; charset=iso-8859-1' length 339 bytes
>>  >  ==
>>  >  downloaded 339 bytes
>>  >
>>  > Using `method = "internal"` gives an error related to the inability
>> to
>>  > access that URL due to the HTTP status 403.
>>  >
>>  > The overarching issue here is that package installation shouldn't
>> fail
>>  > even if libcurl fails to access one of the repositories set.
>>  >
>>
>> With
>>
>>   > R.version.string
>> [1] "R version 3.2.2 Patched (2015-08-25 r69179)"
>>
>> the behavior is to warn with an indication of the repository for which
>> the
>> problem occurs
>>
>>   > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
>>   > availa

Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-25 Thread Martin Morgan

On 08/25/2015 01:30 PM, Kevin Ushey wrote:

Hi Martin,

Indeed it does (and I should have confirmed myself with R-patched and R-devel
before posting...)


actually I don't know that it does -- it addresses the symptom but I think there 
should be an error from libcurl on the 403 / 404 rather than from read.dcf on 
error page...


Martin




Thanks, and sorry for the noise.
Kevin


On Tue, Aug 25, 2015, 13:11 Martin Morgan <mtmor...@fredhutch.org> wrote:

On 08/25/2015 12:54 PM, Kevin Ushey wrote:
 > Hi all,
 >
 > The following fails for me (on OS X, although I imagine it's the same
 > on other platforms using libcurl):
 >
 >  options(download.file.method = "libcurl")
 >  options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
 > "http://www.stats.ox.ac.uk/pub/RWin"))
 >  install.packages("lattice") ## could be any package
 >
 > gives me:
 >
 >  > options(download.file.method = "libcurl")
 >  > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra
 > = "http://www.stats.ox.ac.uk/pub/RWin"))
 >  > install.packages("lattice") ## could be any package
 >  Installing package into ‘/Users/kevinushey/Library/R/3.2/library’
 >  (as ‘lib’ is unspecified)
 >  Error: Line starting '
 > This seems to come from a call to `available.packages()` to a URL that
 > doesn't exist on the server (likely when querying PACKAGES on the
 > CRANextra repo)
 >
 > Eg.
 >
 >  > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
 >  > available.packages(URL, method = "internal")
 >  Warning: unable to access index for repository
 > http://www.stats.ox.ac.uk/pub/RWin
 >   Package Version Priority Depends Imports LinkingTo Suggests
 > Enhances License License_is_FOSS
 >  License_restricts_use OS_type Archs MD5sum NeedsCompilation
 > File Repository
 >  > available.packages(URL, method = "libcurl")
 >  Error: Line starting '
 > It looks like libcurl downloads and retrieves the 403 page itself,
 > rather than reporting that it was actually forbidden, e.g.:
 >
 >  > download.file("http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz",
 > tempfile(), method = "libcurl")
 >  trying URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz'
 >  Content type 'text/html; charset=iso-8859-1' length 339 bytes
 >  ==
 >  downloaded 339 bytes
 >
 > Using `method = "internal"` gives an error related to the inability to
 > access that URL due to the HTTP status 403.
 >
 > The overarching issue here is that package installation shouldn't fail
 > even if libcurl fails to access one of the repositories set.
 >

With

  > R.version.string
[1] "R version 3.2.2 Patched (2015-08-25 r69179)"

the behavior is to warn with an indication of the repository for which the
problem occurs

  > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
  > available.packages(URL, method="libcurl")
Warning: unable to access index for repository
http://www.stats.ox.ac.uk/pub/RWin:
Line starting '
  > available.packages(URL, method="internal")
Warning: unable to access index for repository
http://www.stats.ox.ac.uk/pub/RWin:
cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/PACKAGES'
   Package Version Priority Depends Imports LinkingTo Suggests Enhances
   License License_is_FOSS License_restricts_use OS_type Archs MD5sum
   NeedsCompilation File Repository

Does that work for you / address the problem?

Martin

 >> sessionInfo()
 > R version 3.2.2 (2015-08-14)
 > Platform: x86_64-apple-darwin13.4.0 (64-bit)
 > Running under: OS X 10.10.4 (Yosemite)
 >
 > locale:
 > [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
 >
 > attached base packages:
 > [1] stats graphics  grDevices utils datasets  methods   base
 >
 > other attached packages:
 > [1] testthat_0.8.1.0.99  knitr_1.11   devtools_1.5.0.9001
 > [4] BiocInstaller_1.15.5
 >
 > loaded via a namespace (and not attached):
 >   [1] httr_1.0.0 R6_2.0.0.9000  tools_3.2.2parallel_3.2.2
whisker_0.3-2
 >   [6] RCurl_1.95-4.1 memoise_0.2.1  stringr_0.6.2  digest_0.6.4
  evaluate_0.7.2
 >
 > Thanks,
 > Kevin
 >
 > __
 > R-devel@r-project.org  mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel
 >


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2

Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-25 Thread Kevin Ushey
Hi Martin,

Indeed it does (and I should have confirmed myself with R-patched and
R-devel before posting...)

Thanks, and sorry for the noise.
Kevin

On Tue, Aug 25, 2015, 13:11 Martin Morgan  wrote:

> On 08/25/2015 12:54 PM, Kevin Ushey wrote:
> > Hi all,
> >
> > The following fails for me (on OS X, although I imagine it's the same
> > on other platforms using libcurl):
> >
> >  options(download.file.method = "libcurl")
> >  options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
> > "http://www.stats.ox.ac.uk/pub/RWin"))
> >  install.packages("lattice") ## could be any package
> >
> > gives me:
> >
> >  > options(download.file.method = "libcurl")
> >  > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra
> > = "http://www.stats.ox.ac.uk/pub/RWin"))
> >  > install.packages("lattice") ## could be any package
> >  Installing package into ‘/Users/kevinushey/Library/R/3.2/library’
> >  (as ‘lib’ is unspecified)
> >  Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
> >
> > This seems to come from a call to `available.packages()` to a URL that
> > doesn't exist on the server (likely when querying PACKAGES on the
> > CRANextra repo)
> >
> > Eg.
> >
> >  > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
> >  > available.packages(URL, method = "internal")
> >  Warning: unable to access index for repository
> > http://www.stats.ox.ac.uk/pub/RWin
> >   Package Version Priority Depends Imports LinkingTo Suggests
> > Enhances License License_is_FOSS
> >  License_restricts_use OS_type Archs MD5sum NeedsCompilation
> > File Repository
> >  > available.packages(URL, method = "libcurl")
> >  Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
> >
> > It looks like libcurl downloads and retrieves the 403 page itself,
> > rather than reporting that it was actually forbidden, e.g.:
> >
> >  > download.file("http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz",
> > tempfile(), method = "libcurl")
> >  trying URL
> > 'http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz'
> >  Content type 'text/html; charset=iso-8859-1' length 339 bytes
> >  ==
> >  downloaded 339 bytes
> >
> > Using `method = "internal"` gives an error related to the inability to
> > access that URL due to the HTTP status 403.
> >
> > The overarching issue here is that package installation shouldn't fail
> > even if libcurl fails to access one of the repositories set.
> >
>
> With
>
>  > R.version.string
> [1] "R version 3.2.2 Patched (2015-08-25 r69179)"
>
> the behavior is to warn with an indication of the repository for which the
> problem occurs
>
>  > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
>  > available.packages(URL, method="libcurl")
> Warning: unable to access index for repository
> http://www.stats.ox.ac.uk/pub/RWin:
>   Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>   Package Version Priority Depends Imports LinkingTo Suggests Enhances
>   License License_is_FOSS License_restricts_use OS_type Archs MD5sum
>   NeedsCompilation File Repository
>  > available.packages(URL, method="internal")
> Warning: unable to access index for repository
> http://www.stats.ox.ac.uk/pub/RWin:
>   cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/PACKAGES'
>   Package Version Priority Depends Imports LinkingTo Suggests Enhances
>   License License_is_FOSS License_restricts_use OS_type Archs MD5sum
>   NeedsCompilation File Repository
>
> Does that work for you / address the problem?
>
> Martin
>
> >> sessionInfo()
> > R version 3.2.2 (2015-08-14)
> > Platform: x86_64-apple-darwin13.4.0 (64-bit)
> > Running under: OS X 10.10.4 (Yosemite)
> >
> > locale:
> > [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
> >
> > attached base packages:
> > [1] stats graphics  grDevices utils datasets  methods   base
> >
> > other attached packages:
> > [1] testthat_0.8.1.0.99  knitr_1.11   devtools_1.5.0.9001
> > [4] BiocInstaller_1.15.5
> >
> > loaded via a namespace (and not attached):
> >   [1] httr_1.0.0 R6_2.0.0.9000  tools_3.2.2 parallel_3.2.2 whisker_0.3-2
> >   [6] RCurl_1.95-4.1 memoise_0.2.1  stringr_0.6.2  digest_0.6.4 evaluate_0.7.2
> >
> > Thanks,
> > Kevin
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-25 Thread Martin Morgan

On 08/25/2015 12:54 PM, Kevin Ushey wrote:

Hi all,

The following fails for me (on OS X, although I imagine it's the same
on other platforms using libcurl):

 options(download.file.method = "libcurl")
 options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
"http://www.stats.ox.ac.uk/pub/RWin"))
 install.packages("lattice") ## could be any package

gives me:

 > options(download.file.method = "libcurl")
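 > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra
= "http://www.stats.ox.ac.uk/pub/RWin"))
 > install.packages("lattice") ## could be any package
 Installing package into ‘/Users/kevinushey/Library/R/3.2/library’
 (as ‘lib’ is unspecified)
 Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!

This seems to come from a call to `available.packages()` to a URL that
doesn't exist on the server (likely when querying PACKAGES on the
CRANextra repo)

Eg.

 > URL <- "http://www.stats.ox.ac.uk/pub/RWin"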
 > available.packages(URL, method = "internal")
 Warning: unable to access index for repository
http://www.stats.ox.ac.uk/pub/RWin
  Package Version Priority Depends Imports LinkingTo Suggests
Enhances License License_is_FOSS
 License_restricts_use OS_type Archs MD5sum NeedsCompilation
File Repository
 > available.packages(URL, method = "libcurl")
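 Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!

It looks like libcurl downloads and retrieves the 403 page itself,
rather than reporting that it was actually forbidden, e.g.:

 > download.file("http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz",
tempfile(), method = "libcurl")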
 trying URL 
'http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz'
 Content type 'text/html; charset=iso-8859-1' length 339 bytes
 ==
 downloaded 339 bytes

Using `method = "internal"` gives an error related to the inability to
access that URL due to the HTTP status 403.

The overarching issue here is that package installation shouldn't fail
even if libcurl fails to access one of the repositories set.



With

> R.version.string
[1] "R version 3.2.2 Patched (2015-08-25 r69179)"

the behavior is to warn with an indication of the repository for which the 
problem occurs


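> URL <- "http://www.stats.ox.ac.uk/pub/RWin"
> available.packages(URL, method="libcurl")
Warning: unable to access index for repository
http://www.stats.ox.ac.uk/pub/RWin:
  Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
 Package Version Priority Depends Imports LinkingTo Suggests Enhances
 License License_is_FOSS License_restricts_use OS_type Archs MD5sum
 NeedsCompilation File Repository
> available.packages(URL, method="internal")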
Warning: unable to access index for repository 
http://www.stats.ox.ac.uk/pub/RWin:
  cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/PACKAGES'
 Package Version Priority Depends Imports LinkingTo Suggests Enhances
 License License_is_FOSS License_restricts_use OS_type Archs MD5sum
 NeedsCompilation File Repository

Does that work for you / address the problem?

Martin


sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] testthat_0.8.1.0.99  knitr_1.11   devtools_1.5.0.9001
[4] BiocInstaller_1.15.5

loaded via a namespace (and not attached):
  [1] httr_1.0.0 R6_2.0.0.9000  tools_3.2.2 parallel_3.2.2 whisker_0.3-2
  [6] RCurl_1.95-4.1 memoise_0.2.1  stringr_0.6.2  digest_0.6.4   evaluate_0.7.2

Thanks,
Kevin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



[Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

2015-08-25 Thread Kevin Ushey
Hi all,

The following fails for me (on OS X, although I imagine it's the same
on other platforms using libcurl):

options(download.file.method = "libcurl")
options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
"http://www.stats.ox.ac.uk/pub/RWin"))
install.packages("lattice") ## could be any package

gives me:

> options(download.file.method = "libcurl")
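> options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra
= "http://www.stats.ox.ac.uk/pub/RWin"))
> install.packages("lattice") ## could be any package
Installing package into ‘/Users/kevinushey/Library/R/3.2/library’
(as ‘lib’ is unspecified)
Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!

This seems to come from a call to `available.packages()` to a URL that
doesn't exist on the server (likely when querying PACKAGES on the
CRANextra repo)

Eg.

> URL <- "http://www.stats.ox.ac.uk/pub/RWin"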
> available.packages(URL, method = "internal")
Warning: unable to access index for repository
http://www.stats.ox.ac.uk/pub/RWin
 Package Version Priority Depends Imports LinkingTo Suggests
Enhances License License_is_FOSS
License_restricts_use OS_type Archs MD5sum NeedsCompilation
File Repository
> available.packages(URL, method = "libcurl")
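Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!

It looks like libcurl downloads and retrieves the 403 page itself,
rather than reporting that it was actually forbidden, e.g.:

> download.file("http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz",
tempfile(), method = "libcurl")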
trying URL 
'http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz'
Content type 'text/html; charset=iso-8859-1' length 339 bytes
==
downloaded 339 bytes

Using `method = "internal"` gives an error related to the inability to
access that URL due to the HTTP status 403.
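
To make the symptom concrete: the download log above reports content type
'text/html', so the file saved as PACKAGES.gz is most likely the server's
error page rather than a package index. A rough sketch of a check for that
follows. The helper name `looks_like_html()` and the demo file contents are
made up for illustration and are not part of base R.

```r
## Hypothetical helper (not part of base R): guess whether a file that
## should hold a PACKAGES index is really an HTML error page.
looks_like_html <- function(path, n = 5L) {
  lines <- readLines(path, n = n, warn = FALSE)
  ## drop blank lines and surrounding whitespace before inspecting
  lines <- trimws(lines[nzchar(trimws(lines))])
  length(lines) > 0L &&
    grepl("^<(!DOCTYPE|html)", lines[1L], ignore.case = TRUE)
}

## Offline demonstration with two temporary files (contents are invented).
html_file <- tempfile()
writeLines(c("<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">",
             "<html><head><title>403 Forbidden</title></head></html>"),
           html_file)
dcf_file <- tempfile()
writeLines(c("Package: demo", "Version: 1.0"), dcf_file)

looks_like_html(html_file)  # TRUE
looks_like_html(dcf_file)   # FALSE
```

This only inspects the first non-blank line, so it is a heuristic and not a
substitute for checking the HTTP status code itself.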

The overarching issue here is that package installation shouldn't fail
even if libcurl fails to access one of the repositories set.
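
One way to picture the behaviour I have in mind is to query each repository
separately and skip the ones that fail. This is only a sketch under
assumptions: `safe_available()` and `fake_fetch()` are hypothetical names,
the example.org URLs are placeholders, and this is not how
`install.packages()` is implemented.

```r
## Hypothetical wrapper: ask each repository for its index on its own and
## turn a failure into a warning instead of aborting the whole query.
safe_available <- function(repos,
                           fetch = function(repo)
                             utils::available.packages(utils::contrib.url(repo))) {
  results <- lapply(names(repos), function(nm) {
    tryCatch(
      fetch(repos[[nm]]),
      error = function(e) {
        warning("skipping repository '", nm, "': ", conditionMessage(e),
                call. = FALSE)
        NULL
      }
    )
  })
  ## combine the indexes that were retrieved; NULL if every repository failed
  do.call(rbind, Filter(Negate(is.null), results))
}

## Offline demonstration with a stand-in fetcher (no network access).
fake_fetch <- function(repo) {
  if (grepl("bad", repo, fixed = TRUE))
    stop("Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!")
  matrix(c("demo", "1.0"), nrow = 1,
         dimnames = list("demo", c("Package", "Version")))
}
repos <- c(good = "http://good.example.org", bad = "http://bad.example.org")
res <- suppressWarnings(safe_available(repos, fetch = fake_fetch))
```

Here `res` contains only the row from the repository that answered, and the
failing one is reported through a warning.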

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] testthat_0.8.1.0.99  knitr_1.11   devtools_1.5.0.9001
[4] BiocInstaller_1.15.5

loaded via a namespace (and not attached):
 [1] httr_1.0.0 R6_2.0.0.9000  tools_3.2.2 parallel_3.2.2 whisker_0.3-2
 [6] RCurl_1.95-4.1 memoise_0.2.1  stringr_0.6.2  digest_0.6.4   evaluate_0.7.2

Thanks,
Kevin
