Re: [R-pkg-devel] CRAN rules re. web scraping?

2020-01-27 Thread Adam H Sparks
Hi Spencer,
To add to what Roy has already provided: if you have tests that require
Internet access, you should use skip_on_cran() for those tests and wrap
the corresponding examples in \donttest{} to prevent errors on CRAN servers
when the Internet is not available, the server is not responding, or the
resource is unavailable.

Using tryCatch() will be helpful for the end-user experience, but will not
completely fix the issue that is being raised here.
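
A minimal sketch of the tryCatch() idea, assuming the package reads a CSV
from a URL or path. read_remote_csv() is a hypothetical name, not a function
in Ecfun. It turns errors and warnings into a message and returns NULL, which
is what improves the end-user experience. It does not stop examples or tests
from touching the network on CRAN, so it complements skip_on_cran() and
\donttest{} rather than replacing them.

```r
# Hypothetical helper, not part of Ecfun: read a CSV from a URL or path and
# fail gracefully instead of stopping with an error.
read_remote_csv <- function(src, ...) {
  tryCatch(
    utils::read.csv(src, ...),
    error = function(e) {
      message("Could not read '", src, "': ", conditionMessage(e))
      NULL
    },
    # Treating warnings as failures is a deliberately conservative choice.
    warning = function(w) {
      message("Problem reading '", src, "': ", conditionMessage(w))
      NULL
    }
  )
}

# A path that does not exist stands in for an unavailable resource.
dat <- read_remote_csv(file.path(tempdir(), "no-such-file.csv"))

# Everything that depends on the result has to be guarded.
if (!is.null(dat)) {
  print(head(dat))
}
```

The caller still has to check for NULL, which is why the wrapper alone is not
a complete fix.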



Re: [R-pkg-devel] CRAN rules re. web scraping?

2020-01-23 Thread Spencer Graves
  Thanks very much to Iñaki Ucar, Adam H Sparks, and Roy
Mendelssohn for their replies, which helped me understand what I needed to
do to fix the problems identified in the CRAN checks.  I believe that those
problems are now fixed in the development version of Ecfun available at
"https://github.com/sbgraves237/Ecfun".  The package still needs more
work, but I will make Prof. Ripley's Feb. 4 deadline.



  Thanks again,
  Spencer Graves






__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] CRAN rules re. web scraping?

2020-01-22 Thread Iñaki Ucar

The message from Prof. Ripley is crystal-clear, and exposes two issues
(Internet access, writing files) that have been discussed many times
on this list. A quick scan of the CRAN policy [1] yields:

- Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available (and not give a
check warning nor error).

- Packages should not write in the user’s home filespace (including
clipboards), nor anywhere else on the file system apart from the R
session’s temporary directory.

[1] https://cran.r-project.org/web/packages/policies.html
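
The second policy point could be handled along these lines. This is only a
sketch: write_results() is a hypothetical name, and the "testURLresults_"
prefix merely echoes the file name mentioned in the check log. The default
destination is inside the session's temporary directory, and any other
location has to be passed in explicitly by the user.

```r
# Hypothetical writer, not Ecfun's actual code. The default `file` is a fresh
# path under tempdir(); it is re-evaluated on every call.
write_results <- function(x, file = tempfile(pattern = "testURLresults_",
                                             fileext = ".csv")) {
  utils::write.csv(x, file = file, row.names = FALSE)
  invisible(file)  # return the path so the caller can read or delete it
}

res <- data.frame(site = c("PVI", "house"), ok = c(TRUE, FALSE))
path <- write_results(res)

# Confirm that the file was written below the session's temporary directory.
in_tempdir <- startsWith(normalizePath(path), normalizePath(tempdir()))
print(in_tempdir)

# Examples should still clean up after themselves.
unlink(path)
```

Returning the path invisibly lets a reader function take it as an argument
instead of assuming a fixed file name in the working directory.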

Iñaki



-- 
Iñaki Úcar



Re: [R-pkg-devel] CRAN rules re. web scraping?

2020-01-22 Thread Roy Mendelssohn - NOAA Federal via R-package-devel
Hi Spencer:

I think that message means what it says, and I read it as pretty
straightforward and businesslike.  The issue is not web scraping.  There are
two errors here:

1.  You cannot write to the user's space without first explicitly asking
permission of the user.  The suggested policy is to write to a temp directory;
R has tempdir() and related functions for this.

2.  When accessing something over the Internet, failure of the access must be
checked for and the program must exit gracefully.  The second error appears to
be that at times on the builds the .csv file is not downloaded, but there is no
check, so an error is simply thrown.  There are a number of ways to catch such
errors, such as tryCatch(), which will solve this problem.
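
Roughly, the two points combine into something like the following sketch.
fetch_to_temp() is a hypothetical helper, not Ecfun code. The demonstration
uses file:// URLs, an assumption made so that it runs without a network; the
same call accepts http(s) URLs.

```r
# Hypothetical helper (not from Ecfun): download `url` into the session's
# temporary directory and report failure with a message instead of an error.
fetch_to_temp <- function(url, fileext = ".csv") {
  dest <- tempfile(fileext = fileext)
  status <- tryCatch(
    utils::download.file(url, destfile = dest, quiet = TRUE, mode = "wb"),
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      1L  # any non-zero value marks failure
    },
    warning = function(w) {
      message("Download failed: ", conditionMessage(w))
      1L
    }
  )
  # download.file() returns 0 on success; also confirm a non-empty file exists.
  ok <- identical(as.integer(status), 0L) &&
    file.exists(dest) && file.size(dest) > 0
  if (!ok) {
    unlink(dest)  # do not leave a partial file behind
    return(invisible(NULL))
  }
  dest
}

# Offline demonstration: one URL that exists and one that does not.
src <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), src)
good <- fetch_to_temp(paste0("file://", normalizePath(src, winslash = "/")))
bad  <- fetch_to_temp(paste0("file://", src, ".missing"))

if (is.null(bad)) message("Resource unavailable; skipping the rest.")
```

Checking both the return status and the file itself covers the case where the
call returns but nothing usable was written.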

HTH,

-Roy



**
"The contents of this message do not reflect any position of the U.S. 
Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new street address***
110 McAllister Way
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov www: https://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.


[R-pkg-devel] CRAN rules re. web scraping?

2020-01-22 Thread Spencer Graves
Hello, All:


GOOD NEWS AND BAD NEWS:


   * First the good news:  I heard from Brian Ripley;  see below.  
His web site says, "He retired in August 2014 on grounds of ill health." 
(http://www.stats.ox.ac.uk/~ripley/)  I was pleased to see that he seems 
to be well enough to send me the email below.


   * BAD NEWS:  My Ecfun package is violating current CRAN rules 
regarding "not writing anywhere in the file space".  (See below.)


QUESTION:


   How do you suggest I respond to this?


   It's hard for me to fix, because I cannot replicate the error and 
I don't understand the rules Prof. Ripley is trying to enforce. The 
"CRAN Package Check Results for" this package show an error on 1 
platform (r-devel-linux-x86_64-fedora-gcc), NOTEs on 3 platforms 
(Fedora-clang and Debian), and "OK" on 9 others.  I can program selected 
tests not to run on CRAN, e.g., with (!fda::CRAN()).


   However, I suspect I should be able to do better than that.


   Suggestions?


   Thanks,
   Spencer Graves


p.s.  The development version of this package is available at
"https://github.com/sbgraves237/Ecfun".


https://cloud.r-project.org/web/checks/check_results_Ecfun.html


-------- Forwarded Message --------
Subject:   CRAN package Ecfun
Date:      Tue, 21 Jan 2020 21:26:02 +
From:      Prof Brian Ripley
Reply-To:  CRAN
To:        Spencer Graves
CC:        CRAN



This has been intermittently failing its checks for a week: different 
check runs failed (in the 24h prior to) the 14th, 15th, 17th and today. 
The current failure is

Check: examples
Result: ERROR
Running examples in ‘Ecfun-Ex.R’ failed
The error most likely occurred in:

 > ### Name: read.testURLs
 > ### Title: Read a file produced by testURLs
 > ### Aliases: read.testURLs
 > ### Keywords: IO
 >
 > ### ** Examples
 >
 > # Test only 2 web sites, not the default 4,
 > # and test only twice, not the default 10 times:
 > tst <- testURLs(c(
+ PVI="http://en.wikipedia.org/wiki/Cook_Partisan_Voting_Index",
+ house="http://house.gov/representatives"),
+ n=2, maxFail=2)
1
1579634784, PVI, TRUE 0.828
1579634785, house, FALSE 0.051
1579634785, house, FALSE 0.048
2
1579634785, PVI, TRUE 0.043
1579634785, house, FALSE 0.11
1579634785, house, FALSE 0.035
 >
 > # The above should have created a file 'testURLresults.csv'
 > # in the working directory. Read it.
 >
 > dat <- read.testURLs()
Error in read.table(file = file, header = header, sep = sep, quote = 
quote, :
more columns than column names
Calls: read.testURLs -> read.csv -> read.table

That does not conform to the policy on Internet access, not least as no 
attempt is made to check if the file was created, let alone that it has 
the expected layout. Nor does it conform to the policy on not writing 
anywhere in the file space (and that shows on its CRAN results page too).
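
One way to add the missing checks is sketched below. read_checked_csv() and
the column names are hypothetical, not Ecfun's read.testURLs(). The malformed
file mimics a row with more fields than the header, which is the situation in
which read.csv() stops with "more columns than column names".

```r
# Hypothetical reader: check that the file exists and that every row has as
# many fields as the header before parsing it.
read_checked_csv <- function(file, expected = NULL) {
  if (!file.exists(file)) {
    message("File '", file, "' was not created; nothing to read.")
    return(invisible(NULL))
  }
  # Use the same quote and comment settings as read.csv().
  n_fields <- utils::count.fields(file, sep = ",", quote = "\"",
                                  comment.char = "")
  if (length(n_fields) < 1L || anyNA(n_fields) ||
      any(n_fields != n_fields[1L])) {
    message("File '", file, "' does not have a consistent column layout.")
    return(invisible(NULL))
  }
  dat <- utils::read.csv(file)
  if (!is.null(expected) && !identical(names(dat), expected)) {
    message("File '", file, "' does not have the expected column names.")
    return(invisible(NULL))
  }
  dat
}

good <- tempfile(fileext = ".csv")
writeLines(c("time,site,ok", "1579634784,PVI,TRUE"), good)
bad <- tempfile(fileext = ".csv")
writeLines(c("time,site", "1579634785,house,FALSE,0.051"), bad)

dat_good    <- read_checked_csv(good, expected = c("time", "site", "ok"))
dat_bad     <- read_checked_csv(bad)                        # layout problem
dat_missing <- read_checked_csv(file.path(tempdir(), "missing.csv"))
unlink(c(good, bad))
```

Each failure path emits a message and returns NULL, so an example that calls
the reader can finish without an error when the upstream step did not produce
a usable file.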

Please correct ASAP and before Feb 4 to safely retain the package on CRAN.

-- 
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford

