Re: [R] Question about Rfast colMins and colMaxs

Stephen H. Dawson, DSL via R-help Wed, 01 Dec 2021 09:01:59 -0800

Hi Jeff,


Thanks for the reply.

Your attitude in your writing is terse.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com <http://www.shdawson.com>


On 12/1/21 11:31 AM, Jeff Newmiller wrote:

Avi: As I understand it, the definition regarding what is on-topic is not 
targeted at tidyverse... it addresses the futility of trying to support the 
thousands of problem domains represented by contributed packages in one mailing 
list without drowning the subscribers in discussions about problem domains they 
are not interested in. This mailing list happens to be about the R language, 
and extends as far as the packages included with the R language. Tidyverse just 
happens to be outside this scope (though some here are more opposed to using it 
than others and express that sentiment in their responses).

Stephen: As to obtaining min or max of columns, such a question is nonsensical 
when the column contains character data, and mostly nonsensical when the data 
are binary regardless of what package you are using. From this perspective, it 
only makes sense to look at extremes on numeric columns selected from the data 
after they are loaded into a data frame.

     dta <- read.csv( "yourdata.csv" )
     dta_num <- dta[ , c("Elapsed_Time", "Age" ) ]

I don't know this package Rfast or its functions, but I would imagine that you 
could then use

     dta_maxs <- colMaxs( as.matrix( dta_num ) )

which might be a few microseconds faster than a typical base R solution:

     dta_maxs <- apply( dta_num, 2, max )

which works fine without loading a special package for this rather basic 
functionality.
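A minimal sketch of the base R route described above. The data frame here is a made-up stand-in for the result of read.csv("yourdata.csv"); the column names follow the example, and the values are invented for illustration.

```r
# Hypothetical stand-in for read.csv("yourdata.csv"); values are made up.
dta <- data.frame(
  Name         = c("a", "b", "c"),
  Elapsed_Time = c(12.5, 3.2, 7.9),
  Age          = c(34, 51, 28)
)

# Keep only the numeric columns; drop = FALSE keeps a data frame
# even if just one column qualifies.
dta_num <- dta[ , sapply(dta, is.numeric), drop = FALSE]

# Column-wise extremes with base R only.
dta_maxs <- apply(dta_num, 2, max)
dta_mins <- apply(dta_num, 2, min)

print(dta_maxs)
print(dta_mins)
```

Selecting with is.numeric instead of naming the columns is a design choice for the sketch; naming them explicitly, as in the example above, is just as valid when you know which columns you want.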

Sometimes packages get developed to meet unusual needs (thousands of columns) 
or while chasing an optimization that turns out to be marginal (the real time 
sink may be in converting a data frame into a matrix so the compiled C code can 
then process it a smidge quicker), and none of this should be interfering in 
your learning of R but you end up following biased blogs or package vignettes 
that may or may not deliver on their promise to simplify your tasks.

Some aspects of tidyverse are quite effective and I use them regularly... but 
there are a lot of these less valuable contributions also that are distractions 
or are best for corner cases. You can only figure out the difference by 
learning the basics first and comparing approaches (sometimes a package that 
requires you to manually convert to matrix is a good idea, but sometimes it is 
just a pain) for usefulness and/or speed (e.g. using the microbenchmark 
contributed package).
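One way to compare approaches for speed, as a rough sketch. The data are simulated, and the timing part only runs if the contributed package microbenchmark happens to be installed; no particular result is implied.

```r
# Simulated numeric data frame (an assumption for illustration).
set.seed(1)
df_num <- as.data.frame(matrix(rnorm(1e4), ncol = 10))

# Two base R ways to get column maxima; they should agree.
via_apply  <- apply(df_num, 2, max)   # coerces to a matrix first
via_sapply <- sapply(df_num, max)     # works column by column on the list
stopifnot(isTRUE(all.equal(via_apply, via_sapply)))

# Time them only if microbenchmark is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    apply  = apply(df_num, 2, max),
    sapply = sapply(df_num, max),
    times  = 20
  ))
}
```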

On November 30, 2021 7:42:35 PM PST, Avi Gross via R-help 
<r-help@r-project.org> wrote:

Stephen,

Although what is in the STANDARD R distribution can vary several ways, in
general, if you need to add a line like:

library(something)
or
require(something)

and your code does not work unless you have done that, then you can assume
it is not built in to R as it starts.

Having said that, tons of exceptions may exist that cause R to load in
things on your machine for everyone or just for you without you having to
notice.

I think this forum lately has been deluged with questions about all kinds of
add-on packages and in particular, lots of the ones in the tidyverse.
Clearly the purpose here is not that broad.

But since I use some packages like the tidyverse extensively, and I am far
from alone, I wonder if someday the powers that be realize it is a losing
battle to exclude at least some of it. It would be so nice not having to
include a long list of packages for some programs or somehow arrange that
people using something you shared had installed and loaded them. But there
are too many packages out there of varying quality and usefulness and
popularity with more every day being added. Worse, many are somewhat
incompatible such as having functions with the same names that hide earlier
ones loaded.

Base R does come with functions like colSums and colMeans and similar row
functions. But as mentioned, a data.frame is a list of vectors and R
supports certain functional programming constructs over lists using things
like:

lapply(df, min)
sapply(df, min)

And actually quite a few ways depending on what info you want back and
whether you insist it be returned as a list or vector or other things. You
can even supply additional arguments that might be needed such as if you
want to ignore any NA values,

lapply(df, min, na.rm=TRUE)
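A small illustration of the difference between the two calls and of passing na.rm through. The data frame is invented for the example.

```r
# Made-up data with one missing value.
df <- data.frame(x = c(3, NA, 1), y = c(10, 20, 30))

# lapply always returns a list; without na.rm the NA propagates for x.
as_list <- lapply(df, min)

# sapply simplifies to a named vector when it can; extra arguments
# such as na.rm are passed on to min.
as_vector <- sapply(df, min, na.rm = TRUE)

print(as_list)
print(as_vector)
```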

The package you looked at is trying to be fast and uses what looks like
compiled external code but so does lapply.

If this is too bothersome for you, consider making a one-liner function like
this:

mycolMins <- function(df, ...) lapply(df, min, ...)

Once defined, you can use that just fine and not think about it again and I
note this answer (like others) is offering you something in base R that
works fine on data.frames and the like.

You can extend to many similar ideas like this one that calculates the min
unless you over-ride it with max or mean or sd or a bizarre function like
`[` so a call to:

mycolCalc(df, `[`, 3)

Will return exactly the third item in each column (that is, the third row)!
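The definition of mycolCalc does not appear in this message. One plausible version, offered only as a guess that matches the description (min by default, any other function on request), might look like this:

```r
# Hypothetical definition; the original was not included in the message.
mycolCalc <- function(df, FUN = min, ...) lapply(df, FUN, ...)

# Made-up data for illustration.
df <- data.frame(i = 1:3, f = c(1.2, 2.3, 3.4))

mins  <- mycolCalc(df)           # default: minimum of each column
maxs  <- mycolCalc(df, max)      # override with another function
third <- mycolCalc(df, `[`, 3)   # third element of each column

print(third)
```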

I find it to be very common for someone these days to do a quick search for
a way to do something in a language like R and not really look to see if it
is a standard way or something special. Matrices in R are not quite the same
as some other objects like a data.frame or tibble and a package written to
be used on one may (or may not) happen to work with another. Some packages
are carefully written to try to detect what kind of object it gets and when
possible convert it to another. The "apply" function is meant for matrices
but if it sees something else it looks at the dimensionality and tries to
coerce it with as.matrix or as.array first. As others have noted, this means
a data.frame containing non-numeric parts may fail or should have any other
columns hidden/removed as in this df that has some non-numeric fields:

df

  i       s   f     b i2
1 1   hello 1.2  TRUE  5
2 2   there 2.3 FALSE  4
3 3 goodbye 3.4  TRUE  3

So a bit more complex one-liner removes any non-numeric columns like this:

mycolMins(df[, sapply(df, is.numeric)])

$i
[1] 1

$f
[1] 1.2

$i2
[1] 3

Clearly converting that to a matrix while whole would result in everything
being converted to character and a minimum may be elusive.
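To see the coercion for yourself, here is a sketch that rebuilds a data frame like the one printed above and converts it both ways:

```r
# Rebuild a data frame like the one shown above.
df <- data.frame(
  i  = 1:3,
  s  = c("hello", "there", "goodbye"),
  f  = c(1.2, 2.3, 3.4),
  b  = c(TRUE, FALSE, TRUE),
  i2 = c(5, 4, 3)
)

# Converting the whole thing: one character column forces every
# cell to character.
whole <- as.matrix(df)
print(typeof(whole))

# Converting only the numeric columns keeps a numeric matrix.
num_only <- as.matrix(df[ , sapply(df, is.numeric), drop = FALSE])
print(typeof(num_only))
print(apply(num_only, 2, min))
```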

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Stephen H. Dawson,
DSL via R-help
Sent: Tuesday, November 30, 2021 5:37 PM
To: Bert Gunter <bgunter.4...@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Question about Rfast colMins and colMaxs

Oh, you are segmenting standard R from the rest of R.

Well, that part did not come across to me in your original reply. I am not
clear on a standard versus non-standard list. I will look into this aspect
and see what I can learn going forward.


Thanks,
*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com <http://www.shdawson.com>


On 11/30/21 5:26 PM, Bert Gunter wrote:

... but Rfast is *not* a "standard" package, as the rest of the PG
excerpt says. So contact the maintainer and ask him/her what they
think the best practice should be for their package. As has been
pointed out already, it appears to differ from the usual "read it in
as a data frame" procedure.

Bert

On Tue, Nov 30, 2021 at 2:11 PM Stephen H. Dawson, DSL
<serv...@shdawson.com> wrote:

Right, R Studio is not R.

However, the Rfast package is part of R.

https://cran.r-project.org/web/packages/Rfast/index.html

So, rephrasing my question...
What is the best practice to bring a csv file into R so it can be
accessed by colMaxs and colMins, please?

*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com <http://www.shdawson.com>


On 11/30/21 3:19 PM, Bert Gunter wrote:

RStudio is **not** R. In particular, the so-called TidyVerse
consists of all *non*-standard contributed packages, about which the PG
says:

"For questions about functions in standard packages distributed with
R (see the FAQ Add-on packages in R), ask questions on R-help.
[The link is:
https://cran.r-project.org/doc/FAQ/R-FAQ.html#Add-on-packages-in-R
This gives the list of current _standard_ packages]

If the question relates to a contributed package, e.g., one
downloaded from CRAN, try contacting the package maintainer first.
You can also use find("functionname") and
packageDescription("packagename") to find this information. Only
send such questions to R-help or R-devel if you get no reply or need
further assistance. This applies to both requests for help and to
bug reports."
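The two helpers named in the posting guide can be tried directly. A short sketch, assuming a fresh R session with only the default packages attached:

```r
# Where does a function live? colSums comes with base R.
where <- find("colSums")
print(where)

# Packages shipped with R itself carry Priority "base" in their
# DESCRIPTION; contributed packages do not.
prio <- packageDescription("stats")$Priority
print(prio)

# Names of the base-priority packages in this installation.
print(rownames(installed.packages(priority = "base")))
```

If find() returns character(0) for a function name, the function is not on the search path, which usually means its package has not been loaded with library().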

Note that RStudio maintains its own help resources at:
https://community.rstudio.com/
This is where questions about the TidyVerse, ggplot, etc. should be
posted.



Bert Gunter

"The trouble with having an open mind is that people keep coming
along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Nov 30, 2021 at 10:55 AM Stephen H. Dawson, DSL via R-help
<r-help@r-project.org> wrote:

Hi,


I am working to understand the Rfast functions of colMins and
colMaxs. I worked through the example listed on page 54 of the PDF.

https://cran.r-project.org/web/packages/Rfast/index.html

https://cran.r-project.org/web/packages/Rfast/Rfast.pdf

My data is in a CSV file. So, I bring it into R Studio using:
Data <- read.csv("./input/DataSet05.csv", header=T)

However, I read the instructions listed on page 54 of the PDF
saying I need to bring data into R using a matrix. I think read.csv
brings the data in as a dataframe. I think colMins is failing
because it is looking for a matrix but finds a dataframe.

    > colMaxs(Data)
Error in colMaxs(Data) :
      Not compatible with requested type: [type=list; target=double].
    > colMins(Data, na.rm = TRUE)
Error in colMins(Data, na.rm = TRUE) :
      unused argument (na.rm = TRUE)
    > colMins(Data, value = FALSE, parallel = FALSE)
Error in colMins(Data, value = FALSE, parallel = FALSE) :
      Not compatible with requested type: [type=list; target=double].

QUESTION
What is the best practice to bring a csv file into R Studio so it
can be accessed by colMaxs and colMins, please?
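A sketch of one way to get from a CSV file to something colMaxs and colMins will accept, following the replies in this thread: read the file as a data frame, keep the numeric columns, and convert those to a matrix. The file below is a small made-up stand-in written to a temporary path. The value = TRUE argument reflects my reading of the Rfast manual (return the extremes themselves instead of their positions) and should be checked against ?colMaxs for your installed version.

```r
# Write a small made-up CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Elapsed_Time,Age",
             "a,12.5,34",
             "b,3.2,51",
             "c,7.9,28"), csv_path)

# read.csv returns a data frame, which is a list of columns.
Data <- read.csv(csv_path, header = TRUE)

# Keep numeric columns only, then convert to a numeric matrix.
num_cols <- sapply(Data, is.numeric)
m <- as.matrix(Data[ , num_cols, drop = FALSE])

# Use Rfast only if it is installed (assumption: value = TRUE
# returns the values themselves).
if (requireNamespace("Rfast", quietly = TRUE)) {
  print(Rfast::colMaxs(m, value = TRUE))
  print(Rfast::colMins(m, value = TRUE))
}

# Base R equivalent on the same matrix.
base_maxs <- apply(m, 2, max)
print(base_maxs)
```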


Thanks,
--
*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com <http://www.shdawson.com>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
