Re: [R] problem: try to passing macro value into submit block

2021-12-21 Thread David Winsemius



On 12/21/21 6:00 PM, Kai Yang via R-help wrote:

Hi team,
I'm trying to pass a macro variable into an R script in Proc iml. I want to 
change the variable in color= and export the result under a different file name. 
If I don't use a macro, the code works well. But when I try to use the macro below, 
I get the error message: "Submit block cannot be directly placed in a macro. 
Instead, place the submit block into a file first and then use %include to include 
the file within a macro definition." After reading the message, I am still not sure 
how to fix the problem in the code. Can anyone help me?
Thank you,
Kai

%macro pplot(a);
proc iml;
submit / R;
library(ggplot2)
library(tidyverse)
mpg %>% filter(hwy < 35) %>%
  ggplot(aes(x = displ, y = hwy, color = )) +
  geom_point()
ggsave("c:/temp/")
endsubmit;
quit;
%mend;
%pplot(drv);
%pplot(cyl);

[[alternative HTML version deleted]]



Two problems I see: 1) you posted to R-help using HTML, whereas the 
mailing list is a plain-text venue, and 2) I am reasonably sure that's a 
SAS error message, and we don't consult on SAS problems.


If you strip out the stuff involving "" and add back in the elided 
line-breaks the R code runs without error.
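
For the R side alone, one way to vary the colour variable and the output file name is an ordinary R function rather than a macro parameter. This is only a sketch: it assumes ggplot2 is installed, and the function name pplot, the out_dir argument and the file-name pattern are made up here, since the original values were lost from the post. Base subsetting stands in for filter() to keep dependencies down.

```r
library(ggplot2)

# Hypothetical helper: colour the scatterplot by the column named in `col`
# and save it under a file name derived from that column.
pplot <- function(col, out_dir = tempdir()) {
  dat <- mpg[mpg$hwy < 35, ]
  p <- ggplot(dat, aes(x = displ, y = hwy, color = .data[[col]])) +
    geom_point()
  # Assumed naming scheme; the PDF device is available on any R install.
  out_file <- file.path(out_dir, paste0("mpg_by_", col, ".pdf"))
  ggsave(out_file, plot = p, width = 6, height = 4)
  invisible(out_file)
}

# One call per variable, mirroring the two macro calls in the post.
files <- vapply(c("drv", "cyl"), pplot, character(1))
```

The .data[[col]] pronoun lets the column be chosen by a character string, which is what the macro parameter was meant to do.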


--

David.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] problem: try to passing macro value into submit block

2021-12-21 Thread Kai Yang via R-help
Hi team,
I'm trying to pass a macro variable into an R script in Proc iml. I want to 
change the variable in color= and export the result under a different file name. 
If I don't use a macro, the code works well. But when I try to use the macro below, 
I get the error message: "Submit block cannot be directly placed in a macro. 
Instead, place the submit block into a file first and then use %include to include 
the file within a macro definition." After reading the message, I am still not sure 
how to fix the problem in the code. Can anyone help me?
Thank you,
Kai

%macro pplot(a);
proc iml;
submit / R;
library(ggplot2)
library(tidyverse)
mpg %>% filter(hwy < 35) %>%
  ggplot(aes(x = displ, y = hwy, color = )) +
  geom_point()
ggsave("c:/temp/")
endsubmit;
quit;
%mend;
%pplot(drv);
%pplot(cyl);

[[alternative HTML version deleted]]



Re: [R] Creating NA equivalent

2021-12-21 Thread Avi Gross via R-help
Jim,

there are indeed many mathematical areas where data are not quite fixed. 
Consider inequalities such as a value that can be higher than some number but 
lower than another. A grade of A can often mean a score between 90 and 100 (no 
extra credit). An event deemed to be "significant" at the 95% level of 
probability can be in a 5% range or, based on various errors, may not even be in 
the range. In some places you can have infinitesimals or things approaching 
infinity and yet sometimes cancel things out without having an exact number.

The list of such things is vast and as was already pointed out here, many such 
cases have some info, even USEFUL info, that is lost if you declare them to be 
an NA or an Inf or by, say, choosing to view an A as exactly 95. If a student has 
straight A's, there is an excellent chance many of those A's came from scores 
above 95. A student with an overall C average may be more likely to have the 
single A be in the low 90's. 

R was not necessarily designed to work this way. For some purposes, you may 
want to use a variable that is more of a range. When I make plots in ggplot, I 
often use Inf or -Inf to specify one end of a range, so that, for example, 
whatever the data makes ggplot choose for upper and lower bounds, something I 
draw in the background will extend to that border. 

But there is a difference between how we store info, and how we use it. Many R 
functions have a feature like saying na.rm=TRUE that may not make sense if you 
store a value as an NA whose meaning is "between 95 and 100". You might want to 
write code that makes two copies of any vector which has an NA value associated 
with a range, and do something like place the minimum value(s) in one and the 
maximum in the other and then do some complex calculation.
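
As a rough illustration of the two-vector idea, here is a base R sketch. The grade bands (A = 90 to 100, C = 70 to 79) and the scores are made up for the example; an exactly known value simply has the same lower and upper bound.

```r
# Scores known only as ranges: exact values have lower == upper.
# The bands and values below are illustrative assumptions.
lower <- c(90, 70, 85, 90)
upper <- c(100, 79, 85, 100)

# The mean increases with every element, so averaging the lower ends and
# the upper ends gives the tightest possible range for the mean.
mean_range <- c(low = mean(lower), high = mean(upper))
mean_range
```

Nothing is discarded here the way na.rm = TRUE would discard an NA; the uncertainty is carried through to the result instead.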

Or consider a value like measuring a room with a ruler accurate only to 1/4 
inch? If a side is 100 inches, the real value can be between 99.75 and 100.25 
inches. Each measurement can be stored as a number and a plus/minus. To 
calculate the volume of a room, you might multiply all the low values to get 
one number and the high values to get another and store that as a range or 
whatever else makes sense, like averaging the two. 
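
The room calculation can be sketched in a few lines of base R. The three side lengths are invented for the example; only the 1/4 inch accuracy comes from the text above.

```r
# Each side is a measured value plus or minus the ruler's accuracy.
# The side lengths are made-up numbers for illustration.
sides <- c(length = 100, width = 120, height = 96)
tol <- 0.25

# All lower bounds are positive, so the product is smallest when every
# side is at its low end and largest when every side is at its high end.
vol_low  <- prod(sides - tol)
vol_high <- prod(sides + tol)
c(low = vol_low, high = vol_high, midpoint = (vol_low + vol_high) / 2)
```

Note that the midpoint of the two volumes is not the same as the product of the nominal sides, which is one reason to keep the range rather than a single number.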

Still, some of that is normally ignored or done some other way, without 
inventing new meanings for NA. I noted earlier that programs outside R will 
often do something like store out-of-band info that when imported into R is 
always treated as NA. Something may be unavailable because the person did not 
show up, others because they had horrible handwriting and the one who typed it 
in guessed what it said, and others because they refused to answer. It may be that much 
of your program should treat all those as NA but other parts might want to 
record that some percent of the responders did this or that. As noted, Adrian 
Dusa and others had such needs and have a package that in some way annotates NA 
values when asked. I have played with it but currently have no need for it. 
And, just FYI, Adrian tried other things first, as there already are multiple 
bit patterns that mean specific variations on an NA such as NA_integer_ (note 
the two underscores) and other variants for character, real, complex and a few 
more. In a bizarre way, you can play games and test them as in:

  > a=NA_integer_
  > b=NA_character_
  > identical(a, NA_integer_)
  [1] TRUE
  > identical(a, NA_character_)
  [1] FALSE
  > identical(a, a)
  [1] TRUE
  > identical(a, b)
  [1] FALSE
  > identical(a, NA)
  [1] FALSE

So, in THEORY, you might get away with using these oddball bitmap variations, or 
adding to them, but they do not survive well in vectors, which must in some sense 
only contain one type. I have had some minor success making a list and testing the 
contents, which normally show all versions as NA but clearly retain subtle 
differences:

  > temp=list(1, NA_integer_, 2, NA_character_, 3, NA)
  > temp
  [[1]]
  [1] 1
  
  [[2]]
  [1] NA
  
  [[3]]
  [1] 2
  
  [[4]]
  [1] NA
  
  [[5]]
  [1] 3
  
  [[6]]
  [1] NA
  
  > temp[[2]]
  [1] NA
  > identical(temp[[2]], NA_integer_)
  [1] TRUE
  > identical(temp[[2]], NA_character_)
  [1] FALSE
  > identical(temp[[4]], NA_character_)
  [1] TRUE

So, yes, I can imagine a subtle window of opportunity for re-using some of 
these NA variants to act like an NA but also be able to carefully signal some 
other opportunities. But as noted, vectors break the scheme so your data.frame 
might need to use list columns, which is doable. I bet many tools you use, 
especially ones that make copies or conversions, will break the scheme.
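
A short base R check of both halves of that claim, as a sketch: an atomic vector coerces the typed NAs to one type, while a list column keeps them apart. The data frame and column names are arbitrary.

```r
# Combining typed NAs in an atomic vector coerces them to one common type,
# so the distinction is lost.
v <- c(NA_integer_, NA_character_)
typeof(v)                        # "character"
identical(v[1], NA_character_)   # TRUE: the integer NA became a character NA

# A list column in a data frame keeps each element's own type.
df <- data.frame(id = 1:3)
df$value <- list(1, NA_integer_, NA_character_)
vapply(df$value, typeof, character(1))
```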

Please note that for ME, the above discussion is academic and a reaction to the 
ideas raised by others. I am not in any way suggesting R is deficient for not 
being designed for things like this, nor that wanting some such feature is a 
bad thing. What Adrian provided is sort of in between as real NA are stored but 
also some attributes record what the NA is supposed to represent.
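
To make the "real NA plus attributes" idea concrete, here is a minimal base R sketch. It is not the API of Adrian's package; tag_na, na_reason and the attribute name are hypothetical names chosen for the example.

```r
# Hypothetical helpers (not the 'declared' package API): store a plain NA in
# the vector and keep the reason in a parallel attribute.
tag_na <- function(x, reason) {
  stopifnot(length(reason) == length(x))
  reason[!is.na(x)] <- NA_character_   # reasons only apply to missing values
  attr(x, "na_reason") <- reason
  x
}
na_reason <- function(x) attr(x, "na_reason")

x <- tag_na(c(4.2, NA, NA, 7.1),
            c(NA, "no-show", "refused", NA))

mean(x, na.rm = TRUE)   # ordinary NA handling still works
table(na_reason(x))     # counts per reason

# Subsetting drops the attribute, so the annotation is easily lost.
attr(x[2:3], "na_reason")
```

The last line shows the fragility mentioned earlier: plain subsetting already discards the annotation, which is why a dedicated class with its own methods is needed for anything serious.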





Re: [R] [EXT] Re: Creating NA equivalent

2021-12-21 Thread David K Stevens

Hello all,

My two cents. We use the term "below the detection limit" for any 
physical measurement that cannot be distinguished from noise in the 
measurement system. This may either be instance specific (determine the 
detection limit for each instance) or "below the reporting limit" which 
is usually set at the maximum of the detection limits found for each 
instance as an administrative simplification. Either way the 
interpretation is "the value is between 0 and the limit" which carries 
information just as the "at least this much" limit in survival analysis. 
Some data sets have both lower and upper censoring and survival analysis 
appears to be the most appropriate. This is discussed in detail by 
Dennis Helsel from Practical Stats and is captured in the NADA package 
in R for environmental and hydrological data.
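
For readers who have not seen censored values encoded this way, here is a small sketch using Surv() from the survival package with type = "interval2". The concentrations and limits are made up, and this shows only the encoding, not an analysis.

```r
library(survival)   # a recommended package, normally installed with R

# Made-up concentrations: NA in `lo` means "below the limit in `hi`",
# NA in `hi` means "above the limit in `lo`", lo == hi means measured exactly.
lo <- c(NA, 1.2, 10, 3.4)
hi <- c(0.5, 1.2, NA, 3.4)

# Each observation becomes an interval; the status column records whether
# it is left-censored, exact or right-censored.
y <- Surv(lo, hi, type = "interval2")
y
```

An object like y can then be used as the response in the package's modelling functions, so the "between 0 and the limit" information is used rather than thrown away.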


regards

David Stevens

On 12/21/2021 4:35 PM, Jim Lemon wrote:

Hi Bert,
What troubles me about this is that something like detectable level(s)
is determined at a particular time and may change. Censoring in
survival tells us that the case lasted "at least this long". While a
less than detectable value doesn't give any useful information apart
from perhaps "non-zero", an over limit value gives something like
censoring with "at least this much". However, it is more difficult to
conceptualize and I suspect, to quantify. To me, the important
information is that we think there _may be_ a value but we don't
(yet?) know it.

Jim

On Wed, Dec 22, 2021 at 9:56 AM Bert Gunter  wrote:

But you appear to be missing something, Jim -- see inline below (and
the original post):

Bert


On Tue, Dec 21, 2021 at 2:00 PM Jim Lemon  wrote:

Please pardon a comment that may be off-target as well as off-topic.
This appears similar to a number of things like fuzzy logic, where an
instance can take incompatible truth values.

It is known that an instance may have an attribute with a numeric
value, but that value cannot be determined.

Yes, but **something** about the value is known: that it is > an upper
value or < a lower value. Such information should be used
(censoring!), not characterized as completely unknown. Think about it
in terms of survival time: saying that a person lasted longer than k
months is much more informative than saying that how long they lasted
is completely unknown!


It seems to me that an appropriate designation for the value is Unk,
perhaps with an associated probability of determination to distinguish
it from NA (it is definitely not known).

Jim

On Wed, Dec 22, 2021 at 6:55 AM Avi Gross via R-help
 wrote:

I wonder if the package Adrian Dușa created might be helpful or point you along 
the way.

It was eventually named "declared"

https://cran.r-project.org/web/packages/declared/index.html

With a vignette here:

https://cran.r-project.org/web/packages/declared/vignettes/declared.pdf

I do not know if it would easily satisfy your needs but it may be a step along 
the way. A package called Haven was part of the motivation and Adrian wanted a 
way to import data from external sources that had more than one category of NA 
that sounds a bit like what you want. His functions should allow the creation 
of such data within R, as well. I am including him in this email if you want to 
contact him or he has something to say.


-Original Message-
From: R-help  On Behalf Of Duncan Murdoch
Sent: Tuesday, December 21, 2021 5:26 AM
To: Marc Girondot ; r-help@r-project.org
Subject: Re: [R] Creating NA equivalent

On 20/12/2021 11:41 p.m., Marc Girondot via R-help wrote:

Dear members,

I work on dosage and some values are below the detection limit. I
would like to create new "numbers" like LDL (to represent lower than
detection limit) and UDL (above the detection limit) that behave like
NA, with the possibility to test them using for example is.LDL() or
is.UDL().

Note that NA is not the same as LDL or UDL: NA represents missing data.
Here the data is available as LDL or UDL.

NA is built very deep into the R language... any option to create a new
version of an NA-equivalent?


There was a discussion of this back in May.  Here's a link to one approach that 
I suggested:

https://stat.ethz.ch/pipermail/r-devel/2021-May/080776.html

Read the followup messages, I made at least one suggested improvement.
I don't know if anyone has packaged this, but there's a later version of the 
code here:

https://stackoverflow.com/a/69179441/2554330

Duncan Murdoch


Re: [R] Creating NA equivalent

2021-12-21 Thread Jim Lemon
Hi Bert,
What troubles me about this is that something like detectable level(s)
is determined at a particular time and may change. Censoring in
survival tells us that the case lasted "at least this long". While a
less than detectable value doesn't give any useful information apart
from perhaps "non-zero", an over limit value gives something like
censoring with "at least this much". However, it is more difficult to
conceptualize and I suspect, to quantify. To me, the important
information is that we think there _may be_ a value but we don't
(yet?) know it.

Jim

On Wed, Dec 22, 2021 at 9:56 AM Bert Gunter  wrote:
>
> But you appear to be missing something, Jim -- see inline below (and
> the original post):
>
> Bert
>
>
> On Tue, Dec 21, 2021 at 2:00 PM Jim Lemon  wrote:
> >
> > Please pardon a comment that may be off-target as well as off-topic.
> > This appears similar to a number of things like fuzzy logic, where an
> > instance can take incompatible truth values.
> >
> > It is known that an instance may have an attribute with a numeric
> > value, but that value cannot be determined.
> Yes, but **something** about the value is known: that it is > an upper
> value or < a lower value. Such information should be used
> (censoring!), not characterized as completely unknown. Think about it
> in terms of survival time: saying that a person lasted longer than k
> months is much more informative than saying that how long they lasted
> is completely unknown!
>
> >
> > It seems to me that an appropriate designation for the value is Unk,
> > perhaps with an associated probability of determination to distinguish
> > it from NA (it is definitely not known).
> >
> > Jim
> >
> > On Wed, Dec 22, 2021 at 6:55 AM Avi Gross via R-help
> >  wrote:
> > >
> > > I wonder if the package Adrian Dușa created might be helpful or point you 
> > > along the way.
> > >
> > > It was eventually named "declared"
> > >
> > > https://cran.r-project.org/web/packages/declared/index.html
> > >
> > > With a vignette here:
> > >
> > > https://cran.r-project.org/web/packages/declared/vignettes/declared.pdf
> > >
> > > I do not know if it would easily satisfy your needs but it may be a step 
> > > along the way. A package called Haven was part of the motivation and 
> > > Adrian wanted a way to import data from external sources that had more 
> > > than one category of NA that sounds a bit like what you want. His 
> > > functions should allow the creation of such data within R, as well. I am 
> > > including him in this email if you want to contact him or he has 
> > > something to say.
> > >
> > >
> > > -Original Message-
> > > From: R-help  On Behalf Of Duncan Murdoch
> > > Sent: Tuesday, December 21, 2021 5:26 AM
> > > To: Marc Girondot ; r-help@r-project.org
> > > Subject: Re: [R] Creating NA equivalent
> > >
> > > On 20/12/2021 11:41 p.m., Marc Girondot via R-help wrote:
> > > > Dear members,
> > > >
> > > > > I work on dosage and some values are below the detection limit. I
> > > > > would like to create new "numbers" like LDL (to represent lower than
> > > > > detection limit) and UDL (above the detection limit) that behave like
> > > > > NA, with the possibility to test them using for example is.LDL() or
> > > > > is.UDL().
> > > > >
> > > > > Note that NA is not the same as LDL or UDL: NA represents missing data.
> > > > > Here the data is available as LDL or UDL.
> > > > >
> > > > > NA is built very deep into the R language... any option to create a new
> > > > > version of an NA-equivalent?
> > > >
> > >
> > > There was a discussion of this back in May.  Here's a link to one 
> > > approach that I suggested:
> > >
> > >https://stat.ethz.ch/pipermail/r-devel/2021-May/080776.html
> > >
> > > Read the followup messages, I made at least one suggested improvement.
> > > I don't know if anyone has packaged this, but there's a later version of 
> > > the code here:
> > >
> > >https://stackoverflow.com/a/69179441/2554330
> > >
> > > Duncan Murdoch
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating NA equivalent

2021-12-21 Thread Bert Gunter
But you appear to be missing something, Jim -- see inline below (and
the original post):

Bert


On Tue, Dec 21, 2021 at 2:00 PM Jim Lemon  wrote:
>
> Please pardon a comment that may be off-target as well as off-topic.
> This appears similar to a number of things like fuzzy logic, where an
> instance can take incompatible truth values.
>
> It is known that an instance may have an attribute with a numeric
> value, but that value cannot be determined.
Yes, but **something** about the value is known: that it is > an upper
value or < a lower value. Such information should be used
(censoring!), not characterized as completely unknown. Think about it
in terms of survival time: saying that a person lasted longer than k
months is much more informative than saying that how long they lasted
is completely unknown!

>
> It seems to me that an appropriate designation for the value is Unk,
> perhaps with an associated probability of determination to distinguish
> it from NA (it is definitely not known).
>
> Jim
>
> On Wed, Dec 22, 2021 at 6:55 AM Avi Gross via R-help
>  wrote:
> >
> > I wonder if the package Adrian Dușa created might be helpful or point you 
> > along the way.
> >
> > It was eventually named "declared"
> >
> > https://cran.r-project.org/web/packages/declared/index.html
> >
> > With a vignette here:
> >
> > https://cran.r-project.org/web/packages/declared/vignettes/declared.pdf
> >
> > I do not know if it would easily satisfy your needs but it may be a step 
> > along the way. A package called Haven was part of the motivation and Adrian 
> > wanted a way to import data from external sources that had more than one 
> > category of NA that sounds a bit like what you want. His functions should 
> > allow the creation of such data within R, as well. I am including him in 
> > this email if you want to contact him or he has something to say.
> >
> >
> > -Original Message-
> > From: R-help  On Behalf Of Duncan Murdoch
> > Sent: Tuesday, December 21, 2021 5:26 AM
> > To: Marc Girondot ; r-help@r-project.org
> > Subject: Re: [R] Creating NA equivalent
> >
> > On 20/12/2021 11:41 p.m., Marc Girondot via R-help wrote:
> > > Dear members,
> > >
> > > > I work on dosage and some values are below the detection limit. I
> > > > would like to create new "numbers" like LDL (to represent lower than
> > > > detection limit) and UDL (above the detection limit) that behave like
> > > > NA, with the possibility to test them using for example is.LDL() or
> > > > is.UDL().
> > > >
> > > > Note that NA is not the same as LDL or UDL: NA represents missing data.
> > > > Here the data is available as LDL or UDL.
> > > >
> > > > NA is built very deep into the R language... any option to create a new
> > > > version of an NA-equivalent?
> > >
> >
> > There was a discussion of this back in May.  Here's a link to one approach 
> > that I suggested:
> >
> >https://stat.ethz.ch/pipermail/r-devel/2021-May/080776.html
> >
> > Read the followup messages, I made at least one suggested improvement.
> > I don't know if anyone has packaged this, but there's a later version of 
> > the code here:
> >
> >https://stackoverflow.com/a/69179441/2554330
> >
> > Duncan Murdoch


Re: [R] Creating NA equivalent

2021-12-21 Thread Jim Lemon
Please pardon a comment that may be off-target as well as off-topic.
This appears similar to a number of things like fuzzy logic, where an
instance can take incompatible truth values.

It is known that an instance may have an attribute with a numeric
value, but that value cannot be determined.

It seems to me that an appropriate designation for the value is Unk,
perhaps with an associated probability of determination to distinguish
it from NA (it is definitely not known).

Jim

On Wed, Dec 22, 2021 at 6:55 AM Avi Gross via R-help
 wrote:
>
> I wonder if the package Adrian Dușa created might be helpful or point you 
> along the way.
>
> It was eventually named "declared"
>
> https://cran.r-project.org/web/packages/declared/index.html
>
> With a vignette here:
>
> https://cran.r-project.org/web/packages/declared/vignettes/declared.pdf
>
> I do not know if it would easily satisfy your needs but it may be a step 
> along the way. A package called Haven was part of the motivation and Adrian 
> wanted a way to import data from external sources that had more than one 
> category of NA that sounds a bit like what you want. His functions should 
> allow the creation of such data within R, as well. I am including him in this 
> email if you want to contact him or he has something to say.
>
>
> -Original Message-
> From: R-help  On Behalf Of Duncan Murdoch
> Sent: Tuesday, December 21, 2021 5:26 AM
> To: Marc Girondot ; r-help@r-project.org
> Subject: Re: [R] Creating NA equivalent
>
> On 20/12/2021 11:41 p.m., Marc Girondot via R-help wrote:
> > Dear members,
> >
> > I work on dosage and some values are below the detection limit. I
> > would like to create new "numbers" like LDL (to represent lower than
> > detection limit) and UDL (above the detection limit) that behave like
> > NA, with the possibility to test them using for example is.LDL() or
> > is.UDL().
> >
> > Note that NA is not the same as LDL or UDL: NA represents missing data.
> > Here the data is available as LDL or UDL.
> >
> > NA is built very deep into the R language... any option to create a new
> > version of an NA-equivalent?
> >
>
> There was a discussion of this back in May.  Here's a link to one approach 
> that I suggested:
>
>https://stat.ethz.ch/pipermail/r-devel/2021-May/080776.html
>
> Read the followup messages, I made at least one suggested improvement.
> I don't know if anyone has packaged this, but there's a later version of the 
> code here:
>
>https://stackoverflow.com/a/69179441/2554330
>
> Duncan Murdoch


Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Bert Gunter
Stephen:
You seem confused about data frames. sort(unique(...)) has no problem
sorting individual columns in a data frame (mod the issues about
mixing numerics and non-numerics that have already been discussed).
But the problem is that the results can *not* be put back in a data
frame because, **by definition** all columns in a data frame **must**
have the same number of values. unique() will change the number of
values in a column if done column by column, e.g. via lapply() or
looping over columns. Consequently, if you do this by lapply(), you'll
get a list back, not a data frame. e.g.

> dat <- data.frame(a = rep(3:1,2), b = c(5:1,5))
> dat
  a b
1 3 5
2 2 4
3 1 3
4 3 2
5 2 1
6 1 5
>
> ## via lapply
> dat <- lapply(dat, \(x)sort(unique(x)))
> dat  ## a list.
$a
[1] 1 2 3

$b
[1] 1 2 3 4 5

> ## Trying to do this with an explicit loop results in an error
> dat <- data.frame(a = rep(1:3,2), b = c(1:5,5))
> for(nm in names(dat))dat[[nm]] <- sort(unique(dat[[nm]])) ## error
Error in `[[<-.data.frame`(`*tmp*`, nm, value = c(1, 2, 3, 4, 5)) :
  replacement has 5 rows, data has 6

OTOH, unique() has a data.frame method which will give unique *rows*
(thinking of a data frame as a matrix-like object with a "dim"
attribute):

> dat <- data.frame(a = c(1,2,1), b = c('a','b','a'))
> dat
  a b
1 1 a
2 2 b
3 1 a
> unique(dat)
  a b
1 1 a
2 2 b

There is no sort() method for data frames as this has no obvious
single interpretation of sorting by whole rows. However, see ?sort for
an example using ?order to carry out one possible interpretation of
sorting by rows.
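
A minimal sketch of that interpretation, using order() on the columns in turn. The data frame is made up for the example.

```r
dat <- data.frame(a = c(2, 1, 2, 1), b = c("y", "x", "x", "y"))

# Sort rows by column a, breaking ties by column b.
sorted <- dat[order(dat$a, dat$b), ]
sorted

# The same idea applied after unique(): distinct rows, then ordered by
# every column from left to right.
u <- unique(dat)
u[do.call(order, u), ]
```

The do.call() form passes each column to order() as a separate sort key, so it works for any number of columns as long as no column name clashes with one of order()'s own arguments.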

Bert


On Tue, Dec 21, 2021 at 7:16 AM Stephen H. Dawson, DSL via R-help
 wrote:
>
> Thanks everyone for the replies.
>
> It is clear one either needs to write a function or put the unique
> entries into another dataframe.
>
> It seems odd R cannot sort a list of unique column entries with ease.
> Python and SQL can do it with ease.
>
> QUESTION
> Is there a simpler means than other than the unique function to capture
> distinct column entries, then sort that list?
>
>
> *Stephen Dawson, DSL*
> /Executive Strategy Consultant/
> Business & Technology
> +1 (865) 804-3454
> http://www.shdawson.com 
>
>
> On 12/20/21 5:53 PM, Rui Barradas wrote:
> > Hello,
> >
> > Inline.
> >
> > Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:
> >> Thanks.
> >>
> >> sort(unique(Data[[1]]))
> >>
> >> This syntax provides row numbers, not column values.
> >
> > This is not right.
> > The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
> > extracts the column vector.
> >
> > As for my previous answer, it was not addressing the question, I
> > misinterpreted it as being a question on how to sort by numeric order
> > when the data is not numeric. Here is a, hopefully, complete answer.
> > Still with package stringr.
> >
> >
> > cols_to_sort <- 1:4
> >
> > Data2 <- lapply(Data[cols_to_sort], \(x){
> >   stringr::str_sort(unique(x), numeric = TRUE)
> > })
> >
> >
> > Or using Avi's suggestion of writing a function to do all the work and
> > simplify the lapply loop later,
> >
> >
> > unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
> > Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> >
> >>
> >> *Stephen Dawson, DSL*
> >> /Executive Strategy Consultant/
> >> Business & Technology
> >> +1 (865) 804-3454
> >> http://www.shdawson.com 
> >>
> >>
> >> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
> >>> Hi,
> >>>
> >>>
> >>> Running a simple syntax set to review entries in dataframe columns.
> >>> Here is the working code.
> >>>
> >>> Data <- read.csv("./input/Source.csv", header=T)
> >>> describe(Data)
> >>> summary(Data)
> >>> unique(Data[1])
> >>> unique(Data[2])
> >>> unique(Data[3])
> >>> unique(Data[4])
> >>>
> >>> I would like to add sort the unique entries. The data in the various
> >>> columns are not defined as numbers, but also text. I realize 1 and
> >>> 10 will not sort properly, as the column is not defined as a number,
> >>> but want to see what I have in the columns viewed as sorted.
> >>>
> >>> QUESTION
> >>> What is the best process to sort unique output, please?
> >>>
> >>>
> >>> Thanks.

Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Jeff Newmiller
It is not about outlawing matrix notation... to the contrary, it is about 
consistency. For tibbles, [] always returns another tibble. If you wanted a 
column vector, you should have asked for a column vector. Does the fact that 
DF[ 1, ] yields a different type than DF[ , 1 ] and DF[ 1:2, ] satisfy your 
desire to "support" matrix notation? Matlab has no concept of vectors distinct 
from row or column matrices, but R tries too hard to blur the lines between 
vectors and matrix-like objects.  The "drop" argument was a mistaken hack in 
defense of this failure to live with the difference between vectors and 
matrix-like objects and data frames.
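
For anyone following along, the inconsistency can be seen directly on a base data frame. This is a small sketch with a made-up DF; it shows only base R behaviour, not tibbles.

```r
DF <- data.frame(x = 1:3, y = c("a", "b", "c"))

class(DF[, 1])                 # "integer": a single column is dropped to a vector
class(DF[1, ])                 # "data.frame": a single row stays a data frame
class(DF[1:2, ])               # "data.frame"
class(DF[, 1, drop = FALSE])   # "data.frame": drop = FALSE keeps the structure
class(DF[1])                   # "data.frame": list-style [ keeps a data frame
class(DF[[1]])                 # "integer": [[ extracts the column itself
```

This is also why sort(unique(Data[1])) fails earlier in the thread while sort(unique(Data[[1]])) works: the first passes a one-column data frame to sort(), the second a plain vector.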

On December 21, 2021 10:09:14 AM PST, Duncan Murdoch  
wrote:
>On 21/12/2021 12:53 p.m., Duncan Murdoch wrote:
>> On 21/12/2021 12:29 p.m., Jeff Newmiller wrote:
>>> It is a very rational choice, not a design flaw. I don't like every choice 
>>> they have made for that class, but this one is very solid, and treating 
>>> data frames as lists of columns consistently helps all of us.
>> I think outlawing matrix notation is a really bad idea.  It makes code
>> harder to read, and makes it much harder to switch to matrices, which
>> sometimes gives a huge speed boost to code.
>> 
>> For example, John Fox posted an example that showed that operations on
>> whole columns of dataframes is about twice as fast using list notation
>> as using matrix notation.  But for operating on whole rows, 
>
>... or on individual elements ...
>
> > matrices are
>> about 100 times faster than dataframes.  You shouldn't use notation that
>> makes the switch to matrices more difficult.
>> 
>> Duncan Murdoch
>> 
>>>
>>> On December 21, 2021 9:02:56 AM PST, Duncan Murdoch 
>>>  wrote:
 On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:
> Intuitive, perhaps, but noticeably slower. And it doesn't work on tibbles 
> by design. Data frames are lists of columns.

 That's just one of the design flaws in tibbles, but not the worst one.

 Duncan Murdoch

>
> On December 21, 2021 8:38:35 AM PST, Duncan Murdoch 
>  wrote:
>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
 Thanks for the reply.

 sort(unique(Data[1]))
 Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
 decreasing)) :
     undefined columns selected
>>>
>>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>>> Data[[1]] for that, so
>>>
>>>sort(unique(Data[[1]]))
>>
>> Actually, I'd probably recommend
>>
>>  sort(unique(Data[, 1]))
>>
>> instead.  This treats Data as a matrix rather than as a list.
>> Dataframes are lists that look like matrices, but to me the matrix
>> aspect is usually more intuitive.
>>
>> Duncan Murdoch
>>
>>>
>>> I think Rui already pointed out the typo in the quoted text below...
>>>
>>> Duncan Murdoch
>>>

 The recommended syntax did not work, as listed above.

 What I want is the sort of distinct column output. Again, the column 
 may
 be text or numbers. This is a huge analysis effort with data coming at
 me from many different sources.


 *Stephen Dawson, DSL*
 /Executive Strategy Consultant/
 Business & Technology
 +1 (865) 804-3454
 http://www.shdawson.com 


 On 12/21/21 11:07 AM, Duncan Murdoch wrote:
> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>> Thanks everyone for the replies.
>>
>> It is clear one either needs to write a function or put the unique
>> entries into another dataframe.
>>
>> It seems odd R cannot sort a list of unique column entries with ease.
>> Python and SQL can do it with ease.
>
> I've seen several responses that looked pretty simple.  It's hard to
> beat sort(unique(x)), though there's a fair bit of confusion about
> what you actually want.  Maybe you should post an example of the code
> you'd use in Python?
>
> Duncan Murdoch
>
>>
>> QUESTION
>> Is there a simpler means than other than the unique function to 
>> capture
>> distinct column entries, then sort that list?
>>
>>
>> *Stephen Dawson, DSL*
>> /Executive Strategy Consultant/
>> Business & Technology
>> +1 (865) 804-3454
>> http://www.shdawson.com 
>>
>>
>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>> Hello,
>>>
>>> Inline.
>>>
>>> Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:
 Thanks.

 

Re: [R] Creating NA equivalent

2021-12-21 Thread Avi Gross via R-help
I wonder if the package Adrian Dușa created might be helpful or point you along 
the way.

It was eventually named "declared" 

https://cran.r-project.org/web/packages/declared/index.html

With a vignette here:

https://cran.r-project.org/web/packages/declared/vignettes/declared.pdf

I do not know if it would easily satisfy your needs, but it may be a step along 
the way. A package called haven was part of the motivation: Adrian wanted a 
way to import data from external sources that had more than one category of NA, 
which sounds a bit like what you want. His functions should allow the creation 
of such data within R as well. I am including him in this email in case you 
want to contact him or he has something to say.


-Original Message-
From: R-help  On Behalf Of Duncan Murdoch
Sent: Tuesday, December 21, 2021 5:26 AM
To: Marc Girondot ; r-help@r-project.org
Subject: Re: [R] Creating NA equivalent

On 20/12/2021 11:41 p.m., Marc Girondot via R-help wrote:
> Dear members,
> 
> I work on dosage data, and some values are below the detection limit. I 
> would like to create new "numbers" like LDL (to represent lower than 
> detection limit) and UDL (above the detection limit) that behave like 
> NA, with the possibility to test them using for example is.LDL() or 
> is.UDL().
> 
> Note that NA is not the same as LDL or UDL: NA represents missing data.
> Here the data is available as LDL or UDL.
> 
> NA is built very deep into the R language... is there any option to create 
> a new NA-equivalent?
> 

There was a discussion of this back in May.  Here's a link to one approach that 
I suggested:

   https://stat.ethz.ch/pipermail/r-devel/2021-May/080776.html

Read the followup messages, I made at least one suggested improvement. 
I don't know if anyone has packaged this, but there's a later version of the 
code here:

   https://stackoverflow.com/a/69179441/2554330
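
Independently of the code at those links (which is not reproduced here), the general idea of values that act like NA but remember why they are missing can be sketched in base R. Everything below is an assumption made for illustration: the constructor name dosage(), the "status" attribute, and the labels "ok", "LDL" and "UDL" are hypothetical, and the sketch does not preserve the status when the vector is subset.

```r
# Hypothetical constructor: stores NA in the numeric slot and keeps the
# reason ("LDL" or "UDL") in a parallel "status" attribute.
dosage <- function(x, status = rep("ok", length(x))) {
  stopifnot(length(x) == length(status))
  x[status != "ok"] <- NA
  structure(x, status = status, class = "dosage")
}

# Hypothetical predicates in the spirit of is.na().
is.LDL <- function(x) attr(x, "status") == "LDL"
is.UDL <- function(x) attr(x, "status") == "UDL"

x <- dosage(c(1.2, 0, 9.9), status = c("ok", "LDL", "UDL"))
is.LDL(x)   # which values were below the detection limit
is.UDL(x)   # which values were above the detection limit
is.na(x)    # both kinds still count as NA for ordinary R functions
```

A fuller version would also need a `[` method so that the status attribute follows the values when the vector is subset.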

Duncan Murdoch


Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Fox, John
Dear Jeff,

I haven't investigated your claim systematically, but out of curiosity, I did 
try extending my previous example, admittedly arbitrarily. In doing so, I 
assumed that you intended col in the first case to be the column subscript, not 
the row subscript. Here's what I got (on a newish M1 MacBook Pro):

> system.time(
+   for ( col in colnames( D ) ) {
+ idx <- sample(1e6, 1000)
+ D[ idx, col ] <- idx
+   }
+ )
   user  system elapsed 
  0.913   6.545  43.737 

> system.time(
+   for ( col in colnames( D ) ) {
+ idx <- sample(1e6, 1000)
+ D[[ col ]][ idx ] <- idx
+   }
+ )
   user  system elapsed 
  0.876   6.828  52.033

Best,
 John

On 2021-12-21, 1:04 PM, "R-help on behalf of Jeff Newmiller" 
 wrote:

When your brain is wired to treat a data frame like a matrix, then you 
think things like

for ( col in colnames( D ) ) {
  idx <- expr
  D[ col, idx ] <- otherexpr
}

are reasonable, when

for ( col in colnames( D ) ) {
  idx <- expr
  D[[ col ]][ idx ] <- otherexpr
}

does actually run significantly faster.


On December 21, 2021 9:28:52 AM PST, "Fox, John"  wrote:
>Dear Jeff,
>
>On 2021-12-21, 11:59 AM, "R-help on behalf of Jeff Newmiller" 
 wrote:
>
>Intuitive, perhaps, but noticeably slower. 
>
>I think that in most applications, one wouldn't notice the difference; for 
example:
>
>> D <- data.frame(matrix(rnorm(1000*1e6), 1e6, 1000))
>
>> microbenchmark(D[, 1])
>Unit: microseconds
>   expr   min    lq    mean median     uq    max neval
> D[, 1] 3.321 3.362 3.98561  3.444 3.5875 51.291   100
>
>> microbenchmark(D[[1]])
>Unit: microseconds
>   expr   min    lq    mean median     uq    max neval
> D[[1]] 1.722 1.763 1.99137  1.804 1.8655 17.876   100
>
>Best,
> John
>
>
>And it doesn't work on tibbles by design. Data frames are lists of 
columns.
>
>
>On December 21, 2021 8:38:35 AM PST, Duncan Murdoch 
 wrote:
>>On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>> Thanks for the reply.
>>>>
>>>> sort(unique(Data[1]))
>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>>> decreasing)) :
>>>>  undefined columns selected
>>> 
>>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>>> Data[[1]] for that, so
>>> 
>>> sort(unique(Data[[1]]))
>>
>>Actually, I'd probably recommend
>>
>>   sort(unique(Data[, 1]))
>>
>>instead.  This treats Data as a matrix rather than as a list. 
>>Dataframes are lists that look like matrices, but to me the matrix 
>>aspect is usually more intuitive.
>>
>>Duncan Murdoch
>>
>>> 
>>> I think Rui already pointed out the typo in the quoted text below...
>>> 
>>> Duncan Murdoch
>>> 
>>>>
>>>> The recommended syntax did not work, as listed above.
>>>>
>>>> What I want is the sort of distinct column output. Again, the 
column may
>>>> be text or numbers. This is a huge analysis effort with data 
coming at
>>>> me from many different sources.
>>>>
>>>>
>>>> *Stephen Dawson, DSL*
>>>> /Executive Strategy Consultant/
>>>> Business & Technology
>>>> +1 (865) 804-3454
>>>> http://www.shdawson.com 
>>>>
>>>>
>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>> Thanks everyone for the replies.
>>
>> It is clear one either needs to write a function or put the 
unique
>> entries into another dataframe.
>>
>> It seems odd R cannot sort a list of unique column entries with 
ease.
>> Python and SQL can do it with ease.
>
> I've seen several responses that looked pretty simple.  It's hard 
to
> beat sort(unique(x)), though there's a fair bit of confusion about
> what you actually want.  Maybe you should post an example of the 
code
> you'd use in Python?
>
> Duncan Murdoch
>
>>
>> QUESTION
>> Is there a simpler means than other than the unique function to 
capture
>> distinct column entries, then sort that list?
>>
>>
>> *Stephen Dawson, DSL*
>> /Executive Strategy Consultant/
>> Business & Technology
>> +1 (865) 804-3454
>> http://www.shdawson.com 
>>
>>
>> On 12/20/21 

Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Avi Gross via R-help
Duncan,

Let's not go there discussing the trouble with tibbles when the topic asked how 
to do things in more native R.

The reality is that tibbles, when used in the tidyverse, often rely on somewhat 
different ways to select the columns you want, including some quite 
sophisticated ones like:

select(mydf, wed:fri, ends_with(".xyz"), everything())

So select() is often not used to pick columns by number, although you can do 
that too. What you are talking about is the [] notation, which is often not 
needed because you use verbs like filter and select independently.

I find it often way more intuitive to solve things the dplyr way but I agree 
you sometimes want to convert tibbles back to data.frames before using base R 
techniques on them.

-Original Message-
From: R-help  On Behalf Of Duncan Murdoch
Sent: Tuesday, December 21, 2021 12:03 PM
To: Jeff Newmiller ; r-help@r-project.org; 
serv...@shdawson.com; Rui Barradas 
Subject: Re: [R] Adding SORT to UNIQUE

On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:
> Intuitive, perhaps, but noticeably slower. And it doesn't work on tibbles by 
> design. Data frames are lists of columns.

That's just one of the design flaws in tibbles, but not the worst one.

Duncan Murdoch

> 
> On December 21, 2021 8:38:35 AM PST, Duncan Murdoch 
>  wrote:
>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
 Thanks for the reply.

 sort(unique(Data[1]))
 Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
 decreasing)) :
   undefined columns selected
>>>
>>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use 
>>> Data[[1]] for that, so
>>>
>>>  sort(unique(Data[[1]]))
>>
>> Actually, I'd probably recommend
>>
>>sort(unique(Data[, 1]))
>>
>> instead.  This treats Data as a matrix rather than as a list.
>> Dataframes are lists that look like matrices, but to me the matrix 
>> aspect is usually more intuitive.
>>
>> Duncan Murdoch
>>
>>>
>>> I think Rui already pointed out the typo in the quoted text below...
>>>
>>> Duncan Murdoch
>>>

 The recommended syntax did not work, as listed above.

 What I want is the sort of distinct column output. Again, the 
 column may be text or numbers. This is a huge analysis effort with 
 data coming at me from many different sources.


 *Stephen Dawson, DSL*
 /Executive Strategy Consultant/
 Business & Technology
 +1 (865) 804-3454
 http://www.shdawson.com 


 On 12/21/21 11:07 AM, Duncan Murdoch wrote:
> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>> Thanks everyone for the replies.
>>
>> It is clear one either needs to write a function or put the 
>> unique entries into another dataframe.
>>
>> It seems odd R cannot sort a list of unique column entries with ease.
>> Python and SQL can do it with ease.
>
> I've seen several responses that looked pretty simple.  It's hard 
> to beat sort(unique(x)), though there's a fair bit of confusion 
> about what you actually want.  Maybe you should post an example of 
> the code you'd use in Python?
>
> Duncan Murdoch
>
>>
>> QUESTION
>> Is there a simpler means than other than the unique function to 
>> capture distinct column entries, then sort that list?
>>
>>
>> *Stephen Dawson, DSL*
>> /Executive Strategy Consultant/
>> Business & Technology
>> +1 (865) 804-3454
>> http://www.shdawson.com 
>>
>>
>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>> Hello,
>>>
>>> Inline.
>>>
>>> Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:
 Thanks.

 sort(unique(Data[[1]]))

 This syntax provides row numbers, not column values.
>>>
>>> This is not right.
>>> The syntax Data[1] extracts a sub-data.frame, the syntax 
>>> Data[[1]] extracts the column vector.
>>>
>>> As for my previous answer, it was not addressing the question, I 
>>> misinterpreted it as being a question on how to sort by numeric 
>>> order when the data is not numeric. Here is a, hopefully, complete 
>>> answer.
>>> Still with package stringr.
>>>
>>>
>>> cols_to_sort <- 1:4
>>>
>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>   stringr::str_sort(unique(x), numeric = TRUE)
>>> })
>>>
>>>
>>> Or using Avi's suggestion of writing a function to do all the 
>>> work and simplify the lapply loop later,
>>>
>>>
>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), 
>>> ...)
>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>>

 *Stephen Dawson, DSL*
 /Executive 

Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Avi Gross via R-help
Stephen,

Languages have their own philosophies and are often focused initially on doing 
specific things well. Later, they tend to accumulate additional functionality 
both in the base language and extensions.

I am wondering if you have explained your need precisely enough to get the 
answers you want. 

SQL and Python have their own ways and both have advantages but also huge 
deficiencies relative to just base R. 

But there are rules you live with, and if you choose, say, a data.frame to 
store things in, the columns must all be the same length. The unique members 
of each column are unlikely to be the same in number, so storing them in a 
data.frame does not work. They can be stored quite a few other ways, such as a 
list of lists.

And what is your definition of ease? I can program in Python and SQL and way 
over a hundred other languages and I know I need to adapt my thinking to the 
flow of the language and not the other way around. Base R was not designed to 
be like either SQL or Python. But it can be extended quite a few ways to do 
just about anything.

What you ran into for example is the fact that some functionality is more 
selective in what it works on. A data.frame with one column is logically the 
same as a matrix with one column and as a vector but in reality, they are not 
the same thing. Yes, they can be converted into each other fairly trivially. 
Sort() seems to care what you feed it. If you did not worry about efficiency, 
you could have a version of sort that accepts a wide variety of inputs, 
converts any it can to some possibly common internal form, then converts the 
output back into the form it was received in, or uses a command-line option to 
specify the output format. It is not hard in R to make such a function as it 
has the primitives needed to examine an arbitrary object and see what 
dimensions it has for some number of types and so on, and has utilities to do 
the conversion.
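
As a rough sketch of that idea, assuming only base R, the hypothetical helper sort_any() below flattens a few common input types to a plain vector before sorting. The name and the choice to return a vector (rather than converting back to the input type) are illustrative assumptions, not an existing R function.

```r
# Hypothetical wrapper: flatten several kinds of input to a vector, then sort.
sort_any <- function(x, ...) {
  if (is.list(x)) {
    # Data frames are lists of columns, so this branch covers them too.
    x <- unlist(x, use.names = FALSE)
  } else if (is.matrix(x)) {
    x <- as.vector(x)
  }
  sort(x, ...)
}

sort_any(data.frame(a = c(3, 1, 2)))     # one-column data.frame
sort_any(matrix(c(2, 1), ncol = 1))      # one-column matrix
sort_any(c("b", "a"))                    # plain character vector
sort_any(c(3, 1, 2), decreasing = TRUE)  # extra arguments go to sort()
```

The cost is the extra type checking and copying on every call, which is the efficiency trade-off mentioned above.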

If you want a language that has calculated every possible combination of ways 
to combine functions and already made tens of thousands available, good luck. 
What languages (including Python and R) expect is for you to compose such 
combinations yourself in one of many ways. The annoying discussions here 
between purists and those wanting to use pre-made packages aside, your question 
can be handled in many of the ways we already discussed. They include making 
your own (often very small) function that implements consolidating the many 
steps into one logical step. It can mean using pipelines like the new "|>" 
operator recently added to base R or the older versions often used in the 
tidyverse packages like "%>%".

You want to take a data.frame and select a column at a time and ask for it to 
be made into unique values then ordered and shown. So you want a VECTOR and 
your initial use of the "[" operator does not take the underlying list 
structure of a data.frame apart the way you might have thought but as a narrow 
data.frame. So you MAY need to either extract it using "[[" or use various 
routines R supplies like unlist() or as.vector().

Here is a pipeline using this as my data:

mydf <- data.frame(ints=c(5,4,3,3,4,5), chars=c("z","i","t","s","t","i"))

Note the number of unique items differs, as does the data type:

  > mydf
    ints chars
  1    5     z
  2    4     i
  3    3     t
  4    3     s
  5    4     t
  6    5     i

To handle the columns one at a time can be done using a pipeline like:

  > mydf[2] |> unlist() |> unique() |> sort()
  [1] "i" "s" "t" "z"
  > mydf[1] |> unlist() |> unique() |> sort()
  [1] 3 4 5

The above takes a two-column data.frame and restricts it to a one-column 
data.frame, then passes that temporary object as the first argument of 
unlist(), which returns another temporary object: a vector, numeric in one 
case and character in the other. That result is passed to unique(), which 
returns a shorter vector with the values in their order of first appearance, 
and finally to sort(), which reorders it. 

Note the first steps can be shortened if using the "[[" notation or by using 
the named way of asking for a column:

  > mydf[[1]] |> unique() |> sort()
  [1] 3 4 5
  > mydf$ints |> unique() |> sort()
  [1] 3 4 5

But pipelines are simply syntactic sugar mostly so you also can just nest 
function calls as in sort(unique(unlist(mydf[1]))) or do what I showed earlier 
of creating a function that does the work invisibly and call that.
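
A minimal sketch of that last option, using the same example data: the helper name unisort() is just an illustrative choice, and lapply() is used so that columns with different numbers of unique values can be returned together as a list.

```r
# Same example data as above; stringsAsFactors = FALSE keeps chars as
# character on older versions of R as well.
mydf <- data.frame(ints = c(5, 4, 3, 3, 4, 5),
                   chars = c("z", "i", "t", "s", "t", "i"),
                   stringsAsFactors = FALSE)

# Small helper that hides the two steps behind one name.
unisort <- function(x) sort(unique(x))

# Apply it to one column, or to every column at once.
unisort(mydf$ints)
result <- lapply(mydf, unisort)
result$chars
```

Because the result is a list rather than a data.frame, it does not matter that one column has three unique values and the other has four.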

Python often does its own version of pipelines by adding a dot at the end and 
calling a method and, if needed, another dot and then calling a method on the 
resulting object, and so on. But that is arguably more limiting in some ways 
and more powerful in others. Different paradigms. In R, you do not do 
object.method1.method2(args).method3(args), so a pipeline is used to do 
something related.

Now if your need was to do your operation on an entire data.frame at once, then 

Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Duncan Murdoch

On 21/12/2021 12:53 p.m., Duncan Murdoch wrote:

On 21/12/2021 12:29 p.m., Jeff Newmiller wrote:

It is a very rational choice, not a design flaw. I don't like every choice they 
have made for that class, but this one is very solid, and treating data frames 
as lists of columns consistently helps all of us.

I think outlawing matrix notation is a really bad idea.  It makes code
harder to read, and makes it much harder to switch to matrices, which
sometimes gives a huge speed boost to code.

For example, John Fox posted an example that showed that operations on
whole columns of dataframes is about twice as fast using list notation
as using matrix notation.  But for operating on whole rows, 


... or on individual elements ...

> matrices are

about 100 times faster than dataframes.  You shouldn't use notation that
makes the switch to matrices more difficult.

Duncan Murdoch



On December 21, 2021 9:02:56 AM PST, Duncan Murdoch  
wrote:

On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:

Intuitive, perhaps, but noticeably slower. And it doesn't work on tibbles by 
design. Data frames are lists of columns.


That's just one of the design flaws in tibbles, but not the worst one.

Duncan Murdoch



On December 21, 2021 8:38:35 AM PST, Duncan Murdoch  
wrote:

On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:

On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:

Thanks for the reply.

sort(unique(Data[1]))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
decreasing)) :
    undefined columns selected


That's the wrong syntax:  Data[1] is not "column one of Data".  Use
Data[[1]] for that, so

   sort(unique(Data[[1]]))


Actually, I'd probably recommend

 sort(unique(Data[, 1]))

instead.  This treats Data as a matrix rather than as a list.
Dataframes are lists that look like matrices, but to me the matrix
aspect is usually more intuitive.

Duncan Murdoch



I think Rui already pointed out the typo in the quoted text below...

Duncan Murdoch



The recommended syntax did not work, as listed above.

What I want is the sort of distinct column output. Again, the column may
be text or numbers. This is a huge analysis effort with data coming at
me from many different sources.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/21/21 11:07 AM, Duncan Murdoch wrote:

On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:

Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique
entries into another dataframe.

It seems odd R cannot sort a list of unique column entries with ease.
Python and SQL can do it with ease.


I've seen several responses that looked pretty simple.  It's hard to
beat sort(unique(x)), though there's a fair bit of confusion about
what you actually want.  Maybe you should post an example of the code
you'd use in Python?

Duncan Murdoch



QUESTION
Is there a simpler means than other than the unique function to capture
distinct column entries, then sort that list?


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 5:53 PM, Rui Barradas wrote:

Hello,

Inline.

Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:

Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.


This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
extracts the column vector.

As for my previous answer, it was not addressing the question, I
misinterpreted it as being a question on how to sort by numeric order
when the data is not numeric. Here is a, hopefully, complete answer.
Still with package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
    stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and
simplify the lapply loop later,


unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)


Hope this helps,

Rui Barradas




*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:

Hi,


Running a simple syntax set to review entries in dataframe columns.
Here is the working code.

Data <- read.csv("./input/Source.csv", header=T)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to add sort the unique entries. The data in the various
columns are not defined as numbers, but also text. I realize 1 and
10 will not sort properly, as the column is not defined as a number,
but want to see what I have in the columns viewed as sorted.

QUESTION
What is the best process to sort unique output, 

Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Jeff Newmiller
When your brain is wired to treat a data frame like a matrix, then you think 
things like

for ( col in colnames( D ) ) {
  idx <- expr
  D[ col, idx ] <- otherexpr
}

are reasonable, when

for ( col in colnames( D ) ) {
  idx <- expr
  D[[ col ]][ idx ] <- otherexpr
}

does actually run significantly faster.


On December 21, 2021 9:28:52 AM PST, "Fox, John"  wrote:
>Dear Jeff,
>
>On 2021-12-21, 11:59 AM, "R-help on behalf of Jeff Newmiller" 
> wrote:
>
>Intuitive, perhaps, but noticeably slower. 
>
>I think that in most applications, one wouldn't notice the difference; for 
>example:
>
>> D <- data.frame(matrix(rnorm(1000*1e6), 1e6, 1000))
>
>> microbenchmark(D[, 1])
>Unit: microseconds
>   expr   min    lq    mean median     uq    max neval
> D[, 1] 3.321 3.362 3.98561  3.444 3.5875 51.291   100
>
>> microbenchmark(D[[1]])
>Unit: microseconds
>   expr   min    lq    mean median     uq    max neval
> D[[1]] 1.722 1.763 1.99137  1.804 1.8655 17.876   100
>
>Best,
> John
>
>
>And it doesn't work on tibbles by design. Data frames are lists of columns.
>
>
>On December 21, 2021 8:38:35 AM PST, Duncan Murdoch 
>  wrote:
>>On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>> Thanks for the reply.
>>>>
>>>> sort(unique(Data[1]))
>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>>> decreasing)) :
>>>>  undefined columns selected
>>> 
>>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>>> Data[[1]] for that, so
>>> 
>>> sort(unique(Data[[1]]))
>>
>>Actually, I'd probably recommend
>>
>>   sort(unique(Data[, 1]))
>>
>>instead.  This treats Data as a matrix rather than as a list. 
>>Dataframes are lists that look like matrices, but to me the matrix 
>>aspect is usually more intuitive.
>>
>>Duncan Murdoch
>>
>>> 
>>> I think Rui already pointed out the typo in the quoted text below...
>>> 
>>> Duncan Murdoch
>>> 
>>>>
>>>> The recommended syntax did not work, as listed above.
>>>>
>>>> What I want is the sort of distinct column output. Again, the column 
> may
>>>> be text or numbers. This is a huge analysis effort with data coming at
>>>> me from many different sources.
>>>>
>>>>
>>>> *Stephen Dawson, DSL*
>>>> /Executive Strategy Consultant/
>>>> Business & Technology
>>>> +1 (865) 804-3454
>>>> http://www.shdawson.com 
>>>>
>>>>
>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>> Thanks everyone for the replies.
>>
>> It is clear one either needs to write a function or put the unique
>> entries into another dataframe.
>>
>> It seems odd R cannot sort a list of unique column entries with ease.
>> Python and SQL can do it with ease.
>
> I've seen several responses that looked pretty simple.  It's hard to
> beat sort(unique(x)), though there's a fair bit of confusion about
> what you actually want.  Maybe you should post an example of the code
> you'd use in Python?
>
> Duncan Murdoch
>
>>
>> QUESTION
>> Is there a simpler means than other than the unique function to 
> capture
>> distinct column entries, then sort that list?
>>
>>
>> *Stephen Dawson, DSL*
>> /Executive Strategy Consultant/
>> Business & Technology
>> +1 (865) 804-3454
>> http://www.shdawson.com 
>>
>>
>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>> Hello,
>>>
>>> Inline.
>>>
>>> Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:
>>>> Thanks.
>>>>
>>>> sort(unique(Data[[1]]))
>>>>
>>>> This syntax provides row numbers, not column values.
>>>
>>> This is not right.
>>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
>>> extracts the column vector.
>>>
>>> As for my previous answer, it was not addressing the question, I
>>> misinterpreted it as being a question on how to sort by numeric 
> order
>>> when the data is not numeric. Here is a, hopefully, complete answer.
>>> Still with package stringr.
>>>
>>>
>>> cols_to_sort <- 1:4
>>>
>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>  stringr::str_sort(unique(x), numeric = TRUE)
>>> })
>>>
>>>
>>> Or using Avi's suggestion of writing a function to do all the work 
> and
>>> simplify the lapply loop later,
>>>
>>>
>>> unisort2 <- function(vec, ...) 

Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Duncan Murdoch

On 21/12/2021 12:29 p.m., Jeff Newmiller wrote:

It is a very rational choice, not a design flaw. I don't like every choice they 
have made for that class, but this one is very solid, and treating data frames 
as lists of columns consistently helps all of us.
I think outlawing matrix notation is a really bad idea.  It makes code 
harder to read, and makes it much harder to switch to matrices, which 
sometimes gives a huge speed boost to code.


For example, John Fox posted an example that showed that operations on 
whole columns of dataframes is about twice as fast using list notation 
as using matrix notation.  But for operating on whole rows, matrices are 
about 100 times faster than dataframes.  You shouldn't use notation that 
makes the switch to matrices more difficult.
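
The row-access comparison can be tried with a rough base R sketch like the one below. The sizes and the names M, D, t_df and t_mat are arbitrary choices for illustration, and the timings depend on the machine and R version, so no numbers are shown.

```r
set.seed(1)
n <- 1000

# The same numbers stored as a matrix and as a data frame.
M <- matrix(rnorm(n * 100), nrow = n, ncol = 100)
D <- as.data.frame(M)

# Extract every row once from each structure, using the same [i, ] notation.
t_df  <- system.time(for (i in seq_len(n)) r1 <- D[i, ])
t_mat <- system.time(for (i in seq_len(n)) r2 <- M[i, ])

t_df["elapsed"]
t_mat["elapsed"]

# The last row extracted holds the same values either way.
all.equal(unlist(r1, use.names = FALSE), r2)
```

Because both loops use the same matrix-style subscript, switching between the two storage types needs no change to the indexing code.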


Duncan Murdoch



On December 21, 2021 9:02:56 AM PST, Duncan Murdoch  
wrote:

On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:

Intuitive, perhaps, but noticeably slower. And it doesn't work on tibbles by 
design. Data frames are lists of columns.


That's just one of the design flaws in tibbles, but not the worst one.

Duncan Murdoch



On December 21, 2021 8:38:35 AM PST, Duncan Murdoch  
wrote:

On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:

On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:

Thanks for the reply.

sort(unique(Data[1]))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
decreasing)) :
   undefined columns selected


That's the wrong syntax:  Data[1] is not "column one of Data".  Use
Data[[1]] for that, so

  sort(unique(Data[[1]]))


Actually, I'd probably recommend

sort(unique(Data[, 1]))

instead.  This treats Data as a matrix rather than as a list.
Dataframes are lists that look like matrices, but to me the matrix
aspect is usually more intuitive.

Duncan Murdoch



I think Rui already pointed out the typo in the quoted text below...

Duncan Murdoch



The recommended syntax did not work, as listed above.

What I want is the sort of distinct column output. Again, the column may
be text or numbers. This is a huge analysis effort with data coming at
me from many different sources.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/21/21 11:07 AM, Duncan Murdoch wrote:

On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:

Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique
entries into another dataframe.

It seems odd R cannot sort a list of unique column entries with ease.
Python and SQL can do it with ease.


I've seen several responses that looked pretty simple.  It's hard to
beat sort(unique(x)), though there's a fair bit of confusion about
what you actually want.  Maybe you should post an example of the code
you'd use in Python?

Duncan Murdoch



QUESTION
Is there a simpler means than other than the unique function to capture
distinct column entries, then sort that list?


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 5:53 PM, Rui Barradas wrote:

Hello,

Inline.

Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:

Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.


This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
extracts the column vector.

As for my previous answer, it was not addressing the question, I
misinterpreted it as being a question on how to sort by numeric order
when the data is not numeric. Here is a, hopefully, complete answer.
Still with package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
   stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and
simplify the lapply loop later,


unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
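
For readers without stringr installed, the same idea can be roughly approximated in base R. This is only a sketch under stated assumptions: the helper name `unisort_base` is hypothetical, and unlike `stringr::str_sort(numeric = TRUE)` it only treats entries that are entirely numeric as numbers (it does not handle digits embedded in text).

```r
# Hypothetical helper (not from the thread): unique values, with fully
# numeric strings ordered by value first, then remaining text alphabetically.
unisort_base <- function(vec) {
  u <- unique(as.character(vec))
  num <- suppressWarnings(as.numeric(u))   # NA for non-numeric entries
  is_num <- !is.na(num)
  c(u[is_num][order(num[is_num])], sort(u[!is_num]))
}

# Made-up example column mixing numbers stored as text with plain text
x <- c("10", "2", "b", "1", "a", "2")
unisort_base(x)
```

Applied to a data frame, it would be used the same way as above, for example `lapply(Data[cols_to_sort], unisort_base)`.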


Hope this helps,

Rui Barradas




*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:

Hi,


Running a simple syntax set to review entries in dataframe columns.
Here is the working code.

Data <- read.csv("./input/Source.csv", header=T)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to sort the unique entries. The data in the various
columns are not all defined as numbers; some are text. I realize 1 and
10 will not sort properly when the column is not defined as a number,
but I want to see what I have in the columns viewed as sorted.

QUESTION
What is the best process to sort unique output, please?


Thanks.



Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Jeff Newmiller
It is a very rational choice, not a design flaw. I don't like every choice they 
have made for that class, but this one is very solid, and treating data frames 
as lists of columns consistently helps all of us.
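
To make the "lists of columns" point concrete, here is a minimal base-R sketch. The data frame `df` and its column names are invented for illustration and are not the original poster's data.

```r
# Hypothetical example data (an assumption, not from the thread)
df <- data.frame(id = c("10", "2", "1", "2"),
                 grp = c("b", "a", "b", "a"),
                 stringsAsFactors = FALSE)

is.list(df)      # a data frame is a list ...
length(df)       # ... whose length is the number of columns
class(df[1])     # single bracket keeps the list structure: "data.frame"
class(df[[1]])   # double bracket extracts the element: "character"

# Because it is a list, lapply() visits one column at a time
res <- lapply(df, function(col) sort(unique(col)))
res
```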

On December 21, 2021 9:02:56 AM PST, Duncan Murdoch wrote:
>On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:
>> Intuitive, perhaps, but noticeably slower. And it doesn't work on tibbles by
>> design. Data frames are lists of columns.
>
>That's just one of the design flaws in tibbles, but not the worst one.
>
>Duncan Murdoch
>
>> 
>> On December 21, 2021 8:38:35 AM PST, Duncan Murdoch wrote:
>>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>> Thanks for the reply.
>>>>>
>>>>> sort(unique(Data[1]))
>>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>>>> decreasing)) :
>>>>>   undefined columns selected
>>>>
>>>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>>>> Data[[1]] for that, so
>>>>
>>>>   sort(unique(Data[[1]]))
>>>
>>> Actually, I'd probably recommend
>>>
>>>sort(unique(Data[, 1]))
>>>
>>> instead.  This treats Data as a matrix rather than as a list.
>>> Dataframes are lists that look like matrices, but to me the matrix
>>> aspect is usually more intuitive.
>>>
>>> Duncan Murdoch
>>>

>>>> I think Rui already pointed out the typo in the quoted text below...
>>>>
>>>> Duncan Murdoch

>
> The recommended syntax did not work, as listed above.
>
> What I want is the sort of distinct column output. Again, the column may
> be text or numbers. This is a huge analysis effort with data coming at
> me from many different sources.
>
>
> *Stephen Dawson, DSL*
> /Executive Strategy Consultant/
> Business & Technology
> +1 (865) 804-3454
> http://www.shdawson.com 
>
>
> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>> Thanks everyone for the replies.
>>>
>>> It is clear one either needs to write a function or put the unique
>>> entries into another dataframe.
>>>
>>> It seems odd R cannot sort a list of unique column entries with ease.
>>> Python and SQL can do it with ease.
>>
>> I've seen several responses that looked pretty simple.  It's hard to
>> beat sort(unique(x)), though there's a fair bit of confusion about
>> what you actually want.  Maybe you should post an example of the code
>> you'd use in Python?
>>
>> Duncan Murdoch
>>
>>>
>>> QUESTION
>>> Is there a simpler means, other than the unique function, to capture
>>> distinct column entries and then sort that list?
>>>
>>>
>>> *Stephen Dawson, DSL*
>>> /Executive Strategy Consultant/
>>> Business & Technology
>>> +1 (865) 804-3454
>>> http://www.shdawson.com 
>>>
>>>
>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
 Hello,

 Inline.

 Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:
> Thanks.
>
> sort(unique(Data[[1]]))
>
> This syntax provides row numbers, not column values.

 This is not right.
 The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
 extracts the column vector.

 As for my previous answer, it was not addressing the question, I
 misinterpreted it as being a question on how to sort by numeric order
 when the data is not numeric. Here is a, hopefully, complete answer.
 Still with package stringr.


 cols_to_sort <- 1:4

 Data2 <- lapply(Data[cols_to_sort], \(x){
   stringr::str_sort(unique(x), numeric = TRUE)
 })


 Or using Avi's suggestion of writing a function to do all the work and
 simplify the lapply loop later,


 unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
 Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)


 Hope this helps,

 Rui Barradas


>
> *Stephen Dawson, DSL*
> /Executive Strategy Consultant/
> Business & Technology
> +1 (865) 804-3454
> http://www.shdawson.com 
>
>
> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>> Hi,
>>
>>
>> Running a simple syntax set to review entries in dataframe columns.
>> Here is the working code.
>>
>> Data <- read.csv("./input/Source.csv", header=T)
>> describe(Data)
>> summary(Data)
>> unique(Data[1])
>> unique(Data[2])
>> unique(Data[3])
>> 

Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Fox, John
Dear Jeff,

On 2021-12-21, 11:59 AM, "R-help on behalf of Jeff Newmiller" wrote:

> Intuitive, perhaps, but noticeably slower.

I think that in most applications, one wouldn't notice the difference; for 
example:

> D <- data.frame(matrix(rnorm(1000*1e6), 1e6, 1000))

> microbenchmark(D[, 1])
Unit: microseconds
   expr   min    lq    mean median     uq    max neval
 D[, 1] 3.321 3.362 3.98561  3.444 3.5875 51.291   100

> microbenchmark(D[[1]])
Unit: microseconds
   expr   min    lq    mean median     uq    max neval
 D[[1]] 1.722 1.763 1.99137  1.804 1.8655 17.876   100
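
A comparable check can be run without the microbenchmark package. The sketch below uses only base R and a smaller, made-up data frame; the loop count `n` is an arbitrary assumption, and timings will differ from machine to machine, so no expected output is shown.

```r
# Smaller hypothetical data frame than the one used in the post
D <- data.frame(matrix(rnorm(100 * 1e4), 1e4, 100))

# Both forms return the same column vector
same <- identical(D[, 1], D[[1]])
same

# Rough timing of many repeated extractions (results vary by machine)
n <- 1e4
system.time(for (i in seq_len(n)) D[, 1])
system.time(for (i in seq_len(n)) D[[1]])
```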

Best,
 John


> And it doesn't work on tibbles by design. Data frames are lists of columns.


On December 21, 2021 8:38:35 AM PST, Duncan Murdoch wrote:
>On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>> Thanks for the reply.
>>>
>>> sort(unique(Data[1]))
>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>> decreasing)) :
>>>  undefined columns selected
>> 
>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>> Data[[1]] for that, so
>> 
>> sort(unique(Data[[1]]))
>
>Actually, I'd probably recommend
>
>   sort(unique(Data[, 1]))
>
>instead.  This treats Data as a matrix rather than as a list. 
>Dataframes are lists that look like matrices, but to me the matrix 
>aspect is usually more intuitive.
>
>Duncan Murdoch
>
>> 
>> I think Rui already pointed out the typo in the quoted text below...
>> 
>> Duncan Murdoch
>> 
>>>
>>> The recommended syntax did not work, as listed above.
>>>
>>> What I want is the sort of distinct column output. Again, the column may
>>> be text or numbers. This is a huge analysis effort with data coming at
>>> me from many different sources.
>>>
>>>
>>> *Stephen Dawson, DSL*
>>> /Executive Strategy Consultant/
>>> Business & Technology
>>> +1 (865) 804-3454
>>> http://www.shdawson.com 
>>>
>>>
>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
 On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
> Thanks everyone for the replies.
>
> It is clear one either needs to write a function or put the unique
> entries into another dataframe.
>
> It seems odd R cannot sort a list of unique column entries with ease.
> Python and SQL can do it with ease.

 I've seen several responses that looked pretty simple.  It's hard to
 beat sort(unique(x)), though there's a fair bit of confusion about
 what you actually want.  Maybe you should post an example of the code
 you'd use in Python?

 Duncan Murdoch

>
> QUESTION
> Is there a simpler means, other than the unique function, to capture
> distinct column entries and then sort that list?
>
>
> *Stephen Dawson, DSL*
> /Executive Strategy Consultant/
> Business & Technology
> +1 (865) 804-3454
> http://www.shdawson.com 
>
>
> On 12/20/21 5:53 PM, Rui Barradas wrote:
>> Hello,
>>
>> Inline.
>>
>> Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:
>>> Thanks.
>>>
>>> sort(unique(Data[[1]]))
>>>
>>> This syntax provides row numbers, not column values.
>>
>> This is not right.
>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
>> extracts the column vector.
>>
>> As for my previous answer, it was not addressing the question, I
>> misinterpreted it as being a question on how to sort by numeric order
>> when the data is not numeric. Here is a, hopefully, complete answer.
>> Still with package stringr.
>>
>>
>> cols_to_sort <- 1:4
>>
>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>  stringr::str_sort(unique(x), numeric = TRUE)
>> })
>>
>>
>> Or using Avi's suggestion of writing a function to do all the work and
>> simplify the lapply loop later,
>>
>>
>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>>>
>>> *Stephen Dawson, DSL*
>>> /Executive Strategy Consultant/
>>> Business & Technology
>>> +1 (865) 804-3454
>>> http://www.shdawson.com 
>>>
>>>
>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help 

Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Duncan Murdoch

On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:

Intuitive, perhaps, but noticeably slower. And it doesn't work on tibbles by
design. Data frames are lists of columns.


That's just one of the design flaws in tibbles, but not the worst one.

Duncan Murdoch



On December 21, 2021 8:38:35 AM PST, Duncan Murdoch wrote:

On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:

On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:

Thanks for the reply.

sort(unique(Data[1]))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
decreasing)) :
  undefined columns selected


That's the wrong syntax:  Data[1] is not "column one of Data".  Use
Data[[1]] for that, so

 sort(unique(Data[[1]]))


Actually, I'd probably recommend

   sort(unique(Data[, 1]))

instead.  This treats Data as a matrix rather than as a list.
Dataframes are lists that look like matrices, but to me the matrix
aspect is usually more intuitive.

Duncan Murdoch



I think Rui already pointed out the typo in the quoted text below...

Duncan Murdoch



The recommended syntax did not work, as listed above.

What I want is the sort of distinct column output. Again, the column may
be text or numbers. This is a huge analysis effort with data coming at
me from many different sources.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/21/21 11:07 AM, Duncan Murdoch wrote:

On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:

Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique
entries into another dataframe.

It seems odd R cannot sort a list of unique column entries with ease.
Python and SQL can do it with ease.


I've seen several responses that looked pretty simple.  It's hard to
beat sort(unique(x)), though there's a fair bit of confusion about
what you actually want.  Maybe you should post an example of the code
you'd use in Python?

Duncan Murdoch



QUESTION
Is there a simpler means, other than the unique function, to capture
distinct column entries and then sort that list?


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 5:53 PM, Rui Barradas wrote:

Hello,

Inline.

Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:

Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.


This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
extracts the column vector.

As for my previous answer, it was not addressing the question, I
misinterpreted it as being a question on how to sort by numeric order
when the data is not numeric. Here is a, hopefully, complete answer.
Still with package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
      stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and
simplify the lapply loop later,


unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)


Hope this helps,

Rui Barradas




*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:

Hi,


Running a simple syntax set to review entries in dataframe columns.
Here is the working code.

Data <- read.csv("./input/Source.csv", header=T)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to sort the unique entries. The data in the various
columns are not all defined as numbers; some are text. I realize 1 and
10 will not sort properly when the column is not defined as a number,
but I want to see what I have in the columns viewed as sorted.

QUESTION
What is the best process to sort unique output, please?


Thanks.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.









Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Jeff Newmiller
Intuitive, perhaps, but noticeably slower. And it doesn't work on tibbles by
design. Data frames are lists of columns.

On December 21, 2021 8:38:35 AM PST, Duncan Murdoch wrote:
>On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>> Thanks for the reply.
>>>
>>> sort(unique(Data[1]))
>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>> decreasing)) :
>>>  undefined columns selected
>> 
>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>> Data[[1]] for that, so
>> 
>> sort(unique(Data[[1]]))
>
>Actually, I'd probably recommend
>
>   sort(unique(Data[, 1]))
>
>instead.  This treats Data as a matrix rather than as a list. 
>Dataframes are lists that look like matrices, but to me the matrix 
>aspect is usually more intuitive.
>
>Duncan Murdoch
>
>> 
>> I think Rui already pointed out the typo in the quoted text below...
>> 
>> Duncan Murdoch
>> 
>>>
>>> The recommended syntax did not work, as listed above.
>>>
>>> What I want is the sort of distinct column output. Again, the column may
>>> be text or numbers. This is a huge analysis effort with data coming at
>>> me from many different sources.
>>>
>>>
>>> *Stephen Dawson, DSL*
>>> /Executive Strategy Consultant/
>>> Business & Technology
>>> +1 (865) 804-3454
>>> http://www.shdawson.com 
>>>
>>>
>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
 On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
> Thanks everyone for the replies.
>
> It is clear one either needs to write a function or put the unique
> entries into another dataframe.
>
> It seems odd R cannot sort a list of unique column entries with ease.
> Python and SQL can do it with ease.

 I've seen several responses that looked pretty simple.  It's hard to
 beat sort(unique(x)), though there's a fair bit of confusion about
 what you actually want.  Maybe you should post an example of the code
 you'd use in Python?

 Duncan Murdoch

>
> QUESTION
> Is there a simpler means, other than the unique function, to capture
> distinct column entries and then sort that list?
>
>
> *Stephen Dawson, DSL*
> /Executive Strategy Consultant/
> Business & Technology
> +1 (865) 804-3454
> http://www.shdawson.com 
>
>
> On 12/20/21 5:53 PM, Rui Barradas wrote:
>> Hello,
>>
>> Inline.
>>
>> Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:
>>> Thanks.
>>>
>>> sort(unique(Data[[1]]))
>>>
>>> This syntax provides row numbers, not column values.
>>
>> This is not right.
>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
>> extracts the column vector.
>>
>> As for my previous answer, it was not addressing the question, I
>> misinterpreted it as being a question on how to sort by numeric order
>> when the data is not numeric. Here is a, hopefully, complete answer.
>> Still with package stringr.
>>
>>
>> cols_to_sort <- 1:4
>>
>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>      stringr::str_sort(unique(x), numeric = TRUE)
>> })
>>
>>
>> Or using Avi's suggestion of writing a function to do all the work and
>> simplify the lapply loop later,
>>
>>
>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>>>
>>> *Stephen Dawson, DSL*
>>> /Executive Strategy Consultant/
>>> Business & Technology
>>> +1 (865) 804-3454
>>> http://www.shdawson.com 
>>>
>>>
>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
 Hi,


 Running a simple syntax set to review entries in dataframe columns.
 Here is the working code.

 Data <- read.csv("./input/Source.csv", header=T)
 describe(Data)
 summary(Data)
 unique(Data[1])
 unique(Data[2])
 unique(Data[3])
 unique(Data[4])

 I would like to sort the unique entries. The data in the various
 columns are not all defined as numbers; some are text. I realize 1 and
 10 will not sort properly when the column is not defined as a number,
 but I want to see what I have in the columns viewed as sorted.

 QUESTION
 What is the best process to sort unique output, please?


 Thanks.
>>>

Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Duncan Murdoch

On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:

On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:

Thanks for the reply.

sort(unique(Data[1]))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
decreasing)) :
     undefined columns selected


That's the wrong syntax:  Data[1] is not "column one of Data".  Use
Data[[1]] for that, so

sort(unique(Data[[1]]))


Actually, I'd probably recommend

  sort(unique(Data[, 1]))

instead.  This treats Data as a matrix rather than as a list. 
Dataframes are lists that look like matrices, but to me the matrix 
aspect is usually more intuitive.
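
A small base-R sketch of the matrix-style indexing described above. The data frame below is made up for illustration (its column names are assumptions); it stands in for the poster's `Data`.

```r
# Hypothetical stand-in for the poster's data
Data <- data.frame(code = c("b", "a", "b"),
                   n = c(10, 2, 10),
                   stringsAsFactors = FALSE)

Data[, 1]                 # matrix-style: column 1 as a plain vector
Data[, "code"]            # the same column, selected by name
Data[, 1, drop = FALSE]   # keep the result as a one-column data frame
sort(unique(Data[, 1]))   # sorted distinct values of column 1
```

For an ordinary data frame, `Data[, 1]` and `Data[[1]]` return the same vector, so the choice between them here is mostly one of style.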


Duncan Murdoch



I think Rui already pointed out the typo in the quoted text below...

Duncan Murdoch



The recommended syntax did not work, as listed above.

What I want is the sort of distinct column output. Again, the column may
be text or numbers. This is a huge analysis effort with data coming at
me from many different sources.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/21/21 11:07 AM, Duncan Murdoch wrote:

On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:

Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique
entries into another dataframe.

It seems odd R cannot sort a list of unique column entries with ease.
Python and SQL can do it with ease.


I've seen several responses that looked pretty simple.  It's hard to
beat sort(unique(x)), though there's a fair bit of confusion about
what you actually want.  Maybe you should post an example of the code
you'd use in Python?

Duncan Murdoch



QUESTION
Is there a simpler means, other than the unique function, to capture
distinct column entries and then sort that list?


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 5:53 PM, Rui Barradas wrote:

Hello,

Inline.

Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:

Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.


This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
extracts the column vector.

As for my previous answer, it was not addressing the question, I
misinterpreted it as being a question on how to sort by numeric order
when the data is not numeric. Here is a, hopefully, complete answer.
Still with package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
     stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and
simplify the lapply loop later,


unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)


Hope this helps,

Rui Barradas




*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:

Hi,


Running a simple syntax set to review entries in dataframe columns.
Here is the working code.

Data <- read.csv("./input/Source.csv", header=T)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to sort the unique entries. The data in the various
columns are not all defined as numbers; some are text. I realize 1 and
10 will not sort properly when the column is not defined as a number,
but I want to see what I have in the columns viewed as sorted.

QUESTION
What is the best process to sort unique output, please?


Thanks.




Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Duncan Murdoch

On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:

Thanks for the reply.

sort(unique(Data[1]))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
decreasing)) :
    undefined columns selected


That's the wrong syntax:  Data[1] is not "column one of Data".  Use 
Data[[1]] for that, so


  sort(unique(Data[[1]]))
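
The difference can be reproduced with a tiny made-up data frame. This is a sketch only: the column name and values are assumptions, and the exact error text may vary between R versions, so it is captured rather than asserted.

```r
# Hypothetical one-column data frame with three distinct values
Data <- data.frame(code = c("10", "2", "1", "2"), stringsAsFactors = FALSE)

# Data[1] is still a data frame, so sort() fails on the unique() result
res <- tryCatch(
  suppressWarnings(sort(unique(Data[1]))),
  error = function(e) conditionMessage(e)
)
print(res)   # the captured error message

# Data[[1]] is the column vector, which sort() handles directly.
# The column is character, so "10" sorts before "2".
sort(unique(Data[[1]]))
```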

I think Rui already pointed out the typo in the quoted text below...

Duncan Murdoch



The recommended syntax did not work, as listed above.

What I want is the sort of distinct column output. Again, the column may
be text or numbers. This is a huge analysis effort with data coming at
me from many different sources.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/21/21 11:07 AM, Duncan Murdoch wrote:

On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:

Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique
entries into another dataframe.

It seems odd R cannot sort a list of unique column entries with ease.
Python and SQL can do it with ease.


I've seen several responses that looked pretty simple.  It's hard to
beat sort(unique(x)), though there's a fair bit of confusion about
what you actually want.  Maybe you should post an example of the code
you'd use in Python?

Duncan Murdoch



QUESTION
Is there a simpler means, other than the unique function, to capture
distinct column entries and then sort that list?


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 5:53 PM, Rui Barradas wrote:

Hello,

Inline.

Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:

Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.


This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
extracts the column vector.

As for my previous answer, it was not addressing the question, I
misinterpreted it as being a question on how to sort by numeric order
when the data is not numeric. Here is a, hopefully, complete answer.
Still with package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
    stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and
simplify the lapply loop later,


unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)


Hope this helps,

Rui Barradas




*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:

Hi,


Running a simple syntax set to review entries in dataframe columns.
Here is the working code.

Data <- read.csv("./input/Source.csv", header=T)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to sort the unique entries. The data in the various
columns are not all defined as numbers; some are text. I realize 1 and
10 will not sort properly when the column is not defined as a number,
but I want to see what I have in the columns viewed as sorted.

QUESTION
What is the best process to sort unique output, please?


Thanks.




Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Stephen H. Dawson, DSL via R-help

Thanks for the reply.

sort(unique(Data[1]))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
decreasing)) :
  undefined columns selected

The recommended syntax did not work, as listed above.

What I want is sorted, distinct column output. Again, the column may
be text or numbers. This is a huge analysis effort with data coming at
me from many different sources.



*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/21/21 11:07 AM, Duncan Murdoch wrote:

On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:

Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique
entries into another dataframe.

It seems odd R cannot sort a list of unique column entries with ease.
Python and SQL can do it with ease.


I've seen several responses that looked pretty simple.  It's hard to 
beat sort(unique(x)), though there's a fair bit of confusion about 
what you actually want.  Maybe you should post an example of the code 
you'd use in Python?


Duncan Murdoch



QUESTION
Is there a simpler means, other than the unique function, to capture
distinct column entries and then sort that list?


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 5:53 PM, Rui Barradas wrote:

Hello,

Inline.

Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:

Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.


This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
extracts the column vector.

As for my previous answer, it was not addressing the question, I
misinterpreted it as being a question on how to sort by numeric order
when the data is not numeric. Here is a, hopefully, complete answer.
Still with package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
   stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and
simplify the lapply loop later,


unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)


Hope this helps,

Rui Barradas




*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:

Hi,


Running a simple syntax set to review entries in dataframe columns.
Here is the working code.

Data <- read.csv("./input/Source.csv", header=T)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to sort the unique entries. The data in the various
columns are not all defined as numbers; some are text. I realize 1 and
10 will not sort properly when the column is not defined as a number,
but I want to see what I have in the columns viewed as sorted.

QUESTION
What is the best process to sort unique output, please?


Thanks.




Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Duncan Murdoch

On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:

Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique
entries into another dataframe.

It seems odd R cannot sort a list of unique column entries with ease.
Python and SQL can do it with ease.


I've seen several responses that looked pretty simple.  It's hard to 
beat sort(unique(x)), though there's a fair bit of confusion about what 
you actually want.  Maybe you should post an example of the code you'd 
use in Python?
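
For reference, a minimal sketch of what sort(unique(x)) gives for numeric and text columns, and how to apply it to every column in one call. The data frame is invented for illustration; its column names and values are assumptions.

```r
# Hypothetical data: one numeric column, one column of numbers stored as
# text, and one plain text column
Data <- data.frame(
  id   = c(10, 2, 1, 2),
  code = c("10", "2", "1", "2"),
  name = c("b", "a", "b", "c"),
  stringsAsFactors = FALSE
)

sort(unique(Data$id))     # numeric column: sorted by value
sort(unique(Data$code))   # character column: sorted as text, "10" before "2"

# All columns at once; the result is a named list of sorted unique values
sorted_unique <- lapply(Data, function(x) sort(unique(x)))
str(sorted_unique)
```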


Duncan Murdoch



QUESTION
Is there a simpler means, other than the unique function, to capture
distinct column entries and then sort that list?


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 5:53 PM, Rui Barradas wrote:

Hello,

Inline.

Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:

Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.


This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
extracts the column vector.

As for my previous answer, it was not addressing the question, I
misinterpreted it as being a question on how to sort by numeric order
when the data is not numeric. Here is a, hopefully, complete answer.
Still with package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
   stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and
simplify the lapply loop later,


unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort, numeric = TRUE)


Hope this helps,

Rui Barradas




*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:

Hi,


Running a simple syntax set to review entries in dataframe columns.
Here is the working code.

Data <- read.csv("./input/Source.csv", header=T)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to add sort the unique entries. The data in the various
columns are not defined as numbers, but also text. I realize 1 and
10 will not sort properly, as the column is not defined as a number,
but want to see what I have in the columns viewed as sorted.

QUESTION
What is the best process to sort unique output, please?


Thanks.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding SORT to UNIQUE

2021-12-21 Thread Stephen H. Dawson, DSL via R-help

Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique 
entries into another dataframe.


It seems odd R cannot sort a list of unique column entries with ease. 
Python and SQL can do it with ease.


QUESTION
Is there a simpler means, other than the unique function, to capture 
distinct column entries, then sort that list?



*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 5:53 PM, Rui Barradas wrote:

Hello,

Inline.

At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:

Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.


This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]] 
extracts the column vector.
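
A small sketch of that difference, using a made-up two-column data frame (not the poster's data):

```r
# Hypothetical stand-in for Data
Data <- data.frame(
  id = c("b", "a", "b"),
  n  = c("10", "2", "10"),
  stringsAsFactors = FALSE
)

class(Data[1])     # "data.frame": single brackets keep the data frame wrapper
class(Data[[1]])   # "character":  double brackets return the column itself

# unique() on the sub-data.frame returns a data frame, printed with row names
unique(Data[1])

# unique() on the column vector returns plain values that sort() can handle
sorted_ids <- sort(unique(Data[[1]]))   # "a" "b"
```

The row names printed by unique(Data[1]) are the positions of the first occurrences, which may be what looked like "row numbers" in the earlier attempt.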


As for my previous answer, it was not addressing the question, I 
misinterpreted it as being a question on how to sort by numeric order 
when the data is not numeric. Here is a, hopefully, complete answer.

Still with package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
  stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and 
simplify the lapply loop later,



unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)


Hope this helps,

Rui Barradas




*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:

Hi,


Running a simple syntax set to review entries in dataframe columns. 
Here is the working code.


Data <- read.csv("./input/Source.csv", header = TRUE)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to sort the unique entries. The data in the various 
columns are not only numbers, but also text. I realize 1 and 
10 will not sort properly, as the column is not defined as a number, 
but I want to see what I have in the columns viewed as sorted.


QUESTION
What is the best process to sort unique output, please?


Thanks.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.






Re: [R] Creating NA equivalent

2021-12-21 Thread Viechtbauer, Wolfgang (SP)
Say 'yi' is left censored. Then:

library(survival)  # Surv(), survreg()
library(AER)       # tobit()
library(censReg)   # censReg()

# naive regression model
res1 <- lm(yi ~ xi, data=dat)

# tobit model via survreg(); event indicator is TRUE for uncensored values
res2a <- survreg(Surv(yi, yi > censval, type="left") ~ xi, dist="gaussian",
                 data=dat)

# tobit model via tobit() from AER package
res2b <- tobit(yi ~ xi, left=censval, data=dat)

# tobit model via censReg() from censReg package
res2c <- censReg(yi ~ xi, left=censval, data=dat)

(forgot to mention the AER package; and I assume there are even other packages 
that can fit Tobit models).

One can also have censoring on both sides in Tobit models. Just explore these 
packages to see what they can do.
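
As an illustration of the two-sided case, here is a sketch on simulated data using only the survival package. The limits `ldl` and `udl`, the variable names, and the data-generating model are all assumptions made for the example. It relies on the type="interval2" coding of Surv(), where an NA lower bound marks a left-censored value and an NA upper bound marks a right-censored one.

```r
library(survival)

set.seed(1)
n   <- 200
xi  <- rnorm(n)
lat <- 1 + 0.5 * xi + rnorm(n)     # latent outcome, never fully observed
ldl <- 0                           # assumed lower detection limit
udl <- 2                           # assumed upper detection limit
yi  <- pmin(pmax(lat, ldl), udl)   # recorded value, clipped at both limits

# interval2 coding: (NA, ldl) = below LDL, (udl, NA) = above UDL,
# (yi, yi) = exactly observed
lower <- ifelse(yi <= ldl, NA, yi)
upper <- ifelse(yi >= udl, NA, yi)
dat   <- data.frame(xi, yi, lower, upper)

# Gaussian model with censoring at both edges
fit <- survreg(Surv(lower, upper, type = "interval2") ~ xi,
               dist = "gaussian", data = dat)
coef(fit)
```

tobit() from AER and censReg() accept a right= argument alongside left=, so the same model can be requested there by passing both limits.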

Best,
Wolfgang

>-Original Message-
>From: Chris Evans [mailto:chrish...@psyctc.org]
>Sent: Tuesday, 21 December, 2021 12:56
>To: Viechtbauer, Wolfgang (SP)
>Cc: r-help@r-project.org
>Subject: Re: Creating NA equivalent
>
>Many thanks Wolfgang,
>
>I guess I can see that survival analyses don't have to be time based but
>clearly I need to read up on that.  I can't see an example in the survival
>package.  And it proves to be hard to search for one. Can anyone point me
>to useful resources on that, in {survival} or not?
>
>I am probably straying way off topic and  off list guide here but isn't a
>Tobit only handling censoring at one edge, i.e. the LDL scenario, or the UDL,
>but not both?  I think this may be getting back to Marc's original question
>and certainly, again, I would love to be pointed to either Tobit handling
>LDL _and_ UDL or to any other existing methods.
>
>TIA,
>
>Chris
>
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating NA equivalent

2021-12-21 Thread Chris Evans
Many thanks Wolfgang,

I guess I can see that survival analyses don't have to be time based but 
clearly I need to read up on that.  I can't see an example in the survival 
package.  And it proves to be hard to search for one. Can anyone point me 
to useful resources on that, in {survival} or not?

I am probably straying way off topic and  off list guide here but isn't a 
Tobit only handling censoring at one edge, i.e. the LDL scenario, or the UDL, 
but not both?  I think this may be getting back to Marc's original question
and certainly, again, I would love to be pointed to either Tobit handling
LDL _and_ UDL or to any other existing methods.

TIA,

Chris

- Original Message -
> From: "Wolfgang Viechtbauer" 
> To: "Chris Evans" 
> Cc: r-help@r-project.org
> Sent: Tuesday, 21 December, 2021 11:31:55
> Subject: RE: Creating NA equivalent

> Hi Chris,
> 
> The survival package provides machinery for handling censored observations.
> Whether time is censored or some other type of variable (e.g., viral load due
> to some lower detection limit) does not make a fundamental difference. In fact,
> the type of model you are thinking of with 2) is a Tobit model, which can be
> fitted using the survival package (or censReg).
> 
> Best,
> Wolfgang
> 

-- 
Chris Evans (he/him)  
Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor, University of 
Roehampton, London, UK.
Work web site: https://www.psyctc.org/psyctc/ 
CORE site: https://www.coresystemtrust.org.uk/
Personal site: https://www.psyctc.org/pelerinage2016/
OMbook:https://ombook.psyctc.org/book/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating NA equivalent

2021-12-21 Thread Viechtbauer, Wolfgang (SP)
Hi Chris,

The survival package provides machinery for handling censored observations. 
Whether time is censored or some other type of variable (e.g., viral load due 
to some lower detection limit) does not make a fundamental difference. In fact, 
the type of model you are thinking of with 2) is a Tobit model, which can be 
fitted using the survival package (or censReg).

Best,
Wolfgang

>-Original Message-
>From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Chris Evans
>Sent: Tuesday, 21 December, 2021 12:17
>To: Duncan Murdoch
>Cc: r-help@r-project.org
>Subject: Re: [R] Creating NA equivalent
>
> I am neither a programmer nor a professional statistician but this topic 
> interests me because:
> 
> 1) I remember from long, long ago that S had a way to create labels that could
>denote multiple ways in which a value could be missing that was sometimes
>useful to me as my field sometimes has such situations.  In R I handle this
>with a second variable but I can see that using attributes is cleaner and
>might have real benefits when doing missing value analyses.  That might
>raise questions about whether some of the nice packages that help with
>missing value analyses would take on board some standardised use of
>attributes for this.
> 
> 2) I think Marc's question LDL/UDL is about a very particular sort of value
>that isn't missing and _is_ censored but not in survival analysis meaning
>of censored. (At least, it's not the same to my mind, perhaps it is?  To me
>the difference is that I most often hit the LDL/UDL issue in data that
>don't have much, or any, time frame.) Again, this comes up a lot for me
>where people are given limited possible answers in questionnaires and I've
>often wondered if I should explore simulating probability models for the
>"off the edge" value on a latent variable beneath/behind the measured
>responses.  I'd be very grateful to hear of any work in R packages (to stay
>only just "off the edge" of the posting guide).  Or of any work along
>the lines that Duncan offers, that sort of pulls this toward base R,
>though that sounds to me as if it would be a huge undertaking.
> 
> I'm very interested to hear any thoughts on either aspect.
> 
> Seasonal (multivalued) greetings to all!
> 
> Chris

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating NA equivalent

2021-12-21 Thread Chris Evans
I am neither a programmer nor a professional statistician but this topic 
interests me because:

1) I remember from long, long ago that S had a way to create labels that could
   denote multiple ways in which a value could be missing, which was sometimes
   useful to me as my field sometimes has such situations.  In R I handle this
   with a second variable but I can see that using attributes is cleaner and
   might have real benefits when doing missing value analyses.  That might
   raise questions about whether some of the nice packages that help with
   missing value analyses would take on board some standardised use of
   attributes for this.

2) I think Marc's LDL/UDL question is about a very particular sort of value
   that isn't missing and _is_ censored, but not in the survival analysis
   meaning of censored. (At least, it's not the same to my mind, perhaps it is?
   To me the difference is that I most often hit the LDL/UDL issue in data that
   don't have much, or any, time frame.) Again, this comes up a lot for me
   where people are given limited possible answers in questionnaires and I've
   often wondered if I should explore simulating probability models for the
   "off the edge" value on a latent variable beneath/behind the measured
   responses.  I'd be very grateful to hear of any work in R packages (to stay
   only just "off the edge" of the posting guide).  Or of any work along
   the lines that Duncan offers, that sort of pulls this toward base R,
   though that sounds to me as if it would be a huge undertaking.
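
To make the attribute idea in 1) concrete, here is a rough sketch. It is not the code from the posts Duncan linked; the class name "censored", the "status" attribute and the helper functions are all invented for illustration. Each value carries a status code, and a `[` method keeps the codes aligned when subsetting.

```r
# Hypothetical constructor: numeric values plus a per-value status code
censored <- function(x, status = rep("obs", length(x))) {
  stopifnot(length(x) == length(status),
            all(status %in% c("obs", "LDL", "UDL")))
  structure(x, status = status, class = "censored")
}

# Subsetting method so the status codes follow the values
`[.censored` <- function(x, i) {
  censored(unclass(x)[i], attr(x, "status")[i])
}

# Hypothetical test functions in the style of is.na()
is.LDL <- function(x) attr(x, "status") == "LDL"
is.UDL <- function(x) attr(x, "status") == "UDL"

# Censored entries store the detection limit itself as their value
dose <- censored(c(0.5, 2.3, 10, 4.1), c("LDL", "obs", "UDL", "obs"))

is.LDL(dose)         # TRUE FALSE FALSE FALSE
is.UDL(dose[3:4])    # TRUE FALSE

# Mean of the fully observed values only
observed_mean <- mean(as.numeric(dose)[!is.LDL(dose) & !is.UDL(dose)])
```

Unlike NA, these values do not propagate through arithmetic, so every summary has to decide explicitly what to do with the censored entries. That is the part that would need much deeper support to behave the way NA does.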

I'm very interested to hear any thoughts on either aspect.

Seasonal (mutivalued) greetings to all!

Chris

-- 
Chris Evans (he/him)  
Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor, University of 
Roehampton, London, UK.
Work web site: https://www.psyctc.org/psyctc/ 
CORE site: https://www.coresystemtrust.org.uk/
Personal site: https://www.psyctc.org/pelerinage2016/
OMbook:https://ombook.psyctc.org/book/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating NA equivalent

2021-12-21 Thread Duncan Murdoch

On 20/12/2021 11:41 p.m., Marc Girondot via R-help wrote:

Dear members,

I work on dosage data and some values are below the detection limit. I
would like to create new "numbers" like LDL (lower than the detection
limit) and UDL (above the upper detection limit) that behave like
NA, with the possibility to test them using, for example, is.LDL() or
is.UDL().

Note that NA is not the same as LDL or UDL: NA represents missing data.
Here the data are available as LDL or UDL.

NA is built very deep into the R language... is there any option to create
a new NA-equivalent?



There was a discussion of this back in May.  Here's a link to one 
approach that I suggested:


  https://stat.ethz.ch/pipermail/r-devel/2021-May/080776.html

Read the followup messages, I made at least one suggested improvement. 
I don't know if anyone has packaged this, but there's a later version of 
the code here:


  https://stackoverflow.com/a/69179441/2554330

Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.