Bert,

Thanks for the reply.

I did not think to put values back into the same column. That would not make sense to me, as it would destroy data integrity. I guess adding them as a new column in the same container, in this case a dataframe, is possible, but again it is not something I am likely to do.

Either way, thanks for confirming that whatever goes back into a dataframe must have the same number of values as what came out of it.

It is nice to have folks on a mailing list who help to flesh out what one thinks is happening and will happen with syntax versus what actually is happening and will happen.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com


On 12/21/21 3:38 PM, Bert Gunter wrote:
Stephen:
You seem confused about data frames. sort(unique(...)) has no problem
sorting individual columns in a data frame (mod the issues about
mixing numerics and non-numerics that have already been discussed).
But the problem is that the results can *not* be put back in a data
frame because, **by definition** all columns in a data frame **must**
have the same number of values. unique() will change the number of
values in a column if done column by column, e.g. via lapply() or
looping over columns. Consequently, if you do this by lapply(), you'll
get a list back, not a data frame. e.g.

dat <- data.frame(a = rep(3:1,2), b = c(5:1,5))
dat
  a b
1 3 5
2 2 4
3 1 3
4 3 2
5 2 1
6 1 5
## via lapply
dat <- lapply(dat, \(x)sort(unique(x)))
dat  ## a list.
$a
[1] 1 2 3

$b
[1] 1 2 3 4 5

## Trying to do this with an explicit loop results in an error
dat <- data.frame(a = rep(1:3,2), b = c(1:5,5))
for(nm in names(dat))dat[[nm]] <- sort(unique(dat[[nm]])) ## error
Error in `[[<-.data.frame`(`*tmp*`, nm, value = c(1, 2, 3, 4, 5)) :
   replacement has 5 rows, data has 6
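
If the sorted unique values really do have to sit together in one data
frame, one possible workaround is to pad the shorter columns with NA so
that every column ends up with the same length. This is only a sketch
and was not proposed in the thread; the helper name pad_to is
hypothetical.

```r
dat <- data.frame(a = rep(1:3, 2), b = c(1:5, 5))

## sorted unique values per column; a list, because the lengths differ
u <- lapply(dat, function(x) sort(unique(x)))

## hypothetical helper: extend a vector to length n, filling with NA
pad_to <- function(x, n) {
  length(x) <- n
  x
}

## pad every element to the longest length, then rebuild a data frame
n_max <- max(lengths(u))
padded <- as.data.frame(lapply(u, pad_to, n = n_max))
padded
```

The NA entries carry no meaning beyond "no more unique values in this
column", so the list returned by lapply() is often the more honest
container.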

OTOH, unique() has a data.frame method which will give unique *rows*
(thinking of a data frame as a matrix-like object with a "dim"
attribute):

dat <- data.frame(a = c(1,2,1), b = c('a','b','a'))
dat
  a b
1 1 a
2 2 b
3 1 a
unique(dat)
  a b
1 1 a
2 2 b

There is no sort() method for data frames as this has no obvious
single interpretation of sorting by whole rows. However, see ?sort for
an example using ?order to carry out one possible interpretation of
sorting by rows.
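
As a concrete sketch of that order() approach, here is one
interpretation only: drop duplicate rows, then sort by column a and
break ties with column b. The data frame is made up for illustration.

```r
dat <- data.frame(a = c(2, 1, 2, 1),
                  b = c("y", "x", "y", "z"),
                  stringsAsFactors = FALSE)

## unique() on a data frame keeps the first copy of each distinct row
u <- unique(dat)

## order() returns the row permutation that sorts by a, then by b
sorted <- u[order(u$a, u$b), ]
sorted
```

Other interpretations (sorting by b first, or in decreasing order) only
change the arguments passed to order().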

Bert


On Tue, Dec 21, 2021 at 7:16 AM Stephen H. Dawson, DSL via R-help
<r-help@r-project.org> wrote:
Thanks everyone for the replies.

It is clear one either needs to write a function or put the unique
entries into another dataframe.

It seems odd R cannot sort a list of unique column entries with ease.
Python and SQL can do it with ease.

QUESTION
Is there a simpler means, other than the unique function, to capture
distinct column entries, then sort that list?



On 12/20/21 5:53 PM, Rui Barradas wrote:
Hello,

Inline.

At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
Thanks.

sort(unique(Data[[1]]))

This syntax provides row numbers, not column values.
This is not right.
The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
extracts the column vector.
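
A small check of the difference, using a made-up data frame standing in
for the Data object of the original post (the column names id and grp
are assumptions):

```r
Data <- data.frame(id  = c("10", "2", "1", "2"),
                   grp = c("b", "a", "b", "a"),
                   stringsAsFactors = FALSE)

## single brackets keep the data frame structure
sub_df <- Data[1]
class(sub_df)

## double brackets return the column itself as a plain vector
col_vec <- Data[[1]]
class(col_vec)

## unique() on the sub-data.frame prints row names next to the values,
## which can look like row numbers
unique(sub_df)

## on the vector, the result is just the values, sorted here as text
sorted_ids <- sort(unique(col_vec))
sorted_ids
```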

As for my previous answer, it did not address the question; I
misinterpreted it as a question about how to sort in numeric order
when the data are not numeric. Here is a, hopefully, complete answer,
still with package stringr.


cols_to_sort <- 1:4

Data2 <- lapply(Data[cols_to_sort], \(x){
   stringr::str_sort(unique(x), numeric = TRUE)
})


Or using Avi's suggestion of writing a function to do all the work and
simplify the lapply loop later,


unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
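
If stringr is not available, a rough base R sketch of the same idea
could look like the following. The function name unisort_num and the
sample data are hypothetical, and the behaviour differs from
str_sort(): entries that parse as numbers are sorted numerically and
come first, and everything else follows in text order.

```r
## hypothetical helper: unique values, numeric-looking ones first
unisort_num <- function(vec) {
  u <- unique(as.character(vec))
  ## non-numeric entries become NA; order() puts NA last by default,
  ## and the second key (the text itself) orders those leftovers
  num <- suppressWarnings(as.numeric(u))
  u[order(num, u)]
}

Data <- data.frame(x = c("10", "2", "1", "2", "abc"),
                   y = c("b", "a", "b", "c", "a"),
                   stringsAsFactors = FALSE)

Data2 <- lapply(Data, unisort_num)
Data2
```

As before, the result is a list rather than a data frame, because the
columns end up with different numbers of unique values.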


Hope this helps,

Rui Barradas



On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
Hi,


Running a simple syntax set to review entries in dataframe columns.
Here is the working code.

Data <- read.csv("./input/Source.csv", header = TRUE)
describe(Data)
summary(Data)
unique(Data[1])
unique(Data[2])
unique(Data[3])
unique(Data[4])

I would like to sort the unique entries. The data in the various
columns are not only numbers but also text. I realize 1 and
10 will not sort properly, as the column is not defined as a number,
but I want to see what I have in the columns viewed as sorted.

QUESTION
What is the best process to sort unique output, please?


Thanks.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.