Re: [R] Plotting the ASCII character set.

2021-07-03 Thread David Winsemius




> On Jul 3, 2021, at 7:00 PM, Rolf Turner  wrote:
> 
> 
>> On Sat, 3 Jul 2021 09:40:28 +0200
>> Ivan Krylov  wrote:
>> 
>> Hello Rolf Turner,
>> 
>> On Sat, 3 Jul 2021 14:02:59 +1200
>> Rolf Turner  wrote:
>> 
>>> Can anyone suggest how I might get my plot_ascii() function working
>>> again?  Basically, it seems to me, the question is:  how do I
>>> persuade R to read in "\260" as "\ub0" rather than "\xb0"?
>> 
>> Part of the problem is that the "\xb0" byte is not in ASCII, which
>> covers only the lower half of possible 8-bit bytes. I guess that the
>> strings containing bytes with highest bit set used to be interpreted
>> as Latin-1 on your machine, but now get interpreted as UTF-8, which
>> changes their meaning (in UTF-8, the highest bit being set indicates
>> that there will be more bytes to follow, making the string invalid if
>> there is none).
>> 
>> The good news is, since it's Latin-1, which is natively supported by
>> R, there are even multiple options:
>> 
>> 1. Mark the string as Latin-1 by setting Encoding(a) <- 'latin1' and
>> let R do the re-encoding if and when Pango asks it for a UTF-8-encoded
>> string.
>> 
>> 2. Decode Latin-1 into the locale encoding by using iconv(a, 'latin1',
>> '') (or set the third parameter to 'UTF-8', which would give almost
>> the same result on a machine with a UTF-8 locale). The result is,
>> again, a string where Encoding(a) matches the truth. Explicitly
>> setting UTF-8 may be preferable on Windows machines running pre-UCRT
>> builds of R where the locale encoding may not contain all Latin-1
>> characters, but that's not a problem for you, as far as I know.
>> 
>> For any encoding other than Latin-1 or UTF-8, option (2) is still
>> valid.
>> 
>> I have verified that your example works on my GNU/Linux system with a
>> UTF-8 locale if I use either option.
> 
> Thanks Ivan. That solves most of the problem, but there are still
> glitches. I get a plot OK, but a substantial number of the characters
> are displayed as a wee rectangle containing a 2 x 2 array of digits
> such as
> 
>>  0 0
>>  8 0
> 
> Also note that there is a bit of difference between the results of using
> Encoding() and the results of using iconv(). E.g. if I do
> 
> a <- "\x80"
> b <- iconv(a,"latin1","UTF-8")
> Encoding(a) <- "latin1"
> 
> then when I type "a" I get the Euro symbol "€", but when I type "b"
> I get the string "\u0080".
> 
> But that doesn't really matter.  More problematic is the fact that if I
> do either
> 
> plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE,axes=FALSE)
> text(0.5,0.5,labels=a,cex=6)
> or
> 
> plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE,axes=FALSE)
> text(0.5,0.5,labels=b,cex=6)
> 
> then I get wee rectangle with 0 0 8 0 arranged in a 2 x 2 array inside.
> (Setting cex=6 makes it easier for my ageing eyes to see what the
> digits are.)
> 
> Is there any way that I can get the Euro symbol to display correctly in
> such a graphic?
> 
Pick a font that is supported on your OS that has the desired glyph. 
Also look at the examples in:

?points

— 
David 
> Thanks.
> 
> cheers,
> 
> Rolf
> 
> -- 
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Plotting the ASCII character set.

2021-07-03 Thread Rolf Turner


On Sat, 3 Jul 2021 09:40:28 +0200
Ivan Krylov  wrote:

> Hello Rolf Turner,
> 
> On Sat, 3 Jul 2021 14:02:59 +1200
> Rolf Turner  wrote:
> 
> > Can anyone suggest how I might get my plot_ascii() function working
> > again?  Basically, it seems to me, the question is:  how do I
> > persuade R to read in "\260" as "\ub0" rather than "\xb0"?
> 
> Part of the problem is that the "\xb0" byte is not in ASCII, which
> covers only the lower half of possible 8-bit bytes. I guess that the
> strings containing bytes with highest bit set used to be interpreted
> as Latin-1 on your machine, but now get interpreted as UTF-8, which
> changes their meaning (in UTF-8, the highest bit being set indicates
> that there will be more bytes to follow, making the string invalid if
> there is none).
> 
> The good news is, since it's Latin-1, which is natively supported by
> R, there are even multiple options:
> 
> 1. Mark the string as Latin-1 by setting Encoding(a) <- 'latin1' and
> let R do the re-encoding if and when Pango asks it for a UTF-8-encoded
> string.
> 
> 2. Decode Latin-1 into the locale encoding by using iconv(a, 'latin1',
> '') (or set the third parameter to 'UTF-8', which would give almost
> the same result on a machine with a UTF-8 locale). The result is,
> again, a string where Encoding(a) matches the truth. Explicitly
> setting UTF-8 may be preferable on Windows machines running pre-UCRT
> builds of R where the locale encoding may not contain all Latin-1
> characters, but that's not a problem for you, as far as I know.
> 
> For any encoding other than Latin-1 or UTF-8, option (2) is still
> valid.
> 
> I have verified that your example works on my GNU/Linux system with a
> UTF-8 locale if I use either option.

Thanks Ivan. That solves most of the problem, but there are still
glitches. I get a plot OK, but a substantial number of the characters
are displayed as a wee rectangle containing a 2 x 2 array of digits
such as

>   0 0
>   8 0

Also note that there is a bit of difference between the results of using
Encoding() and the results of using iconv(). E.g. if I do

a <- "\x80"
b <- iconv(a,"latin1","UTF-8")
Encoding(a) <- "latin1"

then when I type "a" I get the Euro symbol "€", but when I type "b"
I get the string "\u0080".

But that doesn't really matter.  More problematic is the fact that if I
do either

plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE,axes=FALSE)
text(0.5,0.5,labels=a,cex=6)
or

plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE,axes=FALSE)
text(0.5,0.5,labels=b,cex=6)

then I get wee rectangle with 0 0 8 0 arranged in a 2 x 2 array inside.
(Setting cex=6 makes it easier for my ageing eyes to see what the
digits are.)

Is there any way that I can get the Euro symbol to display correctly in
such a graphic?
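
An editorial sketch of why the box shows 0 0 8 0: byte 0x80 is the Euro sign only in Windows code page 1252. In ISO 8859-1 (Latin-1) the same byte is the control character U+0080, which fonts normally have no glyph for, so the text renderer most likely falls back to a placeholder box holding the code point's hex digits. The Euro sign itself is U+20AC. The variable names below are illustrative, and the snippet assumes the platform's iconv recognises the encoding name "CP1252".

```r
# One raw byte 0x80, with no encoding declared.
x <- rawToChar(as.raw(0x80))

# Read as Windows-1252, that byte is the Euro sign, U+20AC.
euro <- iconv(x, from = "CP1252", to = "UTF-8")
utf8ToInt(euro) == 0x20AC   # TRUE

# The Euro sign can also be written directly as a Unicode escape.
identical(euro, "\u20ac")   # TRUE
```

Passing euro (or "\u20ac") as the labels argument of the text() call above should then draw the symbol, provided the font in use has a glyph for it.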

Thanks.

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] concatenating columns in data.frame

2021-07-03 Thread Micha Silver
Again thanks for carrying on this thread with your additional, 
informative comments, as well as the welcome humor.




On 7/3/21 2:59 AM, Jeff Newmiller wrote:
I am very agnostic about tidyverse/base R. However, the complexity of 
setting up NSE functions is often simply not needed, and I encounter 
so many people who simply disregard base R as being too outdated so 
that they never learn how simple solutions in R can be. The contrast 
between your solution and Bert's was... perhaps informative, but a 
nuclear bomb where an axe was sufficient.


On Fri, 2 Jul 2021, Avi Gross via R-help wrote:

I know what you mean Jeff. Yes I am very familiar with base R 
techniques. What I had hoped for was to do two things that some of 
the other methods mentioned do that ended up bringing two data.frames 
together as part of the solution.


Much of what I used is now standard R. I was looking at the accessory 
functions now commonly used in dplyr that let you dynamically select 
which columns to work with, like starts_with(). Sadly, they 
seem to work at the top level but not easily within a call to something 
like paste(...), where they are not evaluated in the way I want.


But the odd method I tried can also be used in standard R with a bit 
of work. You can create a function without using dplyr that takes 
your df and uses it to concatenate and end with something like:


df$new_col <- do_something(df, selected_cols)

That too adds a column without the need to merge larger structures 
explicitly.
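
A minimal base R sketch of that idea, for illustration only. The helper name concat_cols, its sep argument, and the sample data frame are assumptions made for this example and are not taken from the thread.

```r
# Hypothetical helper: paste the selected columns together row by row.
concat_cols <- function(df, cols, sep = "_") {
  # unname() keeps column names from being mistaken for paste() arguments
  do.call(paste, c(unname(as.list(df[cols])), sep = sep))
}

# Made-up sample data
df <- data.frame(a = c("x", "y"), b = c("1", "2"), n = c(10, 20))
selected_cols <- c("a", "b")

df$new_col <- concat_cols(df, selected_cols)
df$new_col   # "x_1" "y_2"
```

The selection itself can be computed rather than typed, for example with grep() on names(df), which covers part of what the dplyr selection helpers are used for.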


But your other point is a tad religious in a sense. I happen to 
prefer learning a core language first then looking at enhancement 
opportunities. But at some point, if teaching someone new who wants 
to focus on getting a job done simply but not necessarily repeatedly 
or in some ideal way, it is best to do things in a way that their 
mind flows better.


Many things in the tidyverse are redundant with base R or just "fix" 
inconsistencies like making sure the first argument is always the 
same. But many add substantially to doing things in a more 
step-by-step manner.


I do not worship the base language as it first came out or even as it 
has evolved. I do like to know what choices I have and pick and 
choose among them as needed. Of course a forum like this is more 
about base R than otherwise and I acknowledge that. Still, the ":=" 
operator is now base R. There is a new pipeline operator "|>" in base 
R. Some ideas, good or otherwise, do get in eventually.


I started doing graphs using base R as in the plot() command. It was 
adequate but I wanted better. So I learned about Lattice and various 
packages and eventually ggplot. I can now do things I barely imagined 
before and am still learning that there is much more I can do with 
packages underneath much of the magic and also additional packages 
layered above it, in some sense. So I do not approach that with an 
either-or mentality either.


Note I am not really talking about just R. I have similar issues with 
other languages I program in such as Python. None of them were 
created fully-formed and many had to add huge amounts to adapt to 
additional wants and needs. Base R for me is often inadequate. But so 
what?


The task asked for in this thread, taken in isolation, may indeed not 
be done any better using packages. However, if it is part of a larger 
set of tasks that can be pipelined, it may well be and I personally 
was wondering if there was a way in dplyr. There probably is a much 
better way than I assembled if I only knew about it, and if not, they 
may add this kind of indirection in a future release if deemed worthy 
of doing. I have gone back to programs I did years ago with humungous 
amounts of code using what I knew then and reducing it drastically 
now that I can tell a function to select say all my column names that 
end in .orig and apply a set of functions to them with output going 
to the base name followed by .mean and .sd and so on. All that can 
often be done in one or two lines of code where previously I had to 
do 18 near repetitions of each part and then another and another. 
That used a limited form of dynamism.
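
As a rough illustration of the ".orig" pattern just described, here is a sketch in base R rather than the tidyverse tools the message refers to. The data frame, its column names, and the variable names are made up for the example.

```r
# Made-up data: two columns ending in ".orig" plus an unrelated column
df <- data.frame(height.orig = c(1, 2, 3),
                 weight.orig = c(10, 20, 40),
                 id          = 1:3)

# Select the columns by name pattern and derive the base names
orig_cols  <- grep("\\.orig$", names(df), value = TRUE)
base_names <- sub("\\.orig$", "", orig_cols)

# Apply each function to every selected column, naming results base.mean, base.sd
stats <- c(
  setNames(vapply(df[orig_cols], mean, numeric(1)), paste0(base_names, ".mean")),
  setNames(vapply(df[orig_cols], sd,   numeric(1)), paste0(base_names, ".sd"))
)
names(stats)   # "height.mean" "weight.mean" "height.sd" "weight.sd"
```

Adding another summary function means adding one more setNames() line, not one more near repetition per column.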


Be that as it may I think the requester has enough info and we can 
move on.


-Original Message-
From: Jeff Newmiller 
Sent: Friday, July 2, 2021 1:03 AM
To: Avi Gross ; Avi Gross via R-help 
; R-help@r-project.org

Subject: Re: [R] concatenating columns in data.frame

I use parts of the tidyverse frequently, but this post is the best 
argument I can imagine for learning base R techniques.


On July 1, 2021 8:41:06 PM PDT, Avi Gross via R-help 
 wrote:

Micha,

Others have provided ways in standard R so I will contribute a somewhat
odd solution using the dplyr and related packages in the tidyverse
including a sample data.frame/tibble I made. It requires newer versions
of R and other  packages as it uses some fairly esoteric features
including "the big bang" and the new ":=" operator and more.

You can use 

Re: [R] Plotting the ASCII character set.

2021-07-03 Thread Ivan Krylov
Hello Rolf Turner,

On Sat, 3 Jul 2021 14:02:59 +1200
Rolf Turner  wrote:

> Can anyone suggest how I might get my plot_ascii() function working
> again?  Basically, it seems to me, the question is:  how do I persuade
> R to read in "\260" as "\ub0" rather than "\xb0"?

Part of the problem is that the "\xb0" byte is not in ASCII, which
covers only the lower half of possible 8-bit bytes. I guess that the
strings containing bytes with highest bit set used to be interpreted as
Latin-1 on your machine, but now get interpreted as UTF-8, which
changes their meaning (in UTF-8, the highest bit being set indicates
that there will be more bytes to follow, making the string invalid if
there is none).

The good news is, since it's Latin-1, which is natively supported by R,
there are even multiple options:

1. Mark the string as Latin-1 by setting Encoding(a) <- 'latin1' and
let R do the re-encoding if and when Pango asks it for a UTF-8-encoded
string.

2. Decode Latin-1 into the locale encoding by using iconv(a, 'latin1',
'') (or set the third parameter to 'UTF-8', which would give almost the
same result on a machine with a UTF-8 locale). The result is, again, a
string where Encoding(a) matches the truth. Explicitly setting UTF-8
may be preferable on Windows machines running pre-UCRT builds of R
where the locale encoding may not contain all Latin-1 characters, but
that's not a problem for you, as far as I know.

For any encoding other than Latin-1 or UTF-8, option (2) is still valid.

I have verified that your example works on my GNU/Linux system with a
UTF-8 locale if I use either option.
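
For illustration, the two options can be compared at the byte level. This is only a sketch: the variable names a1 and a2 are made up, and the byte is built with rawToChar() so that the result does not depend on how the session's locale treats a "\xb0" literal.

```r
# A single byte 0xB0 (the degree sign in Latin-1), encoding not declared.
a <- rawToChar(as.raw(0xb0))

# Option 1: declare the encoding; the stored byte is unchanged.
a1 <- a
Encoding(a1) <- "latin1"
charToRaw(a1)                     # b0

# Option 2: re-encode; the result is a two-byte UTF-8 string.
a2 <- iconv(a, from = "latin1", to = "UTF-8")
charToRaw(a2)                     # c2 b0
Encoding(a2)                      # "UTF-8"

# Both routes give the same character once converted to UTF-8.
identical(enc2utf8(a1), a2)       # TRUE
```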

-- 
Best regards,
Ivan



Re: [R] add a variable a data frame to sequentially count unique rows

2021-07-03 Thread Rui Barradas

Hello,

Either I'm not understanding the question, or isn't this just either of


aggregate(count ~ ., data = test, FUN = length)

library(dplyr)  # provides %>% and count()
test %>% count(group1, group2, name = "Count")


?

Hope this helps,

Rui Barradas

At 23:27 on 02/07/21, Yuan Chun Ding wrote:

Hi R users,

In this test file,
test <- data.frame(
  group1 = c("g1", "g1", "g1", "g2", "g2", "g2", "g2", "g2", "g2"),
  group2 = c("k1", "a2", "a2", "c5", "n6", "n6", "n6", "m10", "m10"),
  count  = c(1, 1, 2, 1, 2, 2, 2, 3, 3)
)

I have the group1 and group2 variables and want to add the count variable, 
which sequentially counts the unique rows defined by group1 and group2.

I hoped to use the following functions from library(tidyverse), but none of 
them worked well.
test %>% group_by(group1, group2) %>% mutate(count = row_number())
test %>% group_by(group1, group2) %>% mutate(count = 1:n())
test %>% group_by(group1, group2) %>% mutate(count = seq_len(n()))
test %>% group_by(group1, group2) %>% mutate(count = seq_along(group1, group2))

Can you help me to make the third column in the test data frame?
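
One possible reading of the request, sketched in base R under a loudly stated assumption: within each group1, the distinct group2 values are numbered in order of first appearance. That is an editorial guess, not the poster's confirmed intent. The posted count column starts 1, 1, 2 for g1, whereas this reading gives 1, 2, 2, so either the sample data or this interpretation is off for that group. The variable name seq_id is made up.

```r
test <- data.frame(
  group1 = c("g1", "g1", "g1", "g2", "g2", "g2", "g2", "g2", "g2"),
  group2 = c("k1", "a2", "a2", "c5", "n6", "n6", "n6", "m10", "m10")
)

# For the row indices i of each group1, number the distinct group2 values
# in order of first appearance.
seq_id <- ave(seq_along(test$group2), test$group1,
              FUN = function(i) match(test$group2[i], unique(test$group2[i])))
seq_id   # 1 2 2 1 2 2 2 3 3
```

For comparison, the row_number(), 1:n() and seq_len(n()) attempts above number the rows within each (group1, group2) pair, which is a different quantity.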

Thank you,

Ding
