Re: [R] How to represent tree-structured values

2022-05-29 Thread Jeff Newmiller
Really this depends on the analysis you want to perform.

In the past, I have used a super/sub two-column format as a compact,
non-redundant representation for data entry, and then applied a recursive
algorithm to convert it to a super/sub/level/id table in which _all_ sub
components have (duplicative) entries corresponding to each super component.
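A minimal base-R sketch of that conversion. It rests on several assumptions: the links are held in a hypothetical `edges` data frame with `super` and `sub` columns, every component has at most one super component (a strict tree), and the `id` column is left out. `ancestors()` and `expand_closure()` are illustrative names, not Jeff's actual code.

```r
# Hypothetical direct part/whole links (super = whole, sub = part)
edges <- data.frame(
  super = c("injuries", "limbs", "extremities", "hands", "hands"),
  sub   = c("limbs", "extremities", "hands", "dominant hand",
            "non-dominant hand"),
  stringsAsFactors = FALSE
)

# Recursively collect every superordinate component of `node`,
# nearest first; a node that never appears as `sub` is a root.
ancestors <- function(node, parent_of) {
  if (!node %in% names(parent_of)) return(character(0))
  p <- parent_of[[node]]
  c(p, ancestors(p, parent_of))
}

# Expand the two-column links into a super/sub/level table with one row
# per (component, superordinate) pair; level 0 is the component itself,
# level 1 its direct super component, and so on.
expand_closure <- function(edges) {
  parent_of <- setNames(edges$super, edges$sub)  # assumes a strict tree
  nodes <- unique(c(edges$super, edges$sub))
  rows <- lapply(nodes, function(node) {
    sup <- c(node, ancestors(node, parent_of))
    data.frame(super = sup, sub = node, level = seq_along(sup) - 1L,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

closure <- expand_closure(edges)
closure[closure$sub == "dominant hand", ]
```

Joining observed `sub` values to such a table and tabulating its `super` column counts each observation once for itself and once for every component above it.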

But there is always the recursive (nested) list structure that the reader
functions for formats such as yaml and json typically return.
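The nested-list form can also be written by hand for a quick experiment. The nesting below is an assumption based on the part/whole chain in Richard's example, and no yaml/json package is needed. The hypothetical helper `paths()` walks the list recursively and returns one root-to-node path per component:

```r
# Names are categories, children are sub-lists, an empty list() is a leaf
tree <- list(
  injuries = list(
    limbs = list(
      extremities = list(
        hands = list(`dominant hand` = list(), `non-dominant hand` = list())
      )
    )
  )
)

# Recursively list every node as a path from the root
paths <- function(node, prefix = character(0)) {
  out <- character(0)
  for (nm in names(node)) {          # names(list()) is NULL, so leaves stop here
    here <- c(prefix, nm)
    out <- c(out, paste(here, collapse = " > "), paths(node[[nm]], here))
  }
  out
}

paths(tree)
```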


-- 
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to represent tree-structured values

2022-05-29 Thread Richard O'Keefe
There is a kind of data I run into fairly often
which I have never known how to represent in R,
and nothing I've tried really satisfies me.

Consider for example
 ...
 - injuries
   ...
   - injuries to limbs
     ...
     - injuries to extremities
       ...
       - injuries to hands
         - injuries to dominant hand
         - injuries to non-dominant hand
       ...
     ...
   ...

This isn't ordinal data, because there is no
"left to right" order on the values.  But there
IS a "part/whole" order, which an analysis should
respect, so it's not pure nominal data either.

As one particular example, if I want to
tabulate data like this, an occurrence of one
value should be counted as an occurrence of
*every* superordinate value.
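One way to get such a roll-up count in base R, assuming the hierarchy is kept as a named parent vector (each component is a name, its value is the super component, and `NA` marks the root) and that every observed value appears in that vector. `lineage()` and `rollup_table()` are hypothetical helpers: each observation is replaced by itself plus all of its superordinate values before `table()` counts them.

```r
# Hypothetical hierarchy: child name -> parent name, NA at the root
parent <- c(injuries = NA, limbs = "injuries", extremities = "limbs",
            hands = "extremities", "dominant hand" = "hands",
            "non-dominant hand" = "hands")

# Return a value followed by all of its superordinate values
lineage <- function(v, parent) {
  out <- v
  while (!is.na(parent[[v]])) {
    v <- parent[[v]]
    out <- c(out, v)
  }
  out
}

# Count every observation once for itself and once per ancestor;
# using all names as factor levels keeps zero counts visible
rollup_table <- function(obs, parent) {
  expanded <- unlist(lapply(obs, lineage, parent = parent))
  table(factor(expanded, levels = names(parent)))
}

obs <- c("dominant hand", "non-dominant hand", "hands", "limbs")
rollup_table(obs, parent)
# injuries 4, limbs 4, extremities 3, hands 3,
# dominant hand 1, non-dominant hand 1
```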

Examples of such data include "why is this patient
being treated", "what drug is this patient being
treated with", "what geographic region is this
school from", "what biological group does this
insect belong to".

So what is the recommended way to represent
and the recommended way to analyse such data in R?



Re: [R] categorizing data

2022-05-29 Thread David Carlson via R-help
Here is one way to get the table you are describing. First some made up data:

dta <- structure(list(tree = c(27, 47, 33, 31, 45, 54, 47, 27, 33, 26,
14, 43, 36, 0, 29, 24, 43, 38, 32, 21, 21, 23, 12, 42, 34), shrub = c(19,
29, 27, 31, 5, 24, 6, 37, 4, 6, 59, 7, 23, 15, 32, 1, 31, 37,
30, 44, 40, 10, 28, 23, 32), grass = c(44, 14, 30, 28, 40, 12,
37, 26, 53, 58, 17, 40, 31, 75, 29, 65, 16, 15, 28, 25, 29, 57,
50, 25, 24)), class = "data.frame", row.names = c(NA, -25L))

rnks <- data.frame(t(apply(dta, 1, rank, ties.method="first")))
rnks <- sapply(rnks, factor, labels=c("Low", "Med", "High"))
head(rnks)
     tree   shrub  grass
[1,] "Med"  "Low"  "High"
[2,] "High" "Med"  "Low"
[3,] "High" "Low"  "Med"
[4,] "Med"  "High" "Low"
[5,] "High" "Low"  "Med"
[6,] "High" "Med"  "Low"

table(apply(rnks, 1, paste, collapse="/"))

High/Low/Med High/Med/Low Low/High/Med Low/Med/High Med/High/Low Med/Low/High
           6            6            4            2            2            5

David L Carlson
Texas A&M University



Re: [R] Circular Graph Recommendation Request

2022-05-29 Thread Christopher W. Ryan via R-help
If the units of analysis are real spatial regions (e.g. states), how
about a cartogram?

https://gisgeography.com/cartogram-maps/

An R package (I have no experience with it)

https://cran.r-project.org/web/packages/cartogram/index.html

The advantage of a cartogram is that it is a single graphic, rather than
the two in the original post, so there is no need to move your eye back and
forth to decode the colors. And it maintains---as much as possible given the
distortion, which is the whole point of a cartogram--- the relative
spatial positions of the areal units (in this case, states.)  The round
figure in the original post has the northern midwestern region in the
7:00 to 8:00-ish position, what might be considered notionally the
"southwest."  A little counterintuitive.

--Chris Ryan

Bert Gunter wrote:
> Very nice plot. Thanks for sharing.
> Can't help directly, but as the plot is sort of a map with polygonal
> areas encoding the value of a variable, you might try posting on
> r-sig-geo instead where there might be more relevant expertise in such
>  things -- or perhaps suggestions for alternative visualizations that
> work similarly.
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> On Sat, May 28, 2022 at 8:39 AM Stephen H. Dawson, DSL via R-help
>  wrote:
>>
>> https://www.visualcapitalist.com/us-goods-exports-by-state/
>> Visualizing U.S. Exports by State
>>
>> Good Morning,
>>
>>
>> https://www.visualcapitalist.com/wp-content/uploads/2022/05/us-exports-by-state-infographic.jpg
>>
>> Saw an impressive graph today. Sharing with the list.
>>
>> The size proportionality of the state segments in a circle graph is catchy.
>>
>> QUESTION
>> Is there a package one could use with R to accomplish this particular
>> circular-style graph?
>>
>>
>> Kindest Regards,
>> --
>> *Stephen Dawson, DSL*
>> /Executive Strategy Consultant/
>> Business & Technology
>> +1 (865) 804-3454
>> http://www.shdawson.com
>>


Re: [R] How to color boxplots with respect to the variable names

2022-05-29 Thread Neha gupta
Thank you so much Jim for your help.

Best regards



Re: [R] How to color boxplots with respect to the variable names

2022-05-29 Thread Jim Lemon
Hi Neha,
As you have a distinguishing feature in the variable names, here is
one way to do it:

RF<- c(4.7, 1.52, 1.46, 4.5, 0.62, 1.12)
RF_LOO<- c(5.2, 1.52, 1.44, 4.3, 0.64, 1.11)
RF_boot<- c(5.8, 1.5, 1.23, 4.3, 0.64, 1.12)
Ranger<- c(4.5, 1.57, 1.25, 3.75, 0.56, 1.09)
Ranger_LOO<- c(5, 1.56, 1.35, 3.7, 0.6, 1.0)
Ranger_boot<- c(4.2, 1.53, 1.12, 3.7, 0.63, 1.1)
SVM<- c(3.51, 1.34, 0.62, 1.45, 0.5, 1.06)
SVM_LOO<- c(3.6, 1.33, 0.33, 1.4, 0.41, 1.1)
SVM_boot<- c(3.75, 1.35, 0.58, 1.4, 0.4, 1.0)
KNN<- c(2.85, 1.35, 0.25, 1.76, 0.43, 1.25)
KNN_LOO<- c(2.85, 1.34, 0.375, 1.75, 0.44, 1.27)
KNN_boot<- c(2.75, 1.35, 0.375, 1.75, 0.45, 1.27)
varnames<-c("RF","RF_LOO","RF_boot",
 "Ranger","Ranger_LOO","Ranger_boot",
 "SVM","SVM_LOO","SVM_boot",
 "KNN","KNN_LOO","KNN_boot")
# start with blue for every box, then recolour by the name pattern
colors<-rep("blue",length(varnames))
colors[grep("LOO",varnames)]<-"green"
colors[grep("boot",varnames)]<-"red"
boxplot(RF, RF_LOO, RF_boot, Ranger, Ranger_LOO, Ranger_boot,
 SVM, SVM_LOO, SVM_boot, KNN, KNN_LOO, KNN_boot,
 range = 0, col = colors, names = varnames, las = 2, boxwex = 0.5,
 outline = FALSE, cex.axis = 0.8,
 main = "Consistency of the 100% features")
legend(8,5.5,c("Raw","LOO","boot"),fill=c("blue","green","red"))
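A variant of the same idea that keeps the twelve vectors in one named list, so the colours are derived from exactly the names that label the boxes. The list name `scores`, the `grepl()` suffix patterns and the `"topright"` legend position are illustrative choices, not from the thread; `boxplot()` accepts a list of numeric vectors and uses the list names as box labels.

```r
# The twelve result vectors from Neha's post, stored in one named list
scores <- list(
  RF          = c(4.7, 1.52, 1.46, 4.5, 0.62, 1.12),
  RF_LOO      = c(5.2, 1.52, 1.44, 4.3, 0.64, 1.11),
  RF_boot     = c(5.8, 1.5, 1.23, 4.3, 0.64, 1.12),
  Ranger      = c(4.5, 1.57, 1.25, 3.75, 0.56, 1.09),
  Ranger_LOO  = c(5, 1.56, 1.35, 3.7, 0.6, 1.0),
  Ranger_boot = c(4.2, 1.53, 1.12, 3.7, 0.63, 1.1),
  SVM         = c(3.51, 1.34, 0.62, 1.45, 0.5, 1.06),
  SVM_LOO     = c(3.6, 1.33, 0.33, 1.4, 0.41, 1.1),
  SVM_boot    = c(3.75, 1.35, 0.58, 1.4, 0.4, 1.0),
  KNN         = c(2.85, 1.35, 0.25, 1.76, 0.43, 1.25),
  KNN_LOO     = c(2.85, 1.34, 0.375, 1.75, 0.44, 1.27),
  KNN_boot    = c(2.75, 1.35, 0.375, 1.75, 0.45, 1.27)
)

# Colour by name suffix: _LOO green, _boot red, everything else blue
cols <- ifelse(grepl("_LOO$", names(scores)), "green",
        ifelse(grepl("_boot$", names(scores)), "red", "blue"))

# boxplot() takes the box labels from the list names
boxplot(scores, col = cols, range = 0, las = 2, boxwex = 0.5,
        outline = FALSE, cex.axis = 0.8,
        main = "Consistency of the 100% features")
legend("topright", legend = c("Raw", "LOO", "boot"),
       fill = c("blue", "green", "red"))
```

Adding a new method then only needs a new list element; its colour follows from its name.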

Jim

On Mon, May 30, 2022 at 4:46 AM Neha gupta  wrote:
>
> I have the following data and I need to use a boxplot which displays the
> variables (RF, Ranger, SVM, KNN) with one color, variables (RF_boot,
> Ranger_boot, SVM_boot, KNN_boot) with another color and the variables
> (RF_LOO, SVM_LOO, Ranger_LOO, KNN_LOO) with another color.
>
> How can I do that? Currently, I am using the base boxplot which displays
> them in one color. I know it will be more easily achieved with ggplot but I
> have no experience/knowledge with it.
>
> RF= c(4.7, 1.52, 1.46, 4.5, 0.62, 1.12)
> RF_LOO= c(5.2, 1.52, 1.44, 4.3, 0.64, 1.11)
> RF_boot= c(5.8, 1.5, 1.23, 4.3, 0.64, 1.12)
> Ranger= c(4.5, 1.57, 1.25, 3.75, 0.56, 1.09)
> Ranger_LOO= c(5, 1.56, 1.35, 3.7, 0.6, 1.0)
> Ranger_boot= c(4.2, 1.53, 1.12, 3.7, 0.63, 1.1)
> SVM= c(3.51, 1.34, 0.62, 1.45, 0.5, 1.06)
> SVM_LOO= c(3.6, 1.33, 0.33, 1.4, 0.41, 1.1)
> SVM_boot= c(3.75, 1.35, 0.58, 1.4, 0.4, 1.0)
> KNN= c(2.85, 1.35, 0.25, 1.76, 0.43, 1.25)
> KNN_LOO= c(2.85, 1.34, 0.375, 1.75, 0.44, 1.27)
> KNN_boot= c(2.75, 1.35, 0.375, 1.75, 0.45, 1.27)
>
> My base boxplot is here
>
> colors = rep("blue",12)
> at.x <- seq(1,by=.4, length.out = 10)
> boxplot(RF, RF_LOO, RF_boot, Ranger, Ranger_LOO, Ranger_boot, SVM, SVM_LOO,
>  SVM_boot, KNN, KNN_LOO, KNN_boot, range = 0, col=colors,
>  names= c("RF", "RF_LOO", "RF_boot", "Ranger", "Ranger_LOO", "Ranger_boot",
>  "SVM", "SVM_LOO", "SVM_boot", "KNN", "KNN_LOO", "KNN_boot"),
>  las=2, boxwex=0.5, outline=FALSE, cex.axis=0.8,
>  main="Consistency of the 100% features ")
>


Re: [R] categorizing data

2022-05-29 Thread Roy Mendelssohn - NOAA Federal via R-help
Hi Janet:

here is a start to give you the idea; now you need to loop, using either a
"for" loop or one of the apply functions.

1.  Preallocate the new data (I am lazy, so it is an array; for example, of size three).

2.  Order the data and set the values.

junk <- array(0, dim = c(2,3))
values <- c(10, 30, 50)
junk[1, order(c(32, 11, 17))] <- values
junk[1, ]
[1] 50 10 30


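This works because order() returns the indices that would sort the row, not the sorted values, so assigning through them puts 10 at the position of the smallest value, 30 at the middle one and 50 at the largest.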
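Carrying that idea through every row, here is a sketch using a `for` loop over the three example rows from Janet's first post. The names `orig`, `values` and `recoded` are illustrative. Ties, if any, go to the earlier column, because order() leaves tied elements in their original order.

```r
orig <- data.frame(tree  = c(32, 23, 49),
                   shrub = c(11, 41, 23),
                   grass = c(47, 26, 18))
values <- c(10, 30, 50)            # low, medium, high

recoded <- orig                    # preallocate: same shape and column names
for (i in seq_len(nrow(orig))) {
  # column positions from smallest to largest value in row i;
  # assigning through them puts 10 at the min, 30 in the middle, 50 at the max
  recoded[i, order(unlist(orig[i, ]))] <- values
}
recoded
```

For the 38323-row data, the same loop works unchanged; `apply(orig, 1, ...)` would do the same job but returns a transposed matrix rather than a data frame.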

HTH,

-Roy

Re: [R] categorizing data

2022-05-29 Thread Janet Choate
I'm sorry if this has come across as a homework assignment! I was trying to
provide a simple example.
There are actually 38323 rows of data, each row is an observation of the
percent that each of those veg types occupies in a spatial unit - where
each line adds to 90 - and values are different every line.
I need a way to categorize the data, so I can reduce the number of unique
observations.

So instead of 38323 unique observations - I can reduce this to
X number of High/Med/Low
X number of Med/Low/High
X number of Low/High/Med
etc... for all combinations

I hope this makes it clearer.
thank you all for your responses,
JC


Re: [R] categorizing data

2022-05-29 Thread Avi Gross via R-help
Tom,
You may have a very different impression of what was asked! LOL!
Unless Janet clarifies, this looks a bit like a homework assignment: a fairly
simple and straightforward one with exactly three rows/columns, asking how to
replace the values, in a sense, by finding the high and the low (and thus the
medium) in each row, without changing the order of the resulting data.frame.
I note that most techniques people have used focus on columns, not rows, but an
all-numeric data.frame can be transposed, or converted to a matrix and later
converted back.
If this is HW, the question becomes what has been taught so far and is supposed
to be used in solving it. Can they write their own function, perhaps called
three times, once per row or column, to replace that row/column, or can they
use some form of loop to iterate over the columns? Does it need to be done in
place, or can they gradually build a second data.frame and then point to it?
There are lots of other similar ideas.
I am not sure, other than as a HW assignment, why this transformation would
be needed, but of course there may well be a reason.
I note that the particular example shown happens to create almost a magic
square: the rows, the columns and the main diagonal all sum to 90, although
the reverse diagonal is all 50's.
Again, there are many imaginable solutions, but the goal may be more specific,
and I shudder to supply one given that questions here are too often not
detailed enough and get misunderstood. In this case, I thought I understood
until I saw what Tom wrote! LOL!
I will add this: is it guaranteed that no two items in the same row are
equal, or is there some requirement for how to handle a tie? And note there
are base R functions called min() and max(), and you can ask for things like:

if ( current == min(mydata[1,])) ...
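Avi's tie question matters for the recoding. With base R's rank(), the tie rule is chosen by `ties.method`. A small illustration on a row like row 4 of David Carlson's made-up data (31, 31, 28); the vector name `x` is illustrative:

```r
x <- c(tree = 31, shrub = 31, grass = 28)   # tree and shrub are tied

rank(x, ties.method = "first")    # 2 3 1: tie broken by column position
rank(x, ties.method = "min")      # 2 2 1: tied values share the lower rank
rank(x, ties.method = "average")  # 2.5 2.5 1: R's default

# Mapping ranks onto 10/30/50 keeps the row total at 90 only when
# every rank is distinct:
c(10, 30, 50)[rank(x, ties.method = "first")]   # 30 50 10
c(10, 30, 50)[rank(x, ties.method = "min")]     # 30 30 10 (total 70)
```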


-Original Message-
From: Tom Woolman 
To: Janet Choate 
Cc: r-help@r-project.org
Sent: Sun, May 29, 2022 3:42 pm
Subject: Re: [R] categorizing data


Some ideas:

You could create a cluster model with k=3 for each of the 3 variables,
to determine what constitutes high/medium/low centroid values for each
of the 3 plant types. The centroid values could then be used to define the
upper/lower boundaries for high/med/low.

Or utilize a histogram for each variable, and use quantiles or 
densities, etc. to determine the natural breaks for the high/med/low 
ranges for each of the IVs.
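A rough base-R sketch of Tom's two suggestions on a single variable, here the `tree` values from David Carlson's made-up data. Only the quantile route is shown for the histogram idea, the seed is arbitrary, and the names `labs`, `brks`, `by_quantile`, `km` and `by_kmeans` are illustrative. Note that this labels each value relative to its own column, which is a different question from Janet's within-row ranking.

```r
# Values of the `tree` column from David Carlson's made-up data
tree <- c(27, 47, 33, 31, 45, 54, 47, 27, 33, 26, 14, 43, 36, 0, 29,
          24, 43, 38, 32, 21, 21, 23, 12, 42, 34)
labs <- c("Low", "Med", "High")

# Idea 1: quantile breaks (tertiles). Tied values mean the groups are only
# roughly equal in size, and duplicated breaks would make cut() fail.
brks <- quantile(tree, probs = c(0, 1/3, 2/3, 1))
by_quantile <- cut(tree, breaks = brks, labels = labs, include.lowest = TRUE)

# Idea 2: k-means with k = 3 on the single variable. Cluster numbers are
# arbitrary, so relabel them by the rank of their centres.
set.seed(42)                              # kmeans uses random starts
km <- kmeans(tree, centers = 3, nstart = 10)
centre_rank <- rank(km$centers[, 1])
by_kmeans <- factor(labs[centre_rank[km$cluster]], levels = labs)

table(by_quantile)
table(by_kmeans)
```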




On 2022-05-29 15:28, Janet Choate wrote:
> Hi R community,
> I have a data frame with three variables, where each row adds up to 90.
> I want to assign a category of low, medium, or high to the values in 
> each
> row - where the lowest value per row will be set to 10, the medium 
> value
> set to 30, and the high value set to 50 - so each row still adds up to 
> 90.
> 
> For example:
> Data: Orig
> tree  shrub  grass
> 32    11     47
> 23    41     26
> 49    23     18
>
> Data: New
> tree  shrub  grass
> 30    10     50
> 10    50     30
> 50    30     10
> 
> I am not attaching any code here as I have not been able to write anything
> effective! appreciate help with this!
> thank you,
> JC
> 


Re: [R] categorizing data

2022-05-29 Thread Rui Barradas

Hello,

Here is a way: define a function to change the values and call it in an 
apply loop. But Tom's suggestions are more reasonable; you should have a 
good reason to change the data.



x <- '
tree  shrub  grass
32 11   47
23  41  26
49  23  18'
orig <- read.table(textConnection(x), header = TRUE)

f <- function(x) {
  stopifnot(length(x) == 3L)
  i_min <- which.min(x)
  i_max <- which.max(x)
  s <- (x[i_min] - 10) + (x[i_max] - 50)
  x[i_min] <- 10
  x[i_max] <- 50
  x[-c(i_min, i_max)] <- x[-c(i_min, i_max)] + s
  x
}

t(apply(orig, 1, f))
#      tree shrub grass
# [1,]   30    10    50
# [2,]   10    50    30
# [3,]   50    30    10


Hope this helps,

Rui Barradas

On 29/05/2022 at 20:28, Janet Choate wrote:

Hi R community,
I have a data frame with three variables, where each row adds up to 90.
I want to assign a category of low, medium, or high to the values in each
row - where the lowest value per row will be set to 10, the medium value
set to 30, and the high value set to 50 - so each row still adds up to 90.

For example:
Data: Orig
tree  shrub  grass
32    11     47
23    41     26
49    23     18

Data: New
tree  shrub  grass
30    10     50
10    50     30
50    30     10

I am not attaching any code here as I have not been able to write anything
effective! appreciate help with this!
thank you,
JC



Re: [R] [External] categorizing data

2022-05-29 Thread Richard M. Heiberger
Orig <- read.table(text="
tree shrub grass
32 11 47
23 41 26
49 23 18
", header=TRUE)

New <- Orig
# rank() gives each value's position in sorted order (1 = smallest)
for (i in seq(nrow(Orig)))
  New[i,] <- c(10, 30, 50)[rank(unlist(Orig[i,]))]

New
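A caution about indexing `c(10, 30, 50)` with `order()` rather than `rank()`, shown as a small sketch with a hypothetical row that also sums to 90: the two functions return the same permutation for the three example rows in the question, but not in general. `order(x)` lists the positions of the values from smallest to largest, whereas `rank(x)` gives each value's position in the sorted order, which is what the lookup needs. Assigning into `out[order(x)]` is an equivalent alternative.

```r
x <- c(30, 50, 10)               # hypothetical row, not from the question

c(10, 30, 50)[order(x)]          # 50 10 30: not the intended mapping
c(10, 30, 50)[rank(x)]           # 30 50 10: smallest -> 10, largest -> 50

# Equivalent alternative: write the targets into the sorted positions.
out <- numeric(3)
out[order(x)] <- c(10, 30, 50)
out                              # 30 50 10
```
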


> On May 29, 2022, at 15:28, Janet Choate  wrote:
> 
> Hi R community,
> I have a data frame with three variables, where each row adds up to 90.
> I want to assign a category of low, medium, or high to the values in each
> row - where the lowest value per row will be set to 10, the medium value
> set to 30, and the high value set to 50 - so each row still adds up to 90.
> 
> For example:
> Data: Orig
> tree  shrub  grass
> 32    11     47
> 23    41     26
> 49    23     18
> 
> Data: New
> tree  shrub  grass
> 30    10     50
> 10    50     30
> 50    30     10
> 
> I am not attaching any code here as I have not been able to write anything
> effective! appreciate help with this!
> thank you,
> JC
> 


Re: [R] categorizing data

2022-05-29 Thread Bill Dunlap
You could write a function that deals with one row of your data, based on
the order() function.  E.g.,
  > to_10_30_50 <- function(x) {
  +   stopifnot(is.numeric(x), length(x)==3, sum(x)==90, all(x>0))
  +   # order(order(x)) gives each value's rank: 1 = smallest, 3 = largest
  +   c(10,30,50)[order(order(x))]
  + }
  
  > to_10_30_50(c(23,41,26))
  [1] 10 50 30
Then loop over the rows.  Since this is a data.frame and not a matrix, you
need to coerce each row from a single-row data.frame to a numeric vector:
  > data <- data.frame(tree=c(32,23,49), shrub=c(11,41,23),
  +                    grass=c(47,26,18))
  > for(i in 1:nrow(data)) data[i,] <- to_10_30_50(as.numeric(data[i,]))
  > data
    tree shrub grass
  1   30    10    50
  2   10    50    30
  3   50    30    10

-Bill
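Why the coercion is needed can be seen in a small sketch (not from Bill's post, using the same example data): a single row extracted from a data frame is still a one-row data frame, whereas a row of a matrix drops to a plain vector.

```r
data <- data.frame(tree = c(32, 23, 49), shrub = c(11, 41, 23),
                   grass = c(47, 26, 18))

row1 <- data[1, ]
class(row1)              # "data.frame": still a one-row data frame
as.numeric(row1)         # 32 11 47: a plain numeric vector
unlist(row1)             # named numeric vector: tree 32, shrub 11, grass 47

m <- as.matrix(data)
class(m[1, ])            # "numeric": a matrix row drops to a vector
```
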

On Sun, May 29, 2022 at 12:29 PM Janet Choate  wrote:

> Hi R community,
> I have a data frame with three variables, where each row adds up to 90.
> I want to assign a category of low, medium, or high to the values in each
> row - where the lowest value per row will be set to 10, the medium value
> set to 30, and the high value set to 50 - so each row still adds up to 90.
>
> For example:
> Data: Orig
> tree  shrub  grass
> 32    11     47
> 23    41     26
> 49    23     18
>
> Data: New
> tree  shrub  grass
> 30    10     50
> 10    50     30
> 50    30     10
>
> I am not attaching any code here as I have not been able to write anything
> effective! appreciate help with this!
> thank you,
> JC
>


Re: [R] categorizing data

2022-05-29 Thread Tom Woolman



Some ideas:

You could create a cluster model with k=3 for each of the 3 variables, 
to determine what constitutes high/medium/low centroid values for each 
of the 3 plant types. Centroid values could then be used as the 
upper/lower boundaries for high/med/low.
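A minimal sketch of the clustering idea for a single variable, under stated assumptions: the data are simulated (30 random values, not from the question), and the boundaries are placed halfway between neighbouring sorted centroids, which is only one possible reading of "upper/lower boundaries".

```r
# Hypothetical data: 30 simulated "tree" values, not from the question.
set.seed(42)
tree <- runif(30, min = 0, max = 90)

# Cluster the single variable into k = 3 groups.
km <- kmeans(tree, centers = 3, nstart = 10)
ctr <- sort(as.vector(km$centers))

# Assumed rule: each boundary lies halfway between neighbouring centroids.
breaks <- c(-Inf, (ctr[-1] + ctr[-3]) / 2, Inf)
tree_cat <- cut(tree, breaks = breaks, labels = c("low", "medium", "high"))
table(tree_cat)
```
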


Or utilize a histogram for each variable, and use quantiles or 
densities, etc. to determine the natural breaks for the high/med/low 
ranges for each of the IVs.
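The quantile variant can be sketched with `quantile()` and `cut()`. Again the data are simulated for illustration only, and tertiles (the 1/3 and 2/3 sample quantiles) are an assumed choice of breaks; other quantiles or density-based breaks would follow the same pattern.

```r
# Hypothetical data: 30 simulated "shrub" values, not from the question.
set.seed(42)
shrub <- runif(30, min = 0, max = 90)

# Tertile boundaries: minimum, 1/3 and 2/3 quantiles, maximum.
qs <- quantile(shrub, probs = c(0, 1/3, 2/3, 1))
shrub_cat <- cut(shrub, breaks = qs, labels = c("low", "medium", "high"),
                 include.lowest = TRUE)
table(shrub_cat)
```

With 30 distinct values, each tertile class receives 10 observations.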





On 2022-05-29 15:28, Janet Choate wrote:

Hi R community,
I have a data frame with three variables, where each row adds up to 90.
I want to assign a category of low, medium, or high to the values in each
row - where the lowest value per row will be set to 10, the medium value
set to 30, and the high value set to 50 - so each row still adds up to 90.

For example:
Data: Orig
tree  shrub  grass
32    11     47
23    41     26
49    23     18

Data: New
tree  shrub  grass
30    10     50
10    50     30
50    30     10

I am not attaching any code here as I have not been able to write anything
effective! appreciate help with this!
thank you,
JC



[R] How to color boxplots with respect to the variable names

2022-05-29 Thread Neha gupta
I have the following data and I need to use a boxplot which displays the
variables (RF, Ranger, SVM, KNN) with one color, variables (RF_boot,
Ranger_boot, SVM_boot, KNN_boot) with another color and the variables
(RF_LOO, SVM_LOO, Ranger_LOO, KNN_LOO) with another color.

How can I do that? Currently, I am using the base boxplot which displays
them in one color. I know it will be more easily achieved with ggplot but I
have no experience/knowledge with it.

RF= c(4.7, 1.52, 1.46, 4.5, 0.62, 1.12)
RF_LOO= c(5.2, 1.52, 1.44, 4.3, 0.64, 1.11)
RF_boot= c(5.8, 1.5, 1.23, 4.3, 0.64, 1.12)
Ranger= c(4.5, 1.57, 1.25, 3.75, 0.56, 1.09)
Ranger_LOO= c(5, 1.56, 1.35, 3.7, 0.6, 1.0)
Ranger_boot= c(4.2, 1.53, 1.12, 3.7, 0.63, 1.1)
SVM= c(3.51, 1.34, 0.62, 1.45, 0.5, 1.06)
SVM_LOO= c(3.6, 1.33, 0.33, 1.4, 0.41, 1.1)
SVM_boot= c(3.75, 1.35, 0.58, 1.4, 0.4, 1.0)
KNN= c(2.85, 1.35, 0.25, 1.76, 0.43, 1.25)
KNN_LOO= c(2.85, 1.34, 0.375, 1.75, 0.44, 1.27)
KNN_boot= c(2.75, 1.35, 0.375, 1.75, 0.45, 1.27)

My base boxplot is here

colors = rep("blue", 12)
at.x <- seq(1, by = .4, length.out = 10)
boxplot(RF, RF_LOO, RF_boot, Ranger, Ranger_LOO, Ranger_boot,
        SVM, SVM_LOO, SVM_boot, KNN, KNN_LOO, KNN_boot,
        range = 0, col = colors,
        names = c("RF", "RF_LOO", "RF_boot",
                  "Ranger", "Ranger_LOO", "Ranger_boot",
                  "SVM", "SVM_LOO", "SVM_boot",
                  "KNN", "KNN_LOO", "KNN_boot"),
        las = 2, boxwex = 0.5, outline = FALSE, cex.axis = 0.8,
        main = "Consistency of the 100% features")
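A minimal base-graphics sketch, not taken from any reply: `boxplot()` draws the boxes in the order given, so `col` can be any vector with one colour per box. Here each box's group is derived from its name suffix (`_LOO`, `_boot`, or none). The three colours are arbitrary choices, and collecting the vectors with `mget()` into a named list (whose names become the axis labels) is a convenience, not part of the original code.

```r
# Data from the question above.
RF          <- c(4.7, 1.52, 1.46, 4.5, 0.62, 1.12)
RF_LOO      <- c(5.2, 1.52, 1.44, 4.3, 0.64, 1.11)
RF_boot     <- c(5.8, 1.5, 1.23, 4.3, 0.64, 1.12)
Ranger      <- c(4.5, 1.57, 1.25, 3.75, 0.56, 1.09)
Ranger_LOO  <- c(5, 1.56, 1.35, 3.7, 0.6, 1.0)
Ranger_boot <- c(4.2, 1.53, 1.12, 3.7, 0.63, 1.1)
SVM         <- c(3.51, 1.34, 0.62, 1.45, 0.5, 1.06)
SVM_LOO     <- c(3.6, 1.33, 0.33, 1.4, 0.41, 1.1)
SVM_boot    <- c(3.75, 1.35, 0.58, 1.4, 0.4, 1.0)
KNN         <- c(2.85, 1.35, 0.25, 1.76, 0.43, 1.25)
KNN_LOO     <- c(2.85, 1.34, 0.375, 1.75, 0.44, 1.27)
KNN_boot    <- c(2.75, 1.35, 0.375, 1.75, 0.45, 1.27)

box_names <- c("RF", "RF_LOO", "RF_boot", "Ranger", "Ranger_LOO", "Ranger_boot",
               "SVM", "SVM_LOO", "SVM_boot", "KNN", "KNN_LOO", "KNN_boot")

# Classify each box by its name suffix.
group <- ifelse(grepl("_LOO$", box_names), "LOO",
                ifelse(grepl("_boot$", box_names), "boot", "plain"))

# One colour per group (arbitrary choices), expanded to one colour per box.
group_cols <- c(plain = "lightblue", LOO = "lightgreen", boot = "salmon")
colors <- unname(group_cols[group])

boxplot(mget(box_names), range = 0, col = colors, las = 2, boxwex = 0.5,
        outline = FALSE, cex.axis = 0.8,
        main = "Consistency of the 100% features")
legend("topright", legend = names(group_cols), fill = group_cols)
```

Because the colours are derived from the names rather than typed by position, reordering the boxes keeps each one in its group's colour.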



Re: [R] Use of ellipsis

2022-05-29 Thread Andreas Matre



Thank you very much Ivan and Bert! I used the eval(substitute()) 
workaround suggested by Ivan and it worked perfectly.


Andreas Matre
