Re: [R] How to represent tree-structured values

2022-05-30 Thread Jan van der Laan
For visualising hierarchical data a treemap can also work well. For 
example, using the treemap package:


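n <- 1000

library(data.table)
library(treemap)

# Simulated data: each observation is classified at three levels
# and has a binary outcome 'event'.
dta <- data.table(
  level1 = sample(LETTERS[1:5], n, replace = TRUE),
  level2 = sample(letters[1:5], n, replace = TRUE),
  level3 = sample(1:9, n, replace = TRUE),
  event = sample(0:1, n, replace = TRUE)
)

# One row per level1/level2/level3 combination: count and event rate.
tab <- dta[, .(n = .N, rate = sum(event)/.N),
  by = .(level1, level2, level3)]

# Rectangle size from n, colour from rate; label font size is set per
# level (0 suppresses the level-3 labels).
treemap(tab, index = names(tab)[1:3], vSize = "n", vColor = "rate",
  type = "value", fontsize.labels = 20*c(1, 0.7, 0))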


--

Jan




On 30-05-2022 11:40, Jim Lemon wrote:

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to represent tree-structured values

2022-05-30 Thread Jim Lemon
Hi Richard,
Thinking about this, you might also find intersectDiagram, also in
plotrix, to be useful.

Jim

On Mon, May 30, 2022 at 4:37 PM Jim Lemon  wrote:


Re: [R] How to represent tree-structured values

2022-05-29 Thread Jim Lemon
Hi Richard,
Some years ago I had a try at illustrating Multiple Causes of Death
(MCoD) data. I settled on what is sometimes called a "sizetree". You
can see some examples in the sizetree function help page in "plotrix".
Unfortunately I can't use the original data as it was confidential.

Jim

On Mon, May 30, 2022 at 2:55 PM Richard O'Keefe  wrote:


Re: [R] How to represent tree-structured values

2022-05-29 Thread Jeff Newmiller
Really this depends on the analysis you want to perform.

In the past, I have used a super/sub two-column format as a compact, 
non-redundant representation for data entry, and then applied a recursive 
algorithm to convert it into a super/sub/level/id table in which _all_ sub 
components have (duplicative) entries for each of their super components.
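A minimal base-R sketch of that conversion, assuming the hierarchy is a tree (each sub has exactly one super). The `links` data and `closure_table()` are hypothetical names for illustration; the sketch climbs towards the root iteratively rather than recursively and leaves out the id column, so it is not necessarily the algorithm described above.

```r
# Hypothetical example: the injuries hierarchy as a super/sub edge list,
# one row per direct part/whole link (assumes a tree: each sub has one super).
links <- data.frame(
  super = c("injuries", "injuries to limbs", "injuries to extremities",
            "injuries to hands", "injuries to hands"),
  sub   = c("injuries to limbs", "injuries to extremities", "injuries to hands",
            "injuries to dominant hand", "injuries to non-dominant hand"),
  stringsAsFactors = FALSE
)

# Expand the edge list into one row per (super, sub) pair, where `level`
# counts the steps from the super component down to the sub component.
# Loops forever if the links contain a cycle, so validate input first.
closure_table <- function(super, sub) {
  parent <- setNames(super, sub)      # lookup: sub -> its direct super
  rows <- list()
  for (node in unique(sub)) {
    anc <- parent[[node]]
    level <- 1L
    repeat {                          # climb from node towards the root
      rows[[length(rows) + 1L]] <- data.frame(super = anc, sub = node,
                                              level = level,
                                              stringsAsFactors = FALSE)
      if (!(anc %in% sub)) break      # anc has no super: it is a root
      anc <- parent[[anc]]
      level <- level + 1L
    }
  }
  do.call(rbind, rows)
}

ct <- closure_table(links$super, links$sub)
ct[ct$sub == "injuries to dominant hand", ]
```

The expanded table is redundant by design: selecting every row with a given super component gives all of its descendants in one step, which is what tabulation and aggregation need.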

But there is always the recursive list structure that functions reading 
formats such as yaml and json typically return.
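Working with such a nested list usually means recursing over it. A minimal base-R sketch, where `tree` and `node_paths()` are hypothetical: the list is built by hand rather than read with a yaml or json package, so the exact shape a given parser returns (for example, how it represents leaves) may differ.

```r
# Hypothetical nested list mirroring the injuries example; each name is a
# category and each value holds its subcategories (an empty list() is a leaf).
tree <- list(
  "injuries" = list(
    "injuries to limbs" = list(
      "injuries to extremities" = list(
        "injuries to hands" = list(
          "injuries to dominant hand" = list(),
          "injuries to non-dominant hand" = list()
        )
      )
    )
  )
)

# Depth-first walk: returns one character vector per node, giving the
# path of category names from the root down to that node.
node_paths <- function(x, prefix = character(0)) {
  paths <- list()
  for (nm in names(x)) {
    here <- c(prefix, nm)
    paths <- c(paths, list(here), node_paths(x[[nm]], here))
  }
  paths
}

paths <- node_paths(tree)
vapply(paths, paste, character(1), collapse = " > ")
```

Flattening the list into root-to-node paths like this is one way to get from the nested form to a flat table that ordinary R tools can work with.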

On May 29, 2022 9:54:44 PM PDT, Richard O'Keefe  wrote:

-- 
Sent from my phone. Please excuse my brevity.



[R] How to represent tree-structured values

2022-05-29 Thread Richard O'Keefe
There is a kind of data I run into fairly often
which I have never known how to represent in R,
and nothing I've tried really satisfies me.

Consider for example
 ...
 - injuries
   ...
   - injuries to limbs
     ...
     - injuries to extremities
       ...
       - injuries to hands
         - injuries to dominant hand
         - injuries to non-dominant hand
       ...
     ...
   ...

This isn't ordinal data, because there is no
"left to right" order on the values.  But there
IS a "part/whole" order, which an analysis should
respect, so it's not pure nominal data either.
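One way to make the part/whole order explicit is to store each value as its full path from the root and test containment by prefix. A minimal sketch, assuming a "/" separator; the encoding and `is_within()` are hypothetical helpers for illustration. It shows why the order is partial rather than ordinal: two sibling values both lie within their common parent, but neither lies within the other.

```r
# Hypothetical encoding: each value is stored as its full root-to-node path,
# joined with "/" (assumes "/" never appears inside a category name).
is_within <- function(path, ancestor, sep = "/") {
  # TRUE when `path` is `ancestor` itself or lies anywhere below it.
  path == ancestor | startsWith(path, paste0(ancestor, sep))
}

hands  <- "injuries/injuries to limbs/injuries to extremities/injuries to hands"
dom    <- paste(hands, "injuries to dominant hand", sep = "/")
nondom <- paste(hands, "injuries to non-dominant hand", sep = "/")

is_within(dom, hands)    # TRUE: part/whole order holds
is_within(dom, nondom)   # FALSE
is_within(nondom, dom)   # FALSE: siblings are incomparable
```

Appending the separator before the prefix test keeps a hypothetical category such as "injuries to hand" from matching "injuries to hands".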

As one particular example, if I want to
tabulate data like this, an occurrence of one
value should be counted as an occurrence of
*every* superordinate value.
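That counting rule can be sketched in base R with a parent lookup: expand each observation into itself plus all of its ancestors, then tabulate. The `parent` vector, `with_ancestors()`, `rollup_table()` and the observations are hypothetical, and the sketch assumes each value has exactly one superordinate value.

```r
# Hypothetical parent lookup for the injuries example: each name is a
# category, each value its direct superordinate (NA marks the root).
parent <- c(
  "injuries"                      = NA,
  "injuries to limbs"             = "injuries",
  "injuries to extremities"       = "injuries to limbs",
  "injuries to hands"             = "injuries to extremities",
  "injuries to dominant hand"     = "injuries to hands",
  "injuries to non-dominant hand" = "injuries to hands"
)

# The value itself followed by every superordinate value up to the root.
with_ancestors <- function(value, parent) {
  out <- character(0)
  while (!is.na(value)) {
    out <- c(out, value)
    value <- parent[[value]]   # errors on a category missing from `parent`
  }
  out
}

# Count each observation under its own value and under every value above it.
rollup_table <- function(values, parent) {
  expanded <- unlist(lapply(values, with_ancestors, parent = parent))
  table(factor(expanded, levels = names(parent)))
}

obs <- c("injuries to dominant hand", "injuries to non-dominant hand",
         "injuries to hands", "injuries to limbs")
counts <- rollup_table(obs, parent)
counts
```

Because each observation is counted at several levels, the entries do not sum to the number of observations. Values with more than one superordinate (a drug in two classes, say) would need a list of parents per value and `unique()` on each observation's ancestor set.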

Examples of such data include "why is this patient
being treated", "what drug is this patient being
treated with", "what geographic region is this
school from", "what biological group does this
insect belong to".

So what is the recommended way to represent
and the recommended way to analyse such data in R?

