[R-es] Ordering frequency distribution plots with ggplot2

2021-03-06 Thread Manuel Mendoza
Good morning. As you can see in the code copied below, I plot the
frequency distribution of the samples in the 6 categories of the
variable Clst along the variable NPP. It draws the 6 plots stacked
from top to bottom. Two questions:
1. How can I specify their order?
2. How can I draw all 6 together, overlaid (with some transparency),
in a single plot?

Many thanks, as always,
Manuel


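library(tidyverse)  # %>%, gather(), ggplot()
library(ggridges)   # geom_density_ridges()
library(ggthemes)   # scale_fill_tableau(), scale_color_tableau()

pIFd = data %>%
  gather(x, y, NPP) %>%
  ggplot(aes(x = y, y = Clst, color = Clst, fill = Clst)) +
  facet_wrap( ~ x, scales = "free", ncol = 3) +
  scale_fill_tableau() +
  scale_color_tableau() +
  geom_density_ridges(alpha = 0.8) +
  guides(fill = "none", color = "none")  # "none" replaces the deprecated FALSE

windows(); pIFd  # windows() opens a graphics device on Windows only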
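
Both questions can be approached with a short sketch like the one below. It is only an illustration under assumptions: the data frame is a simulated stand-in for the poster's `data`, the category labels `C1` to `C6` and the chosen level order are hypothetical, and `geom_density()` is swapped in for `geom_density_ridges()` to get the overlaid version. In ggplot2, the drawing order of a discrete variable follows its factor levels, so setting the levels explicitly controls the order.

```r
# Hypothetical stand-in for the poster's `data`: a grouping column Clst
# with six categories and a numeric column NPP.
set.seed(1)
data <- data.frame(
  Clst = rep(paste0("C", 1:6), each = 50),
  NPP  = rnorm(300, mean = rep(1:6, each = 50))
)

# 1. Plot order follows the factor levels, so set them explicitly.
#    (The order below is arbitrary, chosen only for illustration.)
data$Clst <- factor(data$Clst, levels = c("C3", "C1", "C6", "C2", "C5", "C4"))

# 2. Overlay the six densities in a single panel with partial transparency.
#    The plot is only drawn if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x = NPP, colour = Clst, fill = Clst)) +
    geom_density(alpha = 0.4)
  print(p)
}
```

The same `factor(..., levels = ...)` call applied before the original ridge plot should reorder the ridges as well, since `Clst` is mapped to the y axis there.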


___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] quantile from quantile table calculation without original data

2021-03-06 Thread David Winsemius



On 3/6/21 1:02 AM, Abby Spurdle wrote:

> I came up with a solution.
> But not necessarily the best solution.
>
> I used a spline to approximate the quantile function.
> Then use that to generate a large sample.
> (I don't see any need for the sample to be random, as such).
> Then compute the sample mean and sd, on a log scale.
> Finally, plug everything into the plnorm function:
>
> p <- seq (0.01, 0.99,, 1e6)
> Fht <- splinefun (temp$percent, temp$size)
> x <- log (Fht (p) )
> psolution <- plnorm (0.1, mean (x), sd (x), FALSE)
> psolution
>
> The value of the solution is very close to one.
> Which is not a surprise.
>
> Here's a plot of everything:
>
> u <- seq (0.01, 1.65,, 200)
> v <- plnorm (u, mean (x), sd (x), FALSE)
> plot (u, v, type="l", ylim = c (0, 1) )
> points (temp$size, temp$percent, pch=16)
> points (0.1, psolution, pch=16, col="blue")


Here's another approach, which uses minimization of the squared error to 
get the parameters for a lognormal distribution.


temp <- structure(list(size = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069,
0.3781, 0.3047, 0.2681, 0.1907), percent = c(0.01, 0.05, 0.1,
0.25, 0.5, 0.75, 0.9, 0.95, 0.99)), .Names = c("size", "percent"
), row.names = c(NA, -9L), class = "data.frame")

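# Note the inversion (1 - percent) of the poorly named and flipped "percent" column.
# As written, this squares the sum of the residuals, sum(r)^2, rather than
# summing the squared residuals, sum(r^2).
obj <- function(x) {sum( qlnorm(1-temp$percent, x[[1]], x[[2]])-temp$size )^2}

optim( list(a=-0.65, b=0.42), obj)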

$par
         a          b
-0.7020649  0.4678656

$value
[1] 3.110316e-12

$counts
function gradient
      51       NA

$convergence
[1] 0

$message
NULL


I'm not sure how principled this might be. There's no consideration in 
this approach for expected sampling error at the right tail where the 
magnitudes of the observed values will create much larger contributions 
to the sum of squares.
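
One way to temper the right-tail weighting described above is to fit on the log scale. This is a sketch under that assumption, not the method used in this message: for a lognormal distribution the log-quantiles are linear in the standard normal quantiles, log(q_p) = meanlog + sdlog * qnorm(p), so an ordinary least-squares line through the transformed table gives both parameters directly. The names `meanlog`, `sdlog`, `z` and `p_above` are introduced here for illustration only.

```r
# Quantile table from the original post; `percent` holds upper-tail
# probabilities P(X > size), hence the 1 - percent below.
temp <- data.frame(
  size    = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069, 0.3781, 0.3047, 0.2681, 0.1907),
  percent = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)
)

# Standard normal quantiles matching each tabulated size.
temp$z <- qnorm(1 - temp$percent)

# Least squares on the log scale: intercept = meanlog, slope = sdlog.
fit     <- lm(log(size) ~ z, data = temp)
meanlog <- unname(coef(fit)[1])
sdlog   <- unname(coef(fit)[2])

# Estimated upper-tail probability P(X > 0.1) under the fitted lognormal.
p_above <- plnorm(0.1, meanlog, sdlog, lower.tail = FALSE)

c(meanlog = meanlog, sdlog = sdlog, p_above = p_above)
```

Working with log(size) gives each tabulated quantile a comparable influence on the fit, although it still ignores the differing sampling variability of extreme and central quantiles.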


--

David.





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R-es] Splitting responses into separate columns.

2021-03-06 Thread palazon
Hello:

What you want is to create dummy variables; the procedure is shown
very clearly here:
https://statisticsglobe.com/convert-factor-to-dummy-variables-in-r
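
For a multiple-response column like the one in the question, the cells have to be split before the indicators can be built. The base R sketch below is one possible way to do it, assuming the layout shown in the quoted message; the data frame `df` is a hypothetical sample typed in from that message, and the helper names `answers`, `choices` and `dummies` are illustrative. Column names are taken verbatim from the answers rather than shortened.

```r
# Hypothetical sample mirroring the layout described in the question.
df <- data.frame(
  id = 1:4,
  problemas_salud_paciente = c(
    "Demencia",
    "Demencia",
    "Enfermedad Pulmonar, Demencia, Afasia primaria progresiva diagnosticada 2010",
    "Enfermedad Cardíaca"
  ),
  stringsAsFactors = FALSE
)

# Split each cell on commas and trim surrounding whitespace.
answers <- lapply(strsplit(df$problemas_salud_paciente, ","), trimws)

# One column per distinct answer: 1 if the row mentions it, 0 otherwise.
choices <- sort(unique(unlist(answers)))
dummies <- t(vapply(answers,
                    function(a) as.integer(choices %in% a),
                    integer(length(choices))))
colnames(dummies) <- choices

# Attach the indicator columns to the id column.
result <- cbind(df["id"], dummies)
result
```

Rows that did not select an option get 0 here instead of a blank; replacing the zeros with `NA` afterwards would reproduce the empty cells of the requested layout.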

Talk soon

On 5/3/21 at 19:54, juan manuel dias wrote:
> Hello all,
>
> I have the following task to carry out.
>
> I have a dataset in which one of the variables (problemas_salud_paciente)
> is multiple-response (it accepts more than one answer option per
> case/row), but all the answers are stored in the same column,
> separated by a comma (",").
>
> 1- I need to split each answer option into a separate column.
>
> 2- Generate as many columns as there are answer options.
>
> 3- Then, in each column (answer option), assign the value 1 if the
> row/case selected that option.
>
> This is how the variable looks in the dataset:
>
> id   problemas_salud_paciente
>  1    Demencia
>  2    Demencia
>  3    Enfermedad Pulmonar, Demencia, Afasia primaria progresiva diagnosticada 2010
>  4    Enfermedad Cardíaca
>
>
> And this is how I need it to end up:
>
> id | demencia | enferm_pulmonar | afasia_prima_progr | enfermedad_cardiaca
>  1 |    1     |                 |                    |
>  2 |    1     |                 |                    |
>  3 |    1     |        1        |         1          |
>  4 |          |                 |                    |          1
>
> I am attaching a csv with a sample of cases for that variable.
>
> Many thanks. Regards, Juan.

-- 

José Antonio Palazón Ferrando
Profesor Titular. Departamento de Ecología e Hidrología.
Facultad de Biología. Universidad de Murcia.
Campus Universitario de Espinardo
30100 MURCIA-SPAIN
Telf: +34 868 88 49 80
Fax : +34 868 88 39 63
Email: pala...@um.es




Re: [R] quantile from quantile table calculation without original data

2021-03-06 Thread Abby Spurdle
I came up with a solution.
But not necessarily the best solution.

I used a spline to approximate the quantile function.
Then use that to generate a large sample.
(I don't see any need for the sample to be random, as such).
Then compute the sample mean and sd, on a log scale.
Finally, plug everything into the plnorm function:

p <- seq (0.01, 0.99,, 1e6)
Fht <- splinefun (temp$percent, temp$size)
x <- log (Fht (p) )
psolution <- plnorm (0.1, mean (x), sd (x), FALSE)
psolution

The value of the solution is very close to one.
Which is not a surprise.

Here's a plot of everything:

u <- seq (0.01, 1.65,, 200)
v <- plnorm (u, mean (x), sd (x), FALSE)
plot (u, v, type="l", ylim = c (0, 1) )
points (temp$size, temp$percent, pch=16)
points (0.1, psolution, pch=16, col="blue")


On Sat, Mar 6, 2021 at 8:09 PM Abby Spurdle  wrote:
>
> I'm sorry.
> I misread your example, this morning.
> (I didn't read the code after the line that calls plot).
>
> After looking at this problem again, interpolation doesn't apply, and
> extrapolation would be a last resort.
> If you can assume your data comes from a particular type of
> distribution, such as a lognormal distribution, then a better approach
> would be to find the most likely parameters.
>
> i.e.
> This falls within the broader scope of maximum likelihood.
> (Except that you're dealing with a table of quantile-probability
> pairs, rather than raw observational data).
>
> I suspect that there's a relatively easy way of finding the parameters.
>
> I'll think about it...
> But someone else may come back with an answer first...
>
>
> On Sat, Mar 6, 2021 at 8:17 AM Abby Spurdle  wrote:
> >
> > I note three problems with your data:
> > (1) The name "percent" is misleading, perhaps you want "probability"?
> > (2) There are straight (or near-straight) regions, each of which, is
> > equally (or near-equally) spaced, which is not what I would expect in
> > problems involving "quantiles".
> > (3) Your plot (approximating the distribution function) is
> > back-the-front (as per what is customary).
> >
> >
> > On Fri, Mar 5, 2021 at 10:14 PM PIKAL Petr  wrote:
> > >
> > > Dear all
> > >
> > > I have table of quantiles, probably from lognormal distribution
> > >
> > >  dput(temp)
> > > temp <- structure(list(size = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069,
> > > 0.3781, 0.3047, 0.2681, 0.1907), percent = c(0.01, 0.05, 0.1,
> > > 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)), .Names = c("size", "percent"
> > > ), row.names = c(NA, -9L), class = "data.frame")
> > >
> > > and I need to calculate quantile for size 0.1
> > >
> > > plot(temp$size, temp$percent, pch=19, xlim=c(0,2))
> > > ss <- approxfun(temp$size, temp$percent)
> > > points((0:100)/50, ss((0:100)/50))
> > > abline(v=.1)
> > >
> > > If I had original data it would be quite easy with ecdf/quantile function 
> > > but without it I am lost what function I could use for such task.
> > >
> > > Please, give me some hint where to look.
> > >
> > >
> > > Best regards
> > >
> > > Petr
> > > Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních 
> > > partnerů PRECHEZA a.s. jsou zveřejněny na: 
> > > https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information 
> > > about processing and protection of business partner's personal data are 
> > > available on website: 
> > > https://www.precheza.cz/en/personal-data-protection-principles/
> > > Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou 
> > > > důvěrné a podléhají tomuto právně závaznému prohlášení o vyloučení 
> > > odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any 
> > > documents attached to it may be confidential and are subject to the 
> > > legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
> > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.