[R] unable to get barchart of censored subjects from ggsurvplot() in survminer package, if there is no predictor

2024-06-17 Thread Christopher W. Ryan via R-help
Hello. Running R 4.2.3 on Windows 10. Using survminer package version
0.4.9, published 2021-03-09.

I'm encountering an error with ggsurvplot() in the survminer package.
Email to the author/maintainer about 2 weeks ago has not yet resulted in
a reply.

I seem unable to produce the number-censored plot (ncensor.plot = TRUE)
for an unconditional (no-predictor) survival curve. See the code below.

Any advice?

Thanks.

--Chris Ryan, MD, MS, MSPH


library(survival)
library(survminer)
data(lung)

fit <- survfit(Surv(time, status) ~ 1, data = lung)


ggsurvplot(
   fit,
   data = lung,
   xlab = "Time in days",
   ncensor.plot = TRUE
  )

## produces the error:
##   Error:
##   ! Problem while converting geom to grob.
##   ℹ Error occurred in the 1st layer.
##   Caused by error:
##   ! Unknown colour name: strata
## but there are no strata



## omit ncensor.plot, and no error

ggsurvplot(
   fit,
   data = lung,
   xlab = "Time in days",
   #   ncensor.plot = TRUE
  )

## restore ncensor.plot = TRUE, but use a predictor in the model,
## and no error.

fit.2 <- survfit(Surv(time, status) ~ sex, data = lung)
ggsurvplot(
   fit.2,
   data = lung,
   xlab = "Time in days",
   ncensor.plot = TRUE
  )

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Obtaining a value of pie in a zero inflated model (fm-zinb2)

2024-01-04 Thread Christopher W. Ryan via R-help
Are you referring to the zeroinfl() function in the countreg package? If
so, I think

predict(fm_zinb2, type = "zero", newdata = some.new.data)

will give you pi for each combination of covariate values that you
provide in some.new.data

where pi is the probability to observe a zero from the point mass component.

As to your second question, I'm not sure that's possible, for any
*particular, individual* subject. Others will undoubtedly know better
than I.
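
A minimal sketch of that call, assuming the model is fitted with zeroinfl() from the pscl package (which also ships the bioChemists data); I expect the countreg version to behave the same way, but that is an assumption. new_data is a hypothetical stand-in for some.new.data. The last step applies Bayes' rule under the fitted Poisson model, so it gives a probability that an observed zero is structural, not a classification of any individual subject.

```r
library(pscl)  # assumption: pscl's zeroinfl(); countreg has a function of the same name

data("bioChemists", package = "pscl")
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "poisson")

# Hypothetical new data: here simply the first six rows of the original data
new_data <- head(bioChemists)

# pi: probability of a structural zero for each covariate combination
pi_hat <- predict(fm_zinb2, newdata = new_data, type = "zero")

# mu: mean of the count component for the same rows
mu_hat <- predict(fm_zinb2, newdata = new_data, type = "count")

# For a subject whose observed count is zero: probability that the zero
# came from the point mass, P(structural | y = 0), under the Poisson model
post_structural <- pi_hat / (pi_hat + (1 - pi_hat) * exp(-mu_hat))

round(cbind(pi_hat, mu_hat, post_structural), 3)
```

The posterior column is only meaningful for rows where the observed count is actually zero.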

--Chris Ryan

Sorkin, John wrote:
> I am running a zero inflated regression using the zeroinfl function similar 
> to the model below:
>   
>  fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "poisson")
> summary(fm_zinb2)
> 
> I have three questions:
> 
> 1) How can I obtain a value for the parameter pie, which is the fraction of 
> the population that is in the zero inflated model vs the fraction in the 
> count model? 
> 
> 2) For any particular subject, how can I determine if the subject is in the 
> portion of the population that contributes a zero count because the subject 
> is in the group of subjects who have structural zero responses vs. the 
> subject being in the portion of the population who can contribute a zero or a 
> non-zero response?
> 
> 3) zero inflated models can be solved using closed form solutions, or using 
> iterative methods. Which method is used by fm_zinb2?
> 
> Thank you,
> John
> 
> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> 
> Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
> Center Geriatrics Research, Education, and Clinical Center; 
> 
> PI Biostatistics and Informatics Core, University of Maryland School of 
> Medicine Claude D. Pepper Older Americans Independence Center;
> 
> Senior Statistician University of Maryland Center for Vascular Research;
> 
> Division of Gerontology and Paliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382


Re: [R] anyone having trouble accessing CRAN?

2023-11-15 Thread Christopher W. Ryan via R-help
Ah, thanks all. Guess I missed the message before they started the
maintenance.

--Chris

Ivan Krylov wrote:
> On Wed, 15 Nov 2023 14:13:00 -0500
> "Christopher W. Ryan via R-help"  wrote:
> 
>> Anyone seeing similar?
> 
> Same for me.
> 
> While it worked, CRAN website had the following message:
> 
>>> The CRAN Admin Team will perform system upgrades during the period
>>> Wednesday November 15 until Thursday November 16, 2023. There will
>>> be intermittent outages in service during this time. 
> 
> Use chooseCRANmirror(local.only = TRUE) (or subset() the return value
> of getCRANmirrors(local.only = TRUE)) to access a mirror that works.
>
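
A rough sketch of the second suggestion in the quoted reply, using only the utils functions it names. Filtering on https and taking the first remaining row are arbitrary choices made for illustration; in practice one would pick a mirror known to be up.

```r
# List the mirrors recorded in the local R installation (no network access needed)
mirrors <- getCRANmirrors(local.only = TRUE)

# Keep only mirrors served over https, with the columns needed to choose one
candidates <- subset(mirrors, grepl("^https://", URL),
                     select = c(Name, Country, URL))
head(candidates)

# Point this session at one of them; taking the first row is an arbitrary choice
if (nrow(candidates) > 0) {
  options(repos = c(CRAN = candidates$URL[1]))
}
getOption("repos")
```

The interactive equivalent is chooseCRANmirror(local.only = TRUE), which presents the same list as a menu.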



[R] anyone having trouble accessing CRAN?

2023-11-15 Thread Christopher W. Ryan via R-help
At https://cran.r-project.org/ I get this error message:

=
Secure Connection Failed

An error occurred during a connection to cran.r-project.org.
PR_END_OF_FILE_ERROR

Error code: PR_END_OF_FILE_ERROR

The page you are trying to view cannot be shown because the
authenticity of the received data could not be verified.
===

Three different browsers, two different devices, two different networks.
(The text of the error messages varies.)

Anyone seeing similar?

Thanks.

--Chris Ryan



Re: [R] make a lattice dotplot with symbol size proportional to a variable in the plotted dataframe

2023-11-08 Thread Christopher W. Ryan via R-help
Very helpful, Deepayan, and educational. Thank you.

What does NSE stand for?

Thanks,
Chris

Deepayan Sarkar wrote:
> 
> --Chris Ryan



Re: [R] Sum data according to date in sequence

2023-11-02 Thread Christopher W. Ryan via R-help
date appears to be a character variable, and R is treating it as such.

str(dt1)

might give you some insight.  Or the dplyr equivalent

glimpse(dt1)

I think R did what you asked, but if you want to be able to order
records by date, in temporal order, you need to tell R that it is a date:

library(dplyr)
sums <- dt1 %>%
  mutate(realDate = as.Date(date, format = "%m/%d/%Y")) %>%
  group_by(realDate) %>%
  summarise(EnergykWh = sum(EnergykWh))
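
To see the difference on a small scale, here is a base-R sketch. The first three rows come from the posted data; the fourth row (2 January 2016) is made up purely to show the ordering problem.

```r
# Three rows from the posted data plus one hypothetical row (labelled below)
dt1 <- data.frame(
  date      = c("1/14/2016", "1/14/2016", "1/15/2016", "1/2/2016"),
  EnergykWh = c(4.680496, 6.272414, 11.004884, 1.5)  # last row is made up
)

# As character strings, the dates sort alphabetically rather than chronologically
sort(unique(dt1$date))

# Convert to Date, then sum by day; the result comes back in temporal order
dt1$realDate <- as.Date(dt1$date, format = "%m/%d/%Y")
sums <- aggregate(EnergykWh ~ realDate, data = dt1, FUN = sum)
sums
```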


--Chris Ryan


roslinazairimah zakaria wrote:
> Dear all,
> 
> I have this set of data. I would like to sum the EnergykWh according date
> sequences.
> 
>> head(dt1,20)
>                    StationName      date  time EnergykWh
> 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
> 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
> 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
> 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
> 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
> 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
> 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
> 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
> 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500
> 
> I have tried this:
> library(dplyr)
> sums <- dt1 %>%
>   group_by(date) %>%
>   summarise(EnergykWh = sum(EnergykWh))
> 
> head(sums,20)
> 
> The date is not by daily sequence but by year sequence.
> 
>> head(sums,20)
> # A tibble: 20 × 2
>    date      EnergykWh
>  1 1/1/2017     25.3
>  2 1/1/2018     61.0
>  3 1/1/2019      0.627
>  4 1/1/2020     10.7
>  5 1/10/2017    69.4
>  6 1/10/2018    54.5
>  7 1/10/2019    49.1
>  8 1/10/2020    45.9
>  9 1/11/2017    73.9
> 10 1/11/2018    53.3
> 11 1/11/2019    93.5
> 12 1/11/2020    66.7
> 13 1/12/2017    78.6
> 14 1/12/2018    42.2
> 15 1/12/2019    22.7
> 16 1/12/2020    80.9
> 17 1/13/2017    85.6
> 18 1/13/2018    46.4
> 19 1/13/2019    40.0
> 20 1/13/2020   121.
> 
> 
> 
> Thank you very much for any help given.
> 
>



Re: [R] Amazing AI

2022-12-19 Thread Christopher W. Ryan via R-help
In clinical medicine, the question the patient asks rarely represents
their main concern. Most of what I've done in my career, and most of
what I've taught, is about how to have the back-and-forth dynamic dialogue
with the patient, to help them formulate what's really on their mind,
and make sure I understand it before proceeding.

Seems to me that statistical consulting, or working IT in an
organization where one is serving "internal" customers, is similar.
Students need to learn the skills of clarifying, verifying, and
paraphrasing a client's needs.

Sure, ChatGPT may be able to generate decent code to do what a client
*says* they want to do, but is that *really* what they are looking for?

So don't retire yet :)

--Chris Ryan

Boris Steipe wrote:
> Technically not a help question. But crucial to be aware of, especially for 
> those of us in academia, or otherwise teaching R. I am not aware of a 
> suitable alternate forum. If this does not interest you, please simply ignore 
> - I already know that this may be somewhat OT.
> 
> Thanks.
> --
> 
> You very likely have heard of ChatGPT, the conversation interface on top of 
> the GPT-3 large language model and that it can generate code. I thought it 
> doesn't do R - I was wrong. Here is a little experiment:
> 
> MY PROMPT:
> ==
> Please write me some R code that will read a tsv file, remove all rows that 
> have duplicates in the column "names", and store the median values of the 
> remaining rows in a variable called "Medians". Of course, that calculation 
> should only be done with columns that are numeric. Print the last variable in 
> order from largest to smallest.
> 
> 
> AI RESPONSE:
> 
> 
> Here is some R code that should accomplish what you've described:
> 
> (Note: the code is formatted in a little terminal window, complete with a 
> button to "copy code" at the top.)
> 
> 
> # Read the tsv file
> data <- read.table("file.tsv", header = TRUE, sep = "\t")
> 
> # Remove rows with duplicates in the "names" column
> data <- data[!duplicated(data$names), ]
> 
> # Select only the numeric columns
> num_columns <- sapply(data, is.numeric)
> numeric_data <- data[, num_columns]
> 
> # Calculate the medians for each column
> Medians <- apply(numeric_data, 1, median)
> 
> # Sort Medians in descending order
> Medians <- sort(Medians, decreasing = TRUE)
> 
> # Print Medians
> print(Medians)
> 
> 
> This code assumes that the tsv file is called "file.tsv" and is stored in the 
> current working directory. It also assumes that the first row of the file 
> contains the column names.
> 
> 
> 
> MY PROMPT:
> ==
> Can you provide me with a bit of sample data that I can use to test this code?
> 
> AI RESPONSE:
> 
> Certainly! Here is some sample data that you can use to test the code I 
> provided:
> 
> 
> name  age height  weight
> Alice 23  5.6 120
> Bob   32  6.2 180
> Charlie   29  5.9 160
> Alice 23  5.6 120
> David 25  5.8 140
> Eve   27  5.5 110
> 
> 
> You can save this data as a tsv file and use it to test the code.
> 
> 
> --
> 
> Notes: 
> 1) The code, as written does not work with the test data. The reason is that 
> the test data specifies a column "name", but the code assumes "names" (which 
> I specified in the instructions). Once this is fixed in the test data, the 
> code executes.
> 
> 2) The instructions are not accurately translated: I had asked for row means. 
> But changing the margin in the apply() statement from 2 to 1 does the correct 
> thing.
> 
> 3) Note that the test data contains both numeric and non-numeric columns. 
> Also, the name "Alice" is duplicated in the test data, which is subtle, and 
> the right thing to do.
> 
> 
> --
> 
> A similar experiment, run on the beta-site of OpenAI's Codex code-completion 
> tool gives the following code (complete with the comments.):
> 
> 
> # Read in the data
> data <- read.table("mydata.tsv", sep = "\t", header = TRUE)
> 
> # Remove all rows that have duplicates in the column "names"
> unique_names <- unique(data$names)
> data_unique <- data[data$names %in% unique_names,]
> 
> # Calculate medians 
> Medians <- sapply(data_unique[, sapply(data_unique, is.numeric)], median, 
> na.rm = TRUE)
> 
> # Print last variable in order form largest to smallest
> rev(sort(Medians[length(Medians)]))
> 
> 
> Note that the strategy is quite different (e.g using %in%, not duplicated() 
> ), the interpretation of "last variable" is technically correct but not what 
> I had in mind (ChatGPT got that right though).
> 
> 
> Changing my prompts slightly resulted it going for a dplyr solution instead, 
> complete with %>% idioms etc ... again, syntactically correct but not giving 
> me the fully correct results. 
> 
> 
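
For readers comparing the two de-duplication strategies quoted above, a small base-R sketch may help. It uses the sample data from the exchange, with the column named "names" as described in note 1. It only illustrates how the two idioms behave on this data; it does not settle what the prompt intended.

```r
# Sample data from the exchange above, with the column renamed to "names"
data <- data.frame(
  names  = c("Alice", "Bob", "Charlie", "Alice", "David", "Eve"),
  age    = c(23, 32, 29, 23, 25, 27),
  height = c(5.6, 6.2, 5.9, 5.6, 5.8, 5.5),
  weight = c(120, 180, 160, 120, 140, 110)
)

# %in% unique(): every name belongs to the set of unique names,
# so this filter keeps all rows, including the repeated "Alice"
kept_in <- data[data$names %in% unique(data$names), ]
nrow(kept_in)

# duplicated(): flags the second and later occurrences of a name,
# so the repeated "Alice" row is dropped and the first is kept
kept_dup <- data[!duplicated(data$names), ]
nrow(kept_dup)
```

On this data the %in% version returns the input unchanged, so only the duplicated() version removes anything.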

Re: [R] Circular Graph Recommendation Request

2022-05-29 Thread Christopher W. Ryan via R-help
If the units of analysis are real spatial regions (e.g. states), how
about a cartogram?

https://gisgeography.com/cartogram-maps/

An R package (I have no experience with it)

https://cran.r-project.org/web/packages/cartogram/index.html

The advantage of a cartogram is that it is a single graphic, rather than
2 like the original post referenced. No need to move eye back and forth
to decode the colors. And it maintains---as much as possible given the
distortion, which is the whole point of a cartogram--- the relative
spatial positions of the areal units (in this case, states.)  The round
figure in the original post has the northern midwestern region in the
7:00 to 8:00-ish position, what might be considered notionally the
"southwest."  A little counterintuitive.

--Chris Ryan

Bert Gunter wrote:
> Very nice plot. Thanks for sharing.
> Can't help directly, but as the plot is sort of a map with polygonal
> areas encoding the value of a variable, you might try posting on
> r-sig-geo instead where there might be more relevant expertise in such
>  things -- or perhaps suggestions for alternative visualizations that
> work similarly.
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> On Sat, May 28, 2022 at 8:39 AM Stephen H. Dawson, DSL via R-help
>  wrote:
>>
>> https://www.visualcapitalist.com/us-goods-exports-by-state/
>> Visualizing U.S. Exports by State
>>
>> Good Morning,
>>
>>
>> https://www.visualcapitalist.com/wp-content/uploads/2022/05/us-exports-by-state-infographic.jpg
>>
>> Saw an impressive graph today. Sharing with the list.
>>
>> The size proportionality of the state segments in a circle graph is catchy.
>>
>> QUESTION
>> Is there a package one could use with R to accomplish this particular
>> circular-style graph?
>>
>>
>> Kindest Regards,
>> --
>> *Stephen Dawson, DSL*
>> /Executive Strategy Consultant/
>> Business & Technology
>> +1 (865) 804-3454
>> http://www.shdawson.com
>>


[ESS] When the R window opens, sometimes it is side-by-side with my .Rnw file. I want them one above the other. It seems unpredictable

2022-05-27 Thread Christopher W. Ryan via ESS-help
I'm using Vince Goulet's modified version of emacs 26.2 (build 1,
x86_64-w64-mingw32)  of 2019-04-13 with ESS, on Windows 10. I use Sweave
and LaTeX, so I have my R code in .Rnw files. When I execute my first
line of R code, an R buffer opens of course. I prefer it to be below my
Rnw buffer. Sometimes it is. But other times the R buffer opens to the
side. It's unpredictable, which is disconcerting; I have not been able
to discern a pattern.  How can I ensure that every time emacs/ESS opens
an R buffer, the emacs window is split top-and-bottom, rather than
side-by-side.

Thanks.

--Chris Ryan

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help


Re: [R] [External Email] Re: how to rename variables by lopping off first 3 characters

2022-03-14 Thread Christopher W Ryan via R-help
Thanks! This has gotten me on the right track.  One of my mistakes was that
I was explicitly using the word select, as in

rename_at(select(starts_with 
or
rename_at(select(vars(starts_with 
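
For the archive, a sketch of the kind of call that works without wrapping the helper in select(), assuming dplyr 1.0.0 or later so that rename_with() is available. The data frame is the one from Bert's reply quoted below; dat2 is just an illustrative name.

```r
library(dplyr)

# Example data frame from Bert Gunter's reply
dat <- data.frame(i..One = 1:3, i..Two = letters[1:3], ixx = 5:7)

# rename_with() takes the renaming function first and the column selection
# second; starts_with() matches the prefix literally, so the dots in "i.."
# are not treated as regex wildcards and "ixx" is left alone
dat2 <- dat %>%
  rename_with(~ sub("^i\\.\\.", "", .x), .cols = starts_with("i.."))

names(dat2)
```

The selection helper goes straight into .cols; rename_with() is itself a selecting context, which is what the "must be used within a selecting function" error is asking for.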

--Chris Ryan

On Mon, Mar 14, 2022 at 1:11 PM Bert Gunter  wrote:

> If you care to entertain this, one of many simple base R ways to do this
> is:
>
> dat <- data.frame(
> +i..One = 1:3,
> +i..Two = letters[1:3],
> +ixx = 5:7)
> > dat
>   i..One i..Two ixx
> 1  1  a   5
> 2  2  b   6
> 3  3  c   7
>
> > nm <- names(dat)
> > nm <- ifelse(substring(nm, 1,3) == "i..",
> +  substring(nm,4),
> +  nm)
> > names(dat) <- nm
> > dat
>   One Two ixx
> 1   1   a   5
> 2   2   b   6
> 3   3   c   7
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Mar 14, 2022 at 9:27 AM Christopher W Ryan via R-help
>  wrote:
> >
> > I have data coming to me from another source, in which some of the
> > variable names begin with "i.."
> >
> > As in "i..actual_meaningful_var_name"
> >
> > I would like to remove the first three characters from any variable name
> > if they are "i.."
> >
> > I'm using R on Win 10 and dplyr, so ideally I'm looking for a dplyr
> > solution. Apparently I'm just not understanding how the various select,
> > contains, rename_at, rename_with, and so-on dplyr expressions work. I've
> > tried various arrangements of them, usually resulting in
> >
> > Error: `contains()` must be used within a *selecting* function.
> > ℹ See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>
> >
> >
> > A simple select(contains(foo)) I can do fine, to select a subset of
> > variables. It's combining it with renaming that I am struggling with.
> >
> > Grateful for any advice.
> >
> > Thanks.
> >
> > --Chris Ryan
> >


[R] the opposite of pluck() in purrr

2021-11-18 Thread Christopher W. Ryan via R-help
I've just learned about pluck() and chuck() in the purrr package. Very
cool!  As I understand it, they both will return one element of a list,
either by name or by [[]] index, or even "first"  or "last"

I was hoping to find a way to return all *but* one specified element of
a list. Speaking loosely, pluck(-1) or pluck(!1) or !pluck(1), but none
of those of course work. Thinking of English language, I had hopes for
chuck(1) as in "chuck element 1 away, leaving the rest"  but that's not
how it works.

Any tidyverse-centric ways to return all except one specified element of
a list?
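
To make the goal concrete, here is a sketch of what I am after, using a made-up list x. The first two forms are plain base R. The last assumes purrr 1.0.0 or later, where discard_at() is available, and is skipped otherwise.

```r
x <- list(a = 1, b = "two", c = 3:5)  # hypothetical example list

# Base R: a negative index drops an element by position
x[-1]

# Base R: drop an element by name
x[setdiff(names(x), "b")]

# purrr (assumption: version 1.0.0 or later is installed)
if (requireNamespace("purrr", quietly = TRUE) &&
    utils::packageVersion("purrr") >= "1.0.0") {
  purrr::discard_at(x, "b")
}
```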

Thanks.

--Chris Ryan



Re: [R] [External Email] Re: how to make the far right section of a smoother line look different from the rest of that line?

2021-11-16 Thread Christopher W Ryan via R-help
Thanks Bert, that looks promising.

panel.smoother() is from latticeExtra

https://rdrr.io/cran/latticeExtra/man/panel.smoother.html

--Chris Ryan

On Tue, Nov 16, 2021 at 2:08 PM Bert Gunter  wrote:

> Where did you get panel.smoother()? There is no such panel function in
> lattice.
>
> Is something like this what you want?
>
> library(lattice)
> x <- 1:100
> y <- rnorm(100, mean = 5)
> end <- 91 # tentative smooth after this
> xyplot(y ~ x, cutoff = end, col1 = "black", col2 = "red",
>   panel = function(x, y, col1, col2, cutoff) {
>     sqleft <- seq_len(cutoff)
>     sqright <- seq.int(cutoff + 1, length(x))
>     col <- rep(c(col1, col2), times = c(cutoff, length(x) - cutoff))
>     panel.points(x, y, col = col)
>     ylo <- predict(loess(y ~ x))
>     panel.lines(x[sqleft], ylo[sqleft], col = col1, lwd = 2)
>     panel.lines(x[sqright], ylo[sqright], col = col2, lwd = 2, lty = "dotted")
>   })
>
> Notes:
> 1. This works because of loess default to predict at given x's. Modify as
> required if you change to another smoother or wish to use different points
> at which to plot the smoother.
> 2. This can almost certainly be done by creating a grouping variable to
> separate the two plotting regimes and might be slicker and more robust with
> that approach.
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Nov 16, 2021 at 7:45 AM Christopher W Ryan via R-help <
> r-help@r-project.org> wrote:
>
>> eclrs.3 %>%
>> mutate(start.week = floor_date(realCollectionDate, unit = "week")) %>%
>> group_by(start.week, k12) %>%
>> summarise(n = n(), pctpos = 100 * mean(realResult))  %>%
>> xyplot(pctpos ~ start.week | k12, col = "red", data = ., layout = c(1,2),
>> ylab = "percent of test results positive", xlab = "specimen collection
>> date",  strip = strip.custom(strip.names = c(TRUE, TRUE)),  sub = "The
>> final week shown may not yet be complete so is likely inaccurate",   panel
>> = function(...){
>> panel.xyplot(..., type = "p", cex = 0.8)
>> panel.smoother(..., span = 0.3, col.se = "red", alpha.se = 0.08, lwd
>> =
>> 2)})
>>
>> The above takes patient-level data, each record containing a date, and
>> aggregates them by week according to that date, then plots 2 weekly time
>> series, one for k12 and one for not-k12, each with a smoother. Note my
>> disclaimer in the subtitle that "the final week shown may not yet be
>> complete . . . ." since I might run this on any arbitrary day. How might I
>> change the appearance of the smoother lines to emphasize to the viewer
>> that
>> the recent trends are tentative due to daily data still coming in?  For
>> example, how might I make the far right-end stretch of the lines,
>> representing the most recent week or two, dotted?
>>
>> Thanks.
>>
>> --Chris Ryan
>>


[R] how to make the far right section of a smoother line look different from the rest of that line?

2021-11-16 Thread Christopher W Ryan via R-help
library(dplyr)
library(lubridate)     # floor_date()
library(lattice)       # xyplot()
library(latticeExtra)  # panel.smoother()

eclrs.3 %>%
  mutate(start.week = floor_date(realCollectionDate, unit = "week")) %>%
  group_by(start.week, k12) %>%
  summarise(n = n(), pctpos = 100 * mean(realResult)) %>%
  xyplot(pctpos ~ start.week | k12, col = "red", data = ., layout = c(1, 2),
         ylab = "percent of test results positive",
         xlab = "specimen collection date",
         strip = strip.custom(strip.names = c(TRUE, TRUE)),
         sub = "The final week shown may not yet be complete so is likely inaccurate",
         panel = function(...) {
           panel.xyplot(..., type = "p", cex = 0.8)
           panel.smoother(..., span = 0.3, col.se = "red", alpha.se = 0.08, lwd = 2)
         })

The above takes patient-level data, each record containing a date, and
aggregates them by week according to that date, then plots 2 weekly time
series, one for k12 and one for not-k12, each with a smoother. Note my
disclaimer in the subtitle that "the final week shown may not yet be
complete . . . ." since I might run this on any arbitrary day. How might I
change the appearance of the smoother lines to emphasize to the viewer that
the recent trends are tentative due to daily data still coming in?  For
example, how might I make the far right-end stretch of the lines,
representing the most recent week or two, dotted?

Thanks.

--Chris Ryan



Re: [R] [External Email] Live Online Training for High School Teachers and Students

2021-11-04 Thread Christopher W Ryan via R-help
Tracy--

I enjoy doing this sort of thing. Over the years I've done two full-day
"introduction to R" workshops for high school students. The workshops also
inevitably get into software-agnostic, basic issues about how to think
about data, and how to measure, record, and store it---which is all pretty
cool. They were in-person, pre-pandemic workshops, but I believe could be
adapted to a remote, online approach.  Feel free to email me.

If you don't already know about it, you might also be interested in the
R-sig-teaching List here:
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching

--Chris Ryan


On Thu, Nov 4, 2021 at 3:06 AM Tracy Lenz  wrote:

> Hi,
>
> I am looking for live training that can be conducted via Zoom or another
> online platform to assist high school teachers and students who are working
> with R. These teachers and students are using R at a very basic level.
> They've reviewed a variety of beginner-level texts and videos on R, but
> they continue to encounter issues that could be resolved in a session with
> someone who is more familiar with R. I'm not looking for a long-term
> solution such as a Code Academy course; rather, this session would be
> intended as a brief beginner's introduction to R as well as a Q&A for
> specific use cases and troubleshooting. I've searched online for such
> offerings but have not found anything. If anyone has any advice, I'd
> appreciate it. Thanks!
>
> Tracy Lenz


Re: [R] [External Email] Group by and add a constant value based on a condition dplyr

2021-05-26 Thread Christopher W Ryan via R-help
Is the grouping beforehand necessary? Could you simply, "for all the dates
that are "4.01.2020"  and have the "Value" greater than zero  add 5 to the
"Value" "?  I may be missing something.

--Chris Ryan
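
A sketch of the ungrouped version suggested above, in base R, using the data from the quoted post (rebuilt with rep() for brevity). If I have read the question correctly, it reproduces the NewValue column given there without any grouping.

```r
# Data from the quoted post, rebuilt compactly
df <- data.frame(
  Department = rep(c("A", "B"), each = 8),
  Class      = rep(rep(1:2, each = 4), times = 2),
  Value      = c(0L, 100L, 800L, 800L, 0L, 300L, 1200L, 0L,
                 0L, 0L, 400L, 400L, 200L, 800L, 1200L, 0L),
  Date       = rep(c("1.01.2020", "2.01.2020", "3.01.2020", "4.01.2020"),
                   times = 4)
)

# Row-wise condition: no grouping is needed because the rule depends only
# on each row's own Date and Value
df$NewValue <- ifelse(df$Date == "4.01.2020" & df$Value > 0L,
                      df$Value + 5L, df$Value)

# dplyr equivalent (not run here):
# df %>% mutate(NewValue = if_else(Date == "4.01.2020" & Value > 0L,
#                                  Value + 5L, Value))
df
```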

On Wed, May 26, 2021 at 11:53 AM Elahe chalabi via R-help <
r-help@r-project.org> wrote:

> Hi everyone,
>
> I have the following dataframe:
>
>
>
>   structure(list(Department = c("A", "A", "A", "A", "A", "A", "A",
>  "A", "B", "B", "B", "B", "B", "B", "B", "B"), Class = c(1L, 1L,
> 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Value = c(0L,
> 100L, 800L, 800L, 0L, 300L, 1200L, 0L, 0L, 0L, 400L, 400L, 200L,
> 800L, 1200L, 0L), Date = c("1.01.2020", "2.01.2020", "3.01.2020",
> "4.01.2020", "1.01.2020", "2.01.2020", "3.01.2020", "4.01.2020",
> "1.01.2020", "2.01.2020", "3.01.2020", "4.01.2020", "1.01.2020",
> "2.01.2020", "3.01.2020", "4.01.2020")), class = "data.frame",
> row.names = c(NA,
>  -16L))
>
>  using dplyr I need to group by "Department" and "Class" and then for all
> the dates that are "4.01.2020"  and have the "Value" greater than zero  add
> 5 to the "Value", meaning the desired dataframe will be (NewValue column) :
>
>
>
>structure(list(Department = c("A", "A", "A", "A", "A", "A", "A",
>  "A", "B", "B", "B", "B", "B", "B", "B", "B"), Class = c(1L, 1L,
>  1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Value = c(0L,
>  100L, 800L, 800L, 0L, 300L, 1200L, 0L, 0L, 0L, 400L, 400L, 200L,
>  800L, 1200L, 0L), Date = c("1.01.2020", "2.01.2020", "3.01.2020",
>  "4.01.2020", "1.01.2020", "2.01.2020", "3.01.2020", "4.01.2020",
>  "1.01.2020", "2.01.2020", "3.01.2020", "4.01.2020", "1.01.2020",
>  "2.01.2020", "3.01.2020", "4.01.2020"), NewValue = c(0L, 100L,
>  800L, 805L, 0L, 300L, 1200L, 0L, 0L, 0L, 400L, 405L, 200L, 800L,
>  1200L, 0L)), class = "data.frame", row.names = c(NA, -16L))
>
>
> Thanks a lot for any help!
> Elahe
>


[ESS] how to change default emacs behavior, upon running R code, from 2 side-by-side frames to top-and-bottom frames?

2020-11-09 Thread Christopher W. Ryan via ESS-help


I'm running, on Windows 10, Vince Goulet's version of emacs/ess, which
includes

GNU Emacs 26.3 (build 1, x86_64-w64-mingw32)
 of 2019-08-29

When I execute a line of R code, the R buffer opens up as expected, but
it opens in a frame adjacent to the frame containing my source buffer. I
would like it to open the R buffer as a frame below my source code,
rather than adjacent to it.

How can I change the behavior from side-by-side frames to top-and-bottom frames?

Thanks.

--Chris Ryan

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help


Re: [R] [External Email] how to generate this kind of graph

2020-07-23 Thread Christopher W Ryan
You might like the beeswarm package.

--Chris Ryan
SUNY Upstate Medical University
and
Broome County Health Department

On Thu, Jul 23, 2020 at 1:15 AM array chip via R-help 
wrote:

>
> Hello everyone,
>
> I saw these scatterplots in a paper and thought they looked very nice:
>
>
> https://drive.google.com/file/d/1V7F1gq-J_GIFDOrJs00hwGyXUqCZ_xwa/view?usp=sharing
>
> They are similar to stripchart() with the 'jitter' method, but the points
> are aligned in a special pattern that looks nicer than the standard
> stripchart().
>
> Does anyone know of an R package that can produce this kind of plot?
>
> Thanks,
>
> John
>


[R] challenging data merging/joining problem

2020-07-05 Thread Christopher W. Ryan
I've been conducting relatively simple COVID-19 surveillance for our
jurisdiction. We get data on lab test results automatically, and then
interview patients to obtain other information, like clinical details.
We had been recording all data in our long-time data system (call it
dataSystemA). But as of a particular date, there was a major change in
the data system we were compelled to use. Call the new one dataSystemB.
dataSystemA and dataSystemB contain very similar information,
conceptually, but the variable names are all different, and there are
some variables in one that do not appear in the other. Total number of
variables in each is about 50-70.

Furthermore, for about 2 weeks prior to the transition, lab test results
started being deposited into dataSystemB while dataSystemA was still
being used to record the full information from the interviews.
Subsequent to the transition, lab test results and interview information
are being recorded in dataSystemB, while the lab test results alone are
still being automatically deposited into dataSystemA.

Diagrammatically:

dataSystemA usage:  ________________........>>

dataSystemB usage:              ....________>>

where _ represents full data and . represents partial data,
and >> represents the progress of time.


The following will create an MWE of the data wrangling problem, with the
change in data systems made to occur overnight on 2020-07-07:

library(dplyr)
dataSystemA <- tibble(
  lastName = c("POTTER", "WEASLEY", "GRAINGER", "LONGBOTTOM"),
  firstName = c("harry", "ron", "hermione", "neville"),
  dob = as.Date(Sys.Date() + c(sample(-3650:-3000, size = 2), -3500, -3450)),
  onsetDate = as.Date(Sys.Date() + 1:4),
  symptomatic = c(TRUE, FALSE, NA, NA))
dataSystemB <- tibble(
  last_name = c("GRAINGER", "LONGBOTTOM", "MALFOY", "LOVEGOOD", "DIGGORY"),
  first_name = c("hermione", "neville", "draco", "luna", "cedric"),
  birthdate = as.Date(Sys.Date() + c(-3500, -3450,
                                     sample(-3650:-3000, size = 3))),
  date_of_onset = as.Date(Sys.Date() + 3:7),
  symptoms_present = c(TRUE, TRUE, FALSE, FALSE, TRUE))



Obviously, this is all the same public health problem, so I don't want a
big uninterpretable gap in my reports. I am looking for advice on the
best strategy for combining two different tibbles with some overlap in
observations (some patients appear in both data systems, with varying
degrees of completeness of data) and with some of the same things being
measured and recorded in the two data systems, but with different
variable names.

I've thought of two different strategies, neither of which seems ideal
but either of which might work:

1. change the variable names in dataSystemB to match their
conceptually-identical variables in dataSystemA, and then use some
version of bind_rows()

2. Create a unique identifier from last names, first names, and dates of
birth, use some type of full_join(), matching on that identifier,
obtaining all columns from both tibbles, and then "collapse"
conceptually-identical variables like onsetDate and date_of_onset using
coalesce()
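
A minimal sketch of strategy 2, under some assumptions: the two tibbles below are fixed-date stand-ins for the MWE (the birth dates and onset dates are invented so the result is reproducible), and matching on last name, first name, and date of birth is assumed to be good enough to identify a patient. Real data would probably need cleaning of names before the join.

```r
library(dplyr)

## Fixed-date stand-ins for the two systems (all values are illustrative)
dataSystemA <- tibble(
  lastName    = c("POTTER", "WEASLEY", "GRAINGER", "LONGBOTTOM"),
  firstName   = c("harry", "ron", "hermione", "neville"),
  dob         = as.Date(c("2010-07-31", "2010-03-01",
                          "2009-09-19", "2010-07-30")),
  onsetDate   = as.Date("2020-07-01") + 0:3,
  symptomatic = c(TRUE, FALSE, NA, NA)
)
dataSystemB <- tibble(
  last_name        = c("GRAINGER", "LONGBOTTOM", "MALFOY", "LOVEGOOD", "DIGGORY"),
  first_name       = c("hermione", "neville", "draco", "luna", "cedric"),
  birthdate        = as.Date(c("2009-09-19", "2010-07-30", "2010-06-05",
                               "2011-02-13", "2007-10-01")),
  date_of_onset    = as.Date("2020-07-03") + 0:4,
  symptoms_present = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

combined <- full_join(
  dataSystemA, dataSystemB,
  ## match differently named key columns; the result keeps the A names
  by = c("lastName" = "last_name",
         "firstName" = "first_name",
         "dob" = "birthdate")
) %>%
  mutate(
    ## take the A value when present, otherwise fall back to the B value
    onsetDate   = coalesce(onsetDate, date_of_onset),
    symptomatic = coalesce(symptomatic, symptoms_present)
  ) %>%
  select(-date_of_onset, -symptoms_present)

combined
```

coalesce() returns the first non-missing value, so the order of its arguments decides which system wins when both recorded something. For records after the transition it may make more sense to list the dataSystemB column first.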

Sorry for my long-windedness. Grateful for any advice.

--Chris Ryan



Re: [R] [External Email] R Software Risk Analysis

2020-06-18 Thread Christopher W. Ryan
I use R every day with pretty sensitive data in my county health
department. Of course, this is for manipulation and analysis of data
pulled from their sources, not for interacting directly with, or
updating, patient records in any clinically operational sense. As others
have said, the structure and security of the overall computing
environment is what matters most.

--Chris Ryan

Wait, Kristin wrote:
> Hi all,
> 
> I am with a NYS major trauma center and all programs that our 
> employees/providers use must be vetted through the IT Department by way of a 
> Risk Analysis.
> Is there someone I would talk to about this?
> 
> I scoured your website and could not find a specific person.
> 
> Thank you so much
> Kristin Wait
> Albany, NY


Re: [R] [External Email] Re: [External] Re: access for free more than 500 essential Springer Nature textbooks

2020-05-22 Thread Christopher W. Ryan
Am I interpreting this offer correctly, that it is for libraries to
obtain access to the e-books for free? It does not seem to me that an
individual can download one. Am I missing that part?

Thanks

--Chris Ryan

Mark Leeds wrote:
> Abby: here's an easier link for seeing what you might like.
> 
> https://link.springer.com/search?facet-content-type=%22Book%22=mat-covid19_textbooks&%23038;facet-language=%22En%22&%23038;sortOrder=newestFirst&%23038;showAll=true
> 
> On Fri, May 22, 2020 at 9:18 PM Richard O'Keefe  wrote:
> 
>> the real pleasure comes from things you weren't looking for but recognise
>> as just what you needed.
>>
>> On Sat, 23 May 2020 at 12:34, Jeff Newmiller 
>> wrote:
>>
>>> You are bound to be disappointed if you invert the purpose of the list.
>>> This is marketing... think of it as a sale... stores rarely put their
>>> entire stock on sale... particularly if the sale price is zero. You have
>> to
>>> start with the list and look for interesting titles.
>>>
>>> But don't let me dissuade you from adding to your killfile if that seems
>>> more useful to you.
>>>
>>> On May 22, 2020 5:04:29 PM PDT, Abby Spurdle 
>> wrote:
> The Excel file is what you need.

 Well, now I'm in a bad mood.

 I went to all the trouble of opening the thing...
 And the first two Springer-published books I look for, aren't there.

 (1) Programming with Data, John Chambers
 (2) Applied Econometrics with R, Z and co.

 Next time someone tells me to use an Excel document, I'm adding them
 to the spam list.

>>>
>>> --
>>> Sent from my phone. Please excuse my brevity.
>>>


[R] CRAN library down? - UPDATE

2020-05-02 Thread Christopher W. Ryan
The message at that URL reads:

CRAN mirror restricted to UC Berkeley

The CRAN mirror at UC Berkeley's College of Natural Resources is no
longer available to off campus users and has been removed from the CRAN
mirror list. The load on our server was too much.

Berkeley folks can continue to access the mirror from on-campus networks
or from within the VPN.

The rest of the R community: Sorry, but you'll have to please choose a
different CRAN mirror.
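
For anyone hitting the same warning, a small sketch of switching mirrors from within R. The cloud.r-project.org address is CRAN's redirecting mirror; any other mirror from the current list would do as well.

```r
## Point this session at a different mirror
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")

## Interactive alternative: choose from the current mirror list
## chooseCRANmirror()

## Then retry the installation, for example:
## install.packages("lmtest")
```

Putting the options() line in a .Rprofile file should make the choice persist across sessions.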

--Chris Ryan


Roger Coppock wrote:
> Recently a message appears when I try to view the index of the CRAN library.
> 
> Warning: unable to access index for repository 
> http://cran.cnr.Berkeley.edu/bin/macosx/contrib/4.0:
>   cannot open URL 
> 'http://cran.cnr.Berkeley.edu/bin/macosx/contrib/4.0/PACKAGES'
> 
> Does the CRAN library have the Coronavirus?
> 
> - -  Roger Coppock (rcopp...@cox.net)
> 
>> On May 1, 2020, at 9:24 AM, Roger Coppock  wrote:
>>
>> After I changed to Version R version 4.0.0 (2020-04-24) -- "Arbor Day" on my 
>> MacBook running Mac OS 10.13.6, I lost all my loaded libraries.  Also, the 
>> package installer can not contact CRAN libraries either for binaries or 
>> sources, to replace the missing loaded libraries.  The package installer can 
>> contact "BioConductor", however.  I am now specifically looking for "HURDAT" 
>> and "lmtest", which were on CRAN but not "BioConductor".
>>
>> - -  Roger Coppock (rcopp...@cox.net)
> 


Re: [R] [External Email] Re: arranging multiple lattice graphs on a page

2020-04-02 Thread Christopher W. Ryan
Thanks, I'll take a look. I also finally came across gridExtra, which allows me 
to do it as well.
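
For the archive, a sketch of the print.trellis route using the iris MWE from my original question. The base layout() call has no effect on lattice plots, since lattice draws with grid; each trellis object is instead printed into a cell of the page.

```r
library(lattice)

p1 <- xyplot(Sepal.Length ~ Petal.Length, data = iris)
p2 <- xyplot(Sepal.Length ~ Petal.Width | Species, data = iris,
             layout = c(3, 1))

## split = c(x, y, nx, ny): place the plot in cell (x, y) of an nx-by-ny
## grid, counted from the top left; more = TRUE keeps the page open
print(p1, split = c(1, 1, 1, 2), more = TRUE)
print(p2, split = c(1, 2, 1, 2))
```

With gridExtra installed, grid.arrange(p1, p2, nrow = 2) should give a similar stacked layout.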

--Chris Ryan

On April 2, 2020 6:45:58 PM EDT, Bert Gunter  wrote:
>See ?print.trellis, especially the "split" argument.
>
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>On Thu, Apr 2, 2020 at 2:23 PM Christopher W. Ryan
> wrote:
>>
>> I would like to place two separate plots, one above the other, something
>> like this (MWE for illustration):
>>
>> library(lattice)
>> data(iris)
>> layout(matrix(c(1,2), 2, 1, byrow = TRUE))
>> with(iris, (plot(Sepal.Length ~ Petal.Length)))
>> with(iris, (plot(Sepal.Length ~ Petal.Width)))
>>
>> but with lattice, so one of the plots can have panels. So something like
>> this:
>>
>> library(lattice)
>> data(iris)
>> layout(matrix(c(1,2), 2, 1, byrow = TRUE))
>> xyplot(Sepal.Length ~ Petal.Length, data = iris)
>> xyplot(Sepal.Length ~ Petal.Width | Species, data = iris, layout = c(3,1))
>>
>> But the latter does not accomplish my goal. Appreciate any advice.
>>
>> Thanks.
>>
>> --Chris Ryan
>> SUNY Upstate Medical University
>> Binghamton, NY
>>


[R] arranging multiple lattice graphs on a page

2020-04-02 Thread Christopher W. Ryan
I would like to place two separate plots, one above the other, something
like this (MWE for illustration):

library(lattice)
data(iris)
layout(matrix(c(1,2), 2, 1, byrow = TRUE))
with(iris, (plot(Sepal.Length ~ Petal.Length)))
with(iris, (plot(Sepal.Length ~ Petal.Width)))

but with lattice, so one of the plots can have panels. So something like
this:

library(lattice)
data(iris)
layout(matrix(c(1,2), 2, 1, byrow = TRUE))
xyplot(Sepal.Length ~ Petal.Length, data = iris)
xyplot(Sepal.Length ~ Petal.Width | Species, data = iris, layout = c(3,1))

But the latter does not accomplish my goal. Appreciate any advice.

Thanks.

--Chris Ryan
SUNY Upstate Medical University
Binghamton, NY



Re: [R] [External Email] Re: incomplete reading of a large csv file

2020-02-21 Thread Christopher W. Ryan
Ah, Uwe, you are a lifesaver. Although there should not have been, there
were some lines with entries like this in the 6th field:

medical alarm - unk problem   "B"

I would have thought that my effort to read just the first field of each
line, uniformly an integer, would have neutralized any issues arising
from problematic characters in other fields. Apparently not. So the
entire file is processed in some manner, even when using the colClasses
argument to restrict the result to the first column?
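
A small sketch of how one might check for this kind of problem, using a made-up pipe-delimited file with stray double quotes (the file contents and object names are invented for illustration). It compares the number of physical data lines with the number of rows obtained when quoting is switched off. As far as I understand ?scan, quoting is interpreted in character fields and also in fields skipped via "NULL", which would explain why restricting colClasses to the first column did not help.

```r
## Build a small pipe-delimited file with stray double quotes (made-up data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id|desc",
             "1|chest pain",
             "2|medical alarm - unk problem \"B",
             "3|fall",
             "4|unknown \"C",
             "5|fire standby"),
           tmp)

## Physical data lines, not counting the header
n_physical <- length(readLines(tmp)) - 1L

## With quoting switched off, every physical line becomes one row
fixed <- read.csv(tmp, sep = "|", header = TRUE,
                  comment.char = "", quote = "")
nrow(fixed) == n_physical

## count.fields() with the same settings is a quick consistency check:
## every line should report the same number of fields
table(count.fields(tmp, sep = "|", quote = "", comment.char = ""))
```

If nrow() under the default quote setting is smaller than the physical line count, stray quote characters are a likely suspect, because text between two quotes can be absorbed into a single field even across line breaks.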

Thanks.

--Chris Ryan


Uwe Ligges wrote:
> 
> 
> On 21.02.2020 20:10, Christopher W. Ryan wrote:
>> sessionInfo at end of message.
>>
>> I have data that I was given as an Excel .xlsx file. It contains 96266
>> lines and 24 columns. I opened it in OpenOffice.org and saved it in .csv
>> format, using the pipe character as a field separator. This produced a
>> file with 96266 lines.
>>
>> When I read it into R thusly:
>>
>>> skip0.dd <- read.csv("AmbulanceDispatches2017-2019-02-18-2020.csv",
>> sep = "|", header = TRUE, comment.char = "", skip = 0)
>>
>> the resulting skip0.dd dataframe has only 58208 lines:
>>
>>> dim(skip0.dd)
>> [1] 58208    24
>>
>>
>> I've tried a variety of things to troubleshoot. Using head() and tail(),
>> the expected first and last lines (comparing to the .csv file) do indeed
>> exist in skip0.dd.  Several arbitrary lines from the "middle" of the csv
>> file are also present in the skip0.dd dataframe.
>>
>> I tried reading only the first column, which is integer, but still it
>> appears that not all lines are read in:
>>
>>> classes <- c(NA, rep("NULL", 23))
>>> skip01.dd <- read.csv("AmbulanceDispatches2017-2019-02-18-2020.csv",
>> sep = "|", header = TRUE, comment.char = "", skip = 0, colClasses =
>> classes)
>>> dim(skip01.dd)
>> [1] 58208    1
>>
>> Skipping the first 50000 lines nominally should give me a dataframe of
>> 46266 lines, or at least one of 50000 fewer lines than skip0.dd (i.e.
>> 8208 lines), but it does neither:
>>
>>> skip5.dd <-
>> read.csv("AmbulanceDispatches2017-2019-02-18-2020.csv", sep = "|",
>> header = TRUE, comment.char = "", skip = 50000)
>>> dim(skip5.dd)
>> [1] 22170    24
>>
>> Any thoughts on what might be going wrong? Some funky characters from
>> Excel or OpenOffice.org lurking in the .csv file?
> 
> Quotes are a typical problem; what if you try with arg quote=""?
> 
> 
> 
> 
>>
>> Perhaps I'd have more success with one of the packages that enables
>> reading directly from an .xlsx file.
>>
>> Thanks.
>>
>> --Chris Ryan
>> SUNY Upstate Medical University Binghamton Clinical Campus
>> Broome County Health Department
>> Binghamton University
>>
>>
>> 
>>> sessionInfo()
>> R version 3.5.3 (2019-03-11)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 10 x64 (build 17763)
>>
>> Matrix products: default
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>>
>> other attached packages:
>> [1] dplyr_0.8.3   stringr_1.4.0 Hmisc_4.2-0   ggplot2_3.2.1
>> [5] Formula_1.2-3 survival_2.44-1.1 lattice_0.20-38
>>
>> loaded via a namespace (and not attached):
>>   [1] Rcpp_1.0.1  pillar_1.4.0    compiler_3.5.3
>>   [4] RColorBrewer_1.1-2  tools_3.5.3 base64enc_0.1-3
>>   [7] digest_0.6.18   zeallot_0.1.0   rpart_4.1-13
>> [10] checkmate_1.9.3 tibble_2.1.1    gtable_0.3.0
>> [13] htmlTable_1.13.1    pkgconfig_2.0.2 rlang_0.4.0
>> [16] Matrix_1.2-15   rstudioapi_0.10 xfun_0.7
>> [19] gridExtra_2.3   knitr_1.23  withr_2.1.2
>> [22] cluster_2.0.7-1 htmlwidgets_1.3 vctrs_0.2.0
>> [25] grid_3.5.3  nnet_7.3-12 tidyselect_0.2.5
>> [28] data.table_1.12.2   glue_1.3.1  R6_2.4.0
>> [31] foreign_0.8-71  latticeExtra_0.6-28 purrr_0.3.2
>> [34] magrittr_1.5    htmltools_0.3.6 backports_1.1.4
>> [37] scales_1.0.0    splines_3.5.3   assertthat_0.2.1
>> [40] colorspace_1.4-1    stringi_1.4.3   acepack_1.4.1
>> [43] lazyeval_0.2.2  munsell_0.5.0   crayon_1.3.4
>>


[R] incomplete reading of a large csv file

2020-02-21 Thread Christopher W. Ryan
sessionInfo at end of message.

I have data that I was given as an Excel .xlsx file. It contains 96266
lines and 24 columns. I opened it in OpenOffice.org and saved it in .csv
format, using the pipe character as a field separator. This produced a
file with 96266 lines.

When I read it into R thusly:

> skip0.dd <- read.csv("AmbulanceDispatches2017-2019-02-18-2020.csv",
sep = "|", header = TRUE, comment.char = "", skip = 0)

the resulting skip0.dd dataframe has only 58208 lines:

> dim(skip0.dd)
[1] 58208    24


I've tried a variety of things to troubleshoot. Using head() and tail(),
the expected first and last lines (comparing to the .csv file) do indeed
exist in skip0.dd.  Several arbitrary lines from the "middle" of the csv
file are also present in the skip0.dd dataframe.

I tried reading only the first column, which is integer, but still it
appears that not all lines are read in:

> classes <- c(NA, rep("NULL", 23))
> skip01.dd <- read.csv("AmbulanceDispatches2017-2019-02-18-2020.csv",
sep = "|", header = TRUE, comment.char = "", skip = 0, colClasses = classes)
> dim(skip01.dd)
[1] 58208     1

Skipping the first 50000 lines nominally should give me a dataframe of
46266 lines, or at least one of 50000 fewer lines than skip0.dd (i.e.
8208 lines), but it does neither:

> skip5.dd <-
read.csv("AmbulanceDispatches2017-2019-02-18-2020.csv", sep = "|",
header = TRUE, comment.char = "", skip = 50000)
> dim(skip5.dd)
[1] 22170    24

Any thoughts on what might be going wrong? Some funky characters from
Excel or OpenOffice.org lurking in the .csv file?

Perhaps I'd have more success with one of the packages that enables
reading directly from an .xlsx file.

Thanks.

--Chris Ryan
SUNY Upstate Medical University Binghamton Clinical Campus
Broome County Health Department
Binghamton University



> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] dplyr_0.8.3   stringr_1.4.0 Hmisc_4.2-0   ggplot2_3.2.1
[5] Formula_1.2-3 survival_2.44-1.1 lattice_0.20-38

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1          pillar_1.4.0        compiler_3.5.3
 [4] RColorBrewer_1.1-2  tools_3.5.3         base64enc_0.1-3
 [7] digest_0.6.18       zeallot_0.1.0       rpart_4.1-13
[10] checkmate_1.9.3     tibble_2.1.1        gtable_0.3.0
[13] htmlTable_1.13.1    pkgconfig_2.0.2     rlang_0.4.0
[16] Matrix_1.2-15       rstudioapi_0.10     xfun_0.7
[19] gridExtra_2.3       knitr_1.23          withr_2.1.2
[22] cluster_2.0.7-1     htmlwidgets_1.3     vctrs_0.2.0
[25] grid_3.5.3          nnet_7.3-12         tidyselect_0.2.5
[28] data.table_1.12.2   glue_1.3.1          R6_2.4.0
[31] foreign_0.8-71      latticeExtra_0.6-28 purrr_0.3.2
[34] magrittr_1.5        htmltools_0.3.6     backports_1.1.4
[37] scales_1.0.0        splines_3.5.3       assertthat_0.2.1
[40] colorspace_1.4-1    stringi_1.4.3       acepack_1.4.1
[43] lazyeval_0.2.2      munsell_0.5.0       crayon_1.3.4



Re: [R] [External Email] Help to do this exercise

2020-02-10 Thread Christopher W Ryan
Homework questions are generally frowned upon on R-help List. It is best to
discuss those questions with your instructor.

--Chris Ryan
SUNY Upstate Medical University Clinical Campus at Binghamton

On Mon, Feb 10, 2020 at 9:39 AM hương phạm  wrote:

> N1 Consider the database "LakeHuron", containing the annual measurements
> of the level (in feet) of Lake Huron 1875-1972, see
> https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/LakeHuron.html.
> The general aim is to estimate the probability density of the level of the
> lake.
> (i) Construct the histogram estimator with the number of bins selected
> by the Sturges rule. On the same plot display the graph of the density of
> the normal distribution with estimated mean and standard
> deviation (normal fit).
> (ii) Among the histograms with the number of bins from 5 to 30, find
> the histogram estimator which is closest to the normal fit. Comment on the
> bias-variance tradeoff in this case.
> (iii) Construct the kernel estimators with various kernels (apply all
> kernels available in the R language). The bandwidth can be chosen by
> default. Construct the kernel estimators under various choices of
> bandwidth (apply all rules for bandwidth selection, which are implemented
> in the R language, the kernel can be chosen by default).
> Among all constructed kernel estimators, find the kernel estimator
> which is closest to the normal fit
>


[R] how to change format of dates in a tibble

2019-12-18 Thread Christopher W Ryan
I'm not understanding how the tidyverse handles date formats.

output of sessionInfo() at the end of my message.

dateRanges <- structure(list(
  apptType = structure(1:2, .Label = c("initial visit",
    "start of treatment visit"), class = "factor"),
  minMadeRequestDates = structure(c(18124, 18115), class = "Date"),
  maxMadeRequestDates = structure(c(18187, 18199), class = "Date"),
  minApptDate = structure(c(18129, 18129), class = "Date"),
  maxApptDate = structure(c(18199, 18214), class = "Date")),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))
str(dateRanges)

## produces desired result
format(dateRanges, format = "%d %b %Y")

## does not produce desired result
library(dplyr)
format(dateRanges, format = "%d %b %Y")

## rather cumbersome, and also does not produce the desired output
mutate(dateRanges,
       minMRD = as.Date(minMadeRequestDates, format = "%d %b %Y"))

How does one change the format of a date variable inside a tibble when
dplyr is loaded?
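
One way this could be handled, as a sketch. A Date column carries no display format of its own, so the usual route is to convert the Date columns to character with format(). The example assumes dplyr 1.0.0 or later for across(); with the 0.8.3 version in my session info, mutate_if() would be the rough equivalent. The tibble and its column names below are made up for illustration. Note also that as.Date(x, format = ...) uses format only for parsing text into dates, so it cannot change how an existing Date is displayed.

```r
library(dplyr)

## Illustrative tibble with two Date columns (names are made up)
dates_tbl <- tibble(
  apptType = c("initial visit", "start of treatment visit"),
  minDate  = as.Date(c("2019-08-16", "2019-08-07")),
  maxDate  = as.Date(c("2019-10-18", "2019-10-30"))
)

## Convert every Date column to formatted text; other columns are untouched
formatted <- dates_tbl %>%
  mutate(across(where(~ inherits(.x, "Date")),
                ~ format(.x, "%d %b %Y")))

formatted
```

The converted columns are character, so this step is best left until just before printing or reporting; sorting or date arithmetic should be done on the original Date columns.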

Thanks

Chris Ryan

=== session info ===
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dplyr_0.8.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       fansi_0.4.0      zeallot_0.1.0    utf8_1.1.4
 [5] crayon_1.3.4     assertthat_0.2.1 R6_2.4.0         backports_1.1.5
 [9] magrittr_1.5     pillar_1.4.2     rlang_0.4.0      cli_1.1.0
[13] vctrs_0.2.0      glue_1.3.1       purrr_0.3.3      compiler_3.6.1
[17] pkgconfig_2.0.3  tidyselect_0.2.5 tibble_2.1.3




Re: [R] help about codes from probability

2019-11-07 Thread Christopher W. Ryan
Is this homework for a class? If so, it would be better to ask your professor 
for guidance. R-help List frowns on homework questions.

Chris Ryan


On November 7, 2019 2:46:32 AM EST, Sadia Seddiqi  wrote:
>Could you help me with how I can solve and run this code in R-lab?
>
>Unions
>
>1- You are picking flowers from a garden. The garden contains flowers of
>varying colors, including flowers that are completely blue, flowers that
>are completely pink, and also flowers that are both colours. The
>probability of picking a flower that is at least partly blue is 0.4 and the
>probability of picking a flower that is at least partly pink is 0.2. Let's
>think about some probabilities!
>
># Calculate the probability of picking a flower that is blue, pink or both
>
>2- Conditional Probability I
>
>Enter plants into your console to see information about whether 90 plants
>live indoors or outdoors, and whether they are a flower, a succulent or a
>tree. The values in this table represent probabilities, calculated from
>frequencies.
>
>Let's use these probability values to find some conditional probabilities.
>To do this you need to use the formula
>P(A|B)=P(A+B)/P(B).
>
># What is the probability a plant lives indoors?
>
># What is the probability a plant is a flower, given that we know it lives
>indoors?
>
># What is the probability a plant lives indoors, given that we know it is a
>succulent?
>
>-
>
>*Sadia seddiqi*
>
>
>*Health Care Financing Officer *
>
>
>*Health Economics & Financing Directorate (HEFD) Ministry of Public
>Health
>(MoPH), Kabul-Afghanistan*
>
>* Email: hefd.sa...@gmail.com 
><+93%2078%20692%207270>*
>


[R] how to place a rug on only the x-axis in a scatterplot with lattice

2019-11-05 Thread Christopher W Ryan
The following produces a scatterplot with rugs on both the vertical and
horizontal axes.

library(dplyr)
library(stringr)
library(lattice)
library(latticeExtra)
## .
xyplot(scheduleInterval ~ calledForApptDate, data = dd.2,
       xlab = "Date patient called for appointment",
       ylab = "Days in the future that patient was scheduled",
       panel = function(...) {
           panel.xyplot(..., col = "red")
           panel.smoother(..., span = 0.9, se = FALSE)
           panel.rug(...)
       })

I'd like a rug to appear only on the horizontal axis.  None of the
following seem to be the correct syntax:

panel.rug(..., y = NULL)
panel.rug(..., y = FALSE)
panel.rug(x)
panel.rug(x = ...)

This does the job:

xyplot(scheduleInterval ~ calledForApptDate, data = dd.2,
       xlab = "Date patient called for appointment",
       ylab = "Days in the future that patient was scheduled",
       panel = function(...) {
           panel.xyplot(..., col = "red")
           panel.smoother(..., span = 0.9, se = FALSE)
           panel.rug(x = dd.2$calledForApptDate)
       })

but seems inadvisable. Shouldn't I be making use of ... for passing
arguments through to the panel.rug() function?  Specifying a variable in a
dataframe by name isn't generalizable.
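
A sketch of one generalizable alternative, using mtcars because dd.2 is not available here, and panel.loess() from lattice in place of panel.smoother() so that only lattice is needed. The idea is to name x and y in the panel function's arguments so they can be passed on selectively. I believe attempts such as panel.rug(..., y = NULL) fail because x and y are already contained in ..., so y ends up being supplied twice.

```r
library(lattice)

p <- xyplot(mpg ~ wt, data = mtcars,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, col = "red", ...)
              panel.loess(x, y, span = 0.9)
              ## pass only x, so the rug is drawn on the horizontal axis alone
              panel.rug(x = x)
            })
print(p)
```

Because x here is whatever the formula supplies to the panel, the same panel function works with any data frame and also within conditioned (multi-panel) plots.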

Thanks.

--Chris Ryan



[R] getting summary statistics easily with dplyr

2019-11-05 Thread Christopher W Ryan
I'm trying to modernize my way of thinking, and my coding, into the
dplyr/tidyverse way of doing things.

To get basic summary statistics on a variable in a dataframe, with the
output also being a dataframe, I previously would do something like this,
using other packages:

library(doBy)
library(Hmisc)   ## for latex()
doBy.output <- summaryBy(mpg ~ am, data = mtcars, FUN = fivenum)
str(doBy.output)   ## yes, it's a dataframe
## which I would then incorporate into my report via Sweave and latex
latex(doBy.output, file = "")

## Or this:

library(mosaic)
mosaic.output <- favstats(mpg ~ am, data = mtcars)
str(mosaic.output)  ## yes, it's a dataframe
latex(mosaic.output, file = "")


## What would be the "dplyr way" of doing this?  I know I could specify
## each summary statistic individually:

library(dplyr)
dplyr.output <- mtcars %>% group_by(am) %>% summarise(min = min(mpg),
 p25 = quantile(mpg, probs = 0.25),
 p50 = median(mpg),
 p75 = quantile(mpg, probs = 0.75),
 max = max(mpg) )
str(dplyr.output)  ## yes, it's a dataframe
latex(dplyr.output, file = "")

## Is there a way to use a single function like fivenum instead of
## specifying each desired summary statistic?  dplyr summarise() wants a
## result of length 1, not 5

dplyr.output.2 <- mtcars %>% group_by(am) %>% summarise(fivenum(mpg) )

group_map or group_modify seem like they might do the job, but I could
use some guidance on the syntax.
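
For what it is worth, here is a sketch of how group_modify() might be used, assuming a dplyr version that provides it (0.8.1 or later, as far as I know). The column names in five.names are my own labels, not anything fivenum() returns, and note that fivenum() reports Tukey's hinges, which can differ slightly from quantile() at 0.25 and 0.75.

```r
library(dplyr)

## Labels for the five values returned by fivenum(); the names are my own choice
five.names <- c("min", "lower.hinge", "median", "upper.hinge", "max")

dplyr.output.3 <- mtcars %>%
    group_by(am) %>%
    ## .x is the data for one group; return a one-row data frame per group
    group_modify(~ as.data.frame(as.list(setNames(fivenum(.x$mpg), five.names)))) %>%
    ungroup()

str(dplyr.output.3)   ## one row per level of am, one column per statistic
```

The function given to group_modify() must return a data frame, which is why the numeric vector from fivenum() is converted through a named list first.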


Thanks.

--Chris Ryan



[R] reading in csv files, some of which have column names and some of which don't

2019-08-13 Thread Christopher W Ryan
Alas, we spend so much time and energy on data wrangling . . . .

I'm given a collection of csv files to work with---"found data". They arose
via saving Excel files to csv format. They all have the same column
structure, except that some were saved with column names and some were not.

I have a code snippet that I've used before to traverse a directory and
read into R all the csv files of a certain filename pattern within it, and
combine them all into a single dataframe:

library(dplyr)
## specify the csv files that I will want to access
files.to.read <- list.files(path = "H:/EH", pattern =
"WICLeadLabOrdersDone.+", all.files = FALSE, full.names = TRUE, recursive =
FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

## function to read csv files back in
read.csv.files <- function(filename) {
bb <- read.csv(filename, colClasses = "character", header = TRUE)
bb
}

## now read the csv files, as all character
b <- lapply(files.to.read, read.csv.files)

ddd <- bind_rows(b)

But this assumes that all files have column names in their first row. In
this case, some don't. Any advice how to handle it so that those with
column names and those without are read in and combined properly? The only
thing I've come up with so far is:

## function to read csv files back in
## Unfortunately, some of the csv files are saved with column headers, and
## some are saved without them.
## This presents a problem when defining the function to read them: header
## = TRUE or header = FALSE?
## The best solution I can think of as of 13 August 2019 is to use header =
## FALSE and skip the first row of every file. This will sacrifice one
## record from each csv of about 80 files
read.csv.files <- function(filename) {
    bb <- read.csv(filename, colClasses = "character", header = FALSE,
                   skip = 1)
    bb
}

This sacrifices about 80 out of about 1600 records. For my purposes in this
instance, this may be acceptable, but of course I'd rather not.
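
One alternative, sketched here under the assumption that the column names are known in advance: read every file with header = FALSE, then drop the first row only when it matches those names. The argument expected.names is hypothetical and would have to be supplied by whoever knows the real column structure; the comparison ignores case and surrounding whitespace.

```r
## 'expected.names' is a hypothetical character vector of the known column names
read.csv.files <- function(filename, expected.names) {
    bb <- read.csv(filename, colClasses = "character", header = FALSE,
                   col.names = expected.names)
    ## compare the first row with the expected header
    first.row <- trimws(unlist(bb[1, ], use.names = FALSE))
    if (nrow(bb) > 0 && identical(tolower(first.row), tolower(expected.names))) {
        bb <- bb[-1, , drop = FALSE]   ## first row was a header, not a record
    }
    bb
}
```

It could then be called as lapply(files.to.read, read.csv.files, expected.names = my.names), where my.names is whatever vector of names applies. A record that happens to equal the header exactly would be dropped, which seems unlikely for real data.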

Thanks.

--Chris Ryan



Re: [R] help

2019-07-16 Thread Christopher W Ryan
You will get the best help if you:

1. post email in plain text, not html
2. post the command/code you issued and the result/error that was returned
3. include information about your operating system, which sometimes helps

--Chris Ryan
Binghamton, NY, US

On Tue, Jul 16, 2019 at 3:36 PM Sujaya  wrote:

> Respected Sir,
> I am Sujaya, a data researcher in India. I am unable to read a text file
> folder in R. The permission denied error refuses to get resolved. Can you
> help me?
> yours sincerely,
> Sujaya


Re: [R] Help installing netReg

2019-04-15 Thread Christopher W. Ryan


Sorry, was typing on my phone. Not "compy."  "Computer."  I was asking
whether you were working on your own, standalone, computer, or whether
perhaps this was an institutional, networked, machine, on which you
don't have write permissions that you need.
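
A quick way to check this from within R, given only as a sketch: list the library directories R knows about and test each for write permission. file.access() with mode = 2 returns 0 when writing is permitted; on Windows its answer may not always reflect the real permissions, so treat the result as a hint.

```r
## Library directories R will try to install into
lib.dirs <- .libPaths()

## file.access(..., mode = 2) returns 0 where write permission is granted
writable <- file.access(lib.dirs, mode = 2) == 0

data.frame(library = lib.dirs, writable = writable)
```

If the system library turns out not to be writable, installing into a personal library (one of the writable entries, if any) is the usual workaround.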

--Chris Ryan

Spencer Brackett wrote:
> Mr. Ryan, 
> 
> If you are referring to the CRAN mirror, my default is US (NY) [https]
> 
> On Sun, Apr 14, 2019 at 7:23 PM Chris Ryan wrote:
> 
> Spencer--
> 
> What is your compy set-up? Standalone, networked? Where is R
> installed? Do you have write privileges to that location?
> 
> Chris Ryan
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
> 
> On April 14, 2019 6:18:37 PM EDT, Spencer Brackett wrote:
> >Good evening,
> >
> > I am having problems with downloading the package used to generate
> >regression models on R. The following is the error message I received.
> >I
> >tried installing BiocManager instead as suggested, but this too did not
> >work. Any ideas?
> >
> >The downloaded binary packages are in
> >C:\Users\Spencer\AppData\Local\Temp\Rtmp8YKVqx\downloaded_packages
> >installation path not writeable, unable to update packages: class,
> >cluster,
> >codetools, foreign,
> >  lattice, MASS, Matrix, mgcv, nlme, rpart, survival
> >Warning message:
> >'biocLite' is deprecated.
> >Use 'BiocManager::install' instead.
> >See help("Deprecated")
> >
> >Best,
> >
> >Spencer Brackett
> >


Re: [R] R maintains old values

2018-07-03 Thread Christopher W Ryan
This TV series might have some advice on that.

https://www.imdb.com/title/tt2543312/

--Chris Ryan

On Tue, Jul 3, 2018 at 9:25 AM, J C Nash  wrote:

> . . . Now, to add to the controversy, how do you set a computer on fire?
>
> JN
>
>



Re: [R] printing an arbitrary-length character vector in columns on a page of a pdf report

2018-06-05 Thread Christopher W. Ryan
Richard--

Nice. If I understand your code correctly, in the line

ddm <- matrix("", (n+2) %/% nc, nc)

I could instead use

ddm <- matrix("", (n + nc - 1) %/% nc, nc)

for generalizability, as I may have to increase nc as the list of words
grows ever longer.
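
To convince myself, here is a small sketch wrapping the idea in a function; the name words_to_matrix is my own invention. The expression (n + nc - 1) %/% nc is integer ceiling division, so the matrix has just enough rows for any nc.

```r
## Hypothetical helper: lay out a character vector in nc columns,
## padding the unused cells with empty strings
words_to_matrix <- function(dd, nc) {
    n <- length(dd)
    ddm <- matrix("", (n + nc - 1) %/% nc, nc)   ## ceiling(n / nc) rows
    ddm[seq_len(n)] <- dd                        ## fills column by column
    ddm
}

dim(words_to_matrix(LETTERS, 3))
dim(words_to_matrix(LETTERS, 4))
dim(words_to_matrix(LETTERS[1:24], 4))   ## exact multiple: no extra blank row
```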

Thanks everyone. Several good suggestions.

--Chris Ryan

Richard M. Heiberger wrote:
> n <- length(dd)
> ddm <- matrix("", (n+2) %/% nc, nc)
> ddm[1:n] <- dd



[R] printing an arbitrary-length character vector in columns on a page of a pdf report

2018-06-05 Thread Christopher W Ryan
I'm writing code for a recurring report, using an R --> Sweave --> pdflatex
workflow. It includes a character vector of short words that I would like
to display compactly, in columns on a page, rather than one word per line,
which would waste a lot of space. The vector of words will increase
unpredictably over time, with future versions of the report.

I thought I would go about it by turning the character vector into a
matrix, as follows:

dd <- LETTERS
## set number of columns. Three for now
nc <- 3
## have to pad the character vector to a length that is a multiple of nc
## (the outer %% nc avoids adding a full blank row when it already is one)
add <- (nc - (length(dd) %% nc)) %% nc
dd2 <- c(dd, rep("", add))
ddm <- matrix(dd2, ncol = nc)
library(Hmisc)
latex(ddm, file = "")

Any ideas for a more elegant way to do this?

Thanks.

--Chris Ryan
Binghamton University
and
Broome County Health Department
Binghamton, NY, US



Re: [R] How would I color points conditional on their value in a plot of a time series

2018-05-01 Thread Christopher W Ryan
Excellent! Worked like a charm. Thanks.

--Chris Ryan

On Tue, May 1, 2018 at 4:33 PM, William Dunlap <wdun...@tibco.com> wrote:

> The ts method for plot() is quirky.  You can use the default method:
>
>  plot(as.vector(time(ttt)), as.vector(ttt), type = "p", col=ifelse(ttt<8,
> "black", "red"))
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, May 1, 2018 at 1:17 PM, Christopher W Ryan <cr...@binghamton.edu>
> wrote:
>
>> How would I color points conditional on their value in a plot of a time
>> series.  Something like this:
>>
>> ## demonstration data
>> ttt <- ts(rpois(12, lambda = 8), start = c(2000, 1), freq = 4)
>> ttt
>> plot(ttt, type = "p")
>>
>> ## doesn't work--all points the same color
>> plot(ttt, type = "p", col = ifelse(ttt < 8, "black", "red"))
>>
>> ## also doesn't work--all points the same color
>> q <- as.numeric(ttt)
>> q
>> plot(ttt, type = "p", col = ifelse(q < 8, "black", "red"))
>>
>>
>> ## works OK with a simple, non-time-series scatterplot, as in
>>
>> sss <- data.frame(x = rpois(12, lambda = 8), y = rnorm(12, mean = 100, sd
>> =
>> 25))
>> with(sss, plot(y ~ x, col = ifelse(y > 100, "black", "red")))
>>
>> ## but I am missing something about time series.
>>
>> Thanks.
>>
>> --Chris Ryan
>> Broome County Health Department
>> and Binghamton University
>> Binghamton, NY
>>


[R] How would I color points conditional on their value in a plot of a time series

2018-05-01 Thread Christopher W Ryan
How would I color points conditional on their value in a plot of a time
series?  Something like this:

## demonstration data
ttt <- ts(rpois(12, lambda = 8), start = c(2000, 1), freq = 4)
ttt
plot(ttt, type = "p")

## doesn't work--all points the same color
plot(ttt, type = "p", col = ifelse(ttt < 8, "black", "red"))

## also doesn't work--all points the same color
q <- as.numeric(ttt)
q
plot(ttt, type = "p", col = ifelse(q < 8, "black", "red"))


## works OK with a simple, non-time-series scatterplot, as in

sss <- data.frame(x = rpois(12, lambda = 8), y = rnorm(12, mean = 100, sd =
25))
with(sss, plot(y ~ x, col = ifelse(y > 100, "black", "red")))

## but I am missing something about time series.
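
A reply in this thread suggested bypassing the ts method for plot() and using the default method instead, by converting both the times and the values to plain vectors. A sketch of that approach, with a seed added only so the example is repeatable:

```r
set.seed(42)
ttt <- ts(rpois(12, lambda = 8), start = c(2000, 1), frequency = 4)

## one color per observation, computed from the plain numeric values
pt.col <- ifelse(as.vector(ttt) < 8, "black", "red")

## as.vector() strips the ts class, so plot.default() is used
plot(as.vector(time(ttt)), as.vector(ttt), type = "p", col = pt.col,
     xlab = "Time", ylab = "ttt")
```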

Thanks.

--Chris Ryan
Broome County Health Department
and Binghamton University
Binghamton, NY



Re: [R] Reasons to Use R in a Public Administrations and Ideas for a Short Training

2018-04-19 Thread Christopher W Ryan
A few years ago I gave two 5-hour workshops about R to a class of US high
school students in a somewhat-accelerated science research class (so these
were already science-motivated kids).  They had been using mainly Excel,
and some SPSS for which the school had a license.  Overall they seemed to
like the sessions. They started a computer programming club afterward, and
some continued to use R.

As for motivation, I found it helpful with the kids to contrast the
instructions for making a scatterplot in Excel with the command in R.  You
can find many long documents on the internet describing how to do it in
Excel. Find a long one (I've seen 17 pages) filled with screenshots and
many numbered steps ("click here, then click there, then click over here
. . .").  Then contrast it with something like plot(force, acceleration) or
plot(x, y).  It becomes pretty clear which is easier (postponing, for the
moment, the question of where x and y came from).

Showing cool graphs that could never be made in Excel is also helpful.

Lastly, in the international setting you describe, I would emphasize the
portability of code. Ask them how, with Excel, they would share their
analysis steps with colleagues in another country, for application to the
colleagues' data.

--Chris Ryan
Broome County Health Department
Binghamton, NY US

On Wed, Apr 18, 2018 at 9:47 AM, Lorenzo Isella 
wrote:

> Dear All,
> Ages ago I posted to this mailing list asking for advice about how to
> evangelize the use of R in an international public
> administration where the fact that R is free is not a decisive factor
> (actually its being "freeware" may even be seen negatively). After a
> long time, I think it is worthwhile asking the question again and see
> what suggestions other users have.
>
> Another question related to that: let's say you have the possibility
> to give a short course (most likely short of 10 hours) to people
> who are not trained in statistics (people with a background in
> international relations or political scientists frustrated at Excel
> and who sometimes have to do a number of repetitive tasks). How would
> you formulate a short training to make them not R proficient users,
> but aware and looking forward to learning more about R?
> Any suggestion and/or pointer to online resources is appreciated.
> Many thanks
>
> Lorenzo
>


Re: [R] Data cleaning & Data preparation, what do R users want?

2017-11-29 Thread Christopher W. Ryan
Great question. What do I want? I want my co-workers to stop using Excel
spreadsheets for data entry, storage, and sharing! I want them to
understand the value of data discipline. But alas . . . .

I work in a county health department in the US. Between dplyr, stringr,
grep, grepl, and the base R read.*() functions, I'm doing OK.

I need to learn more about APIs, so I can see if I can make R directly
grab data from, e.g. our state health department sources. My biggest
hassle is having to download a data file, save it somewhere, and then
open R and read it in. I'd like to be able to do it all in R. Would make
the generation of recurring reports easier.
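
When a source publishes a plain csv file at a stable address, the base read functions can already take a URL in place of a file name, which removes the download-and-save step. A minimal sketch; the function name read_remote_csv and the address in the comment are placeholders of mine, not a real data source.

```r
## Hypothetical helper: read a csv straight from a URL, everything as character
read_remote_csv <- function(url) {
    read.csv(url, colClasses = "character")
}

## Placeholder address, for illustration only:
## dd <- read_remote_csv("https://example.org/some-health-data.csv")
```

Sources that require authentication or return JSON would need more than this.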

--Chris Ryan

Robert Wilkins wrote:
> R has a very wide audience, clinical research, astronomy, psychology, and
> so on and so on.
> I would consider data analysis work to be three stages: data preparation,
> statistical analysis, and producing the report.
> This regards the process of getting the data ready for analysis and
> reporting, sometimes called "data cleaning" or "data munging" or "data
> wrangling".
> 
> So as regards tools for data preparation, speaking to the highly diverse
> audience mentioned, here is my question:
> 
> What do you want?
> Or are you already quite happy with the range of tools that is currently
> before you?
> 
> [BTW,  I posed the same question last week to the r-devel list, and was
> advised that r-help might be a more suitable audience by one of the
> moderators.]
> 
> Robert Wilkins
> 


Re: [R] Help

2017-06-20 Thread Christopher W Ryan
Leonardo--

R-help can be a very useful resource. Some suggestions to use it well:

1. use an informative subject line, not "help"
2. include a "minimal working example": a *little* data, the code that,
with those data, reproduces your problem, and the error message that
resulted.

As to your particular question, at this point I can only guess, but for
starters it would probably help to show the output of

str(ind.davis)
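
To illustrate what such a minimal example might look like, here is a sketch with made-up data; the real ind.davis has not been shown, so its contents here are purely hypothetical. Assigning to levels() renames the existing levels in their current order, so it helps to print them first.

```r
## Hypothetical stand-in for ind.davis
ind.davis <- data.frame(
    Ageclass = factor(c("adult", "juvenile", "sub-adult", "adult"))
)

str(ind.davis)
levels(ind.davis$Ageclass)   ## check the current order before renaming

## new names are matched to the old levels by position
levels(ind.davis$Ageclass) <- c("Adult", "Juvenile", "Sub-adult")
table(ind.davis$Ageclass)
```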

--Chris Ryan

On Tue, Jun 20, 2017 at 1:06 AM, Leonardo Malaguti <
leonardomalagut...@gmail.com> wrote:

> Dear expert friends,
> I'm pretty new to this world, and my question may seem pretty easy to
> you.
> I need to change the names of the levels inside a column of my data frame:
>
> levels(ind.davis$Ageclass) <- c("adult", "Juvanile", "sub-adult")
> names(ind.davis$Ageclass) <- c("Adult", "Juvenile", "Sub-adult")
> That is what I tried, but of course it doesn't work.
>
> Thanks,
> have a wonderful day,
> Leo
>


Re: [R] How to Transform a Factor Variable into a Date

2017-06-19 Thread Christopher W Ryan
A couple thoughts:

1. converting factors into dates often requires that they be converted to
character first.

2. you don't really have dates; you have just months and years

3. therefore perhaps the as.yearmon() function in the zoo package could help

library(zoo)
my.factor <- factor("Feb 2017")
as.yearmon(my.factor)  ## gets around the factor-vs-character issue
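
If an actual Date is needed, a base R sketch follows. The NA values in the original attempt appear to come from the format "%b-%y" lacking a day, which as.Date() needs; pasting a nominal first-of-month onto each value supplies one. The sample values are hypothetical, taken from the levels shown in the question, and matching "%b" assumes English month abbreviations in the current locale.

```r
## Hypothetical sample values in the mmm-yy style shown in the question
transit <- factor(c("Jun-11", "Apr-13", "Sep-16"))

## convert factor -> character, add a nominal day, then parse
transit.date <- as.Date(paste0("01-", as.character(transit)),
                        format = "%d-%b-%y")
transit.date
```

With zoo, as.yearmon(as.character(transit), format = "%b-%y") should give the month-and-year equivalent without inventing a day.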


On Mon, Jun 19, 2017 at 3:07 PM, Paul Bernal  wrote:

> Dear all,
>
> Hope you are doing great. I have a .csv file that I read into R, the .csv
> file consists of two fields (TransitDate and CargoTons).
>
> The TransitDate I formatted from Excel in the fashion mmm-yy (e.g.:
> Apr-2013). However R does not recognize the field TransitDate as a date
> field.
>
> Here is the code:
>
> library(lubridate)
>
> Dataset <- read.table("U:/NEWCargoData.csv", header=TRUE, sep=",",
> na.strings="NA", dec=".", strip.white=TRUE)
>
> DatasetFrame <- data.frame(Dataset)
>
> DatasetFrame$TransitDate <- as.Date(DatasetFrame$TransitDate, format =
> "%b-%y")
>
>
> Now, when I do DatasetFrame[1,1], the following happens:
>
>
> > DatasetFrame[1,1]
> [1] NA
> >
> > DatasetFrame[2,1]
> [1] NA
> >
>
> Now when I do:
>
> > Dataset[1,1] #this is the dataset as was read from my computer
> [1] Jun-11
> 62 Levels: Apr-13 Apr-14 Apr-15 Apr-16 Apr-17 Aug-13 Aug-14 Aug-15 Aug-16
> Dec-12 Dec-13 Dec-14 ... Sep-16
> >
>
> I am also attaching the .csv file for your reference. How can I do to get R
> to convert TransitDate into an actual date  field? R is not recognizing it
> as a date.
>
> Any help will be greatly appreciated,
>
> Best regards,
>
> Paul


Re: [R] about fitting a regression line

2017-06-15 Thread Christopher W Ryan
Rather than just posting your error message, it helps immensely to post the
code that produced the error--indeed with some small sample data that
reproduces the problem.

x <- rnorm(40)
y <- 0.6 * x + rnorm(40, sd = 0.3)
plot(y ~ x)
model <- lm(y ~ cos(x))
summary(model)
plot(y ~ cos(x))
abline(model, col = "red")

## obviously I am not claiming that this is a meaningful or sensible model
## It's just for illustrative purposes
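
To tie this back to the original question (Yi = c*cos(Xi) + d), here is a further sketch on simulated data with known c = 2 and d = 1, showing where the estimates, R-squared and p-value can be read off. The last lines show one way, though not necessarily the way it arose in your data, to get a coefficient that is "not defined because of singularities": a predictor with no variation.

```r
set.seed(123)
x <- runif(100, 0, 2 * pi)
y <- 2 * cos(x) + 1 + rnorm(100, sd = 0.2)   ## simulated with c = 2, d = 1

model <- lm(y ~ cos(x))
est  <- coef(model)                 ## "(Intercept)" is d, "cos(x)" is c
r2   <- summary(model)$r.squared
pval <- summary(model)$coefficients["cos(x)", "Pr(>|t|)"]

## A constant predictor is collinear with the intercept, so its slope is NA
x.const <- rep(1, 100)
est.const <- coef(lm(y ~ cos(x.const)))
```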

--Chris Ryan

On Thu, Jun 15, 2017 at 3:48 PM, lily li  wrote:

> Thanks for your replies. I tried the regression, but then got a NA value
> for the slope. And here is the error message:
> Coefficients: (1 not defined because of singularities)
>
>
> On Thu, Jun 15, 2017 at 12:20 AM, PIKAL Petr 
> wrote:
>
> > Hi
> >
> > But X can be some function like - sin, cos, log, exp...
> >
> > Cheers
> > Petr
> >
> > > -Original Message-
> > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of lily
> li
> > > Sent: Thursday, June 15, 2017 1:28 AM
> > > To: MacQueen, Don 
> > > Cc: R mailing list 
> > > Subject: Re: [R] about fitting a regression line
> > >
> > > Thanks. I thought lm() function is for linear model, such as the
> > correlation
> > > below:
> > > Y= aX + b
> > >
> > > On Wed, Jun 14, 2017 at 5:25 PM, MacQueen, Don 
> > > wrote:
> > >
> > > > Start with the lm() function; i.e., see
> > > >
> > > >   ?lm
> > > >
> > > > -Don
> > > >
> > > > --
> > > > Don MacQueen
> > > >
> > > > Lawrence Livermore National Laboratory
> > > > 7000 East Ave., L-627
> > > > Livermore, CA 94550
> > > > 925-423-1062
> > > >
> > > >
> > > > On 6/14/17, 3:40 PM, "R-help on behalf of lily li" <
> > > > r-help-boun...@r-project.org on behalf of chocol...@gmail.com>
> wrote:
> > > >
> > > > Hi R users,
> > > >
> > > > I have some data points (Xi, Yi), and they may follow such a
> > > > pattern Yi =
> > > > cCOS(Xi) + d, how to find the c and d in R? which function to
> use?
> > > > Also,
> > > > how to get the R2 and p value for this correlation? Thanks for
> any
> > > > kind of
> > > > help.
> > > >

[R] trouble with starting date in sts and disProg objects in the surveillance package

2016-12-27 Thread Christopher W Ryan
I create a data frame, then, using package surveillance version 1.13, I
convert it to a surveillance time series (sts) object and then to a disProg
object.  But in the plot of the disProg object, the dates seem to be
mislabelled.

Grateful for any advice.

--Chris Ryan
Broome County Health Department
Binghamton, NY

Below is an MWE to reproduce the behavior.

library(surveillance)
ll <- data.frame(onset = sample(1:200, size = 400, replace = TRUE))
## let the dates begin in mid-September 2012
ll$onset.date <- as.Date("2012-09-15") + ll$onset
dd.sts <- linelist2sts(ll, dateCol="onset.date", aggregate.by="1 week")
## the below command produces a plot where the horizontal (time) axis
## begins around mid-September 2012, as expected
plot(dd.sts)
## now convert the sts object to a disProg object
dd.dp <- sts2disProg(dd.sts)
## the below commands show that the start date of the disProg object is as
## expected
str(dd.dp)
dd.dp$start
## But when I plot the disProg object, the horizontal axis is labelled
## differently.
## It seems to begin back in 2011 or before
plot(dd.dp)

## here is my sessionInfo
R version 3.3.1 (2016-06-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] surveillance_1.13.0 polyCub_0.5-2       xtable_1.8-2        sp_1.2-4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8     lattice_0.20-34 deldir_0.1-12   MASS_7.3-45     grid_3.3.1      nlme_3.1-128
 [7] tensor_1.5      spatstat_1.48-0 goftest_1.0-3   rpart_4.1-10    Matrix_1.2-7.1  tools_3.3.1
[13] polyclip_1.5-6  abind_1.4-5     mgcv_1.8-16



Re: [R] dplyr or plyr or both?

2016-09-15 Thread Christopher W. Ryan
Thank you Frans. This is exactly the sort of nuance that I want to learn 
about.


--Chris

Frans Marcelissen wrote:

Hello Christopher and others:
What cannot be stressed enough is: do not combine both packages. It
gives errors and incorrect results! I will show that below.

a<-data.frame(groep=1:4,v=1:40)
library(dplyr)
a %>% group_by(groep) %>% summarise(m=mean(v),n=n())
#   groep     m     n
#   <int> <dbl> <int>
# 1     1    19    10
# 2     2    20    10
# 3     3    21    10
# 4     4    22    10
# correct

library(plyr)
a %>% group_by(groep) %>% summarise(m=mean(v),n=n())

Error in n() : This function should not be called directly
# ???
a %>% group_by(groep) %>% summarise(m=mean(v))
# m
# 1 20.5
#incorrect!
--

So both n() and group_by from dplyr don't work after library(plyr)!

My advice is: do not use plyr. Unfortunately plyr has some functions
that are very important, and that are not in dplyr. For
instance: rbind.fill() (for combining the rows of two dataframes with
unequal columns). If you need this: don't library plyr, use plyr::rbind.fill

Until now I have the impression that it is also possible to library
dplyr after plyr, but it is better to remove plyr!

This is a serious problem that has been reported before, but not solved
(in dplyr 0.5.0 and plyr 1.8.4)
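
A sketch of the defensive style this suggests, assuming a reasonably recent dplyr: qualify each call with its namespace, so the result does not depend on which package was attached last. The second part shows dplyr::bind_rows(), which also fills columns missing from one data frame with NA and so may stand in for rbind.fill() in simple cases. The data frames d1 and d2 are made up for illustration.

```r
a <- data.frame(groep = 1:4, v = 1:40)

## fully qualified calls: unaffected by plyr masking summarise()
res <- dplyr::summarise(dplyr::group_by(a, groep),
                        m = mean(v), n = dplyr::n())
res

## rows of two data frames with unequal columns; gaps become NA
d1 <- data.frame(x = 1:2, y = c("a", "b"))
d2 <- data.frame(x = 3L, z = TRUE)
combined <- dplyr::bind_rows(d1, d2)
combined
```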

Frans

2016-09-15 16:09 GMT+02:00 Christopher W Ryan <cr...@binghamton.edu
<mailto:cr...@binghamton.edu>>:

I've set myself the task of learning about these packages, and about
tidy data concepts.

What is the relationship between plyr and dplyr?  Does the latter
replace the former (meaning I can concentrate on learning the latter)?
Or is there ever a need to use functions from both (meaning I should
learn both)?

Thanks.

--Chris Ryan



[R] dplyr or plyr or both?

2016-09-15 Thread Christopher W Ryan
I've set myself the task of learning about these packages, and about
tidy data concepts.

What is the relationship between plyr and dplyr?  Does the latter
replace the former (meaning I can concentrate on learning the latter)?
Or is there ever a need to use functions from both (meaning I should
learn both)?

Thanks.

--Chris Ryan



Re: [R] what happened to inside-r? [possibly OT]

2016-08-01 Thread Christopher W Ryan
No, not off the grid. I just don't follow developments with Revolution
or with Microsoft.

I was merely lamenting that inside-r URLs, that used to get me quickly
to the R help pages (even if I was at a computer that did not have R),
now re-direct to the corporate MRAN homepage, with the help pages
nowhere to be found. Alas!

But thanks anyway.

--Chris

On Fri, Jul 29, 2016 at 3:54 PM, David Winsemius <dwinsem...@comcast.net> wrote:
> Been off the grid for the last year? MS bought Revolution R.
>
>
> Sent from my iPhone
>
>> On Jul 29, 2016, at 11:30 AM, Christopher W. Ryan <cr...@binghamton.edu> 
>> wrote:
>>
>> This might be a bit off-topic, but up until recently (a day or so ago?) I 
>> loved using inside-r.org as a quick and easy way to access help pages on R 
>> commands. Took me to what I needed without any fuss. Now that URL redirects 
>> to the "Microsoft R Application Network"?  Looks to be something related to 
>> Revolution R.  What happened?
>>
>> Thanks.
>>
>> --Chris Ryan
>>
>



[R] I'm getting confused with notation and terminology in output from weibull parametric survival model from survreg()

2016-07-29 Thread Christopher W. Ryan
I'm trying to run a Weibull parametric survival model for recurrent 
event data, with subject-specific frailties, using survreg() in the 
survival package, and I'm having trouble understanding the output and 
its notation, and how that translates to some of the books I am using as 
references (DF Moore, Applied Survival Analysis Using R; and Kleinbaum 
and Klein, Survival Analysis A Self-Learning Text). I understand there 
are different notations for different ways of parameterizing a Weibull 
or a gamma distribution, and perhaps that's where I am getting hung up. 
I also may be confusing "scale" for the Weibull distribution of the 
survival times with "scale" for the gamma distribution of the frailties.
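
A small simulation can help pin down the notation. Per the survreg() help page, the "scale" that survreg() reports for a Weibull model is the reciprocal of the shape used by rweibull(), and the intercept is the log of the rweibull() scale. The sketch below checks this on simulated, uncensored data; the true values (1.5 and 100) and all object names are arbitrary choices for illustration.

```r
library(survival)

set.seed(1)
shape_true <- 1.5   # rweibull() shape (arbitrary)
scale_true <- 100   # rweibull() scale (arbitrary)
t_event <- rweibull(2000, shape = shape_true, scale = scale_true)

# Intercept-only Weibull model, every observation an event
fit <- survreg(Surv(t_event, rep(1, 2000)) ~ 1, dist = "weibull")

shape_hat <- 1 / fit$scale            # survreg scale = 1 / Weibull shape
scale_hat <- exp(unname(coef(fit)))   # survreg intercept = log(Weibull scale)

# In the form S(t) = exp(-lambda * t^p), h(t) = lambda * p * t^(p - 1):
# p = shape_hat and lambda = scale_hat^(-shape_hat)
c(shape = shape_hat, scale = scale_hat)
```

None of this involves the frailty distribution, which has its own parameters; the "scale" above belongs only to the Weibull model for the survival times.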


My ultimate goal is to display example survival curves: say, one for 
"typically frail" subjects, one for "extra-frail" subjects, and one for 
"not-so-frail" subjects. I'd like to get estimated "frailty" for each of 
my subjects; or at least the distribution of those frailties. Do I need 
the parameters of the gamma distribution to do that? If so, how do I 
extract them? Or are they readily available in the survreg object?


Here is what I have tried so far:

## create some data similar to my real data, in which
## vast majority of subjects had no event
id <- c(1:600, rep(601:630, each=1), rep(631:650, each=2), rep(651:656, 
each=3), rep(677:679, each=4), rep(680, 5))
time <- c(rpois(lambda=800, 600), rpois(lambda=600, length(601:630)), 
rpois(lambda=600, length(631:650)*2), rpois(lambda=600, 
length(651:656)*3), rpois(lambda=600, length(677:679)*4), 
rpois(lambda=600, 5))

event <- c(rep(0, 600), rep(1, (length(id) - 600)))
dd <- data.frame(id=id, time=time, event=event)
dd.2 <- dd[order(id, time), ]
str(dd.2)
table(table(dd.2$id))
# time until censoring, for those without events
summary(subset(dd.2, event==0, select=time))

library(survival)
# build the response from the sorted data so rows line up with dd.2$id
Surv.1 <- with(dd.2, Surv(time, event))

# model without frailties
model.1 <- survreg(Surv.1 ~ 1, data=dd.2, dist="weibull")

# add frailty term
model.2 <- survreg(Surv.1 ~ 1 + frailty(id), data=dd.2, dist="weibull")

# should be same as above line
model.2.b <- survreg(Surv.1 ~ 1 + frailty(id, distribution="gamma"), 
data=dd.2, dist="weibull")


# I don't know if this is the right way to go about it
a.scale <- model.2$scale
var.X <- model.2$history$frailty$theta
s.shape <- sqrt(var.X/a.scale)

gamma.frail.x <- function(a,s,q){ 1/((s^a) * gamma(a)) * (q^(a-1) * 
exp(-(q/s))) }

q <- seq(0.1, 10, by=0.2)

maybe.my.frailties <- gamma.frail.x(a.scale, s.shape, q)
plot(density(maybe.my.frailties))

## end code
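
For reference, gamma.frail.x() above is the gamma density with shape `a` and scale `s`, so base R's dgamma() gives the same values. The sketch below assumes the common convention that gamma frailties have mean 1 and variance theta, which implies shape = 1/theta and scale = theta. Both that convention and the value theta = 0.5 are assumptions made for illustration, not something read from a fitted model.

```r
gamma.frail.x <- function(a, s, q) {
  1 / ((s^a) * gamma(a)) * (q^(a - 1) * exp(-(q / s)))
}

theta <- 0.5                       # hypothetical frailty variance
q <- seq(0.05, 5, by = 0.05)

# Same density two ways
dens <- dgamma(q, shape = 1 / theta, scale = theta)
same <- all.equal(dens, gamma.frail.x(1 / theta, theta, q))

# Check that this parameterization has mean 1
frail_mean <- integrate(function(x) x * dgamma(x, shape = 1 / theta, scale = theta),
                        0, Inf)$value

# Candidate "not-so-frail", "typical" and "extra-frail" frailty values
frail_q <- qgamma(c(0.1, 0.5, 0.9), shape = 1 / theta, scale = theta)
frail_q
```

Under this assumption, plotting `dens` against `q` shows the frailty distribution itself, whereas plot(density(...)) of the density values does not.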


Or, would I be better off changing tactics and using frailtypack?

Thanks for any help.  Session info is below, in case it is relevant.

--Chris Ryan



> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] survival_2.39-5

loaded via a namespace (and not attached):
[1] compiler_3.3.1  Matrix_1.2-6tools_3.3.1 splines_3.3.1
[5] grid_3.3.1  lattice_0.20-33



[R] what happened to inside-r? [possibly OT]

2016-07-29 Thread Christopher W. Ryan
This might be a bit off-topic, but up until recently (a day or so ago?) 
I loved using inside-r.org as a quick and easy way to access help pages 
on R commands. Took me to what I needed without any fuss. Now that URL 
redirects to the "Microsoft R Application Network"?  Looks to be 
something related to Revolution R.  What happened?


Thanks.

--Chris Ryan



Re: [R] R code, heat wave statistics

2016-07-18 Thread Christopher W Ryan
I think much depends on how you define heat wave.
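
To make that concrete, here is a minimal sketch using rle() under one possible definition: a heat wave is a run of at least 3 consecutive days with a daily maximum of 30 degrees or more. The threshold, the minimum run length, and the toy temperatures are all assumptions for illustration, not values from the poster's file.

```r
# Toy daily maxima (hypothetical values)
tmax <- c(28, 31, 32, 33, 29, 30, 31, 34, 35, 36, 27, 31, 32)

threshold <- 30   # assumed: a "hot" day has a maximum >= 30
min_days  <- 3    # assumed: a heat wave lasts at least 3 hot days in a row

# Run-length encode the hot/not-hot indicator, keep long hot runs
runs <- rle(tmax >= threshold)
wave_lengths <- runs$lengths[runs$values & runs$lengths >= min_days]

n_waves    <- length(wave_lengths)   # number of heat waves
mean_len   <- mean(wave_lengths)     # average duration
sd_len     <- sd(wave_lengths)       # standard deviation of duration
max_len    <- max(wave_lengths)      # maximum duration
total_days <- sum(wave_lengths)      # number of heat-wave days
```

With a different definition only `threshold` and `min_days` change. A real file with decimal commas could be read with read.table(..., dec = ",").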

--Chris Ryan
Broome County Health Department
Binghamton, NY USA

On Mon, Jul 18, 2016 at 6:16 AM, Damjan /  wrote:
> Dear all,
>
> I have a txt file with 4 column data about daily maximum temperature for
> some years.
>
> I need to find the number of heat waves and:
>
> - Average (+ standard deviation) and maximum duration of the heatwave,
> - The number of heat waves,
> - The number of days in the summer heat wave.
>
> txt file is like this:
>
>   year  month  day  temmax
>   1879      1    1     7,8
>   1879      1    2     7,5
>   1879      1    3     5,6
>   1879      1    4     6,2
>   1879      1    5       4
>   1879      1    6    -0,9
>   1879      1    7      -3
>   1879      1    8    -2,6
>   1879      1    9    -4,6
>   1879      1   10    -3,2
>   1879      1   11    -0,4
>   1879      1   12       3
>   1879      1   13     1,2
>   1879      1   14    -1,4
>   1879      1   15       0
>   1879      1   16    -2,4
>   1879      1   17      -3
>   1879      1   18     0,5
>   1879      1   19     0,6
>   1879      1   20    -2,6
>


Re: [R] Documenting data

2016-06-30 Thread Christopher W Ryan
Pito--

You describe excellent practices.

The R code itself, saved as a script, provides some documentation of how you 
got from original data to wherever you are.

Use # comments liberally. 

Whenever possible, save your raw data, however it was when you got it--avoid 
changing it--make all the changes on the objects in R. 

Have you looked into various "reproducible research" systems for R, like Sweave 
or knitr?  They allow you to include analysis code and text of a manuscript or 
report all together in one file.
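
A minimal sketch of these habits in base R follows. It keeps the raw object untouched, comments each derivation, and, as one optional extra, attaches a note to the derived column with comment(). The data frame and its column names are hypothetical.

```r
# Raw data exactly as received (hypothetical example); never modified below
raw <- data.frame(height_cm = c(170, 182), weight_kg = c(65, 90))

# Work on a copy so every change is recorded in this script
dat <- raw

# Derived column: body mass index = weight (kg) / height (m)^2
dat$bmi <- dat$weight_kg / (dat$height_cm / 100)^2

# Optional: store the documentation with the object itself
comment(dat$bmi) <- "BMI in kg/m^2, derived from weight_kg and height_cm"
comment(dat$bmi)
```

The script then documents how `dat` was obtained from `raw`, and rerunning it reproduces every derived column.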

Christopher W. Ryan
sent from my phone with BlueMail



On Jun 30, 2016, at 11:30, Pito Salas <pitosa...@brandeis.edu> wrote:
>I am studying statistics and using R in doing it. I come from software
>development where we document everything we do.
>
>As I “massage” my data, adding columns to a frame, computing on other
>data, perhaps cleaning, I feel the need to document in detail what the
>meaning, or background, or calculations, or whatever of the data is.
>After all it is now derived from my raw data (which may have been well
>documented) but it is “new.” 
>
>Is this a real problem? Is there a “best practice” to address this?
>
>Thanks!
>
>Pito Salas
>Brandeis Computer Science
>Feldberg 131
>


Re: [R] anonymizing subject identifiers for survival analysis

2016-05-17 Thread Christopher W. Ryan
Excellent, thanks. Much simpler.

--Chris

Christopher W. Ryan, MD, MS
cryanatbinghamtondotedu
https://www.linkedin.com/in/ryancw

Early success is a terrible teacher. You’re essentially being rewarded
for a lack of preparation, so when you find yourself in a situation
where you must prepare, you can’t do it. You don’t know how.
--Chris Hadfield, An Astronaut's Guide to Life on Earth

William Dunlap wrote:
> You can also use match(code, unique(code)), as in
>   transform(dd.2, codex2 = paste0("Person", match(code, unique(code))))
> It is not guaranteed that x!=y implies digest(x)!=digest(y), but it is
> extremely unlikely to fail.  This match idiom guarantees that.
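
As a self-contained sketch of the match idiom, using the toy names and birthdates from the original post (the "Person" prefix and the column name `codex2` are just labels):

```r
dd <- data.frame(
  name = c("Harry", "Ron", "Hermione", "Luna", "Ginny", "Harry"),
  dob  = c("1990-01-01", "1990-06-15", "1990-04-08",
           "1999-11-26", "1990-07-21", "1990-01-01"),
  stringsAsFactors = FALSE
)

# Cleartext key: name plus date of birth
code <- paste0(tolower(dd$name), dd$dob)

# Each distinct key gets the index of its first appearance
dd$codex2 <- paste0("Person", match(code, unique(code)))
dd$codex2
```

The codes depend on row order, so they stay stable only as long as the lookup is built once and then reused as new records arrive.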
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com <http://tibco.com>
> 
> On Thu, May 12, 2016 at 1:06 PM, Christopher W Ryan
> <cr...@binghamton.edu> wrote:
> 
> I would like to conduct a survival analysis, examining a subject's
> time to *next* appearance in a database, after their first appearance.
> It is a database of dated events.
> 
> I need to obfuscate or anonymize or mask the subject identifiers (a
> combination of name and birthdate). And obviously any given subject
> should have the same anonymous code every time he/she appears in the
> database.  I'm not talking "safe from the NSA" here. And I won't be
> releasing it. It's just sensitive data and I don't want to be working
> every day with cleartext versions of it.
> 
> I've looked at packages digest, anonymizer, and anonymize.  What do
> you think of this approach:
> 
> # running R 3.1.1 on Windows 7 Enterprise
> library(digest)
> dd <- data.frame(id=1:6, name = c("Harry", "Ron", "Hermione", "Luna",
> "Ginny", "Harry"), dob = c("1990-01-01", "1990-06-15", "1990-04-08",
> "1999-11-26", "1990-07-21", "1990-01-01"))
> dd.2 <- transform(dd, code=paste0(tolower(name), tolower(dob), sep=""))
> library(digest)
> anonymize <- function(x, algo="sha256"){
>   unq_hashes <- vapply(x, function(object) digest(object, algo=algo),
> FUN.VALUE="", USE.NAMES=TRUE)
>   unname(unq_hashes[x])
> }
> dd.2$codex <- anonymize(dd.2$code)
> dd.2
> table(duplicated(dd.2$codex))
> 
> Thanks.
> 
> --Chris Ryan
> Broome County Health Department
> 
> 
>


Re: [R] What is the easiest way to turn a dataframe into a barplot?

2016-05-13 Thread Christopher W. Ryan
Here is one way:

dd <- data.frame(var1=c("string1", "string2", "string3"), var2=c(3,7,4))
dd
with(dd, barplot(var2, names.arg=var1))

--Chris Ryan
Binghamton, NY

yoursurrogate...@gmail.com wrote:
> Hello, I can't post my code since it's on a work computer.
> 
> But basically, I have a dataframe that has two columns, one is a string and 
> the other is an integer.  I want to turn this into a vertical barplot where 
> on the x-axis I have the string in the first column and then the plot will 
> display the integer count.
> 
> I have found many examples online and most of those matched either odd edge 
> cases or putting the data into a format that strips out some of the data and 
> I can't use it later.
> 
> This should be a breeze, what am I missing?
>



[R] anonymizing subject identifiers for survival analysis

2016-05-13 Thread Christopher W Ryan
I would like to conduct a survival analysis, examining a subject's
time to *next* appearance in a database, after their first appearance.
It is a database of dated events.

I need to obfuscate or anonymize or mask the subject identifiers (a
combination of name and birthdate). And obviously any given subject
should have the same anonymous code every time he/she appears in the
database.  I'm not talking "safe from the NSA" here. And I won't be
releasing it. It's just sensitive data and I don't want to be working
every day with cleartext versions of it.

I've looked at packages digest, anonymizer, and anonymize.  What do
you think of this approach:

# running R 3.1.1 on Windows 7 Enterprise
library(digest)
dd <- data.frame(id=1:6, name = c("Harry", "Ron", "Hermione", "Luna",
"Ginny", "Harry"), dob = c("1990-01-01", "1990-06-15", "1990-04-08",
"1999-11-26", "1990-07-21", "1990-01-01"))
dd.2 <- transform(dd, code=paste0(tolower(name), tolower(dob), sep=""))
library(digest)
anonymize <- function(x, algo="sha256"){
  unq_hashes <- vapply(x, function(object) digest(object, algo=algo),
FUN.VALUE="", USE.NAMES=TRUE)
  unname(unq_hashes[x])
}
dd.2$codex <- anonymize(dd.2$code)
dd.2
table(duplicated(dd.2$codex))

Thanks.

--Chris Ryan
Broome County Health Department



Re: [R] detecting any element in a vector of strings, appearing anywhere in any of several character variables in a dataframe

2015-07-13 Thread Christopher W Ryan
Interesting thoughts about the partial-word matches, and speed.  On
another real data set, about 73,000 records and 6 columns to search
through for matches (one column of which contains very long character
strings--several paragraphs each), I ran both John's and Bert's
solutions.  John's was noticeably slower, although still quite
tolerable.  There were a different number of matches, though:

       oic.2
oic     FALSE  TRUE   Sum
  FALSE 74939     0 74939
  TRUE    274   927  1201
  Sum   75213   927 76140

where oic is the logical vector generated by John's solution, and
oic.2 is the logical vector generated by Bert's solution. Bert's
solution detected about 77% of the cases detected by John's.

I'm still exploring why that might be. One possible explanation, for
at least part of the difference, is the issue of partial-word matches.
Substantively, I am searching ambulance run records for words related
to opioid overdose, and I've noticed that the medics often spell
"heroin" as "heroine".  So in this context, I like partial-word
matches--I want to pick up records that (partially) match "heroin"
because it is contained in the word "heroine".
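
The difference between the two kinds of matching can be seen on a few made-up strings. In this sketch the `notes` vector and the two keywords are hypothetical; the patterns follow the grepl() approaches discussed in this thread, with and without the \\b word-break markers.

```r
notes <- c("pt admits to heroin use",
           "possible heroine overdose",
           "barred from entry",
           "red rash on arm")
alarm.words <- c("heroin", "red")

partial_pat <- paste0(alarm.words, collapse = "|")
whole_pat   <- paste0("\\b(", partial_pat, ")\\b")

partial_hit <- grepl(partial_pat, notes)  # keyword anywhere inside a word
whole_hit   <- grepl(whole_pat, notes)    # keyword as a whole word only

data.frame(notes, partial_hit, whole_hit)
```

Partial matching picks up the misspelling "heroine" but also the unrelated "barred"; whole-word matching drops both. Which behaviour is preferable depends on the keyword list.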

There may be other things going on too.

Thanks.

--Chris

On Thu, Jul 9, 2015 at 3:24 PM, John Fox j...@mcmaster.ca wrote:
 Dear Christopher,

 My usual orientation to this kind of one-off problem is that I'm looking for 
 a simple correct solution. Computing time is usually much smaller than 
 programming time.

 That said, Bert Gunter's solution was about 5 times faster in a simple check 
 that I ran with microbenchmark, and Jeff Newmiller's solution was about 10 
 times faster. Both Bert's and Jeff's (eventual) solution protect against 
 partial (rather than full-word) matches, while mine doesn't (though it could 
 easily be modified to do that).

 Best,
  John

 -Original Message-
 From: Christopher W Ryan [mailto:cr...@binghamton.edu]
 Sent: July-09-15 2:49 PM
 To: Bert Gunter
 Cc: Jeff Newmiller; R Help; John Fox
 Subject: Re: [R] detecting any element in a vector of strings, appearing
 anywhere in any of several character variables in a dataframe

 Thanks everyone.  John's original solution worked great.  And with
 27,000 records, 65 alarm.words, and 6 columns to search, it takes only
 about 15 seconds.  That is certainly adequate for my needs.  But I
 will try out the other strategies too.

 And thanks also for lots of new R things to learn--grep, grepl,
 do.call . . . that's always a bonus!

 --Chris Ryan

 On Thu, Jul 9, 2015 at 1:52 PM, Bert Gunter bgunter.4...@gmail.com
 wrote:
  Yup, that does it. Let grep figure out what's a word rather than doing
  it manually. Forgot about \b
 
  Cheers,
  Bert
 
 
  Bert Gunter
 
  Data is not information. Information is not knowledge. And knowledge
  is certainly not wisdom.
 -- Clifford Stoll
 
 
  On Thu, Jul 9, 2015 at 10:30 AM, Jeff Newmiller
  jdnew...@dcn.davis.ca.us wrote:
  Just add a word break marker before and after:
 
  zz$v5 <- grepl( paste0( "\\b(", paste0( alarm.words, collapse="|" ), ")\\b" ),
                  do.call( paste, zz[ , 2:3 ] ) )
 
  On July 9, 2015 10:12:23 AM PDT, Bert Gunter bgunter.4...@gmail.com
 wrote:
 Jeff:
 
 Well, it would be much better (no loops!) except, I think, for one
 issue: "red" would match "barred" and I don't think that this is what
 is wanted: the matches should be on whole words not just string
 patterns.
 
 So you would need to fix up the matching pattern to make this work,
 but it may be a little tricky, as arbitrary whitespace characters,
 e.g. " " or "\n" etc. could be in the strings to be matched separating
 the words or ending the sentence.  I'm sure it can be done, but I'll
 leave it to you or others to figure it out.
 
 Of course, if my diagnosis is wrong or silly, please point this out.
 
 Cheers,
 Bert
 
 
 Bert Gunter
 
 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
-- Clifford Stoll
 
 
 On Thu, Jul 9, 2015 at 9:34 AM, Jeff Newmiller
 jdnew...@dcn.davis.ca.us wrote:
  I think grep is better suited to this:
 
  zz$v5 <- grepl( paste0( alarm.words, collapse="|" ),
                  do.call( paste, zz[ , 2:3 ] ) )
 

Re: [R] detecting any element in a vector of strings, appearing anywhere in any of several character variables in a dataframe

2015-07-10 Thread Christopher W Ryan
Thanks everyone.  John's original solution worked great.  And with
27,000 records, 65 alarm.words, and 6 columns to search, it takes only
about 15 seconds.  That is certainly adequate for my needs.  But I
will try out the other strategies too.

And thanks also for lots of new R things to learn--grep, grepl,
do.call . . . that's always a bonus!

--Chris Ryan

On Thu, Jul 9, 2015 at 1:52 PM, Bert Gunter bgunter.4...@gmail.com wrote:
 Yup, that does it. Let grep figure out what's a word rather than doing
 it manually. Forgot about \b

 Cheers,
 Bert


 Bert Gunter

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
-- Clifford Stoll


 On Thu, Jul 9, 2015 at 10:30 AM, Jeff Newmiller
 jdnew...@dcn.davis.ca.us wrote:
 Just add a word break marker before and after:

 zz$v5 <- grepl( paste0( "\\b(", paste0( alarm.words, collapse="|" ), ")\\b" ),
                 do.call( paste, zz[ , 2:3 ] ) )

 On July 9, 2015 10:12:23 AM PDT, Bert Gunter bgunter.4...@gmail.com wrote:
Jeff:

Well, it would be much better (no loops!) except, I think, for one
issue: "red" would match "barred" and I don't think that this is what
is wanted: the matches should be on whole words not just string
patterns.

So you would need to fix up the matching pattern to make this work,
but it may be a little tricky, as arbitrary whitespace characters,
e.g. " " or "\n" etc. could be in the strings to be matched separating
the words or ending the sentence.  I'm sure it can be done, but I'll
leave it to you or others to figure it out.

Of course, if my diagnosis is wrong or silly, please point this out.

Cheers,
Bert


Bert Gunter

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
   -- Clifford Stoll


On Thu, Jul 9, 2015 at 9:34 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:
 I think grep is better suited to this:

 zz$v5 <- grepl( paste0( alarm.words, collapse="|" ),
                 do.call( paste, zz[ , 2:3 ] ) )


 On July 9, 2015 8:51:10 AM PDT, Bert Gunter bgunter.4...@gmail.com
wrote:
Here's a way to do it that uses %in% (i.e. match() ) and uses only a
single, not a double, loop. It should be more efficient.

> sapply(strsplit(do.call(paste,zz[,2:3]),"[[:space:]]+"),
+   function(x)any(x %in% alarm.words))

 [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE

The idea is to paste the strings in each row (do.call allows an
arbitrary number of columns) into a single string and then use
strsplit to break the string into individual words on whitespace.
Then the matching is vectorized with the any( %in% ... ) call.

Cheers,
Bert
Bert Gunter

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
   -- Clifford Stoll


On Thu, Jul 9, 2015 at 6:05 AM, John Fox j...@mcmaster.ca wrote:
 Dear Chris,

 If I understand correctly what you want, how about the following?

 rows <- apply(zz[, 2:3], 1, function(x) any(sapply(alarm.words, grepl, x=x)))
 zz[rows, ]

           v1                              v2                v3 v4
 3  -1.022329                    green turtle    ronald weasley  2
 6   0.336599              waffle the hamster        red sparks  1
 9  -1.631874 yellow giraffe with a long neck gandalf the white  1
 10  1.130622                      black bear  gandalf the grey  2

 I hope this helps,
  John

 
 John Fox, Professor
 McMaster University
 Hamilton, Ontario, Canada
 http://socserv.mcmaster.ca/jfox/


 On Wed, 08 Jul 2015 22:23:37 -0400
  Christopher W. Ryan cr...@binghamton.edu wrote:
 Running R 3.1.1 on windows 7

 I want to identify as a case any record in a dataframe that contains any
 of several keywords in any of several variables.

 Example:

 # create a dataframe with 4 variables and 10 records
 v2 - c(white bird, blue bird, green turtle, quick brown

[R] detecting any element in a vector of strings, appearing anywhere in any of several character variables in a dataframe

2015-07-09 Thread Christopher W. Ryan
Running R 3.1.1 on windows 7

I want to identify as a case any record in a dataframe that contains any
of several keywords in any of several variables.

Example:

# create a dataframe with 4 variables and 10 records
v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
        "big black dog", "waffle the hamster", "benny likes food a lot",
        "hello world", "yellow giraffe with a long neck", "black bear")
v3 <- c("harry potter", "hermione grainger", "ronald weasley",
        "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
        "white dress robes", "gandalf the white", "gandalf the grey")
zz <- data.frame(v1=rnorm(10), v2=v2, v3=v3, v4=rpois(10, lambda=2),
                 stringsAsFactors=FALSE)
str(zz)
zz

# here are the keywords
alarm.words <- c("red", "green", "turtle", "gandalf")

# For each row/record, I want to test whether the string in v2 or the
# string in v3 contains any of the strings in alarm.words. And then if so,
# set zz$v5=TRUE for that record.

# I'm thinking the str_detect function in the stringr package ought to
# be able to help, perhaps with some use of apply over the rows, but I
# obviously misunderstand something about how str_detect works

library(stringr)

str_detect(zz[,2:3], alarm.words)# error: the target of the search
 # must be a vector, not multiple
 # columns

str_detect(zz[1:4,2:3], alarm.words) # same error

str_detect(zz[,2], alarm.words)  # error, length of alarm.words
 # is less than the number of
 # rows I am using for the
 # comparison

str_detect(zz[1:4,2], alarm.words)   # works as hoped when
length(alarm.words)  # confining nrows
 # to the length of alarm.words

str_detect(zz, alarm.words)  # obviously not right

# maybe I need apply() ?
my.f <- function(x){str_detect(x, alarm.words)}

apply(zz[,2], 1, my.f) # again, a mismatch in lengths
   # between alarm.words and that
   # in which I am searching for
   # matching strings

apply(zz, 2, my.f) # now I'm getting somewhere
apply(zz[1:4,], 2, my.f)   # but still only works with 4
   # rows of the dataframe


# perhaps %in% could do the job?

Appreciate any advice.

--Chris Ryan



[R] detecting any element in a vector of strings, appearing anywhere in any of several character variables in a dataframe

2015-07-09 Thread Christopher W Ryan
Running R 3.1.1 on windows 7

I want to identify as a case any record in a dataframe that contains any
of several keywords in any of several variables.

Example:

# create a dataframe with 4 variables and 10 records
v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
        "big black dog", "waffle the hamster", "benny likes food a lot",
        "hello world", "yellow giraffe with a long neck", "black bear")
v3 <- c("harry potter", "hermione grainger", "ronald weasley",
        "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
        "white dress robes", "gandalf the white", "gandalf the grey")
zz <- data.frame(v1=rnorm(10), v2=v2, v3=v3, v4=rpois(10, lambda=2),
                 stringsAsFactors=FALSE)
str(zz)
zz

# here are the keywords
alarm.words <- c("red", "green", "turtle", "gandalf")

# For each row/record, I want to test whether the string in v2 or the
# string in v3 contains any of the strings in alarm.words. And then if so,
# set zz$v5=TRUE for that record.

# I'm thinking the str_detect function in the stringr package ought to
# be able to help, perhaps with some use of apply over the rows, but I
# obviously misunderstand something about how str_detect works

library(stringr)

str_detect(zz[,2:3], alarm.words)# error: the target of the search
 # must be a vector, not multiple
 # columns

str_detect(zz[1:4,2:3], alarm.words) # same error

str_detect(zz[,2], alarm.words)  # error, length of alarm.words
 # is less than the number of
 # rows I am using for the
 # comparison

str_detect(zz[1:4,2], alarm.words)   # works as hoped when
length(alarm.words)  # confining nrows
 # to the length of alarm.words

str_detect(zz, alarm.words)  # obviously not right

# maybe I need apply() ?
my.f <- function(x){str_detect(x, alarm.words)}

apply(zz[,2], 1, my.f) # again, a mismatch in lengths
   # between alarm.words and that
   # in which I am searching for
   # matching strings

apply(zz, 2, my.f) # now I'm getting somewhere
apply(zz[1:4,], 2, my.f)   # but still only works with 4
   # rows of the dataframe

Appreciate any advice.

-- Chris Ryan



Re: [R] plotting the one-dimensional density of events in time

2015-03-09 Thread Christopher W. Ryan
Tom and Bill--

Thanks! Both excellent solutions.

--Chris

Christopher W. Ryan, MD, MS
cryanatbinghamtondotedu

Early success is a terrible teacher. You’re essentially being rewarded
for a lack of preparation, so when you find yourself in a situation
where you must prepare, you can’t do it. You don’t know how.
--Chris Hadfield, An Astronaut's Guide to Life on Earth

William Dunlap wrote:
 You could change the x component of density's output back into a Date object
 and let plot choose a Date axis in its usual way.  E.g.,
    den <- density(as.numeric(dd))
    den$x <- as.Date(den$x, origin=as.Date("1970-01-01"))
    plot(den$x, den$y)
 (You probably will also want to normalize the y component to be on a
 specific per time unit, say day or year, basis.)
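
A possible end-to-end sketch of this suggestion follows. It rescales the density to expected events per day (density times the number of events), labels the start of each month with axis.Date(), and adds the rug. The simulated dates, the seed, and the object names `events_per_day` and `den_dates` are assumptions for illustration.

```r
set.seed(42)
# 100 simulated event dates in 2014 (hypothetical data)
dd <- as.Date("2014-01-01") + sample(0:364, 100, replace = TRUE)

den <- density(as.numeric(dd))           # x is in days since 1970-01-01
events_per_day <- den$y * length(dd)     # density -> events per day
den_dates <- as.Date(den$x, origin = "1970-01-01")

# Draw the curve without the default x axis, then label month starts
plot(den_dates, events_per_day, type = "l", xaxt = "n",
     xlab = "2014", ylab = "Events per day")
axis.Date(1, at = seq(as.Date("2014-01-01"), as.Date("2015-01-01"),
                      by = "month"), format = "%b")
rug(as.numeric(dd))                      # individual event dates
```

Multiplying `events_per_day` by 365 would put the curve on a per-year basis instead.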
 
 
 
 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com http://tibco.com
 
 On Fri, Mar 6, 2015 at 1:03 PM, Christopher W. Ryan
 cr...@binghamton.edu wrote:
 
 I have the dates of occurrence of a repetitive event. I would like to
 plot the density of these events, as well as their specific temporal
 location.  This is as far as I have gotten:
 
 # generate some sample data: dates in 2014
 random.dates <- sample(1:31, 100, replace=TRUE)
 random.months <- sample(1:12, 100, replace=TRUE)
 dd <- as.Date(as.character((paste(random.dates, random.months, 2014,
 sep="-"))), format="%d-%m-%Y")
 dd <- dd[!is.na(dd)]
 
 # plot density with a rug.
 density(as.numeric(dd))
 plot(density(as.numeric(dd)))
 rug(as.numeric(dd))
 
 # But horizontal axis label is not very informative
 # would prefer labeling the start of each month
 plot(density(as.numeric(dd)), axes=FALSE)
 library(zoo)
 new.axis <- as.yearmon(dd)
 
 # but then what? This is where I get stuck--adding back a sensible axis
 
 Grateful for any guidance.
 
 Thanks.
 
 --Chris
 --
 Christopher W. Ryan, MD, MS
 cryanatbinghamtondotedu
 
 Early success is a terrible teacher. You’re essentially being rewarded
 for a lack of preparation, so when you find yourself in a situation
 where you must prepare, you can’t do it. You don’t know how.
 --Chris Hadfield, An Astronaut's Guide to Life on Earth
 
 
 
 



[R] plotting the one-dimensional density of events in time

2015-03-06 Thread Christopher W. Ryan
I have the dates of occurrence of a repetitive event. I would like to
plot the density of these events, as well as their specific temporal
location.  This is as far as I have gotten:

# generate some sample data: dates in 2014
random.dates <- sample(1:31, 100, replace=TRUE)
random.months <- sample(1:12, 100, replace=TRUE)
dd <- as.Date(as.character((paste(random.dates, random.months, 2014,
sep="-"))), format="%d-%m-%Y")
dd <- dd[!is.na(dd)]

# plot density with a rug.
density(as.numeric(dd))
plot(density(as.numeric(dd)))
rug(as.numeric(dd))

# But horizontal axis label is not very informative
# would prefer labeling the start of each month
plot(density(as.numeric(dd)), axes=FALSE)
library(zoo)
new.axis <- as.yearmon(dd)

# but then what? This is where I get stuck--adding back a sensible axis

Grateful for any guidance.

Thanks.

--Chris
-- 
Christopher W. Ryan, MD, MS
cryanatbinghamtondotedu

Early success is a terrible teacher. You’re essentially being rewarded
for a lack of preparation, so when you find yourself in a situation
where you must prepare, you can’t do it. You don’t know how.
--Chris Hadfield, An Astronaut's Guide to Life on Earth
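
A possible way to add the axis back, sketched with base graphics only: suppress the default x axis and draw one with ticks at the first of each month. The name month.starts, the axis label, and the plot title are illustrative, and set.seed() is added only to make the sample reproducible.

```r
set.seed(1)

# sample event dates in 2014; impossible dates such as 31 February become NA
random.dates <- sample(1:31, 100, replace = TRUE)
random.months <- sample(1:12, 100, replace = TRUE)
dd <- as.Date(paste(random.dates, random.months, 2014, sep = "-"),
              format = "%d-%m-%Y")
dd <- dd[!is.na(dd)]

den <- density(as.numeric(dd))

# the first day of each month, January 2014 through January 2015
month.starts <- seq(as.Date("2014-01-01"), as.Date("2015-01-01"), by = "month")

# draw the density without an x axis, then add month labels and the rug
plot(den, xaxt = "n", xlab = "Date", main = "Density of events, 2014")
axis(1, at = as.numeric(month.starts), labels = format(month.starts, "%b"))
rug(as.numeric(dd))
```

The density is still computed on the numeric day count, so the y axis is in events per day (as a proportion of all events).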



Re: [R] Best Beginner Books?

2014-10-01 Thread Christopher W Ryan
I would recommend these for the absolute beginner with R:

A Beginner's Guide to R by Zuur
and
Data Manipulation with R by Spector

I have not seen this, but if their pattern holds, this one coming out
from Highland Statistics will also probably be useful for a newcomer:

A Beginner's Guide to Data Exploration and Visualization with R
Ieno EN, Zuur AF
Paperback available from November 2014
Hardcover available from January 2015

--Chris Ryan

On Thu, Oct 2, 2014 at 12:48 AM, Jason Eyerly teamtraders3...@gmail.com wrote:
 Hey Folks,
 I’m hoping to get a general consensus on a good book for someone with no 
 prior experience in R who is new to data science and statistical analysis. 
 So far, I’ve been recommended to read “Software For Data Analysis: 
 Programming With R” (Statistics And Computing) by John Chambers. I’ve seen 
 some other books mentioned here and there in the mailings, but I can’t recall 
 their names. Does anyone have any thoughts on this book, or others?

 Best Regards,
 Jason Eyerly



[R] package loadable in R 3.1.1 Rterm but not in emacs/ESS

2014-09-26 Thread Christopher W Ryan
I'm running R on Windows 7. Clean install on a brand new computer
yesterday.  I installed Protext then R then Vincent Goulet's emacs
with ESS, in that order.

I then installed some R packages, in the R terminal window.  Among them was car.

Today I opened emacs, hit M-x R to start an R session, and tried to
load the car library. Results looked like the following:


R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

  options(chmhelp=FALSE, help_type="text")
 options(STERM='iESS', str.dendrogram.last="'", editor='emacsclient.exe', 
 show.error.locations=TRUE)

 library(car)
Error in library(car) : there is no package called 'car'




However, in an R terminal (that is, outside of emacs) the car package
loads fine:

R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

 library(car)


I have not encountered this before and am confused. Why would R 3.1.1
in its terminal see a library, whereas it would not in emacs?  (car
is just an example; the same thing happens with zoo, stringr, Hmisc,
and others.)

Thanks.

--Chris Ryan
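
A first diagnostic worth trying, on the assumption (not confirmed in this post) that the two sessions search different library directories: run the lines below in both the plain R terminal and the R session inside emacs, and compare the output. If the library paths or the HOME / R_USER values differ, that would explain why a package installed from one session is invisible in the other.

```r
# where this session looks for installed packages
lib.paths <- .libPaths()
print(lib.paths)

# environment variables that commonly influence the user library location
print(Sys.getenv(c("HOME", "R_USER", "R_LIBS_USER")))

# which of those library directories actually contain the package;
# "car" is only the example from the post
pkg <- "car"
candidates <- file.path(lib.paths, pkg)
print(candidates[file.exists(candidates)])
```

An empty result from the last line in one session but not the other would point to the library path rather than to the package itself.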



Re: [R] empty density plot for point pattern in spatstat

2014-07-02 Thread Christopher W Ryan
Thanks, this solves my problem.

On R 3.1.0 on Windows XP:

library(spatstat)  # version 1.37-0
data(redwood)
dens <- density(redwood)
plot(dens, useRaster=FALSE, ribargs=list(useRaster=FALSE))

# produces the proper plot

--Chris Ryan
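
A related check, offered as a guess rather than something established in this thread: since the plots only appear with useRaster=FALSE, it may be informative to ask the active graphics device whether it claims to support raster images at all. dev.capabilities() in grDevices reports this; the variable name caps is illustrative.

```r
# open the default device if none is open yet (device 1 is the null device)
if (dev.cur() == 1) dev.new()

# ask the current device about raster image support
caps <- dev.capabilities("rasterImage")

# "yes", "non-missing", or "no"; NA if the device does not say
print(caps$rasterImage)
```

Comparing this value between the problematic machine and one that plots correctly might narrow down whether the device or the display driver is at fault.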

On Fri, Jun 20, 2014 at 5:24 PM, Pablo Ramón para...@utpl.edu.ec wrote:
 Hi Chris,

 Try the form: plot(dens, useRaster=FALSE, ribargs=list(useRaster=FALSE))



 Pablo


 2014-06-20 15:58 GMT-05:00 Christopher W Ryan cr...@binghamton.edu:

 I'm back in the office with the machine that was giving me trouble.

 # fresh start-up of R 3.1.0, installed on my machine's hard drive,
 #under Windows XP Service Pack 3.
 # spatstat version 1.37-0

 library(spatstat)
 data(redwood)
 dens <- density(redwood)
 str(dens) # everything looks to be in order
 plot(dens)  # empty plot with empty ribbon on right side
 dev.cur() # yields  null device
   #1
   # I think I closed the plot window
   # before issuing this command

 plot(dens, col=grey(seq(0,1,length=32)))
 # yields an empty density plot with empty ribbon

 plot(dens, useRaster=FALSE)
 # yields a proper density plot in blue/green/yellow,
 # with an empty ribbon

 plot(dens, useRaster=FALSE, ribargs=list(useRaster=TRUE))
 # yields a proper density plot in blue/green/yellow,
 # with an empty ribbon

 dev.cur()
 # yields windows
 #   2 

 example(plot.im)
 # yields a series of 8 plots.
 #All but one of them is empty--both plot area and ribbon
 # the second one has a color ramp in the ribbon
 # along the bottom, but again no plot

 with(bei.extra,plot(elev))
 # yields an empty plot with empty ribbon

 with(bei.extra,plot(elev, useRaster=FALSE))
 with(bei.extra,plot(elev, useRaster=FALSE,  ribargs=list(useRaster=TRUE)))
 # these both yield the same result:
 # a colored density plot with an empty ribbon

 Thanks

 --Chris Ryan

 On Thu, Jun 19, 2014 at 8:31 PM, Adrian Baddeley
 adrian.badde...@uwa.edu.au wrote:
  Dear Pablo,
 
  Yes, effectively, using useRaster = FALSE, the plot is printed.
 
  Very good!
 
  Only the ribbon appears without color.
 
  Try setting ribargs=list(useRaster=TRUE).
 
  I'm using R version 3.0.2, Windows 7 system, and spatstat 1.33-0
  package version.
 
  Things may improve if you upgrade to the current versions, R 3.1.0 and
  spatstat 1.37-0.
 
  A
 
  Prof Adrian Baddeley FAA
  University of Western Australia




 --
 Mat. Pablo Ramón
 Sección de Ecología
 Departamento de Ciencias Naturales
 Universidad Técnica Particular de Loja (http://www.utpl.edu.ec/)
 Telf. 593 7 2570275 Ext. 2505, 2258



Re: [R] empty density plot for point pattern in spatstat

2014-06-20 Thread Christopher W Ryan
I'm back in the office with the machine that was giving me trouble.

# fresh start-up of R 3.1.0, installed on my machine's hard drive,
#under Windows XP Service Pack 3.
# spatstat version 1.37-0

library(spatstat)
data(redwood)
dens <- density(redwood)
str(dens) # everything looks to be in order
plot(dens)  # empty plot with empty ribbon on right side
dev.cur() # yields  null device
  #1
  # I think I closed the plot window
  # before issuing this command

plot(dens, col=grey(seq(0,1,length=32)))
# yields an empty density plot with empty ribbon

plot(dens, useRaster=FALSE)
# yields a proper density plot in blue/green/yellow,
# with an empty ribbon

plot(dens, useRaster=FALSE, ribargs=list(useRaster=TRUE))
# yields a proper density plot in blue/green/yellow,
# with an empty ribbon

dev.cur()
# yields windows
#   2 

example(plot.im)
# yields a series of 8 plots.
#All but one of them is empty--both plot area and ribbon
# the second one has a color ramp in the ribbon
# along the bottom, but again no plot

with(bei.extra,plot(elev))
# yields an empty plot with empty ribbon

with(bei.extra,plot(elev, useRaster=FALSE))
with(bei.extra,plot(elev, useRaster=FALSE,  ribargs=list(useRaster=TRUE)))
# these both yield the same result:
# a colored density plot with an empty ribbon

Thanks

--Chris Ryan

On Thu, Jun 19, 2014 at 8:31 PM, Adrian Baddeley
adrian.badde...@uwa.edu.au wrote:
 Dear Pablo,

 Yes, effectively, using useRaster = FALSE, the plot is printed.

 Very good!

 Only the ribbon appears without color.

 Try setting ribargs=list(useRaster=TRUE).

 I'm using R version 3.0.2, Windows 7 system, and spatstat 1.33-0 package 
 version.

 Things may improve if you upgrade to the current versions, R 3.1.0 and 
 spatstat 1.37-0.

 A

 Prof Adrian Baddeley FAA
 University of Western Australia



Re: [R] empty density plot for point pattern in spatstat

2014-06-18 Thread Christopher W. Ryan
Let me just say that the expertise and support on this List is amazing,
and I'm very grateful. I'm learning a lot from your discussions.

I will not be at my problematic machine again until Friday 20 June. I
will take these steps then, and try to provide as many details about the
system as I can. It is Win XP, now R 3.1.0, and (IIRC) spatstat 1.37-0
'Model Prisoner'.

I do not encounter the problem on Windows 7, R 3.1.0, spatstat 1.36-0
‘Intense Scrutiny’.  Nor on Ubuntu 12.04, R 3.1.0, spatstat 1.36-0.

 Thanks.

--Chris Ryan

Adrian Baddeley wrote:
 Dear Pablo
 
 Please provide details of the system you are using (Windows - version?  R 
 version? spatstat version?)
 
 Also, when you issue the plot command, what graphics device are you using?
 [To find out, type dev.cur() after performing the plot, and read the printout]
 
 Try changing the colour map by running 'example(plot.im)' or 
 setting col=grey(seq(0,1,length=32)) in the call to plot.im
 
 Try setting 'useRaster=FALSE' in the call to plot.im. 
 
 Let me know if any of these work. If not, let's talk on skype
 to figure it out.
 
 regards
 Adrian
 
 
 Prof Adrian Baddeley FAA
 University of Western Australia
 
 From: Pablo Ramón [para...@utpl.edu.ec]
 Sent: Wednesday, 18 June 2014 9:36 PM
 To: Rolf Turner
 Cc: Christopher W Ryan; R-help; Adrian Baddeley
 Subject: Re: [R] empty density plot for point pattern in spatstat
 
 Dear Rolf,
 
 The problem persists with the function plot.im(), or plot 
 (density ()), and similarly with(bei.extra, plot (elev)). This problem occurs 
 only on my personal machine; on another machine this problem does not 
 occur. However, when I use another function, such as kde2d() in the MASS 
 package, and perform the plot, the graph is obtained without problem.
 
 Best wishes
 Pablo
 
 
 
 
 
 2014-06-17 16:23 GMT-05:00 Rolf Turner 
 r.tur...@auckland.ac.nz:
 On 18/06/14 03:49, Christopher W Ryan wrote:
 I upgraded to version 3.1.0 of R, still on Win XP, but that did not
 solve the problem:
 
 library(spatstat)
 data(redwood)
 str(redwood)
 plot(redwood)  # works OK
 plot(Kest(redwood))  # works OK
 plot(density(redwood))  # produces an empty plot
 # with just the figure border
 # and a thin vertical box for the color ramp
 # but no color ramp in it
 
 Any other ideas? I'd be much obliged.
 
 I have had an inquiry of this nature previously and sad to say was completely 
 unable to find a solution.  I made numerous suggestions to the inquirer, e.g.:
 
 * looking for a plot.im() function masking the real
 plot.im() function from spatstat
 
 * cleaning out the global environment
 (.RData) in case there were some artefacts hanging around that had an 
 impact on plot.im()
 
 * completely *removing* .RData and starting again,...
 
 Nothing worked.
 
 It is very difficult to diagnose a problem like this remotely.  If you could 
 find an R guru who was able to come and sit with you at your computer it is 
 ***just possible*** that they could track down what is going wrong.
 
 However it seems to me that the problem is due to some sort of recalcitrance 
 of the Windoze system and is thus probably insoluble.
 
 I am cc-ing this to the person who made the previous inquiry.  Pablo, have 
 there been any further developments since we last communicated about this 
 problem?
 
 cheers,
 
 Rolf Turner
 
 P. S. BTW Chris:  Does plot.im() screw up with other images?  
 E.g. what happens if you do
 
 with(bei.extra,plot(elev))
 
 ???
 
 R. T.
 
 
 
 
 --
 Mat. Pablo Ramón
 Sección de Ecología
 Departamento de Ciencias Naturales
 Universidad Técnica Particular de Loja (http://www.utpl.edu.ec/)
 Telf. 593 7 2570275 Ext. 2505, 2258




Re: [R] empty density plot for point pattern in spatstat

2014-06-17 Thread Christopher W Ryan
I upgraded to version 3.1.0 of R, still on Win XP, but that did not
solve the problem:

library(spatstat)
data(redwood)
str(redwood)
plot(redwood)  # works OK
plot(Kest(redwood))  # works OK
plot(density(redwood))  # produces an empty plot
   # with just the figure border
   # and a thin vertical box for the color ramp
   # but no color ramp in it

Any other ideas? I'd be much obliged.

Thanks.

--Chris Ryan



On Tue, May 20, 2014 at 4:31 PM, Christopher W Ryan
cr...@binghamton.edu wrote:
 I'm running R-3.0.3 on Windows XP Professional on an institutional PC.
 (My freedom to update to a more recent version of R is limited,
 although I might be able to prevail upon the powers that be, if that
 would solve the problem.)

 I installed spatstat via install.packages() and received version 1.37-0.

 Here is some code:

 library(spatstat)
 data(redwood)
 plot(redwood)  # works OK
 plot(Kest(redwood))  # works OK
 plot(density(redwood))  # produces an empty plot as described above

 Any suggestions? What am I missing?

 Thanks.

 --Chris Ryan



[R] empty density plot for point pattern in spatstat

2014-05-20 Thread Christopher W Ryan
I'm running R-3.0.3 on Windows XP Professional on an institutional PC.
(My freedom to update to a more recent version of R is limited,
although I might be able to prevail upon the powers that be, if that
would solve the problem.)

I installed spatstat via install.packages() and received version 1.37-0.

Here is some code:

library(spatstat)
data(redwood)
plot(redwood)  # works OK
plot(Kest(redwood))  # works OK
plot(density(redwood))  # produces an empty plot as described above

Any suggestions? What am I missing?

Thanks.

--Chris Ryan



Re: [R] demonstrating R in introductory class using point-and-click software

2014-01-17 Thread Christopher W. Ryan
I've never taught a complete course, but I recently conducted 2
introduction to R workshops, each about 5 hours long, for a class of
about 15 high school science students. Very basic. We emphasized
graphics. But by the end, we had gotten into conceptual stuff about the
population vs the sample, sampling variation, and the distribution of
test statistics. They loved it, and although I am of course biased, I
think it went well. We may do more.

I used base R, no GUI, on purpose. I wanted to convey to them the
advantages (and ease) of writing, saving, and sharing code. Literate
programming and reproducible research and all that.

I have everything we did in an emacs org file. I'd be happy to share it
with you if you'd like.

So maybe don't give up on command line just yet?

I also can't understand how engineering students, no matter where they
are in the course of their training, could be averse to writing code.

--Chris

Christopher W. Ryan, MD, MS
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu

Once we recognize that we do not err out of laziness, stupidity, or
evil intent, we can liberate ourselves from the impossible burden of
trying to be permanently right. We can take seriously the proposition
that we could be in error, without deeming ourselves idiotic or
unworthy. [Kathryn Schulz, in Being Wrong: Adventures in the Margin of Error]


Ranjan Maitra wrote:
 Dear friends,
 
 OK, I did not think that it would ever come down to this, but I am
 here with a question on what would be the best point-and-click approach
 to using R in the classroom in a way that the students can also follow
 and exhibit (on their own). 
 
 So let me explain: I am teaching an introductory-level statistics class
 for introductory first- and second-year civil and industrial
 engineering students. This is a basic class following the book (not
 important): Basic Engineering Data Collection and Analysis by Stephen B.
 Vardeman and John Marcus. The class is very basic, and has
 traditionally relied on JMP and Excel (less prevalent) to illustrate
 data examples. I don't want to use either because I am a proponent of
 OSS, and also because I find these two too cumbersome to handle. Also,
 I don't think I have the time (and the students do not have the
 inclination, I am told) to handle even basic interactive programming.
 So, I was wondering if people with more experience would have
 suggestions on what would be best to use.
 
 I apologize if this has been discussed quite a bit here, but as I said
 before, I did not think that it would come to this, so I basically did
 not pay much attention.
 
 Thanks very much for suggestions and experiences!
 Best wishes,
 Ranjan
 
 




Re: [R] Hello R user!

2013-12-17 Thread Christopher W Ryan
What do you mean by merge them into one?  Make both graphs appear on
the same page of a document? Make a single figure containing both
graphs?  Plot data from both dataframes on the same set of axes?

--Chris Ryan

On Tue, Dec 17, 2013 at 12:33 PM, bibek sharma mbhpat...@gmail.com wrote:
 Hello R user,

 I have created two plots (attached!) using the codes below
 and would like to merge these figures in one. any suggestions are highly
 appreciated!
 Thanks,

 plot(graph1$yod,graph1$xod,data=graph1)
 dfx = data.frame(ev1=graph1$xod, ev2=graph1$yod, ev3=abs(graph1$dif))
 symbols(x=dfx$ev1, y=dfx$ev2, circles=dfx$ev3,inches=1/8, ann=F,
 bg="black", fg=NULL,xlim=c(-35,35),ylim=c(-35,35))
 abline(h=0,v=0)

 plot(graph2$yod,graph2$xod,data=graph2)
 dfx = data.frame(ev1=graph2$xod, ev2=graph2$yod, ev3=abs(graph2$dif))
 lines(symbols(x=dfx$ev1, y=dfx$ev2, circles=dfx$ev3,inches=1/8, ann=F,
 bg="blue", fg=NULL,xlim=c(-35,35),ylim=c(-35,35)))
 abline(h=0,v=0)

 Best,
 Bibek





[R] a better method than a long expression with many OR clauses

2013-12-17 Thread Christopher W Ryan
dd <- data.frame(longVariableName1=sample(1:4, 10, replace=TRUE),
longVariableName2=sample(1:4, 10, replace=TRUE))
dd
# define who is a case and who is not
transform(dd, case=(longVariableName1==3 | longVariableName2==3))

But in reality I have 9 of those longVariableName variables,
all of this pattern: alphaCauseX, where X is an integer 1:9.
For any given observation, if any of them == 3, then case=TRUE
Is there a shorter or more elegant way of doing this than
typing out that long string of 9 OR clauses?

I read about any(), but couldn't quite make that do what I want. Maybe
I was using it wrong.

Thanks.

--Chris Ryan
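
One shorter route, sketched here under the assumption that the nine columns really are named alphaCause1 through alphaCause9: compare the whole block of columns to 3 at once, then ask row by row whether any comparison came out TRUE. The simulated data and the names cause.cols and case2 are illustrative.

```r
set.seed(42)

# simulated stand-in: nine columns named alphaCause1 ... alphaCause9
dd <- as.data.frame(matrix(sample(1:4, 90, replace = TRUE), ncol = 9))
names(dd) <- paste0("alphaCause", 1:9)

# pick out the nine columns by name pattern
cause.cols <- grep("^alphaCause[1-9]$", names(dd), value = TRUE)

# dd[, cause.cols] == 3 is a logical matrix; a row is a case if it holds any TRUE
dd$case <- unname(rowSums(dd[, cause.cols] == 3, na.rm = TRUE) > 0)

# the same idea spelled out with any(), applied to each row in turn
case2 <- unname(apply(dd[, cause.cols] == 3, 1, any))

print(dd)
```

any() on its own collapses everything to a single TRUE or FALSE, which may be why it did not seem to work; it has to be applied one row at a time. Note that the rowSums() version treats a missing value as "not 3", whereas the apply() version would return NA for a row with missing values and no 3.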



Re: [R] Hello R user!

2013-12-17 Thread Christopher W Ryan
Here is a simple example (without the proportional size
bubbles--you've been given some references on that) using the lattice
package:

# one dataframe holds the data from both "sources", as I call them.
# they would be data from your two separate dataframes,
# that you call graph1 and graph2
dd <- data.frame(x=rnorm(10), y=rnorm(10), source=sample(c("A","B"),
10, replace=TRUE))
dd
library(lattice)
xyplot(y~x | source, data=dd)
# or another way
xyplot(y~x, groups=source, data=dd,  auto.key=TRUE)

You'll want to think about how you are storing your data. Certain ways
of doing it lend themselves to certain ways of graphing. Some ways
make things difficult . . .

--Chris Ryan
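
The lattice example leaves out the proportional bubbles. A base-graphics sketch that keeps them is below, assuming the two dataframes share the columns xod, yod and dif used in the original question. The simulated values and the src column are hypothetical. Stacking the data and calling symbols() once means both sets of circles are scaled against the same maximum, so their sizes stay comparable.

```r
set.seed(1)

# hypothetical stand-ins for graph1 and graph2
graph1 <- data.frame(xod = runif(10, -30, 30), yod = runif(10, -30, 30),
                     dif = rnorm(10))
graph2 <- data.frame(xod = runif(10, -30, 30), yod = runif(10, -30, 30),
                     dif = rnorm(10))

# stack the two dataframes, remembering where each row came from
graph1$src <- "graph1"
graph2$src <- "graph2"
both <- rbind(graph1, graph2)

# one call draws every bubble; fill colour distinguishes the two sources
symbols(x = both$xod, y = both$yod, circles = abs(both$dif), inches = 1/8,
        bg = ifelse(both$src == "graph1", "black", "blue"),
        xlim = c(-35, 35), ylim = c(-35, 35), xlab = "xod", ylab = "yod")
abline(h = 0, v = 0)
legend("topright", legend = c("graph1", "graph2"),
       pch = 21, pt.bg = c("black", "blue"))
```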

On Tue, Dec 17, 2013 at 1:24 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 On Tue, Dec 17, 2013 at 1:04 PM, bibek sharma mbhpat...@gmail.com wrote:
 Hi Sarah,
 It is not about mfrow or mfcol.  I would like to see both sets of data in
 one figure.
 All I want was combining these two plots  to one.
 Any suggestions?
 Bibek

 Suggestions? Yes. Read the link I and others provided about
 reproducible questions.

 Then there's the suggestion I already provided, using points() or
 lines() to add more data, possibly with xlim or ylim specified (see
 ?par for details). Without a reproducible example, I can't give
 specific details.

 Also, size of the circle in the plots represents  rates and so should be
 shown in different sizes.
 I tried using plots and points but this did  not give me different sizes.

 If you want to use base graphics, then cex is what you need (see, you
 guessed it, ?par). You can pass a vector of sizes for your plotting
 character.

 For more sophisticated approaches, you might google bubble plot R for ideas.

 Sarah



 On Tue, Dec 17, 2013 at 9:45 AM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

 What do you mean by merge these figures in one? If you want two
 figures on one page, see ?par - specifically mfrow and mfcol.

 If you want both sets of data in one figure, maybe ?points or ?lines
 though I see you're already familiar with at least ?lines.

 The list doesn't take most attachments, and you might also take a look at:

 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

 Asking intelligible questions is the best strategy for receiving
 intelligible answers.

 Sarah

 On Tue, Dec 17, 2013 at 12:33 PM, bibek sharma mbhpat...@gmail.com
 wrote:
  Hello R user,
 
  I have created two plots (attached!) using the codes below
  and would like to merge these figures in one. any suggestions are highly
  appreciated!
  Thanks,
 
  plot(graph1$yod,graph1$xod,data=graph1)
  dfx = data.frame(ev1=graph1$xod, ev2=graph1$yod, ev3=abs(graph1$dif))
  symbols(x=dfx$ev1, y=dfx$ev2, circles=dfx$ev3,inches=1/8, ann=F,
  bg="black", fg=NULL,xlim=c(-35,35),ylim=c(-35,35))
  abline(h=0,v=0)
 
  plot(graph2$yod,graph2$xod,data=graph2)
  dfx = data.frame(ev1=graph2$xod, ev2=graph2$yod, ev3=abs(graph2$dif))
  lines(symbols(x=dfx$ev1, y=dfx$ev2, circles=dfx$ev3,inches=1/8, ann=F,
  bg="blue", fg=NULL,xlim=c(-35,35),ylim=c(-35,35)))
  abline(h=0,v=0)
 
  Best,
  Bibek





 --
 Sarah Goslee
 http://www.functionaldiversity.org




Re: [R] R for a stats intro for undergrads in the US?

2013-11-17 Thread Christopher W. Ryan
I would recommend it. I have no experience teaching statistics to
psychology students, but I have done a sequence of hands-on workshops
introducing R to a class of high school students who were engaged in a
three-year-long science research class. My presentations were not
discipline-specific, and we have just barely gotten into any real
statistical concepts so far. Mainly it was the nuts and bolts of how to
use base R; the advantages of writing and saving code over a
point-and-click interface, reproducible research and all; and a lot of
graphics. End of last session we just started to tackle the concepts of
sample versus population, and sampling variation.  I could share with
you my org file where I stored all the commands and notes, if it would
be of any use.

--Chris Ryan
SUNY Upstate Medical University
Binghamton, NY

Spencer Graves wrote:
 Hello, All:
 
 
   Would anyone recommend R for an introductory statistics class for
 freshman psychology students in the US?  If yes, might there be any
 notes for such available?
 
 
   I just checked r-project.org and CRAN contributed documentation
 and found nothing.
 
 
   I have a friend who teaches such a class, and wondered if R might
 be suitable.  The alternative is SPSS at $406 per student.
 
 
   Thanks,
   Spencer
 




[R] making a wider, shorter, 4-column table instead of the narrower, longer, 2-column table I get with tm, Hmisc, and Sweave

2013-09-19 Thread Christopher W. Ryan
I think my question sort of straddles two mailing lists, texhax and
r-help. Response so far on texhax has been scant.

Using the tm package, I am tabulating the frequencies of words used by
respondents to several survey questions. I use Sweave and the Hmisc
latex() command to produce the output report. I am running R 2.15.2 and
MiKTeX 2.9 on WinXP.

Here is some Rnw code to replicate the problem:

\documentclass{article}
\usepackage[margin=1in]{geometry}
\usepackage{Sweave}

\begin{document}

<<options, echo=FALSE, results=hide>>=
options(SweaveSyntax=SweaveSyntaxNoweb)
library(tm)
library(Hmisc)
@

<<generatedocuments, echo=FALSE>>=
tags <- sample(1:40, 110, replace=TRUE)
words <- paste("word", tags, sep="")
words2 <- paste(words, tags, sep="")
n <- length(words)
words3 <- paste(words2[1:(n/2)], words2[(n/2 + 1):n], sep=" ")
docs <- data.frame(words3)
# the data.frame part isn't really needed for
# this minimal example, but I include it to
# mimic my operational code
@

<<makecorpora, echo=FALSE, results=hide>>=
docs2 <- Corpus(DataframeSource(docs))
@

<<maketdm, echo=FALSE, results=hide>>=
docs3 <- TermDocumentMatrix(docs2)
@

<<wordfrequencies, echo=FALSE, results=hide>>=
word.freq <- rowSums(inspect(docs3))
@

<<narrowtable, results=tex, echo=FALSE>>=
latex(sort(word.freq, decreasing=TRUE), file="",
caption="Frequencies of words", label="qol")
@

\end{document}

The output consists, of course, of a very long two-column table, with
wide swaths of empty white space on either side. I'd like to use the
page width more efficiently, by making a shorter, wider table of 4
columns (2 pairs of columns), with the rows  flowing from the bottom
of the left-hand pair to the top of the right-hand pair.

I know I could modify the intermediate .tex file by hand, after
R/Sweave/Hmisc generates it, splitting it at some row roughly halfway
down. But is there a way to do this programmatically in the Rnw file?
All I can think of is using two separate chunks to generate two separate
tables, one for elements 1:(length(word.freq)/2) of word.freq and one
for the remaining elements. Is there a better way?
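
One programmatic possibility, sketched under assumptions: build the four-column layout as a data frame in R, placing the second half of the sorted frequencies beside the first half, and hand that single object to latex(). The small word.freq vector and the names half, pad, left, right and wide are hypothetical, and the sketch assumes at least two words.

```r
# hypothetical stand-in for the sorted word.freq vector
word.freq <- sort(c(alpha = 12, beta = 9, gamma = 7, delta = 4, epsilon = 3),
                  decreasing = TRUE)

n <- length(word.freq)
half <- ceiling(n / 2)   # rows in the finished table
pad <- 2 * half - n      # 1 if n is odd, otherwise 0

# left-hand pair of columns: the first half of the list
left <- data.frame(word = names(word.freq)[1:half],
                   freq = as.vector(word.freq[1:half]))

# right-hand pair: the remainder, padded with a blank row if n is odd
right <- data.frame(word2 = c(names(word.freq)[(half + 1):n], rep("", pad)),
                    freq2 = c(as.vector(word.freq[(half + 1):n]), rep(NA, pad)))

wide <- cbind(left, right)
print(wide)

# In the Rnw chunk the result could then go to Hmisc, for example:
# latex(wide, file = "", rowname = NULL,
#       caption = "Frequencies of words", label = "qol")
```

This keeps everything in one chunk and one table, so the caption and label appear only once.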

Thanks.

--Chris
-- 
Christopher W. Ryan, MD, MS
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu

Once we recognize that we do not err out of laziness, stupidity, or
evil intent, we can liberate ourselves from the impossible burden of
trying to be permanently right. We can take seriously the proposition
that we could be in error, without deeming ourselves idiotic or
unworthy. [Kathryn Schulz, in Being Wrong: Adventures in the Margin of Error]



Re: [R] read.dta()

2013-09-05 Thread Christopher W. Ryan
I don't know about 3.0.1, but the 2.15.x that I'm still using requires
the foreign package--that's where the read.dta command resides.

library(foreign)
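
For completeness, a minimal round-trip sketch, assuming the foreign package is installed (it normally ships with R). The file here is a temporary one written just for the illustration, not the poster's abc.dta. Note that read.dta() only understands older Stata file formats, so a file saved by a recent Stata release may need a different reader.

```r
library(foreign)   # read.dta() and write.dta() live in this package

# Write a small Stata file to a temporary location, then read it back
tmp <- tempfile(fileext = ".dta")
write.dta(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5)), tmp)

X <- read.dta(tmp)
str(X)
```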

--Chris Ryan
SUNY Upstate Medical University
Binghamton, NY USA



Debasish Roy wrote:
 I've been using R 3.0.1 version. I tried to read a file named  abc.dta() 
 
 I used the command  X <- read.dta("abc.dta") and it gave me 
 Error: could not find function "read.dta"
 
 Can anyone help me what could be the problem and how to fix it ? 
 
 
 
 Thanks, Deb.
 
 
 


Re: [R] Hist with Dates

2013-01-15 Thread Christopher W Ryan
If all you want is frequencies, could you use table()?

dd <- c("2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02",
"2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02",
"2001-01-03", "2001-01-03", "2001-01-03", "2001-01-03", "2001-01-03",
"2001-01-03", "2001-01-03", "2001-01-03", "2001-01-04", "2001-01-04")
ddd <- as.Date(dd)
str(ddd)
table(ddd)
as.numeric(table(ddd))
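
If days with no observations should also appear in the result (with a count of zero), one option is to tabulate a factor whose levels span the whole date range. A minimal sketch with made-up dates; the names obs and all.days are only for illustration:

```r
# Three made-up observations; note that nothing falls on 2001-01-03
obs <- as.Date(c("2001-01-02", "2001-01-02", "2001-01-04"))

# Every calendar day from the first to the last observation
all.days <- seq(min(obs), max(obs), by = "day")

# Tabulating a factor keeps the empty days as zero counts
counts <- table(factor(as.character(obs), levels = as.character(all.days)))
counts
```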

--Chris Ryan
SUNY Upstate Medical University
Binghamton, NY USA

On Tue, Jan 15, 2013 at 7:45 AM, Dominic Roye dominic.r...@gmail.com wrote:
 Hello R-Helpers,


 I would like to generate the frequencies of dates (data.RData).

 str(data)
  Date[1:43748], format: 2001-01-01 2001-01-01 2001-01-01 2001-01-01
 ...


 test <- hist(data, "days", freq=T)
 test$counts
[1] 27 17 12 15  6  4  5 13  5 13 15  9  7 10 11 21 10 10  9 11 15  7 10 14


 but the count is wrong, because i have only 9 times 01/01/2001 etc.

 data[1:30]
  [1] 2001-01-01 2001-01-01 2001-01-01 2001-01-01 2001-01-01
  [6] 2001-01-01 2001-01-01 2001-01-01 2001-01-01 2001-01-02
 [11] 2001-01-02 2001-01-02 2001-01-02 2001-01-02 2001-01-02
 [16] 2001-01-02 2001-01-02 2001-01-02 2001-01-02 2001-01-02
 [21] 2001-01-03 2001-01-03 2001-01-03 2001-01-03 2001-01-03
 [26] 2001-01-03 2001-01-03 2001-01-03 2001-01-03 2001-01-03



 Now, when i give R the breaks.

 br <- seq(as.Date("31.12.2000", format="%d.%m.%Y"),
           as.Date("01.01.2012", format="%d.%m.%Y"), by=1)
 test2 <- hist(data, br, freq=T)
 test2$counts[1:10]
  [1] 12 15 17 12 15  6  4  5 13  5


 In this case, neither get the right counts but limiting i get the correct
 counts:

 test3 <- hist(data[1:50], br[1:6], freq=T)
 test3$counts
 [1]  9 11 13  9  8


 I hope someone can help me with this problem.


 Thank you very much!!

 Best regards,

 Dominic



[R] follow up on teaching R to high school students

2013-01-02 Thread Christopher W. Ryan
Some of you may recall that a few months ago I solicited advice and
opinions on both R-help and R-sig-teaching about an introduction to R
workshop I intended to present to a class of high school students
enrolled in a 3-year longitudinal science research class.  Much lively
discussion ensued. Well, I did the workshop on 8 November, and thought
I'd give some follow-up as to how it went, in case anyone was interested.

The class consisted of about 20 kids, grades 10-12. Each was at a
computer with R installed. It wasn't clear that the students would be
able to install packages, given the school's network setup, so I
confined my plans to base R. We ran for 5 hours, with about a 40-minute
break for lunch. Their science teacher was present for the entire time,
and the school's IT person attended about half of it.  They were both
very helpful, both in preparation and in execution.

I conducted an (utterly arbitrary and unvalidated) online survey among
the students a couple weeks in advance, to gauge their familiarity with
what I called "technical computing," i.e. anything beyond commonplace
word-processing, spreadsheets, web surfing, and social media. The
questions were:

1. What operating systems do you know how to work in? Check all that apply.
   Windows 19
   Mac OS X 12
   Linux 2
   others 0

2. Do you have a favorite text editor?
   Yes 5
   No 7
   I don't know what a text editor is 7

3. Do you use a two-pane file manager?
   Yes 1
   No 6
   I don't know what a two-pane file manager is 12

4. Have you written programs in any computer language?
   Yes 4
   No 11
   I don't know 4
   (the specific languages cited included Basic, Java, Javascript, Ruby,
 C++, Python, MS-DOS command prompt batch files.)

I had my lesson plan all laid out in an org-mode file, from which I
typed code into an R console projected on the screen. The students
followed my steps initially, and then broadened out to some
experimentation as the day went on. A couple students were quite skilled
at working ahead, while others struggled a bit, but everyone was
eventually able to get the desired results.  They were generally very
engaged, interactive, and enthusiastic. No one left, except for the odd
music lesson here and there.  Overall, we had a lot of fun.

I tried to go pretty slowly. I prepared much more than we had time to
cover. I emphasized graphics. I did not get into inferential statistics
or hypothesis testing at all, despite their eagerness to "do a t-test"
and such. Maybe that will come at a future session, if we do one.

In general, topics we covered included:

vectorized mathematics (what I called "bulk math")
generating sequences
(meant to do logical conditions here, but skipped it inadvertently)
drawing random samples
different kinds of objects (we limited ourselves to scalars, vectors,
dataframes; character, numeric, and factor)
levels of measurement: nominal, ordinal, interval/ratio
exploring objects: str(), head(), tail(), class(), summary()
using in-built data sets provided with R
general principles of good data entry and storage, and the virtues of
plain text. Went over read.table (I meant to do more with reading data
into R, but ran out of time. I sent simple instructions for the foreign
package and read.spss() to their teacher after the fact, since up until
now they had been using SPSS a lot, and several of their data sets were
in that format.)
graphs: boxplots, scatterplots, stripcharts, scatterplot matrices, and
coplots (they liked that last one a lot).  Also some graphical
parameters: type=, main=, sub=, col=, xlim=, ylim=, and pch=
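
For readers who want a feel for the level of the material, here is a short sketch of the kind of base-R commands these topics involve. It is not the actual lesson plan; the data sets (mtcars, quakes) and the specific calls are chosen only for illustration.

```r
# Vectorized ("bulk") math: one expression operates on every element
x <- 1:10
squares <- x^2

# Generating sequences and drawing random samples
steps <- seq(0, 1, by = 0.25)
rolls <- sample(1:6, 10, replace = TRUE)   # ten rolls of a die

# Exploring an object, using a data set that ships with R
str(mtcars)
head(mtcars)
summary(mtcars$mpg)

# Graphs, with a few of the graphical parameters mentioned above
boxplot(mpg ~ cyl, data = mtcars, main = "Miles per gallon by cylinders",
        col = "lightblue")
plot(mtcars$wt, mtcars$mpg, pch = 19, xlim = c(1, 6), ylim = c(10, 35),
     main = "Weight vs. mpg")
coplot(lat ~ long | depth, data = quakes)
```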

Comments to teacher over the subsequent couple of days included:
"This should be taught in high school." "I got to see data for the first
time in a different way." "I had the most fun when I realized I could
play around with the program." (Of course, any less-than-positive
comments, the students (or their teacher) may have kept to themselves
out of courtesy to me!)

A post-workshop survey of the students has just begun, basically asking
them if they have downloaded R on their own computers, and whether they
have used it in any way since the workshop.

--Chris
-- 
Christopher W. Ryan, MD, MS
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu



Re: [R] Plot txt

2012-05-23 Thread Christopher W. Ryan

Dear Trying:

after

g <- read.table("ip.txt")

enter

str(g)

You will see that g is a dataframe with one variable, called V1. V1 
contains the data you want to plot.


So type

plot(g$V1, type="l")

or

with(g, plot(V1, type="l"))

or, if you prefer to attach g:

attach(g)
plot(V1, type="l")

--Chris Ryan
SUNY Upstate Medical University
Binghamton, NY



Trying To learn again wrote:

Hi all

Sorry in advance
  I have this txt with data and I want to plot the data with a line between
dots.

The thing is that if I run this

g<-read.table("ip.txt")
plot(g,type="l")


I have tried attach(g), but it doesn't work.

I know this is a beginner question, and the problem is surely the type of
data or the file, but I can't find the solution. This appears:

Error in plot.default(xlim, ylim, type = "n", ann = FALSE, axes = FALSE,  :
  type  in associated with multiple argumetns specified

  Can anyone give me a clue... I will send a Parma ham ¡¡¡Many thanks¡¡¡

I promise not to ask again, but I would have said I was sure that those
statements worked.





Re: [R] Conditional average

2012-04-25 Thread Christopher W. Ryan

Kebrab67--

It's difficult to answer in the absence of a small bit of representative 
data, and more information about it. How is age76 recorded, as numerical 
years or in categories (age ranges?)  And what are years (you didn't 
mention them as a variable in your data.)


That being said, perhaps by(), or summaryBy() in the doBy package, might 
help.


--Chris Ryan
SUNY Upstate Medical University Clinical Campus
Binghamton, NY

kebrab67 wrote:

Hello, I have a set of data including age, wage and education level each
called age76, wage76 and grade76 I want to know how i can calculate the
average wage of people age 15 to 65 (each year separately), only for those
who have an education level of 10 12 and 16...

--
View this message in context: 
http://r.789695.n4.nabble.com/Conditional-average-tp4585313p4585313.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Conditional average

2012-04-25 Thread Christopher W. Ryan

I think this will accomplish what you describe:

dd <- data.frame(id=c(1:6), age76=c(25, 27, 30, 82, 20, 25),
Wage76=c(102456, 15432, 12340, 6, 2, 15000), Grade76=c(12, 15, 16,
88, 16, 12), Black=c(1, 0, 1, 0, 1, 1), imigrt=c(0, 1, 1, 0, 1, 1))

dd
dd.sub <- subset(dd, Grade76 %in% c(12,15,16))
dd.sub
# aggregate and summaryBy accomplish similar things
aggregate(data=dd.sub, Wage76 ~ age76, FUN=mean)
library(doBy) #lattice gets loaded too
summaryBy(data=dd.sub, Wage76 ~ age76)
dd.means <- aggregate(data=dd.sub, Wage76 ~ age76, FUN=mean)
xyplot(data=dd.means, Wage76 ~ age76)   # lattice already loaded

--Chris Ryan

kebrab67 wrote:

id   age76   Wage76   Grade76   Black   immigrt. ...
1    25      102456   12        1       0
2    27      15432    15        0       1
3    30      12340    16        1       1

Then I have lots of data variables 100 for 5000 individuals nearly. What I
wanted is to discriminated by age and education level (age is years only and
grade is years in educational system and wage is salary only no rational
numbers here)
I want to have for people of education level 10 12 16, the average wage for
age 15, and then the same for age 16, and the same for age 17 and so on...
Once i have these average I plot (age vs average salary) to see the
distribution in average

--
View this message in context: 
http://r.789695.n4.nabble.com/Conditional-average-tp4585313p4586691.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] introducing R to high school students

2012-04-22 Thread Christopher W. Ryan
I have to agree that Excel is a poor tool for "serious scientific and 
engineering data analysis" (love the phrase). I too have spent way too 
much time beating Excel files into submission, with workarounds and 
manipulations, just to be able to do anything useful with them. I'm told 
that one can to some degree impose structure on Excel data entry, but I 
don't know how, and no users ever seem to set up their spreadsheets that 
way.


Somehow, a reasonable tool for business (I suppose, not being a 
businessman), has infiltrated the scientific world as well.


That's really the motivation for my proposal to my science teacher 
colleague. I want to introduce budding scientists to the idea that there 
is a better tool for data analysis, even for exploratory analysis and 
univariates and bivariates, which R does very handily. Why start an 
analysis in Excel only to have to switch to something else for the 
latter half?


And this will lead inevitably into conversations about better ways to 
record, store, and share data. And it ties into concepts of 
collaboration and reproducible research.


--Chris Ryan
SUNY Upstate Clinical Campus
Binghamton, NY



Re: [R] introducing R to high school students

2012-04-18 Thread Christopher W. Ryan
Thanks all for the excellent thought-provoking comments.

I want to clarify that these students are, for good or for ill, already
doing all these analytical and graphical things for their projects. They
are doing them with Excel and SPSS. One of my goals would be to teach
them how they can be done (and I think done better) in R. Better for
many reasons, not least of which is the reproducibility offered by lines
of saved code.

It seems that many (not all) on the list agree with the science teachers
that R is too difficult for high school students. Is R intrinsically
more difficult to learn than commercial spreadsheet software? If so,
why? Or is the issue that it is difficult to change to R after many
years experience in the mind-set of spreadsheets? If a child was
brought up on R for math/stats, in a developmentally progressive way,
instead of Excel or a graphing calculator, would he/she perceive it as
difficult?

Are the intrinsic cognitive differences between high schoolers, college
students, and graduate students substantial enough to explain why the
last can learn R and the first can't? Or is it a matter of exposure,
opportunity, etc?

Indrajit, I'm curious: given your preference for hand-drawn graphs for
learners (a very good point), why is Excel fine but R not?

At any rate, I should probably migrate this thread over to the Teaching
SIG listserve, which I didn't know about before.

Thanks again.

--Chris
Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu

Observation is a more powerful force than you could possibly reckon.
The invisible, the overlooked, and the unobserved are the most in danger
of reaching the end of the spectrum. They lose the last of their light.
From there, anything can happen . . .  [God, in Joan of Arcadia,
episode entitled, The Uncertainty Principle.]

Bert Gunter wrote:
 ...snipped
 
 I anticipate keeping things very simple:
 --objects and the fact that there is stuff inside them. str(), head(), 
 tail()
 --how to get data into R
 --dataframes, as I imagine they will mostly be using single,
 rectangular datasets
 --a lot of graphics (I can't imagine that  plot(force, acceleration)
 is beyond a high-schooler's capability.)
 --simple descriptive statistics
 --maybe t-tests, chi-square tests, and simple linear regression.


 I have some experience in this and would have to agree with Indrajit
 that this is not a good idea.

 When I tried to teach R to a high school student it was not very
 successful.  Certainly based on that experience the list above is way
 too complex.  Don't teach anything on that list at all.  The number of
 concepts involved in that is simply overwhelming.
 
 Oh amen amen!
 
 I'd go farther: It's overwhelming for college students.
 
 Farther yet: I've met very few scientists and engineers who understand
 what a standard deviation is. Fewer still who understand the
 difference between a sample statistic and a population parameter for
 which it's an estimate.
 
 This approach to basic statistics is (imho) symptomatic of why our
 discipline is so widely disliked and misunderstood.
 
 Cheers,
 Bert
 
  Also avoid teaching
 anything that requires complex installation if you want them to be
 able to carry it forward by themselves.

 I would expect the reaction would be that most will have no interest
 and the ones that do will be frustrated by the large number of
 concepts needed to get going.

 The only part that seemed to trigger any interest was when I showed
 the large list of colors available in colors() and then playing with
 inserting different colors in:

 colors()
 plot(1:5, col = "violetred")

 Assuming you are committed to this and go ahead, I would divide it
 into two parts:

 1. a graphics demo -- make it clear its a demonstration so they have
 an appreciation of what is possible and you are not actually teaching
 anything in this portion.

 2. Teach them how to install R, run the above two commands
 (substituting in different colors), how to exit and point out that
 there are many tutorials in:
 http://cran.r-project.org/other-docs.html
 and they can pick one they like (since the official documents will be
 over their head).

 If you do that then perhaps a small number will have sufficient
 interest to try it some more at home but I wouldn't be surprised if
 none do and that most or all would prefer something with more
 immediate gratification.

 --
 Statistics & Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com


[R] introducing R to high school students

2012-04-17 Thread Christopher W Ryan
I participate peripherally on a listserve for middle- and high-school
science teachers. Sometimes questions about graphing or data analysis
come up. I never miss an opportunity to advocate for R. However, the
teachers are often skeptical that their students would be able to
issue commands or write a little code; they think it would be too
difficult. Perhaps this stems from the Microsoft- and
spreadsheet-centered, pointy-clicky culture prevalent in most US
public schools. Then again, I have little experience teaching this age
group, besides my own kids and my Science Olympiad team, so I respect
their concerns and expertise.

I don't know yet what software they generally use, but I suspect MS
Excel and SPSS.

Now I have to put my money where my mouth is. I've offered to visit a
high school and introduce R to some fairly advanced students
participating in a longitudinal 3-year science research class.

I anticipate keeping things very simple:
--objects and the fact that there is stuff inside them. str(), head(), tail()
--how to get data into R
--dataframes, as I imagine they will mostly be using single,
rectangular datasets
--a lot of graphics (I can't imagine that  plot(force, acceleration)
is beyond a high-schooler's capability.)
--simple descriptive statistics
--maybe t-tests, chi-square tests, and simple linear regression.

Alas, probably more than we would have time to cover.

Has anyone done anything with R in high schools?

Thanks.

--Chris Ryan
SUNY Upstate Medical University
Binghamton Clinical Campus



Re: [R] question on simple graph

2012-01-31 Thread Christopher W. Ryan
Without knowing more about what you are trying to accomplish, and what 
you have tried so far (code) it's difficult to say. But I'll venture 
this: are you sure you need a graph? Seems like a table might suffice.


BT <- sample(1:5, 50, replace=TRUE)
RA <- sample(1:5, 50, replace=TRUE)
table(BT)
table(RA)
dd <- data.frame(better=BT, race=RA)
str(dd)
dd.table <- xtabs(~better+race, data=dd)

# doesn't this table convey what you want?
prop.table(dd.table,2)

# if for some reason a graph is essential, this creates one:
mosaicplot(dd.table)

Is this sort of what you had in mind?

--Chris Ryan
SUNY Upstate Medical University
Clinical Campus at Binghamton

Rebecca Lisi wrote:

I am having trouble generating a graph.

I want to know the % of respondents who answered that they strongly
agree or agree the America owes R's ethnic group a better chance
(BTTRCHNC) and I want to organize it by racial group (RACESHRT).

BTTRCHNC is organized ordinally from 1 through 5 with 1=Strongly
Agree, 5=Strongly Disagree
RACESHRT is ordinally organized from 1 through 5 where each number
represents a racial group category, i.e. white, black, Asian, etc.

Any hints for how to proceed?




[R] having trouble extracting week from chron object

2011-08-30 Thread Christopher W Ryan
Running R 2.13.1 on Windows XP.
I would like to get week of the year (1-52) for each date.

library(chron)
dts <- dates(c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92"))
dts
dts.chron <- as.chron(dts)
dts.chron
class(dts.chron)
# all of these component extractions work:
months(dts.chron)
weekdays(dts.chron)
years(dts.chron)
quarters(dts.chron)
# but weeks() doesn't seem to work
weeks(dts.chron)

?chron tells me: . . . The functions days(), months(), quarters(),
years(), weeks(), weekdays(), hours(), minutes(), and seconds() take
any chron object as input and extract the corresponding time interval
. . . .

Any advice?
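
One workaround that does not depend on weeks() is to format the dates with a week-of-year conversion. This is a minimal sketch, assuming the dates can be held as base-R Date objects rather than chron objects; the names d and wk are only for illustration. Note that "%U" numbers weeks from 00 to 53 (days before the first Sunday of the year fall in week 00), so it is not exactly a 1-52 scale.

```r
# Same dates as in the post, parsed as base-R Date objects
d <- as.Date(c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92"),
             format = "%m/%d/%y")

# %U counts weeks starting from the first Sunday of the year (00-53);
# %W does the same starting from the first Monday
wk <- as.integer(format(d, "%U"))
data.frame(date = d, week = wk)
```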

Thanks.

Chris Ryan
SUNY Upstate Medical University Clinical Campus at Binghamton
Binghamton, NY



Re: [R] having trouble extracting week from chron object

2011-08-30 Thread Christopher W Ryan
Here it is with the output:

> library(chron)
> dts <- dates(c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92"))
> dts
[1] 02/27/92 02/27/92 01/14/92 02/28/92 02/01/92
> dts.chron <- as.chron(dts)
> dts.chron
[1] 02/27/92 02/27/92 01/14/92 02/28/92 02/01/92
> class(dts.chron)
[1] "dates" "times"
> # all of these component extractions work:
> months(dts.chron)
[1] Feb Feb Jan Feb Feb
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
> weekdays(dts.chron)
[1] Thu Thu Tue Fri Sat
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
> years(dts.chron)
[1] 1992 1992 1992 1992 1992
Levels: 1992
> quarters(dts.chron)
[1] 1Q 1Q 1Q 1Q 1Q
Levels: 1Q < 2Q < 3Q < 4Q
> # but weeks() doesn't seem to work
> weeks(dts.chron)
Error: could not find function "weeks"
> ?weeks
No documentation for 'weeks' in specified packages and libraries:
you could try '??weeks'
> weeks
Error: object 'weeks' not found
> week
Error: object 'week' not found

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] chron_2.3-42

Thanks.

--Chris Ryan


On Tue, Aug 30, 2011 at 1:54 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Aug 30, 2011, at 1:36 PM, Christopher W Ryan wrote:

 Running R 2.13.1 on Windows XP.
 I would like to get week of the year (1-52) for each date.

 library(chron)
 dts <- dates(c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92"))
 dts
 dts.chron <- as.chron(dts)
 dts.chron
 class(dts.chron)
 # all of these component extractions work:
 months(dts.chron)
 weekdays(dts.chron)
 years(dts.chron)
 quarters(dts.chron)
 # but weeks() doesn't seem to work

 It doesn't work for me either but since I also have the package lubridate
 loaded, and ?weeks brings up a lubridate help page, I don't feel I have any
 right to complain. I get an error message what did you get?:

 Error in Ops.dates(pieces$week, 7) : * not defined for chron objects

 If I were posting I would also have had the good sense to include the output
 of sessionInfo().

 weeks(dts.chron)

 Try type just:

  weeks  # I get a lubridate function.


 ?chron tells me:


 But what does ?weeks tell you?

 . . . The functions days(), months(), quarters(),
 years(), weeks(), weekdays(), hours(), minutes(), and seconds() take
 any chron object as input and extract the corresponding time interval
 . . . .

 Any advice?

 Thanks.

 Chris Ryan
 SUNY Upstate Medical University Clinical Campus at Binghamton
 Binghamton, NY


 David Winsemius, MD
 West Hartford, CT





Re: [R] About GAM in R, Need YOUR HELP!

2011-06-20 Thread Christopher W Ryan
If you use the mgcv package for GAMs, I found this book very helpful:

Wood, SN
Generalized additive models: An introduction with R
CRC Press, 2006
ISBN-13: 9781584884743

--Chris Ryan
SUNY Upstate Medical University



On Mon, Jun 20, 2011 at 1:25 AM, pigpigmeow gloryk...@hotmail.com wrote:
 I'm beginner in R! I have a lot of problems on R.

 I have three questions about GAM
 1. What is the function of Gaussian distribution in GAM?(if I choose family
 is Gaussian)
 Is it used in the predictand value (Y)?

 2. How to plot a graph the gam function?
 For example: y<-gam(a~s(b),family=gaussian(link=log),Data)
 how to plot x axis is s(b) and y axis is log a???


 3. if I use GAM to create an equation with smoothing function
 For example: y<-gam(a~s(b)+s(c),family=gaussian(link=log),Data)
 how to get the coefficient of s(b) , s(c)
 My final target is getting the GAM equation to predict a, so I need to know
 how to get the coefficient of equation and how to write the equation!!!


 As I know, I'm silly to ask these questions. But GAM is very difficult to
 me. From now on, I need to clear the concept of GAM.

 Thank YOU!


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/About-GAM-in-R-Need-YOUR-HELP-tp3610522p3610522.html
 Sent from the R help mailing list archive at Nabble.com.



[R] specifying scales in lattice xyplot makes the lines disappear?

2011-05-10 Thread Christopher W Ryan
I have a dataframe concerning manner of death from death certificates,
from 2005 to 2009 inclusive, with the following structure:

> str(MannerYoung.plot.data)
'data.frame':   245 obs. of  4 variables:
 $ year         : Factor w/ 5 levels "2005","2006",..: 1 2 3 4 5 1 2 3 4 5 ...
 $ manner       : Factor w/ 7 levels "Accident","Homicide",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ new.age.group: Factor w/ 7 levels "1 to 4","5 to 9",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Freq         : int  [omitted from this post]...


new.age.group categorizes age of death in 5-year age bands.

The following makes a nice lattice scatterplot, with a panel for each
age-band, year on the horizontal axis, and frequency on the vertical
axis.

xyplot(Freq~year | new.age.group, groups=manner,
data=MannerYoung.plot.data, type="l",  auto.key =
list(text=levels(MannerYoung.plot.data$manner), lines=TRUE,
points=FALSE, space = "top"),
layout=c(length(levels(MannerYoung.plot.data$new.age.group)),1))

Except that the year tick labels overlap each other. I thought I would
try to label only every other tick, using scales(), like this:

xyplot(Freq~year | new.age.group, groups=manner,
data=MannerYoung.plot.data, type="l",
scales=list(x=list(at=c(2005,2007,2009),
labels=c(2005,2007,2009))), auto.key =
list(text=levels(MannerYoung.plot.data$manner), lines=TRUE,
points=FALSE, space = "top"),
layout=c(length(levels(MannerYoung.plot.data$new.age.group)),1))

This draws the panels, vertical axis, horizontal axes, with the
odd-numbered years, everything as desired, except that there are no
lines in the panels--no actual content.

I suppose I could decrease the font size for the year labels, using
cex(), and that cured the overlap, but I wanted to understand
scales().
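
One possible explanation, not confirmed in this thread: because year is a factor, lattice places its levels at positions 1 through 5 on the x axis, so tick locations such as 2005 lie far from where the data are drawn. The sketch below, on made-up data (the data frame toy and its values are hypothetical), gives the tick locations as positions and uses the years only as labels.

```r
library(lattice)

# Made-up stand-in for the death-certificate data; year is a factor, as in the post
toy <- expand.grid(year = factor(2005:2009),
                   manner = c("Accident", "Homicide"),
                   new.age.group = c("1 to 4", "5 to 9"))
toy$Freq <- seq_len(nrow(toy))

# Factor levels sit at positions 1, 2, ..., 5 on the x axis, so the tick
# locations are given as positions and the years appear only as labels
p <- xyplot(Freq ~ year | new.age.group, groups = manner, data = toy,
            type = "l",
            scales = list(x = list(at = c(1, 3, 5),
                                   labels = c(2005, 2007, 2009))),
            auto.key = list(lines = TRUE, points = FALSE, space = "top"))
print(p)
```

An alternative under the same assumption would be to convert year to a number first, in which case at=c(2005,2007,2009) would refer to actual data values.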

Where am I going wrong?

Thanks.

--Chris Ryan



Re: [R] adding vertical segments to an xyplot in lattice

2011-03-23 Thread Christopher W. Ryan

Peter--

That's it exactly! Thanks.

--Chris

Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu

Observation is a more powerful force than you could possibly reckon. 
The invisible, the overlooked, and the unobserved are the most in danger 
of reaching the end of the spectrum. They lose the last of their light. 
From there, anything can happen . . .  [God, in Joan of Arcadia, 
episode entitled, The Uncertainty Principle.]


Peter Ehlers wrote:

On 2011-03-22 12:12, Christopher W Ryan wrote:

I have a dataframe that looks like this:

 str(chr)
'data.frame': 84 obs. of 7 variables:
$ county: Factor w/ 3 levels "Broome","Nassau",..: 3 3 3 3 3 3 3 3 3 3 ...
$ item : Factor w/ 28 levels "Access to healthy foods",..: 21 19 20 18 16 3 2 6 17 8 ...
$ value : num 8644 15 3.5 3.9 7.7 ...
$ low : num 7897 9 2.5 2.6 7 ...
$ high : num 9390 22 4.5 5.2 8.4 37 30 23 24 101 ...
$ target: num 5034 11 2.7 2.6 6.1 ...
$ nys : num 6099 16 3.5 3.3 8 ...


head(chr)

county item value low high target nys
1 Sullivan Premature death 8644.0 7897.0 9390.0 5034.0 6099.0
2 Sullivan Poor or fair health 15.0 9.0 22.0 11.0 16.0
3 Sullivan Poor physical health days 3.5 2.5 4.5 2.7 3.5
4 Sullivan Poor mental health days 3.9 2.6 5.2 2.6 3.3
5 Sullivan Low birthweight 7.7 7.0 8.4 6.1 8.0
6 Sullivan Adult smoking 29.0 22.0 37.0 15.0 20.0

I'd like to graph high and low for Premature death for each of the
three counties, with 3 vertical line segments, one connecting those
two points for each county. I can get the two points for each county:


xyplot(low + high ~ county, data = subset(chr, item == "Premature death"))


but I have not yet been able to figure out how to draw the 3 vertical
line segments. Been struggling to understand panel functions, but no
success so far. I'd be grateful for any advice.


For lattice, I usually prefer the long version of a dataset.
Try this:

dd <- data.frame(county = letters[1:3],
                 lo = c(5, 2, 3),
                 hi = c(9, 5, 10),
                 stringsAsFactors = TRUE)  # keep county as a factor (the default before R 4.0)

## convert to 'long' format (you can use the reshape() function in stats
## or the reshape package):

require(reshape)
dd.long <- melt(dd, id = "county")
dd.long

require(lattice)
xyplot(value ~ county, data = dd.long, groups = county,
       pch = 19, type = 'b', cex = 2, lwd = 5, col = 2:4)

Peter Ehlers
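
Since the original question was about panel functions, one possible
alternative that stays with the wide data is a custom panel that calls
panel.segments(). This is only a minimal sketch: it reuses the made-up
values from the reply above rather than the chr data, and the ylim
padding and plotting symbols are arbitrary choices.

```r
library(lattice)

# Toy data in wide form (same made-up values as in the reply above)
dd <- data.frame(county = factor(c("a", "b", "c")),
                 lo = c(5, 2, 3),
                 hi = c(9, 5, 10))

p <- xyplot(hi ~ county, data = dd,
            ylim = range(dd$lo, dd$hi) + c(-1, 1),  # leave room for the low points
            ylab = "value",
            panel = function(x, y, ...) {
              xi <- as.numeric(x)                        # factor levels sit at 1, 2, 3
              panel.segments(xi, dd$lo, xi, y, lwd = 3)  # one vertical segment per county
              panel.points(xi, dd$lo, pch = 19)          # low end
              panel.points(xi, y, pch = 19)              # high end
            })
print(p)
```

The panel reads dd$lo from the enclosing environment, which is only
safe here because there is a single panel and no subsetting.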



Thanks.

--Chris Ryan
SUNY Upstate Medical University
Clinical Campus at Binghamton




[R] adding vertical segments to an xyplot in lattice

2011-03-22 Thread Christopher W Ryan
I have a dataframe that looks like this:

  str(chr)
'data.frame':   84 obs. of  7 variables:
 $ county: Factor w/ 3 levels "Broome","Nassau",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ item  : Factor w/ 28 levels "Access to healthy foods",..: 21 19 20 18 16 3 2 6 17 8 ...
 $ value : num  8644 15 3.5 3.9 7.7 ...
 $ low   : num  7897 9 2.5 2.6 7 ...
 $ high  : num  9390 22 4.5 5.2 8.4 37 30 23 24 101 ...
 $ target: num  5034 11 2.7 2.6 6.1 ...
 $ nys   : num  6099 16 3.5 3.3 8 ...

 head(chr)
    county                      item  value    low   high target    nys
1 Sullivan           Premature death 8644.0 7897.0 9390.0 5034.0 6099.0
2 Sullivan       Poor or fair health   15.0    9.0   22.0   11.0   16.0
3 Sullivan Poor physical health days    3.5    2.5    4.5    2.7    3.5
4 Sullivan   Poor mental health days    3.9    2.6    5.2    2.6    3.3
5 Sullivan           Low birthweight    7.7    7.0    8.4    6.1    8.0
6 Sullivan             Adult smoking   29.0   22.0   37.0   15.0   20.0

I'd like to graph high and low for Premature death for each of the
three counties, with 3 vertical line segments, one connecting those
two points for each county.  I can get the two points for each county:

xyplot(low + high ~ county, data = subset(chr, item == "Premature death"))

but I have not yet been able to figure out how to draw the 3 vertical
line segments. Been struggling to understand panel functions, but no
success so far. I'd be grateful for any advice.

Thanks.

--Chris Ryan
SUNY Upstate Medical University
Clinical Campus at Binghamton



[R] trouble with \textless in Hmisc latex() on a drop1 object

2010-10-21 Thread Christopher W Ryan
Yes, it's homework . . . delete now if desired . . . but I think it is an
interesting problem.

Running R 2.11.1, LaTeX on WinXP, via Sweave.

A drop1() object from a glm() produces, as part of its output, a string that
looks like this:

<none>

The trouble I run into is that running latex() on a drop1() object from
glm() produces a string that looks like this in the generated .tex file:

\textlessnone\textgreater

This gives an undefined control sequence error when compiling the .tex
file. I can just ENTER past the error, but the resulting .pdf file
contains just a greater-than symbol in the table where <none> is supposed to
appear.

If I go into the .tex file and manually insert a space, changing it to

\textless none\textgreater

the document compiles fine.

Reproducible example below. Any suggestions?

Thanks.
Chris Ryan
SUNY Upstate Medical University Binghamton Clinical Campus

\documentclass{article}
\begin{document}

<<options, echo=FALSE, hide=TRUE>>=
options(SweaveSyntax = "SweaveSyntaxNoweb")
library(Hmisc)
@

<<code>>=
salmon <- data.frame(cbind(c(rep(0,49),rep(1,51)), rnorm(100), rnorm(100)))
names(salmon) <- c("country","freshwater","marine")
model1 <- glm(country~freshwater+marine, family=binomial, data=salmon)
summary(model1)
model1r <- round(drop1(model1),3)
@

Here is some problem code.

<<problemcode, results=tex>>=
latex(model1r, file="", caption="significance of individual predictors")
@

\end{document}
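
One possible workaround, not confirmed anywhere in this thread, is to
rename the <none> row before the table reaches latex(), so that no
\textless or \textgreater is generated in the first place. A minimal
sketch in base R; set.seed() and the replacement label "none" are
arbitrary choices added for illustration:

```r
# Rebuild the example data with named columns (seed added for reproducibility)
set.seed(42)
salmon <- data.frame(country = c(rep(0, 49), rep(1, 51)),
                     freshwater = rnorm(100),
                     marine = rnorm(100))
model1 <- glm(country ~ freshwater + marine, family = binomial, data = salmon)
model1r <- round(drop1(model1), 3)

# Replace the row label that contains the angle brackets
rownames(model1r)[rownames(model1r) == "<none>"] <- "none"
rownames(model1r)
```

The renamed object could then be passed to latex() as before.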



[R] multiple car scatterplots on one graph

2010-10-14 Thread Christopher W Ryan
R version 2.11.1 on WinXP

How do I get 3 scatterplots with marginal boxplots (from the
car package) onto a single plot?

I have a data frame called bank

 dim(bank)
[1] 46  5

head(bank)

     x1    x2   x3   x4 pop
1 -0.45 -0.41 1.09 0.45   0
2 -0.56 -0.31 1.51 0.16   0
3  0.06  0.02 1.01 0.40   0
4 -0.07 -0.09 1.45 0.26   0
5 -0.10 -0.09 1.56 0.67   0
6 -0.14 -0.07 0.71 0.28   0

library(car)
par(mfrow=c(2,2))

# following lines may be wrapped badly--sorry

with(bank, scatterplot(x1,x2,groups=pop, reg.line=FALSE,
smooth=FALSE, boxplots="xy", reset.par=FALSE))

with(bank, scatterplot(x1,x3,groups=pop, reg.line=FALSE,
smooth=FALSE, boxplots="xy", reset.par=FALSE))

with(bank, scatterplot(x1,x4,groups=pop, reg.line=FALSE,
smooth=FALSE, boxplots="xy", reset.par=FALSE))

I have tried various permutations of the reset.par=  option:
all three lines FALSE, all 3 lines TRUE, the first TRUE and
the others FALSE, vice versa, etc. And always I get just one
scatterplot showing up on the device at a time, and
occupying the whole thing.

Thanks.

--Chris Ryan
SUNY Upstate Medical University
Binghamton Clinical Campus



[R] multiple car scatterplots on one graph--sorry for duplicate post--already solved

2010-10-14 Thread Christopher W Ryan
Apologies for the duplicate post. I'm caught between different misbehaving
email clients on different computers at the moment. Didn't see my original
post, nor Greg's reply, nor any evidence in my sent folder that my
original message ever went out. I've got them now.

--Chris Ryan

On Thu, Oct 14, 2010 at 11:29 PM, David Winsemius dwinsem...@comcast.net wrote:

 This is a duplicate, right? You're not posting again after what Greg Snow
 already explained what needed to be done, are you?

 --
 David.





Re: [R] R editor in ubuntu!

2010-10-05 Thread Christopher W Ryan
As an alternative to emacs-ess, you could try gedit with the R plugin.

--Chris Ryan


 I am looking for an editor to be able to execute commands into R in
Linux ubuntu. Is there any suggestion?



Re: [R] R Founding

2010-09-16 Thread Christopher W. Ryan
I mailed a check for an R Foundation membership almost a year ago, along 
with the form. In US dollars,  corrected by the then-current Euro 
exchange rate. It has never been cashed.


Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu

Observation is a more powerful force than you could possibly reckon. 
The invisible, the overlooked, and the unobserved are the most in danger 
of reaching the end of the spectrum. They lose the last of their light. 
From there, anything can happen . . .  [God, in Joan of Arcadia, 
episode entitled, The Uncertainty Principle.]


stephen sefick wrote:

I am a poor student, and would gladly donate 20ish bucks if it would
help.  R continues to make me more productive.  Thanks for all of the
good work!





[R] OT: model diagnostics in the published literature

2010-09-09 Thread Christopher W. Ryan

This is a more general statistical question, not specific to R:

As I move through my master's curriculum in statistics, I am becoming 
more and more attuned to issues of model fit and diagnostics (graphical 
methods, AIC, BIC, deviance, etc.). As my regression professor always 
likes to say, "only draw substantive conclusions from valid models."


Yet in published articles in my field (medicine), I rarely see any 
explicit description of whether, and if so how, model fit was assessed 
and assumptions checked. Mostly the results sections are all about 
hypothesis testing on model coefficients.


Is this common in other disciplines? Are there fields of study in which 
it is customary to provide a discussion of model adequacy, either in the 
text or perhaps in an online appendix?


And if that discussion is not provided, what, if anything, can one 
conclude about whether, and how well, it was done? Is it sort of taken 
as a given that those diagnostic checks were carried out? Do journal 
editors often ask?


Thanks for your thoughts.

--Chris Ryan
Clinical Associate Professor of Family Medicine
SUNY Upstate Medical University Clinical Campus at Binghamton



Re: [R] Learning ANOVA

2010-08-13 Thread Christopher W Ryan
Read documentation for TukeyHSD by typing the command:

?TukeyHSD

The input to that function should be a fitted model object,
usually an aov fit.

You have not created a fitted model object.

This seems to work:
model <- aov(InsectSprays$count ~ InsectSprays$spray)
TukeyHSD(model)


The aov() makes the aov model object. Tacking .aov onto an object name
doesn't do that.
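
A slightly more idiomatic variant of the same fit, as a minimal sketch:
it passes the built-in InsectSprays data through the data argument
instead of using $ inside the formula. The object names model and
comparisons are arbitrary.

```r
# Fit the one-way ANOVA on the built-in dataset
model <- aov(count ~ spray, data = InsectSprays)
summary(model)

# Pairwise comparisons need the fitted aov object, not a formula
comparisons <- TukeyHSD(model)
comparisons
```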

Which introductory R book(s) have you read? That would be a worthwhile
investment of your time.

--Chris Ryan

On Fri, Aug 13, 2010 at 12:43 PM, Stephen Liu sati...@yahoo.com wrote:

 Hi Erik,


 I followed following video as example;

 ANOVA in R
 http://www.youtube.com/watch?v=Dwd3ha0P8uw&feature=related


 Now I got it done;

  boxplot(test01$count ~ test01$spray)


 Continued:

  InsectSprays.aov <-(test01$count ~ test01$spray)
  summary(InsectSprays.aov)
  Length   Class    Mode
       3 formula    call

 Seems having problem here.


  TukeyHSD(InsectSprays.aov)
 Error in UseMethod("TukeyHSD") :
  no applicable method for 'TukeyHSD' applied to an object of class
 "formula"

 I'm still stuck here.


 B.R.
 Stephen L




 - Original Message 
  From: Erik Iverson er...@ccbr.umn.edu
 To: Stephen Liu sati...@yahoo.com
 Cc: r-help@r-project.org
 Sent: Sat, August 14, 2010 12:15:31 AM
 Subject: Re: [R]  Learning ANOVA



  Performed following steps on R:-
 
  ### to access to the object
  data(InsectSprays)
 
  ### create a .csv file
  write.csv(InsectSprays, "InsectSpraysCopy.csv")
 
 
  On another terminal
  $ sudo updatedb
  $ locate InsectSpraysCopy.csv
  /home/userA/InsectSpraysCopy.csv
 
 
  ### Read in some data
  test01 <- read.csv(file.choose(), header=TRUE)
 
  Enter file name: /home/userA/InsectSpraysCopy.csv

 I either don't understand what you're doing, or you seem
 very confused.

 R comes with many sample data sets for you to use.

 You can see a list of them using the ?data function.

 Calling data with an argument loads that dataset.

 So, when you type:

  data(InsectSprays)

 that data object is now available in R, see

  objects()

 You can look at it simply by printing it:

  InsectSprays

 If for some reason it makes sense to do it this way for your
 use case, then that's fine, I just want to make sure you understand
 that you don't have to if accessing built-in datasets is all you want.

  ### Look at the data
  test01
    X count spray
  1 1    10     A
  2 2     7     A
  3 3    20     A

 snip

  ### Create a side-by-side boxplot of the data
  boxplot(test01$DO ~ test01$Stream)
  Error in model.frame.default(formula = test01$DO ~ test01$Stream) :
    invalid type (NULL) for variable 'test01$DO'

 Why do you think test01 has an element called DO or Stream? The
 column names when you print the data tell you otherwise!





Re: [R] Adding new column variable to a existing spreadsheet

2010-04-23 Thread Christopher W Ryan
You should be aware that if you have read the data from your csv file
into an object in R, generally nothing you do with that object in R
will change your original csv file (this is generally a good thing),
until or unless you save or write something from R to that csv file
(which is sometimes not a good idea, if you want to preserve what your
original data looked like.)

It would also be helpful if you could post to the list  the lines of
code that you used to do whatever you are describing here in words.
This will get you better answers.

If the dataframe in R into which you read your data is called my.data,
then you could create a new column or variable with a command like
this:

my.data$newcolumn <- [in here put the expression for your new column]

for example:

my.data$newcolumn <- my.data$firstcolumn + my.data$secondcolumn
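
For the diff() case in the question below, note that diff() returns one
fewer value than it is given, so the result has to be padded before it
can become a column. A minimal sketch with made-up numbers; the column
name difference and the output file name are hypothetical, and the
result is written to a new file so the original csv stays untouched:

```r
# Made-up stand-in for the data read from the csv file
my.data <- data.frame(firstcolumn = c(3, 5, 9, 10))

# Pad with a leading NA so the new column has one value per row
my.data$difference <- c(NA, diff(my.data$firstcolumn))

# Write to a new file rather than overwriting the original
out <- file.path(tempdir(), "my_data_with_diff.csv")
write.csv(my.data, out, row.names = FALSE)
```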

On Fri, Apr 23, 2010 at 9:49 PM, Carol Gao carol.g...@gmail.com wrote:
 Hi,

 I've got a csv format of file. There are seven columns of variables all
 together, and I've used the diff() function to take the differences between
 the variables in one column. Now I want to add the differences as the 8th
 column variable to the csv file. Is there any way that I can do this?

 Thanks in advance.

 Carol Gao



Re: [R] managing data and removing lines

2010-04-16 Thread Christopher W. Ryan

Tara--

Welcome to R!

Your questions could be answered from a variety of angles. I'd start by 
asking, where did the n/a's come from--they were in your text file, I 
assume, to represent missing data?  If so, when you imported your data 
from that text file into R, those n/a's were considered (rightfully) to 
be character strings, not numbers, so your columns (variables) ended up 
being what R calls factors, not numeric variables.


I used the data you provided, saved it in a csv file called 
untitled.csv, the values in each record separated by commas instead of 
spaces.  Then I did the following (# is the comment symbol)


habitat <- read.csv("untitled.csv", header=TRUE)
head(habitat) # useful command to look at the first few lines of data
str(habitat) # a very useful command to examine structure of an R object
# see that all columns are factors.

R knows what to do with missing data in a csv file. It will turn them 
into NA. Which is *not* the same as n/a. In R, NA means not available. 
Sometimes you will also run into NaN, which means not a number.


For example, I replaced all the n/a's in the text file with . . . 
nothingness (that is to say, when a value is missing, there will just be 
2 commas in a row), and saved the file as untitled2.csv.  Now:


habitat2 <- read.csv("untitled2.csv", header=TRUE)
head(habitat2)
str(habitat2) # all columns are num for numeric
model <- lm(gdist ~ gair, data=habitat2)

Hope this helps get you started.
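
As an alternative to editing the file by hand, read.csv() can be told
which strings stand for missing values through its na.strings argument.
A minimal sketch; the numbers are made up for illustration and are not
the values from the original file, and the data are read from an inline
string only to keep the example self-contained:

```r
# Made-up csv content with "n/a" marking a missing value
csv_text <- "gdist,gair,gsub
20,8,14
4,13,15
30,12.6,16.4
40,n/a,17.9
12,10.2,1.8"

# Treat both "n/a" and "NA" as missing while reading
habitat <- read.csv(text = csv_text, header = TRUE,
                    na.strings = c("n/a", "NA"))
str(habitat)  # gair is now numeric, with one NA

# lm() drops rows with missing values by default
model <- lm(gdist ~ gair, data = habitat)
```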

There's an excellent book by Phil Spector, called Data Manipulation with 
R. I'd recommend it very highly.


--Chris Ryan


Tara Imlay wrote:

Hi,

I am very new to R and I've been trying to work through the R book to gain a
better idea of the code (which is also completely new to me).

Initially I imputed my data from a text file and that seemed to work ok, but
I'm trying to examine linear relationships between gdist and gair, gdist and
gsub, m6dist and m6air, etc.

This didn't work and I think it might have something to do with the n/a's in
my dataset.

habitat

gdist gair gsub m6dist m6air m6sub m7dist m7air m7sub m8dist m8air m8sub
1  208   14   -0.52419  7  12.1  16.12.51212
2   4   13   15   -0.1  24.5  24.50.1  11.4  15.1  21416
3  30 12.6 16.4 -325262.5   9.7  12.80.1  11.514
4  40 12.6 17.9  1   n/a   n/a0.1   8.1  15.2  21620
5  402  1.8  1   n/a   n/a0.7  10.2  24.1  21619

. . . . . .


Is there anyway to use my old data set with all the n/a's to look at
relationships between the variables?  Ideally I want to add in more habitat
variables to this analysis, that will include some categorical data and more
n/a's since the data collection was not complete with every observation.

Any help is appreciated.

Tara



Re: [R] trouble calculating rates--sometimes the denominator is missing

2010-03-11 Thread Christopher W. Ryan
Thanks! I think that will do the trick. I had not known of the match 
function before. Always learn something new on this list.


--Chris
Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu

If you want to build a ship, don't drum up the men to gather wood, 
divide the work and give orders. Instead, teach them to yearn for the 
vast and endless sea.  [Antoine de St. Exupery]


jim holtman wrote:

Here is how you can do the lookup:
 
> # look up the school
> lkup <- read.table(textConnection("1 'BENJAMIN FRANKLIN ES' 465
+ 2 'CALVIN COOLIDGE SCHL' 379
+ 3 'EAST MS' 590
+ 4 'HORACE MANN SCHL' 374
+ 5 'MAC ARTHUR SCHL' 481
+ 6 'THEO ROOSEVELT SCHL' 377"))
>
> # find a match
> lkup[match('HORACE MANN SCHL', lkup[,2]), 3]
[1] 374
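
Building on the match() lookup above, the same idea could supply the
denominators for the whole daily file at once. A minimal sketch,
assuming a small enrollment table kept separately from the daily file.
The enrollment figures are the ones quoted in this thread; the daily
data frame, its absence counts, and the object names enrollment and
daily are made up for illustration.

```r
# Static lookup table of schools and enrollments (figures from the thread)
enrollment <- data.frame(School = c("EAST MS", "HORACE MANN SCHL"),
                         TotalStudentsEnrolled = c(590, 374),
                         stringsAsFactors = FALSE)

# Made-up daily records whose enrollment field arrived blank
daily <- data.frame(School = c("HORACE MANN SCHL", "EAST MS", "HORACE MANN SCHL"),
                    StudentsAbsentTotal = c(20, 59, 37),
                    TotalStudentsEnrolled = NA,
                    stringsAsFactors = FALSE)

# For each daily row, find the position of its school in the lookup table
idx <- match(daily$School, enrollment$School)
daily$TotalStudentsEnrolled <- enrollment$TotalStudentsEnrolled[idx]

daily$PercentStudentsAbsent <-
  daily$StudentsAbsentTotal / daily$TotalStudentsEnrolled
daily
```

A school missing from the lookup table would get NA here rather than an
error, so it may be worth checking anyNA(idx) before computing rates.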
 


 
On Wed, Mar 10, 2010 at 10:44 AM, Christopher W. Ryan 
cr...@binghamton.edu wrote:


One more bit:  I got as far as this, thinking it might help:

Using a data file that I know has all the necessary denominators, I
created a dataframe of school names (as factor) and
TotalStudentsEnrolled.

 data.frame(data$School[!duplicated(data$School)],
data$TotalStudentsEnrolled[!duplicated(data$School)])

1 BENJAMIN FRANKLIN ES 465
2 CALVIN COOLIDGE SCHL 379
3 EAST MS              590
4 HORACE MANN SCHL     374
5 MAC ARTHUR SCHL      481
6 THEO ROOSEVELT SCHL  377
[truncated]

I thought I might be able to look up the necessary value for each
school from this dataframe. But I can't get my head around using
indices to do it.

Thanks again.

--Chris


-- 
Christopher W. Ryan, MD

SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu

If you want to build a ship, don't drum up the men to gather wood,
divide the work and give orders. Instead, teach them to yearn for
the vast and endless sea.  [Antoine de St. Exupery]





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




[R] trouble calculating rates--sometimes the denominator is missing

2010-03-10 Thread Christopher W. Ryan
Every day I get a csv file containing the names of the 64 schools in our 
county, the number of students sent home ill, and the number of students 
absent (plus lots of other variables). The file is cumulative since fall 
of 2009. It is in long format: one line per school per day.


Each line is also supposed to contain the total number of students 
enrolled in the school. That number doesn't change often or much, so the 
same value is usually repeated on each line for each school. Thus 
calculating the proportion of students absent or sent home ill is easy (see 
the lines that compute PercentStudentsAbsent and PercentSentHome); here is 
the beginning of my code (my apologies for the word-wrapping, I use some 
long variable names):


setwd("C:/data/bchd/schoolsurveillance")
library(ggplot2)
library(doBy)
library(reshape)
data <- read.csv("C:/DATA/BCHD/schoolsurveillance/Broome_02MAR10.csv",
                 header=TRUE, sep=",", fill=TRUE)

# convert the reporting date to character first, then to Date
data$date <- as.character(data$ReportingDate)
data$date <- as.Date(data$date, format="%d%b%y")

data$PercentStudentsAbsent <- data$StudentsAbsentTotal/data$TotalStudentsEnrolled

data$PercentSentHome <- data$SentHomeTotal/data$TotalStudentsEnrolled

attach(data)

The problem is that sometimes, in some of the daily files, the 
TotalStudentsEnrolled field is left entirely blank--in every record. 
Unfortunately the data collection system is out of my hands, and still a 
little rough around the edges. The powers-that-be can put those numbers 
back in on the subsequent day, then my code runs fine. But if possible, 
I want to make my code less susceptible to this external threat.


What would be a good way to store up the names of the 64 schools and 
their total enrollments (which are basically static), and then use those 
values for the denominators for the rates as calculated above, 
rather than relying on always having a complete, rectangular data file, 
every line containing the necessary value for a denominator?


Thanks.
--
Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu

If you want to build a ship, don't drum up the men to gather wood, 
divide the work and give orders. Instead, teach them to yearn for the 
vast and endless sea.  [Antoine de St. Exupery]

