Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread Aron Lindberg
Thanks Chuck and Rolf.

While Rolf’s code also works on the dput that I actually gave you (a smaller 
subset of the full dataset), it failed on the larger dataset, because 
there are further exceptions:

input[[i]]$content[[1]] is sometimes a list, sometimes a character vector, and 
sometimes input[[i]]$content simply returns list().

Chuck’s solution, however, bypasses this and works on the full dataset (which was 
8 MB, which is why I didn’t upload it as a gist).

Best,

Aron

-- 
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management 
Case Western Reserve University
aronlindberg.github.io

On Fri, Feb 20, 2015 at 12:44 AM, Charles Berry ccbe...@ucsd.edu wrote:

 Aron Lindberg aron.lindberg at case.edu writes:
 
 Hi Everyone,
 
 I'm working on a thorny subsetting problem involving list of lists. I've put 
 a 
 dput of the data here:
 
  https://gist.githubusercontent.com/aronlindberg/b916dee897d051ac5be5/
 raw/a78cbf873a7e865c3173f943ff6309ea688c653b/dput
 
 IIUC, you want the value of every list element that is named sha and 
 that name will only apply to atomic objects.
 If so, this should do it. 
 input <- dget("/tmp/dpt")
 shas <- unlist( input, use.names=FALSE )[ grepl( "sha", 
 names(unlist(input)))]
 input[[67]]$content[[1]]$sha
 [1] "58cf43ecdc1beb7e1043e9de612ecc817b090f15"
 which(input[[67]]$content[[1]]$sha == shas )
 [1] 194
 HTH,
 Chuck
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Re: [R] colours in ggplot2

2015-02-20 Thread Thierry Onkelinx
Dear Antonello,

You can specify the colours manually with scale_colour_manual(). See
http://docs.ggplot2.org/0.9.3.1/scale_manual.html for some examples. The
last example uses greys.
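
A minimal sketch of that suggestion, assuming a small made-up data frame (`toy`) in place of the poster's `df`; the grey values and the legend title are illustrative choices, not taken from the thread:

```r
library(ggplot2)

# Made-up stand-in for the poster's data (values are illustrative only)
toy <- data.frame(
  Levels_Depression = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0),
  Social_impairment = c(2.1, 2.4, 2.6, 2.9, 3.1, 3.3),
  Past_Admissions   = factor(c(0, 1, 0, 1, 0, 1), labels = c("No", "Yes"))
)

# scale_colour_manual() replaces the default hue palette with fixed colours,
# here two greys so that the figure prints in black and white
p <- ggplot(toy, aes(x = Levels_Depression, y = Social_impairment,
                     colour = Past_Admissions)) +
  geom_point(size = 3) +
  scale_colour_manual(name = "Past admissions",
                      values = c("No" = "grey60", "Yes" = "black"))
```

Printing `p` draws the plot. Naming the elements of `values` ties each colour to a factor level rather than to the order in which the levels happen to appear.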

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey


Re: [R] How to analyse nonlinear response to categorical and quantitative explanatory variables?

2015-02-20 Thread Michael Friendly

You want to use a generalized linear model of some sort

glm(count ~ flow + gravity + group, data=mydata, family=poisson)

would be a start. However, the effects of flow rate are nonlinear, so 
you might use a natural spline term like ns(flow,5) to allow for 
nonlinearity, and there also seem to be interactions in your plot.


library(splines)
glm(count ~ ns(flow,5) * gravity + group, data=mydata, family=poisson)



That might get you started while you look for a statistician to consult 
with.
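
As a rough illustration of the two models above, here is a sketch on simulated data. The data frame, the effect sizes and the choice of 3 spline degrees of freedom are all assumptions made for the example; only the variable names `count`, `flow` and `gravity` follow the reply.

```r
library(splines)

set.seed(1)
# Simulated stand-in for the experiment: 10 flow rates x 2 orientations x 4 replicates
mydata <- expand.grid(flow = seq(1, 10, length.out = 10),
                      gravity = c("up", "down"),
                      rep = 1:4)
# Assumed true mean: a hump-shaped (nonlinear) flow effect plus an orientation shift
mu <- exp(2 + 0.6 * mydata$flow - 0.05 * mydata$flow^2 +
          0.5 * (mydata$gravity == "down"))
mydata$count <- rpois(nrow(mydata), mu)

# Linear-in-flow Poisson model versus natural spline with interaction
fit_lin <- glm(count ~ flow + gravity, data = mydata, family = poisson)
fit_ns  <- glm(count ~ ns(flow, 3) * gravity, data = mydata, family = poisson)

# Likelihood-ratio comparison of the two nested models
anova(fit_lin, fit_ns, test = "Chisq")
```

The linear model is nested in the spline model, so the drop in deviance reported by `anova()` gives a first indication of whether the nonlinearity and the interaction are worth keeping.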


-Michael



On 2/19/2015 9:47 AM, Jan-Ulrich Kreft wrote:

Dear list

I have data from a collaborator who has used DesignExpert to design the 
experiment and analyse the data but no longer has access to this software and 
does not know exactly what the software did and why.

So I’m now trying to analyse the data in R but can't quite decide what to do.

Cell count is the response variable (number of cells attached to a surface per 
unit area and time interval, so could be Poisson distributed).

This cell count depends on whether the surface was oriented upwards or 
downwards (categorical - with or against gravity). Some more categorical 
variables were also studied such as surface material (glass or polycarbonate, 
symbols g and p in the figure) and position in flow cell (inlet or outlet), but 
they seem to have no significant effect.

Cell count also depends on a quantitative variable in a nonlinear manner: the 
flow rate with which the cell suspension was pumped along the surface.

I was wondering which kind of statistical model would be appropriate. I was 
first thinking ANCOVA but this seems to be a linear model and treating the 
quantitative explanatory variable as covariate when this is actually of 
interest. What else could I use?

Attached a figure showing the means of 4 replicates.

Many thanks.

Best wishes,
Jan.

---
Dr Jan-Ulrich Kreft
+44 (0)121 41-48851
School of Biosciences
University of Birmingham, Birmingham, B15 2TT, UK
http://www.tinyurl.com/kreftlab







[R] colours in ggplot2

2015-02-20 Thread Antonello Preti
Hi, I'm using ggplot2 to plot the regression of a variable x (let's
say, levels of depression)
on a variable y (let's say, degree of social impairment),
taking into account a binary factor (having had or not a past admission
to a psychiatric service)
and the age of participants.

After some searching on the Internet I produced code that I am satisfied with.
This site was very helpful: http://editerna.free.fr/wp/?p=266

However, I have a problem: no matter what I try, the figures always include
bluette and pink flamingo colours.
The figure is for an academic article, and I cannot afford the price of
having the plot printed in colour.

I've extracted the structure of the figure, and I understand that the
problem is in the scale_name hue,
but I cannot figure out how to deal with it.

Is there any way to override how ggplot2 deals with factors?

Here are the code and the sessionInfo.
The code is a bit baroque, but this is the best I was able to do.


Thank you in advance,
Antonello Preti



#### code for exemplification

### the dataset

df <- structure(list(Social_impairment = c(2.83, 3.08, 2.75, 2.08,
2.92, 1.75, 3.5, 2.33, 2.91, 2.5, 3.25, 2.64, 3.25, 2.83, 2.08,
2.25, 2.17, 2.42, 2.58, 2.42, 2.58, 2.42, 3, 3, 2.83, 2.67, 3.58,
1.58, 2.83, 2.83, 2.67, 3.17, 2.42, 1.92, 2.92, 2.5, 2.42, 2.42,
2.58, 2.42, 3.33, 3, 3.17, 2.17, 2.58, 2.67, 2.58, 3.75, 2.5,
2.08, 2.25, 3.25, 3.17, 2.91, 2.08, 2.25, 3.08, 2.91, 3.08, 2.92,
1.83, 2.5, 2.5, 2.83, 2.67, 3.33, 2.83, 3.33, 2.92, 3), Levels_Depression =
c(1.3,
1.71, 3.08, 0.48, 0.51, 0.71, 1.37, 0.2, 1.21, 1.07, 2.8, 1.24,
0.46, 0.97, 0.81, 1.13, 1.58, 3.12, 1.8, 1.54, 1.02, 0.32, 2.63,
1.39, 1.34, 2.37, 2.6, 1.11, 1.59, 2.17, 1.99, 0.59, 0.76, 0.23,
2.22, 1.98, 0.41, 0.32, 0.37, 1.11, 2.29, 0.97, 1.61, 1.27, 1.22,
2.38, 1.28, 1.21, 0.93, 2.3, 0.8, 2.1, 2.86, 2.47, 2.34, 2.67,
0.31, 0.88, 1.84, 0.23, 2.41, 0.56, 2.03, 1.11, 0.12, 2.39, 0.34,
2.08, 1.01, 1.51), Age = c(66, 59, 49, 70, 42, 55, 28, 41, 69,
65, 40, 21, 18, 77, 28, 40, 47, 37, 47, 39, 32, 33, 42, 28, 59,
49, 29, 41, 22, 29, 53, 39, 55, 61, 30, 49, 43, 46, 18, 36, 34,
17, 42, 37, 37, 54, 48, 23, 71, 42, 52, 83, 19, 47, 23, 80, 43,
38, 47, 80, 36, 73, 74, 51, 76, 14, 65, 39, 17, 73), Past_Admissions = c(1,
1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1,
1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
1, 0, 1, 0, 0, 1)), .Names = c("Social_impairment", "Levels_Depression",
"Age", "Past_Admissions"), row.names = c(NA, 70L), class = "data.frame")

dim(df)
head(df)
str(df)
summary(df)


### call the library

library(ggplot2)


#### the plot

### Levels_Depression on Social_impairment by Past_Admissions (yes/no)
### linear model
### radius of the bubbles proportional to age

### background elimination

p1 <- ggplot(data = df, aes(x = Levels_Depression, y = Social_impairment,
group = as.factor(Past_Admissions), col = as.factor(Past_Admissions))) +
  geom_point(aes(size = Age)) + geom_smooth(method = "lm") +
  xlab("Levels of depression") + ylab("Social impairment") +
  scale_colour_discrete("History of \npast admissions\nto a psychiatric service",
  labels = c("No", "Yes"))

p1 + theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank(),
panel.background = element_blank(), axis.line = element_line(colour =
"black"))


### change of the axes' ticks

p1 + theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank(),
panel.background = element_blank(), axis.line = element_line(colour =
"black"),
axis.text = element_text(color = "black", size = 12, face = "italic"))


### after saving, dev.off()
###


### Age on Social_impairment by Past_Admissions (yes/no)
### linear model
### radius of the bubbles proportional to Levels_Depression

### background elimination

p2 <- ggplot(data = df, aes(x = Age, y = Social_impairment, group =
as.factor(Past_Admissions), col = as.factor(Past_Admissions))) +
  geom_point(aes(size = Levels_Depression)) + geom_smooth(method = "lm") +
  xlab("Age of participants") + ylab("Social impairment") +
  scale_colour_discrete("History of \npast admissions\nto a psychiatric service",
  labels = c("No", "Yes"))

p2 + theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank(),
panel.background = element_blank(), axis.line = element_line(colour =
"black"))


### change of the axes' ticks

p2 + theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank(),
panel.background = element_blank(), axis.line = element_line(colour =
"black"),
axis.text = element_text(color = "black", size = 12, face = "italic"))


### after saving, dev.off()
###



#### paired plots


library(gridExtra)

grid.arrange(p1, p2, ncol = 2)




### sessionInfo()


R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Italian_Italy.1252  

Re: [R] multiple parameter optimization with optim()

2015-02-20 Thread Doran, Harold
John et al

Thank you for your advice below. It was sloppy of me not to verify my 
reproducible code below.  I have tried a few of your suggestions and wrapped 
the working code into the function below called pl2. The function properly 
lands on the right model parameters when I use the optim or nlminb (for nlminb 
I had to increase max iterations). 

The function is enormously slow. At first, I created the object rr1 with two 
calls to sapply(). This works, but creates an extremely large matrix at each 
iteration. 

library(statmod)
dat <- replicate(20, sample(c(0,1), 2000, replace = TRUE))
a <- b <- rep(1, 20)
Q <- 10
qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
nds <- qq$nodes
wts <- qq$weights

rr1 <- sapply(1:nrow(dat), function(j)
sapply(1:Q, function(i)
exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (qq$nodes[i] - b))), log = 
TRUE))) * qq$weights[i]))

So, to reduce memory use, I thought I would do it this way, which is 
equivalent and doesn't create such a large matrix, but instead uses an explicit 
loop. Both approaches are still equally slow. 

rr1 <- numeric(nrow(dat))
for(j in 1:length(rr1)){
rr1[j] <- sum(sapply(1:Q, 
function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * 
(nds[i] - b))), log = TRUE))) * wts[i]))
}

As you noted, my likelihood is not complex; in fact I have another program that 
uses newton-raphson with the analytic first and second derivatives because they 
are so easy to find. In that program, the model converges very (very) quickly. 
My purpose in using numeric differentiation is experiential in some respects 
and hoping to apply this to problems for which the analytic derivatives might 
not be so easy to come by.

I think the basic idea here to improve speed is to make a call to the gradient, 
which I understand to be the vector of first derivatives of my likelihood 
function, is that right? If that is right, in a multi-parameter problem, I'm 
not sure how to think about the gradient function. Since I am maximizing w.r.t. 
a and b (these are the parameters of the model), I would have a vector of first 
partials for a and another for b. So I conceptually do not understand what the 
gradient would be in this instance, perhaps some clarification would be helpful.
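
On the gradient question: for a function of several parameters the gradient is a single vector holding one partial derivative per parameter, in the same order as the parameter vector. With 20 values of a followed by 20 values of b it would therefore be one vector of length 40, not two separate objects. The sketch below shows the mechanics on a deliberately simpler, assumed problem (a two-parameter logistic likelihood on simulated data, not the model in this thread), passing the analytic gradient to optim() through its `gr` argument.

```r
set.seed(42)
# Simulated data for a two-parameter logistic model (illustrative only)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))

# Negative log-likelihood for params = c(intercept, slope)
nll <- function(params) {
  eta <- params[1] + params[2] * x
  -sum(dbinom(y, 1, plogis(eta), log = TRUE))
}

# Gradient of nll: one partial derivative per parameter, same order as params
nll_grad <- function(params) {
  p <- plogis(params[1] + params[2] * x)
  c(-sum(y - p), -sum((y - p) * x))
}

opt <- optim(c(0, 0), nll, gr = nll_grad, method = "BFGS")
opt$par
```

Without `gr`, optim() approximates the gradient by finite differences, which costs additional function evaluations for every parameter at each step; that is where an analytic gradient can save time when the objective itself is slow.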

Below is the working function, which as I noted is enormously slow. Any advice 
on speed improvements here would be helpful. Thank you

pl2 <- function(dat, Q, startVal = NULL, ...){
# startVal must hold one a and one b per item, i.e. 2 * ncol(dat) values
if(!is.null(startVal) && length(startVal) != 2 * ncol(dat) ){
stop("Length of argument startVal not equal to the number of parameters estimated")
} 
if(is.null(startVal)){
# default starting values: a = 1, b = log((1 - p)/p) from the item means
p <- colMeans(dat)
startValA <- rep(1, ncol(dat))
startValB <- as.vector(log((1 - p)/p))
startVal <- c(startValA,startValB)
}
rr1 <- numeric(nrow(dat))
qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
nds <- qq$nodes
wts <- qq$weights
dat <- as.matrix(dat)
# negative marginal log-likelihood as a function of c(a, b)
fn <- function(params){
a <- params[1:20]
b <- params[21:40] 
for(j in 1:length(rr1)){
rr1[j] <- sum(sapply(1:Q, 
function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * 
(nds[i] - b))), log = TRUE))) * wts[i]))
}
-sum(log(rr1))
} 
#opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
opt <-  nlminb(startVal, fn)
#opt <- Rcgmin(startVal, fn)
opt
#list(coefficients = opt$par, LogLik = -opt$value, 
#Std.Error = sqrt(diag(solve(opt$hessian))))
}

dat <- replicate(20, sample(c(0,1), 2000, replace = TRUE))
r2 <- pl2(dat, Q =10)
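
One possible way to speed up the inner computation, offered as a sketch rather than a tested replacement: the loop over rows can be swapped for one matrix product per quadrature node. The helper name `marg_lik`, the small data set and the equally spaced nodes with normalised normal-density weights are assumptions made so that the example runs without statmod; they are not Gauss-Hermite nodes.

```r
# Marginal likelihood for every row of dat, looping over nodes only.
# plogis(z) equals 1 / (1 + exp(-z)), the expression used in the code above.
marg_lik <- function(dat, a, b, nds, wts) {
  rr <- numeric(nrow(dat))
  for (i in seq_along(nds)) {
    p  <- plogis(1.7 * a * (nds[i] - b))             # item probabilities at node i
    ll <- dat %*% log(p) + (1 - dat) %*% log(1 - p)  # log-likelihood of all rows at once
    rr <- rr + exp(ll[, 1]) * wts[i]
  }
  rr
}

set.seed(1)
dat <- replicate(5, sample(c(0, 1), 50, replace = TRUE))
a <- rep(1, 5)
b <- rep(0.3, 5)
nds <- seq(-3, 3, length.out = 7)
wts <- dnorm(nds) / sum(dnorm(nds))   # crude stand-in weights, for illustration only

# Row-by-row version, written like the loop in the message above
slow <- sapply(seq_len(nrow(dat)), function(j)
  sum(sapply(seq_along(nds), function(i)
    exp(sum(dbinom(dat[j, ], 1, plogis(1.7 * a * (nds[i] - b)), log = TRUE))) * wts[i])))

fast <- marg_lik(dat, a, b, nds, wts)
all.equal(slow, fast)
```

The vectorised version makes Q passes over the data instead of nrow(dat) * Q calls to dbinom(), which matters because the optimiser evaluates the objective many times.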

-Original Message-
From: Prof J C Nash (U30A) [mailto:nas...@uottawa.ca] 
Sent: Wednesday, February 18, 2015 9:07 AM
To: r-help@r-project.org; Doran, Harold
Subject: Re: [R] multiple parameter optimization with optim()

Some observations -- no solution here though:

1) the code is not executable. I tried. Maybe that makes it reproducible!
Typos such as stat mod, undefined Q etc.

2) My experience is that any setup with a ?apply approach that doesn't then 
check to see that the structure of the data is correct has a high probability 
of failure due to mismatch with the optimizer requirements.
It's worth being VERY pedestrian in setting up optimization functions and 
checking obsessively that you get what you expect and that there are no regions 
you 

[R] irregular sequence of events

2015-02-20 Thread PIKAL Petr
Dear all

I know I am missing something obvious, but after a few hours of trials I am 
asking for some help.

I have some sequence of values (days)
x <- 1:30

and an indication of event start and end day
mimo <- c(5,10, 13,16, 21,27)

or

events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = 
c("start",
"end"), row.names = c(NA, -3L), class = "data.frame")

I need to get a factor indicating event

event <- c(rep(NA, 4), rep("A1", 6), rep(NA, 2), rep("A2", 4), rep(NA, 4), 
rep("A3", 7), rep(NA,3))
factor(event)

In such a small example I can do it manually, but I have a long vector of dates 
and would like to use the start and end day of events either from the mimo vector or 
from the events data frame.

Is there any function which does it automagically? I know I have seen it before 
but I cannot find it now.

Best regards
Petr



[R] Design patterns for data munging?

2015-02-20 Thread Aron Lindberg
Hi All,


The most difficult challenge that I face in “learning R” is data munging. 
I have reviewed Hadley’s advanced R programming guide, familiarized myself with 
data structures, subsetting, plyr, dplyr, tidy, the lapply() family of 
functions, basic string manipulation and grepping, SQL etc. I’ve also written a 
few dozen functions that do basic data munging tasks. Further, I’ve already 
reviewed things like the Coursera course “Computing for Data Analysis” - 
https://www.coursera.org/course/compdata and Data Camp's data.table course.

However, many of the tasks that are commonly solved by the tools mentioned 
above seem to be mainly applied to datasets with fairly well-structured 
variables that need to be transformed and subsetted in various ways - these 
tasks are often not so difficult. 



Much of my work involves querying APIs, SQL databases or scraping websites, and 
then assembling lists of various things that can then be transformed into 
social networks or timestamped sequences of various events etc. Solutions to 
many tricky problems in this area still seem to imply creative leaps of 
imagination that I can understand after I see them, but I have trouble seeing 
how I could ever come up with them independently.


Therefore I ask - what do I need to learn to become better at solving tricky 
data munging problems?


I realize a common answer may be: solve many data munging problems. I 
understand that this is a clear factor, however, I’m trying to figure out if 
there is some more tangible guidance. 


* Is there something like “design patterns” for data munging? 
* Would doing a course in algorithms help? (I’ve reviewed parts of Guide to 
Programming and Algorithms Using R - 
http://www.springer.com/computer/swe/book/978-1-4471-5327-6 - many of the 
problems are mathematical and seem far-removed from the kinds of problems that 
I’m trying to solve)
* Is there something like SelectorGadget (http://selectorgadget.com/) for R 
objects?
* Could something like OpenRefine (http://openrefine.org/) make these tasks 
easier?


Best,
Aron

-- 
Aron Lindberg


Doctoral Candidate, Information Systems
Weatherhead School of Management 
Case Western Reserve University
aronlindberg.github.io

Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread Aron Lindberg
Hmm… Chuck’s solution may actually be problematic because there are several 
entries which at the deepest level are called “sha” but should not be 
included, such as:

input[[67]]$content[[1]]$commit$tree$sha

and

input[[67]]$content[[1]]$parents[[1]]$sha

It’s only the “sha” entries that fit the following subsetting pattern that 
should be included:

input[[i]]$content[[1]]$sha[1]

It’s getting thornier!

To be fair to Rolf’s solution (which probably can be updated to solve the 
problem), I’ve posted the complete dput here:

https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R

-- 
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management 
Case Western Reserve University
aronlindberg.github.io
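
One way to express the pattern input[[i]]$content[[1]]$sha[1] while tolerating the exceptions mentioned in this thread is a small extractor that checks each level before descending. This is a sketch under assumptions: `get_sha` is a hypothetical helper, and the three-element `input` below is a made-up miniature of the real data, not the posted dput.

```r
# Return entry$content[[1]]$sha[1] when that exact path exists, otherwise NA.
# [[ ]] is used instead of $ to avoid partial matching of element names.
get_sha <- function(entry) {
  content <- entry[["content"]]
  if (!is.list(content) || length(content) == 0) return(NA_character_)
  first <- content[[1]]
  if (!is.list(first) || is.null(first[["sha"]])) return(NA_character_)
  as.character(first[["sha"]][1])
}

# Made-up miniature of the structure described in the thread
input <- list(
  list(content = list(list(sha = "abc123",
                           commit = list(tree = list(sha = "tree999")),
                           parents = list(list(sha = "par555"))))),
  list(content = list("just a character vector")),
  list(content = list())
)

shas <- vapply(input, get_sha, character(1))
shas
```

Because the function only looks at the top-level "sha" of content[[1]], the nested commit$tree$sha and parents[[1]]$sha values are never picked up, and entries with an empty or non-list content yield NA instead of an error.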


Re: [R] irregular sequence of events

2015-02-20 Thread Sarah Goslee
Hi,

On Fri, Feb 20, 2015 at 9:27 AM, PIKAL Petr petr.pi...@precheza.cz wrote:
 Dear all

 I know I am missing something obvious but after few hours of trials I ask for 
 some help.

 I have some sequence of values (days)
 x <- 1:30

 and an indication of event start and end day
 mimo <- c(5,10, 13,16, 21,27)


 cut(x, mimo)

 [1] <NA>    <NA>    <NA>    <NA>    <NA>    (5,10]  (5,10]  (5,10]  (5,10] 
[10] (5,10]  (10,13] (10,13] (10,13] (13,16] (13,16] (13,16] (16,21] (16,21]
[19] (16,21] (16,21] (16,21] (21,27] (21,27] (21,27] (21,27] (21,27] (21,27]
[28] <NA>    <NA>    <NA>   
Levels: (5,10] (10,13] (13,16] (16,21] (21,27]


should get you started. You'll need to tweak the arguments to get
exactly what you want,


 or

 events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = 
 c("start",
 "end"), row.names = c(NA, -3L), class = "data.frame")

 I need to get a factor indicating event

 event <- c(rep(NA, 4), rep("A1", 6), rep(NA, 2), rep("A2", 4), rep(NA, 4), 
 rep("A3", 7), rep(NA,3))
 factor(event)

 In such small example I can do it manually but I have a long vector of dates 
 and would like to use start and end day of events either from mimo vector or 
 from events data frame.

 Is there any function which does it automagically? I know I have seen it 
 before but I cannot find it now.

 Best regards
 Petr


-- 
Sarah Goslee
http://www.functionaldiversity.org
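
As an alternative to tweaking cut(), the factor can be built directly from the events data frame. The sketch below assumes that both start and end days belong to the event and that events do not overlap; the A1, A2, ... labels follow the example in the question.

```r
x <- 1:30
events <- data.frame(start = c(5, 13, 21), end = c(10, 16, 27))

# Start with NA everywhere, then label the days inside each event window
event <- rep(NA_character_, length(x))
for (k in seq_len(nrow(events))) {
  inside <- x >= events$start[k] & x <= events$end[k]
  event[inside] <- paste0("A", k)
}
event <- factor(event)
table(event, useNA = "ifany")
```

The same loop works unchanged when x holds Date values and the start and end columns are Dates, since the comparison operators are defined for that class.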



[R] problem in R

2015-02-20 Thread thanoon younis
Dear all members

I have the following matrix in R

 thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
 head(thd)
 [,1]   [,2]   [,3]   [,4]  [,5] [,6]
[1,] -200 -2.517 -1.245 -0.444 0.848  200
[2,] -200 -1.447 -0.420  0.119  1.245 200
[3,] -200 -1.671 -0.869 -0.194  0.679 200
[4,] -200 -1.642 -0.869 -0.293  0.332 200
[5,] -200 -1.671 -0.827  0.052  0.756 200
[6,] -200 -1.769 -1.098 -0.469  0.255 200
[7,] -200 -1.490 -0.670 -0.082  0.880 200
[8,] -200 -1.933 -0.880 -0.317  1.008 200
[9,] -200  -1.587 -0.624  0.000  1.008 200
[10,] -200 -1.983 -1.348 -0.348  1.045 200
[11,] -200 -1.983 -1.229 -0.247  0.869 200
[12,] -200 -2.262 -1.426  0.037  1.330 200
[13,] -200 -2.371 -1.295 -0.224  0.651 200
[14,] -200 -2.039 -1.112 -0.149  1.169 200
[15,] -200 -2.262 -1.198 -0.309  1.198 200
[16,] -200 -2.176 -1.537 -0.717  0.597 200
[17,] -200 -1.447 -0.786  0.119  1.008 200
[18,] -200 -2.039 -1.769 -0.661  0.642 200

and when I entered this matrix in R I got the following errors


+  head(thd)
+  [,1]   [,2]   [,3]   [,4]  [,5] [,6]
+ [1,] -200 -2.517 -1.245 -0.444 0.848  200
Error: unexpected numeric constant in:
 [,1]   [,2]   [,3]   [,4]  [,5] [,6]
[1,] -200 -2.517 -1.245 -0.444 0.848
 [2,] -200 -1.447 -0.420  0.119  1.245 200
Error: unexpected '[' in [
 [3,] -200 -1.671 -0.869 -0.194  0.679 200
Error: unexpected '[' in [
 [4,] -200 -1.642 -0.869 -0.293  0.332 200
Error: unexpected '[' in [
 [5,] -200 -1.671 -0.827  0.052  0.756 200
Error: unexpected '[' in [
 [6,] -200 -1.769 -1.098 -0.469  0.255 200
Error: unexpected '[' in [
 [7,] -200 -1.490 -0.670 -0.082  0.880 200
Error: unexpected '[' in [
 [8,] -200 -1.933 -0.880 -0.317  1.008 200
Error: unexpected '[' in [
 [9,] -200  -1.587 -0.624  0.000  1.008 200
Error: unexpected '[' in [
 [10,] -200 -1.983 -1.348 -0.348  1.045 200
Error: unexpected '[' in [
 [11,] -200 -1.983 -1.229 -0.247  0.869 200
Error: unexpected '[' in [
 [12,] -200 -2.262 -1.426  0.037  1.330 200
Error: unexpected '[' in [
 [13,] -200 -2.371 -1.295 -0.224  0.651 200
Error: unexpected '[' in [
 [14,] -200 -2.039 -1.112 -0.149  1.169 200
Error: unexpected '[' in [
 [15,] -200 -2.262 -1.198 -0.309  1.198 200
Error: unexpected '[' in [
 [16,] -200 -2.176 -1.537 -0.717  0.597 200
Error: unexpected '[' in [
 [17,] -200 -1.447 -0.786  0.119  1.008 200
Error: unexpected '[' in [
 [18,] -200 -2.039 -1.769 -0.661  0.642 200
Error: unexpected '[' in [


Any help would be very appreciated

thanks in advance
-- 
Thanoon Y. Thanoon
PhD Candidate
Department of Mathematical Sciences
Faculty of Science
University Technology Malaysia, UTM
E.Mail: thanoon.youni...@gmail.com
E.Mail: dawn_praye...@yahoo.com
Facebook:Thanoon Younis AL-Shakerchy
Twitter: Thanoon Alshakerchy
H.P:00601127550205


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem in R

2015-02-20 Thread Sarah Goslee
Hi,

On Fri, Feb 20, 2015 at 9:48 AM, thanoon younis
thanoon.youni...@gmail.com wrote:
> Dear all members
>
> I have the following matrix in R
>
> thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
> head(thd)
>       [,1]   [,2]   [,3]   [,4]  [,5] [,6]
>  [1,] -200 -2.517 -1.245 -0.444 0.848  200
>  [2,] -200 -1.447 -0.420  0.119 1.245  200
>  [3,] -200 -1.671 -0.869 -0.194 0.679  200
>  [4,] -200 -1.642 -0.869 -0.293 0.332  200
>  [5,] -200 -1.671 -0.827  0.052 0.756  200
>  [6,] -200 -1.769 -1.098 -0.469 0.255  200
>  [7,] -200 -1.490 -0.670 -0.082 0.880  200
>  [8,] -200 -1.933 -0.880 -0.317 1.008  200
>  [9,] -200 -1.587 -0.624  0.000 1.008  200
> [10,] -200 -1.983 -1.348 -0.348 1.045  200
> [11,] -200 -1.983 -1.229 -0.247 0.869  200
> [12,] -200 -2.262 -1.426  0.037 1.330  200
> [13,] -200 -2.371 -1.295 -0.224 0.651  200
> [14,] -200 -2.039 -1.112 -0.149 1.169  200
> [15,] -200 -2.262 -1.198 -0.309 1.198  200
> [16,] -200 -2.176 -1.537 -0.717 0.597  200
> [17,] -200 -1.447 -0.786  0.119 1.008  200
> [18,] -200 -2.039 -1.769 -0.661 0.642  200

This is not reproducible, so the below are guesses.

> and when i implemented this matrix i found this error

I have no idea what "implemented this matrix" might mean.


> +  head(thd)

First problem: the "+" means R expects continuation of a previous line,
because the command is incomplete. So whatever you did BEFORE this
line is wrong.

> +  [,1]   [,2]   [,3]   [,4]  [,5] [,6]
> + [1,] -200 -2.517 -1.245 -0.444 0.848  200
> Error: unexpected numeric constant in:
>  [,1]   [,2]   [,3]   [,4]  [,5] [,6]
> [1,] -200 -2.517 -1.245 -0.444 0.848

These and subsequent errors are what you'd get if you pasted the above
R output back into the R console. Why would you do that?

So: check your previous commands.
If you can't find your mistake, respond to the list with a clear
reproducible example.
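
To illustrate the point about pasting output back into the console, here is a minimal sketch. The vector vals is a made-up stand-in holding the first two rows shown above (the real testJAGSdata was never posted). It shows that a printed matrix row is not valid R syntax, and that dput() output, by contrast, can be pasted back in.

```r
# Hypothetical stand-in for testJAGSdata$thd: the first two rows shown above
vals <- c(-200, -2.517, -1.245, -0.444, 0.848, 200,
          -200, -1.447, -0.420,  0.119, 1.245, 200)
thd <- matrix(vals, ncol = 6, byrow = TRUE)

# A printed row such as "[2,] -200 ..." is display output, not R code,
# so the parser rejects it; capture the message instead of stopping.
msg <- tryCatch(
  {
    parse(text = "[2,] -200 -1.447 -0.420 0.119 1.245 200")
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# deparse()/dput() produce text that IS valid R code and rebuilds the object
thd_text <- deparse(thd)
thd_again <- eval(parse(text = thd_text))
print(all.equal(thd, thd_again))
```

Sharing dput(thd) rather than the printed matrix is also the easiest way to make a question like this reproducible.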



-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] multiple parameter optimization with optim()

2015-02-20 Thread Bert Gunter
This is not the proper venue for a discussion of the mathematics of
optimization, no matter that it is interesting. Please take it off
list.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Fri, Feb 20, 2015 at 6:03 AM, Doran, Harold hdo...@air.org wrote:
 John et al

 Thank you for your advice below. It was sloppy of me not to verify my 
 reproducible code below.  I have tried a few of your suggestions and wrapped 
 the working code into the function below called pl2. The function properly 
 lands on the right model parameters when I use the optim or nlminb (for 
 nlminb I had to increase max iterations).

 The function is enormously slow. At first, I created the object rr1 with two 
 calls to sapply(). This works, but creates an extremely large matrix at each 
 iteration.

 library(statmod)
 dat <- replicate(20, sample(c(0,1), 2000, replace = T))
 a <- b <- rep(1, 20)
 Q <- 10
 qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
 nds <- qq$nodes
 wts <- qq$weights

 rr1 <- sapply(1:nrow(dat), function(j)
   sapply(1:Q, function(i)
     exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (qq$nodes[i] - b))),
                    log = TRUE))) * qq$weights[i]))

 So, I thought to reduce some memory, I would do it this way which is 
 equivalent, doesn't create such a large matrix, but instead uses an explicit 
 loop. Both approaches are still equally as slow.

 rr1 <- numeric(nrow(dat))
 for(j in 1:length(rr1)){
   rr1[j] <- sum(sapply(1:Q,
     function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a
       * (nds[i] - b))), log = TRUE))) * wts[i]))
 }

 As you noted, my likelihood is not complex; in fact I have another program 
 that uses newton-raphson with the analytic first and second derivatives 
 because they are so easy to find. In that program, the model converges very 
 (very) quickly. My purpose in using numeric differentiation is experiential 
 in some respects and hoping to apply this to problems for which the analytic 
 derivatives might not be so easy to come by.

 I think the basic idea here to improve speed is to make a call to the 
 gradient, which I understand to be the vector of first derivatives of my 
 likelihood function, is that right? If that is right, in a multi-parameter 
 problem, I'm not sure how to think about the gradient function. Since I am 
 maximizing w.r.t. a and b (these are the parameters of the model), I would 
 have a vector of first partials for a and another for b. So I conceptually do 
 not understand what the gradient would be in this instance, perhaps some 
 clarification would be helpful.
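
 On the gradient question: for a multi-parameter problem the gradient is a single vector with one partial derivative per parameter, in the same order as the parameter vector passed to the objective. A minimal sketch follows; it is not the model from this thread but a two-parameter logistic regression on simulated data (all names here are illustrative), chosen because its gradient is easy to verify.

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))

# Objective: negative log-likelihood in par = c(intercept, slope)
negll <- function(par) {
  p <- plogis(par[1] + par[2] * x)
  -sum(dbinom(y, 1, p, log = TRUE))
}

# Gradient: one partial derivative per parameter, same order as par
negll_grad <- function(par) {
  p <- plogis(par[1] + par[2] * x)
  c(-sum(y - p),        # d/d intercept
    -sum((y - p) * x))  # d/d slope
}

fit <- optim(c(0, 0), negll, gr = negll_grad, method = "BFGS")
print(fit$par)

# Cross-check against glm(), which fits the same model
ref <- unname(coef(glm(y ~ x, family = binomial)))
print(ref)
```

 With 20 a's and 20 b's, the gradient function would return a vector of length 40: the 20 partials with respect to a followed by the 20 partials with respect to b, matching c(a, b).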

 Below is the working function, which as I noted is enormously slow. Any 
 advice on speed improvements here would be helpful. Thank you

 pl2 <- function(dat, Q, startVal = NULL, ...){
   # one a and one b per item, so 2 * ncol(dat) parameters are estimated
   if(!is.null(startVal) && length(startVal) != 2 * ncol(dat) ){
     stop("Length of argument startVal not equal to the number of parameters estimated")
   }
   if(is.null(startVal)){
     p <- colMeans(dat)
     startValA <- rep(1, ncol(dat))
     startValB <- as.vector(log((1 - p)/p))
     startVal <- c(startValA,startValB)
   }
   rr1 <- numeric(nrow(dat))
   qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
   nds <- qq$nodes
   wts <- qq$weights
   dat <- as.matrix(dat)
   fn <- function(params){
     # hard-coded for 20 items: first 20 values are a, last 20 are b
     a <- params[1:20]
     b <- params[21:40]
     for(j in 1:length(rr1)){
       rr1[j] <- sum(sapply(1:Q,
         function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a
           * (nds[i] - b))), log = TRUE))) * wts[i]))
     }
     -sum(log(rr1))
   }
   #opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
   opt <- nlminb(startVal, fn)
   #opt <- Rcgmin(startVal, fn)
   opt
   #list(coefficients = opt$par, LogLik = -opt$value,
   #     Std.Error = sqrt(diag(solve(opt$hessian))))
 }

 dat <- replicate(20, sample(c(0,1), 2000, replace = T))
 r2 <- pl2(dat, Q = 10)
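
 One likely source of the slowness is the double loop over persons and quadrature nodes inside fn, which runs on every function evaluation. A sketch of a vectorised alternative is below. To keep it self-contained it uses a small simulated data set and a made-up equally spaced grid (nds, wts) in place of gauss.quad.prob(); those values are assumptions for illustration only. It checks that two matrix products reproduce the loop result.

```r
set.seed(42)
n_items <- 5; n_persons <- 50; Q <- 10
dat <- replicate(n_items, sample(c(0, 1), n_persons, replace = TRUE))
a <- rep(1, n_items)
b <- seq(-1, 1, length.out = n_items)

# Stand-in quadrature grid (assumption): the thread uses gauss.quad.prob()
nds <- seq(-3, 3, length.out = Q)
wts <- dnorm(nds); wts <- wts / sum(wts)

# Loop version, as in the thread
rr_loop <- numeric(n_persons)
for (j in seq_len(n_persons)) {
  rr_loop[j] <- sum(sapply(seq_len(Q), function(i)
    exp(sum(dbinom(dat[j, ], 1, 1 / (1 + exp(-1.7 * a * (nds[i] - b))),
                   log = TRUE))) * wts[i]))
}

# Vectorised version: P[i, k] is the success probability at node i, item k
P <- 1 / (1 + exp(-1.7 * sweep(outer(nds, b, "-"), 2, a, "*")))
# L[j, i] = log-likelihood of person j's responses at node i
L <- dat %*% t(log(P)) + (1 - dat) %*% t(log(1 - P))
rr_vec <- as.vector(exp(L) %*% wts)

print(all.equal(rr_loop, rr_vec))
```

 Inside fn, only P, L and rr_vec would need recomputing for each new c(a, b); the sapply() calls and the loop over rows disappear.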

 -Original Message-
 From: Prof J C Nash (U30A) [mailto:nas...@uottawa.ca]
 Sent: Wednesday, February 18, 2015 9:07 AM
 To: r-help@r-project.org; Doran, Harold
 Subject: Re: [R] multiple parameter optimization with optim()

 Some observations -- no solution here though:

 1) the code is not executable. I tried. Maybe that makes 

[R] creating a distinct zip file

2015-02-20 Thread Erin Hodgess
Hello yet again.

I am trying to create a zip file for a friend who has a Windows machine.

He needs to access this via the local zip file packages option.

When I use R CMD INSTALL --compile-both, it produces an item in the library
tree (as promised).

However, I would like to have an actual .zip file.

I do know at one time that was possible, not sure if I can still do it.

I did try R CMD INSTALL --force-biarch as well, same result as compile both.

thank you for any suggestions.

Sincerely,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] irregular sequence of events

2015-02-20 Thread jim holtman
A little shorter version of the SQL solution after consulting with my
SQL expert:

> require(sqldf)
> timeline <- data.frame(time = 1:30)
> events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = c("start",
+  "end"), row.names = c(NA, -3L), class = "data.frame")
> # add event number
> events$num <- paste0("A", seq(nrow(events)))
> events
  start end num
1     5  10  A1
2    13  16  A2
3    21  27  A3

> sqldf("
+ select t.*, e.num
+ from timeline t
+ left join events as e
+ on t.time between e.start and e.end
+ ")
   time  num
1 1 NA
2 2 NA
3 3 NA
4 4 NA
5 5   A1
6 6   A1
7 7   A1
8 8   A1
9 9   A1
10   10   A1
11   11 NA
12   12 NA
13   13   A2
14   14   A2
15   15   A2
16   16   A2
17   17 NA
18   18 NA
19   19 NA
20   20 NA
21   21   A3
22   22   A3
23   23   A3
24   24   A3
25   25   A3
26   26   A3
27   27   A3
28   28 NA
29   29 NA
30   30 NA


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Feb 20, 2015 at 9:27 AM, PIKAL Petr petr.pi...@precheza.cz wrote:
 Dear all

 I know I am missing something obvious but after few hours of trials I ask for 
 some help.

 I have some sequence of values (days)
 x <- 1:30

 and an indication of event start and end day
 mimo <- c(5,10, 13,16, 21,27)

 or

 events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = 
 c("start",
 "end"), row.names = c(NA, -3L), class = "data.frame")

 I need to get a factor indicating event

 event <- c(rep(NA, 4), rep("A1", 6), rep(NA, 2), rep("A2", 4), rep(NA, 4), 
 rep("A3", 7), rep(NA,3))
 factor(event)

 In such small example I can do it manually but I have a long vector of dates 
 and would like to use start and end day of events either from mimo vector or 
 from events data frame.

 Is there any function which does it automagically? I know I have seen it 
 before but I cannot find it now.

 Best regards
 Petr

 
 

Re: [R] Simple Histogram

2015-02-20 Thread JS Huang
Hi,

  Your data may look like the following, saved as speed.txt in the working
directory.  The code follows the data.  The graph is attached as
Speed.pdf.

Speed
50
52
55
57
58
59
60
61
62
63
64
65
65
65
67
68
68
68
68
69
69
70
71
72
72
72
73
73
73
73
75
76
77
78
79



> speed <- read.table("speed.txt",header=TRUE)
> speed
   Speed
1     50
2     52
3     55
4     57
5     58
6     59
7     60
8     61
9     62
10    63
11    64
12    65
13    65
14    65
15    67
16    68
17    68
18    68
19    68
20    69
21    69
22    70
23    71
24    72
25    72
26    72
27    73
28    73
29    73
30    73
31    75
32    76
33    77
34    78
35    79
> hist(speed$Speed)

Speed.pdf http://r.789695.n4.nabble.com/file/n4703616/Speed.pdf  





[R] running rcorr (Hmisc) on multiple cores

2015-02-20 Thread R Tagett

Hello,

I am having trouble generating a correlation matrix on multiple cores.

I have a matrix myMat for which I would like to do a Pearson correlation. 
Let's say dim(myMat) is 100 200. I want a 200x200 correlation matrix and 
the corresponding p-value matrix.

I like to use rcorr(myMat) in the Hmisc package, but for larger matrices this 
command is too time consuming.
I have spent a day playing with mclapply(myMat, rcorr, ...) from the parallel 
package, trying to distribute the job on multiple cores. But I can't figure it 
out.

I also tried mclapply( myMat, cor.test, ...), but it runs even more slowly.

Does anyone have any suggestions?

Thanks very much for your help,
Beck
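
One thing worth noting: mclapply(myMat, ...) iterates over the individual elements of the matrix, not its columns, which is probably why those attempts did not work. A possible alternative, sketched below on simulated data and assuming no missing values, is to skip parallelism altogether: cor() is already vectorised over all column pairs, and the Pearson p-values follow from the t distribution with n - 2 degrees of freedom.

```r
set.seed(1)
myMat <- matrix(rnorm(100 * 200), nrow = 100)  # simulated stand-in data
n <- nrow(myMat)

# All pairwise Pearson correlations in one vectorised call
r <- cor(myMat)

# Two-sided p-values from t = r * sqrt((n - 2) / (1 - r^2))
tstat <- r * sqrt((n - 2) / pmax(1 - r^2, 0))
p <- 2 * pt(-abs(tstat), df = n - 2)
diag(p) <- NA  # a variable's correlation with itself is not tested

# Spot check one pair against cor.test()
ref <- cor.test(myMat[, 1], myMat[, 2])$p.value
print(all.equal(p[1, 2], ref))
```

If the data contain NAs, the pairwise sample sizes differ and this shortcut needs adjusting, so treat it as a starting point rather than a drop-in replacement for rcorr().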



Re: [R] irregular sequence of events

2015-02-20 Thread JS Huang
Hi,

  Here is one implementation with a function named eventList.

> start
[1]  5 13 21
> end
[1] 10 16 27
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30
> eventList
function(start, end, x)
{
  result <- character(0)
  for (i in 1:length(start))
  {
    if (i == 1)
    {
      if (start[i] > 1)
      {
        result <- c(result, rep(NA, start[i] - 1))
      }
      result <- c(result, rep(paste0("A",i), end[i] - start[i] + 1))
    }
    else
    {
      if (start[i] > end[i - 1] + 1)
      {
        result <- c(result, rep(NA, start[i] - end[i - 1] - 1))
      }
      result <- c(result, rep(paste0("A", i), end[i] - start[i] + 1))
    }
  }
  if (end[length(start)] < length(x))
  {
    result <- c(result, rep(NA, length(x) - end[length(start)]))
  }
  return(result)
}
> eventList(start, end, x)
 [1] NA   NA   NA   NA   "A1" "A1" "A1" "A1" "A1" "A1" NA   NA   "A2" "A2"
[15] "A2" "A2" NA   NA   NA   NA   "A3" "A3" "A3" "A3" "A3" "A3" "A3" NA
[29] NA   NA
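
A more compact base-R variant of the same idea, offered as a sketch: start with all NA and fill in each event's range by logical indexing. It assumes the events do not overlap and uses the events data frame from the original question.

```r
x <- 1:30
events <- data.frame(start = c(5, 13, 21), end = c(10, 16, 27))

# Start with "no event" everywhere, then label each event's days
event <- rep(NA_character_, length(x))
for (k in seq_len(nrow(events))) {
  in_event <- x >= events$start[k] & x <= events$end[k]
  event[in_event] <- paste0("A", k)
}
event <- factor(event)
print(table(event, useNA = "ifany"))
```

Because it compares against the values in x rather than positions, this also works when x does not start at 1.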
 





[R] Example of Calling a DLL

2015-02-20 Thread Alex Restrepo
All, 

I'm a newbie to R and am interested in seeing a simple example of calling a 3rd 
party Visual Studio generated DLL from RStudio.  Does anyone have a simple 
example which also walks through the preliminary steps of setting up the 
INCLUDE path and the library path to either a DLL or LIB file?  I have tried 
to find an easy example, but thus far have had no luck finding one that uses Rcpp 
to communicate with a 3rd party Visual Studio DLL. 

Many Thanks in Advance, Alex


Re: [R] How do I access a specific element of a multi-dimensional list

2015-02-20 Thread William Dunlap
Using lapply() where Jim used sapply() would keep the types
right and be a fair bit faster than a solution based on repeatedly
appending to a list (like your getFirst).
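
A minimal sketch of that suggestion, using the example list from the quoted message below: lapply() with `[` returns a list, so each first element keeps its own type, whereas sapply() simplifies to a character vector.

```r
x <- list(c(2, 3, 5), c("aa", "bb", "cc"), c(TRUE, FALSE, TRUE))

# lapply() keeps each element's type (numeric, character, logical)
first_l <- lapply(x, `[`, 1)
str(first_l)

# sapply() simplifies to one atomic vector, coercing everything to character
first_s <- sapply(x, `[`, 1)
str(first_s)
```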

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Feb 20, 2015 at 1:52 PM, JS Huang js.hu...@protective.com wrote:

 Hi,

   Jim's answer is neat.  There is an issue with the result, though: all
 elements are characters even though some are numeric or logical.  The
 following implementation retains the variable type.

 > x
 [[1]]
 [1] 2 3 5

 [[2]]
 [1] "aa" "bb" "cc"

 [[3]]
 [1]  TRUE FALSE  TRUE

 > getFirst
 function(aList)
 {
   result <- list()
   for (i in 1:length(aList))
   {
     result <- c(result, aList[[i]][1])
   }
   return(result)
 }
 > getFirst(x)
 [[1]]
 [1] 2

 [[2]]
 [1] "aa"

 [[3]]
 [1] TRUE

 





Re: [R] creating a distinct zip file

2015-02-20 Thread Rolf Turner

On 21/02/15 15:02, Jeff Newmiller wrote:

R CMD INSTALL --build packagename


That will create a *.tar.gz file, not a *.zip file.  The latter being
what Erin wanted, if I understand correctly.

I have worked around the problem in the past with a shell script
like unto:

#! /bin/csh
set vnum = `grep Version $pkge/DESCRIPTION | sed -e 's/Version: //'`
R CMD INSTALL -l Lib $pkge > /dev/null
cd Lib
zip -r -l $pkge.zip $pkge > /dev/null
mv $pkge.zip ../${pkge}_$vnum.zip

In the foregoing pkge is the name of the package you are trying to 
build.  You will have to have created the holding library Lib a priori.


There are doubtless (much) better ways of accomplishing this task, but I 
don't know them.


cheers,

Rolf






--
Rolf Turner
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Home phone: +64-9-480-4619



Re: [R] How do I access a specific element of a multi-dimensional list

2015-02-20 Thread JS Huang
Hi,

  Jim's answer is neat.  There is an issue with the result, though: all
elements are characters even though some are numeric or logical.  The
following implementation retains the variable type.

> x
[[1]]
[1] 2 3 5

[[2]]
[1] "aa" "bb" "cc"

[[3]]
[1]  TRUE FALSE  TRUE

> getFirst
function(aList)
{
  result <- list()
  for (i in 1:length(aList))
  {
    result <- c(result, aList[[i]][1])
  }
  return(result)
}
> getFirst(x)
[[1]]
[1] 2

[[2]]
[1] "aa"

[[3]]
[1] TRUE

 





[R] Replacing 9999 and 999 values with NA

2015-02-20 Thread Alexandra Catena
Hello All,

I have a data frame of two columns for wind.  The first column is for wind
speed and the second for wind direction.  I'm trying to replace the 9999 values
in the first column and the 999 values in the second column with NA.  I
tried to use the function ltdl.fix.df but it doesn't seem to do anything.

 ltdl.fix.df(windMV, zero2na = FALSE, coded = 999)

  n = 9432 by p = 4 matrix checked, 0 NA(s) present

  0 factor variable(s) present

  5675 value(s) coded 999 set to NA

  0 -ve value(s) set to +ve half the negative value


I have R version 3.1.1

Thanks,
Alexandra



Re: [R] Replacing 9999 and 999 values with NA

2015-02-20 Thread Jeff Newmiller
You did not say how you imported the data, but if you used one of the 
read.table variants (including read.csv) then you can use the na.strings 
argument as documented in the help file for read.table.
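
As a sketch of both routes, with made-up column names (speed, direction) and a few invented rows: the first replaces the codes column by column in an existing data frame, the second uses na.strings at import time. Note that na.strings applies to every column, so it is only safe when 999 and 9999 can never be real values in either column.

```r
# Hypothetical wind data; column names and values are assumptions
wind <- data.frame(speed     = c(3.2, 9999, 5.1, 7.0),
                   direction = c(180, 270, 999, 45))

# Route 1: replace the missing-value codes column by column
wind$speed[wind$speed == 9999] <- NA
wind$direction[wind$direction == 999] <- NA
print(wind)

# Route 2: declare the codes as NA while reading the data
raw <- "speed direction
3.2 180
9999 270
5.1 999"
wind2 <- read.table(text = raw, header = TRUE,
                    na.strings = c("9999", "999"))
print(wind2)
```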

Next time please read the posting guide, as there are some useful tips in 
there, such as posting using plain text (a setting in your email program) so we 
don't get garbled info from you, and providing a reproducible example.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.



Re: [R] creating a distinct zip file

2015-02-20 Thread Jeff Newmiller
R CMD INSTALL --build packagename
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.



Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread Bert Gunter
How can you expect a solution if you cannot specify the problem?

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Fri, Feb 20, 2015 at 6:13 AM, Aron Lindberg aron.lindb...@case.edu wrote:
 Hmm…Chuck’s solution may actually be problematic because there are several 
 entries which at the deepest level are called “sha”, but that should not be 
 included, such as:

 input[[67]]$content[[1]]$commit$tree$sha

 and

 input[[67]]$content[[1]]$parents[[1]]$sha

 It’s only the “sha” that fit the following subsetting pattern that should be 
 included:

 input[[i]]$content[[1]]$sha[1]

 It’s getting thornier!

 To be fair to Rolf’s solution (which probably can be updated to solve the 
 problem), I’ve posted the complete dput here:

 https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R
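
 One way to pin the problem down is a small helper that follows exactly the path input[[i]]$content[[1]]$sha and returns NA whenever that path does not exist. The sketch below is not tested on the posted dput; get_sha and the toy input list are made up to mimic the three cases described earlier in the thread (a list, a character vector, and an empty list()).

```r
# Hypothetical helper: sha at input[[i]]$content[[1]]$sha, or NA if absent
get_sha <- function(el) {
  content <- el[["content"]]
  if (!is.list(content) || length(content) == 0) return(NA_character_)
  first <- content[[1]]
  # [["sha"]] matches the name exactly and ignores deeper "sha" entries
  if (!is.list(first) || is.null(first[["sha"]])) return(NA_character_)
  as.character(first[["sha"]][1])
}

# Toy input covering the three cases described in the thread
input <- list(
  list(content = list(list(sha = "abc",
                           commit = list(tree = list(sha = "zzz")),
                           parents = list(list(sha = "ppp"))))),
  list(content = list("just a character vector")),
  list(content = list())
)

shas <- vapply(input, get_sha, character(1))
print(shas)
```

 Dropping the NA entries afterwards (shas[!is.na(shas)]) leaves only the top-level sha values.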








Re: [R] Averaging column scores when participants vary in number of observations

2015-02-20 Thread John Kane
And just to muddy the waters more, here's another way to do it, using the handy 
plyr package, where the data.frame is dat1:

library(plyr)
ddply(dat1, .(Participant.ID), summarize, mean = mean(Score))

John Kane
Kingston ON Canada
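
For readers without plyr installed, a base R sketch that should give the same per-participant means. The data frame below is retyped from the values shown in this thread, keeping only the two columns that matter:

```r
# Scores retyped from the thread's example data
dat1 <- data.frame(
  Participant.ID = c(1L, 1L, 2L, 3L, 3L, 4L, 4L, 5L),
  Score          = c(4L, 5L, 6L, 2L, 3L, 2L, 1L, 8L)
)

# One row per participant, averaging however many observations each has
means <- aggregate(Score ~ Participant.ID, data = dat1, FUN = mean)
means
```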


 -Original Message-
 From: js.hu...@protective.com
 Sent: Thu, 19 Feb 2015 17:36:19 -0800 (PST)
 To: r-help@r-project.org
 Subject: Re: [R] Averaging column scores when participants vary in number
 of observations
 
 Hi,
 
   Another implication:
 
 > data1
   Observation Participant.ID Video.Coder Score
 1           A              1      Donald     4
 2           B              1       Tracy     5
 3           C              2      Donald     6
 4           D              3         Sam     2
 5           E              3       Tracy     3
 6           F              4      Donald     2
 7           G              4       Tracy     1
 8           H              5         Sam     8
 > tapply(data1$Score,data1$Participant.ID,mean)
   1   2   3   4   5
 4.5 6.0 2.5 1.5 8.0
 
 
 
 


Re: [R] Raster Help

2015-02-20 Thread John Kane
Simon,
You missed a key request from Sven.  He asked for data in dput() format.  This 
is essential for dealing with many problems.  Do ?dput for info on the function, 
but essentially it lets the reader see your data exactly as you see it, 
unaffected by any special settings the reader may have for reading data into R. 

Here is a little example of dput output. Just copy and paste it into R to get the 
new data.frame dat1.

dat1 <- structure(list(Observation = c("A", "B", "C", "D", "E", "F", 
"G", "H"), Participant.ID = c(1L, 1L, 2L, 3L, 3L, 4L, 4L, 5L), 
Video.Coder = c("Donald", "Tracy", "Donald", "Sam", "Tracy", 
"Donald", "Tracy", "Sam"), Score = c(4L, 5L, 6L, 2L, 3L, 
2L, 1L, 8L)), .Names = c("Observation", "Participant.ID", 
"Video.Coder", "Score"), class = "data.frame", row.names = c(NA, 
-8L))
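
A small sketch of the round trip that makes dput() output useful in a posting. The data frame `dat` and the temporary file are made up for illustration:

```r
# A tiny made-up data frame
dat <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = FALSE)

# Print a text representation that can be pasted into an email
dput(dat)

# Writing to a file and reading back with dget() rebuilds the object
tmp <- tempfile(fileext = ".R")
dput(dat, file = tmp)
dat2 <- dget(tmp)
all.equal(dat, dat2)
```

Posting dput(head(x, 20)) rather than the whole object is usually enough for a reproducible example.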

See these for some hints on asking questions.
https://github.com/hadley/devtools/wiki/Reproducibility
 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example


John Kane
Kingston ON Canada


 -Original Message-
 From: simon.t...@adtrak.co.uk
 Sent: Fri, 20 Feb 2015 15:45:04 +
 To: sven.temp...@gmail.com
 Subject: Re: [R] Raster Help
 
 Hi Sven,
 
 Many thanks for the reply and my apologies for not posting any code. So
 far, I have been able to write this (but it's very basic and just getting
 me to the 'complicated' stage).
 
 setwd("C:\\Users\\simon.tarr\\Documents\\GIS\\Test Data")
 require(raster)
 require(rgdal)
 revenue<-read.table("revenue.csv",header=T,row.names=1,sep=",")
 postcodes<-raster("C:\\Users\\simon.tarr\\Documents\\GIS\\Test Data\\rasters\\postcodes\\postcodes.img")
 trim(postcodes)
 plot(postcodes)
 
 I have attached a .csv file that contains my revenue data (this is
 actually
 just made up data- I wanted to make sure I could get the mapping to work
 before I start handling large quantities of real data).
 
 As I mentioned, the raster contains the same list of postcode names that
 appear in the CSV. So I need to somehow 'attach' the revenue figures to
 each postcode in the raster and then plot this.
 
 I hope this makes sense and apologies for the loose language...it's the
 only way I can think of to describe it.
 
 I'm trying hard to learn R and its syntax but sometimes I get stuck. I
 often know what needs to be done but struggle to write the necessary code
 to make it happen.
 
 All the best,
 
 Simon
 
 On 19 February 2015 at 20:37, Sven E. Templer sven.temp...@gmail.com
 wrote:
 
 Without (example) code it is hard to follow... use ?dput to present
 some data (subset).
 But if it is data.frames you are dealing with (for sure with read.csv,
 but not so sure at all with raster maps), give this a try:
 
 ?merge
 
 On 19 February 2015 at 17:44, Simon Tarr simon.t...@adtrak.co.uk
 wrote:
 Hello everyone,
 
 I need a little help with some R syntax to complete what (I think) is a
 fairly straightforward task- hopefully someone can assist!
 
 I have a raster map of the UK which is split into postcode areas (e.g.
 DE,
 NG, NR etc. 127 postcodes in total).
 
 I have installed the package 'raster' and have successfully plotted the
 .img in R. All working and looks correct with the raster.
 
 I also have a comma delimited CSV file containing the same postcodes as
 the
 raster with another column next to it containing revenue for each
 postcode.
 
 *I was wondering if someone could help me merge/bind the revenue
 figures
 into the correct postcode in the raster so that I can plot revenue per
 postcode.*
 
 I feel I should be using cbind and reclassify to do this but I can't be
 sure.
 
 Any help would be appreciated. Thanks in advance!
 
 Simon
 


Re: [R] return named list from foreach

2015-02-20 Thread Jeff Newmiller
You cannot do that in one step. Do it right after:

names(out) <- df$nm

Please don't post using HTML format.. it scrambles code, and since we cannot 
see what you saw it doesn't help in any way.

Also note that df is a function in the base stats package... not a good name 
to use.
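
A sketch of the name-afterwards step. To keep it runnable without a parallel backend, `lapply()` stands in for the `foreach(...) %dopar%` loop here (an assumption for illustration only), and the data frame is called `dat` rather than `df`:

```r
dat <- data.frame(nm = letters[11:20], a = 1:10, b = 11:20,
                  stringsAsFactors = FALSE)

# Stand-in for the foreach loop: one list element per row, in row order
out <- lapply(seq_len(nrow(dat)), function(i) {
  list(j = sqrt(dat$a[i]), k = sqrt(dat$b[i]))
})

# Because the results come back in input order, names can be attached in one step
names(out) <- dat$nm
out[["k"]]
```

This relies on the results being returned in the same order as the inputs, which the original poster states is the case for foreach.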

---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On February 20, 2015 7:44:41 AM PST, Alexander Shenkin ashen...@ufl.edu wrote:
Hello all,

I've been trying to figure out how to return a named list from foreach.
 
Given that the order of the returned list is guaranteed to be in the 
order in which the object is passed to foreach, list members can be 
named afterwards.  However, I'm wondering if there's a better way to do

it, perhaps with some sort of combine function?

library(doParallel)
library(foreach)

cl <- makeCluster(4)
registerDoParallel(cl)

df = data.frame(nm = letters[11:20], a = 1:10, b=11:20)

out = foreach(i=1:nrow(df)) %dopar% {
 a = list(j = sqrt(df[i,]$a), k = sqrt(df[i,]$b))
 a
}

How do I name the elements of out using the corresponding values
df$nm?

thanks,
allie



Re: [R] problem in R

2015-02-20 Thread Sarah Goslee
First, please reply to the list, not just me.

On Fri, Feb 20, 2015 at 10:22 AM, thanoon younis
thanoon.youni...@gmail.com wrote:
 thank you very much for your help

 actually, my data set like this

 #Data Set
 testJAGSdata =  list(N1=200, N2=200, P=18,

R=structure(
   .Data=c(8.0, 1.0,1.0, 8.0),
   .Dim=c(2,2)),

 thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
 head(thd)
  [,1]   [,2]   [,3]   [,4]  [,5] [,6]
 [1,] -200 -2.517 -1.245 -0.444 0.848  200
 [2,] -200 -1.447 -0.420  0.119  1.245 200
 [3,] -200 -1.671 -0.869 -0.194  0.679 200
 [4,] -200 -1.642 -0.869 -0.293  0.332 200
 [5,] -200 -1.671 -0.827  0.052  0.756 200
 [6,] -200 -1.769 -1.098 -0.469  0.255 200
 [7,] -200 -1.490 -0.670 -0.082  0.880 200
 [8,] -200 -1.933 -0.880 -0.317  1.008 200
 [9,] -200  -1.587 -0.624  0.000  1.008 200
 [10,] -200 -1.983 -1.348 -0.348  1.045 200
 [11,] -200 -1.983 -1.229 -0.247  0.869 200
 [12,] -200 -2.262 -1.426  0.037  1.330 200
 [13,] -200 -2.371 -1.295 -0.224  0.651 200
 [14,] -200 -2.039 -1.112 -0.149  1.169 200
 [15,] -200 -2.262 -1.198 -0.309  1.198 200
 [16,] -200 -2.176 -1.537 -0.717  0.597 200
 [17,] -200 -1.447 -0.786  0.119  1.008 200
 [18,] -200 -2.039 -1.769 -0.661  0.642 200


 #Data Set
 testJAGSdata =  list(N1=200, N2=200, P=18,
 +
 +R=structure(
 +   .Data=c(8.0, 1.0,1.0, 8.0),
 +   .Dim=c(2,2)),
 +
 + thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
 +  head(the)

Did you do what I suggested and look at your commands?

R is expecting the rest of whatever your first command is supposed to
do, and it isn't complete. That's what the + prompt is trying to tell
you.

The first command ends with a , and doesn't have enough parentheses -
something is missing there.
After that, you continue trying to paste in R output to the console.

It looks to  me like you are working by copying and pasting from notes
that you don't understand. Maybe going back and rereading an
introduction to R would help you.
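
A small sketch of what the + prompt is reporting. The two command strings are made up; `parse()` is used here only to show that the first one is syntactically unfinished:

```r
# Ends with a comma and lacks a closing parenthesis, like the command in question
incomplete <- "x <- list(a = 1,"
# The same command, finished
complete <- "x <- list(a = 1, b = 2)"

# parse() fails on the unfinished command; at the console R would instead
# show '+' and wait for the rest of the input
msg <- tryCatch(
  { parse(text = incomplete); "parsed without error" },
  error = function(e) conditionMessage(e)
)
msg

# The finished command parses and evaluates normally
eval(parse(text = complete))
x
```

Anything typed at a + prompt is appended to the unfinished command, which is why pasted console output then produces a cascade of "unexpected" errors.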


 +  [,1]   [,2]   [,3]   [,4]  [,5] [,6]
 + [1,] -200 -2.517 -1.245 -0.444 0.848  200
 Error: unexpected numeric constant in:
  [,1]   [,2]   [,3]   [,4]  [,5] [,6]
 [1,] -200 -2.517 -1.245 -0.444 0.848
 [2,] -200 -1.447 -0.420  0.119  1.245 200
 Error: unexpected '[' in [
 [3,] -200 -1.671 -0.869 -0.194  0.679 200
 Error: unexpected '[' in [
 [4,] -200 -1.642 -0.869 -0.293  0.332 200
 Error: unexpected '[' in [
 [5,] -200 -1.671 -0.827  0.052  0.756 200
 Error: unexpected '[' in [
 [6,] -200 -1.769 -1.098 -0.469  0.255 200
 Error: unexpected '[' in [
 [7,] -200 -1.490 -0.670 -0.082  0.880 200
 Error: unexpected '[' in [
 [8,] -200 -1.933 -0.880 -0.317  1.008 200
 Error: unexpected '[' in [
 [9,] -200  -1.587 -0.624  0.000  1.008 200
 Error: unexpected '[' in [
 [10,] -200 -1.983 -1.348 -0.348  1.045 200
 Error: unexpected '[' in [
 [11,] -200 -1.983 -1.229 -0.247  0.869 200
 Error: unexpected '[' in [
 [12,] -200 -2.262 -1.426  0.037  1.330 200
 Error: unexpected '[' in [
 [13,] -200 -2.371 -1.295 -0.224  0.651 200
 Error: unexpected '[' in [
 [14,] -200 -2.039 -1.112 -0.149  1.169 200
 Error: unexpected '[' in [
 [15,] -200 -2.262 -1.198 -0.309  1.198 200
 Error: unexpected '[' in [
 [16,] -200 -2.176 -1.537 -0.717  0.597 200
 Error: unexpected '[' in [
 [17,] -200 -1.447 -0.786  0.119  1.008 200
 Error: unexpected '[' in [
 [18,] -200 -2.039 -1.769 -0.661  0.642 200
 Error: unexpected '[' in [


 Many thanks in advance

 On 20 February 2015 at 18:04, Sarah Goslee sarah.gos...@gmail.com wrote:

 Hi,

 On Fri, Feb 20, 2015 at 9:48 AM, thanoon younis
 thanoon.youni...@gmail.com wrote:
  Dear all members
 
  I have the following matrix in R
 
  thd <- matrix(testJAGSdata$thd, ncol=6, byrow=TRUE)
  head(thd)
   [,1]   [,2]   [,3]   [,4]  [,5] [,6]
  [1,] -200 -2.517 -1.245 -0.444 0.848  200
  [2,] -200 -1.447 -0.420  0.119  1.245 200
  [3,] -200 -1.671 -0.869 -0.194  0.679 200
  [4,] -200 -1.642 -0.869 -0.293  0.332 200
  [5,] -200 -1.671 -0.827  0.052  0.756 200
  [6,] -200 -1.769 -1.098 -0.469  0.255 200
  [7,] -200 -1.490 -0.670 -0.082  0.880 200
  [8,] -200 -1.933 -0.880 -0.317  1.008 200
  [9,] -200  -1.587 -0.624  0.000  1.008 200
  [10,] -200 -1.983 -1.348 -0.348  1.045 200
  [11,] -200 -1.983 -1.229 -0.247  0.869 200
  [12,] -200 -2.262 -1.426  0.037  1.330 200
  [13,] -200 -2.371 -1.295 -0.224  0.651 200
  [14,] -200 -2.039 -1.112 -0.149  1.169 200
  [15,] -200 -2.262 -1.198 -0.309  1.198 200
  [16,] -200 -2.176 -1.537 -0.717  0.597 200
  [17,] -200 -1.447 -0.786  0.119  1.008 200
  [18,] -200 -2.039 -1.769 -0.661  0.642 200

 This is not reproducible, so the below are guesses.

  and when i implemented this matrix i found this error

 I have no idea what "implemented this matrix" might mean.

 
  +  head(the)

 First problem: the + means R expects continuation of a previous line,
 because the command is incomplete. So whatever you did BEFORE this
 line is wrong.

  +  [,1]   [,2]   [,3]   [,4]  [,5] [,6]
  + [1,] -200 

[R] return named list from foreach

2015-02-20 Thread Alexander Shenkin
Hello all,

I've been trying to figure out how to return a named list from foreach.  
Given that the order of the returned list is guaranteed to be in the 
order in which the object is passed to foreach, list members can be 
named afterwards.  However, I'm wondering if there's a better way to do 
it, perhaps with some sort of combine function?

library(doParallel)
library(foreach)

cl <- makeCluster(4)
registerDoParallel(cl)

df = data.frame(nm = letters[11:20], a = 1:10, b=11:20)

out = foreach(i=1:nrow(df)) %dopar% {
 a = list(j = sqrt(df[i,]$a), k = sqrt(df[i,]$b))
 a
}

How do I name the elements of out using the corresponding values df$nm?

thanks,
allie



Re: [R] Raster Help

2015-02-20 Thread Simon Tarr
Hi John & everyone else,

My apologies for not providing you all with a good reproducible example. I
will get to work on this and reply in due course.

Thanks John for pointing me in the right direction with this.

Regards,

On 20 February 2015 at 16:06, John Kane jrkrid...@inbox.com wrote:

 Simon,
 You missed a key request from Sven.  He asked for data in dput() format.
 This is essential for dealing with many problems.  Do ?dput for info on the
 function, but essentially it lets the reader see your data exactly as you
 see it, unaffected by any special settings the reader may have for reading
 data into R.

 Here is a little example of dput output. Just copy and paste it into R to get
 the new data.frame dat1.

 dat1 <- structure(list(Observation = c("A", "B", "C", "D", "E", "F",
 "G", "H"), Participant.ID = c(1L, 1L, 2L, 3L, 3L, 4L, 4L, 5L),
 Video.Coder = c("Donald", "Tracy", "Donald", "Sam", "Tracy",
 "Donald", "Tracy", "Sam"), Score = c(4L, 5L, 6L, 2L, 3L,
 2L, 1L, 8L)), .Names = c("Observation", "Participant.ID",
 "Video.Coder", "Score"), class = "data.frame", row.names = c(NA,
 -8L))

 See these for some hints on asking questions.
 https://github.com/hadley/devtools/wiki/Reproducibility

 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example


 John Kane
 Kingston ON Canada


  -Original Message-
  From: simon.t...@adtrak.co.uk
  Sent: Fri, 20 Feb 2015 15:45:04 +
  To: sven.temp...@gmail.com
  Subject: Re: [R] Raster Help
 
  Hi Sven,
 
  Many thanks for the reply and my apologies for not posting any code. So
  far, I have been able to write this (but it's very basic and just getting
  me to the 'complicated' stage).
 
  setwd("C:\\Users\\simon.tarr\\Documents\\GIS\\Test Data")
  require(raster)
  require(rgdal)
  revenue<-read.table("revenue.csv",header=T,row.names=1,sep=",")
  postcodes<-raster("C:\\Users\\simon.tarr\\Documents\\GIS\\Test Data\\rasters\\postcodes\\postcodes.img")
  trim(postcodes)
  plot(postcodes)
 
  I have attached a .csv file that contains my revenue data (this is
  actually
  just made up data- I wanted to make sure I could get the mapping to work
  before I start handling large quantities of real data).
 
  As I mentioned, the raster contains the same list of postcode names that
  appear in the CSV. So I need to somehow 'attach' the revenue figures to
  each postcode in the raster and then plot this.
 
  I hope this makes sense and apologies for the loose language...it's the
  only way I can think of to describe it.
 
  I'm trying hard to learn R and its syntax but sometimes I get stuck. I
  often know what needs to be done but struggle to write the necessary code
  to make it happen.
 
  All the best,
 
  Simon
 
  On 19 February 2015 at 20:37, Sven E. Templer sven.temp...@gmail.com
  wrote:
 
  Without (example) code it is hard to follow... use ?dput to present
  some data (subset).
  But if it is data.frames you are dealing with (for sure with read.csv,
  but not so sure at all with raster maps), give this a try:
 
  ?merge
 
  On 19 February 2015 at 17:44, Simon Tarr simon.t...@adtrak.co.uk
  wrote:
  Hello everyone,
 
  I need a little help with some R syntax to complete what (I think) is a
  fairly straightforward task- hopefully someone can assist!
 
  I have a raster map of the UK which is split into postcode areas (e.g.
  DE,
  NG, NR etc. 127 postcodes in total).
 
  I have installed the package 'raster' and have successfully plotted the
  .img in R. All working and looks correct with the raster.
 
  I also have a comma delimited CSV file containing the same postcodes as
  the
  raster with another column next to it containing revenue for each
  postcode.
 
  *I was wondering if someone could help me merge/bind the revenue
  figures
  into the correct postcode in the raster so that I can plot revenue per
  postcode.*
 
  I feel I should be using cbind and reclassify to do this but I can't be
  sure.
 
  Any help would be appreciated. Thanks in advance!
 
  Simon
 

Re: [R] irregular sequence of events

2015-02-20 Thread jim holtman
Here is a solution using the sqldf package:


> require(sqldf)
> timeline <- data.frame(time = 1:30)
> events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = c("start",
+ "end"), row.names = c(NA, -3L), class = "data.frame")
> # add event number
> events$num <- paste0("A", seq(nrow(events)))
> events
  start end num
1     5  10  A1
2    13  16  A2
3    21  27  A3
>
> sqldf("
+ select t.*, e.num
+ from timeline t
+ left join (
+ select t.*, e.num
+ from timeline t, events e
+ where t.time between e.start and e.end) as e
+ on t.time = e.time
+ ")
   time  num
1     1 <NA>
2     2 <NA>
3     3 <NA>
4     4 <NA>
5     5   A1
6     6   A1
7     7   A1
8     8   A1
9     9   A1
10   10   A1
11   11 <NA>
12   12 <NA>
13   13   A2
14   14   A2
15   15   A2
16   16   A2
17   17 <NA>
18   18 <NA>
19   19 <NA>
20   20 <NA>
21   21   A3
22   22   A3
23   23   A3
24   24   A3
25   25   A3
26   26   A3
27   27   A3
28   28 <NA>
29   29 <NA>
30   30 <NA>




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
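
For comparison, a base R sketch of the same labelling without SQL. It assumes, as in the question quoted below, that the events do not overlap and that the labels are simply "A" followed by the event's row number:

```r
x <- 1:30
events <- data.frame(start = c(5, 13, 21), end = c(10, 16, 27))

# Start with all days unlabelled, then fill in each event's interval
event <- rep(NA_character_, length(x))
for (k in seq_len(nrow(events))) {
  inside <- x >= events$start[k] & x <= events$end[k]
  event[inside] <- paste0("A", k)
}
event <- factor(event)
table(event, useNA = "ifany")
```

The loop runs once per event rather than once per day, so it stays cheap even for a long vector of dates.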


On Fri, Feb 20, 2015 at 9:27 AM, PIKAL Petr petr.pi...@precheza.cz wrote:
 Dear all

 I know I am missing something obvious but after few hours of trials I ask for 
 some help.

 I have some sequence of values (days)
 x <- 1:30

 and an indication of event start and end day
 mimo<-c(5,10, 13,16, 21,27)

 or

 events <- structure(list(start = c(5, 13, 21), end = c(10, 16, 27)), .Names = 
 c("start",
 "end"), row.names = c(NA, -3L), class = "data.frame")

 I need to get a factor indicating event

 event <- c(rep(NA, 4), rep("A1", 6), rep(NA, 2), rep("A2", 4), rep(NA, 4), 
 rep("A3", 7), rep(NA,3))
 factor(event)

 In such small example I can do it manually but I have a long vector of dates 
 and would like to use start and end day of events either from mimo vector or 
 from events data frame.

 Is there any function which does it automagically? I know I have seen it 
 before but I cannot find it now.

 Best regards
 Petr

 

Re: [R] Raster Help

2015-02-20 Thread Simon Tarr
Hi Sven,

Many thanks for the reply and my apologies for not posting any code. So
far, I have been able to write this (but it's very basic and just getting
me to the 'complicated' stage).

setwd("C:\\Users\\simon.tarr\\Documents\\GIS\\Test Data")
require(raster)
require(rgdal)
revenue<-read.table("revenue.csv",header=T,row.names=1,sep=",")
postcodes<-raster("C:\\Users\\simon.tarr\\Documents\\GIS\\Test Data\\rasters\\postcodes\\postcodes.img")
trim(postcodes)
plot(postcodes)

I have attached a .csv file that contains my revenue data (this is actually
just made up data- I wanted to make sure I could get the mapping to work
before I start handling large quantities of real data).

As I mentioned, the raster contains the same list of postcode names that
appear in the CSV. So I need to somehow 'attach' the revenue figures to
each postcode in the raster and then plot this.

I hope this makes sense and apologies for the loose language...it's the
only way I can think of to describe it.

I'm trying hard to learn R and its syntax but sometimes I get stuck. I
often know what needs to be done but struggle to write the necessary code
to make it happen.

All the best,

Simon

On 19 February 2015 at 20:37, Sven E. Templer sven.temp...@gmail.com
wrote:

 Without (example) code it is hard to follow... use ?dput to present
 some data (subset).
 But if it is data.frames you are dealing with (for sure with read.csv,
 but not so sure at all with raster maps), give this a try:

 ?merge
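
A sketch of the merge idea on plain data frames, under assumptions the thread does not confirm: that each raster cell holds an integer ID, and that a lookup table (`rat`, hypothetical) maps those IDs to postcode areas. A small matrix stands in for the raster's cell values:

```r
# Hypothetical ID-to-postcode table and made-up revenue figures
rat <- data.frame(ID = 1:3, postcode = c("DE", "NG", "NR"),
                  stringsAsFactors = FALSE)
revenue <- data.frame(postcode = c("NR", "DE", "NG"),
                      revenue = c(300, 100, 200),
                      stringsAsFactors = FALSE)

# Join revenue onto the IDs by postcode name
lookup <- merge(rat, revenue, by = "postcode", all.x = TRUE)

# Toy 2 x 3 grid of cell IDs standing in for the raster values
cells <- matrix(c(1, 1, 2, 3, 3, 2), nrow = 2)

# Replace each cell ID by the revenue of its postcode
rev_cells <- matrix(lookup$revenue[match(cells, lookup$ID)],
                    nrow = nrow(cells))
rev_cells
```

On a real RasterLayer the substitution step would be done with a raster function such as subs() or reclassify() rather than match(); their help pages describe the expected lookup format.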

 On 19 February 2015 at 17:44, Simon Tarr simon.t...@adtrak.co.uk wrote:
  Hello everyone,
 
  I need a little help with some R syntax to complete what (I think) is a
  fairly straightforward task- hopefully someone can assist!
 
  I have a raster map of the UK which is split into postcode areas (e.g.
 DE,
  NG, NR etc. 127 postcodes in total).
 
  I have installed the package 'raster' and have successfully plotted the
  .img in R. All working and looks correct with the raster.
 
  I also have a comma delimited CSV file containing the same postcodes as
 the
  raster with another column next to it containing revenue for each
 postcode.
 
  *I was wondering if someone could help me merge/bind the revenue figures
  into the correct postcode in the raster so that I can plot revenue per
  postcode.*
 
  I feel I should be using cbind and reclassify to do this but I can't be
  sure.
 
  Any help would be appreciated. Thanks in advance!
 
  Simon
 


[R] Issue:CCC Doesn't Match in R and SAS

2015-02-20 Thread sagnik chakravarty
Hi,

I was trying to calculate the CCC metric in R with the help of NbClust
source code and the available SAS manual for CCC. All the Pseudo-F and
R-square are matching exactly with the SAS output except for the E_R2 and
hence CCC. I have tried and tested in multiple ways but couldn’t get any
explanation for this.

I have attached sample data, initial seed and also the SAS cluster output
in Rdata format which I used for E_R2 and CCC calculation.

FYI, following are the values of E_R2 in SAS and R respectively:

E_R2=0.4630339 (R); but ERSQ=0.3732597284 (SAS)

Could you please help me out with finding what's going wrong in the
background?

-

Kindly find below the codes I have used for this:

--

SAS

--

proc fastclus data=sample_data

maxiter=100 seed=initial_seed maxc=5 outstat=metrics out=output;

Var v1 v2 v3 v4 v5 v6;

run;



--

R

--

load("C:\\Users\\sagnik\\Desktop\\SAS_cluster.Rdata")

clust.perf.metrics <- function(data, cl) {
  data1 <- as.matrix(data)
  numberObsBefore <- dim(data1)[1]
  data <- na.omit(data1)
  nn <- numberObsAfter <- dim(data)[1]
  pp <- dim(data)[2]
  qq <- max(cl)
  TT <- t(data) %*% data
  sizeEigenTT <- length(eigen(TT)$value)
  eigenValues <- eigen(TT/(nn - 1))$value
  for (i in 1:sizeEigenTT) {
    if (eigenValues[i] < 0) {
      cat(paste("There are only", numberObsAfter, "non-missing observations out of a possible",
                numberObsBefore, "observations."))
      stop("The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.")
    }
  }

  s1 <- sqrt(eigenValues)
  ss <- rep(1, sizeEigenTT)
  for (i in 1:sizeEigenTT) {
    if (s1[i] != 0)
      ss[i] = s1[i]
  }
  vv <- prod(ss)
  z <- matrix(0, ncol = qq, nrow = nn)
  clX <- as.matrix(cl)
  for (i in 1:nn)
    for (j in 1:qq) {
      z[i, j] == 0
      if (clX[i, 1] == j) z[i, j] = 1
    }
  xbar <- solve(t(z) %*% z) %*% t(z) %*% data
  B <- t(xbar) %*% t(z) %*% z %*% xbar
  W <- TT - B
  R2 <- 1 - (sum(diag(W))/sum(diag(TT)))
  PseudoF <- (sum(diag(B))/(qq-1))/(sum(diag(W))/(nn-qq))

  v1 <- 1
  u1 <- rep(0, pp)
  c1 <- (vv/qq)^(1/pp)
  u1 <- ss/c1
  k1 <- sum((u1 >= 1) == TRUE)
  p1 <- min(k1, qq - 1)

  if (all(p1 > 0, p1 < pp)) {
    for (i in 1:p1) { v1 <- v1 * ss[i]}
    c <- (v1/qq)^(1/p1)
    u <- ss/c
    b1 <- sum(1/(nn + u[1:p1]))
    b2 <- sum(u[(p1 + 1):pp]^2/(nn + u[(p1 + 1):pp]), na.rm = TRUE)
    E_R2 <- 1 - ((b1 + b2)/sum(u^2)) * ((nn - qq)^2/nn) * (1 + (4/nn))
    ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * p1/2)/((0.001 + E_R2)^1.2))
  } else {
    b1 <- sum(1/(nn + u))
    E_R2 <- 1 - (b1/sum(u^2)) * ((nn - qq)^2/nn) * (1 + 4/nn)
    ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * pp/2)/((0.001 + E_R2)^1.2))
  }
  results <- list(R_2=R2, PseudoF=PseudoF, CCC = ccc, E_R2=E_R2);
  return(results)
}

clust.perf.metrics(output[,1:6],output[,7])
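
For reference when comparing against the SAS output, the quantities computed in the first branch of the R function above can be written out as follows. This is only a transcription of the posted code, with $n$ = nn, $q$ = qq, $p$ = pp, $p^*$ = p1 and $u_j$ = u[j]; it is not taken from the SAS documentation and may differ from what PROC FASTCLUS does internally:

```latex
E(R^2) = 1 - \left[
  \frac{\sum_{j=1}^{p^*} \frac{1}{n + u_j}
        + \sum_{j=p^*+1}^{p} \frac{u_j^2}{n + u_j}}
       {\sum_{j=1}^{p} u_j^2}
\right]
\frac{(n - q)^2}{n} \left(1 + \frac{4}{n}\right)

\mathrm{CCC} = \ln\!\left[\frac{1 - E(R^2)}{1 - R^2}\right]
  \frac{\sqrt{n p^* / 2}}{\left(0.001 + E(R^2)\right)^{1.2}}
```

Since R2 and Pseudo-F already agree with SAS, any difference has to enter through $u_j$ or $p^*$, that is, through the eigenvalues of TT/(nn - 1) and the scaling constant c.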

#---

THANKS IN ADVANCE,

REGARDS,

SAGNIK

[R] How do I access a specific element of a multi-dimensional list

2015-02-20 Thread Knut Hansen
Dear list,

Let's say I have setup the following list:
a = c(2, 3, 5) 
b = c("aa", "bb", "cc") 
c = c(TRUE, FALSE, TRUE) 

x = list(a, b, c)

I want to access the first second-dimension element of each first-dimension 
element (that is, the first value of each vector in the list) so that the 
result is something like:
(2, "aa", TRUE)

In my real life problem the list is about 350 elements in the first dimension 
so the solution must handle that.

Sincerely
Knut Hansen



[R] Split a dataframe by rownames and/or colnames

2015-02-20 Thread Tim Richter-Heitmann

Dear List,

Consider this example

df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", 
"3_o1")

row.names(df) <- names(df)


indx <- gsub(".*_", "", names(df))

I can split the dataframe by the index that is given in the column names 
after the underscore "_".


list2env(
  setNames(
    lapply(split(colnames(df), indx), function(x) df[x]),
    paste('df', sort(unique(indx)), sep="_")),
  envir=.GlobalEnv)

However, I changed my mind and now want to do it by rownames. Exchanging 
colnames with rownames does not work; it gives the exact same output (9 
rows x 3 columns). I could do

as.data.frame(t(df_x)),
but maybe that is not elegant.
What would be the solution for splitting the dataframe by rows?

Thank you very much!

--
Tim Richter-Heitmann



Re: [R] How do I access a specific element of a multi-dimensional list

2015-02-20 Thread jim holtman
try this:

> a = c(2, 3, 5)
> b = c("aa", "bb", "cc")
> c = c(TRUE, FALSE, TRUE)
>
> x = list(a, b, c)
> x
[[1]]
[1] 2 3 5

[[2]]
[1] "aa" "bb" "cc"

[[3]]
[1]  TRUE FALSE  TRUE

> sapply(x, '[[', 1)
[1] "2"    "aa"   "TRUE"
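
A small follow-up sketch, in case the mixed types matter: sapply() simplifies its result to one atomic vector, so the number and the logical value end up as character strings. Assuming each first element should keep its own type, lapply() returns a list instead. The names firsts_chr and firsts_lst are only illustrative.

```r
x <- list(c(2, 3, 5), c("aa", "bb", "cc"), c(TRUE, FALSE, TRUE))

# sapply() simplifies to one atomic vector, so everything becomes character
firsts_chr <- sapply(x, "[[", 1)

# lapply() keeps a list, so each first element keeps its own type
firsts_lst <- lapply(x, "[[", 1)

str(firsts_lst)
```

Either call scales to a list of 350 components without changes, since both simply apply "[[" to every component.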


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.




[R] Windows7, latest R-Studio, newb, how to display 1 column name from a data frame.

2015-02-20 Thread Rod Meriwether
For class, I'm supposed to return data with the ID and the value.
I'm returning just the correct value. Here's the code and the output.
complete <- function(directory, id) {
  nobs <- data.frame()
  files_list <- list.files(directory, full.names=TRUE)
  dat <- data.frame()

  for (i in id){
    dat <- (read.csv(files_list[i]))

    nobs <- sum(complete.cases(dat))
    print(nobs)

  }
}

The values below are correct, but I don't have the id in front of each
sum.
Any help?
> complete("specdata", c(2,4,8,10,12))
[1] 1041
[1] 474
[1] 192
[1] 148
[1] 96
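
One possible direction, offered only as a sketch: instead of printing each count inside the loop, collect one count per id and return them together with the ids in a data frame. The function name complete_sketch, the column names id and nobs, and the two tiny CSV files are assumptions made up for this illustration; they are not the course data.

```r
# Sketch: one complete-case count per id, returned next to the ids.
complete_sketch <- function(directory, id) {
  files_list <- list.files(directory, full.names = TRUE)
  nobs <- vapply(id, function(i) {
    dat <- read.csv(files_list[i])
    sum(complete.cases(dat))
  }, integer(1))
  data.frame(id = id, nobs = nobs)
}

# Self-contained demo with two small made-up CSV files.
demo_dir <- tempfile("specdata_demo")
dir.create(demo_dir)
write.csv(data.frame(x = c(1, NA, 3), y = c(1, 2, 3)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(x = c(1, 2, 3), y = c(NA, NA, 3)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

res <- complete_sketch(demo_dir, 1:2)
print(res)
```

Returning the data frame (rather than printing inside the loop) also means the result can be stored and reused by the caller.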



Re: [R] Example of Calling a DLL

2015-02-20 Thread Jeff Newmiller
This is off-topic here (read the posting guide). You would probably proceed 
most effectively by studying how GCC interacts with VS object code, e.g. [1], 
and studying the Writing R Extensions manual.

[1] 
http://stackoverflow.com/questions/8683046/compatibility-of-dll-a-lib-def-between-visualstudio-and-gcc
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On February 20, 2015 9:11:45 PM PST, Alex Restrepo alex_restr...@hotmail.com 
wrote:
All, 

I'm a newbie to R and am interested in seeing a simple example of
calling a 3rd party Visual Studio generated DLL from RStudio.  Does
anyone have a simple example which also walks through the preliminary
steps of setting up the INCLUDE path and the library path to either a
DLL or LIB file?  I have tried to find an easy example, but thus far
have had no luck finding an example using Rcpp to communicate with a 3rd party
Visual Studio DLL.

Many Thanks in Advance, Alex


Re: [R] creating a distinct zip file

2015-02-20 Thread Jeff Newmiller
On Windows it builds a zip file. If you are on Linux, you might [1] need [2].

[1] https://stat.ethz.ch/pipermail/r-help/2005-January/063596.html
[2] http://cran.r-project.org/doc/contributed/cross-build.pdf
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On February 20, 2015 6:56:34 PM PST, Rolf Turner r.tur...@auckland.ac.nz 
wrote:
On 21/02/15 15:02, Jeff Newmiller wrote:
 R CMD INSTALL --build packagename

That will create a *.tar.gz file, not a *.zip file.  The latter being
what Erin wanted, if I understand correctly.

I have worked around the problem in the past with a shell script
like unto:

#! /bin/csh
set vnum = `grep Version $pkge/DESCRIPTION | sed -e 's/Version: //'`
R CMD INSTALL -l Lib $pkge > /dev/null
cd Lib
zip -r -l $pkge.zip $pkge > /dev/null
mv $pkge.zip ../${pkge}_$vnum.zip

In the foregoing pkge is the name of the package you are trying to 
build.  You will have to have created the holding library Lib a
priori.

There are doubtless (much) better ways of accomplishing this task, but
I 
don't know them.

cheers,

Rolf



 On February 20, 2015 1:07:10 PM PST, Erin Hodgess
erinm.hodg...@gmail.com wrote:
 Hello yet again.

 I am trying to create a zip file for a friend who has a Windows
 machine.

 He needs to access this via the local zip file packages option.

 When I use R CMD INSTALL --compile-both, it produces an item in the
 library
 tree (as promised).

 However, I would like to have an actual .zip file.

 I do know at one time that was possible, not sure if I can still do
it.

 I did try R CMD INSTALL --force-biarch as well, same result as
compile
 both.

 thank you for any suggestions.

 Sincerely,
 Erin



Re: [R] creating a distinct zip file

2015-02-20 Thread Prof Brian Ripley

On 21/02/2015 07:31, Jeff Newmiller wrote:

On Windows it builds a zip file. If you are on Linux, you might [1] need [2].

[1] https://stat.ethz.ch/pipermail/r-help/2005-January/063596.html
[2] http://cran.r-project.org/doc/contributed/cross-build.pdf


But the first is from 2005 and the second is invalid (it should be
http://cran.r-project.org/doc/contrib/cross-build.pdf and describes R 
1.7.x: what it describes is no longer supported).


For some time it has been possible to install a package without compilable
sources on any R platform, so the tarball would be all that is needed.  If
compilation is needed, submit to winbuilder to make a .zip file.


See also
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#How-can-I-get-a-binary-version-of-a-package_003f






--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK



[R] trouble with .Rd file

2015-02-20 Thread Erin Hodgess
Hello everyone!

I've been messing with this .Rd file and am having a forest/trees problem by
now.

Here is the section of the .Rd file that is the troublemaker:

\usage{
plot.fore4nodate(y, sim, dates, date.fmt = "%Y-%m-%d", gof.leg = FALSE,
gof.digits = 2, legend = " ", leg.cex = 1, bands.col = "lightblue", border
= NA,
tick.tstep = "auto", lab.tstep = "auto", lab.fmt = NULL, main = NULL,
cal.ini = NA,
val.ini = NA, xlab = "Time",  ylab = " ", ylim, col = c("black", "blue"),
type = c("lines", "lines"), cex = c(1.8, 1.8), cex.axis = 1.8, ex.lab = 1.8,
lwd = c(2.5, 2.5), y = 1:2, pch = c(1, 9), cex.main = 2.1, lasa = 1, mt1 =
1.27, ...)
}


Now the error part:

c:\Progra~1\R\R-3.0.2\bin\x64\Rcmd build ts1

* checking for file 'ts1/DESCRIPTION' ... OK
* preparing 'ts1':
* checking DESCRIPTION meta-information ... OK
Warning: newline within quoted string at plot.fore4nodate.Rd:10
Error in parse_Rd
(C:/Users/hodgesse/AppData/Local/Temp/Rt../ts1/man/plot.fore4nodate.RD,
:
  Unexpected end of input in (in " quoted string opened at
plot.fore4nodate.Rd:14.26)
Execution halted


The 14.26 would be at the word "dates" in the first line of the
plot.fore4nodate line.

This is making me a little nuts.  Actually a lot nuts.

If anyone can see anything, I would really appreciate any suggestions.

Sincerely,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] trouble with .Rd file

2015-02-20 Thread Duncan Murdoch
On 20/02/2015 12:58 PM, Erin Hodgess wrote:

Generally percent symbols (%) need to be escaped in Rd files.  So the
default value for date.fmt should be entered as "\%Y-\%m-\%d".
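
To see the effect outside a package build, here is a minimal sketch that writes two throwaway Rd files and runs tools::parse_Rd() on each. The function name f, the helper rd_lines, and the shortened usage line are made up for the illustration; only the escaping of % is the point.

```r
# Build the lines of a tiny Rd file around a single usage line.
rd_lines <- function(usage_line) {
  c("\\name{f}", "\\alias{f}", "\\title{Demo}",
    "\\usage{", usage_line, "}")
}

good_file <- tempfile(fileext = ".Rd")
bad_file  <- tempfile(fileext = ".Rd")
writeLines(rd_lines("f(date.fmt = \"\\%Y-\\%m-\\%d\", gof.leg = FALSE)"), good_file)
writeLines(rd_lines("f(date.fmt = \"%Y-%m-%d\", gof.leg = FALSE)"), bad_file)

# With escaped percent signs the file should parse cleanly.
good <- tools::parse_Rd(good_file)
class(good)

# Without escaping, everything after the first % is read as a comment, so the
# quoted string is never closed; the call is expected to fail or warn.
bad <- tryCatch(tools::parse_Rd(bad_file),
                error   = function(e) conditionMessage(e),
                warning = function(w) conditionMessage(w))
print(bad)
```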

Duncan Murdoch



[R] Chi-square test

2015-02-20 Thread pari hesabi
Hello,
If the vector of observed frequencies is:
f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
and the vector of probabilities is:
p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02,
6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01,
1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)
The sum of the probabilities is equal to one. But when I want to do the
Chi-square test, I get this error: probabilities must sum to one.
Does anybody know the reason?
Best Regards,
pari
  

Re: [R] Split a dataframe by rownames and/or colnames

2015-02-20 Thread Bert Gunter
I think

?tapply

and friends: ?by ?aggregate  ?ave

is what you want.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll






Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread Charles C. Berry

On Fri, 20 Feb 2015, Aron Lindberg wrote:


 Hmm…Chuck’s solution may actually be problematic because there are several
 entries which at the deepest level are called “sha”, but that should not be
 included, such as:

 input[[67]]$content[[1]]$commit$tree$sha

 and

 input[[67]]$content[[1]]$parents[[1]]$sha

 it’s only the “sha” that fit the following subsetting pattern that should be
 included:

 input[[i]]$content[[1]]$sha[1]

This should be straightforward. Look at what grepl() is doing.

And look at what names(unlist(input)) yields.

You can either write a regular expression to handle this (perhaps
"content.sha$") or write other grepl() expressions to select (or get rid
of) the desired (or unwanted) pattern.


See ?grepl and the page on regular expressions referenced there.
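
A minimal sketch of that suggestion on a made-up miniature of the structure (the sha strings and the two-entry input below are invented, not taken from the gist). The idea is to look at the flattened names first and then keep only those that end in "content.sha", which excludes the commit$tree$sha and parents[[1]]$sha entries.

```r
# Made-up miniature of the structure described in the thread
input <- list(
  list(content = list(list(
    sha     = "aaa111",
    commit  = list(tree = list(sha = "bbb222")),
    parents = list(list(sha = "ccc333"))
  ))),
  list(content = list(list(
    sha     = "ddd444",
    commit  = list(tree = list(sha = "eee555")),
    parents = list(list(sha = "fff666"))
  )))
)

flat <- unlist(input)
names(flat)   # inspect the flattened names before writing the pattern

# Keep only the entries whose flattened name ends in "content.sha"
shas <- unname(flat[grepl("content\\.sha$", names(flat))])
shas
```

The dot is escaped here so that it matches a literal "."; on real data it is worth checking names(unlist(input)) first, since deeper or differently named levels would change the pattern needed.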

HTH,

Chuck

Re: [R] Chi-square test

2015-02-20 Thread Berend Hasselman

 On 20-02-2015, at 19:05, pari hesabi statistic...@hotmail.com wrote:
 
 Hello,
 If the vector of observed frequencies is:  
 f-c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
 and the vector of probability :p11-c(7.577864e-06, 1.999541e-04  
 ,1.833510e-03,  9.059845e-03, 2.886977e-02, 6.546229e-02 ,1.124083e-01, 
 1.525880e-01, 1.689712e-01, 1.563522e-01,   1.232031e-01, 8.395000e-02, 
 5.009534e-02, 2.645857e-02,0.0205403)
 The sum of the probabilities is equal to one. But when I want to do the the 
 Chi-square test, I get this error: probabilities must sum to one.

print(sum(p11) - 1)

 Does anybody know the reason?

R FAQ 7.31  (http://cran.r-project.org/doc/FAQ/R-FAQ.html)

Berend

 Best Regards,
 pari
 


Re: [R] trouble with .Rd file

2015-02-20 Thread Erin Hodgess
Ah!  Perfect.  Thanks so much!

Sincerely,
Erin



-- 
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] Chi-square test

2015-02-20 Thread David Winsemius

On Feb 20, 2015, at 10:05 AM, pari hesabi wrote:

 Hello,
 If the vector of observed frequencies is:  
 f-c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
 and the vector of probability :p11-c(7.577864e-06, 1.999541e-04  
 ,1.833510e-03,  9.059845e-03, 2.886977e-02, 6.546229e-02 ,1.124083e-01, 
 1.525880e-01, 1.689712e-01, 1.563522e-01,   1.232031e-01, 8.395000e-02, 
 5.009534e-02, 2.645857e-02,0.0205403)
 The sum of the probabilities is equal to one.

Well, the sum is close to 1.0 but not exact. There's a simple fix:

> sum(p11)==1
[1] FALSE
> sum( p11/sum(p11) )==1
[1] TRUE

  But when I want to do the the Chi-square test, I get this error: 
 probabilities must sum to one.
 Does anybody know the reason?

Numerical accuracy. See R-FAQ 7.31
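
For what it is worth, a short sketch of how far off the sum is. The tolerance used below, sqrt(.Machine$double.eps), is my assumption about the order of magnitude chisq.test() checks against; treat the exact threshold as unverified here. The variable names gap, tol and p are illustrative.

```r
p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02,
         6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01,
         1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)

# How far is the sum from 1, and is that beyond a typical tolerance?
gap <- abs(sum(p11) - 1)
tol <- sqrt(.Machine$double.eps)   # about 1.5e-08
gap > tol

# Rescale so the probabilities sum to 1 up to rounding error
p <- p11 / sum(p11)
isTRUE(all.equal(sum(p), 1))
```

The gap comes from the probabilities having been typed in with only seven significant digits, so rescaling (or passing the unrounded values) is the natural fix.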

-- 

David Winsemius
Alameda, CA, USA



Re: [R] Chi-square test

2015-02-20 Thread David L Carlson
And probably why chisq.test has the rescale.p= argument. Your second problem 
with small expected values can be handled with simulate.p.value=.

> chisq.test(f, p=p11)
Error in chisq.test(f, p = p11) : probabilities must sum to 1.
> 1-sum(p11)
[1] 4.3036e-08
> chisq.test(f, p=p11, rescale.p=TRUE)

        Chi-squared test for given probabilities

data:  f
X-squared = 7.6268, df = 14, p-value = 0.9078

Warning message:
In chisq.test(f, p = p11, rescale.p = TRUE) :
  Chi-squared approximation may be incorrect
> chisq.test(f, p=p11, rescale.p=TRUE, simulate.p.value=TRUE)

        Chi-squared test for given probabilities with simulated p-value (based
        on 2000 replicates)

data:  f
X-squared = 7.6268, df = NA, p-value = 0.7996

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread David Winsemius

On Feb 20, 2015, at 6:13 AM, Aron Lindberg wrote:

 Hmm…Chuck’s solution may actually be problematic because there are several 
 entries which at the deepest level are called “sha”, but that should not be 
 included, such as:
 
 input[[67]]$content[[1]]$commit$tree$sha
 
 
 and
 
 input[[67]]$content[[1]]$parents[[1]]$sha
 
 it’s only the “sha” that fit the following subsetting pattern that should be 
 included:
 
 
 input[[i]]$content[[1]]$sha[1]
 
 
 It’s getting thornier!
 
 To be fair to Rolf’s solution (which probably can be updated to solve the 
 problem), I’ve posted the complete dput here:
 
 https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R

I didn't try on the larger example, but this works on the smaller one:

> get_shas <- function(input){
    x <- lapply(input, "[[", "content")
    y <- lapply(x, "[[", 1)
    z <- lapply(y, function(yy) if( "sha" %in% names(yy) ){ yy[["sha"]] })
  }
> sha_lists <- get_shas(input)

It does deliver an entry for every top-level element of the input object, which
is either the value of "sha" or NULL. I think that is not a bad thing because it
lets you figure out where the values are coming from.
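
For the larger dataset, where content[[1]] is reported to be sometimes a list, sometimes a character vector, and content sometimes an empty list(), a more defensive variant might look like the sketch below. The helper name get_sha_safe and the three-entry input are made up for illustration; I have not run this on the full gist.

```r
# Made-up input covering the three cases mentioned in the thread
input <- list(
  list(content = list(list(sha = "aaa111",
                           commit = list(tree = list(sha = "bbb222"))))),
  list(content = list("just a character vector")),
  list(content = list())
)

# Return the top-level sha of content[[1]], or NA when it is not available
get_sha_safe <- function(el) {
  content <- el[["content"]]
  if (length(content) == 0L) return(NA_character_)
  first <- content[[1]]
  if (is.list(first) && !is.null(first[["sha"]])) {
    first[["sha"]][1]
  } else {
    NA_character_
  }
}

shas <- vapply(input, get_sha_safe, character(1))
shas
```

Using NA as the placeholder keeps the result the same length as input, so positions still line up with the original entries.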

 

David Winsemius
Alameda, CA, USA



Re: [R] Split a dataframe by rownames and/or colnames

2015-02-20 Thread David Winsemius

On Feb 20, 2015, at 9:33 AM, Tim Richter-Heitmann wrote:


The split.data.frame method seems to work perfectly well with a
rownames-derived index argument:

> split(df, sub(".+_", "", rownames(df) ) )
$`1`
  c_1   d_1  e_1   a_p   b_p   c_p  1_o1 2_o1  3_o1
c_1 -0.11 -0.04 1.33 -0.87 -0.16 -0.25 -0.75 0.34  0.14
d_1 -0.62 -0.94 0.80 -0.78 -0.70  0.74  0.11 1.44 -0.33
e_1  0.98 -0.83 0.48  0.19 -0.32 -1.01  1.28 1.04 -2.16

$o1
   c_1   d_1   e_1   a_p   b_p   c_p  1_o1  2_o1  3_o1
1_o1 -0.93 -0.02  0.69 -0.67  1.04  1.04 -1.50 -0.36  0.50
2_o1  0.02 -0.16 -0.09 -1.50 -0.02 -1.04  1.07 -0.45  1.56
3_o1 -1.42  0.88 -0.05  0.85 -1.35  0.21  1.35  0.92 -0.76

$p
  c_1   d_1   e_1   a_p  b_p   c_p  1_o1  2_o1  3_o1
a_p -1.35  0.91 -0.58 -0.63 0.94 -1.13  0.71  0.25  0.82
b_p -0.25 -0.73 -0.41 -1.71 1.28  0.19 -0.35  1.74 -0.93
c_p -0.01 -1.11 -0.12  0.58 1.51  0.03 -0.99 -0.23 -0.03
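
As a self-contained sketch of both directions side by side (the seed is arbitrary and only there to make the demo repeatable; grp, by_rows and by_cols are illustrative names):

```r
set.seed(1)  # arbitrary seed, only so the demo is repeatable
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)

# Group label = whatever follows the underscore
grp <- sub(".+_", "", names(df))

# Split by rows: split() on a data frame divides its rows
by_rows <- split(df, grp)

# Split by columns: split the column names, then index the data frame
by_cols <- lapply(split(colnames(df), grp), function(x) df[x])

sapply(by_rows, dim)
sapply(by_cols, dim)
```

Because the row names and column names are identical in this example, the same grp vector serves for both splits; with real data the row-wise index would be derived from rownames(df) as shown above.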

 
-- 

David Winsemius
Alameda, CA, USA



Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread William Dunlap
The elNamed(x, name) function can simplify this code a bit.  The following
gives the same result as David W's get_shas() for the sample dataset provided:

   get_shas2 <- function (input) {
      lapply(input, function(el) elNamed(elNamed(el, "content")[[1]],
         "sha")[1])
   }

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Feb 20, 2015 at 10:56 AM, David Winsemius dwinsem...@comcast.net
wrote:


 On Feb 20, 2015, at 6:13 AM, Aron Lindberg wrote:

  Hmm… Chuck’s solution may actually be problematic, because several entries
 that are called “sha” at the deepest level should not be included, such as:
 
  input[[67]]$content[[1]]$commit$tree$sh
 
 
  and
 
  input[[67]]$content[[1]]$parents[[1]]$sha
 
  Only the “sha” entries that fit the following subsetting pattern
 should be included:
 
 
  input[[i]]$content[[1]]$sha[1]
 
 
  It’s getting thornier!
 
  To be fair to Rolf’s solution (which probably can be updated to solve
 the problem), I’ve posted the complete dput here:
 
 
 https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R

 I didn't try on the larger example, but this works on the smaller one:

  get_shas <- function(input){
     x <- lapply(input, "[[", "content")
     y <- lapply(x, "[[", 1)
     # test only the first name: && needs a length-one condition (R >= 4.3)
     z <- lapply(y, function(yy) if( length(names(yy)) &&
                  names(yy)[1] == "sha" ){ yy[["sha"]] })
  }
  sha_lists <- get_shas(input)

 It does deliver an entry for every element of the input object, which is
 either the value of sha or NULL. I think that is not a bad thing because it
 lets you figure out where the values are coming from.
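
The three shapes mentioned in this thread (content[[1]] is a list, content[[1]] is a character vector, content is list()) can be handled explicitly. What follows is only a sketch under assumptions: the toy input and the helper name get_top_sha are made up for illustration, and returning NA for a missing path is one possible design choice, not something anyone in the thread settled on.

```r
# Hypothetical toy input with the three shapes described in the thread.
input <- list(
  list(content = list(list(
    sha     = "aaa",
    commit  = list(tree = list(sha = "ttt")),
    parents = list(list(sha = "ppp"))
  ))),
  list(content = list("Not Found")),  # content[[1]] is a character vector
  list(content = list())              # content is an empty list
)

# Return input[[i]]$content[[1]]$sha[1], or NA when that path does not exist.
get_top_sha <- function(el) {
  content <- el[["content"]]
  if (length(content) == 0L) return(NA_character_)
  first <- content[[1]]
  if (!is.list(first) || !("sha" %in% names(first))) return(NA_character_)
  as.character(first[["sha"]])[1]
}

shas <- vapply(input, get_top_sha, character(1))

# For comparison: matching "sha" in the flattened names also counts the
# nested commit$tree$sha and parents[[1]]$sha entries.
flat <- unlist(input)
n_flat <- sum(grepl("sha", names(flat)))
```

On this toy input, shas has one entry per element of input ("aaa", then NA twice), so positions still line up with the input, while n_flat counts all three sha entries, including the two nested ones that should be excluded.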

 
  --
 
  Aron Lindberg
 
 
 
 
  Doctoral Candidate, Information Systems
 
  Weatherhead School of Management
 
  Case Western Reserve University
 
  aronlindberg.github.io
 
  On Fri, Feb 20, 2015 at 8:25 AM, Aron Lindberg aron.lindb...@case.edu
  wrote:
 
  Thanks Chuck and Rolf.
  While Rolf’s code also works on the dput that I actually gave you (a
 smaller subset of the full dataset), it failed to work on the larger
 dataset, because there are further exceptions:
  input[[i]]$content[[1]] is sometimes a list, sometimes a character
 vector, and sometimes input[[i]]$content simply returns list().
  Chuck’s solution however bypasses this and works on the full dataset
 (which was 8mb, which is why I didn’t upload it as a gist).
  Best,
  Aron
  --
  Aron Lindberg
  Doctoral Candidate, Information Systems
  Weatherhead School of Management
  Case Western Reserve University
  aronlindberg.github.io
  On Fri, Feb 20, 2015 at 12:44 AM, Charles Berry ccbe...@ucsd.edu
 wrote:
  Aron Lindberg aron.lindberg at case.edu writes:
 
  Hi Everyone,
 
  I'm working on a thorny subsetting problem involving list of lists.
 I've put a
  dput of the data here:
 
 
 https://gist.githubusercontent.com/aronlindberg/b916dee897d051ac5be5/
  raw/a78cbf873a7e865c3173f943ff6309ea688c653b/dput
 
  IIUC, you want the value of every list element that is named sha and
  that name will only apply to atomic objects.
  If so, this should do it.
  input <- dget("/tmp/dpt")
  shas <- unlist( input, use.names=FALSE )[ grepl( "sha",
 names(unlist(input)))]
  input[[67]]$content[[1]]$sha
  [1] "58cf43ecdc1beb7e1043e9de612ecc817b090f15"
  which(input[[67]]$content[[1]]$sha == shas )
  [1] 194
  HTH,
  Chuck

 David Winsemius
 Alameda, CA, USA


