Re: [R] Defining multiple variables in a loop

2012-06-25 Thread Berend Hasselman

Taylor White wrote
 
 Good day,
 
 For lack of a better solution (or perhaps I am ignorant of something
 more elegant), I have been bootstrapping panel data "by hand", so to
 speak, and I would like to know if there is a way to define multiple
 variables in a loop using the loop variable.  I found a post (here:
 https://stat.ethz.ch/pipermail/r-help/2002-October/026305.html ) that
 discussed naming multiple variables but it didn't seem to allow me to
 define the variables as something more useful.  I tried the code
 immediately below (plus some variations) and it just didn't work.
 
 for (i in 1:20) {
 assign(paste("country.", i, sep = "") <- subset(OECDFiscal2, Country == i)
 }
 

Leaving aside the question whether this is a good way of doing what you
want, this is the wrong syntax.
?assign will show you the correct syntax.

assign(paste("country.", i, sep = ""), subset(OECDFiscal2, Country == i))

Berend
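
For what it's worth, the alternative usually recommended instead of assign() is to keep the subsets in a named list via split(); a minimal sketch, assuming the OECDFiscal2 data frame and Country column from the original post:

```r
# Split OECDFiscal2 into one data frame per country, held in a list
# (avoids creating 20 separate country.i variables)
country.list <- split(OECDFiscal2, OECDFiscal2$Country)

# Access one country's subset by name, e.g.
# country.list[["1"]] instead of country.1
```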

--
View this message in context: 
http://r.789695.n4.nabble.com/Defining-multiple-variables-in-a-loop-tp4634361p4634387.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Loop for multiple plots in figure

2012-06-25 Thread Marcel Curlin
Hello, I have longitudinal data of the form below from N subjects; I am
trying to create a figure with N small subplots on a single page, in which
each plot is from only one subject, and in each plot there is a separate
curve for each value of param1.

So in this case, there would be four plots on the page (one each for Bob,
Steve, Kevin and Dave), and each plot would have two separate curves (one
for param1 = 1 and one for param1 = 0). The main title of the plot should be
the subject name. I also need to sort the order of the plots on the page by
param2.

I can do this with a small number of subjects using manual commands. For a
larger number I know that a 'for loop' is called for, but I can't figure out
how to get each of the subjects to plot separately; I could not work it out
from the existing posts.  For now I want to do this in the basic environment,
though I know that lattice could also work (I might try that later). Any help
appreciated.

tC <- textConnection("
Subject XvarYvarparam1  param2
bob 9   100 1   100
bob 0   250 1   200
steve   2   454 1   50
bob -5  271 0   35
bob 3   10  0   74
steve   1   500 1   365
kevin   5   490 1   546
bob 8   855 0   76
dave2   233 0   343
steve   -10 388 0   556
steve   -7  284 1   388
dave3   568 1   555
kevin   4   247 0   57
bob 6   300 1   600
")
data <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)

par(mfrow=c(2,2))

--
View this message in context: 
http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] equality of values

2012-06-25 Thread Rui Barradas

Hello,

Just as a complement, here is a solution, and why it works.


x <- 3/5
y <- 2/5
z <- 1/5
#
x - (y + z) # not zero
x == y + z
all.equal(x, y + z)
#
x - y - z   # not zero
x - y == z
all.equal(x - y, z)

# To see why it works: abs diff less than tolerance
equal <- function(x, y, tol=.Machine$double.eps) abs(x - y) < tol
equal(x, y + z)
equal(x - y, z)


Hope this helps,

Rui Barradas


On 24-06-2012 23:14, Andreia Leite wrote:

Thanks a lot for your explanation!

I see the point and I think I'll solve my question using a different method!

Andreia Leite

On Sun, Jun 24, 2012 at 10:53 PM, Oliver Ruebenacker cur...@gmail.com wrote:


 Hello,

  The FAQ answer 7.31 may be confusing to noobs and the referenced
paper is long. The essence is:

  (1) It's not an R issue, it is a floating point (flop) arithmetic issue.
  (2) The issue is that flop arithmetic is almost always only approximate.
  (3) Therefore, asking for exact equality of two flops is almost
always a mistake.
  (4) In simple cases, you can take the difference of two flops and
ask whether it is good enough
  (5) For extended calculations, you need to keep in mind that errors
propagate and accumulate.
  (6) Some ill-defined problems may lead to solutions that aren't
solutions (e.g. trying to invert a singular matrix)
  (7) Some well-defined problems may defy a straight-forward approach
(e.g. solving stiff differential equations)

 Take care
 Oliver
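
Points (2) and (5) above can be illustrated directly at the R prompt:

```r
# (2) floating point arithmetic is almost always only approximate
0.1 + 0.2 == 0.3            # FALSE: neither side is exactly 0.3

# (5) errors propagate and accumulate over repeated operations
x <- 0
for (i in 1:10) x <- x + 0.1
x == 1                      # FALSE: the accumulated error is visible
all.equal(x, 1)             # TRUE: equal up to a tolerance
```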

On Sun, Jun 24, 2012 at 5:29 PM, Sarah Goslee sarah.gos...@gmail.com
wrote:

Please read R FAQ 7.31, "Why doesn't R think these numbers are equal?"

Sarah

On Sun, Jun 24, 2012 at 1:58 PM, Andreia Leite
andreiaheitorle...@gmail.com wrote:

Dear list,

I'm trying to do a loop where I should choose a cell of a data frame
comparing it with another. The code actually works for the majority of

the

loop but for some reason it doesn't and I can't figure out why.

When I tried to understand why I've noticed this:


dif[11]

[1] 118.8333

linhasUpdate[10,6]

[1] 118.8333

dif[11]==linhasUpdate[10,6]

[1] FALSE

Even though the values are the same R says they aren't! This happens for
some of the values (44) and for the others (128) it works fine.

Does anybody know why this is happening, or a way to fix it? I'm using 2.15.0.

Thanks a lot!

Andreia Leite

--

--
Sarah Goslee
http://www.functionaldiversity.org





--
Oliver Ruebenacker, Bioinformatics and Network Analysis Consultant
President and Founder of Knowomics
(http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Consultant at Predictive Medicine
(http://predmed.com/people/oliverruebenacker.html)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)









Re: [R] Loop for multiple plots in figure

2012-06-25 Thread baptiste auguie
Hi,

Here's one approach:

plot_one <- function(d){
  with(d, plot(Xvar, Yvar, t="n")) # set limits
  with(d[d$param1 == 0,], lines(Xvar, Yvar, lty=1)) # first line
  with(d[d$param1 == 1,], lines(Xvar, Yvar, lty=2)) # second line
}

par(mfrow=c(2,2))
plyr::d_ply(data, "Subject", plot_one)

HTH,

b.
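
If the plyr dependency is unwanted, the same idea can be written as a plain for loop; a sketch that also adds the subject name as the title and orders the plots by param2 (ordering by each subject's maximum param2 is an assumption, since the original post does not say which summary to sort on):

```r
# order subjects by their maximum param2 (an assumed choice of summary)
ord <- sort(tapply(data$param2, data$Subject, max))

par(mfrow = c(2, 2))
for (s in names(ord)) {
  d <- data[data$Subject == s, ]
  d <- d[order(d$Xvar), ]                      # so lines() draws left to right
  plot(d$Xvar, d$Yvar, type = "n", main = s)   # title = subject name
  lines(d$Xvar[d$param1 == 0], d$Yvar[d$param1 == 0], lty = 1)
  lines(d$Xvar[d$param1 == 1], d$Yvar[d$param1 == 1], lty = 2)
}
```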

On 24 June 2012 23:06, Marcel Curlin cemar...@u.washington.edu wrote:
 Hello, I have longitudinal data of the form below from N subjects; I am
 trying to create figure with N small subplots on a single page, in which
 each plot is from only one subject, and in each plot there is a separate
 curve for each value of param1.

 So in this case, there would be four plots on the page (one each for Bob,
 Steve, Kevin and Dave), and each plot would have two separate curves (one
 for param1 = 1 and one for param1 = 0). The main title of the plot should be
 the subject name. I also need to sort the order of the plots on the page by
 param2.

 I can do this with a small number of subjects using manual commands. For a
 larger number I know that a 'for loop' is called for, but can't figure out
 how to get each of the subjects to plot separately, could not figure it out
 from the existing posts.  For now I want to do this in the basic environment
 though I know that lattice could also work (might try that later). Any help
 appreciated

 tC <- textConnection("
 Subject Xvar    Yvar    param1  param2
 bob     9       100     1       100
 bob     0       250     1       200
 steve   2       454     1       50
 bob     -5      271     0       35
 bob     3       10      0       74
 steve   1       500     1       365
 kevin   5       490     1       546
 bob     8       855     0       76
 dave    2       233     0       343
 steve   -10     388     0       556
 steve   -7      284     1       388
 dave    3       568     1       555
 kevin   4       247     0       57
 bob     6       300     1       600
 ")
 data <- read.table(header=TRUE, tC)
 close.connection(tC)
 rm(tC)

 par(mfrow=c(2,2))

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390.html
 Sent from the R help mailing list archive at Nabble.com.



[R] cannot use simple=TRUE in boot

2012-06-25 Thread he chen
I am doing boot with a large database, and thus want to use simple=TRUE to
reduce the memory used.
I already set up sim="ordinary", stype="i", but I don't know how to
set n=0. In fact, I don't know what n=0 means.
For example,

a <- c(1:1000)
b <- c(2:1001)
c <- cbind(a,b)

library(boot)
table <- function(c, indices){
  d <- c[indices,]
  e <- mean(d[,1])
  return(e)
}

boot <- boot(c, table, 100, sim="ordinary", stype="i", simple=TRUE)


And R gives "Error in statistic(data, original, ...) : unused
argument(s) (simlpe = TRUE)"

Thanks a lot!



Re: [R] axis in r plot

2012-06-25 Thread sathya7priya
I have a graph plotted with x axis (-50 to 250) and y axis (-50 to 500). I
need the x axis values (-50 to 250) with spacing between two tick marks of
1 or 0.1. The graph should be wider to get enough resolution.

--
View this message in context: 
http://r.789695.n4.nabble.com/axis-in-r-plot-tp4634199p4634391.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] significance level (p) for t-value in package zelig

2012-06-25 Thread Rune Haubo
According to standard likelihood theory these are actually not
t-values, but z-values, i.e., they asymptotically follow a standard
normal distribution under the null hypothesis. This means that you
could use pnorm instead of pt to get the p-values, but an easier
solution is probably to use the clm-function (for Cumulative Link
Models) from the ordinal package - here you get the p-values
automatically.

Cheers,
Rune
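
For completeness, the pnorm conversion Rune describes looks like this (the z value below is made up for illustration):

```r
z <- 2.1                   # an illustrative z-value
p <- 2 * pnorm(-abs(z))    # two-sided p-value under the standard normal
p                          # about 0.036
```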

On 23 June 2012 07:02, Bert Gunter gunter.ber...@gene.com wrote:
 This advice is almost certainly false!

 A t-statistic can be calculated, but the distribution will not
 necessarily be student's t nor will the df be those of the rse.  See, for
 example, rlm() in MASS, where values of the t-statistic are given without p
 values. If Brian Ripley says that p values cannot be straightforwardly
 calculated by pt(), then believe it!

 -- Bert

 On Fri, Jun 22, 2012 at 9:30 PM, Özgür Asar oa...@metu.edu.tr wrote:

 Michael,

 Try

 ?pt

 Best
 Ozgur

 --
 View this message in context:
 http://r.789695.n4.nabble.com/significance-level-p-for-t-value-in-package-zelig-tp4634252p4634271.html
 Sent from the R help mailing list archive at Nabble.com.





 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

        [[alternative HTML version deleted]]






-- 
Rune Haubo Bojesen Christensen

PhD Student, M.Sc. Eng.
Phone: (+45) 45 25 33 63
Mobile: (+45) 30 26 45 54

DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 305, Room 122,
DK-2800 Kgs. Lyngby, Denmark



Re: [R] cannot use simple=TRUE in boot

2012-06-25 Thread Rui Barradas

Hello,

Your code works with me, unchanged:


boot <- boot(c, table, 100, sim="ordinary", stype="i", simple=TRUE)
boot

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = c, statistic = table, R = 100, sim = "ordinary",
    stype = "i", simple = TRUE)


Bootstrap Statistics :
     original   bias    std. error
t1*     500.5  0.76209    8.876232


But your error message shows a typo, 'simlpe', so check your typing.

Also, 'c', 'table' and 'boot' are really bad names for data objects or
functions; they are already R functions.
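
A sketch of the same bootstrap with non-clashing names (the replacement names are of course arbitrary):

```r
library(boot)

xy <- cbind(a = 1:1000, b = 2:1001)

# statistic: mean of the first column over the resampled rows
mean_first_col <- function(dat, indices) mean(dat[indices, 1])

res <- boot(xy, mean_first_col, R = 100,
            sim = "ordinary", stype = "i", simple = TRUE)
res$t0   # 500.5, the statistic on the original data
```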


Hope this helps,

Rui Barradas

On 25-06-2012 08:26, he chen wrote:

I am doing boot with a large database, and thus want to use simple=TRUE to
reduce the memory used.
I already set up sim="ordinary", stype="i", but I don't know how to
set n=0. In fact, I don't know what n=0 means.
For example,

a <- c(1:1000)
b <- c(2:1001)
c <- cbind(a,b)

library(boot)
table <- function(c, indices){
  d <- c[indices,]
  e <- mean(d[,1])
  return(e)
}

boot <- boot(c, table, 100, sim="ordinary", stype="i", simple=TRUE)


AND, r give Error in statistic(data, original, ...) : unused
argument(s) (simlpe = TRUE)

Thanks a lot!



Re: [R] significance level (p) for t-value in package zelig

2012-06-25 Thread Prof Brian Ripley

On 25/06/2012 09:32, Rune Haubo wrote:

According to standard likelihood theory these are actually not
t-values, but z-values, i.e., they asymptotically follow a standard
normal distribution under the null hypothesis. This means that you


Whose 'standard'?

It is conventional to call the value of a t-like statistic (i.e. a ratio of
the form value/standard error) a 't-value'.  And that has nothing to do
with 'likelihood theory' (t statistics predate the term 'likelihood'!).


A separate issue is whether a t statistic is even approximately
t-distributed (and if so, on what df?), and another is whether it is
asymptotically normal.  For the latter you have to say what you mean by
'asymptotic': we have lost a lot of the context, but as this does not
appear to be IID univariate observations:


- 'standard likelihood theory' is unlikely to apply.

- standard asymptotics may well not be a good approximation (in 
regression modelling, people tend to fit more complex models to large 
datasets, which is often why a large dataset was collected).


- even for IID observations the derivation of the t distribution assumes 
normality.


The difference between a t distribution and a normal distribution is
practically insignificant unless the df is small.  And if the df is
small, one can rarely rely on the CLT for approximate normality ...



could use pnorm instead of pt to get the p-values, but an easier
solution is probably to use the clm-function (for Cumulative Link
Models) from the ordinal package - here you get the p-values
automatically.

Cheers,
Rune

On 23 June 2012 07:02, Bert Gunter gunter.ber...@gene.com wrote:

This advice is almost certainly false!

A t-statistic can be calculated, but the distribution will not
necessarily be student's t nor will the df be those of the rse.  See, for
example, rlm() in MASS, where values of the t-statistic are given without p
values. If Brian Ripley says that p values cannot be straightforwardly
calculated by pt(), then believe it!

-- Bert

On Fri, Jun 22, 2012 at 9:30 PM, Özgür Asar oa...@metu.edu.tr wrote:


Michael,

Try

?pt

Best
Ozgur

--
View this message in context:
http://r.789695.n4.nabble.com/significance-level-p-for-t-value-in-package-zelig-tp4634252p4634271.html
Sent from the R help mailing list archive at Nabble.com.






--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

[[alternative HTML version deleted]]










--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] axis in r plot

2012-06-25 Thread Petr PIKAL
Hi

 
 I have the graph plotted with x axis (-50 to 250) and y axis (-50 to 500). I
 need the x axis values (-50 to 250) with spacing between two tick marks as
 1 or 0.1. The graph should be wider to get enough resolution.

For the first case you would need a special display, as you will need a
screen at least 15000 pixels wide.

 length(seq(-50,250,.1))
[1] 3001

I assume 4-5 pixels between ticks as minimum spacing.

For the second case it is slightly better, but your screen resolution
would still need to be somewhere near 2000 pixels.
 length(seq(-50,250,1))
[1] 301

According to Wikipedia, a 27-30 inch HD display can probably cope with your
request.

Regards
Petr
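
If a very wide plot really is needed, one workaround is to render to an image file instead of the screen; a sketch (the file name and pixel sizes are made up for illustration):

```r
# roughly 2000+ pixels of width for a tick every unit from -50 to 250
png("wide-plot.png", width = 2000, height = 600)
x <- seq(-50, 250, 1)
plot(x, sin(x / 20), type = "l", xaxt = "n")
axis(1, at = seq(-50, 250, 1), labels = FALSE)  # a tick mark every unit
axis(1, at = seq(-50, 250, 50))                 # labels only where they fit
dev.off()
```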

 
 --
 View this message in context: http://r.789695.n4.nabble.com/axis-in-r-
 plot-tp4634199p4634391.html
 Sent from the R help mailing list archive at Nabble.com.
 


[R] Replacing text with a carriage return

2012-06-25 Thread Thomas
I have a comma separated data file with no carriage returns and what  
I'd like to do is


1. read the data as a block text
2. search for the string that starts each record, "record_start", and
replace this with a carriage return. Replace will do, or just add a
carriage return before it. The string is the same for each record, but
it is enclosed in double quote marks in the file.

3. Write the results out as a csv file.

Let's say file.txt looked like this:
"record_start", data item 1, data item 2, "record_start", data
item 3, data item 4

and I wanted:
,data item 1, data item 2,
data item 3, data item 4

text <- readLines("file.txt", encoding="UTF-8")
text <- gsub("record_start", "/n", text)
write.csv(text, "file2.csv")

This gives me "/n" in the text file, enclosed in the double quotes
that were there in the file around "record_start" already. Even if the
double quotes weren't there though, I'm still not sure this would
work. (BTW, I can live with the first incorrect comma in the output
file because I can just remove it manually.)

Can anyone suggest a solution?
Thank you,
Thomas Chesney

This message and any attachment are intended solely for the addressee and may 
contain confidential information. If you have received this message in error, 
please send it back to me, and immediately delete it.   Please do not use, copy 
or disclose the information contained in this message or in any attachment.  
Any views or opinions expressed by the author of this email do not necessarily 
reflect the views of the University of Nottingham.

This message has been checked for viruses but the contents of an attachment
may still contain software viruses which could damage your computer system:
you are advised to perform your own checks. Email communications with the
University of Nottingham may be monitored as permitted by UK legislation.



Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives?

2012-06-25 Thread Duncan Murdoch

On 12-06-24 4:50 PM, Søren Højsgaard wrote:

Dear all,

Indexing matrices from the Matrix package with [i,j] seems to be very slow. For 
example:

  library(rbenchmark)
  library(Matrix)
  mm <- matrix(c(1,0,0,0,0,0,0,0), nr=20, nc=20)
  MM <- as(mm, "Matrix")
  lookup <- function(mat){
    for (i in 1:nrow(mat)){
      for (j in 1:ncol(mat)){
        mat[i,j]
      }
    }
  }

  benchmark(lookup(mm), lookup(MM), columns=c("test", "replications", "elapsed",
"relative"), replications=50)
        test replications elapsed relative
1 lookup(mm)           50    0.01        1
2 lookup(MM)           50    8.77      877

I would have expected a small overhead when indexing a matrix from the Matrix 
package, but this result is really surprising...
Does anybody know if there are faster alternatives to [i,j] ?


There's also a large overhead when indexing a dataframe, though Matrix 
appears to be slower.  It's designed to work on whole matrices at a 
time, not single entries.  So I'd suggest that if you need to use [i,j] 
indexing, then try to arrange your code to localize the access, and 
extract a submatrix as a regular fast matrix first. (Or if it will fit 
in memory, convert the whole thing to a matrix just for the access.  If 
I just add the line


mat - as.matrix(mat)

at the start of your lookup function, it becomes several hundred times 
faster.)
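
Duncan's one-line change, spelled out against the lookup function from the original post:

```r
lookup2 <- function(mat){
  mat <- as.matrix(mat)          # one conversion up front...
  for (i in 1:nrow(mat)){
    for (j in 1:ncol(mat)){
      mat[i,j]                   # ...then every [i,j] hits a plain matrix
    }
  }
}
# benchmark(lookup(MM), lookup2(MM), replications=50) should show the speedup
```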




Re: [R] Boxplot with Log10 and base-exponent axis

2012-06-25 Thread Luigi
Dear Mr Dunlap,
Your solution works really fine. Thank you for your time,
Best wishes,
Luigi


-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com] 
Sent: 23 June 2012 18:48
To: Luigi
Cc: 'Martin Maechler'; r-help@r-project.org
Subject: RE: [R] Boxplot with Log10 and base-exponent axis

To change the y-limits you need to add ylim=c(min,max)
to the call to boxplot.  The limits there should be
in the units of the y variable (not its log10).  par("usr")[3:4]
reports the y-limits on the log10 scale (they will be expanded
a bit from your desired limits so the plot does not extend
all the way to the edge).

 boxplot(abs(tan(1:100)), log="y", ylim=c(.001, 1000))
 par("usr")[3:4]
[1] -3.24  3.24
 10 ^ par("usr")[3:4]
[1] 5.754399e-04 1.737801e+03

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: Luigi [mailto:marongiu.lu...@gmail.com]
 Sent: Saturday, June 23, 2012 8:49 AM
 To: William Dunlap
 Cc: 'Martin Maechler'; r-help@r-project.org
 Subject: RE: [R] Boxplot with Log10 and base-exponent axis
 
 Thank you!
 This works well. I understand that the values are now on a log10 scale,
 although I did not understand what is happening at line 4 of your script.
 I would like to ask how I can change the y limits, since they now depend on
 par("usr"). What if I'd like y limits extending from 1000 to 1 000 000?
 I've tried to modify ylim (line 3) and ceiling (line 4) but I obtained
 errors.
 Best wishes,
 Luigi


 ### UPDATED EXAMPLE

 # generating random numbers
   x <- runif(100, min=0, max=10)


 # plotting
   boxplot(x, log = "y", yaxt="n")
   ylim <- par("usr")[3:4]

   log10AtY <- seq(ceiling(ylim[1]), floor(ylim[2]))

   axis(side=2, at=10^log10AtY, lab=as.expression(lapply(log10AtY,
     function(y) bquote(10^.(y)))))
 


 #
 
 
 -- Your Response
 
 The key is to supply an expression, not text, to the labels argument to
 axis.
 See help(plotmath) for details.  Here is an example:
 1  x <- list(One=10^(sin(1:10)+5), Two=10^(cos(1:30)*2))
 2  boxplot(x, log="y", yaxt="n")
 3  ylim <- par("usr")[3:4]
 4  log10AtY <- seq(ceiling(ylim[1]), floor(ylim[2]))
 5  axis(side=2, at=10^log10AtY, lab=as.expression(lapply(log10AtY,
    function(y) bquote(10^.(y)))))
 
 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf
  Of Luigi
  Sent: Friday, June 22, 2012 7:54 AM
  To: r-help@r-project.org
  Subject: [R] Boxplot with Log10 and base-exponent axis
 
  Dear all,
 
  I would like to (i) produce boxplot graphs with an axis in logarithm in
  base 10, and (ii) show the values on the axis in 10^exponent format
  rather than 10E+exponent.



  To illustrate with an example, I have some widely spread data that I
  chart plot using boxplot() [figure on the left]; with the log="y" option
  of boxplot() I obtained the natural logarithm conversion of the data and
  the unfriendly notation baseE+exponent [figure on the centre]; if I
  log10 the data I obtain the desired plot, but the axes are showing only
  the exponent [figure on the right].
 
 
 
  Can anybody help?
 
 
 
  Best regards
 
  Luigi Marongiu, MSc
 
 
 
  ### EXAMPLE 
 
  # generating random numbers
 
  x <- runif(100, min=0, max=10)
 
 
 
  # create plot
 
  par(mfrow = c(1,3))
 
 
 
  #plotting in the left side
 
  boxplot(x, xlab="Linear values")
 
 
 
  #plotting in the centre
 
  boxplot(x, log = "y", xlab="y axis logged")
 
 
 
  # creating log10 values and plotting on the right side
 
  Log.base10.x <- log10(x)

  boxplot(Log.base10.x, xlab="LOG10 of data")
 
 
 
 
 
 
  [[alternative HTML version deleted]]
 


Re: [R] Replacing text with a carriage return

2012-06-25 Thread Rui Barradas

Hello,

Instead of replacing "record_start" with newlines, you can split the
string by it, and use 'gsub' to make it prettier.



x <- readLines("test.txt")
x
y <- gsub("\"", "", x)  # remove the double quotes
y <- unlist(strsplit(y, "record_start,"))  # do the split, with comma
# remove leading blanks and trailing blanks/commas
y <- gsub("^[[:blank:]]|[[:blank:]]$|,[[:blank:]]$", "", y)
# see the result and save it to file
y
writeLines(text=y, con="result.txt")


Hope this helps,

Rui Barradas

On 25-06-2012 10:17, Thomas wrote:

I have a comma separated data file with no carriage returns and what I'd
like to do is

1. read the data as a block text
2. search for the string that starts each record record_start, and
replace this with a carriage return. Replace will do, or just add a
carriage return before it. The string is the same for each record, but
it is enclosed in double quote marks in the file.
3. Write the results out as a csv file.

Let's say file.txt looked like this:
"record_start", data item 1, data item 2, "record_start", data item
3, data item 4
and I wanted:
,data item 1, data item 2,
data item 3, data item 4

text <- readLines("file.txt", encoding="UTF-8")
text <- gsub("record_start", "/n", text)
write.csv(text, "file2.csv")

This gives me "/n" in the text file, enclosed in the double quotes that
were there in the file around "record_start" already. Even if the double
quotes weren't there though, I'm still not sure this would work. (BTW, I
can live with the first incorrect comma in the output file because I can
just remove it manually.)
Can anyone suggest a solution?
Thank you,
Thomas Chesney




Re: [R] cannot use simple=TRUE in boot

2012-06-25 Thread he chen
Dear Rui,

Thanks. I checked the spelling in the code; nothing is wrong there (the
misspelling is only in the error message). After I downloaded the newest
2.15 version, the code works fine for me, too. I guess the problem
resulted from the old version (2.7.1) I was using.





2012/6/25 Rui Barradas ruipbarra...@sapo.pt:
 Hello,

 Your code works with me, unchanged:



 boot <- boot(c, table, 100, sim="ordinary", stype="i", simple=TRUE)
 boot

 ORDINARY NONPARAMETRIC BOOTSTRAP


 Call:
 boot(data = c, statistic = table, R = 100, sim = "ordinary",
     stype = "i", simple = TRUE)


 Bootstrap Statistics :
      original   bias    std. error
 t1*     500.5  0.76209    8.876232


 But your error message shows a typo, 'simlpe', so check your typing.

 Also, 'c', 'table' and 'boot' are really bad names for data objects or
 functions; they are already R functions.

 Hope this helps,

 Rui Barradas

 On 25-06-2012 08:26, he chen wrote:

 I am doing boot with a large database, and thus want to use simple=TRUE to
 reduce the memory used.
 I already set up sim="ordinary", stype="i", but I don't know how to
 set n=0. In fact, I don't know what n=0 means.
 For example,

 a <- c(1:1000)
 b <- c(2:1001)
 c <- cbind(a,b)

 library(boot)
 table <- function(c, indices){
   d <- c[indices,]
   e <- mean(d[,1])
   return(e)
 }

 boot <- boot(c, table, 100, sim="ordinary", stype="i", simple=TRUE)


 AND, r give Error in statistic(data, original, ...) : unused
 argument(s) (simlpe = TRUE)

 Thanks a lot!






-- 
Best Regards,
    Yours,
Chen He

PhD Candidate
Institute of Population Research
Peking University, Beijing, China



[R] Very simple question R5 setRefClass and Initialize

2012-06-25 Thread Julien Cochennec

Hi,
New to this list, and glad I found R5, which is a great improvement to R 
for my work.
Something seems odd, though: when I do a setRefClass, the initialize 
method executes at class definition time.

I would like it to execute only when an instance is created.
How can I do that? I found no way to test if the code execution is when 
the class definition happens or when a $new instance is created.

Thanks for your help.
For example my class definition looks like this :

mongoDbUser = setRefClass("mongoDbUser",
fields = list(
auth = "logical",
host = "character",
username = "character",
password = "character",
db = "character",
connector = "mongo"
),
  methods = list(
initialize = function(auth, host, username, password, db, connector){
  print("initialization")
}
  )
)

If I execute this code, it prints "initialization", but it shouldn't; I 
created no instance, right?

I would like "initialization" to appear only when I do :
mongoDbUser$new(...)
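One commonly suggested workaround (a sketch only, assuming the definition-time call receives no arguments, which matches the behaviour described above) is to guard the body and delegate the field handling to callSuper():

```r
library(methods)

mongoDbUser <- setRefClass("mongoDbUser",
  fields = list(host = "character"),
  methods = list(
    initialize = function(...) {
      # assumption: the call made while the class is being defined passes
      # no arguments, so only announce initialization for real instances
      if (nargs() > 0) print("initialization")
      callSuper(...)
    }
  )
)

obj <- mongoDbUser$new(host = "localhost")   # prints "initialization"
```

The callSuper(...) line keeps the default field initialization (named arguments are assigned to fields) working.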



[R] Fractional Factorial - Wrong values using lm-function

2012-06-25 Thread Staleno
Hello.

I'm a new user of R, and I have a question regarding the use of the aov and
lm functions. I'm doing a fractional factorial experiment at our production
site, and I need to familiarize myself with the analysis before I conduct
the experiment. I've been working my way through the examples provided at
http://www.itl.nist.gov/div898/handbook/pri/section4/pri472.htm , but I
can't reproduce the results of the trial model calculations. Why is this?
Here is how I have tried to do it:

 data.catapult = read.table("Fractional.txt", header = TRUE) # Read the data in the
 table provided in the example.

 data.catapult
   Distance    h  s b l  e
1     28.00 3.25  0 1 0 80
2     99.00 4.00 10 2 2 62
3    126.50 4.75 20 2 4 80
4    126.50 4.75  0 2 4 45
5     45.00 3.25 20 2 4 45
6     35.00 4.75  0 1 0 45
7     45.00 4.00 10 1 2 62
8     28.25 4.75 20 1 0 80
9     85.00 4.75  0 1 4 80
10     8.00 3.25 20 1 0 45
11    36.50 4.75 20 1 4 45
12    33.00 3.25  0 1 4 45
13    84.50 4.00 10 2 2 62
14    28.50 4.75 20 2 0 45
15    33.50 3.25  0 2 0 45
16    36.00 3.25 20 2 0 80
17    84.00 4.75  0 2 0 80
18    45.00 3.25 20 1 4 80
19    37.50 4.00 10 1 2 62
20   106.00 3.25  0 2 4 80

 aov.catapult =
 aov(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e,data=data.catapult)
 summary(aov.catapult)
            Df Sum Sq Mean Sq F value  Pr(>F)
h            1   2909    2909  15.854 0.01638 * 
s            1   1964    1964  10.701 0.03076 * 
b            1   7537    7537  41.072 0.00305 **
l            1   6490    6490  35.369 0.00401 **
e            1   2297    2297  12.518 0.02406 * 
h:s          1    122     122   0.667 0.45998   
h:b          1    345     345   1.878 0.24247   
h:l          1    354     354   1.929 0.23724   
h:e          1      0       0   0.001 0.97578   
s:b          1    161     161   0.877 0.40199   
s:l          1     20      20   0.107 0.75966   
s:e          1    114     114   0.622 0.47427   
b:l          1    926     926   5.049 0.08795 . 
b:e          1    124     124   0.677 0.45689   
l:e          1    158     158   0.860 0.40623   
Residuals    4    734     184   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

This seems just about right to me. However, when I attempt to make the
linear model, based on main factors and two-factor interactions, I get a
completely different result:

 lm.catapult =
 lm(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e,data=data.catapult)
 summary(lm.catapult)

Call:
lm(formula = Distance ~ h + s + b + l + e + h * s + h * b + h * 
l + h * e + s * b + s * l + s * e + b * l + b * e + l * e, 
data = data.catapult)

Residuals:
      1       2       3       4       5       6       7       8       9      10 
-0.8100 22.3875 -3.6763 -3.8925 -3.8925 -0.8576  7.0852 -0.8100 -0.8100 -0.8576 
     11      12      13      14      15      16      17      18      19      20 
-0.8576 -0.8576  7.8875 -3.8925 -3.8925 -3.6763 -3.6763 -0.8100 -0.4148 -3.6763 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  25.031042 100.791955   0.248   0.8161  
h-3.687500  22.466457  -0.164   0.8776  
s 0.475446   2.446791   0.194   0.8554  
b   -39.417973  44.906164  -0.878   0.4296  
l   -18.938988  12.233954  -1.548   0.1965  
e-0.158449   1.230683  -0.129   0.9038  
h:s  -0.368750   0.451546  -0.817   0.4600  
h:b  12.375000   9.030925   1.370   0.2425  
h:l   3.135417   2.257731   1.389   0.2372  
h:e   0.008333   0.258026   0.032   0.9758  
s:b  -0.634375   0.677319  -0.937   0.4020  
s:l  -0.055469   0.169330  -0.328   0.7597  
s:e   0.015268   0.019352   0.789   0.4743  
b:l   7.609375   3.386597   2.247   0.0879 .
b:e   0.318397   0.387008   0.823   0.4569  
l:e   0.089732   0.096760   0.927   0.4062  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 13.55 on 4 degrees of freedom
Multiple R-squared: 0.9697, Adjusted R-squared: 0.8563 
F-statistic: 8.545 on 15 and 4 DF,  p-value: 0.02559

This result is nothing like the results provided in the example. Why is
this? Any help is very much appreciated.

Regards, Ståle.

--
View this message in context: 
http://r.789695.n4.nabble.com/Fractional-Factorial-Wrong-values-using-lm-function-tp4634400.html
Sent from the R help mailing list archive at Nabble.com.



[R] read.spss function

2012-06-25 Thread niloo javan
Hi dear all,
I am trying to import a dataset from SPSS into R (R x64 2.15.0).
I used both
read.spss("D:/Untitled2.sav", use.value.labels = TRUE, to.data.frame = TRUE, 
max.value.labels = Inf,
trim.factor.names = FALSE, trim_values = TRUE, reencode = NA,
use.missings = to.data.frame)
and
read.spss("D:\\Untitled2.sav", to.data.frame = TRUE, stringsAsFactors = FALSE)
 
but the result is:
Error: could not find function "read.spss"
 
What should I do to import data set into R?
_
Best Regards


Niloofar.Javanrouh
MSc Of BioStatistics 



[R] using multiple cpu's - scaling in processing power

2012-06-25 Thread soren wilkening
Hi All

In the past I have worked with parallel processing in R where a function F
is applied to the elements of a list L. The more cpu cores one has, the
faster the process will run. At the time of launching the process for (F,L)
I will have a certain fixed number of cpu's that I can use. I have tested
this approach and it works fine (i.e. package 'multicore' , using 'mapply' )

But now I am encountering a slightly different situation.

I have a task (F,L) that will run for a *long* time (a week) even if I have
N cpu's processing it. N is the maximum possible number cpus that I can use,
however they will not all be available when I start the process.  
So the problem is that, when I start the process, I may have only
n1 < N cpu's at my disposal. After some time, I then have
n1 < n2 < N cpu's at my disposal. After some more time, I have 
n2 < n3 < N cpu's, and finally, at one point, I will have
N cpu's that I can work with. I scale in cpu power over the duration of
the process. Why this is the case does not matter. Essentially I cannot
control when new cpu's become available nor how many of them will become
available at that point.

With this I cannot use the standard approach above, where all the cpu cores
have to be available before I launch the process !!

It would help me if someone knew if R offered a solution for this type of
processing. But I would also be happy for pointers to non-R resources that
could deal with this.

Thanks

Soren


-
http://censix.com
--
View this message in context: 
http://r.789695.n4.nabble.com/using-multiple-cpu-s-scaling-in-processing-power-tp4634405.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] read.spss function

2012-06-25 Thread Jim Holtman
You have to load the 'foreign' package first.
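In other words (a sketch; the path is taken from the original post, and the call is guarded so it only runs if the file exists):

```r
library(foreign)   # provides read.spss()

path <- "D:/Untitled2.sav"   # path from the original post
if (file.exists(path)) {
  dat <- read.spss(path, to.data.frame = TRUE)
  str(dat)
}
```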

Sent from my iPad

On Jun 25, 2012, at 5:12, niloo javan niloo_ja...@yahoo.com wrote:

 Hi dear all,
 I am trying to import a dataset from SPSS into R (R x64 2.15.0).
 I used both  
 read.spss("D:/Untitled2.sav", use.value.labels = TRUE, to.data.frame = TRUE, 
 max.value.labels = Inf,
 trim.factor.names = FALSE, trim_values = TRUE, reencode = NA,
 use.missings = to.data.frame)
  
 read.spss("D:\\Untitled2.sav", to.data.frame = TRUE, stringsAsFactors = FALSE)
   
 but the result is:
 Error: could not find function "read.spss"
 What should I do to import data set into R?
 _
 Best Regards
 
 
 Niloofar.Javanrouh
 MSc Of BioStatistics 


[R] special simbol (±) in a legend

2012-06-25 Thread gianni lavaredo
Dear researchers,

I am looking for an expression to put a special symbol (±) in a legend,
and to add a bracket symbol in front of "Mean ± SD".


Sorry if the example is not great:

disptest <- matrix(rnorm(10), nrow = 10)
disptest.means <- rowMeans(disptest)
plot(1:10, disptest.means)
dispersion(1:10, disptest.means, (disptest.means + disptest),
           (disptest.means - disptest), intervals = FALSE,
           arrow.cap = 0.001, arrow.gap = 0)

legend("topleft",
       legend = c("Mean", "Mean +/- SD"),
       pch = c(1, NA),
       bty = "n", cex = 1.3)

thanks for all suggestions
Gianni



Re: [R] Apply() on columns

2012-06-25 Thread faelsendoorn
I now know how to navigate through the table.
But my question is: what kind of graphical and numerical summaries can I
make, keeping in mind that I am interested in the average working hours
per employee?



Re: [R] special simbol (±) in a legend

2012-06-25 Thread Prof Brian Ripley
The posting guide asked for 'at a minimum' information, which you have 
not provided (nor do we know what graphics device this is).


But on many OSes and devices you can simply use the Unicode character 
"\u00b1".  Or use plotmath (see ?plotmath).
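For instance, a small sketch of both options (plotmath's %+-% operator and the Unicode escape), built on the plot from the original post:

```r
plot(1:10, rnorm(10))

legend("topleft",
       legend = c("Mean", expression(Mean %+-% SD)),  # plotmath plus-minus
       pch = c(1, NA), lty = c(NA, 1),
       bty = "n", cex = 1.3)

# or, on devices that handle it, the Unicode character directly:
legend("bottomright", legend = "Mean \u00b1 SD", bty = "n")
```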


On 25/06/2012 12:03, gianni lavaredo wrote:

dear reserachers,

I am looking for a expression about a special symbol (±) in a Legend and
add on front of Mean ± SD the symbol of bracket


sorry if the example is not great

disptest <- matrix(rnorm(10), nrow = 10)
disptest.means <- rowMeans(disptest)
plot(1:10, disptest.means)
dispersion(1:10, disptest.means, (disptest.means + disptest),
           (disptest.means - disptest), intervals = FALSE,
           arrow.cap = 0.001, arrow.gap = 0)

legend("topleft",
       legend = c("Mean", "Mean +/- SD"),
       pch = c(1, NA),
       bty = "n", cex = 1.3)

thanks for all suggestions
Gianni





--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Apply() on columns

2012-06-25 Thread Petr PIKAL
 
 I now know how to navigate through the table.
 But my question is, what kind of graphical and numerical summaries can I
 make with keeping in mind that I am interested in the average working 
hour
 per employee?

Rather vague question without data or code, so rather vague answer.

For numerical values summarised according to some factor see

?aggregate
?by
?tapply

For graphical summary maybe

?dotplot
?boxplot
or maybe you could try package ggplot2
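For instance, with hypothetical data (one row per worked shift; names and values invented for illustration):

```r
# hypothetical data: one row per worked shift
shifts <- data.frame(emp   = c("A", "A", "B", "B", "B"),
                     hours = c(8, 6, 7, 8, 5))

aggregate(hours ~ emp, data = shifts, mean)   # numerical summary per employee
tapply(shifts$hours, shifts$emp, mean)        # same result as a named vector
boxplot(hours ~ emp, data = shifts)           # simple graphical summary
```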

Regards
Petr

 



Re: [R] Apply() on columns

2012-06-25 Thread Rui Barradas

Hello,

It's not that complicated.



f <- function(index, dat, offset = 5) {
    x <- data.frame(Emp = dat[, index], Hours = dat[, index + offset])
    aggregate(Hours ~ Emp, data = x, sum)
}

sp <- read.table("supermarkt.txt")

lapply(1:5, f, sp)


Also, please provide context. (Quote the post you are referring to.)

Hope this helps,

Rui Barradas

Em 25-06-2012 11:43, faelsendoorn escreveu:

I do now know how to navigate through the table.
But my question is, what kind of graphical and numerical summaries can I
make with keeping in mind that I am interested in the average working hour
per employee?






Re: [R] Apply() on columns

2012-06-25 Thread Rui Barradas

Hello, again.

Sorry, I've used 'sum' when it should be 'mean'.

Rui Barradas

Em 25-06-2012 12:49, Rui Barradas escreveu:

Hello,

It's not that complicated.



f <- function(index, dat, offset = 5) {
     x <- data.frame(Emp = dat[, index], Hours = dat[, index + offset])
     aggregate(Hours ~ Emp, data = x, sum)
}

sp <- read.table("supermarkt.txt")

lapply(1:5, f, sp)


Also, please provide context. (Quote the post you are reffering to.)

Hope this helps,

Rui Barradas

Em 25-06-2012 11:43, faelsendoorn escreveu:

I do now know how to navigate through the table.
But my question is, what kind of graphical and numerical summaries can I
make with keeping in mind that I am interested in the average working
hour
per employee?








[R] Arules - predict function issues - subscript out of bounds

2012-06-25 Thread ankur verma
Hi,

I'm doing a Market Basket Analysis in which I have a list of transaction
id's in column 2 and transactions (product names) in column 1. I read this
into a transactions object using:
txn <- read.transactions(file = "data.csv", format = "single",
                         rm.duplicates = FALSE, cols = c(1, 2))

If I want to use the apriori algorithm everything seems to be running fine.
However it is when I want to form cluster of these transactional patterns I
find problems. I formed clusters using the following code:

s <- sample(txn, 1000)
d <- dissimilarity(s, method = "Jaccard")
clustering <- pam(d, k = 5)

But when I'm trying to predict this on the larger set it keeps throwing a
subscript out of bounds error:

Label <- predict(s[clustering$mediods], txn, method = "Jaccard")

Can anyone explain to me why this keeps happening? I've tried this on
other datasets like Groceries / Adult in the arules package and it seems to
work fine!

Thanks,
Ankur
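[One thing worth checking here, as an observation rather than a verified fix: pam() stores its cluster representatives in a component called medoids, so clustering$mediods in the call above would silently return NULL, and indexing with a NULL subscript is a plausible source of the subscript-out-of-bounds error. A self-contained sketch on toy numeric data, standing in for the Jaccard dissimilarity:]

```r
library(cluster)

set.seed(42)
x  <- matrix(rnorm(40), ncol = 2)
cl <- pam(dist(x), k = 2)        # stand-in for the Jaccard dissimilarity

cl$mediods   # NULL: no such component (misspelled name)
cl$medoids   # the representative observations actually stored by pam()
```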



Re: [R] using multiple cpu's - scaling in processing power

2012-06-25 Thread Drew Tyre
Hi Soren

Have you looked into using the condor scheduler
(http://research.cs.wisc.edu/condor/)? I'm not aware of it linking up
with multicore or other parallel processing code inside R, but I've
used it to run multiple R processes on a variable number of processors
where N can both increase and decrease over time.

hth

On Mon, Jun 25, 2012 at 5:11 AM, soren wilkening m...@censix.com wrote:
 Hi All

 In the past I have worked with parallel processing in R where a function F
 is applied to the elements of a list L. The more cpu cores one has, the
 faster the process will run. At the time of launching the process for (F,L)
 I will have a certain fixed number of cpu's that I can use. I have tested
 this approach and it works fine (i.e. package 'multicore' , using 'mapply' )

 But now I am encountering a slightly different situation.

 I have a task (F,L) that will run for a *long* time (a week) even if I have
 N cpu's processing it. N is the maximum possible number cpus that I can use,
 however they will not all be available when I start the process.
 So the problem is that, when I start the process, I may have only
 n1 < N cpu's at my disposal. After some time, I then have
 n1 < n2 < N cpu's at my disposal. After some more time, I have
 n2 < n3 < N cpu's, and finally, at one point, I will have
 N cpu's that I can work with. I scale in cpu power over the duration of
 the process. Why this is the case does not matter. Essentially I cannot
 control when new cpu's become available nor how many of them will become
 available at that point.

 With this I cannot use the standard approach above, where all the cpu cores
 have to be available before I launch the process !!

 It would help me if someone knew if R offered a solution for this type of
 processing. But I would also be happy for pointers to non-R resources that
 could deal with this.

 Thanks

 Soren


 -
 http://censix.com




-- 
Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054
fax: +1 402 472 2946
email: aty...@unl.edu
http://snr.unl.edu/tyre
http://aminpractice.blogspot.com
http://www.flickr.com/photos/atiretoo



[R] do.call or something instead of for

2012-06-25 Thread Kathie
Dear R users,

I'd like to compute  X like below.

X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}

where W_{i,t}  are from Uniform(0,2) and X_{i,0} = 1+5*W_{i,0}

Of course, I can do this with a for statement, but I don't think it's a good
idea because i and t are too big.  

So, my question is that

Is there any better idea to avoid for statement for this problem? 

Thank you in advance.

Kathie



[R] tune.svm (e1071) takes forever

2012-06-25 Thread aramhovsepyan
Hi,

I am in the process of migrating my machine learning scripts from bash +
libsvm to R and for the past weeks I have been battling a substantial
problem when using the tune.svm function from the e1071 package. The
tune.svm function, which performs the grid search for the best cost and gamma
parameters, takes a really long time: about 10 times longer than the grid
search provided by the libsvm library. For a very large data set this is too
slow.
I would like to ask if there is any way to speed up the grid search. 
Here is the function call I use:

tune.obj <- tune.svm(formula, data = dataset, cost = 10^(0:2),
                     gamma = 10^(-6:-4))

Ideally I would like to search for cost 10^(0:3) and gamma = 10^(-6:-3)
however that takes even longer.
Thanks,
Aram.
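[One knob that may help, assuming coarser cross-validation is acceptable for your problem: tune.svm() accepts a tune.control object, and reducing the number of cross-validation folds from the default of 10 cuts the number of model fits proportionally. A sketch, using iris as a hypothetical stand-in for 'dataset':]

```r
library(e1071)

# 3-fold instead of the default 10-fold cross-validation
ctrl <- tune.control(sampling = "cross", cross = 3)

tune.obj <- tune.svm(Species ~ ., data = iris,
                     cost = 10^(0:2), gamma = 10^(-6:-4),
                     tunecontrol = ctrl)
summary(tune.obj)
```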



Re: [R] Very simple question R5 setRefClass and Initialize

2012-06-25 Thread Julien Cochennec
Help anyone? Is this impossible? I tried to use the initFields method 
but it did not change anything.


Le 25/06/2012 10:55, Julien Cochennec a écrit :

Hi,
New to this list, and glad I found R5, which is a great improvement to 
R for my work.
Something seems odd, though: when I do a setRefClass, the initialize 
method executes at class definition time.

I would like it to execute only when an instance is created.
How can I do that? I found no way to test if the code execution is 
when the class definition happens or when a $new instance is created.

Thanks for your help.
For example my class definition looks like this :

mongoDbUser = setRefClass("mongoDbUser",
fields = list(
auth = "logical",
host = "character",
username = "character",
password = "character",
db = "character",
connector = "mongo"
),
  methods = list(
initialize = function(auth, host, username, password, db, connector){
  print("initialization")
}
  )
)

If I execute this code, it says initialization, but it shouldn't, I 
created no instance, right??

I would like initialization to appear only when I do :
mongoDbUser$new(...)



Re: [R] Fractional Factorial - Wrong values using lm-function

2012-06-25 Thread Ståle Nordås
Hello.

Thank you for the help. However, I'm not sure your reply answers my question. 
Let me rephrase:

I'm trying to reproduce the values in the second table in the example on 
http://www.itl.nist.gov/div898/handbook/pri/section4/pri472.htm. The table 
shows the summary of the linear model, which are the values I'm trying to 
reproduce, using the input in the example. When I use the lm-function on the 
data, I get values completely different from those given in the example (I've 
provided these values in my first post). Obviously I'm missing something - why 
can't I reproduce the values in the example using:

lm.catapult = 
lm(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e,data=data.catapult)
 summary(lm.catapult)

?

I hope this was clearer.

Regards, Ståle Nordås
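[A possible source of the discrepancy, offered as an assumption rather than a verified answer: design-of-experiments software typically reports effects in coded units, with each factor rescaled linearly to [-1, 1], whereas lm() on the raw columns gives coefficients in engineering units. A sketch of refitting in coded units, with the catapult data transcribed from the first post:]

```r
# the catapult data, transcribed from the table in the first post
data.catapult <- data.frame(
  Distance = c(28, 99, 126.5, 126.5, 45, 35, 45, 28.25, 85, 8,
               36.5, 33, 84.5, 28.5, 33.5, 36, 84, 45, 37.5, 106),
  h = c(3.25, 4, 4.75, 4.75, 3.25, 4.75, 4, 4.75, 4.75, 3.25,
        4.75, 3.25, 4, 4.75, 3.25, 3.25, 4.75, 3.25, 4, 3.25),
  s = c(0, 10, 20, 0, 20, 0, 10, 20, 0, 20,
        20, 0, 10, 20, 0, 20, 0, 20, 10, 0),
  b = c(1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2),
  l = c(0, 2, 4, 4, 4, 0, 2, 0, 4, 0, 4, 4, 2, 0, 0, 0, 0, 4, 2, 4),
  e = c(80, 62, 80, 45, 45, 45, 62, 80, 80, 45,
        45, 45, 62, 45, 45, 80, 80, 80, 62, 80))

# rescale each factor linearly to [-1, 1] ("coded units")
code <- function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1

coded <- data.frame(Distance = data.catapult$Distance,
                    lapply(data.catapult[c("h", "s", "b", "l", "e")], code))

# (h + s + b + l + e)^2 expands to all main effects plus 2-way interactions
lm.coded <- lm(Distance ~ (h + s + b + l + e)^2, data = coded)
summary(lm.coded)
```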


-Original Message-
From: arun [mailto:smartpink...@yahoo.com] 
Sent: 25. juni 2012 13:50
To: Ståle Nordås
Cc: R help
Subject: Re: [R] Fractional Factorial - Wrong values using lm-function

Hi,

You need to use,

anova(lm.catapult)

Analysis of Variance Table

Response: Distance
          Df Sum Sq Mean Sq F value   Pr(>F)
h          1 2909.3  2909.3 15.8538 0.016378 *
s          1 1963.6  1963.6 10.7005 0.030755 *
b          1 7536.9  7536.9 41.0720 0.003046 **
l          1 6490.3  6490.3 35.3687 0.004010 **
e          1 2297.0  2297.0 12.5177 0.024056 *
h:s        1  122.4   122.4  0.6669 0.459978
h:b        1  344.6   344.6  1.8777 0.242467
h:l        1  353.9   353.9  1.9286 0.237236
h:e        1    0.2     0.2  0.0010 0.975783
s:b        1  161.0   161.0  0.8772 0.401991
s:l        1   19.7    19.7  0.1073 0.759658
s:e        1  114.2   114.2  0.6225 0.474270
b:l        1  926.4   926.4  5.0486 0.087946 . 
b:e        1  124.2   124.2  0.6769 0.456887
l:e        1  157.8   157.8  0.8600 0.406226
Residuals  4  734.0   183.5   


# The summary result you got is the summary of the linear model, while the
# summary of the aov object is the anova summary.


A.K.    



- Original Message -
From: Staleno s...@bergen-plastics.no
To: r-help@r-project.org
Cc: 
Sent: Monday, June 25, 2012 5:26 AM
Subject: [R] Fractional Factorial - Wrong values using lm-function

Hello.

I'm a new user of R, and I have a question regarding the use of aov and 
lm-functions. I'm doing a fractional factorial experiment at our production 
site, and I need to familiarize myself with the analysis before I conduct the 
experiment. I've been working my way through the examples provided at 
http://www.itl.nist.gov/div898/handbook/pri/section4/pri472.htm , but I can't 
get the results provided in the trial model calculations. Why is this.
Here is how I have tried to do it:

 data.catapult = read.table("Fractional.txt", header = TRUE) # Read the data in
 the table provided in the example.

 data.catapult
   Distance    h  s b l  e
1     28.00 3.25  0 1 0 80
2     99.00 4.00 10 2 2 62
3    126.50 4.75 20 2 4 80
4    126.50 4.75  0 2 4 45
5     45.00 3.25 20 2 4 45
6     35.00 4.75  0 1 0 45
7     45.00 4.00 10 1 2 62
8     28.25 4.75 20 1 0 80
9     85.00 4.75  0 1 4 80
10     8.00 3.25 20 1 0 45
11    36.50 4.75 20 1 4 45
12    33.00 3.25  0 1 4 45
13    84.50 4.00 10 2 2 62
14    28.50 4.75 20 2 0 45
15    33.50 3.25  0 2 0 45
16    36.00 3.25 20 2 0 80
17    84.00 4.75  0 2 0 80
18    45.00 3.25 20 1 4 80
19    37.50 4.00 10 1 2 62
20   106.00 3.25  0 2 4 80

 aov.catapult =
 aov(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e,
     data=data.catapult)
 summary(aov.catapult)
            Df Sum Sq Mean Sq F value  Pr(>F)
h            1   2909    2909  15.854 0.01638 *
s            1   1964    1964  10.701 0.03076 *
b            1   7537    7537  41.072 0.00305 **
l            1   6490    6490  35.369 0.00401 **
e            1   2297    2297  12.518 0.02406 *
h:s          1    122     122   0.667 0.45998
h:b          1    345     345   1.878 0.24247
h:l          1    354     354   1.929 0.23724
h:e          1      0       0   0.001 0.97578
s:b          1    161     161   0.877 0.40199
s:l          1     20      20   0.107 0.75966
s:e          1    114     114   0.622 0.47427
b:l          1    926     926   5.049 0.08795 . 
b:e          1    124     124   0.677 0.45689
l:e          1    158     158   0.860 0.40623
Residuals    4    734     184
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

This seems just about right to me. However, when I attempt to make the linear 
model, based on main factors and two-factor interactions, I get a completely 
different result:

 lm.catapult =
 lm(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e,
    data=data.catapult)
 summary(lm.catapult)

Call:
lm(formula = Distance ~ h + s + b + l + e + h * s + h * b + h *
    l + h * e + s * b + s * l + s * e + b * l + b * e + l * e,
    data = data.catapult)

Residuals:
      1       2       3       4       5       6       7       8       9
10
-0.8100 22.3875 -3.6763 -3.8925 

Re: [R] do.call or something instead of for

2012-06-25 Thread Mark Leeds
Hi: Use arima.sim, because with recursive relationships like that there's
no way to avoid looping. If arima.sim doesn't allow for the trend piece
(0.1*t), then you can do that part separately using cumsum(0.1*(1:t)) and
then add it back in to what arima.sim gives you. I don't remember if
arima.sim allows for a trend. It might.


Mark


On Mon, Jun 25, 2012 at 8:39 AM, Kathie kathryn.lord2...@gmail.com wrote:

 Dear R users,

 I'd like to compute  X like below.

 X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}

 where W_{i,t}  are from Uniform(0,2) and X_{i,0} = 1+5*W_{i,0}

 Of course, I can do this with for statement, but I don't think it's good
 idea because i and t are too big.

 So, my question is that

 Is there any better idea to avoid for statement for this problem?

 Thank you in advance.

 Kathie





Re: [R] do.call or something instead of for

2012-06-25 Thread Mark Leeds
Hi Kathryn: I'm sorry because I didn't read your question carefully enough.
arima.sim won't help because you don't have a normal error term. I think
you have to loop, unless someone else knows of a way that I'm not aware of.
Good luck.




On Mon, Jun 25, 2012 at 8:39 AM, Kathie kathryn.lord2...@gmail.com wrote:

 Dear R users,

 I'd like to compute  X like below.

 X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}

 where W_{i,t}  are from Uniform(0,2) and X_{i,0} = 1+5*W_{i,0}

 Of course, I can do this with for statement, but I don't think it's good
 idea because i and t are too big.

 So, my question is that

 Is there any better idea to avoid for statement for this problem?

 Thank you in advance.

 Kathie

 --
 View this message in context:
 http://r.789695.n4.nabble.com/do-call-or-something-instead-of-for-tp4634421.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible bug in is.na.data.frame(): input without columns

2012-06-25 Thread Mikko Korpela
On 06/20/2012 05:36 PM, Mikko Korpela wrote:
 Hello list!
 
 Let's construct a matrix / data.frame with 0 columns, but > 0 rows, and
 non-NULL rownames. Then, call is.na() on both the data.frame and the
 matrix. We find that is.na.data.frame() gives an error. When row.names
 are removed, is.na.data.frame() returns NULL. I think that the NULL
 result is also wrong. From ?is.na:
 
  The method ‘is.na.data.frame’ returns a logical matrix with the
  same dimensions as the data frame, and with dimnames taken from
  the row and column names of the data frame.
 
 No problems are seen when is.na() is run on the matrix. What do you
 think, should a formal bug report be filed?

Bug report filed to R bugzilla.

-- 
Mikko Korpela
Aalto University School of Science
Department of Information and Computer Science
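The reported behaviour is easy to reproduce (a sketch; the matrix and frame below are made up, and the data.frame call is commented out because it errored in the R version current at the time):

```r
# A 2 x 0 matrix with non-NULL row names, as described in the report
m <- matrix(numeric(0), nrow = 2, ncol = 0,
            dimnames = list(c("a", "b"), NULL))
df <- as.data.frame(m)

is.na(m)     # fine: a 2 x 0 logical matrix with the same dimnames
# is.na(df)  # errored in R 2.15.x when row names were present (the bug)
```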

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Questions about doing analysis based on time

2012-06-25 Thread John Kane
Arrgh yes I did mean dput(head(mydata, 100)).  Thanks for catching it.

John Kane
Kingston ON Canada


 -Original Message-
 From: michael.weyla...@gmail.com
 Sent: Fri, 22 Jun 2012 14:25:30 -0500
 To: jrkrid...@inbox.com
 Subject: Re: [R] Questions about doing analysis based on time
 
 On Fri, Jun 22, 2012 at 2:18 PM, John Kane jrkrid...@inbox.com wrote:
 Hi and welcome to the R-help list.
 
 It would be much better for readers to get your data in a more easily
 used format.
 
 There is a function called dput() that will output your data in a way
 that R can read easily.
 
 We don't need to see all the data, but perhaps a hundred lines of it would
 be nice.
 
 Try this  where your file is called mydata
 # just copy the line below and paste into R
 head(mydata, 100)
 
 I think you mean dput(head(mydata, 100))
 
 OP: Once you put this up I'll give more reply, but for now I'd suggest
 you try to put your data in a proper time series class (zoo/xts if I
 might give a personal-ish plug) which will make all these calculations
 much easier.
 
 Best,
 Michael
 
 
 # Now copy the output and paste it into your word processor as a reply to
 the list and we will have decent data to work with.
 
 John Kane
 Kingston ON Canada
 
 
 -Original Message-
 From: mikeedinge...@gmail.com
 Sent: Fri, 22 Jun 2012 09:21:40 -0700 (PDT)
 To: r-help@r-project.org
 Subject: [R] Questions about doing analysis based on time
 
 
 I have a spreadsheet that I've read into R using read.csv.  I've also
 attached it.  It looks like this (except there are 1600+ entries):
 
 Sunday
               SunDate SunTime SunScore
 1       5/9/2010 0:00    0:00      127
 2      6/12/2011 0:00    0:00      125
 3      6/15/2008 0:04    0:04       98
 4       8/3/2008 0:07    0:07      118
 5      7/24/2011 0:07    0:07      122
 6      5/25/2008 0:09    0:09      104
 7      5/20/2012 0:11    0:11      124
 8     10/18/2009 0:12    0:12      121
 9      3/14/2010 0:12    0:12      117
 10      1/2/2011 0:12    0:12      131
 
 SunDate and SunTime are both factors.  In order to change the class to
 something I can work with, I use the following:
 
 Sunday$SunTime <- as.POSIXlt(SunTime, tz = "", "%H:%M")
 Sunday$SunDate <- as.POSIXlt(SunDate, tz = "", "%m/%d/%Y %H:%M")
 
 Now, the str(Sunday) command yields:
 
 'data.frame':   1644 obs. of  3 variables:
  $ SunDate : POSIXlt, format: 2010-05-09 00:00:00 2011-06-12
 00:00:00
 ...
  $ SunTime : POSIXlt, format: 2012-06-18 00:00:00 2012-06-18
 00:00:00
 ...
  $ SunScore: int  127 125 98 118 122 104 124 121 117 131 ...
 
 I think all the elements in Sunday are correct for me to do what I want
 to
 do, but I don't know how to do them.
 
 1. How can I get the mean score by hour?  For example, I want the mean
 score
 
 
 of all the entries between 0:00 and 0:59, then 1:00  and 1:59, etc.
 2. Is it possible for me to create a histogram by hour for each score
 over a
 certain point?  For example, I want to make a histogram of all scores
 above
 140 by the hour they occurred in.  Is that possible?
 
 These last few might not be possibe (at least with R), but I'll ask
 anyway.
 I've got another data set similar to the one above, except it's got
 12,000
 entries over four years.  If I do the same commands as above to turn
 Date
 and Time into POSIXlt, is it possible for me to do the following:
 
 1. The data was recorded at irregular intervals, and the difference
 between
 recorded points can range from anywhere between 1 hour and up to 7.  Is
 it
 possible, when data isn't recorded between two points, to insert the
 hours
 that are unrecorded along with the average of what that hour is.  This
 is
 sort of a pre-requisite for the next two.
 2. If one of the entries has a Score above a certain point, is it
 possible
 to determine how long it was above that point and determine the mean
 for
 all
 the instances this occurred.  For example:
 01/01/11 01:00 AM
 101
 01/01/11 02:21 AM
 142
 01/01/11 03:36 AM
 156
 01/01/11 04:19 AM
 130
 01/01/11 05:12 AM
 146
 01/01/11 06:49 AM
 116
 01/01/11 07:09 AM
 111
       There are two spans where it's above 140. The two and three
 o'clock
 hours,
 and the 5 o'clock hour.  So the mean time would be 1.5 hours.  Is it
 possible for R to do this over a much larger time period?
 
 3.  If a score reaches a certain point, is it possible for R to
 determine
 the average time between that and when the score reaches another point.
 For
 example:
 01/01/11 01:01 AM
 101
 01/01/11 02:21 AM
 121
 01/01/11 03:14 AM
 134
 01/01/11 04:11 AM
 149
 01/01/11 05:05 AM
 119
 01/01/11 06:14 AM
 121
 01/01/11 07:19 AM
 127
 01/01/11 08:45 AM
 134
 01/01/11 09:11 AM
 142
 01/01/11 10:10 AM
 131
 The score goes above 120 during the 2 AM hour and doesn't go above 140
 until
 the 4 AM hour.  Then it goes above 120 again in the 6 AM hour, but
 doesn't
 go above 140 until the 9 AM hour.  So the average time to go from 120
 to
 140
 is 2.5 hours.  Can R do this over a much larger time frame?
 
 If anyone knows 

Re: [R] Questions about doing analysis based on time

2012-06-25 Thread R. Michael Weylandt
On Fri, Jun 22, 2012 at 3:45 PM, APOCooter mikeedinge...@gmail.com wrote:

 [snip and rearrange]

 try to put your data in a proper time series class (zoo/xts if I
might give a personal-ish plug) which will make all these calculations
much easier.

 I thought that's what I was doing with as.POSIXlt?

as.POSIXlt converts your time stamps to one of many formats (though
it's really probably better to use as.POSIXct instead) -- zoo/xts
objects are containers for time series data and provide you with
methods/wrappers to do many common time-series style calculations.
E.g., rollapply() or period.apply()

Best,
Michael
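For the first two questions, here is a minimal base-R sketch (the Sunday frame below is a made-up stand-in for the OP's data): POSIXlt objects expose the hour directly, and tapply() averages by it.

```r
# Hypothetical stand-in for the OP's Sunday data frame
Sunday <- data.frame(SunTime  = c("0:00", "0:30", "1:15", "1:45", "2:10"),
                     SunScore = c(120, 140, 100, 110, 150))

# POSIXlt objects carry the broken-down time, so $hour gives 0-23
hr <- as.POSIXlt(Sunday$SunTime, format = "%H:%M")$hour

# Question 1: mean score within each hour of the day
mean.by.hour <- tapply(Sunday$SunScore, hr, mean)
mean.by.hour  # hours 0, 1, 2 -> 130, 105, 150

# Question 2: histogram of the hours at which scores exceeded 140
hist(hr[Sunday$SunScore > 140], breaks = 0:24,
     main = "Scores > 140 by hour", xlab = "hour")
```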

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] do.call or something instead of for

2012-06-25 Thread Petr Savicky
On Mon, Jun 25, 2012 at 05:39:45AM -0700, Kathie wrote:
 Dear R users,
 
 I'd like to compute  X like below.
 
 X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}
 
 where W_{i,t}  are from Uniform(0,2) and X_{i,0} = 1+5*W_{i,0}
 
 Of course, I can do this with for statement, but I don't think it's good
 idea because i and t are too big.  
 
 So, my question is that
 
 Is there any better idea to avoid for statement for this problem? 

Hi.

The recurrence does not use index i. So, it is possible to vectorize
over i. Try the following.

  m <- 10 # bound on i
  n <- 6  # bound on tt
  W <- matrix(runif(m*n, max=2), nrow=m, ncol=n)
  X <- matrix(nrow=m, ncol=n)
  X[, 1] <- 1 + 5*W[, 1]
  for (tt in seq.int(from=2, length=n-1)) {
    X[, tt] <- 0.1*(tt-1) + 2*X[, tt-1] + W[, tt]
  }

The code uses tt instead of t to avoid confusion with the transpose function.

If n is always at least 2, then seq.int(from=2, length=n-1) may be
replaced by 2:n.

Hope this helps.

Petr Savicky.
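As an aside (not from the thread): the recursion over t can also be handed to compiled code with stats::filter(method = "recursive"), which computes y[t] = z[t] + 2*y[t-1], leaving only an apply over rows. A sketch reusing the names above:

```r
m <- 10  # bound on i
n <- 6   # bound on tt
W <- matrix(runif(m * n, max = 2), nrow = m, ncol = n)
X0 <- 1 + 5 * W[, 1]

# Feed filter() the non-recursive part z[t] = 0.1*t + W[i, t+1];
# the recursive coefficient 2 and the start value X0[i] do the rest
X <- t(sapply(seq_len(m), function(i) {
  z <- 0.1 * seq_len(n - 1) + W[i, -1]
  stats::filter(z, 2, method = "recursive", init = X0[i])
}))
X <- cbind(X0, X)
```

This matches the explicit double loop, but each row's time recursion runs in C.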

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] do.call or something instead of for

2012-06-25 Thread Mark Leeds
Thanks Petr. I was thinking more about t, but you're right in that there's
an i there also. My bad (twice).



On Mon, Jun 25, 2012 at 9:37 AM, Petr Savicky savi...@cs.cas.cz wrote:

 On Mon, Jun 25, 2012 at 05:39:45AM -0700, Kathie wrote:
  Dear R users,
 
  I'd like to compute  X like below.
 
  X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}
 
  where W_{i,t}  are from Uniform(0,2) and X_{i,0} = 1+5*W_{i,0}
 
  Of course, I can do this with for statement, but I don't think it's
 good
  idea because i and t are too big.
 
  So, my question is that
 
  Is there any better idea to avoid for statement for this problem?

 Hi.

 The recurrence does not use index i. So, it is possible to vectorize
 over i. Try the following.

  m <- 10 # bound on i
  n <- 6  # bound on tt
  W <- matrix(runif(m*n, max=2), nrow=m, ncol=n)
  X <- matrix(nrow=m, ncol=n)
  X[, 1] <- 1 + 5*W[, 1]
  for (tt in seq.int(from=2, length=n-1)) {
    X[, tt] <- 0.1*(tt-1) + 2*X[, tt-1] + W[, tt]
  }

 The code uses tt instead of t to avoid confusion with the transpose
 function.

 If n is always at least 2, then seq.int(from=2, length=n-1) may be
 replaced by 2:n.

 Hope this helps.

 Petr Savicky.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fractional Factorial - Wrong values using lm-function

2012-06-25 Thread Bert Gunter
Staleno:

As always, you need to read the Help file carefully. From ?aov:

The main difference from lm is in the way print, summary and so on handle
the fit: this is expressed in the traditional language of the analysis of
variance rather than that of linear models. 

summary.aov() computes sequential ss. lm() uses the t-statistics for the
estimated coefficients. They are not the same if non-orthogonal contrasts
are used. Of course, the coefficients and fits **are** identical.

If you don't know what this means, consult a statistician or linear models
references.

-- Bert
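A small illustration of Bert's point, on made-up data with deliberately correlated (hence non-orthogonal) predictors:

```r
set.seed(1)
d <- data.frame(x1 = rnorm(30))
d$x2 <- d$x1 + rnorm(30, sd = 0.5)  # correlated with x1 -> non-orthogonal
d$y  <- d$x1 + rnorm(30)

a12 <- summary(aov(y ~ x1 + x2, data = d))[[1]]  # sequential SS, x1 entered first
a21 <- summary(aov(y ~ x2 + x1, data = d))[[1]]  # sequential SS, x2 entered first

# x1's sum of squares depends on the order it enters the aov() model,
# while the lm() coefficients and t-tests are identical for either order
a12[1, "Sum Sq"]  # SS for x1 fitted first
a21[2, "Sum Sq"]  # SS for x1 fitted after x2: not the same
```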

On Mon, Jun 25, 2012 at 2:26 AM, Staleno s...@bergen-plastics.no wrote:

 Hello.

 I'm a new user of R, and I have a question regarding the use of aov and
 lm-functions. I'm doing a fractional factorial experiment at our production
 site, and I need to familiarize myself with the analysis before I conduct
 the experiment. I've been working my way through the examples provided at
 http://www.itl.nist.gov/div898/handbook/pri/section4/pri472.htm
 http://www.itl.nist.gov/div898/handbook/pri/section4/pri472.htm , but I
 can't get the results provided in the trial model calculations. Why is
 this?
 Here is how I have tried to do it:

 data.catapult = read.table("Fractional.txt", header=T) # Read the data in the
  table provided in the example.

  data.catapult
     Distance    h  s b l  e
  1     28.00 3.25  0 1 0 80
  2     99.00 4.00 10 2 2 62
  3    126.50 4.75 20 2 4 80
  4    126.50 4.75  0 2 4 45
  5     45.00 3.25 20 2 4 45
  6     35.00 4.75  0 1 0 45
  7     45.00 4.00 10 1 2 62
  8     28.25 4.75 20 1 0 80
  9     85.00 4.75  0 1 4 80
  10     8.00 3.25 20 1 0 45
  11    36.50 4.75 20 1 4 45
  12    33.00 3.25  0 1 4 45
  13    84.50 4.00 10 2 2 62
  14    28.50 4.75 20 2 0 45
  15    33.50 3.25  0 2 0 45
  16    36.00 3.25 20 2 0 80
  17    84.00 4.75  0 2 0 80
  18    45.00 3.25 20 1 4 80
  19    37.50 4.00 10 1 2 62
  20   106.00 3.25  0 2 4 80

  aov.catapult =
 
 aov(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e,data=data.catapult)
  summary(aov.catapult)
             Df Sum Sq Mean Sq F value  Pr(>F)
  h           1   2909    2909  15.854 0.01638 *
  s           1   1964    1964  10.701 0.03076 *
  b           1   7537    7537  41.072 0.00305 **
  l           1   6490    6490  35.369 0.00401 **
  e           1   2297    2297  12.518 0.02406 *
  h:s         1    122     122   0.667 0.45998
  h:b         1    345     345   1.878 0.24247
  h:l         1    354     354   1.929 0.23724
  h:e         1      0       0   0.001 0.97578
  s:b         1    161     161   0.877 0.40199
  s:l         1     20      20   0.107 0.75966
  s:e         1    114     114   0.622 0.47427
  b:l         1    926     926   5.049 0.08795 .
  b:e         1    124     124   0.677 0.45689
  l:e         1    158     158   0.860 0.40623
  Residuals   4    734     184
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 This seems just about right to me. However, when I attempt to make the
 linear model, based on main factors and two-factor interactions, I get a
 completely different result:

  lm.catapult =
 
 lm(Distance~h+s+b+l+e+h*s+h*b+h*l+h*e+s*b+s*l+s*e+b*l+b*e+l*e,data=data.catapult)
  summary(lm.catapult)

 Call:
 lm(formula = Distance ~ h + s + b + l + e + h * s + h * b + h *
l + h * e + s * b + s * l + s * e + b * l + b * e + l * e,
data = data.catapult)

 Residuals:
        1       2       3       4       5       6       7       8       9      10
  -0.8100 22.3875 -3.6763 -3.8925 -3.8925 -0.8576  7.0852 -0.8100 -0.8100 -0.8576
       11      12      13      14      15      16      17      18      19      20
  -0.8576 -0.8576  7.8875 -3.8925 -3.8925 -3.6763 -3.6763 -0.8100 -0.4148 -3.6763

 Coefficients:
   Estimate Std. Error t value Pr(>|t|)
 (Intercept)  25.031042 100.791955   0.248   0.8161
 h-3.687500  22.466457  -0.164   0.8776
 s 0.475446   2.446791   0.194   0.8554
 b   -39.417973  44.906164  -0.878   0.4296
 l   -18.938988  12.233954  -1.548   0.1965
 e-0.158449   1.230683  -0.129   0.9038
 h:s  -0.368750   0.451546  -0.817   0.4600
 h:b  12.375000   9.030925   1.370   0.2425
 h:l   3.135417   2.257731   1.389   0.2372
 h:e   0.008333   0.258026   0.032   0.9758
 s:b  -0.634375   0.677319  -0.937   0.4020
 s:l  -0.055469   0.169330  -0.328   0.7597
 s:e   0.015268   0.019352   0.789   0.4743
 b:l   7.609375   3.386597   2.247   0.0879 .
 b:e   0.318397   0.387008   0.823   0.4569
 l:e   0.089732   0.096760   0.927   0.4062
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 13.55 on 4 degrees of freedom
 Multiple R-squared: 0.9697, Adjusted R-squared: 0.8563
 F-statistic: 8.545 on 15 and 4 DF,  p-value: 0.02559

 This result is nothing like the results provided in the example. Why is
 this? Any help is very much appreciated.

 Regards, Ståle.

 --
 View 

Re: [R] Fractional Factorial - Wrong values using lm-function

2012-06-25 Thread Simon Knapp
They are coding the variables as factors and using orthogonal
polynomial contrasts. This:

data.catapult <- data.frame(data.catapult$Distance,
do.call(data.frame, lapply(data.catapult[-1], factor, ordered=T)))
contrasts(data.catapult$h) <-
contrasts(data.catapult$s) <-
contrasts(data.catapult$l) <-
contrasts(data.catapult$e) <-
contr.poly(3, contrasts=F)
contrasts(data.catapult$b) <- contr.poly(2, contrasts=F)
lm1 <- lm(Distance ~ .^2, data=data.catapult)
summary(lm1)

gets you closer (same intercept at least), but I can't explain the
remaining differences. I'm not even sure why the results look like
they do (interaction terms like a*b rather than a:b, and one level for each
interaction).

Hope that helps,
Simon Knapp

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help Digest, Vol 112, Issue 25

2012-06-25 Thread John C Nash
While lm() does linear modeling, the constraints make this problem easier to
solve with a nonlinear tool. Both my packages Rvmmin and Rcgmin (I recommend
the R-forge versions as more up-to-date) have bounds constraints and masks,
i.e., fixed parameters.

I am actually looking for example problems of this type that are more recent
than the ones that got me into this 30 years ago. Do contact me off-list if
you have something that could be shared. I'd also welcome discussion on
appropriate tools for such constrained linear modeling problems. The
constraints are, I believe, more or less present in most linear modeling
situations, but we rarely impose them explicitly, and tend to use lm() and
(hopefully) check whether the solution obeys the conditions.
and (hopefully) check if the solution obeys the conditions.

Best,

John Nash


On 06/25/2012 06:00 AM, r-help-requ...@r-project.org wrote:
 Message: 5
 Date: Sun, 24 Jun 2012 03:34:10 -0700 (PDT)
 From: rgoodman rosa.good...@gmail.com
 To: r-help@r-project.org
 Subject: Re: [R] Constrained coefficients in lm (correction)
 Message-ID: 1340534050627-4634321.p...@n4.nabble.com
 Content-Type: text/plain; charset=us-ascii
 
 Hi Jorge,
 
 Did you ever figure this out? I want to do the same thing with the
 additional constraint of the coef for x1 = 2. 
 
 lm(Y~offset(2*x1)+x2+x3,data=mydata)
 where b = coeff for x2, c = coeff for x3, b+c=1 and b and c > 0.
 
 I've loaded the systemfit package, but the suggestion R*beta0 = q, where R
 is R.restr and q is q.restr in the function call makes no sense to me.
 
 Cheers,
 Rosie

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fractional Factorial - Wrong values using lm-function

2012-06-25 Thread Simon Knapp
... but this is tantalisingly close:

dat1 <- with(data.catapult,
data.frame(
Distance,
h=C(h, poly, 1),
s=C(s, poly, 1),
l=C(l, poly, 1),
e=C(e, poly, 1),
b=C(b, poly, 1)
)
)
lm4 <- lm(Distance ~ .^2, data = dat1)
summary(lm4)

... wish I knew what it meant.



On Tue, Jun 26, 2012 at 12:18 AM, Simon Knapp sleepingw...@gmail.com wrote:
 They are coding the variables as factors and using orthogonal
 polynomial contrasts. This:

 data.catapult <- data.frame(data.catapult$Distance,
 do.call(data.frame, lapply(data.catapult[-1], factor, ordered=T)))
 contrasts(data.catapult$h) <-
 contrasts(data.catapult$s) <-
 contrasts(data.catapult$l) <-
 contrasts(data.catapult$e) <-
 contr.poly(3, contrasts=F)
 contrasts(data.catapult$b) <- contr.poly(2, contrasts=F)
 lm1 <- lm(Distance ~ .^2, data=data.catapult)
 summary(lm1)

 gets you closer (same intercept at least), but I can't explain the
 remaining differences. I'm not even sure why the results look like
 they do (interaction terms like a*b rather than a:b, and one level for each
 interaction).

 Hope that helps,
 Simon Knapp

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fractional Factorial - Wrong values using lm-function

2012-06-25 Thread Simon Knapp
and finally...

thingy <- function(x) {
    x <- C(x, poly, 1)
    tmp <- contrasts(x)
    contrasts(x, 1) <- 2 * tmp / sum(abs(tmp))
    x
}

dat2 <- with(data.catapult,
data.frame(
Distance,
h=thingy(h),
s=thingy(s),
l=thingy(l),
e=thingy(e),
b=thingy(b)
)
)
lm5 <- lm(Distance ~ .^2, data = dat2)
summary(lm5)




On Tue, Jun 26, 2012 at 12:35 AM, Simon Knapp sleepingw...@gmail.com wrote:
 ... but this is tantalisingly close:

 dat1 <- with(data.catapult,
    data.frame(
        Distance,
        h=C(h, poly, 1),
        s=C(s, poly, 1),
        l=C(l, poly, 1),
        e=C(e, poly, 1),
        b=C(b, poly, 1)
    )
 )
 lm4 <- lm(Distance ~ .^2, data = dat1)
 summary(lm4)

 ... wish I knew what it meant.



 On Tue, Jun 26, 2012 at 12:18 AM, Simon Knapp sleepingw...@gmail.com wrote:
 They are coding the variables as factors and using orthogonal
 polynomial contrasts. This:

 data.catapult <- data.frame(data.catapult$Distance,
 do.call(data.frame, lapply(data.catapult[-1], factor, ordered=T)))
 contrasts(data.catapult$h) <-
 contrasts(data.catapult$s) <-
 contrasts(data.catapult$l) <-
 contrasts(data.catapult$e) <-
 contr.poly(3, contrasts=F)
 contrasts(data.catapult$b) <- contr.poly(2, contrasts=F)
 lm1 <- lm(Distance ~ .^2, data=data.catapult)
 summary(lm1)

 gets you closer (same intercept at least), but I can't explain the
 remaining differences. I'm not even sure why the results look like
 they do (interaction terms like a*b rather than a:b, and one level for each
 interaction).

 Hope that helps,
 Simon Knapp

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

2012-06-25 Thread Simon Wood

Martin,

I had a nagging feeling that there must be more to this than my previous 
reply suggested, and have been digging a bit further. Basically I would 
expect these p-values to not be great if you had nested random effects 
(such as main effects + their interaction), but your case looked rather 
straightforward...


After some digging it turns out that the p-value for Placename is wrong 
in this case, as you suspected. R routine `qr' has a `tol' argument that 
by default is set to 1e-7. In the computation of the test statistic for 
re terms I had called qr without changing this default to the value 0 
that is actually appropriate (I should have noticed this, I've been 
caught out by it before). With the correct 'tol', Placename is
non-significant. This will be fixed in the next release, but that won't
happen until I've tested the random effects p-value approximations a bit 
more thoroughly.


best,
Simon



On 11/06/12 14:56, Martijn Wieling wrote:

Dear Simon,

I ran an additional analysis using bam (mgcv 1.7-17) with three random
intercepts and no non-linearities, and compared these to the results
of lmer (lme4). Using bam results in a significant random intercept
(even though it has a very low edf-value), while the lmer results show
no variance associated to the random intercept of Placename. Should I
drop the random intercept of Placename and if so, how is this apparent
from the results of bam?

Summaries of both models are shown below.

With kind regards,
Martijn

 l1 = lmer(RefPMIdistMeanLog.c ~ geogamfit + (1|Word) + (1|Key) +
(1|Placename), data=wrddst); print(l1,cor=F)

Linear mixed model fit by REML
Formula: RefPMIdistMeanLog.c ~ geogamfit + (1 | Word) + (1 | Key) + (1
|  Placename)
Data: wrddst
     AIC    BIC logLik deviance REMLdev
  -44985 -44927  22498   -45009  -44997
Random effects:
  GroupsNameVariance  Std.Dev.
  Word  (Intercept) 0.0800944 0.283009
  Key   (Intercept) 0.0013641 0.036933
  Placename (Intercept) 0.000 0.00
  Residual  0.0381774 0.195390
Number of obs: 112608, groups: Word, 357; Key, 320; Placename, 40

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.00342    0.01513   -0.23
geogamfit    0.99249    0.02612   37.99


 m1 = bam(RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs="re") +
s(Key, bs="re") + s(Placename, bs="re"), data=wrddst, method="REML");
summary(m1, freq=F)

Family: gaussian
Link function: identity

Formula:
RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs = "re") + s(Key,
    bs = "re") + s(Placename, bs = "re")

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.00342    0.01513  -0.226    0.821
geogamfit     0.99249    0.02612  37.991   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                   edf Ref.df       F p-value
s(Word)      3.554e+02    347 634.716  <2e-16 ***
s(Key)       2.946e+02    316  23.054  <2e-16 ***
s(Placename) 1.489e-04     38   7.282  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.693   Deviance explained = 69.4%
REML score = -22498  Scale est. = 0.038177  n = 112608


On Wed, May 23, 2012 at 11:30 AM, Simon Wood-4 [via R]
ml-node+s789695n4631060...@n4.nabble.com  wrote:

Having looked at this further, I've made some changes in mgcv_1.7-17 to
the p-value computations for terms that can be penalized to zero during
fitting (e.g. s(x, bs="re"), s(x, m=1), etc.).

The Wald statistic based p-values from summary.gam and anova.gam (i.e.
what you get from e.g. anova(a) where a is a fitted gam object) are
quite well founded for smooth terms that are non-zero under full
penalization (e.g. a cubic spline is a straight line under full
penalization). For such smooths, an extension of Nychka's (1988) result
on CI's for splines gives a well founded distributional result on which
to base a Wald statistic. However, the Nychka result requires the
smoothing bias to be substantially less than the smoothing estimator
variance, and this will often not be the case if smoothing can actually
penalize a term to zero (to understand why, see argument in appendix of
Marra & Wood, 2012, Scandinavian Journal of Statistics, 39, 53-74).

Simulation testing shows that this theoretical concern has serious
practical consequences. So for terms that can be penalized to zero,
alternative approximations have to be used, and these are now
implemented in mgcv_1.7-17 (see ?summary.gam).

The approximate test performed by anova(a,b) (a and b are fitted gam
objects) is less well founded. It is a reasonable approximation when
each smooth term in the models could in principle be well approximated
by an unpenalized term of rank approximately equal to the edf of the
smooth term, but otherwise the p-values produced are likely to be much
too small. In particular simulation testing suggests that the test is
not to be trusted with s(..., bs="re") terms, and can be poor if the
models 

[R] compare between mixdist models

2012-06-25 Thread Yakir Gagnon
Hi all!
Is there any way to compare mixdist models with a different number of
components using AIC or BIC?
I'm looking for some similar functionality as in flexmix, but not sure
which to stick with: mixdist or flexmix..

Yakir Gagnon
cell   +1 919 886 3877
office +1 919 684 7188
Johnsen Lab
Biology Department
Box 90338
Duke University
Durham, NC 27708
BioSci Building
Room 307
http://fds.duke.edu/db/aas/Biology/postdoc/yg32
http://www.biology.duke.edu/johnsenlab/people/yakir.html

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using multiple cpu's - scaling in processing power

2012-06-25 Thread soren wilkening
Thanks Drew

a first glance at Condor tells me that this will very likely do what I need
in terms of flexible allocation of (cpu) resources. Unfortunately there is
not much R integration so I will need to see if setting up Condor and
submitting R jobs to it is worth the effort.

Cheers

Soren

-
http://censix.com
--
View this message in context: 
http://r.789695.n4.nabble.com/using-multiple-cpu-s-scaling-in-processing-power-tp4634405p4634438.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Tight Axes in Prepanel Function

2012-06-25 Thread Elliot Joel Bernstein
How do I specify a tight y-axis, where the plot completely fills the
y-axis range, inside the prepanel function? For example, consider the
following code:

require(lattice)

set.seed(12345)
x <- 1:1000
y <- cumsum(rnorm(length(x)))


prepanel.test <- function(x, y, groups = NULL, subscripts = NULL, ...) {

  if (is.null(groups)) {
    result <- list(ylim = range(y))
  } else {
    result <- list(ylim = range(y, finite = TRUE))
  }

  return (result)

}

print(xyplot(y~x, prepanel = prepanel.test, type = 'l', grid = TRUE), split
= c(1,1,1,2), more = TRUE)
print(xyplot(y~x, ylim = range(y, finite = TRUE), type = 'l', grid = TRUE),
split = c(1,2,1,2))

The top plot has extra whitespace at the top and bottom. Is there any way
to eliminate this without having to specify 'ylim' directly in the call to
'xyplot'? (The reason I want to include this in the prepanel function is
that I want to add conditioning and use the combineLimits function in the
latticeExtra package.)

Thanks.

- Elliot
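One possibility (not from the thread, so treat it as a hedged suggestion and check how it interacts with combineLimits): lattice pads numeric limits by 7% on each side via its axis.padding option, and zeroing that makes every panel fill its range without setting ylim at all:

```r
library(lattice)

# Default padding is list(numeric = 0.07, factor = 0.6); zero the numeric part
old <- lattice.options(axis.padding = list(numeric = 0, factor = 0.6))

set.seed(12345)
x <- 1:1000
y <- cumsum(rnorm(length(x)))
print(xyplot(y ~ x, type = "l"))  # the curve now touches the panel edges

lattice.options(old)              # restore the previous padding
```

Note this is a global setting, so it affects every axis of every lattice plot while it is in force.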

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using multiple cpu's - scaling in processing power

2012-06-25 Thread soren wilkening
Oh I was wrong. R package 'GridR' seems to be what one needs to run R scripts
on Condor directly ... interesting.  thanks again.

-
http://censix.com
--
View this message in context: 
http://r.789695.n4.nabble.com/using-multiple-cpu-s-scaling-in-processing-power-tp4634405p4634442.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] combineLimits and Dates

2012-06-25 Thread Elliot Joel Bernstein
I'm having some trouble using the latticeExtra 'combineLimits' function
with a Date x-variable:

require(lattice)
require(latticeExtra)  # provides useOuterStrips() and combineLimits()

set.seed(12345)

dates <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), "days")
dat <- data.frame(d = rep(dates, 4),
                  g = factor(rep(rep(c(1, 2), each = length(dates)), 2)),
                  h = factor(rep(c("a", "b"), each = length(dates) * 2)),
                  x = rnorm(4 * length(dates)),
                  y = rnorm(4 * length(dates)))

plt1 <- xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l',
               scales = list(relation = "free"),
               auto.key = TRUE)
plt1 <- useOuterStrips(plt1)
plt1 <- combineLimits(plt1)

The x-axis labels are correct after the call to 'useOuterStrips', but they get
converted to numeric after the call to 'combineLimits'. How do I keep them
as date labels?

Thanks.

- Elliot

-- 
Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
134 Mount Auburn Street | Cambridge, MA | 02138
Phone: (617) 503-4619 | Email: elliot.bernst...@fdopartners.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] array complexity for the MH test

2012-06-25 Thread Richard M. Heiberger
I will try again.

1. If you have 5 groups, and split each in half, then you have 10 groups.
If MH is applicable to 5 groups, then it should be applicable to the 10
groups as
long as they are disjoint groups.

2. I don't think MH applies to your example because the groups do not have
similar behavior.
That is why I suggested you look at dose response behavior instead.

3. vcd is by David Meyer [aut, cre], Achim Zeileis [aut], Kurt Hornik [aut],
Michael Friendly [ctb]
Rich
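Point 1 above can be illustrated with base R's mantelhaen.test(), which accepts a 2 x 2 x K array of disjoint strata; the counts below are invented purely for demonstration:

```r
# Hypothetical counts: treatment x outcome in 4 disjoint groups.
# Splitting each group in half would simply give 8 disjoint strata,
# and the same test would still apply.
counts <- array(c(10, 5, 6, 9,
                  12, 4, 7, 8,
                  11, 6, 5, 10,
                   9, 7, 6, 11),
                dim = c(2, 2, 4),
                dimnames = list(Treatment = c("A", "B"),
                                Response  = c("Yes", "No"),
                                Group     = paste0("G", 1:4)))

# Mantel-Haenszel test of a common odds ratio across the 4 strata
res <- mantelhaen.test(counts)
res$p.value
```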
On Sun, Jun 24, 2012 at 5:30 AM, francogrex francog...@mail.com wrote:

 Thanks for your answer. The answer advertises your VCD package, which by
 the
 way is a very nice package that I use and recommend for everyone doing such
 kind of data analysis. However if you really examine the answer you gave
 me,
 it does not really or specifically answer my question.

 --
 View this message in context:
 http://r.789695.n4.nabble.com/array-complexity-for-the-MH-test-tp4633999p4634318.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





[R] setdiff datframes

2012-06-25 Thread nathalie

hi,


I have 2 files example 1 and example 2 and would like to know what is in 
example2 and not in example1 (attached)

V1 contain data which could be in duplicated which I am using as identifiers

I used setdiff(example2$V1,example1$V1) to find the identifiers which 
are specific to example2:

[1] "rs2276598"  "rs17253672"

I am looking for a way to get an output with all columns (V1 to V14) 
for these 2 identifiers.


thanks for any suggestions


format example1
   V1  V2 V3  V4  V5 V6
1  rs4685 2:198257795  C ENSG0115524 ENST0424674 Transcript
2  rs4685 2:198257795  C ENSG0115524 ENST0335508 Transcript
3rs788018 2:198265526  G ENSG0115524 ENST0335508 Transcript
4rs788023 2:198283305  C ENSG0115524 ENST0335508 Transcript
5  rs41284843  2:25536827  A ENSG0119772 ENST0406659 Transcript
6  rs41284843  2:25536827  A ENSG0119772 ENST0321117 Transcript
7  rs41284843  2:25536827  A ENSG0119772 ENST0264709 Transcript
8  rs41284843  2:25536827  A ENSG0119772 ENST0380756 Transcript
9   rs3729680 3:178927410  G ENSG0121879 ENST0263967 Transcript
10 rs61744960 4:106156163  A ENSG0168769 ENST0305737 Transcript
11 rs61744960 4:106156163  A ENSG0168769 ENST0413648 Transcript
12 rs61744960 4:106156163  A ENSG0168769 ENST0540549 Transcript
13 rs61744960 4:106156163  A ENSG0168769 ENST0545826 Transcript
14 rs61744960 4:106156163  A ENSG0168769 ENST0380013 Transcript
15 rs61744960 4:106156163  A ENSG0168769 ENST0535110 Transcript
16 rs61744960 4:106156163  A ENSG0168769 ENST0394764 Transcript
17 rs61744960 4:106156163  A ENSG0168769 ENST0513237 Transcript
18 rs61744960 4:106156163  A ENSG0168769 ENST0265149 Transcript
19  rs2454206 4:106196951  G ENSG0168769 ENST0540549 Transcript
20  rs2454206 4:106196951  G ENSG0168769 ENST0513237 Transcript
 V7   V8   V9  V10 V11 V12V13
1 SYNONYMOUS_CODING  704  705  235   V gtA/gtG 
rs4685
2 SYNONYMOUS_CODING 3749 3657 1219   V gtA/gtG 
rs4685

3 SYNONYMOUS_CODING 2723 2631  877   G ggT/ggC rs788018
4 SYNONYMOUS_CODING  515  423  141   K aaA/aaG rs788023
5 SYNONYMOUS_CODING  365   279   P ccC/ccT 
rs41284843
6 SYNONYMOUS_CODING  264   279   P ccC/ccT 
rs41284843
7 SYNONYMOUS_CODING  365   279   P ccC/ccT 
rs41284843
8  NMD_TRANSCRIPT,SYNONYMOUS_CODING  264   279   P ccC/ccT 
rs41284843

9 NON_SYNONYMOUS_CODING 1330 1173  391 I/M atA/atG rs3729680
10NON_SYNONYMOUS_CODING 1468 1064  355 G/D gGt/gAt 
rs61744960
11NON_SYNONYMOUS_CODING 1204 1064  355 G/D gGt/gAt 
rs61744960
12NON_SYNONYMOUS_CODING 1924 1064  355 G/D gGt/gAt 
rs61744960
13NON_SYNONYMOUS_CODING 1924 1064  355 G/D gGt/gAt 
rs61744960
14NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt 
rs61744960
15NON_SYNONYMOUS_CODING 1167 1064  355 G/D gGt/gAt 
rs61744960
16NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt 
rs61744960
17NON_SYNONYMOUS_CODING 1924 1127  376 G/D gGt/gAt 
rs61744960
18 NMD_TRANSCRIPT,NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt 
rs61744960

19NON_SYNONYMOUS_CODING 6144 5284 1762 I/V Ata/Gta rs2454206
20NON_SYNONYMOUS_CODING 6144 5347 1783 I/V Ata/Gta rs2454206
V14
1 ENSP=ENSP0409435;HGNC=SF3B1
2 ENSP=ENSP0335321;HGNC=SF3B1
3 ENSP=ENSP0335321;HGNC=SF3B1
4 ENSP=ENSP0335321;HGNC=SF3B1
5 ENSP=ENSP0384852;HGNC=DNMT3A
6 ENSP=ENSP0324375;HGNC=DNMT3A
7 ENSP=ENSP0264709;HGNC=DNMT3A
8 ENSP=ENSP0370132;HGNC=DNMT3A
9 
ENSP=ENSP0263967;PolyPhen=benign(0.019);SIFT=tolerated(0.13);HGNC=PIK3CA
10 
ENSP=ENSP0306705;PolyPhen=probably_damaging(0.983);SIFT=deleterious(0.01);HGNC=TET2
11 
ENSP=ENSP0391448;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2
12 
ENSP=ENSP0442788;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2
13 
ENSP=ENSP0442867;PolyPhen=probably_damaging(0.952);SIFT=deleterious(0.01);HGNC=TET2
14 
ENSP=ENSP0369351;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2
15 
ENSP=ENSP0438851;PolyPhen=probably_damaging(0.998);SIFT=deleterious(0.01);HGNC=TET2
16 
ENSP=ENSP0378245;PolyPhen=probably_damaging(0.983);SIFT=deleterious(0.01);HGNC=TET2
17 
ENSP=ENSP0425443;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2
18 
ENSP=ENSP0265149;PolyPhen=probably_damaging(0.952);SIFT=deleterious(0.01);HGNC=TET2
19 
ENSP=ENSP0442788;PolyPhen=benign(0.029);SIFT=tolerated(0.15);HGNC=TET2
20 
ENSP=ENSP0425443;PolyPhen=benign(0.029);SIFT=tolerated(0.15);HGNC=TET2



 example2
   V1  V2 V3   

Re: [R] setdiff datframes

2012-06-25 Thread Rui Barradas

Hello,

It is recommended to post data using dput(). Something like

dput(head(DF, 20))  # or 30

Then, paste the output of this command in a post.

Untested, but I think it should work:


# Create a logical index into 'example2'
ix <- example2$V1 %in% setdiff(example2$V1, example1$V1)
example2[ix, ]  # which rows does it index?


You could also save the result of setdiff() and then see which of 
example2$V1 are %in% it. If the data.frame has many rows, this way could 
save some memory (setdiff() gives only 2 character values, versus 2 
TRUE plus n-2 FALSE).
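A self-contained miniature of the same pattern (the two small data frames here are invented stand-ins for example1/example2, keeping only two columns):

```r
example1 <- data.frame(V1 = c("rs4685", "rs788018"),
                       V2 = c("2:198257795", "2:198265526"),
                       stringsAsFactors = FALSE)
example2 <- data.frame(V1 = c("rs4685", "rs2276598", "rs17253672"),
                       V2 = c("2:198257795", "10:112341", "5:9987"),
                       stringsAsFactors = FALSE)

# Identifiers present in example2 but not in example1
extra <- setdiff(example2$V1, example1$V1)

# All columns of example2 for those identifiers
example2[example2$V1 %in% extra, ]
```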


Hope this helps,

Rui Barradas

Em 25-06-2012 17:37, nathalie escreveu:

hi,


I have 2 files example 1 and example 2 and would like to know what is in
example2 and not in example1 (attached)
V1 contain data which could be in duplicated which I am using as
identifiers

I used setdiff(example2$V1,example1$V1) to find the identifiers which
are specific to example2:
[1] rs2276598  rs17253672

I am looking for a way to get an  output with  all columns (V1 to V14)
for these 2 identifiers

thanks for any suggestions


format example1
V1  V2 V3  V4  V5 V6
1  rs4685 2:198257795  C ENSG0115524 ENST0424674 Transcript
2  rs4685 2:198257795  C ENSG0115524 ENST0335508 Transcript
3rs788018 2:198265526  G ENSG0115524 ENST0335508 Transcript
4rs788023 2:198283305  C ENSG0115524 ENST0335508 Transcript
5  rs41284843  2:25536827  A ENSG0119772 ENST0406659 Transcript
6  rs41284843  2:25536827  A ENSG0119772 ENST0321117 Transcript
7  rs41284843  2:25536827  A ENSG0119772 ENST0264709 Transcript
8  rs41284843  2:25536827  A ENSG0119772 ENST0380756 Transcript
9   rs3729680 3:178927410  G ENSG0121879 ENST0263967 Transcript
10 rs61744960 4:106156163  A ENSG0168769 ENST0305737 Transcript
11 rs61744960 4:106156163  A ENSG0168769 ENST0413648 Transcript
12 rs61744960 4:106156163  A ENSG0168769 ENST0540549 Transcript
13 rs61744960 4:106156163  A ENSG0168769 ENST0545826 Transcript
14 rs61744960 4:106156163  A ENSG0168769 ENST0380013 Transcript
15 rs61744960 4:106156163  A ENSG0168769 ENST0535110 Transcript
16 rs61744960 4:106156163  A ENSG0168769 ENST0394764 Transcript
17 rs61744960 4:106156163  A ENSG0168769 ENST0513237 Transcript
18 rs61744960 4:106156163  A ENSG0168769 ENST0265149 Transcript
19  rs2454206 4:106196951  G ENSG0168769 ENST0540549 Transcript
20  rs2454206 4:106196951  G ENSG0168769 ENST0513237 Transcript
  V7   V8   V9  V10 V11 V12V13
1 SYNONYMOUS_CODING  704  705  235   V gtA/gtG rs4685
2 SYNONYMOUS_CODING 3749 3657 1219   V gtA/gtG rs4685
3 SYNONYMOUS_CODING 2723 2631  877   G ggT/ggC rs788018
4 SYNONYMOUS_CODING  515  423  141   K aaA/aaG rs788023
5 SYNONYMOUS_CODING  365   279   P ccC/ccT
rs41284843
6 SYNONYMOUS_CODING  264   279   P ccC/ccT
rs41284843
7 SYNONYMOUS_CODING  365   279   P ccC/ccT
rs41284843
8  NMD_TRANSCRIPT,SYNONYMOUS_CODING  264   279   P ccC/ccT
rs41284843
9 NON_SYNONYMOUS_CODING 1330 1173  391 I/M atA/atG
rs3729680
10NON_SYNONYMOUS_CODING 1468 1064  355 G/D gGt/gAt
rs61744960
11NON_SYNONYMOUS_CODING 1204 1064  355 G/D gGt/gAt
rs61744960
12NON_SYNONYMOUS_CODING 1924 1064  355 G/D gGt/gAt
rs61744960
13NON_SYNONYMOUS_CODING 1924 1064  355 G/D gGt/gAt
rs61744960
14NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt
rs61744960
15NON_SYNONYMOUS_CODING 1167 1064  355 G/D gGt/gAt
rs61744960
16NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt
rs61744960
17NON_SYNONYMOUS_CODING 1924 1127  376 G/D gGt/gAt
rs61744960
18 NMD_TRANSCRIPT,NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt
rs61744960
19NON_SYNONYMOUS_CODING 6144 5284 1762 I/V Ata/Gta
rs2454206
20NON_SYNONYMOUS_CODING 6144 5347 1783 I/V Ata/Gta
rs2454206
V14
1 ENSP=ENSP0409435;HGNC=SF3B1
2 ENSP=ENSP0335321;HGNC=SF3B1
3 ENSP=ENSP0335321;HGNC=SF3B1
4 ENSP=ENSP0335321;HGNC=SF3B1
5 ENSP=ENSP0384852;HGNC=DNMT3A
6 ENSP=ENSP0324375;HGNC=DNMT3A
7 ENSP=ENSP0264709;HGNC=DNMT3A
8 ENSP=ENSP0370132;HGNC=DNMT3A
9
ENSP=ENSP0263967;PolyPhen=benign(0.019);SIFT=tolerated(0.13);HGNC=PIK3CA

10
ENSP=ENSP0306705;PolyPhen=probably_damaging(0.983);SIFT=deleterious(0.01);HGNC=TET2

11
ENSP=ENSP0391448;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2

12
ENSP=ENSP0442788;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2

13
ENSP=ENSP0442867;PolyPhen=probably_damaging(0.952);SIFT=deleterious(0.01);HGNC=TET2

14

Re: [R] Very simple question R5 setRefClass and Initialize

2012-06-25 Thread Martin Morgan
When I try your code, I'm told that 'mongo' is not a defined class. Removing 
that field from the definition, I do not get a print at class definition. 

 mongoDbUser = setRefClass("mongoDbUser",
+ fields = list(
+ auth = "logical",
+ host = "character",
+ username = "character",
+ password = "character",
+ db = "character"
+ ),
+   methods = list(
+ initialize =
+ function(auth, host, username, password, db, connector){
+   print("initialization")
+ }
+   )
+ )
 R.Version()$version.string
[1] R version 2.15.1 Patched (2012-06-22 r59603)

So I guess there's more to the story.

In S4-land, initialize methods can be tricky (e.g., they are invoked when an 
object is coerced between types), and a common pattern is to write a 
constructor rather than an initialize method:

MongoDbUser <-
function(username, password, auth, host, db, connector)
{
    ## initialize, then
    mongoDbUser$new(auth, host, username, password, db, connector)
}

This also separates implementation from interface and provides the opportunity 
to give hints to the user about data types.
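A minimal self-contained sketch of that constructor pattern (the class and field names here are invented for illustration, and the 'mongo' connector field is omitted since that class is not defined in base R):

```r
library(methods)

# Reference class with plain fields and no custom initialize method
Account <- setRefClass("Account",
                       fields = list(user = "character",
                                     host = "character"))

# Constructor: validates arguments, then delegates to $new().
# Nothing runs at class-definition time; code runs only on construction.
NewAccount <- function(user, host = "localhost") {
  stopifnot(is.character(user), length(user) == 1L)
  Account$new(user = user, host = host)
}

a <- NewAccount("julien")
a$host
```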

Martin Morgan

- Original Message -
 Help anyone? Is this impossible? I tried to use the initFields method
 but it did not change anything.
 
 Le 25/06/2012 10:55, Julien Cochennec a écrit :
  Hi,
  New to this list, and glad I found R5 which si a great improvement
  to
  R for my work.
  Something seems odd anyway, when I do a setRefClass, the initialize
  method execute at the class definition.
  I would like it to execute only when an instance is created.
  How can I do that? I found no way to test if the code execution is
  when the class definition happens or when a $new instance is
  created.
  Thanks for your help.
  For example my class definition looks like this :
 
  mongoDbUser = setRefClass("mongoDbUser",
  fields = list(
  auth = "logical",
  host = "character",
  username = "character",
  password = "character",
  db = "character",
  connector = "mongo"
  ),
methods = list(
  initialize =
  function(auth, host, username, password, db, connector){
print("initialization")
  }
)
  )
 
  If I execute this code, it says initialization, but it shouldn't,
  I
  created no instance, right??
  I would like initialization to appear only when I do :
  mongoDbUser$new(...)
 
 
 




[R] graph displays

2012-06-25 Thread MSousa

Good afternoon. I'm trying to create a graph that displays the following
information in the best way.

    For instance, a bar graph organized by A, B, C.



Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
1  A 476375   116   125  129  131 131 131 131
2  B 3764451125   19   61 131 186 186
3  C 1762256612   29   91 171 186 186

thanks



--
View this message in context: 
http://r.789695.n4.nabble.com/graph-displays-tp4634448.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] graph displays

2012-06-25 Thread Sarah Goslee
There's no way we can tell you the best way to display your
information, because we don't know anything about it. The best display
method has a lot to do with what the data are, and what you're trying
to illustrate. That said, here are two possibilities, one using the
bar graph you requested, and both using only base graphics.

# PLEASE use dput() to provide your data, rather than pasting it in
testdata <- structure(list(Source = c("A", "B", "C"), X1000s = c(47L, 37L,
17L), X600s = c(63L, 64L, 62L), X500s = c(75L, 45L, 25L), X250s = c(116L,
11L, 66L), X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L
), X10s = c(131L, 61L, 91L), X5s = c(131L, 131L, 171L), X3s = c(131L,
186L, 186L), X1s = c(131L, 186L, 186L)), .Names = c("Source",
"X1000s", "X600s", "X500s", "X250s", "X100s", "X50s", "X10s",
"X5s", "X3s", "X1s"), class = "data.frame", row.names = c("1",
"2", "3"))

barplot(as.matrix(testdata[, -1]), beside = TRUE, col = 1:3)
legend("topleft", LETTERS[1:3], col = 1:3, pch = 15)

plot(1:10, testdata[1, -1], type = "b", ylim = c(0, 200), xaxt = "n", col = 1)
axis(1, 1:10, colnames(testdata)[-1])
lines(1:10, testdata[2, -1], type = "b", col = 2)
lines(1:10, testdata[3, -1], type = "b", col = 3)
legend("topleft", LETTERS[1:3], col = 1:3, pch = 15)

Sarah

On Mon, Jun 25, 2012 at 12:24 PM, MSousa ricardosousa2...@clix.pt wrote:

 Good Afternoon, I'm trying to create a graph that displays the best way the
 following information.

     For instance organized by bar graph, A, B, C



 Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
 1      A     47    63    75   116   125  129  131 131 131 131
 2      B     37    64    45    11    25   19   61 131 186 186
 3      C     17    62    25    66    12   29   91 171 186 186

 thanks



-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] fitting a Bi-Weibull

2012-06-25 Thread Luis Borda de Agua
I want to fit a bi-Weibull distribution, as defined by Evans et al. in their 
book "Statistical Distributions":

http://www.scribd.com/doc/57260630/164/FIVE-PARAMETER-BI-WEIBULL-DISTRIBUTION

I would like to know if there is any package in R that already does it, or any 
quick procedure to accomplish it.

Thank you in advance, 

Luís Borda de Água



Re: [R] setdiff datframes

2012-06-25 Thread nac
Thanks, it works brilliantly.



wrote:
 Hello,
 
 It is recommended to post data using dput(). Something like
 
 dput(head(DF, 20))  # or 30
 
 Then, paste the output of this command in a post.
 
 Untested, but I think it should work:
 
 
 # Create a logical index into 'example2'
 ix <- example2$V1 %in% setdiff(example2$V1, example1$V1)
 example2[ix, ]  # which rows does it index?
 
 
 You could also save the result of setdiff() and then see which of 
 example2$V1 are %in% it. If the data.frame has many rows, this way could
 save some memory (setdiff() gives only 2 character values, versus 2 
 TRUE plus n-2 FALSE).
 
 Hope this helps,
 
 Rui Barradas
 
 Em 25-06-2012 17:37, nathalie escreveu:
 hi,


 I have 2 files example 1 and example 2 and would like to know what is
in
 example2 and not in example1 (attached)
 V1 contain data which could be in duplicated which I am using as
 identifiers

 I used setdiff(example2$V1,example1$V1) to find the identifiers which
 are specific to example2:
 [1] rs2276598  rs17253672

 I am looking for a way to get an  output with  all columns (V1 to V14)
 for these 2 identifiers

 thanks for any suggestions


 format example1
 V1  V2 V3  V4  V5 V6
 1  rs4685 2:198257795  C ENSG0115524 ENST0424674 Transcript
 2  rs4685 2:198257795  C ENSG0115524 ENST0335508 Transcript
 3rs788018 2:198265526  G ENSG0115524 ENST0335508 Transcript
 4rs788023 2:198283305  C ENSG0115524 ENST0335508 Transcript
 5  rs41284843  2:25536827  A ENSG0119772 ENST0406659 Transcript
 6  rs41284843  2:25536827  A ENSG0119772 ENST0321117 Transcript
 7  rs41284843  2:25536827  A ENSG0119772 ENST0264709 Transcript
 8  rs41284843  2:25536827  A ENSG0119772 ENST0380756 Transcript
 9   rs3729680 3:178927410  G ENSG0121879 ENST0263967 Transcript
 10 rs61744960 4:106156163  A ENSG0168769 ENST0305737 Transcript
 11 rs61744960 4:106156163  A ENSG0168769 ENST0413648 Transcript
 12 rs61744960 4:106156163  A ENSG0168769 ENST0540549 Transcript
 13 rs61744960 4:106156163  A ENSG0168769 ENST0545826 Transcript
 14 rs61744960 4:106156163  A ENSG0168769 ENST0380013 Transcript
 15 rs61744960 4:106156163  A ENSG0168769 ENST0535110 Transcript
 16 rs61744960 4:106156163  A ENSG0168769 ENST0394764 Transcript
 17 rs61744960 4:106156163  A ENSG0168769 ENST0513237 Transcript
 18 rs61744960 4:106156163  A ENSG0168769 ENST0265149 Transcript
 19  rs2454206 4:106196951  G ENSG0168769 ENST0540549 Transcript
 20  rs2454206 4:106196951  G ENSG0168769 ENST0513237 Transcript
   V7   V8   V9  V10 V11 V12   
   V13
 1 SYNONYMOUS_CODING  704  705  235   V gtA/gtG
rs4685
 2 SYNONYMOUS_CODING 3749 3657 1219   V gtA/gtG
rs4685
 3 SYNONYMOUS_CODING 2723 2631  877   G ggT/ggC
 rs788018
 4 SYNONYMOUS_CODING  515  423  141   K aaA/aaG
 rs788023
 5 SYNONYMOUS_CODING  365   279   P ccC/ccT
 rs41284843
 6 SYNONYMOUS_CODING  264   279   P ccC/ccT
 rs41284843
 7 SYNONYMOUS_CODING  365   279   P ccC/ccT
 rs41284843
 8  NMD_TRANSCRIPT,SYNONYMOUS_CODING  264   279   P ccC/ccT
 rs41284843
 9 NON_SYNONYMOUS_CODING 1330 1173  391 I/M atA/atG
 rs3729680
 10NON_SYNONYMOUS_CODING 1468 1064  355 G/D gGt/gAt
 rs61744960
 11NON_SYNONYMOUS_CODING 1204 1064  355 G/D gGt/gAt
 rs61744960
 12NON_SYNONYMOUS_CODING 1924 1064  355 G/D gGt/gAt
 rs61744960
 13NON_SYNONYMOUS_CODING 1924 1064  355 G/D gGt/gAt
 rs61744960
 14NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt
 rs61744960
 15NON_SYNONYMOUS_CODING 1167 1064  355 G/D gGt/gAt
 rs61744960
 16NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt
 rs61744960
 17NON_SYNONYMOUS_CODING 1924 1127  376 G/D gGt/gAt
 rs61744960
 18 NMD_TRANSCRIPT,NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt
 rs61744960
 19NON_SYNONYMOUS_CODING 6144 5284 1762 I/V Ata/Gta
 rs2454206
 20NON_SYNONYMOUS_CODING 6144 5347 1783 I/V Ata/Gta
 rs2454206
 V14
 1 ENSP=ENSP0409435;HGNC=SF3B1
 2 ENSP=ENSP0335321;HGNC=SF3B1
 3 ENSP=ENSP0335321;HGNC=SF3B1
 4 ENSP=ENSP0335321;HGNC=SF3B1
 5 ENSP=ENSP0384852;HGNC=DNMT3A
 6 ENSP=ENSP0324375;HGNC=DNMT3A
 7 ENSP=ENSP0264709;HGNC=DNMT3A
 8 ENSP=ENSP0370132;HGNC=DNMT3A
 9

ENSP=ENSP0263967;PolyPhen=benign(0.019);SIFT=tolerated(0.13);HGNC=PIK3CA

 10

ENSP=ENSP0306705;PolyPhen=probably_damaging(0.983);SIFT=deleterious(0.01);HGNC=TET2

 11

ENSP=ENSP0391448;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2

 12


Re: [R] Questions about doing analysis based on time

2012-06-25 Thread APOCooter
Oh, whoops.  When I responded the first time, I had done dput(head(Sunday,
100)) too, but it didn't look useful.  It was only just now that I saw that
it basically prints out the vectors.  Sorry about that.

Anyways, here's dput(head(Sunday, 100)):

 dput(head(Sunday, 100))
structure(list(SunDate = structure(c(1046L, 1058L, 1080L, 1462L, 
1268L, 977L, 935L, 158L, 646L, 46L, 1357L, 1131L, 465L, 512L, 
1562L, 1555L, 1401L, 503L, 294L, 200L, 1368L, 701L, 364L, 318L, 
1120L, 1316L, 486L, 338L, 1604L, 1496L, 925L, 201L, 1007L, 1339L, 
758L, 615L, 896L, 38L, 969L, 1543L, 1212L, 612L, 904L, 1197L, 
372L, 842L, 571L, 1596L, 1229L, 408L, 912L, 1026L, 1452L, 473L, 
409L, 714L, 578L, 1059L, 743L, 579L, 495L, 1280L, 460L, 1081L, 
1470L, 1348L, 865L, 346L, 725L, 118L, 604L, 604L, 254L, 397L, 
1605L, 149L, 389L, 785L, 1163L, 1206L, 707L, 1189L, 14L, 1590L, 
1269L, 26L, 1476L, 513L, 662L, 874L, 283L, 670L, 1613L, 1075L, 
1112L, 263L, 623L, 1425L, 1240L, 1240L), .Label = c(1/1/2012 10:27, 
1/1/2012 13:59, 1/1/2012 14:50, 1/1/2012 15:44, 1/1/2012 16:48, 
1/1/2012 20:43, 1/1/2012 22:41, 1/1/2012 3:43, 1/1/2012 4:57, 
1/10/2010 12:37, 1/10/2010 17:03, 1/10/2010 21:04, 1/10/2010 9:19, 
1/11/2009 1:25, 1/11/2009 10:51, 1/11/2009 14:13, 1/11/2009 18:44, 
1/11/2009 22:45, 1/11/2009 6:25, 1/15/2012 10:33, 1/15/2012 13:48, 
1/15/2012 18:54, 1/15/2012 21:58, 1/15/2012 7:00, 1/15/2012 7:06, 
1/16/2011 1:26, 1/16/2011 11:30, 1/16/2011 16:15, 1/16/2011 18:36, 
1/16/2011 20:42, 1/16/2011 5:23, 1/17/2010 10:56, 1/17/2010 14:25, 
1/17/2010 19:13, 1/17/2010 21:00, 1/17/2010 21:55, 1/17/2010 6:54, 
1/18/2009 0:45, 1/18/2009 12:10, 1/18/2009 15:13, 1/18/2009 18:27, 
1/18/2009 20:39, 1/18/2009 22:19, 1/18/2009 3:12, 1/18/2009 8:38, 
1/2/2011 0:12, 1/2/2011 13:05, 1/2/2011 17:32, 1/2/2011 18:06, 
1/2/2011 19:28, 1/2/2011 20:42, 1/2/2011 23:40, 1/2/2011 6:43, 
1/2/2011 6:44, 1/2/2011 7:20, 1/22/2012 12:05, 1/22/2012 12:59, 
1/22/2012 17:13, 1/22/2012 19:25, 1/22/2012 20:08, 1/22/2012 23:11, 
1/22/2012 6:30, 1/23/2011 13:09, 1/23/2011 18:40, 1/23/2011 2:35, 
1/23/2011 20:15, 1/23/2011 23:44, 1/23/2011 6:25, 1/23/2011 9:46, 
1/24/2010 11:47, 1/24/2010 14:13, 1/24/2010 16:20, 1/24/2010 18:06, 
1/24/2010 19:52, 1/24/2010 6:56, 1/25/2009 15:13, 1/25/2009 19:47, 
1/25/2009 5:18, 1/25/2009 6:49, 1/25/2009 9:26, 1/29/2012 11:10, 
1/29/2012 14:06, 1/29/2012 14:13, 1/29/2012 18:00, 1/29/2012 3:22, 
1/29/2012 7:13, 1/3/2010 16:07, 1/3/2010 19:55, 1/3/2010 19:56, 
1/3/2010 2:10, 1/3/2010 8:31, 1/30/2011 11:27, 1/30/2011 15:38, 
1/30/2011 16:58, 1/30/2011 2:34, 1/30/2011 21:01, 1/30/2011 23:58, 
1/30/2011 7:08, 1/31/2010 10:31, 1/31/2010 13:36, 1/31/2010 16:29, 
1/31/2010 18:09, 1/4/2009 11:15, 1/4/2009 16:29, 1/4/2009 17:06, 
1/4/2009 20:14, 1/4/2009 21:58, 1/4/2009 8:49, 1/8/2012 14:02, 
1/8/2012 15:29, 1/8/2012 19:37, 1/8/2012 22:25, 1/8/2012 6:17, 
1/9/2011 11:46, 1/9/2011 15:53, 1/9/2011 19:12, 1/9/2011 5:08, 
10/10/2010 1:13, 10/10/2010 12:05, 10/10/2010 13:12, 10/10/2010
16:06, 
10/10/2010 17:40, 10/10/2010 2:46, 10/10/2010 20:46, 10/10/2010
23:52, 
10/11/2009 11:26, 10/11/2009 16:07, 10/11/2009 17:47, 10/11/2009
20:41, 
10/11/2009 3:20, 10/11/2009 7:03, 10/12/2008 13:13, 10/12/2008
18:13, 
10/12/2008 2:19, 10/12/2008 21:22, 10/12/2008 5:50, 10/12/2008 9:45, 
10/16/2011 1:57, 10/16/2011 10:43, 10/16/2011 12:09, 10/16/2011
15:23, 
10/16/2011 16:58, 10/16/2011 18:16, 10/16/2011 19:14, 10/16/2011
23:49, 
10/16/2011 6:27, 10/16/2011 9:00, 10/16/2011 9:35, 10/17/2010 1:18, 
10/17/2010 11:02, 10/17/2010 15:30, 10/17/2010 17:06, 10/17/2010
19:31, 
10/17/2010 20:33, 10/17/2010 22:04, 10/17/2010 3:24, 10/17/2010
6:07, 
10/18/2009 0:12, 10/18/2009 10:15, 10/18/2009 14:54, 10/18/2009
16:52, 
10/18/2009 18:49, 10/18/2009 23:10, 10/18/2009 5:17, 10/19/2008
11:52, 
10/19/2008 14:11, 10/19/2008 15:15, 10/19/2008 16:47, 10/19/2008
18:40, 
10/19/2008 22:53, 10/19/2008 4:31, 10/19/2008 7:02, 10/19/2008 9:20, 
10/2/2011 12:25, 10/2/2011 14:21, 10/2/2011 15:20, 10/2/2011 16:04, 
10/2/2011 5:06, 10/2/2011 7:33, 10/23/2011 10:41, 10/23/2011 12:41, 
10/23/2011 13:15, 10/23/2011 13:53, 10/23/2011 14:07, 10/23/2011
16:59, 
10/23/2011 20:25, 10/23/2011 5:39, 10/24/2010 12:05, 10/24/2010
16:24, 
10/24/2010 18:16, 10/24/2010 20:55, 10/24/2010 22:10, 10/24/2010
3:24, 
10/24/2010 8:49, 10/25/2009 11:07, 10/25/2009 18:12, 10/25/2009
22:17, 
10/25/2009 5:06, 10/25/2009 8:08, 10/26/2008 0:23, 10/26/2008 0:37, 
10/26/2008 12:24, 10/26/2008 15:47, 10/26/2008 21:08, 10/26/2008
23:55, 
10/26/2008 4:28, 10/26/2008 7:58, 10/3/2010 16:35, 10/3/2010 19:25, 
10/3/2010 2:38, 10/3/2010 7:43, 10/30/2011 11:29, 10/30/2011 13:15, 
10/30/2011 14:22, 10/30/2011 16:01, 10/30/2011 16:02, 10/30/2011
22:28, 
10/30/2011 3:05, 10/30/2011 5:48, 10/30/2011 8:01, 10/31/2010 10:47, 
10/31/2010 14:57, 10/31/2010 15:48, 10/31/2010 16:57, 10/31/2010
18:20, 
10/31/2010 2:34, 10/31/2010 22:10, 10/31/2010 3:32, 10/31/2010 5:59, 
10/4/2009 16:19, 10/4/2009 2:20, 10/4/2009 20:23, 10/4/2009 23:09, 
10/5/2008 10:56, 

Re: [R] Conditioned Latin Hypercube Sampling within determined distance

2012-06-25 Thread SergioHGS
Hi,

I'm sorry for not explaining well. I'm trying to do my best, but it's a
little complicated for me. I will send what I've tried, but I think I should
also provide some files so that you can run the code I'm running, and I
couldn't find a way to do that.
I have 2 rasters representing environmental characteristics of my area
(slasc.asc and weasc.asc). I also have a raster containing the cost of
reaching every place in the area (euc_.asc). The clhs function allows
sampling according to the lowest reaching cost, meaning clhs will choose
points that are easier to get to. To do this, I should use the code

res <- clhs(s, size = 30, iter = 2000, progress = FALSE, simple = FALSE,
cost = "cost")

Then R returns the error:

Error in .local(x, ...) : Could not find the cost attribute

I don't know what I'm doing wrong. I was wondering if somebody knows what I
should try. The help page says cost is "a character giving the name or an
integer giving the index of the attribute in x that gives a cost that can be
used to constrain the cLHS sampling. If NULL (default), the cost-constrained
implementation is not used."
Below is the code I'm using:

require(raster)
require(clhs)
require(proj4)
require(rgdal)
# 1 - load the data
r1 <- raster("D:/Sg/Lavr/R/slasc.asc")
r2 <- raster("D:/Sg/Lavr/R/weasc.asc")
# 2 - load cost raster # I'M NOT SURE IF THIS STEP IS CORRECT
rcost <- raster("D:/Sg/Lavr/R/euc_.asc")
# 3 - set spatial references
projection(r1) <- "+proj=utm +zone=23 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
projection(r2) <- "+proj=utm +zone=23 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
projection(rcost) <- "+proj=utm +zone=23 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
# 4 - create a multi-layer raster object
s <- stack(r1, r2, rcost)
# 5 - conditioned latin hypercube sampling
res <- clhs(s, size = 30, iter = 2000, progress = FALSE, simple = FALSE,
cost = rcost)

Error in .local(x, ...) : Could not find the cost attribute.

Thank you!

Sergio




--
View this message in context: 
http://r.789695.n4.nabble.com/Conditioned-Latin-Hypercube-Sampling-within-determined-distance-tp4633974p4634457.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] graph displays

2012-06-25 Thread John Kane
xx <- structure(list(X1000s = c(47L, 37L, 17L), X600s = c(63L, 64L, 
62L), X500s = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L), X100s = c(125L, 
25L, 12L), X50s = c(129L, 19L, 29L), X10s = c(131L, 61L, 91L), 
X5s = c(131L, 131L, 171L), X3s = c(131L, 186L, 186L), X1s = c(131L, 
186L, 186L)), .Names = c("X1000s", "X600s", "X500s", "X250s", 
"X100s", "X50s", "X10s", "X5s", "X3s", "X1s"), class = "data.frame",
row.names = c("A", "B", "C"))

# then this should do
barplot(t(xx))

John Kane
Kingston ON Canada


 -Original Message-
 From: ricardosousa2...@clix.pt
 Sent: Mon, 25 Jun 2012 09:24:36 -0700 (PDT)
 To: r-help@r-project.org
 Subject: [R] graph displays
 
 
 Good Afternoon, I'm trying to create a graph that displays the best way
 the
 following information.
 
     For instance organized by bar graph, A, B, C
 
 
 
 Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
 1  A 476375   116   125  129  131 131 131 131
 2  B 3764451125   19   61 131 186 186
 3  C 1762256612   29   91 171 186 186
 
 thanks
 
 
 
 --
 View this message in context:
 http://r.789695.n4.nabble.com/graph-displays-tp4634448.html
 Sent from the R help mailing list archive at Nabble.com.
 





Re: [R] read.spss function

2012-06-25 Thread peter dalgaard

On Jun 25, 2012, at 12:30 , Jim Holtman wrote:

 you have to load the 'foriegn' package first.
 

'foreign' works better

 Sent from my iPad

...with spellcheck disabled, it seems ;-)

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] combineLimits and Dates

2012-06-25 Thread Duncan Mackay

Hi Elliot

This works on Win 7 ver 2.15

 useOuterStrips(combineLimits(
   xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l',
          scales = list(y = list(relation = "free"),
                        x = list(at = seq(from = as.Date("2011-01-01"),
                                          to = as.Date("2011-10-01"),
                                          by = "3 month"),
                                 labels = format(seq(from = as.Date("2011-01-01"),
                                                     to = as.Date("2011-10-01"),
                                                     by = "3 month"), "%b"))),
          auto.key = TRUE)))

amend the seq as required and the format if required
see ?strptime for format
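The at/labels construction above can be checked on its own, independent of lattice: seq.Date() generates the tick positions and format() the labels.

```r
# Quarterly tick positions across 2011
ticks <- seq(from = as.Date("2011-01-01"), to = as.Date("2011-10-01"),
             by = "3 month")

# Month abbreviations for the labels (locale-dependent;
# e.g. "Jan" "Apr" "Jul" "Oct" in an English locale)
labs <- format(ticks, "%b")

ticks
labs
```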


HTH

Duncan


Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au


At 02:28 26/06/2012, you wrote:

I'm having some trouble using the latticeExtra 'combineLimits' function
with a Date x-variable:

require(lattice)

set.seed(12345)

dates <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), "days")
dat <- data.frame(d = rep(dates, 4),
                  g = factor(rep(rep(c(1, 2), each = length(dates)), 2)),
                  h = factor(rep(c("a", "b"), each = length(dates)*2)),
                  x = rnorm(4 * length(dates)),
                  y = rnorm(4 * length(dates)))

plt1 <- xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l',
               scales = list(relation = "free"),
               auto.key = TRUE)
plt1 <- useOuterStrips(plt1)
plt1 <- combineLimits(plt1)

The x-axis labels are correct after the call to 'useOuterStrips', but they get
converted to numeric after the call to 'combineLimits'. How do I keep them
as date labels?

Thanks.

- Elliot

--
Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
134 Mount Auburn Street | Cambridge, MA | 02138
Phone: (617) 503-4619 | Email: elliot.bernst...@fdopartners.com






[R] rrdf package for mac not working

2012-06-25 Thread Ricardo Pietrobon
rrdf is incredibly helpful, but I've noticed that the rrdf package for Mac
hasn't been working for some time: http://goo.gl/5Ukpn . I'm wondering if there
is still a plan to maintain it in the long run, or if there is some other
alternative for reading RDF files.




[R] Passing Arima Parameters from Excel to R

2012-06-25 Thread PaulJr
Hello everyone,

I would like to have an Excel spreadsheet with a macro that allows me to
select between the following models (in the form of a combobox): AR, MA,
ARMA and ARIMA, and, depending on the model chosen, give Excel the parameters
(for p, q, d and P, D, Q).
I want to find a way to connect R with Excel such that, if I select
the desired model and the required parameters, I can pass those to the
arima fitting function and then generate the graph and produce the forecasts
from R.

Is there a way to do this?

Best regards,

Paul
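On the R side, once the model choice and orders have been passed over from Excel (for example via RExcel, or a parameter file written by the macro), the fitting step itself is small. A minimal sketch, where the parameter list and the use of the built-in AirPassengers series as a stand-in for the data sent from Excel are assumptions (base R's model-fitting function is arima(), and AR/MA/ARMA are just special cases of its order argument):

```r
# Hypothetical parameters as they might arrive from the Excel macro
params <- list(p = 1, d = 0, q = 1, P = 0, D = 0, Q = 0)

x <- as.ts(AirPassengers)  # stand-in for the series sent from Excel

# AR(p) = order c(p, 0, 0); MA(q) = c(0, 0, q); ARMA = c(p, 0, q); ARIMA = c(p, d, q)
fit <- arima(x, order = c(params$p, params$d, params$q),
             seasonal = list(order = c(params$P, params$D, params$Q)))

fc <- predict(fit, n.ahead = 12)  # forecasts to send back to Excel
plot(x, xlim = c(start(x)[1], end(x)[1] + 1))
lines(fc$pred, col = "red")
```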

--
View this message in context: 
http://r.789695.n4.nabble.com/Passing-Arima-Parameters-from-Excel-to-R-tp4634467.html
Sent from the R help mailing list archive at Nabble.com.



[R] Fitting binomial data to a probit distribution without an equation

2012-06-25 Thread mort459
Hey everyone,

I've been reading an old scientific paper (well, not that old, about 15
years) and I want to verify the authors' statistical results. The paper is
fairly unclear about what exactly they did, and none of the relatively
simple commands I'm familiar with are producing results similar to theirs. 

The data is dose-response, recorded as binomial data:

structure(list(X1 = c(10, 10, 12, 13, 14, 15, 16, 18, 20, 20,
23, 23, 25, 30, 45, 46, 46, 48, 50, 52), X2 = c(0, 1, 0, 1, 1,
1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1)), .Names = c("X1",
"X2"), row.names = c(NA, 20L), class = "data.frame")

The quotes from the paper are as follows:

Maximum likelihood was applied to probit models (normal, lognormal,
weibull, logistic, and
log-logistic) and the data to determine the incidence relationships. The
lognormal and loglogistic
models best represent the data.

and later:

Binary data are utilized for the maximum likelihood dose response
calculations

I tried using a simple glm() with a probit link, but that produced
borderline-nonsense results. This made sense when I thought about it more,
as R was trying to fit a regression line to raw binary data as opposed to
binned/high-repetition binary data.

I've also messed around with maximum likelihood but they're not clear about
what equation they're using. 

In the end, I guess what I'm trying to do is:  figure out how they're
estimating their probit parameters from a binary data set. To me, estimating
parameters seems very different from doing a GLM. Is this even possible to
do? Is there a package out there that performs this function, or is it in the
basic functionality of R and I'm just being dumb??

-Mort

(apologies if this is too much theoretical statistics and not enough 'R')
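For what it's worth, glm() does fit probit models by maximum likelihood directly on raw 0/1 responses; no binning is required, since single Bernoulli trials are a valid binomial likelihood. A minimal sketch on the dataset above; treating the "lognormal" model as a probit fit on log-dose is an assumption about what the authors did:

```r
dat <- data.frame(
  X1 = c(10, 10, 12, 13, 14, 15, 16, 18, 20, 20,
         23, 23, 25, 30, 45, 46, 46, 48, 50, 52),
  X2 = c(0, 1, 0, 1, 1, 1, 1, 1, 0, 1,
         1, 1, 1, 1, 0, 1, 1, 1, 1, 1))

# Probit ("normal") model: probit link on the raw dose
fit_norm <- glm(X2 ~ X1, family = binomial(link = "probit"), data = dat)

# "Lognormal" model: probit link on log-dose
fit_lnorm <- glm(X2 ~ log(X1), family = binomial(link = "probit"), data = dat)

coef(fit_norm)                       # intercept and slope on the probit scale
c(AIC(fit_norm), AIC(fit_lnorm))     # compare candidate models
```

With only 20 observations and few zeros, the estimates will be imprecise, which may explain results that look like nonsense rather than a wrong method.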

--
View this message in context: 
http://r.789695.n4.nabble.com/Fitting-binomial-data-to-a-probit-distribution-without-an-equation-tp4634466.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] setdiff datframes

2012-06-25 Thread arun
Hi,

Try this:
example1 <- data.frame(V1 = c("rs4685", "rs4685", "rs788018", "rs788023"),
                       V2 = c("2:198257795", "2:198257795", "2:198265526", "2:198283305"))
example2 <- data.frame(V1 = c("rs4685", "rs4675", "rs788018", "rs788023"),
                       V2 = c("2:198257795", "2:198258795", "2:198265526", "2:198283305"))
subset(example2, !(V1 %in% example1$V1))

  V1  V2
2 rs4675 2:198258795



A.K.

- Original Message -
From: nathalie n...@sanger.ac.uk
To: r-help@r-project.org
Cc: 
Sent: Monday, June 25, 2012 12:37 PM
Subject: [R] setdiff datframes

hi,


I have 2 files example 1 and example 2 and would like to know what is in 
example2 and not in example1 (attached)
V1 contains data which could be duplicated, which I am using as identifiers

I used setdiff(example2$V1,example1$V1) to find the identifiers which are 
specific to example2:
[1] rs2276598  rs17253672

I am looking for a way to get an  output with  all columns (V1 to V14) for 
these 2 identifiers

thanks for any suggestions


format example1
           V1          V2 V3              V4              V5 V6
1      rs4685 2:198257795  C ENSG0115524 ENST0424674 Transcript
2      rs4685 2:198257795  C ENSG0115524 ENST0335508 Transcript
3    rs788018 2:198265526  G ENSG0115524 ENST0335508 Transcript
4    rs788023 2:198283305  C ENSG0115524 ENST0335508 Transcript
5  rs41284843  2:25536827  A ENSG0119772 ENST0406659 Transcript
6  rs41284843  2:25536827  A ENSG0119772 ENST0321117 Transcript
7  rs41284843  2:25536827  A ENSG0119772 ENST0264709 Transcript
8  rs41284843  2:25536827  A ENSG0119772 ENST0380756 Transcript
9   rs3729680 3:178927410  G ENSG0121879 ENST0263967 Transcript
10 rs61744960 4:106156163  A ENSG0168769 ENST0305737 Transcript
11 rs61744960 4:106156163  A ENSG0168769 ENST0413648 Transcript
12 rs61744960 4:106156163  A ENSG0168769 ENST0540549 Transcript
13 rs61744960 4:106156163  A ENSG0168769 ENST0545826 Transcript
14 rs61744960 4:106156163  A ENSG0168769 ENST0380013 Transcript
15 rs61744960 4:106156163  A ENSG0168769 ENST0535110 Transcript
16 rs61744960 4:106156163  A ENSG0168769 ENST0394764 Transcript
17 rs61744960 4:106156163  A ENSG0168769 ENST0513237 Transcript
18 rs61744960 4:106156163  A ENSG0168769 ENST0265149 Transcript
19  rs2454206 4:106196951  G ENSG0168769 ENST0540549 Transcript
20  rs2454206 4:106196951  G ENSG0168769 ENST0513237 Transcript
                                     V7   V8   V9  V10 V11 V12        V13
1                     SYNONYMOUS_CODING  704  705  235   V gtA/gtG     rs4685
2                     SYNONYMOUS_CODING 3749 3657 1219   V gtA/gtG     rs4685
3                     SYNONYMOUS_CODING 2723 2631  877   G ggT/ggC rs788018
4                     SYNONYMOUS_CODING  515  423  141   K aaA/aaG rs788023
5                     SYNONYMOUS_CODING  365   27    9   P ccC/ccT rs41284843
6                     SYNONYMOUS_CODING  264   27    9   P ccC/ccT rs41284843
7                     SYNONYMOUS_CODING  365   27    9   P ccC/ccT rs41284843
8      NMD_TRANSCRIPT,SYNONYMOUS_CODING  264   27    9   P ccC/ccT rs41284843
9                 NON_SYNONYMOUS_CODING 1330 1173  391 I/M atA/atG rs3729680
10                NON_SYNONYMOUS_CODING 1468 1064  355 G/D gGt/gAt rs61744960
11                NON_SYNONYMOUS_CODING 1204 1064  355 G/D gGt/gAt rs61744960
12                NON_SYNONYMOUS_CODING 1924 1064  355 G/D gGt/gAt rs61744960
13                NON_SYNONYMOUS_CODING 1924 1064  355 G/D gGt/gAt rs61744960
14                NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt rs61744960
15                NON_SYNONYMOUS_CODING 1167 1064  355 G/D gGt/gAt rs61744960
16                NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt rs61744960
17                NON_SYNONYMOUS_CODING 1924 1127  376 G/D gGt/gAt rs61744960
18 NMD_TRANSCRIPT,NON_SYNONYMOUS_CODING 1450 1064  355 G/D gGt/gAt rs61744960
19                NON_SYNONYMOUS_CODING 6144 5284 1762 I/V Ata/Gta rs2454206
20                NON_SYNONYMOUS_CODING 6144 5347 1783 I/V Ata/Gta rs2454206
V14
1 ENSP=ENSP0409435;HGNC=SF3B1
2 ENSP=ENSP0335321;HGNC=SF3B1
3 ENSP=ENSP0335321;HGNC=SF3B1
4 ENSP=ENSP0335321;HGNC=SF3B1
5 ENSP=ENSP0384852;HGNC=DNMT3A
6 ENSP=ENSP0324375;HGNC=DNMT3A
7 ENSP=ENSP0264709;HGNC=DNMT3A
8 ENSP=ENSP0370132;HGNC=DNMT3A
9 ENSP=ENSP0263967;PolyPhen=benign(0.019);SIFT=tolerated(0.13);HGNC=PIK3CA
10 
ENSP=ENSP0306705;PolyPhen=probably_damaging(0.983);SIFT=deleterious(0.01);HGNC=TET2
11 
ENSP=ENSP0391448;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2
12 
ENSP=ENSP0442788;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2
13 
ENSP=ENSP0442867;PolyPhen=probably_damaging(0.952);SIFT=deleterious(0.01);HGNC=TET2
14 
ENSP=ENSP0369351;PolyPhen=possibly_damaging(0.825);SIFT=deleterious(0);HGNC=TET2
15 
ENSP=ENSP0438851;PolyPhen=probably_damaging(0.998);SIFT=deleterious(0.01);HGNC=TET2
16 

Re: [R] Questions about doing analysis based on time

2012-06-25 Thread Rui Barradas

Hello,

The following worked with the supplied dataset, named 'Sunday'.

dat <- Sunday
dat$SunDate <- as.POSIXct(Sunday$SunDate, format = "%m/%d/%Y %H:%M")
dat <- na.exclude(dat)
h <- hours(dat$SunDate)  # hours() is from the 'chron' package

# 1. Two different displays
aggregate(dat$SunScore, by = list(h), mean)
aggregate(SunScore ~ h, data = dat, mean)

# 2. Several graphs, including histogram
ix <- dat$SunScore > 140
hist(h[ix])
boxplot(SunScore[ix] ~ h[ix], data = dat)
plot(SunScore[ix] ~ jitter(h[ix], 0.5), data = dat, col = h[ix] + 2, pch = 16)


I haven't thought about the other questions, but these seem to be a start.

Hope this helps,

Rui Barradas

Em 25-06-2012 18:56, APOCooter escreveu:

Oh, whoops.  When I responded the first time, I had done dput(head(Sunday,
100)) too, but it didn't look useful.  It was only just now that I saw that
it basically prints out the vectors.  Sorry about that.

Anyways, here's dput(head(Sunday, 100)):


dput(head(Sunday, 100))

structure(list(SunDate = structure(c(1046L, 1058L, 1080L, 1462L,
1268L, 977L, 935L, 158L, 646L, 46L, 1357L, 1131L, 465L, 512L,
1562L, 1555L, 1401L, 503L, 294L, 200L, 1368L, 701L, 364L, 318L,
1120L, 1316L, 486L, 338L, 1604L, 1496L, 925L, 201L, 1007L, 1339L,
758L, 615L, 896L, 38L, 969L, 1543L, 1212L, 612L, 904L, 1197L,
372L, 842L, 571L, 1596L, 1229L, 408L, 912L, 1026L, 1452L, 473L,
409L, 714L, 578L, 1059L, 743L, 579L, 495L, 1280L, 460L, 1081L,
1470L, 1348L, 865L, 346L, 725L, 118L, 604L, 604L, 254L, 397L,
1605L, 149L, 389L, 785L, 1163L, 1206L, 707L, 1189L, 14L, 1590L,
1269L, 26L, 1476L, 513L, 662L, 874L, 283L, 670L, 1613L, 1075L,
1112L, 263L, 623L, 1425L, 1240L, 1240L), .Label = c(1/1/2012 10:27,
1/1/2012 13:59, 1/1/2012 14:50, 1/1/2012 15:44, 1/1/2012 16:48,
1/1/2012 20:43, 1/1/2012 22:41, 1/1/2012 3:43, 1/1/2012 4:57,
1/10/2010 12:37, 1/10/2010 17:03, 1/10/2010 21:04, 1/10/2010 9:19,
1/11/2009 1:25, 1/11/2009 10:51, 1/11/2009 14:13, 1/11/2009 18:44,
1/11/2009 22:45, 1/11/2009 6:25, 1/15/2012 10:33, 1/15/2012 13:48,
1/15/2012 18:54, 1/15/2012 21:58, 1/15/2012 7:00, 1/15/2012 7:06,
1/16/2011 1:26, 1/16/2011 11:30, 1/16/2011 16:15, 1/16/2011 18:36,
1/16/2011 20:42, 1/16/2011 5:23, 1/17/2010 10:56, 1/17/2010 14:25,
1/17/2010 19:13, 1/17/2010 21:00, 1/17/2010 21:55, 1/17/2010 6:54,
1/18/2009 0:45, 1/18/2009 12:10, 1/18/2009 15:13, 1/18/2009 18:27,
1/18/2009 20:39, 1/18/2009 22:19, 1/18/2009 3:12, 1/18/2009 8:38,
1/2/2011 0:12, 1/2/2011 13:05, 1/2/2011 17:32, 1/2/2011 18:06,
1/2/2011 19:28, 1/2/2011 20:42, 1/2/2011 23:40, 1/2/2011 6:43,
1/2/2011 6:44, 1/2/2011 7:20, 1/22/2012 12:05, 1/22/2012 12:59,
1/22/2012 17:13, 1/22/2012 19:25, 1/22/2012 20:08, 1/22/2012 23:11,
1/22/2012 6:30, 1/23/2011 13:09, 1/23/2011 18:40, 1/23/2011 2:35,
1/23/2011 20:15, 1/23/2011 23:44, 1/23/2011 6:25, 1/23/2011 9:46,
1/24/2010 11:47, 1/24/2010 14:13, 1/24/2010 16:20, 1/24/2010 18:06,
1/24/2010 19:52, 1/24/2010 6:56, 1/25/2009 15:13, 1/25/2009 19:47,
1/25/2009 5:18, 1/25/2009 6:49, 1/25/2009 9:26, 1/29/2012 11:10,
1/29/2012 14:06, 1/29/2012 14:13, 1/29/2012 18:00, 1/29/2012 3:22,
1/29/2012 7:13, 1/3/2010 16:07, 1/3/2010 19:55, 1/3/2010 19:56,
1/3/2010 2:10, 1/3/2010 8:31, 1/30/2011 11:27, 1/30/2011 15:38,
1/30/2011 16:58, 1/30/2011 2:34, 1/30/2011 21:01, 1/30/2011 23:58,
1/30/2011 7:08, 1/31/2010 10:31, 1/31/2010 13:36, 1/31/2010 16:29,
1/31/2010 18:09, 1/4/2009 11:15, 1/4/2009 16:29, 1/4/2009 17:06,
1/4/2009 20:14, 1/4/2009 21:58, 1/4/2009 8:49, 1/8/2012 14:02,
1/8/2012 15:29, 1/8/2012 19:37, 1/8/2012 22:25, 1/8/2012 6:17,
1/9/2011 11:46, 1/9/2011 15:53, 1/9/2011 19:12, 1/9/2011 5:08,
10/10/2010 1:13, 10/10/2010 12:05, 10/10/2010 13:12, 10/10/2010
16:06,
10/10/2010 17:40, 10/10/2010 2:46, 10/10/2010 20:46, 10/10/2010
23:52,
10/11/2009 11:26, 10/11/2009 16:07, 10/11/2009 17:47, 10/11/2009
20:41,
10/11/2009 3:20, 10/11/2009 7:03, 10/12/2008 13:13, 10/12/2008
18:13,
10/12/2008 2:19, 10/12/2008 21:22, 10/12/2008 5:50, 10/12/2008 9:45,
10/16/2011 1:57, 10/16/2011 10:43, 10/16/2011 12:09, 10/16/2011
15:23,
10/16/2011 16:58, 10/16/2011 18:16, 10/16/2011 19:14, 10/16/2011
23:49,
10/16/2011 6:27, 10/16/2011 9:00, 10/16/2011 9:35, 10/17/2010 1:18,
10/17/2010 11:02, 10/17/2010 15:30, 10/17/2010 17:06, 10/17/2010
19:31,
10/17/2010 20:33, 10/17/2010 22:04, 10/17/2010 3:24, 10/17/2010
6:07,
10/18/2009 0:12, 10/18/2009 10:15, 10/18/2009 14:54, 10/18/2009
16:52,
10/18/2009 18:49, 10/18/2009 23:10, 10/18/2009 5:17, 10/19/2008
11:52,
10/19/2008 14:11, 10/19/2008 15:15, 10/19/2008 16:47, 10/19/2008
18:40,
10/19/2008 22:53, 10/19/2008 4:31, 10/19/2008 7:02, 10/19/2008 9:20,
10/2/2011 12:25, 10/2/2011 14:21, 10/2/2011 15:20, 10/2/2011 16:04,
10/2/2011 5:06, 10/2/2011 7:33, 10/23/2011 10:41, 10/23/2011 12:41,
10/23/2011 13:15, 10/23/2011 13:53, 10/23/2011 14:07, 10/23/2011
16:59,
10/23/2011 20:25, 10/23/2011 5:39, 10/24/2010 12:05, 10/24/2010
16:24,
10/24/2010 18:16, 10/24/2010 20:55, 10/24/2010 22:10, 10/24/2010
3:24,
10/24/2010 8:49, 10/25/2009 11:07, 10/25/2009 18:12, 10/25/2009
22:17,
10/25/2009 5:06, 10/25/2009 8:08, 10/26/2008 0:23, 10/26/2008 

[R] weighted logistic regression

2012-06-25 Thread Katherine Nishimura
I have an epidemiologic dataset and want to do a weighted logistic
regression.  I have a continuous exposure variable and I want to see
if it is associated with a binary outcome.  The accuracy of the
exposure value is highly variable and I want the logistic regression
to assign greater weight/importance to exposure values that are highly
accurate, and lower weight to exposure values that are less accurate
(there is a third variable which specifies the quality of the exposure
value).  Is the "weights" argument for the glm() command appropriate
for this?
Thanks,
Katie
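On the mechanics (separate from whether precision weights are the statistically appropriate tool here), glm() does accept a weights argument, interpreted as prior weights on the log-likelihood. A minimal sketch with simulated data; the 'quality' variable and its scale are placeholders for the real accuracy measure:

```r
set.seed(7)
n <- 200
exposure <- rnorm(n)
quality  <- runif(n, 0.2, 1)                    # hypothetical accuracy in (0, 1]
outcome  <- rbinom(n, 1, plogis(0.8 * exposure))

# Weighted logistic regression: rows with higher 'quality' count more
fit <- glm(outcome ~ exposure, family = binomial, weights = quality)
summary(fit)$coefficients
```

Note that R warns about "non-integer #successes" when a binomial glm gets non-integer weights; that is expected with precision weights. Survey-weighted alternatives (e.g. the survey package's svyglm) treat such designs more formally.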



Re: [R] do.call or something instead of for

2012-06-25 Thread Rolf Turner

On 26/06/12 00:39, Kathie wrote:

Dear R users,

I'd like to compute  X like below.

X_{i,t} = 0.1*t + 2*X_{i,t-1} + W_{i,t}

where W_{i,t}  are from Uniform(0,2) and X_{i,0} = 1+5*W_{i,0}

Of course, I can do this with a for statement, but I don't think it's a good
idea because i and t are too big.

So, my question is that

Is there any better idea to avoid for statement for this problem?

Thank you in advance.


You could use filter(), as follows:

foo <- function(n) {
    w0 <- 1 + 5*runif(1, 0, 2)
    w  <- c(w0, runif(n - 1, 0, 2))
    y  <- filter(w, filter = 2, method = "recursive")
    mu <- sapply(0:(n - 1), function(k) {
        if (k == 0) return(0)
        sum((1:k)*2^((k - 1):0))*0.1
    })
    mu + y
}

(This of course just does univariate time series and ignores the 
subscript i.

To get multivariate time series, just use foo() to generate as many
components as you want; they are independent.)
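Following that suggestion, a minimal sketch of stacking independent runs into one i-by-t collection (foo() reproduced from this reply; the numbers of subjects and time points are placeholders):

```r
# One simulated series via a recursive filter, as in the reply above
foo <- function(n) {
    w0 <- 1 + 5*runif(1, 0, 2)
    w  <- c(w0, runif(n - 1, 0, 2))
    y  <- filter(w, filter = 2, method = "recursive")
    mu <- sapply(0:(n - 1), function(k) {
        if (k == 0) return(0)
        sum((1:k)*2^((k - 1):0))*0.1
    })
    mu + y
}

set.seed(1)
n_subjects <- 5     # i index (placeholder size)
n_time     <- 10    # t index (placeholder size)
X <- sapply(seq_len(n_subjects), function(i) foo(n_time))
dim(X)  # n_time rows, one column per independent series
```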

Check:

set.seed(42)
x <- foo(10)
x
Time Series:
Start = 1
End = 10
Frequency = 1
 [1]   10.14806   22.27027   45.31282   92.58654  186.85657 375.25133
 [7]  752.57585 1506.12103 3014.35603 6031.02220

set.seed(42)
w0 <- 1 + 5*runif(1, 0, 2)
w  <- c(w0, runif(9, 0, 2))
0.1 + 2*x[1] + w[2]
[1] 22.27027   # Bingo.
0.2 + 2*x[2] + w[3]
[1] 45.31282   # Bingo.

etc.

Notice that the generated series explodes rather rapidly.

cheers,

Rolf Turner
