Re: [R] selecting significant predictors from ANOVA result

2010-01-28 Thread ram basnet
Dear Sir,
 
Thanks for your message. My problem is in writing the code. I ran an ANOVA for 
75000 response variables (let's say Y) against 243 predictors (let's say an 
X-matrix), one by one in a for loop in R. I stored the p-values of all 
predictors; however, the file is very large because it holds the p-values of 
243 predictors for all 75000 Y-variables.
Now I want some code that automatically selects only the significant 
X-predictors from the whole list. If you have any ideas on that, it would be a 
great help.
Thanks in advance
 
Sincerely,
Ram

--- On Wed, 1/27/10, Bert Gunter gunter.ber...@gene.com wrote:


From: Bert Gunter gunter.ber...@gene.com
Subject: RE: [R] selecting significant predictors from ANOVA result
To: 'ram basnet' basnet...@yahoo.com, 'R help' r-help@r-project.org
Date: Wednesday, January 27, 2010, 7:56 AM


Ram:

You do not say how many cases (rows in your dataset) you have, but I suspect
it may be small (a few hundred, say).

In any case, what you describe is probably just a complicated way to
generate random numbers -- it is **highly** unlikely that any meaningful,
replicable scientific results would result from your proposed approach.

Not surprising -- this appears to be a very difficult data analysis issue.
It is obvious that you have only a minimal statistical background, so I
would strongly recommend that you find a competent local statistician to
help you with your work. Remote help from this list is wholly inadequate.

Bert Gunter
Genentech Nonclinical Statistics



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of ram basnet
Sent: Wednesday, January 27, 2010 2:52 AM
To: R help
Subject: [R] selecting significant predictors from ANOVA result

Dear all,

I did an ANOVA for many response variables (Var1, Var2, ..., Var75000), and I
got the p-values shown below. Now I want to select those predictors which have
a p-value less than or equal to 0.05 for each response variable: for example,
X1, X2, X3, X4, X5 and X6 in the case of Var1; similarly X1, X2, ..., X5 in the
case of Var2; only X1 in the case of Var3; and none of the predictors in the
case of Var4.







predictors   Var1       Var2     Var3   Var4
X1           0.5        0.001    0.05   0.36
X2           0.0001     0.001    0.09   0.37
X3           0.0002     0.005    0.13   0.38
X4           0.0003     0.01     0.17   0.39
X5           0.01       0.05     0.21   0.4
X6           0.05       0.0455   0.25   0.41
X7           0.038063   0.0562   0.29   0.42
X8           0.04605    0.0669   0.33   0.43
X9           0.054038   0.0776   0.37   0.44
X10          0.062025   0.0883   0.41   0.45

I have very large data sets (# of response variables = ~75,000), so I need
some kind of automated procedure, but I have no idea how to write one.
Any help would be greatly appreciated.

Thanks in advance.

Sincerely,

Ram Kumar Basnet,
Ph. D student
Wageningen University,
The Netherlands.




      
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge: sort=F not preserving order?

2010-01-28 Thread Bart Joosen

You could add an extra sequence on the dataframe you wish to sort on.
Merge together, sort by the sequence, delete the sequence.
It's a bit more work, but it will give you what you want.
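
Bart's recipe, as a sketch (assuming two data frames a and b that share a key
column "id"; the names are illustrative):

```r
a$ord <- seq_len(nrow(a))        # remember the original row order of a
m <- merge(a, b, by = "id")      # merge may reorder the rows
m <- m[order(m$ord), ]           # restore the original order
m$ord <- NULL                    # drop the helper column
```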

Bart
-- 
View this message in context: 
http://n4.nabble.com/Merge-sort-F-not-preserving-order-tp1312234p1340790.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] number of decimal

2010-01-28 Thread Ivan Calandra

Hi everybody,

I'm trying to set the number of decimals (i.e. the number of digits 
after the .). I looked into options but I can only set the total 
number of digits, with options(digits=6). But since I have different 
variables with different order of magnitude, I would like that they're 
all displayed with the same number of decimals.
I searched for it and found the format() function, with nsmall=6, but it 
is for a given vector. I would like to set it for the whole session, as 
with options.
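
There is no session-wide option for a fixed number of decimals, but a small
helper that applies format() with nsmall column by column gets close (a
sketch; note the result is character, so it is for display only):

```r
showDecimals <- function(df, dec = 6) {
  # format every numeric column with exactly `dec` digits after the point
  df[] <- lapply(df, function(col)
    if (is.numeric(col)) format(round(col, dec), nsmall = dec) else col)
  df
}
```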


Can anyone help me?
Thanks in advance
Ivan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problems with fitdistr

2010-01-28 Thread vikrant

Hi,
I want to estimate the parameters of a Weibull distribution. For this, I am
using the fitdistr() function in the MASS package. But when I call
fitdistr(c, "weibull") I get an error as follows:
 Error in optim(x = c(4L, 41L, 20L, 6L, 12L, 6L, 7L, 13L, 2L, 8L, 22L, 
: 
 non-finite value supplied by optim
Any help or suggestions are most welcome
-- 
View this message in context: 
http://n4.nabble.com/Problems-with-fitdistr-tp1334772p1334772.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] If then test

2010-01-28 Thread claytonmccandless

close,

So I have a vector, let's say

[1] 1.5 1.2

And a matrix
     [,1] [,2]
[1,]  1.9  1.3
[2,] -0.2  2.0

I want to somehow use the first number in my vector (1.5) and compare it
to my whole first column. So I want to see how many times the numbers
in column 1 are < 1.5, which should be 1 in this case. Now for the other
number, we compare 1.2. We get 0. So I need a vector with these results, like

[1] 1 0 
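
One sketch of this, assuming the vector is v and the matrix is m: compare each
column against the matching element of v and count.

```r
v <- c(1.5, 1.2)
m <- matrix(c(1.9, -0.2, 1.3, 2), nrow = 2)
# for each column j, count how many entries of m[, j] are below v[j]
sapply(seq_along(v), function(j) sum(m[, j] < v[j]))
# for the data above this gives 1 0
```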

-- 
View this message in context: 
http://n4.nabble.com/If-then-test-tp1322119p1336898.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data.frame manipulation

2010-01-28 Thread AC Del Re
Thank you Dennis--this is perfect!!

AC

On Thu, Jan 28, 2010 at 12:24 AM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:
 There are several ways to do this, but these are the most commonly used:
 aggregate() and the ddply() function in package plyr.

 (1) plyr solution (using x as the name of your input data frame):

 library(plyr)
  ddply(x, .(id, mod1), summarize, es = mean(es))
    id mod1   es
  1  1    2 0.30
  2  2    4 0.15
  3  3    1 0.20
  ddply(x, .(id, mod1, mod2), summarize, es = mean(es))
    id mod1   mod2   es
  1  1    2    wai 0.30
  2  2    4 calpas 0.20
  3  2    4  other 0.10
  4  3    1   itas 0.10
  5  3    1    wai 0.25

 (2) aggregate() function in base R:

  with(x, aggregate(list(es = es), by = list(id = id, mod1 = mod1), mean))
    id mod1   es
  1  3    1 0.20
  2  1    2 0.30
  3  2    4 0.15
  with(x, aggregate(list(es = es), by = list(id = id, mod1 = mod1, mod2 = mod2),
 +      mean))
    id mod1   mod2   es
  1  2    4 calpas 0.20
  2  3    1   itas 0.10
  3  2    4  other 0.10
  4  3    1    wai 0.25
  5  1    2    wai 0.30

 Note that enclosing the variable names in lists and 'equating' them
 maintains
 the variable name in the output. Here's what happens if you don't:

  with(x, aggregate(es, list(id, mod1), mean))
    Group.1 Group.2    x
  1       3       1 0.20
  2       1       2 0.30
  3       2       4 0.15
 
  ddply() is a little less work and sorts the output for you
  automatically.

 HTH,
 Dennis

 On Wed, Jan 27, 2010 at 7:34 PM, AC Del Re acde...@gmail.com wrote:

 Hi All,

 I'm conducting a meta-analysis and have taken a data.frame with multiple
 rows per
 study (for each effect size) and performed a weighted average of effect
 size
 for
 each study. This results in a reduced # of rows. I am particularly
 interested in
 simply reducing the additional variables in the data.frame to the first
 row
 of the
 corresponding id variable. For example:

  id <- c(1, 2, 2, 3, 3, 3)
  es <- c(.3, .1, .3, .1, .2, .3)
  mod1 <- c(2, 4, 4, 1, 1, 1)
  mod2 <- c("wai", "other", "calpas", "wai", "itas", "other")
  data <- as.data.frame(cbind(id, es, mod1, mod2))

 data

    id   es mod1   mod2
  1  1  0.3    2    wai
  2  2  0.1    4  other
  3  2  0.2    4 calpas
  4  3  0.1    1   itas
  5  3  0.2    1    wai
  6  3  0.3    1    wai

 # I would like to reduce the entire data.frame like this:

  id    es  mod1   mod2
  1   0.30     2    wai
  2   0.15     4  other
  3   0.20     1   itas
 # If possible, I would also like the option of this (collapsing on id and
 mod2):

  id    es  mod1   mod2
  1   0.30     2    wai
  2   0.10     4  other
  2   0.20     4 calpas
  3   0.10     1   itas
  3   0.25     1    wai

 Any help is much appreciated!

 AC Del Re

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] selecting significant predictors from ANOVA result

2010-01-28 Thread Petr PIKAL
Hi

I agree with Bert that what you want to do is, how to say it politely, 
not reasonable.

Whether a p value is significant depends on the number of observations. Let us 
assume that it is the same for each p value.

Then you need your p values in a suitable object, which you did not reveal to 
us. Again I will assume that it is a 75000 x 243 matrix, let's call it mat. 
Then you can select the elements smaller than some threshold.

Here is a smaller one

mat <- matrix(runif(12), 4, 3)
mat <- mat/5
daf <- as.data.frame(mat)
daf
            V1          V2         V3
1 0.1833271959 0.182649428 0.16363889
2 0.1160545138 0.095533401 0.09378235
3 0.1622977912 0.005841073 0.08108027
4 0.0006527514 0.064333027 0.17431492

sapply(daf, function(x) x[x < .1])
$V1
[1] 0.0006527514

$V2
[1] 0.095533401 0.005841073 0.064333027

$V3
[1] 0.09378235 0.08108027
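
To recover the predictor names rather than the p-values themselves, one
sketch, assuming the full 75000 x 243 matrix mat has responses as rows and
predictor names as column names:

```r
# one character vector of significant predictor names per response
sig <- apply(mat, 1, function(p) colnames(mat)[p <= 0.05])
```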

But how you decide which of the significant values have real meaning, 
and what you want to do with them, is a mystery.

Regards
Petr 

r-help-boun...@r-project.org napsal dne 28.01.2010 09:39:29:


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NA Replacement by lowest value?

2010-01-28 Thread Joel Fürstenberg-Hägg

Hi all,

 

I need to replace missing values in a matrix by 10 % of the lowest available 
value in the matrix. I've got a function I've used earlier to replace negative 
values by the lowest value, in a data frame, but I'm not sure how to modify 
it...

 

nonNeg = as.data.frame(apply(orig.df, 2, function(col) # Change negative values 
to a small value, close to zero
{
   min.val = min(col[col > 0])

   col[col < 0] = (min.val / 10)
   col # Column index
}))

 

I think this is how to start, but the NA replacement part doesn't work...

 

newMatrix = as.matrix(apply(oldMatrix, 2, function(col)

{

   min.val = min(mData, na.rm = T) # Find the smallest value in the dataset

   col[col == NA] = (min.val / 10) # Doesn't work...
   col # Column index

}))

 

Does any of you have any suggestions?

 

 

Best regards,

 

Joel

 
  
_
Hitta kärleken i vinter!
http://dejting.se.msn.com/channel/index.aspx?trackingid=1002952
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Matthew Dowle
How it represents data internally is very important,  depending on the real 
goal :
http://en.wikipedia.org/wiki/Column-oriented_DBMS


Gabor Grothendieck ggrothendi...@gmail.com wrote in message 
news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com...
How it represents data internally should not be important as long as
you can do what you want.  SQL is declarative so you just specify what
you want rather than how to get it and invisibly to the user it
automatically draws up a query plan and then uses that plan to get the
result.

On Wed, Jan 27, 2010 at 12:48 PM, Matthew Dowle mdo...@mdowle.plus.com 
wrote:

 sqldf("select * from BOD order by Time desc limit 3")
 Exactly. SQL requires use of order by. It knows the order, but it isn't
 ordered. Thats not good, but might be fine, depending on what the real 
 goal
 is.


 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001270629w4795da89vb7d77af6e4e8b...@mail.gmail.com...
 On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:
 How many columns, and of what type are the columns ? As Olga asked too, 
 it
 would be useful to know more about what you're really trying to do.

 3.5m rows is not actually that many rows, even for 32bit R. It depends on
 the columns and what you want to do with those columns.

 At the risk of suggesting something before we know the full facts, one
 possibility is to load the data from flat file into data.table. Use
 setkey()
 to set your keys. Use tables() to summarise your various tables. Then do
 your joins etc all-in-R. data.table has fast ways to do those sorts of
 joins (but we need more info about your task).

 Alternatively, you could check out the sqldf website. There is an
 sqlread.csv (or similar name) which can read your files directly into SQL

 read.csv.sql

 instead of going via R. Gabor has some nice examples there about that and
 its faster.

 You use some buzzwords which makes me think that SQL may not be
 appropriate
 for your task though. Can't say for sure (because we don't have enough
 information) but its possible you are struggling because SQL has no row
 ordering concept built in. That might be why you've created an increment

 In the SQLite database it automatically assigns a self incrementing
 hidden column called rowid to each row. e.g. using SQLite via the
 sqldf package on CRAN and the BOD data frame which is built into R we
 can display the rowid column explicitly by referring to it in our
 select statement:

 library(sqldf)
 BOD
   Time demand
 1    1    8.3
 2    2   10.3
 3    3   19.0
 4    4   16.0
 5    5   15.6
 6    7   19.8
 sqldf("select rowid, * from BOD")
   rowid Time demand
 1     1    1    8.3
 2     2    2   10.3
 3     3    3   19.0
 4     4    4   16.0
 5     5    5   15.6
 6     6    7   19.8


 field? Do your queries include order by incrementing field? SQL is not
 good at first and last type logic. An all-in-R solution may well be

 In SQLite you can get the top 3 values, say, like this (continuing the
 prior example):

 sqldf("select * from BOD order by Time desc limit 3")
   Time demand
 1    7   19.8
 2    5   15.6
 3    4   16.0

 better, since R is very good with ordered vectors. A 1GB data.table (or
 data.frame) for example, at 3.5m rows, could have 76 integer columns, or
 38 double columns. 1GB is well within 32bit and allows some space for
 working copies, depending on what you want to do with the data. If you
 have
 38 or less columns, or you have 64bit, then an all-in-R solution *might*
 get your task done quicker, depending on what your real goal is.

 If this sounds plausible, you could post more details and, if its
 appropriate, and luck is on your side, someone might even sketch out how
 to
 do an all-in-R solution.


 Nathan S. Watson-Haigh nathan.watson-ha...@csiro.au wrote in message
 news:4b5fde1b.10...@csiro.au...
I have a table (contact) with several fields and its PK is an auto
increment field. I'm bulk loading data to this table from files which, if
successful, will be about 3.5 million rows (approx 16000 rows per file).
However, I have a linking table (an_contact) to resolve a m:m 
relationship
between the an and contact tables. How can I retrieve the PK's for the
data
bulk loaded into contact so I can insert the relevant data into
an_contact.

 I currently load the data into contact using:
 dbWriteTable(con, "contact", dat, append=TRUE, row.names=FALSE)

 But I then need to get all the PK's which this dbWriteTable() appended 
 to
 the contact table so I can load the data into my an_contact link table. 
 I
 don't want to issue a separate INSERT query for each row in dat and then
 use MySQL's LAST_INSERT_ID() function, not when I have 3.5 million rows
 to insert!

 Any pointers welcome,
 Nathan

 --
 
 Dr. Nathan S. Watson-Haigh
 OCE Post Doctoral Fellow
 CSIRO Livestock Industries
 University Drive
 Townsville, QLD 4810
 Australia

 Tel: +61 (0)7 4753 8548
 Fax: +61 (0)7 4753 8600
 Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html


 

Re: [R] NA Replacement by lowest value?

2010-01-28 Thread Paul Hiemstra

Joel Fürstenberg-Hägg wrote:

Hi all,

 


I need to replace missing values in a matrix by 10 % of the lowest available 
value in the matrix. I've got a function I've used earlier to replace negative 
values by the lowest value, in a data frame, but I'm not sure how to modify 
it...

 


nonNeg = as.data.frame(apply(orig.df, 2, function(col) # Change negative values 
to a small value, close to zero
{
   min.val = min(col[col > 0])

   col[col < 0] = (min.val / 10)
   col # Column index
}))

 


I think this is how to start, but the NA replacement part doesn't work...

 


newMatrix = as.matrix(apply(oldMatrix, 2, function(col)

{

   min.val = min(mData, na.rm = T) # Find the smallest value in the dataset

   col[col == NA] = (min.val / 10) # Doesn't work...
  

use is.na(col) to find the NA's.

cheers,
Paul
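
Since the replacement value is the global minimum anyway, the whole job can be
done without apply() at all (a sketch):

```r
min.val <- min(oldMatrix, na.rm = TRUE)       # smallest non-missing value
newMatrix <- oldMatrix
newMatrix[is.na(newMatrix)] <- min.val / 10   # 10 % of the lowest value
```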

   col # Column index

}

 


Does any of you have any suggestions?

 

 


Best regards,

 


Joel

 
 		 	   		  
_

Hitta kärleken i vinter!
http://dejting.se.msn.com/channel/index.aspx?trackingid=1002952
[[alternative HTML version deleted]]

  



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
  



--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lpSolve API - add Vs set

2010-01-28 Thread Kohleth Chia
Hi,

Using the package lpSolve API, I need to build a 2000*10 constraint matrix.
I wonder which method is faster:

(a) 
model = make.lp(0,0)
add.constraint(model,   ...)

or

(b)
model = make.lp(2000,10)
set.constraint(model,...)

Thanks

KC
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NA Replacement by lowest value?

2010-01-28 Thread Joel Fürstenberg-Hägg

Thanks a lot Paul!!

 

Best,

 

Joel
 
 Date: Thu, 28 Jan 2010 10:48:37 +0100
 From: p.hiems...@geo.uu.nl
 To: joel_furstenberg_h...@hotmail.com
 CC: r-help@r-project.org
 Subject: Re: [R] NA Replacement by lowest value?
 
 Joel Fürstenberg-Hägg wrote:
  Hi all,
 
  
 
  I need to replace missing values in a matrix by 10 % of the lowest 
  available value in the matrix. I've got a function I've used earlier to 
  replace negative values by the lowest value, in a data frame, but I'm not 
  sure how to modify it...
 
  
 
  nonNeg = as.data.frame(apply(orig.df, 2, function(col) # Change negative 
  values to a small value, close to zero
  {
   min.val = min(col[col > 0])
   
   col[col < 0] = (min.val / 10)
  col # Column index
  }))
 
  
 
  I think this is how to start, but the NA replacement part doesn't work...
 
  
 
  newMatrix = as.matrix(apply(oldMatrix, 2, function(col)
 
  {
 
  min.val = min(mData, na.rm = T) # Find the smallest value in the dataset
 
  col[col == NA] = (min.val / 10) # Doesn't work...
  
 use is.na(col) to find the NA's.
 
 cheers,
 Paul
  col # Column index
 
  }
 
  
 
  Does any of you have any suggestions?
 
  
 
  
 
  Best regards,
 
  
 
  Joel
 
  
  
  _
  Hitta kärleken i vinter!
  http://dejting.se.msn.com/channel/index.aspx?trackingid=1002952
  [[alternative HTML version deleted]]
 
  
  
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
  
 
 
 -- 
 Drs. Paul Hiemstra
 Department of Physical Geography
 Faculty of Geosciences
 University of Utrecht
 Heidelberglaan 2
 P.O. Box 80.115
 3508 TC Utrecht
 Phone: +3130 274 3113 Mon-Tue
 Phone: +3130 253 5773 Wed-Fri
 http://intamap.geo.uu.nl/~paul
 
  
_
Hitta hetaste singlarna på MSN Dejting!
http://dejting.se.msn.com/channel/index.aspx?trackingid=1002952
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NA Replacement by lowest value?

2010-01-28 Thread Jim Lemon

On 01/28/2010 08:35 PM, Joel Fürstenberg-Hägg wrote:


Hi all,



I need to replace missing values in a matrix by 10 % of the lowest available 
value in the matrix. I've got a function I've used earlier to replace negative 
values by the lowest value, in a data frame, but I'm not sure how to modify 
it...



nonNeg = as.data.frame(apply(orig.df, 2, function(col) # Change negative values 
to a small value, close to zero
{
min.val = min(col[col > 0])

col[col < 0] = (min.val / 10)
col # Column index
}))



I think this is how to start, but the NA replacement part doesn't work...



newMatrix = as.matrix(apply(oldMatrix, 2, function(col)

{

min.val = min(mData, na.rm = T) # Find the smallest value in the dataset

col[col == NA] = (min.val / 10) # Doesn't work...
col # Column index

}



Does any of you have any suggestions?


Hi Joel,

You probably want to use:

col[is.na(col)] <- min.val/10

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NA Replacement by lowest value?

2010-01-28 Thread Joel Fürstenberg-Hägg

Hi Jim,

 

That's what Pauls suggested too, works great!

 

Best,

 

Joel
 
 Date: Thu, 28 Jan 2010 20:57:57 +1100
 From: j...@bitwrit.com.au
 To: joel_furstenberg_h...@hotmail.com
 CC: r-help@r-project.org
 Subject: Re: [R] NA Replacement by lowest value?
 
 On 01/28/2010 08:35 PM, Joel Fürstenberg-Hägg wrote:
 
  Hi all,
 
 
 
  I need to replace missing values in a matrix by 10 % of the lowest 
  available value in the matrix. I've got a function I've used earlier to 
  replace negative values by the lowest value, in a data frame, but I'm not 
  sure how to modify it...
 
 
 
  nonNeg = as.data.frame(apply(orig.df, 2, function(col) # Change negative 
  values to a small value, close to zero
  {
   min.val = min(col[col > 0])
  
   col[col < 0] = (min.val / 10)
  col # Column index
  }))
 
 
 
  I think this is how to start, but the NA replacement part doesn't work...
 
 
 
  newMatrix = as.matrix(apply(oldMatrix, 2, function(col)
 
  {
 
  min.val = min(mData, na.rm = T) # Find the smallest value in the dataset
 
  col[col == NA] = (min.val / 10) # Doesn't work...
  col # Column index
 
  }
 
 
 
  Does any of you have any suggestions?
 
 Hi Joel,
 
 You probably want to use:
 
 col[is.na(col)] <- min.val/10
 
 Jim
 
  
_
Hitta kärleken i vinter!
http://dejting.se.msn.com/channel/index.aspx?trackingid=1002952
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] large integers in R

2010-01-28 Thread Benilton Carvalho
Hi Duncan,

On Tue, Jan 26, 2010 at 9:09 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 26/01/2010 3:25 PM, Blanford, Glenn wrote:

 Has there been any update on R's handling large integers greater than 10^9
 (between 10^9 and 4x10^9) ?

 as.integer() in R 2.9.2 lists this as a restriction but doesnt list the
 actual limit or cause, nor if anyone was looking at fixing it.

 Integers in R are 4 byte signed integers, so the upper limit is 2^31-1.
  That's not likely to change soon.

But in the hypothetical scenario that this was to change soon and we
were to have 64bit integer type (say, when under a 64 bit OS),
wouldn't this allow us to have objects whose length exceeded the
2^31-1 limit?


Benilton Carvalho




 The double type in R can hold exact integer values up to around 2^52. So for
 example calculations like this work fine:

 x <- 2^50
 y <- x + 1
 y - x
 [1] 1

 Just don't ask R to put those values into a 4 byte integer, they won't fit:

 as.integer(c(x,y))
 [1] NA NA
 Warning message:
 NAs introduced by coercion

 Duncan Murdoch


 Glenn D Blanford, PhD
 mailto:glenn.blanf...@us.army.mil
 Scientific Research Corporation
 gblanf...@scires.commailto:gblanf...@scires.com


[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems with fitdistr

2010-01-28 Thread Mario Valle

Try to pass a start value to help optim (see ?fitdistr)
Ciao!
mario
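
A sketch of that suggestion, using the data visible in the error message (the
start values below are crude guesses, not tuned):

```r
library(MASS)
x <- c(4, 41, 20, 6, 12, 6, 7, 13, 2, 8, 22)
# supply rough starting values so optim starts from a finite density
fit <- fitdistr(x, "weibull", start = list(shape = 1, scale = mean(x)))
fit$estimate
```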

vikrant wrote:
 Hi,
 I want to estimate the parameters of a Weibull distribution. For this, I am
 using the fitdistr() function in the MASS package. But when I call
 fitdistr(c, "weibull") I get an error as follows:
  Error in optim(x = c(4L, 41L, 20L, 6L, 12L, 6L, 7L, 13L, 2L, 8L, 22L, 
 : 
  non-finite value supplied by optim
 Any help or suggestions are most welcomed

-- 
Ing. Mario Valle
Data Analysis and Visualization Group| http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS)  | Tel:  +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] large integers in R

2010-01-28 Thread Duncan Murdoch

On 28/01/2010 5:30 AM, Benilton Carvalho wrote:

Hi Duncan,

On Tue, Jan 26, 2010 at 9:09 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 26/01/2010 3:25 PM, Blanford, Glenn wrote:

Has there been any update on R's handling large integers greater than 10^9
(between 10^9 and 4x10^9) ?

as.integer() in R 2.9.2 lists this as a restriction but doesnt list the
actual limit or cause, nor if anyone was looking at fixing it.

Integers in R are 4 byte signed integers, so the upper limit is 2^31-1.
 That's not likely to change soon.


But in the hypothetical scenario that this was to change soon and we
were to have 64bit integer type (say, when under a 64 bit OS),
wouldn't this allow us to have objects whose length exceeded the
2^31-1 limit?


Those are certainly related problems, but you don't need 64 bit integers 
to have longer vectors.  We could switch to indexing by doubles in R 
(though internally the indexing would probably be done in 64 bit ints).


A problem with exposing 64 bit ints in R is that they break the rule 
that doubles can represent any integer exactly.  If x is an integer, x+1 
is a double, and it would be unfortunate if (x+1) != (x+1L), as will 
happen with values bigger than 2^52.
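
That loss of exactness is easy to see with plain doubles, no 64 bit integer 
type needed:

```r
x <- 2^53        # last power of two where doubles still step by exactly 1
(x + 1) == x     # TRUE: the + 1 is rounded away
(x + 2) == x     # FALSE: a step of 2 is still representable
```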


Duncan Murdoch





Benilton Carvalho





The double type in R can hold exact integer values up to around 2^52. So for
example calculations like this work fine:


x <- 2^50
y <- x + 1
y - x

[1] 1

Just don't ask R to put those values into a 4 byte integer, they won't fit:


as.integer(c(x,y))

[1] NA NA
Warning message:
NAs introduced by coercion

Duncan Murdoch


Glenn D Blanford, PhD
glenn.blanf...@us.army.mil
Scientific Research Corporation
gblanf...@scires.com





Re: [R] Maptools runs out of memory installing the help files for spCbind-methods

2010-01-28 Thread Roger Bivand

This has been seen on two ubuntu systems, but cannot be reproduced elsewhere
- this is a first report for gentoo. The fix (found by Barry Rowlingson) is
to install with R CMD INSTALL --no-latex maptools-blah.tar.gz rather than
install.packages(), with the comment that perl was taking all available
memory when --no-latex was omitted. As package maintainer, I can't reproduce
this, as I have RHEL/f12 systems rather than Debian-based ones or indeed
gentoo, which here is showing the same behaviour. If someone could offer me
ssh access to a system with problems, I can try to see whether any of the
text in that file or its successor is unpalatable.

Roger




-
Roger Bivand
Economic Geography Section
Department of Economics
Norwegian School of Economics and Business Administration
Helleveien 30
N-5045 Bergen, Norway

-- 
View this message in context: 
http://n4.nabble.com/Maptools-runs-out-of-memory-installing-the-help-files-for-spCbind-methods-tp1311062p1361079.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Problem associated with importing xlsx data file (Excel 2007)

2010-01-28 Thread David Winsemius


On Jan 27, 2010, at 9:41 PM, Steven Kang wrote:


Hi all,


I have imported xlsx file (Excel 2007) into R using the following  
scripts.



library(RODBC)

setwd("...")

query <- odbcConnectExcel2007(xls.file = "GI 2010.xlsx", readOnly = TRUE)

dat <- sqlQuery(query, "select * from [sheet1$]", as.is = TRUE,
na.strings = "exp")


*dat* contains one column consisting of integers and characters
(the only character value being "exp").

However, R recognises the class of this column as 'numeric' instead of
'character' (i.e via sapply(dat, class)).

In addition, all the values of this column that are supposed to be  
class of

'character' are presented as 'NA'.


If the vector is of type numeric then NO values in the
vector (column) are supposed to be (or even can be) of type character:
R does not have a mixed-type vector.  You have told sqlQuery that
"exp" should be converted to NA at the time of input.
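The no-mixed-types point is easy to verify in base R (a generic sketch, not tied to the RODBC result): combining numbers with a character string coerces the whole vector, and converting back marks non-numeric entries as NA.

```r
x <- c(1, 2, "exp")
class(x)                            # "character": the numbers were coerced
y <- suppressWarnings(as.numeric(x))  # going the other way turns "exp" into NA
y                                   # 1 2 NA
```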




Interestingly, when the file is saved in csv format and imported  
into R,

this problem does not occur.



Then that vector must be of character type. (So it's less interesting  
than you might have thought.)



Any advice on this problem?

Thank you as always.


--
Steven




David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] Conditional editing of rows in a data frame

2010-01-28 Thread Irene Gallego Romero
Dear R users,

I have a dataframe (main.table) with ~30,000 rows and 6 columns, of
which here are a few rows:

      id chr window         gene     xp.norm    xp.top
129 1_32   1     32       TAS1R1  1.28882115     FALSE
130 1_32   1     32       ZBTB48  1.28882115     FALSE
131 1_32   1     32       KLHL21  1.28882115     FALSE
132 1_32   1     32        PHF13  1.28882115     FALSE
133 1_33   1     33        PHF13  1.02727430     FALSE
134 1_33   1     33        THAP3  1.02727430     FALSE
135 1_33   1     33      DNAJC11  1.02727430     FALSE
136 1_33   1     33       CAMTA1  1.02727430     FALSE
137 1_34   1     34       CAMTA1  1.40312732      TRUE
138 1_35   1     35       CAMTA1  1.52104538     FALSE
139 1_36   1     36       CAMTA1  1.04853732     FALSE
140 1_37   1     37       CAMTA1  0.64794094     FALSE
141 1_38   1     38       CAMTA1  1.23026086      TRUE
142 1_38   1     38        VAMP3  1.23026086      TRUE
143 1_38   1     38         PER3  1.23026086      TRUE
144 1_39   1     39         PER3  1.18154967      TRUE
145 1_39   1     39         UTS2  1.18154967      TRUE
146 1_39   1     39      TNFRSF9  1.18154967      TRUE
147 1_39   1     39        PARK7  1.18154967      TRUE
148 1_39   1     39       ERRFI1  1.18154967      TRUE
149 1_40   1     40      no_gene  1.79796879     FALSE
150 1_41   1     41      SLC45A1  0.20193560     FALSE

I want to create two new columns, xp.bg and xp.n.top, using the
following criteria:

If gene is the same in consecutive rows, xp.bg is the minimum value of
xp.norm in those rows; if gene is not the same, xp.bg is simply the
value of xp.norm for that row;

Likewise, if there's a run of contiguous xp.top = TRUE values,
xp.n.top is the minimum value in that range, and if xp.top is false or
NA, xp.n.top is NA, or 0 (I don't care).

So, in the above example,
xp.bg for rows 136:141 should be 0.64794094, and is equal to xp.norm
for all other rows,
xp.n.top for row 137 is 1.40312732, 1.18154967 for rows 141:148, and
0/NA for all other rows.

Is there a way to combine indexing and if statements or some such to
accomplish this? I want to do this without using split(main.table,
main.table$gene), because there's about 20,000 unique entries for
gene, and one of the entries, no_gene, is repeated throughout. I
thought briefly of subsetting the rows where xp.top is TRUE, but I
then don't know how to set the range for min, so that it only looks at
what would originally have been consecutive rows, and searching the
help has not proved particularly useful.

Thanks in advance,
Irene Gallego Romero


-- 
Irene Gallego Romero
Leverhulme Centre for Human Evolutionary Studies
University of Cambridge
Fitzwilliam St
Cambridge
CB1 3QH
UK
email: ig...@cam.ac.uk



[R] Using tcltk or other graphical widgets to view zoo time series objects

2010-01-28 Thread Research

Dear all,

I am looking at the R-help entry below:

http://finzi.psych.upenn.edu/R/Rhelp02/archive/26640.html

I have a more complicated problem: I have a zoo time series frame with 
100+ sequences.


I want to cycle through them back and forth and compare them to the 1st 
column at any time.


I also need a button to click when I want the viewed/selected sequence 
(the one being compared to the 1st column) to be manipulated
(by some algorithm, or saved individually, etc.)...

I am trying to modify the code at the above link but somehow I cannot 
make it work with zoo time series objects.



Any help would be greatly appreciated.

Thanks in advance,
Costas





Re: [R] Using tcltk or other graphical widgets to view zoo time series objects

2010-01-28 Thread Gabor Grothendieck
There is an example of using zoo together with the playwith package at
the end of the examples section of help(xyplot.zoo) which may address
this.

On Thu, Jan 28, 2010 at 7:10 AM, Research risk2...@ath.forthnet.gr wrote:
 Dear all,

 I am looking at the R-help entry below:

 http://finzi.psych.upenn.edu/R/Rhelp02/archive/26640.html

 I have a more complicated problem: I have a zoo time series frame with 100+
 sequences.

 I want to cycle through them back and forth and compare them to the 1st
 column at any time.

 I need also a button to click when I need the viewed-selected sequence (that
 is being compared to the 1st column one) to be manipulated
 (by some algorithm or be saved individually etc. etc.)...

 I am trying to modify the code at the above link but somehow I can not make
 it to work with zoo time series objects.


 Any help would be greatly appreciated.

 Thanks in advance,
 Costas






Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Gabor Grothendieck
It's only important internally.  Externally it's undesirable that the
user should have to get involved in it.  The idea of making software easy to
write and use is to hide the implementation and focus on the problem.
That is why we use high level languages, object orientation, etc.

On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
 How it represents data internally is very important,  depending on the real
 goal :
 http://en.wikipedia.org/wiki/Column-oriented_DBMS


 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com...
 How it represents data internally should not be important as long as
 you can do what you want.  SQL is declarative so you just specify what
 you want rather than how to get it and invisibly to the user it
 automatically draws up a query plan and then uses that plan to get the
 result.

 On Wed, Jan 27, 2010 at 12:48 PM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:

 sqldf("select * from BOD order by Time desc limit 3")
 Exactly. SQL requires use of order by. It knows the order, but it isn't
 ordered. That's not good, but might be fine, depending on what the real
 goal
 is.


 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001270629w4795da89vb7d77af6e4e8b...@mail.gmail.com...
 On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:
 How many columns, and of what type are the columns ? As Olga asked too,
 it
 would be useful to know more about what you're really trying to do.

 3.5m rows is not actually that many rows, even for 32bit R. It depends
 on
 the columns and what you want to do with those columns.

 At the risk of suggesting something before we know the full facts, one
 possibility is to load the data from flat file into data.table. Use
 setkey()
 to set your keys. Use tables() to summarise your various tables. Then do
 your joins etc all-in-R. data.table has fast ways to do those sorts of
 joins (but we need more info about your task).

 Alternatively, you could check out the sqldf website. There is an
 sqlread.csv (or similar name) which can read your files directly into SQL

 read.csv.sql

 instead of going via R. Gabor has some nice examples there about that and
 it's faster.

 You use some buzzwords which makes me think that SQL may not be
 appropriate
 for your task though. Can't say for sure (because we don't have enough
 information) but its possible you are struggling because SQL has no row
 ordering concept built in. That might be why you've created an increment

 In the SQLite database it automatically assigns a self-incrementing
 hidden column called rowid to each row, e.g. using SQLite via the
 sqldf package on CRAN and the BOD data frame which is built into R we
 can display the rowid column explicitly by referring to it in our
 select statement:

 library(sqldf)
 BOD
 Time demand
 1 1 8.3
 2 2 10.3
 3 3 19.0
 4 4 16.0
 5 5 15.6
 6 7 19.8
 sqldf("select rowid, * from BOD")
 rowid Time demand
 1 1 1 8.3
 2 2 2 10.3
 3 3 3 19.0
 4 4 4 16.0
 5 5 5 15.6
 6 6 7 19.8


 field? Do your queries include order by incrementing field? SQL is not
 good at "first" and "last" type logic. An all-in-R solution may well be

 In SQLite you can get the top 3 values, say, like this (continuing the
 prior example):

 sqldf("select * from BOD order by Time desc limit 3")
 Time demand
 1 7 19.8
 2 5 15.6
 3 4 16.0

 better, since R is very good with ordered vectors. A 1GB data.table (or
 data.frame) for example, at 3.5m rows, could have 76 integer columns, or
 38 double columns. 1GB is well within 32bit and allows some space for
 working copies, depending on what you want to do with the data. If you
 have
 38 or less columns, or you have 64bit, then an all-in-R solution *might*
 get your task done quicker, depending on what your real goal is.

 If this sounds plausible, you could post more details and, if its
 appropriate, and luck is on your side, someone might even sketch out how
 to
 do an all-in-R solution.


 Nathan S. Watson-Haigh nathan.watson-ha...@csiro.au wrote in message
 news:4b5fde1b.10...@csiro.au...
I have a table (contact) with several fields and its PK is an auto
increment field. I'm bulk loading data to this table from files which if
successful will be about 3.5million rows (approx 16000 rows per file).
However, I have a linking table (an_contact) to resolve a m:m
relationship
between the an and contact tables. How can I retrieve the PK's for the
data
bulk loaded into contact so I can insert the relevant data into
an_contact.

 I currently load the data into contact using:
 dbWriteTable(con, "contact", dat, append = TRUE, row.names = FALSE)

 But I then need to get all the PK's which this dbWriteTable() appended
 to
 the contact table so I can load the data into my an_contact link table.
 I
 don't want to issue a separate INSERT query for each row in dat and then
 use MySQL's LAST_INSERT_ID() function, not when I have 3.5 million rows
 to
 

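One common workaround for this (a sketch only: the column names here, "email" in particular, are hypothetical, and the commented dbGetQuery step assumes an open RMySQL connection) is to fetch the generated keys back in a single query after the bulk load and match them to the loaded rows on a natural key:

```r
# after dbWriteTable(), fetch the generated ids once rather than per-row,
# e.g. (requires a live connection `con`):
#   ids <- dbGetQuery(con, "SELECT id, email FROM contact")
# the matching step itself is plain base R; illustrated with toy data:
ids <- data.frame(id = 1:3, email = c("a@x.org", "b@x.org", "c@x.org"))
dat <- data.frame(email = c("b@x.org", "a@x.org"))
dat$contact_id <- ids$id[match(dat$email, ids$email)]
dat$contact_id  # 2 1
```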
Re: [R] Using tcltk or other graphical widgets to view zoo time series objects

2010-01-28 Thread Felix Andrews
The playwith package might help, though if I understand the problem
correctly, the help(xyplot.zoo) example is not so relevant. If you
want to switch between many series you could use a spin-button or
somesuch. To execute a function you can create a button.

If you have a hundred-column dataset like
dat <- zoo(matrix(rnorm(100*100), ncol = 100), Sys.Date() + 1:100)
colnames(dat) <- paste("Series", 1:100)

Then this will give you a spin button to choose the column to plot,
and a button to print out the current series number.

playwith(xyplot(dat[,c(1,i)]),  parameters = list(i = 1:100,
do_something = function(playState) print(playState$env$i))
)

Note that the playwith package uses RGtk2, and therefore requires the
GTK+ libraries to be installed on your system.



On 28 January 2010 23:16, Gabor Grothendieck ggrothendi...@gmail.com wrote:
 There is an example of using zoo together with the playwith package at
 the end of the examples section of help(xyplot.zoo) which may address
 this.

 On Thu, Jan 28, 2010 at 7:10 AM, Research risk2...@ath.forthnet.gr wrote:
 Dear all,

 I am looking at the R-help entry below:

 http://finzi.psych.upenn.edu/R/Rhelp02/archive/26640.html

 I have a more complicated problem: I have a zoo time series frame with 100+
 sequences.

 I want to cycle through them back and forth and compare them to the 1st
 column at any time.

 I need also a button to click when I need the viewed-selected sequence (that
 is being compared to the 1st column one) to be manipulated
 (by some algorithm or be saved individually etc. etc.)...

 I am trying to modify the code at the above link but somehow I can not make
 it to work with zoo time series objects.


 Any help would be greatly appreciated.

 Thanks in advance,
 Costas






-- 
Felix Andrews / 安福立
Postdoctoral Fellow
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andr...@anu.edu.au
CRICOS Provider No. 00120C
-- 
http://www.neurofractal.org/felix/



[R] using function boot

2010-01-28 Thread COURVOISIER Delphine
Dear R Users,

I am trying to use the function boot of the boot package to sample from a 
dataframe of two character variables (N=1127). Each character variable can take 
five different values. Here is an example of the data:
1  b95-99.9 d25%
2  b95-99.9  a1%
3  b95-99.9  a1%
4  b95-99.9  a1%
5  b95-99.9  a1%
6a99.9  a1%
7  b95-99.9  a1%
8  b95-99.9  a1%
9  b95-99.9  a1%
10 b95-99.9  a1%

The statistic I want to use is the median polish (I created my own function 
that calls function medpolish from stats package). In my function, I included a 
second argument for the weight as asked by the boot function. Here is my 
function, which basically creates the table from the two variables, divides 
each cell by the sum of the column to obtain percentage, does the median 
polish, and computes the median of some of the cells:

juste.polish <- function(data, w = rep(1, nrow(data))/nrow(data))
{tableR <- table(data[,1], data[,2])
 tableP <- tableR
 marg2 <- apply(tableR, 2, sum)
 for (i in 1:nrow(tableP))
 {tableP[i,] <- 100*(tableR[i,]/marg2)}
juste.medp <- medpolish(tableP)
median(c(juste.medp$residuals[dimnames(juste.medp$residuals)[[1]]=="e60", 1],
 juste.medp$residuals[dimnames(juste.medp$residuals)[[1]]=="d60-79", 2],
 juste.medp$residuals[dimnames(juste.medp$residuals)[[1]]=="c80-94", 3],
 juste.medp$residuals[dimnames(juste.medp$residuals)[[1]]=="b95-99.9", 4],
 juste.medp$residuals[dimnames(juste.medp$residuals)[[1]]=="a99.9", 5]))
}

When I call the boot function 
(juste.boot <- boot(data = mydata, statistic = juste.polish, R = 999)), it works but 
computes the same statistic for every bootstrap sample, as if the resampling did not 
work and always produced a sample identical to the original one.
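For what it's worth, this symptom typically appears when the statistic ignores its second argument: by default (stype = "i") boot passes resampling *indices*, not weights, and the statistic must use them to subset the data. A minimal sketch of the difference (toy data, not the median-polish statistic above):

```r
library(boot)  # boot ships with R as a recommended package

x <- data.frame(v = rnorm(20))

good <- function(data, i) mean(data[i, 1])  # uses the indices
bad  <- function(data, i) mean(data[, 1])   # ignores them

set.seed(1)
b.good <- boot(x, good, R = 10)
b.bad  <- boot(x, bad,  R = 10)

length(unique(b.bad$t))   # 1: every replicate is identical
length(unique(b.good$t))  # > 1: replicates actually vary
```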

If you have any ideas, I would be very grateful.

thanks,

delphine




Re: [R] add points to 3D plot using p3d {onion}

2010-01-28 Thread Uwe Ligges



On 27.01.2010 17:50, Viechtbauer Wolfgang (STAT) wrote:

Just as an aside, the scatterplot3d package does things like this very cleverly. 
Essentially, when you create a plot with scatterplot3d, the function actually returns 
functions with values set so that points3d(), for example, knows the axis 
scaling.



Right, it makes use of lexical scoping properties, where the environment 
is attached to the returned graphics functions.
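The mechanism described here is plain lexical scoping: a function created inside another function keeps the enclosing environment, so returned plotting functions remember the axis setup. A stripped-down sketch (not scatterplot3d's actual code):

```r
make_scaler <- function(lim) {
  # the returned function closes over `lim`, so the limits
  # travel with it wherever it is called later
  function(x) (x - lim[1]) / (lim[2] - lim[1])
}
to_unit <- make_scaler(c(0, 10))
to_unit(5)  # 0.5
```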


Uwe Ligges







Best,





Re: [R] Conditional editing of rows in a data frame

2010-01-28 Thread David Winsemius


On Jan 28, 2010, at 7:05 AM, Irene Gallego Romero wrote:


Dear R users,

I have a dataframe (main.table) with ~30,000 rows and 6 columns, of
which here are a few rows:

      id chr window         gene     xp.norm    xp.top
129 1_32   1     32       TAS1R1  1.28882115     FALSE
130 1_32   1     32       ZBTB48  1.28882115     FALSE
131 1_32   1     32       KLHL21  1.28882115     FALSE
132 1_32   1     32        PHF13  1.28882115     FALSE
133 1_33   1     33        PHF13  1.02727430     FALSE
134 1_33   1     33        THAP3  1.02727430     FALSE
135 1_33   1     33      DNAJC11  1.02727430     FALSE
136 1_33   1     33       CAMTA1  1.02727430     FALSE
137 1_34   1     34       CAMTA1  1.40312732      TRUE
138 1_35   1     35       CAMTA1  1.52104538     FALSE
139 1_36   1     36       CAMTA1  1.04853732     FALSE
140 1_37   1     37       CAMTA1  0.64794094     FALSE
141 1_38   1     38       CAMTA1  1.23026086      TRUE
142 1_38   1     38        VAMP3  1.23026086      TRUE
143 1_38   1     38         PER3  1.23026086      TRUE
144 1_39   1     39         PER3  1.18154967      TRUE
145 1_39   1     39         UTS2  1.18154967      TRUE
146 1_39   1     39      TNFRSF9  1.18154967      TRUE
147 1_39   1     39        PARK7  1.18154967      TRUE
148 1_39   1     39       ERRFI1  1.18154967      TRUE
149 1_40   1     40      no_gene  1.79796879     FALSE
150 1_41   1     41      SLC45A1  0.20193560     FALSE

I want to create two new columns, xp.bg and xp.n.top, using the
following criteria:

If gene is the same in consecutive rows, xp.bg is the minimum value of
xp.norm in those rows; if gene is not the same, xp.bg is simply the
value of xp.norm for that row;


Assuming that gene values are adjacent in a dataframe named df1, then  
this would work:


df1$xp.bg <- with(df1, ave(xp.norm, gene, FUN = min))



Likewise, if there's a run of contiguous xp.top = TRUE values,
xp.n.top is the minimum value in that range, and if xp.top is false or
NA, xp.n.top is NA, or 0 (I don't care).


df1$seqgrp <- c(0, diff(df1$xp.top))
df1$seqgrp2 <- cumsum(df1$seqgrp != 0)
df1$xp.n.top <- with(df1, ave(xp.norm, seqgrp2, FUN = min))
is.na(df1$xp.n.top) <- !df1$xp.top

 df1$xp.bg <- with(df1, ave(xp.norm, gene, FUN = min))
 df1
      id chr window    gene   xp.norm xp.top seqgrp seqgrp2 xp.n.top     xp.bg
129 1_32   1     32  TAS1R1 1.2888211  FALSE      0       0       NA 1.2888211
130 1_32   1     32  ZBTB48 1.2888211  FALSE      0       0       NA 1.2888211
131 1_32   1     32  KLHL21 1.2888211  FALSE      0       0       NA 1.2888211
132 1_32   1     32   PHF13 1.2888211  FALSE      0       0       NA 1.0272743
133 1_33   1     33   PHF13 1.0272743  FALSE      0       0       NA 1.0272743
134 1_33   1     33   THAP3 1.0272743  FALSE      0       0       NA 1.0272743
135 1_33   1     33 DNAJC11 1.0272743  FALSE      0       0       NA 1.0272743
136 1_33   1     33  CAMTA1 1.0272743  FALSE      0       0       NA 0.6479409
137 1_34   1     34  CAMTA1 1.4031273   TRUE      1       1 1.403127 0.6479409
138 1_35   1     35  CAMTA1 1.5210454  FALSE     -1       2       NA 0.6479409
139 1_36   1     36  CAMTA1 1.0485373  FALSE      0       2       NA 0.6479409
140 1_37   1     37  CAMTA1 0.6479409  FALSE      0       2       NA 0.6479409
141 1_38   1     38  CAMTA1 1.2302609   TRUE      1       3 1.181550 0.6479409
142 1_38   1     38   VAMP3 1.2302609   TRUE      0       3 1.181550 1.2302609
143 1_38   1     38    PER3 1.2302609   TRUE      0       3 1.181550 1.1815497
144 1_39   1     39    PER3 1.1815497   TRUE      0       3 1.181550 1.1815497
145 1_39   1     39    UTS2 1.1815497   TRUE      0       3 1.181550 1.1815497
146 1_39   1     39 TNFRSF9 1.1815497   TRUE      0       3 1.181550 1.1815497
147 1_39   1     39   PARK7 1.1815497   TRUE      0       3 1.181550 1.1815497
148 1_39   1     39  ERRFI1 1.1815497   TRUE      0       3 1.181550 1.1815497
149 1_40   1     40 no_gene 1.7979688  FALSE     -1       4       NA 1.7979688
150 1_41   1     41 SLC45A1 0.2019356  FALSE      0       4       NA 0.2019356



And if the adjacent-gene assumption of the first request above were 
not met, then the first portion of this method could be used instead 
to create group indices.


--
David.



So, in the above example,
xp.bg for rows 136:141 should be 0.64794094, and is equal to xp.norm
for all other rows,
xp.n.top for row 137 is 1.40312732, 1.18154967 for rows 141:148, and
0/NA for all other rows.

Is there a way to combine indexing and if statements or some such to
accomplish this? I want to do this without using split(main.table,
main.table$gene), because there's about 20,000 unique entries for
gene, and one of the entries, no_gene, is repeated throughout. I
thought briefly of subsetting the rows where xp.top is TRUE, but I
then don't know how to set the range for min, so that it only looks at
what would originally have been consecutive rows, and searching the
help has not proved particularly 

[R] AFT-model with time-varying covariates and left-truncation

2010-01-28 Thread Philipp Rappold

Dear Prof. Broström,
Dear R-mailinglist,

first of all thanks a lot for your great effort to incorporate 
time-varying covariates into aftreg. It works like a charm so far 
and I'll update you with detailed benchmarks as soon as I have them.


I have one more questions regarding Accelerated Failure Time models 
(with aftreg):


You mention that left truncation in combination with time-varying 
covariates only works if ...it can be assumed that the covariate 
values during the first non-observable interval are the same as at 
the beginning of the first interval under observation.. My question 
is: Is there a way to use an AFT model where one has no explicit 
assumption about what values the covariates have before the subject 
enters the study (see example below if unclear)? For me personally 
it would already be a great help to know if this is statistically 
feasible in general; I'm also interested in whether it can be 
modelled with aftreg.


EXAMPLE (to make sure we're talking about the same thing):
Suppose I want to model the lifetime of two wearparts A and B with 
temperature as a covariate. For some reason, I can only observe 
the temperature at three distinct times t1, t2, t3 where they each 
have a certain age (5 hours, 6 hours, 7 hours respectively). Of 
course, I have a different temperature for each part at each 
observation t1, t2, t3. Unfortunately at t1 both parts have not been 
used for the first time and already have a certain age (5 hours) and 
I cannot observe what the temperature was before (at ages 1hr, 2hr, 
...).


Thanks a lot for your help!

All the best
Philipp



Re: [R] Conditional editing of rows in a data frame

2010-01-28 Thread Gabor Grothendieck
If DF is your data frame then:

DF$xp.bg <- ave(DF$xp.norm, DF$gene, FUN = min)

will create a new column such that the entry in each row has the
minimum xp.norm of all rows with the same gene.  ave does use split
internally but I think it would be worth trying anyway since it's only
one short line of code.

See help(ave)
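A tiny self-contained illustration of ave() computing the per-group minimum (toy data, same idea as above):

```r
df <- data.frame(gene    = c("A", "A", "B"),
                 xp.norm = c(3, 1, 5))
# each row receives the minimum xp.norm among rows sharing its gene
df$xp.bg <- ave(df$xp.norm, df$gene, FUN = min)
df$xp.bg  # 1 1 5
```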

On Thu, Jan 28, 2010 at 7:05 AM, Irene Gallego Romero ig...@cam.ac.uk wrote:
 Dear R users,

 I have a dataframe (main.table) with ~30,000 rows and 6 columns, of
 which here are a few rows:

      id chr window         gene     xp.norm    xp.top
 129 1_32   1     32       TAS1R1  1.28882115     FALSE
 130 1_32   1     32       ZBTB48  1.28882115     FALSE
 131 1_32   1     32       KLHL21  1.28882115     FALSE
 132 1_32   1     32        PHF13  1.28882115     FALSE
 133 1_33   1     33        PHF13  1.02727430     FALSE
 134 1_33   1     33        THAP3  1.02727430     FALSE
 135 1_33   1     33      DNAJC11  1.02727430     FALSE
 136 1_33   1     33       CAMTA1  1.02727430     FALSE
 137 1_34   1     34       CAMTA1  1.40312732      TRUE
 138 1_35   1     35       CAMTA1  1.52104538     FALSE
 139 1_36   1     36       CAMTA1  1.04853732     FALSE
 140 1_37   1     37       CAMTA1  0.64794094     FALSE
 141 1_38   1     38       CAMTA1  1.23026086      TRUE
 142 1_38   1     38        VAMP3  1.23026086      TRUE
 143 1_38   1     38         PER3  1.23026086      TRUE
 144 1_39   1     39         PER3  1.18154967      TRUE
 145 1_39   1     39         UTS2  1.18154967      TRUE
 146 1_39   1     39      TNFRSF9  1.18154967      TRUE
 147 1_39   1     39        PARK7  1.18154967      TRUE
 148 1_39   1     39       ERRFI1  1.18154967      TRUE
 149 1_40   1     40      no_gene  1.79796879     FALSE
 150 1_41   1     41      SLC45A1  0.20193560     FALSE

 I want to create two new columns, xp.bg and xp.n.top, using the
 following criteria:

 If gene is the same in consecutive rows, xp.bg is the minimum value of
 xp.norm in those rows; if gene is not the same, xp.bg is simply the
 value of xp.norm for that row;

 Likewise, if there's a run of contiguous xp.top = TRUE values,
 xp.n.top is the minimum value in that range, and if xp.top is false or
 NA, xp.n.top is NA, or 0 (I don't care).

 So, in the above example,
 xp.bg for rows 136:141 should be 0.64794094, and is equal to xp.norm
 for all other rows,
 xp.n.top for row 137 is 1.40312732, 1.18154967 for rows 141:148, and
 0/NA for all other rows.

 Is there a way to combine indexing and if statements or some such to
 accomplish this? I want to do this without using split(main.table,
 main.table$gene), because there's about 20,000 unique entries for
 gene, and one of the entries, no_gene, is repeated throughout. I
 thought briefly of subsetting the rows where xp.top is TRUE, but I
 then don't know how to set the range for min, so that it only looks at
 what would originally have been consecutive rows, and searching the
 help has not proved particularly useful.

 Thanks in advance,
 Irene Gallego Romero


 --
 Irene Gallego Romero
 Leverhulme Centre for Human Evolutionary Studies
 University of Cambridge
 Fitzwilliam St
 Cambridge
 CB1 3QH
 UK
 email: ig...@cam.ac.uk



Re: [R] Problems with fitdistr

2010-01-28 Thread Peter Ehlers

Do you have any zeros in your data?
fitdistr() will need start values (see the code),
but even with start values, optim() will have problems.

x <- rweibull(100, 2, 10)
fitdistr(x, "weibull")  ## no problem
fitdistr(c(0,x), "weibull")  ## your error message
fitdistr(c(0,x), "weibull", start = list(shape = 2, scale = 10))
  ## still an error message from optim()
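If the zeros are the culprit, one pragmatic workaround (an assumption about the data, not a general recommendation: it only makes sense if the zeros are artifacts rather than real observations) is to drop them before fitting, since the Weibull density is defined for x > 0:

```r
library(MASS)
set.seed(42)
x <- c(0, rweibull(200, shape = 2, scale = 10))
# dropping the zero lets fitdistr find its own start values again
fit <- fitdistr(x[x > 0], "weibull")
fit$estimate  # shape and scale estimates, near 2 and 10 here
```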

 -Peter Ehlers

vikrant wrote:

Hi,
I want to estimate the parameters of a Weibull distribution. For this, I am using
the fitdistr() function in the MASS package. But when I call fitdistr(c, "weibull") I
get an error as follows:
 Error in optim(x = c(4L, 41L, 20L, 6L, 12L, 6L, 7L, 13L, 2L, 8L, 22L, 
: 
 non-finite value supplied by optim

Any help or suggestions are most welcomed


--
Peter Ehlers
University of Calgary



Re: [R] Constrained vector permutation

2010-01-28 Thread Jason Smith
Andrew Rominger ajrominger at gmail.com writes:

 I'm trying to permute a vector of positive integers > 0 with the constraint

Hi Andy

I'm not sure if you are explicitly wanting to use a sampling approach, but the 
gtools library has a permutations() function (found by ??permutation and then 
?gtools::combinations).

Hope this helps,
Jason Smith

Here is the script I used:


# Constraint
# f(n_i) <= 2 * f(n_(i-1))
#
# Given a start value and the number of elements,
# recursively generate a vector representing the
# maximum value each index is allowed
#
f <- function(value, num_elements) {
    # cat(paste("f(", value, ",", num_elements, ")\n"))
    if (num_elements <= 1) {
        value
    } else {
        c(value, f(2 * value, num_elements - 1))
    }
}

# Generate base vector
v - 2:6

# Calculate constraint vector
v.constraints <- f(v[1], length(v))

# Generate permutations using gtools functions
library(gtools) 
v.permutations - permutations(length(v), length(v), v)

# Check each permutation
results <- apply(v.permutations, 1, function(x) all(x <= v.constraints))

#
# Display Results
#
print("Original Vector")
print(v)
print("Constraint Vector")
print(v.constraints)
print("Does Vector meet Constraints")
print(cbind(v.permutations, results))



Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Matthew Dowle
Are you claiming that SQL is that utopia?  SQL is a row store.  It cannot 
give the user the benefits of a column store.

For example, why does SQL take 113 seconds in the example in this thread :
http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html
but data.table takes 5 seconds to get the same result? How come the 
high-level language SQL doesn't appear to shield the user from this detail?

If you are just describing utopia, then of course I agree.  It would be 
great to have a language which hid us from this.  In the meantime the user 
has choices, and the best choice depends on the task and the real goal.

Gabor Grothendieck ggrothendi...@gmail.com wrote in message 
news:971536df1001280428p345f8ff4v5f3a80c13f96d...@mail.gmail.com...
It's only important internally.  Externally it's undesirable that the
user has to get involved in it.  The idea of making software easy to
write and use is to hide the implementation and focus on the problem.
That is why we use high level languages, object orientation, etc.

On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com 
wrote:
 How it represents data internally is very important, depending on the real
 goal :
 http://en.wikipedia.org/wiki/Column-oriented_DBMS


 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com...
 How it represents data internally should not be important as long as
 you can do what you want. SQL is declarative so you just specify what
 you want rather than how to get it and invisibly to the user it
 automatically draws up a query plan and then uses that plan to get the
 result.

 On Wed, Jan 27, 2010 at 12:48 PM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:

 > sqldf("select * from BOD order by Time desc limit 3")
 Exactly. SQL requires use of 'order by'. It knows the order, but it isn't
 ordered. That's not good, but might be fine, depending on what the real
 goal is.


 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001270629w4795da89vb7d77af6e4e8b...@mail.gmail.com...
 On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:
 How many columns, and of what type are the columns ? As Olga asked too,
 it
 would be useful to know more about what you're really trying to do.

 3.5m rows is not actually that many rows, even for 32bit R. It depends on
 the columns and what you want to do with those columns.

 At the risk of suggesting something before we know the full facts, one
 possibility is to load the data from flat file into data.table. Use
 setkey()
 to set your keys. Use tables() to summarise your various tables. Then do
 your joins etc all-in-R. data.table has fast ways to do those sorts of
 joins (but we need more info about your task).

 Alternatively, you could check out the sqldf website. There is an
 sqlread.csv (or similar name) which can read your files directly into 
 SQL

 read.csv.sql

 instead of going via R. Gabor has some nice examples there about that and
 it's faster.

 You use some buzzwords which make me think that SQL may not be appropriate
 for your task though. Can't say for sure (because we don't have enough
 information) but it's possible you are struggling because SQL has no row
 ordering concept built in. That might be why you've created an increment

 In the SQLite database it automatically assigns a self incrementing
 hidden column called rowid to each row. e.g. using SQLite via the
 sqldf package on CRAN and the BOD data frame which is built into R we
 can display the rowid column explicitly by referring to it in our
 select statement:

 > library(sqldf)
 > BOD
   Time demand
 1    1    8.3
 2    2   10.3
 3    3   19.0
 4    4   16.0
 5    5   15.6
 6    7   19.8
 > sqldf("select rowid, * from BOD")
   rowid Time demand
 1     1    1    8.3
 2     2    2   10.3
 3     3    3   19.0
 4     4    4   16.0
 5     5    5   15.6
 6     6    7   19.8


 field? Do your queries include 'order by incrementing field'? SQL is not
 good at 'first' and 'last' type logic. An all-in-R solution may well be

 In SQLite you can get the top 3 values, say, like this (continuing the
 prior example):

 > sqldf("select * from BOD order by Time desc limit 3")
   Time demand
 1    7   19.8
 2    5   15.6
 3    4   16.0

 better, since R is very good with ordered vectors. A 1GB data.table (or
 data.frame), for example, at 3.5m rows, could have 76 integer columns or
 38 double columns. 1GB is well within 32bit and allows some space for
 working copies, depending on what you want to do with the data. If you have
 38 or fewer columns, or you have 64bit, then an all-in-R solution *might*
 get your task done quicker, depending on what your real goal is.

 If this sounds plausible, you could post more details and, if its
 appropriate, and luck is on your side, someone might even sketch out how
 to
 do an all-in-R solution.


 Nathan S. Watson-Haigh nathan.watson-ha...@csiro.au wrote in message
 news:4b5fde1b.10...@csiro.au...
I have a table (contact) with several fields and its PK is an auto
increment field. I'm bulk loading data 
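
To make the "all-in-R" suggestion above concrete, here is a minimal sketch
(assuming the data.table package) of keying a table so that "first/last n"
queries need no ORDER BY — the data are the BOD values used earlier in the
thread:

```r
library(data.table)

DT <- data.table(Time   = c(1, 2, 3, 4, 5, 7),
                 demand = c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8))
setkey(DT, Time)   # physically sorts the rows by Time and marks the key

tail(DT, 3)        # last 3 rows by Time -- no 'order by' needed
```

Because a keyed data.table is stored in sorted order, head()/tail() and
binary-search joins exploit the ordering directly instead of sorting per query.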

Re: [R] number of decimal

2010-01-28 Thread Peter Ehlers

?formatC
?sprintf

Ivan Calandra wrote:

Hi everybody,

I'm trying to set the number of decimals (i.e. the number of digits 
after the "."). I looked into options but I can only set the total 
number of digits, with options(digits=6). But since I have different 
variables with different orders of magnitude, I would like them 
all displayed with the same number of decimals.
I searched for it and found the format() function, with nsmall=6, but it 
is for a given vector. I would like to set it for the whole session, as 
with options.


Can anyone help me?
Thanks in advance
Ivan





--
Peter Ehlers
University of Calgary



Re: [R] Constrained vector permutation

2010-01-28 Thread Jason Smith
I just realized I read through your email too quickly and my script does
not actually address the constraint on each permutation, sorry about that.

You should be able to use the permutations function to generate the vector 
permutations however.

Jason
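
For completeness, a sketch of the filtering step with a per-permutation check —
assuming the constraint is that each element be at most twice its predecessor
(not Jason's original script, which he retracted above):

```r
library(gtools)

v <- 2:6
perms <- permutations(length(v), length(v), v)

# Keep only permutations p with p[i] <= 2 * p[i-1] for all i > 1
ok <- apply(perms, 1, function(p) all(p[-1] <= 2 * p[-length(p)]))
perms[ok, , drop = FALSE]
```

Enumerating all n! rows is only feasible for small vectors; for longer ones a
sampling or backtracking approach would be needed.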



[R] Print lattice output to table?

2010-01-28 Thread GL

I have beautiful box and whisker charts formatted with lattice, which is
obviously calculating summary statistics internally in order to draw the
charts. Is there a way to dump the associated summary tables that are being
used to generate the charts? Realize I could use tapply or such to get
something similar, but I have all the groupings and such already configured
to generate the charts. Simply want to dump those values to a table so that
I don't have to interpolate where the 75th percentile is on a visual chart.
Appreciate any thoughts..
-- 
View this message in context: 
http://n4.nabble.com/Print-lattice-output-to-table-tp1375040p1375040.html
Sent from the R help mailing list archive at Nabble.com.



[R] Setting breaks for histogram of dates

2010-01-28 Thread Loris Bennett
Hi,

I have a list of dates like this: 

  date
  2009-12-03
  2009-12-11
  2009-10-07
  2010-01-25
  2010-01-05
  2009-09-09
  2010-01-19
  2010-01-25
  2009-02-05
  2010-01-25
  2010-01-27
  2010-01-27
  ...

and am creating a histogram like this

  t <- read.table("test.dat", header=TRUE)
  hist(as.Date(t$date), "years", format = "%d/%m/%y", freq=TRUE)
  
However, I would rather not label the breaks themselves, but instead
print the date with the format "%Y" between the breaks.

Is there a simple way of doing this?

Regards

Loris

-- 
Dr. Loris Bennett
ZEDAT Computer Centre
Freie Universität Berlin
Berlin, Germany
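
One possible approach — a sketch only, assuming hist.Date() with
breaks = "years" and that the axis arguments pass through to the plot — is to
suppress the default axis and place "%Y" labels at the midpoints between breaks:

```r
dates <- as.Date(c("2009-12-03", "2009-12-11", "2009-10-07",
                   "2010-01-25", "2010-01-05", "2009-09-09"))

h <- hist(dates, breaks = "years", freq = TRUE, xaxt = "n", main = "")

# h$breaks holds the break positions as days since 1970-01-01;
# label each bar at its midpoint with the year only
mids <- head(h$breaks, -1) + diff(h$breaks) / 2
axis(1, at = mids, tick = FALSE,
     labels = format(as.Date(mids, origin = "1970-01-01"), "%Y"))
```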



[R] select one row from data-frame by name, indirectly (as string)

2010-01-28 Thread Oliver
Hello,

say I have a dataframe x
and it contains rows like ch_01, ch_02 and so on.

How can I select those channels indirectly, by name?

I tried to select the data with get(), but get() seems only to work
on simple variables.

Or how to do it?

I need something like that:


name1 <- "ch_01"
name2 <- "ch_02"

selected <- some_function(x, name1)   # pseudocode: select by the name held in name1


Any ideas?
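
A minimal sketch of what is being asked for, assuming ch_01 etc. are row names
of the data frame:

```r
x <- data.frame(val = 1:3, row.names = c("ch_01", "ch_02", "ch_03"))

name1 <- "ch_01"
x[name1, ]      # select the row whose name is stored in the string name1
x[["val"]]      # columns can likewise be selected by a name held in a string
```

The `[` and `[[` operators accept character indices directly, which is why
get() is unnecessary here.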



Re: [R] select one row from data-frame by name, indirectly (as string)

2010-01-28 Thread Oliver

OK, now it works... just using [ and ] or [[ and ]] works.
I thought I had tried it before... why does it work now and not before?

hmhh


sorry for the traffic



Re: [R] select one row from data-frame by name, indirectly (as string)

2010-01-28 Thread David Winsemius


On Jan 28, 2010, at 10:04 AM, Oliver wrote:



OK, now it works... just using [ and ] or [[ and ]] works.
I thought I had tried it before... why does it work now and not
before?


Provide your console session and someone can tell you. Failing that,  
you are asking us to read your mind.







David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] number of decimal

2010-01-28 Thread Ivan Calandra

It looks to me that it does more or less the same as format().

Maybe I didn't explain myself correctly then. I would like to set the 
number of decimals by default, for the whole R session, like I do with 
options(digits=6). Except that digits sets the total number of digits 
(including what is before the "."). I'm looking for some option that 
will let me set the number of digits AFTER the "."


Example: I have 102.33556677 and 2.999555666
If I set the number of decimal to 6, I should get: 102.335567 and 2.999556.
And that for all numbers that will be in/output from R (read.table, 
write.table, statistic tests, etc)


Or is it that I didn't understand everything about formatC() and sprintf()?

Thanks again
Ivan

On 1/28/2010 15:12, Peter Ehlers wrote:

?formatC
?sprintf

Ivan Calandra wrote:

Hi everybody,

I'm trying to set the number of decimals (i.e. the number of digits 
after the .). I looked into options but I can only set the total 
number of digits, with options(digits=6). But since I have different 
variables with different order of magnitude, I would like that 
they're all displayed with the same number of decimals.
I searched for it and found the format() function, with nsmall=6, but 
it is for a given vector. I would like to set it for the whole 
session, as with options.


Can anyone help me?
Thanks in advance
Ivan







Re: [R] Print lattice output to table?

2010-01-28 Thread Deepayan Sarkar
On Thu, Jan 28, 2010 at 6:25 AM, GL pfl...@shands.ufl.edu wrote:

 I have beautiful box and whisker charts formatted with lattice, which is
 obviously calculating summary statistics internally in order to draw the
 charts. Is there a way to dump the associated summary tables that are being
 used to generate the charts? Realize I could use tapply or such to get
 something similar, but I have all the groupings and such already configured
 to generate the charts. Simply want to dump those values to a table so that
 I don't have to interpolate where the 75th percentile is on a visual chart.
 Appreciate any thoughts..

You can customize the function that computes the summary statistics,
and that seems like the only reasonable entry-point for you. A simple
example:

bwplot(voice.part ~ height, data = singer,
       stats = function(...) {
           ans <- boxplot.stats(...)
           str(ans)
           ans
       })

You will need to figure out how you will dump the parts you want.

-Deepayan
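
Building on Deepayan's entry point, one way to "dump the parts you want" is to
accumulate them instead of printing — a sketch, assuming the singer data that
ships with lattice:

```r
library(lattice)

stats_log <- list()   # filled as a side effect while the plot is drawn
bwplot(voice.part ~ height, data = singer,
       stats = function(x, ...) {
           ans <- boxplot.stats(x, ...)
           # append this box's five-number summary to the log
           stats_log[[length(stats_log) + 1]] <<- ans$stats
           ans
       })
# The stats function runs when the trellis object is printed; afterwards
# do.call(rbind, stats_log) gives one row of summaries per box.
```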



[R] make a grid with longitude, latitude and bathymetry data

2010-01-28 Thread karine heerah

hi,

i have a longitude vector (x), a latitude vector (y) and a matrix of bathymetry 
(z) with dimensions (x, y). I have already succeeded in plotting it with 
image.plot (package 'fields') and the contour function.

But now, I want to make a grid in order to easily extract the bathymetry 
corresponding to a pair of longitude/latitude coordinates.

Do you know a function or a package which can help me? Or do you know how to 
do it? (I have already looked for it on the internet and didn't find anything.)

Thanks a lot.

Karine
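
Absent a dedicated package, a nearest-grid-cell lookup is only a few lines — a
sketch, assuming x and y are the coordinate vectors and z is indexed as z[x, y]
(the function name bathy_at is made up):

```r
bathy_at <- function(lon, lat, x, y, z) {
    i <- which.min(abs(x - lon))   # nearest longitude index
    j <- which.min(abs(y - lat))   # nearest latitude index
    z[i, j]
}

# Example on a toy grid
x <- seq(-10, 10, by = 0.5)
y <- seq(40, 50, by = 0.5)
z <- outer(x, y, function(a, b) -1000 + 10 * a + 5 * b)
bathy_at(3.2, 42.7, x, y, z)
```

For interpolated (rather than nearest-cell) values, something like
fields::interp.surface or akima would be worth a look.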



  



Re: [R] Print lattice output to table?

2010-01-28 Thread GL

That works great. Thanks!
-- 
View this message in context: 
http://n4.nabble.com/Print-lattice-output-to-table-tp1375040p1380862.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Conditional density plot in lattice

2010-01-28 Thread Deepayan Sarkar
On Fri, Jan 22, 2010 at 2:08 AM, Dieter Menne
dieter.me...@menne-biomed.de wrote:


 Deepayan Sarkar wrote:

 With a restructuring of the data:

 df1 = data.frame(x=0:n, y1=((0:n)/n)^2, y2=1-((0:n)/n)^2, age="young")
 df2 = data.frame(x=0:n, y1=((0:n)/n)^3, y2=1-((0:n)/n)^3, age="old")
 df = rbind(df1, df2)

 xyplot((y1+y2) + y1 ~ x | age, data=df, type = "l")

 xyplot((y1+y2) + y1 ~ x | age, data=df, type = "l",
        scales = list(axs = "i"),
        panel = panel.superpose,
        panel.groups = function(x, y, fill, ...) {
            panel.polygon(c(min(x), x, max(x)), c(0, y, 0), fill = fill)
        })



 Thanks, Deepayan. I noted, that the color of the bands is determined by
 superpose.symbol. Is that by design or typo?

By design, in the sense that the default 'fill' for panel.superpose is
taken from superpose.symbol$fill, which makes sense because the
default is to plot symbols.

You could supply a top-level vector 'fill' instead (which could be
trellis.par.get("superpose.polygon")$fill).

-Deepayan



[R] color palette for points, lines, text / interactive Rcolorpicker?

2010-01-28 Thread Michael Friendly
I'm looking for a scheme to generate a default color palette for 
plotting points, lines and text (on a white or transparent background)

with from 2 to say 9 colors with the following constraints:
- red is reserved for another purpose
- colors should be highly distinct
- avoid light colors (like yellows)

In RColorBrewer, most of the schemes are designed for area fill rather 
than points and lines. The closest I can find for these needs is the 
"Dark2" palette, e.g.,

library(RColorBrewer)
display.brewer.pal(7, "Dark2")

I'm wondering if there is something else I can use.

On a related note, I wonder if there is something like an interactive 
color picker for R.  For example,

http://research.stowers-institute.org/efg/R/Color/Chart/
displays several charts of all R colors.  I'd like to find something 
that displays such a chart and uses
identify() to select a set of tiles, whose colors() indices are returned 
by the function.


-Michael
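
A rough sketch of such a picker using only base graphics — colors() drawn as
swatches and chosen with identify(); the helper name pick_colors is made up:

```r
# Draw all named R colors as a grid of tiles and return the names of
# the tiles clicked (finish with Esc / right-click)
pick_colors <- function(ncol = 26) {
    cols <- colors()
    n <- length(cols)
    xy <- expand.grid(x = seq_len(ncol),
                      y = seq_len(ceiling(n / ncol)))[seq_len(n), ]
    plot(xy$x, xy$y, pch = 15, cex = 2, col = cols,
         axes = FALSE, xlab = "", ylab = "")
    idx <- identify(xy$x, xy$y, plot = FALSE)  # indices of clicked tiles
    cols[idx]
}
```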

--
Michael Friendly Email: friendly AT yorku DOT ca 
Professor, Psychology Dept.

York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Streethttp://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA



[R] grid.image(), pckg grid

2010-01-28 Thread Markus Loecher
While I am very happy with and awed by the grid package and its basic
plotting primitives such as grid.points, grid.lines, etc, I was wondering
whether the equivalent of a grid.image() function exists ?

Any pointer would be helpful.

Thanks !

Markus
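
There is no grid.image() as such; later R versions (2.11.0 onward, so after
this post) added grid.raster(), which covers this use — a sketch:

```r
library(grid)

# grid.raster() draws a matrix of values as an image in the current viewport
m <- matrix(runif(100), nrow = 10)
grid.newpage()
grid.raster(m, interpolate = FALSE,
            width = unit(0.8, "npc"), height = unit(0.8, "npc"))
```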




[R] exporting multidimensional matrix from R

2010-01-28 Thread Gopikrishna Deshpande
Hi,

I have a matrix of size 19x512x20 in R. I want to export this into
a format which can be imported into MATLAB.
write.xls and write.table export only one dimension.
Please send code if possible; I am very new to R and have been struggling
with this.

Thanks !
Gopi
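
One route — a sketch, assuming the R.matlab package from CRAN is installed —
is writeMat(), which preserves array dimensions:

```r
library(R.matlab)

a <- array(rnorm(19 * 512 * 20), dim = c(19, 512, 20))
writeMat("arr.mat", a = a)
# In MATLAB:  load('arr.mat'); size(a)
```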




Re: [R] number of decimal

2010-01-28 Thread Peter Ehlers

Ivan Calandra wrote:

It looks to me that it does more or less the same as format().

Maybe I didn't explain myself correctly then. I would like to set the 
number of decimal by default, for the whole R session, like I do with 
options(digits=6). Except that digits sets up the number of digits 
(including what is before the .). I'm looking for some option that 
will let me set the number of digits AFTER the .


Example: I have 102.33556677 and 2.999555666
If I set the number of decimal to 6, I should get: 102.335567 and 2.999556.
And that for all numbers that will be in/output from R (read.table, 
write.table, statistic tests, etc)


Or is it that I didn't understand everything about formatC() and sprintf()?

You didn't:

> formatC(x, digits=6, format="f")
[1] "102.335567" "2.999556"

> sprintf("%12.6f", x)
[1] "  102.335567" "    2.999556"

 -Peter Ehlers




Thanks again
Ivan

On 1/28/2010 15:12, Peter Ehlers wrote:

?formatC
?sprintf

Ivan Calandra wrote:

Hi everybody,

I'm trying to set the number of decimals (i.e. the number of digits 
after the .). I looked into options but I can only set the total 
number of digits, with options(digits=6). But since I have different 
variables with different order of magnitude, I would like that 
they're all displayed with the same number of decimals.
I searched for it and found the format() function, with nsmall=6, but 
it is for a given vector. I would like to set it for the whole 
session, as with options.


Can anyone help me?
Thanks in advance
Ivan








--
Peter Ehlers
University of Calgary



Re: [R] plotting additive ns components

2010-01-28 Thread Thomas Lumley

On Wed, 27 Jan 2010, David Winsemius wrote:



On Jan 27, 2010, at 9:09 PM, GlenB wrote:




I have an additive model of the following form :

zmdlfit - lm(z~ns(x,df=6)+ns(y,df=6))

I can get the fitted values and plot them against z easily enough, but I
also want to both obtain and plot the two additive components (the 
estimates

of the two additive terms on the RHS)



?termplot.

 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle
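
A sketch of the termplot() pointer in action, with made-up data:

```r
library(splines)

set.seed(42)
x <- runif(200); y <- runif(200)
z <- sin(2 * pi * x) + y^2 + rnorm(200, sd = 0.1)

zmdlfit <- lm(z ~ ns(x, df = 6) + ns(y, df = 6))

# One panel per additive term
op <- par(mfrow = c(1, 2))
termplot(zmdlfit, partial.resid = TRUE)
par(op)

# The fitted components themselves, as a two-column matrix
comps <- predict(zmdlfit, type = "terms")
head(comps)
```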



Re: [R] number of decimal

2010-01-28 Thread Marc Schwartz
Ivan,

The default behavior for print()ing objects to the console in an R session is 
via the use of the print.* methods. For real numerics, print.default() is used 
and the format is based upon the number of significant digits, not the number 
of decimal places. There is also an interaction with par(scipen), which 
influences when scientific notation is used. See ?print.default for more 
information on defaults and behavior, taking note of the 'digits' argument, 
which is influenced by options(digits).

Importantly, you need to differentiate between how R stores numeric real values 
and how it displays or prints them. Internally, R stores real numbers using a 
double precision data type by default.

The internal storage is not truncated by default and is stored to full 
precision for doubles, within binary representation limits. You can of course 
modify the values using functions such as round() or trunc(), etc. See 
?round for more information.

For display, Peter has already pointed you to sprintf() and related functions, 
which allow you to format output for pretty printing to things like column 
aligned tables and such. Those do not however, affect the default output to the 
R console.
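
The storage-versus-display distinction in one short example (values from Ivan's
post):

```r
x <- c(102.33556677, 2.999555666)

options(digits = 6)
x                       # display only: 6 *significant* digits
round(x, 6)             # changes the stored values to 6 decimals
formatC(x, digits = 6, format = "f")  # character output, fixed decimals
sprintf("%.6f", x)      # same idea via C-style formatting
```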

HTH,

Marc Schwartz


On Jan 28, 2010, at 9:21 AM, Ivan Calandra wrote:

 It looks to me that it does more or less the same as format().
 
 Maybe I didn't explain myself correctly then. I would like to set the number 
 of decimal by default, for the whole R session, like I do with 
 options(digits=6). Except that digits sets up the number of digits (including 
 what is before the .). I'm looking for some option that will let me set the 
 number of digits AFTER the .
 
 Example: I have 102.33556677 and 2.999555666
 If I set the number of decimal to 6, I should get: 102.335567 and 2.999556.
 And that for all numbers that will be in/output from R (read.table, 
 write.table, statistic tests, etc)
 
 Or is it that I didn't understand everything about formatC() and sprintf()?
 
 Thanks again
 Ivan
 
 On 1/28/2010 15:12, Peter Ehlers wrote:
 ?formatC
 ?sprintf
 
 Ivan Calandra wrote:
 Hi everybody,
 
 I'm trying to set the number of decimals (i.e. the number of digits after 
 the .). I looked into options but I can only set the total number of 
 digits, with options(digits=6). But since I have different variables with 
 different order of magnitude, I would like that they're all displayed with 
 the same number of decimals.
 I searched for it and found the format() function, with nsmall=6, but it is 
 for a given vector. I would like to set it for the whole session, as with 
 options.
 
 Can anyone help me?
 Thanks in advance
 Ivan



Re: [R] large integers in R

2010-01-28 Thread Thomas Lumley

On Thu, 28 Jan 2010, Benilton Carvalho wrote:


Hi Duncan,

On Tue, Jan 26, 2010 at 9:09 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 26/01/2010 3:25 PM, Blanford, Glenn wrote:


Has there been any update on R's handling large integers greater than 10^9
(between 10^9 and 4x10^9) ?

as.integer() in R 2.9.2 lists this as a restriction but doesnt list the
actual limit or cause, nor if anyone was looking at fixing it.


Integers in R are 4 byte signed integers, so the upper limit is 2^31-1.
 That's not likely to change soon.


But in the hypothetical scenario that this was to change soon and we
were to have 64bit integer type (say, when under a 64 bit OS),
wouldn't this allow us to have objects whose length exceeded the
2^31-1 limit?



The other possibility is that an additional longer type capable of holding 
vector lengths would be included.  In addition to the issues that Duncan 
mentioned, having the integer type be 64-bit means that it wouldn't match the 
Fortran default INTEGER type or the C int on most platforms, which are 32-bit.  
Calling C code would become more difficult.

 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] number of decimal

2010-01-28 Thread Ivan Calandra

First things first: thanks for your help!

I see where the confusion is. With formatC and sprintf, I have to store 
the numbers I want to change into x.


I would like a way that doesn't require applying a function to specific 
numbers, because that way I can shorten numbers, but it won't give me more 
decimals for a test result, for example.
What I mean here is that if I have an F-value = 1.225, formatC won't give 
me the next 3 decimals, it will just add zeros.
I need that because for some of my variables the samples differ only at 
the 6th decimal (0.05 vs 0.06), and for other ones the order of 
magnitude is much higher (120.120225 vs 210.665331). So 
options(digits=6) cannot do the job as I would like. To make myself even 
clearer, notice that in my example all numbers have 6 decimals, but a 
different number of digits.


I hope I'm not bothering you with this question, but I believe that the 
functions you advised will not do what I need.
I really need something that will set the number of decimals by 
default, before the numbers are created by any function.
Does such an option even exist in R? Or is it that it doesn't make sense 
to have different numbers of digits? Would it be better to compare 
0.05 and 210.665? In that case options(digits=6) would be enough.


Regards,
Ivan

On 1/28/2010 16:43, Peter Ehlers wrote:

Ivan Calandra wrote:

It looks to me that it does more or less the same as format().

Maybe I didn't explain myself correctly then. I would like to set the 
number of decimal by default, for the whole R session, like I do with 
options(digits=6). Except that digits sets up the number of digits 
(including what is before the .). I'm looking for some option that 
will let me set the number of digits AFTER the .


Example: I have 102.33556677 and 2.999555666
If I set the number of decimal to 6, I should get: 102.335567 and 
2.999556.
And that for all numbers that will be in/output from R (read.table, 
write.table, statistic tests, etc)


Or is it that I didn't understand everything about formatC() and 
sprintf()?

You didn't:

> formatC(x, digits=6, format="f")
[1] "102.335567" "2.999556"

> sprintf("%12.6f", x)
[1] "  102.335567" "    2.999556"

 -Peter Ehlers




Thanks again
Ivan

On 1/28/2010 15:12, Peter Ehlers wrote:

?formatC
?sprintf

Ivan Calandra wrote:

Hi everybody,

I'm trying to set the number of decimals (i.e. the number of digits 
after the .). I looked into options but I can only set the total 
number of digits, with options(digits=6). But since I have 
different variables with different order of magnitude, I would like 
that they're all displayed with the same number of decimals.
I searched for it and found the format() function, with nsmall=6, 
but it is for a given vector. I would like to set it for the whole 
session, as with options.


Can anyone help me?
Thanks in advance
Ivan












Re: [R] number of decimal

2010-01-28 Thread Peter Ehlers

Looks like I didn't read your post carefully enough.
If you want some sort of global option to set the
display of numbers from any operation performed by R,
then that's not likely to be possible without
capturing all output and formatting it yourself.
As the saying goes: 'good luck with that'.

Note that options(digits=..) won't give you the
requested number of digits in all parts of, say,
print(t.test(x,y)).

 -Peter Ehlers

Peter Ehlers wrote:

Ivan Calandra wrote:

It looks to me that it does more or less the same as format().

Maybe I didn't explain myself correctly then. I would like to set the 
number of decimal by default, for the whole R session, like I do with 
options(digits=6). Except that digits sets up the number of digits 
(including what is before the .). I'm looking for some option that 
will let me set the number of digits AFTER the .


Example: I have 102.33556677 and 2.999555666
If I set the number of decimal to 6, I should get: 102.335567 and 
2.999556.
And that for all numbers that will be in/output from R (read.table, 
write.table, statistic tests, etc)


Or is it that I didn't understand everything about formatC() and 
sprintf()?

You didn't:

formatC(x, digits=6, format="f")
[1] "102.335567" "2.999556"

sprintf("%12.6f", x)
[1] "  102.335567" "    2.999556"

 -Peter Ehlers

 


Thanks again
Ivan

On 1/28/2010 15:12, Peter Ehlers wrote:

?formatC
?sprintf

Ivan Calandra wrote:

Hi everybody,

I'm trying to set the number of decimals (i.e. the number of digits 
after the .). I looked into options but I can only set the total 
number of digits, with options(digits=6). But since I have different 
variables with different order of magnitude, I would like that 
they're all displayed with the same number of decimals.
I searched for it and found the format() function, with nsmall=6, 
but it is for a given vector. I would like to set it for the whole 
session, as with options.


Can anyone help me?
Thanks in advance
Ivan

--
Peter Ehlers
University of Calgary

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] number of decimal

2010-01-28 Thread David Winsemius


On Jan 28, 2010, at 10:55 AM, Marc Schwartz wrote:


Ivan,

The default behavior for print()ing objects to the console in an R  
session is via the use of the print.* methods. For real numerics,  
print.default() is used and the format is based upon the number of  
significant digits, not the number of decimal places. There is also  
an interaction with par(scipen), which influences when scientific  
notation is used. See ?print.default for more information on  
defaults and behavior, taking note of the 'digits' argument, which  
is influenced by options(digits).


Importantly, you need to differentiate between how R stores numeric  
real values and how it displays or prints them. Internally, R stores  
real numbers using a double precision data type by default.


The internal storage is not truncated by default and is stored to  
full precision for doubles, within binary representation limits. You  
can of course modify the values using functions such as round() or  
trunc(), etc. See ?round for more information.


For display, Peter has already pointed you to sprintf() and related  
functions, which allow you to format output for pretty printing to  
things like column aligned tables and such. Those do not however,  
affect the default output to the R console.


If one alters print.default, one can get different behavior, for  
instance:


print.default <- function (x, digits = NULL, quote = TRUE, na.print =  
NULL, print.gap = NULL,
    right = FALSE, max = NULL, useSource = TRUE, ...)
{if (is.numeric(x)) {x <- as.numeric(sprintf("%7.3f", x))}
noOpt <- missing(digits) && missing(quote) && missing(na.print) &&
missing(print.gap) && missing(right) && missing(max) &&
missing(useSource) && length(list(...)) == 0L
.Internal(print.default(x, digits, quote, na.print, print.gap,
right, max, useSource, noOpt))
}

This will have the requested effect for numeric vectors, but does not  
seem to be altering the behavior of print.data.frame().


> print(ac2)
   score pt times trt
1  28.825139  1 0   1
2  97.458521  1 3   1
3  26.217289  1 6   1
4  80.636507  2 0   1
5  99.729364  2 3   1
6  85.812312  2 6   1
7   2.515870  3 0   1
8   3.893545  3 3   1
9  55.666848  3 6   1
10 21.966027  4 0   1
> print(ac2$score)
 [1] 28.825 97.459 26.217 80.637 99.729 85.812  2.516  3.894 55.667  
21.966





HTH,

Marc Schwartz
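Since the override above does not reach print.data.frame(), one hedged alternative for fixed decimals in a data frame is to format the column for display only (a sketch with a hypothetical data frame; the stored values keep full double precision):

```r
## Sketch: fixed 3-decimal *display* of a data frame column via sprintf();
## the result is a character column, so use it for printing only.
df <- data.frame(score = c(28.825139, 97.458521, 2.515870))  # hypothetical
df_print <- df
df_print$score <- sprintf("%.3f", df_print$score)
print(df_print)
```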


On Jan 28, 2010, at 9:21 AM, Ivan Calandra wrote:


It looks to me that it does more or less the same as format().

Maybe I didn't explain myself correctly then. I would like to set  
the number of decimal by default, for the whole R session, like I  
do with options(digits=6). Except that digits sets up the number of  
digits (including what is before the .). I'm looking for some  
option that will let me set the number of digits AFTER the .


Example: I have 102.33556677 and 2.999555666
If I set the number of decimal to 6, I should get: 102.335567 and  
2.999556.
And that for all numbers that will be in/output from R (read.table,  
write.table, statistic tests, etc)


Or is it that I didn't understand everything about formatC() and  
sprintf()?


Thanks again
Ivan

On 1/28/2010 15:12, Peter Ehlers wrote:

?formatC
?sprintf

Ivan Calandra wrote:

Hi everybody,

I'm trying to set the number of decimals (i.e. the number of  
digits after the .). I looked into options but I can only set  
the total number of digits, with options(digits=6). But since I  
have different variables with different order of magnitude, I  
would like that they're all displayed with the same number of  
decimals.
I searched for it and found the format() function, with nsmall=6,  
but it is for a given vector. I would like to set it for the  
whole session, as with options.


Can anyone help me?
Thanks in advance
Ivan




David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] number of decimal

2010-01-28 Thread Ivan Calandra
I guess the easiest solution for me would therefore be to set 
options(digits) to a high number, and then round down if I need to!

Thank you both for your input!
Ivan


On 1/28/2010 17:02, Peter Ehlers wrote:

Looks like I didn't read your post carefully enough.
If you want some sort of global option to set the
display of numbers from any operation performed by R
then that's not likely to be possible without
capturing all output and formatting it yourself.
As the saying goes 'good luck with that'.

Note that options(digits=..) won't give you the
requested number of digits in all parts of, say,
print(t.test(x,y)).

 -Peter Ehlers

Peter Ehlers wrote:

Ivan Calandra wrote:

It looks to me that it does more or less the same as format().

Maybe I didn't explain myself correctly then. I would like to set 
the number of decimal by default, for the whole R session, like I do 
with options(digits=6). Except that digits sets up the number of 
digits (including what is before the .). I'm looking for some 
option that will let me set the number of digits AFTER the .


Example: I have 102.33556677 and 2.999555666
If I set the number of decimal to 6, I should get: 102.335567 and 
2.999556.
And that for all numbers that will be in/output from R (read.table, 
write.table, statistic tests, etc)


Or is it that I didn't understand everything about formatC() and 
sprintf()?

You didn't:

formatC(x, digits=6, format="f")
[1] "102.335567" "2.999556"

sprintf("%12.6f", x)
[1] "  102.335567" "    2.999556"

 -Peter Ehlers




Thanks again
Ivan

On 1/28/2010 15:12, Peter Ehlers wrote:

?formatC
?sprintf

Ivan Calandra wrote:

Hi everybody,

I'm trying to set the number of decimals (i.e. the number of 
digits after the .). I looked into options but I can only set 
the total number of digits, with options(digits=6). But since I 
have different variables with different order of magnitude, I 
would like that they're all displayed with the same number of 
decimals.
I searched for it and found the format() function, with nsmall=6, 
but it is for a given vector. I would like to set it for the whole 
session, as with options.


Can anyone help me?
Thanks in advance
Ivan


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RMySQL install

2010-01-28 Thread Robert Schneider

Hi everyone,

I am trying to install the RMySQL package under windows xp. I've got the MySQL 
installed on the computer (MySQL server 5.1). I went through the steps 
presented on the webpage http://biostat.mc.vanderbilt.edu/wiki/Main/RMySQL and 
googled around and still can't find the answer. With the command

readRegistry("SOFTWARE\\MySQL AB", hive="HLM", maxdepth=2)

I get the following info:

-
$`MySQL Connector/ODBC 5.1`
$`MySQL Connector/ODBC 5.1`$Version
[1] 5.1.6

$`MySQL Server 5.1`
$`MySQL Server 5.1`$DataLocation
[1] C:\\Documents and Settings\\All Users\\Application Data\\MySQL\\MySQL 
Server 5.1\\

$`MySQL Server 5.1`$FoundExistingDataDir
[1] 0

$`MySQL Server 5.1`$Location
[1] C:\\Program Files\\MySQL\\MySQL Server 5.1\\

$`MySQL Server 5.1`$Version
[1] 5.1.42

$`MySQL Workbench 5.2 OSS`
$`MySQL Workbench 5.2 OSS`$Location
[1] C:\\Program Files\\MySQL\\MySQL Workbench 5.2 OSS\\

$`MySQL Workbench 5.2 OSS`$Version
[1] 5.2.14
-

Everything seems to be ok. However, when loading the package, I get (the 
original output was in French; translated here):

-
Loading required package: DBI
Error in if (utils::file_test("-d", MySQLhome)) break : 
  argument is of length zero
In addition: Warning messages:
1: package 'RMySQL' was built under R version 2.10.1 
2: package 'DBI' was built under R version 2.10.1 
Error: .onLoad failed in 'loadNamespace' for 'RMySQL'
Error: package or namespace load failed for 'RMySQL'
-

Do you need MySQL Server 5.0, or will it work with 5.1 also ?

Thanks for any help.

Rob
  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] number of decimal

2010-01-28 Thread Peter Ehlers

Ivan,

Now I'm no longer sure of just what you want. Are you concerned about
the *internal* handling of numbers by R or just about the *printing*
of numbers? As Marc has pointed out, internally R will use the full
precision that your input allows.

Perhaps you're using the F-value from the output of a
procedure like aov() as input to further analysis. If so,
don't use the printed value; pull the value out of the
object with something like

 fm <- aov(y ~ x)
 Fval <- summary(fm)[[1]][1,4]

But maybe this is not at all what you're after.

 -Peter Ehlers

Ivan Calandra wrote:

First things first: thanks for your help!

I see where the confusion is. With formatC and sprintf, I have to store 
the numbers I want to change into x.


I would like a way without applying a function on specific numbers 
because I can shorten the numbers that way, but it won't give me more 
decimals for a test for example.
What I mean here is that if I have an F-value = 1.225, formatC won't give 
me the next 3 decimals, it will just add zeros.
I need that because for some of my variables, the sample differ only at 
the 6th decimal (0.05 vs 0.06), and for other ones the order of 
magnitude is much higher (120.120225 vs 210.665331). So 
options(digits=6) cannot do the job as I would like. To make myself even 
clearer, notice that in my example, all numbers have 6 decimals, but a 
different number of digits.


I hope I'm not bothering you with this question, but I believe that the 
functions you advised me will not do what I need.
I really need something that will set up the number of decimals by 
default, before the numbers are created by any function.
Does such an option even exist in R? Or is it that it doesn't make sense 
to have different numbers of digits? Would it be better to compare 
0.05 and 210.665? Therefore options(digits=6) would be enough.


Regards,
Ivan

On 1/28/2010 16:43, Peter Ehlers wrote:

Ivan Calandra wrote:

It looks to me that it does more or less the same as format().

Maybe I didn't explain myself correctly then. I would like to set the 
number of decimal by default, for the whole R session, like I do with 
options(digits=6). Except that digits sets up the number of digits 
(including what is before the .). I'm looking for some option that 
will let me set the number of digits AFTER the .


Example: I have 102.33556677 and 2.999555666
If I set the number of decimal to 6, I should get: 102.335567 and 
2.999556.
And that for all numbers that will be in/output from R (read.table, 
write.table, statistic tests, etc)


Or is it that I didn't understand everything about formatC() and 
sprintf()?

You didn't:

formatC(x, digits=6, format="f")
[1] "102.335567" "2.999556"

sprintf("%12.6f", x)
[1] "  102.335567" "    2.999556"

 -Peter Ehlers




Thanks again
Ivan

On 1/28/2010 15:12, Peter Ehlers wrote:

?formatC
?sprintf

Ivan Calandra wrote:

Hi everybody,

I'm trying to set the number of decimals (i.e. the number of digits 
after the .). I looked into options but I can only set the total 
number of digits, with options(digits=6). But since I have 
different variables with different order of magnitude, I would like 
that they're all displayed with the same number of decimals.
I searched for it and found the format() function, with nsmall=6, 
but it is for a given vector. I would like to set it for the whole 
session, as with options.


Can anyone help me?
Thanks in advance
Ivan



--
Peter Ehlers
University of Calgary

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Recoding Variables in R

2010-01-28 Thread Mathew, Abraham T
 

VAR 980490 

Some people have suggested placing new limits on foreign

imports in order to protect American jobs. Others say

that such limits would raise consumer prices and hurt

American exports.

Do you FAVOR or OPPOSE placing new limits on imports, or

haven't you thought much about this?

1. Favor

5. Oppose

8. DK

9. NA; RF

0. Haven't thought much about this

 

 

I am trying to recode the data for the following public opinion question from 
the ANES. I would like to throw out 8 and 9. Furthermore, I would like to 
reorder the responses so that:

1. Oppose (originally 5)

2. Haven't though much about this (originally 0)

3. favor (originally 1)

 

I tried the following, which did not work:

library(car)
data96$V961327 <- recode(data96$V961327, "c(1)=2; c(2)=3; c(3)=1")

 

 

I also tried the following, which also did not work:

new <- as.numeric(data96$V961327)
new
data96$V961327 <- recode(new, "c(5)=1; c(0)=2; c(1)=3")

 

 

 

Help,

Abraham M

 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data.frame manipulation

2010-01-28 Thread AC Del Re
Thank you, Dennis and Petr.

One more question:  when aggregating to one es per id, how would I go about
keeping the other variables in the data.frame (e.g., keeping the value for
the first row of the other variables, such as mod2) e.g.:

# Dennis provided this example (notice how mod2 is removed from the output):

> with(x, aggregate(list(es = es), by = list(id = id, mod1 = mod1), mean))
  id mod1   es
1  3    1 0.20
2  1    2 0.30
3  2    4 0.15

# How can I get this output (taking the first row of the other variable in
the data.frame):

id  es   mod1  mod2
1   .30  2     wai
2   .15  4     other
3   .20  1     itas


Thank you,

AC
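One possible approach to AC's follow-up (a sketch using the example data from this thread, not a reply that appeared in it): aggregate es by id, then merge back the first row of the remaining columns.

```r
## Sketch: mean es per id, keeping the first row of mod1/mod2 for each id
id   <- c(1, 2, 2, 3, 3, 3)
es   <- c(.3, .1, .3, .1, .2, .3)
mod1 <- c(2, 4, 4, 1, 1, 1)
mod2 <- c("wai", "other", "calpas", "wai", "itas", "other")
x    <- data.frame(id, es, mod1, mod2)

agg   <- aggregate(list(es = x$es), by = list(id = x$id), mean)
first <- x[!duplicated(x$id), c("id", "mod1", "mod2")]  # first row per id
merge(agg, first, by = "id")
```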


On Thu, Jan 28, 2010 at 1:29 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

 HI

 r-help-boun...@r-project.org napsal dne 28.01.2010 04:35:29:

   Hi All,
  
   I'm conducting a meta-analysis and have taken a data.frame with
 multiple
   rows per
   study (for each effect size) and performed a weighted average of
 effect
   size for
   each study. This results in a reduced # of rows. I am particularly
   interested in
   simply reducing the additional variables in the data.frame to the
 first row
   of the
   corresponding id variable. For example:
  
   id <- c(1,2,2,3,3,3)
   es <- c(.3,.1,.3,.1,.2,.3)
   mod1 <- c(2,4,4,1,1,1)
   mod2 <- c("wai","other","calpas","wai","itas","other")
   data <- as.data.frame(cbind(id,es,mod1,mod2))

 Do not use cbind. Its output is a matrix and in this case character
 matrix. Resulting data frame will consist from factors as you can check by


 str(data)

 data <- data.frame(id=id, es=es, mod1=mod1, mod2=mod2)


  
   data
  
     id  es mod1   mod2
   1  1 0.3    2    wai
   2  2 0.1    4  other
   3  2 0.3    4 calpas
   4  3 0.1    1    wai
   5  3 0.2    1   itas
   6  3 0.3    1  other
  
   # I would like to reduce the entire data.frame like this:

 E.g. aggregate

aggregate(data[, -(3:4)], data[,3:4], mean)
   mod1   mod2 id  es
 1    4 calpas  2 0.3
 2    1   itas  3 0.2
 3    1  other  3 0.3
 4    4  other  2 0.1
 5    1    wai  3 0.1
 6    2    wai  1 0.3

 doBy or tapply or ddply from plyr library or 

 Regards
 Petr

  
   id  es   mod1  mod2
  
   1  .30  2    wai
   2  .15  4    other
   3  .20  1    itas
  
   # If possible, I would also like the option of this (collapsing on id
 and
   mod2):
  
   id  es   mod1  mod2
   1  .30   2  wai
   2  0.1   4  other
   2  0.2   4  calpas
   3  0.1   1  itas
   3  0.25  1  wai
  
   Any help is much appreciated!
  
   AC Del Re
  
 
 [[alternative HTML version deleted]]
 



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Setting base level for contrasts with lme

2010-01-28 Thread Marcin Kozak
Hi all,

Note:
lm(Yield ~ Block + C(Variety, base = 2), Alfalfa)

equals
i <- 2; lm(Yield ~ Block + C(Variety, base = i), Alfalfa)

However,
lme(Yield ~ C(Variety, base = 2), Alfalfa, random=~1|Block)

which is fine, does not equal
i <- 2; lme(Yield ~ C(Variety, base = i), Alfalfa, random=~1|Block)
after which I get the message
Error in model.frame.default(formula = ~Yield + Variety + i + Block,
data = list( :  variable lengths differ (found for 'i')

Is everything fine with that?

Regards,
Marcin
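The error arises because lme() collects every symbol in the formula into its model frame, so `i` is looked up as if it were a data variable. One hedged workaround (a sketch, not an answer that appeared in the thread; assumes the nlme package and its Alfalfa data set) is to substitute the value of `i` into the formula text before fitting:

```r
## Sketch: build the formula with the value of i already substituted in,
## so lme() never sees the symbol 'i' as a model variable.
library(nlme)                     # Alfalfa ships with nlme
i <- 2
form <- as.formula(paste0("Yield ~ C(Variety, base = ", i, ")"))
lme(form, data = Alfalfa, random = ~ 1 | Block)
```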

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tapply on multiple groups

2010-01-28 Thread David Winsemius


On Jan 28, 2010, at 10:26 AM, GL wrote:



Can you make tapply break down groups similar to bwplot or such?  
Example:


Data frame has one measure (Days) and two Dimensions (MM and  
Place). All

have the same length.


length(dbs.final$Days)

[1] 3306

length(dbs.final$MM)

[1] 3306

length(dbs.final$Place)

[1] 3306

Doing the following makes a nice table for one dimension and one  
measure:


   do.call(rbind,tapply(dbs.final$Days,dbs.final$Place, summary))

But, what I really need to do is break it down on two dimensions and  
one

measures - effectively equivalent to the following bwplot call:

   bwplot(Days ~ MM | Place, data=dbs.final)

Is there an equivalent to the | operation in tapply?


Please reread the help page for tapply.

Perhaps?:

tapply(dbs.final$Days, list(dbs.final$MM, dbs.final$Place), summary)

-- David
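A self-contained sketch of that call with made-up data (the `dbs` frame below is hypothetical): tapply() over a list of two grouping factors is the analogue of bwplot's `Days ~ MM | Place`, giving one summary per MM x Place cell.

```r
## Sketch: two-dimensional grouping with tapply(), hypothetical data
dbs <- data.frame(Days  = c(5, 7, 3, 9, 2, 8),
                  MM    = c("a", "a", "b", "b", "a", "b"),
                  Place = c("X", "Y", "X", "Y", "X", "Y"))
tapply(dbs$Days, list(dbs$MM, dbs$Place), summary)
```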



--
View this message in context: 
http://n4.nabble.com/tapply-on-multiple-groups-tp1380593p1380593.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Recoding Variables in R

2010-01-28 Thread John Fox
Dear Abraham,

If I follow correctly what you want to do, the following should do it:

> f <- factor(c(1, 1, 5, 5, 8, 8, 9, 9, 0, 0))
> f
 [1] 1 1 5 5 8 8 9 9 0 0
Levels: 0 1 5 8 9
> recode(f, "'1'=3; '5'=1; '0'=2; else=NA")
 [1] 3    3    1    1    <NA> <NA> <NA> <NA> 2    2   
Levels: 1 2 3

I think that your problem was that you didn't distinguish correctly between
factor levels and their numeric encoding; factor levels should be quoted in
recode().

I hope this helps,
 John


John Fox
Senator William McMaster 
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
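A base-R alternative to car::recode (a sketch, not part of John's reply): map the kept levels to their new codes with a named vector and let everything else become NA.

```r
## Sketch: relabel factor levels without the car package
f   <- factor(c(1, 1, 5, 5, 8, 8, 9, 9, 0, 0))
map <- c("1" = 3, "5" = 1, "0" = 2)          # old level -> new code
g   <- factor(map[as.character(f)], levels = 1:3)  # 8 and 9 become NA
g
```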


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On
 Behalf Of Mathew, Abraham T
 Sent: January-28-10 10:15 AM
 To: r-help@r-project.org
 Subject: [R] Recoding Variables in R
 
 
 
 VAR 980490
 
 Some people have suggested placing new limits on foreign
 
 imports in order to protect American jobs. Others say
 
 that such limits would raise consumer prices and hurt
 
 American exports.
 
 Do you FAVOR or OPPOSE placing new limits on imports, or
 
 haven't you thought much about this?
 
 1. Favor
 
 5. Oppose
 
 8. DK
 
 9. NA; RF
 
 0. Haven't thought much about this
 
 
 
 
 
 I am trying to recode the data for the following public opinion question
from
 the ANES. I would like to throw out 8 and 9. Furthermore, I would like to
 reorder the responses so that:
 
 1. Oppose (originally 5)
 
 2. Haven't though much about this (originally 0)
 
 3. favor (originally 1)
 
 
 
 I tried the following, which did not work:
 
 library(car)
 data96$V961327 <- recode(data96$V961327, "c(1)=2; c(2)=3; c(3)=1")
 
 
 
 
 
 I also tried the following, which also did not work:
 
 new <- as.numeric(data96$V961327)
 new
 data96$V961327 <- recode(new, "c(5)=1; c(0)=2; c(1)=3")
 
 
 
 
 
 
 
 Help,
 
 Abraham M
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Gabor Grothendieck
I think one would only be concerned about such internals if one were
primarily interested in performance; otherwise, one would be more
interested in ease of specification and part of that ease is having it
independent of implementation and separating implementation from
specification activities.  An example of separation of specification
and implementation is that by simply specifying a disk-based database
rather than an in-memory database SQL can perform queries that take
more space than memory.  The query itself need not be modified.

I think the viewpoint you are discussing is primarily one of
performance whereas the viewpoint I was discussing is primarily ease
of use and that accounts for the difference.

I believe your performance comparison is comparing a sequence of
operations that include building a database, transferring data to it,
performing the operation, reading it back in and destroying the
database to an internal manipulation.  I would expect the internal
manipulation, particular one done primarily in C code as is the case
with data.table, to be faster although some benchmarks of the database
approach found that it compared surprisingly well to straight R code
-- some users of sqldf found that for an 8000 row data frame sqldf
actually ran faster than aggregate and also faster than tapply.  The
News section on the sqldf home page provides links to their
benchmarks.  Thus if R is fast enough then it's likely that the
database approach is fast enough too since it's even faster.

On Thu, Jan 28, 2010 at 8:52 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
 Are you claiming that SQL is that utopia?  SQL is a row store.  It cannot
 give the user the benefits of column store.

 For example, why does SQL take 113 seconds in the example in this thread :
 http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html
 but data.table takes 5 seconds to get the same result ? How come the high
 level language SQL doesn't appear to hide the user from this detail ?

 If you are just describing utopia, then of course I agree.  It would be
 great to have a language which hid us from this.  In the meantime the user
 has choices, and the best choice depends on the task and the real goal.

 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001280428p345f8ff4v5f3a80c13f96d...@mail.gmail.com...
It's only important internally.  Externally it's undesirable that the
user have to get involved in it.  The idea of making software easy to
 write and use is to hide the implementation and focus on the problem.
 That is why we use high level languages, object orientation, etc.

 On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:
 How it represents data internally is very important, depending on the real
 goal :
 http://en.wikipedia.org/wiki/Column-oriented_DBMS


 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com...
 How it represents data internally should not be important as long as
 you can do what you want. SQL is declarative so you just specify what
 you want rather than how to get it and invisibly to the user it
 automatically draws up a query plan and then uses that plan to get the
 result.

 On Wed, Jan 27, 2010 at 12:48 PM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:

sqldf("select * from BOD order by Time desc limit 3")
 Exactly. SQL requires use of order by. It knows the order, but it isn't
ordered. That's not good, but might be fine, depending on what the real
 goal
 is.


 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001270629w4795da89vb7d77af6e4e8b...@mail.gmail.com...
 On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:
 How many columns, and of what type are the columns ? As Olga asked too,
 it
 would be useful to know more about what you're really trying to do.

 3.5m rows is not actually that many rows, even for 32bit R. It depends
 on
 the columns and what you want to do with those columns.

 At the risk of suggesting something before we know the full facts, one
 possibility is to load the data from flat file into data.table. Use
 setkey()
 to set your keys. Use tables() to summarise your various tables. Then do
 your joins etc all-in-R. data.table has fast ways to do those sorts of
 joins (but we need more info about your task).

 Alternatively, you could check out the sqldf website. There is an
 sqlread.csv (or similar name) which can read your files directly into
 SQL

 read.csv.sql

 instead of going via R. Gabor has some nice examples there about that
 and
 its faster.

 You use some buzzwords which makes me think that SQL may not be
 appropriate
 for your task though. Can't say for sure (because we don't have enough
 information) but its possible you are struggling because SQL has no row
 ordering concept built in. That might be why you've created an increment

 In the SQLite database it automatically assigns a self 

Re: [R] tapply on multiple groups

2010-01-28 Thread Gigi Lipori
Thanks. My mistake was that I used c(dbs.final$Days,dbs.final$Place) instead of 
list(... when I tried to follow that part of the documentation. 

 David Winsemius dwinsem...@comcast.net 1/28/2010 11:49 AM 

On Jan 28, 2010, at 10:26 AM, GL wrote:


 Can you make tapply break down groups similar to bwplot or such?  
 Example:

 Data frame has one measure (Days) and two Dimensions (MM and  
 Place). All
 have the same length.

 length(dbs.final$Days)
 [1] 3306
 length(dbs.final$MM)
 [1] 3306
 length(dbs.final$Place)
 [1] 3306

 Doing the following makes a nice table for one dimension and one  
 measure:

do.call(rbind,tapply(dbs.final$Days,dbs.final$Place, summary))

 But, what I really need to do is break it down on two dimensions and  
 one
 measures - effectively equivalent to the following bwplot call:

bwplot(Days ~ MM | Place, data=dbs.final)

 Is there an equivalent to the | operation in tapply?

Please reread the help page for tapply.

Perhaps?:

tapply(dbs.final$Days, list(dbs.final$MM, dbs.final$Place), summary)

-- David


 -- 
 View this message in context: 
 http://n4.nabble.com/tapply-on-multiple-groups-tp1380593p1380593.html 
 Sent from the R help mailing list archive at Nabble.com.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] weighted least squares vs linear regression

2010-01-28 Thread DispersionMap


I need to find out the difference between the way R calculates weighted
regression and standard regression.


I want to plot a 95% confidence interval around an estimte i got from least
squares regression.

I can't find the documentation for this.

I've looked in:
?stats
?lm
?predict.lm
?weights
?residuals.lm

Can anyone shed light?

thanks

Chris.  
-- 
View this message in context: 
http://n4.nabble.com/weighted-least-squares-vs-linear-regression-tp1387957p1387957.html
Sent from the R help mailing list archive at Nabble.com.
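A minimal sketch of the two fits and the confidence band (made-up data; the weights here are assumed known inverse variances, which is one common choice but not the only one): the only difference between weighted and ordinary least squares in lm() is the `weights` argument, and predict() gives a pointwise 95% CI either way.

```r
## Sketch: weighted vs ordinary least squares with a 95% confidence interval
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = x / 10)   # noise grows with x
w <- (10 / x)^2                             # assumed inverse-variance weights

fit <- lm(y ~ x, weights = w)               # drop 'weights' for ordinary LS
predict(fit, interval = "confidence", level = 0.95)  # fit, lwr, upr columns
```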

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] number of decimal

2010-01-28 Thread Marc Schwartz

On Jan 28, 2010, at 10:04 AM, David Winsemius wrote:

 
 On Jan 28, 2010, at 10:55 AM, Marc Schwartz wrote:
 
 Ivan,
 
 The default behavior for print()ing objects to the console in an R session 
 is via the use of the print.* methods. For real numerics, print.default() is 
 used and the format is based upon the number of significant digits, not the 
 number of decimal places. There is also an interaction with par(scipen), 
 which influences when scientific notation is used. See ?print.default for 
 more information on defaults and behavior, taking note of the 'digits' 
 argument, which is influenced by options(digits).
 
 Importantly, you need to differentiate between how R stores numeric real 
 values and how it displays or prints them. Internally, R stores real numbers 
 using a double precision data type by default.
 
 The internal storage is not truncated by default and is stored to full 
 precision for doubles, within binary representation limits. You can of 
 course modify the values using functions such as round() or trunc(), etc. 
 See ?round for more information.
 
 For display, Peter has already pointed you to sprintf() and related 
 functions, which allow you to format output for pretty printing to things 
 like column aligned tables and such. Those do not however, affect the 
 default output to the R console.
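
The distinction drawn here, significant digits for default display versus fixed decimal places via sprintf() versus actually changing the stored value with round(), can be seen in a few lines (the value 28.825139 is just an illustration):

```r
x <- 28.825139

print(x)            # honours options(digits): significant digits, display only
sprintf("%.3f", x)  # fixed decimal places, returns a character string
round(x, 3)         # changes the value itself, not just its display
```

Only the last of these alters the number R stores; the first two affect presentation only.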
 
 If one alters print.default, one can get different behavior, for instance:
 
 print.default <- function (x, digits = NULL, quote = TRUE, na.print = NULL,
                            print.gap = NULL, right = FALSE, max = NULL,
                            useSource = TRUE, ...)
 {
     if (is.numeric(x)) {x <- as.numeric(sprintf("%7.3f", x))}
     noOpt <- missing(digits) && missing(quote) && missing(na.print) &&
         missing(print.gap) && missing(right) && missing(max) &&
         missing(useSource) && length(list(...)) == 0L
     .Internal(print.default(x, digits, quote, na.print, print.gap,
                             right, max, useSource, noOpt))
 }
 
 This will have the requested effect for numeric vectors, but does not seem to 
 be altering the behavior of print.data.frame().
 
  print(ac2)
   score pt times trt
 1  28.825139  1 0   1
 2  97.458521  1 3   1
 3  26.217289  1 6   1
 4  80.636507  2 0   1
 5  99.729364  2 3   1
 6  85.812312  2 6   1
 7   2.515870  3 0   1
 8   3.893545  3 3   1
 9  55.666848  3 6   1
 10 21.966027  4 0   1
  print(ac2$score)
 [1] 28.825 97.459 26.217 80.637 99.729 85.812  2.516  3.894 55.667 21.966
 


David,

The issue there is that when printing the vector, you are using print.default() 
directly, so you get the desired result with a numeric vector. 

When you print the data frame, internally print.data.frame() calls 
format.data.frame(), which then internally uses format() on a column-by-column 
basis and there is the rub. format() brings you back to using significant 
digits on numeric vectors and of course returns a character vector. By the time 
the output is actually print()ed to the console, the original data frame has 
been converted to a formatted character matrix and that is what gets printed.

 str(format.data.frame(ac2))
'data.frame':   10 obs. of  4 variables:
 $ score:Class 'AsIs'  chr [1:10] "28.825139" "97.458521" "26.217289" 
"80.636507" ...
 $ pt   :Class 'AsIs'  chr [1:10] "1" "1" "1" "2" ...
 $ times:Class 'AsIs'  chr [1:10] "0" "3" "6" "0" ...
 $ trt  :Class 'AsIs'  chr [1:10] "1" "1" "1" "1" ...


 str(format.data.frame(ac2, digits = 2))
'data.frame':   10 obs. of  4 variables:
 $ score:Class 'AsIs'  chr [1:10] " 28.8" " 97.5" " 26.2" " 80.6" ...
 $ pt   :Class 'AsIs'  chr [1:10] "1" "1" "1" "2" ...
 $ times:Class 'AsIs'  chr [1:10] "0" "3" "6" "0" ...
 $ trt  :Class 'AsIs'  chr [1:10] "1" "1" "1" "1" ...


This is why changing print.default() by itself is not sufficient. Other object 
classes are formatted and printed in varying ways and print methods have been 
defined for them which may not use it directly. 
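
A practical consequence: a data frame can be printed with fixed decimals without touching print.default() by calling format() on it first. A minimal sketch with a toy data frame (standing in for the thread's ac2, which is not reproduced in full here):

```r
# Toy data frame; nsmall forces a minimum number of decimal places
df <- data.frame(score = c(28.825139, 97.458521), pt = 1:2)

# format() returns character columns padded to at least 3 decimals,
# so the subsequent print shows fixed decimal places
print(format(df, digits = 9, nsmall = 3))
```

This sidesteps the print method entirely, at the cost of the printed object being character rather than numeric.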

HTH,

Marc Schwartz



Re: [R] exporting multidimensional matrix from R

2010-01-28 Thread David Winsemius


On Jan 28, 2010, at 10:42 AM, Gopikrishna Deshpande wrote:


Hi,

I have a matrix of size 19x512x20 in R.


No, you don't. Matrices are only 2 dimensional in R. You may have an  
array, however.
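
The distinction can be checked directly; here is a small stand-in for the poster's 19x512x20 object (filled with random values purely for illustration):

```r
# An object with 3 dimensions is an array, not a matrix
arr <- array(rnorm(19 * 512 * 20), dim = c(19, 512, 20))

dim(arr)        # 19 512 20
is.matrix(arr)  # FALSE -- matrices have exactly 2 dimensions
is.array(arr)   # TRUE
```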



I want to export this file into
another format which can be imported into MATLAB.
write.xls or write.table exports only one dimension.
please send a code if possible. I am very new to R and have been  
struggling

with this.


install.packages(pkgs="R.matlab", dependencies=TRUE)
library(R.matlab)
?writeMat
filename <- "~/test.mat"
writeMat(filename, arr=arr)

 readMat(filename)
$arr
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

 [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

, , 3

 [,1] [,2] [,3]
[1,]   19   22   25
[2,]   20   23   26
[3,]   21   24   27


attr(,"header")
attr(,"header")$description
[1] "MATLAB 5.0 MAT-file, Platform: unix, Software: R v2.10.1, Created  
on: Thu Jan 28 12:08:25 2010"

attr(,"header")$version
[1] "5"

attr(,"header")$endian
[1] "little"




Thanks !
Gopi

[[alternative HTML version deleted]]



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] number of decimal

2010-01-28 Thread David Winsemius


On Jan 28, 2010, at 12:08 PM, Marc Schwartz wrote:



On Jan 28, 2010, at 10:04 AM, David Winsemius wrote:



On Jan 28, 2010, at 10:55 AM, Marc Schwartz wrote:


Ivan,

The default behavior for print()ing objects to the console in an R  
session is via the use of the print.* methods. For real numerics,  
print.default() is used and the format is based upon the number of  
significant digits, not the number of decimal places. There is  
also an interaction with par(scipen), which influences when  
scientific notation is used. See ?print.default for more  
information on defaults and behavior, taking note of the 'digits'  
argument, which is influenced by options(digits).


Importantly, you need to differentiate between how R stores  
numeric real values and how it displays or prints them.  
Internally, R stores real numbers using a double precision data  
type by default.


The internal storage is not truncated by default and is stored to  
full precision for doubles, within binary representation limits.  
You can of course modify the values using functions such as  
round() or trunc(), etc. See ?round for more information.


For display, Peter has already pointed you to sprintf() and  
related functions, which allow you to format output for pretty  
printing to things like column aligned tables and such. Those do  
not however, affect the default output to the R console.


If one alters print.default, one can get different behavior, for  
instance:


print.default <- function (x, digits = NULL, quote = TRUE,
                           na.print = NULL, print.gap = NULL,
                           right = FALSE, max = NULL, useSource = TRUE, ...)
{
  if (is.numeric(x)) {x <- as.numeric(sprintf("%7.3f", x))}
  noOpt <- missing(digits) && missing(quote) && missing(na.print) &&
    missing(print.gap) && missing(right) && missing(max) &&
    missing(useSource) && length(list(...)) == 0L
  .Internal(print.default(x, digits, quote, na.print, print.gap,
                          right, max, useSource, noOpt))
}

This will have the requested effect for numeric vectors, but does  
not seem to be altering the behavior of print.data.frame().



print(ac2)

 score pt times trt
1  28.825139  1 0   1
2  97.458521  1 3   1
3  26.217289  1 6   1
4  80.636507  2 0   1
5  99.729364  2 3   1
6  85.812312  2 6   1
7   2.515870  3 0   1
8   3.893545  3 3   1
9  55.666848  3 6   1
10 21.966027  4 0   1

print(ac2$score)
[1] 28.825 97.459 26.217 80.637 99.729 85.812  2.516  3.894 55.667  
21.966





David,

The issue there is that when printing the vector, you are using  
print.default() directly, so you get the desired result with a  
numeric vector.


Thanks, Marc;

I do understand. I had been hoping that there might be a "final common  
pathway", to use a biochemistry analogy, at least for numeric objects,  
but it appears not.


--
David.



When you print the data frame, internally print.data.frame() calls  
format.data.frame(), which then internally uses format() on a column- 
by-column basis and there is the rub. format() brings you back to  
using significant digits on numeric vectors and of course returns a  
character vector. By the time the output is actually print()ed to  
the console, the original data frame has been converted to a  
formatted character matrix and that is what gets printed.



str(format.data.frame(ac2))

'data.frame':   10 obs. of  4 variables:
$ score:Class 'AsIs'  chr [1:10] "28.825139" "97.458521" "26.217289"  
"80.636507" ...

$ pt   :Class 'AsIs'  chr [1:10] "1" "1" "1" "2" ...
$ times:Class 'AsIs'  chr [1:10] "0" "3" "6" "0" ...
$ trt  :Class 'AsIs'  chr [1:10] "1" "1" "1" "1" ...



str(format.data.frame(ac2, digits = 2))

'data.frame':   10 obs. of  4 variables:
$ score:Class 'AsIs'  chr [1:10] " 28.8" " 97.5" " 26.2" " 80.6" ...
$ pt   :Class 'AsIs'  chr [1:10] "1" "1" "1" "2" ...
$ times:Class 'AsIs'  chr [1:10] "0" "3" "6" "0" ...
$ trt  :Class 'AsIs'  chr [1:10] "1" "1" "1" "1" ...


This is why changing print.default() by itself is not sufficient.  
Other object classes are formatted and printed in varying ways and  
print methods have been defined for them which may not use it  
directly.


HTH,

Marc Schwartz



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] weighted least squares vs linear regression

2010-01-28 Thread Bert Gunter
You'll probably need to consult a suitable text on linear models/applied
regression, as this is a statistics, not an R question -- or look for a
suitable tutorial on the web. You might also try one of the statistics
mailing lists or Google on some suitable phrase. 

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of DispersionMap
Sent: Thursday, January 28, 2010 9:06 AM
To: r-help@r-project.org
Subject: [R] weighted least squares vs linear regression



I need to find out the difference between the way R calculates weighted
regression and standard regression.


I want to plot a 95% confidence interval around an estimate I got from least
squares regression.

I can't find the documentation for this.

I've looked in 
?stats
?lm
?predict.lm
?weights
?residuals.lm

Can anyone shed light?

thanks

Chris.  
-- 
View this message in context:
http://n4.nabble.com/weighted-least-squares-vs-linear-regression-tp1387957p1
387957.html
Sent from the R help mailing list archive at Nabble.com.




Re: [R] Data.frame manipulation

2010-01-28 Thread Dennis Murphy
Hi:


On Thu, Jan 28, 2010 at 8:40 AM, AC Del Re de...@wisc.edu wrote:

 Thank you, Dennis and Petr.

 One more question:  when aggregating to one es per id, how would I go about
 keeping the other variables in the data.frame (e.g., keeping the value for
 the first row of the other variables, such as mod2) e.g.:

 # Dennis provided this example (notice how mod2 is removed from the
 output):

  with(x, aggregate(list(es = es), by = list(id = id, mod1 = mod1), mean))
    id mod1   es
  1  3    1 0.20
  2  1    2 0.30
  3  2    4 0.15

 # How can I get this output (taking the first row of the other variable in
 the data.frame):

  id  es   mod1  mod2

  1  .30   2     wai
  2  .15   4     other
  3  .20   1     itas


Using ddply from the plyr package:

 ddply(x, .(id, mod1), summarize, es = mean(es), mod2 = head(mod2, 1))
  id mod1   es  mod2
1  1    2 0.30   wai
2  2    4 0.15 other
3  3    1 0.20  itas

mod2 = head(...) selects the first instance of mod2 in each id/mod1
combination.

It appears from the help page that aggregate only allows one summary function
per call; if so, it wouldn't be able to do this. You could, however, do this
in the doBy package with a custom summary function.
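
For completeness, the same result can also be sketched in base R with aggregate() plus a merge of the first mod2 value per group; this is one possible approach built on the thread's example data, not the only way:

```r
# Thread's example data
x <- data.frame(id   = c(1, 2, 2, 3, 3, 3),
                es   = c(.3, .1, .3, .1, .2, .3),
                mod1 = c(2, 4, 4, 1, 1, 1),
                mod2 = c("wai", "other", "calpas", "wai", "itas", "other"))

# Mean es per id/mod1 group
means  <- aggregate(list(es = x$es), by = list(id = x$id, mod1 = x$mod1), mean)

# First mod2 value observed in each id/mod1 group
firsts <- x[!duplicated(x[c("id", "mod1")]), c("id", "mod1", "mod2")]

merge(means, firsts, by = c("id", "mod1"))
```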

HTH,
Dennis



 Thank you,

 AC


 On Thu, Jan 28, 2010 at 1:29 AM, Petr PIKAL petr.pi...@precheza.czwrote:

 HI

 r-help-boun...@r-project.org napsal dne 28.01.2010 04:35:29:

   Hi All,
  
   I'm conducting a meta-analysis and have taken a data.frame with
 multiple
   rows per
   study (for each effect size) and performed a weighted average of
 effect
   size for
   each study. This results in a reduced # of rows. I am particularly
   interested in
   simply reducing the additional variables in the data.frame to the
 first row
   of the
   corresponding id variable. For example:
  
   id <- c(1,2,2,3,3,3)
   es <- c(.3,.1,.3,.1,.2,.3)
   mod1 <- c(2,4,4,1,1,1)
   mod2 <- c("wai","other","calpas","wai","itas","other")
   data <- as.data.frame(cbind(id,es,mod1,mod2))

 Do not use cbind. Its output is a matrix and in this case character
 matrix. Resulting data frame will consist from factors as you can check by


 str(data)

 data <- data.frame(id=id, es=es, mod1=mod1, mod2=mod2)


  
   data
  
     id   es mod1   mod2
   1  1  0.3    2    wai
   2  2  0.1    4  other
   3  2  0.2    4 calpas
   4  3  0.1    1   itas
   5  3  0.2    1    wai
   6  3  0.3    1    wai
  
   # I would like to reduce the entire data.frame like this:

 E.g. aggregate

 aggregate(data[, -(3:4)], data[,3:4], mean)
   mod1   mod2 id  es
 1    4 calpas  2 0.3
 2    1   itas  3 0.2
 3    1  other  3 0.3
 4    4  other  2 0.1
 5    1    wai  3 0.1
 6    2    wai  1 0.3

 doBy or tapply or ddply from plyr library or 

 Regards
 Petr

  
   id  es   mod1  mod2
  
   1  .30   2     wai
   2  .15   4     other
   3  .20   1     itas
  
   # If possible, I would also like the option of this (collapsing on id
 and
   mod2):
  
   id  es    mod1  mod2
   1   0.3   2     wai
   2   0.1   4     other
   2   0.2   4     calpas
   3   0.1   1     itas
   3   0.25  1     wai
  
   Any help is much appreciated!
  
   AC Del Re
  
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]



Re: [R] Problems with fitdistr

2010-01-28 Thread J. R. M. Hosking

vikrant wrote:

Hi,
I want to estimate the parameters of a Weibull distribution. For this, I am
using the fitdistr() function in the MASS package. But when I call
fitdistr(c, "weibull") I get an error as follows:
 Error in optim(x = c(4L, 41L, 20L, 6L, 12L, 6L, 7L, 13L, 2L, 8L, 22L, 
: 
 non-finite value supplied by optim

Any help or suggestions are most welcomed


Use function pelwei() in package lmom.
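
Another workaround sometimes suggested for this error is to give fitdistr() explicit starting values, so that optim() begins its search in a region where the log-likelihood is finite. A hedged sketch, using the integer values visible in the truncated error message (not the poster's full data, so results are illustrative only; the data are also renamed x, since c shadows a base function):

```r
library(MASS)

# The values visible in the error message; the full data set is unknown
x <- c(4, 41, 20, 6, 12, 6, 7, 13, 2, 8, 22)

# Supplying start values avoids a non-finite initial likelihood
fit <- fitdistr(x, densfun = "weibull",
                start = list(shape = 1, scale = mean(x)))
fit$estimate
```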


J. R. M. Hosking



[R] qplot themes

2010-01-28 Thread evgeny55

Hi,

I'm having trouble editing the qplot layout.  I'm using the geom = "tile"
option and I want to do a few things:

1. move the vertical and horizontal gridlines so that they appear on the
edge of each tile (right now they're in the middle) 
2. bring the gridlines to the foreground and change their color

I've been playing around with the opts(...) options but so far can't get any
of them to work correctly. Has anyone done this, or does anyone have an example?

thanks
-- 
View this message in context: 
http://n4.nabble.com/qplot-themes-tp1388708p1388708.html
Sent from the R help mailing list archive at Nabble.com.



[R] question about reshape

2010-01-28 Thread Dana TUDORASCU
 Hello everyone,
 I have a bit of a problem with reshape function in R.
 I have simulated some normal data, which I have saved in 4 vectors,
 y.1, y.2, y.3, y.4, which I combined into a dataset:
dataset <- cbind(y1,y2,y3,y4). I have also generated some subject id numbers,
and denoted that by subject.
 So, my dataset looks like this:
    subject       y.1        y.2       y.3       y.4
 [1,]       1 20.302707 16.9643106 30.291031  7.118748
 [2,]       2  9.942679  9.3674844  7.578465 16.494813
..etc, I have 20 subjects.
 I want to transform this data into long form dataset, but it does not work.
 I am using reshape command, and should be very straight forward...
 Here is what I use:
 long <- reshape(dataset, idvar = "subject", v.names = "response",
varying = list(2:5), direction = "long")

Here is what I get:
Error in d[, timevar] - times[1L] : subscript out of bounds

Now, do I get that error because the first column shows me the row number?
 I have been using R for a while, but not a lot for data manipulations.
 Any help would be great! Thank you in advance.
 Dana

[[alternative HTML version deleted]]



Re: [R] Interpolation

2010-01-28 Thread stephen sefick
Why not look into na.approx in the zoo package, and related functions?
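
A minimal sketch of the suggestion (the five-point count vector is made up for illustration):

```r
library(zoo)

# na.approx() linearly interpolates interior NA values
count <- c(10, NA, NA, 16, 18)
na.approx(count)   # 10 12 14 16 18
```

For irregular dates, na.approx(zoo(count, dates)) interpolates with respect to the actual time index rather than observation number.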

On Thu, Jan 28, 2010 at 11:29 AM, ogbos okike ogbos.ok...@gmail.com wrote:
 Happy New Year.
 I have a data of four columns - year, month, day and count. The last column,
 count, contains some missing data which I have to replace with NA. I tried
 to use the method of interpolation to assign some values to these NA so that
 the resulting plot will be better. I used x to represent date and y to
 represent count.  With the method below, I tried to interpolate on the NA's,
 but that resulted in the warning message below. I went ahead and plotted the
 graph of date against count (plot attached). The diagonal line between May
 and Jul is not looking good and I suspect that it is the result of the
 warning message.
 It would be appreciated if anybody could give me some help.
 Warmest regards
 Ogbos
 y1 <- approx(x, y, xout = x)$y
 Warning message:
 In approx(x, y, xout = x) : collapsing to unique 'x' values

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Stephen Sefick

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



Re: [R] question about reshape

2010-01-28 Thread Henrique Dallazuanna
Try this:

long <- reshape(as.data.frame(dataset), idvar = "subject",
v.names = "response", varying = list(2:5), direction = "long")
or
dataset <- cbind.data.frame(y1, y2, y3, y4)
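
Putting the fix together as a self-contained example (with hypothetical simulated data in place of the poster's): the key point is that reshape() needs a data frame, not the matrix that cbind() of numeric vectors produces.

```r
set.seed(1)
y1 <- rnorm(20); y2 <- rnorm(20); y3 <- rnorm(20); y4 <- rnorm(20)

# cbind.data.frame() yields a data frame, which reshape() accepts
dataset <- cbind.data.frame(subject = 1:20, y1, y2, y3, y4)

long <- reshape(dataset, idvar = "subject", v.names = "response",
                varying = list(2:5), direction = "long")
head(long)   # columns: subject, time, response
```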

On Thu, Jan 28, 2010 at 3:07 PM, Dana TUDORASCU dana...@gmail.com wrote:
  Hello everyone,
  I have a bit of a problem with reshape function in R.
  I have simulated some normal data, which I have saved in 4 vectors.
  y.1,y.2,y.3,y.4 which I combined into a dataset:
 dataset <- cbind(y1,y2,y3,y4). I have also generated some subject id number,
 and denoted that by subject.
  So, my dataset looks like this:
    subject       y.1        y.2       y.3       y.4
  [1,]       1 20.302707 16.9643106 30.291031  7.118748
  [2,]       2  9.942679  9.3674844  7.578465 16.494813
 ..etc, I have 20 subjects.
  I want to transform this data into long form dataset, but it does not work.
  I am using reshape command, and should be very straight forward...
  Here is what I use:
  long <- reshape(dataset, idvar = "subject", v.names = "response",
 varying = list(2:5), direction = "long")

 Here is what I get:
 Error in d[, timevar] - times[1L] : subscript out of bounds

 Now, do I get that error because the first column shows me the row number?
  I have been using R for a while, but not a lot for data manipulations.
  Any help would be great! Thank you in advance.
  Dana

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Interpolation

2010-01-28 Thread Ravi Varadhan
The warning message simply indicates that you have more than one data point
with the same x value.  So, 'approx' collapses over the duplicate x values
by averaging the corresponding y values. I am not sure if this is your
problem - it doesn't seem like it.  It is doing what seems reasonable for a
linear interpolation.  If you have some idea of what the interpolation should
look like, you may fit a model to the data and impute based on the model.  
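
A tiny illustration of the collapsing behaviour (made-up values):

```r
# Two observations at x = 2; approx() warns and averages their y values
x <- c(1, 2, 2, 3)
y <- c(10, 20, 40, 50)

approx(x, y, xout = 1:3)$y   # the two y values at x = 2 collapse to their mean
```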

Ravi.

---

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvarad...@jhmi.edu

Webpage:
http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h
tml

 





-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of ogbos okike
Sent: Thursday, January 28, 2010 12:30 PM
To: r-help@r-project.org
Subject: [R] Interpolation

Happy New Year.
I have a data of four columns - year, month, day and count. The last column,
count, contains some missing data which I have to replace with NA. I tried
to use the method of interpolation to assign some values to these NA so that
the resulting plot will be better. I used x to represent date and y to
represent count.  With the method below, I tried to interpolate on the NA's,
but that resulted in the warning message below. I went ahead and plotted the
graph of date against count (plot attached). The diagonal line between May
and Jul is not looking good and I suspect that it is the result of the
warning message.
It would be appreciated if anybody could give me some help.
Warmest regards
Ogbos
 y1 <- approx(x, y, xout = x)$y
Warning message:
In approx(x, y, xout = x) : collapsing to unique 'x' values



Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Matthew Dowle
I'm talking about ease of use too.  The first line of the Details section in 
?[.data.table says :
   Builds on base R functionality to reduce 2 types of time :
   1. programming time (easier to write, read, debug and maintain)
   2. compute time

Once again, I am merely saying that the user has choices, and the best 
choice (and there are many choices including plyr, and lots of other great 
packages and base methods) depends on the task and the real goal.   This 
choice is not restricted to compute time only, as you seem to suggest.  In 
fact I listed programming time first (i.e. ease of use).

To answer your points :

This is the SQL code you posted and I used in the comparison. Notice it's 
quite long, repeats the text var1,var2,var3 4 times, and contains two 
'select's and a 'using'.
 system.time(sqldf("select var1, var2, var3, dt from a, (select var1, var2, 
 var3, min(dt) mindt from a group by var1, var2, var3) using(var1, var2, 
 var3) where dt - mindt < 7"))
   user  system elapsed
 103.13    2.17  106.23

Isolating the series of operations you described :
 system.time(sqldf("select * from a"))
   user  system elapsed
  39.00    0.63   39.62

So that's roughly 40% of the time. What's happening in the remaining 66 secs?

Heres a repeat of the equivalent in data.table :

 system.time({adt <- data.table(a)})
   user  system elapsed
   0.90    0.13    1.03
 system.time(adt[, list(dt = dt[dt - min(dt) < 7]), by = "var1,var2,var3"]) 
 #  is that so hard to use compared to the SQL above ?
   user  system elapsed
   3.92    0.78    4.71

I looked at the news section, but I didn't find the benchmarks quickly or 
easily.  The links I saw took me to the FAQs.



Gabor Grothendieck ggrothendi...@gmail.com wrote in message 
news:971536df1001280855i1d5f7c03v46f7a3e58ff93...@mail.gmail.com...
I think one would only be concerned about such internals if one were
primarily interested in performance; otherwise, one would be more
interested in ease of specification and part of that ease is having it
independent of implementation and separating implementation from
specification activities.  An example of separation of specification
and implementation is that by simply specifying a disk-based database
rather than an in-memory database SQL can perform queries that take
more space than memory.  The query itself need not be modified.

I think the viewpoint you are discussing is primarily one of
performance whereas the viewpoint I was discussing is primarily ease
of use and that accounts for the difference.

I believe your performance comparison is comparing a sequence of
operations that include building a database, transferring data to it,
performing the operation, reading it back in and destroying the
database to an internal manipulation.  I would expect the internal
manipulation, particular one done primarily in C code as is the case
with data.table, to be faster although some benchmarks of the database
approach found that it compared surprisingly well to straight R code
-- some users of sqldf found that for an 8000 row data frame sqldf
actually ran faster than aggregate and also faster than tapply.  The
News section on the sqldf home page provides links to their
benchmarks.  Thus if R is fast enough then its likely that the
database approach is fast enough too since its even faster.

On Thu, Jan 28, 2010 at 8:52 AM, Matthew Dowle mdo...@mdowle.plus.com 
wrote:
 Are you claiming that SQL is that utopia? SQL is a row store. It cannot
 give the user the benefits of column store.

 For example, why does SQL take 113 seconds in the example in this thread :
 http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html
 but data.table takes 5 seconds to get the same result ? How come the high
 level language SQL doesn't appear to hide the user from this detail ?

 If you are just describing utopia, then of course I agree. It would be
 great to have a language which hid us from this. In the meantime the user
 has choices, and the best choice depends on the task and the real goal.

 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001280428p345f8ff4v5f3a80c13f96d...@mail.gmail.com...
 Its only important internally. Externally its undesirable that the
 user have to get involved in it. The idea of making software easy to
 write and use is to hide the implementation and focus on the problem.
 That is why we use high level languages, object orientation, etc.

 On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:
 How it represents data internally is very important, depending on the 
 real
 goal :
 http://en.wikipedia.org/wiki/Column-oriented_DBMS


 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com...
 How it represents data internally should not be important as long as
 you can do what you want. SQL is declarative so you just specify what
 you want rather than how to get it and invisibly to the user it
 automatically draws 

Re: [R] weighted least squares vs linear regression

2010-01-28 Thread DispersionMap

Sorry, I omitted some important information.

This is a documentation question!

I meant to ask how to find out how R calculates the standard error and how
it differs between the two models. 
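
One way to see both things at once, the standard errors reported for an ordinary and a weighted fit, and a 95% confidence band from predict(), is a short sketch with simulated data (the weighting scheme here is purely illustrative, not a recommendation for the original data):

```r
set.seed(42)
x <- 1:20
y <- 2 * x + rnorm(20, sd = x / 4)   # variance grows with x
w <- 1 / x^2                         # illustrative weights

fit_ols <- lm(y ~ x)                 # ordinary least squares
fit_wls <- lm(y ~ x, weights = w)    # weighted least squares

# The "Std. Error" columns differ between the two fits
coef(summary(fit_ols))[, "Std. Error"]
coef(summary(fit_wls))[, "Std. Error"]

# 95% confidence interval around the fitted line at x = 10
predict(fit_wls, newdata = data.frame(x = 10), interval = "confidence")
```

The computational details live in ?summary.lm (standard errors) and ?predict.lm (the interval argument).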
-- 
View this message in context: 
http://n4.nabble.com/weighted-least-squares-vs-linear-regression-tp1387957p1393060.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] grid.image(), pckg grid

2010-01-28 Thread Paul Murrell

Hi


Markus Loecher wrote:
 While I am very happy with and awed by the grid package and its basic
 plotting primitives such as grid.points, grid.lines, etc, I was wondering
 whether the equivalent of a grid.image() function exists ?


No.  But a simple implementation based on grid.rect() is not too hard 
(e.g., see 
http://www.stat.auckland.ac.nz/~paul/RGraphics/interactgrid-imagefun.R)


Also, the next version of R will include a grid.raster() function, which 
will provide another way to draw a matrix of colour values.


Paul


 Any pointer would be helpful.

 Thanks !

 Markus

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

 and provide commented, minimal, self-contained, reproducible code.

--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/



Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Gabor Grothendieck
Regarding the explanation of where the time goes it might be parsing
the statement or the development of the query plan.  The SQL statement
for the more complex query is obviously much longer and its generated
query plan involves 95 lines of byte code vs 19 lines of generated
code for the simpler query.

On Thu, Jan 28, 2010 at 2:02 PM, Matthew Dowle mdo...@mdowle.plus.com wrote:
 I'm talking about ease of use too.  The first line of the Details section in
 ?[.data.table says :
   Builds on base R functionality to reduce 2 types of time :
       1. programming time (easier to write, read, debug and maintain)
       2. compute time

 Once again, I am merely saying that the user has choices, and the best
 choice (and there are many choices including plyr, and lots of other great
 packages and base methods) depends on the task and the real goal.   This
 choice is not restricted to compute time only, as you seem to suggest.  In
 fact I listed programming time first (i.e ease of use).

 To answer your points :

 This is the SQL code you posted and I used in the comparison. Notice it's
 quite long, repeats the text var1,var2,var3 4 times, and contains two
 'select's and a 'using'.
 system.time(sqldf("select var1, var2, var3, dt from a, (select var1, var2,
 var3, min(dt) mindt from a group by var1, var2, var3) using(var1, var2,
 var3) where dt - mindt < 7"))
    user  system elapsed
  103.13    2.17  106.23

 Isolating the series of operations you described :
 system.time(sqldf("select * from a"))
    user  system elapsed
   39.00    0.63   39.62

 So that's roughly 40% of the time. What's happening in the remaining 66 secs?

 Here's a repeat of the equivalent in data.table :

 system.time({adt <- data.table(a)})
    user  system elapsed
    0.90    0.13    1.03
 system.time(adt[ , list(dt=dt[dt-min(dt)<7]) , by="var1,var2,var3"])
 #  is that so hard to use compared to the SQL above ?
    user  system elapsed
    3.92    0.78    4.71

 I looked at the news section, but I didn't find the benchmarks quickly or
 easily.  The links I saw took me to the FAQs.



 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001280855i1d5f7c03v46f7a3e58ff93...@mail.gmail.com...
 I think one would only be concerned about such internals if one were
 primarily interested in performance; otherwise, one would be more
 interested in ease of specification and part of that ease is having it
 independent of implementation and separating implementation from
 specification activities.  An example of separation of specification
 and implementation is that by simply specifying a disk-based database
 rather than an in-memory database SQL can perform queries that take
 more space than memory.  The query itself need not be modified.

 I think the viewpoint you are discussing is primarily one of
 performance whereas the viewpoint I was discussing is primarily ease
 of use and that accounts for the difference.

 I believe your performance comparison is comparing a sequence of
 operations that include building a database, transferring data to it,
 performing the operation, reading it back in and destroying the
 database to an internal manipulation.  I would expect the internal
 manipulation, particular one done primarily in C code as is the case
 with data.table, to be faster although some benchmarks of the database
 approach found that it compared surprisingly well to straight R code
 -- some users of sqldf found that for an 8000 row data frame sqldf
 actually ran faster than aggregate and also faster than tapply.  The
 News section on the sqldf home page provides links to their
  benchmarks.  Thus if R is fast enough then it's likely that the
  database approach is fast enough too since it's even faster.

 On Thu, Jan 28, 2010 at 8:52 AM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:
 Are you claiming that SQL is that utopia? SQL is a row store. It cannot
 give the user the benefits of column store.

 For example, why does SQL take 113 seconds in the example in this thread :
 http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html
 but data.table takes 5 seconds to get the same result ? How come the high
 level language SQL doesn't appear to hide the user from this detail ?

 If you are just describing utopia, then of course I agree. It would be
 great to have a language which hid us from this. In the meantime the user
 has choices, and the best choice depends on the task and the real goal.

 Gabor Grothendieck ggrothendi...@gmail.com wrote in message
 news:971536df1001280428p345f8ff4v5f3a80c13f96d...@mail.gmail.com...
 It's only important internally. Externally it's undesirable that the
 user has to get involved in it. The idea of making software easy to
 write and use is to hide the implementation and focus on the problem.
 That is why we use high level languages, object orientation, etc.

 On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com
 wrote:
 How it represents data internally is very important, depending on the
 real
 goal 

Re: [R] Constrained vector permutation

2010-01-28 Thread Andrew Rominger
Hi Jason,

Thanks for your suggestions, I think that's pretty close to what I'd need.
The only glitch is that I'd be working with a vector of ~30 elements, so
permutations(...) would take quite a long time.  I only need one permutation
per vector (the whole routine will be within a loop that generates
pseudo-random vectors that could potentially conform to the constraints).

In light of that, do you think I'd be better off doing something like:
v.permutations <- replicate(1, sample(v, length(v), rep=FALSE))   # instead
of permutations()
results <- apply(v.permutations, 2, function(x){all(x <=
f(x[1], length(x)-1))})   # function f(...) would be like your f

It wouldn't be guaranteed to produce any usable permutation, but it seems
like it would be much faster and so could be repeated until an acceptable
vector is found.  What do you think?
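For what it's worth, the repeat-until idea can be sketched like this; the constraint function f() below is a hypothetical stand-in, since the real constraint isn't shown in this thread:

```r
# Hedged sketch of the rejection-sampling approach; f() is a made-up
# ceiling constraint standing in for the real one from the thread.
f <- function(first, k) first * seq_len(k)

sample_until_valid <- function(v, max.tries = 10000) {
  for (i in seq_len(max.tries)) {
    p <- sample(v)                              # one random permutation
    if (all(p[-1] <= f(p[1], length(p) - 1)))   # does it satisfy the constraint?
      return(p)
  }
  NULL                                          # give up after max.tries
}

set.seed(42)
sample_until_valid(1:5)
```

The expected number of tries grows with how restrictive the constraint is, so it helps to cap max.tries and fall back to an exhaustive search if NULL is returned.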

Thanks--
Andy


On Thu, Jan 28, 2010 at 6:15 AM, Jason Smith devja...@gmail.com wrote:

 I just realized I read through your email too quickly and my script does
 not actually address the constraint on each permutation, sorry about that.

 You should be able to use the permutations function to generate the vector
 permutations however.

 Jason





Re: [R] using functions with multiple arguments in the apply family

2010-01-28 Thread Peter Ehlers

chipmaney wrote:
typically, the apply family wants you to use vectors to run functions on. 
However, I have a function, kruskal.test, that requires 2 arguments.


kruskal.test(Herb.df$Score,Herb.df$Year)

This easily computes the KW ANOVA statistic for any difference across
years

However, my data has multiple sites on which KW needs to be run...

here's the data:

Herb.df <-
data.frame(Score=rep(c(2,4,6,6,6,5,7,8,6,9),2),Year=rep(c(rep(1,5),rep(2,5)),2),Site=c(rep(3,10),rep(4,10)))

However, if I try this:

 tapply(Herb.df,Herb.df$Site,function(.data)
kruskal.test(.data$Indicator_Rating,.data$Year))



Error in tapply(Herb.df, Herb.df$ID, function(.data)
kruskal.test(.data$Indicator_Rating,  : 
  arguments must have same length



How can I vectorize the kruskal.test() for all sites using tapply() in lieu
of a loop?


Your example data makes little sense; you have precisely the
same data for both sites and you have only two sites (why do
kruskal.test on two sites?). Finally, you need to decide what
your response variable is: 'Score' or 'Indicator_Rating'.

So here's some made-up data and the use of by() to apply
the test to each site:

dat <- data.frame(y = rnorm(60), yr=gl(4,5,60), st=gl(3,20))
with(dat, by(dat, st, function(x) kruskal.test(y~yr, data=x)))

See the last example in ?by.
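A small follow-on (not part of Peter's original reply): once by() has returned the list of test objects, the per-site p-values can be pulled out in one line:

```r
# Run Peter's example, then extract one p-value per site from the
# list of htest objects that by() returns.
set.seed(1)
dat <- data.frame(y = rnorm(60), yr = gl(4, 5, 60), st = gl(3, 20))
res <- with(dat, by(dat, st, function(x) kruskal.test(y ~ yr, data = x)))

pvals <- sapply(res, function(k) k$p.value)  # named vector, one entry per site
pvals
```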

 -Peter Ehlers








--
Peter Ehlers
University of Calgary



Re: [R] selecting significant predictors from ANOVA result

2010-01-28 Thread Ista Zahn
Hi Ram,
As others have pointed out, writing the code is the least of your
problems. In case this isn't sinking in, try the following exercise:

set.seed(10)
P <- vector()
DF <- as.data.frame(matrix(rep(NA, 100000), nrow=100))
names(DF) <- c(paste("x", 1:999, sep=""), "y")

for(i in 1:1000) {
  DF[,i] <- rnorm(100)
}

for(i in 1:999) {
  P[i] <- summary(lm(DF$y ~ DF[,i]))$coefficients[2,4]
}

which(P < .05)

Notice that the variables in the data set DF are random numbers. The
fact that 53 of them are 'significantly' correlated with y at p < .05
doesn't change that. So in this example, those 53 significant
predictors are meaningless. And your actual problem is even worse than
this example, because you're running way more than 999 models.
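To see how a multiplicity correction changes the picture, here is a standalone sketch along the same lines (it regenerates its own noise data rather than reusing the DF above):

```r
# 999 regressions of pure noise on pure noise, then the same selection
# with and without a multiple-testing adjustment (p.adjust, stats package).
set.seed(10)
P <- replicate(999, summary(lm(rnorm(100) ~ rnorm(100)))$coefficients[2, 4])

sum(P < .05)                              # dozens of spurious 'hits'
sum(p.adjust(P, method = "holm") < .05)   # familywise-error control
sum(p.adjust(P, method = "BH") < .05)     # false-discovery-rate control
```

With adjusted p-values the spurious hits all but disappear, which is exactly the point being made here.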

As has already been suggested, it's time to consult a statistician.

-Ista

On Thu, Jan 28, 2010 at 3:39 AM, ram basnet basnet...@yahoo.com wrote:
 Dear Sir,

 Thanks for your message. My problem is in writing code. I did ANOVA for 
 75000 response variables (let's say Y) with 243 predictors (let's say an 
 X-matrix), one by one with a for loop in R. I stored the p-values of all 
 predictors; however, I have a very huge file because I have p-values of 243 
 predictors for all 75000 Y-variables.
 Now, I want to find some code that automatically selects only the significant 
 X-predictors from the whole list. If you have ideas on that, it will be a great 
 help.
 Thanks in advance

 Sincerely,
 Ram

 --- On Wed, 1/27/10, Bert Gunter gunter.ber...@gene.com wrote:


 From: Bert Gunter gunter.ber...@gene.com
 Subject: RE: [R] selecting significant predictors from ANOVA result
 To: 'ram basnet' basnet...@yahoo.com, 'R help' r-help@r-project.org
 Date: Wednesday, January 27, 2010, 7:56 AM


 Ram:

 You do not say how many cases (rows in your dataset) you have, but I suspect
 it may be small (a few hundred, say).

 In any case, what you describe is probably just a complicated way to
 generate random numbers -- it is **highly** unlikely that any meaningful,
 replicable scientific results would result from your proposed approach.

 Not surprising -- this appears to be a very difficult data analysis issue.
 It is obvious that you have only a minimal statistical background, so I
 would strongly recommend that you find a competent local statistician to
 help you with your work. Remote help from this list is wholly inadequate.

 Bert Gunter
 Genentech Nonclinical Statistics



 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of ram basnet
 Sent: Wednesday, January 27, 2010 2:52 AM
 To: R help
 Subject: [R] selecting significant predictors from ANOVA result

 Dear all,

  I did ANOVA for many response variables (Var1, Var2, ..., Var75000), and I
  got p-value results like those below. Now, I want to select those
  predictors which have a p-value less than or equal to 0.05 for each response
  variable. For example, X1, X2, X3, X4, X5 and X6 in the case of Var1;
  similarly, X1, X2, ..., X5 for Var2; only X1 for Var3; and none
  of the predictors for Var4.







 predictors
 Var1
 Var2
 Var3
 Var4

 X1
 0.5
 0.001
 0.05
 0.36

 X2
 0.0001
 0.001
 0.09
 0.37

 X3
 0.0002
 0.005
 0.13
 0.38

 X4
 0.0003
 0.01
 0.17
 0.39

 X5
 0.01
 0.05
 0.21
 0.4

 X6
 0.05
 0.0455
 0.25
 0.41

 X7
 0.038063
 0.0562
 0.29
 0.42

 X8
 0.04605
 0.0669
 0.33
 0.43

 X9
 0.054038
 0.0776
 0.37
 0.44

 X10
 0.062025
 0.0883
 0.41
 0.45

  I have very large data sets (# of response variables = ~75,000), so I need
  some kind of automated procedure, but I have no ideas.
  If I get help from somebody, it will be great for me.
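For the mechanical selection step (setting aside the statistical objections raised elsewhere in this thread), something like the following works; pmat here is simulated stand-in data, not Ram's real matrix:

```r
# Hedged sketch: pmat stands in for the 243 x 75000 p-value matrix
# (rows = predictors, columns = response variables).
set.seed(1)
pmat <- matrix(runif(243 * 10), nrow = 243,
               dimnames = list(paste("X", 1:243, sep = ""),
                               paste("Var", 1:10, sep = "")))

# For each response, keep the names of predictors with p <= 0.05:
sig <- lapply(seq_len(ncol(pmat)),
              function(j) rownames(pmat)[pmat[, j] <= 0.05])
names(sig) <- colnames(pmat)
sig[["Var1"]]   # predictors selected for the first response
```

The result is a named list with one (possibly empty) character vector of predictor names per response, which avoids writing out the full 243 x 75000 table.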

 Thanks in advance.

 Sincerely,

 Ram Kumar Basnet,
 Ph. D student
 Wageningen University,
 The Netherlands.












-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



[R] navigation panel with help

2010-01-28 Thread Edwin Sun

All,

I installed the latest version of R 2.10.1. On the help page for a specific
function, it turns out that the vertical navigation panel on the left does
not appear anymore. For example, 

?lm

The help page from this command is a page without navigation panel (which I
prefer to use). I notice there is an index link at the bottom of the page.
By the way, I did not make any change on my browser.

Is this a change for this version? Thank you for your help.

Edwin Sun
-- 
View this message in context: 
http://n4.nabble.com/navigation-panel-with-help-tp1395663p1395663.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] question about reshape

2010-01-28 Thread Dana TUDORASCU
Thank you very much everybody. That worked.
 Dana


On Thu, Jan 28, 2010 at 12:23 PM, Henrique Dallazuanna www...@gmail.comwrote:

 Try this:

 long <- reshape(as.data.frame(dataset), idvar="subject",
 v.names="response", varying=list(2:5), direction="long")
 or
 dataset - cbind.data.frame(y1, y2, y3, y4)

 On Thu, Jan 28, 2010 at 3:07 PM, Dana TUDORASCU dana...@gmail.com wrote:
   Hello everyone,
   I have a bit of a problem with reshape function in R.
    I have simulated some normal data, which I have saved in 4 vectors,
    y.1, y.2, y.3, y.4, which I combined into a dataset:
   dataset <- cbind(y1,y2,y3,y4). I have also generated some subject id numbers,
   and denoted that by subject.
   So, my dataset looks like this:
 subject   y.1y.2   y.3   y.4
   [1,]   1 20.302707 16.9643106 30.291031  7.118748
   [2,]   2  9.942679  9.3674844  7.578465 16.494813
  ..etc, I have 20 subjects.
   I want to transform this data into long form dataset, but it does not
 work.
   I am using reshape command, and should be very straight forward...
   Here is what I use:
    long <- reshape(dataset, idvar="subject", v.names="response",
   varying=list(2:5), direction="long")
 
  Here is what I get:
  Error in d[, timevar] - times[1L] : subscript out of bounds
 
  Now, do I get that error because the first column shows me the row
 number?
   I have been using R for a while, but not a lot for data manipulations.
   Any help would be great! Thank you in advance.
   Dana
 
 



 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O




--




Re: [R] navigation panel with help

2010-01-28 Thread Duncan Murdoch

On 28/01/2010 3:15 PM, Edwin Sun wrote:

All,

I installed the latest version of R 2.10.1. On the help page for a specific
function, it turns out that the vertical navigation panel on the left does
not appear anymore. For example, 


?lm

The help page from this command is a page without navigation panel (which I
prefer to use). I notice there is an index link at the bottom of the page.
By the way, I did not make any change on my browser.

Is this a change for this version? Thank you for your help.


Yes, we have dropped support for CHM help, which had the navigation 
pane.  The default display is now HTML help. 


Duncan Murdoch



Re: [R] navigation panel with help

2010-01-28 Thread Changyou Sun
Duncan,

Thank you for your quick reply. Do we users have any options to change
that? I have personally become addicted to the navigation panel and feel it
is a kind of table of contents.

Regards,


Edwin Sun


-Original Message-
From: Duncan Murdoch [mailto:murd...@stats.uwo.ca] 
Sent: Thursday, January 28, 2010 2:19 PM
To: Changyou Sun
Cc: r-help@r-project.org
Subject: Re: [R] navigation panel with help

On 28/01/2010 3:15 PM, Edwin Sun wrote:
 All,

 I installed the latest version of R 2.10.1. On the help page for a
specific
 function, it turns out that the vertical navigation panel on the left
does
 not appear anymore. For example, 

 ?lm

 The help page from this command is a page without navigation panel
(which I
 prefer to use). I notice there is an index link at the bottom of the
page.
 By the way, I did not make any change on my browser.

 Is this a change for this version? Thank you for your help.

Yes, we have dropped support for CHM help, which had the navigation 
pane.  The default display is now HTML help. 

Duncan Murdoch



Re: [R] navigation panel with help

2010-01-28 Thread Duncan Murdoch

On 28/01/2010 3:22 PM, Changyou Sun wrote:

Duncan,

Thank you for your quick reply. Do we users have any options to change
that? I have personally become addicted to the navigation panel and feel it
is a kind of table of contents.
  


You could downgrade to 2.9.2, but you'd lose all the other new stuff.  
You could contribute code to display the table of contents in the HTML 
version.  I suppose you could resurrect the old code and build your own 
CHM help in 2.10.1, but I would guess adding the table of contents to 
the HTML help would be easier.


Duncan Murdoch

Regards,


Edwin Sun


-Original Message-
From: Duncan Murdoch [mailto:murd...@stats.uwo.ca] 
Sent: Thursday, January 28, 2010 2:19 PM

To: Changyou Sun
Cc: r-help@r-project.org
Subject: Re: [R] navigation panel with help

On 28/01/2010 3:15 PM, Edwin Sun wrote:
 All,

 I installed the latest version of R 2.10.1. On the help page for a
specific
 function, it turns out that the vertical navigation panel on the left
does
 not appear anymore. For example, 


 ?lm

 The help page from this command is a page without navigation panel
(which I
 prefer to use). I notice there is an index link at the bottom of the
page.
 By the way, I did not make any change on my browser.

 Is this a change for this version? Thank you for your help.

Yes, we have dropped support for CHM help, which had the navigation 
pane.  The default display is now HTML help. 


Duncan Murdoch





Re: [R] Constrained vector permutation

2010-01-28 Thread Jason Smith
 It wouldn't be guaranteed to produce any usable permutation, but it seems
 like it would be much faster and so could be repeated until an acceptable
 vector is found.  What do you think?

 Thanks--
 Andy


I think I am not understanding what your ultimate goal is so I'm not
sure I can give you appropriate advice.  Are you looking for a single
valid permutation or all of them?

Since that constraint sets a ceiling on each subsequent value, it
seems like you could solve this problem more easily and quickly by
using a search strategy instead of random sampling or generating all
permutations then testing.  The constraint will help prune the search
space so you only generate valid permutations.  Once you are examining
a particular element you can determine which of the additional
elements would be valid, so only consider those.
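A minimal sketch of that search strategy, with a hypothetical ceiling function standing in for the thread's actual constraint:

```r
# Depth-first search with pruning: descend only into branches where the
# next element respects the ceiling, so only valid prefixes are explored.
find_perm <- function(v, ceiling.fun) {
  search <- function(prefix, remaining) {
    if (length(remaining) == 0) return(prefix)   # complete valid permutation
    k <- length(prefix)
    for (i in seq_along(remaining)) {
      x <- remaining[i]
      # first element is unconstrained; later ones must respect the ceiling
      if (k == 0 || x <= ceiling.fun(prefix[1], k)) {
        res <- search(c(prefix, x), remaining[-i])
        if (!is.null(res)) return(res)
      }
    }
    NULL                                         # dead end; backtrack
  }
  search(numeric(0), v)
}

# hypothetical constraint: the element placed after a prefix of length k
# may be at most the first element times k
find_perm(c(2, 5, 1, 3), function(first, k) first * k)
```

Returning on the first success gives a single valid permutation; collecting instead of returning would enumerate all of them.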

--jason



Re: [R] color palette for points, lines, text / interactive Rcolorpicker?

2010-01-28 Thread Greg Snow
I don't know of any existing palettes that meet your conditions, but here are a 
couple of options for interactive exploration of colorsets (this is quick and 
dirty, there are probably some better orderings, base colors, etc.):

colpicker <- function( cols=colors() ) {
n <- length(cols)
nr <- ceiling(sqrt(n))
nc <- ceiling( n/nr )

imat <- matrix(c(seq_along(cols), rep(NA, nr*nc-n) ),
ncol=nc, nrow=nr)

image( seq.int(nr), seq.int(nc), imat, col=cols, xlab='', ylab='' )
xy <- locator()

cols[ imat[ cbind( round(xy$x), round(xy$y) ) ] ]
}

colpicker()


## another approach

library(TeachingDemos)

cols <- colors()
n <- length(cols)
par(xpd=TRUE)

# next line only works on windows
HWidentify( (1:n) %% 26, (1:n) %/% 26, label=cols, col=cols, pch=15, cex=2 )

# next line works on all platforms with tcltk
HTKidentify( (1:n) %% 26, (1:n) %/% 26, label=cols, col=cols, pch=15, cex=2 )


# reorder
cols.rgb - col2rgb( cols )
d - dist(t(cols.rgb))
clst - hclust(d)

colpicker(cols[clst$order])
HWidentify( (1:n) %% 26, (1:n) %/% 26, label=cols[clst$order], 
col=cols[clst$order], pch=15, cex=2 )
## or HTKidentify

cols.hsv <- rgb2hsv( cols.rgb )
d2 <- dist(t(cols.hsv))
clst2 <- hclust(d2)

HWidentify( (1:n) %% 26, (1:n) %/% 26, label=cols[clst2$order], 
col=cols[clst2$order], pch=15, cex=2 )
## or HTKidentify

Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Michael Friendly
 Sent: Thursday, January 28, 2010 8:38 AM
 To: R-Help
 Subject: [R] color palette for points, lines, text / interactive
 Rcolorpicker?
 
 I'm looking for a scheme to generate a default color palette for
 plotting points, lines and text (on a white or transparent background)
 with from 2 to say 9 colors with the following constraints:
 - red is reserved for another purpose
 - colors should be highly distinct
 - avoid light colors (like yellows)
 
 In RColorBrewer, most of the schemes are designed for area fill rather
 than points and lines. The closest I can find
 for these needs is the Dark2 palette, e.g.,
 
 library(RColorBrewer)
 display.brewer.pal(7,Dark2)
 
 I'm wondering if there is something else I can use.
 
 On a related note, I wonder if there is something like an interactive
 color picker for R.  For example,
 http://research.stowers-institute.org/efg/R/Color/Chart/
 displays several charts of all R colors.  I'd like to find something
 that displays such a chart and uses
 identify() to select a set of tiles, whose colors() indices are
 returned
 by the function.
 
 -Michael
 
 --
 Michael Friendly Email: friendly AT yorku DOT ca
 Professor, Psychology Dept.
 York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
 4700 Keele Streethttp://www.math.yorku.ca/SCS/friendly.html
 Toronto, ONT  M3J 1P3 CANADA
 



[R] Data frame of different sized lists in a function call

2010-01-28 Thread Jonathan Greenberg
I'm hoping to get some best practice feedback for constructing a 
function call which takes an undefined set of DIFFERENT length vectors 
-- e.g. say we have two lists:


list1=c(1:10)
list2=c(2:4)

lists = data.frame(list1,list2) coerces those two to be the same length 
(recycling list2 to fill in the missing rows) -- what is a quick way of 
having each of those lists retain their original lengths?  my function 
ultimately should look like:



myfunction = function(lists) {

...

}

I'm hoping this can be done with a single line, so the user doesn't have 
to pre-construct the data.frame before running the function, if at all 
possible.


Thanks!

--j

--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307



Re: [R] weighted least squares vs linear regression

2010-01-28 Thread David Winsemius


On Jan 28, 2010, at 2:14 PM, DispersionMap wrote:



Sorry, I omitted some important information.

this is a documentation question!

I meant to ask how to find out how R calculates the standard error and how
it differs between the two models.


Luke, use the Code!



--
View this message in context: 
http://n4.nabble.com/weighted-least-squares-vs-linear-regression-tp1387957p1393060.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Data frame of different sized lists in a function call

2010-01-28 Thread David Winsemius


On Jan 28, 2010, at 4:03 PM, Jonathan Greenberg wrote:


list1=c(1:10)  # neither of which really is a list
list2=c(2:4)

lists = list(list1, list2)  # a list of two vectors.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Data frame of different sized lists in a function call

2010-01-28 Thread Greg Snow
If you understand the differences between R lists and R vectors then this 
should be easy:

 vec1 <- 1:10
 vec2 <- 2:4

 myListOfVectors <- list( vec1, vec2 )

Now you can pass the single list of 2 different-sized vectors to your function. 
 For more details on working with lists (and vectors and functions and ...), 
read "An Introduction to R", which is worth a lot more than you pay for it.
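To make the point concrete, iterating over such a list preserves each vector's length (no recycling, unlike data.frame):

```r
vec1 <- 1:10
vec2 <- 2:4
myListOfVectors <- list(vec1, vec2)

sapply(myListOfVectors, length)   # 10 and 3 -- lengths are preserved
lapply(myListOfVectors, mean)     # apply any per-vector function
```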

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Jonathan Greenberg
 Sent: Thursday, January 28, 2010 2:03 PM
 To: r-help
 Subject: [R] Data frame of different sized lists in a function call
 
 I'm hoping to get some best practice feedback for constructing a
 function call which takes an undefined set of DIFFERENT length vectors
 -- e.g. say we have two lists:
 
 list1=c(1:10)
 list2=c(2:4)
 
 lists = data.frame(list1,list2) coerces those two to be the same length
 (recycling list2 to fill in the missing rows) -- what is a quick way of
 having each of those lists retain their original lengths?  my function
 ultimately should look like:
 
 
 myfunction = function(lists) {
 
 ...
 
 }
 
 I'm hoping this can be done with a single line, so the user doesn't
 have
 to pre-construct the data.frame before running the function, if at all
 possible.
 
 Thanks!
 
 --j
 
 --
 
 Jonathan A. Greenberg, PhD
 Postdoctoral Scholar
 Center for Spatial Technologies and Remote Sensing (CSTARS)
 University of California, Davis
 One Shields Avenue
 The Barn, Room 250N
 Davis, CA 95616
 Phone: 415-763-5476
 AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
 



[R] Error on using lag function

2010-01-28 Thread anna

Hello everyone, I have a vector P and I want to replace each of its missing
values by its next element, for example:
P[i] = NA -- P[i] = P[i+1]
To do this I am using the replace() and lag() functions like this:
P <- replace(as.ts(P), is.na(as.ts(P)), as.ts(lag(P,1)))
but here is the error that I get:
Warning message:
In NextMethod([-) :
  number of items to replace is not a multiple of replacement length
I have tried reducing the dimension of P in the first two arguments of the
replace() function by one, but it wouldn't work either. Any idea?

-
Anna Lippel
-- 
View this message in context: 
http://n4.nabble.com/Error-on-using-lag-function-tp1399935p1399935.html
Sent from the R help mailing list archive at Nabble.com.



[R] hist - unevenly spaced bars

2010-01-28 Thread Worik R
I am sure this is trivial, but I cannot solve it.

I make a histogram.  There are 5 categories 1,...,5 and 80 values and
the histogram does not evenly space the bars.

Bars 1 and 2 have no space between them and the rest are evenly spaced.

How can I get all bars evenly spaced?

The code:

 > Q5
  [1] 4 4 4 5 2 4 5 3 4 5 3 4 3 5 2 4 5 5 4
 [20] 3 1 4 5 5 4 3 1 5 4 3 5 3 3 5 5 5 5 4
 [39] 4 5 1 1 5 4 4 4 1 4 4 5 5 2 4 5 4 3 4
 [58] 5 1 2 1 5 4 5 5 1 4 1 4 5 1 4 5 5 4 5
 [77] 5 4 4 3
 > hist(as.numeric(Q5),  density=30, main=strwrap(S5, width=60), axes=FALSE)
 > axis(side=1, labels=c("Disagree", "2", "Not Sure", "4", "Strongly Agree"),
 at=c(1, 2, 3, 4, 5))
 > axis(side=2)
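One common fix (a sketch from outside the thread): hist() bins continuous data, so for five ordered categories it is easier to tabulate and use barplot(), or force one break per category:

```r
# Evenly spaced bars for 5 ordered categories: tabulate, then barplot().
Q5 <- c(4,4,4,5,2,4,5,3,4,5,3,4,3,5,2,4,5,5,4,
        3,1,4,5,5,4,3,1,5,4,3,5,3,3,5,5,5,5,4,
        4,5,1,1,5,4,4,4,1,4,4,5,5,2,4,5,4,3,4,
        5,1,2,1,5,4,5,5,1,4,1,4,5,1,4,5,5,4,5,
        5,4,4,3)
counts <- table(factor(Q5, levels = 1:5))  # keeps empty categories too
barplot(counts, density = 30,
        names.arg = c("Disagree", "2", "Not Sure", "4", "Strongly Agree"))

# Alternatively, keep hist() but force one break per category:
# hist(Q5, breaks = seq(0.5, 5.5, by = 1), density = 30)
```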

cheers
Worik




Re: [R] optimization challenge

2010-01-28 Thread Greg Snow
Well, Albyn Jones gave a great solution to my challenge that found the best 
reading schedule.

My original thought was that doing an exhaustive search would take too much 
time, but Albyn showed that there are ways to do it efficiently.

My approach (as mentioned before) was to use optim with method SANN.  I treated 
it like a balls and urns problem.  Since I wanted to read 239 chapters in 128 
days it would be like putting 239 balls in 128 urns.  I started with 1 ball in 
each urn, then put the remain balls into urns to get a starting state (tried a 
couple different starting situations, 1 additional ball in each of the 1st 
urns, all the remaining balls in the first or last urn).  Then my update step 
was just to take a single ball from one of the urns with 2 or more balls and 
move it to another urn at random.

One tricky thing with this method is that moving one ball could change things 
quite a bit because one setup could have the longest chapters being read by 
themselves, but moving one ball would result in a long chapter now being 
grouped with others.  My first update function just moved the ball to a random 
urn, then I tried moving the ball only one urn forward or backwards (this 
seemed to work better, but probably needed a longer run time).  Finally the 
best method that I found chose the ball to move proportional to the lengths of 
the days reading and chose the urn to put it in with highest probability for 
days with the shortest readings.

I thought my answers were pretty good (they looked reasonable), but Albyn's 
solution had half the variance of my best result. Below is the code that I 
used for my best results, in case anyone is interested. I would also be 
interested if anyone could find a way to improve on what I did to get better 
results (help me learn SANN better; the arguments I used came mostly from trial 
and error).

days <- seq( as.Date('1/24/10', '%m/%d/%y'), as.Date('5/31/10', '%m/%d/%y'), by=1 )

sq2.2 <- rep(1, length(days))
sq2.2[ length(days) ] <- nrow(bom3) - sum(sq2.2) + 1

genseq4 <- function(sq) {
    w   <- rep(1:length(days), sq)
    tmp <- tapply( bom3$Verses, w, sum )

    ww <- which(sq > 1)

    dwn <- if (length(ww) > 1) {
        sample( ww, 1, prob = tmp[ww] )
    } else {
        ww
    }
    up  <- sample( seq_along(sq)[-dwn], 1, prob = max(tmp) - tmp[-dwn] )

    sq[dwn] <- sq[dwn] - 1
    sq[up]  <- sq[up]  + 1

    sq
}

distance <- function(sq) {
    w   <- rep(1:length(days), sq)
    tmp <- tapply( bom3$Verses, w, sum )
    var(tmp)
}

res <- optim(sq2.2, distance, genseq4, method = "SANN",
control = list(maxit = 3, temp = 50, trace = TRUE))
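
The post doesn't include bom3 (the data frame of chapter lengths), so the snippet above isn't runnable as-is; a hypothetical stand-in for experimenting with the approach might look like this (the simulated verse counts are made up, not the real data):

```r
## Hypothetical stand-in for bom3 -- 239 chapters with simulated
## verse counts (the real data is not in the post).
set.seed(42)
bom3 <- data.frame(Verses = sample(10:80, 239, replace = TRUE))
nrow(bom3)   # 239
```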



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



Re: [R] Error on using lag function

2010-01-28 Thread Peter Ehlers

Does this help:

library(zoo)
na.locf(P, fromLast=TRUE)

You'll have to decide what to do if the last value is NA.
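
If zoo is not installed, the same backward fill can be sketched in base R (fill_next is a made-up helper name, not a zoo function):

```r
## Base-R sketch of the same idea as na.locf(P, fromLast = TRUE):
## walk from the end, replacing each NA with the element after it,
## so runs of NAs propagate the next observed value backwards.
fill_next <- function(p) {
  for (i in rev(seq_along(p))[-1]) {   # n-1, n-2, ..., 1
    if (is.na(p[i])) p[i] <- p[i + 1]
  }
  p
}

fill_next(c(1, NA, NA, 4, NA))   # 1 4 4 4 NA (trailing NA has no successor)
```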

 -Peter Ehlers

anna wrote:

Hello everyone, I have a vector P and I want to replace each of its missing
values by its next element, for example:
P[i] = NA  ->  P[i] = P[i+1]
To do this I am using the replace() and lag() functions like this:
P <- replace(as.ts(P), is.na(as.ts(P)), as.ts(lag(P, 1)))
but here is the error that I get:
Warning message:
In NextMethod("[<-") :
  number of items to replace is not a multiple of replacement length
I have tried to reduce the dimension of P on the first two elements of the
replace() function by one but it wouldn't work either. Any idea?

-
Anna Lippel


--
Peter Ehlers
University of Calgary



Re: [R] hist - unevenly spaced bars

2010-01-28 Thread Peter Ehlers

Well, your bars are not unevenly spaced; you just have
some zero-count intervals. Time to learn about the
str() function, which will tell you what's going on.

zh <- hist(your_code)
str(zh)
zh$breaks
zh$counts

You could set breaks with

hist(..., breaks=0:5 + .5)

But a histogram doesn't seem like the right thing to do.
Try barplot:

barplot(table(Q5))
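
For a self-contained illustration (using a stand-in subset of your Q5 values), converting to a factor with explicit levels keeps zero-count categories, so all five bars show up, evenly spaced:

```r
## Stand-in for Q5 (first 18 of the 80 values); factor() with
## levels = 1:5 keeps categories with zero counts, so barplot()
## draws all five bars, evenly spaced.
Q5  <- c(4, 4, 4, 5, 2, 4, 5, 3, 4, 5, 3, 4, 3, 5, 2, 4, 5, 5)
tab <- table(factor(Q5, levels = 1:5))
barplot(tab,
        names.arg = c("Disagree", "2", "Not Sure", "4", "Strongly Agree"))
```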

 -Peter Ehlers

Worik R wrote:

I am sure this is trivial, but I cannot solve it.

I make a histogram.  There are 5 categories 1,...,5 and 80 values and
the histogram does not evenly space the bars.

Bars 1 and 2 have no space between them and the rest are evenly spaced.

How can I get all bars evenly spaced?

The code:


Q5

 [1] 4 4 4 5 2 4 5 3 4 5 3 4 3 5 2 4 5 5
4
[20] 3 1 4 5 5 4 3 1 5 4 3 5 3 3 5 5 5 5
4
[39] 4 5 1 1 5 4 4 4 1 4 4 5 5 2 4 5 4 3
4
[58] 5 1 2 1 5 4 5 5 1 4 1 4 5 1 4 5 5 4
5
[77] 5 4 4 3

hist(as.numeric(Q5), density=30, main=strwrap(S5, width=60), axes=FALSE)
axis(side=1, labels=c("Disagree", "2", "Not Sure", "4", "Strongly Agree"),
     at=c(1, 2, 3, 4, 5))

axis(side=2)


cheers
Worik






--
Peter Ehlers
University of Calgary



Re: [R] Error on using lag function

2010-01-28 Thread anna

Hi Peter, thank you for helping. The thing is, I don't want it to replace the
NA with the last value but with the next value.

-
Anna Lippel
-- 
View this message in context: 
http://n4.nabble.com/Error-on-using-lag-function-tp1399935p1401319.html
Sent from the R help mailing list archive at Nabble.com.


