[R] Simulation

2011-03-01 Thread bwaxxlo
I tried looking for help but I couldn't locate the exact solution.
I have data that has several variables. I want to do several sample
simulations using only two of the variables (eg: say you have data between
people and properties owned. You only want to check how many in the samples
will come up with bicycles) to estimate probabilities and that sort of
thing.
Now, I can only do a simulation in terms of this code: sample(1:10, size =
15, replace = TRUE).
I do not know how to select specific variables only.
I'll appreciate the help
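
A hedged sketch (not from the original thread; the data frame mydata and
its columns person and property are made-up names): sample row indices
with replacement and keep only the two variables of interest.

# hypothetical data: which property each person owns
mydata <- data.frame(person   = 1:10,
                     property = sample(c("bicycle", "car", "house"),
                                       10, replace = TRUE))

# draw 15 rows with replacement, keeping only the two columns of interest
sim <- mydata[sample(nrow(mydata), size = 15, replace = TRUE),
              c("person", "property")]

# estimate, e.g., the proportion of sampled rows that come up with bicycles
mean(sim$property == "bicycle")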

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Simulation-tp3329173p3329173.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help

2011-03-01 Thread Laura Clasemann

Hi,
 
I've been working with R the past few days in trying to get a proper table set 
up but have not had any luck at all and the manuals I've read have not worked 
for me either. I was wondering if anyone here would be able to help me in 
setting this data up in R both as a table and as a bargraph as I have not been 
able to do it myself. If somebody could help me with doing this and send me an 
overview in a document on how it should look, I would really appreciate it. 
Thank you.
 
Laura
 
 
Diet       Binger-yes  Binger-No  Total
None               24        134    158
Healthy             9         52     61
Unhealthy          23         72     95
Dangerous          12         15     27



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Explained variance for ICA

2011-03-01 Thread Pavel Goldstein
Hello,
I am thinking of using the fastICA package for microarray data clustering,
but one question stops me: can I know how much variance each component
explains (or all the components together)?
I will be very thankful for the help.

Thanks,

Pavel

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help

2011-03-01 Thread Ivan Calandra

Hi Laura,

Have you read this documentation 
http://cran.r-project.org/doc/manuals/R-data.pdf ? If not, you should. 
Specifically, see read.table() and read.csv().
Once you have this, you can look for functions that import xls or
xlsx spreadsheets directly, but you're not that far yet.
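
For instance, a minimal sketch (assuming the table has been saved as a
comma-separated file called diet.csv with a header row; the file name and
column names are hypothetical):

diet <- read.csv("diet.csv", header = TRUE)  # columns: Diet, BingerYes, BingerNo, Total
counts <- as.matrix(diet[, c("BingerYes", "BingerNo")])
rownames(counts) <- diet$Diet
barplot(t(counts), beside = TRUE, legend.text = colnames(counts), xlab = "Diet")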


There is also plenty of documentation for beginners that you should
read; this one, for example, is always good:
http://www.burns-stat.com/pages/Tutor/hints_R_begin.html


Concerning your data, it already looks good (except for the :). R can 
work with Excel spreadsheets without problems.


I think that if you read that, you will know how to do barplots too.
If you then have problems using some specific functions or doing some 
specific task, you can ask the list again, with a reproducible example, 
with the code you tried, with the error/warning you got, and with the 
desired output if relevant. For this, read the posting guide if you 
still haven't.


Now good luck with the readings,
Ivan


On 3/1/2011 05:05, Laura Clasemann wrote:

Hi,

I've been working with R the past few days in trying to get a proper table set 
up but have not had any luck at all and the manuals I've read have not worked 
for me either. I was wondering if anyone here would be able to help me in 
setting this data up in R both as a table and as a bargraph as I have not been 
able to do it myself. If somebody could help me with doing this and send me an 
overview in a document on how it should look, I would really appreciate it. 
Thank you.

Laura


Diet       Binger-yes  Binger-No  Total
None               24        134    158
Healthy             9         52     61
Unhealthy          23         72     95
Dangerous          12         15     27




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulation

2011-03-01 Thread Ivan Calandra

Well, knowing what your data look like would definitely help!
Say your data object is called mydata, just paste the output from 
dput(mydata) into the email you want to send to the list.

Ivan

On 3/1/2011 04:18, bwaxxlo wrote:

I tried looking for help but I couldn't locate the exact solution.
I have data that has several variables. I want to do several sample
simulations using only two of the variables (eg: say you have data between
people and properties owned. You only want to check how many in the samples
will come up with bicycles) to estimate probabilities and that sort of
thing.
Now, I can only do a simulation in terms of this code: sample(1:10, size =
15, replace = TRUE).
I do not know how to select specific variables only.
I'll appreciate the help



--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] stuk at another point: simple question

2011-03-01 Thread Dennis Murphy
Hi:

This is *really* ugly, but given the number of variables you have in mind,
it seems rather necessary, at least to me.

# Given an original data frame dataf, pick out the variables that start with M:
Mvars <- names(dataf)[grep('^M', names(dataf))]
# Find the largest marker number among those variables:
n <- max(as.numeric(sapply(strsplit(Mvars, ''), function(x) unlist(x)[2])))

# Generate all the new variables, similar to those in the transform()
# statement used to get df2. In your example, you showed the new variables
# for one pair - this extends to all pairs, repeated up to the largest
# parent number
newvars <- c(paste('m', rep(1:max(c(Parent1, Parent2), na.rm = TRUE), each = n),
                   rep(c('a', 'b')), rep(c('p1', 'p2'), each = 2), sep = ''))

# Generate a two column matrix of the M*a and M*b names
v <- matrix(names(dataf)[-(1:4)], ncol = 2, byrow = TRUE)

# cbind v with itself, transpose the result and collapse it to a vector
vnames <- as.vector(t(cbind(v, v)))

# vnames and newvars should have the same length

# Generate a character string comprising the set of calls,
# extending the call you used to get the new variables in df2
nvarx <- paste(newvars, '=', vnames, '[', rep(c('Parent1', 'Parent2'), each = 2),
               ']', sep = '')
nvarx.str <- paste(nvarx, collapse = ', ')
# Generate the text string that comprises the call to transform()
txtexp <- paste('transform(dataf, ', nvarx.str, ')', sep = '')

# Parse the text string in txtexp and evaluate it to get the extended
# data frame df3
df3 <- eval(parse(text = txtexp))

# Next, we work on producing the averages. To do this, we first generate
# a matrix of names, then write a function for each row of the matrix that
#  * produces text strings that select the proper names for each of the
#    calls in hP1, hP2, t1 and t2
#  * evaluates the parsed strings
#  * produces the average

# Matrix of names for each parent
namesmat <- matrix(c(paste('M', 1:n, 'a', sep = ''), paste('M', 1:n, 'b', sep = ''),
                     paste('m', 1:n, 'ap1', sep = ''), paste('m', 1:n, 'bp1', sep = ''),
                     paste('m', 1:n, 'ap2', sep = ''), paste('m', 1:n, 'bp2', sep = '')),
                   nrow = n)

# Function to apply to each row of the above matrix
findavg <- function(x)  {
   # x is an input string of variable names
   # we construct the calls as strings and then evaluate them
   # final output is the average
   x   <- as.vector(x)
   hP1 <- eval(parse(text = 'as.numeric(df3[x[3]] != df3[x[4]])'))
   hP2 <- eval(parse(text = 'as.numeric(df3[x[5]] != df3[x[6]])'))
   t1  <- eval(parse(text = 'as.numeric(df3[x[1]] != df3[x[3]])'))
   t2  <- eval(parse(text = 'as.numeric(df3[x[2]] != df3[x[5]])'))
   C   <- (hP1*(t1 - 0.25) + hP2*(t2 - 0.25))
   yv  <- df3$y
   mean(C * yv, na.rm = TRUE)
}

# Apply it to the matrix of names:
apply(namesmat, 1, findavg)

# For the example data given,
[1] -1.17  6.50 -1.17  2.916667

Please double check this on your example below to make sure it's doing the
right thing - I didn't check whether or not the averages were right.

After all that eval(parse(text = *))ing, I need a shower...I feel dirty :)
If there's a better way, I'd love to see it.

HTH,
Dennis


On Mon, Feb 28, 2011 at 5:00 AM, Umesh Rosyara rosyar...@gmail.com wrote:

  Dear R-community members.

 I really appreciate the R-help group. Dennis has been extremely helpful in
 solving some of my questions. I am following Dennis's recommendation in the
 following email, yet I am stuck at another point (I hope this will take me to
 the end of this project).

 Ind <- c(1:5)
 Parent1 <- c(NA,NA,1,1,3)
 Parent2 <- c(NA,NA,2,2,4)
 y <- c(6,5,8,10,7)
 M1a <- c(1,2,1,1,1)
 M1b <- c(1,2,2,2,1)
 M2a <- c(3,3,1,1,3)
 M2b <- c(1,1,3,3,3)
 M3a <- c(4,4,4,4,4)
 M3b <- c(4,4,1,1,4)
 M4a <- c(1,4,4,1,4)
 M4b <- c(4,4,4,4,4)

 dataf <- data.frame(Ind, Parent1, Parent2, y, M1a, M1b, M2a, M2b,
                     M3a, M3b, M4a, M4b) # I have more than 1000 variable pairs

 # pair1 (M1a,M1b) pair2 (M2a, M2b), pair3 (M3a, M3b)...

 df2 <- transform(dataf, m1ap1 = dataf$M1a[dataf$Parent1],
                  m1bp1 = dataf$M1b[dataf$Parent1],
                  m1ap2 = dataf$M1a[dataf$Parent2],
                  m1bp2 = dataf$M1b[dataf$Parent2])
 # downstream calculations
  hP1 <- ifelse(df2$m1ap1==df2$m1bp1, 0, 1)
  hP2 <- ifelse(df2$m1ap2==df2$m1bp2, 0, 1)
  t1 <- ifelse(df2$M1a==df2$m1ap1, 1, 0)
  t2 <- ifelse(df2$M1b==df2$m1ap2, 1, 0)
  C <- (hP1*(t1-0.25) + hP2*(t2-0.25))
  yv <- df2$y
  Cy <- C*yv
  avgCy <- mean(Cy, na.rm=T)
  avgCy # I want to store this value in a new data frame with the first model, i.e.:


 How can I loop the process to output the second pair (here M2a, M2b), the
 third pair (here M3a, M3b), and so on for all pairs (I have more than 1000)?

 Model  avgCy
 1      1.75  # from pair M1a and M1b
 2            # from pair M2a and M2b
 3            # from pair M3a and M3b
 4            # from pair M4a and M4b

 to the end of the file

 Thank you in advance

 Umesh R

  --
 *From:* Dennis Murphy [mailto:djmu...@gmail.com]
 

Re: [R] tricky (for me) merging of data...more clarity

2011-03-01 Thread Grant Gillis
Hi Again,

Thanks very much for your response.  It seems my example got rearranged
(transposed?) after I posted it.  Hopefully this example will be more
clear.  I have one file (ex. sheet 1) that will have a column for
individuals (ind) and a column for the date (date).  I would like to merge
this with another file (ex. sheet 2) that has both the 'ind' and date column
as well as associated body condition measurements (BC1 and BC2).

My problem: the body condition values were measured intermittently
throughout an individual's history, but for our purposes we would like to
treat them as representative of 15 days before and after measurement; days
outside of this window should have NAs (there are other data associated
with those days). When I merge these two files, is there a way to carry
these body condition values forward and back 15 days (a rough sketch follows
the example sheets below)? This would give me something that looks like sheet 3.

Thank you!


Sheet 1


ind date
1 01/02/87
1 02/02/87
1 03/02/87
1 04/02/87
1 05/02/87
1 06/02/87
1 07/02/87
1 08/02/87
1 09/02/87
1 10/02/87
1 11/02/87
1 12/02/87
1 13/02/87
1 14/02/87
1 15/02/87
1 16/02/87
1 17/02/87
1 18/02/87
1 19/02/87
1 20/02/87
1 21/02/87
1 22/02/87
1 23/02/87
1 24/02/87
1 25/02/87
1 26/02/87
1 27/02/87
1 28/02/87
1 01/03/87
1 02/03/87
1 03/03/87
1 04/03/87
1 05/03/87
1 06/03/87
1 07/03/87
1 08/03/87
1 09/03/87
1 10/03/87
1 11/03/87
1 12/03/87
1 13/03/87
1 14/03/87
1 15/03/87
1 16/03/87
1 17/03/87
1 18/03/87
1 19/03/87
1 20/03/87
1 21/03/87
1 22/03/87
1 23/03/87
1 24/03/87

Sheet 2

ind  Date      BC1  BC2
1    01/02/87  33   3
1    03/03/87  44   3


Sheet 3
ind date BC1 BC2
1 01/02/87 33 3
1 02/02/87 33 3
1 03/02/87 33 3
1 04/02/87 33 3
1 05/02/87 33 3
1 06/02/87 33 3
1 07/02/87 33 3
1 08/02/87 33 3
1 09/02/87 33 3
1 10/02/87 33 3
1 11/02/87 33 3
1 12/02/87 33 3
1 13/02/87 33 3
1 14/02/87 33 3
1 15/02/87 33 3
1 16/02/87 NA NA
1 17/02/87 NA NA
1 18/02/87 44 3
1 19/02/87 44 3
1 20/02/87 44 3
1 21/02/87 44 3
1 22/02/87 44 3
1 23/02/87 44 3
1 24/02/87 44 3
1 25/02/87 44 3
1 26/02/87 44 3
1 27/02/87 44 3
1 28/02/87 44 3
1 01/03/87 44 3
1 02/03/87 44 3
1 03/03/87 44 3
1 04/03/87 44 3
1 05/03/87 44 3
1 06/03/87 44 3
1 07/03/87 44 3
1 08/03/87 44 3
1 09/03/87 44 3
1 10/03/87 44 3
1 11/03/87 44 3
1 12/03/87 44 3
1 13/03/87 44 3
1 14/03/87 44 3
1 15/03/87 44 3
1 16/03/87 44 3
1 17/03/87 44 3
1 18/03/87 44 3
1 19/03/87 44 3
1 20/03/87 44 3
1 21/03/87 NA NA
1 22/03/87 NA NA
1 23/03/87 NA NA
1 24/03/87 NA NA
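
A rough sketch of one way to do the windowed merge (my own suggestion,
written against made-up data frames sheet1 and sheet2 with the columns
shown above; for each day it copies the nearest measurement taken within
15 days, otherwise the BC columns stay NA):

sheet1$date <- as.Date(sheet1$date, format = "%d/%m/%y")
names(sheet2)[names(sheet2) == "Date"] <- "date"
sheet2$date <- as.Date(sheet2$date, format = "%d/%m/%y")

# exact-date merge first; unmatched days get NA for BC1 and BC2
out <- merge(sheet1, sheet2, by = c("ind", "date"), all.x = TRUE)

# fill each NA day from the nearest measurement within +/- 15 days
for (i in which(is.na(out$BC1))) {
  d  <- abs(as.numeric(out$date[i] - sheet2$date))
  ok <- which(sheet2$ind == out$ind[i] & d <= 15)
  if (length(ok)) {
    j <- ok[which.min(d[ok])]
    out[i, c("BC1", "BC2")] <- sheet2[j, c("BC1", "BC2")]
  }
}

The exact window arithmetic (inclusive or not) may need adjusting to
reproduce sheet 3 precisely.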



On 27 February 2011 20:49, Tal Galili tal.gal...@gmail.com wrote:

 Hi Grant,
 I don't have a solution, but just to be clearer on your situation:

 One row from sheet 2 looks like this:
 BC1 BC2  1 01/02/87 33 3  1 03/03/87 44 3
 ?

 Are you using only the first 6 columns for the data to be replicated, and
 using the other columns as some sort of indicators on when a sequence ends?




 If so, I would suggest asking the group how you might be able to turn sheet
 2 so that it will have as many rows as you needs (which you will then merge
 with sheet 1).

 And for clarity sake, consider using ?dput for your objects.
 Looking at data the way you pasted them (also, without column names) is not
 very easy for the reader (which might reduce your chances of getting help).

 Cheers,
 Tal

 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)

 --





 On Sun, Feb 27, 2011 at 6:41 PM, Grant Gillis grant.j.gil...@gmail.comwrote:

 BC1 BC2  1 01/02/87 33 3  1 03/03/87 44 3




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to Save R library data into xls or dta format

2011-03-01 Thread JoonGi

Thanks in advance.

I'm having a trouble with data saving.

I want to run the same data which is in Ecdat library at different statistic
programs(excel, stata and matlab)

The data I want to use is

library(Ecdat)
data(Housing)

and I want to extract this data out of R as *.dta and *.xls formats.
So, my first try was to open the data in R window and drag and paste to
excel or notepad.

BUT, it didn't work.

Do you have any decent skills to extract this library data?

Please, share to me.



-- 
View this message in context: 
http://r.789695.n4.nabble.com/How-to-Save-R-library-data-into-xls-or-dta-format-tp3329489p3329489.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to Save R library data into xls or dta format

2011-03-01 Thread Santosh Srinivas
for excel .. see library(xlsx)
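
A hedged sketch of both exports, in case it helps (file names are
arbitrary; write.xlsx() is in the xlsx package, write.dta() in the
foreign package):

library(Ecdat)
data(Housing)

library(foreign)
write.dta(Housing, file = "Housing.dta")     # Stata format

library(xlsx)
write.xlsx(Housing, file = "Housing.xlsx")   # Excel (xlsx) format

write.csv(Housing, file = "Housing.csv", row.names = FALSE)  # csv, also opens in Excel/Matlab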

On Tue, Mar 1, 2011 at 2:57 PM, JoonGi joo...@hanmail.net wrote:

 Thanks in advance.

 I'm having a trouble with data saving.

 I want to run the same data which is in Ecdat library at different statistic
 programs(excel, stata and matlab)

 The data I want to use is

 library(Ecdat)
 data(Housing)

 and I want to extract this data out of R as *.dta and *.xls formats.
 So, my first try was to open the data in R window and drag and paste to
 excel or notepad.

 BUT, it didn't work.

 Do you have any decent skills to extract this library data?

 Please, share to me.



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-Save-R-library-data-into-xls-or-dta-format-tp3329489p3329489.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RWinEdt difficulties

2011-03-01 Thread John Seers
Hello Everyone

I have just upgraded my PC to Windows 7 (64 bit) and I have installed R
2.12.2. R seems to be working fine.

I am having problems getting RWinEdt working with it though.

I have tried installing WinEdt 6.0 and WinEdt 5.5. But both fail with the
same error using R as 64 bit or 32 bit. I install the package using
Administrator rights.


 library(RWinEdt)
Warning message:
In shell(paste(\\, .gW$InstallRoot, \\WinEdt.exe\ -C=\R-WinEdt\
-E=,  :
  'C:\Program Files (x86)\WinEdt Team\WinEdt 6\WinEdt.exe -C=R-WinEdt
-E=C:\Program Files (x86)\WinEdt Team\WinEdt 6\R.ini' execution failed
with error code 1


Does it matter if you are using 64 bit R and the 32 bit WinEdt? (I have
tried 32 bit R).

Does RWinEdt work with WinEdt 6.0?

Can anybody suggest a solution?

Thanks for any help.

Regards

John Seers



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help

2011-03-01 Thread Martyn Byng
Hi,

It is a bit unclear what it is you are trying to do. As mentioned in
replies by a variety of people previously, if you are just trying to get
your data into R and label rows / columns, then

tt = matrix(c(24,134,158,9,52,61,23,72,95,12,15,27), ncol=3, byrow=T)
rownames(tt) = c("None", "Healthy", "Unhealthy", "Dangerous")
colnames(tt) = c("Binger-yes", "Binger-no", "Total")

is sufficient, and bar charts can be created using the barplot command,
for example:

barplot(t(tt[,1:2]), legend.text=colnames(tt[,1:2]), xlab="Diet")

If you are trying to do something other than this then you will need to
supply some additional information

Martyn

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Laura Clasemann
Sent: 01 March 2011 04:05
To: r-help@r-project.org
Subject: [R] Help


Hi,
 
I've been working with R the past few days in trying to get a proper
table set up but have not had any luck at all and the manuals I've read
have not worked for me either. I was wondering if anyone here would be
able to help me in setting this data up in R both as a table and as a
bargraph as I have not been able to do it myself. If somebody could help
me with doing this and send me an overview in a document on how it
should look, I would really appreciate it. Thank you.
 
Laura
 
 
Diet       Binger-yes  Binger-No  Total
None               24        134    158
Healthy             9         52     61
Unhealthy          23         72     95
Dangerous          12         15     27



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] getting attributes of list without the names.

2011-03-01 Thread Duncan Murdoch

On 11-02-28 11:17 PM, Jeroen Ooms wrote:

I am trying to encode arbitrary S3 objects by recursively looping over the
object and all its attributes. However, there is an unfortunate feature of
the attributes() function that is causing trouble. From the manual for
?attributes:

The names of a pairlist are not stored as attributes, but are reported as if
they were (and can be set by the replacement method for attributes).

Now because of this, my program ends up in infinite recursion, because it
will try to encode
attributes(attributes(attributes(attributes(list(foo=123 etc. I can't
remove the 'names' attribute, because this will actually affect the list
structure. And even when I do:

attributes(attributes(obj)[names(attributes(obj)) != "names"])

This will keep giving me a named list. Is there any way I can get the
attributes() of a list without it reporting the names of a list as
attributes? I.e it should hold that:

atr1 <- attributes(list(foo="bar"));
atr2 <- attributes(list());
identical(atr1, atr2);


The names of a list (a generic vector) are attributes, just like the 
names of other vectors.  The documentation is talking about pairlists, a 
mostly internal structure, used for example to store parts of 
expressions.  So your premise might be wrong about the cause of the 
recursion...


But assuming you really want to see all attributes except names.  Then 
just write your own version:


nonameattributes <- function(obj) {
  result <- attributes(obj)
  if (!is.null(result$names))
    result$names <- NULL

  # This removes the empty names of the result if there were no other
  # attributes.  It's optional, but you said you wanted
  # identical(atr1, atr2)

  if (!length(result))
    names(result) <- NULL

  result
}

You can make the conditional more complicated, only making the change 
for pairlists, etc., using tests on typeof(obj) or other tests.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to Save R library data into xls or dta format

2011-03-01 Thread Prof Brian Ripley

On Tue, 1 Mar 2011, Santosh Srinivas wrote:


for excel .. see library(xlsx)


(That's for Excel >= 2007 only; .xlsx, not the .xls that was requested.)

Simply consult the relevant manual, 'R Data Import/Export': all of 
this is covered there.  There is a very new package XLConnect that is 
only covered in the latest version (post R 2.12.2).




On Tue, Mar 1, 2011 at 2:57 PM, JoonGi joo...@hanmail.net wrote:


Thanks in advance.

I'm having a trouble with data saving.

I want to run the same data which is in Ecdat library at different statistic
programs(excel, stata and matlab)

The data I want to use is

library(Ecdat)
data(Housing)

and I want to extract this data out of R as *.dta and *.xls formats.
So, my first try was to open the data in R window and drag and paste to
excel or notepad.

BUT, it didn't work.

Do you have any decent skills to extract this library data?

Please, share to me.


We already did: we wrote a manual for you!

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] df.residual for rlm()

2011-03-01 Thread Jan
Hello,

for testing coefficients of lm(), I wrote the following function (with
the kind support of this mailing list):

# See Verzani, simpleR (pdf), p. 80
coeff.test <- function(lm.result, idx, value) {
  # idx = 1 is the intercept, idx > 1 the other coefficients
  # null hypothesis: coeff = value
  # alternative hypothesis: coeff != value
  coeff <- coefficients(lm.result)[idx]
  SE <- coefficients(summary(lm.result))[idx, "Std. Error"]
  n <- df.residual(lm.result)
  t <- (coeff - value)/SE
  2 * pt(-abs(t), n) # times two because the problem is two-sided
}

This works fine for lm() objects, but fails for rlm() because
df.residual() is NA. 

Can I get the degrees of freedom by calculating 

n = length(lm.result) - length(coefficients(lm.result))

Thanks for any help!
Jan
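
A hedged sketch of one workaround (my own assumption, not a statement
about rlm internals): fall back to the number of residuals minus the
number of estimated coefficients when df.residual() gives NA.

resid.df <- function(fit) {
  df <- df.residual(fit)
  if (is.null(df) || is.na(df))
    df <- length(residuals(fit)) - length(coefficients(fit))
  df
}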

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] which does the S.D. returned by {Hmisc} rcorr.cens measure?

2011-03-01 Thread vikkiyft
Dear R-help,

This is an example in the {Hmisc} manual under rcorr.cens function:

 set.seed(1)
 x <- round(rnorm(200))
 y <- rnorm(200)
 round(rcorr.cens(x, y, outx=F), 4)
       C Index            Dxy           S.D.              n        missing
        0.4831        -0.0338         0.0462       200.0000         0.0000
    uncensored Relevant Pairs     Concordant      Uncertain
      200.0000     39800.0000     19228.0000         0.0000

That S.D. confuses me!!
It is obviously not the standard deviation of x or y.. but there is only one
realization of the c-index or Dxy for this sample dataset, where does the
variation come from..??  if I use the conventional formula for calculating
the standard deviation of proportions: sqrt((C Index)*(1-C Index)/n), I get
0.0353 instead of 0.0462..

Any advice is appreciated.


Vikki

-- 
View this message in context: 
http://r.789695.n4.nabble.com/which-does-the-S-D-returned-by-Hmisc-rcorr-cens-measure-tp3329609p3329609.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Is there any Command showing correlation of all variables in a dataset?

2011-03-01 Thread JoonGi

Thanks in advance.

I want to derive correlations of variables in a dataset

Specifically

library(Ecdat)
data(Housing)
attach(Housing)
cor(lotsize, bathrooms)

this code gives only the correlation between two variables.
But I want to examine all the combinations of variables in this dataset,
and I will finally make a table in LaTeX.

How can I test correlations for all combinations of variables?
with one simple command?


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Is-there-any-Command-showing-correlation-of-all-variables-in-a-dataset-tp3329599p3329599.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] bootstrap resampling question

2011-03-01 Thread Bodnar Laszlo EB_HU
Hello there,

I have a problem concerning bootstrapping in R - especially focusing on the 
resampling part of it. I try to sum it up in a simplified way so that I would 
not confuse anybody.

I have a small database consisting of 20 observations (basically numbers from 1 
to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).

I would like to resample this database many times for the bootstrap process 
with the following two conditions. The resampled databases should also have 20 
observations and you can select each of the previously mentioned 20 numbers 
with replacement. I guess it is obvious so far. Now the more difficult second 
condition is that one number can be selected only maximum 5 times. In order to 
make this clear I try to show you an example. So there can be resampled 
databases like the following ones:

(1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
(4 different numbers are chosen, each selected 5 times)

(2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
(Two numbers - 8 and 6 - selected 5 times, number 1 selected four times, the 
others selected less than 4 times)

My very first guess was the sample function, where there are settings like
replace=TRUE and prob=..., with which you can create a probability vector,
i.e. how large the probability of selecting each number should be. So I tried
to calculate probabilities first. I thought the problem can basically be
described as a k-combination with repetitions. Unfortunately, the only thing I
could calculate so far is the total number of all possible selections, which
amounts to 137 846 527 049.

Anybody knows how to implement my second tricky condition into one of the R 
functions? Are 'boot' and 'bootstrap' packages capable of managing this? I 
guess they are, I just couldn't figure it out yet...

Thanks very much! Best regards,
Laszlo Bodnar
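
One simple way to impose the cap, sketched below as rejection sampling
(my own suggestion, not from the thread): redraw the whole resample until
no value occurs more than five times.

x <- 1:20

sample.capped <- function(x, size = length(x), cap = 5) {
  repeat {
    s <- sample(x, size, replace = TRUE)
    if (max(table(s)) <= cap) return(s)  # keep only resamples respecting the cap
  }
}

# e.g. 1000 capped bootstrap resamples, one per column
boot.samples <- replicate(1000, sample.capped(x))

With 20 equally likely values, a cap of 5 and resamples of size 20,
rejections should be rare, so the loop is cheap in practice.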



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SetInternet2, RCurl and proxy

2011-03-01 Thread Manta
Dear all,

I am facing a problem. I am trying to install packages through a proxy, but I
am not able to call the setInternet2 function, either with a lowercase or a
capital 's'. Which package do I have to load, then? And could there be a
reason why this does not work?

Thanks,
Marco
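
As far as I know, setInternet2() only exists in R for Windows (it is in
the utils package there); elsewhere, or as an alternative, the proxy can
be set through environment variables before installing. A hedged sketch
(host, port and credentials are placeholders):

# Windows only: use the Internet Explorer proxy settings
utils::setInternet2(TRUE)

# any platform: point R at the proxy explicitly
Sys.setenv(http_proxy = "http://user:password@proxy.example.com:8080/")
install.packages("RCurl")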

-- 
View this message in context: 
http://r.789695.n4.nabble.com/SetInternet2-RCurl-and-proxy-tp3248576p3329624.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Robust variance estimation with rq (failure of the bootstrap?)

2011-03-01 Thread James Shaw
Matt:

Thanks for your prompt reply.

The disparity between the bootstrap and sandwich variance estimates
derived when modeling the highly skewed outcome suggest that either
(A) the empirical robust variance estimator is underestimating the
variance or (B) the bootstrap is breaking down.  The bootstrap
variance estimate of a robust location estimate is not necessarily
robust, see Statistics & Probability Letters 50 (2000) 49-53.  Since
submitting my earlier post, I have noticed that the the robust kernel
variance estimate is similar to the bootstrap estimate.  Under what
conditions would one expect Koenker and Machado's sandwich variance
estimator, which uses a local estimate of the sparsity, to fail?

--
Jim



On Mon, Feb 28, 2011 at 8:59 PM, Matt Shotwell m...@biostatmatt.com wrote:
 Jim,

 If repeated measurements on patients are correlated, then resampling all
 measurements independently induces an incorrect sampling distribution
 (= incorrect variance) on a statistic of these data. One solution, as
 you mention, is the block or cluster bootstrap, which preserves the
 correlation among repeated observations in resamples. I don't
 immediately see why the cluster bootstrap is unsuitable.

 Beyond this, I would be concerned about *any* variance estimates that
 are blind to correlated observations.

 The bootstrap variance estimate may be larger than the asymptotic
 variance estimate, but that alone isn't evidence to favor one over the
 other.

 Also, I can't justify (to myself) why skew would hamper the quality of
 bootstrap variance estimates. I wonder how it affects the sandwich
 variance estimate...

 Best,
 Matt

 On Mon, 2011-02-28 at 17:50 -0600, James Shaw wrote:
 I am fitting quantile regression models using data collected from a
 sample of 124 patients.  When modeling cross-sectional associations, I
 have noticed that nonparametric bootstrap estimates of the variances
 of parameter estimates are much greater in magnitude than the
 empirical Huber estimates derived using summary.rq's nid option.
 The outcome variable is severely skewed, and I am afraid that this may
 be affecting the consistency of the bootstrap variance estimates.  I
 have read that the m out of n bootstrap can be used to overcome this
 problem.  However, this procedure requires both the original sample
 (n) and the subsample (m) sizes to be large.  The version implemented
 in rq.boot does not appear to provide any improvement over the naive
 bootstrap.  Ultimately, I am interested in using median regression to
 model changes in the outcome variable over time.  Summary.rq's robust
 variance estimator is not applicable to repeated-measures data.  I
 question whether the block (cluster) bootstrap variance estimator,
 which can accommodate intraclass correlation, would perform well.  Can
 anyone suggest alternatives for variance estimation in this situation?
 Regards,

 Jim


 James W. Shaw, Ph.D., Pharm.D., M.P.H.
 Assistant Professor
 Department of Pharmacy Administration
 College of Pharmacy
 University of Illinois at Chicago
 833 South Wood Street, M/C 871, Room 266
 Chicago, IL 60612
 Tel.: 312-355-5666
 Fax: 312-996-0868
 Mobile Tel.: 215-852-3045

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.






-- 
James W. Shaw, Ph.D., Pharm.D., M.P.H.
Assistant Professor
Department of Pharmacy Administration
College of Pharmacy
University of Illinois at Chicago
833 South Wood Street, M/C 871, Room 266
Chicago, IL 60612
Tel.: 312-355-5666
Fax: 312-996-0868
Mobile Tel.: 215-852-3045

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there any Command showing correlation of all variables in a dataset?

2011-03-01 Thread Mike Marchywka












 Date: Tue, 1 Mar 2011 02:41:00 -0800
 From: joo...@hanmail.net
 To: r-help@r-project.org
 Subject: [R] Is there any Command showing correlation of all variables in a 
 dataset?


 Thanks in advance.

 I want to derive correlations of variables in a dataset

 Specifically

 library(Ecdat)
 data(Housing)
 attach(Housing)
 cor(lotsize, bathrooms)

 this code gives only the correlation between two variables.
 But I want to examine all the combinations of variables in this dataset,
 and I will finally make a table in LaTeX.

 How can I test correlations for all combinations of variables?
 with one simple command?


This appears to work as expected, although I don't have enough
intuition to know whether these numbers are plausible beyond a rough
guess (the diagonal, of course, is one LOL).

 df <- data.frame(a=rnorm(100), b=runif(100), c=runif(100)+rnorm(100))
 str(df)
'data.frame':   100 obs. of  3 variables:
 $ a: num  0.1841 0.2296 -1.2251 0.0898 -0.961 ...
 $ b: num  0.586 0.343 0.821 0.41 0.352 ...
 $ c: num  -0.373 2.225 0.102 1.186 -0.737 ...
 cor(df)
           a           b           c
a 1.00000000  0.07710107  0.11088579
b 0.07710107  1.00000000 -0.02424471
c 0.11088579 -0.02424471  1.00000000
 ?cor


 df <- data.frame(a=.1*rnorm(100), b=(1:100)/100, c=(1:100)/100+.1*rnorm(100))
 cor(df)
             a           b            c
a  1.000000000 -0.01970874 -0.003665239
b -0.019708737  1.00000000  0.950375445
c -0.003665239  0.95037544  1.000000000
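
To extend this to the Housing data and a LaTeX table, a hedged sketch
(xtable is just one of several packages that can do the last step):

library(Ecdat)
data(Housing)

num <- Housing[sapply(Housing, is.numeric)]  # cor() needs numeric columns only
cm  <- cor(num)
cm

library(xtable)
print(xtable(cm, digits = 3))                # LaTeX code for the correlation table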




  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Hi

2011-03-01 Thread preeja
Hi,
I am facing a problem with classification based on graph kernels. We calculated
the kernel between two molecule data sets. Then I am confused about the
classification step.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Hi-tp3329650p3329650.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulation

2011-03-01 Thread Mike Marchywka






 Date: Mon, 28 Feb 2011 19:18:18 -0800
 From: kadodamb...@hotmail.com
 To: r-help@r-project.org
 Subject: [R] Simulation

 I tried looking for help but I couldn't locate the exact solution.
 I have data that has several variables. I want to do several sample
 simulations using only two of the variables (eg: say you have data between
 people and properties owned. You only want to check how many in the samples
 will come up with bicycles) to estimate probabilities and that sort of
 thing.
 Now, I can only do a simulation in terms of this code: sample(1:10, size =
 15, replace = TRUE).
 I do not know how to select specific variables only.
 I'll appreciate the help



This is probably not the best R, but you can do something like either of these.
Note that this is just the easiest derivative of stuff I already had
and can be adapted to your needs; I usually use runif instead of sample, for
example. The first example is probably much less efficient than the second:


df <- data.frame(a=.1*rnorm(100), b=(1:100)/100, c=(1:100)/100+.1*rnorm(100))

res=1:100; for ( i in 1:100) {res[i]=cor(df[which(runif(100) < .9),])[1,3] }
res
hist(res)
res=1:100; for ( i in 1:100) {wh=which(runif(100) < .9); res[i]=cor(df$a[wh],df$c[wh]); }
res






 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Simulation-tp3329173p3329173.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Components of variance with lme

2011-03-01 Thread John Sorkin
R 2.10
Windows Vista

Is it possible to run a variance-components analysis using lme? I looked at 
Pinheiro and Bates' book and don't see code that will perform these analyses. 
If the analyses can not be done using lme, what package might I try?
Thanks,
John


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested case-control study

2011-03-01 Thread Terry Therneau
1. Using offset(logweight) in coxph is the same as using an "offset
logweight;" statement in SAS, and neither is the same as case weights.

2. For a nested case-control, which is what you said you have, the
strata controls who is in what risk set.  No trickery with start, stop
times is needed.  It does no harm, but is not needed.
  In a case-cohort design the start = stop - epsilon trick is one way to
set risk sets up properly.  But that's not the design you gave for your
study.

Terry T.
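
A hedged sketch of the plain nested case-control fit described above
(variable names are invented: 'set' is the matched set identifier,
'case' the 0/1 case indicator, 'ncc' the data frame):

library(survival)

# conditional logistic likelihood via coxph: dummy follow-up time of 1,
# risk sets defined by the matched set
fit <- coxph(Surv(rep(1, nrow(ncc)), case) ~ exposure + strata(set), data = ncc)
summary(fit)

clogit(case ~ exposure + strata(set), data = ncc), from the same package,
should give an equivalent fit.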



On Mon, 2011-02-28 at 17:02 -0800, array chip wrote:
 Terry, thanks very much! 
 
 Professor Langholz used a SAS software trick to estimate absolute risk
 by creating a fake variable entry_time that is 0.001 less than the
 variable exit_time (i.e. time to event), and then use both variables
 in Phreg. Is this equivalent to your creating a dummy survival with
 time=1?
 
 Another question is, is using offset(logweight) inside the formula of
 coxph() the same as using weight=logweight argument in coxph(),
 because my understanding of Professor Langholz's approach for nested
 case-control study is weighted regression?
 
 Thank you very much for the help.
 
 John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Components of variance with lme

2011-03-01 Thread Scott Chamberlain
VarCorr() and then calculate percentage variance for each component from output 
of VarCorr
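
A minimal sketch (model, variable and data names are made up; VarCorr()
on an lme fit returns the variances as character strings, hence the
as.numeric):

library(nlme)

# one random intercept per subject
fit <- lme(score ~ 1, random = ~ 1 | subject, data = dat)

vc <- VarCorr(fit)
v  <- as.numeric(vc[, "Variance"])  # between-subject and residual variance
round(100 * v / sum(v), 1)          # percentage of total variance per component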
On Tuesday, March 1, 2011 at 6:55 AM, John Sorkin wrote: 
 R 2.10
 Windows Vista
 
 Is it possible to run a variance-components analysis using lme? I looked at 
 Pinheiro and Bates' book and don't see code that will perform these analyses. 
 If the analyses can not be done using lme, what package might I try?
 Thanks,
 John
 
 
 John David Sorkin M.D., Ph.D.
 Chief, Biostatistics and Informatics
 University of Maryland School of Medicine Division of Gerontology
 Baltimore VA Medical Center
 10 North Greene Street
 GRECC (BT/18/GR)
 Baltimore, MD 21201-1524
 (Phone) 410-605-7119
 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mlogit.data

2011-03-01 Thread gmacfarlane
http://r.789695.n4.nabble.com/file/n3329821/workdata.csv workdata.csv 

The code I posted is exactly what I am running. What you need is this data.
Here is the code again.

 hbwmode <- mlogit.data(worktrips.csv, shape="long", choice="CHOSEN",
 alt.var="ALTNUM")
 hbwmode <- mlogit.data(hbwtrips, shape="long", choice="CHOSEN",
 alt.var="ALTNUM")

-- 
View this message in context: 
http://r.789695.n4.nabble.com/mlogit-data-tp3328739p3329821.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unicodepdf font problem RESOLVED

2011-03-01 Thread Ben Madin
Just to add to this (I've been looking through the archive) problem with 
display unicode fonts in pdf document in R

If you can use the Cairo package to create pdf on Mac, it seems quite happy 
with pushing unicode characters through (probably still font family dependant 
whether it will display)

probstring <- c(' \u2264 0.2',' \u2268 0.4',' \u00FC 0.6',' \u2264 0.8',' \u2264 1.0')
Cairo(type='pdf', file='outputs/demo.pdf', width=9, height=12,
units='in', bg='transparent')
plot(1:5,1:5, type='n')
text(1:5,1:5,probstring)
dev.off()

?Cairo suggests encoding is ignored if you do try to set it.

cheers

Ben



On 14/01/2011, at 7:00 PM, r-help-requ...@r-project.org wrote:

 Date: Thu, 13 Jan 2011 10:47:09 -0500
 From: David Winsemius dwinsem...@comcast.net
 To: Sascha Vieweg saschav...@gmail.com
 Cc: r-help@r-project.org
 Subject: Re: [R] unicodepdf font problem RESOLVED
 Message-ID: 74fa099f-4ce5-45c7-a05a-4a1de6c87...@comcast.net
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes
 
 
 On Jan 13, 2011, at 10:41 AM, Sascha Vieweg wrote:
 
 I have many German umlauts in my data sets and code them UTF-8. When  
 it comes to plotting on pdf, I figured out that CP1257 is a good  
 choice to output Umlauts. I have no experiences with CP1250, but  
 maybe this small hint helps:
 
 pdf(file=paste(sharepath, "/filename.pdf", sep=""), 9, 6,
 pointsize = 11, family = "Helvetica", encoding = "CP1257")
 
 Just an FYI for the archives, that encoding fails with
 pdf(encoding="CP1257") on a Mac when printing that target umlaut.
 
 David.
 
 *S*
 
 On 11-01-13 16:17, tde...@cogpsyphy.hu wrote:
 
 Date: Thu, 13 Jan 2011 16:17:04 +0100 (CET)
 From: tde...@cogpsyphy.hu
 To: David Winsemius dwinsem...@comcast.net
 Cc: r-help@r-project.org
 Subject: Re: [R] unicodepdf font problem RESOLVED
 
 Dear David,
 
 Thank you for your efforts. Inspired by your remarks, I started a new
 google-search and found this:
 http://stackoverflow.com/questions/3434349/sweave-not-printing-localized-characters
 
 SO HERE COMES THE SOLUTION (it works on both OSs):
 
 pdf.options(encoding = "CP1250")
 pdf()
 plot(1, type="n")
 text(1, 1, "\U0171")
 dev.off()
 
 CP1250 should work for all Central-European languages:
 http://en.wikipedia.org/wiki/Windows-1250
 
 
 Thank you again,
 Denes
 
 
 
 
 On Jan 13, 2011, at 7:01 AM, tde...@cogpsyphy.hu wrote:
 
 
 Hi!
 
 Sorry for the missing specs, here they are:
 version
 _
 platform   i386-pc-mingw32
 arch   i386
 os mingw32
 system i386, mingw32
 status
 major  2
 minor  12.1
 year   2010
 month  12
 day16
 svn rev53855
 language   R
 version.string R version 2.12.1 (2010-12-16)
 
 OS: Windows 7 (English version, 32 bit)
 
 
 
 You are after what Adobe calls: udblacute; 0171.  It is recognized  
 in
 the list of adobe glyphs:
 str(tools::Adobe_glyphs[371, ])
 'data.frame':  1 obs. of  2 variables:
 $ adobe  : chr "udblacute"
 $ unicode: chr "0171"
 
 Consulted the help pages
 points {graphics}
 postscript {grDevices}
 pdf {grDevices}
 charsets {tools}
 postscriptFonts {grDevices}
 
 I have tried a variety of the pdfFonts installed on my Mac without
 success. You can perhaps make a list of fonts on your machines with
 names(pdfFonts()). Perhaps the range of fonts and the glyphs they
 contain is different on your machines. I get consistently warning
 messages saying there is a conversion failure:
 
 pdf("trial.pdf", family="Helvetica")
 # also tried with font="Helvetica" but I think that is erroneous
 plot(1, type="n")
 text(1, 1, "print \U0170\U0171")
 Warning messages:
 1: In text.default(1, 1, print ) :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for c5
 2: In text.default(1, 1, print ) :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for b0
 3: In text.default(1, 1, print ) :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for c5
 4: In text.default(1, 1, print ) :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for b1
 5: In text.default(1, 1, print ) :
  font metrics unknown for Unicode character U+0170
 6: In text.default(1, 1, print ) :
  font metrics unknown for Unicode character U+0171
 7: In text.default(1, 1, print ) :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for c5
 8: In text.default(1, 1, print ) :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for b0
 9: In text.default(1, 1, print ) :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for c5
 10: In text.default(1, 1, print ) :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for b1
 
 And this is despite my system saying the \U0170 and \U0171 are  
 present
 in the Helvetica font. Also tried family="URWHelvetica" and
 family="NimbusSan" and a bunch of 

Re: [R] Explained variance for ICA

2011-03-01 Thread rex.dwyer
You determine the variance explained by *any* unit vector by taking its inner 
product with the data points, then finding the variance of the results.  In the 
case of FastICA, the variance explained by the ICs collectively is exactly the 
same as the variance explained by the principal components (collectively) from 
which they are derived.
HTH
Rex
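
A hedged sketch of that calculation (my own illustration; X stands for a
centred data matrix and w for a unit-length direction such as one of the
estimated IC directions):

set.seed(1)
X <- scale(matrix(rnorm(200 * 5), 200, 5), scale = FALSE)  # centre the columns

w <- rnorm(5); w <- w / sqrt(sum(w^2))  # some unit vector (e.g. an IC direction)

proj <- X %*% w                           # inner products with the data points
drop(var(proj)) / sum(apply(X, 2, var))   # share of total variance along w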

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Pavel Goldstein
Sent: Tuesday, March 01, 2011 1:24 AM
To: r-help@r-project.org
Subject: [R] Explained variance for ICA

Hello,
I am thinking of using the fastICA package for microarray data clustering,
but one question stops me: can I know how much variance each component
explains (or all the components together)?
I will be very thankful for the help.

Thanks,

Pavel

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] what does the S.D. returned by {Hmisc} rcorr.cens measure?

2011-03-01 Thread Frank Harrell
Vikki,

The formula you used for std. error of C is not correct.  C is not a simple
per-observation proportion.

SD in the output is the standard error of Dxy.  Dxy = 2(C - .5).  Backsolve
for std err of C.  

Variation in Dxy or C comes from the usual source: sampling variability. 
You can also see this by sampling from the original dataset (a la
bootstrap).

Frank
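
In code, a small sketch of that back-calculation (numbers taken from the
output above):

Dxy.se <- 0.0462     # the S.D. reported by rcorr.cens
C.se   <- Dxy.se/2   # since Dxy = 2*(C - 0.5), se(C) = se(Dxy)/2
C.se                 # 0.0231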


vikkiyft wrote:
 
 Dear R-help,
 
 This is an example in the {Hmisc} help manual in the section of
 rcorr.cens:
 
 set.seed(1)
 x <- round(rnorm(200))
 y <- rnorm(200)
 round(rcorr.cens(x, y, outx=F), 4)
        C Index            Dxy           S.D.              n        missing
         0.4831        -0.0338         0.0462       200.0000         0.0000
     uncensored Relevant Pairs     Concordant      Uncertain
       200.0000     39800.0000     19228.0000         0.0000
 
 That S.D. confuses me!!
 It is obviously not the standard deviation of x or y.. but there is only
 one realization of the c-index or Dxy for this sample dataset, where does
 the variation come from..??  if I use the conventional formula for
 calculating the standard deviation of proportions: sqrt((C Index)*(1-C
 Index)/n), I get 0.0353 instead of 0.0462..
 
 Any advice is appreciated.
 
 
 Vikki
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/what-does-the-S-D-returned-by-Hmisc-rcorr-cens-measure-tp3329609p3329899.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to Save R library data into xls or dta format

2011-03-01 Thread Gabor Grothendieck
On Tue, Mar 1, 2011 at 4:27 AM, JoonGi joo...@hanmail.net wrote:

 Thanks in advance.

 I'm having a trouble with data saving.

 I want to run the same data which is in Ecdat library at different statistic
 programs(excel, stata and matlab)

 The data I want to use is

 library(Ecdat)
 data(Housing)

 and I want to extract this data out of R as *.dta and *.xls formats.
 So, my first try was to open the data in R window and drag and paste to
 excel or notepad.


For excel see:
http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] High standard error

2011-03-01 Thread danielepippo
Hi to everyone,

 if the estimate of the parameter is 0.196 and its standard
error is 0.426, can I say that this parameter is not significant for the
model?


Thank you very much

Pippo
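
For what it is worth, a hedged sketch of the usual large-sample check
(this assumes an approximately normal estimator, which may or may not
hold for the unstated model):

est <- 0.196
se  <- 0.426
z   <- est/se         # about 0.46
2 * pnorm(-abs(z))    # two-sided p-value, roughly 0.65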

-- 
View this message in context: 
http://r.789695.n4.nabble.com/High-standard-error-tp3329903p3329903.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] High standard error

2011-03-01 Thread John Kane
Sure why not?

You do realize, do you not that no one has the slightest idea of what you are 
doing?

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.




--- On Tue, 3/1/11, danielepippo dan...@hotmail.it wrote:

 From: danielepippo dan...@hotmail.it
 Subject: [R] High standard error
 To: r-help@r-project.org
 Received: Tuesday, March 1, 2011, 9:07 AM
 Hi to everyone,
     
      if the estimate of the parameter is 0.196 and its standard
 error is 0.426, can I say that this parameter is not significant for the
 model?
 
 
 Thank you very much
 
 Pippo
 
 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/High-standard-error-tp3329903p3329903.html
 Sent from the R help mailing list archive at Nabble.com.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Robust variance estimation with rq (failure of the bootstrap?)

2011-03-01 Thread Matt Shotwell
Jim,

Thanks for pointing me to this article. The authors argue that the 
bootstrap intervals for a robust estimator may not be as robust as the 
estimator. In this context, robustness is measured by the breakdown 
point, which is supposed to measure robustness to outliers. Even so, the
authors found that the upper bound of a quantile bootstrap interval for 
the sample median was nearly as robust as the sample median. That brings
some comfort in using quantile bootstrap intervals in quantile
regression.

Does the sandwich estimator assume that errors are independent? And a 
related question: Does the rq function allow the user to specify 
clusters/grouping among the observations?

Best,
Matt

On Tue, 2011-03-01 at 05:35 -0600, James Shaw wrote:
 Matt:
 
 Thanks for your prompt reply.
 
 The disparity between the bootstrap and sandwich variance estimates
 derived when modeling the highly skewed outcome suggest that either
 (A) the empirical robust variance estimator is underestimating the
 variance or (B) the bootstrap is breaking down.  The bootstrap
 variance estimate of a robust location estimate is not necessarily
  robust, see Statistics & Probability Letters 50 (2000) 49-53.  Since
 submitting my earlier post, I have noticed that the the robust kernel
 variance estimate is similar to the bootstrap estimate.  Under what
 conditions would one expect Koenker and Machado's sandwich variance
 estimator, which uses a local estimate of the sparsity, to fail?
 
 --
 Jim
 
 
 
 On Mon, Feb 28, 2011 at 8:59 PM, Matt Shotwell m...@biostatmatt.com wrote:
  Jim,
 
  If repeated measurements on patients are correlated, then resampling all
  measurements independently induces an incorrect sampling distribution
  (= incorrect variance) on a statistic of these data. One solution, as
  you mention, is the block or cluster bootstrap, which preserves the
  correlation among repeated observations in resamples. I don't
  immediately see why the cluster bootstrap is unsuitable.
 
  Beyond this, I would be concerned about *any* variance estimates that
  are blind to correlated observations.
 
  The bootstrap variance estimate may be larger than the asymptotic
  variance estimate, but that alone isn't evidence to favor one over the
  other.
 
  Also, I can't justify (to myself) why skew would hamper the quality of
  bootstrap variance estimates. I wonder how it affects the sandwich
  variance estimate...
 
  Best,
  Matt
 
  On Mon, 2011-02-28 at 17:50 -0600, James Shaw wrote:
  I am fitting quantile regression models using data collected from a
  sample of 124 patients.  When modeling cross-sectional associations, I
  have noticed that nonparametric bootstrap estimates of the variances
  of parameter estimates are much greater in magnitude than the
  empirical Huber estimates derived using summary.rq's nid option.
  The outcome variable is severely skewed, and I am afraid that this may
  be affecting the consistency of the bootstrap variance estimates.  I
  have read that the m out of n bootstrap can be used to overcome this
  problem.  However, this procedure requires both the original sample
  (n) and the subsample (m) sizes to be large.  The version implemented
  in rq.boot does not appear to provide any improvement over the naive
  bootstrap.  Ultimately, I am interested in using median regression to
  model changes in the outcome variable over time.  Summary.rq's robust
  variance estimator is not applicable to repeated-measures data.  I
  question whether the block (cluster) bootstrap variance estimator,
  which can accommodate intraclass correlation, would perform well.  Can
  anyone suggest alternatives for variance estimation in this situation?
  Regards,
 
  Jim
 
 
  James W. Shaw, Ph.D., Pharm.D., M.P.H.
  Assistant Professor
  Department of Pharmacy Administration
  College of Pharmacy
  University of Illinois at Chicago
  833 South Wood Street, M/C 871, Room 266
  Chicago, IL 60612
  Tel.: 312-355-5666
  Fax: 312-996-0868
  Mobile Tel.: 215-852-3045
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 
 
 -- 
 James W. Shaw, Ph.D., Pharm.D., M.P.H.
 Assistant Professor
 Department of Pharmacy Administration
 College of Pharmacy
 University of Illinois at Chicago
 833 South Wood Street, M/C 871, Room 266
 Chicago, IL 60612
 Tel.: 312-355-5666
 Fax: 312-996-0868
 Mobile Tel.: 215-852-3045
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list

Re: [R] nls not solving

2011-03-01 Thread Schatzi
Here is a reply by Bert:
Yes you're right (I should have taken off my glasses and looked closer).
However, the argument is essentially the same:

Suppose you have a solution with a,b,k,l. Then for any positive c, [a+b-bc]
+ [bc] + (bc) *exp(kl')exp(-kx) is also a solution, where l'
= l - log(c)/k  .

Cheers,
Bert

(Feel free to post this correction if you like)


This is from me:
The problem with dropping the l parameter is that it is supposed to
account for the lag component. This equation was published in the literature
and has been solved in SAS. When I put it in Excel, it solves, but not
very well, as it comes to a different solution each time I change
the starting values. As such, I'm not sure how SAS solves for it and I'm not
sure what I should do about the equation. Maybe I should just drop the
parameter a. Thanks for the help.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/nls-not-solving-tp3328647p3329936.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Quantreg model error and goodness of fit

2011-03-01 Thread sadz a
Hi All,

I am using the package quantreg to create a 'model' that I can then use to
predict the response variable (volume) from a larger set of explanatory
variables (environmental factors):

i.e.:
# model
fit <- rqss(volume ~ qss(env.factor1, lambda=1) + qss(env.factor2, lambda=1),
tau = 0.9)
summary(fit)
# predict volume from new environmental factors for a larger area where I do
# not know the volume
predi <- predict(fit, new, interval = "none", level = 0.9)

However I am getting the following error message:
Error in predict.qss1(qss, newdata = newd, ...) :
no extrapolation allowed in predict.qss

Is there a way around this?

Also, for the initial model, is there a way to test goodness of fit? If
so, how? summary(fit) only tells me whether the terms are significant, not
how good the model is (with a linear regression you get an R-squared value,
which tells you how good the model is).

Thank you
All help is appreciated,
If you would like anything clarified please contact me,
Kitty

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there any Command showing correlation of all variables in a dataset?

2011-03-01 Thread rex.dwyer
?cor  answers that question.  If Housing is a dataframe, cor(Housing) should do 
it.  Surprisingly, ??correlation doesn't point you to ?cor.
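
For example (a sketch: Housing contains factor columns, so restricting to the
numeric ones is assumed here, and xtable is one way to get the LaTeX table
the poster asked for):

library(Ecdat)
library(xtable)
data(Housing)
num <- Housing[sapply(Housing, is.numeric)]  # cor() needs numeric columns
ctab <- cor(num)
xtable(ctab)                                 # LaTeX code for the matrix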

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of JoonGi
Sent: Tuesday, March 01, 2011 5:41 AM
To: r-help@r-project.org
Subject: [R] Is there any Command showing correlation of all variables in a 
dataset?


Thanks in advance.

I want to derive correlations of variables in a dataset

Specifically

library(Ecdat)
data(Housing)
attach(Housing)
cor(lotsize, bathrooms)

this code results only the correlationship between two variables.
But I want to examine all the combinations of variables in this dataset.
And I will finally make a table in Latex.

How can I test correlations for all combinations of variables?
with one simple command?


--
View this message in context: 
http://r.789695.n4.nabble.com/Is-there-any-Command-showing-correlation-of-all-variables-in-a-dataset-tp3329599p3329599.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] bootstrap resampling - simplified

2011-03-01 Thread Bodnar Laszlo EB_HU
Hello there,

I have a problem concerning bootstrapping in R - especially focusing on the 
resampling part of it. I try to sum it up in a simplified way so that I would 
not confuse anybody.

I have a small database consisting of 20 observations (basically numbers from 1 
to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).

I would like to resample this database many times for the bootstrap process 
with the following conditions. Firstly, every resampled database should also 
include 20 observations. Secondly, when selecting a number from the 
above-mentioned 20 numbers, you can do this selection with replacement. The 
difficult part comes now: one number can be selected only maximum 5 times. In 
order to make this clear I show you a couple of examples. So the resampled 
databases might be like the following ones:

(1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
4 different numbers are chosen (1, 2, 3, 4), each selected - for the maximum 
possible - 5 times.

(2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
Two numbers - 8 and 6 - selected 5 times (the maximum possible times), number 1 
selected 4 times, the others selected less than 4 times.

(3rd database)  1,1,2,2,3,3,4,4,9,9,9,10,10,13,10,9,3,9,2,1
Number 9 chosen for the maximum possible 5 times, number 10, 3, 2, 1 chosen for 
3 times, number 4 selected twice and number 13 selected only once.

...

Anybody knows how to implement my tricky condition into one of the R 
functions - that one number can be selected only 5 times at most? Are 'boot' 
and 'bootstrap' packages capable of managing this? I guess they are, I just 
couldn't figure it out yet...

Thanks very much! Best regards,
Laszlo Bodnar




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem on flexmix when trying to apply signature developed in one model to a new sample

2011-03-01 Thread Jon Toledo


Problem on flexmix when trying to apply signature developed in one model to a 
new sample.
Dear R Users, R Core Team,

I have a problem when trying to obtain the classification of test cases,
using two variables, with flexmix.

After importing the database I create a matrix:

BM <- cbind(Data$var1, Data$var2)

I see that the best model has 2 groups and use:

ex2 <- flexmix(BM ~ 1, k=2, model=FLXMCmvnorm(diagonal=FALSE))
print(ex2)
plotEll(ex2, BM)

Then I want to test to which group a subset of patients belongs, so I import
a smaller sample of the previous data:

BM2 <- data.frame(Data2$var1, Data2$var2)

However, the results I get are for the complete training sample I used in
ex2, not for the new sample BM2:

ProbMCI <- posterior(ex2, BM2)

And if I do the following I get double the number of entered cases (I think
because I entered 2 variables):

BM2 <- cbind(Data2$var1, Data2$var2)
p <- posterior(ex2)[BMMCI,]
max.col(p)

(The same happens with clusters(ex2)[BM2].)

In the future I would like to test the result of this mixture in new samples
as well.

Thank you in advance
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bootstrap resampling question

2011-03-01 Thread Greg Snow
Here are a couple of thoughts.

If you want to use the boot package then the statistic function you give it 
just receives the bootstrapped indexes, you could test the indexes for your 
condition of not more than 5 of each and if it fails return an NA instead of 
computing the statistic.  Then in your output just remove the NAs (you should 
increase the total number of samples tried so that you have a reasonable number 
after deletion).

If you want to do it by hand, just use the rep function to create 5 replicates 
of your data, then sample from that without replacement.  You can get up to 5 
copies of each value (from the hand replication), but no more since it samples 
without replacement.
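
A minimal sketch of both ideas, assuming the data are simply the numbers 1:20
and that the statistic of interest is, say, the mean (the number of
replications and the choice of statistic are illustrative):

library(boot)
x <- 1:20

## Idea 1: compute the statistic only when no value is used more than 5 times
stat <- function(data, idx) {
  if (max(tabulate(idx, nbins = length(data))) > 5) return(NA)
  mean(data[idx])
}
b <- boot(x, stat, R = 5000)
keep <- na.omit(b$t)          # resamples that met the constraint

## Idea 2: replicate each value 5 times, then sample without replacement
resample <- sample(rep(x, 5), length(x))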

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Bodnar Laszlo EB_HU
 Sent: Tuesday, March 01, 2011 3:31 AM
 To: 'r-help@r-project.org'
 Subject: [R] bootstrap resampling question
 
 Hello there,
 
 
 
 I have a problem concerning bootstrapping in R - especially focusing on
 the resampling part of it. I try to sum it up in a simplified way so
 that I would not confuse anybody.
 
 
 
 I have a small database consisting of 20 observations (basically
 numbers from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
 
 
 
 I would like to resample this database many times for the bootstrap
 process with the following two conditions. The resampled databases
 should also have 20 observations and you can select each of the
 previously mentioned 20 numbers with replacement. I guess it is obvious
 so far. Now the more difficult second condition is that one number can
 be selected only maximum 5 times. In order to make this clear I try to
 show you an example. So there can be resampled databases like the
 following ones:
 
 
 
 (1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
 
 (4 different numbers are chosen, each selected 5 times)
 
 
 
 (2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
 
 (Two numbers - 8 and 6 - selected 5 times, number 1 selected four
 times, the others selected less than 4 times)
 
 
 
 My very first guess that came to my mind whilst thinking about the
 problem was the sample function where there are settings like
 replace=TRUE and prob=... where you can create a probability vector
 i.e. how much should be the probability of selecting a number. So I
 tried to calculate probabilities first. I thought the problem can
 basically described as a k-combination with repetitions. Unfortunately
 the only thing I could calculate so far is the total number of all
 possible selections which amounts to 137 846 527 049.
 
 
 
 Anybody knows how to implement my second tricky condition into one of
 the R functions? Are 'boot' and 'bootstrap' packages capable of
 managing this? I guess they are, I just couldn't figure it out yet...
 
 
 
 Thanks very much! Best regards,
 
 Laszlo Bodnar
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bootstrap resampling question

2011-03-01 Thread Giovanni Petris
A simple way of sampling with replacement from 1:20, with the additional
constraint that each number can be selected at most five times is

 sample(rep(1:20, 5), 20)

HTH,
Giovanni

On Tue, 2011-03-01 at 11:30 +0100, Bodnar Laszlo EB_HU wrote:
 Hello there,
 
 I have a problem concerning bootstrapping in R - especially focusing on the 
 resampling part of it. I try to sum it up in a simplified way so that I would 
 not confuse anybody.
 
 I have a small database consisting of 20 observations (basically numbers from 
 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
 
 I would like to resample this database many times for the bootstrap process 
 with the following two conditions. The resampled databases should also have 
 20 observations and you can select each of the previously mentioned 20 
 numbers with replacement. I guess it is obvious so far. Now the more 
 difficult second condition is that one number can be selected only maximum 5 
 times. In order to make this clear I try to show you an example. So there can 
 be resampled databases like the following ones:
 
 (1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
 (4 different numbers are chosen, each selected 5 times)
 
 (2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
 (Two numbers - 8 and 6 - selected 5 times, number 1 selected four times, 
 the others selected less than 4 times)
 
 My very first guess that came to my mind whilst thinking about the problem 
 was the sample function where there are settings like replace=TRUE and 
 prob=... where you can create a probability vector i.e. how much should be 
 the probability of selecting a number. So I tried to calculate probabilities 
 first. I thought the problem can basically described as a k-combination with 
 repetitions. Unfortunately the only thing I could calculate so far is the 
 total number of all possible selections which amounts to 137 846 527 049.
 
 Anybody knows how to implement my second tricky condition into one of the R 
 functions? Are 'boot' and 'bootstrap' packages capable of managing this? I 
 guess they are, I just couldn't figure it out yet...
 
 Thanks very much! Best regards,
 Laszlo Bodnar
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 

Giovanni Petris  gpet...@uark.edu
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problems with playwith

2011-03-01 Thread R Heberto Ghezzo, Dr
Hello, I tried to run playwith, but:

 library(playwith)
Loading required package: lattice
Loading required package: cairoDevice
Loading required package: gWidgetsRGtk2
Loading required package: gWidgets
Error in inDL(x, as.logical(local), as.logical(now), ...) : 
  unable to load shared object 'H:/R/cran/RGtk2/libs/i386/RGtk2.dll':
  LoadLibrary failure:  The specified procedure could not be found.
Failed to load RGtk2 dynamic library, attempting to install it.
Learn more about GTK+ at http://www.gtk.org
If the package still does not load, please ensure that GTK+ is installed and 
that it is on your PATH environment variable
IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN
Error : .onAttach failed in attachNamespace() for 'gWidgetsRGtk2', details:
  call: .Call(name, ..., PACKAGE = PACKAGE)
  error: C symbol name S_gtk_icon_factory_new not in DLL for package RGtk2
Error: package 'gWidgetsRGtk2' could not be loaded
 
 Sys.getenv("PATH")




PATH 
H:\\R/GTK/bin;H:\\R/GTK/lib;H:\\R/ImageMagick;C:\\windows\\system32;C:\\windows;C:\\windows\\System32\\Wbem;C:\\windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program
 Files\\Common Files\\Ulead Systems\\MPEG;C:\\Program 
Files\\QuickTime\\QTSystem\\;H:\\R\\GTK\\GTK2-Runtime\\bin;H:\\PortableUSB/PortableApps/MikeTex/miktex/bin
 
 
packages(lattice, cairoDevice, gWidgetsRGtk2, gWidgets, RGtk2, playwith) were 
reinstalled
program GTK was reinstalled.
using R-2-12-2 on Windows 7
Can anybody suggest a solution?
thanks

R.Heberto Ghezzo Ph.D.
Montreal - Canada
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Does POSIXlt extract date components properly?

2011-03-01 Thread Seth W Bigelow
I would like to use POSIX classes to store dates and extract components of 
dates. Following the example in Spector (Data Manipulation in R), I 
create a date

 mydate = as.POSIXlt('2005-4-19 7:01:00')

I then successfully extract the day with the command

 mydate$day
[1] 19

But when I try to extract the month

  mydate$mon
[1] 3

it returns the wrong month. And mydate$year is off by about 2,000 years. 
Am I doing something wrong?

Dr. Seth  W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California
sbige...@fs.fed.us /  ph. 530 759 1718
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Speed up sum of outer products?

2011-03-01 Thread AjayT
Hi, I'm new to R and stats, and I'm trying to speed up the following sum,

for (i in 1:n){
  C = C + (X[i,] %o% X[i,])  # the sum of outer products - this is very slow
                             # according to Rprof()
}

where X is a data matrix (nrows=1000 X ncols=50), and n=1000. The sum has to
be calculated over 10,000 times for different X. 

I think it is similar to estimating a co-variance matrix for demeaned data
X. I tried using cov, but got different answers, and it wasn't much quicker?

Any help gratefully appreciated, 

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Speed-up-sum-of-outer-products-tp3330160p3330160.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mlogit.data

2011-03-01 Thread Mark Difford
My previous posting seems to have got mangled. This reposts it.

On Mar 01, 2011; 03:32pm gmacfarlane wrote:

 workdata.csv
 The code I posted is exactly what I am running. What you need is this
 data. Here is the code again. 
 hbwmode <- mlogit.data("worktrips.csv", shape="long", choice="CHOSEN",
 alt.var="ALTNUM")
 hbwmode <- mlogit.data(hbwtrips, shape="long", choice="CHOSEN",
 alt.var="ALTNUM")

You still have not done what the posting guide asks for but have expected me
(or someone else) to scrutinize a large unknown data set (22003 rows). 

Fortunately there are other routes. Had you studied Yves Croissant's
examples (?mlogit.data), which do work, you would have seen that your input
or raw data have to have a particular format for mlogit.data to work.

In particular, the alt.var (mode in the TravelMode data set and ALTNUM
in your data set) has to go through all its levels in sequence. Yours don't
(your variable has 6 levels but sometimes runs from 1 to 5, sometimes from 2
to 6, and so on). Within each run there must be only one choice.

##
 library(mlogit)
 data(TravelMode, package = AER)
 head(TravelMode, n= 20)
   individual  mode choice wait vcost travel gcost income size
1           1   air     no   69    59    100    70     35    1
2           1 train     no   34    31    372    71     35    1
3           1   bus     no   35    25    417    70     35    1
4           1   car    yes    0    10    180    30     35    1
5           2   air     no   64    58     68    68     30    2
6           2 train     no   44    31    354    84     30    2
7           2   bus     no   53    25    399    85     30    2
8           2   car    yes    0    11    255    50     30    2
9           3   air     no   69   115    125   129     40    1
10          3 train     no   34    98    892   195     40    1
11          3   bus     no   35    53    882   149     40    1
12          3   car    yes    0    23    720   101     40    1
13          4   air     no   64    49     68    59     70    3
14          4 train     no   44    26    354    79     70    3
15          4   bus     no   53    21    399    81     70    3
16          4   car    yes    0     5    180    32     70    3
17          5   air     no   64    60    144    82     45    2
18          5 train     no   44    32    404    93     45    2
19          5   bus     no   53    26    449    94     45    2
20          5   car    yes    0     8    600    99     45    2

When we look at just the relevant part of your data we have the following:

 hbwtrips <- read.csv("E:/Downloads/workdata.csv", header=TRUE, sep=",",
 dec=".", row.names=NULL)
 head(hbwtrips[, c(2:11)], n=25)
   HHID PERID CASE ALTNUM NUMALTS CHOSEN  IVTT  OVTT  TVTT   COST
1     2     1    1      1       5      1 13.38  2.00 15.38  70.63
2     2     1    1      2       5      0 18.38  2.00 20.38  35.32
3     2     1    1      3       5      0 20.38  2.00 22.38  20.18
4     2     1    1      4       5      0 25.90 15.20 41.10 115.64
5     2     1    1      5       5      0 40.50  2.00 42.50   0.00
6     3     1    2      1       5      0 29.92 10.00 39.92 390.81
7     3     1    2      2       5      0 34.92 10.00 44.92 195.40
8     3     1    2      3       5      0 21.92 10.00 31.92  97.97
9     3     1    2      4       5      1 22.96 14.20 37.16 185.00
10    3     1    2      5       5      0 58.95 10.00 68.95   0.00
11    5     1    3      1       4      1  8.60  6.00 14.60  37.76
12    5     1    3      2       4      0 13.60  6.00 19.60  18.88
13    5     1    3      3       4      0 15.60  6.00 21.60  10.79
14    5     1    3      4       4      0 16.87 21.40 38.27 105.00
15    6     1    4      1       4      0 30.60  8.50 39.10 417.32
16    6     1    4      2       4      0 35.70  8.50 44.20 208.66
17    6     1    4      3       4      0 22.70  8.50 31.20 105.54
18    6     1    4      4       4      1 24.27  9.00 33.27 193.49
19    8     2    5      2       4      1 23.04  3.00 26.04  29.95
20    8     2    5      3       4      0 25.04  3.00 28.04  17.12
21    8     2    5      4       4      0 25.04 23.50 48.54 100.00
22    8     2    5      5       4      0 34.35  3.00 37.35   0.00
23    8     3    6      2       5      0 11.14  3.50 14.64  14.00
24    8     3    6      3       5      0 13.14  3.50 16.64   8.00
25    8     3    6      4       5      1  3.95 16.24 20.19 100.00

To show you that this is so we will mock up two variables that have the
characteristics described above and use them to execute the function.

##
hbwtrips$CHOICEN <- rep(c(rep(0,10),1), 2003)
hbwtrips$ALTNUMTest <- gl(11,1,22033, labels=LETTERS[1:11])
hbwtrips[1:30, c(1:11,44,45)]
hbwmode <- mlogit.data(hbwtrips, varying=c(8:11), shape="long",
choice="CHOICEN",
  alt.var="ALTNUMTest")

Hope that helps,

Regards, Mark.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/mlogit-data-tp3328739p3330148.html
Sent from the R help mailing list archive at Nabble.com.

__

Re: [R] Is there any Command showing correlation of all variables in a dataset?

2011-03-01 Thread Scott Chamberlain
cor.prob function gives matrix of correlation coefficients and p-values together


### Function for calculating correlation matrix, corrs below diagonal,
### and P-values above diagonal
cor.prob <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  above <- row(R) < col(R)
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)
  R[above] <- 1 - pf(Fstat, 1, dfr)
  R
}
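
A usage sketch, assuming we restrict the Housing data to its numeric columns
(cor() cannot handle the factor columns):

library(Ecdat)
data(Housing)
num <- Housing[sapply(Housing, is.numeric)]
round(cor.prob(num), 3)   # correlations below the diagonal, p-values above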


Scott 
On Tuesday, March 1, 2011 at 9:54 AM, rex.dw...@syngenta.com wrote:
?cor answers that question. If Housing is a dataframe, cor(Housing) should do 
it. Surprisingly, ??correlation doesn't point you to ?cor.
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of JoonGi
 Sent: Tuesday, March 01, 2011 5:41 AM
 To: r-help@r-project.org
 Subject: [R] Is there any Command showing correlation of all variables in a 
 dataset?
 
 
 Thanks in advance.
 
 I want to derive correlations of variables in a dataset
 
 Specifically
 
 library(Ecdat)
 data(Housing)
 attach(Housing)
 cor(lotsize, bathrooms)
 
 this code results only the correlationship between two variables.
 But I want to examine all the combinations of variables in this dataset.
 And I will finally make a table in Latex.
 
 How can I test correlations for all combinations of variables?
 with one simple command?
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Is-there-any-Command-showing-correlation-of-all-variables-in-a-dataset-tp3329599p3329599.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problems with playwith

2011-03-01 Thread Liviu Andronic
On Tue, Mar 1, 2011 at 5:58 PM, R Heberto Ghezzo, Dr
heberto.ghe...@mcgill.ca wrote:
 hello, i tried to run playwith but :

 library(playwith)
 Loading required package: lattice
 Loading required package: cairoDevice
 Loading required package: gWidgetsRGtk2
 Loading required package: gWidgets
 Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared object 'H:/R/cran/RGtk2/libs/i386/RGtk2.dll':
  LoadLibrary failure:  The specified procedure could not be found.
 Failed to load RGtk2 dynamic library, attempting to install it.

Did you install RGtk2? [1]
Liviu

[1] https://code.google.com/p/playwith/


 Learn more about GTK+ at http://www.gtk.org
 If the package still does not load, please ensure that GTK+ is installed and 
 that it is on your PATH environment variable
 IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN
 Error : .onAttach failed in attachNamespace() for 'gWidgetsRGtk2', details:
  call: .Call(name, ..., PACKAGE = PACKAGE)
  error: C symbol name S_gtk_icon_factory_new not in DLL for package RGtk2
 Error: package 'gWidgetsRGtk2' could not be loaded

 Sys.getenv(PATH)
                                                                               
                                                                               
                                                                               
                                                                               
            PATH
 H:\\R/GTK/bin;H:\\R/GTK/lib;H:\\R/ImageMagick;C:\\windows\\system32;C:\\windows;C:\\windows\\System32\\Wbem;C:\\windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program
  Files\\Common Files\\Ulead Systems\\MPEG;C:\\Program 
 Files\\QuickTime\\QTSystem\\;H:\\R\\GTK\\GTK2-Runtime\\bin;H:\\PortableUSB/PortableApps/MikeTex/miktex/bin

 packages(lattice, cairoDevice, gWidgetsRGtk2, gWidgets, RGtk2, playwith) were 
 reinstalled
 program GTK was reinstalled.
 using R-2-12-2 on Windows 7
 Can anybody suggest a solution?
 thanks

 R.Heberto Ghezzo Ph.D.
 Montreal - Canada
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there any Command showing correlation of all variables in a dataset?

2011-03-01 Thread Liviu Andronic
On Tue, Mar 1, 2011 at 11:41 AM, JoonGi joo...@hanmail.net wrote:

 Thanks in advance.

 I want to derive correlations of variables in a dataset

 Specifically

 library(Ecdat)
 data(Housing)
 attach(Housing)
 cor(lotsize, bathrooms)

 this code results only the correlationship between two variables.
 But I want to examine all the combinations of variables in this dataset.
 And I will finally make a table in Latex.

 How can I test correlations for all combinations of variables?
 with one simple command?

See Rcmdr for an example.
Liviu



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Is-there-any-Command-showing-correlation-of-all-variables-in-a-dataset-tp3329599p3329599.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Does POSIXlt extract date components properly?

2011-03-01 Thread Luke Miller
Month counts from 0 in POSIXlt objects, so that April is month 3 in your
example, January being month 0.

Year counts from 1900 in POSIXlt objects, so that 2005 should return as 105
in your example.

All of the other fields in POSIXlt should return values that you might
expect them to a priori. Keep an eye on daylight savings time adjustments if
they apply in your time zone.
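
A small sketch of the offsets described above (the day-of-month component of
a POSIXlt object is mday):

mydate <- as.POSIXlt('2005-4-19 7:01:00')
mydate$mday           # 19, day of the month
mydate$mon + 1        # 4, calendar month (mon counts from 0)
mydate$year + 1900    # 2005 (year counts from 1900)
format(mydate, "%m")  # "04", if a formatted component is all that is needed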

On Tue, Mar 1, 2011 at 12:01 PM, Seth W Bigelow sbige...@fs.fed.us wrote:

 I would like to use POSIX classes to store dates and extract components of
 dates. Following the example in Spector (Data Manipulation in R), I
 create a date

  mydate = as.POSIXlt('2005-4-19 7:01:00')

 I then successfully extract the day with the command

  mydate$day
 [1] 19

 But when I try to extract the month

   mydate$mon
 [1] 3

 it returns the wrong month. And mydate$year is off by about 2,000 years.
 Am I doing something wrong?

 Dr. Seth  W. Bigelow
 Biologist, USDA-FS Pacific Southwest Research Station
 1731 Research Park Drive, Davis California
 sbige...@fs.fed.us /  ph. 530 759 1718
[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
___
Luke Miller
Postdoctoral Researcher
Marine Science Center
Northeastern University
Nahant, MA

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SetInternet2, RCurl and proxy

2011-03-01 Thread Uwe Ligges



On 01.03.2011 12:00, Manta wrote:

Dear all,

I am facing a problem. I am trying to install packages through a proxy, but I
am not able to call the setInternet2 function, either with a lowercase or a
capital 's'. Which package do I have to load for it? And could there be a
reason why this does not work?


What does the error message say? Is this a recent version of R?

Uwe Ligges



Thanks,
Marco



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speed up sum of outer products?

2011-03-01 Thread Phil Spector
What you're doing is breaking up the calculation of X'X 
into n steps.   I'm not sure what you mean by very slow:



X = matrix(rnorm(1000*50),1000,50)
n = 1000
system.time({C=matrix(0,50,50);for(i in 1:n)C = C + (X[i,] %o% X[i,])})

   user  system elapsed
  0.096   0.008   0.104

Of course, you could just do the calculation directly:


system.time({C1 = t(X) %*% X})

   user  system elapsed
  0.008   0.000   0.007 

all.equal(C,C1)

[1] TRUE


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Tue, 1 Mar 2011, AjayT wrote:


Hi, I'm new to R and stats, and I'm trying to speed up the following sum,

for (i in 1:n){
C = C + (X[i,] %o% X[i,])   # the sum of outer products - this is very 
slow
according to Rprof()
}

where X is a data matrix (nrows=1000 X ncols=50), and n=1000. The sum has to
be calculated over 10,000 times for different X.

I think it is similar to estimating a co-variance matrix for demeaned data
X. I tried using cov, but got different answers, and it wasn't much quicker?

Any help gratefully appreciated,

--
View this message in context: 
http://r.789695.n4.nabble.com/Speed-up-sum-of-outer-products-tp3330160p3330160.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speed up sum of outer products?

2011-03-01 Thread Doran, Harold
Isn't the following the canonical (R-ish) way of doing this:

X = matrix(rnorm(1000*50),1000,50)
system.time({C1 = t(X) %*% X}) # Phil's example

C2 <- crossprod(X) # use crossprod instead

 all.equal(C1,C2)
[1] TRUE

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Phil Spector
 Sent: Tuesday, March 01, 2011 12:31 PM
 To: AjayT
 Cc: r-help@r-project.org
 Subject: Re: [R] Speed up sum of outer products?
 
 What you're doing is breaking up the calculation of X'X
 into n steps.   I'm not sure what you mean by very slow:
 
  X = matrix(rnorm(1000*50),1000,50)
  n = 1000
  system.time({C=matrix(0,50,50);for(i in 1:n)C = C + (X[i,] %o% X[i,])})
 user  system elapsed
0.096   0.008   0.104
 
 Of course, you could just do the calculation directly:
 
  system.time({C1 = t(X) %*% X})
 user  system elapsed
0.008   0.000   0.007
  all.equal(C,C1)
 [1] TRUE
 
 
   - Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu
 
 
 
 On Tue, 1 Mar 2011, AjayT wrote:
 
  Hi, I'm new to R and stats, and I'm trying to speed up the following sum,
 
  for (i in 1:n){
  C = C + (X[i,] %o% X[i,])   # the sum of outer products - this is very
 slow
  according to Rprof()
  }
 
  where X is a data matrix (nrows=1000 X ncols=50), and n=1000. The sum has to
  be calculated over 10,000 times for different X.
 
  I think it is similar to estimating a co-variance matrix for demeaned data
  X. I tried using cov, but got different answers, and it wasn't much quicker?
 
  Any help gratefully appreciated,
 
  --
  View this message in context: http://r.789695.n4.nabble.com/Speed-up-sum-of-
 outer-products-tp3330160p3330160.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls not solving

2011-03-01 Thread Peter Ehlers

On 2011-03-01 06:38, Schatzi wrote:

Here is a reply by Bert:
Yes you're right (I should have taken off my glasses and looked closer).
However, the argument is essentially the same:

Suppose you have a solution with a,b,k,l. Then for any positive c, [a+b-bc]
+ [bc] + (bc) *exp(kl')exp(-kx) is also a solution, where l'
= l - log(c)/k  .

Cheers,
Bert

(Feel free to post this correction if you like)


This is from me:
The problem with dropping the l parameter is that it is supposed to
account for the lag component. This equation was published in the literature
and has been solved in SAS. When I put it in Excel, it solves, but not
very well, as it comes to a different solution each time I change
the starting values. As such, I'm not sure how SAS solves for it and I'm not
sure what I should do about the equation. Maybe I should just drop the
parameter a. Thanks for the help.


When you say 'published in the literature' you should
provide a reference; you may be misinterpreting what's
published.

If SAS provides a 'solution', then there's an added
assumption being made (perhaps 'l' is being fixed?).
What Excel does is of little interest.

'Dropping' the parameter 'a' is equivalent to setting a=0.
You could also set, say, a = -10 or l = 50, or ...
The point is that, as Bert says, the model is
nonidentifiable.

Peter Ehlers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SetInternet2, RCurl and proxy

2011-03-01 Thread Manta
It says the function does not exist. The version is around 2.8, cant check
right now. Is it because it's an older version? If so, is there any way to
do it in a different way then?

-- 
View this message in context: 
http://r.789695.n4.nabble.com/SetInternet2-RCurl-and-proxy-tp3248576p3330244.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data type problem when extract data from SQLite to R by using RSQLite

2011-03-01 Thread chen jia
Hi Seth,

Thanks so much for identifying the problem and explaining everything.
I think the first solution that you suggest--make sure the schema has
well defined types--would work the best for me. But, I have one
question about how to implement it, which is more about sqlite itself.

First, I found out that the columns that don't have the expected data
types in the table annual_data3 are created by aggregate functions in
a separate table. These columns are later combined with other columns
that do.

I read the link that you provide,
http://www.sqlite.org/datatype3.html. One paragraph says When
grouping values with the GROUP BY clause values with different storage
classes are considered distinct, except for INTEGER and REAL values
which are considered equal if they are numerically equal. No
affinities are applied to any values as the result of a GROUP by
clause.

If I understand it correctly, the columns created by aggregate
functions with a GROUP BY clause do not have any declared data types.

My solution is to use a CREATE TABLE statement to declare the expected
data types and then insert the values of the columns created by the
aggregate functions with the GROUP BY clause. However, this solution
requires a CREATE TABLE statement every time an aggregate function and a
GROUP BY clause are used.

My question is: Is this the best way to make sure that the columns
resulting from a GROUP BY clause have the expected data types? Thanks.
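
For what it is worth, a minimal sketch of that workaround (the table and
column names firm_summary and avg_ret, and the database file name, are made
up for illustration; PERMNO, RET and annual_data3 come from the schema shown
earlier in the thread):

library(RSQLite)
con <- dbConnect(SQLite(), "annual.db")

## Declare the types up front, then fill the table from the aggregate query;
## the declared types give the GROUP BY results a defined column affinity.
dbGetQuery(con, "CREATE TABLE firm_summary (PERMNO INT, avg_ret REAL)")
dbGetQuery(con, "INSERT INTO firm_summary
                 SELECT PERMNO, AVG(RET) FROM annual_data3 GROUP BY PERMNO")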

Best,
Jia

On Tue, Mar 1, 2011 at 1:16 AM, Seth Falcon s...@userprimary.net wrote:
 Hi Jia,

 On Mon, Feb 28, 2011 at 6:57 PM, chen jia chen_1...@fisher.osu.edu wrote:
 The .schema of table annual_data3 is
 sqlite .schema annual_data3
 CREATE TABLE annual_data3(
  PERMNO INT,
  DATE INT,
  CUSIP TEXT,
  EXCHCD INT,
  SICCD INT,
  SHROUT INT,
  PRC REAL,
  RET REAL,
  ...
  pret_var,
  pRET_sd,
  nmret,
  pya_var,

 [snip]

 Is there a reason that you've told SQLite the expected data type for
 only some of the columns?

 Interestingly, I find that the problem I reported does not for columns
 labeled real in the schema info. For example, the type of column RET
 never changes no matter what the first observation is.

 Yes, that is expected and I think it is the solution to your problem:
 setup your schema so that all columns have a declared type.  For some
 details on SQLite's type system see
 http://www.sqlite.org/datatype3.html.

 RSQLite currently maps NA values to NULL in the database.  Pulling
 data out of a SELECT query, RSQLite uses the sqlite3_column_type
 SQLite API to determine the data type and map it to an R type.  If
 NULL is encountered, then the schema is inspected using
 sqlite3_column_decltype to attempt to obtain a type.  If that fails,
 the data is mapped to a character vector at the R level.  The type
 selection is done once after the first row has been fetched.

 To work around this you can:

 - make sure your schema has well defined
  types (which will help SQLite perform its operations);

 - check whether the returned column has the expected type and convert
  if needed at the R level.

 - remove NA/NULL values from the db or decide on a different way of
  encoding them (e.g you might be able to use -1 in the db in some
  situation to indicate missing).  Your R code would then need to map
  these to proper NA.

 Hope that helps.

 + seth



 --
 Seth Falcon | @sfalcon | http://userprimary.net/




-- 
700 Fisher Hall
2100 Neil Ave.
Columbus, Ohio  43210
http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Adjusting values via vectorization

2011-03-01 Thread Albert Vernon Smith
I'm adjusting values in a vector based on a couple of matrices.  One matrix 
specifies the row to be taken from the adjustment matrix, while using the 
aligned column values.  I have an approach which works, but I hope to find an 
approach using vectorization. 

Here is code with my solution:

--
nids <- 10
npredictors <- 2
ncol <- 4
values <- sample(c(-1,0,1), nids, replace=TRUE)
input <- matrix(sample(1:ncol, nids*npredictors, replace=TRUE), nrow=nids)
values.adjust <- matrix(rnorm(ncol*npredictors), ncol=npredictors)

for(i in 1:nids){
  for(j in 1:npredictors){
    values[i] <- values[i] + values.adjust[input[i,j], j]
  }
}
--

I'm using this as an example to hopefully better understand R syntax w.r.t. 
vectorization.  Is there a way to replace my for loops?
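
One possible replacement for the double loop, as a sketch (it uses two-column
matrix indexing into values.adjust and should give the same result as the
loops above):

idx <- cbind(as.vector(input), rep(seq_len(npredictors), each = nids))
values <- values + rowSums(matrix(values.adjust[idx], nrow = nids))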

Thanks,
-albert
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Does POSIXlt extract date components properly?

2011-03-01 Thread Prof Brian Ripley

On Tue, 1 Mar 2011, Seth W Bigelow wrote:


I would like to use POSIX classes to store dates and extract components of
dates. Following the example in Spector (Data Manipulation in R), I
create a date


mydate = as.POSIXlt('2005-4-19 7:01:00')


I then successfully extract the day with the command


mydate$day

[1] 19

But when I try to extract the month

 mydate$mon
[1] 3

it returns the wrong month. And mydate$year is off by about 2,000 years.
Am I doing something wrong?


Not reading the documentation (nor the posting guide). 
?DateTimeClasses says


 ‘mon’ 0-11: months after the first of the year.

 ‘year’ years since 1900.

That is the POSIX standard ... you could also have looked there.


Dr. Seth  W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California
sbige...@fs.fed.us /  ph. 530 759 1718
[[alternative HTML version deleted]]


Please note what the posting guide said about that!


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boa library and plots

2011-03-01 Thread emj83
Many thanks for your response, and I am sorry I did not post correctly.

I have found dev.copy2eps() useful.

Emma

-- 
View this message in context: 
http://r.789695.n4.nabble.com/boa-library-and-plots-tp3322508p3330299.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bootstrap resampling question

2011-03-01 Thread Jonathan P Daily
I'm not sure that is equivalent to sampling with replacement, since if the 
first draw is 1, then the probability that the next draw will be 1 is 
4/99 instead of the 1/20 it would be in sampling with replacement. I 
think the way to do this would be what Greg suggested - something like:

bigsamp <- sample(1:20, 100, T)
idx <- sort(unlist(sapply(1:20, function(x) which(bigsamp == 
x)[1:5])))[1:20]
samp <- bigsamp[idx]

--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it.
 - Jubal Early, Firefly

r-help-boun...@r-project.org wrote on 03/01/2011 09:37:31 AM:

 [image removed] 
 
 Re: [R] bootstrap resampling question
 
 Giovanni Petris 
 
 to:
 
 Bodnar Laszlo EB_HU
 
 03/01/2011 11:58 AM
 
 Sent by:
 
 r-help-boun...@r-project.org
 
 Cc:
 
 'r-help@r-project.org'
 
 A simple way of sampling with replacement from 1:20, with the additional
 constraint that each number can be selected at most five times is
 
  sample(rep(1:20, 5), 20)
 
 HTH,
 Giovanni
 
 On Tue, 2011-03-01 at 11:30 +0100, Bodnar Laszlo EB_HU wrote:
  Hello there,
  
  I have a problem concerning bootstrapping in R - especially 
 focusing on the resampling part of it. I try to sum it up in a 
 simplified way so that I would not confuse anybody.
  
  I have a small database consisting of 20 observations (basically 
 numbers from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
  
  I would like to resample this database many times for the 
 bootstrap process with the following two conditions. The resampled 
 databases should also have 20 observations and you can select each 
 of the previously mentioned 20 numbers with replacement. I guess it 
 is obvious so far. Now the more difficult second condition is that 
 one number can be selected only maximum 5 times. In order to make 
 this clear I try to show you an example. So there can be resampled 
 databases like the following ones:
  
  (1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
  (4 different numbers are chosen, each selected 5 times)
  
  (2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
  (Two numbers - 8 and 6 - selected 5 times, number 1 selected 
 four times, the others selected less than 4 times)
  
  My very first guess that came to my mind whilst thinking about the
 problem was the sample function where there are settings like 
 replace=TRUE and prob=... where you can create a probability vector 
 i.e. how much should be the probability of selecting a number. So I 
 tried to calculate probabilities first. I thought the problem can 
 basically described as a k-combination with repetitions. 
 Unfortunately the only thing I could calculate so far is the total 
 number of all possible selections which amounts to 137 846 527 049.
  
  Anybody knows how to implement my second tricky condition into 
 one of the R functions? Are 'boot' and 'bootstrap' packages capable 
 of managing this? I guess they are, I just couldn't figure it out yet...
  
  Thanks very much! Best regards,
  Laszlo Bodnar
  
  
 

  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 -- 
 
 Giovanni Petris  gpet...@uark.edu
 Associate Professor
 Department of Mathematical Sciences
 University of Arkansas - Fayetteville, AR 72701
 Ph: (479) 575-6324, 575-8630 (fax)
 http://definetti.uark.edu/~gpetris/
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 

[R] more boa plots questions

2011-03-01 Thread emj83
I have MCMC output chains A and B for example, I want to produce trace plots
for them using the boa command line...

#loads boa
boa.init()
#reads in chains
boa.chain.add(boa.importMatrix('A'), 'A')
boa.chain.add(boa.importMatrix('B'), 'B')
#plot trace plot
problems arise here!

I know I can get trace plots using boa.plot('trace') but this plots the
parameter chains on the same plot- I want separate plots using
boa.plot.trace()

#from the manual..
boa.plot.trace(lnames, pname, annotate = boa.par("legend"))
lnames: Character vector giving the name of the desired MCMC sequence
   in the working session list of sequences.
pname: Character string giving the name of the parameters to be
  plotted.
annotate: Logical value indicating that a legend be included in the
  plot.

I tried boa.plot.trace(B) and boa.plot.trace(B,B) but both do not give me
a trace plot for chain B and print FALSE. I am obviously misinterpreting
lnames and pnames.
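
For reference, both arguments are character strings according to the help
excerpt above, so a call would look something like the following sketch
(where "theta" stands in for an actual parameter name from chain B):

boa.plot.trace("B", "theta")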

Can anyone help please?
Thanks in advance Emma





-- 
View this message in context: 
http://r.789695.n4.nabble.com/more-boa-plots-questions-tp3330312p3330312.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Lattice: useOuterStrips and axes

2011-03-01 Thread Jim Price
Consider the following:


library(lattice)
library(latticeExtra)

temp <- expand.grid(
    subject = factor(paste('Subject', 1:3)),
    var = factor(paste('Variable', 1:3)),
    time = 1:10
)
temp$resp <- rnorm(nrow(temp), 10 * as.numeric(temp$var), 1)

ylimits <- by(temp$resp, temp$var, function(x) range(pretty(x)))

useOuterStrips(xyplot(
    resp ~ time | subject * var,
    data = temp,
    as.table = TRUE,
    scales = list(
        alternating = 1, tck = c(1, 0),
        y = list(relation = 'free', rot = 0, limits = rep(ylimits, each = 3))
    )
))


This is a matrix of variables on subjects, where it makes sense to have
panel-specific y-axes because of the differing variable ranges. In fact, it
makes sense to have row-specific y-axes, because of the similarity of
intra-variable inter-subject response. However the graphic as presented
gives per-panel y-axis annotation - I'd like to drop the 2nd and 3rd columns
of y-axes labels, so the panels become a contiguous array.

Is this possible?

Thanks,
Jim Price.
Cardiome Pharma. Corp.



-- 
View this message in context: 
http://r.789695.n4.nabble.com/Lattice-useOuterStrips-and-axes-tp3330338p3330338.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speed up sum of outer products?

2011-03-01 Thread AjayT
Hey, thanks a lot guys!!! That really speeds things up!!! I didn't know %*%
and crossprod could operate on matrices. I think you've saved me hours in
calculation time. Thanks again.

> system.time({C=matrix(0,50,50); for(i in 1:n) C = C + (X[i,] %o% X[i,])}) 
   user  system elapsed 
   0.45    0.00    0.90 
> system.time({C1 = t(X) %*% X}) 
   user  system elapsed 
   0.02    0.00    0.05 
> system.time({C2 = crossprod(X)})
   user  system elapsed 
   0.02    0.00    0.02 

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Speed-up-sum-of-outer-products-tp3330160p3330378.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding pairs with least magnitude difference from mean

2011-03-01 Thread rex.dwyer
No, that's not what I meant, but maybe I didn't understand the question.
What I suggested would involve sorting y, not x: sort the *distances*.
If you want to minimize the sd of a subset of numbers, you sort the numbers and 
find a subset that is clumped together.
If the numbers are a function of pairs, you compute the function for all pairs 
of numbers, and find a subset that's clumped together.
Anyway, it's an idea, not a theorem, so proof is left as an exercise for the 
esteemed reader.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Hans W Borchers
Sent: Monday, February 28, 2011 2:17 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Finding pairs with least magnitude difference from mean

 rex.dwyer at syngenta.com writes:

 James,
 It seems the 2*mean(x) term is irrelevant if you are seeking to
 minimize sd. Then you want to sort the distances from smallest to
 largest. Then it seems clear that your five values will be adjacent in
 the list, since if you have a set of five adjacent values, exchanging
 any of them for one further away in the list will increase the sd. The
 only problem I see with this is that you can't use a number more than
 once. In any case, you need to compute the best five pairs beginning
 at position i in the sorted list, for 1 <= i <= choose(n,2), then take the
 max over all i.
 There's no R in my answer, such as you'd notice, but I hope it helps just
 the same.
 Rex

You probably mean something like the following:

x <- rnorm(10)
y <- outer(x, x, "+") - (2 * mean(x))

o <- order(x)
sd(c(y[o[1],o[10]], y[o[2],o[9]], y[o[3],o[8]], y[o[4],o[7]], y[o[5],o[6]]))

This seems reasonable, though you would have to supply a more stringent
argument. I did two tests and it works alright.

--Hans Werner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bootstrap resampling - simplified

2011-03-01 Thread Dennis Murphy
Hi:

On Tue, Mar 1, 2011 at 8:22 AM, Bodnar Laszlo EB_HU 
laszlo.bod...@erstebank.hu wrote:

 Hello there,

 I have a problem concerning bootstrapping in R - especially focusing on the
 resampling part of it. I try to sum it up in a simplified way so that I
 would not confuse anybody.

 I have a small database consisting of 20 observations (basically numbers
 from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).


To check on the probability of this event happening, I ran the following:
bootmat <- matrix(sample(1:20, 200000, replace = TRUE), nrow = 10000)
sum(apply(bootmat, 1, function(x) any(table(x) >= 5)))
[1] 492

It's about 0.05. A Q&D 'solution' would be to oversample by at least 5%
(let's do 10% just to be on the safe side) and then pick out the first B of
these. In the above example, we could do 11000 samples instead, and pick out
the first 10000 that meet the criterion:

bootmat <- matrix(sample(1:20, 220000, replace = TRUE), nrow = 11000)
badsamps <- apply(bootmat, 1, function(x) any(tabulate(x) >= 5))
bootfin <- bootmat[!badsamps, ][1:10000, ]

Time:
   user  system elapsed
   0.28    0.00    0.28

(Note 1: Using table instead of tabulate took 4.22 seconds on my machine -
tabulate is much faster.)
(Note 2: In the call above, there were 539 bad samples, so the 5% ballpark
estimate seems plausible.)

This is a simple application of the accept-reject criterion. I don't know
how large 'many' is to you, but 10,000 seems to be a reasonable starting
point. I ran it again for 1,000,000 such samples, and the completion time
was
   user  system elapsed
  36.74    0.31   37.15
so the processing time is of an order a bit larger than linear. If your
simulations are of this magnitude and are to be run repeatedly, you probably
need to write a function to improve the speed and to get rid of the waste
produced by a rejection sampling approach. If this is a one-off deal,
perhaps the above is sufficient.
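
A rough sketch of such a function (untested; B is the number of samples you
want and cap the maximum allowed repeats of any single value):

boot_capped <- function(B, n = 20, cap = 5, batch = ceiling(1.1 * B)) {
    keep <- matrix(NA_integer_, nrow = 0, ncol = n)
    while (nrow(keep) < B) {
        # draw a batch of candidate resamples, one per row
        cand <- matrix(sample(1:n, batch * n, replace = TRUE), nrow = batch)
        # keep only rows where no value occurs more than 'cap' times
        ok <- apply(cand, 1, function(x) all(tabulate(x, nbins = n) <= cap))
        keep <- rbind(keep, cand[ok, , drop = FALSE])
    }
    keep[1:B, , drop = FALSE]
}
# e.g. samp <- boot_capped(10000)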

HTH,
Dennis

 I would like to resample this database many times for the bootstrap process
 with the following conditions. Firstly, every resampled database should also
 include 20 observations. Secondly, when selecting a number from the
 above-mentioned 20 numbers, you can do this selection with replacement. The
 difficult part comes now: one number can be selected only maximum 5 times.
 In order to make this clear I show you a couple of examples. So the
 resampled databases might be like the following ones:

 (1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
 4 different numbers are chosen (1, 2, 3, 4), each selected - for the
 maximum possible - 5 times.

 (2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
 Two numbers - 8 and 6 - selected 5 times (the maximum possible times),
 number 1 selected 4 times, the others selected less than 4 times.

 (3rd database)  1,1,2,2,3,3,4,4,9,9,9,10,10,13,10,9,3,9,2,1
 Number 9 chosen for the maximum possible 5 times, number 10, 3, 2, 1 chosen
 for 3 times, number 4 selected twice and number 13 selected only once.

 ...

 Anybody knows how to implement my tricky condition into one of the R
 functions - that one number can be selected only 5 times at most? Are 'boot'
 and 'bootstrap' packages capable of managing this? I guess they are, I just
 couldn't figure it out yet...

 Thanks very much! Best regards,
 Laszlo Bodnar



 


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] inefficient ifelse() ?

2011-03-01 Thread ivo welch
dear R experts---

  t <- 1:30
  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
  s <- ifelse( t %% 2 == 0, g(t), f(t))

shows that the ifelse function actually evaluates both f() and g() for
all values first, and presumably then just picks left or right results
based on t%%2.  uggh... wouldn't it make more sense to evaluate only
the relevant parts of each vector and then reassemble them?

/iaw

Ivo Welch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inefficient ifelse() ?

2011-03-01 Thread Henrique Dallazuanna
Try this:

mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))

On Tue, Mar 1, 2011 at 4:19 PM, ivo welch ivo...@gmail.com wrote:

 dear R experts---

  t <- 1:30
  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
  s <- ifelse( t %% 2 == 0, g(t), f(t))

 shows that the ifelse function actually evaluates both f() and g() for
 all values first, and presumably then just picks left or right results
 based on t%%2.  uggh... wouldn't it make more sense to evaluate only
 the relevant parts of each vector and then reassemble them?

 /iaw
 
 Ivo Welch

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glht() used with coxph()

2011-03-01 Thread array chip
Hi, I am experimenting with using glht() from the multcomp package together with 
coxph(), and am glad to find that glht() can work on a coxph object, for example:

> (fit <- coxph(Surv(stop, status > 0) ~ treatment, bladder1))
coxph(formula = Surv(stop, status > 0) ~ treatment, data = bladder1)


                      coef exp(coef) se(coef)      z    p
treatmentpyridoxine -0.063     0.939    0.161 -0.391 0.70
treatmentthiotepa   -0.159     0.853    0.168 -0.947 0.34

Likelihood ratio test=0.91  on 2 df, p=0.635  n= 294 

> glht(fit, linfct = mcp(treatment = 'Tukey'))

 General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Linear Hypotheses:
   Estimate
pyridoxine - placebo == 0  -0.06303
thiotepa - placebo == 0-0.15885
thiotepa - pyridoxine == 0 -0.09582

However, once I added a strata term in the formula of coxph(), then glht() 
can't 
work anymore:

> (fit <- coxph(Surv(stop, status > 0) ~ treatment + strata(enum), bladder1))
coxph(formula = Surv(stop, status > 0) ~ treatment + strata(enum), 
    data = bladder1)


                      coef exp(coef) se(coef)     z    p
treatmentpyridoxine  0.188      1.21    0.170  1.11 0.27
treatmentthiotepa   -0.210      0.81    0.172 -1.22 0.22

Likelihood ratio test=4.39  on 2 df, p=0.111  n= 294 

> glht(fit, linfct = mcp(treatment = 'Tukey'))
Error in glht.matrix(model = list(coefficients = c(0.187792527684977,  : 
  ‘ncol(linfct)’ is not equal to ‘length(coef(model))’

Can anyone suggest why strata would make coxph object ineligible for glht()? Or 
how to make it work?
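
One thing I have not tried yet is building the contrast matrix by hand so that
its columns line up with names(coef(fit)); an untested sketch:

K <- rbind("pyridoxine - placebo"  = c( 1, 0),
           "thiotepa - placebo"    = c( 0, 1),
           "thiotepa - pyridoxine" = c(-1, 1))
colnames(K) <- names(coef(fit))
summary(glht(fit, linfct = K))

Would that be a reasonable workaround, or is there something better?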

Thanks

John


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] expression help

2011-03-01 Thread Daryl Morris

Hello,

I am trying to write math-type on a plot.  Due to space limitations on 
the plot, I want 2 short expressions written on top of each other.  It 
is certainly possible to write them in two separate calls, but that 
involves fine-tuning locations and a lot of trial and error (and I'm 
trying to write a general purpose function).


Here's where I've gotten to:
plot(0:1, 0:1, xaxt = "n")
axis(side = 1, at = .3, expression(paste("IFN-", gamma, "\n", "TNF-", alpha)))
axis(side = 1, at = .6, "a label\n2nd line")

What I am trying to do is illustrated by the non-expression axis 
label ("a label\n2nd line").  The \n forces a new line when I'm not 
using an expression, but doesn't work for my real example.


I have googled for general documentation on expressions, but the only 
thing I've been able to find is the help for plotmath(), so if someone 
can point me to more complete documentation that would also be helpful.


thanks, Daryl
SCHARP, FHCRC, UW Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Export R dataframes to excel

2011-03-01 Thread maxsilva
I'm trying to do this in several ways but haven't had any result. I'm asked to
install Python, or Perl, etc. Can anybody suggest a direct, easy and
understandable way?  Any help would be appreciated. 

Thx.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330399.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] odbcConnectExcel2007 creates corrupted files

2011-03-01 Thread Mark Lyman
I tried creating a .xlsx file using odbcConnectExcel2007 and adding a 
worksheet with sqlSave. This seems to work; I am even able to query the 
worksheet, but when I try opening the file in Excel I get the following 
message: "Excel cannot open the file 'test.xlsx' because the file format or 
file extension is not valid. Verify that the file has not been corrupted and 
that the file extension matches the format of the file." Is this a known 
issue? Or just user error? The RODBC manual seemed to indicate that sqlSave 
worked fine with Excel; however, it did not mention Excel 2007.

I am running Excel 2007 and R 2.12.1 on Windows XP. Below is my example code. 

$ library(RODBC)
$ 
$ # This doesn't work
$ # Connect to a previously non-existent Excel file
$ out <- odbcConnectExcel2007("test.xlsx", readOnly=FALSE)
$ test <- data.frame(x=1:10, y=rnorm(10))
$ sqlSave(out, test)
$ sqlTables(out)
                                                     TABLE_CAT TABLE_SCHEM TABLE_NAME   TABLE_TYPE REMARKS
1 C:\\Documents and Settings\\G69974\\My Documents\\test.xlsx           NA      test$ SYSTEM TABLE      NA
2 C:\\Documents and Settings\\G69974\\My Documents\\test.xlsx           NA       test        TABLE      NA
$ sqlFetch(out, "test")
x  y
1   1  0.5832882
2   2  0.4387569
3   3 -0.6444048
4   4 -1.0013450
5   5  1.0324718
6   6 -0.7844128
7   7 -1.6789266
8   8  0.1402672
9   9  0.8650061
10 10 -0.0420201
$ close(out)
$ # Opening test.xlsx now fails
$ 
$ # This works
$ out <- odbcConnectExcel("test.xls", readOnly=FALSE)
$ test <- data.frame(x=1:10, y=rnorm(10))
$ sqlSave(out, test)
$ sqlTables(out)
                                                TABLE_CAT TABLE_SCHEM TABLE_NAME   TABLE_TYPE REMARKS
1 C:\\Documents and Settings\\G69974\\My Documents\\test          NA      test$ SYSTEM TABLE      NA
2 C:\\Documents and Settings\\G69974\\My Documents\\test          NA       test        TABLE      NA
$ sqlFetch(out, "test")
x  y
1   1  0.5955787
2   2  1.0517528
3   3  0.3884892
4   4 -2.1408813
5   5 -0.7081686
6   6  0.1511828
7   7  2.0560555
8   8 -0.5801912
9   9 -0.6988058
10 10 -0.1237739
$ close(out)
$ # Opening test.xls now works

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Export R dataframes to excel

2011-03-01 Thread KATSCHKE, ADRIAN CIV DFAS
write.table() with sep="," and a .csv file extension works great for pulling 
the data directly into Excel. 
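
For a data frame df (a placeholder name), something along these lines should
do it; the file name is made up:

write.table(df, file = "mydata.csv", sep = ",", row.names = FALSE)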

?write.table 

Without more detail as to the problem, it is difficult to give a more specific 
answer.

Adrian 
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of maxsilva
Sent: Tuesday, March 01, 2011 2:11 PM
To: r-help@r-project.org
Subject: [R] Export R dataframes to excel

I'm trying to do this in several ways but havent had any result. Im asked to
install python, or perl etc. Can anybody suggest a direct, easy and
understandable way?  Every help would be appreciated. 

Thx.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330399.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data type problem when extract data from SQLite to R by using RSQLite

2011-03-01 Thread Seth Falcon
On Tue, Mar 1, 2011 at 10:06 AM, chen jia chen_1...@fisher.osu.edu wrote:
 Hi Seth,

 Thanks so much for identifying the problem and explaining everything.
 I think the first solution that you suggest--make sure the schema has
 well defined types--would work the best for me. But, I have one
 question about how to implement it, which is more about sqlite itself.

 First, I found out that the columns that don't have the expected data
 types in the table annual_data3 are created by aggregate functions in
 a separate table. These columns are later combined with other columns
 that do.

 I read the link that you provide,
 http://www.sqlite.org/datatype3.html. One paragraph says When
 grouping values with the GROUP BY clause values with different storage
 classes are considered distinct, except for INTEGER and REAL values
 which are considered equal if they are numerically equal. No
 affinities are applied to any values as the result of a GROUP by
 clause.

 If I understand it correctly, the columns created by aggregate
 functions with a GROUP by clause do not have any expected data types.

 My solution is to use CREATE TABLE clause to declare the expected
 datatype and then insert the values of columns created by the
 aggregate functions with the GROUP by clause. However, this solution
 requires a CREATE TABLE cause every time the aggregate function and
 the GROUP by clause is used.

 My question is: Is this the best way to make sure that the columns as
 a result of a GROUP by clause have the expected data types? Thanks.

That might be a good question to post to the SQLite user's list :-)

I don't have an answer off the top of my head.  My reading of the
SQLite docs would lead me to expect that a GROUP BY clause would not
change/remove type if the column being grouped contains all the same
declared type affinity.
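
Off the cuff, and untested, the CREATE TABLE route you describe would look
roughly like this (the target table and column names here are made up):

library(RSQLite)
con <- dbConnect(SQLite(), "mydb.sqlite")   # hypothetical database file
dbGetQuery(con, "CREATE TABLE avg_ret (PERMNO INT, mean_ret REAL)")
dbGetQuery(con, "INSERT INTO avg_ret
                 SELECT PERMNO, AVG(RET) FROM annual_data3 GROUP BY PERMNO")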

+ seth


 Best,
 Jia

 On Tue, Mar 1, 2011 at 1:16 AM, Seth Falcon s...@userprimary.net wrote:
 Hi Jia,

 On Mon, Feb 28, 2011 at 6:57 PM, chen jia chen_1...@fisher.osu.edu wrote:
 The .schema of table annual_data3 is
 sqlite .schema annual_data3
 CREATE TABLE annual_data3(
  PERMNO INT,
  DATE INT,
  CUSIP TEXT,
  EXCHCD INT,
  SICCD INT,
  SHROUT INT,
  PRC REAL,
  RET REAL,
  ...
  pret_var,
  pRET_sd,
  nmret,
  pya_var,

 [snip]

 Is there a reason that you've told SQLite the expected data type for
 only some of the columns?

 Interestingly, I find that the problem I reported does not for columns
 labeled real in the schema info. For example, the type of column RET
 never changes no matter what the first observation is.

 Yes, that is expected and I think it is the solution to your problem:
 setup your schema so that all columns have a declared type.  For some
 details on SQLite's type system see
 http://www.sqlite.org/datatype3.html.

 RSQLite currently maps NA values to NULL in the database.  Pulling
 data out of a SELECT query, RSQLite uses the sqlite3_column_type
 SQLite API to determine the data type and map it to an R type.  If
 NULL is encountered, then the schema is inspected using
 sqlite3_column_decltype to attempt to obtain a type.  If that fails,
 the data is mapped to a character vector at the R level.  The type
 selection is done once after the first row has been fetched.

 To work around this you can:

 - make sure your schema has well defined
  types (which will help SQLite perform its operations);

 - check whether the returned column has the expected type and convert
  if needed at the R level.

 - remove NA/NULL values from the db or decide on a different way of
  encoding them (e.g you might be able to use -1 in the db in some
  situation to indicate missing).  Your R code would then need to map
  these to proper NA.

 Hope that helps.

 + seth



 --
 Seth Falcon | @sfalcon | http://userprimary.net/




 --
 700 Fisher Hall
 2100 Neil Ave.
 Columbus, Ohio  43210
 http://www.fisher.osu.edu/~chen_1002/




-- 
Seth Falcon | @sfalcon | http://userprimary.net/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Export R dataframes to excel

2011-03-01 Thread Steve Taylor
You can copy it with the following function and then paste into Excel...
 
copy = function (df, buffer.kb=256) {
  write.table(df, file=paste("clipboard-", buffer.kb, sep=""),
      sep="\t", na='', quote=FALSE, row.names=FALSE)
}


 

From: maxsilva mmsil...@uc.cl
To:r-help@r-project.org
Date: 2/Mar/2011 8:50a
Subject: [R] Export R dataframes to excel

I'm trying to do this in several ways but havent had any result. Im asked to
install python, or perl etc. Can anybody suggest a direct, easy and
understandable way?  Every help would be appreciated. 

Thx.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330399.html
 
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Export R dataframes to excel

2011-03-01 Thread Berend Hasselman

maxsilva wrote:
 
 Thx, but im looking for a more direct solution... my problem is very
 simple, I have a dataframe and I want to create a standard excel
 spreadsheet. My dataframe could be something like this
 

More or less the same question  was answered several hours ago.
See http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel
as Gabor Grothendieck suggested.

/Berend

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330518.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Export R dataframes to excel

2011-03-01 Thread Don McKenzie

Or

?write.csv

which Excel will import

On 1-Mar-11, at 12:17 PM, Berend Hasselman wrote:



maxsilva wrote:


Thx, but im looking for a more direct solution... my problem is very
simple, I have a dataframe and I want to create a standard excel
spreadsheet. My dataframe could be something like this



More or less the same question  was answered several hours ago.
See http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel

as Gabor Grothendieck suggested.

/Berend

--
View this message in context: http://r.789695.n4.nabble.com/Export- 
R-dataframes-to-excel-tp3330399p3330518.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting- 
guide.html

and provide commented, minimal, self-contained, reproducible code.


Why does the universe go to all the bother of existing?
-- Stephen Hawking

#define QUESTION ((bb) || !(bb))
-- William Shakespeare



Don McKenzie, Research Ecologist
Pacific WIldland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

desk: 206-732-7824
cell: 206-321-5966
d...@uw.edu
donaldmcken...@fs.fed.us

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice: useOuterStrips and axes

2011-03-01 Thread Peter Ehlers

On 2011-03-01 10:29, Jim Price wrote:

Consider the following:


library(lattice)
library(latticeExtra)

temp <- expand.grid(
    subject = factor(paste('Subject', 1:3)),
    var = factor(paste('Variable', 1:3)),
    time = 1:10
)
temp$resp <- rnorm(nrow(temp), 10 * as.numeric(temp$var), 1)

ylimits <- by(temp$resp, temp$var, function(x) range(pretty(x)))

useOuterStrips(xyplot(
    resp ~ time | subject * var,
    data = temp,
    as.table = TRUE,
    scales = list(
        alternating = 1, tck = c(1, 0),
        y = list(relation = 'free', rot = 0, limits = rep(ylimits, each = 3))
    )
))


This is a matrix of variables on subjects, where it makes sense to have
panel-specific y-axes because of the differing variable ranges. In fact, it
makes sense to have row-specific y-axes, because of the similarity of
intra-variable inter-subject response. However the graphic as presented
gives per-panel y-axis annotation - I'd like to drop the 2nd and 3rd columns
of y-axes labels, so the panels become a contiguous array.

Is this possible?


Looks like you want the combineLimits() function in latticeExtra.
Specifically,

 p <- [your code]
 combineLimits(p, margin.y = 1, extend = FALSE)

Peter Ehlers



Thanks,
Jim Price.
Cardiome Pharma. Corp.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inefficient ifelse() ?

2011-03-01 Thread ivo welch
thanks, Henrique.  did you mean

as.vector(t(mapply(function(x, f) f(x), split(t, ((t %% 2)==0)),
list(f, g))))   ?

otherwise, you get a matrix.

it's a good solution, but unfortunately I don't think this can be used
to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
ift and iff functions will always be evaluated before the function
call happens, even with lazy evaluation.  :-(

I still think that it makes sense to have a smarter vectorized %if% in
a vectorized language like R.  just my 5 cents.
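
for what it's worth, the kind of evaluation I have in mind is roughly this
(a sketch, assuming f and g are vectorized):

cond <- t %% 2 == 0
s <- numeric(length(t))
s[cond]  <- g(t[cond])   # g evaluated only where cond is TRUE
s[!cond] <- f(t[!cond])  # f evaluated only where cond is FALSE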

/iaw


Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)





On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna www...@gmail.com wrote:
 Try this:

 mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))

 On Tue, Mar 1, 2011 at 4:19 PM, ivo welch ivo...@gmail.com wrote:

 dear R experts---

  t <- 1:30
  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
  s <- ifelse( t %% 2 == 0, g(t), f(t))

 shows that the ifelse function actually evaluates both f() and g() for
 all values first, and presumably then just picks left or right results
 based on t%%2.  uggh... wouldn't it make more sense to evaluate only
 the relevant parts of each vector and then reassemble them?

 /iaw
 
 Ivo Welch

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speed up sum of outer products?

2011-03-01 Thread Dennis Murphy
...and this is where we cue the informative article on least squares
calculations in R by Doug Bates:

http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

HTH,
Dennis

On Tue, Mar 1, 2011 at 10:52 AM, AjayT ajaytal...@googlemail.com wrote:

 Hey thanks alot guys !!! That really speeds things up !!! I didn't know %*%
 and crossprod, could operate on matrices. I think you've saved me hours in
 calculation time. Thanks again.

  > system.time({C=matrix(0,50,50); for(i in 1:n) C = C + (X[i,] %o% X[i,])})
    user  system elapsed
    0.45    0.00    0.90
  > system.time({C1 = t(X) %*% X})
    user  system elapsed
    0.02    0.00    0.05
  > system.time({C2 = crossprod(X)})
    user  system elapsed
    0.02    0.00    0.02

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Speed-up-sum-of-outer-products-tp3330160p3330378.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] expression help

2011-03-01 Thread Dennis Murphy
Hi:

1. expression() in plotmath ignores control characters such as \n.
2. The workaround, discussed a couple of times on this list (hence in the
archives), is to use the atop function, so try something like

plot(0:1, 0:1, xaxt = "n")
axis(side = 1, at = .3, expression(paste("IFN-", gamma, "\n", "TNF-", alpha)))
axis(side = 1, at = .6, expression(atop("a label", "2nd line")))

It appears you'll have to make some vertical adjustments, but that's the
basic workaround.
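
Presumably the same trick carries over to the plotmath label itself, something
like (untested):

axis(side = 1, at = .3,
     labels = expression(atop("IFN-" * gamma, "TNF-" * alpha)))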

HTH,
Dennis

On Tue, Mar 1, 2011 at 11:32 AM, Daryl Morris dar...@uw.edu wrote:

 Hello,

 I am trying to write math-type on a plot.  Due to space limitations on the
 plot, I want 2 short expressions written on top of each other.  It is
 certainly possible to write them in two separate calls, but that involves
 fine-tuning locations and a lot of trial and error (and I'm trying to write
 a general purpose function).

 Here's where I've gotten to:
 plot(0:1, 0:1, xaxt = "n")
 axis(side = 1, at = .3, expression(paste("IFN-", gamma, "\n", "TNF-", alpha)))
 axis(side = 1, at = .6, "a label\n2nd line")

 What I am trying to do is illustrated by the the non-expression axis label
 (a label\n2nd line).  The \n forces a new line when I'm not using
 expressions, but doesn't work for my real example.

 I have googled for general documentation on expressions, but the only thing
 I've been able to find is the help for plotmath(), so if someone can point
 me to more complete documentation that would also be helpful.

 thanks, Daryl
 SCHARP, FHCRC, UW Biostatistics

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression with many independent variables

2011-03-01 Thread Greg Snow
You can use ^2 to get all 2 way interactions and ^3 to get all 3 way 
interactions, e.g.:

lm(Sepal.Width ~ (. - Sepal.Length)^2, data=iris)

The lm.fit function is what actually does the fitting, so you could go directly 
there, but then you lose the benefits of using . and ^.  The Matrix package has 
ways of dealing with sparse matrices, but I don't know if that would help 
here or not.

You could also just create x'x and x'y matrices directly, since the variables 
are 0/1, and then use solve.  A lot depends on what you are doing and what questions 
you are trying to answer.
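
For example, a bare-bones sketch of the normal-equations route (untested; X is
the 0/1 predictor matrix, y the response and w the weights):

# weighted least squares via the normal equations: beta = (X'WX)^{-1} X'Wy
XtWX <- crossprod(sqrt(w) * X)   # X' W X
XtWy <- crossprod(X, w * y)      # X' W y
beta <- solve(XtWX, XtWy)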

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: Matthew Douglas [mailto:matt.dougla...@gmail.com]
 Sent: Tuesday, March 01, 2011 1:09 PM
 To: Greg Snow
 Cc: r-help@r-project.org
 Subject: Re: [R] Regression with many independent variables
 
 Hi Greg,
 
 Thanks for the help, it works perfectly. To answer your question,
 there are 339 independent variables but only 10 will be used at one
 time . So at any given line of the data set there will be 10 non zero
 entries for the independent variables and the rest will be zeros.
 
 One more question:
 
 1. I still want to find a way to look at the interactions of the
 independent variables.
 
 the regression would look like this:
 
 y = b12*X1X2 + b23*X2X3 +...+ bk-1k*Xk-1Xk
 
 so I think the regression in R would look like this:
 
 lm(MARGIN, P235:P236+P236:P237+,weights = Poss, data = adj0708),
 
 my problem is that since I have technically 339 independent variables,
 when I do this regression I would have 339 Choose 2 = approx 57000
 independent variables (a vast majority will be 0s though) so I dont
 want to have to write all of these out. Is there a way to do this
 quickly in R?
 
 Also just a curious question that I cant seem to find to online:
 is there a more efficient model other than lm() that is better for
 very sparse data sets like mine?
 
 Thanks,
 Matt
 
 
 On Mon, Feb 28, 2011 at 4:30 PM, Greg Snow greg.s...@imail.org wrote:
  Don't put the name of the dataset in the formula, use the data
 argument to lm to provide that.  A single period (.) on the right
 hand side of the formula will represent all the columns in the data set
 that are not on the left hand side (you can then use - to remove any
 other columns that you don't want included on the RHS).
 
  For example:
 
  lm(Sepal.Width ~ . - Sepal.Length, data=iris)
 
  Call:
  lm(formula = Sepal.Width ~ . - Sepal.Length, data = iris)
 
  Coefficients:
       (Intercept)       Petal.Length        Petal.Width
  Speciesversicolor
            3.0485             0.1547             0.6234            -
 1.7641
   Speciesvirginica
           -2.1964
 
 
  But, are you sure that a regression model with 339 predictors will be
 meaningful?
 
  --
  Gregory (Greg) L. Snow Ph.D.
  Statistical Data Center
  Intermountain Healthcare
  greg.s...@imail.org
  801.408.8111
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Matthew Douglas
  Sent: Monday, February 28, 2011 1:32 PM
  To: r-help@r-project.org
  Subject: [R] Regression with many independent variables
 
  Hi,
 
  I am trying use lm() on some data, the code works fine but I would
  like to use a more efficient way to do this.
 
  The data looks like this (the data is very sparse with a few 1s, -1s
  and the rest 0s):
 
   head(adj0708)
        MARGIN Poss P235 P247 P703 P218 P430 P489 P83 P307 P337
  1   64.28571   29    0    0    0    0    0    0   0    0    0    0
  0    0    0
  2 -100.0    6    0    0    0    0    0    0   0    1    0    0
  0    0    0
  3  100.0    4    0    0    0    0    0    0   0    1    0    0
  0    0    0
  4  -33.3    7    0    0    0    0    0    0   0    0    0    0
  0    0    0
  5  200.0    2    0    0    0    0    0    0   0    0    0    0
  -1    0    0
  6  -83.3   12    0    -1    0    0    0    0   0    0    0    0
  0    0    0
 
  adj0708 is actually a 35657x341 data set. Each column after Poss
 is
  an independent variable, the dependent variable is MARGIN and it
 is
  weighted by Poss
 
 
  The regression is below:
  fit.adj0708 - lm( adj0708$MARGIN~adj0708$P235 + adj0708$P247 +
  adj0708$P703 + adj0708$P430 + adj0708$P489 + adj0708$P218 +
  adj0708$P605 + adj0708$P337 +  +
  adj0708$P510,weights=adj0708$Poss)
 
  I have two questions:
 
  1. Is there a way to to condense how I write the independent
 variables
  in the lm(), instead of having such a long line of code (I have 339
  independent variables to be exact)?
  2. I would like to pair the data to look a regression of the
  interactions between two independent variables. I think it would
 look
  something like this
  fit.adj0708 - lm( adj0708$MARGIN~adj0708$P235:adj0708$P247 +
  adj0708$P703:adj0708$P430 + adj0708$P489:adj0708$P218 +
  adj0708$P605:adj0708$P337 + 

Re: [R] Export R dataframes to excel

2011-03-01 Thread Iain Gallagher
This appeared today on the r-bloggers site and might be useful for you.

http://www.r-bloggers.com/release-of-xlconnect-0-1-3/
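
If I remember the XLConnect API correctly, writing a data frame out is roughly
this (untested; the file, sheet and data frame names are placeholders):

library(XLConnect)
wb <- loadWorkbook("mydata.xlsx", create = TRUE)
createSheet(wb, name = "data")
writeWorksheet(wb, mydf, sheet = "data")
saveWorkbook(wb)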

cheers

i

--- On Tue, 1/3/11, Steve Taylor steve.tay...@aut.ac.nz wrote:

 From: Steve Taylor steve.tay...@aut.ac.nz
 Subject: Re: [R] Export R dataframes to excel
 To: r-help@r-project.org, maxsilva mmsil...@uc.cl
 Date: Tuesday, 1 March, 2011, 20:15
 You can copy it with the following
 function and then paste into Excel...
  
 copy = function (df, buffer.kb=256) {
   write.table(df, file=paste("clipboard-", buffer.kb, sep=""),
       sep="\t", na='', quote=FALSE, row.names=FALSE)
 }
 
 
  
 
 From: maxsilva mmsil...@uc.cl
 To:r-help@r-project.org
 Date: 2/Mar/2011 8:50a
 Subject: [R] Export R dataframes to excel
 
 I'm trying to do this in several ways but havent had any
 result. Im asked to
 install python, or perl etc. Can anybody suggest a
 direct, easy and
 understandable way?  Every help would be appreciated.
 
 
 Thx.
 
 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330399.html
 
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org
 mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help 
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.
 
     [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org
 mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] error in saved .csv

2011-03-01 Thread Tamas Barjak
Help me please!

I would like to be saved a data table:

write.csv(random.t1, place, dec=",", append = T, quote = FALSE, sep = " ",
qmethod = "double", eol = "\n", row.names=F)

It's OK!

But the rows of the file are

1,1,21042,-4084.87179487179,2457.66483516483,-582.275562799881
2,2,23846,-6383.86480186479,-3409.98451548449,-3569.72145340269

and not

1
21042 - 4084.87179487179 2457.66483516483

Not proportional...

What's the problem???
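
(Or is the issue that write.csv() ignores the sep and dec arguments? If so,
maybe I should call write.table() directly, something like:

write.table(random.t1, place, sep = " ", dec = ",", quote = FALSE,
            qmethod = "double", eol = "\n", row.names = FALSE)

but I am not sure.)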

Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inefficient ifelse() ?

2011-03-01 Thread Dennis Murphy
Hi:

As far as I can see, the problem has to do with the way you wrote f and g:

 f(2)
f for 2
[1] 4
 g(5)
g for 5
[1] 15

You output an unsaved character string with a numeric result, but the
character string is not part of the return object since it is saved neither
as a name nor as an attribute. If you save the output to an object, it returns
the numeric outcome:
 s <- f(2)
 s
[1] 4
 names(s)
NULL
 attributes(s)
NULL

However,
 f <- function(x) 2^x
 g <- function(x) 3^x
 ifelse(t %% 2 == 0, f(t), g(t))
 [1] 3.000000e+00 4.000000e+00 2.700000e+01 1.600000e+01 2.430000e+02
 [6] 6.400000e+01 2.187000e+03 2.560000e+02 1.968300e+04 1.024000e+03
[11] 1.771470e+05 4.096000e+03 1.594323e+06 1.638400e+04 1.434891e+07
[16] 6.553600e+04 1.291402e+08 2.621440e+05 1.162261e+09 1.048576e+06
[21] 1.046035e+10 4.194304e+06 9.414318e+10 1.677722e+07 8.472886e+11
[26] 6.710886e+07 7.625597e+12 2.684355e+08 6.863038e+13 1.073742e+09

is the same as
 ifelse(t %% 2 == 0, 2^t, 3^t)
 [1] 3.000000e+00 4.000000e+00 2.700000e+01 1.600000e+01 2.430000e+02
 [6] 6.400000e+01 2.187000e+03 2.560000e+02 1.968300e+04 1.024000e+03
[11] 1.771470e+05 4.096000e+03 1.594323e+06 1.638400e+04 1.434891e+07
[16] 6.553600e+04 1.291402e+08 2.621440e+05 1.162261e+09 1.048576e+06
[21] 1.046035e+10 4.194304e+06 9.414318e+10 1.677722e+07 8.472886e+11
[26] 6.710886e+07 7.625597e+12 2.684355e+08 6.863038e+13 1.073742e+09

which is the vector result you apparently wanted. If you also want to output
the character string with the value, you could either return a list or
rewrite the print method of ifelse() to accommodate it.

HTH,
Dennis


On Tue, Mar 1, 2011 at 12:36 PM, ivo welch ivo.we...@gmail.com wrote:

 thanks, Henrique.  did you mean

as.vector(t(mapply(function(x, f)f(x), split(t, ((t %% 2)==0)),
 list(f, g   ?

 otherwise, you get a matrix.

 its a good solution, but unfortunately I don't think this can be used
 to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
 ift and iff functions will always be evaluated before the function
 call happens, even with lazy evaluation.  :-(

 I still think that it makes sense to have a smarter vectorized %if% in
 a vectorized language like R.  just my 5 cents.

 /iaw

 
 Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)





 On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna www...@gmail.com
 wrote:
  Try this:
 
  mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))
 
  On Tue, Mar 1, 2011 at 4:19 PM, ivo welch ivo...@gmail.com wrote:
 
  dear R experts---
 
   t - 1:30
   f - function(t) { cat(f for, t, \n); return(2*t) }
   g - function(t) { cat(g for, t, \n); return(3*t) }
   s - ifelse( t%%2==0, g(t), f(t))
 
  shows that the ifelse function actually evaluates both f() and g() for
  all values first, and presumably then just picks left or right results
  based on t%%2.  uggh... wouldn't it make more sense to evaluate only
  the relevant parts of each vector and then reassemble them?
 
  /iaw
  
  Ivo Welch
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
  --
  Henrique Dallazuanna
  Curitiba-Paraná-Brasil
  25° 25' 40 S 49° 16' 22 O
 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Major update to rms package

2011-03-01 Thread Frank Harrell
A new version of rms is now available on CRAN for Linux and Windows (Mac  
will probably be available very soon). Largest changes include latex  
methods for validate.* and adding the capability to force a subset of  
variables to be included in all backwards stepdown models (single model or  
validation by resampling).


Recent updates:

   * In survplot.rms, fixed bug (curves were undefined if conf='bands' and  
labelc was FALSE)

   * In survfit.cph, fixed bug by which n wasn't always defined
   * In cph, put survival::: on exact fit call
   * Quit ignoring zlim argument in bplot; added xlabrot argument
   * Added caption argument for latex.anova.rms
   * Changed predab to not print summaries of variables selected if bw=TRUE
   * Changed predab to pass force argument to fastbw
   * fastbw: implemented force argument
   * Added force argument to validate.lrm, validate.bj, calibrate.default,  
calibrate.cph, calibrate.psm, validate.bj, validate.cph, validate.ols
   * print.validate: added B argument to limit how many resamples are  
printed summarizing variables selected if BW=TRUE

   * print.calibrate, print.calibrate.default: added B argument
   * Added latex method for results produced by validate functions
   * Fixed survest.cph to convert summary.survfit std.err to log S(t) scale
   * Fixed val.surv by pulling surv object from survest result
   * Clarified in predict.lrm help file that doesn't always use the first  
intercept
   * lrm.fit, lrm: linear predictor stored in fit object now uses first  
intercept and not middle one (NOT DOWNWARD COMPATIBLE but makes predict  
work when using stored linear.predictors)

   * Fixed argument consistency with validate methods


More information is at http://biostat.mc.vanderbilt.edu/Rrms

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Export R dataframes to excel

2011-03-01 Thread maxsilva
Thx, but I'm looking for a more direct solution... my problem is very simple:
I have a dataframe and I want to create a standard Excel spreadsheet. My
dataframe could be something like this

id  sex  weight
1   M    5'8
2   F    6'2
3   F    5'5
4   M    5'7
5   F    6'3



-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330491.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression with many independent variables

2011-03-01 Thread Matthew Douglas
Hi Greg,

Thanks for the help, it works perfectly. To answer your question,
there are 339 independent variables but only 10 will be used at one
time . So at any given line of the data set there will be 10 non zero
entries for the independent variables and the rest will be zeros.

One more question:

1. I still want to find a way to look at the interactions of the
independent variables.

the regression would look like this:

y = b12*X1X2 + b23*X2X3 +...+ bk-1k*Xk-1Xk

so I think the regression in R would look like this:

lm(MARGIN ~ P235:P236 + P236:P237 + ..., weights = Poss, data = adj0708),

my problem is that since I have technically 339 independent variables,
when I do this regression I would have 339 Choose 2 = approx 57000
independent variables (a vast majority will be 0s though) so I dont
want to have to write all of these out. Is there a way to do this
quickly in R?

Also just a curious question that I can't seem to find an answer to online:
is there a more efficient model other than lm() that is better for
very sparse data sets like mine?

Thanks,
Matt


On Mon, Feb 28, 2011 at 4:30 PM, Greg Snow greg.s...@imail.org wrote:
 Don't put the name of the dataset in the formula, use the data argument to lm 
 to provide that.  A single period (.) on the right hand side of the formula 
 will represent all the columns in the data set that are not on the left hand 
 side (you can then use - to remove any other columns that you don't want 
 included on the RHS).

 For example:

 lm(Sepal.Width ~ . - Sepal.Length, data=iris)

 Call:
 lm(formula = Sepal.Width ~ . - Sepal.Length, data = iris)

 Coefficients:
      (Intercept)       Petal.Length        Petal.Width  Speciesversicolor
           3.0485             0.1547             0.6234            -1.7641
  Speciesvirginica
          -2.1964


 But, are you sure that a regression model with 339 predictors will be 
 meaningful?

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Matthew Douglas
 Sent: Monday, February 28, 2011 1:32 PM
 To: r-help@r-project.org
 Subject: [R] Regression with many independent variables

 Hi,

 I am trying use lm() on some data, the code works fine but I would
 like to use a more efficient way to do this.

 The data looks like this (the data is very sparse with a few 1s, -1s
 and the rest 0s):

  head(adj0708)
       MARGIN Poss P235 P247 P703 P218 P430 P489 P83 P307 P337
 1   64.28571   29    0    0    0    0    0    0   0    0    0    0
 0    0    0
 2 -100.0    6    0    0    0    0    0    0   0    1    0    0
 0    0    0
 3  100.0    4    0    0    0    0    0    0   0    1    0    0
 0    0    0
 4  -33.3    7    0    0    0    0    0    0   0    0    0    0
 0    0    0
 5  200.0    2    0    0    0    0    0    0   0    0    0    0
 -1    0    0
 6  -83.3   12    0    -1    0    0    0    0   0    0    0    0
 0    0    0

 adj0708 is actually a 35657x341 data set. Each column after Poss is
 an independent variable, the dependent variable is MARGIN and it is
 weighted by Poss


 The regression is below:
 fit.adj0708 - lm( adj0708$MARGIN~adj0708$P235 + adj0708$P247 +
 adj0708$P703 + adj0708$P430 + adj0708$P489 + adj0708$P218 +
 adj0708$P605 + adj0708$P337 +  +
 adj0708$P510,weights=adj0708$Poss)

 I have two questions:

 1. Is there a way to to condense how I write the independent variables
 in the lm(), instead of having such a long line of code (I have 339
 independent variables to be exact)?
 2. I would like to pair the data to look a regression of the
 interactions between two independent variables. I think it would look
 something like this
 fit.adj0708 - lm( adj0708$MARGIN~adj0708$P235:adj0708$P247 +
 adj0708$P703:adj0708$P430 + adj0708$P489:adj0708$P218 +
 adj0708$P605:adj0708$P337 + ,weights=adj0708$Poss)
 but there will be 339 Choose 2 combinations, so a lot of independent
 variables! Is there a more efficient way of writing this code. Is
 there a way I can do this?

 Thanks,
 Matt

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Stepwise Criterion

2011-03-01 Thread Mano Gabor Toth
Dear R-help members,

I'd like to run a binomial logistic stepwise regression with ten explanatory
variables and as many interaction terms as R can handle. I'll come up with
the right R command sooner or later, but my real question is whether and how
the criterion for the evaluation of the different models can be set to be
the probability of the residual deviance in the Chi-Square distribution
(which would be more informative of overall model fit than AIC).
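
Roughly, the criterion I have in mind for a single fitted model would be
computed like this (a sketch; dat, y, x1 and x2 are placeholder names):

fit <- glm(y ~ x1 * x2, family = binomial, data = dat)
# probability that a chi-square with the residual df exceeds the residual deviance
pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)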

Thanks in advance for all your help.

Kind regards,

Mano Gabor TOTH
MA Political Science
Central European University

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Pairwise T-Tests and Dunnett's Test (possibly using multcomp)

2011-03-01 Thread Paul Miller
Hello Everyone,
 
I've been learning to use R in my spare time over the past several months. I've 
read about 7-8 books on the subject. Lately I've been testing what I've learned 
by trying to replicate the analyses from some of my SAS books. This helps me 
make sure I know how to use R properly and also helps me to understand how the 
two programs are similar and different.
 
Below is an attempt to replicate some SAS analyses. This was a success in that 
I was able to reproduce the results using R. I have a feeling though that it 
may not represent an ideal approach. So I was hoping a few people might be 
willing to look at what I've done and to suggest improvements or alternatives.
 
One thing I'm struggling with is setting the reference level. I inserted a 
comment about this in the R code.
 
I've also pasted in the original SAS code in case some people are also SAS 
users and might find this helpful or interesting.
 
Thanks,
 
Paul
 
  
R Code:
 
##
 Chapter 6: One-Way ANOVA 
##
 
 Example 6.1: One-Way ANOVA using HAM-A Scores in GAD  
 
connection <- textConnection("
101 LO 21  104 LO 18
106 LO 19  110 LO  .
112 LO 28  116 LO 22
120 LO 30  121 LO 27
124 LO 28  125 LO 19
130 LO 23  136 LO 22
137 LO 20  141 LO 19
143 LO 26  148 LO 35
152 LO  .  103 HI 16
105 HI 21  109 HI 31
111 HI 25  113 HI 23
119 HI 25  123 HI 18
127 HI 20  128 HI 18
131 HI 16  135 HI 24
138 HI 22  140 HI 21
142 HI 16  146 HI 33
150 HI 21  151 HI 17
102 PB 22  107 PB 26
108 PB 29  114 PB 19
115 PB  .  117 PB 33
118 PB 37  122 PB 25
126 PB 28  129 PB 26
132 PB  .  133 PB 31
134 PB 27  139 PB 30
144 PB 25  145 PB 22
147 PB 36  149 PB 32
")
 
gad <- data.frame(scan(connection,
   list(patno=0, dosegrp="", hama=0), na.strings="."))
 
gad
 
 Summary Statistics 
 
summaryfn <- function(x) data.frame(
   mean=mean(x,na.rm=T),std.dev=sd(x,na.rm=T),n=sum(!is.na(x)))
 
by(gad$hama,gad$dosegrp,summaryfn)
 
 ANOVA 
 
model1 <- aov(hama ~ dosegrp, data=gad)
summary.aov(model1)
 
 Multiple Comparisons 
 
library(multcomp)
 
 Pairwise T-Tests 
 
cht <- glht(model1, linfct = mcp(dosegrp = "Tukey"))
summary(cht, test = univariate())
 
 Dunnett's Test 
 
#Not sure how to set the reference group to placebo
#cht <- glht(model1, linfct = mcp(dosegrp = "Dunnett"))
#summary(cht, test = univariate())
 
model2 - aov(hama ~ dosegrp - 1, data = gad)
K - rbind(c(1, 0, -1), c(0, 1, -1))
rownames(K) - c(HI vs PL, LO vs PL)
colnames(K) - names(coef(model2))
summary(glht(model2, linfct = K))
 
 Treatment vs. Placebo 
 
#SAS produces an F-test but this is equivalent
 
K - rbind(c(0.5, 0.5, -1))
rownames(K) - c(TR vs PL)
colnames(K) - names(coef(model2))
summary(glht(model2, linfct = K))
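
# One possible way to handle the reference-group question above (a sketch,
# not part of the original post; the names model1b and cht.dun are only
# illustrative): relevel the dose factor so placebo ("PB") is the baseline,
# then ask mcp() for Dunnett contrasts.

gad$dosegrp <- relevel(factor(gad$dosegrp), ref = "PB")
model1b <- aov(hama ~ dosegrp, data = gad)
cht.dun <- glht(model1b, linfct = mcp(dosegrp = "Dunnett"))
summary(cht.dun, test = univariate())   # unadjusted, as above
summary(cht.dun)                        # Dunnett-adjusted p-values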

SAS code:
 
*---*;
** Chapter 6: One-Way ANOVA;
 
/* Example 6.1: HAM-A Scores in GAD */
 
DATA GAD;
    INPUT PATNO DOSEGRP $ HAMA @@;
    DATALINES;
101 LO 21  104 LO 18
106 LO 19  110 LO  .
112 LO 28  116 LO 22
120 LO 30  121 LO 27
124 LO 28  125 LO 19
130 LO 23  136 LO 22
137 LO 20  141 LO 19
143 LO 26  148 LO 35
152 LO  .  103 HI 16
105 HI 21  109 HI 31
111 HI 25  113 HI 23
119 HI 25  123 HI 18
127 HI 20  128 HI 18
131 HI 16  135 HI 24
138 HI 22  140 HI 21
142 HI 16  146 HI 33
150 HI 21  151 HI 17
102 PB 22  107 PB 26
108 PB 29  114 PB 19
115 PB  .  117 PB 33
118 PB 37  122 PB 25
126 PB 28  129 PB 26
132 PB  .  133 PB 31
134 PB 27  139 PB 30
144 PB 25  145 PB 22
147 PB 36  149 PB 32
;
 
PROC SORT DATA = GAD; BY DOSEGRP;
PROC MEANS MEAN STD N DATA = GAD;
    BY DOSEGRP;
    VAR HAMA;
    TITLE1 'One-Way ANOVA';
    TITLE2 'EXAMPLE 6.1: HAM-A Scores in GAD';
RUN;
 
PROC GLM DATA = GAD;
    CLASS DOSEGRP;
    MODEL HAMA = DOSEGRP;
    MEANS DOSEGRP/T DUNNETT('PB');
    CONTRAST 'ACTIVE vs. PLACEBO' DOSEGRP 0.5 0.5 -1;
RUN;


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice: useOuterStrips and axes

2011-03-01 Thread Jim Price
Thank you, that's exactly what I needed.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Lattice-useOuterStrips-and-axes-tp3330338p3330613.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to understand output from R's polr function (ordered logistic regression)?

2011-03-01 Thread Dan Frankowski
I am new to R, ordered logistic regression, and polr.

The Examples section at the bottom of the help page for polr
(http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/polr.html), which
fits a logistic or probit regression model to an ordered factor
response, shows

options(contrasts = c("contr.treatment", "contr.poly"))
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
pr <- profile(house.plr)
plot(pr)
pairs(pr)


 - What information does pr contain? The help page on profile
   (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/profile.html)
   is generic, and gives no guidance for polr.

 - What is plot(pr) showing? I see six graphs. Each has an X axis that is
   numeric, although the label is an indicator variable (looks like an input
   variable that is an indicator for an ordinal value). Then the Y axis is
   "tau", which is completely unexplained.

 - What is pairs(pr) showing? It looks like a plot for each pair of input
   variables, but again I see no explanation of the X or Y axes.

 - How can one understand if the model gave a good fit? summary(house.plr)
   shows a Residual Deviance of 3479.149 and an AIC (Akaike Information
   Criterion?) of 3495.149. Is that good? In case those are only useful as
   relative measures (i.e. to compare to another model fit), what is a good
   absolute measure? Is the residual deviance approximately chi-squared
   distributed? Can one use % correctly predicted on the original data or
   some cross-validation? What is the easiest way to do that?

 - How does one apply and interpret anova on this model? The docs say "There
   are methods for the standard model-fitting functions, including predict,
   summary, vcov, anova." However, running anova(house.plr) results in
   "anova is not implemented for a single polr object".

 - How does one interpret the t values for each coefficient? Unlike some
   model fits, there are no P values here.

I realize this is a lot of questions, but it makes sense to me to ask as one
bundle (how do I use this thing?) rather than 7 different questions. Any
information appreciated.
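
One partial, hedged answer to the last question (a sketch only, using the
large-sample normal approximation; ctable and p are illustrative names, and
the resulting p-values are approximate):

ctable <- coef(summary(house.plr))   # columns: Value, Std. Error, t value
p <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
cbind(ctable, "approx p value" = p)
confint(house.plr)                   # profile-likelihood confidence intervals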

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to understand output from R's polr function (ordered logistic regression)?

2011-03-01 Thread Dan Frankowski
Also posted as
http://stats.stackexchange.com/questions/7720/how-to-understand-output-from-rs-polr-function-ordered-logistic-regression
.

Also, I read section 7.3 of Modern Applied Statistics with S by Venables
and Ripley (who wrote polr?), and I can still not answer many of these
questions.

On Tue, Mar 1, 2011 at 3:25 PM, Dan Frankowski dfran...@gmail.com wrote:

 I am new to R, ordered logistic regression, and polr.

 The Examples section at the bottom of the help page for polr
 (http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/polr.html), which
 fits a logistic or probit regression model to an ordered factor
 response, shows

 options(contrasts = c("contr.treatment", "contr.poly"))

 house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

 pr <- profile(house.plr)

 plot(pr)
 pairs(pr)

  - What information does pr contain? The help page on profile
    (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/profile.html)
    is generic, and gives no guidance for polr.

  - What is plot(pr) showing? I see six graphs. Each has an X axis that is
    numeric, although the label is an indicator variable (looks like an input
    variable that is an indicator for an ordinal value). Then the Y axis is
    "tau", which is completely unexplained.

  - What is pairs(pr) showing? It looks like a plot for each pair of input
    variables, but again I see no explanation of the X or Y axes.

  - How can one understand if the model gave a good fit? summary(house.plr)
    shows a Residual Deviance of 3479.149 and an AIC (Akaike Information
    Criterion?) of 3495.149. Is that good? In case those are only useful as
    relative measures (i.e. to compare to another model fit), what is a good
    absolute measure? Is the residual deviance approximately chi-squared
    distributed? Can one use % correctly predicted on the original data or
    some cross-validation? What is the easiest way to do that?

  - How does one apply and interpret anova on this model? The docs say "There
    are methods for the standard model-fitting functions, including predict,
    summary, vcov, anova." However, running anova(house.plr) results in
    "anova is not implemented for a single polr object".

  - How does one interpret the t values for each coefficient? Unlike some
    model fits, there are no P values here.

 I realize this is a lot of questions, but it makes sense to me to ask as
 one bundle (how do I use this thing?) rather than 7 different questions.
 Any information appreciated.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] splitting and stacking matrices

2011-03-01 Thread Darcy Webber
Dear R users,

I am having some difficulty arranging some matrices and wondered if
anyone could help out. As an example, consider the following matrix:

a <- matrix(1:32, nrow = 4, ncol = 8)
a
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    5    9   13   17   21   25   29
[2,]    2    6   10   14   18   22   26   30
[3,]    3    7   11   15   19   23   27   31
[4,]    4    8   12   16   20   24   28   32

I would like it to look like the following matrix:

     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
[5,]   17   21   25   29
[6,]   18   22   26   30
[7,]   19   23   27   31
[8,]   20   24   28   32

I can achieve this using the following:

a1 <- a[, 1:4]
a2 <- a[, 5:8]
b <- rbind(a1, a2)

However, my initial matrix often has a variable number of columns (in
multiples of 4), and I still want to split the columns into blocks of 4
and stack these. I have considered working out how many blocks the
matrix must be split into using: no.blocks <- ncol(a)/4. My problem is
then implementing this information to actually split the matrix up and
then stack it. Any guidance on this would be much appreciated.

Regards
Darcy Webber

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inefficient ifelse() ?

2011-03-01 Thread William Dunlap
An ifelse-like function that only evaluated
what was needed would be fine, but it would
have to be different from ifelse itself.  The
trick is to come up with a good parameterization.

E.g., how would it deal with things like
   ifelse(is.na(x), mean(x, na.rm=TRUE), x)
or
   ifelse(x > 1, log(x), runif(length(x), -1, 0))
or
   ifelse(x > 1, log(x), -seq_along(x))
Would it reject such things?  Deciding that the
x in mean(x,na.rm=TRUE) should be replaced by
x[is.na(x)] would be wrong.  Deciding that
runif(length(x)) should be replaced by runif(sum(x > 1))
seems a bit much to expect.  Replacing seq_along(x) with
seq_len(sum(x > 1)) is wrong.  It would be better to
parameterize the new function so it wouldn't have to
think about those cases.

Would you want it to depend only on a logical
vector or perhaps also on a factor (a vectorized
switch/case function)?
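
For instance, one possible parameterization (a rough sketch, not an existing
function; ifelse2 is an illustrative name) passes each branch as a function
of the index set it owns, so neither branch ever sees the other's elements:

ifelse2 <- function(test, yes.fun, no.fun) {
  test <- !is.na(test) & test          # design choice: NA counts as FALSE
  out <- numeric(length(test))         # sketch: numeric results only
  out[test]  <- yes.fun(which(test))   # each branch evaluated only on its own indices
  out[!test] <- no.fun(which(!test))
  out
}

x <- c(2, 0.5, 3, 0.1)
ifelse2(x > 1, function(i) log(x[i]), function(i) -i)
# [1]  0.6931472 -2.0000000  1.0986123 -4.0000000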

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of ivo welch
 Sent: Tuesday, March 01, 2011 12:36 PM
 To: Henrique Dallazuanna
 Cc: r-help
 Subject: Re: [R] inefficient ifelse() ?
 
 thanks, Henrique.  did you mean
 
  as.vector(t(mapply(function(x, f) f(x), split(t, ((t %% 2) == 0)),
  list(f, g))))   ?
 
 otherwise, you get a matrix.
 
 its a good solution, but unfortunately I don't think this can be used
 to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
 ift and iff functions will always be evaluated before the function
 call happens, even with lazy evaluation.  :-(
 
 I still think that it makes sense to have a smarter vectorized %if% in
 a vectorized language like R.  just my 5 cents.
 
 /iaw
 
 
 Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
 
 
 
 
 
 On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna 
 www...@gmail.com wrote:
  Try this:
 
  mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))
 
  On Tue, Mar 1, 2011 at 4:19 PM, ivo welch ivo...@gmail.com wrote:
 
  dear R experts---
 
   t <- 1:30
   f <- function(t) { cat("f for", t, "\n"); return(2*t) }
   g <- function(t) { cat("g for", t, "\n"); return(3*t) }
   s <- ifelse( t %% 2 == 0, g(t), f(t))
 
  shows that the ifelse function actually evaluates both f() 
 and g() for
  all values first, and presumably then just picks left or 
 right results
  based on t%%2.  uggh... wouldn't it make more sense to 
 evaluate only
  the relevant parts of each vector and then reassemble them?
 
  /iaw
  
  Ivo Welch
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
  --
  Henrique Dallazuanna
  Curitiba-Paraná-Brasil
  25° 25' 40" S 49° 16' 22" O
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inefficient ifelse() ?

2011-03-01 Thread Thomas Lumley
On Wed, Mar 2, 2011 at 9:36 AM, ivo welch ivo.we...@gmail.com wrote:
 thanks, Henrique.  did you mean

     as.vector(t(mapply(function(x, f) f(x), split(t, ((t %% 2) == 0)),
  list(f, g))))   ?

 otherwise, you get a matrix.

 its a good solution, but unfortunately I don't think this can be used
 to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
 ift and iff functions will always be evaluated before the function
 call happens, even with lazy evaluation.  :-(

 I still think that it makes sense to have a smarter vectorized %if% in
 a vectorized language like R.  just my 5 cents.


Ivo,

There is no guarantee in general that  f(x[3,5,7]) is the same as f(x)[3,5,7]
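
A small example (not from the original message), with f = cumsum:

x <- 1:10
cumsum(x)[c(3, 5, 7)]   # 6 15 28  (apply f, then subset)
cumsum(x[c(3, 5, 7)])   # 3  8 15  (subset, then apply f)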


  -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inefficient ifelse() ?

2011-03-01 Thread ivo welch
yikes.  you are asking me too much.

thanks everybody for the information.  I learned something new.

my suggestion would be for the much smarter language designers (than
I) to offer us more or less blissfully ignorant users another
vector-related construct in R.  It could perhaps be named %if% %else%,
analogous to if else (with naming inspired by %in%, and with
evaluation only of relevant parts [just as if else for scalars]), with
different outcomes in some cases, but with the advantage of typically
evaluating only half as many conditions as the ifelse() vector
construct.  %if% %else% may work only in a subset of cases, but when
it does work, it would be nice to have.  it would probably be my first
go-to function, with ifelse() used only as a fallback.
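
(purely as an illustration that such syntax can already be prototyped with
user-defined infix operators -- a sketch, with the branches supplied as
functions of their own index sets, so the semantics are not identical to
ifelse():)

`%if%` <- function(yes.fun, cond) list(yes = yes.fun, cond = cond)
`%else%` <- function(clause, no.fun) {
  out <- numeric(length(clause$cond))
  out[clause$cond]  <- clause$yes(which(clause$cond))
  out[!clause$cond] <- no.fun(which(!clause$cond))
  out
}
# usage: (function(i) log(x[i])) %if% (x > 1) %else% (function(i) -i)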

of course, I now know how to fix my specific issue.  I was just
surprised that my first choice, ifelse(), was not as optimized as I
had thought.

best,

/iaw


On Tue, Mar 1, 2011 at 5:13 PM, William Dunlap wdun...@tibco.com wrote:
 An ifelse-like function that only evaluated
 what was needed would be fine, but it would
 have to be different from ifelse itself.  The
 trick is to come up with a good parameterization.

 E.g., how would it deal with things like
   ifelse(is.na(x), mean(x, na.rm=TRUE), x)
 or
   ifelse(x > 1, log(x), runif(length(x), -1, 0))
 or
   ifelse(x > 1, log(x), -seq_along(x))
 Would it reject such things?  Deciding that the
 x in mean(x,na.rm=TRUE) should be replaced by
 x[is.na(x)] would be wrong.  Deciding that
 runif(length(x)) should be replaced by runif(sum(x > 1))
 seems a bit much to expect.  Replacing seq_along(x) with
 seq_len(sum(x > 1)) is wrong.  It would be better to
 parameterize the new function so it wouldn't have to
 think about those cases.

 Would you want it to depend only on a logical
 vector or perhaps also on a factor (a vectorized
 switch/case function)?

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of ivo welch
 Sent: Tuesday, March 01, 2011 12:36 PM
 To: Henrique Dallazuanna
 Cc: r-help
 Subject: Re: [R] inefficient ifelse() ?

 thanks, Henrique.  did you mean

      as.vector(t(mapply(function(x, f) f(x), split(t, ((t %% 2) == 0)),
  list(f, g))))   ?

 otherwise, you get a matrix.

 its a good solution, but unfortunately I don't think this can be used
 to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
 ift and iff functions will always be evaluated before the function
 call happens, even with lazy evaluation.  :-(

 I still think that it makes sense to have a smarter vectorized %if% in
 a vectorized language like R.  just my 5 cents.

 /iaw

 
 Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)





 On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna
 www...@gmail.com wrote:
  Try this:
 
  mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))
 
  On Tue, Mar 1, 2011 at 4:19 PM, ivo welch ivo...@gmail.com wrote:
 
  dear R experts---
 
    t <- 1:30
    f <- function(t) { cat("f for", t, "\n"); return(2*t) }
    g <- function(t) { cat("g for", t, "\n"); return(3*t) }
    s <- ifelse( t %% 2 == 0, g(t), f(t))
 
  shows that the ifelse function actually evaluates both f()
 and g() for
  all values first, and presumably then just picks left or
 right results
  based on t%%2.  uggh... wouldn't it make more sense to
 evaluate only
  the relevant parts of each vector and then reassemble them?
 
  /iaw
  
  Ivo Welch
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
  --
  Henrique Dallazuanna
  Curitiba-Paraná-Brasil
   25° 25' 40" S 49° 16' 22" O
 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in saved .csv

2011-03-01 Thread jim holtman
I am not sure what you are saying your problem is?  Is the format
incorrect?  BTW, notice that write.csv does not have a 'sep'
parameter.  Maybe you should be using write.table.

On Tue, Mar 1, 2011 at 4:36 PM, Tamas Barjak tamas.barja...@gmail.com wrote:
 Help me please!

 I would like to save a data table:

 write.csv(random.t1, place, dec=",", append = T, quote = FALSE, sep = " ",
 qmethod = "double", eol = "\n", row.names=F)

 It's OK!

 But the rows of file

  1,1,21042,-4084.87179487179,2457.66483516483,-582.275562799881
 2,2,23846,-6383.86480186479,-3409.98451548449,-3569.72145340269
 and not

 1
 21042 -         4084.87179487179 2457.66483516483
 Not proportional...

 What's the problem???

 Thanks!

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in saved .csv

2011-03-01 Thread Peter Langfelder
On Tue, Mar 1, 2011 at 1:36 PM, Tamas Barjak tamas.barja...@gmail.com wrote:
 Help me please!

  I would like to save a data table:

  write.csv(random.t1, place, dec=",", append = T, quote = FALSE, sep = " ",
  qmethod = "double", eol = "\n", row.names=F)

 It's OK!

 But the rows of file

  1,1,21042,-4084.87179487179,2457.66483516483,-582.275562799881
 2,2,23846,-6383.86480186479,-3409.98451548449,-3569.72145340269
  and not

 1
 21042 -         4084.87179487179 2457.66483516483
 Not proportional...

 What's the problem???

If I understand you correctly, you want a text file where the
separator is a white space. You cannot get that with write.csv - you
have to use write.table(). The function write.csv does not allow you
to change the sep argument. Here's what the help file says:

 ‘write.csv’ and ‘write.csv2’ provide convenience wrappers for
 writing CSV files.  They set ‘sep’, ‘dec’ and ‘qmethod’, and
 ‘col.names’ to ‘NA’ if ‘row.names = TRUE’ and ‘TRUE’ otherwise.

 ‘write.csv’ uses ‘.’ for the decimal point and a comma for the
 separator.

 ‘write.csv2’ uses a comma for the decimal point and a semicolon
 for the separator, the Excel convention for CSV files in some
 Western European locales.

 These wrappers are deliberately inflexible: they are designed to
 ensure that the correct conventions are used to write a valid
 file.  Attempts to change ‘append’, ‘col.names’, ‘sep’, ‘dec’ or
 ‘qmethod’ are ignored, with a warning.
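
A minimal sketch of the write.table() call being suggested (the file name is
only an example; dec = "," kept from the original attempt):

write.table(random.t1, "random_t1.txt", sep = " ", dec = ",",
            quote = FALSE, row.names = FALSE)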

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Stepwise Criterion

2011-03-01 Thread Bill.Venables
The probability OF the residual deviance is zero.  The significance level for 
the residual deviance according to its asymptotic Chi-squared distribution is a 
possible criterion, but a silly one.  If you want to minimise that, just fit no 
variables at all.  That's the best you can do. If you want to maximise it, just 
minimise the deviance itself, which means include all possible variables in the 
regression, together with as many interactions as you can as well.  (Incidentally,
R doesn't have restrictions on how many interaction terms it can handle; those
are imposed by your computer.)

I suggest you think again about what criterion you really want to use.  Somehow 
you need to balance fit in the training sample against some complexity measure. 
 AIC and BIC are commonly used criteria, but not the only ones.  I suggest you 
start with these and see if either does the kind of job you want.

Stepwise regression with interaction terms can be a bit tricky if you want to 
impose the marginality constraints, but that is a bigger issue.
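
A minimal sketch of AIC- and BIC-based stepwise search for a binomial GLM
(the data frame dat and response y are placeholders, not from the original
question; step() respects the model hierarchy when adding or dropping terms):

fit0 <- glm(y ~ ., family = binomial, data = dat)          # main effects only
fit.aic <- step(fit0, scope = ~ .^2, direction = "both")   # AIC (k = 2)
fit.bic <- step(fit0, scope = ~ .^2, direction = "both",
                k = log(nrow(dat)))                        # BIC penalty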

Bill Venables. 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Mano Gabor Toth
Sent: Wednesday, 2 March 2011 6:44 AM
To: r-help@r-project.org
Subject: [R] Logistic Stepwise Criterion

Dear R-help members,

I'd like to run a binomial logistic stepwise regression with ten explanatory
variables and as many interaction terms as R can handle. I'll come up with
the right R command sooner or later, but my real question is whether and how
the criterion for the evaluation of the different models can be set to be
the probability of the residual deviance in the Chi-Square distribution
(which would be more informative of overall model fit than AIC).

Thanks in advance for all your help.

Kind regards,

Mano Gabor TOTH
MA Political Science
Central European University

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to prove the MLE estimators are normal distributed?

2011-03-01 Thread Ning Cheng
Dear List,
I'm now working on MLE and OLS estimators. I just noticed that the
textbook argues they are jointly normally distributed. But how can one prove this
conclusion?
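
A sketch of the standard argument (not a full proof; it assumes i.i.d. data
and the usual regularity conditions: a differentiable log-likelihood and a
finite, positive-definite Fisher information $I(\theta_0)$). A Taylor
expansion of the score $s(\theta;X_i)=\partial_\theta \log f(X_i;\theta)$
around the true value $\theta_0$ gives

\[
0 = \frac{1}{\sqrt{n}}\sum_{i=1}^n s(\hat\theta_n;X_i)
  \approx \frac{1}{\sqrt{n}}\sum_{i=1}^n s(\theta_0;X_i)
  - \Big[\frac{1}{n}\sum_{i=1}^n -\partial_\theta s(\theta_0;X_i)\Big]
    \sqrt{n}\,(\hat\theta_n-\theta_0).
\]

The bracketed average converges to $I(\theta_0)$ by the law of large numbers,
and the central limit theorem applied to the scores (mean zero, variance
$I(\theta_0)$) then yields

\[
\sqrt{n}\,(\hat\theta_n-\theta_0) \;\xrightarrow{d}\; N\big(0,\,I(\theta_0)^{-1}\big).
\]

Joint normality with another asymptotically linear estimator (such as OLS)
follows by applying the multivariate CLT to the stacked influence functions.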

Thanks for your time in advance!

Best,
Ning

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in saved .csv

2011-03-01 Thread Tamas Barjak
Yes, the format is incorrect. I have already tried the write.table, but it
didn't work.


2011/3/1 jim holtman jholt...@gmail.com

 I am not sure what you are saying your problem is?  Is the format
 incorrect?  BTW, notice that write.csv does not have a 'sep'
 parameter.  Maybe you should be using write.table.

 On Tue, Mar 1, 2011 at 4:36 PM, Tamas Barjak tamas.barja...@gmail.com
 wrote:
  Help me please!
 
  I would like to save a data table:
 
  write.csv(random.t1, place, dec=",", append = T, quote = FALSE, sep = " ",
  qmethod = "double", eol = "\n", row.names=F)
 
  It's OK!
 
  But the rows of file
 
   1,1,21042,-4084.87179487179,2457.66483516483,-582.275562799881
  2,2,23846,-6383.86480186479,-3409.98451548449,-3569.72145340269
  and not
 
  1
  21042 - 4084.87179487179 2457.66483516483
  Not proportional...
 
  What's the problem???
 
  Thanks!
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting and stacking matrices

2011-03-01 Thread Gabor Grothendieck
On Tue, Mar 1, 2011 at 4:55 PM, Darcy Webber darcy.web...@gmail.com wrote:
 Dear R users,

 I am having some difficulty arranging some matrices and wondered if
 anyone could help out. As an example, consider the following matrix:

 a <- matrix(1:32, nrow = 4, ncol = 8)
 a
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [1,]    1    5    9   13   17   21   25   29
 [2,]    2    6   10   14   18   22   26   30
 [3,]    3    7   11   15   19   23   27   31
 [4,]    4    8   12   16   20   24   28   32

 I would like it to look like the following matrix:

     [,1] [,2] [,3] [,4]
 [1,]    1    5    9   13
 [2,]    2    6   10   14
 [3,]    3    7   11   15
 [4,]    4    8   12   16
 [5,]  17   21   25   29
 [6,]  18   22   26   30
 [7,]  19   23   27   31
 [8,]  20   24   28   32

 I can achieve this using the following:

 a1 <- a[, 1:4]
 a2 <- a[, 5:8]
 b <- rbind(a1, a2)

 However, my initial matrix often has a variable number of columns (in
 multiples of 4), and I still want to split the columns into blocks of 4
 and stack these. I have considered working out how many blocks the
 matrix must be split into using: no.blocks <- ncol(a)/4. My problem is
 then implementing this information to actually split the matrix up and
 then stack it. Any guidance on this would be much appreciated.

 Regards
 Darcy Webber


Try converting to a 3d array, swapping the last two dimensions and
reconstituting it as a matrix:

matrix(aperm(array(a, c(4, 4, 2)), c(1, 3, 2)), nc = 4)
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
[5,]   17   21   25   29
[6,]   18   22   26   30
[7,]   19   23   27   31
[8,]   20   24   28   32
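
A possible generalisation to any multiple of four columns (a sketch only; the
function name stack_blocks is just illustrative):

stack_blocks <- function(a, block = 4) {
  stopifnot(ncol(a) %% block == 0)
  matrix(aperm(array(a, c(nrow(a), block, ncol(a) / block)),
               c(1, 3, 2)),
         ncol = block)
}
stack_blocks(a)   # reproduces the 8 x 4 result above for the 4 x 8 example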



-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

