Re: [R] Problem Creating Partial Dependence Plot

2014-05-30 Thread Stephen Milborrow

Jane Shevtsov wrote:

I am trying to use the plotmo package to generate a partial dependence plot
for a CART model created with rpart. When running plotmo, I get "Error:
get.plotmo.y returned the wrong length (got 204938 expected 205000)". The
rpart predict function does indeed return 204938 results, but plotmo is
supposed to be able to handle NAs in rpart models. What might I do about
this?


It’s because you have NAs in y; plotmo only allows them in x (and then only 
with rpart models).  The plotmo help page is admittedly not clear on this 
and I will update it in due course.
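
One possible workaround, sketched here under assumptions, is to drop the rows whose response is missing before fitting, so that the response and the predictions have the same length. In the report above the two lengths differ by 62, which would be consistent with 62 rows having a missing y. The data frame `dat` and its columns are made up for illustration, and the `plotmo()` call is left commented out because it needs the plotmo package and a graphics device.

```r
# Hypothetical data: y has missing values, x1 also has one (rpart tolerates that)
set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = runif(100))
dat$y <- 2 * dat$x1 + dat$x2 + rnorm(100, sd = 0.1)
dat$y[c(3, 17, 42)] <- NA   # NAs in the response
dat$x1[5] <- NA             # NA in a predictor

# Keep only rows with an observed response; NAs in x are left alone
dat_ok <- dat[!is.na(dat$y), ]

if (requireNamespace("rpart", quietly = TRUE)) {
  fit <- rpart::rpart(y ~ x1 + x2, data = dat_ok)
  # Predictions now line up with the response, one per retained row
  n_pred <- length(predict(fit))
  # plotmo::plotmo(fit)   # uncomment if plotmo is installed
}
```

Filtering on the response only, rather than calling `na.omit()` on the whole data frame, keeps the rows with missing predictors that rpart can still use.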


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Scatter plot selection points

2014-05-30 Thread Beatriz

Hi all,

I'd like to do a scatterplot where some of the values, out of a subset, 
are plotted differently in color and shape.

I've worked around the following code but I don't manage to make it right.
Any help greatly appreciated!

# My data
dd <- iris
iris$Code <- 1:150

# A selection of my data I'd like to plot differently
subset <- subset(iris, iris$Sepal.Width > 5)
sel <- as.character(subset$Code) # I think the problems start already here :)


# Plotting doesn't work
plot(iris$Sepal.Length ~ iris$Sepal.Widith,
 col=ifelse(iris$Code==sel, "red", "black")
 pch=ifelse(iris$Code==sel, 17, 1))



Re: [R] Dataframe: Average cells of two rows and replace them with one row

2014-05-30 Thread PIKAL Petr
Hi

Please do not use HTML formatting in your posts. It does not bring any advantage.
See inline.

From: Verena Weinbir [mailto:vwein...@gmail.com]
Sent: Thursday, May 29, 2014 3:33 PM
To: PIKAL Petr
Subject: Re: [R] Dataframe: Average cells of two rows and replace them with one 
row

Hey,
Thank you for your reply!

I've attached some sample data. When I tried your code it gave me the error 
message, that arguments must have same

Why did you attach the data? Using dput is preferable. When I tried to read your
data, there was a problem with the number of items in row 13 (and probably others);
Excel is not known for keeping the same formatting across versions.
> test <- read.table("clipboard", header=TRUE, na.strings="NA", dec=",")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 13 did not have 25 elements
So I read only lines 1:10.
> test <- read.table("clipboard", header=TRUE, na.strings="NA", dec=",")
This results in a data frame with two factor variables, Author and Test. BTW, 
there is no variable "Name" in your data.
> str(test)
'data.frame':   10 obs. of  25 variables:
 $ Author  : Factor w/ 4 levels "Beck","Joll",..: 2 2 2 2 1 1 1 1 3 4
 $ Year    : int  2006 2006 2006 2006 1988 1988 1988 1988 2004 2004
 $ Number  : int  720 720 720 720 33 41 41 41 19 26
 $ NumberA : int  344 344 344 344 5 6 6 6 9 12
 $ NumberB : int  376 376 376 376 28 35 35 35 10 14
 $ Age     : num  15 15 15 15 25.5 NA NA NA 37.4 37.2
 $ AgeA    : int  NA NA NA NA 27 NA NA NA NA NA
 $ AgeB    : int  NA NA NA NA 24 NA NA NA NA NA
 $ Test    : Factor w/ 2 levels "green","red": 2 2 2 2 1 1 1 1 1 1
 $ ScoreA  : num  64.8 63 64.7 60.6 61 ...
 $ ScoreAdv: num  9.96 9.96 9.96 9.96 20.64 ...
 $ ScoreB  : num  75.5 73.4 74.6 69.2 70.8 ...
 $ ScoreBdv: num  9.04 9.04 9.04 9.04 16.36 ...
 $ Sub1    : logi  NA NA NA NA NA NA ...
 $ Sub2    : logi  NA NA NA NA NA NA ...
 $ Sub3    : logi  NA NA NA NA NA NA ...
 $ Sub4    : logi  NA NA NA NA NA NA ...
 $ Sub5    : logi  NA NA NA NA NA NA ...
 $ Sub6    : logi  NA NA NA NA NA NA ...
 $ Sub7    : logi  NA NA NA NA NA NA ...
 $ Sub8    : logi  NA NA NA NA NA NA ...
 $ Sub8.1  : logi  NA NA NA NA NA NA ...
 $ Sub10   : logi  NA NA NA NA NA NA ...
 $ yi      : num  1.124 1.092 1.04 0.903 0.515 ...
 $ vi      : num  0.00643 0.00638 0.0063 0.00612 0.23337 ...
Here is the output from dput, which you can use to check whether my data are the
same as yours (that is why dput is preferable).
> dput(test)
structure(list(Author = structure(c(3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 4L, 5L, 2L), .Label = c("Beck", "Con", "Joll", "Per(a)",
"Per(b)"), class = "factor"), Year = c(2006L, 2006L, 2006L, 2006L,
1988L, 1988L, 1988L, 1988L, 2004L, 2004L, 2012L), Number = c(720L,
720L, 720L, 720L, 33L, 41L, 41L, 41L, 19L, 26L, 312L), NumberA = c(344L,
344L, 344L, 344L, 5L, 6L, 6L, 6L, 9L, 12L, 156L), NumberB = c(376L,
376L, 376L, 376L, 28L, 35L, 35L, 35L, 10L, 14L, 156L), Age = c(15,
15, 15, 15, 25.5, NA, NA, NA, 37.4, 37.2, 37.25), AgeA = c(NA,
NA, NA, NA, 27, NA, NA, NA, NA, NA, 38.3), AgeB = c(NA, NA, NA,
NA, 24, NA, NA, NA, NA, NA, 36.2), Test = structure(c(3L, 3L,
3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("blue", "green",
"red"), class = "factor"), ScoreA = c(64.8, 63, 64.7, 60.6, 61,
60.66, 58.5, 61.66, 87.58, 91.2, 0.26), ScoreAdv = c(9.955, 9.955,
9.955, 9.955, 20.64, 19.38, 20.35, 19.44, 16.79, 15.6, 0.27),
ScoreB = c(75.5, 73.4, 74.6, 69.2, 70.83, 70.34, 70.91, 71.19,
98.08, 86.87, 0.3), ScoreBdv = c(9.043, 9.043, 9.043, 9.043,
16.36, 17.78, 18.23, 18.93, 16.35, 15.73, 0.26), Sub1 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Sub2 = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), Sub3 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), Sub4 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), Sub5 = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), Sub6 = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), Sub7 = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), Sub8 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), Sub8.1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA), Sub10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), yi = c(1.12396298138735, 1.0924560079, 1.03992836595652,
0.90337211588142, 0.514940166844419, 0.510422808437657, 0.629923007603453,
0.487074464117519, 0.605177248294008, -0.26766583881062,
0.150551071047105), vi = c(0.0064268782069221, 0.00637821308397186,
0.00630017975096319, 0.00611528303580472, 0.233373905723904,
0.212826775760406, 0.211924228535386, 0.222536036643126,
0.224889816220824, 0.158901797586393, 0.0128772400934118)), .Names = c("Author",
"Year", "Number", "NumberA", "NumberB", "Age", "AgeA", "AgeB",
"Test", "ScoreA", "ScoreAdv", "ScoreB", "ScoreBdv", "Sub1", "Sub2",
"Sub3", "Sub4", "Sub5", "Sub6", "Sub7", "Sub8", "Sub8.1", "Sub10",
"yi", "vi"), class = "data.frame", row.names = c(NA, -11L))
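
To illustrate why dput output is convenient for sharing data, here is a small sketch of the round trip. The data frame `df` is made up and much smaller than the one above; the point is only that the text produced by `dput()` is R code that rebuilds the same object when evaluated.

```r
# Small made-up data frame standing in for the poster's data
df <- data.frame(Author = factor(c("Beck", "Joll", "Joll")),
                 Year   = c(1988L, 2006L, 2006L),
                 Age    = c(25.5, 15, NA))

# dput() prints R code that rebuilds the object; capture it as text
txt <- paste(capture.output(dput(df)), collapse = "\n")

# A reader would paste that text into their session; here we parse it back
df2 <- eval(parse(text = txt))

# Column types, factor levels and NAs survive the round trip
all.equal(df, df2)
```

An Excel attachment offers no such guarantee, since column types and decimal separators depend on how the file is read.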

I can use aggregate without problems
> test.ag <- aggregate(test[,-1], list(test[,1]), mean, na.rm=TRUE)
Here is the result
> dput(test.ag)
structure(list(Group.1 = structure(1:5, .Label = c("Beck", "Con",
"Joll", "Per(a)", "Per(b)"), class = "factor"), Year = c(1988,
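
As a self-contained illustration of the aggregate call used above, the following sketch averages the rows belonging to each author and returns one row per author. The data are made up and only numeric columns are aggregated; the column names mirror those in the thread but the values do not.

```r
# Made-up data: several rows per author, one value missing
test <- data.frame(Author = c("Joll", "Joll", "Beck", "Beck", "Per"),
                   ScoreA = c(64.8, 63.0, 61.0, NA, 87.58),
                   ScoreB = c(75.5, 73.4, 70.83, 70.34, 98.08))

# Average every numeric column within each author; na.rm = TRUE is
# passed on to mean() so missing cells are skipped, not propagated
test.ag <- aggregate(test[, -1], by = list(Author = test$Author),
                     FUN = mean, na.rm = TRUE)
test.ag
```

Naming the grouping list (`Author = ...`) gives the first output column a readable name instead of the default `Group.1`.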

Re: [R] partykit ctree: minbucket and case weights

2014-05-30 Thread Henric Winell

Amber Dawn Nolder wrote 2014-05-28 23:16:


Hello,
I am an R novice, and I am using the partykit package to create
regression trees. I used the following to generate the trees:
ctree(y~x1+x2+x3+x4,data=my_data,control=ctree_control(testtype =
"Bonferroni", mincriterion = 0.90, minsplit = 12, minbucket = 4,
majority = TRUE))
I thought that minbucket set the minimum value for the sum of weights
in each terminal node, and that each case weight is 1, unless otherwise
specified. In which case, the sum of case weights in a node should equal the
number of cases (n) in that node. However, I  sometimes obtain a tree with
a terminal node that contains fewer than 4 cases.


I do agree that the tree below looks suspicious.  You may have found a 
bug.


But you didn't provide commented, minimal, self-contained, reproducible 
code, i.e., we're missing your 'my_data' object, and therefore we 
cannot reproduce this easily.  Can you please provide us with the output 
from 'dput(my_data)'?



My data set has a total of 36 cases. The dependent and all independent
variables are continuous data. Variables x1 and x2 contain missing (NA)
values.


I tried a few other data sets and there the results seem to come out OK 
(even after inducing NAs).



Could someone please explain why I am getting these results?


Probably.  But you need to provide a reproducible example and the 
details obtained by 'sessionInfo()'.


As per the posting guide, since this is a contributed package you should 
first contact its maintainer (Torsten Hothorn, CC'd) and only post here 
if you get no reply.  Did you try contacting Torsten?



Am I  mistaken about the value of case weights or about the use of minbucket
to restrict the size of a terminal node?


I don't think you're mistaken since '?ctree_control' says that 
minbucket: the minimum sum of weights in a terminal node.
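
To make the concern concrete, here is a small base-R check, a sketch only, that takes the terminal-node sizes from the tree printed in the quoted message below and compares them with minbucket = 4. It assumes unit case weights, as the poster states, so the sum of weights in a node equals its n. The node ids and sizes are copied by hand from the printout, not extracted from a fitted object.

```r
# Terminal-node sizes copied from the printed tree (node id = n)
node_n <- c("2" = 17, "4" = 8, "6" = 3, "7" = 8)
minbucket <- 4

# With every case weight equal to 1, sum of weights per node is just n
weight_sum <- node_n * 1

# The node sizes should account for all 36 cases in the data set
total_cases <- sum(node_n)

# Terminal nodes whose sum of weights falls below minbucket
too_small <- names(weight_sum)[weight_sum < minbucket]
too_small
```

Under these assumptions node 6, with n = 3, is the one that falls below the requested minimum.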



Henric




This is an example of the output:
Model formula:
y ~ x1 + x2 + x3 + x4
Fitted party:
[1] root
|   [2] x4 <= 30: 0.927 (n = 17, err = 1.1)
|   [3] x4 > 30
|   |   [4] x2 <= 43: 0.472 (n = 8, err = 0.4)
|   |   [5] x2 > 43
|   |   |   [6] x3 <= 0.4: 0.282 (n = 3, err = 0.0)
|   |   |   [7] x3 > 0.4: 0.020 (n = 8, err = 0.0)
Number of inner nodes:    3
Number of terminal nodes: 4
Many thanks!
Amber Nolder
Graduate Student
Indiana University of Pennsylvania


Re: [R] Scatter plot selection points

2014-05-30 Thread PIKAL Petr
Hi

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of Beatriz
> Sent: Friday, May 30, 2014 9:37 AM
> To: R Help
> Subject: [R] Scatter plot selection points
>
> Hi all,
>
> I'd like to do a scatterplot where some of the values, out of a subset,
> are plotted differently in color and shape.
> I've worked around the following code but I don't manage to make it
> right.
> Any help greatly appreciated!
>
> # My data
> dd <- iris
> iris$Code <- 1:150
>
> # A selection of my data I'd like to plot differently
> subset <- subset(iris, iris$Sepal.Width > 5)

max(iris$Sepal.Width)
[1] 4.4

No values out of subset. So I changed the threshold.

iris$code <- iris$Sepal.Width > 3.5

> sel <- as.character(subset$Code) # I think the problems start already
> here :)
>
> # Plotting doesn't work
> plot(iris$Sepal.Length ~ iris$Sepal.Widith,
>   col=ifelse(iris$Code==sel, "red", "black")
>   pch=ifelse(iris$Code==sel, 17, 1))

Overcomplicated.

plot(iris$Sepal.Length ~ iris$Sepal.Width, col=c("red", "black")[iris$code+1],
pch=c(17, 1)[iris$code+1])

Regards
Petr


This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a 
contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately 
accept such offer; The sender of this e-mail (offer) excludes any acceptance of 
the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an 
express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into 
any contracts on behalf of the company except for cases in which he/she is 
expressly authorized to do so in writing, and such authorization or power of 
attorney is submitted to the recipient or the person represented by the 
recipient, or the existence of such authorization is known to the recipient of 
the person represented by the recipient.


Re: [R] Scatter plot selection points

2014-05-30 Thread Beatriz

Hi Petr,

Thanks for your email; however, I cannot make the code work.

Also, I quite like the ifelse approach. I find it very clean.

Cheers





Re: [R] Scatter plot selection points

2014-05-30 Thread PIKAL Petr
Hi


> -----Original Message-----
> From: Beatriz [mailto:aguitatie...@hotmail.com]
> Sent: Friday, May 30, 2014 10:08 AM
> To: PIKAL Petr; R Help
> Subject: Re: [R] Scatter plot selection points
>
> Hi Petr,
>
> Thanks for your email; however, I cannot make the code work.

Any errors? What code did you try?

iris$code <- iris$Sepal.Width > 3.5
plot(iris$Sepal.Length ~ iris$Sepal.Width, col=c("red","black")[iris$code+1],
pch=c(17, 1)[iris$code+1])

works for me without any problem.


> Also, I quite like the ifelse approach. I find it very clean.

Yes. What I meant is that your subset approach is complicated. You can use ifelse if you like.

plot(iris$Sepal.Length ~ iris$Sepal.Width, col=ifelse(iris$code, "black",
"red"), pch=ifelse(iris$code, 1, 17))

Regards
Petr




Re: [R] Multiple regression in R

2014-05-30 Thread Rui Barradas

Hello,

lm() is designed to work with data.frames, not with matrices. You can 
change your code to something like


dat <- data.frame(price, pred1 = c(5,6,3,4,5), pred2 = c(2,1,8,5,6))
fit <- lm(price ~ pred1 + pred2, data = dat)

and then use the fitted model to do predictions. You don't have to give 
the new values in a matrix; you can give them as columns of a data.frame.


predict(fit, data.frame(pred1 = 1:3, pred2 = 3:5))
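
For the case of many predictors raised in the original question, one common option (not part of the reply above, so treat it as a sketch) is the `.` shorthand in the formula, which stands for every column of the data frame other than the response. The column names `pred1` and `pred2` follow the reply; with hundreds of columns the same pattern applies as long as the new data uses the same names.

```r
price <- c(10, 18, 18, 11, 17)
predictors <- cbind(pred1 = c(5, 6, 3, 4, 5), pred2 = c(2, 1, 8, 5, 6))

# Put the response and all predictor columns into one data frame
dat <- data.frame(price = price, predictors)

# "." stands for every column of dat other than the response
fit <- lm(price ~ ., data = dat)

# New data must use the same column names as the training data
new <- data.frame(pred1 = 3, pred2 = 5)
pred <- predict(fit, newdata = new)
pred
```

Because the formula refers to named columns of `dat` and not to a matrix in the workspace, `predict()` can match the new values to the right terms, which avoids the "'newdata' had 1 rows" warning described in the question.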


Hope this helps,

Rui Barradas

On 29-05-2014 21:38, Safiye Celik wrote:

I want to perform a multiple regression in R and make predictions based on
the trained model. Below is an example code I am using:

price = c(10,18,18,11,17)
predictors = cbind(c(5,6,3,4,5),c(2,1,8,5,6))
predict(lm(price ~ predictors), data.frame(predictors=matrix(c(3,5),nrow=1)))

So, based on the 2-variate regression model trained by 5 samples, I want
to make a prediction for the test data point where the first variate is 3
and second variate is 5. But I get a warning from above code saying
that 'newdata' had 1 rows but variable(s) found have 5 rows. How can I
correct above code?
Below code works fine where I give the variables separately to the model
formula. But since I will have hundreds of variates, I have to give them in
a matrix since it would be unfeasible to append hundreds of columns using +
sign.

price = c(10,18,18,11,17)
predictor1 = c(5,6,3,4,5)
predictor2 = c(2,1,8,5,6)
predict(lm(price ~ predictor1 + predictor2),
data.frame(predictor1=3,predictor2=5))

  Thanks in advance!





Re: [R] Scatter plot selection points

2014-05-30 Thread Beatriz

Hi Petr,

Initially your code didn't work because 'Code' wasn't in uppercase. It 
works now! :) The only thing is that I wanted the codes > 3.5 in red.


Optional code:
sel <- iris[iris$Sepal.Width > 3.5, "Code"]
plot(iris$Sepal.Length ~ iris$Sepal.Width,
col=ifelse(iris$Code %in% sel, "red", "black"),
pch=ifelse(iris$Code %in% sel, 17, 1))
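
A small sketch of why `%in%` behaves differently from the `==` used in the first attempt. The vectors here are made up; the comparison operators themselves are base R.

```r
code <- 1:10
sel  <- c(3, 4, 8)          # hypothetical selected codes

# `==` compares position by position, recycling the shorter vector,
# so it does not answer "is this code one of the selected ones?"
eq <- suppressWarnings(code == sel)

# `%in%` tests membership and returns one TRUE/FALSE per element of code
isin <- code %in% sel

cols <- ifelse(isin, "red", "black")
which(isin)
```

With `==`, a match only occurs when a selected code happens to sit at the same recycled position, which is why the colours in the original plot call did not follow the selection.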

Cheers



Re: [R] Scatter plot selection points

2014-05-30 Thread PIKAL Petr
Hi

My code worked. Your code did not work because you were not aware that R 
is case-sensitive :)

> -----Original Message-----
> From: Beatriz [mailto:aguitatie...@hotmail.com]
> Sent: Friday, May 30, 2014 12:21 PM
> To: PIKAL Petr; R Help
> Subject: Re: [R] Scatter plot selection points
>
> Hi Petr,
>
> Initially your code didn't work because 'Code' wasn't in uppercase. It
> works now! :) The only thing is that I wanted the codes > 3.5 in red.

If you insist on a separate object for colouring purposes, you can have it.

sel <- iris$Sepal.Width > 3.5
plot(iris$Sepal.Length ~ iris$Sepal.Width, col=c("red","black")[sel+1],
pch=c(17, 1)[sel+1])

I still consider my code simpler and easier to read and understand.

sel is a logical TRUE/FALSE vector, which can be used as 1/0 in calculations.

This
c("red","black")[sel+1]
selects red if sel is FALSE and black if sel is TRUE;
the same applies to the selection of pch.

If you want to change the colouring, just swap red and black:
c("black", "red")[sel+1]
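
The indexing trick can be tried on a tiny made-up vector without plotting anything. This is only a sketch; `width` is hypothetical data, and the colour order is the swapped one, so that TRUE maps to red.

```r
width <- c(3.0, 3.6, 2.9, 4.1)   # made-up sepal widths
sel   <- width > 3.5              # logical vector, one entry per point

# Adding 1 turns FALSE/TRUE into the indices 1/2
idx <- sel + 1

cols <- c("black", "red")[idx]    # red where sel is TRUE
pchs <- c(1, 17)[idx]             # triangle (17) where sel is TRUE
data.frame(width, sel, cols, pchs)
```

The resulting `cols` and `pchs` vectors have one entry per point, which is the form the `col` and `pch` arguments of `plot()` expect.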

Regards
Petr


 Optional code:
 sel - iris[iris$Sepal.Width3.5,Code]
 plot(iris$Sepal.Length ~ iris$Sepal.Width,
 col=ifelse(iris$Code %in% sel, red, black),
 pch=ifelse(iris$Code %in% sel, 17, 1))

 Cheers


 On 30/05/2014 17:38, PIKAL Petr wrote:
  Hi
 
 
  -Original Message-
  From: Beatriz [mailto:aguitatie...@hotmail.com]
  Sent: Friday, May 30, 2014 10:08 AM
  To: PIKAL Petr; R Help
  Subject: Re: [R] Scatter plot selection points
 
  Hi Petr,
 
  Thanks for your email; however, I cannot make the code work.
  Errors? What code did you try?
 
  iris$code <- iris$Sepal.Width > 3.5
  plot(iris$Sepal.Length ~ iris$Sepal.Width,
     col=c("red","black")[iris$code+1], pch=c(17, 1)[iris$code+1])
 
  works for me without any problem.
 
  Also, I quite like the ifelse approach. I find it very clean.
  Yes. What I meant is that your subset approach is complicated. You can
  use ifelse if you like.
 
  plot(iris$Sepal.Length ~ iris$Sepal.Width, col=ifelse(iris$code,
     "black", "red"), pch=ifelse(iris$code, 1, 17))
 
  Regards
  Petr
 
 
  Cheers
 
 
 
  On 30/05/2014 15:57, PIKAL Petr wrote:
  Hi
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Beatriz
  Sent: Friday, May 30, 2014 9:37 AM
  To: R Help
  Subject: [R] Scatter plot selection points
 
  Hi all,
 
  I'd like to do a scatterplot where some of the values, out of a
  subset, are plotted differently in color and shape.
  I've worked around the following code but I don't manage to make
 it
  right.
  Any help greatly appreciated!
 
  # My data
  dd <- iris
  iris$Code <- 1:150
 
  # A selection of my data I'd like to plot differently
  subset <- subset(iris, iris$Sepal.Width > 5)
  max(iris$Sepal.Width)
  [1] 4.4
  No values out of subset. So I changed threshold.
 
  iris$code <- iris$Sepal.Width > 3.5
 
  sel <- as.character(subset$Code) # I think the problems start already here :)
 
  # Plotting doesn't work
  plot(iris$Sepal.Length ~ iris$Sepal.Widith,
  col=ifelse(iris$Code==sel, "red", "black")
  pch=ifelse(iris$Code==sel, 17, 1))
  Overcomplicated
 
  plot(iris$Sepal.Length ~ iris$Sepal.Width, col=c("red",
  "black")[iris$code+1], pch=c(17, 1)[iris$code+1])
 
  Regards
  Petr
 

Re: [R] Smoothed HR for interaction term in coxph model

2014-05-30 Thread Andrews, Chris
Please include example data in the future.  Perhaps the following is useful.

(1) Your model is redundant.  The * produces both main effects and the 
interaction.  So I removed the main effects from your call
(2) For my simulated data, the df=0 option chose a model that resulted in a 
singular fit.  I selected a smoother spline (df=2).
(3) the two plots at the end show (1) the risk (exp(linear predictor)) for 
combinations of CONTINUOUS and DICHOTOMOUS and (2) a ratio (risk for A vs 
risk for B), which I think is what you wanted.

Chris

library(survival)
set.seed(20140530)
nn <- 1000
datanew2 <- data.frame(my.surv = Surv(rexp(nn)),
  DICHOTOMOUS=factor(rep(c("A","B"), nn/2)), CONTINUOUS=rnorm(nn))

#surv.fit <- coxph(my.surv ~ pspline(CONTINUOUS, df=0) + factor(DICHOTOMOUS) +
#  pspline(CONTINUOUS, df=0) * factor(DICHOTOMOUS), data=datanew2)
#surv.fit <- coxph(my.surv ~ pspline(CONTINUOUS, df=0) * factor(DICHOTOMOUS),
#  data=datanew2)
surv.fit <- coxph(my.surv ~ pspline(CONTINUOUS, df=2) * factor(DICHOTOMOUS),
  data=datanew2)
surv.fit

xseq <- seq(-3, 3, length=100)
predictions <- matrix(predict(surv.fit, newdata=expand.grid(CONTINUOUS=xseq,
  DICHOTOMOUS=factor(c("A","B"))), type="risk"), ncol=2)
matplot(predictions, type="l")
plot(xseq, predictions[,1]/predictions[,2], type="l",
  ylab="Hazard Ratio of Event (A vs B)", xlab="CONTINUOUS")


-Original Message-
From: Lynn Dunsire [mailto:l...@contrastconsultancy.com] 
Sent: Thursday, May 29, 2014 6:03 AM
To: r-help@r-project.org
Subject: [R] Smoothed HR for interaction term in coxph model

Hello R-help members,

 

I have a dataset with 2 treatments and want to assess the effect of a
continuous covariate on the hazard ratio between treatment A and B.  I want a
smoothed interaction term which I have modelled below with the following
code:

 

surv.fit <- coxph(my.surv ~ pspline(CONTINUOUS, df=0) + factor(DICHOTOMOUS)
+  pspline(CONTINUOUS, df=0)*factor(DICHOTOMOUS), data = datanew2)

 

and consequently I would like to obtain a smoothed plot of the hazard ratio
between treatment A and B on the y-axis with the continuous covariate on the
x-axis.  As termplot ignores interaction terms, I was wondering if anyone
has seen anything like this before and can advise on the best way to do it.

 

Many thanks in advance for any help that you can offer,

 

Kind regards,

 

Lynn 




[R] uGARCHspec function

2014-05-30 Thread Drew M
Hello,

I am trying to re-estimate parameters and standard errors in a mean regression 
equation by simultaneously running a GARCH (1,1) variance equation. This should 
be relatively straightforward, but I cannot for the life of me get it to work. 
This has taken up several weeks of my life already. 

My mean equation is this:
dlm2A1A <- lm(A1.0-rSP_RI.0~rSP_RI.0+inter.0+interdev.1+inter.1+interdev.2+inter.2+interdev.3+inter.3+interdev.4


I'm fairly sure I need to use the uGARCHspec function and have adopted this:

ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1),
submodel = NULL, external.regressors = NULL, variance.targeting = FALSE),
mean.model = list(armaOrder = c(1, 1), include.mean = TRUE, archm = TRUE,
archpow = 1, arfima = FALSE, external.regressors = NULL, archex = FALSE),
distribution.model = "norm", start.pars = list(), fixed.pars = list())

How to correctly input the mean equation in the 'external.regressors=' 
parameter is beyond me. Also, I don't know where my actual data fits in here as 
this is just a framework without any identification of the data. 


Can anyone help?
Drew



[R] Difference in coefficients in Cox proportional hazard estimates between R and Stata, why?

2014-05-30 Thread Hiyoshi, Ayako
Dear R users,



Hi, thank you so much for your help in advance.

I have been using Stata but am new to R. For my paper revision using Aalen's 
survival analysis, I need to use R, as commands for Aalen's survival analysis 
seem to be available in R (32-bit, version 3.1.0 (2014-04-10)) but are less 
readily available in Stata (version 13/SE).



To make sure that I can do basics, I have fitted logistic regression and Cox 
proportional hazard regression using R and Stata.



The data I used were from the UCLA R textbook example page: 
http://www.ats.ucla.edu/stat/r/examples/asa/asa_ch1_r.htm. I used this in Stata 
too.



When I fitted logistic regression as below, the estimates were exactly the same 
between R and Stata.



Example using logistic regression

R:



logistic1 <- glm(censor ~ age + drug, data=, family = binomial)

summary(logistic1)

exp(cbind(OR=coef(logistic1), confint(logistic1)))

                   OR      2.5 %    97.5 %
(Intercept) 1.0373731 0.06358296 16.797896
age         1.0436805 0.96801933  1.131233
drug        0.7192149 0.26042635  1.937502



Stata:



logistic censor age i.drug

      |       OR   CI_lower   CI_upper
  age | 1.043681   .9662388   1.127329
 drug |  .719215   .2665194   1.940835
_cons | 1.037373    .065847    16.3431



However, when I fitted Cox proportional hazard regression, there were some 
discrepancies in the coefficients (and exponentiated hazard ratios).



Example using Cox proportional hazard regression

R:



cox1 <- coxph(Surv(time, censor) ~ drug + age, data=)
summary(cox1)

Call:
coxph(formula = Surv(time, censor) ~ drug + age, data = )

  n= 100, number of events= 80

        coef exp(coef) se(coef)     z Pr(>|z|)
drug 1.01670   2.76405  0.25622 3.968 7.24e-05 ***
age  0.09714   1.10202  0.01864 5.211 1.87e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     exp(coef) exp(-coef) lower .95 upper .95
drug     2.764     0.3618     1.673     4.567
age      1.102     0.9074     1.062     1.143

Concordance= 0.711  (se = 0.042 )
Rsquare= 0.324   (max possible= 0.997 )
Likelihood ratio test= 39.13  on 2 df,   p=3.182e-09
Wald test            = 36.13  on 2 df,   p=1.431e-08
Score (logrank) test = 38.39  on 2 df,   p=4.602e-09

Stata:

stset time, f(censor)
stcox drug age
------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |   2.563531   .6550089     3.68   0.000      1.55363    4.229893
         age |   1.095852     .02026     4.95   0.000     1.056854    1.136289
------------------------------------------------------------------------------




The HR estimate for drug was 2.76 from R, but 2.56 from Stata.

I searched the internet for an explanation, but could not find any.



In parametric survival regression with an exponential distribution, R's and 
Stata's coefficients had opposite signs while the absolute values were exactly 
the same (i.e. say 0.08 for Stata and -0.08 for R). I suspected something like 
this (http://www.theanalysisfactor.com/ordinal-logistic-regression-mystery/) 
going on, but for Cox proportional hazard regression, I could not find any 
resource helping me.



I would highly appreciate it if anyone could explain this for me, or suggest a 
resource that I can read.



Thank you so much for your help.



Best,

Ayako




Re: [R] Problem with sample session

2014-05-30 Thread Ista Zahn
On May 29, 2014 9:45 PM, Stephen Meskin actu...@umbc.edu wrote:

 Thanks Greg for your response. Is there a work around?

A work around for what?


 Of course this begs the question as to Why is attach part of the sample
 session in App. A of the introductory manual?

Because people find it convenient.

 All the commands are directly from App. A. Is it possible the configuration
 of R on my computer is not in accord with acceptable practice? I.e. Could
 my configuration be set so that attach works as App. A intends?

As far as I can see attach is working exactly as intended. What is the
perceived problem?

 If not then App. A needs to be changed to replace attach.
 If so, then App. A needs to provide instructing on appropriate
 configuration of R for newbies.

I personally agree that the demonstration of the attach function should be
removed from the manual, but you've stated the case much too strongly. No
configuration is required, and the attach example is working as intended.

Best,
Ista

 Stephen Meskin
 Sent from my iPad

  On May 29, 2014, at 1:06 PM, Greg Snow 538...@gmail.com wrote:
 
  This is a warning and in your case would not be a problem, but it is
  good to think about and the reason why it is suggested that you avoid
  using attach and be very careful when you do use attach.  What is
  happening is that you first created a vector named 'x' in your global
  workspace, you then create a data frame that contains a column that is
  a copy of 'x' that is also named 'x' and the data frame also has
  another column named 'y'.  You then later attach the data frame to the
  search list (if you run the 'search()' command you will see your
  search list).  This is convenient in that you can now access 'y' by
  typing its name instead of something like 'dummy$y', but what happens
  if you just type x?  The issue is that there are 2 objects on your
  search path with that same name.  For your example it will not matter
  much because they have the same value, but what if you run a command
  like 'x <- 3', now you will see a single value instead of a vector of
  length 20 which can lead to hard to find errors.  This is why R tries
  to be helpful by warning you that there are multiple objects named 'x'
  and therefore you may not be accessing the one that you think.  If you
  use attach without being careful it is possible to plot (or regress or
  ...) one variable from one dataset against another variable from a
  completely unrelated dataset and end up with meaningless results.  So,
  if you use attach, be careful.  You may also want to look at the
  following functions for help with dealing with these issues: conflicts,
  find, get, with, within
 
  On Wed, May 28, 2014 at 11:55 PM, Stephen Meskin actu...@umbc.edu
wrote:
  While following the suggestion in the manual An Introduction to R to
  begin with Appendix A, I ran into the problem shown below about 3/4 of
  the way down the 1st page of App. A.
 
  After using the function /attach/, I did not get visible columns in the
  data frame as indicated but the rather puzzling message emphasized
below.
 
  I am running R version 3.1.0 (2014-04-10) using Windows XP.  Thanks in
  advance for your help.
 
  x <- 1:20
  w <- 1 + sqrt(x)/2
  dummy <- data.frame(x=x, y=x+rnorm(x)*w)
  dummy
       x         y
  1    1  2.885347
  ...
  fm <- lm(y ~ x, data=dummy)
  summary(fm)
 
  Call:
  ...
  fm1 <- lm(y ~ x, data=dummy, weight=1/w^2)
  summary(fm1)
 
  Call:
  ...
  attach(dummy)
  The following object is masked _by_ .GlobalEnv:
 
      x
 
  --
  /Stephen A Meskin/, PhD, FSA, MAAA
  Adjunct Assistant Professor of Mathematics, UMBC
 
 
 
 
  --
  Gregory (Greg) L. Snow Ph.D.
  538...@gmail.com
 



[R] using model coefficients on a new data set

2014-05-30 Thread Erin Hodgess
Hello again R People:

I fit an ARIMA model on a particular data set with x_1, ,x_n.  Point
x_(n+1) becomes available.  I now want to produce a forecast without
updating the model.  Is there a way to do that within R, or do I need to
write my own function, please?
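
One possible approach, offered as a hedged sketch and not taken from this thread: stats::arima() accepts a fixed argument, so the series extended by the new point can be passed through the model again with every coefficient held at its previously estimated value. The AR(1) order and the simulated series below are assumptions for illustration only. As far as I can tell, the innovation variance is still recomputed from the extended data.

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.6), n = 200)  # simulated stand-in data

fit <- arima(x[1:199], order = c(1, 0, 0))       # model estimated on x_1..x_n

# Re-run the model on the extended series with all coefficients held fixed,
# so no coefficient is re-estimated.
refit <- arima(x, order = c(1, 0, 0), fixed = coef(fit), transform.pars = FALSE)

fc <- predict(refit, n.ahead = 1)                # one-step-ahead forecast
fc$pred
```

The forecast package offers a similar shortcut through the model argument of its Arima function, which may be more convenient if that package is already in use.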

Thanks!

Sincerely,
erin


-- 
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



[R] rJava fail

2014-05-30 Thread Bond, Stephen

R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)

> library(rJava)
Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dirname(this$RuntimeLib)
  error: a character vector argument expected
Error: package or namespace load failed for 'rJava'

Things used to work on R 3.0.1 but suddenly stopped. I installed the new R and 
new packages, then started downgrading Java. I went from Java 7 to Java 6 update 
16 and still had no luck. Please advise which Java I need and whether any paths 
need to be modified.
Thank you.

Stephen B



[R] Computer requirements to run R on huge datasets

2014-05-30 Thread Magdalena Kapelko
Dear R users,

I am writing to ask your advice with regard to the computer requirements
(RAM, architecture, processor, hard drive) in order to run R smoothly on
large datasets.

I will be running commands with many bootstrap replications (2000) on the
datasets of 10 firms.

Thank you in advance for your suggestions.
Best regards,
Magdalena



Re: [R] Computer requirements to run R on huge datasets

2014-05-30 Thread Jeff Newmiller
You have given information related to the number of rows that will be involved, 
but have offered nothing about the number of columns. That is okay though...  
you should attempt your algorithms on progressively larger datasets to gauge 
how your problem scales and use your operating system to observe how much 
memory is involved and extrapolate. You can also rent time on cloud servers 
such as Amazon offers.
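
The scaling experiment suggested above could look roughly like the following. This is only a sketch: boot_mean is a made-up stand-in for the real bootstrap analysis, and the sizes and replication count are arbitrary assumptions.

```r
# Hypothetical stand-in for the real analysis: bootstrap the mean of one column.
boot_mean <- function(d, reps = 100) {
  replicate(reps, mean(d$x[sample.int(nrow(d), replace = TRUE)]))
}

sizes <- c(1e3, 1e4, 1e5)
timing <- sapply(sizes, function(n) {
  d <- data.frame(x = rnorm(n), y = rnorm(n))
  c(rows    = n,
    mb      = as.numeric(object.size(d)) / 1024^2,     # size of the data in memory
    seconds = system.time(boot_mean(d))[["elapsed"]])  # wall-clock time
})
t(timing)   # one row per dataset size
```

Plotting mb and seconds against rows gives a rough basis for extrapolating to the full problem; object.size reports only the data itself, so the operating system's view of the R process remains the better guide to peak memory.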

Any minimum number we tell you could turn out to be insufficient when you start 
exploring your large data sets... it is better for you to make your own 
estimate and safety margin so you don't blame us when it turns out to run 
slowly or choke completely.

Also, please stop posting in HTML format as requested by the Posting Guide.
---
Jeff Newmiller                     The .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.us       Basics: ##.#.   ##.#.  Live Go...
                                   Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries         O.O#.   #.O#.  with
/Software/Embedded Controllers)            .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.



Re: [R] Difference in coefficients in Cox proportional hazard estimates between R and Stata, why?

2014-05-30 Thread Göran Broström
In the Cox regression case, the probable explanation is that you have 
ties in your data; Stata and coxph may have different defaults for 
handling ties. Read the manuals!
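
To make the point about ties concrete: coxph uses the Efron approximation by default, while Stata's stcox uses the Breslow approximation by default. The sketch below uses simulated data (an assumption, not the data from the original post) with deliberately rounded times so that many events are tied, and fits the same model both ways.

```r
library(survival)

# Simulated stand-in data; rounding the times up creates many tied events.
set.seed(42)
n    <- 200
drug <- rbinom(n, 1, 0.5)
d    <- data.frame(time   = ceiling(5 * rexp(n, rate = exp(0.7 * drug))),
                   censor = 1,
                   drug   = drug)

fit_efron   <- coxph(Surv(time, censor) ~ drug, data = d)                   # coxph default
fit_breslow <- coxph(Surv(time, censor) ~ drug, data = d, ties = "breslow")

# Hazard ratios under the two tie-handling methods
exp(c(efron = unname(coef(fit_efron)), breslow = unname(coef(fit_breslow))))
```

Refitting the R model with ties = "breslow" (or, on the Stata side, adding the efron option to stcox) should show whether tie handling accounts for the discrepancy.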


The difference in sign in the other cases is simply due to different 
definitions of the models. I am sure it is well documented in relevant 
manuals.


Göran



[R] accessing C code from a base package within a function

2014-05-30 Thread Erin Hodgess
Hello yet again.

I have written a small function which calls a couple of the C programs from
the stats base package.  It's actually a modification of the arima function.

However, when I try to run it, it says that the C program is not found.

Any suggestions would be much appreciated.

Windows 7, R version 3.0.2

Thanks,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] rJava fail

2014-05-30 Thread Simon Urbanek
On May 30, 2014, at 9:55 AM, Bond, Stephen stephen.b...@cibc.com wrote:

 
 R version 3.1.0 (2014-04-10) -- Spring Dance
 Copyright (C) 2014 The R Foundation for Statistical Computing
 Platform: i386-w64-mingw32/i386 (32-bit)
 
 library(rJava)
 Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dirname(this$RuntimeLib)
  error: a character vector argument expected
 Error: package or namespace load failed for 'rJava'
 
 Things used to work on R 3.0.1 but suddenly stopped. I installed the new R 
 and new packages. Then started downgrading Java. Went from Java 7 to Java 6 
 update 16 and still no luck. Please, advise which Java I need and if any 
 paths need to be modified.


Please make sure that your Java architecture matches your R architecture and 
then re-install the matching Java (i.e. both have to be 32-bit or both have to 
be 64-bit - you cannot mix/match). It seems that there is a problem with your 
Java registry entries. The version is irrelevant - any Java version 1.4 or 
higher should work.
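
A quick way to see which architecture the running R session has, so it can be compared with the installed Java, is sketched below. It uses only base R; the JAVA_HOME check is an assumption about one common setup and says nothing about the Windows registry entries mentioned above.

```r
# Report whether this R session is 32-bit or 64-bit.
r_bits <- 8 * .Machine$sizeof.pointer   # 32 or 64
cat("R architecture:", R.version$arch, "(", r_bits, "-bit )\n")

# JAVA_HOME is only one place a Java location can come from; an unset value
# is not necessarily an error.
cat("JAVA_HOME:", Sys.getenv("JAVA_HOME", unset = "<not set>"), "\n")
```

Running java -version at a command prompt normally indicates whether the Java found on the PATH is a 64-bit build.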

Please direct further questions to the stats-rosuda-devel mailing list for 
rJava.

Thanks,
Simon



Re: [R] Problem with sample session

2014-05-30 Thread Greg Snow
If you pay attention and are careful not to use any variable names
that conflict then you do not need a work around (and the conflicts
function can help you see if there are any conflicts that you may need
to worry about).

Probably the best work around is to use the with or within function
instead of attaching.  For a couple of quick commands these work great
and I prefer them to using attach.  But, sometimes for a long sequence
of commands attach is much more convenient and is fine to use as long
as you recognize the potential dangers and are careful.
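
As a small illustration of the with alternative, here is a sketch that reuses the objects from the sample session in Appendix A; the name nrow_used is made up for the example.

```r
x <- 1:20
w <- 1 + sqrt(x) / 2
dummy <- data.frame(x = x, y = x + rnorm(x) * w)

# with() evaluates the expression using the columns of 'dummy',
# without adding anything to the search path.
fm <- with(dummy, lm(y ~ x))

x <- 3                                # changing the global 'x' ...
nrow_used <- with(dummy, length(x))   # ... does not affect dummy$x (still 20)

# If a data frame has been attached, conflicts() lists the masked names.
conflicts(detail = TRUE)
```

Because with() looks in the data frame first, the global x never gets in the way, which is exactly the situation the attach message warns about.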

On Thu, May 29, 2014 at 3:56 PM, Stephen Meskin actu...@umbc.edu wrote:
 Thanks Greg for your response. Is there a work around?

 Of course this begs the question as to Why is attach part of the sample 
 session in App. A of the introductory manual? All the commands are directly 
 from App. A. Is it possible the configuration of R on my computer is not in 
 accord with acceptable practice? I.e. Could my configuration be set so that 
 attach works as App. A intends?
 If not then App. A needs to be changed to replace attach.
 If so, then App. A needs to provide instructing on appropriate configuration 
 of R for newbies.

 Stephen Meskin
 Sent from my iPad




-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



[R] Gui for R-Script

2014-05-30 Thread Shane Carey
Hi,

I'm just looking into creating a GUI for my R-Script. Is it possible to
create a gui for the script and send it to somebody? Maybe as a .exe for
example

Thanks


-- 
Shane



Re: [R] Gui for R-Script

2014-05-30 Thread Greg Snow
There are several options for creating GUIs depending on how much
control you want and how much work you are willing to put in.

One simple option is the tkexamp function in the TeachingDemos
package.  This approach would require whoever receives your script to
have R running, but then they could just run your script and have a
GUI to change parameters and see the results.

Another option is shiny from the Rstudio developers.  This has 2
options, you can either send a script that the user would then run on
their own machine (would need R and packages installed) or you can set
up a server and send the user the URL, R would only be installed on
the server and the user would only need web access.

I don't know of any options that would produce an .exe, since any R
code does need access to an implementation of R and it seems a bit of
overkill to package all of R with each sample script that you want to
run.
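
To make the shiny route concrete, here is a minimal sketch of a script the recipient could run on their own machine. It assumes the shiny package is installed; the slider, the histogram, the helper name bin_breaks and the built-in faithful data set are illustrative choices only and have nothing to do with Shane's actual script.

```r
# Plain function holding the computation, so it can be checked without a GUI
bin_breaks <- function(x, bins) seq(min(x), max(x), length.out = bins + 1)

if (requireNamespace("shiny", quietly = TRUE)) {
  library(shiny)

  # One slider for the parameter, one plot for the result
  ui <- fluidPage(
    sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
    plotOutput("hist")
  )

  server <- function(input, output) {
    output$hist <- renderPlot({
      x <- faithful$waiting
      hist(x, breaks = bin_breaks(x, input$bins), main = "Waiting times")
    })
  }

  app <- shinyApp(ui = ui, server = server)
  # runApp(app)  # uncomment in an interactive session to open the app
}
```

Keeping the computation in an ordinary function means the same code can be run from the console, with the GUI only supplying the parameter values.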

On Fri, May 30, 2014 at 10:48 AM, Shane Carey careys...@gmail.com wrote:
 Hi,

 I'm just looking into creating a GUI for my R-Script. Is it possible to
 create a gui for the script and send it to somebody? Maybe as a .exe for
 example

 Thanks


 --
 Shane




-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] Gui for R-Script

2014-05-30 Thread Jeff Newmiller
Possible, yes, anything is possible, but if your goal is to easily hide R from 
the users then you will probably not find the project worth the effort required.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On May 30, 2014 9:48:46 AM PDT, Shane Carey careys...@gmail.com wrote:
Hi,

I'm just looking into creating a GUI for my R-Script. Is it possible to
create a gui for the script and send it to somebody? Maybe as a .exe
for
example

Thanks



[R] missing return

2014-05-30 Thread hieu
Hi R users, I have a problem with missing data when I calculate returns in a
time series. My data set contains many stock codes, which I have arranged as
panel data. I have not been able to find a solution.

For example:
CODE DATE RETURN
A   2008   NA
A   2009   0.25
A   2010   0.4
A   2011   0.3
B   2008   NA
B   2009   0.35
B   2010   0.15
B   2011   0.20
Please give me some ideas for solving this problem so that I can start step 2:
running the time series analysis to demonstrate the market's efficiency.
Thanks!




--
View this message in context: 
http://r.789695.n4.nabble.com/missing-return-tp4691489.html
Sent from the R help mailing list archive at Nabble.com.



[R] EpiX Analytics - Quantitative Risk Analysis with R Course

2014-05-30 Thread Barbara ONeill
There are still a few places available to attend the following course:

Quantitative Risk Analysis with R
Fort Collins, Colorado, USA
June 16-19, 2014

This 4-day course will cover the core principles of quantitative risk
analysis and the most important risk modeling principles and techniques. The
course will be taught using the R statistical language but the lessons apply
equally well to other modeling environments. The focus of the course is on
how to conduct accurate and effective quantitative risk analyses, including
best practices of risk modeling, selecting the appropriate distribution,
using data and expert opinion, and avoiding common mistakes. The course will
also cover essential probability and statistics theory and various
stochastic processes to provide the participants with a solid understanding
of quantitative risk analysis.
For additional information and to register please visit our website at:
http://www.epixanalytics.com/Quantitative-Risk-Analysis-with-R.html

To register by phone or for any questions please contact: 
Barbara O'Neill - bone...@epixanalytics.com
Ph: 1-303-440-8524

EpiX Analytics 
1643 Spruce Street, Boulder, CO, 80302, USA 
www.EpiXAnalytics.com|bone...@epixanalytics.com



[R] RODBC and PosgreSQL problems

2014-05-30 Thread Fraser D. Neiman

Dear All,

I am trying for the first time to run SQL queries against a remote PostgreSQL 
database via RODBC. I am able to establish a connection just fine, as shown by 
getting results back from the sqlTables(), sqlColumns() and sqlPrimaryKeys() 
functions in RODBC. However, when I try to run a SQL query using the sqlQuery() 
function I get

[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while 
executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n 
FROM tblCeramicWare \n '"

What am I doing wrong?

Here are the relevant snips from the R console.  What's puzzling is that 
tblCeramicWare is recognized as an argument to sqlColumns() and 
sqlPrimaryKeys(), but NOT in sqlQuery().

Thanks for any pointers.

best, Fraser

 library(RODBC)

 # connect to DAACS and assign a name (DRCch) to the connection
 DRCch <- odbcConnect("postgreSQL35W", case = "nochange", uid = "XX",
                      pwd = "XX")

 # list the tables that are available
 sqlTables(DRCch, tableType = "TABLE")
   TABLE_QUALIFIER TABLE_OWNER        TABLE_NAME TABLE_TYPE REMARKS
1 daacs-production      public      TempSTPTable      TABLE
2 daacs-production      public        activities      TABLE
3 daacs-production      public          articles      TABLE
4 daacs-production      public schema_migrations      TABLE
5 daacs-production      public     tblACDistance      TABLE
6 daacs-production      public    tblArtifactBox      TABLE
7 daacs-production      public  tblArtifactImage      TABLE
8 daacs-production      public     tblBasicColor      TABLE
9 daacs-production      public           tblBead      TABLE


 sqlColumns(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME 
PRECISION LENGTH SCALE RADIX NULLABLE
1 daacs-production  public tblCeramicWare  WareID 4  int4   
 10  4 0100
2 daacs-production  public tblCeramicWareWare-9   varchar   
 50100NANA1
  REMARKS COLUMN_DEF SQL_DATA_TYPE SQL_DATETIME_SUB 
CHAR_OCTET_LENGTH ORDINAL_POSITION
1 nextval('global_id_seq'::regclass) 4   NA 
   -11
2   NA-9   NA 
  1002
  IS_NULLABLE DISPLAY_SIZE FIELD_TYPE AUTO_INCREMENT PHYSICAL NUMBER TABLE OID 
BASE TYPEID TYPMOD
1NA   11 23  1   1 27441  
 0 -1
2NA   50   1043  0   2 27441  
 0 50
 sqlPrimaryKeys(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ 
PK_NAME
1 daacs-production  public tblCeramicWare  WareID   1 
tblCeramicWare_pkey

 sqlQuery(DRCch, paste("
+  SELECT *
+  FROM tblCeramicWare
+  "))
[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while 
executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n 
FROM tblCeramicWare \n '"
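
A likely explanation, offered as a sketch rather than a fix tested against this database: PostgreSQL folds unquoted identifiers to lower case, which matches the lower-cased "tblceramicware" in the error message, so a mixed-case table name has to be wrapped in double quotes inside the SQL text. The helper name quote_ident below is an assumption made for illustration; it is not an RODBC function.

```r
# Wrap an identifier in double quotes so PostgreSQL keeps its case;
# any embedded double quote is doubled, as SQL requires
quote_ident <- function(name) {
  paste0('"', gsub('"', '""', name, fixed = TRUE), '"')
}

query <- paste("SELECT * FROM", quote_ident("tblCeramicWare"))

# With the connection opened above, the call would then be:
# result <- sqlQuery(DRCch, query)
```

sqlColumns() and sqlPrimaryKeys() take the table name as an R string and look it up in the catalog, which would explain why they succeed where the hand-written SQL does not.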




Fraser D. Neiman
Department of Archaeology, Monticello
(434) 984 9812




Re: [R] missing return

2014-05-30 Thread hieu




--
View this message in context: 
http://r.789695.n4.nabble.com/missing-return-tp4691489p4691490.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] missing return

2014-05-30 Thread hieu




--
View this message in context: 
http://r.789695.n4.nabble.com/missing-return-tp4691489p4691493.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Problem with sample session

2014-05-30 Thread Stephen Meskin
Greg, Ista, (or anyone else),
Let me take one last run at this problem.

Consider the following extract from the Appendix A text:

  x <- 1:20
  w <- 1+sqrt(x)/2
  dummy <- data.frame(x=x, y=x+rnorm(x)*w)
  fm <- lm(y~x, data=dummy)
  fm1 <- lm(y~x, data=dummy, weight=1/w^2)

  attach(dummy)
  "Make the columns in the data frame visible as variables."

  The following object is masked _by_ .GlobalEnv:

      x

In the above I have included only one comment, "Make the columns ... 
visible as variables." from the text and
only one response, "The following object is masked by ... : x".

Stuff I don't understand:

 1. The purpose of attach seems to be to make x and y visible but I
can already see them by entering the command dummy even after the
warning. So what does attach do?
 2. What I would like to see is a table with 1st column x; 2nd column y;
3rd and 4th columns predicted ys from fm and fm1; plus possibly
columns of residuals and other stuff. Such tables don't seem to be
available according to the discussion in ?lm.
 3. The warning about attach seems to say that there is an x in the
Global Environment that will mask the x that I am using. But that
is not happening in what I see. If I enter  x after the warning,
I still get 1,2,3, as before. What is the problem?
 4. If I place the above R-script in a folder other than the R-console
that comes up when I first open R will that obviate the attach
problem.


Stephen A Meskin, PhD, FSA, MAAA
Adjunct Assistant Professor of Mathematics, UMBC

Most people give you an anticipatory grin when you mention a statistic,
frown doubtingly when you mention the plural statistics, and
grunt and groan in a gurgle when you mention a statistics course.

On 5/30/2014 12:20 PM, Greg Snow wrote:
 If you pay attention and are careful not to use any variables names
 that conflict then you do not need a work around (and the conflicts
 function can help you see if there are any conflicts that you may need
 to worry about).

 Probably the best work around is to use the with or within function
 instead of attaching.  For a couple of quick commands these work great
 and I prefer them to using attach.  But, sometimes for a long sequence
 of commands attach is much more convenient and is fine to use as long
 as you recognize the potential dangers and are careful.

 On Thu, May 29, 2014 at 3:56 PM, Stephen Meskin actu...@umbc.edu wrote:
 Thanks Greg for your response. Is there a work around?

 Of course this begs the question as to Why is attach part of the sample 
 session in App. A of the introductory manual? All the commands are directly 
 from App. A. Is it possible the configuration of R on my computer is not in 
 accord with acceptable practice? I.e. Could my configuration be set so that 
 attach works as App. A intends?
 If not then App. A needs to be changed to replace attach.
 If so, then App. A needs to provide instructing on appropriate configuration 
 of R for newbies.

 Stephen Meskin
 Sent from my iPad

 On May 29, 2014, at 1:06 PM, Greg Snow 538...@gmail.com wrote:

 This is a warning and in your case would not be a problem, but it is
 good to think about and the reason why it is suggested that you avoid
 using attach and be very careful when you do use attach.  What is
 happening is that you first created a vector named 'x' in your global
 workspace, you then create a data frame that contains a column that is
 a copy of 'x' that is also named 'x' and the data frame also has
 another column named 'y'.  You then later attach the data frame to the
 search list (if you run the 'search()' command you will see your
 search list).  This is convenient in that you can now access 'y' by
 typing its name instead of something like 'dummy$y', but what happens
 if you just type x?  The issue is that there are 2 objects on your
 search path with that same name.  For your example it will not matter
 much because they have the same value, but what if you run a command
 like 'x <- 3', now you will see a single value instead of a vector of
 length 20 which can lead to hard to find errors.  This is why R tries
 to be helpful by warning you that there are multiple objects named 'x'
 and therefore you may not be accessing the one that you think.  If you
 use attach without being careful it is possible to plot (or regress or
 ...) one variable from one dataset against another variable from a
 completely unrelated dataset and end up with meaningless results.  So,
 if you use attach, be careful.  You may also want to look at the
 following functions for help with dealing with these issues: conflicts,
 find, get, with, within

 On Wed, May 28, 2014 at 11:55 PM, Stephen Meskin actu...@umbc.edu wrote:
 While following the suggestion in the manual An Introduction to R to
 begin with Appendix A, I ran into the problem shown below about 3/4 of
 the way down the 1st page of App. A.

 After using the function /attach/, I did not get visible columns in the
 data frame as 

Re: [R] Problem with sample session

2014-05-30 Thread Ista Zahn
Hi Stephen,

See in line.

On Fri, May 30, 2014 at 4:18 PM, Stephen Meskin actu...@umbc.edu wrote:
 Greg, Ista, (or anyone else),
 Let me take one last run at this problem.

 Consider the following extract from the Appendix A text:

 x <- 1:20
 w <- 1+sqrt(x)/2
 dummy <- data.frame(x=x, y=x+rnorm(x)*w)
 fm <- lm(y~x, data=dummy)
 fm1 <- lm(y~x, data=dummy, weight=1/w^2)

attach(dummy)
 Make the columns in the data frame visible as variables.

 The following object is masked_by_.GlobalEnv:

 x


 In the above I have included only one comment, Make the columns ... visible
 as variables. from the text and
 only one response, The following object is masked by ... : x.

 Stuff I don't understand:

 The purpose of attach seems to be to make x and y visible but I can
 already see them by entering the command dummy even after the warning. So
 what does attach do?

It makes them visible in the sense that you can refer to them
without referring to dummy: try

rm(list=ls()) ## delete everything from your workspace

dummy <- data.frame(x=1:20) # data.frame containing x
dummy$w <- 1+sqrt(dummy$x)/2 # add w column to dummy
dummy$y <- dummy$x + rnorm(dummy$x) * dummy$w # add y column

# x is not available
x #Error: object 'x' not found
#...except as an element of dummy
dummy$x

attach(dummy)
#   Make the columns in the data frame visible as variables.
x # x is now available as x, as well as dummy$x


 What I would like to see is a table with 1st column x; 2nd column y; 3rd and
 4th columns predicted ys from fm and fm1; plus possibly columns of residuals
 and other stuff. Such tables don't seem to be available according to the
 discussion in ?lm.

In R it is common to calculate things only as you need them. The
predicted and residual values are not calculated by lm, but after the
fact by predict.lm() and residuals.lm(). For example:

fm <- lm(y~x, data=dummy)
fm1 <- lm(y~x, data=dummy, weight=1/w^2)

dummy$yhat.fm <- predict(fm)
dummy$yhat.fm1 <- predict(fm1)
dummy$yresid <- residuals(fm1)
dummy


 The warning about attach seems to say that there is an x in the Global
 Environment that will mask the x that I am using. But that is not
 happening in what I see.

Yes it is.

 If I enter  x after the warning, I still get
 1,2,3, as before. What is the problem?

That 1,2,3 ... you are seeing comes from the x that was defined earlier, by

x <- 1:20

_not_ from the x in dummy nor the x attached from dummy. Try this:

rm(list=ls())
x <- 1:5
w <- 1+sqrt(x)/2
dummy <- data.frame(x=x, y=x+rnorm(x)*w)
attach(dummy)

You now have three x variables: the one created with x <- 1:5 (in
the global environment), the one in dummy, and the attached one copied
from dummy. This makes it easy to become confused about which x you
are getting. Consider:

x <- 1 ## changes x in the global workspace, but not dummy$x nor the
       ## attached copy of dummy$x
dummy$x <- 2 # changes dummy$x but not the attached copy of dummy$x

x # this is the x in the global environment
# [1] 1

rm(x)
x # this is the attached copy of x
#[1] 1 2 3 4 5
dummy$x # this is the x in dummy
# [1] 2 2 2 2 2
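
For comparison, here is a small sketch of the attach-free route Greg mentioned, using with() and within(). The fixed seed and the names mean_y, dummy2 and ratio are assumptions made for the illustration; nothing is placed on the search path, so there is only one x to keep track of.

```r
set.seed(1)  # assumption: fixed seed so the sketch is repeatable

dummy <- data.frame(x = 1:20)
dummy$w <- 1 + sqrt(dummy$x) / 2
dummy$y <- dummy$x + rnorm(20) * dummy$w

# with() evaluates the expression using the columns of dummy,
# without attaching anything
mean_y <- with(dummy, mean(y))

# within() returns a modified copy of the data frame; dummy itself is unchanged
dummy2 <- within(dummy, ratio <- y / x)

# dummy never appears on the search path
search()
```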


 If I place the above R-script in a folder other than the R-console that
 comes up when I first open R will that obviate the attach problem.

I'm not following this one...

Best,
Ista


 Stephen A Meskin, PhD, FSA, MAAA
 Adjunct Assistant Professor of Mathematics, UMBC

 Most people give you an anticipatory grin when you mention a statistic,
 frown doubtingly when you mention the plural statistics, and
 grunt and groan in a gurgle when you mention a statistics course.
 On 5/30/2014 12:20 PM, Greg Snow wrote:

 If you pay attention and are careful not to use any variables names
 that conflict then you do not need a work around (and the conflicts
 function can help you see if there are any conflicts that you may need
 to worry about).

 Probably the best work around is to use the with or within function
 instead of attaching.  For a couple of quick commands these work great
 and I prefer them to using attach.  But, sometimes for a long sequence
 of commands attach is much more convenient and is fine to use as long
 as you recognize the potential dangers and are careful.

 On Thu, May 29, 2014 at 3:56 PM, Stephen Meskin actu...@umbc.edu wrote:

 Thanks Greg for your response. Is there a work around?

 Of course this begs the question as to Why is attach part of the sample
 session in App. A of the introductory manual? All the commands are directly
 from App. A. Is it possible the configuration of R on my computer is not in
 accord with acceptable practice? I.e. Could my configuration be set so that
 attach works as App. A intends?
 If not then App. A needs to be changed to replace attach.
 If so, then App. A needs to provide instructing on appropriate configuration
 of R for newbies.

 Stephen Meskin
 Sent from my iPad

 On May 29, 2014, at 1:06 PM, Greg Snow 538...@gmail.com wrote:

 This is a warning and in your case would not be a problem, but it is
 good to think about and the 

[R] Calculating the focal mean of a raster using an annulus

2014-05-30 Thread Parks, Sean -FS
Hello esteemed R experts,

I am attempting to use the 'focal' function in the raster package to calculate 
the mean of an annulus (as opposed to a focal mean of a circle or square).

From what I can tell, this requires me to generate a weights matrix and use 
this matrix in the focal function.

The problem, however, is that there are edge effects because the weights do not 
get readjusted along the boundary/edge of my raster. Therefore, the focal mean 
of the annulus near the boundary of my raster is lower than would be expected.

Code is below that repeats the problem. Please help if you can.

Thank you,
Sean Parks


##
##

library(raster)
the.raster <- raster(matrix(round(rep(seq(1,5, 0.0001), 250)), nrow=250, 
ncol=250))

#
# Create annulus weights matrix
# There is actually an error in the weights matrix that I will figure out later
# Bonus points if you can identify and fix it
# This error does not affect the overall issue that I am attempting to address
# Please don't [publicly] make fun of me for this clunky code
#
w <- focalWeight(the.raster, d=0.025, type='circle')
w.rev <- w
for (i in 1:ncol(w)) {
   col <- w[,i]
   num.recs <- length(which(w != 0))

   first.rec <- match(1/num.recs, col)
   last.rec <- length(col) - (first.rec - 1)
   non.recs <- seq(1:length(col))
   non.recs <- non.recs[-c(first.rec, last.rec)]

   col[non.recs] <- 0
   w.rev[,i] <- col
}

count <- length(which(w.rev != 0))
w.rev[w.rev != 0] <- 1/count

###
# End of creating weights matrix
###

# Now make and view the annulus raster
# Note the edge effects in the top and bottom of the plot

annulus.raster <- round(focal(the.raster, w=w.rev, na.rm=TRUE, pad=TRUE))
plot(annulus.raster)

# END

#
#
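
One possible way around the edge effect, sketched here in base R on a plain matrix rather than with focal(): sum the values under a 0/1 annulus kernel and divide by the number of kernel cells that actually fall inside the grid, so the divisor shrinks near the boundary instead of staying fixed. The function names annulus_kernel and annulus_mean, and radii measured in cells, are assumptions made for this illustration.

```r
# 0/1 kernel: 1 where r_in < distance from the centre cell <= r_out (in cells)
annulus_kernel <- function(r_in, r_out) {
  offsets <- -r_out:r_out
  d <- sqrt(outer(offsets^2, offsets^2, "+"))
  (d > r_in & d <= r_out) * 1
}

# Mean over the annulus around every cell, dividing by the number of
# non-missing annulus cells that lie inside the matrix
annulus_mean <- function(m, kernel) {
  half <- (nrow(kernel) - 1) / 2
  nr <- nrow(m)
  nc <- ncol(m)
  valid <- !is.na(m)
  m0 <- ifelse(valid, m, 0)
  total <- matrix(0, nr, nc)
  count <- matrix(0, nr, nc)
  for (di in -half:half) {
    for (dj in -half:half) {
      if (kernel[di + half + 1, dj + half + 1] == 0) next
      if (abs(di) >= nr || abs(dj) >= nc) next
      # target cells whose neighbour at offset (di, dj) is inside the matrix
      rows <- max(1, 1 - di):min(nr, nr - di)
      cols <- max(1, 1 - dj):min(nc, nc - dj)
      total[rows, cols] <- total[rows, cols] + m0[rows + di, cols + dj]
      count[rows, cols] <- count[rows, cols] + valid[rows + di, cols + dj]
    }
  }
  total / count
}

k <- annulus_kernel(1, 2)
flat <- matrix(3, nrow = 10, ncol = 10)
res <- annulus_mean(flat, k)   # a constant surface should stay constant at the edges
```

The same ratio idea may carry over to the raster package by dividing a focal sum of the values by a focal sum of a raster of ones, both computed with the same 0/1 kernel; that variant is not tested here.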








Re: [R] Problem with sample session

2014-05-30 Thread Jeff Newmiller
1. If you create a variable x or w or dummy, it is stored in the current 
environment (at the console, the global environment). You can refer to it by 
the name x or w or dummy. If you create a column x in the data frame dummy, you 
can refer to it as dummy$x or dummy[["x"]]. That is a different x than the 
variable x in your current environment. If you execute attach(dummy), a copy of 
the data frame is placed on the search path just after the global environment, 
so you can refer to a column such as y by its bare name. For a name that exists 
in both places, such as x, R finds the global variable first, which is what the 
warning is telling you. If you refer to a variable like w that doesn't exist in 
dummy, R searches the chain of environments in order; it finds w in the global 
environment, which does have a w variable, and uses it.
2. The R lm function is not a swiss army knife... there are other functions to 
obtain those additional columns... in this case the predict and residuals 
functions would get that data. For example,
dummy[ , "y.fm" ] <- predict( fm )
dummy[ , "resid.fm" ] <- residuals( fm )
Read ?lm and pay attention to the "See Also" and "Examples" sections. 
3. Until you use the detach function, a bare x resolves first to the x in the 
global environment, and only if that one is removed to the attached copy of the 
column x from dummy. After detach, you will have to use dummy$x to see the 
column.
4. The chain of variable environments exists only within the RAM used by R, and 
has nothing to do with the disk directory structure. That was a concept from S 
(so I was told), not from R.
5. You are overdue to read the Posting Guide mentioned in the footer. In there, 
among other things, is advice to post in plain text. HTML tends to be corrupted 
when the mailing list strips the HTML, so we may not see what you think we see.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On May 30, 2014 1:18:04 PM PDT, Stephen Meskin actu...@umbc.edu wrote:
Greg, Ista, (or anyone else),
Let me take one last run at this problem.

Consider the following extract from the Appendix A text:

  x <- 1:20
  w <- 1+sqrt(x)/2
  dummy <- data.frame(x=x, y=x+rnorm(x)*w)
  fm <- lm(y~x, data=dummy)
  fm1 <- lm(y~x, data=dummy, weight=1/w^2)

  attach(dummy)
  "Make the columns in the data frame visible as variables."

  The following object is masked _by_ .GlobalEnv:

      x

In the above I have included only one comment, "Make the columns ... 
visible as variables." from the text and
only one response, "The following object is masked by ... : x".

Stuff I don't understand:

 1. The purpose of attach seems to be to make x and y visible but I
can already see them by entering the command dummy even after the
warning. So what does attach do?
2. What I would like to see is a table with 1st column x; 2nd column y;
3rd and 4th columns predicted ys from fm and fm1; plus possibly
columns of residuals and other stuff. Such tables don't seem to be
available according to the discussion in ?lm.
 3. The warning about attach seems to say that there is an x in the
Global Environment that will mask the x that I am using. But that
is not happening in what I see. If I enter  x after the warning,
I still get 1,2,3, as before. What is the problem?
 4. If I place the above R-script in a folder other than the R-console
that comes up when I first open R will that obviate the attach
problem.


/Stephen A Meskin/, PhD, FSA, MAAA
Adjunct Assistant Professor of Mathematics, UMBC

*Most people give you an anticipatory grin when you mention a
/statistic/,
frown doubtingly when you mention the plural /statistics/, and
grunt and groan in a gurgle when you mention /a statistics course/.*//
On 5/30/2014 12:20 PM, Greg Snow wrote:
 If you pay attention and are careful not to use any variables names
 that conflict then you do not need a work around (and the conflicts
 function can help you see if there are any conflicts that you may
need
 to worry about).

 Probably the best work around is to use the with or within function
 instead of attaching.  For a couple of quick commands these work
great
 and I prefer them to using attach.  But, sometimes for a long
sequence
 of commands attach is much more convenient and is fine to use as long
 as you recognize the potential dangers and are careful.

 On Thu, May 29, 2014 at 3:56 PM, Stephen Meskin actu...@umbc.edu
wrote:
 Thanks Greg for your response. Is there a work around?

 Of course this begs the question as to Why is attach part of the
sample session in App. A of 

[R] converting a data.frame into a different table

2014-05-30 Thread Assa Yeroslaviz
Hi,

I have a matrix of 4.5Kx4.5K elements with column- and row names

I need to convert this matrix into a table, where one column is the name of
the row for the element, the second column is the name of the column for
the same element and the third column is the element itself.

The way I do it at the moment is with a double for-loop.
With this way though it takes ages for the loop to finish.

I was wondering whether there is a faster way of doing the same conversion.

This is how I am doing it now:
my.df <- data.frame()
for (i in 1:(nrow(out5.df)-1)){
    for (j in i:ncol(out5.df)) {
        #print(paste("I am at position: row-", i, " and col-", j, sep=""))
        a <- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j],
                   Value=out5.df[i,j])
        my.df <- rbind(my.df, a)
    }
}

this is an example for the data I have:
  1           2           3           4           5           6           7
1 FBgn0037249 FBpp0312226 FBtr0346646 FBgn0266186 FBpp0312219 FBtr0346639 FBgn0010100
2 FBgn0036389 FBpp0312225 FBtr0346645 FBgn0037894 FBpp0312218 FBtr0346638 FBgn0026577
3 FBgn0014002 FBpp0312224 FBtr0346644 FBgn0025712 FBpp0312183 FBtr0346593 FBpp0312178
4 FBgn0034201 FBpp0312223 FBtr0346643 FBgn0025712 FBpp0312182 FBtr0346592 FBpp0312177
5 FBgn0029860 FBpp031     FBtr0346642 FBgn0261597 FBpp0312181 FBtr0346591 FBtr0346587
6 FBgn0028526 FBpp0312221 FBtr0346641 FBgn0263050 FBpp0312180 FBtr0346589 FBtr0346586
7 FBgn0003486 FBpp0312220 FBtr0346640 FBgn0263051 FBpp0312179 FBtr0346588 FBpp0312219

What I would like to get at the end is something like that:
 my.df
   start start.1   Value
1  1  X1 FBgn0037249
2  1  X2 FBpp0312226
3  1  X3 FBtr0346646
4  1  X4 FBgn0266186
5  1  X5 FBpp0312219
6  1  X6 FBtr0346639
7  1  X7 FBgn0010100
8  2  X2 FBpp0312225
9  2  X3 FBtr0346645
10 2  X4 FBgn0037894
11 2  X5 FBpp0312218
12 2  X6 FBtr0346638
13 2  X7 FBgn0026577
14 3  X3 FBtr0346644
15 3  X4 FBgn0025712
16 3  X5 FBpp0312183
17 3  X6 FBtr0346593
18 3  X7 FBpp0312178
19 4  X4 FBgn0025712
20 4  X5 FBpp0312182
21 4  X6 FBtr0346592
22 4  X7 FBpp0312177
23 5  X5 FBpp0312181
24 5  X6 FBtr0346591
25 5  X7 FBtr0346587
26 6  X6 FBtr0346589
27 6  X7 FBtr0346586


So I would like to know if there is a better way of doing it than a double
for loop.

thanks
Assa



Re: [R] converting a data.frame into a different table

2014-05-30 Thread Jeff Newmiller
library(reshape2) # you probably need to install reshape2 before this works
?melt

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.



Re: [R] converting a data.frame into a different table

2014-05-30 Thread David Winsemius

On May 30, 2014, at 3:07 PM, Assa Yeroslaviz wrote:

 Hi,
 
 I have a matrix of 4.5Kx4.5K elements with column- and row names
 
 I need to convert this matrix into a table, where one column is the name of
 the row for the element, the second column is the name of the column for
 the same element and the third column is the element itself.

In R a table object is just a matrix with a class of "table", and there is a 
really kewl function to do exactly what you ask for on objects with class 
"table", so try this:

class(out5.df) <- "table"

my.df <- as.data.frame(out5.df)
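
A minimal sketch of the same idea on a small matrix, assuming you would rather not change the class of the original object. The matrix `m` and the data frame `long` are hypothetical names used only for illustration; `as.table()` returns a classed copy, and the column names are set to the ones requested in the question.

```r
# Hypothetical 2 x 3 matrix with row and column names
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("A", "B"), c("a", "b", "c")))

# as.table() returns a copy with class "table"; m itself is unchanged
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)

# Rename the default Var1/Var2/Freq columns to those asked for
names(long) <- c("start", "start.1", "Value")
long
```

The rows come out in column-major order (all rows of column "a" first), so an `order()` call would be needed if row-major order matters.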

 
 The way I do it at the moment is with a double for-loop.
 With this way though it takes ages for the loop to finish.
 
 I was wondering whether there is a faster way of doing the same conversion.
 
 This is how I am doing it now:
 my.df <- data.frame()
 for (i in 1:(nrow(out5.df)-1)){
     for (j in i:ncol(out5.df)) {
         #print(paste(" I am at position: row-", i, " and col-", j, sep=""))
         a <- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j],
                    Value=out5.df[i,j])
         my.df <- rbind(my.df, a)
     }
 }
 
 this is an example for the data I have:

I would have tested this if it had been offered using the output of dput()

?dput
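
For readers unfamiliar with it, here is a short sketch of what dput() offers. The small matrix `m` and the copy `m2` are hypothetical names for illustration: dput() prints a structure(...) call that can be pasted into an email, and evaluating that text on the other side should rebuild the same object.

```r
# Hypothetical small character matrix standing in for the real data
m <- matrix(c("x", "y", "z", "w"), nrow = 2,
            dimnames = list(c("1", "2"), c("1", "2")))

# Prints a structure(...) call that recreates m when pasted into R
dput(m)

# Round trip: deparse() gives the same text dput() prints
m2 <- eval(parse(text = paste(deparse(m), collapse = "\n")))
identical(m, m2)
```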


out5.df <- matrix(1:30,5,6)
colnames(out5.df) <- letters[1:6]
rownames(out5.df) <- LETTERS[1:5]
class(out5.df) <- "table"

my.df <- as.data.frame(out5.df)

my.df
  Var1 Var2 Freq
1    A    a    1
2    B    a    2
3    C    a    3
4    D    a    4
5    E    a    5
6    A    b    6
...snipped the rest

-- 
David.
 


     1    2    3    4    5    6    7
 1    FBgn0037249    FBpp0312226    FBtr0346646    FBgn0266186
 FBpp0312219    FBtr0346639    FBgn0010100
 2    FBgn0036389    FBpp0312225    FBtr0346645    FBgn0037894
 FBpp0312218    FBtr0346638    FBgn0026577
 3    FBgn0014002    FBpp0312224    FBtr0346644    FBgn0025712
 FBpp0312183    FBtr0346593    FBpp0312178
 4    FBgn0034201    FBpp0312223    FBtr0346643    FBgn0025712
 FBpp0312182    FBtr0346592    FBpp0312177
 5    FBgn0029860    FBpp031    FBtr0346642    FBgn0261597
 FBpp0312181    FBtr0346591    FBtr0346587
 6    FBgn0028526    FBpp0312221    FBtr0346641    FBgn0263050
 FBpp0312180    FBtr0346589    FBtr0346586
 7    FBgn0003486    FBpp0312220    FBtr0346640    FBgn0263051
 FBpp0312179    FBtr0346588    FBpp0312219
 
 What I would like to get at the end is something like that:
 my.df
   start start.1   Value
 1  1  X1 FBgn0037249
 2  1  X2 FBpp0312226
 3  1  X3 FBtr0346646
 4  1  X4 FBgn0266186
 5  1  X5 FBpp0312219
 6  1  X6 FBtr0346639
 7  1  X7 FBgn0010100
 8  2  X2 FBpp0312225
 9  2  X3 FBtr0346645
 10 2  X4 FBgn0037894
 11 2  X5 FBpp0312218
 12 2  X6 FBtr0346638
 13 2  X7 FBgn0026577
 14 3  X3 FBtr0346644
 15 3  X4 FBgn0025712
 16 3  X5 FBpp0312183
 17 3  X6 FBtr0346593
 18 3  X7 FBpp0312178
 19 4  X4 FBgn0025712
 20 4  X5 FBpp0312182
 21 4  X6 FBtr0346592
 22 4  X7 FBpp0312177
 23 5  X5 FBpp0312181
 24 5  X6 FBtr0346591
 25 5  X7 FBtr0346587
 26 6  X6 FBtr0346589
 27 6  X7 FBtr0346586
 
 
 So I would like to know if there is a better way of doing it than a double
 for loop.
 
 thanks
 Assa
 
   [[alternative HTML version deleted]]
 

David Winsemius
Alameda, CA, USA



Re: [R] converting a data.frame into a different table

2014-05-30 Thread arun
Hi,

You may try:
##Assuming the dataset is a matrix

mat <- structure(c("FBgn0037249", "FBgn0036389", "FBgn0014002", "FBgn0034201", 
"FBgn0029860", "FBgn0028526", "FBgn0003486", "FBpp0312226", "FBpp0312225", 
"FBpp0312224", "FBpp0312223", "FBpp031", "FBpp0312221", "FBpp0312220", 
"FBtr0346646", "FBtr0346645", "FBtr0346644", "FBtr0346643", "FBtr0346642", 
"FBtr0346641", "FBtr0346640", "FBgn0266186", "FBgn0037894", "FBgn0025712", 
"FBgn0025712", "FBgn0261597", "FBgn0263050", "FBgn0263051", "FBpp0312219", 
"FBpp0312218", "FBpp0312183", "FBpp0312182", "FBpp0312181", "FBpp0312180", 
"FBpp0312179", "FBtr0346639", "FBtr0346638", "FBtr0346593", "FBtr0346592", 
"FBtr0346591", "FBtr0346589", "FBtr0346588", "FBgn0010100", "FBgn0026577", 
"FBpp0312178", "FBpp0312177", "FBtr0346587", "FBtr0346586", "FBpp0312219"
), .Dim = c(7L, 7L), .Dimnames = list(c("1", "2", "3", "4", "5", 
"6", "7"), c("1", "2", "3", "4", "5", "6", "7")))


res <- data.frame(start=rownames(mat)[col(mat)], 
start.1=colnames(mat)[row(mat)], Value= c(t(mat)))
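
The desired output in the question lists only the cells visited by the double loop (i in 1:(nrow-1), j in i:ncol), not every cell. If that restriction is intended, the following is a rough sketch of one way to apply it without a loop. The 4 x 4 matrix `m` and the names `keep`, `idx` and `res.upper` are hypothetical and chosen only for illustration.

```r
# Hypothetical 4 x 4 character matrix standing in for the real data
m <- matrix(sprintf("v%02d", 1:16), nrow = 4, byrow = TRUE,
            dimnames = list(1:4, 1:4))

# Same cells the double loop visits: column >= row, last row skipped
keep <- col(m) >= row(m) & row(m) < nrow(m)

# Row/column index pairs, sorted row by row to match the loop's order
idx <- which(keep, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

res.upper <- data.frame(start   = rownames(m)[idx[, "row"]],
                        start.1 = paste0("X", colnames(m)[idx[, "col"]]),
                        Value   = m[idx],
                        stringsAsFactors = FALSE)
res.upper
```

Indexing the matrix with the two-column integer matrix `idx` picks out one element per index pair, so no per-cell rbind() is needed.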



##Comparing the speed with other methods:
###For easy comparison across methods, converted the columns to factors
fun1 <- function(mat) {
    start <- rownames(mat)[col(mat)]
    start.1 <- paste0("X", colnames(mat)[row(mat)])
    Value <- c(t(mat))
    data.frame(start = factor(start, levels = unique(start)),
               start.1 = factor(start.1, levels = unique(start.1)), Value)
}


fun2 <- function(mat) {
    colnames(mat) <- paste0("X", colnames(mat))
    my.df <- setNames(as.data.frame.table(mat), c("start", "start.1", "Value"))
    my.df <- my.df[with(my.df, order(start, start.1)), ]
    row.names(my.df) <- 1:nrow(my.df)
    my.df
}

library(reshape2)

fun3 <- function(mat) {
    colnames(mat) <- paste0("X", colnames(mat))
    my.df <- transform(setNames(melt(mat), c("start", "start.1", "Value")),
                       start = as.factor(start))
    my.df <- my.df[with(my.df, order(start, start.1)), ]
    row.names(my.df) <- 1:nrow(my.df)
    my.df
}

set.seed(481)
mat1 <- matrix(sample(mat, 4.5e3*4.5e3, replace=TRUE), ncol=4.5e3, 
dimnames=list(1:4.5e3, 1:4.5e3))
system.time(res1 <- fun1(mat1))
#   user  system elapsed 
#  7.914   0.836   8.750 
system.time(res2 <- fun2(mat1))
#   user  system elapsed 
# 28.257   1.336  29.578 
system.time(res3 <- fun3(mat1))
#   user  system elapsed 
# 27.213   1.027  28.224 

identical(res1,res2)
#[1] TRUE
identical(res1,res3)
#[1] TRUE
A.K.




On Friday, May 30, 2014 6:10 PM, Assa Yeroslaviz fry...@gmail.com wrote:
Hi,

I have a matrix of 4.5Kx4.5K elements with column- and row names

I need to convert this matrix into a table, where one column is the name of
the row for the element, the second column is the name of the column for
the same element and the third column is the element itself.

The way I do it at the moment is with a double for-loop.
With this way though it takes ages for the loop to finish.

I was wondering whether there is a faster way of doing the same conversion.

This is how I am doing it now:
my.df <- data.frame()
for (i in 1:(nrow(out5.df)-1)){
    for (j in i:ncol(out5.df)) {
#        print(paste(" I am at position: row-", i, " and col-", j, sep=""))
        a <- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j],
                   Value=out5.df[i,j])
        my.df <- rbind(my.df, a)
        }
    }

this is an example for the data I have:
    1    2    3    4    5    6    7
1    FBgn0037249    FBpp0312226    FBtr0346646    FBgn0266186
FBpp0312219    FBtr0346639    FBgn0010100
2    FBgn0036389    FBpp0312225    FBtr0346645    FBgn0037894
FBpp0312218    FBtr0346638    FBgn0026577
3    FBgn0014002    FBpp0312224    FBtr0346644    FBgn0025712
FBpp0312183    FBtr0346593    FBpp0312178
4    FBgn0034201    FBpp0312223    FBtr0346643    FBgn0025712
FBpp0312182    FBtr0346592    FBpp0312177
5    FBgn0029860    FBpp031    FBtr0346642    FBgn0261597
FBpp0312181    FBtr0346591    FBtr0346587
6    FBgn0028526    FBpp0312221    FBtr0346641    FBgn0263050
FBpp0312180    FBtr0346589    FBtr0346586
7    FBgn0003486    FBpp0312220    FBtr0346640    FBgn0263051
FBpp0312179    FBtr0346588    FBpp0312219

What I would like to get at the end is something like that:
 my.df
   start start.1       Value
1      1      X1 FBgn0037249
2      1      X2 FBpp0312226
3      1      X3 FBtr0346646
4      1      X4 FBgn0266186
5      1      X5 FBpp0312219
6      1      X6 FBtr0346639
7      1      X7 FBgn0010100
8      2      X2 FBpp0312225
9      2      X3 FBtr0346645
10     2      X4 FBgn0037894
11     2      X5 FBpp0312218
12     2      X6 FBtr0346638
13     2      X7 FBgn0026577
14     3      X3 FBtr0346644
15     3      X4 FBgn0025712
16     3      X5 FBpp0312183
17     3      X6 FBtr0346593
18     3      X7 FBpp0312178
19     4      X4 FBgn0025712
20     4      X5 FBpp0312182
21     4      X6 FBtr0346592
22     4      X7 FBpp0312177
23     5      X5 FBpp0312181
24     5      X6 FBtr0346591
25     5      X7 FBtr0346587
26     6      X6 FBtr0346589
27     6      X7 FBtr0346586


So I would like to know if there is a better way of doing it than a double
for loop.

thanks
Assa

    [[alternative HTML version deleted]]