Re: [R] Behavior of self-defined function within ddply

2014-01-16 Thread arun


Hi,
Maybe this helps:

library(plyr)  # provides mutate(), dlply() and ddply()
small <- read.table(text="monthend_n ticker wgtdiff ret interval b1 b2 b3 b4 b5 b6
1 19990228 AA 0.7172 -2.58 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
2 19990228 AAPL -0.0828 -15.48 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
3 19990228 ABCW 0.0966 -7.36 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
705 19990331 AA 0.1932 1.7 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
706 19990331 AAPL 0.033 3.23 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
707 19990331 ABF 0.154 -20.51 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
708 19990331 ABI 0.286 8.33 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816",sep="",header=TRUE,stringsAsFactors=FALSE)
# cutfunc is the function defined in the original message quoted below
res <- mutate(small,bins=unlist(dlply(small,.(monthend_n),cutfunc)))
res$bins
#[1] 4 2 3 4 3 3 4


ddply(small,.(monthend_n),summarize,bins=cut(wgtdiff,breaks=unique(c(b1,b2,b3,b4,b5,b6)),labels=F))[,2]
#[1] 4 2 3 4 3 3 4

unlist(lapply(split(small,small$monthend_n),cutfunc),use.names=FALSE)
#[1] 4 2 3 4 3 3 4
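
As for why the original call misbehaves: in bins=cutfunc(small), the name small still refers to the whole data frame, so every monthly piece gets the bins for all rows, computed from the break points of all months pooled together. Below is a minimal base-R sketch of the difference. The data frame toy and the helper cut3 are made up for illustration (three break points instead of six), so treat it as a sketch rather than tested code for the real data.

```r
# Toy data in the same shape as 'small' (values are made up)
toy <- data.frame(
  monthend_n = c(1, 1, 1, 2, 2, 2),
  wgtdiff    = c(0.1, 0.5, 0.9, 1.5, 2.5, 3.5),
  b1 = c(0, 0, 0, 1, 1, 1),
  b2 = c(0.4, 0.4, 0.4, 2, 2, 2),
  b3 = c(1, 1, 1, 4, 4, 4)
)

# Same idea as cutfunc, reduced to three break points
cut3 <- function(df) {
  breaks <- c(unique(df$b1), unique(df$b2), unique(df$b3))
  cut(df$wgtdiff, breaks, labels = FALSE)
}

# On the whole data frame, unique() pools the break points of all months:
all_breaks <- c(unique(toy$b1), unique(toy$b2), unique(toy$b3))
length(all_breaks)   # 6 break points instead of 3

# Applied once per monthly piece, each row gets one bin from its own month
pieces <- split(toy, toy$monthend_n)
toy$bins <- unsplit(lapply(pieces, cut3), toy$monthend_n)
toy$bins
```

With plyr, the same effect comes from handing ddply or dlply a function of the piece, as in the dlply call above, instead of naming small inside summarize.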

A.K.

 



On Thursday, January 16, 2014 2:01 AM, Amitabh Dugar cleverc...@yahoo.com 
wrote:
I have a data frame small which has 5,000 rows and contains data for several 
tickers every month, as below:

  

    monthend_n ticker wgtdiff    ret interval      b1       b2       b3      b4      b5    b6
  1   19990228     AA  0.7172  -2.58  0.33896 -0.5868 -0.24784  0.09112 0.43008 0.76904 1.108
  2   19990228   AAPL -0.0828 -15.48  0.33896 -0.5868 -0.24784  0.09112 0.43008 0.76904 1.108
  3   19990228   ABCW  0.0966  -7.36  0.33896 -0.5868 -0.24784  0.09112 0.43008 0.76904 1.108
...
705   19990331     AA  0.1932    1.7  0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
706   19990331   AAPL   0.033   3.23  0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
707   19990331    ABF   0.154 -20.51  0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
708   19990331    ABI   0.286   8.33  0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
etc.
Variables b1 through b6 are break points that I want to use in the cut 
function and they vary each month according to the distribution of the variable 
wgtdiff  during that month. 

To handle this I wrote a function as below:
cutfunc <- function(df)
{
vec <- df$wgtdiff
# need to apply unique() because the break points within each month are the
# same for all tickers (b1-b6 values are the same within each month)
breaks <- c(unique(df$b1), unique(df$b2), unique(df$b3), unique(df$b4), 
unique(df$b5), unique(df$b6))
bin <- cut(vec, breaks,labels=F)
bin
}
Then I tried:
temp4 <- ddply(small, .(monthend_n), summarize, bins=cutfunc(small))
I was expecting to get back a data frame with 5,000 rows with bin assignments 
for each ticker, and if there are 6 break points the bin numbers should range 
from 1 to 5.
However, instead I get a data frame with 40,000 rows and bin numbers ranging 
from 1 to 40, as below:
  monthend_n bins
1   19990228   40
2   19990228   17
3   19990228   22
...
5000   19990228   17
5001   19990331   40
5002   19990331   17
5003   19990331   22

etc

It seems ddply doesn't pass the monthly pieces of the data frame small into my 
cutfunc in the way I expect.

Any guidance is appreciated.
Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] Subgroups definition with rpart

2014-01-16 Thread Jérémy Lambert
Hello everyone, 

I just completed a recursive partitioning analysis using rpart, and have a 
beautiful tree with 6 terminal nodes. Each terminal node contains a precise 
number of patients (it's a clinical study), and I'd like to create a new variable 
indicating which terminal node each patient falls into. 

To put it more clearly: patient 1: node 5; patient 2: node 3; patient 3: node 6, 
etc. 

I thank you already for your answers 

Jeremy
[[alternative HTML version deleted]]



Re: [R] Subgroups definition with rpart

2014-01-16 Thread Prof Brian Ripley

Search for 'where' in ?rpart.object (linked from ?rpart).
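
A minimal sketch of what that looks like in practice, assuming the kyphosis data set that ships with rpart as a stand-in for the clinical data (the column names node_row and node are made up for illustration):

```r
library(rpart)

# kyphosis comes with rpart; it only stands in for the patient data here
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# fit$where holds, for each observation, the row of fit$frame that
# describes the terminal node the observation ends up in
kyphosis$node_row <- fit$where

# The node numbers shown by print(fit) are the row names of fit$frame
kyphosis$node <- as.numeric(rownames(fit$frame))[fit$where]

table(kyphosis$node)
```

Note that where has one entry per observation actually used in the fit, so if rows were dropped because of missing values it will be shorter than the original data frame.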



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] predefined area under the curve

2014-01-16 Thread eliza botto
Dear UseRs of R,
My sincere apologies in advance if my question isn't relevant to the 
operations in R. I have the following two-column data set, with 12 rows 
in it. 
> dput(el)

structure(c(-1.42607687227285, -1.0200762327862, -0.736315917376129, 
-0.502402223373355, -0.293381232121193, -0.0965586152896391, 
0.0965586152896391, 0.293381232121194, 0.502402223373355, 0.73631591737613, 
1.0200762327862, 1.42607687227285, 1.99095972340185, 1.84006682649012, 
1.71563586990498, 1.60312301737773, 0.748443534297919, 0.696909774793038, 
0.64586377528834, 0.594330015783459, 0.270606020696256, 0.24247780756, 
0.211370068418158, 0.173646844190226), .Dim = c(12L, 2L), .Dimnames = list(
NULL, c(, GG)))

When I plot column 2 against column 1, I get a curve whose area 
[auc(column1,column2)] equals 2.602997. As I am calibrating it for 
further simulations, I know that the area under the curve should 
actually be equal to 2.845. I also know that the first 6 rows have been located 
accurately, so rows 7 to 12 need to be relocated in such a 
manner that the area under the curve is equal to, or as close as possible to, 
2.845. How can I do that? I have been doing it manually, but at the cost of time 
and accuracy.
Thank you very much in advance. 
Elisa 


[R] ifelse...

2014-01-16 Thread Tim Smith
Hi,

Sorry for the newbie question! My code:

x <- 'a'
ifelse(x == 'a',y <- 1, y <- 2)
print(y)

Shouldn't this assign a value of 1? When I execute this I get:

> x <- 'a'
> ifelse(x == 'a',y <- 1, y <- 2)
[1] 1
> print(y)
[1] 2


Am I doing something really daft???

thanks!



 sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   parallel  stats graphics  grDevices utils datasets  
methods   base 

other attached packages:
 [1] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6    
ggplot2_0.9.3.1   sda_1.3.2 fdrtool_1.2.11    corpcor_1.6.6 
   
 [8] entropy_1.2.0 scatterplot3d_0.3-34  pdist_1.2 
hash_2.2.6    DAAG_1.18 multicore_0.1-7   
multtest_2.18.0  
[15] XML_3.95-0.2  hgu133a.db_2.10.1 affy_1.40.0   
genefilter_1.44.0 GOstats_2.28.0    graph_1.40.1  
Category_2.28.0  
[22] GO.db_2.10.1  venneuler_1.1-0   rJava_0.9-6   
colorRamps_2.3    RColorBrewer_1.0-5    sparcl_1.0.3  gap_1.1-10
   
[29] plotrix_3.5-2 som_0.3-5 pvclust_1.2-2 
lsr_0.3.1 compute.es_0.2-2  sm_2.2-5.3    
imputation_2.0.1 
[36] locfit_1.5-9.1    TimeProjection_0.2.0  Matrix_1.1-1.1    
timeDate_3010.98  lubridate_1.3.3   gbm_2.1   
lattice_0.20-24  
[43] survival_2.37-4   RobustRankAggreg_1.1  impute_1.36.0 
reshape_0.8.4 plyr_1.8  zoo_1.7-10    
data.table_1.8.10    
[50] foreach_1.4.1 foreign_0.8-57    languageR_1.4.1   
preprocessCore_1.24.0 gtools_3.1.1  BiocInstaller_1.12.0  
org.Hs.eg.db_2.10.1  
[57] RSQLite_0.11.4    DBI_0.2-7 AnnotationDbi_1.24.0  
Biobase_2.22.0    BiocGenerics_0.8.0    biomaRt_2.18.0   

loaded via a namespace (and not attached):
 [1] affyio_1.30.0 annotate_1.40.0   AnnotationForge_1.4.4 
codetools_0.2-8   colorspace_1.2-4  dichromat_2.0-0   digest_0.6.4  
   
 [8] grid_3.0.2    GSEABase_1.24.0   gtable_0.1.2  
iterators_1.0.6   labeling_0.2  latticeExtra_0.6-26   MASS_7.3-29   
   
[15] munsell_0.4.2 proto_0.3-10  RBGL_1.38.0   
RCurl_1.95-4.1    reshape2_1.2.2    scales_0.2.3  stats4_3.0.2  
   
[22] stringr_0.6.2 tools_3.0.2   xtable_1.7-1  
zlibbioc_1.8.0



Re: [R] ifelse...

2014-01-16 Thread ONKELINX, Thierry
You want
y <- ifelse(x == 'a', 1,  2)

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey



[R] Doubt in simple merge

2014-01-16 Thread kingsly
Dear R community

I have two data sets called Elder and Younger.
This is my code for a simple merge.

Elder <- data.frame(
  ID=c("ID1","ID2","ID3"),
  age=c(38,35,31))
Younger <- data.frame(
  ID=c("ID4","ID5","ID3"),
  age=c(29,21,31))

mer <- merge(Elder,Younger,by="ID", all=T)

Output I am expecting:

ID    age
ID1  38
ID2  35
ID3  31
ID4  29
ID5  21

It looks very simple.  But I need help.  
When I run the code it gives me age.x and age.y.
thank you




--
View this message in context: 
http://r.789695.n4.nabble.com/Doubt-in-simple-merge-tp4683671.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Doubt in simple merge

2014-01-16 Thread Adams, Jean
You are telling it to merge by ID only.  But it sounds like you would like
it to merge by both ID and age.

merge(Elder, Younger, all=TRUE)

Jean




Re: [R] ifelse...

2014-01-16 Thread Duncan Murdoch

On 16/01/2014 8:46 AM, ONKELINX, Thierry wrote:

> You want
> y <- ifelse(x == 'a', 1,  2)


or use if, rather than ifelse, i.e.

if (x == 'a') {
  y <- 1
} else {
  y <- 2
}

ifelse() is mainly used when you want to work with whole vectors of 
decisions, e.g.


x <- 1:10
ifelse(x  5, 1, 0)
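
To make the contrast concrete, here is a small sketch. The comparison x > 5 is an assumption chosen for illustration, and the names flags and y_single are made up here.

```r
x <- 1:10

# ifelse() tests every element and returns a vector as long as the test
flags <- ifelse(x > 5, 1, 0)
flags

# if/else makes one decision from a single TRUE or FALSE
y_single <- if (x[1] > 5) 1 else 2
y_single
```

The first call gives one 0 or 1 per element of x, while the second yields a single value, which is what the original poster wanted.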

Duncan Murdoch



Re: [R] Doubt in simple merge

2014-01-16 Thread Frede Aakmann Tøgersen
No I think the OP wants

mer <- merge(Elder, Younger)

Br. Frede




Re: [R] Doubt in simple merge

2014-01-16 Thread Frede Aakmann Tøgersen
Oops, sorry, that should have been

mer <- rbind(Elder, Younger)

/frede




[R] DTM Package removeSparseTerms function question

2014-01-16 Thread ramoss
In inspect(removeSparseTerms(dtm, 0.4)), does anyone know how the sparse
argument ("A numeric for the maximal allowed sparsity") works? I.e., what is the
difference between, say, 0.2, 0.4 and 0.6?

Thanks for your help
 



--
View this message in context: 
http://r.789695.n4.nabble.com/DTM-Package-removeSparseTerms-function-question-tp4683678.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Doubt in simple merge

2014-01-16 Thread Marc Schwartz
Not quite:

> rbind(Elder, Younger)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID4  29
5 ID5  21
6 ID3  31

Note that ID3 is duplicated.


Should be:

> merge(Elder, Younger, by = c("ID", "age"), all = TRUE)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID4  29
5 ID5  21


He wants to do a join on both ID and age to avoid duplication of rows when 
the same ID and age occur in both data frames. If the same column name (e.g. 
"Var") appears in both data frames and is not part of the 'by' argument, you 
end up with Var.x and Var.y in the result.

In the case of two occurrences of the same ID with two different ages, if that 
is possible, both rows would be added to the result using the above code.
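
The suffix behaviour can be checked directly with the data frames from the original post. This is a minimal sketch: the object names by_id, by_both and stacked are made up here, and unique(rbind(...)) at the end is an extra alternative that nobody in the thread proposed, shown only for comparison.

```r
Elder <- data.frame(ID = c("ID1", "ID2", "ID3"), age = c(38, 35, 31))
Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, 31))

# Joining on ID only: age is in both inputs but not in 'by',
# so it comes back twice, as age.x and age.y
by_id <- merge(Elder, Younger, by = "ID", all = TRUE)
names(by_id)

# Joining on both columns: one age column and one row per distinct pair
by_both <- merge(Elder, Younger, by = c("ID", "age"), all = TRUE)
by_both

# Assumed alternative: stack the rows, then drop exact duplicates
stacked <- unique(rbind(Elder, Younger))
nrow(stacked)
```

Both by_both and stacked keep two rows for an ID that appears with two different ages, which matches the caveat above.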

Regards,

Marc Schwartz




[R] Estimating parameters of 3 parameters lognormal distribution

2014-01-16 Thread Vito Ricci
Hi guys,

Is there a function in some R package to fit the parameters of a 3-parameter 
lognormal distribution?
Many thanks
Vito Ricci


[R] Revolutions blog: December roundup

2014-01-16 Thread David Smith
Happy New Year (if a little late!). Revolution Analytics staff write
about R every weekday at the Revolutions blog:
 http://blog.revolutionanalytics.com
and every month I post a summary of articles from the previous month
of particular interest to readers of r-help.

In case you missed them, here are some articles related to R from the
month of December:

A ComputerWorld tutorial on basic data processing with R: http://bit.ly/1cvhuqI

Prediction: R will replace legacy SAS solutions and go mainstream
http://bit.ly/1cvhtmS

A chart of the growth of R user groups and local R meetings:
http://bit.ly/1cvhuqH

I discussed R, data science and big data in an interview with
technology journalist Robert Scoble: http://bit.ly/1cvhuqG

Looking at the evidence supporting the growth of R and Python:
http://bit.ly/1cvhtmQ

A replay of Mario Inchiosa’s webinar on scalable cross-platform
R-based predictive analytics: http://bit.ly/1cvhuqF

A look at the distribution of the number of R package dependencies:
http://bit.ly/1cvhuqJ

Revolution R Enterprise 7 is now available, with free download for
academic users: http://bit.ly/1cvhtD7

Estimating the empirical distribution of Twitter follower counts with
R: http://bit.ly/1cvhtD8

How R is used by insurance companies for catastrophe modeling:
http://bit.ly/1cvhuqM

Sheri Gilley creates an interactive chart of R package dependencies
with DeployR, rCharts, and AngularJS: http://bit.ly/1cvhuqO

Joseph Rickert offers 15 tips for computing with Big Data in R:
http://bit.ly/1cvhuqN

Daniel Hanson provides a step-by-step guide to download financial time
data from Quandl into R, and then chart and analyze the time series
using the xts package: http://bit.ly/1cvhuqR

Luba Gloukhov used cluster analysis in R to allocate single-malt
scotch whiskies to four distinct flavour profiles:
http://bit.ly/1cvhuqS

Some non-R stories in the past month included: Big Data Analytics
predictions for 2014 (http://bit.ly/1cvhuqT), forced perspective
illusions (http://bit.ly/1cvhtDb), analytics with Apache Spark
(http://bit.ly/1cvhuqW), wind pattern visualization
(http://bit.ly/1cvhuqX), privacy by design (http://bit.ly/1cvhtDc),
Big Data Analytics platforms (http://bit.ly/1cvhuHb), the Leidenfrost
effect (http://bit.ly/1cvhuHa), big data and video gaming
(http://bit.ly/1cvhuHi) and an ASCII fluid simulator
(http://bit.ly/1cvhtDf).

Meeting times for local R user groups (http://bit.ly/eC5YQe) can be
found on the updated R Community Calendar at: http://bit.ly/bb3naW

If you're looking for more articles about R, you can find summaries
from previous months at http://blog.revolutionanalytics.com/roundups/.
You can receive daily blog posts via email using services like
blogtrottr.com, or join the Revolution Analytics mailing list at
http://revolutionanalytics.com/newsletter to be alerted to new
articles on a monthly basis.

As always, thanks for the comments and please keep sending suggestions
to me at da...@revolutionanalytics.com . Don't forget you can also
follow the blog using an RSS reader, or by following me on Twitter
(I'm @revodavid).

Cheers,
# David

-- 
David M Smith da...@revolutionanalytics.com
VP of Marketing, Revolution Analytics  http://blog.revolutionanalytics.com
Tel: +1 (650) 646-9523 (Seattle WA, USA)
Twitter: @revodavid



Re: [R] Inserting color into an irregular grid comprised of polygons

2014-01-16 Thread Morway, Eric
As a follow up to this thread started nearly a month ago, I'm in need of
help sorting out the R code that will create a log-scale legend in the call
to image.plot below.  As the last line of the code provided below shows,
I attempted to force the labeling through the argument legend.lab, to no
avail. The -4 on the legend label should actually be 0.0001, -3 = 0.001,
etc etc according to the inverse of the log base 10.
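
One way to get those labels, as a rough sketch: compute the tick positions on the log10 scale and back-transform them for the text. The range -4:0 and the names at and labels are assumptions for illustration, and the commented call relies on the axis.args argument of image.plot in fields, which as far as I know is passed on to axis() when the legend is drawn.

```r
# Tick positions on the log10 scale that drives the colour fill
at <- -4:0

# Back-transform each position to the original units for the label text
labels <- vapply(10^at, format, character(1), scientific = FALSE)
labels

# These could then be handed to the legend, along the lines of:
# image.plot(..., legend.only = TRUE,
#            axis.args = list(at = at, labels = labels))
```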

If possible, it would be nice to create a legend similar to the one created
here <http://r.789695.n4.nabble.com/sppolot-fill-below-minimum-legend-value-td902841.html>,
only that uses spplot and I need to stick with the base graphics since I'm
using irregular polygons to draw the figure.  However, if someone is able
to show how to replace the -4 with 0.0001 etc, that would be a great place
to start.  Here's the reproducible R code (note that log10 is taken of the
matrix vals and is what guides the color fill in the plot):

library(gsubfn)  #uses paste0 func
library(colorRamps)  #uses blue2green2red()
library(fields)  #uses image.plot(..., legend.only=TRUE, ...)


z.space <-
c(0.6790521,0.3454826,0.1872356,0.0891079,0.1525315,0.1088516,0.0950484,0.1128700,0.1247511,0.1188105,0.1143682,0.1232529,0.0930168,0.0751814,0.0511553,0.0244765,0.0424162,0.0435835,0.0577441,0.0471291,0.0974984,0.0303579,0.0234230,0.0378371,0.0396388,0.0278040,0.0427108,0.0450803,0.0735903,0.1499654,0.0235646,0.0309285,0.0770295,0.0687763,0.1007385,0.0666026,0.1083643,0.1092819,0.1372624,0.2248670,0.2620903,0.4606435,0.6262846,1.7111480,1.7111480,1.7111480,1.7111480,1.7111480,1.7111480,1.7111480,0.8662780,1.1220410,0.5368302)
x.space <-
c(0.4477580,0.4058683,0.5047908,0.3488354,0.3170296,0.2280360,0.2371574,0.1658813,0.2098874,0.2441864,0.3050745,0.4087275,0.4448988,0.4416195,0.4020654,0.0862620,0.0332546,0.0871109,0.3531576,0.3037825,0.2396926,0.2351304,0.2144404,0.0733572,0.0338528,0.2016122,0.1533454,0.1265044,0.0932833,0.0481462,0.0662010,0.0150457,0.0481462,0.0270822,0.0318521,0.0995603,0.0583223,0.0371142,0.0854215,0.0577332,0.0883671,0.0786467,0.0787786,0.1135672,0.0897309,0.1659446,0.8536263,0.8536263,0.8536263,0.8536263,0.5794602,0.2741660)

#Obs <- read.csv("Obs_Loc_for_R.txt",header=T)

x.range - c(0,sum(x.space))
z.range - c(0,sum(z.space))
z.sum - sum(z.space)

###
# Read HK
###

vals <-
matrix(c(0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,

 
0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.1498912E-03,0.5414670E-02,

 
0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.9835267E-03,0.6889354E-01,0.1004814E-01,

 

Re: [R] Inserting color into an irregular grid comprised of polygons

2014-01-16 Thread Jim Lemon

On 01/17/2014 05:12 AM, Morway, Eric wrote:

As a follow up to this thread started nearly a month ago, I'm in need of
help sorting out the R code that will create a log-scale legend in the call
to image.plot below.  As the last line of the code provided below shows,
I attempted to force the labeling through the argument legend.lab, to no
avail. The -4 on the legend label should actually be 0.0001, -3 = 0.001,
etc etc according to the inverse of the log base 10.

If possible, it would be nice to create a legend similar to the one created
here: http://r.789695.n4.nabble.com/sppolot-fill-below-minimum-legend-value-td902841.html,
only that one uses spplot and I need to stick with base graphics since I'm
using irregular polygons to draw the figure.  However, if someone is able
to show how to replace the -4 with 0.0001 etc, that would be a great place
to start.  Here's the reproducible R code (note that log10 is taken of the
matrix vals and is what guides the color fill in the plot):

library(gsubfn)  #uses paste0 func
library(colorRamps)  #uses blue2green2red()
library(fields)  #uses image.plot(..., legend.only=TRUE, ...)


z.space <-
c(0.6790521,0.3454826,0.1872356,0.0891079,0.1525315,0.1088516,0.0950484,0.1128700,0.1247511,0.1188105,0.1143682,0.1232529,0.0930168,0.0751814,0.0511553,0.0244765,0.0424162,0.0435835,0.0577441,0.0471291,0.0974984,0.0303579,0.0234230,0.0378371,0.0396388,0.0278040,0.0427108,0.0450803,0.0735903,0.1499654,0.0235646,0.0309285,0.0770295,0.0687763,0.1007385,0.0666026,0.1083643,0.1092819,0.1372624,0.2248670,0.2620903,0.4606435,0.6262846,1.7111480,1.7111480,1.7111480,1.7111480,1.7111480,1.7111480,1.7111480,0.8662780,1.1220410,0.5368302)
x.space <-
c(0.4477580,0.4058683,0.5047908,0.3488354,0.3170296,0.2280360,0.2371574,0.1658813,0.2098874,0.2441864,0.3050745,0.4087275,0.4448988,0.4416195,0.4020654,0.0862620,0.0332546,0.0871109,0.3531576,0.3037825,0.2396926,0.2351304,0.2144404,0.0733572,0.0338528,0.2016122,0.1533454,0.1265044,0.0932833,0.0481462,0.0662010,0.0150457,0.0481462,0.0270822,0.0318521,0.0995603,0.0583223,0.0371142,0.0854215,0.0577332,0.0883671,0.0786467,0.0787786,0.1135672,0.0897309,0.1659446,0.8536263,0.8536263,0.8536263,0.8536263,0.5794602,0.2741660)

#Obs <- read.csv("Obs_Loc_for_R.txt",header=T)

x.range <- c(0,sum(x.space))
z.range <- c(0,sum(z.space))
z.sum <- sum(z.space)

###
# Read HK
###

vals <-
matrix(c(0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,

  
0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.1498912E-03,0.5414670E-02,

  
0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.9835267E-03,0.6889354E-01,0.1004814E-01,

  

Re: [R] names of columns

2014-01-16 Thread arun
Try:
dat1 <- read.table(text="a b c d
1  0.5    0.1  0.2   0.2
5  0.3    0.5  0.1   0.1",sep="",header=TRUE)
data.frame(Names=apply(dat1,1,function(x) names(x)[x %in% max(x)]))
#  Names
#1 a
#5 b

#or
colnames(dat1)[apply(dat1,1,which.max)]
#[1] "a" "b"


A.K.


Hi, 

I need a little help... 

If I have a data frame like 
      a     b     c     d 
1  0.5    0.1  0.2   0.2 
5  0.3    0.5  0.1   0.1 

I need the name of the column with the biggest number in each row. My 
results should be 

1  a 
5  b 

If I do apply(data.frame, 1, max) I get the maximum by row, but I want the 
name, not the value... 

Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] names of columns

2014-01-16 Thread arun
I guess you could also do:
names(dat1)[max.col(dat1)]
#[1] "a" "b"
A.K.
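
The approaches in this thread differ in how they treat ties within a row: the %in% max(x) version returns every tied name, which.max() returns the first, and max.col() picks one at random unless told otherwise. A minimal sketch, assuming a hand-built copy of dat1 (the ties.method = "first" setting is our choice for reproducibility, not part of the original answer):

```r
# Hand-built copy of the example data (row names "1" and "5")
dat1 <- data.frame(a = c(0.5, 0.3), b = c(0.1, 0.5),
                   c = c(0.2, 0.1), d = c(0.2, 0.1),
                   row.names = c("1", "5"))

# which.max() returns the position of the first maximum in each row
by_which <- colnames(dat1)[apply(dat1, 1, which.max)]

# max.col() breaks ties at random by default; "first" makes it deterministic
by_maxcol <- names(dat1)[max.col(dat1, ties.method = "first")]

print(by_which)
print(by_maxcol)
```

With no ties in the data, as here, both calls name the same column for every row.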






[R] Regression Modeling Strategies 4-Day Short Course March 2013

2014-01-16 Thread Frank Harrell
My yearly Regression Modeling Strategies course is expanded to 4 days 
this year to be able to relax the pace a bit.  Details are below. 
Questions welcomed.

-

*RMS Short Course 2014*
Frank E. Harrell, Jr., Ph.D., Professor and Chair
Department of Biostatistics, Vanderbilt University School of Medicine

*March 4, 5, 6 & 7, 2014*
9:00am - 4:00pm (9:00am - 2:00pm March 7)
Alumni Hall
Vanderbilt University
Nashville Tennessee USA

See http://biostat.mc.vanderbilt.edu/2014RMSShortCourse for details.

The course includes statistical methodology, case studies, and use of
the R rms package.

Please email interest to Audrey Carvajal {audrey.carva...@vanderbilt.edu}

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University



Re: [R] extracting rows from data frame that approximately equal another data frame

2014-01-16 Thread arun
Hi,

Maybe this helps:
x <- data.frame(V1=-1.162877, V2=0.1848928)
set.seed(245)
df <- as.data.frame(matrix(rnorm(5051*2),ncol=2))

cut1 <- cut(df[,1],breaks=c(x[,1]-0.1,x[,1]+0.1))
cut2 <- cut(df[,2],breaks=c(x[,2]-0.1,x[,2]+0.1))

df1 <- df[!is.na(cut1) & !is.na(cut2),]

A.K.
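
The same filter can also be written as a direct tolerance test, which may be easier to read than two cut() calls. This is a sketch on a small hand-made df (the four rows and the name tol are assumptions for illustration). Note that cut() builds intervals that are open on the left by default, so a value exactly 0.1 below the target is dropped by the cut() version but kept by the <= test here.

```r
# Target values, as in the question
x <- data.frame(V1 = -1.162877, V2 = 0.1848928)

# Small made-up data frame standing in for the 5051-row df
df <- data.frame(V1 = c(-1.20, -1.00, -1.10, 0.50),
                 V2 = c( 0.20,  0.18,  0.40, 0.19))

tol <- 0.1  # allowed absolute difference in each column

# Keep rows where both columns are within tol of the target
keep <- abs(df$V1 - x$V1) <= tol & abs(df$V2 - x$V2) <= tol
df1 <- df[keep, ]
print(df1)
```

Only the first row passes both tests; the second fails on V1 and the third on V2.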



I have a data frame and would like to extract the rows that are approximately equal to 
the values in another data frame. 

say I have a data frame called x 

dim(x) 
[1] 1 2 


x 
          V1        V2 
x -1.162877 0.1848928 


I would like to search through a larger data frame 
called df and extract all rows that approximately equal the two values 
in the data frame x by say +- 0.1. 

The larger dataframe has these dimensions 

dim(df) 
[1] 5051    2



[R] barplot: segment-wise shading

2014-01-16 Thread Martin Weiser
Dear listers,

I would like to make a stacked barplot and be able to define the shading
(density or angle) segment-wise, i.e. NOT like here:
 # Bar shading example
 barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black",
 legend = rownames(VADeaths))

The example has 5 different angles of shading, I would like to have as
many possible angle values as there are segments (i.e. 20 in the
VADeaths example).
I was not successful using web search.
Any advice?

Thank you for your patience.
With the best regards,
Martin Weiser



Re: [R] counts and percentage of multiple categorical columns in R

2014-01-16 Thread arun
Also,
You can do the same with the previous solution:
result1 <- result[,-6]

vec1 <- unique(unlist(dat1))
result2 <- as.data.frame(t(sapply(dat1,function(x) {counts <- table(factor(x,levels=vec1));
percentage <- sprintf("%.1f",(counts/sum(counts))*100);
c(paste0(counts,paste0("(",percentage,")")),
sum(!is.na(x)))})),stringsAsFactors=FALSE)
result2[,6] <- as.numeric(result2[,6])
colnames(result2) <- colnames(result1)
identical(result1,result2)
#[1] TRUE

A.K.

Re: [R] counts and percentage of multiple categorical columns in R

2014-01-16 Thread arun
Hi Jingxia,
Maybe this helps:

dat1 <- read.table(text="fatfreemilk fatmilk halfmilk 2fatmilk
A A A A
A B B A
B A A A
C C C C
D . A A
A E A E
C A B A
A . A A
A B . A
A A B E",sep="",header=TRUE,stringsAsFactors=FALSE,check.names=FALSE,na.strings=".")
dat2 <- dat1
dat2$id <- 1:nrow(dat2)

library(reshape2)
res <- acast(melt(dat2,id.var="id")[,-1],variable~value,length)
res[,-6] <- paste0(res[,-6],paste0("(",sprintf("%.1f",(res[,-6]/rowSums(res[,-6]))*100)),")")
result <- as.data.frame(res,stringsAsFactors=FALSE)
#Either
result$nonNAcount <- dim(dat1)[1]-as.numeric(result$`NA`)
#or
result$nonNAcount <- sapply(dat1,function(x) sum(!is.na(x)))
result[,-6]
#                  A       B       C       D       E nonNAcount
#fatfreemilk 6(60.0) 1(10.0) 2(20.0) 1(10.0)  0(0.0)         10
#fatmilk     4(50.0) 2(25.0) 1(12.5)  0(0.0) 1(12.5)          8
#halfmilk    5(55.6) 3(33.3) 1(11.1)  0(0.0)  0(0.0)          9
#2fatmilk    7(70.0)  0(0.0) 1(10.0)  0(0.0) 2(20.0)         10

A.K.
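
For readers without reshape2, the counts, percentages, and non-missing count can be sketched in base R alone. This is an illustrative sketch, not the code from the thread: the helper name summarise_col and the fixed brands vector are assumptions, and only two of the four columns are reproduced to keep it short. Percentages are taken over the non-missing answers, as in the output above.

```r
# Two columns of the example data, with missing answers as NA
dat1 <- data.frame(
  fatfreemilk = c("A", "A", "B", "C", "D", "A", "C", "A", "A", "A"),
  fatmilk     = c("A", "B", "A", "C", NA,  "E", "A", NA,  "B", "A"),
  stringsAsFactors = FALSE
)

brands <- c("A", "B", "C", "D", "E")  # fixed level set so every brand gets a column

# Hypothetical helper: "count(percent)" per brand plus the non-NA count
summarise_col <- function(x) {
  counts <- as.vector(table(factor(x, levels = brands)))  # NA is excluded by factor()
  pct <- sprintf("%.1f", 100 * counts / sum(counts))
  c(setNames(paste0(counts, "(", pct, ")"), brands),
    nonNAcount = sum(!is.na(x)))
}

res <- as.data.frame(t(sapply(dat1, summarise_col)), stringsAsFactors = FALSE)
res$nonNAcount <- as.numeric(res$nonNAcount)
print(res)
```

Fixing the brand levels up front keeps a brand that is never chosen in a column (such as D for fatmilk) visible as 0(0.0) rather than dropping it.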




On Thursday, January 16, 2014 9:49 AM, Jingxia Lin jingxi...@gmail.com wrote:

Dear Arun,

Sorry to bother you again.. But may I ask you for one more question regarding 
the data set? 
I am using the following method you offered for the data set. In our original 
data, there are some blank cells (i.e. data missing) in some columns. So in the 
output data frame, can we add an additional column to show the number of 
response (i.e. the number of non-blank cells)? I tried a couple of ways but 
failed (sorry I'm really not good at R...) I would be very grateful if you can 
help us with this problem at your convenience. Thank you! 

Best,
Jingxia

dat2 <- dat1
dat2$id <- 1:nrow(dat2)
library(reshape2)
res <- dcast(melt(dat2,id.var="id")[,-1],variable~value,length)
row.names(res) <- res[,1]
res1 <- res[,-1]
res2 <- as.matrix(res1)
res2[] <- paste0(res2,paste0("(",(res2/rowSums(res2))*100),")")
as.data.frame(res2)


results 
#    A B C D E
#fatfreemilk 6(60) 1(10) 2(20) 1(10)  0(0)
#fatmilk 6(60) 2(20) 1(10)  0(0) 1(10)
#halfmilk    5(50) 4(40) 1(10)  0(0)  0(0)
#2fatmilk    7(70)  0(0) 1(10)  0(0) 2(20)




On Mon, Dec 30, 2013 at 3:50 PM, arun smartpink...@yahoo.com wrote:

Dear Jingxia,
No problem.  Happy New Year to you too!
Arun








On Monday, December 30, 2013 2:49 AM, Jingxia Lin jingxi...@gmail.com wrote:

Dear Arun,

Thank YOU for your kind help :)  Happy new year!

Best,
Jingxia



On Mon, Dec 30, 2013 at 3:43 PM, arun smartpink...@yahoo.com wrote:

Dear Jingxia,

Glad that you were able to figure it out.  I was away from my computer.  My 
name is 'Arun Kirshna Sasikala-Appukuttan'.  I am a postdoctoral research 
fellow at Wayne State University, Detroit, MI, USA.  Thank you for the kind 
acknowledgment.
Regards,
Arun







On Sunday, December 29, 2013 9:25 PM, Jingxia Lin jingxi...@gmail.com wrote:

Dear A.K.

I also solved the character problem by using library(xlsx). So everything is 
fine now. Thank you again!

Best,
Jingxia



On Mon, Dec 30, 2013 at 10:17 AM, Jingxia Lin jingxi...@gmail.com wrote:

Dear A.K.,


Thank you a lot! I tried your way and it works perfect. The only thing I 
haven't figured out is that while I exported the final data frame into an 
excel file, all Chinese characters were not shown correctly (my original 
data has Chinese in row/column names). Other than that, everything is great! 
Would you mind letting me know your name so that we can acknowledge your 
help in our paper? Thank you again! 


Best
Jingxia





On Mon, Dec 30, 2013 at 3:48 AM, arun smartpink...@yahoo.com wrote:

Hi,
Try:
dat1 <- read.table(text="fatfreemilk fatmilk halfmilk 2fatmilk
A A A A
A B B A
B A A A
C C C C
D A A A
A E A E
C A B A
A A A A
A B B A
A A B E",sep="",header=TRUE,stringsAsFactors=FALSE,check.names=FALSE)
dat2 <- dat1
dat2$id <- 1:nrow(dat2)
library(reshape2)
res <- dcast(melt(dat2,id.var="id")[,-1],variable~value,length)
row.names(res) <- res[,1]
res1 <- res[,-1]
res2 <- as.matrix(res1)
res2[] <- paste0(res2,paste0("(",(res2/rowSums(res2))*100),")")
as.data.frame(res2)
#    A B C D E
#fatfreemilk 6(60) 1(10) 2(20) 1(10)  0(0)
#fatmilk 6(60) 2(20) 1(10)  0(0) 1(10)
#halfmilk    5(50) 4(40) 1(10)  0(0)  0(0)
#2fatmilk    7(70)  0(0) 1(10)  0(0) 2(20)
A.K.





On Sunday, December 29, 2013 1:07 PM, Jingxia Lin jingxi...@gmail.com 
wrote:
Dear R helpers,

I have a data sheet (“milk”) with four types of milk from five brands (A,
B, C, D, E), the column shows the brands that each customer chose for each
type of the milk they bought. The data sheet goes like below. You can see
for some type of milk, no brand is chosen.

fatfreemilk fatmilk halfmilk 2fatmilk
A A A A
A B B A
B A A A
C C C C
D A A A
A E A E
C A B A
A A A A
A B B A
A A B E

I want to summarize each column so that for each type of milk, i know the
counts and percentages of the brands chosen for each milk type. I tried
summary in R, but the result is not shown nicely. How I can display the
result in a way like below:
A B C D E

[R] Object not Found Error on a .csv file

2014-01-16 Thread Valerie Shalin
Hello:

I am a new user, running the latest version of R on my Mac.  I have started by 
reading a file with the read.csv command:
task2analyses <- read.csv(file="GroupsWithRTsEqualN.csv",head=TRUE,sep=",")

When I print it out in R, the file appears to be intact, with the proper 
headers.  Yet, when I run test commands like the following:
 cor(GfullUA,GFullUA) 

I get back the message: Error in is.data.frame(y) : object 'GFullUA' not found

This is not the case for all of the variables in the file.  Some can be tested 
with is.numeric (or character). And some are read
properly when I test the summary command.  

I have looked on Google, and have been unsuccessful in searching documentation.

Thanks for any help,

Valerie Shalin


[R] Question on adding a p-value in bwplot

2014-01-16 Thread Hae-Young Kim
Hi,

I am using bwplot to draw box plots for two groups at 6 time points.
I need to add a p-value at each time point, comparing the two groups at that
time point.  The p-values are (0.0020, 0.0204, 0.3361, 0.0185, 0.1981, and
0.6677).  I can draw the two box plots at each time point using the
code below, but I am not sure how to add a p-value at each time point.
Please let me know if you know how to do it.

Thanks!


library(lattice)
library(Hmisc)
library(gridExtra)

font.settings <- list(font = 1, cex = 1, fontfamily = "serif")
my.theme <- list(
  box.umbrella = list(col = "black"),
  box.rectangle = list(fill = rep(c("black", "black"), 2)),
  box.dot = list(col = "black", pch = 3, cex = 2),
  plot.symbol = list(cex = 0.5, col = 1, pch = 0), # outlier size and color
  par.xlab.text = font.settings,
  par.ylab.text = font.settings,
  axis.text = font.settings,
  superpose.symbol = list(fill = c("white", "black")), # boxplots
  superpose.polygon = list(col = c("white", "black")), # legend
  par.sub = font.settings)

kccqd <- sas.get("I:/Protocol/Datasets/2013/09302013/DataForQOL/",
"kccq_long", formats=F, sasprog=sasprog)
kccqlong - subset(kccqd, month 72)

id <- (kccqlong$master.id)
group <- (kccqlong$rdrug12)
month <- (kccqlong$month)
kccq.pred <- (kccqlong$kccq.pred)
kccq.raw <- (kccqlong$kccq.raw)

bwplot(kccq.raw ~ time, data = kccqlong, groups = group, ylim = c(-100, 100),
       pch = "|", box.width = 1/3,
       auto.key = list(points = FALSE, rectangles = TRUE, space = "right",
                       title = "Treatment", cex.title = 1),
       panel = panel.superpose,
       ylab = "Change in KCCQ (Raw) from baseline",
       xlab = "Visit",
       par.settings = my.theme,
       panel.groups = function(x, y, ..., group.number) {
         panel.bwplot(x + (group.number - 1.5)/3, y, ...)
       })



Re: [R] Object not Found Error on a .csv file

2014-01-16 Thread Sarah Goslee
Hi Valerie,

Assuming GfullUA is a column of your data frame task2analyses, you
need to tell R where to look. It's trying to find an object called
GfullUA, and there isn't one.

Here are two ways:
with(task2analyses, cor(GfullUA, GFullUA))

cor(task2analyses$GfullUA, task2analyses$GFullUA)

You might want to read the Introduction to R that came with your
software installation.

Sarah
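
Both forms from the reply can be tried on a toy data frame. This is a sketch under assumptions: the values and the second column name RT are made up. It also checks the spelling of a column name, since R is case-sensitive and the question uses both GfullUA and GFullUA.

```r
# Toy stand-in for the data read by read.csv (values and RT are made up)
task2analyses <- data.frame(GfullUA = c(1, 2, 3, 4),
                            RT      = c(2.1, 3.9, 6.2, 7.8))

# Check the exact spelling of the column names
print(names(task2analyses))
print("GFullUA" %in% names(task2analyses))  # capital F: no such column here

# Two equivalent ways to point cor() at columns of the data frame
r1 <- with(task2analyses, cor(GfullUA, RT))
r2 <- cor(task2analyses$GfullUA, task2analyses$RT)
print(c(r1, r2))
```

Calling cor(GfullUA, RT) on its own fails in a fresh session because the columns live inside task2analyses, not in the workspace.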

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] barplot: segment-wise shading

2014-01-16 Thread Marc Schwartz

You could do something like this:

# Get the dimensions of VADeaths
> dim(VADeaths)
[1] 5 4

# How many segments?
> prod(dim(VADeaths))
[1] 20


Then use that value in the barplot() arguments as you desire, for example:

  barplot(VADeaths, angle = 15 + 10 * 1:prod(dim(VADeaths)), 
  density = 20, col = "black", legend = rownames(VADeaths))


or wrap the barplot() function in your own, which pre-calculates the values and 
then passes them to the barplot() call in the function.

See ?dim and ?prod

Be aware that a vector (eg. 1:5) will be 'dim-less', thus if you are going to 
use this approach for a vector based data object, you would want to use ?length

Regards,

Marc Schwartz



Re: [R] Estimating parameters of 3 parameters lognormal distribution

2014-01-16 Thread Göran Broström

On 01/16/2014 04:59 PM, Vito Ricci wrote:

Hi guys,

Is there a function in some R package to fit the parameters of a 3-parameter 
lognormal distribution?


Yes, the function 'phreg' in the package 'eha'.

Göran Broström



Many thanks
Vito Ricci


Re: [R] barplot: segment-wise shading

2014-01-16 Thread Martin Weiser
Hello,

thank you for your attempt, but this does not work (for me).
This produces 5 angles of shading, not 20.
Maybe because of my R version (R version 2.15.1 (2012-06-22); Platform:
i486-pc-linux-gnu (32-bit))?

Thank you.

Regards,
Martin Weiser



Re: [R] grepping with a variable trouble

2014-01-16 Thread arun
Hi,
The pattern in your rownames is not clear.
In the for() loop, I guess you need rownames(df) instead of df.

Using an example dataset (here the rownames may be different):
set.seed(59)
x <- as.data.frame(matrix(rnorm(110),ncol=2))

set.seed(24)
row.names(x) <- paste0(row.names(x),Reduce(`paste0`,lapply(1:2,function(x) 
sample(letters,55,replace=TRUE))))

set.seed(435)
df <- as.data.frame(matrix(sample(200,300*20,replace=TRUE),ncol=20))
set.seed(34)
row.names(df) <- 
paste0(sample(1:55,300,replace=TRUE),Reduce(`paste0`,lapply(1:2,function(x) 
sample(letters,300,replace=TRUE))))
gl1 <- sapply(rownames(x),function(i) 
grep(paste0(gsub("\\d+","",i),"$"),rownames(df)))
gl1[2]
#$`2fl`
#[1] 128
rownames(df[128,])
#[1] "31fl"

A.K.
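
If the rownames of x are expected to appear verbatim in df, exact matching does not need a regular expression at all. A minimal sketch on made-up rownames (the two small data frames are assumptions for illustration): match() returns one position per rowname of x, with NA where there is no match, while the corrected loop searches rownames(df) and accumulates the hits instead of overwriting g on each pass.

```r
# Made-up data frames; only the rownames matter here
x  <- data.frame(v = 1:3, row.names = c("hb", "zz", "ab"))
df <- data.frame(w = 1:5, row.names = c("ab", "xhb", "hb", "hbx", "cd"))

# Exact matching: position in df of each rowname of x (NA if absent)
g <- match(rownames(x), rownames(df))
print(g)

# The loop from the question, fixed: search rownames(df), keep every hit
g_loop <- integer(0)
for (i in rownames(x)) {
  g_loop <- c(g_loop, grep(paste0("^", i, "$"), rownames(df)))
}
print(g_loop)
```

The anchored pattern keeps "hb" from matching "xhb" or "hbx". match() is the safer choice when rownames may contain characters that are special in regular expressions.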




Hi, I'm having trouble using grep with a variable. When I do this it works 
fine: 

grep("^hb$", rownames(df))
[1] 9359 

but what I really want to do is use the rownames of one data frame
(x) to extract the position of that same rowname in a larger data frame
(df). How can I do this for, say, all of the rownames in x? The positions
should be stored in a variable called g. 

dim(x)
[1] 55  2 

dim(df)
[1] 13000    19 

I've tried this but it does not seem to work. 

for(i in rownames(x)){ 
    g <- grep(paste("^",i,"$",sep=""), df) 
} 

any ideas? 




Re: [R] barplot: segment-wise shading

2014-01-16 Thread Marc Schwartz

Arggh.

No, this is my error for not actually looking at the plot and presuming that it 
would work.

Turns out that it does work for a non-stacked barplot:

  barplot(VADeaths, angle = 1:20 * 10, density = 10, beside = TRUE)

However, internally within barplot(), actually barplot.default(), the manner in 
which the matrix is passed to an internal function called xyrect() to draw the 
segments, is that entire columns are passed, rather than the individual 
segments (counts), when the bars are stacked.

As a result, due to the vector based approach used, only the first 5 values of 
'angle' are actually used, since each column holds 5 segments (VADeaths has 5 
rows and 4 columns), rather than all 20. The same impact will be observed when 
using the default legend that is created.

Thus, I don't believe that there will be an easy (non kludgy) way to do what 
you want, at least with the default barplot() function. 

You could fairly easily create/build your own function using ?rect, which is 
what barplot() uses to draw the segments. I am not sure if lattice based 
graphics can do this or perhaps using Hadley's ggplot based approach would 
offer a possibility.
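
A rough sketch of the rect() idea mentioned above, offered as an illustration rather than a drop-in replacement for barplot(). The function name stacked_rect and its arguments are made up for this example; it only handles stacked, vertical bars and ignores legends, colours and log scales.

```r
# Hypothetical helper: draw stacked bars one segment at a time with rect(),
# so that each segment can receive its own shading angle.
stacked_rect <- function(height, angle, density = 20, width = 1, space = 0.2) {
  stopifnot(is.matrix(height), length(angle) == length(height))
  nr <- nrow(height)
  nc <- ncol(height)
  # Left edge of each bar
  left <- space + (seq_len(nc) - 1) * (width + space)
  # Top and bottom of every segment (rows = segments, columns = bars)
  tops <- matrix(apply(height, 2, cumsum), nrow = nr)
  bottoms <- rbind(0, tops[-nr, , drop = FALSE])
  plot.new()
  plot.window(xlim = c(0, max(left) + width + space), ylim = c(0, max(tops)))
  # as.vector() unrolls the matrices column by column, so 'angle' is matched
  # to the segments in the same order as the cells of 'height'.
  rect(xleft = rep(left, each = nr), ybottom = as.vector(bottoms),
       xright = rep(left + width, each = nr), ytop = as.vector(tops),
       angle = angle, density = density)
  axis(2)
  if (!is.null(colnames(height)))
    axis(1, at = left + width / 2, labels = colnames(height), tick = FALSE)
  invisible(list(bottom = bottoms, top = tops))
}
```

A call such as stacked_rect(VADeaths, angle = 1:20 * 9) should then give each of the 20 segments its own angle.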

Apologies for the confusion.

Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Model averaging using QAICc

2014-01-16 Thread Kamil Bartoń

On 2014-01-15 11:00, r-help-requ...@r-project.org wrote:

Date: Wed, 15 Jan 2014 16:39:17 +1000
From: Diana Virkki d.vir...@griffith.edu.au
To: r-help@r-project.org
Subject: [R] Model averaging using QAICc
Message-ID:
  CAL6nRQcAyN-3SVeZSMXoJq=vsxotpg3e0prwjw7iu7g20b+...@mail.gmail.com
Content-Type: text/plain

Hi all,

I am having some trouble running GLMM's and using model averaging with
QAICc.

Let me know if you need more detail here:
I am trying to run GLMM's on count data in the package glmmADMB with a
negative binomial distribution due to overdispersion. The dispersion
parameter has now reduced to 2.679 for the global model (from a dispersion
parameter of 27.507 with a poisson distribution), and I am not sure if this
is still considered too high for running the models?

I would like to try to use QAICc's for model selection and model averaging
with the package MuMIn. I have so far been able to produce a QAICc output
only for the models. I read that model averaging with QAICc can be done in
MuMIn but cannot find the syntax to get these outputs, including the model
weightings, parameter estimates, confidence intervals, and relative
variable importance.



Use argument 'rank' to provide the information criterion to use:
- with 'dredge': rank = "QAICc", chat = <c-hat>
- with 'model.sel' and 'model.avg': rank = "QAICc", rank.args =
list(chat = <c-hat>)

See example(QAICc) and example(model.avg)
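
For readers wondering where a c-hat value might come from, here is a minimal sketch using only base R. It estimates c-hat as the Pearson chi-square divided by the residual degrees of freedom of a Poisson GLM fitted to simulated data. The simulated data, the Poisson GLM and the object names are assumptions for illustration only; the original question used glmmADMB models. The commented MuMIn calls simply restate the arguments listed above, with 'global' and 'models' as hypothetical objects.

```r
set.seed(1)
x <- runif(200)
y <- rnbinom(200, mu = exp(1 + x), size = 1)   # overdispersed counts (simulated)

fit <- glm(y ~ x, family = poisson)

# Pearson chi-square over residual degrees of freedom
chat <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(chat)

# With MuMIn installed, the value would then be passed on as described above:
# dredge(global, rank = "QAICc", chat = chat)
# model.avg(models, rank = "QAICc", rank.args = list(chat = chat))
```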

kamil

The University of Aberdeen is a charity registered in Scotland, No SC013683.



[R] xts error: number of items to replace is not a multiple of replacement length

2014-01-16 Thread ce

Dear all , 

I am getting this error while trying to change columns of an xts object with a 
date range as index. 

> library(xts)
Loading required package: zoo

Attaching package: ‘zoo’

The following object is masked from ‘package:base’:

    as.Date, as.Date.numeric

> data(sample_matrix)
> sample.xts <- as.xts(sample_matrix, descr='my new xts object')
> head(sample.xts)
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-01-04 50.42096 50.42096 50.26414 50.33236
2007-01-05 50.37347 50.37347 50.22103 50.33459
2007-01-06 50.24433 50.24433 50.11121 50.18112
2007-01-07 50.13211 50.21561 49.99185 49.99185

> sample.xts$Close <- sample.xts$Close+1

> head(sample.xts)
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 51.11778
2007-01-03 50.23050 50.42188 50.23050 51.39767
2007-01-04 50.42096 50.42096 50.26414 51.33236
2007-01-05 50.37347 50.37347 50.22103 51.33459
2007-01-06 50.24433 50.24433 50.11121 51.18112
2007-01-07 50.13211 50.21561 49.99185 50.99185

> sample.xts["2007-01-02::2007-01-04"]
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 51.11778
2007-01-03 50.23050 50.42188 50.23050 51.39767
2007-01-04 50.42096 50.42096 50.26414 51.33236

> sample.xts["2007-01-02::2007-01-04"]$Close <-
+   sample.xts["2007-01-02::2007-01-04"]$Close+1
Warning message:
In NextMethod(.Generic) :
  number of items to replace is not a multiple of replacement length



Re: [R] xts error: number of items to replace is not a multiple of replacement length

2014-01-16 Thread arun
Hi,
Try:
sample.xts["2007-01-02::2007-01-04","Close"] <-
    sample.xts["2007-01-02::2007-01-04","Close"] + 1


sample.xts["2007-01-02::2007-01-04"]
#               Open     High      Low    Close
#2007-01-02 50.03978 50.11778 49.95041 52.11778
#2007-01-03 50.23050 50.42188 50.23050 52.39767
#2007-01-04 50.42096 50.42096 50.26414 52.33236
A.K.




Re: [R] xts error: number of items to replace is not a multiple of replacement length

2014-01-16 Thread ce
Indeed it works! Thanks a lot. But why?




Re: [R] barplot: segment-wise shading

2014-01-16 Thread Jim Lemon

On 01/17/2014 10:59 AM, Marc Schwartz wrote:


...

Hi Marc and Martin,
When I saw the original message I tried to look at the code for the 
barplot function to see if I could call the rectFill function from 
plotrix into it. Unfortunately barplot is one of those internal 
functions that are not at all easy to hack and I have never gotten 
around to adding stacked bars to the barp function. I thought that 
rectFill would allow you to use more easily discriminated fills than 
angles that only differed by 18 degrees.


Jim



[R] Predicting probabilities from a logistic regression by hand (in code)

2014-01-16 Thread erikpukinskis
Thanks for looking at this, I've been tearing my hair out for a day or so
now.

I have done a multiple variable logistic regression in R, and obtained my
coefficients. I am able to make predictions for the training data in R
without problem. But now I would like to create a prediction model in Ruby
(that was the original point of doing the regression) and I'm having some
trouble.

Basically, my equation is:

predicted_logit = K + v1*c1 + v2*c2 + ... vn*cn
odds_ratio = e^predicted_logit/(1+e^predicted_logit)

But it always seems to either give 1.0 or 0.0! The output of predict() in R
is generally something nice and soft like 0.5578460!

I realize not everyone knows Ruby, but I'll include my code here for
reference:

# These are the coefficients that R gives me from my logistic regression:
intercept = 0.2700309

coefficients = {
  high: 1.0136028, 
  low: 1.0016712, 
  germ_mean: 1.0233327,
  gdds: 0.9990283,
  early_gdds: 0.9986464,
  mid_gdds: 1.0002979,
  late_gdds: 0
}

# And this is what R predicts for one datum:
#
#   outcome high low germ_mean gdds early_gdds mid_gdds late_gdds p_success
# 1       1   73  28        40  119          0       91        28 0.5578460
# ...

# So to get my own p_success, first I multiply each coefficient by its
# input data
period = {:high=>73, :low=>28, :germ_mean=>40, :gdds=>119, :early_gdds=>0,
:mid_gdds=>91, :late_gdds=>28}
products = coefficients.map {|name,value| period[name]*value }

# Then I add those together and add that to the intercept
predicted_logit = intercept + products.sum

# Then my probability should be e^predicted_logit over 1 +
# e^predicted_logit:
odds_ratio = Math.exp(predicted_logit) / (1 + Math.exp(predicted_logit))

# But the odds ratio comes out as 1.0, not 0.5578460 like R predicts.
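
One possible reading of the numbers above, offered as an assumption rather than a diagnosis: the posted coefficients are all close to 1, which is what exponentiated coefficients (odds ratios) tend to look like. If their logs are used instead, the linear predictor for the posted datum comes out at about 0.5578, which matches the value shown as p_success. That would suggest the R value is on the link (logit) scale, as predict() returns by default for a glm, and that the probability needs type = "response" or an inverse logit. The sketch below redoes the arithmetic in R with the posted values. The late_gdds term is left out on the assumption that its reported 0 stands for a dropped coefficient.

```r
# Values copied from the post; treating them as exp(coefficient) is an assumption.
exp_intercept <- 0.2700309
exp_coef <- c(high = 1.0136028, low = 1.0016712, germ_mean = 1.0233327,
              gdds = 0.9990283, early_gdds = 0.9986464, mid_gdds = 1.0002979)

# Input data for the posted row (late_gdds omitted, see the note above)
period <- c(high = 73, low = 28, germ_mean = 40, gdds = 119,
            early_gdds = 0, mid_gdds = 91)

# Linear predictor built from the log-scale coefficients
logit <- log(exp_intercept) + sum(log(exp_coef) * period[names(exp_coef)])

# Inverse logit: plogis(x) is exp(x) / (1 + exp(x))
prob <- plogis(logit)

print(c(logit = logit, prob = prob))
```

If this reading is right, the same change in the Ruby code would be to take Math.log of the intercept and of each coefficient before multiplying and summing.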





Re: [R] barplot: segment-wise shading

2014-01-16 Thread Martin Weiser
Jim Lemon wrote on Fri, 17 Jan 2014 at 13:21 +1100:
 [...]

Hi,

after Marc pointed out where to look, I hacked barplot.default a
bit, so now it does what I want (I added a segmentwise argument).
Unfortunately, it works well with segmentwise = TRUE, but not with
segmentwise = FALSE (the default).
With segmentwise = FALSE, the density argument works only in 1/n-th of the
segments, where n is the number of columns (it seems to refuse to
recycle the values automatically, but I do not know why).
Any ideas?

Martin

Here is my hack of barplot:

my.barplot <-
function (height, width = 1, space = NULL, names.arg = NULL, 
    legend.text = NULL, beside = FALSE, horiz = FALSE, density = NULL, 
    angle = 45, col = NULL, border = par("fg"), main = NULL, 
    sub = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, 
    xpd = TRUE, log = "", axes = TRUE, axisnames = TRUE,
    cex.axis = par("cex.axis"), 
    cex.names = par("cex.axis"), inside = TRUE, plot = TRUE, 
    axis.lty = 0, offset = 0, add = FALSE, args.legend = NULL,
    segmentwise = FALSE,
    ...) 
{
    if (!missing(inside)) 
        .NotYetUsed("inside", error = FALSE)
    if (is.null(space)) 
        space <- if (is.matrix(height) && beside) 
            c(0, 1)
        else 0.2
    space <- space * mean(width)
    if (plot && axisnames && is.null(names.arg)) 
        names.arg <- if (is.matrix(height)) 
            colnames(height)
        else names(height)
    if (is.vector(height) || (is.array(height) && (length(dim(height)) == 
        1))) {
        height <- cbind(height)
        beside <- TRUE
        if (is.null(col)) 
            col <- "grey"
    }
    else if (is.matrix(height)) {
        if (is.null(col)) 
            col <- gray.colors(nrow(height))
    }
    else stop("'height' must be a vector or a matrix")
    if (is.logical(legend.text)) 
        legend.text <- if (legend.text && is.matrix(height)) 
            rownames(height)
    stopifnot(is.character(log))
    logx <- logy <- FALSE
    if (log != "") {
        logx <- length(grep("x", log)) > 0L
        logy <- length(grep("y", log)) > 0L
    }
    if ((logx || logy) && !is.null(density)) 
        stop("Cannot use shading lines in bars when log scale is used")
    NR <- nrow(height)
    NC <- ncol(height)
    if (beside) {
        if (length(space) == 2) 
            space <- rep.int(c(space[2L], rep.int(space[1L], 
                NR - 1)), NC)
        width <- rep(width, length.out = NR)
    }
    else {
        width <- rep(width, length.out = NC)
    }
    offset <- rep(as.vector(offset), length.out = length(width))
    delta <- width/2
    w.r <- cumsum(space + width)
    w.m <- w.r - delta
    w.l <- w.m - delta
    log.dat <- (logx && horiz) || (logy && !horiz)
    if (log.dat) {
        if (min(height + offset, na.rm = TRUE) <= 0) 
            stop("log scale error: at least one 'height + offset' value <= 0")
        if (logx && !is.null(xlim) && min(xlim) <= 0) 
            stop("log scale error: 'xlim' <= 0")
        if (logy && !is.null(ylim) && min(ylim) <= 0) 
            stop("log scale error: 'ylim' <= 0")
        rectbase <- if (logy && !horiz && !is.null(ylim)) 
            ylim[1L]
        else if (logx && horiz && !is.null(xlim)) 
            xlim[1L]
        else 0.9 * min(height, na.rm = TRUE)
    }
    else rectbase <- 0
    if (!beside) 

[R] Any recommendations for reusable profiling of name fields?

2014-01-16 Thread Jeff Johnson
Hi, I'm pretty new to R and am trying to develop a reusable set of scripts
that I can use to profile various data types and common fields in our
database. I know that what I'm asking is a can of worms, so please bear
with me. :)

For example, we store a person's first name, last name, phone number, email
address, last gift amount, gift date, etc. as well as integer type data.
I'm wondering if there's a best practice for validating a field that
holds, for example, first name or last name. A couple of things I've come
up with are:
1) Count of characters (nchar) in the first (or last) name field
2) Number of unique tokens
3) Patterns (converting alpha to A and numeric to N) and count the
frequency of each unique pattern that results. I suppose I could make lower
case alpha 'a' and upper = 'A' to be more specific.
4) Min and max name (helps identify those with leading spaces, numbers)
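
The four checks listed above could be sketched in base R roughly as follows. The function name profile_names and the sample vector are invented for illustration, and the pattern step uses the more specific variant (upper case to "A", lower case to "a", digits to "N").

```r
# Hypothetical helper that profiles a character vector of names.
profile_names <- function(x) {
  x <- as.character(x)

  # 1) Character counts
  n_char <- nchar(x)

  # 2) Tokens: split on whitespace after trimming leading/trailing blanks
  tokens <- strsplit(trimws(x), "[[:space:]]+")

  # 3) Patterns: upper case -> "A", lower case -> "a", digits -> "N"
  pattern <- gsub("[[:upper:]]", "A", x)
  pattern <- gsub("[[:lower:]]", "a", pattern)
  pattern <- gsub("[[:digit:]]", "N", pattern)

  list(
    nchar_summary   = summary(n_char),
    tokens_per_name = table(lengths(tokens)),
    n_unique_tokens = length(unique(unlist(tokens))),
    patterns        = sort(table(pattern), decreasing = TRUE),
    # 4) Min and max value in sort order
    min_max         = range(x)
  )
}

first_names <- c("Mary", "mary ann", " Bob", "J0hn", "Anne")
prof <- profile_names(first_names)
print(prof$patterns)
```

Unusual patterns (a leading blank, an "N" inside a name) then show up as low-frequency entries in the pattern table.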

Does anyone have more suggestions for techniques that are common or that
you'd recommend for name fields? Ultimately, I'm looking to develop a
common set of profiles for various data types, so if there's a white paper
(I've googled, but not found any that hit the mark yet) I'd love to see it.

Perhaps there's even a package for this type of thing?

Thanks much!

-- 
Jeff



Re: [R] Estimating parameters of 3 parameters lognormal distribution

2014-01-16 Thread Vito Ricci
Hi Goran,

thanks for your suggestion, but I believe it's not helpful for me...

The documentation for phreg says: "Proportional hazards model with parametric 
baseline hazard(s). Allows for stratification with different scale and shape 
in each stratum, and left truncated and right censored data."

My data follow a lognormal distribution with three parameters. I need to fit 
this model and its 3 parameters, especially the 3rd, the threshold.
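
One way such a fit could be sketched in base R, as an illustration only and not something suggested in this thread: treat the data as threshold + lognormal, and maximise the profile log-likelihood over the threshold, since for a fixed threshold the meanlog and sdlog estimates have closed forms. The function name fit_lnorm3, the search interval and the simulated data are assumptions. Note that the likelihood of this model is known to be unbounded as the threshold approaches the smallest observation, so the search is kept slightly below min(x) and the result should be treated as a local maximum.

```r
# Hypothetical helper: profile-likelihood fit of a 3-parameter lognormal.
fit_lnorm3 <- function(x, eps = 1e-6) {
  # For a fixed threshold, plug in the closed-form estimates of meanlog/sdlog
  profile_ll <- function(threshold) {
    z <- log(x - threshold)
    mu <- mean(z)
    sigma <- sqrt(mean((z - mu)^2))
    sum(dlnorm(x - threshold, meanlog = mu, sdlog = sigma, log = TRUE))
  }
  # The threshold must lie below the smallest observation
  lower <- min(x) - diff(range(x))
  upper <- min(x) - eps
  opt <- optimize(profile_ll, interval = c(lower, upper), maximum = TRUE)
  z <- log(x - opt$maximum)
  list(threshold = opt$maximum,
       meanlog   = mean(z),
       sdlog     = sqrt(mean((z - mean(z))^2)),
       loglik    = opt$objective)
}

# Simulated example with a known threshold of 10
set.seed(42)
x <- 10 + rlnorm(5000, meanlog = 1, sdlog = 0.5)
fit <- fit_lnorm3(x)
print(unlist(fit))
```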
Regards.
VR


 
If not now, when?
If not here, where?
If not you, who?



On Friday, 17 January 2014 8:26, Vito Ricci vito_ri...@yahoo.com wrote:
 
Many thanks for your suggestion.
Regards.
VR



 
If not now, when?
If not here, where?
If not you, who?



On Thursday, 16 January 2014 22:31, Göran Broström goran.brost...@umu.se wrote:
 
On 01/16/2014 04:59 PM, Vito Ricci wrote:
 Hi guys,

 is there in some R package a statement to fit parameters in a 3 parameters 
 lognormal distribution.

Yes, the function 'phreg' in the package 'eha'.

Göran Broström



 Many thanks
 Vito Ricci


Re: [R] tables package and alternative to col percent

2014-01-16 Thread Daniel Cher
Thanks for the reply. Another great option would be missing (like in SAS),
especially for factors.  I'm struggling to figure out how to do this with
tables.

Daniel Cher, MD
djc...@gmail.com
+1-650-269-5763

This message and its attachments are confidential.

-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Sent: Monday, January 13, 2014 2:13 AM
To: Daniel Cher; r-help@r-project.org
Subject: Re: [R] tables package and alternative to col percent

On 14-01-13 12:02 AM, Daniel Cher wrote:
 Library tables and tabular function is neato.



 I'm trying to figure out how to get percents other than just row and 
 columns. I'd like a percent of a factor.

That's a recent addition, still only on R-forge.





 library(tables)

 c=data.frame(
    gender=c(1,1,1,1,2,2,2,2),
    race=c(3,3,4,4,4,4,4,4)
 )

 tabular(
    Factor(gender,"Gender") *
    Factor(race, "Race") + 1 ~
    (n=1) + Percent("col"),
    data=c
 )

 The above produces:

 Gender Race n Percent
 1      3    2  25
        4    2  25
 2      3    0   0
        4    4  50
        All  8 100







 I'm looking for percents to have gender=1 or gender=2 as the denominator.
 I.e.,


You would get the table below using

Percent(denom = Equal(Gender))

Duncan Murdoch







 Gender Race n Percent
 1      3    2  *50*
        4    2  *50*
 2      3    0   *0*
        4    4 *100*
        All  8  100
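
As a cross-check on the percentages wanted above, the same within-gender figures can be computed in base R without the tables package. This is only a sketch for verifying the numbers, not a replacement for the tabular() layout. The data frame name d is chosen here to avoid masking c().

```r
d <- data.frame(gender = c(1, 1, 1, 1, 2, 2, 2, 2),
                race   = c(3, 3, 4, 4, 4, 4, 4, 4))

# Cross-tabulate, then take percentages within each gender (row margin)
counts <- table(Gender = d$gender, Race = d$race)
pct_within_gender <- 100 * prop.table(counts, margin = 1)

print(counts)
print(pct_within_gender)
```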













Re: [R] Doubt in simple merge

2014-01-16 Thread kingsly
Thank you dear friends.  You have cleared my first doubt.  

My second doubt:
I have the same data sets Elder and Younger.

Elder <- data.frame(
  ID=c("ID1","ID2","ID3"),
  age=c(38,35,31))
Younger <- data.frame(
  ID=c("ID4","ID5","ID3"),
  age=c(29,21,NA))


Row ID3 appears in both data sets. It has a value (31) in Elder but NA in 
Younger.

I need output like this.

ID    age
ID1  38
ID2  35
ID3  31
ID4  29
ID5  21 

Kindly help me.
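
One way this could be done in base R, as a sketch under the assumption that each ID should appear once and that a known age is preferred over NA: stack the two data frames, sort so that rows with a known age come first within each ID, and keep the first row per ID.

```r
Elder <- data.frame(ID = c("ID1", "ID2", "ID3"),
                    age = c(38, 35, 31),
                    stringsAsFactors = FALSE)
Younger <- data.frame(ID = c("ID4", "ID5", "ID3"),
                      age = c(29, 21, NA),
                      stringsAsFactors = FALSE)

combined <- rbind(Elder, Younger)

# Within each ID, rows with a known age sort before rows with NA
combined <- combined[order(combined$ID, is.na(combined$age)), ]

# Keep the first row for each ID
result <- combined[!duplicated(combined$ID), ]
rownames(result) <- NULL
print(result)
```

If an ID could carry two different non-missing ages, this keeps only one of them, so that case would need a separate rule.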



On Thursday, 16 January 2014 9:16 PM, Marc Schwartz-3 [via R] 
ml-node+s789695n4683682...@n4.nabble.com wrote:
 
Not quite: 

 rbind(Elder, Younger) 
   ID age 
1 ID1  38 
2 ID2  35 
3 ID3  31 
4 ID4  29 
5 ID5  21 
6 ID3  31 

Note that ID3 is duplicated. 


Should be: 

 merge(Elder, Younger, by = c("ID", "age"), all = TRUE) 
   ID age 
1 ID1  38 
2 ID2  35 
3 ID3  31 
4 ID4  29 
5 ID5  21 


He wants to do a join on both ID and age to avoid duplications of rows when 
the same ID and age occur in both data frames. If the same column names (eg 
Var) appears in both data frames and are not part of the 'by' argument, you 
end up with Var.x and Var.y in the result. 

In the case of two occurrences of the same ID but two different ages, if that 
is possible, both rows would be added to the result using the above code. 

Regards, 

Marc Schwartz 




[R] Setting heatmap.2 Color Key Range Outside of Data Limits

2014-01-16 Thread Dario Strbenac
Hello,

There are many questions about making the limit of the colour key smaller than 
the data range, but I have the opposite problem.

Assume one heatmap has data in the range 6 to 12 and another has data in the 
range 6 to 9. By providing the same breaks argument to both plots, the heatmaps 
are coloured as it should be, but for the second heatmap, the range of the 
colour key is just from 6 to 9. I'd like to force the second colour key to go 
up to 12 also. How can this be achieved ? My use case is that I have identified 
a number of clusters in a gene expression dataset, and I would like to avoid 
plotting them in one large heatmap, but as multiple smaller heatmaps.

Also, unless key = FALSE, having a heatmap with values in only one colour bin 
causes
Error in axis(1, at = xv, labels = lv) : no locations are finite.

Perhaps this could also be handled more gracefully. I am using R 3.0.2.

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia