Re: [R] Behavior of self-defined function within ddply
Hi,

Maybe this helps:

library(plyr)
small <- read.table(text="monthend_n ticker wgtdiff ret interval b1 b2 b3 b4 b5 b6
1 19990228 AA 0.7172 -2.58 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
2 19990228 AAPL -0.0828 -15.48 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
3 19990228 ABCW 0.0966 -7.36 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
705 19990331 AA 0.1932 1.7 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
706 19990331 AAPL 0.033 3.23 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
707 19990331 ABF 0.154 -20.51 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
708 19990331 ABI 0.286 8.33 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816", sep="", header=TRUE, stringsAsFactors=FALSE)

res <- mutate(small, bins=unlist(dlply(small, .(monthend_n), cutfunc)))
res$bins
#[1] 4 2 3 4 3 3 4

ddply(small, .(monthend_n), summarize, bins=cut(wgtdiff, breaks=unique(c(b1,b2,b3,b4,b5,b6)), labels=F))[,2]
#[1] 4 2 3 4 3 3 4

unlist(lapply(split(small, small$monthend_n), cutfunc), use.names=FALSE)
#[1] 4 2 3 4 3 3 4

A.K.

On Thursday, January 16, 2014 2:01 AM, Amitabh Dugar cleverc...@yahoo.com wrote:

I have a data frame small which has 5,000 rows and contains data for several tickers every month, as below:

monthend_n ticker wgtdiff ret interval b1 b2 b3 b4 b5 b6
1 19990228 AA 0.7172 -2.58 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
2 19990228 AAPL -0.0828 -15.48 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
3 19990228 ABCW 0.0966 -7.36 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
…
705 19990331 AA 0.1932 1.7 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
706 19990331 AAPL 0.033 3.23 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
707 19990331 ABF 0.154 -20.51 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
708 19990331 ABI 0.286 8.33 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
etc.
Variables b1 through b6 are break points that I want to use in the cut function, and they vary each month according to the distribution of the variable wgtdiff during that month. To handle this I wrote a function as below:

cutfunc <- function(df) {
  vec <- df$wgtdiff
  # need to apply unique function as break points within each month are same for all tickers (b1-b6 values same in each within month)
  breaks <- c(unique(df$b1), unique(df$b2), unique(df$b3), unique(df$b4), unique(df$b5), unique(df$b6))
  bin <- cut(vec, breaks, labels=F)
  bin
}

Then I tried:

temp4 <- ddply(small, .(monthend_n), summarize, bins=cutfunc(small))

I was expecting to get back a data frame with 5,000 rows with bin assignments for each ticker, and if there are 6 break points the bin numbers should range from 1 to 5. However, instead I get a data frame with 40,000 rows and bin numbers ranging from 1 to 40, as below:

monthend_n bins
1 19990228 40
2 19990228 17
3 19990228 22
...
5000 19990228 17
5001 19990331 40
5002 19990331 17
5003 19990331 22
etc.

It seems ddply doesn't pass monthly pieces of the data frame small into my cutfunc in the way I expect. Any guidance is appreciated.

Thanks

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
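A likely explanation of the 40,000 rows: inside summarize(), the expression cutfunc(small) refers to the whole data frame small from the calling environment, not to the current monthly piece, so each month gets bins computed on all rows with the break points of every month pooled together. The sketch below is a base-R equivalent of the split()/lapply() approach above. It assumes, as the question states, that b1 to b6 are constant within a month, and it uses a slightly simplified cutfunc (one unique() over all six break columns, as in the summarize one-liner) plus unsplit() to put the bins back in the original row order.

```r
# Sample rows from the question; the unnamed first column becomes row names
small <- read.table(text = "monthend_n ticker wgtdiff ret interval b1 b2 b3 b4 b5 b6
1 19990228 AA 0.7172 -2.58 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
2 19990228 AAPL -0.0828 -15.48 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
3 19990228 ABCW 0.0966 -7.36 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
705 19990331 AA 0.1932 1.7 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
706 19990331 AAPL 0.033 3.23 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
707 19990331 ABF 0.154 -20.51 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
708 19990331 ABI 0.286 8.33 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816",
  header = TRUE, stringsAsFactors = FALSE)

# Bin one month's wgtdiff values using that month's own break points
cutfunc <- function(df) {
  breaks <- unique(c(df$b1, df$b2, df$b3, df$b4, df$b5, df$b6))
  cut(df$wgtdiff, breaks, labels = FALSE)
}

# split() hands each month's rows to cutfunc separately;
# unsplit() reassembles the per-month results in the original row order
pieces <- split(small, small$monthend_n)
small$bins <- unsplit(lapply(pieces, cutfunc), small$monthend_n)
small[, c("monthend_n", "ticker", "wgtdiff", "bins")]
```

With plyr, the analogous fix would be to let ddply pass each piece to a function, e.g. ddply(small, .(monthend_n), function(piece) data.frame(ticker = piece$ticker, bins = cutfunc(piece))), though I have not run that variant here.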
[R] Subgroups definition with rpart
Hello everyone,

I just completed a recursive partitioning analysis using rpart, and have a beautiful tree with 6 terminal nodes. Each terminal node contains a specific number of patients (it's a clinical study). I'd like to create a new variable indicating which terminal node each patient falls in. More clearly, maybe: patient 1: node 5; patient 2: node 3; patient 3: node 6, etc.

Thank you in advance for your answers.

Jeremy
Re: [R] Subgroups definition with rpart
Search for 'where' in ?rpart.object (linked from ?rpart).

On 16/01/2014 08:50, Jérémy Lambert wrote:
[...]

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
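A small sketch of that pointer, using the kyphosis data shipped with rpart as a stand-in for the clinical data: fit$where holds, for each observation used in the fit, the row of fit$frame for the leaf it lands in, and the row names of fit$frame are the node numbers shown by print(fit). It assumes no rows are dropped during fitting (otherwise where is shorter than the data frame and the rows must be matched up first).

```r
library(rpart)

# Fit a classification tree on the kyphosis data bundled with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# fit$where: per observation, a row index into fit$frame (the leaf reached).
# rownames(fit$frame) are the node numbers used by print(fit).
kyphosis$node <- as.integer(rownames(fit$frame))[fit$where]

# Number of observations in each terminal node
table(kyphosis$node)
head(kyphosis)
```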
[R] predefined area under the curve
Dear UseRs of R,

My sincere apologies in advance if my question isn't relevant to operations in R. I have the following two-column data, with 12 rows:

dput(el)
structure(c(-1.42607687227285, -1.0200762327862, -0.736315917376129, -0.502402223373355, -0.293381232121193, -0.0965586152896391, 0.0965586152896391, 0.293381232121194, 0.502402223373355, 0.73631591737613, 1.0200762327862, 1.42607687227285, 1.99095972340185, 1.84006682649012, 1.71563586990498, 1.60312301737773, 0.748443534297919, 0.696909774793038, 0.64586377528834, 0.594330015783459, 0.270606020696256, 0.24247780756, 0.211370068418158, 0.173646844190226), .Dim = c(12L, 2L), .Dimnames = list(NULL, c("", "GG")))

When I plot column 2 against column 1, I get a curve with an area [auc(column1, column2)] under it equal to 2.602997. As I am calibrating it for further simulations, I know that the area under the curve should actually be 2.845. I also know that the first 6 rows have been located accurately, so rows 7 to 12 need to be relocated in such a way that the area under the curve becomes equal to, or as close as possible to, 2.845. How can I do that? I have been doing it manually, but at the cost of time and accuracy.

Thank you very much in advance.

Elisa
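One way to automate this, sketched under two assumptions: a plain trapezoidal rule reproduces the reported area of about 2.603 (so I assume that is what auc() computes here), and "relocating" rows 7 to 12 is taken to mean multiplying their GG values by one common factor. Because the trapezoidal area is linear in that factor, uniroot() can find the factor that hits 2.845. The helpers trapz() and area_with_scale() are hypothetical names; other ways of moving the points (shifting x, or adjusting each point separately) would need a different objective, e.g. via optim().

```r
# Data from the question (dput output)
el <- structure(c(-1.42607687227285, -1.0200762327862, -0.736315917376129,
  -0.502402223373355, -0.293381232121193, -0.0965586152896391,
  0.0965586152896391, 0.293381232121194, 0.502402223373355, 0.73631591737613,
  1.0200762327862, 1.42607687227285, 1.99095972340185, 1.84006682649012,
  1.71563586990498, 1.60312301737773, 0.748443534297919, 0.696909774793038,
  0.64586377528834, 0.594330015783459, 0.270606020696256, 0.24247780756,
  0.211370068418158, 0.173646844190226),
  .Dim = c(12L, 2L), .Dimnames = list(NULL, c("", "GG")))

x <- el[, 1]
y <- el[, 2]

# Trapezoidal area under the piecewise-linear curve through (x, y); x must be sorted
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

target  <- 2.845
movable <- 7:12  # rows assumed adjustable; rows 1-6 stay fixed

# Area after multiplying the movable y-values by a common factor k
area_with_scale <- function(k) {
  y_new <- y
  y_new[movable] <- k * y[movable]
  trapz(x, y_new)
}

# The area is linear in k, so a one-dimensional root search is enough
k <- uniroot(function(k) area_with_scale(k) - target, interval = c(0, 10))$root

y_adj <- y
y_adj[movable] <- k * y[movable]
cbind(x = x, GG = y, GG_adjusted = y_adj)
```

By my arithmetic the factor comes out at roughly 1.5, which lifts row 7 above row 6; if the curve has to stay decreasing, a constrained fit would be needed instead.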
[R] ifelse...
Hi,

Sorry for the newbie question! My code:

x <- 'a'
ifelse(x == 'a', y <- 1, y <- 2)
print(y)

Shouldn't this assign a value of 1? When I execute this I get:

x <- 'a'
ifelse(x == 'a', y <- 1, y <- 2)
[1] 1
print(y)
[1] 2

Am I doing something really daft??? Thanks!

sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 ggplot2_0.9.3.1 sda_1.3.2 fdrtool_1.2.11 corpcor_1.6.6
[8] entropy_1.2.0 scatterplot3d_0.3-34 pdist_1.2 hash_2.2.6 DAAG_1.18 multicore_0.1-7 multtest_2.18.0
[15] XML_3.95-0.2 hgu133a.db_2.10.1 affy_1.40.0 genefilter_1.44.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0
[22] GO.db_2.10.1 venneuler_1.1-0 rJava_0.9-6 colorRamps_2.3 RColorBrewer_1.0-5 sparcl_1.0.3 gap_1.1-10
[29] plotrix_3.5-2 som_0.3-5 pvclust_1.2-2 lsr_0.3.1 compute.es_0.2-2 sm_2.2-5.3 imputation_2.0.1
[36] locfit_1.5-9.1 TimeProjection_0.2.0 Matrix_1.1-1.1 timeDate_3010.98 lubridate_1.3.3 gbm_2.1 lattice_0.20-24
[43] survival_2.37-4 RobustRankAggreg_1.1 impute_1.36.0 reshape_0.8.4 plyr_1.8 zoo_1.7-10 data.table_1.8.10
[50] foreach_1.4.1 foreign_0.8-57 languageR_1.4.1 preprocessCore_1.24.0 gtools_3.1.1 BiocInstaller_1.12.0 org.Hs.eg.db_2.10.1
[57] RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 biomaRt_2.18.0

loaded via a namespace (and not attached):
[1] affyio_1.30.0 annotate_1.40.0 AnnotationForge_1.4.4 codetools_0.2-8 colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4
[8] grid_3.0.2 GSEABase_1.24.0 gtable_0.1.2 iterators_1.0.6 labeling_0.2 latticeExtra_0.6-26 MASS_7.3-29
[15] munsell_0.4.2 proto_0.3-10 RBGL_1.38.0 RCurl_1.95-4.1 reshape2_1.2.2 scales_0.2.3 stats4_3.0.2
[22] stringr_0.6.2 tools_3.0.2 xtable_1.7-1 zlibbioc_1.8.0
Re: [R] ifelse...
You want

y <- ifelse(x == 'a', 1, 2)

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Tim Smith
Sent: Thursday 16 January 2014 14:32
To: r
Subject: [R] ifelse...

[...]

DISCLAIMER
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
[R] Doubt in simple merge
Dear R community,

I have two data sets called Elder and Younger. This is my code for a simple merge:

Elder <- data.frame(
  ID=c("ID1","ID2","ID3"),
  age=c(38,35,31))
Younger <- data.frame(
  ID=c("ID4","ID5","ID3"),
  age=c(29,21,31))
mer <- merge(Elder, Younger, by="ID", all=T)

Output I am expecting:

ID  age
ID1 38
ID2 35
ID3 31
ID4 29
ID5 21

It looks very simple, but I need help. When I run the code it gives me age.x and age.y.

Thank you

-- View this message in context: http://r.789695.n4.nabble.com/Doubt-in-simple-merge-tp4683671.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Doubt in simple merge
You are telling it to merge by ID only. But it sounds like you would like it to merge by both ID and age.

merge(Elder, Younger, all=TRUE)

Jean

On Thu, Jan 16, 2014 at 6:25 AM, kingsly ecoking...@yahoo.co.in wrote:
[...]
Re: [R] ifelse...
On 16/01/2014 8:46 AM, ONKELINX, Thierry wrote:
You want y <- ifelse(x == 'a', 1, 2)

or use if, rather than ifelse, i.e.

if (x == 'a') {
  y <- 1
} else {
  y <- 2
}

ifelse() is mainly used when you want to work with whole vectors of decisions, e.g.

x <- 1:10
ifelse(x > 5, 1, 0)

Duncan Murdoch
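A small base-R sketch contrasting the two idioms from this thread: if/else for a single decision, and ifelse() for element-wise decisions over a vector. In both cases the returned value is assigned once, rather than placing assignments such as y <- 1 inside the function's arguments.

```r
x <- "a"

# Scalar decision: `if` returns the value of the branch taken, so assign it once
y <- if (x == "a") 1 else 2
print(y)

# Vector decision: ifelse() tests every element and returns a vector of results
v <- 1:10
flags <- ifelse(v > 5, 1, 0)
print(flags)
```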
Re: [R] Doubt in simple merge
No, I think the OP wants

mer <- merge(Elder, Younger)

Br. Frede

-------- Original message --------
From: Adams, Jean
Date: 16/01/2014 15.45 (GMT+01:00)
To: kingsly
Cc: R help
Subject: Re: [R] Doubt in simple merge

[...]
Re: [R] Doubt in simple merge
Oops, sorry, that should have been

mer <- rbind(Elder, Younger)

/frede

-------- Original message --------
From: Frede Aakmann Tøgersen
Date: 16/01/2014 15.54 (GMT+01:00)
To: Adams, Jean, kingsly
Cc: R help
Subject: Re: [R] Doubt in simple merge

[...]
[R] DTM Package removeSparseTerms function question
In inspect(removeSparseTerms(dtm, 0.4)), does anyone know how the sparse argument ("A numeric for the maximal allowed sparsity") works? I.e., what is the difference between, say, 0.2, 0.4 and 0.6?

Thanks for your help

-- View this message in context: http://r.789695.n4.nabble.com/DTM-Package-removeSparseTerms-function-question-tp4683678.html Sent from the R help mailing list archive at Nabble.com.
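A base-R sketch of the rule as I understand it from reading the tm source (worth checking against your installed version): a term is kept only if it occurs in more than N * (1 - sparse) of the N documents, i.e. its sparsity (the share of documents in which it is absent) is below sparse. Smaller values of sparse are therefore stricter: 0.2 keeps only terms present in more than 80% of documents, while 0.6 keeps terms present in more than 40%. The count matrix and the helper keep_terms() are hypothetical illustrations, not part of tm.

```r
# Hypothetical counts: 10 documents (rows) x 4 terms (columns)
dtm_counts <- cbind(
  data  = c(2, 1, 3, 1, 1, 2, 1, 1, 4, 1),  # present in 10 of 10 documents
  model = c(1, 0, 2, 1, 0, 1, 1, 0, 1, 1),  # present in 7 of 10
  plot  = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),  # present in 5 of 10
  rare  = c(0, 0, 0, 0, 3, 0, 0, 0, 0, 1)   # present in 2 of 10
)

# Keep terms whose document frequency exceeds N * (1 - sparse)
keep_terms <- function(m, sparse) {
  stopifnot(is.numeric(sparse), sparse > 0, sparse < 1)
  doc_freq <- colSums(m > 0)  # number of documents containing each term
  colnames(m)[doc_freq > nrow(m) * (1 - sparse)]
}

for (s in c(0.2, 0.4, 0.6, 0.9)) {
  cat("sparse =", s, ":", keep_terms(dtm_counts, s), "\n")
}
```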
Re: [R] Doubt in simple merge
Not quite:

rbind(Elder, Younger)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID4  29
5 ID5  21
6 ID3  31

Note that ID3 is duplicated. Should be:

merge(Elder, Younger, by = c("ID", "age"), all = TRUE)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID4  29
5 ID5  21

He wants to do a join on both ID and age to avoid duplicating rows when the same ID and age occur in both data frames. If the same column name (e.g. Var) appears in both data frames and is not part of the 'by' argument, you end up with Var.x and Var.y in the result.

In the case of two occurrences of the same ID but two different ages, if that is possible, both rows would be added to the result using the above code.

Regards,

Marc Schwartz

On Jan 16, 2014, at 9:04 AM, Frede Aakmann Tøgersen fr...@vestas.com wrote:
[...]
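A quick base-R check of the suggestions in this thread, assuming stringsAsFactors = FALSE (the default only from R 4.0.0 onwards): merging on ID alone keeps two age columns, merging on both key columns gives the five expected rows, and unique() on the stacked rows is another way to drop the duplicated ID3 row.

```r
Elder   <- data.frame(ID = c("ID1", "ID2", "ID3"), age = c(38, 35, 31),
                      stringsAsFactors = FALSE)
Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, 31),
                      stringsAsFactors = FALSE)

# Joining on ID only: the non-key column 'age' is suffixed as age.x / age.y
by_id <- merge(Elder, Younger, by = "ID", all = TRUE)
names(by_id)

# Full outer join on both columns: one row per distinct (ID, age) pair
by_both <- merge(Elder, Younger, by = c("ID", "age"), all = TRUE)
by_both

# Alternative: stack the rows, then drop exact duplicate rows
stacked <- unique(rbind(Elder, Younger))
stacked
```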
[R] Estimating parameters of 3 parameters lognormal distribution
Hi guys,

is there a function in some R package to fit the parameters of a three-parameter lognormal distribution?

Many thanks

Vito Ricci
[R] Revolutions blog: December roundup
Happy New Year (if a little late!). Revolution Analytics staff write about R every weekday at the Revolutions blog: http://blog.revolutionanalytics.com and every month I post a summary of articles from the previous month of particular interest to readers of r-help.

In case you missed them, here are some articles related to R from the month of December:

A ComputerWorld tutorial on basic data processing with R: http://bit.ly/1cvhuqI

Prediction: R will replace legacy SAS solutions and go mainstream: http://bit.ly/1cvhtmS

A chart of the growth of R user groups and local R meetings: http://bit.ly/1cvhuqH

I discussed R, data science and big data in an interview with technology journalist Robert Scoble: http://bit.ly/1cvhuqG

Looking at the evidence supporting the growth of R and Python: http://bit.ly/1cvhtmQ

A replay of Mario Inchiosa’s webinar on scalable cross-platform R-based predictive analytics: http://bit.ly/1cvhuqF

A look at the distribution of the number of R package dependencies: http://bit.ly/1cvhuqJ

Revolution R Enterprise 7 is now available, with free download for academic users: http://bit.ly/1cvhtD7

Estimating the empirical distribution of Twitter follower counts with R: http://bit.ly/1cvhtD8

How R is used by insurance companies for catastrophe modeling: http://bit.ly/1cvhuqM

Sheri Gilley creates an interactive chart of R package dependencies with DeployR, rCharts, and AngularJS: http://bit.ly/1cvhuqO

Joseph Rickert offers 15 tips for computing with Big Data in R: http://bit.ly/1cvhuqN

Daniel Hanson provides a step-by-step guide to download financial time data from Quandl into R, and then chart and analyze the time series using the xts package: http://bit.ly/1cvhuqR

Luba Gloukhov used cluster analysis in R to allocate single-malt scotch whiskies to four distinct flavour profiles: http://bit.ly/1cvhuqS

Some non-R stories in the past month included: Big Data Analytics predictions for 2014 (http://bit.ly/1cvhuqT), forced perspective illusions
(http://bit.ly/1cvhtDb), analytics with Apache Spark (http://bit.ly/1cvhuqW), wind pattern visualization (http://bit.ly/1cvhuqX), privacy by design (http://bit.ly/1cvhtDc), Big Data Analytics platforms (http://bit.ly/1cvhuHb), the Leidenfrost effect (http://bit.ly/1cvhuHa), big data and video gaming (http://bit.ly/1cvhuHi) and an ASCII fluid simulator (http://bit.ly/1cvhtDf).

Meeting times for local R user groups (http://bit.ly/eC5YQe) can be found on the updated R Community Calendar at: http://bit.ly/bb3naW

If you're looking for more articles about R, you can find summaries from previous months at http://blog.revolutionanalytics.com/roundups/. You can receive daily blog posts via email using services like blogtrottr.com, or join the Revolution Analytics mailing list at http://revolutionanalytics.com/newsletter to be alerted to new articles on a monthly basis.

As always, thanks for the comments and please keep sending suggestions to me at da...@revolutionanalytics.com. Don't forget you can also follow the blog using an RSS reader, or by following me on Twitter (I'm @revodavid).

Cheers,
# David

--
David M Smith da...@revolutionanalytics.com
VP of Marketing, Revolution Analytics http://blog.revolutionanalytics.com
Tel: +1 (650) 646-9523 (Seattle WA, USA)
Twitter: @revodavid
Re: [R] Inserting color into an irregular grid comprised of polygons
As a follow-up to this thread, started nearly a month ago, I need help sorting out the R code that will create a log-scale legend in the call to image.plot below. As the last line of the code provided below shows, I attempted to force the labeling through the argument legend.lab, to no avail. The -4 on the legend label should actually be 0.0001, -3 = 0.001, etc., according to the inverse of the log base 10. If possible, it would be nice to create a legend similar to the one created here (http://r.789695.n4.nabble.com/sppolot-fill-below-minimum-legend-value-td902841.html), except that that one uses spplot, and I need to stick with base graphics since I'm using irregular polygons to draw the figure. However, if someone is able to show how to replace the -4 with 0.0001, etc., that would be a great place to start. Here's the reproducible R code (note that log10 is taken of the matrix vals, and that is what guides the color fill in the plot):

library(gsubfn)      # uses paste0 func
library(colorRamps)  # uses blue2green2red()
library(fields)      # uses image.plot(..., legend.only=TRUE, ...)
z.space - c(0.6790521,0.3454826,0.1872356,0.0891079,0.1525315,0.1088516,0.0950484,0.1128700,0.1247511,0.1188105,0.1143682,0.1232529,0.0930168,0.0751814,0.0511553,0.0244765,0.0424162,0.0435835,0.0577441,0.0471291,0.0974984,0.0303579,0.0234230,0.0378371,0.0396388,0.0278040,0.0427108,0.0450803,0.0735903,0.1499654,0.0235646,0.0309285,0.0770295,0.0687763,0.1007385,0.0666026,0.1083643,0.1092819,0.1372624,0.2248670,0.2620903,0.4606435,0.6262846,1.7111480,1.7111480,1.7111480,1.7111480,1.7111480,1.7111480,1.7111480,0.8662780,1.1220410,0.5368302) x.space - c(0.4477580,0.4058683,0.5047908,0.3488354,0.3170296,0.2280360,0.2371574,0.1658813,0.2098874,0.2441864,0.3050745,0.4087275,0.4448988,0.4416195,0.4020654,0.0862620,0.0332546,0.0871109,0.3531576,0.3037825,0.2396926,0.2351304,0.2144404,0.0733572,0.0338528,0.2016122,0.1533454,0.1265044,0.0932833,0.0481462,0.0662010,0.0150457,0.0481462,0.0270822,0.0318521,0.0995603,0.0583223,0.0371142,0.0854215,0.0577332,0.0883671,0.0786467,0.0787786,0.1135672,0.0897309,0.1659446,0.8536263,0.8536263,0.8536263,0.8536263,0.5794602,0.2741660) #Obs - read.csv(Obs_Loc_for_R.txt,header=T) x.range - c(0,sum(x.space)) z.range - c(0,sum(z.space)) z.sum - sum(z.space) ### # Read HK ### vals - matrix(c(0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00, 
0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.1498912E-03,0.5414670E-02, 0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.000E+00,0.9835267E-03,0.6889354E-01,0.1004814E-01,
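A minimal sketch of the labelling step, under assumptions: if the colours are driven by log10 values, the legend ticks can stay at the log positions (-4, -3, ...) while their labels show the back-transformed numbers. In fields, legend.lab sets the legend's title rather than its tick labels; my understanding is that image.plot() also accepts an axis.args list (e.g. axis.args = list(at = ticks, labels = tick_labels)) that is passed on to axis(), but please check ?image.plot for your version. The block below uses only base graphics, with made-up values standing in for log10(vals) and colorRampPalette() standing in for blue2green2red().

```r
# Made-up values standing in for log10(vals)
log_vals <- log10(c(1.5e-4, 5.4e-3, 9.8e-4, 6.9e-2, 1.0e-2))

# Whole decades spanning the data, used as tick positions on the log scale
ticks <- seq(floor(min(log_vals)), ceiling(max(log_vals)))
# Back-transform each tick to a plain decimal label (-4 -> "0.0001", ...)
tick_labels <- sapply(10^ticks, format, scientific = FALSE)

# Base-graphics colour bar: one column of cells, coloured along the log scale
cols  <- colorRampPalette(c("blue", "green", "red"))(64)
zgrid <- seq(min(ticks), max(ticks), length.out = 64)
image(x = c(0, 1), y = zgrid, z = matrix(zgrid, nrow = 1), col = cols,
      axes = FALSE, xlab = "", ylab = "")
# Ticks at the log positions, labelled with the original-scale values
axis(4, at = ticks, labels = tick_labels, las = 1)
```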
Re: [R] Inserting color into an irregular grid comprised of polygons
On 01/17/2014 05:12 AM, Morway, Eric wrote:
[...]
Re: [R] names of columns
Try:
dat1 <- read.table(text="a b c d
1 0.5 0.1 0.2 0.2
5 0.3 0.5 0.1 0.1", sep="", header=TRUE)
data.frame(Names=apply(dat1, 1, function(x) names(x)[x %in% max(x)]))
#  Names
#1     a
#5     b
#or
colnames(dat1)[apply(dat1, 1, which.max)]
#[1] "a" "b"
A.K.

Hi, I need a small help... If I have a data frame like
    a   b   c   d
1 0.5 0.1 0.2 0.2
5 0.3 0.5 0.1 0.1
I need the name of the column with the biggest number in each row. My results will be
1 a
5 b
If I do apply(data.frame, 1, max) I get the maximum by row, but I want the name, not the value... Thanks
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] names of columns
I guess you could also do:
names(dat1)[max.col(dat1)]
#[1] "a" "b"
A.K.
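A minimal sketch comparing the which.max() and max.col() approaches on the same kind of data. One caveat worth knowing: max.col() breaks ties at random by default (ties.method = "random"), whereas which.max() always returns the first maximum, so ties.method = "first" is passed here to make the two agree. The tied row 9 is made-up data added only to show the tie handling.

```r
dat1 <- read.table(text = "a b c d
1 0.5 0.1 0.2 0.2
5 0.3 0.5 0.1 0.1
9 0.1 0.4 0.4 0.0", header = TRUE)

# which.max() returns the position of the first maximum in each row
by_apply <- colnames(dat1)[apply(dat1, 1, which.max)]

# max.col() is vectorized; ties.method = "first" mimics which.max()
by_maxcol <- names(dat1)[max.col(dat1, ties.method = "first")]

res <- data.frame(Names = by_maxcol, row.names = rownames(dat1))
print(res)
```

Without ties.method = "first", row 9 could come back as either "b" or "c" from one run to the next.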
[R] Regression Modeling Strategies 4-Day Short Course March 2013
My yearly Regression Modeling Strategies course is expanded to 4 days this year to be able to relax the pace a bit. Details are below. Questions welcomed.
- *RMS Short Course 2014* Frank E. Harrell, Jr., Ph.D., Professor and Chair, Department of Biostatistics, Vanderbilt University School of Medicine
*March 4, 5, 6 & 7, 2014* 9:00am - 4:00pm (9:00am - 2:00pm March 7)
Alumni Hall, Vanderbilt University, Nashville, Tennessee, USA
See http://biostat.mc.vanderbilt.edu/2014RMSShortCourse for details. The course includes statistical methodology, case studies, and use of the R rms package. Please email interest to Audrey Carvajal {audrey.carva...@vanderbilt.edu}
-- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] extracting rows from data frame that approximately equal another data frame
Hi, Maybe this helps:
x <- data.frame(V1=-1.162877, V2=0.1848928)
set.seed(245)
df <- as.data.frame(matrix(rnorm(5051*2), ncol=2))
cut1 <- cut(df[,1], breaks=c(x[,1]-0.1, x[,1]+0.1))
cut2 <- cut(df[,2], breaks=c(x[,2]-0.1, x[,2]+0.1))
df1 <- df[!is.na(cut1) & !is.na(cut2),]
A.K.

I have a data frame and would like to extract rows that approximately equal the values in another data frame. Say I have a data frame called x:
dim(x)
[1] 1 2
x
         V1        V2
x -1.162877 0.1848928
I would like to search through a larger data frame called df and extract all rows that approximately equal the two values in the data frame x, within say +/- 0.1. The larger data frame has these dimensions:
dim(df)
[1] 5051 2
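An alternative sketch that states the tolerance directly with abs() instead of cut(). One difference to be aware of: cut() builds the half-open interval (x - 0.1, x + 0.1], so a value exactly 0.1 below the target is excluded, while the abs() test includes both ends. The simulated df is the same as above; tol, keep and close_all are names introduced here.

```r
x <- data.frame(V1 = -1.162877, V2 = 0.1848928)
set.seed(245)
df <- as.data.frame(matrix(rnorm(5051 * 2), ncol = 2))

tol <- 0.1
# TRUE for rows whose V1 and V2 are both within +/- tol of the targets in x
keep <- abs(df$V1 - x$V1) <= tol & abs(df$V2 - x$V2) <= tol
df1 <- df[keep, ]

# The same test for any number of columns, pairing columns of df and x by position
close_all <- Reduce(`&`, Map(function(col, target) abs(col - target) <= tol, df, x))
df2 <- df[close_all, ]
```

The Map()/Reduce() form avoids writing one comparison per column when x has many columns.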
[R] barplot: segment-wise shading
Dear listers, I would like to make a stacked barplot and be able to define shading (density or angle) segment-wise, i.e. NOT like here:
# Bar shading example
barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black", legend = rownames(VADeaths))
The example has 5 different angles of shading; I would like to have as many possible angle values as there are segments (i.e. 20 in the VADeaths example). I was not successful using web search. Any advice? Thank you for your patience. With best regards, Martin Weiser
Re: [R] counts and percentage of multiple categorical columns in R
Also, you can do the same with the previous solution:
result1 <- result[,-6]
vec1 <- unique(unlist(dat1))
result2 <- as.data.frame(t(sapply(dat1, function(x) {
  counts <- table(factor(x, levels=vec1))
  percentage <- sprintf("%.1f", (counts/sum(counts))*100)
  c(paste0(counts, paste0("(", percentage, ")")), sum(!is.na(x)))
})), stringsAsFactors=FALSE)
result2[,6] <- as.numeric(result2[,6])
colnames(result2) <- colnames(result1)
identical(result1, result2)
#[1] TRUE
A.K.
Re: [R] counts and percentage of multiple categorical columns in R
Hi Jingxia, Maybe this helps:
dat1 <- read.table(text="fatfreemilk fatmilk halfmilk 2fatmilk
A A A A
A B B A
B A A A
C C C C
D . A A
A E A E
C A B A
A . A A
A B . A
A A B E", sep="", header=TRUE, stringsAsFactors=FALSE, check.names=FALSE, na.strings=".")
dat2 <- dat1
dat2$id <- 1:nrow(dat2)
library(reshape2)
res <- acast(melt(dat2, id.var="id")[,-1], variable~value, length)
res[,-6] <- paste0(res[,-6], paste0("(", sprintf("%.1f", (res[,-6]/rowSums(res[,-6]))*100), ")"))
result <- as.data.frame(res, stringsAsFactors=FALSE)
#Either
result$nonNAcount <- dim(dat1)[1] - as.numeric(result$`NA`)
#or
result$nonNAcount <- sapply(dat1, function(x) sum(!is.na(x)))
result[,-6]
#                  A       B       C       D       E nonNAcount
#fatfreemilk 6(60.0) 1(10.0) 2(20.0) 1(10.0)  0(0.0)         10
#fatmilk     4(50.0) 2(25.0) 1(12.5)  0(0.0) 1(12.5)          8
#halfmilk    5(55.6) 3(33.3) 1(11.1)  0(0.0)  0(0.0)          9
#2fatmilk    7(70.0)  0(0.0) 1(10.0)  0(0.0) 2(20.0)         10
A.K.
On Thursday, January 16, 2014 9:49 AM, Jingxia Lin jingxi...@gmail.com wrote: Dear Arun, Sorry to bother you again, but may I ask you one more question regarding the data set? I am using the following method you offered for the data set. In our original data, there are some blank cells (i.e. missing data) in some columns. So in the output data frame, can we add an additional column to show the number of responses (i.e. the number of non-blank cells)? I tried a couple of ways but failed (sorry, I'm really not good at R...). I would be very grateful if you can help us with this problem at your convenience. Thank you!
Best, Jingxia
dat2 <- dat1
dat2$id <- 1:nrow(dat2)
library(reshape2)
res <- dcast(melt(dat2, id.var="id")[,-1], variable~value, length)
row.names(res) <- res[,1]
res1 <- res[,-1]
res2 <- as.matrix(res1)
res2[] <- paste0(res2, paste0("(", (res2/rowSums(res2))*100, ")"))
as.data.frame(res2)
results
#                A     B     C     D     E
#fatfreemilk 6(60) 1(10) 2(20) 1(10)  0(0)
#fatmilk     6(60) 2(20) 1(10)  0(0) 1(10)
#halfmilk    5(50) 4(40) 1(10)  0(0)  0(0)
#2fatmilk    7(70)  0(0) 1(10)  0(0) 2(20)
On Mon, Dec 30, 2013 at 3:50 PM, arun smartpink...@yahoo.com wrote: Dear Jingxia, No problem. Happy New Year to you too! Arun
On Monday, December 30, 2013 2:49 AM, Jingxia Lin jingxi...@gmail.com wrote: Dear Arun, Thank YOU for your kind help :) Happy new year! Best, Jingxia
On Mon, Dec 30, 2013 at 3:43 PM, arun smartpink...@yahoo.com wrote: Dear Jingxia, Glad that you were able to figure it out. I was away from my computer. My name is 'Arun Kirshna Sasikala-Appukuttan'. I am a postdoctoral research fellow at Wayne State University, Detroit, MI, USA. Thank you for the kind acknowledgment. Regards, Arun
On Sunday, December 29, 2013 9:25 PM, Jingxia Lin jingxi...@gmail.com wrote: Dear A.K. I also solved the character problem by using library(xlsx). So everything is fine now. Thank you again! Best, Jingxia
On Mon, Dec 30, 2013 at 10:17 AM, Jingxia Lin jingxi...@gmail.com wrote: Dear A.K., Thank you a lot! I tried your way and it works perfectly. The only thing I haven't figured out is that when I exported the final data frame to an Excel file, the Chinese characters were not shown correctly (my original data has Chinese in row/column names). Other than that, everything is great! Would you mind letting me know your name so that we can acknowledge your help in our paper? Thank you again!
Best Jingxia
On Mon, Dec 30, 2013 at 3:48 AM, arun smartpink...@yahoo.com wrote: Hi, Try:
dat1 <- read.table(text="fatfreemilk fatmilk halfmilk 2fatmilk
A A A A
A B B A
B A A A
C C C C
D A A A
A E A E
C A B A
A A A A
A B B A
A A B E", sep="", header=TRUE, stringsAsFactors=FALSE, check.names=FALSE)
dat2 <- dat1
dat2$id <- 1:nrow(dat2)
library(reshape2)
res <- dcast(melt(dat2, id.var="id")[,-1], variable~value, length)
row.names(res) <- res[,1]
res1 <- res[,-1]
res2 <- as.matrix(res1)
res2[] <- paste0(res2, paste0("(", (res2/rowSums(res2))*100, ")"))
as.data.frame(res2)
#                A     B     C     D     E
#fatfreemilk 6(60) 1(10) 2(20) 1(10)  0(0)
#fatmilk     6(60) 2(20) 1(10)  0(0) 1(10)
#halfmilk    5(50) 4(40) 1(10)  0(0)  0(0)
#2fatmilk    7(70)  0(0) 1(10)  0(0) 2(20)
A.K.
On Sunday, December 29, 2013 1:07 PM, Jingxia Lin jingxi...@gmail.com wrote: Dear R helpers, I have a data sheet (“milk”) with four types of milk from five brands (A, B, C, D, E); the columns show the brands that each customer chose for each type of milk they bought. The data sheet goes like below. You can see that for some types of milk, no brand is chosen.
fatfreemilk fatmilk halfmilk 2fatmilk
A A A A
A B B A
B A A A
C C C C
D A A A
A E A E
C A B A
A A A A
A B B A
A A B E
I want to summarize each column so that for each type of milk, I know the counts and percentages of the brands chosen. I tried summary in R, but the result is not shown nicely. How can I display the result in a way like below: A B C D E
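For comparison, a base-R sketch of the same summary without reshape2, using the data with missing cells from the most recent question. Each column is tabulated against a common set of brand levels, NA cells are left out of both the counts and the percentages, and the number of non-missing responses is appended. summarise_col() is a helper name introduced here, not part of any package.

```r
dat1 <- read.table(text = "fatfreemilk fatmilk halfmilk 2fatmilk
A A A A
A B B A
B A A A
C C C C
D . A A
A E A E
C A B A
A . A A
A B . A
A A B E", header = TRUE, stringsAsFactors = FALSE,
                   check.names = FALSE, na.strings = ".")

# Common set of brand levels; sort() drops NA by default
brands <- sort(unique(unlist(dat1, use.names = FALSE)))

summarise_col <- function(x) {
  counts <- as.vector(table(factor(x, levels = brands)))  # NA cells are not counted
  pct <- sprintf("%.1f", 100 * counts / sum(counts))
  c(paste0(counts, "(", pct, ")"), sum(!is.na(x)))
}

out <- as.data.frame(t(sapply(dat1, summarise_col)), stringsAsFactors = FALSE)
names(out) <- c(brands, "nonNAcount")
out$nonNAcount <- as.integer(out$nonNAcount)
print(out)
```

Using one shared level set keeps a brand that never occurs in a column (for example D in fatmilk) as an explicit 0(0.0) cell instead of a missing column.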
[R] Object not Found Error on a .csv file
Hello: I am a new user, running the latest version of R on my Mac. I have started by reading a file with the read.csv command:
task2analyses <- read.csv(file="GroupsWithRTsEqualN.csv", head=TRUE, sep=",")
When I print it out in R, the file appears to be intact, with the proper headers. Yet, when I run test commands like the following:
cor(GfullUA, GFullUA)
I get back the message
Error in is.data.frame(y) : object 'GFullUA' not found
This is not the case for all of the variables in the file. Some can be tested with is.numeric (or is.character), and some are read properly when I test the summary command. I have looked on Google and have been unsuccessful in searching the documentation. Thanks for any help, Valerie Shalin
[R] Question on adding a p-value in bwplot
Hi, I am using bwplot to depict box plots for two groups at 6 time points. I need to add 6 p-values, one per time point, comparing the two groups at each time point. The p-values are (0.0020, 0.0204, 0.3361, 0.0185, 0.1981, and 0.6677). I could depict the two box plots per time point using the code below, but I am not sure how to add a p-value for each time point. Please let me know if you know how to do it. Thanks!
library(lattice)
library(Hmisc)
library(gridExtra)
font.settings <- list(font = 1, cex = 1, fontfamily = "serif")
my.theme <- list(
  box.umbrella = list(col = "black"),
  box.rectangle = list(fill = rep(c("black", "black"), 2)),
  box.dot = list(col = "black", pch = 3, cex = 2),
  plot.symbol = list(cex = 0.5, col = 1, pch = 0), #outlier size and color
  par.xlab.text = font.settings,
  par.ylab.text = font.settings,
  axis.text = font.settings,
  superpose.symbol = list(fill = c("white", "black")), # boxplots
  superpose.polygon = list(col = c("white", "black")), # legend
  par.sub = font.settings)
kccqd <- sas.get("I:/Protocol/Datasets/2013/09302013/DataForQOL/", "kccq_long", formats=F, sasprog=sasprog)
kccqlong <- subset(kccqd, month 72)
id <- (kccqlong$master.id)
group <- (kccqlong$rdrug12)
month <- (kccqlong$month)
kccq.pred <- (kccqlong$kccq.pred)
kccq.raw <- (kccqlong$kccq.raw)
bwplot(kccq.raw ~ time, data = kccqlong, groups = group, ylim = c(-100, 100),
       pch = "|", box.width = 1/3,
       auto.key = list(points = FALSE, rectangles = TRUE, space = "right", title = "Treatment", cex.title = 1),
       panel = panel.superpose,
       ylab = "Change in KCCQ (Raw) from baseline", xlab = "Visit",
       par.settings = my.theme,
       panel.groups = function(x, y, ..., group.number) {
         panel.bwplot(x + (group.number-1.5)/3, y, ...)
       })
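One possible approach, sketched here with simulated data because kccqlong is not available: wrap panel.superpose() in a custom panel function and then call panel.text() once for all visits. In a vertical bwplot() a factor on the x side is drawn at positions 1, 2, ..., so the 6 p-values can be placed at x = 1:6. The data frame d, the y position (95), the fill colours and the "p = " label format are illustrative assumptions.

```r
library(lattice)

set.seed(1)
# Simulated stand-in for kccqlong: 20 subjects x 2 groups x 6 visits
d <- expand.grid(id = 1:20, group = c("A", "B"), visit = factor(1:6))
d$y <- rnorm(nrow(d), mean = ifelse(d$group == "A", 0, 10), sd = 25)

pvals <- c(0.0020, 0.0204, 0.3361, 0.0185, 0.1981, 0.6677)

p <- bwplot(y ~ visit, data = d, groups = group, ylim = c(-100, 100),
            box.width = 1/3, xlab = "Visit",
            auto.key = list(points = FALSE, rectangles = TRUE, space = "right"),
            par.settings = list(
              superpose.symbol = list(fill = c("white", "grey40")),   # box fill per group
              superpose.polygon = list(col = c("white", "grey40"))),  # legend keys
            panel = function(...) {
              panel.superpose(...)                      # draws both groups' boxes
              panel.text(x = seq_along(pvals), y = 95,  # one label per visit
                         labels = sprintf("p = %.4f", pvals), cex = 0.8)
            },
            panel.groups = function(x, y, ..., group.number) {
              # shift each group's box to the left or right of the visit position
              panel.bwplot(x + (group.number - 1.5) / 3, y, ...)
            })
print(p)
```

The same panel function should drop into the original call unchanged, with the my.theme settings passed as par.settings instead of the colours used here.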
Re: [R] Object not Found Error on a .csv file
Hi Valerie,
Assuming GfullUA is a column of your data frame task2analyses, you need to tell R where to look. It's trying to find an object called GfullUA, and there isn't one. Here are two ways:
with(task2analyses, cor(GfullUA, GFullUA))
cor(task2analyses$GfullUA, task2analyses$GFullUA)
You might want to read the Introduction to R that came with your software installation.
Sarah
-- Sarah Goslee http://www.functionaldiversity.org
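A small self-contained sketch of the two suggestions above, using a made-up stand-in for task2analyses (the values are invented; only the column names come from the question). It also shows why the error message names GFullUA rather than GfullUA: cor() checks is.data.frame(y) before touching x, so the second argument is the first one R tries to look up. Note too that R names are case-sensitive, so GfullUA and GFullUA are two different columns; names(task2analyses) shows the exact spelling.

```r
# Made-up stand-in for task2analyses; only the column names come from the question
task2analyses <- data.frame(GfullUA = c(1.2, 2.4, 3.1, 4.8, 5.0),
                            GFullUA = c(2.0, 2.9, 4.2, 5.1, 5.3))

# Columns are not free-standing objects, so this lookup fails
msg <- tryCatch(cor(GfullUA, GFullUA), error = function(e) conditionMessage(e))
print(msg)

# Either evaluate inside the data frame ...
r_with <- with(task2analyses, cor(GfullUA, GFullUA))
# ... or name the data frame explicitly
r_dollar <- cor(task2analyses$GfullUA, task2analyses$GFullUA)
print(c(r_with, r_dollar))
```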
Re: [R] barplot: segment-wise shading
You could do something like this:
# Get the dimensions of VADeaths
dim(VADeaths)
[1] 5 4
# How many segments?
prod(dim(VADeaths))
[1] 20
Then use that value in the barplot() arguments as you desire, for example:
barplot(VADeaths, angle = 15 + 10 * 1:prod(dim(VADeaths)), density = 20, col = "black", legend = rownames(VADeaths))
or wrap the barplot() function in your own, which pre-calculates the values and then passes them to the barplot() call in the function. See ?dim and ?prod. Be aware that a vector (e.g. 1:5) will be 'dim-less', thus if you are going to use this approach for a vector-based data object, you would want to use ?length
Regards,
Marc Schwartz
Re: [R] Estimating parameters of 3 parameters lognormal distribution
On 01/16/2014 04:59 PM, Vito Ricci wrote: Hi guys, is there a function in some R package to fit the parameters of a 3-parameter lognormal distribution?
Yes, the function 'phreg' in the package 'eha'.
Göran Broström
Many thanks, Vito Ricci
Re: [R] barplot: segment-wise shading
Hello, thank you for your attempt, but this does not work (for me). This produces 5 angles of shading, not 20. Maybe because of my R version (R version 2.15.1 (2012-06-22); Platform: i486-pc-linux-gnu (32-bit))? Thank you. Regards, Martin Weiser
Re: [R] grepping with a variable trouble
Hi, It's not clear what the pattern in your rownames is. In the for() loop, I guess you need rownames(df) instead of df. Using an example dataset (here the rownames may be different):
set.seed(59)
x <- as.data.frame(matrix(rnorm(110), ncol=2))
set.seed(24)
row.names(x) <- paste0(row.names(x), Reduce(`paste0`, lapply(1:2, function(x) sample(letters, 55, replace=TRUE))))
set.seed(435)
df <- as.data.frame(matrix(sample(200, 300*20, replace=TRUE), ncol=20))
set.seed(34)
row.names(df) <- paste0(sample(1:55, 300, replace=TRUE), Reduce(`paste0`, lapply(1:2, function(x) sample(letters, 300, replace=TRUE))))
gl1 <- sapply(rownames(x), function(i) grep(paste0(gsub("\\d+", "", i), "$"), rownames(df)))
gl1[2]
#$`2fl`
#[1] 128
rownames(df[128,])
#[1] "31fl"
A.K.

Hi, I'm having trouble using grep with a variable. When I do this it works fine:
grep("^hb$", rownames(df))
[1] 9359
but what I really want to do is use the rownames of one data frame (x) to extract the position of that same rowname in a larger data frame (df). How can I do this for, say, all of the rownames in x? The positions should be stored in a variable called g.
dim(x)
[1] 55 2
dim(df)
[1] 13000 19
I've tried this but it does not seem to work:
for(i in rownames(x)){
  g <- grep(paste("^", i, "$", sep=""), df)
}
Any ideas?
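Because the pattern ^name$ asks for an exact match, base R's match() can look up all of the rownames of x in one vectorized call, without building regular expressions (which would also misbehave if a rowname contained a regex metacharacter such as "."). The tiny x and df below are made-up stand-ins for the 55-row and 13000-row data frames in the question; g_loop is a name introduced here.

```r
# Made-up stand-ins for x and df from the question
df <- data.frame(val = 1:6, row.names = c("ab", "hb", "cd", "hbx", "ef", "xhb"))
x  <- data.frame(v = 1:3, row.names = c("hb", "ef", "zz"))

# Position of each rowname of x among the rownames of df (NA if absent)
g <- match(rownames(x), rownames(df))
names(g) <- rownames(x)
print(g)

# The loop from the question, searching rownames(df) and keeping every result
g_loop <- vector("list", nrow(x))
names(g_loop) <- rownames(x)
for (i in rownames(x)) {
  g_loop[[i]] <- grep(paste0("^", i, "$"), rownames(df))
}
```

The original loop overwrote g on every pass, so only the last rowname's result survived; storing into a list indexed by i keeps all of them.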
Re: [R] barplot: segment-wise shading
Arggh. No, this is my error for not actually looking at the plot and presuming that it would work.
Turns out that it does work for a non-stacked barplot:
barplot(VADeaths, angle = 1:20 * 10, density = 10, beside = TRUE)
However, internally, within barplot() (actually barplot.default()), when the bars are stacked the matrix is passed to an internal function called xyrect() one entire column at a time, rather than one segment (count) at a time, to draw the segments. As a result, due to the vector-based approach used, only the first 5 values of 'angle' are actually used, since each column (bar) has 5 segments, rather than all 20. The same impact will be observed when using the default legend that is created. Thus, I don't believe that there will be an easy (non-kludgy) way to do what you want, at least with the default barplot() function. You could fairly easily create/build your own function using ?rect, which is what barplot() uses to draw the segments. I am not sure if lattice-based graphics can do this, or whether Hadley's ggplot2-based approach would offer a possibility. Apologies for the confusion.
Regards,
Marc
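Following the suggestion to build the plot from rect(), here is a minimal sketch that draws a stacked barplot of VADeaths one segment at a time, so each of the 20 segments gets its own shading angle. The angle sequence, bar width and axis layout are illustrative choices rather than barplot() defaults, and no legend is drawn. Angles are kept below 180 degrees because hatching at a and a + 180 looks identical.

```r
m <- VADeaths                                   # 5 x 4 matrix from the datasets package
# One angle per segment, spread over 0..171 degrees
angles <- matrix(seq(0, by = 9, length.out = length(m)), nrow = nrow(m))
bar_w <- 0.8                                    # bar width in x-axis units

plot(NA, xlim = c(0.5, ncol(m) + 0.5), ylim = c(0, max(colSums(m))),
     xaxt = "n", xlab = "", ylab = "Death rate per 1000", bty = "n")
axis(1, at = seq_len(ncol(m)), labels = colnames(m))

for (j in seq_len(ncol(m))) {                   # one stacked bar per column
  tops <- cumsum(m[, j])
  bottoms <- c(0, head(tops, -1))
  for (i in seq_len(nrow(m))) {                 # one rectangle per segment
    rect(j - bar_w / 2, bottoms[i], j + bar_w / 2, tops[i],
         density = 20, angle = angles[i, j], col = "black")
  }
}
```

Calling rect() with a scalar angle for every segment sidesteps the per-column recycling described above.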
Re: [R] Model averaging using QAICc
On 2014-01-15 11:00, r-help-requ...@r-project.org wrote: Date: Wed, 15 Jan 2014 16:39:17 +1000 From: Diana Virkki d.vir...@griffith.edu.au To: r-help@r-project.org Subject: [R] Model averaging using QAICc
Hi all, I am having some trouble running GLMMs and using model averaging with QAICc. Let me know if you need more detail here: I am trying to run GLMMs on count data in the package glmmADMB with a negative binomial distribution due to overdispersion. The dispersion parameter has now been reduced to 2.679 for the global model (from a dispersion parameter of 27.507 with a Poisson distribution), and I am not sure if this is still considered too high for running the models. I would like to use QAICc for model selection and model averaging with the package MuMIn. So far I have only been able to produce QAICc output for the individual models. I read that model averaging with QAICc can be done in MuMIn, but I cannot find the syntax to get these outputs, including the model weights, parameter estimates, confidence intervals, and relative variable importance.
Use argument 'rank' to provide the information criterion to use:
- with 'dredge': rank = QAICc, chat = c-hat
- with 'model.sel' and 'model.avg': rank = QAICc, rank.args = list(chat = c-hat)
See example(QAICc) and example(model.avg)
kamil
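For reference, a minimal sketch of the quantity being ranked, assuming the usual Burnham and Anderson definition of QAICc: the log-likelihood is divided by the overdispersion estimate c-hat, and c-hat counts as one extra parameter. The helper qaicc() is hypothetical; MuMIn's own QAICc() is what the advice above uses, and its output should be compared against this rather than assumed identical. The simulated counts and the crude c-hat estimate (residual deviance over residual degrees of freedom of a Poisson fit) are also illustrative assumptions.

```r
# Hypothetical helper: QAICc from a log-likelihood, parameter count k
# (already including c-hat), sample size n and overdispersion estimate chat
qaicc <- function(loglik, k, n, chat) {
  -2 * loglik / chat + 2 * k + 2 * k * (k + 1) / (n - k - 1)
}

set.seed(1)
n <- 100
d <- data.frame(x = rnorm(n))
# Overdispersed counts: negative binomial draws around a log-linear mean
d$y <- rnbinom(n, mu = exp(1 + 0.5 * d$x), size = 1)

fit  <- glm(y ~ x, family = poisson, data = d)
chat <- deviance(fit) / df.residual(fit)   # crude overdispersion estimate
ll   <- logLik(fit)
k    <- attr(ll, "df") + 1                 # +1 because c-hat is estimated

q <- qaicc(as.numeric(ll), k, nobs(fit), chat)
print(c(chat = chat, QAICc = q))
```

Dividing by c-hat shrinks the likelihood term, so with large c-hat the parameter penalty carries more weight and simpler models are favoured.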
[R] xts error: number of items to replace is not a multiple of replacement length
Dear all, I am getting this error while trying to change columns of an xts object with a date range as index.
library(xts)
Loading required package: zoo
Attaching package: ‘zoo’
The following object is masked from ‘package:base’:
    as.Date, as.Date.numeric
data(sample_matrix)
sample.xts <- as.xts(sample_matrix, descr='my new xts object')
head(sample.xts)
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-01-04 50.42096 50.42096 50.26414 50.33236
2007-01-05 50.37347 50.37347 50.22103 50.33459
2007-01-06 50.24433 50.24433 50.11121 50.18112
2007-01-07 50.13211 50.21561 49.99185 49.99185
sample.xts$Close <- sample.xts$Close+1
head(sample.xts)
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 51.11778
2007-01-03 50.23050 50.42188 50.23050 51.39767
2007-01-04 50.42096 50.42096 50.26414 51.33236
2007-01-05 50.37347 50.37347 50.22103 51.33459
2007-01-06 50.24433 50.24433 50.11121 51.18112
2007-01-07 50.13211 50.21561 49.99185 50.99185
sample.xts["2007-01-02::2007-01-04"]
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 51.11778
2007-01-03 50.23050 50.42188 50.23050 51.39767
2007-01-04 50.42096 50.42096 50.26414 51.33236
sample.xts["2007-01-02::2007-01-04"]$Close <- sample.xts["2007-01-02::2007-01-04"]$Close+1
Warning message:
In NextMethod(.Generic) :
  number of items to replace is not a multiple of replacement length
Re: [R] xts error: number of items to replace is not a multiple of replacement length
Hi, Try:
sample.xts["2007-01-02::2007-01-04","Close"] <- sample.xts["2007-01-02::2007-01-04","Close"] + 1
sample.xts["2007-01-02::2007-01-04"]
#               Open     High      Low    Close
#2007-01-02 50.03978 50.11778 49.95041 52.11778
#2007-01-03 50.23050 50.42188 50.23050 52.39767
#2007-01-04 50.42096 50.42096 50.26414 52.33236
A.K.
Re: [R] xts error: number of items to replace is not a multiple of replacement length
Indeed it works! Thanks a lot. But why?

-----Original Message-----
From: arun [smartpink...@yahoo.com]
Date: 01/16/2014 08:44 PM
To: r-help@r-project.org
Subject: Re: [R] xts error: number of items to replace is not a multiple of replacement length
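A likely explanation, offered as my reading rather than something confirmed against the xts source: R evaluates a nested replacement such as `x[i]$Close <- v` roughly as `x[i] <- "$<-"(x[i], "Close", v)`. The inner call returns the whole 3-row, 4-column subset with Close changed, and the outer `x[i] <- value` then has only a row index and no column index, so the underlying matrix method (the warning comes from `NextMethod(.Generic)`) treats it as vector indexing: 3 cells are targeted but 12 values are supplied. Indexing rows and column together, as in arun's answer, assigns only the intended cells. The sketch below reproduces the pattern with a plain base-R matrix (hypothetical values) standing in for the xts object.

```r
# Plain numeric matrix standing in for the xts object (hypothetical values)
m <- matrix(c(50, 51, 52, 53,    # Open
              60, 61, 62, 63,    # High
              70, 71, 72, 73),   # Close
            nrow = 4,
            dimnames = list(NULL, c("Open", "High", "Close")))
rows <- 1:3

# Roughly what x[rows]$Close <- x[rows]$Close + 1 expands to:
# modify a copy of the row subset, then write it back with a row index only.
tmp <- m[rows, , drop = FALSE]          # 3 x 3 subset
tmp[, "Close"] <- tmp[, "Close"] + 1
bad <- m
msg <- tryCatch({
  bad[rows] <- tmp                      # 3 target cells, 9 values supplied
  "no warning"
}, warning = function(w) conditionMessage(w))
print(msg)

# Row and column indexed together: only the intended cells change.
good <- m
good[rows, "Close"] <- good[rows, "Close"] + 1
print(good[, "Close"])
```

The single-index form is legal R (it is plain vector indexing into the matrix), which is why this surfaces as a warning rather than an error.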
Re: [R] barplot: segment-wise shading
On 01/17/2014 10:59 AM, Marc Schwartz wrote:
...
Arggh. No, this is my error for not actually looking at the plot and presuming that it would work. It turns out that it does work for a non-stacked barplot:

barplot(VADeaths, angle = 1:20 * 10, density = 10, beside = TRUE)

However, when the bars are stacked, barplot() (actually barplot.default()) calls an internal function, xyrect(), once per column to draw the segments, passing all of that column's segments together with the whole 'angle' vector, rather than handling the individual segments (counts) one at a time. As a result, because of this vector-based approach, only the first 5 values of 'angle' are actually used for every bar, since each bar has 5 segments (rows), rather than all 20. The same effect will show up in the default legend.

Thus, I don't believe there is an easy (non-kludgy) way to do what you want, at least with the default barplot() function. You could fairly easily build your own function using ?rect, which is what barplot() uses to draw the segments. I am not sure whether lattice-based graphics can do this, or whether Hadley's ggplot-based approach would offer a possibility.

Apologies for the confusion.

Regards,

Marc

Hi Marc and Martin,
When I saw the original message I tried to look at the code for the barplot function to see if I could call the rectFill function from plotrix within it. Unfortunately, barplot is one of those internal functions that are not at all easy to hack, and I have never gotten around to adding stacked bars to the barp function. I thought that rectFill would let you use more easily discriminated fills than angles that differ by only 18 degrees.

Jim
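Marc's suggestion of building on ?rect can be sketched as follows. This is a minimal illustration, not a drop-in replacement for barplot(): the name stacked_shaded_barplot() is hypothetical, it handles only vertical stacked bars of non-negative values, and it leaves out legends, colours and spacing options. It spreads 'angle' over every cell in column-major order and hands each bar only its own slice, so rect() never restarts recycling from angle[1] and all 20 angles are used for VADeaths.

```r
# Hypothetical helper: stacked bars drawn with rect(), one angle per segment.
stacked_shaded_barplot <- function(height, angle = 45, density = 10,
                                   width = 0.8) {
  stopifnot(is.matrix(height), all(height >= 0))
  nr <- nrow(height)
  nc <- ncol(height)
  # One angle per cell, column-major: segments of bar 1 first, then bar 2, ...
  angle <- matrix(rep_len(angle, nr * nc), nrow = nr)
  # Top and bottom of every segment, bar by bar
  tops <- matrix(apply(height, 2, cumsum), nrow = nr)
  bottoms <- rbind(0, tops[-nr, , drop = FALSE])
  x <- seq_len(nc)
  plot.new()
  plot.window(xlim = c(0.5, nc + 0.5), ylim = c(0, max(tops)))
  for (j in seq_len(nc)) {
    # Pass only bar j's slice of angles
    rect(x[j] - width / 2, bottoms[, j], x[j] + width / 2, tops[, j],
         density = density, angle = angle[, j])
  }
  axis(2)
  labs <- if (is.null(colnames(height))) x else colnames(height)
  axis(1, at = x, labels = labs, lwd = 0)
  invisible(angle)
}

# Usage: the 20 angles land on the 20 segments of VADeaths
stacked_shaded_barplot(VADeaths, angle = 1:20 * 10)
```

The function returns the per-segment angle matrix invisibly, which makes it easy to build a matching legend by hand.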
[R] Predicting probabilities from a logistic regression by hand (in code)
Thanks for looking at this; I've been tearing my hair out for a day or so now.

I have done a multiple-variable logistic regression in R and obtained my coefficients. I can make predictions for the training data in R without problems. But now I would like to create a prediction model in Ruby (that was the original point of doing the regression), and I'm having some trouble. Basically, my equations are:

predicted_logit = K + v1*c1 + v2*c2 + ... + vn*cn
odds_ratio = e^predicted_logit / (1 + e^predicted_logit)

But it always seems to give either 1.0 or 0.0! The output of predict() in R is generally something nice and soft like 0.5578460. I realize not everyone knows Ruby, but I'll include my code here for reference:

# These are the coefficients that R gives me from my logistic regression:
intercept = 0.2700309
coefficients = {
  high: 1.0136028,
  low: 1.0016712,
  germ_mean: 1.0233327,
  gdds: 0.9990283,
  early_gdds: 0.9986464,
  mid_gdds: 1.0002979,
  late_gdds: 0
}

# And this is what R predicts for one datum:
#
#   outcome high low germ_mean gdds early_gdds mid_gdds late_gdds p_success
# 1       1   73  28        40  119          0       91        28 0.5578460
# ...

# So to get my own p_success, first I multiply each coefficient by its input data
period = {:high => 73, :low => 28, :germ_mean => 40, :gdds => 119, :early_gdds => 0, :mid_gdds => 91, :late_gdds => 28}
products = coefficients.map { |name, value| period[name] * value }

# Then I add those together and add that to the intercept
predicted_logit = intercept + products.sum

# Then my probability should be e^predicted_logit over 1 + e^predicted_logit:
odds_ratio = Math.exp(predicted_logit) / (1 + Math.exp(predicted_logit))

# But the odds ratio comes out as 1.0, not 0.5578460 like R predicts.

-- 
View this message in context: http://r.789695.n4.nabble.com/Predicting-probabilities-from-a-logistic-regression-by-hand-in-code-tp4683713.html
Sent from the R help mailing list archive at Nabble.com.
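To check the arithmetic, the sketch below fits a small logistic regression in R on simulated data (the variables high and low and all values are made up for illustration) and reproduces predict(type = "response") by hand. One assumption worth checking: every coefficient in the Ruby hash is close to 1, which is what exp(coef(fit)) (odds ratios) typically looks like, whereas the hand calculation needs the raw log-odds values from coef(fit). With multipliers near 1 applied to inputs like 73 and 119, the linear predictor lands in the hundreds, and e^x / (1 + e^x) of that is 1.0 in double precision. (The quantity being computed is the predicted probability, not an odds ratio.)

```r
set.seed(1)
n <- 200
d <- data.frame(high = rnorm(n, 70, 5), low = rnorm(n, 30, 5))
# Simulated outcome; the true model is linear on the log-odds scale
d$outcome <- rbinom(n, 1, plogis(-5 + 0.05 * d$high + 0.04 * d$low))
fit <- glm(outcome ~ high + low, family = binomial, data = d)

b <- coef(fit)                        # log-odds coefficients: use these
newdat <- data.frame(high = 73, low = 28)

# Linear predictor (logit) by hand, then the inverse logit
eta <- b[["(Intercept)"]] + b[["high"]] * newdat$high + b[["low"]] * newdat$low
p_manual <- exp(eta) / (1 + exp(eta))   # same as plogis(eta)
p_r <- unname(predict(fit, newdata = newdat, type = "response"))
print(c(manual = p_manual, predict = p_r))

# Plugging in exponentiated coefficients (odds ratios) instead
b_or <- exp(b)
eta_wrong <- b_or[["(Intercept)"]] + b_or[["high"]] * 73 + b_or[["low"]] * 28
print(plogis(eta_wrong))              # pinned at (or extremely near) 1
```

If the Ruby hash was filled from exp(coef(...)), taking the natural log of each value before summing should bring the Ruby result in line with predict().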
Re: [R] barplot: segment-wise shading
Jim Lemon wrote on Fri, 17 Jan 2014 at 13:21 +1100:

Hi,
after Marc showed me where to look, I hacked barplot.default a bit, so now it does what I want (I added a segmentwise argument).
Unfortunately, it works well with segmentwise = TRUE, but not with segmentwise = FALSE (the default). With segmentwise = FALSE, the density argument works only in 1/n-th of the segments, where n is the number of columns (it seems to refuse to recycle the values, but I do not know why). Any ideas?

Martin

Here is my hack of barplot:

my.barplot <- function (height, width = 1, space = NULL, names.arg = NULL,
    legend.text = NULL, beside = FALSE, horiz = FALSE, density = NULL,
    angle = 45, col = NULL, border = par("fg"), main = NULL, sub = NULL,
    xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, xpd = TRUE,
    log = "", axes = TRUE, axisnames = TRUE, cex.axis = par("cex.axis"),
    cex.names = par("cex.axis"), inside = TRUE, plot = TRUE,
    axis.lty = 0, offset = 0, add = FALSE, args.legend = NULL,
    segmentwise = FALSE, ...)
{
    if (!missing(inside))
        .NotYetUsed("inside", error = FALSE)
    if (is.null(space))
        space <- if (is.matrix(height) && beside) c(0, 1) else 0.2
    space <- space * mean(width)
    if (plot && axisnames && is.null(names.arg))
        names.arg <- if (is.matrix(height)) colnames(height) else names(height)
    if (is.vector(height) || (is.array(height) && (length(dim(height)) == 1))) {
        height <- cbind(height)
        beside <- TRUE
        if (is.null(col))
            col <- "grey"
    }
    else if (is.matrix(height)) {
        if (is.null(col))
            col <- gray.colors(nrow(height))
    }
    else stop("'height' must be a vector or a matrix")
    if (is.logical(legend.text))
        legend.text <- if (legend.text && is.matrix(height)) rownames(height)
    stopifnot(is.character(log))
    logx <- logy <- FALSE
    if (log != "") {
        logx <- length(grep("x", log)) > 0L
        logy <- length(grep("y", log)) > 0L
    }
    if ((logx || logy) && !is.null(density))
        stop("Cannot use shading lines in bars when log scale is used")
    NR <- nrow(height)
    NC <- ncol(height)
    if (beside) {
        if (length(space) == 2)
            space <- rep.int(c(space[2L], rep.int(space[1L], NR - 1)), NC)
        width <- rep(width, length.out = NR)
    }
    else {
        width <- rep(width, length.out = NC)
    }
    offset <- rep(as.vector(offset), length.out = length(width))
    delta <- width/2
    w.r <- cumsum(space + width)
    w.m <- w.r - delta
    w.l <- w.m - delta
    log.dat <- (logx && horiz) || (logy && !horiz)
    if (log.dat) {
        if (min(height + offset, na.rm = TRUE) <= 0)
            stop("log scale error: at least one 'height + offset' value <= 0")
        if (logx && !is.null(xlim) && min(xlim) <= 0)
            stop("log scale error: 'xlim' <= 0")
        if (logy && !is.null(ylim) && min(ylim) <= 0)
            stop("log scale error: 'ylim' <= 0")
        rectbase <- if (logy && !horiz && !is.null(ylim)) ylim[1L]
        else if (logx && horiz && !is.null(xlim)) xlim[1L]
        else 0.9 * min(height, na.rm = TRUE)
    }
    else rectbase <- 0
    if (!beside)
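On the segmentwise = FALSE question, assuming that path of my.barplot still draws stacked bars the way barplot.default() does (one xyrect() call per bar, each given the whole density vector), the behaviour would follow from R's recycling: rect() and polygon() do recycle 'density', but each call starts again from the first element, so every bar gets only the first nrow(height) values, i.e. 1/ncol of them. Expanding the vector to one value per cell and giving bar j only column j avoids this. The sketch below only compares the two index patterns for VADeaths (5 segments by 4 bars); the densities are arbitrary.

```r
nr <- nrow(VADeaths)              # 5 segments per bar
nc <- ncol(VADeaths)              # 4 bars
dens <- seq(5, 100, by = 5)       # 20 densities, one intended per segment

# Per-bar call with the whole vector: recycling restarts at dens[1] each time
as_recycled <- sapply(seq_len(nc), function(j) rep_len(dens, nr))

# One value per cell first, then bar j receives only column j
per_cell <- matrix(rep_len(dens, nr * nc), nrow = nr)
as_sliced <- sapply(seq_len(nc), function(j) per_cell[, j])

as_recycled   # every column is dens[1:5]
as_sliced     # columns are dens[1:5], dens[6:10], dens[11:15], dens[16:20]
```

Inside the per-column loop this would mean passing something like per_cell[, i] (and the same for angle) instead of the full vectors; that is a suggestion, not tested against Martin's code.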
[R] Any recommendations for reusable profiling of name fields?
Hi,

I'm pretty new to R and am trying to develop a reusable set of scripts that I can use to profile various data types and common fields in our database. I know that what I'm asking is a can of worms, so please bear with me. :)

For example, we store a person's first name, last name, phone number, email address, last gift amount, gift date, etc., as well as integer-type data. I'm wondering if there's a best practice for validating a field that holds, for example, a first name or last name. A couple of things I've come up with are:

1) Count of characters (nchar) in the first (or last) name field
2) Number of unique tokens
3) Patterns (converting alphabetic characters to A and numeric ones to N), counting the frequency of each unique pattern that results. I suppose I could map lower-case letters to 'a' and upper-case to 'A' to be more specific.
4) Min and max name (helps identify those with leading spaces or numbers)

Does anyone have more suggestions for techniques that are common, or that you'd recommend, for name fields? Ultimately, I'm looking to develop a common set of profiles for various data types, so if there's a white paper (I've googled, but haven't found any that hit the mark yet), I'd love to see it. Perhaps there's even a package for this type of thing?

Thanks much!

-- 
Jeff
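The four checks listed above could be bundled into one reusable function along these lines. This is a sketch under my own assumptions: profile_name_field() is a hypothetical name, "unique tokens" is read as the number of distinct values, and a direct count of values with leading or trailing whitespace is added because min() and max() on strings follow the locale's collation order, which in many locales may ignore spaces.

```r
profile_name_field <- function(x) {
  x <- as.character(x)
  # Shape pattern: upper case -> "A", lower case -> "a", digits -> "N"
  pattern <- gsub("[[:upper:]]", "A", x)
  pattern <- gsub("[[:lower:]]", "a", pattern)
  pattern <- gsub("[[:digit:]]", "N", pattern)
  list(
    n_chars         = table(nchar(x)),                         # 1) length distribution
    n_distinct      = length(unique(x)),                       # 2) distinct values
    patterns        = sort(table(pattern), decreasing = TRUE), # 3) shape frequencies
    min             = min(x, na.rm = TRUE),                    # 4) extremes
    max             = max(x, na.rm = TRUE),
    edge_whitespace = sum(grepl("^[[:space:]]|[[:space:]]$", x))
  )
}

p <- profile_name_field(c("Jeff", " Ann", "JO3", "Mary", "Ann"))
p$patterns
```

POSIX character classes are used instead of ranges like [A-Z] because ranges can match unexpected characters in some locales; the classes also cover accented letters in UTF-8 locales.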
Re: [R] Estimating parameters of 3 parameters lognormal distribution
Hi Göran,

thanks for your suggestion, but I believe it's not helpful for me... The description of phreg reads: "Proportional hazards model with parametric baseline hazard(s). Allows for stratification with different scale and shape in each stratum, and left truncated and right censored data."

I have data whose distribution is lognormal with three parameters; I need to fit this model and its three parameters, especially the third one, the threshold.

Regards.
VR
If not now, when? If not here, where? If not you, who?

On Friday, 17 January 2014 8:26, Vito Ricci <vito_ri...@yahoo.com> wrote:
Many thanks for your suggestion.
Regards.
VR

On Thursday, 16 January 2014 22:31, Göran Broström <goran.brost...@umu.se> wrote:
On 01/16/2014 04:59 PM, Vito Ricci wrote:
Hi guys, is there a function in some R package to fit the parameters of a 3-parameter lognormal distribution?

Yes, the function 'phreg' in the package 'eha'.

Göran Broström

Many thanks
Vito Ricci
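If eha's phreg does not fit the bill, one base-R option is maximum likelihood with the threshold profiled out. This is a sketch, not a package implementation: fit_lnorm3() and lnorm3_profile() are hypothetical names, the data are simulated, and it relies on two standard facts: for a fixed threshold gamma, log(x - gamma) is normal (so meanlog and sdlog have closed-form estimates), and the likelihood becomes unbounded as gamma approaches min(x). The search therefore runs over tau = log(min(x) - gamma) and stops at tau_lower, which targets the usual local maximum rather than that degenerate one; the cut-off value is an arbitrary choice.

```r
# Profile log-likelihood for a given threshold gamma: for fixed gamma,
# log(x - gamma) is normal, so meanlog and sdlog have closed-form MLEs.
lnorm3_profile <- function(gamma, x) {
  y <- log(x - gamma)
  mu <- mean(y)
  s <- sqrt(mean((y - mu)^2))
  sum(dlnorm(x - gamma, meanlog = mu, sdlog = s, log = TRUE))
}

# Hypothetical fitter: search tau = log(min(x) - gamma) on a bounded interval.
# tau_lower keeps gamma away from min(x), where the likelihood is unbounded.
fit_lnorm3 <- function(x, tau_lower = -10) {
  m <- min(x)
  obj <- function(tau) lnorm3_profile(m - exp(tau), x)
  opt <- optimize(obj, interval = c(tau_lower, log(10 * sd(x))),
                  maximum = TRUE)
  gamma <- m - exp(opt$maximum)
  y <- log(x - gamma)
  list(threshold = gamma,
       meanlog = mean(y),
       sdlog = sqrt(mean((y - mean(y))^2)),
       loglik = opt$objective)
}

# Simulated example: threshold 10, meanlog 0, sdlog 1
set.seed(42)
x <- 10 + rlnorm(2000, meanlog = 0, sdlog = 1)
fit <- fit_lnorm3(x)
print(unlist(fit))
```

optimize() finds a single local optimum, so on real data it is worth plotting the profile over a grid of tau values before trusting the threshold estimate.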
Re: [R] tables package and alternative to col percent
Thanks for the reply. Another great option would be one for missing values (as in SAS), especially for factors. I'm struggling to figure out how to do this with tables.

Daniel Cher, MD
djc...@gmail.com
+1-650-269-5763
This message and its attachments are confidential.

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Monday, January 13, 2014 2:13 AM
To: Daniel Cher; r-help@r-project.org
Subject: Re: [R] tables package and alternative to col percent

On 14-01-13 12:02 AM, Daniel Cher wrote:
> The tables library and its tabular function are neato. I'm trying to figure out how to get percents other than just row and column percents. I'd like a percent of a factor.

That's a recent addition, still only on R-Forge.

> library(tables)
> c <- data.frame(gender = c(1, 1, 1, 1, 2, 2, 2, 2),
>                 race = c(3, 3, 4, 4, 4, 4, 4, 4))
> tabular(Factor(gender, "Gender") * Factor(race, "Race") + 1 ~ (n=1) + Percent("col"), data = c)
>
> The above produces:
>
>  Gender Race n Percent
>  1      3    2  25
>         4    2  25
>  2      3    0   0
>         4    4  50
>  All         8 100
>
> I'm looking for percents that use gender=1 or gender=2 as the denominator, i.e.:

You would get the table below using Percent(denom = Equal(Gender)).

Duncan Murdoch

>  Gender Race n Percent
>  1      3    2   *50*
>         4    2   *50*
>  2      3    0    *0*
>         4    4  *100*
>  All         8   100
>
> Daniel Cher, MD
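If the R-Forge version of tables is not an option, the same within-gender percentages can be checked with base R's prop.table(), using margin = 1 so that each gender row is its own denominator. A sketch with the data from the question (renamed dat so it does not mask the c() function):

```r
dat <- data.frame(gender = c(1, 1, 1, 1, 2, 2, 2, 2),
                  race   = c(3, 3, 4, 4, 4, 4, 4, 4))
counts <- table(Gender = dat$gender, Race = dat$race)
# margin = 1: percentages within each Gender row
pct <- 100 * prop.table(counts, margin = 1)
pct
```

For the missing-values request, table() also accepts useNA = "ifany", which adds an NA column or row whenever missing values are present; this is a base-R alternative, not a tabular() feature.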
Re: [R] Doubt in simple merge
Thank you, dear friends. You have cleared my first doubt.

My second doubt: I have the same data sets, Elder and Younger.

Elder <- data.frame(ID = c("ID1", "ID2", "ID3"), age = c(38, 35, 31))
Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, NA))

Row ID3 appears in both data sets. It has a value (31) in Elder but NA in Younger. I need output like this:

ID  age
ID1 38
ID2 35
ID3 31
ID4 29
ID5 21

Kindly help me.

On Thursday, 16 January 2014 9:16 PM, Marc Schwartz-3 [via R] <ml-node+s789695n4683682...@n4.nabble.com> wrote:

Not quite:

rbind(Elder, Younger)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID4  29
5 ID5  21
6 ID3  31

Note that ID3 is duplicated. It should be:

merge(Elder, Younger, by = c("ID", "age"), all = TRUE)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID4  29
5 ID5  21

He wants to do a join on both ID and age to avoid duplicated rows when the same ID and age occur in both data frames. If the same column name (e.g. Var) appears in both data frames and is not part of the 'by' argument, you end up with Var.x and Var.y in the result. In the case of two occurrences of the same ID with two different ages, if that is possible, both rows would be added to the result using the above code.

Regards,

Marc Schwartz

On Jan 16, 2014, at 9:04 AM, Frede Aakmann Tøgersen wrote:
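For the second doubt, one base-R approach is to stack both frames, move rows with a missing age to the end, and keep the first row for each ID. This assumes (the post does not say) that a non-missing age should always win and that an ID never carries two different non-missing ages; if it can, those conflicts would need checking separately.

```r
Elder   <- data.frame(ID = c("ID1", "ID2", "ID3"), age = c(38, 35, 31))
Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, NA))

both <- rbind(Elder, Younger)
# order() is stable: rows with a known age come first, original order kept
both <- both[order(is.na(both$age)), ]
# Keep the first row per ID, i.e. the one with a known age when there is one
res <- both[!duplicated(both$ID), ]
res <- res[order(res$ID), ]
rownames(res) <- NULL
res
```

Unlike merge(..., by = c("ID", "age"), all = TRUE), this does not produce a second ID3 row, because the NA age never matches 31 in a join.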
[R] Setting heatmap.2 Color Key Range Outside of Data Limits
Hello,

There are many questions about making the limits of the colour key narrower than the data range, but I have the opposite problem. Assume one heatmap has data in the range 6 to 12 and another has data in the range 6 to 9. By providing the same breaks argument to both plots, the heatmaps are coloured as they should be, but for the second heatmap the colour key only spans 6 to 9. I'd like to force the second colour key to go up to 12 as well. How can this be achieved?

My use case is that I have identified a number of clusters in a gene expression dataset, and I would like to plot them as multiple smaller heatmaps rather than as one large heatmap.

Also, unless key = FALSE, a heatmap whose values all fall in one colour bin causes:

Error in axis(1, at = xv, labels = lv) : no locations are finite

Perhaps this could also be handled more gracefully. I am using R 3.0.2.

-- 
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
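One workaround, sketched in base graphics rather than with heatmap.2 itself, is to draw each cluster with image() using the same fixed breaks and to draw a separate colour key from the breaks rather than from the data, so every key spans 6 to 12. With heatmap.2 the analogous route would be to pass the shared breaks, set key = FALSE and add a key separately. The matrices below are simulated stand-ins for two clusters, and draw_key() is a hypothetical helper. Because the key never looks at the data, it should also be unaffected when a cluster falls into a single colour bin.

```r
brks <- seq(6, 12, by = 0.5)               # fixed breaks shared by every plot
cols <- heat.colors(length(brks) - 1)      # one colour per bin

# Hypothetical helper: colour key spanning all breaks, independent of data
draw_key <- function(breaks, cols) {
  mids <- head(breaks, -1) + diff(breaks) / 2
  image(x = breaks, y = c(0, 1), z = matrix(mids, ncol = 1),
        col = cols, breaks = breaks, yaxt = "n", xlab = "Value", ylab = "")
}

set.seed(1)
m1 <- matrix(runif(20, 6, 12), nrow = 4)   # cluster spanning 6 to 12
m2 <- matrix(runif(20, 6, 9), nrow = 4)    # cluster spanning 6 to 9

op <- par(mfrow = c(2, 2))
for (m in list(m1, m2)) {
  image(t(m), col = cols, breaks = brks, axes = FALSE)  # same colour mapping
  draw_key(brks, cols)                                  # key always 6 to 12
}
par(op)
```

image() requires one more break than colours, which is why cols is built from length(brks) - 1.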