[R] datatable using dt not able to print background colors
Hey, all!

I've got a report that uses datatable() from DT in an rmarkdown document. The rendered HTML looks great, but when I try to print it, either to a printer or to a PDF, the colors I've assigned to cells are not displayed. I'm using Chrome and I've clicked on the "Background graphics" option in the print dialog, but that doesn't help print the colors. I have tried running the datatable section of the code both with results = 'asis' and without it. Neither seems to help.

My CSS style at the top of the rmarkdown is:

.main-container { max-width: 1500px; margin-left: auto; margin-right: auto; table.display td { white-space: wrap; } } td{ -webkit-print-color-adjust:exact !important; print-color-adjust:exact !important; }

I added the webkit bit based on what I've found online. Maybe I have something set up incorrectly there? Any ideas or thoughts on how to get this to print the background colors?

Thanks
matt

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Differential Gene Expression in R
You can look into the edgeR vignette. To open it, type vignette("edgeR") at the R command line. Typing vignette() on its own lists the vignettes for all of your installed packages. Vignettes often include a model analysis that you can follow along with and adapt to your own data. There is also Biostars, https://www.biostars.org/. However, I doubt you will find anyone on an online forum who will walk you through the whole analysis, although the whole analysis probably takes only about 10, plus or minus 4, commands.

Alternatively, click on the URL you provided below and, at the bottom of that page, click on 'SRA Run Selector'. Scroll down a little on the page you get to and select the runs you want to analyze by checking the appropriate boxes, then click on the grey box on the right that has the word 'Galaxy' in it. This loads your selected runs into an instance of Galaxy, in which it is a little easier to analyze data than on the R command line. In the leftmost column of the Galaxy page, scroll down to Genomics Analysis, click RNA-seq, scroll down a little, and you will see that edgeR is available. You will still have to learn a little about edgeR analysis, so reading the vignette will be very helpful. Also, for the comparisons you want to do, statistical help is recommended.
Matthew

On 8/22/21 2:13 PM, Anas Jamshed wrote:
I have downloaded data from: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162562 and now I want to compare:
healthy vs Mild
healthy vs Highly exposed seronegative (ishgl)
Healthy vs Asymptomatic covid19 patient
healthy vs Highly exposed seronegative (non ishgl)
from this data. I started like:
library(edgeR)
library(limma)
library(GEOquery)
library(Biobase)
Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 2)
setwd("D:\\")
untar("GSE162562_RAW.tar")
filelist = list.files(pattern = ".*.txt.gz")
But after getting the text files I don't know how to proceed further. I want to find DEGs from these files. Please help me.
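As a rough illustration of the step where the question gets stuck (turning a folder of per-sample text files into one count matrix), here is a minimal base-R sketch. It assumes, without having checked GSE162562, that each .txt.gz file holds two tab-separated columns (gene ID, raw count) and that all files list the genes in the same order. The file names, gene names and group labels are made up for the example.

```r
# Simulate three small per-sample count files in a temporary folder.
# (Hypothetical layout: gene ID <tab> count, no header.)
tmp <- tempfile("counts_")
dir.create(tmp)
genes <- paste0("gene", 1:5)
samples <- c("GSM1", "GSM2", "GSM3")
for (i in seq_along(samples)) {
  con <- gzfile(file.path(tmp, paste0(samples[i], ".txt.gz")), "w")
  write.table(data.frame(genes, seq_along(genes) * i), con,
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
  close(con)
}

# List the files, as in the original post, but with the dots escaped.
files <- list.files(tmp, pattern = "\\.txt\\.gz$", full.names = TRUE)

# Read one file into a named vector of counts (gzip is handled transparently).
read_one <- function(f) {
  x <- read.delim(f, header = FALSE, col.names = c("gene", "count"),
                  stringsAsFactors = FALSE)
  setNames(x$count, x$gene)
}

# One column per sample, one row per gene.
counts <- do.call(cbind, lapply(files, read_one))
colnames(counts) <- sub("\\.txt\\.gz$", "", basename(files))

# Hypothetical group labels, one per column of 'counts'.
group <- factor(c("healthy", "healthy", "mild"))

print(counts)
```

A matrix like `counts` together with a factor like `group` is what edgeR's DGEList(counts = , group = ) takes as input; the vignette covers the steps after that.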
Re: [R] Joining data frames
I think the confusion here is about which rows each kind of join keeps. Base R's merge() does match up entries on the columns you name in by (or by.x and by.y), and the joins from the dplyr package do the same thing with the join type spelled out in the function name (I know biologists frequently want to match entries from a particular column in each data frame). If you do a right join, the result keeps every row of the second data frame (the data frame to the right, df1). Entries in df that are not in df1 will not be in the result (in your example the result is assigned back to df). So, in your code, you took df, joined it to df1, and changed df to contain only the Sample/Plot combinations that are in df1, which is probably why the data appear to be cut off at Sample 3. Had you done a left_join, your final data frame would keep every row originally in df, with the df1 columns filled in where Sample and Plot match and NA elsewhere (entries in df1, but not in df, would be excluded). You could do a full_join, and then all entries (entries in both data frames, entries in df but not in df1, and entries in df1 but not in df) will be in the result. Maybe something like the following (in this case I have created a new data frame, df_final, but you could still go with just changing df):

df_final <- full_join(df, df1, by = c("Sample", "Plot"))

Matthew

On 6/29/21 7:15 PM, Jim Lemon wrote:
Hi Esthi, Have you tried something like: df2<-merge(df,df1,by.x="Sample",by.y="Plot",all.y=TRUE) This will get you a right join in "df2", not overwriting "df". Jim

On Wed, Jun 30, 2021 at 1:13 AM Esthi Erickson wrote:
Hi and thank you in advance, If I have a dataframe, df: Sample Plot Biomass 1 1 1024 1 2 32 2 3 223 2 4 456 3 1 3 2 331 3 3 22151 3 4 1441 And another one, df1: Sample Plot % cover of plant1 % cover of plant2 3 1 32 63 3 2 3 3 3 3 3 4 5 23 I want to join these tables where the columns Sample and Plot are the same.
Currently trying: df <- right_join(df, df1, by = c("Sample", "Plot")) I am working with a much larger dataset, but it will cut off the data starting at Sample 3 instead of joining the tables while retaining the information from df. Any ideas how I could join them this way? Esthi
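To make the difference between the join types in this thread concrete, here is a small base-R sketch using merge(). The two data frames are toy data invented for the example (they are not the poster's data), and the column names cover1 and cover2 are placeholders. The all.x / all.y / all arguments of merge() correspond to dplyr's left_join(), right_join() and full_join().

```r
# Toy data: df has six Sample/Plot combinations, df1 has three,
# one of which (Sample 4, Plot 1) does not occur in df.
df <- data.frame(Sample  = c(1, 1, 2, 2, 3, 3),
                 Plot    = c(1, 2, 3, 4, 3, 4),
                 Biomass = c(10, 20, 30, 40, 50, 60))
df1 <- data.frame(Sample = c(3, 3, 4),
                  Plot   = c(3, 4, 1),
                  cover1 = c(3, 5, 8),
                  cover2 = c(7, 23, 1))

by_cols <- c("Sample", "Plot")

inner <- merge(df, df1, by = by_cols)                # rows present in both
left  <- merge(df, df1, by = by_cols, all.x = TRUE)  # every row of df
right <- merge(df, df1, by = by_cols, all.y = TRUE)  # every row of df1
full  <- merge(df, df1, by = by_cols, all = TRUE)    # every row of either

# Number of rows kept by each kind of join.
print(sapply(list(inner = inner, left = left, right = right, full = full), nrow))
```

Only the left and full joins retain all of the Biomass rows from df, which appears to be what the original question was after.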
Re: [R] analyzing results from Tuesday's US elections
By the way, I thought I had checked my e-mail before sending it, but my last e-mail had an unfortunate typo with an 'I' that originally belonged to the beginning of a deleted sentence.

Matthew

On 11/17/20 1:54 AM, Matthew McCormack wrote:
No reason to apologize. It's a timely and very interesting topic that provides a glimpse into the application of statistics in forensics. I had never heard of Benford's Law before and I think it is really fascinating. One of those very counterintuitive rules that show up in statistics and probability, like the Monty Hall problem. Why in the world does Benford's Law work? I have been wondering if it could in any way be applied to biological data analysis. (Also, I discovered Stand-up-maths!) Often things are not as easy to figure out as we may first estimate. I think you would have to start with how you would envision a fraud to be committed and then figure out if there is a statistical analysis that could detect it, or develop an analysis. For example, if a voting machine were weighting votes and giving 8/10ths of a vote to 'yes' and 10/10ths of a vote to 'no', is there some statistical analysis that could detect this? I, Or if someone dumped a couple of thousand fraudulent ballots in a vote counting center, is there some statistical analysis that could detect this? Who knows, maybe a whole new field waiting to be explored. A once-in-a-while dive into a practical application of statistics that has current interest can be fun and enlightening for those interested.

Matthew

On 11/16/20 9:01 PM, Abby Spurdle wrote:
I've come to the conclusion this whole thing was a waste of time. This is after evaluating much of the relevant information. The main problem is a large number of red herrings (some in the data, some in the context), leading to pointless data analysis and pointless data collection. It's unlikely that sophisticated software, or sophisticated statistical modelling tools, will make any difference. Although pretty plots, and pretty web-graphics, are achievable. Sorry list, for encouraging this discussion...
Re: [R] analyzing results from Tuesday's US elections
No reason to apologize. It's a timely and very interesting topic that provides a glimpse into the application of statistics in forensics. I had never heard of Benford's Law before and I think it is really fascinating. One of those very counterintuitive rules that show up in statistics and probability, like the Monty Hall problem. Why in the world does Benford's Law work? I have been wondering if it could in any way be applied to biological data analysis. (Also, I discovered Stand-up-maths!) Often things are not as easy to figure out as we may first estimate. I think you would have to start with how you would envision a fraud to be committed and then figure out if there is a statistical analysis that could detect it, or develop an analysis. For example, if a voting machine were weighting votes and giving 8/10ths of a vote to 'yes' and 10/10ths of a vote to 'no', is there some statistical analysis that could detect this? I, Or if someone dumped a couple of thousand fraudulent ballots in a vote counting center, is there some statistical analysis that could detect this? Who knows, maybe a whole new field waiting to be explored. A once-in-a-while dive into a practical application of statistics that has current interest can be fun and enlightening for those interested.

Matthew

On 11/16/20 9:01 PM, Abby Spurdle wrote:
I've come to the conclusion this whole thing was a waste of time. This is after evaluating much of the relevant information. The main problem is a large number of red herrings (some in the data, some in the context), leading to pointless data analysis and pointless data collection. It's unlikely that sophisticated software, or sophisticated statistical modelling tools, will make any difference. Although pretty plots, and pretty web-graphics, are achievable. Sorry list, for encouraging this discussion...
Re: [R] analyzing results from Tuesday's US elections
I really like this guy's video as well. (He also has another nice video critiquing a statistical analysis of vote results from Kent County, Michigan that was presented by a Massachusetts Senate candidate who has some impressive academic credentials.) And continuing in this same vein of the complexities of statistical analysis by intelligent people, here is a video by Mark Nigrini using Benford's analysis on Maricopa County vote results: https://www.youtube.com/watch?v=FrJui5d7BrI&ab_channel=MarkNigrini

If you search for Mark Nigrini on Amazon you will see that he has written a major text on forensic analysis, specifically forensic accounting investigations, that is now in its second edition, as well as an additional two books on analysis with Benford's Law for accounting, auditing, and fraud detection (he plugs the text in the last part of the video). All four books have 4-5 star reviews with 2-48 reviewers. From the tiny amount of reading I have done on Benford's Law, it seems that Nigrini is a leading figure in the use of Benford's Law. In the video he shows that voting results for both Trump and Biden from Maricopa County, AZ agree with Benford's Law. However, he uses the last digit and not the first. A word of caution before you click on that link: he uses Excel!

Matthew

On 11/13/20 9:59 PM, Rolf Turner wrote:
On Thu, 12 Nov 2020 01:23:06 +0100 Martin Møller Skarbiniks Pedersen wrote:
Please watch this video if you wrongly believe that Benford's law easily can be applied to elections results. https://youtu.be/etx0k1nLn78

Just watched this video and found it to be delightfully enlightening and entertaining. (Thank you Martin for posting the link.) However a question springs to mind: why is it the case that Trump's vote counts in Chicago *do* seem to follow Benford's law (at least roughly) when, as is apparently to be expected, Biden's don't? Has anyone any explanation for this? Any ideas?

cheers,
Rolf Turner
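For readers wondering what such a check involves: Benford's Law gives the expected share of numbers whose leading digit is d as log10(1 + 1/d). The base-R sketch below compares that expectation with simulated numbers spread evenly over several orders of magnitude. The simulated data and the helper names first_digit() and last_digit() are assumptions made up for this illustration; nothing here uses or says anything about real election data.

```r
# Expected first-digit proportions under Benford's Law, for d = 1..9.
benford_expected <- log10(1 + 1 / (1:9))

# Leading digit of each positive number.
first_digit <- function(x) {
  x <- x[x > 0]
  floor(x / 10^floor(log10(x)))
}

# Last digit of each whole number (the digit some analyses use instead).
last_digit <- function(x) x %% 10

# Simulated counts spanning four orders of magnitude (10 to 100000).
set.seed(1)
sim <- round(10^runif(5000, min = 1, max = 5))

# Observed versus expected first-digit proportions.
observed <- table(factor(first_digit(sim), levels = 1:9))
print(round(rbind(observed = as.vector(observed) / length(sim),
                  expected = benford_expected), 3))

# Goodness-of-fit test of the observed counts against Benford's proportions.
print(chisq.test(observed, p = benford_expected))
```

Data confined to a narrow range, such as precincts of similar size, do not span several orders of magnitude, which is one reason first-digit tests can mislead when applied to vote counts.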
Re: [R] analyzing results from Tuesday's US elections
Benford Analysis for Data Validation and Forensic Analytics: provides tools that make it easier to validate data using Benford's Law. https://www.rdocumentation.org/packages/benford.analysis/versions/0.1.5

Matthew

On 11/9/20 9:23 AM, Alexandra Thorn wrote:
> This thread strikes me as pretty far off-topic for a forum dedicated to
> software support on R.
>
> https://www.r-project.org/mail.html#instructions
> "The ‘main’ R mailing list, for discussion about problems and solutions
> using R, announcements (not covered by ‘R-announce’ or ‘R-packages’,
> see above), about the availability of new functionality for R and
> documentation of R, comparison and compatibility with S-plus, and for
> the posting of nice examples and benchmarks. Do read the posting guide
> before sending anything!"
>
> https://www.r-project.org/posting-guide.html
> "The R mailing lists are primarily intended for questions and
> discussion about the R software. However, questions about statistical
> methodology are sometimes posted. If the question is well-asked and of
> interest to someone on the list, it may elicit an informative
> up-to-date answer. See also the Usenet groups sci.stat.consult (applied
> statistics and consulting) and sci.stat.math (mathematical stat and
> probability)."
>
> On Mon, 9 Nov 2020 00:53:46 -0500
> Matthew McCormack wrote:
>
>> You can try here:
>> https://decisiondeskhq.com/
>>
>> I think they have what you are looking for. From their website:
>>
>> "Create a FREE account to access up to the minute election results
>> and insights on all U.S. Federal elections. Decision Desk HQ &
>> Øptimus provide live election night coverage, race-specific results
>> including county-level returns, and exclusive race probabilities for
>> key battleground races."
>>
>> Also, this article provides a little, emphasis on little, of
>> statistical analysis of election results, but it may be a place to
>> start.
>>
>> https://www.theepochtimes.com/statistical-anomalies-in-biden-votes-analyses-indicate_3570518.html?utm_source=newsnoe&utm_medium=email&utm_campaign=breaking-2020-11-08-5
>>
>> Matthew
>>
>> On 11/8/20 11:25 PM, Bert Gunter wrote:
>>> NYT had interactive maps that reported votes by county. So try
>>> contacting them.
>>>
>>> Bert
>>>
>>> On Sun, Nov 8, 2020, 8:10 PM Abby Spurdle wrote:
>>>>> such a repository already exists -- the NY Times, AP, CNN, etc.
>>>>> etc. already have interactive web pages that did this
>>>>
>>>> I've been looking for presidential election results, by
>>>> ***county***. I've found historic results, including results for
>>>> 2016.
>>>>
>>>> However, I can't find such a dataset, for 2020.
>>>> (Even though this seems like an obvious thing to publish).
>>>>
>>>> I suspect that the NY Times has the data, but I haven't been able
>>>> to work where the data is on their website, or how to access it.
>>>>
>>>> More ***specific*** suggestions would be appreciated...?
Re: [R] analyzing results from Tuesday's US elections
You can try here: https://decisiondeskhq.com/

I think they have what you are looking for. From their website:

"Create a FREE account to access up to the minute election results and insights on all U.S. Federal elections. Decision Desk HQ & Øptimus provide live election night coverage, race-specific results including county-level returns, and exclusive race probabilities for key battleground races."

Also, this article provides a little, emphasis on little, of statistical analysis of election results, but it may be a place to start.

https://www.theepochtimes.com/statistical-anomalies-in-biden-votes-analyses-indicate_3570518.html?utm_source=newsnoe&utm_medium=email&utm_campaign=breaking-2020-11-08-5

Matthew

On 11/8/20 11:25 PM, Bert Gunter wrote:
> NYT had interactive maps that reported votes by county. So try contacting
> them.
>
> Bert
>
> On Sun, Nov 8, 2020, 8:10 PM Abby Spurdle wrote:
>
>>> such a repository already exists -- the NY Times, AP, CNN, etc. etc.
>>> already have interactive web pages that did this
>>
>> I've been looking for presidential election results, by ***county***.
>> I've found historic results, including results for 2016.
>>
>> However, I can't find such a dataset, for 2020.
>> (Even though this seems like an obvious thing to publish).
>>
>> I suspect that the NY Times has the data, but I haven't been able to
>> work where the data is on their website, or how to access it.
>>
>> More ***specific*** suggestions would be appreciated...?
Re: [R] RNA Seq Analysis in R
As with the previous post, I agree that Bioconductor will be a better place to ask this question. As a quick thought, you might also try adjusting the cutoffs in the last line: DEGs = subset(tT, P.Value < 0.01 & abs(logFC) > 2). You could play around with the P.Value cutoff; 0.01 is pretty low, so you could try 0.05 and maybe abs(logFC) > 1. But first you should print out tT with something like write.table(tT, file = "TopTable.txt", sep = "\t"). This will write tT to a tab-delimited text file (in the directory that you are working in) that you can import into Excel and then inspect the logFC and p-values for the top 1250 genes.

Matthew

On 8/1/20 1:13 PM, Jeff Newmiller wrote:
> https://www.bioconductor.org/help/
>
> On August 1, 2020 4:01:08 AM PDT, Anas Jamshed wrote:
>> I choose microarray data GSE75693 of 30 patients with stable kidney
>> transplantation and 15 with BKVN to identify differentially expressed
>> genes (DEGs). I performed this in GEO2R and find R script there and
>> Runs R script Successfully on R studio as well.
>> The R script is :
>>
>> # Differential expression analysis with limma
>>
>> library(Biobase)
>> library(GEOquery)
>> library(limma)
>> # load series and platform data from GEO
>>
>> gset <- getGEO("GSE75693", GSEMatrix =TRUE, AnnotGPL=TRUE)
>> if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
>> gset <- gset[[idx]]
>> # make proper column names to match toptable
>> fvarLabels(gset) <- make.names(fvarLabels(gset))
>> # group names for all samples
>> gsms <- paste0("00XXX1",
>> "11XXX")
>> sml <- c()
>> for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
>> # eliminate samples marked as "X"
>> sel <- which(sml != "X")
>> sml <- sml[sel]
>> gset <- gset[ ,sel]
>> # log2 transform
>> exprs(gset) <- log2(exprs(gset))
>> # set up the data and proceed with analysis
>> sml <- paste("G", sml, sep="") # set group names
>> fl <- as.factor(sml)
>> gset$description <- fl
>> design <- model.matrix(~ description + 0, gset)
>> colnames(design) <- levels(fl)
>> fit <- lmFit(gset, design)
>> cont.matrix <- makeContrasts(G1-G0, levels=design)
>> fit2 <- contrasts.fit(fit, cont.matrix)
>> fit2 <- eBayes(fit2, 0.01)
>> tT <- topTable(fit2, adjust="fdr", sort.by="B", number=1250)
>>
>> tT <- subset(tT,
>> select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
>> DEGs = subset(tT, P.Value < 0.01 & abs(logFC) > 2)
>>
>> After running this no genes are found plz help me
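The suggestion in this thread (inspect the table first, then relax the cutoffs) can be tried on a stand-in table before rerunning the real analysis. The sketch below uses a tiny hand-made data frame with the same column names as a limma topTable() result. The probe IDs and values are invented for the example, and the output goes to a temporary folder rather than the working directory.

```r
# A stand-in for the 'tT' table from the script (made-up values).
tT <- data.frame(ID      = paste0("probe", 1:5),
                 P.Value = c(0.001, 0.02, 0.04, 0.2, 0.008),
                 logFC   = c(1.5, -1.2, 0.4, 3.0, -2.5))

# The cutoffs used in the original script.
strict <- subset(tT, P.Value < 0.01 & abs(logFC) > 2)

# The more relaxed cutoffs suggested in the reply.
relaxed <- subset(tT, P.Value < 0.05 & abs(logFC) > 1)

print(nrow(strict))
print(nrow(relaxed))

# Write the full table to a tab-delimited file for inspection.
# Note that the file name must be a quoted string.
out <- file.path(tempdir(), "TopTable.txt")
write.table(tT, file = out, sep = "\t", quote = FALSE, row.names = FALSE)
```

Looking at the range of logFC in the written table shows quickly whether any gene could pass abs(logFC) > 2 at all.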
Re: [R] Fwd: Re: transpose and split dataframe
Thank you very much Jim and David for your scripts and accompanying explanations. I was intrigued at the results that came from David's script. As seen below, where I have taken a small piece of his DataTable:

            AT1G69490 AT1G29860 AT4G18170 *AT5G46350*
AT1G01560   0         0         0         1
*AT1G02920* 1         2         2         4
AT1G02930   1         2         2         4
AT1G05675   1         1         1         2

There are numbers other than 1 or 0, which was not what I was expecting. The data I am working with come from downloading results of an analysis done at a particular web site. I looked at Jim's solution, and the equivalent of the above would be:

            AT1G69490 _AT1G29860_ _AT1G29860_ AT4G18170 AT4G18170 *AT5G46350 AT5G46350 AT5G46350 AT5G46350 AT5G46350*
AT1G01560   NA        NA          NA          NA        NA        NA         NA        NA        AT1G01560 NA
*AT1G02920* AT1G02920 AT1G02920   AT1G02920   AT1G02920 AT1G02920 AT1G02920  AT1G02920 AT1G02920 AT1G02920 NA
AT1G02930   AT1G02930 AT1G02930   AT1G02930   AT1G02930 AT1G02930 AT1G02930  AT1G02930 AT1G02930 AT1G02930 NA
AT1G05675   AT1G05675 AT1G05675   NA          AT1G05675 NA        AT1G05675  AT1G05675 NA        NA        NA

The above is the format that I was desiring, but I was not expecting that a single ATG number would be the name of multiple columns. As shown above, _AT1G29860_ is the name of two columns and *AT5G46350* is the name of 5 columns (you may have to widen the e-mail across the screen to see it clearly). When a single ATG number, such as AT5G46350, names multiple columns, the contents of each of those columns may or may not be the same. For example, going across a single row looking at *AT1G02920*, it occurs in the first column, hence the 1 in David's DataTable. It occurs in both AT1G29860 columns, hence the 2 in the DataTable. It again occurs in both AT4G18170 columns, so another 2 in the DataTable, and finally it occurs in only 4 of the 5 AT5G46350 columns, so the 4 in the DataTable. When the same ATG number names multiple columns, it is because different methods were used to determine the content of each column.
So, if an ATG number such as AT1G05675 occurs in all columns with the same name, I then know that this has been shown by multiple methods, and if it occurs in only some of the columns, I know that not all methods associated it with the ATG in the column name. David's result complements Jim's, and both end up being very helpful to me. Thanks again to both of you for your time and help.

Matthew

On 5/2/2019 8:40 PM, Jim Lemon wrote:
> Hi again,
> Just noticed that the NA fill in the original solution is unnecessary, thus:
>
> # split the second column at the commas
> hitsplit<-strsplit(mmdf$hits,",")
> # get all the sorted hits
> allhits<-sort(unique(unlist(hitsplit)))
> tmmdf<-as.data.frame(matrix(NA,ncol=length(hitsplit),nrow=length(allhits)))
> # change the names of the list
> names(tmmdf)<-mmdf$Regulator
> for(column in 1:length(hitsplit)) {
>  hitmatches<-match(hitsplit[[column]],allhits)
>  hitmatches<-hitmatches[!is.na(hitmatches)]
>  tmmdf[hitmatches,column]<-allhits[hitmatches]
> }
>
> Jim
>
> On Fri, May 3, 2019 at 10:32 AM Jim Lemon wrote:
>> Hi Matthew,
>> I'm not sure whether you want something like your initial request or
>> David's solution.
The result of this can be transformed into the >> latter: >> >> mmdf<-read.table(text="Regulator hits >> AT1G69490 >> AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830 >> AT1G29860 >> AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135 >> AT1G2986 >> AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620
Re: [R] Fwd: Re: transpose and split dataframe
Thank you very much, David and Jim, for your work and solutions. I have been working through both of them to better learn R. They both proceed through a similar logic, except David's starts with a character matrix and Jim's with a dataframe, and both end with equivalent dataframes (identical(tmmdf, TF2list2) returns TRUE). They have both been very helpful. However, there is one attribute of my intended final dataframe that is missing. Looking at part of the final dataframe:

head(tmmdf)
  AT1G69490 AT1G29860 AT1G29860.1 AT4G18170 AT4G18170.1 AT5G46350
1 AT4G31950 AT4G31950   AT5G64905 AT4G31950   AT5G64905 AT4G31950
2 AT5G24110 AT5G24110   AT1G21120 AT5G24110   AT1G14540 AT5G24110
3 AT1G26380 AT1G05675   AT1G07160 AT1G05675   AT1G21120 AT1G05675

Row 1 has AT4G31950 in columns 1, 2, 4 and 6, but AT5G64905 in columns 3 and 5. What I was aiming at would be that each row would have a unique entry, so that AT4G31950 is row 1, columns 1, 2, 4 and 6, and NA is row 1, columns 3 and 5. AT5G64905 is row 2, columns 3 and 5, and NA is row 2, columns 1, 2, 4 and 6. So, it would look like this:

head(intended_df)
  AT1G69490 AT1G29860 AT1G29860.1 AT4G18170 AT4G18170.1 AT5G46350
1 AT4G31950 AT4G31950          NA AT4G31950          NA AT4G31950
2        NA        NA   AT5G64905        NA   AT5G64905        NA

I have been trying to adjust the code to get my intended result, basically by trying to build a dataframe one column at a time from each entry in the character matrix, but have not got anything near working yet.

Matthew

On 4/30/2019 6:29 PM, David L Carlson wrote:
> If you read the data frame with read.csv() or one of the other read()
> functions, use the as.is=TRUE argument to prevent conversion to factors. If
> not, do the conversion first:
>
> # Convert factors to characters
> DataMatrix <- sapply(TF2list, as.character)
> # Split the vector of hits
> DataList <- sapply(DataMatrix[, 2], strsplit, split=",")
> # Use the values in Regulator to name the parts of the list
> names(DataList) <- DataMatrix[,"Regulator"]
>
> # Now create a data frame
> # How long is the longest list of hits?
> mx <- max(sapply(DataList, length))
> # Now add NAs to vectors shorter than mx
> DataList2 <- lapply(DataList, function(x) c(x, rep(NA, mx-length(x))))
> # Finally convert back to a data frame
> TF2list2 <- do.call(data.frame, DataList2)
>
> Try this on a portion of the list, say 25 lines, and print each object to see
> what is happening.
>
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
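The aligned layout asked for in this message (one distinct hit per row, NA wherever a regulator lacks that hit) can be sketched in base R roughly as follows. This is only an illustration under assumptions: hit_list is a hypothetical miniature of the real data, already split into character vectors named by regulator, and the %in% test is just one of several ways to get the alignment.

```r
# Hypothetical miniature of the hit lists, named by regulator
hit_list <- list(
  AT1G69490 = c("AT4G31950", "AT5G24110", "AT1G26380"),
  AT1G29860 = c("AT4G31950", "AT5G24110", "AT1G05675"),
  AT4G18170 = c("AT5G64905", "AT1G21120")
)

# One row per distinct hit across all regulators
allhits <- sort(unique(unlist(hit_list)))

# For each regulator, keep the hit where it is present and NA elsewhere
aligned <- as.data.frame(
  lapply(hit_list, function(h) ifelse(allhits %in% h, allhits, NA_character_)),
  stringsAsFactors = FALSE
)
rownames(aligned) <- allhits

print(aligned)
```

Because every column is built against the same allhits vector, a given row can only ever hold one gene ID or NA, which is the property described above.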
[R] Fwd: Re: transpose and split dataframe
Thanks for your reply. I was trying to simplify it a little, but must have got it wrong. Here is the real dataframe, TF2list: str(TF2list) 'data.frame': 152 obs. of 2 variables: $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54 54 82 82 82 82 82 ... $ hits : Factor w/ 97 levels "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"| __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ... And the first few lines resulting from dput(head(TF2list)): dput(head(TF2list)) structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L, 82L), .Label = c("AT1G02065", "AT1G13960", "AT1G18860", "AT1G23380", "AT1G29280", "AT1G29860", "AT1G30650", "AT1G55600", "AT1G62300", "AT1G62990", "AT1G64000", "AT1G66550", "AT1G66560", "AT1G66600", "AT1G68150", "AT1G69310", "AT1G69490", "AT1G69810", "AT1G70510", ... This is another way of looking at the first 4 entries (Regulator is tab-separated from hits): Regulator hits 1 AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830 2 AT1G29860 
AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
3 AT1G2986 AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830

So, the goal would be to first transpose the existing dataframe so that the factor Regulator becomes a column name (column 1 name = AT1G69490, column 2 name = AT1G29860, etc.) and the hits associated with each Regulator become rows. Hits is a comma-separated 'list' (I do not know if technically it is an R list), so it would have to be comma 'unseparated', with each entry becoming a row (col 1 row 1 = AT4G31950, col 1 row 2 = AT5G24110, etc.), like this:

AT1G69490
AT4G31950
AT5G24110
AT1G05675
AT5G64905
...

(I did not include all the rows.)

I think it would be best to actually make the first entry a separate dataframe (1 column with name = AT1G69490 and number of rows depending on the number of hits), then make the second column (column name = AT1G29860, and number of rows depending on the number of hits) into a new dataframe and do a full join of the two dataframes; continue by making the third column (column name = AT1G2986) into a dataframe and full join it with the previous; continue for the 152 observations so that the end result is a dataframe with 152 columns and number of rows depending on the entry with the greatest number of hits. The full joins I can do with dplyr, but getting up to that point seems rather difficult.

This would get me what my ultimate goal would be: each Regulator is a column name (152 columns) and a given row has either NA or the same hit. This seems very difficult to me, but I appreciate any attempt.

Matthew

On 4/30/2019 4:34 PM, David L Carlson wrote:
> I think we need more information. Can you give us the structure of the data
> with str(YourDataFrame)? Alternatively, you could copy a small piece into your
> email message by copying and pasting the results of the following code:
>
> dput(head(YourDataFrame))
>
> The data frame you present could not be a data frame since you say "hits" is
> a factor with a variable number of elements. If each value of "hits" was a
> single character string, it would only have 2 factor levels not 6 and your
> efforts to parse the string would make more sense. Transposing to a data
> frame would only be possible if each column was padded
[R] transpose and split dataframe
I have a data frame that is a lot bigger, but for simplicity's sake we can say it looks like this:

Regulator  hits
AT1G69490  AT4G31950,AT5G24110,AT1G26380,AT1G05675
AT2G55980  AT2G85403,AT4G89223

In other words:

data.frame : 2 obs. of 2 variables
$Regulator: Factor w/ 2 levels
$hits : Factor w/ 6 levels

I want to transpose it so that the Regulator values become the column headings and each of the comma-separated AGI numbers becomes a row. So, AT1G69490 is now the header of the first column, AT4G31950 is row 1 of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is the header of column 2, AT2G85403 is row 1 of column 2, etc.

I have tried playing around with strsplit(TF2list[2:2]) and strsplit(as.character(TF2list[2:2])), but I am getting nowhere.

Matthew
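A minimal sketch of the split-and-transpose step asked about here, in base R. The small TF2list below is a hypothetical stand-in built from the two rows shown in the post, not the real data. One thing worth noting: TF2list[2:2] is a one-column data frame rather than a character vector, which is probably why the strsplit() attempts went nowhere, so the sketch pulls the column out with $ and converts it with as.character() first. Shorter columns are padded with NA so the pieces fit into one data frame.

```r
# Hypothetical stand-in for the simplified data frame in the post
TF2list <- data.frame(
  Regulator = c("AT1G69490", "AT2G55980"),
  hits = c("AT4G31950,AT5G24110,AT1G26380,AT1G05675", "AT2G85403,AT4G89223"),
  stringsAsFactors = TRUE
)

# strsplit() needs a character vector, so convert the factor column first
hit_list <- strsplit(as.character(TF2list$hits), ",", fixed = TRUE)
names(hit_list) <- as.character(TF2list$Regulator)

# Pad every vector with NA up to the length of the longest one
n_max <- max(lengths(hit_list))
padded <- lapply(hit_list, function(x) c(x, rep(NA_character_, n_max - length(x))))

# Each regulator becomes a column, each hit a row
wide <- as.data.frame(padded, stringsAsFactors = FALSE)
print(wide)
```

This only stacks the hits in their original order; it does not line up identical hits on the same row across regulators.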
Re: [R] Define pch and color based on two different columns
You are not late to the party. And you solved it!

Thank you very much. You just made my PhD a little closer to reality!

Matt

Matthew R. Snyder
PhD Candidate
University Fellow
University of Toledo
Computational biologist, ecologist, and bioinformatician
Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
matthew.snyd...@rockets.utoledo.edu
msnyder...@gmail.com

On Tue, Apr 9, 2019 at 9:37 PM Peter Langfelder wrote:
> Sorry for being late to the party, but has anyone suggested a minor
> but important modification of the code from stack exchange?
>
> xyplot(mpg ~ wt | cyl,
>   panel = function(x, y, ..., groups, subscripts) {
>     pch <- mypch[factor(carb)[subscripts]]
>     col <- mycol[factor(gear)[subscripts]]
>     grp <- c(gear,carb)
>     panel.xyplot(x, y, pch = pch, col = col)
>   }
> )
>
> From the little I understand about what you're trying to do, this may
> just do the trick.
>
> Peter
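The difference between factor(carb[subscripts]) and factor(carb)[subscripts] in the reply above comes down to where the factor levels are computed. A small sketch using mtcars, with the six-cylinder rows standing in for one panel's subscripts (that choice of panel is only an illustration):

```r
carb <- mtcars$carb

# Rows that would land in the cyl == 6 panel
subscripts <- which(mtcars$cyl == 6)

# Levels computed from this panel's rows only: codes restart at 1 in every panel
local_codes <- as.integer(factor(carb[subscripts]))

# Levels computed from the whole column, then subset: codes are shared by all panels
global_codes <- as.integer(factor(carb)[subscripts])

print(rbind(carb = carb[subscripts], local = local_codes, global = global_codes))
```

With the local version the same carb value can map to different pch codes in different panels, because each panel only sees the values that happen to occur in it. The global version fixes the mapping once for the whole data set.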
Re: [R] Define pch and color based on two different columns
I tried this too:

xyplot(mpg ~ wt | cyl,
  data=mtcars,
  # groups = carb,
  subscripts = TRUE,
  col = as.factor(mtcars$gear),
  pch = as.factor(mtcars$carb)
)

Same problem...

Matthew R. Snyder

On Tue, Apr 9, 2019 at 8:18 PM Jeff Newmiller wrote:
> Maybe you should use factors rather than character columns.
Re: [R] Define pch and color based on two different columns
I want to have one column in a dataframe define the color and another define the pch. This can be done easily with a single panel:

xyplot(mpg ~ wt,
  data=mtcars,
  col = mtcars$gear,
  pch = mtcars$carb
)

This produces the expected result: two pch that are the same color are unique in the whole plot. But when you add cyl as a factor, those two points are only unique within their respective panels, and not across the whole plot.

Matt

On Tue, Apr 9, 2019 at 9:23 PM Bert Gunter wrote:
> 1. I am quite sure that whatever it is that you want to do can be done.
> Probably straightforwardly. The various R graphics systems are mature and
> extensive.
>
> 2. But I, for one, do not understand from your post what it is that you
> want to do. Nor does anyone else apparently.
>
> Cheers,
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
Re: [R] Define pch and color based on two different columns
Thanks, Jim.

I appreciate your contributed answer, but neither of those makes the desired plot either. I'm actually kind of shocked this isn't an easier, more straightforward thing. It seems like this would be something that a user would want to do frequently. I can actually do this for single plots in ggplot. Maybe I should contact the authors of lattice and see if this is something they can help me with or if they would like to add this as a feature in the future...

Matt

On Tue, Apr 9, 2019 at 4:53 PM Jim Lemon wrote:
> Hi Matthew,
> How about this?
>
> library(lattice)
> xyplot(mpg ~ wt | cyl,
>   data=mtcars,
>   col = mtcars$gear,
>   pch = mtcars$carb
> )
> library(plotrix)
> grange<-range(mtcars$gear)
> xyplot(mpg ~ wt | cyl,
>   data=mtcars,
>   col = color.scale(mtcars$gear,extremes=c("blue","red"),xrange=grange),
>   pch = as.character(mtcars$carb)
> )
>
> Jim
[R] Define pch and color based on two different columns
I am making a lattice plot and I would like to use the value in one column to define the pch and another column to define the color of points. Something like:

xyplot(mpg ~ wt | cyl,
  data=mtcars,
  col = gear,
  pch = carb
)

There are unique pch points in the second and third panels, but these points are only unique within the plots, not among all the plots (as they should be). You can see this if you use the following code:

xyplot(mpg ~ wt | cyl,
  data=mtcars,
  groups = carb
)

This plot looks great for one group, but if you try to invoke two groups using c(gear, carb) I think it simply takes unique combinations of those two variables and plots them as unique colors.

Another solution given by a StackExchange user:

mypch <- 1:6
mycol <- 1:3

xyplot(mpg ~ wt | cyl,
  panel = function(x, y, ..., groups, subscripts) {
    pch <- mypch[factor(carb[subscripts])]
    col <- mycol[factor(gear[subscripts])]
    grp <- c(gear,carb)
    panel.xyplot(x, y, pch = pch, col = col)
  }
)

This solution has the same problems as the code at the top. I think the issue causing problems with both solutions is that not every value for each group is present in each panel, and they are almost never in the same order. I think R is just interpreting the appearance of unique values as a signal to change to the next pch or color. My actual data file is very large, and it's not possible to sort my way out of this mess. It would be best if I could just use the value in two columns to actually define a color or pch for each point on an entire plot. Is there a way to do this?

PS: I had to post this via email because the Nabble site kept sending me an error message: "Message rejected by filter rule match"

Thanks,
Matt
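One way that may give a plot-wide mapping for the question above is to compute the symbol and color for every row once, outside the panel function, and then index those vectors with the subscripts that lattice hands to each panel. This is a sketch under assumptions, not the lattice authors' recommended recipe: pch_all and col_all are names introduced here for illustration, and the integer codes are simply the factor level positions of carb and gear.

```r
library(lattice)

# Map every row of the full data set to a symbol and a color once,
# so the mapping is the same in every panel
pch_all <- as.integer(factor(mtcars$carb))  # one pch code per carb level
col_all <- as.integer(factor(mtcars$gear))  # one color code per gear level

p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            panel = function(x, y, subscripts, ...) {
              # subscripts holds the row numbers of mtcars shown in this panel
              panel.xyplot(x, y,
                           pch = pch_all[subscripts],
                           col = col_all[subscripts])
            })
print(p)
```

lattice passes subscripts to a panel function that declares an argument with that name, so no sorting of the data is needed; each point looks up its own row in the precomputed vectors.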
Re: [R] creating a dataframe with full_join and looping over a list of lists.
This is fantastic! It was exactly what I was looking for. It is part of a larger Shiny app, so difficult to provide a working example as part of the post, and after figuring out how your code works (I am an R novice), I made a couple of small tweaks and it works great! Thank you very much, Jim, for the work you put into this.

Matthew

On 3/21/2019 11:01 PM, Jim Lemon wrote:

Hi Matthew,
Remember, keep it on the list so that people know the status of the request.
I couldn't get this to work with the "_source_info_" variable. It seems to be unreadable as a variable name. So, this _may_ be what you want. I don't know if it can be done with "merge" and I don't know the function "full_join".

WRKY8_colamp_a<-as.character(
 c("AT1G02920","AT1G06135","AT1G07160","AT1G11925","AT1G14540","AT1G16150",
 "AT1G21120","AT1G26380","AT1G26410","AT1G35210","AT1G49000","AT1G51920",
 "AT1G56250","AT1G66090","AT1G72520","AT1G80840","AT2G02010","AT2G18690",
 "AT2G30750","AT2G39200","AT2G43620","AT3G01830","AT3G54150","AT3G55840",
 "AT4G03460","AT4G11470","AT4G11890","AT4G14370","AT4G15417","AT4G15975",
 "AT4G31940","AT4G35180","AT5G01540","AT5G05300","AT5G11140","AT5G24110",
 "AT5G25250","AT5G36925","AT5G46295","AT5G64750","AT5G64905","AT5G66020"))
bHLH10_col_a<-as.character(c("AT1G72520","AT3G55840","AT5G20230","AT5G64750"))
bHLH10_colamp_a<-as.character(
 c("AT1G01560","AT1G02920","AT1G16420","AT1G17147","AT1G35210","AT1G51620",
 "AT1G57630","AT1G72520","AT2G18690","AT2G19190","AT2G40180","AT2G44370",
 "AT3G23250","AT3G55840","AT4G03460","AT4G04480","AT4G04540","AT4G08555",
 "AT4G11470","AT4G11890","AT4G16820","AT4G23280","AT4G35180","AT5G01540",
 "AT5G05300","AT5G20230","AT5G22530","AT5G24110","AT5G56960","AT5G57010",
 "AT5G57220","AT5G64750","AT5G66020"))
# let myenter be the sorted superset
myenter<-
 sort(unique(c(WRKY8_colamp_a,bHLH10_col_a,bHLH10_colamp_a)))
splice<-function(x,y) {
 nx<-length(x)
 ny<-length(y)
 newy<-rep(NA,nx)
 if(ny) {
  yi<-1
  for(xi in 1:nx) {
   if(x[xi] == y[yi]) {
    newy[xi]<-y[yi]
    yi<-yi+1
   }
   if(yi>ny) break()
  }
 }
 return(newy)
}
comatgs<-list(WRKY8_colamp_a=WRKY8_colamp_a,
 bHLH10_col_a=bHLH10_col_a,bHLH10_colamp_a=bHLH10_colamp_a)
mydf3<-data.frame(myenter,stringsAsFactors=FALSE)
for(j in 1:length(comatgs)) {
 tmp<-data.frame(splice(myenter,sort(comatgs[[j]])))
 names(tmp)<-names(comatgs)[j]
 mydf3<-cbind(mydf3,tmp)
}

Jim

On Fri, Mar 22, 2019 at 10:29 AM Matthew wrote:

Hi Jim,
Thanks for the reply. That was pretty dumb of me. I took that out of the loop.
comatgs is longer than this but here is a sample of 4 of 569 elements:

$WRKY8_colamp_a
 [1] "AT1G02920" "AT1G06135" "AT1G07160" "AT1G11925" "AT1G14540" "AT1G16150" "AT1G21120"
 [8] "AT1G26380" "AT1G26410" "AT1G35210" "AT1G49000" "AT1G51920" "AT1G56250" "AT1G66090"
[15] "AT1G72520" "AT1G80840" "AT2G02010" "AT2G18690" "AT2G30750" "AT2G39200" "AT2G43620"
[22] "AT3G01830" "AT3G54150" "AT3G55840" "AT4G03460" "AT4G11470" "AT4G11890" "AT4G14370"
[29] "AT4G15417" "AT4G15975" "AT4G31940" "AT4G35180" "AT5G01540" "AT5G05300" "AT5G11140"
[36] "AT5G24110" "AT5G25250" "AT5G36925" "AT5G46295" "AT5G64750" "AT5G64905" "AT5G66020"

$`_source_info_`
character(0)

$bHLH10_col_a
[1] "AT1G72520" "AT3G55840" "AT5G20230" "AT5G64750"

$bHLH10_colamp_a
[1] "AT1G01560" "AT1G02920" "AT1G16420" "AT1G17147" "AT1G35210" "AT1G51620" "AT1G57630"
[8] "AT1G72520" "AT2G18690" "AT2G19190" "AT2G40180" "AT2G44370" "AT3G2325
[R] creating a dataframe with full_join and looping over a list of lists.
My apologies, my first e-mail was formatted very poorly when sent, so I am trying again with something I hope will be less confusing.

I have been trying to create a data frame by looping through a list of lists, and using dplyr's full_join so as to keep common elements on the same row. But I have a couple of problems.

1) The lists have different numbers of elements.
2) In the final data frame, I would like the column names to be the names of the lists.

Is it possible?

Code:

for(j in avector){
  mydf3 <- data.frame(myenter)
  atglsts <- as.data.frame(comatgs[j])
  mydf3 <- full_join(mydf3, atglsts)
}

Explanation:

# Start out with a list, myenter, and convert it to a data frame. mydf3 now has 1 column.
# This first column will be the longest column in the final mydf3.
# Loop through a list of lists, comatgs, and with each loop a particular list
# is made into a data frame of one column, atglsts.
# The name of the column is the name of the list.
# Each atglsts data frame has a different number of elements.
# What I want to do is add the newly made data frame, atglsts, as a
# new column of the data frame mydf3, using full_join
# in order to keep common elements on the same row.
# I could rename the column to 'AGI' so that I can join by 'AGI',
# but then I would lose the name of the list.
# In the final data frame, I want to know the name of the original list
# the column was made from.

Matthew
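For readers following the thread, the layout described above can also be reached without a join, by matching each list against the sorted superset. This is only a minimal base R sketch under assumptions: comatgs is taken to be a named list of character vectors, the gene IDs and list names below are made up for illustration, and align_to is a hypothetical helper name that does not appear in the thread.

```r
# Toy stand-in for comatgs (made-up IDs; one list is empty, like `_source_info_`)
comatgs <- list(
  listA = c("AT1G01", "AT1G03"),
  listB = c("AT1G02", "AT1G03", "AT1G04"),
  listC = character(0)
)

# The sorted superset plays the role of myenter
myenter <- sort(unique(unlist(comatgs, use.names = FALSE)))

# Hypothetical helper: keep an ID on its row if the list contains it, else NA
align_to <- function(ids, ref) ifelse(ref %in% ids, ref, NA_character_)

# One aligned column per list; the list names become the column names
cols <- lapply(comatgs, align_to, ref = myenter)
mydf3 <- data.frame(myenter = myenter, cols, stringsAsFactors = FALSE)
print(mydf3)
```

Because every column is built against the same reference vector, lists of different lengths are not a problem, and the list names survive as column names (list names that are not syntactic R names may be altered by data.frame unless check.names = FALSE is set).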
[R] creating a dataframe with full_join and looping over a list of lists
I have been trying create a dataframe by looping through a list of lists, and using dplyr's full_join so as to keep common elements on the same row. But, I have a couple of problems. 1) The lists have different numbers of elements. 2) In the final dataframe, I would like the column names to be the names of the lists. Is it possible ? for(j in avector){ mydf3 <- data.frame(myenter) # Start out with a list, myenter, to dataframe. mydf3 now has 1 column. # This first column will be the longest column in the final mydf3. atglsts <- as.data.frame(comatgs[j]) # Loop through a list of lists, comatgs, and with each loop a particular list # is made into a dataframe of one column, atglsts. # The name of the column is the name of the list. # Each atglsts dataframe has a different number of elements. mydf3 <- full_join(mydf3, atglsts) # What I want to do, is to add the newly made dataframe, atglsts, as a } # new column of the data frame, mydf3 using full_join # in order to keep common elements on the same row. # I could rename the colname to 'AGI' so that I can join by 'AGI', # but then I would lose the name of the list. # In the final dataframe, I want to know the name of the original list # the column was made from. Matthew [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Defining Variables from a Matrix for 10-Fold Cross Validation
Good afternoon, I am trying to run a 10-fold CV, using a matrix as my data set. Essentially, I want "y" to be the first column of the matrix, and my "x" to be all remaining columns (2-257). I've posted some of the code I used below, and the data set (called "zip.train") is in the "ElemStatLearn" package. The error message is highlighted in red, and the corresponding section of code is bolded. (I am not concerned with the warning message, just the error message). The issue I am experiencing is the error message below the code: I haven't come across that specific message before, and am not exactly sure how to interpret its meaning. What exactly is this error message trying to tell me? Any suggestions or insights are appreciated! Thank you all, Matthew Campbell > library (ElemStatLearn) > library(kknn) > data(zip.train) > train=zip.train[which(zip.train[,1] %in% c(2,3)),] > test=zip.test[which(zip.test[,1] %in% c(2,3)),] > nfold = 10 > infold = sample(rep(1:10, length.out = (x))) Warning message: In rep(1:10, length.out = (x)) : first element used of 'length.out' argument > *> mydata = data.frame(x = train[ , c(2,257)] , y = train[ , 1])* > > K = 20 > errorMatrix = matrix(NA, K, 10) > > for (l in nfold) + { + for (k in 1:20) + { + knn.fit = kknn(y ~ x, train = mydata[infold != l, ], test = mydata[infold == l, ], k = k) + errorMatrix[k, l] = mean((knn.fit$fitted.values - mydata$y[infold == l])^2) + } + } Error in model.frame.default(formula, data = train) : variable lengths differ (found for 'x') [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
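A note on the error, offered as a likely reading rather than a confirmed diagnosis: the formula y ~ x asks for a variable called x, but data.frame(x = <matrix>) creates columns named x.1, x.2, and so on, so R most likely falls back to the workspace object x (the one used in length.out), whose length differs from y. The sketch below uses synthetic data in place of zip.train and lm() as a plain stand-in for kknn(), so it runs without extra packages. It also shows three other things that look unintended in the posted code: c(2,257) selects two columns rather than 2:257, length.out should be the number of rows, and for (l in nfold) runs only once.

```r
set.seed(1)

# Synthetic stand-in for zip.train: column 1 is the response, columns 2-257 the predictors
n <- 600
train <- cbind(sample(c(2, 3), n, replace = TRUE), matrix(rnorm(n * 256), nrow = n))

nfold <- 10
# One fold label per row, in random order
infold <- sample(rep(1:nfold, length.out = nrow(train)))

# 2:257 selects all predictor columns; c(2, 257) would select only two of them
mydata <- data.frame(y = train[, 1], x = train[, 2:257])

errors <- numeric(nfold)
for (l in 1:nfold) {                       # 1:nfold, not just nfold
  # "y ~ ." uses every other column of the data frame as a predictor.
  # lm() is only a stand-in here; a kknn() call would take its place.
  fit  <- lm(y ~ ., data = mydata[infold != l, ])
  pred <- predict(fit, newdata = mydata[infold == l, ])
  errors[l] <- mean((pred - mydata$y[infold == l])^2)
}
print(round(errors, 3))
```

With kknn the same data frame and the formula y ~ . should carry over, with train = mydata[infold != l, ] and test = mydata[infold == l, ] as in the original loop.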
Re: [R] security using R at work
Hi Katherina. Good point you make. What makes your IT department happy with the use of R studio server? What are the safe packages? Can I trust your answer? :) John. On 9 Aug 2018 10:38, "Fritsch, Katharina (NNL) via R-help" < r-help@r-project.org> wrote: > Hiya, > I work in a very security conscious organisation and we happily use R. The > average user can only use R via RStudio Server, with a limited number of > packages available, so that adds an additional level of control. > That said, are you sure that the sentence 'a few people on a mailing list > said it would be alright' is going to convince your IT department of the > harmlessness of R? > Cheers, > Katharina. > > -- > > Dr Katharina Fritsch B.Sc. M.Sc. MRSC > Chemical Modeller, Chemical and Process Modelling > > > E. > katharina.frit...@nnl.co.uk > T. > +44 (0)1925 289387 > @uknnl > > National Nuclear Laboratory Limited, 5th Floor, Chadwick House, > Birchwood Park, Warrington, WA3 6AE, UK > > www.nnl.co.uk > > > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Laurence > Clark > Sent: 08 August 2018 16:10 > To: 'r-help@r-project.org' > Subject: [R] security using R at work > > Hello all, > > I want to download R and use it for work purposes. I hope to use it to > analyse very sensitive data from our clients. > > My question is: > > If I install R on my work network computer, will the data ever leave our > network? I need to know if the data goes anywhere other than our network, > because this could compromise it's security. Is there is any chance the > data could go to a server owned by 'R' or anything else that's not > immediately obvious, but constitutes the data leaving our network? 
> > Thank you > > Laurence > > -- > Laurence Clark > Business Data Analyst > Account Management > Health Management Ltd
Re: [R] sub/grep question: extract year
So there is probably a command that resets the capture variables as I call them. No doubt someone will write what it is. On 9 Aug 2018 10:36, "john matthew" wrote: > Hi Marc. > For question 1. > I know in Perl that regular expressions when captured can be saved if not > overwritten. \\1 is the capture variable in your R examples. > > So the 2nd regular expression does not match but \\1 still has 1980 > captured from the previous expression, hence the result. > > Maybe if you restart R and try your 2nd expression first, \\1 will be > empty or no match result. > > Just speculation :) > > John > > > On 9 Aug 2018 08:58, "Marc Girondot via R-help" > wrote: > >> Hi everybody, >> >> I have some questions about the way that sub is working. I hope that >> someone has the answer: >> >> 1/ Why the second example does not return an empty string ? There is no >> match. >> >> subtext <- "-1980-" >> sub(".*(1980).*", "\\1", subtext) # return 1980 >> sub(".*(1981).*", "\\1", subtext) # return -1980- >> >> 2/ Based on sub documentation, it replaces the first occurence of a >> pattern: why it does not return 1980 ? >> >> subtext <- " 1980 1981 " >> sub(".*(198[01]).*", "\\1", subtext) # return 1981 >> >> 3/ I want extract year from text; I use: >> >> subtext <- "bla 1980 bla" >> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) # >> return 1980 >> subtext <- "bla 2010 bla" >> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) # >> return 2010 >> >> but >> >> subtext <- "bla 1010 bla" >> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) # >> return 1010 >> >> I would like exclude the case 1010 and other like this. >> >> The solution would be: >> >> 18[0-9][0-9] or 19[0-9][0-9] or 200[0-9] or 201[0-9] >> >> Is there a solution to write such a pattern in grep ? 
>> >> Thanks a lot >> >> Marc
Re: [R] sub/grep question: extract year
Hi Marc. For question 1. I know in Perl that regular expressions when captured can be saved if not overwritten. \\1 is the capture variable in your R examples. So the 2nd regular expression does not match but \\1 still has 1980 captured from the previous expression, hence the result. Maybe if you restart R and try your 2nd expression first, \\1 will be empty or no match result. Just speculation :) John On 9 Aug 2018 08:58, "Marc Girondot via R-help" wrote: > Hi everybody, > > I have some questions about the way that sub is working. I hope that > someone has the answer: > > 1/ Why the second example does not return an empty string ? There is no > match. > > subtext <- "-1980-" > sub(".*(1980).*", "\\1", subtext) # return 1980 > sub(".*(1981).*", "\\1", subtext) # return -1980- > > 2/ Based on sub documentation, it replaces the first occurence of a > pattern: why it does not return 1980 ? > > subtext <- " 1980 1981 " > sub(".*(198[01]).*", "\\1", subtext) # return 1981 > > 3/ I want extract year from text; I use: > > subtext <- "bla 1980 bla" > sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) # > return 1980 > subtext <- "bla 2010 bla" > sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) # > return 2010 > > but > > subtext <- "bla 1010 bla" > sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) # > return 1010 > > I would like exclude the case 1010 and other like this. > > The solution would be: > > 18[0-9][0-9] or 19[0-9][0-9] or 200[0-9] or 201[0-9] > > Is there a solution to write such a pattern in grep ? > > Thanks a lot > > Marc > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posti > ng-guide.html > and provide commented, minimal, self-contained, reproducible code. 
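A short sketch that may help with the three questions in this thread. As far as the documented behaviour of sub() goes, nothing is carried over between calls: when the pattern does not match, the input is returned unchanged, which is why the second call gives "-1980-". The leading ".*" is greedy, which is why the last year is captured in question 2. For question 3, matching the year directly with regexpr()/regmatches() avoids both issues. The name extract_year and the exact year range are assumptions chosen to mirror Marc's list (18xx, 19xx, 200x, 201x).

```r
subtext <- "-1980-"
sub(".*(1981).*", "\\1", subtext)   # no match, so the input comes back unchanged

# Years 1800-1999 and 2000-2019, bounded so that "21980" or "1010" do not match
year_pattern <- "\\b((18|19)[0-9]{2}|20[01][0-9])\\b"

# Hypothetical helper: first matching year in each string, NA where there is none
extract_year <- function(x) {
  m <- regexpr(year_pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

extract_year(c("bla 1980 bla", "bla 2010 bla", "bla 1010 bla", " 1980 1981 "))
```

regexpr() reports the first match in each string, so " 1980 1981 " yields 1980 here, and strings without a valid year give NA instead of echoing the input.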
Re: [R] security using R at work
Hello Laurence. Taking a pragmatic approach: if the data is so valuable and secret but also needs some analysis in R, here are some suggested steps to minimise security risks.

1. Plan the analysis up front: exactly what you want and the outcomes.
2. Take a laptop with Internet access and install R and all packages needed for the planned analysis.
3. Unplug ethernet and turn off Bluetooth and wifi, so there is no internet access at all.
4. Bring your secret data via USB or CD.
5. Perform the R analysis and export reports, figures etc. to a safe place.
6. Delete R, the data and all packages from the laptop before using it online again.

A bit extreme, and there may still be some risk, but it is minimal as the analysis was done offline and you removed R etc. afterwards. But you now have a set of R results. Just an idea.

John.

On 8 Aug 2018 16:53, "Laurence Clark" wrote:
> Hello all,
>
> I want to download R and use it for work purposes. I hope to use it to
> analyse very sensitive data from our clients.
>
> My question is:
>
> If I install R on my work network computer, will the data ever leave our
> network? I need to know if the data goes anywhere other than our network,
> because this could compromise it's security. Is there is any chance the
> data could go to a server owned by 'R' or anything else that's not
> immediately obvious, but constitutes the data leaving our network?
>
> Thank you
>
> Laurence
>
> --
> Laurence Clark
> Business Data Analyst
> Account Management
> Health Management Ltd
Re: [R] Breaking the samplesize package from CRAN
Dear Bert, Thanks for your answer, I already wrote to the maintainer/author of samplesize, Ralph Scherer, on Thu, Apr 19, 2018 but still have no answer. Does anyone have any ideas? Thank you. John. On 26 July 2018 at 20:18, Bert Gunter wrote: > Suggest you contact the package maintainer. > > ?maintainer > > Cheers, > Bert > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > On Thu, Jul 26, 2018 at 9:49 AM, john matthew via R-help > wrote: >> >> Hello all, >> >> I am using the samplesize package (n.ttest function) to calculate >> number of samples per group power analysis (t-tests with unequal >> variance). >> I can break this n.ttest function from the samplesize package, >> depending on the standard deviations I input. >> >> This works very good. >> >> n.ttest(sd1 = 0.35, sd2 = 0.22 , variance = "unequal") >> # outputs >> $`Total sample size` >> [1] 8 >> >> $`Sample size group 1` >> [1] 5 >> >> $`sample size group 2` >> [1] 3 >> >> Warning message: >> In n.ttest(sd1 = 0.35, sd2 = 0.22, variance = "unequal") : >> Arguments -fraction- and -k- are not used, when variances are unequal >> The warnings are fine and all is good. >> >> >> But if I run it again with. >> n.ttest(sd1 = 1.68, sd2 = 0.28 , variance = "unequal") >> # outputs >> Error in while (n.start <= n.temp) { : >> missing value where TRUE/FALSE needed >> In addition: Warning messages: >> 1: In n.ttest(sd1 = 1.68, sd2 = 0.28, variance = "unequal") : >> Arguments -fraction- and -k- are not used, when variances are unequal >> 2: In qt(conf.level, df = df_approx) : NaNs produced >> 3: In qt(power, df = df_approx) : NaNs produced >> >> It breaks. >> The first obvious thing is that the standard deviations are a lot >> different in the 2nd example that breaks, compared with the first run. 
>> >> Checking the code myself, I can see it breaks down when the variable >> "df_approx" becomes a negative number, in a while loop from the >> n.ttest function. >> Exert of the code I am talking about. >> >> while (n.start <= n.temp) { >> n.start <- n1 + n2 + 1 >> n1 <- n.start/(1 + k) >> n2 <- (k * n.start)/(1 + k) >> df_approx <- 1/((gamma)^2/(n1 - 1) + (1 - gamma)^2/(n2 - 1)) # >> this calculation becomes negative and breaks subsequently >> tkrit.alpha <- qt(conf.level, df = df_approx) >> tkrit.beta <- qt(power, df = df_approx) >> n.temp <- ((tkrit.alpha + tkrit.beta)^2)/(c^2) >> } >> >> I can hard code df_approx to be an absolute value but I don't know if >> that messes up the statistics. >> >> Can anyone help or any ideas? How to fix? >> >> John. >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Breaking the samplesize package from CRAN
Hello all,

I am using the samplesize package (n.ttest function) to calculate the number of samples per group for a power analysis (t-tests with unequal variance). I can break this n.ttest function from the samplesize package, depending on the standard deviations I input.

This works very well.

n.ttest(sd1 = 0.35, sd2 = 0.22 , variance = "unequal")
# outputs
$`Total sample size`
[1] 8

$`Sample size group 1`
[1] 5

$`sample size group 2`
[1] 3

Warning message:
In n.ttest(sd1 = 0.35, sd2 = 0.22, variance = "unequal") :
  Arguments -fraction- and -k- are not used, when variances are unequal

The warnings are fine and all is good.

But if I run it again with

n.ttest(sd1 = 1.68, sd2 = 0.28 , variance = "unequal")
# outputs
Error in while (n.start <= n.temp) { :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In n.ttest(sd1 = 1.68, sd2 = 0.28, variance = "unequal") :
  Arguments -fraction- and -k- are not used, when variances are unequal
2: In qt(conf.level, df = df_approx) : NaNs produced
3: In qt(power, df = df_approx) : NaNs produced

it breaks. The first obvious thing is that the standard deviations differ much more in the second example, which breaks, than in the first run.

Checking the code myself, I can see it breaks down when the variable "df_approx" becomes a negative number, in a while loop in the n.ttest function. Excerpt of the code I am talking about:

while (n.start <= n.temp) {
  n.start <- n1 + n2 + 1
  n1 <- n.start/(1 + k)
  n2 <- (k * n.start)/(1 + k)
  df_approx <- 1/((gamma)^2/(n1 - 1) + (1 - gamma)^2/(n2 - 1)) # this calculation becomes negative and breaks subsequently
  tkrit.alpha <- qt(conf.level, df = df_approx)
  tkrit.beta <- qt(power, df = df_approx)
  n.temp <- ((tkrit.alpha + tkrit.beta)^2)/(c^2)
}

I can hard code df_approx to be an absolute value, but I don't know if that messes up the statistics.

Can anyone help, or does anyone have any ideas on how to fix this?

John.
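The failure chain described above can be reproduced in isolation. This is a sketch under stated assumptions: welch_df is a hypothetical name that copies only the df_approx formula from the excerpt, and the input values are chosen for illustration, not taken from the package's internal state. The point is that a fractional group size below 1 makes one denominator negative, which can make the approximate degrees of freedom negative, after which qt() returns NaN and the while condition cannot be evaluated.

```r
# Same formula as df_approx in the excerpt (hypothetical wrapper name)
welch_df <- function(n1, n2, gamma) {
  1 / (gamma^2 / (n1 - 1) + (1 - gamma)^2 / (n2 - 1))
}

# Illustrative values only: a very unbalanced allocation where n2 < 1
df_bad <- welch_df(n1 = 3, n2 = 0.5, gamma = 0.5)
df_bad                                   # negative degrees of freedom

# qt() is undefined for negative df and returns NaN (with a warning)
t_bad <- suppressWarnings(qt(0.95, df = df_bad))
is.nan(t_bad)

# A comparison with NaN gives NA, which is what while() rejects
is.na(1 <= t_bad)
```

Taking abs(df_approx) would silence the error, but it is not clear that the resulting sample sizes mean anything. A more defensible guard, also only a suggestion, would be to require both n1 and n2 to be at least 2 before the formula is applied; whether that matches the method the package intends is a question for the maintainer.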
Re: [R] fwrite() not found in data.table package
Thanks Jeff! It turns out that my problem was that I tried to install the newest data.table package while the old data.table package was loaded in R. Full instructions for installing data.table are here: https://github.com/Rdatatable/data.table/wiki/Installation On Mon, Oct 2, 2017 at 10:55 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote: > You are asking about (a) a contributed package (b) for a package version > that is not in CRAN and (c) an R version that is outdated, which stretches > the definition of "on topic" here. Since that function does not appear to > have been removed from that package (I am not installing a development > version to test if it is broken for your benefit), I will throw out a guess > that if you update R to 3.4.1 or 3.4.2 then things might start working. If > not, I suggest you use the CRAN version of the package and create a > reproducible example (check it with package reprex) and try again here, or > ask one of the maintainers of that package. > -- > Sent from my phone. Please excuse my brevity. > > On October 2, 2017 8:56:46 AM PDT, Matthew Keller <mckellerc...@gmail.com> > wrote: > >Hi all, > > > >I used to use fwrite() function in data.table but I cannot get it to > >work > >now. The function is not in the data.table package, even though a help > >page > >exists for it. My session info is below. Any ideas on how to get > >fwrite() > >to work would be much appreciated. Thanks! 
> > > >> sessionInfo() > >R version 3.2.0 (2015-04-16) > >Platform: x86_64-unknown-linux-gnu (64-bit) > >Running under: Red Hat Enterprise Linux Server release 6.3 (Santiago) > > > >locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > >LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 > >LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 > >LC_PAPER=en_US.UTF-8 > > [8] LC_NAME=C LC_ADDRESS=C > >LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 > >LC_IDENTIFICATION=C > > > >attached base packages: > >[1] stats graphics grDevices utils datasets methods base > > > >other attached packages: > >[1] data.table_1.10.5 > > > >loaded via a namespace (and not attached): > >[1] tools_3.2.0 chron_2.3-47 tcltk_3.2.0 > -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
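For anyone hitting the same symptom, a quick way to check whether the namespace that R can currently load actually exports a function is sketched below. has_export is a hypothetical helper name, not part of data.table or base R, and the example call uses the stats package so that it runs on any installation.

```r
# Hypothetical helper: TRUE if package `pkg` can be loaded and exports `fn`
has_export <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

has_export("stats", "median")

# The same check for the case in this thread would be:
#   has_export("data.table", "fwrite")
# If it is FALSE although the installed version should provide it, a stale copy
# may still be loaded. Restarting R before reinstalling is the simplest remedy:
#   install.packages("data.table")   # run in a fresh session
#   packageVersion("data.table")
```

This is consistent with the resolution reported above: the new version was installed while the old one was still loaded in the session.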
[R] fwrite() not found in data.table package
Hi all, I used to use fwrite() function in data.table but I cannot get it to work now. The function is not in the data.table package, even though a help page exists for it. My session info is below. Any ideas on how to get fwrite() to work would be much appreciated. Thanks! > sessionInfo() R version 3.2.0 (2015-04-16) Platform: x86_64-unknown-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.3 (Santiago) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8LC_PAPER=en_US.UTF-8 [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] data.table_1.10.5 loaded via a namespace (and not attached): [1] tools_3.2.0 chron_2.3-47 tcltk_3.2.0 -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Why does residuals.coxph use naive.var?
Hi all,

I noticed that the scaled Schoenfeld residuals produced by residuals.coxph(fit, type="scaledsch") were different from those returned by cox.zph for a model where robust standard errors have been estimated. Looking at the source code for both functions suggests this is because residuals.coxph uses the naive variance to scale the Schoenfeld residuals whereas cox.zph uses the robust version when it is available.

Lines 20-21 of the version of residuals.coxph currently on github:

vv <- drop(object$naive.var)
if (is.null(vv)) vv <- drop(object$var)

i.e. the naive variance is used even when a robust version is available. Why is this the case? Have I missed something? Am I right in thinking that using the robust variance is the better choice if the intention is to check the proportional hazards assumption?

Here is a reproducible example using the heart data:

data(heart)
fit <- coxph(Surv(start, stop, event) ~ year + age + surgery + cluster(id), data=jasa1)
# Should return TRUE since both produce the scaled Schoenfeld residuals
all(residuals(fit, type='scaledsch') == cox.zph(fit)$y)

Thanks for your help.
Re: [R] use value in variable to be name of another variable
Hi Rolf, Thanks for the warning. I think because my initial efforts used the assign function, that Jim provided his solution using it. Any suggestions for how it could be done without assign() ? Matthew On 7/11/2016 6:31 PM, Rolf Turner wrote: On 12/07/16 10:13, Matthew wrote: Hi Jim, Wow ! And it does exactly what I was looking for. Thank you very much. That assign function is pretty nice. I should become more familiar with it. Indeed you should, and assign() is indeed nice and useful and handy. But it should be used with care and circumspection. It *alters the global environment* which is fraught with peril. Generally speaking most things that can be done with assign() (and its companion function get()) are better and more safely done using lists and functions and other "natural" R-ish constructs. Resist the temptation to turn R into a macro language. cheers, Rolf Turner __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
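One way to do this without assign(), along the lines Rolf suggests, is to keep the columns in a named list. The sketch below reuses the small test matrix from Jim's reply in place of the real 265 x 2666 data; the name targets is an assumption introduced here.

```r
# Small test matrix as in Jim's example: row 1 holds the names, the rest the values
tTargTFS <- matrix(paste("A", rep(1:4, each = 4), "B", rep(1:4, 4), sep = ""), ncol = 4)

# One list element per column, holding everything below the first row
targets <- lapply(seq_len(ncol(tTargTFS)), function(i) tTargTFS[-1, i])

# Name each element after the value in the first row of its column
names(targets) <- tTargTFS[1, ]

targets[["A1B1"]]    # look a column up by name
names(targets)       # all names, without touching the global environment
```

The list can then be processed with lapply() or sapply(), which is usually easier than tracking 2666 separate variables.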
Re: [R] use value in variable to be name of another variable
Hi Jim, Wow ! And it does exactly what I was looking for. Thank you very much. That assign function is pretty nice. I should become more familiar with it. Matthew On 7/11/2016 5:59 PM, Jim Lemon wrote: Hi Matthew, This question is a bit mysterious as we don't know what the object "chr" is. However, have a look at this and see if it is close to what you want to do. # set up a little matrix of character values tTargTFS<-matrix(paste("A",rep(1:4,each=4),"B",rep(1:4,4),sep=""),ncol=4) # try the assignment on the first row and column assign(tTargTFS[1,1],tTargTFS[-1,1]) # see what it looks like - okay A1B1 # run the assignment over the matrix for(i in 1:4) assign(tTargTFS[1,i],tTargTFS[-1,i]) # see what the variables look like A1B1 A2B1 A3B1 A4B1 It does what I would expect. Jim On Tue, Jul 12, 2016 at 6:01 AM, Matthew <mccorm...@molbio.mgh.harvard.edu> wrote: I want to get a value that has been assigned to a variable, and then use that value to be the name of a variable. For example, tTargTFS[1,1] # returns: V1 "AT1G01010" Now, I want to make AT1G01010 the name of a variable: AT1G01010 <- tTargTFS[-1,1] Then, go to the next tTargTFS[1,2]. Which produces V1 "AT1G01030" And then, AT1G01030 <- tTargTFS[-1,2] I want to do this up to tTargTFS[1, 2666], so I want to do this in a script and not manually. tTargTFS is a list of 2: chr [1:265, 1:2666], but I also have the data in a data frame of 265 observations of 2666 variables, if this data structure makes things easier. My initial attempts are not working. Starting with a test data structure that is a little simpler I have tried: for (i in 1:4) { ATG <- tTargTFS[1, i] assign(cat(ATG), tTargTFS[-1, i]) } Matthew __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
[R] use value in variable to be name of another variable
I want to get a value that has been assigned to a variable, and then use that value to be the name of a variable. For example, tTargTFS[1,1] # returns: V1 "AT1G01010" Now, I want to make AT1G01010 the name of a variable: AT1G01010 <- tTargTFS[-1,1] Then, go to the next tTargTFS[1,2]. Which produces V1 "AT1G01030" And then, AT1G01030 <- tTargTFS[-1,2] I want to do this up to tTargTFS[1, 2666], so I want to do this in a script and not manually. tTargTFS is a list of 2: chr [1:265, 1:2666], but I also have the data in a data frame of 265 observations of 2666 variables, if this data structure makes things easier. My initial attempts are not working. Starting with a test data structure that is a little simpler I have tried: for (i in 1:4) { ATG <- tTargTFS[1, i] assign(cat(ATG), tTargTFS[-1, i]) } Matthew __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
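A note on why the loop above does not work: cat() prints its argument and returns NULL, so assign(cat(ATG), ...) is handed NULL instead of a name. Passing the character string itself is enough. The sketch below uses a made-up 3 x 2 matrix with the two IDs from the post as the first row, and writes into a separate environment (the name gene_env is an assumption) so that the global workspace is not filled with thousands of objects.

```r
# Made-up stand-in: the first row holds the names, the remaining rows the values
tTargTFS <- matrix(c("AT1G01010", "x1", "x2",
                     "AT1G01030", "y1", "y2"), ncol = 2)

gene_env <- new.env()                 # dedicated environment instead of the workspace

for (i in seq_len(ncol(tTargTFS))) {
  ATG <- tTargTFS[1, i]               # a character string such as "AT1G01010"
  assign(ATG, tTargTFS[-1, i], envir = gene_env)   # pass the string, not cat(ATG)
}

ls(gene_env)
get("AT1G01010", envir = gene_env)
```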
Re: [R] identify duplicate entries in data frame and calculate mean
Thank you very much, Dan. These work great. Two more great answers to my question.

Matthew

On 5/24/2016 4:15 PM, Nordlund, Dan (DSHS/RDA) wrote:

You have several options.

1. You could use the aggregate function. If your data frame is called DF, you could do something like

with(DF, aggregate(Length, list(Identifier), mean))

2. You could use the dplyr package like this

library(dplyr)
summarize(group_by(DF, Identifier), mean(Length))

Hope this is helpful,

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew
Sent: Tuesday, May 24, 2016 12:47 PM
To: r-help@r-project.org
Subject: [R] identify duplicate entries in data frame and calculate mean

I have a data frame with 10 columns. In the last column is an alphanumeric identifier. For most rows, this alphanumeric identifier is unique to the file; however, some of these alphanumeric identifiers occur in duplicate, triplicate or more. When they do occur more than once they are in consecutive rows, so when there is a duplicate or triplicate or quadruplicate (let's call them multiplicates), they are in consecutive rows.

In column 7 there is an integer number (may or may not be unique; it does not matter).

I want to identify each multiple entry (multiplicate) occurring in column 10 and then, for each multiplicate, calculate the mean of the integers in column 7.

As an example, I will show just two columns:

Length Identifier
321 A234
350 A234
340 A234
180 B123
198 B225

What I want to do (in the above example) is collapse all the A234's and report the mean to get this:

Length Identifier
337 A234
180 B123
198 B225

Matthew
Re: [R] identify duplicate entries in data frame and calculate mean
Thanks, Tom. I was making a mistake looking at your example and that's what my problem was. Cool answer, works great. Thank you very much. Matthew On 5/24/2016 4:23 PM, Tom Wright wrote: > Don't see that as being a big problem. If your data grows then dplyr > supports connections to external databases. Alternately if you just > want a mean, most databases can do that directly in SQL. > > On Tue, May 24, 2016 at 4:17 PM, Matthew > <mccorm...@molbio.mgh.harvard.edu > <mailto:mccorm...@molbio.mgh.harvard.edu>> wrote: > > Thank you very much, Tom. > This gets me thinking in the right direction. > One thing I should have mentioned that I did not is that the > number of rows in the data frame will be a little over 40,000 rows. > > > On 5/24/2016 4:08 PM, Tom Wright wrote: >> Using dplyr >> >> $ library(dplyr) >> $ x<-data.frame(Length=c(321,350,340,180,198), >> ID=c(rep('A234',3),'B123','B225') ) >> $ x %>% group_by(ID) %>% summarise(m=mean(Length)) >> >> >> >> On Tue, May 24, 2016 at 3:46 PM, Matthew >> <mccorm...@molbio.mgh.harvard.edu >> <mailto:mccorm...@molbio.mgh.harvard.edu>> wrote: >> >> I have a data frame with 10 columns. >> In the last column is an alphaneumaric identifier. >> For most rows, this alphaneumaric identifier is unique to the >> file, however some of these alphanemeric idenitifiers occur >> in duplicate, triplicate or more. When they do occur more >> than once they are in consecutive rows, so when there is a >> duplicate or triplicate or quadruplicate (let's call them >> multiplicates), they are in consecutive rows. >> >> In column 7 there is an integer number (may or may not be >> unique. does not matter). >> >> I want to identify each multiple entries (multiplicates) >> occurring in column 10 and then for each multiplicate >> calculate the mean of the integers column 7. 
>> >> As an example, I will show just two columns: >> Length Identifier >> 321 A234 >> 350 A234 >> 340 A234 >> 180 B123 >> 198 B225 >> >> What I want to do (in the above example) is collapse all the >> A234's and report the mean to get this: >> Length Identifier >> 337 A234 >> 180 B123 >> 198 B225 >> >> >> Matthew >> >> __ >> R-help@r-project.org <mailto:R-help@r-project.org> mailing >> list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible >> code. >> >> > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify duplicate entries in data frame and calculate mean
Thank you very much, Tom. This gets me thinking in the right direction. One thing I should have mentioned that I did not is that the number of rows in the data frame will be a little over 40,000 rows. On 5/24/2016 4:08 PM, Tom Wright wrote: > Using dplyr > > $ library(dplyr) > $ x<-data.frame(Length=c(321,350,340,180,198), > ID=c(rep('A234',3),'B123','B225') ) > $ x %>% group_by(ID) %>% summarise(m=mean(Length)) > > > > On Tue, May 24, 2016 at 3:46 PM, Matthew > <mccorm...@molbio.mgh.harvard.edu > <mailto:mccorm...@molbio.mgh.harvard.edu>> wrote: > > I have a data frame with 10 columns. > In the last column is an alphaneumaric identifier. > For most rows, this alphaneumaric identifier is unique to the > file, however some of these alphanemeric idenitifiers occur in > duplicate, triplicate or more. When they do occur more than once > they are in consecutive rows, so when there is a duplicate or > triplicate or quadruplicate (let's call them multiplicates), they > are in consecutive rows. > > In column 7 there is an integer number (may or may not be unique. > does not matter). > > I want to identify each multiple entries (multiplicates) occurring > in column 10 and then for each multiplicate calculate the mean of > the integers column 7. > > As an example, I will show just two columns: > Length Identifier > 321 A234 > 350 A234 > 340 A234 > 180 B123 > 198 B225 > > What I want to do (in the above example) is collapse all the > A234's and report the mean to get this: > Length Identifier > 337 A234 > 180 B123 > 198 B225 > > > Matthew > > __ > R-help@r-project.org <mailto:R-help@r-project.org> mailing list -- > To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] identify duplicate entries in data frame and calculate mean
I have a data frame with 10 columns. In the last column is an alphanumeric identifier. For most rows, this alphanumeric identifier is unique to the file; however, some of these alphanumeric identifiers occur in duplicate, triplicate or more. When they do occur more than once they are in consecutive rows, so when there is a duplicate or triplicate or quadruplicate (let's call them multiplicates), they are in consecutive rows.

In column 7 there is an integer number (may or may not be unique; it does not matter).

I want to identify each multiple entry (multiplicate) occurring in column 10 and then, for each multiplicate, calculate the mean of the integers in column 7.

As an example, I will show just two columns:

Length Identifier
321 A234
350 A234
340 A234
180 B123
198 B225

What I want to do (in the above example) is collapse all the A234's and report the mean to get this:

Length Identifier
337 A234
180 B123
198 B225

Matthew
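For reference, the two-column example from the question can be checked in base R with aggregate(). This is only a sketch on the five sample rows; the data frame name DF and the column names Length and Identifier are taken from the example, not from the real 10-column file.

```r
# The five example rows from the question.
DF <- data.frame(
  Length     = c(321, 350, 340, 180, 198),
  Identifier = c("A234", "A234", "A234", "B123", "B225")
)

# Mean of Length within each Identifier; identifiers that occur
# only once simply keep their single value.
means <- aggregate(Length ~ Identifier, data = DF, FUN = mean)
means
```

For the real data, where the value is in column 7 and the identifier in column 10, the same call can be written by position, for example aggregate(DF[[7]], list(Identifier = DF[[10]]), mean). The rows do not need to be consecutive for this to work.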
Re: [R] fast way to create composite matrix based on mixed indices?
Brilliant Denes. Thank you for your help. This worked and is obviously much faster than a loop... On Thu, Sep 17, 2015 at 3:22 PM, Dénes Tóth <toth.de...@ttk.mta.hu> wrote: > Hi Matt, > > you could use matrix indexing. Here is a possible solution, which could be > optimized further (probably). > > # The old matrix > (old.mat <- matrix(1:30,nrow=3,byrow=TRUE)) > # matrix of indices > index <- matrix(c(1,1,1,4, > 1,3,5,10, > 2,2,1,3, > 2,1,4,8, > 2,3,9,10), > nrow=5,byrow=TRUE, > dimnames=list(NULL, > c('new.mat.row','old.mat.row', > 'old.mat.col.start','old.mat.col.end'))) > # expected result > new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30), > byrow=TRUE, nrow=2) > # > # column indices > ind <- mapply(seq, index[, 3], index[,4], > SIMPLIFY = FALSE, USE.NAMES = FALSE) > ind_len <- vapply(ind, length, integer(1)) > ind <- unlist(ind) > > # > # old indices > old.ind <- cbind(rep(index[,2], ind_len), ind) > # > # new indices > new.ind <- cbind(rep(index[,1], ind_len), ind) > # > # create the new matrix > result <- matrix(NA_integer_, max(index[,1]), max(index[,4])) > # > # fill the new matrix > result[new.ind] <- old.mat[old.ind] > # > # check the results > identical(result, new.mat) > > > HTH, > Denes > > > > > > On 09/17/2015 10:36 PM, Matthew Keller wrote: > >> HI all, >> >> Sorry for the title here but I find this difficult to describe succinctly. >> Here's the problem. >> >> I want to create a new matrix where each row is a composite of an old >> matrix, but where the row & column indexes of the old matrix change for >> different parts of the new matrix. For example, the second row of new >> matrix (which has , e.g., 10 columns) might be columns 1 to 3 of row 2 of >> old matrix, columns 4 to 8 of row 1 of old matrix, and columns 9 to 10 of >> row 3 of old matrix. >> >> Here's an example in code: >> >> #The old matrix >> (old.mat <- matrix(1:30,nrow=3,byrow=TRUE)) >> >> #matrix of indices to create the new matrix from the old one. 
>> #The 1st column gives the row number of the new matrix >> #the 2nd gives the row of the old matrix that we're going to copy into the >> new matrix >> #the 3rd gives the starting column of the old matrix for the row in col 2 >> #the 4th gives the end column of the old matrix for the row in col 2 >> index <- matrix(c(1,1,1,4, >>1,3,5,10, >>2,2,1,3, >>2,1,4,8, >>2,3,9,10), >> nrow=5,byrow=TRUE, >> >> >> dimnames=list(NULL,c('new.mat.row','old.mat.row','old.mat.col.start','old.mat.col.end'))) >> >> I will be given old.mat and index and want to create new.mat from them. >> >> I want to create a new.matrix of two rows that looks like this: >> new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30),byrow=TRUE,nrow=2) >> >> So here, the first row of new.mat is columns 1 to 4 of row 1 of the >> old.mat >> and columns 5 to 10 of row 3 of old.mat. >> >> new.mat and old.mat will always have the same number of columns but the >> number of rows could differ. >> >> I could accomplish this in a loop, but the real problem is quite large >> (new.mat might have 1e8 elements), and so a for loop would be >> prohibitively >> slow. >> >> I may resort to unix tools and use a shell script, but wanted to first see >> if this is doable in R in a fast way. >> >> Thanks in advance! >> >> Matt >> >> >> -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
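The solution quoted in this thread relies on a feature of R that is easy to miss: a matrix can be indexed by a two-column matrix, where each row of the index is one (row, column) pair. The sketch below isolates just that idea on the same old.mat; the particular cells picked are arbitrary and chosen only for illustration.

```r
# Same old matrix as in the thread: row 1 is 1..10, row 2 is 11..20, row 3 is 21..30.
old.mat <- matrix(1:30, nrow = 3, byrow = TRUE)

# Each row of idx is one (row, column) pair to extract.
idx <- cbind(c(1, 3, 2),    # rows
             c(2, 5, 10))   # columns
picked <- old.mat[idx]      # a plain vector, one element per pair

# The same kind of index works on the left-hand side of an assignment.
new.mat <- matrix(NA_integer_, nrow = 2, ncol = 10)
new.mat[cbind(c(1, 1, 2), c(2, 5, 10))] <- picked
```

Because extraction and assignment each happen in a single vectorised step, no explicit loop over cells is needed, which is why the approach scales to large matrices.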
[R] fast way to create composite matrix based on mixed indices?
HI all, Sorry for the title here but I find this difficult to describe succinctly. Here's the problem. I want to create a new matrix where each row is a composite of an old matrix, but where the row & column indexes of the old matrix change for different parts of the new matrix. For example, the second row of new matrix (which has , e.g., 10 columns) might be columns 1 to 3 of row 2 of old matrix, columns 4 to 8 of row 1 of old matrix, and columns 9 to 10 of row 3 of old matrix. Here's an example in code: #The old matrix (old.mat <- matrix(1:30,nrow=3,byrow=TRUE)) #matrix of indices to create the new matrix from the old one. #The 1st column gives the row number of the new matrix #the 2nd gives the row of the old matrix that we're going to copy into the new matrix #the 3rd gives the starting column of the old matrix for the row in col 2 #the 4th gives the end column of the old matrix for the row in col 2 index <- matrix(c(1,1,1,4, 1,3,5,10, 2,2,1,3, 2,1,4,8, 2,3,9,10), nrow=5,byrow=TRUE, dimnames=list(NULL,c('new.mat.row','old.mat.row','old.mat.col.start','old.mat.col.end'))) I will be given old.mat and index and want to create new.mat from them. I want to create a new.matrix of two rows that looks like this: new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30),byrow=TRUE,nrow=2) So here, the first row of new.mat is columns 1 to 4 of row 1 of the old.mat and columns 5 to 10 of row 3 of old.mat. new.mat and old.mat will always have the same number of columns but the number of rows could differ. I could accomplish this in a loop, but the real problem is quite large (new.mat might have 1e8 elements), and so a for loop would be prohibitively slow. I may resort to unix tools and use a shell script, but wanted to first see if this is doable in R in a fast way. Thanks in advance! Matt -- Matthew C Keller Asst. 
Professor of Psychology University of Colorado at Boulder www.matthewckeller.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reshape: melt and cast
Yep, that works. Thanks, Stephen. I should have drawn the parallel with Excel Pivot tables sooner. On Tue, Sep 1, 2015 at 9:36 AM, stephen sefick <ssef...@gmail.com> wrote: > I would make this minimal. In other words, use an example data set, dput, > and use output of dput in a block of reproducible code. I don't understand > exactly what you want, but does sum work? If there is more than one record > for a given set of factors the sum is the sum of the counts. If only one > record, then the sum is the same as the original number. > > On Tue, Sep 1, 2015 at 10:00 AM, Matthew Pickard < > matthew.david.pick...@gmail.com> wrote: > >> Thanks, Stephen. I've looked into the fun.aggregate argument. I don't >> want to aggregate, so I thought leaving it blank (allowing it to default to >> NULL) would do that. >> >> >> Here's a corrected post (with further explanation): >> >> Hi, >> >> I have data that looks like this: >> >> >dput(head(ratings)) >> structure(list(QCode = structure(c(5L, 7L, 5L, 7L, 5L, 7L), .Label = >> c("APPEAR", >> "FEAR", "FUN", "GRAT", "GUILT", "Joy", "LOVE", "UNGRAT"), class = >> "factor"), >> PID = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("1123", >> "1136", "1137", "1142", "1146", "1147", "1148", "1149", "1152", >> "1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182", >> "1183", "1191", "1196", "1197", "1198", "1199", "1200", "1201", >> "1203", "1205", "1207", "1208", "1209", "1214", "1216", "1219", >> "1220", "1222", "1223", "1224", "1225", "1226", "1229", "1236", >> "1237", "1238", "1240", "1241", "1243", "1245", "1246", "1248", >> "1254", "1255", "1256", "1257", "1260", "1262", "1264", "1268", >> "1270", "1272", "1278", "1279", "1280", "1282", "1283", "1287", >> "1288", "1292", "1293", "1297", "1310", "1311", "1315", "1329", >> "1332", "1333", "1343", "1346", "1347", "1352", "1354", "1355", >> "1356", "1360", "1368", "1369", "1370", "1378", "1398", "1400", >> "1403", "1404", "1411", "1412", "1420", "1421", "1423", "1424", >> "1426", 
"1428", "1432", "1433", "1435", "1436", "1438", "1439", >> "1440", "1441", "1443", "1444", "1446", "1447", "1448", "1449", >> "1450", "1453", "1454", "1456", "1459", "1460", "1461", "1462", >> "1463", "1468", "1471", "1475", "1478", "1481", "1482", "1487", >> "1488", "1490", "1493", "1495", "1497", "1503", "1504", "1508", >> "1509", "1511", "1513", "1514", "1515", "1522", "1524", "1525", >> "1526", "1527", "1528", "1529", "1532", "1534", "1536", "1538", >> "1539", "1540", "1543", "1550", "1551", "1552", "1554", "1555", >> "1556", "1558", "1559"), class = "factor"), RaterName = >> structure(c(1L, >> 1L, 1L, 1L, 1L, 1L), .Label = c("cwormhoudt", "zspeidel"), class = >> "factor"), >> SI1 = c(2L, 1L, 1L, 1L, 2L, 1L), SI2 = c(2L, 2L, 2L, 2L, >> 2L, 3L), SI3 = c(3L, 3L, 3L, 3L, 2L, 4L), SI4 = c(1L, 2L, >> 1L,
[R] setting up R -- VM Fusion, WIndows7
Hi, As i need R to speak to Bloomberg (and big only runs on windows), i'm running windows 7 via VM Fusion on my mac. I think i am having permission problems, as i cannot use install.packages, and cannot change .libPaths via either a .Rprofile, or Profile.site. I've posted more detail in this super-user question -- http://superuser.com/questions/948083/how-to-set-environment-variables-in-vm-fusion-windows-7 Throwing it over to this list as well, as I've spent about half the time i had allowed for my project on (not getting) set up. I realise this is a very niche problem - hoping that someone else has had a similar problem, and can offer pointers. best mj [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
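One thing that may be worth trying when install.packages() fails for permission reasons is to point R at a library folder the current user can definitely write to, from inside the R session rather than through a profile file. The sketch below is only an assumption about what might help here; whether it resolves the VM-specific problem is not known. The folder under tempdir() is a hypothetical placeholder, and a permanent folder inside the user's home directory would be the natural choice in practice.

```r
# Hypothetical library location: any folder the current user can write to.
lib <- file.path(tempdir(), "Rlibs")
dir.create(lib, recursive = TRUE, showWarnings = FALSE)

# Put that folder first on the library search path for this session.
.libPaths(c(lib, .libPaths()))
.libPaths()

# Packages can then be installed there explicitly, for example:
# install.packages("somePackage", lib = lib)   # needs network access, so left commented
```

To make the setting persistent, the R_LIBS_USER environment variable can be set to the same folder before R starts.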
Re: [R] tcltk2 entry box
Thank you very much, Greg, for the tkwait commands. I am just starting to try out examples on the sciviews web page to get a feel for tcltk in R and the tkwait.variable and tkwait.window seem like they could be very useful to me. I will add these in to my practice scripts and see what I can do with them. Matthew On 7/9/2015 5:31 PM, Greg Snow wrote: If you want you script to wait until you have a value entered then you can use the tkwait.variable or tkwait.window commands to make the script wait before continuing (or you can bind the code to a button so that you enter the value, then click on the button to run the code). On Wed, Jul 8, 2015 at 7:58 PM, Matthew McCormack mccorm...@molbio.mgh.harvard.edu wrote: Wow ! Very nice. Thank you very much, John. This is very helpful and just what I need. Yes, I can see that I should have paid attention to tcltk before going to tcltk2. Matthew On 7/8/2015 8:37 PM, John Fox wrote: Dear Matthew, For file selection, see ?tcltk::tk_choose.files or ?tcltk::tkgetOpenFile . You could enter a number in a tk entry widget, but, depending upon the nature of the number, a slider or other widget might be a better choice. For a variety of helpful tcltk examples see http://www.sciviews.org/_rgui/tcltk/, originally by James Wettenhall but now maintained by Philippe Grosjean (the author of the tcltk2 package). (You probably don't need tcltk2 for the simple operations that you mention, but see ?tk2spinbox for an alternative to a slider.) 
Best, John --- John Fox, Professor McMaster University Hamilton, Ontario, Canada http://socserv.socsci.mcmaster.ca/jfox/ -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew Sent: July-08-15 8:01 PM To: r-help Subject: [R] tcltk2 entry box Is anyone familiar enough with the tcltk2 package to know if it is possible to have an entry box where a user can enter information (such as a path to a file or a number) and then be able to use the entered information downstream in a R script ? The idea is for someone unfamiliar with R to just start an R script that would take care of all the commands for them so all they have to do is get the script started. However, there is always a couple of pieces of information that will change each time the script is used (for example, a different file will be processed by the script). So, I would like a way for the user to input that information as the script ran. Matthew McCormack __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] tcltk2 entry box
Is anyone familiar enough with the tcltk2 package to know if it is possible to have an entry box where a user can enter information (such as a path to a file or a number) and then be able to use the entered information downstream in a R script ? The idea is for someone unfamiliar with R to just start an R script that would take care of all the commands for them so all they have to do is get the script started. However, there is always a couple of pieces of information that will change each time the script is used (for example, a different file will be processed by the script). So, I would like a way for the user to input that information as the script ran. Matthew McCormack __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
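A minimal sketch of such an entry box, using only the base tcltk package and the tkwait.window approach suggested in this thread, might look like the following. The function name ask_value and the window layout are made up for illustration, and it needs a graphical display to run, so the call itself is left commented.

```r
# Hypothetical helper: show a small window with a text entry and an OK button,
# block until the window is closed, then return what was typed.
ask_value <- function(prompt = "Enter a value:") {
  if (!requireNamespace("tcltk", quietly = TRUE)) {
    stop("the tcltk package is not available in this R installation")
  }
  win   <- tcltk::tktoplevel()
  tcltk::tkwm.title(win, "Input")
  value <- tcltk::tclVar("")                       # Tcl variable bound to the entry
  entry <- tcltk::tkentry(win, textvariable = value, width = 40)
  ok    <- tcltk::tkbutton(win, text = "OK",
                           command = function() tcltk::tkdestroy(win))
  tcltk::tkgrid(tcltk::tklabel(win, text = prompt), entry)
  tcltk::tkgrid(ok)
  tcltk::tkwait.window(win)                        # script pauses here until OK is clicked
  tcltk::tclvalue(value)                           # entered text, as a character string
}

# Usage in an interactive session with a display:
# path <- ask_value("Path to the input file:")
# dat  <- read.csv(path)
```

The value comes back as a character string, so a number entered this way would still need as.numeric() before use. For picking a file, tcltk::tk_choose.files(), mentioned in the replies, avoids typing the path at all.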
Re: [R] tcltk2 entry box
Wow ! Very nice. Thank you very much, John. This is very helpful and just what I need. Yes, I can see that I should have paid attention to tcltk before going to tcltk2. Matthew On 7/8/2015 8:37 PM, John Fox wrote: Dear Matthew, For file selection, see ?tcltk::tk_choose.files or ?tcltk::tkgetOpenFile . You could enter a number in a tk entry widget, but, depending upon the nature of the number, a slider or other widget might be a better choice. For a variety of helpful tcltk examples see http://www.sciviews.org/_rgui/tcltk/, originally by James Wettenhall but now maintained by Philippe Grosjean (the author of the tcltk2 package). (You probably don't need tcltk2 for the simple operations that you mention, but see ?tk2spinbox for an alternative to a slider.) Best, John --- John Fox, Professor McMaster University Hamilton, Ontario, Canada http://socserv.socsci.mcmaster.ca/jfox/ -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew Sent: July-08-15 8:01 PM To: r-help Subject: [R] tcltk2 entry box Is anyone familiar enough with the tcltk2 package to know if it is possible to have an entry box where a user can enter information (such as a path to a file or a number) and then be able to use the entered information downstream in a R script ? The idea is for someone unfamiliar with R to just start an R script that would take care of all the commands for them so all they have to do is get the script started. However, there is always a couple of pieces of information that will change each time the script is used (for example, a different file will be processed by the script). So, I would like a way for the user to input that information as the script ran. Matthew McCormack __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. 
--- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] simple question - mean of a row of a data.frame
Hi all, Simple question I should know: I'm unclear on the logic of why the sum of a row of a data.frame returns a valid sum but the mean of a row of a data.frame returns NA: sum(rock[2,]) [1] 10901.05 mean(rock[2,],trim=0) [1] NA Warning message: In mean.default(rock[2, ], trim = 0) : argument is not numeric or logical: returning NA I get that rock[2,] is itself a data.frame of mode list, but why the inconsistency between functions? How can you figure this out from, e.g., ?mean ?sum Thanks in advance, Matt -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
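As far as I understand it, the difference comes from method dispatch: sum() belongs to the Summary group of generics, for which base R provides a data frame method, whereas mean() has no data frame method in current versions of R, so mean.default() receives a list and gives up. ?mean hints at this in its description of the x argument, which lists the kinds of objects it has methods for. A short sketch of two ways around it, using the rock data set from the question:

```r
row2 <- rock[2, ]

is.data.frame(row2)   # TRUE: a one-row data frame, i.e. a list of columns
is.list(row2)         # TRUE

# Flatten the row to a plain numeric vector before taking the mean ...
row_mean <- mean(unlist(row2))

# ... or use rowMeans(), which is written for data frames and matrices.
row_mean2 <- rowMeans(rock[2, ])
```

Both give the row sum divided by the number of columns. unlist() is only sensible when every column in the row is numeric.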
Re: [R] need help with excel data
Try ASAP Utilities (Home and Student edition), http://www.asap-utilities.com/index.php. When installed it will look like this in Excel. Select Columns & Rows and then #18. If that is not helpful, then DigDB, http://www.digdb.com/, but this one requires a subscription. It will also split columns. You may have to do some 'cleaning' of individual cells, such as removing leading and/or trailing spaces. A lot of this can be done with the ASAP Utilities 'Text' pull-down menu.

Matthew

On 1/21/2015 3:31 PM, Dr Polanski wrote:

Hi all!

Sorry to bother you. I am trying to learn some R via Coursera courses and other internet sources, yet haven’t managed to go far. And now I need to do some, I hope, not too difficult things, which I think R can do, yet I have no idea how to make it do so.

I have a big set of (empirical) data which was obtained by my colleagues and stored in an inconvenient way: all of the data is in two cells of an Excel table. An example of the data is in the attached file (the link):

https://drive.google.com/file/d/0B64YMbf_hh5BS2tzVE9WVmV3bFU/view?usp=sharing

So the first column has a number and the second has a whole vector (I guess it is) which looks like «some words in Cyrillic (the length varies)» and then the set of numbers «12*23 34*45» (another problem is that sometimes it is «12*23, 34*56»). And the number of rows is about 3000, so it is impossible to do manually.

What I need at the end is to have it separately in different Excel cells:

what is written in words | 12 | 23 | 34 | 45 |

Do you think it is possible to do so using R (or something else)?

Thank you very much in advance, and sorry for asking for help with such a stupid question. The problem is, I am trying and yet haven’t even managed to install openSUSE onto my laptop, only Ubuntu! :)

Thank you very much!
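Since the question asks whether R itself can do this, here is a rough sketch of one way in base R. It assumes the second column has already been read into a character vector (called cell here, a made-up name), that the words contain no digits, and that every entry carries the same count of numbers, four in the example. The placeholder text stands in for the Cyrillic words, which should behave the same way in a UTF-8 session.

```r
# Made-up examples of the two formats described in the question.
cell <- c("some words 12*23 34*45",
          "other words 12*23, 34*56")

# Pull out every run of digits; '*', ',' and spaces are simply skipped.
nums <- regmatches(cell, gregexpr("[0-9]+", cell))

# Everything before the first digit is the text part.
words <- trimws(sub("[0-9].*$", "", cell))

# One row per cell: the words, then one column per number.
out <- data.frame(words = words,
                  do.call(rbind, lapply(nums, as.numeric)))
out
```

The result can be written back out with write.csv(out, "split.csv", row.names = FALSE) and opened in Excel. If some entries hold a different count of numbers, the rbind step would need padding with NA first.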
[R] change default installation of R
I have R version 2.15.0 installed in /usr/local/bin, and this is the default; in other words, when I type 'which R' this is the path I get.

I also have installed R into /usr/local/R-3.1.1/. I used ./configure and then make to install this version. After make, I get the following error messages:

../unix/sys-std.o: In function `initialize_rlcompletion':
/usr/local/R-3.1.1/src/unix/sys-std.c:689: undefined reference to `rl_sort_completion_matches'
collect2: ld returned 1 exit status
make[3]: *** [R.bin] Error 1
make[3]: Leaving directory `/usr/local/R-3.1.1/src/main'
make[2]: *** [R] Error 2
make[2]: Leaving directory `/usr/local/R-3.1.1/src/main'
make[1]: *** [R] Error 1
make[1]: Leaving directory `/usr/local/R-3.1.1/src'
make: *** [R] Error 1

I want to change R-3.1.1 to the default, so that when I type 'which R', I get /usr/local/R-3.1.1.

To do this I first cd'd into /usr/local/bin and renamed R to R-old_10-30-14, then created a symlink with 'ln -s /usr/local/R-3.1.1/bin R', but when I type 'which R', I get 'no R in ...', where '...' is my PATH variable.

If I remove the symlink and then create another one with 'ln -s /usr/local/R-3.1.1/bin/R R', then after typing 'which R', I get

/usr/local/bin/R: line 259: /usr/local/R-3.1.1/bin/exec/R: No such file or directory
/usr/local/bin/R: line 259: exec: /usr/local/R-3.1.1/bin/exec/R: cannot execute: No such file or directory

This is the same message I get if I just type at the command line: /usr/local/R-3.1.1/bin/R.

Matthew
Re: [R] find the data frames in list of objects and make a list of them
Thank you very much, Bill! It has taken me a while to figure out, but yes, what I need is a list (the R object, list) of data frames and not a character vector containing the names of the data frames. This works well and is getting me in the direction I want to go.

Matthew

On 8/13/2014 7:40 PM, William Dunlap wrote:
> Previously you asked
>> A second question: is this the best way to make a list of data frames without having to manually type c(dataframe1, dataframe2, ...) ?
> If you use 'c' there you will not get a list of data.frames - you will get a list of all the columns in the data.frames you supplied. Use 'list' instead of 'c' if you are taking that route.
>
> The *apply functions are helpful here. To make a list of all data.frames in an environment you can use the following function, which takes the environment to search as an argument.
>
>     f <- function(envir = globalenv()) {
>         tmp <- eapply(envir, all.names=TRUE, FUN=function(obj) if (is.data.frame(obj)) obj else NULL)
>         # remove NULL's now
>         tmp[!vapply(tmp, is.null, TRUE)]
>     }
>
> Use it as
>
>     allDataFrames <- f(globalenv()) # or just f()
>
> Bill Dunlap TIBCO Software wdunlap tibco.com
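The distinction discussed in this thread (a character vector of names versus a list of the objects themselves) can be sketched with base R alone. This is a minimal illustration, not code from any poster: the helper name list_data_frames and the demo objects are hypothetical, and it skips hidden objects whose names start with a dot.

```r
# Hypothetical helper (not from the thread): collect every data frame in an
# environment into a named list of the objects themselves.
list_data_frames <- function(envir = globalenv()) {
  objs <- mget(ls(envir = envir), envir = envir)  # names -> actual objects
  Filter(is.data.frame, objs)                     # keep only the data frames
}

# Demonstration in a throwaway environment so the workspace is untouched
e <- new.env()
assign("df_a", data.frame(x = 1:2), envir = e)
assign("df_b", data.frame(x = 3:4), envir = e)
assign("not_df", letters, envir = e)

dfs <- list_data_frames(e)
names(dfs)
```

Because the result holds data frames rather than their names, it can be handed directly to a function that expects a list of data frames.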
[R] find the data frames in list of objects and make a list of them
Hi everyone, I would like to find which objects are data frames among all the objects I have created (in other words, in what you get when you type ls()), and then I would like to make a list of these data frames. Explained in other words: after typing ls(), you get the names of objects. Which objects are data frames? How do I then make a list of these data frames? A second question: is this the best way to make a list of data frames without having to manually type c(dataframe1, dataframe2, ...) ?

Matthew
Re: [R] find the data frames in list of objects and make a list of them
Jim, Wow that was cool! This function is *really* useful. Thank you very much! (It is also way beyond my capability). I need to make a list of data frames because then I am going to bind them with dplyr using 'dplyr::rbind_all(listOfDataFrames)'. This will make a single data frame, and from that single data frame I can make a heat map of all the data. For example, when I use your fantastic function, my.ls(), I get:

    my.ls()
                                                                  Size  Class     Length  Dim
    .Random.seed                                                 2,544  integer      626
    cpl                                                         28,664  character    512
    filenames                                                    2,120  character     19
    filepath                                                       216  character      1
    i                                                              152  character      1
    Mer7_1-1_160-226A_1_gene_exp_diff_filt_hc_log2.txt          81,152  data.frame     3  529 x 3
    Mer7_1-1_Mer7_1-2_gene_exp_diff_filt_hc_log2.txt            31,624  data.frame     3  199 x 3
    Mer7_1-1_S150-160-226A_1_gene_exp_diff_filt_hc_log2.txt     81,152  data.frame     3  529 x 3
    Mer7_1-1_W29_1_gene_exp_diff_filt_hc_log2.txt              129,376  data.frame     3  849 x 3
    Mer7_1-1_W29_S150-226A_1_gene_exp_diff_filt_hc_log2.txt    126,816  data.frame     3  835 x 3
    Mer7_1-1_W29_S160-162A_1_gene_exp_diff_filt_hc_log2.txt     82,792  data.frame     3  537 x 3
    Mer7_1-1_W29_S226A_1_gene_exp_diff_filt_hc_log2.txt        115,008  data.frame     3  756 x 3
    Mer7_1-2_160-226A_1_gene_exp_diff_filt_hc_log2.txt          79,936  data.frame     3  519 x 3
    Mer7_1-2_S150-160-226A_1_gene_exp_diff_filt_hc_log2.txt     84,512  data.frame     3  548 x 3
    Mer7_1-2_W29_1_gene_exp_diff_filt_hc_log2.txt              130,568  data.frame     3  857 x 3
    Mer7_1-2_W29_S160-162A_1_gene_exp_diff_filt_hc_log2.txt     83,768  data.frame     3  542 x 3
    Mer7_1-2_W29_S226A_1_gene_exp_diff_filt_hc_log2.txt        119,008  data.frame     3  783 x 3
    Mer7_2-1_160-226A_2_gene_exp_diff_filt_hc_log2.txt         105,344  data.frame     3  685 x 3
    Mer7_2-1_Mer7_2-2_gene_exp_diff_filt_hc_log2.txt            26,216  data.frame     3  166 x 3
    Mer7_2-1_S150-160-226A_2_gene_exp_diff_filt_hc_log2.txt    106,368  data.frame     3  693 x 3
    Mer7_2-1_W29_2_gene_exp_diff_filt_hc_log2.txt              160,200  data.frame     3  1053 x 3
    Mer7_2-1_W29_S150-226A_2_gene_exp_diff_filt_hc_log2.txt    152,696  data.frame     3  1005 x 3
    Mer7_2-1_W29_S160-162A_2_gene_exp_diff_filt_hc_log2.txt    113,992  data.frame     3  743 x 3
    Mer7_2-1_W29_S226A_2_gene_exp_diff_filt_hc_log2.txt        138,944  data.frame     3  914 x 3
    my.ls                                                       35,624  function       1
    myfiles                                                      2,120  character     19
    names                                                        2,424  list          19
    test                                                           680  character      5
    whatisthis                                                   2,424  list          19
    **Total                                                  2,026,440  ---          ---  ---

What I need is to make the list of data frames for the dplyr command, dplyr::rbind_all(listOfDataFrames). Ideally, this would also be a specific subset of all the data frames, say the data frames with W29 in the name. This is something we, our lab, would be doing routinely and at various times of the day, so I want to automate the process so that nobody needs to sit at the computer and type the list of data frames manually.

Matthew

On 8/13/2014 3:06 PM, jim holtman wrote:
> Here is a function that I use that might give you the results you want:
>
>     my.ls()
>                              Size  Class     Length  Dim
>     .Random.seed            2,544  integer      626
>     .remapHeaderFile       40,440  data.frame     2  373 x 2
>     colID                     216  character      3
>     delDate                   104  character      1
>     deliv                  15,752  data.table     7  164 x 7
>     f_drawPallet           36,896  function       1
>     i                          96  character      1
>     indx                  168,816  character   1782
>     pallet                172,696  data.table     3  1782 x 3
>     pallets               405,736  data.table    14  1782 x 14
>     picks              26,572,856  data.table    19  154247 x 19
>     wb                        656  Workbook       1
>     wSplit             68,043,136  list        1782
>     x                          56  numeric        2
>     **Total            95,460,000  ---          ---  ---
>
>     my.ls
>     function (pos = 1, sorted = FALSE, envir = as.environment(pos)) { .result <- sapply(ls(envir = envir, all.names = TRUE), function(..x) object.size(eval(as.symbol(..x), envir = envir
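The automation asked for above (pick the data frames whose names contain W29, then stack them) could be sketched as follows. This is an illustration under stated assumptions rather than anyone's posted solution: bind_matching is a hypothetical helper, the demo objects and their gene/log2fc columns are invented, and base R's rbind stands in for dplyr::rbind_all, which later dplyr releases replaced with bind_rows.

```r
# Hypothetical helper: find data frames whose names contain `pattern`,
# tag each row with the name of the object it came from, and stack them.
bind_matching <- function(pattern, envir = globalenv()) {
  nms <- grep(pattern, ls(envir = envir), value = TRUE, fixed = TRUE)
  dfs <- Filter(is.data.frame, mget(nms, envir = envir))
  if (length(dfs) == 0) return(NULL)
  dfs <- Map(function(d, n) { d$source <- n; d }, dfs, names(dfs))
  out <- do.call(rbind, unname(dfs))  # assumes all frames share the same columns
  rownames(out) <- NULL
  out
}

# Invented demo objects mimicking the naming scheme in the listing above
e <- new.env()
assign("Mer7_1-1_W29_1", data.frame(gene = c("g1", "g2"), log2fc = c(1.5, -2)), envir = e)
assign("Mer7_1-2_W29_1", data.frame(gene = "g3", log2fc = 0.7), envir = e)
assign("Mer7_1-1_S226A_1", data.frame(gene = "g4", log2fc = 3), envir = e)

w29 <- bind_matching("W29", envir = e)
w29
```

Keeping the source name as a column is a design choice: once the frames are stacked, it is the only record of which comparison each row belongs to, which a heat map would need.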
Re: [R] find the data frames in list of objects and make a list of them
Hi Richard, Thank you very much for your reply and your code. Your code is doing just what I asked for, but does not seem to be what I need. I will need to review some basic R before I can continue. I am trying to list data frames in order to bind them into one single data frame with something like dplyr::rbind_all(list of data frames), but when I try dplyr::rbind_all(lsDataFrame(ls())), I get the error: object at index 1 not a data.frame. So, I am going to have to learn some more about lists in R before proceeding. Thank you for your help and code.

Matthew

On 8/13/2014 3:12 PM, Richard M. Heiberger wrote:
> I would do something like this
>
>     lsDataFrame <- function(xx=ls()) xx[sapply(xx, function(x) is.data.frame(get(x)))]
>     ls("package:datasets")
>     lsDataFrame(ls("package:datasets"))
Re: [R] working on a data frame
Thank you very much Peter, Bill and Petr for some great and quite elegant solutions. There is a lot I can learn from these. Yes to your question, Bill, about the raw numbers: they are counts and they cannot be negative. The data is RNA sequencing data where approximately 32,000 genes are being measured for changes between two conditions. There are some genes that are not present (cannot be measured) initially, but are present in the second condition, and the reverse is also true of some genes that are present initially and then not present in the second condition (these are often the most interesting genes). This makes it difficult to compare the changes of all genes mathematically, so it is common practice to change the 0's to 1's and then redo the log2. 1 is considered sufficiently small; actually anything up to 3 or 5 could be just due to 'background noise' in the measurement process, but it is somewhat arbitrary.

Matthew

On 7/28/2014 2:43 AM, PIKAL Petr wrote:
> Hi
>
> I like to use logical values directly in computations if possible.
>
>     yourData[,10] <- yourData[,9]/(yourData[,8]+(yourData[,8]==0))
>
> Logical values are automagically considered FALSE=0 and TRUE=1 and can be used in computations. If you really want to change 0 to 1 in column 8 you can use
>
>     yourData[,8] <- yourData[,8]+(yourData[,8]==0)
>
> without ifelse stuff.
>
> Regards
> Petr
>
>> -----Original Message-----
>> From: William Dunlap
>> Sent: Friday, July 25, 2014 8:07 PM
>> To: Matthew
>> Cc: r-help@r-project.org
>> Subject: Re: [R] working on a data frame
>>
>>> if yourData[,8]==0, then yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]
>>
>> You could express this in R as
>>
>>     is8Zero <- yourData[,8] == 0
>>     yourData[is8Zero, 8] <- 1
>>     yourData[is8Zero, 10] <- yourData[is8Zero,9] / yourData[is8Zero,8]
>>
>> Note how logical (Boolean) values are used as subscripts - read the '[' as 'such that' when using logical subscripts. There are many more ways to express the same thing.
>>
>> (I am tempted to change the algorithm to avoid the divide by zero problem by making the quotient (numerator + epsilon)/(denominator + epsilon) where epsilon is a very small number. I am assuming that the raw numbers are counts or at least cannot be negative.)
>>
>> Bill Dunlap TIBCO Software wdunlap tibco.com
Re: [R] working on a data frame
Thank you for your comments, Peter. A couple of questions. Can I do something like the following?

    if yourData[,8]==0, then yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]

I think I am just going to have to learn more about R. I thought getting into R would be like going from Perl to Python or Java etc., but it seems like R programming works differently.

Matthew

On 7/25/2014 12:06 AM, Peter Alspach wrote:
> Tena koe Matthew
>
>> Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division can not be done, so I want to change the zero to a one in order to do the division.
>
> That being the case, think in terms of vectors, as Sarah says. Try:
>
>     yourData[,10] <- yourData[,9]/yourData[,8]
>     yourData[yourData[,8]==0,10] <- yourData[yourData[,8]==0,9]
>
> This doesn't change the 0 to 1 in column 8, but it doesn't appear you actually need to do that.
>
> HTH
> Peter Alspach
[R] working on a data frame
I am coming from the perspective of Excel and VBA scripts, but I would like to do the following in R. I have a data frame with 14 columns and 32,795 rows. I want to check the value in column 8 (row 1) to see if it is a 0. If it is not a zero, proceed to the next row and check the value for column 8. If it is a zero, then a) change the zero to a 1, b) divide the value in column 9 (row 1) by 1, c) place the result in column 10 (row 1) and d) repeat this for each of the other 32,794 rows. Is this possible with an R script, and is this the way to go about it? If it is, could anyone get me started?

Matthew
Re: [R] working on a data frame
On 7/24/2014 8:52 PM, Sarah Goslee wrote:
> Hi, Your description isn't clear:
>
> On Thursday, July 24, 2014, Matthew mccorm...@molbio.mgh.harvard.edu wrote:
>> I am coming from the perspective of Excel and VBA scripts, but I would like to do the following in R. I have a data frame with 14 columns and 32,795 rows. I want to check the value in column 8 (row 1) to see if it is a 0. If it is not a zero, proceed to the next row and check the value for column 8. If it is a zero, then a) change the zero to a 1, b) divide the value in column 9 (row 1) by 1,
>
> Row 1, or the row in which column 8 == 0?

All rows in which the value in column 8==0.

> Why do you want to divide by 1?

Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division can not be done, so I want to change the zero to a one in order to do the division. This is a fairly standard thing to do with this data. (The data are measurements of amounts at two time points. Sometimes a thing will not be present in the beginning (0), but very present at the later time. Column 10 is the log2 of the change. Infinite is not an easy number to work with, so it is common to change the 0 to a 1. On the other hand, something may be present at time 1, but not at the later time. In this case column 10 would be taking the log2 of a number divided by 0, so again the zero is commonly changed to a one in order to get a useable value in column 10. In both the preceding cases there was a real change, but Inf and NaN are not helpful.)

>> c) place the result in column 10 (row 1) and
>
> Ditto on the row 1 question.

I want to work on all rows where column 8 (and column 9) contain a zero. Column 10 contains the result of the value in column 9 divided by the value in column 8. So, for row 1, column 10 row 1 contains the ratio column 9 row 1 divided by column 8 row 1, and so on through the whole 32,000 or so rows. Most rows do not have a zero in columns 8 or 9. Some rows have zero in column 8 only, and some rows have a zero in column 9 only. I want to get rid of the zeros in these two columns and then do the division to get a manageable value in column 10. Division by zero and Inf are not considered 'manageable' by me.

> What do you want column 10 to be if column 8 isn't 0? Does it already have a value. I suppose it must.

Yes column 10 does have something, but this something can be Inf or NaN, which I want to get rid of.

>> d) repeat this for each of the other 32,794 rows. Is this possible with an R script, and is this the way to go about it. If it is, could anyone get me started ?
>
> Assuming you want to put the new values in the rows where column 8 == 0, you can do it in two steps:
>
>     mydata[,10] <- ifelse(mydata[,8] == 0, mydata[,9]/whatever, mydata[,10]) # where whatever is the thing you want to divide by that probably isn't 1
>     mydata[,8] <- ifelse(mydata[,8] == 0, 1, mydata[,8])
>
> R programming is best done by thinking about vectorizing things, rather than doing them in loops. Reading the Intro to R that comes with your installation is a good place to start.
>
>> Would it be better to change the data frame into a matrix, or something else ? Thanks for your help.
>
> Sarah

Matthew

> -- Sarah Goslee http://www.stringpage.com http://www.sarahgoslee.com http://www.functionaldiversity.org
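The vectorized approach recommended in this thread can be shown on a tiny made-up data frame. This is a sketch only: the column names before and after are hypothetical stand-ins for columns 8 and 9, and the pseudocount of 1 is the arbitrary convention described in the messages above, not a statistical recommendation.

```r
# Made-up counts; `before` and `after` stand in for columns 8 and 9
counts <- data.frame(before = c(0, 4, 10, 0),
                     after  = c(8, 0, 20, 0))

# Replace zeros by a pseudocount of 1 in both columns, for all rows at once
before_adj <- ifelse(counts$before == 0, 1, counts$before)
after_adj  <- ifelse(counts$after  == 0, 1, counts$after)

# The log2 ratio is now finite in every row (no Inf, -Inf or NaN)
counts$log2_ratio <- log2(after_adj / before_adj)
counts
```

No loop over rows is needed, since ifelse() and log2() both operate on whole columns. For non-negative integer counts, pmax(x, 1) gives the same adjustment as the ifelse() calls.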
[R] odd behavior of seq()
Hi all, A bit stumped here.

    z <- seq(.05,.85,by=.1)
    z==.05 #good
    [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    z==.15 #huh
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

More generally:

    sum(z==.25)
    [1] 1
    sum(z==.35)
    [1] 0
    sum(z==.45)
    [1] 1
    sum(z==.55)
    [1] 1
    sum(z==.65)
    [1] 0
    sum(z==.75)
    [1] 0
    sum(z==.85)
    [1] 1

Does anyone have any ideas what is going on here?

    R.Version()
    $platform
    [1] x86_64-apple-darwin9.8.0
    $arch
    [1] x86_64
    $os
    [1] darwin9.8.0
    $system
    [1] x86_64, darwin9.8.0
    $status
    [1]
    $major
    [1] 2
    $minor
    [1] 13.1
    $year
    [1] 2011
    $month
    [1] 07
    $day
    [1] 08
    $`svn rev`
    [1] 56322
    $language
    [1] R
    $version.string
    [1] R version 2.13.1 (2011-07-08)

-- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com
Re: [R] odd behavior of seq()
thanks all!

On Thu, Jul 3, 2014 at 12:38 PM, Peter Langfelder peter.langfel...@gmail.com wrote:
> Precision, precision, precision...
>
>     z[2]-0.15
>     [1] 2.775558e-17
>
> My solution:
>
>     z <- signif(seq(.05,.85,by=.1), 5)
>     z[2] - 0.15
>     [1] 0
>     z[2]==0.15
>     [1] TRUE
>
> Peter

-- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com
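As an alternative to rounding the sequence, the comparison itself can be made tolerant. The following is a minimal sketch: the helper name near and the tolerance 1e-8 are assumptions chosen for illustration, and the right tolerance depends on the scale of the data.

```r
z <- seq(0.05, 0.85, by = 0.1)

# Hypothetical helper: TRUE where x is within `tol` of `target`
near <- function(x, target, tol = 1e-8) abs(x - target) < tol

which(near(z, 0.15))           # position 2, despite the 2.8e-17 discrepancy
sum(near(z, 0.35))             # 1, where sum(z == .35) gave 0
isTRUE(all.equal(z[2], 0.15))  # base R's own tolerant comparison of two values
```

all.equal() compares whole objects and returns a description of the difference rather than FALSE, which is why it is wrapped in isTRUE(); the element-wise helper is more convenient for counting or subsetting.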
[R] lm models over all possible pairwise combinations of the columns of two matrices
Dear all, I am working through a problem at the moment and have got stuck. I have searched around on the help list for assistance but could not find anything - but apologies if I have missed something. A dummy example of my problem is below. I will continue to work on it, but any help would be greatly appreciated. Thanks in advance for your time. Best wishes, Matt

I have a matrix of response variables:

    p <- matrix(c(rnorm(120,1), rnorm(120,1), rnorm(120,1)), 120,3)

and two matrices of covariates:

    g <- matrix(c(rep(1:3, each=40), rep(3:1, each=40), rep(1:3, 40)), 120,3)
    m <- matrix(c(rep(1:2, 60), rep(2:1, 60), rep(1:2, each=60)), 120,3)

For all combinations of the columns of the covariate matrices g and m I want to run these two models:

    test <- function(uniq_m, uniq_g, p = p) {
        full <- lm(p ~ factor(uniq_m) * factor(uniq_g))
        null <- lm(p ~ factor(uniq_m) + factor(uniq_g))
        return(list('f'=full, 'n'=null))
    }

So I want to test for an interaction between column 1 of m and column 1 of g, then column 2 of m and column 1 of g, then column 2 of m and column 2 of g... and so forth across all possible pairwise interactions. The response variable is the same each time and is a matrix containing multiple columns.

So far, I can do this for a single combination of columns:

    test_1 <- test(m[ ,1], g[ ,1], p)

And I can also run the model over all columns of m and one column of g:

    test_2 <- apply(m, 2, function(uniq_m) { test(uniq_m, g[ ,1], p = p) })

I can then get the F statistics for each response variable of each model:

    sapply(summary(test_2[[1]]$f), function(x) x$fstatistic)
    sapply(summary(test_2[[1]]$n), function(x) x$fstatistic)

And I can compare models for each response variable using an F-test:

    d1 <- colSums(matrix(residuals(test_2[[1]]$n),nrow(g),ncol(p))^2)
    d2 <- colSums(matrix(residuals(test_2[[2]]$f),nrow(g),ncol(p))^2)
    F <- ((d1-d2) / (d2/114))

My question is: how do I run the lm models over all combinations of columns from the m and the g matrix, and get the F-statistics? While this is a dummy example, the real analysis will have a response matrix that is 700 x 8000, and the covariate matrices will be 700 x 4000 and 700 x 100, so I need something that is as fast as possible.
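One way to enumerate every (m column, g column) pair is expand.grid() followed by mapply(). The sketch below reuses the dummy data from the question; the function name interaction_F and the resp1..resp3 column labels are hypothetical, and the F statistic is the usual nested-model form, with the RSS difference divided by the change in residual degrees of freedom, which is an assumption about what is wanted. Pairs whose factor crossing has empty cells cannot support an interaction test and are returned as NA.

```r
set.seed(1)
n <- 120
p <- matrix(rnorm(n * 3, mean = 1), n, 3)
g <- matrix(c(rep(1:3, each = 40), rep(3:1, each = 40), rep(1:3, 40)), n, 3)
m <- matrix(c(rep(1:2, 60), rep(2:1, 60), rep(1:2, each = 60)), n, 3)

# Interaction F statistic for one pair of covariates, one value per column of p
interaction_F <- function(uniq_m, uniq_g, p) {
  fm <- factor(uniq_m)
  fg <- factor(uniq_g)
  full <- lm(p ~ fm * fg)
  null <- lm(p ~ fm + fg)
  df_num <- null$df.residual - full$df.residual
  if (df_num <= 0) return(rep(NA_real_, ncol(p)))  # interaction not estimable
  rss_full <- colSums(as.matrix(residuals(full))^2)
  rss_null <- colSums(as.matrix(residuals(null))^2)
  ((rss_null - rss_full) / df_num) / (rss_full / full$df.residual)
}

# Every combination of a column of m with a column of g
pairs <- expand.grid(m_col = seq_len(ncol(m)), g_col = seq_len(ncol(g)))
F_stats <- t(mapply(function(i, j) interaction_F(m[, i], g[, j], p),
                    pairs$m_col, pairs$g_col))
colnames(F_stats) <- paste0("resp", seq_len(ncol(p)))
result <- cbind(pairs, F_stats)
result
```

For the real problem size, refitting with lm() for every pair is likely to be the bottleneck; a lower-level fit such as lm.fit() on precomputed design matrices may be worth testing, though that is not benchmarked here.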
[R] [e1071] Features that are factors when exporting a model with write.svm
I have a trained SVM that I want to export with write.svm and eventually use in libSVM. Some of my features are factors. Standard libSVM only works with features that are doubles, so I need to figure out how my features should be represented and used. How does e1071 treat factors in an SVM? For feature foo with values a and b I'm assuming it's something like foo_a (0 or 1) and foo_b (0 or 1). Is that right? Do factors get treated differently in an SVM? If I convert the factors to integers for libSVM, I'll lose the information that a feature doesn't take on a range of values. Is that going to cause problems? I don't know if the model takes that into account. When using write.svm a scale file is also output. My scale file is missing the same number of rows as I have features that are factors. That's another indication to me that the factors are causing issues. Thanks.
Re: [R] [e1071] Features that are factors when exporting a model with write.svm
I may have been able to answer my own questions by reading the e1071 source. It looks like the features are just converted to doubles with as.double(x). And, I haven't found where in the code yet, but it looks like it's not scaling the factors, which explains why I'm missing rows in the scale file.
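The difference between the two encodings raised in this thread (integer codes versus one 0/1 column per level) can be seen with base R alone. This sketch does not describe what e1071 does internally; it only illustrates the two representations on an invented data frame, with the feature names foo and bar made up for the example.

```r
# Invented example data: one factor feature and one numeric feature
d <- data.frame(foo = factor(c("a", "b", "a", "c")),
                bar = c(0.5, 1.2, 3.1, 2.0))

# Integer codes: compact, but they impose an order and spacing (a < b < c)
as.numeric(d$foo)

# Indicator columns: one 0/1 column per level, no implied ordering.
# Removing the intercept (- 1) keeps a column for every level of foo.
x <- model.matrix(~ foo + bar - 1, data = d)
x
```

If indicator columns are used for the exported model, the same encoding has to be applied to the data at training time, otherwise the exported support vectors and the new feature vectors will not line up.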
Re: [R] Biomod model access
Hi, great, that was easy. I feel like a bit of a fool for not figuring this out.

TO LOAD ALL SAVED MODELS AT ONCE:

    library(biomod2)
    # change directory to where you stored your original models
    # (My Documents is the default if you did not specify one), then go into the 'models' folder

    # TO LOAD ALL SAVED MODELS FROM PREVIOUS RUN
    rm(list=ls()) # will remove ALL objects currently stored in R
    # open old models with the load command, load()
    list.files() # check to see if all your files appeared correctly
    f <- as.list(list.files()) # get ready to load
    for(i in f) { load(i) }
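The loop above can be wrapped so that it neither depends on the working directory nor clears the workspace. This is a sketch using base R only: load_all is a hypothetical helper, not part of biomod2, and it assumes every file in the folder was written by save().

```r
# Hypothetical helper: load every saved file in `dir` into one environment
load_all <- function(dir, pattern = NULL, envir = new.env()) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  files <- files[!file.info(files)$isdir]  # skip sub-directories
  for (f in files) load(f, envir = envir)
  envir
}

# Demonstration with two small objects saved to a temporary folder
tmp <- tempfile("models")
dir.create(tmp)
model_a <- 1:3
model_b <- "x"
save(model_a, file = file.path(tmp, "model_a"))
save(model_b, file = file.path(tmp, "model_b"))

loaded <- load_all(tmp)
sort(ls(loaded))
```

Loading into a separate environment keeps the restored objects from silently overwriting anything with the same name in the global workspace; passing envir = globalenv() reproduces the behaviour of the original loop.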
Re: [R] SDM using BIOMOD2 error message
I think it is a problem with your working directory changing. When you make your enviro stack you set your directory to:

    setwd("V:/BIOCLIM")

Then when you import your species coordinates and presence/absence status you change your directory to:

    setwd("C:/Users/Lindsie/Documents/R")

Then, when you try to run your model, R says 'pred' is missing because it cannot find your raster stacks. CHANGE YOUR R DIRECTORY BACK TO WHERE YOU STORED YOUR ENVIRONMENTAL LAYERS PRIOR TO RUNNING THE MODEL!

    myBiomodModelOut <- BIOMOD_Modeling(myBiomodDa...
Re: [R] Biomod model access
I have been struggling with this same problem. I always have to re-run. PLEASE HELP!! I have however figured out the whole data-format issue am now able to save grid files for use in other GIS programs after they are re-exported. On Thursday, August 15, 2013 1:32:31 AM UTC-7, Jenny Williams wrote: I am still trying to get my head around biomod2. I have run through the tutorial a few times, which works really well in a linear format. But, I want to see the models and assess them at every part of the process. So, I need to: 1: be able to re-access all the files from /.BIOMOD_DATA/ once R is closed and all the file links are lost. e.g myBiomodModelOut 2: call the summary parameters for the models e.g GLM, I can see the files but not sure how to access them. e.g myGLMs - BIOMOD_LoadModels(myBiomodModelOut, models='GLM') #just produces a list summary(myGLMs[1]) Length Class Mode 1 character character #summary(GLM) doesn't work, but is the output that I am looking to find. 3. find the split datasets used for each of the iterations BIOMOD_Modeling options; NbRunEval for DataSplit Any help or pointers in the right direction would be greatly appreciated. 
FYI the vignette does not seem to work: http://127.0.0.1:15505/library/biomod2/doc/index.html Jenny Williams, Spatial Information Scientist, GIS Unit, Herbarium, Library, Art Archives Directorate, Royal Botanic Gardens, Kew
Re: [R] Handling large SAS file in R
Completing the reverse engineering effort is the principal barrier to fully incorporating the sas7bdat file format. Of course, SAS may change the format specification at any time, and without our knowledge. The sas7bdat package is a repository for the results of our (myself, Clint Cummins, and several others) experiments with the file format, most notably the 'sas7bdat' vignette, which lays out our current understanding of the structure of sas7bdat files. While others have reverse-engineered the file format, this is the ONLY publicly available specification. Hence, my feeling is that the vignette is the package's most important contribution. A prototype reader is also included: the read.sas7bdat function. Some have found it useful for routine work. But there are issues, as you have found. Fortunately, there are ongoing efforts by others to implement more efficient readers, using the data that we have compiled. Best, Matt
P.S. There is a read loop in the read.sas7bdat function, indexed by rows of the tabular data, that you might use to indicate progress reading the file.
Colleagues, Frank Harrell wrote that you need to purchase Stat/Transfer, which I did many years ago and continue to use. But I don't understand why the sas7bdat package (or something equivalent) cannot reverse engineer the SAS procedures so that R users can read sas7bdat files as well as Stat/Transfer does. I have been in contact with the maintainer, Matt Shotwell, regarding bugs in the present version (0.4) and he wrote: "it tends to languish just one or two items from the top of my TODO... I hope to get back to it soon." I have also written to this bulletin board about the foreign package not being able to process certain SAS XPT files (which Stat/Transfer handled without any problem). I am a strong advocate of R and I have arranged work-arounds (using Stat/Transfer) in these cases. However, R users would benefit from the ability of R to read any SAS file without intermediate software. I would offer to participate in any efforts to accomplish this but I think that it is beyond my capabilities. Dennis
[R] geo_bar x= and y= warnings and error help
Any insight on the issues leading to the following errors would be appreciated.
#Version_1 CALL
alphaDivOTU <- ggplot(data=alphaDivOTU_pt1to5, aes(y = Num.OTUs,x = Patient,fill = Timepoint)) +
  geom_bar(position = position_dodge) +
  theme(text = element_text(family = 'Helvetica-Narrow',size = 18.0)) +
  scale_fill_manual(guide = guide_legend(),values = c("forestgreen","gray44","dodgerblue2","royalblue2","royalblue4","blue3")) +
  scale_y_continuous(breaks = pretty_breaks(n = 10.0,min.n = 5.0))
ggsave(plot=alphaDivOTU, filename='alphaDivOTU.png', scale=1, dpi=300, width=10, height=10, units=c("cm"))
#Version_1 Error modes
Mapping a variable to y and also using stat="bin". With stat="bin", it will attempt to set the y value to the count of cases in each group. This can result in unexpected behavior and will not be allowed in a future version of ggplot2. If you want y to represent counts of cases, use stat="bin" and don't map a variable to y. If you want y to represent values in the data, use stat="identity". See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
Error in .$position$adjust : object of type 'closure' is not subsettable
#Version_2 CALL
alphaDivOTU <- ggplot(data=alphaDivOTU_pt1to5, aes(y = Num.OTUs,x = Patient,fill = Timepoint)) +
  geom_bar(position = position_dodge, stat = identity) +
  theme(text = element_text(family = 'Helvetica-Narrow',size = 18.0)) +
  scale_fill_manual(guide = guide_legend(),values = c("forestgreen","gray44","dodgerblue2","royalblue2","royalblue4","blue3")) +
  scale_y_continuous(breaks = pretty_breaks(n = 10.0,min.n = 5.0))
ggsave(plot=alphaDivOTU, filename='alphaDivOTU.png', scale=1, dpi=300, width=10, height=10, units=c("cm"))
#For Version_2 I get the error:
Error in stat$parameters : object of type 'closure' is not subsettable
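A likely reading of both errors, offered as a sketch rather than a fix tested against the original data: "object of type 'closure'" suggests that geom_bar() was handed functions instead of values. position_dodge without parentheses is the function itself, and a bare identity is base R's identity() function, not the string "identity". The data frame below is invented for illustration, with only two timepoints and two fill colours.

```r
library(ggplot2)

# Hypothetical stand-in for alphaDivOTU_pt1to5.
alphaDivOTU_pt1to5 <- data.frame(
  Patient   = rep(c("P1", "P2"), each = 2),
  Timepoint = rep(c("T1", "T2"), times = 2),
  Num.OTUs  = c(120, 95, 80, 110)
)

alphaDivOTU <- ggplot(alphaDivOTU_pt1to5,
                      aes(x = Patient, y = Num.OTUs, fill = Timepoint)) +
  # stat is given as a string, position as a *called* function
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_manual(values = c("forestgreen", "gray44")) +
  # pretty_breaks() lives in the scales package, so it is namespaced here
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10))

# To write the file, something like:
# ggsave("alphaDivOTU.png", plot = alphaDivOTU, width = 10, height = 10,
#        units = "cm", dpi = 300)
```

position = "dodge" (a string) should work equally well in place of position_dodge().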
[R] Help with removing extra legend elements in ggplot
I can't get the fine tuning right with my legend. I get an extra legend element "10", which is the point size in my plot. Can someone help me get rid of this extra element? Additionally, I would also like to reduce the size of the legend. If you want to reproduce my figure you can download my data in csv format here: https://github.com/scoyoc/EcoSiteDelineation/blob/master/VegNMDS_scores.csv . Here is my code...
veg.nmds.sc = read.csv("VegNMDS_scores.csv", header = T)
nmds.fig = ggplot(data = veg.nmds.sc, aes(x = NMDS1, y = NMDS2))
nmds.fig + geom_point(aes(color = VegType, shape = VegType, size = 10)) +
  scale_colour_manual(name = "Vegetation Type", values = c("blue", "magenta", "gray50", "red", "cyan3", "green4", "gold")) +
  scale_shape_manual(name = "Vegetation Type", values = c(15, 16, 17, 18, 15, 16, 17)) +
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.key = element_rect(color = "white") )
I have been messing around with theme(..., legend.key.size = unit(1, "cm")) but I keep getting the error: could not find function "unit". I'm not sure why; isn't unit supposed to be part of the legend.key argument?
... and the resulting figure... http://r.789695.n4.nabble.com/file/n4680764/VegNMDS.jpeg
Thanks for the help. MVS = Matthew Van Scoyoc Graduate Research Assistant, Ecology Wildland Resources Department http://www.cnr.usu.edu/wild/ Ecology Center http://www.usu.edu/ecology/ Quinney College of Natural Resources http://cnr.usu.edu/ Utah State University Logan, UT = Think SNOW!
Re: [R] Help with removing extra legend elements in ggplot
No dice. I still get the "10" legend element. Thanks for the quick reply. Cheers, MVS
On Tue, Nov 19, 2013 at 5:12 PM, David Winsemius wrote: On Nov 19, 2013, at 3:44 PM, Matthew Van Scoyoc wrote: I have been messing around with theme(..., legend.key.size = unit(1, "cm")) but I keep getting the error: could not find function "unit". I'm not sure why; isn't unit supposed to be part of the legend.key argument?
Try this workaround to what sounds like a bug: library(grid) # then repeat the call.
-- David Winsemius Alameda, CA, USA
Re: [R] Help with removing extra legend elements in ggplot
Awesome! Thanks for the fix Dennis, and thanks for clearing up aes() too. It makes sense now. Cheers, MVS
On Tue, Nov 19, 2013 at 5:52 PM, Dennis Murphy wrote: The additional element comes from this code:
  geom_point(aes(color = VegType, shape = VegType, size = 10))
Take the size argument outside the aes() statement and the legend will disappear:
  geom_point(aes(color = VegType, shape = VegType), size = 10)
The aes() statement maps a variable to a plot aesthetic. In this case you're mapping VegType to color and shape. You want to *set* the size aesthetic to a constant value, and that is done by assigning the value 10 to the size aesthetic outside of aes(). Dennis
On Tue, Nov 19, 2013 at 4:35 PM, Matthew Van Scoyoc wrote: No dice. I still get the "10" legend element. Thanks for the quick reply.
On Tue, Nov 19, 2013 at 5:12 PM, David Winsemius wrote: Try this workaround to what sounds like a bug: library(grid) # then repeat the call.
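Putting the two replies in this thread together, a minimal sketch might look like the following. The data frame is invented (two vegetation types instead of seven) because the CSV from the post is not loaded here, and unit() is called through the grid namespace so that it is found whether or not ggplot2 makes it available.

```r
library(ggplot2)

# Hypothetical stand-in for the NMDS scores in VegNMDS_scores.csv.
veg <- data.frame(
  NMDS1   = c(-1.0, 0.0, 1.0, 0.5),
  NMDS2   = c(0.2, -0.4, 0.1, 0.6),
  VegType = c("Grass", "Shrub", "Grass", "Shrub")
)

nmds.fig <- ggplot(veg, aes(x = NMDS1, y = NMDS2)) +
  # colour and shape are *mapped* to a variable inside aes();
  # size is *set* to a constant outside aes(), so it gets no legend entry
  geom_point(aes(colour = VegType, shape = VegType), size = 10) +
  scale_colour_manual(name = "Vegetation Type", values = c("blue", "magenta")) +
  scale_shape_manual(name = "Vegetation Type", values = c(15, 16)) +
  theme_bw() +
  # unit() comes from the grid package
  theme(legend.key.size = grid::unit(1, "cm"))
```

Giving both manual scales the same name is what lets ggplot2 merge the colour and shape legends into one.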
[R] Save intermediate result in a same file
Hello everybody, I have to save the results of a 100-iteration computation to a file every 5 iterations until the end. I start with a vector A of 100 elements, one for each of the 100 iterations, and I want to update A in the file every 5 iterations. I use save but it doesn't work. Does anyone have an idea? I need some help. Cheers.
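Since the post does not show the code that failed, the following is only a guess at what is wanted: overwrite one file with the current state of A every fifth iteration. The computation (i^2) and the file name are placeholders, and saveRDS() is used here instead of save() purely as a matter of taste.

```r
# Hypothetical checkpoint file; tempdir() keeps the sketch self-contained.
out_file <- file.path(tempdir(), "A_checkpoint.rds")

A <- rep(NA_real_, 100)          # one slot per iteration

for (i in seq_along(A)) {
  A[i] <- i^2                    # placeholder for the real computation
  if (i %% 5 == 0) {
    saveRDS(A, out_file)         # overwrite the same file every 5 iterations
  }
}

restored <- readRDS(out_file)    # what a later session would read back
```

If the run is interrupted, the file holds the results up to the last multiple of 5, with NA in the remaining slots.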
Re: [R] Confusing behaviour in data.table: unexpectedly changing variable
Very sorry to hear this bit you. If you need a copy of names before changing them by reference:
  oldnames <- copy(names(DT))
This will be documented and it's on the bug list to do so. copy is needed in other circumstances too, see ?copy. More details here: http://stackoverflow.com/questions/18662715/colnames-being-dropped-in-data-table-in-r http://stackoverflow.com/questions/15913417/why-does-data-table-update-namesdt-by-reference-even-if-i-assign-to-another-v
Btw, the r-help posting guide says (last time I looked) you should only post to r-help about packages if you have tried the maintainer first but didn't hear from them; i.e., r-help isn't for support about packages. I don't follow r-help, so please continue to cc me if you reply. Matthew
On 25/09/13 00:47, Jonathan Dushoff wrote: I got bitten badly when a variable I created for the purpose of recording an old set of names changed when I didn't think I was going near it. I'm not sure if this is a desired behaviour, or documented, or warned about. I read the data.table intro and the FAQ, and also ?setnames. Ben Bolker created a minimal reproducible example:
  library(data.table)
  DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
  names(DT)
  ## [1] "x" "y" "v"
  oldnames <- names(DT)
  print(oldnames)
  ## [1] "x" "y" "v"
  setnames(DT, LETTERS[1:3])
  print(oldnames)
  ## [1] "A" "B" "C"
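For completeness, here is Ben Bolker's example again with the copy() suggested in the reply applied. It is a small sketch assuming a reasonably recent data.table; only the copied version is shown, since that is the part whose behaviour is not in question.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Take a real copy of the names before modifying DT by reference.
oldnames <- copy(names(DT))

setnames(DT, LETTERS[1:3])   # renames the columns in place

oldnames    # the copy is unaffected by setnames()
names(DT)   # the table itself now has the new names
```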
Re: [R] Regularized Discriminant Analysis scores, anyone?
Thank you Dr. Ligges, I very much appreciate the quick reply. I wondered if that was the case, based on the math as I (poorly) understood it. However, I remain confused. Page 107 of the rrcov package PDF makes me think I can derive LDA-style discriminant scores for a QDA:
  library(rrcov)
  data(iris)
  qda1 <- QdaClassic(x=iris[,1:4], grouping=iris[,5])
  pred_qda <- predict(qda1, iris[,1:4])
  head(pred_qda@x)
  plotdat <- pred_qda@x
  plot(plotdat[,1], plotdat[,2])
  plot(plotdat[,2], plotdat[,3])
pred_qda@x looks like QDA discriminant scores. No doubt you are right, but if you have a moment, I'd love to know what these scores are and what they summarize. In addition, I have run into this nice set of lengthy R code to manually calculate discriminant scores for a QDA: https://cs.uwaterloo.ca/~a2curtis/courses/2005/ML-classification.pdf None of this means I can calculate discriminant scores for an RDA, of course, but QDA is my back-up choice. Bottom line: am I completely misinterpreting what I am seeing here, mathematically? Or is this just the result of different ways of implementing QDA in R? Regards, and thanks again, Matt
On 6/2/2013 10:39 AM, Uwe Ligges wrote: On 02.06.2013 05:01, Matthew Fagan wrote: Hi all, I am attempting to do Regularized Discriminant Analysis (RDA) on a large dataset, and I want to extract the RDA discriminant score matrix. But the predict function in the klaR package, unlike the predict function for LDA in the MASS package, doesn't seem to give me an option to extract the scores. Any suggestions?
There are no such scores: same as for qda, you do not follow the Fisher idea of the linear discriminant components any more: Your space is now partitioned by ellipsoid like structures based on the estimation of the inner-class covariance matrices. rda as implemented in klaR (see the reference given on the help page) is a regularization that helps to overcome problems when estimating non-singular covariance matrices for the separate classes.
I have already tried (and failed; ran out of 16 GB of memory) to do this with the rda package: don't know why, but the klaR package seems to be much more efficient with memory.
The rda package provides a completely different regularization technique, see the reference given on the help page. Best, Uwe Ligges
-- Matthew Fagan, Columbia University, Department of Ecology, Evolution, and Environmental Biology
[R] Regularized Discriminant Analysis scores, anyone?
Hi all, I am attempting to do Regularized Discriminant Analysis (RDA) on a large dataset, and I want to extract the RDA discriminant score matrix. But the predict function in the klaR package, unlike the predict function for LDA in the MASS package, doesn't seem to give me an option to extract the scores. Any suggestions? I have already tried (and failed; ran out of 16 GB of memory) to do this with the rda package: don't know why, but the klaR package seems to be much more efficient with memory. I have included an example below:
  library(klaR)
  library(MASS)
  data(iris)
  x <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
  rda1 <- predict(x, iris[, 1:4])
  str(rda1)
  # This gets you an object with posterior probabilities and classes, but no discriminant scores!
  # if you run lda
  y <- lda(Species ~ ., data = iris)
  lda1 <- predict(y, iris[, 1:4])
  str(lda1)
  head(lda1$x)
  # gets you the discriminant scores for the LDA. But how to do this for RDA?
  # curiously, the QDA function in MASS has this same problem, although you can get around it using the rrcov package.
Regards, and thank you very much for any help, Matt
Re: [R] Contour lines in a persp plot
Thanks a lot, that is all I want. If someone is interested, see the code below:
  library(lattice)
  # the ellipsis (...) passes all remaining parameters through with their default values
  panel.3d.contour <- function(x, y, z, rot.mat, distance, nlevels = 20, zlim.scaled, ...)
  {
    add.line <- trellis.par.get("add.line")
    panel.3dwire(x, y, z, rot.mat, distance, zlim.scaled = zlim.scaled, ...)
    clines <- contourLines(x, y, matrix(z, nrow = length(x), byrow = TRUE), nlevels = nlevels)
    for (ll in clines) {
      m <- ltransform3dto3d(rbind(ll$x, ll$y, zlim.scaled[2]), rot.mat, distance)
      panel.lines(m[1,], m[2,], col = add.line$col, lty = add.line$lty, lwd = add.line$lwd)
    }
  }
  fn <- function(x,y){sin(x)+2*y}  # this looks like a corrugated tin roof
  x <- seq(from=1,to=100,by=2)     # generates a list of x values to sample
  y <- seq(from=1,to=100,by=2)     # generates a list of y values to sample
  z <- outer(x,y,FUN=fn)           # applies the function across the combinations of x and y
  wireframe(z, zlim = c(1, 300), nlevels = 10, aspect = c(1, 0.5), panel.aspect = 0.6, panel.3d.wireframe = panel.3d.contour, shade = FALSE, screen = list(z = 20, x = -60))
[R] expanding a presence only dataset into presence/absence
Hello, I'm working with a very large dataset (250,000+ lines in its current form) that includes presence-only data on various species (which is nested within different sites and sampling dates). I need to convert this into a dataset with presence/absence for each species. For example, I would like to expand "My current data" to "Desired data":
My current data
  Species  Site  Date
  a        1     1
  b        1     1
  b        1     2
  c        1     3
Desired data
  Species  Present  Site  Date
  a        1        1     1
  b        1        1     1
  c        0        1     1
  a        0        2     2
  b        1        2     2
  c        0        2     2
  a        0        3     3
  b        0        3     3
  c        1        3     3
I've scoured the web, including Rseek, and haven't found a resolution (and note that a similar question was asked sometime in 2011 without an answer). Does anyone have any thoughts? Thank you in advance. -- Matthew D. Venesky, Ph.D. Postdoctoral Research Associate, Department of Integrative Biology, The University of South Florida, Tampa, FL 33620 Website: http://mvenesky.myweb.usf.edu/
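One possible approach in base R, sketched under an assumption that is not stated in the post: every distinct Site/Date pair in the data counts as one sampling event, and every species seen anywhere should get a row for every event. The object names (obs, events, pa) are made up for the example, and the Site values follow the "current data" table as posted.

```r
# The presence-only records from the post.
obs <- data.frame(
  Species = c("a", "b", "b", "c"),
  Site    = c(1, 1, 1, 1),
  Date    = c(1, 1, 2, 3),
  stringsAsFactors = FALSE
)

# All sampling events and all species that occur in the data.
events  <- unique(obs[, c("Site", "Date")])
species <- data.frame(Species = sort(unique(obs$Species)),
                      stringsAsFactors = FALSE)

# merge() with by = NULL returns the Cartesian product: one row per
# species per sampling event.
full <- merge(events, species, by = NULL)

# Flag the observed combinations, then left-join them onto the full grid.
obs$Present <- 1L
pa <- merge(full, unique(obs), by = c("Species", "Site", "Date"), all.x = TRUE)
pa$Present[is.na(pa$Present)] <- 0L   # never observed at that event: absent

pa <- pa[order(pa$Site, pa$Date, pa$Species),
         c("Species", "Present", "Site", "Date")]
pa
```

With 250,000+ rows the same logic should still work, although the cross join grows as (number of events) x (number of species), so memory is worth checking first.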
[R] Factor to numeric conversion - as.numeric(as.character(f))[f] - Language definition seems to say to not use this.
These two seem to be at odds. Is this the case? From help(factor) - section Warning: To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)). From the language definition - section 2.3.1: Factors are currently implemented using an integer array to specify the actual levels and a second array of names that are mapped to the integers. Rather unfortunately users often make use of the implementation in order to make some calculations easier. This, however, is an implementation issue and is not guaranteed to hold in all implementations of R.
Re: [R] Factor to numeric conversion - as.numeric(as.character(f))[f] - Language definition seems to say to not use this.
When used as an index, the factor is implicitly converted to integer. In the expression as.numeric(levels(f))[f], the vector as.numeric(levels(f)) is indexed by as.integer(f). This appears to rely on the current implementation, as mentioned in section 2.3.1 of the language definition.
On Mon, Apr 1, 2013 at 1:49 PM, Peter Ehlers wrote: On 2013-04-01 10:48, Matthew Lundberg wrote: These two seem to be at odds. Is this the case?
Hint:
  f <- factor(sample(5, 10, TRUE))
  as.numeric(levels(f))[f]
  g <- factor(sample(letters[1:5], 10, TRUE))
  as.numeric(levels(g))[g]
Peter Ehlers
Re: [R] Factor to numeric conversion - as.numeric(levels(f))[f] - Language definition seems to say to not use this.
Note the edited subject line! I don't know why I typed it as it was before. This says that as.numeric(as.character(f)) will work regardless of the implementation, and I agree. It's the recommendation to use as.numeric(levels(f))[f] that has me wondering about section 2.3.1 of the language definition. I expect that this idiom is in widespread use, and perhaps the language definition should be changed.
On Mon, Apr 1, 2013 at 2:58 PM, Bert Gunter wrote: Yup. Note also:
  as.character.factor
  function (x, ...)
  levels(x)[x]
But of course this is OK, since this can change if the implementation does. Which is the whole point, of course. -- Bert
On Mon, Apr 1, 2013 at 12:16 PM, Matthew Lundberg wrote: When used as an index, the factor is implicitly converted to integer. In the expression as.numeric(levels(f))[f], the vector as.numeric(levels(f)) is indexed by as.integer(f). This appears to rely on the current implementation, as mentioned in section 2.3.1 of the language definition.
-- Bert Gunter, Genentech Nonclinical Biostatistics
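To make the two idioms from this thread concrete, here is a small base-R sketch with a made-up factor whose levels are numbers stored as text. It shows only the behaviour of current R and says nothing about what other implementations might do.

```r
# Levels are created by sorting the strings: "10", "2.5", "5".
f <- factor(c("10", "5", "10", "2.5"))

codes <- as.integer(f)                  # internal integer codes: 1 3 1 2

# Idiom recommended in ?factor: convert each level once, then index by f.
via_levels <- as.numeric(levels(f))[f]

# Idiom that does not depend on the integer representation.
via_char <- as.numeric(as.character(f))

via_levels                              # 10.0  5.0 10.0  2.5
identical(via_levels, via_char)         # TRUE
```

The thing to avoid is as.numeric(f) on its own, which returns the codes rather than the original values.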
Re: [R] Problem with R CMD check and the inconsolata font business
On 11/3/2011 3:30 PM, Brian Diggs wrote: Well, I figured it out. Or at least got it working. I had to run initexmf --mkmaps because apparently there was something wrong with my font mappings. I don't know why; I don't know how. But it works now. I think installing the font into the Windows Font directory was not necessary. I'm including the solution in case anyone else has this problem.
Many thanks Brian Diggs! I just had the same problem and that fixed it. Matthew
Re: [R] low pass filter analysis in R
Janesh, This might help get you started: http://biostatmatt.com/archives/78 (apologies for linking to my own blog) Regards, Matt
On Wed, 6 Feb 2013 18:50:43 -0600, Janesh Devkota wrote: Hello R users, I am trying to use R to do a low pass filter analysis of tidal data. I am a novice in R and so far have only been doing simple things in R. I found a package called signal but couldn't find a proper tutorial for the low pass filter. Could anyone point me to a proper tutorial or starting point on how to do low pass filter analysis in R? Thank you so much. Janesh
[R] Rscript on Mac : specify R64 over R (32-bit version)
Hi,

I have both R and R64 installed on Mac OSX 10.8 Mountain Lion (64-bit). When I run the command sessionInfo() from within Rscript, I get:

    R version 2.15.2 (2012-10-26)
    Platform: i386-apple-darwin9.8.0/i386 (32-bit)

Is there a way to make Rscript point at R64 rather than R (32-bit)?

Thanks,
Matt
[R] SQLDF column errors
I am trying to exclude integer values from a small data frame, df1, that have matching hits in a second, very big data frame, df2, which involves matching those hits first. I am trying to use sqldf on the data frames in the following fashion:

df1:

    V1
    12675
    14753
    16222
    18765

head(df2):

    V1    V2
    13647 rd1500
    14753 rd1580
    15987 rd1590
    16222 rd2020

    df1_new <- sqldf("select df1.V1, df2.V2 where rs10.V1 = d10.pos")

Ideally I would like to use delete or not-equal (!=), though I can only find that delete works with sqldf. The statement above returns this error:

    Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such column: df1.V1)

I am also trying this:

    df1_new <- sqldf("select V1 from df1, V2 from df2 where df1.V1 = df2.V1")

which returns this error:

    Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near from: syntax error)

If anyone with sqldf knowledge could lend me a hand that would be great.

Thanks!
Matt
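A hedged sketch of one way to do what the question describes, using base R only so that it runs without sqldf. The data are the sample rows from the message; the names matched and df1_new are illustrative. The SQL in the comments is standard SQLite syntax of the kind sqldf accepts, but it was not run here.

```r
df1 <- data.frame(V1 = c(12675, 14753, 16222, 18765))
df2 <- data.frame(V1 = c(13647, 14753, 15987, 16222),
                  V2 = c("rd1500", "rd1580", "rd1590", "rd2020"),
                  stringsAsFactors = FALSE)

# Rows of df1 that DO have a hit in df2, with the df2 label attached
# (roughly: select df1.V1, df2.V2 from df1 join df2 on df1.V1 = df2.V1)
matched <- merge(df1, df2, by = "V1")

# Rows of df1 with NO hit in df2, i.e. the exclusion the poster wants
# (roughly: select * from df1 where V1 not in (select V1 from df2))
df1_new <- df1[!(df1$V1 %in% df2$V1), , drop = FALSE]

print(matched)
print(df1_new)
```

Both errors in the message come from the SQL itself: the first statement has no from clause and refers to tables that do not exist (rs10, d10), and the second repeats from inside the select list.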
[R] sqldf merging with subset in specific range
Hi all:

I have two data sets. Set A includes a long list of hits in a single column, say:

    m$V1
    10 15 36 37 38 44 45 57 61 62 69 ... and so on

Set B includes just a few key ranges, set up by way of a minimum in column X and a maximum in column Y. Say,

    n$X  n$Y
    30   38    # range from 30 to 38
    52   62    # range from 52 to 62

I would like the output to be the rows containing the following values:

    m$V1
    36 37 38 57 61 62

I am interested in isolating the hits in data set A that correspond to any of the hotspot ranges in data set B. I have downloaded sqldf and tried a couple of things, but I cannot do a traditional merge since set B is based on a range. I can always do a manual subset, but I am trying to figure out if there is anything more expedient since these data frames will be quite large.

Thanks!
Matt
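One possible way to express the range lookup, sketched in base R with the sample values from the message. The names in_any_range and hits are illustrative, and this is a sketch under the assumption that the number of ranges is small; it is not tuned for very large inputs.

```r
m <- data.frame(V1 = c(10, 15, 36, 37, 38, 44, 45, 57, 61, 62, 69))
n <- data.frame(X = c(30, 52), Y = c(38, 62))

# For each hit, test whether it falls inside at least one [X, Y] range
in_any_range <- vapply(m$V1,
                       function(v) any(v >= n$X & v <= n$Y),
                       logical(1))

hits <- m[in_any_range, , drop = FALSE]
print(hits$V1)

# With sqldf, a range join of roughly this shape should be possible
# (standard SQL, not run here):
#   select m.V1 from m join n on m.V1 between n.X and n.Y
```

Because the loop runs once per hit and compares against all ranges at once, the cost grows with the number of hits times the number of ranges, which stays manageable when set B has only a few rows.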
[R] confirming a formula for use with lmer
Hello,

I recently began using R and the lme4 package to carry out linear mixed effects analyses. I am interested in the effects of variables 'prime', 'time', and 'mood' on 'reaction_time' while taking into account the random effect 'subjects.' I've read through documentation on lme4 and came up with the following formula for use with lmer:

    reaction_time ~ (mood*prime*soa) + (1|subject)

Prime and soa were repeated measures within subjects, while mood was manipulated between subjects. As I understand it, however, this distinction does not affect how the formula should be written. While I've done my background reading and think this formula is correct, I'd appreciate it if an expert with more experience than I have could double check my work.

Thanks in advance for any help,
Matt
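To see which fixed-effect terms the formula above asks for, the `*` operator can be expanded with base R alone. This sketch only inspects the formula; it does not fit a model and says nothing about whether the random-effects part is adequate. The variable names are taken from the message, and no data are needed because terms() works symbolically.

```r
# Fixed-effects part of the formula from the message
fixed_part <- reaction_time ~ mood * prime * soa

# a * b * c expands to all main effects plus all two-way
# and three-way interactions
term_labels <- attr(terms(fixed_part), "term.labels")
print(term_labels)
```

So the formula requests three main effects, three two-way interactions, and the three-way interaction, with (1|subject) adding a random intercept per subject on top.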
[R] Updating Tom Short's R Reference Card, contacting him?
I am uncertain about how to acknowledge, in the space of about 30 characters, the fact that $ can do partial matching. One option is this:

    x[["name"]]   column named "name"
    x$name        same as above (almost always)

Is that better or worse than ignoring this issue, or is there an even better phrasing?

As per the other suggestions, I fixed the matrix indexing info; pkg::foo() now has "not usually required"; and <<- is now explained as "Left assignment in outer lexical scope; not for beginners".

Plus, I've been able to get in touch with Tom Short. :-)

Thanks to Jeff Newmiller, Dennis Murphy, and Peter Dalgaard for these helpful suggestions and corrections!

regards,
m@
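For readers unsure what the "almost always" caveat refers to, here is a small sketch of the difference on a list. The object x and its element are invented for illustration.

```r
x <- list(name = 1:3)

# $ accepts a unique prefix of an element name (partial matching)
via_dollar <- x$na

# [[ ]] matches names exactly by default, so a prefix finds nothing
via_exact <- x[["na"]]

# [[ ]] only does partial matching when asked to
via_partial <- x[["na", exact = FALSE]]

print(via_dollar)
print(is.null(via_exact))
print(via_partial)
```

The two forms therefore agree whenever the full name is typed, and differ only when an abbreviated name happens to match.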
[R] Updating Tom Short's R Reference Card, contacting him?
I made an update/reboot of Tom Short's classic and public domain R Reference Card. His is from late 2004 and I've found myself giving it to new R users with additional notes about packages. If anyone knows how to reach Tom, that would be great. I am titling this reboot "Short R Reference", in a play on his name, but I would like to know whether he wants his name (and/or email) on this version.

Also, if anyone feels like providing corrections or comments, the release candidate is here. To view it in full resolution, you may need to download it: https://docs.google.com/open?id=0B8NgE2q8ITzTQnhPTFVjVXlOaHM

regards,
m@
[R] Clustering groups according to multiple variables
Dear R help,

I am trying to cluster my data according to group in a data frame such as the following:

    df = data.frame(group = rep(c("a","b","c","d"), 10), replicate(100, rnorm(40)))

I'm not sure how to tell hclust() that I want to cluster according to the group variable. For example:

    dfclust = hclust(dist(df), "ave")
    plot(dfclust)

clusters according to each individual row. What I'm looking for is an unrooted tree that will show similarity/dissimilarity among groups according to the data set as a whole.

I appreciate the help,
MO
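One possible reading of the request is to collapse each group to its mean profile and cluster those profiles. The sketch below does that in base R; it is an assumption about what "cluster according to group" should mean, not an answer given in the thread, and the names group_means and group_clust are illustrative. The seed is added only for reproducibility.

```r
set.seed(42)
df <- data.frame(group = rep(c("a", "b", "c", "d"), 10),
                 replicate(100, rnorm(40)))

# One row per group: the mean of every numeric column within that group
group_means <- aggregate(. ~ group, data = df, FUN = mean)
rownames(group_means) <- group_means$group

# Cluster the four group profiles (drop the label column before dist())
group_clust <- hclust(dist(group_means[, -1]), method = "average")
print(group_clust$labels)
```

Calling plot(group_clust) then draws a dendrogram with one leaf per group. For an unrooted layout, converting the result with as.phylo() from the ape package and plotting with type = "unrooted" should work if that package is installed, though that step is not run here.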
[R] optimize and mcparallel problem
Dear list,

I am running into 2 problems when using the optimize function from the stats package (note: I've also tried unsuccessfully to use optim, nlm, nlminb). The second problem is caused by my solution to the first, so I am asking if anyone has a better solution to the first problem, or if there exists a solution to the second. I should also mention that what I am working on is a function for a package, so I need the code to be applicable to all platforms (I understand that 'multicore' doesn't really work on Windows, so for the second problem I mean all platforms except Windows).

The first problem: I have a function that runs a linear mixed model with a constrained variance for one of the random effects, computes a log-likelihood ratio test statistic (LRT), and returns the absolute value of the difference between the LRT and some pre-defined value (e.g., 2). I have made a dummy function, called foo below, that has the same inputs and outputs without the complicated inner workings of my actual function. I don't just want to know the end value (x) that minimizes the output of foo (i.e., diff), but every x and the corresponding diff used by the optimize function. My solution is to create an object (vals) outside of foo and write to this object:

    foo <- function(x){
      vals <<- c(vals, x)
      diff <- abs(x - 2)
      diff
    }

This works well so far:

    vals <- NULL
    out1 <- optimize(foo, interval = seq(0, 4, 0.2))
    vals

However, the second problem arises if I want to use the parallel function in the multicore package:

    library(multicore)
    vals <- NULL
    out2_tmp <- mcparallel(optimize(foo, interval = seq(0, 4, 0.2)))
    out2 <- collect(out2_tmp, wait = TRUE)
    vals

Predictably, the child process does not return the vals object when I use the collect function.

To summarize, my first question is whether or not there is a better way to return all of the values over which optimize evaluates my function. The second question is: if I do use my solution to the first question, how can I get the vals object returned from the child process?

Thanks very much for any and all help!!

Sincerely,
Matthew

-- Matthew Wolak PhD Candidate Evolution, Ecology, and Organismal Biology Graduate Program University of California Riverside http://student.ucr.edu/~mwola001/
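A possible alternative to writing into a global vals, sketched under the assumption that it is acceptable to wrap the objective function: keep the trace in a closure and return it together with the optimize result. The helper name traced_optimize is hypothetical and does not come from the thread or from any package.

```r
# Hypothetical helper: runs optimize() and records every x it evaluates
traced_optimize <- function(f, lower, upper) {
  trace <- numeric(0)
  wrapped <- function(x) {
    trace <<- c(trace, x)   # updates 'trace' in the enclosing function call
    f(x)
  }
  opt <- optimize(wrapped, lower = lower, upper = upper)
  list(opt = opt, trace = trace)
}

foo <- function(x) abs(x - 2)
res <- traced_optimize(foo, lower = 0, upper = 4)

print(res$opt$minimum)
print(length(res$trace))
```

Because the evaluated points are part of the ordinary return value, the same call could in principle be the expression handed to mcparallel(), so that collecting the child's result brings the trace back with it. That part is an untested suggestion, and it would still not apply on Windows.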
[R] Hosted R
I looked in the archives and couldn't find anything that really addressed my question, so here it is: does anyone know of any web sites/environments that host R for free, web-based, multi-user access to the R engine? My apologies if the question is too simplistic for this forum. The reason I ask is that I'm looking at the possibility of establishing an R grid if one doesn't already exist, and if one does, then I'm looking for interfaces, protocols, and guidelines for adding an R node.

-- Matthew K. Hettinger, Enterprise Architect and Systemist Mathet Consulting, Inc.
[R] how to sort huge (> 2^31 row) dataframes quickly
Hello all,

I have some genetic datasets (gzipped) that contain 6 columns and upwards of 10s of billions of rows. The largest dataset is about 16 GB on file, gzipped (!). I need to sort them according to columns 1, 2, and 3. The setkey() function in the data.table package does this quickly, but of course we're limited by R not being able to index vectors with 2^31 elements, and bringing in only the parts of the dataset we need is not applicable here.

I'm asking for practical advice from people who've done this or who have ideas. We'd like to be able to sort the biggest datasets in hours rather than days (or weeks!). We cannot have any process take over 50 GB RAM max (we'd prefer smaller so we can parallelize).

Relational databases seem too slow, but maybe I am wrong. A quick look at the bigmemory package doesn't turn up an ability to sort like this, but again, maybe I'm wrong. My computer programmer writes in C++, so if you have ideas in C++, that works too.

Any help would be much appreciated... Thanks!

Matt

-- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com
Re: [R] Extracting standard errors for adjusted fixed effect sizes in lmer
Dear R help,

Does no one have an idea of where I might find information that could help me with this problem? I apologize for re-posting; I have half a suspicion that my original message did not make it through. I hope you all had a good weekend and look forward to your reply,

MO

On Fri, Jul 20, 2012 at 11:56 AM, MO wrote:
> Dear R help list,
>
> I have done a lot of searching but have not been able to find an answer to my problem. I apologize in advance if this has been asked before. I am applying a mixed model to my data using lmer. I will use sample data to illustrate my question:
>
>     library(lme4)
>     library(arm)
>     data(HR, package = "SASmixed")
>     str(HR)
>     'data.frame': 120 obs. of 5 variables:
>      $ Patient: Factor w/ 24 levels "201","202","203",..: 1 1 1 1 1 2 2 2 2 2 ...
>      $ Drug   : Factor w/ 3 levels "a","b","p": 3 3 3 3 3 2 2 2 2 2 ...
>      $ baseHR : num 92 92 92 92 92 54 54 54 54 54 ...
>      $ HR     : num 76 84 88 96 84 58 60 60 60 64 ...
>      $ Time   : num 0.0167 0.0833 0.25 0.5 1 ...
>
>     fm1 <- lmer(HR ~ baseHR + Time + Drug + (1 | Patient), HR)
>
>     fixef(fm1)    ## Extract estimates of fixed effects
>     (Intercept)      baseHR        Time       Drugb       Drugp
>      32.6037923   0.5881895  -7.0272873   4.6795262  -1.0027581
>
>     se.fixef(fm1) ## Extract standard error of estimates of fixed effects
>     (Intercept)      baseHR        Time       Drugb       Drugp
>       9.9034008   0.1184529   1.4181457   3.5651679   3.5843026
>
> Because the estimates of the fixed effects are displayed as differences from the intercept (I think?), I can back-calculate the actual effect sizes easily enough. However, how would I do a similar calculation for the standard errors of these effect sizes (since these error estimates are for the difference in means of effects) if my design isn't balanced (which confuses things tremendously when working with a data set as large as mine)?
>
> It may help to point out that I'm working with microarray data, applying the same model for each gene (hundreds of genes total) across multiple samples (hundreds of samples total), but as an R beginner I like to start with small data samples and work my way up.
>
> I appreciate the help,
> MO
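The general rule behind the question is that a sum of coefficients w'b has standard error sqrt(w' V w), where V is the estimated covariance matrix of the coefficients; this holds whether or not the design is balanced. The sketch below demonstrates the rule on an ordinary lm fit to the built-in PlantGrowth data so that it runs in base R. Using lm instead of lmer is a substitution made for illustration only, and the names w, est and se are invented.

```r
# Treatment-contrast model: intercept = ctrl mean, other terms = differences
fit <- lm(weight ~ group, data = PlantGrowth)
b <- coef(fit)   # (Intercept), grouptrt1, grouptrt2
V <- vcov(fit)   # covariance matrix of those estimates

# Weights picking out intercept + grouptrt1, i.e. the trt1 group mean
w <- c(1, 1, 0)

est <- sum(w * b)
se  <- sqrt(drop(t(w) %*% V %*% w))

print(c(estimate = est, std_error = se))
```

For a model fitted with lmer, fixef() and the matrix returned by vcov() should play the roles of b and V in the same formula, as far as I know, but that was not run here.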
Re: [R] data.table vs plyr reg output
Hi Geoff,

Please see this part of the r-help posting guide:

> For questions about functions in standard packages distributed with R (see the FAQ "Add-on packages in R"), ask questions on R-help. If the question relates to a contributed package, e.g., one downloaded from CRAN, try contacting the package maintainer first. You can also use find("functionname") and packageDescription("packagename") to find this information. ONLY send such questions to R-help or R-devel if you get no reply or need further assistance. This applies to both requests for help and to bug reports.

I've capitalised ONLY since it is bold in the original HTML. I only saw your post thanks to Google Alerts. maintainer("data.table") returns the email address of the datatable-help list, with the posting guide in mind. However, for questions like this, I'd suggest the data.table tag on Stack Overflow (which I subscribe to): http://stackoverflow.com/questions/tagged/data.table

Btw, I recently presented at LondonR. Here's a link to the slides: http://datatable.r-forge.r-project.org/LondonR_2012.pdf

Matthew
Re: [R] how to convert list of matrix (raster:extract o/p) to data table with additional colums (polygon Id, class)
AKJ,

Please see this recent answer: http://r.789695.n4.nabble.com/data-table-vs-plyr-reg-output-tp4634518p4634865.html

Matthew
Re: [R] templated use of aggregate
Sorry, i'll try and put more flesh on the bones. please note, i changed the data in the example, as fiddling has raised another question that's best illustrated with a slightly different data set.

first of all, when i do as you suggest, i obtain the following error:

    PxMat <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum)
    Error in aggregate.formula(mm[, -1] ~ mm[, 1], data = mm, sum) : 'names' attribute [3] must be the same length as the vector [1]

my data is an xts object, and it looks like this:

                        px_ym1 vol_ym1
    2012-06-01 09:30:00  97.90       9
    2012-06-01 09:30:00  97.90      60
    2012-06-01 09:30:00  97.90      71
    2012-06-01 09:30:00  97.90       5
    2012-06-01 09:30:00  97.90       3
    2012-06-01 09:30:00  97.90      21
    2012-06-01 09:31:00  97.90       5
    2012-06-01 09:31:00  97.89     192
    2012-06-01 09:31:00  97.89      65
    2012-06-01 09:31:00  97.89      73
    2012-06-01 09:31:00  97.89       1
    2012-06-01 09:31:00  97.89       1
    2012-06-01 09:31:00  97.89      39
    2012-06-01 09:31:00  97.90      15
    2012-06-01 09:31:00  97.90       1
    2012-06-01 09:31:00  97.89       1
    2012-06-01 09:31:00  97.90      18
    2012-06-01 09:31:00  97.89       1
    2012-06-01 09:32:00  97.89      33
    2012-06-01 09:34:00  97.89       1
    2012-06-01 09:34:00  97.89       1

dput(mn) returns:

    structure(c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89, 97.89, 97.89, 97.89, 97.89, 97.89, 97.9, 97.9, 97.89, 97.9, 97.89, 97.89, 97.89, 97.89, 9, 60, 71, 5, 3, 21, 5, 192, 65, 73, 1, 1, 39, 15, 1, 1, 18, 1, 33, 1, 1), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "GMT", class = c("xts", "zoo"), index = structure(c(1338543000, 1338543000, 1338543000, 1338543000, 1338543000, 1338543000, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543120, 1338543240, 1338543240), tzone = "GMT", tclass = c("POSIXct", "POSIXt")), .Dim = c(21L, 2L), .Dimnames = list(NULL, c("px_ym1", "vol_ym1")))

as you can see, it is an xts object that contains dates, prices and volumes. There is much more data over a long time period, and i'm interested in various sub-setting and then aggregate operations. I would like to split the data by time period and aggregate the data, such that i obtain a table which reports the volume traded at each price, for each of the time-period splits that i have chosen.

I have employed the following approach:

    PxMat <- aggregate(.~px_ym1, data=mn, sum)

which yields:

      px_ym1 vol_ym1
    1  97.89     408
    2  97.90     208

and for subsets, i use the following grouping:

    PxMat30 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '30'], sum)

which yields:

      px_ym1 vol_ym1
    1   97.9     169

and

    PxMat31 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '31'], sum)

which yields:

      px_ym1 vol_ym1
    1  97.89     373
    2  97.90      39

and so on and so forth for each minute.

when i try and sub-set using general notation, as follows:

    PxMat <- aggregate(.~mn[,1], data=mn, sum)

this yields a different form of output:

      px_ym1  px_ym1 vol_ym1
    1  97.90 1076.79     408
    2  97.89  979.00     208

the problem is that i now have the sum of the px_ym1 data (the sum of mn[,1]).

hopefully things are now clearer - sorry to have wasted your time up until now. assuming that i have now made my situation clear, i hope you can help with four specific questions.

1/ My data-sets are HUGE, so speed is an issue - is this the fastest way to sub-set and aggregate an xts?

2/ is there a way to do this for multiple splits? say a table for each minute, day, week, or month? the return would potentially be a list with a table for each day / minute etc showing volume traded at each price -- but it doesn't have to be a list ... i am writing a function with loops that would generate a table that reports volume traded at each price for each case of a specified time split (say for four tables, one for each minute in the example data, returned as a list). my solution is slow, and it seems like something that someone would have done better already. is this the case?

3/ is there a way to do the sub-setting with templated variables? i would like to obtain the table i get with the named aggregate functions (reproduced above) with multiple data frames, as the column names will differ from time to time. i cannot figure out how to stop the command from summing the mn[,1] column when i stop using variable names.

4/ on a related note, is it possible to apply different functions to different columns of data? It would be nice, for example, if the table returned from an aggregate command could be made to be:

      px_ym1 count vol_ym1
    1  97.90    11     408
    2  97.89    10     208

where we have the price traded, the number of trades (a count of px_ym1 / mn[,1]), and the sum of vol_ym1 (mn[,2]).

thanks and best regards
matt johnson

On 13 June 2012 15:06, David Winsemius dwinsem...@comcast.net wrote:
> On Jun 12, 2012, at 11:32 PM, Matthew Johnson wrote:
>> Dear R-help, I have an xts data
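Questions 2 to 4 can be sketched on a plain data.frame rebuilt from the dput() values above, which sidesteps the xts-specific behaviour discussed in the thread. This is an illustrative sketch, not a claim about the fastest approach; the names d, minute, per_minute, summary_tbl and sum_by_first_col are all invented here.

```r
px  <- c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89, 97.89, 97.89, 97.89,
         97.89, 97.89, 97.9, 97.9, 97.89, 97.9, 97.89, 97.89, 97.89, 97.89)
vol <- c(9, 60, 71, 5, 3, 21, 5, 192, 65, 73, 1, 1, 39, 15, 1, 1, 18, 1, 33, 1, 1)
secs <- c(rep(1338543000, 6), rep(1338543060, 12), 1338543120, 1338543240, 1338543240)

stamp <- as.POSIXct(secs, origin = "1970-01-01", tz = "GMT")
d <- data.frame(px = px, vol = vol, minute = format(stamp, "%H:%M"))

# Volume traded at each price over the whole sample
by_px <- aggregate(vol ~ px, data = d, FUN = sum)

# Question 2: one table per time split, returned as a named list
per_minute <- lapply(split(d, d$minute),
                     function(part) aggregate(vol ~ px, data = part, FUN = sum))

# Question 3: the same aggregation by column position instead of by name
sum_by_first_col <- function(dat) {
  out <- aggregate(dat[[2]], by = list(dat[[1]]), FUN = sum)
  names(out) <- names(dat)[1:2]
  out
}
by_position <- sum_by_first_col(d)

# Question 4: a different function per column (trade count and volume sum)
counts <- aggregate(vol ~ px, data = d, FUN = length)
names(counts)[2] <- "count"
summary_tbl <- merge(counts, by_px, by = "px")

print(per_minute[["09:31"]])
print(summary_tbl)
```

Swapping the "%H:%M" format string for a day, week or month format would give the other splits mentioned in question 2. For very large inputs a keyed data.table or similar would likely be faster, but that is outside this sketch.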
Re: [R] templated use of aggregate
Sorry about the cross posting - i didn't realise it was bad etiquette.

my sessioninfo was as follows:

    sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
    locale:
    [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
    attached base packages:
    [1] stats graphics grDevices utils datasets methods
    [7] base
    other attached packages:
    [1] xts_0.8-2 zoo_1.7-6
    loaded via a namespace (and not attached):
    [1] grid_2.14.1 lattice_0.20-0

i have now updated to R 2.15, and my session info is:

    sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
    locale:
    [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
    attached base packages:
    [1] stats graphics grDevices utils datasets methods
    [7] base
    other attached packages:
    [1] xts_0.8-6 zoo_1.7-7
    loaded via a namespace (and not attached):
    [1] grid_2.15.0 lattice_0.20-6 tools_2.15.0

For the XTS object mn your suggestion still fails with an error:

    adf <- aggregate(mn[,-1]~mn[,1], data=mn, sum); adf
    Error in aggregate.formula(mn[, -1] ~ mn[, 1], data = mn, sum) : 'names' attribute [3] must be the same length as the vector [1]

however when i convert to a zoo with

    mnz <- as.zoo(mn)

I get some errors, but it works:

    adf <- aggregate(mnz[,-1]~mnz[,1], data=mnz, sum); adf
    Warning messages:
    1: In zoo(rval, index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
    2: In zoo(rval, index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
    3: In zoo(rval[i], index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
    4: In zoo(rval[i], index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
    5: In zoo(xc[ind], ix[ind]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
    6: In zoo(xc[ind], ix[ind]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
      mnz[, 1] mnz[, -1]
    1    97.90       408
    2    97.89       208

So is this a bug in XTS?

thanks for your patience
mj

On 13 June 2012 15:53, David Winsemius dwinsem...@comcast.net wrote:
> On Jun 13, 2012, at 9:38 AM, Matthew Johnson wrote:
>> Sorry, i'll try and put more flesh on the bones. please note, i changed the data in the example, as fiddling has raised another question that's best illustrated with a slightly different data set. first of all, when i do as you suggest, i obtain the following error:
>>     PxMat <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum)
>>     Error in aggregate.formula(mm[, -1] ~ mm[, 1], data = mm, sum) : 'names' attribute [3] must be the same length as the vector [1]
> Very strange. When I just did it with the structure you (cross-) posted on SO I got:
>     adf <- aggregate(mm[,-1]~mm[,1], data=mm, sum); adf
>     (snipped warning messages)
>       mm[, 1] mm[, -1]
>     1   97.91      538
>     2   97.92      918
> I had earlier tested it with a zoo object I had constructed and did it again with the structure below.
>       mm[, 1] mm[, -1]
>     1   97.91      538
>     2   97.92      918
> I'm using zoo_1.7-6 and R version 2.14.2 on a Mac. I do not remember you posting the requested information about your versions.
> -- David.
>> my data.frame is an xts, and it looks like this: [...]
Re: [R] templated use of aggregate
thank you for your patience. i assure you i will get better with the appropriate etiquette - and hopefully eventually contribute.

On 13 June 2012 16:18, David Winsemius dwinsem...@comcast.net wrote:
> On Jun 13, 2012, at 10:09 AM, Matthew Johnson wrote:
>> my sessioninfo was as follows: [...] For the XTS object mn your suggestion still fails with an error:
>>     adf <- aggregate(mn[,-1]~mn[,1], data=mn, sum); adf
>>     Error in aggregate.formula(mn[, -1] ~ mn[, 1], data = mn, sum) : 'names' attribute [3] must be the same length as the vector [1]
>> however when i convert to a zoo with mnz <- as.zoo(mn) I get some errors, but it works
> Those are warnings, ... not errors.
>>     adf <- aggregate(mnz[,-1]~mnz[,1], data=mnz, sum); adf
>>     Warning messages: [...]
>>       mnz[, 1] mnz[, -1]
>>     1    97.90       408
>>     2    97.89       208
>> So is this a bug in XTS?
> It does look that way to me. The correct way to report this is to contact the package maintainer (copied on this message), although I did notice that Joshua Ulrich already looked at this posting in SO and he is on the xts development team. You should have put this at the beginning of your code:
>     library(xts)
> -- David.
>> thanks for your patience mj [...]
[R] what does .indexDate() do - R::xts
Dear R experts,

I am learning the very useful XTS package, but cannot figure out the purpose of some commands. in particular, the .indexDate() command does not work as expected.

say:

x <- timeBasedSeq('2010-01-01/2010-01-02 12:00')
x <- xts(1:length(x), x)

then i can subset on date as follows:

x['2010-01-01']

however the .indexDate() command does not work as expected; in particular the following does not return anything.

x[.indexDate(x) == '2010-01-01']

I am sure i am missing something - what is .indexDate() supposed to do?

thanks and best regards

matt johnson
Re: [R] what does .indexDate() do - R::xts
thanks. i think i understand: the difference is that the first command converts my 'searched-for' date to a number and matches it, but the second does not?

On 13 June 2012 12:58, Joshua Ulrich josh.m.ulr...@gmail.com wrote:

On Tue, Jun 12, 2012 at 9:48 PM, Matthew Johnson mcoog...@gmail.com wrote:

Dear R experts,

I am learning the very useful XTS package, but cannot figure out the purpose of some commands. in particular, the .indexDate() command does not work as expected.

say:

x <- timeBasedSeq('2010-01-01/2010-01-02 12:00')
x <- xts(1:length(x), x)

then i can subset on date as follows:

x['2010-01-01']

however the .indexDate() command does not work as expected; in particular the following does not return anything.

x[.indexDate(x) == '2010-01-01']

That's because all comparisons are FALSE. .indexDate() returns the index of x, converted to the numeric representation of the Date class (i.e. as.Date(.indexDate(x), origin="1970-01-01") will be the Date of the index values). '2010-01-01' is a character string.

I am sure i am missing something - what is .indexDate() supposed to do?

Though it's not well documented, what it's doing is pretty clear from the source:

R> .indexDate
function (x)
{
    .index(x)%/%86400L
}
<environment: namespace:xts>

thanks and best regards

matt johnson

Best,
--
Joshua Ulrich | FOSS Trading: www.fosstrading.com
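
For readers of the archive, here is a minimal sketch of the comparison described above: since .indexDate() returns days since 1970-01-01 as a number, the searched-for date has to be converted to the same numeric form before using ==. The names target and same_day are made up for illustration, and setting TZ to UTC is an assumption added here so that the local calendar day and the UTC day counted by .indexDate() coincide; neither is part of the original posts.

```r
library(xts)

# Assumption (not from the thread): pin the session time zone to UTC so the
# local calendar day matches the UTC-based day count from .indexDate().
Sys.setenv(TZ = "UTC")

x <- timeBasedSeq('2010-01-01/2010-01-02 12:00')
x <- xts(seq_along(x), x)

# Comparing a number with a character string is never TRUE here,
# which is why the original subset came back empty.
any(.indexDate(x) == '2010-01-01')

# Convert the date to its numeric representation (days since 1970-01-01)
# and compare like with like.
target <- as.numeric(as.Date("2010-01-01"))   # hypothetical helper name
same_day <- x[.indexDate(x) == target]

# Should select the same rows as the string-based subset x['2010-01-01']
# when the session time zone is UTC.
head(same_day)
```

In a non-UTC session the two subsets may differ near midnight, because .index(x) counts seconds in UTC while x['2010-01-01'] is interpreted in the time zone of the index.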