[R] A better scales::dollar() ?
I'm writing a fairly large document in Rmarkdown which contains financial data. I currently format that data using scales::dollar(), something like this:

> require(scales)
> x = 100000
> cat(dollar(x, prefix = "£", big.mark = ","))
£100,000

But actually I'd quite like to get £100k out in that instance, so I'd do:

> cat(dollar(x/10^3, prefix = "£", suffix = "k"))
£100k

But x could be 100 or 10,000,000. I want some form of 'automatic' version that might give me something like:

> require(scales)
> y = 0:7
> x = 10^y
> dollar(x, prefix = "£")
[1] "£1"          "£10"         "£100"        "£1,000"      "£10,000"     "£100,000"    "£1,000,000"  "£10,000,000"

But what I want is more like:

£1.00 £10.00 £100 £1k £10k £100k £1m £10m

I'm sure I can write something as a little function, but before I do - is there something already out there? I have a similar need to format milligrams through to kilograms.

This message may contain confidential information. If you are not the intended recipient please inform the sender that you have received the message in error before deleting it. Please do not disclose, copy or distribute information in this e-mail or take any action in relation to its contents. To do so is strictly prohibited and may be unlawful. Thank you for your co-operation. NHSmail is the secure email and directory service available for all NHS staff in England and Scotland. NHSmail is approved for exchanging patient data and other sensitive information with NHSmail and other accredited email services. For more information and to find out how you can switch, https://portal.nhs.net/help/joiningnhsmail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
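For what it's worth, the 'automatic' behaviour asked for above can be hand-rolled in a few lines of base R. Everything below (the name dollar_auto, the break points, the pence-below-£100 rule) is my own illustration, not an existing scales function - though newer releases of scales do ship label_number() with a scale_cut argument, which is worth checking first.

```r
# Hand-rolled auto-scaling currency formatter (illustrative; not part of scales).
dollar_auto <- function(x, prefix = "£") {
  units    <- c(1, 1e3, 1e6, 1e9)   # thresholds for "", "k", "m", "bn"
  suffixes <- c("", "k", "m", "bn")
  fmt_one <- function(v) {
    i <- max(which(v >= units | units == 1))   # largest unit not above v
    digits <- if (v < 100) 2 else 0            # show pence only below 100
    paste0(prefix,
           formatC(v / units[i], format = "f", digits = digits, big.mark = ","),
           suffixes[i])
  }
  vapply(x, fmt_one, character(1))
}

dollar_auto(10^(0:7))
# "£1.00" "£10.00" "£100" "£1k" "£10k" "£100k" "£1m" "£10m"
```

The same pattern (a unit table plus a suffix table) adapts directly to the milligrams-to-kilograms case by swapping the units and suffixes vectors.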
Re: [R] I can't get seq to behave how I think it should
Well I get the issue with finite precision - as in sqrt(2) * sqrt(2) is not exactly 2. What surprised me was that seq(1.4, 2.1, by=0.001) starts at 1.3999... and not 1.4!

-----Original Message-----
From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
Sent: 17 January 2019 14:30
To: POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST); Ben Tupper
Cc: r-help@r-project.org
Subject: RE: [R] I can't get seq to behave how I think it should

Hi

It is not a seq problem, it is a floating-point numbers represented in finite precision problem. Ben pointed to it, and you can learn about it from FAQ 7.31.

Cheers
Petr

> -----Original Message-----
> From: POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
> Sent: Thursday, January 17, 2019 2:56 PM
>
> Thanks guys.
>
> I've used Petr's method and it's working for me.
>
> If the data had been from a calculation I'd have rounded it... just
> didn't expect seq to break it!
>
> C
>
> > -----Original Message-----
> > From: PIKAL Petr
> > Sent: 17 January 2019 13:53
> >
> > Hi
> >
> > Or you could use rounding.
> >
> > which(round(lut, 3) == 1.8)
> > [1] 401
> >
> > Cheers
> > Petr
Re: [R] I can't get seq to behave how I think it should
Thanks guys.

I've used Petr's method and it's working for me.

If the data had been from a calculation I'd have rounded it... just didn't expect seq to break it!

C

-----Original Message-----
From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
Sent: 17 January 2019 13:53
To: Ben Tupper; POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
Cc: r-help@r-project.org
Subject: RE: [R] I can't get seq to behave how I think it should

Hi

Or you could use rounding.

which(round(lut, 3) == 1.8)
[1] 401

Cheers
Petr

> -----Original Message-----
> From: R-help On Behalf Of Ben Tupper
> Sent: Thursday, January 17, 2019 2:43 PM
>
> Hi,
>
> This looks like a floating point reality bump - see
>
> https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
>
> You can use other methods to find your row - I would opt for
> findInterval():
>
> lut = seq(1.4, 2.1, by=0.001)
> findInterval(1.8, lut)
> [1] 401
>
> findInterval() uses a rapid search to find the index in the look-up
> table (lut) that is just less than or equal to the search value (in your
> example 1.8).
>
> Cheers,
> Ben
[R] I can't get seq to behave how I think it should
I am using seq with the expression seq(1.4, 2.1, by=0.001) to create a sequence of references from 1.4 to 2.1 in 0.001 increments. They appear to be created correctly. They have a related pair of data which for the purposes of this we will call val. I'm interested in the content of the row with seq = 1.8, but I can't seem to get it returned. I can get other values but not 1.8! Yet looking at row 401 there is nothing to indicate an issue:

> a = 1.4
> b = 2.1
> seq = seq(a, b, by=0.001)
> val = ceiling(seq * 50)
> s = data.frame(seq, val)
> s$val[seq==1.799]
[1] 90
> s$val[s$seq==1.8]
numeric(0)
> s$val[seq==1.8]
numeric(0)
> s$val[s$seq==1.800]
numeric(0)
> s$val[s$seq==1.801]
[1] 91
> head(s[s$seq>1.798,])
      seq val
400 1.799  90
401 1.800  90
402 1.801  91
403 1.802  91
404 1.803  91
405 1.804  91

Can anyone explain what's going on here and how I would correctly find the content of row 401 by using an expression to equal the seq column?

This message may contain confidential information. If yo...{{dropped:19}}
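Pulling the replies together, the failure and the usual workarounds can be reproduced like this (a sketch; the integer-grid variant at the end is my own addition, not from the thread):

```r
lut <- seq(1.4, 2.1, by = 0.001)

# The 401st element prints as 1.8 but is not bit-identical to the literal
# 1.8, because each step carries binary rounding error (see R FAQ 7.31):
lut[401] == 1.8                      # FALSE

# Workaround 1 (Petr): round to the grid's precision before comparing
which(round(lut, 3) == 1.8)          # 401

# Workaround 2 (Ben): findInterval() finds the bin containing the value
findInterval(1.8, lut)               # 401

# Workaround 3: build the grid from integers so each value is rounded once
lut2 <- seq(1400L, 2100L) / 1000
which(lut2 == 1.8)                   # 401
```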
[R] odfWeave - A loop of the "same" data
Before I go and do this another way - can I check if anyone has a way of looping through data in odfWeave (or possibly Sweave) to do a repeating analysis on subsets of data?

For simplicity, let's use the mtcars dataset in R to explain. The dataset looks like this:

> mtcars
                 mpg cyl disp  hp drat   wt ...
Mazda RX4       21.0   6  160 110 3.90 2.62 ...
Mazda RX4 Wag   21.0   6  160 110 3.90 2.88 ...
Datsun 710      22.8   4  108  93 3.85 2.32 ...

Say I wanted to have a 'catalogue'-style report from mtcars, where on each page I would perhaps have the row name as a heading and then plot a graph of mpg highlighting that specific car. Then add a page break and *do the same for the next car*. I can manually do this of course, but it is effectively a loop, something like this:

for (n in 1:length(mtcars$mpg)) {
  barplot(mtcars$mpg, col = c(rep(1, n-1), 2, rep(1, length(mtcars$mpg) - n)))
}

There is an odfWeave page-break function so I can do that sort of thing (I think). But I don't think I can output more than one image, can I? In reality I will want several images and a table per "catalogue" page.

At the moment I think I need to create a master odt document, create individual catalogue pages, and merge them into one document - but that feels clunky (unless I can script the merge!). Anyone got a better way?

This message may contain confidential information. If yo...{{dropped:21}}
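One way to sidestep the one-image limit is to generate the per-car images outside the weave and pull them into the master document afterwards. This sketch covers only the plotting loop - the numbered file names and the idea of one file per catalogue page are my own scaffolding, not odfWeave features:

```r
# Write one highlighted mpg barplot per car to a numbered file; a master
# document (odt, Rnw, Rmd) can then include catalogue_01.pdf, catalogue_02.pdf, ...
data(mtcars)
n <- nrow(mtcars)
for (i in seq_len(n)) {
  pdf(sprintf("catalogue_%02d.pdf", i), width = 8, height = 5)
  cols    <- rep("grey70", n)
  cols[i] <- "red"                              # highlight the current car
  barplot(mtcars$mpg, names.arg = rownames(mtcars), col = cols,
          las = 2, cex.names = 0.6,
          main = rownames(mtcars)[i], ylab = "mpg")
  dev.off()
}
```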
Re: [R] ?currency challenge ?knapsack challenge ?probabilities
Hi Jim

Working on the basis of an exact match - but the 25% increments are rounded to integers, so it's like buying from a shop priced in whole numbers, where the change is what you expect, not 'roughly right'.

Thanks

Calum

On 27 Dec 2015, at 22:04, Jim Lemon <drjimle...@gmail.com> wrote:

Hi Calum,

Does this include an error tolerance for the match between the ordered and delivered quantities? That is, is it okay to have a maximum of one unit difference or do deliveries have to exactly match orders?

Jim

On Sun, Dec 27, 2015 at 10:08 PM, Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) <calum.polw...@nhs.net> wrote:
> [...]
[R] ?currency challenge ?knapsack challenge ?probabilities
I am currently working on a project that is producing gigabyte-sized vectors that kill R. It can take 24 hours to run. So frustrating to get that far and have to scale back my inputs and rerun...

The problem I'm trying to solve, as I see it, is very similar to an optimal currency denominations problem combined with a knapsack problem. Let me TRY to explain.

We have a product that we want to manufacture in as few sizes as possible. Like currency: if you want 30 cents but a 30-cent coin doesn't exist, you can join multiple products together (3 x 10 cent, 25 cent + 5 cent, etc).

Unlike currency, we don't need every value to be possible; we have a list of known values which are effectively related to each other by the next size up being 25% bigger - so for instance 64, 80, 100.

We have some rules that say you can't use more than X products combined to make the final size. A bit like saying never give more than 10 coins as change, so you can't issue 20 x 5 cents for a dollar of change.

All of that fits a standard currency denomination challenge. We don't need the combinations to be calculated using the greedy method. [We will calculate and store as a table]

BUT - we do have a manufacturing limitation: although we can manufacture to any whole-number size, we can't do smaller than size 5. (We don't go as low as that anyway... size 11 is as low as needed.) So this is different from any currency problem I've seen, where the lowest coin size is always a 1, allowing any size to be produced.

So I have three questions I'm trying to answer:

- What is the smallest product range we can make that achieves our rules for max combinations of sizes?
- Is there a more optimal range? Say the smallest range was 4 sizes, for example 5, 6, 23, 40. It's possible adding a 22 and a 46 to that may actually be cheaper than supplying 2x5 and 2x6 or 2x23...

Currently I'm identifying every possible combination into a matrix. We have a manufacturing constraint of max size 49 as well. So I take every end-user size possible (from 11 through to 125). For each size I then take every combination of possible sizes from 5 to 49 (45 sizes) that we COULD make and work out how I can achieve all the possible end-user sizes, discarding any combinations that break our rules for max combinations. That's a giant set of for loops. Once I establish the options, we can apply the manufacturing costs and usage data to find the answer.

For now, 45 sizes, combined in any of up to 5 different combinations to do 10 end-user sizes, is creating vectors too big for R to handle...

Long explanation of the problem, to basically say... has anyone come across a function in R that might simplify this?

Sent from TypeMail<http://www.typeapp.com/r>
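This is not the poster's algorithm, but the memory blow-up can often be avoided by growing the set of reachable sums one unit at a time instead of enumerating every combination up front. A sketch (the function name and example numbers are illustrative):

```r
# Can every target size be assembled from at most `k` units drawn, with
# repetition, from a candidate set of manufacturable sizes? Rather than
# materialising all combinations of 45 sizes, grow the set of reachable
# sums one unit at a time and deduplicate as we go.
covers <- function(sizes, targets, k = 5) {
  reachable <- 0                          # the sum achievable with zero units
  for (i in seq_len(k)) {
    reachable <- unique(c(reachable, as.vector(outer(reachable, sizes, `+`))))
  }
  all(targets %in% reachable)
}

covers(c(5, 6, 23, 40), targets = c(11, 12, 28, 45, 46), k = 2)  # TRUE
covers(c(5, 6, 23, 40), targets = 13, k = 5)                     # FALSE: 13 is unreachable
```

Candidate size sets can then be screened with covers() before any costing is attempted, which prunes the search well before any gigabyte-sized vector is needed.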
[R] Not sure this is something R could do but it feels like it should be.
Some colleagues nationally have developed a system which means they can pick the optimal sets of doses for a drug. The system could apply to a number of drugs, but the actual doses might vary.

To try and explain this in terms that the average Joe on the street might understand: if you have some amoxicillin antibiotic for a chest infection, the normal dose for an adult is 250 to 500mg, increased to maybe 1000mg in severe cases. For a child it is dosed from a liquid and people usually go from 62.5mg, 125mg to 250mg, although you could measure any volume you wanted.

What this new method has developed is a means to pick the "right" standard doses - so the above is 62.5, 125, 250, 500, 1000. However, the method they've used is really engineered around ensuring the jump between doses is correct - you'll notice that the list above is a doubling-up method. But you can also have a doubling-up method that goes 50, 100, 200, 400, 800, 1600, and pretty much as many as you can think of depending on your starting point, and there is no scientific means to pick that starting point.

So colleagues have developed their rather more complex equivalent of the doubling method to determine the doses they need, but they need to know if they should start at 40, 50, 62.5 or some other number. Once they have the starting number they can calculate all the other doses. I realise R can do that, and I realise using a loop of possible starting numbers it can build all those options.

Each patient then has a theoretical dose they should get - let's say that's 10mg/kg - and you might treat patients from 5 to 120kg. They are then looking to calculate the variance for each dose range. So if we take the 50, 100, 200, 400 model, you'd give:

50mg to anyone needing 38 to 75mg
100mg to anyone needing 76 - 150mg
etc...

From there they are taking that range and saying that's a 31% overdose to a 33% underdose. Then they want to find if there is a starting number which minimises the extent of under- and overdosing...

Anyone know of an existing stats function in R that can easily do that, and almost then report, from some inputs, a single number that is the "best fit"?

Calum

This message may contain confidential information. If yo...{{dropped:22}}
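As a sketch of the shape such a calculation could take (the scoring rule, names and ranges below are my own illustration, not the colleagues' method): build the ladder from a candidate starting dose, give each theoretical dose its nearest rung, score the worst-case relative error, and scan candidate starting points.

```r
# Score a starting dose: build a doubling ladder from it, assign every
# theoretical dose (10 mg/kg for 5-120 kg patients) to its nearest rung,
# and return the worst-case relative dosing error.
dose_error <- function(start, ratio = 2, n_doses = 6) {
  ladder <- start * ratio^(0:(n_doses - 1))
  needed <- 10 * seq(5, 120, by = 0.5)            # theoretical doses, mg
  given  <- vapply(needed,
                   function(d) ladder[which.min(abs(ladder - d))],
                   numeric(1))
  max(abs(given - needed) / needed)
}

starts <- seq(30, 70, by = 0.5)                   # candidate starting doses
errs   <- vapply(starts, dose_error, numeric(1))
starts[which.min(errs)]                           # the "best fit" starting dose
```

With a smooth error measure, stats::optimize() could replace the grid scan; with the worst-case rule above a grid is safer, since the objective is only piecewise smooth.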
Re: [R] Comparing two different 'survival' events for the same subject using survdiff?
> It isn't that complex:
>
> myDataLong <- data.frame(Time = c(A, C), Censored = c(B, D),
>                          group = rep(0:1, times = c(length(A), length(C))))
> Fit = survfit(Surv(Time, Censored==0) ~ group, data = myDataLong)
> plot(Fit, col = 1:2)
> survdiff(Surv(Time, Censored==0) ~ group, data = myDataLong)

Yes - for the example it's not complex - but once we get down to having more data columns I think it may be... Maybe I ignore those and just build 'myDataLong' for this specific test.

> However, your approach (a 'wide' data frame) suggests that there are equal
> numbers in the two survival studies. Are they even the same people? Is it
> even the same study? If so, this is a competing risks question and would
> have to be approached differently.

Yes, it's the same patients. The two events are technically independent of each other, but the hope is that the easier outcome measure would predict the other... I'm not familiar with competing risks and so will have to read up on it, but it isn't a scenario where A or B happens; A happens and B happens, and you might expect A happened because B happened...

> And, of course, absence of evidence is not evidence of absence. Failing to
> reject the null hypothesis that the distributions are the same is not proof
> that the distributions are equal.

Yes, absolutely - however I'm half expecting to detect a difference, and so then dismiss using A as a surrogate of B...

Thanks

-----Original Message-----
From: Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) [mailto:calum.polw...@nhs.net]
Sent: Monday, April 29, 2013 4:48 AM
To: r-help@r-project.org
Subject: [R] Comparing two different 'survival' events for the same subject using survdiff?

[...]
[R] Comparing two different 'survival' events for the same subject using survdiff?
I have a dataset which for the sake of simplicity has two endpoints. We would like to test if two different end-points have the same eventual meaning.

To try and take an example that people might understand better: let's assume we had a group of subjects who all received a treatment. They could stop treatment for any reason (side effects, treatment stops working, etc). Getting that data is very easy. Measuring if treatment stops working is very hard to capture... so we would like to test if duration on treatment (easy) is the same as time to treatment failure (hard).

My data might look like this:

A = c(9.77, 0.43, 0.03, 3.50, 7.07, 6.57, 8.57, 2.30, 6.17, 3.27, 2.57, 0.77)
B = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)   # 1 = yes (censored)
C = c(9.80, 0.43, 5.93, 8.43, 6.80, 2.60, 8.93, 8.37, 12.23, 5.83, 13.17, 0.77)
D = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)   # 1 = yes (censored)
myData = data.frame(TimeOnTx = A, StillOnTx = B, TimeToFailure = C, NotFailed = D)

We can do a survival analysis on those individually:

OnTxFit = survfit(Surv(TimeOnTx, StillOnTx==0) ~ 1, data = myData)
FailedFit = survfit(Surv(TimeToFailure, NotFailed==0) ~ 1, data = myData)
plot(OnTxFit)
lines(FailedFit)

But how can I do a survdiff type of comparison between the two? Do I have to restructure the data so that times are all in one column, the event in another, and then a Group to indicate what type of event it is? Seems a complex way to do it (especially as the dataset is of course more complex than I've just shown)... so I thought maybe I'm missing something...

This message may contain confidential information. If yo...{{dropped:19}}
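For completeness, the restructuring the question contemplates - times stacked in one column, an event flag, and a group label - is exactly what survdiff() wants, and can be sketched with the data above (assuming the survival package; note that because both rows come from the same patients, the log-rank test's independence assumption is strained, as the replies discuss):

```r
library(survival)

A <- c(9.77, 0.43, 0.03, 3.50, 7.07, 6.57, 8.57, 2.30, 6.17, 3.27, 2.57, 0.77)
B <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)   # 1 = censored
C <- c(9.80, 0.43, 5.93, 8.43, 6.80, 2.60, 8.93, 8.37, 12.23, 5.83, 13.17, 0.77)
D <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)   # 1 = censored

# Stack the two endpoints into 'long' form with a group label
myDataLong <- data.frame(
  Time     = c(A, C),
  Censored = c(B, D),
  group    = rep(c("OnTx", "Failure"), each = length(A))
)

fit <- survfit(Surv(Time, Censored == 0) ~ group, data = myDataLong)
plot(fit, col = 1:2)
survdiff(Surv(Time, Censored == 0) ~ group, data = myDataLong)
```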
[R] matchit - can I weight the parameters?
This may be a really obvious question, but I just can't figure out how to do it.

I have a small dataset that I am trying to compare to some controls. It is essential that the controls are matched on Cancer Stage (a factor between 1 and 4), and then ideally on Age (integer), Gender (factor) and Performance Status (factor).

I'm using matchit to try and do this, but it seems to give equal priority to all my variables, so I can be relatively well matched on Age, Sex and PS but not exactly matched on Stage. Stage is the biggest influence on outcome... so I must match it as close to perfect as possible, even if that means dropping some data from the 'treatment' group.

Here's some code:

match = matchit(Group ~ Stage + Age + Gender + PS, myData, method="optimal")
matchedData = match.data(match)
by(matchedData$Stage, matchedData$Group, table)

matchedData$Group: 0
  1 3A 3B  4
  1  6  9 10
matchedData$Group: 1
  1 3A 3B  4
  1  3  9 13

Can anyone point me to a method that tells R to prioritise Stage over the others?

Thanks in advance

This message may contain confidential information. If yo...{{dropped:19}}
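MatchIt does have a mechanism for this: the exact argument names variables that must match exactly, while the remaining covariates are matched by distance. A sketch on simulated stand-in data (the original myData isn't shown, so the data below are invented; method "nearest" is used here to avoid the extra optmatch dependency that "optimal" needs):

```r
library(MatchIt)

# Simulated stand-in for the poster's data
set.seed(1)
n <- 200
myData <- data.frame(
  Group  = rep(0:1, c(160, 40)),
  Stage  = factor(sample(c("1", "3A", "3B", "4"), n, replace = TRUE)),
  Age    = round(rnorm(n, 65, 8)),
  Gender = factor(sample(c("M", "F"), n, replace = TRUE)),
  PS     = factor(sample(0:2, n, replace = TRUE))
)

# Exact on Stage, distance-matched on the rest
m  <- matchit(Group ~ Age + Gender + PS, data = myData,
              method = "nearest", exact = ~ Stage)
md <- match.data(m)

# Within every matched pair, Stage now agrees exactly
all(tapply(as.character(md$Stage), md$subclass,
           function(s) length(unique(s)) == 1))
```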
Re: [R] Kaplan Meier - not for dates
> 2. The answer will be wrong. The reason is that the censoring occurs on a
> time scale, not a $ scale: you don't stop observing someone because total
> cost hits a threshold, but because calendar time does. The KM routines
> assume that the censoring process and the event process are on the same
> scale. The result can be an overestimation of cost. See Dan-Yu Lin,
> Biometrics 1997, "Estimating medical costs from incomplete follow-up data".

Having now skimmed the paper, this concerns long-term follow-up. In my particular case the patients are getting treatment for relatively short periods (median time to stopping treatment will be ~9 months) and will discontinue treatment relatively quickly (I'd be surprised if anyone is still on treatment 3-4 years out). I only want the costs of that treatment, not the costs for their overall care to death. I'm not sure how that affects things, but I'm hoping it makes life simpler.

Calum

This message may contain confidential information. If yo...{{dropped:21}}
Re: [R] Kaplan Meier - not for dates
> 2. The answer will be wrong. The reason is that the censoring occurs on a
> time scale, not a $ scale: you don't stop observing someone because
> total cost hits a threshold, but because calendar time does. The KM routines
> assume that the censoring process and the event process are on the same scale.
> The result can be an overestimation of cost. See Dan-Yu Lin, Biometrics
> 1997, "Estimating medical costs from incomplete follow-up data".
>
> Terry Therneau

Thanks, that's extremely useful. I'll dig out that reference. You are correct: my censoring is happening on an event - (dis)continuation of treatment - not on reaching a cumulative cost.

Calum
Re: [R] Kaplan Meier - not for dates
Thanks for the reply. The treatment is effectively for a chronic condition - so you stay on the treatment till it stops working. We know from trials how long that should be, and we know the theoretical cost of that treatment, but that's based on the textbook dose (patients dose-reduce and delay treatment, and it's based on weight, so it's variable). We've been asked to provide our national planning team with an "average" cost based on our early experiences, so we have suggested to them we might be able to get a median cost. Some patients will stay on treatment several years, so it will be impossible to get an average for years. So the censored patients will be those still on treatment (the event being stopping treatment). I'll give what you've suggested a go.

Thanks

Calum Polwart BSc(Hons) MSc MRPharmS SPres IPres
Network Pharmacist - NECN and Pharmacy Clinical Team Manager (Cancer & Aseptic Services) - CDDFT

Our website has now been unlocked and updated. Should you require contacts, meeting details, publications etc, please visit us on www.cancernorth.nhs.uk

From: Lancaster, Robert (Orbitz) [robert.lancas...@orbitz.com]
Sent: 03 November 2011 19:55
To: Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST); r-help@r-project.org
Subject: RE: Kaplan Meier - not for dates

I think it really depends on what your event of interest is. If your event is that the patient got better and "left treatment" then I think this could work. You would have to mark as censored any patient still in treatment or any patient that stopped treatment w/o getting better (e.g. in the case of death). You would then be predicting the cost required to make the patient well enough to leave treatment. It is a little non-standard to use $ instead of time, but time is money after all. You could set up your data frame with two columns: 1) cost, 2) event/censored.
Then create your survival object:

mySurv = Surv(my_data$cost, my_data$event)

And then use survfit to create your KM curves:

myFit = survfit(mySurv ~ 1)

If you have other explanatory variables that you think may influence the cost, you can of course add them to your data frame and change the formula you use in survfit. For instance, you could have some severity measure, e.g. High, Medium, Low. You could then do:

myFit = survfit(mySurv ~ my_data$severity)

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
Sent: Monday, October 31, 2011 1:29 PM
To: r-help@r-project.org
Subject: [R] Kaplan Meier - not for dates

I have some data which is censored and I want to determine the median. It's actually cost data for a cohort of patients, many of whom are still on treatment and so are censored. I can do the same sort of analysis for a survival curve and get the median survival... ...but can I just use the survival curve functions to plot an X axis that is $ rather than date? If not, is there some other way to achieve this?

Thanks

Calum
[R] Kaplan Meier - not for dates
I have some data which is censored and I want to determine the median. It's actually cost data for a cohort of patients, many of whom are still on treatment and so are censored. I can do the same sort of analysis for a survival curve and get the median survival... ...but can I just use the survival curve functions to plot an X axis that is $ rather than date? If not, is there some other way to achieve this?

Thanks

Calum
[R] Non-unique Values
I might be missing something really obvious, but is there an easy way to locate all non-unique values in a data frame? Example:

mydata <- list()
mydata$id <- 0:8
mydata$unique <- c(1:5, 1:4)
mydata$result <- c(1:3, 1:3, 1:3)

> mydata
$id
[1] 0 1 2 3 4 5 6 7 8
$unique
[1] 1 2 3 4 5 1 2 3 4
$result
[1] 1 2 3 1 2 3 1 2 3

What I want is to be able to get some form of data output that might look like this:

> nonunique(mydata$unique)
mydata$unique
1  $id 0, 5
2  $id 1, 6
3  $id 2, 7
4  $id 3, 8

So that I could report to my data entry team any non-unique values of unique and tell them the row numbers, so they can check if the 'unique' value is keyed wrongly or the entry has been made twice. Hoping there is an easy way. If not, I suspect we can do it in the SQL tables - just trying not to juggle two languages...

C
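One possible approach (my suggestion, not from the thread): `duplicated()` flags every repeat of a value, and scanning with `fromLast` in mind catches all copies, not just the later ones. The `nonunique` name below is just the poster's wished-for output, reimplemented as a list.

```r
# Data as in the post, built as a data frame
mydata <- data.frame(
  id     = 0:8,
  unique = c(1:5, 1:4),
  result = c(1:3, 1:3, 1:3)
)

# values of 'unique' that occur more than once
dups <- unique(mydata$unique[duplicated(mydata$unique)])

# for each duplicated value, the ids that share it
nonunique <- lapply(dups, function(v) mydata$id[mydata$unique == v])
names(nonunique) <- dups
nonunique   # e.g. element "1" holds ids 0 and 5
```

`which(duplicated(x) | duplicated(x, fromLast = TRUE))` gives the offending row numbers directly if that is all the data entry team needs.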
Re: [R] Exporting PDF
> I run the script and it exports a PDF called "version 1".
> I want it to check if "version 1" already exists. If so,
> then I want the new graphs to be exported as
> "version 2", and so on.
>
> Is it possible to do it in R?

Someone may know a better way. However, it's certainly possible to execute a system command in Linux from R. So you can ls the file (in Windows try dir) and see if it exists, and build a loop round that. Just beware: it'll be quick if there is only one file; if you have many files it may slow things down a bit! Here's my example...

# Example script to check if a file exists (linux system)
filename = "somefile"
fileno = 0
extension = ".pdf"
path = "/home/user/"
repeat {
  fullfile = paste(path, filename, fileno, extension, sep="")
  if (length(system(paste("ls", fullfile), intern = TRUE)) == 0) break
  fileno <- fileno + 1
}
# your command to save the file... using 'fullfile' as the filename and path
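A portable alternative to shelling out (my suggestion, not from the thread): `file.exists()` does the same check without `ls`/`dir`, so one script works on Linux and Windows. The path and file names are illustrative.

```r
# Find the first unused "versionN.pdf" name using file.exists()
path      <- "/home/user/"     # illustrative location
filename  <- "version"
extension <- ".pdf"

fileno <- 1
repeat {
  fullfile <- paste0(path, filename, fileno, extension)
  if (!file.exists(fullfile)) break   # first version number not yet on disk
  fileno <- fileno + 1
}
# pdf(fullfile); ...plots...; dev.off()
```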
[R] odfWeave - merged table cells, and adding information like totals and p-values
I'm hoping I'm missing some (probably fundamentally basic) process which might make my life easier! Let's assume I have a 3-column table summarizing results from a trial with three arms (Arm A, B and C). For each arm there will be a number of pieces of information to report. The simplest example might be to compare this to the demographic comparisons often seen in clinical trials, where you are setting out to prove that your randomization produced similar populations. So I might have a table like this:

--------------------------
           A    B    C
--------------------------
Male      50   50   50
Female    49   51   50
Age < 65  30   29   31
Age 65+   69   72   69
--------------------------

I've got a matrix with that data in it which I'm passing to odfWeave's table function. I just want to check a few basic things. Here's some short code which will create the matrix:

groups = c("A","B","C")
factors = c("Male","Female", "Age < 65", "Age 65+")
mydata = matrix (c(50,49,30,69,50,51,29,72,50,50,31,69), nrow=4, dimnames = list(factors,groups))

- Is there any way to add a merged cell above A/B/C which would say "Group"?

- If I want a totals column I can do that using:

total = as.numeric()
for (fact in 1:length(factors)) {
  total[fact] = sum(mydata[fact,])
}
mydata = cbind(mydata, total)

Is there an easier way?

- Now let's say I want to do a chi-squared test between the age differences in Gp A and Gp B. I run chisq.test(mydata[3:4,1:2]). What I really want is the p-value, and I'll want to repeat that for Gp A vs Gp C. If I was just using R I'd simply print those and then add them to my table by hand. But I'm trying to be smart and use odfWeave. Now I know I can put them in my caption, but I'd probably have added them as an extra row in my table, or added them in brackets similar to the SDs/ORs and CIs shown in this example http://www.bmj.com/cgi/content-nw/full/340/feb05_1/c199/TBL2, depending which was more appropriate.

- Is there an easy way to do anything like this? I'm thinking that we often put crude numbers in and (%) in brackets, or CIs etc - so my exported table would not ideally be pure numbers.
- As a p-value usually links two columns, I might have expected to use a merged cell - which again brings me back to my original question ;-)

Thanks

Calum Polwart BSc(Hons) MSc MRPharmS SP IP
Network Pharmacist - North of England Cancer Network and Pharmacy Clinical Team Manager (Cancer & Aseptic Services) - County Durham & Darlington NHS Foundation Trust
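On the two side questions above, a shorter sketch (my suggestion, not from the thread): `rowSums()` replaces the explicit loop over factors, and the p-value can be pulled straight out of the object `chisq.test()` returns. Data as defined in the post.

```r
groups  <- c("A", "B", "C")
factors <- c("Male", "Female", "Age < 65", "Age 65+")
mydata  <- matrix(c(50, 49, 30, 69, 50, 51, 29, 72, 50, 50, 31, 69),
                  nrow = 4, dimnames = list(factors, groups))

# totals column without the loop
mydata <- cbind(mydata, total = rowSums(mydata))

# p-value for the age difference, Gp A vs Gp B
pAB <- chisq.test(mydata[3:4, 1:2])$p.value
```

`addmargins()` from base R is another way to append row/column totals to a table in one call.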
Re: [R] Contingency Table - output to odfWeave not working for matrix
Solved my own problem by using:

odfTable.matrix( as.matrix ( with (mydata, table (site_id, reaction)) ) )
[R] Contingency Table - output to odfWeave not working for matrix
Hi guys, I'm hoping someone can help me with this. It should be easy but it seems to get stuck for no obvious reason! I am trying to set a report up in odfWeave so that we can re-run the same analysis at 3 time points as the data matures and provide an 'instant' report. To simplify the situation we have two pieces of data: site_id (numerical value 1-9) and reaction (categorical Y or N).

mydata = list()
mydata$site_id = rep(1:9, 15)
mydata$reaction = rep(c("Y","N","N","Y","N"), 27)

Until now I've simply used the command:

with (mydata, table (site_id, reaction))

to give me a 'table' that looks like this:

       reaction
site_id  N  Y
      1  9  6
      2  9  6
      3  9  6
      ...
      7  9  6
      8  9  6
      9  9  6

I can pass that as plain text to odf, but I want to make it a properly formatted table. Executing odfTable(with (mydata, table (site_id, reaction))) results in:

Error in UseMethod("odfTable") : no applicable method for "odfTable"

which I assume is because the output of table is not a data frame, vector or matrix, which is what odfTable needs. So I got more creative - but nothing is working the way I want it :-(

odfTable( as.vector ( with (mydata, table (site_id, reaction)) ) )

produces a table with nine 9's and nine 6's in a vertical column...

odfTable( as.data.frame ( with (mydata, table (site_id, reaction)) ) )

doesn't error, but produces a table similar to this:

   site_id reaction Freq
1        1        N    9
2        2        N    9
3        3        N    9
4        4        N    9
...
15       6        Y    6
16       7        Y    6
17       8        Y    6
18       9        Y    6

Both of the above are how they appear at the command prompt, so it's not odfWeave doing anything wrong...

odfTable( as.matrix ( with (mydata, table (site_id, reaction)) ) )

At the command prompt this looks the same as the original output, so it looks like it might work! BUT odfWeave complains:

Error in UseMethod("odfTable") : no applicable method for "odfTable"

-- but it's a matrix, and odfWeave is supposed to be able to handle matrices? Anyone got any suggestions?
[R] ?OT: Probabilistic Simulation
Sorry, this may well be defined as off-topic. I apologize in advance. I am interested in performing what I think would be a probabilistic sensitivity simulation. I've done some crude ones before in Excel, but I'm wondering if R can help me do it more effectively?

I have a set of theoretical variables. For simplicity let's use (what I think) is an easier example: I have a peg and a hole, and I want to predict how often they would not fit together securely. The peg should be 1.8mm in diameter. But there are some variables which might affect whether it is actually 1.8mm; I'd expect it to be normally distributed around 1.8mm with a known standard deviation. There are also variables such as temperature which would influence whether the peg was the correct size (there would be a known relationship to temperature, and a mean temperature with a standard deviation would be known).

The hole should be 1.8mm diameter as well. Again there are variables that affect whether it is: drill size, drilling method, substrate being drilled, temperature at time of drilling, temperature at time of fitting the peg (would be same as above).

I'd be looking to model whether the peg would fit in the hole, AND if it fitted, how well it fitted. I'd then want to run a simulation of say 500 scenarios, randomly picking temperature, hole characteristics etc for each simulation. From there I'd get the number of times in 500 samples the peg wouldn't fit. I could then try adjusting a variable - say using a different drilling method - and see if that 'easier' but less reliable drilling method actually would affect the number of times the peg didn't fit the hole properly.

So from what I understand this is a Monte Carlo simulation of probability. And this is where I go off-topic! Am I right? I know it's off-topic - so what I'd like to know is: can someone point me to where I can find out? Then, if it is Monte Carlo, can someone point me to a good description of how to get R to model this?
Ta

Calum
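The description above is indeed Monte Carlo simulation: draw each uncertain input from its distribution many times and count how often the assembly fails. A minimal sketch of the peg-and-hole example (my illustration; every mean, SD and the thermal-expansion coefficient are made up):

```r
set.seed(42)                 # reproducible draws
n <- 500                     # number of simulated scenarios

peg  <- rnorm(n, mean = 1.80, sd = 0.01)   # peg diameter, mm
hole <- rnorm(n, mean = 1.80, sd = 0.02)   # hole diameter, mm
temp <- rnorm(n, mean = 20,   sd = 5)      # temperature, degrees C

# toy thermal expansion: peg grows slightly with temperature
peg_at_temp <- peg * (1 + 2e-4 * (temp - 20))

clearance <- hole - peg_at_temp
fails     <- clearance < 0 | clearance > 0.03  # too tight or too loose
mean(fails)                                    # estimated failure rate
```

Changing an input distribution (say a larger SD for a cruder drilling method) and re-running gives the comparison described in the post; `replicate()` around the whole thing estimates the uncertainty of the failure rate itself.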
Re: [R] Winbugs under R's error message
>LinZhongjun wrote:
>> I ran Winbugs under R. I could get the results, but I kept getting the error
>> messages:
>> Error in file(con, "wb") : cannot open the connection
>> In addition: Warning messages:
>> 1: In file.create(to[okay]) :
>>   cannot create file 'c:/Program Files/WinBUGS14//System/Rsrc/Registry_Rsave.odc', reason 'Permission denied'

Uwe wrote:
> You probably do not have write permissions for those WinBUGS files that
> R2WinBUGS tries to rewrite.

and wrote:
> Why do you post twice?

Uwe, that was a pretty obvious answer, and it filled my inbox with two rather useless emails at the same time - so I hardly think you have the right to then criticise someone for dual posting. I'd guess he didn't intentionally do it!

However, your answer failed to point out another possible cause of the problem - the path has a double forward slash in it: WinBUGS14//System. I don't use WinBUGS (or even Windows), but rather than looking at file permissions you should be trying to find how that path gets built. My gut feeling is there will be a config file somewhere with a path that is c:/Program Files/WinBUGS14 and another path that takes you from there to the files which is listed as /System/Rsrc/ - join the two together and you get a double slash. Probably take the leading slash off the second one - but I'm not sure where you'll find the file. Hope that helps - sorry I can't tell you where to look.
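As an aside on the doubled-slash theory above: when you build paths yourself in R, `file.path()` joins components with exactly one separator, which avoids this kind of artefact (illustrative sketch, not how R2WinBUGS actually builds its paths):

```r
# file.path() inserts a single "/" between components on every platform
p <- file.path("c:/Program Files/WinBUGS14",
               "System/Rsrc",
               "Registry_Rsave.odc")
p   # no doubled slash, unlike naive paste() of two absolute-ish pieces
```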
[R] Is there an equivalent of "echo"
Sorry, I'm having one of those moments where I can't find the answer but I bet it's obvious... I'm outputting my results to a file using sink(). Is there a command similar to PHP's echo command that would allow me to add some text to that file? I.e.:

dataFr = list()
dataFr$a = 1:10
dataFr$b = 2*1:10

sink ("filepath/filename.txt", split=T)
#Show number of entries in vector a
table (dataFr$a)
#show number of entries in vector b
table (dataFr$b)
#show relationship between a and b
table (dataFr$a , dataFr$b)
sink()

gives me a text file like this:

 1  2  3  4  5  6  7  8  9 10
 1  1  1  1  1  1  1  1  1  1

 2  4  6  8 10 12 14 16 18 20
 1  1  1  1  1  1  1  1  1  1

     2  4  6  8 10 12 14 16 18 20
  1  1  0  0  0  0  0  0  0  0  0
  2  0  1  0  0  0  0  0  0  0  0
  3  0  0  1  0  0  0  0  0  0  0
  4  0  0  0  1  0  0  0  0  0  0
  5  0  0  0  0  1  0  0  0  0  0
  6  0  0  0  0  0  1  0  0  0  0
  7  0  0  0  0  0  0  1  0  0  0
  8  0  0  0  0  0  0  0  1  0  0
  9  0  0  0  0  0  0  0  0  1  0
 10  0  0  0  0  0  0  0  0  0  1

What I'd like is to be able to add some headers in the text file, maybe like this:

sink ("filepath/filename.txt", split=T)
echo "Number of entries in vector a"
table (dataFr$a)
echo "number of entries in vector b"
table (dataFr$b)
echo "relationship between a and b"
table (dataFr$a , dataFr$b)
sink()

giving an output like:

Number of entries in vector a
 1  2  3  4  5  6  7  8  9 10
 1  1  1  1  1  1  1  1  1  1

number of entries in vector b
 2  4  6  8 10 12 14 16 18 20
 1  1  1  1  1  1  1  1  1  1

relationship between a and b
     2  4  6  8 10 12 14 16 18 20
  1  1  0  0  0  0  0  0  0  0  0
  2  0  1  0  0  0  0  0  0  0  0
  3  0  0  1  0  0  0  0  0  0  0
  4  0  0  0  1  0  0  0  0  0  0
  5  0  0  0  0  1  0  0  0  0  0
  6  0  0  0  0  0  1  0  0  0  0
  7  0  0  0  0  0  0  1  0  0  0
  8  0  0  0  0  0  0  0  1  0  0
  9  0  0  0  0  0  0  0  0  1  0
 10  0  0  0  0  0  0  0  0  0  1

Possible? Without 200 lines of code?
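The usual R answer (my suggestion, not from the thread) is `cat()` with an explicit `"\n"`; anything written at top level while a `sink()` is active goes to the file. Inside a script it is also safer to `print()` the tables explicitly, since auto-printing can be suppressed in some contexts. The file path here is illustrative (a tempfile):

```r
dataFr <- list()
dataFr$a <- 1:10
dataFr$b <- 2 * 1:10

out <- tempfile(fileext = ".txt")   # stand-in for "filepath/filename.txt"
sink(out, split = TRUE)
cat("Number of entries in vector a\n")
print(table(dataFr$a))
cat("\nNumber of entries in vector b\n")
print(table(dataFr$b))
cat("\nRelationship between a and b\n")
print(table(dataFr$a, dataFr$b))
sink()
```

`message()` is the near-equivalent for stderr, and `writeLines()` works too if you prefer one string per line.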
Re: [R] How to set default plotting colors by treatment?
>> col = c("blue","red")[mydfr$treatment]
>
> Yes, but I would like to use the function for lots of other dataframes
> as well, so embedding 'mydfr' in the function is not the ideal
> solution...

In that case I'd try something like:

myplot <- function(..., tmnt) {
  plot(..., pch=19, col=c("blue","red")[tmnt])
}
with(mydfr, myplot(Xmeas, Ymeas, tmnt=treatment))

That seems to work... basically you just need to pass treatment in the function call...
[R] Exporting Numerous Graphs
> I have got 27 graphs to export (not a lot... I know!). How can I fit all of
> them into a single file like PNG without adjusting the size of the graphs?
> What's in my mind is like pasting graphs into Word, in which I can just
> scroll down to view the graphs.

Pretty sure PNG can only cope with single-'page' images - the page can be as big as it wants, but then when it comes to things like printing it's gonna cause problems, as I doubt many graphics packages can split it over pages, so they'll either crop it or scale it. 27 on one page is gonna be very small. TIFF can handle multiple pages, and of course so can PDF, but I don't know of a multi-page TIFF export function. So I'd suggest exporting to PDF - to 27 different file names (1.pdf to 27.pdf) - then using a tool like pdfshuffler (Linux, GUI-based) or command-line Ghostscript (Windows or Linux):

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf 1.pdf 2.pdf 3.pdf ...etc... 27.pdf

(This is a Linux command, not an R command. The Windows version will be very similar, I suspect.) That'd give you a 27-page PDF file with each graph on a new page - much easier to scroll through than using a scroll bar in a graphics package, since you can go back to page 5 and on to page 9 to compare quickly rather than having to manually scroll to find the right info. Plus PDF is vector-based, which means future importing into decent desktop publishing packages should avoid any issues with loss / scaling. I believe it's also possible with psmerge using PostScript, and so possibly EPS files.
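Simpler still (my suggestion, not from the thread): R's `pdf()` device is itself multi-page, so plotting all 27 figures into one open device gives a single 27-page PDF with no external merge step. The file name and the `rnorm` plots are stand-ins:

```r
# One pdf() device stays open across plots; each plot() starts a new page
pdf("all_graphs.pdf", width = 7, height = 5)
for (i in 1:27) {
  plot(rnorm(50), main = paste("Graph", i))  # replace with your real plots
}
dev.off()
```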
Re: [R] How to set default plotting colors by treatment?
> > # I tried defining a function like this
> myplot <- function(...) plot(..., pch=19, col=c("blue","red")[treatment])
>
> # So I can call it like this:
> with(mydfr, myplot(Xmeas, Ymeas))
>
> # but:
> Error in plot.xy(xy, type, ...) : object 'treatment' not found

Basically that is something like calling:

myplot( mydfr$Xmeas, mydfr$Ymeas )

So plot doesn't know that treatment is within mydfr... Changing your function to:

myplot <- function(...) {
  plot(..., pch=19, col=c("blue","red")[mydfr$treatment])
}

should work?
[R] Obtaining value of median survival for survfit function to use in calculation
Hi, I'm sure this should be simple but I can't figure it out! I want to get the median survival calculated by the survfit function and use the value, rather than just being able to print it. Something like this:

library(survival)
data(lung)
lung.byPS = survfit(Surv (time, status) ~ ph.ecog, data=lung)

> lung.byPS
Call: survfit(formula = Surv(time, status) ~ ph.ecog, data = lung)

1 observation deleted due to missingness
            n events median 0.95LCL 0.95UCL
ph.ecog=0  63     37    394     348     574
ph.ecog=1 113     82    306     268     429
ph.ecog=2  50     44    199     156     288
ph.ecog=3   1      1    118     Inf     Inf

What I want is to be able to call the median using something like lung.byPS$median[ph.ecog=0], so that I can add it to a plot like this:

plot (lung.byPS, conf.int=F, lty=1:4)
abline(h=0.5, lty=5)
abline(v=lung.byPS$median[ph.ecog=1])
abline(v=lung.byPS$median[ph.ecog=2])

Anyone got any easy solutions? It's fairly normal to plot across and down to show medians on survival curves... so I'm sure it must be possible to automate.

Ta

Calum
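One way to pull the medians out programmatically (my suggestion, not from the thread): `summary()` on a survfit object returns a `table` component whose rows are the strata and whose columns include `"median"`. Sketch using the same lung example:

```r
library(survival)

lung.byPS <- survfit(Surv(time, status) ~ ph.ecog, data = lung)

# matrix with one row per stratum; grab the median column
med <- summary(lung.byPS)$table[, "median"]

plot(lung.byPS, conf.int = FALSE, lty = 1:4)
abline(h = 0.5, lty = 5)
abline(v = med["ph.ecog=1"])   # median survival for PS 1
abline(v = med["ph.ecog=2"])   # median survival for PS 2
```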
Re: [R] Partykit Document
>Example,
>data("GlaucomaM", package = "ipred") is accepted. Now instead of GlaucomaM, I
>need to give my own data. But the data format for GlaucomaM is not given.
>So how can I know that?

Not familiar with the packages at all, but if you simply enter:

> data ("GlaucomaM", package="ipred")
> GlaucomaM

all the data will be shown for you, which helps you understand what it is - a bit! There is a lot of data and it wraps, so it's hard to understand. So instead try:

> summary (GlaucomaM)

That'll tell you a bit about the 'field' names and the sort of content they have. What all the fields are... not so sure! I'm afraid the output in the example means nothing to me - but possibly if you understand recursive partitioning you'll understand the output, which, once you know what the input fields are, might help. Alternatively, if you still struggle with the input fields: they took the data from the ipred package. I generally find that if you want to know what a package is all about, going to:

http://cran.r-project.org/web/packages/<package>/<package>.pdf

seems to give you the manual! http://cran.r-project.org/web/packages/ipred/ipred.pdf does indeed, and there is a bookmarked section all about the GlaucomaM dataset. It gives you every field name and what it means... I'm still none the wiser - I assume if you laser-scan people's eyes it means something! That same page would also be available with:

> library (ipred)
> help (GlaucomaM)

Hopefully that helps?

Calum
[R] forest (meta) editing text
OK - I've been doing some work on getting a forest plot or two together for a sub-group analysis. That's the crucial thing - because it's a sub-group analysis rather than a meta-analysis, and all the forest (meta) and forestplot (rmeta) instructions assume you are doing a meta-analysis. I found rmeta didn't play nicely for me, so I'm using the meta package with forest. Here's some code that would allow you to reproduce it:

library(meta)
mydata.forest.DoseRed = data.frame(c(1,7,14,6,16), c(7,21,10,15,23), c(1,1,5,2,5), c(4,7,6,5,12))
colnames(mydata.forest.DoseRed) = c("n.DR", "n.No.DR", "E.DR", "E.No.DR")
rownames(mydata.forest.DoseRed) = c("Age: 50-59", "Age: 60-69", "Age: >70", "Sex: F", "Sex: M")
mydata.meta.DoseRed = metabin( E.DR, n.DR, E.No.DR, n.No.DR,
    data=mydata.forest.DoseRed, sm="RR", meth="MH",
    studlab=rownames(mydata.forest.DoseRed) )
forest( mydata.meta.DoseRed, comb.fixed=TRUE, comb.random=FALSE,
    leftcols="studlab", plotwidth=unit(8, "cm"),
    at = c(0.5,1.0,2.0,3.5), xlim = c(0.22, 3.48) )

That's roughly what I want. There is a jpeg of it here: http://www.wittongilbert.free-online.co.uk/forest.jpg

BUT - there are two problems. Firstly, the subgroups have a heading called Study - which makes sense for a meta-analysis, but should say something like "Subgroup" for a sub-group analysis. Anyone know if this is customisable? Secondly, I want to display a global effect (i.e. the RR and confidence intervals for the whole study population) - that's what the diamond should show, but it actually shows the MH fixed-effect model. Again - probably appropriate for a meta-analysis, but I know the actual value for the whole population, so I don't have to model it... Anyone know how to get that to plot instead? forest doesn't open a standard plot window, so you can't simply use plot commands to send extra stuff to the window. Or do I need to use plot(mydata.meta.DoseRed) and then build up the extra info from there? Still surprised this isn't a proper function!
Re: [R] PowerCut Killed R - is my code retrievable?
Rolf wrote:
> If you've been cut-and-pasting from a text editor, then your commands
> *might* be in the file .Rhistory. Unfortunately this history gets saved
> only when you exit R (and by default only when you also say ``yes'' to
> saving the workspace) or if you explicitly savehistory().

I was hoping there was something like that... but like you say, it wasn't a planned exit, so it's not there.

--

Barry wrote:
> So you were just using the text editor as a scratch pad, and not
> saving it?

Exactly.

> A half-decent text editor should be saving things to disk
> as you go. For example, in emacs, if your emacs dies while editing a
> file then when you restart it, it will notice and offer to restore it
> from its "restore" file.

Yeah, that's what I was expecting to happen. I can't actually remember, but I think it was gedit that I had open, although sometimes I use kate. Neither has auto-recovery, as I now know. Both can recover (possibly) files that had once been saved, but it seems I never saved it - it was literally just a big temporary, editable clipboard!

> If you were using emacs you might have files
> like #foo.R# which are emacs' auto-restore files. Other editors might
> do other things - possibly leaving files in /tmp on a Linux system
> (but they might get zapped by a reboot).

It seems only if you hit save once do you get temporary copies...! I've never liked emacs for other programming stuff - but I will take another look at it.

--

Ted wrote:
> To follow up on what Barry wrote above: Don't be put off by the
> recommendation to use EMACS (which is not to everyone's taste).

Took the words out of my mouth.

> I use vim, which does much the same sort of thing: there is a hidden
> ".whatever.swp" file which contains a back-trackable history of
> the editing changes that have been made to the file. So, if you get
> a crash, on re-opening the file in vim you are asked if you want to
> "recover". The ".swp" file is *not* zapped by a reboot!

Good to know.
I used to use vi quite a lot in my command-line Linux days. These days it seems cumbersome on my Linux PC, but works fine on my server! Must be a configuration issue...

> Again, what I routinely do (in Linux) when developing R code is to
> have two terminal windows open. In one I am running R. In the other,
> beside it, I am editing a file of R code. To run code in R that has
> been entered in the "code" window, I just highlight it with the
> mouse in the "code" window, and then paste it into the "R" window.

Exactly what I was doing. Sounds like emacs with ESS has this even slicker - highlight and run. I just need to save the stuff!

--

Ellison wrote:
> Use a text editor or something specific to R like Tinn-R and _save early
> and often_.

Tinn-R is window$ only. I've played with RKWard and found it cumbersome, plus it doesn't put my commands and its commands together, so you don't get a command-line history. I'm looking at SciViews-K just now. I currently use Eclipse as my IDE for PHP programming, so I'll look at that too. I guess where I'm struggling is that I don't really rate what I'm doing as programming, because I'm not writing R plugins, just some functions... Mind you, just having some R context-sensitive syntax highlighting may be an advantage.

--

Hans wrote:
> What about a version control system to (locally) save the different
> stages of your script files?
> (I use git but Subversion (SVN) may be easier and do the job).

My changes were typically one line at a time - and re-run it. I could remember what it was if it didn't do it (or use undo). But if this gets big then SVN etc might be worth it. I have no hands-on experience with Git - one of my PHP projects has been discussing moving to Git / Bazaar for about 6 months. I see the advantage for them as they have multiple coders - is there some advantage I might have missed for a lone programmer? (Sorry, going well OT.) Clearly neither works if I don't save the file!
Thanks BTW for all the posts back - one of my best responded to queries! Calum This message may contain confidential information. If yo...{{dropped:21}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] PowerCut Killed R - is my code retrievable?
I've been tweaking code for several days on and off in R, cutting and pasting in from a text editor (I just leave them open all the time). I think I got something that was usable, but then a power surge tripped the fuses, and unfortunately the machine I was working on doesn't have a UPS.

Does R hold the command data in a cache someplace? I've purposefully not opened R since the crash, so that if it does I don't overwrite it. I guess what I'm wanting is the equivalent of the Linux up arrow, which repeats the last line. I know that exists in R, but from what I recall it didn't work when you close the R session. Is that stored in a hidden file someplace that I could copy?

It's not a huge piece of code or anything - but it was code I tweaked over several stages, as I'm new to R, and it was part of my learning process.

Also - is there a better way for the future? I know some people use IDEs, but is that for serious programming or for building a small function and tweaking it?

Thanks

Calum
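For reference, R's command history can be saved and restored explicitly, not just at exit. A minimal sketch (`.Rhistory` is the interactive default file name; calling `savehistory()` periodically is what would have guarded against a crash like this):

```r
# The history functions only work in interactive front-ends (e.g. the
# readline console), so guard them; they error under Rscript.
if (interactive()) {
  # Save the current session's command history at any time, not just at exit.
  savehistory(file = ".Rhistory")

  # In a later session, reload the saved commands so the up arrow
  # can step back through them again.
  loadhistory(file = ".Rhistory")

  # Show the most recent commands.
  history(max.show = 25)
}
```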
[R] Odd results with Chi-square test. (Not an R problem, but general statistics, I think)
I'm far from an expert on stats, but what I think you are saying is that if you try to compare baseline with version 3, you don't think your p-value is as good as for versions 1 and 2. I'm not 100% sure you are meant to compare p-values like that, but I'll let someone else comment on that!

            total  incorrect  correct  % correct
baseline      898        708      190      21.2%
version_1     898        688      210      23.4%
version_2     898        680      218      24.3%
version_3    1021        790      231      22.6%

> Here, the p value for version_3 (when compared with the baseline) seems to
> make no sense whatsoever. It shouldn't be larger that the other two p
> values, the increase in correct answers (that is what counts!) is bigger
> after all.

No, it's not the raw numbers, it's the proportion of correct answers that counts. I've added a % correct column to your data - does that make it clearer? Only 22.6% of version 3's answers were correct - so the difference, in terms of % change from baseline, is smaller than versions 1 and 2 produced.

From my naive perspective I'd want to test for a difference between all results and baseline, and then v1 & v2, v1 & v3, v2 & v3 (someone may tell me those are unsound things to test - in which case don't test them). You'd then need to determine a threshold for accepting that the test is valid (say p < 0.05). I'd contend that the test should be two-tailed - results could be better or worse?

You should also develop a hypothesis. Let me create one for you:

A. H1: version 1 of the software is better than baseline (H0: version 1 is no better than baseline)
B. H1: version 2 of the software is better than version 1 (H0: version 2 is no better than version 1)
C. H1: version 3 of the software is better than version 2 (H0: version 3 is no better than version 2)

Now look at your results and p-values and work out whether H1 or H0 applies. You could develop further variants (D: version 3 is better than baseline). Finally - remember to consider the 'clinical significance' as well as the statistical significance.
I'd have hoped a software change might have increased correct answers to, say, 40%? And remember also that a p-value of 0.05 has a false positive rate of 1 in 20.

> Any idea what's going on here? I thought the sample size should have no
> impact on the results?

Erm... sample size always has an influence on results. If you show a difference in 100 samples, you would expect a larger p-value for virtually any statistical test you chose than if you show that same difference in 1,000 results. You have a bigger sample but a smaller overall difference, so in effect you can be less sure that the change is not down to chance. (Purist statisticians will likely challenge that definition.)
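The baseline-versus-version comparisons described above can be run directly in base R with prop.test(). A sketch using the counts from the table (illustrative only, not the original poster's code):

```r
# Two-sample test of equal proportions: baseline (190/898 correct)
# against version_3 (231/1021 correct).
res3 <- prop.test(x = c(190, 231), n = c(898, 1021))
res3$p.value    # relatively large: the % difference is small

# Same comparison for version_1 (210/898) against baseline.
res1 <- prop.test(x = c(190, 210), n = c(898, 898))
res1$estimate   # the two sample proportions (about 0.212 and 0.234)
```

As the thread argues, version_3's p-value comes out larger than version_1's even though its raw count of correct answers is higher, because the proportion difference from baseline is smaller.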
[R] Using 'field names' of a data.frame in a function
I may be doing this wrong! But I have a function, which I have simplified a lot below. I want to pass some 'field names' of a data frame to the function, for it to then do some manipulation of. Here's my code:

#build a simple dataset
mydataset = data.frame (
  ages=c('40-49','40-49','40-49','30-39','50-59','50-59','60-69','50-59'),
  sex = c("M","F","F","M","F","F","M","M"))

#build a simple function
myfunc <- function (categories) { table (categories) }

#call the function
myfunc (c(mydataset$ages, mydataset$sex))

===

The output I am getting is:

categories
1 2 3 4
5 7 3 1

But what I'm expecting is:

table (mydataset$ages, mydataset$sex)

        F M
  30-39 0 1
  40-49 2 1
  50-59 2 1
  60-69 0 1

Calling the function as myfunc ("mydataset$ages, mydataset$sex") doesn't work either, and nor does myfunc (c("mydataset$ages", "mydataset$sex")).

Now, in the simple version above I could make the function (category1, category2) and then call table (category1, category2) - but what if I might have a category3, category4, etc.? How can I pass the column names to the function without it trying to actually pass the data to the function? I've also tried paste - but perhaps I'm misusing it?

Many thanks for help in advance
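One way to do what this post asks (a sketch, reusing the mydataset from the question): pass the data frame plus the column names as character strings, and let the function look the columns up itself. do.call() then hands any number of columns to table().

```r
# The example data from the post.
mydataset <- data.frame(
  ages = c('40-49','40-49','40-49','30-39','50-59','50-59','60-69','50-59'),
  sex  = c("M","F","F","M","F","F","M","M"))

# Take the data frame and any number of column *names* (strings).
# dat[cols] selects those columns as a list, which do.call unpacks
# into separate arguments for table() -- so this scales to 3, 4, ...
# categories without changing the function.
myfunc <- function(dat, ...) {
  cols <- c(...)
  do.call(table, dat[cols])
}

myfunc(mydataset, "ages", "sex")   # the 4 x 2 cross-tabulation wanted
```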
Re: [R] Summarising Data for Forrest Plots
>Ah, I think I see what you want. Try this on each pair of exclusive sets:
>Then under65row and over65row should be the first two rows of your result.
>Can't test this at the moment, but I don't think it's too far wrong.

I knew this shouldn't need so much work ;-) Not cracked it yet - because as I see it I need a 2 x 4 table, and at the moment I have only cracked a 2 x 2 table. (Or really I need something like a 10 x 4 - but the 4 is the bit that I haven't cracked.)

First option is something like this:

with(mydataset, table(Sex, Dose))

I can get:

     Dose
Sex   FD  RD
  F    6  15
  M   16  23

For non-categorical data it's slightly trickier... but quite achievable in two lines (for the 2 x 2 table):

factor(cut(mydataset$Age, breaks = c(0,65,100))) -> AgeBands
table (AgeBands, mydataset$Dose)

Which gives:

AgeBands    FD  RD
  (0,65]    15   6
  (65,100]  13  26

Although - I'm not yet sure if I can actually call that data back by column names, i.e.

x <- table (AgeBands, mydataset$Dose)
x$FD

produces an error. :-( But getting there.
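On the last point: a table() result is a matrix, not a data frame, so its columns are read with x[, "FD"] rather than x$FD. A small self-contained sketch (invented data, since the thread's mydataset isn't shown):

```r
# Invented stand-in for the thread's data.
age  <- c(40, 70, 62, 80, 55, 90)
dose <- c("FD", "RD", "FD", "RD", "RD", "FD")

AgeBands <- cut(age, breaks = c(0, 65, 100))
x <- table(AgeBands, dose)

# Matrix-style indexing by dimension names, not $:
x[, "FD"]           # the FD count for every age band
x["(0,65]", "RD"]   # a single cell, by row and column name
```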
Re: [R] Summarising Data for Forrest Plots
>Ah, I think I see what you want. Try this on each pair of exclusive sets:
>
> n_total <- dim(mydataset)[1]
> under65 <- mydataset$age <= 65
> n_under65 <- sum(under65)
> under65row <- c(sum(mydataset$dose[under65] == "FD"),
>   sum(mydataset$dose[under65] == "RD"),
>   sum(mydataset$vitalstatus[under65] == "dead" & mydataset$dose[under65] == "FD"),
>   sum(mydataset$vitalstatus[under65] == "dead" & mydataset$dose[under65] == "RD"))
> over65row <- c(sum(mydataset$dose[!under65] == "FD"),
>   sum(mydataset$dose[!under65] == "RD"),
>   sum(mydataset$vitalstatus[!under65] == "dead" & mydataset$dose[!under65] == "FD"),
>   sum(mydataset$vitalstatus[!under65] == "dead" & mydataset$dose[!under65] == "RD"))
>
>Then under65row and over65row should be the first two rows of your result.
>Can't test this at the moment, but I don't think it's too far wrong.

Thanks Jim. Yes, it looks like that code should do the job. I was really hoping for a call like "SummariseForSubsetAnalysis(mydataset, by=mydataset$dose, subsets=c(age, renal, sex, toxicity), event=survival)" which would magically do it for me ;-) I guess if this is something I start having to do lots, I might have to write one. Surprised one doesn't seem to exist - perhaps the number of variations in what people want would be too complex.

Calum
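Jim's two-row pattern generalises into the kind of helper the post wishes for. A hedged sketch (not an existing package function): loop over named logical subgroup definitions and count patients and deaths per dose arm. The dose/vitalstatus column names follow the code quoted above; the data are invented.

```r
# Invented example data with the column names used in the thread.
mydataset <- data.frame(
  age         = c(50, 70, 60, 80, 45, 66),
  dose        = c("FD", "RD", "FD", "RD", "RD", "FD"),
  vitalstatus = c("dead", "alive", "dead", "dead", "alive", "dead"))

# For each logical subgroup, count patients and deaths per dose arm.
# Returns one row per subgroup, ready to feed a forest-plot routine.
summarise_subgroups <- function(dat, groups) {
  t(sapply(groups, function(g) c(
    n.FD    = sum(g & dat$dose == "FD"),
    n.RD    = sum(g & dat$dose == "RD"),
    dead.FD = sum(g & dat$dose == "FD" & dat$vitalstatus == "dead"),
    dead.RD = sum(g & dat$dose == "RD" & dat$vitalstatus == "dead"))))
}

groups <- list("Age <= 65" = mydataset$age <= 65,
               "Age > 65"  = mydataset$age > 65)
summarise_subgroups(mydataset, groups)
```

Adding a renal, sex, or toxicity row is then just another entry in `groups`.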
Re: [R] Superscripts and rounding
Duh! Did it again! The variables need str in front of them, don't they!!

sub = sprintf('Seasonal station with natural streamflow - Lat: %s Lon: %s Gross Area %s km\UB2 - Effective Area %s km\UB2 ',
  round( str[['metadata']][['latitude']], digits=4 ),
  round( str[['metadata']][['longitude']], digits = 4),
  round( str[['metadata']][['grossarea']], digits = 4),
  round( str[['metadata']][['effectivearea']], digits = 4))

Let me know if that works.
Re: [R] Superscripts and rounding
> library(RODBC)
> library(HYDAT)
> You will need to install HYDAT (the zip file) from
> http://www.geog.ubc.ca/~rdmoore/Rcode.htm
>
> Below is my current code - which works. The [[]] is the way I am accessing
> the columns from the data frame.
>
> thanks again for all your help
>
> # load HYDAT data
> par(mfrow=c(3,1))
> path <- 'c:\\HYDAT'
> wsc.id <- '11AB075'
> stn <- ReadHydatFile(path, wsc.id) #if data in CDROM image
> # stn <- ReadHydatDatabase (path, wsc.id) #if data in database

I'd need access to the whole data file. I tried exporting some data from the website for it, but it got too complex for me! However, it seems to me you have two chunks of data:

stn[flow] - which has daily flow data in it?
stn[metadata] - which I guess is a header for the whole dataset, describing what it is.

So I recreated what I hope might be similar to part of your data (using lst instead of stn as the vector/array/list name):

# Build an array containing a lat & long.
info <- data.frame(latitude=1.0005689, longitude=55.698754)

#display result
info
#   latitude longitude
#1  1.000569  55.69875

#create a list (called lst) with an object metadata in it containing the array
lst <- list (metadata=info)

#display result
lst
#$metadata
#   latitude longitude
#1  1.000569  55.69875

#check if I can call a single piece of data using the square brackets as references...
lst[['metadata']][['longitude']]
#[1] 55.69875

#now try rounding that
> round (lst[['metadata']][['longitude']], digits=2)
[1] 55.7

#now try sprintf'ing that
sprintf('Seasonal station with natural streamflow - Lat: %s', round (lst[['metadata']][['longitude']], digits=2))
# [1] "Seasonal station with natural streamflow - Lat: 55.7"

#now try that in a plot
plot(1,1, sub=sprintf('Seasonal station with natural streamflow - Lat: %s', round (lst[['metadata']][['longitude']], digits=2)))
# results in a correct label ;-)

It's possible to refer to that same value in the following ways:

> lst$metadata   # same as lst[['metadata']]
   latitude longitude
1  1.000569  55.69875
> lst$metadata[['longitude']]   # same as lst[['metadata']][['longitude']]
[1] 55.69875
> lst$metadata$longitude   # same as lst[['metadata']][['longitude']]
[1] 55.69875

So I'm stumped! Without being able to see the actual structure of your data, I can't figure out why you are getting an error!

BTW - was there a cut-and-paste error? Your error message was reported as:

Error: unexpected symbol in: " sub = sprintf('Seasonal station with natural streamflow - Lat: %s Lon: %s Gross Area %s km\UB2 - Effective Area %s km\UB2, + round( [['metadata"
> + round( [['metadata']][['longitude']], digits = 4),
Error: unexpected '[[' in "+ round( [["

The + on "+ round" looks like the plus was typed. + shouldn't have been typed, but also there was a missing quote after the \UB2. You should have entered:

sub = sprintf('Seasonal station with natural streamflow - Lat: %s Lon: %s Gross Area %s km\UB2 - Effective Area %s km\UB2 ',
  round( [['metadata']][['latitude']], digits=4 ),
  round( [['metadata']][['longitude']], digits = 4),
  round( [['metadata']][['grossarea']], digits = 4),
  round( [['metadata']][['effectivearea']], digits = 4))

C
Re: [R] Summarising Data for Forrest Plots
>> What I want to do is do a forrest (forest) plot for subgroups within my
>> single dataset as a test of heterogeniety. I have a dataset who received
>> either full dose(FD) or reduced dose(RD) treatment, and a number of
>> characteristics about those subjects: age, sex, renal function, weight,
>> toxicity. And I have survival data (censored). they are in standard
>> columnar data.
>>
>> Is there an *easy* way to transform them into something like this:
>>
>> SubGroup         n.FD  n.RD  surv.FD  surv.RD
>> 1 Age >65
>> 2 Age <= 65
>> 3 Male
>> ...
>> 9 Grade 0-2 Tox
>> 10 Grade 3/4 Tox
>
>Hi Calum,
>Have you tried subsetting the dataset like this:
>
>meta.DSL(...,data=mydataset[mydataset$age <= 65,],...)
>
>Jim

Hi Jim,

I'm not sure that I understand! But my understanding was that meta.DSL wants four bits of information: the number treated (full dose in my case), the number in control (reduced dose in my case), and the number of events in the two groups... which is what I was trying to describe above - although possibly not very well. Then it will do the work for me.

My challenge is taking a load of data in columns and getting it summarised by the subgroups, so that it takes Age > 65, counts how many had full dose and how many had reduced dose, populates those fields, then does the same for Age <= 65, etc. (I may be back with questions about the survival value - but even knowing how to get it to summarise like I describe would be a start.) I guess it's a bit like a pivot table in Excel?

But perhaps it's something to do with the mydataset[mydataset$age <= 65,] bit? That seems to give me a data table with only the 65-and-unders, which makes sense. But then how do I get it to populate a table with the numbers in the two groups?
Re: [R] Summarising Data for Forrest Plots
> Are n.FD and n.RD the number of people who received the full/reduced dose

Yes - but I don't have the data structured like that YET. That's what I want to get to, because that's what the forest plot seems to be wanting.

> and surv.FD and surv.RD the number of people that survived?

Mmm... I was more thinking of something like median survival? Although the brain hasn't kicked into gear yet tonight, and it might actually be meant to be a hazard ratio?

> And are the people who received the full dose different from the people who
> received the reduced dose?

Yes.

> And what exactly is it that you want to plot in the forest plot?

Subgroups - see below.

> From the way you have arranged the table, it seems as if you want some kind of
> effect size measure that contrasts the survival rate of the full versus
> reduced dose in the various subgroups. Is that correct?

Yip, that sounds right.

> And are you just trying to figure out how to draw the forest plot once you
> have the data in the table form as shown in your post or are you also trying
> to figure out how to create that table to begin with?

I *think* I can draw the plot once I have the data structured right. But at the moment my data is structured like this:

PatientID  FullDose  Survival  Censored  Age  Sex  NormalRenalFunc  GradeOfToxicity
001        Y         125       N         75   F    Y                1
002        N         55        Y         55   M    N                4
003        N         65        Y         78   F    Y                2

I want to eventually get to a forest plot that looks a bit like this:

Age:
  < 65               |----#----|
  >= 65                  |----#----|
Sex:
  M                  |---#---|
  F                    |----#----|
Renal Func:
  Normal               |---#---|
  Abnormal           |----#----|
Grade of Toxicity:
  0-1                 |---#---|
  2                  |--#--|
  3-4                   |---#---|
Overall:               <>

Which I believe I can achieve using the metaplot or forestplot functions, replacing the studies with the relevant subgroups. But my challenge has been converting the patient data above down to subgroup lists, other than by running a survival analysis individually on each subgroup, recording the results, and building up a table.

Calum
Re: [R] Superscripts and rounding
So the issue is something to do with the [['xxx']] construction of your data. Can you explain what that's all about - as it errors all over the shop when I try using that... You've set me on a mission to find the answer! So I'd really like to recreate a little bit of your data here, and play...

C
Re: [R] Superscripts and rounding
Slightly confused, because if I try:

> newdata.yaxis = c(2.473, 3.123456, 3.23456, 2.67890, 1.56789)
> newdata.yaxis_4 = round (newdata.yaxis, digits = 4)
> newdata.yaxis
[1] 2.473000 3.123456 3.234560 2.678900 1.567890
> newdata.yaxis_4
[1] 2.4730 3.1235 3.2346 2.6789 1.5679

As you see - I get a new variable with the result rounded to 4 places. Then if I try:

> inserted = sprintf("Some text with a 4 digit number ( %s ) inserted in it", newdata.yaxis_4[2])
> inserted
[1] "Some text with a 4 digit number ( 3.1235 ) inserted in it"

It works for me - clearly the way my data is structured is a little different from yours. I would have thought you might need to do something like:

> [['rounddata']] = round ([['metadata']], digits = 4)

Then call your data with [['rounddata']][['latitude']] etc. - I'm no expert on all this matrix stuff though! (Not even sure what a double [[ means!)

But this (calling the round within the sprintf function) also works for me:

> inserted = sprintf("Some text with a 4 digit number ( %s ) inserted in it", round( newdata.yaxis[2], digits = 4))
> inserted
[1] "Some text with a 4 digit number ( 3.1235 ) inserted in it"

So why can't you just use:

> sub = sprintf('Seasonal station with natural streamflow - Lat: %s Lon: %s Gross Area %s km\UB2 - Effective Area %s km\UB2,
+ round( [['metadata']][['latitude']], digits = 4),
+ round( [['metadata']][['longitude']], digits = 4),
+ round( [['metadata']][['grossarea']], digits = 4),
+ round( [['metadata']][['effectivearea']] digits = 4),
+ )
[R] Summarising Data for Forrest Plots
I tried to post this a few times last week and it seems to have got stuck somehow, so I'm trying from a different email in the hope that works. If somehow this has appeared on the list 20 times and I never saw any of them, I apologise ;-)

I'm basically an R newbie, but I am VERY computer literate. This has me stumped, though...

All the examples for using the rmeta package to create a forest plot or similar seem to use the catheter data:

    Name n.trt n.ctrl col.trt col.ctrl inf.trt inf.ctrl
1 Ciresi   124    127      15       21      13       14
2 George    44     35      10       25       1        3
3 Hannan    68     60      22       22       5        7
4  Heard   151    157      60       82       5        6
...

As I see it, that's a summary of data from several published trials. What I want to do is do a forrest (forest) plot for subgroups within my single dataset as a test of heterogeneity. I have a dataset who received either full dose (FD) or reduced dose (RD) treatment, and a number of characteristics about those subjects: age, sex, renal function, weight, toxicity. And I have survival data (censored). They are in standard columnar data.

Is there an *easy* way to transform them into something like this:

SubGroup         n.FD  n.RD  surv.FD  surv.RD
1 Age >65
2 Age <= 65
3 Male
...
9 Grade 0-2 Tox
10 Grade 3/4 Tox

Which rmeta will then let me use to create a forest plot from?

This is a reasonably standard approach in biomedical studies these days, so it seems odd that I can't find any "How-To" that tells me how to shortcut it. Otherwise I have to manually calculate each of the parameters :-( which is a real pain, as we are awaiting more mature data which would need the same process re-run.

Thanks in advance

C
Re: [R] Superscripts and rounding
I'm anything but an expert in R, however if I'm labeling a graph axis with a superscript I have tended to use:

> plot (x , y , xlab = expression ("label"^2))

But when you try to have more than one superscript it fails. Assuming you are in a UTF-8 locale (Western Europe), you could try:

> plot (x , y , xlab = expression ("Some label text \UB2 some more label text \UB2"))

That works for me. (B2 is the hex code for the UTF-8 character ² = ^2, and \U is an escape sequence that will produce that character.) It only works if you are in a UTF-8 locale, from what I understand.

--

As for rounding - can you not populate a new field using the round command? i.e. something like:

[metadata][long_4dig] = round([metadata][longitude], 4) # I haven't used round before; my syntax may be wrong!

Then use %s to drop that value in?

C
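For what it's worth, plotmath can also put several superscripts in one label without relying on the locale, by pasting pieces together inside expression(). A sketch (the axis wording here is invented for illustration):

```r
x <- 1:10
y <- x^2

# Inside expression(), each "string"^2 fragment carries its own
# superscript, and paste() joins the fragments into one label.
lab <- expression(paste("Gross area (km"^2, ")  -  Effective area (km"^2, ")"))
plot(x, y, xlab = lab)
```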