[R] A better scales::dollar() ?

2019-12-06 Thread POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) via R-help
I'm writing a quite large document in Rmarkdown which has financial data in it. 
I currently format that data using scales::dollar(), something like this:

> require(scales)
> x = 100000
> cat(dollar(x, prefix = "£", big.mark = ","))

£100,000

But actually, I'd quite like to get £100k out in that instance so I'd do:

> cat(dollar(x / 10^3, prefix = "£", suffix = "k"))

£100k

But x could be 100 or 10,000,000.  I want some form of 'automatic' version that 
might give me something like:

> require(scales)
> y = 0:7
> x = 10^y
> dollar(x, prefix = "£")
[1] "£1"          "£10"         "£100"        "£1,000"      "£10,000"
[6] "£100,000"    "£1,000,000"  "£10,000,000"

But what I want is more like:

£1.00  £10.00  £100  £1k  £10k  £100k  £1m  £10m

I'm sure I can write something as a little function, but before I do - is there 
something already out there?

I have a similar need to format milligrams through to kilograms.
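For what it's worth, a minimal hand-rolled sketch of such an "automatic" formatter (the function name is my own invention, not an existing scales feature as of this post; newer scales versions add label_number() with a scale_cut argument, which may also fit):

```r
# Minimal sketch: pick a k/m suffix automatically.
# Assumes whole pounds above 100 and pence below, per the desired
# output shown above.
auto_pounds <- function(x) {
  fmt1 <- function(v) {
    if (v >= 1e6)      paste0("£", format(v / 1e6, big.mark = ","), "m")
    else if (v >= 1e3) paste0("£", format(v / 1e3, big.mark = ","), "k")
    else if (v >= 100) paste0("£", format(v, big.mark = ","))
    else               sprintf("£%.2f", v)
  }
  vapply(x, fmt1, character(1))
}

auto_pounds(10^(0:7))
# "£1.00" "£10.00" "£100" "£1k" "£10k" "£100k" "£1m" "£10m"
```

The same threshold-based dispatch would work for milligrams through to kilograms by swapping the breakpoints and suffixes.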




This message may contain confidential information. If you are not the intended 
recipient please inform the
sender that you have received the message in error before deleting it.
Please do not disclose, copy or distribute information in this e-mail or take 
any action in relation to its contents. To do so is strictly prohibited and may 
be unlawful. Thank you for your co-operation.

NHSmail is the secure email and directory service available for all NHS staff 
in England and Scotland. NHSmail is approved for exchanging patient data and 
other sensitive information with NHSmail and other accredited email services.

For more information and to find out how you can switch, 
https://portal.nhs.net/help/joiningnhsmail

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I can't get seq to behave how I think it should

2019-01-17 Thread POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) via R-help
Well I get the issue with finite precision. As in sqrt(2) * sqrt(2) is not 
exactly 2.

What surprised me was that seq(1.4, 2.1, by=0.001) starts at 1.3999... 
and not at 1.4!
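The effect is easy to see by printing more digits than R's default seven (a small illustrative check; the exact trailing digits may vary by platform):

```r
# 1.4 has no exact binary representation, so seq() starts from the
# nearest double, which is slightly below 1.4.
lut <- seq(1.4, 2.1, by = 0.001)
sprintf("%.17f", lut[1])    # the stored value behind "1.4"
lut[401] == 1.8             # FALSE: row 401 prints as 1.800 but isn't exactly 1.8
round(lut[401], 3) == 1.8   # TRUE after rounding, hence the rounding fix
```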


-Original Message-
From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
Sent: 17 January 2019 14:30
To: POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST); Ben 
Tupper
Cc: r-help@r-project.org
Subject: RE: [R] I can't get seq to behave how I think it should

Hi

It is not a seq problem; it is a problem of floating-point representation in 
finite precision. Ben pointed to it, and you can learn more from R FAQ 7.31.

Cheers
Petr

> -Original Message-
> From: POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION
> TRUST) 
> Sent: Thursday, January 17, 2019 2:56 PM
> To: PIKAL Petr ; Ben Tupper
> 
> Cc: r-help@r-project.org
> Subject: RE: [R] I can't get seq to behave how I think it should
>
> Thanks guys.
>
> I've used Petr's method and it's working for me.
>
> If the data had been from a calculation I'd have rounded it... just
> didn't expect seq to break it!
>
> C
>
> -Original Message-
> From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
> Sent: 17 January 2019 13:53
> To: Ben Tupper; POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS
> FOUNDATION TRUST)
> Cc: r-help@r-project.org
> Subject: RE: [R] I can't get seq to behave how I think it should
>
> Hi
>
> Or you could use rounding.
> which(round(lut, 3)==1.8)
> [1] 401
>
> Cheers
> Petr
>
> > -Original Message-----
> > From: R-help  On Behalf Of Ben Tupper
> > Sent: Thursday, January 17, 2019 2:43 PM
> > To: POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS
> FOUNDATION TRUST)
> > 
> > Cc: r-help@r-project.org
> > Subject: Re: [R] I can't get seq to behave how I think it should
> >
> > Hi,
> >
> > This looks like a floating point reality bump - see
> >
> > https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
> >
> > You can use other methods for finding your row - I would opt for
> > findInterval()
> >
> > > lut = seq(1.4, 2.1, by=0.001)
> > > findInterval(1.8, lut)
> > [1] 401
> >
> > findInterval() uses a rapid search to find the index in the lookup
> > table (lut) whose value is less than or equal to the search value (1.8
> > in your example).
> >
> > Cheers,
> > Ben
> >
> > > On Jan 17, 2019, at 8:33 AM, POLWART, Calum (COUNTY DURHAM AND
> > DARLINGTON NHS FOUNDATION TRUST) via R-help 
> > wrote:
> > >
> > > I am using seq with the expression seq(1.4, 2.1, by=0.001) to
> > > create a sequence of references from 1.4 to 2.1 in 0.001
> > > increments.  They appear to be created correctly.  They have a
> > > related pair of data which for the purposes of this we will call
> > > val.  I'm interested in the content on the row with seq = 1.8. But
> > > I can't seem to get it returned.  I can get other values but not
> > > 1.8!  yet looking at row
> > > 401 there is nothing to indicate an issue
> > >
> > >> a = 1.4
> > >> b = 2.1
> > >> seq = seq(a, b, by=0.001)
> > >> val = ceiling(seq * 50)
> > >> s=data.frame(seq, val)
> > >> s$val[seq==1.799]
> > > [1] 90
> > >> s$val[s$seq==1.8]
> > > numeric(0)
> > >> s$val[seq==1.8]
> > > numeric(0)
> > >> s$val[s$seq==1.800]
> > > numeric(0)
> > >> s$val[s$seq==1.801]
> > > [1] 91
> > >> head(s[s$seq>1.798,])
> > >  seq val
> > > 400 1.799  90
> > > 401 1.800  90
> > > 402 1.801  91
> > > 403 1.802  91
> > > 404 1.803  91
> > > 405 1.804  91
> > >
> > >
> > > Can anyone explain what's going on here and how I would correctly
> > > find the
> > content of row 401 by using an expression to equal the seq column?
> > >
> > >
> > >
> > >
> > >
> > >
> >

Re: [R] I can't get seq to behave how I think it should

2019-01-17 Thread POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) via R-help
Thanks guys.

I've used Petr's method and it's working for me.

If the data had been from a calculation I'd have rounded it... just didn't 
expect seq to break it!

C

-Original Message-
From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
Sent: 17 January 2019 13:53
To: Ben Tupper; POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION 
TRUST)
Cc: r-help@r-project.org
Subject: RE: [R] I can't get seq to behave how I think it should

Hi

Or you could use rounding.
which(round(lut, 3)==1.8)
[1] 401

Cheers
Petr

> -Original Message-
> From: R-help  On Behalf Of Ben Tupper
> Sent: Thursday, January 17, 2019 2:43 PM
> To: POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
> 
> Cc: r-help@r-project.org
> Subject: Re: [R] I can't get seq to behave how I think it should
>
> Hi,
>
> This looks like a floating point reality bump - see
>
> https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
>
> You can use other methods for finding your row - I would opt for
> findInterval()
>
> > lut = seq(1.4, 2.1, by=0.001)
> > findInterval(1.8, lut)
> [1] 401
>
> findInterval() uses a rapid search to find the index in the lookup
> table (lut) whose value is less than or equal to the search value (1.8 in
> your example).
>
> Cheers,
> Ben
>
> > On Jan 17, 2019, at 8:33 AM, POLWART, Calum (COUNTY DURHAM AND
> DARLINGTON NHS FOUNDATION TRUST) via R-help 
> wrote:
> >
> > I am using seq with the expression seq(1.4, 2.1, by=0.001) to create
> > a sequence of references from 1.4 to 2.1 in 0.001 increments.  They
> > appear to be created correctly.  They have a related pair of data
> > which for the purposes of this we will call val.  I'm interested in
> > the content on the row with seq = 1.8. But I can't seem to get it
> > returned.  I can get other values but not 1.8!  yet looking at row
> > 401 there is nothing to indicate an issue
> >
> >> a = 1.4
> >> b = 2.1
> >> seq = seq(a, b, by=0.001)
> >> val = ceiling(seq * 50)
> >> s=data.frame(seq, val)
> >> s$val[seq==1.799]
> > [1] 90
> >> s$val[s$seq==1.8]
> > numeric(0)
> >> s$val[seq==1.8]
> > numeric(0)
> >> s$val[s$seq==1.800]
> > numeric(0)
> >> s$val[s$seq==1.801]
> > [1] 91
> >> head(s[s$seq>1.798,])
> >  seq val
> > 400 1.799  90
> > 401 1.800  90
> > 402 1.801  91
> > 403 1.802  91
> > 404 1.803  91
> > 405 1.804  91
> >
> >
> > Can anyone explain what's going on here and how I would correctly
> > find the
> content of row 401 by using an expression to equal the seq column?
> >
> >
> >
> >
> >
> >
>






[R] I can't get seq to behave how I think it should

2019-01-17 Thread POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) via R-help
I am using seq with the expression seq(1.4, 2.1, by=0.001) to create a sequence 
of references from 1.4 to 2.1 in 0.001 increments.  They appear to be created 
correctly.  Each has a related data value which, for the purposes of this, we 
will call val.  I'm interested in the content of the row with seq = 1.8, but I 
can't seem to get it returned.  I can get other values, but not 1.8!  Yet 
looking at row 401 there is nothing to indicate an issue:

> a = 1.4
> b = 2.1
> seq = seq(a, b, by=0.001)
> val = ceiling(seq * 50)
> s=data.frame(seq, val)
> s$val[seq==1.799]
[1] 90
> s$val[s$seq==1.8]
numeric(0)
> s$val[seq==1.8]
numeric(0)
> s$val[s$seq==1.800]
numeric(0)
> s$val[s$seq==1.801]
[1] 91
> head(s[s$seq>1.798,])
  seq val
400 1.799  90
401 1.800  90
402 1.801  91
403 1.802  91
404 1.803  91
405 1.804  91


Can anyone explain what's going on here and how I would correctly find the 
content of row 401 by using an expression to equal the seq column?










[R] odfWeave - A loop of the "same" data

2017-06-01 Thread POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) via R-help
Before I go and do this another way - can I check if anyone has a way of 
looping through data in odfWeave (or possibly Sweave) to do a repeating 
analysis on subsets of data?

For simplicity, let's use the mtcars dataset in R to explain.  The dataset 
looks like this:

> mtcars
                   mpg cyl disp  hp drat   wt ...
Mazda RX4         21.0   6  160 110 3.90 2.62 ...
Mazda RX4 Wag     21.0   6  160 110 3.90 2.88 ...
Datsun 710        22.8   4  108  93 3.85 2.32 ...
...

Say I wanted to have a 'catalogue'-style report from mtcars, where on each page 
I would perhaps have the row name as a heading and then a plot of mpg 
highlighting that specific car.

Then add a page break and *do the same for the next car*.  I can do this 
manually of course, but it is effectively a loop, something like this:

for (n in seq_along(mtcars$mpg)) {
  barplot(mtcars$mpg,
          col = c(rep(1, n - 1), 2, rep(1, length(mtcars$mpg) - n)))
}

There is an odfWeave page-break function, so I can do that sort of thing (I 
think).  But I don't think I can output more than one image, can I?  In reality 
I will want several images and a table per "catalogue" page.

At the moment I think I need to create a master odt document, create individual 
catalogue pages, and merge them into one document - but that feels clunky 
(unless I can script the merge!).

Anyone got a better way?
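One pattern that avoids merging documents - shown here as a knitr/R Markdown sketch rather than odfWeave, since I can't confirm odfWeave emits multiple images per chunk - is a single chunk with results = "asis" that writes a heading, a plot, and a page break per car:

```r
# Sketch of a "catalogue" loop for a knitr/R Markdown chunk with
# results = "asis" (an alternative to odfWeave).  Each iteration
# emits a heading, a plot, and a page break.
for (car in rownames(mtcars)) {
  cat("\n\n##", car, "\n\n")               # heading per car
  n <- which(rownames(mtcars) == car)
  barplot(mtcars$mpg,
          col = ifelse(seq_along(mtcars$mpg) == n, 2, 1),
          main = paste("mpg, highlighting", car))
  cat("\n\n\\newpage\n\n")                 # page break (LaTeX/PDF output)
}
```

The `\newpage` marker assumes PDF output; an odt workflow would need the equivalent odfWeave page-break call in its place.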









Re: [R] ?currency challenge ?knapsack challenge ?probabilities

2015-12-28 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
Hi Jim

Working on the basis of an exact match, but the 25% increments are rounded to 
integers - so it's like buying from a shop priced in whole numbers, where the 
change is exactly what you expect, not 'roughly right'.

Thanks

Calum

On 27 Dec 2015, at 22:04, Jim Lemon <drjimle...@gmail.com> wrote:
Hi Calum,
Does this include an error tolerance for the match between the ordered and 
delivered quantities? That is, is it okay to have a maximum of one unit 
difference or do deliveries have to exactly match orders?

Jim


On Sun, Dec 27, 2015 at 10:08 PM, Polwart Calum (COUNTY DURHAM AND DARLINGTON 
NHS FOUNDATION TRUST) <calum.polw...@nhs.net> wrote:
I am currently working on a project that is producing Gigabyte sized vectors 
that kill R.  It can take 24 hours to run. So frustrating to get that far and 
have to scale back my inputs and rerun...


The problem i'm trying to solve as i see it is very similar to optimal currency 
denominations problems combined with a knapsack problem. Let me TRY to explain.


We have a product that we want to manufacture in as few sizes as possible.  
like currency if you want 30cents but a 30cent coin doesnt exist you can join 
multiple products together. (3x10 cent, 25cent +5 cent, etc)


Unlike currency we dont need every value to be possible, we have a list of 
known values which are effectively related to each other by the next size up 
being 25% bigger. So for instance 64, 80 100.


We have some rules that say you can't use more than X products combined to make 
the final size.  A bit like saying never give more than 10 coins as change, so 
you cant issue 20x5cents for a dollar of change.


All of that fits a standard currency denomination challenge.


We dont need the combinations to be calculated using greedy method. [We will 
calculate and store as a table]


BUT - we do have a manufacturing limitation that means can manufacture to any 
whole number size, we cant do smaller than size5. (We dont go as low as that 
anyway... size 11 is as low as needed).  So different from any currency problem 
I've seen where the lowest coin size is always a 1 allowing any size to be 
produced.


So i have three questions I'm trying to answer:


- what is the smallest product range we can make that achieves our rules for 
max combinations of sizes?


- Is there a more optimal range. Say the smallest range was 4 sizes, for 
example 5,6,23,40.  Its possible adding a 22 and a 46 to that may actually be 
cheaper than supplying 2x5 and 2x6 or 2x23...

Currently I'm identifying every possible combination into a matrix.  We have a 
manufacturing constraint of max size 49 as well.  So i take every end user size 
possible (from 11 thru to 125).  For each size i then take every combination of 
possible sizes from 5 to 49 (45 sizes) that we COULD make and work out how i 
can achieve all the possible end user sizes, discarding any combinations that 
break our rules for max combinations.

Thats a giant set of for loops.  Once i establish the options we  can apply the 
manufacturing costs and usage data to find the answer.

For now 45 sizes,combined in any of up to 5 different combinations to do 10 end 
user sizes is creating vectors too big for R to handle...

Long explanation of the problem, to basically say... has anyone come across a 
function in R that might simplify this?



Sent from TypeMail<http://www.typeapp.com/r>



[R] ?currency challenge ?knapsack challenge ?probabilities

2015-12-27 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
I am currently working on a project that is producing gigabyte-sized vectors 
that kill R.  It can take 24 hours to run, so it's frustrating to get that far 
and then have to scale back my inputs and rerun...


The problem I'm trying to solve is, as I see it, very similar to an optimal 
currency denominations problem combined with a knapsack problem. Let me TRY to 
explain.


We have a product that we want to manufacture in as few sizes as possible.  
Like currency: if you want 30 cents but a 30-cent coin doesn't exist, you can 
join multiple products together (3 x 10 cents, 25 + 5 cents, etc.).


Unlike currency, we don't need every value to be possible; we have a list of 
known values which are effectively related to each other by the next size up 
being 25% bigger - so for instance 64, 80, 100.


We have some rules that say you can't use more than X products combined to make 
the final size.  A bit like saying never give more than 10 coins as change, so 
you can't issue 20 x 5 cents for a dollar of change.


All of that fits a standard currency denomination challenge.


We don't need the combinations to be calculated using the greedy method. [We 
will calculate and store them as a table.]


BUT - we do have a manufacturing limitation: we can manufacture any 
whole-number size, but we can't do smaller than size 5. (We don't go as low as 
that anyway... size 11 is as low as needed.)  So this differs from any currency 
problem I've seen, where the lowest coin size is always 1, allowing any size to 
be produced.


So I have three questions I'm trying to answer:


- What is the smallest product range we can make that achieves our rules for 
max combinations of sizes?


- Is there a more optimal range?  Say the smallest range was 4 sizes, for 
example 5, 6, 23, 40.  It's possible that adding a 22 and a 46 to that may 
actually be cheaper than supplying 2x5 and 2x6 or 2x23...

Currently I'm identifying every possible combination into a matrix.  We have a 
manufacturing constraint of max size 49 as well.  So I take every end-user size 
possible (from 11 through to 125).  For each size I then take every combination 
of possible sizes from 5 to 49 (45 sizes) that we COULD make, and work out how 
I can achieve all the possible end-user sizes, discarding any combinations that 
break our rules for max combinations.

That's a giant set of for loops.  Once I establish the options, we can apply 
the manufacturing costs and usage data to find the answer.

For now, 45 sizes, combined in any of up to 5 different combinations to do 10 
end-user sizes, is creating vectors too big for R to handle...

Long explanation of the problem, to basically say: has anyone come across a 
function in R that might simplify this?
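Rather than expanding every ordered tuple (which is what blows up the vectors), enumerating only non-decreasing combinations that sum exactly to each target keeps the search far smaller. A sketch under the rules described above (sizes 5-49, at most 5 parts); the function name is my own:

```r
# Enumerate multisets of sizes (kept non-decreasing, so no duplicate
# orderings) that sum exactly to `target`, using at most `max_parts` parts.
combos_for <- function(target, sizes = 5:49, max_parts = 5) {
  out <- list()
  recurse <- function(remaining, parts, min_size) {
    if (remaining == 0) {                 # found a valid combination
      out[[length(out) + 1]] <<- parts
      return(invisible())
    }
    if (length(parts) == max_parts) return(invisible())
    for (s in sizes[sizes >= min_size & sizes <= remaining]) {
      recurse(remaining - s, c(parts, s), s)
    }
  }
  recurse(target, integer(0), min(sizes))
  out
}

length(combos_for(30))  # ways to make size 30 from parts in 5..49
```

Manufacturing cost and usage data can then be applied to this much smaller list instead of the full Cartesian product.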



Sent from TypeMail<http://www.typeapp.com/r>


On 27 Dec 2015, at 08:00, "Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS 
FOUNDATION TRUST)" <calum.polw...@nhs.net> wrote:








[R] Not sure this is something R could do but it feels like it should be.

2013-06-06 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
Some colleagues nationally have developed a system which means they can pick 
the optimal set of doses for a drug.  The system could apply to a number of 
drugs, but the actual doses might vary.  To try and explain this in terms the 
average Joe on the street might understand: if you have some amoxicillin 
antibiotic for a chest infection, the normal dose for an adult is 250 to 500mg, 
increased to maybe 1000mg in severe cases.

For a child it is dosed from a liquid, and people usually go from 62.5mg or 
125mg to 250mg, although you could measure any volume you wanted.

What this new method has developed is a means to pick the "right" standard 
doses - in the example above, 62.5, 125, 250, 500, 1000.  However, the method 
they've used is really engineered around ensuring the jump between doses is 
correct - you'll notice that the list above is a doubling-up method.

But you can also have a doubling-up method that goes 50, 100, 200, 400, 800, 
1600 - and pretty much as many as you can think of, depending on your starting 
point - and there is no scientific means to pick that starting point.  So 
colleagues have developed their rather more complex equivalent of the doubling 
method to determine the doses they need, but they need to know if they should 
start at 40, 50, 62.5 or some other number.

Once they have the starting number they can calculate all the other doses.  I 
realise R can do that, and I realise using a loop of possible starting numbers 
it can build all those options.

Each patient then has a theoretical dose they should get - let's say that's 
10mg/kg - and you might treat patients from 5 to 120kg.  They are then looking 
to calculate the variance for each dose range: if we take the 50, 100, 200, 400 
model, you'd give 50mg to anyone needing 0?? to 75mg, 100mg to anyone needing 
76 - 150mg, etc.  From there they take that range and say that's a 31% overdose 
to a 33% underdose.  Then they want to find if there is a starting number which 
minimises the extent of under- and overdosing...

Anyone know of an existing stats function in R that can easily do that, and 
then report from some inputs a single number that is the "best fit"?

Calum
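I don't know of a single function, but the search described above reduces to a one-line objective plus a grid search (or optimise()). A sketch with illustrative assumptions of my own - a pure doubling ladder, 10 mg/kg dosing, weights 5-120 kg, and nearest-dose assignment:

```r
# Worst-case relative dosing error for a doubling ladder starting at `start`.
# Assumptions (mine, for illustration): ladder doses are start * 2^k covering
# the needed range; each patient receives the nearest ladder dose.
worst_error <- function(start, need = 10 * (5:120)) {  # 10 mg/kg, 5-120 kg
  doses <- start * 2^(0:ceiling(log2(max(need) / start)))
  given <- vapply(need, function(d) doses[which.min(abs(doses - d))],
                  numeric(1))
  max(abs(given - need) / need)  # largest over/underdose as a fraction
}

starts <- seq(40, 80, by = 0.5)
best <- starts[which.min(vapply(starts, worst_error, numeric(1)))]
best  # starting dose with the smallest worst-case error, on this grid
```

Swapping `max()` for `mean()` in the objective would minimise the average rather than the worst-case mis-dose, and the colleagues' own (non-doubling) ladder rule can replace the `2^k` line.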






Re: [R] Comparing two different 'survival' events for the same subject using survdiff?

2013-04-29 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
> It isn't that complex:
>
> myDataLong <- data.frame(Time = c(A, C), Censored = c(B, D),
>                          group = rep(0:1, times = c(length(A), length(C))))
> Fit = survfit(Surv(Time, Censored == 0) ~ group, data = myDataLong)
> plot(Fit, col = 1:2)
> survdiff(Surv(Time, Censored == 0) ~ group, data = myDataLong)

Yes - for the example it's not complex - but once we get down to having more 
data columns I think it may be...  Maybe I ignore those and just build 
'myDataLong' for this specific test.

> However, your approach (a 'wide' data frame) suggests that there are equal 
> numbers in the two survival
> studies.  Are they even the same people?  Is it even the same study?  If so, 
> this is a competing risks question
> and would have to be approached differently.

Yes, it's the same patients. The two events are technically independent of each 
other, but the hope is that the easier outcome measure would predict the 
other...  I'm not familiar with competing risks, so I will have to read up on 
it, but it isn't a scenario where A or B happens; A happens and B happens, and 
you might expect A happened because B happened...

> And, of course, absence of evidence is not evidence of absence.  Failing to
> reject the null hypothesis of no difference is not proof that the
> distributions are equal.

Yes absolutely - however I'm half expecting to detect a difference and so then 
dismiss using A as a surrogate of B...

Thanks



-----Original Message-
From: Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) 
[mailto:calum.polw...@nhs.net]
Sent: Monday, April 29, 2013 4:48 AM
To: r-help@r-project.org
Subject: [R] Comparing two different 'survival' events for the same subject 
using survdiff?

I have a dataset which for the sake of simplicity has two endpoints.  We would 
like to test if two different end-points have the same eventual meaning.  To 
try and take an example that people might understand better:

Let's assume we had a group of subjects who all received a treatment.  They 
could stop treatment for any reason (side effects, treatment stops working, 
etc.).  Getting that data is very easy.  Measuring if treatment stops working 
is very hard to capture... so we would like to test if duration on treatment 
(easy) is the same as time to treatment failure (hard).

My data might look like this:

A = c(9.77, 0.43, 0.03, 3.50, 7.07, 6.57, 8.57, 2.30, 6.17, 3.27, 2.57, 0.77)
B = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)  # 1 = yes (censored)
C = c(9.80, 0.43, 5.93, 8.43, 6.80, 2.60, 8.93, 8.37, 12.23, 5.83, 13.17, 0.77)
D = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)  # 1 = yes (censored)
myData = data.frame(TimeOnTx = A, StillOnTx = B, TimeToFailure = C, NotFailed = D)

We can do a survival analysis on those individually:
OnTxFit = survfit (Surv ( TimeOnTx, StillOnTx==0 ) ~ 1 , data = myData)

FailedFit = survfit (Surv ( TimeToFailure , NotFailed==0 ) ~ 1 , data = myData)

plot(OnTxFit)
lines(FailedFit, col = 2)

But how can I do a survdiff type of comparison between the two?  Do I have to 
restructure the data so that the times are all in one column, the event in 
another, and then a group to indicate which type of event it is?  That seems a 
complex way to do it (especially as the dataset is of course more complex than 
I've just shown)... so I thought maybe I'm missing something...








[R] Comparing two different 'survival' events for the same subject using survdiff?

2013-04-29 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
I have a dataset which for the sake of simplicity has two endpoints.  We would 
like to test if two different end-points have the same eventual meaning.  To 
try and take an example that people might understand better:

Let's assume we had a group of subjects who all received a treatment.  They 
could stop treatment for any reason (side effects, treatment stops working, 
etc.).  Getting that data is very easy.  Measuring if treatment stops working 
is very hard to capture... so we would like to test if duration on treatment 
(easy) is the same as time to treatment failure (hard).

My data might look like this:

A = c(9.77,  0.43,  0.03,  3.50,  7.07,  6.57,  8.57,  2.30,  6.17,  3.27,  
2.57,  0.77)
B = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)  # 1 = yes (censored)
C = c( 9.80,  0.43,  5.93,  8.43,  6.80,  2.60,  8.93,  8.37, 12.23,  5.83, 
13.17,  0.77)
D = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) # 1 = yes (censored)
myData = data.frame (TimeOnTx = A, StillOnTx = B, TimeToFailure = C, NotFailed 
= D)

We can do a survival analysis on those individually:
OnTxFit = survfit (Surv ( TimeOnTx, StillOnTx==0 ) ~ 1 , data = myData)

FailedFit = survfit (Surv ( TimeToFailure , NotFailed==0 ) ~ 1 , data = myData)

plot(OnTxFit)
lines(FailedFit, col = 2)

But how can I do a survdiff type of comparison between the two?  Do I have to 
restructure the data so that the times are all in one column, the event in 
another, and then a group to indicate which type of event it is?  That seems a 
complex way to do it (especially as the dataset is of course more complex than 
I've just shown)... so I thought maybe I'm missing something...








[R] matchit - can I weight the parameters?

2012-06-19 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
This may be a really obvious question but I just can't figure out how to do it.

I have a small dataset that I am trying to compare to some controls.  It is 
essential that the controls are matched on Cancer Stage (a factor between 1 
and 4), and then ideally on Age (integer), Gender (factor), and Performance 
Status (factor).

I'm using matchit to try and do this, but it seems to give equal priority to 
all my variables, so I can be relatively well matched on Age, Sex and PS but 
not exactly matched on Stage.  Stage is the biggest influence on outcome... so 
I must match it as close to perfect as possible, even if that means dropping 
some data from the 'treatment' group.

Here's some code:

match = matchit(Group ~ Stage + Age + Gender + PS, myData, method="optimal")
matchedData = match.data(match)
by (matchedData$Stage, matchedData$Group, table)

matchedData$Group: 0

 1 3A 3B  4
 1  6  9 10
------------------------------------
matchedData$Group: 1

 1 3A 3B  4
 1  3  9 13

Can anyone point me to a method that tells R to prioritise Stage over the 
others?
Thanks in advance
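One hedged possibility, for comparison: MatchIt's matchit() documents an `exact` argument that forces exact matching on named variables while the remaining covariates are matched as closely as possible. A sketch only (check `?matchit` in the installed version for the argument's exact form, and note this pairs `exact` with nearest-neighbour rather than optimal matching):

```r
library(MatchIt)  # assumes the MatchIt package is installed

# myData is the data frame from the post; 'exact' forces Stage to agree
# within every matched pair, at the cost of dropping unmatched cases
match <- matchit(Group ~ Age + Gender + PS,
                 data   = myData,
                 method = "nearest",
                 exact  = "Stage")
matchedData <- match.data(match)
```
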





Re: [R] Kaplan Meier - not for dates

2011-11-07 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)

>  2. The answer will be wrong.  The reason is that the censoring occurs on a 
> time scale, not a $ scale: you don't stop observing someone because
> total cost hits a threshold, but because calendar time does.  The KM routines 
> assume that the censoring process and the event process are on the
> same scale.
> The result can be an overestimation of cost.  See Dan-Yu Lin, Biometrics 
> 1997, "Estimating medical costs from incomplete follow-up data".

Having now skimmed the paper, it covers long-term follow-up.  In my particular 
case the patients receive treatment for relatively short periods (median time 
to stopping treatment will be ~9 months) and will discontinue treatment 
relatively quickly (I'd be surprised if anyone is still on treatment 3-4 years 
out).  I only want the costs of that treatment, not the costs of their overall 
care to death.  I'm not sure how that affects things, but I'm hoping it makes 
life simpler.

Calum







Re: [R] Kaplan Meier - not for dates

2011-11-07 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
>  2. The answer will be wrong.  The reason is that the censoring occurs on a 
> time scale, not a $ scale: you don't stop observing someone because
> total cost hits a threshold, but because calendar time does.  The KM routines 
> assume that the censoring process and the event process are on the same 
> scale.
> The result can be an overestimation of cost.  See Dan-Yu Lin, Biometrics 
> 1997, "Estimating medical costs from incomplete follow-up data".
>
> Terry Therneau

Thanks that's extremely useful.

I'll dig out that reference.

You are correct my censoring is happening on an event - (dis)continuation of 
treatment - not on reaching a cumulative cost.

Calum









Re: [R] Kaplan Meier - not for dates

2011-11-03 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
Thanks for the reply.

The treatment is effectively for a chronic condition - so you stay on the 
treatment till it stops working.  We know from trials how long that should be, 
and we know the theoretical cost of that treatment, but that's based on the 
textbook dose (patients dose-reduce and delay treatment, and it's based on 
weight, so it's variable).  We've been asked to provide our national planning 
team with an "average" cost based on our early experiences, so we have 
suggested to them we might be able to get a median cost.  Some patients will 
stay on treatment for several years, so it will be impossible to get an 
average for years.

So the censored patients will be those still on treatment (the event being 
stopping treatment)

I'll give what you've suggested a go.

Thanks


Calum Polwart BSc(Hons) MSc MRPharmS SPres IPres
Network Pharmacist - NECN and Pharmacy Clinical Team Manager (Cancer & Aseptic 
Services) - CDDFT
Our website has now been unlocked and updated.  Should you require contacts, 
meeting details, publications etc, please visit us on www.cancernorth.nhs.uk

From: Lancaster, Robert (Orbitz) [robert.lancas...@orbitz.com]
Sent: 03 November 2011 19:55
To: Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST); 
r-help@r-project.org
Subject: RE: Kaplan Meier - not for dates

I think it really depends on what your event of interest is.  If your event is 
that the patient got better and "left treatment" then I think this could work.  
You would have to mark as censored any patient still in treatment or any 
patient that stopped treatment w/o getting better (e.g. in the case of death).  
You would then be predicting the cost required to make the patient well enough 
to leave treatment.  It is a little non-standard to use $ instead of time, but 
time is money after all.

You could set up your data frame with two columns: 1) cost 2) event/censored.

Then create your survival object:
mySurv = Surv(my_data$cost,my_data$event)

And then use survfit to create your KM curves:
myFit = survfit(mySurv ~ 1)


If you have other explanatory variables that you think may influence the cost, 
you can of course add them to your data frame and change the formula you use in 
survfit.  For instance, you could have some severity measure, e.g. High, 
Medium, Low.  You could then do:
myFit = survfit(mySurv~my_data$severity)




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
Sent: Monday, October 31, 2011 1:29 PM
To: r-help@r-project.org
Subject: [R] Kaplan Meier - not for dates

I have some data which is censored and I want to determine the median.  It's 
actually cost data for a cohort of patients, many of whom are still on 
treatment and so are censored.

I can do the same sort of analysis for a survival curve and get the median 
survival... but can I just use the survival curve functions to plot an X axis 
that is $ rather than date?  If not, is there some other way to achieve this?

Thanks

Calum








[R] Kaplan Meier - not for dates

2011-10-31 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
I have some data which is censored and I want to determine the median.  It's 
actually cost data for a cohort of patients, many of whom are still on 
treatment and so are censored.

I can do the same sort of analysis for a survival curve and get the median 
survival... but can I just use the survival curve functions to plot an X axis 
that is $ rather than date?  If not, is there some other way to achieve this?

Thanks

Calum





[R] Non-unique Values

2010-05-25 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
I might be missing something really obvious, but is there an easy way to locate 
all non-unique values in a data frame?

Example

# a list, since $-assignment doesn't work on an atomic vector
mydata <- list()
mydata$id <- 0:8
mydata$unique <- c(1:5, 1:4)
mydata$result <- c(1:3, 1:3, 1:3)

> mydata
$id
 [1] 0 1 2 3 4 5 6 7 8
$unique
[1] 1 2 3 4 5 1 2 3 4
$result
[1] 1 2 3 1 2 3 1 2 3

What I want to to be able to get some form of data output that might look like 
this:

> nonunique(mydata$unique)
mydata$unique
1  $id 0, 5
2  $id 1, 6
3  $id 2, 7
4  $id 3, 8

So that I could report to my data entry team any non-unique values of unique 
and tell them the row numbers so they can check if the 'unique' value is keyed 
wrongly, or the entry had been made twice.

Hoping there is an easy way.  If not, I suspect we can do it in the SQL 
tables - I'm just trying not to juggle two languages...

C
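There is an easy way in base R: `duplicated()` flags repeats, and checking membership against the duplicated values catches every copy of a repeated value, not just the later ones. A sketch using the example data (here as a data frame rather than a list):

```r
mydata <- data.frame(
  id     = 0:8,
  unique = c(1:5, 1:4),
  result = rep(1:3, 3)
)

# values of 'unique' that occur more than once
dups <- unique(mydata$unique[duplicated(mydata$unique)])

# ids grouped by each non-unique value, for reporting back to data entry
keep <- mydata$unique %in% dups
nonunique <- split(mydata$id[keep], mydata$unique[keep])
nonunique
# $`1`: 0 5   $`2`: 1 6   $`3`: 2 7   $`4`: 3 8
```
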





Re: [R] Exporting PDF

2010-04-20 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
> I run the script and it exports a PDF called "version 1".
> I want it to check if "version 1" already exists. If so,
>  then I want the new graphs to be exported as
> "version 2", and so on.
>
> Is it possible to do it in R?

Someone may know a better way.  However it's certainly possible to execute a 
shell command in Linux from R.  So you can ls the file (on Windows try dir) 
and see if it exists, and build a loop round that.
Just beware: it'll be quick if there is only 1 file, but if you have many 
files it may slow things down a bit!
Here's my example...

#Example script to check if a file exists (Linux system)
filename <- "somefile"
fileno <- 0
extension <- ".pdf"
path <- "/home/user/"

repeat {
  fullfile <- paste(path, filename, fileno, extension, sep = "")
  if (length(system(paste("ls", fullfile), intern = TRUE)) == 0) break
  fileno <- fileno + 1
}

#your command to save the file... using 'fullfile' as the filename and path
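For what it's worth, base R's file.exists() does the same check without shelling out, and works identically on Windows and Linux. A portable sketch of the same idea:

```r
# return the first "filenameN.ext" under 'path' that does not exist yet
next_free_name <- function(path, filename, extension = ".pdf") {
  fileno <- 0
  repeat {
    fullfile <- file.path(path, paste0(filename, fileno, extension))
    if (!file.exists(fullfile)) return(fullfile)
    fileno <- fileno + 1
  }
}

# e.g. pdf(next_free_name("/home/user", "version"))
# gives ".../version0.pdf" first, ".../version1.pdf" on the next run, etc.
```
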






[R] odfWeave - merged table cells, and adding information like totals and p-values

2010-02-21 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
I'm hoping I'm missing some (probably fundamentally basic) process which might 
make my life easier!

Let's assume I have a 3-column table summarizing results from a trial with 
three arms (Arms A, B and C).
For each arm there will be a number of pieces of information to report.  The 
simplest example might be the demographic comparisons often seen in clinical 
trials, where you are setting out to show that your randomization produced 
similar populations.

So I might have a table like this:

---
 A B   C
---

Male50  50  50
Female  49  51  50

Age < 6530  29  31
Age 65+ 69  72  69
---

I've got a matrix with that data in it which I'm passing to odfWeave's table 
function.   I just want to check a few basic things.

Here's some short code which will create the matrix:
groups = c("A","B","C")
factors = c("Male","Female", "Age < 65", "Age 65+")
mydata = matrix (c(50,49,30,69,50,51,29,72,50,50,31,69), nrow=4, dimnames = 
list(factors,groups))

- Is there any way to add a merged cell above A, B and C which would say 
"Group"?
- If I want a total column I can do that using:
total = as.numeric()
for (fact in 1:length(factors)) {
total[fact] = sum(mydata[fact, ])
}
mydata = cbind(mydata, total)
Is there an easier way?
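On the totals question: rowSums() computes all the per-row totals in one call, so the loop is unnecessary. A self-contained sketch using the matrix from the post:

```r
groups  <- c("A", "B", "C")
factors <- c("Male", "Female", "Age < 65", "Age 65+")
mydata  <- matrix(c(50, 49, 30, 69, 50, 51, 29, 72, 50, 50, 31, 69),
                  nrow = 4, dimnames = list(factors, groups))

# append a 'total' column without an explicit loop
mydata <- cbind(mydata, total = rowSums(mydata))
mydata["Male", "total"]
# 150
```
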

- Now let's say I want to do a chi-squared test on the age differences between 
Gp A and Gp B.  I run
chisq.test(mydata[3:4, 1:2])
What I really want is the p-value, and I'll want to repeat that for Gp A vs Gp 
C.  If I was just using R I'd simply print those and then add them to my table 
by hand.  But I'm trying to be smart and use odfWeave.  Now I know I can put 
them in my caption, but I'd probably have added them as an extra row in my 
table or added them in brackets, similar to the SDs/ORs and CIs shown in this 
example http://www.bmj.com/cgi/content-nw/full/340/feb05_1/c199/TBL2 depending 
which was more appropriate.
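On getting the p-value programmatically: chisq.test() returns an htest object whose `p.value` component can be used directly in a table cell or caption rather than copied by hand. A sketch (rebuilding the same matrix so it stands alone):

```r
groups  <- c("A", "B", "C")
factors <- c("Male", "Female", "Age < 65", "Age 65+")
mydata  <- matrix(c(50, 49, 30, 69, 50, 51, 29, 72, 50, 50, 31, 69),
                  nrow = 4, dimnames = list(factors, groups))

# age rows (3:4), groups A and B (columns 1:2); $p.value extracts the number
p_AB <- chisq.test(mydata[3:4, 1:2])$p.value
round(p_AB, 3)
```
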

- Is there an easy way to do anything like this?  I'm thinking that we often 
put crude numbers in and (%) in brackets, or CIs etc  - so my exported table 
would not ideally be pure numbers.

- As a p value usually links two columns I might have expected to use a merged 
cell which again brings me back to my original question ;-)

Thanks





Calum Polwart BSc(Hons) MSc MRPharmS SP IP
Network Pharmacist - North of England Cancer Network and
Pharmacy Clinical Team Manager (Cancer & Aseptic Services) - County Durham & 
Darlington NHS Foundation Trust





Re: [R] Contingency Table - output to odfWeave not working for matrix

2010-02-20 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
Solved my own problem by using:

odfTable.matrix(
  as.matrix (
 with (mydata, table (site_id, reaction))
 )
)





[R] Contingency Table - output to odfWeave not working for matrix

2010-02-20 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
Hi guys I'm hoping someone can help me with this.  It should be easy but it 
seems to get stuck for no obvious reason!  I am trying to set a report up in 
odfWeave so that we can re-run the same analysis at 3 time points as the data 
matures and provide an 'instant' report.

To simplify the situation we have two pieces of data: site_id (numerical value 
1-9) and reaction (categorical Y or N).

# a list, so that the $ assignments below work
mydata <- list()
mydata$site_id = rep(1:9, 15)
mydata$reaction = rep(c("Y","N","N","Y","N"), 27)

until now I've simply used the command:

with (mydata, table (site_id, reaction))

to give me a 'table' that looks like this:

   reaction
site_id N Y
  1 9 6
  2 9 6
  3 9 6
...
  7 9 6
  8 9 6
  9 9 6

I can pass that as plain text to odf, but I want to make it a properly 
formatted table.

Executing odfTable(with (mydata, table (site_id, reaction))) results in: Error 
in UseMethod("odfTable") : no applicable method for "odfTable", which I assume 
is because the output of table is not a data frame, vector or matrix, which is 
what odfTable needs.  So I got more creative - but nothing is working the way 
I want it :-(

odfTable(
  as.vector (
 with (mydata, table (site_id, reaction))
 )
)
Produces a table with 9 9's and 9 6's in a vertical column...

odfTable(
  as.data.frame (
 with (mydata, table (site_id, reaction))
 )
)

Doesn't error but produces a table similar to this:
   site_id reaction Freq
1        1        N    9
2        2        N    9
3        3        N    9
4        4        N    9
...
15       6        Y    6
16       7        Y    6
17       8        Y    6
18       9        Y    6

Both of the above are how they appear at the command prompt, so it's not 
odfWeave doing anything wrong...

odfTable(
  as.matrix (
 with (mydata, table (site_id, reaction))
 )
)

At the command prompt this looks the same as the original output, so it looks 
like it might work!  BUT odfWeave complains Error in UseMethod("odfTable") : 
no applicable method for "odfTable" -- but it's a matrix, and odfWeave is 
supposed to be able to handle matrices?  Anyone got any suggestions?





[R] ?OT: Probabilistic Simulation

2009-12-18 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)

Sorry, this may well be defined as off topic.  I apologize in advance.

I am interested in performing what I think would be a probabilistic 
sensitivity simulation.  I've done some crude ones before in Excel, but I'm 
wondering if R can help me do it more effectively?

I have a set of theoretical variables; for simplicity let's use what I think 
is an easier example: I have a peg and a hole, and I want to predict how often 
they would not fit together securely.

The peg should be 1.8mm in diameter.
But there are some variables which might affect if it is actually 1.8mm but I'd 
expect it to be normally distributed around 1.8mm with a known standard 
deviation.  There are also variables such as temperature which would influence 
if the peg was the correct size (there would be a known relationship to 
temperature, and a mean temperature with a standard deviation would be known)

The hole should be 1.8mm diameter as well.  Again there are variables that 
affect if it is, drill size, drilling method, substrate being drilled, 
temperature at time of drilling, temperature at time of fitting peg (would be 
same as above).  I'd be looking to model if the peg would fit in the hole, AND 
if it fitted how well it fitted.

I'd then want to run a simulation of say 500 scenarios, randomly picking 
temperature, hole characteristics etc. for each simulation.  From there I'd 
get the number of times in 500 samples the peg wouldn't fit.  I could then try 
adjusting a variable - say using a different drilling method - and see if that 
'easier' but less reliable drilling method actually would affect the number of 
times the peg didn't fit the hole properly.

So, from what I understand, this is a Monte Carlo simulation of probability.  
And this is where I go off topic!  Am I right?  I know it's off topic - so 
what I'd like to know is: can someone point me to where I can find out?

Then, if it is Monte Carlo, can someone point me to a good description of how 
to get R to model this?

Ta

Calum
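Yes — this is a Monte Carlo simulation, and base R's random-number generators make it short to set up. A minimal sketch of the peg-and-hole example (every mean, SD and temperature coefficient below is an invented illustration, not a real tolerance):

```r
set.seed(42)                 # make the random draws reproducible
n <- 500                     # number of simulated peg/hole pairs

# illustrative distributions: diameters in mm, temperature in C
temp <- rnorm(n, mean = 20, sd = 5)
peg  <- rnorm(n, mean = 1.8, sd = 0.02) + 0.0001 * (temp - 20)
hole <- rnorm(n, mean = 1.8, sd = 0.03)

fits <- hole > peg           # the peg must be narrower than the hole
mean(!fits)                  # estimated proportion of failures
```

Adjusting one input distribution (say, a larger SD for a cruder drilling method) and re-running then shows how that choice shifts the failure rate.
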





Re: [R] Winbugs under R's error message

2009-10-03 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
> LinZhongjun wrote:
> >
> > I ran WinBUGS under R. I could get the results, but I kept getting the 
> > error messages:
> >
> > Error in file(con, "wb") : cannot open the connection
> > In addition: Warning messages:
> > 1: In file.create(to[okay]) :
> >   cannot create file 'c:/Program 
> > Files/WinBUGS14//System/Rsrc/Registry_Rsave.odc', reason 'Permission denied'

Uwe wrote:
>
> You probably do not have write permissions for those WinBUGS files that 
> R2WinBUGS tries to rewrite.
>
and wrote:
>
> Why do you post twice?
>

Uwe, that was a pretty obvious answer and filled my inbox with two rather 
useless emails at the same time - so I hardly think you have the right to then 
criticise someone for dual posting.  I'd guess he didn't do it intentionally!

However, your answer failed to point out the more obvious possible cause - the 
path has a double forward slash in it (WinBUGS14//System), and under Windows 
that's an invalid file name as far as I know.  I don't use WinBUGS (or even 
Windows), but rather than looking at file permissions you should be trying to 
find how that path gets built.  My gut feeling is there will be a config file 
somewhere with a path that is C:/program files/WinBugs14 and another path that 
takes you from there to the files, listed as /System/Rsrc/ - join the two 
together and you get a double slash.  Probably take the leading slash off the 
second one - but I'm not sure where you'll find the file.

Hope that helps - sorry I can't tell you where to look.





[R] Is there an equivalent of "echo"

2009-09-14 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
Sorry, I'm having one of those moments where I can't find the answer but I bet 
it's obvious...

I'm outputting my results to a file using sink().

Is there a command similar to php's echo command that would allow me to add 
some text to that file, i.e.:

dataFr <- data.frame(a = 1:10, b = 2 * 1:10)
sink ("filepath/filename.txt", split=T)
#Show number of entries in vector a
table (dataFr$a)
#show number of entries in vector b
table (dataFr$b)
#show relationship between a and b
table (dataFr$a , dataFr$b)
sink()

Gives me a text file like this:
 1  2  3  4  5  6  7  8  9 10
 1  1  1  1  1  1  1  1  1  1

 2  4  6  8 10 12 14 16 18 20
 1  1  1  1  1  1  1  1  1  1

 2 4 6 8 10 12 14 16 18 20
  1  1 0 0 0  0  0  0  0  0  0
  2  0 1 0 0  0  0  0  0  0  0
  3  0 0 1 0  0  0  0  0  0  0
  4  0 0 0 1  0  0  0  0  0  0
  5  0 0 0 0  1  0  0  0  0  0
  6  0 0 0 0  0  1  0  0  0  0
  7  0 0 0 0  0  0  1  0  0  0
  8  0 0 0 0  0  0  0  1  0  0
  9  0 0 0 0  0  0  0  0  1  0
  10 0 0 0 0  0  0  0  0  0  1

What I'd like is to be able to add some headers in the text file maybe like 
this:

sink ("filepath/filename.txt", split=T)
echo "Number of entries in vector a"
table (dataFr$a)
echo "number of entries in vector b"
table (dataFr$b)
echo "relationship between a and b"
table (dataFr$a , dataFr$b)
sink()

Giving an output like:
Number of entries in vector a
 1  2  3  4  5  6  7  8  9 10
 1  1  1  1  1  1  1  1  1  1
number of entries in vector b
 2  4  6  8 10 12 14 16 18 20
 1  1  1  1  1  1  1  1  1  1
relationship between a and b
 2 4 6 8 10 12 14 16 18 20
  1  1 0 0 0  0  0  0  0  0  0
  2  0 1 0 0  0  0  0  0  0  0
  3  0 0 1 0  0  0  0  0  0  0
  4  0 0 0 1  0  0  0  0  0  0
  5  0 0 0 0  1  0  0  0  0  0
  6  0 0 0 0  0  1  0  0  0  0
  7  0 0 0 0  0  0  1  0  0  0
  8  0 0 0 0  0  0  0  1  0  0
  9  0 0 0 0  0  0  0  0  1  0
  10 0 0 0 0  0  0  0  0  0  1

Possible?  Without 200 lines of code?
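It is possible in a couple of lines: `cat()` is the closest R equivalent of php's echo — while a sink() is active, anything it writes goes to the file. A sketch (the tempfile() here just stands in for "filepath/filename.txt"):

```r
dataFr <- data.frame(a = 1:10, b = 2 * 1:10)

out <- tempfile(fileext = ".txt")   # stands in for "filepath/filename.txt"
sink(out, split = TRUE)
cat("Number of entries in vector a\n")
print(table(dataFr$a))               # print() ensures output inside scripts
cat("number of entries in vector b\n")
print(table(dataFr$b))
sink()

readLines(out)[1]
# "Number of entries in vector a"
```
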





Re: [R] How to set default plotting colors by treatment?

2009-09-14 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
>> col=c("blue","red")[mydfr$treatment]
>
> Yes, but I would like to use the function for lots of other dataframes
> as well, so embedding 'mydfr' in the function is not the ideal
> solution...

In that case I'd try something like:

 myplot <- function(..., tmnt) {
 plot(...,
   pch=19,
   col=c("blue","red")[tmnt]
  )
 }

with(mydfr, myplot(Xmeas, Ymeas, tmnt=treatment))

That seems to work... - basically you just need to pass treatment in the 
function call...
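A self-contained check of that approach (toy data of my own; the factor's integer codes index the colour vector, so level 1 plots blue and level 2 red):

```r
mydfr <- data.frame(
  Xmeas     = 1:6,
  Ymeas     = (1:6)^2,
  treatment = factor(rep(c("ctrl", "drug"), 3))
)

myplot <- function(..., tmnt) {
  plot(..., pch = 19, col = c("blue", "red")[tmnt])
}

with(mydfr, myplot(Xmeas, Ymeas, tmnt = treatment))

# the colours actually used, in plotting order:
c("blue", "red")[mydfr$treatment]
# "blue" "red" "blue" "red" "blue" "red"
```
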








[R] Exporting Numerous Graphs

2009-09-14 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
> I have got 27 graphs to export (not a lot...I know!). How can I fit all of
> them into a single file like PNG without adjusting the size of the graphs?
> What's in my mind is like pasting graphs into Word, in which I can just
> scroll down to view the graphs.

Pretty sure PNG can only cope with single-'page' images - the page can be as 
big as it wants, but then when it comes to things like printing it's gonna 
cause problems, as I doubt many graphics packages can split it over pages - so 
they'll either crop it or scale it.  27 on 1 page is gonna be very small?

TIFF can handle multiple pages and of course so can PDF.  I don't know of a 
multi-page TIFF export function, so I'd suggest exporting to PDF - and 
exporting to 27 different file names (1.pdf to 27.pdf).  Then use a tool like 
pdfshuffler (a Linux GUI) or command-line Ghostscript (Windows or Linux):

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf 1.pdf 2.pdf 
3.pdf ...etc... 27.pdf
(This is a shell command, not an R command.  The Windows version will be very 
similar, I suspect.)

That'd give you a 27-page PDF file with each graph on a new page.  Much easier 
to scroll through than using a scroll bar in a graphics package - you can go 
back to page 5 and on to page 9 to compare quickly, rather than having to 
scroll manually to find the right info.  Plus PDF is vector based, which means 
future importing into decent desktop publishing packages should avoid any 
issues with loss / scaling.

I believe it's also possible with psmerge using PostScript, and so possibly 
EPS files.






Re: [R] How to set default plotting colors by treatment?

2009-09-14 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
>
> # I tried defining a function like this
> myplot <- function(...)plot(..., pch=19, col=c("blue","red")[treatment])
>
> # So i can call it like this:
> with(mydfr, myplot(Xmeas, Ymeas))
>
> # but:
> Error in plot.xy(xy, type, ...) : object 'treatment' not found
>
basically that is something like calling:

myplot( mydfr$Xmeas, mydfr$Ymeas )

So plot doesn't know that treatment is within mydfr...

changing your function to:

myplot <- function(...) {
plot(...,
   pch=19,
   col=c("blue","red")[mydfr$treatment]
  )
}

should work?





[R] Obtaining value of median survival for survfit function to use in calculation

2009-09-08 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
Hi,

I'm sure this should be simple but I can't figure it out!  I want to get the 
median survival calculated by the survfit function and use the value rather 
than just be able to print it.  Something like this:

library(survival)
data(lung)
lung.byPS = survfit(Surv (time, status) ~ ph.ecog, data=lung)
# lung.byPS
  Call: survfit(formula = Surv(time, status) ~ ph.ecog, data = lung)
  1 observation deleted due to missingness
              n events median 0.95LCL 0.95UCL
  ph.ecog=0  63     37    394     348     574
  ph.ecog=1 113     82    306     268     429
  ph.ecog=2  50     44    199     156     288
  ph.ecog=3   1      1    118     Inf     Inf

What I want is to be able to call the median using something like:

lung.byPS$median[ph.ecog=0]

so that I can add it to a plot like this:
plot (lung.byPS,
conf.int=F,
lty=1:4,
)
abline(h=0.5, lty=5)
abline(v=lung.byPS$median[ph.ecog=1])
abline(v=lung.byPS$median[ph.ecog=2])

Anyone got any easy solutions?  It's fairly normal to draw lines across and 
down to show medians on survival curves... so I'm sure it must be possible to 
automate.

Ta

Calum
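One way (a sketch — the "median" column name comes from the table component documented in `?summary.survfit`; worth checking against the installed survival version):

```r
library(survival)
lung.byPS <- survfit(Surv(time, status) ~ ph.ecog, data = lung)

# per-group medians as a named numeric vector, e.g. med["ph.ecog=0"] is 394
med <- summary(lung.byPS)$table[, "median"]

plot(lung.byPS, conf.int = FALSE, lty = 1:4)
abline(h = 0.5, lty = 5)
abline(v = med["ph.ecog=1"])
abline(v = med["ph.ecog=2"])
```
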







Re: [R] Partykit Document

2009-08-25 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
> Example,
> data("GlaucomaM", package = "ipred") is accepted. Now instead of GlaucomaM,
> I need to give my own data. But the data format for GlaucomaM is not given.
> So how can I know that?

I'm not familiar with the packages at all, but if you simply enter:

> data ("GlaucomaM", package="ipred")
> GlaucomaM

all the data will be shown, which helps you understand what it is - a bit!  
There is a lot of data and it wraps, so it's hard to read.  So instead try:

> summary (GlaucomaM)

That'll tell you a bit about the 'field' names and the sort of content each 
has.

What all the fields are... not so sure!  I'm afraid the output in the example 
means nothing to me - but possibly, if you understand recursive partitioning, 
the output will make sense once you know what the input fields are.  
Alternatively, if you still struggle with the input fields: the data came from 
the ipred package.  I generally find that if you want to know what a package 
is all about, going to 
http://cran.r-project.org/web/packages/<package>/<package>.pdf seems to give 
you the manual!  http://cran.r-project.org/web/packages/ipred/ipred.pdf does 
indeed, and there is a bookmarked section all about the GlaucomaM dataset.  It 
gives you every field name and what it means... still none the wiser - I 
assume if you laser-scan people's eyes it means something!  That same page 
would also be available with:

> library (ipred)
> help (GlaucomaM)

Hopefully that helps?

Calum





[R] forest (meta) editing text

2009-08-22 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
OK - I've been doing some work on getting a forest plot or two together for a 
sub-group analysis.  That's the crucial thing - because it's a sub-group 
analysis rather than a meta-analysis, and all the forest (meta) and forestplot 
(rmeta) instructions assume you are doing a meta-analysis.

I found rmeta didn't play nicely for me so I'm using the meta package with 
forest.  Here's some code that would allow you to reproduce it:

library(meta)
mydata.forest.DoseRed = data.frame(c(1,7,14,6,16), c(7,21,10,15,23), 
c(1,1,5,2,5), c(4,7,6,5,12))
colnames(mydata.forest.DoseRed)=c("n.DR","n.No.DR", "E.DR","E.No.DR")
rownames(mydata.forest.DoseRed)=c( "Age: 50-59",   "Age: 60-69 ", "Age: >70", 
"Sex: F ", "Sex: M")

mydata.meta.DoseRed = metabin(
E.DR, n.DR, E.No.DR, n.No.DR,
data=mydata.forest.DoseRed,
sm="RR", meth="MH",
studlab=rownames(mydata.forest.DoseRed)
)

forest(
mydata.meta.DoseRed,
comb.fixed=TRUE,
comb.random=FALSE,
leftcols="studlab",
plotwidth=unit(8, "cm"),
at = c(0.5,1.0,2.0,3.5),
xlim = c(0.22, 3.48)
)

That's roughly what I want.  There is a jpeg of it here:
http://www.wittongilbert.free-online.co.uk/forest.jpg

BUT - there are two problems:

Firstly, the subgroups have a heading called "Study" - which makes sense for a 
meta-analysis, but should say something like "Sub-group" for a sub-group 
analysis.  Anyone know if this is customisable?

Secondly I want to display a global effect (i.e. the RR and confidence 
intervals for the whole study population) - thats what the diamond should show 
but it actually is the MH Fixed Effect Model.  Again - probably appropriate for 
a meta-analysis but I know the actual value for the whole population so i don't 
have to model it...  Anyone know how to get that top plot instead?  forest 
doesn't open a standard plot window so you can't simply use plot commands to 
send extra stuff to the window.

Or do I need to use plot (mydata.meta.DoseRed) and then build up the extra 
info from there?  Still surprised this isn't a proper function!
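On the first question: in at least recent versions of the meta package, the heading is customisable via forest()'s leftlabs argument, which relabels the columns listed in leftcols.  A hedged sketch, reusing the mydata.meta.DoseRed object built in the code above (I have not checked how far back leftlabs goes, so treat it as an assumption about your meta version):

```r
library(meta)   # assumes the meta package used above

# Same plot as before, but the left-hand column is headed "Subgroup"
# instead of the default "Study"
forest(
  mydata.meta.DoseRed,        # metabin() result built earlier in this post
  comb.fixed  = TRUE,
  comb.random = FALSE,
  leftcols = "studlab",
  leftlabs = "Subgroup",      # assumption: leftlabs exists in your meta version
  plotwidth = unit(8, "cm"),
  at   = c(0.5, 1.0, 2.0, 3.5),
  xlim = c(0.22, 3.48)
)
```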






Re: [R] PowerCut Killed R - is my code retrievable?

2009-08-20 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
Rolf wrote:
> If you've been cut-and-pasting from a text editor, then your commands
> *might* be in the file .Rhistory.  Unfortunately this history gets saved
> only when you exit R (and by default only when you also say ``yes'' to
> saving the workspace) or if you explicitly savehistory().

I was hoping there was something like that... but like you say it wasn't a 
planned exit, so it's not there.

--

Barry wrote:

> So you were just using the text editor as a scratch pad, and not
> saving it?

Exactly

> A half-decent text editor should be saving things to disk
> as you go. For example, in emacs, if your emacs dies while editing a
> file then when you restart it, it will notice and offer to restore it
> from its "restore" file.

Yeah, that's what I was expecting to happen.  I can't actually remember, but I 
think it was gedit that I had open, although sometimes I use kate.  Neither has 
autorecovery, as I now know.  Both can (possibly) recover files that had once 
been saved, but it seems I never saved it - it was literally just a big 
temporary, editable clipboard!

> If you were using emacs you might have files
> like #foo.R# which are emacs' auto-restore files. Other editors might
> do other things - possibly leaving files in /tmp on a Linux system
> (but they might get zapped by a reboot).

It seems temporary copies only appear once you have hit save at least once...!

I've never liked emacs for other programming stuff - but will take another look 
at it.

--
Ted wrote:
> To follow up on what Barry wrote above: Don't be put off by the
> recommendation to use EMACS (which is not to everyone's taste).

Took the words out of my mouth

>I use vim, which does much the same sort of thing: there is a hidden
> ".whatever.swp" file which contains a back-trackable history of
> the editing changes that have been made to the file. So, if you get
> a crash, on re-opening the file in vim you are asked if you want to
> "recover". The ".swp" file is *not* zapped by a reboot!

Good to know.  I used to use vi quite a lot in my command line linux days.
These days it seems cumbersome on my linux PC but works fine on my server!
Must be a configuration issue...

> Again, what I routinely do (in Linux) when developing R code is to
> have two terminal windows open. In one I am running R. In the other,
> beside it, I am editing a file of R code. To run code in R that has
> been entered in the "code" window, I just highlight it with the
> mouse in the "code" window, and then paste it into the "R" window.

Exactly what I was doing.  Sounds like emacs with ESS has this even
slicker - highlight and run.  I just need to save the stuff!

--
Ellison wrote:
> Use a text editor or something specific to R like Tinn-R and _save early
> and often_.

Tinn-R is window$ only.  I've played with RKWard and found it cumbersome; plus 
it doesn't put my commands and its commands together, so you don't get a 
command-line history.

Looking at sciViews-K just now.

I currently use Eclipse as my IDE for PHP programming, so I'll look at that 
too.  I guess where I'm struggling is that I don't really rate what I'm doing 
as programming, because I'm not writing R plugins, just some functions... Mind 
you, just having some R context-sensitive text / syntax highlighting may be an 
advantage.

--
Hans wrote:
> What about a version control system to (locally) save the different
> stages of your script files?
> (I use git but Subversion (SVN) may be easier and do the job).

My changes were typically one line at a time - then re-run.  I could remember 
what the line was if it didn't work (or use undo).  But if this gets big then 
SVN etc. might be worth it.  I have no hands-on experience with git - one of my 
PHP projects has been discussing moving to git / Bazaar for about 6 months.  I 
see the advantage for them, as they have multiple coders - is there some 
advantage I might have missed for a lone programmer?  (Sorry, going well OT.)

Clearly neither works if I don't save the file!

Thanks BTW for all the posts back - one of my best responded to queries!

Calum



[R] PowerCut Killed R - is my code retrievable?

2009-08-19 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
I've been tweaking code for several days on and off in R, cutting and pasting 
in from a text editor (I just leave them open all the time).  I think I got 
something that was usable, but then a power surge tripped the fuses, and 
unfortunately the machine I was working on doesn't have a UPS.

Does R hold the command data in a cache someplace?  I've purposefully not 
opened R since the crash so that, if it does, I don't overwrite it.

I guess what I'm wanting is the equivalent of the Linux up-arrow, which repeats 
the last line.  I know that exists in R, but from what I recall it didn't work 
when you close the R session.  Is it stored in a hidden file someplace that I 
could copy?

It's not a huge piece of code or anything - but it was code I tweaked over 
several stages, as I'm new to R, and so it was part of my learning process.

Also - is there a better way for the future?  I know some people use IDEs, but 
is that for serious programming or for building a small function and tweaking 
it?

Thanks

Calum





[R] Odd results with Chi-square test. (Not an R problem, but general statistics, I think)

2009-08-18 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
I'm far from an expert on stats, but what I think you are saying is that if 
you compare Baseline with Version 3, you don't think your p-value is as good 
as for versions 1 and 2.  I'm not 100% sure you are meant to compare p-values 
like that, but I'll let someone else comment on that!

            total   incorrect   correct   % correct
baseline      898         708       190       21.2%
version_1     898         688       210       23.4%
version_2     898         680       218       24.3%
version_3    1021         790       231       22.6%

>
> Here, the p value for version_3 (when compared with the baseline) seems to
> make no sense whatsoever. It shouldn't be larger that the other two p
> values, the increase in correct answers (that is what counts!) is bigger
> after all.
>
No, it's not the raw numbers - it's the proportion of correct answers that 
counts.

I've added a % correct column to your data - does that make it clearer?  Only 
22.6% of version 3's answers were correct - so the difference in terms of % 
change is smaller than versions 1 and 2 produced.  From my naive perspective 
I'd want to test for a difference between all results and baseline, and then 
v1 & v2, v1 & v3, v2 & v3 (someone may tell you those are unsound things to 
test - in which case don't test them).  You'd then need to determine a 
threshold for accepting that the test is valid (say p < 0.05).  I'd contend 
that the test should be two-tailed - results could be better or worse.
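Those pairwise comparisons can be sketched with base R's prop.test(), a two-sided two-sample test of proportions.  The counts come from the table above; the variable names are mine:

```r
# Correct answers and totals, taken from the table above
correct <- c(baseline = 190, version_1 = 210, version_2 = 218, version_3 = 231)
total   <- c(baseline = 898, version_1 = 898, version_2 = 898, version_3 = 1021)

# Two-sided test of each version's proportion of correct answers vs baseline
pvals <- sapply(c("version_1", "version_2", "version_3"), function(v)
  prop.test(c(correct[["baseline"]], correct[[v]]),
            c(total[["baseline"]],   total[[v]]))$p.value)

round(pvals, 3)
```

Note that version_3 comes out with the largest p-value despite the largest raw increase in correct answers - which is exactly the behaviour being asked about: a smaller difference in proportions, even on a bigger sample.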

You should also develop a hypothesis.  Let me create one for you:


A.
H1: version1 of the software is better than baseline
(H0: version 1 is no better than baseline)

B.
H1: version2 of the software is better than version 1
(H0: version 2 is no better than version 1)

C.
H1: version3 of the software is better than version 2
(H0: version 3 is no better than version 2)

Now look at your results and p-values and work out whether H1 or H0 applies. 
You could develop further variants (D: version 3 is better than baseline).

Finally - remember to consider the 'clinical significance' as well as the 
statistical significance.  I'd have hoped a software change might have 
increased correct answers to, say, 40%?  And remember also that a p-value of 
0.05 has a false-positive rate of 1 in 20.

>
> Any idea what's going on here? I thought the sample size should have no
> impact on the results?
>
Erm... sample size always has an influence on the results.  If you show a 
difference in 100 samples, you would expect a larger p-value, for virtually 
any statistical test you chose, than if you show that same difference in 1000 
results.  You have a bigger sample but a smaller overall difference, so in 
effect you can be less sure that the change is not down to chance.  (Purist 
statisticians will likely challenge that definition.)






[R] Using 'field names' of a data.frame in a function

2009-08-06 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
I may be doing this wrong, but I have a function, which I have simplified a 
lot below.  I want to pass some 'field names' of a data frame to the function 
for it to then do some manipulation of them.

Here's my code:
#build a simple dataset
mydataset = data.frame(
  ages = c('40-49','40-49','40-49','30-39','50-59','50-59','60-69','50-59'),
  sex  = c("M","F","F","M","F","F","M","M"))

#build a simple function
myfunc <- function (categories) {
  table (categories)
}

#call the function
myfunc (c(mydataset$ages, mydataset$sex))

===

My output I am getting is:
categories
1 2 3 4
5 7 3 1

But what I'm expecting is:
table (mydataset$ages, mydataset$sex)

F M
  30-39 0 1
  40-49 2 1
  50-59 2 1
  60-69 0 1


Calling the function as myfunc("mydataset$ages, mydataset$sex") doesn't work 
either, and nor does myfunc(c("mydataset$ages", "mydataset$sex")).

Now, in the simple version above I could make the function take (category1, 
category2) and then call table(category1, category2) - but what if I might 
have a category3, category4, etc.?  How can I pass the dataset name to the 
function without it actually trying to pass the data itself?  I've also tried 
paste - but perhaps I'm misusing it?
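One base-R way around this - a sketch, and certainly not the only idiom - is to pass the data frame itself plus the column names as plain strings.  Subsetting with df[cols] keeps each column separate, and do.call() splices them into table() as individual arguments:

```r
mydataset <- data.frame(
  ages = c("40-49", "40-49", "40-49", "30-39", "50-59", "50-59", "60-69", "50-59"),
  sex  = c("M", "F", "F", "M", "F", "F", "M", "M"))

# cols arrives as a character vector, so nothing is evaluated too early;
# df[cols] is a list of the named columns, which do.call() feeds to table()
myfunc <- function(df, cols) do.call(table, df[cols])

myfunc(mydataset, c("ages", "sex"))   # the ages-by-sex cross-tabulation
```

This generalises to three or more dimensions by just adding names to cols.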

Many thanks for help in advance





Re: [R] Summarising Data for Forrest Plots

2009-07-30 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)

>Ah, I think I see what you want. Try this on each pair of exclusive sets:


>Then under65row and over65row should be the first two rows of your result.
>Can't test this at the moment, but I don't think it's too far wrong.
>
I knew this shouldn't need so much work ;-)

Not cracked it yet - because as I see it I need a 2 x 4 table and at the moment 
I only cracked a 2 x 2 table.  ( Or really I need something like a 10 x 4 - but 
the 4 is the bit that I haven't cracked)

First option is something like this:
with(mydataset,  table(Sex, Dose))

I can get:
     Dose
Sex   FD  RD
  F    6  15
  M   16  23

For non-categorical data it's slightly trickier... but quite achievable in two 
lines (for the 2 x 2 table):

factor(cut(mydataset$Age, breaks = c(0,65,100))) -> AgeBands
table (AgeBands, mydataset$Dose)

Which gives:

AgeBands    FD  RD
  (0,65]    15   6
  (65,100]  13  26

Although I'm not yet sure if I can actually call that data back by column 
names, i.e.

x <- table (AgeBands, mydataset$Dose)
x$FD

produces an error. :-(
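That error is because the result of table() is a matrix-like object, not a data frame, so $ doesn't work on it; matrix-style indexing by name does.  A self-contained sketch with toy values (not the real dataset):

```r
# Toy stand-ins for the real Age and Dose columns
AgeBands <- cut(c(40, 70, 66, 30), breaks = c(0, 65, 100))
Dose     <- c("FD", "RD", "FD", "RD")

x <- table(AgeBands, Dose)

x[, "FD"]           # whole column, selected by name
x["(0,65]", "RD"]   # a single cell
```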

But getting there.



Re: [R] Summarising Data for Forrest Plots

2009-07-30 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)

>Ah, I think I see what you want. Try this on each pair of exclusive sets:
>
n_total<-dim(mydataset)[1]
under65<-mydataset$age <= 65
n_under65<-sum(under65)
under65row<-c(sum(mydataset$dose[under65] == "FD"),
 sum(mydataset$dose[under65] == "RD"),
 sum(mydataset$vitalstatus[under65] == "dead" &
  mydataset$dose[under65] == "FD"),
 sum(mydataset$vitalstatus[under65] == "dead" &
  mydataset$dose[under65] == "RD"))
over65row<-c(sum(mydataset$dose[!under65] == "FD"),
 sum(mydataset$dose[!under65] == "RD"),
 sum(mydataset$vitalstatus[!under65] == "dead" &
  mydataset$dose[!under65] == "FD"),
 sum(mydataset$vitalstatus[!under65] == "dead" &
  mydataset$dose[!under65] == "RD"))
>
>Then under65row and over65row should be the first two rows of your result.
>Can't test this at the moment, but I don't think it's too far wrong.

Thanks Jim.

Yes, it looks like that code should do the job.  I was really hoping for a 
function like SummariseForSubsetAnalysis(mydataset, by=mydataset$dose, 
subsets=c(age, renal, sex, toxicity), event=survival), which would magically 
do it for me ;-)

I guess if this is something I start having to do lots I might have to write 
one.

I'm surprised one doesn't seem to exist - perhaps the number of variations in 
what people want would make it too complex.
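Pending such a function, here is a rough base-R sketch of the idea - the data and the column names are invented for illustration, not taken from the real dataset:

```r
# Invented toy data standing in for the patient-level dataset
mydataset <- data.frame(
  dose = c("FD", "RD", "FD", "RD", "FD", "RD"),
  dead = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  age  = c(50, 70, 66, 40, 80, 61))

# One summary row (numbers and events per arm) for a logical subgroup selector
subgroup_row <- function(df, sel) {
  d <- df[sel, ]
  c(n.FD  = sum(d$dose == "FD"),          n.RD  = sum(d$dose == "RD"),
    ev.FD = sum(d$dead & d$dose == "FD"), ev.RD = sum(d$dead & d$dose == "RD"))
}

rbind("Age <= 65" = subgroup_row(mydataset, mydataset$age <= 65),
      "Age > 65"  = subgroup_row(mydataset, mydataset$age > 65))
```

Each extra subgroup is one more rbind() row, which gets you to the n x 4 table without doing each cell by hand.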

Calum



Re: [R] Superscripts and rounding

2009-07-30 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
Duh! Did it again! The variables need stn in front of them, don't they?!

sub = sprintf('Seasonal station with natural streamflow - Lat: %s Lon: %s
Gross Area %s km\UB2 - Effective Area %s km\UB2 ',
round( stn[['metadata']][['latitude']], digits = 4 ),
round( stn[['metadata']][['longitude']], digits = 4 ),
round( stn[['metadata']][['grossarea']], digits = 4 ),
round( stn[['metadata']][['effectivearea']], digits = 4 ))


Let me know if that works.
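For anyone landing on this thread later, the rounding-inside-sprintf pattern works with nothing but base R; a self-contained check with made-up coordinates (no HYDAT data required):

```r
# Made-up metadata standing in for stn[['metadata']]
meta <- list(latitude = 1.0005689, longitude = 55.698754)

sub_lab <- sprintf("Seasonal station with natural streamflow - Lat: %s Lon: %s",
                   round(meta$latitude,  digits = 4),
                   round(meta$longitude, digits = 4))
sub_lab   # "Seasonal station with natural streamflow - Lat: 1.0006 Lon: 55.6988"
```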





Re: [R] Superscripts and rounding

2009-07-29 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
>
> library(RODBC)
> library(HYDAT)
> You will need to install HYDAT (the zip file) from
> http://www.geog.ubc.ca/~rdmoore/Rcode.htm
>
> Below is my current code - which works. The [[]] is the way i am accessing
> the columns from the data frame.
>
> thanks again for all your help
>
> # load HYDAT data
> par(mfrow=c(3,1))
> path <- 'c:\\HYDAT'
> wsc.id <- '11AB075'
> stn <- ReadHydatFile(path, wsc.id)  #if data in CDROM image
> # stn <- ReadHydatDatabase (path, wsc.id) #if data in database
>

I'd need access to the whole data file.  I tried exporting some data from the 
website for it, but it got too complex for me!

However, it seems to me you have two chunks of data:

stn$flow - which has daily flow data in it?
stn$metadata - which I guess is a header for the whole dataset, describing 
what it is.

So I recreated what I hope might be similar to part of your data (using lst 
instead of stn as the vector/array/list name):

# Build a data frame containing a lat & long
info <- data.frame(latitude=1.0005689, longitude=55.698754)
#display result
info
#  latitude longitude
#1 1.000569  55.69875
#create a list (called lst) with an element 'metadata' containing the data frame
lst <- list (metadata=info)
#display result
lst
#$metadata
#  latitude longitude
#1 1.000569  55.69875

#check if can call a single piece of data using the square brackets as 
references...
lst[['metadata']][['longitude']]
#[1] 55.69875

#now try rounding that
> round (lst[['metadata']][['longitude']], digits=2)
[1] 55.7

#now try sprintf'ing that
sprintf('Seasonal station with natural streamflow - Lat: %s', round 
(lst[['metadata']][['longitude']], digits=2))
# [1] "Seasonal station with natural streamflow - Lat: 55.7"

#now try that in a plot
plot(1,1, sub=sprintf('Seasonal station with natural streamflow - Lat: %s', 
round (lst[['metadata']][['longitude']], digits=2)))
# results in a correct label ;-)

It's possible to refer to that same value in the following ways:

> lst$metadata   # same as lst[['metadata']]
  latitude longitude
1 1.000569  55.69875
> lst$metadata[['longitude']] # same as lst[['metadata']][['longitude']]
[1] 55.69875
> lst$metadata$longitude # same as lst[['metadata']][['longitude']]
[1] 55.69875
>

So I'm stumped!  Without being able to see the actual structure of your data I 
can't figure out why you are getting an error!
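When you can't see someone else's structure, str() is the quickest way to have them show it to you: it prints the nesting, so the valid [[ ]] paths can be read straight off.  A sketch using the toy list from above:

```r
# Toy list mirroring the earlier example
lst <- list(metadata = data.frame(latitude = 1.0005689, longitude = 55.698754))

str(lst)   # shows $ metadata is a data.frame with $ latitude and $ longitude
```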

BTW - was there a cut-and-paste error?  Your error message was reported as:

Error: unexpected symbol in:
"  sub = sprintf('Seasonal station with natural streamflow - Lat: %s Lon: %s
Gross Area %s km\UB2 - Effective Area %s km\UB2,
+ round( [['metadata"
> + round( [['metadata']][['longitude']], digits = 4),
Error: unexpected '[[' in "+ round( [["

The + before round looks as if it was typed into the console; it shouldn't 
have been.  Also, there seems to be a missing quote after the second \UB2. 
You should have entered:

sub = sprintf('Seasonal station with natural streamflow - Lat: %s Lon: %s
Gross Area %s km\UB2 - Effective Area %s km\UB2 ',
round( [['metadata']][['latitude']], digits=4 ),
round( [['metadata']][['longitude']], digits = 4),
round( [['metadata']][['grossarea']], digits = 4),
round( [['metadata']][['effectivearea']], digits = 4))

C


Re: [R] Summarising Data for Forrest Plots

2009-07-29 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
>> What I want to do is do a forrest (forest) plot for subgroups within my 
>> single dataset as a test of heterogeniety. I have a dataset who received 
>> either full dose(FD) or reduced dose(RD) treatment, and a number of 
>> characteristics about those subjects: age, sex, renal function, weight, 
>> toxicity.  And I have survival data (censored).  they are in standard 
>> columnar data.
>>
> Is there an *easy* way to transform them into something like this:
>>
>> SubGroupn.FDn.RDsurv.FD surv.RD
>> 1   Age >65
>> 2   Age <= 65
>> 3   Male
>> ...
>> 9   Grade 0-2 Tox
>> 10  Grade 3/4 Tox

>>
>Hi Calum,
>Have you tried subsetting the dataset like this:
>
>meta.DSL(...,data=mydataset[mydataset$age <= 65,],...)
>
>Jim

Hi Jim,

I'm not sure that I understand!  But my understanding was that meta.DSL wants 
4 bits of information: number treated (full dose in my case), number in 
control (reduced dose in my case), and the number of events in the two 
groups... which is what I was trying to describe above - although possibly not 
very well.

Then it will do the work for me.

My challenge is taking a load of data in columns and getting it summarised by 
the subgroups, so that it takes Age > 65 and counts how many had full dose and 
how many had reduced dose and populates the fields, then does the same for 
Age <= 65, etc.  (I may be back with questions about the survival value - but 
even knowing how to get it to summarise like I describe would be a start.  I 
guess it's a bit like a pivot table in Excel?)


But perhaps it's something to do with the mydataset[mydataset$age <= 65,] bit? 
That seems to give me a data table with only the 65-and-unders, which makes 
sense.  But then how do I get it to populate a table with the numbers in the 
two groups?





Re: [R] Summarising Data for Forrest Plots

2009-07-29 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
> Are n.FD and n.RD the number of people who received the full/reduced dose
Yes - but I don't have the data structured like that YET - that's what I want 
to get to, because that's what the forest plot seems to want.

> and surv.FD and surv.RD the number of people that survived?
Mmm... I was thinking more of something like median survival?  Although the 
brain hasn't kicked into gear yet tonight, and it might actually be meant to 
be a hazard ratio?

>And are the people who received the full dose different from the people who 
>received the reduced dose?
Yes

> And what exactly is it that you want to plot in the forest plot?
Subgroups - see below

>From the way you have arranged the table, it seems as if you want some kind of 
>effect size measure that contrasts the survival rate of the full versus 
>reduced dose in the various subgroups. Is that correct?
Yip that sounds right

>And are you just trying to figure out how to draw the forest plot once you 
>have the data in the table form as shown in your post or are you also trying 
>to figure out how to create that table to begin with?
I *think* I can draw the plot once I have the data structured right.  But at 
the moment my data is structured like this:

PatientID  FullDose  Survival  Censored  Age  Sex  Normal Renal Func  Grade of Toxicity
      001         Y       125         N   75    F                  Y                  1
      002         N        55         Y   55    M                  N                  4
      003         N        65         Y   78    F                  Y                  2

I want to eventually get to a forest plot that looks a bit like this:

Age:
< 65|-#---||
>= 65 |-#|   |
   |
Sex:|
M|-#--|---|
F  |---#-|   |
   |
Renal Fucn:   |
Normal   |---#-|
Abnormal   |---#-|
   |
Grade of Toxicity:  |
0-1 |  
|---#---|
2  |-#--||
3-4  |--#|  |
  |
Overall:  <>   |


Which I believe I can achieve using the metaplot or forest-plot functions, 
replacing the studies with the relevant subgroups.  But my challenge has been 
converting the patient data above down to those subgroup summaries - other 
than by running a survival analysis individually on each subgroup, recording 
the results, and building up a table.

Calum





Re: [R] Superscripts and rounding

2009-07-28 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
So the issue is something to do with the [['xxx']] construction of your data.

Can you explain what that's all about - as it errors all over the shop when I 
try using that...

You've set me on a mission to find the answer!  So I'd really like to recreate 
a little bit of your data here and play...

C


Re: [R] Superscripts and rounding

2009-07-28 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
Slightly confused, because if I try:

> newdata.yaxis = c(2.473, 3.123456, 3.23456, 2.67890, 1.56789)
> newdata.yaxis_4 = round (newdata.yaxis, digits = 4)
> newdata.yaxis
[1] 2.473000 3.123456 3.234560 2.678900 1.567890
> newdata.yaxis_4
[1] 2.4730 3.1235 3.2346 2.6789 1.5679

As you see - I get a new variable with the result rounded to 4 places.

Then if I try:

> inserted = sprintf("Some text with a 4 digit number ( %s ) inserted in it", 
> newdata.yaxis_4[2])
> inserted
[1] "Some text with a 4 digit number ( 3.1235 ) inserted in it"

It works for me - clearly the way my data is structured is a little different 
from yours.  I would have thought you might need to do something like:
> [['rounddata']] = round ([['metadata']], digits = 4)

Then call your data with [['rounddata']][['latitude']] etc. - I'm no expert on 
all this matrix stuff, though!  (Not even sure what a double [[ means!)

But this (calling the round within the sprintf function) also works for me:

> inserted = sprintf("Some text with a 4 digit number ( %s ) inserted in 
> it",round( newdata.yaxis[2], digits = 4))
> inserted
[1] "Some text with a 4 digit number ( 3.1235 ) inserted in it"

So why can't you just use:

> sub = sprintf('Seasonal station with natural streamflow - Lat: %s Lon: %s 
> Gross Area %s km\UB2 - Effective Area %s km\UB2,
+ round( [['metadata']][['latitude']], digits = 4),
+ round( [['metadata']][['longitude']], digits = 4),
+ round( [['metadata']][['grossarea']], digits = 4),
+ round( [['metadata']][['effectivearea']] digits = 4),
+ )



[R] Summarising Data for Forrest Plots

2009-07-28 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
I tried to post this a few times last week and it seems to have got stuck 
somehow, so I'm trying from a different email in the hope that works.  If 
somehow this has appeared on the list 20 times and I never saw any of them, I 
apologize ;-)

I'm basically an R newbie, but I am VERY computer literate.  This has me 
stumped, though...

All the examples for using the rmeta package to create a forest plot or 
simillar seem to use the catheter data:

     Name  n.trt  n.ctrl  col.trt  col.ctrl  inf.trt  inf.ctrl
1  Ciresi    124     127       15        21       13        14
2  George     44      35       10        25        1         3
3  Hannan     68      60       22        22        5         7
4   Heard    151     157       60        82        5         6
...

As I see it, that's a summary of data from several published trials.

What I want to do is a forest plot for subgroups within my single dataset, as 
a test of heterogeneity.  I have a dataset of subjects who received either 
full-dose (FD) or reduced-dose (RD) treatment, and a number of characteristics 
about those subjects: age, sex, renal function, weight, toxicity.  And I have 
(censored) survival data.  They are in standard columnar data.

Is there an *easy* way to transform them into something like this:

SubGroupn.FDn.RDsurv.FD surv.RD
1   Age >65
2   Age <= 65
3   Male
...
9   Grade 0-2 Tox
10  Grade 3/4 Tox

Which rmeta will then let me use to create a forest plot?  This is a 
reasonably standard approach in biomedical studies these days, so it seems odd 
that I can't find any "How-To" that tells me how to short-cut it.  Otherwise I 
have to manually calculate each of the parameters :-(  Which is a real pain, 
as we are awaiting more mature data that would need the same process re-run.

Thanks in advance

C



Re: [R] Superscripts and rounding

2009-07-28 Thread Polwart Calum (County Durham and Darlington NHS Foundation Trust)
I'm anything but an expert in R; however, if I'm labelling a graph axis with a 
superscript I have tended to use:

> plot (x , y , xlab = expression ("label"^2))

But when you try to have more than one superscript it fails.  Assuming you are 
in a UTF-8 locale (e.g. Western Europe) you could try:

> plot (x , y , xlab = expression ("Some label text \UB2 some more label text 
> \UB2"))

That works for me.  (B2 is the hex code for the UTF-8 character ², and \U is 
an escape sequence that inserts that character.)  It only works in a UTF-8 
locale, from what I understand.
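An alternative that stays inside plotmath, and so does not depend on the locale, is to juxtapose label pieces with *; each piece can carry its own superscript.  A sketch (spacing is a matter of taste):

```r
# Two superscripts in one axis label via plotmath juxtaposition
lab <- expression("Gross area km"^2 * "  Effective area km"^2)

pdf(tempfile(fileext = ".pdf"))   # headless-safe graphics device
plot(1, 1, xlab = lab)
dev.off()
```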

--

As for rounding - can you not populate a new field using the round command? 
I.e. something like:

[metadata][long_4dig] = round([metadata][longitude], 4)  # I haven't used 
round before; my syntax may be wrong!

Then use %s to drop that value in?

C


