Re: [R] dplyr's arrange function - 3 solutions received - 1 New Question

2016-06-16 Thread Muhuri, Pradip (AHRQ/CFACT)
Hello David,

Your revisions to the earlier code have given me the desired results.

library("gtools")
mydata[mixedorder(mydata$prevalence_c, decreasing = TRUE),
       c("indicator", "prevalence_c")]

Thanks,

Pradip


Pradip K. Muhuri,  AHRQ/CFACT
 5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564


 


-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Thursday, June 16, 2016 12:54 PM
To: Muhuri, Pradip (AHRQ/CFACT)
Cc: r-help@r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New 
Question


> On Jun 16, 2016, at 6:12 AM, Muhuri, Pradip (AHRQ/CFACT) 
> <pradip.muh...@ahrq.hhs.gov> wrote:
> 
> Hello,
>
> I received 3 solutions to my earlier question. Thanks to the contributors.
> May I draw your attention to a new question below (with respect to David's
> solution)?
>
> 1) Thanks to Daniel Nordlund for the tip of replacing the leading space with
> a 0 in the data.
>
> 2) Thanks to David Winsemius for his solution with the gtools::mixedorder
> function. I have added an argument to it.
>
> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE),  ]
>
> 3) Thanks to Jim Lemon for his solution. I have prepended a minus sign
> to reverse the order.
>
> numprev<-as.numeric(sapply(strsplit(trimws(mydata$prevalence_c)," "),"[",1))
> mydata[order(-numprev), ]
>
>
> (New) Question for solution 2:
>
> I want to keep only 2 variables (say, indicator and prevalence_c) in the
> output. Where should the additional code go? Why does the following code
> fail?
> 
>> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), 
>> c(mydata$indicator, mydata$prevalence_c) ]
> 


Try instead just a vector of names for the second argument to "["

 mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), 
 c("indicator", "prevalence_c") ]

> Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = 
> TRUE),  : 
>  undefined columns selected
> 
> 
>> str(mydata)
> Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of  10 variables:
> $ indicator   : chr  "1. Health check-up" "2. Blood cholesterol checked " "3. 
> Recieved flu vaccine" "4. Blood pressure checked" ...
> $ subgroup: chr  "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, 
> ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ 
> "Both sexes, ages =35 yrs""| __truncated__ ...
> $ n   : num  2117 2127 2124 2135 1027 ...
> $ prevalence_c: chr  "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 (0.70)" 
> ...
> $ prevalence_p: chr  "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" 
> ...
> $ sensitivity : chr  "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 (0.27)" 
> ...
> $ specificity : chr  "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 (3.90)" 
> ...
> $ ppv : chr  "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 (0.63)" 
> ...
> $ npv : chr  "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 (3.27)" 
> ...
> $ kappa   : chr  "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" "0.626 
> (0.035)" ...

Re: [R] dplyr's arrange function - 3 solutions received - 1 New Question

2016-06-16 Thread David Winsemius

> On Jun 16, 2016, at 6:12 AM, Muhuri, Pradip (AHRQ/CFACT) 
> <pradip.muh...@ahrq.hhs.gov> wrote:
> 
> Hello,
>
> I received 3 solutions to my earlier question. Thanks to the contributors.
> May I draw your attention to a new question below (with respect to David's
> solution)?
>
> 1) Thanks to Daniel Nordlund for the tip of replacing the leading space with
> a 0 in the data.
>
> 2) Thanks to David Winsemius for his solution with the gtools::mixedorder
> function. I have added an argument to it.
>
> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE),  ]
>
> 3) Thanks to Jim Lemon for his solution. I have prepended a minus sign
> to reverse the order.
>
> numprev<-as.numeric(sapply(strsplit(trimws(mydata$prevalence_c)," "),"[",1))
> mydata[order(-numprev), ]
>
>
> (New) Question for solution 2:
>
> I want to keep only 2 variables (say, indicator and prevalence_c) in the
> output. Where should the additional code go? Why does the following code
> fail?
> 
>> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), 
>> c(mydata$indicator, mydata$prevalence_c) ]
> 


Try instead just a vector of names for the second argument to "["

 mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), 
 c("indicator", "prevalence_c") ]
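To see why the original attempt fails, here is a small sketch with a hypothetical three-row data frame (made-up values, with a plain data.frame standing in for the tbl_df in the thread). `c(mydata$indicator, mydata$prevalence_c)` concatenates the values stored in the two columns, so `[` is handed cell contents rather than column names; since none of them matches `names(mydata)`, `[.data.frame` stops with "undefined columns selected".

```r
# Hypothetical stand-in for the thread's `mydata` (plain data.frame, made-up rows)
mydata <- data.frame(
  indicator    = c("1. Health check-up", "7. Sigmoidoscopy", "3. Recieved flu vaccine"),
  prevalence_c = c("74.7 (1.20)", "6.1 (0.61)", "51.7 (1.35)"),
  n            = c(2117, 2124, 2124),
  stringsAsFactors = FALSE
)

# This builds a character vector of the cell values, not of column names
bad_cols <- c(mydata$indicator, mydata$prevalence_c)
print(bad_cols)

# Selecting columns with those values fails: none of them is a column name
err <- tryCatch(mydata[, bad_cols], error = function(e) conditionMessage(e))
print(err)

# Passing the column names themselves keeps just the two columns
kept <- mydata[, c("indicator", "prevalence_c")]
print(names(kept))
```

The row index (the `mixedorder()` call) and the column index are independent, so the name vector can simply replace the empty second argument of `[`.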

> Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = 
> TRUE),  : 
>  undefined columns selected
> 
> 
>> str(mydata)
> Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of  10 variables:
> $ indicator   : chr  "1. Health check-up" "2. Blood cholesterol checked " "3. 
> Recieved flu vaccine" "4. Blood pressure checked" ...
> $ subgroup: chr  "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, 
> ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ 
> "Both sexes, ages =35 yrs""| __truncated__ ...
> $ n   : num  2117 2127 2124 2135 1027 ...
> $ prevalence_c: chr  "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 (0.70)" 
> ...
> $ prevalence_p: chr  "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" 
> ...
> $ sensitivity : chr  "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 (0.27)" 
> ...
> $ specificity : chr  "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 (3.90)" 
> ...
> $ ppv : chr  "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 (0.63)" 
> ...
> $ npv : chr  "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 (3.27)" 
> ...
> $ kappa   : chr  "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" "0.626 
> (0.035)" ...

Re: [R] dplyr's arrange function - 3 solutions received - 1 New Question

2016-06-16 Thread Muhuri, Pradip (AHRQ/CFACT)
Hello,

I received 3 solutions to my earlier question. Thanks to the contributors. May I
draw your attention to a new question below (with respect to David's solution)?

1) Thanks to Daniel Nordlund for the tip of replacing the leading space with a 0
in the data.

2) Thanks to David Winsemius for his solution with the gtools::mixedorder
function. I have added an argument to it.

mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE),  ]

3) Thanks to Jim Lemon for his solution. I have prepended a minus sign to
reverse the order.

numprev<-as.numeric(sapply(strsplit(trimws(mydata$prevalence_c)," "),"[",1))
mydata[order(-numprev), ]


(New) Question for solution 2:

I want to keep only 2 variables (say, indicator and prevalence_c) in the
output. Where should the additional code go? Why does the following code fail?

> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c(mydata$indicator, 
> mydata$prevalence_c) ]

Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = 
TRUE),  : 
  undefined columns selected


> str(mydata)
Classes 'tbl_df', 'tbl' and 'data.frame':   10 obs. of  10 variables:
 $ indicator   : chr  "1. Health check-up" "2. Blood cholesterol checked " "3. 
Recieved flu vaccine" "4. Blood pressure checked" ...
 $ subgroup: chr  "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, 
ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both 
sexes, ages =35 yrs""| __truncated__ ...
 $ n   : num  2117 2127 2124 2135 1027 ...
 $ prevalence_c: chr  "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 (0.70)" 
...
 $ prevalence_p: chr  "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" 
...
 $ sensitivity : chr  "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 (0.27)" 
...
 $ specificity : chr  "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 (3.90)" 
...
 $ ppv : chr  "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 (0.63)" 
...
 $ npv : chr  "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 (3.27)" 
...
 $ kappa   : chr  "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" "0.626 
(0.035)" ...

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Daniel Nordlund
Sent: Wednesday, June 15, 2016 6:37 PM
To: r-help@r-project.org
Subject: Re: [R] dplyr's arrange function

On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:
> Hello,
>
> I am using dplyr's arrange() function to sort one of many data frames
> on a character variable (named "prevalence").
>
> Issue: I am not getting the desired output (row 7 is the problem; it
> should be the very last row in the sorted data frame) because the sorted
> field is character, not numeric.
>
> The reproducible example and the output are appended below.
>
> Is there any work-around to convert/treat this character variable (named
> "prevalence" in the data frame below) as numeric before using the arrange()
> function from the dplyr package?
>
> Any hints will be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
> # Reproducible Example
>
> library("readr")
> library("dplyr")
> testdata <- read_csv(
> "indicator,  prevalence
> 1. Health check-up, 77.2 (1.19)
> 2. Blood cholesterol checked,  84.5 (1.14)
> 3. Recieved flu vaccine, 50.0 (1.33)
> 4. Blood pressure checked, 88.7 (0.88)
> 5. Aspirin use-problems, 11.7 (1.02)
> 6.Colonoscopy, 60.2 (1.41)
> 7. Sigmoidoscopy,  6.1 (0.61)
> 8. Blood stool test, 14.6 (1.00)
> 9.Mammogram,  72.6 (1.82)
> 10. Pap Smear test, 73.3 (2.37)")
>
> # Sort on the character variable in descending order
> arrange(testdata, desc(prevalence))
>
> # Results from Console
>
>                       indicator  prevalence
>                           (chr)       (chr)
> 1     4. Blood pressure checked 88.7 (0.88)
> 2  2. Blood cholesterol checked 84.5 (1.14)
> 3            1. Health check-up 77.2 (1.19)
> 4            10. Pap Smear test 73.3 (2.37)
> 5                   9.Mammogram 72.6 (1.82)
> 6                 6.Colonoscopy 60.2 (1.41)
> 7              7. Sigmoidoscopy  6.1 (0.61)
> 8       3. Recieved flu vaccine 50.0 (1.33)
> 9           8. Blood stool test 14.6 (1.00)
> 10      5. Aspirin use-problems 11.7 (1.02)

Re: [R] dplyr's arrange function

2016-06-15 Thread David Winsemius

> On Jun 15, 2016, at 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) 
>  wrote:
> 
> Hello,
> 
> I am using dplyr's arrange() function to sort one of many data frames
> on a character variable (named "prevalence").
>
> Issue: I am not getting the desired output (row 7 is the problem; it
> should be the very last row in the sorted data frame) because the sorted
> field is character, not numeric.
>
> The reproducible example and the output are appended below.
>
> Is there any work-around to convert/treat this character variable (named
> "prevalence" in the data frame below) as numeric before using the arrange()
> function from the dplyr package?
>
> Any hints will be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
> # Reproducible Example
>
> library("readr")
> library("dplyr")
> testdata <- read_csv(
> "indicator,  prevalence
> 1. Health check-up, 77.2 (1.19)
> 2. Blood cholesterol checked,  84.5 (1.14)
> 3. Recieved flu vaccine, 50.0 (1.33)
> 4. Blood pressure checked, 88.7 (0.88)
> 5. Aspirin use-problems, 11.7 (1.02)
> 6.Colonoscopy, 60.2 (1.41)
> 7. Sigmoidoscopy,  6.1 (0.61)
> 8. Blood stool test, 14.6 (1.00)
> 9.Mammogram,  72.6 (1.82)
> 10. Pap Smear test, 73.3 (2.37)")
>
> # Sort on the character variable in descending order
> arrange(testdata, desc(prevalence))
>
> # Results from Console
>
>                       indicator  prevalence
>                           (chr)       (chr)
> 1     4. Blood pressure checked 88.7 (0.88)
> 2  2. Blood cholesterol checked 84.5 (1.14)
> 3            1. Health check-up 77.2 (1.19)
> 4            10. Pap Smear test 73.3 (2.37)
> 5                   9.Mammogram 72.6 (1.82)
> 6                 6.Colonoscopy 60.2 (1.41)
> 7              7. Sigmoidoscopy  6.1 (0.61)
> 8       3. Recieved flu vaccine 50.0 (1.33)
> 9           8. Blood stool test 14.6 (1.00)
> 10      5. Aspirin use-problems 11.7 (1.02)

Although the prevalence column is not really mixed numeric/alpha, it can
still be sorted quite easily with the very handy gtools::mixedorder
function:

> require(gtools)
Loading required package: gtools
> testdata[ mixedorder(testdata$prevalence), ]
                      indicator  prevalence
7              7. Sigmoidoscopy  6.1 (0.61)
5       5. Aspirin use-problems 11.7 (1.02)
8           8. Blood stool test 14.6 (1.00)
3       3. Recieved flu vaccine 50.0 (1.33)
6                 6.Colonoscopy 60.2 (1.41)
9                   9.Mammogram 72.6 (1.82)
10           10. Pap Smear test 73.3 (2.37)
1            1. Health check-up 77.2 (1.19)
2  2. Blood cholesterol checked 84.5 (1.14)
4     4. Blood pressure checked 88.7 (0.88)

The mixedorder function splits each string into numeric and non-numeric
chunks and compares the numeric chunks by their numeric value.
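For strings like these, where only the leading number matters, a rough base-R analogue is to extract that leading number with a regular expression and order on it. This is a sketch under that assumption: `leading_number()` is a hypothetical helper, not part of gtools or base R, and it does not reproduce mixedorder()'s general chunk-by-chunk comparison.

```r
# Hypothetical helper (not part of gtools or base R): return the leading
# decimal number of each string as numeric, or NA if there is none.
leading_number <- function(x) {
  tx <- trimws(x)
  m <- regexpr("^[0-9]+(\\.[0-9]+)?", tx)
  out <- rep(NA_real_, length(tx))
  out[m > 0] <- as.numeric(regmatches(tx, m))
  out
}

prevalence <- c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)", "88.7 (0.88)",
                "11.7 (1.02)", "60.2 (1.41)", "6.1 (0.61)", "14.6 (1.00)",
                "72.6 (1.82)", "73.3 (2.37)")

# Row indices in ascending order of the leading number
asc <- order(leading_number(prevalence))
print(asc)
```

Because it looks only at the leading number, the helper ignores the standard errors in parentheses, which suits this data but is narrower than what mixedorder() does.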


-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] dplyr's arrange function

2016-06-15 Thread Jim Lemon
Hi Pradip,
I'll assume that you are reading the data from a file:

pm.df<-read.csv("pmdat.txt",stringsAsFactors=FALSE)
# create a vector of numeric values of prevalence
numprev<-as.numeric(sapply(strsplit(trimws(pm.df$prevalence)," "),"[",1))
# order the data frame by that vector
pm.df[order(numprev),]
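A self-contained variant of Jim's approach, assuming the example data is read from an inline string rather than the file pmdat.txt, and sorting in decreasing order with only the two columns of interest kept. The `strip.white = TRUE` argument is an addition here so that the values are read without the spaces that follow the commas.

```r
# Example data from the original post, read from text instead of a file
pm.df <- read.csv(text = "indicator,prevalence
1. Health check-up, 77.2 (1.19)
2. Blood cholesterol checked,  84.5 (1.14)
3. Recieved flu vaccine, 50.0 (1.33)
4. Blood pressure checked, 88.7 (0.88)
5. Aspirin use-problems, 11.7 (1.02)
6.Colonoscopy, 60.2 (1.41)
7. Sigmoidoscopy,  6.1 (0.61)
8. Blood stool test, 14.6 (1.00)
9.Mammogram,  72.6 (1.82)
10. Pap Smear test, 73.3 (2.37)",
  stringsAsFactors = FALSE, strip.white = TRUE)

# Numeric part before the first space, e.g. "77.2 (1.19)" -> 77.2
numprev <- as.numeric(sapply(strsplit(trimws(pm.df$prevalence), " "), "[", 1))

# Decreasing numeric order, keeping only the indicator and prevalence columns
sorted <- pm.df[order(numprev, decreasing = TRUE), c("indicator", "prevalence")]
print(sorted)
```

`order(numprev, decreasing = TRUE)` gives the same ordering here as the `order(-numprev)` variant used later in the thread.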

Jim


On Thu, Jun 16, 2016 at 7:08 AM, Muhuri, Pradip (AHRQ/CFACT)
 wrote:
> Hello,
>
> I am using dplyr's arrange() function to sort one of many data frames
> on a character variable (named "prevalence").
>
> Issue: I am not getting the desired output (row 7 is the problem; it
> should be the very last row in the sorted data frame) because the sorted
> field is character, not numeric.
>
> The reproducible example and the output are appended below.
>
> Is there any work-around to convert/treat this character variable (named
> "prevalence" in the data frame below) as numeric before using the arrange()
> function from the dplyr package?
>
> Any hints will be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
> # Reproducible Example
>
> library("readr")
> library("dplyr")
> testdata <- read_csv(
> "indicator,  prevalence
> 1. Health check-up, 77.2 (1.19)
> 2. Blood cholesterol checked,  84.5 (1.14)
> 3. Recieved flu vaccine, 50.0 (1.33)
> 4. Blood pressure checked, 88.7 (0.88)
> 5. Aspirin use-problems, 11.7 (1.02)
> 6.Colonoscopy, 60.2 (1.41)
> 7. Sigmoidoscopy,  6.1 (0.61)
> 8. Blood stool test, 14.6 (1.00)
> 9.Mammogram,  72.6 (1.82)
> 10. Pap Smear test, 73.3 (2.37)")
>
> # Sort on the character variable in descending order
> arrange(testdata, desc(prevalence))
>
> # Results from Console
>
>                       indicator  prevalence
>                           (chr)       (chr)
> 1     4. Blood pressure checked 88.7 (0.88)
> 2  2. Blood cholesterol checked 84.5 (1.14)
> 3            1. Health check-up 77.2 (1.19)
> 4            10. Pap Smear test 73.3 (2.37)
> 5                   9.Mammogram 72.6 (1.82)
> 6                 6.Colonoscopy 60.2 (1.41)
> 7              7. Sigmoidoscopy  6.1 (0.61)
> 8       3. Recieved flu vaccine 50.0 (1.33)
> 9           8. Blood stool test 14.6 (1.00)
> 10      5. Aspirin use-problems 11.7 (1.02)


Re: [R] dplyr's arrange function

2016-06-15 Thread Daniel Nordlund

On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:

> Hello,
>
> I am using dplyr's arrange() function to sort one of many data frames
> on a character variable (named "prevalence").
>
> Issue: I am not getting the desired output (row 7 is the problem; it
> should be the very last row in the sorted data frame) because the sorted
> field is character, not numeric.
>
> The reproducible example and the output are appended below.
>
> Is there any work-around to convert/treat this character variable (named
> "prevalence" in the data frame below) as numeric before using the arrange()
> function from the dplyr package?
>
> Any hints will be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
> # Reproducible Example
>
> library("readr")
> library("dplyr")
> testdata <- read_csv(
> "indicator,  prevalence
> 1. Health check-up, 77.2 (1.19)
> 2. Blood cholesterol checked,  84.5 (1.14)
> 3. Recieved flu vaccine, 50.0 (1.33)
> 4. Blood pressure checked, 88.7 (0.88)
> 5. Aspirin use-problems, 11.7 (1.02)
> 6.Colonoscopy, 60.2 (1.41)
> 7. Sigmoidoscopy,  6.1 (0.61)
> 8. Blood stool test, 14.6 (1.00)
> 9.Mammogram,  72.6 (1.82)
> 10. Pap Smear test, 73.3 (2.37)")
>
> # Sort on the character variable in descending order
> arrange(testdata, desc(prevalence))
>
> # Results from Console
>
>                       indicator  prevalence
>                           (chr)       (chr)
> 1     4. Blood pressure checked 88.7 (0.88)
> 2  2. Blood cholesterol checked 84.5 (1.14)
> 3            1. Health check-up 77.2 (1.19)
> 4            10. Pap Smear test 73.3 (2.37)
> 5                   9.Mammogram 72.6 (1.82)
> 6                 6.Colonoscopy 60.2 (1.41)
> 7              7. Sigmoidoscopy  6.1 (0.61)
> 8       3. Recieved flu vaccine 50.0 (1.33)
> 9           8. Blood stool test 14.6 (1.00)
> 10      5. Aspirin use-problems 11.7 (1.02)


The problem is that you are sorting a character variable.


testdata$prevalence

 [1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
 [6] "60.2 (1.41)" "6.1 (0.61)"  "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"




Notice that the 7th element is "6.1 (0.61)".  The first CHARACTER is a 
"6", so it is going to sort BEFORE the "50.0 (1.33)" (in descending 
order).  If you want the character value of line 7 to sort last, it 
would need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space).
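A small illustration of this point, using three of the prevalence values from the post. As character strings they are compared character by character, so "6.1 (0.61)" lands ahead of "50.0 (1.33)" in decreasing order; zero-padding a single-digit integer part (done here with `sub()`, one possible way among several) restores the numeric order. The values are chosen so that the first character decides every comparison, which keeps the result independent of the locale's collation rules.

```r
x <- c("50.0 (1.33)", "6.1 (0.61)", "11.7 (1.02)")

# Character sort: the first characters "6" > "5" > "1", so 6.1 comes first
print(sort(x, decreasing = TRUE))

# Zero-pad a single-digit integer part: "6.1 (0.61)" -> "06.1 (0.61)"
padded <- sub("^([0-9])\\.", "0\\1.", x)
print(sort(padded, decreasing = TRUE))
```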


Hope this is helpful,

Dan

Daniel Nordlund
Port Townsend, WA USA



[R] dplyr's arrange function

2016-06-15 Thread Muhuri, Pradip (AHRQ/CFACT)
Hello,

I am using dplyr's arrange() function to sort one of many data frames
on a character variable (named "prevalence").

Issue: I am not getting the desired output (row 7 is the problem; it
should be the very last row in the sorted data frame) because the sorted field
is character, not numeric.

The reproducible example and the output are appended below.

Is there any work-around to convert/treat this character variable (named
"prevalence" in the data frame below) as numeric before using the arrange()
function from the dplyr package?

Any hints will be appreciated.

Thanks,

Pradip Muhuri

# Reproducible Example 

library("readr")
library("dplyr")
testdata <- read_csv(
"indicator,  prevalence
1. Health check-up, 77.2 (1.19)
2. Blood cholesterol checked,  84.5 (1.14)
3. Recieved flu vaccine, 50.0 (1.33)
4. Blood pressure checked, 88.7 (0.88)
5. Aspirin use-problems, 11.7 (1.02)
6.Colonoscopy, 60.2 (1.41)
7. Sigmoidoscopy,  6.1 (0.61)
8. Blood stool test, 14.6 (1.00)
9.Mammogram,  72.6 (1.82)
10. Pap Smear test, 73.3 (2.37)")

# Sort on the character variable in descending order
arrange(testdata, desc(prevalence))

# Results from Console

                      indicator  prevalence
                          (chr)       (chr)
1     4. Blood pressure checked 88.7 (0.88)
2  2. Blood cholesterol checked 84.5 (1.14)
3            1. Health check-up 77.2 (1.19)
4            10. Pap Smear test 73.3 (2.37)
5                   9.Mammogram 72.6 (1.82)
6                 6.Colonoscopy 60.2 (1.41)
7              7. Sigmoidoscopy  6.1 (0.61)
8       3. Recieved flu vaccine 50.0 (1.33)
9           8. Blood stool test 14.6 (1.00)
10      5. Aspirin use-problems 11.7 (1.02)

