Re: [R] dplyr's arrange function - 3 solutions received - 1 New Question

2016-06-16 Thread Muhuri, Pradip (AHRQ/CFACT)
Hello David,

Your revised code gives me the desired results.

library("gtools")
mydata[mixedorder(mydata$prevalence_c, decreasing = TRUE),
       c("indicator", "prevalence_c")]

Thanks,

Pradip


Pradip K. Muhuri,  AHRQ/CFACT
 5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Thursday, June 16, 2016 12:54 PM
To: Muhuri, Pradip (AHRQ/CFACT)
Cc: r-help@r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New Question

2016-06-16 Thread David Winsemius

> On Jun 16, 2016, at 6:12 AM, Muhuri, Pradip (AHRQ/CFACT) 
>  wrote:
>> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), 
>> c(mydata$indicator, mydata$prevalence_c) ]
> 


Try instead just a vector of column names as the second argument to "[":

 mydata[mixedorder(mydata$prevalence_c, decreasing = TRUE),
        c("indicator", "prevalence_c")]

> Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = 
> TRUE),  : 
>  undefined columns selected

2016-06-16 Thread Muhuri, Pradip (AHRQ/CFACT)
Hello,

I received 3 solutions to my earlier question. Thanks to the contributors. May I bring
your attention to a new question below (with respect to David's solution)?

1) Thanks to Daniel Nordlund for the tip about replacing the leading space with a 0
in the data.

2) Thanks to David Winsemius for his solution with the gtools::mixedorder
function. I have added an argument (decreasing = TRUE) to his code.

library("gtools")
mydata[mixedorder(mydata$prevalence_c, decreasing = TRUE), ]

3) Thanks to Jim Lemon for his solution. I have prepended a minus sign to
reverse the order.

# Numeric estimate before the first space, e.g. 74.7 from "74.7 (1.20)"
numprev <- as.numeric(sapply(strsplit(trimws(mydata$prevalence_c), " "), "[", 1))
mydata[order(-numprev), ]
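A minimal sketch of what the third solution does, using the ten prevalence strings from the original reproducible example in place of mydata$prevalence_c (the names prev and idx are illustrative only). trimws() drops surrounding spaces, strsplit(..., " ") splits each string at spaces, sapply(..., "[", 1) keeps the first piece (the point estimate), and as.numeric() turns it into a number, so ordering is numeric rather than alphabetical.

```r
# Prevalence strings copied from the original reproducible example
prev <- c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)", "88.7 (0.88)",
          "11.7 (1.02)", "60.2 (1.41)", "6.1 (0.61)", "14.6 (1.00)",
          "72.6 (1.82)", "73.3 (2.37)")

# Split each string at spaces and keep the first piece, the point estimate
numprev <- as.numeric(sapply(strsplit(trimws(prev), " "), "[", 1))

# Negating the numeric key gives a descending order
idx <- order(-numprev)
print(prev[idx])
```

Here order(numprev, decreasing = TRUE) would give the same ordering, since numprev is numeric and has no ties; "6.1 (0.61)" now comes last.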


(New) Question for solution 2:

I want to keep only 2 variables (say, indicator and prevalence_c) in the
output. Where should the additional code go? Why does the following code fail?

> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c(mydata$indicator, 
> mydata$prevalence_c) ]

Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = 
TRUE),  : 
  undefined columns selected


> str(mydata)
Classes 'tbl_df', 'tbl' and 'data.frame':   10 obs. of  10 variables:
 $ indicator   : chr  "1. Health check-up" "2. Blood cholesterol checked " "3. Recieved flu vaccine" "4. Blood pressure checked" ...
 $ subgroup    : chr  "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ ...
 $ n           : num  2117 2127 2124 2135 1027 ...
 $ prevalence_c: chr  "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 (0.70)" ...
 $ prevalence_p: chr  "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" ...
 $ sensitivity : chr  "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 (0.27)" ...
 $ specificity : chr  "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 (3.90)" ...
 $ ppv         : chr  "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 (0.63)" ...
 $ npv         : chr  "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 (3.27)" ...
 $ kappa       : chr  "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" "0.626 (0.035)" ...
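The failure asked about above can be reproduced on a small stand-in data frame (df and its values are hypothetical). c(df$indicator, df$prevalence_c) concatenates the column values, not the column names, so "[" is asked for columns named "A", "B", "9.5 (0.50)" and so on; none of them exist, hence "undefined columns selected". Passing a character vector of column names, as in the reply in this thread, selects the intended columns.

```r
# Hypothetical two-row stand-in for mydata
df <- data.frame(indicator    = c("A", "B"),
                 prevalence_c = c("9.5 (0.50)", "10.2 (0.40)"),
                 n            = c(100, 200),
                 stringsAsFactors = FALSE)

# This is a vector of cell values, not of column names
bad_cols <- c(df$indicator, df$prevalence_c)
print(bad_cols)

# Indexing with those values fails just as in the question
err <- tryCatch(df[, bad_cols], error = function(e) conditionMessage(e))
print(err)

# A character vector of column names selects the two columns
ok <- df[, c("indicator", "prevalence_c")]
print(names(ok))
```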

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Daniel Nordlund
Sent: Wednesday, June 15, 2016 6:37 PM
To: r-help@r-project.org
Subject: Re: [R] dplyr's arrange function

On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:
> Hello,
>
> I am using dplyr's arrange() function to sort one of many data
> frames on a character variable (named "prevalence").
>
> Issue: I am not getting the desired output (line 7 is the problem; it
> should be the very last line in the sorted data frame) because the sorted
> field is character, not numeric.
>
> The reproducible example and the output are appended below.
>
> Is there any work-around to convert/treat this character variable (named
> "prevalence" in the data frame below) as numeric before using the arrange()
> function within the dplyr package?
>
> Any hints will be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
> # Reproducible Example
>
> library("readr")
> library("dplyr")
> testdata <- read_csv(
> "indicator,  prevalence
> 1. Health check-up, 77.2 (1.19)
> 2. Blood cholesterol checked,  84.5 (1.14)
> 3. Recieved flu vaccine, 50.0 (1.33)
> 4. Blood pressure checked, 88.7 (0.88)
> 5. Aspirin use-problems, 11.7 (1.02)
> 6.Colonoscopy, 60.2 (1.41)
> 7. Sigmoidoscopy,  6.1 (0.61)
> 8. Blood stool test, 14.6 (1.00)
> 9.Mammogram,  72.6 (1.82)
> 10. Pap Smear test, 73.3 (2.37)")
>
> # Sort on the character variable in descending order
> arrange(testdata, desc(prevalence))
>
> # Results from Console
>
>                       indicator  prevalence
>                           (chr)       (chr)
> 1     4. Blood pressure checked 88.7 (0.88)
> 2  2. Blood cholesterol checked 84.5 (1.14)
> 3            1. Health check-up 77.2 (1.19)
> 4            10. Pap Smear test 73.3 (2.37)
> 5                   9.Mammogram 72.6 (1.82)
> 6                 6.Colonoscopy 60.2 (1.41)
> 7              7. Sigmoidoscopy  6.1 (0.61)
> 8       3. Recieved flu vaccine 50.0 (1.33)
> 9           8. Blood stool test 14.6 (1.00)
> 10      5. Aspirin use-problems 11.7 (1.02)

The problem is that you are sorting a character variable.

> testdata$prevalence
  [1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
  [6] "60.2 (1.41)" "6.1 (0.61)"  "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"
>

Notice that the 7th element is "6.1 (0.61)". Character values are compared one
character at a time, and its first character is "6", so it is going to sort
BEFORE "50.0 (1.33)" (in descending order). If you want the character value of
line 7 to sort last, it would need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice
the leading space).
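A short sketch of the point above, using the prevalence strings from the question. A plain character sort puts "6.1 (0.61)" between "60.2 (1.41)" and "50.0 (1.33)", which reproduces rows 6 to 8 of the arrange() output. Prepending a 0 to the single-digit estimate moves it to the end. The regular expression assumes every value starts with digits, a period, and a digit. method = "radix" is used so the comparison is byte order (C locale); the leading-space variant also sorts last under byte order, since a space precedes every digit, but other locales' collation rules may treat spaces differently, which is why this sketch uses the zero.

```r
# Prevalence strings from the original reproducible example
prev <- c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)", "88.7 (0.88)",
          "11.7 (1.02)", "60.2 (1.41)", "6.1 (0.61)", "14.6 (1.00)",
          "72.6 (1.82)", "73.3 (2.37)")

# Plain character sort in byte order, independent of the locale:
# "6.1 (0.61)" lands right after "60.2 (1.41)" and before "50.0 (1.33)"
as_text <- sort(prev, decreasing = TRUE, method = "radix")
print(as_text)

# Assumes each value starts with "<digits>.<digit>"; prepend "0" to the
# single-digit estimates so every estimate has two integer digits
padded <- sub("^([0-9]\\.)", "0\\1", prev)
as_padded <- sort(padded, decreasing = TRUE, method = "radix")
print(as_padded)
```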

Hope this is helpful,

Dan

Daniel Nordlund
Port