Thanks again Michael, simple enough!

r1z <- r1
str(r1z$B1)
#Factor w/ 14 levels "Z","A","C","D",..: 2 2 3 3 2 2 2 2 2 2 ...

# Replacing lower case l with upper case L leaves the level in the factor
# even though it is now empty. If that is a nuisance, re-creating the factor
# drops the unused levels (there are other ways of doing this):
r1z$B1 <- factor(r1z$B1)
str(r1z$B1)
#Factor w/ 13 levels "Z","A","C","D",..: 2 2 3 3 2 2 2 2 2 2 ...
table(r1z$B1)
#     Z     A     C     D     E     G     J     L     P     Q     S     U     V
# 19600  1671   543     2     8   147   281   660     1    64    36   114    14
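
One of the other ways is to avoid creating the empty level in the first place. A minimal sketch, assuming a toy factor b1 standing in for r1$B1 (the values are made up for illustration): renaming the level merges l into L, whereas replacing the values leaves the emptied level behind.

```r
# Toy factor standing in for r1$B1 (hypothetical values, not the real data)
b1 <- factor(c("A", "l", "L", "C", "l", "Z"))

# Replacing values: every "l" becomes "L", but the level "l" stays, now empty
by_value <- b1
by_value[by_value == "l"] <- "L"
levels(by_value)
table(by_value)

# Renaming the level: "l" is merged into the existing level "L",
# so no empty level is left over
by_level <- b1
levels(by_level)[levels(by_level) == "l"] <- "L"
levels(by_level)
table(by_level)
```

This is the same levels()<- idiom already used in the thread to collapse the digits into Z.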


Dear Bill

When you replace lower case l with upper case L, the level still stays
in the factor even though it is empty. If that is a nuisance,
x <- factor(x) will drop the unused levels. There are other ways of
doing this.
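
A small sketch of the behaviour described above, using a made-up factor x rather than the real r1$B1. It shows the empty level surviving the replacement, and two base R ways to drop it: factor() as suggested, and droplevels() as one of the other ways.

```r
# Hypothetical four-value factor with both "l" and "L"
x <- factor(c("A", "l", "L", "A"))

# Replace the values; the level set is unchanged
x[x == "l"] <- "L"
nlevels(x)            # 3: "l" is still a level, with zero observations

# Re-creating the factor keeps only the levels that are actually used
x1 <- factor(x)
nlevels(x1)           # 2

# droplevels() has the same effect here
x2 <- droplevels(x)
nlevels(x2)           # 2
```

droplevels() also accepts a data frame, in which case it drops unused levels from every factor column.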

Michael

On 16/11/2018 15:38, Bill Poling wrote:
> Hello:
>
> I am running windows 10 -- R3.5.1 -- RStudio Version 1.1.456
>
> I would like to know why when I replace a column value it still appears in 
> subsequent routines:
>
> My example:
>
> r1$B1 is a Factor: It is created from the first character of a list of CPT 
> codes, r1$CPT.
>
> head(r1$CPT, N= 25)
> [1] A4649 A4649 C9359 C1713 A0394 A0398
> 903 Levels: 00000 00001 00140 00160 00670 00810 00940 01400 01470 01961 01968 
> 10160 11000 11012 11042 11043 11044 11045 11100 11101 11200 11201 11401 11402 
> ... l8699
>
> str(r1$CPT)
> Factor w/ 903 levels "00000","00001",..: 773 773 816 783 739 741 743 739 739 
> 741 ...
>
>
> I want only those CPTs with a leading alpha character in this column, so I
> set the levels with a leading numeric character to Z
>
> r1$B1 <- str_sub(r1$CPT,1,1)
>
> r1$B1 <- as.factor(r1$B1) #Redundant
> levels(r1$B1)[levels(r1$B1) %in% c('1','2','3','4','5','6','7','8','9','0')] <- 'Z'
>
> When I check what I have done I find l & L
>
> unique(r1$B1)
> #[1] A C Z L G Q U J V E S l D P
> #Levels: Z A C D E G J l L P Q S U V
>
> So I change l to L
> r1$B1[r1$B1 == 'l'] <- 'L'
>
> When I check again I have l & L but l = 0
> table(r1$B1)
> #     Z     A     C     D     E     G     J     l     L     P     Q     S     U     V
> # 19639  1673   546     2     8   147   281     0   664     1    64    36   114    14
>
> When I go to find those rows as if they existed, they are not accounted for?
>
> tmp <- subset(r1, B1 == "l")
> print(tmp)
> Empty data.table (0 rows) of 9 cols: 
> SavingsReversed,productID,ProviderID,PatientGender,ModCnt,Editnumber2...
>
> And I have actually visually inspected the whole darn column, sheesh!
>
> So I ignore it temporarily.
>
> Now later on it resurfaces in a tutorial I am following for caret pkg.
>
> preProcess(r1b, method = c("center", "scale"),
> thresh = 0.95, pcaComp = NULL, na.remove = TRUE, k = 5,
> knnSummary = mean, outcome = NULL, fudge = 0.2, numUnique = 3,
> verbose = FALSE, freqCut = 95/5, uniqueCut = 10, cutoff = 0.9,
> rangeBounds = c(0, 1))
> # Warning in preProcess.default(r1b, method = c("center", "scale"), thresh = 0.95, :
> # These variables have zero variances: B1l  <--- yes, this is a remnant of the r1$B1 clean-up
> # Created from 23141 samples and 22 variables
> #
> # Pre-processing:
> # - centered (22)
> # - ignored (0)
> # - scaled (22)
>
>
> So my questions are, in consideration of regression modelling accuracy:
>
> Why is this happening?
> How do I remove it?
> Or is it irrelevant and leave it be?
>
> As always, thank you for your support.
>
> WHP
>
> Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
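
On the zero-variance warning quoted above: the thread does not show how r1b was built, so the following is only a sketch under the assumption that B1 was expanded into dummy columns. With made-up data, base R's model.matrix() gives an empty level its own column, which is all zeros and therefore has zero variance. Dropping the unused level first removes that column.

```r
# Hypothetical data: B1 carries the level "l" but no row uses it
d <- data.frame(
  y  = c(1.2, 0.7, 3.1, 2.4, 1.9),
  B1 = factor(c("A", "L", "A", "C", "L"),
              levels = c("A", "C", "l", "L"))
)

# Dummy coding keeps a column for the empty level
mm <- model.matrix(y ~ B1, data = d)
colnames(mm)
colSums(mm != 0)      # the B1l column has no non-zero entries

# After dropping unused levels the column is gone
mm2 <- model.matrix(y ~ B1, data = droplevels(d))
colnames(mm2)
```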

--
Michael
http://www.dewey.myzen.co.uk/home.html

