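color and clarity are ordered factors, so sparse.model.matrix is generating
orthogonal-polynomial contrasts for them, which is the default coding for
ordered factors (see ?contr.poly). cut is ordered too, but because your formula
has no intercept, the first factor is coded with one indicator column per level.
This is by design ... what are you trying to do? Are you interested in fac2sparse?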
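For illustration, here is a minimal sketch of the two codings side by side. It uses a hypothetical 5-observation factor with the same seven levels as diamonds$color (D through J). contr.poly() produces the six polynomial columns that show up as color.L ... color^6, while fac2sparse() from Matrix gives one indicator per level. drop.unused.levels = FALSE is set only so that levels absent from this toy vector still get a column.

```r
library(Matrix)

# Hypothetical ordered factor with the seven diamonds$color levels, D < E < ... < J.
color <- factor(c("D", "G", "J", "E", "G"), levels = LETTERS[4:10], ordered = TRUE)

# Default coding for ordered factors: orthogonal-polynomial contrasts.
# 7 levels give 6 columns named .L, .Q, .C, ^4, ^5, ^6.
poly <- contr.poly(nlevels(color))
colnames(poly)

# fac2sparse() returns a levels x observations indicator matrix;
# transposing it gives one indicator column per level, one row per observation.
ind <- t(fac2sparse(color, drop.unused.levels = FALSE))
dim(ind)
colnames(ind)
```

Each row of ind holds a single 1 in the column of that observation's level, which is the "one column per level" layout the original question expected.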

On 18-02-07 11:00 PM, Dario Strbenac wrote:
> Good day,
> 
> Sometimes, sparse.model.matrix outputs a dgCMatrix whose column names
> consist of factor levels that were not in the original dataset. The first
> factor appears to be transformed correctly, but the following factors are
> not. For example:
> 
>> diamonds <- as.data.frame(ggplot2::diamonds)
>> colnames(sparse.model.matrix(~ . -1, diamonds))
>  [1] "carat"        "cutFair"      "cutGood"      "cutVery Good" "cutPremium"   "cutIdeal"     "color.L"      "color.Q"      "color.C"      "color^4"      "color^5"
> [12] "color^6"      "clarity.L"    "clarity.Q"    "clarity.C"    "clarity^4"    "clarity^5"    "clarity^6"    "clarity^7"    "depth"        "table"        "price"
> [23] "x"            "y"            "z"
> 
> The column names for color and clarity are not suffixed with their factor
> levels in the transformed matrix, and the values in those columns are also
> wrong. Converting the Ord.factor columns into plain factors fixes the
> problem.
> 
>> diamonds[, "cut"] <- factor(as.character(diamonds[, "cut"]))
>> diamonds[, "color"] <- factor(as.character(diamonds[, "color"]))
>> diamonds[, "clarity"] <- factor(as.character(diamonds[, "clarity"]))
> 
>> colnames(sparse.model.matrix(~ . -1, diamonds)) # No more invented factor levels.
>  [1] "carat"        "cutFair"      "cutGood"      "cutIdeal"     "cutPremium"   "cutVery Good" "colorE"       "colorF"       "colorG"       "colorH"
> [11] "colorI"       "colorJ"       "clarityIF"    "claritySI1"   "claritySI2"   "clarityVS1"   "clarityVS2"   "clarityVVS1"  "clarityVVS2"  "depth"
> [21] "table"        "price"        "x"            "y"            "z"
> 
> Can it be made to work correctly for both plain and ordered factors?
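If indicator columns are wanted while leaving the factors ordered in the data, one option is to request treatment contrasts through the contrasts.arg argument of sparse.model.matrix(). This is a sketch on a hypothetical toy data frame; the column names x, grade and size are made up for illustration. With treatment contrasts, every factor after the first drops its first level (here sizeS), just as colorD and clarityI1 are missing from the plain-factor output above. Otherwise those columns would be collinear with the first factor's full set of indicators.

```r
library(Matrix)

# Hypothetical toy data: numeric x plus two ordered factors.
df <- data.frame(
  x     = c(1.5, 2.0, 0.7, 3.1, 2.4),
  grade = factor(c("low", "high", "mid", "low", "mid"),
                 levels = c("low", "mid", "high"), ordered = TRUE),
  size  = factor(c("S", "M", "L", "M", "S"),
                 levels = c("S", "M", "L"), ordered = TRUE)
)

# Default: the first factor (no intercept) gets one indicator per level,
# the later ordered factor gets polynomial contrasts (size.L, size.Q).
m_poly <- sparse.model.matrix(~ . - 1, df)
colnames(m_poly)

# Treatment (dummy) coding for the ordered factors, data left unchanged;
# the list names must match the factor columns in df.
m_dummy <- sparse.model.matrix(~ . - 1, df,
  contrasts.arg = list(grade = "contr.treatment", size = "contr.treatment"))
colnames(m_dummy)
```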
> 
>> sessionInfo()
> R Under development (unstable) (2018-02-06 r74231)
> Platform: i386-w64-mingw32/i386 (32-bit)
> 
> other attached packages:
> [1] Matrix_1.2-12
> 
> loaded via a namespace (and not attached):
>  [1] colorspace_1.3-2 scales_0.5.0     compiler_3.5.0   lazyeval_0.2.1  
>  [5] plyr_1.8.4       pillar_1.1.0     gtable_0.2.0     tibble_1.4.2    
>  [9] Rcpp_0.12.15     ggplot2_2.2.1    grid_3.5.0       rlang_0.1.6     
> [13] munsell_0.4.3    lattice_0.20-35
> 
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> 
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
