Karl Ove Hufthammer wrote:

> Anyway, do you think it’s worth trying to change the ‘table’ function the
> way I outlined in my first post¹? This should eliminate the performance
> hit on all platforms.

Some additional notes: ‘table’ uses ‘factor’ directly, but also indirectly, 
via ‘addNA’. The definition of ‘addNA’ ends with:

    if (!any(is.na(ll))) 
        ll <- c(ll, NA)
    factor(x, levels = ll, exclude = NULL)

This is slow for non-ASCII levels. One *could* fix this by changing the 
last line to

    attr(x, "levels") <- ll

But one soon ends up changing every function that uses ‘factor’ in this way, 
which seems like the wrong approach. The problem lies inside ‘factor’, 
and that’s where it should be fixed, if feasible.
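For illustration only: the `attr()` assignment skips the string matching in ‘factor’, but on its own it leaves missing values with NA integer codes, so they are not mapped to the new NA level the way `factor(x, levels = ll, exclude = NULL)` maps them. A minimal sketch of a levels-only variant that also re-codes them, assuming `x` is already a factor (the helper name `add_na_level` is hypothetical, not part of R):

```r
# Hypothetical helper (not part of base R): add an NA level by editing the
# "levels" attribute directly, avoiding the string matching done by factor().
# Assumes x is already a factor.
add_na_level <- function(x)
{
    stopifnot(is.factor(x))
    ll <- levels(x)
    if (any(is.na(ll))) return(x)      # already has an NA level: nothing to do
    ll <- c(ll, NA)
    codes <- unclass(x)                # integer codes, NA for missing values
    codes[is.na(codes)] <- length(ll)  # point missing values at the new NA level
    attr(codes, "levels") <- ll
    class(codes) <- class(x)           # restore "factor" (or "ordered", "factor")
    codes
}

f <- factor(c("a", "b", NA))
g <- add_na_level(f)
levels(g)       # "a" "b" NA
as.integer(g)   # 1 2 3
```

Only integer codes and attributes are touched here, so the cost does not depend on the encoding of the level strings; whether that is the right trade-off versus fixing ‘factor’ itself is exactly the question above.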

BTW, the definition of ‘addNA’ looks suboptimal in a different way. The last 
line is always executed, even if the factor *does* contain NA values stored 
under an NA level. In that case it does essentially nothing, but takes a very 
long time doing it (at least on Windows). Moving the last line inside the 
‘if’ clause and adding an ‘else return(x)’ would fix this (correct me if 
I’m wrong).
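A minimal sketch of the restructuring described above. The closing `if`/`else` follows the description; the preamble (the `ifany` argument and the conversion of non-factors) is an assumption modeled on `addNA(x, ifany = FALSE)`, and the name `addNA2` is hypothetical. Note that the `else` branch returns `x` unchanged whenever an NA level already exists, so any elements that still carry NA codes are not re-coded in that case:

```r
# Hypothetical variant of addNA(): the preamble is assumed, not copied from R.
addNA2 <- function(x, ifany = FALSE)
{
    if (!is.factor(x)) x <- factor(x)
    if (ifany && !any(is.na(x))) return(x)  # no missing values to add a level for
    ll <- levels(x)
    if (!any(is.na(ll))) {
        ll <- c(ll, NA)
        factor(x, levels = ll, exclude = NULL)  # slow path: only when a level is added
    } else return(x)                            # NA level already present: skip factor()
}

f <- addNA2(factor(c("a", "b", NA)))
levels(f)                # "a" "b" NA
identical(addNA2(f), f)  # TRUE; factor() is not called the second time
```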

-- 
Karl Ove Hufthammer

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
