Re: [Rd] colnames for data.frame could be greatly improved

2016-12-27 Thread Jan Gorecki
Hi there,
Any update on this?
Should I create a Bugzilla ticket and submit a patch?
Regards
Jan Gorecki

On 20 December 2016 at 01:27, Jan Gorecki wrote:
> Hello,
>
> colnames does not seem to be well optimized for data.frame. It skips the
> general processing for data.frame in
>
>   if (is.data.frame(x) && do.NULL)
>     return(names(x))
>
> but only when do.NULL is TRUE. This makes a huge difference when do.NULL
> is FALSE. A minimal edit to `colnames`:
>
> if (is.data.frame(x)) {
>     nm <- names(x)
>     if (do.NULL || !is.null(nm))
>         return(nm)
>     else
>         return(paste0(prefix, seq_along(x)))
> }
>
> Script and timings:
>
> N=1e7; K=100
> set.seed(1)
> DF <- data.frame(
>   id1 = sample(sprintf("id%03d", 1:K), N, TRUE),         # large groups (char)
>   id2 = sample(sprintf("id%03d", 1:K), N, TRUE),         # large groups (char)
>   id3 = sample(sprintf("id%010d", 1:(N/K)), N, TRUE),    # small groups (char)
>   id4 = sample(K, N, TRUE),                              # large groups (int)
>   id5 = sample(K, N, TRUE),                              # large groups (int)
>   id6 = sample(N/K, N, TRUE),                            # small groups (int)
>   v1 = sample(5, N, TRUE),                               # int in range [1,5]
>   v2 = sample(5, N, TRUE),                               # int in range [1,5]
>   v3 = sample(round(runif(100, max = 100), 4), N, TRUE)  # numeric e.g. 23.5749
> )
> cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
> #GB = 0.397
> colnames(DF) = NULL
> system.time(nm1<-colnames(DF, FALSE))
> #   user  system elapsed
> # 22.158   0.299  22.498
> print(nm1)
> #[1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"
>
> ### restart R
>
> colnames <- function (x, do.NULL = TRUE, prefix = "col")
> {
>     if (is.data.frame(x)) {
>         nm <- names(x)
>         if (do.NULL || !is.null(nm))
>             return(nm)
>         else
>             return(paste0(prefix, seq_along(x)))
>     }
>     dn <- dimnames(x)
>     if (!is.null(dn[[2L]]))
>         dn[[2L]]
>     else {
>         nc <- NCOL(x)
>         if (do.NULL)
>             NULL
>         else if (nc > 0L)
>             paste0(prefix, seq_len(nc))
>         else character()
>     }
> }
> N=1e7; K=100
> set.seed(1)
> DF <- data.frame(
>   id1 = sample(sprintf("id%03d", 1:K), N, TRUE),         # large groups (char)
>   id2 = sample(sprintf("id%03d", 1:K), N, TRUE),         # large groups (char)
>   id3 = sample(sprintf("id%010d", 1:(N/K)), N, TRUE),    # small groups (char)
>   id4 = sample(K, N, TRUE),                              # large groups (int)
>   id5 = sample(K, N, TRUE),                              # large groups (int)
>   id6 = sample(N/K, N, TRUE),                            # small groups (int)
>   v1 = sample(5, N, TRUE),                               # int in range [1,5]
>   v2 = sample(5, N, TRUE),                               # int in range [1,5]
>   v3 = sample(round(runif(100, max = 100), 4), N, TRUE)  # numeric e.g. 23.5749
> )
> cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
> #GB = 0.397
> colnames(DF) = NULL
> system.time(nm1<-colnames(DF, FALSE))
> #   user  system elapsed
> #  0.001   0.000   0.000
> print(nm1)
> #[1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"
>
> sessionInfo()
> #R Under development (unstable) (2016-12-19 r71815)
> #Platform: x86_64-pc-linux-gnu (64-bit)
> #Running under: Debian GNU/Linux stretch/sid
> #
> #locale:
> # [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> # [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> # [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> # [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> # [9] LC_ADDRESS=C               LC_TELEPHONE=C
> #[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> #
> #attached base packages:
> #[1] stats     graphics  grDevices utils     datasets  methods   base
> #
> #loaded via a namespace (and not attached):
> #[1] compiler_3.4.0
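The timings above are consistent with the following explanation, which is an assumption rather than something stated in the thread: with do.NULL = FALSE, colnames() falls through to dimnames(x), and for a data.frame that call returns list(row.names(x), names(x)). The automatic integer row names are therefore converted into a character vector with one element per row before any column name is looked at. The sketch below checks this on a small data frame and times the two pieces separately at 1e6 rows (an arbitrary size, smaller than the 1e7 in the script).

```r
# Hypothetical check of where colnames(DF, FALSE) spends its time.
# Small data frame with automatic row names and no column names,
# mimicking colnames(DF) <- NULL from the original script.
x <- data.frame(v = seq_len(5))
names(x) <- NULL

dn <- dimnames(x)          # dispatches to the data.frame method
str(dn)                    # first element: row names as a character vector

stopifnot(is.character(dn[[1L]]),
          length(dn[[1L]]) == nrow(x),
          is.null(dn[[2L]]))   # no column names, so colnames() must build them

# Rough timing at 1e6 rows (smaller than the 1e7 used in the thread)
big <- data.frame(v = seq_len(1e6))
names(big) <- NULL
print(system.time(rn <- dimnames(big)[[1L]]))  # builds 1e6 row-name strings
print(system.time(nm <- names(big)))           # only reads an attribute
```

If the assumption holds, the data.frame branch of the proposed patch avoids this cost entirely, because it only reads names(x).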

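To check that the proposed data.frame branch returns the same values as the existing colnames(), the sketch below wraps it in a stand-alone function, colnames_df (a hypothetical name, not part of base R), and compares the two on a named, an unnamed and a zero-column data frame. The sketch adds one guard that is not in the proposed patch: paste0(prefix, seq_along(x)) returns prefix itself rather than character(0) when x has no columns, whereas the existing code returns character() through its nc > 0L branch.

```r
# colnames_df() is a hypothetical stand-alone version of the proposed
# data.frame fast path; non-data.frame inputs go to base::colnames().
colnames_df <- function(x, do.NULL = TRUE, prefix = "col") {
  if (is.data.frame(x)) {
    nm <- names(x)
    if (do.NULL || !is.null(nm))
      return(nm)
    # Guard not in the proposed patch: paste0(prefix, integer(0)) returns
    # prefix rather than character(0), so handle zero columns explicitly.
    if (length(x) > 0L)
      return(paste0(prefix, seq_along(x)))
    return(character())
  }
  base::colnames(x, do.NULL = do.NULL, prefix = prefix)
}

named <- data.frame(a = 1:3, b = c("x", "y", "z"))
unnamed <- named
names(unnamed) <- NULL    # same effect as colnames(DF) <- NULL
empty <- data.frame()
names(empty) <- NULL      # zero columns and no names attribute

# Compare with base::colnames() for both values of do.NULL
for (d in list(named, unnamed, empty)) {
  for (flag in c(TRUE, FALSE)) {
    stopifnot(identical(colnames_df(d, do.NULL = flag),
                        base::colnames(d, do.NULL = flag)))
  }
}
```

Because the comparison runs against whatever base::colnames() the current R session provides, it can double as a quick regression check if the data.frame branch is changed.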
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Proper attribution in Authors@R for the d3.js library by Mike Bostock

2016-12-27 Thread Bryan Hanson
I have a couple of packages that use the d3.js library developed (and
copyrighted) by Mike Bostock. One package uses it extensively, another only
for one function. I use R to piece together files containing JavaScript
that I have written; that code uses d3.js functions, and eventually the d3
library is called from a temporary web page.

To date, I have pointed to Bostock's library in the Rd files. However, a
growing number of packages now use d3.js, and the habit (standard?) seems to
be to list Bostock in Authors@R, though packages do so in different ways.
Here are a few examples:

D3partitionR:  Mike Bostock [aut, cph] (d3.js library in htmlwidgets/lib, 
http://d3js.org. The partitionChart, sunburst, the treemap, the zoomable 
circlePacking, the collapsible indented tree and the collapsible tree are all 
derived from his works.)

scatterD3:  Mike Bostock [aut, cph] (d3.js library, http://d3js.org)

qrage:  Michael Bostock [ctb, cph] (D3.js library)

And there are others.

?person offers the following options that seem relevant:

"aut"
(Author) Use for full authors who have made substantial contributions to the 
package and should show up in the package citation.

"com"
(Compiler) Use for persons who collected code (potentially in other languages) 
but did not make further substantial contributions to the package.

"ctb"
(Contributor) Use for authors who have made smaller contributions (such as code 
patches etc.) but should not show up in the package citation.

"cph"
(Copyright holder) Use for all copyright holders.
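The role codes quoted above are supplied through the role argument of utils::person(), and several codes can be combined in one entry. The sketch below builds one such entry in the style of the qrage and scatterD3 examples ([ctb, cph] plus a comment); it only illustrates the syntax and is not a claim about which roles are correct for d3.js.

```r
# One possible Authors@R entry for the bundled d3.js library.
# Choosing the right role codes is the open question, so this is syntax only.
d3_entry <- utils::person(given = "Mike", family = "Bostock",
                          role = c("ctb", "cph"),
                          comment = "d3.js library, http://d3js.org")

print(d3_entry)        # name, role codes in brackets, comment in parentheses
print(d3_entry$role)   # the role codes stored in the person object
```

In a DESCRIPTION file the same call goes inside the Authors@R field, usually combined with the package's own authors via c(person(...), person(...)).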

Questions: What is the proper attribution in this case? Bostock is clearly
cph. Is that sufficient? I'm not completely certain about the intent of
"com", but it may fit. "ctb" sounds insufficient, and "aut" seems a bit
much (can you be an author of an R package and not know about it?). Is there a
CRAN policy about this?

Thanks for any thoughts and guidance.  Bryan

Prof. Bryan Hanson
Dept of Chemistry & Biochemistry
DePauw University
Greencastle IN 46135 USA
academic.depauw.edu/~hanson/index.html
github.com/bryanhanson
