Re: [R] > Understanding strip.default & strip.custom

2016-06-27 Thread Bert Gunter
It's a vector of **length** 1, not **value** 1. In your case it gives
the index (1 to 12) of the level being drawn in the panel, which is
used to draw the strip according to other strip parameters, esp.
style.

You seem to be making this way more difficult than you should.
strip.default is the **function** being used to draw the strips in
each panel. Generally speaking, you should not have to mess with
arguments like which.panel and should probably use strip.custom()
instead.  Do carefully go through the examples in ?strip and ?xyplot.
That may help.
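
A minimal sketch of the strip.custom() route, assuming a made-up data
frame `d` with a single four-level conditioning factor `g` (neither is
from the original post) and a shortened set of labels. The expression
vector is handed to strip.custom() once, and lattice supplies
which.panel itself for each panel:

```r
library(lattice)

# Hypothetical data: one conditioning factor with four levels
set.seed(1)
d <- data.frame(x = rep(1:5, times = 4),
                y = rnorm(20),
                g = factor(rep(1:4, each = 5)))

# One label per level, in level order (shortened from the original list)
labs <- expression(Needles ~ "::" ~ alpha - pinene,
                   Stems ~ "::" ~ alpha - pinene,
                   Needles ~ ":: Camphene",
                   Stems ~ ":: Camphene")

# strip.custom() returns a strip function with factor.levels preset;
# which.panel is filled in by lattice when each panel is drawn
p <- xyplot(y ~ x | g, data = d,
            strip = strip.custom(factor.levels = labs))
```

Printing `p` draws the panels, each with its own strip label.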

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Jun 27, 2016 at 7:59 PM, p_connolly  wrote:
> I'm having difficulty following the help for those functions.
>
> My plot has a single conditioning factor with 12 levels.  My
> factor.levels in a call to strip.default looks like this:
>
>  factor.levels = expression(Needles~ "::"~alpha -pinene,
> Stems~ "::"~alpha -pinene,
> Needles~ "::"~beta -pinene,
> Stems~ "::"~beta -pinene,
> Needles~ ":: B−Phellandrene",
> Stems~ ":: B−Phellandrene",
> Needles~ ":: Camphene",
> Stems~ ":: Camphene",
> Needles~ ":: Myrcene",
> Stems~ ":: Myrcene",
> Needles~ ":: Limonene",
> Stems~ "::Limonene")
>
> Since there is only one factor, which.given must be 1.
> Likewise, var.name must be of length 1.
>
> What I can't understand is the argument which.panel.  The help says:
>
> which.panel: vector of integers as long as the number of conditioning
>   variables. The contents are indices specifying the current
>   levels of each of the conditioning variables (thus, this
>   would be unique for each distinct packet).  This is identical
>   to the return value of ‘which.packet’, which is a more
>   accurate name.
>
> So, that must be of length 1 also, according to the first sentence,
> but if I set it to 1, I get the first strip label repeated 12 times.
> Set it to 2, I get the second one 12 times.  Set it to 1:2, it
> attempts to squash 2 strips in the space of 1, labelling the first
> one.   I don't understand the second sentence at all.
>
> What do I do to get all 12 in their correct order?
>
> I couldn't find an example remotely like what I'm trying to do.  Are
> there any pointers?
>
> TIA
> Patrick
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[R] Turning a variable name into a function argument

2016-06-27 Thread KMNanus
I’m inexperienced but am trying to get my head around using functions to make a 
number of ggplots easier to do.  

I have a function that creates a ggplot taking one input variable as an 
argument. The variable name is shorthand for the actual variable (variable name 
= tue, Actual name = Tuesday).  Since I want to use the actual variable name in 
ylab and ggtitle, I’d like to add a second argument, new.name, to the function 
which would allow me to utilize both inputs as arguments but have not been 
successful.  I tried creating a function within the function to accomplish 
this, using deparse(substitute(new.name)) and also using the code you see below.


myfun <- function(myvar, new.name){
  function(new.name){return(as.character(substitute(new.name)))}
  ggplot(b12.2, aes(x= games,  y = myvar, col = Group))+
  geom_point() + 
  geom_line()+
  xlab("Minimum Games" ) +
  ylab(paste(new.name, "Average Change"))+
  ggtitle(new.name, "Change \n as a Function of Minimum Number of Games")+
  theme_bw()
}

When I call myfun(myvar, new.name), I get an error message “new.name is not found” 
whether I pass new.name or Tuesday.

I want ggplot to automatically insert Tuesday into ylab and ggtitle.

Can anyone help me with this?  Thanks for your patience.
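
One possible approach, sketched under assumptions: both arguments are
passed as character strings, the column is looked up with the `.data`
pronoun (which needs a reasonably recent ggplot2), and the data frame
below is a made-up stand-in for b12.2. The title is built with paste()
so that ggtitle() receives a single string.

```r
library(ggplot2)

# Hypothetical stand-in for the poster's b12.2 data frame
b12.2 <- data.frame(
  games = rep(1:5, times = 2),
  tue   = c(0.1, 0.3, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.3, 0.6),
  Group = rep(c("A", "B"), each = 5)
)

# myvar: column name as a string; new.name: display name as a string
myfun <- function(myvar, new.name) {
  ggplot(b12.2, aes(x = games, y = .data[[myvar]], col = Group)) +
    geom_point() +
    geom_line() +
    xlab("Minimum Games") +
    ylab(paste(new.name, "Average Change")) +
    ggtitle(paste(new.name,
                  "Change \n as a Function of Minimum Number of Games")) +
    theme_bw()
}

p <- myfun("tue", "Tuesday")
```

Quoting both arguments avoids the need for substitute() altogether.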

Ken
kmna...@gmail.com
914-450-0816 (tel)
347-730-4813 (fax)




Re: [R] performance of do.call("rbind")

2016-06-27 Thread Jeff Newmiller
Sarah, you make it sound as though everyone should be using matrices, even 
though they have distinct disadvantages for many types of analysis.


You are right that rbind on data frames is slow, but dplyr::bind_rows 
handles data frames almost as fast as your rbind-ing matrices solution.


And if you apply knowledge of your data frames and don't do the error 
checking that bind_rows does, you can beat both of them without converting 
to matrices, as the "tm.dfcolcat" solution below illustrates. (Not for 
everyday use, but if you have a big job and the data are clean this may 
make a difference.)


Data frames, handled properly, are only slightly slower than matrices for 
most purposes. I have seen numerical solutions of partial differential 
equations run lightning fast using pre-allocated data frames and vector 
calculations, so even traditional "matrix" calculation domains don't have 
to use matrices to be competitive.


##
testsize <- 5000
N <- 20

set.seed(1234)
testdf.list <- lapply( seq_len( testsize )
 , function( x ) {
data.frame( matrix( runif( 300 ), nrow=100 ) )
   }
 )

tm.rbind <- function( x = 0 ) {
  system.time( r.df <- do.call( "rbind", testdf.list ) )
}
#toss the first one
tm.rbind()
tms.rbind <- data.frame( do.call( rbind
, lapply( 1:N
, tm.rbind
)
)
   , which = "rbind"
   )

tm.rbindm <- function( x = 0 ) {
  system.time({
testm.list <- lapply( testdf.list, as.matrix )
r.m <- do.call( rbind, testm.list )
  })
}
#toss the first one
tm.rbindm()
tms.rbindm <- data.frame( do.call( rbind
 , lapply( 1:N
 , tm.rbindm
 )
 )
, which = "rbindm"
)

tm.dfcopy <- function(x=0) {
  system.time({
l.df <- data.frame( matrix( NA
  , nrow=100 * testsize
  , ncol=3
  )
  )
for ( i in seq_len( testsize ) ) {
  start <- ( i - 1 ) * 100 + 1
  end <- i * 100
  l.df[ start:end, ] <- testdf.list[[ i ]]
}
  })
}
#toss the first one
tm.dfcopy()
tms.dfcopy <- data.frame( do.call( rbind
 , lapply( 1:N
 , tm.dfcopy
 )
 )
, which = "dfcopy"
)

tm.dfmatcopy <- function(x=0) {
  system.time({
l.m <- data.frame( matrix( NA
 , nrow=100 * testsize
 , ncol = 3
 )
 )
testm.list <- lapply( testdf.list, as.matrix )
for ( i in seq_len( testsize ) ) {
  start <- ( i - 1 ) * 100 + 1
  end <- i * 100
  l.m[ start:end, ] <- testm.list[[ i ]]
}
  })
}
#toss the first one
tm.dfmatcopy()
tms.dfmatcopy <- data.frame( do.call( rbind
, lapply( 1:N
, tm.dfmatcopy
)
)
   , which = "dfmatcopy"
   )

tm.bind_rows <- function(x=0) {
  system.time({
dplyr::bind_rows( testdf.list )
  })
}
#toss the first one
tm.bind_rows()
tms.bind_rows <- data.frame( do.call( rbind
, lapply( 1:N
, tm.bind_rows
)
)
   , which = "bind_rows"
   )

tm.dfcolcat <- function(x=0) {
  system.time({
mycolnames <- names( testdf.list[[ 1 ]] )
result <-
  setNames( data.frame( lapply( mycolnames
  , function( colidx ) {
  do.call( c
 , lapply( testdf.list
 , function( v ) {
 v[[ colidx ]]
   }
 )
 )
}
  )
  )
  , mycolnames
  )
  })
}
#toss the first one
tm.dfcolcat()
tms.dfcolcat <- data.frame( do.call( rbind, lapply( 1:N
  , tm.dfcolcat
  )

[R] > Understanding strip.default & strip.custom

2016-06-27 Thread p_connolly

I'm having difficulty following the help for those functions.

My plot has a single conditioning factor with 12 levels.  My
factor.levels in a call to strip.default looks like this:

 factor.levels = expression(Needles~ "::"~alpha -pinene,
Stems~ "::"~alpha -pinene,
Needles~ "::"~beta -pinene,
Stems~ "::"~beta -pinene,
Needles~ ":: B−Phellandrene",
Stems~ ":: B−Phellandrene",
Needles~ ":: Camphene",
Stems~ ":: Camphene",
Needles~ ":: Myrcene",
Stems~ ":: Myrcene",
Needles~ ":: Limonene",
Stems~ "::Limonene")

Since there is only one factor, which.given must be 1.
Likewise, var.name must be of length 1.

What I can't understand is the argument which.panel.  The help says:

which.panel: vector of integers as long as the number of conditioning
  variables. The contents are indices specifying the current
  levels of each of the conditioning variables (thus, this
  would be unique for each distinct packet).  This is identical
  to the return value of ‘which.packet’, which is a more
  accurate name.

So, that must be of length 1 also, according to the first sentence,
but if I set it to 1, I get the first strip label repeated 12 times.
Set it to 2, I get the second one 12 times.  Set it to 1:2, it
attempts to squash 2 strips in the space of 1, labelling the first
one.   I don't understand the second sentence at all.

What do I do to get all 12 in their correct order?

I couldn't find an example remotely like what I'm trying to do.  Are
there any pointers?

TIA
Patrick


Re: [R] What's box() (exactly) doing?

2016-06-27 Thread Marius Hofert
On Mon, Jun 27, 2016 at 5:42 PM, Greg Snow <538...@gmail.com> wrote:
> You can use the grconvertX and grconvertY functions to find the
> coordinates (in user coordinates to pass to rect) of the figure region
> (or other regions).
>
> Probably something like:
> grconvertX(c(0,1), from='nfc', to='user')
> grconvertY(c(0,1), from='nfc', to='user')

Hi Greg,

Thanks, that's good to know.

Cheers,
Marius



[R] Call subroutine init_random_seed () in R

2016-06-27 Thread Kodalore Vijayan, Vineetha W
I want to call the subroutine init_random_seed() in R. The subroutine is
defined as an example in the following link.

https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fSEED.html

subroutine init_random_seed()
use iso_fortran_env, only: int64
implicit none
integer, allocatable :: seed(:)
integer :: i, n, un, istat, dt(8), pid
integer(int64) :: t

call random_seed(size = n)
allocate(seed(n))
! First try if the OS provides a random number generator
open(newunit=un, file="/dev/urandom", access="stream", &
 form="unformatted", action="read", status="old", iostat=istat)
if (istat == 0) then
   read(un) seed
   close(un)
.
...

end subroutine init_random_seed

I do not know which arguments to pass when calling the function
.Fortran() in R. Any guidance is appreciated.



Re: [R] What's box() (exactly) doing?

2016-06-27 Thread Greg Snow
You can use the grconvertX and grconvertY functions to find the
coordinates (in user coordinates to pass to rect) of the figure region
(or other regions).

Probably something like:
grconvertX(c(0,1), from='nfc', to='user')
grconvertY(c(0,1), from='nfc', to='user')
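
A sketch of how those two calls could be combined with rect() to fill
the figure region. The null pdf device is only an assumption made so
the snippet runs without opening a window; the colours are taken from
the example quoted below.

```r
pdf(NULL)  # null device: nothing is written to disk

plot(NA, type = "n", ann = TRUE, axes = TRUE, xlim = 0:1, ylim = 0:1)

# Corners of the figure region, expressed in user coordinates
x <- grconvertX(c(0, 1), from = "nfc", to = "user")
y <- grconvertY(c(0, 1), from = "nfc", to = "user")

op <- par(xpd = TRUE)  # clip to the figure region, not the plot region
rect(xleft = x[1], ybottom = y[1], xright = x[2], ytop = y[2],
     col = adjustcolor("grey80", alpha.f = 0.5))
par(op)

box("figure", col = "red", lwd = 2)  # outline for comparison
dev.off()
```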




On Fri, Jun 24, 2016 at 8:19 PM, Marius Hofert
 wrote:
> Hi Jim,
>
> Here is a follow-up question: How would you replicate box("figure")
> (instead of box() = box("plot"))?
> I tried to fill the plotted box but there seems to be no argument to
> box("figure") that does that. If that's indeed the case, one could
> work again with rect() (thus replicating box("figure")), but how can
> one specify the exact location/width/height of the rectangle? (see
> example below)
>
> Cheers,
> M
>
> plot(NA, type = "n", ann = TRUE, axes = TRUE, xlim = 0:1, ylim = 0:1)
> box("figure", col = "red", lwd = 2) # how to fill?
>
> par(xpd = TRUE)
> width = 1.4 # obviously not correct...
> height <- width
> loc.x <- 0.5
> loc.y <- 0.5
> xleft <- loc.x-width/2
> xright <- loc.x+width/2
> ybottom <- loc.y-height/2
> ytop <- loc.y+height/2
> rect(xleft = xleft, ybottom = ybottom, xright = xright, ytop = ytop,
>  col = adjustcolor("grey80", alpha.f = 0.5))
> par(xpd = FALSE)
>
> On Fri, Jun 24, 2016 at 8:40 PM, Marius Hofert
>  wrote:
>> Hi Jim,
>>
>> Thanks a lot, exactly what I was looking for.
>>
>> Cheers,
>> Marius
>>
>>
>>
>> On Thu, Jun 23, 2016 at 11:06 PM, Jim Lemon  wrote:
>>> Hi Marius,
>>> There are a few things that are happening here. First, the plot area
>>> is not going to be the same as your x and y limits unless you say so:
>>>
>>> # run your first example
>>> par("usr")
>>> [1] -0.04  1.04 -0.04  1.04
>>>
>>> # but
>>> plot(NA, type = "n", ann = FALSE, axes = FALSE,
>>>  xlim = 0:1, ylim = 0:1,xaxs="i",yaxs="i")
>>> box()
>>> rect(xleft = 0, ybottom = 0, xright = 1, ytop = 1, col = "grey80")
>>> par("usr")
>>> [1] 0 1 0 1
>>>
>>> Second, the "rect" function is automatically clipped to the plot area,
>>> so you may lose a bit at the edges if you don't override this:
>>>
>>> par(xpd=TRUE)
>>> rect(...)
>>> par(xpd=FALSE)
>>>
>>> Finally your second example simply multiplies the first problem by
>>> specifying a layout of more than one plot. Applying the "xaxs" and
>>> "yaxs" parameters before you start plotting will fix this:
>>>
>>> par(xaxs="i",yaxs="i")
>>>
>>> Jim
>>>
>>> On Fri, Jun 24, 2016 at 12:29 PM, Marius Hofert
>>>  wrote:
>>>> Hi,
>>>>
>>>> I would like to replicate the behavior of box() with rect() (don't ask why).
>>>> However, my rect()angles are always too small. I looked a bit into the
>>>> internal C_box but
>>>> couldn't figure out how to solve the problem. Below is a minimal
>>>> working (and a slightly bigger) example.
>>>>
>>>> Cheers,
>>>> Marius
>>>>
>>>> ## MWE
>>>> plot(NA, type = "n", ann = FALSE, axes = FALSE, xlim = 0:1, ylim = 0:1)
>>>> rect(xleft = 0, ybottom = 0, xright = 1, ytop = 1, col = "grey80") # should match box()
>>>> box()
>>>>
>>>> ## Extended example
>>>>
>>>> ## Basic plot
>>>> my_rect <- function()
>>>> {
>>>>     plot(NA, type = "n", ann = FALSE, axes = FALSE, xlim = 0:1, ylim = 0:1)
>>>>     rect(xleft = 0, ybottom = 0, xright = 1, ytop = 1, col = "grey80")
>>>>     # should match box()
>>>>     box()
>>>> }
>>>>
>>>> ## Layout
>>>> lay <- matrix(0, nrow = 3, ncol = 3, byrow = TRUE)
>>>> lay[1,1] <- 1
>>>> lay[2,1] <- 2
>>>> lay[2,2] <- 3
>>>> lay[2,3] <- 4
>>>> lay[3,3] <- 5
>>>> layout(lay, heights = c(1, 10, 1), widths = c(10, 1, 10))
>>>> layout.show(5) # => no space between rectangles; calls box() to draw the boxes
>>>>
>>>> ## Fill layout
>>>> par(oma = rep(0, 4), mar = rep(0, 4))
>>>> my_rect()
>>>> my_rect()
>>>> my_rect()
>>>> my_rect()
>>>> my_rect()
>>>> ## => spaces between rectangles => why?/how to avoid?

>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] performance of do.call("rbind")

2016-06-27 Thread Hervé Pagès

Hi,

Note that if your list of 200k data frames is the result of splitting
a big data frame, then trying to rbind the result of the split is
equivalent to reordering the original big data frame. More precisely,

  do.call(rbind, unname(split(df, f)))

is equivalent to

  df[order(f), , drop=FALSE]

(except for the rownames), but the latter is *much* faster!
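
The equivalence can be checked on a small made-up example (the data
frame `df` and the factor `f` below are illustrative, not from the
thread). Row names are dropped on both sides before comparing, since
they are the stated exception:

```r
set.seed(1)
df <- data.frame(x = runif(12), y = seq_len(12))
f  <- factor(sample(c("a", "b", "c"), 12, replace = TRUE))

via_rbind <- do.call(rbind, unname(split(df, f)))  # split, then re-bind
via_order <- df[order(f), , drop = FALSE]          # reorder directly

# Ignore row names, then compare contents
rownames(via_rbind) <- NULL
rownames(via_order) <- NULL
same <- isTRUE(all.equal(via_rbind, via_order))
```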

Cheers,
H.


On 06/27/2016 08:51 AM, Witold E Wolski wrote:

> I have a list (variable name data.list) with approx 200k data.frames
> with dim(data.frame) approx 100x3.
>
> a call
>
> data <-do.call("rbind", data.list)
>
> does not complete - run time is prohibitive (I killed the rsession
> after 5 minutes).
>
> I would think that merging data.frame's is a common operation. Is
> there a better function (more performant) that I could use?
>
> Thank you.
> Witold






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [R] performance of do.call("rbind")

2016-06-27 Thread Sarah Goslee
That's not what I said, though, and it's not necessarily true. Growing
an object within a loop _is_ a slow process, but that's not the
problem here. The problem is using data frames instead of matrices.
The need to manage column classes is very costly. Converting to
matrices will almost always be enormously faster.

Here's an expansion of the previous example I posted, in four parts:
1. do.call with data frame - very slow - 34.317 s elapsed time for
2000 data frames
2. do.call with matrix - very fast - 0.311 s elapsed
3. pre-allocated loop with data frame - even slower (!) - 82.162 s
4. pre-allocated loop with matrix - also slow - 68.009 s

It matters whether the columns are converted to numeric or character,
and the time doesn't scale linearly with list length. For a particular
problem, the best solution may vary greatly (and I didn't even include
packages beyond the base functionality). In general, though, using
matrices is faster than using data frames, and using do.call is faster
than using a pre-allocated loop, which is much faster than growing an
object.

Sarah

> testsize <- 5000
>
> set.seed(1234)
> testdf <- data.frame(matrix(runif(300), nrow=100, ncol=3))
> testdf.list <- lapply(seq_len(testsize), function(x)testdf)
>
> system.time(r.df <- do.call("rbind", testdf.list))
   user  system elapsed
 34.280   0.009  34.317
>
> system.time({
+ testm.list <- lapply(testdf.list, as.matrix)
+ r.m <- do.call("rbind", testm.list)
+ })
   user  system elapsed
  0.310   0.000   0.311
>
> system.time({
+ l.df <- data.frame(matrix(NA, nrow=100 * testsize, ncol=3))
+ for(i in seq_len(testsize)) {
+ start <- (i-1)*100 + 1
+ end <- i*100
+ l.df[start:end, ] <- testdf.list[[i]]
+ }
+ })
   user  system elapsed
 81.890   0.069  82.162
>
> system.time({
+ l.m <- data.frame(matrix(NA, nrow=100 * testsize, ncol=3))
+ testm.list <- lapply(testdf.list, as.matrix)
+ for(i in seq_len(testsize)) {
+ start <- (i-1)*100 + 1
+ end <- i*100
+ l.m[start:end, ] <- testm.list[[i]]
+ }
+ })
   user  system elapsed
 67.664   0.047  68.009
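
Note that in the fourth block above, l.m is created with
data.frame(matrix(...)), so the loop still assigns into a data frame.
A sketch of the same loop filling a real matrix, assuming that was the
intent, with a smaller testsize so it runs quickly (no timings are
claimed here):

```r
testsize <- 200  # smaller than in the transcript, for a quick check

set.seed(1234)
testdf <- data.frame(matrix(runif(300), nrow = 100, ncol = 3))
testdf.list <- lapply(seq_len(testsize), function(x) testdf)
testm.list <- lapply(testdf.list, as.matrix)

# Pre-allocate a numeric matrix, not a data frame
l.m <- matrix(NA_real_, nrow = 100 * testsize, ncol = 3)
for (i in seq_len(testsize)) {
  start <- (i - 1) * 100 + 1
  end <- i * 100
  l.m[start:end, ] <- testm.list[[i]]
}

# Reference result for comparison
r.m <- do.call(rbind, testm.list)
```

Wrapping the loop in system.time() gives a timing comparable to the
others.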




On Mon, Jun 27, 2016 at 1:05 PM, Marc Schwartz  wrote:
> Hi,
>
> Just to add my tuppence, which might not even be worth that these days...
>
> I found the following blog post from 2013, which is likely dated to some 
> extent, but provided some benchmarks for a few methods:
>
>   
> http://rcrastinate.blogspot.com/2013/05/the-rbinding-race-for-vs-docall-vs.html
>
> There is also a comment with a reference there to using the data.table 
> package, which I don't use, but may be something to evaluate.
>
> As Bert and Sarah hinted at, there is overhead in taking the repetitive 
> piecemeal approach.
>
> If all of your data frames are of the exact same column structure (column 
> order, column types), it may be prudent to do your own pre-allocation of a 
> data frame that is the target row total size and then "insert" each "sub" 
> data frame by using row indexing into the target structure.
>
> Regards,
>
> Marc Schwartz
>
>
>> On Jun 27, 2016, at 11:54 AM, Witold E Wolski  wrote:
>>
>> Hi Bert,
>>
>> You are most likely right. I just thought that do.call("rbind", is
>> somehow more clever and allocates the memory up front. My error. After
>> more searching I did find rbind.fill from plyr which seems to do the
>> job (it computes the size of the result data.frame and allocates it
>> first).
>>
>> best
>>
>> On 27 June 2016 at 18:49, Bert Gunter  wrote:
>>> The following might be nonsense, as I have no understanding of R
>>> internals; but 
>>>
>>> "Growing" structures in R by iteratively adding new pieces is often
>>> warned to be inefficient when the number of iterations is large, and
>>> your rbind() invocation might fall under this rubric. If so, you might
>>> try  issuing the call say, 20 times, over 10k disjoint subsets of the
>>> list, and then rbinding up the 20 large frames.
>>>
>>> Again, caveat emptor.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>>
>>> On Mon, Jun 27, 2016 at 8:51 AM, Witold E Wolski  wrote:
>>>> I have a list (variable name data.list) with approx 200k data.frames
>>>> with dim(data.frame) approx 100x3.
>>>>
>>>> a call
>>>>
>>>> data <-do.call("rbind", data.list)
>>>>
>>>> does not complete - run time is prohibitive (I killed the rsession
>>>> after 5 minutes).
>>>>
>>>> I would think that merging data.frame's is a common operation. Is
>>>> there a better function (more performant) that I could use?
>>>>
>>>> Thank you.
>>>> Witold



Re: [R] performance of do.call("rbind")

2016-06-27 Thread Marc Schwartz
Hi,

Just to add my tuppence, which might not even be worth that these days...

I found the following blog post from 2013, which is likely dated to some 
extent, but provided some benchmarks for a few methods:

  
http://rcrastinate.blogspot.com/2013/05/the-rbinding-race-for-vs-docall-vs.html

There is also a comment with a reference there to using the data.table package, 
which I don't use, but may be something to evaluate.

As Bert and Sarah hinted at, there is overhead in taking the repetitive 
piecemeal approach.

If all of your data frames are of the exact same column structure (column 
order, column types), it may be prudent to do your own pre-allocation of a data 
frame that is the target row total size and then "insert" each "sub" data frame 
by using row indexing into the target structure.

Regards,

Marc Schwartz


> On Jun 27, 2016, at 11:54 AM, Witold E Wolski  wrote:
> 
> Hi Bert,
> 
> You are most likely right. I just thought that do.call("rbind", is
> somehow more clever and allocates the memory up front. My error. After
> more searching I did find rbind.fill from plyr which seems to do the
> job (it computes the size of the result data.frame and allocates it
> first).
> 
> best
> 
> On 27 June 2016 at 18:49, Bert Gunter  wrote:
>> The following might be nonsense, as I have no understanding of R
>> internals; but 
>> 
>> "Growing" structures in R by iteratively adding new pieces is often
>> warned to be inefficient when the number of iterations is large, and
>> your rbind() invocation might fall under this rubric. If so, you might
>> try  issuing the call say, 20 times, over 10k disjoint subsets of the
>> list, and then rbinding up the 20 large frames.
>> 
>> Again, caveat emptor.
>> 
>> Cheers,
>> Bert
>> 
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> 
>> 
>> On Mon, Jun 27, 2016 at 8:51 AM, Witold E Wolski  wrote:
>>> I have a list (variable name data.list) with approx 200k data.frames
>>> with dim(data.frame) approx 100x3.
>>> 
>>> a call
>>> 
>>> data <-do.call("rbind", data.list)
>>> 
>>> does not complete - run time is prohibitive (I killed the rsession
>>> after 5 minutes).
>>> 
>>> I would think that merging data.frame's is a common operation. Is
>>> there a better function (more performant) that I could use?
>>> 
>>> Thank you.
>>> Witold
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Witold Eryk Wolski
>>> 
> 
> 
> 
> -- 
> Witold Eryk Wolski
> 



Re: [R] performance of do.call("rbind")

2016-06-27 Thread Jeff Newmiller
Your description of the data frames as "approx" puts the solution to 
considerable difficulty and speed penalty. If you want better performance you 
need a better handle on the data you are working with. 

For example, if you knew that every data frame had exactly three columns named 
identically and exactly 100 rows, then you could preallocate the result data 
frame and loop through the input data copying values directly to the 
appropriate destination locations in the result. 

To the extent that you can figure out things like the union of all column names 
or the total number of rows prior to starting copying data, you can adapt the 
above approach even if the input data frames are not identical. The key is not 
having to restructure/reallocate your result data frame as you go. 

The bind_rows function in the dplyr package can do a lot of this for you... but 
being a general-purpose function it may not be as optimized as you could do 
yourself with better knowledge of your data. 
-- 
Sent from my phone. Please excuse my brevity.

On June 27, 2016 8:51:17 AM PDT, Witold E Wolski  wrote:
>I have a list (variable name data.list) with approx 200k data.frames
>with dim(data.frame) approx 100x3.
>
>a call
>
>data <-do.call("rbind", data.list)
>
>does not complete - run time is prohibitive (I killed the rsession
>after 5 minutes).
>
>I would think that merging data.frame's is a common operation. Is
>there a better function (more performant) that I could use?
>
>Thank you.
>Witold



Re: [R] performance of do.call("rbind")

2016-06-27 Thread Witold E Wolski
Hi Bert,

You are most likely right. I just thought that do.call("rbind", is
somehow more clever and allocates the memory up front. My error. After
more searching I did find rbind.fill from plyr which seems to do the
job (it computes the size of the result data.frame and allocates it
first).

best

On 27 June 2016 at 18:49, Bert Gunter  wrote:
> The following might be nonsense, as I have no understanding of R
> internals; but 
>
> "Growing" structures in R by iteratively adding new pieces is often
> warned to be inefficient when the number of iterations is large, and
> your rbind() invocation might fall under this rubric. If so, you might
> try  issuing the call say, 20 times, over 10k disjoint subsets of the
> list, and then rbinding up the 20 large frames.
>
> Again, caveat emptor.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Jun 27, 2016 at 8:51 AM, Witold E Wolski  wrote:
>> I have a list (variable name data.list) with approx 200k data.frames
>> with dim(data.frame) approx 100x3.
>>
>> a call
>>
>> data <-do.call("rbind", data.list)
>>
>> does not complete - run time is prohibitive (I killed the rsession
>> after 5 minutes).
>>
>> I would think that merging data.frame's is a common operation. Is
>> there a better function (more performant) that I could use?
>>
>> Thank you.
>> Witold
>>
>>
>>
>>
>> --
>> Witold Eryk Wolski
>>



-- 
Witold Eryk Wolski



Re: [R] performance of do.call("rbind")

2016-06-27 Thread Sarah Goslee
There is a substantial overhead in rbind.data.frame() because of the
need to check the column types. Converting to matrix makes a huge
difference in speed, but be careful of type coercion.

testdf <- data.frame(matrix(runif(300), nrow=100, ncol=3))
testdf.list <- lapply(1:1, function(x)testdf)

system.time(r.df <- do.call("rbind", testdf.list))

system.time({
testm.list <- lapply(testdf.list, as.matrix)
r.m <- do.call("rbind", testm.list)
})


> testdf <- data.frame(matrix(runif(300), nrow=100, ncol=3))
> testdf.list <- lapply(1:1, function(x)testdf)
>
> system.time(r.df <- do.call("rbind", testdf.list))
   user  system elapsed
195.105  36.419 231.930
>
> system.time({
+ testm.list <- lapply(testdf.list, as.matrix)
+ r.m <- do.call("rbind", testm.list)
+ })
   user  system elapsed
  0.603   0.009   0.612
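
The type-coercion caveat mentioned above can be seen in a small
sketch; both data frames here are made up for illustration. A matrix
holds a single type, so one character column turns everything into
character:

```r
# Mixed column types: as.matrix() falls back to character
mixed <- data.frame(id = 1:3, label = c("a", "b", "c"),
                    stringsAsFactors = FALSE)
m <- as.matrix(mixed)
mixed_type <- typeof(m)

# All-numeric columns: the matrix stays numeric
numeric_only <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5))
numeric_type <- typeof(as.matrix(numeric_only))
```

Here mixed_type is "character" and numeric_type is "double", so the
matrix route is only safe when every column shares a compatible type.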

Sarah

On Mon, Jun 27, 2016 at 11:51 AM, Witold E Wolski  wrote:
> I have a list (variable name data.list) with approx 200k data.frames
> with dim(data.frame) approx 100x3.
>
> a call
>
> data <-do.call("rbind", data.list)
>
> does not complete - run time is prohibitive (I killed the rsession
> after 5 minutes).
>
> I would think that merging data.frame's is a common operation. Is
> there a better function (more performant) that I could use?
>
> Thank you.
> Witold
>
>
>



Re: [R] performance of do.call("rbind")

2016-06-27 Thread Bert Gunter
The following might be nonsense, as I have no understanding of R
internals; but 

"Growing" structures in R by iteratively adding new pieces is often
warned to be inefficient when the number of iterations is large, and
your rbind() invocation might fall under this rubric. If so, you might
try  issuing the call say, 20 times, over 10k disjoint subsets of the
list, and then rbinding up the 20 large frames.

Again, caveat emptor.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Jun 27, 2016 at 8:51 AM, Witold E Wolski  wrote:
> I have a list (variable name data.list) with approx 200k data.frames
> with dim(data.frame) approx 100x3.
>
> a call
>
> data <-do.call("rbind", data.list)
>
> does not complete - run time is prohibitive (I killed the rsession
> after 5 minutes).
>
> I would think that merging data.frame's is a common operation. Is
> there a better function (more performant) that I could use?
>
> Thank you.
> Witold
>
>
>
>
> --
> Witold Eryk Wolski
>


[R] refclasses question - Defining accessor function

2016-06-27 Thread Witold E Wolski
Are accessors a fancy feature that does not work?

I wanted to use accessor functions in an R refclass to hide the class's
implementation, where I am using sqlite.

What I observed is that if a method accesses any of the fields (in the
example below, field .data in method printExample), all the accessor
functions are called, even those the method does not use (in this case
funnydata is not accessed).

That means that if any of the accessor functions is slow (in the example
the funnydata accessor sleeps for 3 s), all the fields and all
functions accessing any field will be slow.
In the example below, accessing .data or calling printExample takes 3 s.

It's easy enough not to use accessor functions, so not a big deal.
Still, I lost quite a bit of time wondering what is happening.



Test <- setRefClass("Test",
  fields = list(
    .data = "list",
    funnydata = function(x) {
      Sys.sleep(3)
      if (missing(x)) {
        Sys.sleep(3)
      }
    }
  ),
  methods = list(
    initialise = function() { .data <<- list(a = "a") },

    printExample = function() {
      print("print Example")
      print(.data)
    }
  )
)

test<-Test()
test$printExample()
test$.data

-- 
Witold Eryk Wolski



[R] performance of do.call("rbind")

2016-06-27 Thread Witold E Wolski
I have a list (variable name data.list) with approx 200k data.frames
with dim(data.frame) approx 100x3.

a call

data <-do.call("rbind", data.list)

does not complete - run time is prohibitive (I killed the rsession
after 5 minutes).

I would think that merging data.frame's is a common operation. Is
there a better function (more performant) that I could use?

Thank you.
Witold




-- 
Witold Eryk Wolski



Re: [R] API key at start-up

2016-06-27 Thread Jeff Newmiller
"Assign ... key to a value" defies my understanding of those terms, and 
includes no context (API is a very vague term). We are not (necessarily) 
subject area experts in your preferred domain of jargon. 

Doing things when you start up your session is typically done as described in

?Startup
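
One common pattern described in ?Startup, sketched here under the assumption that the key is a plain string: put a NAME=value line in ~/.Renviron, which R reads at start-up, and read it back with Sys.getenv(). The key value below is a placeholder, and the Sys.setenv() call only simulates the start-up file so that the snippet runs on its own.

```r
# In ~/.Renviron (read at start-up), one line such as:
#   FREDAPI=placeholder-key
# Simulated here so the example is self-contained:
Sys.setenv(FREDAPI = "placeholder-key")

# In the session (or in ~/.Rprofile, which is sourced at start-up):
FREDAPI <- Sys.getenv("FREDAPI")
nzchar(FREDAPI)  # TRUE when the variable is set
```

Keeping the key in an environment file rather than in a script also keeps it out of shared code.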
-- 
Sent from my phone. Please excuse my brevity.

On June 27, 2016 7:25:24 AM PDT, Glenn Schultz  wrote:
>All,
>
>Is there a way to assign an API key to a value FREDAPI which is loaded
>and available once an R session has started?
>
>Glenn


Re: [R] Fwd: Fwd: RE: Heatmap.2 Breaks argument

2016-06-27 Thread Jeff Newmiller
Please use Reply-All to keep the mailing list included in the conversation. I 
don't do private consulting via the Internet, and others can correct me if I 
give bad advice. 

I doubt the maintainer function "doesn't work"... more likely you did not read 
the help file to learn how to use it:

?maintainer

and tried to give it the name of a function instead of the name of the package 
containing that function:

# correct usage
maintainer( "gplots" )

You can confirm which package a function like heatmap.2 is in by reading the 
help file for that function:

?heatmap.2

or

help.search( "heatmap.2" )

if you have installed that package but not yet loaded it using library().
-- 
Sent from my phone. Please excuse my brevity.

On June 27, 2016 7:14:01 AM PDT, fgoetz  wrote:
>Hi Jeff,
>
>I just tried the maintainer function but it did not work for heatmap.2.
>
>Best,
>
>Florian
>
>
>On 24.06.2016 at 17:08, Jeff Newmiller wrote:
>> Did you try the maintainer() function?



[R] API key at start-up

2016-06-27 Thread Glenn Schultz
All,

Is there a way to assign an API key to a value FREDAPI which is loaded and 
available once an R session has started?

Glenn


Re: [R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread PIKAL Petr
Hi

On top of what Duncan wrote you can check results yourself

> str(iris[,"Sepal.Length"])
num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

Here you get a vector, and the data.frame class is lost. The result is the same as
> str(iris$Sepal.Length)
num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

However

> str(iris["Sepal.Length"])
'data.frame':   150 obs. of  1 variable:
$ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

here the data frame class is preserved.

I subscript rectangular objects (arrays, matrices, data frames) exclusively by

object[ , , something, ] or in case of data frame object.df[sub, ]

and lists with

object.l[sub] or object.l[[sub]]

so I never had problems with it.

Cheers
Petr


From: g.maub...@weinwolf.de [mailto:g.maub...@weinwolf.de]
Sent: Monday, June 27, 2016 1:43 PM
To: PIKAL Petr 
Cc: r-help@r-project.org
Subject: Antwort: RE: [R] Antwort: Fw: Re: Subscripting problem with is.na()

Hi Petr,

many thanks for your reply and the examples.

My subscripting problems drive me nuts.

I have understood that dataset[variable] is semantically identical to dataset[, 
variable] cause dataset[variable] takes all cases because no other subscripts 
are given.

Where can I lookup the rules when to use the comma and when not?

Kind regards

Georg





From: PIKAL Petr
To: "g.maub...@weinwolf.de"
Cc: "r-help@r-project.org"
Date: 27.06.2016 11:03
Subject: RE: [R] Antwort: Fw: Re: Subscripting problem with is.na()




Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Monday, June 27, 2016 10:45 AM
> To: David L Carlson >; Bert Gunter
> >
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na()
>
> Hi David,
> Hi Bert,
>
> many thanks for the valuable discussion on NA in R (please see extract
> below). I follow your arguments leaving NA as they are for most of the
> time. In special occasions however I want to replace the NA with another
> value. To preserve the newly acquired knowledge for me I wrote this
> function:
>
> -- cut --
> t_replace_na <- function(dataset, variable, value) {
>  if(inherits(dataset[[variable]], "factor") == TRUE) {
>    dataset[variable] <- as.character(dataset[variable])
>    print(class(dataset[variable]))
>    dataset[, variable][is.na(dataset[, variable])] <- value
>    dataset[variable] <- as.factor(dataset[variable])
>    print(class(dataset[variable]))
>  } else {
>dataset[, variable][is.na(dataset[, variable])] <- value
>  }
>  return(dataset)
> }
>



> class(ds_test[, "c"])
> test_class(ds_test, "c")
> warning("'c' should be factor NOT data.frame.
> In addition data.frame != factor")
> -- cut --
>
> Why do I get different results for the same function if it is inside or
> outside my own function definition?

Because you still are missing the way how to subscript data frames.

test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[, variable]), TRUE))
  } else {
    return(c(class(dataset[, variable]), FALSE))
  }
}

> test_class(ds_test, "a")
[1] "numeric" "FALSE"
> test_class(ds_test, "c")
[1] "factor" "TRUE"
>

If you properly arrange commas in your function you get desired result

p_replace_na <- function(dataset, variable, value) {
if(inherits(dataset[,variable], "factor") == TRUE) {
  dataset[,variable] <- as.character(dataset[,variable])
  print(class(dataset[,variable]))
  dataset[, variable][is.na(dataset[, variable])] <- value
  dataset[, variable] <- as.factor(dataset[, variable])
  print(class(dataset[, variable]))
} else {
  dataset[, variable][is.na(dataset[, variable])] <- value
}
return(dataset)
}

> p_replace_na(ds_test, "c", value = -3)
[1] "character"
[1] "factor"
  a  b  c
1  1 NA  A
2 NA NA  b
3  2 NA -3

> t_replace_na(ds_test, "c", value = -3)
[1] "data.frame"
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
>

Cheers
Petr



>
> Kind regards
>
> Georg
>
> 
>
> > Sent: Thursday, 23 June 2016 at 21:14
> > From: "David L Carlson"
> > To: "Bert Gunter"
> > Cc: "R Help"
> > Subject: Re: [R] Subscripting problem with is.na()
> >
> > Good point. I did not 

Re: [R] object 'add.expr' not found

2016-06-27 Thread Duncan Murdoch

On 27/06/2016 3:52 AM, Christian Hoffmann wrote:

Since the change to R-3.2.1 I seem to be unable to compile and install
my package cwhmisc. One piece of evidence is the appearance of the
messages in R CMD build and install:

* installing *source* package 'cwhmisc' ...
** R
** inst
** preparing package for lazy loading
Error in eval(expr, envir, enclos) : object 'add.expr' not found

Can anyone give me a pointer to where I should investigate my problem
further?

C.

"add.expr" is an argument to the heatmap() function.  If you're using 
that, make sure your calls are okay.  If not, you'll need to find what 
other function is using add.expr; there are no other base functions that 
use it.


Duncan Murdoch



[R] object 'add.expr' not found

2016-06-27 Thread Christian Hoffmann
Since the change to R-3.2.1 I seem to be unable to compile and install 
my package cwhmisc. One piece of evidence is the appearance of the 
messages in R CMD build and install:


* installing *source* package 'cwhmisc' ...
** R
** inst
** preparing package for lazy loading
Error in eval(expr, envir, enclos) : object 'add.expr' not found

Can anyone give me a pointer to where I should investigate my problem 
further?


C.

--
Christian W. Hoffmann
CH - 8915 Hausen am Albis, Schweiz
Rigiblickstrasse 15 b, Tel.+41-44-7640853
mailto: christ...@echoffmann.ch
home: www.echoffmann.ch



[R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread G . Maubach
Hi All,

Petr, Bert, David, Ivan, Duncan and Rui helped me to develop a function 
able to replace NA's in variables IF NEEDED:

#---
# Module: t_replace_na.R
# Author: Georg Maubach
# Date  : 2016-06-27
# Update: 2016-06-27
# Description   : Replace NA with another value
# Source System : R 3.3.0 (64 Bit)
# Target System : R 3.3.0 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#1-2-3-4-5-6-7-8

t_version = "2016-06-27"
t_module_name = "t_replace_na.R"

cat(
  paste0("\n",
 t_module_name, " (Version: ", t_version, ")", "\n", "\n",
 "This software comes with ABSOLUTELY NO WARRANTY.",
 "\n", "\n"))

# If do_test is not defined globally, define it here locally by
# un-commenting it
t_do_test <- FALSE

# [ Function Definition ]
t_replace_na <- function(dataset, variables, value) {
  # Replace NA with another given value
  #
  # Args:
  #   dataset (data frame, data table):
  #     Object with dimnames, e.g. data frame, data table.
  #   variables (character vector):
  #     List of variable names.
  #   value:
  #     Replacement for NA.
  #
  # Operation:
  #   NA is replaced by the value given with the parameter "value".
  #
  #   A factor is converted explicitly with as.character(), the missing
  #   value replacement is done and then the character vector is converted
  #   back with as.factor(). Thus the replacement value becomes a category
  #   of the new factor variable.
  #
  # Caution:
  #   Please check your data in case you replace NA within factors due to
  #   explicit type conversion. Tests were done only for the below given
  #   dataset.
  #
  # Returns:
  #   The dataset with NA replaced in the given variables.
  #
  # Error handling:
  #   None.
  #
  # Credits:
  #   https://www.mail-archive.com/r-help@r-project.org/msg236537.html

  for (variable in variables) {
    if (inherits(dataset[, variable], "factor") == TRUE) {
      dataset[, variable] <- as.character(dataset[, variable])
      print(class(dataset[, variable]))
      dataset[, variable][is.na(dataset[, variable])] <- value
      dataset[, variable] <- as.factor(dataset[, variable])
      print(class(dataset[, variable]))
    } else {
      dataset[, variable][is.na(dataset[, variable])] <- value
    }
  }
  return(dataset)
}

# [ Test Definition ]
t_test <- function(do_test = FALSE) {
  if (do_test == TRUE) {
    cat("\n", "\n", "Test function t_replace_na()", "\n", "\n")

    # Example dataset
    ds_example <- data.frame(a = c(1, NA, 2), b = rep(NA, 3),
                             c = c("A", "b", NA))

    cat("\n", "\n", "Example dataset before function call", "\n", "\n")
    cat("Variables and their classes:\n")
    print(sapply(ds_example, class))
    cat("Dataset:\n")
    print(ds_example)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "a", value = -1)
    cat("\n", "\n", "Dataset after function call", "\n", "\n")
    print(ds_result)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "b", value = -2)
    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_result)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "c", value = -3)
    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_result)
  }
}

# [ Test Run ]--
t_test(do_test = t_do_test)

# [ Clean up ]--
rm("t_module_name", "t_version", "t_do_test", "t_test")

# EOF .
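
A short usage sketch for several variables at once. It assumes a condensed copy of the function above without the print() calls; the name replace_na_sketch is hypothetical, and column c is created with factor() explicitly so the example does not depend on the stringsAsFactors default.

```r
# Condensed, hypothetical copy of t_replace_na() for illustration only
replace_na_sketch <- function(dataset, variables, value) {
  for (variable in variables) {
    if (is.factor(dataset[, variable])) {
      col <- as.character(dataset[, variable])  # factor -> character
      col[is.na(col)] <- value                  # value is coerced to character
      dataset[, variable] <- as.factor(col)     # back to factor
    } else {
      dataset[, variable][is.na(dataset[, variable])] <- value
    }
  }
  dataset
}

ds_example <- data.frame(a = c(1, NA, 2), b = rep(NA, 3),
                         c = factor(c("A", "b", NA)))
ds_result <- replace_na_sketch(ds_example, c("a", "c"), value = -1)

ds_result$a          # NA in column a replaced by -1
levels(ds_result$c)  # "-1" is now one of the factor levels
```

Column b is left untouched because it was not named in the variables argument.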

Please note: R has capabilities to handle NA correctly. There is often no 
need to recode NA. Also NA might or might not have meaning. You have to 
decide with regard to the meaning of the original data and the business 
problem.

Kind regards

Georg




From: PIKAL Petr
To: "g.maub...@weinwolf.de"
Cc: "r-help@r-project.org"
Date: 27.06.2016 11:03
Subject: RE: [R] Antwort: Fw: Re: Subscripting problem with is.na()



Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Monday, June 27, 2016 10:45 AM
> To: David L Carlson ; Bert Gunter
> 
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na()
>
> Hi David,
> Hi Bert,
>
> many thanks for the valuable discussion on NA in R (please see extract
> below). I follow your arguments leaving NA as they are for most of the
> time. In special occasions however I want to 

Re: [R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread Duncan Murdoch

On 27/06/2016 7:43 AM, g.maub...@weinwolf.de wrote:

Hi Petr,

many thanks for your reply and the examples.

My subscripting problems drive me nuts.

I have understood that dataset[variable] is semantically identical to
dataset[, variable] cause dataset[variable] takes all cases because no
other subscripts are given.

Where can I lookup the rules when to use the comma and when not?


I don't think you'll find an explicit list of rules in the R 
documentation.  It does imply the rules, however, by saying that a data 
frame is a list which can be indexed as a matrix.


So if you want to treat your dataset as a list of columns, use single 
component list indexing:  dataset[columnname] to give another list, 
dataset[[columnname]] to extract the column as a vector.


If you want to treat it as a matrix of values, use two indices:

dataset[row, column]

to extract the entry (or entries, if row or column contains more than 
one value).
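
The two views can be compared side by side in a small sketch. The data frame ds is an assumption modelled on the example used elsewhere in this thread, with the factor created explicitly.

```r
ds <- data.frame(a = c(1, NA, 2), c = factor(c("A", "b", NA)))

# List-style indexing (one index, no comma)
class(ds["c"])     # single bracket: a one-column data frame
class(ds[["c"]])   # double bracket: the column itself, a factor

# Matrix-style indexing (row, column)
class(ds[, "c"])   # all rows of one column: the column itself, a factor
class(ds[2, "a"])  # a single cell: numeric
```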


Duncan Murdoch



Kind regards

Georg





From: PIKAL Petr
To: "g.maub...@weinwolf.de"
Cc: "r-help@r-project.org"
Date: 27.06.2016 11:03
Subject: RE: [R] Antwort: Fw: Re: Subscripting problem with is.na()



Hi

see in line


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
g.maub...@weinwolf.de
Sent: Monday, June 27, 2016 10:45 AM
To: David L Carlson ; Bert Gunter

Cc: r-help@r-project.org
Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na()

Hi David,
Hi Bert,

many thanks for the valuable discussion on NA in R (please see extract
below). I follow your arguments leaving NA as they are for most of the
time. In special occasions however I want to replace the NA with another
value. To preserve the newly acquired knowledge for me I wrote this
function:

-- cut --
t_replace_na <- function(dataset, variable, value) {
 if(inherits(dataset[[variable]], "factor") == TRUE) {
   dataset[variable] <- as.character(dataset[variable])
   print(class(dataset[variable]))
   dataset[, variable][is.na(dataset[, variable])] <- value
   dataset[variable] <- as.factor(dataset[variable])
   print(class(dataset[variable]))
 } else {
   dataset[, variable][is.na(dataset[, variable])] <- value
 }
 return(dataset)
}






class(ds_test[, "c"])
test_class(ds_test, "c")
warning("'c' should be factor NOT data.frame.
In addition data.frame != factor")
-- cut --

Why do I get different results for the same function if it is inside or
outside my own function definition?


Because you still are missing the way how to subscript data frames.

test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[, variable]), TRUE))
  } else {
    return(c(class(dataset[, variable]), FALSE))
  }
}


test_class(ds_test, "a")

[1] "numeric" "FALSE"

test_class(ds_test, "c")

[1] "factor" "TRUE"




If you properly arrange commas in your function you get desired result

p_replace_na <- function(dataset, variable, value) {
 if(inherits(dataset[,variable], "factor") == TRUE) {
   dataset[,variable] <- as.character(dataset[,variable])
   print(class(dataset[,variable]))
   dataset[, variable][is.na(dataset[, variable])] <- value
   dataset[, variable] <- as.factor(dataset[, variable])
   print(class(dataset[, variable]))
 } else {
   dataset[, variable][is.na(dataset[, variable])] <- value
 }
 return(dataset)
}


p_replace_na(ds_test, "c", value = -3)

[1] "character"
[1] "factor"
   a  b  c
1  1 NA  A
2 NA NA  b
3  2 NA -3


t_replace_na(ds_test, "c", value = -3)

[1] "data.frame"
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?




Cheers
Petr





Kind regards

Georg




Sent: Thursday, 23 June 2016 at 21:14
From: "David L Carlson"
To: "Bert Gunter"
Cc: "R Help"
Subject: Re: [R] Subscripting problem with is.na()

Good point. I did not think about factors. Also your example raises

another issue since column c is logical, but gets silently converted to
numeric. This would seem to get the job done assuming the conversion is
intended for numeric columns only:



test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
sapply(test, class)

        a         b         c
"numeric"  "factor" "logical"

num <- sapply(test, is.numeric)
test[, num][is.na(test[, num])] <- 0
test

  a    b  c
1 1    A NA
2 0    b NA
3 2 <NA> NA

David C





[R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread G . Maubach
Hi Petr,

many thanks for your reply and the examples.

My subscripting problems drive me nuts.

I have understood that dataset[variable] is semantically identical to 
dataset[, variable] cause dataset[variable] takes all cases because no 
other subscripts are given.

Where can I lookup the rules when to use the comma and when not?

Kind regards

Georg





From: PIKAL Petr
To: "g.maub...@weinwolf.de"
Cc: "r-help@r-project.org"
Date: 27.06.2016 11:03
Subject: RE: [R] Antwort: Fw: Re: Subscripting problem with is.na()



Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Monday, June 27, 2016 10:45 AM
> To: David L Carlson ; Bert Gunter
> 
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na()
>
> Hi David,
> Hi Bert,
>
> many thanks for the valuable discussion on NA in R (please see extract
> below). I follow your arguments leaving NA as they are for most of the
> time. In special occasions however I want to replace the NA with another
> value. To preserve the newly acquired knowledge for me I wrote this
> function:
>
> -- cut --
> t_replace_na <- function(dataset, variable, value) {
>  if(inherits(dataset[[variable]], "factor") == TRUE) {
>    dataset[variable] <- as.character(dataset[variable])
>    print(class(dataset[variable]))
>    dataset[, variable][is.na(dataset[, variable])] <- value
>    dataset[variable] <- as.factor(dataset[variable])
>    print(class(dataset[variable]))
>  } else {
>dataset[, variable][is.na(dataset[, variable])] <- value
>  }
>  return(dataset)
> }
>



> class(ds_test[, "c"])
> test_class(ds_test, "c")
> warning("'c' should be factor NOT data.frame.
> In addition data.frame != factor")
> -- cut --
>
> Why do I get different results for the same function if it is inside or
> outside my own function definition?

Because you still are missing the way how to subscript data frames.

test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[, variable]), TRUE))
  } else {
    return(c(class(dataset[, variable]), FALSE))
  }
}

> test_class(ds_test, "a")
[1] "numeric" "FALSE"
> test_class(ds_test, "c")
[1] "factor" "TRUE"
>

If you properly arrange commas in your function you get desired result

p_replace_na <- function(dataset, variable, value) {
 if(inherits(dataset[,variable], "factor") == TRUE) {
   dataset[,variable] <- as.character(dataset[,variable])
   print(class(dataset[,variable]))
   dataset[, variable][is.na(dataset[, variable])] <- value
   dataset[, variable] <- as.factor(dataset[, variable])
   print(class(dataset[, variable]))
 } else {
   dataset[, variable][is.na(dataset[, variable])] <- value
 }
 return(dataset)
}

> p_replace_na(ds_test, "c", value = -3)
[1] "character"
[1] "factor"
   a  b  c
1  1 NA  A
2 NA NA  b
3  2 NA -3

> t_replace_na(ds_test, "c", value = -3)
[1] "data.frame"
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
>

Cheers
Petr



>
> Kind regards
>
> Georg
>
> 
>
> > Sent: Thursday, 23 June 2016 at 21:14
> > From: "David L Carlson"
> > To: "Bert Gunter"
> > Cc: "R Help"
> > Subject: Re: [R] Subscripting problem with is.na()
> >
> > Good point. I did not think about factors. Also your example raises
> another issue since column c is logical, but gets silently converted to
> numeric. This would seem to get the job done assuming the conversion is
> intended for numeric columns only:
> >
> > > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> > > sapply(test, class)
> >         a         b         c
> > "numeric"  "factor" "logical"
> > > num <- sapply(test, is.numeric)
> > > test[, num][is.na(test[, num])] <- 0
> > > test
> >   a    b  c
> > 1 1    A NA
> > 2 0    b NA
> > 3 2 <NA> NA
> >
> > David C
>

Re: [R] Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread PIKAL Petr
Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Monday, June 27, 2016 10:45 AM
> To: David L Carlson ; Bert Gunter
> 
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na()
>
> Hi David,
> Hi Bert,
>
> many thanks for the valuable discussion on NA in R (please see extract
> below). I follow your arguments leaving NA as they are for most of the
> time. In special occasions however I want to replace the NA with another
> value. To preserve the newly acquired knowledge for me I wrote this
> function:
>
> -- cut --
> t_replace_na <- function(dataset, variable, value) {
>  if(inherits(dataset[[variable]], "factor") == TRUE) {
>    dataset[variable] <- as.character(dataset[variable])
>    print(class(dataset[variable]))
>    dataset[, variable][is.na(dataset[, variable])] <- value
>    dataset[variable] <- as.factor(dataset[variable])
>    print(class(dataset[variable]))
>  } else {
>dataset[, variable][is.na(dataset[, variable])] <- value
>  }
>  return(dataset)
> }
>



> class(ds_test[, "c"])
> test_class(ds_test, "c")
> warning("'c' should be factor NOT data.frame.
> In addition data.frame != factor")
> -- cut --
>
> Why do I get different results for the same function if it is inside or
> outside my own function definition?

Because you still are missing the way how to subscript data frames.

test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[, variable]), TRUE))
  } else {
    return(c(class(dataset[, variable]), FALSE))
  }
}

> test_class(ds_test, "a")
[1] "numeric" "FALSE"
> test_class(ds_test, "c")
[1] "factor" "TRUE"
>

If you properly arrange commas in your function you get desired result

p_replace_na <- function(dataset, variable, value) {
 if(inherits(dataset[,variable], "factor") == TRUE) {
   dataset[,variable] <- as.character(dataset[,variable])
   print(class(dataset[,variable]))
   dataset[, variable][is.na(dataset[, variable])] <- value
   dataset[, variable] <- as.factor(dataset[, variable])
   print(class(dataset[, variable]))
 } else {
   dataset[, variable][is.na(dataset[, variable])] <- value
 }
 return(dataset)
}

> p_replace_na(ds_test, "c", value = -3)
[1] "character"
[1] "factor"
   a  b  c
1  1 NA  A
2 NA NA  b
3  2 NA -3

> t_replace_na(ds_test, "c", value = -3)
[1] "data.frame"
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
>
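
A small sketch of why the version without commas misbehaves: as.character() applied to a one-column data frame (single bracket, no comma) does not convert the column element by element. The data frame mirrors ds_test from this thread, with the factor created explicitly as an assumption so the result does not depend on the stringsAsFactors default.

```r
ds_test <- data.frame(a = c(1, NA, 2), b = rep(NA, 3),
                      c = factor(c("A", "b", NA)))

# One-column data frame: the whole column collapses into a single string
x_df <- as.character(ds_test["c"])
length(x_df)

# The column itself: one string per row, NA preserved
x_col <- as.character(ds_test[, "c"])
x_col
```

The same distinction explains the class printed as "data.frame" inside t_replace_na() and the subsequent error from as.factor().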

Cheers
Petr



>
> Kind regards
>
> Georg
>
> 
>
> > Sent: Thursday, 23 June 2016 at 21:14
> > From: "David L Carlson"
> > To: "Bert Gunter"
> > Cc: "R Help"
> > Subject: Re: [R] Subscripting problem with is.na()
> >
> > Good point. I did not think about factors. Also your example raises
> another issue since column c is logical, but gets silently converted to
> numeric. This would seem to get the job done assuming the conversion is
> intended for numeric columns only:
> >
> > > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> > > sapply(test, class)
> >         a         b         c
> > "numeric"  "factor" "logical"
> > > num <- sapply(test, is.numeric)
> > > test[, num][is.na(test[, num])] <- 0
> > > test
> >   a    b  c
> > 1 1    A NA
> > 2 0    b NA
> > 3 2 <NA> NA
> >
> > David C
>



[R] Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread G . Maubach
Hi David,
Hi Bert,

many thanks for the valuable discussion on NA in R (please see extract 
below). I follow your arguments leaving NA as they are for most of the 
time. In special occasions however I want to replace the NA with another 
value. To preserve the newly acquired knowledge for me I wrote this 
function:

-- cut --
t_replace_na <- function(dataset, variable, value) {
 if(inherits(dataset[[variable]], "factor") == TRUE) {
   dataset[variable] <- as.character(dataset[variable])
   print(class(dataset[variable]))
   dataset[, variable][is.na(dataset[, variable])] <- value
   dataset[variable] <- as.factor(dataset[variable])
   print(class(dataset[variable]))
 } else {
   dataset[, variable][is.na(dataset[, variable])] <- value
 }
 return(dataset)
}

ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA))
print(sapply(ds_test, class))

t_replace_na(ds_test, "a", value = -1)
t_replace_na(ds_test, "b", value = -2)
t_replace_na(ds_test, "c", value = -3)
-- cut --

Unfortunately, the if-statement does not work because the class reported 
inside the function is wrong. To find out what is going on, I did 
this:

-- cut --
test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[variable]), TRUE))
  } else {
    return(c(class(dataset[variable]), FALSE))
  }
}

ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA))
print(sapply(ds_test, class))

# -- Test a --
class(ds_test[, "a"])
if(inherits(ds_test[, "a"], "factor")) {
  print(c(class(ds_test[, "a"]), "TRUE"))
} else {
  print(c(class(ds_test[, "a"]), "FALSE"))
}
test_class(ds_test, "a")
warning("'a' should be numeric NOT data.frame!")

# -- Test b --
if(inherits(ds_test[, "b"], "factor")) {
  print(c(class(ds_test[, "b"]), "TRUE"))
} else {
  print(c(class(ds_test[, "b"]), "FALSE"))
}
class(ds_test[, "b"])
test_class(ds_test, "b")
warning("'b' should be logical NOT data.frame!")

# -- Test c --
if(inherits(ds_test[, "c"], "factor")) {
  print(c(class(ds_test[, "c"]), "TRUE"))
} else {
  print(c(class(ds_test[, "c"]), "FALSE"))
}
class(ds_test[, "c"])
test_class(ds_test, "c")
warning("'c' should be factor NOT data.frame.
In addition data.frame != factor")
-- cut --

Why do I get different results for the same function if it is inside or 
outside my own function definition?
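
A likely explanation is that the two calls are not the same expression: outside the function the code uses `ds_test[, "a"]`, while inside it uses `dataset[variable]`. Single-bracket indexing with one argument returns a one-column data.frame, whereas `[, name]` and `[[name]]` return the column itself. The sketch below illustrates the difference and shows one way the replacement could be written; `replace_na_sketch` is a hypothetical name, and `stringsAsFactors = TRUE` is set explicitly to reproduce the factor column shown in the thread.

```r
# stringsAsFactors = TRUE makes column c a factor, as in the thread's output.
ds_test <- data.frame(a = c(1, NA, 2), b = rep(NA, 3), c = c("A", "b", NA),
                      stringsAsFactors = TRUE)

class(ds_test["c"])     # "data.frame": a one-column data frame
class(ds_test[, "c"])   # "factor": the column itself
class(ds_test[["c"]])   # "factor": the column itself

# Hypothetical helper: extract the column with [[ ]], modify it, put it back.
replace_na_sketch <- function(dataset, variable, value) {
  column <- dataset[[variable]]
  if (is.factor(column)) {
    # Go through character so that `value` may be a level not yet present.
    column <- as.character(column)
    column[is.na(column)] <- value
    column <- as.factor(column)
  } else {
    column[is.na(column)] <- value
  }
  dataset[[variable]] <- column
  dataset
}

res_a <- replace_na_sketch(ds_test, "a", value = -1)
res_c <- replace_na_sketch(ds_test, "c", value = -3)
```

Working on the extracted vector also avoids calling `as.character()` on a data.frame, which deparses the whole column into a single string instead of converting its elements.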

Kind regards

Georg



> Sent: Thursday, 23 June 2016 at 21:14
> From: "David L Carlson" 
> To: "Bert Gunter" 
> Cc: "R Help" 
> Subject: Re: [R] Subscripting problem with is.na()
>
> Good point. I did not think about factors. Also your example raises 
> another issue since column c is logical, but gets silently converted to 
> numeric. This would seem to get the job done assuming the conversion is 
> intended for numeric columns only:
> 
> > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> > sapply(test, class)
>         a         b         c 
> "numeric"  "factor" "logical" 
> > num <- sapply(test, is.numeric)
> > test[, num][is.na(test[, num])] <- 0
> > test
>   a    b  c
> 1 1    A NA
> 2 0    b NA
> 3 2 <NA> NA
> 
> David C

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Right input mechanism to R for high amount of data

2016-06-27 Thread PIKAL Petr
Hi

Have you read any of these?

http://www.r-bloggers.com/five-ways-to-handle-big-data-in-r/
https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R
http://r-pbd.org/
http://www.columbia.edu/~sjm2186/EPIC_R/EPIC_R_BigData.pdf

I am not an expert in big data; however, if just reading your data takes days, I 
wonder how you plan to do the analysis. AFAIK R keeps all the data it reads in 
memory.
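
If the data can be exported to a flat file, one way around the in-memory limit is to read and reduce the data in chunks, keeping only running summaries. This is a minimal sketch under that assumption; the temporary CSV file, its column names, and the chunk size are made up for illustration and stand in for a real export.

```r
# Hypothetical stand-in for a large export: a small CSV in a temp file.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 2), path, row.names = FALSE)

chunk_size <- 4  # assumed value; tune to the memory actually available

con <- file(path, open = "r")
# Read the header line once and strip the quotes written by write.csv().
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

total <- 0  # running sum of the value column
rows  <- 0  # running row count
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break  # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  total <- total + sum(chunk$value)
  rows  <- rows + nrow(chunk)
}
close(con)
unlink(path)
```

Only one chunk is held in memory at a time. For a database source, the same pattern should be possible with DBI, using `dbSendQuery()` followed by repeated `dbFetch(res, n = ...)` calls.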

Maybe others can give you a better answer.

Cheers
Petr


> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Sawhney,
> Prerna (Nokia - IN/Bangalore)
> Sent: Monday, June 27, 2016 7:48 AM
> To: r-help@r-project.org
> Subject: [R] Right input mechanism to R for high amount of data
>
>
> Hi All,
>
> I am currently loading 3B events (20GB) into my algorithm for processing. I am
> reading this data from a postgresXL DB cluster (1 coordinator + 4 datanodes
> (8cpu 61GB 200GB machines each)), 1TB of space in total.
>
> Loading the data takes too long: almost 5 days before I can start running
> my algorithms.
>
> Can you please suggest the right technology for inputting the data? The DB
> is clearly the bottleneck right now.
>
> Should I move away from postgresXL? Which option is most suitable for
> loading data efficiently into R: DB, file, or Parquet file?
>
> I look forward to your responses.
>
> Thanks
> Prerna
>


This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a 
contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately 
accept such offer; The sender of this e-mail (offer) excludes any acceptance of 
the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an 
express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into 
any contracts on behalf of the company except for cases in which he/she is 
expressly authorized to do so in writing, and such authorization or power of 
attorney is submitted to the recipient or the person represented by the 
recipient, or the existence of such authorization is known to the recipient of 
the person represented by the recipient.

[R] Right input mechanism to R for high amount of data

2016-06-27 Thread Sawhney, Prerna (Nokia - IN/Bangalore)

Hi All,

I am currently loading 3B events (20GB) into my algorithm for processing. I am 
reading this data from a postgresXL DB cluster (1 coordinator + 4 datanodes 
(8cpu 61GB 200GB machines each)), 1TB of space in total.

Loading the data takes too long: almost 5 days before I can start 
running my algorithms.

Can you please suggest the right technology for inputting the data? The DB 
is clearly the bottleneck right now.

Should I move away from postgresXL? Which option is most suitable for 
loading data efficiently into R: DB, file, or Parquet file?

I look forward to your responses.

Thanks
Prerna



Re: [R] function for over dispersed poisson regression in the setting of a random effects model

2016-06-27 Thread Mark Podolsky
Hi John,

The gamlss.mx package can accommodate variables that follow 
negative binomial (and other) distributions in multilevel models.

Mark


> On Jun 25, 2016, at 11:00 PM, John Sorkin  wrote:
> 
> Is there a function that will run a model appropriate for over dispersed data 
> (such as a negative binomial or quasipoisson)
> with a random effects (or mixed effects) model in R? GLMER will not accept:  
> family=quasipoisson(link="log") or
> family=negbinomial(link="log") 
> 
> I want to run something like the following:
> fit0 <- glmer(Fall ~ 
> Group+(1|PID)+offset(log(TimeYrs)),family=quasipoisson(link="log"),data=data)
> Thank  you
> John
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and 
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing) 
> 
> Confidentiality Statement:
> This email message, including any attachments, is for ...{{dropped:16}}
