Re: [R] grid: Grid graphics flickering

2013-08-18 Thread Paul Murrell

Hi

Two points:

1.  grid.remove() redraws the entire scene (and because the drawing is 
slow you see a flicker)


2.  Doing this ...

dev.hold()
grid.gremove("lastShape")
dev.flush()

... may reduce flicker for you.
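For repeated delete-and-redraw cycles like the loop in the original example, it may also help to hold the device around the whole batch of updates rather than a single removal. A minimal sketch (the helper name is mine, not part of grid; dev.hold()/dev.flush() are no-ops on devices that do not support holding):

```r
library(grid)

# Hold the device while a batch of grid calls runs, so the screen
# repaints only once at the end instead of after every call.
with_held_device <- function(expr) {
  dev.hold()
  on.exit(dev.flush(), add = TRUE)
  invisible(eval.parent(substitute(expr)))
}

# e.g. wrap a whole delete-and-redraw cycle in one hold:
# with_held_device({
#   grid.gremove("testing")
#   grid.circle(name = "testing")
# })
```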

Paul

On 07/25/13 06:10, Daniel Guetta wrote:

I'm designing an interactive plot using the `grid` package in R. As
part of the interactivity, I repeatedly delete and re-create various
parts of the plot. However, the total number of grid elements (as
obtained using the `grid.ls()` command) stays constant; everything I
create was previously removed.

The problem is as follows: once I've gone through a few cycles of
creation and deletion, every deletion I make to the graphic, however
small, causes all the interactive parts of the plot (those I've been
repeatedly deleting and creating) to flicker.

Here's the simplest example I could come up with. First, run this code
to set up the `grid` graphic and repeatedly delete and re-create
certain elements:

###

 library(grid)

 pushViewport(viewport())

 for (x in seq(0, 1, length=5))
 {
 for (y in seq(0, 1, length=5))
 {
 pushViewport(viewport(x = x, y = y, width=1/5, height=1/5,
name=paste("foo", x, y, sep="")))
 grid.rect()

 pushViewport(viewport(x = 0, y = 0, width=1/4, height=1/4, name="bar1"))
 grid.circle(name="testing")
 grid.text("123")
 upViewport()

 pushViewport(viewport(x = 1, y = 0, width=1/4, height=1/4, name="bar2"))
 grid.circle(name="testing")
 grid.text("123")
 upViewport()

 pushViewport(viewport(x = 0, y = 1, width=1/4, height=1/4, name="bar3"))
 grid.circle(name="testing")
 grid.text("123")
 upViewport()

 pushViewport(viewport(x = 1, y = 1, width=1/4, height=1/4, name="bar4"))
 grid.circle(name="testing")
 grid.text("123")
 upViewport()

 upViewport()
 }
 }

 for (i in 1:10)
 {

 grid.gremove("testing")

 for (x in seq(0, 1, length=5))
 {
 for (y in seq(0, 1, length=5))
 {
 downViewport(paste("foo", x, y, sep=""))

 downViewport("bar1"); grid.circle(name="testing"); upViewport()
 downViewport("bar2"); grid.circle(name="testing"); upViewport()
 downViewport("bar3"); grid.circle(name="testing"); upViewport()
 downViewport("bar4"); grid.circle(name="testing"); upViewport()

 upViewport()
 }
 }

 }

###

Once this is all set up, create a new arbitrary square on the device

###
 grid.rect(height=0.5, width=0.5, gp=gpar(lty = 2), name = "lastShape")
###

Now try to delete it

###
 grid.gremove("lastShape")
###

Notice that when you run this last deletion command, all the small
circles that I've been creating and deleting flicker slightly, even
though I haven't touched them. This makes the entire graphic very
distracting.

The thing is, if I don't delete and re-create the original graphics so
many times, this doesn't happen! So I figure I must be leaving a trail
somewhere because of inefficient deleting.

Any ideas how to prevent that?

Thanks a million!

--

Daniel Guetta
PhD Candidate, Columbia University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] image versus levelplot

2013-08-18 Thread Paul Murrell

Hi

You could try the 'useRaster' argument to levelplot()

Alternatively, if you are not doing much annotation (such as axes and 
labels) around the matrix-representations, you could possibly speed 
things up by calling grid.rect() directly.  Feel free to contact me 
directly if that sounds like an option.
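For example, a rough sketch along both lines (untimed; useRaster support depends on your lattice version and graphics device, and the grid.raster() call is my suggestion for the "no annotation" case rather than a quote from Paul):

```r
library(lattice)
library(grid)

i <- 1000
M <- matrix(rnorm(i * i), ncol = i)

# 1. Raster-based levelplot: draws one bitmap instead of i*i rectangles
print(levelplot(M, useRaster = TRUE))

# 2. Bare-bones grid alternative: no axes or labels, just a raster image
#    mapping values to grey levels (values rescaled into [0, 1])
grid.newpage()
cols <- grey((M - min(M)) / diff(range(M)))
grid.raster(matrix(cols, ncol = i), interpolate = FALSE)
```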


Paul

On 07/16/13 09:55, Nicolas Servant wrote:

Dear R users,

I'm currently using the graphics package to display several hundred matrix
objects, using a layout and the image() function.
It works well except for large matrices (> 1000*1000) or for a large
number of matrices (there is a limitation around 400 if I remember well).
To solve these issues, I moved to the Matrix package, which is much more
efficient for large sparse matrices, and to the grid package for the plot
(using image() from Matrix).
However, levelplot() is much slower than the graphics image()
function!
Basically, with image() from graphics, I'm able to plot 360 matrices
in 30 sec, against more than 10 minutes with levelplot().
I would appreciate it if anyone has experience with this and could give
me some advice on using the grid package efficiently.
Regards
Nicolas

 > i=1000
 > M1 <- Matrix(rnorm(i*i), ncol=i)
 > system.time(print(Matrix::image(M1)))
user  system elapsed
  11.320   0.064  11.406
 > M2 <- matrix(rnorm(i*i), ncol=i)
 > system.time(image(as.matrix(M2)))
user  system elapsed
   0.837   0.004   0.844







--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Amelia, Zelig and latex in R

2013-08-18 Thread Christopher Desjardins
Hi,
I am glad you could get it to work. I don't really know. I usually just
use xtable, and any additional formatting I need I do in my LaTeX
editor. Perhaps there isn't a nice TeX format out of the box for MI data.
Once you write some nice code, you could keep reusing it, or better yet,
package it :)

On Sunday, August 18, 2013, Francesco Sarracino wrote:

> That's right!
> Your advice is in the right direction and with little adjustments it did
> the job. However, I admit it was tricky and the result looks a bit
> artisanal and needs some polishing that I will do by hand in the tex code.
> Is it possible that there is no way to get nicely latex formatted tables
> concerning multiply imputed data-set?
> But maybe I should open a separate thread on this.
> Thanks a lot for your kind and patient help.
> Best regards,
> f.
>
>
> On 18 August 2013 15:56, Christopher Desjardins wrote:
>
>> Seems you're after the pooled results. Would the following work?
>>
>>
>>
>> library(Amelia)
>> library(Zelig)
>> library(xtable)
>>
>> data(africa)
>>
>> m = 10
>> imp1 <- amelia(x = africa,cs="country",m=m)
>> imp2 <- amelia(x = africa,cs="country",m=m)
>>  lm.imputed1 <- zelig(gdp_pc ~ trade + civlib, model="ls",data = imp1)
>> lm.imputed2 <- zelig(gdp_pc ~ trade + civlib, model="ls",data = imp2)
>>
>> lm1 <- as.data.frame(summary(lm.imputed1)$coef)
>> lm2 <- as.data.frame(summary(lm.imputed2)$coef)
>> lm1[,2] <- ifelse(lm1[,4]<.001, paste(lm1[,2],"***",sep=" "),
>>   ifelse(lm1[,4]<.01, paste(lm1[,2],"**",sep=" "),
>>  ifelse(lm1[,4]<.05, paste(lm1[,2],"*",sep=" "),
>> ifelse(lm1[,4]<.1, paste(lm1[,2],".",sep=" "), lm1[,2]))))
>>
>> lm2[,2] <- ifelse(lm2[,4]<.001, paste(lm2[,2],"***",sep=" "),
>>   ifelse(lm2[,4]<.01, paste(lm2[,2],"**",sep=" "),
>>  ifelse(lm2[,4]<.05, paste(lm2[,2],"*",sep=" "),
>> ifelse(lm2[,4]<.1, paste(lm2[,2],".",sep=" "), lm2[,2]))))
>>
>> ## ONE OPTION ##
>> lms <- as.data.frame(cbind(lm1[,1],lm2[,1],lm1[,2],lm2[,2]))
>> rownames(lms) <- rownames(lm1)
>> colnames(lms) <- c("Imp1.Est","Imp2.Est","Imp1.SE","Imp2.SE")
>> xtable(lms)
>>
>> ## OR ##
>> xtable(cbind(lm1[,1:2],lm2[,1:2]))
>>
>
>
>
> --
> Francesco Sarracino, Ph.D.
> fsarracino 
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] First time r user

2013-08-18 Thread jwd
On Sun, 18 Aug 2013 02:56:56 -0500
Paul Bernal  wrote:

Paul,

I would suggest acquiring at least a small library of books about
R and reading them. I would recommend An Introduction to R and R Data
Import/Export (both available online on the R Project site in both PDF
and HTML), Introductory Statistics with R, Venables and Ripley, and R
in a Nutshell for starters. It is pointless to answer some of these
questions since the answers are there for the taking. You should also
look through the logs of previous discussions in the various R user
groups, accessible through the project site, since memory
limitations have often been discussed.

JWDougherty

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Amelia, Zelig and latex in R

2013-08-18 Thread Francesco Sarracino
That's right!
Your advice is in the right direction and with little adjustments it did
the job. However, I admit it was tricky, and the result looks a bit
artisanal and needs some polishing that I will do by hand in the TeX code.
Is it possible that there is no way to get nicely LaTeX-formatted tables
for multiply imputed data sets?
But maybe I should open a separate thread on this.
Thanks a lot for your kind and patient help.
Best regards,
f.


On 18 August 2013 15:56, Christopher Desjardins wrote:

> Seems you're after the pooled results. Would the following work?
>
>
>
> library(Amelia)
> library(Zelig)
> library(xtable)
>
> data(africa)
>
> m = 10
> imp1 <- amelia(x = africa,cs="country",m=m)
> imp2 <- amelia(x = africa,cs="country",m=m)
> lm.imputed1 <- zelig(gdp_pc ~ trade + civlib, model="ls",data = imp1)
> lm.imputed2 <- zelig(gdp_pc ~ trade + civlib, model="ls",data = imp2)
>
> lm1 <- as.data.frame(summary(lm.imputed1)$coef)
> lm2 <- as.data.frame(summary(lm.imputed2)$coef)
> lm1[,2] <- ifelse(lm1[,4]<.001, paste(lm1[,2],"***",sep=" "),
>   ifelse(lm1[,4]<.01, paste(lm1[,2],"**",sep=" "),
>  ifelse(lm1[,4]<.05, paste(lm1[,2],"*",sep=" "),
> ifelse(lm1[,4]<.1, paste(lm1[,2],".",sep=" "), lm1[,2]))))
>
> lm2[,2] <- ifelse(lm2[,4]<.001, paste(lm2[,2],"***",sep=" "),
>   ifelse(lm2[,4]<.01, paste(lm2[,2],"**",sep=" "),
>  ifelse(lm2[,4]<.05, paste(lm2[,2],"*",sep=" "),
> ifelse(lm2[,4]<.1, paste(lm2[,2],".",sep=" "), lm2[,2]))))
>
> ## ONE OPTION ##
> lms <- as.data.frame(cbind(lm1[,1],lm2[,1],lm1[,2],lm2[,2]))
> rownames(lms) <- rownames(lm1)
> colnames(lms) <- c("Imp1.Est","Imp2.Est","Imp1.SE","Imp2.SE")
> xtable(lms)
>
> ## OR ##
> xtable(cbind(lm1[,1:2],lm2[,1:2]))
>



-- 
Francesco Sarracino, Ph.D.
https://sites.google.com/site/fsarracino/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How can I do nonparametric regression with multivariate dependent variables.

2013-08-18 Thread Ying Zheng
I have tried several commands in "np" package, like "npregbw". I cannot
find a command that can apply to the case with multivariate dependent
variables.

My original problem is to evaluate several conditional moments
nonparametrically, that is, E(y|x=x_0) with y a 2x1 vector. I cannot do
them separately because finally I also need to evaluate Var(y|x=x_0).

My thought is doing this:

1. evaluate E(y1|x=x_0) and E(y2|x=x_0) separately;
2. Var(y|x=x_0)=E(yy'|x=x_0)-E(y|x=x_0)E(y|x=x_0)', so I evaluate
E(yy'|x=x_0) by evaluating E(y1y1|x=x_0), E(y1y2|x=x_0), E(y2y2|x=x_0)
separately. Then put them in the right order to form a 2x2 matrix.

Am I right if I do this? And is there a more direct way to do it?
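If I understand correctly, the two steps described above can be sketched with np's npreg(), fitting one univariate regression per needed moment. A rough sketch under simulated data (the variable names and the toy data-generating process are mine):

```r
library(np)

set.seed(1)
n  <- 500
x  <- rnorm(n)
y1 <- sin(x) + rnorm(n, sd = 0.2)   # first component of y
y2 <- cos(x) + rnorm(n, sd = 0.2)   # second component of y
x0 <- data.frame(x = 0)             # evaluation point x_0

# Step 1: componentwise conditional means E(y1|x=x_0), E(y2|x=x_0)
m1 <- predict(npreg(y1 ~ x), newdata = x0)
m2 <- predict(npreg(y2 ~ x), newdata = x0)

# Step 2: conditional second moments E(y_i y_j | x = x_0)
m11 <- predict(npreg(I(y1 * y1) ~ x), newdata = x0)
m12 <- predict(npreg(I(y1 * y2) ~ x), newdata = x0)
m22 <- predict(npreg(I(y2 * y2) ~ x), newdata = x0)

# Var(y|x=x_0) = E(yy'|x=x_0) - E(y|x=x_0) E(y|x=x_0)'
Ey  <- c(m1, m2)
Eyy <- matrix(c(m11, m12, m12, m22), 2, 2)
V   <- Eyy - Ey %o% Ey
```

Note that each npreg() call selects its own bandwidth, so the assembled matrix is not guaranteed to be positive semi-definite in small samples.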

Thank you for suggestions.
-- 
Best Regards,
Zheng, Ying

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] First time r user

2013-08-18 Thread Steve Lianoglou
Yes, please do some reading and then take a crack at your data first.

This will only be a fruitful endeavor for you after you get some
working knowledge of R.

Hadley is compiling a nice book online that I think is very helpful to
read through:
https://github.com/hadley/devtools/wiki/Introduction

The section on "functional looping patterns" will be immediately
useful (once you have a bit more background working with R):
http://github.com/hadley/devtools/wiki/functionals#looping-patterns

It's really a great resource and you should spend the time to read
through it. Once you read and understand the looping-patterns section,
you'll be able to handle your data like a pro and you can move on to
asking more interesting questions ;-)

If something is unclear there, though, please do raise that issue.

HTH,
-steve


On Sun, Aug 18, 2013 at 7:22 AM, Bert Gunter  wrote:
> This is ridiculous!
>
> Please read "An Introduction to R" (ships with R) or other online R
> tutorial. There are many good ones. There are also probably online
> courses. Please make an effort to learn the basics before posting
> further here.
>
> -- Bert
>
>
>
> On Sun, Aug 18, 2013 at 7:13 AM, Dylan Doyle  wrote:
>> Hello all thank-you for your speedy replies ,
>>
>> Here is the first few lines from the head function
>>
>>   brewery_id            brewery_name review_time review_overall review_aroma review_appearance review_profilename
>> 1      10325         Vecchio Birraio  1234817823            1.5          2.0               2.5            stcules
>> 2      10325         Vecchio Birraio  1235915097            3.0          2.5               3.0            stcules
>> 3      10325         Vecchio Birraio  1235916604            3.0          2.5               3.0            stcules
>> 4      10325         Vecchio Birraio  1234725145            3.0          3.0               3.5            stcules
>> 5       1075 Caldera Brewing Company  1293735206            4.0          4.5               4.0     johnmichaelsen
>> 6       1075 Caldera Brewing Company  1325524659            3.0          3.5               3.5            oline73
>>
>>                       beer_style review_palate review_taste              beer_name beer_abv beer_beerid
>> 1                     Hefeweizen           1.5          1.5           Sausa Weizen      5.0       47986
>> 2             English Strong Ale           3.0          3.0               Red Moon      6.2       48213
>> 3         Foreign / Export Stout           3.0          3.0 Black Horse Black Beer      6.5       48215
>> 4                German Pilsener           2.5          3.0             Sausa Pils      5.0       47969
>> 5 American Double / Imperial IPA           4.0          4.5          Cauldron DIPA      7.7       64883
>> 6           Herbed / Spiced Beer           3.0          3.5    Caldera Ginger Beer      4.7       52159
>>
>> So far I have only worked out how to import the data set and run some basic R
>> functions on it. My goal is to be able to answer questions like: what are the
>> top 10 pilsners, or which brewer has the highest average ABV? Also, using
>> two factors such as beer aroma and appearance, which beer style should
>> I try? Let me know if I can give you any more information you might need to
>> help me.
>>
>> Thanks again ,
>>
>> Dylan
>>
>>>
>>
>>
>>
>> On Sun, Aug 18, 2013 at 4:16 AM, Paul Bernal  wrote:
>>
>>> Thank you so much Steve.
>>>
>>> The computer I'm currently working with is a 32-bit Windows 7 OS, and RAM
>>> is only 4GB, so I guess that's a big limitation.
>>> El 18/08/2013 03:11, "Steve Lianoglou" 
>>> escribió:
>>>
>>> > Hi Paul,
>>> >
>>> > On Sun, Aug 18, 2013 at 12:56 AM, Paul Bernal 
>>> > wrote:
>>> > > Thanks a lot for the valuable information.
>>> > >
>>> > > Now my question would necessarily be, how many columns can R handle,
>>> > > provided that I have millions of rows and, in general, whats the
>>> maximum
>>> > > amount of rows and columns that R can effortlessly handle?
>>> >
>>> > This is all determined by your RAM.
>>> >
>>> > Prior to R-3.0, R could only handle vectors of length 2^31 - 1. If you
>>> > were working with a matrix, that meant that you could only have that
>>> > many elements in the entire matrix.
>>> >
>>> > If you were working with a data.frame, you could have data.frames with
>>> > 2^31-1 rows, and I guess as many columns, since data.frames are really
>>> > a list of vectors, the entire thing doesn't have to be in one
>>> > contiguous block (and addressable that way)
>>> >
>>> > R-3.0 introduced "Long Vectors" (search for that section in the release
>>> > notes):
>>> >
>>> > https://stat.ethz.ch/pipermail/r-announce/2013/000561.html
>>> >
>>> > It almost doubles the size of a vector that R can handle (assuming you
>>> > are running 64bit). So, if you've got the RAM, you can have a
>>> > data.frame/data.table w/ billion(s) of rows, in theory.
>>> >
>>> > To figure out how much data you can handle on your machine, you need
>>> > to know the size of real/integer/whatever and the number of elements
>>> > of those you will have so you can calculate the amount of RAM you need
>>> > to load it all up.
Re: [R] Error in Corpus() in tm package

2013-08-18 Thread Milan Bouchet-Valat
On Sunday, 18 August 2013 at 09:19 -0700, Ajinkya Kale wrote:
> I did exactly what you mentioned... tried a subset of these documents
> and found out there were some junk non-txt files which were causing
> this issue. Everything worked fine with DirSource once I deleted them
> from the dir.
> But I feel these functions should also tell you which file they are
> failing on. I have ended up debugging with subsets of the input one
> too many times.
Good. Could you send us (or maybe privately to me) at least an excerpt
of the file that is enough to reproduce the bug? Indeed it would be nice
to get a more explicit error message from tm if possible.


Regards

> 
> On Aug 18, 2013 9:01 AM, "Milan Bouchet-Valat" wrote:
> On Saturday, 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote:
> > It contains all text files which were converted from doc, docx, ppt
> > etc. using libreoffice.
> > Some of them are non-English text documents.
> >
> > Sorry I cannot share the corpus.. but if someone can shed light on
> > what might cause this error then I can try to eliminate those
> > documents if some specific docs are causing it.
> I think you should go the other way round: try with only one document
> and see if it works, and do enough attempts to find out in what cases
> it works and in what cases it fails. If it always fails, try with
> examples provided by tm, and then with parts of your documents.
> 
> I don't think it makes sense to try to use VectorSource() as it would
> imply reimplementing DirSource().
> 
> 
> Regards
> 
> > On Sat, Aug 17, 2013 at 9:55 AM, Milan Bouchet-Valat wrote:
> > On Friday, 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote:
> > > I am trying to use the text mining package ... I keep getting
> > > this error :
> > >
> > > rm(list=ls())
> > > library(tm)
> > > sourceDir <- "Z:\\projectk_viz\\docs_to_index"
> > > ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
> > >
> > > Error in if (vectorized && (length <= 0)) stop("vectorized sources
> > > must have positive length") : missing value where TRUE/FALSE needed
> > >
> > > I am not sure what it means.
> >
> > The posting guide asks for a reproducible example. If you cannot make
> > available to us the contents of sourceDir, at least you should tell
> > us what kind of files it contains. Have you tried with only some of
> > the files the directory contains?
> >
> >
> > Regards
> >
> > > --ajinkya
> > >
> > >   [[alternative HTML version deleted]]
> > >
> > > __
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> >
> >
> > --
> >
> > Sincerely,
> > Ajinkya
> > http://ajinkya.info
> >
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in Corpus() in tm package

2013-08-18 Thread Ajinkya Kale
I did exactly what you mentioned... tried a subset of these documents and
found out there were some junk non-txt files which were causing this issue.
Everything worked fine with DirSource once I deleted them from the dir.
But I feel these functions should also tell you which file they are failing
on. I have ended up debugging with subsets of the input one too many times.
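Until tm reports the offending file itself, one workaround is to probe each file individually and log the failures. A rough sketch (reusing 'sourceDir' and the readerControl from the original post, and assuming URISource() is available in your tm version):

```r
library(tm)

sourceDir <- "Z:\\projectk_viz\\docs_to_index"  # path from the original post

# Try building a one-document corpus per file and report the ones that fail,
# instead of bisecting the whole directory by hand.
for (f in list.files(sourceDir, full.names = TRUE)) {
  ok <- tryCatch({
    Corpus(URISource(f), readerControl = list(language = "lat"))
    TRUE
  }, error = function(e) {
    message("Failed on: ", f, " (", conditionMessage(e), ")")
    FALSE
  })
}
```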
On Aug 18, 2013 9:01 AM, "Milan Bouchet-Valat"  wrote:

> On Saturday, 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote:
> > It contains all text files which were converted from doc, docx, ppt
> > etc. using libreoffice.
> > Some of them are non-english text documents.
> >
> >
> > Sorry I cannot share the corpus.. but if someone can shed light on
> > what might cause this error then I can try to eliminate those
> > documents if some specific docs are causing it.
> I think you should go the other way round: try with only one document
> and see if it works, and do enough attempts to find out in what cases it
> works and in what cases it fails. If it always fails, try with examples
> provided by tm, and then with parts of your documents.
>
> I don't think it makes sense to try to use VectorSource() as it would
> imply reimplementing DirSource().
>
>
> Regards
>
> > On Sat, Aug 17, 2013 at 9:55 AM, Milan Bouchet-Valat
> >  wrote:
> > On Friday, 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote:
> > > I am trying to use the text mining package ... I keep
> > getting this error :
> > >
> > > rm(list=ls())
> > > library(tm)
> > > sourceDir <- "Z:\\projectk_viz\\docs_to_index"
> > > ovid <- Corpus(DirSource(sourceDir),readerControl =
> > list(language = "lat"))
> > >
> > > Error in if (vectorized && (length <= 0)) stop("vectorized
> > sources must
> > > have positive length") : missing value where TRUE/FALSE
> > needed
> > >
> > > I am not sure what it means.
> >
> > The posting guide asks for a reproducible example. If you
> > cannot make
> > available to us the contents of sourceDir, at least you should
> > tell us
> > what kind of files it contains. Have you tried with only some
> > of the
> > files the directory contains ?
> >
> >
> > Regards
> >
> > > --ajinkya
> > >
> > >   [[alternative HTML version deleted]]
> > >
> > > __
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible
> > code.
> >
> >
> >
> >
> >
> > --
> >
> > Sincerely,
> > Ajinkya
> > http://ajinkya.info
> >
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] First time r user

2013-08-18 Thread Bert Gunter
This is ridiculous!

Please read "An Introduction to R" (ships with R) or other online R
tutorial. There are many good ones. There are also probably online
courses. Please make an effort to learn the basics before posting
further here.

-- Bert



On Sun, Aug 18, 2013 at 7:13 AM, Dylan Doyle  wrote:
> Hello all thank-you for your speedy replies ,
>
> Here is the first few lines from the head function
>
>   brewery_id            brewery_name review_time review_overall review_aroma review_appearance review_profilename
> 1      10325         Vecchio Birraio  1234817823            1.5          2.0               2.5            stcules
> 2      10325         Vecchio Birraio  1235915097            3.0          2.5               3.0            stcules
> 3      10325         Vecchio Birraio  1235916604            3.0          2.5               3.0            stcules
> 4      10325         Vecchio Birraio  1234725145            3.0          3.0               3.5            stcules
> 5       1075 Caldera Brewing Company  1293735206            4.0          4.5               4.0     johnmichaelsen
> 6       1075 Caldera Brewing Company  1325524659            3.0          3.5               3.5            oline73
>
>                       beer_style review_palate review_taste              beer_name beer_abv beer_beerid
> 1                     Hefeweizen           1.5          1.5           Sausa Weizen      5.0       47986
> 2             English Strong Ale           3.0          3.0               Red Moon      6.2       48213
> 3         Foreign / Export Stout           3.0          3.0 Black Horse Black Beer      6.5       48215
> 4                German Pilsener           2.5          3.0             Sausa Pils      5.0       47969
> 5 American Double / Imperial IPA           4.0          4.5          Cauldron DIPA      7.7       64883
> 6           Herbed / Spiced Beer           3.0          3.5    Caldera Ginger Beer      4.7       52159
>
> So far I have only worked out how to import the data set and run some basic R
> functions on it. My goal is to be able to answer questions like: what are the
> top 10 pilsners, or which brewer has the highest average ABV? Also, using
> two factors such as beer aroma and appearance, which beer style should
> I try? Let me know if I can give you any more information you might need to
> help me.
>
> Thanks again ,
>
> Dylan
>
>>
>
>
>
> On Sun, Aug 18, 2013 at 4:16 AM, Paul Bernal  wrote:
>
>> Thank you so much Steve.
>>
>> The computer I'm currently working with is a 32-bit Windows 7 OS, and RAM
>> is only 4GB, so I guess that's a big limitation.
>> El 18/08/2013 03:11, "Steve Lianoglou" 
>> escribió:
>>
>> > Hi Paul,
>> >
>> > On Sun, Aug 18, 2013 at 12:56 AM, Paul Bernal 
>> > wrote:
>> > > Thanks a lot for the valuable information.
>> > >
>> > > Now my question would necessarily be, how many columns can R handle,
>> > > provided that I have millions of rows and, in general, whats the
>> maximum
>> > > amount of rows and columns that R can effortlessly handle?
>> >
>> > This is all determined by your RAM.
>> >
>> > Prior to R-3.0, R could only handle vectors of length 2^31 - 1. If you
>> > were working with a matrix, that meant that you could only have that
>> > many elements in the entire matrix.
>> >
>> > If you were working with a data.frame, you could have data.frames with
>> > 2^31-1 rows, and I guess as many columns, since data.frames are really
>> > a list of vectors, the entire thing doesn't have to be in one
>> > contiguous block (and addressable that way)
>> >
>> > R-3.0 introduced "Long Vectors" (search for that section in the release
>> > notes):
>> >
>> > https://stat.ethz.ch/pipermail/r-announce/2013/000561.html
>> >
>> > It almost doubles the size of a vector that R can handle (assuming you
>> > are running 64bit). So, if you've got the RAM, you can have a
>> > data.frame/data.table w/ billion(s) of rows, in theory.
>> >
>> > To figure out how much data you can handle on your machine, you need
>> > to know the size of real/integer/whatever and the number of elements
>> > of those you will have so you can calculate the amount of RAM you need
>> > to load it all up.
>> >
>> > Lastly, I should mention there are packages that let you work with
>> > "out of memory" data, like bigmemory, biglm, ff. Look at the HPC Task
>> > view for more info along those lines:
>> >
>> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
>> >
>> >
>> > >
>> > > Best regards and again thank you for the help,
>> > >
>> > > Paul
>> > > El 18/08/2013 02:35, "Steve Lianoglou" 
>> > escribió:
>> > >
>> > >> Hi Paul,
>> > >>
>> > >> First: please keep your replies on list (use reply-all when replying
>> > >> to R-help lists) so that others can help but also the lists can be
>> > >> used as a resource for others.
>> > >>
>> > >> Now:
>> > >>
>> > >> On Aug 18, 2013, at 12:20 AM, Paul Bernal 
>> > wrote:
>> > >>
>> > >> > Can R really handle millions of rows of data?
>> > >>
>> > >> Yup.
>> > >>
>> > >> > I thought it was not possible.
>> > >>
>> > 

Re: [R] First time r user

2013-08-18 Thread Dylan Doyle
Hello all thank-you for your speedy replies ,

Here is the first few lines from the head function

  brewery_id            brewery_name review_time review_overall review_aroma review_appearance review_profilename
1      10325         Vecchio Birraio  1234817823            1.5          2.0               2.5            stcules
2      10325         Vecchio Birraio  1235915097            3.0          2.5               3.0            stcules
3      10325         Vecchio Birraio  1235916604            3.0          2.5               3.0            stcules
4      10325         Vecchio Birraio  1234725145            3.0          3.0               3.5            stcules
5       1075 Caldera Brewing Company  1293735206            4.0          4.5               4.0     johnmichaelsen
6       1075 Caldera Brewing Company  1325524659            3.0          3.5               3.5            oline73

                      beer_style review_palate review_taste              beer_name beer_abv beer_beerid
1                     Hefeweizen           1.5          1.5           Sausa Weizen      5.0       47986
2             English Strong Ale           3.0          3.0               Red Moon      6.2       48213
3         Foreign / Export Stout           3.0          3.0 Black Horse Black Beer      6.5       48215
4                German Pilsener           2.5          3.0             Sausa Pils      5.0       47969
5 American Double / Imperial IPA           4.0          4.5          Cauldron DIPA      7.7       64883
6           Herbed / Spiced Beer           3.0          3.5    Caldera Ginger Beer      4.7       52159

So far I have only worked out how to import the data set and run some basic R
functions on it. My goal is to be able to answer questions like: what are the
top 10 pilsners, or which brewer has the highest average ABV? Also, using
two factors such as beer aroma and appearance, which beer style should
I try? Let me know if I can give you any more information you might need to
help me.
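Once the basics are in hand, questions like these map directly onto base R's subsetting, order() and aggregate(). A rough sketch, assuming the imported data frame is called 'beers' (my name, not from the post) with the columns shown above:

```r
# Top 10 German Pilseners by overall review score
pils <- beers[beers$beer_style == "German Pilsener", ]
head(pils[order(-pils$review_overall), c("beer_name", "review_overall")], 10)

# Brewery with the highest average ABV
abv <- aggregate(beer_abv ~ brewery_name, data = beers, FUN = mean, na.rm = TRUE)
abv[which.max(abv$beer_abv), ]

# Mean aroma and appearance by beer style, to pick a style to try
style <- aggregate(cbind(review_aroma, review_appearance) ~ beer_style,
                   data = beers, FUN = mean)
head(style[order(-(style$review_aroma + style$review_appearance)), ])
```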

Thanks again ,

Dylan

>



On Sun, Aug 18, 2013 at 4:16 AM, Paul Bernal  wrote:

> Thank you so much Steve.
>
> The computer I'm currently working with is a 32-bit Windows 7 OS, and RAM
> is only 4GB, so I guess that's a big limitation.
> El 18/08/2013 03:11, "Steve Lianoglou" 
> escribió:
>
> > Hi Paul,
> >
> > On Sun, Aug 18, 2013 at 12:56 AM, Paul Bernal 
> > wrote:
> > > Thanks a lot for the valuable information.
> > >
> > > Now my question would necessarily be, how many columns can R handle,
> > > provided that I have millions of rows and, in general, whats the
> maximum
> > > amount of rows and columns that R can effortlessly handle?
> >
> > This is all determined by your RAM.
> >
> > Prior to R-3.0, R could only handle vectors of length 2^31 - 1. If you
> > were working with a matrix, that meant that you could only have that
> > many elements in the entire matrix.
> >
> > If you were working with a data.frame, you could have data.frames with
> > 2^31-1 rows, and I guess as many columns, since data.frames are really
> > a list of vectors, the entire thing doesn't have to be in one
> > contiguous block (and addressable that way)
> >
> > R-3.0 introduced "Long Vectors" (search for that section in the release
> > notes):
> >
> > https://stat.ethz.ch/pipermail/r-announce/2013/000561.html
> >
> > It almost doubles the size of a vector that R can handle (assuming you
> > are running 64bit). So, if you've got the RAM, you can have a
> > data.frame/data.table w/ billion(s) of rows, in theory.
> >
> > To figure out how much data you can handle on your machine, you need
> > to know the size of real/integer/whatever and the number of elements
> > of those you will have so you can calculate the amount of RAM you need
> > to load it all up.
> >
> > Lastly, I should mention there are packages that let you work with
> > "out of memory" data, like bigmemory, biglm, ff. Look at the HPC Task
> > view for more info along those lines:
> >
> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
> >
> >
> > >
> > > Best regards and again thank you for the help,
> > >
> > > Paul
> > > El 18/08/2013 02:35, "Steve Lianoglou" 
> > escribió:
> > >
> > >> Hi Paul,
> > >>
> > >> First: please keep your replies on list (use reply-all when replying
> > >> to R-help lists) so that others can help but also the lists can be
> > >> used as a resource for others.
> > >>
> > >> Now:
> > >>
> > >> On Aug 18, 2013, at 12:20 AM, Paul Bernal 
> > wrote:
> > >>
> > >> > Can R really handle millions of rows of data?
> > >>
> > >> Yup.
> > >>
> > >> > I thought it was not possible.
> > >>
> > >> Surprise :-)
> > >>
> > >> As I type, I'm working with a ~5.5 million row data.table pretty
> > >> effortlessly.
> > >>
> > >> Columns matter too, of course -- RAM is RAM, after all and you've got
> > >> to be able to fit the whole thing into it if you want to use
> > >> data.table. Once loaded, though, data.table enables one to do
> > >> split/apply/combine calculations over these data quite efficiently.
> > >> The first time I used it, I was honestly blown away.

Re: [R] Amelia, Zelig and latex in R

2013-08-18 Thread Christopher Desjardins
Seems you're after the pooled results. Would the following work?



library(Amelia)
library(Zelig)
library(xtable)

data(africa)

m = 10
imp1 <- amelia(x = africa,cs="country",m=m)
imp2 <- amelia(x = africa,cs="country",m=m)
lm.imputed1 <- zelig(gdp_pc ~ trade + civlib, model="ls",data = imp1)
lm.imputed2 <- zelig(gdp_pc ~ trade + civlib, model="ls",data = imp2)

lm1 <- as.data.frame(summary(lm.imputed1)$coef)
lm2 <- as.data.frame(summary(lm.imputed2)$coef)
lm1[,2] <- ifelse(lm1[,4]<.001, paste(lm1[,2],"***",sep=" "),
           ifelse(lm1[,4]<.01,  paste(lm1[,2],"**",sep=" "),
           ifelse(lm1[,4]<.05,  paste(lm1[,2],"*",sep=" "),
           ifelse(lm1[,4]<.1,   paste(lm1[,2],".",sep=" "), lm1[,2]))))

lm2[,2] <- ifelse(lm2[,4]<.001, paste(lm2[,2],"***",sep=" "),
           ifelse(lm2[,4]<.01,  paste(lm2[,2],"**",sep=" "),
           ifelse(lm2[,4]<.05,  paste(lm2[,2],"*",sep=" "),
           ifelse(lm2[,4]<.1,   paste(lm2[,2],".",sep=" "), lm2[,2]))))

## ONE OPTION ##
lms <- as.data.frame(cbind(lm1[,1],lm2[,1],lm1[,2],lm2[,2]))
rownames(lms) <- rownames(lm1)
colnames(lms) <- c("Imp1.Est","Imp2.Est","Imp1.SE","Imp2.SE")
xtable(lms)

## OR ##
xtable(cbind(lm1[,1:2],lm2[,1:2]))
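As an aside (not part of the original reply), the nested ifelse() ladder above can be written more compactly with base R's symnum(), which is what summary.lm() itself uses to produce significance codes:

```r
# Alternative sketch: symnum() maps p-values to the usual significance
# codes in one call, replacing the nested ifelse() chain.
p <- c(0.0005, 0.02, 0.07, 0.5)
stars <- symnum(p, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("***", "**", "*", ".", " "))
paste(p, as.character(stars))
```

The same cutpoints and symbols reproduce the conventions used in the ifelse() version.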


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in Corpus() in tm package

2013-08-18 Thread Milan Bouchet-Valat
On Saturday 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote:
> It contains all text files which were converted from doc, docx, ppt
> etc. using libreoffice. 
> Some of them are non-english text documents.
> 
> 
> Sorry I cannot share the corpus.. but if someone can shed light on
> what might cause this error then I can try to eliminate those
> documents if some specific docs are causing it.
I think you should go the other way round: try with only one document
and see if it works, and make enough attempts to find out in which cases
it works and in which it fails. If it always fails, try with the examples
provided by tm, and then with parts of your documents.

I don't think it makes sense to try to use VectorSource() as it would
imply reimplementing DirSource().
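The per-document bisection suggested above can be sketched generically. This is a hypothetical helper, not from the thread: 'reader' is a placeholder for whatever loads one document (with tm installed you might pass something like function(f) Corpus(DirSource(dirname(f), pattern = basename(f)))); readLines is used here only so the sketch is self-contained:

```r
# Hypothetical helper: try each document individually and return the
# paths the reader cannot process. 'reader' is a stand-in argument.
find_bad_docs <- function(files, reader = readLines) {
  ok <- vapply(files, function(f) {
    tryCatch({ reader(f); TRUE }, error = function(e) FALSE)
  }, logical(1))
  files[!ok]  # documents on which the reader raised an error
}
```

Running this over list.files(sourceDir, full.names = TRUE) would narrow the failure down to specific documents.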


Regards

> On Sat, Aug 17, 2013 at 9:55 AM, Milan Bouchet-Valat
>  wrote:
> On Friday 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote:
> > I am trying to use the text mining package ... I keep
> getting this error :
> >
> > rm(list=ls())
> > library(tm)
> > sourceDir <- "Z:\\projectk_viz\\docs_to_index"
> > ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
> >
> > Error in if (vectorized && (length <= 0)) stop("vectorized sources must
> > have positive length") : missing value where TRUE/FALSE needed
> >
> > I am not sure what it means.
> 
> The posting guide asks for a reproducible example. If you cannot make
> available to us the contents of sourceDir, at least you should tell us
> what kind of files it contains. Have you tried with only some of the
> files the directory contains?
> 
> 
> Regards
> 
> > --ajinkya
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible
> code.
> 
> 
> 
> 
> 
> -- 
> 
> Sincerely,
> Ajinkya
> http://ajinkya.info
>



[R] escaping % in Rd files

2013-08-18 Thread Jim Lemon

Hi all,
In trying to write an Rd file for a new package, I was stumped at 
something that is probably quite simple. I have % characters in the 
examples of one Rd file. Both my previous experience and some searching 
agreed that one can escape % with \. This worked for this command:


  fh_dates<-
   as.Date(paste(florida_hurr20$day,florida_hurr20$month,
   florida_hurr20$year,sep="-"),"\%d-\%b-\%Y")

which is okay in the HTML help page but it didn't work on another line 
in the same file:


  findval<-function(x,set) return(which(set \%in\% x))

The second line was cut off at the first % character in the HTML help 
page. After lots of head scratching I noticed that the first line was 
broken after the assignment so I changed the second line to:


  findval<-
   function(x,set) return(which(set \%in\% x))

and it worked properly! I don't know whether this qualifies as something 
interesting to the core team, but I thought I would let you know.
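For reference, the workaround described above amounts to keeping the escaped \% off the first physical line of the statement. A minimal Rd examples section using it might look like this (a sketch assembled from the fragments in this message):

```
\examples{
  findval <-
    function(x, set) return(which(set \%in\% x))

  fh_dates <-
    as.Date(paste(florida_hurr20$day, florida_hurr20$month,
                  florida_hurr20$year, sep="-"), "\%d-\%b-\%Y")
}
```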


Jim



Re: [R] Amelia, Zelig and latex in R

2013-08-18 Thread Francesco Sarracino
Dear Christopher,

I have multiply imputed two data-sets (let's say Africa1 and Africa2).
Now I run 1 regression (let's call it: reg1) using the imputed data from
Africa1 and 1 regression  (let's call it: reg2) using the imputed data from
Africa2. For these 2 regressions I use Zelig that automatically takes into
account the fact that Africa1 and Africa2 are multiply imputed data-sets.
Now, my problem is to show the results of these 2 regressions in 1 latex
table where the first column contains the name of the variables, the second
column lists the coefficients (and eventually S.E. and asterisks) from reg1
and the third column lists the same information for reg2.
As I wrote, I tried the following:
reg1 <- zelig(Y ~ X + Z, model="ls",data = Africa1)
reg2 <- zelig(Y ~ X + Z, model="ls",data = Africa2)

and according to stargazer syntax I should type something like:
stargazer(reg1, reg2, summary = FALSE)
but then I get my error:
Error: Unrecognized object type.

Any ideas?
thanks,
f.


On 17 August 2013 18:41, Christopher Desjardins wrote:

> Oh and are you looking for just the summarized results over all the
> imputed runs? i thought you wanted them from each iteration.
>
>
>
> On Sat, Aug 17, 2013 at 11:38 AM, Christopher Desjardins <
> cddesjard...@gmail.com> wrote:
>
>> What do you mean by results? Do you want just the estimated parameters?
>> And are you looking for one big table with all the estimated parameters
>> from all imputation runs?
>>
>> Chris
>>
>>
>> On Sat, Aug 17, 2013 at 11:18 AM, Francesco Sarracino <
>> f.sarrac...@gmail.com> wrote:
>>
>>> Hi Christopher,
>>> thanks for your reply. Unfortunately, that's not what I am looking for.
>>> I would like to have a table with the results of the two models
>>> (lm.imputed1 and lm.imputed2) in two separate columns.
>>> According to stargazer syntax I should type something like:
>>> stargazer(lm.imputed1, lm.imputed2, summary = FALSE)
>>> but then I get my error:
>>> Error: Unrecognized object type.
>>>
>>> Even though your example is insightful, I can't figure out how to
>>> solve my problem.
>>> Any advice is very welcome.
>>> Regards,
>>> f.
>>>
>>>
>>> On 17 August 2013 17:02, Christopher Desjardins 
>>> wrote:
>>>
 Does this do what you want?

 library(Amelia)
 library(Zelig)
 library(stargazer)
 library(xtable)

 data(africa)

  m = 10
 imp1 <- amelia(x = africa,cs="country",m=m)
 imp2 <- amelia(x = africa,cs="country",m=m)
 lm.imputed1 <- zelig(infl ~ trade + civlib, model="ls",data = imp1)
 lm.imputed2 <- zelig(infl ~ trade + civlib, model="ls",data = imp2)

 # Stargazer
 for(i in 1:m){

 print(stargazer(as.data.frame(summary(lm.imputed1[[i]])$coef),as.data.frame(summary(lm.imputed2[[i]])$coef)))
 }

 # xtable
 for(i in 1:m){
   print(xtable(summary(lm.imputed1[[i]])))
   print(xtable(summary(lm.imputed2[[i]])))
 }


 On Sat, Aug 17, 2013 at 6:37 AM, Francesco Sarracino <
 f.sarrac...@gmail.com> wrote:

> Dear listers,
>
> I am running some OLS on multiply imputed data using Amelia.
> I first imputed the data with Amelia.
> than I run a OLS using Zelig to obtain a table of results accounting
> for
> the multiply imputed data-sets. And I'd like to do this for various
> models.
> Finally, I want to output all the models in a table of results for
> latex.
>
> I've tried with Stargazer because it seems to support Zelig output,
> but
> when I run stargazer on a set of objects containing the output of
> zelig, I
> get the following error: Error: unrecognized object type.
>
> this message is repeated for each model I passed to Stargazer.
>
> I am sorry I can't provide a working example, because I should make up
> some
> multiply imputed data first. However, summarizing what I did is:
>
> imputed1 <- amelia(x=data1, m=10)
> imputed2 <- amelia(x=data2, m=10)
> lm.imputed1 <- zelig(Y ~ X + Z, data = imputed1)
> lm.imputed2 <- zelig(Y ~ X + Z, data = imputed2)
> stargazer(lm.imputed1, lm.imputed2)
> The outcome is the error I mentioned above.
> Thanks in advance for all the support you can offer.
> Regards,
> f.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


>>>
>>>
>>> --
>>> Francesco Sarracino, Ph.D.
>>> https://sites.google.com/site/fsarracino/
>>>
>>
>>
>


-- 
Francesco Sarracino, Ph.D.
https://sites.google.com/site/fsarracino/



Re: [R] First time r user

2013-08-18 Thread Paul Bernal
Thank you so much Steve.

The computer I'm currently working with runs 32-bit Windows 7, and its RAM
is only 4GB, so I guess that's a big limitation.
On 18/08/2013 03:11, "Steve Lianoglou"  wrote:

> Hi Paul,
>
> On Sun, Aug 18, 2013 at 12:56 AM, Paul Bernal 
> wrote:
> > Thanks a lot for the valuable information.
> >
> > Now my question would necessarily be, how many columns can R handle,
> > provided that I have millions of rows and, in general, whats the maximum
> > amount of rows and columns that R can effortlessly handle?
>
> This is all determined by your RAM.
>
> Prior to R-3.0, R could only handle vectors of length 2^31 - 1. If you
> were working with a matrix, that meant that you could only have that
> many elements in the entire matrix.
>
> If you were working with a data.frame, you could have data.frames with
> 2^31-1 rows, and I guess as many columns, since data.frames are really
> a list of vectors, the entire thing doesn't have to be in one
> contiguous block (and addressable that way)
>
> R-3.0 introduced "Long Vectors" (search for that section in the release
> notes):
>
> https://stat.ethz.ch/pipermail/r-announce/2013/000561.html
>
> It almost doubles the size of a vector that R can handle (assuming you
> are running 64bit). So, if you've got the RAM, you can have a
> data.frame/data.table w/ billion(s) of rows, in theory.
>
> To figure out how much data you can handle on your machine, you need
> to know the size of real/integer/whatever and the number of elements
> of those you will have so you can calculate the amount of RAM you need
> to load it all up.
>
> Lastly, I should mention there are packages that let you work with
> "out of memory" data, like bigmemory, biglm, ff. Look at the HPC Task
> view for more info along those lines:
>
> http://cran.r-project.org/web/views/HighPerformanceComputing.html
>
>
> >
> > Best regards and again thank you for the help,
> >
> > Paul
> > On 18/08/2013 02:35, "Steve Lianoglou" 
> > wrote:
> >
> >> Hi Paul,
> >>
> >> First: please keep your replies on list (use reply-all when replying
> >> to R-help lists) so that others can help but also the lists can be
> >> used as a resource for others.
> >>
> >> Now:
> >>
> >> On Aug 18, 2013, at 12:20 AM, Paul Bernal 
> wrote:
> >>
> >> > Can R really handle millions of rows of data?
> >>
> >> Yup.
> >>
> >> > I thought it was not possible.
> >>
> >> Surprise :-)
> >>
> >> As I type, I'm working with a ~5.5 million row data.table pretty
> >> effortlessly.
> >>
> >> Columns matter too, of course -- RAM is RAM, after all and you've got
> >> to be able to fit the whole thing into it if you want to use
> >> data.table. Once loaded, though, data.table enables one to do
> >> split/apply/combine calculations over these data quite efficiently.
> >> The first time I used it, I was honestly blown away.
> >>
> >> If you find yourself wanting to work with such data, you could do
> >> worse than read through data.table's vignette and FAQ and give it a
> >> spin.
> >>
> >> HTH,
> >>
> >> -steve
> >>
> >> --
> >> Steve Lianoglou
> >> Computational Biologist
> >> Bioinformatics and Computational Biology
> >> Genentech
> >>
> >
> > [[alternative HTML version deleted]]
> >
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Steve Lianoglou
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech
>




Re: [R] First time r user

2013-08-18 Thread Steve Lianoglou
Hi Paul,

On Sun, Aug 18, 2013 at 12:56 AM, Paul Bernal  wrote:
> Thanks a lot for the valuable information.
>
> Now my question would necessarily be, how many columns can R handle,
> provided that I have millions of rows and, in general, whats the maximum
> amount of rows and columns that R can effortlessly handle?

This is all determined by your RAM.

Prior to R-3.0, R could only handle vectors of length 2^31 - 1. If you
were working with a matrix, that meant that you could only have that
many elements in the entire matrix.

If you were working with a data.frame, you could have data.frames with
2^31-1 rows, and I guess as many columns, since data.frames are really
a list of vectors: the entire thing doesn't have to be in one
contiguous block (and addressable that way).
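That limit is simply the largest signed 32-bit integer, which you can check from within R:

```r
# The pre-R-3.0 vector length cap equals the largest signed 32-bit integer.
.Machine$integer.max == 2^31 - 1   # TRUE
```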

R-3.0 introduced "Long Vectors" (search for that section in the release notes):

https://stat.ethz.ch/pipermail/r-announce/2013/000561.html

It almost doubles the size of a vector that R can handle (assuming you
are running 64bit). So, if you've got the RAM, you can have a
data.frame/data.table w/ billion(s) of rows, in theory.

To figure out how much data you can handle on your machine, you need
to know the size of real/integer/whatever and the number of elements
of those you will have so you can calculate the amount of RAM you need
to load it all up.
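As a sketch of that arithmetic: a double occupies 8 bytes, so one numeric column with a million rows needs roughly 8 MB, which object.size() confirms (modulo a few bytes of vector header):

```r
# Back-of-the-envelope RAM estimate for one numeric column of 1e6 rows.
n <- 1e6
est_mb    <- n * 8 / 2^20                              # predicted, ~7.6 MB
actual_mb <- as.numeric(object.size(numeric(n))) / 2^20
c(estimated = est_mb, actual = actual_mb)              # nearly identical
```

Multiply by the number of columns (using 4 bytes for integers) to size up a whole data.frame.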

Lastly, I should mention there are packages that let you work with
"out of memory" data, like bigmemory, biglm, ff. Look at the HPC Task
view for more info along those lines:

http://cran.r-project.org/web/views/HighPerformanceComputing.html


>
> Best regards and again thank you for the help,
>
> Paul
On 18/08/2013 02:35, "Steve Lianoglou"  wrote:
>
>> Hi Paul,
>>
>> First: please keep your replies on list (use reply-all when replying
>> to R-help lists) so that others can help but also the lists can be
>> used as a resource for others.
>>
>> Now:
>>
>> On Aug 18, 2013, at 12:20 AM, Paul Bernal  wrote:
>>
>> > Can R really handle millions of rows of data?
>>
>> Yup.
>>
>> > I thought it was not possible.
>>
>> Surprise :-)
>>
>> As I type, I'm working with a ~5.5 million row data.table pretty
>> effortlessly.
>>
>> Columns matter too, of course -- RAM is RAM, after all and you've got
>> to be able to fit the whole thing into it if you want to use
>> data.table. Once loaded, though, data.table enables one to do
>> split/apply/combine calculations over these data quite efficiently.
>> The first time I used it, I was honestly blown away.
>>
>> If you find yourself wanting to work with such data, you could do
>> worse than read through data.table's vignette and FAQ and give it a
>> spin.
>>
>> HTH,
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Bioinformatics and Computational Biology
>> Genentech
>>
>
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech



Re: [R] latin1 encoding in WriteXLS

2013-08-18 Thread Rainer Hurling
[maintainer CC'ed]


On 17.08.2013 11:28, Hugo Varet wrote:
> Yes, it also occurs with WriteXLS version 3.2.1.
> 
> This test on several computers always leads to the same error.

Oops, sorry. I just realised that this happens on both Windows and
Unix-alikes. On Win7 I am using ActivePerl 5.16.3 (X64).

The relevant perl scripts (WriteXLS.pl and Encode.pm) seem to be the
same on Unix/Linux and on Windows.

The first lines of the temporary file '1.csv', from which the xls should
be created, looks like:

"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
"WRITEXLS COMMENT: ","WRITEXLS COMMENT: ","WRITEXLS COMMENT: ","WRITEXLS
COMMENT: ","WRITEXLS COMMENT: "
"5.1","3.5","1.4","0.2","setosa"
"4.9","3","1.4","0.2","setosa"
"4.7","3.2","1.3","0.2","setosa"
"4.6","3.1","1.5","0.2","setosa"
"5","3.6","1.4","0.2","setosa"
"5.4","3.9","1.7","0.4","setosa"
"4.6","3.4","1.4","0.3","setosa"
"5","3.4","1.5","0.2","setosa"
"4.4","2.9","1.4","0.2","setosa"
"4.9","3.1","1.5","0.1","setosa"
"5.4","3.7","1.5","0.2","setosa"
"4.8","3.4","1.6","0.2","setosa"
"4.8","3","1.4","0.1","setosa"
[..]


The sequence to call the converting perl script from WriteXLS by
system() is:

cmd <- paste(perl, " -I", shQuote(Perl.Path), " ", shQuote(Fn.Path),
             " --CSVPath ", shQuote(Tmp.Dir), " --verbose ", verbose,
             " --AdjWidth ", AdjWidth, " --AutoFilter ", AutoFilter,
             " --BoldHeaderRow ", BoldHeaderRow, " --FreezeRow ", FreezeRow,
             " --FreezeCol ", FreezeCol, " --Encoding ", Encoding,
             " ", shQuote(ExcelFileName), sep = "")


WriteXLS is calling the perl code by 'Result <- system(cmd)':

"perl -I'/usr/local/lib/R/library/WriteXLS/Perl'
'/usr/local/lib/R/library/WriteXLS/Perl/WriteXLS.pl' --CSVPath
'/tmp/RtmpFpgjq6/WriteXLS' --verbose FALSE --AdjWidth FALSE --AutoFilter
FALSE --BoldHeaderRow FALSE --FreezeRow 0 --FreezeCol 0 --Encoding
latin1 '/usr/home/rhurlin/iris.xls'"


In Perl, '/usr/local/lib/R/library/WriteXLS/Perl/WriteXLS.pl' is calling
'/usr/local/lib/R/library/WriteXLS/Perl/Encode.pm' to decode the csv
file (in this case iso8859-1) and encode it again as utf8, if needed.

It seems to me that 'sub decode' in Encode.pm is doing something wrong.
Unfortunately I do not understand what is going on here. Perhaps Marc,
as the maintainer, has an idea?
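One way to narrow down whether the CSV bytes themselves are at fault is to attempt the conversion from R before Perl ever runs. This is a hypothetical diagnostic, not from the thread: iconv() returns NA for input it cannot map, so failing lines can be located directly:

```r
# Hypothetical diagnostic: which lines fail to convert between encodings?
# iconv() yields NA for input it cannot map under the declared encoding.
check_encoding <- function(lines, from = "latin1") {
  converted <- iconv(lines, from = from, to = "UTF-8")
  which(is.na(converted))  # indices of lines that failed to convert
}
```

Running it on readLines of the temporary CSV (e.g. the '1.csv' shown above) with from = "UTF-8" would flag byte sequences that are not valid UTF-8; an empty result means the declared encoding is at least self-consistent.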

Regards,
Rainer Hurling


sessionInfo()
R Under development (unstable) (2013-08-15 r63591)
Platform: amd64-portbld-freebsd10.0 (64-bit)

locale:
[1] C/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] WriteXLS_3.2.1

loaded via a namespace (and not attached):
[1] tools_3.1.0


> 
> Hugo Varet
> 
> 
> 
> 2013/8/17 Rainer Hurling 
> 
>> On 13.08.2013 19:40, Hugo Varet wrote:
>>> Dear R users,
>>>
>>> I've just updated the WriteXLS package (on R 3.0.1) and I now have an
>> error
>>> when exporting a data.frame with the argument Encoding="latin1". For
>>> example, these two lines work:
>>>library(WriteXLS)
>>>WriteXLS("iris", "iris.xls")
>>> whereas these ones don't work:
>>>   library(WriteXLS)
>>>   WriteXLS("iris", "irislatin1.xls",Encoding="latin1")
>>> I get this message:
>>> Argument "Sepal.Length" isn't numeric in subroutine entry at
>>> C:/Perl64/lib/Encode.pm line 217,  line 1.
>>> Modification of a read-only value attempted at C:/Perl64/lib/Encode.pm
>> line
>>> 218,  line 1.
>>> The Perl script 'WriteXLS.pl' failed to run successfully.
>>> Warning message:
>>> running command 'perl
>>> -I"C:/Users/varet/Documents/R/win-library/3.0/WriteXLS/Perl"
>>> "C:/Users/varet/Documents/R/win-library/3.0/WriteXLS/Perl/WriteXLS.pl"
>>> --CSVPath "C:\Users\varet\AppData\Local\Temp\RtmpEzqFNz/WriteXLS"
>> --verbose
>>> FALSE --AdjWidth FALSE --AutoFilter FALSE --BoldHeaderRow FALSE
>> --FreezeRow
>>> 0 --FreezeCol 0 --Encoding latin1
>> "C:\Users\varet\Desktop\irislatin1.xls"'
>>> had status 255
>>>
>>> Does anyone know why it failed? May it be a problem with Perl?
>>>
>>> Thanks for your help,
>>>
>>> Hugo Varet
>>
>> Does this also occur with WriteXLS version 3.2.1 ?



Re: [R] First time r user

2013-08-18 Thread Paul Bernal
Thanks a lot for the valuable information.

Now my question would necessarily be: how many columns can R handle,
provided that I have millions of rows, and, in general, what's the maximum
number of rows and columns that R can effortlessly handle?

Best regards and again thank you for the help,

Paul
On 18/08/2013 02:35, "Steve Lianoglou"  wrote:

> Hi Paul,
>
> First: please keep your replies on list (use reply-all when replying
> to R-help lists) so that others can help but also the lists can be
> used as a resource for others.
>
> Now:
>
> On Aug 18, 2013, at 12:20 AM, Paul Bernal  wrote:
>
> > Can R really handle millions of rows of data?
>
> Yup.
>
> > I thought it was not possible.
>
> Surprise :-)
>
> As I type, I'm working with a ~5.5 million row data.table pretty
> effortlessly.
>
> Columns matter too, of course -- RAM is RAM, after all and you've got
> to be able to fit the whole thing into it if you want to use
> data.table. Once loaded, though, data.table enables one to do
> split/apply/combine calculations over these data quite efficiently.
> The first time I used it, I was honestly blown away.
>
> If you find yourself wanting to work with such data, you could do
> worse than read through data.table's vignette and FAQ and give it a
> spin.
>
> HTH,
>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech
>




Re: [R] First time r user

2013-08-18 Thread Steve Lianoglou
Hi Paul,

First: please keep your replies on list (use reply-all when replying
to R-help lists) so that others can help but also the lists can be
used as a resource for others.

Now:

On Aug 18, 2013, at 12:20 AM, Paul Bernal  wrote:

> Can R really handle millions of rows of data?

Yup.

> I thought it was not possible.

Surprise :-)

As I type, I'm working with a ~5.5 million row data.table pretty effortlessly.

Columns matter too, of course -- RAM is RAM, after all and you've got
to be able to fit the whole thing into it if you want to use
data.table. Once loaded, though, data.table enables one to do
split/apply/combine calculations over these data quite efficiently.
The first time I used it, I was honestly blown away.
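For readers without data.table at hand, the split/apply/combine pattern described here looks like this in base R (data.table's dt[, mean(x), by = grp] is the much faster analogue on large tables):

```r
# Split/apply/combine in base R: per-group means with aggregate().
df  <- data.frame(grp = rep(c("a", "b"), each = 5), x = 1:10)
res <- aggregate(x ~ grp, data = df, FUN = mean)
res   # group "a" -> 3, group "b" -> 8
```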

If you find yourself wanting to work with such data, you could do
worse than read through data.table's vignette and FAQ and give it a
spin.

HTH,

-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
