Hello all, and thank you for your speedy replies,

Here are the first few rows of the data set, as printed by head():

  brewery_id            brewery_name review_time review_overall review_aroma review_appearance review_profilename
1      10325         Vecchio Birraio  1234817823            1.5          2.0               2.5            stcules
2      10325         Vecchio Birraio  1235915097            3.0          2.5               3.0            stcules
3      10325         Vecchio Birraio  1235916604            3.0          2.5               3.0            stcules
4      10325         Vecchio Birraio  1234725145            3.0          3.0               3.5            stcules
5       1075 Caldera Brewing Company  1293735206            4.0          4.5               4.0     johnmichaelsen
6       1075 Caldera Brewing Company  1325524659            3.0          3.5               3.5            oline73

                      beer_style review_palate review_taste              beer_name beer_abv beer_beerid
1                     Hefeweizen           1.5          1.5           Sausa Weizen      5.0       47986
2             English Strong Ale           3.0          3.0               Red Moon      6.2       48213
3         Foreign / Export Stout           3.0          3.0 Black Horse Black Beer      6.5       48215
4                German Pilsener           2.5          3.0             Sausa Pils      5.0       47969
5 American Double / Imperial IPA           4.0          4.5          Cauldron DIPA      7.7       64883
6           Herbed / Spiced Beer           3.0          3.5    Caldera Ginger Beer      4.7       52159

So far I have only worked out how to import the data set and run some basic
R functions on it. My goal is to be able to answer questions like: what are
the top 10 pilsners, or which brewer has the highest average ABV? I would
also like to combine two factors, such as aroma and appearance ratings, to
decide which beer style I should try. Let me know if I can give you any more
information that would help.
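
For concreteness, here is a rough base R sketch of the three questions above. It is only one possible approach. The data frame name `beer` is an assumption, the toy data holds just the six rows printed above, and ranking by mean rating (and summing the mean aroma and appearance ratings) is one choice among several.

```r
# Toy data frame built from the six rows shown above. In practice `beer`
# (an assumed name) would be the full imported data set.
beer <- data.frame(
  brewery_name = c(rep("Vecchio Birraio", 4), rep("Caldera Brewing Company", 2)),
  beer_style = c("Hefeweizen", "English Strong Ale", "Foreign / Export Stout",
                 "German Pilsener", "American Double / Imperial IPA",
                 "Herbed / Spiced Beer"),
  beer_name = c("Sausa Weizen", "Red Moon", "Black Horse Black Beer",
                "Sausa Pils", "Cauldron DIPA", "Caldera Ginger Beer"),
  beer_abv = c(5.0, 6.2, 6.5, 5.0, 7.7, 4.7),
  review_overall = c(1.5, 3.0, 3.0, 3.0, 4.0, 3.0),
  review_aroma = c(2.0, 2.5, 2.5, 3.0, 4.5, 3.5),
  review_appearance = c(2.5, 3.0, 3.0, 3.5, 4.0, 3.5),
  stringsAsFactors = FALSE
)

# 1. Top 10 pilsners by mean overall rating (styles matching "pils").
pils <- beer[grepl("pils", beer$beer_style, ignore.case = TRUE), ]
pils_mean <- aggregate(review_overall ~ beer_name, data = pils, FUN = mean)
top_pils <- head(pils_mean[order(-pils_mean$review_overall), ], 10)

# 2. Brewer with the highest average ABV.
abv_by_brewery <- aggregate(beer_abv ~ brewery_name, data = beer, FUN = mean)
top_brewery <- abv_by_brewery[which.max(abv_by_brewery$beer_abv), ]

# 3. Beer style with the highest mean aroma plus mean appearance.
style_means <- aggregate(cbind(review_aroma, review_appearance) ~ beer_style,
                         data = beer, FUN = mean)
style_means$combined <- style_means$review_aroma + style_means$review_appearance
best_style <- style_means[which.max(style_means$combined), ]

print(top_pils)
print(top_brewery)
print(best_style)
```

With the full data set it would probably make sense to require a minimum number of reviews per beer before ranking, so that a single enthusiastic review does not dominate.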

Thanks again,

Dylan




On Sun, Aug 18, 2013 at 4:16 AM, Paul Bernal <paulberna...@gmail.com> wrote:

> Thank you so much Steve.
>
> The computer I'm currently working with runs 32-bit Windows 7 and has
> only 4 GB of RAM, so I guess that's a big limitation.
> On 18/08/2013 03:11, "Steve Lianoglou" <lianoglou.st...@gene.com>
> wrote:
>
> > Hi Paul,
> >
> > On Sun, Aug 18, 2013 at 12:56 AM, Paul Bernal <paulberna...@gmail.com>
> > wrote:
> > > Thanks a lot for the valuable information.
> > >
> > > Now my question would necessarily be, how many columns can R handle,
> > > provided that I have millions of rows and, in general, what's the
> > > maximum number of rows and columns that R can effortlessly handle?
> >
> > This is all determined by your RAM.
> >
> > Prior to R-3.0, R could only handle vectors of length 2^31 - 1. If you
> > were working with a matrix, that meant that you could only have that
> > many elements in the entire matrix.
> >
> > If you were working with a data.frame, you could have data.frames with
> > 2^31-1 rows, and I guess as many columns, since data.frames are really
> > a list of vectors, the entire thing doesn't have to be in one
> > contiguous block (and addressable that way)
> >
> > R-3.0 introduced "Long Vectors" (search for that section in the release
> > notes):
> >
> > https://stat.ethz.ch/pipermail/r-announce/2013/000561.html
> >
> > It raises the limit well beyond that: on 64-bit builds a single vector
> > can hold up to 2^52 elements. So, if you've got the RAM, you can have a
> > data.frame/data.table w/ billion(s) of rows, in theory.
> >
> > To figure out how much data you can handle on your machine, you need
> > to know the size of real/integer/whatever and the number of elements
> > of those you will have so you can calculate the amount of RAM you need
> > to load it all up.
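
As a rough illustration of the calculation described above, the following sketch estimates the RAM a table would need. It assumes 8 bytes per double and 4 bytes per integer or logical element, which are R's in-memory sizes. The function name `estimate_gb` and the example dimensions are made up for illustration, and character columns are ignored because their size depends on the strings they hold.

```r
# Rough RAM estimate for a table, assuming 8 bytes per double and
# 4 bytes per integer or logical element.
estimate_gb <- function(n_rows, n_double_cols = 0, n_int_cols = 0) {
  bytes <- n_rows * (8 * n_double_cols + 4 * n_int_cols)
  bytes / 1024^3
}

# Hypothetical example: 5 million rows, 6 double and 3 integer columns.
est <- estimate_gb(5e6, n_double_cols = 6, n_int_cols = 3)
print(est)

# Sanity check against a real vector of one million doubles.
x <- numeric(1e6)
print(object.size(x), units = "MB")

# The vector length limit before R 3.0, i.e. 2^31 - 1.
print(.Machine$integer.max)
```

Operations on the data often make temporary copies, so the estimate is best read as a lower bound on the memory actually needed.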
> >
> > Lastly, I should mention there are packages that let you work with
> > "out of memory" data, like bigmemory, biglm, ff. Look at the HPC Task
> > view for more info along those lines:
> >
> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
> >
> >
> > >
> > > Best regards and again thank you for the help,
> > >
> > > Paul
> > > On 18/08/2013 02:35, "Steve Lianoglou" <lianoglou.st...@gene.com>
> > > wrote:
> > >
> > >> Hi Paul,
> > >>
> > >> First: please keep your replies on list (use reply-all when replying
> > >> to R-help lists) so that others can help but also the lists can be
> > >> used as a resource for others.
> > >>
> > >> Now:
> > >>
> > >> On Aug 18, 2013, at 12:20 AM, Paul Bernal <paulberna...@gmail.com>
> > >> wrote:
> > >>
> > >> > Can R really handle millions of rows of data?
> > >>
> > >> Yup.
> > >>
> > >> > I thought it was not possible.
> > >>
> > >> Surprise :-)
> > >>
> > >> As I type, I'm working with a ~5.5 million row data.table pretty
> > >> effortlessly.
> > >>
> > >> Columns matter too, of course -- RAM is RAM, after all and you've got
> > >> to be able to fit the whole thing into it if you want to use
> > >> data.table. Once loaded, though, data.table enables one to do
> > >> split/apply/combine calculations over these data quite efficiently.
> > >> The first time I used it, I was honestly blown away.
> > >>
> > >> If you find yourself wanting to work with such data, you could do
> > >> worse than read through data.table's vignette and FAQ and give it a
> > >> spin.
> > >>
> > >> HTH,
> > >>
> > >> -steve
> > >>
> > >> --
> > >> Steve Lianoglou
> > >> Computational Biologist
> > >> Bioinformatics and Computational Biology
> > >> Genentech
> > >>
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> >
> > --
> > Steve Lianoglou
> > Computational Biologist
> > Bioinformatics and Computational Biology
> > Genentech
> >
>
>
>
