This too:
Applied Predictive Modeling
Max Kuhn and Kjell Johnson
Thanks,
Mohan
On Tue, Sep 9, 2014 at 10:03 PM, Subba Rao wrote:
> Hi,
>
> I am interested in R programming in the Big Data space. Are there any
> books that would be a good starting point for this career path?
> Predictive analysis is also an area I am interested in.
10 September 2014 18:10
To: Angel Rodriguez
Subject: Re: [R] R, Big Data and books
Angel,
Thanks for the references. I am not seeing the name of or link to the book
that is "free and recommended by Johns Hopkins' Department of Statistics".
Can you please let me know the name of the book?
From an email list:
"R is well known in the world of Big Data and is increasing in popularity. A
number of very useful resources are available for anyone undertaking data
mining in R.
For example, Luis Torgo has just published a book called Data Mining with R:
Learning with Case Studies (Torgo, 2010).
Hi,
I am interested in R programming in the Big Data space. Are there any
books that would be a good starting point for this career path?
Predictive analysis is also an area I am interested in.
Thank you in advance for any information and help.
Subba Rao
Correcting a typo (400 MB, not GB; thanks to David Winsemius for
reporting it). Spencer
###
Thanks to all who replied. For the record, I will summarize here
what I tried and what I learned:
Mike Harwood suggested the ff package. David Winsemius suggested
data.table and colbycol. Peter Langfelder suggested sqldf.
sqldf::read.csv.sql allowed me to create an SQL database from the file.
The read.table.ffdf function in the ff package can read in delimited files
and store them to disk as individual columns. The ffbase package provides
additional data management and analytic functionality. I have used these
packages on 15 GB files of 18 million rows and 250 columns.
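A minimal sketch of that workflow (file name and column name are hypothetical):

library(ff)
library(ffbase)
# read.table.ffdf parses the delimited file in chunks and stores each
# column on disk as an ff object, so the data need not fit in RAM
dat <- read.table.ffdf(file = "big_file.tsv", header = TRUE, sep = "\t")
# ffbase supplies familiar verbs for ffdf objects, e.g. subset()
small <- subset(dat, some_column > 0)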
On Aug 5, 2014, at 10:20 AM, Spencer Graves wrote:
> What tools do you like for working with tab-delimited text files up to
> 1.5 GB (under Windows 7 with 8 GB RAM)?
?data.table::fread
> Standard tools for smaller data sometimes grab all the available RAM,
> after which CPU usage drops to 3% ;-)
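For reference, a minimal fread call for a file like that (file name
hypothetical):

library(data.table)
# fread detects column types, reads in parallel, and is typically much
# faster than read.table on files of this size
DT <- fread("big_file.tsv", sep = "\t")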
Have you tried read.csv.sql from package sqldf?
Peter
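A minimal sketch of read.csv.sql (file name and filter hypothetical): the
file is loaded into a temporary SQLite database and only the rows selected
by the SQL come back into R.

library(sqldf)
# the table inside the SQL statement is always called "file"
df <- read.csv.sql("big_file.tsv", sep = "\t",
                   sql = "select * from file where some_column > 0")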
On Tue, Aug 5, 2014 at 10:20 AM, Spencer Graves wrote:
> What tools do you like for working with tab-delimited text files up to
> 1.5 GB (under Windows 7 with 8 GB RAM)?
>
>
> Standard tools for smaller data sometimes grab all the available RAM,
> after which CPU usage drops to 3% ;-)
What tools do you like for working with tab-delimited text files
up to 1.5 GB (under Windows 7 with 8 GB RAM)?
Standard tools for smaller data sometimes grab all the available
RAM, after which CPU usage drops to 3% ;-)
The "bigmemory" project won the 2010 John Chambers Award.
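A minimal sketch of the bigmemory approach (file names hypothetical; note
that a big.matrix holds a single numeric type, not mixed columns):

library(bigmemory)
# the data live in a file-backed matrix, so RAM is not the limit;
# the descriptor file lets later sessions re-attach without re-reading
x <- read.big.matrix("big_file.csv", header = TRUE, type = "double",
                     backingfile = "big_file.bin",
                     descriptorfile = "big_file.desc")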
The read.csv.sql function in the sqldf package may make this approach
quite simple.
On Thu, Aug 16, 2012 at 10:12 AM, jim holtman wrote:
> Why not put this into a database? Then you can easily extract the
> records you want by specifying the record numbers. You pay the one-time
> expense of creating the database, but then have much faster access to
> the data as you make subsequent runs.
Why not put this into a database? Then you can easily extract the
records you want by specifying the record numbers. You pay the one-time
expense of creating the database, but then have much faster access to
the data as you make subsequent runs.
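A minimal sketch of that one-time-database idea with RSQLite (file and
table names hypothetical):

library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), "records.sqlite")
# one-time cost: load the csv into a table (a file this size would need
# chunked loading; shown whole here for brevity)
dbWriteTable(con, "records", read.csv("big_file.csv"))
# subsequent runs: pull only the rows you need by record number
sub <- dbGetQuery(con,
                  "SELECT * FROM records WHERE rowid BETWEEN 1000 AND 2000")
dbDisconnect(con)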
On Thu, Aug 16, 2012 at 9:44 AM, Tudor Medallion wrote:
Hello,
I'm most grateful for your time in reading this.
I have an uber-size 30 GB file of 6 million records and 3,000 (mostly
categorical) columns in csv format. I want to bootstrap subsamples for
multinomial regression, but it's proving difficult even with the 64 GB of
RAM in my machine and twice that.
Daniel, thanks for the help. I finally managed it by doing the merging
separately.
Daniel Malter wrote:
>
> On a different note: how are you matching if AA has multiple matches in
> BB?
>
About that: all I have to do is check whether, for any of the BB that match
AA, the indicator equals 1.
If A has more columns than in your example, you could always try to only
merge those columns of A with B that are relevant for the merging. You could
then cbind the result of the merging back together with the rest of A as
long as the merged data preserved the same order as in A.
Alternatively, yo
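A minimal sketch of that suggestion (toy column names; assumes unique keys
in B so the row counts line up):

# keep A's row order with an index, merge only the key column, cbind back
A <- data.frame(AA = c(4, 2, 6), A1 = c(3, 5, 7), other = c(10, 20, 30))
B <- data.frame(BB = c(2, 4, 6), B1 = c(5, 11, 13))
A$row <- seq_len(nrow(A))
m <- merge(A[, c("row", "AA")], B, by.x = "AA", by.y = "BB", all.x = TRUE)
m <- m[order(m$row), ]                      # restore A's original order
result <- cbind(A[, c("AA", "A1", "other")], m[, "B1", drop = FALSE])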
Thanks Daniel, that helped me. Based on your suggestions I built this final
code:
library(foreign)
library(gdata)
AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
A1 <- c(3, 3, 11, 5, 5, 7, 11, 12)
A2 <- c(3, 3, 7, 3, 5, 7, 11, 12)
A  <- cbind(AA, A1, A2)
BB <- c(2, 2, 4, 6, 6)
B1 <- c(5, 11, 7, 13, NA)
B2 <- c(4, 12, 11, NA, NA)
B3 <- c(12, 13, NA, NA, NA)  # last entries truncated in the archive; NA assumed
B  <- cbind(BB, B1, B2, B3)
This is much clearer. So here is what I think you want to do. In theory and
practice:
Theory:
Check if AA[i] is in BB
If AA[i] is in BB, then take the rows where BB[j] == AA[i] and check whether
A1 and A2 are in B1 to B3. Is that right? Only if both are should the
indicator take the value 1.
Here is one way to do it:
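A minimal sketch of that check, re-using the example vectors posted above
(unvectorised for clarity; B3's truncated entries assumed NA):

AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
A1 <- c(3, 3, 11, 5, 5, 7, 11, 12)
A2 <- c(3, 3, 7, 3, 5, 7, 11, 12)
BB <- c(2, 2, 4, 6, 6)
B1 <- c(5, 11, 7, 13, NA)
B2 <- c(4, 12, 11, NA, NA)
B3 <- c(12, 13, NA, NA, NA)
indicator <- sapply(seq_along(AA), function(i) {
  rows <- which(BB == AA[i])          # rows of B whose BB matches AA[i]
  if (length(rows) == 0) return(0)
  # 1 if, for any matching row, both A1[i] and A2[i] occur in B1..B3
  hit <- sapply(rows, function(j) {
    vals <- c(B1[j], B2[j], B3[j])
    (A1[i] %in% vals) && (A2[i] %in% vals)
  })
  as.integer(any(hit))
})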
Hi
> Re: [R] Big data and column correspondence problem
>
> Daniel, thanks for the answer.
> I will try to make myself a little bit clearer. Doing it step by step I
> would have (using a loop through the lines of 'A'):
I am not sure if you were successful in your clarifying
Daniel, thanks for the answer.
I will try to make myself a little bit clearer. Doing it step by step I would
have (using a loop through the lines of 'A'):
1. AA[1] is 4. As such, I would have to compare A1[1] = 20 and A2[1] = 3 with
         B1  B2  B3
B[3,2:4]  7  11  NA
because BB[3] = 4. Since there is
For question (a), do:
which(AA %in% BB)
Question (b) is very ambiguous to me. It makes little sense for your example
because all values of BB are in AA. Therefore I am wondering whether you
meant in question (a) that you want to find all values in BB that are in AA.
That's not the same thing. I am
Greetings,
I've been struggling for some time with a problem concerning a big database
that I have to deal with.
I'll try to illustrate my problem, since the database is really big.
Suppose I have the following data:
AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
A1 <- c(3, 3, 5, 5, 5, 7, 11, 12)
A2 <- c(3, 3, 5, 5, 5, 7, 11, 12)
A  <- cbind(AA, A1, A2)
On Thu, Oct 21, 2010 at 2:00 PM, Ben Bolker wrote:
> Michal Figurski writes:
>
>> I have a data set of roughly 10 million records, 7 columns. It is only
>> about 500 MB as a csv, so it fits in memory. It's painfully slow to
>> do anything with it, but it's possible. I also have another dataset of
>> covariates that I would like to explore, with about 4 GB of data.
Though bigmemory, ff, and other big-data solutions (databases, etc.) can
easily manage massive data, their data objects are not natively compatible
with all the advanced functionality of R. Exceptions include lm and glm
(both ff and bigmemory support this via Lumley's biglm package), kmeans,
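A minimal sketch of the chunked biglm idea (toy data; a real workflow would
feed successive chunks read from disk):

library(biglm)
# fit on the first chunk, then fold in further chunks with update();
# memory use stays bounded by the chunk size
chunk1 <- data.frame(y = rnorm(1000), x = rnorm(1000))
fit <- biglm(y ~ x, data = chunk1)
chunk2 <- data.frame(y = rnorm(1000), x = rnorm(1000))
fit <- update(fit, chunk2)
summary(fit)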
Michal Figurski writes:
> I have a data set of roughly 10 million records, 7 columns. It is only
> about 500 MB as a csv, so it fits in memory. It's painfully slow to
> do anything with it, but it's possible. I also have another dataset of
> covariates that I would like to explore, with about 4 GB of data.
Dear R-helpers
I have a data set of roughly 10 million records, 7 columns. It is only
about 500 MB as a csv, so it fits in memory. It's painfully slow to
do anything with it, but it's possible. I also have another dataset of
covariates that I would like to explore, with about 4 GB of data.
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of André de Boer
> Sent: Wednesday, September 08, 2010 5:27 AM
> To: r-help@r-project.org
> Subject: [R] big data
On 8 September 2010 at 13:26, André de Boer wrote:
| I searched the internet but I didn't find an answer to the following problem:
| I want to do a glm on a csv file consisting of 25 columns and 4 million rows.
| Not all the columns are relevant. My problem is to read the data into R,
| manipulate the data and then do a glm.
Hello,
I searched the internet but I didn't find an answer to the following problem:
I want to do a glm on a csv file consisting of 25 columns and 4 million rows.
Not all the columns are relevant. My problem is to read the data into R,
manipulate the data and then do a glm.
I've tried with:
dd<-scan("m
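One way to read only the relevant columns (column positions, file name, and
model are hypothetical): setting a column's class to "NULL" makes read.csv
skip it entirely, which keeps memory use down.

cc <- rep("NULL", 25)
cc[c(1, 3, 7)] <- NA   # NA lets read.csv guess the type of the kept columns
dd <- read.csv("big_file.csv", colClasses = cc)
fit <- glm(y ~ x1 + x2, data = dd, family = binomial())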
On Dec 18, 2008, at 3:07 PM, Stephan Kolassa wrote:
Hi Mauricio,
Mauricio Calvao wrote:
1) I would like very much to use R for processing some big data
files (around 1.7 or more GB) for spatial analysis, wavelets, and
power spectra estimation; is this possible with R? Within IDL, such
a big data set seems to be tractable...
Hi Mauricio,
Mauricio Calvao wrote:
1) I would like very much to use R for processing some big data files
(around 1.7 or more GB) for spatial analysis, wavelets, and power
spectra estimation; is this possible with R? Within IDL, such a big data
set seems to be tractable...
There are some packages for handling data of that size.
Hi there
I am new to R and would like to ask some questions which might not make
perfect sense. Anyhow, here they are:
1) I would like very much to use R for processing some big data files
(around 1.7 or more GB) for spatial analysis, wavelets, and power
spectra estimation; is this possible with R? Within IDL, such a big data
set seems to be tractable...