Hi R users,

I have a dataset stored as a matrix of 7502 rows x 1426 columns.
The data is in CSV format and is about 68 MB in size. This dataset is less
than 10% of our full dataset.
I have been following the anomaly detection method described at
http://www.mattpeeples.net/kmeans.html .
It has been running for more than 24 hours and still has not completed the
calculation.
I did manage to run it on a smaller dataset (i.e., 2100 rows x 1426 columns),
which took around 12 hours.
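
To make the timing question easier to reproduce, below is a minimal,
self-contained sketch of this kind of run. It is not the script from the link
above: it uses synthetic random data in place of our CSV, a reduced matrix
size, and base R's kmeans() with arbitrarily chosen settings (the candidate
cluster counts, nstart and iter.max are all assumptions for illustration). It
only shows how the per-fit cost can be measured with system.time().

```r
# Synthetic stand-in for the real data (assumption: numeric, no missing values).
set.seed(1)
n_rows <- 500   # the real dataset has 7502 rows
n_cols <- 50    # the real dataset has 1426 columns
x <- matrix(rnorm(n_rows * n_cols), nrow = n_rows, ncol = n_cols)

# Candidate numbers of clusters (hypothetical range, chosen for illustration).
ks <- 2:6
elapsed <- numeric(length(ks))
wss <- numeric(length(ks))

# Fit k-means once per candidate k and record the elapsed time
# and the total within-cluster sum of squares.
for (i in seq_along(ks)) {
  timing <- system.time(
    fit <- kmeans(x, centers = ks[i], nstart = 5, iter.max = 50)
  )
  elapsed[i] <- timing[["elapsed"]]
  wss[i] <- fit$tot.withinss
}

print(data.frame(k = ks, seconds = elapsed, within_ss = wss))
```

Raising n_rows and n_cols step by step towards the real dimensions should show
how the run time grows and which part of the analysis dominates it.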

I have a few questions and would appreciate your expert guidance.

1)      Is there a better open-source tool (e.g., RStudio) that can do all of
the following in one place: prepare data, build models, validate models, test
models and present data? I am looking for a tool that will let me do the same
as in the link above (Matt Peeples' blog).

2)      Is there an open-source tool that can do the above and run on top of
the Hadoop ecosystem?

3)      Can we use RStudio for Windows as a client running on top of the
Hadoop ecosystem? If yes, please point me to a site that has use cases or
samples.

Thanks and Regards,
Truong Phan
