Re: [R] R and DBSCAN

2011-06-07 Thread Paco Pastor

Hello Christian

Thanks for answering. Yes, I have tried dbscan from fpc, but I'm still 
stuck on the memory problem. Regarding your answer, I'm not sure which 
memory parameter I should look at. Below is the code I tried, with its 
dbscan parameters; maybe you can spot a mistake.


sstdat = read.csv("sst.dat", sep = ";", header = FALSE,
                  col.names = c("lon", "lat", "sst"))

library(fpc)
# Keep valid SST values inside the study area (the comparison
# operators were lost in the archive and are inferred here)
sst1 = subset(sstdat, sst < 50)
sst2 = subset(sst1, lon > -6)
sst2 = subset(sst2, lon < 40)
sst2 = subset(sst2, lat < 46)

> dbscan(sst2$sst, 0.1, MinPts = 5, scale = FALSE, method =
c("hybrid"), seeds = FALSE, showplot = FALSE, countmode = NULL)

Error: cannot allocate vector of size 858.2 Mb
> head(sst2)
       lon   lat   sst
1257 35.18 24.98 26.78
1258 35.22 24.98 26.78
1259 35.27 24.98 26.78
1260 35.31 24.98 26.78
1261 35.35 24.98 26.78
1262 35.40 24.98 26.85


In this example I apply dbscan only to the temperature values, not to 
lon/lat, so the eps parameter is 0.1. Since this is a gridded data set, 
every point is surrounded by eight data points, so I thought that at 
least 5 of the surrounding points should be within the reachability 
distance. But I'm not sure that considering only the temperature value 
is the right approach; I may be missing spatial information. How should 
I deal with the longitude and latitude data?
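
One possible way to bring the spatial information in, sketched here under stated assumptions and not as advice given in this thread: rescale longitude and latitude so that one grid step equals one unit, rescale temperature so that a chosen tolerance equals one unit, and cluster on all three columns. The synthetic grid, the 0.25 degree spacing, the 0.1 degree tolerance and eps = 1.5 are all illustrative assumptions. With this scaling the eight grid neighbours lie within sqrt(2) units, so eps = 1.5 reaches them only when their temperatures are similar.

```r
# Synthetic 20 x 20 grid standing in for sst2 (hypothetical data):
# warm water in the west, cold water in the east.
grid <- expand.grid(lon = seq(0, 4.75, by = 0.25),
                    lat = seq(30, 34.75, by = 0.25))
grid$sst <- ifelse(grid$lon < 2.5, 26, 20)

grid_step <- 0.25  # assumed grid spacing in degrees
sst_tol   <- 0.1   # assumed temperature difference counted as one unit

# After rescaling, horizontal and vertical neighbours are 1 unit apart
# and diagonal neighbours sqrt(2) units apart.
features <- cbind(lon = grid$lon / grid_step,
                  lat = grid$lat / grid_step,
                  sst = grid$sst / sst_tol)

# Run only when fpc is installed; method = "raw" works on the raw
# coordinates instead of a precomputed distance matrix.
if (requireNamespace("fpc", quietly = TRUE)) {
  fit <- fpc::dbscan(features, eps = 1.5, MinPts = 5, method = "raw")
  print(table(fit$cluster))  # cluster 0, if present, is noise
}
```

Changing sst_tol is what trades spatial contiguity against thermal similarity: a smaller tolerance makes temperature differences count for more relative to grid distance.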


dimensions of sst2 are: 152243 rows x 3 columns

Thanks again

On 03/06/2011 18:24, Christian Hennig wrote:
Have you considered the dbscan function in library fpc, or was it 
another one?
dbscan in fpc doesn't have a distance parameter but several options, 
one of which may resolve your memory problem (look up the documentation 
of the memory parameter).


Using a distance matrix for hundreds of thousands of points is a 
recipe for disaster (memory-wise). I'm not sure whether the function 
that you used did that, but dbscan in fpc can avoid it.
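
To make the scale concrete, here is a rough back-of-the-envelope estimate for the 152243 rows reported for sst2 in this thread. It assumes 8-byte doubles and ignores R's own object overhead, so it is a lower bound rather than an exact figure.

```r
n <- 152243            # rows in sst2, as reported in the thread
bytes_per_double <- 8  # assumption: distances stored as doubles

# Lower triangle only, which is what dist() stores
pairs <- n * (n - 1) / 2
dist_gib <- pairs * bytes_per_double / 2^30

# Full square n x n matrix
full_gib <- n^2 * bytes_per_double / 2^30

cat(sprintf("dist object: %.1f GiB, full matrix: %.1f GiB\n",
            dist_gib, full_gib))
```

The lower triangle alone comes to roughly 86 GiB, which is why any approach that materialises all pairwise distances fails for a data set of this size.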


It is true that dbscan requires tuning constants that the user has to 
provide. There is unfortunately no general rule for how to choose them; 
it would be necessary to understand the method and the meaning of the 
constants, and how this translates into the requirements of your 
application.


You may try several different choices and do some cluster validation 
to see what works, but I can't explain this in general terms easily 
via email.
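
As one example of "trying several choices", a commonly used heuristic (not a general rule, and not something proposed in this message) is to plot each point's distance to its k-th nearest neighbour in sorted order and look for a bend in the curve. The data, the value k = 4 and the use of a full distance matrix are assumptions for illustration; the full matrix is only workable on a small sample.

```r
set.seed(1)
# Hypothetical 2-D data: two tight groups plus scattered background points
x <- rbind(matrix(rnorm(200, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(200, mean = 3, sd = 0.3), ncol = 2),
           matrix(runif(40, min = -2, max = 5), ncol = 2))

k <- 4  # assumed neighbour count, often tied to the MinPts setting

# Full distance matrix: fine for a few hundred points, not for 150000
d <- as.matrix(dist(x))

# Distance from each point to its k-th nearest neighbour
# (position k + 1 after sorting, because position 1 is the point itself)
kdist <- apply(d, 1, function(row) sort(row)[k + 1])

# Candidate eps values sit near the bend of the sorted curve
plot(sort(kdist), type = "l",
     xlab = "points sorted by k-dist",
     ylab = paste0(k, "-NN distance"))
```

On a large grid the same curve could be drawn from a random subsample of rows, which keeps the distance matrix small.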


Hope this helps at least a bit.

Best regards,
Christian


On Fri, 3 Jun 2011, Paco Pastor wrote:


Hello everyone,

When looking for information about clustering of spatial data in R I 
was directed towards DBSCAN. I've read some documentation about it, and 
new questions have arisen.


DBSCAN requires some parameters, one of which is a distance. As my 
data are three-dimensional (longitude, latitude and temperature), 
which distance should I use? Which dimension is related to that 
distance? I suppose it should be temperature. How do I find such a 
minimum distance with R?


Another parameter is the minimum number of points needed to form a 
cluster. Is there any method to find that number? Unfortunately I 
haven't found one.


Searching through Google I could not find an R example of using 
dbscan on a dataset similar to mine. Do you know any website with 
such examples, so that I can read them and try to adapt them to my case?


The last question is that my first R attempt with DBSCAN (without a 
proper answer to the prior questions) resulted in a memory problem: R 
says it cannot allocate a vector. I start with a 4 km spaced grid with 
779191 points that ends in approximately 30 rows x 3 columns 
(latitude, longitude and temperature) when removing invalid SST 
points. Any hint on how to address this memory problem? Does it depend 
on my computer or on DBSCAN itself?


Thank you for your patience in reading a long and probably boring 
message, and for your help.


--
---
Francisco Pastor
Meteorology department, Instituto Universitario CEAM-UMH
http://www.ceam.es
---
mail: p...@ceam.es
skype: paco.pastor.guzman
Researcher ID: http://www.researcherid.com/rid/B-8331-2008
Cosis profile: http://www.cosis.net/profile/francisco.pastor
---
Parque Tecnologico, C/ Charles R. Darwin, 14
46980 PATERNA (Valencia), Spain
Tlf. 96 131 82 27 - Fax. 96 131 81 90



[R] R and DBSCAN

2011-06-03 Thread Paco Pastor

Hello everyone,

When looking for information about clustering of spatial data in R I was 
directed towards DBSCAN. I've read some documentation about it, and new 
questions have arisen.


DBSCAN requires some parameters, one of which is a distance. As my data 
are three-dimensional (longitude, latitude and temperature), which 
distance should I use? Which dimension is related to that distance? I 
suppose it should be temperature. How do I find such a minimum distance 
with R?


Another parameter is the minimum number of points needed to form a 
cluster. Is there any method to find that number? Unfortunately I 
haven't found one.


Searching through Google I could not find an R example of using dbscan 
on a dataset similar to mine. Do you know any website with such 
examples, so that I can read them and try to adapt them to my case?


The last question is that my first R attempt with DBSCAN (without a 
proper answer to the prior questions) resulted in a memory problem: R 
says it cannot allocate a vector. I start with a 4 km spaced grid with 
779191 points that ends in approximately 30 rows x 3 columns 
(latitude, longitude and temperature) when removing invalid SST 
points. Any hint on how to address this memory problem? Does it depend 
on my computer or on DBSCAN itself?


Thank you for your patience in reading a long and probably boring 
message, and for your help.


This message and the attached files are confidential. They contain reserved 
information belonging to our centre and are not to be broadcast. If you have 
received this email by mistake, please delete it from your system and alert the 
sender by returning it to his/her email address. You must not copy or divulge 
the contents of the message to anyone.

Your email address and personal data are included in a file belonging to the 
Fundación de la Comunidad Valenciana Centro de Estudios Ambientales del 
Mediterráneo - CEAM, con CIF: G-46957213. The purpose of this file is to allow 
us to keep in contact with you. In accordance with Organic Law 15/1999, you are 
permitted to access, rectify, cancel or oppose the contents of this file by 
submitting a written request, accompanied by a photocopy of your DNI, to: 
Fundación de la Comunidad Valenciana Centro de Estudios Ambientales del 
Mediterráneo - CEAM. C/ Charles R. Darwin, 14. Parque Tecnológico.46980 PATERNA 
(Valencia).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Time series analysis for a daily series

2011-03-04 Thread Paco Pastor

Hi everyone

I am trying to do some time series analysis with daily temperature data 
(40 years). I have created a zoo object and a ts object but can't apply 
the stl function. It says the series is not periodic or has less than 
two periods. I've searched through Google and found a lot of messages 
about this problem, but no solution or example showing how to extract 
the trend and seasonal components of a daily series.


Is there any guide or document on how to perform this analysis? I 
suppose there are other choices besides stl; which should I try for a 
daily series?
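
A likely cause of that message, offered as a hedged sketch and not a confirmed diagnosis: stl() needs a ts object whose frequency is greater than 1 and that covers more than two full periods. The example below uses made-up daily temperatures and assumes 365 observations per year (leap days dropped), so that frequency = 365 describes the annual cycle.

```r
set.seed(42)
years <- 10
n <- 365 * years  # assumes 365 observations per year (leap days dropped)
day <- seq_len(n)

# Hypothetical daily temperatures: annual cycle + slow warming + noise
temp <- 15 + 8 * sin(2 * pi * day / 365) + 0.0005 * day + rnorm(n, sd = 1.5)

# frequency = 365 tells R that one period spans 365 observations;
# with the default frequency = 1, stl() rejects the series as non-periodic
x <- ts(temp, start = c(1971, 1), frequency = 365)

fit <- stl(x, s.window = "periodic")
head(fit$time.series)  # columns: seasonal, trend, remainder
```

Calling plot(fit) then shows the data together with the seasonal, trend and remainder panels.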


Thanks in advance

Paco




[R] Install Rmpi

2010-06-14 Thread Paco Pastor

Hi everyone

As I couldn't succeed with a manual installation of Rmpi, I decided to 
start again from the beginning. I removed R and MPICH from my Ubuntu 
Hardy installation. Then, to avoid any dependency problems, I installed 
MPICH and R from Synaptic, not from sources. But now I can't install Rmpi.


An error message appears when trying to install Rmpi; you can find it at 
http://ubuntuone.com/p/71x/


Is it time to upgrade to the latest Ubuntu version and build a new 
system?


Any help would be greatly appreciated.


--
---
Francisco Pastor
Meteorology department
Fundación CEAM
p...@ceam.es
http://www.ceam.es/ceamet - http://www.ceam.es
Parque Tecnologico, C/ Charles R. Darwin, 14
46980 PATERNA (Valencia), Spain
Tlf. 96 131 82 27 - Fax. 96 131 81 90
---
Usuario Linux registrado: 363952



[R] Problem using Rmpi

2010-06-09 Thread Paco Pastor

Hi

Thanks to the help from Uwe Ligges I was able to update the Rmpi package. 
R can now load the package, but it still does not work. The problem now 
appears when trying to run the first Rmpi command in a basic tutorial:


> library(Rmpi)
> mpi.spawn.Rslaves()
Error in mpi.spawn.Rslaves() : You cannot use MPI_Comm_spawn API

Searching the list I have found references to this problem but not the 
solution. More information about the R and MPICH installation on my 
system (Ubuntu Hardy Heron on a dual AMD Athlon PC) can be found at 
http://ubuntuone.com/p/6Wr/


Thanks in advance



[R] Problem installing Rmpi

2010-06-08 Thread Paco Pastor

Hi everyone

I want to install Rmpi to use R in parallel mode on a Linux cluster 
(Ubuntu, Hardy Heron). It seems to be properly installed, but a problem 
appears when loading the Rmpi library.


R version 2.11.1 (2010-05-31)

> library(Rmpi)
Error: package 'Rmpi' was built before R 2.10.0: please re-install it


Should I remove R-2.11 and install R-2.10? I have tried to reinstall 
Rmpi, but it gives the same error message.
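
One way to investigate, sketched here as a possibility and not a confirmed fix: list every installed package copy that was built under an R older than 2.10.0, along with the library it lives in. If an old Rmpi build sits in a library that R searches first, a fresh install into another library would not replace it. The threshold "2.10.0" is taken from the error message; the variable names are illustrative.

```r
# All installed packages across every library in .libPaths()
ip <- installed.packages()

# R version each package was built under (NA if unparsable)
built <- numeric_version(ip[, "Built"], strict = FALSE)

# Package copies built before R 2.10.0, with the library holding them
idx <- which(built < "2.10.0")
stale <- ip[idx, c("Package", "LibPath", "Built"), drop = FALSE]
print(stale)

# Rebuilding under the running R would then be, for example:
# install.packages(stale[, "Package"])  # needs network access
# update.packages(checkBuilt = TRUE)    # also refreshes old builds
```

An empty result means no stale builds were found in the libraries R currently searches.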


Thanks in advance

Paco
