Re: [R] clustering in R

Joris Meys Fri, 28 May 2010 15:33:20 -0700

errr, forget about the output of dput(q), but keep it in mind for next time.


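f <- dist(t(q))                     # "dist" object: distances between the columns of q
hc <- hclust(f, method = "single")  # single linkage: groups are joined through their closest pair

It's as simple as that.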
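A minimal sketch of how the single-linkage tree turns into "friends of friends" groups. It assumes a small hypothetical matrix q whose columns are the items, and the cutoff of 2 from the question. cutree(hc, h = 2) cuts the tree at height 2, so two columns share a group whenever a chain of pairs, each at distance 2 or less, connects them.

```r
# Hypothetical toy data: 1 row, 5 columns; each column is one item
q <- matrix(c(0, 1.5, 3, 10, 11.5), nrow = 1)

f  <- dist(t(q))                    # Euclidean distances between the columns
hc <- hclust(f, method = "single")  # single-linkage tree

# Cut at height 2: items linked by a chain of pairs with distance <= 2 end up together
groups <- cutree(hc, h = 2)
groups  # items 1 and 3 are 3 apart, yet share a group through item 2
```

Here the groups come out as 1 1 1 2 2. The only difference from the f < 2 test in the question is that a pair at exactly 2 is also joined.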
Cheers
Joris

On Fri, May 28, 2010 at 10:39 PM, Ayesha Khan <ayesha.diamond...@gmail.com> wrote:

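> v <- dput(x, "sampledata.txt")   # writes x to the file and returns it
> dim(v)
> q <- v[1:10, 1:10]
> f <- as.matrix(dist(t(q)))       # square matrix of distances between columns
>
> # keep every pair (k, m) with k < m whose distance is below 2
> distB <- NULL
> for (k in 1:(nrow(f) - 1)) for (m in (k + 1):ncol(f)) {
>   if (f[k, m] < 2) distB <- rbind(distB, c(k, m, f[k, m]))
> }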
> # now distB looks like this:
>
> > distB
>       [,1] [,2]      [,3]
>  [1,]    1    2  1.6275568
>  [2,]    1    3  0.5252058
>  [3,]    1    4  0.7323116
>  [4,]    1    5  1.9966001
>  [5,]    1    6  1.6664110
>  [6,]    1    7  1.0800540
>  [7,]    1    8  1.8698925
>  [8,]    1   10  0.5161808
>  [9,]    2    3  1.7325811
> [10,]    2    5  0.8267843
> [11,]    2    6  0.5963280
> [12,]    2    7  0.8787230
>
> # From this output I want to put 1, the friends of 1 and the friends of
> friends of 1 in one cluster. The same goes for 2, 3 and so on.
> But when I do that using hclust, I get the following error. I think I
> need to convert my current matrix into a format that hclust accepts,
> but I don't know how to achieve that.
>  distclust <- hclust(distB,method="single")
>
> Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
>   argument is of length zero
>
> P.S.: Please let me know whether this makes things clearer. I don't see
> how looking at the original data set would help, because the matrix under
> consideration right now is the distance matrix and the question is how it
> can be altered. I have tried as.dist, but it doesn't work because, as I
> mentioned earlier, my matrix is not a square matrix.
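On avoiding the loop: a minimal loop-free sketch of the same pair table, assuming a small hypothetical q. which(..., arr.ind = TRUE) returns the row/column indices of the upper-triangle entries below 2, and the rows are then reordered to match the loop's output. hclust() still needs the distance object itself (dist(t(q)) or as.dist(f)), not this three-column pair list.

```r
# Hypothetical toy data: 1 row, 5 columns (the items)
q <- matrix(c(0, 1.5, 3, 10, 11.5), nrow = 1)
f <- as.matrix(dist(t(q)))   # 5 x 5 symmetric distance matrix

# Upper-triangle entries below 2, as (row, col) index pairs
idx <- which(upper.tri(f) & f < 2, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]  # loop order: by k, then m

distB <- unname(cbind(idx, f[idx]))  # columns: k, m, distance
distB
```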
> On Fri, May 28, 2010 at 2:37 PM, Tal Galili <tal.gal...@gmail.com> wrote:
>
>> Hi Ayesha,
>> I wish to help you, but without a simple, self-contained example that
>> shows your issue I will not be able to.
>> Try using dput() (see ?dput) to create some simple data, and let us see
>> what you are doing.
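A minimal sketch of the dput() workflow suggested here, using a small hypothetical matrix q: dput() prints R code that recreates the object, which can be pasted into a message and evaluated by whoever reads it.

```r
# Hypothetical small object to share
q <- matrix(c(0, 1.5, 3, 10, 11.5), nrow = 1)

dput(q)  # prints R code that recreates q; paste this text into the e-mail

# The reader rebuilds the object from that text
txt <- paste(capture.output(dput(q)), collapse = "\n")
q2  <- eval(parse(text = txt))
identical(q, q2)
```

dput(q, file = "sampledata.txt") writes the same text to a file instead, and dget("sampledata.txt") reads it back.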
>>
>> Best,
>> Tal
>> ---------------- Contact Details: -------------------------------------------------------
>> Contact me: tal.gal...@gmail.com |  972-52-7275845
>> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
>> www.r-statistics.com (English)
>>
>> ----------------------------------------------------------------------------------------------
>>
>>
>>
>>
>>   On Fri, May 28, 2010 at 9:04 PM, Ayesha Khan <ayesha.diamond...@gmail.com> wrote:
>>
>>> Thanks Tal & Joris!
>>> I created my distance matrix distA with the dist() function in R and
>>> converted its output into a matrix:
>>> distA <- as.matrix(dist(t(x2)))  # x2 being my original dataset
>>> which should be acceptable according to the documentation of dist():
>>>
>>> For the default method, a "dist" object, or a matrix (of distances) or
>>> an object which can be coerced to such a matrix using as.matrix()
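The quoted passage appears to describe the argument of as.dist(); hclust() itself expects a "dist" object, and as.matrix() drops that class together with its "Size" attribute. A minimal sketch of converting back, with x2 a small hypothetical stand-in for the original dataset:

```r
# Hypothetical stand-in for the original dataset: columns are the items
x2 <- matrix(c(0, 1.5, 3, 10, 11.5), nrow = 1)

distA <- as.matrix(dist(t(x2)))  # plain 5 x 5 matrix, no "dist" class
attr(distA, "Size")              # NULL, which is what hclust() trips over

d <- as.dist(distA)              # back to a "dist" object (uses the lower triangle)
attr(d, "Size")                  # 5
hc <- hclust(d, method = "single")
```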
>>>
>>>   On Fri, May 28, 2010 at 6:34 AM, Joris Meys <jorism...@gmail.com> wrote:
>>>
>>>> What Tal said.
>>>>
>>>> Besides that, I read that column 1 (and column 2?) are supposed to be
>>>> treated as factors, not as numerical variables. Did you take that into
>>>> account somehow?
>>>>
>>>> It's easy to reproduce the error message:
>>>> > n <- NULL
>>>> > if(n<2)print("This is OK")
>>>> Error in if (n < 2) print("This is OK") : argument is of length zero
>>>>
>>>> In the hclust code you find the following line:
>>>> n <- as.integer(attr(d, "Size"))
>>>> where d is the distance object passed to hclust. The error you get
>>>> means that the "Size" attribute of your distance object is NULL,
>>>> which tells me that distA is not a dist object.
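The chain of events described above can be checked step by step. A minimal sketch with a hypothetical plain matrix m:

```r
m <- matrix(1:4, ncol = 2)        # a plain matrix, not a "dist" object
n <- as.integer(attr(m, "Size"))  # attr() returns NULL, so n is integer(0)
length(n)                         # 0

# if() needs a single TRUE/FALSE; a zero-length condition is an error
msg <- tryCatch(if (n < 2) "too few" else "ok",
                error = function(e) conditionMessage(e))
msg                               # "argument is of length zero"

attr(dist(m), "Size")             # a real "dist" object carries Size = 2
```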
>>>>
>>>> > A <- matrix(1:4,ncol=2)
>>>> > A
>>>>      [,1] [,2]
>>>> [1,]    1    3
>>>> [2,]    2    4
>>>> > hclust(A,method="single")
>>>>
>>>> Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
>>>>   argument is of length zero
>>>>
>>>> Did you actually pass in a distance object? See also ?dist or ?as.dist.
>>>>
>>>> Cheers
>>>> Joris
>>>>
>>>>
>>>>
>>>>
>>>>  On Fri, May 28, 2010 at 1:41 AM, Ayesha Khan <ayesha.diamond...@gmail.com> wrote:
>>>>
>>>>> I have a matrix with 136 rows and 3 columns, and it looks something
>>>>> like this:
>>>>>
>>>>>         [,1] [,2]     [,3]
>>>>>  [1,]  402  675 1.802758
>>>>>  [2,]  402  696 1.938902
>>>>>  [3,]  402  699 1.994253
>>>>>  [4,]  402  945 1.898619
>>>>>  [5,]  424  470 1.812857
>>>>>  [6,]  424  905 1.816345
>>>>>  [7,]  470  905 1.871252
>>>>>  [8,]  504  780 1.958191
>>>>>  [9,]  504  848 1.997111
>>>>> ...
>>>>> So you get the idea. I want to group similar items into one
>>>>> group/cluster following the "friends of friends" approach. I tried
>>>>>
>>>>> distclust <- hclust(distA, method = "single")
>>>>>
>>>>> However, I got the following error:
>>>>>
>>>>> Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
>>>>>   argument is of length zero
>>>>>
>>>>> which probably means there's something wrong with my input. Is there
>>>>> another way of doing this kind of clustering without resorting to
>>>>> loops, ifelse, etc.? Basically, if 402 is close to 675, 696 and 699,
>>>>> and they thus fall in cluster A, then all items close to 675, 696 and
>>>>> 699 should also fall into cluster A, following a friends-of-friends
>>>>> strategy. Any help would be highly appreciated.
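A minimal sketch for a pair list like the one above, using its first nine rows. Assumptions: the first two columns are item labels (not numbers to compute with), and any pair missing from the list is farther apart than the cutoff of 2, so it is given the finite value cutoff + 1. Single linkage on the resulting square matrix, cut at the cutoff, then yields the friends-of-friends groups.

```r
# First nine rows of the pair list from the question: item, item, distance
pairs <- rbind(c(402, 675, 1.802758), c(402, 696, 1.938902),
               c(402, 699, 1.994253), c(402, 945, 1.898619),
               c(424, 470, 1.812857), c(424, 905, 1.816345),
               c(470, 905, 1.871252), c(504, 780, 1.958191),
               c(504, 848, 1.997111))
cutoff <- 2

# Treat the IDs as labels: map them to positions 1..n
ids <- sort(unique(c(pairs[, 1], pairs[, 2])))
i <- match(pairs[, 1], ids)
j <- match(pairs[, 2], ids)

# Square matrix: listed pairs get their distance; unlisted pairs get a value
# above the cutoff (assumption: an unlisted pair means "not friends")
D <- matrix(cutoff + 1, length(ids), length(ids), dimnames = list(ids, ids))
diag(D) <- 0
D[cbind(i, j)] <- pairs[, 3]
D[cbind(j, i)] <- pairs[, 3]

hc <- hclust(as.dist(D), method = "single")
groups <- cutree(hc, h = cutoff)
split(names(groups), groups)
```

For these rows this gives three groups: 402 with 675, 696, 699 and 945; 424 with 470 and 905; and 504 with 780 and 848.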
>>>>>
>>>>> --
>>>>> Ayesha Khan
>>>>>
>>>>> MS Bioengineering
>>>>> Dept. of Bioengineering
>>>>> Rice University, TX
>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Joris Meys
>>>> Statistical Consultant
>>>>
>>>> Ghent University
>>>> Faculty of Bioscience Engineering
>>>> Department of Applied mathematics, biometrics and process control
>>>>
>>>> Coupure Links 653
>>>> B-9000 Gent
>>>>
>>>> tel : +32 9 264 59 87
>>>> joris.m...@ugent.be
>>>> -------------------------------
>>>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>



