[jira] [Commented] (MATH-1367) DBSCAN Implementation does not count the seed point itself as part of its neighbors count

2022-01-08 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471246#comment-17471246
 ] 

Gilles Sadowski commented on MATH-1367:
---

Will you provide a patch?

> DBSCAN Implementation does not count the seed point itself as part of its 
> neighbors count
> -
>
> Key: MATH-1367
> URL: https://issues.apache.org/jira/browse/MATH-1367
> Project: Commons Math
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Amol Singh
>Priority: Major
> Fix For: 4.0
>
>
> The DSCAN paper describes the eps-neighborhood of a point as 
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point 
> p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps} 
> in other words for all q points that are a member of database D whose 
> distance from p is less that Eps should be classified as a neighbor. This 
> should include the point itself. 
> The implementation however has a reference check to the point itself and does 
> not add it to its neighbors list.
> private List getNeighbors(final T point, final Collection points) {
> final List neighbors = new ArrayList();
> for (final T neighbor : points) {
> if (point != neighbor && distance(neighbor, point) <= eps) {
> neighbors.add(neighbor);
> }
> }
> return neighbors;
> } 
> "point != neighbor "  check should be removed here. Keeping this check 
> effectively is raising the minPts count by 1. Other third party QuadTree 
> backed DBSCAN implementations consider the center point in its neighbor count 
> E.g. bmw-carit library. 
> If this is infact by design, the check should use value equality instead of 
> reference equality. T extends Clusterable , the client should be able to 
> define this behavior. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (MATH-1367) DBSCAN Implementation does not count the seed point itself as part of its neighbors count

2018-08-15 Thread wang qiang (JIRA)


[ 
https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581926#comment-16581926
 ] 

wang qiang commented on MATH-1367:
--

I agree with you 

> DBSCAN Implementation does not count the seed point itself as part of its 
> neighbors count
> -
>
> Key: MATH-1367
> URL: https://issues.apache.org/jira/browse/MATH-1367
> Project: Commons Math
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Amol Singh
>Priority: Major
> Fix For: 4.0
>
>
> The DSCAN paper describes the eps-neighborhood of a point as 
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point 
> p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps} 
> in other words for all q points that are a member of database D whose 
> distance from p is less that Eps should be classified as a neighbor. This 
> should include the point itself. 
> The implementation however has a reference check to the point itself and does 
> not add it to its neighbors list.
> private List getNeighbors(final T point, final Collection points) {
> final List neighbors = new ArrayList();
> for (final T neighbor : points) {
> if (point != neighbor && distance(neighbor, point) <= eps) {
> neighbors.add(neighbor);
> }
> }
> return neighbors;
> } 
> "point != neighbor "  check should be removed here. Keeping this check 
> effectively is raising the minPts count by 1. Other third party QuadTree 
> backed DBSCAN implementations consider the center point in its neighbor count 
> E.g. bmw-carit library. 
> If this is infact by design, the check should use value equality instead of 
> reference equality. T extends Clusterable , the client should be able to 
> define this behavior. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MATH-1367) DBSCAN Implementation does not count the seed point itself as part of its neighbors count

2018-01-12 Thread jiaopaner (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323717#comment-16323717
 ] 

jiaopaner commented on MATH-1367:
-

Hello,Amol
I quite agree with your opinion .I also found this problem when I use the 
DBSCAN algorithm.The version I used is apache-commons -math-3.6.1.
So the problem still exists,and I also found other problems in addition to the 
above problem.The time complexity of the algorithm is always n^2 when I 
test.Besides that,there are many no necessary operations in the algorithm 
implementation.BTW,why only DoublePoint, no StringPoint, so I can't use 
Levenshtein distance to search the adjacency points.I'll create a detailed bug 
report .

> DBSCAN Implementation does not count the seed point itself as part of its 
> neighbors count
> -
>
> Key: MATH-1367
> URL: https://issues.apache.org/jira/browse/MATH-1367
> Project: Commons Math
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Amol Singh
> Fix For: 4.0
>
>
> The DSCAN paper describes the eps-neighborhood of a point as 
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point 
> p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps} 
> in other words for all q points that are a member of database D whose 
> distance from p is less that Eps should be classified as a neighbor. This 
> should include the point itself. 
> The implementation however has a reference check to the point itself and does 
> not add it to its neighbors list.
> private List getNeighbors(final T point, final Collection points) {
> final List neighbors = new ArrayList();
> for (final T neighbor : points) {
> if (point != neighbor && distance(neighbor, point) <= eps) {
> neighbors.add(neighbor);
> }
> }
> return neighbors;
> } 
> "point != neighbor "  check should be removed here. Keeping this check 
> effectively is raising the minPts count by 1. Other third party QuadTree 
> backed DBSCAN implementations consider the center point in its neighbor count 
> E.g. bmw-carit library. 
> If this is infact by design, the check should use value equality instead of 
> reference equality. T extends Clusterable , the client should be able to 
> define this behavior. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MATH-1367) DBSCAN Implementation does not count the seed point itself as part of its neighbors count

2016-05-24 Thread Amol Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298255#comment-15298255
 ] 

Amol Singh commented on MATH-1367:
--

Okay, I'll do that. 

https://en.wikipedia.org/wiki/DBSCAN#cite_note-dbscan-1 Not the most reliable 
source but if you look at the pseudocode, thats how others have interpreted 
this algorithm as well. 
{quote}
regionQuery(P, eps)
   return all points within P's eps-neighborhood (including P)
{quote}

I'll submit a patch. 

> DBSCAN Implementation does not count the seed point itself as part of its 
> neighbors count
> -
>
> Key: MATH-1367
> URL: https://issues.apache.org/jira/browse/MATH-1367
> Project: Commons Math
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Amol Singh
> Fix For: 4.0
>
>
> The DSCAN paper describes the eps-neighborhood of a point as 
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point 
> p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps} 
> in other words for all q points that are a member of database D whose 
> distance from p is less that Eps should be classified as a neighbor. This 
> should include the point itself. 
> The implementation however has a reference check to the point itself and does 
> not add it to its neighbors list.
> private List getNeighbors(final T point, final Collection points) {
> final List neighbors = new ArrayList();
> for (final T neighbor : points) {
> if (point != neighbor && distance(neighbor, point) <= eps) {
> neighbors.add(neighbor);
> }
> }
> return neighbors;
> } 
> "point != neighbor "  check should be removed here. Keeping this check 
> effectively is raising the minPts count by 1. Other third party QuadTree 
> backed DBSCAN implementations consider the center point in its neighbor count 
> E.g. bmw-carit library. 
> If this is infact by design, the check should use value equality instead of 
> reference equality. T extends Clusterable , the client should be able to 
> define this behavior. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MATH-1367) DBSCAN Implementation does not count the seed point itself as part of its neighbors count

2016-05-24 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298132#comment-15298132
 ] 

Gilles commented on MATH-1367:
--

bq. Let me know if you agree this is a bug.

I don't know. :(

If you are positive that the algorithm was not correctly implemented, please do 
submit a patch with code comments that explain why the change was necessary.
It would also be nice to set up a unit test showing a practical case that this 
implementation agrees with the result computed by another implementation.

Thanks.

> DBSCAN Implementation does not count the seed point itself as part of its 
> neighbors count
> -
>
> Key: MATH-1367
> URL: https://issues.apache.org/jira/browse/MATH-1367
> Project: Commons Math
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Amol Singh
> Fix For: 4.0
>
>
> The DSCAN paper describes the eps-neighborhood of a point as 
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point 
> p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps} 
> in other words for all q points that are a member of database D whose 
> distance from p is less that Eps should be classified as a neighbor. This 
> should include the point itself. 
> The implementation however has a reference check to the point itself and does 
> not add it to its neighbors list.
> private List getNeighbors(final T point, final Collection points) {
> final List neighbors = new ArrayList();
> for (final T neighbor : points) {
> if (point != neighbor && distance(neighbor, point) <= eps) {
> neighbors.add(neighbor);
> }
> }
> return neighbors;
> } 
> "point != neighbor "  check should be removed here. Keeping this check 
> effectively is raising the minPts count by 1. Other third party QuadTree 
> backed DBSCAN implementations consider the center point in its neighbor count 
> E.g. bmw-carit library. 
> If this is infact by design, the check should use value equality instead of 
> reference equality. T extends Clusterable , the client should be able to 
> define this behavior. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MATH-1367) DBSCAN Implementation does not count the seed point itself as part of its neighbors count

2016-05-21 Thread Amol Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295112#comment-15295112
 ] 

Amol Singh commented on MATH-1367:
--

Hi Giles, 

Thanks for the quick response. 

Here's a simple breaking test, you should be able to add this to 
DBSCANClustererTest.java and run it. 

 @Test public void testBreakingCase() {
final DoublePoint[] points = { new DoublePoint(new double[] { 
21.345289965479466, 32.149537670215295 }),
new DoublePoint(new double[] { 42.02226837841131, 
19.69611946377732 }),
new DoublePoint(new double[] { 41.930019221367985, 
21.954397109962457 }),
new DoublePoint(new double[] { 42.87345060400816, 
18.796775009724836 }) };

final DBSCANClusterer clusterer = new 
DBSCANClusterer(3, 3);
List clusters = 
clusterer.cluster(Arrays.asList(points));
Assert.assertEquals(1, clusters.size());
final List clusterOne = Arrays.asList(points[1], 
points[2], points[3]);
Assert.assertTrue(clusters.get(0).getPoints().containsAll(clusterOne));
}

Now, if you set minPts=2 or change the code to remove the reference check of 
the point to itself in getNeighbors() this will pass. 

Let me know if you agree this is a bug. I'm happy to submit a patch. This 
change does change the behavior for existing users, the algorithm will produce 
more clusters / shape of clusters may change. Nonetheless I feel this would be 
the correct interpretation of the DBSCAN paper, and it seems consistent with 
other third party libraries I've evaluated.

> DBSCAN Implementation does not count the seed point itself as part of its 
> neighbors count
> -
>
> Key: MATH-1367
> URL: https://issues.apache.org/jira/browse/MATH-1367
> Project: Commons Math
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Amol Singh
> Fix For: 4.0
>
>
> The DSCAN paper describes the eps-neighborhood of a point as 
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point 
> p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps} 
> in other words for all q points that are a member of database D whose 
> distance from p is less that Eps should be classified as a neighbor. This 
> should include the point itself. 
> The implementation however has a reference check to the point itself and does 
> not add it to its neighbors list.
> private List getNeighbors(final T point, final Collection points) {
> final List neighbors = new ArrayList();
> for (final T neighbor : points) {
> if (point != neighbor && distance(neighbor, point) <= eps) {
> neighbors.add(neighbor);
> }
> }
> return neighbors;
> } 
> "point != neighbor "  check should be removed here. Keeping this check 
> effectively is raising the minPts count by 1. Other third party QuadTree 
> backed DBSCAN implementations consider the center point in its neighbor count 
> E.g. bmw-carit library. 
> If this is infact by design, the check should use value equality instead of 
> reference equality. T extends Clusterable , the client should be able to 
> define this behavior. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MATH-1367) DBSCAN Implementation does not count the seed point itself as part of its neighbors count

2016-05-21 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294842#comment-15294842
 ] 

Gilles commented on MATH-1367:
--

Hi Amol.

Thanks for your report.

>From reading the problem description it is not obvious to figure out whether 
>your proposed fix won't have any unwanted side-effects (e.g. it could be that 
>the "getNeighbors" was not meant to exactly match the definition in the 
>reference you cite; maybe you are totally right, I'm just guessing since I 
>never looked at that code...).

Could you provide a unit test showing that it is indeed a bug (i.e. a case 
where the best solution cannot be recovered unless the fix is applied)?


> DBSCAN Implementation does not count the seed point itself as part of its 
> neighbors count
> -
>
> Key: MATH-1367
> URL: https://issues.apache.org/jira/browse/MATH-1367
> Project: Commons Math
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Amol Singh
> Fix For: 4.0
>
>
> The DSCAN paper describes the eps-neighborhood of a point as 
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point 
> p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps} 
> in other words for all q points that are a member of database D whose 
> distance from p is less that Eps should be classified as a neighbor. This 
> should include the point itself. 
> The implementation however has a reference check to the point itself and does 
> not add it to its neighbors list.
> private List getNeighbors(final T point, final Collection points) {
> final List neighbors = new ArrayList();
> for (final T neighbor : points) {
> if (point != neighbor && distance(neighbor, point) <= eps) {
> neighbors.add(neighbor);
> }
> }
> return neighbors;
> } 
> "point != neighbor "  check should be removed here. Keeping this check 
> effectively is raising the minPts count by 1. Other third party QuadTree 
> backed DBSCAN implementations consider the center point in its neighbor count 
> E.g. bmw-carit library. 
> If this is infact by design, the check should use value equality instead of 
> reference equality. T extends Clusterable , the client should be able to 
> define this behavior. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)