[ https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298255#comment-15298255 ]

Amol Singh commented on MATH-1367:
----------------------------------

Okay, I'll do that.

https://en.wikipedia.org/wiki/DBSCAN#cite_note-dbscan-1

Not the most reliable source, but if you look at the pseudocode, that's how
others have interpreted this algorithm as well.

{quote}
regionQuery(P, eps)
   return all points within P's eps-neighborhood (including P)
{quote}

I'll submit a patch.

> DBSCAN Implementation does not count the seed point itself as part of its neighbors count
> -----------------------------------------------------------------------------------------
>
>                 Key: MATH-1367
>                 URL: https://issues.apache.org/jira/browse/MATH-1367
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.6.1
>            Reporter: Amol Singh
>             Fix For: 4.0
>
>
> The DBSCAN paper describes the eps-neighborhood of a point as follows:
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point
> p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q) < Eps}
> In other words, every point q in database D whose distance from p is less
> than Eps should be classified as a neighbor. This should include the point
> itself, since its distance to itself is zero.
> The implementation, however, has a reference check against the point itself
> and does not add it to its neighbors list.
>
> private List<T> getNeighbors(final T point, final Collection<T> points) {
>     final List<T> neighbors = new ArrayList<T>();
>     for (final T neighbor : points) {
>         if (point != neighbor && distance(neighbor, point) <= eps) {
>             neighbors.add(neighbor);
>         }
>     }
>     return neighbors;
> }
>
> The "point != neighbor" check should be removed here. Keeping this check
> effectively raises the minPts count by 1. Other third-party QuadTree-backed
> DBSCAN implementations count the center point among its neighbors, e.g. the
> bmw-carit library.
> If this is in fact by design, the check should use value equality instead of
> reference equality.
> Since T extends Clusterable<T>, the client should be able to define this
> behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
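
For readers following the thread, here is a minimal sketch of the difference the issue describes. It is not the Commons Math source: the class name NeighborSketch, both method names, and the use of plain double[] points with Euclidean distance are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the Commons Math DBSCANClusterer. */
final class NeighborSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Behaviour as reported: the seed is skipped via a reference check. */
    static List<double[]> neighborsExcludingSeed(double[] point, List<double[]> points, double eps) {
        List<double[]> neighbors = new ArrayList<>();
        for (double[] candidate : points) {
            if (point != candidate && distance(candidate, point) <= eps) {
                neighbors.add(candidate);
            }
        }
        return neighbors;
    }

    /** Behaviour proposed in the issue: every point within eps counts, seed included. */
    static List<double[]> neighborsIncludingSeed(double[] point, List<double[]> points, double eps) {
        List<double[]> neighbors = new ArrayList<>();
        for (double[] candidate : points) {
            if (distance(candidate, point) <= eps) {
                neighbors.add(candidate);
            }
        }
        return neighbors;
    }
}
```

With a seed p, one other point within eps, and one far away, the first method returns one neighbor and the second returns two. Assuming a core-point test of the form neighbors.size() >= minPts, p would qualify as a core point for minPts = 2 only under the second method, which is the off-by-one the reporter describes.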