[GitHub] [madlib] fmcquillan99 commented on pull request #496: DBSCAN: Add new module DBSCAN

GitBox Tue, 05 May 2020 11:29:33 -0700


fmcquillan99 commented on pull request #496:
URL: https://github.com/apache/madlib/pull/496#issuecomment-624228952



   (5)
   To be consistent with knn, can we please name the base algorithm
   `brute_force`? It seems `brute` is accepted but `brute_force` is not:
   ```
   SELECT madlib.dbscan(
       'dbscan_train_data',    -- source table
       'dbscan_result',        -- output table
       'pid',                  -- point id column
       'pointsxx',             -- data point
       1.75,                   -- epsilon
       4,                      -- min samples
       'dist_norm2',           -- metric
       'brute_force');         -- algorithm

   ERROR:  plpy.Error: dbscan Error: algorithm has to be one of the following: brute, kd-tree (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "dbscan", line 21, in <module>
       return dbscan.dbscan(**globals())
     PL/Python function "dbscan", line 38, in dbscan
     PL/Python function "dbscan", line 184, in _validate_dbscan
     PL/Python function "dbscan", line 123, in _assert
   PL/Python function "dbscan"
   ```
   
   
   (6)
   It seems a point column named `points` does not work. Maybe it conflicts with
   an internal column name? Since `points` is likely a common column name, we
   should fix this.
   ```
   SELECT madlib.dbscan(
       'dbscan_train_data',    -- source table
       'dbscan_result',        -- output table
       'pid',                  -- point id column
       'points',               -- data point
       1.75,                   -- epsilon
       4,                      -- min samples
       'dist_norm2',           -- metric
       'brute');               -- algorithm
   
   ERROR:  plpy.SPIError: column reference "points" is ambiguous
   LINE 3:             SET points = points
                                    ^
   QUERY:  
               UPDATE dbscan_result AS t1
               SET points = points
               FROM dbscan_train_data AS t2
               WHERE t1.pid = t2.pid
           
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "dbscan", line 21, in <module>
       return dbscan.dbscan(**globals())
     PL/Python function "dbscan", line 109, in dbscan
   PL/Python function "dbscan"
   ```
   
   
   (7)
   Please remove the verbose output unless you need it for debugging and such:
   ```
   ALTER        ANALYZE      BEGIN        CHECKPOINT   CLOSE        CLUSTER      COMMENT      COMMIT       COPY
   CREATE       DEALLOCATE   DECLARE      DELETE FROM  DISCARD      DO           DROP         END          EXECUTE
   EXPLAIN      FETCH        GRANT        INSERT       LISTEN       LOAD         LOCK         MOVE         NOTIFY
   PREPARE      REASSIGN     REINDEX      RELEASE      RESET        REVOKE       ROLLBACK     SAVEPOINT    SELECT
   SET          SHOW         START        TABLE        TRUNCATE     UNLISTEN     UPDATE       VACUUM       VALUES
   WITH         ABORT
   etc.
   ```

