RE: Continuing on clustering queries

Regina Obe Thu, 28 Mar 2024 09:43:55 -0700

> Thank you again; results are starting to come in.
> 
> Apologies for starting a new thread; but the other one was getting messy.
> 
> Here are the queries that I am testing:
> SELECT city, state, ST_ClusterDBScan( ST_Transform(geom,2163), eps :=
> 0.3048*5000, minpoints :=1) OVER () AS cluster_id INTO TEMP aa_zz FROM
> shapefiles.atp_filling_stations aa WHERE aa.amenity = 'fuel'
> AND aa.ctry_code = 'US' AND aa.geom IS NOT NULL;
> 
> SELECT cluster_id, COUNT(*) FROM aa_zz GROUP BY 1 ORDER BY 2 DESC
> LIMIT 10;
> 
> SELECT state,  cluster_id, COUNT(*) FROM aa_zz GROUP BY 2, 1 ORDER BY 3
> DESC LIMIT 20;
> 
> SELECT city, state,  cluster_id, COUNT(*) FROM aa_zz GROUP BY 3, 2, 1
> ORDER BY 4 DESC LIMIT 20;
> 
> 
> I receive the following results:
> SELECT 76023
>   cluster_id | count
> ------------+-------
>         8467 |   422
>         3888 |   371
>         5701 |   249
>         3351 |   177
>         1111 |   141
>         8352 |   120
>         1610 |   107
>         1812 |    90
>         1824 |    86
>        14232 |    81
> (10 rows)
> 
>   state | cluster_id | count
> -------+------------+-------
>   MI    |       8467 |   422
>   NY    |       3888 |   371
>   IL    |       5701 |   245
>   NY    |       3351 |   177
>   CO    |       8352 |   110
>   TX    |       1610 |   101
>   NY    |       1824 |    86
>   DC    |       1111 |    82
>   IN    |      14748 |    78
>   TX    |      14297 |    75
>   MO    |       3661 |    74
>   CA    |      11272 |    74
>   MI    |        462 |    71
>   TX    |      14232 |    70
>   TX    |      19326 |    69
>   TX    |       1812 |    68
>   FL    |      13693 |    68
>   TX    |      16824 |    68
>   CA    |      14006 |    66
>   IL    |       2727 |    64
> (20 rows)
> 
>       city     | state | cluster_id | count
> --------------+-------+------------+-------
>   DETROIT      | MI    |       8467 |   147
>   CHICAGO      | IL    |       5701 |   104
>   Austin       | TX    |       1610 |   101
>   Detroit      | MI    |       8467 |    85
>   BROOKLYN     | NY    |       3888 |    77
>   Chicago      | IL    |       5701 |    75
>   Fresno       | CA    |      11272 |    73
>   Brooklyn     | NY    |       3888 |    71
>   WASHINGTON   | DC    |       1111 |    66
>   HOUSTON      | TX    |      14297 |    64
>   St. Louis    | MO    |       3661 |    63
>   HOUSTON      | TX    |      14232 |    63
>   DENVER       | CO    |       8352 |    54
>   BRONX        | NY    |       3351 |    53
>   LOS ANGELES  | CA    |      14006 |    50
>   Bakersfield  | CA    |       1777 |    49
>   INDIANAPOLIS | IN    |      14748 |    44
>   Bronx        | NY    |       3351 |    42
>   Amarillo     | TX    |        622 |    41
>   Austin       | TX    |       1601 |    38
> (20 rows)
> 
> 
> Obviously, the case-based aspect of the city name can be cancelled with a
> function such as UPPER(city).
> 
> I also am generous in the radius dimension setting it at 5k feet.
> 
> But my next question is how do position the cluster_id on a map w/o
> referencing the city, and then tying that out to a geometry that I have
> stored for the city?
> 
> Thank you again,
> 
> Max


Depends how you want to show the cluster. If you want to show like a bubble
to represent the cluster, you can probably using ST_MinimumBoundingCircle
https://postgis.net/docs/en/ST_MinimumBoundingCircle.html  
First by keeping the point geometry

CREATE TEMP TABLE tmp_aa_zz AS 
SELECT city, state, ST_ClusterDBScan( ST_Transform(geom,2163), eps :=
 0.3048*5000, minpoints :=1) OVER () AS cluster_id, aa.geom 
FROM
 shapefiles.atp_filling_stations aa WHERE aa.amenity = 'fuel'
AND aa.ctry_code = 'US' AND aa.geom IS NOT NULL;


SELECT cluster_id , ST_MinimumBoundingCircle(ST_Collect(geom)) AS geom,
COUNT(*)
FROM tmp_aa_zz
GROUP BY cluster_id;

Or you might want to show the individual points in which case you can
dispense with the ST_MinimumBoundingCircle call or have it in addition to
like: 

SELECT cluster_id , ST_Collect(ST_MinimumBoundingCircle(ST_Collect(geom)),
ST_Collect(geom)) AS geom, COUNT(*)
FROM tmp_aa_zz
GROUP BY cluster_id;

Other alternative options you can try besides the minimum bounding circle
might be ST_ConvexHull  https://postgis.net/docs/en/ST_ConvexHull.html
, ST_ConcaveHull https://postgis.net/docs/en/ST_ConcaveHull.html, or maybe
even ST_GeometricMedian https://postgis.net/docs/en/ST_GeometricMedian.html
(if you want to show only one point to represent all the points or
ST_Centroid ( I think ST_GeometricMedian would be a more accurate
representation though, think of centroid as the average so would be highly
affected by outlier points where as the median would be less so)

RE: Continuing on clustering queries

Reply via email to