[jira] [Commented] (SEDONA-739) ST_DBScan fails when selecting only subset of columns.

James Willis (Jira) Fri, 20 Jun 2025 17:52:15 -0700


    [ 
https://issues.apache.org/jira/browse/SEDONA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17985050#comment-17985050
 ]


James Willis commented on SEDONA-739:
-------------------------------------

I need a more specific reproducible example than the one you've given. 
Recreating this case will likely depend on some interaction with the 
optimizer.

The example below does not reproduce the error:
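{code:python}
# Import path may differ by Sedona version.
from sedona.sql.st_constructors import ST_Point

# `sedona` is assumed to be an active Sedona-enabled Spark session.
sedona.createDataFrame([
    {"index": 1, "x": 2.2, "y": 3.3}
]).select(
    "index", ST_Point("x", "y").alias("geom")
).cache().createOrReplaceTempView("points")

dbscan_df = sedona.sql(
    """
    SELECT
        index,
        geom AS geom,
        ST_DBSCAN(geom, 0.5, 10, False) AS scan
    FROM points
    """
)

# unhappy case: select only a subset of the columns
dbscan_df.select("scan.*", "index").show()
{code}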
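
To narrow down the optimizer interaction, it may help to compare the plans Spark produces for the working and the failing projection. The following is only a sketch: explain_projection is a hypothetical helper name (not part of Sedona or PySpark), and it assumes dbscan_df is the DataFrame defined above. It relies only on the standard PySpark DataFrame.select and DataFrame.explain methods.

```python
def explain_projection(df, *columns):
    """Project `df` onto `columns` and print Spark's query plans for it.

    Hypothetical helper for illustration; `df` is expected to be a
    PySpark DataFrame (anything with `select` and `explain` works).
    """
    projected = df.select(*columns)
    # extended=True prints the logical plans as well as the physical plan.
    projected.explain(extended=True)
    return projected


# Assumed usage against the DataFrame from the example above:
# explain_projection(dbscan_df, "scan.*", "index", "geom")  # reported as working
# explain_projection(dbscan_df, "scan.*", "index")          # reported as failing
```

If the second call itself raises the exception, that would suggest the error occurs while planning the query rather than while executing it. Attaching both outputs to the issue would make the difference between the two cases easier to see.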

> ST_DBScan fails when selecting only subset of columns.
> ------------------------------------------------------
>
>                 Key: SEDONA-739
>                 URL: https://issues.apache.org/jira/browse/SEDONA-739
>             Project: Apache Sedona
>          Issue Type: Bug
>            Reporter: Paweł Kociński
>            Priority: Major
>
>  
> {code:java}
> dbscan_df = sedona.sql(
>     """
>     SELECT
>         index,
>         geom AS geom,
>         ST_DBSCAN(geom, 0.5, 10, False) AS scan
>     FROM points
>     """
> ) {code}
> Selecting all columns works fine:
>  
>  
> {code:java}
> result = dbscan_df.select("scan.*", "index", "geom") {code}
> but selecting only a subset
>  
> {code:java}
> dbscan_df.select("scan.*", "index").show() {code}
> raises the following exception:
> {code:java}
> IllegalArgumentException                  Traceback (most recent call last)
> Cell In[56], line 1
> ----> 1 dbscan_df.select("scan.*", "index").show()
> File /opt/spark/python/pyspark/sql/dataframe.py:947, in DataFrame.show(self, n, truncate, vertical)
>     887 def show(self, n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) -> None:
>     888     """Prints the first ``n`` rows to the console.
>     889 
>     890     .. versionadded:: 1.3.0
>    (...)
>     945     name | Bob
>     946     """
> --> 947     print(self._show_string(n, truncate, vertical))
> File /opt/spark/python/pyspark/sql/dataframe.py:965, in DataFrame._show_string(self, n, truncate, vertical)
>     959     raise PySparkTypeError(
>     960         error_class="NOT_BOOL",
>     961         message_parameters={"arg_name": "vertical", "arg_type": type(vertical).__name__},
>     962     )
>     964 if isinstance(truncate, bool) and truncate:
> --> 965     return self._jdf.showString(n, 20, vertical)
>     966 else:
>     967     try:
> File /usr/local/lib/python3.10/dist-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
>    1316 command = proto.CALL_COMMAND_NAME +\
>    1317     self.command_header +\
>    1318     args_command +\
>    1319     proto.END_COMMAND_PART
>    1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>    1323     answer, self.gateway_client, self.target_id, self.name)
>    1325 for temp_arg in temp_args:
>    1326     if hasattr(temp_arg, "_detach"):
> File /opt/spark/python/pyspark/errors/exceptions/captured.py:185, in capture_sql_exception.<locals>.deco(*a, **kw)
>     181 converted = convert_exception(e.java_exception)
>     182 if not isinstance(converted, UnknownException):
>     183     # Hide where the exception came from that shows a non-Pythonic
>     184     # JVM exception message.
> --> 185     raise converted from None
>     186 else:
>     187     raise
> IllegalArgumentException: geometry argument must be a named reference to an existing column {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
