[jira] [Commented] (SEDONA-739) ST_DBScan fails when selecting only subset of columns.

James Willis (Jira) Fri, 20 Jun 2025 17:45:05 -0700


    [ 
https://issues.apache.org/jira/browse/SEDONA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17985051#comment-17985051
 ]


James Willis commented on SEDONA-739:
-------------------------------------

Absent a full minimal reproducible example (MRE), the full query plan
(especially the optimized logical plan) would be immensely helpful.
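
For anyone following along, here is a minimal sketch of one way to capture that plan from PySpark. It assumes only that the DataFrame exposes the standard `explain(mode="extended")` method, which prints the parsed, analyzed, optimized and physical plans to stdout. The helper names `capture_plan` and `optimized_section` are hypothetical and not part of Spark or Sedona.

```python
import contextlib
import io


def capture_plan(df) -> str:
    """Return the text that df.explain(mode="extended") prints."""
    buf = io.StringIO()
    # DataFrame.explain prints the plan and returns None, so stdout
    # is redirected in order to keep the plan as a string.
    with contextlib.redirect_stdout(buf):
        df.explain(mode="extended")
    return buf.getvalue()


def optimized_section(plan_text: str) -> str:
    """Extract the optimized logical plan from extended explain output.

    Assumes the section header is the line '== Optimized Logical Plan =='
    and that the next line starting with '== ' begins the following section.
    Returns an empty string when the header is not found.
    """
    lines = plan_text.splitlines()
    header = "== Optimized Logical Plan =="
    if header not in lines:
        return ""
    start = lines.index(header) + 1
    section = []
    for line in lines[start:]:
        if line.startswith("== "):
            break
        section.append(line)
    return "\n".join(section).strip()
```

With the DataFrame from the report, this could be called as `print(optimized_section(capture_plan(dbscan_df.select("scan.*", "index"))))`, and the output pasted into the issue. If planning itself raises, the exception may propagate instead of being printed.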

> ST_DBScan fails when selecting only subset of columns.
> ------------------------------------------------------
>
>                 Key: SEDONA-739
>                 URL: https://issues.apache.org/jira/browse/SEDONA-739
>             Project: Apache Sedona
>          Issue Type: Bug
>            Reporter: Paweł Kociński
>            Priority: Major
>
>  
> {code:java}
> dbscan_df = sedona.sql(
>     """
>     SELECT
>         index,
>         geom AS geom,
>         ST_DBSCAN(geom, 0.5, 10, False) AS scan
>     FROM points
>     """
> ) {code}
> Selecting all columns works just fine
>  
>  
> {code:java}
> result = dbscan_df.select("scan.*", "index", "geom") {code}
> but selecting only a subset of the columns
>  
> {code:java}
> dbscan_df.select("scan.*", "index").show() {code}
> raises the following exception:
> {code:java}
> IllegalArgumentException                  Traceback (most recent call last)
> Cell In[56], line 1
> ----> 1 dbscan_df.select("scan.*", "index").show()
> File /opt/spark/python/pyspark/sql/dataframe.py:947, in DataFrame.show(self, 
> n, truncate, vertical)
>     887 def show(self, n: int = 20, truncate: Union[bool, int] = True, 
> vertical: bool = False) -> None:
>     888     """Prints the first ``n`` rows to the console.
>     889 
>     890     .. versionadded:: 1.3.0
>    (...)
>     945     name | Bob
>     946     """
> --> 947     print(self._show_string(n, truncate, vertical))
> File /opt/spark/python/pyspark/sql/dataframe.py:965, in 
> DataFrame._show_string(self, n, truncate, vertical)
>     959     raise PySparkTypeError(
>     960         error_class="NOT_BOOL",
>     961         message_parameters={"arg_name": "vertical", "arg_type": 
> type(vertical).__name__},
>     962     )
>     964 if isinstance(truncate, bool) and truncate:
> --> 965     return self._jdf.showString(n, 20, vertical)
>     966 else:
>     967     try:
> File /usr/local/lib/python3.10/dist-packages/py4j/java_gateway.py:1322, in 
> JavaMember.__call__(self, *args)
>    1316 command = proto.CALL_COMMAND_NAME +\
>    1317     self.command_header +\
>    1318     args_command +\
>    1319     proto.END_COMMAND_PART
>    1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>    1323     answer, self.gateway_client, self.target_id, self.name)
>    1325 for temp_arg in temp_args:
>    1326     if hasattr(temp_arg, "_detach"):
> File /opt/spark/python/pyspark/errors/exceptions/captured.py:185, in 
> capture_sql_exception.<locals>.deco(*a, **kw)
>     181 converted = convert_exception(e.java_exception)
>     182 if not isinstance(converted, UnknownException):
>     183     # Hide where the exception came from that shows a non-Pythonic
>     184     # JVM exception message.
> --> 185     raise converted from None
>     186 else:
>     187     raise
> IllegalArgumentException: geometry argument must be a named reference to an 
> existing column {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
