[jira] [Commented] (SEDONA-153) Python Serialization Fails with Nulls

Doug Dennis (Jira) Wed, 24 Aug 2022 19:06:03 -0700


    [ 
https://issues.apache.org/jira/browse/SEDONA-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584550#comment-17584550
 ]


Doug Dennis commented on SEDONA-153:
------------------------------------

I'm not sure about the connection with ST_GeomFromWKT, but the exception I get 
when running test_null_deserializer comes from the Python side: the 
deserialize method attempts to iterate over None. Here is the relevant pytest 
output:
{code:python}
def deserialize(self, datum):
>       bytes_data = b''.join([struct.pack('b', el) for el in datum])
E       TypeError: 'NoneType' object is not iterable

sedona/sql/types.py:40: TypeError {code}
My solution would be to add null guards to GeometryType in Python, similar to 
what Spark does in its TimestampType:

[https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L266]

[https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L273]
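
A minimal sketch of what such guards could look like. GuardedGeometryCodec is a 
hypothetical stand-in, not the actual GeometryType: it only mirrors the 
byte-joining line from the traceback above, and its serialize encoding (bytes to 
a list of signed 8-bit ints) is an assumption made so the example runs without 
Spark or Shapely.

```python
import struct


class GuardedGeometryCodec:
    """Hypothetical stand-in for GeometryType's serialize/deserialize pair."""

    def serialize(self, obj):
        # Null guard: pass None through instead of trying to encode it.
        if obj is None:
            return None
        # Assumed encoding for illustration: bytes -> list of signed 8-bit ints.
        return list(struct.unpack(f"{len(obj)}b", obj))

    def deserialize(self, datum):
        # Null guard: without it, iterating over None raises TypeError.
        if datum is None:
            return None
        # Same byte-joining step as the line shown in the traceback.
        return b"".join(struct.pack("b", el) for el in datum)


codec = GuardedGeometryCodec()
print(codec.deserialize(None))  # None is returned rather than raising
```

The guard has the same shape as the `is not None` checks in TimestampType's 
toInternal and fromInternal: a null column value is returned unchanged, and only 
non-null values reach the conversion logic.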

 

> Python Serialization Fails with Nulls
> -------------------------------------
>
>                 Key: SEDONA-153
>                 URL: https://issues.apache.org/jira/browse/SEDONA-153
>             Project: Apache Sedona
>          Issue Type: Bug
>            Reporter: Doug Dennis
>            Priority: Major
>
> The following currently fail due to Shapely not liking nulls/Nones:
> {code:python}
> def test_null_deserializer(self):
>     result = self.spark.sql("select st_geomfromwkt(null)").collect()[0][0]
>     assert result is None
>
> def test_null_serializer(self):
>     data = [
>         [1, None]
>     ]
>     schema = t.StructType(
>         [
>             t.StructField("id", IntegerType(), True),
>             t.StructField("geom", GeometryType(), True),
>         ]
>     )
>     self.spark.createDataFrame(
>         data,
>         schema
>     ).createOrReplaceTempView("points")
>     count = self.spark.sql("select count(*) from points").collect()[0][0]
>     assert count == 1
> {code}
> The solution is to add some null guards to methods in the Python GeometryType 
> class. I can make a PR for this, but I wasn't sure if I needed to wait for 
> this issue to be approved or acknowledged or something :)
> Edit: I adjusted the deserializer test. I accidentally used a previous 
> version that fails on analysis. This version fails when Python attempts to 
> iterate over the None.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
