PDEUXA opened a new issue, #945: URL: https://github.com/apache/sedona/issues/945
Hello Sedona!

## Expected behavior

Serialization of geometry objects when collecting data to Pandas or similar.

## Actual behavior

Error:

```
~/.venv/lib/python3.10/site-packages/pyspark/sql/dataframe.py in take(self, num)
    866         [Row(age=2, name='Alice'), Row(age=5, name='Bob')]
    867         """
--> 868         return self.limit(num).collect()
    869
    870     def tail(self, num: int) -> List[Row]:

~/.venv/lib/python3.10/site-packages/pyspark/sql/dataframe.py in collect(self)
    816         with SCCallSiteSync(self._sc):
    817             sock_info = self._jdf.collectToPython()
--> 818         return list(_load_from_socket(sock_info, BatchedSerializer(CPickleSerializer())))
    819
    820     def toLocalIterator(self, prefetchPartitions: bool = False) -> Iterator[Row]:

~/.venv/lib/python3.10/site-packages/pyspark/serializers.py in load_stream(self, stream)
    149         while True:
    150             try:
--> 151                 yield self._read_with_length(stream)
    152             except EOFError:
    153                 return

~/.venv/lib/python3.10/site-packages/pyspark/serializers.py in _read_with_length(self, stream)
    171         if len(obj) < length:
    172             raise EOFError
--> 173         return self.loads(obj)
    174
    175     def dumps(self, obj):

~/.venv/lib/python3.10/site-packages/pyspark/serializers.py in loads(self, obj, encoding)
    469
    470     def loads(self, obj, encoding="bytes"):
--> 471         return cloudpickle.loads(obj, encoding=encoding)
    472
    473

~/.venv/lib/python3.10/site-packages/pyspark/sql/types.py in <lambda>(*a)
   1727 # This is used to unpickle a Row from JVM
   1728 def _create_row_inbound_converter(dataType: DataType) -> Callable:
-> 1729     return lambda *a: dataType.fromInternal(a)
   1730
   1731

~/.venv/lib/python3.10/site-packages/pyspark/sql/types.py in fromInternal(self, obj)
    819         if self._needSerializeAnyField:
    820             # Only calling fromInternal function for fields that need conversion
--> 821             values = [
    822                 f.fromInternal(v) if c else v
    823                 for f, v, c in zip(self.fields, obj, self._needConversion)

~/.venv/lib/python3.10/site-packages/pyspark/sql/types.py in <listcomp>(.0)
    820             # Only calling fromInternal function for fields that need conversion
    821             values = [
--> 822                 f.fromInternal(v) if c else v
    823                 for f, v, c in zip(self.fields, obj, self._needConversion)
    824             ]

~/.venv/lib/python3.10/site-packages/pyspark/sql/types.py in fromInternal(self, obj)
    592
    593     def fromInternal(self, obj: T) -> T:
--> 594         return self.dataType.fromInternal(obj)
    595
    596     def typeName(self) -> str:  # type: ignore[override]

~/.venv/lib/python3.10/site-packages/pyspark/sql/types.py in fromInternal(self, obj)
    879         v = self._cachedSqlType().fromInternal(obj)
    880         if v is not None:
--> 881             return self.deserialize(v)
    882
    883     def serialize(self, obj: Any) -> Any:

~/.venv/lib/python3.10/site-packages/sedona/sql/types.py in deserialize(self, datum)
     31
     32     def deserialize(self, datum):
---> 33         geom, offset = geometry_serde.deserialize(datum)
     34         return geom
     35

~/.venv/lib/python3.10/site-packages/sedona/utils/geometry_serde.py in deserialize(buf)
     59     if buf is None:
     60         return None
---> 61     return geomserde_speedup.deserialize(buf)
     62
     63 speedup_enabled = True

TypeError: a bytes-like object is required, not 'list'
```

## Steps to reproduce the problem

I have the following geometry, `"0101000020DB0B00004D3F10ED049B4E411318961117634741"`, leading to this dataframe:

```
root
 |-- geom: string (nullable = true)
```

I am applying the following function: `df.withColumn("geom2", geom_from_wkb(F.unhex(F.col("geom"))))`. When I show the dataframe with `df.select("geom", "geom2").show(2, False)`:

```
+---------------------------------------------+--------------------------------------------------+
|geom2                                        |geom                                              |
+---------------------------------------------+--------------------------------------------------+
|POINT (4011529.8520583273 3065390.1373930066)|0101000020DB0B00004D3F10ED049B4E411318961117634741|
|POINT (4009430.840070244 3009943.8371693227) |0101000020DB0B0000F86B876BEB964E41475D28EBCBF64641|
+---------------------------------------------+--------------------------------------------------+
```

When I `take()`, `collect()`, or `toPandas()`, the error is thrown.
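For reference, the hex string above is not plain WKB but PostGIS-style EWKB: the `0x20000000` flag in the geometry-type word indicates an embedded SRID. A minimal stdlib sketch (independent of Sedona; the offsets assume a little-endian point with the SRID flag set, as in the values above) that decodes the header and coordinates:

```python
import struct

# First geometry from the table above (PostGIS EWKB, little-endian)
hex_wkb = "0101000020DB0B00004D3F10ED049B4E411318961117634741"
buf = bytes.fromhex(hex_wkb)

assert buf[0] == 1                               # byte 0: 1 = little-endian
(geom_type,) = struct.unpack_from("<I", buf, 1)  # 0x20000001 = point + SRID flag
assert geom_type & 0x20000000                    # EWKB SRID flag is set
(srid,) = struct.unpack_from("<I", buf, 5)       # embedded SRID
x, y = struct.unpack_from("<2d", buf, 9)         # coordinate pair as doubles

print(f"SRID={srid} POINT ({x} {y})")
```

Decoding it this way shows the same point coordinates that `show()` prints for `geom2`, which confirms the input hex itself is valid; the failure happens later, in Sedona's Python-side deserializer during `collect()`.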
It was working on a previous Sedona version, 1.2.1.

## Settings

Sedona version = 1.4.1
Apache Spark version = 3.3.2
Apache Flink version = N/A
API type = Python
Scala version = 2.12
JRE version = 1.8
Python version = 3.10.9
Environment = Standalone

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@sedona.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org