Imbruced commented on pull request #516:
URL: https://github.com/apache/incubator-sedona/pull/516#issuecomment-879031727
@netanel246 I was able to quick fix the Python exceptions which occur after
changes within this PR.
* Fast solution require to force user to use "wkb" serializer. I can create
PR with that but that cause one issue, Python API will have only one
serialization method available (wkb). Selecting "shape" method breaks the API.
* Second solution is to implement also shape serializer in Python API but...
It is ugly solution in my opinion. Why ? The only way how I can get information
about serialization methodology is using SedonaRegistrator and store that
information after registering additional functions and after that dynamically
chose which serializer should be used. Also I can try to get spark session and
current arguments, but still this causes code smell. I do not like that
solution.
```python
class GeometrySerializer:
def serialize(self, obj):
raise NotImplementedError("Geometry serializer should implement
serialize method")
def deserialize(self, datum) -> BaseGeometry:
raise NotImplementedError("Geometry serializer should implement
deserialize method")
class WKBSerializer(GeometrySerializer):
def serialize(self, obj):
return dumps(obj)
def deserialize(self, datum) -> BaseGeometry:
bytes_data = b''.join([struct.pack('b', el) for el in datum])
geom = loads(bytes_data)
return geom
class ShapeSerializer(GeometrySerializer):
def serialize(self, obj):
pass
def deserialize(self, datum) -> BaseGeometry:
pass
serializers = {
"wkb": WKBSerializer(),
"shape": ShapeSerializer()
}
class GeometryType(UserDefinedType):
@classmethod
def sqlType(cls):
return ArrayType(ByteType(), containsNull=False)
def fromInternal(self, obj):
return self.deserialize(obj)
def toInternal(self, obj):
return [el - 256 if el >= 128 else el for el in self.serialize(obj)]
def serialize(self, obj):
from sedona.register import SedonaRegistrator
serializer = serializers[SedonaRegistrator.serializer]
return serializer.serialize(obj)
def deserialize(self, datum):
from sedona.register import SedonaRegistrator
deserializer = serializers[SedonaRegistrator.serializer]
return deserializer.deserialize(datum)
@classmethod
def module(cls):
return "sedona.sql.types"
def needConversion(self):
return True
@classmethod
def scalaUDT(cls):
return "org.apache.spark.sql.sedona_sql.UDT.GeometryUDT"
```
* Third, require some minimal code changes on jvm side, can you bring
information which serialization method you use as first 4 bytes saved as
integer ? Example 1 means wkb, 2 means shape and other numbers for next
possible serializers ? What do you think ? I think this is much safer than the
previous ones.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]