Imbruced commented on pull request #516:
URL: https://github.com/apache/incubator-sedona/pull/516#issuecomment-879031727


   @netanel246 I was able to quick fix the Python exceptions which occur after 
changes within this PR. 
   * Fast solution require to force user to use "wkb" serializer. I can create 
PR with that but that cause one issue, Python API will have only one 
serialization method available (wkb). Selecting "shape" method breaks the API. 
   * Second solution is to implement also shape serializer in Python API but... 
It is ugly solution in my opinion. Why ? The only way how I can get information 
about serialization methodology is using SedonaRegistrator and store that 
information after registering additional functions and after that dynamically 
chose which serializer should be used. Also I can try to get spark session and 
current arguments, but still this causes code smell. I do not like that 
solution.
   ```python
   
   class GeometrySerializer:
   
       def serialize(self, obj):
           raise NotImplementedError("Geometry serializer should implement 
serialize method")
   
       def deserialize(self, datum) -> BaseGeometry:
           raise NotImplementedError("Geometry serializer should implement 
deserialize method")
   
   
   class WKBSerializer(GeometrySerializer):
   
       def serialize(self, obj):
           return dumps(obj)
   
       def deserialize(self, datum) -> BaseGeometry:
           bytes_data = b''.join([struct.pack('b', el) for el in datum])
           geom = loads(bytes_data)
           return geom
   
   
   class ShapeSerializer(GeometrySerializer):
   
       def serialize(self, obj):
           pass
   
       def deserialize(self, datum) -> BaseGeometry:
           pass
   
   
   serializers = {
       "wkb": WKBSerializer(),
       "shape": ShapeSerializer()
   }
   
   
   class GeometryType(UserDefinedType):
   
       @classmethod
       def sqlType(cls):
           return ArrayType(ByteType(), containsNull=False)
   
       def fromInternal(self, obj):
           return self.deserialize(obj)
   
       def toInternal(self, obj):
           return [el - 256 if el >= 128 else el for el in self.serialize(obj)]
   
       def serialize(self, obj):
           from sedona.register import SedonaRegistrator
           serializer = serializers[SedonaRegistrator.serializer]
           return serializer.serialize(obj)
   
       def deserialize(self, datum):
           from sedona.register import SedonaRegistrator
           deserializer = serializers[SedonaRegistrator.serializer]
           return deserializer.deserialize(datum)
   
       @classmethod
       def module(cls):
           return "sedona.sql.types"
   
       def needConversion(self):
           return True
   
       @classmethod
       def scalaUDT(cls):
           return "org.apache.spark.sql.sedona_sql.UDT.GeometryUDT"
   ```
   * Third, require some minimal code changes on jvm side, can you bring 
information which serialization method you use as first 4 bytes saved as 
integer ? Example 1 means wkb, 2 means shape and other numbers for next 
possible serializers ? What do you think ? I think this is much safer than the 
previous ones. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to