douglasdennis commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1381195068

   @Imbruced Took me a little longer, but here are some memory profiling results.
   TL;DR 
   Serialization Average Memory Increment Results (MiB):
   |Serializer|Point|LineString|Polygon|
   |----------|------|----------|---------|
   |WKB|7.22|13.82|13.76|
   |Master|9.98|16.2|17.7|
   |Refactor|7.1|13.6|14.98|
   
   Deserialization Average Memory Increment Results (MiB):
   |Deserializer|Point|LineString|Polygon|
   |----------|------|----------|---------|
   |WKB|28.5|39|46.8|
   |Master|34.42|45.26|52.84|
   |Refactor|34.8|45.26|52.8|
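
For a quick read of the serialization table above, the relative change of Refactor against Master can be computed directly from the reported averages. The snippet below is only a sketch of that arithmetic; the dictionary layout and the `relative_change` and `compare` helpers are illustrative names, not part of the PR.

```python
# Average memory increments (MiB), copied from the serialization table above.
SERIALIZE_MIB = {
    "WKB": {"Point": 7.22, "LineString": 13.82, "Polygon": 13.76},
    "Master": {"Point": 9.98, "LineString": 16.2, "Polygon": 17.7},
    "Refactor": {"Point": 7.1, "LineString": 13.6, "Polygon": 14.98},
}


def relative_change(new: float, old: float) -> float:
    """Return the percent change from old to new (negative means less memory)."""
    return (new - old) / old * 100.0


def compare(table, candidate="Refactor", baseline="Master"):
    """Percent change of candidate vs. baseline for every geometry type."""
    return {
        geom: relative_change(table[candidate][geom], table[baseline][geom])
        for geom in table[baseline]
    }


if __name__ == "__main__":
    for geom, pct in compare(SERIALIZE_MIB).items():
        print(f"{geom}: {pct:+.1f}%")
```

The same helpers work for the deserialization table if its values are placed in a dictionary of the same shape.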
   
   Method:
   I used the `memory_profiler` package to run trials using `wkb.dumps` and `wkb.loads`, this PR's refactored `serialize` and `deserialize` functions, and the current master branch's `serialize` and `deserialize` functions. For each serde implementation I ran trials operating on 100,000 `Point`s, `LineString`s with five vertices, and square `Polygon`s. For each combination of serde and geometry I ran five trials and averaged the results. The scripts I used to run these trials are included below. Each script was run using a command such as:
   `mprof run python python/mem_profile_serialize.py wkb point`
   
   This would give a printout like this:
   ```
   mprof: Sampling memory every 0.1s
   running new process
   Filename: python/mem_profile_serialize.py
   
   Line #    Mem usage    Increment  Occurrences   Line Contents
   =============================================================
       21     41.8 MiB     41.8 MiB           1   @profile
       22                                         def run_serialize_profile(serializer, geom):
       23     48.8 MiB      7.0 MiB      100003       x = [serializer(geom) for _ in range(100_000)]
       24     48.8 MiB      0.0 MiB           1       return x
   ```
   
   I then averaged the "Increment" amount for line 23 across all five trials. 
Additionally, I plotted a representative run of the serialize operations (if 
you click on these you should get a bigger image):
   
   | |Point|LineString|Polygon|
   |---|----|----|----|
   |WKB|![wkb_point](https://user-images.githubusercontent.com/4318895/212215809-55e7cfcd-3449-4ef5-bd9c-4c950f3d6f78.png)|![wkb_linestring](https://user-images.githubusercontent.com/4318895/212215837-6291dd6c-a2fc-4496-bb85-f3f677a7d778.png)|![wkb_polygon](https://user-images.githubusercontent.com/4318895/212215868-86a46a16-1289-421d-9e0b-859029c82172.png)|
   |Master|![master_point](https://user-images.githubusercontent.com/4318895/212215948-32067e4c-1ec2-47ac-ae1f-3ce2d6c4e7cf.png)|![master_linestring](https://user-images.githubusercontent.com/4318895/212215964-6d2a3bbd-558c-48a3-9807-d55a364a94e6.png)|![master_polygon](https://user-images.githubusercontent.com/4318895/212215974-3988e685-cb18-42f2-9143-1b2b913290f3.png)|
   |Refactor|![refactor_point](https://user-images.githubusercontent.com/4318895/212216008-ad8cc711-32f5-4aef-9f87-576e18bdff69.png)|![refactor_linestring](https://user-images.githubusercontent.com/4318895/212216034-c7b786ff-0187-4659-b73d-cc11dfd4254f.png)|![refactor_polygon](https://user-images.githubusercontent.com/4318895/212216047-41ca2be3-595b-4f02-ae8b-77d707012d43.png)|
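
The averaging of the "Increment" column described above was presumably done by hand; it could also be scripted. The following is a minimal sketch, assuming the line-by-line report keeps the column layout shown in the printout earlier (line number, memory usage, increment, occurrences, source). The `ROW` pattern and both helper names are hypothetical and not part of `memory_profiler`.

```python
import re
from statistics import mean

# Matches a profiled row: line number, memory usage, increment, occurrences, source.
# Rows without memory columns (such as the `def` line) do not match.
ROW = re.compile(
    r"^\s*(?P<line>\d+)\s+(?P<mem>-?[\d.]+) MiB\s+(?P<inc>-?[\d.]+) MiB"
    r"\s+(?P<occ>\d+)\s+(?P<src>.*)$"
)


def increment_for_line(report: str, line_no: int) -> float:
    """Return the Increment value (MiB) reported for one source line."""
    for row in report.splitlines():
        match = ROW.match(row)
        if match and int(match.group("line")) == line_no:
            return float(match.group("inc"))
    raise ValueError(f"no profiled row for line {line_no}")


def average_increment(reports, line_no: int) -> float:
    """Average the Increment for one source line across several trial reports."""
    return mean(increment_for_line(report, line_no) for report in reports)
```

With the captured text of five trial printouts in a list, `average_increment(reports, 23)` would give the kind of per-cell average reported in the TL;DR tables.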
   
   Lastly here is the serialization trial script:
   ```python
   import sys
   from typing import List
   
   from sedona.utils.geometry_serde import serialize
   from shapely.geometry import Point, LineString, Polygon
   from shapely.wkb import dumps
   
   from memory_profiler import profile
   
   
   def make_point():
       return Point(12.3, 45.6)
   
   def make_linestring():
       return LineString([(n, n) for n in range(5)])
   
   def make_polygon():
       return Polygon(
           [(10.0, 10.0), (20.0, 10.0), (20.0, 20.0), (10.0, 20.0), (10.0, 10.0)]
       )
   
   
   @profile
   def run_serialize_profile(serializer, geom):
       x = [serializer(geom) for _ in range(100_000)]
       return x
   
   
   def main(args: List[str]) -> int:
       if len(args) < 2:
           print("Usage: mem_profile_serialize.py <sedona or wkb> <point, linestring, or polygon>")
           return 1
       serializer_type = args[0]
       geom_type = args[1]
   
       if geom_type == "point":
           geom = make_point()
       elif geom_type == "linestring":
           geom = make_linestring()
       elif geom_type == "polygon":
           geom = make_polygon()
       else:
           print(f"Geometry type is not supported: {geom_type}")
           return 1
   
       if serializer_type == "sedona":
           serializer = serialize
       elif serializer_type == "wkb":
           serializer = dumps
       else:
           print(f"Serializer type is not supported: {serializer_type}")
           return 1
   
       run_serialize_profile(serializer, geom)
       return 0
   
   
   if __name__ == "__main__":
       sys.exit(main(sys.argv[1:]))
   ```
   
   Deserialize trial script:
   ```python
   import sys
   from typing import List
   
   from sedona.utils.geometry_serde import serialize, deserialize
   from shapely.geometry import Point, LineString, Polygon
   from shapely.wkb import dumps, loads
   
   from memory_profiler import profile
   
   
   def make_point():
       return Point(12.3, 45.6)
   
   def make_linestring():
       return LineString([(n, n) for n in range(5)])
   
   def make_polygon():
       return Polygon(
           [(10.0, 10.0), (20.0, 10.0), (20.0, 20.0), (10.0, 20.0), (10.0, 10.0)]
       )
   
   
   @profile
   def run_deserialize_profile(deserializer, geom):
       x = [deserializer(geom) for _ in range(100_000)]
       return x
   
   
   def main(args: List[str]) -> int:
       if len(args) < 2:
           print("Usage: mem_profile_run.py <sedona or wkb> <point, linestring, or polygon>")
           return 1
       serializer_type = args[0]
       geom_type = args[1]
   
       if geom_type == "point":
           geom = make_point()
       elif geom_type == "linestring":
           geom = make_linestring()
       elif geom_type == "polygon":
           geom = make_polygon()
       else:
           print(f"Geometry type is not supported: {geom_type}")
           return 1
   
       if serializer_type == "sedona":
           geom = serialize(geom)
           deserializer = deserialize
       elif serializer_type == "wkb":
           geom = dumps(geom)
           deserializer = loads
       else:
           print(f"Serializer type is not supported: {serializer_type}")
           return 1
   
       run_deserialize_profile(deserializer, geom)
       return 0
   
   
   if __name__ == "__main__":
       sys.exit(main(sys.argv[1:]))
   ```
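
To repeat the five trials per serde and geometry combination described in the method, a small driver along these lines could be used. This is a sketch under stated assumptions, not the procedure actually followed for the numbers above: `build_commands` and `run_trials` are hypothetical helpers, and the command simply mirrors the `mprof run python <script> <serde> <geometry>` form shown earlier. Which Sedona code the `sedona` option exercises depends on what is installed, which this sketch does not manage.

```python
import itertools
import subprocess

SERDES = ("wkb", "sedona")
GEOMS = ("point", "linestring", "polygon")
TRIALS = 5


def build_commands(script: str, trials: int = TRIALS):
    """Build one mprof command per trial for every serde/geometry combination."""
    return [
        ["mprof", "run", "python", script, serde, geom]
        for serde, geom in itertools.product(SERDES, GEOMS)
        for _ in range(trials)
    ]


def run_trials(script: str, trials: int = TRIALS):
    """Run every trial and collect the captured stdout per (serde, geometry) pair."""
    reports = {}
    for cmd in build_commands(script, trials):
        serde, geom = cmd[-2], cmd[-1]
        # Assumption: the line-by-line report is written to stdout, as in the printout above.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        reports.setdefault((serde, geom), []).append(result.stdout)
    return reports
```

Calling `run_trials("python/mem_profile_serialize.py")` would return a dictionary of captured printouts keyed by `(serde, geometry)`, ready for averaging.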

