douglasdennis commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1381195068
@Imbruced Took me a little longer, but here are some memory profiling results.
TL;DR
Serialization Average Memory Increment Results (MiB):
|Serializer|Point|LineString|Polygon|
|----------|------|----------|---------|
|WKB|7.22|13.82|13.76|
|Master|9.98|16.2|17.7|
|Refactor|7.1|13.6|14.98|
Deserialization Average Memory Increment Results (MiB):
|Deserializer|Point|LineString|Polygon|
|----------|------|----------|---------|
|WKB|28.5|39|46.8|
|Master|34.42|45.26|52.84|
|Refactor|34.8|45.26|52.8|
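To make the Master vs. Refactor comparison concrete, the relative change can be computed directly from the averages in the two tables. This is a small sketch: the numbers are copied from the tables, and `pct_change` and `compare` are illustrative helper names only.

```python
# Average memory increments (MiB) copied from the two tables.
SERIALIZE = {
    "WKB":      {"Point": 7.22,  "LineString": 13.82, "Polygon": 13.76},
    "Master":   {"Point": 9.98,  "LineString": 16.2,  "Polygon": 17.7},
    "Refactor": {"Point": 7.1,   "LineString": 13.6,  "Polygon": 14.98},
}
DESERIALIZE = {
    "WKB":      {"Point": 28.5,  "LineString": 39.0,  "Polygon": 46.8},
    "Master":   {"Point": 34.42, "LineString": 45.26, "Polygon": 52.84},
    "Refactor": {"Point": 34.8,  "LineString": 45.26, "Polygon": 52.8},
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` in percent (negative = less memory)."""
    return (after - before) / before * 100.0


def compare(results, baseline="Master", candidate="Refactor"):
    """Percent change of `candidate` vs. `baseline` for each geometry type, rounded to 0.1."""
    return {
        geom: round(pct_change(results[baseline][geom], results[candidate][geom]), 1)
        for geom in results[baseline]
    }


print("serialize  ", compare(SERIALIZE))
print("deserialize", compare(DESERIALIZE))
```

With these inputs, Refactor uses about 28.9%, 16.0% and 15.4% less memory than Master when serializing points, linestrings and polygons, while deserialization changes by +1.1%, 0.0% and -0.1%, i.e. it is essentially unchanged.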
Method:
I used the `memory_profiler` package to run trials of `wkb.dumps` and
`wkb.loads` (from `shapely.wkb`), this PR's refactored `serialize` and
`deserialize` functions, and the current master branch's `serialize` and
`deserialize` functions. For each serde implementation I ran trials that
serialize (or deserialize) a single geometry 100,000 times, using a `Point`, a
`LineString` with five vertices, and a square `Polygon`. For each combination of
serde and geometry I ran five trials and averaged the results. The scripts I used
to run these trials are included below. Each script was run using a command such as:
`mprof run python python/mem_profile_serialize.py wkb point`
This would give a printout like this:
```
mprof: Sampling memory every 0.1s
running new process
Filename: python/mem_profile_serialize.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    21     41.8 MiB     41.8 MiB           1   @profile
    22                                         def run_serialize_profile(serializer, geom):
    23     48.8 MiB      7.0 MiB      100003       x = [serializer(geom) for _ in range(100_000)]
    24     48.8 MiB      0.0 MiB           1       return x
```
I then averaged the "Increment" amount for line 23 across all five trials.
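The averaging step can also be scripted. The sketch below assumes each trial's printout (in the `Line # / Mem usage / Increment` layout shown above) has been captured as text, for example from the stdout of `mprof run`; `increment_for_line`, `average_increment` and `run_trials` are hypothetical helpers, and the regex is an assumption about that column layout.

```python
import re
import subprocess
from statistics import mean
from typing import Iterable, List

# Matches a profiled row such as "23   48.8 MiB   7.0 MiB   100003   x = [...]"
# and captures the line number and the Increment column (MiB, may be negative).
ROW = re.compile(r"^\s*(\d+)\s+[\d.]+\s+MiB\s+(-?[\d.]+)\s+MiB")


def increment_for_line(printout: str, line_no: int) -> float:
    """Return the Increment (MiB) reported for `line_no` in one profiler printout."""
    for text in printout.splitlines():
        m = ROW.match(text)
        if m and int(m.group(1)) == line_no:
            return float(m.group(2))
    raise ValueError(f"line {line_no} not found in profiler output")


def average_increment(printouts: Iterable[str], line_no: int = 23) -> float:
    """Average the Increment for `line_no` over several trial printouts."""
    return mean(increment_for_line(p, line_no) for p in printouts)


def run_trials(script: str, serde: str, geom: str, trials: int = 5) -> List[str]:
    """Run `mprof run python <script> <serde> <geom>` repeatedly and collect stdout.

    Assumes the profiled child process writes its line-by-line report to stdout.
    """
    cmd = ["mprof", "run", "python", script, serde, geom]
    return [
        subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for _ in range(trials)
    ]
```

Usage would look like `average_increment(run_trials("python/mem_profile_serialize.py", "wkb", "point"))`, repeated once per serde and geometry combination.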
Additionally, I plotted a representative run of the serialize operations, one plot
for each serializer (WKB, Master, Refactor) and geometry type (Point, LineString,
Polygon). [Plot images not included in this copy.]
Lastly, here is the serialization trial script:
```python
import sys
from typing import List

from memory_profiler import profile
from sedona.utils.geometry_serde import serialize
from shapely.geometry import LineString, Point, Polygon
from shapely.wkb import dumps


def make_point():
    return Point(12.3, 45.6)


def make_linestring():
    # Five vertices: (0, 0) through (4, 4)
    return LineString([(n, n) for n in range(5)])


def make_polygon():
    # A closed 10 x 10 square
    return Polygon([(10.0, 10.0), (20.0, 10.0), (20.0, 20.0), (10.0, 20.0),
                    (10.0, 10.0)])


@profile
def run_serialize_profile(serializer, geom):
    # Serialize the same geometry 100,000 times and keep the results alive
    x = [serializer(geom) for _ in range(100_000)]
    return x


def main(args: List[str]) -> int:
    if len(args) < 2:
        print(f"Usage: {sys.argv[0]} <sedona|wkb> <point|linestring|polygon>")
        return 1
    serializer_type = args[0]
    geom_type = args[1]

    if geom_type == "point":
        geom = make_point()
    elif geom_type == "linestring":
        geom = make_linestring()
    elif geom_type == "polygon":
        geom = make_polygon()
    else:
        print(f"Geometry type is not supported: {geom_type}")
        return 1

    if serializer_type == "sedona":
        serializer = serialize
    elif serializer_type == "wkb":
        serializer = dumps
    else:
        print(f"Serializer type is not supported: {serializer_type}")
        return 1

    run_serialize_profile(serializer, geom)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```
And the deserialization trial script:
```python
import sys
from typing import List

from memory_profiler import profile
from sedona.utils.geometry_serde import deserialize, serialize
from shapely.geometry import LineString, Point, Polygon
from shapely.wkb import dumps, loads


def make_point():
    return Point(12.3, 45.6)


def make_linestring():
    # Five vertices: (0, 0) through (4, 4)
    return LineString([(n, n) for n in range(5)])


def make_polygon():
    # A closed 10 x 10 square
    return Polygon([(10.0, 10.0), (20.0, 10.0), (20.0, 20.0), (10.0, 20.0),
                    (10.0, 10.0)])


@profile
def run_deserialize_profile(deserializer, data):
    # Deserialize the same serialized bytes 100,000 times and keep the results alive
    x = [deserializer(data) for _ in range(100_000)]
    return x


def main(args: List[str]) -> int:
    if len(args) < 2:
        print(f"Usage: {sys.argv[0]} <sedona|wkb> <point|linestring|polygon>")
        return 1
    serializer_type = args[0]
    geom_type = args[1]

    if geom_type == "point":
        geom = make_point()
    elif geom_type == "linestring":
        geom = make_linestring()
    elif geom_type == "polygon":
        geom = make_polygon()
    else:
        print(f"Geometry type is not supported: {geom_type}")
        return 1

    # Serialize once, outside the profiled function, so only deserialization is measured
    if serializer_type == "sedona":
        data = serialize(geom)
        deserializer = deserialize
    elif serializer_type == "wkb":
        data = dumps(geom)
        deserializer = loads
    else:
        print(f"Serializer type is not supported: {serializer_type}")
        return 1

    run_deserialize_profile(deserializer, data)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```
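If installing `memory_profiler`, Shapely or Sedona is inconvenient, a rough cross-check is possible with the standard library's `tracemalloc`. Note that this is a different technique: it reports peak Python heap allocations rather than sampling process memory, so its numbers will not match the `mprof` increments. The sketch below also swaps in a hand-written little-endian WKB codec for 2D points (built with `struct`) as a stand-in for `shapely.wkb.dumps`/`loads`; `wkb_point_dumps`, `wkb_point_loads` and `peak_mib` are hypothetical helpers, not Sedona or Shapely APIs.

```python
import struct
import tracemalloc
from typing import Callable, Tuple

# Little-endian 2D WKB point: byte order (1 = little endian), uint32 geometry
# type (1 = Point), then x and y as float64, for 21 bytes in total.
WKB_POINT = struct.Struct("<BIdd")


def wkb_point_dumps(xy: Tuple[float, float]) -> bytes:
    return WKB_POINT.pack(1, 1, xy[0], xy[1])


def wkb_point_loads(data: bytes) -> Tuple[float, float]:
    byte_order, geom_type, x, y = WKB_POINT.unpack(data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian 2D points are handled")
    return (x, y)


def peak_mib(fn: Callable[[object], object], arg: object, n: int = 100_000) -> float:
    """Peak traced allocation (MiB) while building a list of n results of fn(arg)."""
    tracemalloc.start()
    try:
        results = [fn(arg) for _ in range(n)]  # kept alive until the peak is read
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / (1024 * 1024)


if __name__ == "__main__":
    point = (12.3, 45.6)
    blob = wkb_point_dumps(point)
    print(f"serialize:   {peak_mib(wkb_point_dumps, point):.2f} MiB")
    print(f"deserialize: {peak_mib(wkb_point_loads, blob):.2f} MiB")
```

Because `tracemalloc` only sees allocations made through Python's allocator, it can undercount memory held by native libraries such as GEOS, which is one reason the RSS-based `mprof` numbers remain the better headline figure.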