The GitHub Actions job "Fury CI" on fury.git has succeeded. Run started by GitHub user pandalee99 (triggered by chaokunyang).
Head commit for run:
f2d38abdaf84d114b6447eba644995dc5ef9da29 / Shawn Yang <[email protected]>
feat(python): chunk based map serialization for python (#2038)

## What does this PR do?

Implement chunk based map serialization for Python using Cython.

Note:
- The non-Cython debug mode version is not implemented in this PR. It is implemented in #2037.
- xlang serialization is not covered in this PR either. It will be easy to support once the type system in Java is unified for xlang and native Java serialization.

## Related issues

Closes #1935

## Does this PR introduce any user-facing change?

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

Here is the benchmark script:

```python
def test_map_benchmark(data=None, dictsize=50, repeat=100000):
    import timeit, pickle, pyfury
    fury = pyfury.Fury(language=pyfury.Language.PYTHON, ref_tracking=False)
    # Default input: an int -> int dict with `dictsize` entries.
    dict0 = data or {i: i * 2 for i in range(dictsize)}
    bytes0 = fury.serialize(dict0)
    bytes1 = pickle.dumps(dict0)
    # Payload sizes.
    print(f"fury serialize map of size {len(dict0)}, payload size {len(bytes0)}")
    print(f"pickle serialize map of size {len(dict0)}, payload size {len(bytes1)}")
    # Total seconds for `repeat` calls of each operation.
    print(f"fury serialize map of size {len(dict0)}", timeit.timeit(lambda : fury.serialize(dict0), number=repeat))
    print(f"pickle serialize map of size {len(dict0)}", timeit.timeit(lambda : pickle.dumps(dict0), number=repeat))
    print(f"fury deserialize map of size {len(dict0)}", timeit.timeit(lambda : fury.deserialize(bytes0), number=repeat))
    print(f"pickle deserialize map of size {len(dict0)}", timeit.timeit(lambda : pickle.loads(bytes1), number=repeat))
```

With this PR, **the serialized size is as low as about 58% of pickle's** (126 vs. 216 bytes for the 50-entry integer map). Test results:

```bash
In [7]: test_map_benchmark(dictsize=50, repeat=1000000)
fury serialize map of size 50, payload size 126
pickle serialize map of size 50, payload size 216
fury serialize map of size 50 2.600001722999991
pickle serialize map of size 50 2.703825038000005
fury deserialize map of size 50 3.2978402969999934
pickle deserialize map of size 50 3.6489022370000015

In [8]: test_map_benchmark(dictsize=500, repeat=100000)
fury serialize map of size 500, payload size 1917
pickle serialize map of size 500, payload size 2632
fury serialize map of size 500 1.541724773999988
pickle serialize map of size 500 2.1854165999999964
fury deserialize map of size 500 3.2613812140000107
pickle deserialize map of size 500 3.2642077769999958

In [23]: test_map_benchmark(data={f"k{i}": f"v{i}" for i in range(10)}, repeat=1000000)
fury serialize map of size 10, payload size 88
pickle serialize map of size 10, payload size 116
fury serialize map of size 10 2.053245253
pickle serialize map of size 10 1.5431892400000606
fury deserialize map of size 10 2.8904618450000044
pickle deserialize map of size 10 2.2623522280000543

In [22]: test_map_benchmark(data={f"k{i}": f"v{i}" for i in range(1000)}, repeat=100000)
fury serialize map of size 1000, payload size 11801
pickle serialize map of size 1000, payload size 13798
fury serialize map of size 1000 7.018782786999964
pickle serialize map of size 1000 21.388066090000052
fury deserialize map of size 1000 19.090073496999935
pickle deserialize map of size 1000 20.72099072399999
```

Before this PR, the **serialized size was about 50% larger than pickle's** on the integer maps:

```bash
In [6]: test_map_benchmark(dictsize=50, repeat=1000000)
fury serialize map of size 50, payload size 322
pickle serialize map of size 50, payload size 216
fury serialize map of size 50 4.886074129999997
pickle serialize map of size 50 2.684925058999994
fury deserialize map of size 50 5.766612550999994
pickle deserialize map of size 50 3.482006009999992

In [7]: test_map_benchmark(dictsize=500, repeat=100000)
fury serialize map of size 500, payload size 3909
pickle serialize map of size 500, payload size 2632
fury serialize map of size 500 3.6878661510000086
pickle serialize map of size 500 2.0822324780000088
fury deserialize map of size 500 5.649835711999998
pickle deserialize map of size 500 3.401463585000016

In [8]: test_map_benchmark(data={f"k{i}": f"v{i}" for i in range(10)}, repeat=1000000)
fury serialize map of size 10, payload size 104
pickle serialize map of size 10, payload size 116
fury serialize map of size 10 1.5266061640000146
pickle serialize map of size 10 1.7377313819999927
fury deserialize map of size 10 2.830370420999998
pickle deserialize map of size 10 2.3650116949999926

In [9]: test_map_benchmark(data={f"k{i}": f"v{i}" for i in range(1000)}, repeat=100000)
fury serialize map of size 1000, payload size 13785
pickle serialize map of size 1000, payload size 13798
fury serialize map of size 1000 5.561600682000005
pickle serialize map of size 1000 15.757341811999993
fury deserialize map of size 1000 19.507720968
pickle deserialize map of size 1000 21.805054765999955
```

Report URL: https://github.com/apache/fury/actions/runs/13173695018

With regards,
GitHub Actions via GitBox
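
The payload-size comparison in the Benchmark section can be checked with a short sketch. It is not part of the PR: it only takes the payload sizes printed in the benchmark output and divides them by the pickle size for the same input. The names `PAYLOAD_SIZES`, `size_ratios`, and `summarize`, and the row labels, are hypothetical and chosen here for illustration.

```python
# Payload sizes in bytes, copied from the benchmark output in this message.
# The labels and dictionary layout are illustrative assumptions, not PR code.
PAYLOAD_SIZES = {
    "int map, 50 entries": {"fury_before": 322, "fury_after": 126, "pickle": 216},
    "int map, 500 entries": {"fury_before": 3909, "fury_after": 1917, "pickle": 2632},
    "str map, 10 entries": {"fury_before": 104, "fury_after": 88, "pickle": 116},
    "str map, 1000 entries": {"fury_before": 13785, "fury_after": 11801, "pickle": 13798},
}


def size_ratios(sizes):
    """Return (fury_before / pickle, fury_after / pickle), rounded to 2 decimals."""
    before = round(sizes["fury_before"] / sizes["pickle"], 2)
    after = round(sizes["fury_after"] / sizes["pickle"], 2)
    return before, after


def summarize(table=PAYLOAD_SIZES):
    """Map each benchmark label to its (before, after) size ratio against pickle."""
    return {label: size_ratios(sizes) for label, sizes in table.items()}


if __name__ == "__main__":
    for label, (before, after) in summarize().items():
        print(f"{label}: before {before:.2f}x pickle, after {after:.2f}x pickle")
```

On the integer maps this gives roughly 1.49x pickle before the PR and 0.58x to 0.73x after it. The string-keyed maps were already at or below pickle's size and shrink further.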
