The GitHub Actions job "Fury CI" on fury.git has failed.
Run started by GitHub user pandalee99 (triggered by pandalee99).

Head commit for run:
f2d38abdaf84d114b6447eba644995dc5ef9da29 / Shawn Yang <[email protected]>
feat(python): chunk based map serialization for python (#2038)

## What does this PR do?

Implement chunk-based map serialization for Python using Cython.

Note:
- The non-Cython debug-mode version is not implemented in this PR; it is
implemented in #2037.
- xlang serialization is not covered in this PR either. It will be easy to
support once the type system in Java is unified for xlang and native Java
serialization.
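
The PR text does not describe the chunk layout itself. As a rough illustration of the general idea only, the sketch below groups consecutive map entries that share the same key and value types, so that type information could be written once per chunk rather than once per entry. The function name `chunk_entries`, the `max_chunk_size` limit of 255, and the tuple layout are assumptions made for this example and are not taken from Fury's implementation or wire format.

```python
from itertools import groupby


def chunk_entries(mapping, max_chunk_size=255):
    """Split a dict into chunks of consecutive entries sharing key and value types.

    Illustrative only: `max_chunk_size` and the (key_type, value_type, entries)
    tuple layout are assumptions for this sketch, not Fury's wire format.
    """
    def type_pair(entry):
        key, value = entry
        return type(key), type(value)

    chunks = []
    for (key_type, value_type), group in groupby(mapping.items(), key=type_pair):
        entries = list(group)
        # Cap each chunk so its length could fit a fixed-width header field.
        for start in range(0, len(entries), max_chunk_size):
            chunks.append((key_type, value_type, entries[start:start + max_chunk_size]))
    return chunks


# Two int->int entries form one chunk; the str->str entry starts a new one.
for key_type, value_type, entries in chunk_entries({1: 2, 3: 4, "a": "b"}):
    print(key_type.__name__, value_type.__name__, len(entries))
```

A homogeneous map such as `{i: i * 2 for i in range(50)}` collapses into a single chunk under this scheme, which is the case where per-entry type tags would be most wasteful.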

## Related issues
Closes #1935 

## Does this PR introduce any user-facing change?

<!--
If any user-facing interface changes, please [open an
issue](https://github.com/apache/fury/issues/new/choose) describing the
need to do so and update the document if necessary.
-->

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

Here is the benchmark script:
```python
def test_map_benchmark(data=None, dictsize=50, repeat=100000):
    import timeit, pickle, pyfury

    fury = pyfury.Fury(language=pyfury.Language.PYTHON, ref_tracking=False)
    # Default workload: an int -> int map with `dictsize` entries.
    dict0 = data or {i: i * 2 for i in range(dictsize)}
    bytes0 = fury.serialize(dict0)
    bytes1 = pickle.dumps(dict0)
    # Payload sizes in bytes.
    print(f"fury serialize map of size {len(dict0)}, payload size {len(bytes0)}")
    print(f"pickle serialize map of size {len(dict0)}, payload size {len(bytes1)}")
    # Total seconds for `repeat` calls of each operation.
    print(f"fury serialize map of size {len(dict0)}", timeit.timeit(lambda: fury.serialize(dict0), number=repeat))
    print(f"pickle serialize map of size {len(dict0)}", timeit.timeit(lambda: pickle.dumps(dict0), number=repeat))
    print(f"fury deserialize map of size {len(dict0)}", timeit.timeit(lambda: fury.deserialize(bytes0), number=repeat))
    print(f"pickle deserialize map of size {len(dict0)}", timeit.timeit(lambda: pickle.loads(bytes1), number=repeat))
```

With this PR, **the serialized size is as little as about 58% of pickle's**
(126 vs. 216 bytes for the 50-entry map). Test results:
```bash
In [7]: test_map_benchmark(dictsize=50, repeat=1000000)
fury serialize map of size 50, payload size 126
pickle serialize map of size 50, payload size 216
fury serialize map of size 50 2.600001722999991
pickle serialize map of size 50 2.703825038000005
fury deserialize map of size 50 3.2978402969999934
pickle deserialize map of size 50 3.6489022370000015

In [8]: test_map_benchmark(dictsize=500, repeat=100000)
fury serialize map of size 500, payload size 1917
pickle serialize map of size 500, payload size 2632
fury serialize map of size 500 1.541724773999988
pickle serialize map of size 500 2.1854165999999964
fury deserialize map of size 500 3.2613812140000107
pickle deserialize map of size 500 3.2642077769999958

In [23]: test_map_benchmark(data={f"k{i}": f"v{i}" for i in range(10)}, repeat=1000000)
fury serialize map of size 10, payload size 88
pickle serialize map of size 10, payload size 116
fury serialize map of size 10 2.053245253
pickle serialize map of size 10 1.5431892400000606
fury deserialize map of size 10 2.8904618450000044
pickle deserialize map of size 10 2.2623522280000543

In [22]: test_map_benchmark(data={f"k{i}": f"v{i}" for i in range(1000)}, repeat=100000)
fury serialize map of size 1000, payload size 11801
pickle serialize map of size 1000, payload size 13798
fury serialize map of size 1000 7.018782786999964
pickle serialize map of size 1000 21.388066090000052
fury deserialize map of size 1000 19.090073496999935
pickle deserialize map of size 1000 20.72099072399999
```

Before this PR, the **serialized size was about 50% larger than pickle** for the integer maps:
```bash
In [6]: test_map_benchmark(dictsize=50, repeat=1000000)
fury serialize map of size 50, payload size 322
pickle serialize map of size 50, payload size 216
fury serialize map of size 50 4.886074129999997
pickle serialize map of size 50 2.684925058999994
fury deserialize map of size 50 5.766612550999994
pickle deserialize map of size 50 3.482006009999992

In [7]: test_map_benchmark(dictsize=500, repeat=100000)
fury serialize map of size 500, payload size 3909
pickle serialize map of size 500, payload size 2632
fury serialize map of size 500 3.6878661510000086
pickle serialize map of size 500 2.0822324780000088
fury deserialize map of size 500 5.649835711999998
pickle deserialize map of size 500 3.401463585000016

In [8]: test_map_benchmark(data={f"k{i}": f"v{i}" for i in range(10)}, repeat=1000000)
fury serialize map of size 10, payload size 104
pickle serialize map of size 10, payload size 116
fury serialize map of size 10 1.5266061640000146
pickle serialize map of size 10 1.7377313819999927
fury deserialize map of size 10 2.830370420999998
pickle deserialize map of size 10 2.3650116949999926

In [9]: test_map_benchmark(data={f"k{i}": f"v{i}" for i in range(1000)}, repeat=100000)
fury serialize map of size 1000, payload size 13785
pickle serialize map of size 1000, payload size 13798
fury serialize map of size 1000 5.561600682000005
pickle serialize map of size 1000 15.757341811999993
fury deserialize map of size 1000 19.507720968
pickle deserialize map of size 1000 21.805054765999955
```
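
To make the size comparison easier to read, the payload sizes reported in the runs above can be expressed as ratios against pickle. The snippet below is a small helper written for this summary (the names `results` and `size_ratios` are not part of the PR); the byte counts are copied from the benchmark output.

```python
# (label, fury bytes before PR, fury bytes after PR, pickle bytes),
# copied from the benchmark output above.
results = [
    ("int map, 50 entries", 322, 126, 216),
    ("int map, 500 entries", 3909, 1917, 2632),
    ("str map, 10 entries", 104, 88, 116),
    ("str map, 1000 entries", 13785, 11801, 13798),
]


def size_ratios(rows):
    """Return (label, before/pickle, after/pickle), each rounded to 2 decimals."""
    return [
        (label, round(before / pickle_size, 2), round(after / pickle_size, 2))
        for label, before, after, pickle_size in rows
    ]


for label, before_ratio, after_ratio in size_ratios(results):
    print(f"{label}: before {before_ratio:.2f}x pickle, after {after_ratio:.2f}x pickle")
```

On these numbers the integer maps go from roughly 1.49x pickle's size to between 0.58x and 0.73x, while the string maps were already slightly smaller than pickle and shrink further.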

Report URL: https://github.com/apache/fury/actions/runs/13173695018

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
