This is an automated email from the ASF dual-hosted git repository.
pandalee pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory.git
The following commit(s) were added to refs/heads/main by this push:
new 49746f346 docs(python): add row format doc (#2499)
49746f346 is described below
commit 49746f34646dd81e0c2e3ea751be57b9c6788c34
Author: Shawn Yang <[email protected]>
AuthorDate: Sat Aug 23 10:42:41 2025 +0800
docs(python): add row format doc (#2499)
## What does this PR do?
<!-- Describe the purpose of this PR. -->
## Related issues
#2498
## Does this PR introduce any user-facing change?
<!--
If any user-facing interface changes, please [open an
issue](https://github.com/apache/fory/issues/new/choose) describing the
need to do so and update the document if necessary.
-->
- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?
## Benchmark
<!--
When the PR has an impact on performance (if you don't know whether the
PR will have an impact on performance, you can submit the PR first, and
if it will have impact on performance, the code reviewer will explain
it), be sure to attach a benchmark data here.
-->
---
python/README.md | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 75 insertions(+), 1 deletion(-)
diff --git a/python/README.md b/python/README.md
index fb29c7dc5..963c8409a 100644
--- a/python/README.md
+++ b/python/README.md
@@ -46,7 +46,7 @@ print(fory.deserialize(data))
### Cross-language Serialization
-Fory excels at cross-language serialization. You can serialize data in Python
and deserialize it in another language like Java or Go, and vice-versa.
+Apache Fory excels at cross-language serialization. You can serialize data in
Python and deserialize it in another language like Java or Go, and vice-versa.
Here's an example of how to serialize an object in Python and deserialize it
in Java:
@@ -95,6 +95,80 @@ public class ReferenceExample {
}
```
+### Row Format Zero-Copy Partial Serialization
+
+Apache Fory provides a random-access row format that maps a typed nested
struct into a binary and reads its nested elements without deserializing the
whole binary. This minimizes deserialization overhead for huge objects when you
only need to access part of the data. You can also encode huge objects into a
binary, write it to a file, and then mmap that file into memory to reduce
memory overhead.
+
+**Python**
+
+```python
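+from dataclasses import dataclass
+from typing import Dict, List
+
+# `pa` is assumed to be pyarrow, which supplies the field type markers.
+import pyarrow as pa
+import pyfory
+
+
+@dataclass
+class Bar:
+    f1: str
+    f2: List[pa.int64]
+
+
+@dataclass
+class Foo:
+    f1: pa.int32
+    f2: List[pa.int32]
+    f3: Dict[str, pa.int32]
+    f4: List[Bar]
+
+
+# Build a row encoder from the dataclass schema.
+encoder = pyfory.encoder(Foo)
+foo = Foo(f1=10, f2=list(range(1000_000)),
+          f3={f"k{i}": i for i in range(1000_000)},
+          f4=[Bar(f1=f"s{i}", f2=list(range(10))) for i in range(1000_000)])
+# Encode the whole object into the row format once.
+binary: bytes = encoder.to_row(foo).to_bytes()
+# Wrap the binary and read nested elements without deserializing everything.
+foo_row = pyfory.RowData(encoder.schema, binary)
+print(foo_row.f2[100000], foo_row.f4[100000].f1, foo_row.f4[200000].f2[5])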
+```
+
+**Java**
+
+```java
+public class Bar {
+  String f1;
+  List<Long> f2;
+}
+
+public class Foo {
+  int f1;
+  List<Integer> f2;
+  Map<String, Integer> f3;
+  List<Bar> f4;
+}
+
+RowEncoder<Foo> encoder = Encoders.bean(Foo.class);
+Foo foo = new Foo();
+foo.f1 = 10;
+foo.f2 = IntStream.range(0, 1000000).boxed().collect(Collectors.toList());
+foo.f3 = IntStream.range(0, 1000000).boxed().collect(Collectors.toMap(i -> "k" + i, i -> i));
+List<Bar> bars = new ArrayList<>(1000000);
+for (int i = 0; i < 1000000; i++) {
+  Bar bar = new Bar();
+  bar.f1 = "s" + i;
+  bar.f2 = LongStream.range(0, 10).boxed().collect(Collectors.toList());
+  bars.add(bar);
+}
+foo.f4 = bars;
+// This row can be read zero-copy from Python.
+BinaryRow binaryRow = encoder.toRow(foo);
+// The binary row may also be data produced by Python.
+Foo newFoo = encoder.fromRow(binaryRow);
+// Zero-copy read of `List<Integer> f2`.
+BinaryArray binaryArray2 = binaryRow.getArray(1);
+// Zero-copy read of `List<Bar> f4`.
+BinaryArray binaryArray4 = binaryRow.getArray(3);
+// Zero-copy read of the 11th element of `List<Bar> f4`.
+BinaryRow barStruct = binaryArray4.getStruct(10);
+
+// Zero-copy read of the 6th element of `f2` in the 11th element of `List<Bar> f4`.
+barStruct.getArray(1).getInt64(5);
+RowEncoder<Bar> barEncoder = Encoders.bean(Bar.class);
+// Deserialize only part of the data.
+Bar newBar = barEncoder.fromRow(barStruct);
+Bar newBar2 = barEncoder.fromRow(binaryArray4.getStruct(20));
+```
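
The random-access and mmap ideas described above can be illustrated without Fory at all. The following is a minimal standard-library sketch, not Fory's actual row layout or API: it assumes a toy binary format (an 8-byte count followed by fixed-width int64 values), and the helper names `encode_int64_array`, `read_int64_at`, and `demo` are hypothetical. It shows how a fixed-width layout lets a reader compute an element's offset and decode only that element from a memory-mapped file.

```python
import mmap
import os
import struct
import tempfile


def encode_int64_array(values):
    """Toy layout (not Fory's): little-endian int64 count, then int64 elements."""
    return struct.pack("<q", len(values)) + struct.pack(f"<{len(values)}q", *values)


def read_int64_at(buf, index):
    """Decode a single element by offset arithmetic; other elements stay untouched."""
    (count,) = struct.unpack_from("<q", buf, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    # Skip the 8-byte header, then jump straight to the requested slot.
    (value,) = struct.unpack_from("<q", buf, 8 + 8 * index)
    return value


def demo():
    """Write the toy binary to a file, mmap it, and read one element."""
    values = list(range(0, 2_000_000, 2))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_row.bin")
        with open(path, "wb") as f:
            f.write(encode_int64_array(values))
        # The mapping is read-only; pages are loaded lazily by the OS on access.
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return read_int64_at(mapped, 100_000)


if __name__ == "__main__":
    print(demo())
```

Because every element has the same width, the read cost does not depend on the size of the array, and the operating system only needs to page in the part of the file that is actually touched. A real row format additionally has to handle variable-width fields such as strings and nested structs, which this sketch deliberately leaves out.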
+
## Useful Links
- **[Project Website](https://fory.apache.org)**