This is an automated email from the ASF dual-hosted git repository.

chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fury-site.git
commit 68f34059d26358f8f5abfbfffccd0542b43863db
Author: chaokunyang <[email protected]>
AuthorDate: Sat Aug 17 15:03:08 2024 +0000

    🔄 synced local 'docs/guide/' with remote 'docs/guide/'
---
 docs/guide/DEVELOPMENT.md               |   5 +-
 docs/guide/graalvm_guide.md             |  27 +---
 docs/guide/java_serialization_guide.md  |  10 +-
 docs/guide/row_format_guide.md          |  12 --
 docs/guide/scala_guide.md               |  21 +--
 docs/guide/xlang_serialization_guide.md | 239 ++++++++++++++++----------------
 docs/guide/xlang_type_mapping.md        |   6 +-
 7 files changed, 135 insertions(+), 185 deletions(-)

diff --git a/docs/guide/DEVELOPMENT.md b/docs/guide/DEVELOPMENT.md
index 39d8605..3949dcb 100644
--- a/docs/guide/DEVELOPMENT.md
+++ b/docs/guide/DEVELOPMENT.md
@@ -1,9 +1,11 @@
 ---
-title: How to build Fury
+title: Development
 sidebar_position: 7
 id: development
 ---
+# How to build Fury
+
 Please checkout the source tree from https://github.com/apache/fury.
 ### Build Fury Java
@@ -97,3 +99,4 @@ npm run test
 - node 14+
 - npm 8+
+
diff --git a/docs/guide/graalvm_guide.md b/docs/guide/graalvm_guide.md
index 0333a43..3ed919f 100644
--- a/docs/guide/graalvm_guide.md
+++ b/docs/guide/graalvm_guide.md
@@ -4,6 +4,7 @@ sidebar_position: 6
 id: graalvm_guide
 ---
+# GraalVM Native Image
 GraalVM `native image` can compile java code into native code ahead to build faster, smaller, leaner applications. The native image doesn't have a JIT compiler to compile bytecode into machine code, and doesn't support reflection unless configure reflection file.
@@ -15,7 +16,6 @@ In order to use Fury on graalvm native image, you must create Fury as an **stati
 the enclosing class initialize time. Then configure `native-image.properties` under `resources/META-INF/native-image/$xxx/native-image.propertie` to tell graalvm to init the class at native image build time.
 For example, here we configure `org.apache.fury.graalvm.Example` class be init at build time:
-
 ```properties
 Args = --initialize-at-build-time=org.apache.fury.graalvm.Example
 ```
@@ -29,9 +29,7 @@ Note that Fury `asyncCompilationEnabled` option will be disabled automatically f
 native image doesn't support JIT at the image run time.
 ## Not thread-safe Fury
-
 Example:
-
 ```java
 import org.apache.fury.Fury;
 import org.apache.fury.util.Preconditions;
@@ -65,15 +63,12 @@ public class Example {
   }
 }
 ```
-
 Then add `org.apache.fury.graalvm.Example` build time init to `native-image.properties` configuration:
-
 ```properties
 Args = --initialize-at-build-time=org.apache.fury.graalvm.Example
 ```
 ## Thread-safe Fury
-
 ```java
 import org.apache.fury.Fury;
 import org.apache.fury.ThreadLocalFury;
@@ -114,40 +109,32 @@ public class ThreadSafeExample {
   }
 }
 ```
-
 Then add `org.apache.fury.graalvm.ThreadSafeExample` build time init to `native-image.properties` configuration:
-
 ```properties
 Args = --initialize-at-build-time=org.apache.fury.graalvm.ThreadSafeExample
 ```
 ## Framework Integration
-
 For framework developers, if you want to integrate fury for serialization, you can provided a configuration file to let the users to list all the classes they want to serialize, then you can load those classes and invoke `org.apache.fury.Fury.register(Class<?>, boolean)` to register those classes in your Fury integration class, and configure that class be initialized at graalvm native image build time.
 ## Benchmark
-
 Here we give two class benchmarks between Fury and Graalvm Serialization.
 When Fury compression is disabled:
-
 - Struct: Fury is `46x speed, 43% size` compared to JDK.
 - Pojo: Fury is `12x speed, 56% size` compared to JDK.
 When Fury compression is enabled:
-
 - Struct: Fury is `24x speed, 31% size` compared to JDK.
 - Pojo: Fury is `12x speed, 48% size` compared to JDK.
 See [[Benchmark.java](https://github.com/apache/fury/blob/main/integration_tests/graalvm_tests/src/main/java/org/apache/fury/graalvm/Benchmark.java)] for benchmark code.
 ### Struct Benchmark
-
 #### Class Fields
-
 ```java
 public class Struct implements Serializable {
   public int f1;
@@ -164,11 +151,8 @@ public class Struct implements Serializable {
   public double f12;
 }
 ```
-
 #### Benchmark Results
-
 No compression:
-
 ```
 Benchmark repeat number: 400000
 Object type: class org.apache.fury.graalvm.Struct
@@ -180,9 +164,7 @@ JDK serialization took mills: 2254
 Compare speed: Fury is 45.70x speed of JDK
 Compare size: Fury is 0.43x size of JDK
 ```
-
 Compress number:
-
 ```
 Benchmark repeat number: 400000
 Object type: class org.apache.fury.graalvm.Struct
@@ -196,9 +178,7 @@ Compare size: Fury is 0.31x size of JDK
 ```
 ### Pojo Benchmark
-
 #### Class Fields
-
 ```java
 public class Foo implements Serializable {
   int f1;
@@ -207,11 +187,8 @@ public class Foo implements Serializable {
   Map<String, Long> f4;
 }
 ```
-
 #### Benchmark Results
-
 No compression:
-
 ```
 Benchmark repeat number: 400000
 Object type: class org.apache.fury.graalvm.Foo
@@ -223,9 +200,7 @@ JDK serialization took mills: 16266
 Compare speed: Fury is 12.19x speed of JDK
 Compare size: Fury is 0.56x size of JDK
 ```
-
 Compress number:
-
 ```
 Benchmark repeat number: 400000
 Object type: class org.apache.fury.graalvm.Foo
diff --git a/docs/guide/java_serialization_guide.md b/docs/guide/java_serialization_guide.md
index e29b0b2..de179f4 100644
--- a/docs/guide/java_serialization_guide.md
+++ b/docs/guide/java_serialization_guide.md
@@ -4,6 +4,8 @@ sidebar_position: 0
 id: java_object_graph_guide
 ---
+# Java object graph serialization
+
 When only java object serialization needed, this mode will have better performance compared to cross-language object graph serialization.
@@ -177,12 +179,12 @@ bit is set, then next byte will be read util first bit of next byte is unset.
 For long compression, fury support two encoding:
 - Fury SLI(Small long as int) Encoding (**used by default**):
-  - If long is in [-1073741824, 1073741823], encode as 4 bytes int: `| little-endian: ((int) value) << 1 |`
-  - Otherwise write as 9 bytes: `| 0b1 | little-endian 8bytes long |`
+    - If long is in [-1073741824, 1073741823], encode as 4 bytes int: `| little-endian: ((int) value) << 1 |`
+    - Otherwise write as 9 bytes: `| 0b1 | little-endian 8bytes long |`
 - Fury PVL(Progressive Variable-length Long) Encoding:
-  - First bit in every byte indicate whether has next byte. if first bit is set, then next byte will be read util
+    - First bit in every byte indicate whether has next byte. if first bit is set, then next byte will be read util
     first bit of next byte is unset.
-  - Negative number will be converted to positive number by `(v << 1) ^ (v >> 63)` to reduce cost of small negative
+    - Negative number will be converted to positive number by ` (v << 1) ^ (v >> 63)` to reduce cost of small negative
     numbers.
 If a number are `long` type, it can't be represented by smaller bytes mostly, the compression won't get good enough
diff --git a/docs/guide/row_format_guide.md b/docs/guide/row_format_guide.md
index 1083297..076a2c0 100644
--- a/docs/guide/row_format_guide.md
+++ b/docs/guide/row_format_guide.md
@@ -5,9 +5,7 @@ id: row_format_guide
 ---
 ## Row format protocol
-
 ### Java
-
 ```java
 public class Bar {
   String f1;
@@ -52,9 +50,7 @@ RowEncoder<Bar> barEncoder = Encoders.bean(Bar.class);
 Bar newBar = barEncoder.fromRow(barStruct);
 Bar newBar2 = barEncoder.fromRow(binaryArray4.getStruct(20));
 ```
-
 ### Python
-
 ```python
 @dataclass
 class Bar:
@@ -83,13 +79,10 @@ new_foo = pickle.loads(binary)
 print(new_foo.f2[100000], new_foo.f4[100000].f1, new_foo.f4[200000].f2[5])
 print(f"pickle end: {datetime.datetime.now()}")
 ```
-
 ### Apache Arrow Support
-
 Fury Format also supports automatic conversion from/to Arrow Table/RecordBatch.
 Java:
-
 ```java
 Schema schema = TypeInference.inferSchema(BeanA.class);
 ArrowWriter arrowWriter = ArrowUtils.createArrowWriter(schema);
@@ -100,18 +93,14 @@ for (int i = 0; i < 10; i++) {
 }
 return arrowWriter.finishAsRecordBatch();
 ```
-
 Python:
-
 ```python
 import pyfury
 encoder = pyfury.encoder(Foo)
 encoder.to_arrow_record_batch([foo] * 10000)
 encoder.to_arrow_table([foo] * 10000)
 ```
-
 C++
-
 ```c++
 std::shared_ptr<ArrowWriter> arrow_writer;
 EXPECT_TRUE(
@@ -126,7 +115,6 @@ EXPECT_TRUE(record_batch->Validate().ok());
 EXPECT_EQ(record_batch->num_columns(), schema->num_fields());
 EXPECT_EQ(record_batch->num_rows(), row_nums);
 ```
-
 ```java
 Schema schema = TypeInference.inferSchema(BeanA.class);
 ArrowWriter arrowWriter = ArrowUtils.createArrowWriter(schema);
diff --git a/docs/guide/scala_guide.md b/docs/guide/scala_guide.md
index 4de2f09..aa8f99b 100644
--- a/docs/guide/scala_guide.md
+++ b/docs/guide/scala_guide.md
@@ -4,8 +4,8 @@ sidebar_position: 4
 id: scala_guide
 ---
+# Scala serialization
 Fury supports all scala object serialization:
-
 - `case` class serialization supported
 - `pojo/bean` class serialization supported
 - `object` singleton serialization supported
@@ -15,15 +15,12 @@ Fury supports all scala object serialization:
 Scala 2 and 3 are both supported.
 ## Install
-
 ```sbt
 libraryDependencies += "org.apache.fury" % "fury-core" % "0.7.0"
 ```
 ## Fury creation
-
 When using fury for scala serialization, you should create fury at least with following options:
-
 ```scala
 val fury = Fury.builder()
   .withScalaOptimizationEnabled(true)
@@ -31,14 +28,11 @@ val fury = Fury.builder()
   .withRefTracking(true)
   .build()
 ```
-
 Depending on the object types you serialize, you may need to register some scala internal types:
-
 ```scala
 fury.register(Class.forName("scala.collection.generic.DefaultSerializationProxy"))
 fury.register(Class.forName("scala.Enumeration.Val"))
 ```
-
 If you want to avoid such registration, you can disable class registration by `FuryBuilder#requireClassRegistration(false)`.
 Note that this option allow to deserialize objects unknown types, more flexible but may be insecure if the classes contains malicious code.
@@ -49,7 +43,6 @@ Note that fury instance should be shared between multiple serialization, the cre
 If you use shared fury instance across multiple threads, you should create `ThreadSafeFury` instead by `FuryBuilder#buildThreadSafeFury()` instead.
 ## Serialize case object
-
 ```scala
 case class Person(github: String, age: Int, id: Long)
 val p = Person("https://github.com/chaokunyang", 18, 1)
@@ -58,7 +51,6 @@ println(fury.deserializeJavaObject(fury.serializeJavaObject(p)))
 ```
 ## Serialize pojo
-
 ```scala
 class Foo(f1: Int, f2: String) {
   override def toString: String = s"Foo($f1, $f2)"
@@ -67,7 +59,6 @@ println(fury.deserialize(fury.serialize(Foo(1, "chaokunyang"))))
 ```
 ## Serialize object singleton
-
 ```scala
 object singleton {
 }
@@ -77,7 +68,6 @@ println(o1 == o2)
 ```
 ## Serialize collection
-
 ```scala
 val seq = Seq(1,2)
 val list = List("a", "b")
@@ -88,7 +78,6 @@ println(fury.deserialize(fury.serialize(map)))
 ```
 ## Serialize Tuple
-
 ```scala
 val tuple = Tuple2(100, 10000L)
 println(fury.deserialize(fury.serialize(tuple)))
@@ -97,16 +86,12 @@ println(fury.deserialize(fury.serialize(tuple)))
 ```
 ## Serialize Enum
-
 ### Scala3 Enum
-
 ```scala
 enum Color { case Red, Green, Blue }
 println(fury.deserialize(fury.serialize(Color.Green)))
 ```
-
 ### Scala2 Enum
-
 ```scala
 object ColorEnum extends Enumeration {
   type ColorEnum = Value
@@ -116,7 +101,6 @@ println(fury.deserialize(fury.serialize(ColorEnum.Green)))
 ```
 ## Serialize Option
-
 ```scala
 val opt: Option[Long] = Some(100)
 println(fury.deserialize(fury.serialize(opt)))
@@ -124,8 +108,7 @@ val opt1: Option[Long] = None
 println(fury.deserialize(fury.serialize(opt1)))
 ```
-## Performance
-
+# Performance
 Scala `pojo/bean/case/object` are supported by fury jit well, the performance is as good as fury java.
 Scala collections and generics doesn't follow java collection framework, and is not fully integrated with Fury JIT in current release version. The performance won't be as good as fury collections serialization for java.
diff --git a/docs/guide/xlang_serialization_guide.md b/docs/guide/xlang_serialization_guide.md
index 5bc6a05..a68348e 100644
--- a/docs/guide/xlang_serialization_guide.md
+++ b/docs/guide/xlang_serialization_guide.md
@@ -7,9 +7,9 @@ id: xlang_object_graph_guide
 ## Cross-language object graph serialization
 ### Serialize built-in types
-
 Common types can be serialized automatically: primitive numeric types, string, binary, array, list, map and so on.
+
 **Java**
 ```java
@@ -64,32 +64,32 @@ import furygo "github.com/apache/fury/fury/go/fury"
 import "fmt"
 func main() {
-    list := []interface{}{true, false, "str", -1.1, 1, make([]int32, 10), make([]float64, 20)}
-    fury := furygo.NewFury()
-    bytes, err := fury.Marshal(list)
-    if err != nil {
-        panic(err)
-    }
-    var newValue interface{}
-    // bytes can be data serialized by other languages.
-    if err := fury.Unmarshal(bytes, &newValue); err != nil {
-        panic(err)
-    }
-    fmt.Println(newValue)
-    dict := map[string]interface{}{
-        "k1": "v1",
-        "k2": list,
-        "k3": -1,
-    }
-    bytes, err = fury.Marshal(dict)
-    if err != nil {
-        panic(err)
-    }
-    // bytes can be data serialized by other languages.
-    if err := fury.Unmarshal(bytes, &newValue); err != nil {
-        panic(err)
-    }
-    fmt.Println(newValue)
+    list := []interface{}{true, false, "str", -1.1, 1, make([]int32, 10), make([]float64, 20)}
+    fury := furygo.NewFury()
+    bytes, err := fury.Marshal(list)
+    if err != nil {
+        panic(err)
+    }
+    var newValue interface{}
+    // bytes can be data serialized by other languages.
+    if err := fury.Unmarshal(bytes, &newValue); err != nil {
+        panic(err)
+    }
+    fmt.Println(newValue)
+    dict := map[string]interface{}{
+        "k1": "v1",
+        "k2": list,
+        "k3": -1,
+    }
+    bytes, err = fury.Marshal(dict)
+    if err != nil {
+        panic(err)
+    }
+    // bytes can be data serialized by other languages.
+    if err := fury.Unmarshal(bytes, &newValue); err != nil {
+        panic(err)
+    }
+    fmt.Println(newValue)
 }
 ```
@@ -126,7 +126,6 @@ fn run() {
 ```
 ### Serialize custom types
-
 Serializing user-defined types needs registering the custom type using the register API to establish the mapping relationship between the type in different languages.
 **Java**
@@ -256,59 +255,59 @@ import furygo "github.com/apache/fury/fury/go/fury"
 import "fmt"
 func main() {
-    type SomeClass1 struct {
-        F1 interface{}
-        F2 string
-        F3 []interface{}
-        F4 map[int8]int32
-        F5 int8
-        F6 int16
-        F7 int32
-        F8 int64
-        F9 float32
-        F10 float64
-        F11 []int16
-        F12 fury.Int16Slice
-    }
-
-    type SomeClas2 struct {
-        F1 interface{}
-        F2 map[int8]int32
-    }
-    fury := furygo.NewFury()
-    if err := fury.RegisterTagType("example.SomeClass1", SomeClass1{}); err != nil {
-        panic(err)
-    }
-    if err := fury.RegisterTagType("example.SomeClass2", SomeClass2{}); err != nil {
-        panic(err)
-    }
-    obj1 := &SomeClass1{}
-    obj1.F1 = true
-    obj1.F2 = map[int8]int32{-1: 2}
-    obj := &SomeClass1{}
-    obj.F1 = obj1
-    obj.F2 = "abc"
-    obj.F3 = []interface{}{"abc", "abc"}
-    f4 := map[int8]int32{1: 2}
-    obj.F4 = f4
-    obj.F5 = fury.MaxInt8
-    obj.F6 = fury.MaxInt16
-    obj.F7 = fury.MaxInt32
-    obj.F8 = fury.MaxInt64
-    obj.F9 = 1.0 / 2
-    obj.F10 = 1 / 3.0
-    obj.F11 = []int16{1, 2}
-    obj.F12 = []int16{-1, 4}
-    bytes, err := fury.Marshal(obj);
-    if err != nil {
-        panic(err)
-    }
-    var newValue interface{}
-    // bytes can be data serialized by other languages.
-    if err := fury.Unmarshal(bytes, &newValue); err != nil {
-        panic(err)
-    }
-    fmt.Println(newValue)
+    type SomeClass1 struct {
+        F1 interface{}
+        F2 string
+        F3 []interface{}
+        F4 map[int8]int32
+        F5 int8
+        F6 int16
+        F7 int32
+        F8 int64
+        F9 float32
+        F10 float64
+        F11 []int16
+        F12 fury.Int16Slice
+    }
+
+    type SomeClas2 struct {
+        F1 interface{}
+        F2 map[int8]int32
+    }
+    fury := furygo.NewFury()
+    if err := fury.RegisterTagType("example.SomeClass1", SomeClass1{}); err != nil {
+        panic(err)
+    }
+    if err := fury.RegisterTagType("example.SomeClass2", SomeClass2{}); err != nil {
+        panic(err)
+    }
+    obj1 := &SomeClass1{}
+    obj1.F1 = true
+    obj1.F2 = map[int8]int32{-1: 2}
+    obj := &SomeClass1{}
+    obj.F1 = obj1
+    obj.F2 = "abc"
+    obj.F3 = []interface{}{"abc", "abc"}
+    f4 := map[int8]int32{1: 2}
+    obj.F4 = f4
+    obj.F5 = fury.MaxInt8
+    obj.F6 = fury.MaxInt16
+    obj.F7 = fury.MaxInt32
+    obj.F8 = fury.MaxInt64
+    obj.F9 = 1.0 / 2
+    obj.F10 = 1 / 3.0
+    obj.F11 = []int16{1, 2}
+    obj.F12 = []int16{-1, 4}
+    bytes, err := fury.Marshal(obj);
+    if err != nil {
+        panic(err)
+    }
+    var newValue interface{}
+    // bytes can be data serialized by other languages.
+    if err := fury.Unmarshal(bytes, &newValue); err != nil {
+        panic(err)
+    }
+    fmt.Println(newValue)
 }
 ```
@@ -395,7 +394,6 @@ fn complex_struct() {
 ```
 ### Serialize Shared Reference and Circular Reference
-
 Shared reference and circular reference can be serialized automatically, no duplicate data or recursion error.
 **Java**
@@ -462,27 +460,27 @@ import furygo "github.com/apache/fury/fury/go/fury"
 import "fmt"
 func main() {
-    type SomeClass struct {
-        F1 *SomeClass
-        F2 map[string]string
-        F3 map[string]string
-    }
-    fury := furygo.NewFury(true)
-    if err := fury.RegisterTagType("example.SomeClass", SomeClass{}); err != nil {
-        panic(err)
-    }
-    value := &SomeClass{F2: map[string]string{"k1": "v1", "k2": "v2"}}
-    value.F3 = value.F2
-    value.F1 = value
-    bytes, err := fury.Marshal(value)
-    if err != nil {
-    }
-    var newValue interface{}
-    // bytes can be data serialized by other languages.
-    if err := fury.Unmarshal(bytes, &newValue); err != nil {
-        panic(err)
-    }
-    fmt.Println(newValue)
+    type SomeClass struct {
+        F1 *SomeClass
+        F2 map[string]string
+        F3 map[string]string
+    }
+    fury := furygo.NewFury(true)
+    if err := fury.RegisterTagType("example.SomeClass", SomeClass{}); err != nil {
+        panic(err)
+    }
+    value := &SomeClass{F2: map[string]string{"k1": "v1", "k2": "v2"}}
+    value.F3 = value.F2
+    value.F1 = value
+    bytes, err := fury.Marshal(value)
+    if err != nil {
+    }
+    var newValue interface{}
+    // bytes can be data serialized by other languages.
+    if err := fury.Unmarshal(bytes, &newValue); err != nil {
+        panic(err)
+    }
+    fmt.Println(newValue)
 }
 ```
@@ -516,6 +514,7 @@ console.log(result.bar.foo === result.foo);
 **JavaScript**
 Reference cannot be implemented because of rust ownership restrictions
+
 ### Zero-Copy Serialization
 **Java**
@@ -570,23 +569,23 @@ import furygo "github.com/apache/fury/fury/go/fury"
 import "fmt"
 func main() {
-    fury := furygo.NewFury()
-    list := []interface{}{"str", make([]byte, 1000)}
-    buf := fury.NewByteBuffer(nil)
-    var bufferObjects []fury.BufferObject
-    fury.Serialize(buf, list, func(o fury.BufferObject) bool {
-        bufferObjects = append(bufferObjects, o)
-        return false
-    })
-    var newList []interface{}
-    var buffers []*fury.ByteBuffer
-    for _, o := range bufferObjects {
-        buffers = append(buffers, o.ToBuffer())
-    }
-    if err := fury.Deserialize(buf, &newList, buffers); err != nil {
-        panic(err)
-    }
-    fmt.Println(newList)
+    fury := furygo.NewFury()
+    list := []interface{}{"str", make([]byte, 1000)}
+    buf := fury.NewByteBuffer(nil)
+    var bufferObjects []fury.BufferObject
+    fury.Serialize(buf, list, func(o fury.BufferObject) bool {
+        bufferObjects = append(bufferObjects, o)
+        return false
+    })
+    var newList []interface{}
+    var buffers []*fury.ByteBuffer
+    for _, o := range bufferObjects {
+        buffers = append(buffers, o.ToBuffer())
+    }
+    if err := fury.Deserialize(buf, &newList, buffers); err != nil {
+        panic(err)
+    }
+    fmt.Println(newList)
 }
 ```
diff --git a/docs/guide/xlang_type_mapping.md b/docs/guide/xlang_type_mapping.md
index 7be6900..4baa455 100644
--- a/docs/guide/xlang_type_mapping.md
+++ b/docs/guide/xlang_type_mapping.md
@@ -10,6 +10,8 @@ Note:
 - `int16_t[n]/vector<T>` indicates `int16_t[n]/vector<int16_t>`
 - The cross-language serialization is not stable, do not use it in your production environment.
+
+# Type Mapping
+
 | Fury Type          | Fury Type ID | Java            | Python               | Javascript      | C++                            | Golang           | Rust             |
 |--------------------|--------------|-----------------|----------------------|-----------------|--------------------------------|------------------|------------------|
 | bool               | 1            | bool/Boolean    | bool                 | Boolean         | bool                           | bool             | bool             |
@@ -46,7 +48,7 @@ Note:
 | arrow record batch | 32           | /               | /                    | /               | /                              | /                | /                |
 | arrow table        | 33           | /               | /                    | /               | /                              | /                | /                |
-## Type info(not implemented currently)
+# Type info(not implemented currently)
 Due to differences between type systems of languages, those types can't be mapped one-to-one between languages.
@@ -68,7 +70,6 @@ Such information can be provided in other languages too:
 Here is en example:
 - Java:
-
   ```java
   class Foo {
     @Int32Type(varint = true)
@@ -76,7 +77,6 @@ Here is en example:
     List<@Int32Type(varint = true) Integer> f2;
   }
   ```
-
 - Python:
   ```python

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
