[jira] [Updated] (PHOENIX-7330) Introducing Binary JSON (BSON) with Complex Document structures in Phoenix

Viraj Jasani (Jira) Wed, 12 Jun 2024 19:57:23 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Viraj Jasani updated PHOENIX-7330:
----------------------------------
    Description: 
The purpose of this Jira is to introduce a new data type in Phoenix, Binary
JSON (BSON), to manage more complex document data structures.

BSON, or Binary JSON, is a binary-encoded serialization of JSON-like
documents. The BSON data type lets users store, update, and query part or all
of a BsonDocument efficiently, without having to serialize/deserialize the
whole document to/from binary format. BSON allows deserializing only part of
a nested document, so querying or indexing attributes within the nested
structure becomes more efficient: only the required portion is deserialized
at runtime. Other document structures would require deserializing the entire
binary into a document before performing the query.
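
As a rough illustration of the partial deserialization described above, the
sketch below walks a BSON byte buffer and uses the length prefixes defined by
the BSON spec to skip elements it does not need. It is not Phoenix code: the
helper names (find_field, build_doc, cstr) and the sample document are
assumptions made for this example, and only a handful of BSON types are
handled.

```python
import struct


def _value_size(type_byte, buf, pos):
    """Return the size in bytes of the BSON value starting at buf[pos]."""
    if type_byte in (0x01, 0x12):   # double, int64
        return 8
    if type_byte == 0x10:           # int32
        return 4
    if type_byte == 0x08:           # boolean
        return 1
    if type_byte == 0x02:           # string: int32 length prefix + bytes incl. NUL
        return 4 + struct.unpack_from("<i", buf, pos)[0]
    if type_byte in (0x03, 0x04):   # document / array: length includes itself
        return struct.unpack_from("<i", buf, pos)[0]
    raise ValueError(f"unsupported BSON type 0x{type_byte:02x}")


def find_field(buf, path, start=0):
    """Locate a nested field; return (type_byte, value_offset) or None.

    Elements that do not match the path are skipped by size, never decoded.
    """
    doc_len = struct.unpack_from("<i", buf, start)[0]
    pos = start + 4
    end = start + doc_len - 1       # index of the document's trailing NUL
    while pos < end:
        type_byte = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)
        name = buf[pos + 1:name_end].decode("utf-8")
        value_pos = name_end + 1
        if name == path[0]:
            if len(path) == 1:
                return type_byte, value_pos
            if type_byte != 0x03:   # can only descend into embedded documents
                return None
            return find_field(buf, path[1:], value_pos)
        pos = value_pos + _value_size(type_byte, buf, value_pos)
    return None


# Hypothetical helpers that hand-build a small document for the demo.
def cstr(s):
    return s.encode("utf-8") + b"\x00"


def build_doc(*elements):
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


# Equivalent JSON: {"name": "abc", "stats": {"views": 42, "score": 1.5}}
inner = build_doc(
    b"\x10" + cstr("views") + struct.pack("<i", 42),
    b"\x01" + cstr("score") + struct.pack("<d", 1.5),
)
outer = build_doc(
    b"\x02" + cstr("name") + struct.pack("<i", 4) + cstr("abc"),
    b"\x03" + cstr("stats") + inner,
)

found = find_field(outer, ["stats", "score"])
score = struct.unpack_from("<d", outer, found[1])[0]   # decode only this value
```

Here the "name" string and the "views" integer are stepped over by size
alone; only the 8 bytes of "score" are decoded.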

BSONSpec: [https://bsonspec.org/]

JSON and BSON are closely related by design. BSON serves as a binary 
representation of JSON data, tailored with specialized extensions for wider 
application scenarios, and finely tuned for efficient data storage and 
traversal. Similar to JSON, BSON facilitates the embedding of objects and 
arrays.

One way in which BSON differs from JSON is its support for more advanced
data types. For instance, JSON does not differentiate between integers (whole
numbers) and floating-point numbers (with a fractional part). BSON
distinguishes between the two and stores each in the corresponding BSON data
type (e.g. BsonInt32 vs BsonDouble). Many server-side programming languages
offer distinct primitive data types (commonly integer, single-precision
floating point i.e. “float”, double-precision floating point i.e. “double”,
and boolean), each with its own optimal usage for efficient mathematical
operations.
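
To make the integer versus floating-point distinction concrete, here is a
minimal sketch of how a single numeric element is laid out under the BSON
spec (type tag, NUL-terminated key, little-endian value). The function name
encode_number is an assumption for this example and is not part of Phoenix
or of any BSON library.

```python
import struct


def encode_number(name, value):
    """Encode one BSON element for a Python int or float."""
    if isinstance(value, bool):
        raise TypeError("booleans use a separate BSON type (0x08)")
    key = name.encode("utf-8") + b"\x00"
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)   # int32
        return b"\x12" + key + struct.pack("<q", value)       # int64
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)       # double
    raise TypeError(f"not a number: {value!r}")


as_int = encode_number("n", 7)       # tag 0x10, 4-byte payload
as_double = encode_number("n", 7.0)  # tag 0x01, 8-byte payload
```

The same numeric value therefore has two different encodings, and a reader
can tell them apart from the leading type tag without inspecting the payload.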

Another key distinction between BSON and JSON is that BSON documents can
include Date or Binary objects, which cannot be directly represented in pure
JSON format. BSON also provides the ability to store and retrieve
user-defined Binary objects. Likewise, by integrating advanced data
structures like Sets into BSON documents, we can significantly enhance the
capabilities of Phoenix for storing, retrieving, and updating Binary, Sets,
Lists, and Documents as nested or complex data types.

Moreover, the JSON format is both human and machine readable, whereas the
BSON format is only machine readable. Hence, as part of introducing the BSON
data type, we also need to provide a user interface that lets users supply
human-readable JSON as input for the BSON data type.
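
The idea of accepting human-readable JSON and storing it as BSON can be
sketched as follows. This is a minimal, illustrative encoder covering only
the JSON-native types; the names json_to_bson and encode_document are
assumptions for this example and say nothing about the interface Phoenix
will actually expose.

```python
import json
import struct


def _cstring(text):
    return text.encode("utf-8") + b"\x00"


def _encode_element(name, value):
    key = _cstring(name)
    if isinstance(value, bool):           # check bool first: bool subclasses int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)
        return b"\x12" + key + struct.pack("<q", value)
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)
    if isinstance(value, str):
        data = _cstring(value)
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)
    if isinstance(value, list):           # arrays are documents keyed "0", "1", ...
        indexed = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + key + encode_document(indexed)
    if value is None:
        return b"\x0a" + key
    raise TypeError(f"unsupported value: {value!r}")


def encode_document(doc):
    """Encode a dict as a BSON document: int32 length, elements, trailing NUL."""
    body = b"".join(_encode_element(k, v) for k, v in doc.items()) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


def json_to_bson(text):
    """Parse JSON text and return the equivalent BSON bytes."""
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("top-level JSON value must be an object")
    return encode_document(parsed)


encoded = json_to_bson('{"hello": "world"}')
```

For the input above, the result matches the `{"hello": "world"}` example
given on bsonspec.org.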

This Jira also introduces access and update functions for BSON documents.

BSON_CONDITION_EXPRESSION can evaluate a condition expression on the document
fields, similar to how a WHERE clause evaluates a condition expression on
various columns of the given row(s) for relational tables.

BSON_UPDATE_EXPRESSION can perform one or more document field updates, similar
to how UPSERT statements can update one or more columns of the given row(s)
for relational tables.
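
The semantics of the two functions described above can be pictured with a
small conceptual sketch over plain Python dictionaries. It does not use or
reproduce the Phoenix syntax for BSON_CONDITION_EXPRESSION or
BSON_UPDATE_EXPRESSION, which this description does not specify; get_path,
set_path, conditional_update, and the dotted-path convention are all
assumptions made for illustration.

```python
def get_path(doc, path):
    """Return the value at a dotted path, or None if any segment is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(doc, path, value):
    """Set the value at a dotted path, creating intermediate documents."""
    keys = path.split(".")
    current = doc
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value


def conditional_update(doc, condition, updates):
    """Apply every update only if the condition holds; report whether it did."""
    if not condition(doc):
        return False
    for path, value in updates.items():
        set_path(doc, path, value)
    return True


order = {"status": "open", "item": {"qty": 2}}

# Condition on a nested field, then update two fields in one step.
applied = conditional_update(
    order,
    lambda d: get_path(d, "item.qty") == 2,
    {"status": "shipped", "item.qty": 0},
)

# A condition that no longer holds leaves the document untouched.
skipped = conditional_update(
    order,
    lambda d: get_path(d, "status") == "open",
    {"status": "cancelled"},
)
```

The condition plays the role a WHERE clause plays for columns, and the update
map plays the role of the column assignments in an UPSERT.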

Phoenix can introduce more complex data structures like sets of scalar types, 
in addition to the nested documents and nested arrays provided by BSON.

Overall, by combining BSON with functionality already available in Phoenix,
such as secondary indexes, conditional updates, and high-throughput
reads/writes, we can evolve Phoenix into a highly scalable document database.

  was:
The purpose of this Jira is to introduce new data type in Phoenix: Binary JSON 
(BSON) to manage more complex document data structures in Phoenix.

BSON or Binary JSON is a Binary-Encoded serialization of JSON-like documents. 
BSON data type is specifically used for users to store, update and query part 
or whole of the BsonDocument in the most performant way without having to 
serialize/deserialize the document to/from binary format. Bson allows 
deserializing only part of the nested documents such that querying or indexing 
any attributes within the nested structure becomes more efficient and 
performant as the deserialization happens at runtime. Any other document 
structure would require deserializing the binary into the document, and then 
perform the query.

BSONSpec: [https://bsonspec.org/]

JSON and BSON are closely related by design. BSON serves as a binary 
representation of JSON data, tailored with specialized extensions for wider 
application scenarios, and finely tuned for efficient data storage and 
traversal. Similar to JSON, BSON facilitates the embedding of objects and 
arrays.

 

One particular way in which BSON differs from JSON is in its support for some 
more advanced data types. For instance, JSON does not differentiate between 
integers (round numbers), and floating-point numbers (with decimal precision). 
BSON does distinguish between the two and store them in the corresponding BSON 
data type (e.g. BsonInt32 vs BsonDouble). Many server-side programming 
languages offer advanced numeric data types (standards include integer, regular 
precision floating point number i.e. “float”, double-precision floating point 
i.e. “double”, and boolean values), each with its own optimal usage for 
efficient mathematical operations.

Another key distinction between BSON and JSON is that BSON documents have the 
capability to include Date or Binary objects, which cannot be directly 
represented in pure JSON format. BSON also provides the ability to store and 
retrieve user defined Binary objects. Likewise, by integrating advanced data 
structures like Sets into BSON documents, we can significantly enhance the 
capabilities of Phoenix for storing, retrieving, and updating Binary, Sets, 
Lists, and Documents as nested or complex data types.

Moreover, JSON format is human as well as machine readable, whereas BSON format 
is only machine readable. Hence, as part of introducing BSON data type, we also 
need to provide a user interface such that users can provide human readable 
JSON as input for BSON datatype.

This Jira also introduces access and update functions for BSON documents.

BSON_CONDITION_EXPRESSION can evaluate condition expression on the document 
fields, similar to how WHERE clause evaluates condition expression on various 
columns of the given row(s) for the relational tables.

BSON_UPDATE_EXPRESSION can perform one or more document field updates similar 
to how UPSERT statements can perform update to one or more columns of the given 
row(s) for the relational tables.

Overall, by combining various functionalities available in Phoenix like 
secondary indexes, conditional updates, high throughput read/write with BSON, 
we can evolve Phoenix into highly scalable Document Database.


> Introducing Binary JSON (BSON) with Complex Document structures in Phoenix
> --------------------------------------------------------------------------
>
>                 Key: PHOENIX-7330
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7330
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
