[
https://issues.apache.org/jira/browse/PHOENIX-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani updated PHOENIX-7330:
----------------------------------
Description:
The purpose of this Jira is to introduce a new data type in Phoenix, Binary JSON
(BSON), to manage more complex document data structures.
BSON, or Binary JSON, is a binary-encoded serialization of JSON-like documents.
The BSON data type lets users store, update, and query part or all of a
BsonDocument efficiently, without having to serialize/deserialize the whole
document to/from binary format. BSON allows deserializing only part of a nested
document, so querying or indexing attributes within the nested structure
becomes more efficient because deserialization happens at runtime. Any other
document structure would require deserializing the binary into the document and
then performing the query.
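As a rough illustration of partial deserialization, the sketch below walks raw
BSON bytes and uses the length prefixes defined by the BSON spec to skip over
values it does not need, decoding only the requested nested field. It is a
minimal Python sketch, not Phoenix's implementation: all helper names
(find_field, value_size, and the _doc/_int32/_string/_subdoc builders) are
hypothetical, and only a handful of element types are handled.

```python
import struct

# Tiny builders for sample data (hypothetical helpers, not a full BSON encoder).
def _cstring(name):
    return name.encode("utf-8") + b"\x00"

def _int32(name, value):
    return b"\x10" + _cstring(name) + struct.pack("<i", value)

def _string(name, value):
    raw = value.encode("utf-8") + b"\x00"
    return b"\x02" + _cstring(name) + struct.pack("<i", len(raw)) + raw

def _subdoc(name, doc):
    return b"\x03" + _cstring(name) + doc

def _doc(*elements):
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body

def value_size(type_byte, buf, pos):
    """Return the encoded size of a value so it can be skipped without decoding."""
    if type_byte in (0x01, 0x12):      # double, int64
        return 8
    if type_byte == 0x10:              # int32
        return 4
    if type_byte == 0x08:              # boolean
        return 1
    if type_byte == 0x02:              # string: int32 length prefix + bytes
        return 4 + struct.unpack_from("<i", buf, pos)[0]
    if type_byte in (0x03, 0x04):      # document / array: length includes itself
        return struct.unpack_from("<i", buf, pos)[0]
    raise ValueError(f"unsupported BSON type 0x{type_byte:02x}")

def find_field(buf, path, start=0):
    """Return (type_byte, value_offset) for a nested field, skipping the rest."""
    end = start + struct.unpack_from("<i", buf, start)[0] - 1  # trailing NUL
    pos = start + 4
    while pos < end:
        type_byte = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)
        name = buf[pos + 1:name_end].decode("utf-8")
        value_pos = name_end + 1
        if name == path[0]:
            if len(path) == 1:
                return type_byte, value_pos
            if type_byte != 0x03:
                raise KeyError(".".join(path))
            return find_field(buf, path[1:], value_pos)
        # Not the field we want: jump over its value without decoding it.
        pos = value_pos + value_size(type_byte, buf, value_pos)
    raise KeyError(path[0])

document = _doc(
    _string("title", "a long title that is never decoded"),
    _subdoc("meta", _doc(_string("lang", "en"), _int32("views", 42))),
)
type_byte, offset = find_field(document, ["meta", "views"])
views = struct.unpack_from("<i", document, offset)[0]
```

Only the field names along the path and the final int32 are decoded here; the
"title" string and the "lang" string are skipped using their length prefixes.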
BSONSpec: [https://bsonspec.org/]
JSON and BSON are closely related by design. BSON serves as a binary
representation of JSON data, tailored with specialized extensions for wider
application scenarios, and finely tuned for efficient data storage and
traversal. Similar to JSON, BSON facilitates the embedding of objects and
arrays.
One particular way in which BSON differs from JSON is in its support for some
more advanced data types. For instance, JSON does not differentiate between
integers (whole numbers) and floating-point numbers (with decimal precision).
BSON does distinguish between the two and stores them in the corresponding BSON
data type (e.g. BsonInt32 vs BsonDouble). Many server-side programming
languages offer advanced numeric data types (standard ones include integer,
single-precision floating point, i.e. “float”, double-precision floating point,
i.e. “double”, and boolean values), each with its own optimal usage for
efficient mathematical operations.
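To make the integer versus floating-point distinction concrete, the following
minimal Python sketch encodes a single numeric field the way the BSON spec lays
it out: type byte 0x10 with a 4-byte little-endian value for an int32, and type
byte 0x01 with an 8-byte IEEE 754 value for a double. The function name
encode_number_field is hypothetical and the sketch covers only these two types.

```python
import struct

def encode_number_field(name, value):
    """Encode a one-field BSON document holding an int32 or a double."""
    key = name.encode("utf-8") + b"\x00"
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it to keep the sketch numeric.
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        # int32 element; values outside 32 bits would need int64 (type 0x12).
        element = b"\x10" + key + struct.pack("<i", value)
    elif isinstance(value, float):
        # double element: 8 bytes, little-endian IEEE 754.
        element = b"\x01" + key + struct.pack("<d", value)
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    body = element + b"\x00"
    # The leading int32 is the total document size, including itself.
    return struct.pack("<i", 4 + len(body)) + body

as_int = encode_number_field("a", 1)
as_double = encode_number_field("a", 1.0)
```

The two documents carry the same numeric value but differ in both type tag and
size (12 bytes versus 16 bytes), which is the distinction plain JSON text does
not record.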
Another key distinction between BSON and JSON is that BSON documents can
include Date or Binary objects, which cannot be directly represented in pure
JSON format. BSON also provides the ability to store and retrieve user-defined
Binary objects. Likewise, by integrating advanced data structures like Sets
into BSON documents, we can significantly enhance the capabilities of Phoenix
for storing, retrieving, and updating Binary, Sets, Lists, and Documents as
nested or complex data types.
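The sketch below illustrates the Date and Binary point in Python, under the
BSON spec's element layouts: a UTC datetime is type 0x09 followed by an int64
of milliseconds since the Unix epoch, and binary data is type 0x05 followed by
an int32 length, a subtype byte, and the raw bytes. The helper names are
hypothetical, and json_rejects only shows that Python's standard json module
has no direct representation for these values.

```python
import json
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_datetime_element(name, moment):
    """BSON UTC datetime element; `moment` must be timezone-aware."""
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    return b"\x09" + name.encode("utf-8") + b"\x00" + struct.pack("<q", millis)

def encode_binary_element(name, payload, subtype=0x00):
    """BSON binary element; subtype 0x00 is the generic binary subtype."""
    return (b"\x05" + name.encode("utf-8") + b"\x00"
            + struct.pack("<i", len(payload)) + bytes([subtype]) + payload)

def json_rejects(value):
    """Return True if the standard json module cannot serialize `value`."""
    try:
        json.dumps({"v": value})
    except TypeError:
        return True
    return False

one_second = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
date_element = encode_datetime_element("t", one_second)
blob_element = encode_binary_element("b", b"\x01\x02")
```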
Moreover, JSON is both human-readable and machine-readable, whereas BSON is
only machine-readable. Hence, as part of introducing the BSON data type, we
also need to provide a user interface through which users can supply
human-readable JSON as input for the BSON data type.
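One way to picture such an interface is a converter that accepts JSON text and
produces BSON bytes. The following is a minimal Python sketch of that idea using
only the standard library; json_to_bson and encode_document are hypothetical
names, only the plain JSON value types are covered, and nothing here reflects
how Phoenix actually accepts JSON input.

```python
import json
import struct

def _element(name, value):
    """Encode one key/value pair as a BSON element."""
    key = name.encode("utf-8") + b"\x00"
    if value is None:
        return b"\x0a" + key                               # null has no payload
    if isinstance(value, bool):                            # check before int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)    # int32
        return b"\x12" + key + struct.pack("<q", value)        # int64
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)    # double
    if isinstance(value, str):
        raw = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(raw)) + raw
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)
    if isinstance(value, list):
        # A BSON array is a document whose keys are "0", "1", "2", ...
        as_doc = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + key + encode_document(as_doc)
    raise TypeError(f"unsupported type: {type(value).__name__}")

def encode_document(mapping):
    body = b"".join(_element(k, v) for k, v in mapping.items()) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body

def json_to_bson(text):
    """Parse human-readable JSON and return the BSON bytes for it."""
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("top-level JSON value must be an object")
    return encode_document(parsed)

hello = json_to_bson('{"hello": "world"}')
```

The {"hello": "world"} input is the example used on bsonspec.org, so the
resulting 22 bytes can be compared against the spec directly.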
This Jira also introduces access and update functions for BSON documents.
BSON_CONDITION_EXPRESSION can evaluate a condition expression on the document
fields, similar to how a WHERE clause evaluates a condition expression on
various columns of the given row(s) in relational tables.
BSON_UPDATE_EXPRESSION can perform one or more document field updates, similar
to how UPSERT statements can update one or more columns of the given row(s) in
relational tables.
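The semantics described above (check a condition on document fields, then
update one or more fields) can be sketched conceptually in Python on a plain
nested dictionary. This is only an analogy: the description does not spell out
the argument syntax of BSON_CONDITION_EXPRESSION or BSON_UPDATE_EXPRESSION, so
the dotted-path notation and the names get_path, set_path, and
conditional_update are assumptions made for illustration.

```python
def get_path(document, path):
    """Read a nested field addressed by a dotted path such as 'customer.tier'."""
    current = document
    for key in path.split("."):
        current = current[key]
    return current

def set_path(document, path, value):
    """Write a nested field, creating intermediate documents as needed."""
    *parents, leaf = path.split(".")
    current = document
    for key in parents:
        current = current.setdefault(key, {})
    current[leaf] = value

def conditional_update(document, condition, updates):
    """Apply every update only if condition(document) holds.

    Returns True when the updates were applied, False otherwise.
    """
    if not condition(document):
        return False
    for path, value in updates.items():
        set_path(document, path, value)
    return True

order = {"status": "PENDING", "customer": {"tier": "gold"}, "total": 120.5}

def is_large_pending_order(doc):
    return get_path(doc, "status") == "PENDING" and get_path(doc, "total") > 100

applied = conditional_update(
    order,
    condition=is_large_pending_order,
    updates={"status": "SHIPPED", "shipping.carrier": "ACME"},
)
applied_again = conditional_update(
    order,
    condition=is_large_pending_order,
    updates={"status": "CANCELLED"},
)
```

The second call leaves the document untouched because the condition no longer
holds, which mirrors how a conditional update on a row would be skipped.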
Phoenix can introduce more complex data structures, such as sets of scalar
types, in addition to the nested documents and nested arrays provided by BSON.
Overall, by combining BSON with functionality already available in Phoenix,
such as secondary indexes, conditional updates, and high-throughput
reads/writes, we can evolve Phoenix into a highly scalable document database.
was:
The purpose of this Jira is to introduce new data type in Phoenix: Binary JSON
(BSON) to manage more complex document data structures in Phoenix.
BSON or Binary JSON is a Binary-Encoded serialization of JSON-like documents.
BSON data type is specifically used for users to store, update and query part
or whole of the BsonDocument in the most performant way without having to
serialize/deserialize the document to/from binary format. Bson allows
deserializing only part of the nested documents such that querying or indexing
any attributes within the nested structure becomes more efficient and
performant as the deserialization happens at runtime. Any other document
structure would require deserializing the binary into the document, and then
perform the query.
BSONSpec: [https://bsonspec.org/]
JSON and BSON are closely related by design. BSON serves as a binary
representation of JSON data, tailored with specialized extensions for wider
application scenarios, and finely tuned for efficient data storage and
traversal. Similar to JSON, BSON facilitates the embedding of objects and
arrays.
One particular way in which BSON differs from JSON is in its support for some
more advanced data types. For instance, JSON does not differentiate between
integers (round numbers), and floating-point numbers (with decimal precision).
BSON does distinguish between the two and store them in the corresponding BSON
data type (e.g. BsonInt32 vs BsonDouble). Many server-side programming
languages offer advanced numeric data types (standards include integer, regular
precision floating point number i.e. “float”, double-precision floating point
i.e. “double”, and boolean values), each with its own optimal usage for
efficient mathematical operations.
Another key distinction between BSON and JSON is that BSON documents have the
capability to include Date or Binary objects, which cannot be directly
represented in pure JSON format. BSON also provides the ability to store and
retrieve user defined Binary objects. Likewise, by integrating advanced data
structures like Sets into BSON documents, we can significantly enhance the
capabilities of Phoenix for storing, retrieving, and updating Binary, Sets,
Lists, and Documents as nested or complex data types.
Moreover, JSON format is human as well as machine readable, whereas BSON format
is only machine readable. Hence, as part of introducing BSON data type, we also
need to provide a user interface such that users can provide human readable
JSON as input for BSON datatype.
This Jira also introduces access and update functions for BSON documents.
BSON_CONDITION_EXPRESSION can evaluate condition expression on the document
fields, similar to how WHERE clause evaluates condition expression on various
columns of the given row(s) for the relational tables.
BSON_UPDATE_EXPRESSION can perform one or more document field updates similar
to how UPSERT statements can perform update to one or more columns of the given
row(s) for the relational tables.
Overall, by combining various functionalities available in Phoenix like
secondary indexes, conditional updates, high throughput read/write with BSON,
we can evolve Phoenix into highly scalable Document Database.
> Introducing Binary JSON (BSON) with Complex Document structures in Phoenix
> --------------------------------------------------------------------------
>
> Key: PHOENIX-7330
> URL: https://issues.apache.org/jira/browse/PHOENIX-7330
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major