Viraj Jasani created PHOENIX-7330:
-------------------------------------
Summary: Introducing Binary JSON (BSON) with Complex Document
structures in Phoenix
Key: PHOENIX-7330
URL: https://issues.apache.org/jira/browse/PHOENIX-7330
Project: Phoenix
Issue Type: New Feature
Reporter: Viraj Jasani
The purpose of this Jira is to introduce a new data type in Phoenix, Binary JSON
(BSON), to manage more complex document data structures.
BSON, or Binary JSON, is a binary-encoded serialization of JSON-like documents.
The BSON data type lets users store, update and query part or all of a
BsonDocument efficiently, without having to serialize/deserialize the whole
document to/from binary format. BSON allows deserializing only part of the
nested documents, so querying or indexing any attribute within the nested
structure becomes more efficient, as only the required part is deserialized at
runtime. Any other document structure would require deserializing the entire
binary into the document and then performing the query.
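The partial deserialization described above can be sketched as follows. This is a minimal illustration built only on the length-prefixed layout in the public BSON spec, not Phoenix's implementation: the helper names (encode_doc, read_field, value_size, decode_value) are hypothetical, and only a small subset of BSON types is handled. The point is that a reader can skip over fields it does not need by using their size, and decode only the requested nested attribute.

```python
import struct


def encode_doc(fields):
    """Encode a dict (bool, int32, float, str, nested dict values) as BSON bytes."""
    body = b""
    for name, value in fields.items():
        key = name.encode("utf-8") + b"\x00"
        if isinstance(value, dict):
            body += b"\x03" + key + encode_doc(value)
        elif isinstance(value, bool):  # checked before int: bool is an int subclass
            body += b"\x08" + key + (b"\x01" if value else b"\x00")
        elif isinstance(value, float):
            body += b"\x01" + key + struct.pack("<d", value)
        elif isinstance(value, int):
            body += b"\x10" + key + struct.pack("<i", value)
        elif isinstance(value, str):
            raw = value.encode("utf-8") + b"\x00"
            body += b"\x02" + key + struct.pack("<i", len(raw)) + raw
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # Layout: int32 total length (counting itself and the trailing NUL), elements, NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


FIXED_SIZES = {0x01: 8, 0x08: 1, 0x10: 4}  # double, boolean, int32


def value_size(tag, data, pos):
    """Return how many bytes the value at pos occupies, without decoding it."""
    if tag in FIXED_SIZES:
        return FIXED_SIZES[tag]
    (length,) = struct.unpack_from("<i", data, pos)
    if tag == 0x02:  # string: int32 length prefix + bytes (length includes the NUL)
        return 4 + length
    if tag == 0x03:  # embedded document: its length prefix covers the whole document
        return length
    raise ValueError(f"unsupported BSON type 0x{tag:02x}")


def decode_value(tag, data, pos):
    """Decode only the single value located at pos."""
    if tag == 0x01:
        return struct.unpack_from("<d", data, pos)[0]
    if tag == 0x10:
        return struct.unpack_from("<i", data, pos)[0]
    if tag == 0x08:
        return data[pos] == 1
    (length,) = struct.unpack_from("<i", data, pos)
    if tag == 0x02:
        return data[pos + 4:pos + 4 + length - 1].decode("utf-8")
    return bytes(data[pos:pos + length])  # embedded document, returned still encoded


def read_field(data, path, start=0):
    """Walk the encoded bytes and decode only the field named by path."""
    (doc_len,) = struct.unpack_from("<i", data, start)
    end = start + doc_len - 1  # index of the document's trailing NUL
    pos = start + 4
    while pos < end:
        tag = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if name != path[0]:
            pos += value_size(tag, data, pos)  # skip without decoding
            continue
        if len(path) > 1:
            if tag != 0x03:
                raise KeyError(path[1])
            return read_field(data, path[1:], pos)
        return decode_value(tag, data, pos)
    raise KeyError(path[0])


doc = encode_doc({"id": 7, "address": {"city": "Pune", "geo": {"lat": 18.5}}, "active": True})
lat = read_field(doc, ["address", "geo", "lat"])
```

Here the "id" and "city" values are skipped by size alone; only the final double under address.geo.lat is unpacked.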
BSONSpec: [https://bsonspec.org/]
JSON and BSON are closely related by design. BSON serves as a binary
representation of JSON data, tailored with specialized extensions for wider
application scenarios, and finely tuned for efficient data storage and
traversal. Similar to JSON, BSON facilitates the embedding of objects and
arrays.
One particular way in which BSON differs from JSON is in its support for some
more advanced data types. For instance, JSON does not differentiate between
integers (whole numbers) and floating-point numbers (with a fractional part).
BSON does distinguish between the two and stores them in the corresponding BSON
data type (e.g. BsonInt32 vs BsonDouble). Many server-side programming
languages offer distinct primitive data types (common ones include integer,
single-precision floating point i.e. “float”, double-precision floating point
i.e. “double”, and boolean values), each with its own optimal usage for
efficient mathematical operations.
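To make the integer vs floating-point distinction concrete, here is a small sketch of how a single numeric field is laid out according to the BSON spec: each element starts with a type byte (0x10 for int32, 0x12 for int64, 0x01 for double), followed by the NUL-terminated field name and a fixed-width little-endian value. The function name encode_number is hypothetical and is not a Phoenix or driver API.

```python
import struct


def encode_number(name, value):
    """Encode one numeric BSON element: type byte, NUL-terminated name, value."""
    key = name.encode("utf-8") + b"\x00"
    if isinstance(value, bool):
        raise TypeError("booleans are not numbers in BSON")
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)  # 64-bit double
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # 32-bit integer
        return b"\x12" + key + struct.pack("<q", value)  # 64-bit integer
    raise TypeError(f"unsupported type: {type(value).__name__}")


as_int = encode_number("n", 5)
as_double = encode_number("n", 5.0)
as_long = encode_number("n", 2**40)
```

The same logical number 5 therefore produces different bytes depending on whether it is stored as an integer or as a double, so a reader knows the exact type from the first byte of the element.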
Another key distinction between BSON and JSON is that BSON documents have the
capability to include Date or Binary objects, which cannot be directly
represented in pure JSON format. BSON also provides the ability to store and
retrieve user-defined Binary objects. Likewise, by integrating advanced data
structures like Sets into BSON documents, we can significantly enhance the
capabilities of Phoenix for storing, retrieving, and updating Binary, Sets,
Lists, and Documents as nested or complex data types.
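As a rough illustration of the Date and Binary point, the sketch below first shows that a standard JSON serializer rejects raw bytes and datetime values, then encodes the same values as BSON elements following the public spec (type 0x05 for binary with a subtype byte, type 0x09 for a UTC datetime stored as int64 milliseconds since the epoch). The helper names are hypothetical, and Sets are not covered here because the spec has no native set type.

```python
import json
import struct
from datetime import datetime, timezone


def json_accepts(value):
    """Return True if the standard JSON encoder can serialize value as-is."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


def encode_binary(name, payload):
    """BSON binary element: 0x05, name, int32 length, subtype 0x00 (generic), bytes."""
    key = name.encode("utf-8") + b"\x00"
    return b"\x05" + key + struct.pack("<i", len(payload)) + b"\x00" + payload


def encode_datetime(name, moment):
    """BSON UTC datetime element: 0x09, name, int64 milliseconds since the epoch."""
    key = name.encode("utf-8") + b"\x00"
    millis = round(moment.timestamp() * 1000)
    return b"\x09" + key + struct.pack("<q", millis)


blob = b"\xde\xad"
day_two = datetime(1970, 1, 2, tzinfo=timezone.utc)

blob_element = encode_binary("b", blob)
date_element = encode_datetime("d", day_two)
```

In plain JSON, such values would have to be smuggled in as strings (for example Base64 or ISO-8601 text) and reinterpreted by the application.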
Moreover, the JSON format is both human and machine readable, whereas the BSON
format is only machine readable. Hence, as part of introducing the BSON data
type, we also need to provide a user interface such that users can provide
human-readable JSON as input for the BSON data type.
This Jira also introduces access and update functions for BSON documents.
BSON_CONDITION_EXPRESSION can evaluate a condition expression on the document
fields, similar to how a WHERE clause evaluates a condition expression on
various columns of the given row(s) for relational tables.
BSON_UPDATE_EXPRESSION can perform one or more document field updates, similar
to how UPSERT statements can update one or more columns of the given row(s) for
relational tables.
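The intended semantics of the two functions can be modelled conceptually as below. This is only a sketch of "check a condition on document fields, then update one or more fields": it operates on plain Python dicts, and the dotted paths, the (path, operator, operand) condition tuple and the function names are all assumptions made for illustration. It does not show the actual argument format of BSON_CONDITION_EXPRESSION or BSON_UPDATE_EXPRESSION, which this description does not specify.

```python
import copy
import operator

# Hypothetical comparison operators for the illustrative condition format.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def get_path(doc, path):
    """Return the value at a dotted path such as 'item.qty'."""
    node = doc
    for part in path.split("."):
        node = node[part]
    return node


def set_path(doc, path, value):
    """Set the value at a dotted path, creating intermediate documents as needed."""
    parts = path.split(".")
    node = doc
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def conditional_update(doc, condition, updates):
    """Apply all field updates only if the condition holds; return (doc, applied)."""
    path, op, operand = condition
    try:
        current = get_path(doc, path)
    except (KeyError, TypeError):
        return doc, False  # a missing field means the condition is not met
    if not OPS[op](current, operand):
        return doc, False
    updated = copy.deepcopy(doc)  # leave the caller's document untouched
    for target, value in updates.items():
        set_path(updated, target, value)
    return updated, True


order = {"status": "open", "item": {"qty": 2}}
changed, applied = conditional_update(
    order, ("item.qty", "<", 5), {"item.qty": 3, "status": "updated"}
)
unchanged, skipped = conditional_update(order, ("item.qty", ">", 5), {"status": "big"})
```

The first call plays the role of a condition that passes followed by a multi-field update; the second shows the document being left as-is when the condition fails.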
Overall, by combining BSON with functionality already available in Phoenix,
such as secondary indexes, conditional updates and high-throughput reads/writes,
we can evolve Phoenix into a highly scalable document database.