GitHub user jianqiao opened a pull request:
https://github.com/apache/incubator-quickstep/pull/315
[DO NOT MERGE] Refactor type system to provide better extensibility of types and functions
This is a preliminary PR that is not ready to be merged, but it provides an
overall view of the type system refactoring work. Many constructs are still in
their initial design and may be further improved.
The PR aims to get the refactoring design reviewed at the "architecture"
level. Detailed code style and unit test issues may be addressed later in
subsequent, more concrete PRs.
The overall purpose of the refactoring is to improve the extensibility of
the existing type/function system (i.e. support more kinds of types/functions
and make it easier to add new types and functions), while retaining the
performance of the current system.
### Major Changes
#### Part I. Type System
---
##### 1. Categorize all types into four [_memory layouts_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeID.hpp#L64).
The four memory layouts are:
* __CxxInlinePod__ <sub>(C++ plain old data)</sub>
* __ParInlinePod__ <sub>(Parameterized inline plain old data)</sub>
* __ParOutOfLinePod__ <sub>(Parameterized out-of-line plain old data)</sub>
* __CxxGeneric__ <sub>(C++ generic types)</sub>
Memory layout decides how the corresponding type's values are stored and
represented.
Briefly speaking,
* _CxxInlinePod_ corresponds to C++ primitive types or POD structs.
  * E.g. _int_, _double_, _struct { double x; double y; }_.
  * The size of a CxxInlinePod value is known at C++ compile time (e.g. _double_ has size 8, _struct { double x; double y; }_ has size 16).
* _ParInlinePod_ corresponds to database-defined "fixed-length" types.
  * E.g. _Char(8)_, _Char(20)_.
  * The size of such a type's values is not known at C++ compile time. Instead, the type is parameterized by an unsigned integer whose value is known at SQL query compile time (which is C++ run time).
* _ParOutOfLinePod_ corresponds to database-defined "variable-length" types.
  * E.g. _Varchar(20)_.
  * The size of such a type's values is not known until SQL query run time.
* _CxxGeneric_ corresponds to general C++ types (i.e. any C++ type).
  * E.g. `std::set<int>`, `std::vector<const Type*>`.
  * Such types have to implement serialization/deserialization methods to have storage support.
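To make the distinction concrete, the C++ sketch below shows when each kind of size becomes known. The helper names (`CxxInlinePodByteSize`, `ParInlinePodByteSize`, `ParOutOfLinePodByteSize`) and the `Point` struct are hypothetical illustrations, not Quickstep APIs; _CxxGeneric_ is left out because its storage goes through serialization.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical CxxInlinePod example: a POD struct whose size the C++ compiler knows.
struct Point {
  double x;
  double y;
};
static_assert(std::is_trivial<Point>::value && std::is_standard_layout<Point>::value,
              "Point is a POD type");

// CxxInlinePod: byte size fixed at C++ compile time.
template <typename CppType>
constexpr std::size_t CxxInlinePodByteSize() {
  return sizeof(CppType);
}

// ParInlinePod (e.g. Char(n)): byte size fixed once the SQL query is compiled,
// i.e. at C++ run time, and equal to the type parameter n.
std::size_t ParInlinePodByteSize(std::size_t parameter) { return parameter; }

// ParOutOfLinePod (e.g. Varchar(n)): byte size known only per value,
// i.e. at SQL query run time.
std::size_t ParOutOfLinePodByteSize(const std::string& value) { return value.size(); }
```

For example, `CxxInlinePodByteSize<Point>()` is a compile-time constant, `ParInlinePodByteSize(20)` models _Char(20)_, and `ParOutOfLinePodByteSize("1000")` depends on the actual value.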
---
##### 2. Use [_TypeIDTrait_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeRegistrar.hpp#L59) to make much of the per-type information known at compile time.
With this per-type trait information, we can avoid much of the boilerplate code
for each subclass of _Type_ by using template techniques that specialize on the
memory layout. See
[_TypeSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeSynthesizer.hpp)
and
[_TypeFactory_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeFactory.cpp#L69).
_TypeIDTrait_ is also used extensively in many other places, as it provides
all the required compile-time information about a type.
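A rough sketch of the trait-plus-synthesizer pattern, assuming hypothetical `TypeID`/`MemoryLayout` enums and a hypothetical `maximumByteLength()` member; the actual members of _TypeIDTrait_ and _TypeSynthesizer_ in the PR may differ. The point is that one partial specialization per memory layout replaces code that would otherwise be repeated in every _Type_ subclass.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical IDs and layouts for this sketch.
enum class TypeID { kInt, kDouble, kChar };
enum class MemoryLayout { kCxxInlinePod, kParInlinePod, kParOutOfLinePod, kCxxGeneric };

// Hypothetical trait: per-type facts known at C++ compile time.
template <TypeID id>
struct TypeIDTrait;

template <>
struct TypeIDTrait<TypeID::kInt> {
  using cpptype = int;
  static constexpr MemoryLayout kMemoryLayout = MemoryLayout::kCxxInlinePod;
};

template <>
struct TypeIDTrait<TypeID::kDouble> {
  using cpptype = double;
  static constexpr MemoryLayout kMemoryLayout = MemoryLayout::kCxxInlinePod;
};

template <>
struct TypeIDTrait<TypeID::kChar> {
  static constexpr MemoryLayout kMemoryLayout = MemoryLayout::kParInlinePod;
};

// Hypothetical synthesizer: specialized on the memory layout, not on each type.
template <TypeID id, typename Enable = void>
struct TypeSynthesizer;

template <TypeID id>
struct TypeSynthesizer<id, typename std::enable_if<TypeIDTrait<id>::kMemoryLayout ==
                                                   MemoryLayout::kCxxInlinePod>::type> {
  // The size comes straight from the C++ type; the parameter is ignored.
  static std::size_t maximumByteLength(std::size_t /* parameter */) {
    return sizeof(typename TypeIDTrait<id>::cpptype);
  }
};

template <TypeID id>
struct TypeSynthesizer<id, typename std::enable_if<TypeIDTrait<id>::kMemoryLayout ==
                                                   MemoryLayout::kParInlinePod>::type> {
  // The size is the type parameter, e.g. 20 for Char(20).
  static std::size_t maximumByteLength(std::size_t parameter) { return parameter; }
};
```

Adding another _CxxInlinePod_ type in this sketch only requires a new `TypeIDTrait` specialization; the synthesizer code is reused as-is.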
---
##### 3. Support more types.
Details about how to add a new type to the Quickstep system will be
written later.
The current PR adds some example types:
* The [_Bool_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/BoolType.hpp) type. It will be used later for connecting scalar functions and predicates.
* The [_Text_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TextType.hpp) type. A general non-parameterized string type.
  * __TODO:__ We need some updates in the storage block module (and potentially other places) to handle types with an "infinite maximum byte size".
* The [_MetaType_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/MetaType-decl.hpp) type. It is the "type of type", i.e. a value of _MetaType_ has C++ type `const Type*`.
* The [_Array_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/ArrayType.hpp) type. A generic type that represents an array. This type takes a _MetaType_ value as its parameter, which specifies the array's element type.
  * __TODO__: We need specialized array types such as _IntArray_ and _TextArray_ for performance.
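A minimal sketch of how a _MetaType_ value (a `const Type*`) can parameterize an array type, so that array types nest naturally. The classes below are simplified, hypothetical stand-ins for the PR's _Type_ and _ArrayType_; the name format mirrors the `array(double)` syntax used in the casting examples.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal Type hierarchy. A MetaType value is simply a const Type*.
class Type {
 public:
  virtual ~Type() = default;
  virtual std::string getName() const = 0;
};

class DoubleType : public Type {
 public:
  static const DoubleType& Instance() {
    static const DoubleType instance{};
    return instance;
  }
  std::string getName() const override { return "double"; }
};

// Hypothetical ArrayType: parameterized by a MetaType value naming its element type.
class ArrayType : public Type {
 public:
  explicit ArrayType(const Type* element_type) : element_type_(element_type) {}

  const Type& getElementType() const { return *element_type_; }

  std::string getName() const override {
    return "array(" + element_type_->getName() + ")";
  }

 private:
  const Type* element_type_;  // The MetaType-valued parameter.
};
```

Because the parameter is itself a type, `ArrayType` can take another `ArrayType` as its element type, yielding `array(array(double))`.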
---
##### 4. Improve the type casting mechanism.
Type casting (coercion) is an important feature that is needed in practice
from time to time.
This PR's design defines an overall
[template](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/CastFunctorOverloads.hpp#L41)
```
template <typename SourceType, typename TargetType, typename Enable = void>
struct CastFunctor;
```
which is then specialized for different source/target types.
The coercibility between two types is then
[inferred](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/utility/CastUtil.cpp#L58)
from whether the corresponding specialization exists. Thus it suffices
to specialize _CastFunctor_ when adding a new cast operation, and all
the dependent places (e.g. _Type::isCoercibleFrom()_) will mostly be
auto-generated by the system (unless the target type is a parameterized type
and you want to perform further checks).
Note that _safe coercibility_ is a separate issue and needs to be handled
mostly manually, by overriding _Type::isSafelyCoercibleFrom()_.
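One common C++ way to infer coercibility from "does a specialization exist" is to test whether `CastFunctor<S, T>` is a complete type. The sketch below assumes hypothetical tag types (`IntType`, `DoubleType`, `TextType`) and a hypothetical `IsCoercible` trait; it is not necessarily how _CastUtil.cpp_ implements the check.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical type tags standing in for Quickstep's Type classes.
struct IntType { using cpptype = int; };
struct DoubleType { using cpptype = double; };
struct TextType { using cpptype = std::string; };

// Primary template: declared but never defined, so "no specialization" means "no cast".
template <typename SourceType, typename TargetType, typename Enable = void>
struct CastFunctor;

// One partial specialization covers every numeric-to-numeric cast.
template <typename SourceType, typename TargetType>
struct CastFunctor<SourceType, TargetType,
                   typename std::enable_if<
                       std::is_arithmetic<typename SourceType::cpptype>::value &&
                       std::is_arithmetic<typename TargetType::cpptype>::value>::type> {
  typename TargetType::cpptype operator()(typename SourceType::cpptype value) const {
    return static_cast<typename TargetType::cpptype>(value);
  }
};

// A full specialization adds the Int -> Text cast.
template <>
struct CastFunctor<IntType, TextType> {
  std::string operator()(int value) const { return std::to_string(value); }
};

// Coercibility is inferred from whether CastFunctor<S, T> is a complete type:
// sizeof on an incomplete type is a substitution failure, selecting false_type.
template <typename...>
struct MakeVoid { using type = void; };

template <typename SourceType, typename TargetType, typename = void>
struct IsCoercible : std::false_type {};

template <typename SourceType, typename TargetType>
struct IsCoercible<SourceType, TargetType,
                   typename MakeVoid<decltype(sizeof(CastFunctor<SourceType, TargetType>))>::type>
    : std::true_type {};
```

One caveat of this idiom: `IsCoercible` should only be queried after all relevant `CastFunctor` specializations are visible; otherwise the answer can be computed before a specialization is declared.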
Explicit casting is supported with a PostgreSQL-like syntax. E.g.
(1)
```
SELECT (i::text + (i+1)::text)::int AS result FROM generate_series(1, 3) AS
g(i);
--
+-----------+
|result |
+-----------+
| 12|
| 23|
| 34|
+-----------+
```
(2)
```
CREATE TABLE r(x varchar(16));
INSERT INTO r SELECT pow(10, i)::varchar(10) FROM generate_series(1, 3) AS
g(i);
SELECT 'There are ' + length(x)::varchar(10) + ' characters in ' + x AS
result FROM r;
--
+---------------------------------------------------+
|result |
+---------------------------------------------------+
| There are 2 characters in 10|
| There are 3 characters in 100|
| There are 4 characters in 1000|
+---------------------------------------------------+
```
(3)
```
SELECT {1,2,3}::array(double) AS result from generate_series(1, 1);
--
+--------------------------------+
|result |
+--------------------------------+
| {1,2,3}|
+--------------------------------+
```
__NOTE__: The work is not yet complete, so some combinations of queries may
still hit `LOG(FATAL)` aborts.
Implicit coercion is supported when resolving scalar functions; see
[here](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationFactory.cpp#L292).
For example, we support the _sqrt_ function, whose parameter can be a
_Float_ or _Double_ value. Consider the query
```
SELECT sqrt(x) FROM r;
```
where `x` has _Int_ type. An implicit coercion from _Int_ to _Float_ will
then be added.
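A simplified sketch of implicit coercion during function resolution: exact signature matches are preferred, and otherwise the first registered signature that all arguments can be coerced to is chosen. The widening order `Int < Long < Float < Double` and the `ResolveOverload` helper are assumptions for illustration; the actual rules live in _OperationFactory.cpp_ and _Type::isCoercibleFrom()_.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type IDs, listed in an assumed widening order.
enum class TypeID { kInt, kLong, kFloat, kDouble, kText };

// Hypothetical rule: numeric types may be implicitly widened along the order
// above; Text never coerces to or from a numeric type.
bool IsImplicitlyCoercible(TypeID from, TypeID to) {
  if (from == to) return true;
  if (from == TypeID::kText || to == TypeID::kText) return false;
  return static_cast<int>(from) < static_cast<int>(to);
}

// Picks a candidate signature for the given argument types. Exact matches win;
// otherwise the first candidate (in registration order) that every argument can
// be implicitly coerced to is chosen. Returns -1 if nothing applies.
int ResolveOverload(const std::vector<std::vector<TypeID>>& candidates,
                    const std::vector<TypeID>& arg_types) {
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    if (candidates[i] == arg_types) return static_cast<int>(i);
  }
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    const std::vector<TypeID>& params = candidates[i];
    if (params.size() != arg_types.size()) continue;
    bool all_coercible = true;
    for (std::size_t j = 0; j < params.size(); ++j) {
      if (!IsImplicitlyCoercible(arg_types[j], params[j])) {
        all_coercible = false;
        break;
      }
    }
    if (all_coercible) return static_cast<int>(i);
  }
  return -1;
}
```

With `sqrt` registered as `(Float)` followed by `(Double)`, an _Int_ argument resolves to the _Float_ signature in this sketch, matching the behavior described above.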
---
##### 5. Add [_GenericValue_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/GenericValue.hpp) to represent typed values of all four memory layouts.
The original _TypedValue_ is not sufficient to represent _CxxGeneric_
values, as we need to embed the full _Type_ information in order to handle
value allocation/copy/destruction. However, for performance reasons we cannot
simply replace _TypedValue_ with a more generic but slower implementation.
Thus, a separate _GenericValue_ is added, and we still use _TypedValue_ for
storage-related operations.
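A minimal sketch of why a _CxxGeneric_ value has to carry its type: copying or destroying the payload requires type-specific code. Here a hypothetical `TypeOps` table of function pointers stands in for the _Type_ object, and this `GenericValue` is an illustration rather than the PR's class.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical per-type operations; in the PR this role is played by the Type.
struct TypeOps {
  void* (*clone)(const void*);
  void (*destroy)(void*);
};

// One static operations table per C++ type.
template <typename T>
const TypeOps* OpsFor() {
  static const TypeOps ops = {
      [](const void* p) -> void* { return new T(*static_cast<const T*>(p)); },
      [](void* p) { delete static_cast<T*>(p); }};
  return &ops;
}

// Hypothetical value wrapper: the ops pointer lets it copy and free any payload.
class GenericValue {
 public:
  template <typename T>
  static GenericValue Create(T value) {
    return GenericValue(OpsFor<T>(), new T(std::move(value)));
  }

  GenericValue(const GenericValue& other)
      : ops_(other.ops_),
        data_(other.data_ != nullptr ? other.ops_->clone(other.data_) : nullptr) {}

  GenericValue(GenericValue&& other) noexcept : ops_(other.ops_), data_(other.data_) {
    other.data_ = nullptr;
  }

  // Copy-and-swap handles both copy and move assignment.
  GenericValue& operator=(GenericValue other) {
    std::swap(ops_, other.ops_);
    std::swap(data_, other.data_);
    return *this;
  }

  ~GenericValue() {
    if (data_ != nullptr) ops_->destroy(data_);
  }

  template <typename T>
  const T& as() const { return *static_cast<const T*>(data_); }

 private:
  GenericValue(const TypeOps* ops, void* data) : ops_(ops), data_(data) {}

  const TypeOps* ops_;
  void* data_;
};
```

The extra pointer and the indirect calls are the kind of overhead the paragraph above refers to, which is why the storage-facing paths keep using _TypedValue_.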
---
##### 6. Move type resolution from the parser to the resolver.
This avoids the need to modify _SqlParser.ypp_ when adding a new type.
See
[_ParseDataType_](https://github.com/apache/incubator-quickstep/blob/refactor-type/parser/ParseDataType.hpp)
and
[_Resolver::resolveDataType()_](https://github.com/apache/incubator-quickstep/blob/refactor-type/query_optimizer/resolver/Resolver.cpp#L1196).
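A sketch of the parser/resolver split, assuming a hypothetical `TypeRegistry` keyed by type name: the parser only records a name and raw parameters, and the resolver looks the name up. Adding a type then means registering a factory rather than editing the grammar. The fields of this `ParseDataType` and the `ResolvedType` struct are assumptions, not the PR's actual classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse-tree node: the parser records only the name and parameters.
struct ParseDataType {
  std::string name;
  std::vector<int> params;  // e.g. {16} for varchar(16)
};

// Hypothetical resolved type.
struct ResolvedType {
  std::string name;
  int length;  // -1 for non-parameterized types
};

// Hypothetical resolver-side registry: new types are added here, not in the grammar.
class TypeRegistry {
 public:
  using Factory = std::function<ResolvedType(const std::vector<int>&)>;

  void Register(const std::string& name, Factory factory) {
    factories_[name] = std::move(factory);
  }

  ResolvedType Resolve(const ParseDataType& parsed) const {
    auto it = factories_.find(parsed.name);
    if (it == factories_.end()) {
      throw std::invalid_argument("unknown type: " + parsed.name);
    }
    return it->second(parsed.params);
  }

 private:
  std::map<std::string, Factory> factories_;
};

// Example registrations (hypothetical).
TypeRegistry MakeExampleRegistry() {
  TypeRegistry registry;
  registry.Register("int", [](const std::vector<int>& params) {
    if (!params.empty()) throw std::invalid_argument("int takes no parameters");
    return ResolvedType{"int", -1};
  });
  registry.Register("varchar", [](const std::vector<int>& params) {
    if (params.size() != 1) throw std::invalid_argument("varchar takes one parameter");
    return ResolvedType{"varchar", params[0]};
  });
  return registry;
}
```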
~
#### Part II. Scalar Function
---
##### 1. Implement [_UnaryOperationSynthesizer_/_UncheckedUnaryOperatorSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/UnaryOperationSynthesizer.hpp#L58) to make it easier to add unary functions.
Example unary functions:
* [Arithmetic](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/ArithmeticUnaryFunctors.hpp#L60)
* [String](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/AsciiStringUnaryFunctors.hpp#L106)
* [Math](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/CMathUnaryFunctors.hpp#L70)
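A rough sketch of the synthesizer idea: the author of a function writes only a per-value functor, and a template generates the column-evaluation loop. The names `SynthesizedUnaryOperator`, `applyToColumn()`, `NegateFunctor` and `SqrtFunctor` are hypothetical; the PR's synthesizers also deal with type checking and storage formats, which are omitted here.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical unary functors: only the per-value logic and a name are hand-written.
struct NegateFunctor {
  static std::string Name() { return "negate"; }
  int operator()(int value) const { return -value; }
};

struct SqrtFunctor {
  static std::string Name() { return "sqrt"; }
  double operator()(double value) const { return std::sqrt(value); }
};

// Hypothetical synthesizer: the batch loop is written once and reused by every functor.
template <typename Functor, typename ArgT, typename ResultT>
class SynthesizedUnaryOperator {
 public:
  std::string getName() const { return Functor::Name(); }

  std::vector<ResultT> applyToColumn(const std::vector<ArgT>& column) const {
    std::vector<ResultT> result;
    result.reserve(column.size());
    for (const ArgT& value : column) {
      result.push_back(functor_(value));
    }
    return result;
  }

 private:
  Functor functor_;
};
```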
##### 2. Implement [_BinaryOperationSynthesizer_/_UncheckedBinaryOperatorSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/BinaryOperationSynthesizer.hpp#L62) to make it easier to add binary functions.
Example binary functions:
* [Arithmetic](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/ArithmeticBinaryFunctors.hpp#L94)
* [String](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/AsciiStringBinaryFunctors.hpp#L127)
* [Math](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/CMathBinaryFunctors.hpp#L66)
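The binary case can follow the same pattern, with a loop generated for each operand combination (two columns, or a column and a literal). As before, `SynthesizedBinaryOperator` and its methods are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical binary functors: only the per-value logic is hand-written.
struct AddFunctor {
  int operator()(int left, int right) const { return left + right; }
};

struct ConcatFunctor {
  std::string operator()(const std::string& left, const std::string& right) const {
    return left + right;
  }
};

// Hypothetical synthesizer: column/column and column/literal loops written once.
template <typename Functor, typename LeftT, typename RightT, typename ResultT>
class SynthesizedBinaryOperator {
 public:
  std::vector<ResultT> applyToColumns(const std::vector<LeftT>& left,
                                      const std::vector<RightT>& right) const {
    assert(left.size() == right.size());
    std::vector<ResultT> result;
    result.reserve(left.size());
    for (std::size_t i = 0; i < left.size(); ++i) {
      result.push_back(functor_(left[i], right[i]));
    }
    return result;
  }

  std::vector<ResultT> applyToColumnAndLiteral(const std::vector<LeftT>& left,
                                               const RightT& literal) const {
    std::vector<ResultT> result;
    result.reserve(left.size());
    for (const LeftT& value : left) {
      result.push_back(functor_(value, literal));
    }
    return result;
  }

 private:
  Functor functor_;
};
```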
##### 3. Use [_OperationSignature_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationSignature.hpp#L45) and [_OperationFactory_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationFactory.hpp#L48) to support general operation resolution.
* See [_OperationFactory::OperationFactory()_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationFactory.cpp#L85) for how operations are registered.
* See [_Resolver::resolveScalarFunction()_](https://github.com/apache/incubator-quickstep/blob/refactor-type/query_optimizer/resolver/Resolver.cpp#L2889) for how a function in a SQL query gets resolved.
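A minimal sketch of signature-based registration and lookup, assuming a hypothetical `OperationSignature` (name plus argument types) used as a map key and operations represented as callables over `double` values. In this sketch a lookup for `sqrt(Int)` fails, which is exactly the case where implicit coercion has to step in.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical type IDs for this sketch.
enum class TypeID { kInt, kFloat, kDouble };

// Hypothetical signature: operation name plus argument types, usable as a map key.
struct OperationSignature {
  std::string name;
  std::vector<TypeID> argument_types;

  bool operator<(const OperationSignature& other) const {
    return std::tie(name, argument_types) < std::tie(other.name, other.argument_types);
  }
};

// Hypothetical operation representation: a callable over double arguments.
using Operation = std::function<double(const std::vector<double>&)>;

// Hypothetical factory: all operations are registered once, in the constructor.
class OperationFactory {
 public:
  OperationFactory() {
    registerOperation({"sqrt", {TypeID::kDouble}},
                      [](const std::vector<double>& args) { return std::sqrt(args[0]); });
    registerOperation({"pow", {TypeID::kDouble, TypeID::kDouble}},
                      [](const std::vector<double>& args) { return std::pow(args[0], args[1]); });
  }

  bool hasOperation(const OperationSignature& signature) const {
    return operations_.count(signature) != 0;
  }

  // Throws std::out_of_range if the signature was never registered.
  const Operation& getOperation(const OperationSignature& signature) const {
    return operations_.at(signature);
  }

 private:
  void registerOperation(const OperationSignature& signature, Operation operation) {
    operations_.emplace(signature, std::move(operation));
  }

  std::map<OperationSignature, Operation> operations_;
};
```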
~
#### Part III. TODOs
* Many _TODO(refactor-type)_ items in the code remain to be fixed.
* Refactor the predicate system (we will have something like a
_ComparisonSynthesizer_).
* Many unit tests are broken (due to API changes) and need to be fixed.
* Improve the comments and style of the template metaprogramming code.
* More to be added ...
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-quickstep refactor-type
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-quickstep/pull/315.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #315
----
commit cb564509c8da64af1c0981ca816f962f94b06602
Author: Jianqiao Zhu
Date: 2017-03-04T18:11:13Z
Refactor type system and operations.
commit 02005508dd4b6813ecc494e2cdfed842b4c93dc4
Author: Jianqiao Zhu
Date: 2017-09-28T02:12:59Z
Some updates
commit ebf44cd2dd230bd45c849cb008005ad9c07b2d60
Author: Jianqiao Zhu
Date: 2017-10-02T05:26:05Z
Updates for adding generic types
commit a7031a343814bb003353c6b0b75957f66db6240c
Author: Jianqiao Zhu
Date: 2017-10-02T06:30:46Z
Add array expression
commit b6fd31fec0cd9b1a89eee1fe89af70b85df44bc5
Author: Jianqiao Zhu
Date: 2017-10-02T20:36:22Z
Continue the work
commit 1e69fb18eb9e7f31c48d85aaef781dca1ba8290a
Author: Jianqiao Zhu
Date: 2017-10-03T04:46:48Z
Updates for array type
commit bef66ad47a6edb69f76f29c56d528a28ba7760b8
Author: Jianqiao Zhu
Date: 2017-10-03T06:52:29Z
Updates to meta type
commit 3a3772d91bd94d269b2f2fa49895f53385d46381
Author: Jianqiao Zhu
Date: 2017-10-03T22:03:20Z
Add text type
commit 0957264b534cb65bbcc28999833f30bd8888856a
Author: Jianqiao Zhu
Date: 2017-10-04T05:26:36Z
Type as first class citizen
commit 9cb664c802f2a85862dea1cc41a08c989dd579e7
Author: Jianqiao Zhu
Date: 2017-10-04T08:21:44Z
More updates to types
commit 1cb97e3547846240f3ceb0a7f086dbe912174b72
Author: Jianqiao Zhu
Date: 2017-10-05T22:02:33Z
More updates, refactor names
commit 477c385d427483d4c2708449927f14268e53c311
Author: Jianqiao Zhu
Date: 2017-10-10T18:20:17Z
Updates to casts
commit a3aec8e789b66c2c1de64cbd2cdc3fac70b8121b
Author: Jianqiao Zhu
Date: 2017-10-11T08:38:40Z
Updates to implicit casts
----
---