GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/315
[DO NOT MERGE] Refactor type system to provide better extensibility of types and functions

This is a preliminary PR that is not ready to be merged but provides an overall view of the type system refactoring work. Many constructs are at their initial designs and may be further improved. The PR aims at reviewing the refactoring designs at the "architecture" level. Detailed code style and unit test issues may be addressed later in subsequent concrete PRs.

The overall purpose of the refactoring is to improve the extensibility of the existing type/function system (i.e. support more kinds of types/functions and make it easier to add new types and functions), while retaining the performance of the current system.

### Major Changes

#### Part I. Type System

---

##### 1. Categorize all types into four [_memory layouts_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeID.hpp#L64).

The four memory layouts are:
* __CxxInlinePod__ <sub>(C++ plain old data)</sub>
* __ParInlinePod__ <sub>(Parameterized inline plain old data)</sub>
* __ParOutOfLinePod__ <sub>(Parameterized out-of-line plain old data)</sub>
* __CxxGeneric__ <sub>(C++ generic types)</sub>

Memory layout decides how the corresponding type's values are stored and represented. Briefly speaking,
* _CxxInlinePod_ corresponds to C++ primitive types or POD structs.
  * E.g. _int_, _double_, _struct { double x; double y; }_.
  * The size of a CxxInlinePod value is known at C++ compile time (e.g. _double_ has size 8, _struct { double x; double y; }_ has size 16).
* _ParInlinePod_ corresponds to database defined "fixed length" types.
  * E.g. _Char(8)_, _Char(20)_.
  * The size of such types' values is not known at C++ compile time. Instead, the type is parameterized by an unsigned integer, where the parameter's value is known at SQL query compile time (which is C++ run-time).
* _ParOutOfLinePod_ corresponds to database defined "variable length" types.
  * E.g. _Varchar(20)_.
  * The size of such types' values is not known until SQL query run-time.
* _CxxGeneric_ corresponds to C++ general types (i.e. any C++ type).
  * E.g. _std::set<int>_, _std::vector<const Type*>_.
  * Such types have to implement serialization/deserialization methods to have storage support.

---

##### 2. Use [_TypeIDTrait_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeRegistrar.hpp#L59) to allow much information to be known at compile time.

With this per-type trait information, we can avoid much boilerplate code for each subclass of _Type_ by using template techniques and specializing on the memory layout. See [_TypeSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeSynthesizer.hpp) and [_TypeFactory_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeFactory.cpp#L69).

_TypeIDTrait_ is also extensively used in many other places as it provides all the required compile-time information about a type.

---

##### 3. Support more types.

Details will be written later about how to add a new type into the Quickstep system. The current PR has some example types added:
* The [_Bool_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/BoolType.hpp) type. It will be used later for connecting scalar functions and predicates.
* The [_Text_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TextType.hpp) type. A general non-parameterized string type.
  * __TODO:__ We need some updates in the storage block module (potentially also other places) to handle the "infinite maximum byte size" types.
* The [_MetaType_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/MetaType-decl.hpp) type. It is the "type of type", i.e. a value of _MetaType_ has C++ type _const Type*_.
* The [_Array_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/ArrayType.hpp) type. A generic type that represents an array.
  This type takes a _MetaType_ value as parameter, where the parameter specifies the array's element type.
  * __TODO__: We need specialized array types such as _IntArray_ and _TextArray_ for performance reasons.

---

##### 4. Improve the type casting mechanism.

Type casting (coercion) is an important feature that is needed in practice from time to time. This PR's design defines an overall [template](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/CastFunctorOverloads.hpp#L41)
```
template <typename SourceType, typename TargetType, typename Enable = void>
struct CastFunctor;
```
which is then specialized for different source/target types. The coercibility between two types is then [inferred](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/utility/CastUtil.cpp#L58) according to whether the corresponding specialization exists. Thus it suffices to just specialize _CastFunctor_ when adding a new casting operation, and all the dependent places (e.g. _Type::isCoercibleFrom()_) will mostly be auto-generated by the system (unless the target type is a parameterized type and you want to do some further checks).

Note that _safe-coercibility_ is a separate issue and needs to be taken care of mostly manually, by overriding _Type::isSafelyCoercibleFrom()_.

Explicit casting is supported with a PostgreSQL-like syntax. E.g.
(1)
```
SELECT (i::text + (i+1)::text)::int AS result
FROM generate_series(1, 3) AS g(i);
--
+-----------+
|result     |
+-----------+
|         12|
|         23|
|         34|
+-----------+
```
(2)
```
CREATE TABLE r(x varchar(16));
INSERT INTO r SELECT pow(10, i)::varchar(10) FROM generate_series(1, 3) AS g(i);
SELECT 'There are ' + length(x)::varchar(10) + ' characters in ' + x AS result FROM r;
--
+---------------------------------------------------+
|result                                             |
+---------------------------------------------------+
|                       There are 2 characters in 10|
|                      There are 3 characters in 100|
|                     There are 4 characters in 1000|
+---------------------------------------------------+
```
(3)
```
SELECT {1,2,3}::array(double) AS result FROM generate_series(1, 1);
--
+--------------------------------+
|result                          |
+--------------------------------+
|                         {1,2,3}|
+--------------------------------+
```

__NOTE__: The work is not yet fully completed so there may be `LOG(FATAL)` aborts for some combinations of queries.

Implicit coercion is supported when resolving scalar functions, see [here](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationFactory.cpp#L292). For example, we have support for the _sqrt_ function where the parameter can be a _Float_ or _Double_ value. Consider the query
```
SELECT sqrt(x) FROM r;
```
where `x` has _Int_ type; an implicit coercion from _Int_ to _Float_ will then be added.

---

##### 5. Add [_GenericValue_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/GenericValue.hpp) to represent typed-values of all four memory layouts.

The original _TypedValue_ is not sufficient to represent _CxxGeneric_ values, as we need to embed the overall _Type_ information in order to handle value allocation/copy/destruction. However, due to performance considerations, we cannot simply replace _TypedValue_ with a more generic but slower implementation.
Thus, a separate _GenericValue_ is added and we still use _TypedValue_ when handling storage-related operations.

---

##### 6. Move type resolving from parser to resolver.

This avoids the need to modify _SqlParser.ypp_ when adding a new type. See [_ParseDataType_](https://github.com/apache/incubator-quickstep/blob/refactor-type/parser/ParseDataType.hpp) and [_Resolver::resolveDataType()_](https://github.com/apache/incubator-quickstep/blob/refactor-type/query_optimizer/resolver/Resolver.cpp#L1196).

~

#### Part II. Scalar Function

---

##### 1. Implement [_UnaryOperationSynthesizer_/_UncheckedUnaryOperatorSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/UnaryOperationSynthesizer.hpp#L58) to make it easier to add unary functions.

Example unary functions:
* [Arithmetic](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/ArithmeticUnaryFunctors.hpp#L60)
* [String](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/AsciiStringUnaryFunctors.hpp#L106)
* [Math](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/CMathUnaryFunctors.hpp#L70)

##### 2. Implement [_BinaryOperationSynthesizer_/_UncheckedBinaryOperatorSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/BinaryOperationSynthesizer.hpp#L62) to make it easier to add binary functions.

Example binary functions:
* [Arithmetic](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/ArithmeticBinaryFunctors.hpp#L94)
* [String](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/AsciiStringBinaryFunctors.hpp#L127)
* [Math](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/CMathBinaryFunctors.hpp#L66)

##### 3. Use [_OperationSignature_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationSignature.hpp#L45) and [_OperationFactory_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationFactory.hpp#L48) to support general operation resolution.

* See [_OperationFactory::OperationFactory()_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationFactory.cpp#L85) for how operations are registered.
* See [_Resolver::resolveScalarFunction()_](https://github.com/apache/incubator-quickstep/blob/refactor-type/query_optimizer/resolver/Resolver.cpp#L2889) for how a function from a SQL query gets resolved.

~

#### Part III. TODOs

* A lot of _TODO(refactor-type)_ in the code to be fixed.
* Refactor the predicate system (we will have something like _ComparisonSynthesizer_).
* A lot of unit tests are broken (due to API changes) and need to be fixed.
* Comments and style of template metaprogramming code.
* More to be added ...

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep refactor-type

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/315.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #315

----

commit cb564509c8da64af1c0981ca816f962f94b06602
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-03-04T18:11:13Z

    Refactor type system and operations.

commit 02005508dd4b6813ecc494e2cdfed842b4c93dc4
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-09-28T02:12:59Z

    Some updates

commit ebf44cd2dd230bd45c849cb008005ad9c07b2d60
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-02T05:26:05Z

    Updates for adding generic types

commit a7031a343814bb003353c6b0b75957f66db6240c
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-02T06:30:46Z

    Add array expression

commit b6fd31fec0cd9b1a89eee1fe89af70b85df44bc5
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-02T20:36:22Z

    Continue the work

commit 1e69fb18eb9e7f31c48d85aaef781dca1ba8290a
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-03T04:46:48Z

    Updates for array type

commit bef66ad47a6edb69f76f29c56d528a28ba7760b8
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-03T06:52:29Z

    Updates to meta type

commit 3a3772d91bd94d269b2f2fa49895f53385d46381
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-03T22:03:20Z

    Add text type

commit 0957264b534cb65bbcc28999833f30bd8888856a
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-04T05:26:36Z

    Type as first class citizen

commit 9cb664c802f2a85862dea1cc41a08c989dd579e7
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-04T08:21:44Z

    More updates to types

commit 1cb97e3547846240f3ceb0a7f086dbe912174b72
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-05T22:02:33Z

    More updates, refactor names

commit 477c385d427483d4c2708449927f14268e53c311
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-10T18:20:17Z

    Updates to casts

commit a3aec8e789b66c2c1de64cbd2cdc3fac70b8121b
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-11T08:38:40Z

    Updates to implicit casts

----
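
Illustrative sketch for Part I, item 4 (type casting). The description says coercibility is inferred from whether a _CastFunctor_ specialization exists. One way that idea could work is shown below; it is not taken from the PR's code. Only the primary `CastFunctor` declaration comes from the description. The two specializations and the helper names `IsComplete` and `IsCoercible` are hypothetical, and the real logic in _CastUtil.cpp_ may be organized differently.

```cpp
#include <string>
#include <type_traits>

// Primary template as quoted in the PR description: declared, never defined.
template <typename SourceType, typename TargetType, typename Enable = void>
struct CastFunctor;

// Hypothetical specialization: any arithmetic type to any arithmetic type.
template <typename SourceType, typename TargetType>
struct CastFunctor<SourceType, TargetType,
                   typename std::enable_if<
                       std::is_arithmetic<SourceType>::value &&
                       std::is_arithmetic<TargetType>::value>::type> {
  constexpr TargetType operator()(const SourceType &value) const {
    return static_cast<TargetType>(value);
  }
};

// Hypothetical specialization: int to std::string.
template <>
struct CastFunctor<int, std::string> {
  std::string operator()(const int &value) const {
    return std::to_string(value);
  }
};

// Hypothetical helper: true only if T is a complete type. sizeof(T) is
// ill-formed for an incomplete type, so the partial specialization is
// discarded by SFINAE and the false_type primary is selected instead.
template <typename T, typename = void>
struct IsComplete : std::false_type {};

template <typename T>
struct IsComplete<T, decltype(void(sizeof(T)))> : std::true_type {};

// Hypothetical helper: a cast is available exactly when the corresponding
// CastFunctor specialization has a definition.
template <typename SourceType, typename TargetType>
constexpr bool IsCoercible() {
  return IsComplete<CastFunctor<SourceType, TargetType>>::value;
}
```

Under these assumptions, `IsCoercible<int, double>()` and `IsCoercible<int, std::string>()` evaluate to `true` at compile time, while `IsCoercible<std::string, int>()` is `false` because only the undefined primary template matches. Adding a new cast then amounts to adding one more specialization.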