Hi Dev,

We have identified the scope of the phase 1 activities for the complex type
enhancements. Below are the phase 1 enhancement activities:

- Predicate push down for the struct data type.
- Provide adaptive encoding and decoding for all data types.
- Support loading JSON data directly into a Carbon table.

Please find the detailed design document attached to the JIRA
[CARBONDATA-2605]:
https://issues.apache.org/jira/browse/CARBONDATA-2605

Thanks,
Sounak

On Mon, Jun 4, 2018 at 8:10 AM sounak <[email protected]> wrote:

> Hi Dev,
>
> Complex types (also referred to as nested types) let you represent
> multiple data values within a single row/column position.
> CarbonData already supports complex types, but it lacks major
> enhancements that are already present for primitive datatypes. As
> complex type usage is increasing, we are planning to enhance the coverage
> of complex types and apply some major optimizations. I am listing a few of
> the optimizations which we have thought of.
>
> I request the community to go through the list and give your
> valuable suggestions.
>
> 1. Adaptive Encoding for Complex Type Page: Currently the complex type
> page has no encoding, which leads to higher IO compared to other
> datatypes. The complex page should be on par with the encoding mechanism
> of other datatypes.
>
> 2. Optimize Array Type Reading: Optimize complex type array reading so
> that arrays can be read faster. One way is to reduce the read IO for
> arrays by applying an encoding mechanism like Adaptive or RLE to the
> array datatype.
>
> 3. Filter and Projection Push Down for Complex Datatypes: As of now,
> filters and projections on complex datatypes are handled in the upper
> Spark layer. If they are pushed down, Carbon will perform better,
> because less IO is incurred when not all rows need to be sent back to
> Spark for processing.
>
> 4. Support Multilevel Nesting in Complex Datatypes: Only 2 levels of
> nesting are supported for complex datatypes through Load and Insert Into.
> Extend this to n levels.
>
> 5. Update and Delete Support for Complex Datatypes: Currently, only
> primitive datatypes work for Update and Delete in CarbonData. Support
> complex datatypes too for these DML operations.
>
> 6. Alter Table Support for Complex Datatypes: Alter table doesn't support
> addition or deletion of complex columns as of now. This support needs to
> be extended.
>
> 7. Map Datatype Support: Only Struct and Array are part of the complex
> datatypes as of now. The Map datatype should be added as a complex
> datatype.
>
> 8. Compaction Support for Complex Datatypes: Compaction works for
> primitive datatypes, but should be extended to complex datatypes too.
>
>
> Good to have features
> ------------------------------
> 9. Geospatial Support through Complex Datatypes: Represent geospatial
> datatypes like ST_GEOMETRY and XML objects through complex datatypes.
>
> 10. Complex Datatype Transformation: One complex datatype can be
> transformed into a different complex datatype. For example, a user
> inserted data with the ComplexA datatype but wants to transform the data
> and retrieve it as the ComplexB datatype.
>
> 11. Virtual Tables for Complex Datatypes: Currently complex columns reside
> in one column, but through virtual tables, the complex columns can be
> denormalized and placed into a separate table, called a virtual table,
> for faster processing and joins, and for use in sort columns.
>
> 12. Including Complex Datatypes in Sort Columns.
>
> Please let me know your suggestions on these enhancements.
>
> Thanks a lot
>
> --
> Thanks
> Sounak
>

--
Thanks
Sounak
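
To make the adaptive encoding item above a little more concrete, here is a minimal Python sketch of the general idea only. It is not CarbonData's actual implementation or on-disk format; the helper names (`choose_width`, `adaptive_encode`, `adaptive_decode`) are hypothetical and exist just for illustration. The assumption is that a page of integer values is stored with the narrowest fixed width that can hold its minimum and maximum, so pages with small values cost fewer bytes of IO.

```python
import struct

# Candidate signed widths as (bytes, struct format code).
# These names are illustrative only and are not CarbonData APIs.
_WIDTHS = [(1, "b"), (2, "h"), (4, "i"), (8, "q")]


def choose_width(lo, hi):
    """Return the smallest (size, code) whose signed range covers [lo, hi]."""
    for size, code in _WIDTHS:
        bound = 1 << (8 * size - 1)
        if -bound <= lo and hi < bound:
            return size, code
    raise ValueError("value does not fit in 64 bits")


def adaptive_encode(values):
    """Pack a page of ints using the narrowest width that fits all of them."""
    if not values:
        return 1, b""
    size, code = choose_width(min(values), max(values))
    payload = struct.pack("<%d%s" % (len(values), code), *values)
    return size, payload


def adaptive_decode(size, payload):
    """Reverse adaptive_encode given the stored width and the raw bytes."""
    code = dict(_WIDTHS)[size]
    count = len(payload) // size
    return list(struct.unpack("<%d%s" % (count, code), payload))


if __name__ == "__main__":
    page = [3, -7, 120]
    size, payload = adaptive_encode(page)
    print(size, len(payload), adaptive_decode(size, payload))
```

In this sketch the chosen width has to be stored next to the page so that the reader can decode it. A real encoder would likely also consider delta or RLE variants, which are left out here.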
