Complex DataType Enhancements

2018-06-03 Thread sounak
Hi Dev,

Complex types (also referred to as nested types) let you represent multiple
data values within a single row/column position. CarbonData already supports
complex types, but they lack major enhancements that are already present for
the primitive datatypes. As complex type usage is increasing, we are planning
to enhance the coverage of complex types and apply some major optimizations.
I am listing a few of the optimizations we have thought of.

I request the community to go through the list and give your valuable
suggestions.

1. Adaptive Encoding for Complex Type Pages: Currently, complex type pages
do not have any encoding applied, which leads to higher IO compared to other
datatypes. Complex pages should be on par with the encoding mechanisms of
the other datatypes.

2. Optimize Array Type Reading: Optimize complex type array reading so that
arrays can be read faster. One way is to reduce the read IO for arrays by
applying an encoding mechanism such as Adaptive or RLE to the array data.

3. Filter and Projection Push-Down for Complex Datatypes: As of now, filters
and projections on complex datatypes are handled in the upper Spark layer.
If they are pushed down, Carbon will get better performance, since less IO
is incurred and not all rows need to be sent back to Spark for processing
(see the sketch after point 8).

4. Support Multilevel Nesting in Complex Datatypes: Only two levels of
nesting are supported for complex datatypes through Load and Insert Into.
Extend this to n-level nesting (also shown in the sketch after point 8).

5. Update and Delete Support for Complex Datatypes: Currently, Update and
Delete in CarbonData work only for primitive datatypes. Complex datatypes
should be supported for these DML operations too.

6. Alter Table Support for Complex Datatypes: Alter Table does not support
adding or dropping complex columns as of now. This support needs to be
extended.

7. Map Datatype Support: Only the Struct and Array datatypes are part of the
complex datatypes as of now. The Map datatype should be added as well (also
shown in the sketch after point 8).

8. Compaction Support for Complex Datatypes: Compaction works for primitive
datatypes; it should be extended to complex datatypes too.
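
To make points 3, 4 and 7 more concrete, here is a rough spark-shell sketch
of the kind of DDL and query these enhancements target. The table and column
names are hypothetical, and both the MAP column and the deeper nesting
illustrate the proposed behaviour rather than what CarbonData supports today.

// Hypothetical table with nested complex columns. The MAP column
// (point 7) and the >2-level nesting (point 4) are proposals.
sql("""
  CREATE TABLE customer_events (
    id BIGINT,
    profile STRUCT<name:STRING,
                   addresses:ARRAY<STRUCT<city:STRING, zip:INT>>>,
    tags MAP<STRING, STRING>
  )
  STORED BY 'carbondata'
""")

// Today the filter and the projection below are evaluated in the Spark
// layer; with push-down (point 3) Carbon could prune rows and columns
// itself and read less data.
sql("""
  SELECT id, profile.name
  FROM customer_events
  WHERE profile.addresses[0].city = 'Bangalore'
""").show()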


Good to have features
--
9. Geospatial Support through Complex Datatypes: Represent geospatial
datatypes such as ST_GEOMETRY and XML objects through complex datatypes.

10. Complex Datatype Transformation: One complex datatype can be transformed
into a different complex datatype. For example, a user inserted data with a
ComplexA datatype but wants to transform the data and retrieve it as a
ComplexB datatype (a rough sketch follows this list).

11. Virtual Tables for Complex Datatypes: Currently a complex column resides
in a single column, but through virtual tables the complex columns can be
denormalized and placed into a separate table, called a virtual table, for
faster processing, faster joins, and use in sort columns.

12. Include Complex Datatypes in Sort Columns.
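
As a rough illustration of point 10, and reusing the hypothetical
customer_events table from the sketch above, such a transformation could be
expressed with Spark SQL's named_struct; whether a reshaping like this runs
inside Carbon or in the Spark layer is exactly what this enhancement would
define.

// Hypothetical: data loaded with one struct shape ("ComplexA") is read
// back reshaped into another shape ("ComplexB").
sql("""
  SELECT id,
         named_struct('fullName',  profile.name,
                      'firstCity', profile.addresses[0].city) AS profile_b
  FROM customer_events
""").show()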

Please let me know your suggestions on these enhancements.

Thanks a lot

-- 
Thanks
Sounak


Re: Support updating/deleting data for stream table

2018-06-03 Thread Liang Chen
Hi

+1 for considering Solution 1 first

Regards
Liang

xm_zzc wrote
> Hi Raghu:
>   Yes, you are right; that is why I said Solution 1 is not very precise
> when some of the data you want to update/delete is still stored in stream
> segments. Solution 2 can handle the scenario you mentioned.
>   But, in my opinion, the scenario of deleting historical data is more
> common than that of updating data. The data size of a stream table grows
> day by day, and users generally want to delete specific data so that the
> data size does not get too large. For example, if a user wants to keep
> data for one year, he needs to delete data older than one year every day.
> On the other hand, Solution 2 is more complicated than Solution 1, and we
> need to consider its implementation in depth.
>   Based on the above reasons, Liang Chen, Jacky, David and I preferred to
> implement Solution 1 first. Is that OK for you?
>
>   Is there any other suggestion?
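
For reference, the daily retention-style deletion described above would look
roughly like the sketch below at the SQL level (table and column names are
hypothetical; with Solution 1 it would only affect data that has already
been converted to columnar segments, not rows still sitting in stream
segments):

// Hypothetical daily job: remove rows older than one year.
sql("""
  DELETE FROM stream_tbl
  WHERE event_time < date_sub(current_date(), 365)
""")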







Re: MODERATE for dev@carbondata.apache.org

2018-06-03 Thread Liang Chen
Hi

1. You can get table detail info with the script below (a fuller example
follows point 2):
sql("desc formatted your_table_name")

2. You can find more detailed docs about datamaps at ../docs/datamap
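
For point 1, a small spark-shell sketch (the table name is a placeholder);
passing truncate = false to show() keeps Spark from cutting off the long
property values:

// Placeholder database/table name; prints the table details untruncated.
sql("DESC FORMATTED your_db.your_table").show(200, truncate = false)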

Regards
Liang

2018-05-31 17:59 GMT+08:00 <
dev-reject-1527760781.11669.gamfjekkdhlpcbigj...@carbondata.apache.org>:

> -- Forwarded message --
> From: "陈星宇" 
> To: dev 
> Cc:
> Bcc:
> Date: Thu, 31 May 2018 17:59:32 +0800
> Subject: carbondata table detail
> hi,
> In CarbonData 1.3.1, I created some tables with many properties. After
> that, I wanted to check the table properties with 'show create table
> tablename', but I only got a result like:
> CREATE TABLE `xx`.`xxx` (`xx` STRING,xxx int )
> USING org.apache.spark.sql.CarbonSource
> OPTIONS (
>   `dbName` 'xx',
>   `carbonSchemaPartsNo` '2',
>   `serialization.format` '1',
>   `tableName` 'xx',
>   `isVisible` 'true',
>   `tablePath` 'xxx',
>   path 'xxx'
> )
>
> Also, I didn't find any way to check the datamap's details.
>
> Any suggestions?
> thanks
>
> chenxingyu
>
>