Hi Vimal,

The design doc looks clear. Can you also add the file format storage design
for the map datatype?

Regards,
Ravi.

On 17 October 2016 at 07:43, Liang Chen <chenliang6...@gmail.com> wrote:

> Hi Vimal
>
> Thank you for starting the discussion.
> Since Map keys can only be primitive, can you list the key types that
> will be supported (Int, String, Double, ...)?
>
> To make the discussion more convenient, you can go ahead and use Google Docs.
> After the design document is finalized, please archive it and upload it to
> cwiki: https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Home
>
> Regards
> Liang
>
>
> Vimal Das Kammath wrote
> > Hi All,
> >
> > This discussion is regarding support for Map Data type in Carbon Data.
> >
> > Carbon Data supports complex and nested data types such as Arrays and
> > Structs. However, Carbon Data does not support other complex data types
> > such as Maps and Unions, which are generally supported by popular open
> > source file formats.
> >
> >
> > Supporting Map data type will require changes/additions to the DDL, Query
> > Syntax, Data Loading and Storage.
> >
> >
> > I have hosted the design on google docs for review and discussion.
> >
> > https://docs.google.com/document/d/1U6wPohvdDHk0B7bONnVHWa6PKG8R9q5-oKMqzMMQHYY/edit?usp=sharing
> >
> >
> > Below is the same inline.
> >
> >
> > 1.  DDL Changes
> >
> > Maps are key->value data types, where the value can be fetched by
> > providing the key. Hence we need to restrict keys to primitive data types,
> > whereas values can be of any data type supported in Carbon (primitive and
> > complex).
> >
> > Map data types can be defined in the create table DDL as :-
> >
> > “MAP<primitive_data_type, data_type>”
> >
> > For Example:-
> >
> > create table example_table (id Int, name String, salary Int,
> > salary_breakup map<String, Int>, city String)
> >
> >
> > 2.  Data Loading Changes
> >
> > Carbon should be able to support loading data into tables with Map type
> > columns from csv files. It should be possible to represent maps in a
> > single row of csv. This will need Carbon to support specifying the
> > delimiters:-
> >
> > 1.     Between two Key-Value pairs
> >
> > 2.     Between each Key and Value in a pair
> >
> > As Carbon already supports Struct and Array complex types, the data
> > loading process already provides support for defining delimiters for
> > complex data types. Carbon provides two optional parameters for data
> > loading:
> >
> > 1.     COMPLEX_DELIMITER_LEVEL_1: will define the delimiter between two
> > Key-Value pairs
> >
> > OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')
> >
> > 2.     COMPLEX_DELIMITER_LEVEL_2: will define the delimiter between each
> > Key and Value in a pair
> >
> > OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')
> >
> > With these delimiter options, the below map can be represented in csv:-
> >
> > Fixed->100,000
> >
> > Bonus->30,000
> >
> > Stock->40,000
> >
> > As
> >
> > Fixed:100,000$Bonus:30,000$Stock:40,000 in the csv file.
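
A minimal sketch of how a loader might split such a cell with the two delimiter levels. This is illustrative Python, not CarbonData code: the function name parse_map_cell and its parameters are hypothetical, and values are kept as strings. Note that because the values in this example contain commas, the cell would have to be quoted in a comma-separated file.

```python
def parse_map_cell(cell, pair_delim="$", kv_delim=":"):
    """Split one CSV cell into a dict using the two delimiter levels.

    pair_delim plays the role of COMPLEX_DELIMITER_LEVEL_1 (between pairs),
    kv_delim plays the role of COMPLEX_DELIMITER_LEVEL_2 (between key and value).
    """
    result = {}
    if not cell:
        return result
    for pair in cell.split(pair_delim):
        # Split on the first key/value delimiter only, so that a value
        # may itself contain the delimiter character.
        key, value = pair.split(kv_delim, 1)
        result[key] = value
    return result


cell = "Fixed:100,000$Bonus:30,000$Stock:40,000"
print(parse_map_cell(cell))
# {'Fixed': '100,000', 'Bonus': '30,000', 'Stock': '40,000'}
```
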
> >
> >
> >
> > 3.  Query Capabilities
> >
> > A complex datatype like Map will require additional operators to be
> > supported in the query language to fully utilize the strength of the data
> > type.
> >
> > Maps are sequences of key-value pairs, and hence should support looking
> > up the value for a given key. Users could use the ColumnName[“key”] syntax
> > to look up values in a map column. For example: salary_breakup[“Fixed”]
> > could be used to fetch only the Fixed component in the salary breakup.
> >
> > In addition, we also need to define how maps can be used in existing
> > constructs such as select, where (filter), group by, etc.
> >
> > 1.     Select:- A Map column can be selected directly, or only the value
> > for a given key can be selected, as required. For example:-
> >
> > “Select name, salary, salary_breakup” will return the content of the map
> > along with each row.
> >
> > “Select name, salary, salary_breakup[“Fixed”]” will return only the one
> > value from the map whose key is “Fixed”.
> >
> > 2.     Filter:- A Map column cannot be used directly in a where clause, as
> > a where clause can operate only on primitive data types. However, the map
> > lookup operator can be used in where clauses. For example:-
> >
> > “Select name, salary where salary_breakup[“Bonus”]>10,000”
> >
> > *Note: if the value is not of a primitive type, further accessor operators
> > need to be used, depending on the type of the value, to arrive at a
> > primitive type for the filter expression to be valid.*
> >
> > 3.     Group By:- Just as with filters, maps cannot be used directly in a
> > group by clause; however, the lookup operator can be used.
> >
> > 4.     Functions:- A size() function can be provided for map types to
> > determine the number of key-value pairs in a map.
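
The intended semantics of the lookup operator in select, filter, group by and size() can be sketched in plain Python, with a dict standing in for a map column. The rows, names and the lookup helper below are hypothetical. Returning None for a missing key is an assumption, since the proposal does not say what a lookup on an absent key should yield.

```python
from collections import Counter

# Hypothetical rows; names and salaries are made up for illustration.
rows = [
    {"name": "a", "salary": 170000,
     "salary_breakup": {"Fixed": 100000, "Bonus": 30000, "Stock": 40000}},
    {"name": "b", "salary": 170000,
     "salary_breakup": {"Fixed": 140000, "Bonus": 30000}},
    {"name": "c", "salary": 170000,
     "salary_breakup": {"Fixed": 120000, "Bonus": 20000, "Stock": 30000}},
]


def lookup(map_value, key):
    # Assumption: a missing key yields None (similar to SQL NULL).
    return map_value.get(key)


# Select name, salary, salary_breakup["Fixed"]
selected = [(r["name"], r["salary"], lookup(r["salary_breakup"], "Fixed"))
            for r in rows]


def bonus_above(row, threshold):
    bonus = lookup(row["salary_breakup"], "Bonus")
    return bonus is not None and bonus > threshold


# Select name where salary_breakup["Bonus"] > 25000
# (threshold chosen here so that the filter actually drops a row)
filtered = [r["name"] for r in rows if bonus_above(r, 25000)]

# Group by salary_breakup["Bonus"], counting the rows in each group
groups = Counter(lookup(r["salary_breakup"], "Bonus") for r in rows)

# size(salary_breakup) for each row
sizes = [len(r["salary_breakup"]) for r in rows]
```

In each case the map column is reduced to a primitive value by the lookup before the filter or grouping is applied, which is the restriction described in points 2 and 3.
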
> >
> > 4.  Storage changes
> >
> > As Carbon is a columnar data store, Map values will be stored using 3
> > physical columns:
> >
> > 1.     One column for representing the Map data type. It will store the
> > number of fields and the start index, in the same way as is done for
> > Structs and Arrays.
> >
> > 2.     One Column for the Key
> >
> > 3.     One Column for the value, if the value is of primitive data type,
> > else the value itself will be multiple physical columns depending on the
> > data type of the value.
> >
> > Map<String,Int>
> >
> > Column_1             Column_2                 Column_3
> > Map_Salary_Breakup   Map_Salary_Breakup.key   Map_Salary_Breakup.value
> > 3,1                  Fixed                    1,00,000
> >                      Bonus                    30,000
> >                      Stock                    40,000
> > 2,4                  Fixed                    1,40,000
> >                      Bonus                    30,000
> > 3,6                  Fixed                    1,20,000
> >                      Bonus                    20,000
> >                      Stock                    30,000
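
The three-column layout above can be sketched as follows. This is only an in-memory illustration in Python, not the actual CarbonData file format: flatten_maps and lookup_flat are hypothetical names, and the parent column is modelled as (number of fields, 1-based start index) pairs to match the 3,1 / 2,4 / 3,6 entries in the example.

```python
def flatten_maps(maps):
    """Flatten a list of dicts into parent, key and value columns."""
    parent, keys, values = [], [], []
    for m in maps:
        # Record (number of fields, 1-based start index) for this row.
        parent.append((len(m), len(keys) + 1))
        for k, v in m.items():
            keys.append(k)
            values.append(v)
    return parent, keys, values


def lookup_flat(parent, keys, values, row, key):
    """Return the value for `key` in the map of the given row, or None."""
    count, start = parent[row]
    # Convert the 1-based start index to a 0-based slice of the child columns.
    for i in range(start - 1, start - 1 + count):
        if keys[i] == key:
            return values[i]
    return None


maps = [
    {"Fixed": 100000, "Bonus": 30000, "Stock": 40000},
    {"Fixed": 140000, "Bonus": 30000},
    {"Fixed": 120000, "Bonus": 20000, "Stock": 30000},
]
parent, keys, values = flatten_maps(maps)
print(parent)  # [(3, 1), (2, 4), (3, 6)]
print(lookup_flat(parent, keys, values, 1, "Fixed"))  # 140000
```

A lookup only needs to scan the slice of the key column that belongs to one row, which is why the parent column carries both the count and the start index.
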
> >
> > Regards
> > Vimal
>
>
>
>
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-New-feature-Support-Complex-Data-Type-Map-in-Carbon-Data-tp1969p1985.html
> Sent from the Apache CarbonData Mailing List archive at Nabble.com.
>



-- 
Thanks & Regards,
Ravi
