Re: [Discussion] Carbon Local Dictionary Support

2018-06-08 Thread akashrn5
1. If the user gives an invalid value, the default threshold (1000 unique
values) will be considered. What is the consideration behind the default
value 1000?
*1000 is an arbitrary value we have mentioned in the design doc.
CARBON_LOCALDICT_THRESHOLD is exposed to the user for setting the threshold
value based on their use case.*
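
For illustration, with the property names proposed in the design doc, the
threshold could be overridden per table at creation time as in the sketch
below (table and column names are only examples):

CREATE TABLE sales (
  item_name string,
  city string,
  quantity int
) STORED BY 'carbondata'
TBLPROPERTIES('ENABLE_LOCAL_DICT'='true', 'CARBON_LOCALDICT_THRESHOLD'='10000')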

2. There is no option mentioned for the user to alter the table once the
ENABLE_LOCAL_DICT and CARBON_LOCALDICT_THRESHOLD values are set. This would
also help with compatibility if we want to generate a local dictionary for
tables created in previous versions.
*For a new load on an old table the local dictionary will be generated,
since local dictionary generation is enabled by default. An alter command
for setting the CARBON_LOCALDICT_THRESHOLD and ENABLE_LOCAL_DICT properties
will be provided for older tables, and this will be updated in the design
doc (see the sketch after point 3 below). Thank you for pointing this out.*

3. There should be validation provided if the user inputs ENABLE_LOCAL_DICT
as false and tries to set a CARBON_LOCALDICT_THRESHOLD value.
*The threshold value will not be considered if ENABLE_LOCAL_DICT is false.*
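
Putting points 2 and 3 together, a sketch of the intended behaviour is shown
below; the exact alter syntax will be confirmed in the updated design doc, and
the table name is hypothetical:

ALTER TABLE sales SET TBLPROPERTIES(
  'ENABLE_LOCAL_DICT'='true',
  'CARBON_LOCALDICT_THRESHOLD'='5000')

-- Here the threshold is ignored because ENABLE_LOCAL_DICT is false
ALTER TABLE sales SET TBLPROPERTIES(
  'ENABLE_LOCAL_DICT'='false',
  'CARBON_LOCALDICT_THRESHOLD'='5000')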

4. The impact of ALTER TABLE add/drop/change column type is not mentioned.
*There is no impact; that is why it is not captured in the design doc's
impact analysis section.*

5. Would complex types also be considered for local dictionary?
*It will be handled for the no-dictionary String primitive children of
complex columns.*

6. For any column, if the number of dictionary values crosses the threshold
(carbon_localdict_threshold), then it will drop the dictionary for that
column. Could not understand “drop dictionary for that column”.
*The local dictionary will not be considered for the respective column.*

7. For better testability, information regarding generation and updating of
the local dictionary can be logged.
*Logs will be added at each level of local dictionary generation.*




Re: Use RowStreamParserImp as default value of config 'carbon.stream.parser'

2018-06-08 Thread Jacky Li
+1.

I think this change is fine.

Regards,
Jacky

> On June 8, 2018, at 3:10 PM, David CaiQiang wrote:
> 
> +1, I agree with using RowStreamParserImpl by default.
> 
> 
> -
> Best Regards
> David Cai
> 





Re: carbondata partitioned by date generate many small files

2018-06-08 Thread Jacky Li
Hi, 

I couldn’t see the picture you sent; can you send it as text?

Regards,
Jacky

> On June 6, 2018, at 9:46 AM, 陈星宇 wrote:
> 
> Hi Li,
> Yes, I got the partition folder as you said, but under the partition folder
> there are many small files, just like in the following picture.
> How can they be merged automatically after the jobs are done?
> 
> 
> thanks
> 
> ChenXingYu
>  
>  
> -- Original --
> From:  "Jacky Li";
> Date:  Tue, Jun 5, 2018 08:43 PM
> To:  "dev";
> Subject:  Re: carbondata partitioned by date generate many small files
>  
> Hi,
> 
> There is a testcase in StandardPartitionTableQueryTestCase that uses a date
> column as the partition column; if you run that testcase, the partition
> folder generated looks like the following picture.
>  
> 
> Are you getting similar folders?
> 
> Regards,
> Jacky
> 
>> On June 5, 2018, at 2:49 PM, 陈星宇 wrote:
>> 
>> Hi CarbonData team,
>> 
>> 
>> I am using CarbonData 1.3.1 to create a table and import data. It generated
>> many small files and the Spark job is very slow. I suspect the number of
>> files is related to the number of Spark jobs, but if I decrease the jobs,
>> the job will fail because of OutOfMemory. See my DDL below:
>> 
>> 
>> create table xx.xx(
>> dept_name string,
>> xx
>> .
>> .
>> .
>> ) PARTITIONED BY (xxx date)
>> STORED BY 'carbondata' TBLPROPERTIES('SORT_COLUMNS'='xxx,xxx,xxx ,xxx,xxx')
>> 
>> 
>> 
>> Please give some advice.
>> 
>> 
>> thanks
>> 
>> 
>> ChenXingYu
> 



Re: [Discussion] Carbon Local Dictionary Support

2018-06-08 Thread chetdb
Dear Vishal,

Please find below my queries/comments on the design doc.

1.  If the user gives an invalid value, the default threshold (1000 unique
values) will be considered. What is the consideration behind the default
value 1000?
2.  There is no option mentioned for the user to alter the table once the
ENABLE_LOCAL_DICT and CARBON_LOCALDICT_THRESHOLD values are set. This would
also help with compatibility if we want to generate a local dictionary for
tables created in previous Carbon versions.
3.  There should be validation provided if the user inputs ENABLE_LOCAL_DICT
as false and tries to set a CARBON_LOCALDICT_THRESHOLD value.
4.  The impact of ALTER TABLE add/drop/change column type is not mentioned.
5.  Would complex types also be considered for local dictionary?
6.  For any column, if the number of dictionary values crosses the threshold
(carbon_localdict_threshold), then it will drop the dictionary for that
column. Could not understand “drop dictionary for that column”.
7.  For better testability, information regarding generation and updating of
the local dictionary can be logged.

Regards

Chetan 






Re: [Discussion] Carbon Local Dictionary Support

2018-06-08 Thread akashrn5
Hi bhavya,

Local dictionary generation is at the task level. If the threshold is
breached in an ongoing load, then for that load the local dictionary will
not be generated for the corresponding column; there is no dependency on
previous loads. For each load a new local dictionary is generated.

Regards,
Akash r Nilugal





Re: Use RowStreamParserImp as default value of config 'carbon.stream.parser'

2018-06-08 Thread David CaiQiang
+1, I agree with using RowStreamParserImpl by default.



-
Best Regards
David Cai


Re: [Discussion] Carbon Local Dictionary Support

2018-06-08 Thread akashrn5
Hi xuchuanyin,

Please find my comments inline

About query filtering 

1. “during filter, actual filter values will be generated using column local 
dictionary values...then filter will be applied on the dictionary encode 
data” 
--- 
If the filter is not 'equal' but 'like' or 'greater than', can it also run on
the encoded data?

*For range-type filters, it will be handled the same way as a global
dictionary column.*

2. "As dictionary data will be always of 4 bytes " 
--- 
Why are they 4 bytes? 

*The dictionary value/data is nothing but the integer value assigned to the
dictionary key, so it will be 4 bytes.*
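
As a rough arithmetic illustration (the row count is only an assumed example):
a column page holding 32,000 rows encoded against its local dictionary stores
32,000 * 4 = 128,000 bytes of encoded values, plus the dictionary entries
themselves, instead of one full string value per row.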




