Re: Size control of minor compaction

2020-11-25 Thread Kunal Kapoor
The user has to change the application anyway to add a new property.
If we don't change the property name, then at least we can reuse the existing
major compaction size threshold property instead of adding a new one.



Re: Size control of minor compaction

2020-11-24 Thread Zhangshunyu
Hi Akash, if we change the property name, existing users will need to change
many places, such as their application code and cluster config files, to
adapt to this change. What's your opinion? @David @Ajantha



-
My English name is Sunday
--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Size control of minor compaction

2020-11-23 Thread Kunal Kapoor
Hi Zhangshunyu,
We should refactor the code and change the property name from
"carbon.major.compaction.size" to "carbon.compaction.size.threshold" (a
global property that defines the size above which a segment would not be
considered for auto compaction). By doing this we can use the same
threshold for major and minor compaction. Let us avoid adding a new property
for a minor compaction size threshold.

Minor compaction would consider the segments based on
"carbon.compaction.size.threshold" and "carbon.compaction.level.threshold".
Major compaction would consider all segments with size below
"carbon.compaction.size.threshold".
Custom compaction should not consider any property and should do a forced
compaction (existing behaviour).

Thanks
Kunal Kapoor



Re: Size control of minor compaction

2020-11-23 Thread Kunal Kapoor
Hi Zhangshunyu,
We should refactor the code and change the property name from
"carbon.major.compaction.size" to "carbon.compaction.size.threshold" (a
global property that defines the size above which a segment would not be
considered for auto compaction). By doing this we can use the same
threshold for major and minor compaction. Let us avoid adding a new property
for a minor compaction size threshold.

Consider 5 segments when carbon.compaction.threshold = 1GB:

Minor compaction would consider the segments based on
"carbon.compaction.size.threshold" and "carbon.compaction.level.threshold".
Major compaction would consider all segments with size below
"carbon.compaction.size.threshold".
Custom compaction should not consider any property and should do a forced
compaction (existing behaviour).

Thanks
Kunal Kapoor
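
Editor's sketch of the rule proposed above. The message mentions five
segments and a 1GB threshold but does not list the segments, so the sizes
below are made up. The function select_segments and its parameters are
hypothetical and are not CarbonData APIs, and the level threshold is
modelled as a single count, which is a simplification.

```python
def select_segments(kind, sizes_gb, size_threshold_gb,
                    level_threshold=None, custom_ids=None):
    """Pick segment ids for one compaction run under a shared size threshold.

    kind: "minor", "major" or "custom".
    sizes_gb: dict of segment id -> size in GB, in load order.
    """
    if kind == "custom":
        # Custom compaction ignores every property and merges what was asked.
        return list(custom_ids or [])

    # Both minor and major skip segments at or above the shared threshold.
    below = [sid for sid, size in sizes_gb.items() if size < size_threshold_gb]

    if kind == "major":
        return below
    if kind == "minor":
        # Minor additionally waits until enough small segments exist.
        if level_threshold is None or len(below) < level_threshold:
            return []
        return below[:level_threshold]
    raise ValueError(f"unknown compaction type: {kind}")


# Hypothetical sizes for five segments; segment "2" is above the 1GB threshold.
sizes = {"0": 0.2, "1": 0.3, "2": 1.5, "3": 0.4, "4": 0.1}

minor_ids = select_segments("minor", sizes, 1.0, level_threshold=3)
major_ids = select_segments("major", sizes, 1.0)
custom_ids = select_segments("custom", sizes, 1.0, custom_ids=["2", "3"])
```

In this model, segment "2" is skipped by both minor and major compaction,
and only a custom compaction that names it explicitly would merge it.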



Re: Size control of minor compaction

2020-11-23 Thread Zhangshunyu
OK





Re: Size control of minor compaction

2020-11-23 Thread Ajantha Bhat
Hi Zhangshunyu, Thanks for providing more details on the problem.

If it is just for skipping history segments during auto minor compaction,
adding a size threshold for minor compaction should be fine.
We can have a table-level, dynamically configurable threshold.
If it is not configured, consider all the segments for merging. If it is
configured, consider only the segments within the threshold value.

Thanks,
Ajantha
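
Editor's sketch of the "optional threshold" behaviour described above: an
unset threshold means every segment is a candidate, while a set threshold
limits candidates to segments within it. The property key
"minor_compaction_size_mb" and both function names are hypothetical and
only serve this illustration.

```python
from typing import Dict, List, Optional

# Hypothetical table-level property key, assumed for this example only.
SIZE_KEY = "minor_compaction_size_mb"


def resolve_threshold_mb(table_properties: Dict[str, str]) -> Optional[int]:
    """Return the configured threshold in MB, or None when it is not set."""
    raw = table_properties.get(SIZE_KEY)
    return int(raw) if raw is not None else None


def segments_to_merge(segment_sizes_mb: Dict[str, int],
                      table_properties: Dict[str, str]) -> List[str]:
    """Return the ids of segments eligible for auto minor compaction."""
    threshold = resolve_threshold_mb(table_properties)
    if threshold is None:
        # Not configured: keep today's behaviour and consider every segment.
        return list(segment_sizes_mb)
    # Configured: consider only segments within the threshold.
    return [sid for sid, size in segment_sizes_mb.items() if size <= threshold]


# Hypothetical sizes; segment "1" stands in for a large history load.
sizes_mb = {"0": 100, "1": 4096, "2": 150}
unconfigured = segments_to_merge(sizes_mb, {})
configured = segments_to_merge(sizes_mb, {SIZE_KEY: "2048"})
```

Because the properties are read on every call, changing the value between
two compaction runs takes effect on the next run, which is one way to read
"dynamically configurable".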



Re: Size control of minor compaction

2020-11-23 Thread Zhangshunyu
Yes, we need to support auto load merge for major compaction or a size
threshold limit for minor compaction.
In many cases, users of minor compaction only want to merge small segments
in time order (segments are generated in a time series); they don't want to
merge a segment that is already large enough.





Re: Size control of minor compaction

2020-11-23 Thread Zhangshunyu
Hi Ajantha, thanks for the reply.
Many users enable auto load merge for minor compaction, because a segment is
generated every hour.
Sometimes the user also loads history data manually with the load command,
and the segment for that history data is very large. The user doesn't want
this segment to be merged by auto load merge during minor compaction because
it is time-consuming, so they want a parameter that limits the size of
segments included in minor compaction.





Re: Size control of minor compaction

2020-11-23 Thread akashrn5
Hi Sunday,

This looks like a valid scenario, because some user applications may run
minor compaction by default and some may have auto compaction enabled, which
is basically minor compaction, and even if a segment is large we blindly
compact it.

So I think, instead of supporting auto compaction for major/minor as a new
feature or making more changes to the existing code, we can add a little
more intelligence to the code so that only segments smaller than the
threshold size are considered in minor compaction.

Thanks

Regards,
Akash R





Re: Size control of minor compaction

2020-11-23 Thread Ajantha Bhat
Hi Zhangshunyu,

For this specific scenario, the user can use custom compaction by
specifying the segment IDs that need to be considered for compaction.

Also, if you just want size-based compaction, major compaction can be used.

So why are you thinking of supporting size-based minor compaction? It would
basically lose the meaning of combining files based on their number.

If you are using minor compaction for this scenario just because it
supports auto compaction, then maybe we can look into supporting an
"auto_compaction_type" = "minor/major"
option, or the user can write a script to trigger major compaction
automatically.

Thanks,
Ajantha




Size control of minor compaction

2020-11-22 Thread Zhangshunyu
Hi dev,
Currently, minor compaction only considers the number of segments and major
compaction only considers the total size of segments. Consider a scenario
where the user wants to use minor compaction based on the number of segments
but doesn't want to merge segments whose data size exceeds a threshold, for
example 2GB, since merging such a large segment is unnecessary and
time-consuming.
So we need to add a parameter that controls the size threshold of segments
included in minor compaction, so that the user can exclude a segment from
minor compaction once its data size exceeds the threshold. Of course, there
must be a default value.

So, what's your opinion about this?
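
Editor's sketch of the proposal above, contrasting count-only selection with
count-based selection that has a per-segment size cap. The Segment class and
pick_minor_candidates are hypothetical names, not CarbonData code, and the
segment sizes are made up; only the 2GB cap comes from the message.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass(frozen=True)
class Segment:
    segment_id: str
    size_bytes: int


def pick_minor_candidates(segments: List[Segment],
                          count_threshold: int,
                          max_segment_size: Optional[int] = None) -> List[Segment]:
    """Return the segments a count-based minor compaction would merge.

    Segments whose size exceeds max_segment_size are skipped. Nothing is
    merged until count_threshold eligible segments are available.
    """
    eligible = [s for s in segments
                if max_segment_size is None or s.size_bytes <= max_segment_size]
    if len(eligible) < count_threshold:
        return []
    return eligible[:count_threshold]


# Hourly loads plus one large manual history load (segment "1").
segments = [
    Segment("0", 100 * MB),
    Segment("1", 5 * GB),
    Segment("2", 120 * MB),
    Segment("3", 90 * MB),
    Segment("4", 110 * MB),
]

# Count only: the 5GB segment is pulled into the merge.
without_cap = [s.segment_id for s in pick_minor_candidates(segments, 4)]

# Count plus a 2GB cap: the 5GB segment is left alone.
with_cap = [s.segment_id for s in pick_minor_candidates(segments, 4, 2 * GB)]
```

Leaving max_segment_size unset reproduces the count-only behaviour, which is
one possible reading of "there must be a default value".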


