Size control of minor compaction

2020-11-22 Thread Zhangshunyu
Hi dev,
Currently, minor compaction only considers the number of segments, and major
compaction only considers the total size of segments. But consider a scenario
where the user wants to use minor compaction based on the number of segments
but does not want to merge segments whose data size is larger than a
threshold, for example 2GB, as there is no need to merge such big segments
and it is time consuming.
So we need to add a parameter to control the size threshold for segments
included in minor compaction, so that the user can have a segment excluded
from minor compaction once its data size exceeds the threshold. Of course, a
default value must be there.
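
To make the idea concrete, here is a rough sketch of the selection step, in
Python for brevity. The names (Segment, select_for_minor_compaction,
count_threshold, max_segment_size_bytes) are hypothetical and only
illustrate the proposal; they are not existing CarbonData APIs or
properties, and the real default value is still open.

```python
from dataclasses import dataclass
from typing import List

GB = 1024 ** 3


@dataclass(frozen=True)
class Segment:
    segment_id: str
    size_bytes: int


def select_for_minor_compaction(
    segments: List[Segment],
    count_threshold: int,
    max_segment_size_bytes: int,
) -> List[Segment]:
    """Pick segments for one minor compaction, or [] if too few qualify.

    Hypothetical rule: a segment whose size exceeds max_segment_size_bytes
    is skipped, and the count check is applied to the remaining segments.
    """
    eligible = [s for s in segments if s.size_bytes <= max_segment_size_bytes]
    if len(eligible) < count_threshold:
        return []
    # Merge the oldest eligible segments first (input order is load order).
    return eligible[:count_threshold]


if __name__ == "__main__":
    loaded = [
        Segment("0", 3 * GB),   # too big, left out of minor compaction
        Segment("1", 100),
        Segment("2", 200),
        Segment("3", 300),
    ]
    picked = select_for_minor_compaction(loaded, 3, 2 * GB)
    print([s.segment_id for s in picked])
```

With this rule the big segment stays as it is, and the small segments around
it are still merged once enough of them exist.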

So, what's your opinion about this?



-
My English name is Sunday


Re: [DISCUSSION]Merge index property and operations improvement.

2020-11-22 Thread Ajantha Bhat
Hi Akash,
In point 3, you have mentioned there is no need to fail the load if merge
index creation fails. So how do we create the merge index again (as the
first-time query is slow without a merge index) if you block the command for
new tables (as per point 2)? It is contradictory, I guess.

Here are my inputs for this.
*For transactional tables*, as the merge index immediately deletes the
index files, concurrent queries can fail. So:

a) We can avoid exposing index files to the query (user) by marking the load
status as success only after the merge index is created.
Also, update the table status file and segment file only once, after the
merge index is created. No need to update them with index file info before.
Also, keep the maximum retry count here for the table status file update: as
this is the last operation of the load, failing here means a costly retry of
the whole load.

b) After ensuring point a), if the merge index creation fails (which should
not happen in most cases), we can fail the load.

c) We still need to support the Alter table merge index command (mainly
required for the old table upgrade scenario); no need to block it for new
tables. When the user runs it, if index files don't exist (we can know this
by reading the segment file), the command can finish immediately and print a
warning log that no index files are present to merge.

d) The merge index carbon property (carbon.merge.index.in.segment): we can
directly remove it.
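
A rough sketch of the ordering in a) to c), in Python for brevity. All names
here (commit_load, alter_merge_index, LoadFailed, max_retries) are
hypothetical and only show the flow; the actual load and command code in
CarbonData is organized differently, and treating a failed status update as
OSError is an assumption made just for this sketch.

```python
import logging
import time
from typing import Callable, List

log = logging.getLogger("merge_index_sketch")


class LoadFailed(Exception):
    """Raised when the load must be marked as failed."""


def commit_load(
    create_merge_index: Callable[[], None],
    update_table_status: Callable[[], None],
    max_retries: int = 3,
    retry_delay_s: float = 0.0,
) -> int:
    """Create the merge index first, then publish the load as success.

    Returns the attempt number on which the table status update succeeded.
    """
    # Points a) and b): nothing is visible to queries yet, so a merge index
    # failure simply fails the load.
    try:
        create_merge_index()
    except Exception as exc:
        raise LoadFailed("merge index creation failed") from exc

    # Point a): the status update is the last step, so retry it instead of
    # throwing away the whole load on the first failure.
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            update_table_status()
            return attempt
        except OSError as exc:
            last_error = exc
            log.warning("table status update failed (attempt %d)", attempt)
            time.sleep(retry_delay_s)
    raise LoadFailed("table status update failed") from last_error


def alter_merge_index(index_files: List[str], merge: Callable[[List[str]], None]) -> bool:
    """Point c): no-op with a warning when the segment has no index files."""
    if not index_files:
        log.warning("no index files present to merge")
        return False
    merge(index_files)
    return True
```

Here index_files stands for whatever the segment file lists, so a new table
that already has a merge index just returns False without doing any work.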


Thanks,
Ajantha

On Mon, Nov 9, 2020 at 1:56 PM Akash Nilugal  wrote:

> Hi All,
>
> Currently, we have the merge index feature which can be enabled or disabled
> and by default it's enabled.
> Now during load or compaction, we first create index files and then create
> the merge index.
> If merge index generation fails, we don't fail the load; we have the alter
> compact command to merge the unmerged index files.
>
> Here are a few things I want to suggest.
>
> 1. Deprecate the merge index property and keep it only for developer
> purposes.
> 2. Do not allow the alter compact merge index command for new tables, as
> the merge index is already created, and allow it only for legacy tables.
>    Alter merge index can be allowed only in the below conditions:
>    a) when an update has happened on the segment.
>    b) when merge index creation failed during load or compaction.
> 3. Also no need to fail the load if the merge index fails (same as existing
> behavior).
>
> Please suggest any modifications or any additions to this.
>
> Thanks
>
> Regards,
> Akash R Nilugal
>