Carbon over-uses cluster resources

2020-04-02 Thread Manhua Jiang
Hi All,
Recently, I found that Carbon over-uses cluster resources. In general, the Carbon
workflow is not designed like a common Spark task, which does one small piece of
work in one thread; instead, each Carbon task runs its own internal logic.

For example:
1. Launch Carbon with --num-executors=1 but set
carbon.number.of.cores.while.loading=10.
2. For a no_sort table with multi-block input (say N iterators),
Carbon starts N tasks in parallel, and in each task
CarbonFactDataHandlerColumnar has model.getNumberOfCores() (say C) threads in its
ProducerPool, so N*C threads are launched in total. ==> This is the case that made me
treat this as a serious problem: too many threads keep the executor from sending
heartbeats, and the executor gets killed.
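The thread multiplication in example 2 can be illustrated with a small simulation. This is a hypothetical Python sketch, not Carbon code: run_load, task and producer are made-up names, each task's private ThreadPoolExecutor stands in for the per-task ProducerPool sized by model.getNumberOfCores(), and a barrier is used so that the function only completes normally if all N*C producer threads really exist at the same time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_load(num_tasks: int, cores_per_task: int) -> int:
    """Simulate num_tasks parallel load tasks, each owning a private pool of
    cores_per_task producer threads, and return the peak number of producer
    threads alive at the same time (hypothetical model, not Carbon code)."""
    lock = threading.Lock()
    active = 0
    peak = 0
    # Every producer blocks here until all producers of all tasks have arrived,
    # so the barrier only opens if num_tasks * cores_per_task threads coexist.
    barrier = threading.Barrier(num_tasks * cores_per_task, timeout=10)

    def producer() -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        barrier.wait()
        with lock:
            active -= 1

    def task() -> None:
        # Hypothetical stand-in for one task's ProducerPool with C threads:
        # the pool is created per task, so pools multiply with the task count.
        with ThreadPoolExecutor(max_workers=cores_per_task) as pool:
            for _ in range(cores_per_task):
                pool.submit(producer)

    # Hypothetical stand-ins for the N Spark tasks running in one executor.
    tasks = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return peak


if __name__ == "__main__":
    # N = 4 parallel tasks, C = 10 cores per task.
    print(run_load(4, 10))  # prints 40
```

With N = 4 and C = 10 the peak is 40 producer threads, regardless of how many cores the executor was actually granted.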

So, the over-use is related to how thread pools are used.
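One possible direction, sketched here only under stated assumptions, is for all tasks in an executor to share one bounded pool instead of each creating its own. This is a hypothetical Python illustration, not a proposed Carbon patch: run_load_shared and executor_cores are invented names, and executor_cores is assumed to be the core budget the executor was actually granted. The number of producer threads is then capped by the pool size, while every task's jobs still complete (excess jobs wait in the pool's queue).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def run_load_shared(num_tasks: int, cores_per_task: int,
                    executor_cores: int) -> Tuple[int, int]:
    """Run num_tasks * cores_per_task producer jobs on ONE pool shared by all
    tasks and capped at executor_cores threads (hypothetical model).
    Returns (peak concurrent producers, number of jobs completed)."""
    lock = threading.Lock()
    active = 0
    peak = 0
    done = 0

    def producer() -> None:
        nonlocal active, peak, done
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # placeholder for real work, e.g. encoding a page
        with lock:
            active -= 1
            done += 1

    # A single executor-wide pool: its size, not the number of tasks,
    # bounds how many producer threads can exist.
    with ThreadPoolExecutor(max_workers=executor_cores) as shared_pool:

        def task() -> None:
            # Each task still splits its work into C jobs, but queues them
            # on the shared pool instead of starting its own threads.
            futures = [shared_pool.submit(producer)
                       for _ in range(cores_per_task)]
            for f in futures:
                f.result()

        tasks = [threading.Thread(target=task) for _ in range(num_tasks)]
        for t in tasks:
            t.start()
        for t in tasks:
            t.join()
    return peak, done
```

For example, run_load_shared(4, 10, 4) still completes all 40 jobs, but never runs more than 4 producers at once. The trade-off is that a task may wait for pool capacity instead of running all of its C producers immediately.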

This would affect the overall resource usage of the cluster and may lead to
misleading performance results.

I hope this gets your attention when fixing or writing new code.


Re: [VOTE] Apache CarbonData 2.0.0(RC1) release

2020-04-02 Thread Ajantha Bhat
Hi,
For rc1, my comment is: -1

Similar points as Liang, but in addition: after #3661, many documentation links
in ReadMe.md for MV, bloom and lucene datamap are broken.
We need to fix them soon, before the CarbonData 2.0.0 release.

Thanks,
Ajantha

On Thu, Apr 2, 2020 at 4:26 PM Liang Chen  wrote:

> Hi
>
> Thanks for preparing 2.0.0.
> For rc1, my comment is: -1 (binding)
> The following open issues should be considered in 2.0.0:
>
> https://github.com/apache/carbondata/pull/3675
> https://github.com/apache/carbondata/pull/3687
> https://github.com/apache/carbondata/pull/3682
> https://github.com/apache/carbondata/pull/3691
> https://github.com/apache/carbondata/pull/3689
> https://github.com/apache/carbondata/pull/3686
> https://github.com/apache/carbondata/pull/3683
> https://github.com/apache/carbondata/pull/3676
> https://github.com/apache/carbondata/pull/3690
> https://github.com/apache/carbondata/pull/3688
> https://github.com/apache/carbondata/pull/3639
> https://github.com/apache/carbondata/pull/3659
> https://github.com/apache/carbondata/pull/3669
>
> Regards
> Liang


Re: [VOTE] Apache CarbonData 2.0.0(RC1) release

2020-04-02 Thread Liang Chen
Hi

Thanks for preparing 2.0.0.
For rc1, my comment is: -1 (binding)
The following open issues should be considered in 2.0.0:

https://github.com/apache/carbondata/pull/3675
https://github.com/apache/carbondata/pull/3687
https://github.com/apache/carbondata/pull/3682
https://github.com/apache/carbondata/pull/3691
https://github.com/apache/carbondata/pull/3689
https://github.com/apache/carbondata/pull/3686
https://github.com/apache/carbondata/pull/3683
https://github.com/apache/carbondata/pull/3676
https://github.com/apache/carbondata/pull/3690
https://github.com/apache/carbondata/pull/3688
https://github.com/apache/carbondata/pull/3639
https://github.com/apache/carbondata/pull/3659
https://github.com/apache/carbondata/pull/3669

Regards
Liang

kunalkapoor wrote
> Hi All,
> 
> I submit the Apache CarbonData 2.0.0(RC1) for your vote.
> 
> 
> *1. Release Notes:*
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12346046
> 
> *Some key features and improvements in this release:*
> 
>- Adapt to SparkSessionExtensions
>- Support integration with Spark 2.4.5
>- Support heterogeneous format segments in CarbonData
>- Support writing Flink streaming data to Carbon
>- Insert from stage command supports partition tables
>- Support secondary index on Carbon tables
>- Support querying stage files
>- Support time-based cache expiration using ExpiringMap
>- Improve insert into performance and decrease memory footprint
> 
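> *2. The tag to be voted upon*: apache-carbondata-2.0.0-rc1
> <https://github.com/apache/carbondata/tree/apache-carbondata-2.0.0-rc1>
> 
> Commit: a906785f73f297b4a71c8aaeabae82ae690fb1c3
> <https://github.com/apache/carbondata/commit/a906785f73f297b4a71c8aaeabae82ae690fb1c3>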
> 
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc1/
> 
> *4. A staged Maven repository is available for review at:*
> https://repository.apache.org/content/repositories/orgapachecarbondata-1060/
> 
> *5. Release artifacts are signed with the following key:*
> https://people.apache.org/keys/committer/kunalkapoor.asc
> 
> 
> Please vote on releasing this package as Apache CarbonData 2.0.0. The vote will
> be open for the next 72 hours and passes if a majority of at least three +1
> PMC votes are cast.
> 
> [ ] +1 Release this package as Apache CarbonData 2.0.0
> 
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Regards,
> Kunal Kapoor


--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [ANNOUNCE] Kunal Kapoor as new PMC for Apache CarbonData

2020-04-02 Thread Tao Li
Congratulations Kunal ~~

On 2020/03/29 07:07:04, Liang Chen  wrote: 
> Hi
> 
> 
> We are pleased to announce that Kunal Kapoor is a new PMC member of Apache
> CarbonData.
> 
> 
> Congrats to Kunal Kapoor!
> 
> 
> Apache CarbonData PMC
> 


Re: [ANNOUNCE] Kunal Kapoor as new PMC for Apache CarbonData

2020-04-02 Thread Tao Li
Congratulations Kunal.




Re: [VOTE] Apache CarbonData 2.0.0(RC1) release

2020-04-02 Thread Akash r
+1

Regards,
Akash R Nilugal

On Wed, Apr 1, 2020 at 9:54 PM Kunal Kapoor wrote:

> Hi All,
>
> I submit the Apache CarbonData 2.0.0(RC1) for your vote.
>
>
> *1. Release Notes:*
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12346046
>
> *Some key features and improvements in this release:*
>
>- Adapt to SparkSessionExtensions
>- Support integration with Spark 2.4.5
>- Support heterogeneous format segments in CarbonData
>- Support writing Flink streaming data to Carbon
>- Insert from stage command supports partition tables
>- Support secondary index on Carbon tables
>- Support querying stage files
>- Support time-based cache expiration using ExpiringMap
>- Improve insert into performance and decrease memory footprint
>
> *2. The tag to be voted upon*: apache-carbondata-2.0.0-rc1
> <https://github.com/apache/carbondata/tree/apache-carbondata-2.0.0-rc1>
>
> Commit: a906785f73f297b4a71c8aaeabae82ae690fb1c3
> <https://github.com/apache/carbondata/commit/a906785f73f297b4a71c8aaeabae82ae690fb1c3>
>
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc1/
>
> *4. A staged Maven repository is available for review at:*
>
> https://repository.apache.org/content/repositories/orgapachecarbondata-1060/
>
> *5. Release artifacts are signed with the following key:*
> https://people.apache.org/keys/committer/kunalkapoor.asc
>
>
> Please vote on releasing this package as Apache CarbonData 2.0.0. The vote will
> be open for the next 72 hours and passes if a majority of at least three +1
> PMC votes are cast.
>
> [ ] +1 Release this package as Apache CarbonData 2.0.0
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>
>
> Regards,
> Kunal Kapoor
>
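The pass condition in the vote call reads like the ASF "majority approval" rule: at least three binding +1 votes and more binding +1 than binding -1 votes. Below is a minimal Python sketch of that check, assuming the caller has already filtered out non-binding votes; release_vote_passes is a hypothetical helper, and neither the 72-hour window nor the release manager's discretion to cancel a vote is modeled.

```python
from typing import Iterable


def release_vote_passes(binding_votes: Iterable[int]) -> bool:
    """Hypothetical check of ASF majority approval over binding (PMC) votes,
    each given as +1, 0 or -1. A -1 is not a veto: it only counts against
    the +1 votes."""
    votes = list(binding_votes)
    plus = votes.count(1)
    minus = votes.count(-1)
    return plus >= 3 and plus > minus
```

Under this reading, three binding +1 votes with one binding -1 would pass, while two +1 votes never pass no matter how many 0 votes are cast.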