Re: [Discussion] Roadmap for Apache CarbonData 2

manish gupta Tue, 13 Aug 2019 21:35:01 -0700

Hi Team

It is good to see how CarbonData has grown and become popular over time.
It was important to take a fresh look and come up with a roadmap for
future needs. The CarbonData 2.0 proposal looks good, as we are trying to
align it with the cloud, which will more or less be the prominent runtime
environment in the near future. The roadmap will require a lot of code
refactoring. I would like to add a couple of points.


1. Complex type support: Although we do have complex type support, there is
scope for improvement. Use cases for nested columns are growing
rapidly. We should work on improving the storage of nested columns and
should also support creating compound/multi-column indexes for the nested
columns.
2. Feature code segregation and pluggability: The current code is tightly
coupled. The ideal case would be to have a base and make all the features
pluggable into it, but that will be hard to achieve. We can try segregation
at the package level for major features, but any new feature should be
designed with pluggability in mind.
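
To illustrate point 2, here is a minimal sketch of what pluggability could
look like. It is plain Python, not CarbonData code, and FeatureRegistry,
register, apply and bloom_index_feature are hypothetical names made up for
this illustration. The only idea it shows is that the base exposes a
registration point and never imports a feature directly.

```python
from typing import Callable, Dict, List


class FeatureRegistry:
    """Hypothetical base component that knows nothing about concrete features."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        # Reject duplicates so two plugins cannot silently override each other.
        if name in self._features:
            raise ValueError(f"feature already registered: {name}")
        self._features[name] = handler

    def enabled(self) -> List[str]:
        # Names of all features currently plugged in, in sorted order.
        return sorted(self._features)

    def apply(self, name: str, context: dict) -> dict:
        # The base only dispatches; the feature logic lives in the plugin.
        return self._features[name](context)


def bloom_index_feature(context: dict) -> dict:
    # Hypothetical feature: returns a new context instead of mutating the input.
    return {**context, "index": "bloom"}


registry = FeatureRegistry()
registry.register("bloom_index", bloom_index_feature)
result = registry.apply("bloom_index", {"table": "sales"})
```

With this shape, dropping a feature means not registering it, and the base
package has no compile-time dependency on any feature package.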

[Clarification] Carbon UI: I did not understand the use of the Carbon
segment management UI. For the cloud scenario we will have to expose REST
endpoints, which will make Carbon more like a microservice, and that does
not fit the CarbonData use case. A UI/tool makes more sense for internal
testing, but I am not sure how it will benefit the end user. Maybe a tool
showing the data stored in each table would be more useful to the end user.

Regards
Manish Gupta

On Tue, Aug 13, 2019 at 4:51 PM Kumar Vishal <kumarvishal1...@gmail.com>
wrote:

> Hi Ravi,
>
> We can add below requirements in 2.0:
>
> 1. Data loading performance improvement (need to analyze and improve).
> 2. Unify reading of the carbon data file. Currently data is read in two
> parts, dimension and measure, which increases the number of I/O operations.
> 3. Carbon store size optimization (a PR is already raised and needs to be
> revisited); we can also explore further optimizations (like RLE hybrid
> bit packing).
> 4. Presto enhancements (like write support, Presto SQL adaptation, complex
> type read support).
> 5. Spark Data Source V2 integration.
> 6. Spatial Index Support.
>
>
> -Regards
> Kumar Vishal
>
> On Thu, Jul 18, 2019 at 8:20 PM Ravindra Pesala <ravi.pes...@gmail.com>
> wrote:
>
> > Hi Kevin,
> >
> > Yes, we can improve it. The implementation is closely related to
> > supporting pre-aggregate datamaps on streaming tables, which we
> > implemented some time ago. The same will be reimplemented for the MV
> > datamap soon as well.
> > The implementation uses the pre-aggregate datamap for non-streaming
> > segments and the main table for streaming segments. We update the query
> > plan to do a union of both tables and query only the streaming segments
> > of the main table.
> > So in our case we can do the same: run a union query over the MV table
> > and the main table (only the segments not yet loaded into the datamap)
> > and execute it. We can definitely consider this after we support
> > streaming tables for the MV datamap.
> >
> > Regards,
> > Ravindra.
> >
> > On Wed, 17 Jul 2019 at 07:55, kevinjmh <kevin...@qq.com> wrote:
> >
> > > Currently, a datamap in Carbon applies to all segments.
> > > The roadmap refers to commands like add/drop segment, and maybe also
> > > something about incremental loading for MV. For these scenarios, it is
> > > better to make the datamap usable at the segment level instead of
> > > disabling the datamap when its data is not ready for some segment.
> > > This can also make the datamap fail-safe and enhance Carbon's
> > > stability.
> > > Maybe we can consider this as well.
> > >
> > >
> > >
> > >
> > > -----
> > > Regards
> > > Manhua
> > >
> > >
> > >
> > > ---Original---
> > > From: "Ravindra Pesala"<ravi.pes...@gmail.com>
> > > Date: Tue, Jul 16, 2019 22:31 PM
> > > To: "dev"<dev@carbondata.apache.org>;
> > > Subject: [Discussion] Roadmap for Apache CarbonData 2
> > >
> > >
> > > Hi Community,
> > >
> > > Three years have passed since the launch of the Apache CarbonData
> > > project, and CarbonData has become a popular data management solution
> > > for various scenarios. As new workloads like AI and new runtime
> > > environments like the cloud are emerging quickly, I think we have
> > > reached a point where we need to discuss the future of CarbonData.
> > >
> > > To bring CarbonData to a new level to satisfy those new requirements,
> > > Jacky and I drafted a roadmap for CarbonData 2 on the cwiki website.
> > > - English Version:
> > > https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+2+Roadmap+Proposal
> > > - Chinese Version:
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120737492
> > >
> > > Please feel free to discuss the roadmap in this thread. We welcome
> > > all feedback to make CarbonData better.
> > >
> > > Thanks and Regards,
> > > Ravindra.
> >
> >
> >
> > --
> > Thanks & Regards,
> > Ravi
> >
>
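
The segment-level fallback discussed in the quoted thread (use the datamap
for segments where its data is ready, read the main table for the remaining
segments, and union the two results) can be sketched roughly as below. This
is a plain Python illustration under simplifying assumptions, not CarbonData
code: Segment, plan_sum_query and execute_sum_query are hypothetical names,
and the "datamap" is reduced to a pre-computed SUM per segment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    segment_id: str
    rows: List[int]        # raw values as stored in the main table
    datamap_ready: bool    # True if pre-aggregated data exists for this segment
    datamap_sum: int = 0   # pre-aggregated SUM, meaningful only when datamap_ready


def plan_sum_query(segments: List[Segment]) -> Dict[str, List[str]]:
    # Decide per segment which side of the union will serve it.
    plan: Dict[str, List[str]] = {"datamap": [], "main_table": []}
    for seg in segments:
        side = "datamap" if seg.datamap_ready else "main_table"
        plan[side].append(seg.segment_id)
    return plan


def execute_sum_query(segments: List[Segment]) -> int:
    # Union of both sides: pre-aggregated values where available,
    # a scan of the raw rows for segments the datamap has not covered yet.
    total = 0
    for seg in segments:
        total += seg.datamap_sum if seg.datamap_ready else sum(seg.rows)
    return total


segments = [
    Segment("0", [1, 2, 3], True, 6),
    Segment("1", [4, 5], True, 9),
    Segment("2", [10, 20], False),  # datamap not loaded for this segment yet
]
plan = plan_sum_query(segments)
total = execute_sum_query(segments)
```

In this sketch a segment without datamap data only costs a scan of that one
segment; the datamap stays usable for all the others instead of being
disabled as a whole.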
