Hey Liang and Xu Bo,
AI seems to be a good direction to move forward.

Ray is also a good option to integrate CarbonData with. It is getting quite
popular and has a strong place in the ML stack.

I suggest upgrading to newer Spark versions, as they have many good features
for AI/ML.
We should also upgrade the Spark version frequently to keep leveraging these
features.

Another idea that popped into my head is whether CarbonData could serve as an
offline feature store for model training.
I am not sure whether this is feasible or even the right approach; we need to
brainstorm on this.

Thanks
Kunal Kapoor


On Thu, 19 Oct, 2023, 7:36 pm Bo Xu, <xubo...@apache.org> wrote:

> Agreed. CarbonData has focused on big data so far and has only limited
> integration with AI, such as PyCarbon, which lets PyTorch and TensorFlow
> read data from CarbonData. AI has become very popular recently, and many
> customers need a unified data format and storage for big data and AI.
> I suggest:
> 1. Support integrating CarbonData with developer tools, such as Jupyter
> Notebook and Zeppelin.
> 2. Improve the usability of CarbonData, for example by making it easy to
> run CarbonData on Docker and Kubernetes.
> 3. Support and enhance the integration of CarbonData with different AI
> frameworks, such as TensorFlow, PyTorch, and Ray.
>
> I hope CarbonData can become a unified data format and datastore for
> big data, data warehousing, and AI, so that users can use the same
> CarbonData data across different compute engines, such as Spark, Flink,
> TensorFlow, and PyTorch.
>
>
> On 2023/10/19 07:58:14 Liang Chen wrote:
> > As you know, CarbonData is already quite good and mature as a datastore
> > and data format.
> > I want to start this thread on the mailing list to openly discuss: what
> > are the next milestones for the CarbonData project?
> > One proposal from my side: we should consider how to integrate with AI
> > computing engines.
> >
> > Regards
> > Liang
> >
>
