Re: Dear community

2023-10-20 Thread Tao Li
Hi Liang,
AI technology has broad prospects, and large-model technology is advancing
rapidly.
If we use AI to tune CarbonData's parameters automatically, and also to make
some predictions, CarbonData will become more user-friendly.
It would be very forward-looking to invest effort here.

Re: Dear community

2023-10-20 Thread Indhumathi M
Hi Liang,

I agree with your point about integrating CarbonData with AI and machine
learning, which could help with predictive analytics and automated data
cleaning.

Other potential features we could consider for the next roadmap include
data versioning, time travel, and upgrading the Spark, Flink, Presto, etc.
integrations to newer versions to leverage their new features.

Regards,
Indhumathi M


Re: Dear community

2023-10-19 Thread Kunal Kapoor
Hey Liang and Xu Bo,
AI seems to be a good direction for us to move in.

Ray is also a good option to integrate CarbonData with. It is getting quite
popular and has a strong place in the ML stack.

I suggest upgrading to newer Spark versions, as they have many good features
for AI/ML.
We should also upgrade the Spark version frequently to keep leveraging these
features.

Another idea that came to mind is whether CarbonData could serve as an
offline feature store for model training.
I'm not sure whether this is feasible or even the right approach; we need to
brainstorm on this.

Thanks
Kunal Kapoor


Re: Dear community

2023-10-19 Thread Bo Xu
Agreed. CarbonData has focused on big data so far and has very little
integration with AI, such as PyCarbon, which lets PyTorch and TensorFlow
read data from CarbonData. AI has become very popular recently, and many
customers need a unified data format and storage for big data and AI.
I suggest:
1. Support integrating CarbonData with developer tools such as Jupyter
Notebook and Zeppelin.
2. Improve the usability of CarbonData, for example by making it easy to run
CarbonData on Docker and Kubernetes.
3. Support/enhance the integration of CarbonData with different AI
frameworks, such as TensorFlow/PyTorch/Ray.

I hope CarbonData can become a unified data format and datastore for big
data, data warehousing and AI, so that users can use the same CarbonData
data in different compute engines, such as Spark/Flink/TensorFlow/PyTorch.


Dear community

2023-10-19 Thread Liang Chen
As you know, CarbonData is already quite good and mature as a datastore and
data format.
I want to start this thread on the mailing list to openly discuss what the
next milestones of the CarbonData project should be.
One proposal from my side: we should consider how to integrate with AI
computing engines.

Regards
Liang