flink batch data processing

2016-07-22 Thread Paul Joireman
I'm evaluating Flink for processing batches of data.  As a simple example, say
I have 2000 points which I would like to pass through an FIR filter using
functionality provided by the Python scipy library.  The scipy filter is a
simple function which accepts a set of coefficients and the data to filter and
returns the filtered data.  Is it possible to create a transformation to handle
this in Flink?  It seems Flink transformations are applied on a point-by-point
basis, but I may be missing something.

Paul


Re: flink batch data processing

2016-07-26 Thread Ufuk Celebi
Are you using the DataSet or DataStream API?

Yes, most Flink transformations operate on single tuples, but you can
work around it:
- You could write a custom source function, which emits records that
contain X points
(https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#data-sources)
- You can use a mapPartition
(https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/dataset_transformations.html#mappartition)
or FlatMap 
(https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/dataset_transformations.html#flatmap)
function and create the batches manually.

Does this help?
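
A minimal sketch of the "create the batches manually" idea, in plain Python
rather than Flink code. It only illustrates the pattern: the body of a
mapPartition-style function receives an iterator of single points, groups them
into batches, and hands each batch to a function that works on a whole array.
The names fir_filter, batches and filter_partition are hypothetical, and
fir_filter is a pure-Python stand-in for a library call such as
scipy.signal.lfilter(b, [1.0], x); the coefficients are chosen only for
illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def fir_filter(coefficients: Sequence[float], samples: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[n] = sum over k of b[k] * x[n - k], with x[n] = 0 for n < 0.

    Stand-in (assumption) for a batch-oriented library function.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, b in enumerate(coefficients):
            if n - k >= 0:
                acc += b * samples[n - k]
        output.append(acc)
    return output


def batches(points: Iterable[float], size: int) -> Iterator[List[float]]:
    """Group a point-by-point stream into lists of at most `size` points."""
    iterator = iter(points)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def filter_partition(
    points: Iterable[float], coefficients: Sequence[float], batch_size: int
) -> Iterator[float]:
    """What the body of a mapPartition-style function might do (hypothetical)."""
    for batch in batches(points, batch_size):
        # Each batch is filtered independently, so filter state is not
        # carried across batch boundaries.
        yield from fir_filter(coefficients, batch)


points = [float(i) for i in range(2000)]
coefficients = [0.5, 0.5]  # two-tap moving average, illustrative only
filtered = list(filter_partition(points, coefficients, batch_size=2000))
```

With batch_size equal to the number of points, the whole data set is filtered
in a single call. Smaller batches restart the filter at every batch boundary,
which may or may not be acceptable depending on the application.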



[ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Yingjie Cao
Hi dev & users,

We are happy to announce that the remote shuffle project [1] for Flink is now
open source. The project originated at Alibaba, and its main motivation is to
improve the performance and stability of batch data processing and to further
embrace cloud-native deployment. For more details on the project's features,
please refer to [1].

Before going open source, the project was used widely in production, where it
has behaved well in both stability and performance. We hope you enjoy it.
Collaboration and feedback are highly appreciated.

Best,
Yingjie on behalf of all contributors

[1] https://github.com/flink-extended/flink-remote-shuffle


Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Till Rohrmann
Great news, Yingjie. Thanks a lot for sharing this information with the
community and kudos to all the contributors of the external shuffle service
:-)

Cheers,
Till



Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Yun Tang
Great news!
Thanks to everyone who contributed to this project.

Best
Yun Tang



Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Jingsong Li
Amazing!

Thanks Yingjie and all contributors for your great work.

Best,
Jingsong




-- 
Best, Jingsong Lee


Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Yun Gao
Many thanks for all the warm responses! We warmly welcome more use cases and
collaboration on Flink Remote Shuffle and batch processing with Flink.

Best,
Yun


--
From: Yingjie Cao
Sent: Wed, Dec 1, 2021, 11:16
To: dev
Subject: Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

Hi Jing,

Great to hear that. Collaboration and feedback are welcome.

Best,
Yingjie

On Wed, Dec 1, 2021 at 10:34 AM Jing Zhang wrote:

> Amazing!
> The remote shuffle service is an important improvement to the batch data
> processing experience on Flink.
> It is also a strong requirement in our internal batch business; we will
> try it soon and give you feedback.
>
> Best,
> Jing Zhang
>
> On Wed, Dec 1, 2021 at 3:25 AM Martijn Visser wrote:
>
> > Hi Yingjie,
> >
> > This is great, thanks for sharing. Will you also add it to
> > https://flink-packages.org/ ?
> >
> > Best regards,
> >
> > Martijn
> >



Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread 刘建刚
Good work for Flink's batch processing!
A remote shuffle service can resolve the container-lost problem and reduce
the running time of batch jobs after a failover. We have investigated this
kind of component a lot and welcome Flink's native solution. We will try it
and help improve it.

Thanks,
Liu Jiangang



Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-12-01 Thread Yingjie Cao
Hi Jiangang,

Great to hear that. You are welcome to work with us to make the project better.

Best,
Yingjie



Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-12-05 Thread Lijie Wang
As one of the contributors to Flink remote shuffle, I'm glad to see all the
warm responses! We welcome more people to try Flink remote shuffle and look
forward to your feedback.

Best,
Lijie
