[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-11-08 Thread roryqi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17440423#comment-17440423
 ] 

roryqi commented on FLINK-13247:


[~maguowei] Can I give me your email? I want to communicate with you about the 
remote shuffle service?

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-09-30 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422627#comment-17422627
 ] 

Guowei Ma commented on FLINK-13247:
---

Hi [~dangshazi] 

I am glad you are interested in the this topic. Currently there is no public 
doc yet.

But we are plan to open source our  remote shuffle service in mid-to-late 
October.

 

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-09-26 Thread LI Mingkun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420436#comment-17420436
 ] 

LI Mingkun commented on FLINK-13247:


Is there any related MR or design doc? [~maguowei] [~wind_ljy]

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-04-20 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325814#comment-17325814
 ] 

Guowei Ma commented on FLINK-13247:
---

I am very glad that you are interested in this topic. [~wind_ljy]

We also implemented a Remote Shuffle Service based on 1.13, this is mainly 
taking into account the increasingly common situation of storage and computing 
separation and containerization. At present, it is mainly deployed on k8s, of 
course, it is not a big problem to deploy to yarn. In the process of 
implementation, we also found some minor problems with pluggable shuffle 
service architecture, and we plan to initiate some discussions in the community.

If you are also interested, we can take time to chat offline.

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Major
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-04-20 Thread Jiayi Liao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325805#comment-17325805
 ] 

Jiayi Liao commented on FLINK-13247:


[~trohrmann][~zjwang] It's okay for me. I'll draft a desgin ASAP.

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Major
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-04-20 Thread Zhijiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325754#comment-17325754
 ] 

Zhijiang commented on FLINK-13247:
--

[~wind_ljy] Glad to hear that you have implemented this feature in your company 
and desired to contriute it. I believe it would be a good enhancement to Flink 
batch jobs. But I guess most of the committers are pretty busy for the current 
release work ATM, so it's better to prepare your detail design and then the 
community can judge whether it would be covered in the next release cycle.

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Major
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-04-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325603#comment-17325603
 ] 

Till Rohrmann commented on FLINK-13247:
---

[~wind_ljy] if you can contribute the external shuffle service to Flink then 
this would be really good. I think this feature will be quite helpful for our 
batch/batch-execution-mode users because it makes the failover much cheaper 
also in cases where a TM dies.

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Major
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-04-18 Thread Marek Simunek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324601#comment-17324601
 ] 

Marek Simunek commented on FLINK-13247:
---

Very very glad to hear. Frankly running batch jobs without external shuffle 
service is not economically viable due to wasted resources. So in our company 
it's must have to start preferring Flink over spark for batch jobs.

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Minor
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-04-17 Thread Jiayi Liao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324189#comment-17324189
 ] 

Jiayi Liao commented on FLINK-13247:


Actually we've already implemented Yarn shuffle service in our internal 
version, and also can be contributed to Flink community. I can summarize our 
implementation and write a detailed design before 5th May, if the community 
still needs this. What do you think [~zjwang]? 

 

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Minor
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-01-19 Thread Zhijiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268352#comment-17268352
 ] 

Zhijiang commented on FLINK-13247:
--

Thanks [~trohrmann] for the operation. Actually I tried to do that while last 
commenting, but failed unexpectedly :(

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Priority: Minor
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-01-18 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267099#comment-17267099
 ] 

Till Rohrmann commented on FLINK-13247:
---

Thanks for the update [~zjwang]. I will unassign [~ssy] then.

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Assignee: MalcolmSanders
>Priority: Minor
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-01-17 Thread Zhijiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267020#comment-17267020
 ] 

Zhijiang commented on FLINK-13247:
--

AFAIK, the current assignee [~ssy] will not work on it any more since he has 
already transferred to a new work position.  But I am not sure whether there 
are other candidates to take over this issue in the future plan.

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Assignee: MalcolmSanders
>Priority: Minor
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-01-14 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264757#comment-17264757
 ] 

Till Rohrmann commented on FLINK-13247:
---

At the moment I am not aware that the community is actively working on it 
[~marek.simunek]. But I am pulling in [~maguowei] who should know better than 
me.

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Assignee: MalcolmSanders
>Priority: Minor
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13247) Implement external shuffle service for YARN

2021-01-14 Thread Marek Simunek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264747#comment-17264747
 ] 

Marek Simunek commented on FLINK-13247:
---

Hi, is there some raw plan when it will be finished?  [~ssy] 

> Implement external shuffle service for YARN
> ---
>
> Key: FLINK-13247
> URL: https://issues.apache.org/jira/browse/FLINK-13247
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: MalcolmSanders
>Assignee: MalcolmSanders
>Priority: Minor
>
> Flink batch job users could achieve better cluster utilization and job 
> throughput throught external shuffle service because the producers of 
> intermedia result partitions can be released once intermedia result 
> partitions have been persisted on disks. In 
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. I propose to YARN implementation for flink external shuffle service 
> since YARN is widely used in various companies.
> The basic idea is as follows:
> (1) Producers write intermedia result partitions to local disks assigned by 
> NodeManager;
> (2) Yarn shuffle servers, deployed on each NodeManager as an auxiliary 
> service, are acknowledged of intermedia result partition descriptions by 
> producers;
> (3) Consumers fetch intermedia result partition from yarn shuffle servers;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)