Re: yarn session: one JVM per task
Perfect. No problem. My Bad. Not really clear. Thanks ! Le mar. 25 févr. 2020 à 13:45, Xintong Song a écrit : > Ah, I misunderstood and thought that you want to keep all your Sink > instances on the same TM. > > If what you want is to have one instance per TM, then as Gary mentioned > specifying "-s 1" at starting the session would be enough, and it should > work with all existing versions above (including) 1.8. > > Thank you~ > > Xintong Song > > > > On Tue, Feb 25, 2020 at 7:41 PM David Morin > wrote: > >> Hi Gary, >> >> Sorry I was probably not very clear. >> Yes that's exactly what I want to hear :) >> I use the -s 1 parameter and what I expect to have is one task of my Sink >> (one instance in fact) per TM (i.e. per JVM) >> That's the current behaviour during my tests but I want to be sure. >> Thanks a lot >> >> David >> >> Le mar. 25 févr. 2020 à 11:16, Gary Yao a écrit : >> >>> Hi David, >>> >>> Before with the both n and -s it was not the case. >>> >>> What do you mean by before? At least in 1.8 "-s" could be used to >>> specify the >>> number of slots per TM. >>> >>> >>> how can I be sure that my Sink that uses this lib is in one JVM ? >>> >>> Is it enough that no other parallel instance of your sink runs in the >>> same >>> JVM? If that is the case, it is enough to start your your YARN session >>> with: >>> >>> ./bin/yarn-session.sh -s 1 [...] >>> >>> This will result in exactly one slot per TM. Note that a single slot may >>> still >>> hold several subtasks of the job (Slot Sharing) but never two parallel >>> instances of your sink [2]. You can also control Slot Sharing manually >>> [3]. >>> >>> >>> So, if I understand I have to keep this Flink release (1.9.2) ? >>> >>> I don't see why 1.10.0 would not work for you. >>> >>> >>> Best, >>> Gary >>> >>> [1] >>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session >>> [2] >>> https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources >>> [3] >>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups >>> >>> On Tue, Feb 25, 2020 at 10:28 AM David Morin >>> wrote: >>> Hi Xintong, At the moment I'm using the 1.9.2 with this command: yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline" So, after a lot of tests, I've noticed that if I increase the parallelism of my Custom Sink, each task is embedded into one TS and, the most important, each one into one TaskManager (Yarn container in fact). So, if I understand I have to keep this Flink release (1.9.2) ? Thanks David Le mar. 25 févr. 2020 à 02:02, Xintong Song a écrit : > Depending on your Flink version, the '-n' option might not take > effect. It is removed in the latest release, but before that there were a > few versions where this option is neither removed nor taking effect. > > Anyway, as long as you have multiple containers, I don't think there's > a way to make some of the tasks scheduled to the same JVM. Not that I'm > aware of. > > > Thank you~ > > Xintong Song > > > > On Mon, Feb 24, 2020 at 8:43 PM David Morin > wrote: > >> Hi, >> >> Thanks Xintong. >> I've noticed than when I use yarn-session.sh with --slots (-s) >> parameter but without --container (-n) it creates one task/slot per >> taskmanager. Before with the both n and -s it was not the case. >> I prefer to use only small container with only one task to scale my >> pipeline and of course to prevent from thread-safe issue >> Do you think I cannot be confident on that behaviour ? >> >> Regards, >> David >> >> On 2020/02/22 17:11:25, David Morin >> wrote: >> > Hi, >> > My app is based on a lib that is not thread safe (yet...). >> > In waiting of the patch has been pushed, how can I be sure that my >> Sink that uses this lib is in one JVM ? >> > Context: I use one Yarn session and send my Flink jobs to this >> session >> > >> > Regards, >> > David >> > >> >
Re: yarn session: one JVM per task
Ah, I misunderstood and thought that you want to keep all your Sink instances on the same TM. If what you want is to have one instance per TM, then as Gary mentioned specifying "-s 1" at starting the session would be enough, and it should work with all existing versions above (including) 1.8. Thank you~ Xintong Song On Tue, Feb 25, 2020 at 7:41 PM David Morin wrote: > Hi Gary, > > Sorry I was probably not very clear. > Yes that's exactly what I want to hear :) > I use the -s 1 parameter and what I expect to have is one task of my Sink > (one instance in fact) per TM (i.e. per JVM) > That's the current behaviour during my tests but I want to be sure. > Thanks a lot > > David > > Le mar. 25 févr. 2020 à 11:16, Gary Yao a écrit : > >> Hi David, >> >> Before with the both n and -s it was not the case. >>> >> >> What do you mean by before? At least in 1.8 "-s" could be used to specify >> the >> number of slots per TM. >> >> >> how can I be sure that my Sink that uses this lib is in one JVM ? >>> >> >> Is it enough that no other parallel instance of your sink runs in the same >> JVM? If that is the case, it is enough to start your your YARN session >> with: >> >> ./bin/yarn-session.sh -s 1 [...] >> >> This will result in exactly one slot per TM. Note that a single slot may >> still >> hold several subtasks of the job (Slot Sharing) but never two parallel >> instances of your sink [2]. You can also control Slot Sharing manually >> [3]. >> >> >> So, if I understand I have to keep this Flink release (1.9.2) ? >>> >> >> I don't see why 1.10.0 would not work for you. >> >> >> Best, >> Gary >> >> [1] >> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session >> [2] >> https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources >> [3] >> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups >> >> On Tue, Feb 25, 2020 at 10:28 AM David Morin >> wrote: >> >>> Hi Xintong, >>> >>> At the moment I'm using the 1.9.2 with this command: >>>yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm >>> "MyPipeline" >>> So, after a lot of tests, I've noticed that if I increase the >>> parallelism of my Custom Sink, each task is embedded into one TS and, the >>> most important, each one into one TaskManager (Yarn container in fact). >>> So, if I understand I have to keep this Flink release (1.9.2) ? >>> >>> Thanks >>> David >>> >>> >>> >>> Le mar. 25 févr. 2020 à 02:02, Xintong Song a >>> écrit : >>> Depending on your Flink version, the '-n' option might not take effect. It is removed in the latest release, but before that there were a few versions where this option is neither removed nor taking effect. Anyway, as long as you have multiple containers, I don't think there's a way to make some of the tasks scheduled to the same JVM. Not that I'm aware of. Thank you~ Xintong Song On Mon, Feb 24, 2020 at 8:43 PM David Morin wrote: > Hi, > > Thanks Xintong. > I've noticed than when I use yarn-session.sh with --slots (-s) > parameter but without --container (-n) it creates one task/slot per > taskmanager. Before with the both n and -s it was not the case. > I prefer to use only small container with only one task to scale my > pipeline and of course to prevent from thread-safe issue > Do you think I cannot be confident on that behaviour ? > > Regards, > David > > On 2020/02/22 17:11:25, David Morin > wrote: > > Hi, > > My app is based on a lib that is not thread safe (yet...). > > In waiting of the patch has been pushed, how can I be sure that my > Sink that uses this lib is in one JVM ? > > Context: I use one Yarn session and send my Flink jobs to this > session > > > > Regards, > > David > > >
Re: yarn session: one JVM per task
Hi Gary, Sorry I was probably not very clear. Yes that's exactly what I want to hear :) I use the -s 1 parameter and what I expect to have is one task of my Sink (one instance in fact) per TM (i.e. per JVM) That's the current behaviour during my tests but I want to be sure. Thanks a lot David Le mar. 25 févr. 2020 à 11:16, Gary Yao a écrit : > Hi David, > > Before with the both n and -s it was not the case. >> > > What do you mean by before? At least in 1.8 "-s" could be used to specify > the > number of slots per TM. > > > how can I be sure that my Sink that uses this lib is in one JVM ? >> > > Is it enough that no other parallel instance of your sink runs in the same > JVM? If that is the case, it is enough to start your your YARN session > with: > > ./bin/yarn-session.sh -s 1 [...] > > This will result in exactly one slot per TM. Note that a single slot may > still > hold several subtasks of the job (Slot Sharing) but never two parallel > instances of your sink [2]. You can also control Slot Sharing manually [3]. > > > So, if I understand I have to keep this Flink release (1.9.2) ? >> > > I don't see why 1.10.0 would not work for you. > > > Best, > Gary > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session > [2] > https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources > [3] > https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups > > On Tue, Feb 25, 2020 at 10:28 AM David Morin > wrote: > >> Hi Xintong, >> >> At the moment I'm using the 1.9.2 with this command: >>yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline" >> So, after a lot of tests, I've noticed that if I increase the parallelism >> of my Custom Sink, each task is embedded into one TS and, the most >> important, each one into one TaskManager (Yarn container in fact). >> So, if I understand I have to keep this Flink release (1.9.2) ? >> >> Thanks >> David >> >> >> >> Le mar. 25 févr. 2020 à 02:02, Xintong Song a >> écrit : >> >>> Depending on your Flink version, the '-n' option might not take effect. >>> It is removed in the latest release, but before that there were a few >>> versions where this option is neither removed nor taking effect. >>> >>> Anyway, as long as you have multiple containers, I don't think there's a >>> way to make some of the tasks scheduled to the same JVM. Not that I'm aware >>> of. >>> >>> >>> Thank you~ >>> >>> Xintong Song >>> >>> >>> >>> On Mon, Feb 24, 2020 at 8:43 PM David Morin >>> wrote: >>> Hi, Thanks Xintong. I've noticed than when I use yarn-session.sh with --slots (-s) parameter but without --container (-n) it creates one task/slot per taskmanager. Before with the both n and -s it was not the case. I prefer to use only small container with only one task to scale my pipeline and of course to prevent from thread-safe issue Do you think I cannot be confident on that behaviour ? Regards, David On 2020/02/22 17:11:25, David Morin wrote: > Hi, > My app is based on a lib that is not thread safe (yet...). > In waiting of the patch has been pushed, how can I be sure that my Sink that uses this lib is in one JVM ? > Context: I use one Yarn session and send my Flink jobs to this session > > Regards, > David > >>>
Re: yarn session: one JVM per task
Hi David, Before with the both n and -s it was not the case. > What do you mean by before? At least in 1.8 "-s" could be used to specify the number of slots per TM. how can I be sure that my Sink that uses this lib is in one JVM ? > Is it enough that no other parallel instance of your sink runs in the same JVM? If that is the case, it is enough to start your your YARN session with: ./bin/yarn-session.sh -s 1 [...] This will result in exactly one slot per TM. Note that a single slot may still hold several subtasks of the job (Slot Sharing) but never two parallel instances of your sink [2]. You can also control Slot Sharing manually [3]. So, if I understand I have to keep this Flink release (1.9.2) ? > I don't see why 1.10.0 would not work for you. Best, Gary [1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session [2] https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources [3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups On Tue, Feb 25, 2020 at 10:28 AM David Morin wrote: > Hi Xintong, > > At the moment I'm using the 1.9.2 with this command: >yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline" > So, after a lot of tests, I've noticed that if I increase the parallelism > of my Custom Sink, each task is embedded into one TS and, the most > important, each one into one TaskManager (Yarn container in fact). > So, if I understand I have to keep this Flink release (1.9.2) ? > > Thanks > David > > > > Le mar. 25 févr. 2020 à 02:02, Xintong Song a > écrit : > >> Depending on your Flink version, the '-n' option might not take effect. >> It is removed in the latest release, but before that there were a few >> versions where this option is neither removed nor taking effect. >> >> Anyway, as long as you have multiple containers, I don't think there's a >> way to make some of the tasks scheduled to the same JVM. Not that I'm aware >> of. >> >> >> Thank you~ >> >> Xintong Song >> >> >> >> On Mon, Feb 24, 2020 at 8:43 PM David Morin >> wrote: >> >>> Hi, >>> >>> Thanks Xintong. >>> I've noticed than when I use yarn-session.sh with --slots (-s) parameter >>> but without --container (-n) it creates one task/slot per taskmanager. >>> Before with the both n and -s it was not the case. >>> I prefer to use only small container with only one task to scale my >>> pipeline and of course to prevent from thread-safe issue >>> Do you think I cannot be confident on that behaviour ? >>> >>> Regards, >>> David >>> >>> On 2020/02/22 17:11:25, David Morin wrote: >>> > Hi, >>> > My app is based on a lib that is not thread safe (yet...). >>> > In waiting of the patch has been pushed, how can I be sure that my >>> Sink that uses this lib is in one JVM ? >>> > Context: I use one Yarn session and send my Flink jobs to this session >>> > >>> > Regards, >>> > David >>> > >>> >>
Re: yarn session: one JVM per task
Hi Xintong, At the moment I'm using the 1.9.2 with this command: yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline" So, after a lot of tests, I've noticed that if I increase the parallelism of my Custom Sink, each task is embedded into one TS and, the most important, each one into one TaskManager (Yarn container in fact). So, if I understand I have to keep this Flink release (1.9.2) ? Thanks David Le mar. 25 févr. 2020 à 02:02, Xintong Song a écrit : > Depending on your Flink version, the '-n' option might not take effect. It > is removed in the latest release, but before that there were a few versions > where this option is neither removed nor taking effect. > > Anyway, as long as you have multiple containers, I don't think there's a > way to make some of the tasks scheduled to the same JVM. Not that I'm aware > of. > > > Thank you~ > > Xintong Song > > > > On Mon, Feb 24, 2020 at 8:43 PM David Morin > wrote: > >> Hi, >> >> Thanks Xintong. >> I've noticed than when I use yarn-session.sh with --slots (-s) parameter >> but without --container (-n) it creates one task/slot per taskmanager. >> Before with the both n and -s it was not the case. >> I prefer to use only small container with only one task to scale my >> pipeline and of course to prevent from thread-safe issue >> Do you think I cannot be confident on that behaviour ? >> >> Regards, >> David >> >> On 2020/02/22 17:11:25, David Morin wrote: >> > Hi, >> > My app is based on a lib that is not thread safe (yet...). >> > In waiting of the patch has been pushed, how can I be sure that my Sink >> that uses this lib is in one JVM ? >> > Context: I use one Yarn session and send my Flink jobs to this session >> > >> > Regards, >> > David >> > >> >
Re: yarn session: one JVM per task
Depending on your Flink version, the '-n' option might not take effect. It is removed in the latest release, but before that there were a few versions where this option is neither removed nor taking effect. Anyway, as long as you have multiple containers, I don't think there's a way to make some of the tasks scheduled to the same JVM. Not that I'm aware of. Thank you~ Xintong Song On Mon, Feb 24, 2020 at 8:43 PM David Morin wrote: > Hi, > > Thanks Xintong. > I've noticed than when I use yarn-session.sh with --slots (-s) parameter > but without --container (-n) it creates one task/slot per taskmanager. > Before with the both n and -s it was not the case. > I prefer to use only small container with only one task to scale my > pipeline and of course to prevent from thread-safe issue > Do you think I cannot be confident on that behaviour ? > > Regards, > David > > On 2020/02/22 17:11:25, David Morin wrote: > > Hi, > > My app is based on a lib that is not thread safe (yet...). > > In waiting of the patch has been pushed, how can I be sure that my Sink > that uses this lib is in one JVM ? > > Context: I use one Yarn session and send my Flink jobs to this session > > > > Regards, > > David > > >
Re: yarn session: one JVM per task
Hi, Thanks Xintong. I've noticed than when I use yarn-session.sh with --slots (-s) parameter but without --container (-n) it creates one task/slot per taskmanager. Before with the both n and -s it was not the case. I prefer to use only small container with only one task to scale my pipeline and of course to prevent from thread-safe issue Do you think I cannot be confident on that behaviour ? Regards, David On 2020/02/22 17:11:25, David Morin wrote: > Hi, > My app is based on a lib that is not thread safe (yet...). > In waiting of the patch has been pushed, how can I be sure that my Sink that > uses this lib is in one JVM ? > Context: I use one Yarn session and send my Flink jobs to this session > > Regards, > David >
Re: yarn session: one JVM per task
Hi David, In general, I don't think you can control all parallel subtasks of a certain task run in the same JVM process with the current Flink. If you job scale is very small, one thing you might try is to have only one task manager in the Flink session cluster. You need to make sure the task manager has enough cpu/memory resources and slots for running your job. Thank you~ Xintong Song On Sun, Feb 23, 2020 at 1:11 AM David Morin wrote: > Hi, > My app is based on a lib that is not thread safe (yet...). > In waiting of the patch has been pushed, how can I be sure that my Sink > that uses this lib is in one JVM ? > Context: I use one Yarn session and send my Flink jobs to this session > > Regards, > David >
yarn session: one JVM per task
Hi, My app is based on a lib that is not thread safe (yet...). In waiting of the patch has been pushed, how can I be sure that my Sink that uses this lib is in one JVM ? Context: I use one Yarn session and send my Flink jobs to this session Regards, David