Thanks for the explanation. I'd go for the manager as well. In the end, if you really want to scale out, you should go for Celery/Kubernetes.
Cheers, Fokko

On Mon, 29 Apr 2019 at 23:09, Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> Yep. Manager() creates a SyncManager instance and runs start().
> This in turn creates a Manager process that all processes communicate with
> via Proxies:
>
> https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#sharing-state-between-processes
>
> It's reliable, but slower than Shared Memory used by direct Queue
> instantiation.
>
> J.
>
> On Mon, Apr 29, 2019 at 10:26 PM Bas Harenslak <basharens...@godatadriven.com> wrote:
>
> > @Fokko I believe that’s implicitly created by multiprocessing.Manager()
> > <https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#multiprocessing.sharedctypes.multiprocessing.Manager>:
> >
> > ….. The returned manager object corresponds to a spawned child process and ….
> >
> > I’m in favour of the Manager route, since this doesn’t introduce
> > additional complex multiprocessing code in Airflow. I’m reviewing right now
> > although it’s a bit out of my comfort zone...
> >
> > Bas
> >
> > On 29 Apr 2019, at 21:46, Driesprong, Fokko <fo...@driesprong.frl> wrote:
> >
> > I'm missing the part of another process? This is within the Scheduler
> > process if I understand correctly.
> >
> > Cheers, Fokko
> >
> > On Mon, 29 Apr 2019 at 21:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > I am also leaning towards the manager. I updated the
> > https://github.com/apache/airflow/pull/5200 PR now after review and once it
> > passes CI I think we can merge it.
> > If anyone wants to have a look as well, happy to hear it.
> >
> > J.
> >
> > On Mon, Apr 29, 2019 at 2:14 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > I think I lean towards the built-in/manager approach as it is less
> > concurrency code we have to manage/maintain in Airflow, though I'm not
> > hugely happy about another process :(
> >
> > -ash
> >
> > On 29 Apr 2019, at 07:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > Hello Everyone,
> >
> > I think we need some more pairs of eyes to take a look at the potential fixes
> > we have for the pesky LocalExecutorTest failures that we are all experiencing
> > with our Travis builds. Once we solve it I think we should be much closer to
> > having stable builds, including some other flaky test fixes merged recently.
> >
> > It turned out that the problem relates to quite deep internals of how data
> > is passed between processes using multiprocessing queues. It's really deep
> > in the core processing of Airflow, so I think it would be great if other
> > experienced Airflowers also reviewed it, commented on it, and helped to
> > select the best solution, as we could have missed something.
> >
> > I was looking at it together with Ash and Bas and I (a bit too fast) merged
> > a preliminary version of the fix last week. We reverted it later as it
> > turned out to have some side effects, so we know we have to be careful with
> > this one.
> >
> > After more detailed analysis and discussions with Omar, we now have two
> > potential candidates to fix it. Both are green and, from local testing,
> > both solve the problem, each in a different way.
> >
> > - https://github.com/apache/airflow/pull/5199
> > - https://github.com/apache/airflow/pull/5200
> >
> > I tried to describe the problem and the solution candidates, with pros and
> > cons, in the JIRA ticket:
> > https://issues.apache.org/jira/browse/AIRFLOW-4401
> >
> > I'd love it if we could get reviews in the PRs and input to the discussion
> > on which solution to choose.
> >
> > J.
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129
> > E: jarek.pot...@polidea.com
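
For readers following the Manager vs. direct Queue discussion above, here is a minimal sketch of the two options side by side. It is not the code from either PR and not Airflow's LocalExecutor; the names worker, run_with_queues, run_direct and run_managed are hypothetical and exist only for illustration. The only library calls used are multiprocessing.Queue(), multiprocessing.Manager() and Manager().Queue(). The managed variant starts the extra manager process mentioned in the thread and hands out proxy objects, while the direct variant creates the queue in the calling process.

```python
import multiprocessing as mp


def worker(task_queue, result_queue):
    """Square each number from task_queue until the None sentinel arrives."""
    for item in iter(task_queue.get, None):
        result_queue.put(item * item)


def run_with_queues(task_queue, result_queue, numbers):
    """Send numbers to one worker process and collect the squared results."""
    proc = mp.Process(target=worker, args=(task_queue, result_queue))
    proc.start()
    for n in numbers:
        task_queue.put(n)
    task_queue.put(None)  # sentinel: tells the worker to stop
    # Drain the results before join() so the worker is never blocked on put().
    results = [result_queue.get() for _ in numbers]
    proc.join()
    return results


def run_direct(numbers):
    """Direct instantiation: no extra process besides the worker itself."""
    return run_with_queues(mp.Queue(), mp.Queue(), numbers)


def run_managed(numbers):
    """Manager route: Manager() starts a server process; queues are proxies."""
    with mp.Manager() as manager:
        return run_with_queues(manager.Queue(), manager.Queue(), numbers)


if __name__ == "__main__":
    print(run_direct([1, 2, 3]))
    print(run_managed([1, 2, 3]))
```

Both functions return the same values; the difference is where the queue lives. With the managed variant every put() and get() is a round trip to the manager process, which is the extra process and the extra latency raised in the thread.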