Thanks for the context Dan, that was helpful. On Fri, Nov 9, 2018 at 10:09 AM Udi Meiri <[email protected]> wrote:
> The reasoning unbounded threadpool is explained as: > /* The SDK requires an unbounded thread pool because a step may create X > writers > * each requiring their own thread to perform the writes otherwise a writer > may > * block causing deadlock for the step because the writers buffer is full. > * Also, the MapTaskExecutor launches the steps in reverse order and > completes > * them in forward order thus requiring enough threads so that each step's > writers > * can be active. > > */ > > > https://github.com/apache/beam/blob/17c2da6d981cae9f233aea1e2d6d64259362dd73/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java#L133-L138 > > On Thu, Nov 8, 2018 at 11:41 PM Dan Halperin <[email protected]> wrote: > >> >>> On Thu, Nov 8, 2018 at 2:12 PM Udi Meiri <[email protected]> wrote: >>> >>>> Both options risk delaying worker shutdown if the executor's shutdown() >>>> hasn't been called, which is I guess why the executor in GcsOptions.java >>>> creates daemon threads. >>>> >>> >> My guess (and it really is a guess at this point) is that this was a fix >> for DirectRunner issues - want that to exit quickly! >> >> >>> >>>> On Thu, Nov 8, 2018 at 1:02 PM Lukasz Cwik <[email protected]> wrote: >>>> >>>>> Not certain, it looks like we should have been caching the executor >>>>> within the GcsUtil as a static instance instead of creating one each time. >>>>> Could have been missed during code review / slow code changes over time. >>>>> GcsUtil is not well "loved". >>>>> >>>>> On Thu, Nov 8, 2018 at 11:00 AM Udi Meiri <[email protected]> wrote: >>>>> >>>>>> HI, >>>>>> I've identified a memory leak when GcsUtil.java instantiates a >>>>>> ThreadPoolExecutor (https://issues.apache.org/jira/browse/BEAM-6018). >>>>>> The code uses the getExitingExecutorService >>>>>> <https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java#L551> >>>>>> wrapper, >>>>>> which leaks memory. The question is, why is that wrapper necessary >>>>>> if executor.shutdown(); is later unconditionally called? >>>>>> >>>>>
