On Fri, Nov 16, 2018 at 2:12 AM, Robert Bradshaw <[email protected]> wrote:
> One needs to ensure that gprof2dot is importable (i.e. installed via pip
> into your Python environment).
>
> As for specifying the FnApiRunner via the runner argument, --runner can
> take fully qualified names (if it's not in the short list of known
> runners). However, the FnApiRunner is the DirectRunner for non-streaming
> mode, so there's no need to specify it explicitly.
>
> Good point about adding this to the documentation. It's unclear where best
> to put it...
>

How about in the wiki under Python tips? (
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips) From there it
can later be converted to full user docs.

>
> On Thu, Nov 15, 2018 at 5:28 PM Thomas Weise <[email protected]> wrote:
>
>> Hi Robert,
>>
>> This is great. It should be added to our Python documentation because
>> users will likely need this!
>>
>> After I installed gprof2dot I'm still prompted to install:
>>
>> "Please install gprof2dot and dot for profile renderings."
>>
>> Also is there a way to run a pipeline unmodified with fn_api_runner? (For
>> those interested in profiling the SDK worker.)
>>
>> It works with direct runner, but "FnApiRunner" isn't currently supported
>> as --runner argument:
>>
>> python -m apache_beam.examples.wordcount \
>> --input=/etc/profile \
>> --output=/tmp/py-wordcount-direct \
>> *--runner=FnApiRunner* \
>> --streaming \
>> --profile_cpu --profile_location=./build/pyprofile
>>
>> Thanks,
>> Thomas
>>
>>
>> On Mon, Nov 5, 2018 at 7:15 PM Ankur Goenka <[email protected]> wrote:
>>
>>> All containers are destroyed by default on termination, so to analyze
>>> profiling data for portable runners, either disable container cleanup
>>> (using --retainDockerContainers=true) or use a remote distributed file
>>> system path.
>>>
>>> On Mon, Nov 5, 2018 at 1:05 AM Robert Bradshaw <[email protected]>
>>> wrote:
>>>
>>>> Any portable runner should pick it up automatically.
>>>>
>>>> On Tue, Oct 30, 2018 at 3:32 AM Manu Zhang <[email protected]>
>>>> wrote:
>>>> >
>>>> > Cool! Can we document it somewhere such that other Runners could
>>>> > pick it up later?
>>>> >
>>>> > Thanks,
>>>> > Manu Zhang
>>>> > On Oct 29, 2018, 5:54 PM +0800, Maximilian Michels <[email protected]>,
>>>> > wrote:
>>>> >
>>>> > This looks very helpful for debugging performance of portable
>>>> > pipelines.
>>>> > Great work!
>>>> >
>>>> > Enabling local directories for Flink or other portable Runners would
>>>> > be useful for debugging, e.g. per
>>>> > https://issues.apache.org/jira/browse/BEAM-5440
>>>> >
>>>> > On 26.10.18 18:08, Robert Bradshaw wrote:
>>>> >
>>>> > Now that we've (mostly) moved from features to performance for
>>>> > BeamPython-on-Flink, I've been doing some profiling of Python code,
>>>> > and thought it may be useful for others as well (both those working on
>>>> > the SDK, and users who want to understand their own code), so I've
>>>> > tried to wrap this up into something useful.
>>>> >
>>>> > Python already had some existing profile options that we used with
>>>> > Dataflow, specifically --profile_cpu and --profile_location. I've
>>>> > hooked these up to both the DirectRunner and the SDK Harness Worker.
>>>> > One can now run commands like
>>>> >
>>>> > python -m apache_beam.examples.wordcount
>>>> > --output=counts.txt --profile_cpu --profile_location=path/to/directory
>>>> >
>>>> > and get nice graphs like the one attached. (Here the bulk of the time
>>>> > is spent reading from the default input in gcs. Another hint for
>>>> > reading the graph is that due to fusion the call graph is cyclic,
>>>> > passing through operations:86:receive for every output.)
>>>> >
>>>> > The raw python profile stats [1] are produced in that directory, along
>>>> > with a dot graph and an svg if both dot and gprof2dot are installed.
>>>> > There is also an important option --direct_runner_bundle_repeat which
>>>> > can be set to gain more accurate profiles on smaller data sets by
>>>> > re-playing the bundle without the (non-trivial) one-time setup costs.
>>>> >
>>>> > These flags also work on portability runners such as Flink, where the
>>>> > directory must be set to a distributed filesystem. Each bundle
>>>> > produces its own profile in that directory, and they can be
>>>> > concatenated and manually fed into tools like below. In that case
>>>> > there is a --profile_sample_rate which can be set to avoid profiling
>>>> > every single bundle (e.g. for a production job).
>>>> >
>>>> > The PR is up at https://github.com/apache/beam/pull/6847 Hope it's
>>>> > useful.
>>>> >
>>>> > - Robert
>>>> >
>>>> >
>>>> > [1] https://docs.python.org/2/library/profile.html
>>>> > [2] https://github.com/jrfonseca/gprof2dot
>>>> >
>>>>
>>>
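
For anyone wanting to combine the per-bundle profiles mentioned in the thread, here is a minimal sketch using only the standard library. It assumes each file in the profile directory is an ordinary Python profile dump that pstats can load, that rendered .dot and .svg files sit alongside them and should be skipped, and that the files have already been copied to a local directory. The function name merge_profiles and the output name merged.pstats are made up for illustration; none of this is taken from the Beam PR.

```python
import os
import pstats


def merge_profiles(directory):
    """Combine every profile dump in `directory` into one pstats.Stats.

    Assumption: every file that is not a .dot or .svg rendering is a
    profile dump readable by pstats (hypothetical directory layout).
    """
    paths = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if not name.endswith((".dot", ".svg"))
        and os.path.isfile(os.path.join(directory, name))
    )
    if not paths:
        raise ValueError("no profile files found in %r" % directory)

    # Load the first dump, then fold the remaining ones into it.
    stats = pstats.Stats(paths[0])
    for path in paths[1:]:
        stats.add(path)
    return stats


# Example usage (path is a placeholder):
#   merged = merge_profiles("path/to/directory")
#   merged.sort_stats("cumulative").print_stats(20)
#   merged.dump_stats("merged.pstats")
```

The merged dump can then be rendered along the lines of the gprof2dot README, e.g. gprof2dot -f pstats merged.pstats | dot -Tsvg -o merged.svg, provided both gprof2dot and dot are installed.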
