Junru Shao created MXNET-1417:
---------------------------------

Summary: [Performance] Caching Dynamic Shape Checking Result
Key: MXNET-1417
URL: https://issues.apache.org/jira/browse/MXNET-1417
Project: Apache MXNet
Issue Type: Improvement
Reporter: Junru Shao

h2. Description

(Please see the appendix for experiment details.)

PR [#1324|https://github.com/apache/incubator-mxnet/issues/1324], which enables dynamic shapes, slows down a model that originally ran in 235.65 ms by 7.26 ms (to 242.91 ms). A seemingly relevant PR, [#14665|https://github.com/apache/incubator-mxnet/pull/14665], which describes itself as improving "[performance]", does not change the measured performance: the model still runs in 242.35 ms.

This PR fixes the slowdown by caching the result of checking whether a dynamic shape exists. The mechanism is quite simple: once the existence of dynamic shapes has been checked, the check is not repeated, because the graph does not change.

h2. Checklist

h3. Essentials

Please feel free to remove inapplicable items for your PR.

* The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue|https://issues.apache.org/jira/projects/MXNET/issues] created (except PRs with tiny changes)
* Changes are complete (i.e. I finished coding on this PR)
* All changes have test coverage:
** Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
** Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
** Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
* Code is well-documented:
** For user-facing API changes, the API doc string has been updated.
** For new C++ functions in header files, their functionalities and arguments are documented.
** For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
** Check the API doc at [http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html]
* To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

h3. Changes

Nothing

h2. Comments

Experiment environment: EC2 p2.8xlarge, CUDA 10, and cuDNN 7.5. The model itself is confidential.

The detailed benchmark results are below (mean ± stdev). Each experiment consists of 20 runs; the warmup run is excluded.

# On commit [{{39412b3}}|https://github.com/apache/incubator-mxnet/commit/39412b37ffca84bf3cd10f81dac5c6c77149f3ac] (right before PR [#14192|https://github.com/apache/incubator-mxnet/pull/14192] was merged): Hybridize w/ static_alloc: 235.65 ± 0.22246 ms
# On commit [{{83d2c2d}}|https://github.com/apache/incubator-mxnet/commit/83d2c2d0e0edeb7d85471437601efcf8bebf070e] (where PR [#14192|https://github.com/apache/incubator-mxnet/pull/14192] is merged): Hybridize w/ static_alloc: 242.91 ± 0.71125 ms
# PR [#14665|https://github.com/apache/incubator-mxnet/pull/14665] patched onto commit [{{83d2c2d}}|https://github.com/apache/incubator-mxnet/commit/83d2c2d0e0edeb7d85471437601efcf8bebf070e]: Hybridize w/ static_alloc: 242.35 ± 0.25124 ms
# After this patch is applied to commit [{{83d2c2d}}|https://github.com/apache/incubator-mxnet/commit/83d2c2d0e0edeb7d85471437601efcf8bebf070e]: Hybridize w/ static_alloc: 234.95 ± 0.39334 ms

CC: [@szha|https://github.com/szha] [@zheng-da|https://github.com/zheng-da] please review :-)
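
The caching idea from the Description (check once whether a dynamic shape exists, then reuse the answer because the graph does not change) can be illustrated with a minimal Python sketch. This is not the MXNet implementation: the {{Graph}} class, its method names, the {{scan_count}} counter, and the convention that a negative dimension marks a shape unknown until runtime are all assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a shape is a tuple of ints, and a negative
# dimension marks a size that is only known at runtime (a "dynamic" shape).
Shape = Tuple[int, ...]


class Graph:
    """Toy stand-in for a computation graph that is immutable once built."""

    def __init__(self, node_shapes: List[Shape]) -> None:
        self.node_shapes = node_shapes
        # None means "not checked yet"; True/False is the cached answer.
        self._has_dynamic_shape: Optional[bool] = None
        # Counts full scans of the graph, kept only to show the caching effect.
        self.scan_count = 0

    def _scan_for_dynamic_shape(self) -> bool:
        """The expensive check: walk every node and inspect its shape."""
        self.scan_count += 1
        return any(dim < 0 for shape in self.node_shapes for dim in shape)

    def has_dynamic_shape(self) -> bool:
        """Return the cached result, scanning the graph only on first use."""
        if self._has_dynamic_shape is None:
            self._has_dynamic_shape = self._scan_for_dynamic_shape()
        return self._has_dynamic_shape


# Usage: 20 forward passes ask the same question, but the graph is scanned once.
graph = Graph([(32, 128), (32, -1)])
results = [graph.has_dynamic_shape() for _ in range(20)]
```

The cache is only valid under the assumption stated in the Description, namely that the graph does not change after the first check; a design that allowed the graph to be mutated would need to reset the cached value.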