Fabricio, this is great work! One thing that stands out is that a very large amount of time is being spent in remove_duplicates(), 65% of the total runtime of the sync.
- 13% of the total runtime spent on this inside cast() https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L31 - 31% of the total runtime spent in these lines https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L37-L38 - 20% of the total runtime spent on this line https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L42 There's a couple of suggestions on how to improve this code written up here https://pulp.plan.io/issues/5701 The memory usage seems to be coming from the Django ORM backend. My guess is that lines 37 & 38 (linked above) are generating some extremely long SQL, because it's basically doing OR with a vast number of individual queries. And I think that is also not great for performance. https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/ Another hint that that might be the case is that the memory usage grows and grows and then collapses all at once. That would make sense if the SQL compiler's data structure is growing and growing and then being destroyed after the query executes. I would imagine that if we fix our queries, we'll both fix the memory consumption and improve performance at the same time. On Tue, Nov 26, 2019 at 11:57 AM Fabricio Aguiar <fabricio.agu...@redhat.com> wrote: > Hi everyone, > > I've been investigating memory consumption for syncing on RPM plugin - issue > 5688 <https://pulp.plan.io/issues/5688>, and I want to share what I've > found so far. > > First, it seems that a new process is created when we started to sync, and > this new process is responsible for the large amount of memory consumption > observed. > > As I shared on some notes here: https://pulp.plan.io/issues/5688 > The problem seems to affect only RHEL, I did not see a large consumption > of memory for the following repos: > - CentOS 6: http://mirror.centos.org/centos-6/6.10/os/x86_64/ > - CentOS 7: http://mirror.linux.duke.edu/pub/centos/7/os/x86_64/ > - CentOS 8 AppStrem: > http://mirror.linux.duke.edu/pub/centos/8/AppStream/x86_64/os/ > - CentOS 8 BaseOS: > http://mirror.linux.duke.edu/pub/centos/8/BaseOS/x86_64/os/ > - EPEL 7: https://dl.fedoraproject.org/pub/epel/7/x86_64/ > > As stated on the issue, with RHEL 7 we can observe the problem, for my > tests, I used the following repo: > http://cdn.stage.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/ > > For my tests, I used pulp3-source-fedora30 vagrant box, initially set with > 4GB, then I increased to 16GB, and it was not enough. I only was able to > sync RHEL 7 when I increased the memory up to 23GB. > > Utilizing py-spy I got those SVG profiles attached, you can find them > hosted here: > Sync - https://sendeyo.com/up/d/90a8ae4c8f > Re-sync - https://sendeyo.com/up/d/4c855bcce3 > > With the following branch, I was able to take some tracemalloc > <https://docs.python.org/3/library/tracemalloc.html> snapshots (every 30 > seconds): > > https://github.com/pulp/pulp_rpm/commit/f3f079010cfe81b7f5cf3ef94c2402b1ccf7d90c > > *Tracemalloc for Sync at peak of memory consumption:* > > #1: sql/compiler.py:1512: 7825340.4 KiB > #2: tasks/synchronizing.py:154: 4062651.9 KiB > #3: stages/declarative_version.py:149: 4062601.9 KiB > #4: models/repository.py:631: 4062130.0 KiB > #5: models/repository.py:96: 4062127.7 KiB > #6: rq/job.py:611: 4060459.2 KiB > #7: rq/job.py:605: 4058273.8 KiB > #8: rq/worker.py:822: 4053883.3 KiB > #9: tasking/worker.py:100: 4053875.9 KiB > #10: rq/worker.py:684: 4053849.9 KiB > #11: rq/worker.py:610: 4051645.7 KiB > #12: plugin/repo_version_utils.py:31: 4028192.9 KiB > #13: models/base.py:124: 4028191.2 KiB > #14: fields/related_descriptors.py:401: 4028165.2 KiB > #15: models/query.py:1242: 4021212.9 KiB > #16: models/query.py:402: 3995284.5 KiB > #17: models/query.py:256: 3995284.5 KiB > #18: models/query.py:55: 3952718.2 KiB > #19: sql/compiler.py:1133: 3919417.7 KiB > #20: db/utils.py:96: 3912674.4 KiB > #21: psycopg2/_json.py:166: 3837534.7 KiB > #22: json/decoder.py:337: 3837533.2 KiB > obj, end = self.raw_decode(s, idx=_w(s, 0).end()) > #23: json/__init__.py:348: 3837533.2 KiB > return _default_decoder.decode(s) > #24: json/decoder.py:353: 3837532.7 KiB > obj, end = self.scan_once(s, idx) > #25: rq/worker.py:670: 211891.5 KiB > #26: tasking/worker.py:69: 198267.0 KiB > #27: rq/worker.py:477: 191261.5 KiB > #28: cli/cli.py:252: 186110.9 KiB > #29: click/core.py:555: 185957.8 KiB > #30: cli/cli.py:78: 104822.7 KiB > #31: models/query.py:73: 48350.9 KiB > #32: models/base.py:513: 48350.9 KiB > #33: sql/compiler.py:405: 38126.1 KiB > #34: sql/compiler.py:1087: 33355.6 KiB > #35: click/core.py:956: 29580.3 KiB > #36: click/core.py:1137: 29498.3 KiB > #37: models/base.py:430: 29340.9 KiB > #38: models/query.py:274: 25946.8 KiB > #39: plugin/repo_version_utils.py:30: 25850.4 KiB > #40: models/query.py:892: 24105.6 KiB > #41: models/query.py:910: 24089.7 KiB > #42: models/query.py:399: 24045.0 KiB > #43: sql/query.py:1290: 17343.9 KiB > #44: sql/where.py:81: 15629.7 KiB > #45: sql/compiler.py:489: 15581.6 KiB > #46: models/lookups.py:162: 15527.0 KiB > #47: psycopg2/extras.py:678: 15027.3 KiB > #48: sql/query.py:1323: 13486.4 KiB > #49: sql/query.py:796: 13486.3 KiB > #50: sql/compiler.py:474: 11291.3 KiB > #51: sql/compiler.py:54: 11282.7 KiB > #52: sql/compiler.py:45: 11282.7 KiB > #53: click/core.py:717: 10190.6 KiB > #54: models/query.py:72: 9967.2 KiB > #55: models/query.py:1219: 8855.6 KiB > #56: models/query.py:1231: 8855.6 KiB > #57: models/query.py:1072: 8840.0 KiB > #58: models/query.py:401: 8840.0 KiB > #59: models/lookups.py:153: 8779.2 KiB > #60: models/lookups.py:79: 8773.0 KiB > #61: models/expressions.py:332: 8769.2 KiB > #62: models/expressions.py:238: 8769.2 KiB > #63: plugin/repo_version_utils.py:37: 7602.2 KiB > #64: models/base.py:408: 7378.9 KiB > #65: click/core.py:764: 7371.2 KiB > #66: sql/compiler.py:254: 6853.7 KiB > #67: models/expressions.py:737: 6851.8 KiB > #68: models/lookups.py:159: 6745.8 KiB > #69: sql/query.py:1293: 6744.2 KiB > #70: sql/query.py:2258: 6743.6 KiB > #71: sql/query.py:763: 6743.6 KiB > #72: sql/query.py:2259: 6742.7 KiB > #73: sql/compiler.py:1054: 6696.5 KiB > #74: utils/deconstruct.py:17: 6261.2 KiB > #75: fields/related.py:986: 6246.5 KiB > #76: fields/__init__.py:381: 6245.8 KiB > #77: models/base.py:411: 5990.2 KiB > #78: python3.7/uuid.py:204: 5876.0 KiB > self.__dict__['int'] = int > #79: sql/compiler.py:472: 5784.0 KiB > #80: sql/compiler.py:1019: 5023.7 KiB > #81: sql/compiler.py:1053: 4916.2 KiB > #82: sql/query.py:350: 4490.9 KiB > #83: sql/query.py:309: 4438.3 KiB > #84: sql/compiler.py:219: 4406.5 KiB > #85: sql/compiler.py:666: 4401.8 KiB > #86: python3.7/copy.py:96: 4399.3 KiB > rv = reductor(4) > #87: models/expressions.py:159: 4398.6 KiB > #88: python3.7/copyreg.py:88: 4372.9 KiB > return cls.__new__(cls, *args) > #89: python3.7/copy.py:274: 4372.9 KiB > y = func(*args) > #90: python3.7/copy.py:106: 4372.8 KiB > return _reconstruct(x, None, *rv) > #91: models/query_utils.py:82: 4223.2 KiB > #92: bin/rq:10: 4115.9 KiB > #93: python3.7/uuid.py:161: 4025.9 KiB > int = int_(hex, 16) > #94: sql/query.py:1318: 3846.5 KiB > #95: models/expressions.py:749: 3362.8 KiB > #96: fields/__init__.py:709: 3340.3 KiB > #97: utils/tree.py:108: 3008.9 KiB > #98: models/query_utils.py:72: 2534.5 KiB > #99: models/query_utils.py:85: 2534.2 KiB > #100: models/base.py:503: 2214.3 KiB > #101: models/query.py:199: 2190.8 KiB > #102: models/query.py:190: 2178.6 KiB > #103: models/query_utils.py:74: 2164.6 KiB > #104: utils/tree.py:23: 2115.5 KiB > #105: models/query_utils.py:59: 2113.1 KiB > #106: sql/query.py:1251: 1895.8 KiB > #107: sql/query.py:1116: 1895.3 KiB > #108: models/lookups.py:19: 1892.1 KiB > #109: sql/query.py:68: 1882.4 KiB > #110: sql/query.py:1249: 1881.4 KiB > #111: fields/related.py:974: 1679.4 KiB > #112: sql/compiler.py:1018: 1672.8 KiB > #113: base/operations.py:564: 1672.8 KiB > #114: models/expressions.py:747: 1660.9 KiB > #115: models/query.py:1893: 1590.1 KiB > #116: models/query.py:63: 1590.1 KiB > #117: models/query.py:62: 1554.4 KiB > #118: models/query.py:61: 1554.4 KiB > 962 other: 9.8 MiB > Total allocated size: 99186.9 MiB > > > > *Tracemalloc for Re-sync at peak of memory consumption:* > > #1: sql/compiler.py:1512: 1813416.3 KiB > #2: asyncio/events.py:88: 1159219.2 KiB > self._context.run(self._callback, *self._args) > #3: stages/api.py:43: 1159151.2 KiB > #4: asyncio/base_events.py:1775: 1158553.1 KiB > if bool(enabled) == bool(self._coroutine_origin_tracking_enabled): > #5: asyncio/base_events.py:539: 1157867.4 KiB > self._thread_id = None > #6: asyncio/base_events.py:571: 1156528.6 KiB > # local task. > #7: stages/declarative_version.py:149: 1152654.5 KiB > #8: tasks/synchronizing.py:154: 1151586.8 KiB > #9: rq/job.py:611: 1147795.3 KiB > #10: rq/job.py:605: 1147534.7 KiB > #11: rq/worker.py:822: 1141556.0 KiB > #12: tasking/worker.py:100: 1139872.3 KiB > #13: rq/worker.py:684: 1137457.4 KiB > #14: rq/worker.py:610: 1136707.1 KiB > #15: models/query.py:274: 940312.1 KiB > #16: models/query.py:1242: 921581.8 KiB > #17: models/query.py:55: 908503.7 KiB > #18: sql/compiler.py:1133: 906747.3 KiB > #19: db/utils.py:96: 906707.8 KiB > #20: stages/content_stages.py:48: 899314.9 KiB > #21: psycopg2/_json.py:166: 872466.2 KiB > #22: json/decoder.py:337: 871817.4 KiB > obj, end = self.raw_decode(s, idx=_w(s, 0).end()) > #23: json/__init__.py:348: 871817.4 KiB > return _default_decoder.decode(s) > #24: json/decoder.py:353: 871816.8 KiB > obj, end = self.scan_once(s, idx) > #25: click/core.py:555: 448794.2 KiB > #26: rq/worker.py:670: 259351.8 KiB > #27: tasking/worker.py:69: 257743.4 KiB > #28: rq/worker.py:477: 249243.3 KiB > #29: cli/cli.py:252: 244023.2 KiB > #30: cli/cli.py:78: 226347.5 KiB > #31: click/core.py:956: 220086.5 KiB > #32: click/core.py:1137: 213146.8 KiB > #33: click/core.py:717: 188052.0 KiB > #34: tasks/synchronizing.py:581: 177471.0 KiB > #35: models/package.py:232: 138350.9 KiB > #36: stages/artifact_stages.py:219: 54726.5 KiB > #37: stages/artifact_stages.py:251: 54723.6 KiB > #38: models/query.py:1625: 47373.9 KiB > #39: models/query.py:1738: 46489.3 KiB > #40: fields/related_descriptors.py:627: 42352.2 KiB > #41: models/package.py:238: 24673.4 KiB > #42: models/query.py:1591: 24325.2 KiB > #43: models/query.py:1244: 18744.3 KiB > #44: models/query.py:771: 18744.3 KiB > #45: models/query.py:73: 12827.6 KiB > #46: models/base.py:513: 12827.6 KiB > #47: sql/compiler.py:405: 11912.8 KiB > #48: psycopg2/extras.py:678: 11666.5 KiB > #49: models/package.py:245: 10837.2 KiB > #50: sql/compiler.py:1087: 7407.8 KiB > #51: sql/where.py:81: 6915.5 KiB > #52: models/base.py:741: 6705.2 KiB > #53: models/base.py:779: 6697.1 KiB > #54: models/progress.py:144: 6607.2 KiB > #55: click/core.py:764: 6603.1 KiB > #56: models/progress.py:191: 6601.5 KiB > #57: tasks/synchronizing.py:609: 6574.8 KiB > #58: python3.7/uuid.py:204: 6532.6 KiB > self.__dict__['int'] = int > #59: models/base.py:851: 6485.3 KiB > #60: models/base.py:411: 6366.6 KiB > #61: models/base.py:900: 6178.2 KiB > #62: models/query.py:760: 5612.1 KiB > #63: sql/compiler.py:1429: 5528.5 KiB > #64: models/base.py:408: 5042.7 KiB > #65: tasks/synchronizing.py:575: 4724.0 KiB > #66: createrepo_c/__init__.py:312: 4722.6 KiB > #67: tasks/synchronizing.py:285: 4722.6 KiB > #68: models/lookups.py:162: 4583.4 KiB > #69: tasks/synchronizing.py:582: 4236.2 KiB > #70: tasks/synchronizing.py:256: 3655.1 KiB > #71: sql/query.py:1312: 3358.8 KiB > #72: sql/compiler.py:1417: 3312.9 KiB > #73: models/query.py:892: 3121.7 KiB > #74: models/lookups.py:153: 2934.4 KiB > #75: models/lookups.py:79: 2931.5 KiB > #76: models/expressions.py:332: 2918.8 KiB > #77: models/expressions.py:238: 2918.5 KiB > #78: fields/related_descriptors.py:219: 2766.3 KiB > #79: fields/related_descriptors.py:629: 2765.8 KiB > #80: models/query.py:910: 2575.7 KiB > #81: sql/query.py:1290: 2513.6 KiB > #82: sql/query.py:1318: 2257.2 KiB > #83: stages/association_stages.py:36: 2219.3 KiB > #84: sql/compiler.py:1452: 2214.1 KiB > #85: sql/compiler.py:1370: 2214.1 KiB > #86: python3.7/uuid.py:161: 2179.2 KiB > int = int_(hex, 16) > #87: models/base.py:503: 2082.6 KiB > #88: models/manager.py:82: 2008.6 KiB > #89: tasks/synchronizing.py:585: 1803.4 KiB > #90: fields/related_descriptors.py:527: 1767.5 KiB > #91: python3.7/copy.py:96: 1754.9 KiB > rv = reductor(4) > #92: urllib/parse.py:562: 1725.2 KiB > resolved_path.append('') > #93: urllib/parse.py:489: 1725.2 KiB > if url and url[:1] != '/': url = '/' + url > #94: urllib/parse.py:475: 1725.2 KiB > _coerce_args(*components)) > #95: models/lookups.py:159: 1647.3 KiB > #96: models/expressions.py:159: 1625.9 KiB > #97: tasks/synchronizing.py:512: 1619.8 KiB > #98: models/query.py:1652: 1619.0 KiB > #99: fields/__init__.py:801: 1604.3 KiB > #100: models/base.py:473: 1545.2 KiB > #101: python3.7/uuid.py:759: 1488.4 KiB > def uuid4(): > #102: models/package.py:248: 1455.4 KiB > #103: sql/compiler.py:489: 1332.9 KiB > #104: python3.7/copy.py:106: 1293.2 KiB > return _reconstruct(x, None, *rv) > #105: python3.7/copyreg.py:88: 1279.1 KiB > return cls.__new__(cls, *args) > #106: python3.7/copy.py:274: 1279.1 KiB > y = func(*args) > #107: sql/query.py:350: 1119.1 KiB > #108: models/query.py:1231: 1057.4 KiB > #109: models/query.py:1219: 1057.4 KiB > 1019 other: 34.8 MiB > Total allocated size: 28304.4 MiB > > > > The snapshots can be found here: > > memory_data > <https://drive.google.com/drive/folders/1mdJtdGDvbwq_jLc5m_T8VBjjb-bDRDmS> > > *Memory consumption over time:* > [image: sync.png] > > [image: resync.png] > > [image: sync_and_resync.png] > > Best regards, > Fabricio Aguiar > Software Engineer, Pulp Project > Red Hat Brazil - Latam <https://www.redhat.com/> > +55 11 999652368 > _______________________________________________ > Pulp-dev mailing list > Pulp-dev@redhat.com > https://www.redhat.com/mailman/listinfo/pulp-dev >
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://www.redhat.com/mailman/listinfo/pulp-dev