Re: [Pulp-list] Changing working_directory and/or reducing disk utilization during sync
Thanks Michael! I appreciate the response. Oracle seems to have their own special way of doing things, for sure.

On Fri, Mar 31, 2017 at 5:54 PM, Michael Hrivnak wrote:
> I just looked at the repo, and other.xml is 720MB compressed!!! Wow! I
> wonder what's in there.
>
> For comparison, just for fun, I checked RHEL 6.8. The other.xml file there
> is under 5MB compressed.

___
Pulp-list mailing list
Pulp-list@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-list
Re: [Pulp-list] Changing working_directory and/or reducing disk utilization during sync
I just looked at the repo, and other.xml is 720MB compressed!!! Wow! I wonder what's in there.

For comparison, just for fun, I checked RHEL 6.8. The other.xml file there is under 5MB compressed.

The setting to change where the working directory lives is intended to help in a scenario where you're using a slow or high-latency network filesystem in a Pulp cluster. It allows a worker process to potentially use fast local storage for transient data. But you do pay a small price for having to eventually copy some data from that filesystem to the shared one.

Thus on a single-machine deployment, it pays to have /var/cache/pulp on the same filesystem as /var/lib/pulp.

When changing the setting, restarting services is all you need to do, besides of course ensuring that the "apache" user can write to the new location.

Otherwise, there's no option for reducing Pulp's disk usage during sync. It has to download that 720MB file, and it does end up storing all of that data on disk uncompressed temporarily while the sync takes place. In theory we could modify that workflow to store those temporary data blobs (one for each RPM) compressed in the working directory. But it's not currently optimized for gigantic metadata files, and I'm not sure if it would be worth adding that complexity and overhead for a rare use case.

Michael

On Wed, Mar 29, 2017 at 9:57 AM, Christina Plummer wrote:
> We found the "working_directory" setting in server.conf, but couldn't find
> much documentation about it. Since this is a production system, I wanted to
> check with the list first to confirm:
> 1) Will changing this to a location on a different, larger filesystem
> address my issues with /var utilization spikes during repo sync?
> 2) Are there any special considerations to changing this setting, other
> than restarting all the services? Do I need to copy the subdirectories? Is
> a symlink a bad idea? It looks like the SELinux context probably needs to
> be set to pulp_var_cache_t.
> 3) Is there another way to reduce Pulp's utilization during the sync? This
> repo seems to be particularly egregious in terms of the massive size of the
> uncompressed other.db and filelists.db for some reason.
>
> Thanks,
> Christina
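A rough way to check the two conditions Michael describes (working directory on the same filesystem as /var/lib/pulp, and writable by the service user) is sketched below in Python. This is only an illustration and not part of Pulp: the helper names same_filesystem and is_writable_dir are made up here, and os.access tests the user running the script, so it would have to be run as the "apache" user to say anything useful about that account.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both existing paths are on the same device."""
    # st_dev identifies the device (filesystem) that holds each path.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def is_writable_dir(path):
    """Return True if path is a directory the current user can write into."""
    # W_OK allows creating entries, X_OK allows traversing the directory.
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Paths taken from the thread; adjust to the configured working directory.
    working_dir = "/var/cache/pulp"
    content_dir = "/var/lib/pulp"
    if os.path.exists(working_dir) and os.path.exists(content_dir):
        print("same filesystem:", same_filesystem(working_dir, content_dir))
        print("writable by this user:", is_writable_dir(working_dir))
    else:
        print("one of the paths does not exist on this machine")
```

Comparing st_dev is a simple heuristic; some setups (bind mounts, certain subvolume layouts) may need a closer look at the mount table.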
Re: [Pulp-list] Changing working_directory and/or reducing disk utilization during sync
> The short answer is that if you need to sync Oracle Linux, sync one distro
> at a time and leave enough space.

Yes, I understand and suspected as much. My question was primarily about the working_directory setting in server.conf, since this does not seem to be well documented.

Thanks,
Christina
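Since the setting is described as thinly documented, here is a minimal Python sketch of how such a value can be read from an INI-style file with the standard configparser module. The [server] section name and the sample path /srv/pulp-working are assumptions for illustration only, so check the comments in your own server.conf before relying on either. The fallback of /var/cache/pulp is the location mentioned elsewhere in this thread.

```python
import configparser

# Hypothetical excerpt; the section name and path are assumptions.
SAMPLE_CONF = """\
[server]
working_directory: /srv/pulp-working
"""


def read_working_directory(config_text, default="/var/cache/pulp"):
    """Return the working_directory value, or the default if it is unset."""
    # interpolation=None keeps any '%' characters in paths literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get("server", "working_directory", fallback=default)


if __name__ == "__main__":
    print(read_working_directory(SAMPLE_CONF))
```

The fallback argument covers both a missing option and a missing section, which matches the common case where the line is left commented out.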
Re: [Pulp-list] Changing working_directory and/or reducing disk utilization during sync
> This may be unrelated to the sync problem - but do you have the export
> distributor configured on that repo?

Hi Mihai,

No, we aren't using the export distributor - but I'll keep that in mind if we end up needing it later.

Thanks,
Christina
Re: [Pulp-list] Changing working_directory and/or reducing disk utilization during sync
This may be unrelated to the sync problem - but do you have the export distributor configured on that repo?

It doesn't affect syncing at all, but there is a publish operation at the end of the sync. The export distributor tries hard to burn all your CPU while running mkisofs (for my use case we don't need ISO images), as we have found out the hard way. That task may consume some extra space as well (which, as you point out, will be released).

Mihai

On Wed, Mar 29, 2017 at 9:57 AM, Christina Plummer wrote:
> Syncing this one repo uses 5+ GB of space on /var while the sync is
> running. Usage typically returns to normal once the sync completes
> (although if the filesystem completely fills we have had to restart
> pulp_workers in order to clear it).
[Pulp-list] Changing working_directory and/or reducing disk utilization during sync
Hello all,

I am running Pulp 2.9.2. We are facing issues with our /var filesystem filling up when we do our nightly syncs - in particular, when we sync the Oracle Linux channel:
http://public-yum.oracle.com/repo/OracleLinux/OL6/latest/x86_64/

Syncing this one repo uses 5+ GB of space on /var while the sync is running. Usage typically returns to normal once the sync completes (although if the filesystem completely fills we have had to restart pulp_workers in order to clear it). We have increased the size of the filesystem already (currently 8GB), but don't want to keep having to do so.

We found the "working_directory" setting in server.conf, but couldn't find much documentation about it. Since this is a production system, I wanted to check with the list first to confirm:

1) Will changing this to a location on a different, larger filesystem address my issues with /var utilization spikes during repo sync?
2) Are there any special considerations to changing this setting, other than restarting all the services? Do I need to copy the subdirectories? Is a symlink a bad idea? It looks like the SELinux context probably needs to be set to pulp_var_cache_t.
3) Is there another way to reduce Pulp's utilization during the sync? This repo seems to be particularly egregious in terms of the massive size of the uncompressed other.db and filelists.db for some reason.

Thanks,
Christina
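To get a feel for how much space a compressed metadata file would need once unpacked, the uncompressed size can be measured by streaming it through a decompressor without writing anything to disk. The Python sketch below assumes the file is gzip-compressed (repositories may also use other formats such as bz2, which would need a different module), and the function name uncompressed_size is made up for this example. It says nothing about how Pulp itself handles the file.

```python
import gzip
import io


def uncompressed_size(fileobj, chunk_size=1024 * 1024):
    """Count the decompressed bytes of a gzip stream, one chunk at a time."""
    total = 0
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Self-contained demo on in-memory data instead of a real metadata file.
    payload = b"<package/>" * 10000
    compressed = gzip.compress(payload)
    print(len(compressed), "bytes compressed")
    print(uncompressed_size(io.BytesIO(compressed)), "bytes uncompressed")
```

For a real file, open it in binary mode and pass the file object in; memory use stays bounded by chunk_size regardless of how large the decompressed data is.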