Re: Retrieving non-/etc/passwd users with Python 3?
Christian Heimes writes: > On 31/03/2021 14.45, Loris Bennett wrote: >> Chris Angelico writes: >> >>> On Wed, Mar 31, 2021 at 11:21 PM Loris Bennett >>> wrote: Hi, I want to get a list of users on a Linux system using Python 3.6. All the users I am interested in are just available via LDAP and are not in /etc/passwd. Thus, in a bash shell I can use 'getent' to display them. When I try to install the PyPi package getent I get the error File "/tmp/pip-build-vu4lziex/getent/setup.py", line 9, in long_description = file('README.rst').read(), NameError: name 'file' is not defined I duckduckwent a bit and the problem seems to be that 'file' from Python 2 has been replaced by 'open' in Python 3. So what's the standard way of getting a list of users in this case? >>> >>> I don't have LDAP experience so I don't know for sure, but is the >>> stdlib "pwd" module suitable, or does it only read /etc/passwd? >>> >>> https://docs.python.org/3/library/pwd.html >>> >>> Failing that, one option - and not as bad as you might think - is >>> simply to run getent using the subprocess module, and parse its >>> output. Sometimes that's easier than finding (or porting!) a library. >> >> D'oh! Thanks, 'pwd' is indeed exactly what I need. When I read the >> documentation here >> >> https://docs.python.org/3.6/library/pwd.html >> >> I mistakenly got the impression that it was only going to give me the >> local users. It doesn't actually say that, but it mentions /etc/shadow >> and not getent. However, it does talk about the "account and password >> database", which is a clue (although our passwords are on an other >> system entirely), since "database" is more getent terminology. >> >> In any case, I think 'pwd' is hiding its light under a bushel a bit >> here. > > Please open a documentation bug :) I'll have a look :) > The pwd and grp module use the libc API to get users from the local > account database. On Linux and glibc the account database is handled by > NSS and nsswitch.conf. 
> > By the way I recommend that you use SSSD instead of talking to LDAP > directly. You'll have a much more pleasant experience. Yes, we do use SSSD, but my grasp of what it does is pretty much limited to "as well as looking at the local /etc/passwd it can also talk to LDAP" :/ Cheers, Loris -- This signature is currently under construction. -- https://mail.python.org/mailman/listinfo/python-list
Re: Canonical conversion of dict of dicts to list of dicts
On 31/03/21 7:37 pm, dn wrote:
> Python offers mutable (can be changed) and immutable (can't) objects
> (remember: 'everything is an object'):
> https://docs.python.org/3/reference/datamodel.html?highlight=mutable%20data

While that's true, it's actually irrelevant to this situation.

> $ a = "bob"
> $ b = a
> $ b = "bert"
> $ a
> 'bob'

Here, you're not even attempting to modify the object that is bound to b;
instead, you're rebinding the name b to a different object. Whether the
object to which b was previously bound is mutable or not makes no
difference.

You can see this if you do the equivalent thing with lists:

    >>> a = ["alice", "bob", "carol"]
    >>> b = a
    >>> b
    ['alice', 'bob', 'carol']
    >>> b = ['dave', 'edward', 'felicity']
    >>> a
    ['alice', 'bob', 'carol']
    >>> b
    ['dave', 'edward', 'felicity']

--
Greg
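To make Greg's distinction concrete, here is a small sketch that contrasts rebinding a name with mutating the object it refers to. The variable `c` and the extra values are made up for illustration; they are not from the original post.

```python
a = ["alice", "bob", "carol"]
b = a                      # a and b now name the same list object
assert b is a

# Rebinding: b is pointed at a brand-new list, a is not affected.
b = ["dave", "edward", "felicity"]
assert a == ["alice", "bob", "carol"]

# Mutation: c still names the same object as a, so a "sees" the change.
c = a
c.append("dave")
assert a == ["alice", "bob", "carol", "dave"]

# Slice assignment also mutates in place rather than rebinding.
c[:] = ["erin"]
assert a == ["erin"] and c is a
```

Only the last two operations would behave differently for an immutable type such as str, simply because str offers no in-place operations at all.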
Re: Horrible abuse of __init_subclass__, or elegant hack?
On 01/04/2021 13.54, Chris Angelico wrote: > On Thu, Apr 1, 2021 at 11:39 AM dn via Python-list > wrote: >> >> On 01/04/2021 12.14, Chris Angelico wrote: >>> I think this code makes some sort of argument in the debate about >>> whether Python has too much flexibility or if it's the best >>> metaprogramming toolset in the world. I'm not sure which side of the >>> debate it falls on, though. >>> >>> class Building: >>> resource = None >>> @classmethod >>> def __init_subclass__(bldg): >>> super().__init_subclass__() >>> print("Building:", bldg.__name__) >>> def make_recipe(recip): >>> print(recip.__name__.replace("_", " "), "is made in a", >>> bldg.__name__.replace("_", " ")) >>> bldg.__init_subclass__ = classmethod(make_recipe) >>> >>> >>> class Extractor(Building): ... >>> class Refinery(Building): ... >>> >>> class Crude(Extractor): >>> resource = "Oil" >>> time: 1 >>> Crude: 1 >>> >>> class Plastic(Refinery): >>> Crude: 3 >>> time: 6 >>> Residue: 1 >>> Plastic: 2 >>> >>> class Rubber(Refinery): >>> Crude: 3 >>> time: 6 >>> Residue: 2 >>> Rubber: 2 >> >> >> [pauses for a moment, to let his mind unwind and return to (what passes >> as) 'reality'] > > Real and imaginary are the same thing, just rotated a quarter turn In which dimension(s)? >> Without looking into the details/context: surely there's a more >> straightforward approach? > > Perhaps, but there are potentially a LOT of recipes, and I needed to > be able to cleanly edit those, even if the code at the top was a mess. > (The goal here is to map out production patterns in the game > "Satisfactory", for the curious. It's easy to add other things, like > computer manufacturing or bauxite processing, simply by adding more > recipes.) > > My original plan was basically pairwise tuple summing (deriving a set > of "oil in, water in, rubber out, fuel out" for each set of recipes, > where some might be zero), but it turned out that that wasn't flexible > enough, and it really needed more options than that. 
Which was where my mind was going*. Why not a dict of inputs, processes, and outputs**? Each dict having variable length, from None, and key:values assigned at declaration/init. In the case of process, the contained objects could be Python-functions. With "compact representation" (3.6+) the functions could also be relied upon to represent a 'production line' or pipeline of functions. * but it strayed so far, I had to ask for it back ** in my mindless state, this combination of three activities seemed familiar, to the point of providing much-needed comfort. >> As to this, I'm slightly amused, but perhaps not in a good way: >> >> class Sanatorium( Building ): >>patient_name = "Chris" >>duration_of_treatment = "life" > > I already have certificates from Rutledge's Asylum and MaayaInsane's > (unnamed) asylum, so that seems pretty likely. Noted you on the list of lauded alumni at the latter. When you left the former, did they allow you to keep the t-shirt, or did you have to buy your own memorabilia? (https://mysterious.americanmcgee.com/products/rutledge-asylum-mug) The latter's treatment list sounds remarkably like .mil training. I know of plenty with that t-shirt - but can't think of a one sporting a mug... Should you have one, kindly bring it (with appropriate contents) come ANZAC Day at the end of this month... >> Thus, design suggestion: add a 'back-door' to the __init_subclass__ to >> ensure access to the Internet from any/all buildings! > > Perfect. Nobody'll find it. I'll have full access to Usenet News from > a secret panel in one of the padded sections of the wall. Surely, in such a state of mind, one's natural 'home' would be "the dark web"? Curiously, last night, the nieces (now long passed that age) were talking about the different modes they used to get to various of their schools. (nostalgia in one's twenties???) 
Which, it should have been expected, opened the way for their father to indulge in the usual grey-hair stories as: walking so many miles barefoot through the snow, falling backwards off the horse because there were so many others climbing up in-front, ... After rolling their eyes (compulsory Dad-joke, -dance, -comment, ... behavioral-response) they attempted to de-rail his, um, railing, by reminding everyone that I went to (boarding) school (aka gentle asylum for young boys - at a considerable distance from 'polite society') by long-distance train. The only understanding of which came when they watched the Harry Potter films and saw the school-kids collecting at a ?London terminus on their way to magic-school. Magic, you ask? Well, maybe more "sinister". We did manage to find a loose floor-board, but a sad life-lesson was learned, when certain ones (un-named*) took it upon themselves to eat all of the contraband secreted there. Another dorm[itory] did manage to prise-open a wall-panel. Their boasting creat
Re: memory consumption
On 31/03/2021 09:35, Alexey wrote:
> On Wednesday, 31 March 2021 at 01:20:06 UTC+3, Dan Stromberg wrote:
>> What if you increase the machine's (operating system's) swap space?
>> Does that take care of the problem in practice?
>
> I can't do that because it will affect other containers running on this
> host. In my opinion it may significantly reduce their performance.

Probably still worth trying. Always better to measure than to guess.

Rob Cliffe
Re: Horrible abuse of __init_subclass__, or elegant hack?
On Thu, Apr 1, 2021 at 11:39 AM dn via Python-list wrote: > > On 01/04/2021 12.14, Chris Angelico wrote: > > I think this code makes some sort of argument in the debate about > > whether Python has too much flexibility or if it's the best > > metaprogramming toolset in the world. I'm not sure which side of the > > debate it falls on, though. > > > > class Building: > > resource = None > > @classmethod > > def __init_subclass__(bldg): > > super().__init_subclass__() > > print("Building:", bldg.__name__) > > def make_recipe(recip): > > print(recip.__name__.replace("_", " "), "is made in a", > > bldg.__name__.replace("_", " ")) > > bldg.__init_subclass__ = classmethod(make_recipe) > > > > > > class Extractor(Building): ... > > class Refinery(Building): ... > > > > class Crude(Extractor): > > resource = "Oil" > > time: 1 > > Crude: 1 > > > > class Plastic(Refinery): > > Crude: 3 > > time: 6 > > Residue: 1 > > Plastic: 2 > > > > class Rubber(Refinery): > > Crude: 3 > > time: 6 > > Residue: 2 > > Rubber: 2 > > > [pauses for a moment, to let his mind unwind and return to (what passes > as) 'reality'] Real and imaginary are the same thing, just rotated a quarter turn > Without looking into the details/context: surely there's a more > straightforward approach? Perhaps, but there are potentially a LOT of recipes, and I needed to be able to cleanly edit those, even if the code at the top was a mess. (The goal here is to map out production patterns in the game "Satisfactory", for the curious. It's easy to add other things, like computer manufacturing or bauxite processing, simply by adding more recipes.) My original plan was basically pairwise tuple summing (deriving a set of "oil in, water in, rubber out, fuel out" for each set of recipes, where some might be zero), but it turned out that that wasn't flexible enough, and it really needed more options than that. 
> As to this, I'm slightly amused, but perhaps not in a good way: > > class Sanatorium( Building ): >patient_name = "Chris" >duration_of_treatment = "life" I already have certificates from Rutledge's Asylum and MaayaInsane's (unnamed) asylum, so that seems pretty likely. > Thus, design suggestion: add a 'back-door' to the __init_subclass__ to > ensure access to the Internet from any/all buildings! Perfect. Nobody'll find it. I'll have full access to Usenet News from a secret panel in one of the padded sections of the wall. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Horrible abuse of __init_subclass__, or elegant hack?
On 01/04/2021 12.14, Chris Angelico wrote: > I think this code makes some sort of argument in the debate about > whether Python has too much flexibility or if it's the best > metaprogramming toolset in the world. I'm not sure which side of the > debate it falls on, though. > > class Building: > resource = None > @classmethod > def __init_subclass__(bldg): > super().__init_subclass__() > print("Building:", bldg.__name__) > def make_recipe(recip): > print(recip.__name__.replace("_", " "), "is made in a", > bldg.__name__.replace("_", " ")) > bldg.__init_subclass__ = classmethod(make_recipe) > > > class Extractor(Building): ... > class Refinery(Building): ... > > class Crude(Extractor): > resource = "Oil" > time: 1 > Crude: 1 > > class Plastic(Refinery): > Crude: 3 > time: 6 > Residue: 1 > Plastic: 2 > > class Rubber(Refinery): > Crude: 3 > time: 6 > Residue: 2 > Rubber: 2 [pauses for a moment, to let his mind unwind and return to (what passes as) 'reality'] Without looking into the details/context: surely there's a more straightforward approach? As to this, I'm slightly amused, but perhaps not in a good way: class Sanatorium( Building ): patient_name = "Chris" duration_of_treatment = "life" Thus, design suggestion: add a 'back-door' to the __init_subclass__ to ensure access to the Internet from any/all buildings! -- Regards, =dn -- https://mail.python.org/mailman/listinfo/python-list
Re: Horrible abuse of __init_subclass__, or elegant hack?
On 3/31/21 4:14 PM, Chris Angelico wrote: I think this code makes some sort of argument in the debate about whether Python has too much flexibility or if it's the best metaprogramming toolset in the world. I'm not sure which side of the debate it falls on, though. Well, `__init_subclass__` is there to provide metaclass power without needing a full-blown metaclass. I vote elegant hack. :) -- ~Ethan~ -- https://mail.python.org/mailman/listinfo/python-list
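As a rough illustration of Ethan's point, the sketch below uses `__init_subclass__` for a job that would otherwise need a metaclass: registering every subclass, with an optional keyword in the class statement. The names `Plugin`, `registry`, `CsvReader` and `JsonReader` are hypothetical and only serve the example.

```python
class Plugin:
    registry = {}  # maps a short name to the registered subclass

    def __init_subclass__(cls, name=None, **kwargs):
        # Implicitly a classmethod; runs once for every new subclass.
        super().__init_subclass__(**kwargs)
        cls.registry[name or cls.__name__.lower()] = cls


class CsvReader(Plugin, name="csv"):
    ...


class JsonReader(Plugin):
    ...
```

Keyword arguments in the class statement (here `name="csv"`) are passed straight to `__init_subclass__`, which is what makes the hook a lightweight alternative to a custom metaclass.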
Horrible abuse of __init_subclass__, or elegant hack?
I think this code makes some sort of argument in the debate about
whether Python has too much flexibility or if it's the best
metaprogramming toolset in the world. I'm not sure which side of the
debate it falls on, though.

class Building:
    resource = None
    @classmethod
    def __init_subclass__(bldg):
        super().__init_subclass__()
        print("Building:", bldg.__name__)
        def make_recipe(recip):
            print(recip.__name__.replace("_", " "), "is made in a",
                  bldg.__name__.replace("_", " "))
        bldg.__init_subclass__ = classmethod(make_recipe)


class Extractor(Building): ...
class Refinery(Building): ...

class Crude(Extractor):
    resource = "Oil"
    time: 1
    Crude: 1

class Plastic(Refinery):
    Crude: 3
    time: 6
    Residue: 1
    Plastic: 2

class Rubber(Refinery):
    Crude: 3
    time: 6
    Residue: 2
    Rubber: 2

Full code is here if you want context:
https://github.com/Rosuav/shed/blob/master/satisfactory-production.py

Subclassing Building defines a class that is a building. (The ellipsis
body is a placeholder; I haven't implemented stuff where the buildings
know about their power consumptions and such. Eventually they'll have
other attributes.) But subclassing a building defines a recipe that is
produced in that building. Markers placed before the "time" are
ingredients, those after the "time" are products.

There are actually a lot of interesting wrinkles to trying to replace
__init_subclass__ on the fly. Things get quite entertaining if you
don't use the decorator, or if you define and decorate the function
outside of the class, or various other combinations.

On a scale of 1 to "submit this to The Daily WTF immediately", how bad
is this code? :)

ChrisA
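For readers wondering how the "markers before and after time" convention could be consumed, here is a minimal sketch. It assumes the recipe body is read back through the class's `__annotations__` mapping, which preserves definition order; that is a guess for illustration, not necessarily what the linked script does, and the `recipes` registry and its key names are hypothetical.

```python
class Building:
    recipes = {}  # hypothetical registry: recipe name -> parsed recipe

    def __init_subclass__(bldg, **kwargs):
        super().__init_subclass__(**kwargs)

        def make_recipe(recip):
            # Annotations such as "Crude: 3" are stored in definition order.
            notes = dict(recip.__annotations__)
            names = list(notes)
            split = names.index("time")
            Building.recipes[recip.__name__] = {
                "building": bldg.__name__,
                "time": notes["time"],
                "needs": {n: notes[n] for n in names[:split]},
                "makes": {n: notes[n] for n in names[split + 1:]},
            }

        # Same trick as in the post: subclasses of a building are recipes.
        bldg.__init_subclass__ = classmethod(make_recipe)


class Refinery(Building): ...


class Plastic(Refinery):
    Crude: 3
    time: 6
    Residue: 1
    Plastic: 2
```

The first level of subclassing goes through `Building.__init_subclass__`, which installs a replacement hook on the new building; the second level then finds that replacement first and is treated as a recipe.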
Re: Unable to find 'edit with ide' option in the Context menu
On 3/31/2021 2:11 AM, Arjav Jain wrote:
> I am using the latest version of Python. But I am facing a problem
> with Python files: when I right-click any Python file there is no
> option for 'Edit with IDLE'. I have repaired the Python installation
> too, but this doesn't solve my problem, please help!

Did you check (or leave checked) [x] Install tkinter and IDLE?
Can you start IDLE otherwise?

--
Terry Jan Reedy
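One way to answer both of Terry's questions from a Python prompt is sketched below. It only checks whether the `tkinter` and `idlelib` packages can be found by the running interpreter; the helper name `idle_status` is made up for this example.

```python
import importlib.util
import sys


def idle_status():
    """Report which interpreter is running and whether IDLE's parts exist."""
    return {
        "python": sys.executable,
        "tkinter": importlib.util.find_spec("tkinter") is not None,
        "idlelib": importlib.util.find_spec("idlelib") is not None,
    }


print(idle_status())
```

If both entries are True, IDLE can normally be started without the context menu by running `python -m idlelib` (or `py -m idlelib` on Windows).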
Re: memory consumption
On Wednesday, 31 March 2021 at 18:17:46 UTC+3, Dieter Maurer wrote:
> Alexey wrote at 2021-3-31 02:43 -0700:
> >On Wednesday, 31 March 2021 at 06:54:52 UTC+3, Inada Naoki wrote:
> > ...
> >> You can get some hints from sys._debugmallocstats(). It prints
> >> obmalloc (allocator for small objects) stats to stderr.
> >> Try printing stats before and after 1st run, and after 2nd run. And
> >> post it in this thread if you can. (no sensitive information in the
> >> stats).
> `glibc` has similar functions to monitor the memory allocation
> at the C level: `mallinfo[2]`, `malloc_stats`, `malloc_info`.
>
> The `mallinfo` functions can be called via `ctypes`.
> Provided your `glibc` has `mallinfo2`, I recommend its use.
>
> In order to use `malloc_info` from Python, you need
> a C extension. I have one implemented via `cython`. Let me know,
> if you are interested.

I think I found something. I'll return tomorrow with an update.
Re: memory consumption
Alexey wrote at 2021-3-31 02:43 -0700:
>On Wednesday, 31 March 2021 at 06:54:52 UTC+3, Inada Naoki wrote:
> ...
>> You can get some hints from sys._debugmallocstats(). It prints
>> obmalloc (allocator for small objects) stats to stderr.
>> Try printing stats before and after 1st run, and after 2nd run. And
>> post it in this thread if you can. (no sensitive information in the
>> stats).

`glibc` has similar functions to monitor the memory allocation
at the C level: `mallinfo[2]`, `malloc_stats`, `malloc_info`.

The `mallinfo` functions can be called via `ctypes`.
Provided your `glibc` has `mallinfo2`, I recommend its use.

In order to use `malloc_info` from Python, you need
a C extension. I have one implemented via `cython`. Let me know
if you are interested.
Source code link was: Re: Ann: New Python curses book
On 31/03/2021 00:09, Alan Gauld via Python-list wrote: > Watch this space. Hopefully tomorrow. The source code is now available in a zip file at: http://www.alan-g.me.uk/hills/PythonCursesCode.zip Or via a link on the programming section of my home page http://www.alan-g.me.uk/ It is licensed using a Creative Commons license. -- Alan G Author of the Learn to Program web site http://www.alan-g.me.uk/ http://www.amazon.com/author/alan_gauld Follow my photo-blog on Flickr at: http://www.flickr.com/photos/alangauldphotos -- https://mail.python.org/mailman/listinfo/python-list
Unable to find 'edit with ide' option in the Context menu
I am using the latest version of Python. But I am facing a problem with
Python files: when I right-click any Python file there is no option for
'Edit with IDLE'. I have repaired the Python installation too, but this
doesn't solve my problem, please help!
Re: Retrieving non-/etc/passwd users with Python 3?
On 31/03/2021 14.45, Loris Bennett wrote: > Chris Angelico writes: > >> On Wed, Mar 31, 2021 at 11:21 PM Loris Bennett >> wrote: >>> >>> Hi, >>> >>> I want to get a list of users on a Linux system using Python 3.6. All >>> the users I am interested in are just available via LDAP and are not in >>> /etc/passwd. Thus, in a bash shell I can use 'getent' to display them. >>> >>> When I try to install the PyPi package >>> >>> getent >>> >>> I get the error >>> >>> File "/tmp/pip-build-vu4lziex/getent/setup.py", line 9, in >>> long_description = file('README.rst').read(), >>> NameError: name 'file' is not defined >>> >>> I duckduckwent a bit and the problem seems to be that 'file' from Python >>> 2 has been replaced by 'open' in Python 3. >>> >>> So what's the standard way of getting a list of users in this case? >>> >> >> I don't have LDAP experience so I don't know for sure, but is the >> stdlib "pwd" module suitable, or does it only read /etc/passwd? >> >> https://docs.python.org/3/library/pwd.html >> >> Failing that, one option - and not as bad as you might think - is >> simply to run getent using the subprocess module, and parse its >> output. Sometimes that's easier than finding (or porting!) a library. > > D'oh! Thanks, 'pwd' is indeed exactly what I need. When I read the > documentation here > > https://docs.python.org/3.6/library/pwd.html > > I mistakenly got the impression that it was only going to give me the > local users. It doesn't actually say that, but it mentions /etc/shadow > and not getent. However, it does talk about the "account and password > database", which is a clue (although our passwords are on an other > system entirely), since "database" is more getent terminology. > > In any case, I think 'pwd' is hiding its light under a bushel a bit > here. Please open a documentation bug :) The pwd and grp module use the libc API to get users from the local account database. On Linux and glibc the account database is handled by NSS and nsswitch.conf. 
By the way I recommend that you use SSSD instead of talking to LDAP directly. You'll have a much more pleasant experience. Christian -- https://mail.python.org/mailman/listinfo/python-list
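Because `pwd` and `grp` go through the libc API, a lookup by name should be resolved by whatever sources nsswitch.conf lists, not only /etc/passwd. The sketch below shows such lookups; the helper names are made up for this example. One caveat, stated from memory rather than from this thread: SSSD does not enumerate users by default, so `pwd.getpwall()` may omit LDAP accounts even when `pwd.getpwnam()` finds them.

```python
import grp
import pwd


def lookup_user(name):
    """Resolve one account through the libc/NSS lookup, or return None."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return None
    try:
        group = grp.getgrgid(entry.pw_gid).gr_name
    except KeyError:
        group = None  # primary group not resolvable
    return {
        "name": entry.pw_name,
        "uid": entry.pw_uid,
        "group": group,
        "home": entry.pw_dir,
        "shell": entry.pw_shell,
    }


def account_names():
    """All accounts the account database is willing to enumerate."""
    return sorted(entry.pw_name for entry in pwd.getpwall())
```

Calling `lookup_user()` with a login name that exists only in LDAP is a quick way to confirm that the NSS chain is being consulted.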
Re: Retrieving non-/etc/passwd users with Python 3?
Chris Angelico writes: > On Wed, Mar 31, 2021 at 11:21 PM Loris Bennett > wrote: >> >> Hi, >> >> I want to get a list of users on a Linux system using Python 3.6. All >> the users I am interested in are just available via LDAP and are not in >> /etc/passwd. Thus, in a bash shell I can use 'getent' to display them. >> >> When I try to install the PyPi package >> >> getent >> >> I get the error >> >> File "/tmp/pip-build-vu4lziex/getent/setup.py", line 9, in >> long_description = file('README.rst').read(), >> NameError: name 'file' is not defined >> >> I duckduckwent a bit and the problem seems to be that 'file' from Python >> 2 has been replaced by 'open' in Python 3. >> >> So what's the standard way of getting a list of users in this case? >> > > I don't have LDAP experience so I don't know for sure, but is the > stdlib "pwd" module suitable, or does it only read /etc/passwd? > > https://docs.python.org/3/library/pwd.html > > Failing that, one option - and not as bad as you might think - is > simply to run getent using the subprocess module, and parse its > output. Sometimes that's easier than finding (or porting!) a library. D'oh! Thanks, 'pwd' is indeed exactly what I need. When I read the documentation here https://docs.python.org/3.6/library/pwd.html I mistakenly got the impression that it was only going to give me the local users. It doesn't actually say that, but it mentions /etc/shadow and not getent. However, it does talk about the "account and password database", which is a clue (although our passwords are on an other system entirely), since "database" is more getent terminology. In any case, I think 'pwd' is hiding its light under a bushel a bit here. Cheers, Loris -- This signature is currently under construction. -- https://mail.python.org/mailman/listinfo/python-list
Re: Retrieving non-/etc/passwd users with Python 3?
On Wed, Mar 31, 2021 at 11:21 PM Loris Bennett wrote: > > Hi, > > I want to get a list of users on a Linux system using Python 3.6. All > the users I am interested in are just available via LDAP and are not in > /etc/passwd. Thus, in a bash shell I can use 'getent' to display them. > > When I try to install the PyPi package > > getent > > I get the error > > File "/tmp/pip-build-vu4lziex/getent/setup.py", line 9, in > long_description = file('README.rst').read(), > NameError: name 'file' is not defined > > I duckduckwent a bit and the problem seems to be that 'file' from Python > 2 has been replaced by 'open' in Python 3. > > So what's the standard way of getting a list of users in this case? > I don't have LDAP experience so I don't know for sure, but is the stdlib "pwd" module suitable, or does it only read /etc/passwd? https://docs.python.org/3/library/pwd.html Failing that, one option - and not as bad as you might think - is simply to run getent using the subprocess module, and parse its output. Sometimes that's easier than finding (or porting!) a library. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
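The subprocess fallback Chris mentions can be sketched roughly like this. It assumes `getent passwd` prints the usual seven colon-separated fields per line; the function names are made up for this example, and the keyword arguments are chosen so that the code also runs on Python 3.6.

```python
import shutil
import subprocess


def parse_passwd(text):
    """Parse passwd-format lines (name:pw:uid:gid:gecos:home:shell)."""
    users = []
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) != 7 or not fields[2].isdigit() or not fields[3].isdigit():
            continue  # skip anything that is not a well-formed entry
        name, _password, uid, gid, gecos, home, shell = fields
        users.append({"name": name, "uid": int(uid), "gid": int(gid),
                      "gecos": gecos, "home": home, "shell": shell})
    return users


def getent_passwd():
    """Run 'getent passwd' and parse its output; [] if getent is missing."""
    exe = shutil.which("getent")
    if exe is None:
        return []
    result = subprocess.run([exe, "passwd"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_passwd(result.stdout)
```

Keeping the parser separate from the subprocess call makes it easy to test against a captured sample of output.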
Retrieving non-/etc/passwd users with Python 3?
Hi,

I want to get a list of users on a Linux system using Python 3.6. All
the users I am interested in are only available via LDAP and are not in
/etc/passwd. Thus, in a bash shell I can use 'getent' to display them.

When I try to install the PyPI package

  getent

I get the error

  File "/tmp/pip-build-vu4lziex/getent/setup.py", line 9, in
    long_description = file('README.rst').read(),
  NameError: name 'file' is not defined

I duckduckwent a bit and the problem seems to be that 'file' from Python
2 has been replaced by 'open' in Python 3.

So what's the standard way of getting a list of users in this case?

Cheers,

Loris

--
This signature is currently under construction.
Re: memory consumption
On Wednesday, 31 March 2021 at 14:16:30 UTC+3, Inada Naoki wrote:
> > ** Before first run:
> > # arenas allocated total = 776
> > # arenas reclaimed = 542
> > # arenas highwater mark = 234
> > # arenas allocated current = 234
> > 234 arenas * 262144 bytes/arena = 61,341,696
> > ** After first run:
> > # arenas allocated total = 47,669
> > # arenas reclaimed = 47,316
> > # arenas highwater mark = 10,114
> > # arenas allocated current = 353
> > 353 arenas * 262144 bytes/arena = 92,536,832
> > ** After second run:
> > # arenas allocated total = 63,635
> > # arenas reclaimed = 63,238
> > # arenas highwater mark = 10,114
> > # arenas allocated current = 397
> > 397 arenas * 262144 bytes/arena = 104,071,168
> OK, memory allocated by obmalloc is 61MB -> 92MB -> 104MB.
>
> Memory usage is increasing, but it is much smaller than 1GB. 90% of the
> memory is allocated by malloc().
>
> You should try jemalloc. Trying jemalloc is not hard. You don't need
> to rebuild Python.
> Google "jemalloc LD_PRELOAD".
>
> --
> Inada Naoki

With jemalloc it looks like a memory leak :D
After the first run it grabs 980 MB, after the second run 1.4 GB, then
2.6 GB and so on.
Re: memory consumption
> ** Before first run: > # arenas allocated total = 776 > # arenas reclaimed = 542 > # arenas highwater mark= 234 > # arenas allocated current = 234 > 234 arenas * 262144 bytes/arena= 61,341,696 > ** After first run: > # arenas allocated total = 47,669 > # arenas reclaimed = 47,316 > # arenas highwater mark= 10,114 > # arenas allocated current = 353 > 353 arenas * 262144 bytes/arena= 92,536,832 > ** After second run: > # arenas allocated total = 63,635 > # arenas reclaimed = 63,238 > # arenas highwater mark= 10,114 > # arenas allocated current = 397 > 397 arenas * 262144 bytes/arena= 104,071,168 OK, memory allocated by obmalloc is 61MB -> 92MB -> 104MB. Memory usage increasing, but it is much smaller than 1GB. 90% memory is allocated by malloc(). You should try jemalloc. Trying jemalloc is not hard. You don't need to rebuild Python. Google " jemalloc LD_PRELOAD". -- Inada Naoki -- https://mail.python.org/mailman/listinfo/python-list
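The arithmetic behind the 61MB -> 92MB -> 104MB figures can be reproduced with a few lines. The sketch below takes the arena counts and the 262144-byte arena size from the quoted stats; the helper names are made up for this example, and `sys._debugmallocstats()` is a CPython-specific function, so the call is guarded.

```python
import sys

ARENA_SIZE = 256 * 1024  # 262144 bytes per arena, as printed in the stats


def arena_bytes(arenas, arena_size=ARENA_SIZE):
    """Bytes currently held by obmalloc for the given number of arenas."""
    return arenas * arena_size


def dump_obmalloc_stats():
    """Print obmalloc statistics to stderr where the interpreter supports it."""
    stats = getattr(sys, "_debugmallocstats", None)
    if stats is not None:
        stats()


for label, arenas in [("before first run", 234),
                      ("after first run", 353),
                      ("after second run", 397)]:
    print(f"{label}: {arena_bytes(arenas) / 1e6:.1f} MB in arenas")
```

Calling `dump_obmalloc_stats()` before and after each run of the task gives the "arenas allocated current" values to feed into `arena_bytes()`.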
Re: memory consumption
On Wednesday, 31 March 2021 at 11:52:43 UTC+3, Marco Ippolito wrote:
> > > At which point does the problem start manifesting itself?
> > The problem spot is my cache(dict). I simplified my code to just load
> > all the objects to this dict and then clear it.
> What's the memory utilisation just _before_ performing this load? I am
> assuming it's much less than this 1 GB you can't seem to drop under
> after you run your `.clear()`.

Around 100 MB before the first run.

> > After loading "top"
>
> You may be using `top` in command line mode already but in case you
> aren't, consider sorting processes whose command name is `python` (or
> whatever filter selects your program) by RSS, like so, for easier
> collection of machine-readable statistics:

I'm using the following command to highlight what I need:

  top -c -p $(pgrep -d',' -f python)

and then sort by RSS and switch to MB by pressing 'e'.

> # ps -o rss,ppid,pid,args --sort -rss $(pgrep python)
>   RSS  PPID   PID COMMAND
> 32836 14130 14377 python3
> 10644 14540 14758 python3

> > For debugging I use Pycharm
> Sounds good, you can then use the GUI to set the breakpoint and consult
> external statistics-gathering programs (like the `ps` invocation above)
> as you step through your code.
>
> Pycharm also allows you to see which variables are in scope in a
> particular stack frame, so you'll have an easier time reasoning about
> garbage collection in terms of which references might be preventing GC.

That's what I tried in the first place and I see no references to this
dict. I'll try that one more time anyway.
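The same RSS figure that `top` and `ps` report can also be sampled from inside the process, which makes it easier to log values around the load and the `.clear()`. The sketch below assumes a Linux /proc filesystem; the function names and the toy cache are made up for this example.

```python
def current_rss_kib():
    """Resident set size of this process in KiB, or None if unavailable."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is reported in kB
    except OSError:
        pass
    return None


def rss_report(build_cache):
    """Sample RSS before the load, after it, and after clearing the cache."""
    before = current_rss_kib()
    cache = build_cache()
    loaded = current_rss_kib()
    cache.clear()
    cleared = current_rss_kib()
    return {"before": before, "loaded": loaded, "cleared": cleared}


print(rss_report(lambda: {i: str(i) for i in range(100_000)}))
```

If the "cleared" value stays close to "loaded" even though nothing references the objects any more, the memory is being retained by the allocator rather than by live Python objects.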
Re: memory consumption
On Wednesday, 31 March 2021 at 06:54:52 UTC+3, Inada Naoki wrote:
> First of all, I recommend upgrading your Python. Python 3.6 is a bit old.

I was thinking about that.

> As you say, Python cannot return memory to the OS until a whole
> arena becomes unused.
> If your task releases all objects allocated during the run, Python can
> release the memory.
> But if your task keeps at least one object, it may prevent releasing
> the whole arena (256KB).
>
> Python's own allocator manages only small objects (up to 512 bytes).
> Larger objects are allocated by malloc().
> And glibc malloc may not be efficient for some usage. jemalloc is better
> for many use cases.
>
> You can get some hints from sys._debugmallocstats(). It prints
> obmalloc (allocator for small objects) stats to stderr.
> Try printing stats before and after 1st run, and after 2nd run. And
> post it in this thread if you can. (no sensitive information in the
> stats).
>
> That is all I can advise.

** Before first run:

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0      8           5            2404           126
    1     16           3             611           148
    2     24          13            2103            81
    3     32         371           46703            43
    4     40         292           29422            70
    5     48         233           19499            73
    6     56        1360           97903            17
    7     64        1614          101654            28
    8     72        1964          109947            37
    9     80        1056           52783            17
   10     88         436           20023            33
   11     96         297           12455            19
   12    104         266           10095            13
   13    112         193            6930            18
   14    120         127            4170            21
   15    128         217            6712            15
   16    136        1299           37669             2
   17    144        1223           34239             5
   18    152         113            2920            18
   19    160          78            1949             1
   20    168        1474           35369             7
   21    176          54            1237             5
   22    184          46             991            21
   23    192          42             864            18
   24    200          53            1054             6
   25    208          39             713            28
   26    216          54             955            17
   27    224         575           10350             0
   28    232          43             724             7
   29    240          32             497            15
   30    248          73            1153            15
   31    256          29             431             4
   32    264          25             375             0
   33    272          46             637             7
   34    280          24             328             8
   35    288          20             280             0
   36    296         398            5167             7
   37    304          21             261            12
   38    312          22             256             8
   39    320          17             195             9
   40    328          18             215             1
   41    336          57             675             9
   42    344          17             183             4
   43    352          18             194             4
   44    360          14             153             1
   45    368          14             153             1
   46    376          15             148             2
   47    384          15             148             2
   48    392          14             131             9
   49    400          15             149             1
   50    408          17             147             6
   51    416          16             142             2
   52    424          25             221             4
   53    432          36             317             7
   54    440          44             393             3
   55    448          45             399             6
   56    456          53             420             4
   57    464          46             363             5
   58    472          36             288             0
   59    480          35             274             6
   60    488          29             227             5
   61    496          29             230             2
   62    504          21             161             7
   63    512          85             589             6

# arenas allocated total           =                  776
# arenas reclaimed                 =                  542
# arenas highwater mark            =                  234
# arenas allocated current         =                  234
234 arenas * 262144 bytes/arena    =           61,341,696

# bytes in allocated blocks        =           59,737,176
# bytes in available blocks
Re: memory consumption
Wednesday, 31 March 2021 at 05:45:27 UTC+3, cameron...@gmail.com:
> Since everyone is talking about vague OS memory use and not at all about
> working set size of Python objects, let me ...
>
> On 29Mar2021 03:12, Alexey wrote:
> >I'm experiencing problems with memory consumption.
> >
> >I have a class which is doing an ETL job. What's happening inside:
> > - fetching existing objects from DB via SQLAlchemy
>
> Do you need to? Or do you only need to fetch their ids? Or do you only
> need to fetch a subset of the objects?

I really need all the objects because I'm performing update and create operations. If I fetch them on the fly, this will take hours or even days to complete.

> It is easy to accidentally suck in way too many db session entity
> objects, or at any rate, more than you need to.
>
> > - iterate over raw data
>
> Can you prescan the data to determine which objects you care about,
> reducing the number of objects you need to obtain?

In this case I still need to iterate over both the raw and the old data. As I said before, if I try it without caching it will take days.

> > - create new/update existing objects
>
> Depending what you're doing, you may not need to "create new/update
> existing objects". You could collate changes and do an UPSERT (the
> incantation varies a little depending on the SQL dialect behind
> SQLAlchemy).

Good advice.

> > - commit changes
>
> Do you discard the SQLAlchemy session after this? Otherwise it may lurk
> and hold onto the objects.

Commit doesn't forget the objects. I tried expire_all() and expunge_all(). Should I try rollback()?

> For my current client we have a script to import historic data from a
> legacy system. It has many of the issues you're dealing with: the naive
> (ORM) way consumes gads of memory, and can be very slow too (updating
> objects in an ad hoc manner tends to do individual UPDATE SQL commands,
> very latency laden).
>
> I wrote a generic batch UPSERT function which took an accrued list of
> changes and prepared a PostgreSQL INSERT...ON CONFLICT statement. The
> main script hands it the accrued updates and it runs batches (which lets
> us do progress reporting). Orders of magnitude faster, _and_ does not
> require storing the db objects.
>
> On the subject of "fetching existing objects from DB via SQLAlchemy": you
> may not need to do that, either. Can you identify _which_ objects are of
> interest? Associated with the same script I've got a batch_select
> function: it takes an iterable of object ids and collects them in
> batches, where before we were really scanning the whole db because we
> had an arbitrary scattering of relevant object ids from the raw data.

I'll try to analyze whether it's possible to rewrite the code this way.

> It basically collected ids into batches, and ran a SELECT...WHERE id in
> (batch-of-ids). It's really fast considering, and also scales _way_ down
> when the set of arbitrary ids is small.
>
> I'm happy to walk through the mechanics of these with you; the code at
> this end is Django's ORM, but I prefer SQLAlchemy anyway - the project
> dictated the ORM here.
>
> >Before processing data I create internal cache(dictionary) and store all
> >existing objects in it.
> >Every 1 items I do bulk insert and flush. At the end I run commit
> >command.
>
> Yah. I suspect the session data are not being released. Also, SQLAlchemy
> may be caching sessions or something across runs, since this is a celery
> worker which survives from one task to the next.

I tried to dig in this direction. I created a few graphs with "objgraph", but there are so many references under the hood. I'll try to measure the size of the session object before and after building the cache.

> You could try explicitly creating a new SQLAlchemy session around your
> task.
>
> >Problem. Before executing, my interpreter process weighs ~100Mb, after first
> >run memory increases up to 500Mb
> >and after second run it weighs 1Gb. If I will continue to run this class,
> >memory wont increase, so I think
> >it's not a memory leak, but rather Python wont release allocated memory back
> >to OS. Maybe I'm wrong.
>
> I don't know enough about Python's "release OS memory" phase. But
> reducing the task memory footprint will help regardless.

Definitely. I'll think about it. Thank you!
--
https://mail.python.org/mailman/listinfo/python-list
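The batched SELECT and batched UPSERT described in the quoted message can be sketched roughly as follows. This is not the code from that message: it is a minimal illustration that assumes a hypothetical `item (id, name)` table and uses the standard-library `sqlite3` module (SQLite 3.24 or later for `ON CONFLICT ... DO UPDATE`) in place of PostgreSQL, Django or SQLAlchemy. The helper names `batched`, `batch_select` and `batch_upsert` are made up for the example.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batch_select(conn, ids, size=500):
    """Yield (id, name) rows for the given ids, one IN (...) query per batch."""
    for batch in batched(ids, size):
        placeholders = ",".join("?" * len(batch))
        query = f"SELECT id, name FROM item WHERE id IN ({placeholders})"
        yield from conn.execute(query, batch)


def batch_upsert(conn, rows, size=500):
    """Insert or update (id, name) rows in batches; return how many batches ran."""
    sql = (
        "INSERT INTO item (id, name) VALUES (?, ?) "
        "ON CONFLICT (id) DO UPDATE SET name = excluded.name"
    )
    batches = 0
    for batch in batched(rows, size):
        conn.executemany(sql, batch)
        conn.commit()  # committing per batch is a natural point for progress reporting
        batches += 1
    return batches


# Hypothetical table and data, only to exercise the two helpers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
batch_upsert(conn, [(1, "old"), (2, "old"), (3, "old")], size=2)
batch_upsert(conn, [(2, "new"), (4, "new")], size=2)
found = sorted(batch_select(conn, [2, 3, 4, 99], size=2))
print(found)
```

Neither helper keeps ORM objects alive between batches, which is the point of the approach: only plain tuples for the current batch are held in memory at any time.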
Re: memory consumption
> > At which point does the problem start manifesting itself?
>
> The problem spot is my cache (dict). I simplified my code to just load
> all the objects to this dict and then clear it.

What's the memory utilisation just _before_ performing this load? I am assuming it's much less than the 1 GB you can't seem to drop under after you run your `.clear()`.

> After loading "top"

You may be using `top` in command line mode already, but in case you aren't, consider listing the processes whose command name is `python` (or whatever filter selects your program) sorted by RSS, like so, for easier collection of machine-readable statistics:

    # ps -o rss,ppid,pid,args --sort -rss $(pgrep python)
      RSS  PPID   PID COMMAND
    32836 14130 14377 python3
    10644 14540 14758 python3

> For debugging I use PyCharm

Sounds good; you can then use the GUI to set breakpoints and consult external statistics-gathering programs (like the `ps` invocation above) as you step through your code. PyCharm also lets you see which variables are in scope in a particular stack frame, so you'll have an easier time reasoning about garbage collection in terms of which references might be preventing it.
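The same before/after RSS measurement can also be taken from inside the process instead of from `ps`. The following is a minimal sketch, assuming Linux (it reads `/proc/self/statm`); on other Unix systems it falls back to the peak RSS reported by `resource.getrusage`, which never decreases. The `rss_bytes` helper and the stand-in `cache` dict are hypothetical, not the original poster's code.

```python
import os
import resource
import sys


def rss_bytes():
    """Return the resident set size of this process in bytes.

    On Linux this is the current RSS from /proc/self/statm. Elsewhere it
    falls back to the peak RSS from getrusage().
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except OSError:
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
        return peak if sys.platform == "darwin" else peak * 1024


before = rss_bytes()
cache = {i: str(i) * 10 for i in range(200_000)}  # stand-in for the real cache
loaded = rss_bytes()
cache.clear()
cleared = rss_bytes()

print(f"before={before:,} loaded={loaded:,} cleared={cleared:,}")
```

Logging these three numbers around the real load and `.clear()` calls gives the "memory just before the load" figure asked about above without leaving the program.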
Re: memory consumption
Wednesday, 31 March 2021 at 01:20:06 UTC+3, Dan Stromberg:
> On Tue, Mar 30, 2021 at 1:25 AM Alexey wrote:
> > I'm sorry. I didn't understand your question right. If I have 4 workers,
> > they require 4 GB in idle state and some extra memory when they execute
> > other tasks. If I increase the worker count up to 16, they'll eat all the
> > memory I have (16 GB) on my machine and will crash as soon as the system
> > starts swapping.
>
> What if you increase the machine's (operating system's) swap space? Does
> that take care of the problem in practice?

I can't do that because it will affect other containers running on this host. In my opinion it may significantly reduce their performance.
Re: memory consumption
Tuesday, 30 March 2021 at 18:43:54 UTC+3, Alan Gauld:
> On 29/03/2021 11:12, Alexey wrote:
> The first thing you really need to tell us is which
> OS you are using? Memory management varies wildly
> depending on OS. Even different flavours of *nix
> do it differently.

I'm using Ubuntu (5.8.0-45-generic #51~20.04.1-Ubuntu) in development and CentOS 7 in production.

> However, most do it effectively, so you as a programmer
> shouldn't have to worry too much provided you aren't
> leaking, which you don't think you are.
>
> > and after second run it weighs 1Gb. If I will continue
> > to run this class, memory wont increase, so I think
> > it's not a memory leak, but rather Python wont release
> > allocated memory back to OS. Maybe I'm wrong.
>
> A 1GB process on modern computers is hardly a big problem?
> Most machines have 4G and many have 16G or even 32G
> nowadays.

With one worker it's OK. But when 8 workers are holding 8 GB of garbage it becomes a problem, and I can't ignore it because of company rules.
Re: memory consumption
Tuesday, 30 March 2021 at 18:43:51 UTC+3, Marco Ippolito:
> Have you tried to identify where in your code the surprising memory
> allocations are made?

Yes.

> You could "bisect search" by adding breakpoints:
>
> https://docs.python.org/3/library/functions.html#breakpoint
>
> At which point does the problem start manifesting itself?

The problem spot is my cache (dict). I simplified my code to just load all the objects into this dict and then clear it. After loading, "top" was showing resident memory usage at 3.3 GB; immediately after that I called self.__cache.clear() and memory dropped to 1 GB. Then I tried to find any references to this dict, with no luck. I also tried "del self.__cache". For debugging I use PyCharm.
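One way to check whether Python objects are still alive after the `clear()` is to compare the process RSS with what the standard-library `tracemalloc` module reports. The sketch below is only an illustration with a stand-in `cache` dict, not the ETL class from this thread. If the traced size falls back near its baseline after `clear()` while `top` still shows a large RSS, that is consistent with freed memory being retained by the allocator rather than with leaked Python objects.

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Stand-in for loading all existing objects into the cache.
cache = {i: ("row", i, str(i)) for i in range(100_000)}
loaded, _ = tracemalloc.get_traced_memory()

cache.clear()
cleared, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"baseline={baseline:,} loaded={loaded:,} cleared={cleared:,} peak={peak:,}")
```

If `cleared` stays close to `loaded` in the real program, something still references the objects, and `tracemalloc.take_snapshot()` can then show which lines allocated them.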