Re: Pickle caching objects?
On Sun, Dec 01, 2019 at 12:26:15PM +1100, Chris Angelico wrote:
> I can't answer your question authoritatively, but I can suggest a
> place to look. Python's memory allocator doesn't always return memory
> to the system when the objects are freed up, for various reasons
> including the way that memory pages get allocated from. But it
> internally knows which parts are in use and which parts aren't.
> You're seeing the RSS go down slightly at some points, which would be
> the times when entire pages can be released; but other than that,
> what you'll end up with is a sort of high-water-mark with lots of
> unused space inside it. So what you're seeing isn't actual objects
> being cached, but just memory ready to be populated with future
> objects.

Thank you and Richard for your responses, this makes perfect sense now.

Cheers,

--
José María (Chema) Mateos || https://rinzewind.org/
--
https://mail.python.org/mailman/listinfo/python-list
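For what it's worth, the behaviour Chris describes can be watched directly. The following is a minimal, Linux-only sketch (the names are illustrative; it reads `/proc/self/status` rather than using psutil, so it needs nothing outside the standard library, and the exact numbers will vary per system): allocating many small objects raises the RSS, and after `del` the RSS often stays well above its starting point because pymalloc keeps partially used arenas around for future objects.

```python
def rss_kib():
    """Current resident set size of this process, in KiB (Linux only)."""
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])


def demo():
    before = rss_kib()
    data = [[i] for i in range(500_000)]   # lots of small objects
    peak = rss_kib()
    del data                               # objects are freed here...
    after = rss_kib()                      # ...but RSS may not drop back
    return before, peak, after


if __name__ == "__main__":
    before, peak, after = demo()
    print(f"before={before} KiB, peak={peak} KiB, after={after} KiB")
```

The gap between `after` and `before` is the "high-water-mark with lots of unused space inside it" from the explanation above.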
Pickle caching objects?
Hi,

I just asked this question on the IRC channel but didn't manage to get a response, though some people replied with suggestions that expanded this question a bit.

I have a program that has to read some pickle files, perform some operations on them, and then return. The pickle objects I am reading all have the same structure, which consists of a single list with two elements: the first one is a long list, the second one is a numpy object. I found out that, after calling that function, the memory taken by the Python executable (monitored using htop -- the entire thing runs on Python 3.6 on Ubuntu 16.04, a pretty standard conda installation with a few packages installed directly using `conda install`) increases in proportion to the size of the pickle object being read. My intuition is that that memory should be freed upon exiting the function.

Does pickle keep a cache of objects in memory after they have been returned? I thought that could be the answer, but then someone suggested measuring the time it takes to load the objects. This is a script I wrote to test this; nothing(filepath) just loads the pickle file, doesn't do anything with the output, and returns how long the load operation took.
---
import glob
import pickle
import timeit
import os

import psutil


def nothing(filepath):
    start = timeit.default_timer()
    with open(filepath, 'rb') as f:
        _ = pickle.load(f)
    return timeit.default_timer() - start


if __name__ == "__main__":
    filelist = glob.glob('/tmp/test/*.pk')
    for i, filepath in enumerate(filelist):
        print("Size of file {}: {}".format(i, os.path.getsize(filepath)))
        print("First call:", nothing(filepath))
        print("Second call:", nothing(filepath))
        print("Memory usage:", psutil.Process(os.getpid()).memory_info().rss)
        print()
---

This is the output of the second time the script was run, to avoid any effects of potential IO caches:

---
Size of file 0: 11280531
First call: 0.1466723980847746
Second call: 0.10044755204580724
Memory usage: 49418240

Size of file 1: 8955825
First call: 0.07904054620303214
Second call: 0.07996074995025992
Memory usage: 49831936

Size of file 2: 43727266
First call: 0.37741047400049865
Second call: 0.38176894187927246
Memory usage: 49758208

Size of file 3: 31122090
First call: 0.271301960805431
Second call: 0.27462846506386995
Memory usage: 49991680

Size of file 4: 634456686
First call: 5.526095286011696
Second call: 5.558765463065356
Memory usage: 539324416

Size of file 5: 3349952658
First call: 29.50982437795028
Second call: 29.461691531119868
Memory usage: 3443597312

Size of file 6: 9384929
First call: 0.0826977719552815
Second call: 0.08362263604067266
Memory usage: 3443597312

Size of file 7: 422137
First call: 0.0057482069823890924
Second call: 0.005949910031631589
Memory usage: 3443597312

Size of file 8: 409458799
First call: 3.562588643981144
Second call: 3.6001368327997625
Memory usage: 3441451008

Size of file 9: 44843816
First call: 0.3913297887245
Second call: 0.398518088972196
Memory usage: 3441451008
---

Notice that memory usage increases noticeably, especially on files 4 and 5, the biggest ones, and doesn't come down as I would expect it to.
But the loading time is constant, so I think I can disregard any pickle caching mechanisms. So I guess now my question is: can anyone give me any pointers as to why this is happening? Any help is appreciated.

Thanks,

--
José María (Chema) Mateos || https://rinzewind.org/
--
https://mail.python.org/mailman/listinfo/python-list
Re: Random signal capture when using multiprocessing
On Sat, Jul 06, 2019 at 04:54:42PM +1000, Chris Angelico wrote:
> But if I comment out the signal.signal line, there seem to be no ill
> effects. I suspect that what you're seeing here is the multiprocessing
> module managing its own subprocesses, telling some of them to shut
> down. I added a print call to multiprocessing/popen_fork.py inside
> _send_signal (line 53 or thereabouts depending on Python version) and
> saw a *lot* of termination signals being sent; only a few actually
> triggered the exception message. My guess is that most of the time,
> the SIGTERM is smoothly handled as part of the Pool's __exit__ method,
> but sometimes the child process is blocked on something, and has to
> be told to shut down; and then normally, the signal gets caught and
> handled just fine, but since you're explicitly hooking it, you get to
> see it.

Ok, that makes all the sense in the world, thanks for digging into this.

--
José María (Chema) Mateos || https://rinzewind.org/
--
https://mail.python.org/mailman/listinfo/python-list
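The diagnosis above -- pool shutdown delivering SIGTERM to a worker that is still blocked -- can be reproduced in isolation. This is a POSIX-only sketch with hypothetical names: it installs a SIGTERM handler in a child process, blocks the child, then terminates it from the parent, much as the Pool's shutdown machinery may do; the handler reports the signal number back over a pipe.

```python
import multiprocessing
import os
import signal
import time


def worker(conn):
    # Install a SIGTERM handler in the child, then block; this mimics a
    # pool worker that happens to be busy when the parent shuts it down.
    def handler(signum, frame):
        conn.send(signum)
        conn.close()
        os._exit(0)

    signal.signal(signal.SIGTERM, handler)
    conn.send('ready')   # tell the parent the handler is in place
    time.sleep(30)       # block until the signal arrives


def demo():
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.recv()   # wait until the handler is installed
    p.terminate()        # delivers SIGTERM on POSIX
    signum = parent_conn.recv()
    p.join()
    return signum


if __name__ == "__main__":
    print(demo())        # signal.SIGTERM, i.e. 15 on Linux
```

The original poster's handler raised an exception at exactly this point, which is why the traceback only shows up when a worker happens to be blocked during shutdown.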
Random signal capture when using multiprocessing
Hi,

This is a minimal proof of concept for something that has been bugging me for a few days:

```
$ cat signal_multiprocessing_poc.py
import random
import multiprocessing
import signal
import time


def signal_handler(signum, frame):
    raise Exception(f"Unexpected signal {signum}!")


def process_message(args):
    time.sleep(random.random() / 100.)


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, signal_handler)
    n_round = 1
    while n_round < 10:
        job_list = [x for x in range(random.randint(100, 400))]
        print(f"Running round {n_round} with {len(job_list)} jobs")
        with multiprocessing.Pool(8) as p1:
            p1.map(process_message, job_list)
        n_round += 1
```

So basically I have some subprocesses that don't do anything, just sleep for a few milliseconds, and I capture SIGTERM signals, which I don't expect to see raised at all. This is the output:

```
$ python signal_multiprocessing_poc.py
Running round 1 with 244 jobs
Running round 2 with 151 jobs
Running round 3 with 173 jobs
Running round 4 with 124 jobs
Running round 5 with 249 jobs
Running round 6 with 359 jobs
Process ForkPoolWorker-48:
Traceback (most recent call last):
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/queues.py", line 352, in get
    res = self._reader.recv_bytes()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/synchronize.py", line 98, in __exit__
    return self._semlock.__exit__(*args)
  File "signal_multiprocessing_poc.py", line 7, in signal_handler
    raise Exception(f"Unexpected signal {signum}!")
Exception: Unexpected signal 15!
Running round 7 with 185 jobs
Running round 8 with 246 jobs
Running round 9 with 217 jobs
Process ForkPoolWorker-68:
Traceback (most recent call last):
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/queues.py", line 352, in get
    res = self._reader.recv_bytes()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/synchronize.py", line 98, in __exit__
    return self._semlock.__exit__(*args)
  File "signal_multiprocessing_poc.py", line 7, in signal_handler
    raise Exception(f"Unexpected signal {signum}!")
Exception: Unexpected signal 15!
```

Can anyone help me understand why the random SIGTERM gets captured? I guess some of the child processes receive the signal for termination and that's why I see it, but I don't understand the randomness of it.

Of course, as a coworker just pointed out, this works (telling the child processes to ignore the signals):

```
import random
import multiprocessing
import signal
import time


def signal_handler(signum, frame):
    raise Exception(f"Unexpected signal {signum}!")


def init_worker():
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def process_message(args):
    time.sleep(random.random() / 100.)


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, signal_handler)
    signal.signal(signal.SIGINT, signal_handler)
    n_round = 1
    while n_round < 20:
        job_list = [x for x in range(random.randint(100, 400))]
        print(f"Running round {n_round} with {len(job_list)} jobs")
        with multiprocessing.Pool(8, init_worker) as p1:
            p1.map(process_message, job_list)
        n_round += 1
```

Thanks for your help,

--
José María (Chema) Mateos || https://rinzewind.org
--
https://mail.python.org/mailman/listinfo/python-list
Re: send PIL.Image to django server side and get it back
On Mon, Jul 16, 2018 at 06:40:45AM -0700, Christos Georgiou - ΤΖΩΤΖΙΟΥ wrote:
> You need first to serialize the object to bytes that can go over the
> wire. There is no predefined way to do that, so you can:
>
> >>> import io
> >>> file_like_object = io.BytesIO()
> >>> PILImage.save(file_like_object, format='png')
>
> and then in your POST request send file_like_object.getvalue() as the
> image data. You will most probably need to add a Content-Type:
> image/png as a header.

If you definitely need to send the data as a string, because you want to use a JSON object or similar, I've done this in the past using base64 encoding.

https://docs.python.org/2/library/base64.html

Cheers,

--
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
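A sketch of that base64-over-JSON approach (the function names are made up for illustration, and the PNG header bytes below stand in for real image data produced by PIL's save(); any bytes round-trip the same way):

```python
import base64
import json


def encode_for_json(raw: bytes) -> str:
    # bytes -> base64 -> ASCII str that can live inside a JSON document
    return base64.b64encode(raw).decode('ascii')


def decode_from_json(text: str) -> bytes:
    return base64.b64decode(text)


if __name__ == "__main__":
    fake_png = b'\x89PNG\r\n\x1a\n' + b'\x00' * 16  # placeholder image bytes
    payload = json.dumps({'image': encode_for_json(fake_png)})
    restored = decode_from_json(json.loads(payload)['image'])
    assert restored == fake_png
```

The cost is a ~33% size overhead versus sending the raw bytes with an image/png Content-Type, which is why base64 is only worth it when the transport has to be text.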
Re: Where's the junk coming from?
On Mon, Jun 25, 2018, at 13:37, Mark Lawrence wrote:
> More of the flaming things, this time name@1261/38.remove-ij1-this.
> Any ideas as I don't understand this stuff?

I've contacted the list admin about this. It doesn't seem like it's going to go away on its own. I just received another batch, for what it's worth.

Cheers,

--
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
Re: Where's the junk coming from?
On Sun, Jun 24, 2018 at 09:39:33PM +0100, Mark Lawrence wrote:
> Hi folks,
>
> In the last hour or so I've seen via thunderbird and gmane around 15
> emails from various people where the from field is
> name@1261/38.remove-r7u-this. The part after the @ symbol never
> changes. I've seen the contents previously, apart from one from the
> RUE. Users' complete email addresses are given right at the top.
> What gives?

Same for me. Could it be a news to mailing list gateway? I've found this header in some of the offending messages:

X-Gateway: castlerockbbs.com [Synchronet 3.17a-Linux NewsLink 1.108]

Cheers,

--
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
Re: a Python bug report
On Wed, May 30, 2018 at 01:07:38AM +, Ruifeng Guo wrote:
> Hello,
> We encountered a bug in Python recently, we checked the behavior for
> Python version 2.7.12, and 3.1.1, both version show the same behavior.
> Please see below the unexpected behavior in "red text".

Have you tried the round() function, however?

In [1]: round(1000 * 1.017)
Out[1]: 1017.0

This is a floating point precision "issue". int() only gets rid of the decimals:

In [2]: int(3.9)
Out[2]: 3

Because:

In [3]: 1000 * 1.017
Out[3]: 1016.9999999999999

So there you have it. Some more reading:

https://stackoverflow.com/questions/43660910/python-difference-between-round-and-int

Cheers,

--
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
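To spell the same point out in one runnable snippet (Python 3 shown here; on Python 2, round() returns a float, as in the session above): 1.017 has no exact binary floating-point representation, so 1000 * 1.017 lands just below 1017, and int() and round() then disagree.

```python
from decimal import Decimal

product = 1000 * 1.017
print(product < 1017)   # True: the product is just under 1017
print(int(product))     # 1016: int() truncates the decimals
print(round(product))   # 1017: round() goes to the nearest integer

# When exact decimal arithmetic matters, the decimal module sidesteps
# binary floating point altogether:
print(Decimal('1000') * Decimal('1.017'))   # 1017.000
```

This is why the reported "bug" is really the documented behaviour of int() on an inexact float.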
Re: Usenet Gateway
On Thu, May 24, 2018, at 09:10, Chris Green wrote:
> > Yes I can mark an entire thread as "read" in IMAP.
>
> A *thread* yes, but not a whole list. I.e. if you read this using
> mail/IMAP you can mark a thread read but you can't mark *all* Python
> list messages read in one go can you? With tin/Usenet I look at the
> list of new subjects in the Python group, I may investigate a couple
> of threads, then I just hit 'C' and all of the Python group is marked
> as read.

Yes, you can, at least with mutt. I have a handy alias (ESC + m) that accomplishes precisely that.

Cheers,

--
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
Re: Spam levels.
On Mon, May 21, 2018 at 10:00:41AM +0200, m wrote:
> I also almost stopped reading c.l.python, because of enormous spam
> levels. Do I have any option to read it without spam, other than
> launch my own filtering NNTP server and do whack the mole game for
> myself?
>
> Maybe join forces and establish such server for public use?

If you're willing to let NNTP access go, the mailing list works perfectly fine and is virtually spam-free.

Cheers,

--
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
Re: syntax oddities
On Fri, May 18, 2018 at 02:55:52PM +0000, Grant Edwards wrote:
> You work someplace pretty unique. Everyplace I've worked has done the
> whole top-posting and include the whole damn thread in reverse order
> thing. It just doesn't work. The attached reverse-chronological
> history doesn't seem to do _any_ good at all. AFAICT, nobody ever
> reads it. Occasionally somebody will refer opaquely to something with
> the phrase "see below" -- but there's never any indication to _what_
> among the fifteen messages and thirty attachements they are referring.

In my experience, this "e-mail-that-contains-the-entire-conversation" is useful if and only if you happen to receive a forwarded copy, so that you learn something you were not previously aware of. Otherwise, replies just accumulate past conversations because people are too lazy to trim them.

I wouldn't dare inline-reply in my current Outlook corporate environment. I just top-post, don't trim, go with the flow.

Cheers,

--
José María Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
Re: syntax oddities
On Thu, May 17, 2018 at 07:56:41AM -0700, Rich Shepard wrote:
> Allow me to add an additional reason for trimming and responding
> beneath each quoted section: it puts the response in the proper
> context.

And another one I learned recently in a similar conversation on another mailing list (that of the e-mail client I'm using right now): it is very useful for searches. Every e-mail contains just the right amount of text necessary to be properly read, as opposed to a more or less complete copy of the current thread.

Cheers,

--
José María Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
Re: Pandas, create new column if previous column(s) are not in [None, '', np.nan]
On Wed, Apr 11, 2018, at 14:48, zljubi...@gmail.com wrote:
> I have a dataframe:
> [...]

This seems to work:

df1 = pd.DataFrame({'A': ['a', 'b', '', None, np.nan],
                    'B': [None, np.nan, 'a', 'b', '']})

df1['C'] = df1[['A', 'B']].apply(
    lambda x: x[0] if x[1] in [None, '', np.nan] else x[1], axis=1)

Two notes:

- Do apply() on axis=1, so you process every row.
- Your lambda function wasn't entirely correct, if I understood what you wanted to do.

Cheers,

--
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
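A possible variant of the same row-wise fallback, sketched here as an alternative rather than the asker's requested method: instead of testing membership in [None, '', np.nan] by hand, map '' to NaN first so pandas treats it as missing, then let combine_first take B where present and fall back to A.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b', '', None, np.nan],
                    'B': [None, np.nan, 'a', 'b', '']})

# Map the empty string to NaN so pandas counts it as missing, then take
# B where it is present and fall back to A, mirroring the apply() version.
df1['C'] = df1['B'].replace('', np.nan).combine_first(
    df1['A'].replace('', np.nan))

print(df1['C'].tolist())   # ['a', 'b', 'a', 'b', nan]
```

This avoids a Python-level apply() over every row, which matters on larger frames.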
Re: psutil
On Tue, Feb 27, 2018 at 07:29:50PM -0500, Larry Martell wrote:
> Trying to install psutil (with pip install psutil) on Red Hat EL 7.
> It's failing with:
>
> Python.h: No such file or directory

Two questions come to my mind:

- Does it work if you try to install some other package?
- Is `pip` by any chance trying to install a Python 3 package, but you only have the libraries for Python 2 installed?

Cheers,

--
José María Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
--
https://mail.python.org/mailman/listinfo/python-list
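For the record, a missing Python.h when pip builds a C extension usually means the interpreter's development headers aren't installed. A hedged guess at the fix for Red Hat systems (package names vary by distro and by which Python pip is running under):

```shell
# RHEL/CentOS 7: C headers for the system Python 2
sudo yum install python-devel
# If pip belongs to a Python 3 build, the package is typically:
# sudo yum install python3-devel

# To see where the interpreter expects its headers (Python.h) to live:
python -c 'import sysconfig; print(sysconfig.get_paths()["include"])'
```

Checking that directory after installing the package confirms whether the headers pip needs are actually present.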