Re: Pickle caching objects?

2019-12-01 Thread José María Mateos

On Sun, Dec 01, 2019 at 12:26:15PM +1100, Chris Angelico wrote:

> I can't answer your question authoritatively, but I can suggest a
> place to look. Python's memory allocator doesn't always return memory
> to the system when the objects are freed up, for various reasons
> including the way that memory pages get allocated from. But it
> internally knows which parts are in use and which parts aren't. You're
> seeing the RSS go down slightly at some points, which would be the
> times when entire pages can be released; but other than that, what
> you'll end up with is a sort of high-water-mark with lots of unused
> space inside it.
>
> So what you're seeing isn't actual objects being cached, but just
> memory ready to be populated with future objects.
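A minimal sketch of the point above, assuming CPython 3.6+: tracemalloc counts the memory Python's allocator hands out to objects, so once a loaded pickle is discarded its "current" figure drops back down, even when the RSS reported by htop or psutil stays near the high-water mark. The helper names and the test data below are hypothetical, made up for illustration.

```python
import os
import pickle
import tempfile
import tracemalloc


def load_and_discard(filepath):
    """Load a pickle and throw the result away, like nothing() in the post."""
    with open(filepath, "rb") as f:
        _ = pickle.load(f)


def traced_memory_after_load(filepath):
    """Return tracemalloc's (current, peak) byte counts around one load."""
    tracemalloc.start()
    try:
        load_and_discard(filepath)
        # 'current' is measured after the loaded object has been freed.
        return tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    # Hypothetical test data: one long list of small objects.
    with tempfile.NamedTemporaryFile(suffix=".pk", delete=False) as tmp:
        pickle.dump([str(i) for i in range(1_000_000)], tmp)
        path = tmp.name
    try:
        current, peak = traced_memory_after_load(path)
        print(f"peak during load: {peak} bytes; still in use: {current} bytes")
    finally:
        os.remove(path)
```

Comparing those numbers with psutil.Process().memory_info().rss before and after the call should show the gap described above, although how much of the RSS gets returned depends on the allocator and the platform.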


Thanks to you and Richard for your responses; this makes perfect sense now.

Cheers,

--
José María (Chema) Mateos || https://rinzewind.org/
--
https://mail.python.org/mailman/listinfo/python-list


Pickle caching objects?

2019-11-30 Thread José María Mateos

Hi,

I just asked this question on the IRC channel but didn't manage to get an 
answer, though some people replied with suggestions that expanded the 
question a bit.


I have a program that has to read some pickle files, perform some 
operations on them, and then return. The pickle objects I am reading all 
have the same structure, which consists of a single list with two 
elements: the first one is a long list, the second one is a numpy 
object.


I found out that, after calling that function, the memory taken by the 
Python executable (monitored using htop; the entire thing runs on 
Python 3.6 on Ubuntu 16.04, a pretty standard conda installation with a 
few packages installed directly using `conda install`) increases in 
proportion to the size of the pickle object being read. My intuition is 
that this memory should be freed once the function returns.


Does pickle keep a cache of objects in memory after they have been 
returned? I thought that could be the answer, but then someone suggested 
measuring the time it takes to load the objects. This is a script I 
wrote to test this; nothing(filepath) just loads the pickle file, 
discards the result and returns how long the load operation took.


---
import glob
import pickle
import timeit
import os
import psutil

def nothing(filepath):
    start = timeit.default_timer()
    with open(filepath, 'rb') as f:
        _ = pickle.load(f)
    return timeit.default_timer() - start

if __name__ == "__main__":

    filelist = glob.glob('/tmp/test/*.pk')

    for i, filepath in enumerate(filelist):
        print("Size of file {}: {}".format(i, os.path.getsize(filepath)))
        print("First call:", nothing(filepath))
        print("Second call:", nothing(filepath))
        print("Memory usage:", psutil.Process(os.getpid()).memory_info().rss)
        print()
---

This is the output of the second time the script was run, to avoid any 
effects of potential IO caches:


---
Size of file 0: 11280531
First call: 0.1466723980847746
Second call: 0.10044755204580724
Memory usage: 49418240

Size of file 1: 8955825
First call: 0.07904054620303214
Second call: 0.07996074995025992
Memory usage: 49831936

Size of file 2: 43727266
First call: 0.37741047400049865
Second call: 0.38176894187927246
Memory usage: 49758208

Size of file 3: 31122090
First call: 0.271301960805431
Second call: 0.27462846506386995
Memory usage: 49991680

Size of file 4: 634456686
First call: 5.526095286011696
Second call: 5.558765463065356
Memory usage: 539324416

Size of file 5: 3349952658
First call: 29.50982437795028
Second call: 29.461691531119868
Memory usage: 3443597312

Size of file 6: 9384929
First call: 0.0826977719552815
Second call: 0.08362263604067266
Memory usage: 3443597312

Size of file 7: 422137
First call: 0.0057482069823890924
Second call: 0.005949910031631589
Memory usage: 3443597312

Size of file 8: 409458799
First call: 3.562588643981144
Second call: 3.6001368327997625
Memory usage: 3441451008

Size of file 9: 44843816
First call: 0.3913297887245
Second call: 0.398518088972196
Memory usage: 3441451008
---

Notice that memory usage increases noticeably, especially on files 4 and 
5, the biggest ones, and doesn't come down as I would expect it to. But 
the loading time is essentially the same on the first and second calls, 
so I think I can rule out any pickle caching mechanism.


So I guess my question now is: can anyone give me any pointers as to why 
this is happening? Any help is appreciated.


Thanks,

--
José María (Chema) Mateos || https://rinzewind.org/


Re: Random signal capture when using multiprocessing

2019-07-06 Thread José María Mateos
On Sat, Jul 06, 2019 at 04:54:42PM +1000, Chris Angelico wrote:

> But if I comment out the signal.signal line, there seem to be no ill
> effects. I suspect that what you're seeing here is the multiprocessing
> module managing its own subprocesses, telling some of them to shut
> down. I added a print call to multiprocessing/popen_fork.py inside
> _send_signal (line 53 or thereabouts depending on Python version) and
> saw a *lot* of termination signals being sent; only a few actually
> triggered the exception message. My guess is that most of the time,
> the SIGTERM is smoothly handled as part of the Pool's __exit__ method,
> but sometimes the child process is blocked on something, and has to
> be told to shut down; and then normally, the signal gets caught and
> handled just fine, but since you're explicitly hooking it, you get to
> see it.
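Why the workers run the parent's handler at all: with the fork start method (the traceback shows ForkPoolWorker processes), each worker starts as a copy of the parent and keeps its Python-level signal handlers. A small sketch to check that, assuming a POSIX system; the function names are hypothetical. Expect the occasional "Unexpected signal 15!" traceback when the pool shuts down, which is the behaviour discussed in this thread.

```python
import multiprocessing
import signal


def signal_handler(signum, frame):
    # Same kind of handler as in the original post.
    raise Exception(f"Unexpected signal {signum}!")


def inherited_sigterm_handler(_):
    # Runs inside a pool worker: is its SIGTERM handler the parent's function?
    return signal.getsignal(signal.SIGTERM) is signal_handler


def workers_inherit_handler(n_workers=2):
    """Install the handler in the parent, then ask forked workers about it."""
    previous = signal.signal(signal.SIGTERM, signal_handler)
    try:
        ctx = multiprocessing.get_context("fork")  # POSIX-only start method
        with ctx.Pool(n_workers) as pool:
            return all(pool.map(inherited_sigterm_handler, range(n_workers * 2)))
    finally:
        # Restore whatever handler was installed before (normally SIG_DFL).
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    print("workers inherited the parent's handler:", workers_inherit_handler())
```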

Ok, that makes all the sense in the world, thanks for digging into this.

-- 
José María (Chema) Mateos || https://rinzewind.org/


Random signal capture when using multiprocessing

2019-07-05 Thread José María Mateos
Hi,

This is a minimal proof of concept for something that has been bugging me for a 
few days:

```
$ cat signal_multiprocessing_poc.py 

import random
import multiprocessing
import signal
import time

def signal_handler(signum, frame):
    raise Exception(f"Unexpected signal {signum}!")

def process_message(args):
    time.sleep(random.random() / 100.)

if __name__ == "__main__":
    signal.signal(signal.SIGTERM, signal_handler)
    n_round = 1
    while n_round < 10:
        job_list = [x for x in range(random.randint(100, 400))]
        print(f"Running round {n_round} with {len(job_list)} jobs")
        with multiprocessing.Pool(8) as p1:
            p1.map(process_message, job_list)
        n_round += 1
```

So basically I have some subprocesses that don't do anything, just sleep for a 
few milliseconds, and I capture SIGTERM signals. I don't expect 

This is the output:

```
$ python signal_multiprocessing_poc.py 
Running round 1 with 244 jobs
Running round 2 with 151 jobs
Running round 3 with 173 jobs
Running round 4 with 124 jobs
Running round 5 with 249 jobs
Running round 6 with 359 jobs
Process ForkPoolWorker-48:
Traceback (most recent call last):
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/queues.py", line 352, in get
    res = self._reader.recv_bytes()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/synchronize.py", line 98, in __exit__
    return self._semlock.__exit__(*args)
  File "signal_multiprocessing_poc.py", line 7, in signal_handler
    raise Exception(f"Unexpected signal {signum}!")
Exception: Unexpected signal 15!
Running round 7 with 185 jobs
Running round 8 with 246 jobs
Running round 9 with 217 jobs
Process ForkPoolWorker-68:
Traceback (most recent call last):
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/queues.py", line 352, in get
    res = self._reader.recv_bytes()
  File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/synchronize.py", line 98, in __exit__
    return self._semlock.__exit__(*args)
  File "signal_multiprocessing_poc.py", line 7, in signal_handler
    raise Exception(f"Unexpected signal {signum}!")
Exception: Unexpected signal 15!
```

Can anyone help me understand why the random SIGTERM gets captured? I guess 
some of the child processes receive the termination signal and that's why 
I see it, but I don't understand why it happens only some of the time.

Of course, as a coworker just pointed out, this works (telling the child 
processes to ignore SIGTERM and SIGINT):

```
import random
import multiprocessing
import signal
import time

def signal_handler(signum, frame):
    raise Exception(f"Unexpected signal {signum}!")

def init_worker():
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def process_message(args):
    time.sleep(random.random() / 100.)

if __name__ == "__main__":

    signal.signal(signal.SIGTERM, signal_handler)
    signal.signal(signal.SIGINT, signal_handler)
    n_round = 1

    while n_round < 20:
        job_list = [x for x in range(random.randint(100, 400))]
        print(f"Running round {n_round} with {len(job_list)} jobs")
        with multiprocessing.Pool(8, initializer=init_worker) as p1:
            p1.map(process_message, job_list)
        n_round += 1
```
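A possible variant, not taken from the thread: instead of ignoring SIGTERM in the workers, the initializer can restore the default action (signal.SIG_DFL). The SIGTERM that Pool.terminate() sends when the `with` block exits then still ends the worker, just without running the parent's handler. This is a sketch assuming the fork start method, as in the traceback above; run_rounds() is a hypothetical helper.

```python
import multiprocessing
import random
import signal
import time


def signal_handler(signum, frame):
    # Parent-side handler, as in the original post.
    raise Exception(f"Unexpected signal {signum}!")


def init_worker():
    # Runs once in each worker: drop the inherited handler and go back to
    # the default SIGTERM action (terminate the process quietly).
    signal.signal(signal.SIGTERM, signal.SIG_DFL)


def process_message(args):
    time.sleep(random.random() / 100.)
    return args


def run_rounds(n_rounds, n_workers=8):
    """Hypothetical helper: run n_rounds of Pool.map, return jobs completed."""
    # Fork start method (POSIX only), matching ForkPoolWorker in the traceback.
    ctx = multiprocessing.get_context("fork")
    completed = 0
    for n_round in range(1, n_rounds + 1):
        job_list = list(range(random.randint(100, 400)))
        print(f"Running round {n_round} with {len(job_list)} jobs")
        with ctx.Pool(n_workers, initializer=init_worker) as pool:
            completed += len(pool.map(process_message, job_list))
    return completed


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, signal_handler)
    run_rounds(9)
```

The parent keeps its own handler, so an external SIGTERM sent to the main process would still raise there.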

Thanks for your help,

-- 
José María (Chema) Mateos || https://rinzewind.org


Re: send PIL.Image to django server side and get it back

2018-07-19 Thread José María Mateos
On Mon, Jul 16, 2018 at 06:40:45AM -0700, Christos Georgiou - ΤΖΩΤΖΙΟΥ wrote:
> You need first to serialize the object to bytes that can go over the 
> wire. There is no predefined way to do that, so you can:
> 
> >>> import io
> >>> file_like_object = io.BytesIO()
> >>> PILImage.save(file_like_object, format='png')
> 
> and then in your POST request send file_like_object.getvalue() as the 
> image data.  You will most probably need to add a Content-Type: 
> image/png as a header.

If you definitely need to send the data as a string, because you want to 
use a JSON object or similar, I've done this in the past using base64 
encoding.

https://docs.python.org/2/library/base64.html
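A sketch of the base64 route using only the standard library, assuming Python 3, where b64encode() returns bytes that must be decoded to str before going into JSON. The field name image_png_b64 and the stand-in bytes are hypothetical; in the quoted snippet the real bytes would come from file_like_object.getvalue().

```python
import base64
import json


def encode_image_for_json(image_bytes):
    """Client side: wrap raw image bytes in a JSON-safe string."""
    # "image_png_b64" is a hypothetical field name; use whatever the server expects.
    return json.dumps({"image_png_b64": base64.b64encode(image_bytes).decode("ascii")})


def decode_image_from_json(payload):
    """Server side: recover the original bytes from the JSON payload."""
    return base64.b64decode(json.loads(payload)["image_png_b64"])


if __name__ == "__main__":
    # Stand-in bytes (PNG signature plus filler), not a real image.
    fake_png = b"\x89PNG\r\n\x1a\n" + bytes(range(256))
    payload = encode_image_for_json(fake_png)
    assert decode_image_from_json(payload) == fake_png
    print(len(fake_png), "bytes ->", len(payload), "characters of JSON")
```

On the receiving end, PIL.Image.open(io.BytesIO(decoded_bytes)) turns the bytes back into an image object. Base64 inflates the payload by about a third, which is the price of fitting binary data into a text format.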

Cheers,

-- 
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: Where's the junk coming from?

2018-06-26 Thread José María Mateos
On Mon, Jun 25, 2018, at 13:37, Mark Lawrence wrote:
> More of the flaming things, this time name@1261/38.remove-ij1-this.  Any 
> ideas as I don't understand this stuff?

I've contacted the list admin about this. It doesn't seem like it's going to go 
away on its own. I just received another batch, for what it's worth.

Cheers,

-- 
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: Where's the junk coming from?

2018-06-25 Thread José María Mateos
On Sun, Jun 24, 2018 at 09:39:33PM +0100, Mark Lawrence wrote:
> Hi folks,
> 
> In the last hour or so I've seen via thunderbird and gmane around 15
> emails from various people where the from field is
> name@1261/38.remove-r7u-this.  The part after the @ symbol never
> changes.  I've seen the contents previously, apart from one from the
> RUE.  Users' complete email addresses are given right at the top.
> What gives?

Same for me. Could it be a news-to-mailing-list gateway? I've found this 
header in some of the offending messages:

X-Gateway: castlerockbbs.com [Synchronet 3.17a-Linux NewsLink 1.108]

Cheers,

-- 
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: a Python bug report

2018-05-29 Thread José María Mateos
On Wed, May 30, 2018 at 01:07:38AM +, Ruifeng Guo wrote:
> Hello,
> We encountered a bug in Python recently, we checked the behavior for Python 
> version 2.7.12, and 3.1.1, both version show the same behavior. Please see 
> below the unexpected behavior in "red text".

Have you tried the round() function instead?

In [1]: round(1000 * 1.017)
Out[1]: 1017.0

This is a floating point precision "issue". int() simply truncates, 
discarding the fractional part (it rounds toward zero).

In [2]: int(3.9)
Out[2]: 3

Because:

In [3]: 1000 * 1.017
Out[3]: 1016.9999999999999

So there you have it.
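The same point in a few lines, assuming Python 3 (where round() returns an int, not the 1017.0 shown above). The decimal module is swapped in here only to show the value actually stored for 1.017 and how exact decimal arithmetic behaves.

```python
from decimal import Decimal

product = 1000 * 1.017
print(repr(product))            # 1016.9999999999999
print(int(product))             # 1016: int() truncates toward zero
print(round(product))           # 1017: round() goes to the nearest integer
print(Decimal(1.017))           # exact binary value stored for 1.017, slightly below it
print(Decimal("1.017") * 1000)  # 1017.000: decimal arithmetic avoids the binary error
```

If the input really is a decimal quantity (money, measurements typed by a user), keeping it as Decimal avoids the problem; otherwise round() before converting to int is the usual fix.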

Some more reading: 
https://stackoverflow.com/questions/43660910/python-difference-between-round-and-int

Cheers,

-- 
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: Usenet Gateway

2018-05-24 Thread José María Mateos
On Thu, May 24, 2018, at 09:10, Chris Green wrote:
> > Yes I can mark an entire thread as "read" in IMAP.
> > 
> A *thread* yes, but not a whole list.  I.e. if you read this using
> mail/IMAP you can mark a thread read but you can't mark *all* Python
> list messages read in one go can you?   With tin/Usenet I look at the
> list of new subjects in the Python group, I may investigate a couple
> of threads, then I just hit 'C' and all of the Python group is marked
> as read.

Yes, you can, at least with mutt. I have a handy key binding (ESC + m) 
that does precisely that.

Cheers,

-- 
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: Spam levels.

2018-05-21 Thread José María Mateos
On Mon, May 21, 2018 at 10:00:41AM +0200, m wrote:
> I also almost stopped reading c.l.python, because of enormous spam
> levels. Do I have any option to read it without spam, other than launch
> my own filtering NNTP server and do whack the mole game for myself?
> 
> Maybe join forces and establish such server for public use?

If you're willing to let NNTP access go, the mailing list works 
perfectly fine and is virtually spam-free.

Cheers,

-- 
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: syntax oddities

2018-05-18 Thread José María Mateos
On Fri, May 18, 2018 at 02:55:52PM +, Grant Edwards wrote:
> You work someplace pretty unique.  Everyplace I've worked has done the
> whole top-posting and include the whole damn thread in reverse order
> thing.  It just doesn't work.  The attached reverse-chronological
> history doesn't seem to do _any_ good at all.  AFAICT, nobody ever
> reads it.  Occasionally somebody will refer opaquely to something with
> the phrase "see below" -- but there's never any indication to _what_
> among the fifteen messages and thirty attachments they are referring.

In my experience, this "e-mail-that-contains-the-entire-conversation" is 
useful only when you happen to receive a forwarded copy and learn 
something you were not previously aware of. Otherwise, replies just 
pile up past conversations because people are too lazy to trim them.

I wouldn't dare inline-replying in my current Outlook corporate 
environment. I just top-post, don't trim, go with the flow.

Cheers,

-- 
José María Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: syntax oddities

2018-05-18 Thread José María Mateos
On Thu, May 17, 2018 at 07:56:41AM -0700, Rich Shepard wrote:
> Allow me to add an additional reason for trimming and responding 
> beneath each quoted section: it puts the response in the proper 
> context.

And another one I learned recently in a similar conversation on another 
mailing list (that of the e-mail client I'm using right now): it is very 
useful for searches. Every e-mail contains just the text needed to read 
it properly, as opposed to a more or less complete copy of the current 
thread.

Cheers,

-- 
José María Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: Pandas, create new column if previous column(s) are not in [None, '', np.nan]

2018-04-11 Thread José María Mateos
On Wed, Apr 11, 2018, at 14:48, zljubi...@gmail.com wrote:
> I have a dataframe:
> [...]

This seems to work:

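import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b', '', None, np.nan],
                    'B': [None, np.nan, 'a', 'b', '']})
# Take B unless it is None, NaN or '', otherwise fall back to A
df1['C'] = df1[['A', 'B']].apply(
    lambda x: x['A'] if pd.isna(x['B']) or x['B'] == '' else x['B'], axis=1)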

Two notes:

- Use apply() with axis=1, so the function is applied to every row.
- Your lambda function wasn't entirely correct, if I understood what you 
wanted to do.
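A vectorized alternative, swapped in for illustration rather than taken from the thread: build a boolean mask of the rows where B is None, NaN or '', then let Series.mask() take A's value on those rows. This avoids calling a Python lambda once per row, which matters on large frames.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b', '', None, np.nan],
                    'B': [None, np.nan, 'a', 'b', '']})

# True where B counts as "empty": None, NaN or the empty string.
b_empty = df1['B'].isna() | df1['B'].eq('')

# Keep B, but replace the "empty" entries with the value from A.
df1['C'] = df1['B'].mask(b_empty, df1['A'])
print(df1)
```

On this sample, rows 0 to 3 end up with 'a', 'b', 'a', 'b' in C, and row 4 gets A's NaN because both columns are empty there.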

Cheers,

-- 
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: psutil

2018-02-27 Thread José María Mateos
On Tue, Feb 27, 2018 at 07:29:50PM -0500, Larry Martell wrote:
> Trying to install psutil (with pip install psutil) on Red Hat EL 7.
> It's failing with:
> 
> Python.h: No such file or directory

Two questions come to my mind:

- Does it work if you try to install some other package?
- Is `pip` by any chance trying to install a Python 3 package, but you 
  only have the development headers for Python 2 installed?
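One way to check the second point, sketched with the standard library: sysconfig reports where the running interpreter expects its C headers, including Python.h. Run it with the same interpreter that pip belongs to (for example the one behind `python -m pip`). On RHEL the headers normally come from a separate python-devel or python3-devel package, but treat the exact package name for your release as an assumption to verify.

```python
import os
import sys
import sysconfig


def python_headers_present():
    """Return (include_dir, found) for the interpreter running this script."""
    include_dir = sysconfig.get_paths()["include"]
    # Python.h is the header the failing psutil build could not find.
    found = os.path.exists(os.path.join(include_dir, "Python.h"))
    return include_dir, found


if __name__ == "__main__":
    include_dir, found = python_headers_present()
    status = "Python.h found" if found else "Python.h MISSING"
    print(sys.executable, sys.version.split()[0], include_dir, status)
```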

Cheers,

-- 
José María Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en