Should PyImport_ImportModule be threadsafe when importing from zipfiles?

2020-02-20 Thread Geoff Bache
Hi all,

I have some embedded Python code which looks like this in C++

_gstate = PyGILState_Ensure();
PyImport_ImportModule("a");
...
PyGILState_Release(_gstate);

and is called from different threads which are created in C++.

My module a.py then imports another module b in Python, which defines a lot
of functions.

When several threads execute this simultaneously I often get a stacktrace
saying some function near the end of module b is not defined, presumably
because the module has been imported part-initialised.
This only seems to happen when my Python modules are packaged in a zip
file, not when they are ordinary files on disk.

I have observed this in both Python 3.7 and Python 3.8. Does anyone have
any insights or suggestions for how to debug this? It seems likely to be
hard to produce a reproducible test case.

Regards,
Geoff Bache
-- 
https://mail.python.org/mailman/listinfo/python-list


What's the best forum to get help with Pandas?

2020-02-20 Thread Luca



Subject has it all. Thanks


Re: Paper Print Help

2020-02-20 Thread Rhodri James

On 20/02/2020 15:08, Duram wrote:

On 19/02/2020 12:17, Rhodri James wrote:

On 19/02/2020 14:22, Duram via Python-list wrote:
I have a drawing in a .gif file of (a,b) pixels and want to print it on 
paper at position (x,y); what would be the code?


What have you tried?


Nothing; I did not find a module that prints to paper


Please don't reply to me directly; if it's a question worth asking in 
public then it's worth answering in public too!


OK, let's backtrack a bit.  First off, what operating system are you using?

Second, do you have this GIF file in any sort of program at the moment, 
or do you want advice on how to write a program to handle the image?  I 
suspect your question is a bit too specific at the moment, and you have 
some mistaken assumptions about how images and (most especially) 
printing work.


--
Rhodri James *-* Kynesim Ltd


Re: Should PyImport_ImportModule be threadsafe when importing from zipfiles?

2020-02-20 Thread Chris Angelico
On Fri, Feb 21, 2020 at 2:37 AM Geoff Bache  wrote:
> When several threads execute this simultaneously I often get a stacktrace
> saying some function near the end of module b is not defined, presumably
> because the module has been imported part-initialised.
> This only seems to happen when my Python modules are packaged in a zip
> file, not when they are ordinary files on disk.
>
> I have observed this in both Python 3.7 and Python 3.8. Does anyone have
> any insights or suggestions for how to debug this? It seems likely to be
> hard to produce a reproducible test case.

One easy way to probe the bug would be to pre-import the module before
starting any secondary threads. If you ever get the problem under that
pattern, then it's not a concurrency problem (IOW have fun figuring
out what *is* the problem).

Another thing to try: Slap a print call at the top and bottom of the
module. See if you get multiple of them.
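That probe might look something like this (a throwaway sketch with a made-up module body):

```python
# Write a module whose first and last lines announce themselves, then
# import it.  If "b: top" ever prints more than once per process, the
# module body is being executed more than once.
import pathlib
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
pathlib.Path(tmpdir, "b.py").write_text(
    'print("b: top")\n'
    'def near_the_end():\n'
    '    return 42\n'
    'print("b: bottom")\n'
)
sys.path.insert(0, tmpdir)

import b
print(b.near_the_end())
```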

ChrisA


Re: Pandas rookie

2020-02-20 Thread musbur
On Wed, 19 Feb 2020 17:15:59 -0500
FilippoM  wrote:

> How can I use Pandas' dataframe magic to calculate, for each of the 
> possible 109 values, how many have VIDEO_OK and how many have 
> VIDEO_FAILURE, respectively?

crosstab()
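For example (column names invented here; the original post's schema isn't shown):

```python
# pd.crosstab() counts co-occurrences of two columns in one call.
import pandas as pd

df = pd.DataFrame({
    "channel": [1, 1, 2, 2, 2, 3],
    "status":  ["VIDEO_OK", "VIDEO_FAILURE", "VIDEO_OK",
                "VIDEO_OK", "VIDEO_FAILURE", "VIDEO_OK"],
})

# One row per channel value, one column per status, cells are counts.
counts = pd.crosstab(df["channel"], df["status"])
print(counts)
```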


Re: What's the best forum to get help with Pandas?

2020-02-20 Thread Skip Montanaro
I believe the Pandas people tend to refer people to Stack Overflow. I
find that suboptimal as many questions go completely unanswered or get
gruff responses. Aside from that, I suspect this list is as good a
place as any to request help.

Skip


Re: Should PyImport_ImportModule be threadsafe when importing from zipfiles?

2020-02-20 Thread Chris Angelico
On Fri, Feb 21, 2020 at 3:06 AM Geoff Bache  wrote:
>
> Hi Chris,
>
> Yes, I've tried both of these things already. I can confirm there are 
> multiple calls, and that pre-importing the module fixes it. But pre-importing 
> it is not a solution in practice.
>

Cool, good to know.

Crazy idea: What would happen if you stick something temporarily into
sys.modules["b"], then when you're done importing, set the module back
into there? Might not help, but would be interesting to try, and might
show a bit more of what's going on.
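Whichever way that experiment goes, a blunt stopgap (a sketch, not something already in this thread) is to serialise the imports behind one process-wide lock:

```python
import importlib
import threading

_import_lock = threading.Lock()

def import_serialised(name):
    # One lock around the whole import: no thread can observe a module
    # that another thread is still in the middle of initialising.
    with _import_lock:
        return importlib.import_module(name)

math = import_serialised("math")
print(math.sqrt(9))
```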

ChrisA


Re: Should PyImport_ImportModule be threadsafe when importing from zipfiles?

2020-02-20 Thread Geoff Bache
Hi Chris,

Yes, I've tried both of these things already. I can confirm there are
multiple calls, and that pre-importing the module fixes it. But
pre-importing it is not a solution in practice.

Regards,
Geoff

On Thu, Feb 20, 2020 at 4:45 PM Chris Angelico  wrote:

> On Fri, Feb 21, 2020 at 2:37 AM Geoff Bache  wrote:
> > When several threads execute this simultaneously I often get a stacktrace
> > saying some function near the end of module b is not defined, presumably
> > because the module has been imported part-initialised.
> > This only seems to happen when my Python modules are packaged in a zip
> > file, not when they are ordinary files on disk.
> >
> > I have observed this in both Python 3.7 and Python 3.8. Does anyone have
> > any insights or suggestions for how to debug this? It seems likely to be
> > hard to produce a reproducible test case.
>
> One easy way to probe the bug would be to pre-import the module before
> starting any secondary threads. If you ever get the problem under that
> pattern, then it's not a concurrency problem (IOW have fun figuring
> out what *is* the problem).
>
> Another thing to try: Slap a print call at the top and bottom of the
> module. See if you get multiple of them.
>
> ChrisA


Idiom for partial failures

2020-02-20 Thread David Wihl

(first post)

I'm working on the Python client library [0]for the Google Ads API [1]. In some 
cases, we can start a request with a partial failure [2] flag = True. This 
means that the request may contain say 1000 operations. If any of the 
operations fail, the request will return with a success status without an 
exception. Then the developer has to iterate through the list of operation 
return statuses to determine which specific ones failed (example [3]).

I believe that it would be more idiomatic in Python (and other languages like 
Ruby) to throw an exception when one of these partial errors occur. That way 
there would be the same control flow if a major or minor error occurred. 

The team is asking me for other examples or justification of this being 
idiomatic of Python. Can you recommend any examples or related best practices?

[0] https://github.com/googleads/google-ads-python

[1] https://developers.google.com/google-ads/api/docs/start

[2] 
https://developers.google.com/google-ads/api/docs/best-practices/partial-failures

[3] 
https://github.com/googleads/google-ads-python/blob/master/examples/error_handling/handle_partial_failure.py

Thanks!
-David


Machine Learning program outputting wrong things

2020-02-20 Thread rgladkik
I am writing a program for an assignment (in a course I am auditing). I am 
pasting it below:





# Assignment 2 skeleton code
# This code shows you how to use the 'argparse' library to read in parameters

import argparse
import math
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random as rn
from dispkernel import dispKernel

# Command Line Arguments

parser = argparse.ArgumentParser(
    description='generate training and validation data for assignment 2')
parser.add_argument('trainingfile',
    help='name stub for training data and label output in csv format',
    default="train")
parser.add_argument('validationfile',
    help='name stub for validation data and label output in csv format',
    default="valid")
parser.add_argument('-numtrain', help='number of training samples',
    type=int, default=200)
parser.add_argument('-numvalid', help='number of validation samples',
    type=int, default=20)
parser.add_argument('seed', help='random seed', type=int, default=1)
parser.add_argument('learningrate', help='learning rate', type=float,
    default=0.1)
parser.add_argument('actfunction', help='activation functions',
    choices=['sigmoid', 'relu', 'linear'], default='linear')
parser.add_argument('numepoch', help='number of epochs', type=int, default=50)

args = parser.parse_args()

traindataname = args.trainingfile + "data.csv"
trainlabelname = args.trainingfile + "label.csv"

print("training data file name: ", traindataname)
print("training label file name: ", trainlabelname)

validdataname = args.validationfile + "data.csv"
validlabelname = args.validationfile + "label.csv"

print("validation data file name: ", validdataname)
print("validation label file name: ", validlabelname)

print("number of training samples = ", args.numtrain)
print("number of validation samples = ", args.numvalid)

print("learning rate = ", args.learningrate)
print("number of epoch = ", args.numepoch)

print("activation function is ", args.actfunction)

# read in training data
t_data = pd.read_csv(traindataname, sep=',', header=None).values
t_label = pd.read_csv(trainlabelname, sep=',', header=None).values
row_dim_t = t_data.shape[1]  # number of features (columns)
col_dim_t = t_data.shape[0]  # number of samples (rows)

# read in validation data
v_data = pd.read_csv(validdataname, sep=',', header=None).values
v_label = pd.read_csv(validlabelname, sep=',', header=None).values
row_dim_v = v_data.shape[1]  # number of features (columns)
col_dim_v = v_data.shape[0]  # number of samples (rows)

np.random.seed(args.seed)

# initialize weights
w = np.random.rand(row_dim_t, 1)

# initialize bias
b = np.random.uniform(0, 1)

n_epoch = []
loss_t = []
loss_v = []
accuracy_t = []
accuracy_v = []

Z_t = np.zeros([col_dim_t, 1])
Z_v = np.zeros([col_dim_v, 1])
guess_t_label = np.zeros([col_dim_t, 1])
guess_v_label = np.zeros([col_dim_v, 1])
accuracy_j_t = np.zeros([col_dim_t, 1])
accuracy_j_v = np.zeros([col_dim_v, 1])
Y_t = np.zeros([col_dim_t, 1])
Y_v = np.zeros([col_dim_v, 1])
loss_j_t = np.zeros([col_dim_t, 1])
loss_j_v = np.zeros([col_dim_v, 1])
grad_loss_w = np.zeros([col_dim_t, row_dim_t])
grad_loss_b = np.zeros([col_dim_t, 1])

class Linear:
    def __init__(self, Z, data, label):
        self.Z = Z
        self.data = data
        self.label = label

    # pass Z through an activation function
    def act_linear(self):
        return self.Z

    # gradient of the loss wrt weights
    def grad_loss_w(self, Y):
        return 2*(Y - self.label)*self.data

    # gradient of the loss wrt bias
    def grad_loss_b(self, Y):
        return 2*(Y - self.label)

for i in range(0, args.numepoch):
    n_epoch.append(i)

    # calculate predictor for training data
    Z_t[:] = np.dot(t_data[:, :], w[:]) + b

    # predict training label based on output of predictor
    guess_t_label[:] = (Z_t >= 0.5)
    # determine whether predicted label is correct or not
    accuracy_j_t[:] = 1 - np.absolute(guess_t_label[:] - t_label[:])
    # calculate accuracy for training data
    accuracy_t.append(np.sum(accuracy_j_t[:], axis=0)/col_dim_t)

    # calculate predictor for validation data
    Z_v[:] = np.dot(v_data[:, :], w[:]) + b

    # predict validation label based on output of predictor
    guess_v_label[:] = (Z_v >= 0.5)
    # determine whether predicted label is correct or not
    accuracy_j_v[:] = 1 - np.absolute(guess_v_label[:] - v_label[:])
    # calculate accuracy for validation data
    accuracy_v.append(np.sum(accuracy_j_v[:], axis=0)/col_dim_v)

    l_t = Linear(Z_t, t_data, t_label)
    l_v = Linear(Z_v, v_data, v_label)

    # pass Z through an activation function
    Y_t[:] = l_t.act_linear()
    Y_v[:] = l_v.act_linear()

    # calculate loss across all training data
    loss_j_t[:] = (Y_t[:] - t_label)**2
    loss_t.append(np.sum(loss_j_t[:], axis=0)/col_dim_t)

    # calculate loss across all validation data
    loss_j_v[:] = (Y_v[:] - v_label)**2
    loss_v.append(np.sum(loss_j_v[:], axis=0)/col_dim_v)

    grad_loss_w[:] = l_t.grad_loss_w(Y_t)
    grad_loss_b[:] = l_t.grad_loss_b(Y_t)

    # average gradient across all inputs

Re: Idiom for partial failures

2020-02-20 Thread Ethan Furman

On 02/20/2020 09:30 AM, David Wihl wrote:


I'm working on the Python client library for the Google Ads API. In some cases, 
we can start a request with a partial failure flag = True. This means that the 
request may contain say 1000 operations. If any of the operations fail, the 
request will return with a success status without an exception. Then the 
developer has to iterate through the list of operation return statuses to 
determine which specific ones failed.

I believe that it would be more idiomatic in Python (and other languages like 
Ruby) to throw an exception when one of these partial errors occur. That way 
there would be the same control flow if a major or minor error occurred.


What I have done in such circumstances is have a custom exception type, and 
store the errors there -- how much detail you save depends on how much you need 
to properly post-process the failed items.

For example:

# not tested

class PartialFailure(Exception):
    def __init__(self, failures):
        self.failures = failures

... lots of code ...

failures = []

try:
    an_operation_that_could_fail
except SomeException:
    failures.append(debugging_info)

... more code ...

if failures:
    raise PartialFailure(failures)

# in the calling code
try:
    code_that_may_raise_partial_failures
except PartialFailure as e:
    for failure in e.failures:
        handle_failure

--
~Ethan~


Re: Idiom for partial failures

2020-02-20 Thread Rob Gaddi

On 2/20/20 9:30 AM, David Wihl wrote:


(first post)

I'm working on the Python client library [0]for the Google Ads API [1]. In some 
cases, we can start a request with a partial failure [2] flag = True. This 
means that the request may contain say 1000 operations. If any of the 
operations fail, the request will return with a success status without an 
exception. Then the developer has to iterate through the list of operation 
return statuses to determine which specific ones failed (example [3]).

I believe that it would be more idiomatic in Python (and other languages like 
Ruby) to throw an exception when one of these partial errors occur. That way 
there would be the same control flow if a major or minor error occurred.

The team is asking me for other examples or justification of this being 
idiomatic of Python. Can you recommend any examples or related best practices?

[0] https://github.com/googleads/google-ads-python

[1] https://developers.google.com/google-ads/api/docs/start

[2] 
https://developers.google.com/google-ads/api/docs/best-practices/partial-failures

[3] 
https://github.com/googleads/google-ads-python/blob/master/examples/error_handling/handle_partial_failure.py

Thanks!
-David



Potentially stupid question: does your library for the API have to support 
partial failures?  Sure, Google Ads does, but does it really add value to do so? 
And then you've got to decide what to DO with those partially failed results.


You could, as a library, just say "No, this library never sets the 
partial_failure field.  We only support atomic transactions that fully succeed 
or fully fail and raise an exception."  Makes your life a lot easier; so unless 
it makes your customer's lives a lot harder...



Re: Idiom for partial failures

2020-02-20 Thread Tim Chase
On 2020-02-20 13:30, David Wihl wrote:
> I believe that it would be more idiomatic in Python (and other
> languages like Ruby) to throw an exception when one of these
> partial errors occur. That way there would be the same control flow
> if a major or minor error occurred. 

There are a variety of ways to do it -- I like Ethan's suggestion
about tacking the failures onto the exception and raising it at the
end.  But you can also yield a tuple of success/failure iterators,
something like this pseudocode:

  def process_all(input_iter):
    successes = []
    failures = []
    for thing in input_iter:
      try:
        result = process(thing)
      except ValueError as e: # catch appropriate exception(s) here
        failures.append((e, thing))
      else:
        successes.append((result, thing))
    return successes, failures

  def process(item):
    if item % 3:
      raise ValueError("Must be divisible by 3!")
    else:
      print(item)
      return item // 3

  successes, failures = process_all(range(10))

  for reason, thing in failures:
    print(f"Unable to process {thing} because {reason}")

-tkc





Re: Idiom for partial failures

2020-02-20 Thread Richard Damon

On 2/20/20 12:30 PM, David Wihl wrote:


(first post)

I'm working on the Python client library [0]for the Google Ads API [1]. In some 
cases, we can start a request with a partial failure [2] flag = True. This 
means that the request may contain say 1000 operations. If any of the 
operations fail, the request will return with a success status without an 
exception. Then the developer has to iterate through the list of operation 
return statuses to determine which specific ones failed (example [3]).

I believe that it would be more idiomatic in Python (and other languages like 
Ruby) to throw an exception when one of these partial errors occur. That way 
there would be the same control flow if a major or minor error occurred.

The team is asking me for other examples or justification of this being 
idiomatic of Python. Can you recommend any examples or related best practices?

[0] https://github.com/googleads/google-ads-python

[1] https://developers.google.com/google-ads/api/docs/start

[2] 
https://developers.google.com/google-ads/api/docs/best-practices/partial-failures

[3] 
https://github.com/googleads/google-ads-python/blob/master/examples/error_handling/handle_partial_failure.py

Thanks!
-David


My first thought is that partial failure also means partial success, 
which means that it shouldn't throw.


One key thing about exceptions is that they provide an easy way to ABORT 
a current operation on an error and trap to an error handler. They allow 
you to begin the operation starting a try block, have a number of 
operations that don't need to worry about all the little error 
conditions that would force you to abort the operation, and then handle 
all the errors at the end, and the code in the middle doesn't need to 
propagate the error codes, as the exception carries that information. If 
you throw on a partial failure/partial success, then you really need to 
IMMEDIATELY catch the error so you can continue the successes, at which 
point returning an error value is normally clearer.


Now, an alternative to throwing an exception would be to effectively 
raise a signal and call an error callback with the details of the failed 
operation, and let the callback handle (or record) the error 
information, so the caller doesn't need to go through all the successes 
to see if there was an error.
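That callback alternative might be sketched like this (hypothetical API, names invented):

```python
# The batch runner reports each failure through on_error as it happens,
# instead of raising; successes still come back in order.
def run_batch(operations, on_error):
    results = []
    for index, op in enumerate(operations):
        try:
            results.append(op())
        except Exception as exc:
            on_error(index, exc)
            results.append(None)
    return results

errors = []
ops = [lambda: 1, lambda: 1 / 0, lambda: 3]
results = run_batch(ops, lambda i, exc: errors.append((i, exc)))
print(results)   # [1, None, 3]
```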


Now, if the code was going to iterate through the successes anyway, then 
there isn't as much of a cost to detecting the errors that occurred.


--
Richard Damon



Re: Idiom for partial failures

2020-02-20 Thread DL Neil via Python-list

On 21/02/20 10:05 AM, Stefan Ram wrote:

David Wihl  writes:

I believe that it would be more idiomatic in Python (and other languages 
like Ruby) to throw an exception when one of these partial errors occur.


   I wonder whether the term "idiomatic" is too heavily
   burdened here.

   Python offers an idiom for handling a failure to do an
   operation - the exception.

   But you have a special case of a run-time program of many
   operations, and you want the execution of the program to
   continue even in the case of failures to execute some of
   the operations. I am not aware of Python offering an idiom
   for this special case.

   I am always thinking about terminology first. So I'd say:
   You need to /define/ what "success" and "failure" means in
   the case of your operation. Then return normally in the case
   of success and return by exception in the case of failure.



Not arguing - and if I were, the logical answer is "not Python", but...

PyTest (runs unit-tests, enables TDD) offers options to stop the 
test-runner at the first error found or to keep going for up to 
n errors. We can test for an error condition, and if it is experienced it 
will not 'count' or stop the test-run. It distinguishes between 
hard errors and warnings, and enables us to choose to 'elevate' one to 
the level of the other, and vice-versa. Its run-report lists which tests 
worked and which didn't (the 'last' or the up-to "n") - a clear advance 
on the OP's experience.



I think someone has already discussed this, but if the intent is 
bulk-processing ("batch processing") then I'd expect to find a means of 
'bailing-out' when an error is found, or of back-tracking to locate errors.


Perhaps, like my good-looks, my expectations of a professional standard 
are higher-than-average?

--
Regards =dn


Asyncio question

2020-02-20 Thread Frank Millman

Hi all

I use asyncio in my project, and it works very well without my having to 
understand what goes on under the hood. It is a multi-user client/server 
system, and I want it to scale to many concurrent users. I have a 
situation where I have to decide between two approaches, and I want to 
choose the least resource-intensive, but I find it hard to reason about 
which, if either, is better.


I use HTTP. On the initial connection from a client, I set up a session 
object, and the session id is passed to the client. All subsequent 
requests from that client include the session id, and the request is 
passed to the session object for handling.


It is possible for a new request to be received from a client before the 
previous one has been completed, and I want each request to be handled 
atomically, so each session maintains its own asyncio.Queue(). The main 
routine gets the session id from the request and 'puts' the request in 
the appropriate queue. The session object 'gets' from the queue and 
handles the request. It works well.


The question is, how to arrange for each session to 'await' its queue. 
My first attempt was to create a background task for each session which 
runs for the life-time of the session, and 'awaits' its queue. It works, 
but I was concerned about having a lot of background tasks active at the 
same time.


Then I came up with what I thought was a better idea. On the initial 
connection, I create the session object, send the response to the 
client, and then 'await' the method that sets up the session's queue. 
This also works, and there is no background task involved. However, I 
then realised that the initial response handler never completes, and 
will 'await' until the session is closed.


Is this better, worse, or does it make no difference? If it makes no 
difference, I will lean towards the first approach, as it is easier to 
reason about what is going on.
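For reference, the first approach might be sketched like this (all names invented for illustration, not Frank's actual code):

```python
import asyncio

class Session:
    def __init__(self):
        self.queue = asyncio.Queue()
        self.handled = []
        # One long-lived background task per session drains the queue, so
        # this session's requests are handled one at a time, atomically.
        self.worker = asyncio.create_task(self._run())

    async def _run(self):
        while True:
            request = await self.queue.get()
            if request is None:           # sentinel: session is closing
                break
            self.handled.append(request)  # stand-in for real handling

    async def close(self):
        await self.queue.put(None)
        await self.worker

async def main():
    session = Session()
    for req in ("req1", "req2", "req3"):
        await session.queue.put(req)
    await session.close()
    return session.handled

print(asyncio.run(main()))
```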


Thanks for any advice.

Frank Millman
