Should PyImport_ImportModule be threadsafe when importing from zipfiles?
Hi all,

I have some embedded Python code which looks like this in C++:

    _gstate = PyGILState_Ensure();
    PyImport_ImportModule("a");
    ...
    PyGILState_Release(_gstate);

and is called from different threads which are created in C++. My module
a.py then imports another module b in Python, which defines a lot of
functions.

When several threads execute this simultaneously I often get a stacktrace
saying some function near the end of module b is not defined, presumably
because the module has been imported part-initialised. This only seems to
happen when my Python modules are packaged in a zip file, not when they
are ordinary files on disk.

I have observed this in both Python 3.7 and Python 3.8. Does anyone have
any insights or suggestions for how to debug this? It seems likely to be
hard to produce a reproducible test case.

Regards,
Geoff Bache
--
https://mail.python.org/mailman/listinfo/python-list
What's the best forum to get help with Pandas?
The subject says it all. Thanks
Re: Paper Print Help
On 20/02/2020 15:08, Duram wrote:
> On 19/02/2020 12:17, Rhodri James wrote:
>> On 19/02/2020 14:22, Duram via Python-list wrote:
>>> I have a drawing in a .gif file with (a,b) pixels and want to
>>> paper-print it at a position (x,y), what would be the code?
>> What have you tried?
> Nothing, I did not find the module that makes it print to paper.

Please don't reply to me directly; if it's a question worth asking in
public then it's worth answering in public too!

OK, let's backtrack a bit. First off, what operating system are you
using? Second, do you have this GIF file in any sort of program at the
moment, or do you want advice on how to write a program to handle the
image?

I suspect your question is a bit too specific at the moment, and you
have some mistaken assumptions about how images and (most especially)
printing work.

--
Rhodri James *-* Kynesim Ltd
Re: Should PyImport_ImportModule be threadsafe when importing from zipfiles?
On Fri, Feb 21, 2020 at 2:37 AM Geoff Bache wrote:
> When several threads execute this simultaneously I often get a stacktrace
> saying some function near the end of module b is not defined, presumably
> because the module has been imported part-initialised.
> This only seems to happen when my Python modules are packaged in a zip
> file, not when they are ordinary files on disk.
>
> I have observed this in both Python 3.7 and Python 3.8. Does anyone have
> any insights or suggestions for how to debug this? It seems likely to be
> hard to produce a reproducible test case.

One easy way to probe the bug would be to pre-import the module before
starting any secondary threads. If you ever get the problem under that
pattern, then it's not a concurrency problem (IOW have fun figuring out
what *is* the problem).

Another thing to try: slap a print call at the top and bottom of the
module. See if you get multiple of them.

ChrisA
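Chris's pre-import idea is easy to try in isolation; here is a minimal
stand-alone sketch, using the stdlib json module as a stand-in for the
hypothetical module b:

```python
import importlib
import threading

MODULE = "json"  # stand-in for module "b" in Geoff's setup

def worker(results):
    # By the time any worker runs, the module is already fully
    # initialised in sys.modules, so this lookup cannot race with a
    # first-time import.
    mod = importlib.import_module(MODULE)
    results.append(hasattr(mod, "dumps"))

# Pre-import once, in the main thread, before any workers exist.
importlib.import_module(MODULE)

results = []
threads = [threading.Thread(target=worker, args=(results,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If the failure really is a first-import race, this pattern should make
it vanish, which is exactly the diagnostic Chris is proposing.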
Re: Pandas rookie
On Wed, 19 Feb 2020 17:15:59 -0500 FilippoM wrote:
> How can I use Pandas' dataframe magic to calculate, for each of the
> possible 109 values, how many have VIDEO_OK and how many have
> VIDEO_FAILURE, respectively?

crosstab()
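A sketch of what that might look like, with made-up column names
('value', 'status') since the original data layout wasn't shown:

```python
import pandas as pd

# Hypothetical data: one row per observation, 'value' is one of the
# ~109 possible values, 'status' is VIDEO_OK or VIDEO_FAILURE.
df = pd.DataFrame({
    "value": [1, 1, 2, 2, 2, 3],
    "status": ["VIDEO_OK", "VIDEO_FAILURE", "VIDEO_OK",
               "VIDEO_OK", "VIDEO_FAILURE", "VIDEO_OK"],
})

# One row per value, one column per status, cells are counts.
counts = pd.crosstab(df["value"], df["status"])
print(counts)
```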
Re: What's the best forum to get help with Pandas?
I believe the Pandas people tend to refer people to Stack Overflow. I
find that suboptimal as many questions go completely unanswered or get
gruff responses. Aside from that, I suspect this list is as good a place
as any to request help.

Skip
Re: Should PyImport_ImportModule be threadsafe when importing from zipfiles?
On Fri, Feb 21, 2020 at 3:06 AM Geoff Bache wrote:
>
> Hi Chris,
>
> Yes, I've tried both of these things already. I can confirm there are
> multiple calls, and that pre-importing the module fixes it. But
> pre-importing it is not a solution in practice.
>

Cool, good to know.

Crazy idea: what would happen if you stick something temporarily into
sys.modules["b"], then when you're done importing, set the module back
into there? Might not help, but would be interesting to try, and might
show a bit more of what's going on.

ChrisA
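Reduced to a runnable toy, the temporary sys.modules entry Chris
describes might look like this (the module name "b" and its contents
are hypothetical, and types.ModuleType stands in for a real import):

```python
import sys
import types

# Publish a placeholder before the real import work starts; any thread
# that looks up sys.modules["b"] in the meantime sees the placeholder
# and can log that it arrived too early.
placeholder = types.ModuleType("b")
placeholder._initialising = True
sys.modules["b"] = placeholder

# ... the real (slow) import work would happen here ...

# Swap the fully initialised module back in when done.
real_module = types.ModuleType("b")
real_module.some_function = lambda: 42
sys.modules["b"] = real_module
```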
Re: Should PyImport_ImportModule be threadsafe when importing from zipfiles?
Hi Chris,

Yes, I've tried both of these things already. I can confirm there are
multiple calls, and that pre-importing the module fixes it. But
pre-importing it is not a solution in practice.

Regards,
Geoff

On Thu, Feb 20, 2020 at 4:45 PM Chris Angelico wrote:
> On Fri, Feb 21, 2020 at 2:37 AM Geoff Bache wrote:
> > When several threads execute this simultaneously I often get a
> > stacktrace saying some function near the end of module b is not
> > defined, presumably because the module has been imported
> > part-initialised.
> > This only seems to happen when my Python modules are packaged in a
> > zip file, not when they are ordinary files on disk.
> >
> > I have observed this in both Python 3.7 and Python 3.8. Does anyone
> > have any insights or suggestions for how to debug this? It seems
> > likely to be hard to produce a reproducible test case.
>
> One easy way to probe the bug would be to pre-import the module before
> starting any secondary threads. If you ever get the problem under that
> pattern, then it's not a concurrency problem (IOW have fun figuring
> out what *is* the problem).
>
> Another thing to try: Slap a print call at the top and bottom of the
> module. See if you get multiple of them.
>
> ChrisA
Idiom for partial failures
(first post)

I'm working on the Python client library [0] for the Google Ads API [1].
In some cases, we can start a request with the partial failure [2] flag
set to True. This means that the request may contain, say, 1000
operations. If any of the operations fail, the request will return with
a success status and without an exception. The developer then has to
iterate through the list of operation return statuses to determine which
specific ones failed (example [3]).

I believe that it would be more idiomatic in Python (and other languages
like Ruby) to throw an exception when one of these partial errors
occurs. That way the control flow would be the same whether a major or a
minor error occurred.

The team is asking me for other examples or justification of this being
idiomatic Python. Can you recommend any examples or related best
practices?

[0] https://github.com/googleads/google-ads-python
[1] https://developers.google.com/google-ads/api/docs/start
[2] https://developers.google.com/google-ads/api/docs/best-practices/partial-failures
[3] https://github.com/googleads/google-ads-python/blob/master/examples/error_handling/handle_partial_failure.py

Thanks!
-David
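To make the contrast concrete, here is a toy sketch of the two calling
conventions; every name in it is hypothetical and it is not the real
google-ads-python API:

```python
class PartialFailureError(Exception):
    def __init__(self, errors):
        super().__init__(f"{len(errors)} operation(s) failed")
        self.errors = errors  # list of (index, message) pairs

def mutate(operations, raise_on_partial_failure=False):
    """Pretend service call: even operations succeed, odd ones fail."""
    results, errors = [], []
    for i, op in enumerate(operations):
        if op % 2:
            errors.append((i, f"operation {op} failed"))
            results.append(None)
        else:
            results.append(op * 10)
    if errors and raise_on_partial_failure:
        raise PartialFailureError(errors)
    return results, errors

# Status-iteration style: the caller must remember to inspect `errors`.
results, errors = mutate([1, 2, 3, 4])

# Exception style: failures use the same control flow as any other error.
try:
    mutate([1, 2, 3, 4], raise_on_partial_failure=True)
except PartialFailureError as e:
    failed_indices = [i for i, _ in e.errors]
```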
Machine Learning program outputting wrong things
I am writing a program for an assignment (in a course I am auditing). I
am pasting it below:

# Assignment 2 skeleton code
# This code shows you how to use the 'argparse' library to read in parameters
import argparse
import math
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random as rn
from dispkernel import dispKernel

# Command Line Arguments
parser = argparse.ArgumentParser(description='generate training and validation data for assignment 2')
parser.add_argument('trainingfile', help='name stub for training data and label output in csv format', default="train")
parser.add_argument('validationfile', help='name stub for validation data and label output in csv format', default="valid")
parser.add_argument('-numtrain', help='number of training samples', type=int, default=200)
parser.add_argument('-numvalid', help='number of validation samples', type=int, default=20)
parser.add_argument('seed', help='random seed', type=int, default=1)
parser.add_argument('learningrate', help='learning rate', type=float, default=0.1)
parser.add_argument('actfunction', help='activation functions', choices=['sigmoid', 'relu', 'linear'], default='linear')
parser.add_argument('numepoch', help='number of epochs', type=int, default=50)

args = parser.parse_args()

traindataname = args.trainingfile + "data.csv"
trainlabelname = args.trainingfile + "label.csv"
print("training data file name: ", traindataname)
print("training label file name: ", trainlabelname)

validdataname = args.validationfile + "data.csv"
validlabelname = args.validationfile + "label.csv"
print("validation data file name: ", validdataname)
print("validation label file name: ", validlabelname)

print("number of training samples = ", args.numtrain)
print("number of validation samples = ", args.numvalid)
print("learning rate = ", args.learningrate)
print("number of epoch = ", args.numepoch)
print("activation function is ", args.actfunction)

# read in training data
t_data = pd.read_csv(args.trainingfile, ',', header=None).values
t_label = pd.read_csv('trainlabel.csv', ',', header=None).values
row_dim_t = t_data.shape[1]
col_dim_t = t_data.shape[0]

# read in validation data
v_data = pd.read_csv(args.validationfile, ',', header=None).values
v_label = pd.read_csv('validlabel.csv', ',', header=None).values
row_dim_v = v_data.shape[1]
col_dim_v = v_data.shape[0]

np.random.seed(args.seed)

# initialize weights
w = np.random.rand(row_dim_t, 1)
# initialize bias
b = np.random.uniform(0, 1)

n_epoch = []
loss_t = []
loss_v = []
accuracy_t = []
accuracy_v = []
Z_t = np.zeros([col_dim_t, 1])
Z_v = np.zeros([col_dim_v, 1])
guess_t_label = np.zeros([col_dim_t, 1])
guess_v_label = np.zeros([col_dim_v, 1])
accuracy_j_t = np.zeros([col_dim_t, 1])
accuracy_j_v = np.zeros([col_dim_v, 1])
Y_t = np.zeros([col_dim_t, 1])
Y_v = np.zeros([col_dim_v, 1])
loss_j_t = np.zeros([col_dim_t, 1])
loss_j_v = np.zeros([col_dim_v, 1])
grad_loss_w = np.zeros([col_dim_t, row_dim_v])
grad_loss_b = np.zeros([col_dim_t, 1])

class Linear:
    def __init__(self, Z, data, label):
        self.Z = Z
        self.data = data
        self.label = label

    # pass Z through an activation function
    def act_linear(self):
        return self.Z

    # gradient of the loss wrt weights
    def grad_loss_w(self, Y):
        return 2*(Y - self.label)*self.data

    # gradient of the loss wrt bias
    def grad_loss_b(self, Y):
        return 2*(Y - self.label)

for i in range(0, args.numepoch):
    n_epoch.append(i)
    # calculate predictor for training data
    Z_t[:] = np.dot(t_data[:, :], w[:]) + b
    # predict training label based on output of predictor
    guess_t_label[:] = (Z_t >= 0.5)
    # determine whether predicted label is correct or not
    accuracy_j_t[:] = 1 - np.absolute(guess_t_label[:] - t_label[:])
    # calculate accuracy for training data
    accuracy_t.append(np.sum(accuracy_j_t[:], axis=0)/col_dim_t)

    # calculate predictor for validation data
    Z_v[:] = np.dot(v_data[:, :], w[:]) + b
    # predict validation label based on output of predictor
    guess_v_label[:] = (Z_v >= 0.5)
    # determine whether predicted label is correct or not
    accuracy_j_v[:] = 1 - np.absolute(guess_v_label[:] - v_label[:])
    # calculate accuracy for validation data
    accuracy_v.append(np.sum(accuracy_j_v[:], axis=0)/col_dim_v)

    l_t = Linear(Z_t, t_data, t_label)
    l_v = Linear(Z_v, v_data, v_label)
    # pass Z through an activation function
    Y_t[:] = l_t.act_linear()
    Y_v[:] = l_v.act_linear()

    # calculate loss across all training data
    loss_j_t[:] = (Y_t[:] - t_label)**2
    loss_t.append(np.sum(loss_j_t[:], axis=0)/col_dim_t)
    # calculate loss across all validation data
    loss_j_v[:] = (Y_v[:] - v_label)**2
    loss_v.append(np.sum(loss_j_v[:], axis=0)/col_dim_v)

    grad_loss_w[:] = l_t.grad_loss_w(Y_t)
    grad_loss_b[:] = l_t.grad_loss_b(Y_t)
    # average gradient across all inputs
Re: Idiom for partial failures
On 02/20/2020 09:30 AM, David Wihl wrote:
> I'm working on the Python client library for the Google Ads API. In
> some cases, we can start a request with a partial failure flag = True.
> This means that the request may contain say 1000 operations. If any of
> the operations fail, the request will return with a success status
> without an exception. Then the developer has to iterate through the
> list of operation return statuses to determine which specific ones
> failed.
>
> I believe that it would be more idiomatic in Python (and other
> languages like Ruby) to throw an exception when one of these partial
> errors occur. That way there would be the same control flow if a major
> or minor error occurred.

What I have done in such circumstances is have a custom exception type,
and store the errors there -- how much detail you save depends on how
much you need to properly post-process the failed items. For example:

# not tested

class PartialFailure(Exception):
    def __init__(self, failures):
        self.failures = failures

... lots of code ...

failures = []
try:
    an_operation_that_could_fail
except SomeException:
    failures.append(debugging_info)

... more code ...

if failures:
    raise PartialFailure(failures)

# in the calling code
try:
    code_that_may_raise_partial_failures
except PartialFailure as e:
    for failure in e.failures:
        handle_failure

--
~Ethan~
Re: Idiom for partial failures
On 2/20/20 9:30 AM, David Wihl wrote:
> (first post)
> I'm working on the Python client library [0] for the Google Ads API
> [1]. In some cases, we can start a request with a partial failure [2]
> flag = True. This means that the request may contain say 1000
> operations. If any of the operations fail, the request will return
> with a success status without an exception. Then the developer has to
> iterate through the list of operation return statuses to determine
> which specific ones failed (example [3]).
>
> I believe that it would be more idiomatic in Python (and other
> languages like Ruby) to throw an exception when one of these partial
> errors occur. That way there would be the same control flow if a major
> or minor error occurred.
>
> The team is asking me for other examples or justification of this
> being idiomatic of Python. Can you recommend any examples or related
> best practices?
>
> [0] https://github.com/googleads/google-ads-python
> [1] https://developers.google.com/google-ads/api/docs/start
> [2] https://developers.google.com/google-ads/api/docs/best-practices/partial-failures
> [3] https://github.com/googleads/google-ads-python/blob/master/examples/error_handling/handle_partial_failure.py
>
> Thanks!
> -David

Potentially stupid question: does your library for the API have to
support partial failures? Sure, Google Ads does, but does it really add
value to do so? And then you've got to decide what to DO with those
partially failed results.

You could, as a library, just say "No, this library never sets the
partial_failure field. We only support atomic transactions that fully
succeed or fully fail and raise an exception." Makes your life a lot
easier; so unless it makes your customers' lives a lot harder...
Re: Idiom for partial failures
On 2020-02-20 13:30, David Wihl wrote:
> I believe that it would be more idiomatic in Python (and other
> languages like Ruby) to throw an exception when one of these
> partial errors occur. That way there would be the same control flow
> if a major or minor error occurred.

There are a variety of ways to do it -- I like Ethan's suggestion about
tacking the failures onto the exception and raising it at the end. But
you can also return a tuple of success/failure lists, something like
this pseudocode:

def process_all(input_iter):
    successes = []
    failures = []
    for thing in input_iter:
        try:
            result = process(thing)
        except ValueError as e:  # catch appropriate exception(s) here
            failures.append((e, thing))
        else:
            successes.append((result, thing))
    return successes, failures

def process(item):
    if item % 3:
        raise ValueError("Must be divisible by 3!")
    else:
        print(item)
        return item // 3

successes, failures = process_all(range(10))
for reason, thing in failures:
    print(f"Unable to process {thing} because {reason}")

-tkc
Re: Idiom for partial failures
On 2/20/20 12:30 PM, David Wihl wrote:
> In some cases, we can start a request with a partial failure [2] flag
> = True. This means that the request may contain say 1000 operations.
> If any of the operations fail, the request will return with a success
> status without an exception. Then the developer has to iterate through
> the list of operation return statuses to determine which specific ones
> failed (example [3]).
>
> I believe that it would be more idiomatic in Python (and other
> languages like Ruby) to throw an exception when one of these partial
> errors occur. That way there would be the same control flow if a major
> or minor error occurred.

My first thought is that partial failure also means partial success,
which means that it shouldn't throw.

One key thing about exceptions is that they provide an easy way to ABORT
a current operation on an error and trap to an error handler. They allow
you to begin the operation by starting a try block, have a number of
operations that don't need to worry about all the little error
conditions that would force you to abort the operation, and then handle
all the errors at the end; the code in the middle doesn't need to
propagate error codes, as the exception carries that information.

If you throw on a partial failure/partial success, then you really need
to IMMEDIATELY catch the error so you can continue the successes, at
which point returning an error value is normally clearer.

Now, an alternative, rather than throwing an exception, would be to
effectively raise a signal and call an error callback with the details
of the failed operation, and let the callback handle (or record) the
error information, so the caller doesn't need to go through all the
successes to see if there was an error. If the code was going to iterate
through the successes anyway, then there isn't as much of a cost to
detect the errors that occurred.

--
Richard Damon
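The error-callback alternative Richard describes might look roughly
like this minimal sketch (all names hypothetical, with division by zero
standing in for a failed operation):

```python
def run_batch(operations, on_error):
    """Run every operation; report failures via on_error, keep going."""
    results = []
    for i, op in enumerate(operations):
        try:
            results.append(100 / op)   # stand-in for the real work
        except ZeroDivisionError as e:
            on_error(i, op, e)         # signal the failure to the caller
    return results

# The caller decides what failure handling means -- here, just record it.
failures = []
results = run_batch([4, 0, 5], lambda i, op, e: failures.append((i, op)))
```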
Re: Idiom for partial failures
On 21/02/20 10:05 AM, Stefan Ram wrote:
> David Wihl writes:
>> I believe that it would be more idiomatic in Python (and other
>> languages like Ruby) to throw an exception when one of these partial
>> errors occur.
>
> I wonder whether the term "idiomatic" is too heavily burdened here.
> Python offers an idiom for handling a failure to do an operation - the
> exception. But you have a special case of a run-time program of many
> operations, and you want the execution of the program to continue even
> in the case of failures to execute some of the operations. I am not
> aware of Python offering an idiom for this special case.
>
> I am always thinking about terminology first. So I'd say: You need to
> /define/ what "success" and "failure" means in the case of your
> operation. Then return normally in the case of success and return by
> exception in the case of failure.

Not arguing - and if I were, the logical answer is "not Python", but...

PyTest (runs unit-tests, enables TDD) offers options to stop the
test-runner at the first error found, or to keep going for up to n
errors. We can test for an error condition, and if experienced it will
not 'count' or stop the test-run. It distinguishes between hard errors
and warnings, and enables us to choose to 'elevate' one to the level of
the other, and vice-versa. Its run-report lists which tests worked and
which didn't (the 'last' one, or the up-to "n") - a clear advance on the
OP's experience.

I think someone has already discussed this, but if the intent is
bulk-processing ("batch processing") then I'd expect to find a means of
'bailing-out' when an error is found, or of back-tracking to locate
errors.

Perhaps, like my good looks, my expectations of a professional standard
are higher than average?

--
Regards =dn
Asyncio question
Hi all,

I use asyncio in my project, and it works very well without my having to
understand what goes on under the hood. It is a multi-user client/server
system, and I want it to scale to many concurrent users. I have a
situation where I have to decide between two approaches, and I want to
choose the least resource-intensive, but I find it hard to reason about
which, if either, is better.

I use HTTP. On the initial connection from a client, I set up a session
object, and the session id is passed to the client. All subsequent
requests from that client include the session id, and the request is
passed to the session object for handling. It is possible for a new
request to be received from a client before the previous one has been
completed, and I want each request to be handled atomically, so each
session maintains its own asyncio.Queue(). The main routine gets the
session id from the request and 'puts' the request in the appropriate
queue. The session object 'gets' from the queue and handles the request.
It works well.

The question is how to arrange for each session to 'await' its queue. My
first attempt was to create a background task for each session which
runs for the lifetime of the session and 'awaits' its queue. It works,
but I was concerned about having a lot of background tasks active at the
same time.

Then I came up with what I thought was a better idea. On the initial
connection, I create the session object, send the response to the
client, and then 'await' the method that sets up the session's queue.
This also works, and there is no background task involved. However, I
then realised that the initial response handler never completes, and
will 'await' until the session is closed.

Is this better, worse, or does it make no difference? If it makes no
difference, I will lean towards the first approach, as it is easier to
reason about what is going on.

Thanks for any advice.

Frank Millman
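For reference, the first approach (a long-lived worker task per session)
might be sketched like this; the names and the sentinel-based shutdown
are assumptions for illustration, not Frank's actual code:

```python
import asyncio

class Session:
    def __init__(self):
        self.queue = asyncio.Queue()
        self.handled = []
        # Background task lives for the lifetime of the session,
        # draining this session's queue one request at a time.
        self.task = asyncio.create_task(self._worker())

    async def _worker(self):
        while True:
            request = await self.queue.get()
            if request is None:           # sentinel: session closed
                break
            self.handled.append(request)  # stand-in for real handling
            self.queue.task_done()

    async def close(self):
        await self.queue.put(None)
        await self.task

async def main():
    session = Session()
    # The main routine would 'put' each incoming request here.
    for req in ["GET /a", "GET /b"]:
        await session.queue.put(req)
    await session.close()
    return session.handled

handled = asyncio.run(main())
```

Because only the worker task ever 'gets' from the queue, requests for a
given session are serialised even if the client sends them concurrently.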