How to write list of integers to file with struct.pack_into?
My previous message just went up -- sorry for the mangled formatting. Here it is properly formatted: I want to write a list of 64-bit integers to a binary file. Every example I have seen in my research converts it to .txt, but I want it in binary. I wrote this code, based on some earlier work I have done:

buf = bytes(len(qs_array) * 8)

for offset in range(len(qs_array)):
    item_to_write = bytes(qs_array[offset])
    struct.pack_into(buf, "<Q", offset, item_to_write)

But I get the error "struct.error: embedded null character." Maybe there's a better way to do this?
--
https://mail.python.org/mailman/listinfo/python-list
Re: How to write list of integers to file with struct.pack_into?
Dieter, thanks for your comment that:

* In your code, `offset` is `0`, `1`, `2`, ... but it should be `0 * 8`, `1 * 8`, `2 * 8`, ...

But you concluded with essentially the same solution proposed by MRAB, so that would obviate the need to write item by item because it writes the whole buffer at once. Thanks for your help.

Oct 2, 2023, 17:47 by die...@handshake.de:

> Jen Kris wrote at 2023-10-2 00:04 +0200:
> >I want to write a list of 64-bit integers to a binary file. Every example I
> >have seen in my research converts it to .txt, but I want it in binary. I
> >wrote this code, based on some earlier work I have done:
> >
> >buf = bytes(len(qs_array) * 8)
> >
> >for offset in range(len(qs_array)):
> >    item_to_write = bytes(qs_array[offset])
> >    struct.pack_into(buf, "<Q", offset, item_to_write)
> >
> >But I get the error "struct.error: embedded null character."
>
> You made a lot of errors:
>
> * the signature of `struct.pack_into` is
>   `(format, buffer, offset, v1, v2, ...)`.
>   Especially: `format` is the first, `buffer` the second argument
>
> * In your code, `offset` is `0`, `1`, `2`, ...
>   but it should be `0 * 8`, `1 * 8`, `2 * 8`, ...
>
> * The `vi` should be something which fits with the format:
>   integers in your case. But you pass bytes.
>
> Try `struct.pack_into("<%dQ" % len(qs_array), buf, 0, *qs_array)` instead of your loop.
>
> Next time: carefully read the documentation and think carefully
> about the types involved.
> --
> https://mail.python.org/mailman/listinfo/python-list
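Putting Dieter's three corrections together -- format first, a mutable buffer second, byte offsets, and integer values -- a fixed version of the loop might look like this (a sketch; the sample qs_array values are my own):

```python
import struct

qs_array = [1, 2, 3, 2**63]  # hypothetical data; the real qs_array comes from elsewhere

# format first, then a *mutable* buffer, then a byte offset, then an integer value
buf = bytearray(len(qs_array) * 8)
for i in range(len(qs_array)):
    struct.pack_into("<Q", buf, i * 8, qs_array[i])

# the loop produces the same bytes as packing everything at once
assert bytes(buf) == struct.pack("<%dQ" % len(qs_array), *qs_array)
```

As Dieter notes, the single struct.pack_into call (or MRAB's struct.pack) makes the loop unnecessary.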
Re: How to write list of integers to file with struct.pack_into?
Thanks very much, MRAB. I just tried that and it works. What frustrated me is that every research example I found writes integers as strings. That works -- sort of -- but it requires re-casting each string to an integer when reading the file. If I'm doing binary work I don't want the extra overhead, and it's more difficult yet if I'm using the Python integer output in a C program. Your solution solves those problems.

Oct 2, 2023, 17:11 by python-list@python.org:

> On 2023-10-01 23:04, Jen Kris via Python-list wrote:
>
>> I want to write a list of 64-bit integers to a binary file. Every example I
>> have seen in my research converts it to .txt, but I want it in binary. I
>> wrote this code, based on some earlier work I have done:
>>
>> buf = bytes(len(qs_array) * 8)
>>
>> for offset in range(len(qs_array)):
>>     item_to_write = bytes(qs_array[offset])
>>     struct.pack_into(buf, "<Q", offset, item_to_write)
>>
>> But I get the error "struct.error: embedded null character."
>>
>> Maybe there's a better way to do this?
>>
> You can't pack into a 'bytes' object because it's immutable.
>
> The simplest solution I can think of is:
>
> buf = struct.pack("<%sQ" % len(qs_array), *qs_array)
> --
> https://mail.python.org/mailman/listinfo/python-list
> --
https://mail.python.org/mailman/listinfo/python-list
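MRAB's one-liner, carried through to the file on disk and back, might look like this (a sketch; the data values and file name are my own):

```python
import os
import struct
import tempfile

qs_array = [10, 20, 2**40, 2**63 - 1]  # hypothetical data

# pack the whole list at once: little-endian, N unsigned 64-bit integers
buf = struct.pack("<%sQ" % len(qs_array), *qs_array)

path = os.path.join(tempfile.gettempdir(), "qs_array.bin")
with open(path, "wb") as f:
    f.write(buf)

# read it back: 8 bytes per integer, no string parsing needed
with open(path, "rb") as f:
    data = f.read()
restored = list(struct.unpack("<%sQ" % (len(data) // 8), data))
assert restored == qs_array
```

The resulting file is raw little-endian uint64 values, so a C program can read it directly into a uint64_t array.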
How to write list of integers to file with struct.pack_into?
I want to write a list of 64-bit integers to a binary file. Every example I have seen in my research converts it to .txt, but I want it in binary. I wrote this code, based on some earlier work I have done:

buf = bytes(len(qs_array) * 8)

for offset in range(len(qs_array)):
    item_to_write = bytes(qs_array[offset])
    struct.pack_into(buf, "<Q", offset, item_to_write)

But I get the error "struct.error: embedded null character." Maybe there's a better way to do this?
--
https://mail.python.org/mailman/listinfo/python-list
Re: How does a method of a subclass become a method of the base class?
Thanks to everyone who answered this question. Your answers have helped a lot. Jen Mar 27, 2023, 14:12 by m...@wichmann.us: > On 3/26/23 17:53, Jen Kris via Python-list wrote: > >> I’m asking all these question because I have worked in a procedural style >> for many years, with class work limited to only simple classes, but now I’m >> studying classes in more depth. The three answers I have received today, >> including yours, have helped a lot. >> > > Classes in Python don't work quite like they do in many other languages. > > You may find a lightbulb if you listen to Raymond Hettinger talk about them: > > https://dailytechvideo.com/raymond-hettinger-pythons-class-development-toolkit/ > > I'd also advise that benchmarks often do very strange things to set up the > scenario they're trying to test, a benchmark sure wouldn't be my first place > to look in learning a new piece of Python - I don't know if it was the first > place, but thought this was worth a mention. > > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: How does a method of a subclass become a method of the base class?
Cameron,

Thanks for your reply. You are correct about the class definition lines – e.g. class EqualityConstraint(BinaryConstraint). I didn’t post all of the code because this program is over 600 lines long. It's DeltaBlue in the Python benchmark suite. I’ve done some more work since this morning, and now I see what’s happening. But it gave rise to another question, which I’ll ask at the end.

The call chain starts at EqualityConstraint(prev, v, Strength.REQUIRED). The class EqualityConstraint is a subclass of BinaryConstraint. The entire class code is:

class EqualityConstraint(BinaryConstraint):
    def execute(self):
        self.output().value = self.input().value

Because EqualityConstraint is a subclass of BinaryConstraint, the init method of BinaryConstraint is called first. During that initialization (I showed the call chain in my previous message), it calls choose_method. When I inspect the code at "self.choose_method(mark):" in PyCharm, it shows:

As EqualityConstraint is a subclass of BinaryConstraint it has bound the choose_method from BinaryConstraint, apparently during the BinaryConstraint init process, and that’s the one it uses. So that answers my original question.

But that brings up a new question. I can create a class instance with x = BinaryConstraint(), but what happens when I have a line like "EqualityConstraint(prev, v, Strength.REQUIRED)"? Is it because the only method of EqualityConstraint is execute(self)? Is execute a special function like a class __init__? I’ve done research on that but I haven’t found an answer.

I’m asking all these questions because I have worked in a procedural style for many years, with class work limited to only simple classes, but now I’m studying classes in more depth. The three answers I have received today, including yours, have helped a lot.

Thanks very much.
Jen

Mar 26, 2023, 22:45 by c...@cskk.id.au:

> On 26Mar2023 22:36, Jen Kris wrote:
>
>> At the final line it calls "satisfy" in the Constraint class, and that line
>> calls choose_method in the BinaryConstraint class. Just as Peter Holzer
>> said, it requires a call to "satisfy."
>>
>> My only remaining question is, did it select the choose_method in the
>> BinaryConstraint class instead of the choose_method in the UrnaryConstraint
>> class because of "super(BinaryConstraint, self).__init__(strength)" in step
>> 2 above?
>
> Basically, no.
>
> You've omitted the "class" lines of the class definitions, and they define
> the class inheritance, not "__init__". The "__init__" method just
> initialises the state of the new object (which has already been created).
> The:
>
> super(BinaryConstraint, self).__init__(strength)
>
> line simply calls the appropriate superclass "__init__" with the "strength"
> parameter to do that aspect of the initialisation.
>
> You haven't cited the line which calls the "choose_method" method, but I'm
> imagining it calls "choose_method" like this:
>
> self.choose_method(...)
>
> That searches for the "choose_method" method based on the method resolution
> order of the object "self". So if "self" was an instance of
> "EqualityConstraint", and I'm guessing about its class definition, assuming
> this:
>
> class EqualityConstraint(BinaryConstraint):
>
> Then a call to "self.choose_method" would look for a "choose_method" method
> first in the EqualityConstraint class and then via the BinaryConstraint
> class. I'm also assuming UrnaryConstraint is not in that class ancestry,
> i.e. not an ancestor of BinaryConstraint, for example.
>
> The first method found is used.
> In practice, when you define a class like:
>
> class EqualityConstraint(BinaryConstraint):
>
> the complete class ancestry (the additional classes from which BinaryConstraint
> inherits) gets flattened into a "method resolution order" list of classes to
> inspect in order, and that is stored as the ".__mro__" field on the new class
> (EqualityConstraint). You can look at it directly as
> "EqualityConstraint.__mro__".
>
> So looking up:
>
> self.choose_method()
>
> looks for a "choose_method" method on the classes in "type(self).__mro__".
>
> Cheers,
> Cameron Simpson
> --
> https://mail.python.org/mailman/listinfo/python-list
> --
https://mail.python.org/mailman/listinfo/python-list
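Cameron's description of the MRO can be checked directly in a few lines (a sketch with stand-in classes; the real DeltaBlue classes have more going on):

```python
class Constraint:
    def satisfy(self, mark):
        # this lookup walks type(self).__mro__, so a subclass method wins
        return self.choose_method(mark)

class BinaryConstraint(Constraint):
    def choose_method(self, mark):
        return "BinaryConstraint.choose_method"

class EqualityConstraint(BinaryConstraint):
    # no choose_method here: the one from BinaryConstraint is next in the MRO
    pass

print([cls.__name__ for cls in EqualityConstraint.__mro__])
# ['EqualityConstraint', 'BinaryConstraint', 'Constraint', 'object']
```

Calling EqualityConstraint().satisfy(0) therefore runs the choose_method defined on BinaryConstraint.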
Re: How does a method of a subclass become a method of the base class?
Based on your explanations, I went through the call chain and now I understand better how it works, but I have a follow-up question at the end. This code comes from the DeltaBlue benchmark in the Python benchmark suite.

1. The call chain starts in a non-class program with the following call:

EqualityConstraint(prev, v, Strength.REQUIRED)

2. EqualityConstraint is a subclass of BinaryConstraint, so first it calls the __init__ method of BinaryConstraint:

def __init__(self, v1, v2, strength):
    super(BinaryConstraint, self).__init__(strength)
    self.v1 = v1
    self.v2 = v2
    self.direction = Direction.NONE
    self.add_constraint()

3. At the final line shown above it calls add_constraint in the Constraint class, the base class of BinaryConstraint:

def add_constraint(self):
    global planner
    self.add_to_graph()
    planner.incremental_add(self)

4. At planner.incremental_add it calls incremental_add in the Planner class, because planner is a global instance of the Planner class:

def incremental_add(self, constraint):
    mark = self.new_mark()
    overridden = constraint.satisfy(mark)

At the final line it calls "satisfy" in the Constraint class, and that line calls choose_method in the BinaryConstraint class. Just as Peter Holzer said, it requires a call to "satisfy."

My only remaining question is, did it select the choose_method in the BinaryConstraint class instead of the choose_method in the UrnaryConstraint class because of "super(BinaryConstraint, self).__init__(strength)" in step 2 above?

Thanks for helping me clarify that.

Jen

Mar 26, 2023, 18:55 by hjp-pyt...@hjp.at:

> On 2023-03-26 19:43:44 +0200, Jen Kris via Python-list wrote:
>
>> The base class:
>>
>> class Constraint(object):
>>
> [...]
>
>> def satisfy(self, mark):
>>     global planner
>>     self.choose_method(mark)
>>
>> The subclass:
>>
>> class UrnaryConstraint(Constraint):
>>
> [...]
>
>> def choose_method(self, mark):
>>     if self.my_output.mark != mark and \
>>         Strength.stronger(self.strength, self.my_output.walk_strength):
>>         self.satisfied = True
>>     else:
>>         self.satisfied = False
>>
>> The base class Constraint doesn’t have a "choose_method" class method,
>> but it’s called as self.choose_method(mark) on the final line of
>> Constraint shown above.
>>
>> My question is: what makes "choose_method" a method of the base
>> class,
>
> Nothing. choose_method isn't a method of the base class.
>
>> called as self.choose_method instead of
>> UrnaryConstraint.choose_method? Is it super(UrnaryConstraint,
>> self).__init__(strength) or just the fact that Constraint is its base
>> class?
>
> This works only if satisfy() is called on a subclass of Constraint which
> actually implements this method.
>
> If you do something like
>
> x = UrnaryConstraint()
> x.satisfy(whatever)
>
> Then x is an instance of class UrnaryConstraint and will have a
> choose_method() method which can be called.
>
>> Also, this program also has a class BinaryConstraint that is also a
>> subclass of Constraint and it also has a choose_method class method
>> that is similar but not identical:
> ...
>> When called from Constraint, it uses the one at UrnaryConstraint. How
>> does it know which one to use?
>
> By inspecting self. If you call x.satisfy() on an object of class
> UrnaryConstraint, then self.choose_method will be the choose_method from
> UrnaryConstraint. If you call it on an object of class BinaryConstraint,
> then self.choose_method will be the choose_method from BinaryConstraint.
>
> hp
>
> PS: Pretty sure there's one "r" too many in UrnaryConstraint.
>
> --
> Peter J. Holzer | Story must make more sense than reality.
> h...@hjp.at     | -- Charles Stross, "Creative writing challenge!"
> http://www.hjp.at/
> --
https://mail.python.org/mailman/listinfo/python-list
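Peter's point -- that the method is found by inspecting self, not the base class -- can be sketched with two minimal stand-in subclasses (not the benchmark's real code):

```python
class Constraint:
    def satisfy(self, mark):
        # no choose_method defined here; Python finds it on type(self)
        return self.choose_method(mark)

class UrnaryConstraint(Constraint):
    def choose_method(self, mark):
        return "urnary"

class BinaryConstraint(Constraint):
    def choose_method(self, mark):
        return "binary"

assert UrnaryConstraint().satisfy(1) == "urnary"
assert BinaryConstraint().satisfy(1) == "binary"
```

Calling satisfy() on a bare Constraint instance raises AttributeError, which is exactly why it "works only if satisfy() is called on a subclass ... which actually implements this method."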
Re: How does a method of a subclass become a method of the base class?
Thanks to Richard Damon and Peter Holzer for your replies. I'm working through the call chain to understand better so I can post a followup question if needed. Thanks again.

Jen

Mar 26, 2023, 19:21 by rich...@damon-family.org:

> On 3/26/23 1:43 PM, Jen Kris via Python-list wrote:
>
>> The base class:
>>
>> class Constraint(object):
>>
>>     def __init__(self, strength):
>>         super(Constraint, self).__init__()
>>         self.strength = strength
>>
>>     def satisfy(self, mark):
>>         global planner
>>         self.choose_method(mark)
>>
>> The subclass:
>>
>> class UrnaryConstraint(Constraint):
>>
>>     def __init__(self, v, strength):
>>         super(UrnaryConstraint, self).__init__(strength)
>>         self.my_output = v
>>         self.satisfied = False
>>         self.add_constraint()
>>
>>     def choose_method(self, mark):
>>         if self.my_output.mark != mark and \
>>             Strength.stronger(self.strength, self.my_output.walk_strength):
>>             self.satisfied = True
>>         else:
>>             self.satisfied = False
>>
>> The base class Constraint doesn’t have a "choose_method" class method, but
>> it’s called as self.choose_method(mark) on the final line of Constraint
>> shown above.
>>
>> My question is: what makes "choose_method" a method of the base class,
>> called as self.choose_method instead of UrnaryConstraint.choose_method? Is
>> it super(UrnaryConstraint, self).__init__(strength) or just the fact that
>> Constraint is its base class?
>>
>> Also, this program also has a class BinaryConstraint that is also a subclass
>> of Constraint and it also has a choose_method class method that is similar
>> but not identical:
>>
>> def choose_method(self, mark):
>>     if self.v1.mark == mark:
>>         if self.v2.mark != mark and Strength.stronger(self.strength,
>>             self.v2.walk_strength):
>>             self.direction = Direction.FORWARD
>>         else:
>>             self.direction = Direction.BACKWARD
>>
>> When called from Constraint, it uses the one at UrnaryConstraint. How does
>> it know which one to use?
>>
>> Thanks,
>>
>> Jen
>
> Perhaps the key point to remember is that when looking up the methods on an
> object, those methods are part of the object as a whole, not particularly
> "attached" to a given class. When creating the subclass-typed object, first
> the most base class part is built, and all the methods of that class are put
> into the object, then the next level, and so on, and if a duplicate method is
> found, it just overwrites the connection. Then when the object is used, we
> see if there is a method by that name to use, so methods in the base can find
> methods in subclasses to use.
>
> Perhaps a more modern approach would be to use the concept of an "abstract
> base", which allows the base to indicate that a derived class needs to define
> certain abstract methods. (If you don't need that sort of support, not defining a
> method might just mean the subclass doesn't support some optional behavior
> defined by the base.)
>
> --
> Richard Damon
>
> --
> https://mail.python.org/mailman/listinfo/python-list
> --
https://mail.python.org/mailman/listinfo/python-list
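Richard's "abstract base" suggestion looks roughly like this with the standard abc module (a sketch, not the benchmark's actual code):

```python
from abc import ABC, abstractmethod

class Constraint(ABC):
    @abstractmethod
    def choose_method(self, mark):
        """Each concrete subclass must define this."""

    def satisfy(self, mark):
        return self.choose_method(mark)

class UrnaryConstraint(Constraint):
    def choose_method(self, mark):
        return "satisfied"

# The base class itself cannot be instantiated:
try:
    Constraint()
except TypeError:
    print("Constraint is abstract and cannot be instantiated")

assert UrnaryConstraint().satisfy(0) == "satisfied"
```

The advantage over the plain version is that forgetting to define choose_method in a subclass fails loudly at instantiation time instead of with an AttributeError deep inside satisfy().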
How does a method of a subclass become a method of the base class?
The base class:

class Constraint(object):

    def __init__(self, strength):
        super(Constraint, self).__init__()
        self.strength = strength

    def satisfy(self, mark):
        global planner
        self.choose_method(mark)

The subclass:

class UrnaryConstraint(Constraint):

    def __init__(self, v, strength):
        super(UrnaryConstraint, self).__init__(strength)
        self.my_output = v
        self.satisfied = False
        self.add_constraint()

    def choose_method(self, mark):
        if self.my_output.mark != mark and \
            Strength.stronger(self.strength, self.my_output.walk_strength):
            self.satisfied = True
        else:
            self.satisfied = False

The base class Constraint doesn’t have a "choose_method" class method, but it’s called as self.choose_method(mark) on the final line of Constraint shown above.

My question is: what makes "choose_method" a method of the base class, called as self.choose_method instead of UrnaryConstraint.choose_method? Is it super(UrnaryConstraint, self).__init__(strength) or just the fact that Constraint is its base class?

Also, this program has a class BinaryConstraint that is also a subclass of Constraint, and it also has a choose_method class method that is similar but not identical:

def choose_method(self, mark):
    if self.v1.mark == mark:
        if self.v2.mark != mark and Strength.stronger(self.strength,
            self.v2.walk_strength):
            self.direction = Direction.FORWARD
        else:
            self.direction = Direction.BACKWARD

When called from Constraint, it uses the one at UrnaryConstraint. How does it know which one to use?

Thanks,

Jen
--
https://mail.python.org/mailman/listinfo/python-list
Re: How to escape strings for re.finditer?
I wrote my previous message before reading this. Thank you for the test you ran -- it answers the question of performance. You show that re.finditer is 30x faster, so that certainly recommends it over a simple loop, which introduces looping overhead.

Feb 28, 2023, 05:44 by li...@tompassin.net:

> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>
>> Op 28/02/2023 om 3:44 schreef Thomas Passin:
>>
>>> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>>> And, just for fun, since there is nothing wrong with your code, this minor change is terser:
>>>
>>> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
>>> ...     print(match.start(), match.end())
>>> ...
>>> 4 18
>>> 26 40
>>>
>>> Just for more fun :) -
>>>
>>> Without knowing how general your expressions will be, I think the following
>>> version is very readable, certainly more readable than regexes:
>>>
>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> KEY = 'abc_degree + 1'
>>>
>>> for i in range(len(example)):
>>>     if example[i:].startswith(KEY):
>>>         print(i, i + len(KEY))
>>> # prints:
>>> # 4 18
>>> # 26 40
>>
>> I think it's often a good idea to use a standard library function instead of
>> rolling your own. The issue becomes less clear-cut when the standard library
>> doesn't do exactly what you need (as here, where re.finditer() uses regular
>> expressions while the use case only uses simple search strings). Ideally
>> there would be a str.finditer() method we could use, but in the absence of
>> that I think we still need to consider using the almost-but-not-quite
>> fitting re.finditer().
>>
>> Two reasons:
>>
>> (1) I think it's clearer: the name tells us what it does (though of course
>> we could solve this in a hand-written version by wrapping it in a suitably
>> named function).
>>
>> (2) Searching for a string in another string, in a performant way, is not as
>> simple as it first appears. Your version works correctly, but slowly.
>> In some situations it doesn't matter, but in other cases it will. For better
>> performance, string searching algorithms jump ahead either when they found a
>> match or when they know for sure there isn't a match for some time (see e.g.
>> the Boyer–Moore string-search algorithm). You could write such a more
>> efficient algorithm, but then it becomes more complex and more error-prone.
>> Using a well-tested existing function becomes quite attractive.
>
> Sure, it all depends on what the real task will be. That's why I wrote
> "Without knowing how general your expressions will be". For the example
> string, it's unlikely that speed will be a factor, but who knows what target
> strings and keys will turn up in the future?
>
>> To illustrate the difference in performance, I did a simple test (using the
>> paragraph above as test text):
>>
>> import re
>> import timeit
>>
>> def using_re_finditer(key, text):
>>     matches = []
>>     for match in re.finditer(re.escape(key), text):
>>         matches.append((match.start(), match.end()))
>>     return matches
>>
>> def using_simple_loop(key, text):
>>     matches = []
>>     for i in range(len(text)):
>>         if text[i:].startswith(key):
>>             matches.append((i, i + len(key)))
>>     return matches
>>
>> CORPUS = """Searching for a string in another string, in a performant way, is
>> not as simple as it first appears. Your version works correctly, but slowly.
>> In some situations it doesn't matter, but in other cases it will. For better
>> performance, string searching algorithms jump ahead either when they found a
>> match or when they know for sure there isn't a match for some time (see e.g.
>> the Boyer–Moore string-search algorithm). You could write such a more
>> efficient algorithm, but then it becomes more complex and more error-prone.
>> Using a well-tested existing function becomes quite attractive."""
>> KEY = 'in'
>> print('using_simple_loop:', timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)', globals=globals(), number=1000))
>> print('using_re_finditer:', timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)', globals=globals(), number=1000))
>>
>> This does 5 runs of 1000 repetitions each, and reports the time in seconds
>> for each of those runs. Result on my machine:
>>
>> using_simple_loop: [0.1395295020792, 0.1306313000456, 0.1280345001249, 0.1318618002423, 0.1308461032626]
>> using_re_finditer: [0.00386140005233, 0.00406190124297, 0.00347899970256, 0.00341310216218, 0.003732001273]
>>
>> We find that in this test re.finditer() is more than 30 times faster
>> (despite the overhead of regular expressions).
>>
>> While speed isn't everything in programming, w
Re: How to escape strings for re.finditer?
Using str.startswith is a cool idea in this case. But is it better than regex for performance or reliability? Regex syntax is not a model of simplicity, but in my simple case it's not too difficult.

Feb 27, 2023, 18:52 by li...@tompassin.net:

> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>
>> And, just for fun, since there is nothing wrong with your code, this minor
>> change is terser:
>>
>> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
>> ...     print(match.start(), match.end())
>> ...
>> 4 18
>> 26 40
>
> Just for more fun :) -
>
> Without knowing how general your expressions will be, I think the following
> version is very readable, certainly more readable than regexes:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> KEY = 'abc_degree + 1'
>
> for i in range(len(example)):
>     if example[i:].startswith(KEY):
>         print(i, i + len(KEY))
> # prints:
> # 4 18
> # 26 40
>
> If you may have variable numbers of spaces around the symbols, OTOH, the
> whole situation changes and then regexes would almost certainly be the best
> approach. But the regular expression strings would become harder to read.
> --
> https://mail.python.org/mailman/listinfo/python-list
> --
https://mail.python.org/mailman/listinfo/python-list
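Thomas's startswith loop can be wrapped into a small reusable helper, as Roel suggests elsewhere in the thread (the name find_all is my own):

```python
def find_all(key, text):
    """Yield (start, end) for every occurrence of key in text."""
    for i in range(len(text)):
        if text[i:].startswith(key):
            yield i, i + len(key)

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
print(list(find_all('abc_degree + 1', example)))
# [(4, 18), (26, 40)]
```

These are the same 4 18 / 26 40 offsets the re.finditer version reports, with no escaping needed.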
RE: How to escape strings for re.finditer?
The code I sent is correct, and it runs here. Maybe you received it with a carriage return removed, but on my copy after posting, it is correct:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question: several people have made suggestions other than regex (not your terser example with regex shown below). Is there a reason why regex is not preferred to, for example, a list comp? Performance? Reliability?

Feb 27, 2023, 18:16 by avi.e.gr...@gmail.com:

> Jen,
>
> Can you see what SOME OF US see as ASCII text? We can help you better if we
> get code that can be copied and run as-is.
>
> What you sent is not terse. It is wrong. It will not run on any python
> interpreter because you somehow lost a carriage return and indent.
>
> This is what you sent:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in
> re.finditer(find_string, example):
>     print(match.start(), match.end())
>
> This is the code indented properly:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1')
> for match in re.finditer(find_string, example):
>     print(match.start(), match.end())
>
> Of course I am sure you wrote and ran code more like the latter version, but
> something happened somewhere in your copy/paste process.
>
> And, just for fun, since there is nothing wrong with your code, this minor
> change is terser:
>
> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
> ...     print(match.start(), match.end())
> ...
> 4 18
> 26 40
>
> But note once you use regular expressions, and not in your case, you might
> match multiple things that are far from the same, such as matching two
> repeated words of any kind in any case including "and and" and "so so", or
> finding words that have multiple doubled letters as in the stereotypical
> bookkeeper. In those cases, you may want even more than offsets but also show
> the exact text that matched, or even show some characters before and/or after
> for context.
>
> -----Original Message-----
> From: Python-list On
> Behalf Of Jen Kris via Python-list
> Sent: Monday, February 27, 2023 8:36 PM
> To: Cameron Simpson
> Cc: Python List
> Subject: Re: How to escape strings for re.finditer?
>
> I haven't tested it either but it looks like it would work. But for this
> case I prefer the relative simplicity of:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1')
> for match in re.finditer(find_string, example):
>     print(match.start(), match.end())
>
> 4 18
> 26 40
>
> I don't insist on terseness for its own sake, but it's cleaner this way.
>
> Jen
>
> Feb 27, 2023, 16:55 by c...@cskk.id.au:
>
>> On 28Feb2023 01:13, Jen Kris wrote:
>>
>>> I went to the re module because the specified string may appear more than
>>> once in the string (in the code I'm writing).
>>
>> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>>
>> pos = 0
>> while True:
>>     found = s.find(substring, pos)
>>     if found < 0:
>>         break
>>     start = found
>>     end = found + len(substring)
>>     ... do whatever with start and end ...
>>     pos = end
>>
>> Many people go straight to the `re` module whenever they're looking for
>> strings. It is often cryptic, error-prone overkill. Just something to keep in
>> mind.
>> >> Cheers, >> Cameron Simpson >> -- >> https://mail.python.org/mailman/listinfo/python-list >> > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: How to escape strings for re.finditer?
I haven't tested it either but it looks like it would work. But for this case I prefer the relative simplicity of:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

4 18
26 40

I don't insist on terseness for its own sake, but it's cleaner this way.

Jen

Feb 27, 2023, 16:55 by c...@cskk.id.au:

> On 28Feb2023 01:13, Jen Kris wrote:
>
>> I went to the re module because the specified string may appear more than
>> once in the string (in the code I'm writing).
>
> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>
> pos = 0
> while True:
>     found = s.find(substring, pos)
>     if found < 0:
>         break
>     start = found
>     end = found + len(substring)
>     ... do whatever with start and end ...
>     pos = end
>
> Many people go straight to the `re` module whenever they're looking for
> strings. It is often cryptic, error-prone overkill. Just something to keep in
> mind.
>
> Cheers,
> Cameron Simpson
> --
> https://mail.python.org/mailman/listinfo/python-list
> --
https://mail.python.org/mailman/listinfo/python-list
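A tested version of Cameron's str.find sketch, written as a generator so it can be dropped in where re.finditer was used (the function name is my own):

```python
def str_finditer(substring, s):
    """Like re.finditer for a plain substring: yield (start, end) spans."""
    pos = 0
    while True:
        found = s.find(substring, pos)
        if found < 0:
            break
        yield found, found + len(substring)
        pos = found + len(substring)

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
print(list(str_finditer('abc_degree + 1', example)))
# [(4, 18), (26, 40)]
```

Unlike the startswith scan, str.find jumps directly from one hit to the next, so it does less work on long texts.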
Re: How to escape strings for re.finditer?
string.count() only tells me there are N instances of the string; it does not say where they begin and end, as does re.finditer.

Feb 27, 2023, 16:20 by bobmellow...@gmail.com:

> Would string.count() work for you then?
>
> On Mon, Feb 27, 2023 at 5:16 PM Jen Kris via Python-list
> <python-list@python.org> wrote:
>
>> I went to the re module because the specified string may appear more than
>> once in the string (in the code I'm writing). For example:
>>
>> a = "X - abc_degree + 1 + qq + abc_degree + 1"
>> b = "abc_degree + 1"
>> q = a.find(b)
>>
>> print(q)
>> 4
>>
>> So it correctly finds the start of the first instance, but not the second
>> one. The re code finds both instances. If I knew that the substring
>> occurred only once then str.find would be best.
>>
>> I changed my re code after MRAB's comment, it now works.
>>
>> Thanks much.
>>
>> Jen
>>
>> Feb 27, 2023, 15:56 by c...@cskk.id.au:
>>
>>> On 28Feb2023 00:11, Jen Kris <jenk...@tutanota.com> wrote:
>>>
>>>> When matching a string against a longer string, where both strings have
>>>> spaces in them, we need to escape the spaces.
>>>>
>>>> This works (no spaces):
>>>>
>>>> import re
>>>> example = 'abcdefabcdefabcdefg'
>>>> find_string = "abc"
>>>> for match in re.finditer(find_string, example):
>>>>     print(match.start(), match.end())
>>>>
>>>> That gives me the start and end character positions, which is what I want.
>>>>
>>>> However, this does not work:
>>>>
>>>> import re
>>>> example = re.escape('X - cty_degrees + 1 + qq')
>>>> find_string = re.escape('cty_degrees + 1')
>>>> for match in re.finditer(find_string, example):
>>>>     print(match.start(), match.end())
>>>>
>>>> I’ve tried several other attempts based on my research, but still no match.
>>>
>>> You need to print those strings out.
>>> You're escaping the _example_ string, which would make it:
>>>
>>> X - cty_degrees \+ 1 \+ qq
>>>
>>> because `+` is a special character in regexps and so `re.escape` escapes
>>> it. But you don't want to mangle the string you're searching! After all,
>>> the text above does not contain the string `cty_degrees + 1`.
>>>
>>> My secondary question is: if you're escaping the thing you're searching
>>> _for_, then you're effectively searching for a _fixed_ string, not a
>>> pattern/regexp. So why on earth are you using regexps to do your searching?
>>>
>>> The `str` type has a `find(substring)` function. Just use that! It'll be
>>> faster and the code simpler!
>>>
>>> Cheers,
>>> Cameron Simpson <c...@cskk.id.au>
>>> --
>>> https://mail.python.org/mailman/listinfo/python-list
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>
> --
> Listen to my CD at http://www.mellowood.ca/music/cedars
> Bob van der Poel ** Wynndel, British Columbia, CANADA **
> EMAIL: b...@mellowood.ca
> WWW: http://www.mellowood.ca
> --
https://mail.python.org/mailman/listinfo/python-list
Re: How to escape strings for re.finditer?
I went to the re module because the specified string may appear more than once in the string (in the code I'm writing). For example:

a = "X - abc_degree + 1 + qq + abc_degree + 1"
b = "abc_degree + 1"
q = a.find(b)

print(q)
4

So it correctly finds the start of the first instance, but not the second one. The re code finds both instances. If I knew that the substring occurred only once then str.find would be best.

I changed my re code after MRAB's comment, it now works.

Thanks much.

Jen

Feb 27, 2023, 15:56 by c...@cskk.id.au:

> On 28Feb2023 00:11, Jen Kris wrote:
>
>> When matching a string against a longer string, where both strings have
>> spaces in them, we need to escape the spaces.
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I want.
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my research, but still no match.
>
> You need to print those strings out. You're escaping the _example_ string,
> which would make it:
>
> X - cty_degrees \+ 1 \+ qq
>
> because `+` is a special character in regexps and so `re.escape` escapes it.
> But you don't want to mangle the string you're searching! After all, the text
> above does not contain the string `cty_degrees + 1`.
>
> My secondary question is: if you're escaping the thing you're searching
> _for_, then you're effectively searching for a _fixed_ string, not a
> pattern/regexp. So why on earth are you using regexps to do your searching?
>
> The `str` type has a `find(substring)` function. Just use that! It'll be
> faster and the code simpler!
> > Cheers, > Cameron Simpson > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
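For the archive, here is a minimal runnable version of the fix MRAB and Cameron point to: escape only the pattern, never the text being searched. The strings come from Jen's example above, and re.finditer then reports both occurrences that str.find alone would not:

```python
import re

# The text being searched is left untouched; only the fixed
# substring we search FOR is escaped (because it contains '+').
example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')

spans = [(m.start(), m.end()) for m in re.finditer(find_string, example)]
print(spans)  # [(4, 18), (26, 40)] -- both instances, not just the first

# str.find only reports the first occurrence:
print(example.find('abc_degree + 1'))  # 4
```

As Cameron notes, for a single fixed substring str.find is simpler; re.finditer earns its keep when every occurrence is needed.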
Re: How to escape strings for re.finditer?
Yes, that's it. I don't know how long it would have taken to find that detail through the voluminous re documentation. Thanks very much.

Feb 27, 2023, 15:47 by pyt...@mrabarnett.plus.com:

> On 2023-02-27 23:11, Jen Kris via Python-list wrote:
>
>> When matching a string against a longer string, where both strings have
>> spaces in them, we need to escape the spaces.
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I want.
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my research, but still no match.
>>
>> I don’t have much experience with regex, so I hoped a regex expert might help.
>>
> You need to escape only the pattern, not the string you're searching.
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list
How to escape strings for re.finditer?
When matching a string against a longer string, where both strings have spaces in them, we need to escape the spaces.

This works (no spaces):

import re
example = 'abcdefabcdefabcdefg'
find_string = "abc"
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

That gives me the start and end character positions, which is what I want.

However, this does not work:

import re
example = re.escape('X - cty_degrees + 1 + qq')
find_string = re.escape('cty_degrees + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

I’ve tried several other attempts based on my research, but still no match. I don’t have much experience with regex, so I hoped a regex expert might help.

Thanks,

Jen
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: To clarify how Python handles two equal objects
Yes, in fact I asked my original question – "I discovered something about Python array handling that I would like to clarify" – because I saw that Python did it that way.

Jan 14, 2023, 15:51 by ros...@gmail.com:

> On Sun, 15 Jan 2023 at 10:32, Jen Kris via Python-list wrote:
>
>> The situation I described in my original post is limited to a case such as x = y ... the assignment can be done simply by "x" taking the pointer to "y" rather than moving all the data from "y" into the memory buffer for "x"
>>
> It's not simply whether it *can* be done. It, in fact, *MUST* be done that way. The ONLY meaning of "x = y" is that you now have a name "x" which refers to whatever object is currently found under the name "y". This is not an optimization, it is a fundamental of Python's object model. This is true regardless of what kind of object this is; every object must behave this way.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list
RE: To clarify how Python handles two equal objects
Avi,

Your comments go farther afield than my original question, but you made some interesting additional points. For example, I sometimes work with the C API, and sys.getrefcount may be helpful in deciding when to INCREF and DECREF. But that’s another issue.

The situation I described in my original post is limited to a case such as x = y, where both "x" and "y" are arrays – whether they are lists in Python, or from the array module – and the question in a compiled C extension is whether the assignment can be done simply by "x" taking the pointer to "y" rather than moving all the data from "y" into the memory buffer for "x", which, for a wide array, would be much more time consuming than just moving a pointer. The other advantage to doing it that way is that if, as in my case, we perform a math operation on any element in "x", then Python expects the same change to be reflected in "y". If I didn’t use the same pointers then I would have to perform that operation twice – once for "x" and once for "y" – in addition to the expense of moving all the data.

The answers I got from this post confirmed that I can use the pointer if "y" is not re-defined to something else during the lifespan of "x". If it is, then "x" has to be restored to its original pointer. I did it that way, and helpfully the compiler did not overrule me.

Jan 13, 2023, 18:41 by avi.e.gr...@gmail.com:

> Jen,
>
> This may not be on target but I was wondering about your needs in this category. Are all your data in a form where all in a cluster are the same object type, such as floating point?
>
> Python has features designed to allow you to get multiple views on such objects, such as memoryview, which can be used to see an array as a matrix of n rows by m columns, or m x n, or any other combo. And of course the fuller numpy package has quite a few features.
> > However, as you note, there is no guarantee that any reference to the data > may not shift away from it unless you build fairly convoluted logic or data > structures such as having an object that arranges to do something when you > try to remove it, such as tinkering with the __del__ method as well as > whatever method is used to try to set it to a new value. I guess that might > make sense for something like asynchronous programming including when setting > locks so multiple things cannot overlap when being done. > > Anyway, some of the packages like numpy are optimized in many ways but if you > want to pass a subset of sorts to make processing faster, I suspect you could > do things like pass a memoryview but it might not be faster than what you > build albeit probably more reliable and portable. > > I note another odd idea that others may have mentioned, with caution. > > If you load the sys module, you can CAREFULLY use code like this. > > a="Something Unique" > sys.getrefcount(a) > 2 > > Note if a==1 you will get some huge number of references and this is > meaningless. The 2 above is because asking about how many references also > references it. > > So save what ever number you have and see what happens when you make a second > reference or a third, and what happens if you delete or alter a reference: > > a="Something Unique" > sys.getrefcount(a) > 2 > b = a > sys.getrefcount(a) > 3 > sys.getrefcount(b) > 3 > c = b > d = a > sys.getrefcount(a) > 5 > sys.getrefcount(d) > 5 > del(a) > sys.getrefcount(d) > 4 > b = "something else" > sys.getrefcount(d) > 3 > > So, in theory, you could carefully write your code to CHECK the reference > count had not changed but there remain edge cases where a removed reference > is replaced by yet another new reference and you would have no idea. 
> > Avi > > > -Original Message- > From: Python-list On > Behalf Of Jen Kris via Python-list > Sent: Wednesday, January 11, 2023 1:29 PM > To: Roel Schroeven > Cc: python-list@python.org > Subject: Re: To clarify how Python handles two equal objects > > Thanks for your comments. After all, I asked for clarity so it’s not > pedantic to be precise, and you’re helping to clarify. > > Going back to my original post, > > mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] > arr1 = mx1[2] > > Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed > because while they are different names, they are the assigned same memory > location (pointer). Similarly, if I write "mx1[2][1] += 5" then again both > names will be updated. > > That’s what I meant by "an operation on one is an operat
Re: To clarify how Python handles two equal objects
Bob,

Your examples show a and b separately defined. My example is where the definition is a = 1; b = a. But I'm only interested in arrays. I would not rely on this for integers, and there's not likely to be any real cost savings there.

Jan 13, 2023, 08:45 by b...@mellowood.ca:

> It seems to me that the entire concept of relying on Python's idea of where an object is stored is just plain dangerous. A most simple example might be:
>
> >>> a = 1
> >>> b = 1
> >>> a is b
> True
> >>> a = 1234
> >>> b = 1234
> >>> a is b
> False
>
> Not sure what happens if you manipulate the data referenced by 'b' in the first example thinking you are changing something referred to by 'a' ... but you might be smart to NOT think that you know.
>
> On Fri, Jan 13, 2023 at 9:00 AM Jen Kris via Python-list <python-list@python.org> wrote:
>
>> Avi,
>>
>> Thanks for your comments. You make a good point.
>>
>> Going back to my original question, and using your slice() example:
>>
>> middle_by_two = slice(5, 10, 2)
>> nums = [n for n in range(12)]
>> q = nums[middle_by_two]
>> x = id(q)
>> b = q
>> y = id(b)
>>
>> If I assign "b" to "q", then x and y match – they point to the same memory until "b" OR "q" are reassigned to something else. If "q" changes during the lifetime of "b" then it’s not safe to use the pointer to "q" for "b", as in:
>>
>> nums = [n for n in range(2, 14)]
>> q = nums[middle_by_two]
>> x = id(q)
>> y = id(b)
>>
>> Now "x" and "y" are different, as we would expect. So when writing a spot speed up in a compiled language, you can see in the Python source if either is reassigned, so you’ll know how to handle it. The motivation behind my question was that in a compiled extension it’s faster to borrow a pointer than to move an entire array if it’s possible, but special care must be taken.
>> >> Jen >> >> >> >> Jan 12, 2023, 20:51 by >> avi.e.gr...@gmail.com>> : >> >> > Jen, >> > >> > It is dangerous territory you are treading as there are times all or >> parts of objects are copied, or changed in place or the method you use to >> make a view is not doing quite what you want. >> > >> > As an example, you can create a named slice such as: >> > >> > middle_by_two = slice(5, 10, 2) >> > >> > The above is not in any sense pointing at anything yet. But given a long >> enough list or other such objects, it will take items (starting at index 0) >> starting with item that are at indices 5 then 7 then 9 as in this: >> > >> > nums = [n for n in range(12)] >> > nums[middle_by_two] >> > >> > [5, 7, 9] >> > >> > The same slice will work on anything else: >> > >> > list('abcdefghijklmnopqrstuvwxyz')[middle_by_two] >> > ['f', 'h', 'j'] >> > >> > So although you may think the slice is bound to something, it is not. It >> is an object that only later is briefly connected to whatever you want to >> apply it to. >> > >> > If I later change nums, above, like this: >> > >> > nums = [-3, -2, -1] + nums >> > nums >> > [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] >> > nums[middle_by_two] >> > [2, 4, 6] >> > >> > In the example, you can forget about whether we are talking about >> pointers directly or indirectly or variable names and so on. Your "view" >> remains valid ONLY as long as you do not change either the slice or the >> underlying object you are applying to -- at least not the items you want to >> extract. >> > >> > Since my example inserted three new items at the start using negative >> numbers for illustration, you would need to adjust the slice by making a new >> slice designed to fit your new data. 
The example below created an adjusted >> slice that adds 3 to the start and stop settings of the previous slice while >> copying the step value and then it works on the elongated object: >> > >> > middle_by_two_adj = slice(middle_by_two.start + 3, middle_by_two.stop + >> 3, middle_by_two.step) >> > nu
RE: To clarify how Python handles two equal objects
Avi,

Thanks for your comments. You make a good point.

Going back to my original question, and using your slice() example:

middle_by_two = slice(5, 10, 2)
nums = [n for n in range(12)]
q = nums[middle_by_two]
x = id(q)
b = q
y = id(b)

If I set "b = q", then x and y match – they point to the same memory until "b" OR "q" is reassigned to something else. If "q" changes during the lifetime of "b" then it’s not safe to use the pointer to "q" for "b", as in:

nums = [n for n in range(2, 14)]
q = nums[middle_by_two]
x = id(q)
y = id(b)

Now "x" and "y" are different, as we would expect. So when writing a spot speedup in a compiled language, you can see in the Python source if either is reassigned, so you’ll know how to handle it. The motivation behind my question was that in a compiled extension it’s faster to borrow a pointer than to move an entire array if it’s possible, but special care must be taken.

Jen

Jan 12, 2023, 20:51 by avi.e.gr...@gmail.com:

> Jen,
>
> It is dangerous territory you are treading, as there are times all or parts of objects are copied, or changed in place, or the method you use to make a view is not doing quite what you want.
>
> As an example, you can create a named slice such as:
>
> middle_by_two = slice(5, 10, 2)
>
> The above is not in any sense pointing at anything yet. But given a long enough list or other such object, it will take the items (indexing from 0) at indices 5, then 7, then 9, as in this:
>
> nums = [n for n in range(12)]
> nums[middle_by_two]
> [5, 7, 9]
>
> The same slice will work on anything else:
>
> list('abcdefghijklmnopqrstuvwxyz')[middle_by_two]
> ['f', 'h', 'j']
>
> So although you may think the slice is bound to something, it is not. It is an object that only later is briefly connected to whatever you want to apply it to.
> > If I later change nums, above, like this: > > nums = [-3, -2, -1] + nums > nums > [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] > nums[middle_by_two] > [2, 4, 6] > > In the example, you can forget about whether we are talking about pointers > directly or indirectly or variable names and so on. Your "view" remains valid > ONLY as long as you do not change either the slice or the underlying object > you are applying to -- at least not the items you want to extract. > > Since my example inserted three new items at the start using negative numbers > for illustration, you would need to adjust the slice by making a new slice > designed to fit your new data. The example below created an adjusted slice > that adds 3 to the start and stop settings of the previous slice while > copying the step value and then it works on the elongated object: > > middle_by_two_adj = slice(middle_by_two.start + 3, middle_by_two.stop + 3, > middle_by_two.step) > nums[middle_by_two_adj] > [5, 7, 9] > > A suggestion is that whenever you are not absolutely sure that the contents > of some data structure might change without your participation, then don't > depend on various kinds of aliases to keep the contents synchronized. Make a > copy, perhaps a deep copy and make sure the only thing ever changing it is > your code and later, if needed, copy the result back to any other data > structure. Of course, if anything else is accessing the result in the > original in between, it won't work. > > Just FYI, a similar analysis applies to uses of the numpy and pandas and > other modules if you get some kind of object holding indices to a series such > as integers or Booleans and then later try using it after the number of items > or rows or columns have changed. Your indices no longer match. 
> > Avi > > -Original Message- > From: Python-list On > Behalf Of Jen Kris via Python-list > Sent: Wednesday, January 11, 2023 1:29 PM > To: Roel Schroeven > Cc: python-list@python.org > Subject: Re: To clarify how Python handles two equal objects > > Thanks for your comments. After all, I asked for clarity so it’s not > pedantic to be precise, and you’re helping to clarify. > > Going back to my original post, > > mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] > arr1 = mx1[2] > > Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed > because while they are different names, they are the assigned same memory > location (pointer). Similarly, if I write "mx1[2][1] += 5" then again both > names will be updated. > > That’s what I meant by "an operation on one is an operation on the other." >
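The exchange above can be condensed into a small runnable sketch. It uses `is` rather than comparing id() values, which is the more idiomatic way to test object identity in Python:

```python
middle_by_two = slice(5, 10, 2)

nums = list(range(12))
q = nums[middle_by_two]   # slicing a list builds a NEW list: [5, 7, 9]
b = q                     # plain assignment: b and q now name the same object
print(b is q)             # True -- one list, two names

# Rebinding nums and q leaves b pointing at the old list:
nums = list(range(2, 14))
q = nums[middle_by_two]   # a fresh list: [7, 9, 11]
print(b is q)             # False -- b still holds [5, 7, 9]
print(b, q)               # [5, 7, 9] [7, 9, 11]
```

This is the case Jen describes: borrowing the pointer is safe only until either name is rebound, at which point the two names refer to different objects.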
Re: To clarify how Python handles two equal objects
Thanks for your comments. After all, I asked for clarity, so it’s not pedantic to be precise, and you’re helping to clarify.

Going back to my original post:

mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
arr1 = mx1[2]

Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed, because while they are different names, they are assigned the same memory location (pointer). Similarly, if I write "mx1[2][1] += 5" then again both names will be updated. That’s what I meant by "an operation on one is an operation on the other." To be more precise, an operation on one name will be reflected in the other name. The difference is in the names, not the pointers. Each name has the same pointer in my example, but operations can be done in Python using either name.

Jan 11, 2023, 09:13 by r...@roelschroeven.net:

> Op 11/01/2023 om 16:33 schreef Jen Kris via Python-list:
>
>> Yes, I did understand that. In your example, "a" and "b" are the same pointer, so an operation on one is an operation on the other (because they’re the same memory block).
>>
> Sorry if you feel I'm being overly pedantic, but your explanation "an operation on one is an operation on the other (because they’re the same memory block)" still feels a bit misguided. "One" and "other" still make it sound like there are two objects, and "an operation on one" and "an operation on the other" make it sound like there are two operations.
> Sometimes it doesn't matter if we're a bit sloppy for the sake of simplicity or convenience; sometimes we really need to be precise. I think this is a case where we need to be precise.
>
> So, to be precise: there is only one object, with possibly multiple names for it. We can change the object using one of the names. That is one and only one operation on one and only one object. Since the different names refer to the same object, that change will of course be visible through all of them.
> Note that 'name' in that sentence doesn't just refer to variables (mx1, arr1, > ...) but also things like indexed lists (mx1[0], mx1[[0][0], ...), loop > variables, function arguments. > > The correct mental model is important here, and I do think you're on track or > very close to it, but the way you phrase things does give me that nagging > feeling that you still might be just a bit off. > > -- > "Peace cannot be kept by force. It can only be achieved through > understanding." > -- Albert Einstein > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: To clarify how Python handles two equal objects
Yes, I did understand that. In your example, "a" and "b" are the same pointer, so an operation on one is an operation on the other (because they’re the same memory block). My issue in Python came up because Python can dynamically rebind one or the other to a different object (memory block), so I have to be aware of that when handling this kind of situation.

Jan 10, 2023, 17:31 by greg.ew...@canterbury.ac.nz:

> On 11/01/23 11:21 am, Jen Kris wrote:
>
>> where one object derives from another object (a = b[0], for example), any operation that would alter one will alter the other.
>>
> I think you're still confused. In C terms, after a = b[0], a and b[0] are pointers to the same block of memory. If you change that block of memory, then of course you will see the change through either pointer.
>
> Here's a rough C translation of some of your Python code:
>
> /* mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] */
> int **mx1 = (int **)malloc(3 * sizeof(int *));
> mx1[0] = (int *)malloc(3 * sizeof(int));
> mx1[0][0] = 1;
> mx1[0][1] = 2;
> mx1[0][2] = 3;
> mx1[1] = (int *)malloc(3 * sizeof(int));
> mx1[1][0] = 4;
> mx1[1][1] = 5;
> mx1[1][2] = 6;
> mx1[2] = (int *)malloc(3 * sizeof(int));
> mx1[2][0] = 7;
> mx1[2][1] = 8;
> mx1[2][2] = 9;
>
> /* arr1 = mx1[2] */
> int *arr1 = mx1[2];
>
> /* arr1 = [ 10, 11, 12 ] */
> arr1 = (int *)malloc(3 * sizeof(int));
> arr1[0] = 10;
> arr1[1] = 11;
> arr1[2] = 12;
>
> Does that help your understanding?
>
> --
> Greg
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: To clarify how Python handles two equal objects
There are cases where NumPy would be the best choice, but that wasn’t the case here with what the loop was doing. To sum up what I learned from this post, where one object derives from another object (a = b[0], for example), any operation that would alter one will alter the other. When either is assigned to something else, then they no longer point to the same memory location and they’re once again independent. I hope the word "derives" sidesteps the semantic issue of whether they are "equal." Thanks to all who replied to this post. Jen Jan 10, 2023, 13:59 by li...@tompassin.net: > Just to add a possibly picky detail to what others have said, Python does not > have an "array" type. It has a "list" type, as well as some other, not > necessarily mutable, sequence types. > > If you want to speed up list and matrix operations, you might use NumPy. Its > arrays and matrices are heavily optimized for fast processing and provide > many useful operations on them. No use calling out to C code yourself when > NumPy has been refining that for many years. > > On 1/10/2023 4:10 PM, MRAB wrote: > >> On 2023-01-10 20:41, Jen Kris via Python-list wrote: >> >>> >>> Thanks for your comments. I'd like to make one small point. You say: >>> >>> "Assignment in Python is a matter of object references. It's not >>> "conform them as long as they remain equal". You'll have to think in >>> terms of object references the entire way." >>> >>> But where they have been set to the same object, an operation on one will >>> affect the other as long as they are equal (in Python). So I will have to >>> conform them in those cases because Python will reflect any math operation >>> in both the array and the matrix. >>> >> It's not a 2D matrix, it's a 1D list containing references to 1D lists, each >> of which contains references to Python ints. >> >> In CPython, references happen to be pointers, but that's just an >> implementation detail. 
>> >>> >>> >>> Jan 10, 2023, 12:28 by ros...@gmail.com: >>> >>>> On Wed, 11 Jan 2023 at 07:14, Jen Kris via Python-list >>>> wrote: >>>> >>>>> >>>>> I am writing a spot speedup in assembly language for a short but >>>>> computation-intensive Python loop, and I discovered something about >>>>> Python array handling that I would like to clarify. >>>>> >>>>> For a simplified example, I created a matrix mx1 and assigned the array >>>>> arr1 to the third row of the matrix: >>>>> >>>>> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] >>>>> arr1 = mx1[2] >>>>> >>>>> The pointers to these are now the same: >>>>> >>>>> ida = id(mx1[2]) - 140260325306880 >>>>> idb = id(arr1) - 140260325306880 >>>>> >>>>> That’s great because when I encounter this in assembly or C, I can just >>>>> borrow the pointer to row 3 for the array arr1, on the assumption that >>>>> they will continue to point to the same object. Then when I do any math >>>>> operations in arr1 it will be reflected in both arrays because they are >>>>> now pointing to the same array: >>>>> >>>> >>>> That's not an optimization; what you've done is set arr1 to be a >>>> reference to that object. >>>> >>>>> But on the next iteration we assign arr1 to something else: >>>>> >>>>> arr1 = [ 10, 11, 12 ] >>>>> idc = id(arr1) – 140260325308160 >>>>> idd = id(mx1[2]) – 140260325306880 >>>>> >>>>> Now arr1 is no longer equal to mx1[2], and any subsequent operations in >>>>> arr1 will not affect mx1. >>>>> >>>> >>>> Yep, you have just set arr1 to be a completely different object. >>>> >>>>> So where I’m rewriting some Python code in a low level language, I can’t >>>>> assume that the two objects are equal because that equality will not >>>>> remain if either is reassigned. So if I do some operation on one array I >>>>> have to conform the two arrays for as long as they remain equal, I can’t >>>>> just do it in one operation because I can’t rely on the objects remaining >>>>> equal. >>>>> >>>>> Is my understanding of this correct? 
Is there anything I’m missing? >>>>> >>>> >>>> Assignment in Python is a matter of object references. It's not >>>> "conform them as long as they remain equal". You'll have to think in >>>> terms of object references the entire way. >>>> >>>> ChrisA >>>> -- >>>> https://mail.python.org/mailman/listinfo/python-list >>>> > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: To clarify how Python handles two equal objects
Thanks for your comments. I'd like to make one small point. You say: "Assignment in Python is a matter of object references. It's not "conform them as long as they remain equal". You'll have to think in terms of object references the entire way." But where they have been set to the same object, an operation on one will affect the other as long as they are equal (in Python). So I will have to conform them in those cases because Python will reflect any math operation in both the array and the matrix. Jan 10, 2023, 12:28 by ros...@gmail.com: > On Wed, 11 Jan 2023 at 07:14, Jen Kris via Python-list > wrote: > >> >> I am writing a spot speedup in assembly language for a short but >> computation-intensive Python loop, and I discovered something about Python >> array handling that I would like to clarify. >> >> For a simplified example, I created a matrix mx1 and assigned the array arr1 >> to the third row of the matrix: >> >> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] >> arr1 = mx1[2] >> >> The pointers to these are now the same: >> >> ida = id(mx1[2]) - 140260325306880 >> idb = id(arr1) - 140260325306880 >> >> That’s great because when I encounter this in assembly or C, I can just >> borrow the pointer to row 3 for the array arr1, on the assumption that they >> will continue to point to the same object. Then when I do any math >> operations in arr1 it will be reflected in both arrays because they are now >> pointing to the same array: >> > > That's not an optimization; what you've done is set arr1 to be a > reference to that object. > >> But on the next iteration we assign arr1 to something else: >> >> arr1 = [ 10, 11, 12 ] >> idc = id(arr1) – 140260325308160 >> idd = id(mx1[2]) – 140260325306880 >> >> Now arr1 is no longer equal to mx1[2], and any subsequent operations in arr1 >> will not affect mx1. >> > > Yep, you have just set arr1 to be a completely different object. 
> >> So where I’m rewriting some Python code in a low level language, I can’t >> assume that the two objects are equal because that equality will not remain >> if either is reassigned. So if I do some operation on one array I have to >> conform the two arrays for as long as they remain equal, I can’t just do it >> in one operation because I can’t rely on the objects remaining equal. >> >> Is my understanding of this correct? Is there anything I’m missing? >> > > Assignment in Python is a matter of object references. It's not > "conform them as long as they remain equal". You'll have to think in > terms of object references the entire way. > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
To clarify how Python handles two equal objects
I am writing a spot speedup in assembly language for a short but computation-intensive Python loop, and I discovered something about Python array handling that I would like to clarify.

For a simplified example, I created a matrix mx1 and assigned the array arr1 to the third row of the matrix:

mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
arr1 = mx1[2]

The pointers to these are now the same:

ida = id(mx1[2]) - 140260325306880
idb = id(arr1) - 140260325306880

That’s great because when I encounter this in assembly or C, I can just borrow the pointer to row 3 for the array arr1, on the assumption that they will continue to point to the same object. Then any math operations I do in arr1 will be reflected in both arrays because they are now pointing to the same array:

arr1[0] += 2
print(mx1[2]) - [9, 8, 9]
print(arr1) - [9, 8, 9]

Now mx1 looks like this:

[ 1, 2, 3 ]
[ 4, 5, 6 ]
[ 9, 8, 9 ]

and it stays that way for the remaining iterations. But on the next iteration we assign arr1 to something else:

arr1 = [ 10, 11, 12 ]
idc = id(arr1) – 140260325308160
idd = id(mx1[2]) – 140260325306880

Now arr1 is no longer the same object as mx1[2], and any subsequent operations on arr1 will not affect mx1.

So where I’m rewriting some Python code in a low-level language, I can’t assume that the two objects are equal, because that equality will not remain if either is reassigned. So if I do some operation on one array I have to conform the two arrays for as long as they remain equal; I can’t just do it in one operation, because I can’t rely on the objects remaining equal.

Is my understanding of this correct? Is there anything I’m missing?

Thanks very much.

Jen
-- 
https://mail.python.org/mailman/listinfo/python-list
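The behavior asked about above can be checked directly, using `is` to test object identity instead of comparing id() numbers:

```python
mx1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr1 = mx1[2]            # arr1 is another name for the third row, not a copy
print(arr1 is mx1[2])    # True -- one list object, two names

arr1[0] += 2             # mutate the shared list through either name
print(mx1[2])            # [9, 8, 9] -- change visible through both names

arr1 = [10, 11, 12]      # rebinding arr1 creates a NEW object
print(arr1 is mx1[2])    # False -- the names have diverged
print(mx1[2])            # [9, 8, 9] -- the matrix row is unaffected from now on
```

As the replies explain, this is not an optimization that may or may not apply: assignment in Python always binds a name to an object, so sharing holds exactly until one of the names is rebound.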
Re: Debugging Python C extensions with GDB
Thanks for your reply. Victor's article didn't mention ctypes extensions, so I wanted to post a question before I build from source. Nov 14, 2022, 14:32 by ba...@barrys-emacs.org: > > >> On 14 Nov 2022, at 19:10, Jen Kris via Python-list >> wrote: >> >> In September 2021, Victor Stinner wrote “Debugging Python C extensions with >> GDB” >> (https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb#getting_started_with_python_3_9). >> >> >> My question is: with Python 3.9+, can I debug into a C extension written in >> pure C and called from ctypes -- that is not written using the C_API? >> > > Yes. > > Just put a breakpoint on the function in the c library that you want to debug. > You can set the breakpoint before a .so is loaded. > > Barry > >> >> Thanks. >> >> Jen >> >> >> >> -- >> https://mail.python.org/mailman/listinfo/python-list >> -- https://mail.python.org/mailman/listinfo/python-list
Debugging Python C extensions with GDB
In September 2021, Victor Stinner wrote “Debugging Python C extensions with GDB” (https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb#getting_started_with_python_3_9). My question is: with Python 3.9+, can I debug into a C extension written in pure C and called from ctypes -- that is not written using the C_API? Thanks. Jen -- https://mail.python.org/mailman/listinfo/python-list
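Not part of the original thread, but a minimal sketch of the kind of call being discussed: a pure-C function invoked through ctypes. Here the C math library stands in for a custom .so (an assumption for illustration); under gdb, `break sqrt` (or the name of your own exported symbol) would stop inside the C code when Python makes the call, exactly as Barry describes:

```python
import ctypes
import ctypes.util

# Load a shared C library -- libm stands in for your own extension .so.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes converts arguments/results correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

# This call crosses into plain C code; a gdb breakpoint on 'sqrt' fires here.
print(libm.sqrt(2.0))
```

Since the library is loaded into the Python process, gdb attached to (or running) the interpreter sees its symbols once the .so is loaded, and the breakpoint can be set even before that with a pending breakpoint.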
Re: PyObject_CallFunctionObjArgs segfaults
That's great. It clarifies things a lot for me, particularly re ref count for new references. I would have had trouble if I didn't decref it twice. Thanks very much once again. Sep 30, 2022, 12:18 by pyt...@mrabarnett.plus.com: > On 2022-09-30 17:02, Jen Kris wrote: > >> >> Thanks very much for your detailed reply. I have a few followup questions. >> >> You said, “Some functions return an object that has already been incref'ed >> ("new reference"). This occurs when it has either created a new object (the >> refcount will be 1) or has returned a pointer to an existing object (the >> refcount will be > 1 because it has been incref'ed). Other functions return >> an object that hasn't been incref'ed. This occurs when you're looking up >> something, for example, looking at a member of a list or the value of an >> attribute.” >> >> In the official docs some functions show “Return value: New reference” and >> others do not. Is there any reason why I should not just INCREF on every >> new object, regardless of whether it’s a new reference or not, and DECREF >> when I am finished with it? The answer at >> https://stackoverflow.com/questions/59870703/python-c-extension-need-to-py-incref-a-borrowed-reference-if-not-returning-it-to >> says “With out-of-order execution, the INCREF/DECREF are basically free >> operations, so performance is no reason to leave them out.” Doing so means >> I don’t have to check each object to see if it needs to be INCREF’d or not, >> and that is a big help. >> > It's OK to INCREF them, provided that you DECREF them when you no longer need > them, and remember that if it's a "new reference" you'd need to DECREF it > twice. > >> Also: >> >> What is a borrowed reference, and how does it effect reference counting? >> According to https://jayrambhia.com/blog/pythonc-api-reference-counting, >> “Use Py_INCREF on a borrowed PyObject pointer you already have. 
This >> increments the reference count on the object, and obligates you to dispose >> of it properly.” So I guess it’s yes, but I’m confused by “pointer you >> already have.” >> > > A borrowed reference is when it hasn't been INCREFed. > > You can think of INCREFing as a way of indicating ownership, which is often > shared ownership (refcount > 1). When you're borrowing a reference, you're > using it temporarily, but not claiming ownership. When the last owner > releases its ownership (DECREF reduces the refcount to 0), the object can be > garbage collected. > > When, say, you lookup an attribute, or get an object from a list with > PyList_GetItem, it won't have been INCREFed. You're using it temporarily, > just borrowing a reference. > >> >> What does it mean to steal a reference? If a function steals a reference >> does it have to decref it without incref (because it’s stolen)? >> > When function steals a reference, it's claiming ownership but not INCREFing > it. > >> >> Finally, you said: >> >> if (pMod_random == 0x0){ >> PyErr_Print(); >> Leaks here because of the refcount >> >> Assuming pMod_random is not null, why would this leak? >> > It's pName_random that's the leak. > > PyUnicode_FromString("random") will either create and return a new object for > the string "random" (refcount == 1) or return a reference to an existing > object (refcount > 1). You need to DECREF it before returning from the > function. > > Suppose it created a new object. You call the function, it creates an object, > you use it, then return from the function. The object still exists, but > there's no reference to it. Now call the function again. It creates another > object, you use it, then return from the function. You now have 2 objects > with no reference to them. > >> Thanks again for your input on this question. 
>> >> Jen >> >> >> >> Sep 29, 2022, 17:33 by pyt...@mrabarnett.plus.com: >> >> On 2022-09-30 01:02, MRAB wrote: >> >> On 2022-09-29 23:41, Jen Kris wrote: >> >> >> I just solved this C API problem, and I’m posting the >> answer to help anyone else who might need it. >> >> [snip] >> >> What I like to do is write comments that state which variables >> hold a reference, followed by '+' if it's a new reference >> (incref'ed) and '?' if it could be null. '+?' means that it's >> probably a new reference but could be null. Once I know that it's >> not null, I can remove the '?', and once I've decref'ed it (if >> required) and no longer need it, I remobe it from the comment. >> >> Clearing up references, as soon as they're not needed, helps to >> keep the number of current references more manageable. >> >> >> int64_t Get_LibModules(int64_t * return_array) { >> PyObject * pName_random = PyUnicode_FromString("random"); >> //> pName_random+? >> if (!pName_random) { >> PyErr_Print(); >> return 1; >> } >> >> //> pName_random+ >> PyObject * pMod_random = PyImport_Import(pName_random); >> //> pName_random+ pMod_random+? >> Py_DECREF(pName_random); >> //> pMod_random+? >> if (!pMod_r
Re: PyObject_CallFunctionObjArgs segfaults
Thanks very much for your detailed reply. I have a few followup questions. You said, “Some functions return an object that has already been incref'ed ("new reference"). This occurs when it has either created a new object (the refcount will be 1) or has returned a pointer to an existing object (the refcount will be > 1 because it has been incref'ed). Other functions return an object that hasn't been incref'ed. This occurs when you're looking up something, for example, looking at a member of a list or the value of an attribute.” In the official docs some functions show “Return value: New reference” and others do not. Is there any reason why I should not just INCREF on every new object, regardless of whether it’s a new reference or not, and DECREF when I am finished with it? The answer at https://stackoverflow.com/questions/59870703/python-c-extension-need-to-py-incref-a-borrowed-reference-if-not-returning-it-to says “With out-of-order execution, the INCREF/DECREF are basically free operations, so performance is no reason to leave them out.” Doing so means I don’t have to check each object to see if it needs to be INCREF’d or not, and that is a big help. Also: What is a borrowed reference, and how does it affect reference counting? According to https://jayrambhia.com/blog/pythonc-api-reference-counting, “Use Py_INCREF on a borrowed PyObject pointer you already have. This increments the reference count on the object, and obligates you to dispose of it properly.” So I guess it’s yes, but I’m confused by “pointer you already have.” What does it mean to steal a reference? If a function steals a reference does it have to decref it without incref (because it’s stolen)? Finally, you said: if (pMod_random == 0x0){ PyErr_Print(); Leaks here because of the refcount Assuming pMod_random is not null, why would this leak? Thanks again for your input on this question. 
Jen Sep 29, 2022, 17:33 by pyt...@mrabarnett.plus.com: > On 2022-09-30 01:02, MRAB wrote: > >> On 2022-09-29 23:41, Jen Kris wrote: >> >>> >>> I just solved this C API problem, and I’m posting the answer to help anyone >>> else who might need it. >>> > [snip] > > What I like to do is write comments that state which variables hold a > reference, followed by '+' if it's a new reference (incref'ed) and '?' if it > could be null. '+?' means that it's probably a new reference but could be > null. Once I know that it's not null, I can remove the '?', and once I've > decref'ed it (if required) and no longer need it, I remobe it from the > comment. > > Clearing up references, as soon as they're not needed, helps to keep the > number of current references more manageable. > > > int64_t Get_LibModules(int64_t * return_array) { > PyObject * pName_random = PyUnicode_FromString("random"); > //> pName_random+? > if (!pName_random) { > PyErr_Print(); > return 1; > } > > //> pName_random+ > PyObject * pMod_random = PyImport_Import(pName_random); > //> pName_random+ pMod_random+? > Py_DECREF(pName_random); > //> pMod_random+? > if (!pMod_random) { > PyErr_Print(); > return 1; > } > > //> pMod_random+ > PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed"); > //> pMod_random+ pAttr_seed? > if (!pAttr_seed) { > Py_DECREF(pMod_random); > PyErr_Print(); > return 1; > } > > //> pMod_random+ pAttr_seed > PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, > "randrange"); > //> pMod_random+ pAttr_seed pAttr_randrange? > Py_DECREF(pMod_random); > //> pAttr_seed pAttr_randrange? > if (!pAttr_randrange) { > PyErr_Print(); > return 1; > } > > //> pAttr_seed pAttr_randrange > return_array[0] = (int64_t)pAttr_seed; > return_array[1] = (int64_t)pAttr_randrange; > > return 0; > } > > int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) { > PyObject * value_ptr = PyLong_FromLong(value_1); > //> value_ptr+? 
> if (!value_ptr) { > PyErr_Print(); > return 1; > } > > //> value_ptr+ > PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, > NULL); > //> value_ptr+ p_seed_calc+? > Py_DECREF(value_ptr); > //> p_seed_calc+? > if (!p_seed_calc) { > PyErr_Print(); > return 1; > } > > //> p_seed_calc+ > Py_DECREF(p_seed_calc); > return 0; > } > > int64_t C_API_12(PyObject * pAttr_randrange, Py_ssize_t value_1) { > PyObject * value_ptr = PyLong_FromLong(value_1); > //> value_ptr+? > if (!value_ptr) { > PyErr_Print(); > return 1; > } > > //> value_ptr+ > PyObject * p_randrange_calc = PyObject_CallFunctionObjArgs(pAttr_randrange, > value_ptr, NULL); > //> value_ptr+ p_randrange_calc+? > Py_DECREF(value_ptr); > //> p_randrange_calc+? > if (!p_randrange_calc) { > PyErr_Print(); > return 1; > } > > //Prepare return values > //> p_randrange_calc+ > long return_val = PyLong_AsLong(p_randrange_calc); > Py_DECREF(p_randrange_calc); > > return return_val; > } > > -- > https://mail.python.org/mailman/listinfo/python-list > --
Re: PyObject_CallFunctionObjArgs segfaults
I just solved this C API problem, and I’m posting the answer to help anyone else who might need it. The errors were: (1) we must call Py_INCREF on each object when it’s created. (2) in C_API_2 (see below) we don’t cast value_1 as I did before with PyObject * value_ptr = (PyObject * )value_1. Instead we use PyObject * value_ptr = PyLong_FromLong(value_1); (3) The command string to PyObject_CallFunctionObjArgs must be null terminated. Here’s the revised code: First we load the modules, and increment the reference to each object: int64_t Get_LibModules(int64_t * return_array) { PyObject * pName_random = PyUnicode_FromString("random"); PyObject * pMod_random = PyImport_Import(pName_random); Py_INCREF(pName_random); Py_INCREF(pMod_random); if (pMod_random == 0x0){ PyErr_Print(); return 1;} PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed"); PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange"); Py_INCREF(pAttr_seed); Py_INCREF(pAttr_randrange); return_array[0] = (int64_t)pAttr_seed; return_array[1] = (int64_t)pAttr_randrange; return 0; } Next we call a program to initialize the random number generator with random.seed(), and increment the reference to its return value p_seed_calc: int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) { PyObject * value_ptr = PyLong_FromLong(value_1); PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, NULL); // _ if (p_seed_calc == 0x0){ PyErr_Print(); return 1;} Py_INCREF(p_seed_calc); return 0; } Now we call another program to get a random number: int64_t C_API_12(PyObject * pAttr_randrange, Py_ssize_t value_1) { PyObject * value_ptr = PyLong_FromLong(value_1); PyObject * p_randrange_calc = PyObject_CallFunctionObjArgs(pAttr_randrange, value_ptr, NULL); if (p_randrange_calc == 0x0){ PyErr_Print(); return 1;} //Prepare return values long return_val = PyLong_AsLong(p_randrange_calc); return return_val; } That returns 28, which is what I get from the Python command 
line. Thanks again to MRAB for helpful comments. Jen Sep 29, 2022, 15:31 by pyt...@mrabarnett.plus.com: > On 2022-09-29 21:47, Jen Kris wrote: > >> To update my previous email, I found the problem, but I have a new problem. >> >> Previously I cast PyObject * value_ptr = (PyObject * )value_1 but that's not >> correct. Instead I used PyObject * value_ptr = PyLong_FromLong(value_1) and >> that works. HOWEVER, while PyObject_CallFunctionObjArgs does work now, it >> returns -1, which is not the right answer for random.seed. I use "long >> return_val = PyLong_AsLong(p_seed_calc);" to convert it to a long. >> > random.seed returns None, so when you call PyObject_CallFunctionObjArgs it > returns a new reference to Py_None. > > If you then pass to PyLong_AsLong a reference to something that's not a > PyLong, it'll set an error and return -1. > >> So my question is why do I get -1 as return value? When I query p_seed calc >> : get: >> >> (gdb) p p_seed_calc >> $2 = (PyObject *) 0x769be120 <_Py_NoneStruct> >> > Exactly. It's Py_None, not a PyLong. > >> Thanks again. >> >> Jen >> >> >> >> >> Sep 29, 2022, 13:02 by python-list@python.org: >> >> Thanks very much to @MRAB for taking time to answer. I changed my >> code to conform to your answer (as best I understand your comments >> on references), but I still get the same error. My comments >> continue below the new code immediately below. 
>> >> int64_t Get_LibModules(int64_t * return_array) >> { >> PyObject * pName_random = PyUnicode_FromString("random"); >> PyObject * pMod_random = PyImport_Import(pName_random); >> >> Py_INCREF(pName_random); >> Py_INCREF(pMod_random); >> >> if (pMod_random == 0x0){ >> PyErr_Print(); >> return 1;} >> >> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed"); >> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, >> "randrange"); >> >> Py_INCREF(pAttr_seed); >> Py_INCREF(pAttr_randrange); >> >> return_array[0] = (int64_t)pAttr_seed; >> return_array[1] = (int64_t)pAttr_randrange; >> >> return 0; >> } >> >> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) >> { >> PyObject * value_ptr = (PyObject * )value_1; >> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, >> value_ptr, NULL); >> >> if (p_seed_calc == 0x0){ >> PyErr_Print(); >> return 1;} >> >> //Prepare return values >> long return_val = PyLong
Re: PyObject_CallFunctionObjArgs segfaults
To update my previous email, I found the problem, but I have a new problem. Previously I cast PyObject * value_ptr = (PyObject * )value_1 but that's not correct. Instead I used PyObject * value_ptr = PyLong_FromLong(value_1) and that works. HOWEVER, while PyObject_CallFunctionObjArgs does work now, it returns -1, which is not the right answer for random.seed. I use "long return_val = PyLong_AsLong(p_seed_calc);" to convert it to a long. So my question is why do I get -1 as the return value? When I query p_seed_calc I get: (gdb) p p_seed_calc $2 = (PyObject *) 0x769be120 <_Py_NoneStruct> Thanks again. Jen Sep 29, 2022, 13:02 by python-list@python.org: > Thanks very much to @MRAB for taking time to answer. I changed my code to > conform to your answer (as best I understand your comments on references), > but I still get the same error. My comments continue below the new code > immediately below. > > int64_t Get_LibModules(int64_t * return_array) > { > PyObject * pName_random = PyUnicode_FromString("random"); > PyObject * pMod_random = PyImport_Import(pName_random); > > Py_INCREF(pName_random); > Py_INCREF(pMod_random); > > if (pMod_random == 0x0){ > PyErr_Print(); > return 1;} > > PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed"); > PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange"); > > Py_INCREF(pAttr_seed); > Py_INCREF(pAttr_randrange); > > return_array[0] = (int64_t)pAttr_seed; > return_array[1] = (int64_t)pAttr_randrange; > > return 0; > } > > int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) > { > PyObject * value_ptr = (PyObject * )value_1; > PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, > NULL); > > if (p_seed_calc == 0x0){ > PyErr_Print(); > return 1;} > > //Prepare return values > long return_val = PyLong_AsLong(p_seed_calc); > > return return_val; > } > > So I incremented the reference to all objects in Get_LibModules, but I still > get the same segfault at 
PyObject_CallFunctionObjArgs. Unfortunately, > reference counting is not well documented so I’m not clear what’s wrong. > > > > > Sep 29, 2022, 10:06 by pyt...@mrabarnett.plus.com: > >> On 2022-09-29 16:54, Jen Kris via Python-list wrote: >> >>> Recently I completed a project where I used PyObject_CallFunctionObjArgs >>> extensively with the NLTK library from a program written in NASM, with no >>> problems. Now I am on a new project where I call the Python random >>> library. I use the same setup as before, but I am getting a segfault with >>> random.seed. >>> >>> At the start of the NASM program I call a C API program that gets PyObject >>> pointers to “seed” and “randrange” in the same way as I did before: >>> >>> int64_t Get_LibModules(int64_t * return_array) >>> { >>> PyObject * pName_random = PyUnicode_FromString("random"); >>> PyObject * pMod_random = PyImport_Import(pName_random); >>> >> Both PyUnicode_FromString and PyImport_Import return new references or null >> pointers. >> >>> if (pMod_random == 0x0){ >>> PyErr_Print(); >>> >> >> You're leaking a reference here (pName_random). >> >>> return 1;} >>> >>> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed"); >>> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, >>> "randrange"); >>> >>> return_array[0] = (int64_t)pAttr_seed; >>> return_array[1] = (int64_t)pAttr_randrange; >>> >> >> You're leaking 2 references here (pName_random and pMod_random). >> >>> return 0; >>> } >>> >>> Later in the same program I call a C API program to call random.seed: >>> >>> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) >>> { >>> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1); >>> >> >> It's expecting all of the arguments to be PyObject*, but value_1 is >> Py_ssize_t instead of PyObject* (a pointer to a _Python_ int). >> >> The argument list must end with a null pointer. >> >> It returns a new reference or a null pointer. 
>> >>> >>> if (p_seed_calc == 0x0){ >>> PyErr_Print(); >>> return 1;} >>> >>> //Prepare return values >>> long return_val = PyLong_AsLong(p_seed_calc); >>> >> You're leaking a reference here (p_seed_calc). >> &
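A quick Python-level check of the behavior discussed in this thread: random.seed returns None (which is why PyLong_AsLong on its result sets an error and yields -1 at the C level), and seeding makes randrange deterministic. The bound 100 below is just illustrative:

```python
import random

# seed() returns None, not a number.
assert random.seed(1234) is None

# Re-seeding with the same value replays the same sequence.
random.seed(1234)
a = random.randrange(100)
random.seed(1234)
b = random.randrange(100)
assert a == b
```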
Re: PyObject_CallFunctionObjArgs segfaults
Thanks very much to @MRAB for taking time to answer. I changed my code to conform to your answer (as best I understand your comments on references), but I still get the same error. My comments continue below the new code immediately below. int64_t Get_LibModules(int64_t * return_array) { PyObject * pName_random = PyUnicode_FromString("random"); PyObject * pMod_random = PyImport_Import(pName_random); Py_INCREF(pName_random); Py_INCREF(pMod_random); if (pMod_random == 0x0){ PyErr_Print(); return 1;} PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed"); PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange"); Py_INCREF(pAttr_seed); Py_INCREF(pAttr_randrange); return_array[0] = (int64_t)pAttr_seed; return_array[1] = (int64_t)pAttr_randrange; return 0; } int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) { PyObject * value_ptr = (PyObject * )value_1; PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, NULL); if (p_seed_calc == 0x0){ PyErr_Print(); return 1;} //Prepare return values long return_val = PyLong_AsLong(p_seed_calc); return return_val; } So I incremented the reference to all objects in Get_LibModules, but I still get the same segfault at PyObject_CallFunctionObjArgs. Unfortunately, reference counting is not well documented so I’m not clear what’s wrong. Sep 29, 2022, 10:06 by pyt...@mrabarnett.plus.com: > On 2022-09-29 16:54, Jen Kris via Python-list wrote: > >> Recently I completed a project where I used PyObject_CallFunctionObjArgs >> extensively with the NLTK library from a program written in NASM, with no >> problems. Now I am on a new project where I call the Python random library. >> I use the same setup as before, but I am getting a segfault with >> random.seed. 
>> >> At the start of the NASM program I call a C API program that gets PyObject >> pointers to “seed” and “randrange” in the same way as I did before: >> >> int64_t Get_LibModules(int64_t * return_array) >> { >> PyObject * pName_random = PyUnicode_FromString("random"); >> PyObject * pMod_random = PyImport_Import(pName_random); >> > Both PyUnicode_FromString and PyImport_Import return new references or null > pointers. > >> if (pMod_random == 0x0){ >> PyErr_Print(); >> > > You're leaking a reference here (pName_random). > >> return 1;} >> >> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed"); >> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, >> "randrange"); >> >> return_array[0] = (int64_t)pAttr_seed; >> return_array[1] = (int64_t)pAttr_randrange; >> > > You're leaking 2 references here (pName_random and pMod_random). > >> return 0; >> } >> >> Later in the same program I call a C API program to call random.seed: >> >> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) >> { >> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1); >> > > It's expecting all of the arguments to be PyObject*, but value_1 is > Py_ssize_t instead of PyObject* (a pointer to a _Python_ int). > > The argument list must end with a null pointer. > > It returns a new reference or a null pointer. > >> >> if (p_seed_calc == 0x0){ >> PyErr_Print(); >> return 1;} >> >> //Prepare return values >> long return_val = PyLong_AsLong(p_seed_calc); >> > You're leaking a reference here (p_seed_calc). > >> return return_val; >> } >> >> The first program correctly imports “random” and gets pointers to “seed” and >> “randrange.” I verified that the same pointer is correctly passed into >> C_API_2, and the seed value (1234) is passed as Py_ssize_t value_1. But I >> get this segfault: >> >> Program received signal SIGSEGV, Segmentation fault. 
>> 0x764858d5 in _Py_INCREF (op=0x4d2) at ../Include/object.h:459 >> 459 ../Include/object.h: No such file or directory. >> >> So I tried Py_INCREF in the first program: >> >> Py_INCREF(pMod_random); >> Py_INCREF(pAttr_seed); >> >> Then I moved Py_INCREF(pAttr_seed) to the second program. Same segfault. >> >> Finally, I initialized “random” and “seed” in the second program, where they >> are used. Same segfault. >> >> The segfault refers to Py_INCREF, so this seems to do with reference >> counting, but Py_INCREF didn’t solve it. >> >> I’m using Python 3.8 on Ubuntu. >> >> Thanks for any ideas on how to solve this. >> >> Jen >> > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
PyObject_CallFunctionObjArgs segfaults
Recently I completed a project where I used PyObject_CallFunctionObjArgs extensively with the NLTK library from a program written in NASM, with no problems. Now I am on a new project where I call the Python random library. I use the same setup as before, but I am getting a segfault with random.seed. At the start of the NASM program I call a C API program that gets PyObject pointers to “seed” and “randrange” in the same way as I did before: int64_t Get_LibModules(int64_t * return_array) { PyObject * pName_random = PyUnicode_FromString("random"); PyObject * pMod_random = PyImport_Import(pName_random); if (pMod_random == 0x0){ PyErr_Print(); return 1;} PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed"); PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange"); return_array[0] = (int64_t)pAttr_seed; return_array[1] = (int64_t)pAttr_randrange; return 0; } Later in the same program I call a C API program to call random.seed: int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) { PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1); if (p_seed_calc == 0x0){ PyErr_Print(); return 1;} //Prepare return values long return_val = PyLong_AsLong(p_seed_calc); return return_val; } The first program correctly imports “random” and gets pointers to “seed” and “randrange.” I verified that the same pointer is correctly passed into C_API_2, and the seed value (1234) is passed as Py_ssize_t value_1. But I get this segfault: Program received signal SIGSEGV, Segmentation fault. 0x764858d5 in _Py_INCREF (op=0x4d2) at ../Include/object.h:459 459 ../Include/object.h: No such file or directory. So I tried Py_INCREF in the first program: Py_INCREF(pMod_random); Py_INCREF(pAttr_seed); Then I moved Py_INCREF(pAttr_seed) to the second program. Same segfault. Finally, I initialized “random” and “seed” in the second program, where they are used. Same segfault. 
The segfault refers to Py_INCREF, so this seems to do with reference counting, but Py_INCREF didn’t solve it. I’m using Python 3.8 on Ubuntu. Thanks for any ideas on how to solve this. Jen -- https://mail.python.org/mailman/listinfo/python-list
Re: Problem slicing a list with the C API
Thanks for PySequence_InPlaceConcat, so when I need to extend I'll know what to use. But my previous email was based on incorrect information from several SO posts that claimed only the extend method will work to add tuples to a list. I found that's wrong -- even my own Python code uses the append method. But my PyList_Append is not doing the job so that's where I'm looking now. Thanks very much for your reply. Mar 12, 2022, 15:36 by ros...@gmail.com: > On Sun, 13 Mar 2022 at 10:30, Jen Kris wrote: > >> >> >> Chris, you were right to focus on the list pDictData itself. As I said, >> that is a list of 2-tuples, but I added each of the 2-tuples with >> PyList_Append, but you can only append a tuple to a list with the extend >> method. However, there is no append method in the C API as far as I can >> tell -- hence pDictData is empty. I tried with PyList_SetItem but that >> doesn't work. Do you know of way to "extend" a list in the C API. >> > > Hmm. Not entirely sure I understand the question. > > In Python, a list has an append method, which takes any object (which > may be a tuple) and adds that object to the end of the list: > x = ["spam", "ham"] x.append((1,2)) x > ['spam', 'ham', (1, 2)] > > A list also has an extend method, which takes any sequence (that also > includes tuples), and adds *the elements from it* to the end of the > list: > x = ["spam", "ham"] x.extend((1,2)) x > ['spam', 'ham', 1, 2] > > The append method corresponds to PyList_Append, as you mentioned. It > should be quite happy to append a tuple, and will add the tuple > itself, not the contents of it. So when you iterate over the list, > you'll get tuples. > > Extending a list can be done with the sequence API. In Python, you can > write extend() as +=, indicating that you're adding something onto the > end: > x = ["spam", "ham"] x += (1, 2) x > ['spam', 'ham', 1, 2] > > This corresponds to PySequence_InPlaceConcat, so if that's the > behaviour you want, that would be the easiest way to do it. 
> > Based on your other comments, I would suspect that appending the > tuples is probably what you want here? > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
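The append/extend/+= distinction Chris describes, condensed into a runnable snippet:

```python
x = ["spam", "ham"]
x.append((1, 2))   # PyList_Append: the tuple itself becomes one element
assert x == ["spam", "ham", (1, 2)]

y = ["spam", "ham"]
y.extend((1, 2))   # the tuple's *elements* are added individually
assert y == ["spam", "ham", 1, 2]

z = ["spam", "ham"]
z += (1, 2)        # in-place concat, i.e. PySequence_InPlaceConcat
assert z == ["spam", "ham", 1, 2]
```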
Re: Problem slicing a list with the C API
Chris, you were right to focus on the list pDictData itself. As I said, that is a list of 2-tuples, but I added each of the 2-tuples with PyList_Append, but you can only append a tuple to a list with the extend method. However, there is no append method in the C API as far as I can tell -- hence pDictData is empty. I tried with PyList_SetItem but that doesn't work. Do you know of a way to "extend" a list in the C API? Thanks very much. Jen Mar 12, 2022, 13:57 by ros...@gmail.com: > On Sun, 13 Mar 2022 at 08:54, Jen Kris wrote: > >> >> >> pDictData, despite the name, is a list of 2-tuples where each 2-tuple is a >> dictionary object and a string. >> > > Ah, gotcha. In that case, yeah, slicing it will involve referencing > the tuples all the way down the line (adding to their refcounts, so if > there's a borked one, kaboom). > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: Problem slicing a list with the C API
pDictData, despite the name, is a list of 2-tuples where each 2-tuple is a dictionary object and a string. Mar 12, 2022, 13:41 by ros...@gmail.com: > On Sun, 13 Mar 2022 at 08:25, Jen Kris via Python-list > wrote: > >> PyObject* slice = PySlice_New(PyLong_FromLong(0), half_slice, 0); >> PyObject* subdata_a = PyObject_GetItem(pDictddata, slice); >> >> On the final line (subdata_a) I get a segfault. I know that the second >> parameter of PyObject_GetItem is a “key” and I suspect that’s where the >> problem comes from, but I don’t understand what a key is in this context. >> > > The key is simply whatever would be in the square brackets in Python > code, so that part looks fine. > > But dictionaries aren't usually subscripted with slices, so I'm a bit > confused as to what's going on here. What exactly is > dictdata/pDictdata? > > Have you confirmed that pDictdata (a) isn't NULL, (b) is the object > you intend it to be, and (c) contains the objects you expect it to? > The segfault might not be from the slice object itself, it might be > from actually iterating over the thing being sliced and touching all > its elements. For instance, if dictdata is actually a list, that call > will be constructing a new list with references to the same elements, > so if one of them is broken (maybe NULL), it'll break badly. > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: Problem slicing a list with the C API
Thanks to you both. I am going to implement PySequence_GetSlice now. If I have trouble then, per comments from Chris Angelico, I will iterate through pDictData to verify it because I haven't done that. It is not null, however. Jen Mar 12, 2022, 13:40 by pyt...@mrabarnett.plus.com: > On 2022-03-12 21:24, Jen Kris via Python-list wrote: > >> I have a C API project where I have to slice a list into two parts. >> Unfortunately the documentation on the slice objects is not clear enough for >> me to understand how to do this, and I haven’t found enough useful info >> through research. The list contains tuple records where each tuple consists >> of a dictionary object and a string. >> >> The relevant part of the Python code is: >> >> half_slice = int(len(dictdata) * 0.5) >> subdata_a = dictdata[half_slice:] >> subdata_b = dictdata[:half_slice] >> >> This is what I’ve done so far with the C API: >> >> int64_t Calc_Slices(PyObject* pDictdata, int64_t records_count) >> { >> long round_half = records_count * 0.5; >> PyObject* half_slice = PyLong_FromLong(round_half); >> >> PyObject* slice = PySlice_New(PyLong_FromLong(0), half_slice, 0); >> PyObject* subdata_a = PyObject_GetItem(pDictdata, slice); >> >> return 0; >> } >> >> On the final line (subdata_a) I get a segfault. I know that the second >> parameter of PyObject_GetItem is a “key” and I suspect that’s where the >> problem comes from, but I don’t understand what a key is in this context. >> >> The code shown above omits error handling but none of the objects leading up >> to the final line is null, they all succeed. >> >> Thanks for any ideas. >> > Use PySequence_GetSlice to slice the list. > > Also, why use floats when you can use integers? > > long round_half = records_count / 2; > > (In Python that would be half_slice = len(dictdata) // 2.) > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Problem slicing a list with the C API
I have a C API project where I have to slice a list into two parts. Unfortunately the documentation on the slice objects is not clear enough for me to understand how to do this, and I haven’t found enough useful info through research. The list contains tuple records where each tuple consists of a dictionary object and a string. The relevant part of the Python code is: half_slice = int(len(dictdata) * 0.5) subdata_a = dictdata[half_slice:] subdata_b = dictdata[:half_slice] This is what I’ve done so far with the C API: int64_t Calc_Slices(PyObject* pDictdata, int64_t records_count) { long round_half = records_count * 0.5; PyObject* half_slice = PyLong_FromLong(round_half); PyObject* slice = PySlice_New(PyLong_FromLong(0), half_slice, 0); PyObject* subdata_a = PyObject_GetItem(pDictdata, slice); return 0; } On the final line (subdata_a) I get a segfault. I know that the second parameter of PyObject_GetItem is a “key” and I suspect that’s where the problem comes from, but I don’t understand what a key is in this context. The code shown above omits error handling but none of the objects leading up to the final line is null, they all succeed. Thanks for any ideas. Jen -- https://mail.python.org/mailman/listinfo/python-list
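For reference, the Python-level slicing the C function has to reproduce, with made-up sample data. At the C level, PySequence_GetSlice(list, 0, half) and PySequence_GetSlice(list, half, len) achieve the same result, as MRAB suggests elsewhere in the thread:

```python
# Illustrative stand-in for pDictdata: a list of (dict, str) 2-tuples.
dictdata = [({"a": 1}, "w"), ({"b": 2}, "x"), ({"c": 3}, "y"), ({"d": 4}, "z")]

half_slice = len(dictdata) // 2      # integer division; no float round-trip
subdata_a = dictdata[half_slice:]
subdata_b = dictdata[:half_slice]

# The two halves partition the original list.
assert subdata_b + subdata_a == dictdata
assert len(subdata_a) == 2 and len(subdata_b) == 2
```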
Re: C API PyObject_CallFunctionObjArgs returns incorrect result
Thanks to MRAB and Chris Angelico for your help. Here is how I implemented the string conversion, and it works correctly now for a library call that needs a list converted to a string (error handling not shown): PyObject* str_sentence = PyObject_Str(pSentence); PyObject* separator = PyUnicode_FromString(" "); PyObject* str_join = PyUnicode_Join(separator, pSentence); Py_DECREF(separator); PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, "word_tokenize"); PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_join, 0); That produces what I need (this is the REPR of pWTok): "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']" Thanks again to both of you. Jen Mar 7, 2022, 11:03 by pyt...@mrabarnett.plus.com: > On 2022-03-07 17:05, Jen Kris wrote: > >> Thank you MRAB for your reply. >> >> Regarding your first question, pSentence is a list. In the nltk library, >> nltk.word_tokenize takes a string, so we convert sentence to string before >> we call nltk.word_tokenize: >> >> >>> sentence = " ".join(sentence) >> >>> pt = nltk.word_tokenize(sentence) >> >>> print(sentence) >> [ Emma by Jane Austen 1816 ] >> >> But with the C API it looks like this: >> >> PyObject *pSentence = PySequence_GetItem(pSents, sent_count); >> PyObject* str_sentence = PyObject_Str(pSentence); // Convert to string >> >> ; See what str_sentence looks like: >> PyObject* repr_str = PyObject_Repr(str_sentence); >> PyObject* str_str = PyUnicode_AsEncodedString(repr_str, "utf-8", "~E~"); >> const char *bytes_str = PyBytes_AS_STRING(str_str); >> printf("REPR_String: %s\n", bytes_str); >> >> REPR_String: "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']" >> >> So the two string representations are not the same – or at least the >> PyUnicode_AsEncodedString is not the same, as each item is surrounded by >> single quotes. 
>> >> Assuming that the conversion to bytes object for the REPR is an accurate >> representation of str_sentence, it looks like I need to strip the quotes >> from str_sentence before “PyObject* pWTok = >> PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, 0).” >> >> So my questions now are (1) is there a C API function that will convert a >> list to a string exactly the same way as ‘’.join, and if not then (2) how >> can I strip characters from a string object in the C API? >> > Your Python code is joining the list with a space as the separator. > > The equivalent using the C API is: > > PyObject* separator; > PyObject* joined; > > separator = PyUnicode_FromString(" "); > joined = PyUnicode_Join(separator, pSentence); > Py_DECREF(sep); > >> >> Mar 6, 2022, 17:42 by pyt...@mrabarnett.plus.com: >> >> On 2022-03-07 00:32, Jen Kris via Python-list wrote: >> >> I am using the C API in Python 3.8 with the nltk library, and >> I have a problem with the return from a library call >> implemented with PyObject_CallFunctionObjArgs. >> >> This is the relevant Python code: >> >> import nltk >> from nltk.corpus import gutenberg >> fileids = gutenberg.fileids() >> sentences = gutenberg.sents(fileids[0]) >> sentence = sentences[0] >> sentence = " ".join(sentence) >> pt = nltk.word_tokenize(sentence) >> >> I run this at the Python command prompt to show how it works: >> >> sentence = " ".join(sentence) >> pt = nltk.word_tokenize(sentence) >> print(pt) >> >> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']'] >> >> type(pt) >> >> >> >> This is the relevant part of the C API code: >> >> PyObject* str_sentence = PyObject_Str(pSentence); >> // nltk.word_tokenize(sentence) >> PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, >> "word_tokenize"); >> PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, >> str_sentence, 0); >> >> (where pModule_mstr is the nltk library). 
>> >> That should produce a list with a length of 7 that looks like >> it does on the command line version shown above: >> >> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']'] >> >> But instead the C API produces a list with a length of 24, and >> the REPR looks like this: >> >> '[\'[\', "\'", \'[\', "\'", \',\', "\'Emma", "\'", \',\', >> "\'by", "\'", \',\', "\'Jane", "\'", \',\', "\'Austen", "\'", >> \',\', "\'1816", "\'", \',\', "\'", \']\', "\'", \']\']' >> >> I also tried this with PyObject_CallMethodObjArgs and >> PyObject_Call without success. >> >> Thanks for any help on this. >> >> What is pSentence? Is it what you think it is? >> To me it looks like it's either the list: >> >> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']'] >> >> or that list as a string: >> >> "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']" >> >> and that what you're tokenising. >> -- https://mail.python.org/mailman/listinfo/python-list >> > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: C API PyObject_CallFunctionObjArgs returns incorrect result
The PyObject str_sentence is a string representation of a list. I need to convert the list to a string like "".join because that's what the library call takes. Mar 7, 2022, 09:09 by ros...@gmail.com: > On Tue, 8 Mar 2022 at 04:06, Jen Kris via Python-list > wrote: > >> But with the C API it looks like this: >> >> PyObject *pSentence = PySequence_GetItem(pSents, sent_count); >> PyObject* str_sentence = PyObject_Str(pSentence); // Convert to string >> >> PyObject* repr_str = PyObject_Repr(str_sentence); >> > > You convert it to a string, then take the representation of that. Is > that what you intended? > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: C API PyObject_CallFunctionObjArgs returns incorrect result
Thank you MRAB for your reply. Regarding your first question, pSentence is a list. In the nltk library, nltk.word_tokenize takes a string, so we convert sentence to string before we call nltk.word_tokenize: >>> sentence = " ".join(sentence) >>> pt = nltk.word_tokenize(sentence) >>> print(sentence) [ Emma by Jane Austen 1816 ] But with the C API it looks like this: PyObject *pSentence = PySequence_GetItem(pSents, sent_count); PyObject* str_sentence = PyObject_Str(pSentence); // Convert to string ; See what str_sentence looks like: PyObject* repr_str = PyObject_Repr(str_sentence); PyObject* str_str = PyUnicode_AsEncodedString(repr_str, "utf-8", "~E~"); const char *bytes_str = PyBytes_AS_STRING(str_str); printf("REPR_String: %s\n", bytes_str); REPR_String: "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']" So the two string representations are not the same – or at least the PyUnicode_AsEncodedString is not the same, as each item is surrounded by single quotes. Assuming that the conversion to bytes object for the REPR is an accurate representation of str_sentence, it looks like I need to strip the quotes from str_sentence before “PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, 0).” So my questions now are (1) is there a C API function that will convert a list to a string exactly the same way as ‘’.join, and if not then (2) how can I strip characters from a string object in the C API? Thanks. Mar 6, 2022, 17:42 by pyt...@mrabarnett.plus.com: > On 2022-03-07 00:32, Jen Kris via Python-list wrote: > >> I am using the C API in Python 3.8 with the nltk library, and I have a >> problem with the return from a library call implemented with >> PyObject_CallFunctionObjArgs. 
>> >> This is the relevant Python code: >> >> import nltk >> from nltk.corpus import gutenberg >> fileids = gutenberg.fileids() >> sentences = gutenberg.sents(fileids[0]) >> sentence = sentences[0] >> sentence = " ".join(sentence) >> pt = nltk.word_tokenize(sentence) >> >> I run this at the Python command prompt to show how it works: >> >>>>> sentence = " ".join(sentence) >>>>> pt = nltk.word_tokenize(sentence) >>>>> print(pt) >>>>> >> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']'] >> >>>>> type(pt) >>>>> >> >> >> This is the relevant part of the C API code: >> >> PyObject* str_sentence = PyObject_Str(pSentence); >> // nltk.word_tokenize(sentence) >> PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, "word_tokenize"); >> PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, 0); >> >> (where pModule_mstr is the nltk library). >> >> That should produce a list with a length of 7 that looks like it does on the >> command line version shown above: >> >> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']'] >> >> But instead the C API produces a list with a length of 24, and the REPR >> looks like this: >> >> '[\'[\', "\'", \'[\', "\'", \',\', "\'Emma", "\'", \',\', "\'by", "\'", >> \',\', "\'Jane", "\'", \',\', "\'Austen", "\'", \',\', "\'1816", "\'", >> \',\', "\'", \']\', "\'", \']\']' >> >> I also tried this with PyObject_CallMethodObjArgs and PyObject_Call without >> success. >> >> Thanks for any help on this. >> > What is pSentence? Is it what you think it is? > To me it looks like it's either the list: > > ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']'] > > or that list as a string: > > "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']" > > and that what you're tokenising. > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
C API PyObject_CallFunctionObjArgs returns incorrect result
I am using the C API in Python 3.8 with the nltk library, and I have a problem with the return from a library call implemented with PyObject_CallFunctionObjArgs. This is the relevant Python code:

import nltk
from nltk.corpus import gutenberg
fileids = gutenberg.fileids()
sentences = gutenberg.sents(fileids[0])
sentence = sentences[0]
sentence = " ".join(sentence)
pt = nltk.word_tokenize(sentence)

I run this at the Python command prompt to show how it works:

>>> sentence = " ".join(sentence)
>>> pt = nltk.word_tokenize(sentence)
>>> print(pt)
['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
>>> type(pt)
<class 'list'>

This is the relevant part of the C API code:

PyObject* str_sentence = PyObject_Str(pSentence);
// nltk.word_tokenize(sentence)
PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, "word_tokenize");
PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, 0);

(where pModule_mstr is the nltk library). That should produce a list with a length of 7 that looks like it does on the command line version shown above:

['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']

But instead the C API produces a list with a length of 24, and the REPR looks like this:

'[\'[\', "\'", \'[\', "\'", \',\', "\'Emma", "\'", \',\', "\'by", "\'", \',\', "\'Jane", "\'", \',\', "\'Austen", "\'", \',\', "\'1816", "\'", \',\', "\'", \']\', "\'", \']\']'

I also tried this with PyObject_CallMethodObjArgs and PyObject_Call without success. Thanks for any help on this.

Jen

-- https://mail.python.org/mailman/listinfo/python-list
Re: C API - How to return a Dictionary as a Dictionary type
Yes, that works. This is my first day with C API dictionaries. Now that you've explained it, it makes perfect sense. Thanks much. Jen Feb 14, 2022, 17:24 by ros...@gmail.com: > On Tue, 15 Feb 2022 at 12:07, Jen Kris via Python-list > wrote: > >> >> I created a dictionary with the Python C API and assigned two keys and >> values: >> >> PyObject* this_dict = PyDict_New(); >> const char *key = "key1"; >> char *val = "data_01"; >> PyObject* val_p = PyUnicode_FromString(val); >> int r = PyDict_SetItemString(this_dict, key, val_p); >> >> // Add another k-v pair >> key = "key2"; >> val = "data_02"; >> val_p = PyUnicode_FromString(val); >> r = PyDict_SetItemString(this_dict, key, val_p); >> >> I need to retrieve the entire dictionary to be passed to a library function >> that expects a dictionary. I used PyDict_Items: >> >> PyObject* pdi = PyDict_Items(this_dict); >> PyObject* str_untagd = PyObject_Str(pdi); >> PyObject* repr_utd = PyObject_Repr(str_untagd); >> PyObject* str_utd = PyUnicode_AsEncodedString(repr_utd, "utf-8", "~E~"); >> const char *bytes_d = PyBytes_AS_STRING(str_utd); >> printf("REPR_UnTag: %s\n", bytes_d); >> >> but as the docs say (https://docs.python.org/3.8/c-api/dict.html), that >> returns a PyListObject, not a dictionary enclosed with curly braces: >> >> [('key1', 'data_01'), ('key2', 'data_02')]". >> >> My question is, how can I get the dictionary as a dictionary type, enclosed >> with curly braces. I found PyObject_GenericGetDict >> (https://docs.python.org/3.8/c-api/object.html) but I haven't found any >> documentation or explanation of how it works. >> >> Is PyObject_GenericGetDict what I need, or is there another way to do it? >> > > Not sure what you mean. The dict is already a dict. If you refer to > this_dict, it is a dict, right? > > If you need the string representation of that, you should be able to > call PyObject_Repr just as you are, but call it on the dict, not on > the dict items. 
> > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
C API - How to return a Dictionary as a Dictionary type
I created a dictionary with the Python C API and assigned two keys and values:

PyObject* this_dict = PyDict_New();
const char *key = "key1";
char *val = "data_01";
PyObject* val_p = PyUnicode_FromString(val);
int r = PyDict_SetItemString(this_dict, key, val_p);

// Add another k-v pair
key = "key2";
val = "data_02";
val_p = PyUnicode_FromString(val);
r = PyDict_SetItemString(this_dict, key, val_p);

I need to retrieve the entire dictionary to be passed to a library function that expects a dictionary. I used PyDict_Items:

PyObject* pdi = PyDict_Items(this_dict);
PyObject* str_untagd = PyObject_Str(pdi);
PyObject* repr_utd = PyObject_Repr(str_untagd);
PyObject* str_utd = PyUnicode_AsEncodedString(repr_utd, "utf-8", "~E~");
const char *bytes_d = PyBytes_AS_STRING(str_utd);
printf("REPR_UnTag: %s\n", bytes_d);

but as the docs say (https://docs.python.org/3.8/c-api/dict.html), that returns a PyListObject, not a dictionary enclosed with curly braces:

[('key1', 'data_01'), ('key2', 'data_02')]

My question is, how can I get the dictionary as a dictionary type, enclosed with curly braces. I found PyObject_GenericGetDict (https://docs.python.org/3.8/c-api/object.html) but I haven't found any documentation or explanation of how it works. Is PyObject_GenericGetDict what I need, or is there another way to do it?

Thanks,

Jen

-- https://mail.python.org/mailman/listinfo/python-list
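ChrisA's answer can be shown in pure Python: the dict itself already prints with curly braces, and only the items view is a list of pairs. Applying repr to the dict rather than to its items is the whole fix:

```python
this_dict = {"key1": "data_01", "key2": "data_02"}

items = list(this_dict.items())   # what PyDict_Items returns: a list of 2-tuples
print(items)                      # [('key1', 'data_01'), ('key2', 'data_02')]

# repr/str of the dict itself keeps the curly braces --
# PyObject_Repr(this_dict) in C-API terms.
print(repr(this_dict))            # {'key1': 'data_01', 'key2': 'data_02'}
```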
Re: C API PyObject_Call segfaults with string
Thank you for that suggestion. It allowed me to replace six lines of code with one. :) Feb 10, 2022, 12:43 by pyt...@mrabarnett.plus.com: > On 2022-02-10 20:00, Jen Kris via Python-list wrote: > >> With the help of PyErr_Print() I have it solved. Here is the final code >> (the part relevant to sents): >> >> Py_ssize_t listIndex = 0; >> pListItem = PyList_GetItem(pFileIds, listIndex); >> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict"); >> pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer >> >> // Then: sentences = gutenberg.sents(fileid) - this is a sequence item >> PyObject *c_args = Py_BuildValue("s", pListStr); >> PyObject *args_tuple = PyTuple_New(1); >> PyTuple_SetItem(args_tuple, 0, c_args); >> >> pSents = PyObject_CallObject(pSentMod, args_tuple); >> >> if ( pSents == 0x0){ >> PyErr_Print(); >> return return_value; } >> >> As you mentioned yesterday, CallObject needs a tuple, so that was the >> problem. Now it works. >> >> You also asked why I don't just use pListStrE. I tried that and got a long >> error message from PyErr_Print. I'm not far enough along in my C_API work >> to understand why, but it doesn't work. >> >> Thanks very much for your help on this. >> > You're encoding a Unicode string to a UTF-8 bytestring: > > pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict"); > > then pointing to the bytes of that UTF-8 bytestring: > > pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer > > then making a Unicode string from those UTF-8 bytes: > > PyObject *c_args = Py_BuildValue("s", pListStr); > > You might was well just use the original Unicode string! > > Try this instead: > > Py_ssize_t listIndex = 0; > pListItem = PyList_GetItem(pFileIds, listIndex); > //> pListItem? > > pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem, 0); > //> pSents+? 
> > if (pSents == 0x0){ > PyErr_Print(); > return return_value; > } > >> >> >> Feb 9, 2022, 17:40 by songofaca...@gmail.com: >> >>> On Thu, Feb 10, 2022 at 10:37 AM Jen Kris wrote: >>> >>>> >>>> I'm using Python 3.8 so I tried your second choice: >>>> >>>> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem); >>>> >>>> but pSents is 0x0. pSentMod and pListItem are valid pointers. >>>> >>> >>> It means exception happened. >>> If you are writing Python/C function, return NULL (e.g. `if (pSents == >>> NULL) return NULL`) >>> Then Python show the exception and traceback for you. >>> > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: C API PyObject_Call segfaults with string
Hi and thanks very much for your comments on reference counting. Since I'm new to the C_API that will help a lot. I know that reference counting is one of the difficult issues with the C API. I just posted a reply to Inada Naoki showing how I solved the problem I posted yesterday. Thanks much for your help. Jen Feb 9, 2022, 18:43 by pyt...@mrabarnett.plus.com: > On 2022-02-10 01:37, Jen Kris via Python-list wrote: > >> I'm using Python 3.8 so I tried your second choice: >> >> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem); >> >> but pSents is 0x0. pSentMod and pListItem are valid pointers. >> > 'PyObject_CallFunction' looks like a good one to use: > > """PyObject* PyObject_CallFunction(PyObject *callable, const char *format, > ...) > > Call a callable Python object callable, with a variable number of C > arguments. The C arguments are described using a Py_BuildValue() style format > string. The format can be NULL, indicating that no arguments are provided. > """ > > [snip] > > What I do is add comments to keep track of what objects I have references to > at each point and whether they are new references or could be NULL. > > For example: > > pName = PyUnicode_FromString("nltk.corpus"); > //> pName+? > > This means that 'pName' contains a reference, '+' means that it's a new > reference, and '?' means that it could be NULL (usually due to an exception, > but not always) so I need to check it. > > Continuing in this vein: > > pModule = PyImport_Import(pName); > //> pName+? pModule+? > > pSubMod = PyObject_GetAttrString(pModule, "gutenberg"); > //> pName+? pModule+? pSubMod+? > pFidMod = PyObject_GetAttrString(pSubMod, "fileids"); > //> pName+? pModule+? pSubMod+? pFidMod+? > pSentMod = PyObject_GetAttrString(pSubMod, "sents"); > //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? > > pFileIds = PyObject_CallObject(pFidMod, 0); > //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? PyObject_CallObject+? 
> pListItem = PyList_GetItem(pFileIds, listIndex); > //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? PyObject_CallObject+? > pListItem? > pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict"); > //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? PyObject_CallObject+? > pListItem? pListStrE+? > > As you can see, there's a lot of leaked references building up. > > Note how after: > > pListItem = PyList_GetItem(pFileIds, listIndex); > > the addition is: > > //> pListItem? > > This means that 'pListItem' contains a borrowed (not new) reference, but > could be NULL. > > I find it easiest to DECREF as soon as I no longer need the reference and > remove a name from the list as soon I no longer need it (and DECREFed where). > > For example: > > pName = PyUnicode_FromString("nltk.corpus"); > //> pName+? > if (!pName) > goto error; > //> pName+ > pModule = PyImport_Import(pName); > //> pName+ pModule+? > Py_DECREF(pName); > //> pModule+? > if (!pModule) > goto error; > //> pModule+ > > I find that doing this greatly reduces the chances of getting the reference > counting wrong, and I can remove the comments once I've finished the function > I'm writing. > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: C API PyObject_Call segfaults with string
With the help of PyErr_Print() I have it solved. Here is the final code (the part relevant to sents):

Py_ssize_t listIndex = 0;
pListItem = PyList_GetItem(pFileIds, listIndex);
pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer

// Then: sentences = gutenberg.sents(fileid) - this is a sequence item
PyObject *c_args = Py_BuildValue("s", pListStr);
PyObject *args_tuple = PyTuple_New(1);
PyTuple_SetItem(args_tuple, 0, c_args);

pSents = PyObject_CallObject(pSentMod, args_tuple);

if (pSents == 0x0){
    PyErr_Print();
    return return_value;
}

As you mentioned yesterday, CallObject needs a tuple, so that was the problem. Now it works. You also asked why I don't just use pListStrE. I tried that and got a long error message from PyErr_Print. I'm not far enough along in my C_API work to understand why, but it doesn't work. Thanks very much for your help on this.

Jen

Feb 9, 2022, 17:40 by songofaca...@gmail.com: > On Thu, Feb 10, 2022 at 10:37 AM Jen Kris wrote: > >> >> I'm using Python 3.8 so I tried your second choice: >> >> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem); >> >> but pSents is 0x0. pSentMod and pListItem are valid pointers. >> > > It means exception happened. > If you are writing Python/C function, return NULL (e.g. `if (pSents == > NULL) return NULL`) > Then Python show the exception and traceback for you. > > -- > Inada Naoki > -- https://mail.python.org/mailman/listinfo/python-list
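The tuple requirement has a direct pure-Python analogue: PyObject_CallObject(callable, args_tuple) behaves like callable(*args_tuple), so the argument container must be a tuple even for a single argument. A sketch (the sents function below is a hypothetical stand-in, not the real nltk call):

```python
def sents(fileid):
    # hypothetical stand-in for gutenberg.sents; NLTK itself is not needed here
    return [["Emma", "by", "Jane", "Austen"]]

# PyObject_CallObject(pSentMod, args_tuple) corresponds to sents(*args_tuple):
args_tuple = ("austen-emma.txt",)   # C: PyTuple_New(1) + PyTuple_SetItem
result = sents(*args_tuple)         # works: exactly one positional argument

# Passing a bare string where a tuple is required behaves like sents(*"abc"):
# each character becomes a separate positional argument, so the call fails.
failed = False
try:
    sents(*"abc")
except TypeError:
    failed = True
assert failed
```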
Re: C API PyObject_Call segfaults with string
I'll do that and post back tomorrow. The office is closing and I have to leave now (I'm in Seattle). Thanks again for your help. Feb 9, 2022, 17:40 by songofaca...@gmail.com: > On Thu, Feb 10, 2022 at 10:37 AM Jen Kris wrote: > >> >> I'm using Python 3.8 so I tried your second choice: >> >> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem); >> >> but pSents is 0x0. pSentMod and pListItem are valid pointers. >> > > It means exception happened. > If you are writing Python/C function, return NULL (e.g. `if (pSents == > NULL) return NULL`) > Then Python show the exception and traceback for you. > > -- > Inada Naoki > -- https://mail.python.org/mailman/listinfo/python-list
Re: C API PyObject_Call segfaults with string
I'm using Python 3.8 so I tried your second choice: pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem); but pSents is 0x0. pSentMod and pListItem are valid pointers. Feb 9, 2022, 17:23 by songofaca...@gmail.com: > // https://docs.python.org/3/c-api/call.html#c.PyObject_CallNoArgs > // This function is only for one arg. Python >= 3.9 is required. > pSents = PyObject_CallOneArg(pSentMod, pListItem); > > Or > > // https://docs.python.org/3/c-api/call.html#c.PyObject_CallFunctionObjArgs > // This function can call function with multiple arguments. Can be > used with Python <3.9 too. > pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem); > > On Thu, Feb 10, 2022 at 10:15 AM Jen Kris wrote: > >> >> Right you are. In that case should I use Py_BuildValue and convert to tuple >> (because it won't return a tuple for a one-arg), or should I just convert >> pListStr to tuple? Thanks for your help. >> >> >> Feb 9, 2022, 17:08 by songofaca...@gmail.com: >> >> On Thu, Feb 10, 2022 at 10:05 AM Jen Kris wrote: >> >> >> Thanks for your reply. >> >> I eliminated the DECREF and now it doesn't segfault but it returns 0x0. Same >> when I substitute pListStrE for pListStr. pListStr contains the string >> representation of the fileid, so it seemed like the one to use. According to >> http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, >> PyBuildValue "builds a tuple only if its format string contains two or more >> format units" and that doc contains examples. >> >> >> Yes, and PyObject_Call accept tuple, not str. >> >> >> https://docs.python.org/3/c-api/call.html#c.PyObject_Call >> >> >> Feb 9, 2022, 16:52 by songofaca...@gmail.com: >> >> On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list >> wrote: >> >> >> I have everything finished down to the last line (sentences = >> gutenberg.sents(fileid)) where I use PyObject_Call to call gutenberg.sents, >> but it segfaults. The fileid is a string -- the first fileid in this corpus >> is "austen-emma.txt." 
>> >> pName = PyUnicode_FromString("nltk.corpus"); >> pModule = PyImport_Import(pName); >> >> pSubMod = PyObject_GetAttrString(pModule, "gutenberg"); >> pFidMod = PyObject_GetAttrString(pSubMod, "fileids"); >> pSentMod = PyObject_GetAttrString(pSubMod, "sents"); >> >> pFileIds = PyObject_CallObject(pFidMod, 0); >> pListItem = PyList_GetItem(pFileIds, listIndex); >> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict"); >> pListStr = PyBytes_AS_STRING(pListStrE); >> Py_DECREF(pListStrE); >> >> >> HERE. >> PyBytes_AS_STRING() returns pointer in the pListStrE Object. >> So Py_DECREF(pListStrE) makes pListStr a dangling pointer. >> >> >> // sentences = gutenberg.sents(fileid) >> PyObject *c_args = Py_BuildValue("s", pListStr); >> >> >> Why do you encode&decode pListStrE? >> Why don't you use just pListStrE? >> >> PyObject *NullPtr = 0; >> pSents = PyObject_Call(pSentMod, c_args, NullPtr); >> >> >> c_args must tuple, but you passed a unicode object here. >> Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue >> >> The final line segfaults: >> Program received signal SIGSEGV, Segmentation fault. >> 0x76e4e8d5 in _PyEval_EvalCodeWithName () >> from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 >> >> My guess is the problem is in Py_BuildValue, which returns a pointer but it >> may not be constructed correctly. I also tried it with "O" and it doesn't >> segfault but it returns 0x0. >> >> I'm new to using the C API. Thanks for any help. >> >> Jen >> >> >> -- >> https://mail.python.org/mailman/listinfo/python-list >> >> >> Bests, >> >> -- >> Inada Naoki >> >> >> >> -- >> Inada Naoki >> > > > -- > Inada Naoki > -- https://mail.python.org/mailman/listinfo/python-list
Re: C API PyObject_Call segfaults with string
Right you are. In that case should I use Py_BuildValue and convert to tuple (because it won't return a tuple for a one-arg), or should I just convert pListStr to tuple? Thanks for your help. Feb 9, 2022, 17:08 by songofaca...@gmail.com: > On Thu, Feb 10, 2022 at 10:05 AM Jen Kris wrote: > >> >> Thanks for your reply. >> >> I eliminated the DECREF and now it doesn't segfault but it returns 0x0. >> Same when I substitute pListStrE for pListStr. pListStr contains the string >> representation of the fileid, so it seemed like the one to use. According >> to http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, >> PyBuildValue "builds a tuple only if its format string contains two or more >> format units" and that doc contains examples. >> > > Yes, and PyObject_Call accept tuple, not str. > > > https://docs.python.org/3/c-api/call.html#c.PyObject_Call > >> >> Feb 9, 2022, 16:52 by songofaca...@gmail.com: >> >> On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list >> wrote: >> >> >> I have everything finished down to the last line (sentences = >> gutenberg.sents(fileid)) where I use PyObject_Call to call gutenberg.sents, >> but it segfaults. The fileid is a string -- the first fileid in this corpus >> is "austen-emma.txt." >> >> pName = PyUnicode_FromString("nltk.corpus"); >> pModule = PyImport_Import(pName); >> >> pSubMod = PyObject_GetAttrString(pModule, "gutenberg"); >> pFidMod = PyObject_GetAttrString(pSubMod, "fileids"); >> pSentMod = PyObject_GetAttrString(pSubMod, "sents"); >> >> pFileIds = PyObject_CallObject(pFidMod, 0); >> pListItem = PyList_GetItem(pFileIds, listIndex); >> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict"); >> pListStr = PyBytes_AS_STRING(pListStrE); >> Py_DECREF(pListStrE); >> >> >> HERE. >> PyBytes_AS_STRING() returns pointer in the pListStrE Object. >> So Py_DECREF(pListStrE) makes pListStr a dangling pointer. 
>> >> >> // sentences = gutenberg.sents(fileid) >> PyObject *c_args = Py_BuildValue("s", pListStr); >> >> >> Why do you encode&decode pListStrE? >> Why don't you use just pListStrE? >> >> PyObject *NullPtr = 0; >> pSents = PyObject_Call(pSentMod, c_args, NullPtr); >> >> >> c_args must tuple, but you passed a unicode object here. >> Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue >> >> The final line segfaults: >> Program received signal SIGSEGV, Segmentation fault. >> 0x76e4e8d5 in _PyEval_EvalCodeWithName () >> from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 >> >> My guess is the problem is in Py_BuildValue, which returns a pointer but it >> may not be constructed correctly. I also tried it with "O" and it doesn't >> segfault but it returns 0x0. >> >> I'm new to using the C API. Thanks for any help. >> >> Jen >> >> >> -- >> https://mail.python.org/mailman/listinfo/python-list >> >> >> Bests, >> >> -- >> Inada Naoki >> > > > -- > Inada Naoki > -- https://mail.python.org/mailman/listinfo/python-list
Re: C API PyObject_Call segfaults with string
Thanks for your reply. I eliminated the DECREF and now it doesn't segfault but it returns 0x0. Same when I substitute pListStrE for pListStr. pListStr contains the string representation of the fileid, so it seemed like the one to use. According to http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, PyBuildValue "builds a tuple only if its format string contains two or more format units" and that doc contains examples. Feb 9, 2022, 16:52 by songofaca...@gmail.com: > On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list > wrote: > >> >> I have everything finished down to the last line (sentences = >> gutenberg.sents(fileid)) where I use PyObject_Call to call gutenberg.sents, >> but it segfaults. The fileid is a string -- the first fileid in this corpus >> is "austen-emma.txt." >> >> pName = PyUnicode_FromString("nltk.corpus"); >> pModule = PyImport_Import(pName); >> >> pSubMod = PyObject_GetAttrString(pModule, "gutenberg"); >> pFidMod = PyObject_GetAttrString(pSubMod, "fileids"); >> pSentMod = PyObject_GetAttrString(pSubMod, "sents"); >> >> pFileIds = PyObject_CallObject(pFidMod, 0); >> pListItem = PyList_GetItem(pFileIds, listIndex); >> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict"); >> pListStr = PyBytes_AS_STRING(pListStrE); >> Py_DECREF(pListStrE); >> > > HERE. > PyBytes_AS_STRING() returns pointer in the pListStrE Object. > So Py_DECREF(pListStrE) makes pListStr a dangling pointer. > >> >> // sentences = gutenberg.sents(fileid) >> PyObject *c_args = Py_BuildValue("s", pListStr); >> > > Why do you encode&decode pListStrE? > Why don't you use just pListStrE? > >> PyObject *NullPtr = 0; >> pSents = PyObject_Call(pSentMod, c_args, NullPtr); >> > > c_args must tuple, but you passed a unicode object here. > Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue > > >> The final line segfaults: >> Program received signal SIGSEGV, Segmentation fault. 
>> 0x76e4e8d5 in _PyEval_EvalCodeWithName () >> from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 >> >> My guess is the problem is in Py_BuildValue, which returns a pointer but it >> may not be constructed correctly. I also tried it with "O" and it doesn't >> segfault but it returns 0x0. >> >> I'm new to using the C API. Thanks for any help. >> >> Jen >> >> >> -- >> https://mail.python.org/mailman/listinfo/python-list >> > > Bests, > > -- > Inada Naoki > -- https://mail.python.org/mailman/listinfo/python-list
C API PyObject_Call segfaults with string
This is a follow-on to a question I asked yesterday, which was answered by MRAB. I'm using the Python C API to load the Gutenberg corpus from the nltk library and iterate through the sentences. The Python code I am trying to replicate is:

from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    # etc

I have everything finished down to the last line (sentences = gutenberg.sents(fileid)) where I use PyObject_Call to call gutenberg.sents, but it segfaults. The fileid is a string -- the first fileid in this corpus is "austen-emma.txt."

pName = PyUnicode_FromString("nltk.corpus");
pModule = PyImport_Import(pName);

pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
pSentMod = PyObject_GetAttrString(pSubMod, "sents");

pFileIds = PyObject_CallObject(pFidMod, 0);
pListItem = PyList_GetItem(pFileIds, listIndex);
pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
pListStr = PyBytes_AS_STRING(pListStrE);
Py_DECREF(pListStrE);

// sentences = gutenberg.sents(fileid)
PyObject *c_args = Py_BuildValue("s", pListStr);
PyObject *NullPtr = 0;
pSents = PyObject_Call(pSentMod, c_args, NullPtr);

The final line segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x76e4e8d5 in _PyEval_EvalCodeWithName () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0

My guess is the problem is in Py_BuildValue, which returns a pointer but it may not be constructed correctly. I also tried it with "O" and it doesn't segfault but it returns 0x0. I'm new to using the C API. Thanks for any help.

Jen

-- https://mail.python.org/mailman/listinfo/python-list
Re: Can't get iterator in the C API
Thank you for clarifying that. Now on to getting the iterator from the method. Jen Feb 8, 2022, 18:10 by pyt...@mrabarnett.plus.com: > On 2022-02-09 01:12, Jen Kris via Python-list wrote: > >> I am using the Python C API to load the Gutenberg corpus from the nltk >> library and iterate through the sentences. The Python code I am trying to >> replicate is: >> >> from nltk.corpus import gutenberg >> for i, fileid in enumerate(gutenberg.fileids()): >> sentences = gutenberg.sents(fileid) >> etc >> >> where gutenberg.fileids is, of course, iterable. >> >> I use the following C API code to import the module and get pointers: >> >> int64_t Call_PyModule() >> { >> PyObject *pModule, *pName, *pSubMod, *pFidMod, *pFidSeqIter,*pSentMod; >> >> pName = PyUnicode_FromString("nltk.corpus"); >> pModule = PyImport_Import(pName); >> >> if (pModule == 0x0){ >> PyErr_Print(); >> return 1; } >> >> pSubMod = PyObject_GetAttrString(pModule, "gutenberg"); >> pFidMod = PyObject_GetAttrString(pSubMod, "fileids"); >> pSentMod = PyObject_GetAttrString(pSubMod, "sents"); >> >> pFidIter = PyObject_GetIter(pFidMod); >> int ckseq_ok = PySeqIter_Check(pFidMod); >> pFidSeqIter = PySeqIter_New(pFidMod); >> >> return 0; >> } >> >> pSubMod, pFidMod and pSentMod all return valid pointers, but the iterator >> lines return zero: >> >> pFidIter = PyObject_GetIter(pFidMod); >> int ckseq_ok = PySeqIter_Check(pFidMod); >> pFidSeqIter = PySeqIter_New(pFidMod); >> >> So the C API thinks gutenberg.fileids is not iterable, but it is. What am I >> doing wrong? >> > Look at your Python code. You have "gutenberg.fileids()", so the 'fileids' > attribute is not an iterable itself, but a method that you need to call to > get the iterable. > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
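MRAB's distinction -- the attribute is a method, and only its return value is iterable -- can be sketched in pure Python. In C-API terms that means calling PyObject_CallObject(pFidMod, NULL) first and only then applying PyObject_GetIter to the result. The Corpus class below is a hypothetical stand-in for nltk's gutenberg object:

```python
class Corpus:
    # hypothetical stand-in for nltk.corpus.gutenberg
    def fileids(self):
        return ["austen-emma.txt", "austen-persuasion.txt"]

g = Corpus()

# The attribute itself is a bound method, and a method is not iterable,
# which is why PyObject_GetIter(pFidMod) returned NULL:
method_iterable = True
try:
    iter(g.fileids)
except TypeError:
    method_iterable = False

# Calling it first returns the list, which is iterable:
ids = list(iter(g.fileids()))
```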
Can't get iterator in the C API
I am using the Python C API to load the Gutenberg corpus from the nltk library and iterate through the sentences. The Python code I am trying to replicate is:

from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    etc

where gutenberg.fileids is, of course, iterable. I use the following C API code to import the module and get pointers:

int64_t Call_PyModule()
{
    PyObject *pModule, *pName, *pSubMod, *pFidMod, *pFidSeqIter, *pSentMod;

    pName = PyUnicode_FromString("nltk.corpus");
    pModule = PyImport_Import(pName);

    if (pModule == 0x0){
        PyErr_Print();
        return 1; }

    pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
    pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
    pSentMod = PyObject_GetAttrString(pSubMod, "sents");

    pFidIter = PyObject_GetIter(pFidMod);
    int ckseq_ok = PySeqIter_Check(pFidMod);
    pFidSeqIter = PySeqIter_New(pFidMod);

    return 0;
}

pSubMod, pFidMod and pSentMod all return valid pointers, but the iterator lines return zero:

pFidIter = PyObject_GetIter(pFidMod);
int ckseq_ok = PySeqIter_Check(pFidMod);
pFidSeqIter = PySeqIter_New(pFidMod);

So the C API thinks gutenberg.fileids is not iterable, but it is. What am I doing wrong?
--
https://mail.python.org/mailman/listinfo/python-list
Re: Data unchanged when passing data to Python in multiprocessing shared memory
An ASCII string will not work. If you convert 32894 to an ascii string you will have five bytes, but you need four. In my original post I showed the C program I used to convert any 32-bit number to 4 bytes. Feb 2, 2022, 10:16 by python-list@python.org: > I applaud trying to find the right solution but wonder if a more trivial > solution is even being considered. It ignores big and little endians and just > converts your data into another form and back. > > If all you want to do is send an integer that fit in 32 bits or 64 bits, why > not convert it to a character string in a form that both machines will see > the same way and when read back, convert it back to an integer? > > As long as both side see the same string, this can be done in reasonable time > and portably. > > Or am I missing something? Is "1234" not necessarily seen in the same order, > or "1.234e3" or whatever? > > Obviously, if the mechanism is heavily used and multiple sides keep reading > and even writing the same memory location, this is not ideal. But having > different incompatible processors looking at the same memory is also not. > > -Original Message- > From: Dennis Lee Bieber > To: python-list@python.org > Sent: Wed, Feb 2, 2022 12:30 am > Subject: Re: Data unchanged when passing data to Python in multiprocessing > shared memory > > > On Wed, 2 Feb 2022 00:40:22 +0100 (CET), Jen Kris > > declaimed the following: > > > >> >> >> breakup = int.from_bytes(byte_val, "big") >> > > >print("this is breakup " + str(breakup)) > >> >> > > >Python prints: this is breakup 32894 > >> >> > > >Note that I had to switch from little endian to big endian. Python is > >little endian by default, but in this case it's big endian. > >> >> > > Look at the struct module. I'm pretty certain it has flags for big or > > little end, or system native (that, or run your integers through the > > various "network byte order" functions that I think C and Python both > > support. 
> > > > https://www.gta.ufrj.br/ensino/eel878/sockets/htonsman.html > > > > > > >However, if anyone on this list knows how to pass data from a non-Python > >language to Python in multiprocessing.shared_memory please let me (and the > >list) know. > > > > MMU cache lines not writing through to RAM? Can't find > > anything on Google to force a cache flush Can you test on a > > different OS? (Windows vs Linux) > > > > > > > > -- > > Wulfraed Dennis Lee Bieber AF6VN > > wlfr...@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/ > > -- > > https://mail.python.org/mailman/listinfo/python-list > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
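[Editor's note: the byte-count point above can be checked directly in Python -- a small sketch using only the stdlib int methods:]

```python
n = 32894
text = str(n).encode("ascii")     # the decimal text form
raw = n.to_bytes(4, "big")        # a fixed-width binary form

print(len(text))                  # 5 -- one byte per digit
print(len(raw))                   # 4 -- fits any 32-bit value
print(raw)                        # b'\x00\x00\x80~'
```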
Re: Data unchanged when passing data to Python in multiprocessing shared memory
It's not clear to me from the struct module whether it can actually auto-detect endianness. I think it must be specified, just as I had to do with int.from_bytes(). In my case endianness was dictated by how the four bytes were populated, starting with the zero bytes on the left. Feb 1, 2022, 21:30 by wlfr...@ix.netcom.com: > On Wed, 2 Feb 2022 00:40:22 +0100 (CET), Jen Kris > declaimed the following: > >> >> breakup = int.from_bytes(byte_val, "big") >> > >print("this is breakup " + str(breakup)) > >> >> > >Python prints: this is breakup 32894 > >> >> > >Note that I had to switch from little endian to big endian. Python is > >little endian by default, but in this case it's big endian. > >> >> > Look at the struct module. I'm pretty certain it has flags for big or > little end, or system native (that, or run your integers through the > various "network byte order" functions that I think C and Python both > support. > > https://www.gta.ufrj.br/ensino/eel878/sockets/htonsman.html > > > >However, if anyone on this list knows how to pass data from a non-Python > >language to Python in multiprocessing.shared_memory please let me (and the > >list) know. > > MMU cache lines not writing through to RAM? Can't find > anything on Google to force a cache flush Can you test on a > different OS? (Windows vs Linux) > > > > -- > Wulfraed Dennis Lee Bieber AF6VN > wlfr...@ix.netcom.comhttp://wlfraed.microdiversity.freeddns.org/ > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
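[Editor's note: that reading matches the struct documentation -- byte order is never auto-detected; it comes from the first character of the format string ('<' little-endian, '>' big-endian, '=' or nothing for native). A quick sketch with the value from this thread:]

```python
import struct

n = 32894
big = struct.pack(">I", n)        # b'\x00\x00\x80~'
little = struct.pack("<I", n)     # b'~\x80\x00\x00'

print(big == little[::-1])               # same bytes, reversed order
print(struct.unpack(">I", big)[0] == n)  # round trip
print(int.from_bytes(big, "big") == n)   # agrees with int.from_bytes
```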
Re: Data unchanged when passing data to Python in multiprocessing shared memory
Barry, thanks for your reply. On the theory that it is not yet possible to pass data from a non-Python language to Python with multiprocessing.shared_memory, I bypassed the problem by attaching 4 bytes to my FIFO pipe message from NASM to Python: byte_val = v[10:14] where v is the message read from the FIFO. Then: breakup = int.from_bytes(byte_val, "big") print("this is breakup " + str(breakup)) Python prints: this is breakup 32894 Note that I had to switch from little endian to big endian. Python is little endian by default, but in this case it's big endian. However, if anyone on this list knows how to pass data from a non-Python language to Python in multiprocessing.shared_memory please let me (and the list) know. Thanks. Feb 1, 2022, 14:20 by ba...@barrys-emacs.org: > > >> On 1 Feb 2022, at 20:26, Jen Kris via Python-list >> wrote: >> >> I am using multiprocesssing.shared_memory to pass data between NASM and >> Python. The shared memory is created in NASM before Python is called. >> Python connects to the shm: shm_00 = >> shared_memory.SharedMemory(name='shm_object_00',create=False). >> >> I have used shared memory at other points in this project to pass text data >> from Python back to NASM with no problems. But now this time I need to pass >> a 32-bit integer (specifically 32,894) from NASM to Python. >> >> First I convert the integer to bytes in a C program linked into NASM: >> >> unsigned char bytes[4] >> unsigned long int_to_convert = 32894; >> >> bytes[0] = (int_to_convert >> 24) & 0xFF; >> bytes[1] = (int_to_convert >> 16) & 0xFF; >> bytes[2] = (int_to_convert >> 8) & 0xFF; >> bytes[3] = int_to_convert & 0xFF; >> memcpy(outbuf, bytes, 4); >> >> where outbuf is a pointer to the shared memory. On return from C to NASM, I >> verify that the first four bytes of the shared memory contain what I want, >> and they are 0, 0, -128, 126 which is binary 1000 >> 0110, and that's correct (32,894). 
>> >> Next I send a message to Python through a FIFO to read the data from shared >> memory. Python uses the following code to read the first four bytes of the >> shared memory: >> >> byte_val = shm_00.buf[:4] >> print(shm_00.buf[0]) >> print(shm_00.buf[1]) >> print(shm_00.buf[2]) >> print(shm_00.buf[3]) >> >> But the bytes show as 40 39 96 96, which is exactly what the first four >> bytes of this shared memory contained before I called C to overwrite them >> with the bytes 0, 0, -128, 126. So Python does not see the updated bytes, >> and naturally int.from_bytes(byte_val, "little") does not return the result >> I want. >> >> I know that Python refers to shm00.buf, using the buffer protocol. Is that >> the reason that Python can't see the data that has been updated by another >> language? >> >> So my question is, how can I alter the data in shared memory in a non-Python >> language to pass back to Python? >> > > Maybe you need to use a memory barrier to force the data to be seen by > another cpu? > Maybe use shm lock operation to sync both sides? > Googling I see people talking about using stdatomic.h for this. > > But I am far from clear what you would need to do. > > Barry > >> >> Thanks, >> >> Jen >> >> -- >> https://mail.python.org/mailman/listinfo/python-list >> -- https://mail.python.org/mailman/listinfo/python-list
Data unchanged when passing data to Python in multiprocessing shared memory
I am using multiprocessing.shared_memory to pass data between NASM and Python. The shared memory is created in NASM before Python is called. Python connects to the shm:

shm_00 = shared_memory.SharedMemory(name='shm_object_00', create=False)

I have used shared memory at other points in this project to pass text data from Python back to NASM with no problems. But this time I need to pass a 32-bit integer (specifically 32,894) from NASM to Python.

First I convert the integer to bytes in a C program linked into NASM:

unsigned char bytes[4];
unsigned long int_to_convert = 32894;

bytes[0] = (int_to_convert >> 24) & 0xFF;
bytes[1] = (int_to_convert >> 16) & 0xFF;
bytes[2] = (int_to_convert >> 8) & 0xFF;
bytes[3] = int_to_convert & 0xFF;
memcpy(outbuf, bytes, 4);

where outbuf is a pointer to the shared memory. On return from C to NASM, I verify that the first four bytes of the shared memory contain what I want, and they are 0, 0, -128, 126, which is binary 1000 0000 0111 1110, and that's correct (32,894).

Next I send a message to Python through a FIFO to read the data from shared memory. Python uses the following code to read the first four bytes of the shared memory:

byte_val = shm_00.buf[:4]
print(shm_00.buf[0])
print(shm_00.buf[1])
print(shm_00.buf[2])
print(shm_00.buf[3])

But the bytes show as 40 39 96 96, which is exactly what the first four bytes of this shared memory contained before I called C to overwrite them with the bytes 0, 0, -128, 126. So Python does not see the updated bytes, and naturally int.from_bytes(byte_val, "little") does not return the result I want.

I know that Python refers to shm_00.buf, using the buffer protocol. Is that the reason that Python can't see the data that has been updated by another language?

So my question is, how can I alter the data in shared memory in a non-Python language to pass back to Python?

Thanks,

Jen
--
https://mail.python.org/mailman/listinfo/python-list
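[Editor's note: as a sanity check on the Python read side, the same 4-byte round trip works when both writer and reader are Python, which narrows any failure to the non-Python side. A sketch -- the segment name is invented for the example:]

```python
import os
from multiprocessing import shared_memory

value = 32894
name = f"shm_demo_{os.getpid()}"  # made-up name, unique per run

shm = shared_memory.SharedMemory(name=name, create=True, size=4)
try:
    shm.buf[:4] = value.to_bytes(4, "big")          # "writer" side
    other = shared_memory.SharedMemory(name=name, create=False)
    readback = int.from_bytes(bytes(other.buf[:4]), "big")
    other.close()
    print(readback)                                 # 32894
finally:
    shm.close()
    shm.unlink()
```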
Re: Python child process in while True loop blocks parent
I started this post on November 29, and there have been helpful comments since then from Barry Scott, Cameron Simpson, Peter Holzer and Chris Angelico. Thanks to all of you. I've found a solution that works for my purpose, and I said earlier that I would post the solution I found. If anyone has a better solution I would appreciate any feedback.

To recap, I'm using a pair of named pipes for IPC between C and Python. Python runs as a child process after fork-execv. The Python program runs concurrently in a while True loop, responds to requests from C at intervals, and continues until it receives a signal from C to exit. C sends signals to Python, then waits to receive data back from Python. My problem was that C was blocked when Python started.

The solution was twofold: (1) for Python to run concurrently it must be a multiprocessing loop (from the multiprocessing module), and (2) Python must terminate the strings it writes with \n, or read will block in C waiting for something that never comes. The multiprocessing module sidesteps the GIL; without multiprocessing, the GIL will block all other threads once Python starts.

Originally I used epoll() on the pipes. Cameron Simpson and Barry Scott advised against epoll, and for this case they are right. Blocking pipes work here, and epoll is too much overhead for watching a single file descriptor.
This is the Python code now (note the b"..." prefix on the "Send child PID" comparison: v is bytes, so comparing it to a plain str would never match):

#!/usr/bin/python3
from multiprocessing import Process
import os

print("Python is running")
child_pid = os.getpid()
print('child process id:', child_pid)

def f(a, b):
    print("Python now in function f")
    pr = os.open('/tmp/Pipe_01', os.O_RDONLY)
    print("File Descriptor1 Opened " + str(pr))
    pw = os.open('/tmp/Pipe_02', os.O_WRONLY)
    print("File Descriptor2 Opened " + str(pw))

    while True:
        v = os.read(pr, 64)
        print("Python read from pipe pr")
        print(v)

        if v == b'99':
            os.close(pr)
            os.close(pw)
            print("Python is terminating")
            os._exit(os.EX_OK)

        if v != b"Send child PID":
            os.write(pw, b"OK message received\n")
            print("Python wrote back")

if __name__ == '__main__':
    a = 0
    b = 0
    p = Process(target=f, args=(a, b,))
    p.start()
    p.join()

The variables a and b are not currently used in the body, but they will be later.

This is the part of the C code that communicates with Python:

fifo_fd1 = open(fifo_path1, O_WRONLY);
fifo_fd2 = open(fifo_path2, O_RDONLY);

status_write = write(fifo_fd1, py_msg_01, sizeof(py_msg_01));
if (status_write < 0) perror("write");
status_read = read(fifo_fd2, fifo_readbuf, sizeof(py_msg_01));
if (status_read < 0) perror("read");
printf("C received message 1 from Python\n");
printf("%.*s", (int)buf_len, fifo_readbuf);

status_write = write(fifo_fd1, py_msg_02, sizeof(py_msg_02));
if (status_write < 0) perror("write");
status_read = read(fifo_fd2, fifo_readbuf, sizeof(py_msg_02));
if (status_read < 0) perror("read");
printf("C received message 2 from Python\n");
printf("%.*s", (int)buf_len, fifo_readbuf);

// Terminate Python multiprocessing
printf("C is sending exit message to Python\n");
status_write = write(fifo_fd1, py_msg_03, 2);

printf("C is closing\n");
close(fifo_fd1);
close(fifo_fd2);

Screen output:

Python is running
child process id: 5353
Python now in function f
File Descriptor1 Opened 6
Thread created 0
File Descriptor2 Opened 7
Process ID: 5351
Parent Process ID: 5351
I am the parent
Core joined 0
I am the child
Python read from pipe pr
b'Hello to Python from C\x00\x00'
Python wrote back
C received message 1 from Python
OK message received
Python read from pipe pr
b'Message to Python 2\x00\x00'
Python wrote back
C received message 2 from Python
OK message received
C is sending exit message to Python
C is closing
Python read from pipe pr
b'99'
Python is terminating

Python runs on a separate thread (created with pthreads) because I want the flexibility of using this same basic code as a stand-alone .exe, or for a C extension from Python called with ctypes. If I use it as a C extension then I want the Python code on a separate thread because I can't have two instances of the Python interpreter running on one thread, and one instance will already be running on the main thread, albeit "suspended" by the call from ctypes.

So that's my solution: (1) Python multiprocessing module; (2) Python strings written to the pipe must be terminated with \n.

Thanks again to all who commented.

Dec 6, 2021, 13:33 by ba...@barrys-emacs.org: > > > >> On 6 Dec 2021, at 21:05, Jen Kris <>> jenk...@tutanota.com>> > wrote: >> >> Here is what I don't understand from what you said. "The child process is >> created with a single thread—the one that called fork()." To me that >> implies that the thread that called fork() is the same thre
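[Editor's note: the "\n" terminator is the load-bearing detail here -- the reading side collects until it sees the terminator, so an unterminated write leaves it blocked. A minimal Python-only sketch of that framing, with an anonymous os.pipe() standing in for the named FIFOs:]

```python
import os

r, w = os.pipe()

os.write(w, b"OK message received\n")   # writer: always end with \n

buf = b""
while not buf.endswith(b"\n"):          # reader: collect up to the terminator
    buf += os.read(r, 64)

print(buf.rstrip(b"\n").decode())       # OK message received
os.close(r)
os.close(w)
```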
Re: Python child process in while True loop blocks parent
Here is what I don't understand from what you said. "The child process is created with a single thread—the one that called fork()." To me that implies that the thread that called fork() is the same thread as the child process. I guess you're talking about the distinction between logical threads and physical threads. But the main issue is your suggestion that I should call fork-execv from the thread that runs the main C program, not from a separate physical pthread. That would certainly eliminate the overhead of creating a new pthread. I am working now to finish this, and I will try your suggestion of calling fork-execv from the "main" thread. When I reply back next I can give you a complete picture of what I'm doing. Your comments, and those of Peter Holzer and Chris Angelico, are most appreciated. Dec 6, 2021, 10:37 by ba...@barrys-emacs.org: > > >> On 6 Dec 2021, at 17:09, Jen Kris via Python-list >> wrote: >> >> I can't find any support for your comment that "Fork creates a new >> process and therefore also a new thread." From the Linux man pages >> https://www.man7.org/linux/man-pages/man2/fork.2.html, "The child process is >> created with a single thread—the one that called fork()." >> > > You just quoted the evidence! > > All new processes on unix (may all OS) only ever have one thread when they > start. > The thread-id of the first thread is the same as the process-id and referred > to as the main thread. > >> >> I have a one-core one-thread instance at Digital Ocean available running >> Ubuntu 18.04. I can fork and create a new process on it, but it doesn't >> create a new thread because it doesn't have one available. >> > > > By that logic it can only run one process... > > It has one hardware core that support one hardware thread. > Linux can create as many software threads as it likes. 
> >> You may also want to see "Forking vs Threading" >> (https://www.geekride.com/fork-forking-vs-threading-thread-linux-kernel), >> "Fork vs Thread" >> (https://medium.com/obscure-system/fork-vs-thread-38e09ec099e2), and "Linux >> process and thread" (https://zliu.org/post/linux-process-and-thread) ("This >> means that to create a normal process fork() is used that further calls >> clone() with appropriate arguments while to create a thread or LWP, a >> function from pthread library calls clone() with relvant flags. So, the main >> difference is generated by using different flags that can be passed to >> clone() funciton(to be exact, it is a system call"). >> >> You may be confused by the fact that threads are called light-weight >> processes. >> > > No Peter and I are not confused. > >> >> Or maybe I'm confused :) >> > > Yes you are confused. > >> >> If you have other information, please let me know. Thanks. >> > > Please get the book I recommended, or another that covers systems programming > on unix, and have a read. > > Barry > >> >> Jen >> >> >> Dec 5, 2021, 18:08 by hjp-pyt...@hjp.at: >> >>> On 2021-12-06 00:51:13 +0100, Jen Kris via Python-list wrote: >>> >>>> The C program creates two threads (using pthreads), one for itself and >>>> one for the child process. On creation, the second pthread is pointed >>>> to a C program that calls fork-execv to run the Python program. That >>>> way Python runs on a separate thread. >>>> >>> >>> I think you have the relationship between processes and threads >>> backwards. A process consists of one or more threads. Fork creates a new >>> process and therefore also a new thread. >>> >>> hp >>> >>> -- >>> _ | Peter J. Holzer| Story must make more sense than reality. >>> |_|_) || >>> | | | h...@hjp.at |-- Charles Stross, "Creative writing >>> __/ | http://www.hjp.at/ | challenge!" >>> >> >> -- >> https://mail.python.org/mailman/listinfo/python-list >> -- https://mail.python.org/mailman/listinfo/python-list
Re: Python child process in while True loop blocks parent
I can't find any support for your comment that "Fork creates a new process and therefore also a new thread." From the Linux man pages https://www.man7.org/linux/man-pages/man2/fork.2.html, "The child process is created with a single thread—the one that called fork()." I have a one-core one-thread instance at Digital Ocean available running Ubuntu 18.04. I can fork and create a new process on it, but it doesn't create a new thread because it doesn't have one available. You may also want to see "Forking vs Threading" (https://www.geekride.com/fork-forking-vs-threading-thread-linux-kernel), "Fork vs Thread" (https://medium.com/obscure-system/fork-vs-thread-38e09ec099e2), and "Linux process and thread" (https://zliu.org/post/linux-process-and-thread) ("This means that to create a normal process fork() is used that further calls clone() with appropriate arguments while to create a thread or LWP, a function from pthread library calls clone() with relvant flags. So, the main difference is generated by using different flags that can be passed to clone() funciton(to be exact, it is a system call"). You may be confused by the fact that threads are called light-weight processes. Or maybe I'm confused :) If you have other information, please let me know. Thanks. Jen Dec 5, 2021, 18:08 by hjp-pyt...@hjp.at: > On 2021-12-06 00:51:13 +0100, Jen Kris via Python-list wrote: > >> The C program creates two threads (using pthreads), one for itself and >> one for the child process. On creation, the second pthread is pointed >> to a C program that calls fork-execv to run the Python program. That >> way Python runs on a separate thread. >> > > I think you have the relationship between processes and threads > backwards. A process consists of one or more threads. Fork creates a new > process and therefore also a new thread. > > hp > > -- > _ | Peter J. Holzer| Story must make more sense than reality. 
> |_|_) || > | | | h...@hjp.at |-- Charles Stross, "Creative writing > __/ | http://www.hjp.at/ | challenge!" > -- https://mail.python.org/mailman/listinfo/python-list
Re: Python child process in while True loop blocks parent
ation on how processes work. >>> Maybe "Advanced Programming in the UNIX Environment" >>> would be helpful? >>> >>> https://www.amazon.co.uk/Programming-Environment-Addison-Wesley-Professional-Computing-dp-0321637739/dp/0321637739/ref=dp_ob_image_bk>>> >>> >>> >>> It's a great book and covers a wide range of Unix systems programming >>> topics. >>> >>> Have you created a small C program that just does the fork and exec of a >>> python program to test out your assumptions? >>> If not I recommend that you do. >>> >>> Barry >>> >>> >>> >>>> >>>> >>>> >>>> Nov 30, 2021, 11:42 by >>>> ba...@barrys-emacs.org>>>> : >>>> >>>>> >>>>> >>>>> >>>>>> On 29 Nov 2021, at 22:31, Jen Kris <>>>>>> jenk...@tutanota.com>>>>>> > >>>>>> wrote: >>>>>> >>>>>> Thanks to you and Cameron for your replies. The C side has an epoll_ctl >>>>>> set, but no event loop to handle it yet. I'm putting that in now with a >>>>>> pipe write in Python-- as Cameron pointed out that is the likely source >>>>>> of blocking on C. The pipes are opened as rdwr in Python because that's >>>>>> nonblocking by default. The child will become more complex, but not in >>>>>> a way that affects polling. And thanks for the tip about the c-string >>>>>> termination. >>>>>> >>>>>> >>>>> >>>>> flags is a bit mask. You say its BLOCKing by not setting os.O_NONBLOCK. >>>>> You should not use O_RDWR when you only need O_RDONLY access or only >>>>> O_WRONLY access. >>>>> >>>>> You may find >>>>> >>>>> man 2 open >>>>> >>>>> useful to understand in detail what is behind os.open(). >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> >>>>>> Nov 29, 2021, 14:12 by >>>>>> ba...@barrys-emacs.org>>>>>> : >>>>>> >>>>>>> >>>>>>> >>>>>>>> On 29 Nov 2021, at 20:36, Jen Kris via Python-list <>>>>>>>> >>>>>>>> python-list@python.org>>>>>>>> > wrote: >>>>>>>> >>>>>>>> I have a C program that forks to create a child process and uses >>>>>>>> execv to call a Python program. 
The Python program communicates with >>>>>>>> the parent process (in C) through a FIFO pipe monitored with epoll(). >>>>>>>> >>>>>>>> The Python child process is in a while True loop, which is intended to >>>>>>>> keep it running while the parent process proceeds, and perform >>>>>>>> functions for the C program only at intervals when the parent sends >>>>>>>> data to the child -- similar to a daemon process. >>>>>>>> >>>>>>>> The C process writes to its end of the pipe and the child process >>>>>>>> reads it, but then the child process continues to loop, thereby >>>>>>>> blocking the parent. >>>>>>>> >>>>>>>> This is the Python code: >>>>>>>> >>>>>>>> #!/usr/bin/python3 >>>>>>>> import os >>>>>>>> import select >>>>>>>> >>>>>>>> #Open the named pipes >>>>>>>> pr = os.open('/tmp/Pipe_01', os.O_RDWR) >>>>>>>> >>>>>>> Why open rdwr if you are only going to read the pipe? >>>>>>> >>>>>>>> pw = os.open('/tmp/Pipe_02', os.O_RDWR) >>>>>>>> >>>>>>> Only need to open for write. >>>>>>> >>>>>>>> >>>>>>>> ep = select.epoll(-1) >>>>>>>> ep.register(pr, select.EPOLLIN) >>>>>>>> >>>>>>> >>>>>>> Is the only thing that the child does this: >>>>>>> 1. Read message from pr >>>>>>> 2. Process message >>>>>>> 3. Write result to pw. >>>>>>> 4. Loop from 1 >>>>>>> >>>>>>> If so as Cameron said you do not need to worry about the poll. >>>>>>> Do you plan for the child to become more complex? >>>>>>> >>>>>>>> >>>>>>>> while True: >>>>>>>> >>>>>>>> events = ep.poll(timeout=2.5, maxevents=-1) >>>>>>>> #events = ep.poll(timeout=None, maxevents=-1) >>>>>>>> >>>>>>>> print("child is looping") >>>>>>>> >>>>>>>> for fileno, event in events: >>>>>>>> print("Python fileno") >>>>>>>> print(fileno) >>>>>>>> print("Python event") >>>>>>>> print(event) >>>>>>>> v = os.read(pr,64) >>>>>>>> print("Pipe value") >>>>>>>> print(v) >>>>>>>> >>>>>>>> The child process correctly receives the signal from ep.poll and >>>>>>>> correctly reads the data in the pipe, but then it continues looping. 
>>>>>>>> For example, when I put in a timeout: >>>>>>>> >>>>>>>> child is looping >>>>>>>> Python fileno >>>>>>>> 4 >>>>>>>> Python event >>>>>>>> 1 >>>>>>>> Pipe value >>>>>>>> b'10\x00' >>>>>>>> >>>>>>> The C code does not need to write a 0 bytes at the end. >>>>>>> I assume the 0 is from the end of a C string. >>>>>>> UDS messages have a length. >>>>>>> In the C just write 2 byes in the case. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>>> child is looping >>>>>>>> child is looping >>>>>>>> >>>>>>>> That suggests that a while True loop is not the right thing to do in >>>>>>>> this case. My question is, what type of process loop is best for this >>>>>>>> situation? The multiprocessing, asyncio and subprocess libraries are >>>>>>>> very extensive, and it would help if someone could suggest the best >>>>>>>> alternative for what I am doing here. >>>>>>>> >>>>>>>> Thanks very much for any ideas. >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> https://mail.python.org/mailman/listinfo/python-list >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> -- https://mail.python.org/mailman/listinfo/python-list
Re: Python child process in while True loop blocks parent
Thanks for your comments.

I put the Python program on its own pthread, and call a small C program to fork-execv to call the Python program as a child process. I revised the Python program to be a multiprocessing loop using the Python multiprocessing module. That bypasses the GIL and allows Python to run concurrently with C. So far so good.

Next I will use Linux pipes, not Python multiprocessing pipes, for IPC between Python and C. Multiprocessing pipes are (as far as I can tell) only for communication between two Python processes. I will have the parent thread send a signal through the pipe to the child process to exit when the parent thread is ready to exit, then call wait() to finalize the child process.

I will reply back when it's finished and post the code so you can see what I have done. Thanks again.

Jen

Dec 4, 2021, 09:22 by ba...@barrys-emacs.org: > > >> On 1 Dec 2021, at 16:01, Jen Kris <>> jenk...@tutanota.com>> > wrote: >> >> Thanks for your comment re blocking. >> >> I removed pipes from the Python and C programs to see if it blocks without >> them, and it does. >> >> It looks now like the problem is not pipes. >> > > Ok. > > >> I use fork() and execv() in C to run Python in a child process, but the >> Python process blocks >> > > Use strace on the parent process to see what is happening. > You will need to use the option to follow subprocesses so that you can see > what goes on in the python process. > > See man strace and the --follow-forks and --output-separately options. > That will allow you to find the blocking system call that your code is making. > > >> because fork() does not create a new thread, so the Python global >> interpreter lock (GIL) prevents the C program from running once Python >> starts. >> > > Not sure why you think this. > > >> So the solution appears to be run Python in a separate thread, which I can >> do with pthread create. 
>> >> See "Thread State and the Global Interpreter Lock" >> >> https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock>> >> and the sections below that "Non-Python created threads" and "Cautions >> about fork()." >> > > I take it you mean that in the parent you think that using pthreads will > affect python after the exec() call? > I does not. After exec() the process has one main thread create by the kernel > and a new address space as defined by the /usr/bin/python. > The only state that in inherited from the parent are open file descriptors, > the current working directory and security state like UID, GID. > > >> I'm working on that today and I hope all goes well :) >> > > You seem to be missing background information on how processes work. > Maybe "Advanced Programming in the UNIX Environment" > would be helpful? > > https://www.amazon.co.uk/Programming-Environment-Addison-Wesley-Professional-Computing-dp-0321637739/dp/0321637739/ref=dp_ob_image_bk> > > > It's a great book and covers a wide range of Unix systems programming topics. > > Have you created a small C program that just does the fork and exec of a > python program to test out your assumptions? > If not I recommend that you do. > > Barry > > > >> >> >> >> Nov 30, 2021, 11:42 by >> ba...@barrys-emacs.org>> : >> >>> >>> >>> >>>> On 29 Nov 2021, at 22:31, Jen Kris <>>>> jenk...@tutanota.com>>>> > wrote: >>>> >>>> Thanks to you and Cameron for your replies. The C side has an epoll_ctl >>>> set, but no event loop to handle it yet. I'm putting that in now with a >>>> pipe write in Python-- as Cameron pointed out that is the likely source of >>>> blocking on C. The pipes are opened as rdwr in Python because that's >>>> nonblocking by default. The child will become more complex, but not in a >>>> way that affects polling. And thanks for the tip about the c-string >>>> termination. >>>> >>>> >>> >>> flags is a bit mask. You say its BLOCKing by not setting os.O_NONBLOCK. 
>>> You should not use O_RDWR when you only need O_RDONLY access or only >>> O_WRONLY access. >>> >>> You may find >>> >>> man 2 open >>> >>> useful to understand in detail what is behind os.open(). >>> >>> Barry >>> >>> >>> >>> >>>> >>>> >>>&
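[Editor's note: a stripped-down sketch of the plan described above -- a multiprocessing child serviced over a plain OS-level pipe until the parent sends an exit code. It assumes the POSIX "fork" start method so the child inherits the pipe descriptor; the exit code b"99" matches the one used in the final solution:]

```python
import os
import multiprocessing as mp

def serve(r_fd):
    # child: loop servicing the pipe until the parent sends the exit code
    while True:
        msg = os.read(r_fd, 64)
        if msg == b"99":
            break

if __name__ == "__main__":
    ctx = mp.get_context("fork")      # child inherits the pipe fd (POSIX)
    r, w = os.pipe()
    p = ctx.Process(target=serve, args=(r,))
    p.start()
    os.write(w, b"99")                # parent signals the child to exit...
    p.join()                          # ...then finalizes it, like wait() in C
    print(p.exitcode)                 # 0
```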
Re: Python child process in while True loop blocks parent
Thanks for your comment re blocking. I removed the pipes from the Python and C programs to see if it blocks without them, and it does, so it now looks like the problem is not the pipes. I use fork() and execv() in C to run Python in a child process, but the Python process blocks because fork() does not create a new thread, so the Python global interpreter lock (GIL) prevents the C program from running once Python starts. So the solution appears to be to run Python in a separate thread, which I can do with pthread_create. See "Thread State and the Global Interpreter Lock" (https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock) and the sections below it, "Non-Python created threads" and "Cautions about fork()." I'm working on that today and I hope all goes well :)

Nov 30, 2021, 11:42 by ba...@barrys-emacs.org:

>
>> On 29 Nov 2021, at 22:31, Jen Kris <jenk...@tutanota.com> wrote:
>>
>> Thanks to you and Cameron for your replies. [...] The pipes are opened
>> as rdwr in Python because that's nonblocking by default. [...]
>>
> flags is a bit mask. You say its BLOCKing by not setting os.O_NONBLOCK.
> You should not use O_RDWR when you only need O_RDONLY access or only
> O_WRONLY access.
>
> You may find
>
>     man 2 open
>
> useful to understand in detail what is behind os.open().
>
> Barry

-- 
https://mail.python.org/mailman/listinfo/python-list
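Barry's advice about open flags can be sketched as follows. This is a minimal, self-contained illustration, not code from the thread: it creates its own stand-in FIFO path rather than using /tmp/Pipe_01, and uses O_NONBLOCK so the reader can open before any writer exists (a plain O_RDONLY or O_WRONLY open of a FIFO blocks until the peer end is opened).

```python
import os
import tempfile

# Stand-in FIFO created here so the example is self-contained
fifo_dir = tempfile.mkdtemp()
fifo_path = os.path.join(fifo_dir, "Pipe_01")
os.mkfifo(fifo_path)

# Open each end with only the access it needs.
# O_NONBLOCK lets the reader open first without waiting for a writer.
pr = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
pw = os.open(fifo_path, os.O_WRONLY)  # succeeds: a reader is already open

os.write(pw, b"10")
print(os.read(pr, 64))  # b'10'
```

In the real program the reader would normally clear O_NONBLOCK again (or use epoll) once both ends are connected; the point here is only the access modes.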
Re: Python child process in while True loop blocks parent
Thanks to you and Cameron for your replies. The C side has an epoll_ctl set, but no event loop to handle it yet. I'm putting that in now, with a pipe write in Python -- as Cameron pointed out, that is the likely source of blocking on C. The pipes are opened as rdwr in Python because that's nonblocking by default. The child will become more complex, but not in a way that affects polling. And thanks for the tip about the C-string termination.

Nov 29, 2021, 14:12 by ba...@barrys-emacs.org:

>
>> On 29 Nov 2021, at 20:36, Jen Kris via Python-list <python-list@python.org> wrote:
>>
>> I have a C program that forks to create a child process and uses execv
>> to call a Python program. The Python program communicates with the
>> parent process (in C) through a FIFO pipe monitored with epoll(). [...]
>>
>> This is the Python code:
>>
>> #!/usr/bin/python3
>> import os
>> import select
>>
>> #Open the named pipes
>> pr = os.open('/tmp/Pipe_01', os.O_RDWR)
>>
> Why open rdwr if you are only going to read the pipe?
>
>> pw = os.open('/tmp/Pipe_02', os.O_RDWR)
>>
> Only need to open for write.
>
>> ep = select.epoll(-1)
>> ep.register(pr, select.EPOLLIN)
>>
> Is the only thing that the child does this:
> 1. Read message from pr
> 2. Process message
> 3. Write result to pw
> 4. Loop from 1
>
> If so, as Cameron said, you do not need to worry about the poll.
> Do you plan for the child to become more complex?
>
>> while True:
>>     events = ep.poll(timeout=2.5, maxevents=-1)
>>     #events = ep.poll(timeout=None, maxevents=-1)
>>     print("child is looping")
>>     for fileno, event in events:
>>         [...]
>>         v = os.read(pr, 64)
>>         print("Pipe value")
>>         print(v)
>>
>> The child process correctly receives the signal from ep.poll and
>> correctly reads the data in the pipe, but then it continues looping.
>> For example, when I put in a timeout:
>>
>> child is looping
>> Python fileno
>> 4
>> Python event
>> 1
>> Pipe value
>> b'10\x00'
>>
> The C code does not need to write a 0 byte at the end.
> I assume the 0 is from the end of a C string.
> UDS messages have a length.
> In the C just write 2 bytes in this case.
>
> Barry
>
>> child is looping
>> child is looping
>>
>> That suggests that a while True loop is not the right thing to do in
>> this case. My question is, what type of process loop is best for this
>> situation? The multiprocessing, asyncio and subprocess libraries are
>> very extensive, and it would help if someone could suggest the best
>> alternative for what I am doing here.
>>
>> Thanks very much for any ideas.
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>

-- 
https://mail.python.org/mailman/listinfo/python-list
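Barry's point that "UDS messages have a length" suggests framing each message explicitly instead of relying on a trailing C NUL. A minimal sketch of length-prefixed framing on the Python side (send_msg/recv_msg are illustrative names, not from the thread, and this assumes the small header and payload of a local pipe write arrive together, which holds for writes under PIPE_BUF):

```python
import os
import struct

def send_msg(fd, payload):
    # 4-byte little-endian length prefix, then the payload itself
    os.write(fd, struct.pack("<I", len(payload)) + payload)

def recv_msg(fd):
    hdr = os.read(fd, 4)              # read the length prefix
    (n,) = struct.unpack("<I", hdr)
    return os.read(fd, n)             # then exactly that many bytes

r, w = os.pipe()
send_msg(w, b"10")
print(recv_msg(r))  # b'10'
```

The C side would write the same 4-byte length before each payload; no NUL terminator is needed, and the reader never has to guess where one message ends and the next begins.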
Python child process in while True loop blocks parent
I have a C program that forks to create a child process and uses execv to call a Python program. The Python program communicates with the parent process (in C) through a FIFO pipe monitored with epoll().

The Python child process is in a while True loop, which is intended to keep it running while the parent process proceeds, and perform functions for the C program only at intervals when the parent sends data to the child -- similar to a daemon process.

The C process writes to its end of the pipe and the child process reads it, but then the child process continues to loop, thereby blocking the parent.

This is the Python code:

#!/usr/bin/python3
import os
import select

#Open the named pipes
pr = os.open('/tmp/Pipe_01', os.O_RDWR)
pw = os.open('/tmp/Pipe_02', os.O_RDWR)

ep = select.epoll(-1)
ep.register(pr, select.EPOLLIN)

while True:
    events = ep.poll(timeout=2.5, maxevents=-1)
    #events = ep.poll(timeout=None, maxevents=-1)

    print("child is looping")

    for fileno, event in events:
        print("Python fileno")
        print(fileno)
        print("Python event")
        print(event)
        v = os.read(pr, 64)
        print("Pipe value")
        print(v)

The child process correctly receives the signal from ep.poll and correctly reads the data in the pipe, but then it continues looping. For example, when I put in a timeout:

child is looping
Python fileno
4
Python event
1
Pipe value
b'10\x00'
child is looping
child is looping

That suggests that a while True loop is not the right thing to do in this case. My question is, what type of process loop is best for this situation? The multiprocessing, asyncio and subprocess libraries are very extensive, and it would help if someone could suggest the best alternative for what I am doing here.

Thanks very much for any ideas.

-- 
https://mail.python.org/mailman/listinfo/python-list
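For a child that only ever waits on one pipe, one simple alternative to the epoll-plus-timeout spin is a plain blocking read loop: os.read blocks until the parent writes, so the child uses no CPU between messages, and an empty read (EOF, meaning the parent closed its write end) ends the loop cleanly. This is a sketch, not code from the thread; serve() and the b"ack:" reply are illustrative names, and anonymous pipes stand in for the named FIFOs so the example is self-contained.

```python
import os

def serve(pr, pw):
    while True:
        data = os.read(pr, 64)   # blocks until the parent sends data
        if not data:             # EOF: parent closed its write end
            break
        os.write(pw, b"ack:" + data)

# Demonstrate with anonymous pipes standing in for /tmp/Pipe_01 and /tmp/Pipe_02
r1, w1 = os.pipe()
r2, w2 = os.pipe()
os.write(w1, b"10")
os.close(w1)                 # closing the write end makes serve() see EOF
serve(r1, w2)
print(os.read(r2, 64))  # b'ack:10'
```

The same effect is available with the code as posted by changing ep.poll(timeout=2.5, ...) to the commented-out timeout=None line, which makes the poll itself block until an event arrives instead of waking every 2.5 seconds.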