Re: String Identity Test
Terry Reedy wrote:
> Hendrik van Rooyen wrote:
>> "S Arrowsmith" wrote:
>>> "Small" integers get a similar treatment:
>>>
>>> >>> a = 256
>>> >>> b = 256
>>> >>> a is b
>>> True
>>> >>> a = 257
>>> >>> b = 257
>>> >>> a is b
>>> False
>>
>> This is weird - I would have thought that the limit of "small" would be at 255 - the biggest number to fit in a byte. 256 takes two bytes, so it must be

Ints take at least 4 bytes. It is commonness of usage that determined caching. The range was expanded a few years ago in anticipation of the new bytes type, whose contents are ints, not chars.

>> an arbitrary limit - could have been set at 300, or 30 000...
>
> 'Small' also goes to -10 or so. 256 was included, at minuscule cost, because it is a relatively common number, being the number of bytes.

In fact, 3.0.1 starts with 36 internal references to the cached int 256!

>>> import sys
>>> sys.getrefcount(256)
38  # -2 for the function call
>>> sys.getrefcount(257)
2
>>> [sys.getrefcount(i)-2 for i in range(258)]

shows that only 15 cached ints start with more references. 0 has the most with 724 (and that small actually goes to -5).

tjr
-- 
http://mail.python.org/mailman/listinfo/python-list
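The small-int cache discussed above can be observed without comparing literals (which the compiler may share on its own) by building each integer at run time. A minimal sketch, assuming CPython, whose cache currently covers -5 through 256; `fresh_int` is a hypothetical helper, and other implementations are free to answer differently:

```python
def fresh_int(n):
    # Parse the value back from text so each call yields a run-time result
    # rather than a shared compile-time constant. On CPython, values inside
    # the small-int range come back as the cached object.
    return int(str(n))

for n in (-6, -5, 0, 255, 256, 257, 1000):
    # Both results are alive at once, so `is` compares two live objects.
    print(n, fresh_int(n) is fresh_int(n))
```

On CPython this should report True from -5 through 256 and False just outside that range; as the thread keeps stressing, that is an observation about one implementation, not a rule to build on.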
Re: String Identity Test
Hendrik van Rooyen wrote:
> "S Arrowsmith" wrote:
>> "Small" integers get a similar treatment:
>>
>> >>> a = 256
>> >>> b = 256
>> >>> a is b
>> True
>> >>> a = 257
>> >>> b = 257
>> >>> a is b
>> False
>
> This is weird - I would have thought that the limit of "small" would be at 255 - the biggest number to fit in a byte. 256 takes two bytes, so it must be an arbitrary limit - could have been set at 300, or 30 000...

'Small' also goes to -10 or so. 256 was included, at minuscule cost, because it is a relatively common number, being the number of bytes.
Re: String Identity Test
Avetis KAZARIAN wrote:
> Well, it's not about curiosity, it's more about performance.
> Steve Holden wrote:
> (snip)
>> So, don't try to translate concepts from one language to another.
> I'll try ;]

Also and FWIW:

1/ Python has some very handy tools when it comes to perfs - like a couple of profilers (to identify bottlenecks), or the timeit module (for quick benchmarks).

2/ Most "best practice" idioms are frequently discussed here.

3/ If you have performance problems related to the wrong algorithm/data structure, some of us here _really_ enjoy helping !-)

Welcome aboard.
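As a rough sketch of the timeit suggestion, the snippet below times a few string comparisons. The string sizes, repetition count and statement list are arbitrary choices, and absolute numbers will vary by machine and Python version:

```python
import timeit

# Arbitrary test data: two equal strings built separately, plus one that
# differs in length.
a = "x" * 100_000
b = "".join(["x"] * 100_000)  # equal to a, but a separate object
c = a + "y"                   # one character longer than a

names = {"a": a, "b": b, "c": c}
for stmt in ("a == a", "a == b", "a == c", "a is b"):
    # timeit compiles stmt and runs it `number` times against these globals.
    seconds = timeit.timeit(stmt, globals=names, number=1_000)
    print(f"{stmt:8} {seconds:.6f} s")
```

On CPython one would expect `a == a` and `a == c` to cost about as little as `a is b`, because `str` equality checks identity and length before looking at characters, while `a == b` has to scan both strings. Measure on your own data before drawing conclusions.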
Re: String Identity Test
Hendrik van Rooyen wrote:
> "S Arrowsmith" wrote:
>> "Small" integers get a similar treatment:
>>
>> >>> a = 256
>> >>> b = 256
>> >>> a is b
>> True
>> >>> a = 257
>> >>> b = 257
>> >>> a is b
>> False
>
> This is weird - I would have thought that the limit of "small" would be at 255 - the biggest number to fit in a byte. 256 takes two bytes, so it must be an arbitrary limit

It is, and has changed from version to version.
Re: String Identity Test
"S Arrowsmith" wrote: > "Small" integers get a similar treatment: > > >>> a = 256 > >>> b = 256 > >>> a is b > True > >>> a = 257 > >>> b = 257 > >>> a is b > False This is weird - I would have thought that the limit of "small" would be at 255 - the biggest number to fit in a byte. 256 takes two bytes, so it must be an arbitrary limit - could have been set at 300, or 30 000... - Hendrik -- http://mail.python.org/mailman/listinfo/python-list
Re: String Identity Test
Avetis KAZARIAN wrote:
>It seems that any strict ASCII alpha-numeric string is instantiated as
>an unique object, like a "singleton" ( a = "x" and b = "x" => a is b )
>and that any non strict ASCII alpha-numeric string is instantiated as
>a new object every time with a new id.

What no-one appears to have mentioned so far is that the purpose of this implementation detail is to ensure that there is a single instance of strings which are valid identifiers, so that you don't go around creating and destroying string instances just to do an attribute look-up on an object. A few strings which are not valid as identifiers get swept up into this system:

>>> a = "1"
>>> b = "1"
>>> a is b
True

"Small" integers get a similar treatment:

>>> a = 256
>>> b = 256
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False

But as has hopefully been made clear, all this is completely an implementation detail. (Indeed, the range of "interned" integers changed from 0..99 to -5..256 a few versions ago.) So don't, under any circumstances, rely on it, even when you understand what's going on.

-- 
\S under construction
Re: String Identity Test
Steve Holden wrote:
> Does PHP really keep only one copy of every string?

Not at all. I might have said something confusing if you understood that...

> So, don't try to translate concepts from one language to another.
>
> --
> Gabriel Genellina

I'll try ;]
Re: String Identity Test
On Wed, 04 Mar 2009 07:07:44 -0200, Avetis KAZARIAN wrote:
> Gary Herron wrote:
>> The question now is: Why do you care? The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations. Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
>
> Well, it's not about curiosity, it's more about performance.
>
> I will make a PHP example (a really quite simple )
>
> PHP :
>
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
>
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
>
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.
>
> Because I was trying to use "is" as for "===".

PHP '==' has no direct correspondence in Python. '===' in PHP is more like '==' in Python (but not exactly the same).

In PHP, $x === $y is true if both variables are of the same type *and* both have the same value. $x == $y checks only the values, doing type conversions as needed, even string -> number; there is no equivalent operator in Python. PHP === is called "identity" but isn't related to the "is" operator in Python; there is no identity test in PHP with the Python semantics.

PHP:

1 == 1                           TRUE
1 == 1.0                         TRUE
1 == "1"                         TRUE
1 == "1.0"                       TRUE
1 === 1                          TRUE
1 === 1.0                        FALSE
1 === "1"                        FALSE
1 === "1.0"                      FALSE
array(1,2,3) == array(1,2,3)     TRUE
array(1,2,3) === array(1,2,3)    TRUE

Python:

1 == 1                  True
1 == 1.0                True
1 == "1"                False
1 == "1.0"              False
[1,2,3] == [1,2,3]      True
[1,2,3] is [1,2,3]      False

So, don't try to translate concepts from one language to another. (Ok, it's natural to try to do that if you know PHP, but it doesn't work. You have to know the differences.)

-- 
Gabriel Genellina
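The Python half of Gabriel's table can be checked directly. The helper below is hypothetical: a rough analogue of PHP's `===` for simple scalar values (same type and equal value), which makes no attempt to model PHP's rules for arrays or objects:

```python
def strict_equal(x, y):
    # Hypothetical helper: same type AND equal value, roughly PHP's ===
    # for scalars. PHP's array/object comparison rules are not modelled.
    return type(x) is type(y) and x == y

print(1 == 1.0, strict_equal(1, 1.0))  # True False
print(1 == "1", strict_equal(1, "1"))  # False False: no str -> int conversion

a = [1, 2, 3]
b = [1, 2, 3]
print(a == b, a is b)                  # True False: equal, but two lists
```

Note that neither `==` nor this helper is an identity test; only `is` answers "same object?".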
Re: String Identity Test
Avetis KAZARIAN wrote:
> Gary Herron wrote:
>> The question now is: Why do you care? The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations. Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
>
> Well, it's not about curiosity, it's more about performance.
>
> I will make a PHP example (a really quite simple )
>
> PHP :
>
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
>
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
>
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.
>
> Because I was trying to use "is" as for "===".

Suppose you write

    a = b

Thereafter, unless some further assignment is made to either a or b, you are guaranteed that "a is b" returns True. This is pretty much the only guarantee you have. There is no guarantee (across all implementations) that

    a = some-expression
    b = some-equivalent-expression

will leave "a is b" True.

Does PHP really keep only one copy of every string? Sounds like that could slow string creation down a little. Essentially it's keeping all strings in a set. Of course you could do that in Python if you wanted, but it would certainly slow things down.

Anyway, thanks for looking at Python. I hope you continue to enjoy it!

regards
 Steve
-- 
Steve Holden       +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
Re: String Identity Test
Avetis KAZARIAN wrote:
> Gary Herron wrote:
>> The question now is: Why do you care? The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations. Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
>
> Well, it's not about curiosity, it's more about performance.
>
> I will make a PHP example (a really quite simple )
>
> PHP :
>
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
>
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
>
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.
>
> Because I was trying to use "is" as for "===".

So you have two very long strings that may be equal. How did you get them? If you read them from a file, that took much more time than the comparison. If they are sufficiently likely to be not equal, just read them in smaller chunks and compare these. If you want to compare multiple combinations, use hashes.

If 'a is b' worked like 'a == b' for arbitrary strings, that would mean that the Python implementation had done a lot of unnecessary 'a == b' comparisons behind the scenes, or at least calculated a lot of hash values, i.e. the ability to use the fast operation would in effect slow down your program.

Peter
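A sketch of Peter's two suggestions, assuming the data arrives as binary streams (files opened with 'rb', or `io.BytesIO` in the demo): compare in chunks and stop at the first difference, or compute one digest per input when many pairs need checking. The function names and the 64 KiB chunk size are illustrative choices:

```python
import hashlib
import io

def streams_equal(f1, f2, chunk_size=64 * 1024):
    """Compare two binary streams chunk by chunk, stopping at the first difference.

    Assumes read(n) returns a full chunk until end of stream, as it does for
    regular files and io.BytesIO (not necessarily for pipes or sockets).
    """
    while True:
        c1 = f1.read(chunk_size)
        c2 = f2.read(chunk_size)
        if c1 != c2:
            return False
        if not c1:  # both streams ended at the same point
            return True

def stream_digest(f, chunk_size=64 * 1024):
    """SHA-256 of a binary stream read in chunks. Compute it once per input,
    then compare digests; equal digests mean equal content barring an
    (extremely unlikely) SHA-256 collision."""
    h = hashlib.sha256()
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

data = b"some long content " * 10_000
print(streams_equal(io.BytesIO(data), io.BytesIO(data)))                   # True
print(streams_equal(io.BytesIO(data), io.BytesIO(data + b"!")))            # False
print(stream_digest(io.BytesIO(data)) == stream_digest(io.BytesIO(data)))  # True
```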
Re: String Identity Test
Everything's clear now. Thanks all (especially Christian and Tino) :]
Re: String Identity Test
Avetis KAZARIAN wrote:
> Gary Herron wrote:
>> The question now is: Why do you care? The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations. Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
>
> Well, it's not about curiosity, it's more about performance.
>
> I will make a PHP example (a really quite simple )
>
> PHP :
>
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
>
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
>
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.

Python uses some tricks to speed up string comparison. The struct of the string type contains the length of the string and it caches the hash of the string, too. s1 == s2 is broken down into several steps. Here is the Python equivalent of the C code (the function name is only illustrative):

    def str_equal(s1, s2):
        # for strings, identity is always equality
        if s1 is s2:
            return True
        # compare the size
        if len(s1) != len(s2):
            return False
        # special case strings with a length of one
        if len(s1) == 1 and s1[0] == s2[0]:
            return True
        # compare the hash
        if hash(s1) != hash(s2):
            return False
        # if size and hash are equal compare every char of the str
        for i in range(len(s1)):
            if s1[i] != s2[i]:
                return False
        # it's really the same thing
        return True

Christian
Re: String Identity Test
Avetis KAZARIAN wrote:
> Gary Herron wrote:
>> The question now is: Why do you care? The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations. Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
>
> Well, it's not about curiosity, it's more about performance.
>
> I will make a PHP example (a really quite simple )
>
> PHP :
>
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
>
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
>
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.

Please keep in mind that in both cases there is nothing "for free". To have identity, you would need to have the same object - which in the case of a string means the interpreter has to find an existing string with exactly the same contents and reference it instead of creating a new object in memory. This takes at least as much time (if not more) as just running the comparison on both strings when you need it (aka ==).

If you only have a few strings but compare them often, you could profit from identity, and the overhead of setting it up would be negligible (you can force this in Python with intern()), but in this case I'd think calculating and working with a hash instead should be preferred.

Regards
Tino Wildenhain
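In Python 2 the function Tino refers to is the builtin `intern()`; Python 3 moved it to `sys.intern()`. A minimal sketch of paying the interning cost once and then using `is` for repeated checks; the token list is a stand-in for hypothetical data read at run time:

```python
import sys

# Stand-ins for values read at run time (hypothetical data): equal text,
# but each join() call produces a separate string object.
raw = ["".join(["GE", "T"]), "".join(["PO", "ST"]), "".join(["GE", "T"])]
print(raw[0] == raw[2], raw[0] is raw[2])  # True False

# Pay the interning cost once per value...
tokens = [sys.intern(t) for t in raw]
GET = sys.intern("GET")

# ...after which equal values share one object, so `is` can be used.
print([t is GET for t in tokens])  # [True, False, True]
```

Whether this beats plain `==` for a given workload is something to measure; as Tino says, working with hashes is often the simpler route.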
Re: String Identity Test
Gary Herron wrote:
> The question now is: Why do you care? The properties of strings do
> not depend on the implementation's choice, so you shouldn't care because
> of programming considerations. Perhaps it's just a matter of curiosity
> on your part.
>
> Gary Herron

Well, it's not about curiosity, it's more about performance.

I will make a PHP example (a really quite simple one):

PHP:

Stat 1 : $aVeryLongString == $anOtherVeryLongString
Stat 2 : $aVeryLongString === $anOtherVeryLongString

Stat 2 is really faster than Stat 1 (due to the binary comparison).

As I said, I'm coming from PHP, so I was wondering if there was such a difference in Python.

Because I was trying to use "is" as for "===".
Re: String Identity Test
Avetis KAZARIAN wrote:
> After reading the discussion about the same subject ( From: "Thomas Moore" Date: Tue, 1 Nov 2005 21:45:56 +0800 ), I tried myself some tests with some confusing results (I'm a beginner with Python, I'm coming from PHP)

For immutable objects, identity is essentially irrelevant. Whether an implementation conserves space by reusing immutable objects with a given value, and if so, how, depends on the particular version of a particular implementation. Unless one is interested in interpreter implementation, I advise against paying too much attention to the issue. It seems to generate more confusion than enlightenment.

> How does Python manage strings as objects?

Python the language does not 'manage' objects. Particular interpreters do what they do. The CPython sources are decently readable.

tjr
Re: String Identity Test
Avetis KAZARIAN wrote:
> After reading the discussion about the same subject ( From: "Thomas Moore" Date: Tue, 1 Nov 2005 21:45:56 +0800 ), I tried myself some tests with some confusing results (I'm a beginner with Python, I'm coming from PHP)
>
> # 1. Short alpha-numeric String without space
> a = "b747"
> b = "b747"
> a is b
> True
>
> # 2. Long alpha-numeric String without space
> a = "averylongstringbutreallyaveryveryverylongstringwithabout68characters"
> b = "averylongstringbutreallyaveryveryverylongstringwithabout68characters"
> a is b
> True
>
> # 3. Short alpha-numeric String with space
> a = "x y"
> b = "x y"
> a is b
> False
>
> # 4. Long alpha-numeric String with space
> a = "I love Python it s so much better than PHP but sometimes confusing"
> b = "I love Python it s so much better than PHP but sometimes confusing"
> a is b
> False
>
> # 5. Empty String
> a = ""
> b = ""
> a is b
> True
>
> # 6. Whitecharacter String : space
> a = " "
> b = " "
> a is b
> False
>
> # 7. Whitecharacter String : new line
> a = "\n"
> b = "\n"
> a is b
> False
>
> # 8. Non-ASCII without space
> a = "é"
> b = "é"
> a is b
> False
>
> # 9. Non-ASCII with space
> a = "é à"
> b = "é à"
> a is b
> False
>
> It seems that any strict ASCII alpha-numeric string is instantiated as an unique object, like a "singleton" ( a = "x" and b = "x" => a is b ) and that any non strict ASCII alpha-numeric string is instantiated as a new object every time with a new id.
>
> Conclusion :
>
> How does Python manage strings as objects?
>
> --
> Avétis KAZARIAN

However the implementors want. That may seem a flippant answer, but it's actually accurate.

The choice of whether a new string reuses an existing string or creates a new one is *not* a Python question, but rather a question of implementation. It's a matter of efficiency, and as such each implementation/version of Python may make its own choices. Writing a program that depends on the string identity policy would be considered an erroneous program, and should be avoided.

The question now is: Why do you care? The properties of strings do not depend on the implementation's choice, so you shouldn't care because of programming considerations. Perhaps it's just a matter of curiosity on your part.

Gary Herron
Re: String Identity Test
"Richard Brodie" <[EMAIL PROTECTED]> wrote: > >"Roy Smith" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > >> On the other hand, I can't imagine any reason why you would want to >> define such a class, > >PEP 754? My congratulations on a very subtle and somewhat multicultural joke... -- - Tim Roberts, [EMAIL PROTECTED] Providenza & Boekelheide, Inc. -- http://mail.python.org/mailman/listinfo/python-list
Re: String Identity Test
Hi:

> Were you planning to write code that relied on id(x) being different
> for different but identical strings x or do you just try to understand
> what's going on?

Just try to understand what's going on. Thanks All.
Re: String Identity Test
"Roy Smith" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On the other hand, I can't imagine any reason why you would want to > define such a class, PEP 754? -- http://mail.python.org/mailman/listinfo/python-list
Re: String Identity Test
Duncan Booth <[EMAIL PROTECTED]> wrote:
> If 'a!=b' then it will also be the case that 'a is not b'

That's true for strings, and (as far as I know), all pre-defined types, but it's certainly possible to define a class which violates that.

    class isButNotEqual:
        def __ne__(self, other):
            return True

    a = isButNotEqual()
    b = a
    print("a != b:", a != b)
    print("a is not b:", a is not b)

frame:play$ ./eq.py
a != b: True
a is not b: False

On the other hand, I can't imagine any reason why you would want to define such a class, other than as a demonstration (or part of an obfuscated Python contest).
Re: String Identity Test
Thomas Moore wrote:
> >>> a="test"
> >>> b="test"
> >>> a is b
> True
>
> About identity, I think a is not b, but "a is b" returns True.
> Does that mean equality and identity is the same thing for strings?

Not exactly:

>>> a="this is also a string"
>>> b="this is also a string"
>>> a is b
False

It's the same with integers. Small ones are shared, big ones aren't. Details vary with Python version.

Python sometimes optimizes its memory use by reusing immutable objects. If you've done 'a="test"' and then do 'b="test"', Python sees that it can save some memory here, so instead of creating a new string object on the heap (which is what happened when you did 'a="test"'), it makes 'b' refer to the already existing "test" string object that 'a' refers to. It's roughly as if you had written 'b=a' instead.

Of course, it would *never* do this for mutable objects. 'a=[];b=[];a.append(1)' must leave b empty, otherwise Python would be seriously broken. For immutable objects, this isn't a problem though. Once created, the 'test' string object will always be the same until it's destroyed by garbage collection etc.

Were you planning to write code that relied on id(x) being different for different but identical strings x or do you just try to understand what's going on?
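The point about mutable objects, as a short runnable check: two list displays always create two separate lists, while assignment only adds a second name for the same object:

```python
a = []
b = []
a.append(1)
print(a, b)                    # [1] []
print(a is b)                  # False: two separate list objects

c = a                          # a second name for the same list, not a copy
c.append(2)
print(a)                       # [1, 2]: the change is visible through both names
print(c is a, id(c) == id(a))  # True True
```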
Re: String Identity Test
Thomas Moore wrote:
> I am confused at string identity test:
>
> Does that mean equality and identity is the same thing for strings?
>
Definitely not. What is actually happening is that certain string literals get folded together at compile time to refer to the same string constant, but you should never depend on this happening.

If 'a!=b' then it will also be the case that 'a is not b', but if 'a==b' then there are no guarantees; any observed behaviour is simply an accident of the implementation and could change:

>>> a="test 1"
>>> b="test 1"
>>> a is b
False
>>> a="test"
>>> b="test"
>>> a is b
True
>>>

Testing for identity is only useful in very rare situations; most of the time you are better off just forgetting there is such a test.
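Duncan's one-way rule can be illustrated with strings built at run time, so no compile-time folding of literals is involved. The `is` result between equal strings noted in the comment is what current CPython does, not a guarantee:

```python
# Strings built at run time, so the compiler cannot fold them into one constant.
a = "".join(["test", " 1"])
b = "".join(["test", " 1"])
c = "".join(["test", " 2"])

print(a == b)  # True: same value
print(a is b)  # False on CPython here, but no implementation guarantees either answer

# The direction that always holds: unequal strings are never the same object.
print(a != c and a is not c)  # True
```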
Re: String Identity Test
Thomas Moore wrote:
> I am confused at string identity test:
>
> Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> a="test"
> >>> b="test"
> >>> a is b
> True
>
> About identity, I think a is not b, but "a is b" returns True.

for the string literals you used, a is b:

>>> a = "test"
>>> b = "test"
>>> id(a)
10634848
>>> id(b)
10634848
>>> a is b
True
>>> a == b
True

> Does that mean equality and identity is the same thing for strings?

nope.

>>> a = "test!"
>>> b = "test!"
>>> id(a)
10635264
>>> id(b)
10636256
>>> a is b
False
>>> a == b
True

the current CPython implementation automatically interns string literals that happen to look like identifiers. this is an implementation detail, and nothing you can rely on.

the current CPython implementation also "interns" single-character strings, so most instances of, say, the string "A" will point to the same object. this is also an implementation detail, and nothing you can rely on.
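In Python 3, the counterpart of the single-character caching described above is, in CPython, a cache of one-character strings in the Latin-1 range. A small check, using `chr()` so each string is created at run time; this is CPython-specific and, as noted above, nothing to rely on:

```python
# chr() builds its result at run time, so no compile-time sharing is involved.
print(chr(65) is chr(65))          # Latin-1 character: CPython returns a cached object
print(chr(0x4E2D) is chr(0x4E2D))  # outside Latin-1: a fresh object on each call
print(chr(0x4E2D) == chr(0x4E2D))  # equality holds either way
```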
Re: String Identity Test
Thomas Moore wrote:
> I am confused at string identity test:
> >>> a="test"
> >>> b="test"
> >>> a is b
> True
>
> About identity, I think a is not b, but "a is b" returns True.
> Does that mean equality and identity is the same thing for strings?

Nope:

>>> a = 'te' + 'st'
>>> b = 'test'
>>> a is b
False

You're seeing a coincidence of the implementation.
-- 
Benji York