Re: String Identity Test

2009-03-05 Thread Terry Reedy

Terry Reedy wrote:
> Hendrik van Rooyen wrote:
> > "S Arrowsmith"  wrote:
> > > "Small" integers get a similar treatment:
> > >
> > > >>> a = 256
> > > >>> b = 256
> > > >>> a is b
> > > True
> > > >>> a = 257
> > > >>> b = 257
> > > >>> a is b
> > > False
> >
> > This is weird - I would have thought that the limit
> > of "small" would be at 255 - the biggest number to fit in a byte.  256
> > takes two bytes, so it must be

Ints take at least 4 bytes.  It is how commonly a value is used that
determined caching.  The range was expanded a few years ago in anticipation
of the new bytes type, whose contents are ints, not chars.

> > an arbitrary limit - could have been set at 300,
> > or 30 000...
>
> 'Small' also goes to -10 or so.  256 was included, at minuscule cost,
> because it is a relatively common number, being the number of bytes.

In fact, Python 3.0.1 starts with 36 internal references to the cached int 256!

>>> import sys
>>> sys.getrefcount(256)
38 # -2 for the function call

>>> sys.getrefcount(257)
2

>>> [sys.getrefcount(i)-2 for i in range(258)]

shows that only 15 cached ints start with more references. 0 has the
most with 724 (and that small actually goes to -5).
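A rough way to see where the int cache ends on the interpreter you are running is to build each value twice at run time and check whether the two copies are the same object. This is a sketch that assumes CPython; `cached_ints` is a hypothetical helper, the probe range is arbitrary, and other implementations may report something quite different.

```python
def cached_ints(lo=-10, hi=300):
    """Return the ints in [lo, hi] for which two separately built copies
    turn out to be the same object (CPython's small-int cache)."""
    cached = []
    for n in range(lo, hi + 1):
        # int(str(n)) builds the value at run time, so the compiler
        # cannot merge the two copies into one shared constant.
        a = int(str(n))
        b = int(str(n))
        if a is b:
            cached.append(n)
    return cached

found = cached_ints()
print("first cached:", found[0], "last cached:", found[-1])
```

Whatever range it reports is an implementation detail of that particular build, which is exactly why code should never depend on it.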


tjr

--
http://mail.python.org/mailman/listinfo/python-list


Re: String Identity Test

2009-03-05 Thread Terry Reedy

Hendrik van Rooyen wrote:
> "S Arrowsmith"  wrote:
> > "Small" integers get a similar treatment:
> >
> > >>> a = 256
> > >>> b = 256
> > >>> a is b
> > True
> > >>> a = 257
> > >>> b = 257
> > >>> a is b
> > False
>
> This is weird - I would have thought that the limit
> of "small" would be at 255 - the biggest number to
> fit in a byte.  256 takes two bytes, so it must be
> an arbitrary limit - could have been set at 300,
> or 30 000...

'Small' also goes to -10 or so.  256 was included, at minuscule cost,
because it is a relatively common number, being the number of bytes.




Re: String Identity Test

2009-03-05 Thread Bruno Desthuilliers

Avetis KAZARIAN wrote:


> Well, it's not about curiosity, it's more about performance.


Steve Holden wrote:

(snip)

So, don't try to translate concepts from one language to another.


I'll try ;]


Also and FWIW:

1/ Python has some very handy tools when it comes to performance - like a
couple of profilers (to identify bottlenecks), or the timeit module (for
quick benchmarks).


2/ Most "best practice" idioms are frequently discussed here

3/ If you have performance problems related to wrong algorithm/data 
structure, some of us here _really_ enjoy helping !-)
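A minimal `timeit` sketch along the lines of point 1, comparing `==` on two equal but separately built long strings, `==` on a string and itself, and `is`. The function name, string length and repetition count are arbitrary choices for illustration; absolute timings depend on machine and version.

```python
import timeit

def time_comparisons(length=1_000_000, number=200):
    """Time three ways of comparing long strings; returns seconds per label."""
    a = "x" * length
    b = "x" * length   # equal contents, but a separate object built at run time
    c = a              # the very same object as a
    return {
        "== (equal, distinct objects)": timeit.timeit(lambda: a == b, number=number),
        "== (same object)": timeit.timeit(lambda: a == c, number=number),
        "is (distinct objects)": timeit.timeit(lambda: a is b, number=number),
    }

for label, seconds in time_comparisons().items():
    print(f"{label:30} {seconds:.6f} s")
```

In CPython the same-object `==` can return as soon as it sees both operands are one object, while the distinct-object case has to scan the contents, so the first entry is expected to be the slowest by a wide margin.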


Welcome aboard.


Re: String Identity Test

2009-03-05 Thread Bruno Desthuilliers

Hendrik van Rooyen wrote:
> "S Arrowsmith"  wrote:
> > "Small" integers get a similar treatment:
> >
> > >>> a = 256
> > >>> b = 256
> > >>> a is b
> > True
> > >>> a = 257
> > >>> b = 257
> > >>> a is b
> > False
>
> This is weird - I would have thought that the limit
> of "small" would be at 255 - the biggest number to
> fit in a byte.  256 takes two bytes, so it must be
> an arbitrary limit

It is, and has changed from version to version.


Re: String Identity Test

2009-03-04 Thread Hendrik van Rooyen
"S Arrowsmith"  wrote:

> "Small" integers get a similar treatment:
> 
> >>> a = 256
> >>> b = 256
> >>> a is b
> True
> >>> a = 257
> >>> b = 257
> >>> a is b
> False

This is weird - I would have thought that the limit
of "small" would be at 255 - the biggest number to 
fit in a byte.  256 takes two bytes, so it must be
an arbitrary limit - could have been set at 300,
or 30 000...

- Hendrik




Re: String Identity Test

2009-03-04 Thread S Arrowsmith
Avetis KAZARIAN   wrote:
>It seems that any strict ASCII alpha-numeric string is instantiated as
>an unique object, like a "singleton" ( a = "x" and b = "x" => a is b )
>and that any non strict ASCII alpha-numeric string is instantiated as
>a new object every time with a new id.

What no-one appears to have mentioned so far is that the purpose
of this implementation detail is to ensure that there is a single
instance of strings which are valid identifiers, so that you don't
go around creating and destroying string instances just to do an
attribute look-up on an object. A few strings which are not valid
as identifiers get swept up into this system:

>>> a = "1"
>>> b = "1"
>>> a is b
True

"Small" integers get a similar treatment:

>>> a = 256
>>> b = 256
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False

But as has hopefully been made clear, all this is completely an
implementation detail. (Indeed, the range of "interned" integers
changed from 0--99 to -5--256 a few versions ago.) So don't,
under any circumstances, rely on it, even when you understand
what's going on.

-- 
\S

   under construction



Re: String Identity Test

2009-03-04 Thread Avetis KAZARIAN

Steve Holden wrote:
> Does PHP really keep only one copy of every string?

Not at all.

I might have said something confusing if you understood that...

> So, don't try to translate concepts from one language to another.
>
> --
> Gabriel Genellina

I'll try ;]


Re: String Identity Test

2009-03-04 Thread Gabriel Genellina
On Wed, 04 Mar 2009 07:07:44 -0200, Avetis KAZARIAN wrote:

> Gary Herron wrote:
>> The question now is:  Why do you care?   The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations.  Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
>
> Well, it's not about curiosity, it's more about performance.
>
> I will make a PHP example (a really quite simple )
>
> PHP :
>
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
>
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
>
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.
>
> Because I was trying to use "is" as for "===".


PHP '==' has no direct correspondence in Python. '===' in PHP is more like  
'==' in Python (but not exactly the same).
In PHP, $x === $y is true if both variables are of the same type *and*  
both have the same value. $x == $y checks only the values, doing type  
conversions as needed, even string -> number; there is no equivalent  
operator in Python. PHP === is called "identity" but isn't related to the  
"is" operator in Python; there is no identity test in PHP with the Python  
semantics.


PHP:
1 == 1
TRUE

1 == 1.0
TRUE

1 == "1"
TRUE

1 == "1.0"
TRUE

1 === 1
TRUE

1 === 1.0
FALSE

1 === "1"
FALSE

1 === "1.0"
FALSE

array(1,2,3) == array(1,2,3)
TRUE

array(1,2,3) === array(1,2,3)
TRUE


Python:
1 == 1
True

1 == 1.0
True

1 == "1"
False

1 == "1.0"
False

[1,2,3] == [1,2,3]
True

[1,2,3] is [1,2,3]
False
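Purely to make the difference concrete (not as something to use in real code), PHP's `===` on scalars and arrays can be roughly imitated in Python by requiring the same type as well as an equal value. `php_strict_equal` is a hypothetical name, and the sketch ignores many PHP details such as object and array-key semantics.

```python
def php_strict_equal(x, y):
    """Rough imitation of PHP's ===: same type and equal value."""
    return type(x) is type(y) and x == y

print(php_strict_equal(1, 1))                   # True,  like PHP 1 === 1
print(php_strict_equal(1, 1.0))                 # False, like PHP 1 === 1.0
print(php_strict_equal(1, "1"))                 # False, like PHP 1 === "1"
print(php_strict_equal([1, 2, 3], [1, 2, 3]))   # True,  like PHP array(1,2,3) === array(1,2,3)
```

Note that none of this involves `is`: the PHP comparison is about type and value, not object identity.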


So, don't try to translate concepts from one language to another. (Ok,
it's natural to try to do that if you know PHP, but it doesn't work. You have
to know the differences.)

--
Gabriel Genellina



Re: String Identity Test

2009-03-04 Thread Steve Holden
Avetis KAZARIAN wrote:
> Gary Herron wrote:
>> The question now is:  Why do you care?   The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations.  Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
> 
> Well, it's not about curiosity, it's more about performance.
> 
> I will make a PHP example (a really quite simple )
> 
> PHP :
> 
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
> 
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
> 
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.
> 
> Because I was trying to use "is" as for "===".

Suppose you write

a = b

Thereafter, unless some further assignment is made to either a or b, you
are guaranteed that "a is b" returns True.

This is pretty much the only guarantee you have. There is no guarantee
(across all implementations) that

a = some-expression

b = some-equivalent-expression

will leave "a is b" True.
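A small sketch of the one guarantee described above, plus the portable alternative for everything else (compare values with `==`). The strings are built at run time with `str.join` only to keep compile-time constant sharing out of the picture.

```python
a = "".join(["some", " ", "text"])
b = a                                  # plain assignment: b names the same object
assert a is b                          # guaranteed by the language

c = "".join(["some", " ", "text"])     # an equivalent expression
assert a == c                          # equal values: guaranteed
print(a is c)                          # implementation-dependent; don't rely on it
```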

Does PHP really keep only one copy of every string? Sounds like that
could slow string creation down a little. Essentially it's keeping all
strings in a set. Of course you could do that in Python if you wanted,
but it would certainly slow things down.
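Keeping every string in a set, as speculated above, can be sketched in a few lines with a dict that maps each string to its canonical copy. `canonical` and `_pool` are hypothetical names; every call pays for a hash and a lookup, which is the slowdown mentioned. (For strings Python already provides this as the built-in `intern()` in Python 2 and `sys.intern()` in Python 3.)

```python
_pool = {}

def canonical(s):
    """Return the first-seen string equal to s, storing s if it is new."""
    # setdefault hashes s, looks it up, and inserts it only when absent.
    return _pool.setdefault(s, s)

x = canonical("".join(["long ", "string"]))
y = canonical("".join(["long ", "string"]))
print(x is y)   # True: both names now refer to the pooled object
```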

Anyway, thanks for looking at Python. I hope you continue to enjoy it!

regards
 Steve
-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/



Re: String Identity Test

2009-03-04 Thread Peter Otten
Avetis KAZARIAN wrote:

> Gary Herron wrote:
>> The question now is:  Why do you care?   The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations.  Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
> 
> Well, it's not about curiosity, it's more about performance.
> 
> I will make a PHP example (a really quite simple )
> 
> PHP :
> 
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
> 
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
> 
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.
> 
> Because I was trying to use "is" as for "===".

So you have two very long strings that may be equal. How did you get them?
If you read them from a file, that took much more time than the comparison. 

If they are sufficiently likely to be not equal just read them in smaller
chunks and compare these. If you want to compare multiple combinations use
hashes.
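A sketch of both suggestions, assuming the two strings live in files: read them in fixed-size chunks and stop at the first difference, or reduce each file to a digest when many combinations have to be compared. `files_equal`, `file_digest_hex` and the 64 KiB chunk size are illustrative choices, and SHA-256 from `hashlib` is an assumption (any strong hash would do).

```python
import hashlib

CHUNK = 64 * 1024  # illustrative chunk size

def files_equal(path_a, path_b, chunk=CHUNK):
    """Compare two files chunk by chunk, stopping at the first mismatch."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if a != b:
                return False
            if not a:          # both files exhausted at the same point
                return True

def file_digest_hex(path, chunk=CHUNK):
    """Hash a file incrementally; equal digests mean (almost certainly) equal contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()
```

With many files, computing each digest once and grouping paths by digest replaces a quadratic number of full comparisons with one pass over each file.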

If 'a is b' worked like 'a == b' for arbitrary strings, that would mean that
the Python implementation had done a lot of unnecessary 'a == b'
comparisons behind the scenes, or at least calculated a lot of hash values,
i.e. the ability to use the fast operation would in effect slow down your
program.

Peter


Re: String Identity Test

2009-03-04 Thread Avetis KAZARIAN
Everything's clear now.

Thanks all (especially Christian and Tino) :]


Re: String Identity Test

2009-03-04 Thread Christian Heimes
Avetis KAZARIAN wrote:
> Gary Herron wrote:
>> The question now is:  Why do you care?   The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations.  Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
> 
> Well, it's not about curiosity, it's more about performance.
> 
> I will make a PHP example (a really quite simple )
> 
> PHP :
> 
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
> 
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
> 
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.

Python uses some tricks to speed up string comparison. The struct of the
string type contains the length of the string and it caches the hash of
the string, too.

s1 == s2 is broken down to several steps. Here is the Python equivalent
of the C code:

# wrapped in a function (illustrative name) so that the returns are valid
def str_eq(s1, s2):
    # for strings, identity is always equality
    if s1 is s2:
        return True

    # compare the size
    if len(s1) != len(s2):
        return False

    # special case strings with a length of one
    if len(s1) == 1 and s1[0] == s2[0]:
        return True

    # compare the hash
    if hash(s1) != hash(s2):
        return False

    # if size and hash are equal compare every char of the str
    for i in range(len(s1)):
        if s1[i] != s2[i]:
            return False

    # it's really the same thing
    return True

Christian



Re: String Identity Test

2009-03-04 Thread Tino Wildenhain

Avetis KAZARIAN wrote:
> Gary Herron wrote:
>> The question now is:  Why do you care?   The properties of strings do
>> not depend on the implementation's choice, so you shouldn't care because
>> of programming considerations.  Perhaps it's just a matter of curiosity
>> on your part.
>>
>> Gary Herron
>
> Well, it's not about curiosity, it's more about performance.
>
> I will make a PHP example (a really quite simple )
>
> PHP :
>
> Stat 1 : $aVeryLongString == $anOtherVeryLongString
> Stat 2 : $aVeryLongString === $anOtherVeryLongString
>
> Stat 2 is really faster than Stat 1 (due to the binary comparison)
>
> As I said, I'm coming from PHP, so I was wondering if there was such a
> difference in Python.


Please keep in mind that in both cases nothing comes "for free".
To have identity, you would need to have the same object - which
in the case of a string means the interpreter has to find an
existing string with exactly the same contents and reference it
instead of creating a new object in memory. This takes at least
the same time (if not more) than just running the comparison on both
strings when you need it (aka == ).

If you only have a few strings but compare them often, you could
profit from identity, and the overhead of setting it up would
be negligible (you can force this in Python with intern()),
but in this case I'd think calculating and working with a hash
instead should be preferred.
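A minimal sketch of forcing identity for strings built at run time. In Python 2 the built-in is `intern()`; the code below assumes Python 3, where the same function lives at `sys.intern()`. Once both names hold the interned copy, `is` gives the right answer, but each interning call cost a hash and a table lookup.

```python
import sys

raw_a = "".join(["status", "_", "ok"])   # built at run time
raw_b = "".join(["status", "_", "ok"])   # equal value, typically a separate object

a = sys.intern(raw_a)
b = sys.intern(raw_b)
print(a is b)        # True: sys.intern returns one canonical object per value
print(a == raw_b)    # True: interning never changes the value
```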

Regards
Tino Wildenhain




Re: String Identity Test

2009-03-04 Thread Avetis KAZARIAN
Gary Herron wrote:
> The question now is:  Why do you care?   The properties of strings do
> not depend on the implementation's choice, so you shouldn't care because
> of programming considerations.  Perhaps it's just a matter of curiosity
> on your part.
>
> Gary Herron

Well, it's not about curiosity, it's more about performance.

I will make a PHP example (a really quite simple )

PHP :

Stat 1 : $aVeryLongString == $anOtherVeryLongString
Stat 2 : $aVeryLongString === $anOtherVeryLongString

Stat 2 is really faster than Stat 1 (due to the binary comparison)

As I said, I'm coming from PHP, so I was wondering if there was such a
difference in Python.

Because I was trying to use "is" as for "===".


Re: String Identity Test

2009-03-04 Thread Terry Reedy

Avetis KAZARIAN wrote:
> After reading the discussion about the same subject ( From: "Thomas
> Moore"  Date: Tue, 1 Nov 2005 21:45:56
> +0800 ), I tried myself some tests with some confusing results (I'm a
> beginner with Python, I'm coming from PHP)

For immutable objects, identity is essentially irrelevant.  Whether an
implementation conserves space by reusing immutable objects with a given
value, and if so, how, depends on the particular version of a
particular implementation.  Unless one is interested in interpreter
implementation, I advise against paying too much attention to the issue.
It seems to generate more confusion than enlightenment.

> How does Python manage strings as objects?

Python the language does not 'manage' objects.  Particular interpreters
do what they do.  The CPython sources are decently readable.


tjr





Re: String Identity Test

2009-03-03 Thread Gary Herron

Avetis KAZARIAN wrote:
> After reading the discussion about the same subject ( From: "Thomas
> Moore"  Date: Tue, 1 Nov 2005 21:45:56
> +0800 ), I tried myself some tests with some confusing results (I'm a
> beginner with Python, I'm coming from PHP)
>
> # 1. Short alpha-numeric String without space
>
> >>> a = "b747"
> >>> b = "b747"
> >>> a is b
> True
>
> # 2. Long alpha-numeric String without space
>
> >>> a = "averylongstringbutreallyaveryveryverylongstringwithabout68characters"
> >>> b = "averylongstringbutreallyaveryveryverylongstringwithabout68characters"
> >>> a is b
> True
>
> # 3. Short alpha-numeric String with space
>
> >>> a = "x y"
> >>> b = "x y"
> >>> a is b
> False
>
> # 4. Long alpha-numeric String with space
>
> >>> a = "I love Python it s so much better than PHP but sometimes confusing"
> >>> b = "I love Python it s so much better than PHP but sometimes confusing"
> >>> a is b
> False
>
> # 5. Empty String
>
> >>> a = ""
> >>> b = ""
> >>> a is b
> True
>
> # 6. Whitecharacter  String : space
>
> >>> a = " "
> >>> b = " "
> >>> a is b
> False
>
> # 7. Whitecharacter String : new line
>
> >>> a = "\n"
> >>> b = "\n"
> >>> a is b
> False
>
> # 8. Non-ASCII without space
>
> >>> a = "é"
> >>> b = "é"
> >>> a is b
> False
>
> # 9. Non-ASCII with space
>
> >>> a = "é à"
> >>> b = "é à"
> >>> a is b
> False
>
> It seems that any strict ASCII alpha-numeric string is instantiated as
> an unique object, like a "singleton" ( a = "x" and b = "x" => a is b )
> and that any non strict ASCII alpha-numeric string is instantiated as
> a new object every time with a new id.
>
> Conclusion :
>
> How does Python manage strings as objects?

However the implementors want. 

That may seem a flippant answer, but it's actually accurate.  The choice 
of whether a new string reuses an existing string or creates a new one 
is *not* a Python question, but rather a question of implementation.  
It's a matter of efficiency, and as such each implementation/version of 
Python may make its own choices.   Writing a program that depends on the 
string identity policy would be considered an erroneous program, and 
should be avoided. 

The question now is:  Why do you care?   The properties of strings do 
not depend on the implementation's choice, so you shouldn't care because 
of programming considerations.  Perhaps it's just a matter of curiosity 
on your part.



Gary Herron





--
Avétis KAZARIAN


Re: String Identity Test

2005-11-02 Thread Tim Roberts
"Richard Brodie" <[EMAIL PROTECTED]> wrote:
>
>"Roy Smith" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
>
>> On the other hand, I can't imagine any reason why you would want to
>> define such a class,
>
>PEP 754?

My congratulations on a very subtle and somewhat multicultural joke...
-- 
- Tim Roberts, [EMAIL PROTECTED]
  Providenza & Boekelheide, Inc.


Re: String Identity Test

2005-11-02 Thread Thomas Moore
Hi:
> Were you planning to write code that relied on id(x) being different
> for different but identical strings x or do you just try to understand
> what's going on?
> 
Just try to understand what's going on.

Thanks All.




Re: String Identity Test

2005-11-01 Thread Richard Brodie

"Roy Smith" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

> On the other hand, I can't imagine any reason why you would want to
> define such a class,

PEP 754?
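Presumably the hint: PEP 754 concerns IEEE 754 special floating-point values, and a NaN already behaves like Roy's class, comparing unequal to itself while still being identical to itself. A quick check:

```python
nan = float("nan")
print(nan != nan)     # True: an IEEE 754 NaN is unequal to everything, itself included
print(nan is nan)     # True: it is still one and the same object
```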




Re: String Identity Test

2005-11-01 Thread Roy Smith
Duncan Booth  <[EMAIL PROTECTED]> wrote:
> If 'a!=b' then it will also be the case that 'a is not b'

That's true for strings, and (as far as I know), all pre-defined
types, but it's certainly possible to define a class which violates
that.

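# Python 2 syntax (print statement), as used in this thread
class isButNotEqual:
    # claims inequality with everything, including itself
    def __ne__(self, other):
        return True

a = isButNotEqual()
b = a
print "a != b:", a != b
print "a is not b:", a is not b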


frame:play$ ./eq.py
a != b: True
a is not b: False

On the other hand, I can't imagine any reason why you would want to
define such a class, other than as a demonstration (or part of an
obfuscated Python contest).


Re: String Identity Test

2005-11-01 Thread Magnus Lycka
Thomas Moore wrote:
> >>> a="test"
> >>> b="test"
> >>> a is b
> 
> True
> 
> About identity, I think a is not b, but "a is b" returns True.
> Does that mean equality and identity is the same thing for strings?

Not exactly:

>>> a="this is also a string"
>>> b="this is also a string"
>>> a is b
False

It's the same with integers. Small ones are shared, big ones aren't.
Details vary with Python version.

Python sometimes optimizes its memory use by reusing immutable objects.
If you've done 'a="test"', and then do 'b="test"', Python sees that it can
save some memory here, so instead of creating a new string object on the
heap (which is what happened when you did 'a="test"'), it makes 'b'
refer to that already existing "test" string object that 'a' refers to.
It's roughly as if you would have written 'b=a' instead.

Of course, it would *never* do this for mutable objects.
'a=[];b=[];a.append(1)' must leave b empty, otherwise Python would be
seriously broken. For immutable objects, this isn't a problem though.
Once created, the 'test' string object will always be the same until
it's destroyed by garbage collection etc.
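The point about mutable objects, as a short check: two empty-list displays always create two distinct lists, so appending to one cannot affect the other.

```python
a = []
b = []
a.append(1)
print(a, b)        # [1] []
print(a is b)      # False: each [] creates a new list
```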

Were you planning to write code that relied on id(x) being different
for different but identical strings x or do you just try to understand
what's going on?

Re: String Identity Test

2005-11-01 Thread Duncan Booth
Thomas Moore wrote:
> I am confused at string identity test:
> 


> Does that mean equality and identity is the same thing for strings?
> 
Definitely not. What is actually happening is that certain string literals 
get folded together at compile time to refer to the same string constant, 
but you should never depend on this happening.

If 'a!=b' then it will also be the case that 'a is not b', but if 'a==b' 
then there are no guarantees; any observed behaviour is simply an accident 
of the implementation and could change:

>>> a="test 1"
>>> b="test 1"
>>> a is b
False
>>> a="test"
>>> b="test"
>>> a is b
True
>>> 
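One consequence of compile-time folding, sketched for CPython: the interactive prompt compiles each line separately, but when both assignments are compiled together (as in a module, or a single `exec` call as below), equal literals may end up as one shared constant. Whether that happens is an implementation detail, so the sketch only prints the identity result instead of relying on it.

```python
source = 'a = "test 1"\nb = "test 1"\n'
namespace = {}
exec(compile(source, "<sketch>", "exec"), namespace)
print(namespace["a"] == namespace["b"])   # True: always
print(namespace["a"] is namespace["b"])   # typically True in CPython, but not guaranteed
```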

Testing for identity is only useful in very rare situations, most of the 
time you are better just to forget there is such a test.



Re: String Identity Test

2005-11-01 Thread Fredrik Lundh
Thomas Moore wrote:

> I am confused at string identity test:
>
> Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> a="test"
> >>> b="test"
> >>> a is b
> True

>
> About identity, I think a is not b, but "a is b" returns True.

for the string literals you used, a is b:

>>> a = "test"
>>> b = "test"
>>> id(a)
10634848
>>> id(b)
10634848
>>> a is b
True
>>> a == b
True

> Does that mean equality and identity is the same thing for strings?

nope.

>>> a = "test!"
>>> b = "test!"
>>> id(a)
10635264
>>> id(b)
10636256
>>> a is b
False
>>> a == b
True

the current CPython implementation automatically interns string literals that
happen to look like identifiers.  this is an implementation detail, and nothing
you can rely on.

the current CPython implementation also "interns" single-character strings,
so most instances of, say, the string "A" will point to the same object.  this
is also an implementation detail, and nothing you can rely on.

 





Re: String Identity Test

2005-11-01 Thread Benji York
Thomas Moore wrote:
> I am confused at string identity test:

> >>> a="test"
> >>> b="test"
> >>> a is b
> 
> True

> About identity, I think a is not b, but "a is b" returns True.
> Does that mean equality and identity is the same thing for strings?

Nope:

 >>> a = 'te' + 'st'
 >>> b = 'test'
 >>> a is b
False

You're seeing a coincidence of the implementation.
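A caveat for newer interpreters: later CPython releases fold `'te' + 'st'` into the constant `'test'` at compile time, so the exact example above may print True there. Building the string at run time avoids the folding and still shows the point; this assumes CPython and is an implementation detail either way.

```python
a = "".join(["te", "st"])   # built at run time, not folded by the compiler
b = "test"
print(a == b)   # True
print(a is b)   # False in CPython: a is a fresh, un-interned string
```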
--
Benji York