Re: same code to login,one is ok,another is not

2011-08-15 Thread Rafael Durán Castañeda
The first one uses HTTP and the second one HTTPS. Did you try an HTTPS handler,
as I already pointed out to you in the other thread on the same topic?

Please don't spam the list. If you aren't getting the answers you were
looking for, wait a while and then repost; don't just keep opening threads
until you get an answer.

2011/8/15 Steven D'Aprano steve+comp.lang.pyt...@pearwood.info

 守株待兔 wrote:

  1.http://www.renren.com/Login.do
  it is ok,my code:
 [...]
  2.https://passport.baidu.com/?login
  can't login,my code:
 [...]

 Do you have a question, or are you just sharing the bad news?

 Websites may choose to respond to login attempts differently. Some may
 require cookies, some may not. Some may check the referrer, some may not.
 Some may look at the user agent, some may not.

 If the web developer of the site insists that you log in with a browser, or
 Internet Explorer, you have to fight to convince the web server to let you
 in. Many websites really try hard to prevent bots and scripts logging in.
 The closer you can imitate what a real human being in a browser does, the
 better the chances you can fool the server that you are a real human being
 using a browser and not a bot. (Since your script *is* a bot, you may also
 be in violation of the web site's terms of service.)

 Some web sites may even check how often you try to log in, or how fast.
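
A minimal sketch of what imitating a browser can look like with Python 3's
standard library, assuming the site wants cookies, a Referer and a User-Agent.
The form field names, the User-Agent string and the helper names are
placeholders, not anything the sites above are known to expect, and no request
is actually sent here:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_browser_like_opener(referer, user_agent="Mozilla/5.0 (X11; Linux x86_64)"):
    """Return (opener, cookie_jar) that keeps cookies and sends browser-like headers."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar),  # remember cookies between requests
        urllib.request.HTTPSHandler(),            # needed for https:// URLs
    )
    opener.addheaders = [("User-Agent", user_agent), ("Referer", referer)]
    return opener, jar


def encode_form(fields):
    """Encode form fields the way a browser would for a POST body."""
    return urllib.parse.urlencode(fields).encode("ascii")


opener, jar = build_browser_like_opener("https://passport.baidu.com/?login")
# Hypothetical field names; inspect the real login form to find the right ones.
body = encode_form({"username": "someone", "password": "secret"})
```

Calling `opener.open(login_url, data=body)` would then issue the POST with the
cookie jar attached; whether a given site accepts it is another matter.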

 But what makes you think you can't log in? Given the response below, it
 looks to me that you did log in, and got a blank page with some javascript
 to redirect you to the real content page. (If you are a web developer and
 you do this, I hate you.) But I may be wrong -- I'm not an expert on these
 things.


  <!--STATUS OK-->
  <html><head><title></title>
  <meta http-equiv=content-type content="text/html; charset=gb2312">
  <META http-equiv='Pragma' content='no-cache'>
  </head>
  <body>

  <script>
  var url="./?pwd=1"
  url=url.replace(/^\.\//gi,"http://passport.baidu.com/");
  location.href=url;
  </script>

  </body>
  </html>
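
A script cannot run the JavaScript in that response, but it can follow the
redirect by hand. The sketch below assumes the page always has the shape shown
above; the function name and the regular expression are illustrative only:

```python
import re
from urllib.parse import urljoin

# The script portion of the response quoted above.
page = r'''<script>
var url="./?pwd=1"
url=url.replace(/^\.\//gi,"http://passport.baidu.com/");
location.href=url;
</script>'''


def script_redirect_target(html, base="http://passport.baidu.com/"):
    """Return the absolute URL the inline script would navigate to, or None."""
    match = re.search(r'var\s+url\s*=\s*"([^"]+)"', html)
    if match is None:
        return None
    # Resolving "./..." against the base mirrors what the replace() call does.
    return urljoin(base, match.group(1))


target = script_redirect_target(page)
print(target)
```

Fetching `target` with the same cookie-carrying opener used for the login
would then show whether the login really succeeded.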


 --
 Steven

 --
 http://mail.python.org/mailman/listinfo/python-list



Re: allow line break at operators

2011-08-15 Thread Terry Reedy

On 8/15/2011 12:28 AM, Seebs wrote:

To repeat again: you are free to put in explicit dedent markers that 
will let you re-indent code should all indents be removed.


--
Terry Jan Reedy



Help needed with using SWIG wrapped code in Python

2011-08-15 Thread Vipul Raheja
Hi,

I have wrapped a library from C++ to Python using SWIG. But I am facing
problems while importing and using it in Python.

$ python
>>> import pyossimtest
>>> import pyossim
>>> a = ["Image1.png","Image2.png"]
>>> b = pyossimtest.Info()
>>> b.initialize(len(a),a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyossimtest.py", line 84, in initialize
    def initialize(self, *args): return _pyossimtest.Info_initialize(self, *args)
TypeError: in method 'Info_initialize', argument 3 of type 'char *[]'

What does this error message imply? I have already handled char** as a
special case in swig using typemaps.


Here is the code excerpt from the swig-generated .cxx file:

SWIGINTERN PyObject *_wrap_Info_initialize(PyObject *SWIGUNUSEDPARM(self),
                                           PyObject *args) {
  PyObject *resultobj = 0;
  pyossimtest::Info *arg1 = (pyossimtest::Info *) 0 ;
  int arg2 ;
  char **arg3 ;
  void *argp1 = 0 ;
  int res1 = 0 ;
  PyObject * obj0 = 0 ;
  PyObject * obj1 = 0 ;
  bool result;

  if (!PyArg_ParseTuple(args,(char *)"OO:Info_initialize",&obj0,&obj1))
    SWIG_fail;
  res1 = SWIG_ConvertPtr(obj0, &argp1,SWIGTYPE_p_pyossimtest__Info, 0 |  0 );
  if (!SWIG_IsOK(res1)) {
    SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "Info_initialize"
        "', argument " "1"" of type '" "pyossimtest::Info *""'");
  }
  arg1 = reinterpret_cast< pyossimtest::Info * >(argp1);
  {
    int i;

    if (!PyList_Check(obj1))
    {
      PyErr_SetString(PyExc_ValueError, "Expecting a list");
      return NULL;
    }

    arg2 = PyList_Size(obj1);
    arg3 = (char **) malloc((arg2+1)*sizeof(char *));

    for (i = 0; i < arg2; i++)
    {
      PyObject *s = PyList_GetItem(obj1,i);
      if (!PyString_Check(s))
      {
        free(arg3);
        PyErr_SetString(PyExc_ValueError, "List items must be strings");
        return NULL;
      }
      arg3[i] = PyString_AsString(s);
    }
    arg3[i] = 0;
  }
  {
    try
    {
      result = (bool)(arg1)->initialize(arg2,arg3);
    }
    catch (const std::exception& e)
    {
      SWIG_exception(SWIG_RuntimeError, e.what());
    }
  }
  resultobj = SWIG_From_bool(static_cast< bool >(result));
  {
    if (arg3) free(arg3);
  }
  return resultobj;
fail:
  {
    if (arg3) free(arg3);
  }
  return NULL;
}


Kindly help.

Thanks and regards,
Vipul Raheja


Re: allow line break at operators

2011-08-15 Thread Chris Angelico
On Mon, Aug 15, 2011 at 5:28 AM, Seebs usenet-nos...@seebs.net wrote:
 Character stream:  tab tab tab foo newline tab bar.  This is, as you
 say, *usually* two dedents, but it could be one.

I see your point, though I cannot imagine anyone who would use tab
tab as an indent level. But if you go from 16 spaces down to 8, it's
possible that the script uses eight space indents, or four.

On Mon, Aug 15, 2011 at 8:31 AM, Terry Reedy tjre...@udel.edu wrote:
 To repeat again: you are free to put in explicit dedent markers that will
 let you re-indent code should all indents be removed.


This would be a solution to the above, but it has the feeling of
syntactic salt. (I don't believe that braces are, because they afford
a different form of flexibility.) But sure, if you configure your
editor to do it for you.

I type: if (blah):
It puts:

if (blah):
|
#  if

with the cursor at the | marker. I never really got used to editors
doing this for me, though. It didn't feel right. I prefer an editor
that deals with my indentation but lets me do the rest; when I hit
enter, it autoindents to either the current indent level or one
greater, depending on whether it looks like there ought to be an
indent (which mainly happens when I put a loose { on a line).
Similarly, when I put a } into the file, it removes an indent level
automatically.

Still, it wouldn't be hard to make an editor put those dedent comments
in, if you want them.

ChrisA


Re: pythonw.exe

2011-08-15 Thread Chris Angelico
On Mon, Aug 15, 2011 at 3:14 AM, Dennis Lee Bieber
wlfr...@ix.netcom.com wrote:
        Depends... DOS, to me, is just short for Disk Operating
 System... I've source code (in a book) for K2FDOS, source code for
 LS-DOS 6, and have used the AmigaDOS component of AmigaOS (granted --
 AmigaDOS technically was the part of the OS that gave access to the I/O
 system, and included the command line interpreter...).

        DOS does not automatically mean MicroSoft DOS...

I would say that DOS can, in a Windows context, mean either MS-DOS or
a generic Disk Operating System. The latter sense is no more
appropriate to the CLI than the former; in a modern OS, the part that
truly operates the disk would be either the kernel or the hard disk
driver, depending on your point of view, and neither of those has any
sort of UI.

        What most call DOS is, to me, merely a command line interpreter
 (CLI).

And that's really what we have. A shell. A CLI. A textual command
parser (as opposed to a graphical action system which is what most
GUIs are). It's more similar to a MUD than to an operating system -
first space-separated word is a verb, everything else is modifiers.

ChrisA


Re: allow line break at operators

2011-08-15 Thread Paul Woolcock
On Aug 14, 2011 3:24 PM, Seebs usenet-nos...@seebs.net wrote:
...
  I'm not impressed by arguments based on "but if I do something stupid, like
  select text with my eyes closed and reindent it without looking, I expect
  the compiler to save my bacon". In my opinion, it's not the compiler's job
  to protect you from errors caused by sheer carelessness at the keyboard.

 I don't know about sheer carelessness.  Typos happen.  Typos are not
 something you can prevent from happening just by wanting it very much.


If improper indentation still leaves you with valid code, shouldn't that be
caught by a good set of unit tests?


surprising interaction between function scope and class namespace

2011-08-15 Thread Stefan Behnel

Hi,

I just stumbled over this:

   >>> A = 1
   >>> def foo(x):
   ...     A = x
   ...     class X:
   ...         a = A
   ...     return X
   ...
   >>> foo(2).a
   2
   >>> def foo(x):
   ...     A = x
   ...     class X:
   ...         A = A
   ...     return X
   ...
   >>> foo(2).A
   1

Works that way in Py2.7 and Py3.3.

I couldn't find any documentation on this, but my *guess* about the 
reasoning is that the second case contains an assignment to A inside of the 
class namespace, and assignments make a variable local to a scope, in this 
case, the function scope. Therefore, the A on the rhs is looked up in that 
scope as well. However, this is just a totally hand waving guess.


Does anyone have a better explanation or know of a place where this 
specific behaviour is documented?


Stefan



Re: surprising interaction between function scope and class namespace

2011-08-15 Thread Stefan Behnel

Stefan Behnel, 15.08.2011 11:33:

I just stumbled over this:

  >>> A = 1
  >>> def foo(x):
  ...     A = x
  ...     class X:
  ...         a = A
  ...     return X
  ...
  >>> foo(2).a
  2
  >>> def foo(x):
  ...     A = x
  ...     class X:
  ...         A = A
  ...     return X
  ...
  >>> foo(2).A
  1

Works that way in Py2.7 and Py3.3.

I couldn't find any documentation on this, but my *guess* about the
reasoning is that the second case contains an assignment to A inside of the
class namespace, and assignments make a variable local to a scope, in this
case, the function scope. Therefore, the A on the rhs is looked up in that
scope as well. However, this is just a totally hand waving guess.


... and an incorrect one, as it turns out. I think I read the results the
wrong way around. Still:



Does anyone have a better explanation or know of a place where this
specific behaviour is documented?


Stefan



Re: Help needed with using SWIG wrapped code in Python

2011-08-15 Thread Stefan Behnel

Vipul Raheja, 15.08.2011 10:08:

I have wrapped a library from C++ to Python using SWIG. But I am facing
problems while importing and using it in Python.

$ python
>>> import pyossimtest
>>> import pyossim
>>> a = ["Image1.png","Image2.png"]
>>> b = pyossimtest.Info()
>>> b.initialize(len(a),a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyossimtest.py", line 84, in initialize
    def initialize(self, *args): return _pyossimtest.Info_initialize(self, *args)
TypeError: in method 'Info_initialize', argument 3 of type 'char *[]'

What does this error message imply? I have already handled char** as a
special case in swig using typemaps.


I have little experience with SWIG, so I can't comment much on the problem 
at hand, but what I can do is to encourage you to use Cython instead. It's 
faster, easier to use and much more versatile for writing Python wrappers 
than SWIG. Basically, it provides you with the full power and flexibility 
of a programming language, whereas SWIG (like all automatic wrapper 
generators) is always limiting because it has its predefined ways of 
wrapping things, and if they don't fit, you're on your own fighting up-hill 
against it.


Stefan



Re: surprising interaction between function scope and class namespace

2011-08-15 Thread Duncan Booth
Stefan Behnel stefan...@behnel.de wrote:

 I couldn't find any documentation on this, but my *guess* about the 
 reasoning is that the second case contains an assignment to A inside
 of the class namespace, and assignments make a variable local to a
 scope, in this case, the function scope. Therefore, the A on the rhs
 is looked up in that scope as well. However, this is just a totally
 hand waving guess. 
 
 Does anyone have a better explanation or know of a place where this 
 specific behaviour is documented?
 

If it was a function rather than a class then in the first case you look up 
the non-local variable as expected and in the second case you get an 
UnboundLocalError.

The only difference with the class definition is that instead of 
UnboundLocalError the lookup falls back to the global scope.

This happens because class definitions use LOAD_NAME/STORE_NAME instead of 
LOAD_FAST/STORE_FAST and LOAD_NAME looks first in local scope and then in 
global. I suspect that 
http://docs.python.org/reference/executionmodel.html#naming-and-binding 
should say something about this but it doesn't. 
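
The difference described above can be checked directly. This is a small
sketch for Python 3; the names `in_function` and `in_class` are made up for
the illustration:

```python
A = "global"


def in_function(x):
    A = x

    def inner():
        A = A  # A is local to inner(), so the right-hand side is still unbound
        return A

    return inner()


def in_class(x):
    A = x

    class X:
        A = A  # class body: the local lookup fails, then falls back to globals

    return X.A


try:
    in_function("enclosing")
    function_result = "no error"
except UnboundLocalError:
    function_result = "UnboundLocalError"

class_result = in_class("enclosing")
print(function_result, class_result)
```

In both cases the enclosing function's `A` is skipped; the function body
raises, while the class body silently picks up the module-level value.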

-- 
Duncan Booth http://kupuguy.blogspot.com


Re: surprising interaction between function scope and class namespace

2011-08-15 Thread Peter Otten
Stefan Behnel wrote:

 Hi,
 
 I just stumbled over this:
 
 >>> A = 1
 >>> def foo(x):
 ...     A = x
 ...     class X:
 ...         a = A
 ...     return X
 ...
 >>> foo(2).a
 2
 >>> def foo(x):
 ...     A = x
 ...     class X:
 ...         A = A
 ...     return X
 ...
 >>> foo(2).A
 1

That's subtle.
 
 Works that way in Py2.7 and Py3.3.
 
 I couldn't find any documentation on this, but my *guess* about the
 reasoning is that the second case contains an assignment to A inside of
 the class namespace, and assignments make a variable local to a scope, in
 this case, the function scope. Therefore, the A on the rhs is looked up in
 that scope as well. However, this is just a totally hand waving guess.

 Does anyone have a better explanation or know of a place where this
 specific behaviour is documented?

I think it's an implementation accident.

Classes have a special opcode, LOAD_NAME, that allows for

>>> x = 42
>>> class A:
...     x = x
...
>>> A.x
42

which would fail in a function

>>> def f():
...     x = x
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
UnboundLocalError: local variable 'x' referenced before assignment

LOAD_NAME is pretty dumb: it looks into the local namespace and, if that
lookup fails, falls back to the global namespace. Someone probably thought "I
can do better" and reused the static name lookup for nested functions for
names that occur only on the right-hand side of assignments in a class.

Here's a slightly modified version of your demo:

>>> x = "global"
>>> def foo():
...     x = "local"
...     class A:
...         x = x
...     return A
...
>>> def bar():
...     x = "local"
...     class A:
...         y = x
...     return A
...
>>> foo().x
'global'
>>> bar().y
'local'

Now let's have a glimpse at the bytecode:

>>> import dis
>>> foo.func_code.co_consts
(None, 'local', 'A', <code object A at 0x7ffe311bdb70, file "<stdin>", line 3>, ())
>>> dis.dis(foo.func_code.co_consts[3])
  3           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)

  4           6 LOAD_NAME                2 (x)
              9 STORE_NAME               2 (x)
             12 LOAD_LOCALS
             13 RETURN_VALUE
>>> bar.func_code.co_consts
(None, 'local', 'A', <code object A at 0x7ffe311bd828, file "<stdin>", line 3>, ())
>>> dis.dis(bar.func_code.co_consts[3])
  3           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)

  4           6 LOAD_DEREF               0 (x)
              9 STORE_NAME               2 (y)
             12 LOAD_LOCALS
             13 RETURN_VALUE
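
The session above is Python 2. A rough Python 3 equivalent is sketched below,
with the caveat that the attribute is `__code__` instead of `func_code` and
that opcode names differ between releases (as far as I know the closure lookup
in a class body has been called LOAD_CLASSDEREF and, later,
LOAD_FROM_DICT_OR_DEREF). The helper `class_body_opnames` is made up for this
example:

```python
import dis

x = "global"


def foo():
    x = "local"

    class A:
        x = x  # x is assigned in the class body

    return A


def bar():
    x = "local"

    class A:
        y = x  # x is only read in the class body

    return A


def class_body_opnames(func):
    """Return the opcode names used by the body of the class nested in func."""
    body = next(
        const for const in func.__code__.co_consts
        if hasattr(const, "co_code") and const.co_name == "A"
    )
    return [instruction.opname for instruction in dis.get_instructions(body)]


foo_ops = class_body_opnames(foo)
bar_ops = class_body_opnames(bar)
print(foo_ops)
print(bar_ops)
print(foo().x, bar().y)
```

The exact listings vary by interpreter version, so the sketch only collects
opcode names and does not hard-code a full disassembly.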




Re: allow line break at operators

2011-08-15 Thread Tim Chase

On 08/14/2011 11:28 PM, Seebs wrote:

I tend to write stuff like

foo.array_of_things.sort.map { block }.join(", ")

I like this a lot more than
array = foo.array_of_things
sorted_array = sorted(array)
mapped_array = [block(x) for x in sorted_array]
", ".join(mapped_array)


If you like the one-liner, this is readily written as

  ", ".join(block(x) for x in sorted(foo.array_of_things))

Modulo your gripes about string.join(), this is about as succinct 
(and more readable, IMHO) as your initial example.  I've got 
piles of these sorts of things in my ETL code.
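
For anyone who wants to try both spellings side by side, here is a small
runnable comparison. `block` and `array_of_things` are stand-ins invented for
the example, since the thread never defines them:

```python
def block(item):
    # Hypothetical stand-in for the Ruby block in the example above.
    return item.title()


array_of_things = ["pear", "apple", "fig"]

# Step-by-step version with named intermediates.
sorted_array = sorted(array_of_things)
mapped_array = [block(x) for x in sorted_array]
step_by_step = ", ".join(mapped_array)

# One-liner with a generator expression; no intermediate list is named.
one_liner = ", ".join(block(x) for x in sorted(array_of_things))

print(one_liner)  # Apple, Fig, Pear
```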


-tkc






Re: Ten rules to becoming a Python community member.

2011-08-15 Thread Neil Cerutti
On 2011-08-14, Chris Angelico ros...@gmail.com wrote:
 On Sun, Aug 14, 2011 at 2:21 PM, Irmen de Jong irmen.nos...@xs4all.nl wrote:
 On 14-8-2011 7:57, rantingrick wrote:
 8. Use e.g. as many times as you can! (e.g. e.g.) If you use e.g.
 more than ten times in a single post, you will get an invite to
 Guido's next birthday party; where you'll be forced to do shots whilst
 walking the balcony railing wearing wooden shoes!

 I lolled about this one, e.g. I laughed out loud. But where
 are the tulips and windmills for extra credit?

 Greetings from a Dutchman!

No credit. E.g., i.e., exampla gratis, means, for example.

-- 
Neil Cerutti


Re: Help needed with using SWIG wrapped code in Python

2011-08-15 Thread Philip Semanchuk

On Aug 15, 2011, at 4:08 AM, Vipul Raheja wrote:

 Hi,
 
 I have wrapped a library from C++ to Python using SWIG. But I am facing
 problems while importing and using it in Python.

Hi Vipul,
Did you try asking about this on the SWIG mailing list?

bye
Philip




Re: allow line break at operators

2011-08-15 Thread Johann Hibschman
Chris Angelico ros...@gmail.com writes:

 Why is left-to-right inherently more logical than
 multiplication-before-addition? Why is it more logical than
 right-to-left? And why is changing people's expectations more logical
 than fulfilling them? Python uses the + and - symbols to mean addition
 and subtraction for good reason. Let's not alienate the mathematical
 mind by violating this rule. It would be far safer to go the other way
 and demand parentheses on everything.

I'm clearly a fool for allowing myself to be drawn into this thread,
but I've been playing a lot recently with the APL-derivative language J,
which uses a right-to-left operator precedence rule.

Pragmatically, this is because J defines roughly a bajillion operators,
and it would be impossible to remember the precedence of them all, but
it makes sense in its own way.

If you read 3 * 10 + 7, using right-to-left, you get "three times
something".  Then you read more and you get "three times (ten plus
something)".  And finally, you get 3*(10+7).  The prefix gives the
continuation for the rest of the calculation; no matter what you
substitute for X in 3*X, you will always just evaluate X, then multiply
it by 3.  Likewise, for 3*10+X, no matter what X is, you know you'll
add 10 and multiply by 3.

This took me a while to get used to, but it's definitely a nice
property.  Not much to do with python, but I do like the syntax enough
that I've implemented my own toy evaluator for J-like expressions in
python, to get around the verbosity of some bits of numpy.
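
To make the rule concrete, here is a tiny right-to-left evaluator. It is only
a sketch for space-separated integers and three operators, not the toy
evaluator mentioned above and not real J semantics:

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_right_to_left(expression):
    """Evaluate 'a op b op c' as a op (b op c), with no precedence table."""
    tokens = expression.split()
    result = int(tokens[-1])  # start from the rightmost operand
    # Walk leftwards over the operators, which sit at the odd positions.
    for index in range(len(tokens) - 2, 0, -2):
        apply_operator = OPERATORS[tokens[index]]
        left_operand = int(tokens[index - 1])
        result = apply_operator(left_operand, result)
    return result


print(eval_right_to_left("3 * 10 + 7"))  # 51, i.e. 3*(10+7)
print(eval_right_to_left("10 - 3 - 2"))  # 9, i.e. 10-(3-2)
```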

Regards,
Johann


Re: allow line break at operators

2011-08-15 Thread Steven D'Aprano
Seebs wrote:

 I tend to write stuff like
 
 foo.array_of_things.sort.map { block }.join(", ")
 
 I like this a lot more than
 array = foo.array_of_things
 sorted_array = sorted(array)
 mapped_array = [block(x) for x in sorted_array]
 ", ".join(mapped_array)

If you insist on a one-liner for four separate operations, what's wrong with
this?

", ".join([block(x) for x in sorted(foo.array_of_things)])

Or if you prefer map:

", ".join(map(block, sorted(foo.array_of_things)))


I think I would be less skeptical about fluent interfaces if they were
written more like Unix shell script pipelines instead of using attribute
access notation:

foo.array_of_things | sort | map block | join ", "
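
Python can approximate that pipeline notation by overloading the `|`
operator. The sketch below is an illustration only; `Pipe`, `map_with` and
`join` are names invented here, not an existing library:

```python
class Pipe:
    """Wrap a one-argument function so it can be applied with the | operator."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Called for `value | pipe` when value's own __or__ gives up.
        return self.func(value)


sort = Pipe(sorted)


def map_with(func):
    return Pipe(lambda items: [func(item) for item in items])


def join(separator):
    return Pipe(lambda items: separator.join(items))


array_of_things = ["pear", "apple", "fig"]
result = array_of_things | sort | map_with(str.upper) | join(", ")
print(result)  # APPLE, FIG, PEAR
```

This relies on lists and strings not knowing how to `|` with a `Pipe`, so
Python falls back to `Pipe.__ror__`; a left operand that defines its own `|`
for arbitrary objects would not go through the pipe.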



-- 
Steven



Re: allow line break at operators

2011-08-15 Thread Roy Smith
In article mailman.2233.1313179799.1164.python-l...@python.org,
 Chris Angelico ros...@gmail.com wrote:

 Python uses the + and - symbols to mean addition
 and subtraction for good reason. Let's not alienate the mathematical
 mind by violating this rule.

Computer programming languages follow math conventions only in the most 
vague ways.  For example, standard math usage dictates that addition is 
commutative.  While this is true for adding integers, it's certainly not 
true for adding strings (in any language which supports string addition).

Where to draw the line between math and programming languages is not an 
easy question.

 It would be far safer to go the other way
 and demand parentheses on everything.

Demand, no, but sometimes it's a good idea.  I've been writing computer 
programs for close to 40 years, and I still have no clue what most of 
the order of operations is.  It's just not worth investing the brain 
cells to remember such trivia (especially since the details change from 
language to language).  Beyond remembering the (apparently) universal 
rule that {*, /} bind tighter than {+, -}, I pretty much just punt on 
everything else and put in extra parens everywhere.

It's not the most efficient way to write code, and probably doesn't even 
result in the prettiest code.  But it sure does eliminate those 
face-palm moments at the end of a long debugging session when you 
realize that somebody got it wrong.


Re: allow line break at operators

2011-08-15 Thread Chris Angelico
On Mon, Aug 15, 2011 at 2:41 PM, Roy Smith r...@panix.com wrote:
 Demand, no, but sometimes it's a good idea.  I've been writing computer
 programs for close to 40 years, and I still have no clue what most of
 the order of operations is.  It's just not worth investing the brain
 cells to remember such trivia (especially since the details change from
 language to language).  Beyond remembering the (apparently) universal
 rule that {*, /} bind tighter than {+, -}, I pretty much just punt on
 everything else and put in extra parens everywhere.


Understandable. I go the other way, though, and keep an operator
precedence table for each language handy; often, what I'm after is not
"which one binds more tightly", but "what's the symbol for modulo",
which is also (usually) on that same table. Or: "Blasted PHP, which
operators have precedence between || and or?", which is easy to forget.

And you're right about the details changing from language to language,
hence the operators table *for each language*. But most languages
follow fairly sane rules, and tend to come up with pretty much the
same ordering.

ChrisA


Re: allow line break at operators

2011-08-15 Thread Steven D'Aprano
Roy Smith wrote:

 Computer programming languages follow math conventions only in the most
 vague ways.  For example, standard math usage dictates that addition is
 commutative.  While this is true for adding integers, it's certainly not
 true for adding strings (in any language which supports string addition).

Not quite true for maths either, at least in principle. I'm not aware of any
number types where addition is non-commutative, but subtraction is
noncommutative even for integers, and noncommutative multiplication is
quite common (e.g. matrix multiplication).

And of course, once you start using floating point numbers, you can't assume
commutativity:

>>> 0.1 + 0.7 + 0.3 == 0.3 + 0.7 + 0.1
False
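
Strictly speaking, reversing three operands also changes how they are
grouped, so the comparison above mixes two effects. A small check, assuming
IEEE 754 doubles as used by CPython, separates them: each pairwise sum is
unaffected by swapping its operands, while regrouping changes the result.

```python
a, b, c = 0.1, 0.7, 0.3

# Swapping the two operands of a single addition never changes the result.
pairwise_commutes = (a + b == b + a) and (b + c == c + b) and (a + c == c + a)

# Regrouping does: the intermediate sums round differently.
left_grouping = (a + b) + c
right_grouping = a + (b + c)

# Reversing the whole expression regroups it, hence the False above.
reordered_equal = (a + b + c) == (c + b + a)

print(pairwise_commutes, left_grouping == right_grouping, reordered_equal)
```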


I'm reminded of this quote from John Baez:

The real numbers are the dependable breadwinner of the family, the complete
ordered field we all rely on. The complex numbers are a slightly flashier
but still respectable younger brother: not ordered, but algebraically
complete. The quaternions, being noncommutative, are the eccentric cousin
who is shunned at important family gatherings. But the octonions are the
crazy old uncle nobody lets out of the attic: they are nonassociative.

(And don't even ask about the sedenions...)


-- 
Steven



Re: allow line break at operators

2011-08-15 Thread Chris Angelico
On Mon, Aug 15, 2011 at 3:28 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
 And of course, once you start using floating point numbers, you can't assume
 commutativity:

 >>> 0.1 + 0.7 + 0.3 == 0.3 + 0.7 + 0.1
 False


This isn't because programming languages fail to follow mathematics;
it's because floating point numbers do not represent real numbers.
Python doesn't support substring removal using the subtraction
operator, but I'd have to say that floats more closely parallel
strings and other high level objects than they do mathematical reals.
If Python treated __sub__(str,str) as str.replace(str,"") then:


>>> "hello world" + "asdfqwer" - "d"
"hello worlasfqwer"
>>> "hello world" - "d" + "asdfqwer"
"hello worlasdfqwer"

Nobody would expect strings to behave mathematically with subtraction,
because negatives don't make sense. Even sets don't quite work,
although they're closer:

>>> set("asdf")-set("test")
{'a', 'd', 'f'}

There's no way, in a set, to show a negative reference to 't' and 'e'.
In theory you could do this with dictionaries or collections.Counter,
but subtracting a Counter from a Counter doesn't produce negative
numbers either. No, these constructs do not subtract algebraically,
and I do not think it would be any improvement to the language if they
did.
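
The hypothetical string subtraction can be tried out with a small `str`
subclass. `SubStr` is invented for this sketch and is not a proposal for the
language. The same snippet also shows the set and Counter behaviour mentioned
above, plus `Counter.subtract()`, which, unlike the `-` operator, does keep
zero and negative counts:

```python
from collections import Counter


class SubStr(str):
    """A str whose '-' removes every occurrence of the right-hand operand."""

    def __add__(self, other):
        return SubStr(str.__add__(self, other))

    def __sub__(self, other):
        return SubStr(self.replace(other, ""))


first = SubStr("hello world") + "asdfqwer" - "d"
second = SubStr("hello world") - "d" + "asdfqwer"

set_difference = set("asdf") - set("test")
counter_difference = Counter("asdf") - Counter("test")  # drops counts <= 0

in_place = Counter("asdf")
in_place.subtract("test")  # keeps zero and negative counts

print(first, second, sep="\n")
print(sorted(set_difference), dict(counter_difference), dict(in_place))
```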

ChrisA


string to unicode

2011-08-15 Thread Artie Ziff
if I am using the standard csv library to read contents of a csv file 
which contains Unicode strings (short example: 
'\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such 
as decode or encode to transform this string type into a python unicode 
type? Must I know the encoding (byte groupings) of the Unicode? Can I 
get this from the file? Perhaps I need to open the file with particular 
attributes?


thanks!



Re: string to unicode

2011-08-15 Thread Chris Angelico
On Mon, Aug 15, 2011 at 4:20 PM, Artie Ziff artie.z...@gmail.com wrote:
 if I am using the standard csv library to read contents of a csv file which
 contains Unicode strings (short example: '\xe8\x9f\x92\xe8\x9b\x87'), how do
 I use a python Unicode method such as decode or encode to transform this
 string type into a python unicode type? Must I know the encoding (byte
 groupings) of the Unicode? Can I get this from the file? Perhaps I need to
 open the file with particular attributes?


Start here:

http://www.joelonsoftware.com/articles/Unicode.html

The CSV file, being stored on disk, cannot contain Unicode strings; it
can only contain bytes. If you know the encoding (eg UTF-8, UCS-2,
etc), then you can decode it using that. If you don't, your best bet
is to ask the origin of the file; failing that, check the first few
bytes - if it's \xFF\xFE or \xFE\xFF or \xEF\xBB\xBF, then it's
probably UTF-16LE, UTF-16BE, or UTF-8, respectively (those being the
encodings of the BOM). There may be other clues, too, but normally
it's best to get the encoding separately from the data rather than try
to decode it from the data itself.
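
As a rough illustration of the BOM check, using the constants from the
standard `codecs` module. The function name is made up, and the guess is only
a hint: a UTF-32LE file, for instance, starts with the same two bytes as
UTF-16LE. The sample bytes from the question carry no BOM, but they happen to
decode cleanly as UTF-8, which is assumed here and not known for certain:

```python
import codecs

BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),         # b"\xef\xbb\xbf"
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # b"\xff\xfe"
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # b"\xfe\xff"
]


def guess_encoding_from_bom(data):
    """Return a likely encoding name if data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


sample = b"\xe8\x9f\x92\xe8\x9b\x87"  # the bytes from the original question
text = sample.decode("utf-8")         # assumes UTF-8; raises if the guess is wrong
print(guess_encoding_from_bom(sample), len(text))
```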

Chris Angelico


Re: string to unicode

2011-08-15 Thread Adam Tauno Williams
On Mon, 2011-08-15 at 08:20 -0700, Artie Ziff wrote:
 if I am using the standard csv library to read contents of a csv file 
 which contains Unicode strings (short example: 
 '\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such 
 as decode or encode to transform this string type into a python unicode 
 type? Must I know the encoding (byte groupings) of the Unicode? Can I 
 get this from the file? Perhaps I need to open the file with particular 
 attributes?

Open the file with a codec and pass that file-like object to csv.

codecs.open(filename, mode[, encoding[, errors[, buffering]]])

http://docs.python.org/library/codecs.html#codec-objects
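
For readers on Python 3, where the built-in `open()` takes an `encoding`
argument and the csv module works on text, the same idea might look like the
sketch below. The file name, its contents and the UTF-8 encoding are
assumptions made for the example:

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="utf-8"):
    """Decode the file while reading and hand text rows to the csv module."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding=encoding, newline="") as handle:
        return list(csv.reader(handle))


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.csv")
    with open(path, "wb") as raw:
        # Raw UTF-8 bytes on disk, including the sample from the question.
        raw.write(b"name,word\nsnake,\xe8\x9f\x92\xe8\x9b\x87\n")
    rows = read_csv_rows(path)

print(rows)
```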

-- 
Adam Tauno Williams awill...@whitemice.org LPIC-1, Novell CLA
http://www.whitemiceconsulting.com
OpenGroupware, Cyrus IMAPd, Postfix, OpenLDAP, Samba



Re: Ten rules to becoming a Python community member.

2011-08-15 Thread Lucio Santi
On Mon, Aug 15, 2011 at 9:06 AM, Neil Cerutti ne...@norwich.edu wrote:

 On 2011-08-14, Chris Angelico ros...@gmail.com wrote:
  On Sun, Aug 14, 2011 at 2:21 PM, Irmen de Jong irmen.nos...@xs4all.nl
 wrote:
  On 14-8-2011 7:57, rantingrick wrote:
  8. Use e.g. as many times as you can! (e.g. e.g.) If you use e.g.
  more than ten times in a single post, you will get an invite to
  Guido's next birthday party; where you'll be forced to do shots whilst
  walking the balcony railing wearing wooden shoes!
 
  I lolled about this one, e.g. I laughed out loud. But where
  are the tulips and windmills for extra credit?
 
  Greetings from a Dutchman!

 No credit. E.g., i.e., exampla gratis, means, for example.


The correct spelling is 'exempli gratia'. It's Latin.

i.e., on the other hand, comes from 'id est' ('that is'). Latin too.


Regards,

Lucio


Re: Java is killing me! (AKA: Java for Pythonheads?)

2011-08-15 Thread Dirk Olmes
On Fri, 12 Aug 2011 17:02:38 +, kj wrote:

 *Please* forgive me for asking a Java question in a Python forum. My
 only excuse for this no-no is that a Python forum is more likely than a
 Java one to have among its readers those who have had to deal with the
 same problems I'm wrestling with.
 
 Due to my job, I have to port some Python code to Java, and write tests
 for the ported code.  (Yes, I've considered finding myself another job,
 but this is not an option in the immediate future.)

Can't you sidestep the porting effort and try to run everything in Jython 
on the JVM? 

-dirk (Python lurker with Java experience)


Re: allow line break at operators

2011-08-15 Thread Seebs
On 2011-08-15, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
 Seebs wrote:
 I tend to write stuff like
 
 foo.array_of_things.sort.map { block }.join(", ")
 
 I like this a lot more than
 array = foo.array_of_things
 sorted_array = sorted(array)
 mapped_array = [block(x) for x in sorted_array]
 ", ".join(mapped_array)

 If you insist on a one-liner for four separate operations, what's wrong with
 this?

 ", ".join([block(x) for x in sorted(foo.array_of_things)])

Nothing in particular; I was just contrasting two styles, not asserting
that Python couldn't do that.

In general, I don't like to do things that either involve making a lot of
variables that are assigned to once and then read from once, or making a
whole lot of x = foo(x) type assignments to one variable.  It feels cluttered
to me.

 I think I would be less skeptical about fluent interfaces if they were
 written more like Unix shell script pipelines instead of using attribute
 access notation:

 foo.array_of_things | sort | map block | join ", "

Interesting!  I think that's probably why I find them so comfortable;
shell was one of the first languages I got serious about.

-s
-- 
Copyright 2011, all wrongs reversed.  Peter Seebach / usenet-nos...@seebs.net
http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated!
I am not speaking for my employer, although they do rent some of my opinions.


Re: allow line break at operators

2011-08-15 Thread Seebs
On 2011-08-15, Roy Smith r...@panix.com wrote:
 Demand, no, but sometimes it's a good idea.  I've been writing computer 
 programs for close to 40 years, and I still have no clue what most of 
 the order of operations is.  It's just not worth investing the brain 
 cells to remember such trivia (especially since the details change from 
 language to language).  Beyond remembering the (apparently) universal 
 rule that {*, /} bind tighter than {+, -}, I pretty much just punt on 
 everything else and put in extra parens everywhere.

 It's not the most efficient way to write code, and probably doesn't even 
 result in the prettiest code.  But it sure does eliminate those 
 face-palm moments at the end of a long debugging session when you 
 realize that somebody got it wrong.

Wholehearted agreement.  It is conceivable for me to misremember precedence.
I am pretty reliable at recognizing which things are in which parens.

So I use them even in obvious cases:

foo + (3 * 4)

Never regretted that.  Yes, it's extra typing, a little, but it prevents a
whole category of bugs.

-s
-- 
Copyright 2011, all wrongs reversed.  Peter Seebach / usenet-nos...@seebs.net
http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated!
I am not speaking for my employer, although they do rent some of my opinions.


Re: Ten rules to becoming a Python community member.

2011-08-15 Thread MRAB

On 15/08/2011 17:18, Lucio Santi wrote:



On Mon, Aug 15, 2011 at 9:06 AM, Neil Cerutti ne...@norwich.edu wrote:

On 2011-08-14, Chris Angelico ros...@gmail.com wrote:
  On Sun, Aug 14, 2011 at 2:21 PM, Irmen de Jong irmen.nos...@xs4all.nl wrote:
  On 14-8-2011 7:57, rantingrick wrote:
  8. Use e.g. as many times as you can! (e.g. e.g.) If you use e.g.
  more than ten times in a single post, you will get an invite to
  Guido's next birthday party; where you'll be forced to do shots whilst
  walking the balcony railing wearing wooden shoes!
 
  I lolled about this one, e.g. I laughed out loud. But where
  are the tulips and windmills for extra credit?
 
  Greetings from a Dutchman!

No credit. E.g., i.e., exampla gratis, means, for example.


The correct spelling is 'exempli gratia'. It's Latin.

i.e., on the other hand, comes from 'id est' ('that is'). Latin too.

I remember reading a book about polymorphism in programming. The author
said it came from Latin. Nope.


Re: Ten rules to becoming a Python community member.

2011-08-15 Thread Neil Cerutti
On 2011-08-15, MRAB pyt...@mrabarnett.plus.com wrote:
 On 15/08/2011 17:18, Lucio Santi wrote:
 On Mon, Aug 15, 2011 at 9:06 AM, Neil Cerutti ne...@norwich.edu wrote:

 On 2011-08-14, Chris Angelico ros...@gmail.com wrote:
   On Sun, Aug 14, 2011 at 2:21 PM, Irmen de Jong irmen.nos...@xs4all.nl wrote:
   On 14-8-2011 7:57, rantingrick wrote:
   8. Use e.g. as many times as you can! (e.g. e.g.) If you use e.g.
   more than ten times in a single post, you will get an invite to
   Guido's next birthday party; where you'll be forced to do shots whilst
   walking the balcony railing wearing wooden shoes!
  
   I lolled about this one, e.g. I laughed out loud. But where
   are the tulips and windmills for extra credit?
  
   Greetings from a Dutchman!

 No credit. E.g., i.e., exampla gratis, means, for example.


 The correct spelling is 'exempli gratia'. It's Latin.

Thanks for the correction.

 i.e., on the other hand, comes from 'id est' ('that is').
 Latin too.

 I remember reading a book about polymorphism in programming.
 The author said it came from Latin. Nope.

Sounds more like Greek.

-- 
Neil Cerutti


Reusable ways to wrapping thread locking techniques

2011-08-15 Thread python
I'm reviewing a lot of code that has thread acquire and release
locks scattered throughout the code base.

Would a better technique be to use


Re: Reusable ways to wrapping thread locking techniques

2011-08-15 Thread python
Hit send too soon ...

I'm reviewing a lot of code that has thread acquire and release
locks scattered throughout the code base.

Would a better technique be to use contextmanagers (for safe
granular locking within a function) or decorators (function wide
locks) to manage locks or am I making things too complicated?

Am I reinventing the wheel by creating my own versions of above
or are there off-the-shelf, debugged versions of above that one
can use?

Thank you,
Malcolm


Re: string to unicode

2011-08-15 Thread Terry Reedy

On 8/15/2011 11:29 AM, Adam Tauno Williams wrote:

On Mon, 2011-08-15 at 08:20 -0700, Artie Ziff wrote:

if I am using the standard csv library to read contents of a csv file
which contains Unicode strings (short example:
'\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such
as decode or encode to transform this string type into a python unicode
type? Must I know the encoding (byte groupings) of the Unicode? Can I
get this from the file? Perhaps I need to open the file with particular
attributes?


Open the file with a codec and pass that file-like object to csv.

codecs.open(filename, mode[, encoding[, errors[, buffering]]])

http://docs.python.org/library/codecs.html#codec-objects


In Python 3, just open with open(... encoding = 'xxx')
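
A minimal Python 3 sketch of that advice, assuming you already know the file's encoding (read_csv_rows is a made-up helper name, not part of the csv module):

```python
import csv


def read_csv_rows(path, encoding):
    """Return all rows of a CSV file as lists of str, decoded with `encoding`."""
    # newline='' is what the csv module documentation asks for when
    # handing it a file object.
    with open(path, encoding=encoding, newline='') as f:
        return list(csv.reader(f))
```

With the sample bytes from the question, read_csv_rows(path, 'utf-8') would hand back the two-character string 蟒蛇 instead of '\xe8\x9f\x92\xe8\x9b\x87'.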


--
Terry Jan Reedy



datetime.strptime w/ non-UTC and non-local TZs?

2011-08-15 Thread Artur Ergashev
I was hoping somebody could give me some clarity on how datetime.strptime is
supposed to work, I'm thinking this is a bug, but wanted to see if the
community has any ideas before I submit a bug to the python tracker.
For reference I am in the CDT timezone...

>>> from datetime import datetime
>>> date_utc = "Sun Jul 24 02:54:11 UTC 2011"
>>> date_local = "Sun Jul 24 02:54:11 CDT 2011"
>>> date_other = "Sun Jul 24 02:54:11 PDT 2011"
>>> print datetime.strptime(date_utc, '%a %b %d %H:%M:%S %Z %Y')
2011-07-24 02:54:11
>>> print datetime.strptime(date_local, '%a %b %d %H:%M:%S %Z %Y')
2011-07-24 02:54:11
>>> print datetime.strptime(date_other, '%a %b %d %H:%M:%S %Z %Y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data 'Sun Jul 24 02:54:11 PDT 2011' does not match format '%a %b %d %H:%M:%S %Z %Y'

The format is correct, and it can parse UTC (as mentioned in the
documentation), and it can parse CDT (which is my current time zone),
but using another timezone causes it to fail. Trying to parse the
failing timezone on a computer set on that timezone works correctly.
The documentation seems to indicate that parsing non-UTC TZs isn't
guaranteed to work, but in that case the exception is non-clear
(perhaps there's no elegant solution here).
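
One possible workaround, sketched under the assumption that you only need a handful of abbreviations and are happy to maintain the offsets by hand: pull the zone token out before calling strptime and apply the offset yourself. TZ_OFFSETS and parse_with_abbrev are hypothetical names, and the table below is an assumption, not something the datetime module provides.

```python
from datetime import datetime, timedelta

# Hand-written table of UTC offsets in hours (an assumption; extend as needed).
TZ_OFFSETS = {'UTC': 0, 'CDT': -5, 'PDT': -7}


def parse_with_abbrev(text):
    """Parse 'Sun Jul 24 02:54:11 PDT 2011' and return a naive UTC datetime."""
    parts = text.split()
    abbrev = parts.pop(4)  # the zone token sits between the time and the year
    naive = datetime.strptime(' '.join(parts), '%a %b %d %H:%M:%S %Y')
    # Subtracting the offset converts the wall-clock time to UTC.
    return naive - timedelta(hours=TZ_OFFSETS[abbrev])
```

This sidesteps %Z entirely, so the result no longer depends on the time zone of the machine doing the parsing.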


Re: string to unicode

2011-08-15 Thread Thomas 'PointedEars' Lahn
Chris Angelico wrote:

 On Mon, Aug 15, 2011 at 4:20 PM, Artie Ziff artie.z...@gmail.com wrote:
 if I am using the standard csv library to read contents of a csv file
 which contains Unicode strings (short example:
 '\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such as
 decode or encode to transform this string type into a python unicode
 type? Must I know the encoding (byte groupings) of the Unicode? Can I get
 this from the file? Perhaps I need to open the file with particular
 attributes?
 
 Start here:
 
 http://www.joelonsoftware.com/articles/Unicode.html
 
 The CSV file, being stored on disk, cannot contain Unicode strings; it
 can only contain bytes. If you know the encoding (eg UTF-8, UCS-2,
 etc), then you can decode it using that. If you don't, your best bet
 is to ask the origin of the file; failing that, check the first few
 bytes - if it's \xFF\xFE or \xFE\xFF or \xEF\xBB\xBF, then it's
 probably UTF-16LE, UTF-16BE, or UTF-8, respectively (those being the
 encodings of the BOM). There may be other clues, too, but normally
 it's best to get the encoding separately from the data rather than try
 to decode it from the data itself.
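
The BOM check described above might look something like this. It is a sketch only: guess_encoding_from_bom is a made-up name, and the codec names returned are my choice. A file with no BOM tells you nothing, so the function returns None in that case.

```python
import codecs

# (BOM bytes, codec name) pairs. Caveat: a UTF-32-LE BOM also starts with
# FF FE, so this simple check would misreport such a file as UTF-16-LE.
BOMS = [
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]


def guess_encoding_from_bom(data):
    """Return a codec name if `data` (bytes) starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```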

As this problem really is not a new one, there are several more – if I may 
say so – pythonic approaches:

http://stackoverflow.com/questions/436220/python-is-there-a-way-to-determine-the-encoding-of-text-file

Improving Billy Mays' matching brackets checker, chardet worked for me 
(the test file was UTF-8-encoded).  Watch for word-wrap:

---
# encoding: utf-8
'''
Created on 2011-07-18

@author: Thomas 'PointedEars' Lahn pointede...@web.de, based on an idea of
Billy Mays 81282ed9a88799d21e77957df2d84bd6514d9...@myhashismyemail.com
in news:j01ph6$knt$1...@speranza.aioe.org 
'''
import sys, os, chardet

# Map each closing bracket to its opening counterpart.
pairs = {u'}': u'{', u')': u'(', u']': u'[',
         u'”': u'“', u'›': u'‹', u'»': u'«',
         u'】': u'【', u'〉': u'〈', u'》': u'《',
         u'」': u'「', u'』': u'『'}
valid = set(v for pair in pairs.items() for v in pair)

if __name__ == '__main__':
    for dirpath, dirnames, filenames in os.walk(sys.argv[1]):
        for name in filenames:
            # Sentinel entry so that stack[-1] is always safe.
            stack = [(u' ', 0, 0)]
            first_bad = None

            file_path = os.path.join(dirpath, name)

            with open(file_path, 'rb') as f:
                # Read the lines once and keep them: they are needed both
                # for detection and for the bracket check.
                raw_lines = f.readlines()

            # chardet can report None as the encoding; falling back to
            # UTF-8 here is an assumption.
            encoding = (chardet.detect(b''.join(raw_lines))['encoding']
                        or 'utf-8')

            chars = ((c, line_no, col)
                     for line_no, line in enumerate(raw_lines, 1)
                     for col, c in enumerate(line.decode(encoding), 1)
                     if c in valid)
            for c, line_no, col in chars:
                if c in pairs:
                    if stack[-1][0] == pairs[c]:
                        stack.pop()
                    elif first_bad is None:
                        first_bad = (c, line_no, col)
                else:
                    stack.append((c, line_no, col))

            # An opening bracket that was never closed is an error too.
            if first_bad is None and len(stack) > 1:
                first_bad = stack[1]

            print('%s: %s' % (name, 'good' if first_bad is None
                              else "bad '%s' at %s:%s" % first_bad))
---

HTH

-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.


Re: allow line break at operators

2011-08-15 Thread rantingrick
On Aug 15, 2:31 am, Terry Reedy tjre...@udel.edu wrote:
 On 8/15/2011 12:28 AM, Seebs wrote:

 To repeat again: you are free to put in explicit dedent markers that
 will let you re-indent code should all indents be removed.


As Terry has been trying to say for a while now, use the following
methods to quell your eye pain.


 Use pass statement:


if foo:
    if bar:
        baz
    else:
        pass
else:
    quux


 Use comments:


if foo:
    if bar:
        baz
    #else bar (or endif or whatever you like)
else:
    quux


 Use road signs: :-)


# [Warning: Curves Ahead: Eyeball Parse limit 35 WPM!]
if foo: # [Exit 266: foo] --
    # [Right Curve Ahead: slow eyeball parsing to 15 WPM!]
    if bar:
        baz
    else:
        pass # -- [Warning: Do not litter!]
else: # [Exit 267: Not Foo] --
    # [Right Curve Ahead: slow eyeball parsing to 15 WPM!]
    quux
...
# [Eyeball Parse limit 55 WPM!]
...
# [PSA: Friends don't let friends write asinine code]
...
# [Next Rest Stop: NEVER!]



Now you have the nice triangular shape that your eyes have been
trained to recognize! I would suggest to use comments whenever
possible. Of course there will be times when you cannot use a comment
and must use an else clause.

Now you have nothing to complain about :).


Why no warnings when re-assigning builtin names?

2011-08-15 Thread Gerrat Rickert
With surprising regularity, I see program postings (e.g. on
StackOverflow) from inexperienced Python users accidentally
re-assigning built-in names.

For example, they'll innocently call some variable "list" and assign a
list of items to it.

...and if they're _unlucky_ enough, their program may actually work
(encouraging them to re-use this name in other programs).

If they try to use an actual keyword, both the interpreter and compiler
are helpful enough to give them a syntax error, but I think the builtins
should be "pseudo-reserved", and a user should explicitly have to do
something *extra* to not receive a warning.

I'd suggest: from __future__ import allow_reassigning_builtins, but I
think this abuse of the __future__ module likely isn't welcome.

I know that for testing purposes, this functionality is very convenient,
and I'm not suggesting it be removed.

In these cases, it would be trivial to just require something explicit,
telling the interpreter that the programmer was aware they were
assigning to a builtin name.

The situation is slightly different for modules that come with Python.

Most of us would cringe when seeing something like: `string = "Some
string"`; but at least the user has to explicitly import the string
module for this to actually cause issues (other than readability).

What sayest the Python community about having an explicit warning
against such un-pythonic behaviour (re-assigning builtin names)?

Regards,

Gerrat



Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Philip Semanchuk

On Aug 15, 2011, at 5:52 PM, Gerrat Rickert wrote:

 With surprising regularity, I see program postings (eg. on
 StackOverflow) from inexperienced Python users  accidentally
 re-assigning built-in names.
 
 
 
 For example, they'll innocently call some variable, list, and assign a
 list of items to it.
 
 ...and if they're _unlucky_ enough, their program may actually work
 (encouraging them to re-use this name in other programs).

Or they'll assign a class instance to 'object', only to cause weird errors 
later when they use it as a base class.

I agree that this is a problem. The folks on my project who are new-ish to 
Python overwrite builtins fairly often. Since there's never been any 
consequence other than my vague warnings that something bad might happen as 
a result, it's difficult for them to develop good habits in this regard. It 
doesn't help that Eclipse (their editor of choice) doesn't seem to provide a 
way of coloring builtins differently. (That's what I'm told, anyway. I don't 
use it.)

 If they try to use an actual keyword, both the interpreter and compiler
 are helpful enough to give them a syntax error, but I think the builtins
 should be pseudo-reserved, and a user should explicitly have to do
 something *extra* to not receive a warning.

Unfortunately you're suggesting a change to the language which could break 
existing code. I could see a use for from __future__ import 
squawk_if_i_reassign_a_builtin or something like that, but the current default 
behavior has to remain as it is.

JMO,
Philip



Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Chris Angelico
On Mon, Aug 15, 2011 at 10:52 PM, Gerrat Rickert
grick...@coldstorage.com wrote:
 With surprising regularity, I see program postings (eg. on StackOverflow)
 from inexperienced Python users  accidentally re-assigning built-in names.

 For example, they’ll innocently call some variable, “list”, and assign a
 list of items to it.

It's actually masking, not reassigning. That may make it easier or
harder to resolve the issue.
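
To make the distinction concrete, here is a small Python 3 sketch (both function names are invented for the example). Binding the name list in one scope hides the builtin there, but the builtin itself is untouched and still visible everywhere else:

```python
import builtins


def shadow_demo():
    list = [1, 2, 3]  # a local name 'list' hides the builtin inside this function
    return list is builtins.list


def builtin_still_intact():
    # No local or module-level 'list' here, so lookup reaches the builtin.
    return list is builtins.list
```

shadow_demo() returns False and builtin_still_intact() returns True, which is why "masking" is the more accurate word.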

If you want a future directive that deals with it, I'd do it the other
way - from __future__ import mask_builtin_warning or something - so
the default remains as it currently is. But this may be a better job
for a linting script.
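
As a rough idea of what such a linting script could do, here is a Python 3 sketch built on the standard ast module. find_shadowed_builtins is a hypothetical name, and the check is deliberately narrow: it only sees names bound by assignment or as loop targets, not function parameters, def/class statements or imports.

```python
import ast
import builtins

BUILTIN_NAMES = set(dir(builtins))


def find_shadowed_builtins(source):
    """Return sorted (line, name) pairs where a builtin name gets bound."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # ast.Store marks a name that is being assigned to, not read.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id in BUILTIN_NAMES:
                hits.append((node.lineno, node.id))
    return sorted(hits)
```

Real linters such as pylint do considerably more than this; the point is only that the check can live outside the interpreter.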

ChrisA


Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Ethan Furman

Gerrat Rickert wrote:
What sayest the Python community about having an explicit warning 
against such un-pythonic behaviour (re-assigning builtin names)?


What makes you think this behavior is unpythonic?  Python is not about 
hand-holding.


~Ethan~


Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Seebs
On 2011-08-15, Ethan Furman et...@stoneleaf.us wrote:
 Gerrat Rickert wrote:
 What sayest the Python community about having an explicit warning 
 against such un-pythonic behaviour (re-assigning builtin names)?

 What makes you think this behavior is unpythonic?  Python is not about 
 hand-holding.

It seems like something which is sufficiently likely to be a mistake might
deserve a warning -- especially since, so far as I can tell, there's never
going to be a program which can't easily be written to avoid the problematic
behavior.

-s
-- 
Copyright 2011, all wrongs reversed.  Peter Seebach / usenet-nos...@seebs.net
http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated!
I am not speaking for my employer, although they do rent some of my opinions.


Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Ethan Furman

Seebs wrote:

On 2011-08-15, Ethan Furman et...@stoneleaf.us wrote:

Gerrat Rickert wrote:
What sayest the Python community about having an explicit warning 
against such un-pythonic behaviour (re-assigning builtin names)?


What makes you think this behavior is unpythonic?  Python is not about 
hand-holding.


It seems like something which is sufficiently likely to be a mistake might
deserve a warning -- especially since, so far as I can tell, there's never
going to be a program which can't easily be written to avoid the problematic
behavior.


"Sufficiently likely" depends entirely on who is doing the coding.  I 
use `open()` for opening my files, and so regularly use `file` as a 
name.  It can also be very handy to mask a built-in when doing something 
even more fun and entertaining and I, for one, have zero desire to have 
Python start warning me about perfectly legitimate code.


Programmers need to learn whichever language they are choosing to code 
in, and if extra help is needed beyond whatever is basic for that 
language, find (or write! ;) the third-party tool to help out.  There 
are at least two linters for Python, and multiple IDEs that can help 
with these, and other, problems.  (I don't much care for IDEs, but I am 
thinking of starting to use a linter, myself.)


~Ethan~


Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Benjamin Kaplan
On Aug 15, 2011 5:56 PM, Gerrat Rickert grick...@coldstorage.com wrote:

 With surprising regularity, I see program postings (eg. on StackOverflow)
from inexperienced Python users  accidentally re-assigning built-in names.



 For example, they’ll innocently call some variable, “list”, and assign a
list of items to it.

 ...and if they’re _unlucky_ enough, their program may actually work
(encouraging them to re-use this name in other programs).



 If they try to use an actual keyword, both the interpreter and compiler
are helpful enough to give them a syntax error, but I think the builtins
should be “pseudo-reserved”, and a user should explicitly have to do
something *extra* ...


 What sayest the Python community about having an explicit warning against
such un-pythonic behaviour (re-assigning builtin names)?


One of Python's greatest strengths in my opinion is that it strives for
consistency. As much as possible, Python avoids differentiating between
built-in objects (types or otherwise) and user-defined objects. I think it
should stay that way. There are tools that can detect these errors and their
use should be encouraged, but the Python interpreter shouldn't single out
variables which are types that happen to be built-in from any other variable
or any other type.


testing if a list contains a sublist

2011-08-15 Thread Johannes
hi list,
what is the best way to check if a given list (let's call it l1) is
totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

my problem is the second example, which makes it impossible to work with
sets instead of lists. But something like set.issubset for lists would
be nice.

greatz Johannes


Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Dan Stromberg
On Mon, Aug 15, 2011 at 2:52 PM, Gerrat Rickert grick...@coldstorage.com wrote:

 With surprising regularity, I see program postings (eg. on StackOverflow)
 from inexperienced Python users  accidentally re-assigning built-in names.


http://pypi.python.org/pypi/pylint checks for this and many other issues.

I don't know if pyflakes or pychecker do.


Re: testing if a list contains a sublist

2011-08-15 Thread Dan Stromberg
Check out collections.Counter if you have 2.7 or up.

If you don't, google for multiset or bag types.
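
A sketch of the Counter approach, assuming "contained" means multiset containment where order does not matter (that is my reading of the set.issubset comparison in the question). contains_all is a made-up name.

```python
from collections import Counter


def contains_all(l1, l2):
    """True if every item of l1 occurs in l2 at least as often as in l1."""
    # Counter subtraction drops zero and negative counts, so anything
    # left over is an item that l2 does not have enough of.
    return not (Counter(l1) - Counter(l2))
```

For the three examples in the question this gives True, False and False. If the items also have to appear contiguously and in order, a different approach is needed.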

On Mon, Aug 15, 2011 at 4:26 PM, Johannes dajo.m...@web.de wrote:

 hi list,
 what is the best way to check if a given list (lets call it l1) is
 totally contained in a second list (l2)?

 for example:
 l1 = [1,2], l2 = [1,2,3,4,5] - l1 is contained in l2
 l1 = [1,2,2,], l2 = [1,2,3,4,5] - l1 is not contained in l2
 l1 = [1,2,3], l2 = [1,3,5,7] - l1 is not contained in l2

 my problem is the second example, which makes it impossible to work with
 sets insteads of lists. But something like set.issubset for lists would
 be nice.

 greatz Johannes


Re: Reusable ways to wrapping thread locking techniques

2011-08-15 Thread Cameron Simpson
On 15Aug2011 13:56, pyt...@bdurham.com pyt...@bdurham.com wrote:
| I'm reviewing a lot of code that has thread acquire and release
| locks scattered throughout the code base.
| 
| Would a better technique be to use contextmanagers (for safe
| granular locking within a function) or decorators (function wide
| locks) to manage locks or am I making things too complicated?

No, you're on the money.

| Am I reinventing the wheel by creating my own versions of above
| or are there off-the-shelf, debugged versions of above that one
| can use?

I routinely have:

  some_lock = allocate_lock()
  ...
  with some_lock:
    code here!

Doing the equivalent with decorators to make monitors seems perfectly
reasonable to me too.

Do it all with context managers if you can; they do reliable lock
release even when an exception occurs, and don't clutter the code with
distracting and hard to verify cleanup code.

It's worth it just to make everything more readable. The fact that it
makes the code smaller and easier to maintain and less bug prone is just
sugar.
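
For the record, both styles fit in a few lines. This is a sketch, not a debugged library: synchronized, counter_lock, increment and add are all names invented for the example, and threading.Lock stands in for whatever lock type the code base actually uses.

```python
import functools
import threading


def synchronized(lock):
    """Hypothetical decorator: run the whole function while holding `lock`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The with block releases the lock even if func raises.
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator


counter_lock = threading.Lock()
counter = {'value': 0}


@synchronized(counter_lock)
def increment():
    # Function-wide locking via the decorator.
    counter['value'] += 1


def add(n):
    # Granular locking: only the shared update is inside the with block.
    with counter_lock:
        counter['value'] += n
```

Note that a plain Lock is not reentrant, so a decorated function calling another function decorated with the same lock would deadlock; threading.RLock is the usual answer if that matters.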

Cheers,
-- 
Cameron Simpson c...@zip.com.au DoD#743
http://www.cskk.ezoshosting.com/cs/

My mind is like a blotter: Soaks it up, gets it backwards.


Re: Data issues with Django and Apache

2011-08-15 Thread John Gordon
In j27bde$dlr$1...@reader1.panix.com John Gordon gor...@panix.com writes:

 The problem is that I get conflicting results as to whether these temporary
 records have reached their expiration date, depending if I search for them
 via an Apache web call or if I do the search locally from a python shell.

 And to make it weirder, the conflicts go away if I stop and restart the
 Apache server, although any new records created after this point will still
 exhibit the issue.

The problem turned out to be a class variable that contained a time filter
with the current time.

But since it was a class variable, it was only evaluated once upon import
and its idea of now was forever frozen at that moment, so it always
compared as being less than any of the lock records that were passed in.

I changed it to be a class method that constructs and returns a new time
filter whenever it is called.
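
The original Django code isn't shown, so the following is only a stripped-down illustration of the trap with invented class names: an expression in a class body runs once, when the class is created, while a class method runs every time it is called.

```python
import datetime


class FrozenFilter(object):
    # Evaluated once, when the class body runs (typically at import time).
    now = datetime.datetime.now()


class FreshFilter(object):
    @classmethod
    def now(cls):
        # Evaluated again on every call.
        return datetime.datetime.now()
```

Under a long-running Apache process the import happens once per worker, which would explain why restarting the server appeared to fix things for a while.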

Thanks for everyone's help!

-- 
John Gordon   A is for Amy, who fell down the stairs
gor...@panix.com  B is for Basil, assaulted by bears
-- Edward Gorey, The Gashlycrumb Tinies



Re: allow line break at operators

2011-08-15 Thread Roy Smith
In article mailman.10.1313417818.27778.python-l...@python.org,
 Chris Angelico ros...@gmail.com wrote:

 Or: Blasted PHP, which
 operators have precedence between || and or? which is easy to forget.
 
 And you're right about the details changing from language to language,
 hence the operators table *for each language*. But most languages
 follow fairly sane rules

How dare you use the words PHP and sane in two adjoining paragraphs!


Re: allow line break at operators

2011-08-15 Thread Roy Smith
In article 4e492d08$0$30003$c3e8da3$54964...@news.astraweb.com,
 Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

 I'm reminded of this quote from John Baez:
 
 The real numbers are the dependable breadwinner of the family, the complete
 ordered field we all rely on. The complex numbers are a slightly flashier
 but still respectable younger brother: not ordered, but algebraically
 complete. The quaternions, being noncommutative, are the eccentric cousin
 who is shunned at important family gatherings. But the octonions are the
 crazy old uncle nobody lets out of the attic: they are nonassociative.

Wow, at first glance, I mis-parsed that name as Joan Baez.  Had me 
really confused for a moment.


Re: allow line break at operators

2011-08-15 Thread Chris Angelico
On Tue, Aug 16, 2011 at 1:34 AM, Roy Smith r...@panix.com wrote:
 In article mailman.10.1313417818.27778.python-l...@python.org,
  Chris Angelico ros...@gmail.com wrote:

 Or: Blasted PHP, which
 operators have precedence between || and or? which is easy to forget.

 And you're right about the details changing from language to language,
 hence the operators table *for each language*. But most languages
 follow fairly sane rules

 How dare you use the words PHP and sane in two adjoining paragraphs!

By separating them with the word most.

ChrisA


Re: Ten rules to becoming a Python community member.

2011-08-15 Thread Gregory Ewing

rantingrick wrote:

"Used to" and "supposed to" is the verbiage of children
and idiots.


So when we reach a certain age we're meant to abandon
short, concise and idiomatic ways of speaking, and substitute
long words and phrases to make ourselves sound adult and
educated?

--
Greg


Re: allow line break at operators

2011-08-15 Thread Gregory Ewing

Steven D'Aprano wrote:


I'm reminded of this quote from John Baez:

...But the octonions are the
crazy old uncle nobody lets out of the attic: they are nonassociative.

(And don't even ask about the sedenions...)


Aren't they the ones that mutilate cattle and abduct people?

--
Greg


Re: Ten rules to becoming a Python community member.

2011-08-15 Thread Gregory Ewing

I don't mind people using e.g. and i.e. as long
as they use them *correctly*.

Many times people use i.e. when they really
mean e.g.

--
Greg


Re: allow line break at operators

2011-08-15 Thread Seebs
On 2011-08-16, Roy Smith r...@panix.com wrote:
 In article 4e492d08$0$30003$c3e8da3$54964...@news.astraweb.com,
  Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
 I'm reminded of this quote from John Baez:

 The real numbers are the dependable breadwinner of the family, the complete
 ordered field we all rely on. The complex numbers are a slightly flashier
 but still respectable younger brother: not ordered, but algebraically
 complete. The quaternions, being noncommutative, are the eccentric cousin
 who is shunned at important family gatherings. But the octonions are the
 crazy old uncle nobody lets out of the attic: they are nonassociative.

 Wow, at first glance, I mis-parsed that name as Joan Baez.  Had me 
 really confused for a moment.

Would it have been that much weirder than Hedy Lamarr?

-s
-- 
Copyright 2011, all wrongs reversed.  Peter Seebach / usenet-nos...@seebs.net
http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated!
I am not speaking for my employer, although they do rent some of my opinions.


Re: testing if a list contains a sublist

2011-08-15 Thread Roy Smith
In article mailman.27.1313450819.27778.python-l...@python.org,
 Johannes dajo.m...@web.de wrote:

 hi list,
 what is the best way to check if a given list (lets call it l1) is
 totally contained in a second list (l2)?
 
 for example:
 l1 = [1,2], l2 = [1,2,3,4,5] - l1 is contained in l2
 l1 = [1,2,2,], l2 = [1,2,3,4,5] - l1 is not contained in l2
 l1 = [1,2,3], l2 = [1,3,5,7] - l1 is not contained in l2
 
 my problem is the second example, which makes it impossible to work with
 sets insteads of lists. But something like set.issubset for lists would
 be nice.
 
 greatz Johannes

import re

def sublist(l1, l2):
    s1 = ''.join(map(str, l1))
    s2 = ''.join(map(str, l2))
    return re.search(s1, s2)

assert sublist([1,2], [1,2,3,4,5])
assert not sublist ([1,2,2], [1,2,3,4,5])
assert not sublist([1,2,3], [1,3,5,7])


(running and ducking)


Re: Ten rules to becoming a Python community member.

2011-08-15 Thread Roy Smith
In article 9att2bf71...@mid.individual.net,
 Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

 rantingrick wrote:
  "Used to" and "supposed to" is the verbiage of children
  and idiots.
 
 So when we reach a certain age we're meant to abandon
 short, concise and idiomatic ways of speaking, and substitute
 long words and phrases to make ourselves sound adult and
 educated?

Yup.


Re: surprising interaction between function scope and class namespace

2011-08-15 Thread Gregory Ewing

Peter Otten wrote:

LOAD_NAME is pretty dumb, it looks into the local namespace and if that 
lookup fails falls back to the global namespace. Someone probably thought "I 
can do better", and reused the static name lookup for nested functions for 
names that occur only on the right-hand side of assignments in a class.


I doubt that it was a conscious decision -- it just falls
out of the way the compiler looks up names in its symbol
table. In case 1, the compiler finds the name 'a' in
the function's local namespace and generates a LOAD_FAST
opcode, because that's what it does for all function-local
names. In case 2, it finds it in the local namespace of
the class and generates LOAD_NAME, because that's what
it does for all class-local names.

The weirdness arises because classes make use of vestiges
of the old two-namespace system, which bypasses lexical
scoping at run time.
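
The two cases from the earlier message aren't quoted here, so the following is only my guess at the general shape of the surprise, written for Python 3 and run as a module. When a class body nested in a function merely reads a name, the name is resolved lexically; when the class body also assigns to it, the read goes through LOAD_NAME and skips the enclosing function altogether.

```python
a = 'global'


def case():
    a = 'function-local'

    class ReadOnly:
        # 'a' is only read here, so it is resolved lexically.
        b = a

    class ReadAndAssign:
        # 'a' is assigned in the class body, so the read on the right-hand
        # side checks the class namespace, then globals, never case().
        a = a

    return ReadOnly.b, ReadAndAssign.a
```

Calling case() gives ('function-local', 'global').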

--
Greg


Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread rantingrick
On Aug 15, 5:13 pm, Philip Semanchuk phi...@semanchuk.com wrote:
 On Aug 15, 2011, at 5:52 PM, Gerrat Rickert wrote:

  With surprising regularity, I see program postings (eg. on
  StackOverflow) from inexperienced Python users  accidentally
  re-assigning built-in names.

  For example, they'll innocently call some variable, list, and assign a
  list of items to it.

  ...and if they're _unlucky_ enough, their program may actually work
  (encouraging them to re-use this name in other programs).

 Or they'll assign a class instance to 'object', only to cause weird errors 
 later when they use it as a base class.

 I agree that this is a problem. The folks on my project who are new-ish to 
 Python overwrite builtins fairly often.

Simple syntax hilighting can head off these issues with great ease.
Heck, python even has a keyword module, and you get a list of built-
ins from the dir() function!

import keyword
import __builtin__  # Python 2 name; the module is called builtins in Python 3
PY_BUILTINS = [str(name) for name in dir(__builtin__)
               if not name.startswith('_')]
PY_KEYWORDS = keyword.kwlist

Also Python ships with IDLE (which is a simplistic IDE) and although I
find it needs a bit of work to be what GvR initially dreamed, it works
well enough to get you by. I always say, you must use the correct tool
for the job, and syntax highlighting is a must-have to avoid these
accidents.
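
The same list of built-in names can also drive a small check, not just a highlighter. What follows is only a rough sketch, assuming Python 3 (where the module is called `builtins` rather than `__builtin__`); the function name `shadowed_builtins` is made up for illustration and is not part of any existing tool.

```python
import ast
import builtins

# Public built-in names, gathered the same way as PY_BUILTINS above.
BUILTIN_NAMES = {name for name in dir(builtins) if not name.startswith("_")}


def shadowed_builtins(source):
    """Return sorted (line, name) pairs where a built-in name is rebound."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id        # plain assignment target, loop variable, etc.
        elif isinstance(node, ast.arg):
            name = node.arg       # function parameter
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            name = node.name      # def/class statement binding a name
        else:
            continue
        if name in BUILTIN_NAMES:
            hits.add((node.lineno, name))
    return sorted(hits)


if __name__ == "__main__":
    sample = "list = [1, 2]\ndef spam(object):\n    pass\n"
    for line, name in shadowed_builtins(sample):
        print("line %d: %r shadows a built-in" % (line, name))
```

It only looks at names bound by assignment, parameters, `def` and `class`; imports and `async def` are left out to keep the sketch short.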


TestFixtures 1.12.0 Released!

2011-08-15 Thread Chris Withers

Hi All,

I'm happy to announce a new release of TestFixtures with the following 
changes:


- OutputCapture has grown a `captured` property and can now be
  temporarily disabled using its `disable` method:

  http://packages.python.org/testfixtures/streams.html

- Logging can now be captured only when it exceeds a specified logging
  level:


http://packages.python.org/testfixtures/logging.html#only-capturing-specific-logging

- The handling of timezones has been reworked in both
  `test_datetime` and `test_time`. This is not backwards
  compatible but is much more useful and correct:

  http://packages.python.org/testfixtures/datetime.html#timezones

The package is on PyPI and a full list of all the links to docs, issue 
trackers and the like can be found here:


http://www.simplistix.co.uk/software/python/testfixtures

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
   - http://www.simplistix.co.uk


Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Steven D'Aprano
On Tue, 16 Aug 2011 08:15 am Chris Angelico wrote:

 On Mon, Aug 15, 2011 at 10:52 PM, Gerrat Rickert
 grick...@coldstorage.com wrote:
 With surprising regularity, I see program postings (eg. on StackOverflow)
 from inexperienced Python users  accidentally re-assigning built-in
 names.

 For example, they’ll innocently call some variable, “list”, and assign a
 list of items to it.
 
 It's actually masking, not reassigning. That may make it easier or
 harder to resolve the issue.

The usual term is shadowing builtins, and it's a feature, not a bug :)


 If you want a future directive that deals with it, I'd do it the other
 way - from __future__ import mask_builtin_warning or something - so
 the default remains as it currently is. But this may be a better job
 for a linting script.

Agreed. It's a style issue, nothing else. There's nothing worse about:

def spam(list):
    pass

compared to

class thingy: pass

def spam(thingy):
    pass

Why should built-ins be treated as more sacred than your own objects?



-- 
Steven



Re: Ten rules to becoming a Python community member.

2011-08-15 Thread Steven D'Aprano
On Tue, 16 Aug 2011 10:48 am Gregory Ewing wrote:

 rantingrick wrote:
 Used to and supposed to is the verbiage of children
 and idiots.
 
 So when we reach a certain age we're meant to abandon
 short, concise and idomatic ways of speaking, and substitute
 long words and phrases to make ourselves sound adult and
 educated?

Say what?

"Used to" isn't idiom. It is grammatical English. Avoidance of "used to" is
a hyper-correction done by people who don't know as much about English as
they think, like "the grammar policeman let Johnny and I off with a
warning", perhaps the most widespread hyper-correction in English.

(If you take Johnny out of the picture, "the policeman let I off with a
warning"... which is obviously wrong. Whether Johnny was there or not, the
policeman let *me* off with a warning.)

"Used to" is unexceptional English:

http://www.englishpage.com/verbpage/usedto.html
http://www.bbc.co.uk/worldservice/learningenglish/youmeus/quiznet/newquiz114.shtml
http://www.englishclub.com/grammar/verbs-m_used-to-do.htm
http://www.learnenglish.de/grammar/usedtotext2.htm


Any-grammatical-errors-are-deliberate-ly y'rs,



-- 
Steven



Re: Ten rules to becoming a Python community member.

2011-08-15 Thread MRAB

On 16/08/2011 01:52, Gregory Ewing wrote:

I don't mind people using e.g. and i.e. as long
as they use them *correctly*.

Many times people use i.e. when they really
mean e.g.


Can you give me an example? :-)


Re: Ten rules to becoming a Python community member.

2011-08-15 Thread Roy Smith
In article 9att9mf71...@mid.individual.net,
 Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

 I don't mind people using e.g. and i.e. as long
 as they use them *correctly*.

The only correct way to use i.e. is to use it to download a better 
browser.


Re: testing if a list contains a sublist

2011-08-15 Thread Steven D'Aprano
On Tue, 16 Aug 2011 09:26 am Johannes wrote:

 hi list,
 what is the best way to check if a given list (lets call it l1) is
 totally contained in a second list (l2)?

This is not the most efficient algorithm, but for short lists it should be
plenty fast enough:


def contains(alist, sublist):
    if len(sublist) == 0 or len(sublist) > len(alist):
        return False
    start = 0
    while True:
        try:
            p = alist.index(sublist[0], start)
        except ValueError:
            return False
        if p + len(sublist) > len(alist):
            return False  # not enough items left for a full match
        for i,x in enumerate(sublist):
            if alist[p+i] != x:
                start = p+1
                break
        else:  # for loop exits without break
            return True
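
For comparison, here is a shorter slice-based sketch of the same check. It is not the function above, just an alternative under the same convention (an empty sublist counts as not contained); the name `contains_slice` is invented for this example. It builds a temporary slice per position, so it trades some speed for brevity.

```python
def contains_slice(alist, sublist):
    """Return True if sublist occurs as a contiguous run inside alist."""
    n = len(sublist)
    if n == 0 or n > len(alist):
        return False  # same convention as contains(): empty is "not contained"
    # Compare every window of length n against the sublist.
    return any(alist[i:i + n] == sublist for i in range(len(alist) - n + 1))


if __name__ == "__main__":
    print(contains_slice([1, 2, 3, 4], [2, 3]))
    print(contains_slice([1, 2, 1], [1, 2, 3]))
```

Both arguments are assumed to be lists; a list slice never compares equal to a tuple.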


-- 
Steven



Re: Ten rules to becoming a Python community member.

2011-08-15 Thread Seebs
On 2011-08-16, Roy Smith r...@panix.com wrote:
 In article 9att9mf71...@mid.individual.net,
  Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

 I don't mind people using e.g. and i.e. as long
 as they use them *correctly*.

 The only correct way to use i.e. is to use it to download a better 
 browser.

Similarly:

Boy, is there, e.g., on my face now!

-s
-- 
Copyright 2011, all wrongs reversed.  Peter Seebach / usenet-nos...@seebs.net
http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated!
I am not speaking for my employer, although they do rent some of my opinions.


Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Philip Semanchuk

On Aug 15, 2011, at 9:32 PM, Steven D'Aprano wrote:

 On Tue, 16 Aug 2011 08:15 am Chris Angelico wrote:
 
 If you want a future directive that deals with it, I'd do it the other
 way - from __future__ import mask_builtin_warning or something - so
 the default remains as it currently is. But this may be a better job
 for a linting script.
 
 Agreed. It's a style issue, nothing else. There's nothing worse about:
 
 def spam(list):
     pass
 
 compared to
 
 class thingy: pass
 
 def spam(thingy):
     pass
 
 Why should built-ins be treated as more sacred than your own objects?

Because built-ins are described in the official documentation as having a 
specific behavior, while my objects are not.

Yes, it can be useful to replace some of the builtins with one's own 
implementation, and yes, doing so fits in with Python's "we're all consenting 
adults" philosophy. But replacing (shadowing, masking -- call it what you will) 
builtins is not everyday practice. On the contrary, as the OP Gerrat pointed 
out, it's most often done unwittingly by newcomers to the language who have no 
idea that they've done anything out of the ordinary or potentially confusing. 

If a language feature is most often invoked accidentally without knowledge of 
or regard for its potential negative consequences, then it might be worth 
making it easier to avoid those accidents. 

bye,
Philip


Re: allow line break at operators

2011-08-15 Thread Teemu Likonen
* 2011-08-14T01:44:05-07:00 * Chris Rebert wrote:

 I've heard that Dylan is supposedly Lisp, sans parens.
 http://en.wikipedia.org/wiki/Dylan_(programming_language)

It has copied/derived many features from Lisps but it's not a dialect of
Lisp because of the syntax and its consequences.


Windows service in production?

2011-08-15 Thread snorble
Anyone know of a Python application running as a Windows service in
production? I'm planning a network monitoring application that runs as
a service and reports back to the central server. Sort of a heartbeat
type agent to assist with "this server is down, go check on it" type
situations.

If using Visual Studio and C# is the more reliable way, then I'll go
that route. I love Python, but everything I read about Python services
seems to have workarounds ahoy for various situations (or maybe that's
just Windows services in general?). And there seem to be multiple
layers of workarounds, since it takes py2exe (or similar) and there
are numerous workarounds required there, depending on which libraries
and functionality are being used. Overall, reading about Windows
services in Python is not exactly a confidence inspiring experience.
If I knew of a reference example of something reliably running in
production, I'd feel better than copying and pasting some code from a
guy's blog.


Re: allow line break at operators

2011-08-15 Thread rantingrick
On Aug 15, 11:13 pm, alex23 wuwe...@gmail.com wrote:
 Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
  I think I would be less skeptical about fluent interfaces if they were
  written more like Unix shell script pipelines instead of using attribute
  access notation:

  foo.array_of_things | sort | map block | join ", "

 I've seen at least one attempt to provide this in Python:

If you want 100% OOP then use Ruby:

rb> [3,100,-20].sort.join('#')
-20#3#100

Ruby is great from this angle! The reading proceeds naturally from
left to right. I have become accustomed to reading Python's nested
function calls however it does feel much more natural in Ruby. Of
course, there are architectural reasons why Python cannot do this
linear syntactical processing which lends some paradigm-al niceties to
the python programmer that are not available to the Ruby folks.


Re: Ten rules to becoming a Python community member.

2011-08-15 Thread rantingrick
On Aug 15, 7:48 pm, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:
 rantingrick wrote:
  Used to and supposed to is the verbiage of children
  and idiots.

 So when we reach a certain age we're meant to abandon
 short, concise and idomatic ways of speaking, and substitute
 long words and phrases to make ourselves sound adult and
 educated?

Well that is the idea anyway. Not that we should be overly pedantic
about it of course, however some words need to be cast off before we
leave the primary school playground in the name of articulate
communication.

These specific phrases I have pointed out ("used to" and "supposed
to") are a result of a mind choosing the easy way out instead of
putting in the wee bit more effort required to express one's self in
an articulate manner. Also these two phrases are quite prolifically
used within this community (among others), from the BDFL on down. It's
a slippery slope my friend.


How to use python environment created using virtualenv?

2011-08-15 Thread smith jack
I have created a Python environment using virtualenv, but when I try
to import this environment into PyDev, an error appears.
It says there should be a Libs dir, but there is no Libs dir in the
virtual environment created using virtualenv. What should I do if
I want to use this virtual environment?


Re: Why no warnings when re-assigning builtin names?

2011-08-15 Thread Steven D'Aprano
On Tue, 16 Aug 2011 01:23 pm Philip Semanchuk wrote:

 
 On Aug 15, 2011, at 9:32 PM, Steven D'Aprano wrote:
 
 On Tue, 16 Aug 2011 08:15 am Chris Angelico wrote:
 
 If you want a future directive that deals with it, I'd do it the other
 way - from __future__ import mask_builtin_warning or something - so
 the default remains as it currently is. But this may be a better job
 for a linting script.
 
 Agreed. It's a style issue, nothing else. There's nothing worse about:
 
 def spam(list):
     pass
 
 compared to
 
 class thingy: pass
 
 def spam(thingy):
     pass
 
 Why should built-ins be treated as more sacred than your own objects?
 
 Because built-ins are described in the official documentation as having a
 specific behavior, while my objects are not.

*My* objects certainly are, because I write documentation for my code. My
docs are no less official than Python's docs.

You can shadow anything. Sometimes shadowing is safe, sometimes it isn't. I
don't see why we should necessarily fear safe shadowing of built-ins more
than we fear unsafe shadowing of non-built-ins.

(I'm not even convinced that making None a reserved word was the right
decision.)

A warning that is off by default won't help the people who need it, because
they don't know enough to turn the warning on. A warning that is on by
default will be helpful to the newbie programmer for the first week or so,
and then will be nothing but an annoyance for the rest of their career.

(For some definition of a week -- some people are slower learners than
others.)


 Yes, it can be useful to replace some of the builtins with one's own
 implementation, and yes, doing so fits in with Python's "we're all
 consenting adults" philosophy. But replacing (shadowing, masking -- call
 it what you will) builtins is not everyday practice. On the contrary, as
 the OP Gerrat pointed out, it's most often done unwittingly by newcomers
 to the language who have no idea that they've done anything out of the
 ordinary or potentially confusing.

Protecting n00bs from their own errors is an admirable aim, but have you
considered that warnings for something which may be harmless could do more
harm than good? Beginners often lack the skill to distinguish between
harmless warnings that can safely be ignored, and fatal errors that need to
be fixed. Even user friendly warning or error messages tend to unnerve
some beginner coders.

There's not much we can do about outright errors, except to make sure that
the error string is as useful as possible, but we can avoid overloading
beginners with warnings they don't need to care about:


WARNING WARNING WARNING WILL ROBINSON, DANGER DANGER DANGER:
YOUR SISTER'S NAME 'PENNY' SHADOWS THE BRITISH CURRENCY, 
POTENTIAL AMBIGUITY ALERT DANGER DANGER DANGER!


*wink*

Depending on their personality, you may end up teaching them to ignore
warnings, or a superstitious dread of anything that leads to a warning.
Neither is a good outcome.


 If a language feature is most often invoked accidentally without knowledge
 of or regard for its potential negative consequences, then it might be
 worth making it easier to avoid those accidents.

Perhaps. But I'm not so sure it is worth the cost of extra code to detect
shadowing and raise a warning. After all, the average coder probably never
shadows anything, and for those that do, once they get bitten *once* they
either never do it again or learn how to shadow safely. I don't see it as a
problem.



-- 
Steven



[issue12266] str.capitalize contradicts oneself

2011-08-15 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset c34772013c53 by Ezio Melotti in branch '3.2':
#12266: Fix str.capitalize() to correctly uppercase/lowercase titlecased and 
cased non-letter characters.
http://hg.python.org/cpython/rev/c34772013c53

New changeset eab17979a586 by Ezio Melotti in branch '2.7':
#12266: Fix str.capitalize() to correctly uppercase/lowercase titlecased and 
cased non-letter characters.
http://hg.python.org/cpython/rev/eab17979a586

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12266
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12266] str.capitalize contradicts oneself

2011-08-15 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 1ea72da11724 by Ezio Melotti in branch 'default':
#12266: merge with 3.2.
http://hg.python.org/cpython/rev/1ea72da11724

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12266
___



[issue12266] str.capitalize contradicts oneself

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Fixed, thanks for the report!

--
resolution: duplicate -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12266
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-15 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Ezio Melotti rep...@bugs.python.org wrote on Mon, 15 Aug 2011 04:56:55 -: 

 Another thing I noticed is that (at least on wide builds) surrogate pairs are 
 not joined on the fly:
 >>> p
 '\ud800\udc00'
 >>> len(p)
 2
 >>> p.encode('utf-16').decode('utf-16')
 '\U00010000'
 >>> len(_)
 1

(For those who may not immediately realize from reading the surrogates,
 '\U00010000' is code point 0x10000, the first non-BMP code point.  I piped it 
 through `uniquote -x` just to make sure.)

Yes, that makes perfect sense.  It's something of a buggy feature or
featureful bug that UTF-16 does this.

When you are thinking of arbitrary sequences of code points, which is
something you have to be able to do in memory but not in a UTF stream, then
one can say that one has four code points of anything in the 0 .. 0x10FFFF
range.  Those can be any arbitrary code points only (1) *while* in memory,
*and* assuming a (2) non-UTF16, ie UTF-32 or UTF-8 representation.  You
cannot do that with UTF-16, which is why it works only on a Python wide
build.  Otherwise they join up.

The reason they join up in UTF-16 is also the reason why unlike in regular
memory where you might be able to use an alternate representation like UTF-8 or
UTF-32, UTF streams cannot contain unpaired surrogates: because if that stream
were in UTF-16, you would never be able to tell the difference between a
sequence of a lead surrogate followed by a tail surrogate and the same thing
meaning just one non-BMP code point.  Since you would not be able to tell the
difference, it always only means the latter, and the former sense is illegal.
This is why lone surrogates are illegal in UTF streams.
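
A small sketch of that joining behaviour, assuming a current Python 3 (3.4 or later, where the `surrogatepass` error handler is accepted by the UTF-16 codecs). The variable names are made up for illustration. In memory the two surrogates stay separate; once they pass through a UTF-16 stream they can only be read back as one non-BMP code point.

```python
# Two surrogate code points held in memory as an arbitrary sequence.
pair = "\ud800\udc00"
in_memory_length = len(pair)

# By default a UTF stream refuses surrogates outright.
try:
    pair.encode("utf-8")
    utf8_rejected = False
except UnicodeEncodeError:
    utf8_rejected = True

# Forcing them into UTF-16 code units loses the distinction: the decoder
# cannot tell "lead + tail surrogate" from "one code point above the BMP".
raw = pair.encode("utf-16-le", "surrogatepass")
joined = raw.decode("utf-16-le")

if __name__ == "__main__":
    print(in_memory_length, utf8_rejected, len(joined), hex(ord(joined)))
```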

In case it isn't obvious, *this* is the source of the [𝒜--𝒵] bug in all
the UTF-16 or UCS-2 regex languages. It is why Java 7 added \x{...}, so
that they can rewrite that as [\x{1D49C}--\x{1D4B5}] to pass the regex
compiler, so that it sees something indirect, not just surrogates.

That's why I always check it in my cross-language regex tests.  A 16-bit
language has to have a workaround, somehow, or it will be in trouble.

The Java regex compiler doesn't generate UTF-16 for itself, either. It
generates UTF-32 for its pattern.  You can see this right at the start of
the source code.  This is from the Java Pattern class:

    /**
     * Copies regular expression to an int array and invokes the parsing
     * of the expression which will create the object tree.
     */
    private void compile() {
        // Handle canonical equivalences
        if (has(CANON_EQ) && !has(LITERAL)) {
            normalize();
        } else {
            normalizedPattern = pattern;
        }
        patternLength = normalizedPattern.length();

        // Copy pattern to int array for convenience
        // Use double zero to terminate pattern
        temp = new int[patternLength + 2];

        hasSupplementary = false;
        int c, count = 0;
        // Convert all chars into code points
        for (int x = 0; x < patternLength; x += Character.charCount(c)) {
            c = normalizedPattern.codePointAt(x);
            if (isSupplementary(c)) {
                hasSupplementary = true;
            }
            temp[count++] = c;
        }

        patternLength = count;   // patternLength now in code points

See how that works?  They use an int(-32) array, not a char(-16) array!  It's
reasonably clever, and necessary.  Because it does that, it can now compile
\x{1D49C} or erstwhile embedded UTF-8 non-BMP literals into UTF-32, and not get
upset by the stormy sea of troubles that surrogates are. You can't have
surrogates in ranges if you don't do something like this in a 16-bit language.

Java couldn't fix the [𝒜--𝒵] bug except by doing the \x{...} indirection trick,
because they are stuck with UTF-16.  However, they actually can match the string
𝒜 against the pattern ^.$, and have it fail on ^..$.   Yes, I know: the
code-unit length of that string is 2, but its regex count is just one dot worth.
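
For what it's worth, here is how the same two checks look on a Python whose strings are sequences of code points (any wide build, or Python 3.3 and later). This is only an illustrative sketch; the names `script_a`, `script_z` and `script_caps` are invented here.

```python
import re

# U+1D49C and U+1D4B5, the two ends of the non-BMP range discussed above.
script_a, script_z = "\U0001D49C", "\U0001D4B5"

# A character class whose endpoints both lie outside the BMP.
script_caps = re.compile("[%s-%s]" % (script_a, script_z))

inside = script_caps.fullmatch("\U0001D4A2") is not None   # within the range
outside = script_caps.fullmatch("A") is not None           # plain ASCII letter

# One code point matches exactly one dot, even though it needs
# two 16-bit code units when written out as UTF-16.
one_dot = re.fullmatch(".", script_a) is not None
two_dots = re.fullmatch("..", script_a) is not None
utf16_units = len(script_a.encode("utf-16-le")) // 2

if __name__ == "__main__":
    print(inside, outside, one_dot, two_dots, utf16_units)
```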

I *believe* they did it that way because tr18 says it has to work that way, but
they may also have done it just because it makes sense.  My current contact at
Oracle doing regex support is not the guy who originally wrote the class, so I
am not sure.  (He's very good, BTW.  For Java 7, he also added named captures,
script properties, *and* brought the class up to conformance with tr18's 
level 1 requirements.)

I'm thinking Python might be able to do in the regex engine on narrow builds
the sort of thing that Java does.  However, I am also thinking that that might
be a lot of work for a situation more readily addressed by phasing out narrow
builds or at least telling people they should use wide builds to get that thing
to work.

--tom

==


===  QUASI OFF TOPIC ADDENDUM FOLLOWS  ===

[issue12266] str.capitalize contradicts oneself

2011-08-15 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset d3816fa1bcdf by Ezio Melotti in branch '2.7':
#12266: move the tests in test_unicode.
http://hg.python.org/cpython/rev/d3816fa1bcdf

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12266
___



[issue12711] Explain tracker components in devguide

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Fixed in http://hg.python.org/devguide/rev/c9dd231b0940

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12711
___



[issue12746] normalization is affected by unicode width

2011-08-15 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

See also #12737.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12746
___



[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-15 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

See also #12746.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12737
___



[issue12746] normalization is affected by unicode width

2011-08-15 Thread Tom Christiansen

Changes by Tom Christiansen tchr...@perl.com:


--
nosy: +tchrist

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12746
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-15 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

 Keep in mind that we should be able to access and use lone surrogates too, 
 therefore:
 s = '\ud800'  # should be valid
 len(s)  # should this raise an error? (or return 0.5 ;)?
 s[0]  # error here too?
 list(s)  # here too?
 
 p = s + '\udc00'
 len(p)  # 1?
 s[0]  # '\U00010000' ?
 s[1]  # IndexError?
 list(p + 'a')  # ['\ud800\udc00', 'a']?
 
 We can still decide that strings with lone surrogates work only with a 
 limited number of methods/functions but:
 1) it's not backward compatible;
 2) it's not very consistent
 
 Another thing I noticed is that (at least on wide builds) surrogate pairs are 
 not joined on the fly:
 >>> p
 '\ud800\udc00'
 >>> len(p)
 2
 >>> p.encode('utf-16').decode('utf-16')
 '\U00010000'
 >>> len(_)
 1

Hi Tom,

welcome to Python land :-) Here's some more background information
on how Python's Unicode implementation works:

You need to differentiate between Unicode code points stored in
Unicode objects and ones encoded in transfer formats by codecs.

We generally do allow lone surrogates, unassigned code
points, lone combining code points, etc. in Unicode objects
since Python needs to be able to work on all Unicode code points
and build strings with them.

The transfer format codecs do try to combine surrogates
on decoding data on UCS4 builds. On UCS2 builds they create
surrogate pairs as necessary. On output, those pairs will again
be joined to get round-trip safety.

It helps if you think of Python's Unicode objects using UCS2
and UCS4 instead of UTF-16/32. Python does try to make working
with UCS2 easy and in many cases behaves as if it were using
UTF-16 internally, but there are, of course, limits to this. In
practice, you only rarely get to see any of these special cases,
since non-BMP code points are usually not found in everyday
use. If they do become a problem for you, you have the option
of switching to a UCS4 build of Python.
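
If you are unsure which kind of build you are running, `sys.maxunicode` tells you: it is 0xFFFF on a UCS2 (narrow) build and 0x10FFFF on a UCS4 (wide) build. A minimal sketch follows; the helper name `build_kind` is made up for this example. Note that from Python 3.3 onwards the distinction is gone and the value is always 0x10FFFF.

```python
import sys


def build_kind():
    """Describe the Unicode storage width of the running interpreter."""
    # sys.maxunicode is the largest code point a single string item can hold.
    return "wide (UCS4)" if sys.maxunicode > 0xFFFF else "narrow (UCS2)"


if __name__ == "__main__":
    print(build_kind(), hex(sys.maxunicode))
```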

You also have to be aware of the fact that Python started
Unicode in 1999/2000 with Unicode 2.0/3.0, so it uses the
terminology of those versions, some of which has changed in
more recent versions of Unicode.

For more background information, you might want to take a look
at this talk from 2002:

http://www.egenix.com/library/presentations/#PythonAndUnicode

Related to the other tickets you opened: you'll also find that
collation and compression were already on the plate back then,
but since no one stepped forward, they weren't implemented.

Cheers,
-- 
Marc-Andre Lemburg
eGenix.com


2011-10-04: PyCon DE 2011, Leipzig, Germany50 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/

--
nosy: +lemburg
title: Python lib re cannot handle Unicode properly due to narrow/wide bug -> 
Python lib re cannot handle Unicode properly due to   narrow/wide bug

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12751] Use macros for surrogates in unicodeobject.c

2011-08-15 Thread STINNER Victor

New submission from STINNER Victor victor.stin...@haypocalc.com:

A lot of code is duplicated in unicodeobject.c to manipulate (encode/decode) 
surrogates. Each function has from one to three different implementations. The 
new decode_ucs4() function adds a new implementation. Attached patch replaces 
this code by macros.

I think that only the implementations of IS_HIGH_SURROGATE and IS_LOW_SURROGATE 
are important for speed. ((ch & 0xFFFFFC00UL) == 0xD800) (from decode_ucs4) is 
*a little bit* faster than (0xD800 <= ch && ch <= 0xDBFF) on my CPU (Atom Z520 
@ 1.3 GHz): running test_unicode 4 times takes ~54 sec instead of ~57 sec (-3%).

These 3 macros have to be checked, I wrote the first one:

#define IS_SURROGATE(ch) (((ch) & 0xFFFFF800UL) == 0xD800)
#define IS_HIGH_SURROGATE(ch) (((ch) & 0xFFFFFC00UL) == 0xD800)
#define IS_LOW_SURROGATE(ch) (((ch) & 0xFFFFFC00UL) == 0xDC00)

I added cast to Py_UCS4 in COMBINE_SURROGATES to avoid integer overflow if 
Py_UNICODE is 16 bits (narrow build). It's maybe useless.

#define COMBINE_SURROGATES(ch1, ch2) \
 (((((Py_UCS4)(ch1) & 0x3FF) << 10) | ((Py_UCS4)(ch2) & 0x3FF)) + 0x10000)
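
To make the arithmetic easier to check, here is the same bit manipulation as a Python sketch. It is not the patch itself: the function names are invented for illustration, and the masks assume code point values that fit in 32 bits.

```python
def is_high_surrogate(ch):
    # Mask test: true exactly for 0xD800..0xDBFF.
    return (ch & 0xFFFFFC00) == 0xD800


def is_low_surrogate(ch):
    # Mask test: true exactly for 0xDC00..0xDFFF.
    return (ch & 0xFFFFFC00) == 0xDC00


def combine_surrogates(high, low):
    # Ten payload bits from each half, then re-add the 0x10000 offset.
    return (((high & 0x3FF) << 10) | (low & 0x3FF)) + 0x10000


def split_surrogates(code_point):
    # Inverse operation for a code point in 0x10000..0x10FFFF.
    offset = code_point - 0x10000
    return 0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)


if __name__ == "__main__":
    print(hex(combine_surrogates(0xD835, 0xDC9C)))
    print([hex(unit) for unit in split_surrogates(0x1D49C)])
```

The mask form and the range form agree for every valid code point, which is why only speed distinguishes them.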

HIGH_SURROGATE and LOW_SURROGATE require that their ordinal argument has been 
preprocessed to fit in [0; 0x]. I added this requirement in the comment of 
these macros. It would be better to have only one macro to do the two 
operations, but because *p++ (dereference and increment) is usually used, I 
prefer to avoid one unique macro (I don't like passing *p++ in a macro using 
its argument more than once).

Or we may add a third macro using HIGH_SURROGATE and LOW_SURROGATE.

I rewrote the main loop of PyUnicode_EncodeUTF16() to avoid a useless test on 
ch2 on narrow builds.

I also added an IS_NONBMP macro just because I prefer macros over hardcoded 
constants.

--
files: unicode_macros.patch
keywords: patch
messages: 142108
nosy: benjamin.peterson, ezio.melotti, haypo, lemburg, loewis, pitrou, tchrist, 
terry.reedy
priority: normal
severity: normal
status: open
title: Use macros for surrogates in unicodeobject.c
versions: Python 3.3
Added file: http://bugs.python.org/file22901/unicode_macros.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12751
___



[issue12751] Use macros for surrogates in unicodeobject.c

2011-08-15 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

We may use the following unlikely macro for IS_SURROGATE, IS_HIGH_SURROGATE and 
IS_LOW_SURROGATE:

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

I suppose that we should use microbenchmarks to validate these macros?

Should I open a new issue for this idea?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12751
___



[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

So the issue here is that when combining chars are used, str.title() fails to 
titlecase the string properly.

The algorithm implemented by str.title() [0] is quite simple: it loops through 
the code units, and uppercases all the chars that follow a char that is not 
lower/upper/titlecased.
This means that if Déme doesn't use combining accents, the char before the 'm' 
is 'é', 'é' is a lowercase char, so 'm' is not capitalized.
If the 'é' is represented as 'e' + '´', the char before the 'm' is '´', '´' is 
not a lower/upper/titlecase char, so the 'm' is capitalized.
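
The behaviour described above can be reproduced with a few lines of Python. This is a simplified sketch of the algorithm as described, not the C code in unicodeobject.c; `naive_title` and `title_after_nfc` are names invented here, and the second function only illustrates the "normalize first" idea without normalizing back.

```python
import unicodedata


def naive_title(text):
    """Uppercase every char that follows a char which is not cased."""
    out = []
    previous_is_cased = False
    for ch in text:
        out.append(ch.lower() if previous_is_cased else ch.upper())
        # A combining accent is neither upper, lower nor titlecase.
        previous_is_cased = ch.isupper() or ch.islower() or ch.istitle()
    return "".join(out)


def title_after_nfc(text):
    """Compose base letter + accent first, then apply the same loop."""
    return naive_title(unicodedata.normalize("NFC", text))


if __name__ == "__main__":
    print(ascii(naive_title("D\u00e9me")))       # precomposed e-acute
    print(ascii(naive_title("De\u0301me")))      # 'e' + combining acute
    print(ascii(title_after_nfc("De\u0301me")))
```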

I guess we could normalize the string before doing the title casing, and then 
normalize it back.
Also the str methods don't claim to follow Unicode afaik, so unless we decide 
that they should, we could implement whatever algorithm we want.

[0]: Objects/unicodeobject.c:6752

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12737
___



[issue12751] Use macros for surrogates in unicodeobject.c

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

This has been proposed already in #10542 (the issue also has patches).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12751
___



[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

If the regex module works fine here, I think it's better to leave the re module 
alone and include the regex module in 3.3.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12731
___



[issue12734] Request for property support in Python re lib

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

This should indeed be fixed by replacing 're' with 'regex'.  So I would 
suggest focusing your tests on 'regex' and reporting them there, so that possible 
bugs get fixed and tested before we include the module in the stdlib.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12734
___



[issue12733] Request for grapheme support in Python re lib

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

As I said on #12734 and #12731, if the 'regex' module addresses this issue, we 
should just wait until we include it in the stdlib.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12733
___



[issue12730] Python's casemapping functions are untrustworthy due to narrow/wide build issues

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

This is actually a duplicate of #9200.

@Terry

 Besides which, all I see (on Windows) in Firefox is things like
 𐐼𐐯𐑅𐐨𐑉𐐯𐐻.

Encoding problem.  Firefox thinks this is some iso-8859-*.  You can fix this by 
selecting 'Unicode (UTF-8)' from View -> Character Encoding.
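
A small illustration of what happens when UTF-8 bytes are read as iso-8859-1. The sample string is just two Deseret letters chosen for the example; nothing here is specific to Firefox.

```python
text = "\U00010414\U0001042F"  # two Deseret letters, both outside the BMP

raw = text.encode("utf-8")          # what the server actually sends
misread = raw.decode("iso-8859-1")  # what a browser shows with the wrong charset

# Each non-BMP char takes four bytes in UTF-8, and iso-8859-1 turns
# every byte into one (wrong) character.
print(len(text), len(raw), len(misread))
print(ascii(misread))

# Decoding with the right charset recovers the original text.
print(raw.decode("utf-8") == text)
```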

 IDLE just has empty boxes.

This is most likely because it doesn't use a font able to display those chars.

--
resolution:  -> duplicate
stage: needs patch -> committed/rejected
status: open -> closed
superseder:  -> str.isprintable() is always False for large code points

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12730
___



[issue9200] Make str methods work with non-BMP chars on narrow builds

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

I closed #12730 as a duplicate of this and updated the title of this issue.

--
title: str.isprintable() is always False for large code points -> Make str 
methods work with non-BMP chars on narrow builds

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9200
___



[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

See also #12751.

--
nosy: +tchrist

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10542
___



[issue9200] Make str methods work with non-BMP chars on narrow builds

2011-08-15 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +tchrist

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9200
___



[issue12752] locale.normalize does not take unicode strings

2011-08-15 Thread Julian Taylor

New submission from Julian Taylor jtaylor.deb...@googlemail.com:

Using unicode strings with locale.normalize gives the following traceback with 
python2.7:

~$ python2.7 -c 'import locale; locale.normalize(u"en_US")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/locale.py", line 358, in normalize
    fullname = localename.translate(_ascii_lower_map)
TypeError: character mapping must return integer, None or unicode

with python2.6 it works and it also works with non-unicode strings in 2.7

--
components: Unicode
messages: 142118
nosy: jtaylor
priority: normal
severity: normal
status: open
title: locale.normalize does not take unicode strings
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12752
___



[issue12752] locale.normalize does not take unicode strings

2011-08-15 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti
stage:  -> test needed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12752
___



[issue12204] str.upper converts to title

2011-08-15 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 16edc5cf4a79 by Ezio Melotti in branch '3.2':
#12204: document that str.upper().isupper() might be False and add a note about 
cased characters.
http://hg.python.org/cpython/rev/16edc5cf4a79

New changeset fb49394f75ed by Ezio Melotti in branch '2.7':
#12204: document that str.upper().isupper() might be False and add a note about 
cased characters.
http://hg.python.org/cpython/rev/fb49394f75ed

New changeset c821e3a54930 by Ezio Melotti in branch 'default':
#12204: merge with 3.2.
http://hg.python.org/cpython/rev/c821e3a54930
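
A quick way to see the documented behaviour: isupper() needs at least one cased character, so upper() on a string without any cased characters still gives a string for which isupper() is False. The sample strings are arbitrary.

```python
samples = ["abc", "a1", "123", ""]

for sample in samples:
    # isupper() is only True when there is at least one cased character.
    print(repr(sample), sample.upper().isupper())
```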

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12204
___



[issue12204] str.upper converts to title

2011-08-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Fixed, thanks for the report!

--
resolution:  -> fixed
stage: commit review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12204
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-15 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

For what it's worth, I've had an idea about string storage, roughly based on how 
*nix stores data on disk.

If a string is small, point to a block of codepoints.

If a string is medium-sized, point to a block of pointers to codepoint blocks.

If a string is large, point to a block of pointers to pointer blocks.

This means that a large string doesn't need a single large allocation.

The level of indirection can be increased as necessary.

For simplicity, all codepoint blocks contain the same number of codepoints, 
except the final codepoint block, which may contain fewer.

A codepoint block may use the minimum width necessary (1, 2 or 4 bytes) to 
store all of its codepoints.

This means that there are no surrogates and that different sections of the 
string can be stored in different widths to reduce memory usage.
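
A toy Python sketch of the scheme, to show the indexing and the per-block widths. It only covers one level of indirection (the "medium-sized" case), and the class name, the block size of 4 and the use of the array module are all assumptions made for the example, not a proposal for the actual C layout.

```python
from array import array

BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def _typecode_for(max_code_point):
    """Pick the narrowest array typecode that can hold every code point."""
    if max_code_point < 0x100:
        return "B"  # 1 byte per code point
    if max_code_point < 0x10000:
        return "H"  # 2 bytes per code point
    return "L"      # at least 4 bytes per code point


class BlockString:
    """String stored as a list of fixed-size code point blocks (sketch)."""

    def __init__(self, text):
        code_points = [ord(char) for char in text]
        self.length = len(code_points)
        self.blocks = []
        for start in range(0, self.length, BLOCK_SIZE):
            chunk = code_points[start:start + BLOCK_SIZE]
            self.blocks.append(array(_typecode_for(max(chunk)), chunk))

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # All blocks except the last are full, so indexing stays O(1).
        block, offset = divmod(index, BLOCK_SIZE)
        return chr(self.blocks[block][offset])

    def typecodes(self):
        return [block.typecode for block in self.blocks]


s = BlockString("abcd\u0394efg\U00010414")
print(len(s), s.typecodes())
print(ascii(s[4]), ascii(s[8]))
```

Each code point is stored whole, so there are no surrogates, and only the block that contains a wide character pays for the wider storage.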

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12752] locale.normalize does not take unicode strings

2011-08-15 Thread Julian Taylor

Julian Taylor jtaylor.deb...@googlemail.com added the comment:

this is a regression introduced by fixing http://bugs.python.org/issue1813

This breaks some user code, e.g. wx.Locale.GetCanonicalName returns unicode.
Example bugs:
https://bugs.launchpad.net/ubuntu/+source/update-manager/+bug/824734
https://bugs.launchpad.net/ubuntu/+source/playonlinux/+bug/825421

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12752
___



[issue12752] locale.normalize does not take unicode strings

2011-08-15 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Julian Taylor wrote:
 
 New submission from Julian Taylor jtaylor.deb...@googlemail.com:
 
 using unicode strings for locale.normalize gives following traceback with 
 python2.7:
 
 ~$ python2.7 -c 'import locale; locale.normalize(u"en_US")'
 Traceback (most recent call last):
   File "<string>", line 1, in <module>
   File "/usr/lib/python2.7/locale.py", line 358, in normalize
     fullname = localename.translate(_ascii_lower_map)
 TypeError: character mapping must return integer, None or unicode
 
 with python2.6 it works and it also works with non-unicode strings in 2.7

This looks like a side-effect of the change Antoine made to the locale
module when trying to make the case mapping work in a non-locale
dependent way.
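
As far as I understand it, the traceback comes from the two translate() flavours in Python 2: str.translate() takes a 256-character table, while unicode.translate() expects a mapping from code points to code points. Below is a sketch of locale-independent ASCII lowercasing that works on text strings. It is written for Python 3, the names ASCII_LOWER_MAP and ascii_lower are hypothetical, and it is not the code in locale.py.

```python
import string

# Map only the 26 ASCII uppercase letters; everything else is left alone,
# so the result does not depend on the current locale.
ASCII_LOWER_MAP = {
    ord(upper): ord(lower)
    for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
}


def ascii_lower(name):
    """Lowercase ASCII letters only, using a code point mapping."""
    return name.translate(ASCII_LOWER_MAP)


print(ascii_lower("EN_US.UTF-8"))
print(ascii(ascii_lower("\u0130")))  # non-ASCII letters are not touched
```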

--
nosy: +lemburg

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12752
___


