[Tutor] How does # -*- coding: utf-8 -*- work?

2013-01-26 Thread Santosh Kumar
Everything starting with a hash character in Python is a comment and
is not interpreted by the interpreter. So how does that work? Give me
a full explanation.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How does # -*- coding: utf-8 -*- work?

2013-01-26 Thread Joel Goldstick
On Sat, Jan 26, 2013 at 11:38 AM, Santosh Kumar sntshkm...@gmail.com wrote:

> Everything starting with a hash character in Python is a comment and
> is not interpreted by the interpreter. So how does that work? Give me
> a full explanation.


If you google it, you get this:


http://stackoverflow.com/questions/4872007/where-does-this-come-from-coding-utf-8







-- 
Joel Goldstick
http://joelgoldstick.com


Re: [Tutor] How does # -*- coding: utf-8 -*- work?

2013-01-26 Thread eryksun
On Sat, Jan 26, 2013 at 11:38 AM, Santosh Kumar sntshkm...@gmail.com wrote:

> Everything starting with a hash character in Python is a comment and
> is not interpreted by the interpreter. So how does that work? Give me
> a full explanation.

The encoding declaration is parsed in the process of compiling the
source. CPython uses the function get_coding_spec in tokenizer.c.

CPython 2.7.3 source link:
http://hg.python.org/cpython/file/70274d53c1dd/Parser/tokenizer.c#l205
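
To make the idea concrete, here is a rough Python approximation of that kind of check. It is not a translation of the C code: it simply applies the regular expression given in PEP 263 to the first two lines of the source. The helper name find_coding_spec is made up for illustration, and the sketch ignores details the real tokenizer handles (such as a byte order mark).

```python
import re

# Pattern given in PEP 263 for recognising an encoding declaration.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def find_coding_spec(source):
    """Return the declared encoding name, or None if there is none.

    Hypothetical helper: only the first two lines are examined, because
    a declaration further down the file is treated as an ordinary comment.
    """
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


print(find_coding_spec('# -*- coding: utf-8 -*-'))
print(find_coding_spec('# coding=utf-8'))
print(find_coding_spec('x = 1\ny = 2\n# coding: latin-1'))
```

So the line is still a comment as far as the grammar is concerned; it is just that the tokenizer looks inside comments on the first two lines before it decodes the rest of the file.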

You can use the parser module to represent the nodes of a parsed
source tree as a sequence of nested tuples. The first item in each
tuple is the node type number. The associated names for each number
are split across two dictionaries. symbol.sym_name maps non-terminal
node types, and token.tok_name maps terminal nodes (i.e. leaf nodes in
the tree). In CPython 2.7/3.3, node types below 256 are terminal.
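
The 256 boundary can be checked with the token module alone. This is a small sketch that assumes only the standard token module; it classifies the three node type numbers that come up in this message without needing parser or symbol.

```python
import token

# NT_OFFSET is the boundary: terminal node types sit below it,
# non-terminal node types are at or above it.
print(token.NT_OFFSET)

for node_type in (0, 257, 339):
    kind = 'terminal' if token.ISTERMINAL(node_type) else 'non-terminal'
    print(node_type, kind)

# Terminal type 0 is the end-of-input marker.
print(token.tok_name[token.ENDMARKER])
```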

Here's an example source tree for two types of encoding declaration:

    >>> import parser
    >>> src1 = '# -*- coding: utf-8 -*-'
    >>> parser.suite(src1).totuple()
    (339, (257, (0, '')), 'utf-8')

    >>> src2 = '# coding=utf-8'
    >>> parser.suite(src2).totuple()
    (339, (257, (0, '')), 'utf-8')

As expected, src1 and src2 are equivalent. Now find the names of node
types 339, 257, and 0:

    >>> import symbol, token
    >>> symbol.sym_name[339]
    'encoding_decl'
    >>> symbol.sym_name[257]
    'file_input'

    >>> token.ISTERMINAL(0)
    True
    >>> token.tok_name[0]
    'ENDMARKER'

The base node is type 339 (encoding_decl). The child is type 257
(file_input), which is just the empty body of the source (to keep it
simple, src1 and src2 lack statements). Tacked on at the end is the
string value of the encoding_decl (e.g. 'utf-8').
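
The parser and symbol modules were removed in Python 3.10, so the session above will not run on current interpreters. As an alternative sketch for Python 3, the standard tokenize module exposes the same detection step through tokenize.detect_encoding. The helper name declared_encoding is made up for illustration.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python would use to decode this source.

    Hypothetical helper wrapping tokenize.detect_encoding, which reads
    at most two lines through the readline callable it is given.
    """
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


src1 = b'# -*- coding: utf-8 -*-\n'
src2 = b'# coding=utf-8\n'

# Both spellings of the declaration resolve to the same encoding.
print(declared_encoding(src1) == declared_encoding(src2))

# Names are normalised, so an alias comes back in canonical form.
print(declared_encoding(b'# coding: latin-1\n'))
```

Note that detect_encoding takes bytes, not text, since its whole job is to work out how the bytes should be decoded.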