Re: escaping/encoding/formatting in python

2012-05-23 Thread John Nagle

On 4/5/2012 10:10 PM, Steve Howell wrote:

On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:

On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:



You've one-upped me with 2-to-the-N backslash escaping.


   Early attempts at UNIX word processing, nroff and troff,
suffered from that problem, due to a badly designed macro system.

   A question in language design is whether to escape or quote.
Do you write

X = "%d" % (n,)

or

X = "" + str(n)

In general, for anything but output formatting, the second scales
better.  Regular expressions have a bad case of the first.
For a quoted alternative to regular expression syntax, see
SNOBOL or Icon.   SNOBOL allows naming patterns, and those patterns
can then be used as components of other patterns.  SNOBOL
is obsolete, but that approach produced much more readable
code.
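
A rough Python sketch of the named-pattern idea follows. It is not SNOBOL, only an approximation with the standard `re` module: small patterns get names and are combined into larger ones, with `re.escape` quoting the literal text. The names `DIGITS`, `literal`, and `VERSION` are invented for illustration.

```python
import re

# Hypothetical named building blocks (not from the original post).
DIGITS = r"[0-9]+"


def literal(text):
    """Quote arbitrary text so that it matches itself inside a pattern."""
    return re.escape(text)


# Compose a larger pattern from named parts instead of one escaped blob.
VERSION = DIGITS + literal(".") + DIGITS

print(re.fullmatch(VERSION, "3.2") is not None)   # the dot matches literally
print(re.fullmatch(VERSION, "3x2") is None)       # "x" is not accepted as a dot
```

The point of the sketch is only that the literal dot is quoted once, at the place it is introduced, instead of being escaped by hand wherever the pattern is written out.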

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: escaping/encoding/formatting in python

2012-05-22 Thread rusi
On Apr 6, 10:13 am, Steve Howell showel...@yahoo.com wrote:
 On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:

  On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:

   One of the biggest nuisances for programmers, just beneath date/time
   APIs in the pantheon of annoyances, is that we are constantly dealing
   with escaping/encoding/formatting issues.

  [OT for this list]
  If you run
  $ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\
  \\'
  you can get quite a few results.

  [Suitable assumptions: linux box with emacs installed]

 You've one-upped me with 2-to-the-N backslash escaping.  I've written
 useful scripts before with  (scripts that went through
 three
 levels of interpretation), but four is setting a new bar.  My use of
 three exponentially increasing levels of backslashes back in the day
 was like Beamon's jump in the Mexico City Olympics.  An amazing feat
 for its time, but every record
 eventually gets broken.  Well done.


On a (somewhat distantly) related note, found this old fortune:

Wouldn't the sentence 'I want to put a hyphen between the words Fish
and And and And and Chips in my Fish-And-Chips sign' have been clearer
if quotation marks had been placed before Fish, and between Fish and
and, and and and And, and And and and, and and and And, and And and
and, and and and Chips, as well as after Chips?


Re: escaping/encoding/formatting in python

2012-04-07 Thread Chris Angelico
On Sat, Apr 7, 2012 at 3:36 PM, Nobody nob...@nowhere.com wrote:
 The delimiter can be chosen either by analysing the string
 or by choosing a string at random and relying upon a collision
 being statistically improbable.

The same techniques being available to MIME multi-part encoders, and
for the same reason. Nestable structures can be quite annoying to
parse.

ChrisA


Re: escaping/encoding/formatting in python

2012-04-06 Thread Nobody
On Thu, 05 Apr 2012 22:28:19 -0700, rusi wrote:

 All this mess would vanish if the string-literal-starter and ender
 were different.

You still need an escape character in order to be able to embed an
unbalanced end character.

Tcl and PostScript use mirrored string delimiters (braces for Tcl,
parentheses for PostScript), which results in the worst of both worlds:
they still need an escape character (backslash, in both cases) but now you
can't match tokens with a regexp/DFA.
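
To illustrate why nested delimiters are beyond a plain regexp, here is a minimal scanner sketch in Python. It keeps a depth counter, which a DFA cannot do for unbounded nesting, and still needs an escape character for an unbalanced delimiter. The function name `read_balanced` and its parameters are made up for this example, and the escape handling is much simpler than what PostScript or Tcl actually do.

```python
def read_balanced(text, start=0, open_ch="(", close_ch=")", esc="\\"):
    """Return (body, end_index) for a balanced string starting at text[start]."""
    if text[start] != open_ch:
        raise ValueError("expected opening delimiter")
    depth = 0
    body = []
    i = start
    while i < len(text):
        ch = text[i]
        if ch == esc and i + 1 < len(text):
            body.append(text[i + 1])      # escaped character is taken literally
            i += 2
            continue
        if ch == open_ch:
            depth += 1
            if depth > 1:                 # keep nested openers, drop the outer one
                body.append(ch)
        elif ch == close_ch:
            depth -= 1
            if depth == 0:                # matching closer of the outer opener
                return "".join(body), i + 1
            body.append(ch)
        else:
            body.append(ch)
        i += 1
    raise ValueError("unterminated string")


print(read_balanced("(a(b)c) tail"))      # nested parentheses need no escape
print(read_balanced("(a\\)b)"))           # an unbalanced ")" must be escaped
```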



Re: escaping/encoding/formatting in python

2012-04-06 Thread rusi
On Apr 6, 1:52 pm, Nobody nob...@nowhere.com wrote:
 On Thu, 05 Apr 2012 22:28:19 -0700, rusi wrote:
  All this mess would vanish if the string-literal-starter and ender
  were different.

 You still need an escape character in order to be able to embed an
 unbalanced end character.

 Tcl and PostScript use mirrored string delimiters (braces for Tcl,
 parentheses for PostScript), which results in the worst of both worlds:
 they still need an escape character (backslash, in both cases) but now you
 can't match tokens with a regexp/DFA.

Yes. I hand it to you that I missed the case of explicitly unbalanced
strings.
But are not such cases rare?
For example code such as:
print ''
print str(something)
print ''

could better be written as
print '%s' % str(something)


Re: escaping/encoding/formatting in python

2012-04-06 Thread Serhiy Storchaka

06.04.12 08:28, rusi wrote:

All this mess would vanish if the string-literal-starter and ender
were different.
[You don't need to escape an open-paren in a lisp sexp]


But you need to escape an escape symbol.



Re: escaping/encoding/formatting in python

2012-04-06 Thread Serhiy Storchaka

06.04.12 16:22, rusi wrote:

Yes. I hand it to you that I missed the case of explicitly unbalanced
strings.
But are not such cases rare?


No, unfortunately. }:-(


For example code such as:
print ''
print str(something)
print ''

could better be written as
print '%s' % str(something)


Don't forget to escape %.
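
A small sketch of that point, written for Python 3: a literal percent sign inside a %-format template has to be doubled, and text spliced into a template needs the same treatment. The helper `escape_percent` is a made-up name for this example.

```python
value = 42
# A literal percent sign must be doubled inside a %-format template.
print("%d%% done" % value)


def escape_percent(text):
    """Make arbitrary text safe to embed in a %-format template."""
    return text.replace("%", "%%")


label = "100% cotton"
template = escape_percent(label) + ": %s"
print(template % "yes")
```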



Re: escaping/encoding/formatting in python

2012-04-06 Thread Nobody
On Fri, 06 Apr 2012 06:22:13 -0700, rusi wrote:

 But are not such cases rare?

They exist, therefore they have to be supported somehow.

 For example code such as:
 print ''
 print str(something)
 print ''
 
 could better be written as
 print '%s' % str(something)

Not if the text between the delimiters is large.

Consider:

print 'static const char * const data[] = {'
for line in infile:
    print '\t%s,' % line.rstrip()
print '};'

Versus:

text = '\n'.join('\t%s,' % line.rstrip() for line in infile)
print 'static const char * const data[] = {\n%s\n};' % text

C++11 solves the problem to an extent by providing raw strings with
user-defined delimiters (up to 16 printable characters excluding
parentheses and backslash), e.g.:

R"delim(quote: " backslash: \ rparen: ))delim"

evaluates to the string:

quote: " backslash: \ rparen: )

The only sequence which cannot appear in such a string is )delim" (i.e. a
right parenthesis followed by the chosen delimiter string followed by a
double quote). The delimiter can be chosen either by analysing the string
or by choosing a string at random and relying upon a collision
being statistically improbable.



escaping/encoding/formatting in python

2012-04-05 Thread Steve Howell
One of the biggest nuisances for programmers, just beneath date/time
APIs in the pantheon of annoyances, is that we are constantly dealing
with escaping/encoding/formatting issues.

I wrote this little program as a cheat sheet for myself and others.
Hope it helps.

  # escaping quotes
  legal_string = ['"', "'", "'\"", '"\'', """ '" """]
  for s in legal_string:
    print("[" + s + "]")

  # formatting
  print 'Hello %s' % 'world'
  print "Hello %s" % 'world'
  planet = 'world'
  print "Hello {planet}".format(**locals())
  print "Hello {planet}".format(planet=planet)
  print "Hello {0}".format(planet)

  # Unicode
  s = u"\u0394"
  print s # prints a triangle
  print repr(s) == "u'\u0394'" # True
  print s.encode("utf-8") == "\xce\x94" # True
  # other examples/resources???

  # Web encodings
  import urllib
  s = "~foo ~bar"
  print urllib.quote_plus(s) == '%7Efoo+%7Ebar' # True
  print urllib.unquote_plus(urllib.quote_plus(s)) == s # True
  import cgi
  s = "x < 4 & x > 5"
  print cgi.escape(s) == 'x &lt; 4 &amp; x &gt; 5' # True

  # JSON
  import json
  h = {'foo': 'bar'}
  print json.dumps(h) == '{"foo": "bar"}' # True
  try:
    bad_json = "{'foo': 'bar'}"
    json.loads(bad_json)
  except:
    print 'Must use double quotes in your JSON'

It's tested under Python3.2.  I didn't dare to cover regexes.  It
would be great if somebody could flesh out the Unicode examples or
remind me (and others) of other common APIs that are useful to have in
your bag of tricks.
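
For readers on a current interpreter: the cheat sheet uses Python 2 idioms (the `print` statement, `urllib.quote_plus`, `cgi.escape`). A rough Python 3 counterpart might look like the sketch below. The sample strings are chosen for this sketch, and the URL example avoids `~` because newer Python 3 releases leave it unquoted.

```python
import html
import json
import urllib.parse

# Unicode: str is text, encode() gives bytes
s = "\u0394"
print(s.encode("utf-8") == b"\xce\x94")

# Web encodings: quote_plus lives in urllib.parse in Python 3
q = "a&b c"
print(urllib.parse.quote_plus(q) == "a%26b+c")
print(urllib.parse.unquote_plus(urllib.parse.quote_plus(q)) == q)

# html.escape takes the place of cgi.escape
t = "x < 4 & x > 5"
print(html.escape(t) == "x &lt; 4 &amp; x &gt; 5")

# JSON: keys and strings must use double quotes
h = {"foo": "bar"}
print(json.dumps(h) == '{"foo": "bar"}')
try:
    json.loads("{'foo': 'bar'}")
except ValueError:
    print("Must use double quotes in your JSON")
```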





Re: escaping/encoding/formatting in python

2012-04-05 Thread rusi
On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:
 One of the biggest nuisances for programmers, just beneath date/time
 APIs in the pantheon of annoyances, is that we are constantly dealing
 with escaping/encoding/formatting issues.

[OT for this list]
If you run
$ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\
\\'
you can get quite a few results.

[Suitable assumptions: linux box with emacs installed]


Re: escaping/encoding/formatting in python

2012-04-05 Thread Steve Howell
On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:
 On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:

  One of the biggest nuisances for programmers, just beneath date/time
  APIs in the pantheon of annoyances, is that we are constantly dealing
  with escaping/encoding/formatting issues.

 [OT for this list]
 If you run
 $ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\
 \\'
 you can get quite a few results.

 [Suitable assumptions: linux box with emacs installed]

You've one-upped me with 2-to-the-N backslash escaping.  I've written
useful scripts before with  (scripts that went through three
levels of interpretation), but four is setting a new bar.  My use of
three backslashes back in the day was like Beamon's jump in the Mexico
City Olympics.  An amazing feat for its time, but every record
eventually gets broken.  Well done.


Re: escaping/encoding/formatting in python

2012-04-05 Thread Steve Howell
On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:
 On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:

  One of the biggest nuisances for programmers, just beneath date/time
  APIs in the pantheon of annoyances, is that we are constantly dealing
  with escaping/encoding/formatting issues.

 [OT for this list]
 If you run
 $ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\
 \\'
 you can get quite a few results.

 [Suitable assumptions: linux box with emacs installed]

You've one-upped me with 2-to-the-N backslash escaping.  I've written
useful scripts before with  (scripts that went through
three
levels of interpretation), but four is setting a new bar.  My use of
three exponentially increasing levels of backslashes back in the day
was like Beamon's jump in the Mexico City Olympics.  An amazing feat
for its time, but every record
eventually gets broken.  Well done.


Re: escaping/encoding/formatting in python

2012-04-05 Thread rusi
On Apr 6, 10:13 am, Steve Howell showel...@yahoo.com wrote:
 On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:

  On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:

   One of the biggest nuisances for programmers, just beneath date/time
   APIs in the pantheon of annoyances, is that we are constantly dealing
   with escaping/encoding/formatting issues.

  [OT for this list]
  If you run
  $ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\
  \\'
  you can get quite a few results.

  [Suitable assumptions: linux box with emacs installed]

 You've one-upped me with 2-to-the-N backslash escaping.  I've written
 useful scripts before with  (scripts that went through
 three
 levels of interpretation), but four is setting a new bar.  My use of
 three exponentially increasing levels of backslashes back in the day
 was like Beamon's jump in the Mexico City Olympics.  An amazing feat
 for its time, but every record
 eventually gets broken.  Well done.

There was a competition here?!
If so I can break my own record -- double the number of backslashes
and you still get hits.
It's just that I was unsure of my ability at typing 32 backslashes (and
making a reasonable post).
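
The 2-to-the-N growth is easy to reproduce. In this small Python sketch each level of interpretation is modelled as "every backslash becomes two", which is an assumption made for illustration; real shells and elisp readers have more rules. The function names are invented.

```python
def escape_level(text):
    """One level of quoting: every backslash becomes two."""
    return text.replace("\\", "\\\\")


def escape_n(text, levels):
    """Apply escape_level the given number of times."""
    for _ in range(levels):
        text = escape_level(text)
    return text


# One literal backslash, protected through 0..5 levels of interpretation.
for levels in range(6):
    print(levels, len(escape_n("\\", levels)))
```

Five levels are enough to reach the 32 backslashes mentioned above.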

On a more serious note this indicates that it is (may be?) a bad idea
for old-fashioned languages (like elisp and C) to have only 1 string-
quoter.

May-be-question-mark because programming language experience tells us
that avoiding recursion (in its infinite guises) by special-casing is
usually a bad idea.

All this mess would vanish if the string-literal-starter and ender
were different.
[You don't need to escape an open-paren in a lisp sexp]