Re: escaping/encoding/formatting in python
On 4/5/2012 10:10 PM, Steve Howell wrote:
> On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:
>> On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:
>
> You've one-upped me with 2-to-the-N backslash escaping.

Early attempts at UNIX word processing, nroff and troff, suffered from that problem, due to a badly designed macro system.

A question in language design is whether to escape or quote. Do you write

    "X = %d" % (n,)

or

    "X = " + str(n)

In general, for anything but output formatting, the second scales better. Regular expressions have a bad case of the first. For a quoted alternative to regular expression syntax, see SNOBOL or Icon. SNOBOL allows naming patterns, and those patterns can then be used as components of other patterns. SNOBOL is obsolete, but that approach produced much more readable code.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list
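The named-pattern idea can be approximated in Python by composing a regular expression from small named pieces and quoting literals with `re.escape` instead of escaping them by hand. This is only a sketch, assuming Python 3; the names `SIGN`, `DIGITS`, `NUMBER`, `ASSIGNMENT`, and `parse_assignment` are illustrative and come from neither SNOBOL nor the thread.

```python
import re

# Small named patterns, each readable on its own.
SIGN = r"[+-]?"
DIGITS = r"[0-9]+"
NUMBER = SIGN + DIGITS

# re.escape quotes a literal so that it is matched character for character,
# which avoids hand-escaping any metacharacters it may contain.
LITERAL_PREFIX = re.escape("X = ")

# Larger patterns are built from the named components.
ASSIGNMENT = re.compile(LITERAL_PREFIX + "(" + NUMBER + ")")


def parse_assignment(text):
    """Return the integer in 'X = <number>', or None if the text does not match."""
    match = ASSIGNMENT.fullmatch(text)
    return int(match.group(1)) if match else None
```

Each component can be tested in isolation, which is most of what makes the composed form easier to read than one long escaped pattern.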
Re: escaping/encoding/formatting in python
On Apr 6, 10:13 am, Steve Howell showel...@yahoo.com wrote:
> On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:
>> On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:
>>> One of the biggest nuisances for programmers, just beneath date/time APIs in the pantheon of annoyances, is that we are constantly dealing with escaping/encoding/formatting issues.
>>
>> [OT for this list]
>> If you run
>> $ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\ \\'
>> you can get quite a few results.
>> [Suitable assumptions: linux box with emacs installed]
>
> You've one-upped me with 2-to-the-N backslash escaping. I've written useful scripts before with (scripts that went through three levels of interpretation), but four is setting a new bar. My use of three exponentially increasing levels of backslashes back in the day was like Beamon's jump in the Mexico City Olympics. An amazing feat for its time, but every record eventually gets broken. Well done.

On a (somewhat distantly) related note, found this old fortune:

Wouldn't the sentence 'I want to put a hyphen between the words Fish and And and And and Chips in my Fish-And-Chips sign' have been clearer if quotation marks had been placed before Fish, and between Fish and and, and and and And, and And and and, and and and And, and And and and, and and and Chips, as well as after Chips?
Re: escaping/encoding/formatting in python
On Sat, Apr 7, 2012 at 3:36 PM, Nobody nob...@nowhere.com wrote:
> The delimiter can be chosen either by analysing the string or by choosing a string at random and relying upon a collision being statistically improbable.

The same techniques are available to MIME multipart encoders, and for the same reason. Nestable structures can be quite annoying to parse.

ChrisA
Re: escaping/encoding/formatting in python
On Thu, 05 Apr 2012 22:28:19 -0700, rusi wrote:
> All this mess would vanish if the string-literal-starter and ender were different.

You still need an escape character in order to be able to embed an unbalanced end character.

Tcl and PostScript use mirrored string delimiters (braces for Tcl, parentheses for PostScript), which results in the worst of both worlds: they still need an escape character (backslash, in both cases) but now you can't match tokens with a regexp/DFA.
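To illustrate why mirrored delimiters defeat a plain regexp/DFA tokenizer, here is a rough sketch of a reader for brace-delimited strings. It has to keep a nesting counter, which a finite automaton cannot do for arbitrary depth, and it still needs the backslash for unbalanced braces. This is a simplified illustration written for Python 3, not Tcl's actual parser; `read_braced` is a hypothetical helper name.

```python
def read_braced(text, start=0):
    """Return (content, end_index) for a brace-delimited string at text[start].

    Nested braces are counted. A backslash makes the next character literal,
    which is the only way to embed an unbalanced brace. The content is
    returned raw: escape sequences are skipped over, not decoded.
    """
    if text[start] != "{":
        raise ValueError("expected '{'")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it protects
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced braces")
```

The `depth` counter is the part a regular expression cannot express in general, and the backslash branch shows that the escape character has not gone away.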
Re: escaping/encoding/formatting in python
On Apr 6, 1:52 pm, Nobody nob...@nowhere.com wrote:
> On Thu, 05 Apr 2012 22:28:19 -0700, rusi wrote:
>> All this mess would vanish if the string-literal-starter and ender were different.
>
> You still need an escape character in order to be able to embed an unbalanced end character.
>
> Tcl and PostScript use mirrored string delimiters (braces for Tcl, parentheses for PostScript), which results in the worst of both worlds: they still need an escape character (backslash, in both cases) but now you can't match tokens with a regexp/DFA.

Yes. I hand it to you that I missed the case of explicitly unbalanced strings. But are not such cases rare?

For example, code such as:

    print '"'
    print str(something)
    print '"'

could better be written as

    print '"%s"' % str(something)
Re: escaping/encoding/formatting in python
06.04.12 08:28, rusi wrote:
> All this mess would vanish if the string-literal-starter and ender were different. [You don't need to escape an open paren in a Lisp sexp]

But you need to escape an escape symbol.
Re: escaping/encoding/formatting in python
06.04.12 16:22, rusi wrote:
> Yes. I hand it to you that I missed the case of explicitly unbalanced strings. But are not such cases rare?

No, unfortunately. }:-(

> For example, code such as:
>
>     print '"'
>     print str(something)
>     print '"'
>
> could better be written as
>
>     print '"%s"' % str(something)

Don't forget to escape %.
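The point about `%` can be shown in a few lines: once a string becomes a %-format template, every literal percent sign in it has to be doubled. A small sketch for Python 3; the function names are made up for illustration.

```python
def quoted_report(value):
    # '%s' is the placeholder; the literal percent sign is written as '%%'.
    return '"%s" is 100%% quoted' % value


def escape_for_percent_template(literal_text):
    """Double each '%' so literal_text can be embedded in a %-format template."""
    return literal_text.replace("%", "%%")
```

For example, `(escape_for_percent_template("50% off: ") + "%s") % "tea"` yields `50% off: tea`, whereas using the unescaped literal as a template would raise an error or misformat.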
Re: escaping/encoding/formatting in python
On Fri, 06 Apr 2012 06:22:13 -0700, rusi wrote:
> But are not such cases rare?

They exist, therefore they have to be supported somehow.

> For example, code such as:
>
>     print '"'
>     print str(something)
>     print '"'
>
> could better be written as
>
>     print '"%s"' % str(something)

Not if the text between the delimiters is large. Consider:

    print 'static const char * const data[] = {'
    for line in infile:
        print '\t"%s",' % line.rstrip()
    print '};'

Versus:

    text = '\n'.join('\t"%s",' % line.rstrip() for line in infile)
    print 'static const char * const data[] = {\n%s\n};' % text

C++11 solves the problem to an extent by providing raw strings with user-defined delimiters (up to 16 printable characters excluding parentheses and backslash), e.g.:

    R"delim(quote: " backslash: \ rparen: ))delim"

evaluates to the string:

    quote: " backslash: \ rparen: )

The only sequence which cannot appear in such a string is )delim" (i.e. a right parenthesis followed by the chosen delimiter string followed by a double quote).

The delimiter can be chosen either by analysing the string or by choosing a string at random and relying upon a collision being statistically improbable.
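Both ways of choosing a delimiter can be sketched in Python 3. The helpers below are hypothetical and only illustrate the idea: one inspects the payload and adjusts a candidate until the forbidden sequence is absent, the other picks 16 random hex characters and accepts the small risk of a collision. The analysing version does not enforce the 16-character limit.

```python
import secrets


def pick_delimiter_by_analysis(payload, base="delim"):
    """Append a counter to base until ')' + delimiter + '"' is absent from payload."""
    candidate = base
    counter = 0
    while ")" + candidate + '"' in payload:
        counter += 1
        candidate = "%s%d" % (base, counter)
    return candidate


def pick_delimiter_at_random(nbytes=8):
    """Return a random hex delimiter; a collision is improbable, not impossible."""
    return secrets.token_hex(nbytes)  # 2 hex characters per byte


def cpp_raw_literal(payload, delimiter):
    """Wrap payload as a C++11 raw string literal using the given delimiter."""
    if ")" + delimiter + '"' in payload:
        raise ValueError("delimiter collides with payload")
    return 'R"%s(%s)%s"' % (delimiter, payload, delimiter)
```

The random approach is the one MIME boundaries typically rely on; the analysing approach costs a scan of the payload but gives a guarantee.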
escaping/encoding/formatting in python
One of the biggest nuisances for programmers, just beneath date/time APIs in the pantheon of annoyances, is that we are constantly dealing with escaping/encoding/formatting issues.

I wrote this little program as a cheat sheet for myself and others. Hope it helps.

    # escaping quotes
    legal_string = ['"', "'", "'\"", '"\'', """ '" """]
    for s in legal_string:
        print("[" + s + "]")

    # formatting
    print 'Hello %s' % 'world'
    print "Hello %s" % 'world'
    planet = 'world'
    print "Hello {planet}".format(**locals())
    print "Hello {planet}".format(planet=planet)
    print "Hello {0}".format(planet)

    # Unicode
    s = u"\u0394"
    print s # prints a triangle
    print repr(s) == "u'\u0394'" # True
    print s.encode("utf-8") == "\xce\x94" # True
    # other examples/resources???

    # Web encodings
    import urllib
    s = "~foo ~bar"
    print urllib.quote_plus(s) == '%7Efoo+%7Ebar' # True
    print urllib.unquote_plus(urllib.quote_plus(s)) == s # True
    import cgi
    s = "x < 4 & x > 5"
    print cgi.escape(s) == 'x &lt; 4 &amp; x &gt; 5' # True

    # JSON
    import json
    h = {'foo': 'bar'}
    print json.dumps(h) == '{"foo": "bar"}' # True
    try:
        bad_json = "{'foo': 'bar'}"
        json.loads(bad_json)
    except:
        print 'Must use double quotes in your JSON'

It's tested under Python3.2. I didn't dare to cover regexes. It would be great if somebody could flesh out the Unicode examples or remind me (and others) of other common APIs that are useful to have in your bag of tricks.
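For readers on current Python 3, some of the calls in the cheat sheet live elsewhere: `urllib.quote_plus` is `urllib.parse.quote_plus`, and `cgi.escape` has been removed in favour of `html.escape`. The following is a rough Python 3 counterpart, offered as a sketch rather than a port of the original; the wrapper function names are invented here. It avoids asserting how `~` is quoted, since that differs between Python versions.

```python
import html
import json
import urllib.parse


def url_round_trip(text):
    """Quote text for a query string, then unquote it again."""
    quoted = urllib.parse.quote_plus(text)
    return urllib.parse.unquote_plus(quoted)


def escape_html(text):
    """Replace &, <, > (and quotes) with HTML entities."""
    return html.escape(text)


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def encode_delta():
    """UTF-8 bytes of U+0394, GREEK CAPITAL LETTER DELTA."""
    return "\u0394".encode("utf-8")
```

As in the original, `is_valid_json("{'foo': 'bar'}")` is false because JSON strings must use double quotes.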
Re: escaping/encoding/formatting in python
On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:
> One of the biggest nuisances for programmers, just beneath date/time APIs in the pantheon of annoyances, is that we are constantly dealing with escaping/encoding/formatting issues.

[OT for this list]
If you run
$ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\ \\'
you can get quite a few results.
[Suitable assumptions: linux box with emacs installed]
Re: escaping/encoding/formatting in python
On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:
> On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:
>> One of the biggest nuisances for programmers, just beneath date/time APIs in the pantheon of annoyances, is that we are constantly dealing with escaping/encoding/formatting issues.
>
> [OT for this list]
> If you run
> $ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\ \\'
> you can get quite a few results.
> [Suitable assumptions: linux box with emacs installed]

You've one-upped me with 2-to-the-N backslash escaping. I've written useful scripts before with (scripts that went through three levels of interpretation), but four is setting a new bar. My use of three backslashes back in the day was like Beamon's jump in the Mexico City Olympics. An amazing feat for its time, but every record eventually gets broken. Well done.
Re: escaping/encoding/formatting in python
On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:
> On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:
>> One of the biggest nuisances for programmers, just beneath date/time APIs in the pantheon of annoyances, is that we are constantly dealing with escaping/encoding/formatting issues.
>
> [OT for this list]
> If you run
> $ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\ \\'
> you can get quite a few results.
> [Suitable assumptions: linux box with emacs installed]

You've one-upped me with 2-to-the-N backslash escaping. I've written useful scripts before with (scripts that went through three levels of interpretation), but four is setting a new bar. My use of three exponentially increasing levels of backslashes back in the day was like Beamon's jump in the Mexico City Olympics. An amazing feat for its time, but every record eventually gets broken. Well done.
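The 2-to-the-N growth is easy to reproduce: every level of interpretation that treats backslash as its escape character requires each backslash to be doubled once more. A minimal sketch for Python 3, with hypothetical helper names, assuming each level uses the same doubling rule:

```python
def escape_once(text):
    """Double every backslash, as one layer of string-literal quoting requires."""
    return text.replace("\\", "\\\\")


def escape_for_levels(text, levels):
    """Apply escape_once once per level of interpretation."""
    for _ in range(levels):
        text = escape_once(text)
    return text


def unescape_once(text):
    """Undo one layer by collapsing each pair of backslashes into one."""
    return text.replace("\\\\", "\\")
```

Under that assumption, a single literal backslash becomes 8 backslashes after three levels and 16 after four.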
Re: escaping/encoding/formatting in python
On Apr 6, 10:13 am, Steve Howell showel...@yahoo.com wrote:
> On Apr 5, 9:59 pm, rusi rustompm...@gmail.com wrote:
>> On Apr 6, 6:56 am, Steve Howell showel...@yahoo.com wrote:
>>> One of the biggest nuisances for programmers, just beneath date/time APIs in the pantheon of annoyances, is that we are constantly dealing with escaping/encoding/formatting issues.
>>
>> [OT for this list]
>> If you run
>> $ find /usr/share/emacs/23.3/lisp/ -name '*.gz'|xargs zgrep '\\ \\'
>> you can get quite a few results.
>> [Suitable assumptions: linux box with emacs installed]
>
> You've one-upped me with 2-to-the-N backslash escaping. I've written useful scripts before with (scripts that went through three levels of interpretation), but four is setting a new bar. My use of three exponentially increasing levels of backslashes back in the day was like Beamon's jump in the Mexico City Olympics. An amazing feat for its time, but every record eventually gets broken. Well done.

There was a competition here?! If so I can break my own record: double the number of backslashes and you still get hits. It's just that I was unsure of my ability at typing 32 backslashes (and making a reasonable post).

On a more serious note, this indicates that it is (may be?) a bad idea for old-fashioned languages (like elisp and C) to have only one string-quoter. May-be-question-mark because programming language experience tells us that avoiding recursion (in its infinite guises) by special-casing is usually a bad idea.

All this mess would vanish if the string-literal-starter and ender were different. [You don't need to escape an open paren in a Lisp sexp]