On Fri, Apr 8, 2011 at 03:01, Robert Muir <[email protected]> wrote:
> On Thu, Apr 7, 2011 at 6:48 PM, Chris Hostetter
> <[email protected]> wrote:
>>
>> : -1. These files should be readable, for maintaining, debugging and
>> : knowing what's going on.
>>
>> Readability is my main concern ... i don't know (and frequently can't
>> tell) the difference between a lot of non-ascii characters -- and i'm
>> guessing i'm not alone. when it's spelled out explicitly using the
>> character name or escape code, there is no ambiguity about what character
>> was intended, or whether it got screwed up by some tool along the way (ie:
>> the svn server, an svn client, the patch command, a text editor, an IDE,
>> ant's "fixcrlf" task, etc...)
>
> Please take the time, just 5 or 10 minutes, to look thru some of this
> source code and tests.
>
> Imagine if you couldn't just look at the code to see what it does, but
> had to decode from some crazy numeric encoding scheme.
> Imagine if it were this way for things like stopword lists too.
>
> It would be basically impossible for you to look at the code and
> figure out what it does!
> For example, try looking at the thai analyzer tests: if these were all
> numbers, how would you know wtf is going on?
>
> Although this comes up from time to time, I stand firm on my -1
> because it's important to me for the source code to be readable.
> I'm not willing to give this up just because some people cannot read
> writing system XYZ.
>
> I have said before, i'm willing to change my -1 vote on this, if *ALL*
> string constants (including english ones) are changed to be character
> escapes.
> If you imagine what the code would look like if english string
> constants were instead codes, then I think you will understand my
> point of view!
>
> It's really really important to source code readability to be able to
> open a file and understand what it does, not to have to use some
> decoder because it uses characters other people don't understand.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
I think having both raw characters /and/ the encoded representation (one of
them in comments) is the best option.

I'm all for Unicode sources, but at least two things hit me repeatedly:

1. Tools do screw up, and you have to recover somehow. E.g. IntelliJ IDEA's
   'shelve' function uses the platform default encoding (MacRoman in my
   case), and I've lost some text in things I shelved but never committed
   anywhere.

2. There are characters that look exactly the same, e.g. different
   whitespace/dashes. Or (if you have Cyrillic in your fonts) I dare you to
   discern between a/а, c/с, e/е, o/о. These are different characters from
   the Latin and Cyrillic charsets (left Latin, right Cyrillic), but in 99%
   of fonts they are visually identical.

I had a filter that folded similar-looking characters, and it was
documented in exactly this way: raw char + code.

-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: [email protected]
Phone: +7 (495) 683-567-4
ICQ: 104465785
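
P.S. To make the "raw char + code" idea concrete, here is a rough sketch of how such documentation lines could be generated. It is in Python only for brevity and is not the filter mentioned above; the names `describe`, `PAIRS` and `LINES` are made up for this illustration. The one library call it relies on is `unicodedata.name` from the standard library.

```python
import unicodedata


def describe(ch):
    """Return 'raw char + code + name' for one character, e.g. for a comment."""
    name = unicodedata.name(ch, "<no name>")
    return f"{ch} U+{ord(ch):04X} {name}"


# The Latin/Cyrillic pairs from the message, with the Cyrillic side written
# as escapes so this source stays unambiguous (left: Latin, right: Cyrillic).
PAIRS = [("a", "\u0430"), ("c", "\u0441"), ("e", "\u0435"), ("o", "\u043e")]

# One documentation line per pair: both look-alikes, each with its code point.
LINES = [f"{describe(latin)} | {describe(cyr)}" for latin, cyr in PAIRS]
```

Each entry in `LINES` shows the glyph a reader sees next to the code point that tells the look-alikes apart, so a character mangled by a tool can still be recovered from the code.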
