On Fri, Apr 8, 2011 at 2:49 AM, Earwin Burrfoot <ear...@gmail.com> wrote:
> On Fri, Apr 8, 2011 at 03:01, Robert Muir <rcm...@gmail.com> wrote:
>> On Thu, Apr 7, 2011 at 6:48 PM, Chris Hostetter
>> <hossman_luc...@fucit.org> wrote:
>>>
>>> : -1. These files should be readable, for maintaining, debugging and
>>> : knowing what's going on.
>>>
>>> Readability is my main concern ... i don't know (and frequently can't
>>> tell) the difference between a lot of non-ASCII characters -- and i'm
>>> guessing i'm not alone. when it's spelled out explicitly using the
>>> character name or escape code, there is no ambiguity about what character
>>> was intended, or whether it got screwed up by some tool along the way (i.e.
>>> the svn server, an svn client, the patch command, a text editor, an IDE,
>>> ant's "fixcrlf" task, etc...)
>>
>> Please take the time, just 5 or 10 minutes, to look thru some of this
>> source code and tests.
>>
>> Imagine if you couldn't just look at the code to see what it does, but
>> had to decode it from some crazy numeric encoding scheme.
>> Imagine if it were this way for things like stopword lists too.
>>
>> It would be basically impossible for you to look at the code and
>> figure out what it does!
>> For example, try looking at the Thai analyzer tests: if these were all
>> numbers, how would you know wtf is going on?
>>
>> Although this comes up from time to time, I stand firm on my -1
>> because it's important to me for the source code to be readable.
>> I'm not willing to give this up just because some people cannot read
>> writing system XYZ.
>>
>> I have said before, I'm willing to change my -1 vote on this if *ALL*
>> string constants (including English ones) are changed to be character
>> escapes.
>> If you imagine what the code would look like if English string
>> constants were instead codes, then I think you will understand my
>> point of view!
>>
>> It's really, really important to source code readability to be able to
>> open a file and understand what it does, not to have to use some
>> decoder because it uses characters other people don't understand.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
> I think having both raw characters /and/ encoded representation is the
> best? (one of them in comments)
> I'm all for Unicode sources, but at least two things hit me repeatedly:
> 1. Tools do screw up, and you have to recover somehow.
> E.g. IntelliJ IDEA's 'shelve' function uses the platform default encoding
> (MacRoman in my case), and I've lost some text on things I shelved but
> never committed anywhere.
> 2. There are characters that look exactly the same.
> E.g. different whitespace/dashes. Or (if you have Cyrillic in your
> fonts) I dare you to discern between a/а, c/с, e/е, o/о.
> These are different characters from the Latin and Cyrillic charsets (left
> Latin/right Cyrillic), but in 99% of fonts they are visually identical.
> I had a filter that folded up similar-looking characters, and it
> was documented in exactly this way - raw char+code.
>
I've worked with a lot of characters in Eclipse, and the ones that
confuse my eyes the most are l/1 and O/0.

So again, if we do this, then we must do it for all English text, too.
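A minimal Java sketch of the "raw char + code" convention suggested earlier in this thread might look like the following. The class name Confusables and its codePoint helper are hypothetical, not Lucene code. As an assumption about how to split the two representations, the Cyrillic letters are written as escapes inside the string literals and shown raw only in comments, so a tool that mangles the file's encoding can damage the comments but not the data.

```java
// Hypothetical example class (not part of Lucene): documents lookalike
// characters with both the raw glyph (in comments) and the code point.
public class Confusables {

    // Left column is Latin, right column is Cyrillic.
    // The literals use escapes; the raw glyphs appear only in the comments.
    static final String[][] PAIRS = {
        {"a", "\u0430"},  // a / а   (U+0061 / U+0430)
        {"c", "\u0441"},  // c / с   (U+0063 / U+0441)
        {"e", "\u0435"},  // e / е   (U+0065 / U+0435)
        {"o", "\u043E"},  // o / о   (U+006F / U+043E)
    };

    // Formats the first code point of s as "U+XXXX" (at least four hex digits).
    static String codePoint(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        for (String[] pair : PAIRS) {
            // Each pair renders identically in most fonts, yet the strings differ.
            System.out.printf("%s %s  equal=%b%n",
                    codePoint(pair[0]), codePoint(pair[1]), pair[0].equals(pair[1]));
        }
    }
}
```

Running it should print each pair's code points followed by equal=false: the glyphs look the same, but they are different characters.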