On Fri, Apr 8, 2011 at 2:49 AM, Earwin Burrfoot <ear...@gmail.com> wrote:
> On Fri, Apr 8, 2011 at 03:01, Robert Muir <rcm...@gmail.com> wrote:
>> On Thu, Apr 7, 2011 at 6:48 PM, Chris Hostetter
>> <hossman_luc...@fucit.org> wrote:
>>>
>>> : -1. These files should be readable, for maintaining, debugging and
>>> : knowing what's going on.
>>>
>>> Readability is my main concern ... i don't know (and frequently can't
>>> tell) the difference between a lot of non-ASCII characters -- and i'm
>>> guessing i'm not alone.  when it's spelled out explicitly using the
>>> character name or escape code, there is no ambiguity about what character
>>> was intended, or whether it got screwed up by some tool along the way (ie:
>>> the svn server, an svn client, the patch command, a text editor, an IDE,
>>> ant's "fixcrlf" task, etc...)
>>
>> Please take the time, just 5 or 10 minutes, to look through some of this
>> source code and tests.
>>
>> Imagine if you couldn't just look at the code to see what it does, but
>> had to decode from some crazy numeric encoding scheme.
>> Imagine if it were this way for things like stopword lists too.
>>
>> It would be basically impossible for you to look at the code and
>> figure out what it does!
>> For example, try looking at the Thai analyzer tests: if these were all
>> numbers, how would you know wtf is going on?
>>
>> Although this comes up from time to time, I stand firm on my -1
>> because it's important to me for the source code to be readable.
>> I'm not willing to give this up just because some people cannot read
>> writing system XYZ.
>>
>> I have said before, I'm willing to change my -1 vote on this, if *ALL*
>> string constants (including English ones) are changed to be character
>> escapes.
>> If you imagine what the code would look like if English string
>> constants were instead codes, then I think you will understand my
>> point of view!
>>
>> It's really, really important to source code readability to be able to
>> open a file and understand what it does, not to have to use some
>> decoder because it uses characters other people don't understand.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
> I think having both raw characters /and/ encoded representation is the
> best? (one of them in comments)
> I'm all for Unicode sources, but at least two things hit me repeatedly:
> 1. Tools do screw up, and you have to recover somehow.
> eg. IntelliJ IDEA's 'shelve' function uses platform default (MacRoman
> in my case) and I've lost some text on things I shelved but never
> committed anywhere.
> 2. There are characters that all look the same.
> E.g. different whitespace/dashes. Or, (if you have Cyrillic in your
> fonts) I dare you to discern between a/а, c/с, e/е, o/о.
> These are different characters from the Latin and Cyrillic charsets (left
> Latin/right Cyrillic), but in 99% of fonts they are visually identical.
> I had a filter that folded similar-looking characters, and it
> was documented in exactly this way - raw char+code.
>

I've worked with a lot of characters in Eclipse, and the ones that
confuse my eyes the most are l/1 and O/0.

So again, if we do this, then we must do it for all English text, too.
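
For anyone who wants to check which character is actually sitting in a
source file, the "raw char + code" documentation style mentioned in the
quoted message can be generated mechanically. Below is a minimal sketch,
not anything from the Lucene code base: the class name CharDescriber and
the method describe are made up for illustration. It relies only on the
JDK's Character.getName and String.codePoints (so it assumes Java 8 or
later), and prints one line per code point with the raw character, its
U+XXXX code, and its Unicode name.

```java
public class CharDescriber {

    /**
     * Hypothetical helper: returns one line per code point in s, in the
     * form "<raw char> U+XXXX <UNICODE NAME>".
     */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb
            .append(new String(Character.toChars(cp)))   // the raw character
            .append(String.format(" U+%04X ", cp))       // its code point
            .append(Character.getName(cp))               // its Unicode name
            .append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latin 'a' followed by Cyrillic 'a' (written as an escape here
        // only so the example survives any source encoding).
        System.out.print(describe("a\u0430"));
        // The same works for the ASCII look-alikes l/1 and O/0.
        System.out.print(describe("l1O0"));
    }
}
```

Pasting the output into a comment next to the raw literal would give
both representations side by side, which is one way to get readability
and unambiguity at the same time.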

