[Harbour] CaseStudy: an alternative to Hungarian notation for a High-Level language such as [x]Harbour

frank van nuffel Fri, 14 Dec 2007 08:37:29 -0800

CaseStudy: an alternative to Hungarian notation for a High-Level language
such as [x]Harbour


Hi everyone,

I've been invited to this forum to share some thoughts about a possible
addition to Harbour's OO support for native types. Although native types are
already accessible through an OO approach, here is what a personal project
targeting Ca-Clipper, combined with Class(y)'s OO native type support,
resulted in. While it may contribute to [x]Harbour, I would rather hold back
any further initiative until you developers agree that the premises used
throughout the (o)ceans' think tank, as it is codenamed, are fit for wider
acceptance. There are many aspects to it, so I'll try presenting them as
modular case studies; I hope you enjoy reading them.

This abstract is about an alternative (data) type notation methodology for
identifier names, along the lines of what Hungarian notation originally
proposed, but specifically building on how it was introduced for use with
Ca-Clipper.

For reference, here's what Wikipedia says about Hungarian notation:
http://en.wikipedia.org/wiki/Hungarian_notation

Hungarian notation and similar schemes have pros and cons, which unavoidably
lead to lengthy discussions. This review is partial, in favour of such
notation, and doesn't want to start that debate; instead it shows a possible
roadmap, in formal technical terms, to help code become even more
self-documenting, keeping the reader of the code in mind before the writer.
It is about potential, leaving the ultimate choice of applying it to the
individual code designer. I don't expect projects like Harbour and xHarbour
to set the pace for the community regarding this. However, designing a
complete OO interface for native types, in parallel with the traditional
procedural approach (where legacy dictates the names of variables and
functions), and now taking advantage of things such as polymorphism and
overloading, raises the question of which convention to use when naming the
upcoming properties and messages. In (o)ceans' this question has been
answered in favour of experimentation. The alternative, call it (o)ceans'
notation, has matured over the years, but hasn't been published or announced
until now; so here you are the first to get it :-)

For a uniform code writing style, the choice of systematically adopting or
dropping some type-expressing identifier notation contributes to how widely a
language is accepted; Ca-Clipper's legacy shows that xBase programmers are
familiar with this. My personal preferences are just one humble scenario,
but from an aesthetic, formal point of view, here is what has been concluded
in (o)ceans' [for Ca-Clipper], which I've enjoyed blueprinting over the past
years.

The Ca-Clipper programming environment, although not systematically, uses
some form of type expression in identifier names, which seems to have been
widely adopted by third parties and probably by end programmers. Staying as
backward compatible as possible with Ca-Clipper, especially with the language
in its entirety, has been an ongoing exercise in this project of yours,
developers of Harbour, and a successful one, opening up new perspectives,
opportunities and applications.

What follows is an attempt, at a purely formal level and as part of a bigger
picture, to expand on some aspects of the language. Further implications
might become the subject of later case studies, to be presented on this
forum if and when invited.

Day 1

Many moons ago, the C++ documentation of the Borland product line prompted
thoughts about an alternative way to use mnemonics expressing the data type
of an identifier, along the lines of existing Hungarian notation.

Following that documentation, given a class named someClass, an instance of
it named instanceOfSomeClass and a message someMessage, writing

instanceOfSomeClass . someClass:: someMessage()

struck me as not only more informative to the human reader, but also as
compiling into a slightly smaller footprint.

As long as someMessage() isn't a virtual method (message), or even when it
is but needs to be called non-virtually, the explicit scope operation
someClass:: is neutral (or nearly so) to the compiler, yet has
self-documenting value to the reader: it locates the class where someMessage
is defined. Picture code written this way, where a method might be sent to
an instance of a subclass in a broader class hierarchy, read in an editor
without (configured) on-the-fly hyperlinking, or on a printed page.

Since it is cumbersome to write the full class name in the scope operation
while editing, why not resort to mnemonics, in this case inside the scope
operation?

note: there is one step further to go with this: distinguishing type
mnemonics inserted in the scope operation by a leading uppercase instead of
lowercase letter, especially in the context of OO for native types, when a
traditional procedural translation is to be obtained. This can be supported
by means of the preprocessor, and the resulting writing style is more
uniform:

someVar := someVar . c:: upper() // executed in OO manner
someVar := someVar . C:: upper() // backtranslated procedurally to someVar := UPPER( someVar )
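To make the note concrete, here is a minimal sketch of such a
back-translation, written in Python as a stand-in for an [x]Harbour
preprocessor rule (it is not (o)ceans' actual tooling). The regular
expression and the name `back_translate` are assumptions; only single,
argument-less sends are handled, and a lowercase mnemonic is left untouched
for the OO send machinery.

```python
import re

# Matches a single argument-less send: "<receiver> . <mnemonic>:: <message>()"
SEND = re.compile(r"(\w+) \. ([A-Za-z]{1,2}):: (\w+)\(\)")


def back_translate(line: str) -> str:
    """Rewrite sends whose scope mnemonic starts in uppercase into procedural calls."""

    def replace(match: re.Match) -> str:
        receiver, mnemonic, message = match.groups()
        if mnemonic[0].isupper():
            # Procedural form: the message becomes a function taking the receiver.
            return f"{message.upper()}( {receiver} )"
        # Lowercase mnemonic: leave the OO send as written.
        return match.group(0)

    return SEND.sub(replace, line)


print(back_translate("someVar := someVar . c:: upper()"))  # someVar := someVar . c:: upper()
print(back_translate("someVar := someVar . C:: upper()"))  # someVar := UPPER( someVar )
```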

Day 2

Ca-Clipper introduced an attractive form of Hungarian notation, simple and
straightforward, but, in my opinion, left open essential paths to be
addressed by future evolution; the o for object, with only four object types
at the time, is one such case. With good ergonomics in mnemonic design, why
not use acronyms of, for example, 2 letters referring to the actual class of
the object?

someVar . tb:: stabilize() // TBrowse class
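Two-letter mnemonics only work if each acronym resolves to exactly one
class. The Python sketch below assumes a simple project-wide registry; only
`tb` for TBrowse comes from the text, while `tc`, `gt` and `er` are
hypothetical picks for Ca-Clipper's other built-in classes, and single
letters are refused because they stay reserved for native types.

```python
# Hypothetical registry of two-letter class mnemonics; only "tb" is taken from the text.
CLASS_MNEMONICS = {
    "tb": "TBrowse",
    "tc": "TBColumn",  # hypothetical
    "gt": "Get",       # hypothetical
    "er": "Error",     # hypothetical
}


def class_for(mnemonic: str) -> str:
    """Return the class name behind a scope mnemonic such as 'tb'."""
    if len(mnemonic) == 1:
        raise ValueError(f"'{mnemonic}' is a native-type mnemonic, not a class")
    try:
        return CLASS_MNEMONICS[mnemonic.lower()]
    except KeyError:
        raise ValueError(f"unknown class mnemonic '{mnemonic}'") from None


print(class_for("tb"))  # TBrowse
```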

Day 3

Most modern IDEs and many editors offer some sort of IntelliSense; for
lookup and sort order, it is more practical to have identifiers start with
their actual name part than with type notation components. Hence, instead
of a prefix, why not use a postfix?
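A small Python illustration of the lookup argument, using hypothetical
identifiers and plain prefix matching as a stand-in for an editor's
completion (real IDEs may use fuzzier matching). Matching is
case-insensitive, as in Ca-Clipper.

```python
def completions(typed: str, identifiers: list[str]) -> list[str]:
    """Return the identifiers an editor would offer for the typed prefix, sorted."""
    return sorted(name for name in identifiers if name.lower().startswith(typed.lower()))


prefix_style = ["cName", "lName", "nCount", "aItems"]       # hypothetical
postfix_style = ["name_c7", "name_l7", "count_n7", "items_a7"]  # hypothetical

print(completions("name", prefix_style))   # []
print(completions("name", postfix_style))  # ['name_c7', 'name_l7']
print(sorted(postfix_style))               # ['count_n7', 'items_a7', 'name_c7', 'name_l7']
```

With the postfix, typing the name part finds every typed variant, and a
sorted symbol list groups them together.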

Day 4

Linkage and (visibility) scope information can be used to differentiate
between identifiers, pushing polymorphism further, simply by inserting that
information into the type notation. Static and local variables that share
the same name part can be told apart by a digit describing scope. Digits are
visually far more striking than an additional letter; combining scope and
type in the identifier notation allows, for instance:

static  someVar_c2  // c=char and 2 = static module-wide
static  someVar_l2   // l=logical

...

function SomeFunction ( ... )
static   someVar_c1  // 1 = static function-wide
local    someVar_c7  // 7 = dynamic (in contrast to static; stack vars are dynamic) == local

Polymorphism!

1/2/3 are static:  1 = private, 2 = protected, 3 = public (*)
4/5/6 are reserved for a special sort of typedef
7/8/9 are dynamic: 7 = private, 8 = protected, 9 = public (*)

(*) private in the C++ sense, ie. hidden;
    static is not to be confused with the 'static' keyword, but is meant in
    the general sense of memory reserved at compile time;
    dynamic in contrast to static, ie. memory allocated at run-time
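Because the digits are grouped in threes, the scope digit can be decoded
with simple arithmetic. A minimal Python sketch, assuming only the
1..3 / 4..6 / 7..9 grouping described above; `decode_scope` is a
hypothetical helper name.

```python
VISIBILITY = ("private", "protected", "public")


def decode_scope(digit: int) -> tuple[str, str]:
    """Split a scope digit into (storage, visibility), e.g. 8 -> ('dynamic', 'protected')."""
    if not 1 <= digit <= 9:
        raise ValueError("scope digit must be between 1 and 9")
    group, position = divmod(digit - 1, 3)  # group 0 = 1..3, 1 = 4..6, 2 = 7..9
    if group == 1:
        raise ValueError("digits 4..6 are reserved for a special sort of typedef")
    storage = "static" if group == 0 else "dynamic"
    return storage, VISIBILITY[position]


for d in (1, 2, 3, 7, 8, 9):
    print(d, decode_scope(d))
```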

in the next overview, read a quoted word such as 'static' as a Ca-Clipper keyword

part 1: scopes for variables (outside the context of a class)

1 static/private    = 'static' variable (function/procedure wide)
2 static/protected  = 'static' variable (module wide)
3 static/public     = n/a in Ca-Clipper // 'extern' variable in C(++)

7 dynamic/private   = 'local' variable
8 dynamic/protected = 'private' variable in Ca-Clipper ('private' keyword)
9 dynamic/public    = 'public' variable in Ca-Clipper

part 2: scopes for properties (within the context of a class)

1 static/private    = class property; 'hidden'
2 static/protected  = class property; 'protected'
3 static/public     = class property; 'public'

7 dynamic/private   = instance property; 'hidden'
8 dynamic/protected = instance property; 'protected'
9 dynamic/public    = instance property; 'public'

general type/scope notation layout for variables/properties:

variableOrPropertyName_$#

         $ = type mnemonic
         # = scope info
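A minimal Python sketch of a decoder for the name_$# layout, mapping the
scope digit onto the Ca-Clipper declarations of part 1. The regular
expression, the function name and the assumption that type mnemonics are
one or two letters are illustrative choices, not part of (o)ceans'.

```python
import re

# name_$# : name part, underscore, type mnemonic (1-2 letters), scope digit (no 4..6)
VAR_NOTATION = re.compile(r"^(?P<name>\w+?)_(?P<type>[A-Za-z]{1,2})(?P<scope>[1-37-9])$")

# Part 1: scope digits for variables outside a class, as Ca-Clipper declarations
CLIPPER_VARIABLE = {
    1: "STATIC (function/procedure wide)",
    2: "STATIC (module wide)",
    7: "LOCAL",
    8: "PRIVATE",
    9: "PUBLIC",
}


def decode_variable(identifier: str) -> dict:
    """Decode e.g. 'someVar_c2' into its name, type mnemonic and scope."""
    match = VAR_NOTATION.match(identifier)
    if match is None:
        raise ValueError(f"{identifier!r} does not follow the name_$# layout")
    scope = int(match["scope"])
    return {
        "name": match["name"],
        "type": match["type"],
        "scope": scope,
        "declaration": CLIPPER_VARIABLE.get(scope, "n/a in Ca-Clipper"),
    }


print(decode_variable("someVar_c2"))
print(decode_variable("someVar_c7"))
```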

Day 5

Why not use type/scope notation for functions/messages too? C++ requires an
overriding virtual message to keep the same return type (apart from
covariant return types); this is the only such requirement relevant for a
higher-level language such as [x]Harbour, compared to a lower-level and/or
strongly typed language from the C family, since argument type mangling
isn't applied in the former.

Preserving the return (data) type while overriding virtual messages doesn't
conflict with applying type notation to them, and non-virtual messages are
free from even that consideration; so why not apply type notation to
functions/messages?

part 3: scopes for functions (outside the context of a class)

1 static/private    = init/exit procedure (since kind of private)
2 static/protected  = 'static' function/procedure (module wide)
3 static/public     = function/procedure (normal function/procedure)

7 dynamic/private   = n/a in Ca-Clipper
8 dynamic/protected = n/a in Ca-Clipper
9 dynamic/public    = n/a in Ca-Clipper

.oO could xHarbour (and Harbour?) make use of the latter group for its
DYNAMIC keyword?

part 4: scopes for messages (within the context of a class)

1 static/private    = class message; 'hidden'
2 static/protected  = class message; 'protected'
3 static/public     = class message; 'public'

7 dynamic/private   = instance message; 'hidden'
8 dynamic/protected = instance message; 'protected'
9 dynamic/public    = instance message; 'public'

general type/scope notation layout for functions/messages:

functionOrMessageName#$

       # = scope info
       $ = type mnemonic
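The function/message layout puts the scope digit before the type mnemonic
and drops the underscore. A Python sketch of a decoder, under the same
assumptions as for variables (hypothetical names, one- or two-letter
mnemonics), using the function table outside a class and the message table
inside one:

```python
import re

# name#$ : name part, scope digit, then the type mnemonic of the return value
FUNC_NOTATION = re.compile(
    r"^(?P<name>[A-Za-z_]\w*?)(?P<scope>[1-37-9])(?P<type>[A-Za-z]{1,2})$"
)

# Scope digits for functions outside a class (7..9 are n/a in Ca-Clipper)
FUNCTION_SCOPES = {
    1: "INIT/EXIT PROCEDURE",
    2: "STATIC FUNCTION/PROCEDURE (module wide)",
    3: "FUNCTION/PROCEDURE",
}

# Scope digits for messages within a class
MESSAGE_SCOPES = {
    1: "class message, hidden",
    2: "class message, protected",
    3: "class message, public",
    7: "instance message, hidden",
    8: "instance message, protected",
    9: "instance message, public",
}


def decode_function(identifier: str, in_class: bool = False) -> tuple[str, int, str, str]:
    """Decode 'lTrim9c' into ('lTrim', 9, 'c', <meaning of the scope digit>)."""
    match = FUNC_NOTATION.match(identifier)
    if match is None:
        raise ValueError(f"{identifier!r} does not follow the name#$ layout")
    scope = int(match["scope"])
    table = MESSAGE_SCOPES if in_class else FUNCTION_SCOPES
    return match["name"], scope, match["type"], table.get(scope, "n/a in Ca-Clipper")


print(decode_function("lTrim9c", in_class=True))
print(decode_function("someFunction3n"))
```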

Differentiating the general type/scope notation layout between
variables/properties and functions/messages allows for an advanced notation
layout, although it is not applicable in Ca-Clipper:

functionOrMessagePointerName#$#p

ie. for pointers to functions/messages, where the first digit gives the
scope of the function/message the pointer points to, the type mnemonic gives
the return type of that function/message, and the last digit expresses the
scope of the pointer itself; the trailing p indicates a single-indirection
pointer.
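Decoding the pointer layout follows the same pattern with one more digit.
In this Python sketch the identifier `onChange9c7p` and the helper
`decode_pointer` are hypothetical; the identifier would read as a local (7)
pointer to a public instance message (9) returning a character value.

```python
import re

# name#$#p : target scope, return type of the target, scope of the pointer, then 'p'
POINTER_NOTATION = re.compile(
    r"^(?P<name>[A-Za-z_]\w*?)"
    r"(?P<target_scope>[1-37-9])(?P<returns>[A-Za-z]{1,2})"
    r"(?P<pointer_scope>[1-37-9])p$"
)


def decode_pointer(identifier: str) -> dict:
    """Decode e.g. 'onChange9c7p' into its notation parts."""
    match = POINTER_NOTATION.match(identifier)
    if match is None:
        raise ValueError(f"{identifier!r} does not follow the name#$#p layout")
    return {
        "name": match["name"],
        "target_scope": int(match["target_scope"]),
        "returns": match["returns"],
        "pointer_scope": int(match["pointer_scope"]),
    }


print(decode_pointer("onChange9c7p"))
```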

Like p in the previous example, there are a few other 'aggregation' type
qualifiers; in contrast to type mnemonics, these indicate that the
identifier is in some sense 'compound' or 'indirect'

a -> array of <type>
b -> codeblock returning <type>
h -> handle to <type> // a special kind of integer
p -> pointer to <type>

the qualifier is simply appended to the notation, and may be repeated; in
the case of an array of pointers, it looks like

functionOrMessagePointerName#$#pa

By default the type qualifier is t (for type), but it is mostly omitted.
Single letters cannot be used as type mnemonics for real object/class types,
since these are reserved for native types; real objects/classes can use
2-letter acronyms as mnemonics instead.

When the type qualifier is not merely t, the general layout is:

variableOrPropertyName_$#$
functionOrMessageName#$_$

where the last $ stands for that qualifier (or qualifiers, as above: 'pa').
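Since each appended qualifier wraps what precedes it, a qualifier chain can
be spelled out mechanically. A Python sketch for the variable/property
layout only; the English wording, the subset of type mnemonics (c, n, l, o,
as used in this text) and the helper name `describe` are assumptions.

```python
import re

# Type mnemonics used in this text; other letters are shown as-is.
TYPE_NAMES = {"c": "character", "n": "numeric", "l": "logical", "o": "object"}

# Aggregation qualifiers; 't' (plain type) adds nothing.
QUALIFIER_NAMES = {
    "a": "array of",
    "b": "codeblock returning",
    "h": "handle to",
    "p": "pointer to",
}

# name_$#$ : name, type mnemonic, scope digit, optional qualifier letters
VAR_NOTATION = re.compile(
    r"^(?P<name>\w+?)_(?P<type>[A-Za-z]{1,2})(?P<scope>[1-37-9])(?P<quals>[A-Za-z]*)$"
)


def describe(identifier: str) -> str:
    """Spell out what a variable/property identifier holds, e.g. 'items_c7pa'."""
    match = VAR_NOTATION.match(identifier)
    if match is None:
        raise ValueError(f"{identifier!r} does not follow the name_$#$ layout")
    text = TYPE_NAMES.get(match["type"].lower(), match["type"])
    # Each appended qualifier wraps everything before it: 'pa' = array of pointers.
    for qualifier in match["quals"].lower():
        if qualifier == "t":
            continue
        if qualifier not in QUALIFIER_NAMES:
            raise ValueError(f"unknown qualifier {qualifier!r}")
        text = f"{QUALIFIER_NAMES[qualifier]} {text}"
    return text


print(describe("items_c7pa"))  # array of pointer to character
```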

Virtual messages get a leading underscore, setting them apart from
non-virtual messages. Although in Class(y) all messages are virtual by
nature, most of them, and especially instance variables, are intended to be
addressed non-virtually. Explicitly applying casts (scope resolution
messages) in send operations is perhaps an extreme technique (as Anton Van
Straaten put it), but it is worth considering, since it systematically
eliminates possible conflicts; this is especially true in third-party
libraries designed to be inherited from.

A leading underscore might induce conflicts with ASSIGN properties, but the
general type/scope notation layout for properties differs from that of
messages in such a way that there can be no confusion.

Day 6

In a language such as Ca-Clipper, where only the first 10 characters of an
identifier are significant, this notation could lead to name conflicts when
the actual name part pushes the type/scope part past that limit. Hence
(o)ceans' provides a transfixing tool that converts postfix to prefix
notation before compiling; this is a drawback during debugging, but
[x]Harbour doesn't need transfixing. There, postfix notation has some
further interesting aspects:

In a procedural language built on nested function calls, such as

STR( INT( VAL( LTRIM( someVar_c7 ))))

the reader must read from the inside out, while in an OO approach, the send
chain reads naturally from left to right:

someVar_c7 : lTrim() : val() : int() : str()

Applying the insight from Day 1, this can be rewritten as (and compiled with
the aid of some preprocessing):

someVar_c7 . c:: lTrim() . c:: val() . c:: int() . n:: str()

Applying type/scope notation to functions:

someVar_c7 . c:: lTrim9c() . c:: val9n() . c:: int9n() . n:: str9c()

Note how strikingly the (deliberate) error stands out:

someVar_c7 ....................... c:: val9n() . c:: int9n() ...............

val9n() returns a numeric, not a character, so the next message cannot
belong to the c:: class; it should read:

someVar_c7 ....................... c:: val9n() . n:: int9n() ...............

The postfix on every identifier narrows the chance that a wrong scope
operation is inserted in the next send; also, for a reader of the code, it
makes looking up where a symbol is defined a lot easier. Admittedly, this is
not a compelling example, since it is still very basic.
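The check the postfix makes visible by eye can also be done mechanically:
each send's scope mnemonic must equal the return-type mnemonic of the
previous step. A Python sketch, assuming the chain has already been split
into (scope mnemonic, message) pairs; `check_chain` is a hypothetical
helper, not part of (o)ceans'.

```python
import re

# name#$ : the letters after the scope digit are the return-type mnemonic
FUNC_NOTATION = re.compile(
    r"^(?P<name>[A-Za-z_]\w*?)(?P<scope>[1-37-9])(?P<type>[A-Za-z]{1,2})$"
)


def check_chain(receiver_type: str, sends: list[tuple[str, str]]) -> list[int]:
    """Return the indexes of sends whose scope mnemonic doesn't match the
    type returned by the previous step (the receiver's type for step 0)."""
    mismatches = []
    current = receiver_type.lower()
    for index, (scope_mnemonic, message) in enumerate(sends):
        if scope_mnemonic.lower() != current:
            mismatches.append(index)
        match = FUNC_NOTATION.match(message)
        if match is None:
            raise ValueError(f"{message!r} does not follow the name#$ layout")
        current = match["type"].lower()  # type flowing into the next send
    return mismatches


# The corrected and the deliberately wrong chain on someVar_c7 (receiver type 'c')
chain_ok = [("c", "lTrim9c"), ("c", "val9n"), ("n", "int9n"), ("n", "str9c")]
chain_bad = [("c", "lTrim9c"), ("c", "val9n"), ("c", "int9n"), ("n", "str9c")]

print(check_chain("c", chain_ok))   # []
print(check_chain("c", chain_bad))  # [2]
```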

A case-insensitive language such as Ca-Clipper allows for even further
detail in this style of notation, without penalty when the wrong case is
used.

The type qualifier normally appears in lowercase, but in uppercase it
expresses the idea of 'constant' or read-only. Especially within classes,
properties or messages may gain self-documenting value when case is applied
properly; ie.

CLASS someClass
EXPORT:
        VAR someVar_c9T RO
...
END CLASS

A simple lookup in the sources (with no need for additional documentation
describing the var as read-only), as well as seeing this identifier appear
throughout samples, stamps it in memory as read-only; here the type
qualifier is preferably not omitted, precisely to express 'constant'.

sidenote: the transfixing process in (o)ceans' for Ca-Clipper rewrites
someVar_c9 as t9c_someVar, with a leading t which, by convention, is omitted
in postfix notation; someVar_c9T is transfixed to T9c_someVar, which differs
only in case, so once compiled, occurrences of someVar_c9 and someVar_c9T
resolve to the same symbol. This is not the case when more than ten
significant characters are available, as in [x]Harbour, where the two are
distinct identifiers and transfixing is unnecessary.
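A Python sketch of the transfixing rule as the sidenote describes it, plus a
rough model of Ca-Clipper's symbol comparison (case-insensitive, first ten
characters). Handling at most one qualifier letter is an assumption, and the
helper names are hypothetical; the actual (o)ceans' tool is not shown here.

```python
import re

# name_$#[Q] : at most one qualifier letter is handled in this sketch
VAR_NOTATION = re.compile(
    r"^(?P<name>\w+?)_(?P<type>[A-Za-z]{1,2})(?P<scope>[1-37-9])(?P<qual>[A-Za-z]?)$"
)


def transfix(identifier: str) -> str:
    """Rewrite postfix notation as prefix, e.g. someVar_c9 -> t9c_someVar."""
    match = VAR_NOTATION.match(identifier)
    if match is None:
        raise ValueError(f"{identifier!r} does not follow the name_$# layout")
    qualifier = match["qual"] or "t"  # an omitted qualifier is the default 't'
    return f"{qualifier}{match['scope']}{match['type']}_{match['name']}"


def clipper_symbol(identifier: str) -> str:
    """Approximate Ca-Clipper symbol identity: case-insensitive, 10 significant chars."""
    return identifier.upper()[:10]


for original in ("someVar_c9", "someVar_c9T"):
    prefixed = transfix(original)
    print(original, "->", prefixed, "->", clipper_symbol(prefixed))
```

Both spellings end up as the same Ca-Clipper symbol, while the untransfixed
postfix forms remain distinct strings, as they would in [x]Harbour.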

Another case-related feature of (o)ceans' notation is that the type
mnemonic, although normally written with a leading lowercase letter, appears
in uppercase when passed 'by reference', ie. explicitly with the @ operator.
Formal arguments to functions/messages may put this to use, so that at a
glance the identifier is known to be exposed to modification (unless
intended).

function someFunction ( formalArg_C7 ) // note: formal arguments are like locals regarding the scope digit to be used

This has nothing to do with the inherent 'by reference' nature of arrays and
objects, for which the type mnemonic shows up with a leading lowercase
letter, except, as mentioned, when they are passed by reference (@), which
happens rarely.

The combination of uppercase type mnemonic and uppercase type qualifier is
also possible

function someFunction ( formalArg_C7T )

Here the reason might be that someFunction operates on very large strings,
which are to be passed 'by reference', whereas the variable itself isn't
meant to be modified and is to be kept 'constant'; Ca-Clipper Tools
contains a few examples of this.
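Both case conventions can be read back from an identifier. A minimal Python
sketch for the variable/argument layout; `case_flags` is a hypothetical
helper, and treating the qualifier as read-only when all its letters are
uppercase is an assumption.

```python
import re

# name_$#$ : name, type mnemonic, scope digit, optional qualifier letters
VAR_NOTATION = re.compile(
    r"^(?P<name>\w+?)_(?P<type>[A-Za-z]{1,2})(?P<scope>[1-37-9])(?P<quals>[A-Za-z]*)$"
)


def case_flags(identifier: str) -> dict[str, bool]:
    """Uppercase type mnemonic = passed by reference; uppercase qualifier = read-only."""
    match = VAR_NOTATION.match(identifier)
    if match is None:
        raise ValueError(f"{identifier!r} does not follow the name_$#$ layout")
    return {
        "by_reference": match["type"][0].isupper(),
        "read_only": match["quals"].isupper(),  # '' (omitted qualifier) gives False
    }


for name in ("formalArg_C7", "formalArg_C7T", "someVar_c9T", "someVar_c7"):
    print(name, case_flags(name))
```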

Day 7

Cautious conclusion:

The Wikipedia reference at the top lists objections by well-known people to
(any) sort of type notation at source level. While I am aware of the
complications when a message changes type in implementation from one
version to the next, there are many ways to avoid clutter. One of them:
bearing in mind that native types, once fully supported by OO, are just one
kind of object, a neutral way to express the type notation is to simply
declare it as o, type object. Traditionally Ca-Clipper uses u or x, but
(o)ceans' uses just one letter, o: a function/message can intentionally be
declared as returning 'object' without further specification:

a7 . a:: add9o( someObject_o7 ) // AADD() function

Many other objections are truly valid reasons to avoid Hungarian notation or
its alternatives, but I tend to believe in its appeal for newcomers. And in
present times, with IntelliSense and autocompletion in development
environments all around, the notation doesn't make life harder for the
coder, who would otherwise have to complete identifiers by typing the
type/scope notation manually. Moreover, once the identifier is in place, the
postfix really invites selecting the proper scope operation, part of the
(o)ceans' send syntax. I might advocate other advantages, but in the end,
people either like it or not. Of course, this can't be fully illustrated in
an abstract, so please shout if some example of practical code is needed.

There are more days to the topic of this case study; preferably on private
request and discussion, the topic may be elaborated further, but this review
covers most of the basics.

Thanks for reading, and many thanks to Przemek for inviting this abstract

best regards,

frank van nuffel
[EMAIL PROTECTED]

_______________________________________________
Harbour mailing list
[email protected]
http://lists.harbour-project.org/mailman/listinfo/harbour
