CaseStudy: an alternative to Hungarian notation for a High-Level language such as [x]Harbour
Hi everyone,

I've been invited to this forum to share thoughts about a possible addition to OO support for native types in Harbour. Native types are already available to an OO approach, but here is what a personal project targeting Ca-Clipper, combined with Class(y)'s OO native type support, resulted in. While it may contribute to [x]Harbour, I tend to hold back the initiative for now, until you developers have had a chance to judge whether some of the premises used throughout (o)ceans' thinktank, as it is codenamed, are fit for wider acceptance. There are many aspects to it, so I'll try presenting modular case studies, hoping you enjoy the read.

This abstract is about an alternative (data) type notation methodology, as part of the names of identifiers, along the lines of what Hungarian notation came up with initially, but specifically related to how this was introduced for use with Ca-Clipper. For reference, here's what wiki says about Hungarian notation:

http://en.wikipedia.org/wiki/Hungarian_notation

Hungarian notation or similar has pros and cons, unavoidably leading to lengthy discussions among people. This review is partial, in favour of it, and doesn't want to start that debate. Instead it shows a possible roadmap, in formal technical terms, to help code become even more self-documenting, bearing the reader of the code in mind before the writer. It is about potential, leaving the ultimate choice of application to the individual code designer.

I don't expect projects like Harbour and xHarbour to set the pace for the community in this respect. But an occasion such as designing a complete OO interface for native types, in parallel with the traditional procedural approach (where legacy dictates what names to use for variables and functions), now taking advantage of things such as polymorphism and overloading, raises the question of what convention to use for naming the upcoming properties and messages.

In (o)ceans' this question has been answered in favour of experimentation. The alternative, say (o)ceans' notation, has matured over the years, but hasn't been published or announced until now; so here you are the first to get it :-)

When it comes to a uniform code writing style, the question of whether to systematically adopt or drop some sort of type-expressing identifier notation has a bearing on the possible wide acceptance of a language. Ca-Clipper's legacy proved xBase programmers to be familiar with this. My personal preferences are just one humble scenario, but from an aesthetic, formal point of view, here's what has been concluded in (o)ceans' [for Ca-Clipper], which I've enjoyed blueprinting over the past years.

The Ca-Clipper programming environment, although not systematically, uses some form of type expression in identifier names, which seems to have been widely adopted by third parties and probably by end programmers. Trying to be as backward compatible as possible with Ca-Clipper, especially with the language in its entirety, has been an ongoing exercise in this project of yours, developers of Harbour, and a successful one at that, opening up new perspectives, opportunities and applications.

What follows is an attempt, at a purely formal level and part of a bigger picture, to expand on some aspects of the language. Further implications might become the subject of succeeding case studies, which will be presented on this forum - if and when invited.


Day 1

Many moons ago, the C++ documentation of the Borland product line initiated thoughts about an alternative way to use mnemonics expressing the data type of an identifier, along the lines of existing Hungarian notation. According to it, given a class named someClass, an instance thereof named instanceOfSomeClass and a message someMessage, writing

    instanceOfSomeClass . someClass:: someMessage()

appeared to me to be not only more informative to the human reader, but also to compile into a slightly smaller footprint. As long as someMessage() isn't a virtual method (message), or even when it is but needs to be called in a non-virtual manner, the explicit use of a scope operation - someClass:: - while neutral to the compiler (or almost), does have some self-documenting value to the reader, in that it locates the class where someMessage is defined. Picture code written this way, where a message might be sent to an instance of a subclass in a broader hierarchy of classes, read from within an editor with no (configured) hyperlink-on-the-fly capabilities, or from a printed page.

Since it is cumbersome to have to write the full class name in the scope operation during editing, why not resort to mnemonics, in this case in the scope operation?

note: there's yet one step further to go with this: a type mnemonic inserted in the scope operation can be shown in uppercase instead of lowercase, especially in the context of OO for native types, when a traditional procedural translation is to be obtained. This can be supported by means of the preprocessor, and the resulting writing style is more uniform:

    someVar := someVar . c:: upper()   // executed in OO manner
    someVar := someVar . C:: upper()   // backtranslated procedurally to
                                       // someVar := UPPER( someVar )


Day 2

Ca-Clipper introduced an attractive form of Hungarian notation, simple and straightforward, but imo it left essential paths open to be addressed by future evolution; the o for object, with only four object types at the time, is one such occasion. With good ergonomics in the design of mnemonics, why not use acronyms of, for example, 2 letters referring to the actual class type of the object?

    someVar . tb:: stabilize()   // TBrowse class


Day 3

Most modern IDEs and lots of editors come with some sort of intellisense. For lookup sort order it is more practical to have identifiers start with their actual naming part than with type notation components; hence, instead of a prefix, why not use a postfix?


Day 4

Linkage and (visibility) scope information can be put to use in differentiating between identifiers, pushing polymorphism to a higher degree, simply by inserting that info in the type notation. Static and local variables which share the actual name part can be recognised by means of a digit describing scope. Digits are visually far more striking than an additional letter. Combined scope and type information in identifier notation allows, for instance:

    static someVar_c2        // c = char and 2 = static module-wide
    static someVar_l2        // l = logical
    ...
    function SomeFunction ( ... )
       static someVar_c1     // 1 = static function-wide
       local  someVar_c7     // 7 = dynamic (in contrast to static;
                             //     stack vars are dynamic) == local

Polymorphism!

    1/2/3 are static,  1=private/2=protected/3=public (*)
    4/5/6 are reserved for a special sort of typedef
    7/8/9 are dynamic, 7=private/8=protected/9=public (*)

    (*) private in C++ terminology; ie hidden

    static   not to be confused with the 'static' keyword, but generally in
             the meaning of memory which is reserved at compile time (in the
             data segment, for instance)
    dynamic  in contrast to static, ie. memory allocated at run-time

In the next overview, read 'name' as a Ca-Clipper keyword.

part 1: scopes for variables (outside the context of a class)

    1 static/private    = 'static' variable (function/procedure wide)
    2 static/protected  = 'static' variable (module wide)
    3 static/public     = n/a in Ca-Clipper // 'extern' variable in C(++)
    7 dynamic/private   = 'local' variable
    8 dynamic/protected = 'private' variable in Ca-Clipper ('private' keyword)
    9 dynamic/public    = 'public' variable in Ca-Clipper

part 2: scopes for properties (within the context of a class)

    1 static/private    = class property; 'hidden'
    2 static/protected  = class property; 'protected'
    3 static/public     = class property; 'public'
    7 dynamic/private   = instance property; 'hidden'
    8 dynamic/protected = instance property; 'protected'
    9 dynamic/public    = instance property; 'public'

general type/scope notation layout for variables/properties:

    variableOrPropertyName_$#

    $ = type mnemonic
    # = scope info


Day 5

Why not use type/scope notation for functions/messages also? C++ requires an overridden virtual message not to differ in return type; this is the only requirement relevant for a higher-level language such as [x]Harbour that has to be taken into consideration, compared to a low-level and/or strongly-typed language such as those from the C family, since argument type mangling isn't applied in the former. Preserving the return (data) type while overriding virtual messages doesn't conflict with practising type notation for these; and for non-virtual messages, type notation is even free from such consideration. Hence, why not apply type notation to functions/messages?

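To make the variableOrPropertyName_$# layout and the scope digits of day 4 a bit more tangible, here is a minimal sketch of a decoder for such identifiers. It is written in Python purely as an outside-the-language illustration; it is not part of (o)ceans', and the names decode_identifier, POSTFIX and VISIBILITY are hypothetical. It assumes a type mnemonic of one or two letters followed by exactly one scope digit.

```python
import re

# Assumed layout: name, underscore, 1-2 letter type mnemonic, one scope digit.
POSTFIX = re.compile(
    r"^(?P<name>[A-Za-z]\w*?)_(?P<type>[A-Za-z]{1,2})(?P<scope>[1-9])$"
)

VISIBILITY = ("private", "protected", "public")


def decode_identifier(identifier):
    """Split name_$# into (name, type mnemonic, storage, visibility).

    Returns None when the identifier carries no type/scope postfix.
    """
    match = POSTFIX.match(identifier)
    if match is None:
        return None
    digit = int(match.group("scope"))
    if 4 <= digit <= 6:
        # 4/5/6 are reserved for a special sort of typedef; no visibility.
        return match.group("name"), match.group("type"), "reserved", None
    storage = "static" if digit <= 3 else "dynamic"
    # 1/2/3 and 7/8/9 both cycle through private, protected, public.
    visibility = VISIBILITY[(digit - 1) % 3]
    return match.group("name"), match.group("type"), storage, visibility
```

For example, decode_identifier("someVar_c2") yields ("someVar", "c", "static", "protected"), which per part 1 is a module-wide 'static' character variable, while decode_identifier("someVar_c7") yields ("someVar", "c", "dynamic", "private"), ie. a 'local'.
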
part 3: scopes for functions (outside the context of a class)

    1 static/private    = init/exit procedure (since kind of private)
    2 static/protected  = 'static' function/procedure (module wide)
    3 static/public     = function/procedure (normal function/procedure)
    7 dynamic/private   = n/a in Ca-Clipper
    8 dynamic/protected = n/a in Ca-Clipper
    9 dynamic/public    = n/a in Ca-Clipper

.oO could xHarbour (and Harbour?) make use of the latter group for its DYNAMIC keyword?

part 4: scopes for messages (within the context of a class)

    1 static/private    = class message; 'hidden'
    2 static/protected  = class message; 'protected'
    3 static/public     = class message; 'public'
    7 dynamic/private   = instance message; 'hidden'
    8 dynamic/protected = instance message; 'protected'
    9 dynamic/public    = instance message; 'public'

general type/scope notation layout for functions/messages:

    functionOrMessageName#$

    # = scope info
    $ = type mnemonic

Differentiating the general type/scope notation layout between variables/properties and functions/messages allows for an advanced notation layout, although not applicable in Ca-Clipper:

    functionOrMessagePointerName#$#p

ie, for pointers to functions/messages, where the first digit references the scope of the function/message the pointer points to, the type mnemonic references the return type of that function/message and the last digit expresses the scope of the pointer itself; the trailing p indicates it is a single-indirection pointer.

Like p in the previous example, there are a few other 'aggregation' type qualifiers; in contrast to type mnemonics, these indicate that the identifier is in some sense 'compound' or 'indirect':

    a -> array of <type>
    b -> codeblock returning <type>
    h -> handle to <type> // a special kind of integer
    p -> pointer to <type>

The type qualifier is merely appended to the notation, but may be repeated; in the case of an array of pointers, it looks like

    functionOrMessagePointerName#$#pa

By default the type qualifier is t (for type), but it is mostly omitted. Single letters cannot be used as type mnemonics for real object/class types, since these are reserved for native types, but real objects/classes can use 2 letter acronyms as mnemonic.

When the type qualifier is not merely t, the general layout is:

    variableOrPropertyName_$#$
    functionOrMessageName#$_$

where the last $ is for that qualifier (or qualifiers, as above, 'pa').

Virtual messages are prefixed with a leading underscore, making them contrast with non-virtual messages. Although in Class(y) all messages are virtual by nature, most of the time messages, and especially instance variables, are intended to be addressed non-virtually. Explicitly applying casts (scope resolution messages) in send operations is perhaps an extreme technique (as Anton Van Straaten put it), but it is worth considering, for it systematically eliminates possible conflicts, especially in third-party libraries which are designed to be inherited from. A leading underscore might induce conflicts with ASSIGN properties, but the general type/scope notation layout for properties differs from that of messages in such a way that there can be no confusion.


Day 6

In a language such as Ca-Clipper, where identifiers count only 10 significant characters, this notation could lead to name conflicts when the actual name part pushes the type/scope notation part past that limit. Hence (o)ceans' provides a transfixing tool converting postfix to prefix notation before compiling. It is a drawback during debugging, but [x]Harbour doesn't need transfixing; here, postfix notation has some even more interesting aspects.

In a procedural language which uses nested function call construction, such as

    STR( INT( VAL( LTRIM( someVar_c7 ))))

the reader must read from the inside out, while in an OO approach the send chain reads naturally:

    someVar_c7 : lTrim() : val() : int() : str()

Applying the insight from day 1, this can be rewritten as (and compiled with the aid of some preprocessing):

    someVar_c7 . c:: lTrim() . c:: val() . c:: int() . n:: str()

Applying type/scope notation for functions:

    someVar_c7 . c:: lTrim9c() . c:: val9n() . c:: int9n() . n:: str9c()

Note how visually striking the error (made on purpose) stands out:

    someVar_c7 ....................... c:: val9n() . c:: int9n() ...............

val9n() returns a numeric, not a character, so the next message cannot belong to the c:: class; it should read:

    someVar_c7 ....................... c:: val9n() . n:: int9n() ...............

The postfix in every identifier narrows the chance that a wrong scope operation will be inserted during the succeeding send; also, for a reader of the code, it makes looking up where to find the definition of a symbol a lot easier. Admittedly, this is not a very convincing example, since it is still very basic.

A language that is case insensitive, such as Ca-Clipper, allows for even further detail in this style of notation, without penalty when the wrong case is used. The type qualifier normally appears in lowercase, but if shown in uppercase it expresses the idea of 'constant' or read-only. Especially within the context of classes, properties or messages may gain self-documenting value when case is properly applied; ie

    CLASS someClass
       EXPORT:
          VAR someVar_c9T RO
       ...
    END CLASS

A simple lookup in the sources (no need for additional documentation describing the var as read-only), as well as reading appearances of this identifier throughout samples, stamps the image in memory as being read-only; here the type qualifier, which is normally omitted, is preferably written out, precisely to express 'constant'.

sidenote: the transfixing process in (o)ceans' for Ca-Clipper rewrites someVar_c9 as t9c_someVar, with a leading t which in postfix notation is omitted by convention; someVar_c9T is transfixed to T9c_someVar, just a difference in case, so when compiled, appearances in code of someVar_c9 or someVar_c9T result in the same symbol. This will not be the case when taking advantage of more than ten significant characters as in [x]Harbour, which makes transfixing unnecessary.

Another case-related feature of (o)ceans' notation is that the type mnemonic, although normally written with a leading lowercase letter, appears in uppercase when 'by reference' - ie. when explicitly passed using the @ operator. Formal arguments to functions/messages may put this to use, so that on sight the identifier is known to be potentially at risk of modification (unless modification is intended):

    function someFunction ( formalArg_C7 )
    // note: formal arguments are like locals regarding the scope digit to be used

This has nothing to do with the 'by reference' nature of arrays and objects as such, for which the type mnemonic shows up with a leading lowercase, except, as mentioned, when it is passed by reference (@), but this happens rarely.

The combination of uppercase type mnemonic and uppercase type qualifier is also possible:

    function someFunction ( formalArg_C7T )

Here the reason for doing so might be that someFunction intends to operate on very large strings, which are to be passed 'by reference', whereas the variable itself isn't meant to be modified and is to be kept 'constant'; Ca-Clipper Tools contains a few examples of this.

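The day 6 observation, that the return-type postfix of one message should match the scope operation of the next send, lends itself to a mechanical check. Below is a minimal sketch in Python of such a check; it is only an illustration under stated assumptions, not the (o)ceans' preprocessor. The names find_type_mismatches, RECEIVER and SEND are hypothetical, the chain is assumed to start with a variable in name_$# notation, and case is ignored when comparing mnemonics (so the uppercase 'procedural' and 'by reference' variants compare equal to their lowercase forms).

```python
import re

# Receiver at the start of the chain, eg. someVar_c7 (layout name_$#).
RECEIVER = re.compile(r"^\s*(?P<name>[A-Za-z]\w*?)_(?P<type>[A-Za-z]{1,2})[1-9]")

# One send, eg. "c:: val9n(": scope mnemonic, then message name#$ .
SEND = re.compile(
    r"(?P<scope>[A-Za-z]{1,2})::\s*"
    r"(?P<msg>_?[A-Za-z]\w*?[1-9](?P<ret>[A-Za-z]{1,2}))\("
)


def find_type_mismatches(chain):
    """Return (message, expected type, scope used) for each inconsistent send."""
    receiver = RECEIVER.match(chain)
    if receiver is None:
        raise ValueError("chain must start with a variable in name_$# notation")
    current = receiver.group("type").lower()
    mismatches = []
    for send in SEND.finditer(chain, receiver.end()):
        scope = send.group("scope").lower()
        if scope != current:
            mismatches.append((send.group("msg"), current, scope))
        # The return type of this message is the type the next send acts on.
        current = send.group("ret").lower()
    return mismatches
```

Fed the deliberately wrong chain from day 6, "someVar_c7 . c:: lTrim9c() . c:: val9n() . c:: int9n() . n:: str9c()", the function reports [("int9n", "n", "c")]: int9n() is sent under c:: although the preceding val9n() returned a numeric. The corrected chain returns an empty list.
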
Day 7

Cautious conclusion:

The wiki reference at the top lists objections made by notable people to (any) sort of type notation at source level. While I am aware of the complications when a message changes type in implementation from one version to the next, there are many ways to avoid clutter. One of them: bearing in mind that native types, when fully OO supported, are also just one kind of object, a neutral way to address the type notation is to simply declare it as o - type object. Traditionally Ca-Clipper uses u or x, but (o)ceans' uses just one: simply o. A function/message can intentionally be declared as returning 'object' without further specification:

    a7 . a:: add9o( someObject_o7 )   // AADD() function

Many other objections are truly valid considerations for suppressing Hungarian notation or alternatives, but I tend to believe in its attractive power for newcomers; and these days, with intellisense and autocompletion in development environments all around, it doesn't make life harder for the coder, who would otherwise have to complete identifiers by entering type/scope notation manually. Moreover, once the identifier sits, the postfix really invites selecting the proper scope operation, part of the (o)ceans' send syntax.

I might advocate other advantages, but in the end, people just like it or not. Only, that can't be expected to become clear from this abstract, so please shout if some example of practical code needs to be delivered.

There are more days to the topic of this case study; preferably on the basis of private request and discussion, the topic may be elaborated, but this review covers most of the basics.

Thanks for reading, and many thanks to Przemek for inviting this abstract

best regards,

frank van nuffel
