Re: [Ghdl-discuss] bad character in identifier - switch the hyphen to an underline

David G. Koontz Thu, 22 Apr 2010 14:27:57 -0700

On 15 Apr 10 9:53 AM, Ian Chapman wrote:
> Hi,
>       I have this error in elaboration and I guess that it is in my
> uart_tx.vhd file.  I can generate a jed file using Lattice SW so I hope
> someone can tell me what I should look for.  Regards Ian.
> 
> 
> ghdl -e UART_TX.o
> /usr/lib/ghdl/bin/ghdl:*command-line*: bad character in identifier
> make: *** [UART_TX] Error 1
>


VHDL does not allow a hyphen character in a basic identifier.  If the
Lattice software accepts one, it is not strictly compliant.  You can
substitute an underline character.

With specific exceptions relating to the apostrophe character, VHDL can
separate lexical elements without regard to previous or following lexical
elements.  The apostrophe is used as part of a character literal and also
as a delimiter in the syntax.

(
The problem case is a qualified expression applied to an aggregate, where an
apostrophe is followed by a left parenthesis and then a character literal,
as in string'('a', 'b'), where a lexer with no context would read '(' as a
character literal.  This could be disambiguated by a white space separator,
but the consensus of early tool implementers was that the case should be
handled by noting impossible pairings of lexical elements and keeping a
history of the last lexical element.  If the last lexical element was a
delimiter right parenthesis, a delimiter right bracket, the keyword ALL, an
identifier, a string literal, or a character literal, then an apostrophe is
a delimiter apostrophe; otherwise it is part of a character literal and
would be followed by a graphic character and a trailing apostrophe.
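As a rough illustration of the last-lexical-element rule described above, here is a minimal Python sketch of a toy tokenizer. It is not any real analyzer's scanner (GHDL's included). The token kind names, the `tokenize` function and its `use_history` switch are hypothetical. It only knows ASCII identifiers, ALL, string literals, character literals, apostrophes, parentheses, brackets, commas and semicolons, and it treats an apostrophe as a delimiter when the previous token is one of the kinds listed above.

```python
import re

# Token kinds after which an apostrophe is read as a delimiter (attribute or
# qualified-expression tick), following the list given in the message.
TICK_PREDECESSORS = {"rparen", "rbracket", "all", "identifier", "string", "char"}
SINGLE = {"(": "lparen", ")": "rparen", "[": "lbracket", "]": "rbracket",
          ",": "comma", ";": "semicolon"}
IDENT = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")  # ASCII-only simplification
STRING = re.compile(r'"(?:[^"]|"")*"')


def tokenize(src: str, use_history: bool = True) -> list[tuple[str, str]]:
    """Split a toy VHDL fragment into (kind, text) tokens."""
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
            continue
        last = tokens[-1][0] if tokens else None
        if c == "'":
            # With history, the previous token decides; without it, anything
            # shaped like 'x' is greedily taken as a character literal.
            tick_allowed = use_history and last in TICK_PREDECESSORS
            looks_like_char = i + 2 < len(src) and src[i + 2] == "'"
            if looks_like_char and not tick_allowed:
                tokens.append(("char", src[i:i + 3]))  # e.g. 'a' or '('
                i += 3
            else:
                tokens.append(("tick", "'"))  # delimiter apostrophe
                i += 1
            continue
        m = IDENT.match(src, i) or STRING.match(src, i)
        if m:
            text = m.group()
            if text.startswith('"'):
                kind = "string"
            elif text.lower() == "all":
                kind = "all"
            else:
                kind = "identifier"
            tokens.append((kind, text))
            i = m.end()
            continue
        if c in SINGLE:
            tokens.append((SINGLE[c], c))
            i += 1
            continue
        raise ValueError(f"unexpected character {c!r} at offset {i}")
    return tokens


src = "string'('a', 'b')"
print(tokenize(src))                     # tick after the identifier, then 'a'
print(tokenize(src, use_history=False))  # misreads '(' as a character literal
```

With `use_history=False` the same input lexes '(' as a character literal, which is exactly the misreading the last-token rule exists to prevent.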

A working theory might be that VHDL tool implementers took a zero-sum view
of new competition: with most lexical analyzer generators available at the
time, the adopted rule would require custom coding.  There is precedent for
fixing the problem by requiring a space, as is already needed between
adjacent identifiers, or between an identifier and an abstract literal
(number).  From the LRM, 13.2 Lexical elements, separators, and delimiters:

In some cases an explicit separator is required to separate adjacent lexical
elements (namely when, without separation, interpretation as a single
lexical element is possible). A separator is either a space character (SPACE
or NBSP), a format effector, or the end of a line. A space character (SPACE
or NBSP) is a separator except within a comment, a string literal, or a
space character literal.

 --
So we see there was a rule that could have covered apostrophes already. The
problem arose from changes to the set of lexical elements used by Ada, from
which VHDL was derived.  It simply took a few years for someone to notice,
and invoking the separator rule would have been aesthetically unpleasing.
That is undoubtedly the real reason the tokenizer got a last-token history
(that, and the fact that testing for spaces usually requires custom code,
while testing the last lexical element was easier to implement in most VHDL
analyzers then extant).  Considerable analysis went into the solution: the
problem report and its proposed solution go on for pages, with a certain
onomatopoetic rhythm, enumerating possible successive token cases to arrive
at the adopted rule (Issue Report 1045, against the IEEE 1076-1987 standard).
)

The rules for an identifier keep faith with the original intent derived from
Ada: lexical separation could be a purely stream-driven activity, involving
neither history nor lookahead.

(And aren't you glad you asked?)  I give this explanation because a sizable
portion of those apprised of the limitations on identifiers complain, and
there is actually a reasoned basis for them.  You could note that the
consensus over IR1045 isn't quite reflected in the LRM, which language
implementers use to derive new tools.  There also isn't a test case,
although one could easily be written.  The omission still discriminates
against new tool implementers.

This may be as close to a colorful anecdote as we have in VHDL's history,
an otherwise drier tale.


From IEEE 1076-1993 Language Reference Manual (LRM):

Section 13 Lexical elements

13.3  Identifiers

Identifiers are used as names and also as reserved words.

     identifier ::=  basic_identifier | extended_identifier

13.3.1 Basic identifiers

A basic identifier consists only of letters, digits, and underlines.

     basic_identifier ::=
         letter  { [ underline ] letter_or_digit }

     letter_or_digit ::=  letter | digit

     letter ::=  upper_case_letter | lower_case_letter

All characters of a basic identifier are significant, including any
underline character inserted between a letter or digit and an adjacent
letter or digit. Basic identifiers differing only in the use of
corresponding upper and lowercase letters are considered the same.

 --


The first character of a basic identifier must be a letter.  Underlines
cannot be adjacent, and every underline must be followed by a letter or
digit, so a basic identifier can neither start nor end with one.
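These rules can be checked mechanically. A minimal Python sketch, assuming the letter sets are exactly those in the ISO 8859-1 listing quoted in this message. `is_basic_identifier` and `same_identifier` are hypothetical helper names, and extended identifiers are not handled.

```python
# Letter sets copied from the ISO 8859-1 listing in the quoted LRM excerpt.
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÞ"
LOWER = "abcdefghijklmnopqrstuvwxyzßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
LETTERS = frozenset(UPPER + LOWER)
LETTERS_OR_DIGITS = LETTERS | frozenset("0123456789")


def is_basic_identifier(s: str) -> bool:
    """basic_identifier ::= letter { [ underline ] letter_or_digit }"""
    if not s or s[0] not in LETTERS:
        return False  # must start with a letter
    i = 1
    while i < len(s):
        if s[i] == "_":
            i += 1  # the optional underline ...
        if i >= len(s) or s[i] not in LETTERS_OR_DIGITS:
            return False  # ... must be followed by a letter or digit
        i += 1
    return True


def same_identifier(a: str, b: str) -> bool:
    """Basic identifiers differing only in letter case are the same."""
    return a.lower() == b.lower()


for name in ["uart_tx", "UART_TX", "uart-tx", "uart__tx", "tx_", "_tx", "2tx"]:
    print(f"{name!r}: {is_basic_identifier(name)}")
```

The loop mirrors the production directly: each pass consumes one `[ underline ] letter_or_digit` repetition, which is why a hyphen, a doubled underline or a trailing underline all fail.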

To be able to read the syntax in 13.3.1, from the LRM:

0.2.1  Syntactic description

c. A production consists of a left-hand side, the symbol "::=" (which is read
as "can be replaced by"), and a right-hand side. The left-hand side of a
production is always a syntactic category; the right-hand side is a
replacement rule.

The meaning of a production is a textual-replacement rule: any occurrence of
the left-hand side may be replaced by an instance of the right-hand side.

d. A vertical bar separates alternative items on the right-hand side of a
production unless it occurs immediately after an opening brace, in which
case it stands for itself:

          letter_or_digit ::= letter | digit

          choices  ::=  choice { | choice }

In the first instance, an occurrence of "letter_or_digit" can be replaced by
either "letter" or "digit." In the second case, "choices" can be replaced by
a list of "choice," separated by vertical bars (see item f for the meaning
of braces).

e. Square brackets enclose optional items on the right-hand side of a
production; thus the two following productions are equivalent:

          return_statement ::= return [ expression ] ;

          return_statement ::= return ; | return expression ;

f. Braces enclose a repeated item or items on the right-hand side of a
production. The items may appear zero or more times; the repetitions occur
from left to right as with an equivalent left-recursive rule. Thus, the
following two productions are equivalent:

          term ::= factor { multiplying_operator factor }

          term ::= factor | term multiplying_operator factor
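To make item f concrete, here is a small Python sketch comparing the brace form of `term` with its left-recursive equivalent. The flat token-list representation and the restriction of multiplying_operator to * and / (VHDL also has mod and rem) are simplifications for illustration; the function names are hypothetical.

```python
from operator import mul, truediv

# Simplified multiplying operators (VHDL's mod and rem are omitted).
OPS = {"*": mul, "/": truediv}


def term_braces(tokens):
    """term ::= factor { multiplying_operator factor }
    The repetition is consumed left to right by a loop."""
    value = tokens[0]
    for i in range(1, len(tokens), 2):
        value = OPS[tokens[i]](value, tokens[i + 1])
    return value


def term_left_recursive(tokens):
    """term ::= factor | term multiplying_operator factor
    Recurse on everything left of the last operator."""
    if len(tokens) == 1:
        return tokens[0]
    left = term_left_recursive(tokens[:-2])
    return OPS[tokens[-2]](left, tokens[-1])


expr = [8, "/", 4, "/", 2]
print(term_braces(expr), term_left_recursive(expr))  # both group as (8 / 4) / 2
```

Both functions group the expression as (8 / 4) / 2 rather than 8 / (4 / 2), which is what "the repetitions occur from left to right" means in practice.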

uppercase letters
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z À Á Â Ã Ä Å Æ Ç È É Ê Ë
Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ

digits
0 1 2 3 4 5 6 7 8 9

lowercase letters
a b c d e f g h i j k l m n o p q r s t u v w x y z ß à á â ã ä å æ ç è é ê
ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ


-       hyphen, minus sign
_       underline, low line

 --

The character set is that of ISO 8859-1:1987.


_______________________________________________
Ghdl-discuss mailing list
[email protected]
https://mail.gna.org/listinfo/ghdl-discuss
