Re: LCLint 3.0.0.17 parse problem

2001-10-03 Thread Derek M Jones

Richard,

>I can find nothing in either C standard that requires a C system to
>support blanks at the ends of lines.  I therefore deny that the translation
>unit above *is* strictly conforming.  (Nothing in the C standard requires
>a C system to support curly braces, either.  Hence trigraphs and digraphs.)

I think these somewhat unusual views are best argued out on comp.std.c.


derek

--
Derek M Jones   tel: +44 (0) 1252 520 667
Knowledge Software Ltd   mailto:[EMAIL PROTECTED]
Applications Standards Conformance Testing   http://www.knosof.co.uk





Re: LCLint 3.0.0.17 parse problem

2001-10-02 Thread Richard A. O'Keefe

I observed that
>As has often been pointed out in comp.std.c, there is NOTHING in any C or
>C++ standard to forbid a compiler writer defining the end of line indicator
>to be "end of record, preceded by any number of blanks".  There is NO 
>requirement whatsoever that the "end of line indicator" be just the physical
>end of record.

Derek M Jones <[EMAIL PROTECTED]> wrote in reply:
> True.  But the compiler has to handle any strictly conforming program.

Agreed.

> I can write a strictly conforming program (using macros and stringizing)
> where a backslash followed by blanks, followed by end-of-record occurs.

Let's see it!  I've tried to think how it might be done.  AH!
    preprocessing-token:
        header-name
        identifier
        pp-number
        character-constant
        string-literal
        punctuator
        each non-white-space character that cannot be one of the above

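So
#define bar(x) #x
#define foo(x) bar(x)
#define fred~\<blank>

char *x = foo(fred);
=>
char *x = "~\";

[Here <blank> stands for a space between the backslash and the end of
the line.  lcc and SPARCompiler cc like this, gcc doesn't.]
But if we change fred to
#define fred~\

with nothing after the backslash, then we get
char *x = "~";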


> A compiler that unconditionally turned this into a line splice
> would be faulty.

Well, no.  As I've said, a compiler is at liberty to define an end of
line indicator however it wants to.  Such a compiler would not "turn
this into a line splice", because it would be a compiler in which blanks
at the end of a line (logically) DID NOT EXIST.

I can find nothing in either C standard that requires a C system to
support blanks at the ends of lines.  I therefore deny that the translation
unit above *is* strictly conforming.  (Nothing in the C standard requires
a C system to support curly braces, either.  Hence trigraphs and digraphs.)

The output of the cc and lcc compilers is arguably incorrect; a \ is
supposed to be inserted before each " or \ in the replacement text of the
parameter.  I think it should have been "~\\", not "~\".
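
For reference, here is a small sketch of how # treats ordinary arguments.
It reuses bar and foo from the example above; the plain definition of fred
(no trailing backslash) and the three function names are assumptions made
for illustration, and it deliberately stays away from the contested
backslash-blank case.  As far as C99 6.10.3.2 goes, the inserted
backslashes apply to " and \ characters inside character constants and
string literals in the argument.

```c
#define bar(x) #x      /* stringize the argument exactly as spelled */
#define foo(x) bar(x)  /* macro-expand the argument first, then stringize */
#define fred ~         /* assumption: plain ~ with no trailing backslash */

/* bar(fred) stringizes the unexpanded macro name. */
const char *stringized_name(void) { return bar(fred); }

/* foo(fred) expands fred to ~ before bar stringizes it. */
const char *stringized_expansion(void) { return foo(fred); }

/* A string-literal argument: # inserts a backslash before each quote and
   backslash inside it, so the result spells the literal as written. */
const char *stringized_literal(void) { return bar("a\\b"); }
```

stringized_name() yields the text fred, stringized_expansion() yields ~,
and stringized_literal() yields the six characters "a\\b", quotes included.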

> The purpose of my reference to the #if macro DR was to point out
> that such a macro could not be used by a strictly conforming
> program within a #if directive.

The reason why foo(fred) can't be used within an #if is that it yields
a string, and strings can't appear in #if.  None of the macro _calls_
involves an embedded newline.

So far, the only way I know of that this can show up is a combination of
- stringizing, and
- the fact that ANY character not otherwise forming part of a token is
  allowed as a pp-token.
I think the last rule there is a bad one.

gcc -E produces workable output for this example, but gcc -c refuses
to compile it.  I don't think any realistic working code is likely to be
broken by a compiler that handles fixed records this way.




Re: LCLint 3.0.0.17 parse problem

2001-10-02 Thread Derek M Jones

Richard,

>So what is an end-of-line indicator?  The standard just says
>"In source files, there shall be SOME way of indicating the end of
>each line of text; this International Standard treats such an
>end-of-line indicator as if it were a single new-line character."
>
>As has often been pointed out in comp.std.c, there is NOTHING in any C or
>C++ standard to forbid a compiler writer defining the end of line indicator
>to be "end of record, preceded by any number of blanks".  There is NO 
>requirement whatsoever that the "end of line indicator" be just the physical
>end of record.

True.  But the compiler has to handle any strictly conforming program.

I can write a strictly conforming program (using macros and stringizing)
where a backslash followed by blanks, followed by end-of-record occurs.
A compiler that unconditionally turned this into a line splice would be
faulty.

The purpose of my reference to the #if macro DR was to point out that
such a macro could not be used by a strictly conforming program within
a #if directive.  So in this special case a #if followed by backslash,
followed by blanks could only be treated as a syntax error, or a line splice.


derek

--
Derek M Jones   tel: +44 (0) 1252 520 667
Knowledge Software Ltd   mailto:[EMAIL PROTECTED]
Applications Standards Conformance Testing   http://www.knosof.co.uk





Re: LCLint 3.0.0.17 parse problem

2001-10-01 Thread Richard A. O'Keefe

Derek M Jones <[EMAIL PROTECTED]> wrote:
> #if is a special case in that it is not possible to split macro invocations
> across it, as answered by the following C90 Defect Report:
>
> http://anubis.dkuug.dk/JTC1/SC22/WG14/www/docs/dr_017.html

Thank you for this reference.  But what it says is that
#if f(1,
2)
is not defined.  It doesn't say anything about backslash.
The correction says it is not allowed:  preprocessing directives are
_first_ terminated by newline and _then_ macro-expanded if appropriate.

> It looks like you have found a compiler that, strictly speaking, is
> making use of an extension.  Or IBM could wiggle a bit and point out
> that their compiler's undefined behaviour on encountering this kind
> of syntax error (which is what it is) is to treat it as a line splice.

The C99 standard says
[translation phase 1]
"Physical source file multibyte characters are mapped to the source
character set (introducing new-line characters for end-of-line
indicators) if necessary.  Trigraph sequences are replaced by
corresponding single-character internal representations."

Backslash/newline splicing takes place in phase 2, AFTER end-of-line
indicators have been replaced.

So what is an end-of-line indicator?  The standard just says
"In source files, there shall be SOME way of indicating the end of
each line of text; this International Standard treats such an
end-of-line indicator as if it were a single new-line character."

As has often been pointed out in comp.std.c, there is NOTHING in any C or
C++ standard to forbid a compiler writer defining the end of line indicator
to be "end of record, preceded by any number of blanks".  There is NO 
requirement whatsoever that the "end of line indicator" be just the physical
end of record.

In the spirit of "be strict about what you generate, forgiving about what
you accept", the best thing for whoever wrote the C compiler in question to
do would be to be very careful to put backslashes in the rightmost column
of their header files, but accept any number of spaces between a backslash
and end of record.
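
A minimal sketch of the "forgiving about what you accept" half, assuming
the source has already been mapped to newline-terminated lines.  It does
phase-2 splicing but also accepts blanks or tabs between the backslash and
the newline.  The name splice_lenient and the in-place buffer interface are
hypothetical; this does not describe LCLint's preprocessor or any
particular compiler.

```c
#include <stddef.h>

/* Splice lines in place: delete each backslash that is followed only by
   blanks or tabs and then a newline, together with those blanks and the
   newline.  Returns the new length of the NUL-terminated text. */
size_t splice_lenient(char *text)
{
    size_t in = 0, out = 0;
    while (text[in] != '\0') {
        if (text[in] == '\\') {
            size_t k = in + 1;
            while (text[k] == ' ' || text[k] == '\t')
                k++;
            if (text[k] == '\n') {   /* backslash, optional blanks, newline */
                in = k + 1;          /* drop the whole sequence */
                continue;
            }
        }
        text[out++] = text[in++];    /* everything else is copied through */
    }
    text[out] = '\0';
    return out;
}
```

Note that this is exactly the behaviour under dispute in this thread: a
line that is meant to end in a backslash followed by blanks would be
spliced as well.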

Note that there is nothing to stop a UNIX C compiler defining
 ::= ( | )* ? 
(This could be quite useful if one were trying to compile from a source file
on a floppy disc written on a Win/DOS system.)
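
The grammar line above has lost its nonterminal names in the archive, so
the following is only one plausible reading, stated as an assumption: an
end-of-line indicator made of any run of spaces or tabs, an optional
carriage return, and a line feed.  The function name eol_length is
hypothetical.

```c
#include <stddef.h>

/* If an end-of-line indicator starts at s, return its length in bytes;
   otherwise return 0.  Assumed form: (space | tab)* CR? LF. */
size_t eol_length(const char *s)
{
    size_t n = 0;
    while (s[n] == ' ' || s[n] == '\t')
        n++;                      /* trailing blanks belong to the indicator */
    if (s[n] == '\r')
        n++;                      /* optional carriage return (DOS files) */
    return s[n] == '\n' ? n + 1 : 0;
}
```

A phase-1 reader built on this would replace every such run with a single
new-line character before any splicing takes place.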

> You will either have to modify the preprocessor in LCLint,

LCLint really ought to handle this.  Spaces after a backslash should be
recognised as something that the standard DOES allow (if a compiler writer
cares to define "end-of-line indicator" appropriately) but does not require,
so is a definite porting problem.

> or make a
> local copy of the offending header and edit around the problem.

The quickest workaround.



Re: LCLint 3.0.0.17 parse problem

2001-10-01 Thread Anthony Giorgio

I did some poking around with a hex editor in the offending header file, 
and this is what I found:

File: features.h   EBCDIC   Offset: 0xB6E0 / 0x0001B4A3 (%42)

B690  15 40 40 40  40 40 40 7B   89 86 40 4D  5A 84 85 86  |.      #if (!def|
B6A0  89 95 85 84  4D 6D C1 D3   D3 6D E2 D6  E4 D9 C3 C5  |ined(_ALL_SOURCE|
B6B0  5D 40 40 40  40 40 40 40   40 40 40 40  40 40 40 40  |)               |
B6C0  40 40 E0 40  40 40 40 40   40 40 40 40  40 40 40 40  |  \             |
B6D0  40 40 40 40  40 40 40 40   40 40 40 40  40 40 40 40  |                |
B6E0  40 15 40 40  40 40 40 40   40 50 50 40  40 5A 84 85  | .       &&  !de|
B6F0  86 89 95 85  84 4D 6D D6   D7 C5 D5 6D  E2 D6 E4 D9  |fined(_OPEN_SOUR|
B700  C3 C5 5D 40  40 40 40 40   40 40 40 40  40 40 40 40  |CE)             |
B710  40 40 40 E0  40 40 40 40   40 40 40 40  40 40 40 40  |   \            |
B720  40 40 40 40  40 40 40 40   40 40 40 40  40 40 40 40  |                |
B730  40 40 15 40  40 40 40 40   40 40 50 50  40 40 5A 84  |  .       &&  !d|
B740  85 86 89 95  85 84 4D 6D   D6 D7 C5 D5  6D E2 E8 E2  |efined(_OPEN_SYS|
B750  5D 5D 40 40  40 40 40 40   40 40 40 40  40 40 40 40  |))              |
B760  40 40 40 40  40 40 40 40   40 40 40 40  40 40 40 40  |                |
B770  40 40 40 40  40 40 40 40   40 40 40 40  40 40 40 40  |                |
B780  40 40 40 15  40 40 40 40   40 40 40 40  40 40 40 40  |   .            |
B790  40 40 40 40  40 40 40 40   40 40 40 40  40 40 40 40  |                |


0x15 is the EBCDIC newline character, 0x40 is the space character. 

It seems that there are an awful lot of spaces between the backslash char 
(0xE0) and the newline.
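
The same check can be done without reading the dump by eye.  This is a
minimal sketch, assuming the code points reported above (0xE0 backslash,
0x40 space, 0x15 newline); the function name and the convention of
returning -1 on failure are hypothetical.

```c
#include <stddef.h>

enum { EBCDIC_NL = 0x15, EBCDIC_SPACE = 0x40, EBCDIC_BACKSLASH = 0xE0 };

/* Starting at buf[start], which must be a backslash, count the spaces
   before the next newline.  Returns -1 if start is not a backslash, if
   something other than a space intervenes, or if no newline follows. */
long padding_after_backslash(const unsigned char *buf, size_t len, size_t start)
{
    size_t i;
    if (start >= len || buf[start] != EBCDIC_BACKSLASH)
        return -1;
    for (i = start + 1; i < len; i++) {
        if (buf[i] == EBCDIC_NL)
            return (long)(i - start - 1);
        if (buf[i] != EBCDIC_SPACE)
            return -1;
    }
    return -1;
}
```

Applied to the dump above, the backslash at offset 0xB6C2 and the newline
at 0xB6E1 give 30 spaces of padding.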


On another note, please excuse my ignorance of optimizing compilers and 
the like.  I'm just starting out, and learning just how much I need to 
learn.
  


Anthony Giorgio
DBX Developer
Phone: (845) 435-9115
Tie Line: 295-9115
Email: agiorgio AT us.ibm.com



>> Does the file system pad lines with spaces?
>
>Yes it does!  It seems that the file is very similar to the old-style IBM 
>punch cards, where everything had 80 columns, and anything that wasn't 
>filled in was a space.  The file is filled out to column 81 with spaces, 
>and the \ is in column 66.  Shouldn't lclint just ignore the whitespace 
>following the trailing \?


You might like to poke around using a binary editor to see
what representation is used by IBM.



Re: LCLint 3.0.0.17 parse problem

2001-09-30 Thread Richard A. O'Keefe

The thing about backslash-newline is that each C implementation gets to
define its own line termination sequence.  I may be misreading the
standard (actually a draft; I have bought three paper copies of the
C89 standard and they have _all_ walked out of my office over the years)
but as far as I can see there is no reason why an IBM mainframe C compiler
couldn't just say that the character sequence constrained by the C standard
is
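data Card = Card [Char] -- a record with possible space padding

-- Map the raw records to the character sequence the standard constrains.
to_standard :: [Card] -> [Char]
to_standard [] = []
to_standard (Card card : cards) =
    trim_right card ++ "\n" ++ to_standard cards

-- Drop trailing blanks (anything at or below ' ') from one record.
trim_right :: [Char] -> [Char]
trim_right line = reverse (dropWhile (<= ' ') (reverse line))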

That is, for the purpose of backslash-newline splicing, there is no reason
why the padding spaces needed for fixed length records have to count as
existing.
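
For readers who would rather see this in C, a rough equivalent follows.
It assumes fixed-length records held contiguously in memory and a
caller-supplied output buffer; the names records_to_text and rec_len are
hypothetical, and like the Haskell it treats every character at or below
' ' as padding.

```c
#include <stddef.h>

/* Convert count fixed-length records of rec_len bytes each into text:
   trailing characters at or below ' ' are dropped from each record and a
   single new-line is appended.  out must have room for
   count * (rec_len + 1) + 1 bytes.  Returns the length of the result. */
size_t records_to_text(const char *recs, size_t count, size_t rec_len, char *out)
{
    size_t r, i, n = 0;
    for (r = 0; r < count; r++) {
        const char *rec = recs + r * rec_len;
        size_t end = rec_len;
        while (end > 0 && (unsigned char)rec[end - 1] <= ' ')
            end--;                   /* same job as trim_right */
        for (i = 0; i < end; i++)
            out[n++] = rec[i];
        out[n++] = '\n';             /* the end-of-line indicator */
    }
    out[n] = '\0';
    return n;
}
```

With records produced this way, a backslash in the last non-blank column
is immediately followed by the new-line character, so ordinary phase-2
splicing applies.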

> In most cases it could ignore whitespace after a \, provided there
> was only trailing whitespace.  It is possible to come up with
> programs that rely on the line being a \ followed by whitespace,
> for instance, and not being a line splice.  Such cases are rare,
> but that does not mean an implementation is free to change the
> requirements specified in the standard.

True, but an implementation IS free to define how the character sequence
constrained by the standard is computed from raw bytes.

Since line splicing is in one of the earliest translation phases, it would
be interesting to see a legal example where a backslash followed by blanks
at the end of a line was allowed by the standard.




Re: LCLint 3.0.0.17 parse problem

2001-09-28 Thread Alexander Mai

On Fri, Sep 28, 2001 at 04:01:40PM +, Derek M Jones wrote:
> Anthony,
> 
> >> Does the file system pad lines with spaces?
> >
> >Yes it does!  It seems that the file is very similar to the old-style IBM 
> >punch cards, where everything had 80 columns, and anything that wasn't 
> >filled in was a space.  The file is filled out to column 81 with spaces, 
> >and the \ is in column 66.  Shouldn't lclint just ignore the whitespace 
> >following the trailing \?
> 
> In most cases it could ignore whitespace after a \, provided there
> was only trailing whitespace.  It is possible to come up with
> programs that rely on the line being a \ followed by whitespace,
> for instance, and not being a line splice.  Such cases are rare,
> but that does not mean an implementation is free to change the
> requirements specified in the standard.
[...]

Well, in principle it's not OK to ignore them.
The C standard clearly says that "a '\' immediately followed by a new-line
character" does the desired job.
So for those 'strange' systems, what is lclint supposed to do, in
your opinion?  Would the only option be to use the system's preprocessor?

-- 
Alexander Mai
[EMAIL PROTECTED]



Re: LCLint 3.0.0.17 parse problem

2001-09-28 Thread Derek M Jones

Anthony,

>> Does the file system pad lines with spaces?
>
>Yes it does!  It seems that the file is very similar to the old-style IBM 
>punch cards, where everything had 80 columns, and anything that wasn't 
>filled in was a space.  The file is filled out to column 81 with spaces, 
>and the \ is in column 66.  Shouldn't lclint just ignore the whitespace 
>following the trailing \?

In most cases it could ignore whitespace after a \, provided there
was only trailing whitespace.  It is possible to come up with
programs that rely on the line being a \ followed by whitespace,
for instance, and not being a line splice.  Such cases are rare,
but that does not mean an implementation is free to change the
requirements specified in the standard.

Some implementations that have to exist on file systems that
pad lines with spaces use an alternative representation
for \.  For instance, a \ at the end of a line may be represented by
two \\ (but in all other positions is represented by itself).

You might like to poke around using a binary editor to see
what representation is used by IBM.

You could make a copy of the offending header, edit it, and use
the -I option to cause LCLint to pick up that file first.

Are you sure that spaces are the problem?  There must be
other line splices in the headers.  What about my suggestion that
there is a bug in LCLint for this case?


derek

--
Derek M Jones   tel: +44 (0) 1252 520 667
Knowledge Software Ltd   mailto:[EMAIL PROTECTED]
Applications Standards Conformance Testing   http://www.knosof.co.uk





Re: LCLint 3.0.0.17 parse problem

2001-09-28 Thread Anthony Giorgio

> Does the file system pad lines with spaces?

Yes it does!  It seems that the file is very similar to the old-style IBM 
punch cards, where everything had 80 columns, and anything that wasn't 
filled in was a space.  The file is filled out to column 81 with spaces, 
and the \ is in column 66.  Shouldn't lclint just ignore the whitespace 
following the trailing \?


Anthony Giorgio
DBX Developer
Phone: (845) 435-9115
Tie Line: 295-9115
Email: agiorgio AT us.ibm.com




Re: LCLint 3.0.0.17 parse problem

2001-09-28 Thread Derek M Jones

Anthony,

>I'm using LCLint 3.0.0.17 on an IBM zSeries mainframe, and I'm having 

Does the file system pad lines with spaces?

>problems getting it to parse the code for the project I'm on.  Whenever it 
>tries to parse one of the system header files, it gags on the 
>preprocessor step.  Many of the header files have a construct similar to 
>the one below:
>
>  #if (lots_of_stuff) \
> && (other_stuff)
>#define some_flag
>  #endif 
>
>
>Whenever I run lclint on a file that includes a header with the above 
>construct, it dies with the following error:
>
>/usr/include/features.h:203:67: Invalid character in #if: \   
>
>Is it valid to allow #if directives to span lines with the use of '\' , 

This error message is very specific.  Perhaps it is a bug in LCLint.
The code is certainly conforming.

>and if so, how can I convince lclint that it's okay?  If it's not valid, 
>then how can I have lclint ignore the problem?  I can't change the system 
>header files to make them more compliant, even though it might be a good 
>idea :)

They are already compliant.  Nothing to change.
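
A compilable sketch of the construct in question, with the backslash
immediately followed by the end of the line.  The macro names
LOTS_OF_STUFF, OTHER_STUFF and SOME_FLAG and the helper function are
placeholders for the elided conditions in the header, not anything taken
from features.h.

```c
#define LOTS_OF_STUFF 1
#define OTHER_STUFF 1

/* Phase 2 joins the two physical lines before the #if is evaluated. */
#if (LOTS_OF_STUFF) \
 && (OTHER_STUFF)
#define SOME_FLAG 1
#endif

/* Returns 1 when the spliced condition held and SOME_FLAG was defined. */
int some_flag_is_set(void)
{
#ifdef SOME_FLAG
    return 1;
#else
    return 0;
#endif
}
```

If a preprocessor rejects the header but accepts this, the difference
lies in what follows the backslash in the file, not in the directive.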


derek

--
Derek M Jones   tel: +44 (0) 1252 520 667
Knowledge Software Ltd   mailto:[EMAIL PROTECTED]
Applications Standards Conformance Testing   http://www.knosof.co.uk





LCLint 3.0.0.17 parse problem

2001-09-28 Thread Anthony Giorgio

I'm using LCLint 3.0.0.17 on an IBM zSeries mainframe, and I'm having 
problems getting it to parse the code for the project I'm on.  Whenever it 
tries to parse one of the system header files, it gags on the 
preprocessor step.  Many of the header files have a construct similar to 
the one below:

  #if (lots_of_stuff) \
 && (other_stuff)
#define some_flag
  #endif 


Whenever I run lclint on a file that includes a header with the above 
construct, it dies with the following error:

/usr/include/features.h:203:67: Invalid character in #if: \   

Is it valid to allow #if directives to span lines with the use of '\' , 
and if so, how can I convince lclint that it's okay?  If it's not valid, 
then how can I have lclint ignore the problem?  I can't change the system 
header files to make them more compliant, even though it might be a good 
idea :)


Anthony Giorgio
DBX Developer
Phone: (845) 435-9115
Tie Line: 295-9115
Email: agiorgio AT us.ibm.com