Hi Branden,

On 4/30/23 14:34, G. Branden Robinson wrote:
>> Well, formally yes.  And a regex can't find C function definitions in
>> a source tree; at least if you try to fool it by writing the most
>> horrible code in the universe.  But I wrote a relatively small
>> script[1] that finds a lot of C code with pcre2grep(1), and works most
>> of the time.  It has limitations; some of which can be fixed by
>> improving the regexes (read: making them even more unreadable); some
>> others are likely impossible to fix with a regex.  The biggest
>> limitation I think I've met is K&R-style functions: I don't think a
>> regex can cope with them.
> 
> I don't know if you have to cope with "the lexer hack", but you might.
> 
> https://en.wikipedia.org/wiki/Lexer_hack

No, I didn't.  The script is by design very dumb.  It doesn't have a
database or index; it doesn't involve any compiler either.  It is able
to work very fast on any source tree, without having to perform any
operations on it (e.g., I can clone a repository, and immediately after
I can run the program to search for a function).

It's literally just a wrapper around pcre2grep(1), which is just grep(1)
on steroids.  I find it more usable than existing tools like ctags(1).
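
To illustrate the idea (this is not the actual script, only a minimal
sketch), here is the same approach using Python's re module instead of
pcre2grep(1).  The pattern and the names func_def_pattern() and
find_definitions() are made up for this example.  It assumes sane code:
a return type, the function name, a parameter list without nested
parentheses, and then the opening brace.

```python
import re


def func_def_pattern(name):
    """Build a (hypothetical) regex for a definition of the C function `name`."""
    return re.compile(
        r"^[A-Za-z_][\w \t\*]*[ \t\*\n]"   # return type, e.g. "static int *"
        + re.escape(name)                  # the function name itself
        + r"[ \t]*\([^()]*\)\s*\{",        # parameter list, then opening brace
        re.MULTILINE,
    )


def find_definitions(source, name):
    """Return 1-based line numbers where a definition of `name` seems to start."""
    pattern = func_def_pattern(name)
    return [source.count("\n", 0, m.start()) + 1
            for m in pattern.finditer(source)]
```

Calls such as "foo(1, 0);" are not reported, because no opening brace
follows the closing parenthesis.  A K&R-style definition is missed for
the same reason: the parameter declarations sit between the ")" and the
"{", which is the limitation mentioned above.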

You could try it (but C++ will only work as long as it resembles C; and
you need to specify the file suffix).

> 
> How much grief might have been saved if objects in C had been prefixed
> with a sigil like $, or if types had been prefixed with %?

With sane coding styles, my script works well.  Of course, if you
take code from an obfuscated code contest, it will find garbage, but
I'm writing a small tool that is useful for finding things in
real-world code, not a compiler that needs to be able to actually
compile the weirdest stuff that one can think of.

> 
> In my imagination, Thompson vetoed this, but when I consider it more
> seriously, I reckon the truth is more complicated, and arises from C's
> origins in the wholly untyped B language.  The dialect of C we see in
> Version 6 Unix (q.v. the Lions book) is shockingly loosely typed to
> modern eyes.  I once ground the productivity of my workplace to a halt
> for an entire afternoon by presenting my colleagues with the attached
> exhibit of "legal C".  (It remained legal in AT&T USG Unix for many,
> many years.)
> 
>> I believe a regex-based script can be good enough for some purposes,
>> even if it's not perfect.
> 
> All of this is true, and I like programming languages that are dead
> simple to lexically analyze.  (But I spend next to no time working in
> them.)
> 
> I'm strident on this point because I'm opposed to putting a diagnostic
> into the formatter that throws false positives.

Bjarni didn't propose adding such a thing to groff.  Rather, he was
suggesting that I call such a script from my Makefile where I want the
diagnostics.  I think that would be fair (assuming I can get something
readable out of that script), especially since I already have other
scripts for similar purposes (like the one suggested by Ralph, for the
80-column margin, which I find very useful).
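
As an example of the kind of check that can be run from a Makefile,
here is a minimal sketch of an 80-column check.  It is not Ralph's
script; the function name overlong_lines() and the choice of expanding
tabs to 8 columns are assumptions made for this example.

```python
import sys


def overlong_lines(text, limit=80):
    """Return (line_number, width) for each line wider than `limit` columns."""
    result = []
    for n, line in enumerate(text.splitlines(), start=1):
        width = len(line.expandtabs(8))  # assumption: tab stops every 8 columns
        if width > limit:
            result.append((n, width))
    return result


def main(paths):
    """Print one diagnostic per overlong line; return a non-zero status if any."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for n, width in overlong_lines(f.read()):
                print(f"{path}:{n}: line is {width} columns wide")
                status = 1
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Since the exit status is non-zero when a line is too long, a Makefile
target that runs this on the source files fails in that case, and the
diagnostics stay out of the formatter itself.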

Cheers,
Alex

>  That would disserve
> users.
> 
> Regards,
> Branden

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
