Hi Branden,

On 4/30/23 14:34, G. Branden Robinson wrote:
>> Well, formally yes. And a regex can't find C function definitions in
>> a source tree; at least if you try to fool it by writing the most
>> horrible code in the universe. But I wrote a relatively small
>> script[1] that finds a lot of C code with pcre2grep(1), and works most
>> of the time. It has limitations; some of which can be fixed by
>> improving the regexes (read: making them even more unreadable); some
>> others are likely impossible to fix with a regex. The biggest
>> limitation I think I've met is K&R-style functions: I don't think a
>> regex can cope with them.
>
> I don't know if you have to cope with "the lexer hack", but you might.
>
> https://en.wikipedia.org/wiki/Lexer_hack
No, I didn't. The script is very dumb by design. It doesn't have a
database or index, and it doesn't involve any compiler either. It works
very fast on any source tree, without needing any preparatory step
(e.g., I can clone a repository and immediately afterward run the
program to search for a function). It's literally just a wrapper around
pcre2grep(1), which is grep(1) on steroids. I find it more usable than
existing tools like ctags(1). You could try it (but C++ will only work
as long as it resembles C, and you need to specify the file suffix).

>
> How much grief might have been saved if objects in C had been prefixed
> with a sigil like $, or if types had been prefixed with %?

With sane coding styles, my script works well. Of course, if you feed it
code from an obfuscated code contest, it will find garbage. But I'm
writing a small tool for finding code in real-world code, not a compiler
that needs to be able to compile the weirdest stuff one can think of.

>
> In my imagination, Thompson vetoed this, but when I consider it more
> seriously, I reckon the truth is more complicated, and arises from C's
> origins in the wholly untyped B language. The dialect of C we see in
> Version 6 Unix (q.v. the Lions book) is shockingly loosely typed to
> modern eyes. I once ground the productivity of my workplace to a halt
> for an entire afternoon by presenting my colleagues with the attached
> exhibit of "legal C". (It remained legal in AT&T USG Unix for many,
> many years.)
>
>> I believe a regex-based script can be good enough for some purposes,
>> even if it's not perfect.
>
> All of this is true, and I like programming languages that are dead
> simple to lexically analyze. (But I spend next to no time working in
> them.)
>
> I'm strident on this point because I'm opposed to putting a diagnostic
> into the formatter that throws false positives.

Bjarni didn't propose adding such a thing to groff. He was rather
suggesting that I call such a script from my Makefile, where I want the
diagnostics. I think that would be fair (assuming I can get something
readable out of that script), especially since I already have other
scripts for similar purposes (like the one Ralph suggested for the
80-column margin, which I find very useful).

Cheers,
Alex

> That would disserve
> users.
>
> Regards,
> Branden

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5