On 07/03/2014 04:47 PM, Daniel Goldman wrote:
> Hi Eric,
>
> My basic question was "are there any other benchmarks for frozen
> files?". You provided a long answer to my post, but never answered my
> basic question. :( I take it the answer is "no".
Frozen file support was added before I ever started hacking on m4; I'm
not sure what benchmarks were used at the time (other than the autoconf
case, since autoconf relies on them), and I have not personally tried
to benchmark it. That doesn't necessarily mean no benchmarks exist,
just that I haven't found any; conversely, I haven't had any reason to
worry about it.

>> It depends on how complex those macros were.
>
> Admittedly, the macros I write are simple, not complex. That is by
> design. I happen to disagree with the notion: "Once really addicted,
> users pursue writing of sophisticated m4 applications even to solve
> simple problems, devoting more time debugging their m4 scripts than
> doing real work. Beware that m4 may be dangerous for the health of
> compulsive programmers." I use m4 very differently from that scenario,
> which sounds like a programming nightmare. Anyway, it would make sense
> that complex macros MIGHT be sped up more by frozen files. But unless
> you have a benchmark it's more speculation than fact. I have nothing
> for or against frozen files. I just see little evidence of benefit so
> far, but do see a significant maintenance / complexity burden.

While I agree that there is little evidence of benefit, I disagree with
your claim of a significant burden. The code is there, it is covered by
the testsuite, and we haven't had to patch it in several years (thus no
one is reporting bugs against it). I do not consider that a maintenance
burden, but evidence of something that does its job well, even if the
job is not useful to many.
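For readers who have not used the feature: a frozen file is essentially a dump of m4's internal macro table after processing a set of definition files, so later runs can reload that table instead of re-parsing the definitions. Here is a minimal Python analogy of that dump/reload idea; the `parse_definitions` helper and the toy `name = expansion` format are hypothetical stand-ins, not anything m4 actually does:

```python
import os
import pickle
import tempfile

def parse_definitions(text):
    # Toy stand-in for m4 reading a definitions file and building its
    # internal macro table; the real parser is far more involved.
    table = {}
    for line in text.splitlines():
        name, _, expansion = line.partition("=")
        table[name.strip()] = expansion.strip()
    return table

DEFS = "greet = Hello\nfarewell = Goodbye"

# "Freeze": parse once, then dump the finished table at shutdown.
table = parse_definitions(DEFS)
tmp = tempfile.NamedTemporaryFile(suffix=".m4f", delete=False)
with tmp:
    pickle.dump(table, tmp)

# "Reload": later runs load the table directly, skipping the parse.
with open(tmp.name, "rb") as f:
    reloaded = pickle.load(f)
os.unlink(tmp.name)

assert reloaded == table  # round trip preserves the macro table
```

In real m4 the corresponding options are --freeze-state (-F) to dump and --reload-state (-R) to reload; the sketch only illustrates why reloading a saved table can beat re-parsing when the definitions are expensive to process.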
As to the complexity claim, the code is fairly well segregated (see
src/freeze.c); the additional code needed for frozen files does not
intrude into the speed or memory used by the normal code paths that
work without frozen files. It amounts to one if() statement at
shutdown on whether to dump state to a file, which calls into
freeze.c; at load time, freeze.c does its thing, sets up the internal
hash tables of known macros, and then returns control to the normal
input engine. I don't see the code base being slowed down, because we
don't have to maintain any extra state just because something is
frozen.

>> Yay that you were able to measure a difference for your test case.
>
> I don't know what that means. It sounds kind of flippant, like you
> don't take my post very seriously. Of course there was a difference.
> But it was disappointingly small.

It may have been smaller than you liked, but it was definitely
non-zero, and not in the noise. 30% may not sound like much, but it's
better than a LOT of premature optimizations I've seen in my days that
make a difference of no more than 1%. I wasn't trying to be flippant;
I was actually glad that you now have a benchmark for your use case,
which shows an actual gain (proof that the code is not complete dead
weight, even if it didn't do as much as you wanted).

> Maybe the historical reasons were bad reasons. Maybe they don't apply
> today. My guess is there were better ways to deal with those O(n^2)
> (and even O(n^3)) macro definitions. You say they are gone now, so
> apparently someone found a better way.

I was one of the programmers that spent a lot of time on autoconf
trying to eradicate stupid O(n^2) algorithms and replace them with
faster iterations, with some definite success (the time it took to run
autoconf on a complex program such as coreutils was cut in half.
Admittedly, the time to run autoconf on one developer's machine is in
the noise compared to the time spent running configure on all the
users' machines in the collective scheme of things, but faster
developer turnaround can get patches to the users faster, so every
little bit helps). However, while I know autoconf runs faster now than
it did in the 2.59 days, I don't know how much of that speed is due to
improvements in m4 (such as using unlocked io), to frozen file
handling, or to improvements in the management of configure.ac
constructs (the part of the processing done after frozen files are
loaded) - only that I was working on speedups on all three fronts at
the same time several years ago.

> I'm sure someone was trying to do their best way back when. But it's
> possible they messed up, that frozen files were a failed experiment.
> Programmers make bad design decisions all the time, and they can
> persist for many years. It's possible that happened here.

They are not a failed experiment, because autoconf still uses them.
You don't have to use them, but that doesn't mean they failed. And
back-compat demands that we can't rip them out. For that matter, I
worry that ripping them out might have more negative consequences than
positive.

> What are the autoconf "quadratic algorithms" you are referring to?
> Are they still around? If so, maybe there is a better approach. I
> would suggest that if there is a composite macro that is more or less
> general, widely used, and computation intensive, that would be a good
> candidate to consider using a builtin, which could potentially be
> much faster, much better, and much easier to use. But that would
> require an openness to adding builtin macros.

https://www.gnu.org/software/m4/manual/m4.html#Foreach documents a
foreach macro, which (modulo `' vs. [] quoting) was originally lifted
from autoconf 2.59.
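To make the "quadratic algorithms" concrete: the problem with the old recursive foreach style is that each recursion step re-expands the remaining argument list (via shift($@)), so each element costs work proportional to the list length. Here is a hedged Python analogy of that pattern versus a linear rewrite; this is an illustration of the complexity argument, not a translation of the actual m4 macros:

```python
def foreach_quadratic(items, visit):
    # Each recursive call rebuilds the tail of the list (the analogue
    # of re-expanding shift($@)), costing O(n) per element, so the
    # total work is 1 + 2 + ... + n = O(n^2).
    if not items:
        return
    visit(items[0])
    foreach_quadratic(items[1:], visit)

def foreach_linear(items, visit):
    # A single pass with no copying: O(n) total.
    for item in items:
        visit(item)

seen_q, seen_l = [], []
foreach_quadratic(["a", "b", "c"], seen_q.append)
foreach_linear(["a", "b", "c"], seen_l.append)
assert seen_q == seen_l == ["a", "b", "c"]
```

Both loops visit the same elements in the same order; the difference only shows up as the list grows, which matches the observation below that small lists never notice the poor scaling.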
I'm not sure if autoconf actually used foreach when defining other
macros in the files it eventually froze, or if it was more a matter of
using foreach in the definition of macros that then caused quadratic
expansion time while processing the user's configure.ac. And even if
the algorithm was quadratic, if your list is small enough you'll never
notice the poor scaling. Meanwhile,
https://www.gnu.org/software/m4/manual/m4.html#Improved-foreach
documents the improved foreach definition that is no longer quadratic,
in part because of tricks I employed in getting rid of the quadratic
recursion in newer autoconf in the 2.63 days.

Another thing that I know was computation intensive was use of regex;
autoconf 2.59 definitely had some places where it used a regular
expression to define a new macro based on substitution of patterns in
an existing macro, and did so in an inefficient manner. In newer
autoconf, I made it a point to use fewer regex and to defer
expressions until they were actually needed; I also improved m4 to
cache frequently used expressions (as compiling a regex was a
noticeable hotspot in performance testing). This is another case where
frozen files matter (loading a frozen file does not have to compile
the regular expression used to define a macro) but where the gap may
be smaller (the code uses fewer regex to begin with).

> You totally make my point when you say "I'm not sure if you will see
> better or worse numbers from autoconf". If it's not faster, there is
> NO point to use frozen files. Perhaps without intending to, you make
> my point that there is a possibility frozen files are not so hot.

But until someone actually runs a benchmark to prove it one way or the
other, the status quo seems to be just fine.

> BTW, I'm sure you would not see "worse numbers". I am NOT suggesting
> that frozen files slow things down. :)

But they might.
It is a very real possibility that with modern hardware, and with the
improvements made in both m4 and autoconf, autoconf could be changed
to avoid frozen files with no loss, or even a potential gain, in
performance. But until someone posts hard numbers, we can speculate
all day and it won't matter.

>> They're not mandatory to use. But at this point, we can't rip it
>> out of m4 - there are users that depend on it. The code is designed
>> to not penalize people that aren't using it.
>
> I never suggested frozen files were mandatory to use, so I don't get
> your point. I am suggesting they are mandatory to maintain. And my
> guess is they add significant complexity to the software (you would
> be best placed to comment on that). And as m4 development seems more
> or less stuck based on what I read, maybe it might be a good idea to
> strategize before adding some other "feature", and to figure out how
> to get m4 development unstuck. And again, sometimes "less is more".

Okay, then it sounds like we are on the same page about leaving it
alone.

> And I never suggested "rip it out of m4". My exact words were "might
> be deprecated at some point". I'm sure you understand what deprecated
> means. So I totally don't understand why you say "rip it out".
> Software deprecates things when called for. It's how you move
> forward.

While we have prepared the code to deprecate some command line options
that aren't very consistent, we haven't had to deprecate any features.
I don't see that marking frozen files as deprecated would make any
difference.

> As pointed out by another poster, I have tried to be pretty
> diplomatic. It's really great you maintain m4, I appreciate your
> volunteer work. But just my impression, you don't seem very open to
> the possibility of figuring out a better way to do things. There are
> perhaps valid reasons for that, or maybe I got the wrong impression.
> I write software.
> I love it when someone finds a usability problem, especially a bug,
> or suggests a possible way to improve the software. I understand some
> developers don't like that kind of input. In any case, I'm just
> trying to help by pointing out some possible ways to improve m4.
> Maybe it's more or less an echo chamber. :( Who knows? Anyway, as a
> user, I am just noting some ways m4 could probably serve the user
> better.

At this point, m4 is stable enough that patches speak louder than
words. While it can be quite powerful at what it does, there don't
seem to be many people flocking to use it. Whether that is because
people don't know about it, or because m4 only fits a niche market,
it's hard to justify adding features when there is already such a low
volume of contribution and a lack of free time on my part to write new
patches. I'd love to review patches from others - but such patches are
rarely submitted. I _do_ like suggestions for improvement, but with
the limited time I spend on m4, I like it more when those suggestions
are accompanied by an implementation that demonstrates the improvement
rather than just describing it in prose.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
