Di-an's email has prompted me to change the way Bison computes output file names. I'd appreciate any opinions from other developers as well on this issue. I feel we need to settle it before the 2.4 release so we don't find ourselves stuck with more backward compatibility problems.
On Mon, 6 Oct 2008, Di-an JAN wrote:

> If an output file name is specified, tr "cC" "hH" on the extension gives
> the default header file name output. This causes conflict if the output
> file name extension does not contain [cC]. This patch uses the selected
> language's header extension in that case instead.

Other developers have pointed out in the past that Bison's rules for computing output file names are too complex already, and I agree. In my view, we have been maintaining these old rules simply for backward compatibility. With that in mind, I'd rather not add to them yet another special case that needs to be documented, understood by users, and maintained by us.

I prefer the current behavior of warning the user when he has specified an unusual source code file name that leads these old rules to a conflict. The user can then specify the header file name explicitly in order to resolve the conflict.

> If no output file name is specified and the input file name does not end
> in ".y", tr "yY" "cC" and tr "yY" "hH" are used on the input file extension
> to get the output and default header file names. This causes the input file
> to be overwritten if the input file name extension does not contain [yY].

Thanks for reporting that. That seems to be possible only for Java output files because their names don't automatically pick up the .tab.

> Overwriting the input file is not such a good idea. This patch uses the
> selected language's source extension in that case instead.

Instead, I think Bison shouldn't be applying the tr rules to languages other than C and C++ regardless of what letters their extensions contain. These rules were designed for C and C++... where they're not always reasonable either. For example, if a user names the grammar file parser.yacc, the default output files are named parser.tab.cacc and parser.tab.hacc. If he names it parser.bison, the source and header names conflict because the extension contains no [yY] for tr to change.
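To make the tr-based rules discussed above concrete, here is a minimal Python sketch of them. It is only a simplified model built from the description in this thread, not Bison's actual code, and the function names are hypothetical.

```python
import os


def tr(text, src, dst):
    """Mimic `tr src dst`: map each character of src to the one at the same index in dst."""
    return text.translate(str.maketrans(src, dst))


def legacy_default_names(grammar):
    """Simplified model of the legacy C/C++ defaults when no output name is given.

    Hypothetical helper: it only covers the cases described in this thread.
    """
    base, ext = os.path.splitext(grammar)
    # The extension (including the dot) is passed through tr; ".y" becomes
    # ".c" and ".h", and longer extensions are translated letter by letter.
    source = base + ".tab" + tr(ext, "yY", "cC")
    header = base + ".tab" + tr(ext, "yY", "hH")
    return source, header


def legacy_header_for_output(output):
    """Simplified model of the default header name derived from an explicit output name."""
    base, ext = os.path.splitext(output)
    return base + tr(ext, "cC", "hH")


if __name__ == "__main__":
    for name in ("parser.y", "parser.yacc", "parser.bison"):
        source, header = legacy_default_names(name)
        note = "  <-- conflict" if source == header else ""
        print(name, "->", source, header, note)
```

With parser.yacc this yields parser.tab.cacc and parser.tab.hacc, and with parser.bison both names come out as parser.tab.bison, which is the conflict mentioned above. Likewise, `legacy_header_for_output` returns the output name unchanged whenever its extension contains no [cC].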
Also, Akim once mentioned that he wished there was no .tab in the names, and I agree.

The old output file naming rules became even messier when we tried to combine them with %language. However, because %language has not yet seen a full release, it presents an excellent opportunity to begin to dig our way out of this mess. Bison needs to evolve. Consider the following proposal:

1. When %yacc is specified, Bison continues to produce Yacc-compatible file names by default. This is necessary at least for Automake.

2. When neither %yacc nor %language is specified, Bison continues to produce file names with its messy C/C++-specific rules. This is for backward compatibility, and it would be nice not to have to touch this behavior again.

3. We encourage users to always specify %language in order to avoid the mess of #2. When %language is specified, the output file naming rules should be simple, consistent, and intuitive as follows:

   a. Bison strips the extension on the grammar file name regardless of whether it's .y, .yy, .Y, .foo, etc. If there's no ".", Bison strips nothing. The result is the default base name for the output files.

   b. Bison adds the default language-specific extensions to the default base name to form the default header file name and the default source code file name. Bison never adds the useless .tab.

   c. The user can specify %yacc, %file-prefix, %output, and %defines (or their equivalent command-line options) as usual to override the default file names. When the user specifies %output and not %defines, the default header file name is not overridden unexpectedly as in #2.

Notice that the problems Di-an reported are avoided. Any objections from anyone?

For contributors who do not have an up-to-date copyright assignment and employee disclaimer already on file, please feel free to discuss these issues, but please do not post more patches in this area for now.
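For discussion purposes, items 3a and 3b of the proposal could be sketched roughly as follows in Python. This is only an illustration: the extension table and the function name are assumptions of mine, not anything Bison currently defines, and the overrides in 3c are left out.

```python
import os

# Hypothetical table of default (source, header) extensions per %language
# value. The actual extensions would come from each language's skeleton;
# these entries are assumptions for illustration only.
DEFAULT_EXTENSIONS = {
    "c": (".c", ".h"),
    "c++": (".cc", ".hh"),
    "java": (".java", None),  # assumed: no separate header file
}


def proposed_default_names(grammar, language):
    """Sketch of items 3a and 3b: strip any extension, then add the language's own."""
    # 3a: strip whatever extension is present; if there is no ".", nothing
    # is stripped and the whole name is the base name.
    base, _ext = os.path.splitext(grammar)
    # 3b: append the language-specific extensions, never ".tab".
    source_ext, header_ext = DEFAULT_EXTENSIONS[language]
    source = base + source_ext
    header = base + header_ext if header_ext is not None else None
    return source, header


if __name__ == "__main__":
    print(proposed_default_names("parser.bison", "c"))
    print(proposed_default_names("parser.yacc", "c++"))
    print(proposed_default_names("Parser.y", "java"))
```

Under these assumptions, parser.bison with the C language gives parser.c and parser.h, so the letters in the grammar file's extension no longer matter.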
I'd like to get these issues settled and patched quickly, and then I'd like to release Bison 2.4 without waiting for any more assignments or disclaimers to be processed. We've waited long enough.

> This patch also checks that the input file name is not used as an output
> file name in check_file_name_check, with a fatal error if it is.

I think this change is a nice improvement regardless of the issues above. For example, a user might accidentally write "bison -o parser.y parser.y" and lose his grammar file as a result. It's the user's fault, but making Bison detect it is a trivial change.

Di-an, thanks for reporting these issues.
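As a rough illustration of the input-overwrite check discussed above, here is a minimal Python sketch. It is not the code from Di-an's patch; the function name is hypothetical, and it compares normalized absolute paths only, so it would not notice symlinks or hard links to the grammar file.

```python
import os


def check_outputs_do_not_clobber_input(grammar, outputs):
    """Raise ValueError if any output file name refers to the grammar file itself.

    Hypothetical helper: paths are compared after os.path.abspath, which
    normalizes things like "./parser.y" but does not resolve symlinks.
    """
    grammar_path = os.path.abspath(grammar)
    for out in outputs:
        if out is not None and os.path.abspath(out) == grammar_path:
            raise ValueError(
                "refusing to overwrite the input file: %s" % out
            )


if __name__ == "__main__":
    # Models the accidental "bison -o parser.y parser.y" invocation.
    try:
        check_outputs_do_not_clobber_input("parser.y", ["parser.y"])
    except ValueError as err:
        print("fatal error:", err)
```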
