Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
Dave Dodge wrote: (14/04/2009 21:18)
>> Ok. Point taken about undefined behaviour. Is the "unsigned char *p"
>> declaration enough though?
>
>Yes. Dereferencing a valid (unsigned char *) will produce an
>(unsigned char), which by definition is safe to pass to isupper.
>
>> One mail suggested using "unsigned" at every subsequent use of the
>> variable.
>
>That's because p was a (char *), and therefore *p was producing a
>possibly-signed value. Casting the dereferenced value to (unsigned
>char) is another way of ensuring isupper gets a usable value, but I
>think simply changing p to an (unsigned char *) is cleaner.
>

So do I. Thank you.

>BTW it's worth noting that casting from a signed integer to an
>unsigned integer is a well-defined operation, but casting from
>unsigned to signed is implementation-defined.
>

Sounds familiar. I think I read there were at least three standard ways to encode negative numbers, though I think one of them strongly dominates the others: two's complement, which counts up from 0 to total/2-1 and down from total until you reach total/2, where 'total' is the full count of possible values available.

___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
grischka wrote: (14/04/2009 21:04)
>http://repo.or.cz/w/tinycc.git?a=shortlog;h=92c58361
>
>Links to snapshots are at the rightmost end of the lines.
>

Where are you getting stricmp from? I have some C reference PDFs that mention only strcmp, and when I asked, people here told me that to force case or do a case-insensitive compare I'd have to iterate over each character. You're doing something no-one mentioned, so what is it?
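For reference, stricmp is not part of standard C. It is an extension found in the Microsoft C runtime and some other compilers' libraries, and POSIX systems offer a similar strcasecmp. The "iterate over each character" approach mentioned above could look roughly like the sketch below. The name compare_ignore_case is made up for illustration and is not taken from tcc or any library.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from tcc): behaves like strcmp() but
   ignores letter case. Returns <0, 0 or >0. */
int compare_ignore_case(const char *a, const char *b)
{
    /* Read through unsigned char pointers so tolower() never
       receives a negative value. */
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    assert(a != NULL && b != NULL);
    while (*pa && tolower(*pa) == tolower(*pb)) {
        pa++;
        pb++;
    }
    return tolower(*pa) - tolower(*pb);
}
```

With this, compare_ignore_case("FILE.C", "file.c") returns 0, while strcmp on the same pair would not.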
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
On Tue, Apr 14, 2009 at 06:37:49PM +0100, lostgallifreyan wrote:
> Dave Dodge wrote:
> >If the filename string contains any high-valued characters, such as
> >accented letters, then accessing it with a char* might produce a
> >negative char value, and passing that to isupper/islower can be a
> >problem.
>
> Ok. Point taken about undefined behaviour. Is the "unsigned char *p"
> declaration enough though?

Yes. Dereferencing a valid (unsigned char *) will produce an
(unsigned char), which by definition is safe to pass to isupper.

> One mail suggested using "unsigned" at every subsequent use of the
> variable.

That's because p was a (char *), and therefore *p was producing a
possibly-signed value. Casting the dereferenced value to (unsigned
char) is another way of ensuring isupper gets a usable value, but I
think simply changing p to an (unsigned char *) is cleaner.

BTW it's worth noting that casting from a signed integer to an
unsigned integer is a well-defined operation, but casting from
unsigned to signed is implementation-defined.

  -Dave Dodge
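The signed-to-unsigned rule can be shown with a small sketch. The helper names to_uchar and demo_conversions are invented for this illustration. Converting to an unsigned type reduces the value modulo one more than the type's maximum, so the result is the same on every conforming compiler.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: convert any int to unsigned char. The C
   standard defines this as reduction modulo UCHAR_MAX + 1. */
unsigned char to_uchar(int value)
{
    return (unsigned char)value;
}

/* Small self-check of the well-defined direction. */
void demo_conversions(void)
{
    /* -1 wraps to the largest unsigned char value. */
    assert(to_uchar(-1) == UCHAR_MAX);
    /* -20 wraps to UCHAR_MAX + 1 - 20 (236 when CHAR_BIT is 8). */
    assert(to_uchar(-20) == UCHAR_MAX - 19);
}
```

The reverse direction is the one to avoid relying on: converting an unsigned value that does not fit in the signed type gives an implementation-defined result.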
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
lostgallifreyan wrote:
>> However that wouldn't stop windows users from asking why they
>> can't type "TCC FILE.C", I'm afraid.
>
> Nope, probably not. But they wouldn't have to ask if it were GCC,
> because those paths work there. This is why I kept it simple. This
> patch is not trying to do clever things, just meant to reduce what
> you seem to want reduced. If you decided to add it, that MIGHT stop
> those questions in future. No way to be sure it won't cause others
> but they'd be fewer, and come from people who'd progressed far
> enough to ask more interesting questions.

Okay, let's move on.

http://repo.or.cz/w/tinycc.git?a=shortlog;h=92c58361

Links to snapshots are at the rightmost end of the lines.

--- grischka
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
grischka wrote: (14/04/2009 18:18)
>Ivo wrote:
>> Why don't you just implement gcc's -x command line option instead of this
>> ifdeffery? It's also useful on other platforms and it solves the .s/.S
>> problem, too:
>>
>> -x c
>> -x cpp-output
>> -x assembler
>> -x assembler-with-cpp
>>
>> etc...
>>
>
>I wouldn't say no if we can get it.
>
>However that wouldn't stop windows users from asking why they
>can't type "TCC FILE.C", I'm afraid.
>

Nope, probably not. But they wouldn't have to ask if it were GCC, because those paths work there. This is why I kept it simple. This patch is not trying to do clever things, just meant to reduce what you seem to want reduced. If you decided to add it, that MIGHT stop those questions in future. No way to be sure it won't cause others but they'd be fewer, and come from people who'd progressed far enough to ask more interesting questions.
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
Dave Dodge wrote: (14/04/2009 18:13)
>If the filename string contains any high-valued characters, such as
>accented letters, then accessing it with a char* might produce a
>negative char value, and passing that to isupper/islower can be a
>problem.
>

Ok. Point taken about undefined behaviour. Is the "unsigned char *p" declaration enough though? One mail suggested using "unsigned" at every subsequent use of the variable. Your mail suggested it was ok to do it once at declaration. I just forgot to include it while working on the patch, but the second posting has it there (as well as a more important fix).
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
Ivo wrote:
> Why don't you just implement gcc's -x command line option instead of this
> ifdeffery? It's also useful on other platforms and it solves the .s/.S
> problem, too:
>
> -x c
> -x cpp-output
> -x assembler
> -x assembler-with-cpp
>
> etc...
>
> --Ivo

I wouldn't say no if we can get it.

However that wouldn't stop windows users from asking why they
can't type "TCC FILE.C", I'm afraid.

--- grischka
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
On Tue, Apr 14, 2009 at 12:21:50PM +0100, lostgallifreyan wrote:
> "Charlie Gordon" wrote:
> >Also more importantly, do not pass char arguments to
> >tolower/toupper and islower/isupper functions. 'char' may be
> >signed whereas these functions expect an int parameter with a value
> >among those of type unsigned char or EOF.

> As the variable isn't assigned any negative values it's not going to
> be an issue,

If the filename string contains any high-valued characters, such as
accented letters, then accessing it with a char* might produce a
negative char value, and passing that to isupper/islower can be a
problem.

> Actually I forgot the unsigned char bit but even so, it compiled
> without error or warning and it worked.

I think you've mentioned you're new to C, in which case you need to
understand that "compiled without error or warning and it worked"
doesn't mean very much. C has the concept of "undefined behavior",
which basically means the compiler is allowed to quietly accept the
code but there are no constraints on what it actually does when you
run it. It might even produce the expected result for some inputs.

Passing a value outside the range of unsigned char (or EOF) to a
function such as isupper is an example of something that produces
undefined behavior. The program might implicitly cast it to unsigned
char, or change it to 0, or corrupt memory, or crash, or quietly erase
the hard drive and set the printer on fire. While some of these
actions are perhaps more likely than others, all of them are a correct
result as far as C is concerned.

For example given a blatantly undefined bit of code such as:

  isupper(-20)

gcc does not give a warning, even with -pedantic, -Wall, and -Wextra.
Whatever it actually does when you run it, there is no guarantee that
it will do the same thing in any other C compiler (including other
versions of gcc).

  -Dave Dodge
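A minimal sketch of the two safe patterns discussed in this thread, casting at the call site and reading the string through an unsigned char pointer, might look like this. The function names is_upper_char and count_upper are illustrative only and do not come from tcc.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical wrapper: cast to unsigned char at the call site so
   isupper() always receives a value it is defined for. */
int is_upper_char(char c)
{
    return isupper((unsigned char)c) != 0;
}

/* Counts upper-case letters, walking the string through an
   unsigned char pointer so *p can never be negative. */
size_t count_upper(const char *s)
{
    const unsigned char *p;
    size_t n = 0;

    assert(s != NULL);
    for (p = (const unsigned char *)s; *p; p++)
        if (isupper(*p))
            n++;
    return n;
}
```

Either form keeps the argument within the range of unsigned char, even when plain char is signed and the string contains bytes above 127.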
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
Regarding the ifdeffery :) I did that because while I explored tcc.c I noticed it was used a lot when something was meant to only get added to the executable file if it was compiled for Windows. It's there to assure people NOT running Windows that they won't ever have to wonder what I did. I assumed from what I saw in the source that this was appropriate practice for something specific to Win32.
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
Ivo wrote: (14/04/2009 17:36)
>On Tuesday 14 April 2009 17:07, lostgallifreyan wrote:
>> What I have considered is using GetLongFileName() in Kernel32.dll to pass
>> long names as specified in the file system. It's a neat answer but only
>> for W98, no good in W95 (and I think redundant in W2K and WXP). Another
>> thing is to add some other option for assembler code files, such as .asm
>> (or .ASM, same thing) for those requiring preprocessing in Windows, and
>> .s (or .S) to be converted in DOS shortname paths so TCC makes the usual
>> interpretation of .s NOT being preprocessed. But I don't know ASM, it's
>> not my call what decision gets made there, and GCC doesn't recognise the
>> extension ASM anyway. For now, if I were to try assembler code I'd just
>> watch carefully what was put into my .s (or .S) files, and make batch
>> scripts with the right extension case for TCC.
>
>Why don't you just implement gcc's -x command line option instead of this
>ifdeffery? It's also useful on other platforms and it solves the .s/.S
>problem, too:
>
>-x c
>-x cpp-output
>-x assembler
>-x assembler-with-cpp
>
>etc...
>

I think TCC has that option. If not I'd have to do a lot more than I did. I was solving one singular issue only, I wanted a simple DOS shortname path to perform as expected in W9X, and I got it. The idea is to present a working TCC for Windows newcomers who haven't yet explored deeply. It's also better that something as simple as this does work as expected. It works in GCC, so it ought to work in TCC. With this patch it does.

I deliberately limited to the simplest task, because I don't want to break existing handling for more complex matters. It's meant to work with TCC, not to reinvent it.
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
On Tuesday 14 April 2009 17:07, lostgallifreyan wrote:
> What I have considered is using GetLongFileName() in Kernel32.dll to pass
> long names as specified in the file system. It's a neat answer but only
> for W98, no good in W95 (and I think redundant in W2K and WXP). Another
> thing is to add some other option for assembler code files, such as .asm
> (or .ASM, same thing) for those requiring preprocessing in Windows, and
> .s (or .S) to be converted in DOS shortname paths so TCC makes the usual
> interpretation of .s NOT being preprocessed. But I don't know ASM, it's
> not my call what decision gets made there, and GCC doesn't recognise the
> extension ASM anyway. For now, if I were to try assembler code I'd just
> watch carefully what was put into my .s (or .S) files, and make batch
> scripts with the right extension case for TCC.

Why don't you just implement gcc's -x command line option instead of this
ifdeffery? It's also useful on other platforms and it solves the .s/.S
problem, too:

-x c
-x cpp-output
-x assembler
-x assembler-with-cpp

etc...

--Ivo
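To make the -x idea concrete, here is a rough sketch of how the four language names listed above could be mapped to an internal file kind, so that the file extension (and its case) no longer decides how a file is treated. Everything here is hypothetical: the enum, the function name parse_x_language and the table are invented for illustration and say nothing about how tcc or gcc actually implement the option.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical file kinds, one per language name from the list above. */
enum source_kind {
    KIND_UNKNOWN,
    KIND_C,
    KIND_CPP_OUTPUT,
    KIND_ASM,
    KIND_ASM_WITH_CPP
};

/* Hypothetical parser for the argument that follows -x.
   Unrecognised names map to KIND_UNKNOWN. */
enum source_kind parse_x_language(const char *name)
{
    static const struct {
        const char *name;
        enum source_kind kind;
    } table[] = {
        { "c",                  KIND_C },
        { "cpp-output",         KIND_CPP_OUTPUT },
        { "assembler",          KIND_ASM },
        { "assembler-with-cpp", KIND_ASM_WITH_CPP }
    };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(name, table[i].name) == 0)
            return table[i].kind;
    return KIND_UNKNOWN;
}
```

With something like this, "-x assembler-with-cpp FILE.S" and "-x assembler-with-cpp file.s" would be handled identically, which is the .s/.S point made above.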
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
Actually I made a horrible howler that you didn't spot. :) I never noticed it till I tried passing two files to compile just now. TCC went into a silent CPU-consuming spin, I think because my test for lowercase called the case-forcing loop for *every character* found in that first loop until a lowercase one was found. I had it right before I posted that first attempt at a patch, but had tried to reduce it, and reduced it too far. The earlier one used a flag variable. It definitely works as advertised, even for multiple files.

  #ifdef _WIN32
    /* Set W9X DOS 8.3 upper case paths to lower case. */
    unsigned char *p;
    int flag = 1;
    if (r[1] == ':' && r[2] == '\\') {
      for (p = r; *p; p++)
        if (flag && *p != toupper(*p))
          flag = 0;
      if (flag == 1)
        for (p = r; *p; p++)
          *p = tolower(*p);
    }
  #endif

Also, when you mention the distinction in Windows for case sensitive extensions, you know it can't exist, at least for the same file name in the same location... So when it comes to .s and .S, I don't know what to do. That's why I won't over-ride expected behaviour if someone wants a command line written accordingly, they just make sure (if using Windows) that the commandline has a lower case letter on the path as written (or a forward slash after the colon). I described this clearly already.

What I have considered is using GetLongFileName() in Kernel32.dll to pass long names as specified in the file system. It's a neat answer but only for W98, no good in W95 (and I think redundant in W2K and WXP). Another thing is to add some other option for assembler code files, such as .asm (or .ASM, same thing) for those requiring preprocessing in Windows, and .s (or .S) to be converted in DOS shortname paths so TCC makes the usual interpretation of .s NOT being preprocessed. But I don't know ASM, it's not my call what decision gets made there, and GCC doesn't recognise the extension ASM anyway. For now, if I were to try assembler code I'd just watch carefully what was put into my .s (or .S) files, and make batch scripts with the right extension case for TCC.

To summarise, the various interpretations of case sensitive extensions are a minefield the moment you use a file system that does not recognise case sensitivity. I think the best that can be done is to make sure that DOS shortname paths do at least work most of the time, using a method that makes it easy to pass written arguments to TCC with case-based determinations that work regardless of what case they have in the file system. So that's why I wrote the patch as I did, to cleanly solve one small, specific problem.
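For anyone wanting to try the logic outside tcc.c, the corrected patch can be sketched as a stand-alone function. This is an illustration only: the name lowercase_dos_path and the return value are invented here, the early return replaces the flag variable, islower() is assumed to be equivalent to the patch's "*p != toupper(*p)" test, and an extra check on the first character guards against reading past the end of a very short string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-alone version of the patch. Lower-cases `path`
   in place only if it starts with "X:\" and contains no lower-case
   letter. Returns 1 if the path was changed, 0 if left alone. */
int lowercase_dos_path(char *path)
{
    unsigned char *p;

    assert(path != NULL);
    /* Test path[0] first so short strings are never over-read. */
    if (path[0] == '\0' || path[1] != ':' || path[2] != '\\')
        return 0;

    /* First pass: any lower-case letter disables the conversion. */
    for (p = (unsigned char *)path; *p; p++)
        if (islower(*p))
            return 0;

    /* Second pass: the path is all caps, so fold it to lower case. */
    for (p = (unsigned char *)path; *p; p++)
        *p = (unsigned char)tolower(*p);
    return 1;
}
```

The two loops are separate rather than nested, which is what avoids the spin described above: the scan for a lower-case letter finishes before any character is changed.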
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
"Charlie Gordon" wrote: (14/04/2009 11:28) >I don't think this is going in the right direction: >- Command line arguments should not be changed >- Why restrict this behaviour to full paths with a drive letter ? >- instead of matching filename extensions in a case independent manner ? > You miss the point. Look at the GCC specs for commandlines, as I was told to do. How do you propose to handle .s and .S on Windows? The patch would bloat uncontrollably if any attempt was made to solve this that way, and code for that exists already, the written arguments' case is taken literally by TCC. As TCC tends to use lower case commands and filespecs, except where upper case is required, the simplest thing to do is make the DOS paths lower case, while testing for existing lowercase to disable the patch if any is found. I'm a newcomer to C, and TCC, yet was urged to jump in the deep end and suggest a solution! So that is what I did, and I had to figure out the best placement of the patch for myself. And it works. If you want to specify an exact case for an extension, you can. All this patch is meant to do is prevent DOS's all-caps habits from breaking filespecs as normally expected by TCC, while being irrelevant to non-windows users, and being easily bypassed by Windows users. >Also more importantly, do not pass char arguments to tolower/toupper and >islower/isupper functions. 'char' may be signed whereas these functions >expect >an int parameter with a value among those of type unsigned char or EOF. >The argument to these functions should at least be cast to (unsigned char), >and so should the value you compare their result to. > >>if (*p != toupper(*p)) > >if ((unsigned char)*p != toupper((unsigned char)*p) > >>*p = tolower(*p); > >*p = tolower((unsigned char)*p); > >If you don't like the resulting code (quite ugly in my opinion), >you should use a temporary variable: > >int c = (unsigned char)*p; >if (toupper(c) != c) ... 
> >*p = tolower(c); > I went with advice given, as best I could. As the variable isn't assigned any negative values it's not going to be an issue, as I guess was why I was advised as I was. Actually I forgot the unsigned char bit but even so, it compiled without error or warning and it worked. Not sure why you put all those "unsigned char"s in there like that, or want an extra variable, I just tried "unsigned char *p;" as the initial declaration and it worked fine. (See Dave Dodge's mail in the earlier thread, he does the same). >But all things considered, I still argue that the correct approach to the >problem >you are trying to fix, is to perform case insensitive matching on filename >extensions >wherever these occur. A simple way to do this is to identify all >occurrences of such >matching and use call function with a platform dependent implementation. > No, logically it's the same thing. Though while I could have forced case only on the TEST value that made no sense in practise, as DOS already does force case in shortname paths! I would have not have solved the problem by doing the same thing, or its inverse. And to do so goes right against the GCC specs that TCC is emulating. The correct way is to force lower case for this singular instance to reverse Windows' forced upper case. Keep it simple. If further refined exceptions must be made, add them. For example, as .c requires preprocessing, while .S as opposed to .s also requires it for assembler code, it might be worth adding a line to re-forece the singular extension .s to upper case on Windows to invoke the right response from TCC. But in the interests of keeping code size down, it seems wise to limit the patch's behaviour to a simple easily predicted action that is extremely easy to over-ride in the extremely limited situations where it can even act at all. :) This I have done. ___ Tinycc-devel mailing list Tinycc-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.
- Original Message -
From: "lostgallifreyan"
To:
Sent: Sunday, April 12, 2009 5:02 PM
Subject: [Tinycc-devel] Basic patch for passing W9X short DOS paths to TCC.

As promised, here's a basic patch for allowing files passed to TCC to work in W9X. It can be easily overridden for usual case sensitive file extension handling, but in Windows it's up to the user to see that the file pointed to contains the right stuff, especially in instances of ASM coding where .s and .S are important. You can't specify which in the file name, but you can still do it in the written path, even with this patch.

Immediately below this line in tcc.c:

  /* add a new file */

I added this:

  #ifdef _WIN32
    /* Set W9X DOS 8.3 upper case paths to lower case. */
    char *p;
    if (r[1] == ':' && r[2] == '\\') {
      for (p = r; *p; p++) {
        if (*p != toupper(*p))
          break;
        else
          for (p = r; *p; p++)
            *p = tolower(*p);
      }
    }
  #endif

I considered a command line switch as suggested by a couple of people in the earlier thread, but I think this is neater, it only operates in very limited circumstances. If you want to be sure it does nothing, make a forward slash instead of that first backslash, or a lower case letter on the path, any of these will stop it acting, it ONLY does something if it looks like a standard all caps path, so be aware that if you like all-caps long-name paths you need to think about that if you try this patch...

My test for the colon and backslash in Windows paths might even be redundant given the #ifdef _WIN32 bit, but I'm playing it safe in case Windows Mobile is considered as Win32, because it doesn't use a single drive letter at the start of a path. That backslash test is a useful tool, it allows the change to a forward slash to disable the patch as a command line switch might do.

I don't think this is going in the right direction:
- Command line arguments should not be changed
- Why restrict this behaviour to full paths with a drive letter ?
- instead of matching filename extensions in a case independent manner ?

Also more importantly, do not pass char arguments to tolower/toupper and islower/isupper functions. 'char' may be signed whereas these functions expect an int parameter with a value among those of type unsigned char or EOF. The argument to these functions should at least be cast to (unsigned char), and so should the value you compare their result to.

> if (*p != toupper(*p))

if ((unsigned char)*p != toupper((unsigned char)*p))

> *p = tolower(*p);

*p = tolower((unsigned char)*p);

If you don't like the resulting code (quite ugly in my opinion), you should use a temporary variable:

int c = (unsigned char)*p;
if (toupper(c) != c) ...
*p = tolower(c);

But all things considered, I still argue that the correct approach to the problem you are trying to fix, is to perform case insensitive matching on filename extensions wherever these occur. A simple way to do this is to identify all occurrences of such matching and call a function with a platform dependent implementation.

-- Chqrlie.
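The alternative proposed at the end of this message, matching filename extensions without regard to case, could be sketched along the following lines. The function name has_extension and its ignore_case parameter are made up for this example. A platform-dependent wrapper could pass 1 when built with _WIN32 defined and 0 elsewhere, but that wiring is an assumption and is not shown.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the text after the last '.' in
   `filename` equals `ext`, else 0. When ignore_case is non-zero the
   comparison ignores letter case. */
int has_extension(const char *filename, const char *ext, int ignore_case)
{
    const char *dot;
    const unsigned char *p, *q;

    assert(filename != NULL && ext != NULL);
    dot = strrchr(filename, '.');
    if (dot == NULL)
        return 0;

    p = (const unsigned char *)dot + 1;
    q = (const unsigned char *)ext;
    for (; *p && *q; p++, q++) {
        int a = *p, b = *q;
        if (ignore_case) {
            a = tolower(a);
            b = tolower(b);
        }
        if (a != b)
            return 0;
    }
    /* Both strings must end together for a full match. */
    return *p == '\0' && *q == '\0';
}
```

Note the trade-off debated in the rest of the thread: with ignore_case set, "boot.S" and "boot.s" become indistinguishable, so the .s versus .S preprocessing distinction would have to be expressed some other way.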