Re: [vdr] trouble with asprintf
On 02/10/08 16:06, Wolfgang Rohdewald wrote:
> Hi,
>
> I am making the muggle plugin work with UTF-8 and have a little problem:
>
> since asprintf leads to segfaults if fed with incorrect UTF-8 characters,
> I wanted to write a wrapper function which would then check the return value
> of asprintf. However I have a problem with the variable argument list and
> the va_* macros. Using gdb shows that, in the following example, in
>
>     res=asprintf (strp, fmt, ap);
>
> ap is interpreted not as a list of arguments but as an integer.
>
> What is wrong here?
>
> BTW I am quite sure that vdr will sometimes coredump since it never checks the
> return value of asprintf. One suspect would be if somebody used a latin1
> charset and had special characters like äöü in file names and then changes
> to utf-8 without converting file names to utf-8. If vdr then passes such
> a file name to asprintf, corrupted memory results. Might be difficult
> to debug remotely.

You could use VDR's cString::sprintf() instead.
This is probably also what I am going to do in the VDR core code,
to avoid asprintf() altogether. The single leftover vasprintf()
call in cString::sprintf() can then be made safe.

Klaus

___
vdr mailing list
vdr@linuxtv.org
http://www.linuxtv.org/cgi-bin/mailman/listinfo/vdr
Re: [vdr] trouble with asprintf
On Sunday, 10 February 2008, Klaus Schmidinger wrote:
> You could use VDR's cString::sprintf() instead.
> This is probably also what I am going to do in the VDR core code,
> to avoid asprintf() altogether. The single leftover vasprintf()
> call in cString::sprintf() can then be made safe.

vasprintf was a good hint - I only had to change asprintf to vasprintf,
same arguments. Now it works as expected.

I will use my msprintf until you have made cString::sprintf() safe.

Thank you!

    int
    msprintf(char **strp, const char *fmt, ...)
    {
        va_list ap;
        va_start (ap, fmt);
        int res=vasprintf (strp, fmt, ap);
        va_end (ap);
        return res;   /* pass the result on so callers can check for -1 */
    }

-- 
Wolfgang
Re: [vdr] trouble with asprintf
Wolfgang Rohdewald wrote:
> since asprintf leads to segfaults if fed with incorrect UTF-8 characters,
> I wanted to write a wrapper function which would then check the return value
> of asprintf.

I never understood what the problem is with utf8 and asprintf, since
utf8 is mostly ASCIIZ backwards compatible, and asprintf probably
doesn't even know the difference between utf8 and ascii. What special
handling does asprintf do with utf8? Is there some example that causes the
trouble?

Worst case I can imagine would be that there's an invalid 0 byte inside
a utf8 multibyte char, and even this would just result in a utf8
string that terminates with an incomplete char - and shouldn't handling
such crap be the job of whatever processes the utf8 string later on? At
least IMHO it would be wise to count any 0 byte as string end.

Cheers,

Udo
Re: [vdr] trouble with asprintf
On Sunday, 10 February 2008, Udo Richter wrote:
> What special
> handling does asprintf with utf8? Is there some example that causes the
> trouble?
> Worst case I can imagine would be that there's an invalid 0 byte inside
> an utf8 multibyte char

printf and family sometimes have to count characters, so I suppose they
have to scan UTF

I know from mysql and postgresql that they also scan every UTF string
passed from the client for illegal chars and abort the transaction if
they find any.

My problem code:

    mgDb::Build_cddbid(const mgSQLString& artist) const
    {
        char *s;
        asprintf(&s,"%ld-%.9s",random(),artist.original());

segfaults only if illegal utf8 chars appear in artist.original()

asprintf returns -1, so s is nothing that could be freed,
and this gives a nice backtrace:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1319449712 (LWP 22989)]
    0xb7bf57ea in free () from /lib/tls/i686/cmov/libc.so.6
    (gdb) bt
    #0  0xb7bf57ea in free () from /lib/tls/i686/cmov/libc.so.6
    #1  0xb7986908 in mgDb::Build_cddbid (this=0x86ed8e8, [EMAIL PROTECTED]) at mg_db.c:1023

If I change %.9s to %s, everything is fine.

I cannot easily simplify that, if I try like this, it works:

    char artist[50];
    strcpy(artist,"Celine Dion");
    artist[1]=0xe9;
    asprintf(&buffer,"%ld-%.9s",random(),artist);
    printf(buffer);
    free(buffer);

-- 
Wolfgang
Re: [vdr] trouble with asprintf
I demand that Wolfgang Rohdewald may or may not have written...

> On Sunday, 10 February 2008, Udo Richter wrote:
>> What special handling does asprintf with utf8? Is there some example that
>> causes the trouble?
>> Worst case I can imagine would be that there's an invalid 0 byte inside
>> an utf8 multibyte char

> printf and family sometimes have to count characters, so I suppose they
> have to scan UTF

No; they only ever count bytes. The encoding is irrelevant.

[snip]
-- 
| Darren Salt    | linux or ds at              | nr. Ashington, | Toon
| RISC OS, Linux | youmustbejoking,demon,co,uk | Northumberland | Army
| + Buy local produce. Try to walk or cycle. TRANSPORT CAUSES GLOBAL WARMING.

If a bus stops at a bus station, does work stop at a workstation?
Re: [vdr] trouble with asprintf
Wolfgang Rohdewald wrote:
>     char *s;
>     asprintf(&s,"%ld-%.9s",random(),artist.original());
>
> segfaults only if illegal utf8 chars appear in artist.original()
>
> asprintf returns -1, so s is nothing that could be freed,
> and this gives a nice backtrace:

So it's basically just free'ing an uninitialized pointer.

Well, that leads to the question whether s is unchanged in case of a -1
error return, and whether this would work:

    char *s = NULL;
    asprintf(&s,"%ld-%.9s",random(),artist.original());

Cheers,

Udo
Re: [vdr] trouble with asprintf
Udo Richter wrote:
> Wolfgang Rohdewald wrote:
> >     char *s;
> >     asprintf(&s,"%ld-%.9s",random(),artist.original());
> >
> > segfaults only if illegal utf8 chars appear in artist.original()
> >
> > asprintf returns -1, so s is nothing that could be freed,
> > and this gives a nice backtrace:
>
> So it's basically just free'ing an uninitialized pointer.
>
> Well, that leads to the question whether s is unchanged in case of a -1
> error return, and whether this would work:
>
>     char *s = NULL;
>     asprintf(&s,"%ld-%.9s",random(),artist.original());

The manpage explicitly says that the content of s is undefined in
case of error. So even if it works you can't really count on it.
You can't get around checking the return value.

cu
Ludwig

-- 
 (o_   Ludwig Nussel
 //\   SUSE LINUX Products GmbH, Development
 V_/_  http://www.suse.de/
Re: [vdr] trouble with asprintf
I demand that Ludwig Nussel may or may not have written...

> Darren Salt wrote:
>> I demand that Ludwig Nussel may or may not have written...
>> [snip]
>>> asprintf needs to check for multibyte characters to not cut them in
>>> the middle and produce invalid output.
>> No - it's encoding-neutral. [...]

> Try the following with 'LANG=C' and 'LANG=de_DE.UTF-8'. You will notice
> that in the latter case it will not cut the umlaut.

[snip code - hmm, dodgy use of printf]

Interesting. It omits it entirely.

But the rest of my point still stands - it still counts bytes.

-- 
| Darren Salt    | linux or ds at              | nr. Ashington, | Toon
| RISC OS, Linux | youmustbejoking,demon,co,uk | Northumberland | Army
| + Burn less waste. Use less packaging. Waste less. USE FEWER RESOURCES.

This message was brought to you using only 100% recycled electrons.
Re: [vdr] trouble with asprintf
Darren Salt wrote:
> I demand that Ludwig Nussel may or may not have written...
>
> [snip]
> > asprintf needs to check for multibyte characters to not cut them in
> > the middle and produce invalid output.
>
> No - it's encoding-neutral. What you want is your own version which does that

Try the following with 'LANG=C' and 'LANG=de_DE.UTF-8'. You will notice
that in the latter case it will not cut the umlaut.

    #define _GNU_SOURCE
    /* The header names were lost in the archive. The four below are an
       assumption: the code needs the first three, string.h is a guess. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <locale.h>
    #include <string.h>

    int main(void)
    {
        char* buffer;
        char artist[] = "Haegar";
        int ret;
        setlocale(LC_ALL, "");
        artist[1]=0xc3;
        artist[2]=0xa4;
        ret = asprintf(&buffer,"%.2s\n",artist);
        printf("%d bytes\n", ret);
        printf(buffer);
        free(buffer);
        return 0;
    }

cu
Ludwig

-- 
 (o_   Ludwig Nussel
 //\
 V_/_  http://www.suse.de/
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)
Re: [vdr] trouble with asprintf
I demand that Ludwig Nussel may or may not have written...

[snip]
> asprintf needs to check for multibyte characters to not cut them in
> the middle and produce invalid output.

No - it's encoding-neutral. What you want is your own version which does that
(but if you still think that that should be called asprintf, you may as well
rewrite printf etc. while you're at it), or conversion to/from wide character
strings (and a version of asprintf() which handles wchar_t*).

-- 
| Darren Salt    | linux or ds at              | nr. Ashington, | Toon
| RISC OS, Linux | youmustbejoking,demon,co,uk | Northumberland | Army
| + Output less CO2 => avoid boiling weather. TIME IS RUNNING OUT *FAST*.

Windows 2000. Known to some as Windows 1900.
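One way to read the "your own version" suggestion above is to leave the printf family alone and compute a safe byte length before formatting. The following is only a sketch under stated assumptions: `utf8_prefix_len` is a hypothetical helper (it exists neither in VDR nor in libc), and it assumes the input is valid UTF-8, where continuation bytes have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of VDR or libc.
 * Returns the largest length <= max_bytes at which s can be cut
 * without splitting a UTF-8 multibyte sequence.
 * Assumes s is valid UTF-8. */
static size_t utf8_prefix_len(const char *s, size_t max_bytes)
{
    assert(s != NULL);

    size_t len = strlen(s);
    if (len <= max_bytes)
        return len;                 /* the whole string fits */

    size_t cut = max_bytes;
    /* If the first byte after the cut is a continuation byte
     * (10xxxxxx), the cut would land inside a character, so back
     * up until the cut sits in front of a lead byte. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

The result can then be passed as an explicit precision, for example `printf("%.*s\n", (int)utf8_prefix_len(artist, 9), artist);`, so that the byte limit never falls in the middle of a character. Invalid input is not detected by this sketch; it only avoids creating new broken sequences.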
Re: [vdr] trouble with asprintf
On Monday, 11 February 2008, Udo Richter wrote:
> Well, that leads to the question whether s is unchanged in case of a -1
> error return, and whether this would work:

I can confirm that. The man page however says the value will be undefined.

My current understanding is:

1. Don't forget to call setlocale! Normally setlocale(LC_ALL,"")

2. If the locale is UTF-8, asprintf returns -1 if the string contains illegal
   UTF-8 characters anywhere.

3. This and out of memory are the only reasons I know for result -1. The man
   page for asprintf says there could be other errors than out of memory but
   mentions none.

4. If the result is -1, the buffer pointer stays unchanged in my tests, but
   the man page says it is undefined.

5. If the locale is UTF-8 and a maximum length is defined as in %.9s, and if
   %.9s would cut a multibyte char, only 8 chars will be used. See the example
   from Ludwig Nussel.

What I don't know is where in the man pages this is explained - I did not
find anything about it, neither in man asprintf nor in man printf.

-- 
Wolfgang
Re: [vdr] trouble with asprintf
Wolfgang Rohdewald wrote:
> My problem code:
>
>     mgDb::Build_cddbid(const mgSQLString& artist) const
>     {
>         char *s;
>         asprintf(&s,"%ld-%.9s",random(),artist.original());
>
> segfaults only if illegal utf8 chars appear in artist.original()
>
> asprintf returns -1, so s is nothing that could be freed,
> and this gives a nice backtrace:
>
>     Program received signal SIGSEGV, Segmentation fault.
>     [Switching to Thread -1319449712 (LWP 22989)]
>     0xb7bf57ea in free () from /lib/tls/i686/cmov/libc.so.6
>     (gdb) bt
>     #0  0xb7bf57ea in free () from /lib/tls/i686/cmov/libc.so.6
>     #1  0xb7986908 in mgDb::Build_cddbid (this=0x86ed8e8, [EMAIL PROTECTED]) at
>     mg_db.c:1023

As you can see it doesn't segfault on asprintf but on free().

> If I change %.9s to %s, everything is fine.
>
> I cannot easily simplify that, if I try like this, it works:
>
>     char artist[50];
>     strcpy(artist,"Celine Dion");
>     artist[1]=0xe9;
>     asprintf(&buffer,"%ld-%.9s",random(),artist);
>     printf(buffer);
>     free(buffer);

    if(asprintf(...) >= 0)
    {
        printf(...);
        free(...);
    }

Or just use normal snprintf as the amount of characters to print is
fixed anyways so you don't need a variable sized buffer.

cu
Ludwig

-- 
 (o_   Ludwig Nussel
 //\
 V_/_  http://www.suse.de/
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)
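The snprintf suggestion can be sketched roughly as follows, for the fixed-size "%ld-%.9s" case only. `format_cddbid` and `CDDBID_BUFSIZE` are hypothetical names made up for this illustration, not anything from muggle or VDR, and the buffer size assumes a `long` of at most 64 bits.

```c
#include <assert.h>
#include <stdio.h>

/* Room for a 64-bit long in decimal (at most 20 characters with
 * sign), the '-' separator, 9 bytes of the artist name and the
 * terminating NUL: 20 + 1 + 9 + 1 = 31, rounded up. */
#define CDDBID_BUFSIZE 32

/* Hypothetical helper: writes "<number>-<first 9 bytes of artist>"
 * into buf. Returns 0 on success, -1 if snprintf reports an error
 * or the output did not fit; buf is then set to the empty string. */
static int format_cddbid(char *buf, size_t size, long number,
                         const char *artist)
{
    assert(buf != NULL && artist != NULL && size > 0);

    int n = snprintf(buf, size, "%ld-%.9s", number, artist);
    if (n < 0 || (size_t)n >= size) {
        buf[0] = '\0';
        return -1;
    }
    return 0;
}
```

Because the buffer lives on the caller's stack there is nothing to free, so the crash in free() discussed above cannot occur on this path. The return value still has to be checked, since snprintf can also report a negative result.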
Re: [vdr] trouble with asprintf
Wolfgang Rohdewald wrote:
> since asprintf leads to segfaults if fed with incorrect UTF-8 characters,

It's not asprintf that segfaults but the call to free uninitialized
memory afterwards.

> I wanted to write a wrapper function which would then check the return value
> of asprintf. However I have a problem with the variable argument list and
> the va_* macros. Using gdb shows that, in the following example, in
>
>     res=asprintf (strp, fmt, ap);
>
> ap is interpreted not as a list of arguments but as an integer.

use vasprintf

>     int
>     msprintf(char **strp, const char *fmt, ...)
>     {
>         va_list ap;
>         int res;
>         va_start (ap, fmt);
>         res=asprintf (strp, fmt, ap);
>         va_end (ap);
>     }

Even if you use vasprintf to make the function actually work you
still need to check the return value of vasprintf otherwise this
wrapper would be kind of useless.

cu
Ludwig

-- 
 (o_   Ludwig Nussel
 //\
 V_/_  http://www.suse.de/
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)
Re: [vdr] trouble with asprintf
Udo Richter wrote:
> Wolfgang Rohdewald wrote:
> > since asprintf leads to segfaults if fed with incorrect UTF-8 characters,
> > I wanted to write a wrapper function which would then check the return value
> > of asprintf.
>
> I never understood what the problem is with utf8 and asprintf, since
> utf8 is mostly ASCIIZ backwards compatible, and asprintf probably
> doesn't even know the difference between utf8 and ascii. What special
> handling does asprintf with utf8? Is there some example that causes the
> trouble?

asprintf needs to check for multibyte characters to not cut them in
the middle and produce invalid output.

cu
Ludwig

-- 
 (o_   Ludwig Nussel
 //\
 V_/_  http://www.suse.de/
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)
Re: [vdr] trouble with asprintf
On Monday, 11 February 2008, Ludwig Nussel wrote:
> As you can see it doesn't segfault on asprintf but on free().

I did see that. I did not say it segfaults but it does lead to segfaults.

>     if(asprintf(...) >= 0)
>     {
>         printf(...);
>         free(...);
>     }

I do not want to change dozens of places like that. Just have one
single point which can emit an error message so I can then see what
has to be done for each individual place. Most of the asprintf calls
will never get into trouble anyway. But if a user reports a problem I
prefer an error message over some vague description.

> Or just use normal snprintf as the amount of charactes to print is
> fixed anyways so you don't need a variable sized buffer.

This is just a minimal sample. The real code has variable length strings.

On Monday, 11 February 2008, Ludwig Nussel wrote:
> Even if you use vasprintf to make the function actually work you
> still need to check the return value of vasprintf otherwise this
> wrapper would be kind of useless.

Of course. See above.

-- 
Wolfgang
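The "single point which can emit an error message" might look roughly like the sketch below. The name `msprintf` comes from the thread, but the body is an assumption: writing to stderr stands in for whatever logging the plugin really uses, and resetting the pointer to NULL is one possible policy, chosen here because the man page leaves the pointer undefined after a failure.

```c
#define _GNU_SOURCE             /* vasprintf is a GNU/BSD extension */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>             /* free(), which callers use on the result */

/* Sketch of a checked wrapper around vasprintf.
 * Returns what vasprintf returns. On failure it reports the format
 * string on stderr (an assumption, a plugin would likely use its own
 * logging) and sets *strp to NULL, so that a later free(*strp) is
 * harmless instead of crashing on an undefined pointer. */
static int msprintf(char **strp, const char *fmt, ...)
{
    assert(strp != NULL && fmt != NULL);

    va_list ap;
    va_start(ap, fmt);
    int res = vasprintf(strp, fmt, ap);
    va_end(ap);

    if (res < 0) {
        *strp = NULL;
        fprintf(stderr, "msprintf: formatting failed for \"%s\"\n", fmt);
    }
    return res;
}
```

Existing call sites could then switch from asprintf to msprintf without further changes, since free(NULL) is defined to do nothing. Code that goes on to use the string still needs to test for NULL or for a negative return value.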