Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
On Wed Mar 18 06:33:49 EDT 2009, mattmob...@proweb.co.uk wrote:
> Using rc in werc neutralizes OS differences to a certain degree,
> obviously some things catch one out, such as this one. (and just wait
> until a \0 comes along!)

this is an easy problem to solve:

	tr '\0' '☺'

- erik
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
2009/3/18 erik quanstrom :
> the total cost is O(maximum token length) for the
> whole input. how could this be a problem?

well, if there's only one token (e.g. when ifs=''), it's
actually O(n^2), assuming that realloc copies every time.

but your first argument is sufficient. i acquiesce.
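The trade-off in this exchange can be put in numbers. The following is a minimal sketch in C, not code from rc: it assumes, as the message above does, that every realloc copies the whole buffer, and it counts the bytes copied while a single token grows to n bytes. The name copied, the constant STEP, and the choice to start the multiplicative policy at 100 bytes are assumptions made for illustration only.

```c
#include <assert.h>

enum { STEP = 100 };	/* the 100-byte increment used in the diff */

/*
 * Total bytes copied while a buffer grows from empty to at least n bytes,
 * under the pessimistic assumption that every realloc copies the old
 * contents.  geometric == 0 grows by STEP bytes each time (the additive
 * policy); geometric != 0 grows by 50% each time (the suggested policy).
 */
unsigned long long
copied(unsigned long long n, int geometric)
{
	unsigned long long cap = 0, total = 0;

	while(cap < n){
		total += cap;		/* old contents copied on this realloc */
		if(!geometric)
			cap += STEP;
		else if(cap < STEP)
			cap = STEP;	/* first allocation */
		else
			cap += cap/2;
	}
	assert(cap >= n);
	return total;
}
```

For a token of 1000 bytes the additive policy copies 4500 bytes; for a single token of 1000000 bytes it copies roughly 5e9, while the 50% policy stays within a small multiple of n. For tokens of 100 bytes or fewer neither policy copies anything, which is the common case the additive argument rests on.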
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
On Wed Mar 18 09:54:54 EDT 2009, rogpe...@gmail.com wrote:
> 2009/3/18 erik quanstrom :
> > - ewd = wd+l+100-1;
>
> one small comment, based on a totally superficial scan of that diff:
> might it not be better to grow the buffer by some multiplicative
> factor, to avoid linear behaviour when reading large files?
> i often (for no particularly good reason) use 50% as a growth
> factor - it doesn't seem as radical as *2, but will still work ok
> in the long run.

i have two arguments against doing exponential growth:

- other dynamically allocated buffers in rc are allocated in
  increments of 100 bytes.
- the linear behavior would only be for long *tokens*. the length of
  the input is irrelevant. only in the case of tokens >= 100 chars
  would there be a second call to realloc.

the total cost is O(maximum token length) for the whole input.
how could this be a problem?

- erik
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
2009/3/18 erik quanstrom :
> - ewd = wd+l+100-1;

one small comment, based on a totally superficial scan of that diff:
might it not be better to grow the buffer by some multiplicative
factor, to avoid linear behaviour when reading large files?
i often (for no particularly good reason) use 50% as a growth
factor - it doesn't seem as radical as *2, but will still work ok
in the long run.

i've probably misread the code though...
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
> 2009/3/17 erik quanstrom :
> > it is unreasonable to expect to be able to generate tokens
> > that are bigger than 8k.
>
> i'm not sure i agree. they're not just tokens, they're strings,
> and there are lots of reasons why one might wish to
> have a string longer than 8k read from a file. i've certainly done so
> in inferno's sh, which doesn't have this restriction.

you win. couple of notes -

* same changes to haven't fork, omitted for clarity
* erealloc should be in subr.c and declared in rc.h and should be
  supported by Realloc in (plan9 unix win32)^.c
* there are two other calls to realloc that should be addressed, too.
* the if guarding efree prevents a "free 0" whine.

havefork.c:67,81 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:67,72
  		}
  	}
  
- char*
- erealloc(char *p, long n)
- {
- 	p = realloc(p, n);		/* botch, should be Realloc */
- 	if(p==0)
- 		panic("Can't realloc %d bytes\n", n);
- 	return p;
- }
- 
  /*
   * Who should wait for the exit from the fork?
   */
havefork.c:82,89 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:73,81
  void
  Xbackq(void)
  {
- 	int c, l;
- 	char *s, *wd, *ewd, *stop;
+ 	char wd[8193];
+ 	int c;
+ 	char *s, *ewd=&wd[8192], *stop;
  	struct io *f;
  	var *ifs = vlook("ifs");
  	word *v, *nextv;
havefork.c:108,127 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:100,115
  	default:
  		close(pfd[PWR]);
  		f = openfd(pfd[PRD]);
- 		s = wd = ewd = 0;
+ 		s = wd;
  		v = 0;
  		while((c = rchr(f))!=EOF){
- 			if(s==ewd){
- 				l = s-wd;
- 				wd = erealloc(wd, l+100);
- 				ewd = wd+l+100-1;
- 				s = wd+l;
+ 			if(strchr(stop, c) || s==ewd){
+ 				if(s!=wd){
+ 					*s='\0';
+ 					v = newword(wd, v);
+ 					s = wd;
+ 				}
  			}
- 			if(strchr(stop, c) && s!=wd){
- 				*s='\0';
- 				v = newword(wd, v);
- 				s = wd;
- 			}
  			else *s++=c;
  		}
  		if(s!=wd){
havefork.c:128,135 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:116,121
  			*s='\0';
  			v = newword(wd, v);
  		}
- 		if(wd)
- 			efree(wd);
  		closeio(f);
  		Waitfor(pid, 0);
  		/* v points to reversed arglist -- reverse it onto argv */

- erik
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
But parsing is not the big issue (thanks Charles for sending me a
small program that does just that), the issue here is sticking the
results in a variable (or variables). Russ' suggestion would work, but
only in native Plan 9 and not in p9p.

Still, I might be able to hack something up if characters don't get
deleted, which seems like a real bug to me.

But while I don't like arbitrary limits (especially when I hit them ;)),
I can understand that for the sake of simplicity it makes sense to
have them and not fall into the 'let's handle every possibility under
the sun' dogma.

Which makes me wonder, would it be excessive to double the current
limit? While 8k is quite ample, 16k would be even more so :)

Thanks everyone for all the ideas and suggestions

uriel

On Wed, Mar 18, 2009 at 2:25 AM, erik quanstrom wrote:
> On Tue Mar 17 20:29:50 EDT 2009, urie...@gmail.com wrote:
>> > why can't you just let ifs = $newline (formatted to fit your screen) ?
>>
>> Unfortunately that doesn't work in this case, my input is HTTP post
>> data, which is a single line of URL-encoded text which I have to
>> decode into multiple parameters of arbitrary length.
>
> why not write a small program to crack the post data.
> might take ½ an hour, tops.
>
> - erik
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
2009/3/17 erik quanstrom :
> it is unreasonable to expect to be able to generate tokens
> that are bigger than 8k.

i'm not sure i agree. they're not just tokens, they're strings,
and there are lots of reasons why one might wish to
have a string longer than 8k read from a file. i've certainly done so
in inferno's sh, which doesn't have this restriction.
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
not every environment is Plan 9 so /env is not an option.

Arbitrary limits seem a bit, well, arbitrary ! (not that I'm
complaining). With a flat memory address space and Gbs of memory
chucking a realloc in there is not totally out of technical bounds.

Using rc in werc neutralizes OS differences to a certain degree,
obviously some things catch one out, such as this one. (and just wait
until a \0 comes along!)

In this case it might make sense to inspect Content-Length and
Content-Type and awk it with FS="&" to individual files and then
inspect their size

And then someone will want to upload Mime !

On Tue, Mar 17, 2009 at 5:26 PM, Uriel wrote:

Unfortunately that doesn't work in this case, my input is HTTP post
data, which is a single line of URL-encoded text which I have to
decode into multiple parameters of arbitrary length.

writing a shell script doesn't mean you have to
write everything in the shell. why not write a
simple c program that reads stdin, decodes the
key=value arguments, and writes each "value" to
/env/form_key?

russ
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
Hello

http://plan9.aichi-u.ac.jp/pegasus/cgitools/formparse.html

It works well for me, seems that Kenji Arisawa thought about this when
developing his http tools pegasus and rit.

Take a look at it, maybe it helps you.

Slds.

Gabi

On 18/03/09 2:23, "Russ Cox" wrote:
> On Tue, Mar 17, 2009 at 5:26 PM, Uriel wrote:
>> Unfortunately that doesn't work in this case, my input is HTTP post
>> data, which is a single line of URL-encoded text which I have to
>> decode into multiple parameters of arbitrary length.
>
> writing a shell script doesn't mean you have to
> write everything in the shell. why not write a
> simple c program that reads stdin, decodes the
> key=value arguments, and writes each "value" to
> /env/form_key?
>
> russ
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
On Tue Mar 17 20:29:50 EDT 2009, urie...@gmail.com wrote:
> > why can't you just let ifs = $newline (formatted to fit your screen) ?
>
> Unfortunately that doesn't work in this case, my input is HTTP post
> data, which is a single line of URL-encoded text which I have to
> decode into multiple parameters of arbitrary length.

why not write a small program to crack the post data.
might take ½ an hour, tops.

- erik
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
On Tue, Mar 17, 2009 at 5:26 PM, Uriel wrote:
> Unfortunately that doesn't work in this case, my input is HTTP post
> data, which is a single line of URL-encoded text which I have to
> decode into multiple parameters of arbitrary length.

writing a shell script doesn't mean you have to
write everything in the shell. why not write a
simple c program that reads stdin, decodes the
key=value arguments, and writes each "value" to
/env/form_key?

russ
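The decoding step suggested here is small. What follows is a rough sketch in C of that core only, under stated assumptions: the input is an application/x-www-form-urlencoded body already held in a writable buffer, and the names hexval, urldecode and nextpair are invented for this illustration and come from no Plan 9 library. Reading stdin and writing each value somewhere (a file under /env on Plan 9, or an ordinary file elsewhere) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of hex digit c, or -1 if c is not a hex digit. */
int
hexval(int c)
{
	if(c >= '0' && c <= '9')
		return c - '0';
	if(c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if(c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode s in place: '+' becomes a space, %XX becomes the byte XX.
 * Malformed escapes are copied through unchanged.  Returns the decoded
 * length, which matters if the data contains %00.
 */
size_t
urldecode(char *s)
{
	char *r = s, *w = s;

	assert(s != NULL);
	while(*r != '\0'){
		if(*r == '+'){
			*w++ = ' ';
			r++;
		}else if(*r == '%' && hexval((unsigned char)r[1]) >= 0
		    && hexval((unsigned char)r[2]) >= 0){
			*w++ = (char)(hexval((unsigned char)r[1])*16
				+ hexval((unsigned char)r[2]));
			r += 3;
		}else
			*w++ = *r++;
	}
	*w = '\0';
	return (size_t)(w - s);
}

/*
 * Cut the next key=value pair off *body, decode both halves in place
 * and return them through key and val.  A pair without '=' yields an
 * empty value.  Returns 0 when the input is exhausted, 1 otherwise.
 */
int
nextpair(char **body, char **key, char **val)
{
	char *p = *body, *amp, *eq;

	if(p == NULL || *p == '\0')
		return 0;
	amp = strchr(p, '&');
	if(amp != NULL){
		*amp = '\0';
		*body = amp + 1;
	}else
		*body = NULL;
	eq = strchr(p, '=');
	if(eq != NULL){
		*eq = '\0';
		*val = eq + 1;
	}else
		*val = p + strlen(p);
	*key = p;
	urldecode(*key);
	urldecode(*val);
	return 1;
}
```

A caller would loop on nextpair until it returns 0 and write each value out under a name derived from the key. Because values never pass through `{}, the 8k word limit does not come into play.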
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
> why can't you just let ifs = $newline (formatted to fit your screen) ?

Unfortunately that doesn't work in this case, my input is HTTP post
data, which is a single line of URL-encoded text which I have to
decode into multiple parameters of arbitrary length.

Still, if no characters were getting lost, I probably can figure some
way to work around the issue and stitch things together after they get
split.

uriel
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
> >> Right now having the output of `{} corrupted can be quite inconvenient...
> >
> > it is unreasonable to expect to be able to generate tokens
> > that are bigger than 8k.
>
> Well, I would prefer if such limit didn't exist ;) But it doesn't seem
> like a totally unreasonable limit either.

why can't you just let ifs = $newline (formatted to fit your screen) ?

- erik
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
> in your test, try this
>
> echo $#x

I tried that too, I'm getting the same result for an ifs of '' or ().

% ifs=() {x=`{cat f}; echo $#x}
2
% ifs='' {x=`{cat f}; echo $#x}
2

Am I doing something else wrong?

uriel
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
On Tue, Mar 17, 2009 at 11:43 PM, erik quanstrom wrote:
> On Tue Mar 17 18:29:14 EDT 2009, urie...@gmail.com wrote:
>> Thanks martin for your analysis, this makes some sense to me, but as I
>> pointed out, even setting ifs to () doesn't solve the issue, so it
>> would be nice to find a solution to this.
>>
>> Right now having the output of `{} corrupted can be quite inconvenient...
>
> it is unreasonable to expect to be able to generate tokens
> that are bigger than 8k.

Well, I would prefer if such limit didn't exist ;) But it doesn't seem
like a totally unreasonable limit either.

> however, the '8' should not be dropped.

Yes, this is the critical issue, at least if the tokens are just
split, one can join them up by hand if needed, but as things are now
the data gets corrupted in ways that at least at first are mystifying,
and which are hard to work around.

> i would think this small change would be worth
> consideration.

I will give it a try when I get a chance, but if it fixes the lost
chars, I'll be happy.

Thanks!

uriel

> ; diffy -c havefork.c
> /n/dump/2009/0317/sys/src/cmd/rc/havefork.c:74,80 - havefork.c:74,80
>   Xbackq(void)
>   {
>   	char wd[8193];
> - 	int c;
> + 	int c, trunc;
>   	char *s, *ewd=&wd[8192], *stop;
>   	struct io *f;
>   	var *ifs = vlook("ifs");
> /n/dump/2009/0317/sys/src/cmd/rc/havefork.c:105,113 - havefork.c:105,116
>   		while((c = rchr(f))!=EOF){
>   			if(strchr(stop, c) || s==ewd){
>   				if(s!=wd){
> + 					trunc = s == ewd;
>   					*s='\0';
>   					v = newword(wd, v);
>   					s = wd;
> + 					if(trunc)
> + 						*s++ = c;
>   				}
>   			}
>   			else *s++=c;
>
> - erik
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
On Tue Mar 17 18:29:14 EDT 2009, urie...@gmail.com wrote:
> Thanks martin for your analysis, this makes some sense to me, but as I
> pointed out, even setting ifs to () doesn't solve the issue, so it
> would be nice to find a solution to this.
>
> Right now having the output of `{} corrupted can be quite inconvenient...

it is unreasonable to expect to be able to generate tokens
that are bigger than 8k. however, the '8' should not be dropped.

i would think this small change would be worth
consideration.

; diffy -c havefork.c
/n/dump/2009/0317/sys/src/cmd/rc/havefork.c:74,80 - havefork.c:74,80
  Xbackq(void)
  {
  	char wd[8193];
- 	int c;
+ 	int c, trunc;
  	char *s, *ewd=&wd[8192], *stop;
  	struct io *f;
  	var *ifs = vlook("ifs");
/n/dump/2009/0317/sys/src/cmd/rc/havefork.c:105,113 - havefork.c:105,116
  		while((c = rchr(f))!=EOF){
  			if(strchr(stop, c) || s==ewd){
  				if(s!=wd){
+ 					trunc = s == ewd;
  					*s='\0';
  					v = newword(wd, v);
  					s = wd;
+ 					if(trunc)
+ 						*s++ = c;
  				}
  			}
  			else *s++=c;

- erik
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
Thanks martin for your analysis, this makes some sense to me, but as I
pointed out, even setting ifs to () doesn't solve the issue, so it
would be nice to find a solution to this.

Right now having the output of `{} corrupted can be quite inconvenient...

Thanks

uriel

On Tue, Mar 17, 2009 at 3:01 AM, Martin Neubauer wrote:
> On second thought (and in the light of Geoff's reply) I probably won't.
> If you do care, the following change to the loop in question will at
> least preserve all input:
>
> 	while((c = rchr(f))!=EOF){
> 		if(strchr(stop, c)){
> 			if(s!=wd){
> 				*s='\0';
> 				v = newword(wd, v);
> 				s = wd;
> 			}
> 		}
> 		else if(s==ewd){
> 			*s='\0';
> 			v = newword(wd, v);
> 			s = wd;
> 			*s++=c;
> 		}
> 		else *s++=c;
> 	}
>
> With a dynamic buffer the tokenisation could be prevented, but in your
> example the lexical scanner would quite likely bail afterwards. (I
> remember a discussion some time ago about this.)
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
On Tue Mar 17 18:17:53 EDT 2009, urie...@gmail.com wrote:
> Thanks Geoff for the prompt explanation, but I'm getting the same
> results with ifs=() Not sure why, but I'm not sure I understand the
> difference between setting ifs to '' and ().

in your test, try this

	echo $#x

- erik
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
Thanks Geoff for the prompt explanation, but I'm getting the same
results with ifs=() Not sure why, but I'm not sure I understand the
difference between setting ifs to '' and ().

Thanks again

uriel

On Tue, Mar 17, 2009 at 1:40 AM, wrote:
> Setting ifs='' defeats rc's tokenisation, so the result
> of `{} will be a series of rc `words', each limited to
> Wordmax (8192) bytes and with the next byte of the input
> stream after each word set to NUL.
>
> Did you perhaps intend to write ifs=(), which has different
> meaning?
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
On second thought (and in the light of Geoff's reply) I probably won't.
If you do care, the following change to the loop in question will at
least preserve all input:

	while((c = rchr(f))!=EOF){
		if(strchr(stop, c)){
			if(s!=wd){
				*s='\0';
				v = newword(wd, v);
				s = wd;
			}
		}
		else if(s==ewd){
			*s='\0';
			v = newword(wd, v);
			s = wd;
			*s++=c;
		}
		else *s++=c;
	}

With a dynamic buffer the tokenisation could be prevented, but in your
example the lexical scanner would quite likely bail afterwards. (I
remember a discussion some time ago about this.)
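The effect of this change can be tried outside rc. The following is a self-contained sketch in C, not rc source: it replaces rchr with a plain string, replaces newword with appending the word plus a '|' separator to an output buffer, and shrinks the 8192-byte limit to 8 so the behaviour shows up on short input. WMAX, emit and split are hypothetical names. With keep set to 0 the loop behaves like the original Xbackq loop; with keep non-zero it behaves like the loop in the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { WMAX = 8 };	/* hypothetical stand-in for rc's 8192-byte limit */

/* Append the n-byte word wd and a '|' to out; return the new length. */
size_t
emit(char *out, size_t len, size_t cap, const char *wd, size_t n)
{
	assert(len + n + 2 <= cap);	/* room for word, '|' and NUL */
	memcpy(out + len, wd, n);
	out[len + n] = '|';
	out[len + n + 1] = '\0';
	return len + n + 1;
}

/*
 * Split in into words of at most WMAX bytes, breaking on any byte in stop.
 * keep == 0: the byte that arrives while the buffer is full is discarded,
 * as in the original loop.  keep != 0: that byte starts the next word.
 * Returns the length of the '|'-separated result left in out.
 */
size_t
split(const char *in, const char *stop, int keep, char *out, size_t cap)
{
	char wd[WMAX + 1];
	char *s = wd, *ewd = &wd[WMAX];
	size_t len = 0;
	int c;

	assert(cap > 0);
	out[0] = '\0';
	for(; (c = (unsigned char)*in) != '\0'; in++){
		if(strchr(stop, c) != NULL){
			if(s != wd){
				len = emit(out, len, cap, wd, (size_t)(s - wd));
				s = wd;
			}
		}else if(s == ewd){
			len = emit(out, len, cap, wd, (size_t)(s - wd));
			s = wd;
			if(keep)
				*s++ = (char)c;	/* the byte the original loop loses */
		}else
			*s++ = (char)c;
	}
	if(s != wd)
		len = emit(out, len, cap, wd, (size_t)(s - wd));
	return len;
}
```

With an empty stop set and the input 0123456789abcdefghij, keep == 0 yields 01234567|9abcdefg|ij| (the '8' and the 'h' are gone), while keep != 0 yields 01234567|89abcdef|ghij|. The input is still cut into WMAX-byte words either way, so joining them back together remains the caller's job.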
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
Hi,

I think the following gives a clue:

% cmp f f2
f f2 differ: char 8193

The following snippet from the Xbackq code seems to be the culprit:

	char wd[8193];
	int c;
	char *s, *ewd=&wd[8192], *stop;
	...
	while((c = rchr(f))!=EOF){
		if(strchr(stop, c) || s==ewd){
			if(s!=wd){
				*s='\0';
				v = newword(wd, v);
				s = wd;
			}
		}
		else *s++=c;
	}

Keeping the loop from dropping characters is trivial. Getting rid of the
inserted space probably requires a dynamic buffer. I might give it a shot.

Regards,
Martin

* Uriel (urie...@gmail.com) wrote:
> At first I thought very big rc variables seem to become strangely corrupted.
>
> % for(i in `{seq 1000}) { echo 0123456789 >> f }
> % ifs='' {x=`{cat f}}
> % echo -n $x > f2
> % diff f f2
> 745c745
> < 0123456789
> ---
> > 01234567 9
>
> But the bug seems to be in `{ } because replacing the use of the x var
> with simply:
>
> % ifs='' { echo -n `{cat f} > f2}
>
> Produces the same results.
>
> Longer strings get more random(?) characters 'blanked'.
>
> The results are identical in p9p and native plan9.
>
> I looked a bit around the rc source that seemed relevant, but didn't
> see any obvious errors, but I don't fully understand the code.
>
> Peace
>
> uriel
Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
Setting ifs='' defeats rc's tokenisation, so the result of `{} will be a series of rc `words', each limited to Wordmax (8192) bytes and with the next byte of the input stream after each word set to NUL. Did you perhaps intend to write ifs=(), which has different meaning?
[9fans] Strange rc bug for the 9fans bug-squashing squad
At first I thought very big rc variables seem to become strangely corrupted.

% for(i in `{seq 1000}) { echo 0123456789 >> f }
% ifs='' {x=`{cat f}}
% echo -n $x > f2
% diff f f2
745c745
< 0123456789
---
> 01234567 9

But the bug seems to be in `{ } because replacing the use of the x var
with simply:

% ifs='' { echo -n `{cat f} > f2}

Produces the same results.

Longer strings get more random(?) characters 'blanked'.

The results are identical in p9p and native plan9.

I looked a bit around the rc source that seemed relevant, but didn't
see any obvious errors, but I don't fully understand the code.

Peace

uriel