Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-18 Thread Gabriel Díaz López de la Llave
Hello

http://plan9.aichi-u.ac.jp/pegasus/cgitools/formparse.html

It works well for me; it seems that Kenji Arisawa thought about this when
developing his http tools pegasus and rit.

Take a look at it, maybe it helps you.

Regards.

Gabi



On 18/03/09 2:23, Russ Cox r...@swtch.com wrote:

 On Tue, Mar 17, 2009 at 5:26 PM, Uriel urie...@gmail.com wrote:
 Unfortunately that doesn't work in this case, my input is HTTP post
 data, which is a single line of URL-encoded text which I have to
 decode into multiple parameters of arbitrary length.
 
 writing a shell script doesn't mean you have to
 write everything in the shell.  why not write a
 simple c program that reads stdin, decodes the
 key=value arguments, and writes each value to
 /env/form_key?
 
 russ
 
 





Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-18 Thread maht

not every environment is Plan 9 so /env is not an option.

Arbitrary limits seem a bit, well, arbitrary! (Not that I'm complaining.)
With a flat memory address space and GBs of memory, chucking a realloc in
there is not totally out of technical bounds.


Using rc in werc neutralizes OS differences to a certain degree, 
obviously some things catch one out, such as this one. (and just wait 
until a \0 comes along!)


In this case it might make sense to inspect Content-Length and
Content-Type and awk it with FS=& to individual files and then inspect
their size.

And then someone will want to upload Mime !




On Tue, Mar 17, 2009 at 5:26 PM, Uriel urie...@gmail.com wrote:
  

Unfortunately that doesn't work in this case, my input is HTTP post
data, which is a single line of URL-encoded text which I have to
decode into multiple parameters of arbitrary length.



writing a shell script doesn't mean you have to
write everything in the shell.  why not write a
simple c program that reads stdin, decodes the
key=value arguments, and writes each value to
/env/form_key?

russ


  





Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-18 Thread roger peppe
2009/3/17 erik quanstrom quans...@quanstro.net:
 it is unreasonable to expect to be able to generate tokens
 that are bigger than 8k.

i'm not sure i agree. they're not just tokens, they're strings,
and there are lots of reasons why one might wish to
have a string longer than 8k read from a file. i've certainly done so
in inferno's sh, which doesn't have this restriction.



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-18 Thread Uriel
But parsing is not the big issue (thanks Charles for sending me a
small program that does just that), the issue here is sticking the
results in a variable (or variables), Russ' suggestion would work, but
only in native Plan 9 and not in p9p.

Still, I might be able to hack something up if characters don't get
deleted, which seems like a real bug to me.

But while I don't like arbitrary limits (especially when I hit them
;)), I can understand that for the sake of simplicity it makes sense
to have them and not fall into the 'let's handle every possibility
under the sun' dogma.

Which makes me wonder, would it be excessive to double the current
limit? While 8k is quite ample, 16k would be even more so :)

Thanks everyone for all the ideas and suggestions

uriel

On Wed, Mar 18, 2009 at 2:25 AM, erik quanstrom quans...@quanstro.net wrote:
 On Tue Mar 17 20:29:50 EDT 2009, urie...@gmail.com wrote:
  why can't you just let ifs = $newline (formatted to fit your screen) ?

 Unfortunately that doesn't work in this case, my input is HTTP post
 data, which is a single line of URL-encoded text which I have to
 decode into multiple parameters of arbitrary length.

 why not write a small program to crack the post data.
 might take ½ an hour, tops.

 - erik





Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-18 Thread erik quanstrom
 2009/3/17 erik quanstrom quans...@quanstro.net:
  it is unreasonable to expect to be able to generate tokens
  that are bigger than 8k.
 
 i'm not sure i agree. they're not just tokens, they're strings,
 and there are lots of reasons why one might wish to
 have a string longer than 8k read from a file. i've certainly done so
 in inferno's sh, which doesn't have this restriction.

you win.

couple of notes -
* same changes to haventfork.c, omitted for clarity
* erealloc should be in subr.c and declared in rc.h
   and should be supported by Realloc in (plan9
   unix win32)^.c
* there are two other calls to realloc that should
   be addressed, too.
* the if guarding efree prevents a free 0 whine.

havefork.c:67,81 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:67,72
}
  }
  
- char*
- erealloc(char *p, long n)
- {
-   p = realloc(p, n);  /* botch, should be Realloc */
-   if(p==0)
-   panic("Can't realloc %d bytes\n", n);
-   return p;
- }
- 
  /*
   * Who should wait for the exit from the fork?
   */
havefork.c:82,89 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:73,81
  void
  Xbackq(void)
  {
-   int c, l;
-   char *s, *wd, *ewd, *stop;
+   char wd[8193];
+   int c;
+   char *s, *ewd=&wd[8192], *stop;
struct io *f;
var *ifs = vlook("ifs");
word *v, *nextv;
havefork.c:108,127 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:100,115
default:
close(pfd[PWR]);
f = openfd(pfd[PRD]);
-   s = wd = ewd = 0;
+   s = wd;
v = 0;
while((c = rchr(f))!=EOF){
-   if(s==ewd){
-   l = s-wd;
-   wd = erealloc(wd, l+100);
-   ewd = wd+l+100-1;
-   s = wd+l;
+   if(strchr(stop, c) || s==ewd){
+   if(s!=wd){
+   *s='\0';
+   v = newword(wd, v);
+   s = wd;
+   }
}
-   if(strchr(stop, c) && s!=wd){
-   *s='\0';
-   v = newword(wd, v);
-   s = wd;
-   }
else *s++=c;
}
if(s!=wd){
havefork.c:128,135 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:116,121
*s='\0';
v = newword(wd, v);
}
-   if(wd)
-   efree(wd);
closeio(f);
Waitfor(pid, 0);
/* v points to reversed arglist -- reverse it onto argv */


- erik
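
The dynamic-buffer idea in the patch above can be tried outside rc. What follows is only a sketch under stated assumptions, not the rc code: `nexttoken` is a hypothetical helper, the input is an in-memory string rather than rc's `struct io`, and the buffer is tracked by length and capacity instead of the `s`/`ewd` pointers. It keeps the 100-byte increment from the patch, so a token of any length comes back in one piece.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { Incr = 100 };	/* growth step, matching the patch's l+100 */

/* realloc that gives up on failure (stand-in for rc's erealloc/panic) */
static char*
erealloc(char *p, size_t n)
{
	p = realloc(p, n);
	if(p == NULL){
		fprintf(stderr, "can't realloc %lu bytes\n", (unsigned long)n);
		exit(1);
	}
	return p;
}

/*
 * Return a malloc'd copy of the next token in *inp, where tokens are
 * separated by any character in stop, or NULL when the input is used up.
 * *inp is advanced past the token.  The caller frees the result.
 */
char*
nexttoken(const char **inp, const char *stop)
{
	const char *in = *inp;
	char *wd = NULL;
	size_t len = 0, cap = 0;
	int c;

	while((c = (unsigned char)*in) != '\0'){
		in++;
		if(strchr(stop, c)){
			if(len != 0)
				break;		/* end of a non-empty token */
			continue;		/* skip leading separators */
		}
		if(len + 1 >= cap){		/* need room for c and the NUL */
			cap += Incr;
			wd = erealloc(wd, cap);
		}
		wd[len++] = c;
	}
	*inp = in;
	if(len == 0)
		return NULL;			/* nothing was allocated */
	wd[len] = '\0';
	return wd;
}
```

With an empty `stop` string (the `ifs=''` case) the whole input is returned as a single token, however long it is.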



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-18 Thread roger peppe
2009/3/18 erik quanstrom quans...@quanstro.net:
 -                               ewd = wd+l+100-1;

one small comment, based on a totally superficial scan of that diff:
might it not be better to grow the buffer by some multiplicative
factor, to avoid linear behaviour when reading large files?
i often (for no particularly good reason) use 50% as a growth
factor - it doesn't seem as radical as *2, but will still work ok
in the long run.

i've probably misread the code though...



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-18 Thread erik quanstrom
On Wed Mar 18 09:54:54 EDT 2009, rogpe...@gmail.com wrote:
 2009/3/18 erik quanstrom quans...@quanstro.net:
  -                               ewd = wd+l+100-1;
 
 one small comment, based on a totally superficial scan of that diff:
 might it not be better to grow the buffer by some multiplicative
 factor, to avoid linear behaviour when reading large files?
 i often (for no particularly good reason) use 50% as a growth
 factor - it doesn't seem as radical as *2, but will still work ok
 in the long run.

i have two arguments against doing exponential growth:
- other dynamically allocated buffers in rc are allocated
in increments of 100 bytes.

- the linear behavior would only be for long *tokens*.
the length of the input is irrelevant.  only in the case
of tokens >= 100 chars would there be a second call
to realloc.

the total cost is O(maximum token length) for the
whole input.  how could this be a problem?

- erik



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-18 Thread roger peppe
2009/3/18 erik quanstrom quans...@coraid.com:
 the total cost is O(maximum token length) for the
 whole input.  how could this be a problem?

well, if there's only one token (e.g. when ifs=''), it's actually
O(n^2), assuming
that realloc copies every time.

but your first argument is sufficient. i acquiesce.
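
To put rough numbers on this exchange, here is a small sketch that counts how many bytes get copied while a buffer grows to hold one long token. It rests on the assumption stated above, that realloc copies every time, which a real allocator often avoids. `copycost` is a hypothetical name, and starting the 50% policy at 100 bytes is an arbitrary choice made for the comparison.

```c
#include <stdio.h>

/*
 * Worst-case bytes copied while growing a buffer until it can hold
 * n bytes, assuming every realloc moves the whole old buffer.
 * step > 0:  grow by a fixed step each time (rc-style, step = 100).
 * step == 0: start at 100 bytes, then grow by 50% each time.
 */
unsigned long
copycost(unsigned long n, unsigned long step)
{
	unsigned long cap = 0, copied = 0;

	while(cap < n){
		copied += cap;		/* old contents move to the new block */
		if(step > 0)
			cap += step;
		else if(cap == 0)
			cap = 100;
		else
			cap += cap/2;
	}
	return copied;
}
```

Under these assumptions a single 1000-byte token costs 4500 copied bytes with 100-byte steps and 2074 with 50% growth. The gap only becomes large for much longer tokens, which is consistent with both sides of the argument: the quadratic term is real, but it depends on token length, not input length.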



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-17 Thread Uriel
Thanks Geoff for the prompt explanation, but I'm getting the same
results with ifs=() Not sure why, but I'm not sure I understand the
difference between setting ifs to '' and ().

Thanks again

uriel

On Tue, Mar 17, 2009 at 1:40 AM,  ge...@plan9.bell-labs.com wrote:
 Setting ifs='' defeats rc's tokenisation, so the result
 of `{} will be a series of rc `words', each limited to
 Wordmax (8192) bytes and with the next byte of the input
 stream after each word set to NUL.

 Did you perhaps intend to write ifs=(), which has different
 meaning?





Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-17 Thread erik quanstrom
On Tue Mar 17 18:17:53 EDT 2009, urie...@gmail.com wrote:
 Thanks Geoff for the prompt explanation, but I'm getting the same
 results with ifs=() Not sure why, but I'm not sure I understand the
 difference between setting ifs to '' and ().

in your test, try this

echo $#x

- erik



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-17 Thread erik quanstrom
On Tue Mar 17 18:29:14 EDT 2009, urie...@gmail.com wrote:
 Thanks martin for your analysis, this makes some sense to me, but as I
 pointed out, even setting ifs to () doesn't solve the issue, so it
 would be nice to find a solution to this.
 
 Right now having the output of `{} corrupted can be quite inconvenient...

it is unreasonable to expect to be able to generate tokens
that are bigger than 8k.  however, the '8' should not
be dropped.  i would think this small change would be worth
consideration.

; diffy -c havefork.c
/n/dump/2009/0317/sys/src/cmd/rc/havefork.c:74,80 - havefork.c:74,80
  Xbackq(void)
  {
char wd[8193];
-   int c;
+   int c, trunc;
char *s, *ewd=&wd[8192], *stop;
struct io *f;
var *ifs = vlook("ifs");
/n/dump/2009/0317/sys/src/cmd/rc/havefork.c:105,113 - havefork.c:105,116
while((c = rchr(f))!=EOF){
if(strchr(stop, c) || s==ewd){
if(s!=wd){
+   trunc = s == ewd;
*s='\0';
v = newword(wd, v);
s = wd;
+   if(trunc)
+   *s++ = c;
}
}
else *s++=c;

- erik



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-17 Thread Uriel
On Tue, Mar 17, 2009 at 11:43 PM, erik quanstrom quans...@quanstro.net wrote:
 On Tue Mar 17 18:29:14 EDT 2009, urie...@gmail.com wrote:
 Thanks martin for your analysis, this makes some sense to me, but as I
 pointed out, even setting ifs to () doesn't solve the issue, so it
 would be nice to find a solution to this.

 Right now having the output of `{} corrupted can be quite inconvenient...

 it is unreasonable to expect to be able to generate tokens
 that are bigger than 8k.

Well, I would prefer if such limit didn't exist ;) But it doesn't seem
like a totally unreasonable limit either.

  however, the '8' should not be dropped.

Yes, this is the critical issue, at least if the tokens are just
split, one can join them up by hand if needed, but as things are now
the data gets corrupted in ways that at least at first are mystifying,
and which are hard to work around.

 i would think this small change would be worth
 consideration.

I will give it a try when I get a chance, but if it fixes the lost
chars, I'll be happy.

Thanks!

uriel

 ; diffy -c havefork.c
 /n/dump/2009/0317/sys/src/cmd/rc/havefork.c:74,80 - havefork.c:74,80
  Xbackq(void)
  {
        char wd[8193];
 -       int c;
 +       int c, trunc;
         char *s, *ewd=&wd[8192], *stop;
         struct io *f;
         var *ifs = vlook("ifs");
 /n/dump/2009/0317/sys/src/cmd/rc/havefork.c:105,113 - havefork.c:105,116
                while((c = rchr(f))!=EOF){
                        if(strchr(stop, c) || s==ewd){
                                if(s!=wd){
 +                                       trunc = s == ewd;
                                        *s='\0';
                                        v = newword(wd, v);
                                        s = wd;
 +                                       if(trunc)
 +                                               *s++ = c;
                                }
                        }
                        else *s++=c;

 - erik




Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-17 Thread Uriel
 in your test, try this

        echo $#x

I tried that too, I'm getting the same result for an ifs of '' or ().

% ifs=() {x=`{cat f}; echo $#x}
2
% ifs='' {x=`{cat f}; echo $#x}
2

Am I doing something else wrong?

uriel



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-17 Thread erik quanstrom
  Right now having the output of `{} corrupted can be quite inconvenient...
 
  it is unreasonable to expect to be able to generate tokens
  that are bigger than 8k.
 
 Well, I would prefer if such limit didn't exist ;) But it doesn't seem
 like a totally unreasonable limit either.

why can't you just let ifs = $newline (formatted to fit your screen) ?

- erik



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-17 Thread Uriel
 why can't you just let ifs = $newline (formatted to fit your screen) ?

Unfortunately that doesn't work in this case, my input is HTTP post
data, which is a single line of URL-encoded text which I have to
decode into multiple parameters of arbitrary length.

Still, if no characters were getting lost, I probably can figure some
way to work around the issue and stitch things together after they get
split.

uriel



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-17 Thread Russ Cox
On Tue, Mar 17, 2009 at 5:26 PM, Uriel urie...@gmail.com wrote:
 Unfortunately that doesn't work in this case, my input is HTTP post
 data, which is a single line of URL-encoded text which I have to
 decode into multiple parameters of arbitrary length.

writing a shell script doesn't mean you have to
write everything in the shell.  why not write a
simple c program that reads stdin, decodes the
key=value arguments, and writes each value to
/env/form_key?

russ
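
A minimal sketch of the decoding half of this suggestion, in portable C. The names `hexval`, `urldecode` and `parseform` are hypothetical, and the sketch stops at handing back decoded key/value pairs; writing each value to /env/form_key (or to a file on systems without /env) is left to the caller. The body is split on `&` and `=` before decoding, so an encoded `%26` or `%3D` inside a value is not mistaken for a separator.

```c
#include <stddef.h>
#include <string.h>

/* value of one hex digit, or -1 if c is not a hex digit */
static int
hexval(int c)
{
	if(c >= '0' && c <= '9') return c - '0';
	if(c >= 'a' && c <= 'f') return c - 'a' + 10;
	if(c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/*
 * Decode URL-encoded form text in place: '+' becomes a space and
 * %XX becomes the byte 0xXX.  Malformed escapes are copied unchanged.
 * Returns the decoded length; the result is NUL-terminated.
 */
size_t
urldecode(char *s)
{
	char *r = s, *w = s;
	int hi = 0, lo = 0;

	while(*r != '\0'){
		if(*r == '+'){
			*w++ = ' ';
			r++;
		}else if(*r == '%'
		    && (hi = hexval((unsigned char)r[1])) >= 0
		    && (lo = hexval((unsigned char)r[2])) >= 0){
			*w++ = (char)(hi<<4 | lo);
			r += 3;
		}else
			*w++ = *r++;
	}
	*w = '\0';
	return (size_t)(w - s);
}

/*
 * Split "k1=v1&k2=v2" in place, decode each part, and store pointers
 * to at most max pairs in keys[] and vals[].  Returns the pair count.
 */
int
parseform(char *body, char *keys[], char *vals[], int max)
{
	char *pair, *next, *eq;
	int n = 0;

	for(pair = body; pair != NULL && n < max; pair = next){
		next = strchr(pair, '&');
		if(next != NULL)
			*next++ = '\0';
		if(*pair == '\0')
			continue;		/* empty pair, as in "a=1&&b=2" */
		eq = strchr(pair, '=');
		if(eq != NULL)
			*eq++ = '\0';
		else
			eq = pair + strlen(pair);	/* key without a value */
		urldecode(pair);
		urldecode(eq);
		keys[n] = pair;
		vals[n] = eq;
		n++;
	}
	return n;
}
```

Because the pairs point into the caller's buffer, there is no per-value length limit beyond the size of the buffer that holds the POST body. A decoded `%00` ends the string early for callers that use `strlen`, so binary-safe callers should use the length returned by `urldecode`.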



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-17 Thread erik quanstrom
On Tue Mar 17 20:29:50 EDT 2009, urie...@gmail.com wrote:
  why can't you just let ifs = $newline (formatted to fit your screen) ?
 
 Unfortunately that doesn't work in this case, my input is HTTP post
 data, which is a single line of URL-encoded text which I have to
 decode into multiple parameters of arbitrary length.

why not write a small program to crack the post data.
might take ½ an hour, tops.

- erik



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-16 Thread geoff
Setting ifs='' defeats rc's tokenisation, so the result
of `{} will be a series of rc `words', each limited to
Wordmax (8192) bytes and with the next byte of the input
stream after each word set to NUL.

Did you perhaps intend to write ifs=(), which has different
meaning?



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-16 Thread Martin Neubauer
Hi,

I think the following gives a clue:

% cmp f f2
f f2 differ: char 8193

The following snippet from the Xbackq code seems to be the culprit:

char wd[8193];
int c;
char *s, *ewd=&wd[8192], *stop;

...

while((c = rchr(f))!=EOF){
if(strchr(stop, c) || s==ewd){
if(s!=wd){
*s='\0';
v = newword(wd, v);
s = wd;
}
}
else *s++=c;
}

Keeping the loop from dropping characters is trivial.  Getting rid of
the inserted space probably requires a dynamic buffer.  I might give
it a shot.
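
The dropped character is easy to see at small scale. This is an illustrative sketch, not the rc source: `splitwords` is a hypothetical harness around the same loop, `Wordmax` is shrunk from 8192 to 8, the input is an in-memory string, and each finished word is appended to `out` followed by `|` instead of going through `newword`.

```c
#include <string.h>

enum { Wordmax = 8 };	/* rc uses 8192; shrunk so the effect is visible */

/*
 * Run the quoted loop over in, splitting on the characters in stop.
 * Each word is appended to out followed by '|'.  out must be large
 * enough for the input plus one '|' per word.
 */
void
splitwords(const char *in, const char *stop, char *out)
{
	char wd[Wordmax+1];
	char *s = wd, *ewd = &wd[Wordmax];
	int c;

	*out = '\0';
	while((c = (unsigned char)*in++) != '\0'){
		if(strchr(stop, c) || s == ewd){
			if(s != wd){
				*s = '\0';
				strcat(out, wd);
				strcat(out, "|");
				s = wd;
			}
			/* if we got here because s==ewd, c is never stored */
		}
		else
			*s++ = c;
	}
	if(s != wd){
		*s = '\0';
		strcat(out, wd);
		strcat(out, "|");
	}
}
```

With an empty `stop` and the input `0123456789abcdefgh`, the words come out as `01234567|9abcdefg|`: the character that arrives when the buffer is full (`8`, then `h`) is consumed but never stored. In rc the words are later joined with spaces by echo, which is why the report shows a blank where the `8` used to be.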

Regards,
Martin

* Uriel (urie...@gmail.com) wrote:
 At first I thought very big rc variables seem to become strangely corrupted.
 
 % for(i in `{seq 1000}) { echo 0123456789 >> f }
 % ifs='' {x=`{cat f}}
 % echo -n $x > f2
 % diff f f2
 745c745
 < 0123456789
 ---
 > 01234567 9
 
 But the bug seems to be in `{ } because replacing the use of the x var
 with simply:
 
 % ifs='' { echo -n `{cat f} > f2}
 
 Produces the same results.
 
 Longer strings get more random(?) characters 'blanked'.
 
 The results are identical in p9p and native plan9.
 
 I looked a bit around the rc source that seemed relevant, but didn't
 see any obvious errors, but I don't fully understand the code.
 
 Peace
 
 uriel



Re: [9fans] Strange rc bug for the 9fans bug-squashing squad

2009-03-16 Thread Martin Neubauer
On second thought (and in the light of Geoff's reply) I probably won't.
If you do care, the following change to the loop in question will at
least preserve all input:

while((c = rchr(f))!=EOF){
if(strchr(stop, c)){
if(s!=wd){
*s='\0';
v = newword(wd, v);
s = wd;
}
}
else if(s==ewd){
*s='\0';
v = newword(wd, v);
s = wd;
*s++=c;
}
else *s++=c;
}

With a dynamic buffer the tokenisation could be prevented, but in your
example the lexical scanner would quite likely bail afterwards.  (I
remember a discussion some time ago about this.)
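
The effect of this change can be checked with a small stand-alone harness. As before this is a sketch under assumptions rather than rc itself: `splitkeep` and `emit` are hypothetical names, `Wordmax` is 8 instead of 8192, and words are appended to `out` with a `|` after each one in place of `newword`.

```c
#include <string.h>

enum { Wordmax = 8 };	/* rc uses 8192; shrunk for the demonstration */

/* append the finished word and a '|' separator to out */
static void
emit(char *out, char *wd, char *s)
{
	*s = '\0';
	strcat(out, wd);
	strcat(out, "|");
}

/*
 * The modified loop: a full buffer still ends the current word, but
 * the character that did not fit starts the next word instead of
 * being thrown away.  out must be large enough for the result.
 */
void
splitkeep(const char *in, const char *stop, char *out)
{
	char wd[Wordmax+1];
	char *s = wd, *ewd = &wd[Wordmax];
	int c;

	*out = '\0';
	while((c = (unsigned char)*in++) != '\0'){
		if(strchr(stop, c)){
			if(s != wd){
				emit(out, wd, s);
				s = wd;
			}
		}
		else if(s == ewd){
			emit(out, wd, s);
			s = wd;
			*s++ = c;	/* keep the overflowing character */
		}
		else
			*s++ = c;
	}
	if(s != wd)
		emit(out, wd, s);
}
```

For the input `0123456789abcdefgh` with an empty `stop`, the words are now `01234567|89abcdef|gh|`. The input is still split at the buffer size, but concatenating the words gives back every byte.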