date:20130115

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15 Thread Alex Shinn

On Tue, Jan 15, 2013 at 3:03 PM, Ivan Raikov <ivan.g.rai...@gmail.com> wrote:


 Percent-encoded sequences of more than one octet will not get touched by
 pct-decode in the current implementation, so you will not get double
 escaping. Percent-encoded sequences of one octet will get decoded if they
 fall in the unstructured char-set, as per RFC 3986.


OK, now I'm thoroughly confused.  The percent-encoding is context sensitive?
How can this not be broken?

We need to make the design clear:

  * What can be constructed directly with make-uri.
  * What can be parsed, and how this is passed to make-uri.
  * How URIs are represented internally.
  * How URIs are encoded on output.

It sounds like uri-common and uri-generic are doing different things here.

-- 
Alex
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15 Thread Peter Bex

On Tue, Jan 15, 2013 at 06:07:06PM +0900, Alex Shinn wrote:
 On Tue, Jan 15, 2013 at 3:03 PM, Ivan Raikov <ivan.g.rai...@gmail.com> wrote:
 
 
  Percent-encoded sequences of more than one octet will not get touched by
  pct-decode in the current implementation, so you will not get double
  escaping. Percent-encoded sequences of one octet will get decoded if they
  fall in the unstructured char-set, as per RFC 3986.
 
 
 OK, now I'm thoroughly confused.  The percent-encoding is context sensitive?
 How can this not be broken?
 
 We need to make the design clear:
 
   * What can be constructed directly with make-uri.
   * What can be parsed, and how this is passed to make-uri.
   * How URIs are represented internally.
   * How URIs are encoded on output.
 
 It sounds like uri-common and uri-generic are doing different things here.

uri-generic is agnostic about specific encodings and types.
uri-common is designed to make life simpler in the case of common URIs
like HTTP where we know what types of characters are to be decoded.

RFC3986 special characters cannot be decoded unless we know they have
no special meaning.  uri-common just decodes everything fully because
there is generally no deeper nested encoding involved.  It's also smart
enough to know that port 80 belongs to http, so it can be omitted,
whereas uri-generic can't make such assumptions.

uri-common also makes the assumption that query args are
x-www-form-urlencoded.  This is the main reason to prefer it for web
programming; uri-generic doesn't know about form-encoding because that
is really only used in the context of HTML (strictly speaking, it's not
even an HTTP thing), so this messy stuff should stay out of the generic
URI library.

Yes, the web is evil and must die.

Cheers,
Peter
-- 
http://sjamaan.ath.cx

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15 Thread Peter Bex

On Tue, Jan 15, 2013 at 07:30:07PM +0900, Alex Shinn wrote:
 Right, I'm familiar with the evil standards :)  I'm also hoping that we can
 have some basic compatibility between Chicken's uri module and Chibi's
 (and whatever R7RS WG2 comes up with).

That would be nice indeed.

 It seems to me the sane thing to do is represent URIs unencoded
 internally, which can be generated directly with make-uri or decoded
 on parsing.

That cannot be done in general.  If you decode something like %2F, that
will wreak havoc with path-structured URIs.  The same will happen with
other types of special characters; you need to be able to distinguish
between the special character as-is and encoded.

These special characters are called "reserved" in the BNF.  As you can
see, the question mark, equals sign and ampersand are in there.
For urlencoded query strings, these *cannot* be decoded, because
then you can't distinguish between

http://calc.example.com?bool-expr=x%26y%3D1
and
http://calc.example.com?bool-expr=x&y=1

The former should be decoded in uri-common to the alist
((bool-expr . "x&y=1")) and the latter to ((bool-expr . "x") (y . "1")).
By fully decoding all reserved characters in uri-generic, you drop
important information.

All unreserved characters are already fully decoded by uri-generic,
but this leaves the extra decoding of things like the ampersand above
inside the query string components after form-decoding to be done by
uri-common.
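
To make the ordering issue concrete, here is a small sketch using Python's
standard urllib.parse as a stand-in. It is not CHICKEN code and not the
uri-common or uri-generic API; split_then_decode and decode_then_split are
made-up names, and the encoded query is assumed to end in %3D1. It compares
splitting on the raw delimiters before percent-decoding with decoding the
whole query string first:

```python
from urllib.parse import parse_qsl, unquote

def split_then_decode(query):
    """Split on raw '&' and '=' first, then percent-decode each piece."""
    return parse_qsl(query, keep_blank_values=True)

def decode_then_split(query):
    """Percent-decode the whole query first, then split (loses information)."""
    return parse_qsl(unquote(query), keep_blank_values=True)

encoded = "bool-expr=x%26y%3D1"  # '&' and '=' are escaped: one value
literal = "bool-expr=x&y=1"      # '&' and '=' are delimiters: two pairs

print(split_then_decode(encoded))  # [('bool-expr', 'x&y=1')]
print(split_then_decode(literal))  # [('bool-expr', 'x'), ('y', '1')]
print(decode_then_split(encoded))  # [('bool-expr', 'x'), ('y', '1')]
```

Once the reserved characters have been decoded up front, the two inputs are
indistinguishable, which is the information loss described above.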

 The decoding might be schema-specific, although
 really the only difference is the space-to-+ and query args encoding.

No, the conversion to a friendly alist is specific to uri-common.

 I was confused because the uri-generic change Ivan suggests
 seems to be putting encoded characters directly in the representation,
 whereas uri-common is encoding only on output.

I don't understand this either.  I'm at work, so maybe it's just due to
a lack of complete attention.

 [It also looks like the uri-common encoding is broken - why were bytes
 getting lost?]

Probably because it doesn't correctly deal with UTF-8 in the decoding of
URLencoded form data.  I'll need a proper test case and some time to
look into it.
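
One way multi-octet sequences can get mangled (an assumption about the
general failure mode, not a diagnosis of uri-common) is turning each %XX
octet into a character on its own instead of collecting the octets and
decoding them together as UTF-8. A sketch in Python rather than CHICKEN;
pct_decode_bytes is a hypothetical helper:

```python
import re

def pct_decode_bytes(s):
    """Replace every %XX with its octet and return the raw bytes."""
    # Non-escaped characters are assumed to be ASCII, as in a valid URI.
    return re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        s.encode("ascii"),
    )

raw = pct_decode_bytes("caf%C3%A9")        # two escapes, one character
together = raw.decode("utf-8")             # octets decoded as a unit
one_by_one = "".join(chr(b) for b in raw)  # each octet treated as a character

print(together)    # café
print(one_by_one)  # cafÃ©
```

The decoder only needs to produce octets; interpreting them as UTF-8 is a
separate step that has to see the whole sequence.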

 Finally, regarding parsing I still don't understand why %AB is decoded
 into the corresponding octet but %AB%CD is not?

Unsure.

Cheers,
Peter
-- 
http://sjamaan.ath.cx

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15 Thread Alex Shinn

On Tue, Jan 15, 2013 at 7:48 PM, Peter Bex peter@xs4all.nl wrote:


 These special characters are called "reserved" in the BNF.  As you can
 see, the question mark, equals sign and ampersand are in there.
 For urlencoded query strings, these *cannot* be decoded, because
 then you can't distinguish between

 http://calc.example.com?bool-expr=x%26y%3D1
 and
 http://calc.example.com?bool-expr=x&y=1

 The former should be decoded in uri-common to the alist
 ((bool-expr . "x&y=1")) and the latter to ((bool-expr . "x") (y . "1")).
 By fully decoding all reserved characters in uri-generic, you drop
 important information.


The internal representation is either decoded, or it is encoded.
Either can be made to work.

In this case, the decoded uri-common representation of the former is:

  ((bool-expr . "x&y=1"))

and the decoded representation of the latter is:

  ((bool-expr . "x") (y . "1"))

just as you say, so this is how they are stored in the URI object.

In uri-generic, both get parsed to:

  ((bool-expr . "x&y=1"))

As the RFC states:

   Because the percent (%) character serves as the indicator for
   percent-encoded octets, it must be percent-encoded as %25 for that
   octet to be used as data within a URI.

Therefore, if you intended the raw URI data to include a %,
then the correct representation (for either common or generic)
would have been:

  
  http://calc.example.com?bool-expr=x%2526y%253D1

So assuming & is _not_ special to the query (as is the case
with uri-generic), escaping & as %26 or not produces the
same result.
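
The %25 rule quoted from the RFC can be checked with a decoder that runs
exactly once. This is a sketch using Python's urllib.parse as a stand-in;
neither CHICKEN library is involved, and the trailing 1 in the data is
assumed:

```python
from urllib.parse import quote, unquote

data = "x%26y%3D1"                  # raw data that itself contains '%'
on_the_wire = quote(data, safe="")  # escape everything reserved, including '%'
print(on_the_wire)                  # x%2526y%253D1

# One decoding pass recovers the raw data with its percent signs intact.
assert unquote(on_the_wire) == data

# A second pass goes one level too far and exposes the delimiters.
print(unquote(unquote(on_the_wire)))  # x&y=1
```

So whether a value survives a round trip depends on the decoder being
applied exactly once per level of encoding.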

-- 
Alex

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15 Thread Peter Bex

On Wed, Jan 16, 2013 at 12:39:16AM +0900, Alex Shinn wrote:
 The internal representation is either decoded, or it is encoded.
 Either can be made to work.
 
 In this case, the decoded uri-common representation of the former is:
 
   ((bool-expr . "x&y=1"))
 
 and the decoded representation of the latter is:
 
   ((bool-expr . "x") (y . "1"))
 
 just as you say, so this is how they are stored in the URI object.
 
 In uri-generic, both get parsed to:
 
   ((bool-expr . "x&y=1"))

This cannot work because uri-common is re-using uri-generic's parser.
Also, uri-generic doesn't do alist-decoding at all, because form-encoding
is a HTML affair and has nothing to do with HTTP or URI standards.

 Therefore, if you intended the raw URI data to include a %,
 then the correct representation (for either common or generic)
 would have been:
 
   
   http://calc.example.com?bool-expr=x%2526y%253D1
 
 So assuming & is _not_ special to the query (as is the case
 with uri-generic), escaping & as %26 or not produces the
 same result.

If you can make it work for both libraries, feel free to do so, but
my energy to work on web stuff is very very low at the moment.

Cheers,
Peter
-- 
http://sjamaan.ath.cx

Re: [Chicken-users] Some questions about loading libraries

2013-01-15 Thread Felix

 I am developing a project which I expect will involve a number of extension
 libraries, or plugins (a large number, many of them provided by third
 parties, if my project ever becomes popular). For several reasons (which I
 will explain on request if anyone is curious), I feel it is best *not* to
 implement these libraries as eggs. So I am trying to work out a reasonable
 method for deploying the libraries and loading them at runtime.

Sure. What does "plugin" mean, though? A dynamically loadable library?
I assume it is compiled code? Written in Scheme?

 1) Is there a way to set an arbitrary search path, such that REQUIRE (or
 LOAD, or some such thing) will work when the library is in a non-standard
 location?

You could implement your own routine (probably based on load), which
scans a number of directories. repository-path is for eggs, and require
respects it. You can also add directories to ##sys#include-pathnames.
I think require will try this too.

 
 2) Is it possible to load a library selected at runtime (via an environment
 variable, config file, command-line argument ... the exact method isn't
 that important) AND have the symbols defined in that library included in a
 module?

You can compute arbitrary file-paths and pass them to load. To make
the symbols available, you have to make sure the import-libraries for
the modules are available. How do you use the symbols in the
libraries?  Do you evaluate code at runtime?

 
 3) Is there a way for a Chicken executable or library to determine its own
 location in the filesystem?

Hm. You can copy the C_path_to_executable(argv[ 0 ]) in chicken.h,
and use it via the foreign-function interface to get the pathname
of an executable. 

 
 4) Is LOAD-RELATIVE broken? I wrote some test code to try to use that
 procedure, but as far as I can tell it behaves just like LOAD, i.e. any
 relative path I give it is determined relative to the current directory
 from which the program is invoked. Though I would note that the
 documentation isn't entirely clear to me: "loads FILE relative to the path
 of the currently loaded file." ...? What does "the currently loaded file"
 mean? I will be happy to post my code if this is not a known issue.

It means that if loading is chained, nested load-relative calls will
load files relative to the outer load/load-relative invocations, e.g.

file a.scm:

...
(load "foo/b.scm")
...

file foo/b.scm:

...
(load-relative "c.scm") ; <- will load "foo/c.scm"
...

Perhaps, if you describe the implementation of your plugins and the
way the code therein is invoked, we can give more detailed advice.


cheers,
felix

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15 Thread Alex Shinn

On Wed, Jan 16, 2013 at 12:59 AM, Peter Bex peter@xs4all.nl wrote:

 On Wed, Jan 16, 2013 at 12:39:16AM +0900, Alex Shinn wrote:
  The internal representation is either decoded, or it is encoded.
  Either can be made to work.
 
  In this case, the decoded uri-common representation of the former is:
 
    ((bool-expr . "x&y=1"))
 
  and the decoded representation of the latter is:
 
    ((bool-expr . "x") (y . "1"))
 
  just as you say, so this is how they are stored in the URI object.
 
  In uri-generic, both get parsed to:
 
    ((bool-expr . "x&y=1"))

 This cannot work because uri-common is re-using uri-generic's parser.
 Also, uri-generic doesn't do alist-decoding at all, because form-encoding
 is a HTML affair and has nothing to do with HTTP or URI standards.


Ah, OK, there may be implementation details on why you
store encoded or decoded.

Anyway, this isn't really important.  I'm mostly concerned
with making utf8 do the right thing, and was wondering what
the API was because it's not clear from the docs.

Put another way, do uri-path and uri-query return the
encoded or decoded values (maybe differently for uri-common
and uri-generic)?

-- 
Alex