Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
On Tue, Jan 15, 2013 at 3:03 PM, Ivan Raikov ivan.g.rai...@gmail.com wrote:

> Percent-encoded sequences of more than one octet will not get touched by pct-decode in the current implementation, so you will not get double escaping. Percent-encoded sequences of one octet will get decoded if they fall in the unstructured char-set, as per RFC 3986.

OK, now I'm thoroughly confused. The percent-encoding is context sensitive? How can this not be broken?

We need to make the design clear:

* What can be constructed directly with make-uri.
* What can be parsed, and how this is passed to make-uri.
* How URIs are represented internally.
* How URIs are encoded on output.

It sounds like uri-common and uri-generic are doing different things here.

--
Alex

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
On Tue, Jan 15, 2013 at 06:07:06PM +0900, Alex Shinn wrote:

> On Tue, Jan 15, 2013 at 3:03 PM, Ivan Raikov ivan.g.rai...@gmail.com wrote:
>
> > Percent-encoded sequences of more than one octet will not get touched by pct-decode in the current implementation, so you will not get double escaping. Percent-encoded sequences of one octet will get decoded if they fall in the unstructured char-set, as per RFC 3986.
>
> OK, now I'm thoroughly confused. The percent-encoding is context sensitive? How can this not be broken?
>
> We need to make the design clear:
>
> * What can be constructed directly with make-uri.
> * What can be parsed, and how this is passed to make-uri.
> * How URIs are represented internally.
> * How URIs are encoded on output.
>
> It sounds like uri-common and uri-generic are doing different things here.

uri-generic is agnostic about specific encodings and types. uri-common is designed to make life simpler in the case of common URIs like HTTP where we know what types of characters are to be decoded. RFC3986 special characters cannot be decoded unless we know they have no special meaning. uri-common just decodes everything fully because there is generally no deeper nested encoding involved. It's also smart enough to know that port 80 belongs to http, so it can be omitted, whereas uri-generic can't make such assumptions.

uri-common also makes the assumption that query args are x-www-form-urlencoded. This is the main reason to prefer it for web programming; uri-generic doesn't know about form-encoding because that is really only used in the context of HTML (it's strictly not even a HTTP thing), so this messy stuff should stay out of the generic URI library.

Yes, the web is evil and must die.

Cheers,
Peter
--
http://sjamaan.ath.cx
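The two conveniences described above (a scheme-specific default port and form-decoded query arguments) can be illustrated in a few lines. This is only a sketch in Python's standard library, not the uri-common API; the `DEFAULT_PORTS` table and the `summarize` helper are hypothetical names made up for the illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical table of scheme defaults, for illustration only.
DEFAULT_PORTS = {"http": 80, "https": 443}

def summarize(uri):
    """Return (host, effective port, form-decoded query pairs) for a URI."""
    parts = urlsplit(uri)
    # A scheme-aware layer can fill in the port when the URI omits it.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    # Form decoding: split on '&' and '=', then percent-decode each piece;
    # '+' is turned into a space, which is specific to form encoding.
    query = parse_qsl(parts.query, keep_blank_values=True)
    return parts.hostname, port, query

print(summarize("http://example.com/search?q=hello+world&lang=en"))
# ('example.com', 80, [('q', 'hello world'), ('lang', 'en')])
```

A generic URI layer has no basis for either step: it cannot know that 80 is the default for a given scheme, or that '+' means a space in the query.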
Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
On Tue, Jan 15, 2013 at 07:30:07PM +0900, Alex Shinn wrote:

> Right, I'm familiar with the evil standards :)
>
> I'm also hoping that we can have some basic compatibility between Chicken's uri module and Chibi's (and whatever R7RS WG2 comes up with).

That would be nice indeed.

> It seems to me the sane thing to do is represent URIs unencoded internally, which can be generated directly with make-uri or decoded on parsing.

That cannot be done in general. If you decode something like %2F, that will wreak havoc with path-structured URIs. The same will happen with other types of special characters; you need to be able to distinguish between the special character as-is and encoded.

These special characters are called "reserved" in the BNF. As you can see, the question mark, equals sign and ampersand are in there. For urlencoded query strings, these *cannot* be decoded, because then you can't distinguish between

  http://calc.example.com?bool-expr=x%26y%3D1

and

  http://calc.example.com?bool-expr=x&y=1

The former should be decoded in uri-common to the alist ((bool-expr . "x&y=1")) and the latter to ((bool-expr . "x") (y . "1")). By fully decoding all reserved characters in uri-generic, you drop important information.

All unreserved characters are already fully decoded by uri-generic, but this leaves the extra decoding of things like the ampersand above inside the query string components after form-decoding to be done by uri-common.

> The decoding might be schema-specific, although really the only difference is the space-to-+ and query args encoding.

No, the conversion to a friendly alist is specific to uri-common.

> I was confused because the uri-generic change Ivan suggests seems to be putting encoded characters directly in the representation, whereas uri-common is encoding only on output.

I don't understand this either. I'm at work, so maybe it's just due to a lack of complete attention.

> [It also looks like the uri-common encoding is broken - why were bytes getting lost?]

Probably because it doesn't correctly deal with UTF-8 in the decoding of URLencoded form data. I'll need a proper test case and some time to look into it.

> Finally, regarding parsing I still don't understand why %AB is decoded into the corresponding octet but %AB%CD is not?

Unsure.

Cheers,
Peter
--
http://sjamaan.ath.cx
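The ordering problem described in this message can be reproduced outside CHICKEN. The sketch below uses Python's standard library purely as an illustration; it assumes the two query strings under discussion are `bool-expr=x%26y%3D1` and `bool-expr=x&y=1`, and it does not show how uri-generic or uri-common are implemented.

```python
from urllib.parse import unquote, parse_qsl

encoded = "bool-expr=x%26y%3D1"   # '&' and '=' are data inside the value
literal = "bool-expr=x&y=1"       # '&' and '=' are delimiters

# Decoding reserved characters up front erases the difference.
assert unquote(encoded) == literal

# Splitting on the delimiters first and decoding each piece afterwards
# keeps the two cases apart.
print(parse_qsl(encoded))   # [('bool-expr', 'x&y=1')]
print(parse_qsl(literal))   # [('bool-expr', 'x'), ('y', '1')]

# The same applies to %2F in paths: split into segments, then decode.
path = "/a%2Fb/c"
segments = [unquote(segment) for segment in path.split("/")]
print(segments)                          # ['', 'a/b', 'c']
print(len(unquote(path).split("/")))     # 4, the segment boundary is lost
```

The point is only about ordering: whichever layer knows the delimiters of a component has to split it before reserved characters are decoded.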
Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
On Tue, Jan 15, 2013 at 7:48 PM, Peter Bex peter@xs4all.nl wrote:

> These special characters are called "reserved" in the BNF. As you can see, the question mark, equals sign and ampersand is in there. For query urlencoded query strings, these *cannot* be decoded, because then you can't distinguish between
>
>   http://calc.example.com?bool-expr=x%26y%3D1
>
> and
>
>   http://calc.example.com?bool-expr=x&y=1
>
> The former should be decoded in uri-common to the alist ((bool-expr . "x&y=1")) and the latter to ((bool-expr . "x") (y . "1")). By fully decoding all reserved characters in uri-generic, you drop important information.

The internal representation is either decoded, or it is encoded. Either can be made to work.

In this case, the decoded uri-common representation of the former is:

  ((bool-expr . "x&y=1"))

and the decoded representation of the latter is:

  ((bool-expr . "x") (y . "1"))

just as you say, so this is how they are stored in the URI object. In uri-generic, both get parsed to:

  ((bool-expr . "x&y=1"))

As the RFC states:

  Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.

Therefore, if you intended the raw URI data to include a %, then the correct representation (for either common or generic) would have been:

  http://calc.example.com?bool-expr=x%2526y%253D1

So assuming & is _not_ special to the query (as is the case with uri-generic), escaping with %25 or not produces the same result.

--
Alex
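The %25 rule quoted from the RFC can be checked mechanically. This is a small Python illustration, not CHICKEN code; the value `x%26y%3D1` is assumed to be the raw data from the example URI.

```python
from urllib.parse import quote, unquote

# Data that is itself meant to contain literal percent signs.
data = "x%26y%3D1"

# Encoding turns each '%' into '%25'; everything else here is unreserved.
encoded = quote(data, safe="")
print(encoded)   # x%2526y%253D1

# One round of decoding gives back the data with its percent signs intact.
assert unquote(encoded) == data

# Decoding a second time is a different operation and yields different data.
assert unquote(unquote(encoded)) == "x&y=1"
```

Whether a layer decodes once or twice is therefore observable, which is why it matters what the accessors are documented to return.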
Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
On Wed, Jan 16, 2013 at 12:39:16AM +0900, Alex Shinn wrote:

> The internal representation is either decoded, or it is encoded. Either can be made to work.
>
> In this case, the decoded uri-common representation of the former is:
>
>   ((bool-expr . "x&y=1"))
>
> and the decoded representation of the latter is:
>
>   ((bool-expr . "x") (y . "1"))
>
> just as you say, so this is how they are stored in the URI object. In uri-generic, both get parsed to:
>
>   ((bool-expr . "x&y=1"))

This cannot work because uri-common is re-using uri-generic's parser. Also, uri-generic doesn't do alist-decoding at all, because form-encoding is a HTML affair and has nothing to do with HTTP or URI standards.

> Therefore, if you intended the raw URI data to include a %, then the correct representation (for either common or generic) would have been:
>
>   http://calc.example.com?bool-expr=x%2526y%253D1
>
> So assuming & is _not_ special to the query (as is the case with uri-generic), escaping with %25 or not produces the same result.

If you can make it work for both libraries, feel free to do so, but my energy to work on web stuff is very very low at the moment.

Cheers,
Peter
--
http://sjamaan.ath.cx
Re: [Chicken-users] Some questions about loading libraries
> I am developing a project which I expect will involve a number of extension libraries, or plugins (a large number, many of them provided by third parties, if my project ever becomes popular). For several reasons (which I will explain on request if anyone is curious), I feel it is best *not* to implement these libraries as eggs. So I am trying to work out a reasonable method for deploying the libraries and loading them at runtime.

Sure. What does "plugin" mean, though? A dynamically loadable library? I assume it is compiled code? Written in Scheme?

> 1) Is there a way to set an arbitrary search path, such that REQUIRE (or LOAD, or some such thing) will work when the library is in a non-standard location?

You could implement your own routine (probably based on load), which scans a number of directories. repository-path is for eggs, and require respects it. You can also add directories to ##sys#include-pathnames. I think require will try this too.

> 2) Is it possible to load a library selected at runtime (via an environment variable, config file, command-line argument ... the exact method isn't that important) AND have the symbols defined in that library included in a module?

You can compute arbitrary file-paths and pass them to load. To make the symbols available, you have to make sure the import-libraries for the modules are available. How do you use the symbols in the libraries? Do you evaluate code at runtime?

> 3) Is there a way for a Chicken executable or library to determine its own location in the filesystem?

Hm. You can copy the C_path_to_executable(argv[ 0 ]) in chicken.h, and use it via the foreign-function interface to get the pathname of an executable.

> 4) Is LOAD-RELATIVE broken? I wrote some test code to try to use that procedure, but as far as I can tell it behaves just like LOAD, i.e. any relative path I give it is determined relative to the current directory from which the program is invoked.
>
> Though I would note that the documentation isn't entirely clear to me: "loads FILE relative to the path of the currently loaded file." ...? What does "the currently loaded file" mean? I will be happy to post my code if this is not a known issue.

It means that if loading is chained, nested load-relative calls will load files relative to the outer load/load-relative invocations, e.g.

  file a.scm:
    ...
    (load "foo/b.scm")
    ...

  file foo/b.scm:
    ...
    (load-relative "c.scm")   ; <- will load foo/c.scm
    ...

Perhaps, if you describe the implementation of your plugins and the way the code therein is invoked, we can give more detailed advice.

cheers,
felix
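The two path computations mentioned in this reply (scanning a list of directories, and resolving a name relative to the file currently being loaded) are sketched below in Python as a language-neutral illustration. The helper names `find_in_dirs` and `resolve_relative` are hypothetical and are not CHICKEN procedures; in CHICKEN the resulting path would simply be handed to load.

```python
import os
import posixpath
import tempfile

def find_in_dirs(filename, search_dirs):
    """Return the first existing path for filename in search_dirs, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None

def resolve_relative(current_file, relative):
    """Resolve relative against the directory of the file being loaded."""
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(current_file), relative))

# Demonstration with a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    plugin_dir = os.path.join(root, "plugins")
    os.mkdir(plugin_dir)
    open(os.path.join(plugin_dir, "hello.scm"), "w").close()
    found = find_in_dirs("hello.scm", [os.path.join(root, "missing"), plugin_dir])
    missing = find_in_dirs("absent.scm", [plugin_dir])

# Mirrors the a.scm / foo/b.scm example: c.scm is looked up next to b.scm.
print(resolve_relative("foo/b.scm", "c.scm"))   # foo/c.scm
```

Note that the relative resolution depends on which file is being loaded at that moment, not on the directory the program was started from.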
Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
On Wed, Jan 16, 2013 at 12:59 AM, Peter Bex peter@xs4all.nl wrote:

> On Wed, Jan 16, 2013 at 12:39:16AM +0900, Alex Shinn wrote:
>
> > The internal representation is either decoded, or it is encoded. Either can be made to work.
> >
> > In this case, the decoded uri-common representation of the former is:
> >
> >   ((bool-expr . "x&y=1"))
> >
> > and the decoded representation of the latter is:
> >
> >   ((bool-expr . "x") (y . "1"))
> >
> > just as you say, so this is how they are stored in the URI object. In uri-generic, both get parsed to:
> >
> >   ((bool-expr . "x&y=1"))
>
> This cannot work because uri-common is re-using uri-generic's parser. Also, uri-generic doesn't do alist-decoding at all, because form-encoding is a HTML affair and has nothing to do with HTTP or URI standards.

Ah, OK, there may be implementation details on why you store encoded or decoded. Anyway, this isn't really important.

I'm mostly concerned with making utf8 do the right thing, and was wondering what the API was because it's not clear from the docs. Put another way, do uri-path and uri-query return the encoded or decoded values (maybe differently for uri-common and uri-generic)?

--
Alex
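For the UTF-8 concern raised here, the relevant fact is that a non-ASCII character is percent-encoded as several octets, and those octets only make sense when decoded together. The following is a Python illustration of that behaviour, not a description of what pct-decode in uri-generic does; `pct_decode_utf8` is a hypothetical helper name.

```python
from urllib.parse import quote, unquote_to_bytes

def pct_decode_utf8(component):
    """Percent-decode to raw octets first, then interpret them as UTF-8."""
    raw = unquote_to_bytes(component)
    return raw.decode("utf-8")

# 'é' is two octets in UTF-8, so it becomes two percent-encoded triplets.
encoded = quote("café", safe="")
print(encoded)                    # caf%C3%A9
print(pct_decode_utf8(encoded))   # café

# A single octet taken from a multi-octet sequence is not valid UTF-8 on
# its own, so decoding triplets one at a time into characters cannot work.
try:
    pct_decode_utf8("caf%C3")
    lone_octet_ok = True
except UnicodeDecodeError:
    lone_octet_ok = False
print(lone_octet_ok)              # False
```

This suggests treating percent-decoding as an octet-level step and character decoding as a separate step applied to the whole component.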