Re: [basex-talk] Should 'addraw' work as a parameter with db:create?
Hi France,

I just ran a little 'test.bxs' command script with the following contents on the command line:

xquery file:create-dir('test')
xquery file:write('test/dummy.png', ())
xquery db:create('db', 'test', '/', map { 'addraw': true() })
xquery db:list('db')

It outputs 'dummy.png' as expected. Could you provide us with some self-contained example code?

Thanks,
Christian

On Fri, Jan 24, 2020 at 3:12 PM France Baril wrote:
> Hi,
>
> I have this function that loads all the indexable content, but leaves the .png behind. Am I missing something?
Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ
Hi Ivan, hi Gerrit,

Thanks for your assessments. Most design decisions in RESTXQ have been taken from Java’s JAX-RS API [1]. The semantics for accessing paths is a bit more complex there, though: JAX-RS provides two annotations, @Path and @PathParam, to access the full path and individual segments of the path, and the segments are automatically decoded. Automatic decoding can be disabled via an optional @Encoded annotation. In RESTXQ, we only have a single %rest:path annotation, which contains both the full path and the variables for path segments.

Requests with wrongly encoded URLs, such as http://localhost:8984/a%2, are already rejected by Jetty (and, I guess, by any other web server) before any RESTXQ code can intervene. If a URL is correctly encoded, the Java servlet function getPathInfo() is used to obtain the path, and that function returns the path in decoded form. I noticed there is an alternative function, getRequestURI(), that could be used to access the original, undecoded URL.

Maybe the introduction of a %rest:encoded annotation could be discussed in the EXQuery/RESTXQ repository [2]?

Best,
Christian

[1] https://download.oracle.com/otndocs/jcp/jaxrs-2_0-fr-eval-spec/index.html
[2] https://github.com/exquery/exquery/issues

On Fri, Jan 24, 2020 at 2:38 PM Imsieke, Gerrit, le-tex wrote:
> On 24.01.2020 14:36, Imsieke, Gerrit, le-tex wrote:
>> So I agree, BaseX should not interpret escaped slashes as if they were regular slashes, thereby disallowing them as part of RESTXQ path pa
> …rameters.
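The practical difference between the two servlet functions can be sketched in a few lines of JavaScript. getPathInfo() hands over an already decoded path, so a router working on it sees "tea%2Ftime" as two segments; a router working on the raw getRequestURI() value could split first and decode each segment afterwards, which is the behaviour Ivan asks for. The helper names below are hypothetical and the matcher is deliberately simplified; this is an illustration, not how Jetty or BaseX actually implement routing.

```javascript
// Illustration only (not Jetty or BaseX code): contrasts routing on a
// decoded path with routing on the raw, still-encoded path.

// Decode the whole path first, then split it into segments.
// "%2F" turns into a real "/" and therefore into an extra segment.
function decodeThenSplit(path) {
  return decodeURIComponent(path).split("/").filter((s) => s !== "");
}

// Split the raw path first, then decode each segment on its own.
// "%2F" stays inside the segment it belongs to.
function splitThenDecode(path) {
  return path
    .split("/")
    .filter((s) => s !== "")
    .map((s) => decodeURIComponent(s));
}

// Hypothetical, simplified matcher for templates such as
// "/search/{$query}/page/{$page}" (no regex constraints like {$query=.+}).
// Returns an object with the bound variables, or null if nothing matches.
function matchTemplate(template, segments) {
  const parts = template.split("/").filter((s) => s !== "");
  if (parts.length !== segments.length) return null;
  const bindings = {};
  for (let i = 0; i < parts.length; i++) {
    const variable = /^\{\$(\w+)\}$/.exec(parts[i]);
    if (variable) {
      bindings[variable[1]] = segments[i];
    } else if (parts[i] !== segments[i]) {
      return null;
    }
  }
  return bindings;
}

const template = "/search/{$query}/page/{$page}";
const requested = "/search/tea%2Ftime/page/2";

console.log(matchTemplate(template, decodeThenSplit(requested))); // null
console.log(matchTemplate(template, splitThenDecode(requested))); // { query: 'tea/time', page: '2' }
```

Splitting before decoding is what would let the plain template "/search/{$query}/page/{$page}" work without the {$query=.+} regex workaround.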
[basex-talk] Should 'addraw' work as a parameter with db:create?
Hi,

I have this function that loads all the indexable content, but leaves the .png behind. Am I missing something?

declare
  %rest:path("/demo/create-db")
  %rest:GET
  %rest:query-param('db', '{$db}', 'new-name')
  %rest:query-param('dir-src', '{$dir-src}', '')
  %output:method("html")
  %output:html-version("5.0")
updating function democlean:create-db($db as xs:string, $dir-src as xs:string) {
  let $params := map {
    'updindex': true(), 'language': 'fr', 'addraw': true(), 'chop': false(),
    'intparse': true(), 'createfilter': '*.xml, *.ditamap, *.dita'
  }
  return (
    (: replace an existing database of the same name :)
    if (db:exists($db))
    then db:drop($db)
    else (),
    (: create an empty database, or one populated from the source directory :)
    if ($dir-src = '')
    then db:create($db, (), (), $params)
    else db:create($db, $dir-src, '/', $params),
    db:output('DB ready')
  )
};

I also tried 'serializer': 'indent=no' as a parameter, which was reported as unrecognized. Since addraw is not reported as unrecognized, I assume it should work.

--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com
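To reason about which files a call like the one above should import, the interaction of the two options can be modelled in a few lines of JavaScript. This is a hypothetical sketch, not BaseX code. It assumes, following the BaseX documentation of the create options, that createfilter is a comma-separated list of glob patterns and that, with addraw enabled, every file not matched by createfilter is stored as a raw (binary) resource. Trimming whitespace around each pattern and matching case-insensitively are further assumptions. Under these assumptions, a .png next to the DITA files should be stored raw with the options in the map above.

```javascript
// Hypothetical model of how createfilter and addraw partition the files of a
// directory passed to db:create. Not BaseX code; the semantics are assumed.

// Convert one glob pattern ("*" = any run of characters, "?" = one character)
// into an anchored regular expression.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  const pattern = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp("^" + pattern + "$", "i"); // case-insensitive: an assumption
}

// Split a createfilter value such as "*.xml, *.ditamap, *.dita" into regexes.
// Trimming the whitespace around each pattern is an assumption.
function parseFilter(createfilter) {
  return createfilter
    .split(",")
    .map((g) => g.trim())
    .filter((g) => g.length > 0)
    .map(globToRegExp);
}

// "xml" if the file matches the filter, "raw" if it does not but addraw is
// enabled, otherwise "skipped".
function classify(fileName, createfilter, addraw) {
  const matches = parseFilter(createfilter).some((re) => re.test(fileName));
  if (matches) return "xml";
  return addraw ? "raw" : "skipped";
}

const ditaFilter = "*.xml, *.ditamap, *.dita";
for (const file of ["topic.dita", "map.ditamap", "figure.png"]) {
  console.log(file, "->", classify(file, ditaFilter, true));
}
```

If the real database still lacks the .png although this model says it should be stored raw, the cause lies outside these two options, which is what a self-contained reproduction such as Christian's test script can narrow down.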
Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ
On 24.01.2020 14:36, Imsieke, Gerrit, le-tex wrote:
> So I agree, BaseX should not interpret escaped slashes as if they were regular slashes, thereby disallowing them as part of RESTXQ path pa

…rameters.
Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ
While moving the URI parameter to the query string seems like an acceptable workaround, I, too, suggest that if *reserved* URI characters such as '/' appear percent-encoded, they should not be converted to their decoded characters prior to analyzing the URI, in line with Sect. 2.2 of RFC 3986 [1]. If I enter an escaped colon (%3A) in a path segment, BaseX keeps it as %3A rather than converting it to the reserved character ':'.

The RESTXQ specification [2] doesn’t seem to contain detailed instructions on how to decode the submitted URI before extracting path parameters, so I think RFC 3986 should prevail. So I agree, BaseX should not interpret escaped slashes as if they were regular slashes, thereby disallowing them as part of RESTXQ path parameters.

Gerrit

[1] https://tools.ietf.org/html/rfc3986#section-2.2
[2] http://exquery.github.io/exquery/exquery-restxq-specification/restxq-1.0-specification.html

On 24.01.2020 13:54, Ivan Kanakarakis wrote:
> Hi Christian, thanks for the quick reply. It definitely helps, but it still keeps this behaviour in the "weird" domain. I do not see a reason to be decoding the URI before it gets to match a route. What is the reason for this?
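Gerrit's distinction follows RFC 3986: a percent-encoded reserved character (Sect. 2.2) is not equivalent to its literal form, whereas percent-encoded unreserved characters (ALPHA, DIGIT, '-', '.', '_', '~'; Sect. 2.3) may be decoded without changing the meaning of the URI. The JavaScript sketch below shows that percent-encoding normalization (Sect. 6.2.2.2), which a router could apply before splitting a path into segments. The function name is hypothetical, and the code illustrates the RFC rule rather than what BaseX or Jetty actually do.

```javascript
// Sketch of RFC 3986 percent-encoding normalization: decode an encoded octet
// only if it represents an unreserved character; keep everything else
// encoded, including reserved characters such as "/" and ":".

const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function decodeUnreservedOnly(path) {
  return path.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Unreserved ASCII characters are decoded; all other octets (reserved
    // characters, spaces, bytes of non-ASCII characters) stay encoded, with
    // the hex digits uppercased as RFC 3986 Sect. 6.2.2.1 recommends.
    return UNRESERVED.test(ch) ? ch : "%" + hex.toUpperCase();
  });
}

console.log(decodeUnreservedOnly("/search/tea%2ftime"));   // /search/tea%2Ftime
console.log(decodeUnreservedOnly("/search/%74ea%3Atime")); // /search/tea%3Atime
```

After this step, splitting on "/" still yields "tea%2Ftime" as a single segment, which the application can decode once the route has been matched.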
Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ
Hi Christian,

thanks for the quick reply. It definitely helps, but it still keeps this behaviour in the "weird" domain. I do not see a reason to be decoding the URI before it gets to match a route. What is the reason for this?

What you propose works, but if I have a route like "/search/{$query=.+}/page/{$page}", then the query will match everything, including "/page/...". If the path was not decoded, I do not think I would need the regex or any other special handling in the route. It should work with "/search/{$query}/page/{$page}", and it should return "tea%2Ftime".

Why do I have to make workarounds to try to guess how a part of the URL was encoded, when the URL I hit has that part encoded? I don't think it makes sense, and I don't see a use case for this. When the framework receives the request, it is responsible for matching a route. By matching the route, it provides me with the bound parts of the route that I requested. Then *I* am responsible for decoding those parts as I see fit and handling the request as I need. If the framework decodes the URL before matching a route, that is a problem for me: I do not have the control I need. If the framework decodes the URL parts before binding the route variables, this is fine; it saves me an operation.

While I have now refactored the endpoint handlers to work with query params, and this is no longer a problem for me, it is a problem in general.

Cheers,

On Mon, 20 Jan 2020 at 19:36, Christian Grün wrote:
> Hi Ivan,
>
> A more common approach is to supply search terms as query parameters (URL?query=...); in that case, your path won’t have new segments. If you prefer paths, you can use a regular expression in your RESTXQ path pattern [1]:
>
> "search/{$query=.+}"
>
> In both cases, encodeURIComponent should be the appropriate function to encode special characters.
>
> Hope this helps,
> Christian
>
> [1] http://docs.basex.org/wiki/RESTXQ#Paths
>
> On Mon, Jan 20, 2020 at 10:54 AM Ivan Kanakarakis wrote:
>> Hello everyone,
>>
>> I am using BaseX 8.44 and the REST XQ interface (i.e., http://docs.basex.org/wiki/RESTXQ). I have an endpoint that, when invoked with GET, does a full-text search (using "$db-nodes[text() contains text { $term } all]"), gets the results, constructs a JSON response and sends it back.
>>
>> That's all fine and works great. However, I am not sure how I should be doing the queries I describe below.
>>
>> Note: the query is initiated by a SPA JavaScript client, so when I say encode/uri-escape, I mean that I invoke the encodeURIComponent function (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent).
>>
>> Note 2: for the sake of conversation, let's consider the example endpoint declared as:
>>
>> %rest:GET
>> %rest:path("/search/{$term}")
>>
>> 1. I want to search for "tea". That is the basic query. A single term, no problem.
>>
>> curl -s "https://example.com/search/tea"
>>
>> 2. I want to search for "tea time". Now, this query has a space between the two words. What I expect to get back is any node that contains both words (thus I have used "contains text" with "all"), even if they may be a few words apart.
>> - Should I be sending an encoded/uri-escaped version of this, i.e., "tea%20time"?
>> - Or should I be replacing the space with "+", i.e., "tea+time"?
>> - Or is there some other advice?
>>
>> curl -s "https://example.com/search/tea%20time"
>> curl -s "https://example.com/search/tea+time"
>>
>> 3. I want to search for "tea/time". This is even trickier. What I expect to get back is any node that contains "tea/time", i.e., a search result for a single term. How do I do this?
>> - If I do not do anything, the slash is treated as part of the URL, thus not matching a route.
>> - If I encode/uri-escape this term, I get "tea%2Ftime". But when I invoke the endpoint, I get the same result as if there were a slash.
>> - I am not sure how I should deal with the slash. How should I escape/encode this?
>>
>> curl -s "https://example.com/search/tea/time"
>> curl -s "https://example.com/search/tea%2Ftime"
>>
>> Thank you,
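To make Ivan's three cases concrete, the sketch below builds both the path-segment URL and the query-parameter URL for each search term. It relies only on the standard encodeURIComponent function and the WHATWG URL/URLSearchParams API available in browsers and Node.js; the host https://example.com and the parameter name "query" are placeholders taken from the thread. Two details matter here: in a path segment, "+" is a literal plus sign under RFC 3986, so "tea+time" does not encode a space there; and URLSearchParams serializes a space as "+" because query strings are form-encoded, which servers decoding application/x-www-form-urlencoded data turn back into a space.

```javascript
// Compare path-segment encoding with query-parameter encoding for the
// search terms from the thread. Host and parameter name are placeholders.

const terms = ["tea", "tea time", "tea/time"];

// Option 1: the term as a path segment. encodeURIComponent turns "/" into
// "%2F", but the server may still decode it before routing.
function pathUrl(term) {
  return "https://example.com/search/" + encodeURIComponent(term);
}

// Option 2: the term as a query parameter, e.g. /search?query=...
function queryUrl(term) {
  const url = new URL("https://example.com/search");
  url.searchParams.set("query", term); // form-encodes the value
  return url.toString();
}

for (const term of terms) {
  console.log(pathUrl(term));
  console.log(queryUrl(term));
}
```

With the query-parameter form, the encoded slash never has to survive path routing, which is why that approach sidesteps the problem described above.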