Re: [basex-talk] Should 'addraw' work as parameter with db:create?

2020-01-24 Thread Christian Grün
Hi France,

I just ran a little 'test.bxs' command script with the following
contents on the command line:

xquery file:create-dir('test')
xquery file:write('test/dummy.png', ())
xquery db:create('db', 'test', '/', map { 'addraw': true() })
xquery db:list('db')

It outputs 'dummy.png' as expected.

Could you provide us with some self-contained example code?

Thanks,
Christian





Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-24 Thread Christian Grün
Hi Ivan, hi Gerrit,

Thanks for your assessments.

Most design decisions in RESTXQ have been taken from Java’s JAX-RS API
[1]. The semantics for accessing paths is a bit more complex, though:
JAX-RS provides two annotations @Path and @PathParam to access the
full path and segments of the path, and the segments are automatically
decoded. Automatic decoding can be disabled via an optional @Encoded
annotation.

In RESTXQ, we only have a single %rest:path annotation, which
contains both the full path and the variables for path segments.

Requests with wrongly encoded URLs, such as http://localhost:8984/a%2,
are already rejected by Jetty (and, I guess, by any other web server).
They are rejected before any RESTXQ code can intervene. If a URL is
correctly encoded, the Java servlet method getPathInfo() is used to
obtain the path, which is returned already decoded. I noticed there is
an alternative method getRequestURI() that could be used to access the
original URL.
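To make the decoded-path vs. original-URL distinction concrete, here is a minimal JavaScript sketch using plain string handling (not BaseX, Jetty, or servlet code). It contrasts decoding the whole path before splitting it into segments with splitting the raw path first and decoding each segment afterwards. The helper names decodeThenSplit and splitThenDecode are hypothetical.

```javascript
// Hypothetical helpers contrasting two ways of turning a request path into segments.

// Decode first, then split: "%2F" turns into "/" and creates an extra segment.
function decodeThenSplit(rawPath) {
  return decodeURIComponent(rawPath).split("/").filter((s) => s !== "");
}

// Split the raw (still encoded) path first, then decode each segment on its own.
function splitThenDecode(rawPath) {
  return rawPath
    .split("/")
    .filter((s) => s !== "")
    .map((s) => decodeURIComponent(s));
}

const raw = "/search/tea%2Ftime";
console.log(decodeThenSplit(raw)); // [ 'search', 'tea', 'time' ]
console.log(splitThenDecode(raw)); // [ 'search', 'tea/time' ]
```

With the first order, the encoded slash can no longer be told apart from a segment separator. The second order keeps "tea/time" inside a single segment.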

Maybe the introduction of a %rest:encoded annotation could be
discussed in the EXQuery/RESTXQ repository [2]?

Best,
Christian

[1] https://download.oracle.com/otndocs/jcp/jaxrs-2_0-fr-eval-spec/index.html
[2] https://github.com/exquery/exquery/issues





[basex-talk] Should 'addraw' work as parameter with db:create?

2020-01-24 Thread France Baril
Hi,

I have this function that loads all the indexable content, but leaves the
.png behind. Am I missing something?

declare %rest:path("/demo/create-db")
  %rest:GET
  %rest:query-param('db', '{$db}', 'new-name')
  %rest:query-param('dir-src', '{$dir-src}', '')
  %output:method("html")
  %output:html-version("5.0")
updating function democlean:create-db($db as xs:string, $dir-src as xs:string) {
  let $params := map {
    'updindex': true(), 'language': 'fr', 'addraw': true(), 'chop': false(),
    'intparse': true(), 'createfilter': '*.xml, *.ditamap, *.dita'
  }
  return (
    if (db:exists($db))
    then db:drop($db)
    else (),
    if ($dir-src = '')
    then db:create($db, (), (), $params)
    else db:create($db, $dir-src, '/', $params),
    db:output('DB ready')
  )
};

I also tried 'serializer': 'indent=no' as a parameter, which was reported
as unrecognized. Since 'addraw' is not reported as unrecognized, I assume
it should work.

-- 
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com




Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-24 Thread Imsieke, Gerrit, le-tex
While moving the URI parameter to the query string seems like an 
acceptable workaround, I, too, suggest that if *reserved* URI characters 
such as '/' appear percent-encoded, they should not be converted to 
their decoded character prior to analyzing the URI, in line with Sect. 
2.2 of RFC 3986 [1].


If I enter an escaped colon (%3A) in a path segment, it will be kept as 
%3A by BaseX, rather than converted to the reserved character ':'.
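JavaScript's standard URI functions draw the same line between reserved characters used as delimiters and reserved characters carried as data, which is why encodeURIComponent is the right tool for a single path segment. A short sketch (the sample term "tea/time:12" is made up for illustration):

```javascript
// encodeURIComponent encodes a single component, so reserved
// characters such as "/" and ":" are percent-encoded as data.
const term = "tea/time:12";
const asComponent = encodeURIComponent(term); // "tea%2Ftime%3A12"

// encodeURI expects a complete URI and leaves reserved characters
// untouched, because they may be acting as delimiters.
const asUri = encodeURI(term); // "tea/time:12"

// decodeURI likewise keeps escaped reserved characters encoded,
// while decodeURIComponent decodes every escape sequence.
const keptEncoded = decodeURI("/search/tea%2Ftime"); // "/search/tea%2Ftime"
const fullyDecoded = decodeURIComponent("/search/tea%2Ftime"); // "/search/tea/time"

console.log(asComponent, asUri, keptEncoded, fullyDecoded);
```

decodeURI preserving "%2F" and "%3A" mirrors the behaviour described above for %3A. This is only an analogy: it does not say which function, if any, BaseX uses internally.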


The RESTXQ specification [2] doesn’t seem to contain detailed 
instructions on how to decode the submitted URI before extracting path 
parameters, therefore I think RFC 3986 should prevail.


So I agree, BaseX should not interpret escaped slashes as if they were
regular slashes, thereby disallowing them as part of RESTXQ path parameters.


Gerrit

[1] https://tools.ietf.org/html/rfc3986#section-2.2
[2] http://exquery.github.io/exquery/exquery-restxq-specification/restxq-1.0-specification.html



Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-24 Thread Ivan Kanakarakis
Hi Christian,

thanks for the quick reply. It definitely helps, but it still keeps
this behaviour in the "weird" domain.
I do not see a reason to be decoding the URI before it gets to match a
route. What is the reason for this?

What you propose works, but if I have a route like
"/search/{$query=.+}/page/{$page}", then the query will match
everything including "/page/...". If the path were not decoded, I do
not think I would need the regex, nor any other special handling
of the route. It should work with "/search/{$query}/page/{$page}" and
it should return "tea%2Ftime". Why do I have to make workarounds to
try to guess how a part of the URL was encoded, when the URL I hit has
that part encoded?
I don't think it makes sense, and I don't see a use case for this.

When the framework receives the request, it is responsible for matching a route.
By matching the route, it will provide me with the bound parts of the
route that I requested.
Then, *I* am responsible for decoding those parts as I see fit and for
handling the request as I need.

If the framework decodes the URL before matching a route, that is a
problem for me: I do not have the control I need.
If the framework decodes the URL parts before binding the route
variables, this is fine: it saves me an operation.
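The order of operations described here (match on the still-encoded path, then decode the bound variables) can be sketched in a few lines of JavaScript. This is a hypothetical router, not how BaseX implements RESTXQ. The `{$name}` template syntax is borrowed from RESTXQ only for readability, and regex-constrained variables such as `{$query=.+}` are deliberately not supported.

```javascript
// Hypothetical minimal router: matches a RESTXQ-style template against the
// raw (still percent-encoded) path, then decodes each bound variable.
function matchRoute(template, rawPath) {
  const tSegs = template.split("/").filter((s) => s !== "");
  const pSegs = rawPath.split("/").filter((s) => s !== "");
  if (tSegs.length !== pSegs.length) return null; // segment counts must agree

  const vars = {};
  for (let i = 0; i < tSegs.length; i++) {
    const m = /^\{\$([A-Za-z][\w-]*)\}$/.exec(tSegs[i]);
    if (m) {
      // Variable segment: decode only after the segment has been isolated.
      // decodeURIComponent throws a URIError on malformed escapes like "%2".
      vars[m[1]] = decodeURIComponent(pSegs[i]);
    } else if (tSegs[i] !== pSegs[i]) {
      return null; // literal segment mismatch
    }
  }
  return vars;
}

console.log(matchRoute("/search/{$query}/page/{$page}", "/search/tea%2Ftime/page/2"));
// { query: 'tea/time', page: '2' }
```

Because matching happens before decoding, "/search/tea/time/page/2" (an unencoded slash) has five segments and does not match the four-segment template, while the encoded form binds `query` to "tea/time".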

Although I have now refactored the endpoint handlers to work with query
params, so this is no longer a problem for me, it is still a problem in
general.


Cheers,



On Mon, 20 Jan 2020 at 19:36, Christian Grün  wrote:
>
> Hi Ivan,
>
> A more common approach is to supply search terms as query parameters
> (URL?query=...); in that case, your path won’t have new segments. If
> you prefer paths, you can use a regular expression in your RESTXQ path
> pattern [1]:
>
>   "search/{$query=.+}"
>
> In both cases, encodeURIComponent should be the appropriate function
> to encode special characters.
>
> Hope this helps,
> Christian
>
> [1] http://docs.basex.org/wiki/RESTXQ#Paths
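On the client side, the query-parameter approach could look like the following sketch. URL and URLSearchParams are standard browser and Node.js APIs; the parameter name "query" and the example.com address are placeholders, and the sketch makes no claim about how the server decodes the value.

```javascript
// Build a search URL that carries the term in the query string.
// URLSearchParams applies form-style encoding (space -> "+", "/" -> "%2F").
function searchUrl(term) {
  const url = new URL("https://example.com/search");
  url.searchParams.set("query", term);
  return url.toString();
}

console.log(searchUrl("tea time")); // https://example.com/search?query=tea+time
console.log(searchUrl("tea/time")); // https://example.com/search?query=tea%2Ftime

// Reading it back decodes "+" as a space and "%2F" as "/".
const parsed = new URL(searchUrl("tea/time")).searchParams.get("query"); // "tea/time"
```

Since the term never becomes part of the path, it cannot add path segments, whatever characters it contains.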
>
>
>
>
>
> On Mon, Jan 20, 2020 at 10:54 AM Ivan Kanakarakis
>  wrote:
> >
> > Hello everyone,
> >
> > I am using BaseX 8.44 and the REST XQ interface (i.e.,
> > http://docs.basex.org/wiki/RESTXQ). I have an endpoint that, when
> > invoked with GET, performs a full-text search (using "$db-nodes[text()
> > contains text { $term } all]"), gets the results, constructs a JSON
> > response and sends it back.
> >
> > That's all fine and works great. However, I am not sure how I should
> > be doing the queries I describe below.
> >
> > _Note: the query is initiated by a SPA javascript client, thus when I
> > say encode/uri-escape, what I mean is that I invoke the
> > encodeURIComponent function
> > (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent).
> > _Note 2: for the sake of conversation let's consider the example
> > endpoint declared as:
> >
> > %rest:GET
> > %rest:path("/search/{$term}")
> >
> >
> > 1. I want to search for "tea". That is the basic query. A single term,
> > no problem.
> >
> > curl -s "https://example.com/search/tea";
> >
> >
> > 2. I want to search for "tea time". Now, this query has a space
> > between the two words. What I expect to get back is any node that
> > contains both words (thus I have used "contains text" with "all"),
> > even if they may be a few words apart.
> > - Should I be sending an encoded/URI-escaped version of this, i.e.,
> > "tea%20time"?
> > - Or, should I be replacing the space with "+", i.e., "tea+time"?
> > - Or, some other advice?
> >
> > curl -s "https://example.com/search/tea%20time";
> > curl -s "https://example.com/search/tea+time";
> >
> >
> > 3. I want to search for "tea/time". This is even trickier. What I
> > expect to get back is any node that contains "tea/time", i.e., a search
> > result for a single term. How do I do this?
> > - If I do not do anything, the slash is treated as part of the URL,
> > thus not matching a route.
> > - If I encode/URI-escape this term, I get "tea%2Ftime". But when I
> > invoke the endpoint, I get the same result as if there was a slash.
> > - I am not sure how I should deal with the slash. How should I
> > escape/encode this?
> >
> > curl -s "https://example.com/search/tea/time";
> > curl -s "https://example.com/search/tea%2Ftime";
> >
> >
> > Thank you,
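On question 2: in a path segment, "+" is just a literal character (a sub-delimiter in RFC 3986), so only "%20" stands for a space. The "+"-as-space convention belongs to form-encoded query strings. A quick check with the standard JavaScript functions (this shows client-side semantics only, not how BaseX decodes the path):

```javascript
// In a path segment, decodeURIComponent only undoes %XX escapes; "+" stays "+".
const fromPercent = decodeURIComponent("tea%20time"); // "tea time"
const fromPlus = decodeURIComponent("tea+time"); // "tea+time"

// In a form-encoded query string, "+" is read as a space.
const fromQuery = new URLSearchParams("term=tea+time").get("term"); // "tea time"

// encodeURIComponent produces the %20 form, never "+".
const encoded = encodeURIComponent("tea time"); // "tea%20time"

console.log({ fromPercent, fromPlus, fromQuery, encoded });
```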