A last update, which may illuminate a little. After reindexing the database
with Norwegian (Snowball) stemming and keeping diacritics, RESTXQ handles
neither the special characters (it treats them as the closest ASCII letter)
nor inflected forms.

The words "mannen" (= the man, definite) and "spaserer" (= walks, present
tense) give no output under RESTXQ, while the naked stems "mann" and
"spaser" display the full result. REST, in contrast, behaves as expected
for all four words.
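
One thing that might help narrow this down is to state the match options in
the query itself, so that it does not depend on whatever options are picked
up from the index. The following is only a sketch under assumptions: the
inline data stands in for the 'hyphen-data' database (the two entries are
taken from the output quoted further down), 'no' is assumed to be the
language code for the Norwegian stemmer, and I have not checked whether this
changes the RESTXQ behaviour.

```xquery
(: Sketch only: inline stand-in for the 'hyphen-data' database. :)
let $data :=
  <hyphens>
    <hyphens><full>mellom</full><first>mel</first><second>lom</second></hyphens>
    <hyphens><full>føre</full><first>fø</first><second>re</second></hyphens>
  </hyphens>
for $word in ('mellom', 'føre')
return $data/hyphens[
  (: Match options written out explicitly instead of relying on the index.
     The language code 'no' is an assumption. :)
  full contains text { $word }
    using stemming
    using language 'no'
    using diacritics sensitive
]
```

If the same predicate gives the same hits under REST and RESTXQ, that would
suggest the difference lies in how the index options reach the query rather
than in how the word is passed in.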


Cheers
Lars

2015-05-18 15:28 GMT+02:00 Lars Johnsen <yoon...@gmail.com>:

> As an update, after rebuilding database with
>
> text index,
> full text index (no language, no stemming, keep diacritics)
>
> restarting server:
> BaseX 8.1.1 [Server]
> Server was started (port: 29084)
> [main] INFO org.eclipse.jetty.server.AbstractConnector - Started
> SelectChannelConnector@0.0.0.0:8984
> HTTP Server was started (port: 8984)
>
> RESTXQ: Norwegian characters are converted when the full-text index is
> used; switching to the text index takes forever.
> REST: Full-text works as expected, and text index works as expected (same
> as running in GUI for both).
>
> It looks as if the index structure is treated differently.
>
>
> 2015-05-18 15:07 GMT+02:00 Lars Johnsen <yoon...@gmail.com>:
>
>> The full text query is blisteringly fast for both, the text index query
>> is fast only for REST queries and seems not to be used with queries in
>> RESTXQ. I am rebuilding the whole database now to see how it goes, and will
>> restart everything for a new assessment.
>>
>>
>>
>> 2015-05-18 15:00 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:
>>
>>> > However, when using text index instead of full text the results are
>>> > the same for both, except that RESTXQ takes almost forever
>>>
>>> What about the original query: Has it been slow as well, or do you
>>> think this is a new problem?
>>>
>>>
>>> > 2015-05-18 14:28 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:
>>> >>
>>> >> It could be that your URL is decoded in a wrong way. What happens if
>>> >> you run the following function with REST and RESTXQ and "føre" as
>>> >> word?
>>> >>
>>> >>   declare
>>> >>     %rest:path("/test/encoding/{$word}")
>>> >>   function page:test-encoding($word) {
>>> >>     string-to-codepoints($word)
>>> >>   };
>>> >>
>>> >> Thanks,
>>> >> Christian
>>> >>
>>> >>
>>> >> string-to-codepoints()
>>> >> > REST output (2 first lines):
>>> >> >    føre
>>> >> >    fø - re 219
>>> >> >
>>> >> > RESTXQ
>>> >> >    føre
>>> >> >    fo - re 123
>>> >> >
>>> >> > The first word quoted is "føre" in both cases and is what the
>>> >> > scripts see, so the full text is given the same in both cases.
>>> >> > Could it be that within RESTXQ the full text index is treated
>>> >> > differently?
>>> >> >
>>> >> > I will work on a self-contained example, but thought this might
>>> >> > point to something.
>>> >> >
>>> >> > Cheers
>>> >> > Lars
>>> >> >
>>> >> >
>>> >> > 2015-05-18 13:44 GMT+02:00 Lars Johnsen <yoon...@gmail.com>:
>>> >> >>
>>> >> >> Hi Christian - and thanks for fast response. Latest version 8.1.1
>>> >> >> is in use (same behaviour as previous). Let me see if I can make a
>>> >> >> self-contained example.
>>> >> >>
>>> >> >> best,
>>> >> >> Lars
>>> >> >>
>>> >> >> 2015-05-18 13:40 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:
>>> >> >>>
>>> >> >>> Hi Lars,
>>> >> >>>
>>> >> >>> hm, that's difficult to tell. All I can say is that this sounds
>>> >> >>> unusual, so I'm coming up with my standard questions: Do you
>>> >> >>> think you could build us a little example that allows us to
>>> >> >>> reproduce the problem? Have you tried the latest version of BaseX?
>>> >> >>>
>>> >> >>> Best,
>>> >> >>> Christian
>>> >> >>>
>>> >> >>>
>>> >> >>> On Mon, May 18, 2015 at 1:35 PM, Lars Johnsen <yoon...@gmail.com> wrote:
>>> >> >>> >
>>> >> >>> > I am running a web script in two identical versions (identical
>>> >> >>> > as in "cut and paste"), one via RESTXQ and one via REST. The
>>> >> >>> > response is different, and I wondered what may be the trouble.
>>> >> >>> >
>>> >> >>> > For example the output (the URLs only work locally) for
>>> >> >>> >     http://ljohnsen:8984/hyphens/mellom
>>> >> >>> > is the same as
>>> >> >>> >      http://ljohnsen:8984/rest?run=hyphen-show.xq&word=mellom
>>> >> >>> >
>>> >> >>> > which is a set of hyphenation data:
>>> >> >>> >     mellom
>>> >> >>> >     mel - lom 17005
>>> >> >>> >     Mel - lom 144
>>> >> >>> >     mel - lom. 50
>>> >> >>> >
>>> >> >>> > but if "mellom" is exchanged with "nasjonalbiblioteket" only
>>> >> >>> > the REST version shows any result, which then is the same as I
>>> >> >>> > get experimenting in the GUI.
>>> >> >>> >
>>> >> >>> > The actual script is added below; it runs in both versions
>>> >> >>> > (identical apart from the REST and RESTXQ interfaces). It uses
>>> >> >>> > full-text search, but the results differ when run under the
>>> >> >>> > REST regime.
>>> >> >>> >
>>> >> >>> > All the best
>>> >> >>> > Lars G Johnsen
>>> >> >>> > National Library of Norway
>>> >> >>> >
>>> >> >>> > module namespace page = 'http://basex.org/modules/web-page';
>>> >> >>> >
>>> >> >>> > declare
>>> >> >>> >   %rest:path("/hyphens/{$word}")
>>> >> >>> >   %output:method("html")
>>> >> >>> >
>>> >> >>> > function page:show-hyphens($word) {
>>> >> >>> >    let $db := db:open('hyphen-data')
>>> >> >>> >      let $hyphens :=
>>> >> >>> >       for $hyp in $db/hyphens/hyphens[full contains text {$word}]
>>> >> >>> >       group by $first := $hyp/first, $second := $hyp/second
>>> >> >>> >       let $count := count($hyp)
>>> >> >>> >       order by xs:int($count) descending
>>> >> >>> >       return element p {
>>> >> >>> >         attribute freq {$count},
>>> >> >>> >         $first, " - ", $second, $count
>>> >> >>> >       }
>>> >> >>> >
>>> >> >>> >      let $total := sum($hyphens//@freq)
>>> >> >>> >      let $div := element div {
>>> >> >>> >        element p {$word},
>>> >> >>> >        for $hyp in $hyphens
>>> >> >>> >        return element div {
>>> >> >>> >           attribute class {"hyph"},
>>> >> >>> >           attribute style {"font-size:",
>>> >> >>> >             1 + round(xs:int($hyp//@freq/data()) div $total, 1) || "em"},
>>> >> >>> >           $hyp
>>> >> >>> >
>>> >> >>> >          }
>>> >> >>> >      }
>>> >> >>> >      return
>>> >> >>> >      <html encoding="UTF-8">
>>> >> >>> >     <head>
>>> >> >>> >         <meta http-equiv="Content-Type" content="text/html" charset="UTF-8" />
>>> >> >>> >         <title>Orddelinger</title>
>>> >> >>> >     </head>
>>> >> >>> >     <body>{$div}
>>> >> >>> >     </body>
>>> >> >>> >     </html>
>>> >> >>> >
>>> >> >>> > };
