Re: [basex-talk] Matching multiple names across a list of sequences of names

2016-04-10 Thread Graydon Saunders
Hi Christian --

Thank you, that was helpful!  (And the faint squeaking noise of my brain
expanding when I first read through it may stick, too. :)

The eventual business requirement wound up being something else entirely.

The overall goal is to alter some complex application files that happen to
be expressed in XML, to avoid a (lengthy and expensive) need for someone to
manually change all the files through the application.  It's not an open
file format.  The actual alteration is being done with XSLT, but BaseX is
being extremely useful when it comes to "failing fast" on possible
approaches to figuring out what to change.  Matching on composed names
turned out not to work at all, but finding that out in a couple of days was
much better than finding out after the XSLT produced gibberish.

Thanks!
Graydon

On Mon, Apr 4, 2016 at 1:27 PM, Christian Grün 
wrote:

> Hi Graydon,
>
> > I can't give you a real example because it's the client's health care data,
>
> No problem, your example looks fine.
>
> > let $found := //*[@name eq $match(1)][./descendant::*[@name eq
> > $match(2)][./descendant::*[@name eq $match(3)]]]
>
> Right. You could try to rewrite this for index access:
>
> 1. You’ll have to mark the generated arrays as string arrays:
>
>    let $composedNames as array(xs:string)* :=
>      for $x in $composed//composed
>      return array { tokenize($x/string(), '\.') }
>
> 2. You need to replace "eq" with "=", and you can simplify the
> predicates a little:
>
>    let $found := //*[@name = $match(1)]
>                     [descendant::*/@name = $match(2)]
>                     [descendant::*/@name = $match(3)]
>
> You indicated that you’ll have thousands of paths. What do they look
> like? Could you add some more examples (besides
> "class.operation.specifier")? Are some parts of the paths more
> specific than others? E.g...
>
>    A.A.A
>    A.A.B
>    A.A.C
>    A.B.D
>    A.B.E
>    A.B.F
>    ...
>
> In this case, it could make sense to only look for the last path
> segment via the index. You could also try to group your results by the
> first segment, then do the search on the second segment, etc. See my
> attached query as example (I’m sure it needs to be revised to work
> properly, because I have only run it with your simple example file).
>
> Does this help?
> Christian
>
>
>
>
> >
> > This works, but it's going over the entire database for every three part
> > class-operation-specifier compound name.  I can't shake the feeling that
> > there's a more efficient way to do this, but I can't see what it might be.
> >
> > Thanks!
> > Graydon
> >
> > On Fri, Apr 1, 2016 at 12:04 PM, Christian Grün <christian.gr...@gmail.com> wrote:
> >>
> >> Hi Graydon,
> >>
> >> Do you think there’d be a chance for us to get a minimized,
> >> self-contained example, which demonstrates the n^2 solution?
> >>
> >> Thanks  in advance,
> >> Christian
> >>
> >>
> >>
> >> On Fri, Apr 1, 2016 at 5:24 PM, Graydon Saunders 
> >> wrote:
> >> > Hello -
> >> >
> >> > I've got a problem I'm not sure how to best approach.
> >> >
> >> > I've got triplets of names -- class.operation.specifier -- that I need to
> >> > match against much longer sequences of names. (Which are in attributes in an
> >> > XML hierarchy; each sequence of names derives from a path to a leaf
> >> > element.)
> >> >
> >> > If there is a match (as there usually is not) one of the names in the
> >> > sequence of names will match to the class, a subsequent name to the
> >> > operation,  and a name subsequent to that match to the specifier. (All
> >> > simple string values.)
> >> >
> >> > The naive n^2 version is much too slow for the amount of data involved.
> >> >
> >> > Is there an efficient way to do this kind of matching?
> >> >
> >> > Thanks!
> >> > Graydon
> >
> >
>


Re: [basex-talk] Matching multiple names across a list of sequences of names

2016-04-04 Thread Christian Grün
Hi Graydon,

> I can't give you a real example because it's the client's health care data,

No problem, your example looks fine.

> let $found := //*[@name eq $match(1)][./descendant::*[@name eq
> $match(2)][./descendant::*[@name eq $match(3)]]]

Right. You could try to rewrite this for index access:

1. You’ll have to mark the generated arrays as string arrays:

   let $composedNames as array(xs:string)* :=
     for $x in $composed//composed
     return array { tokenize($x/string(), '\.') }

2. You need to replace "eq" with "=", and you can simplify the
predicates a little:

   let $found := //*[@name = $match(1)]
                    [descendant::*/@name = $match(2)]
                    [descendant::*/@name = $match(3)]
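
For readers following along, the two steps can be combined into a self-contained sketch. The sample document, the <node> element name, and the $doc and $composed variables are invented for illustration; the real data is not shown in the thread. The data is built in memory only so that the query runs on its own. BaseX can rewrite such predicates for index access only when the nodes come from a database with an attribute index. The sketch nests the predicates so that the specifier has to sit below the operation, as in the original query; the flattened form above only requires both names to occur somewhere below the class element.

```xquery
xquery version "3.1";

(: Hypothetical sample data; names and structure are invented. :)
declare variable $doc := document {
  <root>
    <node name="class">
      <node name="other">
        <node name="operation">
          <node name="specifier"/>
        </node>
      </node>
    </node>
    <node name="class">
      <node name="specifier"/>
    </node>
  </root>
};

(: Hypothetical list of composed names, one per <composed> element. :)
declare variable $composed :=
  <list>
    <composed>class.operation.specifier</composed>
    <composed>class.missing.specifier</composed>
  </list>;

(: One string array per composed name. The * occurrence indicator is
   needed because the FLWOR expression returns a sequence of arrays. :)
let $composedNames as array(xs:string)* :=
  for $x in $composed//composed
  return array { tokenize($x/string(), '\.') }

for $match in $composedNames
(: General comparison (=) instead of eq; the nested predicates keep the
   class > operation > specifier order. :)
let $found := $doc//*[@name = $match(1)]
                     [descendant::*[@name = $match(2)]
                                   [descendant::*/@name = $match(3)]]
return
  <result name="{ string-join($match?*, '.') }" hits="{ count($found) }"/>
```

With this sample data the query should report one hit for class.operation.specifier and none for class.missing.specifier, because the second "class" element has no "operation" descendant.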

You indicated that you’ll have thousands of paths. What do they look
like? Could you add some more examples (besides
"class.operation.specifier")? Are some parts of the paths more
specific than others? E.g...

   A.A.A
   A.A.B
   A.A.C
   A.B.D
   A.B.E
   A.B.F
   ...

In this case, it could make sense to only look for the last path
segment via the index. You could also try to group your results by the
first segment, then do the search on the second segment, etc. See my
attached query as example (I’m sure it needs to be revised to work
properly, because I have only run it with your simple example file).
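
The grouping idea can be sketched roughly as follows. This is not the attached query, which is not reproduced in the archive; it is a minimal illustration under assumed data. The $doc and $names variables, the <node> element name, and the sample values are all hypothetical. Each distinct first segment is looked up once, each distinct second segment is searched only below those hits, and the last segment only below the second-level hits. Unlike the $found expression above, this sketch returns the matching specifier elements rather than the class elements.

```xquery
xquery version "3.1";

(: Hypothetical sample data; names and structure are invented. :)
declare variable $doc := document {
  <root>
    <node name="A">
      <node name="A"><node name="C"/></node>
      <node name="B"><node name="D"/><node name="E"/></node>
    </node>
  </root>
};

(: Hypothetical composed names in class.operation.specifier form. :)
declare variable $names := ('A.A.C', 'A.B.D', 'A.B.X');

let $composedNames as array(xs:string)* :=
  for $n in $names
  return array { tokenize($n, '\.') }

(: Level 1: one lookup per distinct first segment. :)
for $m in $composedNames
group by $class := $m(1)
let $classNodes := $doc//*[@name = $class]
return
  (: Level 2: one search per distinct second segment, below the class hits. :)
  for $m2 in $m
  group by $op := $m2(2)
  let $opNodes := $classNodes/descendant::*[@name = $op]
  return
    (: Level 3: check each specifier below the operation hits. :)
    for $m3 in $m2
    let $spec := $m3(3)
    let $found := $opNodes/descendant::*[@name = $spec]
    where exists($found)
    return
      <match name="{ string-join(($class, $op, $spec), '.') }"
             hits="{ count($found) }"/>
```

With the sample data this should report A.A.C and A.B.D and omit A.B.X. The order in which groups are returned is implementation-dependent. Whether this pays off on real data depends on how many composed names share their leading segments.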

Does this help?
Christian




>
> This works, but it's going over the entire database for every three part
> class-operation-specifier compound name.  I can't shake the feeling that
> there's a more efficient way to do this, but I can't see what it might be.
>
> Thanks!
> Graydon
>
> On Fri, Apr 1, 2016 at 12:04 PM, Christian Grün 
> wrote:
>>
>> Hi Graydon,
>>
>> Do you think there’d be a chance for us to get a minimized,
>> self-contained example, which demonstrates the n^2 solution?
>>
>> Thanks  in advance,
>> Christian
>>
>>
>>
>> On Fri, Apr 1, 2016 at 5:24 PM, Graydon Saunders 
>> wrote:
>> > Hello -
>> >
>> > I've got a problem I'm not sure how to best approach.
>> >
>> > I've got triplets of names -- class.operation.specifier -- that I need to
>> > match against much longer sequences of names. (Which are in attributes in an
>> > XML hierarchy; each sequence of names derives from a path to a leaf
>> > element.)
>> >
>> > If there is a match (as there usually is not) one of the names in the
>> > sequence of names will match to the class, a subsequent name to the
>> > operation,  and a name subsequent to that match to the specifier. (All
>> > simple string values.)
>> >
>> > The naive n^2 version is much too slow for the amount of data involved.
>> >
>> > Is there an efficient way to do this kind of matching?
>> >
>> > Thanks!
>> > Graydon
>
>


threePartMatchesEG-Grouping.xq
Description: Binary data


Re: [basex-talk] Matching multiple names across a list of sequences of names

2016-04-01 Thread Christian Grün
Hi Graydon,

Do you think there’d be a chance for us to get a minimized,
self-contained example, which demonstrates the n^2 solution?

Thanks in advance,
Christian



On Fri, Apr 1, 2016 at 5:24 PM, Graydon Saunders  wrote:
> Hello -
>
> I've got a problem I'm not sure how to best approach.
>
> I've got triplets of names -- class.operation.specifier -- that I need to
> match against much longer sequences of names. (Which are in attributes in an
> XML hierarchy; each sequence of names derives from a path to a leaf
> element.)
>
> If there is a match (as there usually is not) one of the names in the
> sequence of names will match to the class, a subsequent name to the
> operation,  and a name subsequent to that match to the specifier. (All
> simple string values.)
>
> The naive n^2 version is much too slow for the amount of data involved.
>
> Is there an efficient way to do this kind of matching?
>
> Thanks!
> Graydon