Seems like this would be perfect. I do need both number and manuf. Using
your combination map, I'm now getting an "Out of Main Memory" error. Tried
on a second computer - same issue. Would it be more likely to work if I
tried it from the command line rather than the GUI? If so, I'll need to
look up how to create a database that way, but I'm sure it's close to hand.
Or is there a better workaround (besides buying a computer with more than
8GB of RAM)?

Thanks again,

Michael
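
One possible workaround for the memory error, as an untested sketch: instead of
updating the database in place, build the combined map once, then transform and
write one page at a time with copy/modify, so that only a single page's changes
are pending at once. The output file names follow the earlier file:write
example. Inserting $part (the whole element) rather than $part/node() is an
assumption based on the desired output shown in the first message, and the
"where" clause is an added guard. Whether this actually stays within 8GB will
depend on the data and the JVM settings.

```xquery
(: Sketch only: assumes the database with the page and part elements is opened. :)
let $map := map:merge(
  for $part in //part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
for $page at $c in //page
return file:write($c || '.xml',
  (: work on an in-memory copy of one page; the database itself is not changed :)
  copy $p := $page
  modify (
    for $partinfo in $p//unit/partinfo
    let $part := $map(string-join($partinfo/*, '/'))
    (: skip partinfo elements without a matching part; replacing a node
       with an empty sequence would delete it :)
    where exists($part)
    (: use $part/node() instead to insert only the children of part :)
    return replace node $partinfo with $part
  )
  return $p
)
```

Because nothing is written back to the database, the query can be re-run
without recreating the database if the output is not what was expected.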

On Tue, May 24, 2016 at 2:10 PM, Christian Grün <[email protected]>
wrote:

> Maybe you need something like this:
>
>   for $partinfo in //unit/partinfo
>   for $part in //part[deep-equal(partinfo, $partinfo)]
>   return replace node $partinfo with $part/node()
>
> The deep-equal will be pretty slow. If the value of the number element
> is unique, you could do something like this:
>
>   for $partinfo in //unit/partinfo
>   let $number := $partinfo/number
>   let $part := //part[partinfo/number = $number]
>   return replace node $partinfo with $part/node()
>
> Using a map will be even faster:
>
>   let $map := map:merge(//part/map:entry(partinfo/number/text(), .))
>   for $partinfo in //unit/partinfo
>   let $part := $map($partinfo/number)
>   return replace node $partinfo with $part/node()
>
> If you need to consider both number and manuf, you could e.g. combine
> these two in the map:
>
>   let $map := map:merge(
>     for $part in //part
>     return map:entry(string-join($part/partinfo/*, '/'), $part)
>   )
>   for $partinfo in //unit/partinfo
>   let $part := $map(string-join($partinfo/*, '/'))
>   return replace node $partinfo with $part/node()
>
> Does this help?
> Christian
>
>
>
>
> On Tue, May 24, 2016 at 10:54 PM, Michael Sanborn <[email protected]>
> wrote:
> > Thanks for that. The trouble in step 2 is, just wrapping partinfo with the
> > part element doesn't get me what I've labelled "misc part content 1" and
> > "misc part content 2". It's not sufficient to have just the tags - I need
> > all the content of the corresponding part elements in the later part of the
> > file. Is that something that can be done without too much difficulty?
> >
> > Thanks,
> >
> > Michael
> >
> > On Tue, May 24, 2016 at 12:16 PM, Christian Grün <[email protected]>
> > wrote:
> >>
> >> Hi Michael,
> >>
> >> Yes, this can easily be done with XQuery. There are many ways to do
> >> this; here is just one:
> >>
> >> 1. First, create a database from your input file (e.g. with the BaseX GUI)
> >>
> >> 2. Second, run the following query to wrap your partinfo
> >> elements with part elements:
> >>
> >>   //unit/partinfo/(replace node . with <part>{ . }</part>)
> >>
> >> 3. Third, write all page elements to disk:
> >>
> >>   for $page at $c in //page
> >>   return file:write($c || '.xml', $page)
> >>
> >> Hope this helps,
> >> Christian
> >>
> >>
> >>
> >> On Tue, May 24, 2016 at 8:54 PM, Michael Sanborn <[email protected]>
> >> wrote:
> >> > I need to perform a transformation that would be simple in XSLT, but the
> >> > input is a file about 250 MB in size. I'm wondering whether XQuery and
> >> > BaseX in particular would be the most efficient way of doing it. I'm new
> >> > to XQuery, and I've come up with a couple of ways to do this, but they
> >> > turn out to be very time-consuming, so I'm sure I'm Doing It Wrong. Hoping
> >> > to find out the proper way of doing this.
> >> >
> >> > The input consists of 2 sections. There are about 3600 page elements
> >> > with this structure:
> >> >
> >> > <page>
> >> >     [misc page content...]
> >> >     <list>
> >> >         <unit>
> >> >             [misc unit content 1...]
> >> >             <partinfo>
> >> >                 <number>54321</number>
> >> >                 <manuf>A321</manuf>
> >> >             </partinfo>
> >> >             <partinfo>
> >> >                 <number>12345</number>
> >> >                 <manuf>B123</manuf>
> >> >             </partinfo>
> >> >             [misc unit content 2...]
> >> >         </unit>
> >> >         [multiple units...]
> >> >     </list>
> >> > </page>
> >> >
> >> > Each unit can have 1 or 2 partinfo elements. The other section has about
> >> > 82000 part elements like this:
> >> >
> >> > <part>
> >> > <partinfo>
> >> > <number>54321</number>
> >> > <manuf>A321</manuf>
> >> > </partinfo>
> >> > [misc part content 1]
> >> > </part>
> >> > [...]
> >> > <part>
> >> > <partinfo>
> >> > <number>12345</number>
> >> > <manuf>B123</manuf>
> >> > </partinfo>
> >> > [misc part content 2]
> >> > </part>
> >> >
> >> > I want to replace each unit/partinfo with the corresponding part, like
> >> > this:
> >> >
> >> > <page>
> >> >     [misc page content...]
> >> >     <list>
> >> >         <unit>
> >> >             [misc unit content 1...]
> >> >             <part>
> >> >                 <partinfo>
> >> >                     <number>54321</number>
> >> >                     <manuf>A321</manuf>
> >> >                 </partinfo>
> >> >                 [misc part content 1]
> >> >             </part>
> >> >             <part>
> >> >                 <partinfo>
> >> >                     <number>12345</number>
> >> >                     <manuf>B123</manuf>
> >> >                 </partinfo>
> >> >                 [misc part content 2]
> >> >             </part>
> >> >             [misc unit content 2...]
> >> >         </unit>
> >> >         [multiple units...]
> >> >     </list>
> >> > </page>
> >> >
> >> > Is BaseX a good tool for this task? If so, how does one go about it?
> >> >
> >> > Finally, it would help to be able to output each page element in a
> >> > separate file. Would it be better to have BaseX do this, or to output the
> >> > whole database and chunk it with another tool?
> >> >
> >> > Thanks,
> >> >
> >> > Michael
> >
> >
>
