Seems like this would be perfect. I do need both number and manuf. Using your combination map, I'm now getting an "Out of Main Memory" error. Tried on a second computer - same issue. Would it be more likely to work if I tried it from the command line rather than the GUI? If so, I'll need to look up how to create a database that way, but I'm sure it's close to hand. Or is there a better workaround (besides buying a computer with more than 8GB of RAM)?
Thanks again,
Michael

On Tue, May 24, 2016 at 2:10 PM, Christian Grün wrote:
> Maybe you need something like this:
>
>   for $partinfo in //unit/partinfo
>   for $part in //part[deep-equal(partinfo, $partinfo)]
>   return replace node $partinfo with $part/node()
>
> The deep-equal will be pretty slow. If the value of the number element
> is unique, you could do something like this:
>
>   for $partinfo in //unit/partinfo
>   let $number := $partinfo/number
>   let $part := //part[partinfo/number = $number]
>   return replace node $partinfo with $part/node()
>
> Using a map will be even faster:
>
>   let $map := map:merge(//part/map:entry(partinfo/number/text(), .))
>   for $partinfo in //unit/partinfo
>   let $part := $map($partinfo/number)
>   return replace node $partinfo with $part/node()
>
> If you need to consider both number and manuf, you could e.g. combine
> these two in the map:
>
>   let $map := map:merge(
>     for $part in //part
>     return map:entry(string-join($part/partinfo/*, '/'), $part)
>   )
>   for $partinfo in //unit/partinfo
>   let $part := $map(string-join($partinfo/*, '/'))
>   return replace node $partinfo with $part/node()
>
> Does this help?
> Christian
>
>
> On Tue, May 24, 2016 at 10:54 PM, Michael Sanborn wrote:
> > Thanks for that. The trouble in step 2 is, just wrapping partinfo with
> > the part element doesn't get me what I've labelled "misc part content 1"
> > and "misc part content 2". It's not sufficient to have just the tags - I
> > need all the content of the corresponding part elements in the later
> > part of the file. Is that something that can be done without too much
> > difficulty?
> >
> > Thanks,
> >
> > Michael
> >
> > On Tue, May 24, 2016 at 12:16 PM, Christian Grün wrote:
> >>
> >> Hi Michael,
> >>
> >> Yes, this can easily be done with XQuery. There are many ways to do
> >> this; here is just one:
> >>
> >> 1. First, create a database from your input file (e.g. with the BaseX
> >> GUI)
> >>
> >> 2. Second, run the following query to wrap your partinfo
> >> elements with part elements:
> >>
> >>   //unit/partinfo/(replace node . with <part>{ . }</part>)
> >>
> >> 3. Third, write all page elements to disk:
> >>
> >>   for $page at $c in //page
> >>   return file:write($c || '.xml', $page)
> >>
> >> Hope this helps,
> >> Christian
> >>
> >>
> >> On Tue, May 24, 2016 at 8:54 PM, Michael Sanborn wrote:
> >> > I need to perform a transformation that would be simple in XSLT, but
> >> > the input is a file about 250 MB in size. I'm wondering whether XQuery,
> >> > and BaseX in particular, would be the most efficient way of doing it.
> >> > I'm new to XQuery, and I've come up with a couple of ways to do this,
> >> > but they turn out to be very time-consuming, so I'm sure I'm Doing It
> >> > Wrong. Hoping to find out the proper way of doing this.
> >> >
> >> > The input consists of 2 sections. There are about 3600 page elements
> >> > with this structure:
> >> >
> >> > <page>
> >> >   [misc page content...]
> >> >   <list>
> >> >     <unit>
> >> >       [misc unit content 1...]
> >> >       <partinfo>
> >> >         <number>54321</number>
> >> >         <manuf>A321</manuf>
> >> >       </partinfo>
> >> >       <partinfo>
> >> >         <number>12345</number>
> >> >         <manuf>B123</manuf>
> >> >       </partinfo>
> >> >       [misc unit content 2...]
> >> >     </unit>
> >> >     [multiple units...]
> >> >   </list>
> >> > </page>
> >> >
> >> > Each unit can have 1 or 2 partinfo elements. The other section has
> >> > about 82000 part elements like this:
> >> >
> >> > <part>
> >> >   <partinfo>
> >> >     <number>54321</number>
> >> >     <manuf>A321</manuf>
> >> >   </partinfo>
> >> >   [misc part content 1]
> >> > </part>
> >> > [...]
> >> > <part>
> >> >   <partinfo>
> >> >     <number>12345</number>
> >> >     <manuf>B123</manuf>
> >> >   </partinfo>
> >> >   [misc part content 2]
> >> > </part>
> >> >
> >> > I want to replace each unit/partinfo with the corresponding part, like
> >> > this:
> >> >
> >> > <page>
> >> >   [misc page content...]
> >> >   <list>
> >> >     <unit>
> >> >       [misc unit content 1...]
> >> >       <part>
> >> >         <partinfo>
> >> >           <number>54321</number>
> >> >           <manuf>A321</manuf>
> >> >         </partinfo>
> >> >         [misc part content 1]
> >> >       </part>
> >> >       <part>
> >> >         <partinfo>
> >> >           <number>12345</number>
> >> >           <manuf>B123</manuf>
> >> >         </partinfo>
> >> >         [misc part content 2]
> >> >       </part>
> >> >       [misc unit content 2...]
> >> >     </unit>
> >> >     [multiple units...]
> >> >   </list>
> >> > </page>
> >> >
> >> > Is BaseX a good tool for this task? If so, how does one go about it?
> >> >
> >> > Finally, it would help to be able to output each page element in a
> >> > separate file. Would it be better to have BaseX do this, or to output
> >> > the whole database and chunk it with another tool?
> >> >
> >> > Thanks,
> >> >
> >> > Michael
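
On the "Out of Main Memory" question, one thing that might be worth trying is to skip the database update altogether and build each page in memory just before writing it out. The following is only a sketch and has not been run against the 250 MB input. It reuses the combined number/manuf map from the reply quoted above, the standard copy/modify/return expression from XQuery Update, and the file:write call already used in the thread. Two things here are assumptions rather than part of the thread: the `where exists($part)` guard, which leaves a partinfo alone when no matching part is found, and replacing partinfo with the whole part element (as in the desired output) rather than with `$part/node()`.

```xquery
(: Sketch only: assumes the database created from the input file is open. :)
let $map := map:merge(
  for $part in //part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
for $page at $c in //page
let $merged := (
  (: work on an in-memory copy of one page; the database is not modified :)
  copy $p := $page
  modify (
    for $partinfo in $p//unit/partinfo
    let $part := $map(string-join($partinfo/*, '/'))
    (: assumption: keep the partinfo unchanged if no part matches :)
    where exists($part)
    return replace node $partinfo with $part
  )
  return $p
)
return file:write($c || '.xml', $merged)
```

Because only one page is copied and modified at a time, this may need less memory than replacing every partinfo in the database in a single updating query, but whether it fits in 8 GB for this input would have to be tested.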

