Hi Michael,

Yes, this can easily be done with XQuery. There are many ways to do
this; here is just one:

1. First, create a database from your input file (e.g., with the BaseX GUI).

2. Second, run the following query to wrap your partinfo
elements in part elements:

  //unit/partinfo/(replace node . with <part>{ . }</part>)

3. Third, write all page elements to disk:

  for $page at $c in //page
  return file:write($c || '.xml', $page)
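
Note that the query in step 2 only wraps each unit/partinfo in a new
part element; it does not copy the [misc part content] from the
matching part in the second section. If that content is needed, a
lookup is required. Below is a minimal, untested sketch that could be
run instead of step 2. It assumes that number and manuf together
identify a part, and that the parts of the second section are not
nested inside unit elements. It builds a map from key to part first,
so each lookup is a map access rather than a scan over all ~82000
parts:

```xquery
(: Sketch: replace each unit/partinfo with a copy of the matching part.
   Assumption: number + manuf together identify a part. :)
let $parts-by-key := map:merge(
  (: only parts from the second section, not parts already inside units :)
  for $part in //part[not(ancestor::unit)]
  return map:entry($part/partinfo/number || '|' || $part/partinfo/manuf, $part)
)
for $partinfo in //unit/partinfo
let $part := $parts-by-key($partinfo/number || '|' || $partinfo/manuf)
(: partinfo elements without a matching part are left untouched :)
where exists($part)
return replace node $partinfo with $part
```

XQuery Update inserts a copy of the replacement node, so the same part
can be used in several units. If a key occurs twice, map:merge keeps
the first entry by default. Since updates are only applied at the end
of a query, run this as a separate query before step 3.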

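For step 3, a variant that writes the files into a separate directory
and zero-pads the names (0001.xml, 0002.xml, ...) so they sort in
document order might look like this. The directory name "pages/" is
only an example; file:create-dir and file:write belong to the EXPath
File Module that BaseX provides. Again, this is a sketch, to be run
after the update query has finished:

```xquery
(: Example output directory; adjust as needed :)
let $dir := 'pages/'
return (
  (: create the directory and any missing parents; no error if it exists :)
  file:create-dir($dir),
  for $page at $c in //page
  (: format-integer pads the counter to four digits, e.g. 1 becomes 0001 :)
  return file:write($dir || format-integer($c, '0000') || '.xml', $page)
)
```
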
Hope this helps,
Christian



On Tue, May 24, 2016 at 8:54 PM, Michael Sanborn wrote:
> I need to perform a transformation that would be simple in XSLT, but the
> input is a file of about 250 MB. I'm wondering whether XQuery and
> BaseX in particular would be the most efficient way of doing it. I'm new to
> XQuery, and I've come up with a couple of ways to do this, but they turn out
> to be very time-consuming, so I'm sure I'm Doing It Wrong. Hoping to find
> out the proper way of doing this.
>
> The input consists of 2 sections. There are about 3600 page elements with
> this structure:
>
> <page>
>     [misc page content...]
>     <list>
>         <unit>
>             [misc unit content 1...]
>             <partinfo>
>                 <number>54321</number>
>                 <manuf>A321</manuf>
>             </partinfo>
>             <partinfo>
>                 <number>12345</number>
>                 <manuf>B123</manuf>
>             </partinfo>
>             [misc unit content 2...]
>         </unit>
>         [multiple units...]
>     </list>
> </page>
>
> Each unit can have 1 or 2 partinfo elements. The other section has about
> 82000 part elements like this:
>
> <part>
> <partinfo>
> <number>54321</number>
> <manuf>A321</manuf>
> </partinfo>
> [misc part content 1]
> </part>
> [...]
> <part>
> <partinfo>
> <number>12345</number>
> <manuf>B123</manuf>
> </partinfo>
> [misc part content 2]
> </part>
>
> I want to replace each unit/partinfo with the corresponding part, like this:
>
> <page>
>     [misc page content...]
>     <list>
>         <unit>
>             [misc unit content 1...]
>             <part>
>                 <partinfo>
>                     <number>54321</number>
>                     <manuf>A321</manuf>
>                 </partinfo>
>                 [misc part content 1]
>             </part>
>             <part>
>                 <partinfo>
>                     <number>12345</number>
>                     <manuf>B123</manuf>
>                 </partinfo>
>                 [misc part content 2]
>             </part>
>             [misc unit content 2...]
>         </unit>
>         [multiple units...]
>     </list>
> </page>
>
> Is BaseX a good tool for this task? If so, how does one go about it?
>
> Finally, it would help to be able to output each page element in a separate
> file. Would it be better to have BaseX do this, or to output the whole
> database and chunk it with another tool?
>
> Thanks,
>
> Michael
