On Thu, Feb 16, 2012 at 9:02 PM, Peter wrote:
> On Thu, Feb 16, 2012 at 6:42 PM, Chris wrote:
>> On Feb 16, 2012, at 12:24 PM, Peter wrote:
>>> I also need to look at merging multiple BLAST XML outputs,
>>> but this is looking promising.
>>
>> Yep, that's definitely one where a simple concatenation
>> wouldn't work (though NCBI used to think so, years ago…)
>
> Well, given the NCBI's historic practice of producing 'XML'
> output which was the concatenation of several XML files,
> some tools will tolerate this out of practicality - the Biopython
> BLAST XML parser for example.
>
> But yes, some care is needed over the header/footer to
> ensure a valid XML output is created by the merge. This
> may also require renumbering queries... I will check.

Basic BLAST XML merging implemented and apparently working:
https://bitbucket.org/peterjc/galaxy-central/changeset/ebf65c0b1e26

This does not currently attempt to remap the iteration
numbers or automatically assigned query names, so you
can get this kind of thing in the middle of the XML at a
merge point, where the numbering restarts from 1:

      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-ID>Query_1</Iteration_query-ID>

That isn't a problem for some tools, e.g. my code in
Galaxy to convert BLAST XML to tabular, but I suspect
it could cause trouble elsewhere. If anyone has specific
suggestions for what to test, that would be great.

If this is an issue, then the merge code needs a little
more work to edit these values.
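
For illustration only, here is a minimal sketch of what such renumbering
could look like. It is not the code in the changeset above. The function
name merge_blast_xml is hypothetical, and the sketch assumes that
automatically assigned query names always look like Query_N, as in the
example above (other BLAST versions may use a different pattern). It keeps
the header of the first file and appends the iterations from the rest:

```python
import re
import xml.etree.ElementTree as ET

# Assumption: auto-assigned query names look like "Query_1", "Query_2", ...
AUTO_QUERY_ID = re.compile(r"Query_\d+")


def merge_blast_xml(xml_texts):
    """Merge BLAST XML documents (given as strings) into one element tree.

    Hypothetical helper: keeps the first document's header, appends the
    <Iteration> elements of the others, then renumbers them from 1.
    """
    merged_root = None
    merged_iterations = None
    for text in xml_texts:
        root = ET.fromstring(text)
        iterations = root.find("BlastOutput_iterations")
        if iterations is None:
            raise ValueError("no <BlastOutput_iterations> element found")
        if merged_root is None:
            # The first file supplies the header and the container element.
            merged_root, merged_iterations = root, iterations
        else:
            merged_iterations.extend(iterations.findall("Iteration"))
    if merged_root is None:
        raise ValueError("nothing to merge")

    # Renumber so iteration numbers run consecutively across merge points.
    all_iterations = merged_iterations.findall("Iteration")
    for number, iteration in enumerate(all_iterations, start=1):
        iter_num = iteration.find("Iteration_iter-num")
        if iter_num is not None:
            iter_num.text = str(number)
        query_id = iteration.find("Iteration_query-ID")
        # Only touch names that look auto-assigned; leave real IDs alone.
        if query_id is not None and AUTO_QUERY_ID.fullmatch(query_id.text or ""):
            query_id.text = "Query_%d" % number
    return merged_root
```

Note that ElementTree drops the DOCTYPE line when parsing, so it would have
to be written back by hand when saving, and the query fields in the
top-level header are left as they were in the first file.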

I think the FASTA split code could be reviewed for
inclusion though. Dan - do you want to look at that?
Would a clean branch help?

Peter

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
