I did at one stage implement this in the webtools XML import routines, and you may find a patch in the old Jira or, more likely (from vague memory), on the mailing list. The problem is that it was a couple of years ago and the XML import routines have changed considerably since then, so it is unlikely to patch against the current SVN; it would be no good for anything other than a quick reference for thoughts on a possible implementation.

It was certainly useful when pushing data in: problems would be left in the XML file, which you could review, modify, and keep pushing as needed until you got it all in, with referential integrity active.

Ray


Daniel Kunkel wrote:
Hi

In general I agree that failing a whole file is appropriate. However, there is at least one table where partial completion would be advantageous, and this feature would make it far easier to find and fix importing errors in the future.

A while back I ran into this with one table in particular; I think it was the product category table.

The table is self-referencing, creating a hierarchy. So even though all of the entries were in the file to be imported, the import function choked on the first item whose parent had not been inserted yet, causing the whole file to fail.

Although it was possible to turn off referential integrity to import the file, another approach would be to extend the cyclical import function to write out a new XML data file containing only the rows that fail during each pass.

If I understand correctly, the current cyclical import function loads XML data files atomically, deleting the files that succeed and retrying any files that are left, again and again, until the number of files does not decrease in a pass.
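As a rough illustration (not the actual OFBiz code), the file-level retry loop described above might look like this, where import_file is a hypothetical callback that atomically loads one XML file and returns True on success:

```python
def cyclical_import(paths, import_file):
    """Retry the remaining files until a full pass makes no progress.

    import_file(path) -> bool is a hypothetical loader that imports
    one XML file atomically (all rows or none). In the real routine,
    files that succeed are deleted from disk; here they are simply
    dropped from the retry list.
    """
    remaining = list(paths)
    while remaining:
        failed = [p for p in remaining if not import_file(p)]
        if len(failed) == len(remaining):
            break  # no file succeeded this pass; stop retrying
        remaining = failed
    return remaining  # files that still fail after the final pass
```

Each pass gives files whose dependencies live in other files another chance to load once those other files have gone in.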

A revised system would perform the same process but work at the level of the row instead of the file. For each file, if a row is not accepted, write that row into a new import XML file.
Rather than stopping when a pass results in no fewer files, stop when a pass fails to add any additional rows to the database.

I think the best part of this scheme is that it separates the needle from the hay when importing data sets with errors. After running it on a data set that has a real referential integrity issue (the needle), all of the good records (the hay) have been removed, helping the developer quickly find and fix the problems.
Thanks
