Re: XML to RDB
___ | Nodes | |---| | id| --- | [more fields] | | | parent_id | -- |___| And then you join the table with itself as needed. Do note that you'll need a self-join for every level you query, and that every self-join incrementally slows down the query. It may be beneficial (actually, I'm pretty sure it will be :-) ) to implement the recursion in software as necessary, simply follow the branch you want and check wether there are children at each turn; or pick up the child and go back up until parent_id is NULL. Depending on what you want/need, you could also add a depth field, so you don't have to loop to figure out where you are in the tree, etc. Storing node x/y/z as an identifier, as you suggest, is also an option with it's own benefits and downsides; nothing prevents you from taking any or all of those approaches at the same time. - Original Message - From: Larry Martell larry.mart...@gmail.com To: Steven Siebert smsi...@gmail.com Cc: mysql mailing list mysql@lists.mysql.com Sent: Friday, 22 November, 2013 3:17:44 PM Subject: Re: XML to RDB Yes, I will need to query the data, and yes, it's app specific to the data. The parent-node structure will meet my needs if I can figure out how to preserver the unbound multi level nodes. There will be a higher level table, which will have the file name, date received, and other identifying info that will be start of the query. The nodes will have a foreign key back to the file table. The queries will be something like get the abcd parameter from node x/y/z from the file foo.xml from 10/10/13. I guess I can just store the x/y/z in the node table (as opposed to trying to represent z is child of y which is a child of x in 3 different tables or rows. -- Unhappiness is discouraged and will be corrected with kitten pictures. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: XML to RDB
Larry, If that's the only type of queries you're gonna make, sounds like MySQL can be a good solution (from an indexing perspective, those queries will be fine). Basically, you would be using MySQL as a read-only copy of the XML content...so your goal here is to understand how you'll be needing to query, and build the best model (in MySQL) you can to achieve these queries. Going back to your initial question, I don't believe there is an organic MySQL tool (or even a 3rd party tool that I know about) that will do this for you - you'll probably have to write something, very small, to do this. Not sure how much programming experience you have...but something like a simple SAX parser producing XML element and attribute rows that you (probably batch) insert into the schema you make will work quite easily...with very little code. Be careful with the schema, especially the IDs. The schema suggested by Johan will work in simple cases, but might become a problem with unbounded elements and if you wanted to break out XML arrays, for example (ie more than one instance of a node's ID). If you need to preserve sequence, that might be something you need to consider too. If you don't foresee the need to query the parent hierarchy, I wouldn't add it (no parent_id)...not only would it not be necessary, you have to consider how you want to query hierarchy...if you need to, checked out the Nested Set model as opposed to the adjacent list (see h ttp://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/). Again, this all depends on what you need...but based on the queries you indicated, you wouldn't need it. Good luck! S On Mon, Nov 25, 2013 at 4:29 AM, Johan De Meersman vegiv...@tuxera.bewrote: ___ | Nodes | |---| | id| --- | [more fields] | | | parent_id | -- |___| And then you join the table with itself as needed. Do note that you'll need a self-join for every level you query, and that every self-join incrementally slows down the query. It may be beneficial (actually, I'm pretty sure it will be :-) ) to implement the recursion in software as necessary, simply follow the branch you want and check wether there are children at each turn; or pick up the child and go back up until parent_id is NULL. Depending on what you want/need, you could also add a depth field, so you don't have to loop to figure out where you are in the tree, etc. Storing node x/y/z as an identifier, as you suggest, is also an option with it's own benefits and downsides; nothing prevents you from taking any or all of those approaches at the same time. - Original Message - From: Larry Martell larry.mart...@gmail.com To: Steven Siebert smsi...@gmail.com Cc: mysql mailing list mysql@lists.mysql.com Sent: Friday, 22 November, 2013 3:17:44 PM Subject: Re: XML to RDB Yes, I will need to query the data, and yes, it's app specific to the data. The parent-node structure will meet my needs if I can figure out how to preserver the unbound multi level nodes. There will be a higher level table, which will have the file name, date received, and other identifying info that will be start of the query. The nodes will have a foreign key back to the file table. The queries will be something like get the abcd parameter from node x/y/z from the file foo.xml from 10/10/13. I guess I can just store the x/y/z in the node table (as opposed to trying to represent z is child of y which is a child of x in 3 different tables or rows. -- Unhappiness is discouraged and will be corrected with kitten pictures.
Re: XML to RDB
On Mon, Nov 25, 2013 at 4:29 AM, Johan De Meersman vegiv...@tuxera.bewrote: ___ | Nodes | |---| | id| --- | [more fields] | | | parent_id | -- |___| And then you join the table with itself as needed. Do note that you'll need a self-join for every level you query, and that every self-join incrementally slows down the query. It may be beneficial (actually, I'm pretty sure it will be :-) ) to implement the recursion in software as necessary, simply follow the branch you want and check wether there are children at each turn; or pick up the child and go back up until parent_id is NULL. Depending on what you want/need, you could also add a depth field, so you don't have to loop to figure out where you are in the tree, etc. Storing node x/y/z as an identifier, as you suggest, is also an option with it's own benefits and downsides; nothing prevents you from taking any or all of those approaches at the same time. Thanks for the reply. I've had very bad performance in the past with multiple self joins on large tables. I think I will end up going with using x/y/z. - Original Message - From: Larry Martell larry.mart...@gmail.com To: Steven Siebert smsi...@gmail.com Cc: mysql mailing list mysql@lists.mysql.com Sent: Friday, 22 November, 2013 3:17:44 PM Subject: Re: XML to RDB Yes, I will need to query the data, and yes, it's app specific to the data. The parent-node structure will meet my needs if I can figure out how to preserver the unbound multi level nodes. There will be a higher level table, which will have the file name, date received, and other identifying info that will be start of the query. The nodes will have a foreign key back to the file table. The queries will be something like get the abcd parameter from node x/y/z from the file foo.xml from 10/10/13. I guess I can just store the x/y/z in the node table (as opposed to trying to represent z is child of y which is a child of x in 3 different tables or rows. -- Unhappiness is discouraged and will be corrected with kitten pictures.
Re: XML to RDB
On Mon, Nov 25, 2013 at 8:29 AM, Steven Siebert smsi...@gmail.com wrote: Larry, If that's the only type of queries you're gonna make, sounds like MySQL can be a good solution (from an indexing perspective, those queries will be fine). Basically, you would be using MySQL as a read-only copy of the XML content...so your goal here is to understand how you'll be needing to query, and build the best model (in MySQL) you can to achieve these queries. Going back to your initial question, I don't believe there is an organic MySQL tool (or even a 3rd party tool that I know about) that will do this for you - you'll probably have to write something, very small, to do this. Not sure how much programming experience you have...but something like a simple SAX parser producing XML element and attribute rows that you (probably batch) insert into the schema you make will work quite easily...with very little code. I've been programing for over 30 years. I already have written some python code using ElementTree that parses the file. Be careful with the schema, especially the IDs. The schema suggested by Johan will work in simple cases, but might become a problem with unbounded elements and if you wanted to break out XML arrays, for example (ie more than one instance of a node's ID). If you need to preserve sequence, that might be something you need to consider too. If you don't foresee the need to query the parent hierarchy, I wouldn't add it (no parent_id)...not only would it not be necessary, you have to consider how you want to query hierarchy...if you need to, checked out the Nested Set model as opposed to the adjacent list (see h ttp://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/). Again, this all depends on what you need...but based on the queries you indicated, you wouldn't need it. Thanks for the link - the Nested Set model looks very interesting, but as you say, I probably won't need that. This will most likely be an iterative process as how it will be queried will probably evolve. Good luck! S On Mon, Nov 25, 2013 at 4:29 AM, Johan De Meersman vegiv...@tuxera.bewrote: ___ | Nodes | |---| | id| --- | [more fields] | | | parent_id | -- |___| And then you join the table with itself as needed. Do note that you'll need a self-join for every level you query, and that every self-join incrementally slows down the query. It may be beneficial (actually, I'm pretty sure it will be :-) ) to implement the recursion in software as necessary, simply follow the branch you want and check wether there are children at each turn; or pick up the child and go back up until parent_id is NULL. Depending on what you want/need, you could also add a depth field, so you don't have to loop to figure out where you are in the tree, etc. Storing node x/y/z as an identifier, as you suggest, is also an option with it's own benefits and downsides; nothing prevents you from taking any or all of those approaches at the same time. - Original Message - From: Larry Martell larry.mart...@gmail.com To: Steven Siebert smsi...@gmail.com Cc: mysql mailing list mysql@lists.mysql.com Sent: Friday, 22 November, 2013 3:17:44 PM Subject: Re: XML to RDB Yes, I will need to query the data, and yes, it's app specific to the data. The parent-node structure will meet my needs if I can figure out how to preserver the unbound multi level nodes. There will be a higher level table, which will have the file name, date received, and other identifying info that will be start of the query. The nodes will have a foreign key back to the file table. The queries will be something like get the abcd parameter from node x/y/z from the file foo.xml from 10/10/13. I guess I can just store the x/y/z in the node table (as opposed to trying to represent z is child of y which is a child of x in 3 different tables or rows. -- Unhappiness is discouraged and will be corrected with kitten pictures.
XML to RDB
I have a need to store data that comes in an XML file in MySQL. I don't want to store the entire file as a CBLOB, and it doesn't lend itself to loading with Load XML. I am looking for tools that will help me create the schema and parse and load the data. I have googled for this and I've found many scholarly papers written about it, but the examples are always with very simple XML files, and I haven't found any tools. I do have the xsd schema for the XML file, but what's confusing me about how to design the RDB schema is the unbounded recursion of the data. The data is basically nodes that have parameters which have items with values. My first thought was to simply have a node table and a parameter table with a foreign key back to the node it belongs to. But when digging into the data I found that nodes can have sub nodes - in one file I have this goes down for 7 levels, but in theory there is no bound on that. I'm now sure how to best represent that. Any experiences, advice, or pointers are very welcome. Thanks- larry
Re: XML to RDB
Hi Larry, I'm trying to figure out what your wanting to do with the data once its in mysql? At first it seemed you didn't want to put it in as a lob because you might want to query on the data in different ways (kind of an assumption on my part, but a common reason to do this). Then, you thought about making a very generic schema (parent-node) which really doesn't lend itself well to this goal. Could you explain what you plan to do with the data in mysql and maybe a little about the goal of the application? Is this app specific to this data, or are you looking to build an app that can take any schema defined XML file, ingest it, and allow you to do some work with it? Why is a lob not appropriate for your needs? S On Nov 22, 2013 7:24 AM, Larry Martell larry.mart...@gmail.com wrote: I have a need to store data that comes in an XML file in MySQL. I don't want to store the entire file as a CBLOB, and it doesn't lend itself to loading with Load XML. I am looking for tools that will help me create the schema and parse and load the data. I have googled for this and I've found many scholarly papers written about it, but the examples are always with very simple XML files, and I haven't found any tools. I do have the xsd schema for the XML file, but what's confusing me about how to design the RDB schema is the unbounded recursion of the data. The data is basically nodes that have parameters which have items with values. My first thought was to simply have a node table and a parameter table with a foreign key back to the node it belongs to. But when digging into the data I found that nodes can have sub nodes - in one file I have this goes down for 7 levels, but in theory there is no bound on that. I'm now sure how to best represent that. Any experiences, advice, or pointers are very welcome. Thanks- larry
Re: XML to RDB
On Fri, Nov 22, 2013 at 8:05 AM, Steven Siebert smsi...@gmail.com wrote: Hi Larry, I'm trying to figure out what your wanting to do with the data once its in mysql? At first it seemed you didn't want to put it in as a lob because you might want to query on the data in different ways (kind of an assumption on my part, but a common reason to do this). Then, you thought about making a very generic schema (parent-node) which really doesn't lend itself well to this goal. Could you explain what you plan to do with the data in mysql and maybe a little about the goal of the application? Is this app specific to this data, or are you looking to build an app that can take any schema defined XML file, ingest it, and allow you to do some work with it? Why is a lob not appropriate for your needs? Yes, I will need to query the data, and yes, it's app specific to the data. The parent-node structure will meet my needs if I can figure out how to preserver the unbound multi level nodes. There will be a higher level table, which will have the file name, date received, and other identifying info that will be start of the query. The nodes will have a foreign key back to the file table. The queries will be something like get the abcd parameter from node x/y/z from the file foo.xml from 10/10/13. I guess I can just store the x/y/z in the node table (as opposed to trying to represent z is child of y which is a child of x in 3 different tables or rows. On Nov 22, 2013 7:24 AM, Larry Martell larry.mart...@gmail.com wrote: I have a need to store data that comes in an XML file in MySQL. I don't want to store the entire file as a CBLOB, and it doesn't lend itself to loading with Load XML. I am looking for tools that will help me create the schema and parse and load the data. I have googled for this and I've found many scholarly papers written about it, but the examples are always with very simple XML files, and I haven't found any tools. I do have the xsd schema for the XML file, but what's confusing me about how to design the RDB schema is the unbounded recursion of the data. The data is basically nodes that have parameters which have items with values. My first thought was to simply have a node table and a parameter table with a foreign key back to the node it belongs to. But when digging into the data I found that nodes can have sub nodes - in one file I have this goes down for 7 levels, but in theory there is no bound on that. I'm now sure how to best represent that. Any experiences, advice, or pointers are very welcome. Thanks- larry