Re: XML to RDB

2013-11-25 Thread Johan De Meersman
 ___
| Nodes |
|---|
| id| ---
| [more fields] |   |
| parent_id | --
|___|


And then you join the table with itself as needed. Do note that you'll need a 
self-join for every level you query, and that every self-join incrementally 
slows down the query.

It may be beneficial (actually, I'm pretty sure it will be :-) ) to implement 
the recursion in software as necessary, simply follow the branch you want and 
check wether there are children at each turn; or pick up the child and go back 
up until parent_id is NULL.

Depending on what you want/need, you could also add a depth field, so you don't 
have to loop to figure out where you are in the tree, etc.

Storing node x/y/z as an identifier, as you suggest, is also an option with 
it's own benefits and downsides; nothing prevents you from taking any or all of 
those approaches at the same time.


- Original Message -
 From: Larry Martell larry.mart...@gmail.com
 To: Steven Siebert smsi...@gmail.com
 Cc: mysql mailing list mysql@lists.mysql.com
 Sent: Friday, 22 November, 2013 3:17:44 PM
 Subject: Re: XML to RDB
 
 Yes, I will need to query the data, and yes, it's app specific to the data.
 The parent-node structure will meet my needs if I can figure out how to
 preserver the unbound multi level nodes. There will be a higher level
 table, which will have the file name, date received, and other identifying
 info that will be start of the query. The nodes will have a foreign key
 back to the file table. The queries will be something like get the abcd
 parameter from node x/y/z from the file foo.xml from 10/10/13. I guess I
 can just store the x/y/z in the node table (as opposed to trying to
 represent z is child of y which is a child of x in 3 different tables or
 rows.
 

-- 
Unhappiness is discouraged and will be corrected with kitten pictures.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: XML to RDB

2013-11-25 Thread Steven Siebert
Larry,


If that's the only type of queries you're gonna make, sounds like MySQL can
be a good solution (from an indexing perspective, those queries will be
fine).  Basically, you would be using MySQL as a read-only copy of the XML
content...so your goal here is to understand how you'll be needing to
query, and build the best model (in MySQL) you can to achieve these
queries.  Going back to your initial question, I don't believe there is an
organic MySQL tool (or even a 3rd party tool that I know about) that will
do this for you - you'll probably have to write something, very small, to
do this.  Not sure how much programming experience you have...but something
like a simple SAX parser producing XML element and attribute rows that
you (probably batch) insert into the schema you make will work quite
easily...with very little code.

Be careful with the schema, especially the IDs.  The schema suggested by
Johan will work in simple cases, but might become a problem with unbounded
elements and if you wanted to break out XML arrays, for example (ie more
than one instance of a node's ID).  If you need to preserve sequence, that
might be something you need to consider too.  If you don't foresee the need
to query the parent hierarchy, I wouldn't add it (no parent_id)...not only
would it not be necessary, you have to consider how you want to query
hierarchy...if you need to, checked out the Nested Set model as opposed to
the adjacent list (see h
ttp://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/).
Again, this all depends on what you need...but based on the queries you
indicated, you wouldn't need it.

Good luck!

S




On Mon, Nov 25, 2013 at 4:29 AM, Johan De Meersman vegiv...@tuxera.bewrote:

  ___
 | Nodes |
 |---|
 | id| ---
 | [more fields] |   |
 | parent_id | --
 |___|


 And then you join the table with itself as needed. Do note that you'll
 need a self-join for every level you query, and that every self-join
 incrementally slows down the query.

 It may be beneficial (actually, I'm pretty sure it will be :-) ) to
 implement the recursion in software as necessary, simply follow the branch
 you want and check wether there are children at each turn; or pick up the
 child and go back up until parent_id is NULL.

 Depending on what you want/need, you could also add a depth field, so you
 don't have to loop to figure out where you are in the tree, etc.

 Storing node x/y/z as an identifier, as you suggest, is also an option
 with it's own benefits and downsides; nothing prevents you from taking any
 or all of those approaches at the same time.


 - Original Message -
  From: Larry Martell larry.mart...@gmail.com
  To: Steven Siebert smsi...@gmail.com
  Cc: mysql mailing list mysql@lists.mysql.com
  Sent: Friday, 22 November, 2013 3:17:44 PM
  Subject: Re: XML to RDB
 
  Yes, I will need to query the data, and yes, it's app specific to the
 data.
  The parent-node structure will meet my needs if I can figure out how to
  preserver the unbound multi level nodes. There will be a higher level
  table, which will have the file name, date received, and other
 identifying
  info that will be start of the query. The nodes will have a foreign key
  back to the file table. The queries will be something like get the abcd
  parameter from node x/y/z from the file foo.xml from 10/10/13. I guess I
  can just store the x/y/z in the node table (as opposed to trying to
  represent z is child of y which is a child of x in 3 different tables or
  rows.
 

 --
 Unhappiness is discouraged and will be corrected with kitten pictures.



Re: XML to RDB

2013-11-25 Thread Larry Martell
On Mon, Nov 25, 2013 at 4:29 AM, Johan De Meersman vegiv...@tuxera.bewrote:

  ___
 | Nodes |
 |---|
 | id| ---
 | [more fields] |   |
 | parent_id | --
 |___|


 And then you join the table with itself as needed. Do note that you'll
 need a self-join for every level you query, and that every self-join
 incrementally slows down the query.

 It may be beneficial (actually, I'm pretty sure it will be :-) ) to
 implement the recursion in software as necessary, simply follow the branch
 you want and check wether there are children at each turn; or pick up the
 child and go back up until parent_id is NULL.

 Depending on what you want/need, you could also add a depth field, so you
 don't have to loop to figure out where you are in the tree, etc.

 Storing node x/y/z as an identifier, as you suggest, is also an option
 with it's own benefits and downsides; nothing prevents you from taking any
 or all of those approaches at the same time.


Thanks for the reply. I've had very bad performance in the past with
multiple self joins on large tables. I think I will end up going with using
x/y/z.



 - Original Message -
  From: Larry Martell larry.mart...@gmail.com
  To: Steven Siebert smsi...@gmail.com
  Cc: mysql mailing list mysql@lists.mysql.com
  Sent: Friday, 22 November, 2013 3:17:44 PM
  Subject: Re: XML to RDB
 
  Yes, I will need to query the data, and yes, it's app specific to the
 data.
  The parent-node structure will meet my needs if I can figure out how to
  preserver the unbound multi level nodes. There will be a higher level
  table, which will have the file name, date received, and other
 identifying
  info that will be start of the query. The nodes will have a foreign key
  back to the file table. The queries will be something like get the abcd
  parameter from node x/y/z from the file foo.xml from 10/10/13. I guess I
  can just store the x/y/z in the node table (as opposed to trying to
  represent z is child of y which is a child of x in 3 different tables or
  rows.
 

 --
 Unhappiness is discouraged and will be corrected with kitten pictures.



Re: XML to RDB

2013-11-25 Thread Larry Martell
On Mon, Nov 25, 2013 at 8:29 AM, Steven Siebert smsi...@gmail.com wrote:

 Larry,


 If that's the only type of queries you're gonna make, sounds like MySQL
 can be a good solution (from an indexing perspective, those queries will be
 fine).  Basically, you would be using MySQL as a read-only copy of the XML
 content...so your goal here is to understand how you'll be needing to
 query, and build the best model (in MySQL) you can to achieve these
 queries.  Going back to your initial question, I don't believe there is an
 organic MySQL tool (or even a 3rd party tool that I know about) that will
 do this for you - you'll probably have to write something, very small, to
 do this.  Not sure how much programming experience you have...but something
 like a simple SAX parser producing XML element and attribute rows that
 you (probably batch) insert into the schema you make will work quite
 easily...with very little code.


I've been programing for over 30 years. I already have written some python
code using ElementTree that parses the file.


 Be careful with the schema, especially the IDs.  The schema suggested by
 Johan will work in simple cases, but might become a problem with unbounded
 elements and if you wanted to break out XML arrays, for example (ie more
 than one instance of a node's ID).  If you need to preserve sequence, that
 might be something you need to consider too.  If you don't foresee the need
 to query the parent hierarchy, I wouldn't add it (no parent_id)...not only
 would it not be necessary, you have to consider how you want to query
 hierarchy...if you need to, checked out the Nested Set model as opposed to
 the adjacent list (see h
 ttp://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/).
 Again, this all depends on what you need...but based on the queries you
 indicated, you wouldn't need it.


Thanks for the link - the  Nested Set model looks very interesting, but as
you say, I probably won't need that. This will most likely be an iterative
process as how it will be queried will probably evolve.


 Good luck!

 S




 On Mon, Nov 25, 2013 at 4:29 AM, Johan De Meersman vegiv...@tuxera.bewrote:

  ___
 | Nodes |
 |---|
 | id| ---
 | [more fields] |   |
 | parent_id | --
 |___|


 And then you join the table with itself as needed. Do note that you'll
 need a self-join for every level you query, and that every self-join
 incrementally slows down the query.

 It may be beneficial (actually, I'm pretty sure it will be :-) ) to
 implement the recursion in software as necessary, simply follow the branch
 you want and check wether there are children at each turn; or pick up the
 child and go back up until parent_id is NULL.

 Depending on what you want/need, you could also add a depth field, so you
 don't have to loop to figure out where you are in the tree, etc.

 Storing node x/y/z as an identifier, as you suggest, is also an option
 with it's own benefits and downsides; nothing prevents you from taking any
 or all of those approaches at the same time.


 - Original Message -
  From: Larry Martell larry.mart...@gmail.com
  To: Steven Siebert smsi...@gmail.com
  Cc: mysql mailing list mysql@lists.mysql.com
  Sent: Friday, 22 November, 2013 3:17:44 PM
  Subject: Re: XML to RDB
 
  Yes, I will need to query the data, and yes, it's app specific to the
 data.
  The parent-node structure will meet my needs if I can figure out how to
  preserver the unbound multi level nodes. There will be a higher level
  table, which will have the file name, date received, and other
 identifying
  info that will be start of the query. The nodes will have a foreign key
  back to the file table. The queries will be something like get the abcd
  parameter from node x/y/z from the file foo.xml from 10/10/13. I guess
 I
  can just store the x/y/z in the node table (as opposed to trying to
  represent z is child of y which is a child of x in 3 different tables or
  rows.
 

 --
 Unhappiness is discouraged and will be corrected with kitten pictures.





XML to RDB

2013-11-22 Thread Larry Martell
I have a need to store data that comes in an XML file in MySQL. I don't
want to store the entire file as a CBLOB, and it doesn't lend itself to
loading with Load XML. I am looking for tools that will help me create the
schema and parse and load the data. I have googled for this and I've found
many scholarly papers written about it, but the examples are always with
very simple XML files, and I haven't found any tools. I do have the xsd
schema for the XML file, but what's confusing me about how to design the
RDB schema is the unbounded recursion of the data. The data is basically
nodes that have parameters which have items with values. My first thought
was to simply have a node table and a parameter table with a foreign key
back to the node it belongs to. But when digging into the data I found that
nodes can have sub nodes - in one file I have this goes down for 7 levels,
but in theory there is no bound on that. I'm now sure how to best represent
that.

Any experiences, advice, or pointers are very welcome.

Thanks-
larry


Re: XML to RDB

2013-11-22 Thread Steven Siebert
Hi Larry,

I'm trying to figure out what your wanting to do with the data once its in
mysql?  At first it seemed you didn't want to put it in as a lob because
you might want to query on the data in different ways (kind of an
assumption on my part, but a common reason to do this).  Then, you thought
about making a very generic schema (parent-node) which really doesn't lend
itself well to this goal.

Could you explain what you plan to do with the data in mysql and maybe a
little about the goal of the application?  Is this app specific to this
data, or are you looking to build an app that can take any schema defined
XML file, ingest it, and allow you to do some work with it?

Why is a lob not appropriate for your needs?

S
On Nov 22, 2013 7:24 AM, Larry Martell larry.mart...@gmail.com wrote:

 I have a need to store data that comes in an XML file in MySQL. I don't
 want to store the entire file as a CBLOB, and it doesn't lend itself to
 loading with Load XML. I am looking for tools that will help me create the
 schema and parse and load the data. I have googled for this and I've found
 many scholarly papers written about it, but the examples are always with
 very simple XML files, and I haven't found any tools. I do have the xsd
 schema for the XML file, but what's confusing me about how to design the
 RDB schema is the unbounded recursion of the data. The data is basically
 nodes that have parameters which have items with values. My first thought
 was to simply have a node table and a parameter table with a foreign key
 back to the node it belongs to. But when digging into the data I found that
 nodes can have sub nodes - in one file I have this goes down for 7 levels,
 but in theory there is no bound on that. I'm now sure how to best represent
 that.

 Any experiences, advice, or pointers are very welcome.

 Thanks-
 larry



Re: XML to RDB

2013-11-22 Thread Larry Martell
On Fri, Nov 22, 2013 at 8:05 AM, Steven Siebert smsi...@gmail.com wrote:

 Hi Larry,

 I'm trying to figure out what your wanting to do with the data once its in
 mysql?  At first it seemed you didn't want to put it in as a lob because
 you might want to query on the data in different ways (kind of an
 assumption on my part, but a common reason to do this).  Then, you thought
 about making a very generic schema (parent-node) which really doesn't lend
 itself well to this goal.

 Could you explain what you plan to do with the data in mysql and maybe a
 little about the goal of the application?  Is this app specific to this
 data, or are you looking to build an app that can take any schema defined
 XML file, ingest it, and allow you to do some work with it?

 Why is a lob not appropriate for your needs?


Yes, I will need to query the data, and yes, it's app specific to the data.
The parent-node structure will meet my needs if I can figure out how to
preserver the unbound multi level nodes. There will be a higher level
table, which will have the file name, date received, and other identifying
info that will be start of the query. The nodes will have a foreign key
back to the file table. The queries will be something like get the abcd
parameter from node x/y/z from the file foo.xml from 10/10/13. I guess I
can just store the x/y/z in the node table (as opposed to trying to
represent z is child of y which is a child of x in 3 different tables or
rows.

On Nov 22, 2013 7:24 AM, Larry Martell larry.mart...@gmail.com wrote:

 I have a need to store data that comes in an XML file in MySQL. I don't
 want to store the entire file as a CBLOB, and it doesn't lend itself to
 loading with Load XML. I am looking for tools that will help me create the
 schema and parse and load the data. I have googled for this and I've found
 many scholarly papers written about it, but the examples are always with
 very simple XML files, and I haven't found any tools. I do have the xsd
 schema for the XML file, but what's confusing me about how to design the
 RDB schema is the unbounded recursion of the data. The data is basically
 nodes that have parameters which have items with values. My first thought
 was to simply have a node table and a parameter table with a foreign key
 back to the node it belongs to. But when digging into the data I found
 that
 nodes can have sub nodes - in one file I have this goes down for 7 levels,
 but in theory there is no bound on that. I'm now sure how to best
 represent
 that.

 Any experiences, advice, or pointers are very welcome.

 Thanks-
 larry