Re: [GENERAL] Importing Many XML Records

2006-01-27 Thread George Pavlov
 I'm sure that this has been asked before but I can't find any 
 reference to it on Google, and the search facility on 
 postgresql.org is currently down.

http://groups.google.com/groups?q=group%3Apgsql.*

provides the same content with a slight delay, but with an arguably
better user interface.

 I have a large number of entries (possibly 10,000+) in an XML 
 file that I need to import into the database (7.4 on Debian) 
 on a daily basis. Does anyone have any recommendations 
 concerning the best way to do this? Is there some tool I 
 should use, or should I write Java code to parse and 
 import the data?
 
 If anyone has done this before, I would appreciate hearing 
 how they did this.

This is generally outside the scope of this list. I am guessing (since I
don't know much about your data format or goals), but you probably want
to first transform the XML into a format suitable for loading into the
database with COPY, or (much less desirably) into a bunch of INSERT
statements. In either case you should become familiar with XSLT
processing and write yourself an XSLT stylesheet to do the job.
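
Just to sketch the idea (the element and column names here are
invented, since I don't know your format), an XSLT 1.0 stylesheet that
flattens one hypothetical <record> element per CSV line can be as
small as:

  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- emit plain text rather than XML -->
    <xsl:output method="text"/>

    <!-- one CSV line per record; "record", "id", "name" are made up -->
    <xsl:template match="record">
      <xsl:value-of select="id"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:template>

    <!-- keep stray text nodes from leaking into the output -->
    <xsl:template match="text()"/>
  </xsl:stylesheet>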

I deal with a similar task using Saxon and TagSoup (which I highly
recommend for XML that is not well-formed) and create a CSV file out
of a multitude of XML files (or a single XML file), which can then be
loaded into a PG table with COPY. Instead of a CSV file one could
generate a SQL script file of INSERT statements. I recommend Jeni
Tennison's Beginning XSLT book as an excellent reference on the
subject.
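
Concretely, the transform-and-load steps look something like this
(file, jar, database and table names are all invented for the
example; Saxon's -x switch makes it parse the input through TagSoup):

  # jar names/paths vary; com.icl.saxon.StyleSheet is Saxon 6.5's
  # command-line entry point
  java -cp saxon.jar:tagsoup.jar com.icl.saxon.StyleSheet \
      -x org.ccil.cowan.tagsoup.Parser \
      input.xml to-csv.xsl > records.csv

  # load it (COPY's CSV mode only arrives in 8.0, so on 7.4 pick a
  # delimiter that cannot appear inside the data)
  psql -d mydb -c "\copy mytable from 'records.csv' with delimiter ','"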

Depending on what your XML looks like you may be able to get away
without XSLT at all and just preprocess it with awk, sed, or Perl
(Template::Extract is a useful module), or whatever strikes your fancy.
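
A rough sketch of the Template::Extract route, again with invented tag
and field names (and assuming the whole document fits in memory):

  use strict;
  use Template::Extract;

  # slurp the whole document (from stdin here)
  my $xml_text = do { local $/; <STDIN> };

  my $ext  = Template::Extract->new;
  # the template is Template Toolkit syntax run in reverse
  my $data = $ext->extract(
      '[% FOREACH r %]<record><id>[% r.id %]</id>'
    . '<name>[% r.name %]</name></record>[% END %]',
      $xml_text,
  );

  # $data->{r} is an arrayref of hashrefs; emit tab-delimited lines
  print join("\t", @$_{qw(id name)}), "\n" for @{ $data->{r} };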

Another question to answer is whether you want the records to stay as
XML in the database or to be imported into regular table columns. If
the former, you may want to get familiar with the pgxml (a.k.a. xml2)
contrib module so you can query the XML data once it is inside your
database.
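
For example, with each document stored in a text column, xml2's
xpath_string() lets you do things like this (the table name and the
XPaths are hypothetical):

  -- pull scalar values out of each stored document
  SELECT xpath_string(doc, '/record/name')
    FROM xml_staging
   WHERE xpath_string(doc, '/record/id') = '42';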

George


Re: [GENERAL] Importing Many XML Records

2006-01-27 Thread Ron St-Pierre
Thanks George. I just returned from the bookstore and was looking at an XSLT 
solution in one of the books there. I want to import the data into the DB as 
regular data, not as XML. I'll look into Saxon and TagSoup, as well as the Perl 
module you mentioned. As far as this being outside the scope of the list, I 
wasn't sure whether there were Postgres modules to deal with this.

Thanks for pointing me to possible solutions.

Ron