RE: Big XML files processing Really s-l-o-w. Solution?

2011-02-19 Thread Mark A. Kruger

Matt,

Very cool - an innovative approach I would not have thought of. Of course we
all know that Jochem's a certified genius :)

-Mark

Mark A. Kruger, MCSE, CFG
(402) 408-3733 ext 105
Skype: markakruger
www.cfwebtools.com
www.coldfusionmuse.com
www.necfug.com




~|
Order the Adobe Coldfusion Anthology now!
http://www.amazon.com/Adobe-Coldfusion-Anthology/dp/1430272155/?tag=houseoffusion
Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:342453
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/groups/cf-talk/unsubscribe.cfm


RE: Big XML files processing Really s-l-o-w. Solution?

2011-02-19 Thread Rick Faircloth

Nice...






Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-19 Thread Matt Quackenbush

Nice report!  I'll try and remember that the next time I have to parse huge
XML files.  :-)




Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-19 Thread Matt Robertson

Here's the update:

Jochem's StAX-based solution worked so well -- and it allowed me to
re-use existing code since I was already processing xml -- that I
didn't bother to consider testing the xml2csv utility.

Original code processing a 45 MB file yielded an insert time of 90
seconds per record, for a projected total processing time of 66 hours.

Using code that incorporated xmlSplitter.cfc, the routine created 2682
discrete XML files in approximately 45 seconds.  From there, the
insert loop did its work, pulling out the XML and pouring it into roughly
100 db table fields at ... 20 records per second.

Total time to read in the 45 MB XML file, create the discrete files,
read them, insert them into the db and delete them (one at a time as I
went along in the loop) was 192,250 ms.
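The split-then-insert loop Matt describes can be sketched like this. It is a hypothetical illustration: an in-memory sqlite3 table stands in for the real MySQL/SQL Server table, the file names are invented, and the record/field names (dataRecord, recordID, year, make, model) are borrowed from the sample queries later in the thread.

```python
import glob
import os
import sqlite3
import tempfile
import xml.etree.ElementTree as ET

# sqlite3 in memory stands in for the real database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (recordID TEXT, year TEXT, make TEXT, model TEXT)")

# Pretend the splitter already dropped its discrete files here.
split_dir = tempfile.mkdtemp()
with open(os.path.join(split_dir, "rec_0001.xml"), "w") as f:
    f.write("<dataRecord><recordID>123</recordID><year>1980</year>"
            "<make>Trabant</make><model>TurboXL</model></dataRecord>")

# The insert loop: parse each small document, extract the fields,
# insert one row, then delete the file as we go (as in the thread).
for path in sorted(glob.glob(os.path.join(split_dir, "rec_*.xml"))):
    rec = ET.parse(path).getroot()
    conn.execute("INSERT INTO records VALUES (?, ?, ?, ?)",
                 tuple(rec.findtext(tag) for tag in ("recordID", "year", "make", "model")))
    os.remove(path)

print(conn.execute("SELECT recordID, make FROM records").fetchall())
```

Each small parse stays cheap, so the per-record cost no longer grows with the size of the original feed.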

A bit more than three minutes.

That's an acceptable level of improvement.

I owe you one, Jochem.

-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-17 Thread Matt Robertson

On Thu, Feb 17, 2011 at 8:56 AM, Rick Faircloth wrote:
>
> Good to know!

Indeed.  Thanks to Marco and Mark.  I plan on trying out both this and
Jochem's solution, starting maybe as soon as tomorrow and certainly
through the long U.S. holiday weekend, to see which is faster.  I'll
post back the results.

-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com



RE: Big XML files processing Really s-l-o-w. Solution?

2011-02-17 Thread Rick Faircloth

Good to know!






RE: Big XML files processing Really s-l-o-w. Solution?

2011-02-17 Thread Mark A. Kruger

We've used this approach as well... it seems like a kludge, but it actually
works really well. The MS Text driver and DTS (or SSIS) are exceptionally
fast at importing textual data. We have a process that consumes 350,000
rows of data in about 5-8 seconds... and does it every 2 minutes.

-Mark







Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-17 Thread Marco Antonio C. Santos

In our company I found a solution for parsing big XML files in ColdFusion
applications:

1) use xml2csv (http://www.a7soft.com/xml2csv.html), a very fast
application, to convert the XML to a CSV file;
2) use SQL Server BULK INSERT to load the CSV into our database.

This approach cut the processing time by a factor of 100. Very fast.

Cheers

Marco Antonio
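Marco's two-step pipeline can be sketched roughly as below, assuming a simple flat record shape. The csv module stands in for the xml2csv utility, the element names are hypothetical, and the BULK INSERT statement is shown as a comment with the T-SQL you would run once the CSV is on disk (path and table name invented for illustration).

```python
import csv
import io
import xml.etree.ElementTree as ET

xml_feed = """<feed>
  <dataRecord><recordID>123</recordID><make>Trabant</make></dataRecord>
  <dataRecord><recordID>124</recordID><make>Ford</make></dataRecord>
</feed>"""

# Step 1: flatten each record element into one CSV row
# (this is the job xml2csv does, much faster, outside CF).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["recordID", "make"])
for rec in ET.fromstring(xml_feed).iter("dataRecord"):
    writer.writerow([rec.findtext("recordID"), rec.findtext("make")])

print(out.getvalue().splitlines()[1])  # -> 123,Trabant

# Step 2 (T-SQL, run on SQL Server once the CSV is on disk --
# path and table name are placeholders):
#   BULK INSERT dbo.records
#   FROM 'C:\feeds\records.csv'
#   WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);
```

The win comes from the bulk load being set-based: the database ingests the whole file in one operation instead of one INSERT round-trip per record.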




RE: Big XML files processing Really s-l-o-w. Solution?

2011-02-17 Thread Rick Faircloth

No problem... I haven't done much with XML, myself, so I thought
this presented a good opportunity to learn something new, too.

Let us know how the solution you're working on pans out.




-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com





Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-17 Thread Matt Robertson

On Wed, Feb 16, 2011 at 8:08 PM, Rick Faircloth wrote:
> Perhaps this will be of some help.

Thanks, Rick.  Interesting stuff.  I don't work with XML much, so a
lot of this is new, and I hate new :-).  I haven't been a full-time
programmer for a few years now, so I haven't kept up with everything
like I would have otherwise.

I have a business trip today so I have to set this aside but I'm going
to run Jochem's solution - which is presented darn near turnkey - as
soon as I can tomorrow and see what happens.

-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com



RE: Big XML files processing Really s-l-o-w. Solution?

2011-02-16 Thread Rick Faircloth

Hi, Matt...

I've been tinkering with parsing the xml file and, though
I haven't developed a full solution, I've made some progress
that I thought I'd share.

I created a MySQL 5 database called "xmlTest".
I created a table in it called "xmlBlob".

Fields:

- rowID (int, not null, auto increment)
- xmlContent (blob)
- comment (varchar, 200)

I then ran this code:

<!--- The CFML tag markup in this snippet was stripped by the list
      archive; the tags below are reconstructed, and the datasource
      name is assumed. --->

<cfset filePath = "...">
<cfoutput>filePath = #filePath# (checking accuracy)</cfoutput>

<cfquery datasource="xmlTest">
    select  xmlContent
    from    xmlBlob
    limit   1
    into    @xml
</cfquery>

Data extraction method 1:

<cfquery name="extractData01" datasource="xmlTest">
    select  extractValue(@xml, '//dataRecord[1]/recordID[1]') recordID
</cfquery>

<cfoutput>cfdump toString(extractData01.recordID) = #toString(extractData01.recordID)#</cfoutput>

Data extraction method 2:

<cfquery name="extractData02" datasource="xmlTest">
    select  extractValue(b.xmlContent, '//dataRecord/recordID') recordID,
            extractValue(b.xmlContent, '//dataRecord/year')  year,
            extractValue(b.xmlContent, '//dataRecord/make')  make,
            extractValue(b.xmlContent, '//dataRecord/model') model
    from    xmlBlob b
</cfquery>

<cfoutput>
extractData02.recordCount = #extractData02.recordCount#

#toString(extractData02.recordID)#
#toString(extractData02.year)#
#toString(extractData02.make)#
#toString(extractData02.model)#
</cfoutput>

---

When the code above is run in a browser, it outputs:

cfdump toString(extractData01.recordID) = 123
extractData02.recordCount = 1

123 124
1980 1981
Trabant Ford
TurboXL TurboDL

---

I was just to the point of figuring out how to get the two records
(I modified your data.xml file example so there would be two records)
to show separately, instead of having each field's data come back as
a list, when I saw your post. I thought I'd go ahead and respond so
you could see if this might be an approach that would work for you.
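On the values-coming-back-as-a-list behavior: MySQL's ExtractValue() concatenates all matching nodes into one space-separated string (hence the "123 124" output above), so the usual fix is a positional predicate per record, looping over '//dataRecord[1]/...', '//dataRecord[2]/...', and so on. The same xpath behavior, illustrated here with Python's ElementTree as a stand-in for the MySQL function:

```python
import xml.etree.ElementTree as ET

feed = ET.fromstring(
    "<xmlContent>"
    "<dataRecord><recordID>123</recordID><make>Trabant</make></dataRecord>"
    "<dataRecord><recordID>124</recordID><make>Ford</make></dataRecord>"
    "</xmlContent>")

# All matches at once -- this is what produces the "123 124"-style
# list when the path hits more than one node.
all_ids = [e.text for e in feed.findall("dataRecord/recordID")]
print(" ".join(all_ids))  # -> 123 124

# A positional predicate keeps each record separate, the equivalent of
# extractValue(@xml, '//dataRecord[2]/recordID') in the MySQL syntax:
second = feed.find("dataRecord[2]/recordID").text
print(second)  # -> 124
```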

In the data extraction method 1 above, the idea is to insert
the xml file (which might be too large for this?)
into a blob field in a MySQL table, then read
the blob field and put the xml data file into a variable, '@xml',
then run a cfquery against the variable.

Data extraction method 2 skips reading the xml into a variable
and queries the blob field directly.  This method uses the
MySQL 'xpath_expression' syntax to access data in multi-level
xml data files. With xpath, you can go as deeply into the levels
as needed.

Like I said, this is just the result of tinkering around with
what I've been finding and attempting to get some usable output.

Perhaps this will be of some help.

Rick








Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-16 Thread Matt Robertson

On Wed, Feb 16, 2011 at 3:49 PM, Jochem van Dieten wrote:
> Would it help if you split the file into individual records before
> processing them?

I think it would help immensely.  I'd prefer to fight my way
through this in CF if at all possible, but things are just complex enough
that I may need to go the route of StAX.  It's outside my comfort zone in
terms of installing it on the server, but I can likely figure it out.

This morning I investigated the possibility of dropping into Java and
reading one line at a time, parsing out the text in that line... it's
what I do in this file's CSV-format predecessor.  But there are no
CRs or LFs in the file, so that is out.

Thanks very much for that post and your thoughts.

Incidentally, if anyone is looking for an XML reader/editor that can
almost instantly read these kinds of monster files, check out the free
firstobject xml editor.

http://www.firstobject.com/

-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-16 Thread Jochem van Dieten

On Mon, Feb 14, 2011 at 11:44 PM, Matt Robertson wrote:
> I am tasked with doing an import on a file whose size can essentially
> be unlimited.  We've been able to handle the 10-15 MB range, but it
> recently ballooned to 100 MB, and it's going to get larger.  Processing
> time seems to be about 66 hours for a 45 MB file, and that's a disaster.

Would it help if you split the file into individual records before
processing them?
http://jochem.vandieten.net/2011/02/17/splitting-large-xml-files-with-coldfusion-and-stax/

Jochem


-- 
Jochem van Dieten
http://jochem.vandieten.net/
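The splitter in Jochem's write-up is built on Java StAX; the same streaming idea can be sketched in Python with ElementTree's iterparse, which likewise walks the document event by event instead of building the whole tree. The record element name dataRecord is an assumption for illustration; the real feed's record element will differ.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

def split_records(source, record_tag):
    """Stream through the XML and return each record element as its own
    small document, without ever loading the whole file into memory."""
    records = []
    # iterparse fires an "end" event as each element closes, so memory
    # use stays bounded by roughly one record at a time.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            records.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # free the finished record
    return records

feed = (b"<feed><dataRecord><recordID>123</recordID></dataRecord>"
        b"<dataRecord><recordID>124</recordID></dataRecord></feed>")
parts = split_records(BytesIO(feed), "dataRecord")
print(len(parts))  # one discrete document per record -> 2
```

In the real routine each returned chunk would be written out as one of the discrete files, which is what lets the per-record processing stay flat no matter how large the feed grows.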



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-16 Thread Russ Michaels

Matt, since you want a non-CF solution, you might be best off posting
elsewhere to find a database guru who could do it all directly on the DB.
If no one here pipes up, then try www.odesk.com.






Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-16 Thread Matt Robertson

Yeah I have seen that, but I'm up against a time constraint and there
are a number of issues I don't have my head wrapped around, and the
client needs a solution asap given that he has a daily feed
encompassing thousands of records and dozens of clients ... that has
now been dead about 10 days.

I'm sure I could figure it out with enough time, but the client would
be better served if someone who actually has done this can hit the
ground running on it.

BUT with this whole thing in mind, there are some things I don't get
about this whole load_file() thing that maybe you or someone else can
answer.  I get how the file is pulled into memory, but how can a
multi-level file get pulled in?

Hopefully this excerpt (and its only a fragment) will explain my problem.

- I don't need the header section and want to skip over that.
- The dataArea contains the line item records I want to pull in.
- I can write CF code to pull the price and currency out of the
  pricing section, but how would MySQL do something like that
  where the xml design was made as complicated as humanly
  possible? (this format is StarStandard.org-based, btw)
- Note the ImageAttachmentExtended records.  There can
  be any number of those, or none.  A CFloop works great but ...
- the VehicleRemarketingBoat section can have any number
  of line items *or groups* and they vary from record to record.
  The groups can have more than one entry in them, or not.
  Here again CF gives me tools to deal with this.

I just don't see how something like this can be dumped into a db at
all.  I'm assuming it's inexperience with the task.  Or is this XML a
bit beyond that type of approach?



[Sample XML excerpt omitted: the list archive stripped the element
markup, leaving only bare text values (a header section, one vehicle
record, three ImageAttachmentExtended image entries, and the
VehicleRemarketingBoat spec groups).]
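The repeating, optional groups Matt points out don't have to block a db load: each record can be normalized into a parent row plus child-table rows, one child row per repeated entry. A minimal sketch, assuming simplified element names and values (the real StarStandard feed nests much deeper):

```python
import xml.etree.ElementTree as ET

# Element names follow the ones visible in the thread
# (ImageAttachmentExtended etc.); URLs and values are made up.
record = ET.fromstring(
    "<dataRecord><recordID>123</recordID>"
    "<ImageAttachmentExtended><URI>http://img.example/1.jpg</URI></ImageAttachmentExtended>"
    "<ImageAttachmentExtended><URI>http://img.example/2.jpg</URI></ImageAttachmentExtended>"
    "</dataRecord>")

rec_id = record.findtext("recordID")
# Zero-or-more children flatten naturally into child-table rows keyed
# on the parent record's id:
image_rows = [(rec_id, img.findtext("URI"))
              for img in record.findall("ImageAttachmentExtended")]
print(len(image_rows))  # -> 2
```

The variable group counts then become a simple parent/child schema question rather than a parsing problem.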


RE: Big XML files processing Really s-l-o-w. Solution?

2011-02-16 Thread Rick Faircloth

Hi, Matt...

I know you might have looked at this link at dev.mysql.com
for a solution to processing your data feed, but I thought I'd
bring it up, just in case you haven't seen it.  Looks like it
might be right up your alley. 

http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html#xml-5.1-in-and-out

While I didn't have to process xml files, I do have to process
a daily data feed of real estate data in comma-delimited files
and insert that info into a MySQL 5 database.

At first, I tried processing the files with CF, but it was a
*really* slow process using CF, taking almost a minute to process
some files.  Finally, someone on this list
suggested I take a look at the MySQL commands for loading data
from files, such as, in your case, "Load_File()", which opens
an entire XML document, makes it available as a string,
and inserts this string into a table column.  In my case, using
MySQL's load data infile, the time was reduced to less than a second.

Check out the page above for details on usage and syntax.

hth,

Rick






Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-16 Thread Matt Robertson

Hey everybody, we're in a bind here and I pitched the idea to the
client of bringing in a ringer.  Is there someone out there who is
interested in building a direct-to-SQL Server or direct-to-mySQL-based
solution?

Solution would have to, on a scheduled basis, grab a file in a
location on the server on its own, without CF, and process it (we have
stuff that retrieves/places the file).  Current file size is about 50
MB but must be expected to go to around 100 MB.  At 42 MB, using just CF,
I am at about 70 seconds per record loop.  We need a *significant*
improvement in performance.  Work would be on a dedicated Windows
server.  The present box is a fairly powerful Crystaltech Win2003 server
w/4 GB RAM and a Xeon 2.8 processor w/4 cores.  We'd consider a
Win2008 64-bit installation to improve speed.  Also looking at a 24-core
server at Viviotech w/16 GB as a high-horsepower alternative.

Our feed partner is probably going to be able to use a different
method of delivery that will reduce file size, but for now we need to
plan for the worst and move on it.

Anyone interested?  You can email me at my for-reals email at matt AT
mysecretbase DOT com.

-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com

~|
Order the Adobe Coldfusion Anthology now!
http://www.amazon.com/Adobe-Coldfusion-Anthology/dp/1430272155/?tag=houseoffusion
Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:342336
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/groups/cf-talk/unsubscribe.cfm


Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Matt Robertson

Very interesting stuff, guys.  Much appreciated.  I should have come
here before I did the project instead of after.  You'd think I'd know
better by now.

-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Russ Michaels

That's probably the solution I was thinking of too; I expect Ben or
someone just wrapped it up into a UDF or CFC and posted it on one of
the aforementioned sites.

Russ

On Tue, Feb 15, 2011 at 12:32 AM, Michael Grant  wrote:

>
> That actually looks like a pretty nice solution if bulkinsert isn't an
> option.
>
>
> On Mon, Feb 14, 2011 at 7:24 PM, Alan Rother 
> wrote:
>
> >
> > When in doubt, ask Ben Nadel(or check his blog)... Words to live by
> >
> > I had the same issue, huge variable XML file, lots of variations, had a
> > hell
> > of a time making it work in SQL Server due to the complexity
> >
> > Used this, works amazingly well, all things considered, executes
> > very quickly
> >
> >
> >
> http://www.bennadel.com/blog/1345-Ask-Ben-Parsing-Very-Large-XML-Documents-In-ColdFusion.htm
> >
> > =]
> >
> >
> > On Mon, Feb 14, 2011 at 5:15 PM, Casey Dougall <
> > ca...@uberwebsitesolutions.com> wrote:
> >
> > >
> > > On Mon, Feb 14, 2011 at 7:07 PM, Matt Robertson <
> websitema...@gmail.com
> > > >wrote:
> > >
> > > >
> > > > Client is running mySQL, But a SQL Server could probably be arranged
> > > > if we have to.  Kind of a painful investment for them, though.
> > > >
> > > >
> > >
> > > Well seems there is support for this in mysql as well but examples use
> > > something simular to BulkInsert where the xml file is on the MySql
> > Server.
> > >
> > > http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html
> > >
> > >
> > >
> >
> >
>
> 



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Michael Grant

That actually looks like a pretty nice solution if bulkinsert isn't an
option.


On Mon, Feb 14, 2011 at 7:24 PM, Alan Rother  wrote:

>
> When in doubt, ask Ben Nadel(or check his blog)... Words to live by
>
> I had the same issue, huge variable XML file, lots of variations, had a
> hell
> of a time making it work in SQL Server due to the complexity
>
> Used this, works amazingly well, all things considered, executes
> very quickly
>
>
> http://www.bennadel.com/blog/1345-Ask-Ben-Parsing-Very-Large-XML-Documents-In-ColdFusion.htm
>
> =]
>
>
> On Mon, Feb 14, 2011 at 5:15 PM, Casey Dougall <
> ca...@uberwebsitesolutions.com> wrote:
>
> >
> > On Mon, Feb 14, 2011 at 7:07 PM, Matt Robertson  > >wrote:
> >
> > >
> > > Client is running mySQL, But a SQL Server could probably be arranged
> > > if we have to.  Kind of a painful investment for them, though.
> > >
> > >
> >
> > Well seems there is support for this in mysql as well but examples use
> > something simular to BulkInsert where the xml file is on the MySql
> Server.
> >
> > http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html
> >
> >
> >
>
> 



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Russ Michaels

Processing the XML with Java rather than CFML will speed things up.

Check riaforge.org or cflib.org and the Adobe Exchange; somewhere out
there is an XML parser that does just this and may give you the boost
you want.

Russ
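A minimal sketch of the kind of Java streaming read being suggested
here, using StAX (javax.xml.stream, bundled with the JRE).  The feed
layout, the element name "record", and the count-only loop body are
illustrative assumptions; a real import would pull fields out and
insert a row where the counter increments.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.FileWriter;

public class StreamingParse {

    // Walk the file event by event; only one element is in memory at a
    // time, so file size barely matters. Here we just count matching
    // elements -- a real importer would read attributes/text and insert
    // a db row at each hit.
    public static int countRecords(String path, String elementName) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new FileInputStream(path));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals(elementName)) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Write a tiny sample feed so the sketch is self-contained.
        FileWriter w = new FileWriter("sample.xml");
        w.write("<feed><record id=\"1\"/><record id=\"2\"/></feed>");
        w.close();
        System.out.println(countRecords("sample.xml", "record")); // prints 2
    }
}
```

The point of the event model is that memory use stays flat no matter
how big the feed gets, which is exactly the 100 MB problem in this
thread.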
On Tue, Feb 15, 2011 at 12:15 AM, Casey Dougall <
ca...@uberwebsitesolutions.com> wrote:

>
> On Mon, Feb 14, 2011 at 7:07 PM, Matt Robertson  >wrote:
>
> >
> > Client is running mySQL, But a SQL Server could probably be arranged
> > if we have to.  Kind of a painful investment for them, though.
> >
> >
>
> Well seems there is support for this in mysql as well but examples use
> something simular to BulkInsert where the xml file is on the MySql Server.
>
> http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html
>
>
> 



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Alan Rother

When in doubt, ask Ben Nadel (or check his blog)... words to live by.

I had the same issue: a huge, variable XML file with lots of
variations.  I had a hell of a time making it work in SQL Server due
to the complexity.

I used this; it works amazingly well, all things considered, and
executes very quickly:

http://www.bennadel.com/blog/1345-Ask-Ben-Parsing-Very-Large-XML-Documents-In-ColdFusion.htm

=]


On Mon, Feb 14, 2011 at 5:15 PM, Casey Dougall <
ca...@uberwebsitesolutions.com> wrote:

>
> On Mon, Feb 14, 2011 at 7:07 PM, Matt Robertson  >wrote:
>
> >
> > Client is running mySQL, But a SQL Server could probably be arranged
> > if we have to.  Kind of a painful investment for them, though.
> >
> >
>
> Well seems there is support for this in mysql as well but examples use
> something simular to BulkInsert where the xml file is on the MySql Server.
>
> http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html
>
>
> 



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Casey Dougall

On Mon, Feb 14, 2011 at 7:07 PM, Matt Robertson wrote:

>
> Client is running mySQL, But a SQL Server could probably be arranged
> if we have to.  Kind of a painful investment for them, though.
>
>

It seems there is support for this in MySQL as well, but the examples
use something similar to BulkInsert, where the XML file has to be on
the MySQL server.

http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html




Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Michael Grant

> > you can make use of BulkInsert


+1. I love love love bulkinsert.




Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Matt Robertson

On Mon, Feb 14, 2011 at 3:27 PM, Mark Mandel wrote:
> Either that, or start looking at something like StaX -
> http://stax.codehaus.org/Home

Thanks for the tip on StAX, but frankly it looks a bit out of my
league for this project.

-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Matt Robertson

Client is running MySQL, but SQL Server could probably be arranged if
we have to.  Kind of a painful investment for them, though.

I was kind of hoping you guys would find a bonehead mistake that would
magically clear this up.  I'm out of CF-based ideas.

Speaking of which, I tried shortening those var names with some
replace() statements and it had zero effect.  I didn't expect it
would, but I wanted to throw it up against the wall.

Damn shame I can't drop to Java and do a line-by-line read like I do
with CSV files :-(

-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Casey Dougall

On Mon, Feb 14, 2011 at 6:27 PM, Mark Mandel  wrote:

>
> Either that, or start looking at something like StaX -
> http://stax.codehaus.org/Home
>
> Pulling that large a file in memory is going to suck a lot, no matter
> what you do.
>
> Mark
>
>
Yeah, I don't know; it still seems better handled on the SQL Server
side if you can do it there.


  


And then the trimmed-down stored procedure:

ALTER PROCEDURE [dbo].[spBIG_XML_Insert]
    @mydoc xml
AS

DECLARE @hdoc int

-- Create an internal representation of the XML document, declaring
-- the namespaces used in the OPENXML paths below.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @mydoc,
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
           xmlns:c="urn:Scores/Services/v2/ScoresUpdate" />'

-- Now do the insert
INSERT INTO tblScoreUpdates
SELECT
    MemberID   = x.MemberID,
    ReportDate = x.ReportDate,
    Score      = x.Score,
    DateAdded  = GETDATE()
FROM OPENXML (@hdoc,
    'soap:Envelope/soap:Body/c:LatestScoresResponse/c:parameters/c:Scores', 2)
WITH
(
    MemberID   varchar(10) '@MemberID',
    ReportDate varchar(25) '@ReportDate',
    Score      int         '@Score'
) AS x
WHERE x.MemberID NOT IN
    (SELECT t.MemberID
     FROM tblScoreUpdates t WITH (NOLOCK)
     WHERE t.MemberID = x.MemberID);

-- Do another query here if you want.

-- Remove the document from memory.
EXEC sp_xml_removedocument @hdoc


And if you need something from a level up, just use dot notation to
walk back to it:

    LogID int '../@LogID'

would grab the LogID from Parameters...




Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Mark Mandel

Either that, or start looking at something like StaX -
http://stax.codehaus.org/Home

Pulling a file that large into memory is going to suck a lot, no
matter what you do.

Mark

On Tue, Feb 15, 2011 at 10:24 AM, Casey Dougall
 wrote:
>
> On Mon, Feb 14, 2011 at 5:44 PM, Matt Robertson wrote:
>
>>
>> My question is how do I reduce processing time... is there anything I
>> could be doing better?
>>
>> I am tasked with doing an import on a file whose size can essentially
>> be unlimited.  We've been able to handle in the 10-15mb range but it
>> recently ballooned to 100 mb, and its going to get larger.  Processing
>> time seems to be about 66 hours for a 45 mb file and thats a disaster.
>>  For a 14mb file its about 90 minutes.
>>
>> Whats happening is this:  CF is looping over a very large number of
>> records that are in themselves fairly complex.  The more records there
>> are, the longer the loop over them takes.  More records + more time
>> per record = a seemingly geometric increase in processing time.
>>
>>
>
>
>
> Are you using MS SQL 2005? Because throwing the XML file directly at SQL
> server works a 1000% better. I've basically given up on processing large XML
> files directly with coldfusion. IF you can save these xml files on the SQL
> server you can make use of BulkInsert but that needs to read the file from
> that server. OR just setup a stored procedure, send your xml over and let it
> do it's thing.
>
> http://msdn.microsoft.com/en-us/library/ms345117%28v=SQL.90%29.aspx
>
>
> 



Re: Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Casey Dougall

On Mon, Feb 14, 2011 at 5:44 PM, Matt Robertson wrote:

>
> My question is how do I reduce processing time... is there anything I
> could be doing better?
>
> I am tasked with doing an import on a file whose size can essentially
> be unlimited.  We've been able to handle in the 10-15mb range but it
> recently ballooned to 100 mb, and its going to get larger.  Processing
> time seems to be about 66 hours for a 45 mb file and thats a disaster.
>  For a 14mb file its about 90 minutes.
>
> Whats happening is this:  CF is looping over a very large number of
> records that are in themselves fairly complex.  The more records there
> are, the longer the loop over them takes.  More records + more time
> per record = a seemingly geometric increase in processing time.
>
>



Are you using MS SQL 2005?  Throwing the XML file directly at SQL
Server works about 1000% better; I've basically given up on processing
large XML files directly with ColdFusion.  If you can save these XML
files on the SQL Server box, you can make use of BulkInsert, but it
needs to read the file from that server.  Or just set up a stored
procedure, send your XML over, and let it do its thing.

http://msdn.microsoft.com/en-us/library/ms345117%28v=SQL.90%29.aspx
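A minimal JDBC sketch of that send-the-XML-to-a-proc pattern, for
anyone wanting to do it outside CF.  The procedure name, connection,
and file path are placeholder assumptions; the mechanism itself
(CallableStatement plus the JDBC 4 SQLXML type) is standard.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLXML;

public class SendXmlToProc {

    // Build the JDBC escape syntax for calling a one-argument procedure.
    static String callSql(String procName) {
        return "{call " + procName + "(?)}";
    }

    // Read the whole file and hand it to the stored procedure as an xml
    // parameter; all the shredding then happens inside SQL Server.
    public static void send(Connection conn, String procName, String xmlPath) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(xmlPath)), "UTF-8");
        SQLXML doc = conn.createSQLXML();   // JDBC's native XML type
        doc.setString(xml);
        try (CallableStatement cs = conn.prepareCall(callSql(procName))) {
            cs.setSQLXML(1, doc);           // binds to the proc's xml parameter
            cs.execute();
        } finally {
            doc.free();
        }
    }
}
```

This keeps CF (or any app tier) out of the per-record loop entirely:
one round trip, and the database does the bulk work.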




Big XML files processing Really s-l-o-w. Solution?

2011-02-14 Thread Matt Robertson

My question is: how do I reduce processing time?  Is there anything I
could be doing better?

I am tasked with doing an import on a file whose size can essentially
be unlimited.  We've been able to handle the 10-15 MB range, but it
recently ballooned to 100 MB, and it's going to get larger.
Processing time is about 66 hours for a 45 MB file, and that's a
disaster.  For a 14 MB file it's about 90 minutes.

What's happening is this: CF is looping over a very large number of
records that are themselves fairly complex.  The more records there
are, the longer the loop over them takes.  More records + more time
per record = a seemingly geometric increase in processing time.

The file is read in like so:

    [CFML read/parse snippet stripped by the list archive]

The above takes only a few seconds.  No problem there.

Next I have to read in some header info:
header.logicalID=x.processvehicleremarketing.applicationarea.sender.logicalID.xmltext;
header.taskID=x.processvehicleremarketing.applicationarea.sender.taskID.xmltext;
header.BODID=x.processvehicleremarketing.applicationarea.BODID.xmltext;
header.created=x.processvehicleremarketing.applicationarea.CreationDateTime.xmltext;
// ...
// and here comes the node with all of the line items in it, which I'll
// have to loop over.  This is where all of the speed issues have been
// traced to:
variables.mates.boatArrayLen=arrayLen(x.processvehicleremarketing.ProcessVehicleRemarketingDataArea.VehicleRemarketing);


Knowing the array length, I can use CFLOOP to loop over it and pull
data out, which is then stored in a db.



listings_mates.inHouseListingNumber="M-" &
x.processVehicleRemarketing.processVehicleRemarketingDataArea.vehicleRemarketing[i].vehicleRemarketingHeader.documentIdentificationGroup.documentIdentification.documentID.xmltext;

listings_mates.price=x.processVehicleRemarketing.processVehicleRemarketingDataArea.vehicleRemarketing[i].vehicleRemarketingBoatLineItem.pricingABIE.price.chargeAmount.xmltext;

listings_mates.currency=x.processVehicleRemarketing.processVehicleRemarketingDataArea.vehicleRemarketing[i].vehicleRemarketingBoatLineItem.pricingABIE.price.chargeAmount.xmlAttributes.currencyID;
// there can be more than one of these items, so run a loop inside
// of the loop

variables.mates.engineArrayLen=arrayLen(x.processVehicleRemarketing.processVehicleRemarketingDataArea.vehicleRemarketing[i].vehicleRemarketingBoatLineItem.VehicleRemarketingEngineLineItem);
ii=0;
do {
ii=ii+1;

listings_mates.engineDesc=x.processVehicleRemarketing.processVehicleRemarketingDataArea.vehicleRemarketing[i].vehicleRemarketingBoatLineItem.VehicleRemarketingEngineLineItem[ii].VehicleRemarketingEngine.modelDescription.xmltext;

listings_mates.engHrs=x.processVehicleRemarketing.processVehicleRemarketingDataArea.vehicleRemarketing[i].vehicleRemarketingBoatLineItem.VehicleRemarketingEngineLineItem[ii].VehicleRemarketingEngine.totalEngineHoursNumeric.xmltext;
} while (ii LTE variables.mates.engineArrayLen);
...

...


And so on: a hundred or so fields and a dozen or so loops inside the
main loop, along with a loop or two inside of those, so the very long
variable names get even longer.

As you can see, I am pouring the data into a struct, and when done, I
insert it as a db record.

Anyone see a mistake in my methods?  Would things speed up if, before
I read the file into an XML object, I ran a replace() or three to
shorten some of those names?


-- 
--m@Robertson--
Janitor, The Robertson Team
mysecretbase.com
