[Mono-dev] Performance issue with DataTable.Load on "large" data sets

2011-04-07 Thread Nicklas Overgaard
Hi mono-devers!

I'm currently working on a rather large web project, where we are using a
combination of Mono 2.10.1 and MySQL.

Over the past week, I have observed that loading "large" datasets (5000+
rows) from MySQL into a DataTable takes a very long time.

It's done somewhat like this:


comm.CommandText = query;
comm.CommandTimeout = MySQLConnection.timeout;
MySqlDataReader reader = (MySqlDataReader)comm.ExecuteReader();
DataTable dt = new DataTable();
dt.Load(reader); // <- this is killing mono
reader.Close();



I have created a small test program, compiled it on my Linux machine and
executed it.

It takes 15 seconds to do this operation under Mono, but on Windows it
takes only 0.4 seconds (with the same executable, fetching the same
data). I have profiled the application on Windows, and it seems that
the .NET Framework is using specialized methods for loading data from a
data reader.
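
For anyone who wants to reproduce the measurement without a MySQL server, here is a minimal sketch of a timing harness. It assumes an in-memory DataTableReader (obtained from DataTable.CreateDataReader) as a stand-in for the MySqlDataReader, so it only exercises DataTable.Load itself and not the MySQL connector. The class name LoadTiming and its helper methods are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Data;
using System.Diagnostics;

static class LoadTiming
{
    // Builds an in-memory source table with `rows` rows and a primary key on "id".
    // This stands in for the MySQL result set (an assumption of this sketch).
    public static DataTable BuildSource(int rows)
    {
        var source = new DataTable("source");
        source.Columns.Add("id", typeof(int));
        source.Columns.Add("name", typeof(string));
        source.PrimaryKey = new[] { source.Columns["id"] };
        for (int i = 0; i < rows; i++)
            source.Rows.Add(i, "row " + i);
        return source;
    }

    // Same call as in the snippet above: load everything from a reader
    // into a fresh DataTable.
    public static DataTable LoadAll(IDataReader reader)
    {
        var dt = new DataTable();
        dt.Load(reader);
        return dt;
    }

    static void Main()
    {
        DataTable source = BuildSource(5000);

        // Time only the Load call, not the construction of the source data.
        Stopwatch sw = Stopwatch.StartNew();
        DataTable dt;
        using (IDataReader reader = source.CreateDataReader())
            dt = LoadAll(reader);
        sw.Stop();

        Console.WriteLine("Loaded {0} rows in {1} ms",
                          dt.Rows.Count, sw.ElapsedMilliseconds);
    }
}
```

Running the same compiled executable under both runtimes keeps the comparison fair, as in the test described above.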

I have been looking through the implementation of DataTable.Load in
Mono, and I can see that a lot of validation and other work is going
on, which could explain the huge difference. I'm also working on a Mono
log profile trace, to dig a little deeper.

Would it be OK if I tried to patch the current Mono implementation to
reach the same speed as .NET? The reason I ask is that I know I cannot
contribute to Mono if I have seen the actual code in .NET (but does a
profile result count as "seeing the code"?)

Best regards,

Nicklas Overgaard

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] Performance issue with DataTable.Load on "large" data sets

2011-04-07 Thread Nicklas Overgaard
Hi again,

I now have a profile log, created with the new Mono profiler. It shows
that the method "EndLoadData" is using up almost all of the time (16
minutes of the 17 minutes it took to create the dump).

Looking at the file "DbDataAdapter.cs", line 355 in current git HEAD,
the "BeginLoadData" and "EndLoadData" methods are called on every
iteration over the DataReader's rows.

This means that for each row we add to the DataTable, the DataSet is
being asked to enforce constraints and redo other checks on the DataTable.

According to MSDN:
http://msdn.microsoft.com/en-us/library/system.data.datatable.beginloaddata.aspx

"BeginLoadData Turns off notifications, index maintenance, and
constraints while loading data."

So wouldn't it make sense to move "BeginLoad.." and "EndLoad.." out of
the loop?

Well, I'm trying it out :)
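
To make the idea concrete, here is a minimal sketch of the two patterns, written against the public DataTable API only (BeginLoadData, LoadDataRow, EndLoadData). It is not the code from DbDataAdapter.cs; the class BulkLoadSketch, its method names and the two-column table are assumptions made for illustration.

```csharp
using System;
using System.Data;

static class BulkLoadSketch
{
    // A small table with a primary key, so there is a constraint to enforce.
    public static DataTable NewTable()
    {
        var dt = new DataTable("t");
        dt.Columns.Add("id", typeof(int));
        dt.Columns.Add("name", typeof(string));
        dt.PrimaryKey = new[] { dt.Columns["id"] };
        return dt;
    }

    // Pattern described above: constraint enforcement is switched off and
    // back on again for every single row.
    public static void LoadPerRow(DataTable dt, int rows)
    {
        for (int i = 0; i < rows; i++)
        {
            dt.BeginLoadData();
            dt.LoadDataRow(new object[] { i, "row " + i }, true);
            dt.EndLoadData(); // constraints are re-enabled on every iteration
        }
    }

    // Proposed pattern: one Begin/End pair around the whole loop, so
    // constraints are re-enabled once, after all rows are in.
    public static void LoadOnce(DataTable dt, int rows)
    {
        dt.BeginLoadData();
        try
        {
            for (int i = 0; i < rows; i++)
                dt.LoadDataRow(new object[] { i, "row " + i }, true);
        }
        finally
        {
            dt.EndLoadData();
        }
    }

    static void Main()
    {
        DataTable a = NewTable();
        DataTable b = NewTable();
        LoadPerRow(a, 5000);
        LoadOnce(b, 5000);
        Console.WriteLine("{0} rows vs {1} rows", a.Rows.Count, b.Rows.Count);
    }
}
```

Both variants end up with the same rows; the difference is only in how often the table re-enables its constraints.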

Best regards,

Nicklas Overgaard



Re: [Mono-dev] Performance issue with DataTable.Load on "large" data sets

2011-04-07 Thread Nicklas Overgaard
Hi again,

Sorry for the spamming.

Moving the "Begin" and "End" load methods out of the loop reduced the
DataTable.Load time to 1.7 seconds on my test machine, so we are getting
there!

/Nicklas



Re: [Mono-dev] Performance issue with DataTable.Load on "large" data sets

2011-04-12 Thread Nicklas Overgaard
Hi again,

I have now made further optimizations, which bring the Load method up
to speed with the .NET implementation. However, 5 of the regression
tests are now failing.

Have all these System.Data regression tests been verified on a Windows
machine with .NET? I just don't want to chase bugs or regressions that
do not exist or are not valid :)

Best regards,

Nicklas



Re: [Mono-dev] Performance issue with DataTable.Load on "large" data sets

2011-04-12 Thread Alan
Hey,

Firstly, the simple change of moving the BeginLoad/EndLoad out of the
loop could easily be committed as a separate patch. If it's possible
to verify this change with an additional unit test, all the better! It
means it can never regress again.

As for the failing tests, the simplest thing to do would be to
copy the test assembly from Linux to Windows and execute it
there to see if all the tests pass. If that doesn't work, you could try
copying the individual tests you want to verify, compiling
them on Windows and executing that. The complicated way of testing
would be to check out Mono from git, build it on Windows and then run
the tests. Either way, a commit which regresses tests can't be
accepted unless those tests can be proven to be incorrect (i.e. they
fail under MS .NET). It's also possible that these are behavioural
differences between .NET 3 and .NET 4, in which case the modifications
would have to be conditionally built.
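
As a rough illustration of the kind of check that could run unchanged on both runtimes, here is a minimal sketch. It is not one of the existing System.Data tests and deliberately avoids a test framework so it compiles on its own; the class LoadRegressionCheck and its method names are hypothetical, and an in-memory DataTableReader is assumed as the data source.

```csharp
using System;
using System.Data;

static class LoadRegressionCheck
{
    // Two-column table with a primary key on "id".
    static DataTable NewTable(string name)
    {
        var dt = new DataTable(name);
        dt.Columns.Add("id", typeof(int));
        dt.Columns.Add("name", typeof(string));
        dt.PrimaryKey = new[] { dt.Columns["id"] };
        return dt;
    }

    // Returns true when Load copied every row AND the primary key is still
    // enforced afterwards. `rows` must be at least 1 so that id 0 exists.
    public static bool LoadKeepsConstraints(int rows)
    {
        DataTable source = NewTable("source");
        for (int i = 0; i < rows; i++)
            source.Rows.Add(i, "row " + i);

        DataTable target = NewTable("target");
        using (IDataReader reader = source.CreateDataReader())
            target.Load(reader);

        if (target.Rows.Count != rows)
            return false;

        try
        {
            // id 0 was loaded above, so this must be rejected.
            target.Rows.Add(0, "duplicate key");
            return false;
        }
        catch (ConstraintException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(LoadKeepsConstraints(5000) ? "PASS" : "FAIL");
    }
}
```

The same compiled executable can be copied to a Windows machine and run there, which is the comparison suggested above.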

Alan


Re: [Mono-dev] Performance issue with DataTable.Load on "large" data sets

2011-04-12 Thread Nicklas Overgaard
Hey Alan,

Thanks for picking it up :)

> Firstly the simple change of moving the BeginLoad/EndLoad out of the
> loop could easily be committed as a separate patch. If it's possible
> to verify this change with an additional unit test, all the better! It
> means it can never regress again.

Well, the thing is that the simple move of Begin/End load actually
breaks four of the tests. However, after reviewing the test code, I
seriously doubt that the tests are correct, hence the question about
having verified them on Windows :)

The patch, along with a little graph showing the performance
improvement, has been attached.

I hope that someone with more insight into System.Data can shed some
light on the now-broken unit tests.

I will get back when I have "fixed" the remaining issues, which also
gives more performance.

And thanks for the tips about testing it on Windows. I will figure
something out.

Best regards,

Nicklas


Re: [Mono-dev] Performance issue with DataTable.Load on "large" data sets

2011-04-13 Thread Alan
Hey,

On Tue, Apr 12, 2011 at 11:09 AM, Nicklas Overgaard wrote:
> Hey Alan,
>
> Thanks for picking it up :)
>
>> Firstly the simple change of moving the BeginLoad/EndLoad out of the
>> loop could easily be committed as a separate patch. If it's possible
>> to verify this change with an additional unit test, all the better! It
>> means it can never regress again.
>
> Well, the thing is that the simple move of Begin/End load actually
> breaks four of the tests. However, after reviewing the test code, i'm
> seriously doubting that the test is correct - hence the question about
> having verified it on windows :)

In that case, running those 4 tests on the Microsoft implementation
would be the way forward. If they pass there, then you know the change
requires further modifications to be correct. If they fail, then you'd
just have to update them so that they pass. Note that in that case
you'll have to run the tests under the 2.0, 3.0 and 4.0 frameworks in
case it was a behavioural change between newer and older runtimes. The
perf improvement is definitely worth the time this will take :)
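
When collecting results from several machines and frameworks, it helps to record which runtime actually executed each run. Here is a minimal sketch, assuming the common idiom of probing for the Mono.Runtime type to detect Mono; the class name RuntimeInfo is hypothetical. Keep in mind that .NET 3.0 runs on the 2.0 CLR, so Environment.Version alone does not distinguish those two.

```csharp
using System;

static class RuntimeInfo
{
    // Mono exposes a type named "Mono.Runtime"; other runtimes return null here.
    public static bool IsMono()
    {
        return Type.GetType("Mono.Runtime") != null;
    }

    // One line that can be pasted next to each test result.
    public static string Describe()
    {
        return string.Format("{0}, CLR {1}, platform {2}",
                             IsMono() ? "Mono" : "non-Mono runtime",
                             Environment.Version,
                             Environment.OSVersion.Platform);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```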

Alan


Re: [Mono-dev] Performance issue with DataTable.Load on "large" data sets

2011-04-20 Thread Nicklas Overgaard
That's true!

However, I'm currently very tied up with finishing the client's project,
but once I have finished that, I will have some spare time to dig into
this issue.

Thanks for the guidance so far :)

Happy Easter to everyone!

/Nicklas
