RE: GSOC 2018 Project - A New Sorting Routine

2018-08-01 Thread Kefan Yang
Thanks for your time!

From: Tomas Vondra
Sent: August 1, 2018 6:30 AM
To: Kefan Yang
Cc: Andrey Borodin; Peter Geoghegan; alvhe...@2ndquadrant.com; PostgreSQL 
Hackers
Subject: Re: GSOC 2018 Project - A New Sorting Routine


On 07/30/2018 11:21 PM, Kefan Yang wrote:
> Hey Tomas!
> 
> Sorry to bother but it would be great if we can get the test results 
> this week.
> 

Attached are results from the i5 machine. I'm unable to rerun the tests 
on the xeon box at the moment (which is IMHO the more interesting one).

Complete test data is available at

   https://bitbucket.org/tvondra/sort-intro-sort-i5-2/src

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: GSOC 2018 Project - A New Sorting Routine

2018-07-30 Thread Kefan Yang
Hey Tomas! 

Sorry to bother but it would be great if we can get the test results this week.

Regards,
Kefan

From: Tomas Vondra
Sent: July 24, 2018 8:16 AM
To: Kefan Yang
Cc: Andrey Borodin; Peter Geoghegan; alvhe...@2ndquadrant.com; PostgreSQL 
Hackers
Subject: Re: GSOC 2018 Project - A New Sorting Routine



On 07/24/2018 12:21 AM, Kefan Yang wrote:
> Hi Tomas!
> 
> I did a few tests on my own Linux machine, but the problem is that my 
> resources on AWS(CPU, RAM and even Disk space) are very limited. I 
> considered establishing virtual machine on my own PC but the performance 
> is even worse.
> 
> My original patch has two main optimizations: (1) switch to heap sort 
> when depth limit exceeded (2) check whether the array is presorted only 
> once at the beginning. Now I want to test these optimizations 
> separately. On AWS EC2 instance, regressions on CREATE INDEX cases seems 
> to be less significant if we use (1) only, but I can only test up to 
> 10 records and 512MB memory using your scripts.
> 
> So would you mind re-running the tests using the two patches I provided 
> in the attachment? That will be very helpful
> 

I can do that, but it'll have to wait a couple of days. I'm currently 
using the boxes for some other tests.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: GSOC 2018 Project - A New Sorting Routine

2018-07-24 Thread Tomas Vondra




On 07/24/2018 12:21 AM, Kefan Yang wrote:

Hi Tomas!

I did a few tests on my own Linux machine, but the problem is that my 
resources on AWS(CPU, RAM and even Disk space) are very limited. I 
considered establishing virtual machine on my own PC but the performance 
is even worse.


My original patch has two main optimizations: (1) switch to heap sort 
when depth limit exceeded (2) check whether the array is presorted only 
once at the beginning. Now I want to test these optimizations 
separately. On AWS EC2 instance, regressions on CREATE INDEX cases seems 
to be less significant if we use (1) only, but I can only test up to 
10 records and 512MB memory using your scripts.


So would you mind re-running the tests using the two patches I provided 
in the attachment? That will be very helpful




I can do that, but it'll have to wait a couple of days. I'm currently 
using the boxes for some other tests.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: GSOC 2018 Project - A New Sorting Routine

2018-07-23 Thread Kefan Yang
Hi Tomas!

I did a few tests on my own Linux machine, but the problem is that my resources 
on AWS(CPU, RAM and even Disk space) are very limited. I considered 
establishing virtual machine on my own PC but the performance is even worse.

My original patch has two main optimizations: (1) switch to heap sort when 
depth limit exceeded (2) check whether the array is presorted only once at the 
beginning. Now I want to test these optimizations separately. On AWS EC2 
instance, regressions on CREATE INDEX cases seems to be less significant if we 
use (1) only, but I can only test up to 10 records and 512MB memory using 
your scripts.

So would you mind re-running the tests using the two patches I provided in the 
attachment? That will be very helpful

Regards,
Kefan

From: Tomas Vondra
Sent: July 18, 2018 2:26 PM
To: Kefan Yang
Cc: Andrey Borodin; Peter Geoghegan; PostgreSQL Hackers
Subject: Re: GSOC 2018 Project - A New Sorting Routine

I don't have any script for that - load the files into a spreadsheet,
create pivot tables and you're done.

regards

On 07/18/2018 11:13 PM, Kefan Yang wrote:
> Hey Tomas!
> 
>  
> 
> I am trying to reproduce the results on my machine. Could you please
> share the script to generate .ods files?
> 
>  
> 
> Regards,
> 
> Kefan
> 
>  
> 
> *From: *Tomas Vondra <mailto:tomas.von...@2ndquadrant.com>
> *Sent: *July 18, 2018 2:05 AM
> *To: *Andrey Borodin <mailto:x4...@yandex-team.ru>
> *Cc: *Peter Geoghegan <mailto:p...@bowt.ie>; Kefan Yang
> <mailto:starord...@gmail.com>; PostgreSQL Hackers
> <mailto:pgsql-hackers@lists.postgresql.org>
> *Subject: *Re: GSOC 2018 Project - A New Sorting Routine
> 
>  
> 
>  
> 
>  
> 
> On 07/18/2018 07:06 AM, Andrey Borodin wrote:
> 
>> Hi, Tomas!
> 
>>
> 
>>> 15 июля 2018 г., в 1:20, Tomas Vondra  
>>> <mailto:tomas.von...@2ndquadrant.com>> написал(а):
> 
>>> 
> 
>>> So I doubt it's this, but I've tweaked the scripts to also set this GUC
> 
>>> and restarted the tests on both machines. Let's see what that does.
> 
>>
> 
>> Do you observe any different results?
> 
>>
> 
>  
> 
> It did change the CREATE INDEX results, depending on the scale. The full
> 
> data is available at [1] and [2], attached is a spreadsheet summary from
> 
> the Xeon box.
> 
>  
> 
> For the largest scale (1M rows) the regressions for CREATE INDEX queries
> 
> mostly disappeared. For 10k rows it still affects CREATE INDEX with a
> 
> text column, and the 100k case behaves just like before (so significant
> 
> regressions for CREATE INDEX).
> 
>  
> 
> I don't have time to investigate this further at the moment, but I'm
> 
> still of the opinion that there's little to gain by replacing our
> 
> current sort algorithm with this.
> 
>  
> 
>  
> 
> [1] https://bitbucket.org/tvondra/sort-intro-sort-xeon/src/master/
> 
> [2] https://bitbucket.org/tvondra/sort-intro-sort-i5/src/master/
> 
>  
> 
> regards
> 
>  
> 
> -- 
> 
> Tomas Vondra  http://www.2ndQuadrant.com
> 
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
>  
> 

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



check_once.diff
Description: Binary data


use_heap.diff
Description: Binary data


Re: GSOC 2018 Project - A New Sorting Routine

2018-07-18 Thread Tomas Vondra
I don't have any script for that - load the files into a spreadsheet,
create pivot tables and you're done.

regards

On 07/18/2018 11:13 PM, Kefan Yang wrote:
> Hey Tomas!
> 
>  
> 
> I am trying to reproduce the results on my machine. Could you please
> share the script to generate .ods files?
> 
>  
> 
> Regards,
> 
> Kefan
> 
>  
> 
> *From: *Tomas Vondra <mailto:tomas.von...@2ndquadrant.com>
> *Sent: *July 18, 2018 2:05 AM
> *To: *Andrey Borodin <mailto:x4...@yandex-team.ru>
> *Cc: *Peter Geoghegan <mailto:p...@bowt.ie>; Kefan Yang
> <mailto:starord...@gmail.com>; PostgreSQL Hackers
> <mailto:pgsql-hackers@lists.postgresql.org>
> *Subject: *Re: GSOC 2018 Project - A New Sorting Routine
> 
>  
> 
>  
> 
>  
> 
> On 07/18/2018 07:06 AM, Andrey Borodin wrote:
> 
>> Hi, Tomas!
> 
>>
> 
>>> 15 июля 2018 г., в 1:20, Tomas Vondra  
>>> <mailto:tomas.von...@2ndquadrant.com>> написал(а):
> 
>>> 
> 
>>> So I doubt it's this, but I've tweaked the scripts to also set this GUC
> 
>>> and restarted the tests on both machines. Let's see what that does.
> 
>>
> 
>> Do you observe any different results?
> 
>>
> 
>  
> 
> It did change the CREATE INDEX results, depending on the scale. The full
> 
> data is available at [1] and [2], attached is a spreadsheet summary from
> 
> the Xeon box.
> 
>  
> 
> For the largest scale (1M rows) the regressions for CREATE INDEX queries
> 
> mostly disappeared. For 10k rows it still affects CREATE INDEX with a
> 
> text column, and the 100k case behaves just like before (so significant
> 
> regressions for CREATE INDEX).
> 
>  
> 
> I don't have time to investigate this further at the moment, but I'm
> 
> still of the opinion that there's little to gain by replacing our
> 
> current sort algorithm with this.
> 
>  
> 
>  
> 
> [1] https://bitbucket.org/tvondra/sort-intro-sort-xeon/src/master/
> 
> [2] https://bitbucket.org/tvondra/sort-intro-sort-i5/src/master/
> 
>  
> 
> regards
> 
>  
> 
> -- 
> 
> Tomas Vondra  http://www.2ndQuadrant.com
> 
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
>  
> 

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: GSOC 2018 Project - A New Sorting Routine

2018-07-18 Thread Kefan Yang
Hey Tomas!

I am trying to reproduce the results on my machine. Could you please share the 
script to generate .ods files?

Regards,
Kefan

From: Tomas Vondra
Sent: July 18, 2018 2:05 AM
To: Andrey Borodin
Cc: Peter Geoghegan; Kefan Yang; PostgreSQL Hackers
Subject: Re: GSOC 2018 Project - A New Sorting Routine



On 07/18/2018 07:06 AM, Andrey Borodin wrote:
> Hi, Tomas!
> 
>> 15 июля 2018 г., в 1:20, Tomas Vondra > <mailto:tomas.von...@2ndquadrant.com>> написал(а):
>>
>> So I doubt it's this, but I've tweaked the scripts to also set this GUC
>> and restarted the tests on both machines. Let's see what that does.
> 
> Do you observe any different results?
> 

It did change the CREATE INDEX results, depending on the scale. The full 
data is available at [1] and [2], attached is a spreadsheet summary from 
the Xeon box.

For the largest scale (1M rows) the regressions for CREATE INDEX queries 
mostly disappeared. For 10k rows it still affects CREATE INDEX with a 
text column, and the 100k case behaves just like before (so significant 
regressions for CREATE INDEX).

I don't have time to investigate this further at the moment, but I'm 
still of the opinion that there's little to gain by replacing our 
current sort algorithm with this.


[1] https://bitbucket.org/tvondra/sort-intro-sort-xeon/src/master/
[2] https://bitbucket.org/tvondra/sort-intro-sort-i5/src/master/

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: GSOC 2018 Project - A New Sorting Routine

2018-07-17 Thread Andrey Borodin
Hi, Tomas!

> 15 июля 2018 г., в 1:20, Tomas Vondra  
> написал(а):
> 
> So I doubt it's this, but I've tweaked the scripts to also set this GUC
> and restarted the tests on both machines. Let's see what that does.

Do you observe any different results?

Thanks!

Best regards, Andrey Borodin.

Re: GSOC 2018 Project - A New Sorting Routine

2018-07-14 Thread Tomas Vondra
On 07/14/2018 12:10 AM, Peter Geoghegan wrote:
> On Fri, Jul 13, 2018 at 3:04 PM, Kefan Yang  wrote:
>> 1. Slow on CREATE INDEX cases.
>>
>> I am still trying to figure out where the bottleneck is. Is the data pattern
>> in index creation very different from other cases? Also, pg_qsort has
>> 10%-20% advantage at creating index even on sorted data (faster CPU, N =
>> 100). This is very strange to me since the two sorting routines execute
>> exactly the same code when the input data is sorted.
> 
> Yes. CREATE INDEX uses heap TID as a tie-breaker, so it's impossible
> for any two index tuples to compare as equal within tuplesort.c, even
> though they may be equal in other contexts. This is likely to defeat
> things like the Bentley-McIlroy optimization where equal keys are
> swapped, which is very effective in the event of many equal keys.
> 
> (Could also be parallelism, though I suppose you probably accounted for that.)
> 

Hmmm. Those scripts are older than max_parallel_maintenance_workers, so
were only setting the regular max_parallel_workers_per_gather GUCs. OTOH
these tests were done on fairly small data sets, starting from 10k rows
and the 10-20% regression is clearly visible for all scales (we don't
use parallel CREATE INDEX for tiny tables, right?). And it's not visible
on the i5 CPU at all, which would be a bit strange if it's
parallelism-related.

So I doubt it's this, but I've tweaked the scripts to also set this GUC
and restarted the tests on both machines. Let's see what that does.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: GSOC 2018 Project - A New Sorting Routine

2018-07-13 Thread Peter Geoghegan
On Fri, Jul 13, 2018 at 3:04 PM, Kefan Yang  wrote:
> 1. Slow on CREATE INDEX cases.
>
> I am still trying to figure out where the bottleneck is. Is the data pattern
> in index creation very different from other cases? Also, pg_qsort has
> 10%-20% advantage at creating index even on sorted data (faster CPU, N =
> 100). This is very strange to me since the two sorting routines execute
> exactly the same code when the input data is sorted.

Yes. CREATE INDEX uses heap TID as a tie-breaker, so it's impossible
for any two index tuples to compare as equal within tuplesort.c, even
though they may be equal in other contexts. This is likely to defeat
things like the Bentley-McIlroy optimization where equal keys are
swapped, which is very effective in the event of many equal keys.

(Could also be parallelism, though I suppose you probably accounted for that.)

-- 
Peter Geoghegan