RE: Question on disk contention

2018-06-03 Thread Charles Clavadetscher
Hi Melvin

 

From: Melvin Davidson [mailto:melvin6...@gmail.com] 
Sent: Donnerstag, 31. Mai 2018 15:53
To: Charles Clavadetscher 
Cc: pgsql-generallists.postgresql.org 
Subject: Re: Question on disk contention

 

 

 

On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher 
mailto:clavadetsc...@swisspug.org> > wrote:

Hi Melvin

As an answer to a previous post you wrote:

"Also, your main problem is that when you have two exact same queries executing 
at the same time, they will cause contention in
the disk, and neither one will make much progress."

Could you elaborate a little more on the meaning of "contention in the disk"?
What is it that happens?

Thank you and have a good day.
Regards
Charles




>Could you elaborate a little more on the meaning of "contention in the disk"?
>What is it that happens? 


 

To simplify, you have two users/jobs, both wanting the exact same information. 
So the system instructs the disk to get 

that information from the disk, which causes the disk head to "seek" to the 
position of the first eligible row and 

continues positioning to other eligible rows. Now the job is not exclusive, so 
the system temporarily switches to the

other job, which causes the disk to go back to the first row and work from 
there. The switching back and forth continues,

so that instead of one job finishing quickly, they both have to take turns 
waiting for needed information. That takes

a lot longer,

 

Try this, Select a table that has a lot of rows, ideally 1M+. Then start a 
query with a WHERE clause and see how long

it takes. Then submit the same query from 5 separate connections simultaneously 
and see how long that takes.

 

Thank you very much for your answer.

Regards

Charles

 


-- 

Melvin Davidson
Maj. Database & Exploration Specialist
Universe Exploration Command – UXC
Employment by invitation only! 



Re: Question on disk contention

2018-06-01 Thread Fabio Pardi
For the sake of completeness I think i have to slightly correct myself.

While I cannot find it in the documentation, (if somebody has pointers, please 
provide them) on my personal notes I found that 'if a scan is already in 
progress, and a new scan starts, then the new scan will start where the 
existing scan is.'

In other words, the new scan will make use of the existing data retrieval, and 
then retrieve the data it misses.

I did some tests which look like my notes are right.

I created a table:

 \dt+ big_series 
  List of relations
  Schema   |Name| Type  | Owner | Size  | Description 
---++---+---+---+-
 pv_public | big_series | table | user | 24 GB | 
(1 row)


 \d big_series ;
 Table "pv_public.big_series"
 Column |  Type   | Modifiers 
+-+---
 a  | integer | 


containing 700+ million records


On a 4 cores VM, with 10GB RAM running Postgres 9.6.3: 

If i run a single query, then it takes around 3.5 minutes

running i=50 queries in parallel, which do

select * from big_series where a = '$i' 

and start with 2 seconds difference one to another,

then the times are:

output.0: Execution time: 213519.445 ms
output.10: Execution time: 223256.190 ms
output.11: Execution time: 224113.198 ms
output.12: Execution time: 225147.653 ms
output.13: Execution time: 226101.934 ms
output.14: Execution time: 227028.992 ms
output.15: Execution time: 228622.188 ms
output.16: Execution time: 229584.695 ms
output.17: Execution time: 229795.122 ms
output.18: Execution time: 229949.648 ms
output.19: Execution time: 229974.346 ms
output.1: Execution time: 214274.906 ms
output.20: Execution time: 230001.553 ms
output.21: Execution time: 230138.985 ms
output.22: Execution time: 230080.078 ms
output.23: Execution time: 230135.255 ms
output.24: Execution time: 230393.351 ms
output.25: Execution time: 230467.203 ms
output.26: Execution time: 230651.052 ms
output.27: Execution time: 230603.229 ms
output.28: Execution time: 230502.608 ms
output.29: Execution time: 230692.864 ms
output.2: Execution time: 215348.711 ms
output.30: Execution time: 230630.022 ms
output.31: Execution time: 230501.983 ms
output.32: Execution time: 230213.728 ms
output.33: Execution time: 229992.756 ms
output.34: Execution time: 229515.631 ms
output.35: Execution time: 228769.224 ms
output.36: Execution time: 228341.051 ms
output.37: Execution time: 227881.148 ms
output.38: Execution time: 226883.588 ms
output.39: Execution time: 225697.683 ms
output.3: Execution time: 216344.473 ms
output.40: Execution time: 224860.213 ms
output.41: Execution time: 222169.478 ms
output.42: Execution time: 221259.683 ms
output.43: Execution time: 220444.522 ms
output.44: Execution time: 219244.326 ms
output.45: Execution time: 218170.045 ms
output.46: Execution time: 217346.484 ms
output.47: Execution time: 216860.823 ms
output.48: Execution time: 216431.684 ms
output.49: Execution time: 216286.271 ms
output.4: Execution time: 217365.770 ms
output.5: Execution time: 218376.528 ms
output.6: Execution time: 219395.868 ms
output.7: Execution time: 220501.178 ms
output.8: Execution time: 221297.149 ms
output.9: Execution time: 222157.996 ms




Please correct me if my notes are wrong, or provide pointers to the 
documentation.


regards,

fabio pardi


On 31/05/18 16:20, Fabio Pardi wrote:
> As far as I know, the OS cache is shared, and shared_buffers too.
> 
> Back to the matter of contention, your statement i think might be true only 
> in cases when you are querying data which does not fit in RAM. 
> 
> Under those circumstances, the OS or Postgres might need to evict blocks from 
> RAM to make room to new blocks fetched from disk, while the concurrent query 
> is trying to do the same. Then the disk contention might happen. 
> 
> 
> regards,
> 
> fabio pardi
> 
> On 31/05/18 16:09, Melvin Davidson wrote:
>>
>>
>> On Thu, May 31, 2018 at 10:04 AM, Ron > > wrote:
>>
>> On 05/31/2018 08:52 AM, Melvin Davidson wrote:
>>>
>>>
>>> On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher 
>>> mailto:clavadetsc...@swisspug.org>> wrote:
>>>
>>> Hi Melvin
>>>
>>> As an answer to a previous post you wrote:
>>>
>>> "Also, your main problem is that when you have two exact same 
>>> queries executing at the same time, they will cause contention in
>>> the disk, and neither one will make much progress."
>>>
>>> Could you elaborate a little more on the meaning of "contention in 
>>> the disk"?
>>> What is it that happens?
>>>
>>> Thank you and have a good day.
>>> Regards
>>> Charles
>>>
>>>
>>>
>>> >Could you elaborate a little more on the meaning of "contention in the 
>>> disk"?
>>> >What is it that happens?
>>>
>>> To simplify, you have two users/jobs, both wanting the exact same 
>>> information. So the system instructs the disk to get
>>> that informati

Re: Question on disk contention

2018-05-31 Thread Fabio Pardi
As far as I know, the OS cache is shared, and shared_buffers too.

Back to the matter of contention, your statement i think might be true only in 
cases when you are querying data which does not fit in RAM. 

Under those circumstances, the OS or Postgres might need to evict blocks from 
RAM to make room to new blocks fetched from disk, while the concurrent query is 
trying to do the same. Then the disk contention might happen. 


regards,

fabio pardi

On 31/05/18 16:09, Melvin Davidson wrote:
> 
> 
> On Thu, May 31, 2018 at 10:04 AM, Ron  > wrote:
> 
> On 05/31/2018 08:52 AM, Melvin Davidson wrote:
>>
>>
>> On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher 
>> mailto:clavadetsc...@swisspug.org>> wrote:
>>
>> Hi Melvin
>>
>> As an answer to a previous post you wrote:
>>
>> "Also, your main problem is that when you have two exact same 
>> queries executing at the same time, they will cause contention in
>> the disk, and neither one will make much progress."
>>
>> Could you elaborate a little more on the meaning of "contention in 
>> the disk"?
>> What is it that happens?
>>
>> Thank you and have a good day.
>> Regards
>> Charles
>>
>>
>>
>> >Could you elaborate a little more on the meaning of "contention in the 
>> disk"?
>> >What is it that happens?
>>
>> To simplify, you have two users/jobs, both wanting the exact same 
>> information. So the system instructs the disk to get
>> that information from the disk, which causes the disk head to "seek" to 
>> the position of the first eligible row and
>> continues positioning to other eligible rows. Now the job is not 
>> exclusive, so the system temporarily switches to the
>> other job, which causes the disk to go back to the first row and work 
>> from there. The switching back and forth continues,
>> so that instead of one job finishing quickly, they both have to take 
>> turns waiting for needed information. That takes
>> a lot longer,
>>
>> Try this, Select a table that has a lot of rows, ideally 1M+. Then start 
>> a query with a WHERE clause and see how long
>> it takes. Then submit the same query from 5 separate connections 
>> simultaneously and see how long that takes.
> 
> Why isn't the OS caching the disk blocks, and why isn't Postgres using 
> the cached data?
> 
> 
> -- 
> Angular momentum makes the world go 'round.
> 
> 
> 
>> Why isn't the OS caching the disk blocks, and why isn't Postgres using the 
>> cached data?
> It does, but the cache is for each connection/job. They are not shared.
> -- 
> *Melvin Davidson**
> Maj. Database & Exploration Specialist**
> Universe Exploration Command – UXC***
> Employment by invitation only!



RE: Question on disk contention

2018-05-31 Thread Igor Neyman


> Why isn't the OS caching the disk blocks, and why isn't Postgres using the 
> cached data?
It does, but the cache is for each connection/job. They are not shared.
--
Melvin Davidson
Maj. Database & Exploration Specialist
Universe Exploration Command – UXC
Employment by invitation only!



That is simply not true: shared_buffers have this name for a reason. What’s not 
shared is work_mem used for sorting, etc…
Also OS cache is shared too.

Regards,
Igor Neyman



Re: Question on disk contention

2018-05-31 Thread Melvin Davidson
On Thu, May 31, 2018 at 10:04 AM, Ron  wrote:

> On 05/31/2018 08:52 AM, Melvin Davidson wrote:
>
>
>
> On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher <
> clavadetsc...@swisspug.org> wrote:
>
>> Hi Melvin
>>
>> As an answer to a previous post you wrote:
>>
>> "Also, your main problem is that when you have two exact same queries
>> executing at the same time, they will cause contention in
>> the disk, and neither one will make much progress."
>>
>> Could you elaborate a little more on the meaning of "contention in the
>> disk"?
>> What is it that happens?
>>
>> Thank you and have a good day.
>> Regards
>> Charles
>>
>>
>>
> >Could you elaborate a little more on the meaning of "contention in the
> disk"?
> >What is it that happens?
>
> To simplify, you have two users/jobs, both wanting the exact same
> information. So the system instructs the disk to get
> that information from the disk, which causes the disk head to "seek" to
> the position of the first eligible row and
> continues positioning to other eligible rows. Now the job is not
> exclusive, so the system temporarily switches to the
> other job, which causes the disk to go back to the first row and work from
> there. The switching back and forth continues,
> so that instead of one job finishing quickly, they both have to take turns
> waiting for needed information. That takes
> a lot longer,
>
> Try this, Select a table that has a lot of rows, ideally 1M+. Then start a
> query with a WHERE clause and see how long
> it takes. Then submit the same query from 5 separate connections
> simultaneously and see how long that takes.
>
>
> Why isn't the OS caching the disk blocks, and why isn't Postgres using the
> cached data?
>
>
> --
> Angular momentum makes the world go 'round.
>


> Why isn't the OS caching the disk blocks, and why isn't Postgres using
the cached data?
It does, but the cache is for each connection/job. They are not shared.
-- 
*Melvin Davidson*
*Maj. Database & Exploration Specialist*
*Universe Exploration Command – UXC*
Employment by invitation only!


Re: Question on disk contention

2018-05-31 Thread Ron

On 05/31/2018 08:52 AM, Melvin Davidson wrote:



On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher 
mailto:clavadetsc...@swisspug.org>> wrote:


Hi Melvin

As an answer to a previous post you wrote:

"Also, your main problem is that when you have two exact same queries
executing at the same time, they will cause contention in
the disk, and neither one will make much progress."

Could you elaborate a little more on the meaning of "contention in the
disk"?
What is it that happens?

Thank you and have a good day.
Regards
Charles



>Could you elaborate a little more on the meaning of "contention in the disk"?
>What is it that happens?

To simplify, you have two users/jobs, both wanting the exact same 
information. So the system instructs the disk to get
that information from the disk, which causes the disk head to "seek" to 
the position of the first eligible row and
continues positioning to other eligible rows. Now the job is not 
exclusive, so the system temporarily switches to the
other job, which causes the disk to go back to the first row and work from 
there. The switching back and forth continues,
so that instead of one job finishing quickly, they both have to take turns 
waiting for needed information. That takes

a lot longer,

Try this, Select a table that has a lot of rows, ideally 1M+. Then start a 
query with a WHERE clause and see how long
it takes. Then submit the same query from 5 separate connections 
simultaneously and see how long that takes.


Why isn't the OS caching the disk blocks, and why isn't Postgres using the 
cached data?



--
Angular momentum makes the world go 'round.


Re: Question on disk contention

2018-05-31 Thread Melvin Davidson
On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher <
clavadetsc...@swisspug.org> wrote:

> Hi Melvin
>
> As an answer to a previous post you wrote:
>
> "Also, your main problem is that when you have two exact same queries
> executing at the same time, they will cause contention in
> the disk, and neither one will make much progress."
>
> Could you elaborate a little more on the meaning of "contention in the
> disk"?
> What is it that happens?
>
> Thank you and have a good day.
> Regards
> Charles
>
>
>
>Could you elaborate a little more on the meaning of "contention in the
disk"?
>What is it that happens?

To simplify, you have two users/jobs, both wanting the exact same
information. So the system instructs the disk to get
that information from the disk, which causes the disk head to "seek" to the
position of the first eligible row and
continues positioning to other eligible rows. Now the job is not exclusive,
so the system temporarily switches to the
other job, which causes the disk to go back to the first row and work from
there. The switching back and forth continues,
so that instead of one job finishing quickly, they both have to take turns
waiting for needed information. That takes
a lot longer,

Try this, Select a table that has a lot of rows, ideally 1M+. Then start a
query with a WHERE clause and see how long
it takes. Then submit the same query from 5 separate connections
simultaneously and see how long that takes.

-- 
*Melvin Davidson*
*Maj. Database & Exploration Specialist*
*Universe Exploration Command – UXC*
Employment by invitation only!


Question on disk contention

2018-05-30 Thread Charles Clavadetscher
Hi Melvin

As an answer to a previous post you wrote:

"Also, your main problem is that when you have two exact same queries executing 
at the same time, they will cause contention in
the disk, and neither one will make much progress."

Could you elaborate a little more on the meaning of "contention in the disk"?
What is it that happens?

Thank you and have a good day.
Regards
Charles