Re: why SSD is slower than HDD SAS 15K ?

2018-01-15 Thread Neto pr
Dear Merlin
2018-01-15 11:16 GMT-08:00 Merlin Moncure :

> On Mon, Jan 15, 2018 at 7:38 AM, Neto pr  wrote:
> > [...]
>
> Generally for reading data, yes, but you changed the query plan also.
> To get to the bottom of this let's get SSD performance numbers for
> both plans and HDD performance numbers for both plans.  You're trying
> to measure device performance but are probably measuring the
> relative efficiencies of the generated plans.
>
>
Very good tip. I discovered that my SAS HDD has a 12 Gb/s interface, versus
6 Gb/s for the SSD. That may be why the difference in performance
occurred.  See below:

SSD: Samsung 500 GB SATA III 6Gb/s - Model: 850 Evo
http://www.samsung.com/semiconductor/minisite/ssd/product/consumer/850evo/

HDD: HPE 300GB 12G SAS Part-Number: 737261-B21
https://h20195.www2.hpe.com/v2/GetPDF.aspx%2Fc04111744.pdf

For my experiment between HDD and SSD, I intend to abandon the SAS HDD and
use a SATA HDD instead, to compare it with the SATA SSD.
I will use your strategy of putting the OS and DBMS on the same disk when it
is an SSD, and on separate disks for the HDD.
Best Regards
Neto

> merlin
>


Re: why SSD is slower than HDD SAS 15K ?

2018-01-15 Thread NTPT
I bet this is an SSD partition-alignment problem: there is an erase block
size of 3 MB, and this should be taken into account when you partition the
SSD drive, create a RAID, create a filesystem, etc.
-- Original e-mail --
From: Merlin Moncure 
To: Neto pr 
Date: 15. 1. 2018 20:17:17
Subject: Re: why SSD is slower than HDD SAS 15K ?
"On Mon, Jan 15, 2018 at 7:38 AM, Neto pr  wrote:
> [...]

Generally for reading data, yes, but you changed the query plan also. [...]

merlin
"

Insert results in 0 1

2018-01-15 Thread Sherman Willden
Computer HP Compaq 6710b
Development Platform: Ubuntu 17.10 mainly command line work
Tools: perl 5.26 and postgresql 9.6

Why do I get "0 1" when using INSERT?

Thanks;

Sherman
classical=# INSERT INTO string_groups VALUES('et_b_02', 'Sonata in B
minor', 'Eroica Trio', 'Jean Baptiste Loeillet', 'Baroque');
INSERT 0 1
classical=# INSERT INTO string_groups VALUES('et_b_02', 'Sonata in B
minor', 'Eroica Trio', 'Jean Baptiste Loeillet', 'Baroque');
ERROR:  duplicate key value violates unique constraint "string_groups_pkey"
DETAIL:  Key (id)=(et_b_02) already exists.
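For context on the output shown above: "INSERT 0 1" is PostgreSQL's command tag, where the first number is the inserted row's OID (always 0 unless the table was created WITH OIDS) and the second is the count of rows inserted. A hedged sketch of making the repeated insert a no-op instead of an error (assuming the primary-key column is named id, as the error DETAIL suggests):

```sql
-- Re-running the same insert normally raises the duplicate-key error shown
-- above; ON CONFLICT (available since PostgreSQL 9.5) turns it into a no-op.
INSERT INTO string_groups
VALUES ('et_b_02', 'Sonata in B minor', 'Eroica Trio',
        'Jean Baptiste Loeillet', 'Baroque')
ON CONFLICT (id) DO NOTHING;
-- Command tag: INSERT 0 1 on the first run, INSERT 0 0 when the key exists.
```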


Re: why SSD is slower than HDD SAS 15K ?

2018-01-15 Thread Merlin Moncure
On Mon, Jan 15, 2018 at 7:38 AM, Neto pr  wrote:
> [...]

Generally for reading data, yes, but you changed the query plan also.
To get to the bottom of this let's get SSD performance numbers for
both plans and HDD performance numbers for both plans.  You're trying
to measure device performance but are probably measuring the
relative efficiencies of the generated plans.

merlin
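One hedged way to collect the numbers Merlin asks for ("both plans on both devices") is to flip the cost setting per session so each device runs under each plan; random_page_cost is a real PostgreSQL setting, and nothing here touches postgresql.conf:

```sql
-- Session-local only; RESET restores the configured value.
SET random_page_cost = 1.0;           -- the setting used on the SSD
EXPLAIN (ANALYZE, BUFFERS) SELECT 1;  -- substitute the TPC-H Q9 text here
SET random_page_cost = 4.0;           -- the default used on the HDD
EXPLAIN (ANALYZE, BUFFERS) SELECT 1;  -- substitute the TPC-H Q9 text here
RESET random_page_cost;
```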



Re: Re: Re: Is ORDER BY in sub-query preserved when outer query is only projection?

2018-01-15 Thread Andreas Joseph Krogh
On Monday, 15 January 2018 at 19:04:51, David G. Johnston <
david.g.johns...@gmail.com > wrote:
> On Sun, Jan 14, 2018 at 5:40 AM, Andreas Joseph Krogh wrote:
> > On Sunday, 14 January 2018 at 13:30:29, Francisco Olarte <
> > fola...@peoplecall.com > wrote:
> > > Andreas:
> > > On Sun, Jan 14, 2018 at 1:03 PM, Andreas Joseph Krogh wrote:
> > > > SELECT q.* FROM (
> > > >   SELECT comp.id, comp.name
> > > >   FROM company comp JOIN req r ON r.company_id = comp.id
> > > >   ORDER BY LOWER(comp.name) ASC
> > > > ) AS q
> > > > ORDER BY r.status ASC
> >
> > Do you see any solution sorting on a composite type without using an
> > outer query?
>
> Tacking on ORDER BY to an inner query is generally not the right thing
> to do.
>
> Why can you not write:
>
> SELECT q.* FROM () AS q
> ORDER BY lower(q.name) ASC, q.status ASC
> ?

This is what I ended up doing.

> Also, ORDER BY r.status ASC in your original query shouldn't work - r is
> not visible at that point, only q is.

Yes, this was a thinko on my part; I meant q.status.

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com




Re: Re: Is ORDER BY in sub-query preserved when outer query is only projection?

2018-01-15 Thread Haroldo Stenger
Hi - maybe select the relevant order-by fields alongside (next to) the
row()'ed part; they will be selected "twice", but you'll be able to
reference them.

regards


Haroldo

2018-01-14 9:40 GMT-03:00 Andreas Joseph Krogh :

> On Sunday, 14 January 2018 at 13:30:29, Francisco Olarte <
> fola...@peoplecall.com> wrote:
>
> Andreas:
>
> On Sun, Jan 14, 2018 at 1:03 PM, Andreas Joseph Krogh
>  wrote:
> > SELECT q.* FROM (
> >   SELECT comp.id, comp.name
> >   FROM company comp JOIN req r ON r.company_id = comp.id
> >   ORDER BY LOWER(comp.name) ASC
> > ) AS q
> > ORDER BY r.status ASC
> >
> > What I'm trying to do here is to order by some status (which may be only
> 1 of 3 values, for instance OPEN, IN_PROGRESS, CLOSED), then order by
> company-name so I get results for each status sorted by company-name.
> >
> > Is this kind of sort stable, can I assume the pre-sorted result's order
> is preserved so I achieve what I want?
>
> I doubt it is mandated to be stable. But IIRC you can sort by a
> non-returned field, so you should be able to do it in just one query (
> http://sqlfiddle.com/#!17/aaa62/3 )
>
> I would try
>
>  SELECT comp.id, comp.name
>  FROM company comp JOIN req r ON r.company_id = comp.id
>  ORDER BY  r.status ASC, LOWER(comp.name) ASC
>
> Francisco Olarte.
>
>
> Thanks, but my real query is more complex and I need to sort on a custom
> composite type, so I think I *need* an outer query for that; see my post
> here: http://www.postgresql-archive.org/ORDER-BY-custom-
> type-td6000437.html
>
> Do you see any solution sorting on a composite type without using an outer
> query?
>
> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andr...@visena.com
> www.visena.com
> 
>
>
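For the composite-type requirement Andreas mentions, a sketch under stated assumptions (the composite column "summary" and its field "sort_name" are hypothetical, not from the thread): PostgreSQL allows ORDER BY on any expression, including a field extracted from a composite value with the (expr).field syntax, which may avoid the wrapping query:

```sql
-- Hypothetical: company has a composite-typed column "summary" whose field
-- "sort_name" carries the desired sort key; (comp.summary).sort_name
-- extracts that single field for ordering.
SELECT comp.id, comp.name
FROM company comp
JOIN req r ON r.company_id = comp.id
ORDER BY r.status ASC, (comp.summary).sort_name ASC;
```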


Re: why SSD is slower than HDD SAS 15K ?

2018-01-15 Thread Neto pr
2018-01-15 9:13 GMT-08:00 Vick Khera :

> Try random page cost 1.1. Way back when I started using SSD we had a
> discussion here and came to the conclusion that it should be ever so
> slightly higher than sequential page cost.
>
>
Very good tip; I'm running the query with random_page_cost = 1.1, but
note that there are no secondary indexes in my database.
The test you describe first runs the query without an index, and then
creates an index to check the performance improvement.
But the problem I reported is that executing the query without an index on
a SAS HDD is much faster than executing it (also without an index) on the
SSD, which I found very strange; see below:
- Query execution Time on SSD - Average: 26min 54 seconds
- Query execution Time on HDD SAS 15K - Average: 12 minutes 48 seconds


> It is very hard to read your query plans (maybe gmail is wrapping them
> funny or you need to use a fixed font on them or share them from
> https://explain.depesz.com), but they do look substantially different. My
> guess is that with the random page cost = sequential page cost you are
> tricking Pg into using more sequential scans than index searches.
>

The problem is that this plan is saved in the database: I have a Java
application that executes the query 6 times and saves the EXPLAIN ANALYZE
result to a table in my database. When retrieving the execution plan, it
loses the line breaks, unfortunately.  I'm checking how to change the Java
application; I posted a question on a Java forum because I do not know how
to solve this other problem yet.  My question on the Java forum:
https://stackoverflow.com/questions/48267819/save-line-break-in-database-in-text-field
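A hedged alternative for the lost-line-breaks problem: PostgreSQL can emit the plan in a structured format (FORMAT JSON is a real EXPLAIN option), which does not depend on newlines surviving a round trip through a text column:

```sql
-- JSON output is one structured value; it can be stored in a text or jsonb
-- column and re-rendered later, so lost line breaks no longer matter.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT 1;  -- substitute the real query
```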


Re: why SSD is slower than HDD SAS 15K ?

2018-01-15 Thread Vick Khera
Try random page cost 1.1. Way back when I started using SSD we had a
discussion here and came to the conclusion that it should be ever so
slightly higher than sequential page cost.

It is very hard to read your query plans (maybe gmail is wrapping them
funny or you need to use a fixed font on them or share them from
https://explain.depesz.com), but they do look substantially different. My
guess is that with the random page cost = sequential page cost you are
tricking Pg into using more sequential scans than index searches.


Re: pg_basebackup is taking more time than expected

2018-01-15 Thread Raghavendra Rao J S V
I am looking for help to minimise the time taken by the pg_basebackup
utility.

As mentioned earlier, we are taking the backup of the database with the
pg_basebackup utility, using the command below.

$PGHOME/bin/pg_basebackup -p 5433 -U postgres -P -v -x --format=tar --gzip
--compress=6 --pgdata=- -D /opt/backup_db

According to our previous discussion, pg_basebackup does not depend on any
of the postgresql configuration parameters. If I go for the gzip format, we
need to compromise on time.

We are planning to proceed with the steps below. Please correct me if I am
wrong.


   1. Identify the larger indexes (those above 256 MB in size) and drop
   them. This will reduce the size of the database.
   2. Take the backup of the database.
   3. Recreate the indexes in the environment that we create from the
   backup.


I am new to the postgres database. Could you help me construct the queries
to drop and create the indexes, please?
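A sketch for step 1 of the plan above, using the real pg_indexes view and pg_relation_size() function; save the recreate column somewhere safe before dropping anything. One caveat: indexes that back PRIMARY KEY or UNIQUE constraints cannot be removed with DROP INDEX - those need ALTER TABLE ... DROP CONSTRAINT instead.

```sql
-- List indexes larger than 256 MB, with the DDL to drop and later
-- recreate each one (indexdef already contains the CREATE INDEX statement).
SELECT schemaname,
       indexname,
       pg_size_pretty(pg_relation_size(
           format('%I.%I', schemaname, indexname)::regclass)) AS index_size,
       format('DROP INDEX %I.%I;', schemaname, indexname)     AS drop_stmt,
       indexdef || ';'                                        AS recreate_stmt
FROM pg_indexes
WHERE pg_relation_size(format('%I.%I', schemaname, indexname)::regclass)
      > 256 * 1024 * 1024
ORDER BY pg_relation_size(
         format('%I.%I', schemaname, indexname)::regclass) DESC;
```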


why SSD is slower than HDD SAS 15K ?

2018-01-15 Thread Neto pr
Hello all,
Can someone help me analyze the two execution plans below (EXPLAIN ANALYZE
used)? It is query 9 of the TPC-H benchmark [1].
I'm using two HP servers, Intel Xeon 2.8GHz/4-core, 8GB memory, O.S.
Debian 8, using the EXT4 filesystem.

Server 1
- HDD SAS 15 Krpm - 320 GB (location where O.S. Debian and Postgresql are
installed).

Server 2
- Samsung Evo SSD 500 GB (location where Postgresql is installed)
- HDD SATA 7500 rpm - 1TB (location where O.S. Debian is installed)

My DBMS parameters in postgresql.conf are at their defaults, except that on
the SSD I have changed random_page_cost = 1.0.

I do not understand why, running on the SAS HDD, the query takes half the
time. To be precise: on the HDD the query takes on average 12 minutes to
execute, while on the SSD it takes 26 minutes.
I think maybe the execution plan is using more write operations, which
would make the 15 Krpm SAS HDD faster.
I checked that the temporary tablespace pg_default is on the SSD in server
2: running SHOW temp_tablespaces in psql returns empty, so temporary files
go to the default directory, where I installed the DBMS:
/media/ssd500gb/opt/pgv101norssd/data.

Anyway, I always thought an SSD would be as fast or faster, but in this
case, and in four more cases we have here, it lost by a lot to the HDDs.

Any help in understanding this is welcome.

Best Regards
Neto

- Query execution Time on SSD ---
execution 1: 00:23:29
execution 2: 00:28:38
execution 3: 00:27:32
execution 4: 00:27:54
execution 5: 00:27:35
execution 6: 00:26:19
Average: 26min 54 seconds

Query execution Time on HDD SAS 15K
--
execution 1: 00:12:44
execution 2: 00:12:30
execution 3: 00:12:47
execution 4: 00:13:02
execution 5: 00:13:00
execution 6: 00:12:47
Average: 12 minutes 48 seconds

- EXECUTION PLAN (ANALYZE, BUFFERS) on SSD Storage --

Finalize GroupAggregate  (cost=1588.33..15980046.69 rows=60150 width=66) (actual time=1569793.025..1573969.614 rows=175 loops=1)
  Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
  Buffers: shared hit=1237677 read=2399403, temp read=1186697 written=1183781
  ->  Gather Merge  (cost=1588.33..15977791.06 rows=120300 width=66) (actual time=1569767.662..1573968.933 rows=525 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=1237677 read=2399403, temp read=1186697 written=1183781
        ->  Partial GroupAggregate  (cost=15821228.31..15962905.44 rows=60150 width=66) (actual time=1547834.941..1552040.073 rows=175 loops=3)
              Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
              Buffers: shared hit=3522992 read=7371656, temp read=3551003 written=3542253
              ->  Sort  (cost=15821228.31..15838806.37 rows=7031225 width=57) (actual time=1547819.849..1548887.629 rows=4344197 loops=3)
                    Sort Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone)) DESC
                    Sort Method: external merge  Disk: 321648kB
                    Buffers: shared hit=3522992 read=7371656, temp read=3551003 written=3542253
                    ->  Hash Join  (cost=4708859.28..14719466.13 rows=7031225 width=57) (actual time=1220169.593..1541279.300 rows=4344197 loops=3)
                          Hash Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                          Buffers: shared hit=3522922 read=7371656, temp read=3220661 written=3211373
                          ->  Hash Join  (cost=4683017.71..14434606.65 rows=7071075 width=43) (actual time=1142575.564..1535092.395 rows=4344197 loops=3)
                                Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                Buffers: shared hit=3503999 read=7362903, temp read=3114233 written=3104987
                                ->  Hash Join  (cost=1993687.71..11297331.33 rows=7071075 width=47) (actual time=275104.573..1213552.106 rows=4344197 loops=3)
                                      Hash Cond: ((lineitem.l_suppkey = partsupp.ps_suppkey) AND (lineitem.l_partkey = partsupp.ps_partkey))
                                      Buffers: shared hit=1478115 read=6073916, temp read=2369833 written=2366725
                                      ->  Hash Join  (cost=273201.71..9157213.44 rows=7071075 width=45) (actual time=24569.390..895992.716 rows=4344197 loops=3)
                                            Hash Cond: (lineitem.l_partkey = part.p_partkey)
                                            Buffers: shared hit=314284 read=5038767, temp read=1742656 written=1742614
                                            ->  Parallel Seq Scan on lineitem  (cost=0.00..5861333.20 rows=15120 width=41) (actual time=0.147..712469.002 rows=80004097 loops=3)
                                                  Buffers: shared hit=482 read=4860800
                                            ->  Hash

Re: String comparison problem in select - too many results

2018-01-15 Thread Durumdara
Dear Peter and Francisco!


Firstly: sorry for the name change. I might have read too much about Totti,
or I was listening to too much "Vamos a la playa" spanitaliano disco... :-)
:-) :-)

To better understand my problem, let me describe the background.
We used a different database engine "before" PG, and much of our code is
based on that SQL.
Because only 70-80% has been migrated so far, we need to preserve
compatibility with the older database engine.
It runs on Windows, and it uses the local collation order (ANSI: basic
ASCII extended with Hungarian characters, like ISO-8859-2 or Windows-1250).
It uses the Windows comparison, so it doesn't matter whether the client or
the server sorts - on Hungarian machines it's the same
(AnsiCompareText/Str).
We don't need to worry about filters/orders.

All ASCII characters keep their positions in a sort, but the Hungarian ones
are inserted between the ASCII ones...
English: A, B, C, D, E, F,
Hungarian: A, Á, B, C, D, E, É, F

Data exports work the same in MS Excel or LibreOffice. The filters are the
same.

I assumed the hu_HU collation was mostly similar.

The whole thing matters for row filtering, not for ordering. For sorting
data we can use DevExpress grids.

But the filters must be the same. We use many range filters on text fields.
With the older engine we can safely search ranges with values like
"Á16-00023" and "18343843/BÉR"; the resulting row count is the same on the
application side and, for example, on the Excel side.

I didn't understand what on earth caused this problem on the PG side, or
what I did wrong.
This is why I showed so many demonstrations - because I wanted to
understand. The simulations point to "symbol removal".

Yes, we can handle these problems, but we need to construct "compatible"
SQL, or implement a different solution on the PG side.
This would cause "bugs" in the working applications and create differences
between the two products.

The C collation is good only for numeric-like fields.
Splitting into substrings and ordering on them also works in this example.
But for special accents ("ű", "ó", "ő", etc.) we need a different
solution...
And the text fields are free-form, so "/" is only one of the symbols they
can use.

Thank you for your instructions, help, and "flame"... :-)

Have a nice day!

dd
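A hedged sketch of the filtering behaviour discussed above (the table and column names are hypothetical, and the collation names must actually exist on the server - check the pg_collation catalog): a per-expression COLLATE clause overrides the column's default collation, so the same range filter can be evaluated in byte ("C") order or Hungarian order.

```sql
-- Hypothetical table "documents" with a text column "code".
-- Byte-order comparison: symbols like "/" keep their ASCII positions.
SELECT code
FROM documents
WHERE code COLLATE "C" BETWEEN 'A16-00000' AND 'A16-99999';

-- Hungarian linguistic order; assumes the hu_HU.utf8 OS locale is installed.
SELECT code
FROM documents
ORDER BY code COLLATE "hu_HU.utf8";
```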