Re: [PERFORM] Help with bulk read performance
Hi, my name is Dan and I'm a co-worker of Nick Matheson, who initially submitted this question (the mail group had me blacklisted for a while for some reason). Thank you for all of the suggestions. We were able to improve our bulk read performance from 3 MB/s to 60 MB/s (assuming the data are NOT in cache in both cases) by doing the following:

1. Storing the data in a "bytea" column instead of an "array" column.
2. Retrieving the data via the Postgres 9 CopyManager#copyOut(String sql, OutputStream stream) method.

The key to the dramatic improvement appears to be the reduction in packing and unpacking time on the server and client, respectively. The server-side packing occurs when the retrieved data are packed into a byte stream for sending across the network; storing the data as a simple byte array reduces this time substantially. The client-side unpacking time is spent generating a ResultSet object. By unpacking the byte stream into the desired arrays of floats by hand instead, this time became close to negligible.

The only downside of storing the data in byte arrays is the loss of transparency: a simple "select *" of a few rows shows bytes instead of floats. We hope to mitigate this by writing a simple stored procedure that unpacks the bytes into floats.

A couple of other results:

- If the data are stored as a byte array but retrieved into a ResultSet, the unpacking time goes up by an order of magnitude and the observed total throughput is 25 MB/s.
- If the data are stored in a Postgres float array and unpacked into a byte stream, the observed throughput is 20 MB/s.

Dan (and Nick)

Andy Colson wrote:
> On 12/14/2010 9:41 AM, Jim Nasby wrote:
>> On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:
>>> Is this the same thing Nick is working on? How'd he get along?
>>> http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov
>> So it is. The one I replied to stood out because no one had replied to it; I didn't see the earlier email.
>> --
>> Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
> Oh.. I didn't even notice the date... I thought it was a new post. But still... (and I'll cc Nick on this) I'd love to hear an update on how this worked out. Did you get it to go fast? What'd you use? Did the project go over budget and did you all get fired? COME ON MAN! We need to know! :-)
> -Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
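For illustration, the "unpack the byte stream into arrays of floats by hand" step Dan describes could look roughly like the sketch below. The 4-byte big-endian float layout, and all names here, are assumptions for the example, not the posters' actual code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Main {
    // Assumed layout: the bytea column holds a packed run of 4-byte
    // big-endian IEEE-754 floats (network byte order).
    static float[] unpack(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN);
        float[] out = new float[raw.length / 4];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();  // sequential reads, no ResultSet overhead
        }
        return out;
    }

    public static void main(String[] args) {
        // Round-trip a few values to show the decode.
        ByteBuffer packed = ByteBuffer.allocate(12).order(ByteOrder.BIG_ENDIAN);
        packed.putFloat(1.5f).putFloat(-2.25f).putFloat(0f);
        float[] vals = unpack(packed.array());
        System.out.println(vals[0] + " " + vals[1] + " " + vals[2]);
    }
}
```

In the setup the thread describes, the byte stream would come from CopyManager#copyOut writing into an OutputStream rather than from a local buffer as in this sketch.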
[PERFORM] performance libpq vs JDBC
I wrote a test program in C++ using libpq. It works as follows (pseudo code):

    for ( int loop = 0; loop < 1000; ++loop ) {
        PQexec(m_conn, "BEGIN");
        const char* sql = "INSERT INTO pg_perf_test (id, text) VALUES ($1,$2)";
        PQprepare(m_conn, "stmtid", sql, 0, NULL);
        for ( int i = 0; i < 1000; ++i ) {
            // Set values etc.
            PQexecPrepared(m_conn, ...);
        }
        PQexec(m_conn, "DEALLOCATE stmtid");
        PQexec(m_conn, "COMMIT");
    }

I measured the duration of every pass of the outer for-loop, resulting in an average of 450 ms (per 1000 rows inserted).

After that, I wrote a test program in Java using JDBC. It works as follows:

    for ( int loop = 0; loop < 1000; ++loop ) {
        String sql = "INSERT INTO pg_perf_test (id, text) VALUES (?,?)";
        PreparedStatement stmt = con.prepareStatement(sql);
        for ( int i = 0; i < 1000; ++i ) {
            // Set values etc.
            stmt.addBatch();
        }
        stmt.executeBatch();
        con.commit();
        stmt.close();
    }

I measured the duration of every pass of the outer for-loop, resulting in an average of 100 ms (per 1000 rows inserted).

This means that accessing PostgreSQL via JDBC is about 4-5 times faster than using libpq. Comparable results were measured with analogous UPDATE and DELETE statements. I need to enhance the performance of my C++ code. Is there any possibility in libpq to reach the performance of JDBC for INSERT, UPDATE and DELETE statements (I have no chance to use COPY statements)? I didn't find anything comparable to PreparedStatement.executeBatch() in libpq.

Best regards,
Werner Scholtes
[PERFORM] Problems with FTS
Hi, all. I'm trying to query a table:

    EXPLAIN SELECT "v"."id", "v"."title" FROM "video" AS "v"
    WHERE v.active
      AND (v.fts @@ 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery)
      AND v.id <> 500563
    ORDER BY COALESCE(ts_rank_cd('{0.1, 0.2, 0.7, 1.0}', v.fts,
        'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery), 1) DESC,
      v.views DESC
    LIMIT 6

This is the query that gets all related items, where fts is a tsvector field with an index on it (CREATE INDEX idx_video_fts ON video USING gin (fts);). Earlier I tried GiST, but the results are the same. And here's what I got:

    Limit (cost=98169.89..98169.90 rows=6 width=284)
      -> Sort (cost=98169.89..98383.16 rows=85311 width=284)
           Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts, '( ( ( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)), views
           -> Seq Scan on video v (cost=0.00..96640.70 rows=85311 width=284)
                Filter: (active AND (fts @@ '( ( ( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery) AND (id <> 500563))

As you can see, the query doesn't use the index. If I drop the OR branches from the tsquery, it does, but I need them. I'm using PostgreSQL 9.0. What should I do? The query is really too slow.
Re: [PERFORM] only one index is using, why?
On Wed, Dec 15, 2010 at 08:56, AI Rumman wrote:
> My question is why only the "crmentity_setype_idx" index is being used.
> The "crmentity_deleted_idx" index is not being used.
> Any idea please.

Because the planner determined that the cost of scanning *two* indexes and combining the results is higher than scanning one index and filtering the results afterwards.

It looks like your query could use a composite index on both columns: (deleted, setype)

Or a partial index: (setype) WHERE deleted=0

Regards,
Marti
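Written out as DDL, Marti's two suggestions would look something like this (the table name `crmentity` is inferred from the index names and may differ, and the index names here are made up):

```sql
-- Composite index covering both filter columns:
CREATE INDEX crmentity_deleted_setype_idx ON crmentity (deleted, setype);

-- Or a partial index, if queries always filter on deleted = 0:
CREATE INDEX crmentity_setype_live_idx ON crmentity (setype) WHERE deleted = 0;
```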