Re: [GENERAL] Multithreaded query onto 4 postgresql instances

Alessandro Candini Tue, 15 Feb 2011 00:33:57 -0800

On 14 Feb 2011, at 9:38, Alessandro Candini wrote:

I performed tests with a query returning more or less 100000 records and using 
my C module I obtain the following results (every test performed cleaning cache 
before):
- single db: 9.555 sec
- splitted in 4: 5.496 sec

Is that a single query on that one DB compared to 4 queries on 4 DB's? How does 
a single DB with 4 parallel queries perform? I'd expect that to win from 4 
DB's, due to the overhead those extra DB instances are generating.
I do suppose you tried tuning that DB before you started making things 
complicated?


Maybe my configuration and test is not clear

Single instance ---> 600 millions of records in one single db (port5433) ---> query of 100000 records ---> 9.555 sec


Splitted instances:

600 millions of records in total splitted into 4 postgresql instances(port 5433, 5434, 5435, 5436), let's say more or less:


5433 ---> 150 millions of records
5434 ---> 150 millions of records
5435 ---> 150 millions of records
5436 ---> 150 millions of records

I've launched the same query reported above with 4 concurrent threadsusing a C program with libpq, which merges the results also,returning the same ordered 100000 tuples retrieved in the single db casebut in 5.496 sec.

So I think this can be a good approach...
I have already read this

http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html

Did you read all the way to section 35.9.10? That explains how to create SRF's 
like yours, including examples. If the step from simple functions to SRF's is 
too large for you, create a few sample-functions to learn how the intermediary 
steps work. We can't upload the info to your brain, after all - you'll have to 
do the learning part by yourself.

Do you think is a good idea continue using libpq or should I abandonthem and focus on postgresql documentation examples?

I posted it in my previous thread. But it is not clear to me how to embed my C 
function into postgresql.
I mean, I know how to compile and insert it into postgresql, I've already done 
it for simple function which return a boolean, for example.
But it is not clear to me how to do this if I want as return a complete table, 
or set of rows...
Can you give me a minimalistic example?

I'm not aware of any difference between adding a function returning a scalar 
vs. one that returns a set. You have to create a wrapper function to your 
library so that PG knows about your function, but apart from the return types 
(BOOLEAN vs. SETOF something), that declaration is exactly the same.

By the way, my goal is to perform a SELECT query in the smallest time possible.
For that reason an SQL function is not good beacuse slower than a C function...

I have no idea what you're referring to, since you're top-posting, but I never 
suggested to use SQL functions. I just pointed out that the document you linked 
was about SQL functions instead of C functions, in the assumption that was what 
was causing your confusion. You didn't explain what you were confused about 
after all, I had to guess.

You're right, I copy/pasted the wrong link, sorry.

On 10 Feb 2011, at 9:01, Alessandro Candini wrote:

I have installed 4 different instances of postgresql-9.0.2 on the same machine, 
on ports 5433, 5434, 5435, 5436.

I do hope you intend to put those databases on different machines eventually, 
or some such? Otherwise you probably didn't gain anything by splitting your 
database up like that - you've just reduced the available resources on that 
single machine.

Why I have to do something tricky like this is long too explain...

It would help to know what you're trying to achieve by splitting your database 
up like this. We don't need the full story, just a summary is fine; Maybe this 
is some experimental setup that's more related to multi-threading than to the 
actual database design? Maybe management smoked something outlandish and put 
you up with this? Maybe this is a macroscopic attempt to table partitioning?

Then I have developed a C function using libpq which creates 4 threads, each 
one which query a 1/4 of the db. After that I merge the results in one single 
response.

My function works fine, but I need to include it inside a postgresql instance 
in order to launch it as a normal SQL query (SELECT myfunc(...);).
I have read the documentation here 
http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,
 but I'm pretty confuse and I don't know if this is the right way to accomplish 
my goal.

What are you confused about? That's a chapter about set-returning functions 
written in SQL - perhaps you're looking for information about set-returning 
functions from an external library like yours? Perhaps you're looking for: 
http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html

You say you don't know if this is the right way to accomplish your goal - which 
is...?

Have you got any ideas or suggestions?

As others have suggested, you should probably have a look at 
table-partitioning, possibly in combination with tablespaces if you want to 
divide your database among multiple disks/filesystems. Putting them in 
different servers on the same hardware is probably not going to be a very good 
solution. As you've already found out, it makes querying the data silly 
difficult. But, we don't know the reason you're doing that of course.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


--
Alessandro Candini

MEEO S.r.l.
Via Saragat 9
I-44122 Ferrara, Italy
Tel: +39 0532 1861501
Fax: +39 0532 1861637
http://www.meeo.it

========================================
"ATTENZIONE:le informazioni contenute in questo messaggio sono
da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
per errore è tenuto ad informare il mittente ed a rimuoverlo
definitivamente da ogni supporto elettronico o cartaceo."

"WARNING:This message contains confidential and/or proprietary
information which may be subject to privilege or immunity and which
is intended for use of its addressee only. Should you receive this
message in error, you are kindly requested to inform the sender and
to definitively remove it from any paper or electronic format."


--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:1234,4d59820a11731774212429!



--
Alessandro Candini

MEEO S.r.l.
Via Saragat 9
I-44122 Ferrara, Italy
Tel: +39 0532 1861501
Fax: +39 0532 1861637
http://www.meeo.it

========================================
"ATTENZIONE:le informazioni contenute in questo messaggio sono
da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
per errore è tenuto ad informare il mittente ed a rimuoverlo
definitivamente da ogni supporto elettronico o cartaceo."

"WARNING:This message contains confidential and/or proprietary
information which may be subject to privilege or immunity and which
is intended for use of its addressee only. Should you receive this
message in error, you are kindly requested to inform the sender and
to definitively remove it from any paper or electronic format."


--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Multithreaded query onto 4 postgresql instances

Reply via email to