Re: [GENERAL] Help request to improve function performance
Hi Filip,

Thanks a lot for your kind help. Selecting only once did the trick: dropping from 50 seconds to 2 for the select is indeed an improvement :) Indexes on the columns already existed, and just out of curiosity I also tested char columns instead of varchars, with no significant positive change. Eliminating the loop made all the difference. The next thing to check out is the partitioning options.

Kind regards,
Seref

2009/4/23 Filip Rembiałkowski <filip.rembialkow...@gmail.com>:
> [full quote snipped -- Filip's reply appears in full further down the thread]
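The "select once" approach that produced the 50-to-2-second improvement can be sketched as a single ordered scan. This is a hypothetical reconstruction, not the poster's actual final code: it assumes the app.archetype_data table from the thread and a NodesContainer type whose two varchar[] fields are named name and context_id (inferred from the original function), and it assumes session_id is non-null.

```sql
CREATE OR REPLACE FUNCTION getNodeContainers(p_context_id varchar)
RETURNS SETOF NodesContainer AS
$$
DECLARE
    r app.archetype_data%ROWTYPE;
    result NodesContainer;
    current_session varchar;
    started boolean := false;
BEGIN
    -- One ordered pass: rows for each session_id arrive as a contiguous block.
    FOR r IN
        SELECT * FROM app.archetype_data
        WHERE context_id = p_context_id
        ORDER BY session_id
    LOOP
        IF NOT started OR r.session_id IS DISTINCT FROM current_session THEN
            IF started THEN
                RETURN NEXT result;          -- flush the previous session's block
            END IF;
            started := true;
            current_session := r.session_id;
            result.name := '{}'::varchar[];
            result.context_id := '{}'::varchar[];
        END IF;
        result.name := array_append(result.name, r.name);
        result.context_id := array_append(result.context_id, r.context_id);
    END LOOP;
    IF started THEN
        RETURN NEXT result;                  -- flush the last block
    END IF;
    RETURN;
END;
$$ LANGUAGE plpgsql STABLE STRICT;
```

Renaming the parameter to p_context_id sidesteps the name-resolution clash discussed in the thread, and with an index on (context_id, session_id) the scan can return rows already grouped by session without an extra sort.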
Re: [GENERAL] Help request to improve function performance
On Thu, Apr 23, 2009 at 09:44:53AM +0100, Seref Arikan wrote:
> I have worked with very capable DBAs before, and even though it has been
> quite some time since I've done real DB work, I would like to invest in
> postgresql as much as I can

Seref, if you can muster the manpower to build archetypes right into PostgreSQL, that would make it the killer database for OpenEHR:

- functions for reading and validating ADL, creating complex datatypes thereof, including on-store validation of instances
- functions to store and produce serialized versions of archetype instances (similar to XML handling)
- AQL right inside the database (select aql('')) returning serialized instances of archetypes

This list is probably incomplete and partially wrong.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Help request to improve function performance
On Wed, Apr 22, 2009 at 06:21:41PM -0600, Scott Marlowe wrote:
> > CREATE TABLE "app"."archetype_data" (
> >     [create script snipped -- see Seref's message below]
> > ) WITHOUT OIDS;
>
> If I'm not mistaken, you're doing Entity Attribute Value model type
> storage. I.e. a database in a database. Makes for easy coding, and
> danged near impossible to troubleshoot your data.

The upper-layer model he's trying to store data of has extremely well modelled constraints, complete with a verification engine and all. So while your concern is valid, it's probably less so here.

> It's a religious issue but I come down on the side that good data
> modelling is hard for a reason, because it pays you back so much in
> the end.

The hard part has already been done for him in a very thoughtful way: they've got a model, a metamodel, and yet another model for data instances of the model ;-) The design group of the above methodology pretty much suggested not putting too much additional modelling into the data store (not that I agree too much).

@Seref: Thomas Beale said so ;-)

I dare say the extension power of PostgreSQL would lend itself extremely well to an actual implementation of the OpenEHR model right in the database (mapping OpenEHR types to complex types including real-time verification, building AQL right into PostgreSQL in the form of stored procedures, etc.), but it would take some serious effort.

Karsten
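Mapping model types onto PostgreSQL's type system, as hinted at above, could start with composite types and table-level validation. The names below (dv_quantity, observation, the constraint) are invented for illustration and appear nowhere in the thread; this is a sketch of the idea, not an OpenEHR implementation:

```sql
-- A hypothetical OpenEHR-style quantity type.
CREATE TYPE dv_quantity AS (
    magnitude double precision,
    units     text
);

-- Composite-typed columns carry structured values directly,
-- with on-store validation expressed as a CHECK on the field.
CREATE TABLE observation (
    id       bigint PRIMARY KEY,
    pressure dv_quantity,
    CONSTRAINT units_required
        CHECK ((pressure).units IS NOT NULL AND (pressure).units <> '')
);

INSERT INTO observation VALUES (1, ROW(120.0, 'mm[Hg]'));
SELECT (pressure).magnitude FROM observation;
```

Real-time verification of full archetype instances would need considerably more than this (triggers or procedural validation), but composite types give the structural half for free.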
Re: [GENERAL] Help request to improve function performance
On Thu, Apr 23, 2009 at 12:02:13AM +0100, Seref Arikan wrote:
> I have a set of dynamically composed objects represented in Java, with
> string values for various attributes, which have variable length. In case
> you have suggestions for a better type for this case, it would be my
> pleasure to hear about them.

Seref, he's suggesting you use TEXT instead of VARCHAR(something). In PG it's actually usually *less* overhead to use the unbounded TEXT datatype (no length check required). Length checks mandated by business logic can be added by more dynamic means -- check constraints, triggers, etc. -- which allow for less invasive change if needed.

Karsten
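A sketch of that suggestion, using two columns of the thread's table as an example (the v2 table name and constraint name are made up for illustration):

```sql
-- TEXT columns; any length limit lives in a constraint, not in the type.
CREATE TABLE app.archetype_data_v2 (
    id         bigint PRIMARY KEY,
    context_id text,
    session_id text,
    CONSTRAINT context_id_len CHECK (char_length(context_id) <= 1000)
);

-- Changing the rule later touches only the constraint, not the column type:
ALTER TABLE app.archetype_data_v2 DROP CONSTRAINT context_id_len;
ALTER TABLE app.archetype_data_v2
    ADD CONSTRAINT context_id_len CHECK (char_length(context_id) <= 4000);
```

By contrast, widening a VARCHAR(1000) column means an ALTER COLUMN TYPE, which is a heavier operation.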
Re: [GENERAL] Help request to improve function performance
On 22 April 2009 at 23:47, Seref Arikan <serefari...@kurumsalteknoloji.com> wrote:
> Hi Filip,
> First of all: thanks a lot for your kind response. Here is the create
> script for my schema:
>
>     [create script snipped -- see Seref's message below]
>
> Now, regarding your feedback, here are the points, hoping that you can give
> me feedback, and bring me up to speed on the topic, for I've been in the
> upper layers of the software world for so long :)

Yes, that's easy to observe; but Java and Postgres can go together, I assure you :)

> Please forgive me for further questions:

Come on, nothing to forgive; if I did not like answering questions I would not read this at all.

> Are you telling me that due to the name resolution process, my use of variable
> and column names for context_id and session_id is causing problems?

That's what I'm telling you, isn't it?

> I'll change variable names into names which would be obviously different
> from column names.
>
> I used the temp table to speed up the following selects, since the actual
> table has more than 9 million rows. After creating the temp table, I am
> selecting from 50K rows. Am I wrong about the performance gain here? What
> would you suggest instead?

Select once. ONCE. Not into a temp table, just into an implicit cursor [like here:
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING].
You can use ORDER BY session_id, so in ONE pass you will be able to do all the needed computations (you will have a block of records belonging to one session_id, then another block for the next session_id, and so on).

Regarding performance: if you create an index on (context_id, session_id), this query will be fast.

> Temp tables forced me to use EXECUTE, after hitting a known problem, also
> mentioned in the FAQ; is this what you're talking about?

Yes, that's it, but as I said before -- you really can get rid of temporary tables here.

> I will be investing serious time into postgresql from now on, and I hope
> you can give me a couple of useful hints, to ease my way forward :) Looking at
> the schema, can you think of a better way to send this result set to a java
> based app?

A better way to query, or a better schema?

If you mean a better way to query -- I would just select all rows and combine them in the upper layer. ARRAYs are not so cute for me.

If you mean a better schema... That's the hardest question, as the other guys suggested. You will have to decide what to use. I know that some ORMs for Java (Hibernate, namely) produce quite decent database schemas -- NOT the E-A-V model. Maybe this will suit you.

Good luck!

--
Filip Rembiałkowski
JID, mailto:filip.rembialkow...@gmail.com
http://filip.rembialkowski.net/
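The recommended index, written out (the index name is invented):

```sql
CREATE INDEX archetype_data_ctx_session_idx
    ON app.archetype_data (context_id, session_id);

-- Check that the per-patient, session-ordered scan uses it:
EXPLAIN
SELECT * FROM app.archetype_data
WHERE context_id = 'some-context'    -- placeholder value
ORDER BY session_id;
```

Because the WHERE clause pins the leading column, the index can also supply the ORDER BY session_id ordering, so the one-pass loop reads rows already grouped by session with no separate sort step.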
Re: [GENERAL] Help request to improve function performance
Hi Scott,

I agree, and I am using the entity-attribute model because I simply have to. This table is used to persist data that is held in user-defined information models -- kind of a domain-specific language. The users continuously create these hierarchical structures, so neither the amount of them nor their structure is stable. On top of that, these structures can have quite deep hierarchies, with collections, references to other structures, etc. This forces almost everyone working in the domain to end up with this db model. In case you are curious and have a lot of time on your hands, you can visit www.openehr.org to see what I'm talking about. The specifications part has all the documents one could need.

I have worked with very capable DBAs before, and even though it has been quite some time since I've done real DB work, I would like to invest in postgresql as much as I can, to make the total framework faster. Therefore, all suggestions are welcome.

All the best
Seref

On Thu, Apr 23, 2009 at 1:21 AM, Scott Marlowe wrote:
> [quoted text snipped -- see Scott's message below]
Re: [GENERAL] Help request to improve function performance
Seref Arikan wrote:
> I have a set of dynamically composed objects represented in Java, with
> string values for various attributes, which have variable length. In case
> you have suggestions for a better type for this case, it would be my
> pleasure to hear about them.

Cut out about 3 layers of abstraction and get down to what you REALLY need to get done. Dynamically composed piles of text-valued attributes will NEVER be efficient, no matter what you do.
Re: [GENERAL] Help request to improve function performance
2009/4/22 Seref Arikan:
> Hi Filip,
> First of all: thanks a lot for your kind response. Here is the create script
> for my schema:
>
>     [create script snipped -- see Seref's message below]

If I'm not mistaken, you're doing Entity Attribute Value model type storage. I.e. a database in a database. Makes for easy coding, and danged near impossible to troubleshoot your data.

It's a religious issue but I come down on the side that good data modelling is hard for a reason, because it pays you back so much in the end.
Re: [GENERAL] Help request to improve function performance
Hi there,

I have a set of dynamically composed objects represented in Java, with string values for various attributes, which have variable length. In case you have suggestions for a better type for this case, it would be my pleasure to hear about them.

2009/4/22 Grzegorz Jaśkiewicz:
> you keep everything in varchars, and yet you request improvements in
> performance.
> you are a funny guy, ...
Re: [GENERAL] Help request to improve function performance
you keep everything in varchars, and yet you request improvements in performance.
you are a funny guy, ...
Re: [GENERAL] Help request to improve function performance
Hi Filip,

First of all: thanks a lot for your kind response. Here is the create script for my schema:

CREATE TABLE "app"."archetype_data" (
    "id" BIGINT NOT NULL,
    "context_id" VARCHAR(1000),
    "archetype_name" VARCHAR(1000),
    "archetype_path" VARCHAR(1000),
    "name" VARCHAR(1000),
    "value_string" VARCHAR(1000),
    "value_int" BIGINT,
    "value_double" DOUBLE PRECISION,
    "session_id" VARCHAR(1000),
    "instance_index" INTEGER,
    CONSTRAINT "archetype_data_pkey" PRIMARY KEY("id")
) WITHOUT OIDS;

Now, regarding your feedback, here are the points, hoping that you can give me feedback, and bring me up to speed on the topic, for I've been in the upper layers of the software world for so long :) Please forgive me for further questions:

Are you telling me that due to the name resolution process, my use of variable and column names for context_id and session_id is causing problems? I'll change variable names into names which would be obviously different from column names.

I used the temp table to speed up the following selects, since the actual table has more than 9 million rows. After creating the temp table, I am selecting from 50K rows. Am I wrong about the performance gain here? What would you suggest instead?

Temp tables forced me to use EXECUTE, after hitting a known problem, also mentioned in the FAQ; is this what you're talking about?

I will be investing serious time into postgresql from now on, and I hope you can give me a couple of useful hints, to ease my way forward :) Looking at the schema, can you think of a better way to send this result set to a java based app?

Many thanks again
Kind regards
Seref

2009/4/22 Filip Rembiałkowski:
> [quoted text snipped -- see Filip's earlier reply below]
Re: [GENERAL] Help request to improve function performance
2009/4/22 sarikan:
> Dear members of the list,
> I have a function which returns a custom type, that has only two fields,
> each of them being varchar arrays.
> The reason that I have written this function is that I have a table
> basically with the following structure (with simplified column names):
>
>     name_col1   name_col2   sessionId
>     value1      value3      id1
>     value2      value2      id1
>     value4      value4      id1
>     value7      value4      id2
>     value2      value2      id2
>     value4      value4      id2
>     value1      value5      id3

Why not post your REAL schema? It would make life easier, both for you and for people trying to help.

> So multiple rows are bound together with sessionIds, and I need to get back
> all rows with a query, grouped by sessionId. However, the GROUP BY SQL
> statement does not solve my problem, since I get back a lot of rows, which
> I have to group into objects again in my application. What I need is a way
> to return all rows having the same sessionId as a single row. Of course
> this is not possible with this table, so I've created a custom type, which
> has array type columns. The following function gets all rows that belong
> to a patient, and for each session id, it inserts rows with that session
> id into array fields of the custom type.
> The problem is, it is very slow! Getting back all the rows with a select
> takes 360 ms, while getting back the results of this function takes 50
> seconds! Is there any way I can make the following function faster, or any
> other methods you can recommend to do what I'm trying to do? I am trying
> to avoid hundreds of calls to the db, or grouping query results in my
> middleware application. Here comes the function, and your help will be
> much appreciated.
>
> Best Regards
> Seref

(below code edited to be more readable; logic unchanged)

> CREATE or REPLACE FUNCTION getNodeContainers( context_Id varchar) RETURNS
> setof NodesContainer AS
> $$
> DECLARE
>     archetype_data_row app.archetype_data;
>     archetype_data_row_main app.archetype_data;
>     nodescontainervar NodesContainer;
>     session_Id varchar;
>     indexVar integer := 0;
> BEGIN
>     CREATE TEMP TABLE all_rows_of_patient AS
>         SELECT * FROM app.archetype_data
>         WHERE context_id = context_Id;
>     FOR session_Id IN
>         SELECT DISTINCT session_id FROM all_rows_of_patient
>     LOOP -- do the following for each session_Id
>         indexVar := 0;
>         FOR archetype_data_row IN -- select rows that belong to this session_Id
>             SELECT * FROM all_rows_of_patient
>             WHERE session_id = session_Id AND context_id = context_Id
>         LOOP
>             nodescontainervar.name[indexVar] := archetype_data_row.name;
>             nodescontainervar.context_Id[indexVar] := archetype_data_row.context_Id;
>             indexVar := indexVar + 1;
>         END LOOP;
>         RETURN NEXT nodescontainervar;
>     END LOOP;
>     DROP TABLE all_rows_of_patient;
>     RETURN;
> END;
> $$ LANGUAGE 'plpgsql';

Please read the above code -- that's what postgres actually executes. Column names have precedence over variable names in name resolution, so conditions like

    WHERE context_id = context_Id;
    WHERE session_id = session_Id AND context_id = context_Id

are obviously no-op conditions, not what you really want. I hope it's clear now why this function has a long execution time :)

Some other remarks:
1) you use temp tables inside a function, which is rather bad (search the archives for an explanation). Try to avoid it.
2) the usage of indexVar is not needed -- there are array operators and functions.
3) if you get rid of the temp table, this function could be marked as STABLE, which will prevent a penalty in some strange situations.
4) mark your function as STRICT, which will save some CPU cycles when someone calls it with null input.

HTH.

--
Filip Rembiałkowski
JID, mailto:filip.rembialkow...@gmail.com
http://filip.rembialkowski.net/
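The name-resolution trap in miniature, using a hypothetical toy table (none of these names are from the thread). SQL identifiers are case-insensitive, so the parameter context_Id and the column context_id are the same name, and the WHERE clause stops filtering:

```sql
CREATE TABLE t (context_id varchar);
INSERT INTO t VALUES ('a'), ('b');

-- Parameter name collides with the column name: the filter is a no-op.
CREATE FUNCTION count_bad(context_Id varchar) RETURNS bigint AS $$
BEGIN
    RETURN (SELECT count(*) FROM t WHERE context_id = context_Id);
END;
$$ LANGUAGE plpgsql;

-- A distinct parameter name resolves unambiguously.
CREATE FUNCTION count_good(p_context_id varchar) RETURNS bigint AS $$
BEGIN
    RETURN (SELECT count(*) FROM t WHERE context_id = p_context_id);
END;
$$ LANGUAGE plpgsql;
```

On the 8.x releases discussed in this thread, the clash in count_bad silently turns the condition into a tautology, so every row matches. PostgreSQL 9.0 and later instead raise an "ambiguous column reference" error at run time by default (tunable via plpgsql.variable_conflict), which makes the bug much easier to find.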
[GENERAL] Help request to improve function performance
Dear members of the list,

I have a function which returns a custom type, that has only two fields, each of them being varchar arrays. The reason that I have written this function is that I have a table basically with the following structure (with simplified column names):

    name_col1   name_col2   sessionId
    value1      value3      id1
    value2      value2      id1
    value4      value4      id1
    value7      value4      id2
    value2      value2      id2
    value4      value4      id2
    value1      value5      id3

So multiple rows are bound together with sessionIds, and I need to get back all rows with a query, grouped by sessionId. However, the GROUP BY SQL statement does not solve my problem, since I get back a lot of rows, which I have to group into objects again in my application. What I need is a way to return all rows having the same sessionId as a single row. Of course this is not possible with this table, so I've created a custom type, which has array type columns. The following function gets all rows that belong to a patient, and for each session id, it inserts rows with that session id into array fields of the custom type.

The problem is, it is very slow! Getting back all the rows with a select takes 360 ms, while getting back the results of this function takes 50 seconds! Is there any way I can make the following function faster, or any other methods you can recommend to do what I'm trying to do? I am trying to avoid hundreds of calls to the db, or grouping query results in my middleware application. Here comes the function, and your help will be much appreciated.

Best Regards
Seref

CREATE or REPLACE FUNCTION getNodeContainers( context_Id varchar) RETURNS
setof NodesContainer AS
$$
DECLARE
    archetype_data_row app.archetype_data%ROWTYPE;
    archetype_data_row_main app.archetype_data%ROWTYPE;
    nodescontainervar NodesContainer%ROWTYPE;
    session_Id varchar;
    indexVar integer := 0;
BEGIN
    CREATE TEMP TABLE all_rows_of_patient AS
        SELECT * FROM app.archetype_data
        WHERE app.archetype_data.context_id = context_Id;
    FOR session_Id IN
        SELECT distinct(all_rows_of_patient.session_id) FROM all_rows_of_patient
    LOOP -- do the following for each session_ID
        indexVar := 0;
        FOR archetype_data_row IN -- select rows that belong to this session ID
            SELECT * FROM all_rows_of_patient
            WHERE all_rows_of_patient.session_id = session_Id
              AND all_rows_of_patient.context_id = context_Id
        LOOP
            nodescontainervar.name[indexVar] := archetype_data_row.name;
            nodescontainervar.context_Id[indexVar] := archetype_data_row.context_Id;
            indexVar := indexVar + 1;
        END LOOP;
        RETURN NEXT nodescontainervar;
    END LOOP;
    DROP TABLE all_rows_of_patient;
    RETURN;
END;
$$ LANGUAGE 'plpgsql';
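For the record, the "one row per sessionId with array columns" shape can also come straight from a query, with no function at all. This is a sketch against the thread's table; the array_agg aggregate exists from PostgreSQL 8.4 onward, while an ARRAY(subquery) expression achieves the same on older versions. The literal context value is a placeholder:

```sql
-- PostgreSQL 8.4+: aggregate each session's values into arrays in one query.
SELECT session_id,
       array_agg(name)       AS names,
       array_agg(context_id) AS context_ids
FROM app.archetype_data
WHERE context_id = 'the-patient-context'
GROUP BY session_id;

-- Pre-8.4 equivalent using ARRAY(subquery):
SELECT s.session_id,
       ARRAY(SELECT d.name FROM app.archetype_data d
             WHERE d.session_id = s.session_id
               AND d.context_id = 'the-patient-context') AS names
FROM (SELECT DISTINCT session_id
      FROM app.archetype_data
      WHERE context_id = 'the-patient-context') s;
```

Either form lets the planner do the grouping, avoiding both the temp table and the per-session inner loop of the PL/pgSQL version.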