[GENERAL] C++, Postgres , libpqxx huge query
Hello experts. I have posted this question on stack overflow, but I did not get any detailed answer, so I thought I should cross post here. My apologies. I have to execute an SQL query to Postgres by the following code. The query returns a huge number of rows (40M or more) and has 4 integer fields: When I use a workstation with 32Gb everything works but on a 16Gb workstation the query is very slow (due to swapping I guess). Is there any way to tell the C++ to load rows at batches, without waiting the entire dataset? With Java I never had these issues before, due to the probably better JDBC driver. try { work W(*Conn); result r = W.exec(sql[sqlLoad]); W.commit(); for (int rownum = 0; rownum r.size(); ++rownum) { const result::tuple row = r[rownum]; vid1 = row[0].asint(); vid2 = row[1].asint(); vid3 = row[2].asint(); . } catch (const std::exception e) { std::cerr e.what() std::endl; } I am using PostgreSQL 9.3 and there I see this http://www.postgresql.org/docs/9.3/static/libpq-single-row-mode.html, but I do not how to use it on my C++ code. Your help will be appreciated. EDIT: This query runs only once, for creating the necessary main memory data structures. As such, tt cannot be optimized. Also, pgAdminIII could easily fetch those rows, in under one minute on the same (or with smaller RAM) PCs. Also, Java could easily handle twice the number of rows (with Statent.setFetchSize() http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize%28int%29) So, it is really an issue for the libpqxx library and not an application issue. Is there a way to enforce this functionality in C++, without explicitly setting limits / offsets manually? Also, is there another driver I should use with C++ that offers this kind of functionality? -- View this message in context: http://postgresql.1045698.n5.nabble.com/C-Postgres-libpqxx-huge-query-tp5802330.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] C++, Postgres , libpqxx huge query
To answer my own question, I adapted How to use pqxx::stateless_cursor class from libpqxx? try { work W(*Conn); pqxx::stateless_cursorpqxx::cursor_base::read_only, pqxx::cursor_base::owned cursor(W, sql[sqlLoad], mycursor, false); /* Assume you know total number of records returned */ for (size_t idx = 0; idx countRecords; idx += 10) { /* Fetch 100,000 records at a time */ result r = cursor.retrieve(idx, idx + 10); for (int rownum = 0; rownum r.size(); ++rownum) { const result::tuple row = r[rownum]; vid1 = row[0].asint(); vid2 = row[1].asint(); vid3 = row[2].asint(); . } } } catch (const std::exception e) { std::cerr e.what() std::endl; } -- View this message in context: http://postgresql.1045698.n5.nabble.com/C-Postgres-libpqxx-huge-query-tp5802330p5802392.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] High Level Committers Wanted
It seems like spam to me. Where is the guy's name or credentials? If I would request something, I would sign with my name, government email and telephone. -- View this message in context: http://postgresql.1045698.n5.nabble.com/High-Level-Committers-Wanted-tp5795695p5795703.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] pg_trgm extension and theory
Hello to all experts, I am considering of using pg_trgm extension in a research publication, since initial results seem promising. The index seems to works pretty fast for finding similar text and significantly accelerate query time. The problem is that I do not know the theory behind it or the exact method it uses. My questions: a) It probably uses the q-grams method (basically 3 grams only). Does it also create 2 grams and 1 grams to determine similarity? b) About the index (either gist on gin). Is it based on RD-tree? If not what is the exact indexing method it uses? c) Will it work for any UTF8 characters / strings because the documentation says for ASCII. d) I also found the http://pgsimilarity.projects.pgfoundry.org/ project who does similarity functions for string. Does pg_trgm extension have anything to do with that? Since pgsimilarity seems abandoned is there another project that a) uses some kind of indexing for similarity b) provides most functions for string similarity like pgsimilarity? Thanks -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-trgm-extension-and-theory-tp5793180.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Pg_trgm and invalid invalid byte sequence for encoding UTF8
Hello experts, I want to compare integer arrays basically with methods based on string similarity (i.e., levenshtein, trigrams etc).. In order to do that I hacked a custom function that converts those integer array to strings, where each integer is converted to a character by the function CHR(my_array1[i]+64) (so that 1-A, 2 -B etc). This hack of course for large integers (I have integers up to 300,000) probably creates invalid UTF-8 characters. Levenshtein (from fuzzystrmatch module) does not seem to have a problem with that and works perfectly, since it is based on just comparing UTF8 codes. On the other hand when I try similarity function array1-array1 for some cases it works (I think it works for all integers up to 4096) but for some larger indexes I get invalid byte sequence for encoding UTF8 errors: Example integer sequence 8527,63586,8526,63585,63584,63583,63582,8525,8760,63820,63821,63822,860,57610,861,57611,862,57612,57613,863,57614,57615,57616,39850,39851,39852,39853,39854,39855,95275,39856,39857,95276,95277,39858,95278,95279,39859,95280,39860,95281,95282,39861,39862,39863,95283,95284,27095,27096,82406,82407,27097,27098,27099,27100,82408,27101,27102,27103,25702,80837,25703,25704,80838,25705,25706,25707,25708,30011,85343,30012,85344,30013,30014,51019,48260,48261,56809,56810,56811,56812,113829,31762,87568,31763,45925,41778,41779,41780,31778,31779,87571}; Error message: invalid byte sequence for encoding UTF8: 0xed 0xb8 0xa9 Is there a way to suppress these errors similar to levenshtein which does not care about validity of UTF characters? -- View this message in context: http://postgresql.1045698.n5.nabble.com/Pg-trgm-and-invalid-invalid-byte-sequence-for-encoding-UTF8-tp5791681.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] client that supports editing views
Why not use Access directly? You can connect to PostgreSQL though ODBC (inside Access) and do the forms there. I am sure 95% that I had done that in the past, although I cannot be sure it works with the most recent versions. Something like that: http://www.depts.ttu.edu/ithelpcentral/solutions/odbcsetup.php -- View this message in context: http://postgresql.1045698.n5.nabble.com/client-that-supports-editing-views-tp5782418p5791070.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Dynamic insert into ARRAY? plpgsql
You must a) join the 2 tables on the orderID ... where orderID=15 and then GROUP BY the result by the order ID and concat the orderlines by a custom aggregate function like: http://www.postgresql.org/message-id/db7789b.0309131210.625da...@posting.google.com -- View this message in context: http://postgresql.1045698.n5.nabble.com/Dynamic-insert-into-ARRAY-plpgsql-tp5791090p5791093.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Temporary table already exists
@mephysto I think you are trying to solve the wrong type of problem. Creation of tables (temporary or not) are not supposed to run concurrently. So, this is not an issue of PostgreSQL but design. There are two ways to solve the problem. a) You could use the sessionID (provided The Glassfish server) to create unique names for the temporary table, if the temporary table is unique per session. If not, since you are talking about multiplayer game the temporary table name could include the userID, so it is unique per user. In that sense, it could be persistent per user, or temporary depending on your needs. In that case you will need EXECUTE command to CREATE the table in your pgsql code, since the table name must be a variable in your functions. b) I highly suspect that this temporary table is used for either caching or local sorting / ordering. In this case Postgres is again the wrong tool. You can use an embedded DB like SQLite, HSQLDB for the local database which may be used for this type of operations, which 'syncs' to the global PostgreSQL DB when connecting or disconnecting. Every client will have a separate copy of this DB, so no overhead to the server. Either way you will have solved your issue. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Temporary-table-already-exists-tp5789852p5790806.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Hard upgrade (everything)
I would use normal pg_dump and pg_restore for the DBs and not utils/postgis_restore.pl. Also, AFTER I backup all databases and everything else, you could try to upgrade Postgis without upgrading PostgreSQL by buliding from source e.g. http://trac.osgeo.org/postgis/wiki/UsersWikiPostGIS21Ubuntu1304src. There it says that Postgis 2.1 could be used with PostgreSQL 9.1. This will give you access to Postgis 2.1 features without reinstalling everything. Of course normally I would not upgrade if this is an 1-2 years project, unless I 100% need Postgis 2.1 features not present in 1.5. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Hard-upgrade-everything-tp5790801p5790811.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Postgresql GROUP BY SIMILAR but not equal values
I wanted to ask you the following question to all experts here. Let's say I have this table foo ID|G1|T1| 1|2|ABC| 1|2|ABCD| 1|2|DEF| 1|2|DEFG| SELECT * FROM foo GROUP BY ID,G1,T1 RETURNS exactly the same table. Is there a way in SQL or PostgreSQL in general to group by values than are not exactly the same but are quite similar (like 'ABC' and 'ABCD') based on some distance function (levenshtein for example) if the distance is within some threshold (i.e., 1) My intuition is that SQL cannot support such queries but I was wondering if there was some hack around it. The problem as I see it that distance functions require 2 values but GROUP BY only checks equality. Another subproblem that might help is can we overload an operator of a custom type, so that equals operator is more relaxed and is calculated by a function? OR can we use GROUP BY with a custom comparator for a data type? I hope my question makes sense. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Postgresql-GROUP-BY-SIMILAR-but-not-equal-values-tp5790860.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgresql GROUP BY SIMILAR but not equal values
Conceptually, Tom (as always) is right. But Alban's hack help. DROP TABLE foo; CREATE TABLE IF NOT EXISTS foo(ID INTEGER,G1 INTEGER, T1 TEXT, ID2 SERIAL PRIMARY KEY); INSERT INTO foo(ID,G1,T1) VALUES(1,2,'ABC'); INSERT INTO foo(ID,G1,T1) VALUES(1,2,'ABCD'); INSERT INTO foo(ID,G1,T1) VALUES(1,2,'ABDC'); INSERT INTO foo(ID,G1,T1) VALUES(1,2,'DEF'); INSERT INTO foo(ID,G1,T1) VALUES(1,2,'DEFH'); /* A little editing to remove duplicates a to b and b to a */ SELECT foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1),foo2.T1 FROM foo foo1 INNER JOIN foo foo2 ON (foo2.ID = foo1.ID AND foo2.G1 = foo1.G1) WHERE foo1.ID2foo2.ID2 GROUP BY foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1),foo2.T1 HAVING levenshtein(foo1.T1, foo2.T1) 2; RETURNS ID|G1|foo1.T1|foo2.T1 1;2;ABC;1;ABCD 1;2;ABC;1;ABDC 1;2;ABCD;2;ABDC 1;2;DEF;1;DEFH Then it requires a second grouping but as Tom suggested it would be hard to somehow group all similar cases together because then it becomes a clustering problem. With a second grouping we will have 3 records instead of 4, so it is better than the initial case by 25%. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Postgresql-GROUP-BY-SIMILAR-but-not-equal-values-tp5790860p5790876.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Postgresql multidimensional arrays cast fail
I do: SELECT '{{1,2},{3,4}}'::INTEGER[][] But I get: {{1,2},{3,4}} INTEGER[]. Somehow the PostgreSQL server does not understand that is a multidimensional array. So, later if I want to get {1,2} or {3,4}, the field[1] or field[2]. Evem when I try: field [1:1] I get {{1,2}} and not plain one dimensional integer array {1,2} which I want. How do I achieve that? -- View this message in context: http://postgresql.1045698.n5.nabble.com/Postgresql-multidimensional-arrays-cast-fail-tp5790077.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgresql multidimensional arrays cast fail
This is probably the only way to do it. Still, it seems to me an overkill if basically you need to run a function at each multidimensional array to get access to each i-array element. Thanks for your answer. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Postgresql-multidimensional-arrays-cast-fail-tp5790077p5790095.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Reindexing and tablespaces
You were right. I just reindexed the DB and I saw no real changes in drive storage (those tablespaces are on separate hard disk volumes). Thanks -- View this message in context: http://postgresql.1045698.n5.nabble.com/Reindexing-and-tablespaces-tp5789827p5789853.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Reindexing and tablespaces
Hello to all, I have done ALTER DATABASE [database_name] SET default_tablespace = [new_tablespace]; I am wondering, if I reindex this entire DB would the indexes automatically moved into the [new_tablespace] or will they remain in the tablespace they were originally created on; -- View this message in context: http://postgresql.1045698.n5.nabble.com/Reindexing-and-tablespaces-tp5789827.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] levenshtein_less_equal does not work
Hello to all, I always use PostgreSQL from enterpriseDB. I have both 9.2 and 9.1 latest verions installed on two different Ubuntu 64bit machines. For some reason levenshtein_less_equal does not work, in contrast to levenshtein which works as expected. I even downloaded the src code for PostgreSQL code for fuzzystrmatch and compiled myself, added the so file to the correct place and still it does not work. The process was correct, because I have increased the levenshtein limit from 255 to 2048. Can you tell me how to make levenshtein_less_equal. Are those mysterious IFDEF levenshtein_less_equal that are messing compilation? Thanks, Alexandros -- View this message in context: http://postgresql.1045698.n5.nabble.com/levenshtein-less-equal-does-not-work-tp5788710.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] levenshtein_less_equal does not work
Query: SELECT levenshtein_less_equal('extensive', 'exhaustive',2); ERROR: function levenshtein_less_equal(unknown, unknown, integer) does not exist LINE 1: SELECT levenshtein_less_equal('extensive', 'exhaustive',2); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. Even this: SELECT levenshtein_less_equal('extensive'::TEXT, 'exhaustive'::TEXT,2::INTEGER); function levenshtein_less_equal(text, text, integer) does not exist But this works perfectly: SELECT levenshtein('extensive', 'exhaustive'); 4 -- View this message in context: http://postgresql.1045698.n5.nabble.com/levenshtein-less-equal-does-not-work-tp5788710p5788713.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] levenshtein_less_equal does not work
Just this: create extension fuzzystrmatch; If the extension was not installed, then the simple levenstein distance would not work either. -- View this message in context: http://postgresql.1045698.n5.nabble.com/levenshtein-less-equal-does-not-work-tp5788710p5788715.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] levenshtein_less_equal does not work
SELECT version(); and \df levenshtein_less_equal PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit routing_nw2=# \df levenshtein_less_equal List of functions Schema | Name | Result data type | Argument data types | Type +--+--+-+-- (0 rows) routing_nw2=# -- View this message in context: http://postgresql.1045698.n5.nabble.com/levenshtein-less-equal-does-not-work-tp5788710p5788717.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] levenshtein_less_equal does not work
=# \dx List of installed extensions Name | Version | Schema | Description -+-++-- plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language (1 row) -# \df levenshtein* List of functions Schema |Name | Result data type | Argument data types | Type +-+--+-+ public | levenshtein | integer | text, text | normal (1 row) The server is in localhost. The binaries are from enterpriseDB. I also downloaded src code for Postgres 9.1.11. Recompiled fuzzystrmatch, which includes levenstein less_equal. Replaced .so file in Postgres installation dir. Change in levenshtein limit of character successful. It is not a compilation error, because I do not get levenstein less_equal in the Postgres 9.2 server which is exactly as downloaded from EnterpriseDB. -- View this message in context: http://postgresql.1045698.n5.nabble.com/levenshtein-less-equal-does-not-work-tp5788710p5788723.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] levenshtein_less_equal does not work
With your help I found the solution. My DB was initially based on a 9.0 server and I had restored it on the 9.1 server. For some reason 9.0 fuzzystrmatch functions (levenstein...) were already present in this DB. So, CREATE EXTENSION fuzzystrmatch did not work. I had to manually remove function 9.0 fuzzystrmatch functions and then CREATE EXTENSION worked perfectly and levenstein_less_equal as well. So, it was error from the DB restored from an older version of postgresql. -- View this message in context: http://postgresql.1045698.n5.nabble.com/levenshtein-less-equal-does-not-work-tp5788710p5788729.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general