Re: [GENERAL] Sequential vs. random values - number of pages in B-tree
Daniel: On Thu, Aug 18, 2016 at 5:24 PM, Daniel Verite <dan...@manitou-mail.org> wrote: >> unless you know of an easy way to generate a random permutation on the >> fly without using a lot of memory, I do not. > It could be done by encrypting the stream. > For 32 bits integers: > https://wiki.postgresql.org/wiki/Skip32 > For 64 bits integers: > https://wiki.postgresql.org/wiki/XTEA Nearly, probably good enough for tests, but only generates a pseudorandom permutation if you encrypt 2**32/64 values, not with the 1..1E7 range, it will map them into 1E7 different numbers in the range 2**32/64. I think there are some pseudo-random number generators which can be made to work with any range, but do not recall which ones right now. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sequential vs. random values - number of pages in B-tree
Hi: On Thu, Aug 18, 2016 at 1:32 PM, pinker <pin...@onet.eu> wrote: ... > create table t01 (id bigint); > create index i01 on t01(id); > insert into t01 SELECT s from generate_series(1,1000) as s; > > and random values: > create table t02 (id bigint); > create index i02 on t02(id); > insert into t02 SELECT random()*100 from generate_series(1,1000) as s; It's already been told that btrees work that way, if you find it strange read a bit about them, this is completely normal, but ... ... what I come to point is your test is severely flawed. It probably does not matter in this case, but you are inserting 10M DIFFERENT VALUES in the first case and only 100 in the second one, which an average of 100K DUPLICATES of each. This affects btrees too. You could try using random*1G, or at least 100M, for a better test ( which may have even worse behaviour, ideally I would just write 10M integers to a disk file, then shuffle it and compare COPY FROM times from both ) ( unless you know of an easy way to generate a random permutation on the fly without using a lot of memory, I do not ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SQL help - multiple aggregates
CCing to the list ( if you are new to this list, messages come from the sender address, you have to use "reply all" ( at least in my MUA, web gmail ) to make your replies appear in the list ). On Thu, Aug 18, 2016 at 3:03 PM, <haman...@t-online.de> wrote: > Hi Francisco, > thanks a lot. I will give it a try later Do it, and do not forget to try the straightforward solution ( sume of cases ) given by Ladislav Lenart above.I normally prefer to do this kind of things the way I pointed you because the queries are simpler and normally only the first one takes time, and using count tends to be the faster way to extract the relevant data ( the rest of my query, after the first with, is just moving data around for pretty-printing ( or pretty-selecting ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SQL help - multiple aggregates
On Thu, Aug 18, 2016 at 10:56 AM, <haman...@t-online.de> wrote: > I have a table cv with custid and vendid columns. Every entry represents the > purchase of a product > available from a specific vendor. > Now, for a set of "interesting" vendors, I would like to select a new table > custid, c415, c983, c1256 > based upon part queries > select custid, count(vendid) as c415 from cv where vendid = 415 group by > custid Divide and conquer, first you get the raw data ( so you have what you need as 'vertical' tagged columns ): ( beware, untested )... with raw_data as ( select custid, vendid, count(*) as c from cv where vendid in (415,983,1256) group by 1,2; ) Then put it in three columns ( transforming it into diagonal matrix ): , column_data as ( select custid, case when vendid=415 then c else 0 end as c415, case when vendid=983 then c else 0 end as c983, case when vendid=1256 then c else 0 end as c1256 from raw_data ) and then group then ( putting them into horizontal rows ): select custid, max(c415) as c415, max(c983) as c983, max(c1256) as c1256 from column_data group by 1; Note: I used 0 in else to get correct counts for the case where not al vendids are present. If you prefer null you can use it, IIRC max ignores them. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Any reasons for 'DO' statement not returning result?
On Fri, Aug 12, 2016 at 11:34 PM, Xtra Coder <xtraco...@gmail.com> wrote: ... > In my particular case I'm more interested in an easy way to create complex > SELECTs that require usage of variables in the one-time through-away scripts > (some-time during experiments for implementation of functions, to see > immediate results of the intermediate code). The easiest way would be > MsSQL-like when declaring a variable outside of SP actually makes it visible > globally in current session. In such case I do not need 'DO' at all and this > is simple. Probably PostgreSQL has another way to make that thing simple. If you are just interested in avoiding some mistakes, and/or parametrizing some queries from the command line, psql ( the CLI program ) has macro expansion with some sql quoting capabilities, see https://www.postgresql.org/docs/9.5/static/app-psql.html#APP-PSQL-VARIABLES and be sure to scroll down to "SQL Interpolation" after the built in variables list and read that. I've used it several times, just remember it's a macro processor and it's done by psql, not by the server. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Corrupted Data ?
Hi Adrian: On Fri, Aug 12, 2016 at 9:01 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote: > "Specially if this happens, you may have some slightly bad disks/ram/ > leading to this kind of problems." > > Trying to reconcile that with all the physical machine 1 VMs sharing the > same RAM and physical disk, but the error only occurring on db3. > > Is the VM hosting db3 setup different from the VMs 1 & 2? They probably share the disk, bus, ram and disk controllers, but they surely do not share the disk SECTORS. "Weak" ( nearly failing ) tracks can give this kind of problems ( although the disk CRC should catch all odd number of bit errors , but with VMs in the mix who knows where the messages could end up ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Corrupted Data ?
CCing to the list... On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes <ioanada...@gmail.com> wrote: >> given 318220 and 318216 are just a bit away ( 4db08/4db0c ), and it >> repeats sporadically, have you ruled out ( by having page checksums or >> other mechanism ) a potential disk read/write error ? >> >> >> > Also the index is correct on db3 as the record in case (with drawid = >> > 318216) is retrieved if I filter by drawid = 318220 >> >> Specially if this happens, you may have some slightly bad disks/ram/ >> leading to this kind of problems. >> > > Could be. I also had some issues with an rsync between db3 and drdb a week > ago that did not complete for bigger files (> 200MB) and gave me some > corruption messages. Then the system was revbooted and everything seemed > fine but apparently it is not. > I am planning to drop & create the table from a good backup and if that does > not fix the issue then I will rebuild the server. I would check whatever logs you can ( syslog or eventlog, smart log, etc.. ) hunting for disk errors ( sometimes they are reported ). This kind of problems, with programs as tested as postgres and rsync, tend to indicate controller/RAM/disk going bad ( in your case it could be caused by a single bit getting flipped in a sector for the data portion of the table, and not being propagated either because it happened after your sync of drdb or because it was synced from the WAL and not the table, or because it was read from the disk cache ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Corrupted Data ?
On Fri, Aug 12, 2016 at 3:09 PM, Ioana Danes <ioanada...@gmail.com> wrote: > drawid | 318220 ... > drawid | 318216 > Here are the facts I know: > > August 10 @ 11:10 > - The record was created on db1 and replicated to db2 and db3 > August 11 @ 2:30 > - db1, db2 and db3 are in sync (I have a script that compares the data > for all 3 dbs every night @ 2:30 am) > August 12 @ 2:30 > - db3 is out of sync because of this field (drawid) > - drdb (which is PITRed from db3) is in sync with db1 and db2? given 318220 and 318216 are just a bit away ( 4db08/4db0c ), and it repeats sporadically, have you ruled out ( by having page checksums or other mechanism ) a potential disk read/write error ? > Also the index is correct on db3 as the record in case (with drawid = > 318216) is retrieved if I filter by drawid = 318220 Specially if this happens, you may have some slightly bad disks/ram/ leading to this kind of problems. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SELECT col INTO TEMP TABLE tab2 ON COMMIT DROP FROM tab1
Alexander: On Fri, Aug 12, 2016 at 11:00 AM, Alexander Farber <alexander.far...@gmail.com> wrote: > but the custom function I am trying to call (from another function) does not > return one row, but several rows, which I'd like to store into a temp table: This I know, I wasn't trying to solve the problem. I was just trying to point that "select" is not the same in plpgsql and in sql, so you need to read the docs for plpgsql to find how to solve it. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SELECT col INTO TEMP TABLE tab2 ON COMMIT DROP FROM tab1
On Fri, Aug 12, 2016 at 10:41 AM, Alexander Farber <alexander.far...@gmail.com> wrote: > why does this syntax fail in 9.5.3 please? Maybe because... > $func$ LANGUAGE plpgsql; ... you are writing pspgsql. > The doc https://www.postgresql.org/docs/9.5/static/sql-selectinto.html just But are looking at the docs for SQL. This kind of languages are similar to SQL, but not the same. I think https://www.postgresql.org/docs/9.5/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW is the proper place to look it up. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgres Pain Points: 1 pg_hba conf
On Thu, Aug 11, 2016 at 7:04 PM, support-tiger <supp...@tigernassau.com> wrote: > #1) pg_hba conf > Out of the box the md5 setting blocks access. Most "advice" say change to > "all all trust" and indeed that works. But that seems a big security issue. Indeed it is. I do not know where do you get this "advice" from, but I'll seriously consider blackholing the source. > Specifying a postgres role, password, and peer does not seem to work. And > this approach is problematic if there are many roles or even dynamically > created roles. In general this works, in nearly every situation. If you have problems, consider explaining it and may be you wil get some ``advice''. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [SPAM] Re: [SPAM] Re: [GENERAL] WAL directory size calculation
Hi Moreno: On Wed, Aug 3, 2016 at 1:07 PM, Moreno Andreo <moreno.and...@evolu-s.it> wrote: It's already been answered, but as it seems to be answering a chunk of my mail... > Should I keep fsync off? I'd think it would be better leaving it on, right? Yes. If you have to ask wether fsync should be on, it should. I mean, you only take it off when you are absolutely sure of where you are doing, fsync off goes against the D in acid. You normally only turn it off in counted cases. As an example we have an special postgresql.conf for full cluster restores, with fsync=off. Wehen we need it we stop the cluster, boot it with that, restore, stop it again and reboot with the normal fsync=on config. In this case we do not mind losing data as we are doing a full restore anyway. But normally, its a bad idea. As a classic photo caption says, fsync=off => DBAs running with scissors. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [SPAM] Re: [GENERAL] WAL directory size calculation
Hi: On Fri, Jul 29, 2016 at 10:35 AM, Moreno Andreo <moreno.and...@evolu-s.it> wrote: > After Andreas post and thinking about it a while, I went to the decision > that it's better not to use RAM but another persistent disk, because there > can be an instant between when a WAL is written and it's fsync'ed, and if a > failure happens in this instant the amount of data not fsync'ed is lost. Am > I right? With the usual configuration, fsync on, etc.. what postgres does is to write and sync THE WAL before commit, but it does not sync the table pages. Should anything bad (tm) happen it can replay the synced wal to recover. If you use a ram disk for WAL and have a large enough ram cache you can lose a lot of data, not just from the last sync. At the worst point you could start a transaction, create a database, fill it and commit and have everything in the ram-wal and the hd cache, then crash and have nothing on reboot. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] WAL directory size calculation
On Thu, Jul 28, 2016 at 3:25 PM, Moreno Andreo <moreno.and...@evolu-s.it> wrote: > Obviously ramdisk will be times faster disk, but having a, say, 512 GB > ramdisk will be a little too expensive :-) Besides defeating the purpose of WAL, if you are going to use non persistent storage for WAL you could as well use minimal level, fsync=off and friends. > Aside of this, I'm having 350 DBs that sum up a bit more than 1 TB, and plan > to use wal_level=archive because I plan to have a backup server with barman. Is this why you plan using RAM for WAL ( assuming fast copies to the archive and relying on it for recovery ) ? Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Re: pg_basebackup vs archive_command - looking for WAL archive older than archive_command start
On 07/17/2016 06:35 PM, Francisco Reyes wrote: Why is the pg_basebackup restore looking for a WAL file that is even older than the ones I have, when I turned on WAL archiving before I started the pg_basebackup? Figured it out.. the error is from a secondary slave trying to sync from the machine I just restored the pg_basebackup. Also, why is that one WAL archive named differently with .backup? Still would be nice to know the above.. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] pg_basebackup vs archive_command - looking for WAL archive older than archive_command start
I turned on archive_command and have wal archiving going. I did a pg_basebackup and copied the resulting file from source machine to target, yet when I restore I am getting requested WAL segment 000508AE009B has already been removed The earliest WAL archives I have are 000508D2005C.0028.backup.gz 000508D2005C.gz 000508D2005D.gz Why is the pg_basebackup restore looking for a WAL file that is even older than the ones I have, when I turned on WAL archiving before I started the pg_basebackup? Also, why is that one WAL archive named differently with .backup? Normally I restore with pg_basebackup and streaming replication, which works well, but lately due to constant network disconnects I have to make the pg_basebackup in the source machine and rsync. That is why I turned on WAL archiving before starting the base backup since it would take some time to copy the base backup file over. The command I am using to create the base backup is pg_basebackup -U UserName -h 127.0.0.1 -D - -P -Ft | gzip > Basebackup.tar.gz Any suggestions? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [BUGS] Where clause in pg_dump: need help
Hi Adi: ( Is this correct? Adi is what your message uses to mark your replies ). On Sun, Jul 10, 2016 at 6:53 PM, Prashanth Adiyodi <prashan...@celltick.com> wrote: > Hi Franciso, My comments below inline Got them. Only problem is your MUA does not signal quotes. It looks like some kind of outlook by the headers it sends, so I assume it is normal, I will try to correct it. >> 1.- You have a backup with a series of tables which get inserted WITH a >> timestamp. > Adi-The series of tables may or may not have timestamp >> 2.- At the end of the day you want to transfer the inserted data, and only >> the inserted data, to another server and insert it ther. > Adi-Exactly., somewhere post midnight I need to transfer the inserted data > for the day to another DB. Then you NEED some kind of marker. The tables WITHOUT timestamp ( point 1 above ) are going to be difficult. For the discussion I assume you are somehow capable of making a SELECT query which identifies inserted data for the day, and that either you do not care about updates/deletions ( not having any is a subset of this condition ) or you can do a query for those too. > Adi- I am OK with the copy command, however I am not able to understand (my > bad, I am not used to postgres and using for the 1st time) the where clause > that should be used to achieve this result. Well, then your problem is on the queries. You need to be able to identify the data inserted yesterday. There is no magic way to do it. You NEED some kind of timestamp column. If you lack this you can use a trigger to mark them into auxiliary tables or, if you feel adventurous, you can try to use the xmin/xmax columns ( I would NOT recommend even trying that, given you are having problems with simple select queries ). > I tries using something like the below, > psql -d my_db -c 'copy (select * from mytab WHERE date_trunc('day',NOW() - > interval '1 day') TO STDOUT' -o data1.copy; Your where expresion is a constant of timestamp type, i.e., '2016-07-11 12:00:00+00', where needs a BOOLEAN. YOU need to be able to identify the inserted rows. YOU know your data definitions. Ar you able to query them ? > but this, I am sure has some syntax errors, could you help correct this, NOT, because I do not know the table structure. Only you can do that. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pasting a lot of commands to psql
Hi: On Thu, Jul 7, 2016 at 8:18 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: >> You might have better luck with "psql -n", or maybe not. > I've wished sometimes for a "\set READLINE off" psql metacommand for > this kind of thing. It's pretty annoying when the text being pasted > contains tabs and readline uses to do completion. Doesn't 'cat | psql ' disable it? I use it with other programs for these purpose ( as well as things like ls | cat to avoid colors/wordwrapping, just makes the program see a non-tty on stidn/stdout ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [BUGS] Where clause in pg_dump: need help
1.- CCing to the list ( remember to hit reply-all or whatever your MUA uses for that, otherwise threads may get lost ). 2.- Try to avoid top-posting, it's not the style of the list ( or mine ). On Fri, Jul 8, 2016 at 4:43 AM, Prashanth Adiyodi <prashan...@celltick.com> wrote: > Basically my requirement is, I have a live Db with certain tables and a > backup Db at another location (both on postgressql). I need to take a backup > of this live DB every night for the previous day (i.e the backup script > running on 07/07/2016 will take the backup of the DB for 06/07/2016). This > backup will be then transferred to the backup DB server and will be inserted > into that DB. From what I have read pg_dump is the solution (similar to > export in oracle), do you think of any other approach to get to this > objective, have you come across a script or something that already does this, Your requirement is a bit 'understated'. I assume your problem is: 1.- You have a backup with a series of tables which get inserted WITH a timestamp. 2.- At the end of the day you want to transfer the inserted data, and only the inserted data, to another server and insert it ther. If BOTH servers are postgres, you can do it easily with a series of COPY commands easily. If the target one is not postgres I would use it too, but pass the COPY data through a perl script to generate whatever syntax the target DB needs ( I've done that to go from postgres to sql server and back using freebcp, IIRC, on the sql server side ) You still can have problems IF you have updates to the tables, or deletions, or . But if you just have insertions, copy is easy to do. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_dump fundenental question
On Tue, Jul 5, 2016 at 4:54 PM, J. Cassidy <s...@jdcassidy.eu> wrote: > I have hopefully an "easy" question. > If I issue the pg_dump command with no switches or options i.e. > /usr/local/pgsql/bin/pg_dump -v dbabcd > /data3TB/Backup/dbabcd > Is their any "default" compression involved or not? Does pg_dump talk to > zlib during the dump process given that I have not specified any compression > on the > command line? (see above). IIRC no options means you are requesting an SQL-script. Those are not compressed, just pipe them through your favorite compressor. ( In a later message you stated you were in Linux and had a 324Gb file, and could head/tail it, have you done so? ). > Your considered opinions would be much appreciated. OTOH, with those big backup I would recommend using custom format ( -Fc ), it's much more flexible ( andyou can have the sql script but asking pg_restore to generate it if you need it, but not the other way round ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] dblink authentication failed
Hello: On Mon, Jun 27, 2016 at 9:38 AM, Kiss Attila <kiss.att...@dardanis.hu> wrote: I'm not expert on dblink but, even supposing it uses libpq and its files: > # psql –h 192.168.1.40 –U myuser –w remote_db_name ... > # ls –la ./.pgpass > -rw--- 1 postgres postgres 193 jún 24 13:54 ./.pgpass This '#' seems to indicate you run those commands as root, while the server typically runs as postgres. Have you checked the commands work when issued as the server user? Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [HACKERS] Online DW
I may be wrong but ... On Fri, Jun 10, 2016 at 6:33 PM, Sridhar N Bamandlapally <sridhar@gmail.com> wrote: > One thing we can restrict to "begin noarchive" transaction block are DELETE > and SELECT only > On 10 Jun 2016 21:57, "Sridhar N Bamandlapally" <sridhar@gmail.com> > wrote: >> This is what I feel will give me solution to maintain production >> (current+7days) and archive(current+history) without any etl/scheduler It seems ( to me ) you have a grossly underspecificied ( in the list ) problem and you have invented an even more underspecified keyword to magically solve it, and expect someone to develop it. And you haven't even bothered to avoid top posting, which is frowned upon on this list and makes infering your problems solution even more difficult. If you want to have something like this you'll need to post much more details on what you are proposing, what are the use cases for the general public, etc.. Just eyeballing it I would estimate this will need many pages just to state the problems and the intended semantics of your proposal. Regards. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Switching roles as an replacement of connection pooling tools
Hi Melvin: On Tue, May 31, 2016 at 3:55 PM, Melvin Davidson <melvin6...@gmail.com> wrote: > On Tue, May 31, 2016 at 3:45 AM, CN <cnli...@fastmail.fm> wrote: >> SET SESSION AUTHORIZATION user2 PASSWORD p2; > Your points make no sense. You can accomplish the same with: > GRANT ROLE user2 TO user1; I'm not discussing wether it makes sense, but you do not acomplish the same. In his case you cannot swict to the other role unless you know the password for it, in yours you can. Also I suspect he wants it to work like a login, i.e., if you have N roles and you add another one he wants it to be like a new login user, and apps/people could have a set of X role+password combos different for each one. I think it's a bizarre thing, but not the same as granting some roles to other ( of course if you have M people and N schemas ( in his example )you can have M login roles and grant combos of N 'schema roles' to them to achieve this, but if N is, say, a hundred, and you have a huge M, like ten thousand, with a different combo for each one, his solution may make sense ( I do not think such a bizarre case justifies the bug-risk of including the feature, but it can make sense ) ) Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Drop/Re-Creating database extremely slow + doesn't lose data
Hi Thalis On Tue, May 31, 2016 at 3:49 PM, Thalis Kalfigkopoulos <tkalf...@gmail.com> wrote: > Intention: to drop a database and recreate it. > Expectation: the newly created db should be empty > What happens: dropping is fast, creation is slow, and when I reconnect, > all the data objects are still there. > > Commands (tried both through command line with dropdb/createdb and through > psql) > Creation is normally slower then dropping. This is normal, as it is a more involved process, and normally an unusual one, so I suspect it's being developed thinking more on correctness and verifiability than speed. Onto the other problem. I did not see the drop commands in your examples. And neither did I see how the / data got to the first dafodb in the first place. And I saw you connect to template1. ¿ Are you aware databases in postgres are made by copying a template database ? Maybe you modified the default template database ( this tends to be template1 ) and this is the reason they are all equal ( read https://www.postgresql.org/docs/9.5/static/manage-ag-templatedbs.html , and also think newly created postgres database are never empty, they have the system catalogs inside them ). Francisco Olarte.
Re: [GENERAL] Deleting a table file does not raise an error when the table is touched afterwards, why?
Hi Daniel: On Mon, May 30, 2016 at 5:35 PM, Daniel Westermann <daniel.westerm...@dbi-services.com> wrote: > I get the file for that table: ... > Then I delete the file: Well, you corrupted the database and invoked undefined behaviour ( not exactly, but postgres is not designed for this ). > No issue in the log. This is probably coming from the cache, isn't it? Is > this intended and safe? It's probably not intended. It can come from the cache or it can arrive from the fact that you are running a unix flavour. In unix ( at the OS level, in the clasical filesystems ) you do not delete a file, you unlink it ( remove the pointer to it in the directory ), the file is removed by the OS when nobody can reach it, which means nobody has it open an no directory points to it ( so no one else can open it, is like reference counting ) ( In fact this behaviour is used on purpose for temporary files, you open it, unlink it and know when you exit, either normaly or crashing, the OS deletes it ). Postgres has the file open, and probably does not bother checking wether somebody removed it under from the directory, as there is no correct behaviour in this case, so no point in checking it. > Then I restart the instance and do the select again: > 2016-05-30 19:25:20.633 CEST - 9 - 2777 - - @ FATAL: could not open file > "base/16422/32809": No such file or directory As expected. > Can someone please tell me the intention behind that? From my point of view > this is dangerous. If nobody is monitoring the log (which sadly is the case > in reality) nobody will notice that only parts of the table are there. > Wouldn't it be much more safe to raise an error as soon as the table is > touched? If you are going to implement idealised behaviour, prohibiting people from deleting it would be better. Any user with minimu knwledge and enouugh privileges can put programs in states from which they cannot recover, there is not point in checking every corner case. In fact, if you can remove the file under the servers feet you can probably alter the running server memory, which would you think the correct behaviour would be for a 'poke rand(),rand()' in the server process? It could have triple redundancy copy of every page and try to vote and detect in each instruction, but is pointless. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Migrate 2 DB's - v8.3
Martin: Could you please avoid unedited top posts? On Sat, May 28, 2016 at 7:53 PM, Martín Marqués <mar...@2ndquadrant.com> wrote: > I still don't understand why the OP is getting into so much trouble and > doesn't upgrade to a newer version like 9.3 or 9.4 (or even 9.5). Neither do I, but the thing is trying to find the fastest way to move a single database between two clusters on different machines, or trying to find a way to do it without downtime, is useful on its own, whicever the versions are. > All this hassle to stay on an unsupported postgres is just useless, IMNSHO. You can ask the OP for the reason to stay in 8.3 directly. Maybe is something as simple as "I'm the one who pays, you do what I pay you for.". I've had several of these. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Migrate 2 DB's - v8.3
Jeff: On Sat, May 28, 2016 at 12:38 AM, Jeff Baldwin <tarheelj...@gmail.com> wrote: > Thank you for your time Alan. .. > To move the DB, you are suggesting something like this: > pg_dump -h dbms11 -U postgres -C mls11 | psql -h localhost -d mls11 -U > postgres I'd like to point one thing, you MAY get a little more speed if you run pg_dump AND psql each in the same host as the DB it's operating on to minimize latency ( and I would time unix socket vs network first in case it differs ). ( to do that I would try something like 'ssh dbms11 "pg_dump mls11 " | psql -d mls11' with all the needed doodahs, and maybe use something like netcat or socat instead of ssh ). The rationale being the intermediate dump is just a data stream and not latency sensitive ( except for the window*latency problem, but you are not going to hit that on a LAN ), while the dump/restore does DB work which is more latency sensitive ( I do not know how many RTTs it would need, specially with blobs, but you can try it ). ¿ How many hours does it take in your tests? Because if you have 1-2 and you can do the dump psql pipe trick, which is quite robust, in 3-4 you may push for it ( arguing it's a simpler an more testable process ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Connections - Postgres 9.2
Charles: On Mon, May 16, 2016 at 6:56 PM, Charles Clavadetscher <clavadetsc...@swisspug.org> wrote: > There really is a state 'Idle in transaction'? Good to learn. Again, IIRC, it was there in the graph legend, orange was Idle, yellow was Idle in transaction ( not in the data, just in the legend ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Connections - Postgres 9.2
Hi Lucas On Mon, May 16, 2016 at 4:10 AM, Lucas Possamai <drum.lu...@gmail.com> wrote: > > Those IDLE connections, might be because the user/application didn't > commit the transaction? > IIRC Those would be 'Idle in transaction' ( which are normally bad if numerous, unless your app has a reason for them, as they are the ones which can block things ). Plain 'Idle' are normally connections between transactions, totally normal if you use poolers, or if your app keeps connection opens while it does other things ( like preparing for a transaction ). Francisco Olarte.
Re: [GENERAL] View not allowing to drop column (Bug or Feature enhancement )
Hi: On Mon, May 16, 2016 at 9:46 AM, Sachin Kotwal <kotsac...@gmail.com> wrote: >> You need to drop the view before recreating it. Then it works. If you >> changed the access to the view with grants or revokes, you also neet to >> recreate them. They are dropped with the view. > Sorry to say but If we need to drop and replace then what is use of "Create > OR Replace " syntax? That is the use case. Create or replace is for doing backwards compatible changes without having to worry about what other things is running concurrently with you. If you make non-backwards compatible changes the system forces you to drop, so you can see dependencies and who is using the things before dropping ( unless you go trigger happy with cascade ). >>> If its not a bug and a limitation kindly guide me towards any >>> documentation where it is mentioned. >> CREATE OR REPLACE VIEW is similar, but if a view of the same name already >> exists, it is replaced. The new query must generate the same columns that > If this is the limitation. Is community is planning update this or add this > feature soon? IMO you've got it backwards. The limitation IS the feature. In my case, as an example, I can code every view ( and functions ) in a script file using create or replace. And when I find a bug / want to make a backwards compatible improvement I can just edit the script and fire it again, and this feature insures I do not impact other code if I inadvertently rename a column, or delete it. If the feature were to be removed, and backwards-incompatible changes were allowed, a lot of people will be unhappy. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] View not allowing to drop column (Bug or Feature enhancement )
On Mon, May 16, 2016 at 8:49 AM, Shrikant Bhende <shrikantbhende@gmail.com> wrote: > While working on the view I came across an unusual behaviour of the view, > PostgreSQL do not allows to drop a column from the view, whereas same > pattern of Create and Replace view works while adding a column. This is probably because you are using create or replace, which is normally used to switch things in place and so it needs them to be compatible with the old ones. A view with an extra column can be used instead of the old one, but a view with less columns can not. I do not see the 'not dropping' part as unusual, and the 'can add columns', well, I see them as a little unusual on a create or replace but I see the point in hallowing it, so just a little. > Alter command do not have any option to drop column > postgres=# alter view vi1 > ALTER COLUMN OWNER TO RENAME TO SET SCHEMA Well, it is a view, not a table. They are basically shorthands for queries and places to attach triggers, so its normal they do not have as much management options. > If its not a bug and a limitation kindly guide me towards any documentation > where it is mentioned. Right at the top of create view? : >>>> Description CREATE VIEW defines a view of a query. The view is not physically materialized. Instead, the query is run every time the view is referenced in a query. CREATE OR REPLACE VIEW is similar, but if a view of the same name already exists, it is replaced. The new query must generate the same columns that were generated by the existing view query (that is, the same column names in the same order and with the same data types), but it may add additional columns to the end of the list. The calculations giving rise to the output columns may be completely different. <<<<<< Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Thoughts on "Love Your Database"
On Thu, May 5, 2016 at 12:34 AM, Vincent Veyron <vv.li...@wanadoo.fr> wrote: > I like this quote from Fred Brooks : > > `Show me your code and conceal your data structures, and I shall continue to > be mystified. Show me your data structures, and I won't usually need your > code; it'll be obvious.' I remembered it as 'Show me your tables,...", which looks even more applicable in this list. A quick googgle search for SMYT did in fact return a link to wikiquote at the top, it says "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious. " from TMMM, so its normal I remember it that way ( I still own it and reread some chunks every couple of years. ) Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] psql color hostname prompt
Hi Steve: On Wed, Apr 27, 2016 at 7:09 PM, Steve Crawford <scrawf...@pinpointresearch.com> wrote: > The various hacks appear to not deal with the fact that there may be > multiple instances of postgresql running on different TCP ports or Unix > connections nor with the fact that the local connection may, in fact, be a > pooler and not a direct connection to the database. Because the problems is with the host, the port is solved trivially with %> and the local socket name is dependent on the listening port. And, regarding pgbouncer, psql just knows it's talking with someone who speaks the postgres protocol, it has no way to know what is being done with the socket. > As long as we're into hack-land, I'll offer one. > First, you abuse the custom variables feature in postgresql.conf and add > something like: That's a nice trick ( once you peel of the ; before \gset IIRC ) for the problem of knowing which of your server databases you are connected to. Low impact and if the database does not have the guc you can easily know it ( I think knowing the host is not the problem, it does not matter how many socket redirections, bouncers or other things you go through your solucion solves the problem ). > Next you update .psqlrc with something along the lines of: Just a problem, you need it somewhere where it can be re-executed on reconnects ( am I the only one who routinely uses \c ? ). > On the plus side, the custom GUC setting is available to any client, not > just psql. It also handles multiple PostgreSQL instances and connections > that are actually routed through a pooler. Yes, you do not know who you are connected to, but you know which server istance you end up in, which is nice. I think knowing the host/path+port is useful for some things, but for the originally stated problem this seems better. > On the down side, it is a hack. The method is not in any way guaranteed to > be future-proof. It leaves an ugly bit of server output at psql startup. It > requires ensuring that all servers have the variable set appropriately. You > need to avoid colliding with a custom GUC used by an extension. But perhaps > it is useful. Not this hacky, I'll use it in preference to changing the prompt with scripts ( I'll continue using %M and changing terminal titles, but I'm too used to it ). Well seen. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] psql color hostname prompt
Hi Cal: On Tue, Apr 26, 2016 at 5:20 PM, Cal Heldenbrand <c...@fbsdata.com> wrote: ... > 2) %M vs shell call > %M on when connected to the local machine displays the string "[local]" > which I didn't like. I wanted a real hostname to show no matter which > client/server pair I was using. Zero chance for mistaken commands on the > wrong host. Many times we ssh to a remote server, then run psql locally. I do this (ssh'ing) too. What I do when it matters ( connecting to many similar servers at a time ) is to use host connections for everything ( so %M works, and the overhead of using local ip connections vs unix domain sockets is nearly zero these days ). > Perhaps the more elegant route here, is to change psql's behavior with %M > when connected to the local machine? (This would also solve point #3) mmm, strong -1 for this. I would vote for another mechanism, but I think it must reflect the real connection, after all I can typically connect to [local], 127.0.0.1/localhost, $(hostname -i)/$(hostname) and they are different things. A %nice_name would be ok for me, ( and I think easy to do, just do 'if (local) expand hostname else expand whatever %M does'. Also, you could precede it by something, or print it like '[local=host.na.me]' without disturbing present %M usage. > 3) a forked process for every prompt > While this also isn't very elegant, it seems to work fine. Not an elegance concern, and forking is what shells do every time, so fine for me. > It would be nice if there was a way to do some kind of templating script > with the psqlrc file. Something that would dynamically generate the "\set > PROMPT" commands on psql startup, rather than calling out to a shell every > command. (I'm thinking along the lines of ERB for Ruby, Django for Python, > etc.) That can be done with a named pipe ;-> ( or with an alias / function using getopt to parse the options before forwarding them to psql ). But, which just \sets $hostname in a var and uses it. ) Anyway, the problem with this is that if you do \connect to another. You could do something similar to this using only psql/psqlrc tricks: cdrs=> \set fecha `date` cdrs=> \echo :fecha Wed Apr 27 10:23:22 CEST 2016 Here you would use your script instead of fecha, and interpolate it using %:fecha: in the prompt. And now the second step of the trick: cdrs=> \set recalc '\\set fecha `date`' cdrs=> \echo :recalc \set fecha `date` cdrs=> :recalc cdrs=> \echo :fecha Wed Apr 27 10:24:07 CEST 2016 cdrs=> :recalc cdrs=> \echo :fecha Wed Apr 27 10:24:16 CEST 2016 Now you can use :recalc if you do connect to have the prompt updated. Anyway, TIMTOWTDI. > But again, I think the more elegant approach is to alter the %M logic. > Any thoughts? At risk of being redundant, not altering %M, another %x better. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Calculating Minkowski distance between two rows
Hi: On Mon, Apr 25, 2016 at 4:26 PM, Babak Alipour <babak.alip...@gmail.com> wrote: > That is correct. The function I've written only works when the two tables > are named table_train and table_test; is it possible to generalize that to > take in any two tables? And only when all table_train columns are numbers AND table_test contains at least all of them AND they are numbers too. Wouldn't it be easier to use numeric arrays to represent coordinate vectors? Anyway, I'm not versed in all of this, wbut I would try to make a function to turn a table record to a numeric array and then write the numeric array version of the func and call them, divide and conquer. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] psql color hostname prompt
Hi: On Mon, Apr 25, 2016 at 4:04 PM, Achilleas Mantzioswrote: > Hello, have done that, looked really nice, but unfortunately this resulted > in a lot of garbled output, in case of editing functions, huge queries, up > arrows, etc... Did you use %[ %] to delimit non-printing sequences as Cal did? I've tested his prompt ( substituting 'echo tarari' for his sh script ) and it works fine ( at least in xfce4-terminal / ubuntu / 6 terminal lines long input line ). I've seen a lot of color prompts worked by forgetting them ( even when the doc for them specially speaks of color changes ). Frnacisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] psql color hostname prompt
Hi Cal: On Mon, Apr 25, 2016 at 3:55 PM, Cal Heldenbrand <c...@fbsdata.com> wrote: > I whipped up a psqlrc and companion shell script to provide a colored prompt > with the hostname of the machine you're connected to. It works for both > local sockets and remote connections too. Color may be nice, but as previously pointed it can distort things. > /usr/local/pgsql/etc/psqlrc > == > -- PROMPT1 is the primary prompt > \set PROMPT1 > '%[%033[1;31m%]%`/usr/local/pgsql/etc/psql_hostname.sh`%[%033[0m%] > %n@%/%R%#%x ' And you are exec'ing in every round ( not too big, but I come from a time of slow forks). I think psql ( at least in 9.3 didn't bother looking behind ) gives you enough: cdrs=> \set PROMPT1 %M:%>-%x-%:AUTOCOMMIT:-:PROMPT1 db1:5432--on:-cdrs=> begin; BEGIN db1:5432-*-on:-cdrs=> commit; COMMIT db1:5432--on:-cdrs=> \c apc - psql (9.4.7, server 9.3.10) You are now connected to database "apc" as user "folarte". db1:5432--on:-apc=> And you can color it: db1:5432--on:-apc=> \set PROMPT1 '%[%033[1;31m%]%M%[%033[0m%]%n@%/%R%#%x ' db1folarte@apc=> -- not a fan of html mail. lsof may give you longer or more acurate names, but I think std escapes are enough. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Columnar store as default for PostgreSQL 10?
Hi Bráulio: On Thu, Apr 21, 2016 at 12:08 PM, Bráulio Bhavamitra <brauli...@gmail.com> wrote: > And what I keep reading all over the web is many databases switching to > columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great > performance on queries in general and giant boosts with big analytics > queries. And have you read anything about the drawbacks of columnar? They are there, but writing about them does not makes the headlines. > I wonder if there is any plans to move postgresql entirely to a columnar > store (or at least make it an option), maybe for version 10? An option may be good ( may, not sure because nothing is free. More complex code, more bug surface, some time will be eaten managing the extra complexity, less developer time available for each feature, ... ) , but IMHO a complete move would be bad. Columnar is not that good for a lot of postgres usages. If columnar were the silver bullet everybody would be doing it. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Fetching last n records from Posgresql
could turn the sort into a reverse, but it seems difficult. Or you could try to use a cursor, goto to the last record, and then skip N backwards and go fro there, but IMHO it's not worth the complexity, and, at least in my case, it is slower for 1000 records, but YMMV $ begin; BEGIN Time: 61.229 ms $ declare last_1000 scroll cursor for select * from carrier_cdrs_201603 order by setup; DECLARE CURSOR Time: 61.025 ms $ move last in last_1000; MOVE 1 Time: 282.142 ms $ move backward 1000 in last_1000; MOVE 1000 Time: 61.969 ms $ fetch all from last_1000; Time: 248.071 ms $ close last_1000; CLOSE CURSOR Time: 60.922 ms $ commit; COMMIT Time: 60.814 ms Note how once you account for my 60ms RTT It's taking 220 ms to go to the end, and 188 to fetch the result, while: cdrs=# select * from ( select * from carrier_cdrs_201603 order by setup desc limit 1000 ) last_1000 order by setup; Time: 248.566 ms I can do the select in just 188 too. ( This are just 1000 records, but without explain analyze a nice chunk of the time is spent sending them over my 60 ms RTT connection ). Anyway, try things, measure, post results so we know what happens. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] psql question: aborting a "script"
Hi Melvin: On Tue, Mar 15, 2016 at 3:57 PM, Melvin Davidson <melvin6...@gmail.com> wrote: > What you really want is > "ON_ERROR_STOP ... > So just > SET ON_ERROR_STOP = ON > before any other statements IIRC you are right with the variable ... BUT .. it is a psql setting, not a session setting, so he'll probably need to use the \set psql mettacommand: \set ON_ERROR_STOP on and also, use on as suggested on the docs, not ON, I'm not sure wether PSQL is case sensitive. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Unexpected result using floor() function
Hi Frank: On Tue, Mar 15, 2016 at 6:57 AM, Frank Millman <fr...@chagford.com> wrote: > 2. As pointed out, there are two forms of the power function. > > test=> select pg_typeof(power(10, 2)); > pg_typeof > -- > double precision > > test=> select pg_typeof(power(10., 2)); > pg_typeof > -- > numeric > > I found that adding a decimal point after the 10 is the easiest way to force > it to return a numeric. > > Putting this together, my solution is - > > test=> select floor(4.725 * power(10., 2) + 0.5); > floor > --- >473 > Can anyone see any problems with this? I see a problem in it relying in interpretation of constants. From my experience I would recommend explicit casts, it's just a second longer to type but much clearer. The problems start with 10 being interpreted as integer, all the other ones as numeric: s=> select pg_typeof(10.), pg_typeof(10), pg_typeof(10.E0), pg_typeof(10E0); pg_typeof | pg_typeof | pg_typeof | pg_typeof ---+---+---+--- numeric | integer | numeric | numeric (1 row) This may byte you any day, so I wuld recommend doing s=> select v, pg_typeof(v) from (select floor(4.725 * power(cast(10. as numeric), 2) + 0.5)) as aux(v); v | pg_typeof -+--- 473 | numeric (1 row) s=> select v, pg_typeof(v) from (select floor(4.725 * power(cast(10 as numeric), 2) + 0.5)) as aux(v); v | pg_typeof -+--- 473 | numeric (1 row) which makes your intention clear. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Suppress decimal point like digits in to_char?
HI: On Mon, Mar 14, 2016 at 8:02 PM, Ken Tanzer <ken.tan...@gmail.com> wrote: >> price: >> -- >> xx5.45 >> xx1.20 >> 99 >> xx2.40 ... > I appreciate the comment and explanation. But your example shows numbers > where the trailing 0s are not suppressed. Yeah, my fault, but 5.45, 1.25, 99.00, 2.45 will create a that kind of alignment which stronly suggest its .99. Of course it's not printed as such, but visually it can trick you, that's why decimal points are never supressed anc softwar has options to align coluns to the decimal point. > It seems to me that if you're requesting suppression of trailing 0s, then > you're accepting that your numbers aren't going to align in the first place. > And so it's hard for me to see how, for example "99." is ever going to be > desirable output if suppression is what you're after. Never desirable for me, but I never use d9, I always do d0, but you are right. > And just as context on my end, the times I use to_char are generally to merge > numbers into a document or some fragment of English text. For the grouping ',' I see your point, I normally just use defaut conversion for these as I dislike the grouping. As I said, I could see a legitimitate case for Dd similar to the 09 stuff, but having so many replace options ( I think you can even do <trim(trailing '.' from to_char(number, 'FMD99'))>, which is easier on the eye but fails on locales, as the regexp does ( as a note, in Spain they are inverted, dot for grouping comma for decimals ) ) I do not think it's a big deal, uglier things are coded by me continuously nearly via muscle memory. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Suppress decimal point like digits in to_char?
Hi Ken: On Mon, Mar 14, 2016 at 7:33 PM, Ken Tanzer <ken.tan...@gmail.com> wrote: > Thanks for all the info and suggestions. I'll just observe that sure, you > can do it with a regex, but I'm still surprised that this can't be done with > to_char. Well, this may be a good enhancement request, add something like d=decimal point, supressed if alone. > In particular, one might reasonably choose a format string like > 'FM999,999D99' and not realize it will fail on whole numbers. Is there any > particular reason the D is not suppressible in this case, either by default > or as an option? It seems to me if the trailing 0s are suppressed, the > decimal should follow suit for whole numbers. It does not fail, it just works in a diffrent way of what you would like. Regarding supression, IMO it's a bad thing, it can lead to misleading results. Imagine it is, and you do a right aligned print ( usual for numbers ) of prices 5.45, 1.20, 99.00, 2.40, and you end up with ( using x for align ) price: -- xx5.45 xx1.20 99 xx2.40 It would be misleading, I prefer to have xxx99., ugly but clearer IMO ( of course one never supress decimals in prices, so I would use 990D00, but anyway ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Suppress decimal point like digits in to_char?
Hi David: On Mon, Mar 14, 2016 at 4:22 PM, David G. Johnston <david.g.johns...@gmail.com> wrote: >> But a right-aligning string output routine needs to be used. ... >> Summarising, any combination can be easily done with a single round of >> replace. > See also: > http://www.postgresql.org/docs/9.5/interactive/functions-string.html > format(formatstr text [, formatarg "any" [, ...] ]) You mean to use it to right align the replaced string ( i.e. format('%12s',replace(...)) ) or is there a code I do not know off which can be used to achieve the global result ( I use it but it seems to be like a restricted sprintf which can not do the supress the zero stuff ) ? Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_restore fails
Hi Karsten.. On Sun, Mar 13, 2016 at 12:09 AM, Karsten Hilbert <karsten.hilb...@gmx.net> wrote: > I am trying to pg_restore from a directory dump. > However, despite using > > --clean > --create > --if-exists > > I am getting an error because schema PUBLIC already exists. snip, snip Have you tried the classic combo pg_restore -l > toc.dat, your_favorite_editor toc.dat pg_restore -L toc.dat? I've had great success with that in the past, even splitting the TOC in several chunks to be able to make adjustements between them, but I've never used the directory format for ( serious, I've tried all when learning ) backups. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] multiple UNIQUE indices for FK
Hi Rafal: On Mon, Mar 7, 2016 at 9:29 AM, Rafal Pietrak <ra...@ztk-rp.eu> wrote: . >> be slower ). And you introduce several problems, the naming ones, a >> very strange foreign-key relationship between kids, the possibility of >> having a row inserted in the parent. > No, no. It was OK. the parent table was appropriately "ruled on insert" > for inherited tables to work as partitions. But you have to rule a lot, to avoid people inserting into the kids, anyway, without seeing the whole lot I'm not gonna comment more. > and the table was called "messages" :) But it did not contain messages, it contained message-persons relations. . sniped, too complex without seeing the whole dessign. >> http://xyproblem.info/ . > :) this is a good one!!! > Actually I'm very, very acquainted with this "XY problem". i.e quite > often when "end-users" ask me for additional functionality, they (like > in the XY case) suggest a "technical solution". And (just like you :) I > always ask: pls tell me what you do "traditionally", e.g "when you are > currently doing this on paper: how does it go - step by step", then I'll > find a good IT solution for you. Well, now you have an url to mail them. > In case of a complex inter-mangled systems, where a well defined > "critical point" shows up, it's more efficient to extract the "show > case" that causes the problem and focus on this, instead of digressing > on overall design. (which may be flowed, but cannot be rewritten at this > point). May be, but for me your solutions are so complex I cannot follow them. > BTW: using timestamp instead of FK to message_id does not work quite so > well. To see the problem, ponder a question: what time resolution should > such timestamp have, to be as robust as FK ... irrespectively of the > transaction load? That's irrelevant. Timestamp is a concept, as I told you, it's just a value whcich defines a full order. Normally the system has a timestamp source which insures it. If you have not one you can use a cache=1 sequence. When in a single process problem like this I normally use an second timestamp which I autoincrement if repeated, something like: get_timestamp_for_id() { Locked(mutex) { now=time(); if (last_returned_id >= now) { return ++ last_returned_id; } else { return last_returned_id = now; } } This has the nice property that it eventually drops to timestamp after a burst, so the ID do double service as generation timestamps, but a single locked counter, a sequence, works as well. ... More snipping. I cannot recommend more things. The only thing, for easier locating of a message in a person, cache the last message id in the person ( which you can use as a lock for updtings ) and just use the next for linking the chain ( because, as you said, a message can be no longer the last, so, unless this only happens when you destructively pop the last message in the chain for a user, you need a full linked list to recover the previous one ) ( if you determine the new last message by other means you do not need any of these things, just cache the last message in each person record, then when you insert a new one you update each sender / recipient with the last message id at the same time you insert the records, preferably sorting the ids first to avoid deadlocks if your concurrency is high, although I suspect you'll need a linked-list-per-user if it has to be the previous one ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] multiple UNIQUE indices for FK
Hi Rafal: On Fri, Mar 4, 2016 at 11:46 PM, Rafal Pietrak <ra...@ztk-rp.eu> wrote: > W dniu 04.03.2016 o 18:59, Francisco Olarte pisze: >> Make sender_person_id NOT NULL in messages if you want to insure every >> message ahs exactly ONE SENDER, leave it out if you want to allow >> senderless messages. An FK column must either link to a record or be >> null. >> >> Then, if you want to have a msgs-person ''table'' I would use a view: >> >> CREATE VIEW msgs_persons as >> SELECT message_id, sender_person_id as person_id, 'SENDER' as role from >> messages >> UNION ALL >> SELECT message_id, recipient_person_id as person_id, 'RECIPIENT' as >> role from recipients > This was my initial schema .. with the addition of one "super table", > that the two above (sender_person_id and recipient_person_id) both > inharited from (to avoid the UNION ALL when selecting everything). Wuf. I do not like it. I would add a column named sender_person_id to messages ( to distinguish its role ) and put a recipient_person_id, or just person_id, in recipients ( the role is clear in that table ) to avoid problems. Otherwise, what do you call the parent table and the fields? It's a naming issue, nut I've found the hard way naming is important in this things. Bear in mind you do only avoid TYPING the union all when selecting everything ( as inheritance DOES do a union all, it would have to do it with both kids AND the parent, so it MAY be slower ). And you introduce several problems, the naming ones, a very strange foreign-key relationship between kids, the possibility of having a row inserted in the parent. > With that layout, the NEXT column worked just fine. I do not doubt the NEXT column works, I just doubt it's a good thing on a relational dessign. > Only then came the requirement to have a "possibly sequence-continues" > unique ID assigned to every message irrespectively if a particular > person was a sender or a recipient of that message_id. And I couldn't > figure out how to implement it across separate (even if inharited) tables. > So came the concept of single table of messages, with ROLE field and a > partial unique constraint on sender+sender-message-id ... and I've > sterted to rewrite the schema, but at certain point I realized that it > broke the NEXT functionality and I cannot imagine any way to reintroduce > it into the new table layouts. Which is exactly the functionality of the NEXT column ? I mean, I see you have messages with ONE sender and MANY? (Can they be zero? ) recipients. What are you trying to achieve with it? How are you planning to maintain it in your dessign? > Now I'm quite stuck here. I ask these questions because I think we are in a case of http://xyproblem.info/ . > BTW: I'm considering your sugestion of replaceing NEXT with the > timestamp. The primary reason for the NEXT is to be able to fetch a row > "just preceeding" currently inserted new one AFTER the insert is done > (in trigger after), so that some elaborated "statistics" get updated in > that "one before" message record. May be timestap would do instead... If you are planning on updating the previous row for a message ( or a person ? ) on a trigger, this smells fishy. You may have a reason, not knowing what you are exactly planning to do, I cannot tell, but it sounds really weird. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] multiple UNIQUE indices for FK
epts the requirement for a unique index > as FK target column "mandatory performance support", then I fail to see > real reazon, where *ENY* unique index shouldn't do that role too. They > are unique (within domains of their conditions) and by definition yield > a single row for FK (or nothing); that should be sufficient for the > engine to keep data consistancy as expected, shouldn't it? Maybe. I'm not following too much what you try to do here. Either you are way above my level or you have a mental model of how postgres should work which does not correspond to how it does. > Naturally I undestand that there might be some deep reasons for > exclusion of partial indexed as FK target "selectors" - I'd apreciate > further explanations. But in case those reasons exist, I'd expect > workarounds to exist too - like "SQL idioms" - that people normally use > for cases like these. Would those "generic constraint" be be idiom? Please, forget anything I said about generic constraint, my fault, I was trying to express something, failed at it, and do not know how to fix it. What I personally do to avoid these kind of problems is to avoid dessigning something which needs references to a partial index. And I've been successful at it for a long time. I'm not going to recover the complete thread to recap on why you exactly are doing these kind of really advanced things, but I suppose you need them for some reason and I do not have the resources to study it. > So as I said before, I'm looking for some guidence here. (I'm really > emotionally bond to that NEXT field there :) ON this I cannot help you too much. I do not see what you are trying to achieve with the NEXT field. These will need more explanations, and more study, and as I said before, I do not have the available resources for them. Sorry for the tme I've taken, but I feel I can not be of any help here. Best regards. Francisco Olarte, over & out. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] multiple UNIQUE indices for FK
Hi Rafal: On Fri, Mar 4, 2016 at 11:44 AM, Rafal Pietrak <ra...@ztk-rp.eu> wrote: > While doing so I fell onto another problem, to which I cannot find any > resolve so far. ... > 2. but in the original schema I did have an additional field NEXT, which > allowed me to dasy-chain all messages originating from a particular > sender, and just one message (the most recent one) did have it a NULL > there, so it was easy to peek the last message (which is a frequent > operation). I do not recall your original schema too well, but IMO doing linked-lists with database records is not usually a good idea. They are very procedural and relational is declarative. Normally to peek at the last message from a chain you just declare your intentions in sql, typically by having a timestamp column and doing select whatever where whatelse order by xxx_ts desc limit 1. > 3. currently, having just partially-unique index on messages-persons > table for senders, I'm unable to FK (person,role,next) to (person,role,ssn). This is normally a sympton of your schema not being appropiately normalized. > Postgres complains, that FK columns MUST have an unconditional unique > index at its target columns. I would expect this, a foreign key must uniquely determine a row on another table, postgres insures this with unconditional unique index. This is because FK target tables, not indexes. If you are targetting a conditional index probably you want another type of constraint. > A) how to get around it? Do not use FK. Try to use generic constraints. Better , normalize your schema, IIRC it was not even in 2NF, and this tends to be asking for problems. > B) why that unique index at the target have to be unconditional? I mean: > --> if the table was split into two inharited tables (one for > role-sender, one for role-recepient), the partition table containing > only role-sender could have a full-unique index and thus could become > target for FK(sender,next). > --> so why rdbms cannot treat partial indexes just like that: as if > those where full-unique-indexes, but only covering part of the data. And > consequently if there was a 'partial-index-miss', the target key is > assumed as not present. > I'd apreciate any help in how should I implement the chaining of rows in > messages-persons table (like above); and some info on the "theory of > rdbms" (or clasure in standard specs) which lead to restrictions > preventing partial indexes as FK targets, Partial indexes, even indexes in general, are implementation details in "theory of rdbms". This theory is more matemathics, based on tuples, sets, and the like. Normally FK wants unique keys as targets, the fact that many dbms force a unique index for these is an implementation detail, you can have a unique constraint by just scanning the table on every insertion / update, it will be slow but will work ( and in some cases, like extremely small tables would even be much better than indexes ). >From what you write your way of operation reminds me of when I worked with COBOL and indexed files, you try to use the indexes, directly, and make the rdbms use them automatically for some ops like you would do in a one of this systems, but that is not the way rdbms work, they like to have a declared structure and decide by themselves what to do. This is nice in that once you write a query you can partition, add indexes, create views, and let the rdbms work out how to do it, but imposes some ( some would say a lot ) constraints in how you put your data in. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How jsonb updates affect GIN indexes
CCing to list to maintain context. On Sat, Feb 27, 2016 at 12:14 PM, Eric Mortensen <e...@appstax.com> wrote: > Thanks Francisco, I had not considered MVCC. If that is true, it would seem > to me that a GIN index would "always" be less efficient, as it potentially > would have to update every key's posting tree if a tuple moves, whereas two > btree indexes would only require modififying two trees. If you are that worried about performance, maybe your data will be best served by splitting those fields out of the jsonb or even splitting the table. As always, measure, identify bottleneck. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How jsonb updates affect GIN indexes
Eric: On Sat, Feb 27, 2016 at 11:46 AM, Eric Mortensen <e...@appstax.com> wrote: > When a jsonb column is updated, as far as I understand the entire column is > updated, even though perhaps only one of the keys has a modified value. Also, bear in mind when a column is updated the tuple is too, and MVCC treats it similarly to a delete+insert, so it may move around ( as the old version may be needed by some transaction, and not enough space may be free on the original location ). and every index reference ( GIN or whatever, to any column ) needs to be updated. I do not know the details, I just know it's a complex decision, someone with more knowledge of the internals may give you a more acurate descriptin if needed. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why Postgres use a little memory on Windows.
On Sat, Feb 20, 2016 at 7:37 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > It looks like the bitmap heap scan generally returns exactly one row for > each outer row, which makes me wonder if the BETWEEN couldn't be replaced > with some sort of equality. Mm, I'm not good reading explains, but that seems to confirm my suspicion that gaps partition the id range in non overlapping ranges. > But that might take some rethinking of the data. If id is a series, gap defines a range, he can do something with an auxiliary table, like select start as a, 0 as b from gaps where status = 'GP' union all select id as a,1 as b from data union all end-1 as a, 2 as b from gaps where status='gp' -- to end-1 to make intervals half open. order by a,b which would give all the ids in a with b=1 surrounded by (0,2) when valid and by (2,0) when invalid. and then, with a creative window clause or a small function, filter that and join with data.id. I suppose adding a third c column, null on b=1 and =b on b=0/2 and selecting the previous non-null in the sequence could do it, but it's somehow above my window-fu, I'm more of a code gouy and would do it with two nested loops on a function. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why Postgres use a little memory on Windows.
On Sat, Feb 20, 2016 at 7:13 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote: . > FROM > sym_data d INNER JOIN sym_data_gap g ON g.status = 'GP' > AND d.data_id BETWEEN g.start_id > AND g.end_id . > The thing that stands out to me is that I do not see that sym_data and > sym_data_gp are actually joined on anything. Yes they are, although the formatting hid it somehow. It is a classic, data_gap defines intervals via start+end id over data, he wants to join every data with the corresponding gap. It is a hard optimization problem without knowing more of the data distributions, maybe the interval types and ginindexes can help him. When faced with this kind of structure, depending on the data distribution, I've solved it via two paralell queries ( gap sorted by start plus end, data sorted by id, sweep them in paralell joining by code, typical tape-update problem, works like a charm for non-overlapping ranges and even for some overlapping ones with a couple of queues ) . And he seems to want all of the data ( sometime this goes faster if you can add a couple of range conditions for data.id / gap.start/end_id. > Also is it possible to see the schema definitions for the two tables? My bet is on somethink like data.id ~serial primary key, gap.start/end_id foreign key to that. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] repeated characters in SQL
On Sun, Jan 24, 2016 at 7:05 PM, <rasha...@gmail.com> wrote: > I guess the escape character (which is not needed in, say, Notepad++) threw > me a bit. Notepad ++ is, AFAIK, an editor, it SHOULD (within reason) let you write any text. The double quote is needed due to the quoting rules of the language. You want the regexp engine to see the characters leftp, dot, rightp, backslash, one. But backslah is the scape character in strings ( in many languages ), so you need to escape it too. The same happens in, for example, C and Java. To put those five chars in a string you need "(.)\\1". If you send "(.)\1" to a C compiler it will build the string leftp, dot, rightp, SOH=(char)(1). It will arrive to the backslash when parsing, see it is followed by a digit less than 8, interpret it as an octal escape, and emit the SOH. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PostgreSQL upgrade 9.3.4 -> 9.3.10
Hello: On Tue, Jan 12, 2016 at 10:28 AM, Dev Kumkar <devdas.kum...@gmail.com> wrote: > I want to upgrade my database from version 9.3.4 to 9.3.10. > > For this task, do I need to upgrade database using pg_upgrade utility? > http://www.postgresql.org/docs/9.3/static/pgupgrade.html No, that is a minor release. Minor version upgrades are normally just install new vesion & restart. In fact we usually install the new version over the old one and just restart the server ( but YMMV, specially if you use one of those operating systems which do not let you delete open files ). Minor (9.3.4->9.3.10) version normally preserve on disk formats, so nothing needs to be done. Mayor ( 9.3.4 -> 9.4.x ) do not, and they normally need either a dump & restore or a pg_upgrade ( which works like an optimized dump & restore ). In any case, read the release notes ( 9.3 docs, at end,http://www.postgresql.org/docs/9.3/static/release.html ) for versions 9.3.5-10, specially the headers, any special instructions should be contained here, when you read 'A dump/restore is not required for those running 9.3.X.' it meand you can just install and restart. > From the details it looks like for minor version upgrade pg_upgrade utility > is not required. > "pg_upgrade (formerly called pg_migrator) allows data stored in PostgreSQL > data files to be upgraded to a later PostgreSQL major version without the > data dump/reload typically required for major version upgrades, e.g. from > 8.4.7 to the current major release of PostgreSQL. It is not required for > minor version upgrades, e.g. from 9.0.1 to 9.0.4." > > Can someone please provide more details here and also what steps needs to be > done to upgrade to 9.3.10 level? When postgres changes on-disk formats it bumps the major version. Normally these used to mean you needed to dump & restore the database, ( Using the NEWER dump/restore utilities, as 9.4 dump/restore know how to deal with a 9.3 DB, but 9.3 D/R do not even know wether a 9.4 version exists ) pg_upgrade was developed because the on-disk format changes are tipically minor, and a special program could be made to transform the data from a version to a later one faster than dumping & restoring, but is more or less equivalent to doing that. As minor version upgrades do not need dump/restore, they do not need pg_upgrade either. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Options for complex materialized views sharing most of the same logic?
On Sun, Dec 27, 2015 at 10:39 PM, Wells Oliver <wells.oli...@gmail.com> wrote: > I have one view which totals about 60 columns per day. Each day has a "days > ago" column like so: . > Each of these views is basically a copy of one another for 99% of the code > (the summing, percentages, etc). The only differences are: > > 1) checks the days_back <= 10 > 2) checks days_back <= 30 > 3) checks days_back <= 60 > 4) does not check days_back > > Is there some easier way for me to maintain the structure of the view > without copying/pasting it 4 times and making one small tweak? I find myself > adding/removing columns to these views and I do it 4 times each time. Without knowing more details I would sugest making number 4 and then defining 1-3 as select * from v4 wher days_back<=. And, for the copy/paste stuff, you can do it easily with an script. I normally would do it in some scripting language but you can even do it in psql with some creative variable usage, just put the head stuf in a var, the tail in another one and compose the sentences using them, simplified example: s=> \set head 'select version() where 2>1' s=> \set tail 'and 3>1' s=> :head :tail; version PostgreSQL 9.3.10 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (Gentoo 4.9.3 p1.4, pie-0.6.4) 4.9.3, 64-bit (1 row) s=> :head and 0>1 :tail ; version - (0 rows) If posible I would try the composite stuff mentioned first, but one of these should be enough, in the second case you still recreate the things, but let the macro processor do the boring stuff. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] How to audit non LDAP connections?
Due to security/audits have moved most users to LDAP. Looking for a way to tell if a connection is/is not going through LDAP. Other than errors, such as bad password, have not found a way to tell if a connection is using LDAP or postgresql internal authentication in the logs. Tried going through different log variables and setting verbosity to max and did not see any indication of LDAP connections. Long term we plan to only allow remote connections through LDAP in pg_hba.conf, but hoping to track down any connection that is not using LDAP to have that converted in a controlled fashion vs flipping the switch and seeing what breaks. Postgres 9.3 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] JSONB performance enhancement for 9.6
Hi: On Sun, Nov 29, 2015 at 1:09 PM, Bill Moran <wmo...@potentialtech.com> wrote: > On Sat, 28 Nov 2015 21:27:51 -0500 >> Currently, if I save a large json document with top level keys of thousands ** LARGE ** > The default storage for a JSONB field is EXTENDED. Switching it to > EXTERNAL will disable compression. You'll have to insert your data over > again, since this change doesn't alter any existing data, but see > if that change improves performance. Good recomendation, but see if it improves AND if it affects other queries in the system. Turning off compresion CAN decrease the eficiency ( hit ratio ) of the shared buffers and the cache, IIRC ( but worth testing anyway ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Convert from hex to string
Hello Yuriy... On Wed, Nov 25, 2015 at 4:47 PM, Yuriy Rusinov <yrusi...@gmail.com> wrote: > I have to transform string, encoded to hexadecimal to text, but if I try to > select encode('Qt is great!', 'hex'); I receive valid and correct results > > 517420697320677265617421 > > but if I try to select decode ('517420697320677265617421', 'hex'), I > receive the same string, such as > '\x517420697320677265617421, which way I have to do for valid convert > to/from hexadecimal ? I seem to recall having answered this a very short time ago, but maybe it was in the spanish list. decode/encode are for converting bytes to a string. You need to convert the string to bytes in a controlled way first ( bear in mind there are implicit conversions ). What you want is, given a text: 1.- Convert it to a bytea, in a controlled encoding: convert_to(string text, dest_encoding name) => bytea 2.- Then encode the bytes in hex: encode(data bytea, format text) => text then, to revert it you: 3.- Decode the hex string to bytes: decode(string text, format text) => bytea 4.- Convert the bytea, in a controlled encoding, to text: convert_from(string bytea, src_encoding name) => text As you see, they are nicelly paired. I see another response which just does encode , decode+convert_from. This works because the database does implicit conversions, but I would not recomend it. I cannot try it because all my databases are UTF-8 but I feel Adrians example would not work if your database encoding is NOT UTF-8 ( and you use any char outside of ascii range ). Look at the docs of the functions, section 9.4 table 9.7 int the 9.4.5 manual. If you do it this way, you can also choose the encoding, ie, if you know your data is latin1 you can convert from/to it and save a few bytes, or you can convert to/from utf8 an insure you can represent anything. Then you can encode/decode the bytes in whatever sutis you, hex, as in yuour eample or base64 if you need to save a few bytes. Types are there for a reason. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Convert from hex to string
On Wed, Nov 25, 2015 at 6:22 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote: >> 1.- Convert it to a bytea, in a controlled encoding: convert_to(string >> text, dest_encoding name) => bytea >> 2.- Then encode the bytes in hex: encode(data bytea, format text) => text >> >> then, to revert it you: >> >> 3.- Decode the hex string to bytes: decode(string text, format text) => >> bytea > > > Can't this be shortened to: > > aklaver@test=> select convert_to( 'é', 'latin1'); > convert_to > > \xe9 > (1 row) > > aklaver@test=> select convert_from( '\xe9', 'latin1'); > convert_from > -- > é > > since convert_to() outputs bytea and convert_from() accepts bytea? Yes, but he originally asked to convert to hex, I assumed a hex string. Note you are using hex because this is th default encoding for bytea output in the database, as the database needs to convert everything to text to send it to psql to display it and psql sends text which the database needs to convert to operate. But if you are using something like, say, java, you would need to bind convert_to output and convert_from input to a byte[] ( although maybe jdbc is sending/receivine hex strings in the client encoding, the wire protocol is transparent to you ), which you can then print however you like, if you want to bind String with hex encoded data you nide the encode/decode steps. >> As you see, they are nicelly paired. I see another response which just >> does encode , decode+convert_from. This works because the database >> does implicit conversions, but I would not recomend it. I cannot try >> it because all my databases are UTF-8 but I feel Adrians example would >> not work if your database encoding is NOT UTF-8 ( and you use any char >> outside of ascii range ). > > If you are doing all this in the same database I am not sure how the above > applies? You explicitly used convert_from with UTF8, if the database was latin 1 or ebcdic you would have an encoding mismatch, as the text output routines will convert the input text to bytea using that encoding. > Would you not just use the database encoding for the src_encoding? I do not quite understand the question, if you want your encode-decode-convert_from towork, yes, you use the database encoding in convert from, but I did not see a 'show client encoding' or similar thing in your sample. Anyway, I was assuming the hex conversion was needed for something more complex than just pasting the data, for that anything can go, including a \g | perl -pe 'unpack H*'. So I tried to show how the data flows without relying on any implicit conversion, the convert_to+encode => decode+convert_from works in any client encoding, even in a thing like ebcdic. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Convert from hex to string
Mail pingpong day. ;-) On Wed, Nov 25, 2015 at 7:27 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote: > postgres@latin1_db=# \encoding > UTF8 This does not matter, as you are sending/receiving hex data, and the encoding done when sending query results to you gets reverted when you send queries back to the server. > postgres@latin1_db=# select convert_from('\xe9', 'latin1'); > convert_from > -- > é > (1 row) This DOES matter, you've had to change the query for it to work, you had to look up ( or know beforehand ) the database encoding and change it accordingly. . I mean, when you do encode('text',) what you are really asking for is encode( implicit_text_to_bytea_conversion(text), ), so you save one step, but them you have to change your query to the 'latin1' encoding you discovered. This is what I meant, you had to look at the database properties. But if you do an explicit convert_form with any encoding capable of representing all your data, like utf-8 which can represent anything, the database encoding does not matter. And it should not, proper code should work with any database encoding. Even more, I can do encode(convert_to(utf8)) in a latin1 database conecting with any encoding, then send the hex to convert_from(decode(),utf8) to an ebcdic database use another encoding ( Of course I may need to transcode the hex, but not to the ebcdic, but to the client encoding been used in teh second case ), ant it would work as long as all the characters In the source data are representable in the destination database ( if they are not you are out luck in any scheme you may think off ). You cannot encode generically an string to hex unless you define an encoding. How do you encode '1', "31" or "F1" ? or maybe "0031" or "3100"? You can do a quick hack, but normally what you want is first to encode a sequence of characters to a sequence of bytes and then hex-encode that, as nearly everybody uses the same conversion for hex-encoding a byte sequence. This means you can have a '0' in a ebcdic database, transform it to to [0x30] byte array, encode this as "30" and then transform the later to 00 30 00 10 because you are using UTF16-BE wire encoding. Encoding is tricky enough without relying on implicit convertion or on a character being the same as a byte. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Convert from hex to string
Hi Adrian: On Wed, Nov 25, 2015 at 9:33 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote: > I will grant you that working with encodings is like working with > timestamps, explicit is better. The thing I am having a problem with is how > not knowing the context of the bytea value is different in the implicit vs > explicit case: > Granted the above is contrived and bound to fail, but the point is you need > to know what created the bytea however it got there. Now if you are in > charge of both ends of the process, then the above is your own fault. > Otherwise, you are down to detective work on what encoding was used whether > it was implicit or explicit. As the OP was working in a single context I am > not seeing the issue in making use of that context to do the heavy lifting. > For the use cases that you show I agree that a defined > convert_to/encode/decode/convert_from chain is a best practice and something > I had not really thought out, so thanks. Well, I tend to think as a programmer. So I was thinking of the scenario where you want to get some text from the database and move it around, and you control all the code. What I was trying to point is that using explicit all the character sets that matter are in the database, so I can move it around freely, as I'm the one defining the queries, while in the implicit case I have to know, or get from the catalogs, the encoding of the database. I do not know what context the OP was working and wanted to point he was mixing types. Postgres has a lot of them, specially to/from text, and I've found the hard way that lots of implicit conversions are great for one shot programs or interactive tests, but relying on implicit type conversions for real production code, put in a source, causes a lot of problems. In a single session case you can even use implicit conversion + encode and then paste the result into a convert_from adding quotes and x and it's going to work, but if you write down that in code you are going to be confused if something fails later, things like: cdrs=> select encode('Año','hex'); encode -- 41c3b16f cdrs=> select convert_from('\x41c3b16f', 'UTF-8'); convert_from -- Año (1 row) seem like you are converting back and forth, but then: cdrs=> select convert_from(encode('Año','hex'), 'UTF-8'); ERROR: function convert_from(text, unknown) does not exist LINE 1: select convert_from(encode('Año','hex'), 'UTF-8'); And also, the encode() example will give different results depending on database encoding. Using explicit charsets and correct types frees me from it. After all, to make the full round trip a covert-to+encode+decode+convert-from is needed, and making it explciti makes things easier. After all, not everybody knows that all values in the database pass through a conversion to/from text to be sent to / received from psql, and that you can use it if you just want the hex digits in the current database encoding: cdrs=> select 'Año'::bytea; bytea \x41c3b16f (1 row) But I doubt using cast instead of encode can be recomended. And for the heavy lifting, you are just saving some keystrokes, which IMO is a cheap price to pay for having a nearly self documenting conversion chain. Anyway, I think people should be told to respect types, and people should be teach that strings are sequences of characters, that to do what people think is 'hex encoding' ( two digits per *byte*, no delimiter ), you need to first transform the string to bytes, then hex-encode that. Nearly all the encoding problems I see is because people thing strings are sequences of bytes, which they ceased to be when multibyte encodings where detected ( and even without them, in Java and I think some of the windows NT API Strings where sequences of 16 bits thingos, first UCS-2. then UTF16. Once people stop trying to encode/decode strings directly normally they problems vanish. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PostgreSQL Timezone and Brazilian DST
On Wed, Oct 28, 2015 at 6:43 AM, Edson Richter <edsonrich...@hotmail.com> wrote: > Configured since first setup. Brazil went DST few days ago. > Today, executing > Select now() > Resulted in > > 2015-10-28 02:45:37-03:00 > > I do expect > > 2015-10-28 03:45:37-02:00 > > I suspect that Postgres is ignoring the DST change. Have you checked the timezone definitions are correct in the server machine? I.e., doing something like this on it: $ TZ="America/Sao_Paulo" date -R Wed, 28 Oct 2015 10:09:19 -0200 $ TZ="America/Sao_Paulo" date -R -d '1 month ago' Mon, 28 Sep 2015 09:09:55 -0300 ( I'm not familiar with your distro, but I got bitten by one of those soem years ago, incorrect timezone definitions ) Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes?
Hi David: On Sun, Oct 18, 2015 at 7:49 PM, David G. Johnston <david.g.johns...@gmail.com> wrote: > Other implementation of regular expressions handle "newline" mechanics > related to "^" and "$" semantically instead of literally. By that I mean > that both "\r\n" and "\n" are considered "newlines" instead of just "\n". Which ones ? AFAIK this kind of thing is usually done by C ( and related ) runtimes when reading text files. At least in my machine perl does not do it: censored:~$ perl -e 'print( ("A\r\n" =~ /A$/) ? "matched\n" : "NO MATCH\n");' NO MATCH censored:~$ perl -e 'print( ("A\r\n" =~ /A.$/) ? "matched\n" : "NO MATCH\n");' matched censored:~$ perl -e 'print( ("A\r\n" =~ /A\s$/) ? "matched\n" : "NO MATCH\n");' matched Normally when reading lines in CP/M and related ( MSDOS, Windows ) the CRT does collapse them ( and sometimes just zaps \r, or collapse any run, or consider [\r*]\n[\r*] or ). But I normally do not see that behaviour in regexes. > If changing behavior is not desirable I would be content with another flag > that would toggle such behavior. > In code - both of these subqueries should match whereas presently only the > first one does. > SELECT regexp_matches(E'123\n', E'123$', 'w'); > SELECT regexp_matches(E'123\r\n', E'123$', 'w'); > I don't know if this is server O/S dependent...but I would not expect it to > be so. Neither do I ( expect it to be os dep. ) , but I find the current behaviour correct. I mean, newline stuff is OS dependent, and you should convert when ingesting data, when matching them it should already have been converted to whatever the language uses for newlines ( in C and perl that means \n, which needs not be \012, BTW . In unix \n=\012 on disk, on CP/M it's \015\012 and when I worked with Mac ( before the unixy osX they use now ) it was \015, and I cannot think on what they can use on EBCDIC machines ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] question
Hi Anj: On Thu, Oct 15, 2015 at 10:35 PM, anj patnaik <patn...@gmail.com> wrote: > > I will experiment with -Fc (custom). The file is already growing very > large. > I do not recall if you've already provided them, but, how large? I mean, if you have a large database, backup will take time and ocupy space, you may be approaching. As a benchmark, for intellectual satisfaction, the smallest backup you can get is probably text format and then compress with the more agressive option of your favorite compressor, but this is normally useless except for very special cases. My recomendation will be to use plain Fc for a backup, this is what I do. Sometimes tweaking the -Z after tests, but normally in my experience the default level is right. bear in mind DB disk tend to be expensive, backup disks can be much cheaper and, unless you are keeping a lot of them, backups are smaller. As an example, we have a server pair ( replicated ), with a couple short stroked fast disks for the database and a couple 'normal' disks for first line backup in each one. Normal disks are about ten times database disks, and easily fit 30 backups, so we can backup to one of them, copy to the seconds, and replicate to the other in the server pair, just using Fc. This because backup compress indexes quite well, by reducing them to a 'CREATE INDEX', and the copy format used inside is generally more compact than the layout used on disk ( which needs free space, is framed and lot of other things ) and compresses quite well too. If you are pressed for backup size, you normally have very special needs or do not have a properly dimensioned system. But, to say anything more you will need to provide some numbers ( how big is your database and backups, how fast are you disks and things like this. In this case maybe hints can be provided. > > I am running this: > ./pg_dump -t RECORDER -Fc postgres | gzip > /tmp/dump > In this case gzip is useless. -Fc already uses gzip compression at the member level. Doing it with -Z0 and then gzipping will gain you a bit, obvously, as it will compress everything as a single chunk ( except if you manage to hit a pathological case ), but I doubt it will be significant . As pointed in other places you can use Fc+Z0 and then compress with a 'better' compresor you may get a smaller file, or get it faster, but remember you'll need to decompress it before restoring ( this does not happen for text format, as you can do stream restore, but the restore options for text format are limited, it's an all or nothing approach unless you are really fluent in stream editors ). Francisco Olarte.
Re: [GENERAL] question
Hi Anj: On Sat, Oct 17, 2015 at 3:11 AM, anj patnaik <patn...@gmail.com> wrote: > My question is for Francisco who replied regarding xz. I was curious what > options he used. Thanks. 1st, we do not normally top post on this list. Second, I do not remember the exact options I use. I can look them up, but they are going to be surely useless ( they are for a custom process with does several things with files, it uses gzip ( in Fc backups ), plus xz ( for some files which need to be kept for a long time and are nearly never needed ), plus lzo ( as I found lzo compressed temporary files were faster than uncompressed ones ), and a lot of code. But in the development process we did a full comparison of several compressor, and found what I stated with bzip2, it was surpassed in every combination of options by xz ( plain bzip2, plain xz ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] question
On Fri, Oct 16, 2015 at 8:27 AM, Guillaume Lelarge <guilla...@lelarge.info> wrote: > 2015-10-15 23:05 GMT+02:00 Adrian Klaver <adrian.kla...@aklaver.com>: >> On 10/15/2015 01:35 PM, anj patnaik wrote: ... >>> ./pg_dump -t RECORDER -Fc postgres | gzip > /tmp/dump >>> Are there any other options for large tables to run faster and occupy >>> less disk space? >> Yes, do not double compress. -Fc already compresses the file. > Right. But I'd say "use custom format but do not compress with pg_dump". Use > the -Z0 option to disable compression, and use an external multi-threaded > tool such as pigz or pbzip2 to get faster and better compression. Actually I would not recommend that, unless you are making a long term or offsite copy. Doing it means you need to decompress the dump before restoring or even testing it ( via i.e., pg_restore > /dev/null ). And if you are pressed on disk space you may corner yourself using that on a situation where you do NOT have enough disk space for an uncompressed dump. Given you normally are nervous enough when restoring, for normal operations I think built in compression is better. Also, I'm not current with the compressor Fc uses, I think it still is gzip, which is not that bad and is normally quite fast ( In fact I do not use that 'pbzip2', but I did some tests about a year ago and I found bzip2 was beaten by xz quite easily ( That means on every level of bzip2 one of the levels of xz beat it in BOTH size & time, that was for my data, YMMV ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Best way to sync table DML between databases
On 10/05/2015 09:46 AM, jimbosworth wrote: Im not in a position to change the database setup on server A. Can you have the owners/maintainers do the needed changes to setup replication? Or that is 100% out of the question? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Test failover for PosgreSql 9.2
On 09/25/2015 11:20 AM, yuryu wrote: According to manual I have to kill completely Master and "touch" a trigger to make Slave new Master. You don't have to do anything in the master. If you have configured the slave to check for a file, then it will become Read Write when that file is created. You can also do pg_ctlcluster #.# main promote Where #.# is version like pg_ctlcluster 9.3 main promote In the slave you can run this to check if it is in read only (replicating) or read write select pg_is_in_recovery(); -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Dropped connections with pg_basebackup
On 09/24/2015 04:29 PM, Sherrylyn Branchaw wrote: I'm assuming based on the "SSL error" that you have ssl set to 'on'. What's your ssl_renegotiation_limit? The default is 512MB, but setting it to 0 has solved problems for a number of people on this list, including myself. I have also seen instances were ssl_renegotiation_limit=0 helped and I already tried that. Did not help in this case. Perhaps will try some tests with a non SSL connection. These are machines in an internal network so it may not be too much a security issue to turn off SSL at least during initial sync. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Dropped connections with pg_basebackup
On 09/24/2015 04:34 PM, Alvaro Herrera wrote: Sherrylyn Branchaw wrote: Moreover, the default has been set to 0, because the bugs both in our usage and in OpenSSL code itself seem never to end. Just disable it. Set it to 0 and did not help. Likely will move all machines to have it =0 since I have seen some SSL errors in logs. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Fwd: [GENERAL] Convert number to string
Forgot replying to all, forwarding to the list, sorree for the potential dupes. Hi Hengky: On Thu, Sep 24, 2015 at 10:09 AM, Hengky Liwandouw <hengkyliwand...@gmail.com> wrote: > > I don’t have any idea how can I use postgres function to convert series of > number (currency) to my own string. > > > > My string : F G H I J K L M N as the replacement for number : 1 2 3 > 4 5 6 7 8 9 > > Dozens = O > > Hundreds = P > > Thousands = C > > Ten thousands = B > > > > So… > > 200 = GP > > 2000 = GC > > 1150 = FFJO > > 3 = HB > > > > Any idea ? Your example es severely unspecified, how do you translate 1024? and 1002? and 10? given the numbers in the example you can use a simple trick. 1st replace using O for 0 via translate ( I'm not using capitals in teh exmaple for better 0-o contrast ): # with data(s) as (values('200'),('2000'),('1150'),('3')) select translate(s,'0123456789','ofghijklmn') from data; translate --- goo gooo ffjo h (4 rows) then replace sequences of 'o' starting with the longer ones: # with source(s) as (values('200'),('2000'),('1150'),('3')) , step1(s) as (select translate(s,'0123456789','ofghijklmn') from source) , step2(s) as (select replace(s,'','b') from step1) , step3(s) as (select replace(s,'ooo','c') from step2) , step4(s) as (select replace(s,'oo','p') from step3) select * from step4; s -- gp gc ffjo hb (4 rows) clasical trick. But, as I said above, you need to specify it much better. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Convert number to string
HI Melvin: On Thu, Sep 24, 2015 at 2:51 PM, Melvin Davidson <melvin6...@gmail.com> wrote: > Postgresql has a translate function: > Did you read the ( quoted at the bottom of the reply you sent ) message you were replying to? It contained a working sample using exactly this translate. ;-> > > You would then need a little further processing to determine the tens, > hundreds, etc. > > I'll leave that to you to work out, but additional functions of > strpos(string, substring) > substr(string, from [, count]) > length(string) > Do not forget replace, shorter, easier. Francisco Olarte.
[GENERAL] Dropped connections with pg_basebackup
Have an existing setup of 9.3 servers. Replication has been rock solid, but recently the circuits between data centers were upgraded and pg_basebackup now seems to fail often when setting up streaming replication. What used to take 10+ hours now only took 68 minutes, but had to do many retries. Many attempts fail within minutes while others go to 90% or higher and then drop. The reason we are doing a sync is because we have to swap data centers every so often for compliance. So I had to swap master and slave. Calling pg_basebackup like this: pg_basebackup -P -R -X s -h -D -U replicator The error we keep having is: Sep 23 13:36:32 postgres[16804]: [11-1] 2015-09-23 13:36:32 EDT [unknown] replicator LOG: SSL error: bad write retry Sep 23 13:36:32 postgres[16804]: [12-1] 2015-09-23 13:36:32 EDT [unknown] replicator LOG: SSL error: bad write retry Sep 23 13:36:32 postgres[16804]: [13-1] 2015-09-23 13:36:32 EDT [unknown] replicator FATAL: connection to client lost Sep 23 13:36:32 postgres[16972]: [9-1] 2015-09-23 13:36:32 EDT [unknown] replicator LOG: could not receive data from client: Connection reset by peer I have been working with the network team and we have even been actively monitoring the line, and running ping, as the replication is setup. At the point the connection reset by peer error happens, we don't see any issue with the network and ping doesn't show an issue at that point in time. The issue also happened on another set of machines and likewise, had to retry many times before pg_basebackup would do the initial sync. Once the initial sync is set, replication is fine. I tried both "-X s" (stream) and "-X f" (fetch) and both fail often. Any ideas what may be going on? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table using more disk space than expected
Hi Steve: On Wed, Sep 23, 2015 at 7:25 PM, Steve Pritchard <steve.pritch...@bto.org> wrote: > I thought that autovacuum should recover the free space, however I see now > from the documentation that it doesn't (and that this is deliberate): > I'll do a VACUUM FULL, which I expect to reduce table_len. Full is for when you've done a drastic reduction on a table. Some tables oscillate in size, grow and shrink and regrow.., those do not benefit of vacuum full on the long run, because if you have a table which oscilates between , let's say, 10G and 20G you need 20G of disk space, if you shrink and fill the rest with other uses server will crash on next growth ( some very special cases may be different, but in general if you have free space is because you create/delete, be it directly or via MVCC updates, so having it there for next usage is not so bad ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] GIN Trigram Index Size
Hi Christian: On Mon, Sep 14, 2015 at 1:54 PM, Christian Ramseyer <r...@networkz.ch> wrote: > I agree with your append-only and disposable partition approach, it > would work a lot better. The idea with using a second schema for > selective backups is great, I'll totally steal this :) Feel free. Just remember if you use redirection trigers/rules for insert into the partitions it may lead to problems ( I do not normally have them as I either use a partition aware dedicated inserter or zap the archived tables from rules, as I only insert for the current date, maintenance updates are done directly in the partitions ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] postgresql-ctl systemd failed: permission denied
Hi Arnaud... On Tue, Jul 28, 2015 at 12:13 PM, arnaud gaboury arnaud.gabo...@gmail.com wrote: On Fedora 22 ... Jul 28 11:49:57 poppy postgresql-ctl[307]: FATAL: 42501: could not open log file /storage/log/postgresql/postgresql-Tue.log: Permission denied Jul 28 11:49:57 poppy postgresql-ctl[307]: LOCATION: logfile_open, syslogger.c:1160 Jul 28 11:50:01 poppy postgresql-ctl[307]: pg_ctl: could not start server % ls -al /storage/log drwxr-xr-x 1 postgres postgres0 Jul 28 11:29 postgresql/ May be stating the obvious, but can you try 'touch /storage/log/postgresql/postgresql-Tue.log as postgres to see it it works. Also, doing the ls -al on /storage/log/postgresql, in case it has a previous read only file for Tue, and may be even a lsattr on the paths ( listing the fs type will be useful too, maybe Fedora is using some extended thingies ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] INSERT ... ON CONFLICT DO UPDATE
Hi Rafal: On Tue, Jul 21, 2015 at 12:43 PM, Rafal Pietrak ra...@ztk-rp.eu wrote: W dniu 21.07.2015 o 09:34, Francisco Olarte pisze: In this case I think you are mixing vouchers with voucher-numbers. IMO you could get a better dessign by using an auxiliary table and not nullifying the number after been consumed. Having only 6 digits I Hmmm. I don't think so. The application just needs to create a unique (for a period of time) number, and consume it at certain point. Everything else would be implementation burden, which should be kept to minimum. I see your points, totally opposite opinions, so no point in discussing it, discard my sugestions as not aplicable SVP. This looks a bit like an overkill for the above examples. It certainly is for your style of dessign, workng target, discard it. But I have other thoughts on the use of cryptographic sequences here. I wouldn't call it that, its misleading. It's just encrypted sequences. It has the pitfall of being sensitive to out-of-the-sequence poisoning, I mean: When another instance of an application starts issuing another sequence of vouchers, at certain point those sequences collide and applications despite using guaranteed lack of collisions will have a collision. Well, if you have aplication instance specific sequences, of course you have. But in this case even plain unencrypted sequences hae them. So the application *will have to have* a re-issuing of an INSERT implemented anyway. Of course, because the only point of using instance specific sequences instead of serial like you normally do must be having the possibility of collisions to justify a the existence of a re-issuing code and exercise it. If so, the whole point of using cryptographic sequence is missing. No. The whole point of using a global sequence ( in the db ) is avoiding collisions, be it encrypted or plain. The whole point of using crypto is to make it look random. If you use an application specific cryptographic sequence is because you want colisions ( success, as told above ) which looks random ( success too ). If you do not want colisions, use a global sequence. So, even though this collision is not statistically significant, but just its possibility results in that application have to take care of re-issuing of an INSERT. I use to tell people there are three meaninful cardinalities in computing, zero, one and many. And three probabilities ( NOT possibilities ), zero, one and other==(0,1). Except in some lucky domains you have to trat every (0,1) probability as been possible ( in fact my three probability values map nicely to impossible, possible and certain ). Using database.sequence() function to seed the cypher is not secure enough. What are you talking about? Where did you get that seeding idea? You do not seed the cipher, you use the ciphered sequence as voucher. In fact I've done this with session ids. I use a sequence for the ID and send the user the ciphered val. When it comes back I just decipher it and search. I did not have your 6-digit problems, so I just used 128 bit blocks, and it worked nicely. And I did not have any ciphered data in the DB. On the other hand, the ON CONFLICT RETRY has a nice feature for an application programmer (like myself) that it leaves us free of the implementation of the re-issue of an INSERT. One database-schema designer does that for all of us. But knowing if that usage scenario is too rare to match the heavy lifting the implementation required, is beyond my experience. Saying OCR is a nice feature is like saying MAGIC RAINBOW OVERDRIVE is a nice feature for a car. It does not exist, and nobody has described it with enough detail so people can assses its usefulness or implementation difficulty. A careful definition of the corner case will be needed. And even with it you have the possibility of inifinite colisions ( either due to generating too many 'vouchers' or to sheer bad luck ( like collisions among application instances ). If you try to write a nicely wrapped up description of the functionality maybe someone could see the usefulness and implement it, but I think this is possible but unlikely. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] INSERT ... ON CONFLICT DO UPDATE
Hi Rafal: On Mon, Jul 20, 2015 at 3:33 PM, Rafal Pietrak ra...@ztk-rp.eu wrote: 3. there are methods (like cryptographic random sequence), which guarantee no conflicts. So one should resort to that. Regarding the last point. Usually, I implement one-time used vouchers as rows in table like: CREATE TABLE (voucher int not null, consumed bool, expire timestamp not null default timestamp_pl_interval(now()::timestamp, '2 min'::interval),..., unique (voucher,consumed) ); with CONSUMED column NULLyfied when voucher is used. The entire row of consumed voucher is purged after clearence and verification, which happen significantly later. Random as a primary key is considered a bad practice by many people with much experience, nullyfing it too. Many people even frown on just changing the primary key ( and one of the reasons for using serial as keys in many situations is to have a guaranteed not null unchanging value ). Such short lived (when active) voucher is usually just 6-digit long, to help people enter it. Then, random and with a narrow value domain, they make, IMNSHO, a really bad choice for primery keys. I don't know much about cryptography, but would a generic encryption function (like that indicated by Daniel) have the same waking through the entire range-space behavior as the original when that range-space is externally (by my application) truncated to those 6 digits? If not, would it be as efficient in conflict avoidance as used with original 32-bit range-space? An encryption function never has collisions ( do not confuse with a hash ). If it had you would be unable to decrypt it. The problem is the value domain for you. i.e., for your example you could choose a bit stream cipher applied to a 20 bit value. This is a moderately complex prolem to find or build ( from the classic cryptographic primitives nearly every language includes ). This will map every different 20 bit input value to a different 20 bit output value, so your value domain will be 20 bit numbers, your inputs will be the 10^6 6 digit numbers and the outputs will be 10^6 DIFFERENT 20bit numbers, of wich you could expect about 4.8% of them ( 2^20-10^6)/10^6 to have 7 digits ( with a leading one in this case ). To solve that problem you could use 19 digit input/output numbers or try to fin a decimal cypher which uses exactly 10^6 input digits. If you use a 32 bit block cypher it will not have collisions, but if you TRUNCATE the 32 bit ~ 9.5 digits output to 6 digits, you are no longer encrypting. You may call it hashing or whatever, but that is NOTt encryption, you would have collisions. Then again. Is it really a good practice to rely on a programmer to peek proper/correct encryption helper instead of providing him/her with a database-integrated tool for a well defined and not so rare usage scenario as random default for UNIQUE/PK column? Many of us are too old to get caught by this. This question is like asking Is it good practice to hit a person with a 10 pound hammer in the head instead of giving a cookie?. There are other options. IMO NOT modifying a very complex chunk of code ( the server code doing the inserts and checking the collision cases and acting on them, plus the parser for insert queries plus ) and risking all the bugs it may introduce to help with inserting random pk is good practice. It doesn't matter if the requesting programmer peeks a bad encryption methods, keeps his old code for inserting random ids or introduces bugs in his program, the potential harm to the rest of the users is too great. So my conclusion from this thread is that as this usage scenario does not seem to be foreseen by current implementation of ON CONFLICT transaction, a workaround exists (like: cryptographic range-walker). Being it a workaround, I'd vote for some direct supported of that scenario in the future at database level. Bear in mind your problem is totally ill defined. I mean, you want to insert a random 6 digits ID, and want the database to keep trying until it finds an unique one. What should it do if there already are 10^6 records in the db? Stall until a free one is found? abort? This kind of uses is very narrow, and very difficult to get right , and normally confined to the application domain. Even if you choose a totally correct encryption function for collision avoidance, like identity, you are going to have problems in your scheme. You are not voting for anything, you need a feature proposal to vote upon. So far the only one I could extract from this thread is something which magically solves the current Rafal problem. I would vote against that. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] INSERT ... ON CONFLICT DO UPDATE
Hi Igor: On Mon, Jul 20, 2015 at 4:56 PM, Igor Neyman iney...@perceptron.com wrote: Well, there is a caveat. If I create table and couple indexes like this: .. and populate them: and then check the size of the indexes: for select pg_relation_size('U1') I get 2834432 while select pg_relation_size('U2') returns 2285568. So, index based on randomly populated column is bigger than the one based on sequentially populated. But, on the other hand, after: reindex table test_index_size; both indexes are of the same size: 2260992. I would totally expect this. On reindex you get the values from a tree walk, so both of them come in order, and being a reindex ( where you know in advance the full set of values, so you can plan ahead where to put the leaves, how many levels you need and how many splits ) you get an even bigger advantage from the squential insertion case. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] INSERT ... ON CONFLICT DO UPDATE
Hi Alvaro. On Mon, Jul 20, 2015 at 4:07 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Some web research suggests that random sequences are not great for indexes because of the resultant keyspace fragmentation. I'm assuming that means a low number of nodes in the btree leafs, so an increase in memory usage for the index? Not sure what type of indexes would be affected by that problem, but I don't think Postgres' btrees would be. I do not know if postgres btrees do it, but I know you can build btree inserting code in such a way that inserting nodes sequentially leads to optimally filled leaf pages an denser trees, as an optimization for an easy and common case, which are better than the normal ones generated by random insertion. So is not that random are bad, it is that ordered are very good, or in another way thay are not affected by a problem, but do not get the advantage. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] INSERT ... ON CONFLICT DO UPDATE
Hi Rafal: On Mon, Jul 20, 2015 at 3:33 PM, Rafal Pietrak ra...@ztk-rp.eu wrote: Regarding the last point. Usually, I implement one-time used vouchers as rows in table like: CREATE TABLE (voucher int not null, consumed bool, expire timestamp not null default timestamp_pl_interval(now()::timestamp, '2 min'::interval),..., unique (voucher,consumed) ); with CONSUMED column NULLyfied when voucher is used. The entire row of consumed voucher is purged after clearence and verification, which happen significantly later. Such short lived (when active) voucher is usually just 6-digit long, to help people enter it. In this case I think you are mixing vouchers with voucher-numbers. IMO you could get a better dessign by using an auxiliary table and not nullifying the number after been consumed. Having only 6 digits I tould try: 1.- Add a serial PK column to voucher table if needed to link it with the rest of the system. 2.- Create an index on voucher where consumed is true. 3.- Add another table, voucher_nums, with columns voucher, order, used. Populate it with the 10^6 vouchers and a random order value. Also, this lets you switch to alphanumeric vouchers, or zap the ones with two consecutive equal digits, or whatever. 4.- Make a function to select a free voucher, you can do 'select from voucher_nums where not used order by order limit 1¡', if yout put this into a with clause of an update-returning setting used to true to you get a one shot way of getting a free voucher. If you add a partial index on order where not used, you get a fast way of getting it. 5.- Make another function to free a voucher num, which sets consumed to true on vouchers, used to false and order to a random number on voucher_nums. This way you keep the old voucher numbers, and you get no collisions. If you run for some years, you can see which vouchers have been used, so you can debug potential problems. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] INSERT ... ON CONFLICT DO UPDATE
Hi Daniel: On Sun, Jul 19, 2015 at 9:03 PM, Daniel Verite dan...@manitou-mail.org wrote: For SERIAL, it's too obvious to guess what is the next one, so malicious people could claim access codes or vouchers they don't own. Why don't you use encryption? Specifically only on the external side. You use a serial in the DB and send the encrypted serial as voucher code ( this way you do not need to have database resident encryption ). Then when you receive the code in the program you decrypt it and are done. And having serial in the DB can be good for your internal operations. Encryption, reversible and colision free, not hashing. The constraint is that such codes must be reasonably short, but someone who tries to make up one must have a near-zero chance of guessing one that actually exists. If you can live with a little longer voucher ( it seems you use 10^9 in random ), you can use 2^32, which is just 9.5 digits, and search for a 32 bit block cipher ( or build it yourself, it's not that hard using stream ciphers or other tools ). I also thinks random UUIDs are not ok, not because they are long but because they are random, and can collide ( encrypted serials can not ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to test SSL cert from CA?
On 07/11/2015 07:32 PM, James Cloos wrote: FR == Francisco Reyes li...@natserv.net writes: Did you include the intermediate cert(s) in the bundle which the server presents to the client? Yes. And did you confirm that the client trusts the issuer's root? Some require explicit configurastion of that. The client in this case is a program a client of mine runs. I don't have access to the program -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Bounded Zone Offset Query
Hi Robert: On Fri, Jul 10, 2015 at 6:55 PM, Robert DiFalco robert.difa...@gmail.com wrote: I want to do a query from an application that returns all devices who's time is between 10am or 10pm for a given instant in time. . Thanks John, let me revise my original query to give a CORRECT and working example, maybe this will help. I've created a query that actually works, it's just ugly and I'd like to figure out how to make it like the example you gave (i.e. no math on the utc_offset field, just comparisons). select * from devices d where (now() at time zone 'UTC' + make_interval(hours := d.utc_offset))::time BETWEEN time '10:00' AND time '22:00'; It seems you have hours in utc-offset, and then you work in seconds in the day. Your problem to turn it around is the (now+offset) cast to time, which, working in hours only, is more or less equivalent to (hours_now+hours_offset+24) mod 24, can materialize into two distinct utc_offset intervals. I'll try to make an example. If utc-now is 2:00, and utc-offset goes from -12 to +12, you need to select -12 to -4 ( which added to 2 are -10 to -2 or 14 - 22) and 8 to 12 ( which translate to 10-14). You are going to have this problem even if you use utc-offset 0-24 ( in this case utc-now=2 works, you would use 8 to 20 for utc-offset, but if utc-now where 20:00 you would need to query for utc-offset in 0-2 and 14-24 ( which translate to 20-22 and 34-44=10-20 ). If I were to do this, I would make the host code emit the appropiate ranges ( either one or two ) as all my machines are NTP synced, so now() is the same as time(). If You really need the database timestamps you can do a small function and do it there. If the db is properly analyzed and indexed on utc-time you can even emit the dual ranges always. Something like this ( I'll use some placeholders ): secs_now=utc seconds of now, you can get this casting to times and doing some fancy stuff, I've tested this: # select extract('epoch' from (cast(now() at time zone 'UTC' as time)-'00:00'::time)) as secs_now; secs_now -- 33741.008092 (1 row) Now, you can use seconds instead of 10:00, 20:00 easily, either use the above or just multiply by 3600 if your original values are seconds, that would be 10*3600, 22*3600, let's call them lo_secs and hi_secs. Given there are 86400=24*3600 secs per day , and assuming utc-offset is in the range -12..24 ( to cover everything ) you need to select: You want to select ( utc_offset*3600 + secs_now + 86400 ) % 86400 between lo_secs and hi_secs. The +86400 is to bring everything to possitive ( I never do modulus on negative, behaviour depends on languages ). Anyway. Utc_offset*3600 is in the range -43200, 86400, secs_now is in 0-86400, so with the addition the sum range is in 43200-86400*3, so you can get rid of the modulus operation by doing: ( utc_offset*3600 + secs_now + 86400 ) - 0 * 86400 between lo_secs and hi_secs or ( utc_offset*3600 + secs_now + 86400 ) - 1 * 86400 between lo_secs and hi_secs or ( utc_offset*3600 + secs_now + 86400 ) - 2 * 86400 between lo_secs and hi_secs Which you can then simplify and turn into conditions on utc-offset: ( utc_offset*3600 + secs_now + 86400 ) between lo_secs and hi_secs or ( utc_offset*3600 + secs_now ) between lo_secs and hi_secs or ( utc_offset*3600 + secs_now - 86400 ) between lo_secs and hi_secs ... utc_offsets between (lo_secs-secs_now-86400)/3600 and (hi_secs-secs_now-86400)/3600 or utc_offsets between (lo_secs-secs_now)/3600 and (hi_secs-secs_now)/3600 or utc_offsets between (lo_secs-secs_now+86400)/3600 and (hi_secs-secs_now+86400)/3600 Its a little tricky, but can be easily packed into a procedure, and if the db is properly set up it can get rid of the empty ranges with just a simple index check. Also, if you start with lo-hours and hi-hours, you can make it more pallatable by calculating hours-now instead of secs-now by dividing by 3600 and then you have utc_offsets between (lo_hours-hours_now-24) and (hi_hours-hours_now-24) or utc_offsets between (lo_hours-hours_now) and (hi_hours-hours_now) or utc_offsets between (lo_hours-hours_now+24) and (hi_hours-hours_now+24) (note, if your utc-offset range is just 24hr instead of 36hr like I've assumed you can get by with just two conditions, but they are a bit harder to obtain as you will need to massage the range ( been there, done this, just not in sql, also, if you are in procedure and precalculate the ranges for the three between and you know your utc-offsets ranges you will notice int the 24 hrs range case one of the three can always be proved empty ( i.e., if your ranges are -12 to 12, one of the three betweens will have hi-limit less than -12 or low limit greater than 12 ) Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to test SSL cert from CA?
On 07/09/2015 03:07 PM, Vick Khera wrote: On Wed, Jul 8, 2015 at 10:17 PM, Francisco Reyes li...@natserv.net mailto:li...@natserv.net wrote: openssl s_client -connect HOST:PORT -CAfile /path/to/CA.pem According to this post: http://serverfault.com/questions/79876/connecting-to-postgresql-with-ssl-using-openssl-s-client?rq=1 one can not use openssl to test ssl connection to postgresql. True? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to test SSL cert from CA?
On 07/08/2015 10:52 PM, Tom Lane wrote: What's the complaint exactly? The error we are getting is: The security of this transaction may be compromised. The following SSL errors have been reported: * The issuer certificate of a locally looked up certificate could not be found. * The root CA certificate is not trusted for this purpose -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Backup Method
On 07/03/2015 08:08 AM, howardn...@selestial.com wrote: I am trying to move away from pg_dump as it is proving too slow. Have you looked into barman? http://www.pgbarman.org Also, another potential approach is to setup replication and to do the backups from the slave. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] How to test SSL cert from CA?
Have a client using a commercial application. For a year plus we had been using a local self signed certificate without issues. As of a few weeks ago a change/update to the program is making it complain about the self signed cert. I bought a SSL cert and installed it, but the program is still having the issue. Anyone knows of a way to test the SSL connection such that it validates against the CA? Preferably an open source application. Connecting through psql works fine on SSL with what I have setup, but the application, xtuple, seems to still be having the issue. The client already wrote to the application support department, but still waiting for an answer from them. If I had a way to at least reproduce the error I could more easily track down what I am missing. Any suggestions? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Inserting from multiple processes?
Hi Dave: On Mon, Jun 29, 2015 at 6:32 AM, Dave Johansen davejohan...@gmail.com wrote: The issue is that the following uses 5 XIDs when I would only expect it to us 1: BEGIN; SELECT insert_test_no_dup('2015-01-01', 1, 1); END; I see. It appears that the unique violation that is caught and ignored increments the XID even though I didn't expect that to happen. I agree that our software was burning XIDs needlessly and Postgres handled this situation as best as it could. It also sounds like Postgres 9.5 adds features to support this sort of use more efficiently, but the XID incrementing on the unique violation seems like it could/should be fixed, if it hasn't been already. IIRC you were using BEGIN/EXCEPTION, which I think uses a savepoint internally, which maybe what is burning the xid on every execution ( it probably needs one to implement rollback to savepoint properly ). I've done a simple test which burns one very time the exception is raised ( using a division by zero ). If this is your case you may be able to work around it using a conditional insert instead of an exception, and as you are using a function the potential ugliness will remain encapsulated ( it may even be faster, as the docs explicitly say exception blocks are expensive, but as usual YMMV depending on the exact query and the collision ratio ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Inserting from multiple processes?
Hi Dave: On Fri, Jun 26, 2015 at 2:59 AM, Dave Johansen davejohan...@gmail.com wrote: It appears that calling SELECT insert_test_no_dup('2015-01-01', 1, 1) cause the XID to increment? I'm not sure if it's only when the exception happens or all the time, but if there some way to prevent the increment of XID because it's causing problems with our system: http://www.postgresql.org/message-id/CAAcYxUer3MA=enxvnowe0osaa8comvxcf6orhp-vuppr56t...@mail.gmail.com I, personally, would expect an START TRANSACTION to burn an XID, they are serial, and they need to be allocated to have transaction ordering, like the thing which happens with the sequences. I assume the server can have some optimizations ( like delaying XID adquisition to the first appropiate statement, which I think depends on your isolation level ), but I would never expect it to not allocate it before an insert, it needs it to be sent to the table, in case it succeeds, and has to acquire it beforehand, in case someone needs to acquire another xid between the time it starts inserting and the time it succeeds or fail. Some internals expert may shed some light, but after reading your link it seems your problem is just you do too many transactions without a vacuum ( also reading your pointed threas it sees you do vacuum fulls, which seems unneeded ) and expecting postgres has some kind of magic to avoid burning the xids. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] DB access speeds, App(linux)-PG(linux) vs App(linux) -MSSql(Windows)
Hi Dave: On Wed, Jun 24, 2015 at 3:44 PM, Gauthier, Dave dave.gauth...@intel.com wrote: Hi: I'm trying to get a 10,000 ft understanding of the difference in DB access speeds for two different scenarios... Scenario 1: Apps are on linux. PG DB is on linux (different server than apps) Scenario 2: Apps are on linux. MSSql DB is on Windows (obviously a different server) The apps are typically perl scripts using Perl DBI. I'm thinking that since perl has a DBD driver for PG, that would be faster than going through ODBC to get to the MsSQL DB, but I'm not sure about that. Any insights/guesses ? From 10k ft, same speed ( variations due to lack of precision in the definition of the problem are going to dwarf everything else ). I assume apps are the same on both, and same network distance. If you use DBI for both you can discount it. If you use DBI you must use DBD. DBD::Pg is quite efficient in my measurements, but for the windows part I do not know what you are using. I'm using Mssql from linux using freetds ( DBD::Sybase, built with freetds, which is equivalent to DBD::Pg built with libpq ) and it goes at the appropiate speed ( I mean, nothing big is going to be gained, optimization time will be bigger than savings ). You say you use ODBC, but where exactly? You use ODBC on linux going to mssql using ¿which library? and DBD::ODBC, or you use remote odbc and an ODBC driver on windows? In any way, speed differences in your queries may be much bigger than requester speeds, unless you have a very complex path ( dbd::proxy going to a windows perl dbiproxy going with DBD::odbc to local mssql ? ). You should time a couple of your queries in real world condition, and test some loops of null queries ( like select 1 and the mssql equivalent in autocommit and no autocommit mode for both ), either of them can win. Also, someone has already told you that for really fast queries native may make a difference. It may, but you have to define 'non native' better. In very fast queries requester differences may be dwarfed by network roundtrips, and if this is a problem to you, you should look for optimizing the common path firsts, things like how many roundtrips each PROTOCOL needs for the small query and other similar. You should measure before. Requester is not normally going to be your big problem. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] double precision[] storage space questions
Hi Greg: On Fri, Jun 12, 2015 at 4:08 PM, greg gregory.jevar...@unige.ch wrote: I cannot find any documentation on the space taken by a double precision array. And the few tests I did surprise me. Here are a few tries I did to understand select pg_column_size(1.1::double precision) return 8--- as expected select pg_column_size('{}'::double precision[]) return 16 --- ok maybe an array header select pg_column_size('{1.111}'::double precision[]) return 32 --- I expected 16+ sizeof(double) = 24 select pg_column_size('{1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0}'::double precision[]) return 104 --- I'am lost because I expected 10*16 + 16 = 176. It is neither 16+10*8 (96) That's not a very good set of tests, look at mine ( slightly edited for size ); apc=# select pg_column_size('{}'::double precision[]); 16 apc=# select pg_column_size('{1.0}'::double precision[]); 32 apc=# select pg_column_size('{1.0,2.0}'::double precision[]); 40 apc=# select pg_column_size('{1.0,2.0,3.0}'::double precision[]); 48 apc=# select pg_column_size('{1.0,2.0,3.0,4.0}'::double precision[]); 56 Here I already expect 8*n+24, so the data point for 10 apc=# select pg_column_size('{1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0}'::double precision[]); 104 Does not surprise me. So what is happening behind the scene, I did not found any documentation. Since the queries are done in memory I suppose no compression is going on. Furthermore select pg_column_size('{1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0}'::double precision[]) return 104 as well. So I discard compression. You're hunting for wild things, as said in some other places, headers, special case for dimensionless arrays, it's a classic thing in databases. The whole point is that in the application I work on, we store double arrays as bytea (using some serialization before storing the data). I was very surprised to see that the storage of an array of double take more space using double precision[] than serializing it and storing it into a bytea. Not too much, just 20 bytes more per column, unless you play compression tricks. Unless you have lots of small columns, I doubt serializing/deserializing it is worth the hassle. Postgres does not always use the most compact form for storage. In fact I would be greatly surprised that any database stores an array ( which can be multidimensional, I do not know if other databases have single dimensional array types ) in a more compact way than an specialized serialization format for one dimensional double arrays. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Planner cost adjustments
Hi Daniel: On Thu, Jun 11, 2015 at 2:38 PM, Daniel Begin jfd...@hotmail.com wrote: . The remaining problem seems related to the statistics of some large tables. On one hand, I might increase the statistic target for these tables to 500, or even to 1000 and look at the results (but I have doubts it will help). On the other hand, I could simply force enable_seqscan to OFF for queries dealing with them. Still not sure about the best solution but the problem is definitely narrower :-) One somehow unrelated point. IIRC your problems where related to queries doing joins with selected sets of indexed IDs on a smaller table which then have to be looked up on some very large tables. ( I'm not able to follow too well which is which, users on changesets, I'm a bit lost ). Given your runtimes are always high ( in the seconds range, so it seems wire speed / latencies are not too much of an issue ) and that selectivity estimates on huge tables are always problematic and may be thwarting your plans you may be able to get faster results splitting your query. If I read your plans correctly, that would be selecting your 600 users in one query and then preparing the changeset query for a single user_id, which should be indexed, and looping it for every user. Given current machines can easily send-receive 600 queries in a second it may lead to a simpler solution. This mean you're using the DB as a somehow inteligent plain old indexed file, but sometimes this is the simpler approach ( heck, some of my code uses algorithms from the tape era as they were the faster way I could do it ). I needed to do this in one of my programs, the optimizer kept selecting bad plans so I did the first query, held the results in memory, and then prepared and executed in a loop from the app, my query was selecting a couple thousand values from submillion table, and joining with a multimillion one, getting a couple hundreds matches per original value. Splitting it made the queries on the big table always run indexed and fast ( and as a side bonus avoided duplicating the parts of the first record in the wire a hundred times, which was nice since the short table was wide and I only needed 3 short fields from the second one, and that made the first query run at wire speed and the second at disk speed ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] select count(*);
Hi Marc: On Thu, Jun 11, 2015 at 4:17 PM, Marc Mamin m.ma...@intershop.de wrote: But COUNT(*) does have meaning - it means the number of rows. which rows? :-) Well, docs could use a little polish there, as the select page says Compatibility Of course, the SELECT statement is compatible with the SQL standard. But there are some extensions and some missing features. Omitted FROM Clauses PostgreSQL allows one to omit the FROM clause. It has a straightforward use to compute the results of simple expressions: SELECT 2+2; ?column? -- 4 Some other SQL databases cannot do this except by introducing a dummy one-row table from which to do the SELECT. Old time users have grown used to use it without questioning, and I think it's sometimes needed to call functions, but some words along the a magic one row zero columns table is used when it is omited, which is what it seems to be done, would be nice. And, as I said, * only means the columns in a select, I think on no from Pg may be generating a fake one row table to satisfy the requirements ( maybe not, but is one easy way to make this work given how select is explained to work in the docs ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Efficient sorting the results of a join, without denormalization
Hi Glen: On Mon, Jun 1, 2015 at 1:16 AM, Glen M. Witherington g...@fea.st wrote: Thanks Francisco, that makes sense. I've started moving my code to that, and it eliminates all the performance issues I had. Happty to hear it. Seems you have a kind of speed-size trade off. If you can solve it while preserving integrity, great for you. I guess I was really hoping there would exist some sort of dereference option when indexing, so I could dereference a foreign key, and then index on a attribute of that row. E.g. So I could have created an index such as: deref(deref(mail.folder_id).user_id, created_at) That is difficult, as it would need a way to force mail.folder.user_id to be a constant or have a lot of rules/triggers ( manual, automatic or just plain magic ) to update your indexes. On this cases composite keys let you use composite indexes to accelerate your queries while preserving normalization, if you can afford them they are nice. The big problem comes many times if you try to mix them with ORMs and similar. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Efficient sorting the results of a join, without denormalization
Hi Glen: On Sun, May 31, 2015 at 6:43 AM, Glen M. Witherington g...@fea.st wrote: On Sat, May 30, 2015, at 11:33 PM, David G. Johnston wrote: This is one problem with using made up surrogate keys... The PK of A is a component of both the PK of B and the PK of C but you throw that information away by using serial fields for PKs instead. You should have unique indexes on B and C that incorporate the ID from A That is quite a strange schema, though isn't it? If you imagine it as emails: C = Emails B = Folder A = User Now you're suggesting that even though an email belongs to to a folder, which belongs to a user ... each email should also contain contain a reference to a user? I guess that's fine, but seems unideal from a redundancy perspective It may seem, and be, unideal from a redundancy perspective, but keys are more natural. It means you have user (Glen), folder (Glen, PGlist) and message (Glen,PGlist,27), different from (Glen,Inbox,27) or (Glen, PgList,28) or (Francisco,PgList,27) ( Where the 'tuples' I've printed are the PK values ). This has a lot of advantages, which you pay for in other ways, like redundancies, but having composite primary keys sometimes work in your favor as you can express restrictions with the relationships and build composite indexes for add hoc queries. In this case ( an email database ), a serial could be used ( instead of the name ) for the user and folder PK, but still have very fast, simple queries from a MUA for things like 'select * from messages where user_id = Prefetched_id and not read order by timestamp desc limit 100'. Also it will help catch things like mismatching folder ids, or using the user id as folder id, which are easily made when all the keys are synthetic and meaningless numbers. As an example, I have a currency table, with it's serial key currency_id, and a seller table, which sells just a currency and whose pk is (currency_id+seller_id), and a rate table with rates (currency_id, rate_id), and an allowed rates table ( to see which rates a seller can use ), with primay key (currency_id, seller_id, rate_id) and foreign keys (currency_id, seller_id) and (currency_id, rate_id) ( it is more or less a classical example. The composite keys guarantee I can only allow a seller to sell rates on her currency. I can also, if needed, build unique indexes on any single id ( they are all serials, as I have no other candidate keys ), if I need them, but given the access patterns I normally have all of them, and things like populating a drop box to allow new rates for a seller are very easy. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to retrieve Comment text using SQL, not psql?
Hi Bob: On Sat, May 30, 2015 at 1:48 PM, Bob Futrelle bob.futre...@gmail.com wrote: Using pgAdmin3 I've tried this and variations on it. All are rejected. select COMMENT ON TABLE articlestats pgAdmin3 is a bit complex for me, bot on vanilla psql you can use the switch: -E --echo-hidden Echo the actual queries generated by \d and other backslash commands. You can use this to study psql's internal operations. This is equivalent to setting the variable ECHO_HIDDEN to on. to learn how it does them, and elaborate from there, Its a nice resource ( although I fear its queries evolve with versions, but you can always port them touse the information_schema ( http://www.postgresql.org/docs/9.4/static/information-schema.html ) which should be a little more stable. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] FW: Constraint exclusion in partitions
for the single table approach hands down ( maybe with an intermediate weekend / nigt time cluster/vacuum full/analyze if it is insert a lot- select a lot, rinse, repeat ) unless you are appending / updating / deleting a lot. You are in a case ( big data, medium machine, slow disks ) where the real access patterns are what is going to determine your strategy, no boilerplate solution is going to apply there. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] MD5 password storage - should be the same everywhere?
On 05/25/2015 07:58 PM, Adrian Klaver wrote: On 05/25/2015 01:41 PM, Francisco Reyes wrote: I understood that is just a md5 hash of the password and the username with the string md5 pre-appended, so it should be the same. Mistery solved.. Because I usually do script of most of my work for audits I was able to track down what happened. Original request was to get data from 'machine 2', but it was later changed. Found something in the output of my script that lead me to believe maybe the roles did not come from 'machine 2'. Was able to find the source machine by using select rolname, rolpassword,rolcanlogin from pg_catalog.pg_authid where rolname = 'SomeUser'; Against machines I thought could have been the source and found a matching one. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general