Re: [GENERAL] archive_command not being executed

2017-11-10 Thread Paul Jungwirth

On 11/10/2017 09:10 AM, Eric D wrote:

I have a standby db server (SB1) that will soon become the master.  SB1
is set up with streaming replication from the current master.  I'm
trying to set up a third server (SB2) as a slave/standby to SB1, so that
when SB1 becomes the master, there will be a standby for it.  First step
is to get WAL files shipped from SB1->SB2.


Oh, this has happened to me before. :-) On SB1 you need to set
archive_mode to 'always' (not 'on'); otherwise archive_mode is ignored while
the server is running as a standby.
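For example, something along these lines on SB1 (just a sketch; the
archive_command here is only a placeholder for whatever you actually use,
and changing archive_mode requires a restart):

ALTER SYSTEM SET archive_mode = 'always';
ALTER SYSTEM SET archive_command = 'cp %p /path/to/archive/%f';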


Btw, just in case you are using Ansible, here is an unmerged PR for the 
postgres role: https://github.com/ANXS/postgresql/pull/254


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Two versions of an extension in the same cluster?

2017-10-23 Thread Paul Jungwirth

Hello,

I've got an extension that supplies functions written in C. Two 
databases from the same cluster both use this extension. I understand 
how I can load the example--2.0.0.sql file in one database, and 
example--3.0.0.sql in another, but from what I can tell both databases 
still share the same .so file. Is there any way to provide a separate 
.so for each version?


If not, what is the best approach for releasing a new .so that keeps the 
old functionality? I guess something like this?:


-- example--2.0.0.sql
CREATE OR REPLACE FUNCTION
myfunc(anyarray)
RETURNS integer
AS 'example', 'myfunc_v2_0_0'
LANGUAGE c IMMUTABLE;

-- example--3.0.0.sql
CREATE OR REPLACE FUNCTION
myfunc(anyarray)
RETURNS integer
AS 'example', 'myfunc_v3_0_0'
LANGUAGE c IMMUTABLE;

Thanks,
Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_dump throws too many command-line arguments in Postgres 10

2017-10-18 Thread Paul A Jungwirth
On Wed, Oct 18, 2017 at 8:05 AM, Andrus  wrote:
> pg_dump.exe -b -f b.backup -Fc -h  -U admin -p 5432 mydb
>
> causes error
>
> pg_dump: too many command-line arguments (first is "-p")

Don't you need a hostname after -h? I think right now pg_dump thinks
your hostname is "-U", your database is "admin", and everything after
that is extra.

Yours,
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is float8 a reference type?

2017-09-23 Thread Paul A Jungwirth
On Sat, Sep 23, 2017 at 9:40 AM, Tom Lane  wrote:
> I wonder whether you're using up-to-date Postgres headers (ones
> where Float8GetDatum is a static inline function).

I'm building against 9.6.3 on both machines. I'm not doing anything
special to change the compilation options. Here is my whole Makefile:

MODULES = floatfile
EXTENSION = floatfile
EXTENSION_VERSION = 1.0.0
DATA = floatfile--$(EXTENSION_VERSION).sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

But what I'm really interested in is this: What are the bad things
that can happen if I do `datums = (Datum *)floats`, as long as it's
only when Datums are 8 bytes wide? Is there a platform with
pass-by-val float8s where that won't work?

Thanks,
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is float8 a reference type?

2017-09-23 Thread Paul A Jungwirth
On Fri, Sep 22, 2017 at 8:38 PM, Tom Lane  wrote:
> "Premature optimization is the root of all evil".  Do you have good reason
> to think that it's worth your time to write unsafe/unportable code?  Do
> you know that your compiler doesn't turn Float8GetDatum into a no-op
> already?  (Mine does, on a 64-bit machine.)

Ha ha, thank you for keeping me honest! But can you explain what is
unsafe about the cast? For a little more context: I've loaded a float8
array from a file, but I need to pass a Datum array to
construct_md_array. With an 8-byte Datum, I can just pass the original
float array, right? But with smaller Datums I need to go through the
array and convert each element. (I'm not really worried about these
files being moved between machines, so I'm willing to make the on-disk
format the same as the in-memory format.)

Since I'm expecting ~10 million elements per array, it seems like
skipping the conversion will have a real effect. I checked the
assembly and do see a difference (on both Mac+clang and Linux+gcc).
Here is the Mac command line:

platter:floatfile paul$ clang -Wall -Wmissing-prototypes
-Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -Wno-unused-command-line-argument -O2  -I. -I./
-I/usr/local/Cellar/postgresql@9.6/9.6.3/include/server
-I/usr/local/Cellar/postgresql@9.6/9.6.3/include/internal
-I/usr/local/opt/gettext/include -I/usr/local/opt/openldap/include
-I/usr/local/opt/openssl/include -I/usr/local/opt/readline/include
-I/usr/local/opt/tcl-tk/include -g -S -o floatfile.s floatfile.c

Here is the assembly for the cast:

  .loc    2 391 23 is_stmt 1              ## floatfile.c:391:23
  movq    -48(%rbp), %r15
Ltmp176:
  ##DEBUG_VALUE: load_floatfile:datums <- %R15

Here is the assembly for the loop (after just changing the code to `if
(FLOAT8PASSBYVAL && false)`):

  .loc    2 393 21 is_stmt 1              ## floatfile.c:393:21
  movslq  %r15d, %r13
  .loc    2 393 28 is_stmt 0              ## floatfile.c:393:28
  leaq    (,%r13,8), %rdi
  .loc    2 393 14                        ## floatfile.c:393:14
  callq   _palloc
  movq    %rax, %r12
Ltmp177:
  ##DEBUG_VALUE: load_floatfile:i <- 0
  .loc    2 394 19 is_stmt 1 discriminator 1 ## floatfile.c:394:19
  testl   %r15d, %r15d
Ltmp178:
  .loc    2 394 5 is_stmt 0 discriminator 1 ## floatfile.c:394:5
  je      LBB7_11
Ltmp179:
## BB#9:
  ##DEBUG_VALUE: load_floatfile:arrlen <- %R15D
  ##DEBUG_VALUE: load_floatfile:nulls <- [%RBP+-80]
  ##DEBUG_VALUE: load_floatfile:floats <- [%RBP+-72]
  ##DEBUG_VALUE: load_floatfile:filename <- %RBX
  .loc    2 0 5 discriminator 1           ## floatfile.c:0:5
  movq    -72(%rbp), %rbx
Ltmp180:
  ##DEBUG_VALUE: load_floatfile:floats <- %RBX
  xorl    %r14d, %r14d
Ltmp181:
  .p2align        4, 0x90
LBB7_10:                                ## =>This Inner Loop Header: Depth=1
  ##DEBUG_VALUE: load_floatfile:floats <- %RBX
  ##DEBUG_VALUE: load_floatfile:arrlen <- %R15D
  ##DEBUG_VALUE: load_floatfile:nulls <- [%RBP+-80]
  .loc    2 395 34 is_stmt 1              ## floatfile.c:395:34
  movsd   (%rbx,%r14,8), %xmm0            ## xmm0 = mem[0],zero
  .loc    2 395 19 is_stmt 0              ## floatfile.c:395:19
  callq   _Float8GetDatum
  .loc    2 395 17                        ## floatfile.c:395:17
  movq    %rax, (%r12,%r14,8)
Ltmp182:
  .loc    2 394 30 is_stmt 1 discriminator 2 ## floatfile.c:394:30
  incq    %r14
  .loc    2 394 19 is_stmt 0 discriminator 1 ## floatfile.c:394:19
  cmpq    %r13, %r14
Ltmp183:
  .loc    2 394 5 discriminator 1         ## floatfile.c:394:5
  jl      LBB7_10
Ltmp184:
LBB7_11:
  ##DEBUG_VALUE: load_floatfile:arrlen <- %R15D
  ##DEBUG_VALUE: load_floatfile:nulls <- [%RBP+-80]

I get the same results on gcc too: the palloc, the loop, and even
`call Float8GetDatum@PLT`.

I'll do some timing of each version too, but it doesn't look like a
pointless optimization. I'd still like to know what is unsafe about it
though.

Thanks!
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is float8 a reference type?

2017-09-22 Thread Paul A Jungwirth
On Fri, Sep 22, 2017 at 8:05 PM, Pavel Stehule  wrote:
> yes, it is 8 bytes on 64-bit.

Thanks!

> I don't think so it is good idea to write 64bit only extensions.

I agree, but how about this?:

if (FLOAT8PASSBYVAL) {
  datums = (Datum *)floats;
} else {
  datums = palloc0(arrlen * sizeof(Datum));
  for (i = 0; i < arrlen; i++) {
    datums[i] = Float8GetDatum(floats[i]);
  }
}

Thanks,
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is float8 a reference type?

2017-09-22 Thread Paul A Jungwirth
On Fri, Sep 22, 2017 at 7:52 PM, Paul A Jungwirth
 wrote:
> Is a Datum always 8 bytes for 64-bit architectures?

Never mind, I found this in `pg_config.h`:

/* float8, int8, and related values are passed by value if 'true', by
   reference if 'false' */
#define FLOAT8PASSBYVAL true

Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Is float8 a reference type?

2017-09-22 Thread Paul A Jungwirth
The docs say that a Datum can be 4 bytes or 8 depending on the machine:

https://www.postgresql.org/docs/9.5/static/sql-createtype.html

Is a Datum always 8 bytes for 64-bit architectures?

And if so, can my C extension skip a loop like this when compiling
there, and just do a memcpy (or even a cast)?:

float8 *floats;
Datum *datums;

datums = palloc(arrlen * sizeof(Datum));
for (i = 0; i < arrlen; i++) {
  datums[i] = Float8GetDatum(floats[i]);
}

Thanks!
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance appending to an array column

2017-09-21 Thread Paul A Jungwirth
> It's going to suck big-time :-(.

Ha ha that's what I thought, but thank you for confirming. :-)

> We ended up keeping
> the time series data outside the DB; I doubt the conclusion would be
> different today.

Interesting. That seems a little radical to me, but I'll consider it
more seriously now. I also tried cstore_fdw for this, but my queries
(building a 2-D histogram) were taking 4+ seconds, compared to 500ms
using arrays. Putting everything into regular files gives up filtering
and other SQL built-ins, but maybe I could write my own extension to
load regular files into Postgres arrays, sort of getting the best of
both worlds.

Anyway, thanks for sharing your experience!

Yours,
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Performance appending to an array column

2017-09-21 Thread Paul A Jungwirth
I'm considering a table structure where I'd be continuously appending
to long arrays of floats (10 million elements or more). Keeping the
data in arrays gives me much faster SELECT performance vs keeping it
in millions of rows.
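For concreteness, the table looks roughly like this (a simplified sketch;
the key column is just illustrative, and vals is the array that keeps
growing):

CREATE TABLE measurements (
  id   serial PRIMARY KEY,
  vals float8[]
);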

But since these arrays keep growing, I'm wondering about the UPDATE
performance. I was reading this commit message about improving
performance of *overwriting* individual array elements, and I was
wondering if there is anything similar for growing an array column?:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1dc5ebc9077ab742

Is there a faster way to append to an array than just this?:

UPDATE measurements
SET vals = vals || ARRAY[5.0, 4.2, 9.9]::float[]
;

Thanks!
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 10 beta 4 foreign table partition check constraint broken?

2017-09-21 Thread Paul Jones
On Thu, Sep 21, 2017 at 02:59:21PM +0900, Michael Paquier wrote:

> On Fri, Sep 15, 2017 at 10:43 PM, Paul Jones  wrote:
> > Is this a bug in Postgres 10b4?  Looks like neither partition ranges
> > nor check constraints are honored in 10b4 when inserting into
> > partitions that are foreign tables.
> 
> Here is what you are looking for in the documentation:
> https://www.postgresql.org/docs/10/static/sql-createforeigntable.html
> Constraints on foreign tables (such as CHECK or NOT NULL clauses) are
> not enforced by the core PostgreSQL system, and most foreign data
> wrappers do not attempt to enforce them either; that is, the
> constraint is simply assumed to hold true. There would be little point
> in such enforcement since it would only apply to rows inserted or
> updated via the foreign table, and not to rows modified by other
> means, such as directly on the remote server. Instead, a constraint
> attached to a foreign table should represent a constraint that is
> being enforced by the remote server.

Thank you for the pointer... it is clear that I just didn't read far enough.

> 
> > Here is a nearly shovel-ready example.  Just replace with your
> > servers/passwords.
> >
> > -- --
> > -- Server 2
> > -- --
> >
> > CREATE DATABASE cluster;
> > \c cluster
> >
> > CREATE TABLE foo_1 (
> > id  INT NOT NULL,
> > nameTEXT
> > );
> 
> So here I think that you should add a CHECK constraint to this table,
> and that the behavior of your example works as expected.

I will try this, thanks!

> -- 
> Michael



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] 10 beta 4 foreign table partition check constraint broken?

2017-09-15 Thread Paul Jones
Is this a bug in Postgres 10b4?  Looks like neither partition ranges
nor check constraints are honored in 10b4 when inserting into
partitions that are foreign tables.

Here is a nearly shovel-ready example.  Just replace with your
servers/passwords.

-- --
-- Server 1
-- --

CREATE DATABASE cluster;
\c cluster

CREATE EXTENSION postgres_fdw;

CREATE SERVER server2 FOREIGN DATA WRAPPER postgres_fdw
OPTIONS(host 'server2', dbname 'cluster');

CREATE USER MAPPING FOR postgres SERVER server2
OPTIONS(user 'postgres', password 'pgpassword');

CREATE TABLE foo (
id  INT NOT NULL,
nameTEXT
) PARTITION BY RANGE (id);

CREATE FOREIGN TABLE foo_1
PARTITION OF foo
FOR VALUES FROM (0) TO (1)
SERVER server2 OPTIONS (table_name 'foo_1');

-- --
-- Server 2
-- --

CREATE DATABASE cluster;
\c cluster

CREATE TABLE foo_1 (
id  INT NOT NULL,
nameTEXT
);

-- --
-- Server 1
-- --

INSERT INTO foo_1 VALUES(0,'funky bug'),
(100, 'wiggle frank'),
(15000, 'boegger snot');

SELECT * FROM foo;

DROP FOREIGN TABLE foo_1;

CREATE FOREIGN TABLE foo_1
PARTITION OF foo
(id CONSTRAINT f1 CHECK ((id >= 0) AND (id < 1)))
FOR VALUES FROM (0) TO (1)
SERVER server2 OPTIONS (table_name 'foo_1');

INSERT INTO foo_1 VALUES(0,'funky bug'),
(100, 'wiggle frank'),
(15000, 'boegger snot');

SELECT * FROM foo;




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Strange SQL result - any ideas.

2017-09-01 Thread Paul Linehan


I have a table (fred) that I want to transform into JSON and
I use the following command (ignore the backslash stuff):

SELECT REGEXP_REPLACE(ROW_TO_JSON(t)::TEXT, '', '\\', 'g')
FROM
(
  SELECT * FROM fred
  ORDER BY mary, jimmy, paulie
) AS t;

which gives

                   regexp_replace
------------------------------------------------------
 {"mary":2,"jimmy":43,"paulie":"asfasfasfd"}
 {"mary":3,"jimmy":435,"paulie":"ererere"}
 {"mary":3,"jimmy":44545,"paulie":"\sdfs\\\sfs\\gf"}
 {"mary":6,"jimmy":43343,"paulie":"eresdfssfsfasfae"}
 {"mary":35,"jimmy":5,"paulie":"wrew\sdfsd"}
(5 rows)


which is fine (note that the field "mary" is sorted correctly), but
I want "proper" JSON - i.e. with opening and closing square brackets
([ and ]) before and after the rows!

So, I tried this query:

SELECT '[' AS my_data  -- <<-- added line
UNION  -- <<-- added line
SELECT REGEXP_REPLACE(ROW_TO_JSON(t)::TEXT, '', '\\', 'g')
FROM
(
  SELECT * FROM fred
  ORDER BY mary, jimmy, paulie
) AS t
UNION   -- <<-- added line
SELECT ']';-- <<-- added line

*_BUT_*, this gives


                       my_data
------------------------------------------------------
 ]
 [
 {"mary":6,"jimmy":43343,"paulie":"eresdfssfsfasfae"}
 {"mary":35,"jimmy":5,"paulie":"wrew\sdfsd"}
 {"mary":3,"jimmy":44545,"paulie":"\sdfs\\\sfs\\gf"}
 {"mary":2,"jimmy":43,"paulie":"asfasfasfd"}
 {"mary":3,"jimmy":435,"paulie":"ererere"}
(7 rows)


Two problems with this result. One is that my square brackets are not in
the right place - this at least I understand: the first character of
each line is sorted by its ASCII value - '[' comes before ']' (naturally)
and '{' comes after them both - or have I got that right?

But I do *_not_* understand why my table data is now out
of sort order - I've looked at it and can't see *_how_* the sort
order of my table data has been determined.

Anybody got any logical explanations as to what's going on?

TIA & Rgs,


Paul...


-- CREATE TABLE and INSERT INTO statements.


CREATE TABLE fred (
mary integer PRIMARY KEY,
jimmy integer,
paulie character varying(20)
);

INSERT INTO fred (mary, jimmy, paulie) VALUES (2, 43, 'asfasfasfd');
INSERT INTO fred (mary, jimmy, paulie) VALUES (3, 435, 'ererere');
INSERT INTO fred (mary, jimmy, paulie) VALUES (6, 43343, 'eresdfssfsfasfae');
INSERT INTO fred (mary, jimmy, paulie) VALUES (35, 5, 'wrew\sdfsd');
INSERT INTO fred (mary, jimmy, paulie) VALUES (3, 44545, '\sdfs\\\sfs\\gf');


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PG and database encryption

2017-08-22 Thread Paul Jungwirth

On 08/22/2017 01:08 PM, John McKown wrote:

On Tue, Aug 22, 2017 at 2:48 PM, rakeshkumar464
 wrote:

We have a requirement to encrypt the entire database.


Personally, what I'd do (and actually do at work) is to us LUKS.


I second that, although I'll add that if you're on AWS you can also use 
encrypted EBS volumes. You get a very similar effect, except all you 
need to do is tick a checkbox (or set a CloudFormation attribute, etc.). 
Also you can get unattended reboots without storing the key somewhere 
vulnerable. There may be perf advantages too; I'm not sure.


Good luck!
Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Development of an extension for PostgreSQL and PostGIS

2017-08-14 Thread Paul Ramsey
In order to get an LWGEOM from PostGIS you'll need to convert from the
serialized form (GSERIALIZED) which you can read all about in the
liblwgeom.h header. You'll be adding a hard dependency of course, but
hopefully you're OK with that.

If you're just hoping to build a compound type, as your example shows, you
can do that without a C extension, just read up on CREATE TYPE.
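For instance, something along these lines (just a sketch based on the fields
in your example; no C required):

CREATE TYPE mytype AS (
  id   integer,
  geom geometry
);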

For an alternate example of an extension with a lighter dependency on
PostGIS, check out pgpointcloud, which has its own structure for spatial
data (a point patch) and exchanges data with PostGIS via well-known-binary.
This removes the liblwgeom dependency, which means it's possible to compile
and use pgpointcloud without PostGIS installed, which is not entirely
uncommon.

P


On Mon, Aug 14, 2017 at 11:18 AM, Fabiana Zioti 
wrote:

> Hello.
>
> I will start developing an extension to PostgreSQL next to PostGIS using
> the C language.
>
> If my new type were:
>
>
> CREATE TYPE mytype (.., .., .., geom geometry);
>
> The creation of the structure in c, would be something like?
>
> #include "liblwgeom.h"
>
> Struct mytype
> {
>Int32 id;
>LWGEOM lwgeom;
>
> };
>
>
> In the extension I will create new data types for PostgreSQL, but I would
> like to use the geometric objects that the PostGIS extension offers, such
> as POINT, LINE, POLYGON, etc. In addition to their input functions (wkt-
> ST_GeomFromText ()), operators, index, etc.
>
> In this case just importing the liblwgeom library would be enough to
> develop an extension to PostgreSQL / PostGIS?
>
> Would you have any examples of such a project?
>
> Thanks in advance!!
>
>


[GENERAL] Redacting JSONB

2017-08-07 Thread Paul Jones
Is there some way to redact JSONB fields? This is essentially implementing
"row-level" security on JSONB fields.  Could possibly be done with a
view or a function.  MongoDB and MarkLogic both do this.
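For example, I'm imagining something roughly like this (table and key names
made up):

CREATE VIEW documents_redacted AS
SELECT id, doc - 'ssn' - 'credit_card' AS doc   -- strip the sensitive keys
FROM documents;

But I'm curious whether there is a more built-in approach.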

PJ


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to stop array_to_json from interpolating column names that weren't there

2017-07-19 Thread Paul Jungwirth

which is great. I have an array of perfect JSON objects. Now I just need
to turn that into a single JSON object.


I think you're saying you want it as a single JSON *array*, right? An 
object of objects doesn't make sense. Assuming that's right, this seems 
to work:


db1=# select json_agg(schemata) from schemata;
                                  json_agg
--------------------------------------------------------------------------
[{"catalog_name":"db1","schema_name":"information_schema","schema_name_address":"/information_schema"},
{"catalog_name":"db1","schema_name":"pg_catalog","schema_name_address":"/pg_catalog"},
{"catalog_name":"db1","schema_name":"pg_temp_1","schema_name_address":"/pg_temp_1"},
{"catalog_name":"db1","schema_name":"pg_toast","schema_name_address":"/pg_toast"},
{"catalog_name":"db1","schema_name":"pg_toast_temp_1","schema_name_address":"/pg_toast_temp_1"},
{"catalog_name":"db1","schema_name":"public","schema_name_address":"/public"}]
(1 row)

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ERROR: query returned no rows

2017-06-26 Thread Paul A Jungwirth
On Mon, Jun 26, 2017 at 12:47 PM, Adrian Klaver
 wrote:
> On 06/26/2017 12:03 PM, Paul Jungwirth wrote:
>> Perhaps
>> you should see what is line 85 when you do `\sf words_skip_game` (rather
>> than line 85 in your own source code).
>
> Or easier yet:
>
> https://www.postgresql.org/docs/9.5/static/app-psql.html
> "
> \ef [ function_description [ line_number ] ]

I am always nervous about answering questions here when so many actual
Postgres maintainers are around. But it's pretty great that so often
when I do it provokes an even better answer. It's not the first time!
:-)

Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ERROR: query returned no rows

2017-06-26 Thread Paul Jungwirth

On 06/26/2017 11:21 AM, Alexander Farber wrote:

The error message returned by
the database is rather cryptic:

words=> select * from words_skip_game(1, 1);
ERROR:  query returned no rows
CONTEXT:  PL/pgSQL function words_skip_game(integer,integer) line 85 at
SQL statement

When I look at my source code
( https://gist.github.com/afarber/cac9a83b7a37307ace8d787be9b8ff4c ) at
the line 85, then I am not sure if the line number reported by the error
message is correct, because it points into middle of an UPDATE statement:


I agree the line numbers for error messages inside of plpgsql functions 
never seem to make sense, so I wouldn't get too hung up on line 85. 
Perhaps you should see what is line 85 when you do `\sf words_skip_game` 
(rather than line 85 in your own source code). But mostly I would try to 
find some other way of locating the cause of the error.


You can read about the "query returned no rows" message here:

https://www.postgresql.org/docs/9.5/static/plpgsql-statements.html

It seems to me you should only see it for `INTO STRICT`, not plain 
`INTO`. I see from your gist that your function *does* have some queries 
that are `INTO STRICT`, so I would focus on those.
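As a tiny illustration (not from your function, just to show the behavior):

DO $$
DECLARE v integer;
BEGIN
  SELECT 1 INTO STRICT v WHERE false;  -- ERROR:  query returned no rows
END $$;

The same query with plain INTO would simply leave v NULL.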


That page also describes how to use `print_strict_params` to get a 
little more info about the details of the error.


I hope that helps!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Left join help

2017-06-23 Thread Paul Jungwirth

I tried a
query, but it is not giving me any result. Can anyone help me pls?

SELECT missions.*,
  CASE WHEN submissions.id IS NULL THEN 'incompleted'
  ELSE 'completed' END AS mission_status
FROM "missions" LEFT JOIN submissions ON submissions.mission_id =
missions.id
INNER JOIN members ON members.id = submissions.member_id
WHERE (members.id = 1 AND missions.track_id = 7)


I always think about JOINs as being implemented from top to bottom, and 
you can track the current result rows in your head. So in your case you 
start with one row for each mission. Then you add zero or more rows for 
each submission. Because it's an outer join you keep rows even if they 
don't match.


Then you join to members, but if there is no match, you drop the row 
from the result. And since there are no members with a NULL id (I hope), 
any row where submissions.member_id is NULL has no matches, so all the 
unmatched rows you kept from the outer join fall out here.


Since you know that a submission never has more than one member, it 
would be safe to use a LEFT OUTER JOIN in both places, and that will let 
you preserve incomplete missions all the way through to the final result.


Furthermore, your WHERE drops everything where members.id is not 1. So 
again you are throwing away incomplete missions. I guess you need to 
permit anything where members.id is 1 or NULL.


Finally, your intent seems to be to get one row per mission, but if a 
mission has several submissions you will get duplicates. So maybe for 
complete/incomplete you should just use EXISTS with a correlated subquery.
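Something like this, for instance (just a sketch using the tables and columns
from your query, untested):

SELECT m.*,
  CASE WHEN EXISTS (SELECT 1
                    FROM submissions s
                    WHERE s.mission_id = m.id
                      AND s.member_id = 1)
       THEN 'completed'
       ELSE 'incompleted'
  END AS mission_status
FROM missions m
WHERE m.track_id = 7;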


I hope that helps. Good luck!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question about jsonb and data structures

2017-06-21 Thread Paul Jones
On Wed, Jun 21, 2017 at 09:37:20AM -0700, Emilie Laffray wrote:
> Date: Wed, 21 Jun 2017 09:37:20 -0700
> From: Emilie Laffray 
> To: Achilleas Mantzios 
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Question about jsonb and data structures
> 
> Hello Achilleas,
> 
> I fail to see how it would solve my problem here. I already have a
> structure that is packed and nested. Your example is on a simple key/value
> pair structure and effectively you can address the ids very simply. In my
> case, I would need to return only a subset of the json data.
> Maybe I missed something from your example?

Would jsquery help?

https://github.com/postgrespro/jsquery

> 
> On Wed, Jun 21, 2017 at 12:27 AM, Achilleas Mantzios <
> ach...@matrix.gatewaynet.com> wrote:
> 
> > On 21/06/2017 01:01, Emilie Laffray wrote:
> >
> >> Hello,
> >>
> >> I have been playing with Postgresql recently with a large table and I
> >> have started looking at reducing the number of rows in that table.
> >> One idea to reduce the actual size, I thought I would "compress" the data
> >> structure into a JSON object (more on that later).
> >> The table is pretty straightforward in itself
> >> other_id integer
> >> type_id integer
> >> label_id integer
> >> rank_id integer
> >> value real
> >>
> >> and the goal is to move to a data structure where we have
> >> other_id integer
> >> value jsonb
> >>
> >> There are many things in the table that is not optimal for legacy reasons
> >> and I can't just get rid of them.
> >>
> >> I looked at several json object data structure to see if I could make it
> >> work notably
> >> [{"type":1,"label":1,"rank":1,"value":.5},{"type":1,"label":
> >> 2,"rank":1,"value":.25}]
> >> {"1":{"1":{"rank":1,"value":.5},"2":{"rank":1,"value":.25}}}
> >>
> >> For better or worse, the first one would be the best for me as I can do a
> >> simple query like this using the GIN index built on top of value:
> >> SELECT *
> >> FROM mytable
> >> WHERE value @> '[{"type":1,"rank":1,"label":2}]'
> >>
> >> Effectively, I would want to extract the value corresponding to my
> >> condition using simple SQL aka not having to write a function extracting
> >> the json.
> >>
> >> The experiment on the second data structure shows that it is not as
> >> convenient as I may need to perform search on either type, label, rank and
> >> various combinations of the fields.
> >>
> >> Maybe you could try smth like :
> > test=# select * from lala;
> >  id |txt
> > +
> >   1 | one
> >   2 | two
> >   3 | ZZZbabaZZZ
> >   4 | ZZZbabaZZZ
> >   5 | ZZZbabaZZZ
> >   6 | ZZZbabaZZZ
> >   7 | ZZZbabaZZZ
> >   8 | ZZZbabaZZZ
> >   9 | ZZZbabaZZZ
> >  10 | ZZZbabaZZZ
> >  11 | ZZZbabaZZZ
> >  12 | ZZZbabaZZZ
> >  13 | ZZZbabaZZZ
> > (13 rows)
> >
> > select foo.* FROM (select id,to_jsonb(lala) as jzon from lala) as foo
> > where jzon @> '{"id":5}';
> >
> >
> > Am I missing something?
> >>
> >> Thanks in advance,
> >> Emilie Laffray
> >>
> >
> >
> > --
> > Achilleas Mantzios
> > IT DEV Lead
> > IT DEPT
> > Dynacom Tankers Mgmt
> >
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Cookbook for doing installation and configuration of PostgreSQL on Redhat

2017-06-21 Thread Paul Jungwirth

I am new to chef concept, can anyone provide me a cookbook for doing
installation and configuration of PostgreSQL on Redhat.


Hi Pawan,

I have always used the standard "postgresql" cookbook, which has recipes 
to install a server. In a Berksfile you just say:


cookbook "postgresql"

The repo is here with usage information: 
https://github.com/sous-chefs/postgresql


One nice thing is it lets you control postgresql.conf values with chef 
node attributes.


You can either add its recipe directly to your runlist, or include it 
from an app recipe you write yourself. Here is an example of the latter 
(a snippet from one of my own recipes). Of course you should change all 
this to match your own situation (e.g. probably not `*` for 
`listen_addresses`).


# These attributes control the postgresql.conf file:
node.default[:postgresql][:config][:data_directory] = node[:postgresql][:data_dir]

node.default[:postgresql][:config][:port] = 5432
node.default[:postgresql][:config][:max_connections] = 100
node.default[:postgresql][:config][:unix_socket_directories] = '/var/run/postgresql'

node.default[:postgresql][:config][:ssl] = true
node.default[:postgresql][:config][:ssl_cert_file] = '/etc/ssl/certs/ssl-cert-snakeoil.pem'
node.default[:postgresql][:config][:ssl_key_file] = '/etc/ssl/private/ssl-cert-snakeoil.key'

node.default[:postgresql][:config][:shared_buffers] = '2GB'
node.default[:postgresql][:config][:effective_cache_size] = '3GB'
# node.default[:postgresql][:config][:wal_level] = 'hot_standby'
# node.default[:postgresql][:config][:max_wal_senders] = 3
node.default[:postgresql][:config][:checkpoint_completion_target] = '0.9'
node.default[:postgresql][:config][:work_mem] = '32MB'
node.default[:postgresql][:config][:synchronous_commit] = 'off'
node.default[:postgresql][:config][:wal_keep_segments] = 8
node.default[:postgresql][:config][:log_line_prefix] = '%t '
node.default[:postgresql][:config][:log_timezone] = 'UTC'
node.default[:postgresql][:config][:log_min_duration_statement] = 500
node.default[:postgresql][:config][:log_checkpoints] = 'on'
node.default[:postgresql][:config][:datestyle] = 'iso, mdy'
node.default[:postgresql][:config][:timezone] = 'UTC'
node.default[:postgresql][:config][:lc_messages] = 'en_US.UTF-8'
node.default[:postgresql][:config][:lc_monetary] = 'en_US.UTF-8'
node.default[:postgresql][:config][:lc_numeric] = 'en_US.UTF-8'
node.default[:postgresql][:config][:lc_time] = 'en_US.UTF-8'
node.default[:postgresql][:config][:default_text_search_config] = 'pg_catalog.english'

node.default[:postgresql][:config][:listen_addresses] = '*'
node.default[:postgresql][:pg_hba] << {
  'type' => 'hostssl',
  'db' => 'all',
  'user' => 'all',
  'addr' => "0.0.0.0/0",
  'method' => 'md5'
}

include_recipe 'postgresql::server'
include_recipe 'postgresql::client'
package "postgresql-#{node['postgresql']['version']}-postgis-2.2"

# used for archiving WAL files
directory "#{node[:postgresql][:data_dir]}/archive" do
  owner "postgres"
  group "postgres"
  mode "0700"
end

...

You could also set these things via a role/environment/node file. Chef 
has a lot of ways to set attributes. But setting them in your own recipe 
is direct and lets you version the settings with the recipe. It may not 
be appropriate for deployments with many variations though.


But if you are just learning chef, it can be easier to start out writing 
one big recipe that is essentially just a long script. The most 
degenerate use of chef would be this:


bash "do everything" do
  code <<-EOF
echo Hello World!
apt-get install postgresql-9.6
etc
etc
  EOF
end

(I don't actually recommend you do that, except pedagogically)

A step better would be to break that into individual resources, e.g.

package 'postgresql-9.6'
...

And then a step above that would be to use third-party cookbooks where 
available, and include their recipes in your runlist.


I also think it is easier to use chef-solo when you are just starting 
out, if possible. That gives you a model more like Ansible: "push these 
commands from my local development machine to the target and run them." 
It combines well with Vagrant so you can quickly try things out and 
start from fresh.


This is all really more about Chef than Postgres though, so you might 
have more success asking that community. Or send me a note and I can 
probably help solve specific Chef problems. The try-test-fix cycle can 
be pretty slow!


Good luck!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres Data Encryption Using LUKS with dm-crypt ?

2017-06-19 Thread Paul Jungwirth

On 06/19/2017 12:40 AM, Scott Marlowe wrote:

On Sun, Jun 18, 2017 at 2:20 PM, Condor  wrote:

What I should expect, what is good and bad things that can be happened.


I've run Postgres on a LUKS volume for a few years now and it's all been 
pretty quiet. One challenge is you need to supply the password if the 
server restarts. Automating that in a way that doesn't simply reveal the 
password is tricky.


I'm not using RAID, so I can't speak to combining LUKS + RAID.

If you are on AWS, nowadays they have encrypted EBS volumes which will 
do all this for you automatically. If I were setting up this system 
today that's probably what I would have used.


> I think the only real test here is to build a luks system, initiate
> some pgbench type runs, wait a minute, run checkpoint and then yank
> out the plug. Run a dozen or so times looking for data corruption.

I think this is really the right answer!

Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Upgrade method from 9.2 to 10 ?

2017-06-16 Thread Paul Jones
Would it be possible to upgrade from 9.2 to 10 by doing a pg_basebackup
to the new server, followed by pg_upgrade -k, then streaming replication
from the 9.2 server to the 10 server until we're ready to cut over to 10?

The idea is to minimize downtime.

PJ


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] CREATE FOREIGN TABLE difficulties

2017-06-15 Thread Paul Lavoie
The ‘does not actually create the table’ part was the piece I was missing. I wasn’t 
sure either way from reading the documentation for the command, but it makes sense 
from the perspective of trying to keep the databases independent.

One of the reasons I went down this path was the hope of not needing to 
manipulate the table definitions in each of the databases, but rather to make the 
definition once and let it propagate to the children as necessary, courtesy of 
the ‘INHERITS’. I’ll adjust my plans accordingly.

Thanks for the reply!

- Paul

> On Jun 15, 2017, at 1:27 PM, Adrian Klaver  wrote:
> 
> On 06/15/2017 09:49 AM, Paul Lavoie wrote:
>> Hi,
>> Long time user, new poster…
>> For assorted reasons I’ve decided to shard a database across multiple 
>> instances of postgresql running on the same machine. I’ve set up a lot of 
>> children servers with a ‘fdw’ user to work with the foreign data wrapper and 
>> created the child database along with a schema, and then on the main 
>> database go thru the process of:
>> CREATE SCHEMA myschema;
>> CREATE TABLE mytable (mycol TEXT);
>> CREATE EXTENSION postgres_fdw;
>> CREATE SERVER db001 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 
>> ‘localhost’, port ‘8001’, dbname = ‘db001’);
>> CREATE USER MAPPING FOR myuser SERVER db001 OPTIONS (user ‘fdw’, password 
>> ‘XXX’);
>> CREATE FOREIGN TABLE myschema.mytable () INHERITS (mytable) SERVER db001 
>> OPTIONS (schema_name ‘myschema’, table_name ‘mytable’);
>> Attempts to SELECT from myschema.mytable then fail with a “relation 
>> “myschema.mytable” does not exist” error, and going into the child database 
>> shows no signs of any tables whatsoever.
> 
> I am assuming you are doing the above on the parent database.
> CREATE FOREIGN TABLE does not actually create the table on the remote(child), 
> it has to exist there already. It creates the table on parent and links it to 
> the table on the remote(child).:
> 
> https://www.postgresql.org/docs/9.6/static/sql-createforeigntable.html
> 
> You will also want to pay attention to:
> 
> "If a schema name is given (for example, CREATE FOREIGN TABLE 
> myschema.mytable ...) then the table is created in the specified schema. 
> Otherwise it is created in the current schema. The name of the foreign table 
> must be distinct from the name of any other foreign table, table, sequence, 
> index, view, or materialized view in the same schema."
> 
> 
> from the above link.
> 
>> This is under Postgresql 9.6.3, with the software being tested on servers 
>> running MacOS, NetBSD/amd64, & Solaris/x86_64 (various combinations). I’ve 
>> tried removing the schema qualifications, redoing the host as properly 
>> remote rather than localhost, removing the port number, all without any 
>> signs of success. Oddly, the inverse of IMPORT FOREIGN SCHEMA appears to 
>> work.
>> I’m particularly curious as to how one would troubleshoot this scenario. I’m 
>> somewhat surprised at the CREATE FOREIGN TABLE returning success when it 
>> doesn’t appear to have done all the work - the Postgreql instance must be 
>> present, but it doesn’t complain if the database, never mind the schema, 
>> doesn’t exist.
>> If this turns out to be a bug, I’ll happily move to the bug mailing list to 
>> discuss further. But under the possibility I’m missing the obvious, I’d 
>> thought I’d try here first.
>> I’m going to go try the 10.0 beta now…
>> Thanks!
>> - Paul
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] CREATE FOREIGN TABLE difficulties

2017-06-15 Thread Paul Lavoie
Hi,

Long time user, new poster…

For assorted reasons I’ve decided to shard a database across multiple instances 
of postgresql running on the same machine. I’ve set up a lot of children 
servers with a ‘fdw’ user to work with the foreign data wrapper and created the 
child database along with a schema, and then on the main database go thru the 
process of:

CREATE SCHEMA myschema;
CREATE TABLE mytable (mycol TEXT);

CREATE EXTENSION postgres_fdw;
CREATE SERVER db001 FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', port '8001', dbname 'db001');
CREATE USER MAPPING FOR myuser SERVER db001
OPTIONS (user 'fdw', password 'XXX');
CREATE FOREIGN TABLE myschema.mytable () INHERITS (mytable) SERVER db001
OPTIONS (schema_name 'myschema', table_name 'mytable');

Attempts to SELECT from myschema.mytable then fail with a “relation 
“myschema.mytable” does not exist” error, and going into the child database 
shows no signs of any tables whatsoever.

This is under Postgresql 9.6.3, with the software being tested on servers 
running MacOS, NetBSD/amd64, & Solaris/x86_64 (various combinations). I’ve 
tried removing the schema qualifications, redoing the host as properly remote 
rather than localhost, removing the port number, all without any signs of 
success. Oddly, the inverse of IMPORT FOREIGN SCHEMA appears to work.

I’m particularly curious as to how one would troubleshoot this scenario. I’m 
somewhat surprised at the CREATE FOREIGN TABLE returning success when it 
doesn’t appear to have done all the work - the PostgreSQL instance must be 
present, but it doesn’t complain if the database, never mind the schema, 
doesn’t exist.

If this turns out to be a bug, I’ll happily move to the bug mailing list to 
discuss further. But on the chance that I’m missing the obvious, I thought 
I’d try here first.

I’m going to go try the 10.0 beta now…

Thanks!

- Paul

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] trying to program in PostgreSQL C a statistics function

2017-06-11 Thread Paul Jungwirth

Hi Andre,

I've written some C statistics functions for Postgres before, here:

https://github.com/pjungwir/aggs_for_arrays
https://github.com/pjungwir/aggs_for_vecs

They are all really simple, although they operate on arrays, so yours 
should be even simpler. The second link has aggregate functions, 
including one for sample variance, so that would probably be the easiest 
to adapt to your own needs.


In general Postgres C functions use macros to access their arguments 
(and to return something). You can read more about it here:


https://www.postgresql.org/docs/current/static/xfunc-c.html

I'm happy to help more if you need it. I've found extending Postgres in 
this way to be a lot of fun and very effective at gaining performance.


Good luck!

Paul


On 06/11/2017 09:57 AM, Andre Mikulec wrote:

SUMMARY
--

I am trying to program in PostgreSQL a statistics function.

My old method in SQL is here.

select ( avg(measure) - 0.0 ) / nullif(stddev_pop(case when measure >
0.00 then 0.00 else measure end ),0) sortino_true from TABLE/VIEW;

The logic is based on
SORTINO RATIO: ARE YOU CALCULATING IT WRONG?
SEPTEMBER 11, 2013
https://www.rcmalternatives.com/2013/09/sortino-ratio-are-you-calculating-it-wrong/

In PostgreSQL C, how do I get access to the 'X' (data) variable?

DETAILS
---

I see a similar function with other variables in here.
But I am not trying to re-program 'stddev_samp'.

float8_stddev_samp ( lines 2741 through 2767 )
https://github.com/postgres/postgres/blob/9a34123bc315e55b33038464422ef1cd2b67dab2/src/backend/utils/adt/float.c

  N = transvalues[0];
  sumX  = transvalues[1];
  sumX2 = transvalues[2];

The following article explains it and this makes sense.

Sum of Squares Formula Shortcut
https://www.thoughtco.com/sum-of-squares-formula-shortcut-3126266

I can match article symbols to PostgreSQL C variables.

numerator = N * sumX2 -sumX * sumX;
N * SIGMA(X**2) - (SIGMA(X))**2

BUT I NEED ...
--

However in my case,
I have this adjustement

"case when  measure > 0.00 then 0.00 else measure end"

So the case seems that I need access to 'X'
but access to sumX and sumX2 are only shown.

How would I get access to X?

May anyone point me to a simple good working example?

Thanks,
Andre Mikulec
andre_miku...@hotmail.com






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] type "xxxxxxx" does not exist

2017-05-19 Thread Paul Jungwirth

On 05/19/2017 02:25 PM, Micky Hulse wrote:

Awesome, that worked!

SET search_path TO myschema, public;

Thanks to everyone for the help! I really appreciate it. :)


Glad you figured it out! Setting the search_path is often a good thing to 
put in your ~/.psqlrc so you don't run into the same problem next time.


Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] type "xxxxxxx" does not exist

2017-05-19 Thread Paul Jungwirth

On 05/19/2017 01:06 PM, Micky Hulse wrote:

ERROR:  type "xxx_xxx_x" does not exist
LINE 1:  DECLARE results xxx_xxx_x;


It sounds like the type might be in a different schema. You can say \dn 
to see the schemas in your database, and \dT+ will show the types along 
with their schema. You could also do \dT+ foo.* to see all the types in 
schema foo.


If you find that the type isn't in the public schema, try setting your 
schema search path so that the function can locate it, e.g.:


SET search_path TO foo, public;

Good luck!

Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Python versus Other Languages using PostgreSQL

2017-05-10 Thread Paul Hughes
Adrian Klaver Wrote:

>>Many on this list(myself included) will want to know how you came to that
conclusion and I am speaking as someone who uses Python, Django and
Postgres.

I came to that conclusion when I saw a list of the top 15 websites (based
on traffic). On that list, *all* of the sites that used PostgreSQL as their
primary data store, also used Python as their primary back-end language.
Those three sites are/were Yahoo, Instagram and Uber. 3 out of 3 is a
pretty high statistical correlation when you're only talking about a sample
of 15. Given that, my saying, "I noticed that most of the largest web
platforms that use PostgreSQL as their primary database, also use Python as
their primary back-end language.", was a fair observation. An observation
is just that (one blind man feeling an elephant). Saying "I noticed that"
is a *very* different thing than making an assertion.



On Wed, May 10, 2017 at 7:37 AM, Adrian Klaver 
wrote:

> On 05/09/2017 01:00 PM, Paul Hughes wrote:
>
>> Thank you all for taking the time to answer my questions. I've been out
>> of the programming world for a long time, so I am back to being a newbie. I
>> was told this is the place for newcomers to ask questions. I apologize if
>> my questions did not contain the necessary sophistication or nuance for
>> some to answer.
>>
>
> It has more to do with making unsubstantiated assertions, namely:
>
> "I noticed that most of the largest web platforms that use PostgreSQL as
> their primary database, also use Python as their primary back-end language.
> ..."
>
> Many on this list(myself included) will want to know how you came to that
> conclusion and I am speaking as someone who uses Python, Django and
> Postgres.
>
>
>
>> George, your answer makes the most sense. Thank you!  You said:
>>
>> Most cloud servers come preconfigured with some variation of either
>>>>
>>> the "LAMP" or "LAPP" stack: i.e. Linux, Apache, MySQL or Postgresql,
>> PHP or Python.
>>
>> There needs to be a compelling *application* reason to install
>>>>
>>> something else: a JVM (or Apache-Tomcat vs regular Apache),
>> Ruby/Rails, Groovy/Grails, etc.
>>
>>
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



-- 
Paul Hughes
Executive Director
Vivation International
1-800-514-8483
Skype: Vivation2
p...@vivation.com
http://www.vivation.com/


Re: [GENERAL] Python versus Other Languages using PostgreSQL

2017-05-10 Thread Paul Hughes
Thank you all for taking the time to answer my questions. I've been out of
the programming world for a long time, so I am back to being a newbie. I
was told this is the place for newcomers to ask questions. I apologize if
my questions did not contain the necessary sophistication or nuance for
some to answer.

George, your answer makes the most sense. Thank you!  You said:

>>Most cloud servers come preconfigured with some variation of either
the "LAMP" or "LAPP" stack: i.e. Linux, Apache, MySQL or Postgresql,
PHP or Python.

>>There needs to be a compelling *application* reason to install
something else: a JVM (or Apache-Tomcat vs regular Apache),
Ruby/Rails, Groovy/Grails, etc.

>


On Tue, May 9, 2017 at 7:54 AM, George Neuner  wrote:

> On Mon, 8 May 2017 14:26:02 -0700, Paul Hughes 
> wrote:
>
> >Hello,
> >
> >I noticed that most of the largest web platforms that use PostgreSQL as
> >their primary database, also use Python as their primary back-end
> language.
> >Yet, according to every benchmark I could find over the last couple of
> >years, back-end languages like PHP, HHVM, and Node.JS outperform Python by
> >2x to 8x!
> >
> >So here are my questions:
> >
> >1) Why do the largest web applications that use PostgreSQL also use
> Python,
> >even though Python is significantly slower than it's biggest competitors?
>
> Most cloud servers come preconfigured with some variation of either
> the "LAMP" or "LAPP" stack: i.e. Linux, Apache, MySQL or Postgresql,
> PHP or Python.
>
> There needs to be a compelling *application* reason to install
> something else: a JVM (or Apache-Tomcat vs regular Apache),
> Ruby/Rails, Groovy/Grails, etc.
>
>
> >2) Can PostgreSQL just as easily be used with PHP or Node.js? If not, why
> >not?
>
> Absolutely.  The DBMS is (programming) language neutral - the only
> requirement is that a client speak the wire protocol.  That can be
> done natively, or via a library/driver.
>
>
> >3) Can PostgreSQL be made to work seamlessly to take advantage of the
> >superior performance of HHVM or Node.js?
>
> Not really sure what you're asking.  The application running under
> HHVM or node.js is completely separate from Postgresql.
>
> Both PHP and Javascript (generically and for node.js specifically)
> have libraries for Postgresql.
>
>
> George
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



-- 
Paul Hughes
Executive Director
Vivation International
1-800-514-8483
Skype: Vivation2
p...@vivation.com
http://www.vivation.com/


Re: [GENERAL] Python versus Other Languages using PostgreSQL

2017-05-08 Thread Paul Hughes
Thank you for the links. I'm glad there are other languages that are
working with PostgreSQL. My question still remains though - why is it that
all the largest web platforms that have used PostgreSQL *specifically*
choose Python as their back-end language? Why are Postgres and Python so
married, in the same way that Node.js is largely married to MongoDB?


On Mon, May 8, 2017 at 3:50 PM, Adrian Klaver 
wrote:

> On 05/08/2017 02:26 PM, Paul Hughes wrote:
>
>> Hello,
>>
>> I noticed that most of the largest web platforms that use PostgreSQL as
>> their primary database, also use Python as their primary back-end
>> language. Yet, according to every benchmark I could find over the last
>> couple of years, back-end languages like PHP, HHVM, and Node.JS
>> outperform Python by 2x to 8x!
>>
>
> Postgres does not really care what you use to pull data from it. There are
> many libraries across many languages that you can use.






[GENERAL] Python versus Other Languages using PostgreSQL

2017-05-08 Thread Paul Hughes
Hello,

I noticed that most of the largest web platforms that use PostgreSQL as
their primary database, also use Python as their primary back-end language.
Yet, according to every benchmark I could find over the last couple of
years, back-end languages like PHP, HHVM, and Node.JS outperform Python by
2x to 8x!

So here are my questions:

1) Why do the largest web applications that use PostgreSQL also use Python,
even though Python is significantly slower than its biggest competitors?

2) Can PostgreSQL just as easily be used with PHP or Node.js? If not, why
not?

3) Can PostgreSQL be made to work seamlessly to take advantage of the
superior performance of HHVM or Node.js?


Thank you in advance!

~Paul



[GENERAL] Partitioning and Table Inheritance

2017-05-08 Thread Paul A Jungwirth
I'm working on a problem where partitioning seems to be the right
approach, but we would need a lot of partitions (say 10k or 100k).
Everywhere I read that after ~100 child tables you experience
problems. I have a few questions about that:

1. Is it true that the only disadvantage to 10k children is query
planning time when hitting the parent table? Is there anything else to
worry about? (Unlike ext3, ext4 seems to have no problem with lots of
files in one directory (e.g. 1,000,000). It doesn't break and it even
performs fine.) I know a lot of tables increases memory use. There is
clutter too in \dt but I think I can use schemas to cut down on that.
Anything else?

2. Is there any reason I shouldn't bypass the query planning cost by
having the app just SELECT from the correct child table (which is easy
for us 99% of the time)? Similarly I could bypass trigger time by
INSERTing directly into the child, etc.

3. Is it true that query planning time should scale linearly as I add
more child tables? I'm seeing ~16ms planning time from EXPLAIN ANALYZE
with 1000 tables. I haven't tried 10k tables yet, but so far 16ms or
even 160ms seems tolerable if it's only for the 1% of queries that
can't SELECT directly from a known child table.

4. I tried a scheme where instead of one parent table with 1000 direct
children, I have 1 parent with 10 children, and they each have 10
children, and they each have 10 children. I thought by using
increasingly-specific constraints I could maybe make query planning
scale by O(log n) instead of O(n), but instead it went up! Upon
investigating it looks like maybe the planner flattens the inheritance
hierarchy before doing constraint_exclusion work. Is that true? If so,
is there any interest in changing this in future versions? (I might be
willing to take a stab at a patch for it.)
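In case it helps to picture it, the nested scheme looked roughly like this (a
simplified sketch; the real table names, columns, and ranges are different):

CREATE TABLE parent (id int);
CREATE TABLE child_0 (CHECK (id >= 0 AND id < 1000)) INHERITS (parent);
CREATE TABLE child_0_0 (CHECK (id >= 0 AND id < 100)) INHERITS (child_0);
CREATE TABLE child_0_0_0 (CHECK (id >= 0 AND id < 10)) INHERITS (child_0_0);
-- and so on: 10 children per level, each with a narrower constraint.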

Btw this is on 9.4 but it could be 9.5 or 9.6 if that would help any.

Thanks,
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are multiple array_aggs going to be in the same order?

2017-04-10 Thread Paul Jungwirth

On 04/10/2017 11:35 AM, Tom Lane wrote:

I'm not very keen on recommending that the OP insert an ORDER BY into
each aggregate call, because that would cause a separate sort for each
aggregate (unless someone's improved that recently while I wasn't
looking).


I mentioned this in my other email; upon a second look I was misreading 
the EXPLAIN output. (The sort was for the GROUP BY, not the individual 
ORDER BYs.) Do aggregate function sorts even show up in EXPLAIN? I can't 
seem to find any. For example, in this attempt the sorts are just for 
grouping and joining:


db=> explain select c.id, array_agg(e.item order by e.id), array_agg(e.payee order by e.id)
     from expense_categories c join expenses e on e.expense_category_id = c.id
     group by c.id;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 GroupAggregate  (cost=223.44..285.14 rows=16 width=30)
   ->  Merge Join  (cost=223.44..264.44 rows=2728 width=30)
         Merge Cond: (c.id = e.expense_category_id)
         ->  Sort  (cost=1.48..1.52 rows=16 width=4)
               Sort Key: c.id
               ->  Seq Scan on expense_categories c  (cost=0.00..1.16 rows=16 width=4)
         ->  Sort  (cost=221.96..228.78 rows=2728 width=30)
               Sort Key: e.expense_category_id
               ->  Seq Scan on expenses e  (cost=0.00..66.28 rows=2728 width=30)
(9 rows)

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are multiple array_aggs going to be in the same order?

2017-04-10 Thread Paul Jungwirth

On 04/10/2017 09:33 AM, Merlin Moncure wrote:

On Sun, Apr 9, 2017 at 4:27 PM, Guyren Howe  wrote:

>> If I do a SELECT with a GROUP_BY and multiple ARRAY_AGGs,
>> will the ARRAY_AGGs be guaranteed to have entries in the
>> same (ie corresponding) order?

> It is unsafe to rely on aggregation order unless specified --
> you can add ORDER BY to the aggregation clause.

Just thought I'd add that if you do this:

SELECT  u.name,
ARRAY_AGG(o.order_date ORDER BY o.id) AS order_dates,
ARRAY_AGG(o.order_total ORDER BY o.id) AS order_totals
...

Then you can check EXPLAIN and should see that Postgres is only doing 
one sort, not two, so there is no performance cost. (Of course for more 
complicated queries you might want to double check what EXPLAIN intends 
to do.)


But something I've wondered myself is how well Postgres handles sorting 
already-mostly-sorted lists? I tried diving into the code, starting from 
here:


https://doxygen.postgresql.org/tuplesort_8c_source.html

but I couldn't answer the question myself. The comments say that as long 
as the tuples all fit in work_mem, it uses qsort, but the code appears 
to call qsort_ssup or qsort_tuple, whose definitions I couldn't find. (I 
looks like they are from qsort_tuple.c which is generated by a Perl 
script.) I know that qsort(3) is not necessarily quicksort, despite the 
name. Does anyone know what algorithm Postgres uses? It seems like the 
mostly-already-sorted case would happen a lot, so I'm curious if 
Postgres pays a performance cost there?


Thanks,
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres Permissions Article

2017-03-30 Thread Paul Jungwirth

Also I don't understand why you wrote “You need the permission on both
tables”: Only the owner of a table can add constraints to it


Ah, this piece was really helpful for me in making it click. Thanks so 
much! I added a couple new paragraphs to my post with a link back to 
this thread. I feel like it all makes sense now! :-)


FYI "You need this permission on both tables" is what the docs say 
(https://www.postgresql.org/docs/9.6/static/sql-grant.html):


> To create a foreign key constraint, it is necessary to have this 
privilege on both the referencing and referenced columns.


Maybe it would be worth clarifying there that you need to *own* the 
referencing table, and you need REFERENCES on the referenced table?
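
So something like this, if I have it right (names made up):

-- as the owner of parent_table:
GRANT REFERENCES (id) ON parent_table TO alice;

-- as alice, who owns child_table:
ALTER TABLE child_table
  ADD CONSTRAINT child_parent_fk
  FOREIGN KEY (parent_id) REFERENCES parent_table (id);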


In any case, thanks again to you all for your help figuring this out!

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres Permissions Article

2017-03-29 Thread Paul Jungwirth

On 03/29/2017 06:36 AM, Tom Lane wrote:

Karsten Hilbert  writes:

Being able to create foreign keys may allow to indirectly
discover whether certain values exists in a table which I
don't otherwise have access to (by means of failure or
success to create a judiciously crafted FK).


Aside from that, an FK can easily be used to cause effective
denial-of-service, for example preventing rows from being deleted
within a table, or adding enormous overhead to such a deletion.


Thank you both for taking a look! I agree those are both worthwhile 
concerns. It still seems a little strange it is not just part of the 
CREATE permission (for example). I understand why not everyone can 
create a foreign key, I just have trouble imagining a use case where it 
is helpful to separate it from other DDL commands. Anyway, I didn't 
write the article to nitpick details like that, but sometimes by asking 
"why" you learn new things. I really appreciate your offering your thoughts!


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Postgres Permissions Article

2017-03-28 Thread Paul Jungwirth

Hi All,

I wrote a blog post about the Postgres permissions system, and I thought 
I'd share:


http://illuminatedcomputing.com/posts/2017/03/postgres-permissions/

The main point I wanted to convey, which I somehow never grasped 
confidently from reading the docs or other articles, is how permissions 
are purely additive. Also I wanted to gather in one place how to *see 
the current permissions*, which seems to be missing/incomplete in many 
other articles. Anyway, maybe it will be helpful for some of you! Or for 
the true experts, if you see any errors, I'd be happy to know so I can 
correct them.
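
The additive idea in a nutshell (made-up names):

CREATE ROLE reporting NOLOGIN;
GRANT SELECT ON accounts TO reporting;     -- adds a privilege
GRANT INSERT ON accounts TO reporting;     -- adds another; nothing is replaced
REVOKE INSERT ON accounts FROM reporting;  -- the only way to take one back

-- and one way to see what is currently granted:
SELECT grantee, privilege_type
FROM information_schema.role_table_grants
WHERE table_name = 'accounts';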


I also shared a few opinions amidst the facts (like that `USAGE` for 
schemas doesn't add much), so I am very pleased to have those 
challenged. You can consider them my own outstanding questions. I'd be 
especially grateful for any feedback there.


Yours,
Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY ... FROM stdin WITH FORMAT csv

2017-03-21 Thread Paul Jungwirth

On 03/21/2017 12:21 PM, David G. Johnston wrote:

>   words=> COPY words_reviews (uid, author, nice, review, updated) FROM
>   stdin FORMAT csv;

What did you read that lead you to think the above shoud work?


I don't know about COPY FROM, but COPY TO works without parens (or 
FORMAT), like this:


db=> copy (select 1+1, 2+2) to stdout with csv; 


2,4

(tested on pg 9.5)

I never use parens, so I was surprised to see in the docs and the 
replies here that they are necessary. Am I just exploiting a bug in the 
parser?


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] appropriate column for storing ipv4 address

2017-03-01 Thread Paul Jungwirth

On 03/01/2017 08:39 AM, jonathan vanasco wrote:


I have to store/search some IP data in Postgres 9.6 and am second-guessing my 
storage options.

Would anyone mind giving this a quick look for me?

Right now I have two tables, and am just using cidr for both:


Hi Jonathan,

CIDR seems like a better match to how people think about IPs, but 
another option would be to use a custom range type on inet. I wrote a 
blog post about that here, including how to use a GiST index to get fast 
searches:


http://illuminatedcomputing.com/posts/2016/06/inet-range/

Maybe it will be useful for you! Or maybe there is already some built-in 
way to treat cidr columns like ranges?
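
In case it helps, here is the rough shape of that approach (simplified from 
the post; the real version also worries about canonicalization):

CREATE FUNCTION inet_diff(inet, inet) RETURNS double precision
  LANGUAGE sql IMMUTABLE
  AS $$ SELECT ($1 - $2)::double precision $$;

CREATE TYPE inetrange AS RANGE (
  subtype = inet,
  subtype_diff = inet_diff
);

-- range types get GiST support automatically, so:
CREATE TABLE blocks (label text, ips inetrange);
CREATE INDEX blocks_ips_idx ON blocks USING gist (ips);

SELECT * FROM blocks WHERE ips @> inet '10.0.0.7';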


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] json aggregation question

2017-02-28 Thread Paul Jungwirth

On 02/28/2017 08:21 AM, Chris Withers wrote:

How can I aggregate the results of a query that equates to "show me the
number of matching rows and the set of tag1 values that have
a tag2 value of t2val1"?

...but I really want:

 count |           tag1
-------+--------------------------
     2 | ["val1", "val2", "val3"]
(1 row)


Seems like this does the trick?:

SELECT  COUNT(DISTINCT id),
json_agg(DISTINCT elem)
FROM(
  SELECT  id,
  jsonb_array_elements_text(tags->'tag1') AS elem
  FROMthing
  WHERE   tags->'tag2'?'t2val1'
) x;

You are looking to get always one result, right?

Yours,
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Running out of memory the hard way ...

2017-02-08 Thread Paul Ramsey
On Wed, Feb 8, 2017 at 7:44 AM, Tom Lane  wrote:

> Albe Laurenz  writes:
> > Bill Moran wrote:
> >> What I feel is the best way to mitigate the situation, is to have some
> >> setting that limits the maximum RAM any backend can consume.
>
> > I'd delegate that problem to the operating system which, after all,
> > should know best of all how much memory a process uses.
>
> I've had some success using ulimit in the past, although it does have
> the disadvantage that you have to impose the same limit on every PG
> process.  (You set it before starting the postmaster and it inherits
> to every child process.)  If memory serves, limiting with the -v switch
> works better than -d or -m on Linux; but I might be misremembering.
> Conceivably we could add code to let the ulimit be set per-process,
> if the use-case were strong enough.
>
> To implement a limit inside PG, we'd have to add expensive bookkeeping
> to the palloc/pfree mechanism, and even that would be no panacea because
> it would fail to account for memory allocated directly from malloc.
> Hence, you could be pretty certain that it would be wildly inaccurate
> for sessions using third-party code such as PostGIS or Python.  An
> OS-enforced limit definitely sounds better from here.
>
>
Confirming what Tom said, with respect to the specific example in this
thread, a large proportion of the allocations in memory hungry bits of
PostGIS are in fact using bare malloc via the GEOS library.
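
So if you do want a cap, the OS-level route is the one that will actually 
catch those allocations too. Roughly (the number here is made up; the limit 
is inherited by every backend):

ulimit -v 4194304              # cap each process at ~4GB of address space (kB)
pg_ctl -D /path/to/data start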

P


[GENERAL] Can we not give tyrannical pedants control of #postgresql?

2017-01-20 Thread Julian Paul

Hi all!
It's really a bad sign when some user is given operator status who is 
intolerant to minor offtopic conversations that span no more than a 
couple of line buffers. Witnessing a user getting kicked for asking for 
my location was way beyond reasonable, considering even the channel was 
rather idle. Not only that, conversations in the past have elevated to 
much more heated levels with rare instances of such drastic measures. 
And they've been great!
I've previously considered and advocated #postgresql as the best channel 
on freenode for years, I'm not sure I can hold this perception from the 
facts witnessed recently.
This particular user and a few others seem to be more inclined to 
bullying behaviour, quick with drastic overreactions rather than the stable 
and calm tolerance that was appreciated in the past (I haven't been 
perfect!).
I won't name names; however, I've rarely seen anyone invoke +o privileges in 
the past, and rarer still that such a user has shown an immature perspective 
on databases and intolerance of the perspectives of others who aren't 
regularly feeding the buffer.
I think a little bit of rational consideration of what kind of people 
would even join #postgresql is in order, rather than being so quick to 
kick due to minor infractions. That leaves kicks/bans for bots, 
offtopic spammers and severely abusive behaviour.


Freenode also has other options besides a +k (+q).

I think the operator status of such individuals should be revoked to be 
honest. However even in regards to database design, server instances and 
personal real world experience of individuals such as myself, it is 
unfortunate I won't be able to share that with new users on IRC.


I hope that particular stereotypes aren't proven here, but it appears 
#postgresql encourages a particular tier and makes one aware of its rigid 
hierarchy. I owe a lot to #postgresql but not to these particular users. 
I've perhaps been idle for too long and the channel has changed for the 
worse; well, that's not my fault. I leave it with the community to sort out.


All the best, Julian.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Improve PostGIS performance with 62 million rows?

2017-01-09 Thread Paul Ramsey
romText('SRID=4326;LINESTRING(-150.008056
> 61.179167,-156.77 71.285833)'),
> 600
> )::geometry
> ) as pt
> )
> SELECT elevation
> FROM data
> INNER JOIN (SELECT
> ST_MakeLine(ARRAY[a.geom, b.geom]) as short_line
> FROM pts a
> INNER JOIN pts b
> ON a.vert=b.vert-1 AND b.vert>1) segments
> ON  ST_DWithin(location, segments.short_line, 600)
> ORDER BY elevation DESC limit 1;
>
>
> Then you could remove the useless and (potentially explosive if you have
> large number of dump points) inner join on points :
> "FROM pts a
> INNER JOIN pts b "
>
> You could simply use a window function to generate the segments, like in
> here
> <https://github.com/Remi-C/_utilities/blob/master/postgis/rc_DumpSegments.sql#L51>
> .
> The idea is to dump points, order them by path, and then link each point
> with the previous one (function lag).
> Assuming you don't want to use the available function,
> this would be something like :
>
>
>
> WITH segments AS (
> SELECT ST_MakeLine( lag((pt).geom , 1, NULL) OVER (ORDER BY (pt).path)
>   ,(pt).geom) AS short_line
> FROM ST_DumpPoints(
>   ST_Segmentize(
> ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056
> 61.179167,-156.77 71.285833)'),
> 600
> )::geometry
> ) as pt
> )
> SELECT elevation
> FROM data ,segments
> WHERE segments.short_line IS NOT NULL --the first segment is null by
> design (lag function)
>   AND  ST_DWithin(location, segments.short_line, 600) = TRUE
> ORDER BY elevation DESC
> limit 1;
>
>
> I don't know if you can further improve this query after that,
> but I'll guess it would reduce your time and be more secure regarding
> scaling.
>
>
> if you want to further improve your result,
> you'll have to reduce the number of row in your index,
> that is partition your table into several tables !
>
> This is not easy to do with current postgres partitionning methods as far
> as I know
> (partitionning is easy, automatic efficient query is hard).
>
> Another way would be to reduce you requirement, and consider that in some
> case you may want less details in the altimetry, which would allow you to
> use a Level Of Detail approach.
>
> Congrats for the well explained query/problem anyway !
> Cheers,
> Rémi-C
>
>
>
> Ooooh, nice use of a window function - that change right there cut the
> execution time in half! I was able to shave off a few hundreds of a second
> more but tweaking the ST_Segmentize length parameter up to 5,000 (still
> have to play with that number some), so execution time is now down to the
> sub-300ms range. If I reduce the radius I am looking around the line, I
> can additionally improve the time to around 200 ms, but I'm not sure that
> will be an option. Regardless, 300ms is rather impressive, I think. Thanks!
>
>
> 2017-01-05 23:09 GMT+01:00 Paul Ramsey :
>
>> Varying the segment length upwards might have a salutary effect for a
>> while, as the efficiency improvement of fewer inner loops battles with the
>> inefficiency of having more points selected by the index filter. Worth an
>> experiment.
>>
>> P
>>
>> On Thu, Jan 5, 2017 at 1:00 PM, Israel Brewster 
>> wrote:
>>
>>>
>>> On Jan 5, 2017, at 10:38 AM, Paul Ramsey 
>>> wrote:
>>>
>>> Yes, you did. You want a query that spits out a tupleset of goemetries
>>> (one each for each wee segment), and then you can join that set to your
>>> main table using st_dwithin() as the join clause.
>>> So start by ditching the main table and just work on a query that
>>> generates a pile of wee segments.
>>>
>>>
>>> Ahhh, I see you've done this sort of thing before (
>>> http://blog.cleverelephant.ca/2015/02/breaking-linestring-i
>>> nto-segments.html) :-)
>>>
>>> So following that advice I came up with the following query:
>>>
>>> WITH dump AS (SELECT
>>> ST_DumpPoints(
>>> ST_Segmentize(
>>> ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056
>>> 61.179167,-156.77 71.285833)'),
>>> 600
>>> )::geometry
>>> ) as pt
>>> ),
>>> pts AS (
>>> SELECT (pt).geom, (pt).path[1] as vert FROM dump
>>> )
>>> SELECT elevation
>>> FROM data
>>> INNER JOIN (SELECT
>>> ST_MakeLine(ARRAY[a.geom, b.geom]) as short_line
>>> FROM pts a
>>> INNER

Re: [GENERAL] Improve PostGIS performance with 62 million rows?

2017-01-05 Thread Paul Ramsey
Varying the segment length upwards might have a salutary effect for a
while, as the efficiency improvement of fewer inner loops battles with the
inefficiency of having more points selected by the index filter. Worth an
experiment.

P

On Thu, Jan 5, 2017 at 1:00 PM, Israel Brewster 
wrote:

>
> On Jan 5, 2017, at 10:38 AM, Paul Ramsey 
> wrote:
>
> Yes, you did. You want a query that spits out a tupleset of goemetries
> (one each for each wee segment), and then you can join that set to your
> main table using st_dwithin() as the join clause.
> So start by ditching the main table and just work on a query that
> generates a pile of wee segments.
>
>
> Ahhh, I see you've done this sort of thing before (
> http://blog.cleverelephant.ca/2015/02/breaking-
> linestring-into-segments.html) :-)
>
> So following that advice I came up with the following query:
>
> WITH dump AS (SELECT
> ST_DumpPoints(
> ST_Segmentize(
> ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056
> 61.179167,-156.77 71.285833)'),
> 600
> )::geometry
> ) as pt
> ),
> pts AS (
> SELECT (pt).geom, (pt).path[1] as vert FROM dump
> )
> SELECT elevation
> FROM data
> INNER JOIN (SELECT
> ST_MakeLine(ARRAY[a.geom, b.geom]) as short_line
> FROM pts a
> INNER JOIN pts b
> ON a.vert=b.vert-1 AND b.vert>1) segments
> ON  ST_DWithin(location, segments.short_line, 600)
> ORDER BY elevation DESC limit 1;
>
> Which yields the following EXPLAIN ANALYZE (https://explain.depesz.com/s/
> RsTD <https://explain.depesz.com/s/ukwc>):
>
>
>QUERY PLAN
>
>
> 
> 
> 
> 
>  Limit  (cost=11611706.90..11611706.91 rows=1 width=4) (actual
> time=1171.814..1171.814 rows=1 loops=1)
>CTE dump
>  ->  Result  (cost=0.00..5.25 rows=1000 width=32) (actual
> time=0.024..1.989 rows=1939 loops=1)
>CTE pts
>  ->  CTE Scan on dump  (cost=0.00..20.00 rows=1000 width=36) (actual
> time=0.032..4.071 rows=1939 loops=1)
>->  Sort  (cost=11611681.65..11611768.65 rows=34800 width=4) (actual
> time=1171.813..1171.813 rows=1 loops=1)
>  Sort Key: data.elevation DESC
>  Sort Method: top-N heapsort  Memory: 25kB
>  ->  Nested Loop  (cost=0.55..11611507.65 rows=34800 width=4)
> (actual time=0.590..1167.615 rows=28408 loops=1)
>->  Nested Loop  (cost=0.00..8357.50 rows=1665 width=64)
> (actual time=0.046..663.475 rows=1938 loops=1)
>  Join Filter: (a.vert = (b.vert - 1))
>  Rows Removed by Join Filter: 3755844
>  ->  CTE Scan on pts b  (cost=0.00..22.50 rows=333
> width=36) (actual time=0.042..0.433 rows=1938 loops=1)
>Filter: (vert > 1)
>Rows Removed by Filter: 1
>  ->  CTE Scan on pts a  (cost=0.00..20.00 rows=1000
> width=36) (actual time=0.000..0.149 rows=1939 loops=1938)
>->  Index Scan using location_gix on
> data  (cost=0.55..6968.85 rows=1 width=36) (actual time=0.085..0.256
> rows=15 loops=1938)
>  Index Cond: (location &&
> _st_expand((st_makeline(ARRAY[a.geom, b.geom]))::geography, '600'::double
> precision))
>  Filter: (((st_makeline(ARRAY[a.geom,
> b.geom]))::geography && _st_expand(location, '600'::double precision)) AND
> _st_dwithin(location, (st_makeline(ARRAY[a.geom,
> b.geom]))::geography, '600'::double precision, true))
>  Rows Removed by Filter: 7
>  Planning time: 4.318 ms
>  Execution time: 1171.994 ms
> (22 rows)
>
> So not bad. Went from 20+ seconds to a little over 1 second. Still
> noticeable for a end user, but defiantly usable - and like mentioned,
> that's a worst-case scenario query. Thanks!
>
> Of course, if you have any suggestions for further improvement, I'm all
> ears :-)
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
>
>
> On Thu, Jan 5, 2017 at 11:36 AM, Israel Brewster 
> wrote:
>
>> On Jan 5, 2017, at 8:50 AM, Paul Ramsey 
>> wrote:
>>
>>
>> The index filters using bounding boxes.  A long, diagonal route will have
>> a 

Re: [GENERAL] Improve PostGIS performance with 62 million rows?

2017-01-05 Thread Paul Ramsey
Yes, you did. You want a query that spits out a tupleset of goemetries (one
each for each wee segment), and then you can join that set to your main
table using st_dwithin() as the join clause.
So start by ditching the main table and just work on a query that generates
a pile of wee segments.

On Thu, Jan 5, 2017 at 11:36 AM, Israel Brewster 
wrote:

> On Jan 5, 2017, at 8:50 AM, Paul Ramsey  wrote:
>
>
> The index filters using bounding boxes.  A long, diagonal route will have
> a large bounding box, relative to the area you actually care about (within
> a narrow strip of the route). Use ST_Segmentize() to add points to your
> route, ST_DumpPoints() to dump those out as point and ST_MakeLine to
> generate new lines from those points, each line very short. The maximum
> index effectiveness will come when your line length is close to your buffer
> width.
>
> P
>
>
> Ok, I think I understand the concept. So attempting to follow your advice,
> I modified the query to be:
>
> SELECT elevation
> FROM data
> WHERE
> ST_DWithin(
> location,
> (SELECT ST_MakeLine(geom)::geography as split_line
>  FROM (SELECT
> (ST_DumpPoints(
> ST_Segmentize(
> ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056
> 61.179167,-156.77 71.285833)'),
> 600
> )::geometry
> )).geom
> ) s1),
> 600
> )
> ORDER BY elevation DESC limit 1;
>
> It took some fiddling to find a syntax that Postgresql would accept, but
> eventually that's what I came up with. Unfortunately, far from improving
> performance, it killed it - in running the query, it went from 22 seconds
> to several minutes (EXPLAIn ANALYZE has yet to return a result). Looking at
> the query execution plan shows, at least partially, why:
>
>   QUERY PLAN
>
> 
> --
>  Limit  (cost=17119748.98..17119748.98 rows=1 width=4)
>InitPlan 1 (returns $0)
>  ->  Aggregate  (cost=17.76..17.77 rows=1 width=32)
>->  Result  (cost=0.00..5.25 rows=1000 width=32)
>->  Sort  (cost=17119731.21..17171983.43 rows=20900890 width=4)
>  Sort Key: data.elevation DESC
>  ->  Seq Scan on data  (cost=0.00..17015226.76 rows=20900890
> width=4)
>Filter: st_dwithin(location, $0, '600'::double precision)
> (8 rows)
>
> So apparently it is now doing a sequential scan on data rather than using
> the index. And, of course, sorting 20 million rows is not trivial either.
> Did I do something wrong with forming the query?
>
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
>
>
> On Thu, Jan 5, 2017 at 9:45 AM, Israel Brewster 
> wrote:
>
>> I have a database (PostgreSQL 9.6.1) containing 62,702,675 rows of
>> latitude (numeric), longitude(numeric), elevation(integer) data, along with
>> a PostGIS (2.3.0) geometry column (location), running on a CentOS 6.8 box
>> with 64GB RAM and a RAID10 SSD data drive. I'm trying to get the maximum
>> elevation along a path, for which purpose I've come up with the following
>> query (for one particular path example):
>>
>> SELECT elevation FROM data
>>
>>
>>
>>
>> WHERE ST_DWithin(location, 
>> ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056
>> 61.179167,-156.77 71.285833)'), 600)
>>
>>
>>
>>   ORDER BY elevation LIMIT 1;
>>
>> The EXPLAIN ANALYZE output of this particular query (
>> https://explain.depesz.com/s/heZ) shows:
>>
>>
>>
>> QUERY PLAN
>>
>>
>> 
>> 
>> 
>> 
>> 
>> --
>>  Limit  (cost=4.83..4.83 rows=1 width=4) (actual
>> time=22653.840..22653.842 rows=1 loops=1)
>>->  Sort  (cost=4.83..4.83 rows=1 width=4) (actual
>> time=22653.837..22653.837 rows=1 loops=1)
>>  Sort Key: elevation DESC
>>  Sort Method: top-N heapsort  Memory: 25kB
>>  ->  Index Scan using location_gix on data  (cost=0.42..4.82
>>

Re: [GENERAL] Improve PostGIS performance with 62 million rows?

2017-01-05 Thread Paul Ramsey
The index filters using bounding boxes.  A long, diagonal route will have a
large bounding box, relative to the area you actually care about (within a
narrow strip of the route). Use ST_Segmentize() to add points to your
route, ST_DumpPoints() to dump those out as point and ST_MakeLine to
generate new lines from those points, each line very short. The maximum
index effectiveness will come when your line length is close to your buffer
width.

P

On Thu, Jan 5, 2017 at 9:45 AM, Israel Brewster 
wrote:

> I have a database (PostgreSQL 9.6.1) containing 62,702,675 rows of
> latitude (numeric), longitude(numeric), elevation(integer) data, along with
> a PostGIS (2.3.0) geometry column (location), running on a CentOS 6.8 box
> with 64GB RAM and a RAID10 SSD data drive. I'm trying to get the maximum
> elevation along a path, for which purpose I've come up with the following
> query (for one particular path example):
>
> SELECT elevation FROM data
>
>
>
>
> WHERE ST_DWithin(location, 
> ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056
> 61.179167,-156.77 71.285833)'), 600)
>
>
>
>   ORDER BY elevation LIMIT 1;
>
> The EXPLAIN ANALYZE output of this particular query (
> https://explain.depesz.com/s/heZ) shows:
>
>
>
> QUERY PLAN
>
>
> 
> 
> 
> 
> 
> --
>  Limit  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.840..22653.842
> rows=1 loops=1)
>->  Sort  (cost=4.83..4.83 rows=1 width=4) (actual
> time=22653.837..22653.837 rows=1 loops=1)
>  Sort Key: elevation DESC
>  Sort Method: top-N heapsort  Memory: 25kB
>  ->  Index Scan using location_gix on data  (cost=0.42..4.82
> rows=1 width=4) (actual time=15.786..22652.041 rows=11081 loops=1)
>Index Cond: (location && '010220E61002002C11
> A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography)
>Filter: (('010220E61002002C11
> A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography
> && _st_expand(location, '600'::double precision)) AND
> _st_dwithin(location, '010220E61002002C11
> A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography,
> '600'::double precision, true))
>Rows Removed by Filter: 4934534
>  Planning time: 0.741 ms
>  Execution time: 22653.906 ms
> (10 rows)
>
> So it is using the index properly, but still takes a good 22 seconds to
> run, most of which appears to be in the Index Scan.
>
> Is there any way to improve this, or is this going to be about as good as
> it gets with the number of rows being dealt with? I was planning to use
> this for a real-time display - punch in a couple of points, get some
> information about the route between, including maximum elevation - but with
> it taking 22 seconds for the longer routes at least, that doesn't make for
> the best user experience.
>
> It's perhaps worth noting that the example above is most likely a worst
> case scenario. I would expect the vast majority of routes to be
> significantly shorter, and I want to say the shorter routes query much
> faster [testing needed]. That said, the faster the better, even for short
> routes :-)
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
>
>
>
>
>
>


Re: [GENERAL] Index impact on update?

2017-01-04 Thread Paul Ramsey
You'd be better off forcing the table to write in bulk with something like

CREATE TABLE mynewtable AS
SELECT *, geography(ST_SetSRID(ST_MakePoint(lng, lat), 4326)) AS geog
FROM myoldtable;

Then index the new table, rename, etc. Bulk update will, in addition to
being slow, use 2x the amount of space on disk, as all the old tuples are
left behind from the update until you cluster or vacuum full the table.

P



On Wed, Jan 4, 2017 at 8:59 AM, Israel Brewster 
wrote:

> Short version:
> Do indexes impact the speed of an UPDATE, even when the indexed columns
> aren't changing?
>
> Details:
> I have a table containing geographical data (Latitude, longitude, and
> elevation) with 406,833,705 records. The Latitude and Longitude columns are
> indexed. In order to better utilize the data, I've been looking into
> PostGIS, and decided I wanted to add a "Location" column with PostGIS type
> "GEOGRAPHY(point)". I then tried to populate it from the existing
> latitude/longitude data using the following query:
>
> UPDATE data SET location=ST_GeographyFromText(
> 'SRID=4326;POINT('||lng::text||' '||lat::text||')');
>
> I expected this update to take quite a while, since it has 406 million
> rows to update, but at this point it's been over 19 hours since I started
> the query, and it still hasn't completed.
>
> I'm wondering if the presence of the indexes could be slowing things down
> even though the indexed columns aren't being updated? Would I be better off
> canceling the update query, dropping the indexes, and trying again? Or is
> more likely that the update query is "almost" done, and it would be better
> to just let it run it's course? Or is there an even better option, such as
> perhaps exporting the data, adding the additional column in a text editor,
> and re-importing the data with a COPY command?
>
> Thanks for any feedback/advice you can offer!
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
>
>
>
>
>
>


Re: [GENERAL] Extensions and privileges in public schema

2016-12-04 Thread Paul Ramsey
When you create the student user, remove their create privs in public.
Then create a scratch schema and grant them privs there.
Finally, alter the student user so that the scratch schema appears FIRST in
their search path. This will cause unqualified CREATE statements to create
in the scratch schema.
For full separation, give each student their own login and set the search
path to

"$user", public

That way each student gets their own private scratch area, and it is used
by default for their creates.
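
Roughly, with example role and schema names:

REVOKE CREATE ON SCHEMA public FROM PUBLIC;  -- no more accidental tables in public
CREATE SCHEMA scratch;
GRANT USAGE, CREATE ON SCHEMA scratch TO student;
ALTER ROLE student SET search_path = scratch, public;

-- or, for the per-student variant:
ALTER ROLE student SET search_path = "$user", public;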

P


On Sun, Dec 4, 2016 at 1:10 PM, Lee Hachadoorian <
lee.hachadooria...@gmail.com> wrote:

> This question is specifically motivated by my use of the PostGIS
> extension, but since other extensions create functions and other supporting
> objects in public schema, I believe it is more general.
>
> I'm teaching a university-level class using PostGIS. I have created a
> scratch schema for students to create objects in. At the end of the term I
> can drop scratch and start fresh the following term.
>
> Students of course can also create objects in public schema, and often do
> unintentionally because the forget to schema qualify their CREATE TABLE
> statements. This complicates things because I can't drop public schema
> without dropping various PostGIS (and other) tables and functions.
> Additionally, while I doubt the students would do something like drop a
> public function or supporting table (like spatial_ref_sys), it nonetheless
> seems like a poor idea for these database objects to be vulnerable.
>
> What is considered best practices in this case? Should PostGIS extension
> be kept in its own schema (as was suggested when I asked about this on
> GIS.SE)? If I do so, can I treat public schema the way I have been using
> scratch schema, i.e. could I drop and recreate clean public schema at end
> of term? Should I leave extensions in public but limit rights of public
> role in that schema (so that they don't unintentionally create tables
> there, or accidentally delete other objects)? Or do Postgres DBA's just not
> worry about the objects in public schema, and rely upon applications and
> login roles to interact with the database intelligently?
>
> To be clear, primary goal is to keep student created objects in one schema
> which can be dropped at the end of the term. But the question of preventing
> accidental creation/deletion of objects in public schema is possibly
> related, and the overall database organization might address both concerns.
>
> Best,
> --Lee
>
>
> --
> Lee Hachadoorian
> Assistant Professor of Instruction, Geography and Urban Studies
> Assistant Director, Professional Science Master's in GIS
> Temple University
>


[GENERAL] Avoiding double-counting in aggregates with more than one join?

2016-11-18 Thread Paul Jungwirth

Hi All,

I've noticed in the past that doing aggregates while joining to more 
than one table can sometimes give you unintended results. For example, 
suppose I have three tables: products, sales, and resupplies. In sales I 
track what I sell, and in resupplies I track my own purchases to 
increase inventory. Both have a foreign key to products. Now I want to 
run a report showing the total dollars sold for each product versus the 
total dollars spent for each product. I could try this:


SELECT  p.id,
SUM(s.price * s.qty) AS total_sold,
SUM(r.price * r.qty) AS total_spent
FROMproducts p
LEFT OUTER JOIN sales s
ON  s.product_id = p.id
LEFT OUTER JOIN resupplies r
ON  r.product_id = p.id
GROUP BY p.id
;

That seems pretty safe, but actually I get bad answers,
for example if some product has this data:

sales
-
sold 1 @ $2/ea

resupplies
--
bought 1 @ $1/ea
bought 2 @ $1/ea

Then pre-grouping I have this:

p.id | s.qty | s.price | r.qty | r.price
-----+-------+---------+-------+---------
   1 |     1 |      $2 |     1 |      $1
   1 |     1 |      $2 |     2 |      $1

You can see the problem is that I'm going to double-count my sales.
What I really want is this:

p.id | s.qty | s.price | r.qty | r.price
-----+-------+---------+-------+---------
   1 |     1 |      $2 |     1 |      $1
   1 |       |         |     2 |      $1

In the past I've always handled these situations by aggregating each 
table separately

and only then joining things together:

WITH
s AS (
  SELECT  product_id,
  SUM(price * qty) AS total_sold
  FROMsales
  GROUP BY product_id
),
r AS (
  SELECT  product_id,
  SUM(price * qty) AS total_spent
  FROMresupplies
  GROUP BY product_id
)
SELECT  p.id,
COALESCE(s.total_sold, 0),
COALESCE(r.total_spent, 0)
FROMproducts p
LEFT OUTER JOIN s
ON  s.product_id = p.id
LEFT OUTER JOIN r
ON  r.product_id = p.id
;

Since I've guaranteed that each CTE includes at most one row per product,
this is safe from double-counting errors.

But is there a better way? My approach feels verbose
and harder to read. Also you have to type COALESCE a lot. :-)
Is there some different way of doing things I haven't thought of yet?

Also I wonder about the performance merging all these subqueries together.
Would the final merging be any faster if I had an ORDER BY in each CTE?

It seems like this pattern comes up a lot;
what have others done about it?

Thanks,
Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Multi tenancy : schema vs databases

2016-09-30 Thread Paul Jungwirth

On Fri, Sep 30, 2016 at 6:06 AM, Rakesh Kumar
 wrote:

A typical fear mongering Q from
them "what if due to a bug in your s/w, our competitors end up looking at our 
data" or
something like that. That's why schema level vs db level discussion.


I've been reading this discussion with great interest, to see what other 
Postgres experts think. :-)


I've almost always taken the customer_id approach, and I prefer it---but 
I also agree it is easier to make mistakes, because you have to include 
that condition in your code everywhere. With per-schema or per-database, 
you can manage access simply by handing out connections.


If isolation is your goal, one drawback with one-database-many-schemas 
is that there is no way to prevent users from listing all the schemas in 
the database. In psql this is `\dn`, but you can also do it with SQL 
against the system catalog. You can forbid querying the tables in the 
schema, but anyone can see that the schema itself is there. So that 
would leak some information---at the very least the number of customers 
you have. You haven't said whether users will go through an application 
tier or have direct SQL access, but if it's the latter, this is 
something to be aware of. I believe it is possible to prevent, but only 
by taking away access from important catalog tables that would also 
break `\dt` or `\d foo`. (I would love to be corrected btw!) Also you 
can't use RLS against the system catalog, so there's no solution there.
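
For example, any role can run something like:

SELECT nspname
FROM pg_catalog.pg_namespace
WHERE nspname !~ '^pg_' AND nspname <> 'information_schema';

and get the full list back, even without USAGE on any of those schemas.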


Good luck!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Time travel?

2016-09-29 Thread Paul Jungwirth

Hi Melvin:

On 09/29/2016 12:06 PM, Melvin Davidson wrote:

I list the creation time for a WAL file and it shows:

 /home/mdavidson/dba$ ls -l --time=ctime
/d-log/pg_xlog/0001000D00C9
-rw--- 1 postgres postgres 16777216 Sep 29 07:14
/d-log/pg_xlog/0001000D00C9


ctime stands for "changed" not "created". It is not quite the same as 
mtime ("modified") because mtime is updated only when the file contents 
change, but ctime is updated if either the file contents or file 
attributes change:


http://www.linux-faqs.info/general/difference-between-mtime-ctime-and-atime
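
You can see all three timestamps side by side with stat:

stat /d-log/pg_xlog/0001000D00C9
# Access = atime, Modify = mtime (contents changed),
# Change = ctime (contents *or* metadata changed, e.g. chmod/chown/rename)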

I hope that explains it!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Stats update difference between VACUUM ANALYZE and ANALYZE in 9.2?

2016-09-26 Thread Paul Jones
For a freshly pg_restore'd 9.2 database, would VACUUM ANALYZE update
statistics any better than just an ANALYZE?

After a restore, we ran a bunch of ANALYZEs on each table individually
using GNU 'parallel' (for speed).  Many of these tables are child tables
in a partition.  Following the ANALYZEs, a join with the parent table
showed all of the child tables scanned sequentially.

After running VACUUM ANALYZE on the whole database, the same join used
index-only scans on the child tables.

An examination of the fine manual implies there may be some difference
(or a documentation conflict?) between running ANALYZE manually on
individual tables and an unqualified ANALYZE on the whole database.

5.9.6:
"If you are using manual VACUUM or ANALYZE commands, don't forget
that you need to run them on each partition individually. A
command like:
ANALYZE measurement;
will only process the master table."

ANALYZE:
"If the table being analyzed has one or more children, ANALYZE
will gather statistics twice: once on the rows of the parent table
only, and a second time on the rows of the parent table with all
of its children. This second set of statistics is needed when
planning queries that traverse the entire inheritance tree. The
autovacuum daemon, however, will only consider inserts or updates
on the parent table itself when deciding whether to trigger
an automatic analyze for that table. If that table is rarely
inserted into or updated, the inheritance statistics will not
be up to date unless you run ANALYZE manually."

Can anyone explain what's going on here?
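
For what it's worth, my reading of the ANALYZE paragraph above is that the 
sequence would need to be something like (child names made up):

ANALYZE measurement_y2016m01;  -- per-child stats (what the parallel job did)
ANALYZE measurement_y2016m02;
ANALYZE measurement;           -- parent: also gathers the inheritance statistics

i.e. is the difference simply that the whole-database VACUUM ANALYZE hit the 
parent tables, which our per-table job skipped?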

Thanks,
Paul Jones


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] journaling / time travel

2016-09-19 Thread Paul Jungwirth

> On 09/19/2016 10:56 AM, Willy-Bas Loos wrote:
> > On Mon, Sep 19, 2016 at 6:26 PM, Paul Jungwirth
> >  wrote:

> > I've worked on similar
> > projects that maintain history for regulatory reasons.
> Can you explain "regulatory reasons" please?

I mean government regulations, like HIPAA and Sarbanes-Oxley. I'm in the 
U.S. Since you are in the Netherlands you'll have different rules to 
deal with.


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] journaling / time travel

2016-09-19 Thread Paul Jungwirth

On 09/19/2016 08:48 AM, Willy-Bas Loos wrote:

Since records can
be changed afterwards, it has been argued that we should have
"journaling", meaning that every change to the data is saved in a
separate schema that holds a "journaling" copy of each table


I don't think this is especially unusual. I've worked on similar 
projects that maintain history for regulatory reasons. I can't speak to 
your legal situation, but I don't think keeping history is a strange choice.


Re the security argument: it seems to me it depends on whether you can 
restrict access to the journal while providing access to the normal 
tables. I guess journaling would help for some threats but not others, 
right? If regular users can't touch the journal, then I agree keeping 
one could make sense.


Are you saying the journaling portion is 624GB on top of 1.1TB? Or that 
of the 1.1TB, 624GB of it is from journaling? Either way it doesn't seem 
like a massive cost to me.


I don't think PITR is an appropriate solution to keeping a 10-year 
history of changes.


It sounds like you're not looking for a green-field solution, but just 
trying to get perspective on what others are doing. Some resources for 
this that might help you:


https://www.youtube.com/watch?v=TRgni5q0YM8
https://github.com/arkhipov/temporal_tables (aka 
http://pgxn.org/dist/temporal_tables/)

http://www.cs.arizona.edu/~rts/tdbbook.pdf (also available in print)
https://www.amazon.com/Bitemporal-Data-Practice-Tom-Johnston/dp/0124080677

The two books contain multiple approaches to storing history, each with 
tradeoffs for ease-of-use, disk space, etc. Reading them might be 
overkill for you. If you want to choose one, I'd read Snodgrass. The 
first few chapters are tedious, but it gets better.


Temporal databases are in a funny in-between zone where there is lots of 
research, but standard tools are fairly underdeveloped. Postgres 
recently added range types and exclusion constraints, which are 
important primitives for building a temporal system, but it is still a 
long way from SQL:2011, and SQL:2011 is itself a long way from 
everything you might want. One thing that seems lacking to me, even in 
the research, is how to handle DDL changes. You should be glad that you 
only care about audit history and not subject history too, because going 
bi-temporal is where you really cross over into lack of available tools, 
outside of a few commercial offerings. (Teradata has temporal support, 
using a Snodgrass-like approach that pre-dates the standard.)


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] GIN Indexes: Extensibility

2016-07-27 Thread Paul Jungwirth

On 07/27/2016 07:44 AM, Vick Khera wrote:

On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananich  wrote:

In my situation this order is invalid. Obviously, year 2016 should go after
2014, like that:


I think you expect JSONB to sort differently than it does. I cannot
imagine what a "natural" ordering of arbitrary JSON objects is.


FWIW, Postgres arrays do sort in the way he's expecting:

paul=# create table t (id integer, v integer[]);
CREATE TABLE
paul=# insert into t values (1, array[2014]), (2, array[2014, 1]), (3, 
array[2016]);

INSERT 0 3
paul=# select * from t order by v;
 id |v
+--
  1 | {2014}
  2 | {2014,1}
  3 | {2016}
(3 rows)

So maybe convert to an array before sorting?

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] migrating data from an old postgres version

2016-07-15 Thread Paul Ramsey

> On Jul 15, 2016, at 6:55 AM, Melvin Davidson  wrote:
> 
> 
> 
> On Fri, Jul 15, 2016 at 9:46 AM, Willy-Bas Loos  > wrote:
> Hi,
> 
> A coworker is getting a new laptop and he wants to migrate some data from his 
> old one to the new one. So he installed postgres 9.5 on the new one and is 
> asking me how to migrate the data from the old 8.4 database. This database 
> includes postgis.
> 
> I asked him to make a network connection so that he can make a dump from the 
> 9.5 machine, but that seems to be tricky for him.
> He is inserting a harddrive in his new laptop which has the data directory on 
> it.
> 
> So i went to look for a 8.4 installer but it has been EOL for a while now, so 
> i found no official source for that. But i found a 8.4.12 installer on 
> filehorse.com 
> 
> Now, maybe this is overcautious, but being a linux man myself, i dislike 
> installing unknown software. So is there an md5 signature for that installer 
> around?
> Or maybe a more official source for a 8.4 installer?
> 
> Cheers,


Make sure to pg_dump -Fc
(note the flags)
so that in case you need to do a hard restore (if you’re going from postgis 1.x 
to 2.x) you can run the dump file through the special cleaning script.

See postgis documentation notes on “hard upgrade” for full info on doing a 
clean 1->2 upgrade. 

If he’s already blown away his old environment then yeah, you’re going to have 
to exactly duplicate it first so you can run 8.4 *and* an old version of 
postgis (have to match major version numbers) and get a clean dump file out of 
it.

P.

[GENERAL] RHEL 7

2016-07-08 Thread Paul Tilles - NOAA Federal
I am currently running Version 9.3.10 of postgres with RHEL 6.  I am going
to upgrade my O/S soon to RHEL 7. Do I need to upgrade to version 9.4.x of
postgres?

Paul Tilles


Re: [GENERAL] Update statement results in Out of memory

2016-07-06 Thread Paul Ramsey
Running a multi-million row update will take a long time.
It's possible you've exposed a memory leak in ST_Buffer (the older
your version of GEOS, the more likely that is) but it's also possible
you're just running a really long update.
I find for batch processing purposes that creating fresh tables is far
preferable:

CREATE TABLE newtable AS SELECT ST_Buffer(geom) ... FROM oldtable;

If you still see memory issues with the above then you probably do
have a leak, *or* you're just running buffer on a sufficiently large
input geometry or with a large enough radius to blow up the memory
naturally.

P


On Wed, Jul 6, 2016 at 2:13 AM, Ivan Bianchi  wrote:
> Hello,
>
> I am trying to update a column using a PostGIS ST_Buffer function into a
> table of 4.257.769 rows, but after 6 hours, an Out of memory error appears
> and the kernel starts killing processes until a Kernel Panic shows up.
>
> I have simplified the buffer target geometry and also added a gist index to
> that column.
>
> The statement is the following:
>>
>> psql -h host -U user -W -d database -c "UPDATE table SET buffer =
>> ST_Buffer(simplified_geometry, 0.005);"
>
>
> After reading and tunning the configuration, I still have the same result.
>
> Here's the initial memory stats:
>
>>                total        used        free      shared  buff/cache   available
>> Mem:             15G        1.5G         12G        503M        1.4G         13G
>> Swap:           7.8G          0B        7.8G
>
>
>
> I'm running out of ideas, as I think the postgresql.conf memory parameters
> are quite low for the machine specs. I understand I can split the process
> and paginate the rows, but I can't see why I can't deal with this full
> statement right now.
>
> Do you think this issue is related with the postgres memory parameters
> configuration? Why is not respecting the shared_buffers or
> effective_cache_size parameters and keeps growing?
>
>
> Here's some info:
>
> Machine specs
>
> Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (8 cores)
> 16 GB of memory
> Fedora release 23 (Twenty Three)
> Kernel - 4.5.7-202.fc23.x86_64
>
> postgresql.conf
>
> effective_cache_size = 5GB
> shared_buffers = 3GB
> work_mem = 10MB
>
> maintenance_work_mem = 800MB
> wal_buffers = 16MB
>
> Kernel parameters
>
> vm.overcommit_memory=2
>
> kernel.shmmax = 8340893696
> kernel.shmall = 2036351
>
> Versions:
>
> PostgreSQL 9.5.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 5.3.1
> 20160406 (Red Hat 5.3.1-6), 64-bit
> POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.1, 04
> March 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.9.3"
> LIBJSON="0.12" RASTER
>
>
> Many thanks,
>
> --
> Ivan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Possible to create canonicalized range type without being superuser?

2016-07-05 Thread Paul A Jungwirth
On Tue, Jul 5, 2016 at 10:17 PM, Paul A Jungwirth
 wrote:
> db=> create type inetrange;

Here is a follow-up question for creating inet ranges. Is there any
way to prevent someone from doing this?:

db=> select inetrange('1.2.3.4',
'2001:0db8:0000:0042:0000:8a2e:0370:7334', '[]');
inetrange
-
 [1.2.3.4,2001:db8:0:42:0:8a2e:370:7334]

Thanks!
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Possible to create canonicalized range type without being superuser?

2016-07-05 Thread Paul A Jungwirth
On Tue, Jul 5, 2016 at 3:28 PM, David G. Johnston
 wrote:
> On Tue, Jul 5, 2016 at 5:37 PM, Tom Lane  wrote:
>> Paul Jungwirth  writes:
>> > The problem is this (tried on 9.3 and 9.5):
>>
>> The only other obvious way to deal with this is to allow the canonical
>> function to be defined after the range type is created, and then added to
>> the type via an ALTER TYPE command.
>
> Can the canonical function be definitionally optional but runtime required?

Thanks Tom and David for your help! I actually tried to find something
in ALTER TYPE that would do just that, but of course it's not there.
It seems like an appealing solution though.

With Tom's warning that a shell type would require writing the
function in C, I figured I might as well just package it all up as an
extension, so here it is:

https://github.com/pjungwir/inetrange

Of course it still requires superuser privileges, but as least the C
part is just `make && sudo make install`.

Thanks again!
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Possible to create canonicalized range type without being superuser?

2016-07-05 Thread Paul Jungwirth

Hello,

I'm trying to create a custom inetrange type. This works:

CREATE TYPE inetrange AS RANGE (
  subtype = inet,
  subtype_diff = inet_diff
);

But since inet is discrete, not continuous, I'd like to define a 
canonical function too:


CREATE TYPE inetrange AS RANGE (
  subtype = inet,
  subtype_diff = inet_diff,
  canonical = inetrange_canonical
);

But of course I can't define the function until I have the type, and I 
can't define the type until I have the function.


Now the docs address this 
(https://www.postgresql.org/docs/9.5/static/sql-createtype.html), and 
recommend:


> Creating a canonical function is a bit tricky, since it must be 
defined before the range type can be declared. To do this, you must 
first create a shell type, which is a placeholder type that has no 
properties except a name and an owner. This is done by issuing the 
command CREATE TYPE name, with no additional parameters. Then the 
function can be declared using the shell type as argument and result, 
and finally the range type can be declared using the same name. This 
automatically replaces the shell type entry with a valid range type.


The problem is this (tried on 9.3 and 9.5):

db=> create type inetrange;
ERROR:  must be superuser to create a base type

So I'm wondering whether there is any way around this circle without 
being a superuser?


Thanks!
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_dump fundenental question

2016-07-05 Thread Paul Linehan
> a good point, but I would prefer NOT to open a 324GB backup file in a text
> editor. I can however cat/less/head/tail the file in Linux.

Use vi (or flavour thereof) - it doesn't load the entire file in order to
read the contents of lines 1 - 100 (say).


Paul...


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID and Enum columns in exclusion constraints

2016-06-17 Thread Paul Jungwirth

Hi Adam,

On 06/17/2016 03:01 PM, Adam Brusselback wrote:

Just wondering what others have done for using enum or uuid columns in
exclusion constraints?
[snip]
And as a closing note on this, I really can't wait until these are
supported types for gist indexes.


Here is some work I did to add support for uuid types, with help from 
Ildus Kurbangaliev and Teodor Sigaev:


https://commitfest.postgresql.org/10/332/

I think it was basically done, but the last patch was never reviewed. 
Here is the mailing list thread:


http://postgresql.nabble.com/Review-GiST-support-for-UUIDs-td5865845.html

I would love to rebase that to the current code and re-submit. Maybe 
this weekend. :-)


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgresSQL and HIPAA compliance

2016-06-17 Thread Paul Jungwirth

On 06/17/2016 03:03 AM, Alex John wrote:

RDS is a prime candidate except for the fact that they have explicitly
stated that the Postgres engine is *not* HIPAA compliant.


More precisely, it is not covered by the BAA Amazon will sign.

I've helped several companies run HIPAA-compliant Postgres on regular 
EC2 instances (which *are* covered by your BAA, as long as they are 
dedicated instances---which do cost more). So you just have to do some 
of the server work yourself. If you are making the rest of your app 
HIPAA-compliant anyway, it shouldn't add a large burden to do Postgres 
that way too. Make sure your access rules are good, use SSL for the 
connections, put it on an encrypted disk (easy these days with encrypted 
EBS volumes), etc.
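
Concretely, the sort of settings involved look something like this (paths 
and values made up; not a complete checklist):

# postgresql.conf
ssl = on
ssl_cert_file = '/etc/ssl/certs/server.crt'
ssl_key_file = '/etc/ssl/private/server.key'

# pg_hba.conf -- require TLS from the app subnet; nothing else gets in
hostssl  appdb  appuser  10.0.0.0/24  md5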


Slightly more effort but still very doable is handling requirements for 
auditing accesses and changes. How you do this probably depends on the 
rest of your stack.


Yours,
Paul







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Thoughts on "Love Your Database"

2016-05-04 Thread Paul Jungwirth

On 05/04/2016 08:39 AM, Paul Jungwirth wrote:

On 05/03/2016 09:11 PM, Guyren Howe wrote:

I think I'm going to write a book called Love Your Database, aimed at
web developers


I gave a talk here about doing "interesting" Postgres things in Rails:


Oh also: one part of my talk I did like was giving my mental process 
for building up a query. Because of SQL's declarative nature, a lot of 
people just don't know where to start. My own thinking goes like this:


1. Each output row is a _.
2. Use that for the `FROM`.

From there, it is easy to JOIN to whatever else I need, add filters, 
and fill in the SELECT.
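
For example (made-up schema), if each output row is "an order placed in the 
last 30 days":

SELECT o.id, o.placed_at, c.name
FROM orders o                             -- step 2: FROM matches the output grain
JOIN customers c ON c.id = o.customer_id  -- then join outward
WHERE o.placed_at >= now() - interval '30 days';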


That's worked really well for me. I'd love to hear what other people do, 
if you've done any reflection on your own thought process.


I think teaching this is a different thing than just teaching SQL 
syntax. I haven't seen it written about a lot. It must be somewhere, but 
any book encouraging "outsiders" to use more SQL could benefit from 
giving them direction like that.


Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Thoughts on "Love Your Database"

2016-05-04 Thread Paul Jungwirth

On 05/03/2016 09:11 PM, Guyren Howe wrote:

I think I'm going to write a book called Love Your Database, aimed at web 
developers

What might I cover that I haven't mentioned? What are the usual objections to 
server-side code and how can they be met? When *are* they justified and what 
should the criteria be to put code in Postgres? Any other thoughts? Any other 
websites or books on the topic I might consult?


I gave a talk here about doing "interesting" Postgres things in Rails:

https://github.com/pjungwir/rails-and-sql-talk

I don't think that will satisfy many people on this list advocating for 
stored procedures, but it is an example of trying to teach what SQL can 
do, and how you can do those things without losing your convenient and 
familiar ORM tools, e.g. running a query and getting back a bunch of 
Ruby objects.


I was not really happy with the talk to be honest. I felt it lacked 
unity, it didn't fit in the time I had, and it was too split between 
"for beginners" and advanced stuff. It was just snippets---in other 
words CTEs and window functions :-). I would like to see something that 
offers more strategic advice. Give me a plan.


I would be very interested in that book. I think the biggest challenge 
will be identifying with your audience: knowing their priorities and 
concerns and workflows. My take on the last 25 years of software 
architecture theory is "how to put a layer in front of my database." I 
think most people who want business logic in the database are dismissive 
of this work and too derogatory toward typical web developers, so I 
would be pleased to see a book that takes that history more seriously. 
You aren't going to convince the world to trade Rails for PLSQL. But are 
there ways I can use SQL (and PLSQL) without giving up Rails? How will I 
maintain that stuff? Does it play nice with Rails database migrations? 
How will I write tests for it? How do I debug it? What principles will 
help me draw the line between (PL)SQL and Ruby?


Good luck! I think a book like that would be great.

Paul







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Proper relational database?

2016-04-22 Thread Paul Jungwirth

On 04/21/2016 01:36 PM, Guyren Howe wrote:

Anyone familiar with the issue would have to say that the tech world

> would be a significantly better place if IBM had developed a real
> relational database with an elegant query language

I'm surprised no one yet has mentioned Tutorial D by C. J. Date. His 
book _Database in Depth_ is pretty much an extended argument for how 
superior it is to SQL. RelDB is apparently an open source implementation 
of it, and D4 is a commercial one. That's about all I know in terms of 
practically using it for something. But Date & Tutorial D seems like a 
good place to start if SQL isn't relational enough for you. The book I 
mentioned is short and easy to read.


Paul







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Question about shared_buffer cache behavior

2016-03-20 Thread Paul Jones
In Postgres 9.5.1 with a shared_buffer cache of 7Gb, a SELECT from
a single table that uses an index appears to read the table into the
shared_buffer cache.  Then, as many times as the exact same SELECT is
repeated in the same session, it runs blazingly fast and doesn't even
touch the disk.  All good.

Now, in the *same* session, if a different SELECT from the *same* table,
using the *same* index is run, it appears to read the entire table from
disk again.

Why is this?  Is there something about the query that qualifies the
contents of the shared_buffer cache?  Would this act differently for
different kinds of indexes?

PJ


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] MongoDB 3.2 beating Postgres 9.5.1?

2016-03-11 Thread Paul Jones
I have been running the EDB benchmark that compares Postgres and MongoDB.
I believe EDB ran it against PG 9.4 and Mongo 2.6.  I am running it
against PG 9.5.1 and Mongo 3.2 with WiredTiger storage using 10,000,000
JSON records generated by the benchmark.  It looks like Mongo is winning,
and apparently because of its cache management.

The first queries on both run in ~30 min.  And, once PG fills its cache,
it whips Mongo on repeats of the *same* query (vmstat shows no disk
reads for PG).

However, when different query on the same table is issued to both,
vmstat shows that PG has to read the *entire* table again, and it takes
~30 min.  Mongo does a lot of reads initially but after about 5 minutes,
it stops reading and completes the query, most likely because it is
using its cache very effectively.

Host:   Virtual Machine
4 CPUs
16 Gb RAM
200 Gb Disk
RHEL 6.6

PG: 9.5.1 compiled from source
shared_buffers = 7GB
effective_cache_size = 12GB

Mongo:  3.2 installed with RPM from Mongo

In PG, I created the table by:

CREATE TABLE json_tables
(
    data    JSONB
);

After loading, it creates the index:

CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);

After a lot of experimentation, I discovered that the benchmark was not
using PG's index, so I modified the four queries to be:

SELECT data FROM json_tables WHERE data @> '{"brand": "ACME"}';
SELECT data FROM json_tables WHERE data @> '{"name": "Phone Service Basic 
Plan"}';
SELECT data FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}';
SELECT data FROM json_tables WHERE data @> '{"type": "service"}';

Here are two consecutive explain analyze for PG, for the same query.
No functional difference in the plans that I can tell, but the effect
of PG's cache on the second is dramatic.

If anyone has ideas on how I can get PG to more effectively use the cache
for subsequent queries, I would love to hear them.

---

benchmark=# explain analyze select data from json_tables where data @> 
'{"name": "AC3 Case Red"}';

                                   QUERY PLAN
--------------------------------------------------------------------------------
Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=1 width=1261)
(actual time=2157.118..1259550.327 rows=909091 loops=1)
   Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
   Rows Removed by Index Recheck: 4360296
   Heap Blocks: exact=37031 lossy=872059
   ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00 rows=1 width=0) (actual time=2141.250..2141.250 rows=909091 loops=1)
 Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
Planning time: 291.932 ms
Execution time: 1259886.920 ms
(8 rows)

Time: 1261191.844 ms

benchmark=# explain analyze select data from json_tables where data @> 
'{"name": "AC3 Case Red"}';
                                   QUERY PLAN
--------------------------------------------------------------------------------
Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=1 width=1261) 
(actual time=779.261..29815.262 rows=909091 loops=1)
   Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
   Rows Removed by Index Recheck: 4360296
   Heap Blocks: exact=37031 lossy=872059
   ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00 rows=1 width=0) (actual time=769.081..769.081 rows=909091 loops=1)
 Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
Planning time: 33.967 ms
Execution time: 29869.381 ms

(8 rows)

Time: 29987.122 ms


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Anyone compare PG 9.5 and MongoDB 3.2?

2016-03-01 Thread Paul Jones
MongoDB has released 3.2 with their WiredTiger storage.  Has anyone
benchmarked 9.5 against it, and for JSONB elements several MB in size?

PJ


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ubuntu and Rails postgresql setup

2016-02-24 Thread Paul Jungwirth

On 02/24/2016 09:44 AM, Paul Jungwirth wrote:

Also, Rails wants to use Postgres "ident" authentication, which does not
require a password because it trusts that the OS has already
authenticated you.


Sorry, I misspoke: this is "peer" authentication, not "ident". Should 
have looked it up first. :-)


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ubuntu and Rails postgresql setup

2016-02-24 Thread Paul Jungwirth

On 02/24/2016 12:34 AM, Marco Lobbia wrote:

I am on a Ubuntu 14.04 LTS machine.


I thought I'd chime in since I work with Rails and Postgres on Ubuntu 
all day long. :-) 14.04 LTS is fine for both production and development. 
(Sounds like you're using Heroku for production in any case.)



Putting everything together, all the information seems to converge on
the necessity of creating a database superuser with login name that
match my Ubuntu user name with:

|sudo -u postgres createuser --superuser $USER|


This is the "normal" way to do it for Rails. It wants a superuser so it 
can drop and recreate the database before running tests (`rake test`). 
Personally I prefer to give each project a separate non-superuser 
Postgres account and start tests with just `rspec spec`, but if you're 
new to Rails I recommend going with the approved method.
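
If you do go the non-superuser route, something like this is usually enough 
(role and database names here are just examples):

CREATE ROLE myapp LOGIN CREATEDB PASSWORD 'change-me';
CREATE DATABASE myapp_development OWNER myapp;
CREATE DATABASE myapp_test OWNER myapp;

The CREATEDB bit is what lets `rake db:create` and the test-database rebuild 
work without superuser rights.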


Also, Rails wants to use Postgres "ident" authentication, which does not 
require a password because it trusts that the OS has already 
authenticated you. You can avoid this too if you like by adding `host: 
localhost` to your `database.yml`, but then you'll need to use a password.


Btw since this is not the Rails mailing list, feel free to email me 
personally if you have more Rails-specific questions.


> According to Heroku
> <https://devcenter.heroku.com/articles/heroku-postgresql#local-setup> 
> it is necessary "to export the DATABASE_URL environment variable for

> your app to connect to it when running locally", with:
> |export DATABASE_URL=postgres:///$(whoami)|

That is if you want to connect to the *remote* database on Heroku (i.e. 
probably your *production* database). If you want to connect to your 
locally-installed Postgres, you should not set DATABASE_URL.


> Finally I am wondering whether the choice of installing PostgreSQL
> through the PostgreSQL apt repository would be safe enough or it would
> be preferable to install the LTS version of Ubuntu.

Either is fine.

Good luck!
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 9.4 -> 9.5 upgrade problem when both python2 and python3 present

2016-01-12 Thread Paul Jones
On Mon, Jan 11, 2016 at 06:23:06PM -0500, Tom Lane wrote:
> Date: Mon, 11 Jan 2016 18:23:06 -0500
> From: Tom Lane 
> To: Paul Jones 
> cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] 9.4 -> 9.5 upgrade problem when both python2 and
>  python3 present
> 
> 
> > I guess the thing to do is to manually pg_dump the databases that have
> > python, drop them, upgrade, and then manually restore.
> 
> If you'd rather build a patched version of PG, I have posted a
> work-in-progress patch to address this issue:
> http://www.postgresql.org/message-id/31659.1452538...@sss.pgh.pa.us

It worked most excellently, thank you.

> 
>   regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 9.4 -> 9.5 upgrade problem when both python2 and python3 present

2016-01-11 Thread Paul Jones
On Mon, Jan 11, 2016 at 10:04:16AM -0500, Tom Lane wrote:

> Paul Jones  writes:
> > pg_upgrade complains about not being able to find $libdir/plpython3
> > when upgrading a 9.4 cluster that has both python2 and python3 used.
> 
> No, that's not what the error message says:
> 
> > Could not load library "$libdir/plpython3"
> > FATAL:  Python major version mismatch in session
> > DETAIL:  This session has previously used Python major version 2, and it is 
> > now attempting to use Python major version 3.
> > HINT:  Start a new session to use a different Python major version.
> 
> This is a restriction we put in place because libpython2 and libpython3
> don't coexist nicely in the same address space.  Unfortunately, it makes
> it problematic to restore a dump that contains references to both python2
> and python3 functions.
> 
> It looks like pg_upgrade tries to load all libraries from functions in
> any database in the old cluster into a single session in the new cluster,
> which will fail in a scenario like this even if you keep python2 and
> python3 functions rigorously separated into distinct databases.  I'm
> not sure if we could weaken that test enough to work.

I guess the thing to do is to manually pg_dump the databases that have
python, drop them, upgrade, and then manually restore.

> 
> > I dropped the python2 database but still got the problem.
> 
> You must still have at least one database that contains references
> to python2 (check pg_language to be sure).

I thought of that after I pulled the trigger on the mail...

Thanks for the information...

> 
>   regards, tom lane
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] 9.4 -> 9.5 upgrade problem when both python2 and python3 present

2016-01-11 Thread Paul Jones
pg_upgrade complains about not being able to find $libdir/plpython3
when upgrading a 9.4 cluster that has both python2 and python3 used.
Both the 9.4 and 9.5 PGs have been built from source with python2/3 in
the recommended way and the plpython3.so is present in /usr/local/pgsql/lib.

I dropped the python2 database but still got the problem.

The session:


postgres@mayon:~$ pg_upgrade -b /usr/local/pgsql9.4/bin -B /usr/local/pgsql/bin 
-d /mnt/pgdata9.4 -D /mnt/pgdata 
Performing Consistency Checks
-
Checking cluster versions   ok
Checking database user is the install user  ok
Checking database connection settings   ok
Checking for prepared transactions  ok
Checking for reg* system OID user data typesok
Checking for contrib/isn with bigint-passing mismatch   ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries fatal

Your installation references loadable libraries that are missing from the
new installation.  You can add these libraries to the new installation,
or remove the functions using them from the old installation.  A list of
problem libraries is in the file:
loadable_libraries.txt

Failure, exiting
postgres@mayon:~$ cat *.txt
Could not load library "$libdir/plpython3"
FATAL:  Python major version mismatch in session
DETAIL:  This session has previously used Python major version 2, and it is now 
attempting to use Python major version 3.
HINT:  Start a new session to use a different Python major version.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] ERROR: could not read block 3 in file "base/12511/12270"

2015-12-23 Thread Paul Jones
I have been having disk errors that have corrupted something in
my postgres database.  Other databases work ok:

Running on Ubuntu 10.04.

paul@kitanglad:~$ psql -U postgres
psql (9.4.5)
Type "help" for help.

postgres=# SELECT pg_catalog.pg_is_in_recovery();
ERROR:  could not read block 3 in file "base/12511/12270": read only 4096 of 
8192 bytes
postgres=# \c pjtest
You are now connected to database "pjtest" as user "postgres".
pjtest=# SELECT pg_catalog.pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)


Since this is the "postgres" database, dropping and re-creating it
doesn't seem possible.  pg_dump also gets the same error when I run
it on "postgres" so pg_basebackup will probably get the same error.

The only thing I can think of is to create a completely new data directory
with initdb and pg_dump/restore all the databases in the cluter to the
new data directory.

Is this my only option?

PJ


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trigger function, C, lookup of attribute type

2015-12-15 Thread Paul


I found the solution: SPI_gettype() does the job fine.

I was led to that by rummaging through the slony source
code to see how they handle the triggers in C.

--
Paul Nicholson
--


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Trigger function, C, lookup of attribute type

2015-12-15 Thread Paul


I'm writing a trigger procedure in C to catch updates to a
table and write them to a log file.

The function must walk along trigdata->tg_trigtuple
pulling out the attributes, comparing them with those in
trigdata->tg_newtuple and writing the diffs to a flat ASCII
file.

I've got a loop over the attributes, and I can get the values
if I know the type, eg

 Datum d = heap_getattr( trigdata->tg_trigtuple, ...
 int32_t val = DatumGetInt32( d);

but I'm struggling to find how the function can determine the
attribute types.   I'm only using the built-in scalar types.

Do I have to query the system catalogs?  Or maybe there's a
function that works from a TupleDesc?

--
Paul Nicholson
--


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres sometimes returns no data

2015-11-12 Thread Paul Jungwirth

On 11/12/2015 11:49 AM, db042190 wrote:

I see "unexpected eof...could not receive data..target
machine actively refused it.".


That sounds like the same error message as discussed here:

http://www.postgresql.org/message-id/4d75289d.9020...@techbaza.pl

Could it be a problem of too many open connections? Possibly some 
hanging around longer than they should?
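
A quick way to check from psql (assuming 9.2 or later for the state column):

SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;

Compare the total against your max_connections setting.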


Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

So something is doing a base backup roughly every two hours.

Is that what you would expect?


No. :-)

Sounds like I need to do some archeology. This is a system I inherited, 
so I haven't yet explored all the dark corners.


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

On 11/02/2015 09:11 AM, Adrian Klaver wrote:

The *.backup files should not be 16MB and from your original post they
looked to be 300 bytes. Now if you have 30K of 16MB files then something
else is going on.


Ah, you are right! Sorry for the misunderstanding.

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

So, as Albe posted pg_archivecleanup is only cleaning up the WAL files,
not the auxiliary files. The WAL files would be the ones with no
extension and a size of 16 MB (unless someone changed the compile settings).


Okay, thank you both for the explanation! I'm glad to hear that it's not 
a misconfiguration on our side. I guess we should just make sure we 
remove older *.backup files by some other means, because thousands of 
16MB files add up eventually. We had more than 30k of them!


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

Is there anything else beside *.backup files in the directory?


There were a few *.history files, and a few files with no extension, 
like this: 000600BE0040.


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

Hello,

I'm running Postgres 9.3 in a warm standby configuration, and the slave 
has this setting in recovery.conf:


archive_cleanup_command = '/usr/lib/postgresql/9.3/bin/pg_archivecleanup 
/secure/pgsql/archive/ %r'


But I noticed that the archive directory had files going back to 
February 2014:


$ ls -ltr archive | head
total 9126292
-rw--- 1 postgres postgres  300 Feb 15  2014 
00010002.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 
00010003.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 
00010004.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 
00010006.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 
00010007.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 
0001000B.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 
0001000C.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 
0001000E.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 
0001000F.0028.backup


And even if I run the command by hand, nothing is actually deleted:

$ pg_archivecleanup -d /secure/pgsql/archive/ 
0001000F.0028.backup
pg_archivecleanup: keep WAL file 
"/secure/pgsql/archive//0001000F" and later


Despite the message, the same files remain.

Does anyone have any idea why pg_archivecleanup isn't deleting anything?

Thanks,
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Left Join with Limit 1

2015-10-12 Thread Paul Jungwirth

Running the queries individually and using a limit on the golite ip db
results are back immediately 1-2ms but when using the first query it
takes 2-3 seconds.

Is there a way to use a limit in the join?


This sounds like the real issue is a missing/incorrect index, but if 
you're on 9.4+ you can use a lateral join like this:



SELECT S.referrer_ip,
   I.geoname_id
FROM viewing_stats AS S
LEFT JOIN LATERAL (
SELECT *
FROM geolite_city_ip4
WHERE S.referrer_ip::inet <<= network
LIMIT 1
) I
ON true
WHERE viewing_id=74;

You might also want some kind of ordering in that subquery so that the 
results are deterministic.
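
For example, if you want the most specific matching network (a guess on my 
part about the right tie-break), the subquery could become:

    SELECT *
    FROM geolite_city_ip4
    WHERE S.referrer_ip::inet <<= network
    ORDER BY masklen(network) DESC   -- prefer the narrowest matching network
    LIMIT 1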



Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] PostGIS 2.2.0 Released

2015-10-07 Thread Paul Ramsey
http://postgis.net/2015/10/07/postgis-2.2.0

PostGIS 2.2.0 is released! Over the last two years a number of interesting new 
features have been added, such as:

* True nearest-neighbor searching for all geometry and geography types
* New volumetric geometry support, including ST_3DDifference, ST_3DUnion and 
more
* Temporal data model support and functions like ST_ClosestPointOfApproach to 
support temporal query
* Spatial clustering functions ST_ClusterIntersecting and ST_ClusterWithin
* Subdividing large geometries with ST_Subdivide
* Fast box clipping with ST_ClipByBox2D
* In-database raster processing with ST_Retile and ST_CreateOverview
* New high-speed native code address standardizer
* Visvalingam-Whyatt geometry simplification with ST_SimplifyVW
* Support for compressed “tiny well-known binary” format with ST_AsTWKB and 
ST_GeomFromTWKB

See the full list of changes in the news file: 
  http://svn.osgeo.org/postgis/tags/2.2.0/NEWS

Thanks to everyone who helped with testing and development over this cycle!

Yours,
Team PostGIS
 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] relpages for pg_toast.* tables

2015-08-28 Thread Paul Ramsey
In the spirit of "asking the question leads you to the answer", while
ANALYZE doesn't update stats on toast tables, VACUUM does. So running
VACUUM ANALYZE on the parent table updates all the relevant "relpages"
slots and the space summary turns out passable guesstimates.
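
A catalog-only estimate along those lines might look roughly like this (just
a sketch: it ignores the toast table's own index, and 'my_table' is a
placeholder):

SELECT c.relname,
       (c.relpages
        + COALESCE(t.relpages, 0)
        + COALESCE((SELECT sum(i.relpages)
                    FROM pg_index x
                    JOIN pg_class i ON i.oid = x.indexrelid
                    WHERE x.indrelid = c.oid), 0)
       ) * current_setting('block_size')::bigint AS approx_bytes
FROM pg_class c
LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'my_table';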

P.

On Fri, Aug 28, 2015 at 6:16 AM, Paul Ramsey  wrote:
> I've been trying to write up a "pg_total_relation_size()" replacement
> that just uses the system tables (sum up relpages for tables, indexes
> and toast tables), thereby avoiding the overhead of running stat() on
> thousands of filesystem tables, but I've come up against what seems to
> be an insuperable problem.
>
> The `relpages` data for the toast tables doesn't seem to get updated.
> Not when I run a general 'ANALYZE' call, and when I specifically call
> ANALYZE on the toast tables, it tells me "WARNING:  skipping
> "pg_toast_4597532" --- cannot analyze non-tables or special system
> tables". Well, OK then.
>
> Presumably this is by design, but it makes it impossible to get a true
> guesstimate (as of latest ANALYZE) of size. Are there any known
> workarounds?
>
> Thanks,
> P


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] relpages for pg_toast.* tables

2015-08-28 Thread Paul Ramsey
I've been trying to write up a "pg_total_relation_size()" replacement
that just uses the system tables (sum up relpages for tables, indexes
and toast tables), thereby avoiding the overhead of running stat() on
thousands of filesystem tables, but I've come up against what seems to
be an insuperable problem.

The `relpages` data for the toast tables doesn't seem to get updated.
Not when I run a general 'ANALYZE' call, and when I specifically call
ANALYZE on the toast tables, it tells me "WARNING:  skipping
"pg_toast_4597532" --- cannot analyze non-tables or special system
tables". Well, OK then.

Presumably this is by design, but it makes it impossible to get a true
guesstimate (as of latest ANALYZE) of size. Are there any known
workarounds?

Thanks,
P


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Paul Jungwirth

Thanks Paul, I guess I'm not sure how a generate_series between 0 to 6
would solve this problem. Wouldn't I have to generate a series based on
the date range (by day) and then group by DOW _after_ that? Can you give
me an example of how I'd do it with a series based on 0 to 6?


Looks like David Johnston beat me to it! :-) But this is what I had in mind:

SELECT  s.d AS dow,
COUNT(u.id) c
FROMgenerate_series(0, 6) s(d)
LEFT OUTER JOIN users u
ON  EXTRACT(dow FROM created) = s.d
GROUP BY dow
ORDER BY dow
;

You can also get human-readable DOW names by creating a 7-row CTE table 
and joining to it based on the numeric dow.
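
Something along these lines (an untested sketch, using Postgres's 0 = Sunday 
numbering):

WITH dow_names (dow, name) AS (
  VALUES (0, 'Sunday'), (1, 'Monday'), (2, 'Tuesday'), (3, 'Wednesday'),
         (4, 'Thursday'), (5, 'Friday'), (6, 'Saturday')
)
SELECT  n.name AS dow,
        COUNT(u.id) AS c
FROM    generate_series(0, 6) s(d)
JOIN    dow_names n ON n.dow = s.d
LEFT OUTER JOIN users u
ON      EXTRACT(dow FROM created) = s.d
GROUP BY n.name, s.d
ORDER BY s.d;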


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Paul Jungwirth

> I'm not sure how to create a result where I get the average number of
> new users per day of the week. My issues are that days that did not
> have any new users will not be factored into the average

This is a pretty common problem with time-series queries when there is 
sparse data. My go-to solution is to use generate_series---in your case 
from 0 to 6---then do a left join from there to your actual data.


Paul





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Less is More

2015-06-22 Thread Paul Ramsey
Hey all,
I have an odd performance quirk I wonder if anyone has any theories for… (not a 
one-time result, table all heated up, etc)


spain=# explain analyze select way from planet_osm_point;                       
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on planet_osm_point  (cost=0.00..37240.97 rows=1748797 width=32) 
(actual time=0.044..813.521 rows=1748797 loops=1)
 Total runtime: 902.256 ms
(2 rows)

Time: 902.690 ms

spain=# explain analyze select * from planet_osm_point;
                                                          QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on planet_osm_point  (cost=0.00..37240.97 rows=1748797 width=848) 
(actual time=0.051..241.413 rows=1748797 loops=1)
 Total runtime: 330.271 ms
(2 rows)

Time: 331.869 ms



Thanks,

-- 
http://postgis.net
http://cleverelephant.ca



Re: [GENERAL] date with month and year

2015-05-21 Thread Paul Jungwirth

what you've said above is incorrect.
All "WITH TIME ZONE" does is tell PostgreSQL to apply timezone
conversions during various operations.  The stored data is represented
as an epoch without any concept of the source data's timezone
representation.


Oh, very interesting! Thank you for pointing that out. I'll have to 
think some more about when I'd want that behavior.


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] date with month and year

2015-05-21 Thread Paul Jungwirth

Hi Steve,

Thanks for such a thorough response! I agree that time is a lot trickier 
and trappier than one might expect, so it's good to learn how others 
grapple with it.


> Your original question had to do with month/year.

Just to clarify, that was Daniel's original question, but you're 
replying to my follow-up question.



The first is that
web developers shouldn't become educated about the capabilities of a
database but rather use the database as a dumb data-store and redo
everything themselves (often this includes an utter failure to use the
data-integrity capabilities of the database).


That's not a debate I can hope to settle, but for what it's worth, I 
mostly agree with you. That's why I've written these tools to let Rails 
users leverage more of the capabilities inside Postgres, especially 
integrity constraints:


https://github.com/pjungwir/db_leftovers
https://github.com/pjungwir/aggs_for_arrays/

also these efforts at education:

https://github.com/pjungwir/rails-and-sql-talk
http://illuminatedcomputing.com/posts/2015/02/postgres_lateral_join/
http://illuminatedcomputing.com/posts/2015/03/generate_series_for_time_series/

Anyway, I agree that you have to store the time zone *somewhere*, and I 
suppose that's the reason Joshua remarked that you really shouldn't use 
WITHOUT TIME ZONE. And often a time has one perspective that is 
"canonical" or "preferred", e.g. the time zone of the user who created 
the object. And in that case WITH TIME ZONE gives you a convenient place 
to store that. I think I still prefer a more "relativistic" approach 
where times have no preferred perspective, and input strings are 
converted to a bare "instant" as quickly as possible (using whatever 
time zone is appropriate). For instance that avoids the failure scenario 
Brian described. I concede that storing the time zone separately as a 
string makes it tricker for other database clients, at least when the 
string is a name only meaningful to Rails. In the future I'll keep an 
eye out for when WITH might be handy. And maybe I'll do some research to 
see how well Rails would handle those columns.


Thanks again for your generosity!

Yours,
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] date with month and year

2015-05-21 Thread Paul Jungwirth

You really shouldn't use WITHOUT TIME ZONE.


I'd like to know more about this. Can you say why? Are there any 
articles you'd recommend? I'm fond of normalizing all times to UTC and 
only presenting them in a time zone when I know the current 
"perspective". I've written about that approach in a Rails context here:


http://illuminatedcomputing.com/posts/2014/04/timezones/

I find that this helps me to ignore time zones in most parts of my 
application and cut down on my timezone-related bugs.


Thanks!
Paul





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] date with month and year

2015-05-21 Thread Paul Jungwirth

Is it possible to have date type data that only contain month and year?,
how can I obtain that from a timestamp (without time zone) column?


I think you want date_trunc, which will cut everything down to the first 
of the month, e.g. 2015-01-01, 2015-02-01, etc. The results will still be 
timestamps (truncated to midnight on the first of the month), which you can 
cast to date and keep manipulating with the usual date/time functions.
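
For example (table and column names are placeholders):

SELECT date_trunc('month', your_ts_col)::date AS month, count(*)
FROM your_table
GROUP BY 1
ORDER BY 1;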


Paul






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Success story full text search

2015-05-02 Thread Paul Jungwirth

Does someone have a success story of using Postgres Full Search
Capability with significant data, lets say > 50-100 GB  ?


This is a recent and very complete article on using Postgres for 
full-text search:


http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/

see also the discussion here:

https://news.ycombinator.com/item?id=8381748
https://news.ycombinator.com/item?id=8714477

That should give you a good sense of the abilities and limitations vs 
using Lucene etc.


On scanning that article I don't see any mention of size, but you could 
always ask the author!


Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] On using doubles as primary keys

2015-04-17 Thread Paul A Jungwirth
On Apr 17, 2015 8:35 AM, "Kynn Jones"  wrote:
> (The only reason for wanting to transfer this data to a Pg table
> is the hope that it will be easier to work with it by using SQL

800 million 8-byte numbers doesn't seem totally unreasonable for
python/R/Matlab, if you have a lot of memory. Are you sure you want it in
Postgres? Load the file once then filter it as you like. If you don't have
the memory I can see how using Postgres to get fewer rows at a time might
help. Fewer columns at a time would help even more if that's possible.

> In its simplest form, this would mean using
> doubles as primary keys, but this seems to me a bit weird.

I'd avoid that and just include an integer PK with your data. Dataframes in
the languages above support that, or just slice off the PK column before
doing your matrix math.

Also instead of 401 columns per row maybe store all 400 doubles in an array
column? Not sure if that's useful for you but maybe it's worth considering.
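
A sketch of what I mean (names made up):

CREATE TABLE features (
  id   integer PRIMARY KEY,
  vals float8[] NOT NULL      -- the 400 doubles for this row
);

-- fetch one vector, or just a slice of it:
SELECT vals FROM features WHERE id = 12345;
SELECT vals[1:100] FROM features WHERE id = 12345;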

Also if you put the metadata in the same table as the doubles, can you
leave off the PKs altogether? Why join if you don't have to? It sounds like
the tables are 1-to-1? Even if some metadata is not, maybe you can finesse
it with hstore/arrays.

Good luck!

Paul


Re: [GENERAL] ERROR: could not access status of transaction 283479860

2015-04-06 Thread Pweaver (Paul Weaver)
We figured that out. We were afraid that there was corrupt data due the
counter overflow so we reverted to a backup right before the overflow. We
submitted a bug this morning (bug #12990) to the Postgres bugs mailing list
which has more information and questions.

On Mon, Apr 6, 2015 at 2:17 PM, Alvaro Herrera 
wrote:

> Pweaver (Paul Weaver) wrote:
> > We started getting the following error on some transactions on our
> database
> > (when against particular rows run against the table).
> >
> >
> > PGError: ERROR: could not access status of transaction 283479860 DETAIL:
> > Could not open file "pg_multixact/members/4D6D": No such file or
> directory.
> >
> > We don't know what happen or how to recover. Any advise?
>
> This looks like you overran pg_multixact/members files by wrapping
> around the 32 bit counter (NextMultiOffset in your pg_controldata
> output); a cleanup process later removed files that were still needed.
> This is a Postgres bug, I fear.  Recovery is a bit complicated.  I see
> that your report was made 5 days ago -- have you had any progress on
> getting things fixed since then?
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
Thank You,
Pweaver (pwea...@panjiva.com)

