Re: Return rows in input array's order?

2023-05-09 Thread Steven Lembark
On Tue, 9 May 2023 11:37:29 +0200
Dominique Devienne  wrote:

> On Tue, May 9, 2023 at 11:23 AM David Wheeler 
> wrote:
> 
> > > Hi. With an integer identity primary key table,
> > > we fetch a number of rows with WHERE id = ANY($1),
> > > with $1 an int[] array. The API using that query must return
> > > rows in the input int[] array order, and uses a client-side
> > > mapping to achieve that currently.
> > >
> > > Is it possible to maintain $1's order directly in SQL?
> > > Efficiently?  
> >
> > We’ve done this before with an “order by array_index(id,
> > input_array)”. I forget the actual function consider that pseudo
> > code 
> 
> Thanks David. I see how this would work.
> 
> It was only used for small arrays but never noticed any performance
> issues
> 

Depending on your PG version:

Create a temp table via unnest, join that with what you need 
and order by tmp.seq.

I forget which version allows inlining of CTE's (12?), but you can
use a CTE either way:

with int_seq
as
(
    select  u.order_by, u.seq
    from    whatever,
            unnest( int_array_col ) with ordinality as u( order_by, seq )
    where   blah
)
select
    a.*
from
    foobar  a
    join
    int_seq b
    on
    a.foo = b.order_by
order by
    b.seq
;

This dodges the temp table, and the optimizer can inline the
results, which probably gets you the fastest result.
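
As a minimal sketch against the original question's $1 int[] parameter
(assuming an integer primary key named "id" on a table named "t"),
unnest ... with ordinality supplies the sequence directly:

    select  t.*
    from    unnest( $1::int[] ) with ordinality as u( id, seq )
    join    t on t.id = u.id
    order by
            u.seq
    ;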


-- 
Steven Lembark
Workhorse Computing
lemb...@wrkhors.com
+1 888 359 3508




Re: Does Postgres 14 have a query cache?

2023-02-18 Thread Steven Lembark
On Sat, 18 Feb 2023 12:43:42 -0600
Ron  wrote:

> > I think the answer is no but wanted to confirm here. this is what
> > my best friend told me.

There are caches for prepared statements, table rows, indexes. 

Which of the caches are you interested in?


-- 
Steven Lembark
Workhorse Computing
lemb...@wrkhors.com
+1 888 359 3508




Re: Order by and timestamp

2020-03-15 Thread Steven Lembark
> ...8 | To Be Placed | 2017-04-01 15:45:00 | 28174115
> 1.130662643 | To Be Placed | 2017-04-01 16:20:00 | 28174115
> 1.130662647 | 1m2f Hcap    | 2017-04-01 16:50:00 | 28174115
> 
> 
> I see this on some dates, but most are in order 
> Actually it looks like ’order by MARKETID'
> 
> The data is collected on Amazon cloud, Ireland, and in Sweden. time
> diff is 1 hour between the countries, GMT / CET
> 
> Hmm, I now realise that daylight saving time ended 2016-10-30
> and started again 2017-03-26, so it is not on the bad dates.
> 
> I vaguely recall that I have once defined the column as 
> Timestamp with timezone, and changed it to without.
> May that have an impact?
> 
> I realize that I should migrate to 11 or 12, but hmm, I still wonder
> over this sort order.
> 
> TZ=Europe/Stockholm
> 
> 
> Timing is on.
> AUTOCOMMIT off
> psql (9.6.10)
> Type "help" for help.
> 
> bnl=# \d amarkets
>Table "public.amarkets"
>   Column  |  Type  |
> Modifiers
> --++-
> marketid | character varying(11)  | not null default
> ' '::character varying marketname   | character
> varying(50)  | not null default ' '::character varying
> startts  | timestamp(3) without time zone | not null
> eventid  | character varying(11)  | not null default
> ' '::character varying markettype   | character
> varying(25)  | not null default ' '::character varying
> status   | character varying(50)  | not null default
> ' '::character varying betdelay |
> integer| not null default 1 numwinners
> | integer| not null default 1
> numrunners   | integer| not null default
> 1 numactiverunners | integer| not null
> default 1 totalmatched     | numeric(15,2)  | not
> null default 0.0 totalavailable   | numeric(15,2)  |
> not null default 0.0 ixxlupd  | character
> varying(15)  | not null default ' '::character varying
> ixxluts  | timestamp(3) without time zone | not null Indexes:
> "amarketsp1" PRIMARY KEY, btree (marketid) "amarketsi2" btree
> (eventid) "amarketsi3" btree (markettype) "amarketsi4" btree (status)
> "amarketsi5" btree (numwinners) "amarketsi6" btree (ixxluts)
> 
> bnl=# 
> 
> 
> regards
> --
> Björn Lundin
> b.f.lun...@gmail.com


-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: trouble making PG use my Perl

2020-02-28 Thread Steven Lembark
On Thu, 27 Feb 2020 15:07:29 -0500
Tom Lane  wrote:

> You might be able to override that with LD_LIBRARY_PATH, but it's
> a pain, and it will certainly not work if your homebrew libperl
> isn't 100% ABI-compatible with the system one.
> 
> Personally I'd build plperl against the Perl you want to use it with.
> The rest of PG isn't dependent on Perl, so you could use the community
> install for the rest of it if you like.

Funny thing is that both PG and Perl are easy enough to build
from scratch and the centos compile of Perl at least is both
ancient and horrid enough (5.00503 compatibility, really?) that
it's easier to just shell-script both builds and run it overnight.

Q: How un-optimized and ancient is the PG on centos?

-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: trouble making PG use my Perl

2020-02-28 Thread Steven Lembark
On Thu, 27 Feb 2020 20:42:36 +
Kevin Brannen  wrote:

> Thanks Tom, I can see your point. With the right change to
> LD_LIBRARY_PATH, I can make `ldd plperl.so` point to my Perl, but as
> you say, I'm probably playing with fire to expect it all to be 100%
> compatible between Perl 5.10.1 (Centos 6 default) and 5.30.1 (mine).

Note that you don't *want* to be binary compatible with the junk
they distribute.

Unless you really expect to have copies of 5.00503 running?
No, probably not. 

RH and Debian distros are distributed by heavy Python users who
go out of their way to hamstring Perl at every step. Unless 
things have changed massively, the Perl they distribute is not
only out of date, it's nearly broken out of the box. Try running
"perl -V" sometime and review the "config_args" values.

enjoi

-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: trouble making PG use my Perl

2020-02-28 Thread Steven Lembark


> I can make Pg come up, initdb, that sort of stuff just fine. But we
> also use the Perl extension and we have references to Perl modules
> that are in *our* Perl and not the system one. Yes, we compile our
> own Perl like we provide our own Pg because Centos uses much older
> versions.
> 
> The issue is that I've not been able to make Pg use our Perl
> (in /opt/perl) instead of the system one (in /usr). I've tried
> changing the env-vars in multiple places, the most obvious being
> the /etc/init.d script, but none of that helps. When we compiled our
> own Pg, I could set this with "export PERL=/opt/perl/bin/perl" before
> the "configure --with-perl" command. Setting PERL in the init.d file
> doesn't help either.

If you use the centos pre-compiled glob then you'll get their 
pre-compiled paths to their pre-compiled Perl which, among
other things, is compiled with all optimization turned off, 
with 5.00503 compatibility turned *on*, and a host of other
pure idiocies that make their perl unsuitable for human use.

Simplest fix is probably rolling your own: PG's build cycle 
is quite manageable, as is Perl's, and you'll get something
that is reasonably optimized for your platform and use.


-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: performance of loading CSV data with COPY is 50 times faster than Perl::DBI

2020-02-03 Thread Steven Lembark


> 2. postgres has not implicit plan cache, so every INSERT planned
> again and again

Good point: If you are doing something in a loop you probably
want to use prepare_cached() to get the $sth, for both extract
and insert.


-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: performance of loading CSV data with COPY is 50 times faster than Perl::DBI

2020-02-03 Thread Steven Lembark
On Fri, 31 Jan 2020 19:24:41 +0100
Matthias Apitz  wrote:

> Hello,
> 
> Since ages, we transfer data between different DBS (Informix, Sybase,
> Oracle, and now PostgreSQL) with our own written tool, based on
> Perl::DBI which produces a CSV like export in a common way, i.e. an
> export of Oracle can be loaded into Sybase and vice versa. Export and
> Import is done row by row, for some tables millions of rows.
> 
> We produced a special version of the tool to export the rows into a
> format which understands the PostgreSQL's COPY command and got to know
> that the import into PostgreSQL of the same data with COPY is 50 times
> faster than with Perl::DBI, 2.5 minutes ./. 140 minutes for around 6
> million rows into an empty table without indexes.
> 
> How can COPY do this so fast?

DBI is a wonderful tool, but not intended for bulk transfer. It
is useful for post-processing queries that extract specific
data in ways that SQL cannot readily handle. 

One big slowdown is that the pull-a-row, push-a-row cycle involves
significant latency on the database connection. That limits the
throughput.

Depending on who wrote the code, they may have extracted the rows
as hashrefs rather than arrays; that can be a 10x slowdown right
there. [I have no idea why so many people are so addicted to storing
rows in hashes, but it is always a significant slowdown; and 
array slices are no more complicated than hash slices!]

Where DBI is really nice is managing the copy: generating a 
useful basename, determining the path, deciding whether to zip
the output, etc. Using metadata to determine which of the tables
to back up and where to put the result, and all of the automation 
you'd want for flexible backups, is nice in DBI. Bulk copies, 
probably not.
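
For reference, a minimal sketch of the bulk path being compared here;
the table, columns, and file path are invented:

    copy student ( name, address, age, level )
    from '/path/to/export.csv'
    with ( format csv );

One statement and one round trip for the whole file, instead of a
parse/plan/execute cycle per row.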

-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: combination join against multiple tables

2020-01-31 Thread Steven Lembark
On Fri, 31 Jan 2020 14:01:17 +
Geoff Winkless  wrote:

>  a  |  c  |  c
> +-+-
>   1 | 111 | 211
>   1 | 112 |
>   2 | 121 |
>   2 | | 222
>   3 | |
>   4 | 141 |
>   5 | | 253
>   6 | |
>   7 | |
>   8 | |
>   9 | |
>  10 | |

The c's look just like a full outer join of t1 & t2 on a & b.

Giving them saner names to avoid duplicate output col's, let's
call them "c1" & "c2".

At that point a left outer join on a gives you all of the base.a
values with any t{1,2} rows that have a matching a:

No idea what your data really looks like, but if t1 or t2 has more
than three col's, distinct can save some annoying cross-products:

select
    distinct
    base.a
  , z.c1
  , z.c2
from
    base
    left join
    (
        select
            coalesce( t1.a, t2.a )  "a"
          , t1.c                    "c1"
          , t2.c                    "c2"
        from
            t1
            full outer join
            t2
            on
            t1.a = t2.a
            and
            t1.b = t2.b
    ) z
    on
    base.a = z.a
;

-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: combination join against multiple tables

2020-01-31 Thread Steven Lembark
On Fri, 31 Jan 2020 14:01:17 +
Geoff Winkless  wrote:

> Hi
> 
> I have a query involving multiple tables that I would like to return
> in a single query. That means returning multiple sets of the data
> from the first base table, but that's acceptable for the simplicity
> in grabbing all the data in one hit.
> 
> An example set:
> 
> CREATE TABLE t1 (a int, b int, c int);
> CREATE TABLE t2 (a int, b int, c int);
> CREATE TABLE base (a int);
> INSERT INTO t1 (a, b, c) VALUES (1, 1, 111), (1,2,112), (2,1,121),
> (4,1,141);
> INSERT INTO t2 (a, b, c) VALUES (1, 1, 211), (2, 2, 222), (5,3,253);
> INSERT INTO base(a) SELECT * FROM GENERATE_SERIES(1,10);
> 
> Now the problem is that I would like to return all the rows from a,
> but with a single row where t2.b and t1.b match.
> 
> So the results I would like:
> 
>  a  |  c  |  c
> +-+-
>   1 | 111 | 211
>   1 | 112 |
>   2 | 121 |
>   2 | | 222
>   3 | |
>   4 | 141 |
>   5 | | 253
>   6 | |
>   7 | |
>   8 | |
>   9 | |
>  10 | |
> 
> At the moment I'm doing
> 
> SELECT base.a, t1.c, t2.c
> FROM base
> CROSS JOIN (SELECT b FROM t1 UNION SELECT b FROM t2 UNION SELECT -1)
> tmpset LEFT JOIN t1 ON t1.a=base.a AND t1.b=tmpset.b
> LEFT JOIN t2 ON t2.a=base.a AND t2.b=tmpset.b
> WHERE t1.a IS NOT NULL
> OR t2.a IS NOT NULL
> OR (tmpset.b=-1
>   AND NOT EXISTS (SELECT FROM t1 WHERE t1.a=base.a)
>   AND NOT EXISTS (SELECT FROM t2 WHERE t2.a=base.a)
> );
> 
> 
> but this seems like a really convoluted way to do it.
> 
> Is there a join style that will return the set I want without the
> pain?
> 
> I should be clear that the real world data is much more complex than
> this, but it describes the basic problem.
> 
> Thanks
> 
> Geoff

The join on t1 & t2 seems to just be a full outer join of 
t1 & t2 on a & b.

Note that you cannot have two columns in the output with the
same name (e.g., a+c+c, above, is not a valid table). Call them
"c1" & "c2":

A full outer join of t1 & t2 on a & b seems to give you all of
the necessary combinations of c; at which point an
outer join on a associates base values with anything that 
matches on a:

select
    distinct
    base.a
  , z.c1
  , z.c2
from
    base
    left join
    (
        select
            distinct
            coalesce( t1.a, t2.a )  "a"
          , t1.c                    "c1"
          , t2.c                    "c2"
        from
            t1
            full outer join
            t2
            on
            t1.a = t2.a
            and
            t1.b = t2.b
    ) z
    on
    base.a = z.a
;

No idea what the real data looks like, but distinct is likely to be
helpful if the real t's have more than three cols.

-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: How to prevent POSTGRES killing linux system from accepting too much inserts?

2019-12-18 Thread Steven Lembark
On Wed, 18 Dec 2019 17:53:26 +0800
"James(王旭)"  wrote:

> Hello,I encountered into this kernel message, and I cannot login into
> the Linux system anymore:
> 
> Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
> Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 120 seconds.
> Dec 17 23:02:30 hq-pg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 17 23:02:30 hq-pg kernel: sync    D 965ebabd1040  0 6573 6572 0x0080
> Dec 17 23:02:30 hq-pg kernel: Call Trace:
> Dec 17 23:02:30 hq-pg kernel: [ ... ] generic_write_sync+0x70/0x70
> 
> After some google I guess it's the problem that IO speed is low, while
> the insert requests are coming too quickly. So PG put these into cache
> first, then the kernel called sync. I know I can queue the requests, so
> that POSTGRES will not accept these requests which will result in an
> increase in system cache. But is there any way I can tell POSTGRES
> that you can only handle 2 records per second, or 4M per second, please
> don't accept inserts more than that speed? For me, POSTGRES just waiting
> is much better than the current behavior. Any help will be much appreciated.

There isn't one magic-bullet solution for this. It may be that you can 
tune Linux, PG, or the filesystem to handle the load more 
gracefully, or that you just need more hardware. Streaming inserts might
be better batched and handled via synchronous ETL than pushed in at
random; at that point you can control the resources.

One approach might be tighter timeouts on the server or client, which
will leave the queries failing when the queue gets too high. That
frees up resources on the server, at the obvious expense of having
transactions roll back. On the other hand, you can end up with 
timeouts so tight that you end up thrashing, which doesn't help the
problem.
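
As one hedged illustration of the timeout approach (the role name and
the five-second value are made up, not recommendations):

    -- per session, before the inserts:
    set statement_timeout = '5s';

    -- or for a dedicated ingest role:
    alter role ingest_writer set statement_timeout = '5s';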

Catch from this end is that without more information on the system
you are dealing with, there isn't any quick-and-dirty fix.

I'd suggest looking over:

<https://duckduckgo.com/?q=linux+postgres+tuning>

for suggestions and seeing which ones work or don't. If you have
more specific questions on the parameters or how to evaluate the
stats PG is keeping feel free to ask them here, but you will need
to be specific as to the stats and situation in which they were
acquired so that people have enough context to give you a reasonable
answer.

-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: Backup PostgreSQL from RDS straight to S3

2019-09-19 Thread Steven Lembark


s3fs available on linux allows mounting S3 directly as a local 
filesystem. At that point something like:

  pg_dump ... | gzip -9 -c > /mnt/s3-mount-point/$basename.pg_dump.gz;

will do the deed nicely. If your S3 volume is something like
your_name_here.com/pg_dump then you could parallize it by dumping
separate databases into URL's based on the date and database name:

tstamp=$(date +%Y.%m.%d-%H.%M.%S);

gzip='/bin/gzip -9 -v';
dump='/opt/postgres/bin/pg_dump -blah -blah -blah';

# the dated directory has to exist before the dumps start.
mkdir -p /mnt/pg-backups/$tstamp;

for i in your database list
do
    echo "Dump: '$i'";
    $dump $i | $gzip > /mnt/pg-backups/$tstamp/$i.dump.gz &
done

# at this point however many databases are dumping...

wait;

echo "Goodnight.";

If you prefer to only keep a few database backups (e.g., a rolling
weekly history) then use the day-of-week for the tstamp; if you
want to keep fewer then $(( $(date +%s) / 86400 % $num_backups )) 
will do (leap-second notwithstanding).

Check rates to see which AWS location is cheapest for the storage
and processing to gzip the content. Also check the CPU charges for
zipping vs. storing the data -- it may be cheaper in the long run
to use "gzip --fast" with smaller, more repetitive content than 
to pay the extra CPU charges for "gzip --best".

-- 
Steven Lembark3646 Flora Place
Workhorse ComputingSt. Louis, MO 63110
lemb...@wrkhors.com+1 888 359 3508




Re: Work hours?

2019-08-30 Thread Steven Lembark


> > Any thoughts as to the best way to approach this?  
> 
> Use generate_series:
> 
> https://www.postgresql.org/docs/11/functions-srf.html
> 
> to generate all the days in the month.
> 
> Loop over the days and use EXTRACT:
> 
> https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
> 
> to find the dates with a dow(The day of the week as Sunday (0) to 
> Saturday (6)) or isodow(The day of the week as Monday (1) to Sunday
> (7)) that falls in Mon-Fri and add to counter.

This assumes all weekdays are work days and that all weeks are uniform
for work. Any holiday, emergency, or outage leaves you with a skewed
result.

First workaround is table of non-work days: generate a temp table of 
the series not intersecting the non-workdays (materialized views
are nice for this). 
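
A minimal sketch of that workaround, with the month, table, and column
names invented:

    select  count(*) as work_days
    from    generate_series( date '2019-08-01',
                             date '2019-08-31',
                             interval '1 day' ) as d( day )
    where   extract( isodow from d.day ) between 1 and 5
    and     not exists
            (
                select  1
                from    non_work_days n
                where   n.day = d.day::date
            );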

Then you get into issues of different people having different non-
work days, leaving your subtraction table keyed by person + date.

Frequently the non-work days are by employee class, which allows a
table of days off by employee grade + employees w/ grade => days
off by employee.

Then individual employees will have their own time off due to paid
vacation, medical or family leave, and sick days. Depending on your
number of employees, a non_working_days table w/ date + employee works, or 
you get into the pay grade + generic days and employee + pay grade
for the generic days off merged with a separate table of individual
days off. Subtract that from a temp table generated by the sequences
and you'll have a complete schedule.



-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508




Re: migrating from Oracle to PostgreSQL 11

2019-06-27 Thread Steven Lembark
On Wed, 26 Jun 2019 19:12:07 +0530
Hitesh Chadda  wrote:

> Hi PostgresSQL Support,
> 
> I have to migrate from Oracle 11g to PostgresSQL 11.4. Please suggest
> best solution for doing the migration.

Bit of a broad topic for an answer. Other than "carefully", there
isn't enough of a question to answer.

Issues that will affect the transfer include the quantity of data,
relative locations of the servers, complexity of the data, how much
custom code you have tied up in triggers and local functions, and 
the platforms you will be running with.

For example, moving a 6PB database has downtime and data transfer
issues that you don't have with a few GB; a 7x24 database with 
multiple hot servers accessed via VPN has different challenges than
a single server on a VM on your LAN; translating a bunch of integer
SK's and some text has different requirements than something you want
to compose arrays, ranges, or JSONB objects out of; the dialects
of PL/SQL are not entirely compatible, so you will have to think about
translating triggers, reports, and op's code between dialects; Oracle
only supports "global temporary tables", PG avoids them with the 
added features of drop/truncate on commit...

Your best first step will be to get a basic understanding of PG. I
like "PostgreSQL: Up and Running"; I think it's up to the 3rd 
edition. Grab it on Safari; it's a fairly short read that covers the
topics you need to get a running PG database.

PG is a moving target: 11 supports all sorts of nice, new things;
12 has even more of them. Pick a PG version (suggest 11, it's 
stable), look at what it does, investigate the specific applications 
you have on site, document them, and start porting them to PG 
from smallest to largest. 

If you have specific questions about the PG version, its features,
or how to translate specific Oracle-isms into PG please feel free
to ask -- that's what we're here for :-) 

-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508




Re: Postgres for SQL Server users

2019-05-07 Thread Steven Lembark


> Next month I'll be making a presentation about Postgres to a SQL
> Server crowd in L.A. at their SQL Saturday event.
> 
> I was wondering if anyone has any tips that are specific for SQL
> Server users?  Best features?  Known issues?  Common rebuttals?

PostgreSQL is more than a "database", it's more of an ecosystem. The
combination of open source, well-described interfaces, and the simplicity
of "create extension" allows us all to develop and add features and
functionality to PG. One example: "exclude using gist" with
timestamp ranges makes time-series databases nearly trivial. PostGIS
is another.
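
A minimal sketch of that time-series example, with invented names
(btree_gist supplies the scalar equality operator for GiST):

    create extension if not exists btree_gist;

    create table sensor_state
    (
        sensor_id   int         not null,
        effective   tstzrange   not null,
        exclude using gist ( sensor_id with =, effective with && )
    );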

Another is that PG is written and developed by people who actually
use it. Wander through psql: things like \gset and \gexec with 
format() make introspective tasks quite doable with SQL. 

pgadmin is another nice "addon" that exists partly because PG was
designed from the ground up to support external access. Hell, throw
in EnterpriseDB if you want even more addons; there aren't third-
party SQL Server support distributions (I'm a capitalist: competition
can be a Good Thing).

However much people rag on it, a *tunable* vacuum system is quite
helpful. The "I don't have to worry about that with ..." line starts
to fall apart when the system starts unnecessary cleanups in the 
middle of a large daily dump... Being able to control the process
makes it a whole lot easier to forget.

Foreign Data Wrappers are a nice feature, the byproduct of a system
that was designed for open access. Instead of waiting for MS to 
decide that any one external resource is profitable, we have quite a
few quite usable interfaces that support what people need.

The combination of extensions and FDW also gives companies the ability
to add whatever they need when they need it. If anyone tells you that "X
isn't available in PG" the simplest answer is "Fine, then add it!"
For the cost of an MS product license you can pay someone to write
whatever you want as an extension or special-purpose FDW. Pay them
for support and you can have whatever you need.

Which gets back to the ecosystem: the PG *product* includes a level
of flexibility not available in SQL Server -- or Oracle for that
matter. SQL Server was heavily influenced by Sybase, which was based
on Ingres, which worked because it was an open, flexible platform.
Maybe you could just tell them PG gives them the chance to evolve
full-circle :-)


-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508




Re: Move vs. copy table between databases that share a tablespace?

2019-04-03 Thread Steven Lembark


> Through a pipe to a different server.

No access to a server w/ sufficient space.

Looks like a compressed dump + restore...

Thanks


-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508




Re: Move vs. copy table between databases that share a tablespace?

2019-04-03 Thread Steven Lembark
On Wed, 3 Apr 2019 08:33:54 -0500
Ron  wrote:

> On 4/3/19 8:18 AM, Steven Lembark wrote:
> > Trying to find a way of moving a large table between databases
> > in the same cluster. There is not sufficient space to copy the
> > contents -- the dedicated tablespace that fits the beastie is
> > on an 80% full disk.
> >
> > Given that the two databases live in the same cluster and have
> > the owner & the tablespace in common, is there any way to move
> > the contents without a dump & reload?  
> 
> COPY TO STDIN and pipe that through ssh to another server?

That makes a copy. There is no place to store two copies of the 
full data on disk. 

I'm basically trying to perform the moral equivalent of what "mv"
does in the filesystem: re-link the inode and remove the old inode
without duplicating the data.

Since the tablespace is visible to both, I would have guessed that there
is some way to alter table... that allocates the space on disk to the
new database w/o having to unload all 400M rows -- if it comes down
to that I'll have to copy it out through xz and load it back in the new
database.

-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508




Move vs. copy table between databases that share a tablespace?

2019-04-03 Thread Steven Lembark


Trying to find a way of moving a large table between databases
in the same cluster. There is not sufficient space to copy the
contents -- the dedicated tablespace that fits the beastie is 
on an 80% full disk.

Given that the two databases live in the same cluster and have
the owner & the tablespace in common, is there any way to move
the contents without a dump & reload?

Thanks

-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508




Re: Where **not** to use PostgreSQL?

2019-03-02 Thread Steven Lembark


For small, lightweight, portable jobs, SQLite is quite nice for times you
don't want to create a full PG installation and still prefer SQL for
lookups.

-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508



Re: conditionally terminate psql script

2018-12-18 Thread Steven Lembark
On Mon, 17 Dec 2018 13:42:14 +0100
Pavel Stehule  wrote:

> you need psql from PostgreSQL 10 and higher
> 
> there is a \if statement

Q: What is the "abort cycle command"?

e.g., 


select  count(1) < 1 as "lacks_rows"
from    foobar
where   blah blah
\gset

\if :lacks_rows

\echo foobar lacks rows to process.
\echo goodnight :-)

?<--- what goes here to stop execution?

\endif


The point is that adding many levels of if-block logic becomes
difficult to maintain. It would be nice to stop execution without
having to nest everything one level deeper.



-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508



Re: Select "todays" timestamps in an index friendly way

2018-10-29 Thread Steven Lembark


> create temporary table t (
>   id SERIAL primary key,
>   ts timestamp not null default now()
> );

* add date( ts ) as a field and index date = now()::date.

* Depending on the amount of data in your table the date
  may not be selective enough to be worth using, at which 
  point the index may be present and ignored. Only way to
  be sure is analyze it.

* Might be worth looking at a partial index using >= 00:00:00 
  and < 24:00:00 (PG groks the 2400 notation for "midnight at 
  the end of today") or

where ts <@ tsrange( ... 00:00:00, ... 24:00:00, '[)' )

  Nice thing about the partial index is that you can create it
  on all of the non-ts fields for fast lookup by whatever and 
  only index the portion for today (see the sketch after these
  bullets). 

* Think about using a materialized view rather than a temp
  table. May prove simpler to query.
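
The partial-index sketch referenced above, with invented names; the
predicate has to be immutable, so the date is a literal that a nightly
job would re-create:

    create index t_today_idx
        on t ( id )
        where   ts >= timestamp '2018-10-29 00:00:00'
        and     ts <  timestamp '2018-10-30 00:00:00';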


-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508



Re: Convert interval to hours

2018-09-14 Thread Steven Lembark
On Fri, 14 Sep 2018 12:21:14 -0400
David Gauthier  wrote:

> I'm using postgres v9.5.2 on RH6.

PG can convert the times for you.
For times (not timestamps) you are always better off dealing with
either time or integer seconds. There are a variety of issues with
rounding that affect repeatability and accuracy of results using 
floats or doubles. Given that 1/10 and 1/3 are both repeating 
fractions in binary (e.g., 1/10 in binary is an infinite series), 
division by 3600 will only cause you annoyance at some point.

If you are subtracting times then you will (usually) end up with
an interval, which can be cast to seconds in the query and give
you precise, accurate, repeatable results every time.

e.g., 

select
extract
(
epoch from ( time1 - time2 )::interval 
)
as "seconds",
...

is one approach.

In nearly all cases you are better off selecting and converting
the time in SQL rather than converting the start and end times 
from numeric (time) to string (DBI) and then back from char *
to float/double or int/unsigned. The character conversion is 
expensive and numeric -> string -> numeric leaves you open to all
sorts of rounding and conversion issues.

Frankly, if you have to run the query more than once I'd suggest
adding a view that does the select/convert for you (along with 
dealing with any NULL's that creep into things). PG makes it quite
easy to add the view and quite in-expensive to apply it.
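
A minimal sketch of such a view, with the table and column names
invented:

    create view session_lengths as
    select  id,
            extract( epoch from ( ended_at - started_at ) )::int
                as seconds
    from    sessions;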

-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508




Re: Convert interval to hours

2018-09-14 Thread Steven Lembark
On Fri, 14 Sep 2018 11:55:18 -0400
Peter Kleiner  wrote:

> On Fri, Sep 14, 2018 at 11:51 AM David Gauthier
>  wrote:
> >
> > Hi:
> >
> > In perl/DBI, I have code that's getting me an "age" which returns
> > something like... "-17 days -08:29:35".  How can I convert that to
> > a number of hours (as a float I would presume) ?

Suggest using one of the date modules. One issue is that not all
days are 86400s long: "leap second" is used to keep atomic clocks
in sync with sidereal time so that telescopes report consistent 
values over time. Catch is that simply dividing by 3600 doesn't 
always work if the times fall across the wrong days.

You would normally want an integer for these rather than float
as the result to avoid rounding issues on extract or with comparisons 
in or out of the database. You will normally not have more than one 
second precision on times (timestamps are a different matter).

Q: What database are you using?

Postgres makes it easy enough to cast the values or compute the 
difference in seconds. 




-- 
Steven Lembark 3920 10th Ave South
Workhorse Computing   Birmingham, AL 35222
lemb...@wrkhors.com+1 888 359 3508



Re: Question on postgresql.conf

2018-08-01 Thread Steven Lembark
On Tue, 31 Jul 2018 00:02:47 -0500 (PET)
Alvaro Aguayo Garcia-Rada  wrote:

> Hi.
> 
> As far as I know, it's not currently possible. Maybe recompiling, but
> that could not be the best for a production environment. I suppose this
> is this way because postgres is designed to expect a certain folder
> structure for its data folder, and configuration files are
> considered part of it.
> 
> As you may see on the documentation:
> 
> https://www.postgresql.org/docs/9.6/static/app-postgres.html
> 
> You can change the data folder (-D command line option), but, no
> matter what the data folder is, it MUST have a postgresql.conf, as
> well as pg_hba.conf
> 
> However, some distros have made a similar approach to what you may be
> looking to do. Don't remember which one, but there's a distro whose
> standard postgresql server packages store configuration files
> at /etc/postgresql, while storing data at /var/lib/postgresql. This
> is done by simply making a symlink, like "ln
> -s /mnt/shared/postgres/server1.conf /var/lib/postgres/data/postgresql.conf".
> Same applies to other configuration files.

Normally the config files live on the server. If you want to 
hack the basename I'd suggest using symlinks for each machine.
Set up whatever .conf.$(hostname) files you like and symlink them 
into /etc/postgresql (or whatever you use on Windows):

e.g., 

#!/usr/bin/env bash

[ -z "$HOST_CONF_DIR"   ] || exit -1;
[ -d  $HOST_CONF_DIR] || exit -2;
[ -r  $HOST_CONF_DIR] || exit -3;

suffix=".$(hostname)";

cd /etc/postgresql;

for i in $HOST_CONF_DIR/*;
do
ln -fsv $i ./$(basename $i $suffix);
done

echo 'Config dir:';
ls -al $PWD;

exit 0;

Result: You have the standard paths where PG expects them and 
ls -al (or readlink) will tell you which host they were generated
for.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: PostgreSQL Volume Question

2018-06-15 Thread Steven Lembark
On Thu, 14 Jun 2018 14:33:54 -0700
Data Ace  wrote:

> Hi, I'm new to the community.
> 
> Recently, I've been involved in a project that develops a social
> network data analysis service (and my client's DBMS is based on
> PostgreSQL). I need to gather huge volume of unstructured raw data
> for this project, and the problem is that with PostgreSQL, it would
> be so dfficult to handle this kind of data. Are there any PG
> extension modules or methods that are recommended for my project?

"huge" by modern standards is Petabytes, which might require some
specialized database service for a data lake.

Short of that, look up the "jsonb" data type in Postgres.
The nice thing about using PG for this is that you can keep enough
identifying and metadata in a relational system, where it is easier
to query, and the documents in jsonb, where they are still accessible.
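
A minimal sketch of that split, with all names invented:

    create table raw_event
    (
        id          bigserial   primary key,
        source      text        not null,
        captured_at timestamptz not null default now(),
        payload     jsonb       not null
    );

    create index raw_event_payload_idx
        on raw_event using gin ( payload );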


-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: reduce number of multiple values to be inserted

2018-06-01 Thread Steven Lembark
On Wed, 30 May 2018 08:10:05 +0800
tango ward  wrote:

> curr_pgsql.execute('''
>INSERT INTO student (created, modified, name,
> address, age,
> level )
>VALUES (current_timezone, current_timezone,
>'Scott', 'Malayan Village', 21, 2),
>   (current_timezone, current_timezone,
>'Ben', 'Malayan Village', 21, 2),
>   (current_timezone, current_timezone,
>'Scott', 'Malayan Village', 21, 2),
> 
>   (current_timezone, current_timezone,
>'Andrew', 'Malayan Village', 25, 2),
>   (current_timezone, current_timezone,
>'Larry', 'Malayan Village', 25, 2),
>   (current_timezone, current_timezone,
>'Adam', 'Malayan Village', 25, 2),
> 
>   (current_timezone, current_timezone,
>'Elisse', 'Malayan Village', 27, 2),
>   (current_timezone, current_timezone,
>'Xena', 'Malayan Village', 27, 2),
>   (current_timezone, current_timezone,
>'Karen', 'Malayan Village', 27, 2)
> 
> I will repeat the same process for 13 villages so that will be 117 of
> values. I would like to know if there's a way to reduce the script?
> This has to be done strictly via script.

Rather than try to cut+paste SQL it may be easier to put the data
into a flat file and have some code spit the SQL out for you:

#!/usr/bin/env perl

# housekeeping

use v5.22;
use YAML::XS    qw( Load );


# package variables


my $data 
= do
{
# slurp the data from named files or stdin.

local $/;
Load <>
};

my $prefix  = <<'SQL';

/*
 * Input data for vx.y.z of student data schema.
 */

insert into student 
(
created,
modified,
name,
address,
age,
level
)
values
SQL

my $suffix = <<'SQL';
;

/*
 * end of input
 */
SQL


# output the SQL


my @tuples;

while( my ( $place, $place_valz ) = each %$data )
{
    while( my ( $nums, $namz ) = each %$place_valz )
    {
        # keys like "21, 2" carry the age and level columns.

        my ( $age, $level ) = split /\s*,\s*/, $nums;

        for my $name ( @$namz )
        {
            push @tuples, <<"SQL";
(
    current_timestamp,
    current_timestamp,
    '$name',
    '$place',
    $age,
    $level
)
SQL
        }
    }
}

say $prefix, join( ",\n", @tuples ), $suffix;

# this is not a module
0
__END__

=head1 NAME

output_sql - dump insert ... values ... from YAML

=head1 SYNOPSIS

Inputs arrive via stdin, literal, or glob-ed paths:


output_sql < /path/to/foobar.yaml;
output_sql /path/to/foobar.yaml;
output_sql /path/to/*.yaml;


gzip -dc < /path/to/bigfile.yaml | output_sql;
gzip -dc < /path/to/bigfile.yaml | output_sql | psql;



Your data file could look like this if you want a single flat file
for all of it:

---
Malayan Village :
  21, 2 :
  - Ben
  - Scott
  25, 2 :
  - Anderew
  - Larry
  - Adam
  ...
Another Village :
 ...

Or your could break it into chunks using multiple documents within
the YAML file (notice the extra '---'):

---
Malayan Village :
  21, 2 :
  - Ben
  - Scott
  25, 2 :
  - Anderew
  - Larry
  - Adam
  ...
---
Another Village :
 ...

At which point $data, above, is an array and you get:

for my $chunk ( @$data )
{
while( my ( $place, $place_valz ) = each %$chunk )
{
    ...
}
}

with the added flexibility of breaking the input data into 
multiple files if needed.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Selecting strict, immutable text for a composite type.

2018-05-10 Thread Steven Lembark
On Thu, 10 May 2018 17:38:48 -0400
Tom Lane <t...@sss.pgh.pa.us> wrote:

> Yeah, but that's not what you did.
> 
> I think you could make that work with
> 
> exclude using gist (
>   lat_lng_text(location) with =,
>   effective with &&
> )
> 
> but it's not going to apply the function without you telling it to.

Q: Why does it work with enums?

e.g., If I create a type foo_t as enum (...) and install the function
foo_text on foo_t, the gist works. Why would the presence of a text
version of the composite not get used the same way?

If the function works I'm happy, I just don't see the reasoning
behind having the enum's supported automatically and the composite
not handled.

enjoi

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Selecting strict, immutable text for a composite type.

2018-05-10 Thread Steven Lembark
On Thu, 10 May 2018 14:41:26 -0400
Tom Lane <t...@sss.pgh.pa.us> wrote:

> [ please keep the list cc'd ]
> 
> Steven Lembark <lemb...@wrkhors.com> writes:
> > On Thu, 10 May 2018 11:52:48 -0400
> > Tom Lane <t...@sss.pgh.pa.us> wrote:  
> >> Maybe you should show a more concrete example of what's not
> >> working.  
> 
> > The problem is with gists telling me that they cannot index
> > the type. This works for enums, just not the composite type.  
> 
> Oh, well, they can't.  There's no GiST opclass covering arbitrary
> composite types.  This doesn't seem very surprising to me given
> the lack of operators that such an opclass might accelerate.

But I thought that they could include functions of composite
types that were indexable (e.g., text)?

e.g., enums.

 
> What are you expecting an index on such a column to do for you?
> If you just want a uniqueness constraint, plain btree can handle it.

The composite participates in an exclusion constraint:

location
    lat_lng_t
    not null
,
effective
    tstzrange
    not null
    default tstzrange( now(), 'infinity', '(]' )
,
exclude using gist
(
    location    with =,
    effective   with &&
)

i.e., the time series can have only one effective set of 
data for any one period.

So far as I knew it was possible to have a function on the
type that produced an indexable type (e.g., text). This worked
for the enums, I thought it would work for a composite: produce
a text value that is indexable.

Or is the declaration of the exclusion with a function rather
than the column? Or a function rather than '='? Or a separate
declaration that describes comparing the composite type that
allows the gist to work?



-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Selecting strict, immutable text for a composite type.

2018-05-10 Thread Steven Lembark
Trying to find specific doc's. Lots of people asking similar questions
are told to work around it in different ways. Is there any specific doc on
how to create a default operator class for a given composite type,
with any examples of minimal working code to access the pieces and
convert them to text?

I have produced quite a variety of botched syntax on this trying to
see which shit sticks to the wall... just one working example would
be most appreciated.

As an example: Say I wanted to use domains for latitude and longitude
that enforce +/- 90 and +/- 180, and let's say for the moment that
float is an appropriate base type (though it might be real or numeric;
that is a separate question). I can combine the domains into a single
type with a lat and lng element.

So far so good.

Q: What is the syntax for a strict, immutable function in 
   SQL that returns text suitable for use with either 
   creating a "select *" view or a GIST index?

I believe my mistake is somewhere in the number and placement
of paren's but I have tried quite a few combinations based on
various postings without success. Rather than detail them I'm
just asking for a reference to one working example for any
roughly similar composite type.

thanks

Example:

  /*
   * this much works.
   */

  create domain longitude_d
  as
  float
  not null
  check
  (
VALUE >= -180.0::float
and
VALUE <= +180.0::float
  );

  create domain latitude_d
  as
  float
  not null
  check
  (
value >= -90.0::float
and
value <= +90.0::float
  );

  create type lat_lng_t
  as
  (
lng longitude_d,
lat latitude_d
  );


  /*
   * something about this does not work.
   * the signature seems reasonable,
   * as does the returns ... as.
   *
   * Q: What is the correct syntax for
   *    select ... ?
   */

  create or replace function
  lat_lng_text
  (
lat_lng_t
  )
  returns text language sql strict immutable as
  $$
select ($1).lng::text || '-' || ($1).lat::text
  $$
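
For reference, a hedged sketch of how the function could plug into the
exclusion constraint (per the follow-up quoted earlier in this archive);
it assumes the function above compiles and that btree_gist is installed
to supply text equality under GiST:

    create extension if not exists btree_gist;

    create table lat_lng_history
    (
        location    lat_lng_t   not null,
        effective   tstzrange   not null
                    default tstzrange( now(), 'infinity', '(]' ),
        exclude using gist
        (
            ( lat_lng_text( location ) )    with =,
            effective                       with &&
        )
    );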




-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Enhancement to psql command, feedback.

2018-05-10 Thread Steven Lembark

The whole point of "#!" working in shell is that the two-bytes
(a) mark the file as executable by a specific shell command and 
(b) are a shell comment.

One fairly simple fix that would make annotating here scripts
and the like simpler for shell(ish) execution would be simply
ignoring all text from "\n#" to the first "\n", which would 
allow the #! to function as a comment -- just as it does in
the shell. 

Another way to do it would be adding a '#' command to psql,
similar to '\', that accepts a one-line directive and ignores
it entirely. This would use the existing framework for detecting
the context of '\' as a command, just with a different magic
char.


-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Rationale for aversion to the central database?

2018-04-28 Thread Steven Lembark

> That seems un-pragmatic to me. IMHO if any business logic needs
> access to lots of data, it's best implemented by code that
> resides in the database itself, close to the data. I once had a

There is a balance: ETL and reporting code is often iterative
and can be more flexible in an external language. 

Getting the balance right takes some thought and planning -- both of
which are usually in short supply on software projects. 

Testing database code and structures is easy in PG using TAP:

<https://blog.2ndquadrant.com/using-postgresql-tap-framework-extensions/>

Also easy to test database contents and emedded logic:

<https://www.slideshare.net/lembark/object-exercise>

Putting all or no logic into the database as a knee-jerk reaction 
is usually a mistake. Especially with PG, testing is easy using 
extensions and schemas to segregate the code/data being tested.

At some point the data will have to hit code, especially in ETL or
reporting logic. Balancing the work in and out of the database
just makes sense. Which is probably why it doesn't happen...


-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Rationale for aversion to the central database?

2018-04-28 Thread Steven Lembark
On Sat, 28 Apr 2018 08:02:21 +0200
"Peter J. Holzer" <hjp-pg...@hjp.at> wrote:

> On 2018-04-27 22:52:39 +, g...@luxsci.net wrote:
> > Perhaps I'm extreme. In my ideal world, developers might not even
> > know table names! I'm kidding ,sorta...  
> 
> If they don't know the table names, how can they write those stored
> procedures?

One of the main reasons DBA's need to be pigs. Much spaghetti can be
avoided in ORM frameworks with well-designed views. 

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Long running INSERT+SELECT query

2018-04-27 Thread Steven Lembark
On Fri, 27 Apr 2018 19:38:15 +0300
Vitaliy Garnashevich <vgarnashev...@gmail.com> wrote:

> We're going to try using "SELECT 1 FROM table FOR KEY SHARE" for each of 
> the tables, which are referenced by results, before running the big 
> query. That should be up to a million of rows in total. It will probably 
> not cover the case when a record is INSERT'ed and then DELETE'd after 
> the calculation has begun, but such cases should be even more rare than 
> the DELETE's we're currently facing.

Think about using a couple of Materialized Views for the worst 
part of it.


-- 
Steven Lembark 1505 National Ave
Workhorse Computing   Rockford, IL 61103
lemb...@wrkhors.com  +1 888 359 3508



Re: decompose big queries

2018-04-27 Thread Steven Lembark

> Hi,
> I want to know what are the best practice to use in order to
> decompose a big query which contains so many joins.Is it recommended
> to use stored procedures ? or is there any other solution?

The main problem with SP's is that they can really screw up
optimization. Most of the time you'll get more bang for the
buck by managing the SQL, breaking it into more manageable 
chunks.

Views can be a big help for this. Materialized views can help
by pushing lock-heavy or long-running subqueries into the wee
hours where resources can be managed a bit more easily (e.g.,
you don't collide with interactive queries or may be able to
just lock the tables and be done with it). They also give you 
a more stable dataset to start with (e.g., no need to worry
about inserts causing disconnects in totals or requiring 
table locks to avoid).

Temporary tables can help by breaking out messy portions of
the query and allowing you to pre-filter portions of the result.

Scratch tables (i.e., a permanent table that gets truncated after
use) can also help by staging portions of the query in indexed
areas. With careful use they can be a big help because you can 
pre-aggregate portions of the query into a table with nice, 
specific indexes. 

Many of the worst queries involve reporting on aggregates over time
that are re-calculated for each report. Pre-aggregating on, say, a
daily basis can both simplify the reporting query and allow you to
push some of the work off to the wee hours. You can also get the 
benefit of more specific values that may allow for unique indexes
on the subtotals. 
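
A minimal sketch of that daily pre-aggregation, with invented names:

    create materialized view daily_totals as
    select  usage_date,
            account_id,
            sum( amount )   as amount
    from    usage_detail
    group by
            usage_date,
            account_id;

    -- run off-hours, e.g. from cron:
    refresh materialized view daily_totals;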

If the query involves "many" joins there is a good chance that 
they break into clumps of related data. Breaking, say, daily usage
vs. user account vs. accounting data into separate [materialized]
views or temp tables keeps the joins more manageable and helps
release resources that might otherwise get consumed for the entire
query.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Rationale for aversion to the central database?

2018-04-27 Thread Steven Lembark
On Sun, 8 Apr 2018 14:39:49 -0700
Guyren Howe <guy...@gmail.com> wrote:

> I am a Rails developer at a medium-large size company. I’ve mostly
> worked at smaller companies. I’ve some exposure to other web
> development communities.
> 
> When it comes to databases, I have universally encountered the
> attitude that one should treat the database as a dumb data bucket.
> There is a *very* strong aversion to putting much of any business
> logic in the database. I encounter substantial aversion to have
> multiple applications access one database, or even the reverse: all
> abstraction should be at the application layer.
> 
> My best theory is that these communities developed at a time when
> Windows was more dominant, and just generally it was *significantly*
> easier to use MySQL than Postgres for many, particularly new,
> developers. And it is pretty reasonable to adopt an aversion to
> sophisticated use of the database in that case.

The biggest single problem in most cases is a combination
of communication and attitude.

Recall that most programmers are not DBA's and don't have access to
create anything in the database. Traditionally the High Priests of 
Holy Data don't allow mere peons to create or modify content in the
databases. Result is that programmers *have* to do most of their 
logic in code, even if the results put hugely more load on the servers.

The DBA's are also not Well Qualified Genius Programmers who obviously
know much more than mere DBA's about what the real use of data is 
supposed to be. They don't need no stinkin' SQL! They have the magic
of ORM!

Net result: People talk past one another.

I've nearly been lynched for creating ten-row temporary tables without
prior written permission, or even for suggesting that we might add indexes
to support more efficient use of views.

I've also nearly strung programmers up for running hundreds of 
"select * from table" statements because they weren't willing to 
figure out how to write a join or simple filter or give me enough
data to do it for them.

A good example is ORM packages: most of the really hideous code required
for joins, group by, and coalesce can easily be pushed into views, allowing
simple "select X from Y where ..." or one-line joins. DBA's can craft views
that make sense, and programmers could use them to simplify their code.
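
A hedged sketch of the kind of view that keeps the ORM side simple,
with all names invented:

    create view order_summary_v as
    select  o.id                            as order_id,
            c.name                          as customer,
            coalesce( sum( li.amount ), 0 ) as total
    from    orders o
    join    customers c     on c.id = o.customer_id
    left join line_items li on li.order_id = o.id
    group by
            o.id,
            c.name;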

 Q: When is the last place you worked where DBAs were pigs (vs.
    chickens)?

 Q: How often was a DBA involved in the analysis and design cycle
    of reporting?

 Q: How many programming projects have you been on where the hackers
    were willing to write clear, concise data requirements and keep
    them up to date?

When that happens people will use the databases as tools, ORM code
will be [semi-] maintainable, and we'll live happily for ever after.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Long running INSERT+SELECT query

2018-04-27 Thread Steven Lembark
On Thu, 26 Apr 2018 23:32:33 +0300
Vitaliy Garnashevich <vgarnashev...@gmail.com> wrote:

> The aggregated table has hundreds of millions of rows, and the query 
> runs for many hours (which is one of the reasons why it's better not
> to fail). I really doubt that row level locking would work. That
> would be a lot of RAM just to hold all the locks.

All the more reason to use temp tables.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Long running INSERT+SELECT query

2018-04-27 Thread Steven Lembark
On Thu, 26 Apr 2018 21:08:01 +0300
Vitaliy Garnashevich <vgarnashev...@gmail.com> wrote:

>      INSERT INTO cmdb_sp_usage_history
>    (created_by, updated_by, created_on, updated_on, mod_count,
>    summary_on, quarter, product, used_from, "user",
>    keystrokes, minutes_in_use, times_started, avg_keystrokes, 
> max_keystrokes, spkg_operational)
>      SELECT
>     2, 2, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 1,
>     CURRENT_TIMESTAMP, quarter.id, "spv"."product", 
> "usage"."used_from", "usage"."user",
>     coalesce(sum("usage"."keystrokes"), 0),
>     coalesce(sum("usage"."minutes_in_use"), 0),
>     coalesce(sum("usage"."times_started"), 0),
>     coalesce(avg("usage"."keystrokes"), 0),
>     coalesce(max("usage"."keystrokes"), 0),
>     bool_or("cmdb_ci"."operational")
>      FROM
>    "cmdb_program_daily_usage" "usage"
>      LEFT OUTER JOIN
>    "cmdb_program_instance" "p" ON "p"."id" =
> "usage"."program_instance" LEFT OUTER JOIN
>    "cmdb_ci_spkg" "s" ON "s"."id" = "p"."spkg"
>      LEFT OUTER JOIN
>    "cmdb_ci" "cmdb_ci" ON "s"."id" = "cmdb_ci"."id"
>      LEFT OUTER JOIN
>    "cmdb_software_product_version" "spv" ON "spv"."id" =
> "s"."software" WHERE ("usage"."minutes_in_use" > 0)
>    AND ((NOT ("s"."software" IS NULL))
>     AND ((NOT ("s"."os" = TRUE))
>      OR ("s"."os" IS NULL)))
>    AND ("usage"."usage_date" >= quarter.start_date)
>    AND ("usage"."usage_date" < quarter.end_date)
>      GROUP BY "spv"."product", "usage"."used_from", "usage"."user"
>      HAVING (coalesce(sum("usage"."keystrokes"), 0) > 0) OR 
> (coalesce(sum("usage"."minutes_in_use"), 0) > 0) OR 
> (coalesce(sum("usage"."times_started"), 0) > 0)
>      ORDER BY "spv"."product", "usage"."used_from", "usage"."user";

create temporary table
foobar
as
select
    <un-aggregated, un-coalesced data>
from
    <the joined tables>
;

This isolates the lock time to performing the bare select, after
which you can coalesce and sum to your heart's content without 
locking any of it.

The point is performing the absolute minimum of processing to 
generate the temp table so as to release any locks quickly and
avoid "group by" in the main join.

Yes, this might end up creating a large-ish temp table :-)

One other approach would be selecting only incremental data
(e.g., daily), which locks a much smaller subset of the rows,
and aggregating the daily totals into quarterly, whatever.
Call it daily usage: select where usage_date = today's date, or
timestamp <@ a tstzrange of ( 0000, 2400, '[)' ). That might
also simplify your query logic: all the coalesce op's end up
in your daily/weekly/monthly/whatever summary, and the quarterly 
values in the reporting are just sum X group by.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Long running INSERT+SELECT query

2018-04-27 Thread Steven Lembark
On Thu, 26 Apr 2018 19:13:17 +0300
Vitaliy Garnashevich <vgarnashev...@gmail.com> wrote:

> We're trying to populate a table with aggregated data from other
> tables. For that we're running a huge INSERT+SELECT query which joins
> several tables, aggregates values, and then inserts the results into
> another table. The problem we're facing is that while the query is
> running , some records in tables referenced by the results table may
> be deleted, which causes the following error:
> 
> ERROR:  insert or update on table "..." violates foreign key
> constraint "..."
> DETAIL:  Key (...)=(...) is not present in table "...".
> 
> Who do we make sure that such aggregating query would not fail?

Create a temporary table with a useful subset of the data.

You can select the minimum number of columns joined and release
the locks. This can also help large queries by giving you a stable
snapshot of the data for repeated queries. 

I usually find that pre-joining the tables is easier because 
temp tables have restrictions on re-use w/in the query, and also
usually don't have indexes to speed up the joins.

If you are going to run this, say, daily it's easy enough to create
a view and just "create temporary table foo as select * from bar"
for some collection of views and go from there. This makes it easier
to tune the queries on the back end without having to hack the front
end code.
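
A minimal sketch of that view-plus-temp-table pattern, with invented
names:

    create or replace view aggregation_input_v as
    select  u.id, u.account_id, u.amount
    from    usage_detail u
    join    accounts a on a.id = u.account_id;

    -- the job takes a stable snapshot and releases the source
    -- locks quickly:
    create temporary table work_set as
    select * from aggregation_input_v;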

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Please suggest the best suited unit test frame work for postgresql database.

2018-04-02 Thread Steven Lembark
On Sun, 1 Apr 2018 10:26:32 +0530
Raghavendra Rao J S V <raghavendra...@gmail.com> wrote:

> Good morning.
> 
> Please suggest the best suited unit test frame work for postgresql
> database and also shared the related documents to understand the
> framework.

If anyone there knows Perl (if not I'll be happy to teach it) you
can use DBI with Object::Exercise to validate anything. The module
documentation includes DBI examples or I can send you a few.

One nice thing about Perl is that you can include source code
from a dozen or so other languages from Python to Java that will
be auto-wrapped and run automatically. This makes it really easy
to test multi-language interfaces to the database with a single
harness.

enjoi

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Enforce primary key on every table during dev?

2018-03-01 Thread Steven Lembark

> On 03/01/2018 02:20 AM, Alban Hertroys wrote:
> [snip]
> > Not to mention that not all types of tables necessarily have
> > suitable candidates for a primary key. You could add a surrogate
> > key based on a serial type, but in such cases that may not serve
> > any purpose other than to have some arbitrary primary key.
> >
> > An example of such tables is a monetary transaction table that
> > contains records for deposits and withdrawals to accounts. It will
> > have lots of foreign key references to other tables, but rows
> > containing the same values are probably not duplicates. Adding a
> > surrogate key to such a table just adds overhead, although that
> > could be useful in case specific rows need updating or deleting
> > without also modifying the other rows with that same data -
> > normally, only insertions and selections happen on such tables
> > though, and updates or deletes are absolutely forbidden -
> > corrections happen by inserting rows with an opposite transaction.  
> 
> Wouldn't the natural pk of such a table be timestamp+seqno, just as
> the natural pk of a transaction_detail table be transaction_no+seqno?

Start with Date's notion that a database exists to correctly represent
data about the real world. Storing un-identified data breaks this 
since we have no idea what the data means or have any good way of 
getting it back out. Net result is that any workable relational 
database will have at least one candidate key for any table in it.

If you can say that "rows containing the same values are not
duplicates" then you have a database that cannot be queried, audited,
or managed effectively. The problem is that you cannot identify the 
rows, and thus cannot select related ones, update them (e.g., to 
expire outdated records), or validate the content against any external
values (e.g., audit POS tapes using the database).

In the case of a monetary transaction you need a transaction
table, which will have most of the FK's, and a ledger for the 
transaction amounts.

A minimum candidate key for the transaction table would be account, 
timestamp, authorizing customer id, and channel. This allows two 
people to, say, make deposits at the same time or the same authorizing 
account (e.g., a credit card number) to be processed at the same time 
in two places.

The data for a transaction would include things like the final status, 
in-house authorizing agent, completion time.

The ledger entries would include the transaction SK, sequence within
the transaction, amount, and account. The ledger's candidate key is 
a transaction SK + sequence number -- the amount and account don't 
work because you may end up, say, making multiple deposits of $10
to your checking account on the same transaction.

The ledger's sequence value can be problematic, requiring external
code or moderately messy triggers to manage. Timestamps don't always
work and are subject to clock-skew. One way to avoid this is to require
that a single transaction contain only unique amounts and accounts.
At that point the ledger becomes a degenerate table of  transaction id, 
amount, account (i.e., the entire table is nothing but a unique index).

This might require generating multiple database transactions for a
single external process (e.g., a customer walking up to the teller)
but does simplify processing quite a bit.
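
A bare-bones sketch of that layout (table and column names, types,
and the referenced account data are illustrative only, not
prescriptive):

    create table txn
    (
        txn_sk          bigserial     primary key   -- surrogate key
      , account_id      bigint        not null
      , authorized_by   bigint        not null      -- authorizing customer id
      , channel         text          not null
      , started_at      timestamptz   not null
      , status          text
      , completed_at    timestamptz
        -- candidate key: account + timestamp + authorizing customer + channel
      , unique ( account_id, started_at, authorized_by, channel )
    );

    create table txn_ledger
    (
        txn_sk          bigint        not null references txn
      , seq             integer       not null
      , amount          numeric(15,2) not null
      , account_id      bigint        not null
        -- candidate key: transaction SK + sequence within the transaction
      , primary key ( txn_sk, seq )
    );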

In both cases, having an SK on the ledger is useful for audit queries,
which might have to process a large number of ledger entries in code.
Extracting the ledger SK's in one query and walking down them using
a unique index can be more efficient than having to extract the values.

Either way, you can identify all of the transactions as unique and 
all of the ledger entries for that transaction. At that point the 
database can be queried for data, updated as necessary, audited 
against external data.

If you have a design with un-identified data it means that you haven't
normalized it properly: something is missing from the table with 
un-identifiable rows.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: PG Sharding

2018-01-31 Thread Steven Lembark
On Mon, 29 Jan 2018 15:34:18 +0100
Matej <gma...@gmail.com> wrote:

> Hi Everyone.
> 
> We are looking at a rather large fin-tech installation. But as
> scalability requirements are high we look at sharding of-course.
> 
> I have looked at many sources for Postgresql sharding, but we are a
> little confused as to shared with schema or databases or both.

Suggest looking at the XtremeData product. It is a parallel,
shared-nothing implementation of PG that should solve your
needs rather nicely.

You can see a description of their product at
https://xtremedata.com/

Happy scaling :-)



-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Recreating functions after starting the database server.

2018-01-31 Thread Steven Lembark

> In my case the upgrade routines run migration scripts which modify a
> database schema. Besides that I have a bunch of SQL files with the
> schema that can be used to initialize a new database.

Why not just update the proc's, etc, that depend on the changes?
 
-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: Using AWS ephemeral SSD storage for production database workload?

2018-01-29 Thread Steven Lembark
On Mon, 29 Jan 2018 23:27:32 +0530
Pritam Barhate <pritambarh...@gmail.com> wrote:

> In short, I am just trying to learn from other people's experience.

This is identical to solutions that use tmpfs on Linux for
database storage, or to dealing with a fully failed storage
system. Think about what you'd do if a RAID controller
fried and botched your entire array at once. You'll feel
just the same way if a box using ephemeral storage goes
down.

Your application needs to handle restarting transactions
and either a reverse proxy/load-balancer or client-side
switchover.

Depending on your tolerance for data loss you might need
three servers up, one as a secondary failover if the primary
fails so that you (pretty much) always have two servers up
to maintain the data. The last server only has to last long
enough for a restart and recovery so it might not have to
be very heavy duty; its main purpose is to keep the database
alive long enough to recover the "real" server.

Q: Why not just use RDS?

It'll be simpler.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: PG Schema to be used as log and monitoring store

2017-12-09 Thread Steven Lembark
On Sat, 9 Dec 2017 20:22:02 +0100
Stefan Keller <sfkel...@gmail.com> wrote:

>   create table m1 (
> id bigint,
> created timestamp,
> b20 bit(20) default b'',
> farr20 float8[20]
>   );

In general this is a bad idea *unless* you have benchmarked the 
database and found that the amount of space saved really does make
some difference. Using the packed format makes most SQL a lot harder
to write and makes indexing impossible (or at least messy and rather
error prone). This also makes adding add'l fields harder.

If you were really intent on doing this I'd add a few million records
with both formats on a database tuned to handle the load and see if 
the packed bits really do make a difference. My guess is that you
won't see all that much difference in storage and the query speed
with effective indexing is going to be decent.

Using this database might be a lot simpler with a view that
breaks the sub-fields out, or which has indexes on the
sub-fields within the packed data.
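
Something along these lines, for example -- the flag names and bit
positions are invented, only the m1 columns come from your schema:

    -- Expose the packed bits as named booleans so queries stay readable.
    create view m1_expanded as
    select
        id
      , created
      , substring( b20 from 1 for 1 ) = B'1' as flag_alpha  -- invented name
      , substring( b20 from 2 for 1 ) = B'1' as flag_beta   -- invented name
      , farr20[1]                            as metric_1
      , farr20[2]                            as metric_2
    from
        m1
    ;

    -- An expression index makes filtering on a single packed flag workable:
    create index m1_flag_alpha_idx
        on m1 ( (substring( b20 from 1 for 1 )) );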


-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: large numbers of inserts out of memory strategy

2017-12-01 Thread Steven Lembark
On Thu, 30 Nov 2017 08:43:32 -0600
Ted Toth <txt...@gmail.com> wrote:

> What is the downside of using a DO block? I'd have to do a nextval on
> each sequence before I could use currval, right? Or I could do 'select
> last_value from '.

You are creating a piece of code that has to be parsed, tokenized,
and compiled prior to execution. What's biting you is that you've
created a function the size of your dataset.

If you like do-blocks then write a short block to insert one record
using placeholders and call it a few zillion times.
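
The plain-SQL version of that "short block, many calls" idea is a
prepared statement rather than a DO block -- a sketch only, with a
made-up table:

    prepare ins_row ( bigint, text ) as
        insert into yourtable ( id, label ) values ( $1, $2 );

    execute ins_row ( 1, 'first'  );
    execute ins_row ( 2, 'second' );
    -- ... one execute per record, then:
    deallocate ins_row;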

That or (in DBI-speak):


eval
{
    $dbh->{ RaiseError } = 1;
    $dbh->{ AutoCommit } = 0;

    # prepare once, execute once per row ( @rows holds arrayrefs
    # of column values ).
    my $sth = $dbh->prepare
    (
        'insert into yourtable ( field1, field2 ) values ( $1, $2 )'
    );

    $sth->execute( @$_ ) for @rows;

    $dbh->commit
}
or die "Failed execution: $@";

which will be nearly as efficient in the long run.

That or just import the data from a csv/tsv (there are good 
examples of data import available in the PG docs).
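
For example, from psql (file name and column list are placeholders):

    -- \copy streams the file from the client side, so it needs no
    -- server file access; COPY proper reads from the server's filesystem.
    \copy yourtable ( id, label, amount ) from 'data.csv' with ( format csv, header )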

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: large numbers of inserts out of memory strategy

2017-11-29 Thread Steven Lembark

> I'm pretty new to postgres so I haven't changed any configuration
> setting and the log is a bit hard for me to make sense of :(

Diving into the shark tank is a helluva way to learn how to swim :-)

Are you interested in finding doc's on how to deal with the tuning?


-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508



Re: large numbers of inserts out of memory strategy

2017-11-29 Thread Steven Lembark

> > what tools / languages ate you using?  
> 
> I'm using python to read binary source files and create the text files
> contains the SQL. Them I'm running psql -f .

Then chunking the input should be trivial.
There are a variety of techniques you can use, such as disabling
indexes during loading, and so on. Maybe load them into temp tables and
then insert the temp's into the destination tables. The point is to 
amortize the memory load over the entire load period.
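
A sketch of one chunk of such a staged load (table and file names are
placeholders):

    begin;

    -- Unindexed scratch space shaped like the destination table.
    create temporary table load_chunk
        ( like destination including defaults )
        on commit drop;

    \copy load_chunk from 'chunk_0001.csv' with ( format csv )

    -- Fold the chunk into the real table; the scratch table goes away
    -- at commit, keeping memory use bounded per chunk.
    insert into destination
    select * from load_chunk;

    commit;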

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com+1 888 359 3508