Re: The tragedy of SQL

2021-09-17 Thread Benedict Holland
I love how we admonish SQL but love Lisp. There isn't a perfect
language. SQL is fine. C is damn good. C++ is impossible, Java is C++ but
simpler, Python is a C wrapper. God help us if we had settled on Fortran;
we would still have single-core processors. Lisp at least allowed
multithreading, but it is very hard to debug.

But back to the issue at hand: SQL is hard. It falls into the same trap C++
did, where it is very hard and time-consuming to make good schemas and
organize data correctly, but very easy to do it badly. Then we try to
fix bad design in places where the fixes don't belong, like the view or, God
forbid, the controller. That leads to horrible code and even more bugs.

I make a career of fixing broken schemas and teaching good design. I am 15
years in and still learn new ways people screw up designs every day. Sometimes
they are beyond repair and I end up creating new designs and migrating the data.
Who knew that you should hire experts to teach novices? But experts are
expensive, and most of the time the code is throwaway anyway.

I don't get why there are so many programming languages out there. C is
virtually perfect. Python is C with benefits. Everything else appears to
be just offshoots of Python or Lisp, and no one uses Lisp as far as I can
tell. SQL isn't really a programming language. It is just a layer on top of
data.

On Fri, Sep 17, 2021, 3:44 PM Gavin Flower 
wrote:

> On 17/09/21 23:49, Raymond Brinzer wrote:
> > On Tue, Sep 14, 2021 at 9:06 AM Merlin Moncure 
> wrote:
> >> I've long thought that there is more algebraic type syntax sitting
> >> underneath SQL yearning to get out.
> > I wanted to come back to this, because I've been looking to take a
> > single problem (from my perspective) and explain it concisely.  Your
> > intuition is right on the mark.
> >
> > Shell syntax is a pretty good lingua franca, so let's use it.  Say you
> > were working at the command line, and you said something like:
> >
> > cat somefile | awk '{print $3 " " $1 " " $5;}' | sort | grep "^Robert"
> >
> > And the shell responded with something like:  ERROR: syntax error at
> > or near "sort".  After a little tinkering, you discover:  that's
> > because the grep has to come before the sort.  But why?
> >
> > The database is not going to evaluate relational operations in order,
> > passing the output of one into the next as a shell pipe does.
> > Nevertheless, they are logically independent.  Each should take in a
> > relation and either a second relation or a predicate, and return a
> > relation.  Or, to put it mathily, relations are closed under
> > relational operations.  So:
> >
> > Operation 1 | Operation 2
> > and
> > Operation 2 | Operation 1
> >
> > should both be valid, whether or not they're semantically equivalent
> > (though they often are).  The operations are inherently atomic, and
> > can be understood in isolation.
>
> [...]
>
> In mathematics, which way round you do things may be important. For
> numbers in the Real and Complex domains it does not matter. However, in
> the Quaternions it does: A * B is not always the same as B * A. And
> amongst the Octonions it is even worse, as there the order in which you
> group things can also change the result, so A * (B * C) is not
> necessarily the same as (A * B) * C.
>
> Another example is rotating things in 3 dimensions.  Hold a book with
> its front facing you.  Rotate the book towards you so it is now flat,
> then rotate the book about the vertical axis so it is now edge on.
> When you do the operations in the reverse order, you get a
> different result! Yes, you can blame the Quaternions.
>
> In PostgreSQL, if the operations are not deterministic (they rely on at
> least one volatile function, i.e. one whose output can vary for the same
> input parameters), then the order in which you do things could lead to
> different results.
>
> For the optimizer to be effective, it must be allowed to do
> operations in the best order it sees fit -- this is documented. Just as
> you must not rely on the order in which results are returned, unless you
> explicitly have an ORDER BY -- the system will extract results in the
> fastest way it knows, which may not necessarily be in the same order as
> the values were inserted. This would be true even if you had a totally
> different query language.
>
>
> Cheers,
> Gavin
>
>
>
>
>


Re: ZFS filesystem - supported ?

2021-10-25 Thread Benedict Holland
In my opinion, ext4 will solve any and all problems without requiring a very
deep understanding of file system architecture. In short, I would stick with
ext4 unless you have a good reason not to. Maybe there is one. I have done
this a long time and have never thought twice about which file system should
support my servers.

On Mon, Oct 25, 2021, 6:01 PM Robert L Mathews  wrote:

> On 10/25/21 1:40 PM, Mladen Gogala wrote:
> > This is probably not the place
> > to discuss the inner workings of snapshots, but it is worth knowing that
> > snapshots drastically increase the IO rate on the file system - for
> > every snapshot. That's where the slowness comes from.
>
> I have recent anecdotal experience of this. I experimented with using
> Btrfs for a 32 TB backup system that has five 8 TB spinning disks.
> There's an average of 8 MBps of writes scattered around the disks, which
> isn't super high, obviously.
>
> The results were vaguely acceptable until I created a snapshot of it, at
> which point it became completely unusable. Even having one snapshot
> present caused hundreds of btrfs-related kernel threads to thrash in the
> "D" state almost constantly, and it never stopped doing that even when
> left for many hours.
>
> I then experimented with adding a bcache layer on top of Btrfs to see if
> it would help. I added a 2 TB SSD using bcache, partitioned as 1900 GB
> read cache and 100 GB write cache. It made very little difference and
> was still unusable as soon as a snapshot was taken.
>
> I did play with the various btrfs and bcache tuning knobs quite a bit
> and couldn't improve it.
>
> Since that test was a failure, I then decided to try the same setup with
> OpenZFS on a lark, with the same set of disks in a "raidz" array, with
> the 2 TB SSD as an l2arc read cache (no write cache). It easily handles
> the same load, even with 72 hourly snapshots present, with the default
> settings. I'm actually quite impressed with it.
>
> I'm sure that the RAID, snapshots and copy-on-write reduce the maximum
> performance considerably, compared to ext4. But on the other hand, it
> did provide the performance I expected to be possible given the setup.
> Btrfs *definitely* didn't; I was surprised at how badly it performed.
>
> --
> Robert L Mathews, Tiger Technologies, http://www.tigertech.net/
>
>
>


Re: ZFS filesystem - supported ?

2021-10-26 Thread Benedict Holland
Honestly, I think if you are at the point of performance tuning where you care
about the file system, then you are in the upper 5% or higher. I am trying to
get my users to stop doing 100k individual inserts from Django and instead do
one insert using SQL. As for backups, I don't care how long they take. I have
separate machines that run backups and 15-minute-interval WAL archives. Is it
the best? No. But this is a mailing list to get help with Postgres, and
specifically not a way to get production-level support from a free service.
We try to get the message out that Postgres is amazing. If your OS defaults to
ext4, then I assume Linux experts did the tests to show that ext4 is fine. If
you have specific use cases with reported benchmarking, that would be an
interesting read, but apart from that, this feels like such a nuanced and
detailed question that the OP should pay for Postgres support, test their
systems, and then report the results.

But that might just be me. Maybe the debate between ZFS, XFS, ext4, or
others is a good discussion to have. I would love to hear what benchmarks
show for a variety of use cases.

On Tue, Oct 26, 2021, 8:56 PM Mladen Gogala  wrote:

>
> On 10/26/21 20:50, Imre Samu wrote:
>
> > Phoronix has some very useful benchmarks:
> >
> https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.14-File-Systems
> > Ext4 is much better than XFS with SQLite tests and almost equal with
> > MariaDB test. PostgreSQL is a relational database (let's forget the
> > object part for now) and the IO patterns will be similar to SQLite and
> > MariaDB.
>
> there is a link from the Phoronix page to the full OpenBenchmarking.org
> result file
> and multiple PostgreSQL 13 pgbench results included:
> https://openbenchmarking.org/result/2108260-PTS-SSDS978300&sor&ppt=D&oss=postgres
> ( XFS, F2FS, EXT4, BTRFS )
>
> Regards,
>  Imre
>
> Wow! That is really interesting. Here is the gist of it:
>
> XFS is the clear winner. It also answers the question about BTRFS. Thanks
> Imre!
>
> --
> Mladen Gogala
> Database Consultant
> Tel: (347) 321-1217
> https://dbwhisperer.wordpress.com
>
>


Re: ZFS filesystem - supported ?

2021-11-04 Thread Benedict Holland
Right... which is why you do pg_basebackup infrequently. It also captures
WAL when conducting the backup. The tradeoff is that if you have a huge
number of WAL files, then running a restore can be quite time-consuming.
There isn't really a clear win here, though. You either take a long-running
backup that will restore everything to that exact moment in time, or you
accept a restore that could take an extremely long time in the case
where you just keep WALs and do a base backup monthly. The worst-case scenario
is that you need to restore to an hour before a scheduled base backup, so you
get to replay every WAL file up to that point. Snapshots are fine. We also just
pg_dump and tar+bz2 the result. You can even do that on a remote machine.
It is probably a crude solution, but it works.
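
For what it's worth, a rough sketch of the two approaches being compared; the
database name, paths, and schedule are placeholders, not a recommendation:

# Infrequent physical base backup, with WAL streamed into the backup:
pg_basebackup -D /backups/base/$(date +%F) -Ft -z -X stream
# The crude-but-works option: logical dump, then tar+bz2 it:
pg_dump -f /backups/mydb.sql mydb
tar -cjf /backups/mydb-$(date +%F).tar.bz2 -C /backups mydb.sql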

But again, the OP should probably be paying for a service and not relying
on a message board for something like this. I guarantee that people who run
Postgres in production know what to do for this case and have a selection of
best practices, because they should and need to; otherwise, you hire a
professional DBA to step in and tell you what they have found works best.

I absolutely loved the ZFS response, because in that benchmark the "clear
winner" and ext4 performed nearly the same. Now you get to deal
with a non-default file system for a gain that should not be noticeable
unless you are really trying to use every single last cycle on your
computer. If you are, I highly recommend getting the paid support and
hiring a crack team of DBAs who can install your servers using complex
distributions.

Thanks,
~Ben

On Mon, Nov 1, 2021 at 10:43 PM Mladen Gogala 
wrote:

>
> On 11/1/21 17:58, Stephen Frost wrote:
> > Well, at least one alternative to performing these snapshots would be to
> > use a tool like pg_basebackup or pgbackrest to perform the backups
> > instead.
>
> File system based backups are much slower than snapshots. The
> feasibility of file based utility like pg_basebackup depends on the size
> of the database and the quality of the infrastructure. However, if
> opting for snapshot based solutions, I would advise something like Pure
> or NetApp which use special hardware to accelerate the process and have
> tools to backup snapshots, like SnapVault (NetApp).
>
> Also, when using file level utilities, I would go full commercial.
> Commercial backup utilities are usually optimized for speed, support
> deduplication and maintain backup catalog, which can come handy if there
> are regulatory requirements about preserving your backups (HIPAA, SOX)
>
>
> --
> Mladen Gogala
> Database Consultant
> Tel: (347) 321-1217
> https://dbwhisperer.wordpress.com
>
>
>
>


Re: Postgresql + containerization possible use case

2021-12-13 Thread Benedict Holland
Check out rolling updates with Kubernetes, and yes, a containerized
PostgreSQL environment is likely the best option. The really nice bit about
containers is that you don't actually care about the underlying hardware. You
need a Docker daemon. K8s manages clusters, and it sounds like your use case
would benefit from that.
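
A very rough sketch of what a rolling update can look like from the kubectl
side; the resource and image names are made up, and a real Postgres deployment
on K8s would normally sit behind an operator or a StatefulSet with persistent
volumes:

kubectl set image statefulset/postgres postgres=postgres:14.1
kubectl rollout status statefulset/postgres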

On Mon, Dec 13, 2021, 11:03 AM Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

> Hi Berto,
>
> On 13/12/21 3:03 μ.μ., Bèrto ëd Sèra wrote:
>
>
> On Mon, 13 Dec 2021 at 12:41, Achilleas Mantzios <
> ach...@matrix.gatewaynet.com> wrote:
>
>> Hmm, I don't know about that. The hardware is given (existing) and
>> limited.
>> You are like the 2nd person who warned about comms as being an issue with
>> docker/containers.
>>
>
> Quick question: do those ships ever touch port? If so, it's not a fact
> that you have to rely on sat com to do updates. Just keep an on-shore table
> of updates and do the updates every time a ship is in a port and can use a
> normal port wifi com. IMHO this would be the safest option EVEN if satcom
> was free of cost, as it will never be half as stable as a land based com
> line.
>
> Yes they do touch port, otherwise they would be of little commercial value!
> I am confused here : Are we talking about traffic due to :
> a) normal business traffic (designed by me/us old system based on DBMirror)
> b) system updates due to some decision of the system itself and/or our team
> c) normal app shipping via new images, or new .ear/.war or whatever
> ?
>
> And since you are not deploying the same thing everywhere (you quote
> different HW everywhere, so hardly one image could be optimised for
> everything), you are going to need to test upgrades for each ship, which is
> both dangerous and expensive, if done on sat com.
>
>
> We are deploying the exact same app everywhere (JBoss), but of course
> exim4 and other services will be specific to every ship.
>
>
> My 5p
> Berto
>
>
>
> --
> Achilleas Mantzios
> DBA, Analyst, IT Lead
> IT DEPT
> Dynacom Tankers Mgmt
>
>


Re: Best Strategy for Large Number of Images

2021-12-15 Thread Benedict Holland
I would recommend storing a link to the file rather than the file itself.
Other than that, use blobs. I would probably recommend not storing any
binary objects in a database for a variety of reasons, but if you have to,
then bytea will work, assuming each value is smaller than 1 GB (the bytea limit).
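
A minimal sketch of both options, with hypothetical table and column names:

-- Option 1: store a link (path or URL) to a file kept outside the database:
CREATE TABLE images (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    file_path text NOT NULL
);
-- Option 2: store the bytes themselves in an isolated table:
CREATE TABLE image_blobs (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    data bytea NOT NULL  -- each value must stay under the 1 GB bytea limit
);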

Thanks,
- Ben

On Wed, Dec 15, 2021, 4:41 PM Bruce Momjian  wrote:

> On Wed, Dec 15, 2021 at 02:45:15PM -0300, Estevan Rech wrote:
> > Hi,
> >
> > I don't currently use PostgreSQL, but I plan to migrate and I have a
> question
> > about the best way/strategy for storing images. I have about 2 million
> images
> > with a growth trend of around 1 million images per year. I plan to store
> in
> > bytea format in an isolated table. Is this recommended? Is there another
> better
> > way?
>
> You might want to read this:
>
> https://momjian.us/main/blogs/pgblog/2020.html#June_19_2020
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>
>
>


Re: Re. Backup of postgresql database

2022-02-04 Thread Benedict Holland
Use pg_dump, or do a WAL archive.
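
Rough examples of both, with placeholder names and paths:

# Logical dump of one database, custom format:
pg_dump -Fc -f /backups/mydb.dump mydb
# WAL archiving: in postgresql.conf, something along the lines of
#   archive_mode = on
#   archive_command = 'cp %p /wal_archive/%f'
# plus periodic base backups taken with pg_basebackup.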

Thanks,
Ben

On Fri, Feb 4, 2022, 10:49 PM Ron  wrote:

> On 2/4/22 9:14 PM, sivapostg...@yahoo.com wrote:
>
> Hello,
> Got a few (currently 3, will increase as days go by) databases on an A2 Hosting
> unmanaged VPS server, running Ubuntu Server 18 LTS
>
> Planning to have an additional backup of all files, including pg database,
> to another cloud provider like Amazon S3 (any other suggestions?)
>
> Size won't be crossing 5GB per database for the next few years.
>
>
> How are you *currently* backing up the databases?
>
> --
> Angular momentum makes the world go 'round.
>


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-10 Thread Benedict Holland
This is a strange post. Why is SQL bad, and how do you reconcile that with
it managing 99%+ of all data? It's so bad that we have systems that plug into
SQL to query data outside of tables, like Athena or Excel.

Why are you not using pgAdmin4? Yes, psql as a command line isn't great for
humans. It's spectacular for computers, though. So we have pgAdmin4, which I
would take over any other database UI.

Do you not want your views to change when the underlying base tables change?
Do a fully specified select. It's better programming anyway. Do you want an
API? That seems like a bad idea (I would never implement it), but you also
have a Postgres socket, Flask, and SQLAlchemy or psycopg2. It would take a
few hours to write your own. Again, please don't do that. You will almost
surely lose user information, like who decided to delete your client data,
and your API would likely require user privileges to be passed by token
(SSO would be a nightmare to authenticate) or simply give root privileges
to the API. Both are honestly really bad.
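
On the "fully specified select" point, a small hypothetical example: list the
columns instead of SELECT *, so the view does not silently depend on whatever
the base table happens to look like.

CREATE VIEW active_clients AS
SELECT c.id, c.name, c.created_at
FROM clients c
WHERE c.is_active;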

Now, if Postgres had the ability to do schema change tracking with
rollback... that would be a victory. But there are sort-of third-party
solutions that sort of work some of the time. It's a hard problem, and
automated SQL generation, particularly automated schema migrations, is
really hard to build in general, and there are specific things that are damn
hard not to break.

Thanks,
Ben


On Thu, Feb 10, 2022, 4:13 PM Bruce Momjian  wrote:

> On Thu, Feb 10, 2022 at 06:25:45PM +0100, Peter J. Holzer wrote:
> > On 2022-02-10 18:22:29 +0100, Peter J. Holzer wrote:
> > > On 2022-02-09 21:14:39 -0800, Guyren Howe wrote:
> > > >   • SELECT * - b.a_id from a natural join b
> > > >   □ let me describe a select list by removing fields from a
> relation. In
> > > > the example, I get all fields in the join of  a  and b other
> than the
> > > > shared key, which I only get once.
> > >
> > > Natural join already does this.
> > >
> > > My use case for such a feature is tables which contain one column (or
> a
> > > small number of columns) which you usually don't want to select: A
> bytea
> > > column or a very wide text column. In a program I don't mind (in fact I
> > > prefer) listing all the columns explicitly, but exploring a database
> > > interactively with psql typing lots of column names is tedious
> > > (especially since autocomplete doesn't work here).
> >
> > Forgot to add: I think that the syntax would have to be more explicit.
> > It's too easy to mix up
> > SELECT * - b.a_id FROM ...
> > and
> > SELECT *, - b.a_id FROM ...
> >
> > Maybe
> > SELECT * EXCEPT b.a_id FROM ...
>
> Yes, this was proposed on hackers a few months ago and a patch was
> proposed:
>
>
> https://www.postgresql.org/message-id/flat/892708.1634233481%40sss.pgh.pa.us#1f17923ad50a1442867162991c54ead9
>
> The last post was from October of 2021 so you can email the author to
> ask about its status.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>
>
>


Re: Can we go beyond the standard to make Postgres radically better?

2022-02-11 Thread Benedict Holland
So to summarize: people are bad programmers who refuse to learn SQL, so SQL
is the problem? Come on. You cannot bring that to a Postgres listserv.

Look, it's not perfect. It's a pain. It is hard to generate queries (oh my
God, why are you doing this?) and it's hard to work with. You are describing
C++ to Python programmers and arguing why no one should ever use C++. And
look, there are other languages that are way better, like Python. So why
would anyone ever choose C++?

Because the application is already in C++. Companies store petabytes of
data in SQL databases. With very simple commands, you can ensure that a
table with a billion rows enforces uniqueness across a 10-column combination
of values. Is FROM X SELECT Y better? Probably. Is print(x) better than
cout << x.p() << endl; ? Yep. But it would take some serious guts to wander
over to a C++ list and explain how it is the worst language, needs to be
scrapped, and there are way better languages out there.
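
The "very simple command" in question is just a multi-column unique constraint;
a made-up example (extend the column list as far as you need):

ALTER TABLE measurements
    ADD CONSTRAINT measurements_uniq
    UNIQUE (site_id, device_id, channel, taken_at);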

Also, no one really argues for SQL. We know it isn't great, but we also know
why it likely can't change, and it works.


On Fri, Feb 11, 2022, 9:26 AM Mladen Gogala  wrote:

> On 2/10/22 23:56, Guyren Howe wrote:
>
> On Feb 10, 2022, at 17:06 , Mladen Gogala  wrote:
>
>
> But SQL is a terrible, no good, very bad language.
>
>
> I cannot accept such a religious persecution of SQL without a detailed
> explanation.
>
>
> I feel like anyone who is defending SQL here isn’t aware of how much
> better the alternatives are, and how bad SQL really is.
>
> I mean: it’s hard to write, hard to read. It’s hard to generate. But
> that’s just the starting point.
>
> OK. If there are better alternatives, I am sure you will be able to sell
> them to the world. Establish a company and have a go at it.
>
>
>
> One of the worst things about it that I don’t see much discussed is that
> it imposes assumptions about the storage model that aren’t part of the
> relational model. Like heavyweight, hard to change tables with
> transactional guarantees and such. Don’t get me wrong, those things are
> great to have available, but I don’t need them all the time.
>
> Storage model and implementation are not part of SQL for good reason.
> Database vendors have different implementations. MySQL and Postgres are
> different. MySQL storage engines differ among themselves. Both of them are
> different from SQL Server which is in turn different from Oracle and DB2.
> Storage model determines the implementation. When there is only a single
> relational database vendor left in the market, then they can burn their
> storage implementation into the language standard. Until then, the more,
> the merrier.
>
>
>
>
> The whole NoSQL movement has been such a tragedy. Having diagnosed a
> problem with SQL databases, they threw out the relational model and very
> often reimplemented a form of SQL when they should have done the opposite.
> There is no reason you can’t have a relational database with an on-demand
> schema, with eventual consistency, with all those fun things that various
> NoSQL databases provide. Those storage models have their place, but the SQL
> standard says you can’t use them.
>
> But the biggest issue is the verbose, terrible, very bad query language.
> In the web development community where I spend my time, it is almost holy
> writ to treat the database as a dumb data bucket, and do everything in the
> application layer (even things like validations, even when that is a
> provably incorrect approach). Why? I think it’s because they’re used to
> working in a pleasant language like Ruby or Python, and they want to do
> everything there. And who can blame them?
>
> As a database architect who has successfully bridged two very different
> database systems, I can tell you that the application programmers put the
> business logic into the application because they frequently don't know what
> the options are. They know Java or Python and that's what they do, period.
> That has nothing to do with SQL.
>
>
> But this is bad. Proper relational design can take over much (most!) of
> the design of a typical business app, with significant efficiency gains the
> result. But no *community* is going to choose that when most of the members
> of the community don’t want to learn SQL and who can blame them?
>
> Business community which hires them to make efficient applications can
> blame them. And frequently does so.
>
>
> Another issue: everyone thinks “relational” is the same thing as “SQL”. If
> we could get folks to break that association, then relations should be a
> standard feature of high-level programming languages, just as arrays and
> hashes are.
>
> Heck, give me a functional programming language with a good relational
> model, and I can implement OOP in that relational language without breaking
> a sweat.
>
> Software *should* be designed around a logical/relational layer with
> minimal occasional forays into Turing completeness where necessary. But
> no-one is even thinking about software like that be

Re: Re: How long does iteration over 4-5 million rows usually take?

2022-04-02 Thread Benedict Holland
For work planning, create a temporary table, copy in a generated data
set, and test it out. It really depends. Normally, you don't iterate in SQL,
since rows are assumed to be independent. You think about SQL in terms of
sets and let application code handle the data transfers. It also really
depends on the size of your tables, indexing, joins, and more. Basically,
there isn't really an answer, but it isn't hard to test.
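
A quick sketch of the "generate a data set and test it" idea; the row count,
table, and update are arbitrary:

CREATE TEMP TABLE t AS
SELECT g AS id, md5(g::text) AS payload
FROM generate_series(1, 5000000) AS g;
CREATE INDEX ON t (id);
ANALYZE t;
EXPLAIN ANALYZE
UPDATE t SET payload = upper(payload) WHERE id % 100 = 0;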

Thanks,
Ben


On Sat, Apr 2, 2022, 10:16 AM Shaozhong SHI  wrote:

> Thanks, Karsten,
>
> I would like the information to work planning purpose.
>
> Regards,
>
> David
>
> On Sat, 2 Apr 2022 at 14:47, Karsten Hilbert 
> wrote:
>
>> > > On Apr 1, 2022, at 10:18 PM, Ron  wrote:
>> > >
>> > >  On 4/1/22 20:34, Shaozhong SHI wrote:
>> > >>
>> > >> I have a script running to iterate over 4-5 million rows.  It keeps
>> showing up in red in PgAdmin.  It remains active.
>> > >>
>> > >> How long does iteration over 4-5 million rows usually take?
>> >
>> > 4-5 million times as long as it takes to do one iteration ( if you’re
>> doing it correctly)
>>
>> It may not take quite that long because setup/teardown times might not be
>> needed for each iteration.
>>
>> Best,
>> Karsten
>>
>>
>>


Re: Are stored procedures/triggers common in your industry

2022-04-20 Thread Benedict Holland
It's a very weird concern to me. I have never liked that justification, as
we convert SAS to Python 1:1. If you use Django, converting it to Flask is
really hard. If you use PostgreSQL, converting it to Oracle is really hard.

I love stored procedures and triggers. Many of my colleagues don't
understand why sticking everything in the database is a great idea. These are
the same people who think that unique constraints are too much overhead.
It's a great tool to use. Do you need CRUD stored procedures when SQLAlchemy
exists? Nope. Do you need them when doing an extremely complex select
with multiple joins that you want to run all the time? Maybe. Or allowing
insert operations on base tables through a view. Or tracking and monitoring
who does what. Also, you can back up stored procedures to make updates easy.
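
As an illustration of the "tracking and monitoring who does what" case, a
small, hypothetical audit trigger (table names are made up; EXECUTE FUNCTION
needs PostgreSQL 11+, older versions use EXECUTE PROCEDURE):

CREATE TABLE audit_log (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    table_name text,
    operation text,
    changed_by text DEFAULT current_user,
    changed_at timestamptz DEFAULT now()
);

CREATE FUNCTION log_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, operation) VALUES (TG_TABLE_NAME, TG_OP);
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_audit
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION log_change();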

Thanks,
Ben

On Wed, Apr 20, 2022, 4:48 PM Alex Aquino  wrote:

> Agree on the lock-in comment; however, can't we say that of anything one
> is dependent on in the tech stack, whether that be Java vs
> JavaScript vs Python, or now AWS vs Azure vs GCP?
>
> I have always wondered why the lock-in concern seems to be mentioned only
> in light of DBs, but not any other piece of the tech stack.
>
> On Wed, Apr 20, 2022 at 3:31 PM Ravi Krishna 
> wrote:
>
>> >I've really only ever worked in web development. 90+% of web
>> >developers regard doing anything at all clever in the database with
>> suspicion.
>>
>> One common argument they use is that if you write your business logic in
>> stored procedure, you are locked to that database since stored procedure
>> languages are pretty much vendor locked.
>>
>> TBH when one sees tens of thousands of Oracle PL/SQL code, there is some
>> truth in this.
>>
>


Re: restore question

2022-07-19 Thread Benedict Holland
You have to create the database to restore to. Then pg_restore should work.
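
A minimal sketch, assuming a custom-format dump and a database called mydb:

createdb mydb
pg_restore -d mydb /path/to/mydb.dump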

Thanks,
Ben

On Tue, Jul 19, 2022, 12:44 PM Ronald Haynes  wrote:

> Thanks Ray, running
>
> psql -f backup-file.sql
>
>
>
> psql: error: FATAL: database "rhaynes74" does not exist
>
> Which seems odd since rhaynes74 is a user not a database name in the file.
>
> Sincerely,
>
> Dr. Ronald D. Haynes
> Professor, Department of Mathematics and Statistics
> Chair, MSc and Phd Scientific Computing Programs
> Memorial University of Newfoundland
>
> *We acknowledge that the lands on which Memorial University’s campuses are
> situated are in the traditional territories of diverse Indigenous groups,
> and we acknowledge with respect the diverse histories and cultures of the
> Beothuk, Mi’kmaq, Innu, and Inuit of this province.*
> On Jul 19, 2022, 9:40 AM -0230, Ray O'Donnell , wrote:
>
> On 19/07/2022 11:11, Ronald Haynes wrote:
>
> Hi folks,  I am a fairly novice postgresql user.
>
>
>
> I have a backup file for a couple of small postgresql databases.The
> backup file was created using postgresql 12.2, my Mac now has a
> postgresql 13.x or 14.x versions.   I would like to restore the setup
> from the backup file.
>
>
> That looks like a dump file generated by pg_dumpall, so restoring it
> should be as simple as:
>
> psql -f backup-file.sql
>
> You'll possibly need other connection parameters as required by psql,
> such as -U (the PostgreSQL user to use for connecting) or others.
>
> HTH,
>
> Ray.
>
>
> --
> Raymond O'Donnell // Galway // Ireland
> r...@rodonnell.ie
>
>


create extension points to the wrong directory

2019-06-28 Thread Benedict Holland
Hello all,

I really can't figure this one out. I am trying to get debugging working on a
PostgreSQL 10 database on Ubuntu. I installed the proper package,
postgresql-10-pldebugger. When I go to create the extension, I get the
error:

create extension pldbgapi
ERROR: could not open extension control file
"/usr/share/postgresql/9.5/extension/pldbgapi.control": No such file or
directory

Which is true, because it is in
/usr/share/postgresql/10/extension/pldbgapi.control. I am trying to run
this command in pgAdmin4. I am stuck. There doesn't seem to be a
configuration option. Any ideas?

Thanks,
~Ben


Re: Machine learning top of db logs

2019-07-12 Thread Benedict Holland
I have been thinking about how to apply NLP work to logs. The problem is
that logs are structured data. It would depend on what you are looking for,
what your outcomes are, and how you want to generate them. The other problem
is that your problems are going to be very rare events. Rare-event
detection is its own special mess. Like, if you have 1,000 users generating
million-line log files, how do you pick out the 10 lines that are bad? You also
need enough of those rare events to be representative, so you
might need, say, 100-million-line log files with a known 100 or 1,000 issues
to start getting models to work.

Thanks,
~Ben

On Fri, Jul 12, 2019 at 2:44 AM Toomas Kristin 
wrote:

> Hi,
>
> Basically seems that data science and machine learning are going to be
> more and more popular at every field of life. I have considered to use
> machine learning top of logs generated by PostgreSQL servers. However
> before I start maybe someone already has done that and can share some
> experience?
>
> Toomas
>
>
>
>


Does pgadmin4 work with postgresql 8.4?

2019-08-05 Thread Benedict Holland
The subject says it all. I am using a very old database that I can't
upgrade. Do I have to manage it with pgAdmin3, or can I use pgAdmin4?

Thanks,
~Ben


Re: Does pgadmin4 work with postgresql 8.4?

2019-08-05 Thread Benedict Holland
Yeah, I did look in the docs. What I really don't want is for some strange
feature that is unique to 8.4 to lack support -- a select option, for
example. I am fairly sure most of it will work, but by "not supported" do they
mean it wasn't tested but will probably work, or that it really isn't
supported at all? Either is fine, as I can use pgAdmin3 if I have to, but I
would prefer at least one piece of software to have support rather than none.

Thanks,
~Ben

On Mon, Aug 5, 2019, 4:48 PM Adrian Klaver 
wrote:

> On 8/5/19 1:15 PM, Benedict Holland wrote:
> > The subject says it all. I am using a very old database that I cant
> > upgrade. Do I have to manage it with pgadmin3 or can I use postgresql 4?
>
> Trolling the docs does not offer any insight. If you have pgAdmin4
> already installed I would just set up a connection to the 8.4 instance
> and see if it works:)
>
> >
> > Thanks,
> > ~Ben
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Does pgadmin4 work with postgresql 8.4?

2019-08-06 Thread Benedict Holland
We can't upgrade because of business reasons, but that is very high on our
agenda to fix. To me, there is a huge difference between "unsupported" and
"won't work". I don't expect support for this, but are there significant changes
between 8.4 and 9.2 that would prevent pgAdmin4 from working?

Thanks,
~Ben

On Tue, Aug 6, 2019, 3:21 AM Luca Ferrari  wrote:

> On Tue, Aug 6, 2019 at 7:02 AM Murtuza Zabuawala
>  wrote:
> > It is mentioned on the front page of https://www.pgadmin.org/
> > (Check introduction texts on the elephant wallpaper)
>
> "pgAdmin may be used on Linux, Unix, Mac OS X and Windows to manage
> PostgreSQL 9.2 and above."
>
> However, I would suspect it can work even on older versions.
>
> Luca
>


Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-07 Thread Benedict Holland
All stored procedures run in their own transaction. My guess is that when
you turn off autocommit, you are not committing something after your
transaction ends. Also, I have never seen a rollback at the start of a
proc. A pure guess is that it is doing nothing, or definitely not
what you think it is.

BTW, this is why you can't close the calling transaction from within a stored
procedure. It doesn't make sense.

Thanks,
~Ben

On Wed, Aug 7, 2019, 4:41 PM David G. Johnston 
wrote:

> On Wed, Aug 7, 2019 at 12:18 PM Luca Ferrari  wrote:
>
>> On Wed, Aug 7, 2019 at 8:28 PM Bryn Llewellyn  wrote:
>>
>> > B.t.w., I noticed that “set transaction isolation level serializable”
>> must be the very first statement after “rollback” (or “commit”). Even an
>> invocation of txid_current() after the rollback and before the ““set
>> transaction” causes this runtime error: “SET TRANSACTION ISOLATION LEVEL
>> must be called before any query”.
>> >
>>
>> Well, SET TRANSACTION ISOLATION must be the very first instruction of
>> every transaction, not only within the case you describe.
>>
>
> IMHO, The documentation, probably in chapter 13, could use some exposition
> on this topic.
>
> What is being described here is basically:
>
> [implicit] BEGIN (using default transaction options)
> CALL
> BEGIN (with inherited default transactions options)
> ROLLBACK (to get rid of the nested transaction setup by the call with
> the inherited default options)
> START WITH OPTIONS
> COMMIT;
> <<< [implicit] COMMIT
>
> As far as psql is concerned there is only one statement and once its
> executed psql issues the implicit commit to match the implicit begin is
> sent previously.
>
> It should be better documented which combinations of outer and inner
> transaction commands are considered valid and which are not.  WIth
> examples.  The current scattering of words leaves the user to perform
> trial-and-error, just as the OP has, to determine what is allowed.
>
> The nested transaction seems like it has to be correct since otherwise the
> rollback as a first statement would attempt to rollback the transaction the
> call itself is executing within...
>
> Note I only have access to v10 at the moment so I haven't tried my own
> experiments.  To my main point I shouldn't have to - the expected behavior
> should be something I could directly interpret from the documentation and
> in my admitted brief attempt I could not do so.
>
> David J.
>
>
>


Re: Input validation

2019-08-07 Thread Benedict Holland
A plain check constraint won't help here, since it can't look at another table;
I second triggers, or a composite foreign key on the (employee, work type) pair.
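
A rough sketch of the trigger route; the table and column names are guesses at
the schema described, and a composite foreign key on (employee_id, work_type_id)
referencing the rate table would enforce the same thing declaratively:

CREATE FUNCTION check_rate_exists() RETURNS trigger AS $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM billing_rate
                   WHERE employee_id = NEW.employee_id
                     AND work_type_id = NEW.work_type_id) THEN
        RAISE EXCEPTION 'no billing rate for employee % / work type %',
            NEW.employee_id, NEW.work_type_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER work_entry_check
BEFORE INSERT ON work_entry
FOR EACH ROW EXECUTE FUNCTION check_rate_exists();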

Thanks,
~Ben

On Wed, Aug 7, 2019, 2:21 PM Rob Sargent  wrote:

>
> On 8/7/19 12:07 PM, stan wrote:
> >   Have table that contains employee keys, paired up with work type keys
> >   (both foreign keys) and a 3rd column that you enter a billing rate in.
> >   Then I have a table where employees enter their work. I need to
> validate
> >   that the employee, work type pair exists, before allowing the new
> record
> >   to be inserted.
> >
> >   Any thoughts as to good way to do this?
> >
> >
> Does the employee interactively specify the "work type" then some
> time-spent value?
>
> Can the work-type be chosen from a drop-down generated by
>
>  select work_type from table where employee = 
>
> Otherwise you'll need a trigger on the insert into "enter their work"
> table.  Sad thing here is the user has likely left the scene.
>
>
>
>


Re: Understanding PostgreSQL installer debug log

2019-08-09 Thread Benedict Holland
Does the problem go away if you install pg11? Are the drives you are
getting your logs from encrypted?

Thanks,
~Ben

On Fri, Aug 9, 2019, 3:17 AM Luca Ferrari  wrote:

> On Fri, Aug 9, 2019 at 8:46 AM Ramesh Maddi 
> wrote:
> > ./postgresql-9.6.6-1-linux-x64.run --installer-language en
> --serviceaccount postgres --servicename postgresqld  --datadir
> "/home/postgres/" --prefix  "/home/postgres" --superpassword 1234
> --serverport 5432 --debuglevel 4 --debugtrace ./postgresql-debug.log --mode
> unattended
> >
>
> I suspect this is a dump produced by Qt used by EDB installer, maybe
> you should ask support to them for this:
>
> 
> 18.8.0
> Linux 4.18.0-25-generic x86_64
> 
>
> Please note that, in my case, I was able to get a dump immediately because
> the data directory did not exist. Why don't you use at least attended
> mode to see if it is something as trivial as in my case?
>
> Beside, is there a specific reason why you are not using
> distro-specific packages? See the note here
> .
> Luca
>
>
>


Re: Understanding PostgreSQL installer debug log

2019-08-09 Thread Benedict Holland
It looks like your encoding is correct; you are getting letters. If your
encoding were just wrong, you would end up with a lot of strange characters.
If this is tied to one client, it sounds like an encryption issue:
mounted drives for logging that the client can't decrypt.

Thanks,
~Ben

On Fri, Aug 9, 2019, 10:45 AM Adrian Klaver 
wrote:

> On 8/9/19 4:25 AM, Ramesh Maddi wrote:
> > Our product is certified along with PostgreSQL 9.6.6 only. We cannot use
> > further version of PostgreSQL at this time in production until it is
> > certified for our product( which is integrated may other components)
> > and is officially released.
> >
>
> >
> > Coming to *Luca's *question, I understand that using Yum/RPM is better
> > way of installation. That would be our future plan. But for this
> > particular issue, we need to identify debug steps.
>
> Might try the EDB forum:
>
> https://postgresrocks.enterprisedb.com/t5/EDB-Postgres/bd-p/EDBPostgres
>
> >
> > Thanks for your insight.
> > -- Ramesh
> >
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Authentication?

2018-03-07 Thread Benedict Holland
Not to get off topic, can you authenticate database users via Kerberos?

Thanks,
~Ben

On Wed, Mar 7, 2018 at 10:19 AM, Stephen Frost  wrote:

> Greetings,
>
> * Bjørn T Johansen (b...@havleik.no) wrote:
> > Is it possible to use one authentication method as default, like LDAP,
> and if the user is not found, then try to authenticate using
> > md5/scram-sha-256 ?
>
> Not directly in pg_hba.conf.  You might be able to construct a system
> which works like this using PAM though, but it wouldn't be much fun.
>
> LDAP use really should be discouraged as it involves sending the
> password to the PG server.  If you are operating in an active directory
> environment then you should be using GSSAPI/Kerberos.
>
> SCRAM is a good alternative as it doesn't send the password to the
> server either, though that is only available in PG10, of course.
>
> Thanks!
>
> Stephen
>
>


Re: How to revoke privileged from PostgreSQL's superuser

2018-08-10 Thread Benedict Holland
The short answer I will provide from my experience is that you can't do it.
Your DBA will have access to just about anything across all tables and
databases.

The longer answers are the ones that others have pointed out. If a DBA should be
restricted from tables, they probably shouldn't be your DBA. Your DBA will
likely be the one responsible, for example, for backing up all of the
databases on a server. That requires read access and an understanding of how to
back up sensitive data securely. It is also possible that they are
running backups as their own user rather than as postgres. If you don't want
DBAs to be able to access your data, you are effectively saying you don't want
that data backed up.

I also would take Bruce's comment with a massive grain of salt. Everything
that everyone does on a database is logged somewhere, assuming proper
logging is configured. Now, do you have the person-power to go through gigs of
plain-text logs to find out if someone is doing something shady? That is a
question for your management team. Also, if you suspect someone of doing
something shady, you should probably revoke their admin rights.

~Ben


On Fri, Aug 10, 2018 at 3:41 PM, Bruce Momjian  wrote:

> On Mon, Aug  6, 2018 at 06:19:55AM -0700, David G. Johnston wrote:
> > On Monday, August 6, 2018,  wrote:
> >
> >
> > I have a request for revoking the access to user's data from
> DBA-user.
> > I think the request is right because users should be the only ones
> can
> > access their data.
> >
> >
> > User then needs to encrypt data prior to storing it.  Superuser can still
> > access the data but would be challenged to make sense of it,
>
> Keep in mind DBAs can often remove data with little detection, unless
> you are using some kind of block chain, which itself can force
> serialized data access, slowing things down.
>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +  Ancient Roman grave inscription +
>
>


Re: How to revoke privileged from PostgreSQL's superuser

2018-08-10 Thread Benedict Holland
Oh, this is an easy one. There are simpler solutions for PITR. Also, PITR
is a very specific case of database use, if it even uses one. Generally
speaking, you would not want to keep encrypted data within a database.
There simply isn't a need for it. Just use a file or a folder. You can't do
anything that you would normally do with a database if you can't read or
access any of your objects. It would just be a table of binary objects
without names, possibly with access or creation dates depending on the level of
paranoia. Literally, you would have an int column and a binary object
column. What can you honestly do with that? You can't even link it to other
objects. It has no relational structure, hence the question. If there isn't
a relationship to anything, then a relational database wouldn't really help
anything.

Also, I would probably keep the encryption key within the database anyway.
Otherwise, your objects could get permanently lost making the whole thing
moot in the first place.

Look, you either trust your DBAs or you don't. If you don't trust them, why
are they your DBA? This is like writing unit tests for unit tests or having
even higher levels of privilege than a superuser. It's turtles all the way
down.

~Ben


On Fri, Aug 10, 2018 at 4:12 PM, Rui DeSousa  wrote:

>
>
> On Aug 6, 2018, at 10:45 AM, Bear Giles  wrote:
>
> then it's reasonable to ask if storing the information in a relational
> database is the best approach.
>
>
> Why? Just because its encrypted doesn’t mean it shouldn’t be stored in the
> database. What about PITR, how would that be handled?  You basically would
> have to reimplement things the RDBMS system gives you for free by storing
> it outside the database.  Don’t forget it's called a management system for
> a reason.
>


Re: COMMIT IN STORED PROCEDURE WHILE IN A LOOP

2022-10-19 Thread Benedict Holland
This seems like a bad use of a stored procedure. Why wouldn't you spin up
40 clients with a table name and run it across 40 connections? But also, I
don't like loops in stored procedures. Working with loops in a set-based
system hurts me, but it's personal preference.

Like, I could write this in Python in less than 4 hours with an extremely
simple sproc. How often are you running table deletions like this to
require a sproc?
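
A very rough sketch of the "40 clients" idea from the shell; the database, the
list of tables, and the delete condition are all placeholders:

cat tables.txt | xargs -P 40 -I{} \
    psql -d mydb -c "DELETE FROM {} WHERE created_at < now() - interval '90 days'"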

Thanks,
Ben

On Wed, Oct 19, 2022, 5:39 PM Thomas Kellerer  wrote:

> gogala.mla...@gmail.com wrote on 19.10.2022 at 01:46:
> > Amazon, lead by Kevin Closson, the guy who has famously designed
> > Oracle Exadata among other things, even came up with the recipe how
> > to migrate it to Postgres:
> >
> >
> https://aws.amazon.com/blogs/database/migrating-oracle-autonomous-transactions-to-postgresql/
>
>
> The workaround to use dblink to simulate autonomous transactions is
> nothing new,
> and has been floating around for quite a while.
>
> Here is a blog post from 2012 (6 years before the Amazon recipe was
> published)
>
> https://raghavt.blog/autonomous-transaction-in-postgresql-9-1/
>
> and another one from 2016
>
> https://blog.dalibo.com/2016/09/08/Pragma_Autonoumous_transactions.html
>
>
>
>


Re: Regd. the Implementation of Wallet (in Oracle) config equivalent in postgreSQL whilst the database migration

2022-12-21 Thread Benedict Holland
What would you be missing? You can encrypt databases. You can encrypt the
S3 buckets using KMS. You can govern access via SSH auth. When you do
backups, you can encrypt the tar.gz files (or whatever format) and store them
on S3. Same with the WAL files. The fact that Oracle charges for this is a
joke. Of course, you would need to ensure compliance with your opsec teams
and stick with best security practices, but it seems top-to-bottom
encryption is unrelated, or only tangentially related, to the database itself.
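
One way to do the "encrypt the backups" part; the tool choice and paths here
are just an example, not the only option:

pg_dump -Fc -f /backups/mydb.dump mydb
gpg --symmetric --cipher-algo AES256 -o /backups/mydb.dump.gpg /backups/mydb.dump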

Also, if you lose the encryption keys for your backups, then bad things
happen. I doubt what I did was production-viable, but I limited database
access to a handful of users, encrypted the disks, left the WAL files
unencrypted but mounted with read access for a single user, compressed full
backups with encryption and a password, generated SSH keys for anyone who
needed service accounts to access the systems, enforced database ownership
permissions, and gave server access to a tiny team with 2FA. The way I
figured it, if someone somehow rooted the box we were screwed anyway.

For an internal database, this seemed sufficient. For an external database,
I would highly recommend hiring a paid security consulting firm, or people who
know how to build an externally facing platform.

Thanks
Ben

On Wed, Dec 21, 2022, 4:39 PM Rainer Duffner  wrote:

>
>
> On 21.12.2022 at 22:34, Laurenz Albe wrote:
>
> There is no exact equivalent, but there is something similar and much
> better: you can
> authenticate the client with SSL client certificates:
> https://www.postgresql.org/docs/current/auth-cert.html
>
>
>
> Isn’t the wallet the part where the encryption keys are stored?
>
> Indeed, one of the few remaining features that only Oracle (and of course
> other commercial RDBMSs) has seems to be full HSM support for TDE.
>
>
> Rainer
>


Re: Regd. the Implementation of Wallet (in Oracle) config equivalent in postgreSQL whilst the database migration

2022-12-21 Thread Benedict Holland
Yeah, I wasn't clear. I tar up the dump files as part of the command. "Don't
lose your keys" is awesome in theory. AWS comes with managed keys. The WAL
archive command will let you encrypt your WAL files too, but if I were a hacker
then I would also copy SSH folders. It depends on security and business needs.
If I migrated this to a production environment we would have quite a bit
more infrastructure: likely a dedicated S3 location for WAL files, backups,
redundancy, K8s for nodes, a complex file server, and a bunch more stuff
that you pay people to set up, configure, and manage.

Like, does Oracle give you something more? Probably. It's also a ton of
money, and I mean a genuine ton. At that point, you also need security
audits, security protocols, requirements, backup and retention policies,
and redundant key locations. If someone has root, I don't know how they
also don't have your encryption keys. Maybe they don't if the keys are on a USB
drive, but then the USB drive breaks or gets lost. Unencrypted WAL files on an
RW partition with a single user able to read or write, along with a nightly
encrypted pg_dump, might meet security needs. Or not. But that is true about
anything. I have never set up a credit card database. I assume those are locked
down way more than I ever will lock down a system.

On Wed, Dec 21, 2022, 6:13 PM Ron  wrote:

> And encrypting a tar.gz file presumes a pretty small database.  (The
> --jobs= option was added to pg_dump/pg_restore for just that reason.)
>
> On 12/21/22 16:25, Benedict Holland wrote:
>
> What would you be missing? You can encrypt databases. You can encrypt the
> s3 buckets using kms. You can govern access via ssh Auth. When you do
> backups, you can encrypt the tar.gz files or whatever format and store it
> on s3. Same with the wal files. The fact that oracle charges for this is a
> joke. Of course, you would need to ensure compliance with your opsec teams
> and stuck with best security practices but it seems top to bottom
> encryption is unrelated or tangentially related to the databases.
>
> Also, if you lose the encryption keys for your backups then bad things
> happen. I doubt what I did was production viable but I limited database
> access to a handful of users, encrypted the disks, left the Wal files
> unencrypted but mounted with read access for a single user, compressed full
> backups with encryption and a password, generated sah keys for anyone who
> needed service accounts to access the systems, enforced database ownership
> permissions, and and gave server access to a tiny team with 2fa. The way 8
> figured it, if someone somehow rooted the box we were screwed anyway.
>
> For an internal database, this seemed sufficient. For an external
> database, I would highly recommend paid consulting security firms or hire
> people who know to build an externally facing platform.
>
> Thanks
> Ben
>
> On Wed, Dec 21, 2022, 4:39 PM Rainer Duffner 
> wrote:
>
>>
>>
>> Am 21.12.2022 um 22:34 schrieb Laurenz Albe :
>>
>> There is no exact equivalent, but there is something similar and much
>> better: you can
>> authenticate the client with SSL client certificates:
>> https://www.postgresql.org/docs/current/auth-cert.html
>>
>>
>>
>> Isn’t the wallet the part where the encryption keys are stored?
>>
>> Indeed, one of the few remaining features that only Oracle (and of course
>> other commercial RDMSs) has seems to be full HSM support for TDE.
>>
>>
>> Rainer
>>
>
> --
> Angular momentum makes the world go 'round.
>


Re: Sequence vs UUID

2023-01-26 Thread Benedict Holland
You could always create a UUID matching table for anything displayed to
users and keep a private ID for anything internal. From my practical
standpoint, one is 8 bytes, the other is 16 (32 characters as text). Any
database implementation should guarantee a unique value. I have had cases
where it didn't work, but those were rare and possibly a coding error.
Basically, I don't see a practical difference apart from URL access and giving
away a PK. I haven't had an issue with it. Some people have. I still use
sequences internally, but my requirements are not public facing.
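
A small sketch of the "UUID for display, private ID internally" idea; names are
made up, and gen_random_uuid() is built in from PostgreSQL 13 (older versions
need pgcrypto or uuid-ossp):

CREATE TABLE customers (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,        -- internal 8-byte key
    public_id uuid NOT NULL DEFAULT gen_random_uuid() UNIQUE,  -- what URLs and users see
    name text NOT NULL
);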

Thanks,
Ben



On Thu, Jan 26, 2023, 4:55 PM Erik Wienhold  wrote:

> > On 26/01/2023 20:17 CET veem v  wrote:
> >
> > Hello, We were trying to understand whether we should use UUID or
> Sequence in
> > general for primary keys. In many of the blogs (one is below) across
> multiple
> > databases, I saw over the internet and all are mostly stating the
> sequence is
> > better as compared to UUID. But I think in the case of concurrent data
> load
> > scenarios UUID will spread the contention point whereas sequence can be a
> > single point of contention.
> >
> > So we want to understand from experts here, if there are any clear rules
> > available or if we have any pros vs cons list available for each of
> those to
> > understand the exact scenario in which we should go for one over other?
> > Basically I wanted to see if we can perform some test on sample data to
> see
> > the percentage of overhead on read and write performances of the query in
> > presence of UUID VS Sequence to draw some conclusion in general? And also
> > considering open source postgres as the base for many databases like
> redshift
> > etc, so the results which apply to Postgres would apply to others as
> well.
> >
> >
> https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/
>
> I think that[1] provides a good summary.  Performance consideration is
> just one
> aspect.  Is there a technical requirement for using UUID over sequential
> values?
>
> If there's a single generator of primary keys use bigint sequences.  In
> case of
> multiple generators (multi-master replication, sharding, clients
> generating IDs)
> consider UUID.
>
> There are arguments against sequential PK, e.g. they give away too much
> info and
> allow attacks such as forced browsing[2].  The first I can understand: you
> may
> not want to reveal the number of users or customers.  But access control
> should
> prevent forced browsing.
>
> [1]
> https://www.cybertec-postgresql.com/en/uuid-serial-or-identity-columns-for-postgresql-auto-generated-primary-keys/
> [2] https://owasp.org/www-community/attacks/Forced_browsing
>
> --
> Erik
>
>
>


Re: Sequence vs UUID

2023-01-28 Thread Benedict Holland
Why is it a terrible idea? I have been using them for years without a
single problem. I don't rely on them for creation order. Terrible seems a bit
extreme.

Thanks,
Ben

On Sat, Jan 28, 2023, 3:39 PM Erik Wienhold  wrote:

> > On 27/01/2023 01:48 CET Ron  wrote:
> >
> > On 1/26/23 15:55, Erik Wienhold wrote:
> > >
> > > There are arguments against sequential PK, e.g. they give away too
> much info and
> > > allow attacks such as forced browsing[2].  The first I can understand:
> you may
> > > not want to reveal the number of users or customers.  But access
> control should
> > > prevent forced browsing.
> >
> > Shouldn't your application layer isolate the users from the database?
> UUIDs
> > are all over the DBs I manage, but the PKs are all sequences.
>
> Yes, I meant the application layer, not Postgres' access control.
>
> --
> Erik
>
>
>


Re: Sequence vs UUID

2023-02-02 Thread Benedict Holland
Well... until two processes generate an identical UUID. That happened to me
several times. It's rare but when that happens, oh boy that is a mess to
figure out.

Thanks,
Ben

On Thu, Feb 2, 2023, 10:17 AM Miles Elam  wrote:

> On Wed, Feb 1, 2023 at 10:48 AM Kirk Wolak  wrote:
>
>>
>>
>> On Wed, Feb 1, 2023 at 1:34 PM veem v  wrote:
>>
>>>
>>> 1) sequence generation vs UUID generation, execution time increased from
>>> ~291ms to 5655ms.
>>> 2) Insert performance of "sequence" vs "UUID"  execution time increased
>>> from ~2031ms to 10599ms.
>>> 3) Index performance for sequence vs UUID,  execution time increased
>>> from ~.3ms to .5ms.
>>>
>>>
>> Yes, assuming that UUIDs would be efficient as keys when they are
>> randomly generated, versus sequences (which tend to expand in one
>> direction, and have been relatively optimized for years).
>>
>> This article explains in detail what is going on.  If I were doing this,
>> I would strongly consider a ULID because of getting the best of both worlds.
>> https://blog.daveallie.com/ulid-primary-keys
>>
>> Of course, YMMV...  And since ULIDs are not native to PG, there is
>> overhead, but it is far more performant, IMO...
>>
>
> Biased comparison. ULIDs have a timestamp component. The closest UUID
> equivalent in Postgres is UUIDv1 from the uuid-ossp extension, not v4.
> Another difference not mentioned in the blog article is that UUID is
> versioned, meaning you can figure out what kind of data is in the UUID,
> whereas ULIDs are a "one size fits all" solution.
>
> There is an implementation of sequential UUIDs for Postgres I posted
> earlier in this thread. In addition, here is an implementation of UUIDv7
> for Postgres:
>
> https://gist.github.com/kjmph/5bd772b2c2df145aa645b837da7eca74
>
> I would suggest running your tests against v1, v7, and sequential UUID
> before jumping on ULID, which has no native type/indexing in Postgres.
>
> It should also be noted that apps cannot provide a bigint ID due to
> collisions, but an app can generate UUIDs and ULIDs without fear,
> essentially shifting the generation time metric in UUID/ULID's favor over a
> bigserial.
>
> - Miles
>
>
>


Re: Sequence vs UUID

2023-02-02 Thread Benedict Holland
No idea at all. We had the data for the insert and had to insert it again.
It was extremely confusing but oh boy did it wreck our systems.

Thanks,
Ben

On Thu, Feb 2, 2023, 6:17 PM Ron  wrote:

> On 2/2/23 17:11, Peter J. Holzer wrote:
> > On 2023-02-02 10:22:09 -0500, Benedict Holland wrote:
> >> Well... until two processes generate an identical UUID. That happened
> to me
> >> several times.
> > How did that happen? Pure software implementation with non-random seed?
> > Hardware with insufficient entropy source?
>
> Poorly implemented algorithm?
>
> --
> Born in Arizona, moved to Babylonia.
>
>
>


Re: UPSERT in Postgres

2023-04-06 Thread Benedict Holland
That is the answer. PostgreSQL can upsert easily via triggers and ON
CONFLICT.

Thanks,
Ben

On Thu, Apr 6, 2023, 5:01 PM Adrian Klaver 
wrote:

> On 4/5/23 23:21, Louis Tian wrote:
> > This is a question/feature request.
> >
>
> > Given the definition of upsert, I'd expect an upsert command to do the
> following.
> > - `upsert into person (id, name) values (0, 'foo')` to insert a new row
> > - `upsert into person (id, is_active) values (0, true)` updates the
> is_active column for the row inserted above
> >
> > Naturally, since there isn't a real upsert command in PostgreSQL this
> won't work today.
> > But can we achieve the same effect with "INSERT ... ON CONFLICT DO
> UPDATE" like a lot of references on the internet seems to suggest.
> >
> > insert into person (id, name) values (0, 'foo') on conflict ("id") do
> update set id=excluded.id, name=excluded.name
> > insert into person (id, is_active) values (0, true) on conflict
> ("id") do update set id=excluded.id, is_active=excluded.is_active
>
> insert into person (id, name, is_active) values (0, '', true) on
> conflict ("id") do update set id=excluded.id, name=person.name,
> is_active=excluded.is_active ;
> INSERT 0 1
>
> select * from person;
>  id | name | is_active
> ----+------+-----------
>   0 | foo  | t
>
> >
> > Unfortunately. the second statement will fail due to violation of the
> not null constraint on the "name" column.
> > PostgreSQL will always try to insert the row into the table first. and
> only fallback to update when the uniqueness constraint is violated.
> > Is this behavior wrong? maybe not, I think it is doing what it reads
> quite literally.
> > That being said, I have never had a need for the ON CONFLICT DO UPDATE
> statement other than where I need upsert.
> > But using it as "upsert" is only valid when the table is absent of any
> NOT NULL constraint on its non-primary-key columns.
> > So, if my experience/use case is typical (meaning the main purpose / use
> case for ON CONFLICT DO UPDATE is to support upsert) then it can be argued
> that the current behavior is incorrect?
> >
> > This has been a source of confusion, to say the least.
> >
> https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
> >
> https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f
> >
> > The MERGE command introduced in PG15 in theory can be used to do UPSERT
> properly that is void of the aforementioned limitation.
> > The downside is it is rather verbose.
> >
> > *Question*
> > This there a way to do an upsert proper prior to PG15?
> >
> > *Feature Request*
> > Given that UPSERT is an *idempotent* operator it is extremely useful.
> > Would love to see an UPSERT command in PostgreSQL so one can 'upsert'
> properly and easily.
> >
> >
> > Regards,
> > Louis Tian
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
>


Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-08 Thread Benedict Holland
This went on too long. ON CONFLICT is the solution. It has been since 9.5,
when it was introduced. I have run it in a production stored proc without a
single problem.

This is an actual and literal solution.
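
A rough sketch of the shape I mean (names made up; it assumes a unique or
primary key on id):

create or replace function upsert_person(p_id bigint, p_name text)
returns void as $$
    insert into person (id, name)
    values (p_id, p_name)
    on conflict (id) do update
        set name = excluded.name;
$$ language sql;

select upsert_person(0, 'foo');   -- inserts the first time, updates after that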

Thanks,
Ben

On Sat, Apr 8, 2023, 5:51 PM Adrian Klaver 
wrote:

> On 4/6/23 17:49, Louis Tian wrote:
> > Hi Peter,
> >
> > Thanks for your reply. Appreciate the help and discussion.
> >
> >>   In general UPSERT (or any definition of it that I can think of) does
> >> not imply idempotency.
> >
> > "Idempotence is the property of
> certain operations in mathematics and computer science whereby they can be
> applied multiple times without changing the result beyond the initial
> application." from Wikipedia.
> > the concept of Idempotence when applied to HTTP is consistent with the
> above. https://developer.mozilla.org/en-US/docs/Glossary/Idempotent. Or
> are you going by a different definition that I am not aware of?
> > If you execute the same upsert multiple times, the state of the database
> remains the same as if only execute once.
> > If a row already exists, the first statement will update the row, as will
> any subsequent statements. Executing the same update multiple times is the
> same as executing it only once.
> > If the row doesn't exist, the first statement will insert that row and
> any subsequent ones will try to update, but the update has no real effect
> since the value is exactly the same as the insert.
> > So by definition, upsert is idempotent.
>
> No it is not as Israel Brewster pointed out.
>
> To his example I would add:
>
> alter some_table add column ts_upsert_update timestamptz;
>
> insert into some_table values('foo', 'bar') on conflict(tbl_id) do
> update set foo_fld = excluded.foo_fld, bar_fld = some_table.bar_fld,
> ts_upsert_update = now();
>
> You are substituting whatever definition you have in your head for the
> definition as it actually exists.
>
> >
> >> It could just be a unique index or a unique constraint. So you can
> >> upsert on any individual unique constraint/index, or the primary key.
> >> Of course there might be several on a given table, but you can only
> >> use one as the "conflict arbiter" per statement.
> >
> > Understand that I can use any unique constraint with on conflict.
> > But semantically the only correct one is the primary key, since that's
> what identifies a row logically.
> > In that sense, any unique column(s) is a potential candidate for primary
> key.
> > It's more of a pedantic point rather than pragmatic one.
> > It's less of a problem for PostgreSQL, where the semantic importance of
> the primary key is not manifested at the implementation level, since all
> indexes point to the tuple directly.
> Whereas it is more important for databases like MySQL, where the secondary
> index points to the primary key index.
>
> Again you are dealing with the imagined instead of the reality. Along
> that line you left out that a 'exclusion constraint violation error' can
> also trigger the ON CONFLICT.
>
>
> > Using some pseudo code might be helpful here to explain the difference.
> >
> > How on conflict works at the moment.
> >
> > try {
> >   insert row
> > } catch (duplicated key error) {
> >update row
> > }
> >
> > How I think it upsert should work
>
> And therein lies your problem, you are imagining something that does not
> exist and more to the point will most likely not exist as it would break
> all code that depends on above behavior.
>
> >
> > if (new.id exists) {
> >   update row
> > } else {
> >   insert row
> > }
> >
>
> > I am not expecting an error here. The problem is that with no conflict it
> always goes down the insert path first and results in a not null constraint
> error.
>
> To be expected, as after all the command is:
>
> INSERT INTO  ...
>
> > While I am expecting the insert is never executed in the first place
> when that row already exists (as identified by its primary key), so the
> update executes without error.
> > I hope the pseudo code above is enough to clarify the difference?
> >
>
> > Cheers,
> > Louis Tian
> >
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
>


Re: TEXT column > 1Gb

2023-04-12 Thread Benedict Holland
For documents that long I would seriously consider using large objects and
referencing them with their OIDs. Long text values get moved out of line into
a separate storage area (TOAST) within the database, which is similar in
spirit to using large objects. Also, you can potentially compress them to
save space on write and read. 1 GB of text is a lot of text.

See https://www.postgresql.org/docs/current/largeobjects.html
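
A rough sketch of the shape (table and names made up): keep a small row per
document and stash the body as a large object, referenced by OID.

create table documents (
    id      bigserial primary key,
    name    text not null,
    doc_oid oid  not null    -- reference to the large object holding the body
);

-- store: create the large object from bytea and keep its OID
insert into documents (name, doc_oid)
values ('chr1.fasta', lo_from_bytea(0, 'ACGTACGT...'::bytea));

-- read a 1 MB slice without pulling the whole object into memory
select lo_get(doc_oid, 0, 1024 * 1024)
  from documents
 where name = 'chr1.fasta';

-- remember to unlink the large object when the row goes away
select lo_unlink(doc_oid) from documents where id = 1;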

Thanks,
Ben

On Wed, Apr 12, 2023, 1:20 PM Joe Carlson  wrote:

> I’ve certainly thought about using a different representation. A factor of
> 2x would be good, for a while anyway. For nucleotide sequence, we’d need to
> consider a 10 character alphabet (A, C, G, T, N and the lower case forms
> when representing ’soft masked’ sequence*). So it would be 2 bases/byte.
> (Proteins are not nearly so long so a straight storage is simpler.) But
> these would be bigger changes on the client side than storing in chunks so
> I think this is the way to go.
>
> We’re working with plant genomes, which compared to human chromosomes, are
> HUGE. One chromosome of fava bean is over a gig. And pine tree is another
> monster. This, together with the fact that sequence data collection and
> assembly have improved so much in the past couple years has forced us to
> rethink a lot of our data storage assumptions.
>
> * for those curious, especially in plants, much of the sequence consists of
> repetitive elements that are remnants of ancient viruses, simple repeats, and
> the like. For people who want to identify particular functional components
> in a genome, they typically do not want to search against this sequence but
> restrict searching to coding regions. But the repetitive sequence is still
> important and we need to keep it.
>
> > On Apr 12, 2023, at 10:04 AM, Mark Dilger 
> wrote:
> >
> >
> >
> >> On Apr 12, 2023, at 7:59 AM, Joe Carlson  wrote:
> >>
> >> The use case is genomics. Extracting substrings is common. So going to
> chunked storage makes sense.
> >
> > Are you storing nucleotide sequences as text strings?  If using the
> simple 4-character (A,C,G,T) alphabet, you can store four bases per byte.
> If using a nucleotide code 16-character alphabet you can still get two
> bases per byte.  An amino acid 20-character alphabet can be stored 8 bases
> per 5 bytes, and so forth.  Such a representation might allow you to store
> sequences two or four times longer than the limit you currently hit, but
> then you are still at an impasse.  Would a factor of 2x or 4x be enough for
> your needs?
> >
> > —
> > Mark Dilger
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
> >
> >
> >
>
>
>
>


Re: TEXT column > 1Gb

2023-04-12 Thread Benedict Holland
Yeah. For an ease-of-use, out-of-the-box solution that will just work: large
objects. You might know them as BLOBs in other SQL varieties. If you are
dealing with that much data, I'm going to assume that storage isn't really
your concern, so I wouldn't even waste time compressing. I use them
frequently to store all sorts of weird objects like pictures or serialized
pickle files. They are really fast and extremely easy to use. They do not
play nicely with a lot of 3rd-party software, particularly UIs sitting on top
of a database, but again, if that isn't a concern or you can use stored
procedures for the selects, it should be just fine.
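
For the stored-procedure route, a minimal sketch (assuming a documents table
that keeps the large object's OID in a doc_oid column, as in my earlier
mail): wrap the lookup in a function so clients never touch the OID directly.

create or replace function get_document(p_id bigint)
returns bytea as $$
    select lo_get(doc_oid) from documents where id = p_id;
$$ language sql stable;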

On Wed, Apr 12, 2023, 3:21 PM Rob Sargent  wrote:

> On 4/12/23 13:02, Ron wrote:
>
> *Must* the genome all be in one big file, or can you store them one line
> per table row?
>
>
> Not sure what OP is doing with plant genomes (other than some genomics)
> but the tools all use files and pipeline of sub-tools.  In and out of
> tuples would be expensive.  Very,very little "editing" done in the usual
> "update table set val where id" sense.
>
> Lines in a vcf file can have thousands of columns of nasty, cryptic garbage
> data that only really makes sense to tools, reader.  Highly denormalized of
> course.  (Btw, I hate sequencing :) )
>
>
>
>


Re: Need Help On Upgrade

2023-07-11 Thread Benedict Holland
This also sounds like a fairly advanced setup and a corporate environment.
There are companies offering paid PostgreSQL support, and you probably want
that.

Thanks,
Ben

On Tue, Jul 11, 2023, 4:33 PM Adrian Klaver 
wrote:

> On 7/11/23 14:45, Johnathan Tiamoh wrote:
> > Hello Adrian
> >
> > 1) PgBouncer version.
> >
> > pgbouncer.1.7.2
>
> PgBouncer most recent version is from here:
>
> http://www.pgbouncer.org/changelog.html
>
> PgBouncer 1.19.x
>
> 2023-05-31 - PgBouncer 1.19.1 - “Sunny Spring”
>
>
>
>
> And since 1.7.x there have been these related auth changes:
>
> PgBouncer 1.11.x
>
> Add support for SCRAM authentication for clients and servers. A new
> authentication type scram-sha-256 is added.
>
> PgBouncer 1.12.x
>
> Accept SCRAM channel binding enabled clients. Previously, a client
> supporting channel binding (that is, PostgreSQL 11+) would get a
> connection failure when connecting to PgBouncer in certain situations.
> (PgBouncer does not support channel binding. This change just fixes
> support for clients that offer it.)
>
> PgBouncer 1.14.x
>
> Add SCRAM authentication pass-through. This allows using encrypted SCRAM
> secrets in PgBouncer (either in userlist.txt or from auth_query) for
> logging into servers
>
> PgBouncer 1.16.x
>
> Mixed use of md5 and scram via hba has been fixed.
>
> PgBouncer 1.17.x
>
> Don’t apply fast-fail at connect time. This is part of the
> above-mentioned change to not report server errors before
> authentication. It also fixes a particular situation with SCRAM
> pass-through authentication, where we need to allow the client-side
> authentication exchange in order to be able to fix the server-side
> connection by re-authenticating. The fast-fail mechanism still applies
> right after authentication, so the effective observed behavior will be
> the same in most situations.
>
>
> I think an update is in order.
>
> >
> > 2) Did you change the settings when going from 9.5 to 14.8?
> >
> > No. I did not do any configuration changes on the bouncers
> >
> > 3) Does the PgBouncer log show anything relevant?
> >
> > No. It does not show anything
> >
> > 4) Does '...through the  pgbouncers...' mean there is more then one
> > PgBouncer instance in use?
> >
> > Yes, I have more than 3 pgbouncers for different connections.
> >
> >
> >
> > On Tue, Jul 11, 2023 at 12:19 PM Adrian Klaver
> > mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > On 7/11/23 08:53, Johnathan Tiamoh wrote:
> >  > I can connect.
> >  >
> >  > All applications and other users that connect from to the
> databases
> >  > through the  pgbouncers can't connect.
> >
> > That would have been a good thing to have mentioned in your first
> post.
> > I don't use PgBouncer so I am not going to be of much use going
> > forward.
> > For those that do use it and can help answers to the following would
> be
> > helpful:
> >
> > 1) PgBouncer version.
> >
> > 2) Did you change the settings when going from 9.5 to 14.8?
> >
> > 3) Does the PgBouncer log show anything relevant?
> >
> > 4) Does '...through the  pgbouncers...' mean there is more then one
> > PgBouncer instance in use?
> >
> >  >
> >  > On Tue, Jul 11, 2023 at 11:46 AM Adrian Klaver
> >  > mailto:adrian.kla...@aklaver.com>
> >  > >> wrote:
> >  >
> >  > On 7/10/23 20:45, Laurenz Albe wrote:
> >  >  > On Mon, 2023-07-10 at 20:38 -0400, Johnathan Tiamoh wrote:
> >  >  >> Below is the full error message.
> >  >  >>
> >  >  >> 2023-06-27 05:01:27.385 CDT| XXX
> (60930)|
> >  > APPNAME=[unknown]| DB=postgres| USER=grafana| PID=9061| 1|
> >  > authentication| PGE-28P01: FATAL:  password authentication
> failed
> >  > for user
> >  >  >> "grafana"
> >  >  >> 2023-06-27 05:01:27.385 CDT| XXX
> (60930)|
> >  > APPNAME=[unknown]| DB=postgres| USER=grafana| PID=9061| 2|
> >  > authentication| PGE-28P01: DETAIL:  Connection matched
> > pg_hba.conf
> >  > line 113:
> >  >  >> "hostall all 0.0.0.0/0 
> >  > > md5"
> >  >  >
> >  >  > Then you must have entered the wrong password.
> >  >  >
> >  >  > If in doubt, change the password.
> >  >
> >  > Can you connect to the database at all or is this specific to
> >  > certain users?
> >  >
> >  > What client(s) are you using and is the problem coming from a
> > specific
> >  > client?
> >  >
> >  >
> >  >
> >  >
> >  >  >
> >  >  > Yours,
> >  >  > Laurenz Albe
> >  >
> >  > --
> >  > Adrian Klaver
> >  > adrian.kla...@aklaver.com 
> > 

Re: Is there a way to dump schema to files on disk and keep them in sync

2020-12-13 Thread Benedict Holland
You want Alembic and an afternoon of python writing. You just described an
ORM.

On Sun, Dec 13, 2020, 12:53 PM Tomas Vondra 
wrote:

> On 12/13/20 6:34 PM, Adrian Klaver wrote:
> > On 12/12/20 6:58 PM, Tim Uckun wrote:
> >> I want to dump my postgres schema to disk in neat directories like
> >> pgadmin presents. Then I want to be able to edit the files and sync
> >> changes to the database and ideally if changes were made in the
> >> database to sync them back to the disk.
> >>
> >> Is there a tool that does this? Is there a tool that will dump the
> >> schema into separate directories and files like pgadmin does?
> >
> > pgAdmin does not create directories, it just organizes the contents of
> > the system catalogs into GUI elements.
> >
> > For schema management I would suggest a tool like the one I use
> > Sqitch(https://sqitch.org/). It will organize the process of schema
> > creation and management.
> >
>
> Yeah, that was my thought too. Or maybe look at the other schema
> versioning tools available - we have a list on the wiki:
>
> https://wiki.postgresql.org/wiki/Change_management_tools_and_techniques
>
> I'm sure it's incomplete, but it's helpful nevertheless.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>


Re: Is there a way to dump schema to files on disk and keep them in sync

2020-12-13 Thread Benedict Holland
I mean, you want to dump a schema into a directory. Alembic, SQLAlchemy, and
pathlib will do that in a few minutes. If you want to sync changes, then
write Alembic change scripts and reflect those changes in the ORM; Alembic
does that. Modifying the database and reflecting those changes back into the
ORM is a terrible idea and no one does it well. Every application that tries
has huge caveats baked in, so I recommend not doing that. But seriously, this
is a few hours of work or less.

Thanks,
Ben

On Sun, Dec 13, 2020, 4:05 PM Tim Uckun  wrote:

> >pgAdmin does not create directories, it just organizes the contents of
> the system catalogs into GUI elements.
>
>
> I realize that :). I meant organized in the same way but on disk.
>
>
>


Re: Is it possible to compare a long text string and fuzzy match only phrases contained in?

2021-01-17 Thread Benedict Holland
You want to do NLP in Postgres? I would say that you would need a tool like
OpenNLP to get your tokens and phrases, then run a fuzzy matching algorithm.

Unless Postgres has NLP capabilities, but I am not sure I would use them.
What you actually want is something fairly complex.
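
For the fuzzy-matching half on its own, Postgres does have something native
in the pg_trgm extension. A rough sketch (table and column names made up):
word_similarity compares a short phrase against the best-matching part of a
longer string.

create extension if not exists pg_trgm;

select id,
       word_similarity('fuzzy phrase', long_text) as score
  from passages
 where 'fuzzy phrase' <% long_text   -- word-similarity filter; a trgm index can back it
 order by score desc
 limit 10;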

Thanks,
Ben

On Sun, Jan 17, 2021, 4:55 PM Shaozhong SHI  wrote:

> We are looking for working examples of comparing a long text string and
> fuzzy-matching multiple words (namely, phrases) contained in.
>
> Any such work examples?
>
> Regards,
>
> David
>


Re: Copy & Re-copy of DB

2021-01-21 Thread Benedict Holland
Pg_dump does what you want but you could also set up replication.

Thanks,
Ben

On Thu, Jan 21, 2021, 8:42 AM Ray O'Donnell  wrote:

> On 21/01/2021 13:13, sivapostg...@yahoo.com wrote:
> > Hello,
> >
> > I'm from SQL Server now developing my current project using PG.
> >
> > In our earlier project(s), we used to copy the database from one server
> > to another, work in another server, and re-copy it in the original
> > server.   All happened easily with just transferring the two files (mdf
> > & ldf) to our required server.
> >
> > Want to replicate the same in PG.  Is there any way to accomplish the
> > same in PG ?
>
> create database  template 
>
> Would that do the job? You could also use pg_dump.
>
> Ray.
>
>
> --
> Raymond O'Donnell // Galway // Ireland
> r...@rodonnell.ie
>
>
>


Re: Copy & Re-copy of DB

2021-01-21 Thread Benedict Holland
pg_dump and pg_restore are what you want. pg_dump creates a dump (plain SQL
or a custom-format archive) that you can pass around; pg_restore, or psql for
plain SQL dumps, loads it back in.

But seriously, if you have two servers running, replicate one, or have your
application point to a server address that you can direct via DNS to
whichever active server you want. Postgres makes replication simple. There
are a lot of solutions for your problem, but the 1:1 solution of copying over
a single file and loading it is pg_dump and pg_restore.

Thanks,
Ben

On Thu, Jan 21, 2021, 8:53 AM sivapostg...@yahoo.com 
wrote:

> create database is to create a new database.  If we switch to new
> database, we need to change the new databasename in the program(s) that
> access this database.
>
> Is there any way to overwrite the data ?
>
>
> On Thursday, 21 January, 2021, 07:12:19 pm IST, Ray O'Donnell <
> r...@rodonnell.ie> wrote:
>
>
> On 21/01/2021 13:13, sivapostg...@yahoo.com wrote:
>
> > Hello,
> >
> > I'm from SQL Server now developing my current project using PG.
> >
> > In our earlier project(s), we used to copy the database from one server
> > to another, work in another server, and re-copy it in the original
> > server.   All happened easily with just transferring the two files (mdf
> > & ldf) to our required server.
> >
> > Want to replicate the same in PG.  Is there any way to accomplish the
> > same in PG ?
>
>
> create database  template 
>
> Would that do the job? You could also use pg_dump.
>
> Ray.
>
>
> --
> Raymond O'Donnell // Galway // Ireland
> r...@rodonnell.ie
>
>
>
>


Re: Copy & Re-copy of DB

2021-01-22 Thread Benedict Holland
I mean... the best? You just laid out a bunch of steps to define a process.
I will tell you that this sounds like a bad process and that there are easier
ways to make changes: working inside a transaction, backing up the database
before making changes, or setting up a table replication job, making changes
on one server, and pushing them to the other.

But you didn't want to do that. You wanted to create a single file for your
database, load it into a server, make changes, and copy them back. I would
say that this is an overly complicated and unnecessary process, but if that
is how you want to make changes, then what you described is the set of
steps.

Thanks,
Ben

On Fri, Jan 22, 2021, 2:35 AM Ken Tanzer  wrote:

>
>
> On Thu, Jan 21, 2021 at 11:12 PM sivapostg...@yahoo.com <
> sivapostg...@yahoo.com> wrote:
>
>> So the solution to the issue will be
>>
>> 1.  Backup a DB using PGDUMP from 1st server.
>> 2.  Restore the DB in 2nd server.
>> 3.  Make required changes in the 2nd server.
>> 4.  Backup that DB using PGDUMP from 2nd server.
>> 5.  Delete / Rename that DB in the 1st server
>> 6.  Restore that DB in the 1st server.
>> 7.  Work again in the 1st server.
>>
>> Is this the best way to carry out this process?
>>
>>
> Well I don't know if it's best or not.  But you said you wanted:
>
> to copy the database from one server to another, work in another server,
>> and re-copy it in the original server.
>>
>> (and additionally end up with the database having the same name on the
> original server.)
>
> So your steps do seem like they would be one way to accomplish that.  What
> counts as the "best" way would depend on what else is important to you
> about this process, i.e. what criteria would you use for best?
>
> Cheers,
> Ken
>
>
>
>
> --
> AGENCY Software
> A Free Software data system
> By and for non-profits
> *http://agency-software.org/ *
> *https://demo.agency-software.org/client
> *
> ken.tan...@agency-software.org
> (253) 245-3801
>
> Subscribe to the mailing list
>  to
> learn more about AGENCY or
> follow the discussion.
>


Re: Copy & Re-copy of DB

2021-01-22 Thread Benedict Holland
No. Just no. It is fine to make stored procedure changes in a development
environment and deploy them as part of a release. Typically you would want
some sort of change tracking software like Alembic or Sqitch or something
like that. Production databases typically contain a huge amount of data, or
data that you really shouldn't share.

Part of the release is to deploy changes to production systems. Often you
will want to back up those systems before a release in case you have to
roll back, or just make small changes that you can revert. I would say that
release procedures for database deployment are well beyond the scope of this
list, and every company I have worked for has different procedures.

Basically, the OP wanted to replicate a process that sort of works in MsSQL,
kind of, maybe. The set of steps outlined will dump all of the tables and
restore them. If this is their process, I highly question that process, but
those steps are correct. I would point out that eventually that system will
break down: it is highly dependent on individuals knowing a lot of steps,
possibly exposes data to people who shouldn't have it, is overly
complicated, probably isn't best practice for releases, and is error prone
(someone makes a change that no one else knows about and it breaks a
webpage).

So is this the best? In my opinion, probably not. Will it work in the way
that the OP wanted it to work? Yes. I simply wouldn't manage a process like
this but if that is the process that the OP is comfortable with and a lot
of people agreed to, it will work. Sometimes it is easier to simply
replicate the existing bad process that a team agrees to rather than making
a better process.

Thanks,
Ben

On Fri, Jan 22, 2021, 9:21 AM Rory Campbell-Lange 
wrote:

> On 22/01/21, Benedict Holland (benedict.m.holl...@gmail.com) wrote:
> > I mean... the best? You just laid out a bunch of steps to define a
> process.
> >
> > > On Thu, Jan 21, 2021 at 11:12 PM sivapostg...@yahoo.com <
> > > sivapostg...@yahoo.com> wrote:
> > >
> > >> So the solution to the issue will be
> > >>
> > >> 1.  Backup a DB using PGDUMP from 1st server.
> > >> 2.  Restore the DB in 2nd server.
> > >> 3.  Make required changes in the 2nd server.
> > >> 4.  Backup that DB using PGDUMP from 2nd server.
> > >> 5.  Delete / Rename that DB in the 1st server
> > >> 6.  Restore that DB in the 1st server.
> > >> 7.  Work again in the 1st server.
> > >>
> > >> Is this the best way to carry out this process?
>
> Rather late to the party, and I expect this has been mentioned already,
> but presumably changes to any database are either to do with the data or
> to do with aspects such as the pl functions.
>
> Data transformations can be tested in production and testing in
> transactions, so that the logic of the transformation can be captured in
> a set of SQL statements which can be applied to either environment
> through a (possibly automated) revision control system.
>
> Data insertions are much more conveniently only done on the production
> database, as converging data between different databases can be tricky.
> However if your data is conveniently added in bulk without the risk of
> duplication, a revision control approach could also work.
>
> Finally working on pl functions and similar changes we do out of
> revision control. In other words, "if it works in testing we apply it to
> production". Of the the three only this is idempotent.
>
> In any event, perhaps the following could work?
>
> 1.  Backup the DB on the 1st server using pg_dump
> 2.  Restore the DB on the 2nd server
> 3.  Record changes on the 2nd server as revision controlled statements
> 4.  Replay changes on the 1st server using the revision controlled
> statements
>
> Regards
> Rory
>


Re: Copy & Re-copy of DB

2021-01-23 Thread Benedict Holland
Right. Like, you can absolutely do this. If you need single tables, you can
technically tell pg_dump to include or exclude tables. Be warned, though,
that this is extremely dangerous: it will not copy over any linked tables,
and I don't think it dumps the sequences associated with those tables either.
Basically, you can get yourself into a huge heap of trouble by doing that,
but you can do it. I do that on some systems when I have huge data tables
that no one cares about.

There are always outside factors for all of this. If this is how you have
decided to get people what they sort of want, you can do this, but be aware
of the many, many risks associated with it regarding security, data
management, and trying to sync databases using dump files.

Good luck!
Ben

On Sat, Jan 23, 2021, 10:17 AM Ron  wrote:

> On 1/23/21 6:52 AM, sivapostg...@yahoo.com wrote:
>
> We are an ISV.   I agree the risk involved in sharing the data.  Still few
> of my customers need that facility and are accustomed to it when using SQL
> Server.   On switch over to PG, I face this issue as a limitation. Need to
> find and provide a solution.
>
> For those customers, having good volume of data, we're implementing
> replication which resolves this issue.   For smaller sized database
> (company(ies)), they prefer (and we too) this copy and re-copy procedure,
> to transfer the data between home and office.
>
> And this pandemic made this a compulsory feature, which they don't want to
> loose.  This transfer is not a one time job, it gets repeated, which they
> have been doing for years.  Here security is not a big concern for them.
>
> Portability is the need for them.
>
>
> Sadly, the architecture of Postgres means that there's no concept of
> detaching *a single database*.
>
> If you only have one database in the "cluster" (ancient Postgres term for
> "instance"), then you can stop the cluster "-m smart", tar up data/, and
> transfer it across.  You'll need to have a directory on your dev server,
> custom postgresql.conf (that among other things uses a different port
> number) and pg_hba.conf files,
>
> TBH, tarring data/ isn't really necessary.
>
> Happiness Always
> BKR Sivaprakash
>
> On Friday, 22 January, 2021, 09:28:13 pm IST, Rory Campbell-Lange
>   wrote:
>
>
> On 22/01/21, Benedict Holland (benedict.m.holl...@gmail.com) wrote:
>
> > Sometimes it is easier to simply > replicate the existing bad process
> > that a team agrees to rather than making > a better process.
>
>
> As Alvar Aalto said in a lecture at MIT
>
> It is not by temporary building that Parthenon comes on Acropolis.
>
>
> --
> Angular momentum makes the world go 'round.
>


Re: Slow while inserting and retrieval (compared to SQL Server)

2021-02-17 Thread Benedict Holland
SQL Server is a $10k to $1 million application (or more) that is
specifically optimized for Windows and has limited to no support anywhere
else. Postgres is free and, from my experience, comes within 5% of any other
DBMS. Inserting one row at a time with autocommit on will be a bit slow, but
it shouldn't be noticeable. What times are you seeing if you do this with
pgAdmin 4 compared to SQL Server? Also, have you done any performance tuning
for the Postgres server? There are many documents detailing how to tune a
Postgres server, just as you probably did at some point with your SQL Server.
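
A quick illustration of the autocommit cost I mean (made-up table): each
standalone statement is its own transaction and pays a commit flush, while
batching inside one transaction pays that cost once.

create table t (id int primary key);

-- one transaction (and one WAL flush) per row
insert into t (id) values (1);
insert into t (id) values (2);

-- one transaction for the whole batch
begin;
insert into t (id) values (3);
insert into t (id) values (4);
commit;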

Thanks,
Ben

On Wed, Feb 17, 2021, 8:28 AM sivapostg...@yahoo.com 
wrote:

> We use datawindows.  Datawindows will send the required DML statements to
> the database.  And it sent in format 1 .
>
> IN start of the application, Autocommit set to True.
> Before update of any table(s)
> Autocommit is set to False
> Insert/Update/Delete records
> If success commit else rollback
> Autocommit is set to True
>
> This has been followed for decades and it's working fine with Sql server.
>
>
> Here we are trying to insert just 10 records spread across 6 tables, which
> is taking more time.. that's what we feel.   The similar work in SQL Server
> takes much less time < as if no wait is there >.
>
> On Wednesday, 17 February, 2021, 06:48:35 pm IST, Thomas Kellerer <
> sham...@gmx.net> wrote:
>
>
> sivapostg...@yahoo.com schrieb am 17.02.2021 um 13:01:
>
> > To populate some basic data we try to insert few records (max 4
> > records) in few tables (around 6 tables) from one window.  We feel
> > that the insert time taken is longer than the time taken while using
> > Sql Server.  We tested almost a similar window that updated the
> > similar table(s) in SQL server, which was faster.  With Postgres
> > database, we need to wait for a couple of seconds before the
> > insert/update is over, which we didn't feel in Sql Server.
>
>
>
> Are you doing single-row inserts like:
>
> insert into ... values (..);
> insert into ... values (..);
> insert into ... values (..);
> insert into ... values (..);
>
> or are you doing multi-row inserts like this:
>
> insert into ... values (..), (..), (..), (..);
>
> Typically the latter will perform much better (especially if autocommit is
> enabled)
>
>
>
>
>


Re: Slow while inserting and retrieval (compared to SQL Server)

2021-02-17 Thread Benedict Holland
Yeah. Let's see what the metrics actually are. It is possible that the data
gets loaded instantly, but for whatever reason you don't see the changes
reflected.

On Wed, Feb 17, 2021, 9:09 AM sivapostg...@yahoo.com 
wrote:

> So far no performance tuning done for sql server.  It works fine for the
> load. Even the express edition which is free, works better.  I don't think
> postgres will be so slow to insert such a low number of records in an empty
> database.
>
> I'll be preparing the required sql statements to insert those records in
> pgadmin and see the timings, tomorrow.
>
>
> On Wednesday, 17 February, 2021, 07:29:29 pm IST, Benedict Holland <
> benedict.m.holl...@gmail.com> wrote:
>
>
> Sql server is a 10k dollar to 1 million dollar application (or more) that
> is specifically optimized for windows and had limited to no support
> anywhere else. Postgres is free and from my experience, comes within 5% of
> any other dbms. Inserting 1 row at a time with auto commit on will be a bit
> slow but it shouldn't be noticeable. What times are you seeing if you do
> this with pgadmin4 compared to sql server? Also, have you done any
> performance tuning for postgres server? There are many documents detailing
> performance tuning your servers, like you probably did, at some point, with
> your sql server.
>
> Thanks,
> Ben
>
> On Wed, Feb 17, 2021, 8:28 AM sivapostg...@yahoo.com <
> sivapostg...@yahoo.com> wrote:
>
> We use datawindows.  Datawindows will send the required DML statements to
> the database.  And it sent in format 1 .
>
> IN start of the application, Autocommit set to True.
> Before update of any table(s)
> Autocommit is set to False
> Insert/Update/Delete records
> If success commit else rollback
> Autocommit is set to True
>
> This has been followed for decades and it's working fine with Sql server.
>
>
> Here we are trying to insert just 10 records spread across 6 tables, which
> is taking more time.. that's what we feel.   The similar work in SQL Server
> takes much less time < as if no wait is there >.
>
> On Wednesday, 17 February, 2021, 06:48:35 pm IST, Thomas Kellerer <
> sham...@gmx.net> wrote:
>
>
> sivapostg...@yahoo.com schrieb am 17.02.2021 um 13:01:
>
> > To populate some basic data we try to insert few records (max 4
> > records) in few tables (around 6 tables) from one window.  We feel
> > that the insert time taken is longer than the time taken while using
> > Sql Server.  We tested almost a similar window that updated the
> > similar table(s) in SQL server, which was faster.  With Postgres
> > database, we need to wait for a couple of seconds before the
> > insert/update is over, which we didn't feel in Sql Server.
>
>
>
> Are you doing single-row inserts like:
>
> insert into ... values (..);
> insert into ... values (..);
> insert into ... values (..);
> insert into ... values (..);
>
> or are you doing multi-row inserts like this:
>
> insert into ... values (..), (..), (..), (..);
>
> Typically the latter will perform much better (especially if autocommit is
> enabled)
>
>
>
>
>


Re: Hello - About how to install PgAdmin4 on Debian 10

2021-03-26 Thread Benedict Holland
Install docker and pull dpage/pgadmin4.

Seriously. It just works. All of the time, every time, with updates. Docker
is amazing.

On Fri, Mar 26, 2021, 2:55 PM Adrian Klaver 
wrote:

> On 3/26/21 11:00 AM, Carlos Montenegro wrote:
> > Hello Adrian.
> > First I installed postgresql version 12.6 then for pgadmin4 followed
> > this guide: https://www.pgadmin.org/download/pgadmin-4-apt/
> > 
>
> There are three choices in the above as to what mode to install as,
> please indicate which one you chose as well as the command executed.
>
>
> >
> > So, when  try to open pgadmin have the next message:
> >
> > The PgAdmin4 server could not be contacted:
> >
> > pgAdmin Runtime Environment
> > 
> > Python Path: "/usr/pgadmin4/venv/bin/python3"
> > Runtime Config File:
> "/home/ics-debian/.config/pgadmin/runtime_config.json"
> > pgAdmin Config File: "/usr/pgadmin4/web/config.py"
> > Webapp Path: "/usr/pgadmin4/web/pgAdmin4.py"
> > pgAdmin Command: "/usr/pgadmin4/venv/bin/python3 -s
> > /usr/pgadmin4/web/pgAdmin4.py"
> > Environment:
> >- USER: ics-debian
> >- LANGUAGE: es_NI:es
> >- XDG_SEAT: seat0
> >- XDG_SESSION_TYPE: x11
> >- SSH_AGENT_PID: 1330
> >- HOME: /home/ics-debian
> >- DESKTOP_SESSION: lightdm-xsession
> >- XDG_SEAT_PATH: /org/freedesktop/DisplayManager/Seat0
> >- GTK_MODULES: gail:atk-bridge
> >- DBUS_SESSION_BUS_ADDRESS: unix:path=/run/user/1000/bus
> >- GLADE_MODULE_PATH: :
> >- LOGNAME: ics-debian
> >- XDG_SESSION_CLASS: user
> >- XDG_SESSION_ID: 2
> >- PATH: /usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
> >- GLADE_PIXMAP_PATH: :
> >- XDG_SESSION_PATH: /org/freedesktop/DisplayManager/Session0
> >- XDG_RUNTIME_DIR: /run/user/1000
> >- XDG_MENU_PREFIX: xfce-
> >- LANG: es_NI.UTF-8
> >- XDG_CURRENT_DESKTOP: XFCE
> >- XDG_SESSION_DESKTOP: lightdm-xsession
> >- XAUTHORITY: /home/ics-debian/.Xauthority
> >- XDG_GREETER_DATA_DIR: /var/lib/lightdm/data/ics-debian
> >- SSH_AUTH_SOCK: /tmp/ssh-7U4tMKJqlRu8/agent.1299
> >- GLADE_CATALOG_PATH: :
> >- SHELL: /bin/bash
> >- GDMSESSION: lightdm-xsession
> >- QT_ACCESSIBILITY: 1
> >- XDG_VTNR: 7
> >- PWD: /home/ics-debian
> >- XDG_DATA_DIRS:
> > /usr/share/xfce4:/usr/local/share/:/usr/share/:/usr/share
> >- XDG_CONFIG_DIRS: /etc/xdg
> >- SESSION_MANAGER:
> > local/ics-debian:@
> /tmp/.ICE-unix/1340,unix/ics-debian:/tmp/.ICE-unix/1340
> >- DISPLAY: :0.0
> >- GDK_BACKEND: x11
> >- NO_AT_BRIDGE: 1
> >- PGADMIN_INT_PORT: 5433
> >- PGADMIN_INT_KEY: f6b73bcd-abb3-42b8-9bf2-c8d47536c80e
> >- PGADMIN_SERVER_MODE: OFF
> > 
> >
> > Failed to launch pgAdmin4. Error:
> > Error: spawn /usr/pgadmin4/venv/bin/python3 ENOENT
> >
> > Thank you in advance,
> > Carlos
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>


Re: Hello - About how to install PgAdmin4 on Debian 10

2021-03-27 Thread Benedict Holland
The containers for Postgres and pgAdmin 4 are incredible. This entire thread
of worrying about configuration settings and options just disappears. You
just run the container. There are tutorials to help you get it set up with
nginx. It just works. The developers and maintainers have done an incredible
job of making it ridiculously easy to get running.

Thanks,
Ben

On Sat, Mar 27, 2021, 11:15 AM Ray O'Donnell  wrote:

> On 27/03/2021 14:59, Ray O'Donnell wrote:
>
> > If you can, include the relevant bits from the pgAdmin log, which
> > according to [2] should be here:
> >
> >   ~/.pgadmin/pgadmin4.log
>
> A quick find + grep also found me these:
>
> ~/.local/share/pgadmin4.startup.log
> ~/.local/share/pgadmin/pgadmin4/.pgAdmin4.startup.log
>
> ...though the second one is much older.
>
> Ray.
>
> --
> Raymond O'Donnell // Galway // Ireland
> r...@rodonnell.ie
>
>
>


Re: How to install PostgreSQL binaries on a different directory than the default one

2021-04-02 Thread Benedict Holland
Why not use the docker container?

Thanks,
Ben

On Fri, Apr 2, 2021, 8:25 PM rob stone  wrote:

> Hello,
>
> On Fri, 2021-04-02 at 22:01 +, Allie Crawford wrote:
> > Hi,
> > I am new in postgresql and I am trying to install PostgreSQL on Linux
> > server but instead of using the default location for the binaries I
> > want to install them in a different directory.
> > I have research for a while and I cannot find any information on how
> > to do that using the dnf utility which is what the PostgreSQL website
> > gives me as instruction when I go the download page.
> >
> > I have a directory called /pg01 that I want to use to install the
> > binaries. The postgreSQL download site (
> > https://www.postgresql.org/download/linux/redhat/ ) gives me the
> > following steps to install the binaries:
> >
> >
> > That set of instructions shown right above, installs the binaries
> > under the/usr/pgsql-13, and what I want is to install the binaries
> > under /pg01 instead of /usr.
> > My question is, what are the changes I need to do in the instructions
> > detailed right above, so that I can get the binary installation under
> > the /pg01 directory instead of the default /usr directory.
> >
> > Thanks in advance for any help you can five me on this.
> >
> > Regards,
> > Allie
>
> Most Linux distros keep the binaries by major version. So when you
> install version 14 it will go into /usr/pgsql-14.
> Thus, by using different port numbers you can run multiple versions on
> the same box.
> Useful for development, QA, etc.
>
> HTH,
> Robert
>
>
>
>
>


Re: POSTGRES/MYSQL

2019-03-12 Thread Benedict Holland
MySQL isn't ACID. PostgreSQL is a full-featured database that doesn't leave
you open to injection. It is very safe and secure. PostgreSQL also has a much
better user management system and database/table-level access ACLs.
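
A small sketch of the kind of per-database and per-table ACLs I mean (role
and object names made up):

create role app_reader login password 'secret';
grant connect on database appdb to app_reader;
grant usage on schema public to app_reader;
grant select on all tables in schema public to app_reader;
revoke insert, update, delete on all tables in schema public from app_reader;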

Basically, you just asked a bunch of people who have used PostgreSQL over
MySQL why PostgreSQL is better. It is just better. The only time that MySQL
might be better is if you have a very simple website and you want a SQL
backend. For anything else, PostgreSQL is infinitely better.

Thanks,
~Ben

On Tue, Mar 12, 2019 at 5:34 AM Thomas Kellerer  wrote:

> Laurenz Albe schrieb am 12.03.2019 um 10:05:
> >> Also MySQL has a query cache that allows the results of very common
> queries to be much faster.
> >
> > I have used that feature, and it has bitten me:
> > https://stackoverflow.com/q/44244482/6464308
>
> Note that the query cache was removed in MySQL 8.0 (don't know about
> MariaDB)
>
>
>
>
>


Re: POSTGRES/MYSQL

2019-03-12 Thread Benedict Holland
In MySQL, creating a table inside a transaction implicitly commits the
transaction and prevents a rollback. MySQL is not ACID.
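
In Postgres, by contrast, DDL is transactional, so this rolls back cleanly
(a quick illustration with a made-up table name):

begin;
create table migration_step (id int primary key);
insert into migration_step values (1);
rollback;

select * from migration_step;
-- ERROR:  relation "migration_step" does not exist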

Thanks,
~Ben

On Tue, Mar 12, 2019 at 11:44 AM Ron  wrote:

> The Percona fork of MySQL makes active-active clustering very simple to
> set up.
>
> On 3/12/19 9:10 AM, Benedict Holland wrote:
>
> MySQL isn't ACID. Postgresql is a full-featured database that doesn't
> allow injection. It is very safe and secure. Also, the way that PostgreSQL
> has a much better user management system and database/table level access
> ACLs.
>
> Basically, you just asked a bunch of people who have used  PostgreSQL over
> MySQL why PostgerSQL is better. It is just better. The only time that MySQL
> might be better is if you have a very simple website and you want a SQL
> backend. For anything else, PostgreSQL is infinitely better.
>
> Thanks,
> ~Ben
>
> On Tue, Mar 12, 2019 at 5:34 AM Thomas Kellerer 
> wrote:
>
>> Laurenz Albe schrieb am 12.03.2019 um 10:05:
>> >> Also MySQL has a query cache that allows the results of very common
>> queries to be much faster.
>> >
>> > I have used that feature, and it has bitten me:
>> > https://stackoverflow.com/q/44244482/6464308
>>
>> Note that the query cache was removed in MySQL 8.0 (don't know about
>> MariaDB)
>>
>>
>>
>>
>>
> --
> Angular momentum makes the world go 'round.
>


Re: POSTGRES/MYSQL

2019-03-12 Thread Benedict Holland
I am not saying it is not well documented. I am saying that it isn't ACID
compliant, which it isn't, as they document.

It comes up far more often than I would like, particularly with migrations
and schema changes. It is one of the huge reasons I actively refuse to work
with MySQL. I have never upgraded a MySQL schema without running into a
problem where I needed to roll back and the transaction committed on a
table creation. That left my databases in an uncertain state, which is
about as bad as you can get from my perspective.

MsSQL is better at this, but it leans heavily on dynamic SQL, which opens the
database up to SQL injection. Apparently, PostgreSQL is even FedRAMP
compliant, and that actually matters to me.

This is still a very strange thread. This would be like asking a C#
developer why the .net stack is the best thing ever created and why they
should use it over PHP or Django. I don't know what the OP really expected
to get out of this apart from basically everyone here saying "you should
probably use PostgreSQL".

Thanks,
~Ben

On Tue, Mar 12, 2019 at 12:09 PM Michael Nolan  wrote:

> The MySQL manual says that INNODB 'adheres closely' to the ACID model,
> though there are settings where you can trade some ACID compliance for
> performance.
>
> See https://dev.mysql.com/doc/refman/5.6/en/mysql-acid.html
>
> I've been running PostgreSQL for a client since 2005, we're on our 5th
> hardware platform in that time period.  We also run a MySQL/MariaDB
> database to support our website, it is currently on an AWS RDB server which
> is refreshed from the PostgreSQL server.  I also administered an Oracle
> system for a well-known educational publisher for about 10 years.  Given my
> druthers, I'd take PostgreSQL over either Oracle or MySQL.
> --
> Mike Nolan
>
>>


Re: Where to store Blobs?

2019-03-13 Thread Benedict Holland
I store large models in the database because I need to have historical data
to compare to. That said, I could probably also automate a git repo, but it
would be that much more work, and git with binary files really doesn't make
sense. Storage is really cheap, and I assume the database stores bytea values
the same way it stores text, out of line in a separate storage area of the
database.

Thanks,
~Ben

On Wed, Mar 13, 2019 at 1:37 PM Jamesie Pic  wrote:

> Make dump/restore of database data unnecessarily expensive in terms of
> time and space imho.
>


schema change tracking

2019-05-16 Thread Benedict Holland
Hi All,

I am fairly sure this question has many answers, but here it goes:

I need a tool that can track schema changes in a postgesql database, write
scripts to alter the tables, and store those changes in git. Are there
tools that exist that can do this?

BTW, I know that if I use a framework with some form of model tracking
(Laravel and Django), those can do it. The application I am writing does not
track model files, and I really cannot compare SQL schema dumps across
versions. It would be far too time-consuming.

Thanks,
~Ben


Re: schema change tracking

2019-05-16 Thread Benedict Holland
Hi All!

Thanks for the replies. I am looking for a solution, possibly 3rd party,
where I write changes to the schema and it keeps track of the changes I
made. I am used to doing stuff in pgadmin4 but I don't mind something else.

This isn't really ad-hoc. This is more like, I have a v1.0 release coming
out soon and I have made dozens or hundreds of changes to the schema from
the alpha. Right now, I can destroy the databases and recreate them without
data loss because it is in active development. When this gets released, we
need some way to make sure that we can easily migrate between release
versions and that means schema change tracking. Basically, to Steve's
concern, I don't have to go back and figure out changes (thankfully, I have
done that and it sucks) but we need something from an initial release
moving forward.

I don't have time to write my own and this problem is complex enough to
either buy a solution or go with a 3rd party application.

So far, Sqitch is looking like a winner. Yoyo can work, as I have access to
Python. We are not using SQLAlchemy but I don't see a reason why we can't.

Thanks all for the suggestions. This is great.
~Ben


On Thu, May 16, 2019 at 1:06 PM Mark Fletcher  wrote:

> On Thu, May 16, 2019 at 9:41 AM Benedict Holland <
> benedict.m.holl...@gmail.com> wrote:
>
>>
>> I need a tool that can track schema changes in a postgesql database,
>> write scripts to alter the tables, and store those changes in git. Are
>> there tools that exist that can do this?
>>
>> We ended up rolling our own. We do schema dumps and then use
> https://www.apgdiff.com/ to diff them. For our relatively simple schemas,
> it's worked fine. One thing to note, apgdiff doesn't support `ALTER TABLE
> ONLY [a-z\.]+ REPLICA IDENTITY FULL;` lines, which we just remove before
> diffing.
>
> Cheers,
> Mark
>