Re: Tools for moving normalized data around

2023-01-18 Thread Gavan Schneider
On 19 Jan 2023, at 6:47, Peter wrote:

> Now I want to grab some part of the data, on a certain condition
> (let's say all records belonging to user 'Bob', if there is a "user"
> table somewhere at the tree-bottom), and move it to another database
> with the very same layout - which is already populated with data
> and runs a different instance of the same application.
>
> Grabbing the intended records is just some dead-simple Selects. But
> then inserting them into the other database is not fun, because some
> primary keys will likely collide.
>
A very small sliver of this problem:
The key collision problem could be avoided if the unique, arbitrary keys 
were UUIDs.
Many of the other keys should relate back to their respective “table of truth”, 
so the migration is dependent on these being the same across both locations.
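
A minimal sketch, assuming PostgreSQL 13 or later where gen_random_uuid() 
is built in (earlier versions need the pgcrypto extension); table and 
column names are illustrative:

	CREATE TABLE users (
		user_id  uuid PRIMARY KEY DEFAULT gen_random_uuid(),
		name     text NOT NULL
	);
	-- rows created independently in the two databases get globally
	-- unique keys, so they can be copied across without collisions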

The rest TBA down thread :)

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: pg_upgrade 13.6 to 15.1?

2023-01-15 Thread Gavan Schneider
On 16 Jan 2023, at 8:59, p...@pfortin.com wrote:

> encodings for database "template1" do not match: old "UTF8", new
> "SQL_ASCII" Failure, exiting
>
Suggest the old DB using UTF8 reflects the better practice, and the new DB 
should do likewise

> "template1" is not a DB I've ever messed with; so this will require that
> I fire up the old version and change the encoding somehow?
>
This is created at initdb and mostly you don’t need/want to mess with it

> Is this likely to repeat for my actual databases?
>
AFAICT the least work option is to redo the initdb for the new v15.1 database. 
There is a lot of pain (and potential data corruption) to be had trying to 
reconfigure the old one before it can be moved.
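
Before redoing initdb it may be worth confirming exactly which encodings 
are in play on the old cluster. A read-only catalog query, safe to run 
anywhere:

	SELECT datname, pg_encoding_to_char(encoding) AS encoding
	FROM pg_database
	ORDER BY datname;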

Personally, I think UTF8 is the way to go. It will handle everything in the old 
database and whatever the future brings to the new one. I can see no advantage 
in pure ASCII when there is the potential for the real world to be contributing 
text. And there could well be non-ASCII characters lurking in the old DB, 
especially since someone set it up to receive them.
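
A quick way to look for such lurkers: in a UTF8 database, octet_length() 
exceeds char_length() whenever a multibyte (i.e., non-ASCII) character is 
present. A sketch, where mytable and textcol are placeholders:

	SELECT count(*)
	FROM mytable
	WHERE octet_length(textcol) <> char_length(textcol);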

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

2022-12-01 Thread Gavan Schneider
On 2 Dec 2022, at 6:51, Tom Lane wrote:

> Dominique Devienne  writes:
>> On Thu, Dec 1, 2022 at 8:09 PM Christophe Pettus  wrote:
>>> Since this is a custom-built system, there is nothing keeping you from 
>>> creating your own table in the database that stores the original text of 
>>> the function.
>
>> That's not the point. If a DBA updates one of our triggers or proc or
>> whatever else,
>> the recorded info in a custom table won't be affected. We are diff'ing
>> the server-side
>> schema, against the expected in-memory model of the physical model.
>
>> Thus the dictionaries are the only source of truth we can trust for
>> the current state of the schema.
>> And beside minor syntactic differences, and some more troublesome
>> object-name rewrites, this is
>> exactly what we want. The system itself needs to preserve the original
>> DDL IMHO. --DD
>
> Do you really fail to see the contradictions in this?  You want the
> database to preserve the original DDL, but you also want it to update
> in response to subsequent alterations.  You can't have both those
> things.
>
At the risk of stating the (over) obvious…

NEW: the Pg system keeps a parsed/tokenised version of functions and the 
like, with the excellent feature that it stays synchronised with any name 
change of the objects referenced.
OLD: an easily available text version of functions, etc., with comments 
and layout preserved (but not easily updatable in the event referenced 
objects get renamed) — sysadmin usage.

The contradiction is obvious but both needs can be justified…

NEXT(?): have a human-readable version of function (etc.) definitions 
generated from the parsed version, with tokens added to that parsed 
version so the programmer’s comments can be reinstated. Leave the layout 
style conventions to the system, and promise to keep this style consistent 
and maintained as part of the feature. Putting this generated (and updated 
as needed) text into pg_proc.prosrc would seem to be the least change 
needed to allow existing usage to move with the new Pg versions (and maybe 
help pgAdmin as well).
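
As a footnote, the system can already regenerate a compilable definition 
from the parsed form, minus the original comments and layout, which is 
exactly the gap described above. A sketch (‘my_function’ is a placeholder):

	SELECT pg_get_functiondef('my_function'::regproc);
	-- or, when the name is overloaded, join through pg_proc:
	SELECT pg_get_functiondef(p.oid)
	FROM pg_proc AS p
	WHERE p.proname = 'my_function';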

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: Seeking practice recommendation: is there ever a use case to have two or more superusers?

2022-11-21 Thread Gavan Schneider
On 22 Nov 2022, at 10:05, Bryn Llewellyn wrote:

> Because PG allows a cluster to have as many superusers as you please, and 
> because any one of these can create or drop another, any convention in this 
> space needs some extra mechanisms to enforce it.
>
> …  effectively tamper-proof implementation of the scheme …
>
Somewhat interesting thread so far, but it seems to be asking more than one 
question —
Q1. Is there ever a use case to have two or more superusers? Answer: yes, 
but it depends entirely on the use case.
Q2. [IMPLIED] How to make the database tamper-proof, given at least one 
superuser is unavoidable? Answer: not possible, ever — see below.

It is best to consider a database security system’s design objectives as being 
tamper-proof from the outside (i.e., the general client access perspective) and 
tamper-evident from within the database. As far as the server is concerned, one 
person’s superuser tampering is another person’s maintenance. There is no way 
to configure login credentials to prevent malicious or mistaken changes when 
you need the occasional superuser role that can repair a serious fault or 
process a system upgrade. If an upgrade or repair can be anticipated it should 
already be done; the superuser is needed for the things that were not expected 
or were too complex to pre-automate.

AFAICT the minimal tamper-evident criteria include logs being kept of the 
changes made, held on a system the database superuser cannot alter. At worst 
the logs will still have recorded when they were turned off. And the logs 
should have recorded the credentials used to assume the superuser role. After 
that it is basic business management — was the person acting badly, were the 
credentials stolen, what damage was done, and are the backups useful?
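
As a concrete starting point (a sketch only, not a complete audit policy) 
the stock logging parameters can capture who assumed which role and what 
DDL they ran, provided the log lines are also shipped to a host the 
database superuser cannot reach:

	ALTER SYSTEM SET log_connections = on;
	ALTER SYSTEM SET log_statement = 'ddl';
	-- timestamp, process id, user and database on every log line:
	ALTER SYSTEM SET log_line_prefix = '%m [%p] %u@%d ';
	SELECT pg_reload_conf();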

The full security analysis is complex and searches around “threat landscape” 
will help widen the perspective. The question becomes one of identifying what 
is possible to prevent, what can only be mitigated, and what can only be 
repaired after the fact. Database security is a lot more complicated than just 
trying to restrict the superuser role.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

2022-10-06 Thread Gavan Schneider

On 6 Oct 2022, at 16:04, Bryn Llewellyn wrote:

> Does this imply a risk that a future PG version will go against the 
> SQL standard and reject any non-latin name that is free of all 
> punctuation characters, when used in the role of a SQL identifier, 
> unless it's double quoted?


From my perspective this thread seems to miss the essential purposes 
behind quote_ident(). It is part of processing external/user input —
1. Protection from PostgreSQL’s case folding, which maps every unquoted 
identifier to lower case before anything gets to the parser
2. Protection against SQL injection when processing input from outside 
the trusted perimeter


Expecting an arbitrary string to be equal to itself after it has been 
through string processing code is risky unless that processing is part 
of the design, and quote_ident() was never designed to be part of any 
such arrangement.


Expanding —

1. It is a complex question what happens to non-ASCII characters when 
they are mapped to lower case… sometimes this is a meaningful concept, 
e.g., Π -> π; sometimes it is not, e.g., pick any 
Chinese/Korean/Japanese character. If the designer decides to use 
non-ASCII characters in an identifier they can… just double-quote 
those identifiers. If the designer wants to use camelCase ASCII they 
can, but the identifier will be camelcase inside the machine unless it 
was double quoted.


AFAIK we never really use quote_ident() except to process external 
input. As noted above this function is not designed to be part of an 
equality test when attempting system introspection, rather —


2. The simple quote_ident() function can also be used to wrap untrusted 
input so it will not mess with the parser. It is used alongside 
quote_literal() when building dynamic SQL statements from user (i.e., 
untrusted) input.
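
A minimal sketch of that intended usage. The first two lines show why the 
subject line is FALSE; the function shows dynamic SQL built from untrusted 
arguments, where format()’s %I applies quote_ident() and %L applies 
quote_literal() (all names are illustrative):

	SELECT quote_ident('doghouse');    -- doghouse     (safe: left unquoted)
	SELECT quote_ident('dog$house');   -- "dog$house"  (quoted, hence the subject line)

	CREATE OR REPLACE FUNCTION count_rows(tablename text, username text)
	RETURNS bigint
	LANGUAGE plpgsql
	AS $$
	DECLARE
		n bigint;
	BEGIN
		EXECUTE format('SELECT count(*) FROM %I WHERE username = %L',
		               tablename, username)
		INTO n;
		RETURN n;
	END;
	$$;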


From my perspective any use of these functions outside their scope is 
just that… outside their scope, with no promise this usage will work 
or comply with any current or future standard, or imply anything useful 
about pretty much anything.


Maybe I’m oversimplifying but I believe the current functions work and 
do their specific jobs, and have nothing to do with anything else. So 
there is no surprise for me in the subject line. There is mild surprise 
the question was asked.


BTW this ignores whether or not PG’s mapping of everything that’s not 
quoted to lower case is standards compliant. This whole topic would be 
simpler if case were left alone, but that’s a road long gone and I 
believe most of the bridges have been burnt :)


Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920


Re: Creation of FK without enforcing constraint for existing data

2022-08-02 Thread Gavan Schneider

On 2 Aug 2022, at 17:14, sivapostg...@yahoo.com wrote:

> Hello,
> I need to create a foreign key for a table without enforcing the 
> constraint for existing data. A few orphans exist in the existing 
> data, which we plan to resolve later.
> We use the following query to create the FK [which of course checks 
> for the presence of the record in the referenced table]


Try something like 
(see https://www.postgresql.org/docs/current/sql-altertable.html):

	ALTER TABLE public.table1
		ADD CONSTRAINT "constraint1_FK" FOREIGN KEY (field1)
		REFERENCES public.table2 (field2) MATCH SIMPLE
		ON UPDATE RESTRICT ON DELETE RESTRICT
		DEFERRABLE INITIALLY DEFERRED;

> The facility NOT to check for the presence of the parent record is 
> available in SQL Server. Searching for an equivalent option in 
> PostgreSQL. Is it available? Else any other way to achieve this?


I have no knowledge of SQL Server or how this is specified there, but the 
…DEFER… syntax above is according to the SQL standard.
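
That said, the skip-the-existing-rows facility is directly available in 
PostgreSQL as well: the NOT VALID clause (a PostgreSQL extension to ALTER 
TABLE) leaves existing orphans unchecked while enforcing the constraint 
for all new rows, and VALIDATE CONSTRAINT can be run once the orphans are 
cleaned up. A sketch using the tables above:

	ALTER TABLE public.table1
		ADD CONSTRAINT "constraint1_FK" FOREIGN KEY (field1)
		REFERENCES public.table2 (field2)
		ON UPDATE RESTRICT ON DELETE RESTRICT
		NOT VALID;  -- existing rows are not checked

	-- later, once the orphans have been resolved:
	ALTER TABLE public.table1 VALIDATE CONSTRAINT "constraint1_FK";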


Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920


Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-19 Thread Gavan Schneider

On 20 Jul 2022, at 4:08, Francisco Olarte wrote:

> As a remark, in Spain bill numbers need to be gapless increasing. I
> have done it with a sequence (with cache 1, so I doubt it is much
> more performant than a single-row table, as it has a backing element
> which needs to be saved), and just used another process which swept
> the values and filled the gaps a posteriori (there is a way to fill a
> gap, it has legal uses, and you can use it for that as long as you do
> not have too many: basically insert cancelled bills in the gaps). I
> would probably do it with the one-row table now for simplicity; I
> doubt I gained enough performance to justify the 20-something perl
> lines plus the crontab entry needed to use sequences. As the beard
> turns grey I appreciate more the simpler ways to do things.

One answer to this problem has been around for a while, and my version 
is shown below. No extra table is needed, just a windowed look at the 
table where the reference is used. My usage is at human speed, so 
performance is not an issue, but it should be pretty good if there are 
minimal holes. What is not addressed is that a reference can be reissued 
up to the time the calling process commits an entry in the table and 
takes that reference out of circulation. There are different approaches 
to handling/preventing such collisions.


CREATE OR REPLACE FUNCTION accounts.next_active_reference()
	RETURNS integer
	LANGUAGE sql
	VOLATILE LEAKPROOF STRICT
	PARALLEL UNSAFE
	COST 3000 -- pure guesstimate
AS $BODY$
	SELECT L.transaction_ref + 1 AS start
	FROM accounts.transaction_refs AS L
	LEFT OUTER JOIN accounts.transaction_refs AS R
		ON L.transaction_ref + 1 = R.transaction_ref
	WHERE R.transaction_ref IS NULL
		AND L.transaction_ref > 700 -- historic reasons only, added to existing system
	ORDER BY L.transaction_ref
	LIMIT 1;
$BODY$;

COMMENT ON FUNCTION accounts.next_active_reference() IS
$COMMENT$
	Determines the next available reference number, making sure to fill any holes.
	The holes in the sequence prior to 700 are ignored (pure history), and
	this code will make sure any out-of-sequence data blocks will not cause
	conflicts.

	Credits:
	Ref: <http://www.xaprb.com/blog/2005/12/06/find-missing-numbers-in-a-sequence-with-sql/>
	2022 update: this link is now dead, only reporting "There is nothing here".
$COMMENT$;
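
On the collision point, one hedged sketch is to serialise allocation with 
a transaction-scoped advisory lock, assuming every allocating caller 
cooperates (the lock key 42 is an arbitrary application choice, and 
accounts.transactions is a hypothetical consuming table):

	BEGIN;
	-- only one session at a time may pass this point until commit/rollback
	SELECT pg_advisory_xact_lock(42);
	INSERT INTO accounts.transactions (transaction_ref)
		VALUES (accounts.next_active_reference());
	COMMIT;  -- the advisory lock is released automatically here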

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920


Re: newbie db design question

2022-06-11 Thread Gavan Schneider
On 11 Jun 2022, at 17:05, Rino Mardo wrote:

> … experimental database design:
>
> create table products (
> product_id serial primary key,
> description text,
> supplier_id) ;
>
> create table supplier (
> supplier_id serial primary key,
> description text) ;
>
> … the products table should be linked to the supplier table via "supplier_id"
>
This is best done via foreign key constraint — see below

> … what would be the data type of supplier_id in
>
This is a matter of style: either an integer (even serial) or text (which 
will help a human like yourself when checking the data).

> how then to proceed?
>
You could adopt a style convention advocated by Joe Celko, e.g., in “SQL 
for Smarties”, where the table name is the plural form of the content item 
(primary key). This is not the only way to do business. Many on this list 
use integers as an id. The rationale behind the short text identifiers is 
that you can see what is going on when inspecting the data. So in this 
spirit you might also consider a “natural” key, i.e., text, as the primary 
key for the products. Opinions vary in this area and often devolve to 
performance. Remember the basic programmer’s trade-off — your time versus 
hardware time. Mostly we are better off using conventions that make our 
time more effective, but some have truly serious data problems and may 
benefit from less human-friendly practices. As always YMMV.

So combining all these suggestions:

pendari=#  create table products (
pendari(#  product serial primary key,
pendari(#  description text,
pendari(#  supplier text)
pendari-#  ;
CREATE TABLE

pendari=#  create table suppliers (
pendari(#  supplier text primary key,
pendari(#  description text)
pendari-#  ;
CREATE TABLE

— I used pgAdmin4 to build this constraint, and I suggest you will find it 
helpful as well
— this is the definition of the constraint holding the supplier key between 
the tables
—
pendari=#  ALTER TABLE IF EXISTS public.products
pendari-#  ADD CONSTRAINT fkey_check__supplier_must_exist FOREIGN KEY (supplier)
pendari-#  REFERENCES public.suppliers (supplier) MATCH SIMPLE
pendari-#  ON UPDATE CASCADE
pendari-#  ON DELETE RESTRICT
pendari-#  DEFERRABLE
pendari-#  NOT VALID;
ALTER TABLE

You now have a basic skeleton which ties the supplier keys together. The 
supplier has to be entered first, then all the products from that supplier 
can be entered. This means any attempt to enter a product where the 
supplier is not known (or has been misspelt) will raise an error. IMNSHO 
constraints are good: they help preserve data integrity, allow keys to be 
changed (with the change cascading to all the places which matter), and 
only cost extra db work when new data is entered (i.e., no ongoing 
overhead).
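
To illustrate the constraint doing its job, a sketch where ‘Acme Tiles’ is 
assumed not yet entered in suppliers:

pendari=#  insert into products (description, supplier)
pendari-#  values ('gloss white 200x200', 'Acme Tiles');
ERROR:  insert or update on table "products" violates foreign key 
constraint "fkey_check__supplier_must_exist"
DETAIL:  Key (supplier)=(Acme Tiles) is not present in table "suppliers".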

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: Strange results when casting string to double

2022-02-16 Thread Gavan Schneider
On 17 Feb 2022, at 8:02, Carsten Klein wrote:

> We have a trigger, that monitors such updates and it reports a changed value, 
> which is not correct. The problem is, that the assignment
>
> SET xyz = 1.56
>
> actually means
>
> SET xyz = 1.55999
>
> since getting a double value from the string 1.56 yields 1.55999.



> I just wanted to ask whether someone knows something about this or has ever 
> heard about such a behavior. You say, the snowflake machine has issues... I 
> don't believe in hardware issues, since it runs in VMware and likely on many 
> different CPUs. Isn't it more than unlikely that such a constantly occurring 
> error is caused by one faulty CPU (among that many CPUs an ESX server 
> typically has)?
>
Harking back to my long-distant formative years: I was taught to never ever 
rely on equality tests when it comes to floating point values. Nothing has 
changed in this regard.

If binary-exact is part of the requirement then the answer is INTEGER or 
NUMERIC, never FLOAT, REAL, DOUBLE or anything similar.

So, assuming the trigger function is the only source of this spurious 
grief, one could apply the lesson taught in the 60s at the dawn of 
computing:
	EQUALITY := abs(op1 - op2) < epsilon — where op1 and op2 are 
DOUBLE, and epsilon is smaller than you care about
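
In SQL terms, a minimal sketch using the classic pair that no IEEE-754 
double can represent exactly:

	SELECT 0.1::float8 + 0.2::float8 = 0.3::float8;                 -- false
	SELECT abs((0.1::float8 + 0.2::float8) - 0.3::float8) < 1e-12;  -- true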

Given the intrinsic (standards-compliant) uncertainty when converting from 
an exact representation (e.g., a string) to floating point, there will 
never be value in agonising over why there are differences.

I suggest using the comparison that is appropriate to the representation 
of those values, or fixing the design by using the proper representation.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: Best Strategy for Large Number of Images

2021-12-15 Thread Gavan Schneider
On 16 Dec 2021, at 7:12, Estevan Rech wrote:

> I have an application that takes pictures of items and uses them as
> evidence in a report.
>
> These images are saved and then used to generate a report.
>
Once you start talking reports and evidence I think audit trail and 
verification, a.k.a. “chain of evidence”.
With the images in the database they can be part of your information 
ecosystem: each gets a datestamp along with a checksum, and all of this is 
included in the audit trail.
All this can be done with the images in the file system and the database 
holding only the pointers… it just may not be as easy to convince the 
opposition’s lawyers that it is tamper-proof — though the best you can 
really get is tamper evident.
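
A sketch of how that might look in the schema; all names are illustrative, 
and digest() needs the pgcrypto extension:

	CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- provides digest()

	CREATE TABLE photos (
		photo_id  bigserial PRIMARY KEY,
		item_id   bigint NOT NULL,
		taken_at  timestamptz NOT NULL DEFAULT now(),
		image     bytea NOT NULL,
		-- checksum computed once and kept with the row for tamper evidence
		sha256    bytea GENERATED ALWAYS AS (digest(image, 'sha256')) STORED
	);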

> Each item averages 10 photos and I have about 2 million photos currently,
> with an average growth of 1 million photos over the next year.
>
> I think about using it within the database because of the ease of selecting
> records with the content of the images (when necessary).
>
That will be the same however you go. The selection will be on the context 
data you enter alongside the images. There has been no mention of searching 
within the image itself, and that is not something PG can do “out of the 
box” anyway.

> I think my biggest concern is to continue to use this strategy for the next
> 2 years and encounter a PostgreSQL limitation or some failure and have to
> migrate database again or have significant loss of images. And on the
> contrary too, if I use it on disk and in 2 years I find some failure or
> file limit in the operating system (currently it's windows server 2016)...
>
The total image data will end up somewhere under 4 TB if your estimates 
continue to hold. That’s not going to stress PG. It’s not such a big deal 
on a modern home computer either, though I can’t make any informed comment 
on Windows anything. Realistically, once the data is in the database the 
whole thing can be moved to a more capable machine and/or OS if problems 
start to appear. PG will work wherever you deploy it.

> Related to this is the backup service that seems to me that in a disaster,
> I can restore the database relatively quickly if it's in the database. On
> the disk, I believe that the restoration is much slower and I don't think
> it's reliable that all the images are copied/restored.
>
If you have a reliable, tested backup system for the database and don’t 
want to add to the workload, then get the images into the database. If 
there are reasons why you end up with the images outside the database 
then, of course, the backup process has to be able to keep everything 
together and guarantee a full restore. Doable as well, but it does add 
some complexity.

If I am reading this correctly, the use case is one of receiving data in 
the form of photos and field reports, entering the report items associated 
with the photos into the information system, generating a report (with 
embedded photos), and keeping records so disputes can be handled. This is 
not high-volume access to the images, so there is no real need for optimum 
filesystem speed to serve them… keeping the images in the database as 
bytea is perfectly workable and will cope with data volumes well beyond 
the current estimates.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: Alter and move corresponding: was The tragedy of SQL

2021-09-15 Thread Gavan Schneider
On 16 Sep 2021, at 9:31, Gavin Flower wrote:

> would assign the value of 42 to x.
>
Which brings up another topic, e.g., 
https://news.mit.edu/2019/answer-life-universe-and-everything-sum-three-cubes-mathematics-0910

>  Never tried it, I now wish I had!
>
You could see if it’s accurately emulated, e.g., http://ibm1130.org/emu/

What is truly amazing about old style FORTRAN is that it has a theological 
aspect.

What other computer language can give truth to a maxim such as:
IN FORTRAN GOD IS REAL

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: Sort question - Fractions, Metric etc

2021-08-14 Thread Gavan Schneider
On 15 Aug 2021, at 7:47, Bret Stern wrote:

> I will say this business has been behind in the attributes game. Plus  there 
> are many "artsy" vendors who can hardly speak in these terms, and don't 
> publish to us, so we do the best we can.
>
>  Getting vendors to supply the basic values is a struggle.
>
I suspect you have already found that all vendors reliably supply two 
values: how many (a dimensionless parameter) and how much (mostly in units 
of dollars). After that there may be length, width and thickness values 
specified with (hopefully) the same units, ± weight (per unit or box?), 
± an allowance for grout/joining (in case your application is going to be 
used as an aid in estimating the quantities needed).
The truly artistic supplier will refrain from making anything the same, 
and your customer will be expected to buy the item first and plan their 
project around it.

One possible data-wrangling scheme would be to give each item a catalogue 
number (yours) — I just noticed Adrian saying something similar, so 
apologies for the overlap.
The vendor_spec table would capture the vendor’s identifier, description 
and supplied dimensions (with a units column).
The dimensions_view (suggest a materialised view) would carry the 
dimension information in converted form, e.g., mm: numeric(8,3).
Once the conversions are set up, the customer can be supplied with 
dimensions in the system of their choice and you have a sensible common 
reference point for any rankings.
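
A minimal sketch of that scheme, with illustrative names and only two 
dimensions shown:

	CREATE TABLE vendor_spec (
		catalogue_no  text PRIMARY KEY,  -- your number
		vendor_id     text NOT NULL,
		description   text,
		length_val    numeric,
		width_val     numeric,
		units         text NOT NULL      -- e.g. 'mm' or 'in'
	);

	CREATE MATERIALIZED VIEW dimensions_view AS
	SELECT catalogue_no,
		(CASE units WHEN 'in' THEN length_val * 25.4
			ELSE length_val END)::numeric(8,3) AS length_mm,
		(CASE units WHEN 'in' THEN width_val * 25.4
			ELSE width_val END)::numeric(8,3) AS width_mm
	FROM vendor_spec;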

When I first saw this thread I thought you were getting into metal 
dimensions, and started to wonder if your next phase was to “organise” 
bolts according to diameter, length and thread… you have picked the easier 
course; the other is a nightmare, e.g., 
https://www.americanmachinetools.com/machinist_tables.htm

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: A simple question about text fields

2021-06-17 Thread Gavan Schneider
On 18 Jun 2021, at 9:34, David G. Johnston wrote:

> On Thursday, June 17, 2021, Gavan Schneider 
> wrote:
>
>>
>> My approach is to define such fields as ‘text’ and set a constraint using
>> char_length(). This allows PG to do the business with the text in native
>> form, and only imposes the cost of any length check when the field is
>> updated… best of both worlds.
>>
>
> Those are basically the same world…your alternative probably is strictly
> worse than varchar(n) because of its novel way of implementing the same
> functionality.
>
Not sure if this is strictly true. Novelty per se is not always worse. :)
The design advantage is in all text fields being defined the same — no 
built-in length.
When it becomes apparent a length constraint is needed, it can be added 
for the relevant field(s), e.g., when the system does not impose proper 
restraint at the input stage.
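
For instance (a sketch; the table, column and limit are illustrative):

	ALTER TABLE customer_notes
		ADD CONSTRAINT note_length_sane
		CHECK (char_length(note) <= 2000);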

> For most text fields any good constraint is going be done in the form of a
> regular expression, one that at minimum prevents non-printable characters
> (linefeed and carriage return being obvious examples).
>
Agree. If the business rules need some additional restrictions they can go 
here as well.
It is not so obvious that newlines, etc. are unwelcome. Consider the need 
for human-readable text in comment fields (or other annotation where 
readability is enhanced by such layout). There is always the consideration 
of SQL injection (but it’s mostly too late if we’re leaving this to a 
constraint) and other toxic string sequences. But this is all business 
logic. The database just needs to handle the ‘text’, and we need to design 
the restraint around the content.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: A simple question about text fields

2021-06-17 Thread Gavan Schneider
On 17 Jun 2021, at 1:08, Tom Lane wrote:

> Martin Mueller  writes:
>
>> Are there performance issues with the choice of 'text' vs. varchar and some 
>> character limit?  For instance, if I have a table with ten million records 
>> and text fields that may range in length from 15 to 150, can I expect a 
>> measurable improvement in response time for using varchar(150) or will text  
>>   do just or nearly as well.
>
>  There is no situation where varchar outperforms text in Postgres.
>  If you need to apply a length constraint for application semantic
>  reasons, do so ... otherwise, text is the native type.  It's
>  useful to think of varchar as being a domain over text, though
>  for various reasons it's not implemented quite that way.
>
This reminds me of my days converting from MySQL to PostgreSQL. MySQL, 
along with other databases, seemed to have a strong preference for setting 
a length on character strings. And all this was from before the advent of 
UTF encoding, which has made the concept of string ‘length’ very messy.

Database guru and SQL author Joe Celko asserts in his ‘SQL for Smarties’ 
that if he finds a text field without a length limit he will input the 
Heart Sutra (presumably in ASCII :) to demonstrate the design error. (Of 
course he is ignoring the potential for this input to help the database 
achieve inner consistency. :) But taking Joe’s central point, there do 
seem to be grounds for restricting user-input text fields to a reasonable 
length according to the business need… if only to limit the damage of a 
cat sitting on the keyboard.

My approach is to define such fields as ‘text’ and set a constraint using 
char_length(). This allows PG to do the business with the text in native 
form, and only imposes the cost of the length check when the field is 
updated… best of both worlds.
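
A minimal sketch of that approach, here packaged as a reusable domain in 
the spirit of Tom’s “domain over text” remark (names illustrative):

	CREATE DOMAIN human_text AS text
		CHECK (char_length(VALUE) <= 500);

	CREATE TABLE feedback (
		feedback_id  serial PRIMARY KEY,
		comment      human_text NOT NULL
	);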

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: database sorting algorithms.

2021-05-01 Thread Gavan Schneider

On 1 May 2021, at 17:06, Jian He wrote:

> Been self-studying databases; from databases I deep-dived into sorting 
> algorithms.
>
> Databases can do an in-memory QuickSort. There is also an on-disk 
> MergeSort.
>
> For MergeSort I followed this tutorial: 
> https://youtu.be/6pV2IF0fgKY?t=1108
> (around 1 minute only)
>
> Also check https://en.wikipedia.org/wiki/Merge_sort
>
> But I am still not fully understanding n*log(n). I understand how many 
> passes it will take, that is log(n). Yes, each pass will sort N 
> elements. But I still don't get what the N stands for in n*log(n).


So, answering the question…
The ’n’ refers to the need to do something to each element at least 
once, so the sort time grows in simple proportion to the size of the 
list that needs to be sorted. Unfortunately that is not enough work to 
get the list sorted, so extra passes are needed. The log(n) indicates how 
many extra passes are needed. So the overall performance is proportional 
to the number of elements (N) multiplied by the log of the number of 
elements, viz., N * log(N).
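
A worked example: with N = 1,024 elements, MergeSort needs log2(1024) = 10 
passes, and every pass handles all 1,024 elements, so the total work is 
about 1,024 * 10 = 10,240 element operations… N * log(N), not N or 
log(N) alone.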


Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920


Re: archive_commnad parameter question

2021-04-19 Thread Gavan Schneider

On 20 Apr 2021, at 7:09, Allie Crawford wrote:

> archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p 
> /mnt/server/archivedir/%f'  # Unix
>
> Does anybody know what “test” is representing in this parameter 
> configuration?


my_unix_prompt> man test

gives:

	Tests the expression given and sets the exit status to 0 if true, 
	and 1 if false. An expression is made up of one or more operators 
	and their arguments.

In other words, “test” here is a unix command which evaluates the 
expression supplied via its arguments. In the archive_command above, 
“test ! -f …” succeeds only when the destination file does not already 
exist, so the “&&” ensures cp runs only in that case and never 
overwrites an already-archived WAL segment.

It is mostly used in its alternate form of “[ … ]” in shell scripts.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920




Re: unexpected character used as group separator by to_char

2021-03-10 Thread Gavan Schneider

On 10 Mar 2021, at 16:24, Alvaro Herrera wrote:

> That space (0xe280af) is U+202F, which appears to be used for French and
> Mongolian languages (exclusively?).  It is quite possible that in the
> future some other language will end up using some different whitespace
> character, possibly breaking any code you write today -- the use of
> U+202F appears to be quite recent.

Drifting off topic a little. That’s a proper code point for things that 
will benefit from the whitespace but should still stay together.
Also it’s not that new; it was added in 1999 — 
https://codepoints.net/U+202F
The other use case is between a number and its ISO unit symbol, e.g., 
20 °C — and the non-breaking form is important here.
As for who uses a thin space in money… it is normal in European 
notation — https://www.languageediting.com/format-numbers-eu-vs-us/
And the thin space has been part of the international standard for 
breaking up large numbers since 1948; specifically, no dots or commas 
should be used in this role. The dot or comma is only to be used for the 
decimal point!
All of which just highlights the problems of localisation, and raises the 
question of whether there is an ISO locale setting — it should work 
everywhere ;)
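
Back on topic, the behaviour is easy to reproduce. A sketch, assuming a 
French locale is installed under this name on the host OS (locale names 
vary, and so does the exact space character the C library emits):

	SET lc_numeric = 'fr_FR.UTF-8';
	SELECT to_char(1234567.89, '9G999G990D99');
	-- the G separators come out as a (possibly narrow no-break) space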


Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920




Re: vacuumdb not letting me connect to db

2021-02-06 Thread Gavan Schneider

On 6 Feb 2021, at 23:06, Atul Kumar wrote:

> Thanks for providing the details. I need more clarification on this: 
> how should I analyze what the ideal number of connections should be, 
> to avoid IO overhead, based on the available hardware resources?

How to do this analysis properly is outside my expertise and likely off 
topic for this list. But you can get an overall idea by doing what I 
suggested: measure the performance (i.e., total time to completion) when 
using different numbers of threads. In an ideal world the time will get 
less as you invoke more threads. But there will be a point where there 
is a bottleneck (or some resource hits its limit) and the time will be 
more or less the same no matter how many threads.


> How to do this calculation?

That’s likely too hard. Measurement will tell you what you need to 
know. Once you know how many threads it takes to saturate the system, you 
will know to use a lesser number. This may, or may not, get you to 
revisit the max 700 connections setting, but once you know there is a 
performance limit, and you decide it’s not good enough, then you can 
ask the hardware people for help.



> Note: even during 300 threads, my RAM utilisation is totally normal.

That’s not unexpected. The CPU activity should be using the data 
held in RAM, not spending all those cycles allocating RAM.


Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920




Re: vacuumdb not letting me connect to db

2021-02-05 Thread Gavan Schneider

On 6 Feb 2021, at 3:37, Ron wrote:


> On 2/5/21 10:22 AM, Rob Sargent wrote:
>> On 2/5/21 9:11 AM, Ron wrote:
>>> Obviously... don't use 300 threads.
>>
>> No, no Ron.  Clearly the answer is more CPUs
>
> I hope you're being sarcastic.

A reasonable conjecture… though there is the consideration that 300 
CPU-intensive tasks spread across a given number of CPUs are going to 
waste some resources on context switching, i.e., need more CPUs :)


Basically, if there is plenty of wait time for I/O completion then CPU 
task switching can get more total work done. So far so obvious. In this 
thread I can see where it is disappointing to have a system considered 
capable of 700 connections getting saturated by a “mere” 300 
threads. But this is only a “problem” if connections are equated to 
threads. PG’s max connection count is about external users having access 
to the resources needed to get a task done. Like all resource allocations 
this relies on estimated average usage, i.e., each connection only asks 
for a lot of CPU in brief bursts, and then the result is transmitted with 
a time lag before the connection makes another CPU demand. The system 
designer should use estimates of usage and load to budget and 
configure the system, and monitor it all against actual performance in 
the real world. Of course estimates are a standing invitation for 
outliers, and the system will show stress under an unexpected load.


So far I have not seen an analysis of where the bottleneck has 
occurred: CPU, RAM, HD and/or the data bus connecting them — i.e., which 
of these hardware resources was maxed out to the extent that the system 
would not immediately pick up an additional work unit. As I see it, the 
OP started 300 CPU-intensive tasks on hardware intended for 700 
connections. If the connection count was designed with, say, 50% 
CPU-intensive time per connection, you would expect this hardware to be 
fully saturated by 300 CPU-intensive tasks. More than that: doing the job 
with 300 threads would probably take longer than with (say) 200 threads, 
as the increased CPU context-swapping time is just wasted effort.


The OP now has a choice: decrease the thread count or (seriously) upgrade 
the hardware. We in the gallery would love to see a plot of total time to 
completion as a function of threads invoked (50–300 in increments of 50), 
assuming the starting conditions are the same :)


Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920




Re: How to convert escaped text column - force E prefix

2021-01-06 Thread Gavan Schneider

On 6 Jan 2021, at 19:43, Pavel Stehule wrote:

> Currently there are not any functions that you need. You need to write 
> your own.
>
> CREATE OR REPLACE FUNCTION public.unistr(text)
>   RETURNS text
>   LANGUAGE plpgsql
>   IMMUTABLE STRICT
> AS $function$
> declare r text;
> begin
>   execute 'select e''' || quote_literal($1) ||  into r;
>   return r;
> end;
> $function$;
>
> Attention: This is ugly and possibly sql injection vulnerable!!! But 
> there is not another way. The fix is in queue.


https://www.postgresql.org/docs/current/functions-string.html

	quote_literal ( text ) → text
	Returns the given string suitably quoted to be used as a string 
	literal in an SQL statement string.
	Embedded single-quotes and backslashes are properly doubled.
	Note that quote_literal returns null on null input; if the argument 
	might be null, quote_nullable is often more suitable.
	See also Example 42.1. quote_literal(E'O\'Reilly') → 'O''Reilly'

It is even more ugly, but would it at least help with the SQL injection 
risk?


Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920


Re: DOMAIN/composite TYPE vs. base TYPE

2020-09-28 Thread Gavan Schneider

On 29 Sep 2020, at 7:31, Joe Abbate wrote:


> Hello Rob,
>
> On 28/9/20 17:17, Rob Sargent wrote:
>> just record all three fields (day, month, year) with nulls and do the 
>> to-date as needed.
>
> That is not sufficient.  An earlier implementation had something like 
> a CHAR(8) to record YYYYMMDD, but how can you indicate, for example, 
> an issue date of a bimonthly magazine, say July-Aug 2020?  We can 
> store 2020-07-01 in the DATE attribute, but we need another attribute 
> to indicate it's really two months.  Also, by storing three separate 
> columns, you lose the beauty of the PG DATE abstraction.


This is only a partial “fix” and goes nowhere near solving the full 
wrapper/abstraction problem…

Consider expressing all the component fields as a range. This allows you 
to be as precise as you need and still have the benefits of well-defined 
comparison functions.
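
A minimal sketch with the built-in daterange type, using the July-Aug 2020 
issue from above (the default '[)' bounds make the end date exclusive):

	SELECT daterange('2020-07-01', '2020-09-01') AS issue_span;
	-- comparisons and containment then come for free:
	SELECT daterange('2020-07-01', '2020-09-01') @> DATE '2020-08-15';  -- true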


Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong. The ancients, in the case at bar, laid the blame upon the gods: 
sometimes they were remote and surly, and sometimes they were kind. In 
the Middle Ages lesser powers took a hand in the matter, and so one 
reads of works of art inspired by Our Lady, by the Blessed Saints, by 
the souls of the departed, and even by the devil.  H. L. Mencken, 1920





Re: PG12.2 Configure cannot enable SSL

2020-05-15 Thread Gavan Schneider

On 15 May 2020, at 22:27, Peter Eisentraut wrote:



> Then the next step is to check in config.log for the details of the 
> test failure.  (Search the file for "openssl/ssl.h" to find the right 
> place.)



Thank you, hopefully I’ll find my typos quicker this way

	configure:13101: checking openssl/ssl.h usability
	configure:13101: /usr/bin/gcc -c -Wall -Wmissing-prototypes 
	-Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels 
	-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing 
	-fwrapv -Wno-unused-command-line-argument -O2 
	-I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include 
	-I/usr/local/Cellar/openssl@1.1/1.1.1g/include -I/usr/local/include 
	-I/Library/Developer/CommandLineTools/usr/include conftest.c >&5
	configure:13101: $? = 0
	configure:13101: result: yes


Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong. The ancients, in the case at bar, laid the blame upon the gods: 
sometimes they were remote and surly, and sometimes they were kind. In 
the Middle Ages lesser powers took a hand in the matter, and so one 
reads of works of art inspired by Our Lady, by the Blessed Saints, by 
the souls of the departed, and even by the devil.  H. L. Mencken, 1920


Re: PG12.2 Configure cannot enable SSL

2020-05-15 Thread Gavan Schneider

On 15 May 2020, at 23:24, Tom Lane wrote:


"Gavan Schneider"  writes:


HDRS=${HDRS}:"/usr/local/Cellar/openssl@1.1/1.1.1g/include"
...
--with-includes=${HRDS}



If that's an accurate copy of your script, spelling HDRS correctly
would help.

D’oh! More proof I haven’t been having a good week. But it’s getting 
better. Thank you for spotting this:


checking openssl/ssl.h usability... yes
checking openssl/ssl.h presence... yes
checking for openssl/ssl.h... yes

Now to get some real work done :)

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong. The ancients, in the case at bar, laid the blame upon the gods: 
sometimes they were remote and surly, and sometimes they were kind. In 
the Middle Ages lesser powers took a hand in the matter, and so one 
reads of works of art inspired by Our Lady, by the Blessed Saints, by 
the souls of the departed, and even by the devil.  H. L. Mencken, 1920


Re: PG12.2 Configure cannot enable SSL

2020-05-15 Thread Gavan Schneider

On 14 May 2020, at 23:26, Tom Lane wrote:


"Gavan Schneider"  writes:

-bash-3.2$ ./configure --with-openssl \
> --with-includes=/usr/local/opt/openssl/include/openssl \
> --with-libraries=/usr/local/opt/openssl/lib
...
checking openssl/ssl.h usability... no
checking openssl/ssl.h presence... no
checking for openssl/ssl.h... no
	configure: error: header file  is required for 
OpenSSL


Offhand, I'll guess that you need to shorten the --with-includes
path to

--with-includes=/usr/local/opt/openssl/include

What you are showing here would only work if the header file's full 
path

is

/usr/local/opt/openssl/include/openssl/openssl/ssl.h

which doesn't seem likely.

My bad. I made an error in cobbling together a reduced example, which 
produced the same error for the wrong reasons.
I’ll share the larger version, but the problem remains. It does not seem 
to be as simple as a bad path.


-bash-3.2$
-bash-3.2$ ./_Build_PostgeSQL.sh > build.log 2>&1
-bash-3.2$ ls -las /usr/local/Cellar/openssl@1.1/1.1.1g/include/openssl/ssl.h
224 -rw-r--r--  1 pendari  staff  111253 21 Apr 22:22 /usr/local/Cellar/openssl@1.1/1.1.1g/include/openssl/ssl.h
-bash-3.2$

Where _Build_PostgeSQL.sh looks like:

	echo ===
	echo = `date "+%Y-%m-%d %H:%M"` 
	echo ===

	PORT=65432

	# Working Base
	BASE="/usr/local/pgsql/v12"

	# Define basic tools == Homebrew versions (updated on a regular basis)
	MAKE="/usr/local/bin/gmake"
	CC="/usr/bin/gcc" # alternate? "/usr/local/bin/gcc-9"

	LIBS=""
	HDRS=""
	# OPENSSL
	LIBS=${LIBS}:"/usr/local/Cellar/openssl@1.1/1.1.1g/lib"
	HDRS=${HDRS}:"/usr/local/Cellar/openssl@1.1/1.1.1g/include"
	#LIBS=${LIBS}:"/usr/local/Cellar/libressl/3.1.1/lib"
	#HDRS=${HDRS}:"/usr/local/Cellar/libressl/3.1.1/include"
	# OPENLDAP
	LIBS=${LIBS}:"/usr/local/opt/openldap/lib"
	HDRS=${HDRS}:"/usr/local/opt/openldap/include"
	# GENERIC
	LIBS=${LIBS}:"/usr/local/lib:/Library/Developer/CommandLineTools/usr/lib"
	HDRS=${HDRS}:"/usr/local/include:/Library/Developer/CommandLineTools/usr/include"

	cd postgresql-12.2

	./configure \
		CC=${CC} \
		--with-includes=${HRDS} \
		--with-libraries=${LIBS} \
		--prefix=${BASE} \
		--with-pgport=${PORT} \
		--with-python \
		--with-libxml --with-libxslt \
		--with-wal-blocksize=8 \
		--with-system-tzdata=/usr/share/zoneinfo \
		--with-pam \
		--with-ldap \
		--with-uuid=e2fs \
		--with-bonjour \
		--with-openssl

	echo ===
	echo ===
	echo --with-includes =${HDRS}
	echo --with-libraries=${LIBS}
	echo CC is: $CC
	echo MAKE is: $MAKE
	echo ===
	echo ===

	exit DEBUG

and generated:

	===
	= 2020-05-15 21:43 
	===
	checking build system type... x86_64-apple-darwin19.4.0
	checking host system type... x86_64-apple-darwin19.4.0
	. . .
	checking zlib.h usability... yes
	checking zlib.h presence... yes
	checking for zlib.h... yes
	checking openssl/ssl.h usability... no
	checking openssl/ssl.h presence... no
	checking for openssl/ssl.h... no
	configure: error: header file <openssl/ssl.h> is required for OpenSSL
	===
	===
	--with-includes =:/usr/local/Cellar/openssl

PG12.2 Configure cannot enable SSL

2020-05-14 Thread Gavan Schneider


Greetings:

This problem may be related to several threads I have found, e.g.,
https://www.postgresql.org/message-id/29069.1575298784%40sss.pgh.pa.us

Context:
	Building PG 12.2 and needing to enable SSL
	Previous version builds have been uneventful, but I haven’t attempted 
	to enable SSL before

Sources:
	PG — standard source distribution 12.2
	OpenSSL — Homebrew
	macOS Catalina v10.15.4

Checks:
	configure finishes normally if I do not attempt to enable SSL
	the header files are at the end of the --with-includes path
	they are readable
	searches for this problem seem to relate to older versions and are no 
	longer relevant, in that I seem to have the include files properly 
	identified


Problem:
	I really don’t know where to go with this except to hope someone can 
see my =obvious= error

All help appreciated.



Relevant terminal:
-bash-3.2$ cd postgresql-12.2
-bash-3.2$ ls -las
total 1488
   0 drwxrwxrwx@ 18 pendari   staff 576 11 Feb 09:29 .
   0 drwxr-xr-x+  8 postgres  wheel 256 14 May 16:17 ..
   8 -rw-r--r--@  1 pendari   staff 730 11 Feb 09:14 .dir-locals.el
   8 -rw-r--r--@  1 pendari   staff1547 11 Feb 09:14 .gitattributes
   8 -rw-r--r--@  1 pendari   staff 504 11 Feb 09:14 .gitignore
   8 -rw-r--r--@  1 pendari   staff1192 11 Feb 09:14 COPYRIGHT
   8 -rw-r--r--@  1 pendari   staff3909 11 Feb 09:14 GNUmakefile.in
   8 -rw-r--r--@  1 pendari   staff 284 11 Feb 09:14 HISTORY
 128 -rw-r--r--@  1 pendari   staff   61606 11 Feb 09:29 INSTALL
   8 -rw-r--r--@  1 pendari   staff1665 11 Feb 09:14 Makefile
   8 -rw-r--r--@  1 pendari   staff1212 11 Feb 09:14 README
   8 -rw-r--r--@  1 pendari   staff 522 11 Feb 09:14 aclocal.m4
   0 drwxrwxrwx@ 23 pendari   staff 736 11 Feb 09:28 config
1120 -rwxr-xr-x@  1 pendari   staff  572134 11 Feb 09:14 configure
 168 -rw-r--r--@  1 pendari   staff   83465 11 Feb 09:14 configure.in
   0 drwxrwxrwx@ 59 pendari   staff1888 11 Feb 09:28 contrib
   0 drwxrwxrwx@  7 pendari   staff 224 11 Feb 09:28 doc
   0 drwxrwxrwx@ 22 pendari   staff 704 11 Feb 09:29 src
-bash-3.2$
-bash-3.2$ ./configure --with-openssl \
> --with-includes=/usr/local/opt/openssl/include/openssl \
> --with-libraries=/usr/local/opt/openssl/lib
checking build system type... x86_64-apple-darwin19.4.0
checking host system type... x86_64-apple-darwin19.4.0
...
...
...
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
checking openssl/ssl.h usability... no
checking openssl/ssl.h presence... no
checking for openssl/ssl.h... no
configure: error: header file <openssl/ssl.h> is required for OpenSSL
-bash-3.2$

-bash-3.2$ /usr/local/Cellar/openssl@1.1/1.1.1g/bin/openssl version
OpenSSL 1.1.1g  21 Apr 2020
-bash-3.2$



Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong. The ancients, in the case at bar, laid the blame upon the gods: 
sometimes they were remote and surly, and sometimes they were kind. In 
the Middle Ages lesser power took a hand in the matter, and so one reads 
of works of art inspired by Our Lady, by the Blessed Saints, by the 
souls of the departed, and even by the devil.  H. L. Mencken, 1920