Re: [GENERAL] varchar vs. text + constraint/triggers was: Help request to improve function performance

2009-04-23 Thread Karsten Hilbert
On Thu, Apr 23, 2009 at 01:21:05PM +0200, Ivan Sergio Borgonovo wrote:

> I'll try to rephrase to check if I understood and for reference.
> 
> varchar is slower than text since it has to do some "data type
> check".

Yes but no. It is said to be slower because it has to do a
data length check, not a data *type* check. Oh, did you mean
"check inherent to the data type" ?

> text is faster
... than varchar(something)

> but if you add a check... it gets slower
... than itself without the check, yes

> (slower than varchar?, faster?).
subject to testing

> constraint and trigger should have the advantage that in case of
> refactoring you're not going to touch the table definition
as far as the data types are constrained, the "core" table
definition so to speak

> that *may* end in being faster.
Yes, but I wasn't concerned about faster with respect to
which method for constraints only about which datatype
(which distinction, in the case of TEXT vs VARCHAR, is
somewhat arbitrary).

> But... if in postgresql implementation varchar is just text with a
> check... how can a change in type be faster?

A change of column datatype rewrites the table (again, I am
not 100% sure whether this applies for VARCHAR(x) <->
VARCHAR(y) and VARCHAR(x) <-> TEXT) since they are
essentially the same type with or without a check).

> On the other end... you're preferring text just because they have
> the same implementation (modulo check) in Postgresql... but it may
> not be so in other DB.
> So *maybe* other DB do some optimization on varchar vs. text.
True enough but I wasn't talking about those. Seref asked
about implementing archetypes on PostgreSQL.

> Nothing can handle strings of infinite length, and much before
> reaching infinite I'll get in trouble.
> People read differently what you'd like to say writing varchar(N).

> Most people read:
> 1) we expect a length around N
> Fewer people read:
> 2) There is something not working if we get something larger than N

VARCHAR(N)

VAR  - variable something
CHAR - characters

-> so, likely, variable *number* of characters because
   it better store variable characters ;-)

(N) - some boundary condition

so, either:

- exactly N (but, then, why *VAR*char ?)
- at least N (huh ?, but, well)
- at most N

> But it may also mean:
> 3) if we get something larger than N something is going to explode
> I think the same "ambiguity" is carried by check().
> Anyway for a sufficiently large N 2) and 3) can be valid.

No doubt.

> Supposing the cost of loosing an insert for an unpredicted large
> value of N is high I'd be tempted to set N to at least protect me
> from 3) but I bet people may interpret it as 1).
If you want PostgreSQL to help protect you from the risk of
out-of-memory error, then, yes, it can help a tiny bit to
use VARCHAR(N) where N = "reasonable" (due to earlier
warning) instead of letting PG go to the limits with TEXT.
Agreed.

> In my experience anyway varchar is a good early warning for troubles
> and the risk of being misunderstood/get caught by implementation
> dependent gotcha writing varchar(N) where N mean 3) largely
> encompass the risk of loosing an insert you didn't have to lose.
I see. That's surely a valid point of view.

> Maybe I've spotted a potential advantage of check over varchar.
> If you use some kind of convention to name checks you could
> remove/re-apply them easier than spotting varchars().
> The name of the constraint may contain metadata to help you.
> The name of the constraint may also suggest why it's there to your
> colleagues.
> But this works just if your implementation perform similarly on text
> over varchar().
No, the self-documentation advantage is there regardless of
performance. But the choice is a tradeoff either way, that's
for sure.

I think we may have gotten to a point where we won't help
the OP much :-)

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] varchar vs. text + constraint/triggers was: Help request to improve function performance

2009-04-23 Thread Grzegorz Jaƛkiewicz
essentially you are trying to store a database in a database, and
that's slow for one.
Second, storing things as varchar is space and index (space)
ineffective - and that's another reason to make things slower.
Third - you need to complicate your logic to retrieve data, and that adds up.

text is less of hassle for db, true - but that's just a tip of iceberg.

Just learn to create proper database schema, and make it so it meets
your criteria - otherwise , whatever else you choose - especially
'automatic' 'intelligent' isn't going to be ever as good as proper
schema.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] varchar vs. text + constraint/triggers was: Help request to improve function performance

2009-04-23 Thread Ivan Sergio Borgonovo
On Thu, 23 Apr 2009 12:00:30 +0200
Karsten Hilbert  wrote:

> On Thu, Apr 23, 2009 at 11:33:34AM +0200, Ivan Sergio Borgonovo
> wrote:
> 
> > Karsten Hilbert  wrote:
> > 
> > > On Thu, Apr 23, 2009 at 12:02:13AM +0100, Seref Arikan wrote:
> > > 
> > > > I have a set of dynamically composed objects represented in
> > > > Java, with string values for various attributes, which have
> > > > variable length. In case you have suggestions for a better
> > > > type for this case, it would be my pleasure to hear about
> > > > them.
> > > 
> > > Seref, he's suggesting you use TEXT instead of
> > > VARCHAR(something). In PG it's actually usually *less*
> > > overhead to use the unbounded text datatype (no length check
> > > required).
> > > 
> > > Length checks mandated by business logic can be added by
> > > more dynamic means -- check constraints, triggers, etc which
> > > allow for less invasive change if needed.
> > 
> > Could you point us to some example of a constraint/trigger
> > (etc...) that is going to provide the same checking of varchar
> > and explain (if the code/example... doesn't make it clear) why
> > it should be faster or less invasive?
> 
> check constraint based:
> 
>   create table foo (
>   txt text
>   check (char_length(txt) < 1001)
>   );
> 
> trigger based:
> 
>   create function trf_check_length_1000() ... returns
> trigger ... $$...$$;
> 
>   create table foo (
>   txt text
>   );
> 
>   create trigger check_txt_length before INSERT or
> UPDATE ... execute trf_check_length_1000();
> 
> faster:
> 
> - TEXT is (judging by previous comments on this list)
>   marginally faster than VARCHAR(1000) because a) it runs
>   the same code but b) doesn't have to check for the 1000
>   length
> 
> - other options (VARCHAR, constraint, trigger) incur
>   additional overhead and are thus slower
> 
> less invasive:
> 
> Well, poor wording on my part, perhaps. What I meant is that
> changing a check constraint or trigger appears to be a less
> costly operation on a table than changing the datatype of a
> column (although I seem to remember there being some
> optimizations in place for the case of changing the *length*
> of a varchar).

I'll try to rephrase to check if I understood and for reference.

varchar is slower than text since it has to do some "data type
check".

text is faster but if you add a check... it gets slower (slower than
varchar?, faster?).

constraint and trigger should have the advantage that in case of
refactoring you're not going to touch the table definition that
*may* end in being faster.

But... if in postgresql implementation varchar is just text with a
check... how can a change in type be faster?
If it was a char(N) maybe there would be some kind of optimization
since the length of the data is known in advance... so
shrinking/enlarging a char(N) may have a different cost than
shrinking a varchar(N) that in pg *should* have the same
implementation than text.

On the other end... you're preferring text just because they have
the same implementation (modulo check) in Postgresql... but it may
not be so in other DB.
So *maybe* other DB do some optimization on varchar vs. text.

Somehow I like the idea of considering a varchar a text with a
check, but I think I prefer the "more traditional" approach since
somehow is the "most expected".

Nothing can handle strings of infinite length, and much before
reaching infinite I'll get in trouble.
People read differently what you'd like to say writing varchar(N).
Most people read:
1) we expect a length around N
Fewer people read:
2) There is something not working if we get something larger than N
But it may also mean:
3) if we get something larger than N something is going to explode
I think the same "ambiguity" is carried by check().
Anyway for a sufficiently large N 2) and 3) can be valid.

Supposing the cost of loosing an insert for an unpredicted large
value of N is high I'd be tempted to set N to at least protect me
from 3) but I bet people may interpret it as 1).

In my experience anyway varchar is a good early warning for troubles
and the risk of being misunderstood/get caught by implementation
dependent gotcha writing varchar(N) where N mean 3) largely
encompass the risk of loosing an insert you didn't have to lose.

Maybe I've spotted a potential advantage of check over varchar.
If you use some kind of convention to name checks you could
remove/re-apply them easier than spotting varchars().
The name of the constraint may contain metadata to help you.
The name of the constraint may also suggest why it's there to your
colleagues.
But this works just if your implementation perform similarly on text
over varchar().

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] varchar vs. text + constraint/triggers was: Help request to improve function performance

2009-04-23 Thread Karsten Hilbert
On Thu, Apr 23, 2009 at 11:33:34AM +0200, Ivan Sergio Borgonovo wrote:

> Karsten Hilbert  wrote:
> 
> > On Thu, Apr 23, 2009 at 12:02:13AM +0100, Seref Arikan wrote:
> > 
> > > I have a set of dynamically composed objects represented in
> > > Java, with string values for various attributes, which have
> > > variable length. In case you have suggestions for a better type
> > > for this case, it would be my pleasure to hear about them.
> > 
> > Seref, he's suggesting you use TEXT instead of
> > VARCHAR(something). In PG it's actually usually *less*
> > overhead to use the unbounded text datatype (no length check
> > required).
> > 
> > Length checks mandated by business logic can be added by
> > more dynamic means -- check constraints, triggers, etc which
> > allow for less invasive change if needed.
> 
> Could you point us to some example of a constraint/trigger (etc...)
> that is going to provide the same checking of varchar and explain
> (if the code/example... doesn't make it clear) why it should be
> faster or less invasive?

check constraint based:

create table foo (
txt text
check (char_length(txt) < 1001)
);

trigger based:

create function trf_check_length_1000() ... returns trigger ... $$...$$;

create table foo (
txt text
);

create trigger check_txt_length before INSERT or UPDATE ... execute 
trf_check_length_1000();

faster:

- TEXT is (judging by previous comments on this list)
  marginally faster than VARCHAR(1000) because a) it runs
  the same code but b) doesn't have to check for the 1000
  length

- other options (VARCHAR, constraint, trigger) incur
  additional overhead and are thus slower

less invasive:

Well, poor wording on my part, perhaps. What I meant is that
changing a check constraint or trigger appears to be a less
costly operation on a table than changing the datatype of a
column (although I seem to remember there being some
optimizations in place for the case of changing the *length*
of a varchar).


I may be wrong in the above and if so it better be brought
to our collective attention for the benefit of readers.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] varchar vs. text + constraint/triggers was: Help request to improve function performance

2009-04-23 Thread Ivan Sergio Borgonovo
On Thu, 23 Apr 2009 11:00:59 +0200
Karsten Hilbert  wrote:

> On Thu, Apr 23, 2009 at 12:02:13AM +0100, Seref Arikan wrote:
> 
> > I have a set of dynamically composed objects represented in
> > Java, with string values for various attributes, which have
> > variable length. In case you have suggestions for a better type
> > for this case, it would be my pleasure to hear about them.
> 
> Seref, he's suggesting you use TEXT instead of
> VARCHAR(something). In PG it's actually usually *less*
> overhead to use the unbounded text datatype (no length check
> required).
> 
> Length checks mandated by business logic can be added by
> more dynamic means -- check constraints, triggers, etc which
> allow for less invasive change if needed.

Could you point us to some example of a constraint/trigger (etc...)
that is going to provide the same checking of varchar and explain
(if the code/example... doesn't make it clear) why it should be
faster or less invasive?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general