Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Martijn van Oosterhout
On Fri, May 19, 2006 at 07:04:47PM -0400, Bruce Momjian wrote:
> > libreadline is not a problem because you can distribute postgresql
> > compiled with readline and comply with all licences involved
> > simultaneously. It doesn't work with openssl because the licence
> > requires things that are incompatable with the GPL.
> 
> My question is whether psql using libreadline.so has to be GPL, meaning
> the psql source has to be included in a binary distribution.

IANAL, but yes. Or any other of the methods allowed, like providing a
written voucher valid for at least three years. People who feel they
need to keep the source to psql secret should link against libeditline
instead.

The way I understand it, the GPL affects programs in two main ways:

1. A program which is GPL'd must, when distributed, be able to provide
all source used to build it under terms compatable with the GPL.

2. A program which includes a GPL'd header file while building, must,
when distributed, provide its own source and the library under GPL
compatable terms, but not necessariliy the source of anything else
needed to build it. This is why it's OK that psql links against openssl
and readline.

These are obviously only relevent when distributing precompiled
binaries. If you are only distributing source, none of the above
applies to you.

There's a third method that some people claim, but I don't buy. This
where a program using an interface of a GPL'd library somehow become a
derived work of said library. That's just way whacked out.

You may ofcourse disagree with any of the above, and hey, if you have a
lawyer to back you up, who am I to argue?

As for why you don't solve the problem by distributing a libpq not
compiled against OpenSSL, well, that's a different question. Back when
SSL was considered an arms exports by the US, having both SSL and
non-SSL versions was common (and a big PITA). When that disappeared,
the main reason for the split went away and people started compiling
SSL by default. This solved the problem for 99% of programs.

However, one tiny subset remains problematic:
- A library implements SSL, but only using OpenSSL
- The library doesn't use the GPL, or doesn't have an OpenSSL exception
clause.
- A GPL'd program uses this library, without an OpenSSL exception
clause.

In this subset of a subset of a subset of programs, it's a problem.
Many libraries that implement SSL provide an alternative to OpenSSL,
many programs using such libraries have exception clauses so that
there's just a handful of programs and libraries that are problematic.

As long as there's a possibility that the situation can change (either
every GPL program using postgresql gains an exception clause, or
postgresql might someday support some other library) it will probably
stay this way.

If the the postgresql core decides that OpenSSL will be the only SSL
ever supported, no matter what, well, the split distribution may yet
happen. In the meantime, we have status quo.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> If it did that might be a nice solution, but I'm not sure that it does
> use B-M ... I can't find either "Boyer" or "Moore" in its source code.
> 
> There's no particular reason to suppose offhand that a regex engine
> would be faster than the naive code for fixed patterns.

Well even a lame regexp implementation ought to be O(n+m). The factors will be
less than Boyer-Moore which can skip over substantial sections of the search
space without even looking at the characters. But there's no way it would be
O(n*m) for simple patterns unless the implementation was seriously deficient.

Of course your statement could still be true for particular usage patterns
like searching many different short strings with many different patterns where
the setup time of the regexp tables may dominate.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] String Similarity

2006-05-19 Thread Oleg Bartunov

Get pg_trgm http://www.sai.msu.su/~megera/oddmuse/index.cgi/ReadmeTrgm
It doesn't depends on language.

Oleg
On Fri, 19 May 2006, Mark Woodward wrote:


I have a side project that needs to "intelligently" know if two strings
are contextually similar. Think about how CDDB information is collected
and sorted. It isn't perfect, but there should be enough information to be
usable.

Think about this:

"pink floyd - dark side of the moon - money"
"dark side of the moon - pink floyd - money"
"money - dark side of the moon - pink floyd"
etc.

To a human, these strings are almost identical. Similarly:

"dark floyd of money moon pink side the"

Is a puzzle to be solved by 13 year old children before the movie starts.

My post has three questions:

(1) Does anyone know of an efficient and numerically quantified method of
detecting these sorts of things? I currently have a fairly inefficient and
numerically bogus solution that may be the only non-impossible solution
for the problem.

(2) Does any one see a need for this feature in PostgreSQL? If so, what
kind of interface would be best accepted as a patch? I am currently
returning a match liklihood between 0 and 100;

(3) Is there also a desire for a Levenshtein distence function for text
and varchars? I experimented with it, and was forced to write the function
in item #1.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] String Similarity

2006-05-19 Thread Mark Woodward
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
>
>> I have a side project that needs to "intelligently" know if two strings
>> are contextually similar.
>
> The examples you gave seem heavy on word order and whitespace
> consideration,
> before applying any algorithms. Here's a quick perl version that does the
> job:

[SNIP]

This is a case where the example was too simple to explain the problem,
sorry. I have an implementation of Oracle's "contains" function for
PostgreSQL, and it does basically what you are doing, and, in fact, also
has Mohawk Software Extensions (LOL) that provide metaphone. The problem
is that parsing white space realy isn't reliable. Sometimes it is
pinkfloyd-darksideofthemoon.

Also, I have been thinking of other applications.

I have a piece of code that does this:

apps$ ./stratest "pink foyd dark side of the moon money" "money dark side
of the moon pink floyd"
Match:  dark side of the moon
Match: pink f
Match: money
Match: oyd

apps$ ./stratest "pinkfoyddarksideofthemoonmoney"
"moneydarksideofthemoonpinkfloyd"
Match: darksideofthemoon
Match: pinkf
Match: money
Match: oyd

I need to come up with a numerically sane way of taking this information
and understanding overall "similarity."

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] patch review, please: Autovacuum/Vacuum times via stats.

2006-05-19 Thread Bruce Momjian

Applied by Alvaro. Thanks.

---

Larry Rosenman wrote:
> Larry Rosenman wrote:
> > Greetings,
> > I've got a patch to be reviewed for having the stats system keep
> > track of the last
> > time a table was vacuumed or analyzed either by the user or via
> > AutoVacuum.
> > 
> > The patch is at:
> > http://www.lerctr.org/~ler/pg-dev/vacuum-autovacuum-times-stats.diff
> > 
> > I'd appreciate a full review, it includes docs as well.
> > 
> > Thanks!
> > 
> > LER
> 
> I just replaced this one with one that actually bumps catversion.
> 
> LER
> 
> pgsql-patches added as well. 
> 
> I think this one is applyable if the powers that be want to.
> 
> Comments/criticism welcome.
> 
> LER
> 
> 
> -- 
> Larry Rosenman http://www.lerctr.org/~ler
> Phone: +1 512-248-2683 E-Mail: ler@lerctr.org
> US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
> 
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match
> 

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS]

2006-05-19 Thread Jaime Casanova

Hi,

suppose we have something like this:

upd_views=# create table tabla1 (col1 point);
CREATE TABLE
upd_views=# insert into tabla1 values ('3,2');
INSERT 0 1
upd_views=# insert into tabla1 values ('2,2');
INSERT 0 1
upd_views=# insert into tabla1 values ('3,2');
INSERT 0 1

then, this select will give an error:

upd_views=# select col1, count(*) from tabla1 group by col1;
ERROR:  could not identify an ordering operator for type point
HINT:  Use an explicit ordering operator or modify the query.
upd_views=#

i guess this is related to:
http://archives.postgresql.org/pgsql-hackers/2003-08/msg00809.php

so, what happened with this idea? there is another way to automagicaly
identify an "equality operator" for datatypes like 'point'?

as you said in the message linked above that is because postgres ask
for the operator name...

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
  Richard Cook

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Alvaro Herrera
Hannu Krosing wrote:

> I had a (false ?) memory that we used some variant of pcre, and that
> pcre uses BM. I may be false on both  accounts. (I know that python
> borrowed its re module from pcre).

Our code is a derivative from Henry Spencer's code found in Tcl.  It
certainly isn't Boyer Moore, because it processes chars one at a time,
left to right (BM processes right to left).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] String Similarity

2006-05-19 Thread Josh Berkus

> > I have a side project that needs to "intelligently" know if two
> > strings are contextually similar.

Also check out the "fuzzystrmatch" module in /contrib, which offers 
soundex, metaphone and levenschtein functions.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Bruce Momjian
Joshua D. Drake wrote:
> Jonah H. Harris wrote:
> > On 5/19/06, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> >> And with that, I am going to sit in a lawn chair and watch the bonfire.
> > 
> > This is one of the finest examples of unfocused discussions I've ever
> > seen on -hackers... while surely entertaining, what a huge waste of
> > time.
> 
> All discussions on mailing lists are unfocused and a waste of time to 
> some degree, even the best of them because by its very nature, email is 
> a time wasting tool.

Let's not forget my bad jokes.  I am still chuckling at JavaZero.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Bruce Momjian
Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Fri, May 19, 2006 at 04:41:20PM -0400, Bruce Momjian wrote:
> > > Yes, the exeption applies to libreadline, which is why we can deliver 
> > > psql with libreadline linked on Linux, for example. But we can't on 
> > > Windows or Solaris.
> > 
> > OK, where do you see this exception?  I have not.
> 
> The exception is not relevent in this case. The exception is to allow
> GPL applications to work on non-free operating systems. Obviously a
> GPL'd application on Windows can never supply the source to the Win32
> libraries.
> 
> libreadline is not a problem because you can distribute postgresql
> compiled with readline and comply with all licences involved
> simultaneously. It doesn't work with openssl because the licence
> requires things that are incompatable with the GPL.

My question is whether psql using libreadline.so has to be GPL, meaning
the psql source has to be included in a binary distribution.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, L, 2006-05-20 kell 01:34, kirjutas Hannu Krosing:
> Ühel kenal päeval, R, 2006-05-19 kell 18:18, kirjutas Tom Lane:
> > Hannu Krosing <[EMAIL PROTECTED]> writes:
> > > I guess our regex implementation already uses boyer-moore or similar.
> > > Why not just expose the match position of substring('text' in 'regex')
> > > using some function, called  match_position(int searched_text, int
> > > regex, int matchnum) ?
> > 
> > If it did that might be a nice solution, but I'm not sure that it does
> > use B-M ... I can't find either "Boyer" or "Moore" in its source code.
> 
> Ok, maybe it is not optimised for finding longish strings inside even
> longers trings.
> 
> I had a (false ?) memory that we used some variant of pcre, and that
> pcre uses BM. I may be false on both  accounts. (I know that python
> borrowed its re module from pcre).

http://www.mcabee.org/lists/snort-users/Mar-05/msg00026.html

seems to imply that PCRE uses BM at least for some case, so I might not
have been wrong in case 2 :)


> > There's no particular reason to suppose offhand that a regex engine
> > would be faster than the naive code for fixed patterns.
> 
> if naive code is O(n*m), then starting from some values of n and m it is
> probably faster if it is based on somewhat optimised regex engine, the
> question is, what is the threasold and dataset for fasterness 
> 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] String Similarity

2006-05-19 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


> I have a side project that needs to "intelligently" know if two strings
> are contextually similar.

The examples you gave seem heavy on word order and whitespace consideration,
before applying any algorithms. Here's a quick perl version that does the
job:

CREATE OR REPLACE FUNCTION matchval(text,text)
RETURNS INT LANGUAGE plperlu AS
$$
  
use strict;
use String::Approx 'adist';
  
my $uno = join ' ', sort split /\s+/ => lc shift;
my $dos = join ' ', sort split /\s+/ => lc shift;
  
return adist(length $unohttp://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iD8DBQFEbktUvJuQZxSWSsgRAiCtAJ9nlpqGxlYnimDPp8t5XQsc8y9RywCfZZL6
iU9iPnxHaWOvYCUD7+rK8Do=
=zo3T
-END PGP SIGNATURE-



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-05-19 kell 22:53, kirjutas Martijn van
Oosterhout:

> libreadline is not a problem because you can distribute postgresql
> compiled with readline and comply with all licences involved
> simultaneously. 

oh? my impression was that we are clear, because libreadline is just one
of "readline" implementations we support.

> It doesn't work with openssl because the licence
> requires things that are incompatable with the GPL.

Still clients can compile/use libpq without OpenSSL and be on safe
ground.

> The openssl faq suggest that you can take advantage of the exception,
> which reads:
> 
>   However, as a special exception, the source code distributed need not
>   include anything that is normally distributed (in either source or
>   binary form) with the major components (compiler, kernel, and so on)
>   of the operating system on which the executable runs, unless that
>   component itself accompanies the executable.
> 
> I don't buy that argument, and I'm not the only one. OpenSSL is an
> optional part of most Linuxes, so there's no way you can use that
> exception.

But on most linuxes optional parts ar also "normally distributed" :P

Even network drivers may be counted optional for pure linux (kernel)
experience, but are still normally distributed.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

NOTICE: This communication contains privileged or other confidential
information. If you have received it in error, please advise the sender
by reply email and immediately delete the message and any attachments
without copying or disclosing the contents.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-05-19 kell 18:18, kirjutas Tom Lane:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > I guess our regex implementation already uses boyer-moore or similar.
> > Why not just expose the match position of substring('text' in 'regex')
> > using some function, called  match_position(int searched_text, int
> > regex, int matchnum) ?
> 
> If it did that might be a nice solution, but I'm not sure that it does
> use B-M ... I can't find either "Boyer" or "Moore" in its source code.

Ok, maybe it is not optimised for finding longish strings inside even
longers trings.

I had a (false ?) memory that we used some variant of pcre, and that
pcre uses BM. I may be false on both  accounts. (I know that python
borrowed its re module from pcre).

> There's no particular reason to suppose offhand that a regex engine
> would be faster than the naive code for fixed patterns.

if naive code is O(n*m), then starting from some values of n and m it is
probably faster if it is based on somewhat optimised regex engine, the
question is, what is the threasold and dataset for fasterness 

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-05-19 kell 14:57, kirjutas Jim C. Nasby:
> On Fri, May 19, 2006 at 09:29:44PM +0200, Martijn van Oosterhout wrote:
> > On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote:
> > > > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) 
> > > > > a;
> > > > > If the tape routines were actually storing visibility information, I'd
> > > > > expect that to be pretty compressible in this case since all the 
> > > > > tuples
> > > > > were presumably created in a single transaction by pgbench.
> > > 
> > > Was he not using pg_bench data ?
> > 
> > Hmm, so there was only 3 integer fields and one varlena structure which
> > was always empty. This prepended with a tuple header with mostly blank
> > fields or at least repeated, yes, I can see how we might get a 25-to-1
> > compression.
> > 
> > Maybe we need to change pgbench so that it puts random text in the
> > filler field, that would at least put some strain on the compression
> > algorithm...
> 
> Wow, I thought there was actually something in there...
> 
> True random data wouldn't be such a great test either; what would
> probably be best is a set of random words, since in real life you're
> unlikely to have truely random data.

I usually use something like the following for my "random name" tests:

#!/usr/bin/python

import random

words = [line.strip() for line in open('/usr/share/dict/words')]

def make_random_name(min_items, max_items):
l = []
for w in range(random.randint(min_items, max_items)):
l.append(random.choice(words))
return ' '.join(l)

it gives out somewhat justifyable but still quite amusing results:

>>> make_random_name(2,4)
'encroaches Twedy'
>>> make_random_name(2,4)
'annuloida Maiah commends imputatively'
>>> make_random_name(2,4)
'terebral wine-driven pacota'
>>> make_random_name(2,4)
'ballads disenfranchise cabriolets spiny-fruited'


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes:
> I guess our regex implementation already uses boyer-moore or similar.
> Why not just expose the match position of substring('text' in 'regex')
> using some function, called  match_position(int searched_text, int
> regex, int matchnum) ?

If it did that might be a nice solution, but I'm not sure that it does
use B-M ... I can't find either "Boyer" or "Moore" in its source code.

There's no particular reason to suppose offhand that a regex engine
would be faster than the naive code for fixed patterns.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> True random data wouldn't be such a great test either; what would
> probably be best is a set of random words, since in real life you're
> unlikely to have truely random data.

True random data would provide worst-case compression behavior, so
we'd want to try that to find out what the downside is; but we shouldn't
consider it to be the design center.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Joshua D. Drake

Jonah H. Harris wrote:

On 5/19/06, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

And with that, I am going to sit in a lawn chair and watch the bonfire.


This is one of the finest examples of unfocused discussions I've ever
seen on -hackers... while surely entertaining, what a huge waste of
time.


All discussions on mailing lists are unfocused and a waste of time to 
some degree, even the best of them because by its very nature, email is 
a time wasting tool.


Have a great weekend!

Joshua D. Drake


--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Jonah H. Harris

On 5/19/06, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

And with that, I am going to sit in a lawn chair and watch the bonfire.


This is one of the finest examples of unfocused discussions I've ever
seen on -hackers... while surely entertaining, what a huge waste of
time.


--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax

2006-05-19 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Ühel kenal päeval, R, 2006-05-19 kell 16:12, kirjutas Jim C. Nasby:
>> IIRC, Oracle handles this by allowing you to prefix variables with the
>> name of the function. 

> what happens if your function name is the same as some table name or
> local record variable name ? would it still bite newcomers ? 

Yeah.  Since functions and tables have independent namespaces in PG
(dunno about Oracle), this didn't seem like much of a solution to me.

I think we should just recommend a coding practice such as _ prefix
for local variables, and leave it at that.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-05-19 kell 16:12, kirjutas Jim C. Nasby:
> On Fri, May 19, 2006 at 11:03:21PM +0300, Hannu Krosing wrote:
> > A less disruptive change would be to have some predefined "record" where
> > all local variables belong to, perhaps called 'local' or '_local_' :) so
> > one could access both input parameter inp_orderdate and declared
> > variable var_orderdate as local.inp_orderdate and local.var_orderdate
> > respectively ? The old use (without local.) should still work.
> 
> That might be useful for othing things, too. Having _local. and
> _parameters. could be handy, for example. But I'm not sure if this is
> better than using $ or not...

I dont think that having separate _parameters and _locals is a good idea
(then we would probably also need _const :) ). lerts have just _vars OR
_locals.

> IIRC, Oracle handles this by allowing you to prefix variables with the
> name of the function. 

what happens if your function name is the same as some table name or
local record variable name ? would it still bite newcomers ? 

> You can also have package-level variables, which
> can be handy (whatever happened to the discussion about adding packages
> or something similar to plpgsql??)

I got the impression that this was either a) tied together with adding
*procedures* in addition to functions or b) planned somehow to be solved
by using schemas

> > > I think it would be better to at least strongly recommend always
> > > prefixing variables and parameters with something. 
> > 
> > At least we should use such convention in our sample code in docs.
> 
> Yes, at a minimum.
> 
> > > I'd argue that it'd
> > > be even better to put us on the road of eventually mandating plpgsql
> > > variables be prefixed with something (like $), but I'm not holding my
> > > breath on that one...
> > 
> > I dont believe in mandating non-backward-compatible changes, but prefix
> > $ may be one way to disambiguate vars and fieldnames.
> 
> Well, this could be made optional (strict mode).

or perhaps have plpgsql_lint as a separate external tool or as a
database function :)

I guess it is hard to make a strict mode, when the need for using $ in
first place comes from inability to distinguish between fields and
variables :)

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

NOTICE: This communication contains privileged or other confidential
information. If you have received it in error, please advise the sender
by reply email and immediately delete the message and any attachments
without copying or disclosing the contents.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-05-19 kell 09:40, kirjutas Christopher
Kings-Lynne:
> > We also need better support for non C locales in tsearch.  As I was porting 
> > mysql's sakila sample database I was reminded just how painful it is when 
> > you 
> > initdb in a non-supported locale (which is probably the default on the 
> > majority of distros out there)
> 
> 
> In 8.2 tsearch2 supports utf8...

Utf8 is encoding, but I guess that tsearch2 does not care much about
locales ?

tsearch2 does not do sorts, but it may care about upper()/lower() for
languages that support it, so there our locale support should be good
for utf8 encoding if we care about language-specific case insensitivity.


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax

2006-05-19 Thread Jim C. Nasby
On Fri, May 19, 2006 at 11:03:21PM +0300, Hannu Krosing wrote:
> A less disruptive change would be to have some predefined "record" where
> all local variables belong to, perhaps called 'local' or '_local_' :) so
> one could access both input parameter inp_orderdate and declared
> variable var_orderdate as local.inp_orderdate and local.var_orderdate
> respectively ? The old use (without local.) should still work.

That might be useful for othing things, too. Having _local. and
_parameters. could be handy, for example. But I'm not sure if this is
better than using $ or not...

IIRC, Oracle handles this by allowing you to prefix variables with the
name of the function. You can also have package-level variables, which
can be handy (whatever happened to the discussion about adding packages
or something similar to plpgsql??)

> > I think it would be better to at least strongly recommend always
> > prefixing variables and parameters with something. 
> 
> At least we should use such convention in our sample code in docs.

Yes, at a minimum.

> > I'd argue that it'd
> > be even better to put us on the road of eventually mandating plpgsql
> > variables be prefixed with something (like $), but I'm not holding my
> > breath on that one...
> 
> I dont believe in mandating non-backward-compatible changes, but prefix
> $ may be one way to disambiguate vars and fieldnames.

Well, this could be made optional (strict mode).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] String Similarity

2006-05-19 Thread Mark Woodward
> Mark Woodward wrote:
>> I have a side project that needs to "intelligently" know if two strings
>> are contextually similar. Think about how CDDB information is collected
>> and sorted. It isn't perfect, but there should be enough information to
>> be
>> usable.
>>
>> Think about this:
>>
>> "pink floyd - dark side of the moon - money"
>> "dark side of the moon - pink floyd - money"
>> "money - dark side of the moon - pink floyd"
>> etc.
>>
>> To a human, these strings are almost identical. Similarly:
>>
>> "dark floyd of money moon pink side the"
>>
>> Is a puzzle to be solved by 13 year old children before the movie
>> starts.
[snip]
>
> Hmmm...  I think I like this problem.  Maybe I'll work on it a bit as a
> contrib
> module.

I *have* a working function, but it is not very efficient and it is not
what I would call numerically predictable. And it does find the various
sub-strings between the two strings in question.

Email me offline and we can make something for contrib.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Martijn van Oosterhout
On Fri, May 19, 2006 at 04:41:20PM -0400, Bruce Momjian wrote:
> > Yes, the exeption applies to libreadline, which is why we can deliver 
> > psql with libreadline linked on Linux, for example. But we can't on 
> > Windows or Solaris.
> 
> OK, where do you see this exception?  I have not.

The exception is not relevent in this case. The exception is to allow
GPL applications to work on non-free operating systems. Obviously a
GPL'd application on Windows can never supply the source to the Win32
libraries.

libreadline is not a problem because you can distribute postgresql
compiled with readline and comply with all licences involved
simultaneously. It doesn't work with openssl because the licence
requires things that are incompatable with the GPL.

The openssl faq suggest that you can take advantage of the exception,
which reads:

  However, as a special exception, the source code distributed need not
  include anything that is normally distributed (in either source or
  binary form) with the major components (compiler, kernel, and so on)
  of the operating system on which the executable runs, unless that
  component itself accompanies the executable.

I don't buy that argument, and I'm not the only one. OpenSSL is an
optional part of most Linuxes, so there's no way you can use that
exception.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] String Similarity

2006-05-19 Thread Mark Dilger
Mark Woodward wrote:
> I have a side project that needs to "intelligently" know if two strings
> are contextually similar. Think about how CDDB information is collected
> and sorted. It isn't perfect, but there should be enough information to be
> usable.
> 
> Think about this:
> 
> "pink floyd - dark side of the moon - money"
> "dark side of the moon - pink floyd - money"
> "money - dark side of the moon - pink floyd"
> etc.
> 
> To a human, these strings are almost identical. Similarly:
> 
> "dark floyd of money moon pink side the"
> 
> Is a puzzle to be solved by 13 year old children before the movie starts.
> 
> My post has three questions:
> 
> (1) Does anyone know of an efficient and numerically quantified method of
> detecting these sorts of things? I currently have a fairly inefficient and
> numerically bogus solution that may be the only non-impossible solution
> for the problem.
> 
> (2) Does any one see a need for this feature in PostgreSQL? If so, what
> kind of interface would be best accepted as a patch? I am currently
> returning a match liklihood between 0 and 100;
> 
> (3) Is there also a desire for a Levenshtein distence function for text
> and varchars? I experimented with it, and was forced to write the function
> in item #1.

The Levenshtein distance (also known as "edit distance") won't really give you
what you want above, because operations to transplant whole chunks of the string
aren't supported.  (You can simulate it with inserts and deletes, but you pay
individually for each of them.)  Also, Levenshtein distances don't charge much
for changing a word into a similarly spelled but semantically distinct word,
such as "word" => "work".

What you would want, I think, is some function that recognizes that the whole
substring "pink floyd" has been moved from the beginning to the middle of the
string, and only charges you a small edit cost for having done so.  It would
need to recognize both the word boundaries and the transplants.  Off the top of
my head, I'm not sure how you would achieve that with good runtime
characteristics.  You can go even further and allow synonyms, so that "pink
floyd" is more related to "red floyd" than it is to "large floyd", but for that
sort of thing you would probably need to pull in wordnet.

If you want to notice that two strings contain local similarity, but don't have
an overall good Levenshtein distance, take a look at global vs. local alignment
algorithms used in biological applications.  Local alignment can be achieved in
O(n*m) time, where n and m are the lengths of the two strings, using the
Smith-Waterman algorithm.  (Temple Smith and Michael Waterman).  There are
faster heuristic algorithms, but they don't have the same guarantees.  These
local alignments might tell you something useful as a part of the overall 
solution.

Hmmm...  I think I like this problem.  Maybe I'll work on it a bit as a contrib
module.

mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Bruce Momjian
Andrew Dunstan wrote:
> Bruce Momjian wrote:
> > Rod Taylor wrote:
> >   
> >> Exceptions exist in the GPL for libraries and tools included in the
> >> operating system and this is enough in most cases. GPL applications on
> >> Windows may have problems.
> >> 
> >
> > What exception, exactly?  Does an exception apply to libreadline,
> > because list I looked, it didn't.
> >
> >   
> 
> Yes, the exeption applies to libreadline, which is why we can deliver 
> psql with libreadline linked on Linux, for example. But we can't on 
> Windows or Solaris.

OK, where do you see this exception?  I have not.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Andrew Dunstan

Bruce Momjian wrote:

Rod Taylor wrote:
  

Exceptions exist in the GPL for libraries and tools included in the
operating system and this is enough in most cases. GPL applications on
Windows may have problems.



What exception, exactly?  Does an exception apply to libreadline,
because list I looked, it didn't.

  


Yes, the exeption applies to libreadline, which is why we can deliver 
psql with libreadline linked on Linux, for example. But we can't on 
Windows or Solaris.


cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Bruce Momjian
Rod Taylor wrote:
> Exceptions exist in the GPL for libraries and tools included in the
> operating system and this is enough in most cases. GPL applications on
> Windows may have problems.

What exception, exactly?  Does an exception apply to libreadline,
because list I looked, it didn't.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [pgsql-advocacy] Toward A Positive Marketing Approach.

2006-05-19 Thread Matthew T. O'Connor

Tom Lane wrote:

Josh Berkus  writes:
Other projects need even more intensive coding help.  OpenOffice, for example, 
doesn't offer the Postgres driver by default because it's still too buggy.  


That seems like something that it'd be worth our while to help fix.


+1 (or +10 if that's not to piggy ;-)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [pgsql-advocacy] OO PostgreSQL Driver

2006-05-19 Thread Alvaro Herrera
Joshua D. Drake wrote:
> Robert Treat wrote:
> >On Friday 19 May 2006 14:22, Joshua D. Drake wrote:
> >>>Other projects need even more intensive coding help.  OpenOffice, for
> >>>example, doesn't offer the Postgres driver by default because it's still
> >>>too buggy. That would be solvable with money, but $1000 to $2000, not
> >>>$50.
> >>Does it really need one since it supports JDBC and ODBC?
> >>
> >
> >It's not about what OO needs, it's about what PG needs.  Consider this; if 
> >database M works out of the box...  but database P requires you to go find 
> >some third party software and download it and install it in order to 
> >work... you tell me which one is going to have the advantage in new user 
> >adoption?
> 
> Well then, why don't we convince the OO people to bundle either ODBC or 
> ODBCng with OO.

The ODBC interface is less powerful for OOo than their own SDBC
interface AFAIK (which is why they developed it).  So while it would be
good to have an ODBC driver in there, the SDBC driver will also continue
to be developed.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Mark Dilger
Tom Lane wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> 
>>Tom Lane <[EMAIL PROTECTED]> writes:
>>
>>>And how much code would those take?  The bottom line here is that we
>>>don't have a pile of complaints about the performance of text_position,
>>>so it's difficult to justify making it much more complicated than it
>>>is now.
> 
> 
>>It seems somewhat contrary to the Postgres design philosophy to assume that
>>all strings are small.
> 
> 
> That is a straw-man argument.  If we try to optimize every single
> function in the system to the Nth degree, we'll end up with a system
> that is unmaintainable (and likely unusably buggy as well).  We've got
> to set limits on the amount of complexity we're willing to accept in
> the core code.
> 
> Note that I have not said "you can't put Boyer-Moore into core".
> What I've said is that the case to justify doing that hasn't been made.
> And handwaving about "design philosophy" isn't the kind of case I'm
> looking for --- common applications in which it makes a real performance
> difference are what I'm looking for.
> 
> At this point we haven't even been shown any evidence that text_position
> itself is what to optimize if you need to do searches in large text
> strings.  It seems entirely likely to me that the TOAST mechanisms would
> be the bottleneck, instead.  And one should also consider other approaches
> entirely, like indexes (tsearch2 anyone?).

In case anyone is following this thread specifically for the biological sequence
data aspect of it, I should mention that I wrote a GiST index for the dna and
protein sequence datatypes.  The performance of the index was inconsistent.  For
certain data, I could get about two orders of magnitude speed increase on
selects, where the select was based on a limited regular expression approximate
match against the data.  But if you change the regular expression (or to a
degree, if you change the data) the performance can drop off to roughly tied
with a sequential scan.  And of course, inserts are far more expensive because
the index has to be kept up to date.

If anyone wants specifics, send me an email and I'll put something together.

mark

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-05-19 kell 11:20, kirjutas Jim C. Nasby:
> On Thu, May 18, 2006 at 06:49:38PM -0700, Mark Dilger wrote:
> > > I would think that the worst-case times would be fairly improbable.
> > > I'm disinclined to push something as complicated as Boyer-Moore matching
> > > into this function without considerable evidence that it's a performance
> > > bottleneck for real applications.
> > 
> > A common approach in biological data applications is to store nucleic and 
> > amino
> > acid sequences as text in a relational database.  The smaller alphabet 
> > sizes and
> > the tendency for redundancy in these sequences increases the likelihood of a
> > performance problem.  I have solved this problem by writing my own data 
> > types
> > with their own functions for sequence comparison and alignment, and I used
> > boyer-moore for some of that work.  Whether the same technique should be 
> > used
> > for the text and varchar types was unclear to me, hence the question.
> 
> Perhaps it would be best to add a seperate set of functions that use
> boyer-moore, and reference them in appropriate places in the
> documentation. Unless someone has a better idea on how we can find out
> what people are actually doing in the field...

I guess our regex implementation already uses boyer-moore or similar.
Why not just expose the match position of substring('text' in 'regex')
using some function, called  match_position(int searched_text, int
regex, int matchnum) ?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-05-19 kell 14:39, kirjutas Jim C. Nasby:
> On Fri, May 19, 2006 at 09:57:29PM +0300, Hannu Krosing wrote:
> > ??hel kenal p??eval, R, 2006-05-19 kell 11:24, kirjutas Jim C. Nasby:
> > 
> > > Something that's always bugged me is how horribly variables are handled
> > > in plpgsql, namely that if you have a variable and a field with the same
> > > name it can be extremely difficult to keep them seperated. Perhaps := vs
> > > = might be a way to keep it clear as to which is which...
> > 
> > I can't see how := helps here. Once you have fields, you are either in
> > SQL-land and always use = or have the fields selected into a type or
> > recors and can use record.field syntax.
>  
> The problem happens down in the SQL layer. Actually, I guess := wouldn't
> help anything...
> 
> > > Though, a better way would probably just be to provide a built-in
> > > construct for referencing plpgsql variables, such as $.
> > 
> > Where is it exactly a problem which can't be solved simply by naming
> > your variables differently from fields?
> 
> That's how I solve it, but a lot of newbies get bit by this. 

A newbie will be bit by several things anyway. For example you could
write code in C (and several other languages) with all your loop
variables named "i", and those in inner scopes overshadowing those in
outer.

> What makes
> it really bad is that they typically get bit after they've already
> written a bunch of code that doesn't prefix variable names with
> something, so then they have to switch coding-conventions after they
> already have a bunch of code written.

A less disruptive change would be to have some predefined "record" where
all local variables belong to, perhaps called 'local' or '_local_' :) so
one could access both input parameter inp_orderdate and declared
variable var_orderdate as local.inp_orderdate and local.var_orderdate
respectively ? The old use (without local.) should still work.

> I think it would be better to at least strongly recommend always
> prefixing variables and parameters with something. 

At least we should use such convention in our sample code in docs.

> I'd argue that it'd
> be even better to put us on the road of eventually mandating plpgsql
> variables be prefixed with something (like $), but I'm not holding my
> breath on that one...

I dont believe in mandating non-backward-compatible changes, but prefix
$ may be one way to disambiguate vars and fieldnames.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Chris Browne
[EMAIL PROTECTED] ("Mark Woodward") writes:
>> Jim C. Nasby wrote:
>>> Maybe a compatability layer isn't worth doing, but I certainly
>>> think it's very much worthwhile for the community to do everything
>>> possible to encourage migration from MySQL. We should be able to
>>> lay claim to most advanced and most popular OSS database.
>>>
>>
>> We'll do that by concentrating on spiffy features, not
>> compatibility layers. I want people to use PostgreSQL because it's
>> the best, not because it's just like something else.
>>
>
> While I do agree with the ideal, the reality may not be good
> enough. Even I, a PostgreSQL user for a decade, have to use MySQL
> right now because that is what the client uses.
>
> Again, there is so much code for MySQL, a MySQL emulation layer, MEL
> for short, could allow plug and play compatibility for open source,
> and closed source, applications that otherwise would force a
> PostgreSQL user to hold his or her nose and use MySQL.

But this is essentially what killed off OS/2 in the marketplace.

IBM created a "good enough" emulation layer that it ran [early]
Windows(tm) applications sufficiently well that nobody bothered
porting applications to *properly* work with OS/2.

Microsoft then played off that with exceeding success; they made sure
that future versions of Windows(tm) were sufficiently different that
OS/2 was left orphaned.

We *are* in a sufficiently comparable state here; MySQL AB is *NOT*
our friend; they want to successfully 'take over the world,' at least
as far as they can do so with their product line...
-- 
"cbbrowne","@","cbbrowne.com"
http://cbbrowne.com/info/unix.html
CBS News report on Fort Worth tornado damage:
"Eight major downtown buildings were severely damaged and 1,000 homes
were damaged, with 95 uninhabitable.  Gov. George W. Bush declared
Tarrant County a disaster area.  Federal Emergency Management Agency
workers are expected to arrive sometime next week after required
paperwork is completed."

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Jim C. Nasby
On Fri, May 19, 2006 at 03:39:23PM -0400, Mark Woodward wrote:
> >
> >> Actually, I think it's a lot more accurate to compare PostgreSQL and
> >> MySQL as FreeBSD vs Linux from about 5 years ago. Back then FreeBSD was
> >> clearly superior from a technology standpoint, and clearly playing
> >> second-fiddle when it came to users. And now, Linux is actually
> >> technically superior in most ways thanks to all the mindshare that's
> >> been poured into it.
> >
> > And with that, I am going to sit in a lawn chair and watch the bonfire.
> 
> Even I know that is NOT a discussion we want to start.

Yeah, wasn't trying to start an OS flamewar; my point is that it's now
pretty hard to find anything FreeBSD related/specific, and that the
sheer popularity of Linux has given it a huge boost in terms of
development.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] String Similarity

2006-05-19 Thread Andrew Dunstan

Mark Woodward wrote:


(3) Is there also a desire for a Levenshtein distence function for text
and varchars? I experimented with it, and was forced to write the function
in item #1.

  



fuzzystrmatch in contrib already has a Levenshtein function.

cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [pgsql-advocacy] OO PostgreSQL Driver

2006-05-19 Thread Joshua D. Drake

Robert Treat wrote:

On Friday 19 May 2006 14:22, Joshua D. Drake wrote:

Other projects need even more intensive coding help.  OpenOffice, for
example, doesn't offer the Postgres driver by default because it's still
too buggy. That would be solvable with money, but $1000 to $2000, not
$50.

Does it really need one since it supports JDBC and ODBC?



It's not about what OO needs, it's about what PG needs.  Consider this; if 
database M works out of the box...  but database P requires you to go find 
some third party software and download it and install it in order to work... 
you tell me which one is going to have the advantage in new user adoption?


Well then, why don't we convince the OO people to bundle either ODBC or 
ODBCng with OO.


Joshua D. Drake






--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Jim C. Nasby
On Fri, May 19, 2006 at 09:29:44PM +0200, Martijn van Oosterhout wrote:
> On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote:
> > > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> > > > If the tape routines were actually storing visibility information, I'd
> > > > expect that to be pretty compressible in this case since all the tuples
> > > > were presumably created in a single transaction by pgbench.
> > 
> > Was he not using pg_bench data ?
> 
> Hmm, so there was only 3 integer fields and one varlena structure which
> was always empty. This prepended with a tuple header with mostly blank
> fields or at least repeated, yes, I can see how we might get a 25-to-1
> compression.
> 
> Maybe we need to change pgbench so that it puts random text in the
> filler field, that would at least put some strain on the compression
> algorithm...

Wow, I thought there was actually something in there...

True random data wouldn't be such a great test either; what would
probably be best is a set of random words, since in real life you're
unlikely to have truely random data.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] String Similarity

2006-05-19 Thread Martijn van Oosterhout
On Fri, May 19, 2006 at 04:00:48PM -0400, Mark Woodward wrote:
> (3) Is there also a desire for a Levenshtein distence function for text
> and varchars? I experimented with it, and was forced to write the function
> in item #1.

Postgres already has a Levenshtein distence function, see fuzzystrmatch
in contrib. Whatever you come up with might fit in well there...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


[HACKERS] String Similarity

2006-05-19 Thread Mark Woodward
I have a side project that needs to "intelligently" know if two strings
are contextually similar. Think about how CDDB information is collected
and sorted. It isn't perfect, but there should be enough information to be
usable.

Think about this:

"pink floyd - dark side of the moon - money"
"dark side of the moon - pink floyd - money"
"money - dark side of the moon - pink floyd"
etc.

To a human, these strings are almost identical. Similarly:

"dark floyd of money moon pink side the"

Is a puzzle to be solved by 13 year old children before the movie starts.

My post has three questions:

(1) Does anyone know of an efficient and numerically quantified method of
detecting these sorts of things? I currently have a fairly inefficient and
numerically bogus solution that may be the only non-impossible solution
for the problem.

(2) Does any one see a need for this feature in PostgreSQL? If so, what
kind of interface would be best accepted as a patch? I am currently
returning a match liklihood between 0 and 100;

(3) Is there also a desire for a Levenshtein distence function for text
and varchars? I experimented with it, and was forced to write the function
in item #1.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax

2006-05-19 Thread Jim C. Nasby
On Fri, May 19, 2006 at 09:57:29PM +0300, Hannu Krosing wrote:
> ??hel kenal p??eval, R, 2006-05-19 kell 11:24, kirjutas Jim C. Nasby:
> 
> > Something that's always bugged me is how horribly variables are handled
> > in plpgsql, namely that if you have a variable and a field with the same
> > name it can be extremely difficult to keep them seperated. Perhaps := vs
> > = might be a way to keep it clear as to which is which...
> 
> I can't see how := helps here. Once you have fields, you are either in
> SQL-land and always use = or have the fields selected into a type or
> recors and can use record.field syntax.
 
The problem happens down in the SQL layer. Actually, I guess := wouldn't
help anything...

> > Though, a better way would probably just be to provide a built-in
> > construct for referencing plpgsql variables, such as $.
> 
> Where is it exactly a problem which can't be solved simply by naming
> your variables differently from fields?

That's how I solve it, but a lot of newbies get bit by this. What makes
it really bad is that they typically get bit after they've already
written a bunch of code that doesn't prefix variable names with
something, so then they have to switch coding-conventions after they
already have a bunch of code written.

I think it would be better to at least strongly recommend always
prefixing variables and parameters with something. I'd argue that it'd
be even better to put us on the road of eventually mandating plpgsql
variables be prefixed with something (like $), but I'm not holding my
breath on that one...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] New feature proposal

2006-05-19 Thread Marc Munro
On Fri, 2006-05-19 at 14:44 -0400, Tom Lane wrote:
> Marc Munro <[EMAIL PROTECTED]> writes:
> > My proposal makes it possible for properly configured add-ins to have a
> > guaranteed amount of shared memory available.
> 
> This could all be solved in a cleaner, more bulletproof way if you
> simply require such add-ins to be preloaded into the postmaster process
> using the existing preload_libraries hook.  Then, such an add-in would
> allocate its own shmem segment independent of the main Postgres one.
> This totally eliminates worries about one chunk of code eating the other
> one's memory, which otherwise we'd have to have additional mechanism to
> deal with.

This is an interesting idea that I had not previously considered.  I
will give it some thought.

I'm not convinced that we actually do need to prevent add-ins from
eating each other's memory.  Just as existing add-ins that use palloc
are expected to use the appropriate memory context and behave
themselves, I would expect the same to be true for add-ins that require
shared memory.

> In a Unix environment, such a thing would Just Work because pointers to
> the new segment would be inherited through fork().  In the Windows port
> you'd need to do more pushups --- perhaps allocate a small amount of
> memory in the main Postgres shmem segment containing the ID of the other
> shmem segment, which a backend would use to reattach.
> 

For me, adding windows-specific code to Veil is highly unappealling - I
have no easy way to build or test for windows, and no experience of
doing so, so the more I can leverage the existing functionality, the
better.  

I had hoped to simply piggyback on Postgres' existing memory management
with a very small change to effectively add an add-in shared memory
context.

On the other hand, if this is the way we have to go, then perhaps it
could be added to Postgres as part of its api, rather than having Veil,
and perhaps other add-ins, implement it for themselves.

Thoughts?

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Jim C. Nasby
On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote:
> ??hel kenal p??eval, R, 2006-05-19 kell 14:53, kirjutas Tom Lane:
> > "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > > On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
> > >> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
> > >> unbeleiveable. What's in the table? It would seem to imply that our
> > >> tuple format is far more compressable than we expected.
> > 
> > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> > > If the tape routines were actually storing visibility information, I'd
> > > expect that to be pretty compressible in this case since all the tuples
> > > were presumably created in a single transaction by pgbench.
> > 
> > It's worse than that: IIRC what passes through a heaptuple sort are
> > tuples manufactured by heap_form_tuple, which will have consistently
> > zeroed header fields.  However, the above isn't very helpful since the
> > rest of us have no idea what that "accounts" table contains.  How wide
> > is the tuple data, and what's in it?
> 
> Was he not using pg_bench data ?

I am. For reference:

bench=# \d accounts 
   Table "public.accounts"
  Column  | Type  | Modifiers 
--+---+---
 aid  | integer   | not null
 bid  | integer   | 
 abalance | integer   | 
 filler   | character(84) | 


> > (This suggests that we might try harder to strip unnecessary header info
> > from tuples being written to tape inside tuplesort.c.  I think most of
> > the required fields could be reconstructed given the TupleDesc.)
> 
> I guess that tapefiles compress better than averahe table because they
> are sorted, and thus at least a little more repetitive than the rest. 
> If there are varlen types, then they usually also have abundance of
> small 4-byte integers, which should also compress at least better than
> 4/1, maybe a lot better.

If someone wants to provide a patch that strips out the headers I can test that
as well.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Martijn van Oosterhout
On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote:
> > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> > > If the tape routines were actually storing visibility information, I'd
> > > expect that to be pretty compressible in this case since all the tuples
> > > were presumably created in a single transaction by pgbench.
> 
> Was he not using pg_bench data ?

Hmm, so there was only 3 integer fields and one varlena structure which
was always empty. This prepended with a tuple header with mostly blank
fields or at least repeated, yes, I can see how we might get a 25-to-1
compression.

Maybe we need to change pgbench so that it puts random text in the
filler field, that would at least put some strain on the compression
algorithm...

> I guess that tapefiles compress better than averahe table because they
> are sorted, and thus at least a little more repetitive than the rest. 
> If there are varlen types, then they usually also have abundance of
> small 4-byte integers, which should also compress at least better than
> 4/1, maybe a lot better.

Hmm, that makes sense. That also explains the 37-to-1 compression I was
seeing on indexes :).

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Mark Woodward
>
>> Actually, I think it's a lot more accurate to compare PostgreSQL and
>> MySQL as FreeBSD vs Linux from about 5 years ago. Back then FreeBSD was
>> clearly superior from a technology standpoint, and clearly playing
>> second-fiddle when it came to users. And now, Linux is actually
>> technically superior in most ways thanks to all the mindshare that's
>> been poured into it.
>
> And with that, I am going to sit in a lawn chair and watch the bonfire.
>

Even I know that is NOT a discussion we want to start.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [pgsql-advocacy] OO PostgreSQL Driver

2006-05-19 Thread Robert Treat
On Friday 19 May 2006 14:22, Joshua D. Drake wrote:
> > Other projects need even more intensive coding help.  OpenOffice, for
> > example, doesn't offer the Postgres driver by default because it's still
> > too buggy. That would be solvable with money, but $1000 to $2000, not
> > $50.
>
> Does it really need one since it supports JDBC and ODBC?
>

It's not about what OO needs, it's about what PG needs.  Consider this; if 
database M works out of the box...  but database P requires you to go find 
some third party software and download it and install it in order to work... 
you tell me which one is going to have the advantage in new user adoption?

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-05-19 kell 14:53, kirjutas Tom Lane:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
> >> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
> >> unbeleiveable. What's in the table? It would seem to imply that our
> >> tuple format is far more compressable than we expected.
> 
> > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> > If the tape routines were actually storing visibility information, I'd
> > expect that to be pretty compressible in this case since all the tuples
> > were presumably created in a single transaction by pgbench.
> 
> It's worse than that: IIRC what passes through a heaptuple sort are
> tuples manufactured by heap_form_tuple, which will have consistently
> zeroed header fields.  However, the above isn't very helpful since the
> rest of us have no idea what that "accounts" table contains.  How wide
> is the tuple data, and what's in it?

Was he not using pg_bench data ?

> (This suggests that we might try harder to strip unnecessary header info
> from tuples being written to tape inside tuplesort.c.  I think most of
> the required fields could be reconstructed given the TupleDesc.)

I guess that tapefiles compress better than averahe table because they
are sorted, and thus at least a little more repetitive than the rest. 
If there are varlen types, then they usually also have abundance of
small 4-byte integers, which should also compress at least better than
4/1, maybe a lot better.


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax

2006-05-19 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-05-19 kell 11:24, kirjutas Jim C. Nasby:

> Something that's always bugged me is how horribly variables are handled
> in plpgsql, namely that if you have a variable and a field with the same
> name it can be extremely difficult to keep them seperated. Perhaps := vs
> = might be a way to keep it clear as to which is which...

I can't see how := helps here. Once you have fields, you are either in
SQL-land and always use = or have the fields selected into a type or
recors and can use record.field syntax.

> Though, a better way would probably just be to provide a built-in
> construct for referencing plpgsql variables, such as $.

Where is it exactly a problem which can't be solved simply by naming
your variables differently from fields?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

NOTICE: This communication contains privileged or other confidential
information. If you have received it in error, please advise the sender
by reply email and immediately delete the message and any attachments
without copying or disclosing the contents.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Jim C. Nasby
Moving to -advocacy, bcc to -hackers.

On Fri, May 19, 2006 at 08:11:42AM -0700, Joshua D. Drake wrote:
>  When MySQL is at that
> >point, which database do you think executives will be choosing? The one
> >with a very large userbase and lots of marketing and PR that they've
> >heard plenty about,
> 
> All due respect, Jim -- but don't you work for a publicly traded 
> database company that happens to have its own version of PostgreSQL?

Actually, we haven't had a distribution of PostgreSQL since 8.0.3, and
even then it was only a distribution; the bits were all community.

> This is really a discussion for your marketing (and mine frankly) then 
> the PostgreSQL mailing lists :)

Yes and no... should MySQL eventually become popular enough that there's
little use of PostgreSQL that hurts the community just as much as it
hurts our companies. In fact, I'd say it's already hurting the community
more than our companies; look at how many people lament about running
software XYZ because it only supports MySQL. Or about trying to find
PostgreSQL hosting providers.

But yes, the group of PostgreSQL companies should also be working to
raise awareness of PostgreSQL as a very viable OSS database.
Unfortunately, a lot of the commercial interest is in the higher-end
market. And to a large extent, this really needs to be a grass-roots
effort. After all, you don't win OSS mindshare by taking out ads or
anything like that. So I think this really needs to be a joint venture
between companies and the community.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
>> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
>> unbeleiveable. What's in the table? It would seem to imply that our
>> tuple format is far more compressable than we expected.

> It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> If the tape routines were actually storing visibility information, I'd
> expect that to be pretty compressible in this case since all the tuples
> were presumably created in a single transaction by pgbench.

It's worse than that: IIRC what passes through a heaptuple sort are
tuples manufactured by heap_form_tuple, which will have consistently
zeroed header fields.  However, the above isn't very helpful since the
rest of us have no idea what that "accounts" table contains.  How wide
is the tuple data, and what's in it?

(This suggests that we might try harder to strip unnecessary header info
from tuples being written to tape inside tuplesort.c.  I think most of
the required fields could be reconstructed given the TupleDesc.)

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Joshua D. Drake



Actually, I think it's a lot more accurate to compare PostgreSQL and
MySQL as FreeBSD vs Linux from about 5 years ago. Back then FreeBSD was
clearly superior from a technology standpoint, and clearly playing
second-fiddle when it came to users. And now, Linux is actually
technically superior in most ways thanks to all the mindshare that's
been poured into it.


And with that, I am going to sit in a lawn chair and watch the bonfire.

Joshua D. Drake

--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] New feature proposal

2006-05-19 Thread Tom Lane
Marc Munro <[EMAIL PROTECTED]> writes:
> My proposal makes it possible for properly configured add-ins to have a
> guaranteed amount of shared memory available.

This could all be solved in a cleaner, more bulletproof way if you
simply require such add-ins to be preloaded into the postmaster process
using the existing preload_libraries hook.  Then, such an add-in would
allocate its own shmem segment independent of the main Postgres one.
This totally eliminates worries about one chunk of code eating the other
one's memory, which otherwise we'd have to have additional mechanism to
deal with.

In a Unix environment, such a thing would Just Work because pointers to
the new segment would be inherited through fork().  In the Windows port
you'd need to do more pushups --- perhaps allocate a small amount of
memory in the main Postgres shmem segment containing the ID of the other
shmem segment, which a backend would use to reattach.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Jim C. Nasby
On Fri, May 19, 2006 at 11:29:23AM -0400, Mark Woodward wrote:
> I kind of agree with this statement, but while I was playing devils's
> advocate and just grousing a bit about having to use MySQL, there is a
> sort of reality of "openomics" where mind-share is everything.
> 
> The more mind-share you have, the more opportunities you have and the more
> resources become available. Not always, of course, look at OpenSSH, but
> for the most part.
> 
> As MySQL adds features, not matter how poorly implemented, and maintain a
> migration path, we will never reach their users.
> 
> PostgreSQL is better, true, but it is not ideal in many ways. It can be
> best said that the difference between PostgreSQL and MySQL is similar to
> the difference between Linux/BSD and Windows.

Actually, I think it's a lot more accurate to compare PostgreSQL and
MySQL as FreeBSD vs Linux from about 5 years ago. Back then FreeBSD was
clearly superior from a technology standpoint, and clearly playing
second-fiddle when it came to users. And now, Linux is actually
technically superior in most ways thanks to all the mindshare that's
been poured into it.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Jim C. Nasby
On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
> On Thu, May 18, 2006 at 10:02:44PM -0500, Jim C. Nasby wrote:
> > http://jim.nasby.net/misc/compress_sort.txt is preliminary results.
> > I've run into a slight problem in that even at a compression level of
> > -3, zlib is cutting the on-disk size of sorts by 25x. So my pgbench sort
> > test with scale=150 that was producing a 2G on-disk sort is now
> > producing a 80M sort, which obviously fits in memory. And cuts sort
> > times by more than half.
> 
> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
> unbeleiveable. What's in the table? It would seem to imply that our
> tuple format is far more compressable than we expected.

It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
If the tape routines were actually storing visibility information, I'd
expect that to be pretty compressible in this case since all the tuples
were presumably created in a single transaction by pgbench.

If needs be, I could try the patch against http://stats.distributed.net,
assuming that it would apply to REL_8_1.

> Do you have any stats on CPU usage? Memory usage?

I've only been taking a look at vmstat from time-to-time, and I have yet
to see the machine get CPU-bound. Haven't really paid much attention to
memory. Is there anything in partucular you're looking for? I can log
vmstat for the next set of runs (with a scaling factor of 1). I plan
on doing those runs tonight...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] New feature proposal

2006-05-19 Thread Marc Munro
On Fri, 2006-05-19 at 10:05 -0700, Josh Berkus wrote:
> Marc,
> 
> > The add-in would not "know" how much had been allocated to it, but could
> > be told through it's own config file.  I envisage something like:
> >
> > in postgresql.conf
> >
> > # add_in_shmem = 0# Amount of shared mem to set aside for add-ins
> >   # in KBytes
> > add_in_shem = 64
> >
> >
> > in veil.conf
> >
> > veil_shmem = 32   # Amount of shared memory we can use from
> >   # the postgres add-ins shared memory pool
> >
> > I think this is better than add-ins simply stealing from, and contending
> > for, postgres shared memory which is the only real alternative right
> > now.
> 
> H ... what would happen if I did:
> 
> add_in_shmem = 64
> veil_shmem = 128
> 
> or even:
> 
> add_in_shmem = 128
> veil_shmem = 64
> plperl_shmem = 64
> pljava_shmem = 64
> 

If that happens, one of the add-ins will be sadly disappointed when it
tries to use its allocation.  The same as would happen now, if Veil
attempted to allocate too large a chunk of shared memory.

My proposal makes it possible for properly configured add-ins to have a
guaranteed amount of shared memory available.  It allows add-ins to be
well-behaved in their use of shared memory, and it prevents them from
being able to exhaust postgres' own shared memory.

It doesn't prevent add-ins from over-allocating from the add-in memory
context, nor do I think it can or should do this.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [pgsql-advocacy] Toward A Positive Marketing Approach.

2006-05-19 Thread Tom Lane
Josh Berkus  writes:
> Other projects need even more intensive coding help.  OpenOffice, for 
> example, 
> doesn't offer the Postgres driver by default because it's still too buggy.  

That seems like something that it'd be worth our while to help fix.
Does anyone have a handle on what the problems are?  Is it something
that could reasonably be fixed by a Postgres person, or is the real
problem that it'd take a whole lot of both OO-fu and Postgres-fu?
If so, can we find someone with the former nature to collaborate with?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] OO PostgreSQL Driver

2006-05-19 Thread Joshua D. Drake


Other projects need even more intensive coding help.  OpenOffice, for example, 
doesn't offer the Postgres driver by default because it's still too buggy.  
That would be solvable with money, but $1000 to $2000, not $50.


Does it really need one since it supports JDBC and ODBC?

J



--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> And how much code would those take?  The bottom line here is that we
>> don't have a pile of complaints about the performance of text_position,
>> so it's difficult to justify making it much more complicated than it
>> is now.

> It seems somewhat contrary to the Postgres design philosophy to assume that
> all strings are small.

That is a straw-man argument.  If we try to optimize every single
function in the system to the Nth degree, we'll end up with a system
that is unmaintainable (and likely unusably buggy as well).  We've got
to set limits on the amount of complexity we're willing to accept in
the core code.

Note that I have not said "you can't put Boyer-Moore into core".
What I've said is that the case to justify doing that hasn't been made.
And handwaving about "design philosophy" isn't the kind of case I'm
looking for --- common applications in which it makes a real performance
difference are what I'm looking for.

At this point we haven't even been shown any evidence that text_position
itself is what to optimize if you need to do searches in large text
strings.  It seems entirely likely to me that the TOAST mechanisms would
be the bottleneck, instead.  And one should also consider other approaches
entirely, like indexes (tsearch2 anyone?).

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> You've obviously missed the point of my concern, which is code bloat.
> 
> > So why not just replace our code with better algorithms?  We could use
> > Shift-Or or Shift-And which AFAIK are even better than Boyer-Moore.
> 
> And how much code would those take?  The bottom line here is that we
> don't have a pile of complaints about the performance of text_position,
> so it's difficult to justify making it much more complicated than it
> is now.

Even Boyer-Moore, while conceptually tricky isn't actually all that much code.

It seems somewhat contrary to the Postgres design philosophy to assume that
all strings are small.

Other databases have two different string data types, one that has a small
length limit (often only 255 bytes or so) and another that has all kinds of
awkward restrictions on how it can be used. Postgres allows text to contain
gigabytes of data and lets you use all the normal string functions on it. 

It seems like having those string functions assuming the strings are small
compromises that design choice.

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [pgsql-advocacy] Toward A Positive Marketing Approach.

2006-05-19 Thread Josh Berkus
Michael,

Howdy, glad to see you came back.

> 1.  We should treat all marketing efforts by hackers/programmers as
> social bugs.  Get some marketing pros (debuggers) in on this, or the
> popularity of postgresql will continue to pale in the real world.

Not really in line with PostgreSQL's "personality".   This could work for 
OpenOffice, but not here.  PG is a very engineering-central project and there 
aren't many people who want to change that.

Your other comments have been mostly answered, but:

> 3. Reward existing FOSS projects that make sensible provision to
> accomodate postgresql in preference to other more "commercial" db's.
> Free links, mention in newsletter, listing on websites, whatever it
> takes to start pulling other open source communities behind postgresql.
> A good example is bitweaver.org, a great integration project, very
> professional, helpful to small businesses, but needs some promotional help.
>
> 4. Stop being too cheap.  Money Talks!  Offer to PAY premiums to major
> OSS aps who don't do pg, or don't do it well enough.  Like Compierre,
> like Drupal.  

Actually, what projects who don't have a bias against PostgreSQL mostly need 
is developer time to help them with code.  Drupal already supports Postgres; 
they need DBAs to help them be faster/better on Postgres.   They are in the 
same boat with lots of other projects, so much so that there is more demand 
than there are PG volunteers.

If you have Postgres DBA experience, I'll be happy to hook you up with 
someone.

Other projects need even more intensive coding help.  OpenOffice, for example, 
doesn't offer the Postgres driver by default because it's still too buggy.  
That would be solvable with money, but $1000 to $2000, not $50.

I do think that we could use a list of what other mature OSS projects support 
PostgreSQL reasonably well already.  This is pretty much a data collection 
effort; are you volunteering for it?   We could use it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [pgsql-hackers] Daily digest v1.5943 (21 messages)

2006-05-19 Thread Marc Munro
On Fri, 2006-05-19 at 13:41 -0300, [EMAIL PROTECTED]
wrote:

> Marc Munro wrote:
> > Veil http://pgfoundry.org/projects/veil is currently not a very good
> > Postgres citizen.  It steals what little shared memory it needs from
> > postgres' shared memory using ShmemAlloc().
> > 
> > For Postgres 8.2 I would like Veil to be a better citizen and use
> only
> > what shared memory has been reserved for postgres add-ins.
> 
> Why should this be individually restricted? AFAICS Veil's
> functionality 
> would be essential to access row level ACL controlled tables, so if
> it 
> fails for low mem conditions it's much like a backend failure.

The problem is that postgres currently has no way of knowing how much
shared memory Veil is likely to require, and so will not allocate space
for it.  Consequently if Veil needs a lot of shared memory, it is
unlikely to be able to get it.

My intention is to allow add-ins to use only shared memory that has been
allocated for their own use, to ensure that enough has been allocated,
to isolate postgres shared memory from abuse by add-ins, and to avoid
the need for add-ins to manage their own shared memory.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> You've obviously missed the point of my concern, which is code bloat.

> So why not just replace our code with better algorithms?  We could use
> Shift-Or or Shift-And which AFAIK are even better than Boyer-Moore.

And how much code would those take?  The bottom line here is that we
don't have a pile of complaints about the performance of text_position,
so it's difficult to justify making it much more complicated than it
is now.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] New feature proposal

2006-05-19 Thread Josh Berkus
Marc,

> The add-in would not "know" how much had been allocated to it, but could
> be told through it's own config file.  I envisage something like:
>
> in postgresql.conf
>
> # add_in_shmem = 0    # Amount of shared mem to set aside for add-ins
>                       # in KBytes
> add_in_shem = 64
>
>
> in veil.conf
>
> veil_shmem = 32       # Amount of shared memory we can use from
>       # the postgres add-ins shared memory pool
>
> I think this is better than add-ins simply stealing from, and contending
> for, postgres shared memory which is the only real alternative right
> now.

H ... what would happen if I did:

add_in_shmem = 64
veil_shmem = 128

or even:

add_in_shmem = 128
veil_shmem = 64
plperl_shmem = 64
pljava_shmem = 64

... seems like we'll need to check for overallocation, no?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] New feature proposal

2006-05-19 Thread Marc Munro
On Fri, 2006-05-19 at 13:41 -0300, [EMAIL PROTECTED]
wrote:
> On Thu, 2006-05-18 at 17:39 -0700, Marc Munro wrote:
> 
> > For Postgres 8.2 I would like Veil to be a better citizen and use
> only
> > what shared memory has been reserved for postgres add-ins.
> 
> How would Postgres ask the add-in how much memory it needs? How would
> the add-in know how much has been reserved for it? How would an add-in
> know whether it was the only add-in and whether it could take all of
> the
> allocation?

Postgres would not ask any add-ins how much they need, it would simply
allocate the extra amount defined in a GUC and not make that available
through the normal shared memory allocation mechanism.

The add-in would not "know" how much had been allocated to it, but could
be told through it's own config file.  I envisage something like:

in postgresql.conf

# add_in_shmem = 0# Amount of shared mem to set aside for add-ins
  # in KBytes
add_in_shem = 64


in veil.conf

veil_shmem = 32   # Amount of shared memory we can use from
  # the postgres add-ins shared memory pool

I think this is better than add-ins simply stealing from, and contending
for, postgres shared memory which is the only real alternative right
now.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Alvaro Herrera
Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > Perhaps it would be best to add a seperate set of functions that use
> > boyer-moore, and reference them in appropriate places in the
> > documentation. Unless someone has a better idea on how we can find out
> > what people are actually doing in the field...
> 
> You've obviously missed the point of my concern, which is code bloat.
> A parallel set of functions incorporating B-M would make things worse
> not better from that standpoint.  (Unless you are proposing that someone
> do it as a separate pgfoundry project; which'd be fine with me.  I'm
> just concerned about how much we buy into as core features.)

So why not just replace our code with better algorithms?  We could use
Shift-Or or Shift-And which AFAIK are even better than Boyer-Moore.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> Perhaps it would be best to add a seperate set of functions that use
> boyer-moore, and reference them in appropriate places in the
> documentation. Unless someone has a better idea on how we can find out
> what people are actually doing in the field...

You've obviously missed the point of my concern, which is code bloat.
A parallel set of functions incorporating B-M would make things worse
not better from that standpoint.  (Unless you are proposing that someone
do it as a separate pgfoundry project; which'd be fine with me.  I'm
just concerned about how much we buy into as core features.)

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Simon Riggs
On Fri, 2006-05-19 at 09:36 -0700, Jeff Frost wrote:
> On Fri, 19 May 2006, Tom Lane wrote:
> 
> > What I'd suggest is resuming the test after making sure you've killed
> > off any old archivers, and seeing if you can make any progress on
> > reproducing the original problem.  We definitely need a
> > multiple-archiver interlock, but I think that must be unrelated to your
> > real problem.
> 
> Ok, so I've got the old archivers gone (and btw, after a restart I ended up 
> with 3 of them - so I stopped postmaster, and killed them all individually 
> and 
> started postmaster again).  

Thats good.

> Now I can run my same pg_bench, or do you guys 
> have any other suggestions on attempting to reproduce the problem?

No. We're back on track to try to reproduce the original error.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost

On Fri, 19 May 2006, Tom Lane wrote:


What I'd suggest is resuming the test after making sure you've killed
off any old archivers, and seeing if you can make any progress on
reproducing the original problem.  We definitely need a
multiple-archiver interlock, but I think that must be unrelated to your
real problem.


Ok, so I've got the old archivers gone (and btw, after a restart I ended up 
with 3 of them - so I stopped postmaster, and killed them all individually and 
started postmaster again).  Now I can run my same pg_bench, or do you guys 
have any other suggestions on attempting to reproduce the problem?


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost

On Fri, 19 May 2006, Tom Lane wrote:


Well, the fact that there's only one archiver *now* doesn't mean there
wasn't more than one when the problem happened.  The orphaned archiver
would eventually quit.

Do you have logs that would let you check when the production postmaster
was restarted?


I looked through /var/log/messages* and there wasn't a restart prior to the 
problem in the logs.  They go back to April 16.  The postmaster was restarted 
on May 15th (this Monday), but that was after the reported problem.



--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Joshua D. Drake

Alvaro Herrera wrote:

Joshua D. Drake wrote:

Martijn van Oosterhout wrote:

On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote:

The reality is that MySQL is widely supported by some very, shall we say,
"interesting" open source projects and using these products with
PostgreSQL would be a plus.

The biggest headache I find with using postgres is that various GPL
licenced programs have trouble directly shipping postgresql support
because of our use of OpenSSL. Each and every one of those program
needs to add an exception to their licence for distributors to
distribute postgresql support.
Why would that be the case... OpenSSL and PostgreSQL both are BSD 
licensed... Am I missing something?


Advertising clause.  PostgreSQL doesn't have it, OpenSSL does.


Is that the same clause that caused the XFree86/X.Org fork?

J


--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Simon Riggs
On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > OK, I'm on it.
> 
> What solution have you got in mind?  I was thinking about an fcntl lock
> to ensure only one archiver is active in a given data directory.  That
> would fix the problem without affecting anything outside the archiver.
> Not sure what's the most portable way to do it though.

I was trying to think of a better way than using an archiver.pid file in
pg_xlog/archive_status...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Rod Taylor
On Fri, 2006-05-19 at 09:11 -0700, Joshua D. Drake wrote:
> Martijn van Oosterhout wrote:
> > On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote:
> >> The reality is that MySQL is widely supported by some very, shall we say,
> >> "interesting" open source projects and using these products with
> >> PostgreSQL would be a plus.
> > 
> > The biggest headache I find with using postgres is that various GPL
> > licenced programs have trouble directly shipping postgresql support
> > because of our use of OpenSSL. Each and every one of those program
> > needs to add an exception to their licence for distributors to
> > distribute postgresql support.
> 
> Why would that be the case... OpenSSL and PostgreSQL both are BSD 
> licensed... Am I missing something?

OpenSSL is not the 3 clause BSD license, it also includes a number of
advertising clauses that the GPL has never liked -- GPL must not be
modified for derivatives but the advertising clauses are in addition to
the GPL, so it must be modified for the combination.

Exceptions exist in the GPL for libraries and tools included in the
operating system and this is enough in most cases. GPL applications on
Windows may have problems.


http://www.openssl.org/support/faq.html#LEGAL2
2. Can I use OpenSSL with GPL software?

On many systems including the major Linux and BSD distributions, yes
(the GPL does not place restrictions on using libraries that are part of
the normal operating system distribution). 

On other systems, the situation is less clear. Some GPL software
copyright holders claim that you infringe on their rights if you use
OpenSSL with their software on operating systems that don't normally
include OpenSSL.

If you develop open source software that uses OpenSSL, you may find it
useful to choose an other license than the GPL, or state explicitly that
"This program is released under the GPL with the additional exemption
that compiling, linking, and/or using OpenSSL is allowed." If you are
using GPL software developed by others, you may want to ask the
copyright holder for permission to use their software with OpenSSL.




OpenSSL License
 * 1. Redistributions of source code must retain the above copyright
 *notice, this list of conditions and the following disclaimer. 
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *notice, this list of conditions and the following disclaimer in
 *the documentation and/or other materials provided with the
 *distribution.
 *
 * 3. All advertising materials mentioning features or use of this
 *software must display the following acknowledgment:
 *"This product includes software developed by the OpenSSL Project
 *for use in the OpenSSL Toolkit. (http://www.openssl.org/)"
 *
 * 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used
to
 *endorse or promote products derived from this software without
 *prior written permission. For written permission, please contact
 *[EMAIL PROTECTED]
 *
 * 5. Products derived from this software may not be called "OpenSSL"
 *nor may "OpenSSL" appear in their names without prior written
 *permission of the OpenSSL Project.
 *
 * 6. Redistributions of any form whatsoever must retain the following
 *acknowledgment:
 *"This product includes software developed by the OpenSSL Project
 *for use in the OpenSSL Toolkit (http://www.openssl.org/)"



> > I'm thinking particularly of FreeRadius but there are others. More than
> > once I thought while waiting for stuff to compile: if I'd chosen mysql
> > I'd be done by now...
> > 
> > Have a nice day,
-- 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Simon Riggs
On Fri, 2006-05-19 at 12:20 -0400, Tom Lane wrote:
> I wrote:
> > Well, the fact that there's only one archiver *now* doesn't mean there
> > wasn't more than one when the problem happened.  The orphaned archiver
> > would eventually quit.
> 
> But, actually, nevermind: we have explained the failures you were seeing
> in the test setup, but a multiple-active-archiver situation still
> doesn't explain the original situation of incoming connections getting
> blocked.

Agreed.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax

2006-05-19 Thread Jim C. Nasby
On Thu, May 18, 2006 at 08:40:04PM -0400, Douglas McNaught wrote:
> Mark Dilger <[EMAIL PROTECTED]> writes:
> 
> > Tom Lane wrote:
> >> No it isn't.  The plpgsql scanner treats := and = as *the same token*.
> >> They can be interchanged freely.  This has nothing to do with the case
> >> of modifying a loop variable in particular.
> >
> > I disagree.  If the scanner treated them the same, then
> >
> >   if i := 1 then ...
> >
> > would work, but it doesn't.  The := is rejected in a conditional.  Try the
> > following code if you don't believe me:
> 
> That's because (AIUI) all expressions to be evaluated are handed off
> to the SQL parser (why re-implement all that logic and have subtle and
> annoying differences?)  plpgsql only handles the statements, loops, etc.
> So it doesn't care about the difference but SQL does...

Something that's always bugged me is how horribly variables are handled
in plpgsql, namely that if you have a variable and a field with the same
name it can be extremely difficult to keep them seperated. Perhaps := vs
= might be a way to keep it clear as to which is which...

Though, a better way would probably just be to provide a built-in
construct for referencing plpgsql variables, such as $.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Andrew Dunstan

Joshua D. Drake wrote:

Martijn van Oosterhout wrote:

On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote:
The reality is that MySQL is widely supported by some very, shall we 
say,

"interesting" open source projects and using these products with
PostgreSQL would be a plus.


The biggest headache I find with using postgres is that various GPL
licenced programs have trouble directly shipping postgresql support
because of our use of OpenSSL. Each and every one of those program
needs to add an exception to their licence for distributors to
distribute postgresql support.


Why would that be the case... OpenSSL and PostgreSQL both are BSD 
licensed... Am I missing something?



http://www.openssl.org/support/faq.html#LEGAL2

Of course, on that reasoning, they would need to provide a similar 
exception for libpq with or without openssl. More and more I love the 
fact that we don't play these games.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] text_position worst case runtime

2006-05-19 Thread Jim C. Nasby
On Thu, May 18, 2006 at 06:49:38PM -0700, Mark Dilger wrote:
> > I would think that the worst-case times would be fairly improbable.
> > I'm disinclined to push something as complicated as Boyer-Moore matching
> > into this function without considerable evidence that it's a performance
> > bottleneck for real applications.
> 
> A common approach in biological data applications is to store nucleic and 
> amino
> acid sequences as text in a relational database.  The smaller alphabet sizes 
> and
> the tendency for redundancy in these sequences increases the likelihood of a
> performance problem.  I have solved this problem by writing my own data types
> with their own functions for sequence comparison and alignment, and I used
> boyer-moore for some of that work.  Whether the same technique should be used
> for the text and varchar types was unclear to me, hence the question.

Perhaps it would be best to add a seperate set of functions that use
boyer-moore, and reference them in appropriate places in the
documentation. Unless someone has a better idea on how we can find out
what people are actually doing in the field...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Tom Lane
I wrote:
> Well, the fact that there's only one archiver *now* doesn't mean there
> wasn't more than one when the problem happened.  The orphaned archiver
> would eventually quit.

But, actually, nevermind: we have explained the failures you were seeing
in the test setup, but a multiple-active-archiver situation still
doesn't explain the original situation of incoming connections getting
blocked.

What I'd suggest is resuming the test after making sure you've killed
off any old archivers, and seeing if you can make any progress on
reproducing the original problem.  We definitely need a
multiple-archiver interlock, but I think that must be unrelated to your
real problem.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes:
> Hurray!  Unfortunately, the postmaster on the original troubled server almost
> never gets restarted, and in fact only has only one archiver process running 
> right now.  Drat!

Well, the fact that there's only one archiver *now* doesn't mean there
wasn't more than one when the problem happened.  The orphaned archiver
would eventually quit.

Do you have logs that would let you check when the production postmaster
was restarted?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Alvaro Herrera
Joshua D. Drake wrote:
> Martijn van Oosterhout wrote:
> >On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote:
> >>The reality is that MySQL is widely supported by some very, shall we say,
> >>"interesting" open source projects and using these products with
> >>PostgreSQL would be a plus.
> >
> >The biggest headache I find with using postgres is that various GPL
> >licenced programs have trouble directly shipping postgresql support
> >because of our use of OpenSSL. Each and every one of those program
> >needs to add an exception to their licence for distributors to
> >distribute postgresql support.
> 
> Why would that be the case... OpenSSL and PostgreSQL both are BSD 
> licensed... Am I missing something?

Advertising clause.  PostgreSQL doesn't have it, OpenSSL does.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Joshua D. Drake

Martijn van Oosterhout wrote:

On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote:

The reality is that MySQL is widely supported by some very, shall we say,
"interesting" open source projects and using these products with
PostgreSQL would be a plus.


The biggest headache I find with using postgres is that various GPL
licenced programs have trouble directly shipping postgresql support
because of our use of OpenSSL. Each and every one of those program
needs to add an exception to their licence for distributors to
distribute postgresql support.


Why would that be the case... OpenSSL and PostgreSQL both are BSD 
licensed... Am I missing something?


Joshua D. Drake



I'm thinking particularly of FreeRadius but there are others. More than
once I thought while waiting for stuff to compile: if I'd chosen mysql
I'd be done by now...

Have a nice day,



--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost

On Fri, 19 May 2006, Tom Lane wrote:


Well, there's our smoking gun.  IIRC, all the failures you showed us are
consistent with race conditions caused by multiple archiver processes
all trying to do the same tasks concurrently.

Do you frequently stop and restart the postmaster?  Because I don't see
how you could get into this state without having done so.

I've just been looking at the code, and the archiver does commit
hara-kiri when it notices its parent postmaster is dead; but it only
checks that in the outer loop.  Given sufficiently long delays in the
archive_command, that could be a long time after the postmaster died;
and in the meantime, successive executions of the archive_command could
be conflicting with those launched by a later archiver incarnation.


Hurray!  Unfortunately, the postmaster on the original troubled server almost 
never gets restarted, and in fact only has only one archiver process running 
right now.  Drat!


I guess I'll have to try and catch it in the act again the next time the NAS 
gets wedged so I can debug a little more (it was caught by one of the windows 
folks last time) and gather some useful data.


Let me know if you want me to test a patch since I've already got this test 
case setup.


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Andreas Pflug

Martijn van Oosterhout wrote:


The biggest headache I find with using postgres is that various GPL
licenced programs have trouble directly shipping postgresql support
because of our use of OpenSSL. Each and every one of those program
needs to add an exception to their licence for distributors to
distribute postgresql support.


They could distribute a non-ssl-enabled version, *if* they really need 
to include libpq in the package, or advise to to replace it with the 
common version if ssl is required. I bet >99 % of pgsql connections are 
not encrypted anyway.


Regards,
Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] New feature proposal

2006-05-19 Thread Andreas Pflug

Marc Munro wrote:

Veil http://pgfoundry.org/projects/veil is currently not a very good
Postgres citizen.  It steals what little shared memory it needs from
postgres' shared memory using ShmemAlloc().

For Postgres 8.2 I would like Veil to be a better citizen and use only
what shared memory has been reserved for postgres add-ins.


Why should this be individually restricted? AFAICS Veil's functionality 
would be essential to access row level ACL controlled tables, so if it 
fails for low mem conditions it's much like a backend failure.


Regards,
Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Weird ..... (a=1 or a=2) <> (a=2 or a=1)

2006-05-19 Thread Rafael Martinez
On Fri, 2006-05-19 at 11:34 -0400, Tom Lane wrote:
[...]
> 
> This only affects the 7.4 and 8.0 branches, because earlier and later
> versions of Postgres don't use this technique for detecting duplicates.
> But it's surprising we didn't find it before.
> 
> Patches will appear in next week's releases.  Thanks again!
> 

Thanks to you for finding and fixing the problem :-)
It looks like you are finish so I will update the server and you will
lose access to it.

regards
-- 
Rafael Martinez, <[EMAIL PROTECTED]>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> OK, I'm on it.

What solution have you got in mind?  I was thinking about an fcntl lock
to ensure only one archiver is active in a given data directory.  That
would fix the problem without affecting anything outside the archiver.
Not sure what's the most portable way to do it though.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] New feature proposal

2006-05-19 Thread Simon Riggs
On Thu, 2006-05-18 at 17:39 -0700, Marc Munro wrote:

> For Postgres 8.2 I would like Veil to be a better citizen and use only
> what shared memory has been reserved for postgres add-ins.

How would Postgres ask the add-in how much memory it needs? How would
the add-in know how much has been reserved for it? How would an add-in
know whether it was the only add-in and whether it could take all of the
allocation?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Martijn van Oosterhout
On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote:
> The reality is that MySQL is widely supported by some very, shall we say,
> "interesting" open source projects and using these products with
> PostgreSQL would be a plus.

The biggest headache I find with using postgres is that various GPL
licenced programs have trouble directly shipping postgresql support
because of our use of OpenSSL. Each and every one of those program
needs to add an exception to their licence for distributors to
distribute postgresql support.

I'm thinking particularly of FreeRadius but there are others. More than
once I thought while waiting for stuff to compile: if I'd chosen mysql
I'd be done by now...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] Weird ..... (a=1 or a=2) <> (a=2 or a=1)

2006-05-19 Thread Tom Lane
Many thanks for allowing me to trace through your problem case.
It's a real Postgres bug, and a nasty one.  The problem is a thinko in
nodeIndexscan.c's code that tests whether the same tuple has already
been emitted in a previous OR'd scan: it is looking for a match on
tuple->t_data->t_ctid, when what it should really be looking at is
tuple->t_self.  What I find is that the indexscan for status == open
is returning TID (880,5), which has XMAX_INVALID and a t_ctid pointing
at (880,18).  (This is perfectly normal, it just indicates that
somebody tried to update the row but the updating transaction rolled
back, and the updated version at 880,18 was later recycled by VACUUM.)
So this causes a bogus rejection when TID (880,18) is scanned during the
second indexscan.

This only affects the 7.4 and 8.0 branches, because earlier and later
versions of Postgres don't use this technique for detecting duplicates.
But it's surprising we didn't find it before.

Patches will appear in next week's releases.  Thanks again!

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Mark Woodward
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> Mark Woodward wrote:
>>> Again, there is so much code for MySQL, a MySQL emulation layer, MEL
>>> for
>>> short, could allow plug and play compatibility for open source, and
>>> closed
>>> source, applications that otherwise would force a PostgreSQL user to
>>> hold
>>> his or her nose and use MySQL.
>>>
>> If we had infinite resources this might make sense. We don't, so it
>> doesn't. There is a real cost to producing a compatibility layer, and
>> the cost will be those spiffy new features.
>
> The real problem is that there's a whole lot of stuff, such as mysql's
> weak error checking, that I don't think a "compatibility layer" could
> sanely provide.
>
I kind of agree with this statement, but while I was playing devils's
advocate and just grousing a bit about having to use MySQL, there is a
sort of reality of "openomics" where mind-share is everything.

The more mind-share you have, the more opportunities you have and the more
resources become available. Not always, of course, look at OpenSSH, but
for the most part.

As MySQL adds features, not matter how poorly implemented, and maintain a
migration path, we will never reach their users.

PostgreSQL is better, true, but it is not ideal in many ways. It can be
best said that the difference between PostgreSQL and MySQL is similar to
the difference between Linux/BSD and Windows.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Joshua D. Drake

 When MySQL is at that

point, which database do you think executives will be choosing? The one
with a very large userbase and lots of marketing and PR that they've
heard plenty about,


All due respect, Jim -- but don't you work for a publicly traded 
database company that happens to have its own version of PostgreSQL?


This is really a discussion for your marketing (and mine frankly) then 
the PostgreSQL mailing lists :)


Sincerely,

Joshua D. Drake




--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Jim C. Nasby
On Fri, May 19, 2006 at 01:26:34AM +0200, Dawid Kuroczko wrote:
> Personally my opinion is that there is no point in pushing PostgreSQL
> everywhere -- if there is no siginifcant performance gain, most managers
> will refuse it, on the grounds that "if it ain't (too) broke, don't fix it".
> The real places to "attack at" are the BIG dbs, the dataware housing
> applications.  Places where MySQL is not used, because someones
> select count(*) should not kill the database.  Because the queries
> take few hours to complete "by design". This should be doable. :)

The problem with limiting ourselves to going after only the 'high end'
of databases is that MySQL is also pushing in that direction, but they
have the advantage of a much larger user base than us. So in the
not-to-distant future, a lot of people who are looking to come off of
Oracle will look at both MySQL and PostgreSQL (in fact I'm sure there's
already some people moving from Oracle to MySQL). When MySQL is at that
point, which database do you think executives will be choosing? The one
with a very large userbase and lots of marketing and PR that they've
heard plenty about, or the one that might theoretically be technically
superior but has a small userbase and they've never heard of? And if the
technical people in the company are MySQL users, because that's the
database they cut their teeth on...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS]

2006-05-19 Thread Tom Lane
Dragan Zubac <[EMAIL PROTECTED]> writes:
> Does anybody knows the Postgres v8.0.3 data directory hierarchy ? What is 
> the purpose of the multiply files such as:

> 5
> 5.1
> 5.2
> 5.3

> each of the same size ?

http://developer.postgresql.org/docs/postgres/storage.html

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Tom Lane
Martijn van Oosterhout  writes:
> However, postgres tables are very highly compressable, 10-to-1 is not
> that uncommon. pg_proc and pg_index compress by that for example.
> Indexes compress even more (a few on my system compress 25-to-1 but
> that could just be slack space, the record being 37-to-1
> (pg_constraint_conname_nsp_index)).

Anything containing a column of type "name" will compress amazingly well
because of all the padding spaces.  I don't think that's representative
of user data though ... except maybe for the occasional novice using
"char(255)" ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Martijn van Oosterhout
On Fri, May 19, 2006 at 09:03:31AM -0400, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
> > unbeleiveable. What's in the table?
> 
> Yeah, I'd tend to question the test data being used.  gzip does not do
> that well on typical text (especially not at the lower settings we'd
> likely want to use).

However, postgres tables are very highly compressable, 10-to-1 is not
that uncommon. pg_proc and pg_index compress by that for example.
Indexes compress even more (a few on my system compress 25-to-1 but
that could just be slack space, the record being 37-to-1
(pg_constraint_conname_nsp_index)).

The only table on my test system over 32KB that doesn't reach 2-to-1
compression with gzip -3 is one of the toast tables.

So getting 25-to-1 is a lot, but possibly not that extreme.
pg_statistic, which is about as close to random data as you're going to
get on a postgres system, compresses 5-to-1.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS]

2006-05-19 Thread Larry Rosenman
Dragan Zubac wrote:
> Hello
> 
> Does anybody knows the Postgres v8.0.3 data directory hierarchy ?
> What is the purpose of the multiply files such as:
> 
> 5
> 5.1
> 5.2
> 5.3
> 
> each of the same size ?
> 
those are the 1 gigabyte segments of your table/index/relation.

see the storage section of the documentation.

-- 
Larry Rosenman  
Database Support Engineer

PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX  78727-6531 

Tel: 512.231.6173
Fax: 512.231.6597
Email: [EMAIL PROTECTED]
Web: www.pervasive.com 

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS]

2006-05-19 Thread Dragan Zubac

Hello

Does anybody knows the Postgres v8.0.3 data directory hierarchy ? What is 
the purpose of the multiply files such as:

5
5.1
5.2
5.3

each of the same size ?

Sincerely

Dragan Zubac

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Tom Lane
Martijn van Oosterhout  writes:
> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
> unbeleiveable. What's in the table?

Yeah, I'd tend to question the test data being used.  gzip does not do
that well on typical text (especially not at the lower settings we'd
likely want to use).

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Compression and on-disk sorting

2006-05-19 Thread Luke Lonergan
Jim, 

> http://jim.nasby.net/misc/compress_sort.txt is preliminary results.
> I've run into a slight problem in that even at a compression 
> level of -3, zlib is cutting the on-disk size of sorts by 
> 25x. So my pgbench sort test with scale=150 that was 
> producing a 2G on-disk sort is now producing a 80M sort, 
> which obviously fits in memory. And cuts sort times by more than half.

When you're ready, we can test this on some other interesting cases and
on fast hardware.

BTW - external sorting is *still* 4x slower than popular commercial DBMS
(PCDB) on real workload when full rows are used in queries.  The final
results we had after the last bit of sort improvements were limited to
cases where only the sort column was used in the query, and for that
case the improved external sort code was as fast as PCDB provided lots
of work_mem are used, but when the whole contents of the row are
consumed (as with TPC-H and in many real world cases) the performance is
still far slower.

So, compression of the tuples may be just what we're looking for.

- Luke


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Tommi Maekitalo
Am Freitag, 19. Mai 2006 02:35 schrieb Robert Treat:
> On Thursday 18 May 2006 12:38, Josh Berkus wrote:
> > Personally, I'd go after MSSQL before I bothered with MySQL.   Sure,
> > let's make *migration* easier for those who wake up and smell the BS, but
> > migration can (and probably should) be one-way.
>
> If you want to get users to swtich to your software from your competitors,
> you have to eliminate barriers, and a big one for any database is getting
> locked into a specific one.  People aren't going to take the time to try
> switching to postgresql if they can't easily make it back to thier former
> database. It's one of the reasons why PostgreSQL's standards compliance is
> so important; if you want to swtich to a new database, your best bet is to
> give PostgreSQL a shot, because even if you don't like it, we're not going
> to try and trap you into our software with bunches of non-standard knobs.
> Low barrier to exit == low barrier to entry.

The way to go are standards. If postgresql supports standard-sql (like we all 
know it know), mysql-users has to justify their apps to use standard-sql. 
What they gain is not only compatibility with PostgreSQL but compatiblity 
with all database-servers, which supports this standard. They wont have much 
trouble to switch back to mysql or downgrade their postgresql to oracle ;-), 
if they follow standards.

Also if PostgreSQL would have a compatibility-layer, it has to follow every 
quirk of mysql and will be measured by that. Much better is to promote users 
of mysql to use standards.

Tommi

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


  1   2   >