Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
On Fri, May 19, 2006 at 07:04:47PM -0400, Bruce Momjian wrote: > > libreadline is not a problem because you can distribute postgresql > > compiled with readline and comply with all licences involved > > simultaneously. It doesn't work with openssl because the licence > > requires things that are incompatable with the GPL. > > My question is whether psql using libreadline.so has to be GPL, meaning > the psql source has to be included in a binary distribution. IANAL, but yes. Or any other of the methods allowed, like providing a written voucher valid for at least three years. People who feel they need to keep the source to psql secret should link against libeditline instead. The way I understand it, the GPL affects programs in two main ways: 1. A program which is GPL'd must, when distributed, be able to provide all source used to build it under terms compatable with the GPL. 2. A program which includes a GPL'd header file while building, must, when distributed, provide its own source and the library under GPL compatable terms, but not necessariliy the source of anything else needed to build it. This is why it's OK that psql links against openssl and readline. These are obviously only relevent when distributing precompiled binaries. If you are only distributing source, none of the above applies to you. There's a third method that some people claim, but I don't buy. This where a program using an interface of a GPL'd library somehow become a derived work of said library. That's just way whacked out. You may ofcourse disagree with any of the above, and hey, if you have a lawyer to back you up, who am I to argue? As for why you don't solve the problem by distributing a libpq not compiled against OpenSSL, well, that's a different question. Back when SSL was considered an arms exports by the US, having both SSL and non-SSL versions was common (and a big PITA). When that disappeared, the main reason for the split went away and people started compiling SSL by default. This solved the problem for 99% of programs. However, one tiny subset remains problematic: - A library implements SSL, but only using OpenSSL - The library doesn't use the GPL, or doesn't have an OpenSSL exception clause. - A GPL'd program uses this library, without an OpenSSL exception clause. In this subset of a subset of a subset of programs, it's a problem. Many libraries that implement SSL provide an alternative to OpenSSL, many programs using such libraries have exception clauses so that there's just a handful of programs and libraries that are problematic. As long as there's a possibility that the situation can change (either every GPL program using postgresql gains an exception clause, or postgresql might someday support some other library) it will probably stay this way. If the the postgresql core decides that OpenSSL will be the only SSL ever supported, no matter what, well, the split distribution may yet happen. In the meantime, we have status quo. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] text_position worst case runtime
Tom Lane <[EMAIL PROTECTED]> writes: > If it did that might be a nice solution, but I'm not sure that it does > use B-M ... I can't find either "Boyer" or "Moore" in its source code. > > There's no particular reason to suppose offhand that a regex engine > would be faster than the naive code for fixed patterns. Well even a lame regexp implementation ought to be O(n+m). The factors will be less than Boyer-Moore which can skip over substantial sections of the search space without even looking at the characters. But there's no way it would be O(n*m) for simple patterns unless the implementation was seriously deficient. Of course your statement could still be true for particular usage patterns like searching many different short strings with many different patterns where the setup time of the regexp tables may dominate. -- greg ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] String Similarity
Get pg_trgm http://www.sai.msu.su/~megera/oddmuse/index.cgi/ReadmeTrgm It doesn't depends on language. Oleg On Fri, 19 May 2006, Mark Woodward wrote: I have a side project that needs to "intelligently" know if two strings are contextually similar. Think about how CDDB information is collected and sorted. It isn't perfect, but there should be enough information to be usable. Think about this: "pink floyd - dark side of the moon - money" "dark side of the moon - pink floyd - money" "money - dark side of the moon - pink floyd" etc. To a human, these strings are almost identical. Similarly: "dark floyd of money moon pink side the" Is a puzzle to be solved by 13 year old children before the movie starts. My post has three questions: (1) Does anyone know of an efficient and numerically quantified method of detecting these sorts of things? I currently have a fairly inefficient and numerically bogus solution that may be the only non-impossible solution for the problem. (2) Does any one see a need for this feature in PostgreSQL? If so, what kind of interface would be best accepted as a patch? I am currently returning a match liklihood between 0 and 100; (3) Is there also a desire for a Levenshtein distence function for text and varchars? I experimented with it, and was forced to write the function in item #1. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] String Similarity
> > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > >> I have a side project that needs to "intelligently" know if two strings >> are contextually similar. > > The examples you gave seem heavy on word order and whitespace > consideration, > before applying any algorithms. Here's a quick perl version that does the > job: [SNIP] This is a case where the example was too simple to explain the problem, sorry. I have an implementation of Oracle's "contains" function for PostgreSQL, and it does basically what you are doing, and, in fact, also has Mohawk Software Extensions (LOL) that provide metaphone. The problem is that parsing white space realy isn't reliable. Sometimes it is pinkfloyd-darksideofthemoon. Also, I have been thinking of other applications. I have a piece of code that does this: apps$ ./stratest "pink foyd dark side of the moon money" "money dark side of the moon pink floyd" Match: dark side of the moon Match: pink f Match: money Match: oyd apps$ ./stratest "pinkfoyddarksideofthemoonmoney" "moneydarksideofthemoonpinkfloyd" Match: darksideofthemoon Match: pinkf Match: money Match: oyd I need to come up with a numerically sane way of taking this information and understanding overall "similarity." ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] patch review, please: Autovacuum/Vacuum times via stats.
Applied by Alvaro. Thanks. --- Larry Rosenman wrote: > Larry Rosenman wrote: > > Greetings, > > I've got a patch to be reviewed for having the stats system keep > > track of the last > > time a table was vacuumed or analyzed either by the user or via > > AutoVacuum. > > > > The patch is at: > > http://www.lerctr.org/~ler/pg-dev/vacuum-autovacuum-times-stats.diff > > > > I'd appreciate a full review, it includes docs as well. > > > > Thanks! > > > > LER > > I just replaced this one with one that actually bumps catversion. > > LER > > pgsql-patches added as well. > > I think this one is applyable if the powers that be want to. > > Comments/criticism welcome. > > LER > > > -- > Larry Rosenman http://www.lerctr.org/~ler > Phone: +1 512-248-2683 E-Mail: ler@lerctr.org > US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 > > > ---(end of broadcast)--- > TIP 9: In versions below 8.0, the planner will ignore your desire to >choose an index scan if your joining column's datatypes do not >match > -- Bruce Momjian http://candle.pha.pa.us EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS]
Hi, suppose we have something like this: upd_views=# create table tabla1 (col1 point); CREATE TABLE upd_views=# insert into tabla1 values ('3,2'); INSERT 0 1 upd_views=# insert into tabla1 values ('2,2'); INSERT 0 1 upd_views=# insert into tabla1 values ('3,2'); INSERT 0 1 then, this select will give an error: upd_views=# select col1, count(*) from tabla1 group by col1; ERROR: could not identify an ordering operator for type point HINT: Use an explicit ordering operator or modify the query. upd_views=# i guess this is related to: http://archives.postgresql.org/pgsql-hackers/2003-08/msg00809.php so, what happened with this idea? there is another way to automagicaly identify an "equality operator" for datatypes like 'point'? as you said in the message linked above that is because postgres ask for the operator name... -- regards, Jaime Casanova "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs and the universe trying to produce bigger and better idiots. So far, the universe is winning." Richard Cook ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] text_position worst case runtime
Hannu Krosing wrote: > I had a (false ?) memory that we used some variant of pcre, and that > pcre uses BM. I may be false on both accounts. (I know that python > borrowed its re module from pcre). Our code is a derivative from Henry Spencer's code found in Tcl. It certainly isn't Boyer Moore, because it processes chars one at a time, left to right (BM processes right to left). -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] String Similarity
> > I have a side project that needs to "intelligently" know if two > > strings are contextually similar. Also check out the "fuzzystrmatch" module in /contrib, which offers soundex, metaphone and levenschtein functions. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Joshua D. Drake wrote: > Jonah H. Harris wrote: > > On 5/19/06, Joshua D. Drake <[EMAIL PROTECTED]> wrote: > >> And with that, I am going to sit in a lawn chair and watch the bonfire. > > > > This is one of the finest examples of unfocused discussions I've ever > > seen on -hackers... while surely entertaining, what a huge waste of > > time. > > All discussions on mailing lists are unfocused and a waste of time to > some degree, even the best of them because by its very nature, email is > a time wasting tool. Let's not forget my bad jokes. I am still chuckling at JavaZero. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Martijn van Oosterhout wrote: -- Start of PGP signed section. > On Fri, May 19, 2006 at 04:41:20PM -0400, Bruce Momjian wrote: > > > Yes, the exeption applies to libreadline, which is why we can deliver > > > psql with libreadline linked on Linux, for example. But we can't on > > > Windows or Solaris. > > > > OK, where do you see this exception? I have not. > > The exception is not relevent in this case. The exception is to allow > GPL applications to work on non-free operating systems. Obviously a > GPL'd application on Windows can never supply the source to the Win32 > libraries. > > libreadline is not a problem because you can distribute postgresql > compiled with readline and comply with all licences involved > simultaneously. It doesn't work with openssl because the licence > requires things that are incompatable with the GPL. My question is whether psql using libreadline.so has to be GPL, meaning the psql source has to be included in a binary distribution. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] text_position worst case runtime
Ühel kenal päeval, L, 2006-05-20 kell 01:34, kirjutas Hannu Krosing: > Ühel kenal päeval, R, 2006-05-19 kell 18:18, kirjutas Tom Lane: > > Hannu Krosing <[EMAIL PROTECTED]> writes: > > > I guess our regex implementation already uses boyer-moore or similar. > > > Why not just expose the match position of substring('text' in 'regex') > > > using some function, called match_position(int searched_text, int > > > regex, int matchnum) ? > > > > If it did that might be a nice solution, but I'm not sure that it does > > use B-M ... I can't find either "Boyer" or "Moore" in its source code. > > Ok, maybe it is not optimised for finding longish strings inside even > longers trings. > > I had a (false ?) memory that we used some variant of pcre, and that > pcre uses BM. I may be false on both accounts. (I know that python > borrowed its re module from pcre). http://www.mcabee.org/lists/snort-users/Mar-05/msg00026.html seems to imply that PCRE uses BM at least for some case, so I might not have been wrong in case 2 :) > > There's no particular reason to suppose offhand that a regex engine > > would be faster than the naive code for fixed patterns. > > if naive code is O(n*m), then starting from some values of n and m it is > probably faster if it is based on somewhat optimised regex engine, the > question is, what is the threasold and dataset for fasterness > -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] String Similarity
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 > I have a side project that needs to "intelligently" know if two strings > are contextually similar. The examples you gave seem heavy on word order and whitespace consideration, before applying any algorithms. Here's a quick perl version that does the job: CREATE OR REPLACE FUNCTION matchval(text,text) RETURNS INT LANGUAGE plperlu AS $$ use strict; use String::Approx 'adist'; my $uno = join ' ', sort split /\s+/ => lc shift; my $dos = join ' ', sort split /\s+/ => lc shift; return adist(length $unohttp://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iD8DBQFEbktUvJuQZxSWSsgRAiCtAJ9nlpqGxlYnimDPp8t5XQsc8y9RywCfZZL6 iU9iPnxHaWOvYCUD7+rK8Do= =zo3T -END PGP SIGNATURE- ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Ühel kenal päeval, R, 2006-05-19 kell 22:53, kirjutas Martijn van Oosterhout: > libreadline is not a problem because you can distribute postgresql > compiled with readline and comply with all licences involved > simultaneously. oh? my impression was that we are clear, because libreadline is just one of "readline" implementations we support. > It doesn't work with openssl because the licence > requires things that are incompatable with the GPL. Still clients can compile/use libpq without OpenSSL and be on safe ground. > The openssl faq suggest that you can take advantage of the exception, > which reads: > > However, as a special exception, the source code distributed need not > include anything that is normally distributed (in either source or > binary form) with the major components (compiler, kernel, and so on) > of the operating system on which the executable runs, unless that > component itself accompanies the executable. > > I don't buy that argument, and I'm not the only one. OpenSSL is an > optional part of most Linuxes, so there's no way you can use that > exception. But on most linuxes optional parts ar also "normally distributed" :P Even network drivers may be counted optional for pure linux (kernel) experience, but are still normally distributed. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com NOTICE: This communication contains privileged or other confidential information. If you have received it in error, please advise the sender by reply email and immediately delete the message and any attachments without copying or disclosing the contents. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] text_position worst case runtime
Ühel kenal päeval, R, 2006-05-19 kell 18:18, kirjutas Tom Lane: > Hannu Krosing <[EMAIL PROTECTED]> writes: > > I guess our regex implementation already uses boyer-moore or similar. > > Why not just expose the match position of substring('text' in 'regex') > > using some function, called match_position(int searched_text, int > > regex, int matchnum) ? > > If it did that might be a nice solution, but I'm not sure that it does > use B-M ... I can't find either "Boyer" or "Moore" in its source code. Ok, maybe it is not optimised for finding longish strings inside even longers trings. I had a (false ?) memory that we used some variant of pcre, and that pcre uses BM. I may be false on both accounts. (I know that python borrowed its re module from pcre). > There's no particular reason to suppose offhand that a regex engine > would be faster than the naive code for fixed patterns. if naive code is O(n*m), then starting from some values of n and m it is probably faster if it is based on somewhat optimised regex engine, the question is, what is the threasold and dataset for fasterness -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Compression and on-disk sorting
Ühel kenal päeval, R, 2006-05-19 kell 14:57, kirjutas Jim C. Nasby: > On Fri, May 19, 2006 at 09:29:44PM +0200, Martijn van Oosterhout wrote: > > On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote: > > > > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) > > > > > a; > > > > > If the tape routines were actually storing visibility information, I'd > > > > > expect that to be pretty compressible in this case since all the > > > > > tuples > > > > > were presumably created in a single transaction by pgbench. > > > > > > Was he not using pg_bench data ? > > > > Hmm, so there was only 3 integer fields and one varlena structure which > > was always empty. This prepended with a tuple header with mostly blank > > fields or at least repeated, yes, I can see how we might get a 25-to-1 > > compression. > > > > Maybe we need to change pgbench so that it puts random text in the > > filler field, that would at least put some strain on the compression > > algorithm... > > Wow, I thought there was actually something in there... > > True random data wouldn't be such a great test either; what would > probably be best is a set of random words, since in real life you're > unlikely to have truely random data. I usually use something like the following for my "random name" tests: #!/usr/bin/python import random words = [line.strip() for line in open('/usr/share/dict/words')] def make_random_name(min_items, max_items): l = [] for w in range(random.randint(min_items, max_items)): l.append(random.choice(words)) return ' '.join(l) it gives out somewhat justifyable but still quite amusing results: >>> make_random_name(2,4) 'encroaches Twedy' >>> make_random_name(2,4) 'annuloida Maiah commends imputatively' >>> make_random_name(2,4) 'terebral wine-driven pacota' >>> make_random_name(2,4) 'ballads disenfranchise cabriolets spiny-fruited' -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] text_position worst case runtime
Hannu Krosing <[EMAIL PROTECTED]> writes: > I guess our regex implementation already uses boyer-moore or similar. > Why not just expose the match position of substring('text' in 'regex') > using some function, called match_position(int searched_text, int > regex, int matchnum) ? If it did that might be a nice solution, but I'm not sure that it does use B-M ... I can't find either "Boyer" or "Moore" in its source code. There's no particular reason to suppose offhand that a regex engine would be faster than the naive code for fixed patterns. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Compression and on-disk sorting
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > True random data wouldn't be such a great test either; what would > probably be best is a set of random words, since in real life you're > unlikely to have truely random data. True random data would provide worst-case compression behavior, so we'd want to try that to find out what the downside is; but we shouldn't consider it to be the design center. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Jonah H. Harris wrote: On 5/19/06, Joshua D. Drake <[EMAIL PROTECTED]> wrote: And with that, I am going to sit in a lawn chair and watch the bonfire. This is one of the finest examples of unfocused discussions I've ever seen on -hackers... while surely entertaining, what a huge waste of time. All discussions on mailing lists are unfocused and a waste of time to some degree, even the best of them because by its very nature, email is a time wasting tool. Have a great weekend! Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
On 5/19/06, Joshua D. Drake <[EMAIL PROTECTED]> wrote: And with that, I am going to sit in a lawn chair and watch the bonfire. This is one of the finest examples of unfocused discussions I've ever seen on -hackers... while surely entertaining, what a huge waste of time. -- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax
Hannu Krosing <[EMAIL PROTECTED]> writes: > Ãhel kenal päeval, R, 2006-05-19 kell 16:12, kirjutas Jim C. Nasby: >> IIRC, Oracle handles this by allowing you to prefix variables with the >> name of the function. > what happens if your function name is the same as some table name or > local record variable name ? would it still bite newcomers ? Yeah. Since functions and tables have independent namespaces in PG (dunno about Oracle), this didn't seem like much of a solution to me. I think we should just recommend a coding practice such as _ prefix for local variables, and leave it at that. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax
Ühel kenal päeval, R, 2006-05-19 kell 16:12, kirjutas Jim C. Nasby: > On Fri, May 19, 2006 at 11:03:21PM +0300, Hannu Krosing wrote: > > A less disruptive change would be to have some predefined "record" where > > all local variables belong to, perhaps called 'local' or '_local_' :) so > > one could access both input parameter inp_orderdate and declared > > variable var_orderdate as local.inp_orderdate and local.var_orderdate > > respectively ? The old use (without local.) should still work. > > That might be useful for othing things, too. Having _local. and > _parameters. could be handy, for example. But I'm not sure if this is > better than using $ or not... I dont think that having separate _parameters and _locals is a good idea (then we would probably also need _const :) ). lerts have just _vars OR _locals. > IIRC, Oracle handles this by allowing you to prefix variables with the > name of the function. what happens if your function name is the same as some table name or local record variable name ? would it still bite newcomers ? > You can also have package-level variables, which > can be handy (whatever happened to the discussion about adding packages > or something similar to plpgsql??) I got the impression that this was either a) tied together with adding *procedures* in addition to functions or b) planned somehow to be solved by using schemas > > > I think it would be better to at least strongly recommend always > > > prefixing variables and parameters with something. > > > > At least we should use such convention in our sample code in docs. > > Yes, at a minimum. > > > > I'd argue that it'd > > > be even better to put us on the road of eventually mandating plpgsql > > > variables be prefixed with something (like $), but I'm not holding my > > > breath on that one... > > > > I dont believe in mandating non-backward-compatible changes, but prefix > > $ may be one way to disambiguate vars and fieldnames. > > Well, this could be made optional (strict mode). or perhaps have plpgsql_lint as a separate external tool or as a database function :) I guess it is hard to make a strict mode, when the need for using $ in first place comes from inability to distinguish between fields and variables :) -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com NOTICE: This communication contains privileged or other confidential information. If you have received it in error, please advise the sender by reply email and immediately delete the message and any attachments without copying or disclosing the contents. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Ühel kenal päeval, R, 2006-05-19 kell 09:40, kirjutas Christopher Kings-Lynne: > > We also need better support for non C locales in tsearch. As I was porting > > mysql's sakila sample database I was reminded just how painful it is when > > you > > initdb in a non-supported locale (which is probably the default on the > > majority of distros out there) > > > In 8.2 tsearch2 supports utf8... Utf8 is encoding, but I guess that tsearch2 does not care much about locales ? tsearch2 does not do sorts, but it may care about upper()/lower() for languages that support it, so there our locale support should be good for utf8 encoding if we care about language-specific case insensitivity. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax
On Fri, May 19, 2006 at 11:03:21PM +0300, Hannu Krosing wrote: > A less disruptive change would be to have some predefined "record" where > all local variables belong to, perhaps called 'local' or '_local_' :) so > one could access both input parameter inp_orderdate and declared > variable var_orderdate as local.inp_orderdate and local.var_orderdate > respectively ? The old use (without local.) should still work. That might be useful for othing things, too. Having _local. and _parameters. could be handy, for example. But I'm not sure if this is better than using $ or not... IIRC, Oracle handles this by allowing you to prefix variables with the name of the function. You can also have package-level variables, which can be handy (whatever happened to the discussion about adding packages or something similar to plpgsql??) > > I think it would be better to at least strongly recommend always > > prefixing variables and parameters with something. > > At least we should use such convention in our sample code in docs. Yes, at a minimum. > > I'd argue that it'd > > be even better to put us on the road of eventually mandating plpgsql > > variables be prefixed with something (like $), but I'm not holding my > > breath on that one... > > I dont believe in mandating non-backward-compatible changes, but prefix > $ may be one way to disambiguate vars and fieldnames. Well, this could be made optional (strict mode). -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] String Similarity
> Mark Woodward wrote: >> I have a side project that needs to "intelligently" know if two strings >> are contextually similar. Think about how CDDB information is collected >> and sorted. It isn't perfect, but there should be enough information to >> be >> usable. >> >> Think about this: >> >> "pink floyd - dark side of the moon - money" >> "dark side of the moon - pink floyd - money" >> "money - dark side of the moon - pink floyd" >> etc. >> >> To a human, these strings are almost identical. Similarly: >> >> "dark floyd of money moon pink side the" >> >> Is a puzzle to be solved by 13 year old children before the movie >> starts. [snip] > > Hmmm... I think I like this problem. Maybe I'll work on it a bit as a > contrib > module. I *have* a working function, but it is not very efficient and it is not what I would call numerically predictable. And it does find the various sub-strings between the two strings in question. Email me offline and we can make something for contrib. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
On Fri, May 19, 2006 at 04:41:20PM -0400, Bruce Momjian wrote: > > Yes, the exeption applies to libreadline, which is why we can deliver > > psql with libreadline linked on Linux, for example. But we can't on > > Windows or Solaris. > > OK, where do you see this exception? I have not. The exception is not relevent in this case. The exception is to allow GPL applications to work on non-free operating systems. Obviously a GPL'd application on Windows can never supply the source to the Win32 libraries. libreadline is not a problem because you can distribute postgresql compiled with readline and comply with all licences involved simultaneously. It doesn't work with openssl because the licence requires things that are incompatable with the GPL. The openssl faq suggest that you can take advantage of the exception, which reads: However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. I don't buy that argument, and I'm not the only one. OpenSSL is an optional part of most Linuxes, so there's no way you can use that exception. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] String Similarity
Mark Woodward wrote: > I have a side project that needs to "intelligently" know if two strings > are contextually similar. Think about how CDDB information is collected > and sorted. It isn't perfect, but there should be enough information to be > usable. > > Think about this: > > "pink floyd - dark side of the moon - money" > "dark side of the moon - pink floyd - money" > "money - dark side of the moon - pink floyd" > etc. > > To a human, these strings are almost identical. Similarly: > > "dark floyd of money moon pink side the" > > Is a puzzle to be solved by 13 year old children before the movie starts. > > My post has three questions: > > (1) Does anyone know of an efficient and numerically quantified method of > detecting these sorts of things? I currently have a fairly inefficient and > numerically bogus solution that may be the only non-impossible solution > for the problem. > > (2) Does any one see a need for this feature in PostgreSQL? If so, what > kind of interface would be best accepted as a patch? I am currently > returning a match liklihood between 0 and 100; > > (3) Is there also a desire for a Levenshtein distence function for text > and varchars? I experimented with it, and was forced to write the function > in item #1. The Levenshtein distance (also known as "edit distance") won't really give you what you want above, because operations to transplant whole chunks of the string aren't supported. (You can simulate it with inserts and deletes, but you pay individually for each of them.) Also, Levenshtein distances don't charge much for changing a word into a similarly spelled but semantically distinct word, such as "word" => "work". What you would want, I think, is some function that recognizes that the whole substring "pink floyd" has been moved from the beginning to the middle of the string, and only charges you a small edit cost for having done so. It would need to recognize both the word boundaries and the transplants. Off the top of my head, I'm not sure how you would achieve that with good runtime characteristics. You can go even further and allow synonyms, so that "pink floyd" is more related to "red floyd" than it is to "large floyd", but for that sort of thing you would probably need to pull in wordnet. If you want to notice that two strings contain local similarity, but don't have an overall good Levenshtein distance, take a look at global vs. local alignment algorithms used in biological applications. Local alignment can be achieved in O(n*m) time, where n and m are the lengths of the two strings, using the Smith-Waterman algorithm. (Temple Smith and Michael Waterman). There are faster heuristic algorithms, but they don't have the same guarantees. These local alignments might tell you something useful as a part of the overall solution. Hmmm... I think I like this problem. Maybe I'll work on it a bit as a contrib module. mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Andrew Dunstan wrote: > Bruce Momjian wrote: > > Rod Taylor wrote: > > > >> Exceptions exist in the GPL for libraries and tools included in the > >> operating system and this is enough in most cases. GPL applications on > >> Windows may have problems. > >> > > > > What exception, exactly? Does an exception apply to libreadline, > > because list I looked, it didn't. > > > > > > Yes, the exeption applies to libreadline, which is why we can deliver > psql with libreadline linked on Linux, for example. But we can't on > Windows or Solaris. OK, where do you see this exception? I have not. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Bruce Momjian wrote: Rod Taylor wrote: Exceptions exist in the GPL for libraries and tools included in the operating system and this is enough in most cases. GPL applications on Windows may have problems. What exception, exactly? Does an exception apply to libreadline, because list I looked, it didn't. Yes, the exeption applies to libreadline, which is why we can deliver psql with libreadline linked on Linux, for example. But we can't on Windows or Solaris. cheers andrew ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Rod Taylor wrote: > Exceptions exist in the GPL for libraries and tools included in the > operating system and this is enough in most cases. GPL applications on > Windows may have problems. What exception, exactly? Does an exception apply to libreadline, because list I looked, it didn't. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [pgsql-advocacy] Toward A Positive Marketing Approach.
Tom Lane wrote: Josh Berkus writes: Other projects need even more intensive coding help. OpenOffice, for example, doesn't offer the Postgres driver by default because it's still too buggy. That seems like something that it'd be worth our while to help fix. +1 (or +10 if that's not to piggy ;-) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [pgsql-advocacy] OO PostgreSQL Driver
Joshua D. Drake wrote: > Robert Treat wrote: > >On Friday 19 May 2006 14:22, Joshua D. Drake wrote: > >>>Other projects need even more intensive coding help. OpenOffice, for > >>>example, doesn't offer the Postgres driver by default because it's still > >>>too buggy. That would be solvable with money, but $1000 to $2000, not > >>>$50. > >>Does it really need one since it supports JDBC and ODBC? > >> > > > >It's not about what OO needs, it's about what PG needs. Consider this; if > >database M works out of the box... but database P requires you to go find > >some third party software and download it and install it in order to > >work... you tell me which one is going to have the advantage in new user > >adoption? > > Well then, why don't we convince the OO people to bundle either ODBC or > ODBCng with OO. The ODBC interface is less powerful for OOo than their own SDBC interface AFAIK (which is why they developed it). So while it would be good to have an ODBC driver in there, the SDBC driver will also continue to be developed. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] text_position worst case runtime
Tom Lane wrote: > Greg Stark <[EMAIL PROTECTED]> writes: > >>Tom Lane <[EMAIL PROTECTED]> writes: >> >>>And how much code would those take? The bottom line here is that we >>>don't have a pile of complaints about the performance of text_position, >>>so it's difficult to justify making it much more complicated than it >>>is now. > > >>It seems somewhat contrary to the Postgres design philosophy to assume that >>all strings are small. > > > That is a straw-man argument. If we try to optimize every single > function in the system to the Nth degree, we'll end up with a system > that is unmaintainable (and likely unusably buggy as well). We've got > to set limits on the amount of complexity we're willing to accept in > the core code. > > Note that I have not said "you can't put Boyer-Moore into core". > What I've said is that the case to justify doing that hasn't been made. > And handwaving about "design philosophy" isn't the kind of case I'm > looking for --- common applications in which it makes a real performance > difference are what I'm looking for. > > At this point we haven't even been shown any evidence that text_position > itself is what to optimize if you need to do searches in large text > strings. It seems entirely likely to me that the TOAST mechanisms would > be the bottleneck, instead. And one should also consider other approaches > entirely, like indexes (tsearch2 anyone?). In case anyone is following this thread specifically for the biological sequence data aspect of it, I should mention that I wrote a GiST index for the dna and protein sequence datatypes. The performance of the index was inconsistent. For certain data, I could get about two orders of magnitude speed increase on selects, where the select was based on a limited regular expression approximate match against the data. But if you change the regular expression (or to a degree, if you change the data) the performance can drop off to roughly tied with a sequential scan. And of course, inserts are far more expensive because the index has to be kept up to date. If anyone wants specifics, send me an email and I'll put something together. mark ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] text_position worst case runtime
Ühel kenal päeval, R, 2006-05-19 kell 11:20, kirjutas Jim C. Nasby: > On Thu, May 18, 2006 at 06:49:38PM -0700, Mark Dilger wrote: > > > I would think that the worst-case times would be fairly improbable. > > > I'm disinclined to push something as complicated as Boyer-Moore matching > > > into this function without considerable evidence that it's a performance > > > bottleneck for real applications. > > > > A common approach in biological data applications is to store nucleic and > > amino > > acid sequences as text in a relational database. The smaller alphabet > > sizes and > > the tendency for redundancy in these sequences increases the likelihood of a > > performance problem. I have solved this problem by writing my own data > > types > > with their own functions for sequence comparison and alignment, and I used > > boyer-moore for some of that work. Whether the same technique should be > > used > > for the text and varchar types was unclear to me, hence the question. > > Perhaps it would be best to add a seperate set of functions that use > boyer-moore, and reference them in appropriate places in the > documentation. Unless someone has a better idea on how we can find out > what people are actually doing in the field... I guess our regex implementation already uses boyer-moore or similar. Why not just expose the match position of substring('text' in 'regex') using some function, called match_position(int searched_text, int regex, int matchnum) ? -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax
Ühel kenal päeval, R, 2006-05-19 kell 14:39, kirjutas Jim C. Nasby: > On Fri, May 19, 2006 at 09:57:29PM +0300, Hannu Krosing wrote: > > ??hel kenal p??eval, R, 2006-05-19 kell 11:24, kirjutas Jim C. Nasby: > > > > > Something that's always bugged me is how horribly variables are handled > > > in plpgsql, namely that if you have a variable and a field with the same > > > name it can be extremely difficult to keep them seperated. Perhaps := vs > > > = might be a way to keep it clear as to which is which... > > > > I can't see how := helps here. Once you have fields, you are either in > > SQL-land and always use = or have the fields selected into a type or > > recors and can use record.field syntax. > > The problem happens down in the SQL layer. Actually, I guess := wouldn't > help anything... > > > > Though, a better way would probably just be to provide a built-in > > > construct for referencing plpgsql variables, such as $. > > > > Where is it exactly a problem which can't be solved simply by naming > > your variables differently from fields? > > That's how I solve it, but a lot of newbies get bit by this. A newbie will be bit by several things anyway. For example you could write code in C (and several other languages) with all your loop variables named "i", and those in inner scopes overshadowing those in outer. > What makes > it really bad is that they typically get bit after they've already > written a bunch of code that doesn't prefix variable names with > something, so then they have to switch coding-conventions after they > already have a bunch of code written. A less disruptive change would be to have some predefined "record" where all local variables belong to, perhaps called 'local' or '_local_' :) so one could access both input parameter inp_orderdate and declared variable var_orderdate as local.inp_orderdate and local.var_orderdate respectively ? The old use (without local.) should still work. > I think it would be better to at least strongly recommend always > prefixing variables and parameters with something. At least we should use such convention in our sample code in docs. > I'd argue that it'd > be even better to put us on the road of eventually mandating plpgsql > variables be prefixed with something (like $), but I'm not holding my > breath on that one... I dont believe in mandating non-backward-compatible changes, but prefix $ may be one way to disambiguate vars and fieldnames. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
[EMAIL PROTECTED] ("Mark Woodward") writes: >> Jim C. Nasby wrote: >>> Maybe a compatability layer isn't worth doing, but I certainly >>> think it's very much worthwhile for the community to do everything >>> possible to encourage migration from MySQL. We should be able to >>> lay claim to most advanced and most popular OSS database. >>> >> >> We'll do that by concentrating on spiffy features, not >> compatibility layers. I want people to use PostgreSQL because it's >> the best, not because it's just like something else. >> > > While I do agree with the ideal, the reality may not be good > enough. Even I, a PostgreSQL user for a decade, have to use MySQL > right now because that is what the client uses. > > Again, there is so much code for MySQL, a MySQL emulation layer, MEL > for short, could allow plug and play compatibility for open source, > and closed source, applications that otherwise would force a > PostgreSQL user to hold his or her nose and use MySQL. But this is essentially what killed off OS/2 in the marketplace. IBM created a "good enough" emulation layer that it ran [early] Windows(tm) applications sufficiently well that nobody bothered porting applications to *properly* work with OS/2. Microsoft then played off that with exceeding success; they made sure that future versions of Windows(tm) were sufficiently different that OS/2 was left orphaned. We *are* in a sufficiently comparable state here; MySQL AB is *NOT* our friend; they want to successfully 'take over the world,' at least as far as they can do so with their product line... -- "cbbrowne","@","cbbrowne.com" http://cbbrowne.com/info/unix.html CBS News report on Fort Worth tornado damage: "Eight major downtown buildings were severely damaged and 1,000 homes were damaged, with 95 uninhabitable. Gov. George W. Bush declared Tarrant County a disaster area. Federal Emergency Management Agency workers are expected to arrive sometime next week after required paperwork is completed." ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
On Fri, May 19, 2006 at 03:39:23PM -0400, Mark Woodward wrote: > > > >> Actually, I think it's a lot more accurate to compare PostgreSQL and > >> MySQL as FreeBSD vs Linux from about 5 years ago. Back then FreeBSD was > >> clearly superior from a technology standpoint, and clearly playing > >> second-fiddle when it came to users. And now, Linux is actually > >> technically superior in most ways thanks to all the mindshare that's > >> been poured into it. > > > > And with that, I am going to sit in a lawn chair and watch the bonfire. > > Even I know that is NOT a discussion we want to start. Yeah, wasn't trying to start an OS flamewar; my point is that it's now pretty hard to find anything FreeBSD related/specific, and that the sheer popularity of Linux has given it a huge boost in terms of development. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] String Similarity
Mark Woodward wrote: (3) Is there also a desire for a Levenshtein distence function for text and varchars? I experimented with it, and was forced to write the function in item #1. fuzzystrmatch in contrib already has a Levenshtein function. cheers andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [pgsql-advocacy] OO PostgreSQL Driver
Robert Treat wrote: On Friday 19 May 2006 14:22, Joshua D. Drake wrote: Other projects need even more intensive coding help. OpenOffice, for example, doesn't offer the Postgres driver by default because it's still too buggy. That would be solvable with money, but $1000 to $2000, not $50. Does it really need one since it supports JDBC and ODBC? It's not about what OO needs, it's about what PG needs. Consider this; if database M works out of the box... but database P requires you to go find some third party software and download it and install it in order to work... you tell me which one is going to have the advantage in new user adoption? Well then, why don't we convince the OO people to bundle either ODBC or ODBCng with OO. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Compression and on-disk sorting
On Fri, May 19, 2006 at 09:29:44PM +0200, Martijn van Oosterhout wrote: > On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote: > > > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a; > > > > If the tape routines were actually storing visibility information, I'd > > > > expect that to be pretty compressible in this case since all the tuples > > > > were presumably created in a single transaction by pgbench. > > > > Was he not using pg_bench data ? > > Hmm, so there was only 3 integer fields and one varlena structure which > was always empty. This prepended with a tuple header with mostly blank > fields or at least repeated, yes, I can see how we might get a 25-to-1 > compression. > > Maybe we need to change pgbench so that it puts random text in the > filler field, that would at least put some strain on the compression > algorithm... Wow, I thought there was actually something in there... True random data wouldn't be such a great test either; what would probably be best is a set of random words, since in real life you're unlikely to have truely random data. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] String Similarity
On Fri, May 19, 2006 at 04:00:48PM -0400, Mark Woodward wrote: > (3) Is there also a desire for a Levenshtein distence function for text > and varchars? I experimented with it, and was forced to write the function > in item #1. Postgres already has a Levenshtein distence function, see fuzzystrmatch in contrib. Whatever you come up with might fit in well there... Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
[HACKERS] String Similarity
I have a side project that needs to "intelligently" know if two strings are contextually similar. Think about how CDDB information is collected and sorted. It isn't perfect, but there should be enough information to be usable. Think about this: "pink floyd - dark side of the moon - money" "dark side of the moon - pink floyd - money" "money - dark side of the moon - pink floyd" etc. To a human, these strings are almost identical. Similarly: "dark floyd of money moon pink side the" Is a puzzle to be solved by 13 year old children before the movie starts. My post has three questions: (1) Does anyone know of an efficient and numerically quantified method of detecting these sorts of things? I currently have a fairly inefficient and numerically bogus solution that may be the only non-impossible solution for the problem. (2) Does any one see a need for this feature in PostgreSQL? If so, what kind of interface would be best accepted as a patch? I am currently returning a match liklihood between 0 and 100; (3) Is there also a desire for a Levenshtein distence function for text and varchars? I experimented with it, and was forced to write the function in item #1. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax
On Fri, May 19, 2006 at 09:57:29PM +0300, Hannu Krosing wrote: > ??hel kenal p??eval, R, 2006-05-19 kell 11:24, kirjutas Jim C. Nasby: > > > Something that's always bugged me is how horribly variables are handled > > in plpgsql, namely that if you have a variable and a field with the same > > name it can be extremely difficult to keep them seperated. Perhaps := vs > > = might be a way to keep it clear as to which is which... > > I can't see how := helps here. Once you have fields, you are either in > SQL-land and always use = or have the fields selected into a type or > recors and can use record.field syntax. The problem happens down in the SQL layer. Actually, I guess := wouldn't help anything... > > Though, a better way would probably just be to provide a built-in > > construct for referencing plpgsql variables, such as $. > > Where is it exactly a problem which can't be solved simply by naming > your variables differently from fields? That's how I solve it, but a lot of newbies get bit by this. What makes it really bad is that they typically get bit after they've already written a bunch of code that doesn't prefix variable names with something, so then they have to switch coding-conventions after they already have a bunch of code written. I think it would be better to at least strongly recommend always prefixing variables and parameters with something. I'd argue that it'd be even better to put us on the road of eventually mandating plpgsql variables be prefixed with something (like $), but I'm not holding my breath on that one... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] New feature proposal
On Fri, 2006-05-19 at 14:44 -0400, Tom Lane wrote: > Marc Munro <[EMAIL PROTECTED]> writes: > > My proposal makes it possible for properly configured add-ins to have a > > guaranteed amount of shared memory available. > > This could all be solved in a cleaner, more bulletproof way if you > simply require such add-ins to be preloaded into the postmaster process > using the existing preload_libraries hook. Then, such an add-in would > allocate its own shmem segment independent of the main Postgres one. > This totally eliminates worries about one chunk of code eating the other > one's memory, which otherwise we'd have to have additional mechanism to > deal with. This is an interesting idea that I had not previously considered. I will give it some thought. I'm not convinced that we actually do need to prevent add-ins from eating each other's memory. Just as existing add-ins that use palloc are expected to use the appropriate memory context and behave themselves, I would expect the same to be true for add-ins that require shared memory. > In a Unix environment, such a thing would Just Work because pointers to > the new segment would be inherited through fork(). In the Windows port > you'd need to do more pushups --- perhaps allocate a small amount of > memory in the main Postgres shmem segment containing the ID of the other > shmem segment, which a backend would use to reattach. > For me, adding windows-specific code to Veil is highly unappealling - I have no easy way to build or test for windows, and no experience of doing so, so the more I can leverage the existing functionality, the better. I had hoped to simply piggyback on Postgres' existing memory management with a very small change to effectively add an add-in shared memory context. On the other hand, if this is the way we have to go, then perhaps it could be added to Postgres as part of its api, rather than having Veil, and perhaps other add-ins, implement it for themselves. Thoughts? __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] Compression and on-disk sorting
On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote: > ??hel kenal p??eval, R, 2006-05-19 kell 14:53, kirjutas Tom Lane: > > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > > On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote: > > >> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost > > >> unbeleiveable. What's in the table? It would seem to imply that our > > >> tuple format is far more compressable than we expected. > > > > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a; > > > If the tape routines were actually storing visibility information, I'd > > > expect that to be pretty compressible in this case since all the tuples > > > were presumably created in a single transaction by pgbench. > > > > It's worse than that: IIRC what passes through a heaptuple sort are > > tuples manufactured by heap_form_tuple, which will have consistently > > zeroed header fields. However, the above isn't very helpful since the > > rest of us have no idea what that "accounts" table contains. How wide > > is the tuple data, and what's in it? > > Was he not using pg_bench data ? I am. For reference: bench=# \d accounts Table "public.accounts" Column | Type | Modifiers --+---+--- aid | integer | not null bid | integer | abalance | integer | filler | character(84) | > > (This suggests that we might try harder to strip unnecessary header info > > from tuples being written to tape inside tuplesort.c. I think most of > > the required fields could be reconstructed given the TupleDesc.) > > I guess that tapefiles compress better than averahe table because they > are sorted, and thus at least a little more repetitive than the rest. > If there are varlen types, then they usually also have abundance of > small 4-byte integers, which should also compress at least better than > 4/1, maybe a lot better. If someone wants to provide a patch that strips out the headers I can test that as well. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Compression and on-disk sorting
On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote: > > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a; > > > If the tape routines were actually storing visibility information, I'd > > > expect that to be pretty compressible in this case since all the tuples > > > were presumably created in a single transaction by pgbench. > > Was he not using pg_bench data ? Hmm, so there was only 3 integer fields and one varlena structure which was always empty. This prepended with a tuple header with mostly blank fields or at least repeated, yes, I can see how we might get a 25-to-1 compression. Maybe we need to change pgbench so that it puts random text in the filler field, that would at least put some strain on the compression algorithm... > I guess that tapefiles compress better than averahe table because they > are sorted, and thus at least a little more repetitive than the rest. > If there are varlen types, then they usually also have abundance of > small 4-byte integers, which should also compress at least better than > 4/1, maybe a lot better. Hmm, that makes sense. That also explains the 37-to-1 compression I was seeing on indexes :). Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
> >> Actually, I think it's a lot more accurate to compare PostgreSQL and >> MySQL as FreeBSD vs Linux from about 5 years ago. Back then FreeBSD was >> clearly superior from a technology standpoint, and clearly playing >> second-fiddle when it came to users. And now, Linux is actually >> technically superior in most ways thanks to all the mindshare that's >> been poured into it. > > And with that, I am going to sit in a lawn chair and watch the bonfire. > Even I know that is NOT a discussion we want to start. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [pgsql-advocacy] OO PostgreSQL Driver
On Friday 19 May 2006 14:22, Joshua D. Drake wrote: > > Other projects need even more intensive coding help. OpenOffice, for > > example, doesn't offer the Postgres driver by default because it's still > > too buggy. That would be solvable with money, but $1000 to $2000, not > > $50. > > Does it really need one since it supports JDBC and ODBC? > It's not about what OO needs, it's about what PG needs. Consider this; if database M works out of the box... but database P requires you to go find some third party software and download it and install it in order to work... you tell me which one is going to have the advantage in new user adoption? -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Compression and on-disk sorting
Ühel kenal päeval, R, 2006-05-19 kell 14:53, kirjutas Tom Lane: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote: > >> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost > >> unbeleiveable. What's in the table? It would seem to imply that our > >> tuple format is far more compressable than we expected. > > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a; > > If the tape routines were actually storing visibility information, I'd > > expect that to be pretty compressible in this case since all the tuples > > were presumably created in a single transaction by pgbench. > > It's worse than that: IIRC what passes through a heaptuple sort are > tuples manufactured by heap_form_tuple, which will have consistently > zeroed header fields. However, the above isn't very helpful since the > rest of us have no idea what that "accounts" table contains. How wide > is the tuple data, and what's in it? Was he not using pg_bench data ? > (This suggests that we might try harder to strip unnecessary header info > from tuples being written to tape inside tuplesort.c. I think most of > the required fields could be reconstructed given the TupleDesc.) I guess that tapefiles compress better than averahe table because they are sorted, and thus at least a little more repetitive than the rest. If there are varlen types, then they usually also have abundance of small 4-byte integers, which should also compress at least better than 4/1, maybe a lot better. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax
Ühel kenal päeval, R, 2006-05-19 kell 11:24, kirjutas Jim C. Nasby: > Something that's always bugged me is how horribly variables are handled > in plpgsql, namely that if you have a variable and a field with the same > name it can be extremely difficult to keep them seperated. Perhaps := vs > = might be a way to keep it clear as to which is which... I can't see how := helps here. Once you have fields, you are either in SQL-land and always use = or have the fields selected into a type or recors and can use record.field syntax. > Though, a better way would probably just be to provide a built-in > construct for referencing plpgsql variables, such as $. Where is it exactly a problem which can't be solved simply by naming your variables differently from fields? -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com NOTICE: This communication contains privileged or other confidential information. If you have received it in error, please advise the sender by reply email and immediately delete the message and any attachments without copying or disclosing the contents. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Moving to -advocacy, bcc to -hackers. On Fri, May 19, 2006 at 08:11:42AM -0700, Joshua D. Drake wrote: > When MySQL is at that > >point, which database do you think executives will be choosing? The one > >with a very large userbase and lots of marketing and PR that they've > >heard plenty about, > > All due respect, Jim -- but don't you work for a publicly traded > database company that happens to have its own version of PostgreSQL? Actually, we haven't had a distribution of PostgreSQL since 8.0.3, and even then it was only a distribution; the bits were all community. > This is really a discussion for your marketing (and mine frankly) then > the PostgreSQL mailing lists :) Yes and no... should MySQL eventually become popular enough that there's little use of PostgreSQL that hurts the community just as much as it hurts our companies. In fact, I'd say it's already hurting the community more than our companies; look at how many people lament about running software XYZ because it only supports MySQL. Or about trying to find PostgreSQL hosting providers. But yes, the group of PostgreSQL companies should also be working to raise awareness of PostgreSQL as a very viable OSS database. Unfortunately, a lot of the commercial interest is in the higher-end market. And to a large extent, this really needs to be a grass-roots effort. After all, you don't win OSS mindshare by taking out ads or anything like that. So I think this really needs to be a joint venture between companies and the community. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Compression and on-disk sorting
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote: >> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost >> unbeleiveable. What's in the table? It would seem to imply that our >> tuple format is far more compressable than we expected. > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a; > If the tape routines were actually storing visibility information, I'd > expect that to be pretty compressible in this case since all the tuples > were presumably created in a single transaction by pgbench. It's worse than that: IIRC what passes through a heaptuple sort are tuples manufactured by heap_form_tuple, which will have consistently zeroed header fields. However, the above isn't very helpful since the rest of us have no idea what that "accounts" table contains. How wide is the tuple data, and what's in it? (This suggests that we might try harder to strip unnecessary header info from tuples being written to tape inside tuplesort.c. I think most of the required fields could be reconstructed given the TupleDesc.) regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Actually, I think it's a lot more accurate to compare PostgreSQL and MySQL as FreeBSD vs Linux from about 5 years ago. Back then FreeBSD was clearly superior from a technology standpoint, and clearly playing second-fiddle when it came to users. And now, Linux is actually technically superior in most ways thanks to all the mindshare that's been poured into it. And with that, I am going to sit in a lawn chair and watch the bonfire. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] New feature proposal
Marc Munro <[EMAIL PROTECTED]> writes: > My proposal makes it possible for properly configured add-ins to have a > guaranteed amount of shared memory available. This could all be solved in a cleaner, more bulletproof way if you simply require such add-ins to be preloaded into the postmaster process using the existing preload_libraries hook. Then, such an add-in would allocate its own shmem segment independent of the main Postgres one. This totally eliminates worries about one chunk of code eating the other one's memory, which otherwise we'd have to have additional mechanism to deal with. In a Unix environment, such a thing would Just Work because pointers to the new segment would be inherited through fork(). In the Windows port you'd need to do more pushups --- perhaps allocate a small amount of memory in the main Postgres shmem segment containing the ID of the other shmem segment, which a backend would use to reattach. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
On Fri, May 19, 2006 at 11:29:23AM -0400, Mark Woodward wrote: > I kind of agree with this statement, but while I was playing devils's > advocate and just grousing a bit about having to use MySQL, there is a > sort of reality of "openomics" where mind-share is everything. > > The more mind-share you have, the more opportunities you have and the more > resources become available. Not always, of course, look at OpenSSH, but > for the most part. > > As MySQL adds features, not matter how poorly implemented, and maintain a > migration path, we will never reach their users. > > PostgreSQL is better, true, but it is not ideal in many ways. It can be > best said that the difference between PostgreSQL and MySQL is similar to > the difference between Linux/BSD and Windows. Actually, I think it's a lot more accurate to compare PostgreSQL and MySQL as FreeBSD vs Linux from about 5 years ago. Back then FreeBSD was clearly superior from a technology standpoint, and clearly playing second-fiddle when it came to users. And now, Linux is actually technically superior in most ways thanks to all the mindshare that's been poured into it. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Compression and on-disk sorting
On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote: > On Thu, May 18, 2006 at 10:02:44PM -0500, Jim C. Nasby wrote: > > http://jim.nasby.net/misc/compress_sort.txt is preliminary results. > > I've run into a slight problem in that even at a compression level of > > -3, zlib is cutting the on-disk size of sorts by 25x. So my pgbench sort > > test with scale=150 that was producing a 2G on-disk sort is now > > producing a 80M sort, which obviously fits in memory. And cuts sort > > times by more than half. > > I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost > unbeleiveable. What's in the table? It would seem to imply that our > tuple format is far more compressable than we expected. It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a; If the tape routines were actually storing visibility information, I'd expect that to be pretty compressible in this case since all the tuples were presumably created in a single transaction by pgbench. If needs be, I could try the patch against http://stats.distributed.net, assuming that it would apply to REL_8_1. > Do you have any stats on CPU usage? Memory usage? I've only been taking a look at vmstat from time-to-time, and I have yet to see the machine get CPU-bound. Haven't really paid much attention to memory. Is there anything in partucular you're looking for? I can log vmstat for the next set of runs (with a scaling factor of 1). I plan on doing those runs tonight... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] New feature proposal
On Fri, 2006-05-19 at 10:05 -0700, Josh Berkus wrote: > Marc, > > > The add-in would not "know" how much had been allocated to it, but could > > be told through it's own config file. I envisage something like: > > > > in postgresql.conf > > > > # add_in_shmem = 0# Amount of shared mem to set aside for add-ins > > # in KBytes > > add_in_shem = 64 > > > > > > in veil.conf > > > > veil_shmem = 32 # Amount of shared memory we can use from > > # the postgres add-ins shared memory pool > > > > I think this is better than add-ins simply stealing from, and contending > > for, postgres shared memory which is the only real alternative right > > now. > > H ... what would happen if I did: > > add_in_shmem = 64 > veil_shmem = 128 > > or even: > > add_in_shmem = 128 > veil_shmem = 64 > plperl_shmem = 64 > pljava_shmem = 64 > If that happens, one of the add-ins will be sadly disappointed when it tries to use its allocation. The same as would happen now, if Veil attempted to allocate too large a chunk of shared memory. My proposal makes it possible for properly configured add-ins to have a guaranteed amount of shared memory available. It allows add-ins to be well-behaved in their use of shared memory, and it prevents them from being able to exhaust postgres' own shared memory. It doesn't prevent add-ins from over-allocating from the add-in memory context, nor do I think it can or should do this. __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] [pgsql-advocacy] Toward A Positive Marketing Approach.
Josh Berkus writes: > Other projects need even more intensive coding help. OpenOffice, for > example, > doesn't offer the Postgres driver by default because it's still too buggy. That seems like something that it'd be worth our while to help fix. Does anyone have a handle on what the problems are? Is it something that could reasonably be fixed by a Postgres person, or is the real problem that it'd take a whole lot of both OO-fu and Postgres-fu? If so, can we find someone with the former nature to collaborate with? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] OO PostgreSQL Driver
Other projects need even more intensive coding help. OpenOffice, for example, doesn't offer the Postgres driver by default because it's still too buggy. That would be solvable with money, but $1000 to $2000, not $50. Does it really need one since it supports JDBC and ODBC? J -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] text_position worst case runtime
Greg Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> And how much code would those take? The bottom line here is that we >> don't have a pile of complaints about the performance of text_position, >> so it's difficult to justify making it much more complicated than it >> is now. > It seems somewhat contrary to the Postgres design philosophy to assume that > all strings are small. That is a straw-man argument. If we try to optimize every single function in the system to the Nth degree, we'll end up with a system that is unmaintainable (and likely unusably buggy as well). We've got to set limits on the amount of complexity we're willing to accept in the core code. Note that I have not said "you can't put Boyer-Moore into core". What I've said is that the case to justify doing that hasn't been made. And handwaving about "design philosophy" isn't the kind of case I'm looking for --- common applications in which it makes a real performance difference are what I'm looking for. At this point we haven't even been shown any evidence that text_position itself is what to optimize if you need to do searches in large text strings. It seems entirely likely to me that the TOAST mechanisms would be the bottleneck, instead. And one should also consider other approaches entirely, like indexes (tsearch2 anyone?). regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] text_position worst case runtime
Tom Lane <[EMAIL PROTECTED]> writes: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> You've obviously missed the point of my concern, which is code bloat. > > > So why not just replace our code with better algorithms? We could use > > Shift-Or or Shift-And which AFAIK are even better than Boyer-Moore. > > And how much code would those take? The bottom line here is that we > don't have a pile of complaints about the performance of text_position, > so it's difficult to justify making it much more complicated than it > is now. Even Boyer-Moore, while conceptually tricky isn't actually all that much code. It seems somewhat contrary to the Postgres design philosophy to assume that all strings are small. Other databases have two different string data types, one that has a small length limit (often only 255 bytes or so) and another that has all kinds of awkward restrictions on how it can be used. Postgres allows text to contain gigabytes of data and lets you use all the normal string functions on it. It seems like having those string functions assuming the strings are small compromises that design choice. -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [pgsql-advocacy] Toward A Positive Marketing Approach.
Michael, Howdy, glad to see you came back. > 1. We should treat all marketing efforts by hackers/programmers as > social bugs. Get some marketing pros (debuggers) in on this, or the > popularity of postgresql will continue to pale in the real world. Not really in line with PostgreSQL's "personality". This could work for OpenOffice, but not here. PG is a very engineering-central project and there aren't many people who want to change that. Your other comments have been mostly answered, but: > 3. Reward existing FOSS projects that make sensible provision to > accomodate postgresql in preference to other more "commercial" db's. > Free links, mention in newsletter, listing on websites, whatever it > takes to start pulling other open source communities behind postgresql. > A good example is bitweaver.org, a great integration project, very > professional, helpful to small businesses, but needs some promotional help. > > 4. Stop being too cheap. Money Talks! Offer to PAY premiums to major > OSS aps who don't do pg, or don't do it well enough. Like Compierre, > like Drupal. Actually, what projects who don't have a bias against PostgreSQL mostly need is developer time to help them with code. Drupal already supports Postgres; they need DBAs to help them be faster/better on Postgres. They are in the same boat with lots of other projects, so much so that there is more demand than there are PG volunteers. If you have Postgres DBA experience, I'll be happy to hook you up with someone. Other projects need even more intensive coding help. OpenOffice, for example, doesn't offer the Postgres driver by default because it's still too buggy. That would be solvable with money, but $1000 to $2000, not $50. I do think that we could use a list of what other mature OSS projects support PostgreSQL reasonably well already. This is pretty much a data collection effort; are you volunteering for it? We could use it. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [pgsql-hackers] Daily digest v1.5943 (21 messages)
On Fri, 2006-05-19 at 13:41 -0300, [EMAIL PROTECTED] wrote: > Marc Munro wrote: > > Veil http://pgfoundry.org/projects/veil is currently not a very good > > Postgres citizen. It steals what little shared memory it needs from > > postgres' shared memory using ShmemAlloc(). > > > > For Postgres 8.2 I would like Veil to be a better citizen and use > only > > what shared memory has been reserved for postgres add-ins. > > Why should this be individually restricted? AFAICS Veil's > functionality > would be essential to access row level ACL controlled tables, so if > it > fails for low mem conditions it's much like a backend failure. The problem is that postgres currently has no way of knowing how much shared memory Veil is likely to require, and so will not allocate space for it. Consequently if Veil needs a lot of shared memory, it is unlikely to be able to get it. My intention is to allow add-ins to use only shared memory that has been allocated for their own use, to ensure that enough has been allocated, to isolate postgres shared memory from abuse by add-ins, and to avoid the need for add-ins to manage their own shared memory. __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] text_position worst case runtime
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> You've obviously missed the point of my concern, which is code bloat. > So why not just replace our code with better algorithms? We could use > Shift-Or or Shift-And which AFAIK are even better than Boyer-Moore. And how much code would those take? The bottom line here is that we don't have a pile of complaints about the performance of text_position, so it's difficult to justify making it much more complicated than it is now. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] New feature proposal
Marc, > The add-in would not "know" how much had been allocated to it, but could > be told through it's own config file. I envisage something like: > > in postgresql.conf > > # add_in_shmem = 0 # Amount of shared mem to set aside for add-ins > # in KBytes > add_in_shem = 64 > > > in veil.conf > > veil_shmem = 32 # Amount of shared memory we can use from > # the postgres add-ins shared memory pool > > I think this is better than add-ins simply stealing from, and contending > for, postgres shared memory which is the only real alternative right > now. H ... what would happen if I did: add_in_shmem = 64 veil_shmem = 128 or even: add_in_shmem = 128 veil_shmem = 64 plperl_shmem = 64 pljava_shmem = 64 ... seems like we'll need to check for overallocation, no? -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] New feature proposal
On Fri, 2006-05-19 at 13:41 -0300, [EMAIL PROTECTED] wrote: > On Thu, 2006-05-18 at 17:39 -0700, Marc Munro wrote: > > > For Postgres 8.2 I would like Veil to be a better citizen and use > only > > what shared memory has been reserved for postgres add-ins. > > How would Postgres ask the add-in how much memory it needs? How would > the add-in know how much has been reserved for it? How would an add-in > know whether it was the only add-in and whether it could take all of > the > allocation? Postgres would not ask any add-ins how much they need, it would simply allocate the extra amount defined in a GUC and not make that available through the normal shared memory allocation mechanism. The add-in would not "know" how much had been allocated to it, but could be told through it's own config file. I envisage something like: in postgresql.conf # add_in_shmem = 0# Amount of shared mem to set aside for add-ins # in KBytes add_in_shem = 64 in veil.conf veil_shmem = 32 # Amount of shared memory we can use from # the postgres add-ins shared memory pool I think this is better than add-ins simply stealing from, and contending for, postgres shared memory which is the only real alternative right now. __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] text_position worst case runtime
Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > Perhaps it would be best to add a seperate set of functions that use > > boyer-moore, and reference them in appropriate places in the > > documentation. Unless someone has a better idea on how we can find out > > what people are actually doing in the field... > > You've obviously missed the point of my concern, which is code bloat. > A parallel set of functions incorporating B-M would make things worse > not better from that standpoint. (Unless you are proposing that someone > do it as a separate pgfoundry project; which'd be fine with me. I'm > just concerned about how much we buy into as core features.) So why not just replace our code with better algorithms? We could use Shift-Or or Shift-And which AFAIK are even better than Boyer-Moore. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] text_position worst case runtime
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > Perhaps it would be best to add a seperate set of functions that use > boyer-moore, and reference them in appropriate places in the > documentation. Unless someone has a better idea on how we can find out > what people are actually doing in the field... You've obviously missed the point of my concern, which is code bloat. A parallel set of functions incorporating B-M would make things worse not better from that standpoint. (Unless you are proposing that someone do it as a separate pgfoundry project; which'd be fine with me. I'm just concerned about how much we buy into as core features.) regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?
On Fri, 2006-05-19 at 09:36 -0700, Jeff Frost wrote: > On Fri, 19 May 2006, Tom Lane wrote: > > > What I'd suggest is resuming the test after making sure you've killed > > off any old archivers, and seeing if you can make any progress on > > reproducing the original problem. We definitely need a > > multiple-archiver interlock, but I think that must be unrelated to your > > real problem. > > Ok, so I've got the old archivers gone (and btw, after a restart I ended up > with 3 of them - so I stopped postmaster, and killed them all individually > and > started postmaster again). Thats good. > Now I can run my same pg_bench, or do you guys > have any other suggestions on attempting to reproduce the problem? No. We're back on track to try to reproduce the original error. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?
On Fri, 19 May 2006, Tom Lane wrote: What I'd suggest is resuming the test after making sure you've killed off any old archivers, and seeing if you can make any progress on reproducing the original problem. We definitely need a multiple-archiver interlock, but I think that must be unrelated to your real problem. Ok, so I've got the old archivers gone (and btw, after a restart I ended up with 3 of them - so I stopped postmaster, and killed them all individually and started postmaster again). Now I can run my same pg_bench, or do you guys have any other suggestions on attempting to reproduce the problem? -- Jeff Frost, Owner <[EMAIL PROTECTED]> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?
On Fri, 19 May 2006, Tom Lane wrote: Well, the fact that there's only one archiver *now* doesn't mean there wasn't more than one when the problem happened. The orphaned archiver would eventually quit. Do you have logs that would let you check when the production postmaster was restarted? I looked through /var/log/messages* and there wasn't a restart prior to the problem in the logs. They go back to April 16. The postmaster was restarted on May 15th (this Monday), but that was after the reported problem. -- Jeff Frost, Owner <[EMAIL PROTECTED]> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Alvaro Herrera wrote: Joshua D. Drake wrote: Martijn van Oosterhout wrote: On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote: The reality is that MySQL is widely supported by some very, shall we say, "interesting" open source projects and using these products with PostgreSQL would be a plus. The biggest headache I find with using postgres is that various GPL licenced programs have trouble directly shipping postgresql support because of our use of OpenSSL. Each and every one of those program needs to add an exception to their licence for distributors to distribute postgresql support. Why would that be the case... OpenSSL and PostgreSQL both are BSD licensed... Am I missing something? Advertising clause. PostgreSQL doesn't have it, OpenSSL does. Is that the same clause that caused the XFree86/X.Org fork? J -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?
On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > OK, I'm on it. > > What solution have you got in mind? I was thinking about an fcntl lock > to ensure only one archiver is active in a given data directory. That > would fix the problem without affecting anything outside the archiver. > Not sure what's the most portable way to do it though. I was trying to think of a better way than using an archiver.pid file in pg_xlog/archive_status... -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
On Fri, 2006-05-19 at 09:11 -0700, Joshua D. Drake wrote: > Martijn van Oosterhout wrote: > > On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote: > >> The reality is that MySQL is widely supported by some very, shall we say, > >> "interesting" open source projects and using these products with > >> PostgreSQL would be a plus. > > > > The biggest headache I find with using postgres is that various GPL > > licenced programs have trouble directly shipping postgresql support > > because of our use of OpenSSL. Each and every one of those program > > needs to add an exception to their licence for distributors to > > distribute postgresql support. > > Why would that be the case... OpenSSL and PostgreSQL both are BSD > licensed... Am I missing something? OpenSSL is not the 3 clause BSD license, it also includes a number of advertising clauses that the GPL has never liked -- GPL must not be modified for derivatives but the advertising clauses are in addition to the GPL, so it must be modified for the combination. Exceptions exist in the GPL for libraries and tools included in the operating system and this is enough in most cases. GPL applications on Windows may have problems. http://www.openssl.org/support/faq.html#LEGAL2 2. Can I use OpenSSL with GPL software? On many systems including the major Linux and BSD distributions, yes (the GPL does not place restrictions on using libraries that are part of the normal operating system distribution). On other systems, the situation is less clear. Some GPL software copyright holders claim that you infringe on their rights if you use OpenSSL with their software on operating systems that don't normally include OpenSSL. If you develop open source software that uses OpenSSL, you may find it useful to choose an other license than the GPL, or state explicitly that "This program is released under the GPL with the additional exemption that compiling, linking, and/or using OpenSSL is allowed." If you are using GPL software developed by others, you may want to ask the copyright holder for permission to use their software with OpenSSL. OpenSSL License * 1. Redistributions of source code must retain the above copyright *notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright *notice, this list of conditions and the following disclaimer in *the documentation and/or other materials provided with the *distribution. * * 3. All advertising materials mentioning features or use of this *software must display the following acknowledgment: *"This product includes software developed by the OpenSSL Project *for use in the OpenSSL Toolkit. (http://www.openssl.org/)" * * 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to *endorse or promote products derived from this software without *prior written permission. For written permission, please contact *[EMAIL PROTECTED] * * 5. Products derived from this software may not be called "OpenSSL" *nor may "OpenSSL" appear in their names without prior written *permission of the OpenSSL Project. * * 6. Redistributions of any form whatsoever must retain the following *acknowledgment: *"This product includes software developed by the OpenSSL Project *for use in the OpenSSL Toolkit (http://www.openssl.org/)" > > I'm thinking particularly of FreeRadius but there are others. More than > > once I thought while waiting for stuff to compile: if I'd chosen mysql > > I'd be done by now... > > > > Have a nice day, -- ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?
On Fri, 2006-05-19 at 12:20 -0400, Tom Lane wrote: > I wrote: > > Well, the fact that there's only one archiver *now* doesn't mean there > > wasn't more than one when the problem happened. The orphaned archiver > > would eventually quit. > > But, actually, nevermind: we have explained the failures you were seeing > in the test setup, but a multiple-active-archiver situation still > doesn't explain the original situation of incoming connections getting > blocked. Agreed. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax
On Thu, May 18, 2006 at 08:40:04PM -0400, Douglas McNaught wrote: > Mark Dilger <[EMAIL PROTECTED]> writes: > > > Tom Lane wrote: > >> No it isn't. The plpgsql scanner treats := and = as *the same token*. > >> They can be interchanged freely. This has nothing to do with the case > >> of modifying a loop variable in particular. > > > > I disagree. If the scanner treated them the same, then > > > > if i := 1 then ... > > > > would work, but it doesn't. The := is rejected in a conditional. Try the > > following code if you don't believe me: > > That's because (AIUI) all expressions to be evaluated are handed off > to the SQL parser (why re-implement all that logic and have subtle and > annoying differences?) plpgsql only handles the statements, loops, etc. > So it doesn't care about the difference but SQL does... Something that's always bugged me is how horribly variables are handled in plpgsql, namely that if you have a variable and a field with the same name it can be extremely difficult to keep them seperated. Perhaps := vs = might be a way to keep it clear as to which is which... Though, a better way would probably just be to provide a built-in construct for referencing plpgsql variables, such as $. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Joshua D. Drake wrote: Martijn van Oosterhout wrote: On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote: The reality is that MySQL is widely supported by some very, shall we say, "interesting" open source projects and using these products with PostgreSQL would be a plus. The biggest headache I find with using postgres is that various GPL licenced programs have trouble directly shipping postgresql support because of our use of OpenSSL. Each and every one of those program needs to add an exception to their licence for distributors to distribute postgresql support. Why would that be the case... OpenSSL and PostgreSQL both are BSD licensed... Am I missing something? http://www.openssl.org/support/faq.html#LEGAL2 Of course, on that reasoning, they would need to provide a similar exception for libpq with or without openssl. More and more I love the fact that we don't play these games. cheers andrew ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] text_position worst case runtime
On Thu, May 18, 2006 at 06:49:38PM -0700, Mark Dilger wrote: > > I would think that the worst-case times would be fairly improbable. > > I'm disinclined to push something as complicated as Boyer-Moore matching > > into this function without considerable evidence that it's a performance > > bottleneck for real applications. > > A common approach in biological data applications is to store nucleic and > amino > acid sequences as text in a relational database. The smaller alphabet sizes > and > the tendency for redundancy in these sequences increases the likelihood of a > performance problem. I have solved this problem by writing my own data types > with their own functions for sequence comparison and alignment, and I used > boyer-moore for some of that work. Whether the same technique should be used > for the text and varchar types was unclear to me, hence the question. Perhaps it would be best to add a seperate set of functions that use boyer-moore, and reference them in appropriate places in the documentation. Unless someone has a better idea on how we can find out what people are actually doing in the field... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?
I wrote: > Well, the fact that there's only one archiver *now* doesn't mean there > wasn't more than one when the problem happened. The orphaned archiver > would eventually quit. But, actually, nevermind: we have explained the failures you were seeing in the test setup, but a multiple-active-archiver situation still doesn't explain the original situation of incoming connections getting blocked. What I'd suggest is resuming the test after making sure you've killed off any old archivers, and seeing if you can make any progress on reproducing the original problem. We definitely need a multiple-archiver interlock, but I think that must be unrelated to your real problem. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?
Jeff Frost <[EMAIL PROTECTED]> writes: > Hurray! Unfortunately, the postmaster on the original troubled server almost > never gets restarted, and in fact only has only one archiver process running > right now. Drat! Well, the fact that there's only one archiver *now* doesn't mean there wasn't more than one when the problem happened. The orphaned archiver would eventually quit. Do you have logs that would let you check when the production postmaster was restarted? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Joshua D. Drake wrote: > Martijn van Oosterhout wrote: > >On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote: > >>The reality is that MySQL is widely supported by some very, shall we say, > >>"interesting" open source projects and using these products with > >>PostgreSQL would be a plus. > > > >The biggest headache I find with using postgres is that various GPL > >licenced programs have trouble directly shipping postgresql support > >because of our use of OpenSSL. Each and every one of those program > >needs to add an exception to their licence for distributors to > >distribute postgresql support. > > Why would that be the case... OpenSSL and PostgreSQL both are BSD > licensed... Am I missing something? Advertising clause. PostgreSQL doesn't have it, OpenSSL does. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Martijn van Oosterhout wrote: On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote: The reality is that MySQL is widely supported by some very, shall we say, "interesting" open source projects and using these products with PostgreSQL would be a plus. The biggest headache I find with using postgres is that various GPL licenced programs have trouble directly shipping postgresql support because of our use of OpenSSL. Each and every one of those program needs to add an exception to their licence for distributors to distribute postgresql support. Why would that be the case... OpenSSL and PostgreSQL both are BSD licensed... Am I missing something? Joshua D. Drake I'm thinking particularly of FreeRadius but there are others. More than once I thought while waiting for stuff to compile: if I'd chosen mysql I'd be done by now... Have a nice day, -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?
On Fri, 19 May 2006, Tom Lane wrote: Well, there's our smoking gun. IIRC, all the failures you showed us are consistent with race conditions caused by multiple archiver processes all trying to do the same tasks concurrently. Do you frequently stop and restart the postmaster? Because I don't see how you could get into this state without having done so. I've just been looking at the code, and the archiver does commit hara-kiri when it notices its parent postmaster is dead; but it only checks that in the outer loop. Given sufficiently long delays in the archive_command, that could be a long time after the postmaster died; and in the meantime, successive executions of the archive_command could be conflicting with those launched by a later archiver incarnation. Hurray! Unfortunately, the postmaster on the original troubled server almost never gets restarted, and in fact only has only one archiver process running right now. Drat! I guess I'll have to try and catch it in the act again the next time the NAS gets wedged so I can debug a little more (it was caught by one of the windows folks last time) and gather some useful data. Let me know if you want me to test a patch since I've already got this test case setup. -- Jeff Frost, Owner <[EMAIL PROTECTED]> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Martijn van Oosterhout wrote: The biggest headache I find with using postgres is that various GPL licenced programs have trouble directly shipping postgresql support because of our use of OpenSSL. Each and every one of those program needs to add an exception to their licence for distributors to distribute postgresql support. They could distribute a non-ssl-enabled version, *if* they really need to include libpq in the package, or advise to to replace it with the common version if ssl is required. I bet >99 % of pgsql connections are not encrypted anyway. Regards, Andreas ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] New feature proposal
Marc Munro wrote: Veil http://pgfoundry.org/projects/veil is currently not a very good Postgres citizen. It steals what little shared memory it needs from postgres' shared memory using ShmemAlloc(). For Postgres 8.2 I would like Veil to be a better citizen and use only what shared memory has been reserved for postgres add-ins. Why should this be individually restricted? AFAICS Veil's functionality would be essential to access row level ACL controlled tables, so if it fails for low mem conditions it's much like a backend failure. Regards, Andreas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [GENERAL] Weird ..... (a=1 or a=2) <> (a=2 or a=1)
On Fri, 2006-05-19 at 11:34 -0400, Tom Lane wrote: [...] > > This only affects the 7.4 and 8.0 branches, because earlier and later > versions of Postgres don't use this technique for detecting duplicates. > But it's surprising we didn't find it before. > > Patches will appear in next week's releases. Thanks again! > Thanks to you for finding and fixing the problem :-) It looks like you are finish so I will update the server and you will lose access to it. regards -- Rafael Martinez, <[EMAIL PROTECTED]> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?
Simon Riggs <[EMAIL PROTECTED]> writes: > OK, I'm on it. What solution have you got in mind? I was thinking about an fcntl lock to ensure only one archiver is active in a given data directory. That would fix the problem without affecting anything outside the archiver. Not sure what's the most portable way to do it though. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] New feature proposal
On Thu, 2006-05-18 at 17:39 -0700, Marc Munro wrote: > For Postgres 8.2 I would like Veil to be a better citizen and use only > what shared memory has been reserved for postgres add-ins. How would Postgres ask the add-in how much memory it needs? How would the add-in know how much has been reserved for it? How would an add-in know whether it was the only add-in and whether it could take all of the allocation? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
On Thu, May 18, 2006 at 02:58:11PM -0400, Mark Woodward wrote: > The reality is that MySQL is widely supported by some very, shall we say, > "interesting" open source projects and using these products with > PostgreSQL would be a plus. The biggest headache I find with using postgres is that various GPL licenced programs have trouble directly shipping postgresql support because of our use of OpenSSL. Each and every one of those program needs to add an exception to their licence for distributors to distribute postgresql support. I'm thinking particularly of FreeRadius but there are others. More than once I thought while waiting for stuff to compile: if I'd chosen mysql I'd be done by now... Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] [GENERAL] Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Many thanks for allowing me to trace through your problem case. It's a real Postgres bug, and a nasty one. The problem is a thinko in nodeIndexscan.c's code that tests whether the same tuple has already been emitted in a previous OR'd scan: it is looking for a match on tuple->t_data->t_ctid, when what it should really be looking at is tuple->t_self. What I find is that the indexscan for status == open is returning TID (880,5), which has XMAX_INVALID and a t_ctid pointing at (880,18). (This is perfectly normal, it just indicates that somebody tried to update the row but the updating transaction rolled back, and the updated version at 880,18 was later recycled by VACUUM.) So this causes a bogus rejection when TID (880,18) is scanned during the second indexscan. This only affects the 7.4 and 8.0 branches, because earlier and later versions of Postgres don't use this technique for detecting duplicates. But it's surprising we didn't find it before. Patches will appear in next week's releases. Thanks again! regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
> Andrew Dunstan <[EMAIL PROTECTED]> writes: >> Mark Woodward wrote: >>> Again, there is so much code for MySQL, a MySQL emulation layer, MEL >>> for >>> short, could allow plug and play compatibility for open source, and >>> closed >>> source, applications that otherwise would force a PostgreSQL user to >>> hold >>> his or her nose and use MySQL. >>> >> If we had infinite resources this might make sense. We don't, so it >> doesn't. There is a real cost to producing a compatibility layer, and >> the cost will be those spiffy new features. > > The real problem is that there's a whole lot of stuff, such as mysql's > weak error checking, that I don't think a "compatibility layer" could > sanely provide. > I kind of agree with this statement, but while I was playing devils's advocate and just grousing a bit about having to use MySQL, there is a sort of reality of "openomics" where mind-share is everything. The more mind-share you have, the more opportunities you have and the more resources become available. Not always, of course, look at OpenSSH, but for the most part. As MySQL adds features, not matter how poorly implemented, and maintain a migration path, we will never reach their users. PostgreSQL is better, true, but it is not ideal in many ways. It can be best said that the difference between PostgreSQL and MySQL is similar to the difference between Linux/BSD and Windows. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
When MySQL is at that point, which database do you think executives will be choosing? The one with a very large userbase and lots of marketing and PR that they've heard plenty about, All due respect, Jim -- but don't you work for a publicly traded database company that happens to have its own version of PostgreSQL? This is really a discussion for your marketing (and mine frankly) then the PostgreSQL mailing lists :) Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
On Fri, May 19, 2006 at 01:26:34AM +0200, Dawid Kuroczko wrote: > Personally my opinion is that there is no point in pushing PostgreSQL > everywhere -- if there is no siginifcant performance gain, most managers > will refuse it, on the grounds that "if it ain't (too) broke, don't fix it". > The real places to "attack at" are the BIG dbs, the dataware housing > applications. Places where MySQL is not used, because someones > select count(*) should not kill the database. Because the queries > take few hours to complete "by design". This should be doable. :) The problem with limiting ourselves to going after only the 'high end' of databases is that MySQL is also pushing in that direction, but they have the advantage of a much larger user base than us. So in the not-to-distant future, a lot of people who are looking to come off of Oracle will look at both MySQL and PostgreSQL (in fact I'm sure there's already some people moving from Oracle to MySQL). When MySQL is at that point, which database do you think executives will be choosing? The one with a very large userbase and lots of marketing and PR that they've heard plenty about, or the one that might theoretically be technically superior but has a small userbase and they've never heard of? And if the technical people in the company are MySQL users, because that's the database they cut their teeth on... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS]
Dragan Zubac <[EMAIL PROTECTED]> writes: > Does anybody knows the Postgres v8.0.3 data directory hierarchy ? What is > the purpose of the multiply files such as: > 5 > 5.1 > 5.2 > 5.3 > each of the same size ? http://developer.postgresql.org/docs/postgres/storage.html regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Compression and on-disk sorting
Martijn van Oosterhout writes: > However, postgres tables are very highly compressable, 10-to-1 is not > that uncommon. pg_proc and pg_index compress by that for example. > Indexes compress even more (a few on my system compress 25-to-1 but > that could just be slack space, the record being 37-to-1 > (pg_constraint_conname_nsp_index)). Anything containing a column of type "name" will compress amazingly well because of all the padding spaces. I don't think that's representative of user data though ... except maybe for the occasional novice using "char(255)" ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Compression and on-disk sorting
On Fri, May 19, 2006 at 09:03:31AM -0400, Tom Lane wrote: > Martijn van Oosterhout writes: > > I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost > > unbeleiveable. What's in the table? > > Yeah, I'd tend to question the test data being used. gzip does not do > that well on typical text (especially not at the lower settings we'd > likely want to use). However, postgres tables are very highly compressable, 10-to-1 is not that uncommon. pg_proc and pg_index compress by that for example. Indexes compress even more (a few on my system compress 25-to-1 but that could just be slack space, the record being 37-to-1 (pg_constraint_conname_nsp_index)). The only table on my test system over 32KB that doesn't reach 2-to-1 compression with gzip -3 is one of the toast tables. So getting 25-to-1 is a lot, but possibly not that extreme. pg_statistic, which is about as close to random data as you're going to get on a postgres system, compresses 5-to-1. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS]
Dragan Zubac wrote: > Hello > > Does anybody knows the Postgres v8.0.3 data directory hierarchy ? > What is the purpose of the multiply files such as: > > 5 > 5.1 > 5.2 > 5.3 > > each of the same size ? > those are the 1 gigabyte segments of your table/index/relation. see the storage section of the documentation. -- Larry Rosenman Database Support Engineer PERVASIVE SOFTWARE. INC. 12365B RIATA TRACE PKWY 3015 AUSTIN TX 78727-6531 Tel: 512.231.6173 Fax: 512.231.6597 Email: [EMAIL PROTECTED] Web: www.pervasive.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS]
Hello Does anybody knows the Postgres v8.0.3 data directory hierarchy ? What is the purpose of the multiply files such as: 5 5.1 5.2 5.3 each of the same size ? Sincerely Dragan Zubac ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Compression and on-disk sorting
Martijn van Oosterhout writes: > I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost > unbeleiveable. What's in the table? Yeah, I'd tend to question the test data being used. gzip does not do that well on typical text (especially not at the lower settings we'd likely want to use). regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Compression and on-disk sorting
Jim, > http://jim.nasby.net/misc/compress_sort.txt is preliminary results. > I've run into a slight problem in that even at a compression > level of -3, zlib is cutting the on-disk size of sorts by > 25x. So my pgbench sort test with scale=150 that was > producing a 2G on-disk sort is now producing a 80M sort, > which obviously fits in memory. And cuts sort times by more than half. When you're ready, we can test this on some other interesting cases and on fast hardware. BTW - external sorting is *still* 4x slower than popular commercial DBMS (PCDB) on real workload when full rows are used in queries. The final results we had after the last bit of sort improvements were limited to cases where only the sort column was used in the query, and for that case the improved external sort code was as fast as PCDB provided lots of work_mem are used, but when the whole contents of the row are consumed (as with TPC-H and in many real world cases) the performance is still far slower. So, compression of the tuples may be just what we're looking for. - Luke ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Am Freitag, 19. Mai 2006 02:35 schrieb Robert Treat: > On Thursday 18 May 2006 12:38, Josh Berkus wrote: > > Personally, I'd go after MSSQL before I bothered with MySQL. Sure, > > let's make *migration* easier for those who wake up and smell the BS, but > > migration can (and probably should) be one-way. > > If you want to get users to swtich to your software from your competitors, > you have to eliminate barriers, and a big one for any database is getting > locked into a specific one. People aren't going to take the time to try > switching to postgresql if they can't easily make it back to thier former > database. It's one of the reasons why PostgreSQL's standards compliance is > so important; if you want to swtich to a new database, your best bet is to > give PostgreSQL a shot, because even if you don't like it, we're not going > to try and trap you into our software with bunches of non-standard knobs. > Low barrier to exit == low barrier to entry. The way to go are standards. If postgresql supports standard-sql (like we all know it know), mysql-users has to justify their apps to use standard-sql. What they gain is not only compatibility with PostgreSQL but compatiblity with all database-servers, which supports this standard. They wont have much trouble to switch back to mysql or downgrade their postgresql to oracle ;-), if they follow standards. Also if PostgreSQL would have a compatibility-layer, it has to follow every quirk of mysql and will be measured by that. Much better is to promote users of mysql to use standards. Tommi ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match