[GENERAL] REINDEX requirement?

2010-11-09 Thread AI Rumman
How do I know that an index requires REINDEX?


Re: [GENERAL] Run postgresql\9.0\data on a RAID-1 disk on my Local Area Network (LAN)

2010-11-09 Thread Sachin Srivastava
Hello,

From the logs it seems the initdb was successful; however, the database
server failed to start afterwards. Can you manually start the database
service? If yes, try installing PostGIS from StackBuilder after that.
If not, what's the error message you are getting?


2010/11/9 Anders Söderman anders.soder...@gisassistans.se

  Hello

 I've tried many times to install in different ways. Hopefully I removed all
 files from previous installations before trying again.
 I don't know if the logs are useful. I've tried to understand what the log
 says, without result.
 I think, right now, that I should do a PostgreSQL/PostGIS installation
 using all default values and then try to change the postgresql\9.0\data 
 folder
 to the RAID-1 LAN-disk. Thanks Anders Söderman Stockholm Sweden
  --

 *From:* pgsql-general-ow...@postgresql.org [mailto:
 pgsql-general-ow...@postgresql.org] *On Behalf Of *Sachin Srivastava
 *Sent:* den 8 november 2010 08:56

 *To:* Anders Söderman
 *Cc:* pgsql-general@postgresql.org
 *Subject:* Re: [GENERAL] Run postgresql\9.0\data on a RAID-1 disk on my
 Local Area Network (LAN)



 Hello,

 On Nov 7, 2010, at 1:22 PM, Anders Söderman wrote:



  Hello – I'm trying to install PostgreSQL and PostGIS, "asking" PostgreSQL
 to store data on my new Netgear NAS Duo LAN-disk in a Windows XP SP2
  environment.

 In the Setup Wizard I accept C:\Program\PostgreSQL\9.0

 and I change C:\Program\PostgreSQL\9.0\data to either
  //R1/aa_db/PostgreSQL\9.0\data or P:\PostgreSQL\9.0\data where   P:\  =
  //R1/aa_db/   on the computer from which I´m installing Postgresql.

 R1 is the name of the Netgear disk and aa_db the share/folder I want to
 store the PostgreSQL data in.



 I have tried every possible combination I can think of. Sometimes the
 installation seems to work, but in pgAdmin "Connected?" says "No". If I try
 to install PostGIS afterwards it crashes during the EnterpriseDB StackBuilder
 installation.
 After several hours I can't find any useful information in the manual or on
 the forums.
 If anybody could give me a hint where I should look I would be so happy.
 Regards Anders Söderman Stockholm Sweden



 What exactly is the error message you get while installing PostgreSQL? Can
 you attach the installation log (%TEMP%\install-postgresql.log) or
 (%TEMP%\bitrock_installer_<some number>.log)?

 --

 Regards,

 Sachin Srivastava

 EnterpriseDB (http://www.enterprisedb.com), the Enterprise PostgreSQL
 company.






-- 
Regards,
Sachin Srivastava
EnterpriseDB, India


Re: [GENERAL] Problem with frequent crashes related to semctl

2010-11-09 Thread Adrian Maier

On 11/05/2010 05:02 PM, Tom Lane wrote:

Adrian Maier adrian.ma...@thalesgroup.com writes:

I am running PostgreSQL 8.3.5 on a linux machine (Ubuntu 10.04).
Sometimes it happens that connecting to the database fails with error :



  FATAL:  semctl(360458, 3, SETVAL, 0) failed: Invalid argument (PGError)



If I restart postgres the problem gets fixed.  It doesn't matter how I
connect to the database: I saw this happening from psql, from JDBC, and
from ruby.


The most likely theory is that something deleted Postgres' semaphores
out from under it.  You could check this by noting the output of ipcs -s
while the database is running normally, and then comparing to the output
after it starts to fail.

If that does seem to be what's happening, look around for root-executed
scripts doing ipcrm calls.


Tom,
Thanks for the tip.

The semaphores are indeed deleted with ipcrm from a script. The script
(re)starts another application and simply erases all the semaphores,
without taking into account the possibility that some of them actually
belong to another process...

I'll simply move the postgres installation to be started by another user. Doing
this should protect the postgres semaphores against the script (which is
executed as a regular user, not root).



Thanks,
Adrian Maier



Re: [GENERAL] temporary table as a subset of an existing table and indexes

2010-11-09 Thread Matthieu Huin

Hello Merlin,

So far the improvement in responsiveness has been very noticeable, even 
without indexing the temporary tables. Of course, this is just trading 
accuracy for speed, as I simply narrow the search space arbitrarily ...


The schema I am working on is close to the one I am referencing in this 
thread : http://archives.postgresql.org/pgsql-general/2010-11/msg00272.php


Since we want to implement full text search and tag querying, it can 
lead to rather complex autogenerated queries such as this one (find log 
lines with the word 'root' in them, dated from 11/04 to 11/06, where the 
'program' tag is sshd and the 'severity_code' tag is less than 3):



EXPLAIN ANALYZE SELECT r.logid FROM (((SELECT lcond84.logid, 
lcond84.date FROM

( SELECT tmp84.logid, tmp84.date FROM logs tmp84
  WHERE TRUE AND ( to_tsvector('simple',tmp84.body) @@ 
plainto_tsquery('simple','root') ) AND tmp84.date > '2010-11-04 
10:22:06.26' AND tmp84.date < '2010-11-06 10:22:06.26' ) AS lcond84

NATURAL JOIN
( SELECT tmp85.logid FROM tags tmp85 WHERE
  FALSE
OR (tmp85.name='severity_code' AND num_lt(tmp85.value, 3) )
OR (tmp85.name='program' AND tmp85.value = CAST(ROW('sshd') AS tagvalue) )
  GROUP BY tmp85.logid HAVING COUNT(tmp85.logid) = 2 ) AS tcond84  )

)) AS r ORDER BY r.date DESC LIMIT 1000;



Giving the following query plan :

 Limit  (cost=765445.54..765445.56 rows=9 width=16) (actual time=34744.257..34744.257 rows=0 loops=1)
   ->  Sort  (cost=765445.54..765445.56 rows=9 width=16) (actual time=34744.255..34744.255 rows=0 loops=1)
         Sort Key: tmp84.date
         Sort Method:  quicksort  Memory: 17kB
         ->  Hash Join  (cost=765005.46..765445.40 rows=9 width=16) (actual time=34744.202..34744.202 rows=0 loops=1)
               Hash Cond: (tmp85.logid = tmp84.logid)
               ->  HashAggregate  (cost=758440.29..758669.77 rows=15299 width=8) (actual time=33343.816..33343.816 rows=0 loops=1)
                     Filter: (count(tmp85.logid) = 2)
                     ->  Bitmap Heap Scan on tags tmp85  (cost=92363.26..757225.45 rows=242968 width=8) (actual time=20676.354..33294.252 rows=32864 loops=1)
                           Recheck Cond: ((name = 'severity_code'::text) OR (name = 'program'::text))
                           Filter: (((name = 'severity_code'::text) AND num_lt(value, 3::double precision)) OR ((name = 'program'::text) AND ((value).storedvalue = 'sshd'::text)))
                           ->  BitmapOr  (cost=92363.26..92363.26 rows=2148625 width=0) (actual time=19688.915..19688.915 rows=0 loops=1)
                                 ->  Bitmap Index Scan on nameval_idx  (cost=0.00..30388.35 rows=707841 width=0) (actual time=19337.358..19337.358 rows=708719 loops=1)
                                       Index Cond: (name = 'severity_code'::text)
                                 ->  Bitmap Index Scan on nameval_idx  (cost=0.00..61853.42 rows=1440784 width=0) (actual time=351.551..351.551 rows=1484703 loops=1)
                                       Index Cond: (name = 'program'::text)
               ->  Hash  (cost=6553.06..6553.06 rows=969 width=16) (actual time=1400.378..1400.378 rows=32516 loops=1)
                     ->  Bitmap Heap Scan on logs tmp84  (cost=215.51..6553.06 rows=969 width=16) (actual time=1104.226..1383.745 rows=32516 loops=1)
                           Recheck Cond: (to_tsvector('simple'::regconfig, body) @@ '''root'''::tsquery)
                           Filter: ((date > '2010-11-04 10:22:06.26+01'::timestamp with time zone) AND (date < '2010-11-06 10:22:06.26+01'::timestamp with time zone))
                           ->  Bitmap Index Scan on fulltext_body_idx  (cost=0.00..215.27 rows=1740 width=0) (actual time=1097.874..1097.874 rows=64340 loops=1)
                                 Index Cond: (to_tsvector('simple'::regconfig, body) @@ '''root'''::tsquery)

 Total runtime: 34756.938 ms

This one isn't too bad, but the runtime seems to increase exponentially 
with the table size. Using a temporary table based on the date condition 
can therefore cut the query time by a factor of up to ten (table creation 
included, and provided the resulting table isn't too big - I run a COUNT 
check prior to creation so that I can manually limit the table size if 
needed). But of course, I'd rather have speed AND accuracy ...


To make things worse, the tables tend to grow very quickly since, as you 
might have guessed, I am working on the database part of a log collector; 
the current implementation doesn't scale well as the data grows.


I hope this makes things clearer. Feel free to ask if you need more 
clarifications, and thanks for your time.


Matthieu


On 08/11/2010 22:26, Merlin Moncure wrote:

On Mon, Nov 8, 2010 at 12:15 PM, Matthieu Huin matthieu.h...@wallix.com wrote:

Greetings all,

I am trying to optimize SELECT queries on a large table (10M rows and more)
by using temporary tables that are subsets of my main table, thus narrowing
the search space 

[GENERAL] Looking for PostgreSQL Folks in New Orleans area

2010-11-09 Thread Robert Treat
Howdy folks,

We're looking for some PostgreSQL users / advocates in the New Orleans area
for some community outreach activities, like PGDays and User Groups. If you
are in that area and interested in helping, or know who to talk to, please
drop me a line, thanks!

Robert Treat
play: http://www.xzilla.net
work: http://www.omniti.com


Re: [GENERAL] temporary table as a subset of an existing table and indexes

2010-11-09 Thread Merlin Moncure
On Tue, Nov 9, 2010 at 4:47 AM, Matthieu Huin matthieu.h...@wallix.com wrote:
 Hello Merlin,

 So far the improvement in responsiveness has been very noticeable, even
 without indexing the temporary tables. Of course, this is just trading
 accuracy for speed, as I simply narrow the search space arbitrarily ...

 The schema I am working on is close to the one I am referencing in this
 thread : http://archives.postgresql.org/pgsql-general/2010-11/msg00272.php

 Since we want to implement full text search and tag querying, it can lead
 to rather complex autogenerated queries such as this one (find log lines
 with the word 'root' in them, dated from 11/04 to 11/06, where the 'program'
 tag is sshd and the 'severity_code' tag is less than 3):


 EXPLAIN ANALYZE SELECT r.logid FROM (((SELECT lcond84.logid, lcond84.date
 FROM
 ( SELECT tmp84.logid, tmp84.date FROM logs tmp84
  WHERE TRUE AND ( to_tsvector('simple',tmp84.body) @@
 plainto_tsquery('simple','root') ) AND tmp84.date > '2010-11-04 10:22:06.26'
 AND tmp84.date < '2010-11-06 10:22:06.26' ) AS lcond84
 NATURAL JOIN
 ( SELECT tmp85.logid FROM tags tmp85 WHERE
  FALSE
 OR (tmp85.name='severity_code' AND num_lt(tmp85.value, 3) )
 OR (tmp85.name='program' AND tmp85.value = CAST(ROW('sshd') AS tagvalue) )
  GROUP BY tmp85.logid HAVING COUNT(tmp85.logid) = 2 ) AS tcond84  )

 )) AS r ORDER BY r.date DESC LIMIT 1000;



 Giving the following query plan :

  Limit  (cost=765445.54..765445.56 rows=9 width=16) (actual time=34744.257..34744.257 rows=0 loops=1)
    ->  Sort  (cost=765445.54..765445.56 rows=9 width=16) (actual time=34744.255..34744.255 rows=0 loops=1)
          Sort Key: tmp84.date
          Sort Method:  quicksort  Memory: 17kB
          ->  Hash Join  (cost=765005.46..765445.40 rows=9 width=16) (actual time=34744.202..34744.202 rows=0 loops=1)
                Hash Cond: (tmp85.logid = tmp84.logid)
                ->  HashAggregate  (cost=758440.29..758669.77 rows=15299 width=8) (actual time=33343.816..33343.816 rows=0 loops=1)
                      Filter: (count(tmp85.logid) = 2)
                      ->  Bitmap Heap Scan on tags tmp85  (cost=92363.26..757225.45 rows=242968 width=8) (actual time=20676.354..33294.252 rows=32864 loops=1)
                            Recheck Cond: ((name = 'severity_code'::text) OR (name = 'program'::text))
                            Filter: (((name = 'severity_code'::text) AND num_lt(value, 3::double precision)) OR ((name = 'program'::text) AND ((value).storedvalue = 'sshd'::text)))
                            ->  BitmapOr  (cost=92363.26..92363.26 rows=2148625 width=0) (actual time=19688.915..19688.915 rows=0 loops=1)
                                  ->  Bitmap Index Scan on nameval_idx  (cost=0.00..30388.35 rows=707841 width=0) (actual time=19337.358..19337.358 rows=708719 loops=1)
                                        Index Cond: (name = 'severity_code'::text)
                                  ->  Bitmap Index Scan on nameval_idx  (cost=0.00..61853.42 rows=1440784 width=0) (actual time=351.551..351.551 rows=1484703 loops=1)
                                        Index Cond: (name = 'program'::text)
                ->  Hash  (cost=6553.06..6553.06 rows=969 width=16) (actual time=1400.378..1400.378 rows=32516 loops=1)
                      ->  Bitmap Heap Scan on logs tmp84  (cost=215.51..6553.06 rows=969 width=16) (actual time=1104.226..1383.745 rows=32516 loops=1)
                            Recheck Cond: (to_tsvector('simple'::regconfig, body) @@ '''root'''::tsquery)
                            Filter: ((date > '2010-11-04 10:22:06.26+01'::timestamp with time zone) AND (date < '2010-11-06 10:22:06.26+01'::timestamp with time zone))
                            ->  Bitmap Index Scan on fulltext_body_idx  (cost=0.00..215.27 rows=1740 width=0) (actual time=1097.874..1097.874 rows=64340 loops=1)
                                  Index Cond: (to_tsvector('simple'::regconfig, body) @@ '''root'''::tsquery)
  Total runtime: 34756.938 ms

 This one isn't too bad, but the runtime seems to increase exponentially with
 the table size. Using a temporary table based on the date condition can
 therefore cut the query time by a factor of up to ten (table creation
 included, and provided the resulting table isn't too big - I run a COUNT
 check prior to creation so that I can manually limit the table size if
 needed). But of course, I'd rather have speed AND accuracy ...

 To make things worse, the tables tend to grow very quickly since, as you
 might have guessed, I am working on the database part of a log collector;
 the current implementation doesn't scale well as the data grows.

 I hope this makes things clearer. Feel free to ask if you need more
 clarifications, and thanks for your time.

How are you partitioning the tags?  Is the partitioned query doing the
same job as the non-partitioned query?  Is date a forced criterion?
(and if it is, have you considered date partitioning/brute force?)

merlin


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Merlin Moncure
On Mon, Nov 8, 2010 at 11:24 PM, Sandeep Srinivasa s...@clearsenses.com wrote:
 There was an interesting post today on highscalability
 - http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html
 The discussion/comments touched upon why mysql is a better idea for Facebook
 than Postgres. Here's an interesting one

postgresql might not be a good fit for this type of application, but
the reasoning given in the article is really suspicious.  The true
answer was hinted at in the comments: "we chose it first, and there
was never a reason to change it".  It really comes down to this: they
probably don't need much from the database other than a distributed
key value store, and they built a big software layer on top of that to
manage it.  Hm, I use facebook and I've seen tons of inconsistent
answers, missing notifications and such.  I wonder if there's a
connection there...

merlin



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Allan Kamau
On Tue, Nov 9, 2010 at 3:50 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Mon, Nov 8, 2010 at 11:24 PM, Sandeep Srinivasa s...@clearsenses.com 
 wrote:
 There was an interesting post today on highscalability
 - http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html
 The discussion/comments touched upon why mysql is a better idea for Facebook
 than Postgres. Here's an interesting one

 postgresql might not be a good fit for this type of application, but
 the reasoning given in the article is really suspicious.  The true
 answer was hinted at in the comments: "we chose it first, and there
 was never a reason to change it".  It really comes down to this: they
 probably don't need much from the database other than a distributed
 key value store, and they built a big software layer on top of that to
 manage it.  Hm, I use facebook and I've seen tons of inconsistent
 answers, missing notifications and such.  I wonder if there's a
 connection there...

 merlin



I agree with Merlin. There is a surprisingly big number of good
technology companies (including Google) out there using MySQL. For
some time I have been wondering why, and have come up with a few
(possibly wrong) theories. Such as: these companies are started by
application developers, not database experts; the cost (effort) of
changing to another database engine is substantial, given that there
are probably already so many inconsistencies in their current
data setup, coupled with a considerable amount of inconsistency cover-up
code in the application programs; and maybe the IT team is doubling up
as a fire-fighting department, constantly putting out the data-driven
fires. This is then compounded by the rapid increase in data.

Allan.



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Scott Ribe
On Nov 9, 2010, at 7:04 AM, Allan Kamau wrote:

 have come up with a few
 (possibly wrong) theories.

They all sound reasonable. I think you missed an important one though: 
aggressive (and sometimes even outright false) promotion and sales by the 
company MySQL AB.

When I started looking at databases, you didn't have to look very hard to find 
PostgreSQL, but you did have to at least make a minimal effort.

Also, my understanding is that if you go way back on the PostgreSQL timeline to 
versions 6 and earliest 7.x, it was a little shaky. (I started with 7.3 or 7.4, 
and it has been rock solid.)

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice







Re: [GENERAL] REINDEX requirement?

2010-11-09 Thread Josh Kupershmidt
On Tue, Nov 9, 2010 at 4:26 AM, AI Rumman rumman...@gmail.com wrote:
 How do I know that an index requires REINDEX?

Well, the REINDEX page:
  http://www.postgresql.org/docs/current/static/sql-reindex.html

gives a few examples of why you might need to reindex. I think the
most common reason would probably be index bloat. See
  http://wiki.postgresql.org/wiki/Index_Maintenance

under "Index Bloat" for links to some handy queries to see whether
your tables and/or indexes suffer from bloat.
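
As a quick first pass, a minimal sketch (the table and index names are
hypothetical) comparing an index's size to its table; the wiki queries
above do the real bloat math:

SELECT pg_size_pretty(pg_relation_size('mytable')) AS table_size,
       pg_size_pretty(pg_relation_size('mytable_pkey')) AS index_size;

-- If the index looks far too large, rebuild it (takes an exclusive lock):
REINDEX INDEX mytable_pkey;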

Josh



Re: [GENERAL] Failed archive_command copy - number of attempts configurable?

2010-11-09 Thread Fujii Masao
On Tue, Nov 9, 2010 at 4:01 AM, dan.m.harris
daniel.har...@metaswitch.com wrote:
 But then the primary retries this another 49 times! So 150 attempts in all.

 What I need to know is whether these numbers are configurable?

No.

 Can they be
 timed? How long before the primary stops retrying altogether?

Forever, i.e. until the archive becomes available again.

BTW, since the primary cannot remove unarchived WAL files from the
pg_xlog directory, unless you fix the archive soon the primary might
run out of disk space and hit a PANIC error.
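
As an aside, a minimal sketch (superuser only; the path is relative to the
data directory, and the regex just matches WAL segment names) for watching
how many segments are piling up in pg_xlog:

SELECT count(*) AS wal_segments
FROM pg_ls_dir('pg_xlog') AS f
WHERE f ~ '^[0-9A-F]{24}$';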

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Vick Khera
On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe scott_r...@killerbytes.com wrote:
 Also, my understanding is that if you go way back on the PostgreSQL timeline 
 to versions 6 and earliest 7.x, it was a little shaky. (I started with 7.3 or 
 7.4, and it has been rock solid.)


In those same times, mysql was also, um, other than rock solid.  I
have somewhere a personal email from Monty describing how to
crash-recover corrupted MyISAM data files. (I was customer number 13, I
believe... I wish I still had that support contract certificate as an
artifact.)



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Tom Lane
Vick Khera vi...@khera.org writes:
 On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe scott_r...@killerbytes.com 
 wrote:
 Also, my understanding is that if you go way back on the PostgreSQL timeline 
 to versions 6 and earliest 7.x, it was a little shaky. (I started with 7.3 
 or 7.4, and it has been rock solid.)

 In those same times, mysql was also, um, other than rock solid.

I don't have enough operational experience with mysql to speak to how
reliable it was back in the day.  What it *did* have over postgres back
then was speed.  It was a whole lot faster, particularly on the sort of
single-stream-of-simple-queries cases that people who don't know
databases are likely to set up as benchmarks.  (mysql still beats us on
cases like that, though not by as much.)  I think that drove quite a
few early adoption decisions, and now folks are locked in; the cost of
conversion outweighs the (perceived) benefits.

regards, tom lane



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Dmitriy Igrishin
Hey all,

IMO they chose MySQL because they had no knowledge of
PostgreSQL or of sound database design.
Just a pile of data for SELECTing, with minimal effort
spent on data integrity and database server programming (a la
a typical PHP project).
Sorry :-)

2010/11/9 Tom Lane t...@sss.pgh.pa.us

 Vick Khera vi...@khera.org writes:
  On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe scott_r...@killerbytes.com
 wrote:
  Also, my understanding is that if you go way back on the PostgreSQL
 timeline to versions 6 and earliest 7.x, it was a little shaky. (I started
 with 7.3 or 7.4, and it has been rock solid.)

  In those same times, mysql was also, um, other than rock solid.

 I don't have enough operational experience with mysql to speak to how
 reliable it was back in the day.  What it *did* have over postgres back
 then was speed.  It was a whole lot faster, particularly on the sort of
 single-stream-of-simple-queries cases that people who don't know
 databases are likely to set up as benchmarks.  (mysql still beats us on
 cases like that, though not by as much.)  I think that drove quite a
 few early adoption decisions, and now folks are locked in; the cost of
 conversion outweighs the (perceived) benefits.

regards, tom lane





-- 
// Dmitriy.


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Cédric Villemain
2010/11/9 Tom Lane t...@sss.pgh.pa.us:
 Vick Khera vi...@khera.org writes:
 On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe scott_r...@killerbytes.com 
 wrote:
 Also, my understanding is that if you go way back on the PostgreSQL 
 timeline to versions 6 and earliest 7.x, it was a little shaky. (I started 
 with 7.3 or 7.4, and it has been rock solid.)

 In those same times, mysql was also, um, other than rock solid.

 I don't have enough operational experience with mysql to speak to how
 reliable it was back in the day.  What it *did* have over postgres back
 then was speed.  It was a whole lot faster, particularly on the sort of
 single-stream-of-simple-queries cases that people who don't know
 databases are likely to set up as benchmarks.  (mysql still beats us on
 cases like that, though not by as much.)  I think that drove quite a
 few early adoption decisions, and now folks are locked in; the cost of
 conversion outweighs the (perceived) benefits.

Facebook has written that "Flashcache [is] built primarily as a block
cache for InnoDB but is general purpose and can be used by other
applications as well."

https://github.com/facebook/flashcache/

A good tool, by the way. It is the only place where I like to see SSD
disks (not at Facebook, but with 'volatile' data).


                        regards, tom lane





-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [GENERAL] I guess I'm missing something here WRT FOUND

2010-11-09 Thread Rob Sargent


On 11/08/2010 09:11 PM, Ralph Smith wrote:
 How is COLLEEN not there and there at the same time?
 -
 NOTICE:  did not = 11  K = 42
 CONTEXT:  PL/pgSQL function get_word line 37 at perform
 NOTICE:  value = COLLEEN
 CONTEXT:  PL/pgSQL function get_word line 29 at perform
 
 ERROR:  duplicate key violates unique constraint "uniq_tokens"
 CONTEXT:  PL/pgSQL function get_word line 30 at SQL statement
 
 #
 /*
 Generate a list of up to 7 tokens from the business table's
 conformedname field.
 Strip off leading and trailing commas and quotes, etc.
 Results are inserted into table zbus_tokens, not sorted.
 */
 
 CREATE OR REPLACE FUNCTION get_word() RETURNS VOID AS '
 
 DECLARE business business%ROWTYPE ;
 bname    varchar(100) ; --business.conformedname%TYPE ;
 Word     varchar(100) ;
 Word2    varchar(100) ;
 Wcount   INTEGER ;
 I        BIGINT DEFAULT 0 ;
 J        BIGINT DEFAULT 0 ;
 K        BIGINT DEFAULT 0 ;
 IsThere  INT ;
 
 BEGIN
 
   FOR business IN SELECT * FROM business limit 500 LOOP
 bname=business.conformedname ;
 I=I+1 ;
 
 FOR Wcount IN 1..7  LOOP
   Word=split_part(bname,'' '',Wcount) ;
   Word2=ltrim(Word,''!?.%()+$*/0123456789'') ;
   Word=rtrim(Word2,''!?.()+$*/0123456789'') ;
   Word2=rtrim(ltrim(Word,'',''),'','') ;
   Word=rtrim(ltrim(Word2,''"''),''"'') ;
   
   IF LENGTH(Word) > 0 THEN
 Word2=substring(Word from 1 for 50) ;
  -- PERFORM SELECT COUNT(*) FROM zbus_tokens WHERE token = Word2 ;
  -- IF FOUND THEN
   PERFORM RNotice1(1,''value'',Word2) ;-- line 29
   INSERT INTO zbus_tokens (token) values(Word2);
   J=J+1 ;
   IF J % 100 = 0 THEN
 PERFORM Rnotice2(1,''Row'',I,''Inserts'',J) ;
   END IF ;
 ELSE
   K=K+1 ;
   PERFORM RNotice2(1,''did not'',I,''K'',K) ;   -- line 37
  -- END IF ;
   END IF ;
   
 END LOOP ;
 
   END LOOP ;
  
   RETURN  ;
  
 END ; ' LANGUAGE plpgsql;
 -- ==
 SELECT get_word ();
 SELECT token, count(token) from zbus_tokens group by 1 order by 2 desc ;
 SELECT count(*) from zbus_tokens where token='COLLEEN' ;
 
 drop function get_word() ;
 truncate zbus_tokens ;
 drop table zbus_tokens;
 create table zbus_tokens (id bigserial, token varchar(50), CONSTRAINT
 uniq_tokens UNIQUE (token)) ;
 ===
 DOCTOR FINN'S CARD COMPANY
 SPECIALTY MAINTENANCE
 RIVERS LANDING RESTAURANT
 SEATTLE FUSION FC
 PROFESSIONAL PRACTICE ENVIRONMENTS INC
 CELEBRATE YOURSELF
 NEW ACTIVITEA BEVERAGE CO
 KARY ADAM HORWITZ
 JOHN CASTRO MAGICIAN
 RELIABLE AUTO RENTAL  PARKING
 COLLEEN CASEY, LMP
 COLLEEN CASEY, LMP
 
 THANKS!
 Again, 7.4 BITES!
 
 -- 
 
 Ralph
 _
 


I'm wondering if count(*) isn't ALWAYS found? A SELECT count(*) returns
exactly one row even when the count is zero, so FOUND would always be
true after it.
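
For what it's worth, a minimal sketch (reusing the IsThere variable that
get_word already declares but never uses) that tests the count itself
instead of FOUND:

  SELECT COUNT(*) INTO IsThere FROM zbus_tokens WHERE token = Word2 ;
  IF IsThere = 0 THEN
    INSERT INTO zbus_tokens (token) VALUES (Word2) ;
  END IF ;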



Re: [GENERAL] temporary table as a subset of an existing table and indexes

2010-11-09 Thread Matthieu Huin
Basically, I take the same query as above and replace all occurrences of 
the tables logs and tags with temp_logs and temp_tags, created as follows:


CREATE TEMPORARY TABLE temp_logs ON COMMIT DROP AS
SELECT * FROM logs WHERE condition ORDER BY date DESC LIMIT max_size;

CREATE TEMPORARY TABLE temp_tags ON COMMIT DROP AS
SELECT * FROM tags WHERE logid IN (SELECT logid FROM temp_logs);

With condition usually defining a date window. As we are experimenting 
with this approach, date has become a forced criterion. I have 
experimented with partitioning, but it led to the logid primary key no 
longer being unique, which was a problem when joining with the tags 
table.


So the queries are pretty much the same, the boost in speed being simply 
due to the limitation of the search space.
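
For context, a minimal sketch of how this looks in use (the date window
and LIMIT are hypothetical; ON COMMIT DROP ties the temporary tables'
lifetime to the transaction):

BEGIN;

CREATE TEMPORARY TABLE temp_logs ON COMMIT DROP AS
SELECT * FROM logs
WHERE date > '2010-11-04' AND date < '2010-11-06'
ORDER BY date DESC LIMIT 100000;

CREATE TEMPORARY TABLE temp_tags ON COMMIT DROP AS
SELECT * FROM tags WHERE logid IN (SELECT logid FROM temp_logs);

-- ... run the original search query against temp_logs/temp_tags ...

COMMIT;  -- both temporary tables are dropped here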



How are you partitioning the tags?  Is the partitioned query doing the
same job as the non-partitioned query?  Is date a forced criterion?
(and if it is, have you considered date partitioning/brute force?)

merlin




Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Merlin Moncure
On Tue, Nov 9, 2010 at 10:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Vick Khera vi...@khera.org writes:
 On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe scott_r...@killerbytes.com 
 wrote:
 Also, my understanding is that if you go way back on the PostgreSQL 
 timeline to versions 6 and earliest 7.x, it was a little shaky. (I started 
 with 7.3 or 7.4, and it has been rock solid.)

 In those same times, mysql was also, um, other than rock solid.

 I don't have enough operational experience with mysql to speak to how
 reliable it was back in the day.  What it *did* have over postgres back
 then was speed.  It was a whole lot faster, particularly on the sort of
 single-stream-of-simple-queries cases that people who don't know
 databases are likely to set up as benchmarks.  (mysql still beats us on
 cases like that, though not by as much.)  I think that drove quite a
 few early adoption decisions, and now folks are locked in; the cost of
 conversion outweighs the (perceived) benefits.

Postgres 7.2 brought non-blocking vacuum.  Before that, you could
pretty much write off any 24x7 applications -- dealing with dead
tuples was just too much of a headache.  The mysql of the time, 3.23,
was fast but locky and utterly unsafe.  It has been easier to run,
though, until recently (8.4 really changed things).

Postgres has been relatively disadvantaged in terms of administrative
overhead, which is a bigger deal than SQL features, replication,
performance, etc. for high-load website-type cases.  The heap FSM, tunable
autovacuum, checkpoint management, a smarter/faster statistics
collector, and more backup options may not be as sexy as replication
etc., but they are very appealing features if you are running 50 database
servers backing a monster web site.  Dumping Sys V IPC for mmap would be a
hypothetical improvement in that vein :-) (AIUI, it is not possible
though).

merlin



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Andy

--- On Tue, 11/9/10, Gauthier, Dave dave.gauth...@intel.com wrote:
 
 A different slant on this has to do with licensing and $$.
 Might Oracle decide some day to start charging for their new
 found DB?  They are a for-profit company that's
 beholden to their shareholders LONG before an open software
 community.  Consumers like Facebook and Google have
 deep pockets, something corporate executives really don't
 dismiss lightly.

This is just FUD.

MySQL is GPL'd, just like Linux is.

To say you should avoid MySQL because Oracle may someday start charging for it 
is like saying you should avoid Linux because Red Hat may someday start 
charging for it.

That makes no sense, especially since both Oracle and Red Hat are already 
charging for their products. Doesn't mean you can't keep using free Linux and 
MySQL.






Re: [GENERAL] finding the other statement causing a sharelock

2010-11-09 Thread Dimitri Fontaine
Ivan Sergio Borgonovo m...@webthatworks.it writes:
 I have a strong suspicion... and I'd like to exit early from a
 function if a process is running, but I'm not really sure how to add a
 semaphore...

Maybe pg_try_advisory_lock() would help you there?

  http://www.postgresql.org/docs/8.3/static/explicit-locking.html#ADVISORY-LOCKS
  
http://www.postgresql.org/docs/8.3/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
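
A minimal sketch of the early-exit pattern (the lock key 12345 and the
function name are arbitrary):

CREATE OR REPLACE FUNCTION guarded_job() RETURNS void AS $$
BEGIN
    -- Returns false immediately if another session holds the lock.
    IF NOT pg_try_advisory_lock(12345) THEN
        RAISE NOTICE 'another run is in progress, exiting early';
        RETURN;
    END IF;
    -- ... do the actual work here ...
    PERFORM pg_advisory_unlock(12345);
END;
$$ LANGUAGE plpgsql;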

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Gauthier, Dave
Think upgrades

-Original Message-
From: Andy [mailto:angelf...@yahoo.com] 
Sent: Tuesday, November 09, 2010 12:02 PM
To: pgsql-general@postgresql.org; Gauthier, Dave
Subject: Re: [GENERAL] Why facebook used mysql ?


--- On Tue, 11/9/10, Gauthier, Dave dave.gauth...@intel.com wrote:
 
 A different slant on this has to do with licensing and $$.
 Might Oracle decide some day to start charging for their new
 found DB?  They are a for-profit company that's
 beholden to their shareholders LONG before an open software
 community.  Consumers like Facebook and Google have
 deep pockets, something corporate executives really don't
 dismiss lightly.

This is just FUD.

MySQL is GPL'd, just like Linux is.

To say you should avoid MySQL because Oracle may someday start charging for it 
is like saying you should avoid Linux because Red Hat may someday start 
charging for it.

That makes no sense, especially since both Oracle and Red Hat are already 
charging for their products. Doesn't mean you can't keep using free Linux and 
MySQL.


  



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Scott Marlowe
On Tue, Nov 9, 2010 at 10:00 AM, Merlin Moncure mmonc...@gmail.com wrote:
 Postgres 7.2 brought non blocking vacuum.   Before that, you could
 pretty much write off any 24x7 duty applications -- dealing with dead
 tuples was just too much of a headache.

Amen!  I remember watching vacuum run alongside other queries and
getting all school-girl giggly over it.  Seriously, it was a big, big
change for pgsql.

 The mysql of the time, 3.23,
 was fast but locky and utterly unsafe.

True, it was common to see mysql back then just stop, dead.  Go to
bring it back up and have to repair tables.

 Postgres has been relatively disadvantaged in terms of administrative
 overhead which is a bigger deal than sql features, replication,
 performance, etc for high load website type cases.

I would say it's a bigger problem for adoption than for high load
sites.  If Joe User spends an hour a day keeping his database on his
workstation happy, he's probably not happy.  If Joe Admin spends an
hour a day keeping his 100 machine db farm happy, he's probably REALLY
happy that it only takes so long.



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Chris Browne
kamaual...@gmail.com (Allan Kamau) writes:
 I agree with Merlin. There is a surprisingly big number of good
 technology companies (including Google) out there using MySQL. For
 some time I have been wondering why, and have come up with a few
 (possibly wrong) theories. Such as: these companies are started by
 application developers, not database experts; the cost (effort) of
 changing to another database engine is substantial, given that there
 are probably already so many inconsistencies in their current
 data setup, coupled with a considerable amount of inconsistency cover-up
 code in the application programs; and maybe the IT team is doubling up
 as a fire-fighting department, constantly putting out the data-driven
 fires. This is then compounded by the rapid increase in data.

This wasn't a good explanation for what happened when Sabre announced
they were using MySQL:

   http://www.mysql.com/news-and-events/generate-article.php?id=2003_33

I used to work at Sabre, and what I saw was *mostly* an Oracle shop, but
with significant bastions of IMS, DB2, Teradata, and Informix.  Your
theory might fit with dumb startups, but certainly not with Sabre,
which still has significant deployments of IMS!  :-)

I actually am inclined to go with less rational explanations; a lot of
decisions get made for reasons that do not connect materially (if at
all) with the technical issues.

One such would be that the lawyers and marketing folk that tend to be at
the executive layer do *their* thing of making deals, and when they're
busy making deals, the only people interfacing with them are:

 - Salescritters from the Big O buying them lunch

 - Other Political Animals that Made The Decision to go with MySQL (or
   such) and are happy to explain, over golf, that "it went fine for us"
   (even if it didn't go entirely so fine; they didn't hear about it)

Lunch and golf can have material effects.
-- 
cbbrowne,@,acm.org
Rules of the Evil Overlord #67.  No matter how many shorts we have in
the system, my  guards will be instructed to  treat every surveillance
camera malfunction as a full-scale emergency.



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Graham Leggett

On 09 Nov 2010, at 7:16 PM, Gauthier, Dave wrote:


Think upgrades


This is covered by the GPL license. Once you have released code under  
the GPL, all derivative code - i.e. upgrades - has to also be released  
in source form, under the GPL license.


Regards,
Graham
--




Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Andy
Any upgrades that are based on the MySQL source code will be legally required 
to be released under the GPL too.

That's the beauty of GPL.

Software under MIT or BSD license could be hijacked by private companies. 
Software under GPL license could not.


--- On Tue, 11/9/10, Gauthier, Dave dave.gauth...@intel.com wrote:

 From: Gauthier, Dave dave.gauth...@intel.com
 Subject: RE: [GENERAL] Why facebook used mysql ?
 To: Andy angelf...@yahoo.com, pgsql-general@postgresql.org 
 pgsql-general@postgresql.org
 Date: Tuesday, November 9, 2010, 12:16 PM
 Think upgrades
 
 -Original Message-
 From: Andy [mailto:angelf...@yahoo.com]
 
 Sent: Tuesday, November 09, 2010 12:02 PM
 To: pgsql-general@postgresql.org;
 Gauthier, Dave
 Subject: Re: [GENERAL] Why facebook used mysql ?
 
 
 --- On Tue, 11/9/10, Gauthier, Dave dave.gauth...@intel.com
 wrote:
  
  A different slant on this has to do with licensing and
 $$.
  Might Oracle decide some day to start charging for
 their new
  found DB?  They are a for-profit company that's
  beholden to their shareholders LONG before an open
 software
  community.  Consumers like Facebook and Google have
  deep pockets, something corporate executives really
 don't
  dismiss lightly.
 
 This is just FUD.
 
 MySQL is GPL'd, just like Linux is.
 
 To say you should avoid MySQL because Oracle may someday
 start charging for it is like saying you should avoid Linux
 because Red Hat may someday start charging for it.
 
 That makes no sense, especially since both Oracle and Red
 Hat are already charging for their products. Doesn't mean
 you can't keep using free Linux and MySQL.
 
 
       
 






Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread David Boreham

On 11/9/2010 10:27 AM, Graham Leggett wrote:


This is covered by the GPL license. Once you have released code under 
the GPL, all derivative code - ie upgrades - have to also be released 
in source form, under the GPL license.


Sorry, but this is 100% not true. It may be true for a 3rd party (you 
release something under the GPL, I enhance it, therefore I am required 
to release my enhancement under the GPL). But Oracle owns the copyright 
to the MySQL code and therefore they can decide to do whatever they want 
with it. The only thing they can't do is 'un-release' existing code 
already released under the GPL. Everything else is possible.


Ownership of the copyright trumps the GPL.





Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Dave Page
On Tue, Nov 9, 2010 at 9:28 AM, Andy angelf...@yahoo.com wrote:
 Any upgrades that are based on the MySQL source code will be legally required 
 to be released under GPL too.

 That's the beauty of GPL.

Upgrades released by Oracle *do not* have to be under the GPL. They own all
the IP, and can release future versions under whatever terms they see
fit.

Other entities do have to use the GPL if they release their own updates.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread David Boreham

In addition to the license a product is currently available under,
you need to also consider who owns its copyright; who owns
its test suite (which may not be open source at all); who
employs all the people who understand the code and who owns
the trademarks that identify the product.

Red Hat owns none of these things with respect to Linux
(although they do for various other products such as
their Directory Server and JBoss).





Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Scott Marlowe
On Tue, Nov 9, 2010 at 10:28 AM, Andy angelf...@yahoo.com wrote:
 Any upgrades that are based on the MySQL source code will be legally required 
 to be released under GPL too.

 That's the beauty of GPL.

This isn't entirely true.  Oracle owns all copyrights to the mysql source
code.  They can release a binary-only, commercially licensed version
with features that they choose NOT to release under the GPL.



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Andy
Not true.

As a condition of getting the European Commission's approval of its acquisition 
of Sun/MySQL, Oracle had to agree to continue the GPL release.

And there are non-Oracle upgrades from Google, Facebook, Percona, etc. So no 
one is beholden to Oracle.

--- On Tue, 11/9/10, Dave Page dp...@pgadmin.org wrote:

 From: Dave Page dp...@pgadmin.org
 Subject: Re: [GENERAL] Why facebook used mysql ?
 To: Andy angelf...@yahoo.com
 Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org, 
 DaveGauthier dave.gauth...@intel.com
 Date: Tuesday, November 9, 2010, 12:31 PM
 On Tue, Nov 9, 2010 at 9:28 AM, Andy
 angelf...@yahoo.com
 wrote:
  Any upgrades that are based on the MySQL source code
 will be legally required to be released under GPL too.
 
  That's the beauty of GPL.
 
 Upgrades released by Oracle *do not* have be under GPL.
 They own all
 the IP, and can release future versions under whatever
 terms they see
 fit.
 
 Other entities, do have to use the GPL if they release
 their own updates.
 
 -- 
 Dave Page
 Blog: http://pgsnake.blogspot.com
 Twitter: @pgsnake
 
 EnterpriseDB UK: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 


  



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Lincoln Yeoh

At 12:24 PM 11/9/2010, Sandeep Srinivasa wrote:
There was an interesting post today on highscalability - 
http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html

I wonder if anyone can comment on this - especially the part that PG 
doesn't scale as well as MySQL on multiple cores?


The multiple cores part is unlikely to be very relevant in the 
Facebook context.


Scaling over four or eight cores is not a biggie nowadays, right? From 
what I've seen, the Facebook and Google type companies tend to use LOTS 
of cheap servers (dual core, quad core, whatever Intel or AMD can 
churn out cheaply). I doubt they use >= 32 core machines for their 
public facing apps.


What's more important to such companies is the ability to scale over 
multiple machines. There is no way a single server is going to handle 
1 billion users.


So they have to design and build their apps accordingly, to avoid needing 
massive serialization/locking. When you post something on Facebook, 
nobody else has to wait for your post first. Nobody cares if their FB 
friend/likes counter is somewhat wrong for a while (as long as it 
eventually shows the correct figure). So scaling out over multiple 
machines and sharding isn't as hard.


Whereas if you require all posts to have a globally unique ID taken 
from an integer sequence that increases without any gaps, it becomes 
a rather difficult problem to scale out over many machines. No matter 
how many machines you have and wherever they are in the world, every 
post has to wait for the sequence.
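
To make that serialization point concrete, a minimal sketch (table and
column names hypothetical) of a gapless counter - the row lock is held
until commit, so every poster queues behind it:

CREATE TABLE counters (name text PRIMARY KEY, val bigint NOT NULL);
INSERT INTO counters VALUES ('post_id', 0);

BEGIN;
-- Locks the counter row until COMMIT; concurrent transactions wait here.
UPDATE counters SET val = val + 1 WHERE name = 'post_id' RETURNING val;
-- ... use the returned value as the new post's ID ...
COMMIT;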


As for why they used mysql - probably the same reason why they used 
php: they're what the original developers used. It doesn't matter so 
much as long as the app is not that slow and can scale out without 
too much pain.


Regards,
Link.




Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread David Boreham

On 11/9/2010 10:45 AM, Andy wrote:

As a condition of getting European Commission's approval of its acquisition of 
Sun/MySQL, Oracle had to agree to continue the GPL release.


In case anyone is interested in what specifically Oracle agreed to do, 
this is the text
from the decision (they agreed to do the following for 5 years 
post-deal-closing) :


Commitment to enhance MySQL in the future under the GPL. Oracle shall 
continue to

enhance MySQL and make subsequent versions of MySQL, including Version 6,
available under the GPL. Oracle will not release any new, enhanced 
version of MySQL
Enterprise Edition without contemporaneously releasing a new, also 
enhanced version
of MySQL Community Edition licensed under the GPL. Oracle shall continue 
to make
the source code of all versions of MySQL Community Edition publicly 
available at no

charge.









Re: [GENERAL] Full Vacuum/Reindex vs autovacuum

2010-11-09 Thread Vick Khera
On Mon, Nov 8, 2010 at 2:06 PM, Jason Long
mailing.li...@octgsoftware.com wrote:
 Every night when there is no activity I do a full vacuum, a reindex, and
 then dump a nightly backup.

 Is this optimal with regards to performance?  autovacuum is set to the
 default.

In the general case this seems way overkill.  Do you suffer from a lot
of churn daily?  That is, are there bunches of updates?

One thing you lose when running vacuum full is the space in the file
that is pre-allocated but empty.  If you do lots of updates and
inserts, you'll be allocating pages and growing the underlying files
to hold your data.  If you leave the unused space there, it is much
faster for postgres just to fill it.  The key is keeping that unused
space from growing beyond reason... the trick is defining, for your own
use case, what "within reason" means.

As for re-indexing, don't bother.  Unless you have some degenerate case
(something like a queue, where you always insert values at the tail
end of the index and delete from the front), if you let autovacuum do
its work you should remain in a fairly steady state.

There are queries you can run against the database to detect how
bloated your indexes are after a while, and then reindex if necessary.
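
For instance, a minimal sketch (the index name is hypothetical) using the
contrib pgstattuple module's pgstatindex() to read index leaf density:

-- Requires the pgstattuple contrib module to be installed.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('some_big_index');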

I find that some of my data needs a reindex about every 4 to 6 months,
while others never benefit.  I *never* run a vacuum full.



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Dave Page
On Tue, Nov 9, 2010 at 5:45 PM, Andy angelf...@yahoo.com wrote:
 Not true.

 As a condition of getting European Commission's approval of its acquisition 
 of Sun/MySQL, Oracle had to agree to continue the GPL release.

 And there are non-Oracle upgrades from Google, facebook, Percona, etc. So no 
 one is beholden to Oracle.

It is true. The EU commitments are entirely independent of the licensing.

Also note that the commitments
(http://www.oracle.com/us/corporate/press/042364) are pretty loosely
worded. For example, Oracle would be fulfilling them if they released
a new Enterprise version with 1000 new features, and a corresponding
community version with just one of those features.

And after 5 years (nearer to 4 now I guess), those commitments get
thrown away entirely.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Tom Lane
David Boreham david_l...@boreham.org writes:
 In addition to the license a product is currently available under,
 you need to also consider who owns its copyright; who owns
 its test suite (which may not be open source at all); who
 employs all the people who understand the code and who owns
 the trademarks that identify the product.

Indeed.  One thing that's particularly worth noting is that the mysql
documentation is not, and never has been, freely redistributable.

The *real* license risk for mysql in the past was that all the people
who were qualified to do serious development worked for the same
company.  If that company chose to stop releasing updates for free ---
as they absolutely had the legal right to do --- the fact that you had
the source code for previous releases wasn't really going to do you a
whole lot of good.  (Now it's possible that users could band together
to start their own fork from the last GPL release, and eventually get to
the point of doing useful development.  We've seen that movie before,
in fact: it's called postgres, circa 1996 right after Berkeley abandoned
it.  So one could hope that after several years you might have a viable
development community, but there's gonna be a lot of pain first.)

The recent fragmentation of development talent over in the mysql world
might change things, but it's still very unclear what the long-term
result will be.  If I were about to choose a database to bet my company
on, I'd be afraid of picking mysql simply because its future development
path isn't clear.  Oracle may own the copyright, but all the key
developers left, so it's definitely not clear that they'll be able to do
much with it for some time to come (even assuming that they want to).
And who knows which of the forks will succeed?

regards, tom lane



Re: [GENERAL] Porting from MS Access 2007 to PostgreSQL

2010-11-09 Thread Adrian Klaver
On Monday 08 November 2010 8:22:51 pm Victor Hooi wrote:
 Hi,

 Disclaimer: Not a DBA, nor am I a DB guy, so please excuse any
 ignorance in the below.


 *4. MS Access to Postgres*

 Anyhow, somebody else suggested it might be better to just go straight from
 the original MS Access database to PostgreSQL.

 My first question is, what is the current recommended procedure for this?


One way I have done this is to use a "Make Table" query in Access to create a 
table in Postgres from one in Access. This assumes you have an ODBC connection 
set up to Postgres.

 Cheers,
 Victor



-- 
Adrian Klaver
adrian.kla...@gmail.com



Re: [GENERAL] postgresql scalability issue

2010-11-09 Thread Ivan Voras
On 11/08/10 16:33, umut orhan wrote:
 Hi all,
 
 
 I've collected some interesting results during my experiments whose cause I
 couldn't figure out, and I need your assistance.
 
 I'm running PostgreSQL 9.0 on a quad-core machine having a two-level
 on-chip cache hierarchy.

Based on this information, you are most likely running on an Intel Xeon
5000-5400 series CPU, right? It probably doesn't matter much since you
apparently have only a single socket populated, but it is a bit of an old
architecture known for its FSB bottleneck to the memory.

You should write some details about your hardware: at least CPU
model/speed and memory speed, and software (which OS? 32-bit or 64-bit?)

 PostgreSQL has a large and warmed-up  buffer
 cache thus, no disk I/O is observed during experiments (i.e. for each query 
 buffer cache hit rate is 100%). I'm pinning each query/process to an 
 individual 
 core. Queries are simple read-only queries (only selects). Nested loop 
 (without 
 materialize) is used for the join operator.

 When I pin a single query to an individual core, its execution time is 
 observed 
 as 111 seconds. This result is my base case. Then, I fire two instances of 
 the 
 same query concurrently and pin them to two different cores separately. 
 However, 
 each execution time becomes 132 seconds in this case. In a similar trend, 
 execution times are increasing for three instances (164 seconds)  and four 
 instances (201 seconds) cases too. What I was expecting is a linear 
 improvement 
 in throughput (at least). I tried several different queries and got the same 
 trend at each time.

Are you measuring wall-clock execution time for queries in parallel?
I.e. start measuring when the first query is started (asynchronously?)
and stop when the last one is finished?

Did you try the same measurement without pinning?

 I wonder why execution times of individual queries are increasing when I 
 increase the number of their instances.

 Btw, I don't think on-chip cache hit/miss rates make a  difference since L2 
 cache misses are decreasing as expected. I'm not an expert in PostgreSQL 
 internals. Maybe there is a lock-contention (spinlocks?) occurring even if 
 the 
 queries are read-only. Anyways, all ideas are welcome.

As others said, memory bandwidth is the most likely suspect here. CPUs
are unfortunately so much faster than memory and memory buses that they
frequently have to wait. Unless PostgreSQL uses the exclusive lock model
instead of shared-exclusive, there shouldn't be much contention for the
shared buffers.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Porting from MS Access 2007 to PostgreSQL

2010-11-09 Thread Richard Broersma
On Mon, Nov 8, 2010 at 11:39 PM, Arnaud Lesauvage
arnaud.lis...@codata.eu wrote:
 I can't really give you any advice about this, but we kept using Access as
 our Frontend and everything runs fine.
 You'll just have to make sure that you have read about the 'boolean -
 integer' problem. This article is a nice start I think :
 http://www.postgresonline.com/journal/archives/24-Using-MS-Access-with-PostgreSQL.html

I've played with mapping Access's [yes/no] datatype to PostgreSQL
BOOLEANs for a while.  And after all of that *fun*, I've since started
mapping PostgreSQL's INTEGER to Access's [yes/no] datatype.

Life is so much better now.
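
On the Postgres side, that mapping can be as simple as this hypothetical
sketch (the table is invented; Access sends -1 for True and 0 for False,
hence the CHECK constraint):

CREATE TABLE widgets (
    id        serial PRIMARY KEY,
    is_active integer NOT NULL DEFAULT 0
              CHECK (is_active IN (-1, 0))  -- mimic the Access Yes/No domain
);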


-- 
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Sandeep Srinivasa
hi,
   I am the OP.

With due respect to everyone (and sincere apologies to Richard Broersma), my
intention was not to create a thread about MySQL/Oracle's business
practices.

It was about the technical discussion on Highscalability - I have been
trying to wrap my head around the concept of multiple-core scaling for
Postgres, especially beyond 8 cores (like Scott's Magny-Cours example). My
question is whether Postgres depends on the kernel scheduler for
multiple CPU/core utilization.

If that is the case, then does using FreeBSD vs Linux give rise to any
differences in scaling?

Taking the question one step further, do different Linux kernels (and
schedulers) impact Postgres scalability? The Phoronix Test Suite already
tests Linux kernel releases for performance regressions in Postgres DB
performance (e.g.
http://www.phoronix.com/scan.php?page=article&item=linux_perf_regressions&num=1),
but doesn't particularly focus on multiple cores.

Is it something that should be benchmarked?

thanks
-Sandeep

P.S. on the topic of scalability, here is another article -
http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html
,
where people have asked if a similar thing can be done using Postgres UDF or
a marshalling  ODBA  http://scm.ywesee.com/?p=odba/.git;a=summary


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread David Boreham

On 11/9/2010 11:10 AM, Sandeep Srinivasa wrote:
It was about the technical discussion on Highscalability - I have been 
trying to wrap my head around the concept of multiple-core scaling for 
Postgres, especially beyond 8 cores (like Scott's Magny-Cours 
example). My question is whether Postgres depends on the kernel 
scheduler for multiple CPU/core utilization.


If that is the case, then does using FreeBSD vs Linux give rise to any 
differences in scaling?


Hmm...typically multi-core scaling issues are in the area of memory 
contention and cache coherence (and therefore are for the most part not 
dependent on the OS and its scheduler).




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [JDBC] psqlexception syntax error at or near $

2010-11-09 Thread Kevin Grittner
[Please keep the list copied and please don't top-post.]
[Also, this thread really belongs on pgsql-general or pgsql-novice;
it doesn't seem to have anything to do with JDBC, so I'm blind
copying the JDBC list and posting to general to move it.]

Steven Dahlin pgdb.sldah...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 CREATE OR REPLACE FUNCTION system_info_fnc01() RETURNS trigger
 AS $$

  raise exception ''System Info record already present'';

 $$

 When you use dollar quoting you should not double your
 apostrophes.

 Try:

 raise exception 'System Info record already present';

 That makes no difference.

It seems to make a difference on my machine:

test=# CREATE OR REPLACE FUNCTION system_info_fnc01() RETURNS
trigger AS
test-# $$
test$# declare
test$#   iCnt   integer;
test$# begin
test$#   select count(*) into  iCnt from  system_info;
test$#   if ( iCnt > 0 ) then
test$#  raise exception ''System Info record already present'';
test$#   end if;
test$#   return new;
test$# end;
test$# $$
test-# LANGUAGE 'plpgsql' IMMUTABLE;
ERROR:  syntax error at or near "System"
LINE 8:  raise exception ''System Info record already present'';
   ^
test=# CREATE OR REPLACE FUNCTION system_info_fnc01() RETURNS
trigger AS
test-# $$
test$# declare
test$#   iCnt   integer;
test$# begin
test$#   select count(*) into  iCnt from  system_info;
test$#   if ( iCnt > 0 ) then
test$#  raise exception 'System Info record already present';
test$#   end if;
test$#   return new;
test$# end;
test$# $$
test-# LANGUAGE 'plpgsql' IMMUTABLE;
CREATE FUNCTION

If you're getting different results, please copy/paste actual code.
 
What version of PostgreSQL are you running?

-Kevin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Sandeep Srinivasa
On Tue, Nov 9, 2010 at 11:46 PM, David Boreham david_l...@boreham.orgwrote:


 Hmm...typically multi-core scaling issues are in the area of memory
 contention and cache coherence (and therefore are for the most part not
 dependent on the OS and its scheduler).


If it is independent of the OS, then how does one go about tuning it?

Consider this - I get a 12-core server on which I want multiple webserver
instances + DB. Can one create CPU pools (say cores 1,2,3 for webservers,
4,5,6,7 for DB, etc.)?

I know about taskset, but should one be using it?


Re: [GENERAL] REINDEX requirement?

2010-11-09 Thread Igor Neyman
 

 -Original Message-
 From: AI Rumman [mailto:rumman...@gmail.com] 
 Sent: Tuesday, November 09, 2010 3:26 AM
 To: pgsql-general General
 Subject: REINDEX requirement?
 
  How do I know that an index requires REINDEX?
 
 

Look at the results of the pgstatindex(...) function for the specific index.
It's part of the pgstattuple contrib module - read up on it in the docs.
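
For example, a sketch assuming the contrib module is already installed
(e.g. by running the pgstattuple SQL script against your database; the
index name below is a placeholder):

SELECT avg_leaf_density,    -- low density suggests a bloated index
       leaf_fragmentation   -- high fragmentation suggests REINDEX may help
FROM pgstatindex('public.some_index');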

Regards,
Igor Neyman

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I guess I'm missing something here WRT FOUND

2010-11-09 Thread Ralph Smith




Yeah, you're right Alban, that looks bad, but it was an artifact of
'try-this, try-this, no, try-this'.

The table is empty, and unfortunately remains that way; nothing gets
inserted.
I tried other variations; however, FOUND just isn't behaving as I would
expect.

---
OUTPUT SNIPPET:
NOTICE: SQL cmd is = SELECT COUNT(*) FROM zbus_tokens WHERE token =
PARKING
NOTICE: Row = 10, Skipped INSERT Count = 32, Word2 = PARING
NOTICE: SQL cmd is = SELECT COUNT(*) FROM zbus_tokens WHERE token =
COLLEEN
NOTICE: Row = 11, Skipped INSERT Count = 33, Word2 = COLLEEN

---
 Alban Hertroys wrote:
 On 9 Nov 2010, at 5:11, Ralph Smith wrote:

 Why is FOUND 'finding' and hence avoiding an INSERT?
 
  Not really sure what your point is (don't have time to look
closely), but...

   PERFORM Rnotice1(1,''SQL cmd is'',''SELECT COUNT(*) FROM
zbus_tokens WHERE token = ''||Word2::varchar) ;
   PERFORM (SELECT COUNT(*) FROM zbus_tokens WHERE token =
Word2) ;
  IF NOT FOUND THEN
   PERFORM RNotice1(1,''value'',Word2) ;
   INSERT INTO zbus_tokens (token) values(Word2); 
   J=J+1 ;
   IF J % 100 = 0 THEN
   PERFORM Rnotice2(1,''Row'',I,''Insert Count'',J) ;
   END IF ;
  ELSE
   K=K+1 ;
   PERFORM RNotice2(1,''Row'',I,''Skipped INSERT Count'',K) ;
  END IF ;
 You just connected this ELSE block to the IF statement it was nested
inside.
 You probably need to comment out the rest of this ELSE block as well.


  Alban Hertroys

--
Screwing up is an excellent way to attach something to the
ceiling.
(Assuming you're not turning the screw driver the wrong way.)


-- 
Ralph
_




Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Scott Marlowe
On Tue, Nov 9, 2010 at 11:10 AM, Sandeep Srinivasa s...@clearsenses.com wrote:
 hi,
    I am the OP.
 With due respect to everyone (and sincere apologies to Richard Broersma), my
 intention was not to create a thread about MySQL/Oracle's business
 practices.

Hehe, we head off on tangents.  It's common, don't worry.

 It was about the technical discussion on Highscalability - I have been
 trying to wrap my head around the concept of multiple-core scaling for
 Postgres, especially beyond 8 cores (like Scott's Magny-Cours example). My
 question is whether Postgres depends on the kernel scheduler for
 multiple CPU/core utilization.
 If that is the case, then does using FreeBSD vs Linux give rise to any
 differences in scaling?

All multi-process applications like pgsql have to depend on the OS
kernel scheduler to get their processes run.  But in terms of scaling,
that's usually not the biggest issue, it's getting rid of choke points
in the kernel like linux's much earlier versions having one big spin
lock on huge chunks of the kernel.  That's been gone a long time, but
as the number of cores keeps going up, new chokepoints are found and
fixes in both Linux and BSD.

 Taking the question one step further, do different Linux kernels (and
 schedulers) impact Postgres scalability? The Phoronix Test Suite already
 tests Linux kernel releases for performance regressions in Postgres DB
 performance

The IO scheduler mostly just gets in the way on bigger machines with
battery backed caching controllers and / or SAN arrays.

 (e.g. http://www.phoronix.com/scan.php?page=article&item=linux_perf_regressions&num=1),
 but doesn't particularly focus on multiple cores.
 Is it something that should be benchmarked?

Yes.  Sadly, to do so you really need a $7500 machine

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I guess I'm missing something here WRT FOUND

2010-11-09 Thread Tom Lane
Ralph Smith rsm...@10kinfo.com writes:
 ttYeah your right Alban, that looks bad, but it was an artifact of
 'try-this, try-this, no, try-this'.br
 br
 The table is empty, and unfortunately remains that way; nothing gets
 inserted.br
 I tried other variations, however FOUND just isn't behaving as I would
 think.br

(Please avoid html-encoded email.)

The original mail looked like you were trying to do

perform count(*) from something where something;
if found then ...

This will in fact *always* set FOUND, because the query always yields
exactly one row: that's the nature of aggregate functions.  FOUND
doesn't respond to whether the result of count(*) was zero or nonzero,
but just to the fact that it did deliver a result row.

You probably wanted something like

perform 1 from something where something;
if found then ...

which will set FOUND depending on whether there are any rows matching
the where-clause.  Or you could avoid FOUND altogether:

if exists(select 1 from something where something) then ...
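
For instance, a minimal sketch contrasting the two behaviours (the table
name is hypothetical; the DO statement requires 9.0):

CREATE TABLE t (v text);

DO $$
BEGIN
    PERFORM count(*) FROM t WHERE v = 'x';
    RAISE NOTICE 'after count(*): found = %', FOUND;  -- always t

    PERFORM 1 FROM t WHERE v = 'x';
    RAISE NOTICE 'after bare row: found = %', FOUND;  -- f: no row matches
END;
$$;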

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Gauthier, Dave
 -Original Message-
 From: pgsql-general-ow...@postgresql.org 
 [mailto:pgsql-general-ow...@postgresql.org] On Behalf  Of Tom Lane
 Sent: Tuesday, November 09, 2010 10:55 AM
 To: Vick Khera
 Cc: Scott Ribe; Allan Kamau; pgsql-general@postgresql.org
 Subject: Re: [GENERAL] Why facebook used mysql ? 

 Vick Khera vi...@khera.org writes:
  On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe scott_r...@killerbytes.com 
  wrote:
  Also, my understanding is that if you go way back on the PostgreSQL 
  timeline to versions 6 and earliest 7.x, it was a little shaky. (I 
  started with 7.3 or 7.4, and it has been rock-solid.)

  In those same times, mysql was also, um, other than rock solid.

 I don't have enough operational experience with mysql to speak to how
 reliable it was back in the day.  What it *did* have over postgres back
 then was speed.  It was a whole lot faster, particularly on the sort of
 single-stream-of-simple-queries cases that people who don't know
 databases are likely to set up as benchmarks.  (mysql still beats us on
 cases like that, though not by as much.)  I think that drove quite a
 few early adoption decisions, and now folks are locked in; the cost of
 conversion outweighs the (perceived) benefits.

A different slant on this has to do with licensing and $$. Might Oracle decide 
some day to start charging for their newfound DB?  They are a for-profit 
company that's beholden to their shareholders LONG before an open software 
community.  Consumers like Facebook and Google have deep pockets, something 
corporate executives really don't dismiss lightly.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread David Boreham


Also there's the strange and mysterious valley group-think syndrome.
I've seen this with several products/technologies over the years.
I suspect it comes from the VCs, but I'm not sure. The latest example
is "you should be using EC2". There always follows a discussion where
I can present 50 concrete reasons based on hard experience why
the suggestion is a bad idea, and the other person presents nothing
besides "everyone's doing it". I saw exactly the same thing with MySQL
a few years ago. Before that it was Oracle. It's often easier to go along
with the flow and get some work done vs. trying to argue.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Graham Leggett

On 09 Nov 2010, at 7:30 PM, David Boreham wrote:

Sorry but this is 100% not true. It may be true for a 3rd party (you
release something under the GPL, I enhance it, therefore I am
required to release my enhancement under the GPL). But Oracle owns
the copyright to the MySQL code and therefore they can decide to do
whatever they want with it. The only thing they can't do is to
'un-release' existing code released under the GPL. Everything else is
possible.


Ownership of the copyright trumps the GPL.


The copyright is owned by whoever made the contribution,
and any competent version control system will give you the list of
contributions (and therefore contributors). If a contribution was made
under the terms of the GPL, then permission would need to be sought from
everyone who has made a contribution before it could be released under
a different license.


Regards,
Graham
--


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Scott Marlowe
On Tue, Nov 9, 2010 at 1:04 PM, Graham Leggett minf...@sharp.fm wrote:
 On 09 Nov 2010, at 7:30 PM, David Boreham wrote:

 Sorry but this is 100% not true. It may be true for a 3rd party (you
 release something under the GPL, I enhance it, therefore I am required to
 release my enhancement under the GPL). But Oracle owns the copyright to the
 MySql code and therefore they can decide to do whatever they want with it.
 The only thing they can't do is to 'un-release' existing code released under
 the GPL. Everything else is possible.

 Ownership of the copyright trumps the GPL.

 Ownership of the copyright is owned by whoever made the contribution, and
 any competent version control system will give you the list of contributions
 (and therefore contributors). If a contribution was made in terms of the
 GPL, then permission would need to be sought from everyone who has made a
 contribution before it could be released under a different license.

Code contributed to MySQL AB MUST have its copyright assigned to MySQL AB.
 If it's been incorporated into MySQL proper, it's owned by MySQL AB,
now Oracle.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Tom Lane
Scott Marlowe scott.marl...@gmail.com writes:
 On Tue, Nov 9, 2010 at 1:04 PM, Graham Leggett minf...@sharp.fm wrote:
 Ownership of the copyright is owned by whoever made the contribution, and
 any competent version control system will give you the list of contributions
 (and therefore contributors). If a contribution was made in terms of the
 GPL, then permission would need to be sought from everyone who has made a
 contribution before it could be released under a different license.

 Contributed code to MySQL AB MUST be assigned copyright to MySQL AB.

Yeah, MySQL AB and successors have been very careful to ensure that they
have air-tight ownership of that code.  I've been asked for copyright
assignments for four-line patches :-(

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Dann Corbit
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Sandeep Srinivasa
Sent: Tuesday, November 09, 2010 10:10 AM
To: Lincoln Yeoh
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Why facebook used mysql ?

hi,
   I am the OP.

With due respect to everyone (and sincere apologies to Richard Broersma), my 
intention was not to create a thread about MySQL/Oracle's business practices.

It was about the technical discussion on Highscalability - I have been trying 
to wrap my head around the concept of multiple-core scaling for Postgres, 
especially beyond 8 cores (like Scott's Magny-Cours example). My question is 
whether Postgres depends on the kernel scheduler for multiple CPU/core 
utilization.

If that is the case, then does using FreeBSD vs Linux give rise to any 
differences in scaling?

Taking the question one step further, do different Linux kernels (and 
schedulers) impact Postgres scalability? The Phoronix Test Suite already tests 
Linux kernel releases for performance regressions in Postgres DB performance 
(e.g. 
http://www.phoronix.com/scan.php?page=article&item=linux_perf_regressions&num=1),
but doesn't particularly focus on multiple cores.

Is it something that should be benchmarked?

thanks
-Sandeep

P.S. on the topic of scalability, here is another article - 
http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html
 , where people have asked if a similar thing can be done using Postgres UDF or 
a marshalling  ODBA  http://scm.ywesee.com/?p=odba/.git;a=summary

Regarding scaling, there is an interesting NoSQL engine called Kyoto Cabinet 
that has been tested for high-volume transactions under different loads and 
different conditions.
The Kyoto Cabinet data engine is written by the Tokyo Cabinet author, Mr. 
Hirabayashi of Fallabs.com.  In this document:
http://fallabs.com/kyotocabinet/spex.html
we find something interesting.  In the section called Transaction, we have this:
=
default
  risk on process crash: Some records may be missing.
  risk on system crash: Some records may be missing.
  performance penalty: none
  remark: Auto recovery after a crash will take time in proportion to the
  database size.
transaction
  implicit usage: open(..., BasicDB::OAUTOTRAN);
  explicit usage: begin_transaction(false); ...; end_transaction(true);
  risk on process crash: none
  risk on system crash: Some records may be missing.
  performance penalty: Throughput will be down to about 30% or less.
transaction + synchronize
  implicit usage: open(..., BasicDB::OAUTOTRAN | BasicDB::OAUTOSYNC);
  explicit usage: begin_transaction(true); ...; end_transaction(true);
  risk on process crash: none
  risk on system crash: none
  performance penalty: Throughput will be down to about 1% or less.
=

Notice that there is a 3:1 penalty for flushing requests from the program to 
the operating system, and a 100:1 penalty for hard flushing from the 
operating system to the disk.
So the simple way to scale to huge volumes is simply to allow data loss.  That 
is a major way in which NoSQL data systems can achieve absurd transaction rates.
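
For comparison, PostgreSQL (8.3 and later) exposes a milder form of the same 
trade-off; this is a sketch for illustration, not something from the Kyoto 
Cabinet document:

-- Commit returns before the WAL is flushed to disk; an OS crash can lose
-- the last few transactions, but the database itself stays consistent.
SET synchronous_commit = off;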

There are also distributed hash tables (these are also called NoSQL engines, 
but they are an entirely different technology).  With a distributed hash table, 
you can get enormous scaling and huge transaction volumes.
http://en.wikipedia.org/wiki/Distributed_hash_table
Distributed hash tables are another kind of key/value store {but an entirely 
different technology compared to traditional key/value stores like DBM}.

When we design a data system, we should examine the project requirements and 
choose appropriate tools to solve the problems facing the project.

For something like Facebook, a Key/Value store is a very good solution.  You do 
not have a big collection of related tables, and there are no billion dollar 
bank transactions taking place where someone will get a bit bent out of shape 
if it goes missing.

For an analytic project where we plan to do cube operations, a column store 
like MonetDB is a good idea.

For a transactional system like Point Of Sale or Accounting, a traditional 
RDBMS like PostgreSQL is the best solution.

I think that an interesting path for growth would be to expand PostgreSQL to 
allow different table types.  For instance, all leaf tables (those tables 
without any children) could easily be Key/Value stores.  For analytics, create 
column store tables.  For ultra-high access, have a distributed hash table.

But right now, most RDBMS systems do not have these extra, special table types. 
 So if you want tools that do those things, then use those tools.

IMO-YMMV



Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread David Boreham

On 11/9/2010 11:36 AM, Sandeep Srinivasa wrote:


If it is independent of the OS, then how does one go about tuning it?

Consider this - I get a 12-core server on which I want multiple 
webserver instances + DB. Can one create CPU pools (say cores 1,2,3 for 
webservers, 4,5,6,7 for DB, etc.)?


I know about taskset, but should one be using it?


There are plenty of things you might do, but first you need to figure 
out what problem you're solving.
I'd suggest deploying a relatively simple configuration, then evaluating 
its capacity under your workload.
Does it run fast enough? If so, then job done. If not, then why not, and 
so on...


The simplest configuration would be one web server instance and one DB 
instance.


I don't think you should be looking at process partitioning and core 
affinity unless you have already proved that
you have processes that don't scale over the cores you have, to deliver 
the throughput you need.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Thoughts on a surrogate key lookup function?

2010-11-09 Thread Merlin Moncure
On Sat, Nov 6, 2010 at 1:01 PM, Nick nboutel...@gmail.com wrote:
 Merlin, thanks for the reply. Yes, using email_addresses was a very
 silly example. Maybe the following is a better example...

 CREATE TABLE first_names (id INT, first_name VARCHAR);
 ALTER TABLE first_names ADD CONSTRAINT first_names_pkey PRIMARY KEY
 (id,first_name);
 ALTER TABLE first_names ADD CONSTRAINT first_names_unique_first_name
 UNIQUE (first_name);

 CREATE TABLE last_names (id INT, last_name VARCHAR);
 ALTER TABLE last_names ADD CONSTRAINT last_names_pkey PRIMARY KEY
 (id,last_name);
 ALTER TABLE last_names ADD CONSTRAINT last_names_unique_last_name
 UNIQUE (last_name);

 CREATE TABLE referrals (id INT, ref_code VARCHAR);
 ALTER TABLE referrals ADD CONSTRAINT referrals_pkey PRIMARY KEY
 (id,ref_code);
 ALTER TABLE referrals ADD CONSTRAINT referrals_unique_ref_code UNIQUE
 (ref_code);

 CREATE TABLE users (id INT, first_name_id INT, first_name VARCHAR,
 last_name_id INT, last_name VARCHAR, ref_code_id INT, ref_code DATE);
 ALTER TABLE users ADD CONSTRAINT users_fkey_first_names FOREIGN KEY
 (first_name_id,first_name) REFERENCES first_names(id,first_name) ON
 UPDATE CASCADE ON DELETE SET NULL;
 ALTER TABLE users ADD CONSTRAINT users_fkey_last_names FOREIGN KEY
 (last_name_id,last_name) REFERENCES last_names(id,last_name) ON UPDATE
 CASCADE ON DELETE SET NULL;
 ALTER TABLE users ADD CONSTRAINT users_fkey_referrals FOREIGN KEY
 (ref_code_id,ref_code) REFERENCES referrals(id,ref_code) ON UPDATE
 CASCADE ON DELETE SET NULL;
 CREATE TRIGGER auto_first_name_id BEFORE INSERT OR UPDATE ON users
 FOR EACH ROW EXECUTE PROCEDURE _auto_id('first_name_id');
 CREATE TRIGGER auto_last_name_id BEFORE INSERT OR UPDATE ON users
 FOR EACH ROW EXECUTE PROCEDURE _auto_id('last_name_id');
 CREATE TRIGGER auto_ref_code_id BEFORE INSERT OR UPDATE ON users FOR
 EACH ROW EXECUTE PROCEDURE _auto_id('ref_code_id');

 If I want to insert a new user, first name, and last name, I
 give the surrogates a value of NULL or -1. Their referral code must
 already exist, so I'll give that surrogate a value of 0.
 INSERT INTO users (id, first_name_id, first_name, last_name_id,
 last_name, ref_code_id, ref_code) VALUES (1,-1,'John',-1,'Doe',
 0,'xyz') which would...

 SELECT id FROM first_names WHERE first_name = 'John' INTO
 NEW.first_name_id
 IF NOT FOUND INSERT INTO first_names (first_name) VALUES ('John')
 RETURNING id INTO NEW.first_name_id

 SELECT id FROM last_names WHERE last_name = 'Doe' INTO
 NEW.last_name_id
 IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Doe')
 RETURNING id INTO NEW.last_name_id

 SELECT id FROM referrals WHERE ref_code = 'xyz' INTO
 NEW.ref_code_id
 IF NOT FOUND raise exception

 If I want to insert the new user John Smith, and I already know the
 surrogate value for John and I don't want to add a ref_code, then I can
 do...
 INSERT INTO users (id, first_name_id, first_name, last_name_id,
 last_name, ref_code_id, ref_code) VALUES
 (2,1,NULL,-1,'Smith',NULL,NULL) which would...

 SELECT first_name FROM first_names WHERE id = 1 INTO NEW.first_name
 IF NOT FOUND raise exception

 SELECT id FROM last_names WHERE last_name = 'Smith' INTO
 NEW.last_name_id
 IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Smith')
 RETURNING id INTO NEW.last_name_id

 So by adding both the surrogate and natural keys to the users table and
 toggling the surrogate on insert by 0 (must exist) or -1 (select or
 insert), I can bypass a much more complex insert statement. Is this
 frowned upon? I haven't had many issues (and some I've been able to work
 around) with this as a plperl trigger and am pleased with how much
 easier it makes my inserts (besides the execution speed).

It's a neat idea, but all things considered, you are better off using
one of the two approaches I outlined above:
*) Your idea needs an extra composite index on two fields (each unique)
that serves no integrity purpose
*) Referring table has extra fields; pick natural or surrogate, but not both
*) It's not faster.  Any way you slice this, you need a lookup on the
master table, even if the system does it internally through RI in the
purely natural case. "insert ... where not exists" will do exactly what
you are doing above, and does it in one statement, not two.

If you want to do this inside the database, it's more common to do it in
a regular function, not a trigger function.  Just make a function
insert_user() that handles the logic checking and dispenses with all the
extra fields, along the lines of the sketch below...
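
A minimal sketch of that approach (assuming, unlike the example DDL above,
that the id columns are sequence-backed so that RETURNING id yields a value):

CREATE OR REPLACE FUNCTION insert_user(_first varchar, _last varchar,
                                       _ref varchar)
RETURNS void AS $$
DECLARE
    _fid int;
    _lid int;
    _rid int;
BEGIN
    -- select-or-insert the first name
    SELECT id INTO _fid FROM first_names WHERE first_name = _first;
    IF NOT FOUND THEN
        INSERT INTO first_names (first_name) VALUES (_first)
        RETURNING id INTO _fid;
    END IF;

    -- select-or-insert the last name
    SELECT id INTO _lid FROM last_names WHERE last_name = _last;
    IF NOT FOUND THEN
        INSERT INTO last_names (last_name) VALUES (_last)
        RETURNING id INTO _lid;
    END IF;

    -- the referral code must pre-exist
    SELECT id INTO _rid FROM referrals WHERE ref_code = _ref;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'unknown referral code %', _ref;
    END IF;

    -- surrogate keys only; no duplicated natural-key columns
    INSERT INTO users (first_name_id, last_name_id, ref_code_id)
    VALUES (_fid, _lid, _rid);
END;
$$ LANGUAGE plpgsql;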

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Scott Marlowe
On Tue, Nov 9, 2010 at 4:12 PM, David Boreham david_l...@boreham.org wrote:

 I don't think you should be looking at process partitioning and core
 affinity unless you have already proved that
 you have processes that don't scale over the cores you have, to deliver the
 throughput you need.

Note that you're likely to get FAR more out of processor affinity with
multiple NICs, each assigned to its own core / set of cores that share
L3 cache and such.  Having the NICs and maybe RAID controllers and /
or fibre channel cards etc. on their own set of cores in one group can
make a big difference.

Processor affinity doesn't seem to make much difference for me with
pgsql.  Modern Linux schedulers are pretty good at keeping things on
the same core for a while without predefined affinity.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread David Boreham

On 11/9/2010 5:05 PM, Scott Marlowe wrote:

Note that you're likely to get FAR more out of processor affinity with
multiple NICs assigned each to its own core / set of cores that share
L3 cache and such.Having the nics and maybe RAID controllers and /
or fibre channel cards etc on their own set of cores in one group can
make a big difference.


Be careful though: this phenomenon typically only comes into play at 
very high I/O rates.
I wouldn't want to send the OP down this path without first verifying 
that he has a problem.

Most folks are not trying to push 100k requests/s out of their web servers.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Using EXCLUDE in 9.0 with operator ...

2010-11-09 Thread Derek Harland
Hallo all,

I have a question about Exclusion constraints in 9.0:
* Let's say I have a table with two columns X and Y (of type varchar or int)
* Is there any way I can add an EXCLUDE constraint that says: for each 
value of X, the values of Y must be the same?
* So the following rows are ok:

  X | Y
  
  A , 1
  A , 1
  B , 1
  B , 1
  C , 2
  C , 2

But adding 

  A , 3

should fail, as everything with value X = A should have the same value as 
existing data (where Y = 1)

Theoretically the following would be nice if it worked

  EXCLUDE (X WITH =, Y WITH <>)

but it complains that 

  ERROR:  operator <>(text,text) is not a member of operator family text_ops

because the btree index method only allows = in an exclude constraint.  Or am I 
missing a simpler way of doing this?  (without having to create, and then 
foreign key to, another table, which is obviously a possibility)

des.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Using EXCLUDE in 9.0 with operator ...

2010-11-09 Thread Jeff Davis
On Wed, 2010-11-10 at 13:45 +1300, Derek Harland wrote:
 Theoretically the following would be nice if it worked
 
   EXCLUDE (X WITH =, Y WITH <>)
 
 but it complains that 
 
   ERROR:  operator <>(text,text) is not a member of operator family
 text_ops
 
 because the Btree index method only allows = in an exclude constraint.
 Or am I missing a simpler way of doing this?  (without having to
 create and then foreign key to another table, which is obviously a
 possibility)
 

Try using 9.1alpha ( http://www.postgresql.org/developer/alpha ) and
installing btree_gist. Then, use:

  EXCLUDE USING gist (X WITH =, Y WITH <>)

In 9.0 this particular constraint won't work because there is an
additional sanity check in the code that won't pass if the operator is
<>. The sanity check was deemed worthwhile for the first release of
the feature, but will be lifted in version 9.1.

Also, <> doesn't work (yet) with btree, but in principle there is no
reason why not. Perhaps for 9.1 as well.
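
A minimal sketch under those assumptions (9.1alpha with btree_gist
installed; the table and values are invented to match the example above):

CREATE TABLE grouped (
    x text,
    y text,
    EXCLUDE USING gist (x WITH =, y WITH <>)
);

INSERT INTO grouped VALUES ('A', '1');  -- ok
INSERT INTO grouped VALUES ('A', '1');  -- ok: same Y for this X
INSERT INTO grouped VALUES ('A', '3');  -- rejected by the constraint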

Can you please share your use case for this particular constraint? I'd
like to hear it.

Regards,
Jeff Davis


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Libpq is very slow on windows but fast on linux.

2010-11-09 Thread Rob Brown-Bayliss
Further testing shows it is Windows networking causing the issue.
Copying files to and from the server is 5 to 6 times slower on a
Windows client compared to the Linux client.

The issue is not specific to libpq.
--

Rob

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Libpq is very slow on windows but fast on linux.

2010-11-09 Thread Dann Corbit
 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 ow...@postgresql.org] On Behalf Of Rob Brown-Bayliss
 Sent: Tuesday, November 09, 2010 8:20 PM
 To: pgsql-general@postgresql.org
 Subject: Re: [GENERAL] Libpq is very slow on windows but fast on linux.
 
 Further testing shows it is windows networking causing the issue.
 Copying files to and from the server is 5 to 6 times slower on a
 Windows client compared to the Linux client.
 
 The issue is not specific to libpq.

Check your Nagle setting.
http://support.microsoft.com/kb/138831

See also:
http://apocryph.org/2006/07/08/achieving_maximum_tcp_performance_on_windows_xp_2k3/

P.S.
Iperf is really useful to see how well things are going as a diagnostic tool:
http://sourceforge.net/projects/iperf/files/


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Libpq is very slow on windows but fast on linux.

2010-11-09 Thread John R Pierce

On 11/09/10 9:53 PM, Dann Corbit wrote:

Check your Nagle setting.
http://support.microsoft.com/kb/138831


that link works for SNA Server, which is some real obscure stuff, but 
not the general case


try this, http://www.speedguide.net/articles/windows-2kxp-more-tweaks-158
where it says _Gaming Tweak - Disable Nagle's algorithm_

This page is talking about games, but this is the exact same thing.

And I still bet the OP needs to increase his TCP receive window to 
something like 320k, given the kind of latency mentioned.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Libpq is very slow on windows but fast on linux.

2010-11-09 Thread Alban Hertroys
On 10 Nov 2010, at 5:19, Rob Brown-Bayliss wrote:

 Further testing shows it is windows networking causing the issue.
 Copying files to and from the server is 5 to 6 times slower on a
 Windows client compared to the Linux client.
 
 The issue is not specific to libpq.

Do both machines have similar hardware? If the Linux machine has a proper NIC 
(intel, for example) while the Windows machine has a poor NIC (Realtek!), then 
of course you would get differences in performance.

I'm not saying Microsoft's network stack is particularly good, mind you, just 
that it isn't necessarily the main cause. One of the problems Windows is facing 
is that, while supporting a lot of hardware is a good thing in general, it also 
supports all the crap hardware, crap drivers and crap ACPI implementations.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Libpq is very slow on windows but fast on linux.

2010-11-09 Thread Antonio Goméz Soto

On 10-11-10 08:22, Alban Hertroys wrote:

On 10 Nov 2010, at 5:19, Rob Brown-Bayliss wrote:


Further testing shows it is windows networking causing the issue.
Copying files to and from the server is 5 to 6 times slower on a
Windows client compared to the Linux client.

The issue is not specific to libpq.


Do both machines have similar hardware? If the Linux machine has a proper 
NIC (Intel, for example) while the Windows machine has a poor NIC 
(Realtek!), then of course you would get differences in performance.




Not that much. In the past I have seen differences in performance between 
Intel and Realtek of 20-30%, yes, but never by a factor of 5 or 6.

Antonio


I'm not saying Microsoft's network stack is particularly good, mind you, 
just that it isn't necessarily the main cause. One of the problems Windows 
is facing is that, while supporting a lot of hardware is a good thing in 
general, it also supports all the crap hardware, crap drivers and crap ACPI 
implementations.


Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.








--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Libpq is very slow on windows but fast on linux.

2010-11-09 Thread John R Pierce

On 11/09/10 11:22 PM, Alban Hertroys wrote:
Do both machines have similar hardware? If the Linux machine has a 
proper NIC (intel, for example) while the Windows machine has a poor 
NIC (Realtek!), then of course you would get differences in performance.

I'm not saying Microsofts network stack is particularly good, mind you, just 
that it isn't necessarily the main cause. One of the problems Windows is facing 
is that, while supporting a lot of hardware is a good thing in general, it also 
supports all the crap hardware, crap drivers and crap ACPI implementations.



He has high speed connections with 350ms ping times to the server that's 
halfway around the world.  Windows XP, at least, did not default to 
an RWIN over 64k.  He needs about 300K for that link, give or take, 
depending on how fast the wires are.  He can go into the registry (of 
the XP client) and bump his RWIN to something larger than ping time (in 
seconds) * wire speed (in bytes/sec)  ...  example:  0.350 ping * 700K 
bytes/sec = ~250K, so use 300k or 400k for the TCP Receive Window size.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general