Re: [HACKERS] get rid of psql welcome message

2008-04-18 Thread Naz Gassiep


Tom Lane wrote:

Well, in general the *variable* parts of the banner were all put there
because of fairly urgent need, and I'd resist removing them.  It's the
unchanging boilerplate that seems open to debate.

I'm +1 for cutting that down to a single line.  I don't care one way or
the other about providing a .psqlrc option to suppress it altogether.
  


It could be that even optional removal of the version number is a 
foot-gun for users who carelessly lose track of which version they are 
running and do something with it (such as rsyncing another server's data 
directory, or something similarly silly) while expecting a different 
version.


I don't see how, if it were reduced to a single line, the indication of 
version number could possibly be considered problematic under any 
circumstances.
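
(For reference, per-session suppression already exists today; a minimal
example:)

   psql -q mydb    # -q ("quiet") suppresses the welcome banner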


Regards,
- Naz.



Re: [HACKERS] Timezone view

2008-02-20 Thread Naz Gassiep

Tom Lane wrote:

Naz Gassiep [EMAIL PROTECTED] writes:
  
I think that it would be great if the pg_timezone_names and 
pg_timezone_abbrevs included a boolean field indicating if that item is 
in the Olsen DB



Huh?  They're all in the Olson DB.

Not true: the zone.tab file has 398 zones defined, and in my PG 8.2 
running on Debian, there are 564 timezones listed in pg_timezone_names. 
The field I propose would indicate which 398 of those 564 are listed in 
zone.tab, as those are the actual Olson-defined timezones; the rest are 
aliases.
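
To make the discrepancy concrete, a sketch (assuming zone.tab has been
loaded into a scratch table zone_tab(name text, ...) after stripping its
'#' comment lines):

   -- the rows returned are the aliases: view entries with no
   -- zone.tab counterpart
   SELECT n.name
     FROM pg_timezone_names n
    WHERE NOT EXISTS (SELECT 1 FROM zone_tab z WHERE z.name = n.name);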


Regards,
- Naz.



Re: [HACKERS] Timezone view

2008-02-20 Thread Naz Gassiep

Alvaro Herrera wrote:

Naz Gassiep wrote:

  
It may also be beneficial to add the ISO 3166 column into that view; the 
data is in zone.tab and I can't see a reason not to include it.



We also have the country name in iso3166.tab and the geo coordinates.
And there is also a comment field.  Which ones make sense to add?


I'd say the country name and geo coordinates both make sense to add.

Regards,
- Naz.



[HACKERS] Timezone view

2008-02-13 Thread Naz Gassiep
I brought this up a while ago, but I didn't get any responses; I assume 
that was due to everyone being too busy with 8.3.


I think that it would be great if the pg_timezone_names and 
pg_timezone_abbrevs views included a boolean field indicating if that 
item is in the Olson DB or if it is a system alias or other added item. 
This would make it far easier to integrate the data in the view with 
external data sources that also use the Olson DB.


It may also be beneficial to add the ISO 3166 column into that view; the 
data is in zone.tab and I can't see a reason not to include it.
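
A hypothetical query if the proposal were adopted — is_olson and
country_code are invented names for the proposed columns; neither exists
today:

   SELECT name, abbrev, utc_offset, country_code
     FROM pg_timezone_names
    WHERE NOT is_olson;   -- i.e. list only the alias entries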


- Naz.



[HACKERS] Data from zone.tab

2008-01-08 Thread Naz Gassiep
Is there any reason that the zone.tab information is not included in the 
pg_timezone_names system view? ISTM that there is really no reason not 
to, as that view is really populated using that file anyway. There is a 
1:1 mapping (assuming the aliases are mapped to the zone.tab entries 
they are aliases of) of entries in that view with entries in zone.tab.


Reading an earlier thread on this matter, I think Magnus is behind the 
code that generates the view. What are the chances of getting at least 
the country code included in the pg_timezone_names system view? It'd 
really help out with i18n / L10n work, and given that PG already ships 
with that data present, it seems silly to not take advantage of it given 
how easy it would be to do so.




Re: [HACKERS] Data from zone.tab

2008-01-08 Thread Naz Gassiep
Sorry to reply to my own post, but there should also be an is_alias field 
in the system view so that devs are able to tell which zone names are in 
the zone.tab file and which are not. That way a perfect 1:1 mapping between 
zone.tab and app can be made. If this were done, it would make things 
like using CLDR data and other standardized data sources easier, as you 
could be confident that all timezone names matched the data in the CLDR.


I think what I'm trying to say is that using and applying standards is a 
good thing.


- Naz.

Naz Gassiep wrote:
Is there any reason that the zone.tab information is not included in 
the pg_timezone_names system view? ISTM that there is really no reason 
not to, as that view is really populated using that file anyway. There 
is a 1:1 mapping (assuming the aliases are mapped to the zone.tab 
entries they are aliases of) of entries in that view with entries in 
zone.tab.


Reading an earlier thread on this matter, I think Magnus is behind the 
code that generates the view. What are the chances of getting at least 
the country code included in the pg_timezone_names system view? It'd 
really help out with i18n / L10n work, and given that PG already ships 
with that data present, it seems silly to not take advantage of it 
given how easy it would be to do so.




Re: [HACKERS] Spoofing as the postmaster

2007-12-28 Thread Naz Gassiep
The problem with forcing authentication is that an auth-unaware client 
connecting to a legitimate postmaster would have its connections 
refused. That same client would have its connections accepted by an 
impostor postmaster. Thus, there is no way to stop impostor postmasters 
from carrying out these attacks on auth-unaware clients.


The proper solution, as I see it, would be to have an authentication 
system in the postmaster that was not enforced. If the client requests 
authentication, postmaster will provide it, if not, then postmaster will 
connect normally without it. This would not result in *any* change in 
the default behavior of postmaster, and as far as users who don't want 
to use it are concerned, they don't even need to bother to turn it off 
(assuming that having it turned on does not consume extra resources and 
I don't think having an unused authentication mechanism sitting in the 
postmaster connection establishment routine would).


At first glance this does not appear to result in greater security, but 
it does. It allows DBAs who suspect that they are likely to be the target 
of these attacks to deploy authentication procedures in their client 
packages. This could be a modification to their applications, or 
whatever steps are necessary to mandate authenticated connections within 
their organization.


There is no point forcing some auth mechanism within postmaster, as 
attackers would simply catch users using software that did not require 
the server to auth before sending passwords. For this reason it is not 
postmaster's responsibility to check that unknown clients do not connect 
to impostors; it is postmaster's responsibility, however, to authenticate 
itself if the client asks for it. So the onus (rightfully, in my 
opinion) falls upon network administrators / DBAs to ensure that all of 
their users are using auth-enabled client packages which will not allow 
connections to be established with a postmaster until authentication has 
passed, and to disallow the use of other client software to connect to 
postmaster.


In my view, this puts the security responsibility where it rightfully 
belongs *and* avoids breaking client packages in the wild. Making the 
server *require* auth and disallow non-authed clients is pointless, as 
there is nothing stopping attackers from setting up an auth-disabled 
impostor and waiting for someone to just connect using psql or some other 
vanilla connection method. The onus really ought to be with the 
administrators who supply their users with the software they use to 
connect, to ensure that the software adheres to the relevant security 
policy, in the same way that it's their responsibility to ensure that the 
client software does not contain keyloggers and other such trashware.


In the web world, it is the client's responsibility to ensure that they 
check the SSL cert and don't do their banking at 
www.bankofamerica.hax0r.ru and there is nothing that the real banking 
site can do to stop them using their malware infested PC to connect to 
the phishing site. They can only provide a site that provides 
authentication. This is analogous to postmaster: It is only the 
responsibility of postmaster to provide the option of authentication, it 
is the client's responsibility to know if they should use it, and if so, 
to ensure they do so properly.
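
(As a footnote to this analogy: libpq did eventually grow exactly this
kind of opt-in server authentication via TLS certificate checking —
sslmode=verify-full arrived in 8.4, after this thread. A sketch, with
illustrative host and file names:)

   psql "host=db.example.com dbname=app sslmode=verify-full sslrootcert=root.crt"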


Regards,
- MrNaz.com



Re: [HACKERS] todo: Hash index creation

2007-07-03 Thread Naz Gassiep




Wow... not sure how I missed that. I *did* create this schema ages ago,
perhaps it wasn't there, or at the time I had no idea what the
implications were. *shrug*
Regards,
- Naz.


Tom Lane wrote:

  Naz Gassiep [EMAIL PROTECTED] writes:
  
  
As a result, when creating tables containing large blocks of text I wish
to index, I've been using HASH as an index method. Please can we state
in the manual that HASH index types are in a beta stage of development
or something similar, or perhaps remove the manual entry altogether
until HASH is at a point where it is usable in production.

  
  
Uh, the manual already does say

Note: Testing has shown PostgreSQL's hash indexes to perform no better
than B-tree indexes, and the index size and build time for hash indexes
is much worse. Furthermore, hash index operations are not presently
WAL-logged, so hash indexes might need to be rebuilt with REINDEX after
a database crash. For these reasons, hash index use is presently
discouraged.

under 11.2 Index Types, as well as various derogatory remarks elsewhere.

			regards, tom lane

  





Re: [HACKERS] todo: Hash index creation

2007-07-01 Thread Naz Gassiep

 Actually I think the *most* important thing to work on is to get hash to
 the point where its search speed actually beats btree consistently, so
 that it has an excuse to live.  If that is insoluble we might well end up
 ripping it out entirely.  (The first three TODO items for hash indexes
 are ideas for trying to improve the speed.)

 Fixing the WAL support would come after that, and bring it to the point
 where someone could actually recommend it for production use.

 After that it would be sensible to work on inessentials like improving
 the build speed.
I've been warned away from hash indexes before, however I had no idea
that its performance was so abysmal that BTREE beats it, and I was
definitely not aware that they were not included in WAL logs. I was told
it wasn't as good as it could be, but I wasn't told it was pretty much
an alpha piece of code.

As a result, when creating tables containing large blocks of text I wish
to index, I've been using HASH as an index method. Please can we state
in the manual that HASH index types are in a beta stage of development
or something similar, or perhaps remove the manual entry altogether
until HASH is at a point where it is usable in production.
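
Concretely, the pattern being described, plus the usual workaround
(table and column names illustrative):

   -- what was being done (discouraged at the time; hash indexes were
   -- not WAL-logged until much later releases):
   CREATE INDEX articles_body_hash ON articles USING hash (body);

   -- a common alternative: a btree over a fixed-size digest of the
   -- large text column
   CREATE INDEX articles_body_md5 ON articles (md5(body));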

Regards,
A very surprised n00b.



Re: [HACKERS] pg_get_tabledef

2007-05-21 Thread Naz Gassiep
Just a question: is there any advantage to having this over building a
function in applications that wraps pg_dump with a few options?
Surely that's a more appropriate way to achieve this functionality?
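
For instance (names illustrative), a single table's DDL can already be
pulled with the existing tool:

   pg_dump -s -t dept mydb > dept_def.sql
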
- Naz.


Usama Munir wrote:
 Hi,

 i was following a thread some time ago where adding a function
 *pg_get_tabledef* was one of the TODOs for 8.2, but it somehow didn't
 make it to the release perhaps because the functionality was not
 clearly defined? not sure.

 Anyway  i happen to come up with a function for pg_get_tabledef which
 works something like following

 postgres=# select pg_get_tabledef(16388);
          pg_get_tabledef
 ----------------------------------
  CREATE TABLE  public.dept
  (
    deptno numeric(2,0) NOT NULL,
    dname  character varying(14),
    loc    character varying(13)
  )
  WITHOUT OIDS;
 (1 row)



 i wanted to submit a patch for this, IFF the community wants this
 function. The rationale is obviously to help Application developers
 writing applications like pgAdmin. Currently this part of SQL needs to
 be constructed manually for postgres by the tools.

 it is arguable that a table definition will have constraints, triggers
 etc. as well, and they can be added without much problem, but i think
 if a tool needs to construct SQL for all table-related objects then
 functions are already available for them, like pg_get_constraintdef,
 pg_get_ruledef, pg_get_indexdef, pg_get_triggerdef etc.

 i understand that you guys don't like someone to develop a patch in
 isolation and just come up with it one day, but it really came out as
 a by-product of some other work , and i thought you guys might be
 interested in it.

 if it is desired,  i will submit  a patch for it, within a day or so.

 Regards,
 Usama Munir
 EnterpriseDB (www.enterprisedb.com)
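
For reference, the per-object functions mentioned above do already
exist; a sketch using the illustrative dept table from the example
output:

   SELECT conname, pg_get_constraintdef(oid)
     FROM pg_constraint
    WHERE conrelid = 'public.dept'::regclass;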






Re: [HACKERS] Feature freeze progress report

2007-05-01 Thread Naz Gassiep
Andrew Dunstan wrote:
 Naz Gassiep wrote:
 I believe the suggestion was to have an automated process that only ran
 on known, sane patches.
 How do we know in advance of reviewing them that they are sane?
Same way as happens now. I would assume this mechanism would only be
applied to patches that had already been approved to contrib, or some
other measure that can be used to isolate only those patches that we
*expect* to already be working. The intention of this mechanism, in my
head, is to just help us make sure that regression issues on patches get
detected sooner.
 What is more, we often run into situations where patch a will require
 changes in patch b, so testing them individually against CVS is not
 likely to be terribly useful.
Yep, given that this proposition is for an automated system, perhaps it
could be designed to apply combinations of patches together to look for
conflicts.
 Frankly, our problems are not primarily technological. They have to do
 mainly with scarcity of available time from competent reviewers. No
 amount of automation will fix that.
I fully understand that. However I find the idea of an automated process
checking for big issues while we're all sleeping to be... sexy. I'm not
sure how difficult a system like this would be to set up but it doesn't
seem to me to be the sort of thing that requires more than a few simple
scripts. If it's not too hard to set up, even if it only yields small and
rare benefits, it will have been a worthwhile exercise.

My 2c (adjusted for inflation).

Regards,
- Naz



Re: [HACKERS] Feature freeze progress report

2007-05-01 Thread Naz Gassiep

 What is approved to contrib?

 The problem here is that having reasonable certainty that a patch is
 not malicious requires having gone over it in some detail; at which
 point you might as well apply the thing.  Or if you didn't apply it,
 you bounced it for reasons that are unlikely to have anything to do
 with needing more automated testing.

 ISTM this idea can only work if we have a second tier of reviewers
 who are considered good enough to vet patches as safe, but not quite
 good enough to certify them as commitable.  I'm not seeing a large pool
 of people volunteering to hold that position --- at best it'd be a
 transitory state before attaining committerdom.  If you are relying
 on a constant large influx of new people, you are doomed to failure
 (see Ponzi scheme for a counterexample).
Yep. For the record, Ponzi died in poverty, so it's not a counterexample;
it just proves that any gains that are had will be short-lived and
increase the size of the crash when crunch time comes. :)
- Naz.



Re: [HACKERS] Feature freeze progress report

2007-04-30 Thread Naz Gassiep
I believe the suggestion was to have an automated process that only ran
on known, sane patches. I don't think he was suggesting a mechanism for
the great unwashed masses to dump arbitrary code into and have it
applied in the buildfarm. You'd have an inventory of patches (you could
use a hash to ensure they hadn't changed just before they are
automatically applied) that were verified as good, and the system would
apply them to HEAD periodically.

Even if the patch inventory wasn't kept right up to date, this system
could potentially help many regression issues or bugs to surface sooner,
and as it would require zero work once set up besides system maintenance
(which should be low if it is implemented in a reasonably intelligent
manner) I feel that it is a great idea. Generally, I am all for
automating mundane tasks as much as possible.
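
A rough sketch of such a nightly loop, under the stated assumptions
(every name, path, and the checksum-manifest layout here is
illustrative; no such system existed):

   for p in vetted-patches/*.patch; do
     md5sum -c "$p.md5" || continue       # skip if the vetted checksum no longer matches
     rm -rf work && cp -r HEAD work       # fresh copy of today's HEAD checkout
     ( cd work && patch -p1 < "../$p" && ./configure -q && make -s check ) \
         && echo "OK: $p" \
         || echo "BITROT: $p"             # in practice: notify the submitter
   done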

Regards,
- Naz.

Andrew Dunstan wrote:
 Marc Munro wrote:
 On Mon, 2007-04-30 at 08:56 -0300, Heikki Linnakangas wrote:
  
  
  If we had a 1-2 line status blurb attached to each patch in the
  queue, like "waiting for review", "author is fixing issue XX", etc., that
  might help. Bruce would need to do that if we keep the current patch
  queue system unmodified; otherwise we'd need to switch to
  something else.
 

 Would it be possible to also automatically determine some sort of
 bit-rot status?  What I had in mind was an automated process that would
 apply each patch to HEAD on a daily basis and report whether the patch
 still applies cleanly and still allows all regression tests to pass on
 at least one platform.  If and when the result of these tests changes
 from pass to fail, the patch submitter would be automatically
 notified. 
 The patch status could then also show the last time at which the patch
 applied cleanly, and the last time that regression tests ran
 successfully.


   

 This or something similar has been discussed in the past w.r.t. the
 buildfarm. One major problem is that most sane system owners won't
 want to apply, compile and run an arbitrary patch. It could well have
 an intended or unintended trojan horse, for example. So you'd need
 some level of sanity checking to be done by some trusted person even
 to get it to this stage.

 cheers

 andrew



[HACKERS] tsearch2 in 8.3

2007-04-24 Thread Naz Gassiep

A few of us on IRC were wondering what the status of tsearch2 is in 8.3?
Was it decided to include it in core or did we decide to keep FTS as a 
plugin?
Some brief comments from anyone on the inside of the whole FTS issue 
would be greatly appreciated by us mere end users.

Regards,
- Naz.



Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-14 Thread Naz Gassiep




Granted, but a configure switch would allow users who want to use the OS 
TZ files in conjunction with a compiled-from-source installation. Many 
users of OSes with package managers, such as Debian or Red Hat, may, for 
whatever reason, want to use a source tarball to install and also use 
the OS TZ list.


That being said, this user group may be small enough to ignore. Just
throwing it in for thought.


Tom Lane wrote:

  Josh Berkus josh@agliodbs.com writes:
  
  
Zdenec,

  I have the following idea:
  1) add a GUC variable which enables usage of OS time zone files
  2) add extra parameters to the ./configure script which enable OS TZ
  support in the code and set the path to the OS TZ files.

If we're adding it as a configure-time variable, there's no reason to have 
a GUC.

I see zero reason to have either.  It would only make sense to do this
in the context of a platform-specific distribution such as an RPM, and
in that context the simplest solution is to let the RPM specfile make
the substitution (ie, after "make install" and before packaging,
rm -rf PG's timezone tree and insert a symlink).  Then it's on the RPM
packager's head whether it's the right thing to do or not.  A configure
switch strikes me as mostly a foot-gun, because the average user of
Postgres won't have any way to know whether the files are compatible.

			regards, tom lane






Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-11 Thread Naz Gassiep
I do see your points regarding the existence of use cases for this 
feature, and I agree that at worst the implementation of this feature 
would provide a way to greatly simplify query design, and at best provide 
a whole new method of obtaining decision-supporting data from a 
relational database.


However I am strongly in disagreement with your fourth point, i.e., that 
users will only become aware of it once it has been implemented. This 
sort of mentality is what gave us the sad case of late-90s HTML, in which 
browser vendors assumed that they could use the "if you build it, they 
will come" argument for feature extension of the HTML spec. That is a 
debacle we are still suffering the effects of. Let us not do the same to 
SQL and implement SKYLINE on our own, only to have other DBMS vendors 
implement it in different ways; then, finally, when the SQL standard 
includes it, they will try to make some kind of average approximation of 
the implementations, resulting in *none* of the DBs being compliant. Then 
we'll be between the rock of breaking backwards compatibility and the 
hard place of unwarranted standards non-compliance.


While Josh did point out the value of being in the leading group as far 
as implementing new functionality goes, I feel that it has to be weighed 
against the need to not strike out too aggressively, potentially 
isolating ourselves with excessive non-standard syntax or behavior.


While I am convinced there is a strong use case for this functionality 
and we should definitely start looking at it, I don't see why we should 
be in a rush to get it into core. People have survived without it up to 
now, I don't think our userbase will suffer if it is implemented 6 
months after foo commercial DB implements it, at least, not as much as 
it will suffer if we start drifting away from standards compliance.


Just my 2 rupees. :)

- Naz

Nikita wrote:

Few things from our side:

1. 'Skyline Of' is a new operator proposed in ICDE 2003, one of the 
topmost conferences of Data Engineering. Skyline operation is a hot 
area of research in query processing. Many of the database community 
people do know about this operator, and it is fast catching the 
attention.


2. The skyline operation is very useful in data analysis. Suppose we 
have a cricket database, and we want to find the bowlers who have 
taken maximum wickets in minimum overs; we can issue an easy-to-write 
query using 'Skyline of' syntax as follows:


Select * from Player_Match Skyline Of overs_bowled min, wickets_taken max;

This query gives 25 interesting tuples (result set) out of 24750 
tuples in 0.0509 seconds. The same result is obtained in 0.8228 
seconds if the following equivalent nested-query is issued:


select * from Player_Match p1 where not exists ( select * from 
Player_Match p2 where p2.overs_bowled <= p1.overs_bowled and 
p2.wickets_taken >= p1.wickets_taken and (p2.overs_bowled < 
p1.overs_bowled or p2.wickets_taken > p1.wickets_taken))


Note that the above time is the time elapsed between issuing a query 
and obtaining the result set.
As can be seen, the above query looks pretty cumbersome to write and 
is inefficient too. So, which query will the user prefer? As the 
number of dimensions increases, writing a nested query becomes a 
tedious task.

Btw, how can such a query be written using aggregate function syntax?

3. As far as optimizing the Skyline is concerned, it is still a 
research problem since it requires estimating the cardinality of the 
skyline result set.


4. Until and unless this operator is implemented in a popular database 
system, how can a user ever get to know about it and hence appreciate 
its usefulness?


Btw, it was our B.Tech final year project, and not a term project :-)

Regards.

On 3/8/07, Tom Lane [EMAIL PROTECTED] 
wrote:


Shane Ambler [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Well, whether it's horrible or not is in the eye of the
beholder, but
 this is certainly a non-standard syntax extension.

 Being non-standard should not be the only reason to reject a
worthwhile
 feature.

No, but being non-standard is certainly an indicator that the feature
may not be of widespread interest --- if it were, the SQL committee
would've gotten around to including it; seems they've managed to include
everything but the kitchen sink already.  Add to that the complete lack
of any previous demand for the feature, and you have to wonder where the
market is.

 The fact that several
 different groups have been mentioned to be working on this
feature would
 indicate that it is worth considering.

It looks to me more like someone published a paper that caught the
attention of a few profs looking for term projects for their students.

Now maybe it really is the best idea since sliced bread and will
be seen
in the next SQL spec edition, but color me 

Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Naz Gassiep
I would be *very* concerned about this: system time is not a guaranteed 
monotonic entity. Surely a counter or other internally managed mechanism 
would be a better solution.


Furthermore, what would be the ramifications of master and slave system 
times being out of sync?


Finally, what if system time is rolled forward a few minutes as part of a 
correction and there were transactions completed in that time? There is 
a chance, albeit small, that two transactions will have the same 
timestamp. More importantly, this will throw up all kinds of issues when 
the slave sees transactions from the future. Even with regular NTP syncs, 
drift can cause a clock to be rolled forward a few milliseconds, 
possibly resulting in duplicate transaction IDs.


In summary, I don't think the use of system time has any place in 
PostgreSQL's internal consistency mechanisms, it is too unreliable an 
environment property. Why can't a counter be used for this instead?


- Naz.

Jan Wieck wrote:
For a future multimaster replication system, I will need a couple of 
features in the PostgreSQL server itself. I will submit separate 
proposals per feature so that discussions can be kept focused on one 
feature per thread.


For conflict resolution purposes in an asynchronous multimaster 
system, the last update definition often comes into play. For this 
to work, the system must provide a monotonically increasing timestamp 
taken at the commit of a transaction. During replication, the 
replication process must be able to provide the remote nodes timestamp 
so that the replicated data will be as of the time it was written on 
the remote node, and not the current local time of the replica, which, 
by the definition of asynchronous, is later.


To provide this data, I would like to add another log directory, 
pg_tslog. The files in this directory will be similar to the clog, but 
contain arrays of timestamptz values. On commit, the current system 
time will be taken. As long as this time is lower or equal to the last 
taken time in this PostgreSQL instance, the value will be increased by 
one microsecond. The resulting time will be added to the commit WAL 
record and written into the pg_tslog file.


If a per database configurable tslog_priority is given, the timestamp 
will be truncated to milliseconds and the increment logic is done on 
milliseconds. The priority is added to the timestamp. This guarantees 
that no two timestamps for commits will ever be exactly identical, 
even across different servers.


The COMMIT syntax will get extended to

COMMIT [TRANSACTION] [WITH TIMESTAMP timestamptz];

The extension is limited to superusers and will override the normally 
generated commit timestamp. This will be used to give the replicating 
transaction on the replica the exact same timestamp it got on the 
originating master node.


The pg_tslog segments will be purged like the clog segments, after all 
transactions belonging to them have been stamped frozen. A frozen xid 
by definition has a timestamp of epoch. To ensure a system using this 
timestamp feature has enough time to perform its work, a new GUC 
variable defining an interval will prevent vacuum from freezing xid's 
that are younger than that.


A function get_commit_timestamp(xid) returning timestamptz will return 
the commit time of a transaction as recorded by this feature.



Comments, changes, additions?

Jan
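
Putting the proposal's pieces together, a sketch of how it would look if
implemented — none of this syntax exists in PostgreSQL today, it is
Jan's proposed interface, and the table and xid are illustrative:

   BEGIN;
   UPDATE accounts SET balance = balance - 100 WHERE id = 42;
   COMMIT WITH TIMESTAMP '2007-01-26 10:30:00.000001+00';  -- superuser-only override

   SELECT get_commit_timestamp('12345'::xid);              -- proposed lookup function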





Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-26 Thread Naz Gassiep

Andrew Dunstan wrote:

I am constantly running into this:

Q. Does PostgreSQL have full text indexing?
A. Yes it is in contrib.
Q. But that isn't part of core.
A. *sigh*

Where on the website can I see what plugins are included with 
PostgreSQL?


Where on the website can I see the Official PostgreSQL Documentation for
Full Text Indexing?

With TSearch2 in core will that fix the many upgrade problems associated
with using TSearch2?


  


contrib is a horrible misnomer. Can we maybe bite the bullet and call 
it something else?
After years of PG use, I am still afraid to use contrib modules because 
it just *feels* like voodoo. I have spent much time reading this mailing 
list and on IRC with PG users, and I know that contrib modules are on 
the whole tested and safe, but the lack of web documentation and any 
indication of what they do, other than checking the notes that come with 
the source, makes them feel like a "use and cross fingers" type of 
thing.


I don't know how hard it would be to implement, but perhaps contrib 
modules could be compiled in a similar way to Apache modules, e.g., 
./configure --with-modulename, with the onus for packaging them 
appropriately falling onto the shoulders of the module authors (a sketch 
follows the list below). I feel that even a basic module management 
system like this would greatly increase awareness of and confidence in 
the contrib modules. Oh, and

+1 on renaming contrib
+1 on the need for a comprehensive list of them
+1 on the need for more doc on the website about each of them, onus 
falling on module authors, perhaps require at least a basic doc patch as 
a requirement for /contrib inclusion.
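
The sketch referenced above — how Apache-style module selection might
look here (both flags are hypothetical; nothing like them exists):

   ./configure --with-tsearch2 --with-pgcrypto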


- Naz



[HACKERS] dump / restore functionality

2006-09-12 Thread Naz Gassiep
At risk of being chastised for reviving old issues, I was wondering: 
what are the chances of getting the dump / restore selectivity into 
8.2? I am referring to the idea that, instead of the current 2 parts, a 
dump could be broken up into 3 parts, namely tables, data and everything 
else, so that data from one dump could be mixed and matched with schema 
defs from another dump easily and scriptably.


I think the previous discussion concluded that the functionality would 
be best implemented as a selective restore, rather than a breakable dump 
due to the risk of inconsistent restores, so you could restore just the 
tables, data or everything else components from a given dump.


Did this item make it onto the to-do list? If so, did anyone pick this 
up or will I be waiting until a future as-yet-undefined date?


More generally, is there a publicly accessible place one can see the 
to-do items, who has adopted which ones and what the status is on them? 
Sorry for asking this, but I am still a rather new participant in here.




Re: [HACKERS] dump / restore functionality

2006-09-12 Thread Naz Gassiep




Zero, because feature freeze is over.

Aah yes, fair enough.

If you find this feature interesting, you are free to drive the development 
yourself, independent of it appearing on any list.  To avoid tears later on, 
look for a consensus about the merit of the feature first, though.

This has been discussed already, and there was a not insignificant 
amount of support for it; IIRC Tom Lane agreed that such functionality 
would be useful.


Tom, are you aware if this item made it onto the to-do list?



Re: [HACKERS] dump / restore functionality

2006-09-12 Thread Naz Gassiep



None, but feel free to start coding for 8.3.

My coding skills are still pretty nascent, but I shall do my best.


That seems like a rather spectacular overstatement of the likely
benefits, not to mention a misdescription of what was discussed.
  
Once again I get pulled over by the semantics police :) Yes, you are 
right, that's what was discussed, and that is the functionality I am 
hoping for, as it would allow scripting the merging of a schema from one 
database with the table data from another.


Did this make it into the to-do list for 8.3 ?



[HACKERS] Timezone List

2006-09-06 Thread Naz Gassiep
Any chance of a client-accessible list of allowable time zones? I've 
been told that the only way to get at this list is by looking through 
the source and lifting the list from zone.tab.


While I'm at it, how about an accessible list of country codes? I know 
that it's not core db functionality, but these lists are so universally 
useful that making users parse the files and store them in tables seems 
silly.


What are other people's thoughts on this? Good idea or not?



Re: [HACKERS] Timezone List

2006-09-06 Thread Naz Gassiep

Martijn van Oosterhout wrote:

In the CVS version there is a table with this information:

http://developer.postgresql.org/pgdocs/postgres/view-pg-timezonenames.html 

  

Great, thanks for that.

Err, where does postgres use this information? I believe there is a
project on pgFoundry that has some standard datasets.
  
Currently, it is stored in /src/timezone/data/iso3166.tab, and I propose 
to have it available in a system view or something similar. This data is 
as useful as the available timezones, although I concede that it is not 
part of PG functionality, and this may be more appropriate as a simple 
file that can be psql -f'd into the database if users need it as part of 
an app. It's more "developer helper" data than database functionality, 
and hence it could be more appropriate to distribute through the support 
community rather than as part of the PostgreSQL core.
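
A sketch of that "psql -f" route: a throwaway country table loaded from
iso3166.tab, whose two tab-separated columns are the code and the name
(the '#' header lines would need stripping first; the file name below is
the stripped copy):

   CREATE TABLE country_codes (code char(2) PRIMARY KEY, name text NOT NULL);
   \copy country_codes FROM 'iso3166_stripped.tab'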


Comments?



Re: [HACKERS] Timezone List

2006-09-06 Thread Naz Gassiep



Actually, what that view gives you is timezone offset abbreviations, not
the full zone names that you could use with SET TIME ZONE.  It strikes
me that we should have a view for that as well.  We could use code
similar to scan_available_timezones() to generate the view output.

It's somewhat urgent to address this now, because pg_timezonenames is
sitting on the obvious name for such a view, and once we release 8.2
we won't be able to change it.  On reflection I think the existing view
is wrongly named --- perhaps it should be pg_timezoneabbrevs?  Or
more readably, perhaps pg_timezone_abbrevs, with pg_timezone_names for
the other view.

regards, tom lane
  
I agree with having two views, and I also think that the name as it 
stands is not right. I agree with pg_timezone_abbrevs and 
pg_timezone_names or similar.


On a related note, there is not a one-to-one relationship between 
abbreviations and zone names; some abbreviations are used by two zones 
(for example, EST, CST and others are used in both Australia and the 
Americas), and currently it is a server configuration directive 
(australian_timezones) that determines whether Australian or American 
zones are assumed in the case of ambiguity.
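
A sketch of the ambiguity in practice, using the directive discussed
above (the timestamp is illustrative):

   SET australian_timezones = off;
   SELECT '2006-09-06 12:00 EST'::timestamptz;  -- EST read as US Eastern, UTC-5
   SET australian_timezones = on;
   SELECT '2006-09-06 12:00 EST'::timestamptz;  -- EST read as Australian Eastern, UTC+10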


I don't know about anyone else, but the whole australian_timezones thing 
seems like an ugly hackaround to me. I do not have a proposed solution 
to this, but I see a non-trivial risk of an application being 
re-deployed on a server where the admin forgets to change this directive 
resulting in all kinds of fun and games. Forgive me if this is an 
already-discussed issue.


I am also rather baffled at the way "SAT" changes from being 
interpreted as a day of the week in one mode to a timezone in the other. 
This seems an awful incongruity of behavior, and SAT should be 
interpreted as a timezone in both modes. If the switch must exist, it 
doesn't fit in with the purpose of the australian_timezones directive 
and should be made the subject of a different directive (e.g., 
sat_is_timezone(boolean) or something similar). SAT should, IMHO, always 
be considered a timezone, and use of the SAT string by DB programmers 
should be just another case for care, as with any other SQL keyword.




[HACKERS] PostgreSQL on 64 bit Linux

2006-08-20 Thread Naz Gassiep
I have a PostgreSQL installation on a Debian box that had the 64-bit SMP 
kernel installed before PostgreSQL was compiled and installed on it. 
Does PostgreSQL take any advantage of the 64-bit environment, or have we 
not done anything to move into the 64-bit world yet?
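
(A quick way to check what was actually built, assuming the file
utility is available:)

   file $(pg_config --bindir)/postgres   # reports ELF 32-bit vs 64-bit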

Regards,
- Naz



Re: [HACKERS] PostgreSQL on 64 bit Linux

2006-08-20 Thread Naz Gassiep




Douglas McNaught wrote:

  Naz Gassiep [EMAIL PROTECTED] writes:

  
  
I have a PostgreSQL installation on a Debian box that had the 64bit
SMP kernel installed before PostgreSQL was compiled and installed on
it. Does PostgreSQL take any advantage of the 64 bit environment or
have we not done anything to move into the 64 bit world yet?

  
  
Depends on whether PG was compiled as 64-bit or 32-bit--is your
toolchain 64-bit all the way, or is it just the kernel?

-Doug

I just compiled as the manual says. I guess I must have compiled it in
32-bit mode. I'll recompile in 64-bit when I upgrade to 8.2 once it's out.
Thanks,
- Naz.




Re: [HACKERS] pg_dump schema breakup

2006-08-19 Thread Naz Gassiep




Tom Lane wrote:

  Andrew Dunstan [EMAIL PROTECTED] writes:
  
  
Well, the other issue is how many canned breakup schemes we are going to 
support. If this particular one is of sufficiently general usefulness 
then I have no objection. But when you can produce it trivially from the 
output of "pg_dump -s", the need to hardcode it hardly seems pressing.

  
  
FWIW, I am in favor of providing a way to break up the dump output like
this, I was merely objecting to the vocabulary ;-).  We have certainly
seen tons of people burnt by the performance problems inherent in
separate-data-and-schema restores, and splitting the dump into three
parts instead of two seems like it would fix that.

But I also like Alvaro's comment that this should be on the restore side
not so much the dump side.  If you do two or three successive pg_dump
runs to make your dump then you run a nontrivial risk of not getting
consistent dumps.  My advice to people would be to do *one* full
"pg_dump -Fc" and then extract three scripts out of that.

The question then is whether it's worth providing the extraction
functionality in a more canned, user-friendly form than "here, hack up
the -L output with this perl script".  I'd vote yes.

			regards, tom lane

I greatly appreciate the comments here and am glad that my initial idea
has support. This thread highlights to me the difference between the
"hey there's a good idea there despite the fact that's he's obviously
not a veteran software developer" culture that the PostgreSQL community
has instead of the "he is obviously not a veteran software developer so
what on Earth could he have to offer us" responses I've had from
various other open source projects.

On a less obsequious note, I agree that pg_dump should be used to dump
everything in a single run to avoid consistency issues, and the
selection of data to be restored should be done with pg_restore. As
this is a feature that I would benefit greatly from, how do I go about
ensuring that this idea finds its way to the appropriate developer and
doesn't get forgotten in the mountain of ideas in the "that'd be nice
to have some day" category?
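
For the record, the consistent-dump-plus-selective-restore workflow Tom
describes can be sketched with existing tools (database names
illustrative; the split of the list file is the manual/scripted part):

   pg_dump -Fc mydb -f mydb.dump          # one consistent snapshot
   pg_restore -l mydb.dump > toc.list     # emit the table of contents
   # split toc.list into tables.list / data.list / rest.list, then:
   pg_restore -L tables.list -d newdb mydb.dump
   pg_restore -L data.list   -d newdb mydb.dump
   pg_restore -L rest.list   -d newdb mydb.dump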

- Naz




[HACKERS] pg_dump schema breakup

2006-08-18 Thread Naz Gassiep
This is my first post to a PostgreSQL mailing list, so please forgive me 
if I have posted to the wrong place.


Currently pg_dump has flags for dumping only table definitions and/or 
data. These flags are respectively:

--schema-only
--data-only

I propose that two more be added:
--tables-only
--constraints-only

These would essentially break up the output of --schema-only into two 
sections. I.e., the output of --tables-only plus the output of 
--constraints-only would be identical to the output of --schema-only .


There are a number of scenarios where this may be useful, I will 
describe the one that I would use it for.


When making changes to my database schema I would like to take the 
schema from the newly modified database, and dump the data from the old 
database into it to ensure that the new schema is able to take the data 
that exists in the live database. If it isn't then I modify the new 
schema or the live dataset as appropriate, and try again.


This requires the following steps:

1. Create temporary database and apply modified schema to it
2. Dump new database using --schema-only
3. Split new schema into table definitions and constraints
4. Apply new schema table definitions from step 3 to the testing database
5. Dump the existing database using --data-only
6. Apply the dataset from step 5 to the testing database
7. Apply new schema constraints from step 3 to the testing database

All of these steps are easily scriptable except step 3, which means that 
making quick changes to the new schema and re-applying includes the 
tedious task of opening the 5,000 line schema file in a text editor and 
manually pasting the relevant sections into new files. Step 3 really 
does hold up the development process with regards to testing changes to 
the schema.
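
A rough sketch of how step 3 might be scripted — it leans on pg_dump
emitting all CREATE TABLE statements before the ALTER TABLE ... ADD
CONSTRAINT section, and would need refinement for real schemas:

   pg_dump -s mydb > schema.sql
   awk '/^ALTER TABLE/{part=2} {print > (part == 2 ? "constraints.sql" : "tables.sql")}' schema.sql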


Generalizing the nature of this task, the pg_dump features I propose 
would allow the easy scripting of dumping databases, making changes to 
the dumps and then re-applying them in a non paradox-inducing order.


FWIW I thought this would be a very simple patch, and had a look at the 
code for pg_dump myself, despite the fact that I've not even written a 
Hello World in C as yet. That attempt failed miserably, so now I am 
reduced to selling the merits of this idea to real developers. 
Incidentally, --schema-only appears to break tables and constraints into 
two sections already, as evidenced by the fact that all tables are 
created first, and constraints afterwards.




Re: [HACKERS] pg_dump schema breakup

2006-08-18 Thread Naz Gassiep





Tom Lane wrote:

  Naz Gassiep [EMAIL PROTECTED] writes:
  
  
I propose that two more be added:
--tables-only
--constraints-only

  
  
This doesn't seem well-defined at all.  There are many objects in a
database that are definitely neither tables nor constraints, and it's
not very clear what things should be considered constraints either.

I think what you may really be after is "the stuff that should be loaded
before inserting data" and "the stuff that should be loaded after", but
the above are poor names for these concepts.

			regards, tom lane

And here I was thinking that I'd been clear :)

But yes, you are right, what I want is "the stuff that gets loaded
before data insertion" and "the stuff that goes after data insertion"

Essentially, the CREATE TABLE statements are the first part as that is
what is needed for data to be inserted. Everything else is the second
part.




Re: [HACKERS] pg_dump schema breakup

2006-08-18 Thread Naz Gassiep




Andreas Joseph Krogh wrote:

  On Friday 18 August 2006 18:52, Tom Lane wrote:
  
  
Naz Gassiep [EMAIL PROTECTED] writes:


  I propose that two more be added:
--tables-only
--constraints-only
  

This doesn't seem well-defined at all.  There are many objects in a
database that are definitely neither tables nor constraints, and it's
not very clear what things should be considered constraints either.

I think what you may really be after is "the stuff that should be loaded
before inserting data" and "the stuff that should be loaded after", but
the above are poor names for these concepts.

  
  
But it certainly would be nice to be able to dump all that "stuff":-)

Yea, I've been told that this would not be a high demand feature. So do
I have a second vote? ;-)




Re: [HACKERS] pg_dump schema breakup

2006-08-18 Thread Naz Gassiep

Andrew Dunstan wrote:
We already have a highly selective and configurable restore mechanism, 
using the -L feature of pg_restore. Maybe there's a good special case 
for this particular split, but it is hardly undoable now.


As for Naz' needs - I gave him a perl script I whipped up in few 
minutes to do the split he wanted - and I'm sure you could do the same 
in python ;-)


cheers

andrew
And very appreciative I am too. While I concede it is doable now, and I 
could do this in Python as well, I do feel that a feature like this 
would have wider use than may be apparent at first glance. 
Furthermore, I think that the simplicity of implementing it makes it 
really a question of "why not" rather than "why".

- Naz
