Re: [HACKERS] Running PostGre on DVD
> > This has been proposed before, and always rejected. While you're
> > always welcome to provide a patch, I'm very doubtful it would be
> > accepted into the main product.
>
> The example given in this thread certainly isn't going to change
> anybody's mind. "Hi, I propose reducing everybody's security because
> my local admins insist on an utterly brain-dead security policy."

I think there is still a need for discussion in this area for typical
Windows desktop use:

1. You can run Windows without creating users at all.
2. You may be using a Windows box where you are not allowed to create a user.

Applying Unix practices to Windows is imho not really practicable. For
example, a Windows developer usually uses an account with administrative
privileges and thus cannot run "make check" from that account :-(

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] aix build question re: duplicate symbol warning
> > Hmm. pqStrerror is defined in libpgport (which is linked into the
> > backend) as well as libpq. ISTM that libpq should not be linked with
> > -Wl,-bI:../../../src/backend/postgres.imp, since it's not intended
> > to be loaded into the backend. Without having looked at the code,
> > I'm wondering if the AIX makefiles stick that option into LDFLAGS_SL
> > rather than someplace more restricted.
>
> It seems that the right place to put this is BE_DLLLIBS, which is a
> macro that probably didn't exist when the AIX support was last looked
> at. But both the Windows and Darwin ports use it now, so we may as
> well bring AIX up to speed. Would you try the attached patch and see
> if it gets rid of the warnings? Also check that regression tests and
> contrib build/regression tests still work.

Works perfectly on AIX 4.3.3 with xlc. All regression tests pass,
except horology, which shows 7 extra -infinity rows. In contrib I
tested btree_gist and dblink, which also pass.

There are a few places left in contrib with duplicate-symbol warnings,
some because of -lpgport and -bI:postgres.imp, but the main build is
now free of those warnings.

Thank you for the work, please apply.

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] [PERFORM] A Better External Sort?
> Now I've asked for the quickest path to detailed understanding of the
> pg IO subsystem. The goal being to get more up to speed on its coding
> details. Certainly not to annoy you or anyone else.

Basically pg does random 8k (compile-time block size) reads/writes
only. Bitmap and sequential scans read 8k blocks in order. Only WAL
does n x 8k writes with one system call. pg relies on the OS readahead
(== larger block IO) to do efficient IO.

Basically the pg scan performance should match a
dd if=file of=/dev/null bs=8k, unless CPU bound.

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
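The block-at-a-time access pattern described above can be sketched in a few lines. This is a minimal illustration (not pg code): it reads a file front to back in fixed 8k blocks, the same IO pattern as `dd if=file of=/dev/null bs=8k`; the function name and the scratch-file demo are mine.

```python
import os
import tempfile

BLOCKSIZE = 8192  # pg's compile-time default block size

def sequential_read(path, blocksize=BLOCKSIZE):
    """Read a file front to back in fixed-size blocks, like
    `dd if=path of=/dev/null bs=8k`; returns (blocks, bytes)."""
    blocks = total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(blocksize)
            if not chunk:
                break
            blocks += 1
            total += len(chunk)
    return blocks, total

# demo on a scratch file: 4 full blocks plus a 100-byte tail
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (BLOCKSIZE * 4 + 100))
    path = f.name
print(sequential_read(path))   # (5, 32868)
os.remove(path)
```

Since pg issues such reads one block per system call, any larger-block efficiency has to come from the OS readahead underneath, which is the point of the comparison to dd.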
Re: [HACKERS] Vacuum and Transactions
> > Is it reasonable to cancel and restart the vacuum process
> > periodically (say every 12 hours) until it manages to complete the
> > work? It takes about 2 hours to do the table scan, and should get
> > in about 10 hours of index work each round.

If we started the vacuum with the indexes and remembered a lowest xid
per index, we could then vacuum the heap up to the lowest of those
xids, no? We could then also vacuum each index separately.

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] Inherited indexes.
> Another possibility is optimizing for the special case of indexing on
> a partitioning key. In this case, index values would be very
> localized to one table, so just storing the table info on each index
> page (or something similar) would work well.

If you have the partitioning key in the index and the partitions don't
overlap, it is better to create separate [unique] indexes on the
subtables. Building separate indexes per partition is usually
preferred because of:

1. performance of dropping a partition
2. smaller index for CE

Only if you need an "order by" without a sort step that spans more
than one partition do things usually get ugly. Imho the best solution
would be a merge node that merges the results of several index
accesses, avoiding a sort while still using separate indexes. Such a
merge node could probably also detect the case where accessing
partitions in a certain order still produces ordered results.

Usually you will only want the "one big unique index" when the
partitioning is not reflected in the index keys, and then (also in
other db's) such an index is usually a pain ...

Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
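The merge-node idea above is just a k-way merge of already-ordered streams. A minimal sketch (the partition contents are made up; in pg this would be C executor code, not Python):

```python
import heapq

# each partition's index scan yields its keys already in order
partition_scans = [
    iter([1, 4, 9]),      # partition A
    iter([2, 3, 10]),     # partition B
    iter([5, 6, 7, 8]),   # partition C
]

# the "merge node": combine the ordered streams without a sort step
merged = list(heapq.merge(*partition_scans))
print(merged)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The merge keeps only one frontier element per partition in memory, which is why it can produce ordered output incrementally where a sort would have to consume all input first.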
Re: [HACKERS] [PERFORM] A Better External Sort?
> In my original example, a sequential scan of the 1TB of 2KB or 4KB
> records, => 250M or 500M records of data, being sorted on a binary
> value key will take ~1000x more time than reading in the ~1GB Btree I
> described that used a Key+RID (plus node pointers) representation of
> the data.

Imho you seem to ignore the final step your algorithm needs: collecting
the data rows. After you have sorted the keys, the collect step will
effectively access the tuples in random order (given a sufficiently
large key range). This random access is bad.

It effectively allows a competing algorithm to read the whole data at
least 40 times sequentially, or write the set 20 times sequentially.
(Those are the random/sequential ratios of modern disks.)

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
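The back-of-the-envelope behind that claim can be made explicit. All the disk numbers below are assumptions of mine (roughly the era's hardware), not measurements from the thread; with them, the random collect phase buys the competing plan even more than the 40 sequential passes quoted above:

```python
# Rough cost comparison for the "collect the rows after sorting keys" step.
seq_mb_per_s = 60.0          # assumed sequential throughput
random_ios_per_s = 120.0     # assumed random 8k IOs per second
data_gb = 1000               # 1TB of data
tuples = 250_000_000         # 250M records of ~4KB

# collecting every tuple in key order ~= one random 8k read per tuple
random_secs = tuples / random_ios_per_s

# a competing plan that re-reads the whole table sequentially N times
one_pass_secs = data_gb * 1024 / seq_mb_per_s
passes_affordable = random_secs / one_pass_secs
print(round(passes_affordable))   # 122 sequential passes in the same time
```

The exact ratio depends on the disk, but the shape of the argument does not: per-tuple random IO on a large table is orders of magnitude slower than rescanning sequentially.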
Re: [HACKERS] Call for 7.5 feature completion
> I think the main problem with switching to visual studio project
> files is maintainability. (It's not easy to get all

I think the target should be a way to auto-create those files with
gmake (maybe with mingw for configure). The format of VS6 project and
workspace files is pretty simple. It should be possible to derive them
from the makefiles and simple templates.

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] Pre-allocated free space for row
> > The bgwriter could "update" all columns of dead heap tuples in heap
> > pages to NULL and thus also gain free space without the need to
> > touch the indexes. The slot would stay used but it would need less
> > space.
>
> Not unless it's running a transaction (consider TOAST updates).

Ok, you could leave all toast pointers and the toast table as is.

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] Pre-allocated free space for row
> > My wild guess is that deleting all index pointers for a removed
> > index is more-or-less the same cost as creating new ones for an
> > inserted/updated page.
>
> Only if you are willing to make the removal process recalculate the
> index keys from looking at the deleted tuple.

The bgwriter could "update" all columns of dead heap tuples in heap
pages to NULL and thus also gain free space without the need to touch
the indexes. The slot would stay used but it would need less space.

Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] wal_buffer tests in
> I ran a wal_buffer test series. It appears that increasing the
> wal_buffers is indeed very important for OLTP applications,
> potentially resulting in as much as a 15% average increase in
> transaction processing. What's interesting is that this is not just
> true for 8.1, it's true for 8.0.3 as well.

You will want the log writer to write up to at least 256kb per write
(at >= 256kb you reach the maximum MB/s a disk can write), if that is
possible with no commit/fsync in between. You will also want enough
buffers that can be filled while the 256kb are written. Thus a value
of at least 64 with 8k pages seems reasonable.

Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
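The arithmetic behind the "at least 64" suggestion is simple enough to spell out (the doubling-for-overlap step is my reading of the "filled while written" argument above):

```python
wal_write_bytes = 256 * 1024   # target size of a single large WAL write
page = 8192                    # WAL block size (compile-time default)

# buffers needed to back one 256kb write
buffers_per_write = wal_write_bytes // page
print(buffers_per_write)       # 32

# double-buffer: leave room to fill new WAL while the previous
# batch is being written out
print(2 * buffers_per_write)   # 64 -> the suggested wal_buffers floor
```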
Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC
>>> The point here is that fsync-off is only realistic for development
>>> or playpen installations. You don't turn it off in a production
>>> machine, and I can't see that you'd turn off the full-page-write
>>> option either. So we have not solved anyone's performance problem.
>>
>> Yes, this is basically another fsync-like option that isn't for
>> production usage in most cases. Sad but true.
>
> Just to make my position perfectly clear: I don't want to see this
> option shipped in 8.1.

Why not? If your filesystem buffer size matches your pg page size, and
you have a persistent write cache, the option makes perfect sense.

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC
>> Only workable solution would imho be to write the LSN to each
>> 512-byte block (not that I am advocating that idea).

"Only workable" was a stupid formulation; I meant a solution that
works with an LSN.

> We're not doing anything like that, as it would create an impossible
> space-management problem (or are you happy with limiting tuples to
> 500 bytes?).

To do it, a layer between physical storage and row working memory
would need to be inserted; of course that would add a lot of overhead.
I guess more overhead than computing a page crc.

> We still don't know enough about the situation to know what a
> solution might look like. Is the slowdown Josh is seeing due to the
> extra CPU cost of the CRCs, or the extra I/O cost, or excessive
> locking of the WAL-related data structures while we do this stuff,
> or ???. Need more data.

Yes, especially the 10 sec instead of 1 sec response times look very
suspicious.

Andreas

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC
> Here's an idea:
>
> We read the page that we would have backed up, calc the CRC and write
> a short WAL record with just the CRC, not the block. When we recover
> we re-read the database page, calc its CRC and compare it with the
> CRC from the transaction log. If they differ, we know that the page
> was torn and we know the database needs recovery. (So we calc the CRC
> when we log AND when we recover.)

Won't work, since the page on disk may have x different contents
between 2 checkpoints (bgwriter from lru). The only workable solution
would imho be to write the LSN to each 512-byte block (not that I am
advocating that idea).

Andreas

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
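To make the LSN-per-sector idea concrete, here is a toy model (the layout, byte widths, and function names are all mine, purely for illustration): every 512-byte sector of an 8k page carries the page's LSN, so a write torn at a sector boundary leaves mixed LSNs that recovery can detect.

```python
SECTOR = 512
PAGE = 8192
N = PAGE // SECTOR   # 16 sectors per page

def stamp_page(payload, lsn):
    """Build an 8k page whose every 512-byte sector carries the page
    LSN in its first 8 bytes (the per-sector-LSN scheme)."""
    sectors = []
    for i in range(N):
        body = payload[i * (SECTOR - 8):(i + 1) * (SECTOR - 8)]
        sectors.append(lsn.to_bytes(8, "big") + body.ljust(SECTOR - 8, b"\0"))
    return b"".join(sectors)

def is_torn(page):
    """A page is torn if its sectors carry more than one LSN."""
    lsns = {page[i * SECTOR:i * SECTOR + 8] for i in range(N)}
    return len(lsns) > 1

old = stamp_page(b"old row data", lsn=100)
new = stamp_page(b"new row data", lsn=101)
# crash mid-write: first 5 sectors are new, the remaining 11 still old
torn = new[:SECTOR * 5] + old[SECTOR * 5:]
print(is_torn(old), is_torn(new), is_torn(torn))   # False False True
```

The space-management objection in the follow-up discussion is exactly the cost visible here: 8 of every 512 bytes vanish into the stamp, and tuples can no longer span the stamp boundaries without an extra remapping layer.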
Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC
>> Are you sure about that? That would probably be the normal case, but
>> are you promised that the hardware will write all of the sectors of
>> a block in order?
>
> I don't think you can possibly assume that. If the block crosses a
> cylinder boundary then it's certainly an unsafe assumption, and even
> within a cylinder (no seek required) I'm pretty sure that disk drives
> have understood "write the next sector that passes under the heads"
> for decades.

A lot of hardware exists that guards against partial writes of single
IO requests (a persistent write cache for an HP raid controller for
intel servers costs ~$500 extra). But the OS usually has a 4k (some
8k) filesystem buffer size, and since we do not use direct io for
datafiles, the OS might decide to schedule the two 4k writes for one
8k page differently. If you do not build pg to match your fs buffer
size, you cannot guard against partial writes with hardware :-(

We could alleviate that problem with direct io for datafiles.

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] NOLOGGING option, or ?
> Escape processing would proceed as before, but the semantics would
> change to allow the use of different characters as the escape
> character, in addition to the special characters for delimiter and
> newline.

If you mean syntax to specify escape and delimiter (and newline?),
that is a great addition that imho everyone would like.

> Also, escape processing would be "false" as the default, so that the
> only special characters by default would be the newline and delimiter
> characters.

I don't see how that would be any advantage. What is so wrong about
having syntax to choose no escape processing, like "escape ''"?

Andreas

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] Tablespace-level Block Size Definitions
> The problem I see with this proposal is that the buffer manager knows
> how to handle only equally-sized pages. And the shared memory stuff
> gets sized according to size * num_pages. So what happens if a
> certain tablespace A with pagesize=X gets to have a lot of its pages
> cached, evicting pages from tablespace B with pagesize=Y, where
> Y < X?

You could create a separate bufferpool per page size. Of course that
has other disadvantages. Is it really so difficult to create and
attach another shmem segment?

Andreas

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] compiling postgres with Visual Age compiler on OpenPower5 / Linux
> I have not researched any deeper, but currently it fails with
>
> [EMAIL PROTECTED] postgresql-8.0.3]$
> CC=/opt/ibmcmp/vac/7.0/bin/xlc ./configure
> ..A..
> checking for int timezone... yes
> checking types of arguments for accept()... configure: error:
> could not determine argument types

The odds are very high that it instead depends on the system header
file version, or the order of inclusion (e.g. stdio.h before
config.h). I would need to see the snippet of configure output,
though. IIRC there was a fix for this or a similar case; has this been
reverted?

Andreas

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
> But to get the estimated cost ratio to match up with the actual cost
> ratio, we'd have to raise random_page_cost to nearly 70, which is a
> bit hard to credit. What was the platform being tested here?

Why? Numbers for modern single disks are 1-2 MB/s for random 8k IO and
50-120 MB/s sequential.

Andreas

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
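Taking the midpoints of the disk numbers quoted above (my choice of midpoints, not a measurement), the implied per-page cost ratio lands in the same ballpark as the "nearly 70" figure being questioned:

```python
# Implied random:sequential cost ratio from the quoted raw disk numbers.
block = 8192
seq_mb_per_s = 80.0      # mid-range of the 50-120 MB/s quoted
rand_mb_per_s = 1.5      # mid-range of the 1-2 MB/s quoted

seq_pages_per_s = seq_mb_per_s * 1024 * 1024 / block
rand_pages_per_s = rand_mb_per_s * 1024 * 1024 / block
print(round(seq_pages_per_s / rand_pages_per_s, 1))   # 53.3
```

With the pessimistic ends of both ranges (50 MB/s sequential, 1 MB/s random) the ratio reaches 50-120, so a random_page_cost near 70 is entirely plausible for a raw single disk with no caching.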
Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
> >Incrementing random_page_cost from 4 (the default) to 5 causes the
> >planner to make a better decision.
>
> We have such a low default random_page_cost primarily to mask other
> problems in the optimizer, two of which are
>
> . multi-column index correlation
>
> . interpolation between min_IO_Cost and max_IO_cost which
> approximates max_IO_cost too fast.

One other important figure here is concurrency. If you have a lot of
backends concurrently doing other IO, your sequential IO numbers will
suffer more than your random IO numbers. It might be that some super
smart OS readahead implementation alleviates that problem, but I have
not yet experienced one. Random IO also tends to see higher cache-hit
rates.

Thus I think that if you are alone, 4 tends to be too low, while with
concurrent load 4 tends to be too high. (All assuming fairly large
tables that don't fit into RAM.)

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can
get through to the mailing list cleanly
Re: [HACKERS] DO INSTEAD and conditional rules
> > If we did not define it that way, I think your example would have
> > to error out --- how would you choose which INSTEAD rule wins?
>
> The documentation says that they evaluate in alphabetical order by
> name. So I would expect that the first one to have its WHERE
> statement evaluate to true would short-circuit the execution of the
> remaining rules.

Why not simply write non-overlapping WHERE clauses? Imho not executing
them all would be very counterintuitive.

Andreas

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] locks in CREATE TRIGGER, ADD FK
> > It would keep the old table around while building the new, then
> > grab an exclusive lock to swap the two.
>
> Lock upgrading is right out.

It would need a whole new family of "intent" locks, with different
rules.

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can
get through to the mailing list cleanly
Re: [HACKERS] We are not following the spec for HAVING without GROUP BY
> What that means is that neither the HAVING clause nor the targetlist
> can use any ungrouped columns except within aggregate calls; that is,
>
> select col from tab having 2>1

Informix:

select tabname from systables having 2 > 1;
294: The column (tabname) must be in the GROUP BY list.

select tabname from systables group by 1 having 2 > 1;
all rows returned

select tabname from systables group by 1 having 1 > 2;
no rows found

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can
get through to the mailing list cleanly
Re: [pgsql-hackers-win32] [HACKERS] win32 performance - fsync question
> >Are you verifying that all the data that was committed was actually
> >stored? Or just verifying that the database works properly after
> >rebooting?
>
> I verified the data.

Does pg startup increase the xid by some amount (say 1000 xids) after
a crash? Otherwise I think you would also need to roll back a range of
xids after the crash, to see whether you lose data by reusing and
rolling back xids.

The risk is datapages reaching the disk before WAL, because the disk
reorders writes. I think you would not notice such corruption (with
pg_dump) unless you do the range rollback.

Andreas

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
Re: [HACKERS] win32 performance - fsync question
> >> One point that I no longer recall the reasoning behind is that
> >> xlog.c doesn't think O_SYNC is a preferable default over fsync.
> >
> >For larger (>8k) transactions O_SYNC|O_DIRECT is only good with the
> >recent pending patch to group WAL writes together. The fsync method
> >gives the OS a chance to do the grouping. (Of course it does not
> >matter if you have small tx < 8k WAL.)
>
> This would be true for fdatasync() but not for fsync(), I think.

No, it is only worse with fsync, since that adds a mandatory seek.

> On win32 (which started this discussion), fsync will sync the
> directory entry as well, which will lead to *at least* two seeks on
> the disk. Writing two blocks after each other to an O_SYNC opened
> file should give exactly two seeks.

I think you are making the following untenable assumptions:

1. there is no other outstanding IO on that drive that the OS happily
   inserts between your two 8k writes
2. the rotational delay is negligible
3. the per-call overhead is negligible

You will at least wait until the heads reach the write position again,
since you will not be able to supply the next 8k in time for the drive
to continue writing (with the single-backend large tx I was referring
to).

If you doubt what I am saying, do dd blocksize tests on a raw device.
The results are that up to ~256kb blocksize you can increase the drive
performance on a drive that does not have a powerfail-safe cache and
does not lie about write success.

Andreas

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] win32 performance - fsync question
> One point that I no longer recall the reasoning behind is that xlog.c
> doesn't think O_SYNC is a preferable default over fsync.

For larger (>8k) transactions O_SYNC|O_DIRECT is only good with the
recent pending patch to group WAL writes together. The fsync method
gives the OS a chance to do the grouping. (Of course it does not
matter if you have small tx < 8k WAL.)

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Design notes for BufMgrLock rewrite
> Would there be any value in incrementing by 2 for index accesses and
> 1 for seq-scans/vacuums? Actually, it should probably be a ratio
> based on random_page_cost, shouldn't it?

What happens with very small hot tables that are only a few pages and
thus have no index defined? I think it would not be good to treat such
data pages as less important than index pages.

Andreas

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
Re: [HACKERS] enforcing a plan (in brief)
> And the user maintenance of updating those hints for every release of
> PostgreSQL as we improve the database engine.

I don't think so. Basically an optimizer hint simply raises or lowers
the cost of an index, mandates a certain join order, allows or
disallows a seq scan ... Imho it is not so far from the things people
currently do with the "set enable_seqscan=" type of commands. (I don't
think actually giving a certain plan is a good idea.)

A good optimizer hint system would imho not circumvent the optimizer,
but only give it hints. The hints should be very specifically aimed,
like "an index on column x is going to be more expensive than you (the
optimizer) think, if used with this query", like:

select /*+ avoid_index(atab atab_x0) */ * from atab ...

> > The people who are actually doing the work think their time is more
> > usefully spent on improving the planner's intelligence than on
> > devising ways to override it.

The subject of this mail and "override it" imho go too far; I would
like to be able to give advice in the form of hints to the optimizer.

> One consistent problem is the planner not being able to handle this
> or that scenario. At this stage, the *best* way to improve the
> planner is to add the ability to place hints in the sql statement.

I agree.

Andreas

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] New form of index "persistent reference"
> I asked the question how do you get a record without going through an
> index, the answer was CTID, which unfortunately changes when the row
> is updated.

The ctid is the physical location of the row. On update a new tuple is
written in a new location; that is why the ctid changes. The old tuple
has a system field t_ctid which is then a forward pointer to the new
tuple. Thus you can follow that chain until the visible tuple is
found.

The current "tid = tid" comparison does not do that (I think because
the ODBC driver, which was the first to use it (for result set
modification), needed to notice when the tuple was updated
underneath). But you can use:

select * from atab where ctid = currtid2('atab', '(0,1)');
-- '(0,1)' is the old ctid

Of course the old ctid is only valid until (auto)vacuum marks it free.
Without vacuum you are currently safe to use the currtid functions.

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can
get through to the mailing list cleanly
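The t_ctid chain-following described above can be modeled in a few lines. This is a toy (the heap contents and function name are invented; real pg does this in C against heap pages): each tuple version records where its successor went, and the current version points at itself.

```python
# Toy tuple-version chain, keyed by ctid = (block, offset).
heap = {
    (0, 1): {"data": "v1", "t_ctid": (0, 7)},   # updated, forwards to (0,7)
    (0, 7): {"data": "v2", "t_ctid": (2, 3)},   # updated again
    (2, 3): {"data": "v3", "t_ctid": (2, 3)},   # points to itself: current
}

def follow_ctid_chain(tid):
    """Walk t_ctid forward pointers until a tuple points at itself,
    i.e. the current version -- the behavior currtid2() provides."""
    while heap[tid]["t_ctid"] != tid:
        tid = heap[tid]["t_ctid"]
    return tid

print(follow_ctid_chain((0, 1)))   # (2, 3)
```

As the mail notes, the chain is only safe to walk while vacuum has not reclaimed the intermediate slots; once (0,1) or (0,7) is reused, the pointer no longer means anything.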
Re: [HACKERS] ARC patent
> > There's a very recent paper at
> > http://carmen.cs.uiuc.edu/~zchen9/paper/TPDS-final.ps on an
> > alternative to ARC which claims superior performance ...
>
> From a quick glance, this doesn't look applicable. The authors are
> discussing buffer replacement strategies for a multi-level cache
> hierarchy (e.g. they would call the DBMS buffer cache "L1", and the

Yes, it might not matter, however. Another algorithm that was written
by university folk (thus probably not patent-encumbered) and that
looks promising is:

http://www.cs.wm.edu/hpcs/WWW/HTML/publications/papers/TR-02-6.pdf
http://parapet.ee.princeton.edu/~sigm2002/papers/p31-jiang.pdf
(same, but better typeset)

It even seems to slightly beat ARC according to the MQ paper.

Andreas

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] ARC patent
> >> FYI, IBM has applied for a patent on ARC (AFAICS the patent
> >> application is still pending, although the USPTO site is a little
> >> hard to grok):
> >>
> >> http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220040098541%22.PGNR.&OS=DN/20040098541&RS=DN/20040098541
> >
> > Ugh. We could hope that the patent wouldn't be granted, but I think
> > it unlikely, unless Jan is aware of prior art (like a publication
> > predating the filing date). I fear we'll have to change or remove
> > that code.
> >
> > regards, tom lane
>
> Unfortunately no. The document that inspired me to adapt ARC for
> PostgreSQL is from the USENIX File & Storage Technologies Conference
> (FAST), March 31, 2003, San Francisco, CA.
>
> I am seriously concerned about this and think we should not knowingly
> release code that is possibly infringing a patent.

I thought IBM granted the right to use these methods in OSS software.
PostgreSQL is OSS software, thus only entities relicensing pg would
need to worry about the patent. Also, the algorithm is probably
already sufficiently altered to not be subject to the patent, no?

Andreas

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] RC2 and open issues
> If we don't start where we left off, I am thinking if you do a lot of
> writes then do nothing, the next checkpoint would be huge because a
> lot of the LRU will be dirty because the bgwriter never got to it.

I think the problem is that we don't see whether a "read hot" page is
also "write hot". We would want to write dirty "read hot" pages, but
not "write hot" pages. It does not make sense to write a "write hot"
page, since it will be dirty again when the checkpoint comes.

Andreas

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
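The read-hot/write-hot distinction above can be sketched as a filter over per-page access counters. Everything here is hypothetical (the counters, the threshold, the function name); it only illustrates the selection rule being proposed, not anything the bgwriter actually tracks:

```python
# Pages with (made-up) read/write access counts since the last checkpoint.
pages = {
    "A": {"dirty": True,  "reads": 90, "writes": 1},    # read hot, write cold
    "B": {"dirty": True,  "reads": 80, "writes": 70},   # write hot too
    "C": {"dirty": False, "reads": 99, "writes": 0},    # clean
    "D": {"dirty": True,  "reads": 2,  "writes": 0},    # cold
}

WRITE_HOT = 10  # assumed threshold

def bgwriter_candidates(pages):
    """Dirty pages worth writing ahead of the checkpoint: skip
    write-hot pages, since they'd just be dirty again anyway."""
    return sorted(name for name, p in pages.items()
                  if p["dirty"] and p["writes"] < WRITE_HOT)

print(bgwriter_candidates(pages))   # ['A', 'D']
```

Page B stays unwritten under this rule: flushing it early buys nothing because new writes re-dirty it before the checkpoint arrives.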
Re: [HACKERS] Shared row locking
> In general, I agree with Tom: I haven't seen many programs that use
> extended SELECT FOR UPDATE logic. However, the ones I have seen have
> been batch style programs written using a whole-table cursor - these
> latter ones have been designed for the cursor stability approach.

I think if we add shared locks we should by default behave like the
cursor stability isolation level, which only holds one shared lock for
the current cursor row. The semantics are well defined in SQL. If you
want repeatable read you need to change the isolation level.

I know FK checks will need to keep the locks, but I would special-case
that.

Andreas

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [Testperf-general] BufferSync and bgwriter
> Hmmm, I've not seen this. For example, with people who are having
> trouble with checkpoint spikes on Linux, I've taken to recommending
> that they call sync() (via cron) every 5-10 seconds (thanks, Bruce,
> for the suggestion!). Believe it or not, this does help smooth out
> the spikes and give better overall performance in a many-small-writes
> situation.

The reason is imho that the checkpoint otherwise also syncs all other
writes. These can be writes other backends had to do to replace a
buffer. Linux obviously lacks a mechanism to distribute the IO for
cached writes over time a la bgwriter (or does not do it when already
faced with an IO bottleneck).

Andreas

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
Re: [HACKERS] bgwriter changes
> > Only if you redefine the meaning of bgwriter_percent. At present
> > it's defined by reference to the total number of dirty pages, and
> > that can't be known without collecting them all.
> >
> > If it were, say, a percentage of the total length of the T1/T2
> > lists, then we'd have some chance of stopping the scan early.
>
> The other way around would make sense. In order to avoid writing the
> busiest buffers at all (except for checkpointing), the parameter
> should mean "don't scan the last x% of the queue at all".

Your meaning is (1 - the above meaning), at least as Tom and I meant
it, but it is probably easier to understand (== Informix
LRU_MIN_DIRTY).

> Still, we need to avoid scanning over all the clean blocks of a large
> buffer pool, so there is need for a separate dirty-LRU.

Maybe a "may be dirty" bitmap would be easier to do without being
deadlock-prone?

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] bgwriter changes
> The two alternative algorithms are similar, but have these
> differences: The former (option (2)) finds a constant number of dirty
> pages, though has varying search time.

This has the disadvantage of converging against 0 dirty pages. A
system that has fewer than maxpages dirty will write every page with
every bgwriter run.

> The latter (option (3)) has constant search time, yet finds a varying
> number of dirty pages.

This might have the disadvantage of either leaving too much for the
checkpoint or writing too many dirty pages in one run. Is writing a
lot in one run actually a problem, though? Or does the bgwriter pause
periodically while writing the pages of one run?

If this is expressed in pages, it would naturally need to be more than
the current maxpages (to accommodate the clean pages). The suggested
2% sounded way too low to me (that leaves 98% to the checkpoint).

Also, I think we are doing too-frequent checkpoints with the bgwriter
in place. Every 15-30 minutes should be sufficient, even for
benchmarks. We need a tuned bgwriter for this, though.

Andreas

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] bgwriter changes
> > > and stops early when either maxpages is reached or
> > > bgwriter_percent pages are scanned ?
> >
> > Only if you redefine the meaning of bgwriter_percent. At present
> > it's defined by reference to the total number of dirty pages, and
> > that can't be known without collecting them all.
> >
> > If it were, say, a percentage of the total length of the T1/T2
> > lists, then we'd have some chance of stopping the scan early.
>
> ...which was exactly what was proposed for option (3).

But the benchmark run was with bgwriter_percent 100. I wanted to point
out that I think 100% is too much (it writes hot pages multiple times
between checkpoints). In the benchmark, the bgwriter obviously falls
behind; the delay is too long. But if you reduce the delay you will
start to see what I mean.

Actually, I think what is really needed is a max number of pages we
want dirty at checkpoint time. Since that would again require scanning
all pages, the next best definition would imho be to stop at a
percentage (or a number of pages short) of the total T1/T2. Then you
can still calculate a worst-case IO for the checkpoint (assume that
all hot pages are dirty).

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can
get through to the mailing list cleanly
Re: [HACKERS] bgwriter changes
> (2) Remove bgwriter_percent. I have yet to hear anyone argue that
> there's an actual need for bgwriter_percent in tuning bgwriter
> behavior,

One argument for it is to avoid writing very hot pages.

> (3) Change the meaning of bgwriter_percent, per Simon's proposal.
> Make it mean "the percentage of the buffer pool to scan, at most, to
> look for dirty buffers". I don't think this is workable, at least not
> at this

In the long run I think we want to avoid the checkpoint needing to do
a lot of writing, without writing hot pages too often. This can only
reasonably be defined with a max number of pages we want to allow
dirty at checkpoint time. bgwriter_percent comes close to this
meaning, although in this sense the value would need to be high, like
80%.

I think we do want 2 settings. Think of one as a short-term value (so
the bgwriter does not write everything in one run) and one as a
long-term target over multiple runs.

Is it possible to do a patch that produces a dirty buffer list in LRU
order and stops early when either maxpages is reached or
bgwriter_percent pages are scanned?

Andreas

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
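The two-stop-condition scan being asked for in that last paragraph is easy to sketch. This is a toy model, not bgwriter code (the list representation and function name are mine); it just shows the control flow: walk the buffers in LRU order, collect dirty ones, and stop at whichever limit hits first.

```python
def collect_dirty(lru, maxpages, percent):
    """Scan the buffer list in LRU order, collecting dirty buffers,
    stopping early once either maxpages dirty buffers are found or
    `percent` of the pool has been scanned."""
    scan_limit = len(lru) * percent // 100
    dirty, scanned = [], 0
    for buf_id, is_dirty in lru:
        if scanned >= scan_limit or len(dirty) >= maxpages:
            break
        scanned += 1
        if is_dirty:
            dirty.append(buf_id)
    return dirty, scanned

# 10-buffer pool, odd-numbered buffers dirty
pool = [(i, i % 2 == 1) for i in range(10)]
print(collect_dirty(pool, maxpages=3, percent=80))   # ([1, 3, 5], 6)
```

With a generous percent the maxpages cap stops the scan (as above); with a small percent the scan-budget cap stops it first, which is exactly the "constant search time, varying dirty pages found" trade-off discussed earlier in the thread.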
Re: [HACKERS] V8 Beta 5 on AIX
>> However, one thing we can do is to try this in Makefile.aix: >> # AIX needs threads for everything that uses libpq >> LIBS += $(PTHREAD_LIBS) >> That is going to enable thread libs for all linking including the >> backend, but it might work. > That is certainly wrong. The correct thing is to add PTHREAD_LIBS to > all and only those links that include libpq. I suspect that the cc_r You mean like the attached patch ? That is safe and can be applied. My machine has problems with two configure tests with --enable-thread-safety. 1. the snprintf long long int check, which uses int64 as a variable name, which is a typedef in sys/inttypes.h 2. the "whether strerror_r returns int" check, which fails with a redefinition of the named function from string.h No idea how to fix those, but the workaround is easy :-( -lpthread can be removed from PTHREAD_LIBS; the lib is a compat symlink to -lpthreads, so I guess it does not hurt either. Andreas pg_aix_pthreads.patch Description: pg_aix_pthreads.patch ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] V8 Beta 5 on AIX
> OK, so does someone want to suggest why a library used to link libpq > would also be needed to link binaries that use libpq? And with no cc_r > it seems I have no idea how to get this working. AIX does not pull in any libraries that a shared lib depends upon, not even libc. You only specify those on the .so link line so the linker can check what is unresolved. If he does not have cc_r, then he either uses gcc or a newer compiler that does not need cc_r any more? I would generally use cc_r if it exists, or gcc, for the threaded build. The performance difference between cc and cc_r should not be noticeable. Andreas ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] [HACKERS] Open Items
> I am going to discard these emails. We haven't solved the Win32 terminal > server problem and I think it needs to be moved to the TODO list instead. Yes, please do that. I do not think there is a problem on TS other than some missing permissions. The patch was only intended to avoid starting 2 postmasters for the same datadir. The patch is not 100% correct, since I was told that my check would fail on Win95 and falsely assume that Win95 allows a \ in the shmem name. I think that would currently not matter, but it is unclean for the future. Andreas ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Stopgap solution for table-size-estimate updatingproblem
>> I think I recall that lseek may have a negative effect on some OS's >> readahead calculations (probably only systems that cannot handle an >> lseek to the next page either) ? Do you think we should cache the >> last value to avoid the syscall ? > > We really can't, since the point of doing it is to find out whether any > other backends have extended the file since we last looked. Also, IIRC Well yes. The value would need to be kept up to date by the backends that extend the file, with an interlocked increment. We would thus need to keep the count in shmem. Andreas ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Stopgap solution for table-size-estimate updatingproblem
>> This is not true in my case, since I only "update statistics"/analyze >> when the tables have representative content (i.e. not empty). > > I'm unsure why you feel you need a knob to defeat this. The only time > when the plan would change from what you think of as the hand-tuned > case is when the physical table size is greatly different from what it > was when you analyzed. Ok, understood. I just need to make sure I don't "vacuum full" in that case, which is fine anyway if I expect the table to soon grow to this size again. I think I recall that lseek may have a negative effect on some OS's readahead calculations (probably only systems that cannot handle an lseek to the next page either) ? Do you think we should cache the last value to avoid the syscall ? Andreas ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Stopgap solution for table-size-estimate updatingproblem
>> One possibility: vacuum already knows how many tuples it removed. We >> could set reltuples equal to, say, the mean of the number-of-tuples- >> after-vacuuming and the number-of-tuples-before. In a steady state >> situation this would represent a fairly reasonable choice. In cases >> where the table size has actually decreased permanently, it'd take a few >> cycles of vacuuming before reltuples converges to the new value, but that >> doesn't seem too bad. > > That sounds good to me. Covers all cases I can see from here. Yes, sounds good to me also. I think that would be a good thing even if viewed in isolation from the rest of the proposal. I am sorry if I made the impression that I don't like a change in this direction in general; I think there is need for both. I am only worried about core OLTP applications where every query is highly tuned (and a different plan is more often than not counterproductive, especially if it comes and goes without intervention). >> A standalone ANALYZE should still do what it does now, though, I think; >> namely set reltuples to its best estimate of the current value. good, imho :-) > A GUC-free solution...but yet manual control is possible. Sounds good to > me - and for you Andreas, also? It is the GUC to keep the optimizer from using the dynamic page count that I would still like to have. I especially liked Simon's name for it: enable_dynamic_statistics=true Tom wrote: >> But I am used to applications >> that prepare a query and hold the plan for days or weeks. If you happen to >> create the plan when the table is by chance empty you lost. > > You lose in either case, since this proposal doesn't change when > planning occurs or doesn't occur. This is not true in my case, since I only "update statistics"/analyze when the tables have representative content (i.e. not empty). Andreas ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Stopgap solution for table-size-estimate updating problem
> rel->pages = RelationGetNumberOfBlocks(relation); Is RelationGetNumberOfBlocks cheap enough that you can easily use it for the optimizer ? I myself have always preferred more stable estimates that only change when told to. I never liked that vacuum (without analyze) and create index change those values; imho only analyze should. But I am used to applications that prepare a query and hold the plan for days or weeks. If you happen to create the plan when the table is by chance empty you lost. But it seems I am quite alone with that opinion. > A variant of this is to set reltuples = density, relpages = 1 instead > of 0, which makes the relpages value a lie but would be even less likely > to confuse client-side code. I would let analyze still fill in the correct values. The 0-or-1-page previous density could be used only when the table is empty. I don't think either variant would confuse client-side code more or less. More confusing for client-side code would be if the two would not show the analyze results any more. I would like to see this configurable. Like "use_analyzed_stats" or "enable_online_statistics" or some such, which could also be used for more future online statistics. One setting would use: pages = RelationGetNumberOfBlocks(relation); count = pages * reltuples / relpages. The other: pages = relpages; count = relcount. Andreas ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Can postgresql accept mutliple connections in the same instance?
>> I am running postgresql database servers with generally 30-50 users >> at a time per server. I have noticed one thing for web based databases >> that they fail to initialise a pg_connection connection every now and >> again and return no error message at all. I am thinking of PG_SOMAXCONN, the maximum number of pending connects in the queue of the listen socket. That would only be visible to the client app, and would mainly happen when load is high, so that postmaster cannot fork new backends fast enough. And if the client does not properly report ... IIRC some systems have a max of 5. Andreas ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] pg_resetxlog options
>> Some other time maybe. Meanwhile, this patch ought to make it compile >> more cleanly on Windows - not sure why I get errors there but not >> Linux. > > Because getopt() is normally declared in unistd.h, not getopt.h (Windows > being an exception?). getopt is not in any standard Windows headers. The getopt.h header is from mingw to assist porting (I don't know why they didn't put it in unistd.h?). Andreas ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] psql \e broken again
>> I like Kevin Brown's suggestion of writing out a temporary .txt file and >> 'executing' it. It will follow the principle of least suprise for Windows >> users. > > Note that the current default behavior (assuming you've not set EDITOR) > is "vi foo.txt" which is quite unlikely to be helpful to a Windows user. works for me :-), but agreed, not a good default. > I'm not sure we need to do the "execute a textfile" bit, but at the very I don't think we need that either. And also imho the obvious extension would be .sql, and that might not be registered, or already registered for some other, not really wanted, SQL tool. I think we need our own editor setting; the Windows way would be a private registry setting, but using EDITOR is imho just as good. > least DEFAULT_EDITOR should probably be "notepad.exe" on Windows. Yes that would be a sane default on Windows. Andreas ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] psql \e broken again
>> So I thought I'd try a few scenarios, since I have it installed: >> [ none of which work ] >> So to answer your question, at least in part, there is no current good >> behavior to emulate. At least on this version of CVS: > I think this is fairly graphic proof that (1) a straight port without > doing anything special is essentially nonfunctional on Windows, and that > (2) there isn't any easy way for the user to fix it by inserting quotes > for himself. I am not sure the test is valid, since %EDITOR% was used on the command line, which does its own magic on quotes. Is there a command that would use the envvar EDITOR without putting it on the commandline ? We are talking about directly using the envvar from inside the program, no? Andreas ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] psql \e broken again
> We could maybe compromise with (again, for Windows only) a policy like > "double-quote unless the value already contains at least one > double-quote". This should work conveniently for everybody except the > user who wants EDITOR='gvim.exe -y' on Windows; he'll have to add > some useless-looking double quotes a la EDITOR='"gvim.exe" -y'. Single quotes don't work on Windows (tried with %EDITOR% x.txt). I think we should not worry too much, since the user could use the "good old" EDITOR=C:/PROGRA~1/vim/vim63/gvim.exe -y syntax on Windows if all else fails. Andreas ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] psql \e broken again
>> The EDITOR variable seems to have a fairly standard meaning on Unix >> systems. I've been using that EDITOR value for years without problems, >> only when I use psql's \e once in a while it breaks. I don't think we >> should deviate from what seems to be a standard practice. > > Agreed, no quotes on Unix. I'm just wondering what to do on Windows. I would require the user to add quotes around his executable if it has spaces. set EDITOR="C:/Program Files/vim/vim63/gvim.exe" -y Andreas ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)
> How are you planning to represent the association between MIN/MAX and > particular index orderings in the system catalogs? Don't we already have that info to decide whether an index handles an "ORDER BY" without a sort node ? Andreas ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] Open Items
> Just one question about the actual implementation of the patch - why are > you setting the OS version *before* you call GetVersionEx()? The Microsoft example did a memset on the structure before calling GetVersionEx(). Setting it to a version that needs the Global\ prefix is only a safeguard against a failing call. I guess the call cannot fail, and thus setting it before is not really needed. > It'll just get overwritten... Yes, if the call does not fail to work. Andreas ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PATCHES] [HACKERS] Open Items
> > > o fix shared memory on Win2k terminal server > > > > > > We might be able to just mark this as not supported. The shmem code works in a terminal server session with or without the patch. Magnus had a different problem, probably permissions. Since I do not have a non-admin user (on a TS server) I disabled the win32_is_admin check. It is my opinion that we should allow pg to run as Admin on Windows, at least with an override option. Services that run under a specified user are a headache on Win32, because you need to store a password, and a lot of systems only have one user. > > I have attached a patch that I think fixes this. The problem I saw > > and fixed is, that the shmem created in a terminal services client is not > > visible to the console (or services.msc). The decision is simple: if we want the shmem to be global on the machine, we need the patch. I think we want that, but cannot give an authoritative answer. > > I think in addition the system global name "sharemem.1" should be made more > > pg specific, like "PostgreSQL.1". I have not done this since a new compile > > would not detect a running old beta. But now would be the time (or never). I think all agreed that this is good. Andreas ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PATCHES] [HACKERS] Open Items
> It makes no difference on any of my systems, so at least it doesn't > completely solve the problem. I haven't heard any > confirmation on whether > it partially solves it. It certainly does not solve any part of your problem. I think your problem is a permissions problem. It does however make sure the shmem is global, which I think we need for detecting certain parallel-startup (still running backend) situations. Andreas ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] rmtree() failure on Windows
> >>> 300 secs (!) fs timeout is really broken. > >>> Looks more like a locking or network timeout issue. > >>> What error codes does unlink(3) return? > >> > > > >> success. > > > > > > Oops! 5min timeout for success is certainly problematic. > > > > > > You misunderstood. The 300 secs is not in waiting for unlink() to > return, it is in waiting for its effects to be seen by > rmdir() (i.e. for because the bgwriter obviously keeps them open, no ? Andreas ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] Open Items
> >>o fix shared memory on Win2k terminal server > >> > >>We might be able to just mark this as not supported. > > > >I have attached a patch that I think fixes this. The problem I saw > >and fixed is, that the shmem created in a terminal services client is not > >visible to the console (or services.msc). > > Does this actually fix the problem for you? > Because, as I have previously posted I think, it does *not* solve the > problem on any of my test machines. I still get the shmget() error > message when running from a TS session. I think you are having another problem. I can create the shmem segment here (with or without the patch). I am running W2000 5.00.2195 SP4. Maybe you are having a permissions problem? I am using a user with near-Administrator privs. > Also, I don't really see how the visibility of the shmem segment > matters. If it really does not matter, please don't apply my patch, Bruce (still do the rename, though, please). > We can't *create* the first instance of it, which should not > affect this at all. And if we passed that, all backends are still > execute in the same session, so there is no effect on it. Yes, it only matters if a postmaster is started/tries to start from different TS sessions. I think we need to determine global existence of the shm segment to get rid of old processes/segments. > services.msc only interacts with the SCM, it has nothing at all to do > with shmem. I meant when "run as a service", which is the same TS session as the console. Andreas ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Open Items
> o fix shared memory on Win2k terminal server > > We might be able to just mark this as not supported. I have attached a patch that I think fixes this. The problem I saw and fixed is, that the shmem created in a terminal services client is not visible to the console (or services.msc). It was necessary to differentiate OS versions; this might be better put elsewhere. I think in addition the system global name "sharemem.1" should be made more pg specific, like "PostgreSQL.1". I have not done this since a new compile would not detect a running old beta. But now would be the time (or never). Andreas shmem.win32.patch Description: shmem.win32.patch ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] plans for bitmap indexes?
> > create index people_male_gay_ix on people (city) where gender = 'male' and > > orientation = 'gay'; > > You've forgotten part of my premise (based on a real case I discussed on IRC) > that there are EIGHTEEN criteria columns. That is why I said maybe :-) Whether it helps depends on the number of access patterns actually (and often) used. > That would require, by the method > you have above, roughly 18(3rd factorial) indexes, times the number of values > allowed by each column, which if it averaged, say 5 values, would be 24,480 Well, an index only needs to reduce the cost enough so that you can afford your workload and have reasonable response times, so you might only need to create a few of them. I was actually only trying to help optimize this example without the "bitmap index" feature, not trying to say that for this example "partial indexes" are better. Especially since the first example that mentioned partial indexes was not "the way to do it" for a value that represents a large part of your table (here approx. 50%). (don't do: create index people_male on people (gender) where gender='male';) Andreas ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] plans for bitmap indexes?
> > The most nearly comparable thing is be the notion of "partial > > indexes," where, supposing you had 60 region codes (e.g. - 50 US > > states, 10 Canadian provinces), you might set up indices thus: > For example, imagine you have a table on a dating website with 18 columns > representing 18 different characteristics for matching. Imagine that you > index each of those columns seperately. If you do: > > SELECT * FROM people WHERE orientation = 'gay' AND gender = 'male' AND city = > 'San Francisco'; I think bitmap indexes do have valid use cases, but partial indexes are really a wonderful feature with a lot of use cases, maybe including this one. Workable examples of useful partial indexes that help here are: create index people_male_ix on people (city) where gender = 'male'; create index people_gay_ix on people (city) where orientation = 'gay'; create index people_male_gay_ix on people (city) where gender = 'male' and orientation = 'gay'; Note that the indexed column differs from the predicate clause. Note also that the last index will perform way better than a combo of bitmap indexes. Andreas ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Security implications of config-file-location patch
> > If they are using tablespaces is it OK that anyone can see their > > location? > > Good point. Should we obscure pg_tablespace similarly to > what we do for pg_shadow? Hmm, I cannot see how a person with file access could not easily find the file for a specific table without pg_tablespace anyway (since oid names will be quite unique). Without file access, what malicious act is he going to do with that info ? I think hiding that info would not really make things safer, and is thus not worth it. Andreas ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Two-phase commit
> Well, the question is how long must the individual databases retain > state with which to answer "recover" requests. I don't like "forever", > so I'm proposing that there should be an explicit command to say "you > can forget about this gid". I think it would be ok to forget the gid after: 1. the same TX manager session sends the next statement (maybe only BEGIN?) 2. the TX manager session is explicitly disconnected (can we tell the difference from an aborted connection ?) > > Note that this is only really necessary because of Heikki's choice to > make the API work in terms of a user-assigned GID. This was not an arbitrary choice, but is required by most/all TX managers :-( I agree that it would be cleaner for the TX manager to keep such a map. Maybe the idea was for the GIDs to be human-readable, since human intervention is rare but still required with 2PC, especially in the case where the TX manager is unavailable/unrecoverable. Andreas ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings