Re: [HACKERS] Concurrent connections in psql
+++ We'd love this feature as it would really help us write better test cases!

Regards
Sailesh

--
Sailesh Krishnamurthy
Amalgamated Insight
[W] (650) 242-3503 [C] (650) 804-6585

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gregory Stark
Sent: Tuesday, March 27, 2007 6:39 AM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Concurrent connections in psql

Would people be interested in this feature? There was some positive reaction from users, but I'm not sure how excited developers are about complicating the logic in psql (which is already pretty tangled).

This code bitrotted severely when Tom added the cursor support to psql. I don't mind redoing it if people want it, though. I already did a first pass at doing so, but it wasn't clear to me how best to integrate it with that cursor support change. I elected to treat each chunk of results from the cursor as a separate result set, which makes it possible to switch connections between chunks. That's nice, but probably not really acceptable judging by how much effort Tom went through in the cursor code to avoid having the chunks appear as separate result sets. Probably I'll have to do more work in that area.

Are people interested in having this? The reason I think it's particularly interesting is writing regression tests -- especially to test HOT cases.

Gregory Stark [EMAIL PROTECTED] writes:

> I mentioned this a while back; now that 8.2 is out perhaps others will be
> more interested in new code. Currently Postgres regression tests only test
> functionality within a single session. There are no regression tests that
> test the transaction semantics or locking behaviour across multiple
> transactions.
>
> I modified psql to allow you to open multiple connections and switch
> between them with a sort of csh job-control style interface. It actually
> works out pretty well. It's fairly easy to write regression tests for
> basic 2-client or 3-client cases.
> The actual user interface may need some discussion, though. I didn't want
> to play the name game, so I just prefixed all my commands with "c" and
> figured we can always rename them later. And experience with actually
> writing the tests shows that the explicit \cwait command, which was needed
> to eliminate (in practice if not in theory) race conditions in regression
> tests, turns out to be more flexibility than necessary. Since you end up
> having to insert one in precisely predictable locations -- namely after
> every asynchronous command and after every connection switch -- perhaps it
> would be simpler to just have a \pset cwait command that automatically
> introduces timeouts in precisely those places.
>
> A brief explanation, including an example regression test (the SAVEPOINT
> locking bug discovered recently), and the patch are here:
> http://community.enterprisedb.com/concurrent/index.html

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] how to add seconds to a TimestampTz
If you read the autovacuum_naptime into an Interval object once, why can't you just use timestamptz_pl_interval? You won't be using the interval input/output repeatedly, surely.

Regards
Sailesh

--
Sailesh Krishnamurthy
Amalgamated Insight
[W] (650) 242-3503 [C] (650) 804-6585

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alvaro Herrera
Sent: Wednesday, March 14, 2007 3:46 PM
To: Hackers
Subject: [HACKERS] how to add seconds to a TimestampTz

Is there a better way than going to time_t and back? I am currently using this:

    db->next_worker =
        time_t_to_timestamptz(timestamptz_to_time_t(current_time) +
                              autovacuum_naptime);

(db->next_worker is a TimestampTz, as is current_time; autovacuum_naptime is an integer number of seconds.) But it doesn't seem clean, and the comments in the functions more or less say that their use is discouraged.

I saw about doing it via the interval input/output but that's an awful lot of work ... Is this the first time this is going to be done in the backend?

--
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend
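[Editorial note: the arithmetic under discussion can be sketched outside the backend entirely. The following Python sketch is illustrative only -- it models a TimestampTz as an integer count of microseconds since an epoch (as with PostgreSQL's integer-datetimes build), which shows why no time_t round trip or interval input/output parsing is needed; the names are hypothetical, not PostgreSQL source.]

```python
# Sketch: a TimestampTz modelled as int64 microseconds since an epoch.
# Adding a naptime in seconds is then plain integer arithmetic.

USECS_PER_SEC = 1_000_000

def timestamptz_plus_seconds(ts_usec: int, seconds: int) -> int:
    """Add a number of seconds to a microsecond-precision timestamp."""
    return ts_usec + seconds * USECS_PER_SEC

# e.g. schedule the next autovacuum worker one naptime from "now"
current_time = 1_173_000_000 * USECS_PER_SEC  # hypothetical current timestamp
autovacuum_naptime = 60                       # seconds
next_worker = timestamptz_plus_seconds(current_time, autovacuum_naptime)
```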
[HACKERS] Update on TelegraphCQ
Folks,

I'm glad to be back on the pgsql-hackers mailing list. I'm writing to give an update on the status of TelegraphCQ. As Josh says, the TelegraphCQ project at UC Berkeley is more or less done, as most of the students on the project have graduated or will be done this summer.

Having said that, some of us from the TelegraphCQ team have started a company (Amalgamated Insight, Inc.) in order to commercialize an enterprise-class stream processing engine. We're working on a much more polished streaming engine that is based on the v8.1 code (among other more application-oriented things). If anybody in the community is interested in more details we'll be happy to have an off-line conversation, whether it's about a potential use for our technology or a possible employment opportunity. For instance, Neil Conway has interned with our team this summer and we have benefited enormously from his experience. I'll also make it a point to attend the next SFPUG meeting in case anybody local would like to talk in vivo.

Let me, however, say that we can't talk too much about various aspects of our business model and/or distribution strategy (in other words, I don't yet know if we will do open-source releases of everything we build).

Josh Berkus said:

> Dragan,
>
>> What are the possibilities (if any) for continuous dataflow streaming
>> with PostgreSQL v8.1? Something like the TelegraphCQ project, but that
>> was for v7.3. Are there any alternatives for the latest version of
>> PostgreSQL?
>
> The TelegraphCQ team has stopped public development. So it's pretty much
> waiting for someone to take on their code.

Regards
Sailesh

--
Sailesh Krishnamurthy
Amalgamated Insight
[W] (650) 267-2503 [C] (650) 804-6585

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Call to build-in operator from new operator
As others pointed out, DirectFunctionCall2 is your friend.

There is a mailing list for TelegraphCQ. It's a Yahoo group - visit http://groups.yahoo.com/group/telegraphcq

About accessing data from a table in another database ... we need to know more about what exactly you're doing. Please post on the tcq mailing list.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
Sounds a bit like multi-dimensional clustering ... http://www.research.ibm.com/mdc/

After the ARC experience, though ...

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Design notes for BufMgrLock rewrite
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

    Tom> and changing the buf_table hash table. The only common
    Tom> operation that needs exclusive lock is reading in a page that
    Tom> was not in shared buffers already, which will require at
    Tom> least a kernel call and usually a wait for I/O, so it will be
    Tom> slow anyway.

Why not a separate lock per bucket chain in the hash table, in addition to the system-wide LWLock? It's not so much that such an operation will be slow anyway, but that such a slow operation will unnecessarily block other operations.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
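[Editorial note: the per-bucket-chain locking being suggested can be sketched in a few lines. This is an illustrative Python sketch, not the bufmgr code -- a hash table with one lock per bucket chain, so a slow operation on one chain blocks only lookups that hash to the same chain, not the whole table. All names are hypothetical.]

```python
import threading

class PartitionedHashTable:
    """Hash table with one lock per bucket chain instead of one big lock."""

    def __init__(self, nbuckets=16):
        self.nbuckets = nbuckets
        self.buckets = [dict() for _ in range(nbuckets)]
        self.locks = [threading.Lock() for _ in range(nbuckets)]

    def _bucket(self, key):
        return hash(key) % self.nbuckets

    def insert(self, key, value):
        b = self._bucket(key)
        with self.locks[b]:          # blocks only this bucket chain
            self.buckets[b][key] = value

    def lookup(self, key):
        b = self._bucket(key)
        with self.locks[b]:
            return self.buckets[b].get(key)

table = PartitionedHashTable()
threads = [threading.Thread(target=table.insert, args=(i, i * 10))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A real buffer manager would still need a table-wide policy for operations that touch multiple chains (e.g. moving an entry between buckets), which is where the system-wide lock mentioned above comes back in.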
Re: [HACKERS] enforcing a plan (in brief)
Hicham,

For your experiments (VLDB? :-), your best bet for bolting on a specific plan (if you can't convince the optimizer to do the right thing) is to hack it into the query planner. I've done similar hacks in the past, but only in the TelegraphCQ code and not in PostgreSQL.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Group-count estimation statistics
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

    Tom> The only real solution, of course, is to acquire cross-column
    Tom> statistics, but I don't see that happening in the near
    Tom> future.

Another approach is a hybrid hashing scheme where we use a hash table until we run out of memory, at which time we start spilling to disk. In other words, no longer use SortAgg at all .. Under what circumstances will a SortAgg consume more I/Os than a hybrid hash strategy?

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
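[Editorial note: the hybrid hashing scheme mentioned above can be sketched as follows. This is an illustrative Python sketch under simplified assumptions -- a tiny "memory budget" counted in groups, and in-memory lists standing in for spill files; a real executor would spill to disk and manage partitions recursively.]

```python
from collections import defaultdict

def hybrid_hash_sum(rows, max_groups=4, npartitions=4):
    """Group-and-sum rows of (key, value) pairs.

    Keep a hash table until it holds max_groups groups; after that,
    route rows for new keys to hash partitions (the "spill files"),
    which are aggregated one partition at a time afterwards."""
    in_memory = defaultdict(int)
    spill = [[] for _ in range(npartitions)]

    for key, value in rows:
        if key in in_memory or len(in_memory) < max_groups:
            in_memory[key] += value                      # fits in memory
        else:
            spill[hash(key) % npartitions].append((key, value))

    result = dict(in_memory)
    for partition in spill:                              # one partition at a time
        part_agg = defaultdict(int)
        for key, value in partition:
            part_agg[key] += value
        for key, total in part_agg.items():
            result[key] = result.get(key, 0) + total
    return result

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4),
        ("d", 5), ("e", 6), ("f", 7), ("e", 1)]
totals = hybrid_hash_sum(rows)
```

The appeal relative to SortAgg is that input that fits in memory never touches disk at all, and spilled input is written and read once per partition rather than participating in a full external sort.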
Re: [HACKERS] Much Ado About COUNT(*)
>>>>> "Jonah" == Jonah H Harris [EMAIL PROTECTED] writes:

    Jonah> Replying to the list as a whole: If this is such a bad idea,
    Jonah> why do other database systems use it? As a businessperson
    Jonah> myself, it doesn't seem logical to me that commercial
    Jonah> database companies would spend money on implementing this
    Jonah> feature if it wouldn't be used. Remember guys, I'm just
    Jonah> trying to help.

Systems like DB2 don't implement versioning schemes. As a result there is no need to worry about maintaining visibility in indexes. Index-only plans are thus viable, as they require no change in the physical structure of the index and no overhead on update/delete/insert ops. I don't know about Oracle, which I gather is the only commercial system to have something like MVCC.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Much Ado About COUNT(*)
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

    Tom> People who hang around Postgres too long tend to think that
    Tom> MVCC is the obviously correct way to do things, but much of
    Tom> the rest of the world thinks differently ;-)

It works the other way too ... people who come from the locking world find it difficult to wrap their heads around MVCC. A big part of this is because Gray's original paper on transaction isolation defines the different levels based on what kind of lock acquisitions they involve. A very nice alternative approach to defining transaction isolation is "Generalized Isolation Level Definitions" by Adya, Liskov and O'Neil, which appeared in ICDE 2000.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] plans for bitmap indexes?
>>>>> "Gavin" == Gavin Sherry [EMAIL PROTECTED] writes:

    Gavin> I'm uncertain about the potential benefit of this.
    Gavin> Isn't/shouldn't the effects of caching be assisting us
    Gavin> here?

It all depends on how large your table is, and how busy the system is (other pressures on the cache). While it's difficult to plan for the latter, you can plan for the former. For the latter you could assume that the effective cache size is some fraction of the real size, to account for the effects of other queries. Unless you are real sure that you will never kick out a page from the buffer cache ..

I believe that for large enough tables this can certainly help .. it sure is something that many other systems have implemented.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] plans for bitmap indexes?
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

    Tom> One huge advantage is that the actual heap visiting becomes
    Tom> efficient, eg you never visit the same page more than once.
    Tom> (What you lose is the ability to retrieve data in index
    Tom> order, so this isn't a replacement for existing indexscan
    Tom> methods, just another plan type to consider.)

Even without bitmap indexes, and without trying to use multiple indexes etc., this (visiting a page only once) is useful. In other words, I'd like to see the indexscan broken up into: (1) an operator that returns a list of TIDs, (2) a sort of the TIDs, and (3) an operator that fetches heap tuples from the sorted TID list.

Of course the resulting data from the heap will be out of order, but that is often less important than unnecessarily visiting (and possibly even re-fetching from disk) the same heap page twice for a given index scan.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 8: explain analyze is your friend
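[Editorial note: the three-step index scan proposed above can be sketched briefly. This is an illustrative Python sketch with hypothetical names -- the heap is a dict of pages, TIDs are (block, offset) pairs, and sorting the TID list makes each heap page get "read" exactly once.]

```python
def fetch_sorted_by_page(tids, heap):
    """tids: list of (block, offset); heap: {block: {offset: tuple}}.
    Returns the fetched tuples and the sequence of page visits made."""
    page_visits = []
    results = []
    current_block = None
    for block, offset in sorted(tids):   # step (2): sort the TID list
        if block != current_block:       # step (3): each page read once
            page_visits.append(block)
            current_block = block
        results.append(heap[block][offset])
    return results, page_visits

heap = {0: {1: "t01", 2: "t02"}, 1: {1: "t11"}, 2: {1: "t21"}}
# step (1): the index returns TIDs in key order, bouncing between pages
tids = [(2, 1), (0, 1), (1, 1), (0, 2)]
tuples, visits = fetch_sorted_by_page(tids, heap)
```

Without the sort, the TID list above would touch page 0 twice; with it, the visit sequence is strictly one visit per page, at the cost of losing index order in the output.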
Re: [HACKERS] APR 1.0 released
>>>>> "Marc" == Marc G Fournier [EMAIL PROTECTED] writes:

    Marc> On Thu, 7 Oct 2004, Bruce Momjian wrote:
    >> Added to TODO:
    >> * Consider parallel processing a single query
    >>   This would involve using multiple threads or processes to do
    >>   optimization, sorting, or execution of single query. The
    >>   major advantage of such a feature would be to allow multiple
    >>   CPUs to work together to process a single query.

    Marc> Do we have 'make backend thread safe' listed yet? As I
    Marc> recall it, until that gets done, parallelization of anything
    Marc> was considered to be a relatively onerous task, no?

You don't really need to parallelize in separate threads .. you can have more than one process working on one query. This is in fact the model that exploits SMPs in at least one commercial RDBMS.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] APR 1.0 released
IMHO the best references to parallelizing query plans are in the Volcano papers. The Exchange operator is a really clean abstraction - the idea is to place the Exchange operator in query plans, and that way you don't have to parallelize any other operator. Exchange takes care of managing the IPC queues and also worries about whether or not you have to, say, rehash the data, broadcast the data to all other processes, or direct the data to a single node ...

I'd suggest reading the following paper:

"Encapsulation of parallelism in the Volcano query processing system" by Goetz Graefe, in SIGMOD 1990. Link: http://portal.acm.org/citation.cfm?id=98720

The above link also has references to Gamma, but I really like the exposition in the Volcano/Exchange work much better.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 8: explain analyze is your friend
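[Editorial note: the spirit of the Exchange operator described above can be sketched compactly. This Python sketch is illustrative only -- Volcano (and the process-per-backend model discussed in this thread) would use IPC between processes; here threads and queue.Queue stand in for backends and IPC queues, and rows are repartitioned by hash so each consumer sees only its partition.]

```python
import queue
import threading

NCONSUMERS = 2
DONE = object()  # end-of-stream marker

def exchange(rows, nconsumers=NCONSUMERS):
    """Repartition a stream of rows across nconsumers by hash."""
    outq = [queue.Queue() for _ in range(nconsumers)]
    received = [[] for _ in range(nconsumers)]

    def producer():
        for row in rows:
            outq[hash(row) % nconsumers].put(row)  # rehash / route the data
        for q in outq:
            q.put(DONE)

    def consumer(i):
        while True:
            row = outq[i].get()
            if row is DONE:
                break
            received[i].append(row)                # downstream operator's input

    threads = [threading.Thread(target=producer)] + [
        threading.Thread(target=consumer, args=(i,))
        for i in range(nconsumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

parts = exchange(range(20))
```

The point of the abstraction is visible even at this scale: the producer and consumers are ordinary single-threaded operators; all the parallelism plumbing lives in the exchange step.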
Re: [HACKERS] plans for bitmap indexes?
>>>>> "Yann" == Yann Michel [EMAIL PROTECTED] writes:

    Yann> O.K. I downloaded it :-) We will see if and how I can help

FYI .. in case you aren't aware already: http://portal.acm.org/citation.cfm?id=98720

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] beta1 beta2 Windows heavy load
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

    Tom> instead of increasing it. And I don't want to create a
    Tom> full-fledged alloc/free package for shared memory --- the
    Tom> bang-for-buck ratio for that is way too low. So I'm inclined

I think I've said it before, but we actually have such a beast - a shared memory MemoryContext that uses libmm (which apparently Apache also uses). It should be fairly easy to patch in if it's worth it for you folks.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] APR 1.0 released
>>>>> "CB" == Christopher Browne [EMAIL PROTECTED] writes:

    CB> futile discussions ask for it. Notably, on an SMP system, it
    CB> would be a neat idea for complex queries involving joins to
    CB> split themselves so that different parts run in separate
    CB> threads.

You don't really need threads for this. All you need is to have multiple backends and use queues to exchange tuples at specific points. This is much like the Exchange operator in Volcano.

    CB> The other Way, Way Cool part would be for queries that are
    CB> scanning big tables to split the scans into unions of partial
    CB> scans, so that on an 8 CPU box you'd take the Big 4GB Table
    CB> and have 8 threads simultaneously scanning different parts of
    CB> it. (And making ARC all the more important :-).)

Again this can be done without threads .. you just need inter-process communication. (BTW, there is at least one commercial system that follows exactly this model.)

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] pulling projection up in plans
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

    Tom> "Hicham G. Elmongui" [EMAIL PROTECTED] writes:
    >> is there a way to pull the projection operator to the top of
    >> the query plan? I wish there's a variable that can be set to do
    >> so.

    Tom> Could you be more specific about what you're hoping to
    Tom> accomplish?

This is just a suspicion, but I suspect that Hicham wants to avoid tuples being projected in Scan operators .. in a SELECT * query, projecting a tuple essentially causes a copy from the buffer pool to the backend process's heap space. I guess it would work just fine to have the tuple remain in the buffer and keep the buffer pinned.

(In TelegraphCQ we actually do this .. we also don't really materialize join tuples - instead we have an intermediate tuple format which is a list of pointers to the various source tuples. This makes sense as we do shared query processing - i.e. different queries with similar join predicates and different projection attributes use the same physical join tuples - projection is the final operation on the intermediate tuples, when we return the tuples to clients. We did this really to facilitate sharing and don't really have any hard numbers on whether this would make sense in a general context with pgsql. Actually, IIRC, we did consider whether this would make a difference to pgsql, and did some crude perf studies, and found that it didn't really help if there was no sharing .. as is the case with pgsql.)

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
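[Editorial note: the TelegraphCQ-style intermediate tuple described above -- a join result that is just a list of pointers into the source tuples, with projection deferred to the very end -- can be sketched as follows. This is an illustrative Python sketch with hypothetical names; indices into Python lists stand in for pointers into pinned buffers.]

```python
def join_deferred(left, right, pred):
    """Nested-loop join that returns intermediate tuples as
    (left_index, right_index) pairs -- no data is copied at join time."""
    return [(i, j)
            for i, l in enumerate(left)
            for j, r in enumerate(right)
            if pred(l, r)]

def project(intermediate, left, right, attrs):
    """Final projection: materialize only the requested attributes,
    only when results are returned to the client."""
    out = []
    for i, j in intermediate:
        row = {**left[i], **right[j]}        # dereference the "pointers"
        out.append(tuple(row[a] for a in attrs))
    return out

emp = [{"name": "ann", "dept": 1}, {"name": "bob", "dept": 2}]
dept = [{"dept": 1, "dname": "eng"}, {"dept": 2, "dname": "ops"}]
pairs = join_deferred(emp, dept, lambda l, r: l["dept"] == r["dept"])
rows = project(pairs, emp, dept, ("name", "dname"))
```

The sharing argument from the post falls out directly: a second query with the same join predicate but different projection attributes could reuse `pairs` unchanged and call `project` with its own attribute list.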
Re: [HACKERS] Extended customizing, SQL functions,
>>>>> "pgsql" == pgsql [EMAIL PROTECTED] writes:

    pgsql> They have a database of information that is coming in at a
    pgsql> high speed on a regular basis. One bit of information is a
    pgsql> value. To get this value they must perform SELECT
    pgsql> sum(field) FROM table. Well, this simply does not scale.
    pgsql> They've used a trigger system with a small summary table
    pgsql> where they update the number in the summary field. That
    pgsql> works fine, except that after a few thousand updates, the
    pgsql> SELECT time takes a while. Then they have to vacuum
    pgsql> constantly. It just seems like an ugly and wasteful
    pgsql> process.

Sounds like something that TelegraphCQ can do well .. http://telegraph.cs.berkeley.edu

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] On query rewrite
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

    Tom> This particular issue is handled as part of our Path
    Tom> enumeration mechanism, but the more hard-wired sorts of
    Tom> transformations that you are asking about live mostly in

Thanks again. To confirm: the actual cost comparison with plan enumeration is a dynamic programming algorithm, is it not? Selinger-style, with 2-way join paths enumerated, then 3-way using the best 2-way, etc.? BTW, do lots of people use the GEQO?

    Tom> optimizer/prep/* and plan/planner.c. In particular you
    Tom> probably want to look at prepjointree.c and prepqual.c.
    Tom> (Note prepqual also looks considerably different in CVS tip
    Tom> than in prior releases.)

Thanks .. I've extracted CVS tip .. sigh .. one of these days I'll have to do another merge with the TelegraphCQ code. You guys hack too much :-)

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[HACKERS] On query rewrite
Hackers,

Are there any default query rewrite rules that kick in, in the absence of any user-defined RULE or VIEW? Also, is there any place that lists any interesting query rewrites that PG does on queries for perf. improvement? For instance, in the presence of a view or a subquery, does PG do a subquery-to-join transformation?

Thanks!

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] On query rewrite
>>>>> "Alvaro" == Alvaro Herrera [EMAIL PROTECTED] writes:

    >> For instance, in the presence of a view or a subquery, does PG
    >> do a subquery-to-join transformation?

    Alvaro> Yes, there are transformations of this sort, but they are
    Alvaro> not called "query rewrite" in the code's terminology, but
    Alvaro> "optimization" -- rewrite (rules and views) happens to the
    Alvaro> parsed statement, and the optimizer works on the output of
    Alvaro> rewriting. So actually the optimizations happen whether or
    Alvaro> not there were rules or views.

Interesting .. so are these rule-based, then? Not cost-based? I understand that there is a cost-based optimizer anyway that does the planning and selects the right plan .. but does this come _after_ all these transformations? Or does it happen along with the transformations?

    Alvaro> The query's path is SQL -> parse -> rewrite -> optimize ->
    Alvaro> execute

Can you please point me to the code that indeed does such transformations?

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] On query rewrite
>>>>> "Alvaro" == Alvaro Herrera [EMAIL PROTECTED] writes:

    >> I understand that there is a cost-based optimizer anyway that
    >> does the planning and selects the right plan .. but does this
    >> come _after_ all these transformations? Or does it happen along
    >> with the transformations?

    Alvaro> No, there's no rules optimizer, only the cost-based one
    Alvaro> you already know of.

Okay ...

    Alvaro> The query's path is SQL -> parse -> rewrite -> optimize ->
    Alvaro> execute

    >> Can you please point me to the code that indeed does such
    >> transformations?

    Alvaro> Sorry, I don't know the optimizer code. You can find a
    Alvaro> lot of detail in backend/optimizer/README. Probably you
    Alvaro> want to look at what happens to JOIN_IN nodes, for
    Alvaro> example, regarding the conversion of a

Couldn't find the README, but I'm operating on an older source base. Anyway, thanks for the tips. I think I found what I'm looking for: the function is probably pull_up_subqueries, and what it tries to do is check whether a subquery is "simple" or not .. simple meaning not having Aggs or something more complicated. If that's the case, and if some NULLable conditions are safe, then the subquery gets pulled up - essentially, a subquery-to-join transformation.

Now my next question is more subtle .. Are these alternatives (pulling up vs. not pulling up subqueries) considered in different plans? Because, if not, it seems that this is really just a query rewrite .. it's just that it happens as part of the optimizer component. In fact, there is an implicit rule in operation here (by rule, I don't mean a pgsql RULE). Are more such transformations spread around the optimizer component? Is there any reason to have it integrated with the planner, as opposed to having it be part of the rewrite component (apart from historical .. plus the code is solid and works etc.)?

Sorry for all the questions .. as I stated before, I'm working on a chapter of a textbook that is a case study of pgsql (the 4th ed. contains case studies of Oracle, DB2 and MSSQL, and the 5th ed. is gonna have pgsql).

Another question, about regular RULE processing .. suppose that after applying a rule the resultant query tree is eligible for another rule; does pgsql's rule system keep iterating over and over until it reaches a fixed point, or is there some heuristic in operation (just apply the rules twice ..)? From my cursory inspection of the code it looks like the latter, but I'd like to know for sure.

Thanks!

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] proposal: be smarter about i/o patterns in index scan
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

    Tom> For starters, read the previous discussions of this in the
    Tom> archives.

    Tom> Two questions you should have answers to before starting to
    Tom> implement, rather than after, are how to deal with locking
    Tom> considerations and what will be the implications of giving up
    Tom> the property that indexscans deliver sorted output.

I don't know about the former, but as to the latter, we should certainly have the ability for both output sorted by key and TID-List-Fetch ...

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] proposal: be smarter about i/o patterns in index scan
Yes, fetching a RID list from an index scan, sorting 'em and then fetching from the table would be a very appreciable speedup for many queries. I would imagine that all the commercial engines do this (DB2 calls it a sorted RID-list fetch) .. and this has in fact been discussed on -hackers before.

One issue is that there could be a slip between the cup and the lip .. i.e., between the fetch from the index, the sort, and the fetch from the table, a vacuum could have taken place, rendering TIDs invalid. This should, however, not be a show-stopper .. surely all that's needed is a lock against vacuuming in the presence of a TID-list fetch.

--
Pip-pip Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[HACKERS] Position available at the Telegraph project
Dear Hackers I apologize in advance if this posting is construed as spam. As many of you are aware, the database research group at UC Berkeley is building TelegraphCQ, a system for processing continuous queries over data streams. TelegraphCQ was built with the PostgreSQL source base. We are looking for a full-time programmer to work with our group. We believe that PostgreSQL hackers will be great candidates for such a position. Appended to this email is a full description of the job requirements. I encourage any hackers interested to apply for the position. To apply for the position please do the following: - Point your web browser to http://jobs.berkeley.edu - Click on External applicants - Click on View job posting / apply for jobs - Check the Information Technology box, Choose Regular from Regular/Temporary, fill in telegraph in the Keyword box and hit the search button. - Click on Programmer/Analyst IV-Ucb, job #001090 Please note: YOU HAVE TO APPLY FOR THE JOB THROUGH THE WEB SITE .. Job Description Posting Title: Programmer/Analyst IV-Ucb Requisition: 001090 Department: Electronics Research Lab Location: Main Campus-Berkeley First Review Date: 03/29/2004 This requisition will remain open until filled. Job Description: The focus of the Telegraph research group is on developing a general-purpose software infastructure for the dataflow applications. Dataflow applications are prevalent in networked environments where information from multiple sources must be flowed through various networked machines and software services for integration and analysis. The unpredictable nature of decoupled, networked machines and services requires novel, very adaptive software infrastructures: the design and study of such infastructure is the goal of the Telegraph project. 
The position is in support of the Telegraph project, with a focus on integration with two other software projects currently underway in the UC Berkeley database group: TinyDB, a query processing system that runs on ad hoc networks of wireless sensor motes, and YFilter, a distributed XML filtering and brokering system. This position will represent the chief programmer for the Telegraph project, with a focus on overall system architecture, documentation, open-source software delivery, and demonstration development and deployment. This work will be done in collaboration with the students and faculty developing the system. The success of the work will be best measured by the ability of our group to deploy interesting functionality with Telegraph, and for others outside of our group to successfully exploit our open source software in the development of applications in a wide range of scenarios. We want to make Telegraph the system of choice for research groups that are experimenting with data streaming and advanced dataflow applications. This project involves a large amount of existing software developed in the three projects (Telegraph, TinyDB, and YFilter), probably totaling several hundred thousand lines of code. The person holding the position will need to understand database management internals, XML processing and must be able to interface such systems with external devices such as sensor networks. Responsibilities: Design, implement, test, deliver, release, and maintain the core Telegraph system as a viable open source dataflow infastructure. This position will make all client-side design decisions: i.e. they will decide on user interface design, client-server APIs, and client and application design (there are a number of client applications as yet undeveloped which will be the full responsibility of the programmer.) The position will collaborate with students and faculty on server-side design decisions. 
The position will also be fully responsible for all public code release and installer issues, for documentation, and for testing and other quality assurance of both server and client code. Design, implement, test, deliver and maintain a number of demonstration applications over Telegraph - including conception, execution, and deployment at venues such as international research conferences, departmental meetings, CITRIS functions, etc. Coordinate with students and faculty to integrate the core Telegraph code with ongoing research extensions thereof, including participation in research meetings of the Telegraph group. Outreach activities to promote the adoption of Telegraph for future applications, including assistance to selected users as well as general tutorials and presentations.

Requirements/Qualifications: The position requires a broad-based skill and experience level with the ability to quickly grasp the intricacies of complex software systems. Position
Re: [HACKERS] linked list rewrite
Re changing APIs or not. I agree with Tom that an incremental change is easier. More importantly, it is also easier to test your implementation. Even if everybody agrees that the API should be changed, IMO a better way would be to first use your implementation with the old API and go through some thorough testing to make sure it is AOK. Once we are certain (or as certain as we can be) about that, you can change over to the new API. This way, when bugs show up we won't be left wondering if it's because of the new API interacting with some hidden assumption at the call-sites or if it is only the implementation. One step at a time seems safer to me. I would think a new CVS branch would be the right way to merge your stuff especially if you are going with the API change route. That way you won't have to lock the tip and can incrementally work with a broken branch until it is fine (and keep re-merging). Which brings me to another question .. has anybody considered using subversion instead of CVS ? -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 8: explain analyze is your friend
[HACKERS] Chapter on PostgreSQL in a book
Hackers Along with some other folks I am co-authoring a chapter on PostgreSQL in the next edition of Database System Concepts by Silberschatz, Korth and Sudarshan (http://db-book.com) This is in the form of a case study and will follow the pattern in Chapters 25,26 and 27 (Part 8 in the Table of Contents available at the above URL) that deal with Oracle, DB2 and MS SQL Server respectively. There are three sections, though, that are non-engine specific: - Database Design and Querying Tools - Replication, Distribution and External Data - Database Administration Tools I'm at a loss as to how much we should focus on these sections. Do we use what's in GBorg ? Do the hackers have any suggestions ? There are 33 DBA tools and 19 Design tools in GBorg .. are there any specific tools that are recommended ? Suggestions appreciated .. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] subversion vs cvs
Marc == Marc G Fournier [EMAIL PROTECTED] writes: Marc On Tue, 23 Mar 2004, Sailesh Krishnamurthy wrote: Which brings me to another question .. has anybody considered using subversion instead of CVS ? Marc Why? not that I'm for a change from something that isn't Marc broken, but what advantages does subversion give us over Marc what we already have? I've had plenty of pain with cvs in terms of directories not being first-class etc .. but I don't really contribute to pgsql so you guys probably don't have the same experience. I was just curious as it looks like eventually subversion (or arch :-) will be an alternative to cvs. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during
Kevin == Kevin Brown [EMAIL PROTECTED] writes: The bigger problem though with this is that it makes the problem of list overflow much worse. The hard part about shared memory management is not so much that the available space is small, as that the available space is fixed --- we can't easily change it after postmaster start. The more finely Again, I can suggest the shared memory MemoryContext we use in TelegraphCQ that is based on the OSSP libmm memory manager. We use it to grow and shrink shared memory at will. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Reducing expression evaluation overhead
Tom == Tom Lane [EMAIL PROTECTED] writes: Tom The idea I was toying with is to generate, not x = y with Tom repeated copies of x, but placeholder = y where placeholder Tom is a dummy expression tree node. Then at runtime, the CASE Tom code would evaluate the test expression once and save it into Tom the econtext so the dummy node(s) could regurgitate that Tom value. We already have exactly such a mechanism in place to Tom handle the VALUE keyword for domain check constraints; it'd Tom be easy to duplicate it for CASE. That is exactly what I was proposing. I implemented something like this in TCQ and used a Const expression tree node. This was for something we call grouped filters where we build an index on predicates from multiple queries. So if you have a bunch of queries (say ~1000) each with a predicate R.a ??? xxx where ??? is one of <, <=, =, >=, > then we evaluate using the predicate index which queries fail for each incoming tuple. In a separate experiment we found that ExecEvalVar is particularly expensive for us .. this is because we have an IntermediateHeapTuple data structure to represent join tuples (in our framework, join orders are not fixed) .. the IHT has a set of pointers to the constituent tuples. This means that we have to do more work in ExecEvalVar .. essentially one more lookup into the IHT. All this was only possible because you guys kept around the varnoold and the attnoold !! -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Reducing expression evaluation overhead
Tom == Tom Lane [EMAIL PROTECTED] writes: Tom I'm not sure that this would let us catch up to what Arjen Tom reports as MySQL's expression evaluation speed, but it should Tom at least speed things up a bit with only fairly localized Tom changes. I like the idea of memoizing the switch with function pointers as I don't think branch prediction helps much with varying switch arms selected with different exprs. Also I agree that the delta of indirect function invocation is probably small. I've forgotten the syntax of case, but for the simple form isn't expr=const going to be the same expr for each case arm ? If that's the case, couldn't we actually save the value of expr in a Datum and then reuse that (through a Const) in each of the other arms to evaluate the actual exprs ? That should reduce the number of times ExecEvalVar (and through it heapgetattr) are called. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Performance and WAL on big inserts/updates
Marty == Marty Scholes [EMAIL PROTECTED] writes: Marty Why have I not seen this in any database? Marty There must be a reason. For ARIES-style systems, logging parsed statements (commonly called logical logging) is not preferred compared to logging data items (physical or physiological logging). A major reason for this is that logical logs make recovery contingent on being able to execute the parsed statements. This execution might, however, not be possible if the system is itself not in a consistent state .. as is normally the case during recovery. What if, for instance, it's the catalog tables that were hosed when the system went down ? It may be difficult to execute the parsed statements without the catalogs. For this reason, a major goal of ARIES was to have each and every data object (tables/indexes) individually recoverable. So ARIES follows page-oriented redo logging. Having said that, page-oriented undo logging can be a pain when B-tree pages split. For higher concurrency, ARIES uses logical undo logging. In this case, the logs are akin to your parsed statement idea. In any case, the only place that parsed statements are useful, imo are with searched updates that cause a large number of records to change and with insert into from select statements. Then, there is also the case that this, the parsed statements approach, is not a general solution. How would you handle the update current of cursor scenarios ? In this case, there is some application logic that determines the precise records that change and how they change. Ergo, it is my claim that while logical redo logging does have some benefits, it is not a viable general solution. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Performance and WAL on big inserts/updates
(Just a note: my comments are not pg-specific .. indeed I don't know much about pg recovery). Marty == Marty Scholes [EMAIL PROTECTED] writes: Marty If the DB state cannot be put back to a consistent state Marty prior to a SQL statement in the log, then NO amount of Marty logging will help. The idea is that the state can be put Marty back to what it was prior to a particular log entry, be it Marty raw datafile blocks or a SQL statement. The point is that with redo logging, you can just blindly apply the log to the data pages in question, without even really restarting the database. Note that in ARIES, recovery follows: (1) analysis, (2) redo _everything_ since last checkpoint, (3) undo losers. So logging carefully will indeed help get the system to a consistent state - actually after phase (2) above the system will be in precisely the state it was in at the time of the crash .. and all that's left to do is undo all the live transactions (losers). BTW, logging raw datafile blocks would be pretty gross (physical logging) and so ARIES logs the changes to each tuple in logical fashion .. so if only one column changes only that value (before and after images) are logged. This is what's called physiological logging. Marty See above. If this cannot be resolved prior to Marty re-executing a statement in the log, then the problem is Marty beyond ANY subsequent logging. Not true ! By applying the log entries carefully you should be able to restore the system to a consistent state. Having said that, page-oriented undo logging can be a pain when B-tree pages split. For higher concurrency, ARIES uses logical undo logging. In this case, the logs are akin to your parsed statement idea. Marty Yes, my experience exactly. Maybe we are the only company Marty on the planet that experiences this sort of thing. Maybe Well, logical undo is still at a much lower level than parsed statements. Each logical undo log is something like delete key 5 from index xyz.
Marty Maybe this is not a traditional RDBMS app, but I am not Marty in the mood to write my own storage infrastructure for it. I agree that your app has a lot of updates .. it's just that I'm not convinced that logical logging is a clean solution. I also don't have a solution for your problem :-) -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
We have a rude hack that spills hashed aggs to temp files on disk to deal with the case where there is not enough memory. I don't think that's an ideal solution, but it certainly has the code to dump to file. I can post the patch later in the day .. (This is some code for our undergrad db class assignment. I was planning to clean it up and submit it properly but I never got the time) -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] PostgreSQL port to pure Java?
We remain sceptical about writing an RDBMS in Java. The earlier version of TelegraphCQ was in Java and turned out to be a bit of a pain. Some more information: Mehul A. Shah, Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein: Java Support for Data-Intensive Systems: Experiences Building the Telegraph Dataflow System. SIGMOD Record 30(4): 103-114 (2001) Apart from our group, the database research group in Wisconsin also rewrote their Niagara system from Java to C++. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] relation_byte_size()
Hackers

Here is the definition of relation_byte_size() in optimizer/path/costsize.c:

/*
 * relation_byte_size
 *    Estimate the storage space in bytes for a given number of tuples
 *    of a given width (size in bytes).
 */
static double
relation_byte_size(double tuples, int width)
{
    return tuples * (MAXALIGN(width) + MAXALIGN(sizeof(HeapTupleData)));
}

Shouldn't this be HeapTupleHeaderData and not HeapTupleData ? (Of course, from a costing perspective these shouldn't be very different but ...) -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Materialized views proposal
Hannu == Hannu Krosing [EMAIL PROTECTED] writes: Hannu Neil Conway kirjutas P, 30.11.2003 kell 02:18: Jonathan Gardner [EMAIL PROTECTED] writes: 3) We would implement some sort of differential view update scheme based on the paper Efficiently Updating Materialized Views[1]. Hannu Maybe the TelegraphCQ engine can give some ideas Hannu http://telegraph.cs.berkeley.edu/ Hannu from brief reading, it seems that all they do is kind of Hannu materialized views ;) - they call it Continuous Dataflow Hannu Processing, i.e. queries that run continuously over Hannu incoming data. A fair portion of the community argues that everything that we do is merely materialized views :-) While materialized views are certainly related, there are however, things that are quite different in processing continuous queries over data streams. The first ideas on stream processing did come from work on scalable triggers which is intimately related to materialized views. Work on mviews isn't really concerned with things like windowed joins and aggregate processing. My advice is to take a very incremental approach to materialized view implementation. There are essentially 2 pieces - one the maintenance of the views and second the routing of queries to automagically use the materialized views. The former is done by something like triggers and the latter is through either a query-rewrite mechanism or something like an optimizer choosing a different access method. As for the former, there are again two subdivisions - what changes you propagate and when you apply 'em. What you propagate are the deltas. So you have the following choices: IPIA - Immediate Propagate, Immediate Apply IPDA - Immediate Propagate, Deferred Apply DPDA - Deferred Propagate, Deferred Apply DPIA - makes no sense .. The easiest is DPDA .. in this model you essentially recompute the view on demand. The next is IPIA.. 
with IPIA, you can choose to either recompute the entire view or only figure out how to translate a delta into appropriate changes in the mview. After that there is IPDA .. this involves creating a delta table that is populated based on the triggers. Every so often you process a set of deltas and recompute the view (either in its entirety or in parts). Another way to slice the problem is to limit the scope of queries that can be used to define views. I suggest that first we don't consider joins .. and only consider grouped aggregates over a single table. Joins are more hairy with changes in one table potentially requiring a wholesale recomputation of the join. Note that all this discussion is only about the first part of the problem .. maintenance of the views. The other part, routing, is almost equally complicated .. there you have to solve the query subsumption problem without also eliminating choices in your access plan. As people have said there are plenty of papers on this in the literature. While I am no theorist I can certainly help with reading the papers .. not every bit of a paper is very useful. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] detecting poor query plans
Neil == Neil Conway [EMAIL PROTECTED] writes: Neil It occurred to me that these kinds of poor planning Neil decisions could easily be detected by PostgreSQL itself: Neil after we've finished executing a plan, we can trivially Neil compare the # of results produced by each node in the query Neil tree with the # of results the planner expected that node to Neil produce (look at EXPLAIN ANALYZE, for example). If the Indeed. This is the approach being followed by the LEO project (Learning Optimizer) at IBM Almaden. http://www.almaden.ibm.com/software/dm/SMART/leo.shtml There is a VLDB paper that describes it .. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the
Greg == Greg Stark [EMAIL PROTECTED] writes: Greg I think you're talking about situations like where x = ? or Greg y = ? or where x = ? and y = ? Greg When both `x' and `y' are indexed. It's possible to do the Greg index lookup, gather a list of tid pointers in some Greg efficient format like a bit vector, and apply the and/or and Greg any other clauses. Yes, Index ANDing/ORing are useful whether or not the list of tids is in an efficient format. Especially ORing for performing disjunctions. Greg Oracle can do this, and it's useful in some cases when you Greg have DSS-style where all the indexes have poor selectivity Greg but using enough of them together gets you a reasonable Greg number of records. I guess this is the piece where variant indexes are useful - essentially when you have a large number of matches for a given key. I'm not sure how useful it is in practice - I've only read the original research paper. Greg I think this is different from what he meant, but yes, Greg bitmap indexes might be an interesting project. Like hash You're right .. a sorted TLF is really something quite simple that can make quite a difference in accessing a non-clustered index. I believe this is something all the commercial guys do. Sorting the Tids before fetching 'em buys you buffer cache locality. When there are large numbers of hits, it also buys you sequential scans where the file system prefetcher can help. The additional overhead you pay is the sorting cost. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the
Mike == Mike Mascari [EMAIL PROTECTED] writes: Mike Robert Treat wrote: While some form of bitmapped indexing would be cool, other ideas might be to implement different buffer manager strategies. I was impressed by how quickly Jan was able to implement ARC over LRU, but there are a host of other strategies that could also be implemented. We already do that ! We have a first warm-up assignment for which they get 2 weeks and have to change the strategy to MRU from LRU (in an earlier semester they were assigned 2Q). The idea here is more to just get used to the code and the debugger. Sadly the undergraduate OS class uses Java (horrors) as an implementation language and many of our juniors and seniors are not as comfortable with C programming (and pointers) as I'd like. The good news is that they all pretty much got into the groove fast. Re PITR, maybe that's an option - the thing is we are looking less at a full semester long project and more at a 3/4 week assignment where students get to hack something, learn about the practical side to what's in lecture, and learn to do some performance comparisons. Mike If you go to sequential access of larger chunks of the disk, you will Mike get 500 times more bandwidth—you can read or write the disk in a day. Mike So programmers have to start thinking of the disk as a sequential Mike device rather than a random access device. Mike Isn't a TID-List-Fetch implementation a crucial first step in the Mike right direction? I believe so .. I think it's a clear win. I believe there are some concurrency issues although I'm not sure .. what if there is a vacuum that comes in between building the Tid list and then doing a fetch ? -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the
Robert == Robert Treat [EMAIL PROTECTED] writes: Robert allowing indexes for searching nulls, or adding Robert concurrency to GIST, or allowing non btree indexes to Oh this has come up before on -hackers and I've been meaning to chime in. Marcel Kornacker did implement concurrency for GiST - I confirmed as much with Joe Hellerstein (his advisor). I know there's a paper he wrote with C.Mohan on it. I don't know which version his implementation was for. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the
PostgreSQL most definitely works great on Solaris x86 ! At UC Berkeley, we have our undergraduate students hack on the internals of PostgreSQL in the upper-division Introduction to Database Systems class .. http://www-inst.eecs.berkeley.edu/~cs186/ The official platform is Solaris x86 - that's where the students get accounts and they have to get their code working on that platform as the TAs will only test and grade their submissions on Solaris x86. (Besides, I also got TelegraphCQ running on Solaris x86 .. just for kicks though .. and TelegraphCQ is based off of pgsql-7.3.2) -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the
Mike == Mike Mascari [EMAIL PROTECTED] writes: Mike Robert Treat wrote: http://www-inst.eecs.berkeley.edu/~cs186/hwk0/index.html Are these screenshots of PgAccess on Mac OSX? Yup .. that's from Joe Hellerstein, who was the instructor in the Spring when I was a TA. Mike It's pretty sad that Mike Stonebraker only has a salary of Mike $15,000. ;-) We've got to do something now that he's a competitor :-) Mike I also thought this SIGMOD article was a nice read: Mike http://www.acm.org/sigmod/record/issues/0309/4.JHdbcourseS03.pdf That's right .. it was a fun experience and I like to think that the students enjoyed it. The best part was that we got to pick up 3 great undergraduates for our research - and pgsql hacking experience is invaluable in hacking TelegraphCQ. Mike How about extra credit for PITR? One step at a time :-) Actually a big problem is figuring out new pieces for the projects. Most of the items in the TODO list are way too much for a class project - we gave 'em 3 weeks to make the Hash GroupedAgg work for large numbers of unique values (by using a form of hybrid hashing). Another thing I toyed with was having an implementation of a Tid-List-Fetch .. sorting a TID-list from an index and fetching the records of the relation off the sorted list for better IO performance. AFAICT something like this isn't present yet .. can pgsql do this already ? -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the
Chris == Christopher Kings-Lynne [EMAIL PROTECTED] writes: PostgreSQL most definitely works great on Solaris x86 ! At UC Berkeley, we have our undergraduate students hack on the internals of PostgreSQL in the upper-division Introduction to Database Systems class .. http://www-inst.eecs.berkeley.edu/~cs186/ Chris Hi Sailesh, Chris You know what would be kind of cool? If you could write a Chris Guide to PostgreSQL to Teach Databases. Chris eg. You could cover how to set up the server securely Chris (eg. schemas for each person), etc. Chris How to manage it all, handle upgrades, etc. Mention what Chris things are good to get students to hack on in the Chris internals, etc. Chris Could be a good techdocs.postgresql.org article. Hmm .. this is probably a good idea. I ought to do it for the benefit of future TAs here anyway - but I wasn't thinking on the lines of a full-fledged article .. just burnt out with too much writing :-) I'm just underwater until the end of the semester. If I don't come through by the end of december do ping me. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] O_DIRECT in freebsd
DB2 supports cooked and raw file systems - SMS (System Managed Space) and DMS (Database Managed Space) tablespaces. The DB2 experience is that DMS tends to outperform SMS but requires considerable tuning and administrative overhead to see these wins. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] O_DIRECT in freebsd
Jordan == Jordan Henderson [EMAIL PROTECTED] writes: Jordan significantly better results. I would not say it requires Jordan considerable tuning, but an understanding of data, storage Jordan and access patterns. Additionally, these features did not Jordan cause our group considerable administrative overhead. I won't dispute the specifics. I have only worked on the DB2 engine - never written an app for it nor administered it. You're right - the bottomline is that you can get a significant performance advantage provided you care enough to understand what's going on. Anyway, I merely responded to provide a data point. Will PostgreSQL users/administrators care for additional knobs or is there a preference for keep it simple, stupid ? -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Dreaming About Redesigning SQL
Josh == Josh Berkus [EMAIL PROTECTED] writes: Relational is all about theory and proving things mathematically correct. MV is all about engineering and getting the result. And if that means pinching all the best ideas we can find from relational, then we're engineers - of course we'll do it :-) Josh Relational is all about preserving the *long-term* Josh integrity, utility, and accessibility of your data. MV is Josh all about getting an expedient result immediately and to Josh heck with the future. To emphasize the *long-term* - relational is all about separating physical database representation from a logical view of the data. Josh, I'll be happy to meet at Intermezzo - after Nov 24 perhaps. I have a conference deadline .. the papers that we write just to secure funding - your tax dollars at work. Long live the NSF ! BTW, I'm not necessarily that far from your views. There is, however, more to an XML database than just storing data - relational databases do just fine there. The tools that are used to build database systems have other uses - for instance XML message brokers. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Dreaming About Redesigning SQL
Josh == Josh Berkus [EMAIL PROTECTED] writes: This is an unfair characterization of XML databases, and I can say this without accusations of bias for I vastly prefer working with the relational model. Josh Actually, amusingly enough, there is a body of theory Josh backing XML databases, but it's not one any current developer Josh would want to admit to: the XML database is functionally Josh identical to the Network Databases of the 1960's. Of Josh course, Network Databases gave way, historically, to Josh Relational for good reason. If you look at the academic research work, there have been gazillions of recent papers on XML database technology. All the major database vendors (Oracle, IBM and Microsoft) are investing fairly heavily in core-engine XMLDB technology. Finally, while it is true that some of XML db technology is evocative of network databases, XML databases are certainly more than network databases. For one, they are semi-structured .. in addition they present query language access to their data (although I'm not a big fan of XQuery). -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
Christopher == Christopher Browne [EMAIL PROTECTED] writes: Christopher Ah, but do papers honestly indicate the emergence Christopher of some underlying theoretical model for which Christopher fidelity could be evaluated? Certainly. The model is that of semi-structured data, where often times schema isn't very clearly defined. It's a different model from the relational model - which I'm partial to. There are situations where the XML model does make sense. Christopher Or merely that academics are looking to write papers Christopher on whatever topics can attract research funding? Bash academics if you want. The truth is that industry is also working on it. As I said before, I have no axe to grind in this. I might be in academia now, but frankly I couldn't give a toss about XML. Christopher Half the articles in SIGOPS have been about pretend Christopher applications of Java to operating systems; why does Christopher it seem likely that the database academics are any Christopher less affected by this? I think you are looking at the wrong publishing location :-) The premier venues for the OS community are the SOSP and OSDI conferences. Please look at the SOSP04 papers - you'll find fairly good systems work. BTW, I'm not sure which database papers you read - the premier venues for database systems work are the SIGMOD and VLDB conferences. Christopher CODASYL had a query system, albeit something that Christopher looked more like assembly language than anything Please take a fair look at the XQuery data model and the XQuery language before comparing it with CODASYL. I will not admit (at least in public :-) to being a big fan of XQuery but that is because of certain details, not anything fundamental. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Database Kernels and O_DIRECT
Tom == Tom Lane [EMAIL PROTECTED] writes: Tom I tend to agree with the opinion that Oracle's architecture Tom is based on twenty-year-old assumptions. Back then it was Tom reasonable to assume that database-specific algorithms could Tom outperform a general-purpose operating system. In today's Tom environment that assumption is not a given. In fact: Michael Stonebraker: Operating System Support for Database Management. CACM 24(7): 412-418 (1981) Abstract: Several operating system services are examined with a view toward their applicability to support of database management functions. These services include buffer pool management; the file system; scheduling, process management, and interprocess communication; and consistency control. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] PostgreSQL not ACID compliant?
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

  Tom> AFAIK, no commercial database does predicate locking either,

True ..

  Tom> so we all fall short of true serializability. The usual
  Tom> solution if you need the sort of behavior you're talking about
  Tom> is to take a non-sharable write lock on the table you want to
  Tom> modify, so that only one transaction can do the

Not really. If you have B+-tree indexes on the table you can get by with key-value locking (as in ARIES/KVL) and achieve some of the effects of predicate locking, to get true serializability without losing too much concurrency. While this falls short in the general case, it turns out to be pretty acceptable normally (when indexes are present).

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
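The key-value locking idea above can be sketched as a toy model - this is illustrative Python only (not PostgreSQL or ARIES code; the class and method names are invented). A range scan S-locks the keys it reads plus the *next* index key beyond the range, so a later insert into that range conflicts and the phantom is prevented:

```python
# Toy sketch of ARIES/KVL-style next-key locking (illustrative only).
# No queueing or deadlock handling - a conflicting request just fails.
import bisect

class KeyValueLockTable:
    def __init__(self, keys):
        self.keys = sorted(keys)   # committed index keys
        self.locks = {}            # key -> ("S" | "X", owner)

    def _grant(self, key, mode, owner):
        held = self.locks.get(key)
        if held and held[1] != owner and "X" in (mode, held[0]):
            return False           # S/X or X/anything conflicts
        self.locks[key] = (mode, owner)
        return True

    def range_scan(self, lo, hi, owner):
        """S-lock every key in [lo, hi] plus the next key past hi."""
        i = bisect.bisect_left(self.keys, lo)
        while i < len(self.keys) and self.keys[i] <= hi:
            if not self._grant(self.keys[i], "S", owner):
                return False
            i += 1
        nxt = self.keys[i] if i < len(self.keys) else "+inf"
        return self._grant(nxt, "S", owner)

    def insert(self, key, owner):
        """X-lock the next key, then the new key itself."""
        i = bisect.bisect_right(self.keys, key)
        nxt = self.keys[i] if i < len(self.keys) else "+inf"
        if not (self._grant(nxt, "X", owner) and self._grant(key, "X", owner)):
            return False
        self.keys.insert(bisect.bisect_left(self.keys, key), key)
        return True

t = KeyValueLockTable([10, 20, 30])
assert t.range_scan(10, 20, owner="T1")   # T1 reads keys 10..20
assert not t.insert(15, owner="T2")       # phantom insert into the range: blocked
assert t.insert(40, owner="T2")           # insert outside the range: allowed
```

The next-key lock on 30 is what stands in for the predicate "10 <= key <= 20": any insert that would land inside the scanned range must first X-lock a key the scanner already holds in S mode.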
Re: [HACKERS] this is in plain text (row level locks)
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

  Tom> Personally I'd feel more comfortable with a shared-lock
  Tom> approach, if we could work out the scalability issues. Dirty
  Tom> reads seem ... well ... dirty.

I was going to do some experiments to measure the costs of our shared memory MemoryContext. Is there something specific you'd be interested in ? My first goal is mainly to measure the amount of time it takes to allocate and deallocate shared memory in the contested and uncontested cases.

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] this is in plain text (row level locks)
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

  Tom> That doesn't work, unless you insist that the first backend
  Tom> can't exit its transaction until all the other ones are done.
  Tom> Which introduces its own possibilities for deadlock --- but
  Tom> even worse, how does the first backend *know* that the other
  Tom> ones are done? You're right back where you started: it has to
  Tom> be possible to tell which backends have share-locked a
  Tom> particular row.

Is a count a solution ? The first backend gets the S lock on the row - I'm assuming you plan to do it by recording it on the tuple and not in a shared memory lock table, which means that you might have to unnecessarily write an unmodified page if its buffer pool frame is stolen. The catch is that at commit time, each backend must carefully decrement the count of shared locks on any tuple it locked. This can be accomplished by having each backend keep track of the list of files and TIDs of the rows for which it acquired S locks. Is this the same way that pgsql releases its X locks ?

Bruce, I don't disagree that MVCC has the very nice property that writers don't block readers. However, I don't buy that 2-phase locking with lock escalation is either unworkable because of too many locks, or causes any extra pain for the user application (apart from the fact that writers not blocking readers gives you more concurrency, at some very minor overhead of not being strictly serializable).

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
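The counting scheme being proposed can be sketched as follows - a toy Python model (not PostgreSQL code; the names are invented): each tuple carries a count of share-lockers, and each backend remembers which TIDs it S-locked so it can decrement exactly those counts at commit.

```python
# Illustrative sketch of per-tuple S-lock counts plus a per-backend
# list of locked TIDs, released at commit time.
from collections import defaultdict

class TupleLocks:
    def __init__(self):
        self.s_count = defaultdict(int)     # tid -> number of S holders
        self.by_backend = defaultdict(set)  # backend -> TIDs it holds

    def lock_share(self, backend, tid):
        if tid not in self.by_backend[backend]:
            self.s_count[tid] += 1
            self.by_backend[backend].add(tid)

    def commit(self, backend):
        # Release every S lock this backend took, exactly once each.
        for tid in self.by_backend.pop(backend, set()):
            self.s_count[tid] -= 1
            if self.s_count[tid] == 0:
                del self.s_count[tid]

locks = TupleLocks()
locks.lock_share("B1", ("rel", 42))
locks.lock_share("B2", ("rel", 42))
locks.commit("B1")
# One share holder (B2) remains on the tuple.
```

The count answers Tom's "how does it know the others are done?" question: the row is free exactly when its counter reaches zero, at the price of tracking the TID list per backend (and possibly dirtying otherwise-unmodified pages if the count lives on the tuple).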
Re: [HACKERS] this is in plain text (row level locks)
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

  Tom> Rod Taylor [EMAIL PROTECTED] writes:
  >>> It may be best to have a locking manager run as a separate
  >>> process. That way it could store locks in ram or spill over to
  >>> disk.

  Tom> Hmm, that might be workable. We could imagine that in place of
  Tom> the HEAP_MARKED_FOR_UPDATE status bit, we have a "this row is
  Tom> possibly locked" hint bit. Only if you see the bit set do you
  Tom> need to query the lock manager. If the answer comes back

Why do you want to query the lock manager as a separate process ? Why not have the traditional approach of a lock table in shared memory, growing and shrinking as appropriate, and have each individual process update it (need to protect it with a latch of course).

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] this is in plain text (row level locks)
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

  Tom> Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
  >>> Why not have the traditional approach of a lock table in shared
  >>> memory, growing and shrinking as appropriate,

  Tom> Because we can't grow shared memory. Whatever size we get at
  Tom> startup is what we're stuck with. (I suppose we could try
  Tom> asking the kernel for additional segments, but there's every
  Tom> likelihood that that will fail.)

We implemented a Shared Memory MemoryContext using OSSP libmm (used in Apache) for TelegraphCQ. If you think it's useful I can submit it as a patch.

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] this is in plain text (row level locks)
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

  Tom> Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
  >>> We implemented a Shared Memory MemoryContext using OSSP libmm
  >>> (used in Apache) for TelegraphCQ.

  Tom> How portable is that? What kind of real-world performance do
  Tom> you get?

As portable as libmm I guess. We've only tried it on Linux and MacOS X. I understand that the Apache guys use it - so it should be fairly portable.

http://www.ossp.org/pkg/lib/mm/

Actually I've also tried it successfully on SPARC Solaris. I believe that a user of our first TelegraphCQ release (that we did not announce) tried it on FreeBSD. Sean Chittenden has tried it on FreeBSD but I'm not sure if he built it and got it to work properly.

As for our SHMemoryContext code itself, it's pretty much a straight copy of the code in aset.c and mctx.c, with appropriate changes where we call the mm_alloc functions and also use mm_lock and mm_unlock for protection.

As for performance, I'm in the midst of a performance study, but it's not a micro-benchmark of the shmem allocator. After I get some numbers I'll start profiling our code. One problem is that on some platforms the locking implementation uses the file system. This is not in our control and we just trust mm to do the best possible.

Given that we rely on shared query execution (we run multiple queries in a single adaptive query plan) we don't have much choice - we must rely on shared memory.

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] this is in plain text (row level locks)
>>>>> "Bruce" == Bruce Momjian [EMAIL PROTECTED] writes:

  >> -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
  Bruce>                                           ^^^
  Bruce> Watch out, that code from Berkeley usually is a mess. :-)

LOL ! That's why we release the code - in the hope of getting real hackers playing with it :-)

  Bruce> Seriously, though, it would be good to see what you guys are
  Bruce> up to.

Our code is based on 7.3.2 pgsql and available at:

http://telegraph.cs.berkeley.edu/telegraphcq

If the specific shm stuff is interesting, I can easily take it out and submit it. If you prefer to just look at the code in the tarball that's fine too. Or you can browse our cvs repository at:

http://triplerock.cs.berkeley.edu:8080/viewcvs/viewcvs.cgi/

Look for shmctx.c and shmset.c in src/backend/utils/mmgr

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] this is in plain text (row level locks)
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

  Tom> Jenny - [EMAIL PROTECTED] writes:
  >>> I am trying to acquire row-level locks in postgresql. I try
  >>> doing this: 'select * from students where name='Larry' for
  >>> update;' But by looking at the holding array of proclock, I've
  >>> noticed that by doing this only AccessShareLock gets acquired,
  >>> which is a table level lock.

  Tom> Row-level locks are not recorded in proclock --- they are
  Tom> implemented by marking the individual tuple on-disk. If we
  Tom> tried to record them in shared memory, it'd be very easy to
  Tom> run out of shared memory, because you could be holding row
  Tom> locks on a large number of tuples.

Of course, other database systems do this without too much hassle .. including relying on lock escalation (moving up to page/table level locks) when the number of locks grows too large.

Does pgsql only record X locks on the individual tuples on-disk, or does it do so for S locks as well ? Not that I dislike the idea - Toby Lehman suggested this in his Ph.D. thesis in the mid-eighties for main-memory databases (where you don't take the write penalty).

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
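The lock escalation idea mentioned above can be shown with a toy model - illustrative Python only (not from PostgreSQL or any real engine; the threshold and names are invented): once a transaction holds more than a threshold of row locks on one table, they are traded for a single table-level lock, bounding the size of the shared lock table.

```python
# Toy model of lock escalation: many fine-grained row locks collapse
# into one coarse table lock once a threshold is crossed.
class EscalatingLockManager:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.row_locks = {}       # table -> set of locked row ids
        self.table_locks = set()  # tables locked wholesale

    def lock_row(self, table, row):
        if table in self.table_locks:
            return                # already covered by the table lock
        rows = self.row_locks.setdefault(table, set())
        rows.add(row)
        if len(rows) > self.threshold:
            self.table_locks.add(table)  # escalate
            del self.row_locks[table]    # drop the fine-grained locks

    def lock_count(self):
        return len(self.table_locks) + sum(map(len, self.row_locks.values()))

lm = EscalatingLockManager(threshold=3)
for row in range(10):
    lm.lock_row("students", row)
# 10 row locks have collapsed into 1 table lock.
```

This is the classic answer to "you could be holding row locks on a large number of tuples": shared-memory usage stays bounded, at the cost of reduced concurrency on the escalated table.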
Re: [HACKERS] table-level and row-level locks.
Once more unto the breach - Could you please abstain from sending HTML email to the list ? Many thanks ! -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
[HACKERS] ANNOUNCEMENT: Availability of TelegraphCQ v0.2 (BETA release)
ANNOUNCEMENT: Availability of TelegraphCQ v0.2 (BETA release)

The Telegraph Team at UC Berkeley is pleased to announce the immediate availability of TelegraphCQ v0.2 (a BETA release). TelegraphCQ is a system for processing long-running continuous queries over data streams. TelegraphCQ is implemented as a significant modification of the PostgreSQL open-source database system. Specifically, TelegraphCQ v0.2 is based on the PostgreSQL 7.3.2 code-base.

For more information on the Telegraph project please visit: http://telegraph.cs.berkeley.edu

To download TelegraphCQ and for more information, please visit: http://telegraph.cs.berkeley.edu/telegraphcq

Here is the README.TelegraphCQ from the source distribution:

This file contains information on TelegraphCQ 0.2 (a Beta release).

What's new (since TelegraphCQ 0.1)

- Better stability: Lots of bugs have been squashed.

- CSV Wrapper: An easy-to-use wrapper that accepts simple comma separated text as input and supports all PostgreSQL and user-defined types. There is now no need to write C code to develop a new wrapper if your data source produces textual data.

- Pull Wrappers: If you want to pull (look up) data from external sources (such as web pages) based on query predicates, you can use the new infrastructure for the Pull Wrapper.

- Introspective Query Processing: The dynamic catalog is a new feature in TelegraphCQ. It lets users query three special-purpose internal data streams to get a continuous view of how the TelegraphCQ system is behaving.

- Applications: Included with this distribution are a few applications that we demonstrated at the ACM FCRC/SIGMOD 2003 Conference in San Diego. We have integrated data from CalTrans (California Department of Transportation), CHP (California Highway Patrol) and a local network interface (using the tcpdump utility). We have also included graphical visualizers that run some interesting queries over data streams from these sources.
Request

- If you are using TelegraphCQ for any project (research or otherwise), we'd like to know !

- If you are making any performance measurements of TelegraphCQ and are publishing or disseminating them in some way, please do let us know.

- We welcome bug fixes and code contributions of any kind. Please send all patches to [EMAIL PROTECTED]

Caveat Emptor

- The current version of TelegraphCQ (0.2) is still a Beta release and is very much research code. We haven't optimized it for performance. Also, you will need to create a fresh database to use it, as it will not work against an existing PostgreSQL database. You are free to use pg_dump to export and then import your data from an existing PostgreSQL database.

- Archived streams: Note that although TelegraphCQ 0.2 supports archived streams, we currently do not support access to historical data. This will be fixed in our next release.

For more information

Getting Started: A TelegraphCQ Getting Started guide can be found in docs/TelegraphCQ/gettingstarted.html The Getting Started guide also has links to other information such as DDL/DML syntax, examples etc. Note that the documentation installed in your TelegraphCQ installation directory contains documentation for the new syntax changes (CREATE/ALTER/DROP STREAM, the WINDOW clause in DML) integrated into the existing PostgreSQL documentation.

The Telegraph Project: http://telegraph.cs.berkeley.edu

Supported Platforms

TelegraphCQ is supported on Linux x86 (various distributions including Red Hat Linux 8, 9 and Debian) and MacOS X 10.2, the platforms on which it was developed. In addition, it has been minimally tested on the SPARC Solaris platform. In general, TelegraphCQ should run anywhere PostgreSQL runs. However, there are significant differences in how we use shared memory and synchronization primitives such as semaphores. This is likely to cause some hiccups on platforms where we have not tested it yet. We welcome patches that help in porting TelegraphCQ.
Please send all comments (and patches) to [EMAIL PROTECTED]

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
[HACKERS] find_typedef on Linux
Friends

src/tools/find_typedef has in its prolog:

# This script attempts to find all typedef's in the postgres binaries
# by using 'nm' to report all typedef debugging symbols.
#
# For this program to work, you must have compiled all binaries with
# debugging symbols.
#
# This is run on BSD/OS 4.0, so you may need to make changes.

I'm trying to run find_typedef on Linux and I don't get any output (I'm on a 7.3.2 base). I tried playing with different options to objdump (tried --syms instead of --stabs) to no avail.

Is there something I need to pass to configure to ensure that all binaries are built appropriately ? That is, something _other_ than --enable-debug as I always have that anyways.

Thanks !

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
[HACKERS] agg/order-by question
Consider the explain output for the following queries ..

sample=# explain select a, count(*) from foo group by a order by a;
                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=69.83..77.33 rows=100 width=4)
   ->  Group  (cost=69.83..74.83 rows=1000 width=4)
         ->  Sort  (cost=69.83..72.33 rows=1000 width=4)
               Sort Key: a
               ->  Seq Scan on foo  (cost=0.00..20.00 rows=1000 width=4)
(5 rows)

sample=# explain select a, count(*) from foo group by a order by a desc;
                                QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=80.65..80.90 rows=100 width=4)
   Sort Key: a
   ->  Aggregate  (cost=69.83..77.33 rows=100 width=4)
         ->  Group  (cost=69.83..74.83 rows=1000 width=4)
               ->  Sort  (cost=69.83..72.33 rows=1000 width=4)
                     Sort Key: a
                     ->  Seq Scan on foo  (cost=0.00..20.00 rows=1000 width=4)
(7 rows)

In the first case pgsql doesn't have a Sort on top because the Sort below the Group produces the right interesting order (using the System-R term). In the second case however, since the order-by clause demands desc, there is a Sort tacked on on top. Now, instead of doing this, isn't it better to just have a plan similar to the first case, but with the lower Sort changed to be descending ? It doesn't affect the Group and the Agg in any way ..

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
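The reason this should work is that the Group/Aggregate nodes only need equal keys to be *adjacent*, which a descending sort provides just as well as an ascending one. A quick illustrative sketch in plain Python (not planner code - groupby stands in for the Group node):

```python
# Grouping adjacent equal keys gives the same aggregates whether the
# input is sorted ascending or descending; only the output order of
# the groups differs.
from itertools import groupby

rows = [3, 1, 2, 3, 1, 1, 2]

def group_count(sorted_rows):
    # Emit (key, count) per run of equal adjacent keys, like Group+Aggregate.
    return [(k, sum(1 for _ in g)) for k, g in groupby(sorted_rows)]

asc = group_count(sorted(rows))                 # plan 1: ascending Sort
desc = group_count(sorted(rows, reverse=True))  # proposal: descending Sort

assert dict(asc) == dict(desc)        # identical aggregates
assert [k for k, _ in desc] == [3, 2, 1]  # already in desc order - no top Sort needed
```

So flipping the direction of the Sort below the Group satisfies the ORDER BY ... DESC for free, saving the second Sort of the (typically much smaller) aggregated output.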
Re: [HACKERS] agg/order-by question
>>>>> "Bruno" == Bruno Wolff writes:

  Bruno> You might try this in 7.4. I am pretty sure a change was made
  Bruno> a couple of weeks ago to let group by work with either sort
  Bruno> order. Also hash aggregates have been available for quite a
  Bruno> while in 7.4. This is a better plan when there are only a
  Bruno> small number of distinct values.

Gotcha ! Thanks. TelegraphCQ is still on the 7.3.2 code base .. after doing one hellish merge in March, I'm not too eager to do another, although merging more often is likely going to be less painful.

I knew about the hash aggregates - we had set spilling of hash aggregates to disk for large numbers of distinct values (with a crude form of recursive partitioning) as a course project for our undergraduate database class at Berkeley. When I get some time, I want to clean up my solution code and contribute it as a patch. I don't think that will be before the end of summer though.

BTW, some systems prefer sorted grouped aggregates to hashed grouped aggregates - even for small numbers of distinct values. How it works is to just update the running aggregates in place in the sort tournament tree. The only requirement is to be able to compute aggregates of aggregates, so that partial aggs for the same distinct values across different runs can be merged. The advantage is that you get sorted grouped aggregation for the same cost as unsorted hash-grouped agg. The disadvantage is that you lose the modularity of the sort.

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
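The "aggregates of aggregates" requirement above can be sketched concretely - an illustrative Python toy (the function names and the (sum, count) representation are my own, not from any engine): each sort run produces partial aggregate states, and states for the same group key merge exactly. avg is the classic case - it must be carried as (sum, count), since averages of averages are wrong.

```python
# Partial aggregation per run, then merge of partial states per key.
def merge(p, q):
    # (sum, count) pairs combine by component-wise addition.
    return (p[0] + q[0], p[1] + q[1])

def partial_aggs(run):
    out = {}
    for key, val in run:
        s, c = out.get(key, (0, 0))
        out[key] = (s + val, c + 1)
    return out

run1 = [("a", 10), ("b", 2)]   # partial states from sort run 1
run2 = [("a", 30), ("b", 4)]   # partial states from sort run 2

merged = partial_aggs(run1)
for key, state in partial_aggs(run2).items():
    merged[key] = merge(merged.get(key, (0, 0)), state)

avgs = {k: s / c for k, (s, c) in merged.items()}
```

count and sum merge the same way (by addition); anything expressible with such a mergeable state can be aggregated in place during the merge phase of the sort, which is what makes the tournament-tree trick work.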
Re: [HACKERS] 2PC: discussion in comp.arch
From Bill Todd's post:

> This is the simple 'two-phase commit, presumed-abort' mechanism. It
> has no problems guaranteeing distributed consistency, but does
> suffer from the problem that if the coordinator *never* recovers
> some of the other nodes may be left 'in doubt' about the fate of the
> transaction. In practice, with replicated data 'never' recovering is
> virtually impossible, but just waiting for some human to replace
> failed hardware can stall things enough that a more complex
> 'three-phase commit' model exists where nodes need not wait for the
> coordinator to recover. Another manner in which that issue is
> addressed is by having a 'back-up' coordinator to which coordination
> responsibility can be transferred; a third is by having a robust
> coordinator - e.g., whose storage can be 'failed over' to another
> separate system which can continue operation.

This is exactly what some of us have been saying here. As long as the co-ordinator is capable of recovering, there is no danger of in-doubt transactions stalling a subordinate indefinitely.

While I'm not sure if any database system implements 3PC, certainly 2PC-PA is implemented by most database vendors (and is part of the XA standard). Presumed Abort (PA) is preferred to Presumed Commit (PC) because PA involves fewer log sync operations for the common case of read-only transactions.

2PC certainly has various real-world applications, as with the integration of message-queueing systems. Alternatively, if there is a plan to support serious federated capabilities within pgsql (by expanding the current dblink prototype, for instance) the issue of 2PC will come into play if pgsql supports remote write operations.

Anyways, this is all moot if integration with enterprise transaction systems is not an important goal of pgsql. If there is no explicit need for such features amongst users of pgsql, I see no need in polluting the codebase with unnecessary complexity.
Let's just not confuse "no need for XYZ functionality" with "XYZ functionality is lame and can never work in practice".

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Two weeks to feature freeze
Let me add my two cents ..

I think something like PostgreSQL needs two test suites - an acceptance test (to ensure that check-ins don't break functionality) and a regression test suite. What we call the regression test suite is really an acceptance test. Tom Lane is absolutely right in asserting that a test suite that takes a week to run will mean that people won't test at all. Personally, I can (and have for many years) tolerate acceptance test suites that take up to an hour.

The existence of such an acceptance test should not however obviate the presence of a wider regression test suite. It should be fine to have an entire suite of regression test buckets take a week to run, because you only start running them once you have a release candidate or equivalent.

Of course, setting up a regression test suite takes effort. There is no need, however, to spend umpteen amounts of time writing the buckets. What can be done is to build it up incrementally. So whenever we have a significant new feature, somebody (preferably not the key developers) could take the time to set up a set of test cases that try to test it thoroughly. It's okay if such a test bucket takes 10-15 minutes to run. Then this can get rolled into the regression suite while a very small representative test case makes it into the acceptance test suite.

Of course, in the open source world, these things take resources and are not easy to do. I certainly think that the base regression test suite is great. We have made clearing the pgsql regression test a check-in requirement for TelegraphCQ developers, as our goal is to not break pgsql functionality.

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Two weeks to feature freeze
>>>>> "Bruce" == Bruce Momjian [EMAIL PROTECTED] writes:

  Bruce> Tom Lane wrote:
  >>> Bruce Momjian [EMAIL PROTECTED] writes:
  >>>> The question was whether 2PC is useful. The question wasn't if
  >>>> an unreliable 2PC was useful.
  >>> My question is whether there is such a thing as reliable 2PC. I
  >>> sure don't see how you build that.

  Bruce> Other databases use 2PC --- are you saying none of them are
  Bruce> reliable?

And they use them for both federated read/write (what you refer to as distributed access through dblink) and for clustered configurations.

I'm not sure if I understand Tom's beef - I think he is concerned about what happens if a subordinate does not respond to a prepare message. I would assume that the co-ordinator would not let the commit go through until it has received confirmations from every subordinate. The application's commit will remain blocked against the co-ordinator while this takes place.

That said, I agree that 2PC (and variants) is rather heavyweight in widely distributed configurations. (Although I guess in practice, many people use Presumed Abort and not vanilla 2PC, as PA results in fewer log flushes for read-only transactions.)

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Two weeks to feature freeze
>>>>> "Tom" == Tom Lane [EMAIL PROTECTED] writes:

  Tom> Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
  >>> I'm not sure if I understand Tom's beef - I think he is
  >>> concerned about what happens if a subordinate does not respond
  >>> to a prepare message. I would assume that the co-ordinator
  >>> would not let the commit go through until it has received
  >>> confirmations from every subordinate.

  Tom> No. I want to know what the subordinate does when it's
  Tom> promised to commit and the co-ordinator never responds. AFAICS
  Tom> the subordinate is screwed --- it can't commit, and it can't
  Tom> abort, and it can't expect to make progress indefinitely on
  Tom> other work while it's holding locks for the
  Tom> not-quite-committed transaction.

Okay, I understand what you mean now.

AFAIK the general way things happen is that each site has a recovery procedure that kicks in after a crash. If the co-ordinator crashes (which could be before or after it sends out COMMIT messages to some of the subordinates), its recovery manager will bring the system up, read the log and rebuild information about all uncommitted transactions in virtual storage. If a subordinate has a xact in the PREPARE stage, it will periodically send a message to the co-ordinator asking what happened to the transaction in question. Once the co-ordinator has come back online it can respond to the query.

Of course, in the case of a co-ordinator going out of action totally and remaining unconnected this is not a viable solution. If you're making the case that 2PC is not viable on very wide area networks with intermittent connectivity, I agree whole-heartedly.

That said, 2PC (and its children, PA and PC) have their place, and are indeed used in many systems. For instance, if you are rigging up a message queueing infrastructure (like MQ-series) to your database (say with NOTIFY), you'd at least like to have the db act as a co-ordinator with the MQ. Or the parallel cluster example I gave earlier.
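The recovery behaviour described above can be sketched as a toy state machine - illustrative Python only (no real messaging, logging or crash handling; the class names are invented). Under presumed abort, a prepared subordinate re-asks the recovered coordinator; if the coordinator has no commit record for the transaction, the answer is ABORT by presumption:

```python
# Toy model of 2PC with presumed abort: only commits are logged by
# the coordinator, so any transaction it has no record of is presumed
# aborted when an in-doubt subordinate asks after a crash.
class Coordinator:
    def __init__(self):
        self.log = {}  # xid -> "COMMIT" (aborts are never logged)

    def commit(self, xid):
        self.log[xid] = "COMMIT"  # force-logged before telling anyone

    def resolve(self, xid):
        # Presumed abort: no commit record means the xact aborted.
        return self.log.get(xid, "ABORT")

class Subordinate:
    def __init__(self, coord):
        self.coord = coord
        self.state = {}  # xid -> "PREPARED" | final outcome

    def prepare(self, xid):
        self.state[xid] = "PREPARED"  # must hold locks until resolved

    def recover(self):
        # After a crash, periodically re-ask about in-doubt xacts.
        for xid, st in self.state.items():
            if st == "PREPARED":
                self.state[xid] = self.coord.resolve(xid)

coord = Coordinator()
sub = Subordinate(coord)
sub.prepare("T1")
sub.prepare("T2")
coord.commit("T1")  # T2 never reached the commit record
sub.recover()       # T1 resolves to COMMIT, T2 to ABORT by presumption
```

This also shows where PA saves work: an aborting (or read-only) transaction never costs the coordinator a log sync, because absence of a record already means abort.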
Clustered Linux boxes are definitely here, although no open-source DBMS offers a parallel solution.

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [GENERAL] [HACKERS] psql
>>>>> "Nailah" == Nailah Ogeer [EMAIL PROTECTED] writes:

  Nailah> Well here's the thing. Before i was trying to use
  Nailah> ShmemInitStruct in buf_init.c. The problem with this is
  Nailah> that you can't shrink or grow shared memory. That is why i
  Nailah> switched over and just used malloc. So i seem to be in a

We've implemented a Shared Memory MemoryContext in TelegraphCQ. We used the open-source libmm from the Apache project. Maybe you can try using it - it's fairly easy to use. The current version on the web is based off of 7.2 code, but I hope to refresh with a beta based on 7.3 code in the next few weeks.

http://telegraph.cs.berkeley.edu/telegraphcq

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Help needed in testing my code.
Why don't you use elog(LOG, ...) instead of printf ?

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Help needed in testing my code.
>>>>> "Srikanth" == Srikanth M writes:

  Srikanth> Dear Sir I am new to postgres and dont know the exact
  Srikanth> procedure of testing my code, okay i will use elog, but
  Srikanth> please tell me the procedure of testing that.

As you're a big fan of stealing code, why don't you grep the source for occurrences of elog(LOG or elog(NOTICE ? elog(NOTICE should send NOTICE messages to the psql client (I think !) while elog(LOG will only send them to the logfile you might have specified while starting the postmaster.

  Srikanth> for example, pls tell what should i do after i add code,
  Srikanth> what make files should i run and where should i place the
  Srikanth> new executables.

Well, you should start out by RTFM .. read how to build postgres. I run make install in my postgres directory (right above src/).

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
[HACKERS] some more docbook help
Hi Gang

I can sense that I'm _really_ close to getting docbook working, but not quite there yet. I'm hoping somebody can tell me where I'm screwing up ! I'm on a Red Hat Linux 7.3 system.

Although I do have openjade installed, I was finding it difficult to figure out where to point stylesheets to postgres, so instead I decided to build openjade (and so OpenSP) from source and configure it _exactly_ as described in doc-toolsets.html and doc-build.html in the pgsql-7.3 docs. Although a motley mix of sgml stuff was initially installed in /usr/share/sgml, I created a /usr/local/share/sgml directory and pointed openjade at it while building openjade as follows:

$ ./configure --enable-default-catalog=/usr/local/share/sgml/catalog

Subsequently I followed all instructions in doc-toolsets.html for the other packages. I set DOCBOOKSTYLE to /usr/local/share/sgml before running configure for postgres ..

I saw an old usenet message (exchange with Bruce Momjian) which suggests that the SGML_CATALOG_FILES env. variable must be set to:

export SGML_CATALOG_FILES=/somewhere/docbook31/docbook.cat

I set DOCBOOKINDEX to be /usr/bin/collateindex.pl

Now finally I go to the doc/src/sgml directory and try:

gmake admin.html > new.out 2> new.err

This however results in a make error. The output file is:

/usr/bin/perl /usr/bin/collateindex.pl -f -g -t 'Index' -i 'bookindex' -o bookindex.sgml HTML.index
/usr/bin/perl /usr/bin/collateindex.pl -f -g -t 'Index' -i 'setindex' -x -o setindex.sgml HTML.index
openjade -D . -D ./ref -c /usr/local/share/sgml/catalog -d stylesheet.dsl -i output-html -t sgml book-decl.sgml admin.sgml

The error file however starts off with:

Processing HTML.index...
0 entries loaded...
0 entries ignored...
Done.
Processing HTML.index...
0 entries loaded...
0 entries ignored...
Done.
openjade:/usr/local/share/sgml/docbook31/docbook.cat:63:79:E: end of entity in comment
openjade:/usr/local/share/sgml/docbook31/docbook.cat:62:20:E: cannot find data; tried /usr/local/share/sgml/docbook31/data, ./data, ./ref/data
openjade:/usr/local/share/sgml/docbook31/docbook.cat:63:79:E: end of entity in comment
openjade:/usr/local/share/sgml/docbook31/docbook.cat:62:20:E: cannot find data; tried /usr/local/share/sgml/docbook31/data, ./data, ./ref/data
openjade:book-decl.sgml:1:55:W: cannot generate system identifier for public text -//OASIS//DTD DocBook V3.1//EN
openjade:book-decl.sgml:11:0:E: reference to entity BOOK for which no system identifier could be generated
openjade:book-decl.sgml:1:0: entity was defined here
openjade:book-decl.sgml:11:0:E: DTD did not contain element declaration for document type name
openjade:admin.sgml:5:9:E: there is no attribute ID
openjade:admin.sgml:5:16:E: element BOOK undefined
openjade:admin.sgml:9:7:E: element TITLE undefined

[snip whole bunch of similar errors]

openjade:/usr/local/share/sgml/docbook31/docbook.cat:62:20:E: cannot find data; tried /usr/local/share/sgml/docbook31/data, ./data, ./ref/data
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON86'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'ROWE87'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON87a'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON87b'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON90a'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON89'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON90b'
gmake: *** [admin.html] Error 1

Does anybody have any suggestions ? Am I doing something horribly wrong ?

Many thanks !

-- Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] some more docbook help
Phew .. after some more struggling with docbook I think I finally managed to get it working. Posting here to help other folks googling through usenet archives.

My problem was that I had DOCBOOKSTYLE set to /usr/local/share/sgml - the directory which contained the catalog file. However, in spite of this environment variable, configure insisted on setting DOCBOOKSTYLE to /usr/local/share/sgml/docbook31

I finally fixed it by creating a file /usr/local/share/sgml/docbook31/catalog as:

CATALOG ../dsssl/catalog
CATALOG docbook.cat
CATALOG ../docbook-dsssl-1.77/catalog

This works fine thankfully.

Now I have another question - and this is probably incredibly stupid. I want to generate the PostgreSQL documentation with some extra add-ons (DDL and DML we have added for streaming data and continuous long-running queries). I would like the add-ons to show up in some changed fashion - say perhaps in bright red to show that this is new, not-yet-very-well-tested (:-) functionality specific to our system (TelegraphCQ) and not part of the base PostgreSQL features.

How would I go about doing something like this in SGML ? I understand the whole process of SGML documentation as separating content from presentation. So I imagine I should surround our new stuff with some kind of tag that, when used with an HTML (or other) stylesheet, generates appropriately changed documentation. I hope I am on the right track ! Could somebody help me please ?

Many thanks !

-- Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Detecting corrupted pages earlier
Tom Lane writes:
> Postgres has a bad habit of becoming very confused if the page header
> of a page on disk has become corrupted. In particular, bogus values in
> the pd_lower field tend to make

I haven't read this piece of pgsql code very carefully, so I apologize if what I suggest is already present.

One standard solution for handling disk page corruption is the use of consistency bits. The idea is that the bit that starts every 256th byte of a page is a consistency bit; in an 8K page you'd have 32 consistency bits. If the page is in a consistent state, then all 32 bits will be either 0 or 1.

When a page is written to disk, the actual bit in each c-bit position is copied out and placed in the header (end/beginning) of the page - with an 8K page there is one word that contains the actual bits. Then the c-bits are all either set or reset, depending on the state when the page was last read: if at read time the c-bits were set, then at write time they are reset. So when you read a page, if some of the consistency bits are set and some others are reset, you know that there was a corruption.

This is of course based on the assumption that most disk arms manage to write 256 bytes at a time atomically.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
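To make the scheme concrete, here is a minimal sketch of the consistency-bit idea in Python (function names and the page layout are illustrative, not PostgreSQL code); for simplicity the scheme always writes the flag as 1 rather than alternating per read:

```python
# Torn-page detection via "consistency bits": the leading bit of every
# 256-byte sector is reserved. On write, the real bits are saved into a
# 32-bit word (kept in the page header/trailer) and every sector bit is
# forced to a single flag value; on read, a mix of set and unset sector
# bits reveals a partially written (torn) page.

PAGE_SIZE = 8192
SECTOR = 256
NSECTORS = PAGE_SIZE // SECTOR   # 32 consistency bits per 8K page

def write_page(page: bytearray, flag: int) -> int:
    """Stash the real sector-leading bits, overwrite them with `flag`.
    Returns the 32-bit word of saved bits to store alongside the page."""
    saved = 0
    for i in range(NSECTORS):
        pos = i * SECTOR
        saved |= ((page[pos] >> 7) & 1) << i   # remember the real bit
        if flag:
            page[pos] |= 0x80                  # set the consistency bit
        else:
            page[pos] &= 0x7F                  # clear it
    return saved

def read_page(page: bytes, saved: int) -> bytes:
    """Verify all consistency bits agree, then restore the real bits."""
    bits = [(page[i * SECTOR] >> 7) & 1 for i in range(NSECTORS)]
    if len(set(bits)) != 1:
        raise IOError("torn page: consistency bits disagree")
    restored = bytearray(page)
    for i in range(NSECTORS):
        pos = i * SECTOR
        if (saved >> i) & 1:
            restored[pos] |= 0x80
        else:
            restored[pos] &= 0x7F
    return bytes(restored)
```

If a crash leaves only some sectors written, those sectors carry the old flag value while the rest carry the new one, and read_page refuses the page.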
[HACKERS] set_ps_display on solaris x86
Our students are (unfortunately) on Solaris x86 - unfortunate because I have to do another round of testing before I deploy pgsql code for them to hack. Sadly, set_ps_display does not seem to have any effect on Solaris x86: ps only reports multiple postmaster processes and their original arguments. Is there some magic incantation to be chanted to get this to work?

Thanks!

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
[HACKERS] PG_TEMP_FILES_DIR
Quick question for the group: I'm assuming that the PG_TEMP_FILES_DIR for BufFile temps is automatically under the PGDATA directory. Is that correct?

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
[HACKERS] docbook and postgresql
Hi folks,

I'm trying to build the documentation for pgsql (so that I can change it for the stuff we are building) and I'm having trouble finding the necessary docbook stuff. I looked at:

http://developer.postgresql.org/docs/postgres/doc-build.html

configure reports:

    checking for onsgmls... onsgmls
    checking for openjade... openjade
    checking for DocBook V3.1... yes
    checking for DocBook stylesheets... /usr/share/sgml/docbook/
    checking for sgmlspl... sgmlspl

I set my DOCBOOKSTYLE to /usr/share/sgml/docbook. However, I don't seem to have anything that looks like nwalsh-modular on my system (the example in the above web page). As a result I get this:

    sailesh:datafix~/work/postgres/doc/src/sgml: gmake admin.html
    /usr/bin/perl /usr/share/sgml/docbook//bin/collateindex.pl -f -g -t 'Index' -o bookindex.sgml -N
    Can't open perl script /usr/share/sgml/docbook//bin/collateindex.pl: No such file or directory
    gmake: *** [bookindex.sgml] Error 2

So, sorry for the newbie question, but if somebody could point me to what exactly I need to install - given that configure thinks I do have DocBook 3.1 - I'd be obliged.

Thanks!

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Implementing a new Join Algorithm
Tom Lane writes:
> Anagh Lal writes:
>> I am trying to test a new join algorithm by implementing it on
>> Postgresql. It would be great if you could give me some start-off
>> pointers as to where in the source code I will have to make changes.
>
> Lots of places ;-). You will find that a full-text indexing tool is an
> essential aid for working with the Postgres source code. I am partial

I've had great success with cscope, especially with the XEmacs interface.

I think the easy part is actually the executor stuff, because it's nicely localized. Getting the planner to choose a new option is a little messier.

As part of the TelegraphCQ project we've implemented what smells like a symmetric hash join - actually, it's more like an N-way symmetric hash join. This is done using the new SteM operator, which can work in a classical iterator-model plan. With the Eddy operator we can avoid static dataflows.

Since we were a bit chary of jumping headlong into the optimizer, we cheated: we let the postgres optimizer do its thing and produce a plan with Hash Joins, then walk the plan and replace the Hash Join nodes with our new SteM nodes. We call our conversion code in pg_exec_query_string() right after pg_plan_query(). It's more complicated than that because we're actually implementing the Eddy operator, but if your goal is just to try out your nice new join algorithm, this would probably work and be a quick fix to get you started.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
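The post-planner rewrite trick amounts to a recursive tree walk over the finished plan. Here is a toy sketch of the idea in Python - the Plan class and node names are illustrative stand-ins, not PostgreSQL's actual structs:

```python
# Toy version of the "cheat": let the optimizer build a plan, then walk
# the tree and swap each HashJoin node for a custom join operator (here
# a hypothetical "SteM"), leaving everything else untouched.

from dataclasses import dataclass, field

@dataclass
class Plan:
    kind: str                           # e.g. "SeqScan", "HashJoin"
    children: list = field(default_factory=list)

def replace_hash_joins(node: Plan) -> Plan:
    """Recursively rewrite the plan, turning HashJoin nodes into SteM nodes."""
    node.children = [replace_hash_joins(c) for c in node.children]
    if node.kind == "HashJoin":
        node.kind = "SteM"              # reuse the join's quals/targetlist as-is
    return node

# A 3-way join as the optimizer might have produced it:
plan = Plan("HashJoin", [
    Plan("SeqScan"),
    Plan("HashJoin", [Plan("SeqScan"), Plan("SeqScan")]),
])
rewritten = replace_hash_joins(plan)
```

The appeal of doing it this way is that the optimizer's costing and join ordering are reused for free; only the execution operator changes.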
[HACKERS] Question about bit.h and bit.c
I have a small nit: why is it that bit.h is in src/include/utils while bit.c is in src/backend/lib? I can never for the life of me remember which is where :-)

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Question about bit.h and bit.c
Tom Lane writes:
> Sailesh Krishnamurthy writes:
>> Why is it that bit.h is in src/include/utils and bit.c is in
>> src/backend/lib?
>
> Possibly a more interesting question is why haven't we ditched them
> both ... AFAICT none of the bit.c routines are used anymore.

True. I just searched, and the only uses of the bitmask functions (now greatly expanded) are in our code :-)

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Threads
Shridhar Daithankar writes:
> On Saturday 04 January 2003 03:20 am, you wrote:
>> I am sure many of you would like to delete this message before
>> reading it - hold on. :-)
>>
>> I'm afraid most posters did not read the message. Those who replied
>> "Why bother?" did not address your challenge.
>
> Our challenges may be ... ;-)

Not having threading does reduce some of the freedom we've had in our work. But then we have ripped up the process model a fair bit, and we have the freedom of an entirely new process to deal with data streams entering the system; we're experimenting with threading for asynchronous I/O there. In general, however, I agree with the spirit of the previous messages in this thread that threading isn't the main issue for PG.

One thing that I've missed so far in the threading thread: context switches are (IMHO) far cheaper between threads, because you save TLB flushes. Whether this makes a real difference in a data-intensive application, I don't know. I wonder how easy it is to read the x86 counters to see TLB flushes/misses.

In a database system, even if only one process dies, I'd be very chary of trusting the rest. So I am not too swayed by the argument that a process-per-connection model gets you better isolation.

BTW, many commercial database systems also use per-process models on Unix. However, they are very aggressive with connection sharing and reuse - even to the point of reusing the same process for multiple active connections, perhaps at transaction boundaries. This is good when a connection is maintained for a long duration with short-lived transactions separated by fair amounts of time. Moreover, in DB2 for instance, the same code base is used for both per-thread and per-process models - in other words, the entire code is MT-safe, and the scheduling mechanism is treated as a policy (Win32 is MT, and some Unices MP).

AFAICT, though, postgres code such as the memory contexts is not MT-safe (of course the bufferpool/shmem accesses are safe).

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] BITMAP Index support (and other DSS info.)
Shahbaz Chaudhary writes:
> There are bound to be people from academia (grad students, professors
> of CS, etc.) on this mailing list, yet I see few RDBMS courses using
> postgresql as an example. If people still have connections to
> universities, it would seem that inviting researchers to use PGSQL for
> their experiments will quickly make it comparable to Oracle/etc. This
> would be specifically

At Berkeley, PostgreSQL is used for projects in the upper-division undergraduate database systems class. At least, it was used this past Fall, and we plan to use it in the Spring as well (I will be TA'ing). The projects involved using pg as a back-end, plus a small buffer-replacement-policy assignment.

http://www-inst.eecs.berkeley.edu/~cs186/

In addition, we (the database systems research group) are using the PostgreSQL code base (7.2.1) to build TelegraphCQ, our new system to process continuous queries over data streams. Preliminary paper here:

http://www.cs.berkeley.edu/~franklin/Papers/TCQcidr03.pdf

No, it's not really close to a release yet ... :-)

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh