[HACKERS] Pl/Java - next step?
Two Pl/Java implementations exists today. Due to the architecture of PostgreSQL, compromises have been made in both of them to deal with the fact that each connection lives in its own process. One, I'll call it Pl/Java_JNI will spawn a JVM on demand for each connection and the other, Pl/Java_remote, will spawn at least one JVM that lives in a process of its own and use an inter-process calling mechanism. I can see PostgreSQL moving forward in one of four different directions: 1. Select Pl/Java_JNI. 2. Select Pl/Java_remote 3. Choose both and agree on the SQL + Java semantics 4. Make the postmaster spawn threads rather than processes (controversial? Nah :-) ) As the one behind Pl/Java_JNI I'm perhaps not the most objective person when it comes to choice, but I'll make an effort here and try to list the pros and cons with each choice. My objective is to start a healthy discussion. I think Pl/Java migth boost usability of PostgreSQL quite a bit and with an almost explosive growth of the Java Community its essential that we conclude this sooner rather than later. ** 1. Select Pl/Java_JNI ** #Pros:# - Each call becomes extremely lightweight. JNI is in essence a straight forward in-process function invocation. Minimizing call overhead becomes very important for functions that a) are called very often and b) functions that need to call back into the backend several times. - Minimum resource utilization when passing values. Values can be passed by reference. TriggerData, TupleDesc, HeapTuple, byte arrays etc. need not be copied. Return values can be allocated directly in the correct MemoryContext. - Transaction visibility Using a JDBC driver that's implemented directly on top of SPI ensures that the transaction visibility is correct without the need to either propagate a transaction context or make remote calls back into the backend. - Connection isolation Easy to use since the developer owns the whole JVM. There's no need to terminate all connections in order to replace code or to establish a debug session. Migration can take place gradually. - Simplicity No hassle setting up inter-process communication or maintaining a separate JVM. - Modern JVM's are less demanding Sun and other JVM vendors are making serious efforts to make the JVM more adaptable. Java is not used for heavy weight server processing only. Small utility programs become more and more common. Thus, decreasing start-up time and ability to adapt resource consumption have very high priority. Look here what Java 1.5 does http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html#vm. - Well knonw programming envionment JNI is standard. A potential developer of the code have access to on-line training. #Cons:# - Resource consumption. A JVM is expensive from a resource perspective. - Connection start-up time is high. Booting a JVM takes time. Setups where connections that makes invocations to Pl/Java are closed and created frequently will suffer from this. - Java execution model differs from the one used by PostgreSQL Java uses multithreading wether you like it or not. And the JVM will throw exceptions. The Pl/Java_JNI handles this by introducing some macros that a potential developer that makes additions to the port must be aware of. This also introduces limitations for the user of Pl/Java JNI (such as very limited functionality once an error has been generated by the backend). ** 2. Select Pl/Java_remote ** #Pros:# - Each connection becomes fairly lightweight. A connection is represented as a thread in the remote JVM. Threads are much less expensive than a full-blown JVM. - Connection start-up time is low Startup time will be very quick since thread creation is cheap. Even quicker if a thread-pool is utilized. - Reuse of an existing JVM Small systems might use the same JVM to run an app-server as the one used by triggers and functions. Albeit not great from a separation of concern perspective, it might be very efficient for special needs. - Ability to run the JVM on another server The JVM can run on a server different from the one running the backend process. If the number of calls are few in relation to the actual work performed in each call, this might be interesting. #Cons:# - RPC calls are slow Call between processes are inherently very slow compared to in-process calls. - RPC resources needed Each connection will need an additional socket or shared memory segment. - Transaction visibility A connection established in the remote JVM must have the same transaction visibility as the invoker. In essence, a transaction context must be propagated to the remote JVM, or the remote JVM must have a JDBC driver that calls back into the backend. - RPC management CORBA or some other mechanism must be installed and maintained. - Starting/Stopping JVM affects all connections Attaching a debugger or generating profiling information implies a restart of the JVM, killing all existing connections that make use of Pl/Java_remote. Code
Re: [HACKERS] Progress Report on Materialized Views
Jonathan M. Gardner I've implemented a pretty simple Materialized Views scheme. It's not terribly complicated, and it works quite well. Exciting news - excellent work. Starting simple was the right approach! There were some issues with the time-sensitivity of the queries. For instance, our list of members will decrease as time goes along due to the expiration date of accounts. Because we were running the refresh once a day, there were a few hours of the day where the materialized view would say that the person is a member, but the actual data says he is not. We rewrote our code to pull everything from the materialized view, greatly simplifying the code, and also increasing performance. That's just part of the package of using Materialized Views. That is an acceptable trade-off for the performance gains realised. My next step... Could I suggest that your next step is to sync up with the work being done on tuning the DBT-3 query workload? As I'm sure you're aware, that is very similar to TPC-H workload, where most of the commercial RDBMS vendors utilise Materialized Views to enhance certain queries. Focusing on that workload may then suggest to you what the next steps to take are, now that you have solved the specific problems of your own workloads, though using a generic approach. I think ... Mark Wong, Josh Berkus and Tom Lane are currently involved with DBT-3 testing on the OSDL test environment. Materialized Views and improved join-ordering are the next two best angles of attack on the DBT-3 workload, IMHO. I very much look forward to further news. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] 7.4.1 release status - Turkish Locale
Nicolai Tufar [EMAIL PROTECTED] writes: It looks to me like every use of strcasecmp in the backend has to be questioned if we're going to make this work. I'm starting to lean in the direction of tr_TR is hopelessly broken again... With this patch applied everything works fine. Thanks! Did you try running the regression tests under tr_TR locale? It seems a few bricks short of fine yet :-( regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Pl/Java - next step?
Thomas Hallgren [EMAIL PROTECTED] writes: ** 4. Make the postmaster spawn threads rather than processes ** I know this is very controversial and perhaps I should not bring it up at all. But then again, why not? Most readers are open-minded right? It's been considered and rejected before, and pljava isn't going to tilt the scales. In fact, the main thing that bothers me about your description of JNI is Java uses multithreading wether you like it or not. I am very afraid of what impact a JVM will have on the stability of the surrounding backend. Other than that fear, though, the JNI approach seems to have pretty considerable advantages. You listed startup time as the main disadvantage, but perhaps that could be worked around. Suppose the postmaster started a JVM --- would that state inherit correctly into subsequently forked backends? Also, regarding your option #3 (do both), do you really think something different is going to happen in practice? The developers of the other implementation aren't likely to give it up just because yours exists. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Progress Report on Materialized Views
Simon Riggs [EMAIL PROTECTED] writes: Could I suggest that your next step is to sync up with the work being done on tuning the DBT-3 query workload? As I'm sure you're aware, that is very similar to TPC-H workload, where most of the commercial RDBMS vendors utilise Materialized Views to enhance certain queries. Oh? As far as I can tell, TPC-H forbids use of materialized views. See sections 1.5.6 and 1.5.7 of the spec. The effect of the fine print seems to be that the only way you are allowed to store extra copies of data is as indexes over columns that are primary keys, foreign keys, or date columns. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Pl/Java - next step?
It's been considered and rejected before, and pljava isn't going to tilt the scales. Didn't think it would. Thought it worth mentioning anyway, partly to get your reaction. In fact, the main thing that bothers me about your description of JNI is Java uses multithreading wether you like it or not. I am very afraid of what impact a JVM will have on the stability of the surrounding backend. I have taken extensive measures to prevent multiple threads to access the backend simultaniously. I encourage you and anyone else who have an interest in how this is done to read my Some problems and their solution document posted here: http://gborg.postgresql.org/project/pljava/genpage.php?solutions. Other than that fear, though, the JNI approach seems to have pretty considerable advantages. You listed startup time as the main disadvantage, but perhaps that could be worked around. Suppose the postmaster started a JVM --- would that state inherit correctly into subsequently forked backends? That's an interesting thougth. The postmaster just forks. It never exec's right? Is this true for win32 as well? I've never tried it but it might be worth pursuing. Sun's new Java 1.5 jvm does this albeit a bit differently. An initializer process starts up and persists its state. Subsequent JVM's then reuse that state. I definitely plan for Pl/Java_JNI to take advantage of that. Also, regarding your option #3 (do both), do you really think something different is going to happen in practice? The developers of the other implementation aren't likely to give it up just because yours exists. My objective is not that they or I should give up. I want us to reach a concensus around what PostgreSQL should offer. If we can find ways to collaborate and create a two way solution, that's great. If we can collaborate around one of the solutions, that's perhaps even better, at least from a developer resource perspective. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] pl/perl thoughts
Hackers, FYI here are some of my thoughts I just sent to Joshua Drake. I would add a couple of points: . since there is a security advisory for Safe.pm prior to version 2.08, maybe we should replace require Safe; with use Safe 2.08;. . I thought about some sort of precompilation methods for perl functions known to the database (store the function refs in a hash so you could get at them via syntax like {$func_hash{schema_name}-{func_name}}(@func_args) ). I decided to abandon this approach mainly because of postgresql's function overloading, making disambiguation very hard. With a full SPI interface there will be access to any function, not just plperl functions, and postgres will do the disambiguation for you. Also, see below for a slightly more klunky but much more lightweight approach which would mean there was no necessity for a callback to postgresql. cheers andrew I haven't got past the thinking stage yet, and a few things have held up my progress elsewhere. The one possibly productive thought I have to share with you is this: it is probably overkill to have a separate Safe container for each plperl function. I don't see any reason that they shouldn't all live in one Safe container. Then they could share data and indeed some preloaded functions without doing anything special. Thus the following perl contained in plperl.c and executed on interpreter startup: require Safe; SPI::bootstrap(); sub ::mksafefunc { my $x = new Safe; $x-permit_only(':default');$x-permit(':base_math'); $x-share(qw[elog DEBUG LOG INFO NOTICE WARNING ERROR]); return $x-reval(qq[sub { $_[0] }]); } sub ::mkunsafefunc {return eval(qq[ sub { $_[0] } ]); } would become something like: require Safe; SPI::bootstrap(); use vars qw($PLContainer); $PLContainer = new Safe(PLPerl); $PLContainer-permit_only(':default');$PLContainer-permit(':base_math'); $PLContainer-share(qw[elog DEBUG LOG INFO NOTICE WARNING ERROR]); sub ::mksafefunc { return $PLContainer-reval(qq[sub { $_[0] }]); } sub ::mkunsafefunc {return eval(qq[ sub { $_[0] } ]); } Now you could do something like this: create function myplperlfuncs() returns int language plperl is ' $datavar = foo; $funcvar = sub { return bar; }; return 1; '; create function f1 () returns text language plperl as ' return $datavar; '; create function f2() returns text language plperl as ' return $funcvar(); '; At the start of your session you would issue select myplperlfuncs(); to preload the values, and thereafter you could call f1() and f2() quite happily. It's actually a bit of a pity that we don't have provision for a per database exec on startup procedure, which could handle this more elegantly (i.e. we would register myplperlfuncs() as something to be run on startup, so the user wouldn't have to worry at all about it.) I have not tested any of this - it is still straight out of my head. As others have noted, the biggest need is for full SPI access. I don't think this is hard - just a fair bit of work. After that it would be very nice to have a DBI type handle so that programmers used to doing things the DBI way (and what perl programmer isn't?) will feel right at home with plperl. In most cases I guess that would be a thin layer over the SPI stuff. Alternatively, you could hide the SPI access and just make the DBI handle visible - it would still be mostly calling SPI under the hood, of course. As they say in the perl world, TIMTOWTDI. Providing a DBI handle would also square with what happens in the Java world, where you can write server side methods that access the database via a JDBC interface. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Pl/Java - next step?
Thomas Hallgren wrote: Other than that fear, though, the JNI approach seems to have pretty considerable advantages. You listed startup time as the main disadvantage, but perhaps that could be worked around. Suppose the postmaster started a JVM --- would that state inherit correctly into subsequently forked backends? That's an interesting thougth. The postmaster just forks. It never exec's right? Is this true for win32 as well? I've never tried it but it might be worth pursuing. Sun's new Java 1.5 jvm does this albeit a bit differently. An initializer process starts up and persists its state. Subsequent JVM's then reuse that state. I definitely plan for Pl/Java_JNI to take advantage of that. Unfortunately, WIN32 has no fork(), and we have to exec the backend, in effect. You would need to handle both scenarios (#ifdef EXEC_BACKEND). For Unix this could be nice, though , and eliminate most of the disadvantage of your approach. cheers andrew ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Mac OS X, PostgreSQL, PL/Tcl
Ok, so it's something specific to my setup. I created a test account, logged in and compiled postgresql there with a clean shell environment and it worked fine. So I'm shooting myself in the foot in my login environment. *sigh*. thanks, /s. On Feb 21, 2004, at 1:51 AM, Tom Lane wrote: Scott Goodwin [EMAIL PROTECTED] writes: Hoping someone can help me figure out why I can't get PL/Tcl to load without crashing the backend on Mac OS 10.3.2. FWIW, pltcl seems to work for me. Using up-to-date Darwin 10.3.2 and PG CVS tip, I did configure --with-tcl --without-tk then make, make install, etc. pltcl installs and passes its regression test. psql:/Users/scott/pgtest/add_languages.sql:12: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. Can you provide a stack trace for this? regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] 7.4.1 release status - Turkish Locale
-Original Message- From: Tom Lane Did you try running the regression tests under tr_TR locale? It seems a few bricks short of fine yet :-( I run regression tests under tr_TR locale. To do this I hardcoded Turkish locale in initdb in pg_regress.sh. Three tests failed, I attached resulting diff. With days of the week, the same problem is with downcasting occurs. I think it is not that crucial, but the rest of the differences in the file seem to be important. I was not able to interpret them. Thanks, Nicolai regards, tom lane regression.diffs Description: Binary data ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match