[HACKERS] Pl/Java - next step?

2004-02-21 Thread Thomas Hallgren
Two Pl/Java implementations exists today. Due to the architecture of
PostgreSQL, compromises have been made in both of them to deal with the fact
that each connection lives in its own process. One, I'll call it
Pl/Java_JNI will spawn a JVM on demand for each connection and the other,
Pl/Java_remote, will spawn at least one JVM that lives in a process of its
own and use an inter-process calling mechanism.

I can see PostgreSQL moving forward in one of four different directions:

1. Select Pl/Java_JNI.
2. Select Pl/Java_remote
3. Choose both and agree on the SQL + Java semantics
4. Make the postmaster spawn threads rather than processes (controversial?
Nah :-) )

As the one behind Pl/Java_JNI I'm perhaps not the most objective person when
it comes to choice, but I'll make an effort here and try to list the pros
and cons with each choice. My objective is to start a healthy discussion. I
think Pl/Java migth boost usability of PostgreSQL quite a bit and with an
almost explosive growth of the Java Community its essential that we conclude
this sooner rather than later.



** 1. Select Pl/Java_JNI **
#Pros:#
- Each call becomes extremely lightweight.
JNI is in essence a straight forward in-process function invocation.
Minimizing call overhead becomes very important for functions that a) are
called very often and b) functions that need to call back into the backend
several times.

- Minimum resource utilization when passing values.
Values can be passed by reference. TriggerData, TupleDesc, HeapTuple, byte
arrays etc. need not be copied. Return values can be allocated directly in
the correct MemoryContext.

- Transaction visibility
Using a JDBC driver that's implemented directly on top of SPI ensures that
the transaction visibility is correct without the need to either propagate a
transaction context or make remote calls back into the backend.

- Connection isolation
Easy to use since the developer owns the whole JVM. There's no need to
terminate all connections in order to replace code or to establish a debug
session. Migration can take place gradually.

- Simplicity
No hassle setting up inter-process communication or maintaining a separate
JVM.

- Modern JVM's are less demanding
Sun and other JVM vendors are making serious efforts to make the JVM more
adaptable. Java is not used for heavy weight server processing only. Small
utility programs become more and more common. Thus, decreasing start-up time
and ability to adapt resource consumption have very high priority. Look here
what Java 1.5 does
http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html#vm.

- Well knonw programming envionment
JNI is standard. A potential developer of the code have access to on-line
training.

#Cons:#
- Resource consumption.
A JVM is expensive from a resource perspective.

- Connection start-up time is high.
Booting a JVM takes time. Setups where connections that makes invocations to
Pl/Java are closed and created frequently will suffer from this.

- Java execution model differs from the one used by PostgreSQL
Java uses multithreading wether you like it or not. And the JVM will throw
exceptions. The Pl/Java_JNI handles this by introducing some macros that a
potential developer that makes additions to the port must be aware of. This
also introduces limitations for the user of Pl/Java JNI (such as very
limited functionality once an error has been generated by the backend).





** 2. Select Pl/Java_remote **
#Pros:#
- Each connection becomes fairly lightweight.
A connection is represented as a thread in the remote JVM. Threads are much
less expensive than a full-blown JVM.

- Connection start-up time is low
Startup time will be very quick since thread creation is cheap. Even quicker
if a thread-pool is utilized.

- Reuse of an existing JVM
Small systems might use the same JVM to run an app-server as the one used by
triggers and functions. Albeit not great from a separation of concern
perspective, it might be very efficient for special needs.

- Ability to run the JVM on another server
The JVM can run on a server different from the one running the backend
process. If the number of calls are few in relation to the actual work
performed in each call, this might be interesting.

#Cons:#
- RPC calls are slow
Call between processes are inherently very slow compared to in-process
calls.

- RPC resources needed
Each connection will need an additional socket or shared memory segment.

- Transaction visibility
A connection established in the remote JVM must have the same transaction
visibility as the invoker. In essence, a transaction context must be
propagated to the remote JVM, or the remote JVM must have a JDBC driver that
calls back into the backend.

- RPC management
CORBA or some other mechanism must be installed and maintained.

- Starting/Stopping JVM affects all connections
Attaching a debugger or generating profiling information implies a restart
of the JVM, killing all existing connections that make use of
Pl/Java_remote. Code 

Re: [HACKERS] Progress Report on Materialized Views

2004-02-21 Thread Simon Riggs
Jonathan M. Gardner
 I've implemented a pretty simple Materialized Views scheme. It's not
 terribly complicated, and it works quite well.

Exciting news - excellent work. Starting simple was the right approach!

 There were some issues with the time-sensitivity of the queries. For
 instance, our list of members will decrease as time goes along due to
the
 expiration date of accounts. Because we were running the refresh once
a
 day, there were a few hours of the day where the materialized view
would
 say that the person is a member, but the actual data says he is not.
We
 rewrote our code to pull everything from the materialized view,
greatly
 simplifying the code, and also increasing performance.

That's just part of the package of using Materialized Views. That is
an acceptable trade-off for the performance gains realised.

 My next step... 

Could I suggest that your next step is to sync up with the work being
done on tuning the DBT-3 query workload? As I'm sure you're aware, that
is very similar to TPC-H workload, where most of the commercial RDBMS
vendors utilise Materialized Views to enhance certain queries. Focusing
on that workload may then suggest to you what the next steps to take
are, now that you have solved the specific problems of your own
workloads, though using a generic approach. I think ... Mark Wong, Josh
Berkus and Tom Lane are currently involved with DBT-3 testing on the
OSDL test environment.
 
Materialized Views and improved join-ordering are the next two best
angles of attack on the DBT-3 workload, IMHO.

I very much look forward to further news. 

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] 7.4.1 release status - Turkish Locale

2004-02-21 Thread Tom Lane
Nicolai Tufar [EMAIL PROTECTED] writes:
 It looks to me like every use of strcasecmp in the backend has to be
 questioned if we're going to make this work.  I'm starting to lean in
 the direction of tr_TR is hopelessly broken again...

 With this patch applied everything works fine. Thanks!

Did you try running the regression tests under tr_TR locale?  It seems
a few bricks short of fine yet :-(

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Pl/Java - next step?

2004-02-21 Thread Tom Lane
Thomas Hallgren [EMAIL PROTECTED] writes:
 ** 4. Make the postmaster spawn threads rather than processes **
 I know this is very controversial and perhaps I should not bring it up at
 all. But then again, why not? Most readers are open-minded right?

It's been considered and rejected before, and pljava isn't going to tilt
the scales.  In fact, the main thing that bothers me about your
description of JNI is Java uses multithreading wether you like it or
not.  I am very afraid of what impact a JVM will have on the stability
of the surrounding backend.

Other than that fear, though, the JNI approach seems to have pretty
considerable advantages.  You listed startup time as the main
disadvantage, but perhaps that could be worked around.  Suppose the
postmaster started a JVM --- would that state inherit correctly into
subsequently forked backends?

Also, regarding your option #3 (do both), do you really think something
different is going to happen in practice?  The developers of the other
implementation aren't likely to give it up just because yours exists.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Progress Report on Materialized Views

2004-02-21 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Could I suggest that your next step is to sync up with the work being
 done on tuning the DBT-3 query workload? As I'm sure you're aware, that
 is very similar to TPC-H workload, where most of the commercial RDBMS
 vendors utilise Materialized Views to enhance certain queries.

Oh?  As far as I can tell, TPC-H forbids use of materialized views.
See sections 1.5.6 and 1.5.7 of the spec.  The effect of the fine print
seems to be that the only way you are allowed to store extra copies of
data is as indexes over columns that are primary keys, foreign keys,
or date columns.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Pl/Java - next step?

2004-02-21 Thread Thomas Hallgren
 It's been considered and rejected before, and pljava isn't going to tilt
 the scales.

Didn't think it would. Thought it worth mentioning anyway, partly to get
your reaction.

 In fact, the main thing that bothers me about your
 description of JNI is Java uses multithreading wether you like it or
 not.  I am very afraid of what impact a JVM will have on the stability
 of the surrounding backend.

I have taken extensive measures to prevent multiple threads to access the
backend simultaniously. I encourage you and anyone else who have an interest
in how this is done to read my Some problems and their solution document
posted here:
http://gborg.postgresql.org/project/pljava/genpage.php?solutions.

 Other than that fear, though, the JNI approach seems to have pretty
 considerable advantages.  You listed startup time as the main
 disadvantage, but perhaps that could be worked around.  Suppose the
 postmaster started a JVM --- would that state inherit correctly into
 subsequently forked backends?

That's an interesting thougth. The postmaster just forks. It never exec's
right? Is this true for win32 as well? I've never tried it but it might be
worth pursuing. Sun's new Java 1.5 jvm does this albeit a bit differently.
An initializer process starts up and persists its state. Subsequent JVM's
then reuse that state. I definitely plan for Pl/Java_JNI to take advantage
of that.

 Also, regarding your option #3 (do both), do you really think something
 different is going to happen in practice?  The developers of the other
 implementation aren't likely to give it up just because yours exists.

My objective is not that they or I should give up. I want us to reach a
concensus around what PostgreSQL should offer. If we can find ways to
collaborate and create a two way solution, that's great. If we can
collaborate around one of the solutions, that's perhaps even better, at
least from a developer resource perspective.

Regards,

Thomas Hallgren


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[HACKERS] pl/perl thoughts

2004-02-21 Thread Andrew Dunstan
Hackers,

FYI here are some of my thoughts I just sent to Joshua Drake.

I would add a couple of points:
. since there is a security advisory for Safe.pm prior to version 2.08, 
maybe we should replace require Safe; with use Safe 2.08;.
. I thought about some sort of precompilation methods for perl functions 
known to the database (store the function refs in a hash so you could 
get at them via syntax like 
{$func_hash{schema_name}-{func_name}}(@func_args) ). I decided to 
abandon this approach mainly because of postgresql's function 
overloading, making disambiguation very hard. With a full SPI interface 
there will be access to any function, not just plperl functions, and 
postgres will do the disambiguation for you. Also, see below for a 
slightly more klunky but much more lightweight approach which would mean 
there was no necessity for a callback to postgresql.

cheers

andrew

I haven't got past the thinking stage yet, and a few things have held 
up my progress elsewhere. The one possibly productive thought I have 
to share with you is this: it is probably overkill to have a separate 
Safe container for each plperl function. I don't see any reason that 
they shouldn't all live in one Safe container. Then they could share 
data and indeed some preloaded functions without doing anything special.

Thus the following perl contained in plperl.c and executed on 
interpreter startup:

   require Safe; SPI::bootstrap();
   sub ::mksafefunc { my $x = new Safe; 
$x-permit_only(':default');$x-permit(':base_math');
   $x-share(qw[elog DEBUG LOG INFO NOTICE WARNING 
ERROR]);
return $x-reval(qq[sub { $_[0] }]); }
   sub ::mkunsafefunc {return eval(qq[ sub { $_[0] } ]); }

would become something like:

   require Safe; SPI::bootstrap();
   use vars qw($PLContainer); $PLContainer = new 
Safe(PLPerl);
   
$PLContainer-permit_only(':default');$PLContainer-permit(':base_math');
   $PLContainer-share(qw[elog DEBUG LOG INFO NOTICE 
WARNING ERROR]);
   sub ::mksafefunc { return $PLContainer-reval(qq[sub { 
$_[0] }]); }
   sub ::mkunsafefunc {return eval(qq[ sub { $_[0] } ]); }

Now you could do something like this:

create function myplperlfuncs() returns int language plperl is '
   $datavar = foo;
   $funcvar = sub { return bar; };
   return 1;
';
create function f1 () returns text language plperl as '
 return $datavar;
';
create function f2() returns text language plperl as '
 return $funcvar();
';
At the start of your session you would issue select myplperlfuncs(); 
to preload the values, and thereafter you could call f1() and f2() 
quite happily.

It's actually a bit of a pity that we don't have provision for a per 
database exec on startup procedure, which could handle this more 
elegantly (i.e. we would register myplperlfuncs() as something to be 
run on startup, so the user wouldn't have to worry at all about it.)

I have not tested any of this - it is still straight out of my head.

As others have noted, the biggest need is for full SPI access. I don't 
think this is hard - just a fair bit of work. After that it would be 
very nice to have a DBI type handle so that programmers used to doing 
things the DBI way (and what perl programmer isn't?) will feel right 
at home with plperl. In most cases I guess that would be a thin layer 
over the SPI stuff. Alternatively, you could hide the SPI access and 
just make the DBI handle visible - it would still be mostly calling 
SPI under the hood, of course. As they say in the perl world, 
TIMTOWTDI. Providing a DBI handle would also square with what happens 
in the Java world, where you can write server side methods that access 
the database via a JDBC interface.




---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Pl/Java - next step?

2004-02-21 Thread Andrew Dunstan


Thomas Hallgren wrote:

Other than that fear, though, the JNI approach seems to have pretty
considerable advantages.  You listed startup time as the main
disadvantage, but perhaps that could be worked around.  Suppose the
postmaster started a JVM --- would that state inherit correctly into
subsequently forked backends?
   

That's an interesting thougth. The postmaster just forks. It never exec's
right? Is this true for win32 as well? I've never tried it but it might be
worth pursuing. Sun's new Java 1.5 jvm does this albeit a bit differently.
An initializer process starts up and persists its state. Subsequent JVM's
then reuse that state. I definitely plan for Pl/Java_JNI to take advantage
of that.
 

Unfortunately, WIN32 has no fork(), and we have to exec the backend, in 
effect. You would need to handle both scenarios (#ifdef EXEC_BACKEND). 
For Unix this could be nice, though , and eliminate most of the 
disadvantage of your approach.



cheers

andrew

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

2004-02-21 Thread Scott Goodwin
Ok, so it's something specific to my setup. I created a test account, 
logged in and compiled postgresql there with a clean shell environment 
and it worked fine. So I'm shooting myself in the foot in my login 
environment. *sigh*.

thanks,

/s.

On Feb 21, 2004, at 1:51 AM, Tom Lane wrote:

Scott Goodwin [EMAIL PROTECTED] writes:
Hoping someone can help me figure out why I can't get PL/Tcl to load
without crashing the backend on Mac OS 10.3.2.
FWIW, pltcl seems to work for me.  Using up-to-date Darwin 10.3.2
and PG CVS tip, I did
configure --with-tcl --without-tk
then make, make install, etc.  pltcl installs and passes its regression
test.
psql:/Users/scott/pgtest/add_languages.sql:12: server closed the
connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
Can you provide a stack trace for this?

			regards, tom lane



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] 7.4.1 release status - Turkish Locale

2004-02-21 Thread Nicolai Tufar
 -Original Message-
 From: Tom Lane 
 Did you try running the regression tests under tr_TR locale?  It seems
 a few bricks short of fine yet :-(

I run regression tests under tr_TR locale. To do this I hardcoded
Turkish locale in initdb in pg_regress.sh. Three tests failed, I
attached resulting diff. 

With days of the week, the same problem is with downcasting occurs. I
think it is not that crucial, but the rest of the differences in the
file seem to be important. I was not able to interpret them.

Thanks,
Nicolai







 
   regards, tom lane


regression.diffs
Description: Binary data

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match