Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Scott Lamb
On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote:Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). ...and with some work, floats (I think just the exponent would work, if nothing else). bytea. Probably just about anything.Interesting. If you abandon the idea that collisions should be impossible (they're not indexes) or extremely rare (they're not hashes), it's pretty easy to come up with a decent hint to avoid a lot of dereferences. --Scott Lamb <http://www.slamb.org/> 

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Scott Lamb

On Feb 16, 2006, at 8:32 AM, Ron wrote:
Let's pretend that we have the typical DB table where rows are  
~2-4KB apiece.  1TB of storage will let us have 256M-512M rows in  
such a table.


A 32b hash code can be assigned to each row value such that only  
exactly equal rows will have the same hash code.

A 32b pointer can locate any of the 256M-512M rows.

Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b 
+32b= 64b*(2^28 to 2^29)=  2-4GB of pointers+keys followed by an  
optional pass to rearrange the actual rows if we so wish.


I don't understand this.

This is a true statement: (H(x) != H(y)) => (x != y)
This is not: (H(x) < H(y)) => (x < y)

Hash keys can tell you there's an inequality, but they can't tell you  
how the values compare. If you want 32-bit keys that compare in the  
same order as the original values, here's how you have to get them:


(1) sort the values into an array
(2) use each value's array index as its key

It reduces to the problem you're trying to use it to solve.


--
Scott Lamb <http://www.slamb.org/>



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Mixing threaded and non-threaded

2004-01-31 Thread Scott Lamb
On Jan 30, 2004, at 4:53 PM, Bruce Momjian wrote:
Actually, thinking about this a bit more, that might not even be
necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous?
(I.e., is the SIGPIPE guaranteed to arrive during the offending system
call?) I was thinking not, but maybe yes. I can't seem to find a
straight answer. A lot of documents seem to confuse thread-directed 
and
synchronous, when they're not quite the same thing. 
SIGALRM-via-alarm()
is thread-directed but obviously asynchronous.
SIGPIPE is a sychronous signal that is called during the read() in
libpq.  I am not sure what thread-directed is.
Ahh, then the usage in libpq is safe; sorry for the false alarm. The 
concerns about signal safety are really only for async signals, as the 
behavior is undefined only when one async signal-unsafe function is 
called from a signal interrupting another:

"In the presence of signals, all functions defined by this volume of 
IEEE Std 1003.1-2001 shall behave as defined when called from or 
interrupted by a signal-catching function, with a single exception: 
when a signal interrupts an unsafe function and the signal-catching 
function calls an unsafe function, the behavior is undefined."

thread-directed, by the way, simply means that the signal is directed 
at a specific thread, not just some thread in the process that doesn't 
have it masked. It's the difference between kill() and pthread_kill(). 
AFAIK, all synchronous signals are thread-directed, but not all 
thread-directed signals are synchronous.

Here the signal is synchronous, so the signal is guaranteed to happen 
at a safe point (during the read()), so there's no problem.

Thanks,
Scott Lamb
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] Mixing threaded and non-threaded

2004-01-30 Thread Scott Lamb
Scott Lamb wrote:
You could just do a pthread_sigmask() before and after the 
pthread_setspecific() to guarantee that no SIGPIPE will arrive on that 
thread in that time. I think it's pretty safe to assume that as long as 
you're not doing a pthread_[gs]etspecific() on that same pthread_key_t, 
it's safe.
Actually, thinking about this a bit more, that might not even be 
necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous? 
(I.e., is the SIGPIPE guaranteed to arrive during the offending system 
call?) I was thinking not, but maybe yes. I can't seem to find a 
straight answer. A lot of documents seem to confuse thread-directed and 
synchronous, when they're not quite the same thing. SIGALRM-via-alarm() 
is thread-directed but obviously asynchronous.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Mixing threaded and non-threaded

2004-01-30 Thread Scott Lamb
Bruce Momjian wrote:
Scott Lamb wrote:

Speaking of async signal-safe functions, pthread_getspecific() isn't 
specified to be (and thus PQinSend() and thus 
sigpipe_handler_ignore_send()). It's probably okay, but libpq is 
technically using undefined behavior according to SUSv3.


Yikes.  I never suspected pthread_getspecific() would not be signal safe
because it is already thread safe, but I see the point that it is called
in the current thread.  Any ideas how to fix this?
A few idea.

When I ran a similar situation in my own code, my approach was to just 
add a comment to make the assumption explicit. It's quite possible the 
standard is just overly conservative. Some specific platforms - 
<http://www.qnx.com/developer/docs/qnx_6.1_docs/neutrino/lib_ref/p/pthread_getspecific.html> 
- mark it as being async signal-safe.

Searching for "pthread_getspecific signal" on google groups turns up a 
bunch of other people who have run into this same problem. One person 
notes that it's definitely not safe on LinuxThreads if you use 
sigaltstack().

If your platform has SA_SIGINFO, you could - in theory - use the 
ucontext argument to see if that thread is in a PostgreSQL operation. 
But I doubt that's portable.

You could just do a pthread_sigmask() before and after the 
pthread_setspecific() to guarantee that no SIGPIPE will arrive on that 
thread in that time. I think it's pretty safe to assume that as long as 
you're not doing a pthread_[gs]etspecific() on that same pthread_key_t, 
it's safe.

There's one thread function that is guaranteed to be async signal-safe - 
sem_post(). (Though apparently older LinuxThreads on x86 fails to meet 
this assumption.) I'm not quite sure what you could do with that, but 
apparently there's something or they wouldn't have gone to the effort of 
making it so.

Scott

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Mixing threaded and non-threaded

2004-01-30 Thread Scott Lamb
On Jan 30, 2004, at 3:18 AM, Bruce Momjian wrote:
Manfred Spraul wrote:
Bruce Momjian wrote:

Woh, as far as I know, any application should run fine with 
-lpthread,
threaded or not.  What OS are you on?  This is the first I have 
heard of
this problem.


Perhaps we should try to figure out how other packages handle
multithreaded/singlethreaded libraries? I'm looking at openssl right
now, and openssl never links against libpthread: The caller is
responsible for registering the locking primitives.
Some other libraries, such as boost, always link against -lpthread when 
it is present.

I don't think OpenSSL's example is a good one to follow. It's way too 
easy to forget to do that, and then your application is broken. You'll 
have weird crashes that will be hard to figure out. I think OpenSSL was 
made such because pthreads was not so common back in the day; they 
probably wanted to support other threading APIs. That's unnecessary 
now.

Another reason might be to avoid the expense of locks when they are 
unnecessary. But also, I think that is not as necessary as it once was, 
particularly with modern systems like Linux+NPTL having locks cost 
virtually nothing when there is no contention.

We perhaps don't need -lpthread for creating libpq, but only for ecpg.
However, now that we have used thread locking for SIGPIPE, we are now
calling pthread from libpq, but only 7.5.
However, I still don't understand why the user is seeing a problem and
what rewrite he thinks is necessary for his application because pthread
is linked in.
I'm 99% certain that any application will work with -lpthread on RedHat 
Linux. And 95% certain that's true on _any_ platform. There's no 
pthread_init() or anything; the distinction he was describing between a 
non-threaded application and a threaded application with only one 
thread doesn't exist as far as I know.

And he mentioned that the deadlocks are occurring in a SIGCHLD handler. 
Since so few functions are async signal-safe (I doubt anything in libpq 
is), the code in question was broken before; the extra locking is just 
making it more obvious.

Speaking of async signal-safe functions, pthread_getspecific() isn't 
specified to be (and thus PQinSend() and thus 
sigpipe_handler_ignore_send()). It's probably okay, but libpq is 
technically using undefined behavior according to SUSv3.

Scott Lamb

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] Mixing threaded and non-threaded

2004-01-27 Thread Scott Lamb
On Jan 27, 2004, at 1:16 PM, Steve Atkins wrote:
A hint, though, might be that it's a multiprocess application with a
single master process that controls dozens of child processes. When the
master shuts down it asks all the children to shut down, and then it
deadlocks in the SIGCHILD handler.
It's not safe to do anything interesting in a SIGCHLD handler, unless  
you have pretty severe restrictions on when the signal can arrive. Take  
a look at  
<http://www.opengroup.org/onlinepubs/007904975/functions/ 
xsh_chap02_04.html>. It contains a list of all the async signal-safe  
functions in SUSv3. It's a pretty short list. Notably absent are  
pthread_mutex_*() and malloc() (and anything that uses them).

Scott Lamb

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] returning PGresult as xml

2004-01-27 Thread Scott Lamb
On Jan 25, 2004, at 3:07 AM, Brian Moore wrote:
it's been said that converting a PGresult into xml is "trivial" and
that's why it hasn't been done in the codebase as of yet. i have seen
much code that writes xml, and many mistakes are made. most often
improper escaping, or writing to a schema/DTD that has not been
well-thought out. the transformation into xml is not difficult, but it
does require attention to detail.
The escaping, at any rate, is trivial if you use a proper API. It  
sounds like your code is not using any XML API, given that you have not  
mentioned adding dependencies to libpq and that you've mentioned your  
own hashtable algorithm. It would be much easier if you did so, though  
I imagine the additional dependency would mean it would not be accepted  
into libpq.


  
  
  
1
2
  

How would you filter for a column in XSLT based on column name with  
this schema? It's certainly not trivial. I have similar code, and I  
included the column name as an attribute in each column element for  
this reason.

I also used the java.sql type names rather than PostgreSQL ones, as my  
code is not specific to PostgreSQL.

i would expect that integration would look something like exposing
from libpq a function that looks something like:
  const char *PGresult_as_xml(PGresult *result, int include_dtd);
Ugh. So it returns the whole thing as one big string? That won't hold  
up well if your resultset is large.

A better way would be to pump out SAX events. This is what I did for  
three reasons:

1) The escaping becomes trivial, as mentioned above. In fact, not only  
does SAX escape things correctly, but it makes you explicitly specify  
that the string you're giving it is character data, an element name, an  
attribute name, an attribute value, etc, and handles everything  
properly based on that. So you'd really have to work to screw it up,  
unlike code that just does like

printf("%s",  
xml_attr_escape(foo_val), xml_attr_escape(bar_val),  
xml_char_escape(elem_val));

where it would be quite easy to lose track of what needs to be escaped  
how, what variables are already escaped, etc.

2) It can stream large result sets, provided that the next stage  
supports doing so. Certainly a raw SAX serializer would, also some XSLT  
stylesheets with Xalan, and STX/Joost is designed for streaming  
transformations.

3) If the next stage is a transformation, this makes it unnecessary to  
serialize and parse the data between. So the SAX way is faster.

You're welcome to take a look at my code. I imagine it will not be  
directly useful to you, as it is written in Java, but I have a live  
example which puts this stuff to use. Designing an acceptable API and  
schema is always much easier when you see how it is put to use.

<http://www.slamb.org/projects/xmldb/> - my (so far poorly-named) xmldb  
project, which includes the org.slamb.xmldb.ResultSetProducer class to  
transform a java.sql.ResultSet to SAX events in my resultset schema.

<http://www.slamb.org/svn/repos/projects/xmldb/src/java/org/slamb/ 
xmldb/ResultSetProducer.java> - source code for said class

<http://www.slamb.org/projects/mb/> - a message board which uses this  
code and some XSLT

<https://www.slamb.org/mb/> - a live example of said message board

<http://www.slamb.org/svn/repos/projects/mb/src/WEB-INF/xsl/ 
resultset.xsl> - simple XSLT to take an arbitrary resultset and convert  
it to an HTML table

<http://www.slamb.org/svn/repos/projects/mb/src/WEB-INF/xsl/main.xsl> -  
an example XSLT file that inherits this and then provides exceptions  
for a couple columns (not displaying the id column, instead including  
it as a hyperlink in the name column).

Good luck.

Scott Lamb

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Allow backend to output result sets in XML

2004-01-22 Thread Scott Lamb
On Jan 21, 2004, at 12:19 PM, Peter Eisentraut wrote:
I think "output XML" is just buzz.  Give us a real use scenario and an
indication that a majority also has that use scenario (vs. the other
ones listed above), then we can talk.
I do this all the time.

I have JDBC code to take a java.sql.ResultSet and push out SAX events 
in a standard schema. I also have a XSLT stylesheet that formats them 
in a decent way. In this manner, it's very easy for me to make 
database-driven webpages. I can inherit the "normal" stylesheet and 
then just code the exceptions.

However, I'm quite happy doing this on the client side. I'm not sure 
why it would be beneficial to do this as part of the PostgreSQL server.

Scott Lamb

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Date input changed in 7.4 ?

2003-08-30 Thread Scott Lamb
On Thursday, Aug 28, 2003, at 00:07 America/Chicago, Dennis Björklund 
wrote:

On Wed, 27 Aug 2003, Kevin Brown wrote:

There are some cases where it's extremely useful for PostgreSQL to
accept dates of any format it knows about (ambiguities should be
resolved either by looking at the current DateStyle or, failing that, 
by
applying the recognition in a well-defined order
And the argument bhen this was that it only leads to wrong data. As I 
see
it, the only time you have dates in different styles is when you get it
from a human entering dates. Then he/she will enter 01/30/03 and it is
interpreted as 2003 January 30, he/she feels happy and enters another 
date
in january, say 01/10/03 and now maybe it is interpreted as 2003 
October
1. Of course that error is not noticed since it worked the previous 
time..
Yes, yes, yes. I've run into exactly that problem when scripting MS 
Outlook. All the dates on the twelfth of the month or earlier had the 
month and day transposed. It never threw an error. I checked the stuff 
with my own birthday (the 26th of April) so I didn't notice the problem 
until a user pointed it out. The moral of the story is that an error is 
much better than a guess. (Alternate moral: don't be like Microsoft.)

Thanks,
Scott Lamb
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] New Portal in Place, DNS switched ...

2003-01-06 Thread Scott Lamb
Marc G. Fournier wrote:

I'm just announcing here, since I'd like to see some ppl testing this out
and let us know if there are any problems ... DNS is going to take a
little while to propogate, so the old site may still come up in the
interium ... another reason not to announce it right away :)


My hope was that this would be all the information in a single, 
integrated site. What I see now is one (nice) front page with links to 
all the other sites, which don't even have links back, much less the 
same look and feel. Are you intending to switch those eventually? 
(Please say yes.)

Thanks,
Scott


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] PostgreSQL Global Development Group

2002-12-04 Thread Scott Lamb
Lamar Owen wrote:

However, I seriously question the need in the long term for our sites to be as 
fractured as they are.  Good grief!  We've got advocacy.postgresql.org, 
techdocs.postgresql.org, odbc.postgresql.org, gborg.postgresql.org, 
developer.postgresql.org, jdbc.postgresql.org, etc.  Oh, and we also have 
www.postgresql.org on the side?  I think not.  Oh, and they are fractured in 
their styles -- really, guys, we need a unified style here.

I'd love to see this happen. From reading the messages here, it sounds 
like the perception is that marketing == spouting bullshit. I don't 
believe that's true. I think having an informative, up-to-date, 
stylistically consistent website would do a tremendous amount of good.

The JDBC one is a particularly bad example right now - it doesn't fit in 
with any of the rest of the site and its most prominent link is to a 
completely out-of-date list of compliance tests the driver fails. The 
driver may have its flaws but it's a lot better than presented there.

IMHO these things make a difference to technical people as well as 
suits. If that site and the MySQL JDBC driver's site were my first 
impressions, I would be using MySQL.

The JDBC site is certainly not the only one with flaws. The main website 
has this paragraph in :

For encrypted postgresql connections, Brett McCormick
([EMAIL PROTECTED]) has made a patch for PostgreSQL
version 6.3.2 using SSL. Visit his info page for more information.

That's horribly obselete. In fact, I think a lot of the related projects 
are. That's only two clicks away from the main page.

I'm volunteering to do work here. I could at the very least go through 
the sites and make a longer list of things like this that I notice. If 
they are public CVS somewhere, I can send patches. I saw that there's a 
. What's going on with that? Is there 
anything I can do to speed up its adoption? How will it affect the rest 
of the sites?

Is this list the appropriate place to discuss the websites? or should I 
take it to -advocacy? My impression here is that the two sites are 
maintained separately and the people involved haven't interacted very 
much. Is that accurate or no?

Thanks,
Scott


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Postgres 7.3 announcement on postgresql.org

2002-11-29 Thread Scott Lamb
Christopher Kings-Lynne wrote:

There isn't one!

Chris


No kidding. I propose changes to the following pages:

: Prominent mention of the latest 
stable and unstable releases (just stable now, of course). But since 
this seems to be kind of a repeat of the news page, maybe an 
announcement also.

: Announcement of each of the 
betas, release candidates, and the final release now.

: Mention that 7.3 has been 
released, where it says it has entered release candidate phase 2 now.

: page marked as obselete 
until the 7.4 cycle begins. (Instead of just broken, as it is now.)

: Announcement that it will be 
switching over to the 7.3 docs soon and later switching, like was done 
with 7.1->7.2 IIRC.

Is the website in CVS somewhere? I might put together a patch if so...

I mentioned this around RC1 time and got no response. Really, I think 
that people would have tested it more if they knew about it. And the web 
page is the first place I go to find out about a piece of software. The 
mailing lists are not enough IMNSHO.

Thanks for a great piece of software...but please tell people about it!

Scott


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] nested transactions

2002-11-22 Thread Scott Lamb
Bruce Momjian wrote:

I am going to work on nested transactions for 7.4.


If you're going to do a lot of reworking of how transactions are 
handled, maybe this is a good time to beg for cursors that stay open 
across commits. It looks like the JDBC driver is moving to using cursors 
with ResultSet.CLOSE_CURSORS_AT_COMMIT, for the advantage of not having 
to fetch the entire result immediately and hold it in memory. If this 
were implemented, the same could be done for 
ResultSet.HOLD_CURSORS_OVER_COMMIT, which I think a lot of JDBC code needs.

Thanks,
Scott


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [HACKERS] RC1?

2002-11-14 Thread Scott Lamb
Tom Lane wrote:


Seriously, I agree with Marc's opinion that issuing an RC1 is the best
way to flush out some more port reports.  I do not know what else we can
do to get people off their duffs and onto last-minute testing.


If testing is the problem, I think publicizing the betas would help 
more. I had no idea that 7.3b[2-5] had been released. And looking at the 
website, it's not hard to see why:

: No mention
 (or whatever mirror): No mention
: No mention
: Mentions beta has begun
: Shows latest release at 
bottom of page.

I'd really expect to see an announcement on the news page for each beta 
release and the latest stable/beta release on the front page. That would 
help more than releasing RC1, especially if it's done in the same way.

Thanks,
Scott


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org