Re: [HACKERS] 7.0.3(nofsync) vs 7.1
> I've run tests (with 50 .. 250 simult users) for some PG project
> of my company. 7.1 was 3 times faster than 7.0.3 (fsync) but near
> 3 times slower than 7.0.3 (nofsync). It was not the best day in
> my life - WAL looked like big bottleneck -:(
>
> But finally I've realized that this test makes ~3 FK insertions
> ... and FK insert means SELECT FOR UPDATE ... and this could
> reduce # of commits per fsync.
>
> So, I've run simple test (below) to check this. Seems that 7.1
> is faster than 7.0.3 (nofsync), and that SELECT FOR UPDATE in RI
> triggers is quite bad for performance.
>
> Please take this into account when comparing 7.1 with 7.0.3.
> Also, we should add new TODO item: implement dirty reads
> and use them in RI triggers.
>
> Vadim

I did some testing using contrib/pgbench (100k tuples, 32 concurrent
users) with 7.1 and 7.0.3. It seems 7.1 is 5 times faster than 7.0.3
with fsync, but 1.5 times slower than 7.0.3 without fsync.

So I modified access/transam/xlog.c to disable the fsync() call
entirely. Now I get nearly equal performance to 7.0.3 without fsync.
It seems the bottleneck is logging with fsync(). It might be
interesting to move data/pg_xlog to a separate disk drive and see how
much better it performs.

BTW, pgbench does PK insertions and updates, but does no FK things.
--
Tatsuo Ishii
[HACKERS] Re: [GENERAL] Japan pictures
> > There is a picture somewhere of the three of us. Hopefully Tatsuo
> > can find it.
>
> Yes, I will upload soon.

Done.

http://www.sra.co.jp/people/t-ishii/Bruce/DSCN0295.JPG

From left to right: Hiroshi, Bruce, me.

More pictures are in: http://www.sra.co.jp/people/t-ishii/Bruce/

Enjoy!
--
Tatsuo Ishii
Re: [HACKERS] European Datestyle
> > That is the ISO-style, isn't it?
>
> Yes, it is; but according to the documentation (and how it used to be
> on other machines running PG 6.x) it should be ordered in european
> format,

The documentation should be clear that this is the correct output,
given the current "datestyle" settings. Please let me know what part
was confusing and I can fix it for the next release.

The default date style in 7.0 was changed from "Postgres" to "ISO". The
euro vs US setting determines how *input* dates are interpreted, since
they are not restricted to being only the format of the default output
style.

Use "set datestyle = 'Postgres,European'" to change to what you expect.
You can set an environment variable or change the defaults when
building the backend to make this permanent. There is an appendix in
the docs discussing the parsing strategy, though it is all detail.

- Thomas
Re: [HACKERS] Using Threads?
-- Start of PGP signed section.
> On Mon, Nov 27, 2000 at 11:42:24PM -0600, Junfeng Zhang wrote:
> > I am new to postgreSQL. When I read the documents, I find out the
> > Postmaster daemon actually spawns a new backend server process to
> > serve a new client request. Why not use threads instead? Is that
> > just for a historical reason, or some performance/implementation
> > concern?
>
> Once all the questions regarding "why not" have been answered, it
> would be good to also ask "why use threads?" Do they simplify the
> code? Do they offer significant performance or efficiency gains? What
> do they give, other than being buzzword compliant?

Good question. I have added this to the developers FAQ:

---
14) Why don't we use threads in the backend?

There are several reasons threads are not used:

  * Historically, threads were unsupported and buggy.
  * An error in one backend can corrupt other backends.
  * Speed improvements using threads are small compared to the
    remaining backend startup time.
  * The backend code would be more complex.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026
Re: [HACKERS] Using Threads?
Bruce Momjian <[EMAIL PROTECTED]> writes:
>> There is no difference. If anything bad happens with the current
>> multi-process server, all the postgres backends shutdown because the
>> shared memory may be corrupted.
> Yes. Are we adding reliability with per-process backends.

Yes, we are: the postmaster forces a system-wide restart only if a
backend actually coredumps, or exits with elog(STOP). If a backend
curls up and dies with elog(FATAL), we assume it's a positive sign that
it was able to detect the error ;-), and keep plugging. Now you might
argue about whether any particular error case has been misclassified,
and I'm sure some are. But my point is that in a multithread
environment we couldn't risk allowing the other threads to keep
running after an elog(FATAL), either. Threads don't have *any*
protection against screwups in other threads.

A closely related point: how often do you hear of postmaster crashes?
They don't happen, as a rule. That's because the postmaster is (a)
simple and (b) decoupled from the backends. To convert backend process
launch into backend thread launch, the postmaster would have to live
in the same process space as the backends, and that means any backend
coredump would take down the postmaster too.

regards, tom lane
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 09:28:38PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> >> I agree, don't send it to the whole list. But I'd like a copy.
> > Here you go.
> As near as I could tell, the test as you have it (one CRC computation
> per fread) is purely I/O bound.

Nope. They got 99-100% CPU time with the original version.

> I changed the main loop to this:
> [...hash each block repeatedly...]

Good idea. Might have been even better to just read the block once and
hash it even more times.

> On an otherwise idle HP 9000 C180 machine, I get the following
> numbers on a 1MB input file:
>
> time benchcrc
> real 35.3
> user 35.0
> sys  0.0
>
> time benchmd5
> real 37.6
> user 37.3
> sys  0.0
>
> This is a lot closer than I'd have expected, but it sure ain't
> "MD5 40% faster" as you reported. I wonder why the difference
> in results between your platform and mine?

The difference is likely because PA-RISC (like most other RISC
architectures) lacks a "roll" opcode that is very prevalent in the MD5
algorithm. Intel CPUs have it.

With a new version modified to repeat the inner loop 100,000 times, I
got the following:

time benchcrc

--
http://em.ca/~bruceg/
PGP signature
Re: [HACKERS] Using Threads?
> All the major operating systems should have POSIX threads implemented.
> Actually this can be configurable--multithreads or one thread.
>
> Thread-only server is unsafe, I agree. Maybe the following model can
> be a little better. Several servers, each is multi-threaded. Every
> server can support a maximum number of requests simultaneously. If
> anything bad happens, it is limited to that server.
>
> The cons side of processes model is not the startup time. It is about
> kernel resource and context-switch cost. Processes consume much more
> kernel resource than threads, and have a much higher cost for context
> switch. The scalability of threads model is much better than that of
> processes model.

My question here is how much do we really context switch. We do quite
a bit of work for each query, and I don't see us giving up the CPU
very often, as would be the case for a GUI where each thread does a
little work and goes to sleep. Also, as someone pointed out, the
postmaster doesn't connect to the database, so there isn't much COW
(copy-on-write) overhead. The big win is that all the text pages are
already in memory and shared by all backends. They can't modify those.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026
Re: [HACKERS] Using Threads?
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> > [ some very interesting datapoints ]
> >
> > So, forking a process with lots of data is expensive. However, most
> > of the PostgreSQL data is in a SysV IPC shared memory segment, which
> > shouldn't affect the fork numbers.
>
> I believe (but don't have numbers to prove it) that most of the
> present backend startup time has *nothing* to do with thread vs
> process overhead. Rather, the primary startup cost has to do with
> initializing datastructures, particularly the system-catalog caches.
> A backend isn't going to get much real work done until it's slurped
> in a useful amount of catalog cache --- for example, until it's got
> the cache entries for pg_class and the indexes thereon, it's not
> going to accomplish anything at all.
>
> Switching to a thread model wouldn't help this cost a bit, unless
> we also switch to a shared cache model. That's not necessarily a win
> when you consider the increased costs associated with cross-backend
> or cross-thread synchronization needed to access or update the cache.
> And if it *is* a win, we could get most of the same benefit in the
> multiple-process model by keeping the cache in shared memory.

Of course, we would also have to know which database was being used
next. Each database's system catalog can be different.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026
Re: [HACKERS] Using Threads?
> Adam Haberlach writes:
> > Typically (on a well-written OS, at least), the spawning of a thread
> > is much cheaper than the creation of a new process (via fork()).
>
> This would be well worth testing on some representative sample
> systems.
>
> Within the past year and a half at one of my gigs some coworkers did
> tests on various platforms (Irix, Solaris, a few variations of Linux
> and *BSDs) and concluded that in fact the threads implementations
> were often *slower* than using processes for moving and distributing
> the sorts of data that they were playing with.
>
> With copy-on-write and interprocess pipes that are roughly equivalent
> to memcpy() speeds it was determined for that application that the
> best way to split up tasks was fork()ing and dup().

This brings up a good point. Threads are mostly useful when you have
multiple processes that need to share lots of data, and the
interprocess overhead is excessive. Because we already have that shared
memory area, this benefit of threads doesn't buy us much. We have, in a
sense, already done the _shared_ part, and the addition of sharing our
data pages is not much of a win.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026
Re: [HACKERS] Using Threads?
> On Mon, 4 Dec 2000, Junfeng Zhang wrote:
> > All the major operating systems should have POSIX threads
> > implemented. Actually this can be configurable--multithreads or one
> > thread.
>
> I don't understand this. The OS can be configured for one thread? How
> would that be of any use?
>
> > Thread-only server is unsafe, I agree. Maybe the following model can
> > be a little better. Several servers, each is multi-threaded. Every
> > server can support a maximum number of requests simultaneously. If
> > anything bad happens, it is limited to that server.
>
> There is no difference. If anything bad happens with the current
> multi-process server, all the postgres backends shut down because the
> shared memory may be corrupted.

Yes. Are we adding reliability with per-process backends? I think so,
because things are less likely to go haywire, and we are more likely
to be able to clean things up after a failure.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026
Re: [HACKERS] Re: A mb problem in PostgreSQL
> > Please apply following one-line-patch and test it again. If it's
> > ok, I will commit it to both current and stable trees.
> >
> > ! return (b2c3[i][1] | 0x8080U);
>
> Yes, it's OK. Thank you!

Thanks for the testing. I will commit soon.

> But I wonder why we need to "| 0x8080U"?
> b2c3[][] and BIG5toCNS()'s return value are both unsigned short,
> aren't they?

Because the function returns EUC_TW. In b2c3[] we have CNS 11643-1992
values, and we need to add 0x8080 to convert from CNS 11643-1992 to
EUC.
--
Tatsuo Ishii
Re: [GENERAL] Japan pictures
> I would like to mention that I met Tatsuo Ishii and Hiroshi Inoue
> while in Japan. This was the first time I met them, though I have
> worked with them on PostgreSQL for many years. Tatsuo is really the
> voice of PostgreSQL in Japan. It was a real thrill.

I was thrilled to meet with you too. This is the first time ever for
us (all PostgreSQL users in Japan) to have one of the core developers
visit Japan. This had been a plan in my mind since I started running a
local mailing list in Japan in 1995!

> There is a picture somewhere of the three of us. Hopefully Tatsuo can
> find it.

Yes, I will upload soon.
--
Tatsuo Ishii
Re: [HACKERS] Indexing for geographic objects?
Tom Lane wrote:
> Oleg Bartunov <[EMAIL PROTECTED]> writes:
> > We've done some work with GiST indices and found a little problem
> > with optimizer.
>
> > test=# set enable_seqscan = off;
> > SET VARIABLE
> > test=# explain select * from test where s @ '1.05 .. 3.95';
> > NOTICE: QUERY PLAN:
> >
> > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000
> > width=12)
> >
> > EXPLAIN
> > % ./bench.pl -d test -b 100 -i
> > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs
>
> I'd venture that the major problem here is bogus estimated
> selectivities for rtree/gist operators.

Yes, the problem is, I didn't have the foggiest idea how to estimate
selectivity, nor did I have any stats when I developed the type.
Before 7.0, I had some success using the selectivity estimators of
another datatype (I think that was int, but I am not sure). In 7.0,
most of those estimators were gone and I probably chose the wrong
ones, or none at all, just so I could get it to work again. The
performance was good enough for my taste, so I had even forgotten it
was an issue.

I know, I know: 'good enough' is never good. I apologize.

--Gene
Re: [HACKERS] Re: A mb problem in PostgreSQL
> > Please apply following one-line-patch and test it again. If it's
> > ok, I will commit it to both current and stable trees.
> >
> > ! return (b2c3[i][1] | 0x8080U);
>
> Yes, it's OK. Thank you!
> But I wonder why we need to "| 0x8080U"?
> b2c3[][] and BIG5toCNS()'s return value are both unsigned short,
> aren't they?

b2c3 has CNS 11643-1992 values. That is, we need to add 0x8080 to
convert to EUC_TW.
--
Tatsuo Ishii
Re: [HACKERS] European Datestyle
Trond Eivind Glomsrød wrote:
> > 2000-12-07 04:40:23+01
> >                     ^^
>
> That is the ISO-style, isn't it?

Yes, it is; but according to the documentation (and how it used to be
on other machines running PG 6.x) it should be ordered in European
format. I don't know if I'm missing something obvious or what...

> There are two ways of making dates make sense, none of them American
> (but hey, they're still using Fahrenheit, feet, lb, fl.oz., acres and
> other nonsensical units... )

I do not mean to criticize British units; after all, I would have
preferred base-16 units instead of base-10 :)

Bye!

--
 Daniele

-------------------------------------------------------------------
 Daniele Orlandi - Utility Line Italia - http://www.orlandi.com
 Via Mezzera 29/A - 20030 - Seveso (MI) - Italy
-------------------------------------------------------------------
Re: [HACKERS] Re: CRC
A couple further observations while playing with this benchmark ---

1. This MD5 implementation is not too robust. On my machine it dumps
core if given a non-word-aligned data buffer. We could probably work
around that, but it bespeaks a little *too* much hand optimization...

2. It's a bad idea to ignore the startup/termination costs of the
algorithms. Of course startup/termination is trivial for CRC, but it's
not so trivial for MD5. I changed the code so that the md5 update()
routine also calls md5_finish_ctx(), so that each inner loop
represents a complete MD5 calculation for a message of the size of the
main routine's fread buffer. I then experimented with different buffer
sizes. At a buffer size of 1K:

time benchcrc
Re: [HACKERS] OK, does anyone have any better ideas?
Andrew Snow wrote:
>
> Could you perhaps post the code you have for splitting a text field
> up into keys, then I could work on turning it into a new type of
> index with a new operator, as Tom suggested?
>
> (Or is this already what the text search code in contrib already
> does??)
>
> - Andrew

OK, I guess I am not getting everything across. Let me give the
basics:

There are two programs: sqlindex and sqlfts. sqlindex is the SQL
indexer; sqlfts is the SQL full text server.

They currently take a config file, which will be replaced by columns
in the database. (This technology is aimed at multiple databases and
non-SQL uses.) The config file currently looks like the example at the
bottom of this post.

The current incarnation of this server sits outside of Postgres and
executes joins based on the results of the query. The indexing query
returns a number of fields, one of which must be designated as the
"key" field. In websearch lingo, think of it as "document name."

During index time, I separate the individual fields and create bitmap
files which relate word numbers to document bits. Words are parsed and
a dictionary is created. Phrase meta-data is also stored along with
the document reference (key field) associated with a document number.

When a query is executed, each word is picked out of the dictionary.
At various points, phrases are evaluated; the bitmap indexes are
ANDed, ORed, or NOTed together; and rank is applied. The results are
then sorted by rank, and the document numbers are merged with their
document "references" (key field values) and returned with the rank.

This technology works quite well as a search engine sort of thing if I
store a URL or file name and a teaser as the document reference. I
thought it would be cool (and easy) if I just stored a SQL key field
as the URL, and connected this stuff to a SQL database. I chose
Postgres because I had used it in a number of projects, and thought
that since it was open source I would have fewer problems.

It has not been easy to do what I thought would be a fairly trivial
task. I am starting to get Windows programming flashbacks of the "so
close, but yet so far" feeling one gets when one tries to do
conceptually simple things on Windows. I'm sorry, I am getting
discouraged and beginning to think that this project is not going to
work.

>>> configuration file <<<

# The computer host name used for the database
sqlindex=localhost
sqlfts=localhost

# The name of the database
sqldb=cdinfo

# Base name of the index files.
basename=cdinfo/index

# The key field used to index and find records.
sqlkey=trackid
sqlkeyindex=off

metaphone=1

# A SQL query that produces a single result, which is the count of
# records to be indexed.
sqlcount=select count(trackid) from zsong

# The SQL query used to produce data to be indexed.
sqlretrieve=select * from songview;
sqlfields = all,performer2,title,song

# A SQL query that will be used to display a list of records found
sqldisplay=select zsong.muzenbr, performer, performer2, title, song, trackid from ztitles, zsong where zsong.muzenbr = ztitles.muzenbr and zsong.trackid = %s

# The tcpport is the TCP/IP port for the server
tcpport = 8090

ftsproc = 5
ftsqueue = 32

--
http://www.mohawksoft.com
[HACKERS] CRC, hash & Co.
There have been some misconceptions in previous mails.

1.) A CRC is _not_ stronger than a hash. CRC is a subset of the hash
domain, defined as "a fast error-check hash based on mod 2 polynomial
operations", which typically has no crypto strength (and does not need
it either for most purposes).

2.) Theoretically, an optimal MD5 implementation can't be faster than
an optimal CRC-32 implementation. Check it yourself: download openssl
(www.openssl.org) or Peter Gutmann's cryptlib, where all sorts of
hashes and CRC-32 are implemented in a very reasonable way. Write a
tiny routine generating random strings and popping them through the
hash function. You will see, CRC-32 is typically several times faster.

3.) There are many domains where you need to protect your database not
only against random accidental glitches, but also against malicious
attacks. In these cases, CRC-32 (and other CRCs without any
cryptographic strength) are no help. The majority will probably be
more happy with fast CRCs, but there will always be some heavy-weight
users (such as in medical, legal and financial domains) where strong
hashes are required. Thus, it should be user-definable at runtime
which one to choose.

4.) Without a CRC/hash facility, we will have no means of checking our
data integrity at all. At least in my domain (medical), most
developers are craving database backends where we don't have to
re-implement the integrity checking stuff again and again. If postgres
could provide this, it would be a strong argument in favour of
postgres.

5.) As opposed to a previous posting (Bruce?), MD5 has been shown to
be "crackable" (deliberate collision feasible with available
technology) - that was one of the main reasons for the development of
RIPEMD-160 (check the RIPEMD home page for more information).

Once again, I am happy to implement any number of CRC/hash methods in
postgres if anybody (especially the one who implemented the SERIAL
data type) points me in the right direction within the postgres source
code. No takers so far :-(

Horst
Re: [HACKERS] OK, does anyone have any better ideas?
Andrew Snow wrote:
>
> Could you perhaps post the code you have for splitting a text field
> up into keys, then I could work on turning it into a new type of
> index with a new operator, as Tom suggested?
>
> (Or is this already what the text search code in contrib already
> does??)

Go to a search engine like Lycos, AllTheWeb, or AltaVista. This is the
type of search capability I want to use in Postgres. I have it working
as a stand-alone daemon; it is fast and produces very relevant
results. I just thought that this sort of functionality would be a big
plus if I could tie it down deep in Postgres.

The big advantages of the code are high relevance, boolean operations,
and very, very high speed operation. If I could get easy Postgres
record selection out of it, it would rock.

--
http://www.mohawksoft.com
Re: [HACKERS] Re: CRC
Bruce Guenter <[EMAIL PROTECTED]> writes:
>> I agree, don't send it to the whole list. But I'd like a copy.
> Here you go.

As near as I could tell, the test as you have it (one CRC computation
per fread) is purely I/O bound. I changed the main loop to this:

int main()
{
    static char buf[8192];
    size_t rd;
    hash_t hash;

    while ((rd = fread(buf, 1, sizeof buf, stdin)) != 0)
    {
        int i;

        for (i = 0; i < 1000; i++)
        {
            init(&hash);
            update(&hash, buf, rd);
        }
    }
    return 0;
}

so as to get a reasonable amount of computation per fread. On an
otherwise idle HP 9000 C180 machine, I get the following numbers on a
1MB input file:

time benchcrc
Re: [HACKERS] 7.0.3(nofsync) vs 7.1
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> And we always will have to enable fsync when comparing our
> performance with other DBes.

Of course, but when people say "it's slower than 7.0.3+nofsync" I
think that turning off fsync makes a fairer comparison there.

>> also reduce the WAL commit delay to zero, no? What is the default
>> commit delay now?

> I think so.
> As before 5 * 10^(-6) sec - pretty the same as sleep(0) -:)
> Seems CommitDelay is not very useful parameter now - XLogFlush
> logic and fsync time add some delay.

There was a thread recently about smarter ways to handle shared fsync
of the log --- IIRC, we talked about self-tuning commit delay,
releasing waiting processes as soon as someone else had fsync'd, etc.
Looks like none of those ideas are in the code now. Did you not like
any of those ideas, or just no time to work on it yet?

regards, tom lane
Re: [HACKERS] OK, does anyone have any better ideas?
Could you perhaps post the code you have for splitting a text field up
into keys, then I could work on turning it into a new type of index
with a new operator, as Tom suggested?

(Or is this already what the text search code in contrib already
does??)

- Andrew
[GENERAL] Japan pictures
I would like to mention that I met Tatsuo Ishii and Hiroshi Inoue
while in Japan. This was the first time I met them, though I have
worked with them on PostgreSQL for many years. Tatsuo is really the
voice of PostgreSQL in Japan. It was a real thrill.

There is a picture somewhere of the three of us. Hopefully Tatsuo can
find it.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026
RE: [HACKERS] 7.0.3(nofsync) vs 7.1
> > I have only one explanation: it reduces number of transactions
> > ready to commit (because of the same FK writers will wait till
> > first one committed - ie log fsynced) and WAL commit performance
> > greatly depends on how many commits were done by single log fsync.
> > 7.0.3+nofsync commit performance doesn't depend on this factor.
>
> Sure, but that's not exactly a fair comparison is it? 7.0.3+nofsync
> should be compared against 7.1+nofsync. I put the pg_fsync routine
> back in a little while ago, so nofsync should work again.

But I tested old 7.1 (fsync) version -:)
And we always will have to enable fsync when comparing our
performance with other DBes.

> It occurs to me though that disabling fsync should probably
> also reduce the WAL commit delay to zero, no?

I think so.

> What is the default commit delay now?

As before 5 * 10^(-6) sec - pretty the same as sleep(0) -:)
Seems CommitDelay is not very useful parameter now - XLogFlush
logic and fsync time add some delay.

Vadim
[HACKERS] Re: A mb problem in PostgreSQL
Tatsuo,

Tatsuo Ishii wrote:
> Please apply following one-line-patch and test it again. If it's ok,
> I will commit it to both current and stable trees.
>
> ! return (b2c3[i][1] | 0x8080U);

Yes, it's OK. Thank you!
But I wonder why we need to "| 0x8080U"?
b2c3[][] and BIG5toCNS()'s return value are both unsigned short,
aren't they?
--
Chih-Chang Hsieh
Re: [HACKERS] Trip to Japan
> > Bruce,
> >
> > what was the camera ?
>
> No idea. It was not mine. I brought a video camera, and made 30
> minutes of video for my family and company. I don't know how to make
> an MP3 of that.
>
> My wife wants a digital camera now, so it looks like I will have one
> soon. :-)

Mine is a Nikon "CoolPix 990". Originally those pictures had 2048x1536
pixels, which is too much for web pages (~700KB per picture). I shrank
them to 1024x768 using ImageMagick. More pictures will be uploaded to
the web pages...
--
Tatsuo Ishii
Re: [HACKERS] 7.0.3(nofsync) vs 7.1
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> I have only one explanation: it reduces number of transactions ready
> to commit (because of the same FK writers will wait till first one
> committed - ie log fsynced) and WAL commit performance greatly
> depends on how many commits were done by single log fsync.
> 7.0.3+nofsync commit performance doesn't depend on this factor.

Sure, but that's not exactly a fair comparison is it? 7.0.3+nofsync
should be compared against 7.1+nofsync. I put the pg_fsync routine
back in a little while ago, so nofsync should work again.

It occurs to me though that disabling fsync should probably also
reduce the WAL commit delay to zero, no? What is the default commit
delay now?

regards, tom lane
[HACKERS] Bug in index scans with Locale support enabled
In researching a problem I have uncovered the following bug in index
scans when locale support is enabled.

Given a 7.0.3 postgres installation built with locale support enabled,
and a default US RedHat 7.0 Linux installation (meaning that the LANG
environment variable is set to en_US, enabling the US English locale),
and given the following table and index structure with the following
data:

create table test (test_col text);
create index test_index on test (test_col);
insert into test values ('abc.xyz');
insert into test values ('abcxyz');
insert into test values ('abc/xyz');

If you run the query:

select * from test where test_col >= 'abc.';

one would normally expect to get only one record returned, but instead
all records are returned. The reason for this is that in the en_US
locale all non-alphanumeric characters are ignored when doing string
comparisons. So the data above gets treated as:

abc.xyz = abcxyz = abc/xyz

(as the non-alphanumeric characters '.' and '/' are ignored). This
implies that the above query will return all rows, as the constant
'abc.' is the same as 'abc' for comparison purposes and all rows are
>= 'abc'. Note that if you use a different locale, for example en_UK,
you will get different results, as that locale does not ignore the .
and / in the comparison.

Now the real problem comes in when either the like or regex operators
are used in a sql statement. Consider the following sql:

select * from test where test_col like 'abc/%';

This query should return one row, the row for 'abc/xyz'. However if
the above query is executed via an index scan it will return the wrong
number of rows (0 in this case). Why is this?
Well, the query plan created for the above like expression looks like
the following:

select * from test where test_col >= 'abc/' and test_col < 'abc0';

In order to use the index, the like has been changed into a '>=' on
the constant prefix ('abc/') and a '<' on the constant prefix with the
last character incremented by one ('abc0') ('0' is the next character
after '/' in ASCII).

Given what was shown above about how the en_US locale does
comparisons, we know that the non-alphanumeric characters are ignored.
So the query essentially becomes:

select * from test where test_col >= 'abc' and test_col < 'abc0';

and the data it is comparing against is 'abcxyz' in all cases (once
the .'s and /'s are removed). Therefore, since 'abcxyz' > 'abc0', no
rows are returned.

Over the last couple of months that I have been on the postgres mail
lists there have been a few people who reported that queries of the
form "like '/aaa/bbb/%'" don't work. From the above information I have
determined that such queries don't work if:

a) the database is built with locale support enabled (--enable-locale)
b) the database is running with locale en_US
c) the column the like is being performed on is indexed
d) the query execution plan uses the above index

(Discovering the exact set of circumstances for how to reproduce this
has driven me crazy for a while now.)

The current implementation for converting the like into an index scan
doesn't work with locale support enabled and the en_US locale, as
shown above.

thanks,
--Barry

PS.
my test case:

drop table test;
create table test (test_col text);
create index test_index on test (test_col);
insert into test values ('abc.xyz');
insert into test values ('abcxyz');
insert into test values ('abc/xyz');
explain select * from test where test_col like 'abc/%';
select * from test where test_col like 'abc/%';

when run against postgres 7.0.3 with locale support enabled (used the
standard RPMs on postgresql.org for RedHat) with LANG=en_US:

barry=# drop table test;
DROP
barry=# create table test (test_col text);
CREATE
barry=# create index test_index on test (test_col);
CREATE
barry=# insert into test values ('abc.xyz');
INSERT 227611 1
barry=# insert into test values ('abcxyz');
INSERT 227612 1
barry=# insert into test values ('abc/xyz');
INSERT 227613 1
barry=# explain select * from test where test_col like 'abc/%';
NOTICE: QUERY PLAN:

Index Scan using test_index on test (cost=0.00..8.14 rows=10 width=12)

EXPLAIN
barry=# select * from test where test_col like 'abc/%';
 test_col
----------
(0 rows)

barry=#

when run against postgres 7.0.3 with locale support enabled (used the
standard RPMs on postgresql.org) with LANG=en_UK:

barry=# drop table test;
DROP
barry=# create table test (test_col text);
CREATE
barry=# create index test_index on test (test_col);
CREATE
barry=# insert into test values ('abc.xyz');
INSERT 227628 1
barry=# insert into test values ('abcxyz');
INSERT 227629 1
barry=# insert into test values ('abc/xyz');
INSERT 227630 1
barry=# explain select * from test where test_col like 'abc/%';
NOTICE: QUERY PLAN:

Index Scan using test_index on test (cost=0.00..8.14 rows=10 width=12)

EXPLAIN
barry=# select * from test where test_col like 'abc/%';
 test_col
----------
 abc/xyz
(1 row)

barry=#

Note the second query (under en_UK) returns the expected row.
Re: [HACKERS] OK, does anyone have any better ideas?
mlw <[EMAIL PROTECTED]> writes:
> Then you call search with a string, such as "the long and winding road"
> or "software OR hardware AND engineer NOT sales." A few milliseconds
> later, a list of key/rank pairs are produced. This is FAR faster than
> the '~~~' operator because it never does a full table scan.

An index-associated operator doesn't imply a full table scan either. The whole purpose of an index is to pull out the rows matched by the WHERE expression without doing a full scan.

The thing that bothers me about the way you're doing it is that the search result as such doesn't give you access to anything but the keys themselves. Typically what you want to do is get the whole record(s) in which the matching keys are located --- and that's why the notion of SELECT ... WHERE textfield-matches-search-string looks so attractive. You get the records immediately, in one step. Without that, your next step after the search engine call is to do a join of the search result table against your data table, and poof there goes much of your speed gain. (At best, you can make the join reasonably quick by having an index on the unique key field ... but that just means another index to maintain.)

Another advantage of handling it as an index is that you don't have to rely on a periodic recomputation of the index; you can do on-the-fly updates each time the table is altered. (Of course, if your indexing technology can't handle incremental updates efficiently, that might not be of any value to you. But there's nothing in the system design that precludes making an index type that's only updated by REINDEX.)

I realize this is probably not what you wanted to hear, since building a new index type is a lot more work than I suppose you were looking for. But if you want a full-text index that's integrated naturally into Postgres, that's the path to travel. The way you're doing it is swimming against the tide.
Even when the function-returning-recordset limitation is gone (maybe a version or two away), it's still going to be an awkward and inefficient way to work. regards, tom lane
RE: [HACKERS] 7.0.3(nofsync) vs 7.1
> > So, I've run simple test (below) to check this. Seems that 7.1
> > is faster than 7.0.3 (nofsync), and that SELECT FOR UPDATE in RI
> > triggers is quite bad for performance.
> > Also, we should add new TODO item: implement dirty reads
> > and use them in RI triggers.
>
> That would fix RI triggers, I guess, but what about plain SELECT FOR
> UPDATE being used by applications?

What about it? An application normally takes exclusive row locks only when they are really required by application logic. Exclusive PK locks are not required for FK inserts by RI logic; we just currently have no other means to ensure PK existence. Keeping in mind that RI is used in nearly every application, I would like to see this fixed. And people have already complained about it.

> Why exactly is SELECT FOR UPDATE such a performance problem for 7.1,
> anyway?  I wouldn't have thought it'd be a big deal...

I have only one explanation: it reduces the number of transactions ready to commit (writers of the same FK will wait till the first one has committed, i.e. till the log is fsynced), and WAL commit performance depends greatly on how many commits are covered by a single log fsync. 7.0.3+nofsync commit performance doesn't depend on this factor.

Vadim
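A toy model of the group-commit effect Vadim describes (the fsync rate and backend counts are made-up numbers, purely illustrative):

```python
# Group commit: one log fsync() flushes the commit records of every
# backend that became ready to commit while the fsync was pending,
# so throughput scales with how many commits share each fsync.
FSYNCS_PER_SECOND = 50   # hypothetical: one log fsync costs 20 ms

def commit_tps(ready_per_fsync):
    # Transactions committed per second at the given concurrency.
    return ready_per_fsync * FSYNCS_PER_SECOND

# No lock contention: many backends are ready at each fsync.
print(commit_tps(5))   # 250
# SELECT FOR UPDATE on a hot PK row serializes the FK writers, so
# fewer backends are ready per fsync, and throughput drops.
print(commit_tps(2))   # 100
```
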
[GENERAL] Re: [HACKERS] Bug in index scans with Locale support enabled
Barry Lind <[EMAIL PROTECTED]> writes:
> Now the real problem comes in when either the like or regex operators
> are used in a sql statement.

Right. As of 7.1beta1 we are dealing with this by suppressing LIKE/regex index optimization in all locales other than "C". That's a pretty crude answer, but it seems the only reliable one :-(.

> Over the last couple of months that I have been on the postgres mail
> lists there have been a few people who reported that queries of the form
> "like '/aaa/bbb/%'" don't work. From the above information I have
> determined that such queries don't work if:
> a) database is built with Locale support enabled (--enable-locale)
> b) the database is running with locale en_US
> c) the column the like is being performed on is indexed
> d) the query execution plan uses the above index

en_US is not the only dangerous locale, unfortunately. I suspect that there are some non-C locales in which we could still do the optimization safely. The trick is to know which ones have collation rules that are affected by character combinations, multi-pass ordering rules, etc. Do you have any info on that?

BTW, thanks for the very clear explanation --- we can point people at this next time the question comes up, which it does regularly...

regards, tom lane
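For reference, the optimization being suppressed is the one from Barry's analysis: in the "C" locale, where strings compare bytewise, rewriting the prefix into a range is safe. A Python sketch of the transformation (simplified; it ignores the edge case of a prefix ending in the highest byte value):

```python
def like_prefix_range(prefix):
    # Rewrite  col LIKE prefix || '%'  into  lo <= col < hi,  valid
    # only when strings compare bytewise ("C" locale): take the
    # prefix itself as the lower bound and bump its last byte for
    # the upper bound.
    lo = prefix
    hi = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return lo, hi

lo, hi = like_prefix_range('abc/')        # ('abc/', 'abc0')
rows = ['abc.xyz', 'abcxyz', 'abc/xyz']
# Under bytewise comparison the range picks out exactly the rows
# that start with the prefix.
print([r for r in rows if lo <= r < hi])  # ['abc/xyz']
```
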
Re: [HACKERS] OK, does anyone have any better ideas?
Tom Lane wrote:
> mlw <[EMAIL PROTECTED]> writes:
> > I have a working version of a text search engine. I want to make it work
> > for Postgres (I will be releasing it GPL). It can literally find the
> > occurrence of a string of words within 5 million records in a few
> > milliseconds.
>
> Where are the records coming from?  Are they inside the database?
> (If not, why do you care about integrating this with Postgres?)
>
> It seems like the right way to integrate this sort of functionality
> is to turn it into a kind of index, so that you can do
>
> SELECT * FROM mytable WHERE keyfield ~~~ 'search string';
>
> where ~~~ is the name of some operator that is associated with the
> index.  The temporary-table approach you are taking seems inherently
> klugy, and would still be awkward even if we had functions returning
> recordsets...

Oh! Another method I tried and just could not get working was returning an array of integers. I was thinking about "select * from table where key_field in ( textsearch('bla bla') )", but I haven't been able to get that to work, and as per a previous post and belatedly reading a FAQ, this would probably still force a full table scan.

Another method I thought about was having a table with some maximum number of zero-initialized records, and trying something like:

create table temp_table as select * from ts_template limit textsearch('bla bla', 10);
select filltable(temp_table, 10);
select * from table where key_field = temp_table.key;

As you can see, all of these ideas are heinous hacks; there has to be a better way. Surely someone has a better idea.

-- http://www.mohawksoft.com
Re: [HACKERS] OK, does anyone have any better ideas?
Tom Lane wrote:
> mlw <[EMAIL PROTECTED]> writes:
> > I have a working version of a text search engine. I want to make it work
> > for Postgres (I will be releasing it GPL). It can literally find the
> > occurrence of a string of words within 5 million records in a few
> > milliseconds.
>
> Where are the records coming from?  Are they inside the database?
> (If not, why do you care about integrating this with Postgres?)
>
> It seems like the right way to integrate this sort of functionality
> is to turn it into a kind of index, so that you can do
>
> SELECT * FROM mytable WHERE keyfield ~~~ 'search string';
>
> where ~~~ is the name of some operator that is associated with the
> index.  The temporary-table approach you are taking seems inherently
> klugy, and would still be awkward even if we had functions returning
> recordsets...

OK, I get the misunderstanding; you are absolutely right, it is VERY kludgy. It is sort of like a bitmap index, but it is more of a search engine. I actually have it working on a commercial website.

You run a program periodically (cron job?) that executes a query; the query is then parsed and an index of words, keys, ranks and phrase meta-data is created. You also specify which fields in the query should be indexed and which field will be the "key." (It is not ACID, if I understand what that term means.) The data for the text search need not even be in the database, as long as the "key" being indexed is.

Then you call search with a string, such as "the long and winding road" or "software OR hardware AND engineer NOT sales." A few milliseconds later, a list of key/rank pairs is produced. This is FAR faster than the '~~~' operator because it never does a full table scan. It is assumed that the "key" field specified is properly indexed.

If I had a way of getting the key/rank result pairs deeper into Postgres, it would be an amazing platform for some serious high-speed search applications. Think about a million resumes online, searchable with an arbitrary text string to get a list of candidates, powered by Postgres, handling 100 queries a second.

Right now, the way I have it working is PHP makes the search call and then executes a query with the first result (highest rank) and returns the data. If I could get the key/rank pairs into postgres as a table, or multiple searches into postgres as a set of tables, then you could do some amazing queries really, really fast.

Still, you said that "select foo from bar where key = textsearch('bla bla',..)" could not be done, and my previous example was the only other way I have been able to even prototype my idea.

-- http://www.mohawksoft.com
Re: [HACKERS] 7.0.3(nofsync) vs 7.1
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> So, I've run simple test (below) to check this. Seems that 7.1
> is faster than 7.0.3 (nofsync), and that SELECT FOR UPDATE in RI
> triggers is quite bad for performance.
> Also, we should add new TODO item: implement dirty reads
> and use them in RI triggers.

That would fix RI triggers, I guess, but what about plain SELECT FOR UPDATE being used by applications?

Why exactly is SELECT FOR UPDATE such a performance problem for 7.1, anyway? I wouldn't have thought it'd be a big deal...

regards, tom lane
Re: [HACKERS] Hash index on macaddr -> crash
"Darren King" <[EMAIL PROTECTED]> writes: > How about creating an SQL statement that will make the change and > putting a blurb about it it in the README, INSTALL and/or FAQ? In theory we could do that, but I doubt it's worth the trouble. Hash on macaddr has never worked (until my upcoming commit anyway ;-)) and the lack of complaints seems to be adequate evidence that no one in the beta-test community has any use for it... so who's going to go to the trouble of manually updating each of their databases? My bet is that there'll be an initdb forced for some other reason (like adding CRCs, or some much-more-serious bug than this one) before 7.1 final anyway... regards, tom lane
[HACKERS] 7.0.3(nofsync) vs 7.1
I've run tests (with 50 .. 250 simult users) for some PG project of my company. 7.1 was 3 times faster than 7.0.3 (fsync) but near 3 times slower than 7.0.3 (nofsync). It was not the best day in my life - WAL looked like big bottleneck -:(

But finally I've realized that this test makes ~3 FK insertions ... and FK insert means SELECT FOR UPDATE ... and this could reduce # of commits per fsync.

So, I've run simple test (below) to check this. Seems that 7.1 is faster than 7.0.3 (nofsync), and that SELECT FOR UPDATE in RI triggers is quite bad for performance.

Please take this into account when comparing 7.1 with 7.0.3. Also, we should add new TODO item: implement dirty reads and use them in RI triggers.

Vadim

==

Tables: PK (p int primary key), p in 1 .. 1000.
        FK (f int, foreign key(f) references pk(p)).

First column below - # of users; second - # of FK inserts per user; next - what values were used in each insert: either unique (ie there were no users inserting same value - no waiters on SELECT FOR UPDATE on PK) or some random value from range.

7.1
 50  1000  unique:        250 tps
100  1000  unique:        243 tps
 50  1000  rand(1 .. 10): 178 tps
 50  1000  rand(1 .. 5):  108 tps

7.0.3 (nofsync)
 50  1000  unique:        157 tps
 50  1000  rand(1 .. 10): 154 tps
 50  1000  rand(1 .. 5):  154 tps
Re: [HACKERS] Re: CRC
Bruce Guenter <[EMAIL PROTECTED]> writes: > Would you like to see the simple benchmarking setup I used? The amount > of code involved (once all the hashes are factored in) is fairly large, > so I'm somewhat hesitant to just send it to the mailing list. I agree, don't send it to the whole list. But I'd like a copy. regards, tom lane
Re: [HACKERS] European Datestyle
Daniele Orlandi <[EMAIL PROTECTED]> writes: > Hello, > > Why is this happening ? > > ctonet=# show datestyle; > NOTICE: DateStyle is ISO with European conventions > SHOW VARIABLE > > ctonet=# select creation_date from users limit 1; > creation_date > > 2000-12-07 04:40:23+01 > ^^ That is the ISO-style, isn't it? There are two ways of making dates make sense, none of them American (but hey, they're still using Fahrenheit, feet, lb, fl.oz. acres and other nonsensical units... ) -- Trond Eivind Glomsrød Red Hat, Inc.
Re: [HACKERS] Hash index on macaddr -> crash
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
>>> We could fix this either by adding a new hash function to support
>>> macaddr, or by removing the pg_amXXX entries that claim macaddr is
>>> hashable. Either change will not take effect without an initdb,
>>> however, and I'm loath to force one now that we've started beta.
>> If we're going to add CRC to log then we need in beta anyway...
> Ops - we need in INITDB...

Not to mention adding a CRC to page headers, which was the other part of the thread.

>> Are we going?

I dunno. For now, I'll put in the hash function but not force an initdb. If we do the CRC thing then we'll have the initdb at that point.

regards, tom lane
[HACKERS] European Datestyle
Hello, Why is this happening ? ctonet=# show datestyle; NOTICE: DateStyle is ISO with European conventions SHOW VARIABLE ctonet=# select creation_date from users limit 1; creation_date 2000-12-07 04:40:23+01 ^^ Datestyle has been set either with -e and with "set datestyle" with no change. Context: Postgresql 7.0.3 on RedHat Linux 7.0 - Kernel 2.4.0-test10 - Glibc 2.1.94 and 2.2 Thanks! Bye! -- Daniele Orlandi
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 04:30:58PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> >> Are you really saying MD5 was faster than CRC-32?
> > Yes. I expect it's because the operations used in MD5 are easily
> > parallelized, and operate on blocks of 64-bytes at a time, while the CRC
> > is mostly non-parallelizable, uses a table lookup, and operates on
> > single bytes.
> What MD5 implementation did you use?

I used the GPL'ed implementation written by Ulrich Drepper in 1995. The code from OpenSSL looks identical in terms of the operations performed.

> The one I have handy (the original
> RSA reference version) sure looks like it's more computation per byte
> than a CRC.

The algorithm itself does use more computation per byte. However, it works on blocks of 64 bytes at a time, and its operations should be easily pipelined. The CRC code, on the other hand, is largely serial and highly dependent on a table lookup operation.

-- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
RE: [HACKERS] Hash index on macaddr -> crash
> We could fix this either by adding a new hash function to support
> macaddr, or by removing the pg_amXXX entries that claim macaddr is
> hashable. Either change will not take effect without an initdb,
> however, and I'm loath to force one now that we've started beta.

How about creating an SQL statement that will make the change and putting a blurb about it in the README, INSTALL and/or FAQ? This wouldn't require an initdb and would let people have the fix.

For things like this that update existing fields (vs adding/deleting fields hard-wired for use in the backend), it should work, no?

Darren
RE: [HACKERS] Hash index on macaddr -> crash
> > We could fix this either by adding a new hash function to support > > macaddr, or by removing the pg_amXXX entries that claim macaddr is > > hashable. Either change will not take effect without an initdb, > > however, and I'm loath to force one now that we've started beta. > > If we're going to add CRC to log then we need > in beta anyway... ^ Ops - we need in INITDB... > Are we going? Vadim
RE: [HACKERS] Hash index on macaddr -> crash
> We could fix this either by adding a new hash function to support > macaddr, or by removing the pg_amXXX entries that claim macaddr is > hashable. Either change will not take effect without an initdb, > however, and I'm loath to force one now that we've started beta. If we're going to add CRC to log then we need in beta anyway... Are we going? Vadim
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 11:10:19AM -0800, I wrote: > Current evidence suggests that MD4 would be a good choice for a hash > algorithm. Thinking about it, I suspect that any CRC implementation that can't outrun MD5 by a wide margin is seriously sub-optimal. Can you post any more details about how the tests were run? I'd like to try it. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Re: CRC
[EMAIL PROTECTED] (Nathan Myers) writes: > Thinking about it, I suspect that any CRC implementation that can't outrun > MD5 by a wide margin is seriously sub-optimal. I was finding that hard to believe, too, at least for CRC-32 (CRC-64 would take more code, so I'm not so sure about it). Is that 64-bit code you pointed us to before actually a CRC, or something else? It doesn't call itself a CRC, and I was having a hard time extracting anything definite (like the polynomial) from all the bit-pushing underbrush :-( regards, tom lane
Re: [HACKERS] pre-beta is slow
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > I just didn't care about -F functionality, sorry. > I agreed that we should resurrect it. OK. Do you want to work on that, or shall I? regards, tom lane
Re: CRC was: Re: [HACKERS] beta testing version
On Fri, Dec 08, 2000 at 03:38:09PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> > MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
> > impossible to produce a collision using any other method than brute
> > force attempts.
> True but irrelevant. What we need to worry about is the probability
> that a random error will be detected,

Which I indicated immediately after the sentence you quoted. The probability that a random error will be detected is the same as the probability of a collision in the hash given two different inputs. The brute-force note means that the probability of a collision is as good as random.

> MD5 is designed for a purpose that really doesn't have much to do with
> error detection, when you think about it. It says "you will have a hard
> time computing a different string that produces the same hash as some
> prespecified string". This is not the same as promising
> better-than-random odds against a damaged copy of some string having the
> same hash as the original.

It does provide as-good-as-random odds against a damaged copy of some string having the same hash as the original -- nobody has been able to exhibit any collisions in MD5 (see http://cr.yp.to/papers/hash127.ps, page 18 for notes on this).

> CRC, on the other hand, is specifically
> designed for error detection, and for localized errors (such as a
> corrupted byte or two) it does a provably better-than-random job.
> For nonlocalized errors you don't get a guarantee, but you do get
> same-as-random odds of detection (ie, 1 in 2^N for an N-bit CRC).

For the log, the CRC's primary function (as far as I understand it) would be to guard against an inconsistent transaction being treated as consistent data. Such inconsistent transactions would be partially written, resulting in errors much larger than a small burst.

For guarding the actual record data, I agree with you 100% -- what we're likely to see is a few localized bytes with flipped bits due to hardware failure of one kind or another. However, if the data is really critical, an ECC may be more appropriate, but that would make the data significantly larger (9/8 for the algorithms I've seen).

> I really doubt that MD5 can beat a CRC with the same number of output
> bits for the purpose of error detection;

Agreed. However, MD5 provides four times as many bits as the standard 32-bit CRC. (I think I initially suggested you could take an arbitrary 32 bits out of MD5 to provide a check code "as good as CRC-32". I now take that back. Due to the burst-error guarantees of CRCs, nothing else could be as good, unless the alternate algorithm also made some guarantees, which MD5 definitely doesn't.)

> (Wild-pointer
> stomps on disk buffers are an example of the sort of thing that may
> look like a burst error.)

Actually, wild-pointer incidents involving disk buffers at the kernel level are, in my experience, characterized by content from one file appearing in another, which is distinctly different from a burst error and more like what would be seen if a log record were partially written.

-- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 04:21:21PM -0500, Tom Lane wrote: > [EMAIL PROTECTED] (Nathan Myers) writes: > > Thinking about it, I suspect that any CRC implementation that can't outrun > > MD5 by a wide margin is seriously sub-optimal. > I was finding that hard to believe, too, at least for CRC-32 (CRC-64 > would take more code, so I'm not so sure about it). Would you like to see the simple benchmarking setup I used? The amount of code involved (once all the hashes are factored in) is fairly large, so I'm somewhat hesitant to just send it to the mailing list. > Is that 64-bit code you pointed us to before actually a CRC, or > something else? It doesn't call itself a CRC, and I was having a hard > time extracting anything definite (like the polynomial) from all the > bit-pushing underbrush :-( It isn't a CRC. It's a fingerprint. As you've mentioned, it doesn't have the guarantees against burst errors that a CRC would have, but it does have as good as random collision avoidance over any random data corruption. At least, that's what the author claims. My math isn't nearly good enough to verify such claims. -- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
[HACKERS] Hash index on macaddr -> crash
It was just pointed out on pggeneral that hash indexes on macaddr columns don't work. Looking into it, I find that someone (me :-() made a booboo: pg_amproc claims that hashvarlena is the appropriate hash function for macaddr --- but macaddr isn't a varlena type, it's a fixed-length pass-by-reference type. We could fix this either by adding a new hash function to support macaddr, or by removing the pg_amXXX entries that claim macaddr is hashable. Either change will not take effect without an initdb, however, and I'm loath to force one now that we've started beta. What I'm inclined to do is add the hash function but not force an initdb (ie, not increment catversion). That would mean that people running 7.1beta1 would still have the bug in 7.1 final if they don't choose to do an initdb when they update. But hashing macaddr isn't very common (else we'd have noticed sooner!) so this seems OK, and better than forcing an initdb on our beta testers. Comments, objections? regards, tom lane
Re: [HACKERS] Re: CRC
Bruce Guenter <[EMAIL PROTECTED]> writes: >> Are you really saying MD5 was faster than CRC-32? > Yes. I expect it's because the operations used in MD5 are easily > parallelized, and operate on blocks of 64-bytes at a time, while the CRC > is mostly non-parallelizable, uses a table lookup, and operates on > single bytes. What MD5 implementation did you use? The one I have handy (the original RSA reference version) sure looks like it's more computation per byte than a CRC. regards, tom lane
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 11:10:19AM -0800, Nathan Myers wrote: > This is very interesting. MD4 is faster than MD5. (MD5, described as > "MD4 with suspenders on", does some extra stuff to protect against more- > obscure attacks, of no interest to us.) Which 64-bit CRC code did you > use, Mark Mitchell's? Yes. > Are you really saying MD5 was faster than CRC-32? Yes. I expect it's because the operations used in MD5 are easily parallelized, and operate on blocks of 64-bytes at a time, while the CRC is mostly non-parallelizable, uses a table lookup, and operates on single bytes. -- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
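For reference, the byte-at-a-time loop being discussed looks like this (a standard table-driven CRC-32 with the zlib polynomial, sketched in Python; not the exact C code that was benchmarked):

```python
# Table-driven CRC-32 (reflected polynomial 0xEDB88320, as in zlib).
# Note the data dependency: each iteration needs the previous CRC
# value before it can do its table lookup, so the loop is inherently
# serial -- unlike MD5, which mixes 64-byte blocks with operations a
# superscalar CPU can overlap.
def make_crc32_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRC_TABLE = make_crc32_table()

def crc32(data, crc=0):
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF

print(hex(crc32(b"123456789")))   # 0xcbf43926, the standard check value
```
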
RE: [HACKERS] pre-beta is slow
> > I just didn't care about -F functionality, sorry. > > I agreed that we should resurrect it. > > OK. Do you want to work on that, or shall I? In near future I'll be busy doing CRC + "physical log" things... Vadim
Re: CRC was: Re: [HACKERS] beta testing version
Bruce Guenter <[EMAIL PROTECTED]> writes:
> MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
> impossible to produce a collision using any other method than brute
> force attempts.

True but irrelevant. What we need to worry about is the probability that a random error will be detected, not the computational effort that a malicious attacker would need in order to insert an undetectable error.

MD5 is designed for a purpose that really doesn't have much to do with error detection, when you think about it. It says "you will have a hard time computing a different string that produces the same hash as some prespecified string". This is not the same as promising better-than-random odds against a damaged copy of some string having the same hash as the original.

CRC, on the other hand, is specifically designed for error detection, and for localized errors (such as a corrupted byte or two) it does a provably better-than-random job. For nonlocalized errors you don't get a guarantee, but you do get same-as-random odds of detection (ie, 1 in 2^N for an N-bit CRC).

I really doubt that MD5 can beat a CRC with the same number of output bits for the purpose of error detection; given the lack of guarantee about short burst errors, I doubt it's even as good. (Wild-pointer stomps on disk buffers are an example of the sort of thing that may look like a burst error.)

Now, if you are worried about crypto-capable gremlins living in your file system, maybe what you want is MD5. But I'm inclined to think that CRC is more appropriate for the job at hand.

regards, tom lane
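Tom's burst-error claim is easy to check empirically with the stock CRC-32 (a quick sketch using Python's zlib binding rather than any Postgres code):

```python
import random
import zlib

# Corrupt a 32-byte message with random error bursts no wider than
# 31 bits and confirm CRC-32 flags every one: it is guaranteed to
# detect any error pattern confined to a window narrower than the
# CRC width.
msg = b"some WAL record payload bytes!!!"
orig = zlib.crc32(msg)
nbits = len(msg) * 8
rng = random.Random(42)

undetected = 0
for _ in range(2000):
    start = rng.randrange(nbits - 31)     # burst start bit
    pattern = rng.randrange(1, 1 << 31)   # nonzero error, <= 31 bits wide
    flipped = int.from_bytes(msg, 'big') ^ (pattern << (nbits - 31 - start))
    corrupted = flipped.to_bytes(len(msg), 'big')
    if zlib.crc32(corrupted) == orig:
        undetected += 1
print(undetected)   # 0 -- every burst was caught
```
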
RE: [HACKERS] pre-beta is slow
> > > Try to compare 7.0.3 & 7.1beta in multi-user environment.
> >
> > As I understand it you claim it to be faster in multi-user
> > environment?
> >
> > Could you give some brief technical background why is it so
> > and why must it make single-user slower?
>
> Because commit in 7.1 does fsync, with or without -F
> (we can discuss and change this), but in multi-user env
> a number of commits can be made with single fsync.
> Seems I've described this before?

Oops, I forgot to answer the question "why is 7.1 slower than 7.0.3 in single-user env?". I assumed that 7.1 was compared with 7.0.3 *with -F*, which may not be correct, I don't know. Well, the next test shows that 7.1 is faster in single-user env than 7.0.3 *without -F*:

table (i int, t text); 1000 INSERTs (in separate transactions), sizeof(t) 1 .. 256:

7.0.3: 42 sec -> 24 tps
7.1  : 24 sec -> 42 tps

Vadim
[HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 12:19:39PM -0600, Bruce Guenter wrote: > On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote: > > 1. Computing a CRC-64 takes only about twice as long as a CRC-32, for > >2^32 times the confidence. That's pretty cheap confidence. > > Incidentally, I benchmarked the previously mentioned 64-bit fingerprint, > the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my > Celeron and on a PIII was MD5. The 64-bit fingerprint was only a hair > slower, the CRC was (quite surprisingly) about 40% slower, and the > implementation of SHA that I had available was a real dog. Taking an > arbitrary 32 bits of a MD5 would likely be less collision prone than > using a 32-bit CRC, and it appears faster as well. This is very interesting. MD4 is faster than MD5. (MD5, described as "MD4 with suspenders on", does some extra stuff to protect against more- obscure attacks, of no interest to us.) Which 64-bit CRC code did you use, Mark Mitchell's? Are you really saying MD5 was faster than CRC-32? I don't know of any reason to think that 32 bits of an MD5 would be better distributed than a CRC-32, or that having computed the 64 bits there would be any point in throwing away half. Current evidence suggests that MD4 would be a good choice for a hash algorithm. Nathan Myers [EMAIL PROTECTED]
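The "2^32 times the confidence" arithmetic quoted above, spelled out (this applies to random, non-burst corruption; the burst guarantees are a separate matter):

```python
# An N-bit check lets a random corruption slip through undetected
# with probability about 2**-N.
undetected_32 = 2.0 ** -32    # ~2.3e-10 per corrupted record
undetected_64 = 2.0 ** -64    # ~5.4e-20

# Going from CRC-32 to CRC-64 buys a 2**32-fold reduction in the
# undetected-error probability, for roughly double the compute.
print(undetected_32 / undetected_64)   # 4294967296.0 == 2**32
```
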
Re: AW: [HACKERS] beta testing version
Bruce Guenter wrote:
> CRCs are designed to catch N-bit errors (ie N bits in a row with their
> values flipped). N is (IIRC) the number of bits in the CRC minus one.
> So, a 32-bit CRC can catch all 31-bit errors. That's the only guarantee
> a CRC gives. Everything else has a 1 in 2^32-1 chance of producing the
> same CRC as the original data. That's pretty good odds, but not a
> guarantee.

Nothing is a guarantee. Everywhere you have a non-null probability of failure. Memory of any kind doesn't give you a *guarantee* that the data you read is exactly the data you wrote. CPUs and transmission lines are subject to errors too. You may only be guaranteed that the overall probability of failure of your system is under a specified level. When the level is low enough, you usually consider the absence of errors guaranteed.

With CRC32 you considerably reduce p, and given how rarely the CRC would need to reveal an error, I would consider it enough.

Bye!

-- Daniele

--- Daniele Orlandi - Utility Line Italia - http://www.orlandi.com Via Mezzera 29/A - 20030 - Seveso (MI) - Italy ---
Re: [HACKERS] OK, does anyone have any better ideas?
mlw <[EMAIL PROTECTED]> writes: > I have a working version of a text search engine. I want to make it work > for Postgres (I will be releasing it GPL). It can literally find the > occurrence of a string of words within 5 million records in a few > milliseconds. Where are the records coming from? Are they inside the database? (If not, why do you care about integrating this with Postgres?) It seems like the right way to integrate this sort of functionality is to turn it into a kind of index, so that you can do SELECT * FROM mytable WHERE keyfield ~~~ 'search string'; where ~~~ is the name of some operator that is associated with the index. The temporary-table approach you are taking seems inherently klugy, and would still be awkward even if we had functions returning recordsets... regards, tom lane
Re: CRC was: Re: [HACKERS] beta testing version
Bruce Guenter <[EMAIL PROTECTED]> writes: > ... Taking an > arbitrary 32 bits of a MD5 would likely be less collision prone than > using a 32-bit CRC, and it appears faster as well. ... but that would be an algorithm that you know NOTHING about the properties of. What is your basis for asserting it's better than CRC? CRC is pretty well studied and its error-detection behavior is known (and good). MD5 has been studied less thoroughly AFAIK, and in any case what's known about its behavior is that the *entire* MD5 output provides a good signature for a datastream. If you pick some ad-hoc method like taking a randomly chosen subset of MD5's output bits, you really don't know anything at all about what the error-detection properties of the method are. I am reminded of Knuth's famous advice about random number generators: "Random numbers should not be generated with a method chosen at random. Some theory should be used." Error-detection codes, like random-number generators, have decades of theory behind them. Seat-of-the-pants tinkering, even if it starts with a known-good method, is not likely to produce an improvement. regards, tom lane
Re: CRC was: Re: [HACKERS] beta testing version
On Fri, Dec 08, 2000 at 10:36:39AM -0800, Ian Lance Taylor wrote:
> > Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
> > the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
> > Celeron and on a PIII was MD5. The 64-bit fingerprint was only a hair
> > slower, the CRC was (quite surprisingly) about 40% slower, and the
> > implementation of SHA that I had available was a real dog. Taking an
> > arbitrary 32 bits of a MD5 would likely be less collision prone than
> > using a 32-bit CRC, and it appears faster as well.
>
> I just want to confirm that you used something like the fast 32-bit
> CRC algorithm, appended. The one posted earlier was accurate but
> slow.

Yes. I just rebuilt the framework using this exact code, and it performed identically to the previous CRC code (which didn't have an unrolled inner loop). These were compiled with -O6 with egcs 1.1.2.

-- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
RE: [HACKERS] pre-beta is slow
> > Because commit in 7.1 does fsync, with or without -F > > (we can discuss and change this), but in multi-user env > > a number of commits can be made with a single fsync. > > I was planning to ask why you disabled the -F switch. Seems > to me that people who trusted their OS+hardware before would > still want to do so in 7.1, and so it still makes sense to be > able to suppress the fsync calls. I just didn't care about -F functionality, sorry. I agree that we should resurrect it. Vadim
Re: [HACKERS] Indexing for geographic objects?
On Fri, 8 Dec 2000, Tom Lane wrote: > Date: Fri, 08 Dec 2000 12:59:27 -0500 > From: Tom Lane <[EMAIL PROTECTED]> > To: Oleg Bartunov <[EMAIL PROTECTED]> > Cc: [EMAIL PROTECTED], 'pgsql-hackers ' <[EMAIL PROTECTED]> > Subject: Re: [HACKERS] Indexing for geographic objects? > > Oleg Bartunov <[EMAIL PROTECTED]> writes: > >> 5000 looks like a suspiciously round number ... how many rows are in > >> the table? Have you done a vacuum analyze on it? > > > about 10,000 rows, > > So the thing is estimating 0.5 selectivity, which is a fallback for > operators it knows nothing whatever about. > > [ ... digs in Selkov's scripts ... ] > > CREATE OPERATOR @ ( > LEFTARG = seg, RIGHTARG = seg, PROCEDURE = seg_contains, > COMMUTATOR = '~' > ); > > CREATE OPERATOR ~ ( > LEFTARG = seg, RIGHTARG = seg, PROCEDURE = seg_contained, > COMMUTATOR = '@' > ); > > Sure 'nuff, no selectivity info attached to these declarations. > Try adding > > RESTRICT = contsel, JOIN = contjoinsel > > to them. That's still an entirely bogus estimate, but at least > it's a smaller bogus estimate ... small enough to select an indexscan, > one hopes (see utils/adt/geo_selfuncs.c). Great! Now we have a better plan: test=# explain select * from test where s @ '1.05 .. 3.95'; NOTICE: QUERY PLAN: Index Scan using test_seg_ix on test (cost=0.00..61.56 rows=100 width=12) EXPLAIN > > I have not dug through Gene's stuff to see which other indexable > operators might be missing selectivity estimates, but I'll bet there > are others. If you have the time to look through it and submit a > patch, I can incorporate it into the version that will go into contrib. > We didn't look at Gene's stuff yet. Maybe Gene could find time to check his code. > regards, tom lane > _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Re: CRC was: Re: [HACKERS] beta testing version
On Fri, Dec 08, 2000 at 01:58:12PM -0500, Tom Lane wrote: > Bruce Guenter <[EMAIL PROTECTED]> writes: > > ... Taking an > > arbitrary 32 bits of a MD5 would likely be less collision prone than > > using a 32-bit CRC, and it appears faster as well. > > ... but that would be an algorithm that you know NOTHING about the > properties of. What is your basis for asserting it's better than CRC? MD5 is a cryptographic hash, which means (AFAIK) that ideally it is impossible to produce a collision using any other method than brute force attempts. In other words, any stream of input to the hash that is longer than the hash length (16 bytes for MD5) is equally likely to produce a given hash code. > CRC is pretty well studied and its error-detection behavior is known > (and good). MD5 has been studied less thoroughly AFAIK, and in any > case what's known about its behavior is that the *entire* MD5 output > provides a good signature for a datastream. If you pick some ad-hoc > method like taking a randomly chosen subset of MD5's output bits, > you really don't know anything at all about what the error-detection > properties of the method are. Actually, in my reading regarding the properties of MD5, I read an article that stated that if a smaller number of bits was desired, one could either (and here's where my memory fails me) just select the middle N bits from the resulting hash, or fold the hash using XOR until the desired number of bits was reached. I'll see if I can find a reference... RFC 2289 (http://www.ietf.org/rfc/rfc2289.txt) includes an algorithm for folding MD5 digests down to 64 bits by XORing the top half with the bottom half. See appendix A. -- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
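For reference, the RFC 2289 fold Bruce mentions is easy to sketch (a minimal illustration; the function name `fold_md5_64` is made up here, not taken from the RFC):

```python
import hashlib

def fold_md5_64(data):
    """Fold a 128-bit MD5 digest down to 64 bits by XORing its
    top half with its bottom half (the scheme in RFC 2289, appendix A)."""
    digest = hashlib.md5(data).digest()  # 16 bytes = 128 bits
    return bytes(a ^ b for a, b in zip(digest[:8], digest[8:]))
```

Whether the folded value keeps MD5's error-detection behavior is exactly the open question Tom raises; the fold itself is trivial.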
Re: [HACKERS] pre-beta is slow
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > Because of commit in 7.1 does fsync, with ot without -F > (we can discuss and change this), but in multi-user env > a number of commits can be made with single fsync. I was planning to ask why you disabled the -F switch. Seems to me that people who trusted their OS+hardware before would still want to do so in 7.1, and so it still makes sense to be able to suppress the fsync calls. regards, tom lane
Re: [HACKERS] Trip to Japan
Bruce, what was the camera ? Regards, Oleg On Fri, 8 Dec 2000, Bruce Momjian wrote: > Date: Fri, 8 Dec 2000 12:07:56 -0500 (EST) > From: Bruce Momjian <[EMAIL PROTECTED]> > To: PostgreSQL-general <[EMAIL PROTECTED]> > Cc: PostgreSQL-development <[EMAIL PROTECTED]> > Subject: [HACKERS] Trip to Japan > > I have just returned from a seven-day trip to Japan. I spoke for seven > hours to three separate groups, totalling 200 people. I spoke to a > Linux Conference, a PostgreSQL user's group, and to SRA, a PostgreSQL > support company. You can get more information on my home page under > "Writings". Here are some pictures from Japan: > > http://www.sra.co.jp/people/t-ishii/Bruce/ > http://ebony.rieb.kobe-u.ac.jp/~yasuda/BRUCE/ > > PostgreSQL is more popular in Japan than in the United States. Japan > has user's groups in many cities and has had a commercial support > company (SRA) for two years. MySQL is not popular in Japan. > > It was great to meet so many nice people. > > -- > Bruce Momjian| http://candle.pha.pa.us > [EMAIL PROTECTED] | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 > _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
RE: [HACKERS] pre-beta is slow
> > Try to compare 7.0.3 & 7.1beta in multi-user environment. > > As I understand it, you claim it to be faster in a multi-user > environment? > > Could you give some brief technical background on why that is so, > and why it must make single-user slower? Because commit in 7.1 does fsync, with or without -F (we can discuss and change this), but in multi-user env a number of commits can be made with a single fsync. Seems I've described this before? Vadim
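The effect Vadim describes is what is now usually called "group commit": several concurrent committers share one physical flush. A toy sketch of the idea (purely illustrative; this file-based log is not PostgreSQL's actual xlog code):

```python
import os
import tempfile

def flush_commits(log, commit_records):
    """Append several commit records, then force them to disk with a
    single fsync(): N backends committing in the same window pay for
    one physical flush instead of N."""
    for rec in commit_records:
        log.write(rec)
    log.flush()
    os.fsync(log.fileno())  # one flush covers the whole batch

# Toy usage: three transactions commit within one fsync window.
with tempfile.NamedTemporaryFile() as log:
    flush_commits(log, [b"COMMIT 1;", b"COMMIT 2;", b"COMMIT 3;"])
    log.seek(0)
    batched = log.read()
```

In single-user mode there is never anyone to share the fsync with, which is why that case can come out slower than 7.0.3 with -F.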
Re: [HACKERS] Indexing for geographic objects?
On Fri, 8 Dec 2000, Oleg Bartunov wrote: > On Fri, 8 Dec 2000, The Hermit Hacker wrote: > > > Date: Fri, 8 Dec 2000 12:19:56 -0400 (AST) > > From: The Hermit Hacker <[EMAIL PROTECTED]> > > To: Tom Lane <[EMAIL PROTECTED]> > > Cc: Oleg Bartunov <[EMAIL PROTECTED]>, [EMAIL PROTECTED], > > 'pgsql-hackers ' <[EMAIL PROTECTED]> > > Subject: Re: [HACKERS] Indexing for geographic objects? > > > > > > just a note here ... recently, we had a client with similar problems with > > using index scan, where turning off seqscan did the trick ... we took his > > tables, loaded them into a v7.1beta1 server and it correctly comes up with > > the index scan ... > > > > Oleg, have you tried this with v7.1 yet? > > Not yet. Just a plain 7.0.3 release. Will play with 7.1beta. > But we're working in real life and need things to work in production :-) Okay, then I believe that what you are experiencing with v7.0.3 is already fixed in v7.1beta, based on similar results I got with some queries and then tested under v7.1 ... > > regards, > Oleg > > > > > On Fri, 8 Dec 2000, Tom Lane wrote: > > > > > Oleg Bartunov <[EMAIL PROTECTED]> writes: > > > > We've done some work with GiST indices and found a little problem > > > > with optimizer. > > > > > > > test=# set enable_seqscan = off; > > > > SET VARIABLE > > > > test=# explain select * from test where s @ '1.05 .. 3.95'; > > > > NOTICE: QUERY PLAN: > > > > > > > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > > > > > > > EXPLAIN > > > > % ./bench.pl -d test -b 100 -i > > > > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs > > > > > > I'd venture that the major problem here is bogus estimated selectivities > > > for rtree/gist operators. Note the discrepancy between the estimated > > > row count and the actual (I assume the "found 18 docs" is the true > > > number of rows output by the query). 
With an estimated row count even > > > half that (ie, merely two orders of magnitude away from reality ;-)) > > > the thing would've correctly chosen the index scan over sequential. > > > > > > 5000 looks like a suspiciously round number ... how many rows are in > > > the table? Have you done a vacuum analyze on it? > > > > > > regards, tom lane > > > > > > > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy > > Systems Administrator @ hub.org > > primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org > > > > _ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
Re: [HACKERS] Indexing for geographic objects?
Oleg Bartunov <[EMAIL PROTECTED]> writes: >> 5000 looks like a suspiciously round number ... how many rows are in >> the table? Have you done a vacuum analyze on it? > about 10,000 rows, So the thing is estimating 0.5 selectivity, which is a fallback for operators it knows nothing whatever about. [ ... digs in Selkov's scripts ... ] CREATE OPERATOR @ ( LEFTARG = seg, RIGHTARG = seg, PROCEDURE = seg_contains, COMMUTATOR = '~' ); CREATE OPERATOR ~ ( LEFTARG = seg, RIGHTARG = seg, PROCEDURE = seg_contained, COMMUTATOR = '@' ); Sure 'nuff, no selectivity info attached to these declarations. Try adding RESTRICT = contsel, JOIN = contjoinsel to them. That's still an entirely bogus estimate, but at least it's a smaller bogus estimate ... small enough to select an indexscan, one hopes (see utils/adt/geo_selfuncs.c). I have not dug through Gene's stuff to see which other indexable operators might be missing selectivity estimates, but I'll bet there are others. If you have the time to look through it and submit a patch, I can incorporate it into the version that will go into contrib. regards, tom lane
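As a sanity check on the numbers: the planner's row estimate is just table size times operator selectivity, so the 0.5 fallback over ~10,000 rows yields the suspiciously round 5000, against the 18 rows actually returned. (The 0.01 figure below is an inference, merely the selectivity implied by the rows=100 estimate Oleg reports once contsel is attached, not a documented constant.)

```python
def estimated_rows(total_rows, selectivity):
    """Planner row estimate: table size times operator selectivity."""
    return round(total_rows * selectivity)

# No RESTRICT clause -> 0.5 fallback -> 5000 of 10,000 rows estimated.
fallback = estimated_rows(10_000, 0.5)
# Selectivity implied by the rows=100 estimate seen with contsel attached.
with_contsel = estimated_rows(10_000, 0.01)
```

Even a crude small constant is two orders of magnitude closer to the 18 real rows, which is enough to tip the cost comparison toward the index scan.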
Re: [HACKERS] RFC: CRC datatype
"Horst Herb" <[EMAIL PROTECTED]> writes: >> Surely you don't trust your TCP connection to the >> server, either? > TCP _IS_ heavily checksummed. Yes, and so are the disk drives that you are asserting you don't trust. My point is that in both cases, there are lots and lots of failure mechanisms that won't be caught by the transport or storage CRC. The same applies to anything other than an end-to-end check. regards, tom lane
Re: [HACKERS] Indexing for geographic objects?
On Fri, 8 Dec 2000, The Hermit Hacker wrote: > Date: Fri, 8 Dec 2000 12:19:56 -0400 (AST) > From: The Hermit Hacker <[EMAIL PROTECTED]> > To: Tom Lane <[EMAIL PROTECTED]> > Cc: Oleg Bartunov <[EMAIL PROTECTED]>, [EMAIL PROTECTED], > 'pgsql-hackers ' <[EMAIL PROTECTED]> > Subject: Re: [HACKERS] Indexing for geographic objects? > > > just a note here ... recently, we had a client with similar problems with > using index scan, where turning off seqscan did the trick ... we took his > tables, loaded them into a v7.1beta1 server and it correctly comes up with > the index scan ... > > Oleg, have you tried this with v7.1 yet? Not yet. Just a plain 7.0.3 release. Will play with 7.1beta. But we're working in real life and need things to work in production :-) regards, Oleg > > On Fri, 8 Dec 2000, Tom Lane wrote: > > > Oleg Bartunov <[EMAIL PROTECTED]> writes: > > > We've done some work with GiST indices and found a little problem > > > with optimizer. > > > > > test=# set enable_seqscan = off; > > > SET VARIABLE > > > test=# explain select * from test where s @ '1.05 .. 3.95'; > > > NOTICE: QUERY PLAN: > > > > > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > > > > > EXPLAIN > > > % ./bench.pl -d test -b 100 -i > > > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs > > > > I'd venture that the major problem here is bogus estimated selectivities > > for rtree/gist operators. Note the discrepancy between the estimated > > row count and the actual (I assume the "found 18 docs" is the true > > number of rows output by the query). With an estimated row count even > > half that (ie, merely two orders of magnitude away from reality ;-)) > > the thing would've correctly chosen the index scan over sequential. > > > > 5000 looks like a suspiciously round number ... how many rows are in > > the table? Have you done a vacuum analyze on it? > > > > regards, tom lane > > > > Marc G. 
Fournier ICQ#7615664 IRC Nick: Scrappy > Systems Administrator @ hub.org > primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org > _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
[GENERAL] Re: [HACKERS] Trip to Japan
> Bruce, > > what was the camera ? No idea. It was not mine. I brought a video camera, and made 30 minutes of video for my family and company. I don't know how to make an MPEG of that. My wife wants a digital camera now, so it looks like I will have one soon. :-) -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026
Re: CRC was: Re: [HACKERS] beta testing version
Date: Fri, 8 Dec 2000 12:19:39 -0600 From: Bruce Guenter <[EMAIL PROTECTED]> Incidentally, I benchmarked the previously mentioned 64-bit fingerprint, the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my Celeron and on a PIII was MD5. The 64-bit fingerprint was only a hair slower, the CRC was (quite surprisingly) about 40% slower, and the implementation of SHA that I had available was a real dog. Taking an arbitrary 32 bits of a MD5 would likely be less collision prone than using a 32-bit CRC, and it appears faster as well. I just want to confirm that you used something like the fast 32-bit CRC algorithm, appended. The one posted earlier was accurate but slow. Ian /* * Copyright (C) 1986 Gary S. Brown. You may use this program, or * code or tables extracted from it, as desired without restriction. */ /* Modified slightly by Ian Lance Taylor, [EMAIL PROTECTED], for use with Taylor UUCP. */ #include "uucp.h" #include "prot.h" /* First, the polynomial itself and its table of feedback terms. The */ /* polynomial is */ /* X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0 */ /* Note that we take it "backwards" and put the highest-order term in */ /* the lowest-order bit. The X^32 term is "implied"; the LSB is the */ /* X^31 term, etc. The X^0 term (usually shown as "+1") results in*/ /* the MSB being 1.*/ /* Note that the usual hardware shift register implementation, which */ /* is what we're using (we're merely optimizing it by doing eight-bit */ /* chunks at a time) shifts bits into the lowest-order term. In our */ /* implementation, that means shifting towards the right. Why do we */ /* do it this way? Because the calculated CRC must be transmitted in */ /* order from highest-order term to lowest-order term. UARTs transmit */ /* characters in order from LSB to MSB. 
By storing the CRC this way, */ /* we hand it to the UART in the order low-byte to high-byte; the UART */ /* sends each low-bit to high-bit; and the result is transmission bit */ /* by bit from highest- to lowest-order term without requiring any bit */ /* shuffling on our part. Reception works similarly. */ /* The feedback terms table consists of 256, 32-bit entries. Notes: */ /* */ /* The table can be generated at runtime if desired; code to do so */ /* is shown later. It might not be obvious, but the feedback */ /* terms simply represent the results of eight shift/xor opera-*/ /* tions for all combinations of data and CRC register values. */ /* [this code is no longer present--ian] */ /* */ /* The values must be right-shifted by eight bits by the "updcrc" */ /* logic; the shift must be unsigned (bring in zeroes). On some */ /* hardware you could probably optimize the shift in assembler by */ /* using byte-swap instructions. */ static const unsigned long aicrc32tab[] = { /* CRC polynomial 0xedb88320 */ 0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L, 0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L, 0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L, 0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL, 0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L, 0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L, 0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L, 0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL, 0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L, 0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL, 0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L, 0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L, 0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L, 0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL, 0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL, 0xe10e9818L, 
0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L, 0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL, 0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L, 0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L, 0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L, 0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL, 0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L, 0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L, 0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL, 0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L, 0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb
Re: CRC was: Re: [HACKERS] beta testing version
On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote: > 1. Computing a CRC-64 takes only about twice as long as a CRC-32, for >2^32 times the confidence. That's pretty cheap confidence. Incidentally, I benchmarked the previously mentioned 64-bit fingerprint, the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my Celeron and on a PIII was MD5. The 64-bit fingerprint was only a hair slower, the CRC was (quite surprisingly) about 40% slower, and the implementation of SHA that I had available was a real dog. Taking an arbitrary 32 bits of a MD5 would likely be less collision prone than using a 32-bit CRC, and it appears faster as well. -- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
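For anyone wanting to repeat this kind of comparison, a rough stdlib-only harness might look like the following (the relative rankings will vary with CPU and implementation, as Bruce's surprising CRC result already shows):

```python
import hashlib
import time
import zlib

def throughput_mb_s(func, data, reps=20):
    """Return approximate MB/s for applying `func` to `data` reps times."""
    start = time.perf_counter()
    for _ in range(reps):
        func(data)
    elapsed = time.perf_counter() - start
    return len(data) * reps / elapsed / 1e6

data = b"x" * 1_000_000  # 1 MB of input per pass
md5_speed = throughput_mb_s(lambda d: hashlib.md5(d).digest(), data)
sha_speed = throughput_mb_s(lambda d: hashlib.sha1(d).digest(), data)
crc_speed = throughput_mb_s(zlib.crc32, data)
```

Which algorithm wins depends heavily on how tight the CRC inner loop is, which is exactly why Ian asks below whether the unrolled table-driven version was used.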
[HACKERS] Re: Threads
Hi all. A small intrusion into the threads discussion. 1) Don't forget other OSes... Linux is not the only one (for now :-) ). For example, check the performance under Solaris: http://www2.linuxjournal.com/lj-issues/issue70/3184.html 2) Unfortunately, some platforms have no threads (but those are few) or an incompatible threads library (the majority...). In the Informix docs I read: To most effectively utilize system resources, a configurable pool of database server processes called virtual processors schedule and manage user requests across multiple CPUs and disks. User requests are represented by lightweight mechanisms called threads. Each thread consists of a single sequential flow of control that represents part of a discrete task within a database server process. For example, a processing-intensive request such as a multi-table join can be divided into multiple database threads (subtasks) and spread across all the available virtual processors in the system. This could be interesting: one process per client connection, but composed of many threads (similar to Apache 2.0) - a crash does not take down the whole system - better performance in SQL execution (I hope). I think this will be the way of the future (PostgreSQL 8.0?); I know it is not simple.
Re: [HACKERS] RFC: CRC datatype
> I think a row-level CRC is rather pointless. Perhaps it'd be a good > idea to have a disk-page-level CRC, though. That would check the rows > on the page *and* allow catching errors in the page and tuple overhead > structures, which row-level CRCs would not cover. Row level is necessary to be able to check integrity at the application level. > I suspect TOAST breaks your notion of computing a CRC at trigger time > anyway --- some of the fields may be toasted already, some not. The workaround is a logging table where you store the CRCs as well. Later on, an "integrity daemon" can compare whether they match or not. > If you're sufficiently paranoid that you insist you need a row-level > CRC, it seems to me that you'd want to generate it and later check it > in your application, not in the database. That's the only way you get Oh, sure, that is the way we do it now. And no, nothing to do with paranoia. We were previously badly burnt by the assumption that a decent SQL server is a "guarantee" for data integrity. Shit simply happens. > end-to-end coverage. Surely you don't trust your TCP connection to the > server, either? TCP _IS_ heavily checksummed. But yes, we _do_ calculate checksums at the client, recalculate at the server, and compare after the transaction is completed. As we have only a few writes between heavy read accesses, the performance penalty of doing this (for our purposes) is minimal. Horst
[HACKERS] OSDN database summit
Here is a late report on the OSDN database summit. It was great to meet so many PostgreSQL users, and to meet the major developers of MySQL, Interbase, and Sleepycat. We clearly have many of the same hopes and concerns for open-source databases. PostgreSQL had half of all attendees. Seems we must have done a good job publicising it. There is hope to do this again next year. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026
Re: [HACKERS] Indexing for geographic objects?
On Fri, 8 Dec 2000, Tom Lane wrote: > Date: Fri, 08 Dec 2000 10:47:37 -0500 > From: Tom Lane <[EMAIL PROTECTED]> > To: Oleg Bartunov <[EMAIL PROTECTED]> > Cc: [EMAIL PROTECTED], 'pgsql-hackers ' <[EMAIL PROTECTED]> > Subject: Re: [HACKERS] Indexing for geographic objects? > > Oleg Bartunov <[EMAIL PROTECTED]> writes: > > We've done some work with GiST indices and found a little problem > > with optimizer. > > > test=# set enable_seqscan = off; > > SET VARIABLE > > test=# explain select * from test where s @ '1.05 .. 3.95'; > > NOTICE: QUERY PLAN: > > > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > > > EXPLAIN > > % ./bench.pl -d test -b 100 -i > > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs > > I'd venture that the major problem here is bogus estimated selectivities > for rtree/gist operators. Note the discrepancy between the estimated > row count and the actual (I assume the "found 18 docs" is the true > number of rows output by the query). With an estimated row count even yes, 18 docs is the true number > half that (ie, merely two orders of magnitude away from reality ;-)) > the thing would've correctly chosen the index scan over sequential. > > 5000 looks like a suspiciously round number ... how many rows are in > the table? Have you done a vacuum analyze on it? park-lane:~/app/pgsql/gist_problem$ wc SQL 10009 10049 157987 SQL about 10,000 rows, relevant part of script is: .skipped... 1.9039...3.5139 1.8716...3.9317 \. CREATE INDEX test_seg_ix ON test USING gist (s); vacuum analyze; ^^ explain select * from test where s @ '1.05 .. 3.95'; set enable_seqscan = off; explain select * from test where s @ '1.05 .. 3.95'; Regards, Oleg > > regards, tom lane > _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
[GENERAL] Trip to Japan
I have just returned from a seven-day trip to Japan. I spoke for seven hours to three separate groups, totalling 200 people. I spoke to a Linux Conference, a PostgreSQL user's group, and to SRA, a PostgreSQL support company. You can get more information on my home page under "Writings". Here are some pictures from Japan: http://www.sra.co.jp/people/t-ishii/Bruce/ http://ebony.rieb.kobe-u.ac.jp/~yasuda/BRUCE/ PostgreSQL is more popular in Japan than in the United States. Japan has user's groups in many cities and has had a commercial support company (SRA) for two years. MySQL is not popular in Japan. It was great to meet so many nice people. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026
Re: [HACKERS] RFC: CRC datatype
"Horst Herb" <[EMAIL PROTECTED]> writes: > AFAIK the thread for "built in" crcs referred only to CRCs in the > transaction log. This here is a different thing. CRCs in the transaction log > are crucial to proof integrity of the log, CRCs as datatype are neccessary > to proof integrity of database entries at row level. I think a row-level CRC is rather pointless. Perhaps it'd be a good idea to have a disk-page-level CRC, though. That would check the rows on the page *and* allow catching errors in the page and tuple overhead structures, which row-level CRCs would not cover. I suspect TOAST breaks your notion of computing a CRC at trigger time anyway --- some of the fields may be toasted already, some not. If you're sufficiently paranoid that you insist you need a row-level CRC, it seems to me that you'd want to generate it and later check it in your application, not in the database. That's the only way you get end-to-end coverage. Surely you don't trust your TCP connection to the server, either? regards, tom lane
Re: [HACKERS] Indexing for geographic objects?
just a note here ... recently, we had a client with similar problems with using index scan, where turning off seqscan did the trick ... we took his tables, loaded them into a v7.1beta1 server and it correctly comes up with the index scan ... Oleg, have you tried this with v7.1 yet? On Fri, 8 Dec 2000, Tom Lane wrote: > Oleg Bartunov <[EMAIL PROTECTED]> writes: > > We've done some work with GiST indices and found a little problem > > with optimizer. > > > test=# set enable_seqscan = off; > > SET VARIABLE > > test=# explain select * from test where s @ '1.05 .. 3.95'; > > NOTICE: QUERY PLAN: > > > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > > > EXPLAIN > > % ./bench.pl -d test -b 100 -i > > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs > > I'd venture that the major problem here is bogus estimated selectivities > for rtree/gist operators. Note the discrepancy between the estimated > row count and the actual (I assume the "found 18 docs" is the true > number of rows output by the query). With an estimated row count even > half that (ie, merely two orders of magnitude away from reality ;-)) > the thing would've correctly chosen the index scan over sequential. > > 5000 looks like a suspiciously round number ... how many rows are in > the table? Have you done a vacuum analyze on it? > > regards, tom lane > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
[HACKERS] OK, does anyone have any better ideas?
I have a working version of a text search engine. I want to make it work for Postgres (I will be releasing it GPL). It can literally find the occurrence of a string of words within 5 million records in a few milliseconds. It is very fast; it works similarly to many web search engines. I have tried many approaches to integrate the search system with Postgres, but I can't find any method that isn't too slow or too cumbersome. The best I have been able to come up with is this: create function textsearch(varchar) returns integer as ' DECLARE handle integer; count integer; pos integer; BEGIN handle = search_exec( \'localhost\', $1); count = search_count(handle); for pos in 0 .. count-1 loop insert into search_result(key, rank) values (search_key(handle,pos), search_rank(handle,pos)); end loop; return search_done(handle); END; ' language 'plpgsql'; And this is used as: create temp table search_result (key integer, rank integer); select textsearch('bla bla'); select field from table where field_key = search_result.key order by search_result.rank ; drop table search_result ; The problems with this are: I can't seem to create a table in plpgsql (I read about a patch, but have to find out what version it is in), so I have to create the table outside the function. And I can only execute one text search, because I can't seem to use the name of a table passed into the plpgsql environment, which would allow multiple searches to be joined. As: select textsearch(temp_tbl1, 'bla bla'); select textsearch(temp_tbl2, 'foo bar'); select field from table1, table2 where table1.field_key = temp_tbl1.key and table2.field_key = temp_tbl2.key; This could be so sweet, but, right now, it is just a disaster and I am pulling my hair out. Does anyone have any suggestions or tricks that could make this easier/faster, or is Postgres just unable to do this sort of thing? -- http://www.mohawksoft.com
Re: [HACKERS] Indexing for geographic objects?
Oleg Bartunov <[EMAIL PROTECTED]> writes: > We've done some work with GiST indices and found a little problem > with optimizer. > test=# set enable_seqscan = off; > SET VARIABLE > test=# explain select * from test where s @ '1.05 .. 3.95'; > NOTICE: QUERY PLAN: > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > EXPLAIN > % ./bench.pl -d test -b 100 -i > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs I'd venture that the major problem here is bogus estimated selectivities for rtree/gist operators. Note the discrepancy between the estimated row count and the actual (I assume the "found 18 docs" is the true number of rows output by the query). With an estimated row count even half that (ie, merely two orders of magnitude away from reality ;-)) the thing would've correctly chosen the index scan over sequential. 5000 looks like a suspiciously round number ... how many rows are in the table? Have you done a vacuum analyze on it? regards, tom lane
Re: [HACKERS] Indexing for geographic objects?
Hi, We've done some work with GiST indices and found a little problem with the optimizer. The problem can be reproduced with Gene's code (link is in the original message below). I could send the test data and SQL - it's just a 52Kb gzipped file. What is the reason for the optimizer to decide that a sequential scan is better (see the numbers below)? Explicitly disabling seq scan gave much better timings. Regards, Oleg test=# explain select * from test where s @ '1.05 .. 3.95'; NOTICE: QUERY PLAN: Seq Scan on test (cost=0.00..184.01 rows=5000 width=12) EXPLAIN % ./bench.pl -d test -b 100 total: 3.19 sec; number: 100; for one: 0.032 sec; found 18 docs test=# set enable_seqscan = off; SET VARIABLE test=# explain select * from test where s @ '1.05 .. 3.95'; NOTICE: QUERY PLAN: Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) EXPLAIN % ./bench.pl -d test -b 100 -i total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs On Mon, 27 Nov 2000 [EMAIL PROTECTED] wrote: > Date: Mon, 27 Nov 2000 12:36:42 -0600 > From: [EMAIL PROTECTED] > To: Tom Lane <[EMAIL PROTECTED]> > Cc: 'pgsql-general ' <[EMAIL PROTECTED]>, > 'pgsql-hackers ' <[EMAIL PROTECTED]> > Subject: Re: [HACKERS] Indexing for geographic objects? > > Tom Lane wrote: > > Michael Ansley <[EMAIL PROTECTED]> writes: > > > Remember also that the GiST library has been integrated into PG, (my brother > > > is doing some thesis work on that at the moment), > > > > Yeah? Does it still work? > > You bet. One would otherwise be hearing from me. I depend on it quite > heavily and I am checking with almost every release. I am now current > with 7.0.2 -- this time it required some change, although not in the c > code. And that's pretty amazing: I was only screwed once since > postgres95 -- by a beta version I don't remember now; then I > complained and the problem was fixed. I don't even know whom I owe > thanks for that. 
> > > Since the GIST code is not tested by any standard regress test, and is > > so poorly documented that hardly anyone can be using it, > I've always > > assumed that it is probably suffering from a severe case of bit-rot. > > > > I'd love to see someone contribute documentation and regression test > > cases for it --- it's a great feature, if it works. > > The bit rot fortunately did not happen, but the documentation I > promised Bruce many months ago is still "in the works" -- meaning, > something interfered and I haven't had a chance to start. Like a > friend of mine muses all the time, "Promise doesn't mean > marriage". Boy, do I feel guilty. > > It's a bit better with the testing. I am not sure how to test the > GiST directly, but I have adapted the current version of regression > tests for the data types that depend on it. One can find them in my > contrib directory, under test/ (again, it's > http://wit.mcs.anl.gov/~selkovjr/pg_extensions/contrib.tgz) > > It would be nice if at least one of the GiST types became a built-in > (that would provide for a more intensive testing), but I can also > think of the contrib code being (optionally) included into the main > build and regression test trees. The top-level makefile can have a > couple of special targets to build and test the contribs. I believe my > version of the tests can be a useful example to other contributors > whose code is already in the source tree. > > --Gene > _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Re: [HACKERS] pre-beta is slow
"Mikheev, Vadim" wrote:
>
> > recently I have downloaded a pre-beta postgresql, I found
> > insert and update speed is slower than 7.0.3,
> > even if I turn off the sync flag, it is still slower than 7.0, why?

How much slower do you see it to be?

> > how can I make it faster?
>
> Try to compare 7.0.3 & 7.1beta in a multi-user environment.

As I understand it, you claim it to be faster in a multi-user
environment?

Could you give some brief technical background on why that is so, and
why it must make single-user use slower?

---
Hannu
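[Editor's note: the usual answer to the question above is group commit. With WAL, a single fsync of the log can flush the commit records of every transaction waiting at that moment, so the fsync cost is shared among concurrent backends; a lone backend still pays a full fsync per commit. The toy model below is an editorial illustration of that amortization only, not measured PostgreSQL behavior.]

```python
def fsyncs_per_txn(concurrent_committers: int) -> float:
    """Idealized cost model: if `concurrent_committers` transactions have
    their commit records in the WAL buffer when the log is flushed, one
    fsync commits them all, so each transaction bears 1/N of an fsync."""
    if concurrent_committers < 1:
        raise ValueError("need at least one committer")
    return 1.0 / concurrent_committers

# Single user: every commit waits for its own fsync -- worst case.
assert fsyncs_per_txn(1) == 1.0
# 50 concurrent users: the fsync cost is amortized across the group.
assert fsyncs_per_txn(50) == 0.02
```

Under this (admittedly simplistic) model, WAL with fsync can beat 7.0-style per-relation fsyncs in a multi-user run while still trailing a no-fsync single-user run, which matches the benchmark shape reported earlier in the thread.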
Re: [HACKERS] RFC: CRC datatype
> I suspect that you are really looking at the problem from the wrong end.
> CRC checking should not need to be done by the database user, with a fancy
> type. The postgres server itself should guarantee data integrity - you
> shouldn't have to worry about it in userland.

I agree in principle. However, performance is sometimes more important
than integrity. Think of a data logger of uncritical data, or an online
forum. There are plenty of occasions where you don't have to worry about
a single bit being on or off, but a lot to worry about regarding
performance. Look at all those people using M$ Access or MySQL who don't
give a damn about data integrity.

As opposed to them, there will always be other "typical" database
applications where 100% integrity is paramount. Then it is nice to have
a choice of CRCs, where the database designer can choose according to
his/her specific performance/integrity trade-off. This is why I would
prefer the "datatype" solution.

> This is, in fact, what the recent discussion on this list has been
> proposing...

AFAIK the thread about "built in" CRCs referred only to CRCs in the
transaction log. This here is a different thing. CRCs in the transaction
log are crucial for proving the integrity of the log; CRCs as a datatype
are necessary for proving the integrity of database entries at row
level. Always remember that a postgres database on the hard disk can be
manipulated accidentally / maliciously without postgres even running.
These are the cases where you need row-level CRCs.

Horst
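[Editor's note: the row-level check Horst describes can be sketched outside the database. In this Python illustration, `zlib.crc32` stands in for whichever CRC32/CRC64 implementation the proposed datatype would use, and the `row_crc` helper and its serialization scheme are made up for the example.]

```python
import zlib

def row_crc(row: dict) -> int:
    """CRC32 over a canonical (key-sorted) serialization of a row's values."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row)).encode()
    return zlib.crc32(payload) & 0xFFFFFFFF

row = {"id": 42, "body": "uncritical log entry"}
stored_crc = row_crc(row)  # what the CRC column would hold

# Later -- e.g. after the files were touched while postgres was NOT
# running -- recompute and compare to detect row-level tampering.
tampered = dict(row, body="uncritical log entrY")
assert row_crc(row) == stored_crc       # intact row still verifies
assert row_crc(tampered) != stored_crc  # a single-character change is caught
```

The point of making this a datatype rather than server-internal machinery is exactly the trade-off Horst argues: tables that don't need the check pay nothing, while the designer can attach a stronger or weaker checksum where integrity matters.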
Re: [HACKERS] Re: COPY BINARY file format proposal
At 02:31 8/12/00 -0500, Tom Lane wrote:
>
> How about:
>
> Signature: 12-byte sequence "PGBCOPY\n\377\r\n\0" (detects newline
> replacements, dropped nulls, dropped high bits, parity changes);
>
> Integer layout field: int32 constant 0x01020304 in source's byte order.

How about a CRC? ;-P

----------------------------------------------------------------
Philip Warner | Albatross Consulting Pty. Ltd. (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81 | Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
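[Editor's note: each byte of the quoted signature proposal earns its keep. This Python sketch, purely an editorial illustration, mechanically checks the failure modes Tom's text lists.]

```python
import struct

SIG = b"PGBCOPY\n\xff\r\n\x00"  # the proposed 12-byte signature
assert len(SIG) == 12

# Unix -> DOS newline translation rewrites \n as \r\n and mangles the signature.
assert SIG.replace(b"\n", b"\r\n") != SIG
# DOS -> Unix translation collapses \r\n to \n.
assert SIG.replace(b"\r\n", b"\n") != SIG
# A channel that silently drops NUL bytes loses the trailing \x00.
assert SIG.replace(b"\x00", b"") != SIG
# A 7-bit (high-bit-stripping) path clears the top bit of \377.
assert bytes(b & 0x7F for b in SIG) != SIG

# The integer layout field lets a reader detect the writer's byte order:
# 0x01020304 arrives as 01 02 03 04 from a big-endian writer and as
# 04 03 02 01 from a little-endian one.
assert struct.pack(">i", 0x01020304) == bytes([1, 2, 3, 4])
assert struct.pack("<i", 0x01020304) == bytes([4, 3, 2, 1])
```

A CRC would subsume several of these checks, as Philip quips, but the fixed signature catches common transfer mangling without requiring the reader to scan the whole file first.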
RE: [HACKERS] RFC: CRC datatype
> Therefore, I propose defining new data types like "CRC32", "CRC64",
> "RIPEMD", whatever (rather than pluggable arbitrary CRCs).

I suspect that you are really looking at the problem from the wrong end.
CRC checking should not need to be done by the database user, with a
fancy type. The postgres server itself should guarantee data integrity -
you shouldn't have to worry about it in userland.

This is, in fact, what the recent discussion on this list has been
proposing...

Chris