Re: [PATCHES] strpos() && KMP

2007-08-10 Thread Tom Lane
Pavel Ajtkulov <[EMAIL PROTECTED]> writes:
> Tom Lane writes:
>> Moreover, you'd lose the guarantee of not-worse-than-linear time,
>> because hash lookup can be pathologically bad if you get a lot of hash
>> collisions.

> compute max_wchar, min_wchar. If (d = max_wchar - min_wchar) < k (for
> example, k = 1000), then we use index table (wchar -> wchar -
> min_wchar). Else we use a hash table. The number of collisions would be
> small, because the hash table is needed for pattern characters only.

I think you missed my point: there's a significant difference between
"guaranteed good performance" and "probabilistically good performance".
Even when the probably-good algorithm wins for typical cases, there's a
strong argument to be made for guarantees.  The problem you set out to
solve really is that an algorithm that's all right in everyday cases
will suck in certain uncommon cases --- so why do you want to fix it
by just moving around the cases in which it fails to do well?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PATCHES] final CSVlog patch

2007-08-10 Thread Andrew Dunstan


I think this is ready to be committed now. It's been a long and tiresome
road ;-)


Last-minute comments welcome.

cheers

andrew
Index: doc/src/sgml/config.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.136
diff -c -r1.136 config.sgml
*** doc/src/sgml/config.sgml	4 Aug 2007 01:26:53 -	1.136
--- doc/src/sgml/config.sgml	11 Aug 2007 02:00:58 -
***
*** 2253,2259 
 
  PostgreSQL supports several methods
   for logging server messages, including
!  stderr and
   syslog. On Windows, 
   eventlog is also supported. Set this
   parameter to a list of desired log destinations separated by
--- 2253,2259 
 
  PostgreSQL supports several methods
   for logging server messages, including
!  stderr, csvlog and
   syslog. On Windows, 
   eventlog is also supported. Set this
   parameter to a list of desired log destinations separated by
***
*** 2262,2278 
   This parameter can only be set in the postgresql.conf
   file or on the server command line.
 

   
  
!  
!   redirect_stderr (boolean)

!redirect_stderr configuration parameter


 
!  This parameter allows messages sent to stderr to be
   captured and redirected into log files.
   This method, in combination with logging to stderr,
   is often more useful than
--- 2262,2285 
   This parameter can only be set in the postgresql.conf
   file or on the server command line.
 
+ If log_destination is set to csvlog, 
+  the log is output as comma separated values. The format is:
+  timestamp with milliseconds, username, database name, session id, host:port number,
+  process id, per process line number, command tag, session start time, transaction id, 
+  error severity, SQL state code, statement/error message. 
+

   
  
!  
!   start_log_collector (boolean)

!start_log_collector configuration parameter


 
!  This parameter allows messages sent to stderr,
! 		 and CSV logs, to be
   captured and redirected into log files.
   This method, in combination with logging to stderr,
   is often more useful than
***
*** 2280,2285 
--- 2287,2293 
   might not appear in syslog output (a common example
   is dynamic-linker failure messages).
   This parameter can only be set at server start.
+ 		 It is required to be on if CSV logs are to be generated.
 

   
***
*** 2291,2298 


 
! When redirect_stderr is enabled, this parameter
! determines the directory in which log files will be created.
  It can be specified as an absolute path, or relative to the
  cluster data directory.
  This parameter can only be set in the postgresql.conf
--- 2299,2306 


 
! When start_log_collector is enabled, 
! this parameter determines the directory in which log files will be created.
  It can be specified as an absolute path, or relative to the
  cluster data directory.
  This parameter can only be set in the postgresql.conf
***
*** 2308,2315 


 
! When redirect_stderr is enabled, this parameter
! sets the file names of the created log files.  The value
  is treated as a strftime pattern,
  so %-escapes can be used to specify time-varying
  file names.  (Note that if there are
--- 2316,2323 


 
! When start_log_collector is enabled,
! this parameter sets the file names of the created log files.  The value
  is treated as a strftime pattern,
  so %-escapes can be used to specify time-varying
  file names.  (Note that if there are
***
*** 2324,2329 
--- 2332,2344 
  This parameter can only be set in the postgresql.conf
  file or on the server command line.
 
+
+ If log_destination is set to csvlog,
+ .csv will be appended to the file name, to obtain the logfile
+ 	name for CSV logs, unless the filename ends with .log, 
+ 	in which case the suffix is just overwritten. In the case of the example above, the 
+ file name will be server_log.1093827753.csv
+

   
  
***
*** 2334,2341 


 
! When redirect_stderr is enabled, this parameter
! determines the maximum lifetime of an individual log file.
  After this many minutes have elapsed, a new log file will

Re: [PATCHES] strpos() && KMP

2007-08-10 Thread Pavel Ajtkulov
Tom Lane writes:

>> hash table?

> I'd think the cost of hashing would render it impractical.  Most of the
> papers I've seen on this topic worry about getting single instructions
> out of the search loop --- a hash lookup will cost lots more than that.
> Moreover, you'd lose the guarantee of not-worse-than-linear time,
> because hash lookup can be pathologically bad if you get a lot of hash
> collisions.

Compute max_wchar and min_wchar. If (d = max_wchar - min_wchar) < k (for
example, k = 1000), then we use an index table (wchar -> wchar -
min_wchar). Otherwise we use a hash table. The number of collisions would be
small, because the hash table is needed for the pattern characters only. The
characters are located serially, with hash function = wchar % const.
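
A minimal sketch in C of the table selection described above, assuming the table feeds a Horspool-style bad-character shift (the message does not say which Boyer-Moore variant is meant). The names SkipTable, skip_build, skip_lookup, wsearch and RANGE_LIMIT are hypothetical and are not PostgreSQL code. Horspool is not worst-case linear, and the probing hash can degrade when many pattern characters collide, which is exactly the objection raised in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RANGE_LIMIT 1000        /* the "k" from the message; illustrative value */

typedef struct SkipTable
{
    uint32_t  min_ch;           /* smallest character in the pattern */
    uint32_t  range;            /* size of the direct table, 0 if unused */
    size_t    plen;             /* pattern length, also the default shift */
    size_t   *direct;           /* direct[c - min_ch] = shift */
    size_t    nslots;           /* hash fallback: open addressing */
    uint32_t *keys;
    size_t   *shifts;           /* 0 marks an empty slot; real shifts are >= 1 */
} SkipTable;

/* Build the shift table for pat[0..m-1], m >= 1. Returns 0 on success. */
static int
skip_build(SkipTable *t, const uint32_t *pat, size_t m)
{
    uint32_t min_ch = pat[0], max_ch = pat[0];
    size_t   i;

    for (i = 1; i < m; i++)
    {
        if (pat[i] < min_ch) min_ch = pat[i];
        if (pat[i] > max_ch) max_ch = pat[i];
    }
    t->min_ch = min_ch;
    t->plen = m;
    t->range = 0;
    t->nslots = 0;
    t->direct = NULL;
    t->keys = NULL;
    t->shifts = NULL;

    if (max_ch - min_ch < RANGE_LIMIT)
    {
        /* Narrow range: index directly by (wchar - min_wchar). */
        t->range = max_ch - min_ch + 1;
        t->direct = malloc(t->range * sizeof(size_t));
        if (t->direct == NULL)
            return -1;
        for (i = 0; i < t->range; i++)
            t->direct[i] = m;
        for (i = 0; i + 1 < m; i++)
            t->direct[pat[i] - min_ch] = m - 1 - i;
        return 0;
    }

    /* Wide range: hash on wchar % nslots with linear probing. */
    t->nslots = 2 * m + 1;      /* load factor stays below 1/2 */
    t->keys = calloc(t->nslots, sizeof(uint32_t));
    t->shifts = calloc(t->nslots, sizeof(size_t));
    if (t->keys == NULL || t->shifts == NULL)
    {
        free(t->keys);
        free(t->shifts);
        t->keys = NULL;
        t->shifts = NULL;
        return -1;
    }
    for (i = 0; i + 1 < m; i++)
    {
        size_t s = pat[i] % t->nslots;

        while (t->shifts[s] != 0 && t->keys[s] != pat[i])
            s = (s + 1) % t->nslots;
        t->keys[s] = pat[i];
        t->shifts[s] = m - 1 - i;   /* later occurrences overwrite earlier ones */
    }
    return 0;
}

/* Shift to apply when character c is aligned with the pattern's last position. */
static size_t
skip_lookup(const SkipTable *t, uint32_t c)
{
    size_t s;

    if (t->direct != NULL)
    {
        /* Unsigned subtraction wraps when c < min_ch, so one compare suffices. */
        uint32_t off = c - t->min_ch;

        return off < t->range ? t->direct[off] : t->plen;
    }
    for (s = c % t->nslots; t->shifts[s] != 0; s = (s + 1) % t->nslots)
        if (t->keys[s] == c)
            return t->shifts[s];
    return t->plen;             /* character does not occur in the pattern */
}

/* Returns the 0-based offset of the first match, or -1 if there is none. */
static long
wsearch(const uint32_t *text, size_t n, const uint32_t *pat, size_t m)
{
    SkipTable t;
    size_t    pos = 0;
    long      found = -1;

    if (m == 0)
        return 0;
    if (m > n || skip_build(&t, pat, m) != 0)
        return -1;
    while (pos + m <= n)
    {
        size_t j = m;

        while (j > 0 && text[pos + j - 1] == pat[j - 1])
            j--;
        if (j == 0)
        {
            found = (long) pos;
            break;
        }
        pos += skip_lookup(&t, text[pos + m - 1]);
    }
    free(t.direct);
    free(t.keys);
    free(t.shifts);
    return found;
}
```

Calling wsearch(text, n, pat, m) with a pattern whose characters span fewer than RANGE_LIMIT code points takes the direct-index path; a pattern mixing, say, code points 1 and 100000 takes the hash path. The direct path costs one subtraction and one compare per lookup, while the hash path adds a modulo and a probe loop inside the search loop.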

>> The main difficulty with BM is verification and understanding "good
>> suffix shift" (the second part of BM) (I don't understand it entirely).

> Yeah, there seem to be a bunch of variants of BM (many of them not
> guaranteed linear, which I'm sure we don't want) and the earliest
> papers had bugs.  But a good implementation would be a lot easier
> sell because it would show benefits for a much wider set of use-cases
> than KMP.

Is there a requirement for some string matching algorithms/data
structures (suffix array/tree) in PG? Or is it a case of "We've
had no complaints about the speed of those functions"?



Ajtkulov Pavel
[EMAIL PROTECTED]





Re: [PATCHES] Reduce the size of PageFreeSpaceInfo on 64bit platform

2007-08-10 Thread Decibel!
On Fri, Aug 10, 2007 at 10:32:35AM +0900, ITAGAKI Takahiro wrote:
> Here is a patch to reduce the size of PageFreeSpaceInfo on 64bit platforms.
> We will use maintenance_work_mem twice as efficiently with the patch.
> 
> The sizeof(PageFreeSpaceInfo) is 16 bytes there because the type of 'avail'
> is 'Size', which is typically 8 bytes and must be aligned on an 8-byte boundary.
> I changed the type of the field to uint32. We could store the free space in
> a uint16 at the smallest, but alignment padding would throw the saving away.

So... does that mean that the comment in the config file about 6 bytes
per page is incorrect?
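
The size arithmetic in the quoted explanation can be checked with a small C sketch. The three structs below are hypothetical stand-ins, assuming the real struct pairs a 32-bit block number with the 'avail' field; the field name blkno and the struct names are illustrative and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 'avail' as a Size-like type: 8 bytes and 8-byte aligned on typical 64-bit ABIs */
typedef struct FreeSpaceWide
{
    uint32_t blkno;             /* assumed 32-bit block number */
    size_t   avail;
} FreeSpaceWide;

/* 'avail' narrowed to uint32, as the patch description says */
typedef struct FreeSpaceU32
{
    uint32_t blkno;
    uint32_t avail;
} FreeSpaceU32;

/* 'avail' narrowed to uint16: the struct is still padded to blkno's alignment */
typedef struct FreeSpaceU16
{
    uint32_t blkno;
    uint16_t avail;
} FreeSpaceU16;

/* Print the sizes the compiler actually chose on this platform. */
static void
report_sizes(void)
{
    printf("wide:   %zu bytes\n", sizeof(FreeSpaceWide));
    printf("uint32: %zu bytes\n", sizeof(FreeSpaceU32));
    printf("uint16: %zu bytes\n", sizeof(FreeSpaceU16));
}
```

On a typical LP64 platform the wide struct should come out at 16 bytes (4 bytes of padding after blkno) and both narrowed structs at 8 bytes, so going from uint32 to uint16 saves nothing per entry. Exact sizes depend on the ABI, so report_sizes() is the thing to trust on any given machine.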
-- 
Decibel!, aka Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

