Re: [HACKERS] 8.2 Beta 3 Now Available for Download / Testing ...

2006-11-08 Thread Marc G. Fournier
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


Done ...

- --On Wednesday, November 08, 2006 21:31:38 -0500 Robert Treat 
<[EMAIL PROTECTED]> wrote:

> On Tuesday 07 November 2006 13:25, Marc G. Fournier wrote:
>> As a result of there being two *known* outstanding bugs, we have just
>> bundled up a Beta3, to allow for testing of the recent patch concerning WAL
>> replay ...
>>
>> Please download the latest beta from:
>>
>>ftp.postgresql.org:/pub/source/v8.2beta
>>
>> While the mirrors themselves catch up over the next 24 hours ...
>>
>
> Hey Marc, this url breaks consistency with previous beta builds and with the
> windows beta binaries which all use an incrementing beta version number (ie.
> beta3 vs. beta).  I've adjusted the links on the website to match what's
> actually up there, but it might be a good idea to remove the beta2 directory
> from http://www.postgresql.org/ftp/source/ lest someone gets confused about
> where to look for the most recent beta.
>
> --
> Robert Treat
> Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFFUqVR4QvfyHIvDvMRAjyKAKCZ+TDox/aHs5iEcvC8UaVybYM5rACfcuRl
080Zda4k36bXOV8nrPxdxRA=
=CzW5
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] NULL in arrays

2006-11-08 Thread Robert Treat
On Monday 06 November 2006 20:16, Jim C. Nasby wrote:
> On Sun, Nov 05, 2006 at 09:53:08PM +0100, Martijn van Oosterhout wrote:
> > Note that the constructs:
> >
> > ARRAY['a',"NULL",'c']
> >
> > and
> >
> > '{a,"NULL",c}'
> >
> > are *completely* different. The first is a special array constructor
> > and all its parameters are normal SQL expressions, so you can reference
> > columns and use NULL directly without quotes. The latter is the string
> > value of the array, which is specially decoded. Thats why the latter
> > treats the double quotes differently.
>
> This could probably be expanded on in the docs... mentioning the
> difference in the NULLs section would be a good start. IE:
>
> To set an element of an array constant to NULL, write NULL  for the
> element value. (Any upper- or lower-case variant of NULL will do.) If
> you want an actual string value "NULL", you must put double quotes
> around it. Note that if you use the ARRAY construct you should just use
> a bareword NULL instead.

This is not terribly clearer :-)  Hopefully we can put examples along with it, 
since that helps out tremendously. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 8.2 Beta 3 Now Available for Download / Testing ...

2006-11-08 Thread Robert Treat
On Tuesday 07 November 2006 13:25, Marc G. Fournier wrote:
> As a result of there being two *known* outstanding bugs, we have just
> bundled up a Beta3, to allow for testing of the recent patch concerning WAL
> replay ...
>
> Please download the latest beta from:
>
>ftp.postgresql.org:/pub/source/v8.2beta
>
> While the mirrors themselves catch up over the next 24 hours ...
>

Hey Marc, this url breaks consistency with previous beta builds and with the 
windows beta binaries which all use an incrementing beta version number (ie. 
beta3 vs. beta).  I've adjusted the links on the website to match what's 
actually up there, but it might be a good idea to remove the beta2 directory 
from http://www.postgresql.org/ftp/source/ lest someone gets confused about 
where to look for the most recent beta.

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Error in from_char() for field 'D'?

2006-11-08 Thread Brendan Jurd

Hey hackers,

I was doing some work in backend/utils/adt/formatting.c, and found the
following:

case DCH_D:
INVALID_FOR_INTERVAL;
if (is_to_char)
{
sprintf(inout, "%d", tm->tm_wday + 1);
if (S_THth(suf))
str_numth(p_inout, inout, 
S_TH_TYPE(suf));
return strlen(p_inout);
}
else
{
sscanf(inout, "%1d", &tmfc->d);
return strspace_len(inout) + 1 + SKIP_THth(suf);
}

The tm_wday field is internally stored as an integer 0 - 6, with 0
being Sunday.  The 'D' formatting field, as per the documentation,
gives 1 - 7 with 1 being Sunday.  So to convert tm_wday to 'D' in
to_char(), you add one.  This works as expected.

However, in from_char(), the reverse is not true.  Looking at the code
snippet above, the digit is scanned straight into tmfc->d unaltered
(this value is later copied directly to tm->tm_wday circa line 3394).

Unless I'm missing something, when converting to text, 'D' yields 1-7,
but when converting back from text, 'D' expects 0-6.

It's not a major problem, since there's not really a use-case for
specifying dates for conversion with the 'D' field, but this behaviour
appears to be incorrect, or at the very least, incorrectly documented.

The fix should be trivial; subtract one from tmfc->d after the call to sscanf()

Regards,
BJ

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Oleg Bartunov

On Thu, 9 Nov 2006, Tatsuo Ishii wrote:


Porblem with Japanese is, it's an agglutinative language and we need
to separate each word from a sentence. So, I need to modify tsearch2
anyway (I know someone from Japan is working on this).

https://www.oss.ecl.ntt.co.jp/tsearch2j/index.html
That's it?


Yes. However I'm going to use different "word separation" library from
them and will make some tweaks.


BTW, can tsearch2 handle ~70k words in a document?


I don't see any problem.


Great. I have made a little trial and it seems tsearch2 works great
with GIN.


Tatsuo, ideallly, I'd like to have tsearch2 untouched, but with 
japanese parser(s) and dictionaries (program) available. This is how

tsearch2 was designed. If something prevent to do so, we should improve
tsearch2. This is important now, since we're going to build tsearch2 into
PostgreSQL core for 8.3.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Teodor Sigaev

Limitations

Sorry for noise - it's mentioned in README.tsearch2
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Teodor Sigaev

Is this documented somewhere? Also I noticed that tsearch2 treats ":"
as a special character. Are there any special characters? If so where
are they documented?


You can avoid confusions with special character by quoting:
# select '''wow:'''::tsvector;
 tsvector
--
 'wow:'
(1 row)

':' is separator of lexeme and its position information
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Teodor Sigaev

tsvector size should not be greater than 1Mb however.


Is this documented somewhere? Also I noticed that tsearch2 treats ":"
as a special character. Are there any special characters? If so where
are they documented?

http://www.sai.msu.su/~megera/wiki/Tsearch_V2_in_Brief
Limitations

13.1 2048 bytes for lexems
13.2 ts_vector has limit about 1Mb. Exact value depends on
quantity of position information. If there is no any position
information, then sum of length of lexem must be less than 1Mb,
otherwise, sum of length of and pos. info.
Positional information uses 2 bytes per each
position and 2 bytes per lexem with pos info. The number of
lexems is limited by 4^32, so in practice it's unlimited.
13.3 ts_query:
Number of entries (nodes, i.e sum of lexems and operation)
is limited: internal representation is in polish notation
and position of one operand is pointed by int2, so it's
rather soft limit.
In any case, low range of limit - 32768 nodes.
Notice: ts_query doesn't designed for storing in table and
is optimized for speed, not for size.
13.4 Positional information in ts_vector:
13.4.1 Value of position may not be greater than 2^14 (16384),
   any value greater than this limit will be replaced
   by 16383.
13.4.2 Only 256 positional info per lexem.


Some useful articles
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/custom-dict.html


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Tatsuo Ishii
> > Porblem with Japanese is, it's an agglutinative language and we need
> > to separate each word from a sentence. So, I need to modify tsearch2
> > anyway (I know someone from Japan is working on this).
> https://www.oss.ecl.ntt.co.jp/tsearch2j/index.html
> That's it?

Yes. However I'm going to use different "word separation" library from
them and will make some tweaks.

> > BTW, can tsearch2 handle ~70k words in a document?
> 
> I don't see any problem.

Great. I have made a little trial and it seems tsearch2 works great
with GIN.

> tsvector size should not be greater than 1Mb however.

Is this documented somewhere? Also I noticed that tsearch2 treats ":"
as a special character. Are there any special characters? If so where
are they documented?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> string_to_array() consumes too much memory. For example, to make ~70k
> array elements, string_to_array seems to eat several Gig bytes of
> memory.

I'd argue that the problem comes from enlarging the work arrays only 64
elements at a time in accumArrayResult().  Most of the rest of the code
deals with resizing arrays using a "double it each time it has to grow"
approach, I wonder why this is different?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Teodor Sigaev

Porblem with Japanese is, it's an agglutinative language and we need
to separate each word from a sentence. So, I need to modify tsearch2
anyway (I know someone from Japan is working on this).

https://www.oss.ecl.ntt.co.jp/tsearch2j/index.html
That's it?



BTW, can tsearch2 handle ~70k words in a document?


I don't see any problem. tsvector size should not be greater than 1Mb however.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Tatsuo Ishii
> > I'm testing how GIN scales.
> 
> Have a look at http://www.sigaev.ru/cvsweb/cvsweb.cgi/ftsbench/ - utility is 
> specially developed for measuring performance of full-text solutions ( now it 
> supports PgSQL( GiST, GIN ) and MySQL ). Right now I'm searching good query 
> statistic for simulate load, but this data is a closed information in 
> internet-wide search engines :(

Thanks.

> GIN itself is a just a tool for speedup searches, linguistic part is still in 
> tsearch2.
> 
> It's possible to use tsearch2 without any indexes at all. GiST and GIN is  a 
> way 
> to speedup searches.
> 
> Of course, you can develop another framework for full text search and 
> framework 
> may use GIN as it wish :)

Porblem with Japanese is, it's an agglutinative language and we need
to separate each word from a sentence. So, I need to modify tsearch2
anyway (I know someone from Japan is working on this).

BTW, can tsearch2 handle ~70k words in a document?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] Bug in WAL backup documentation

2006-11-08 Thread Martijn van Oosterhout
On Tue, Nov 07, 2006 at 07:11:35PM -0600, Bruno Wolff III wrote:
> On Sun, Nov 05, 2006 at 11:49:36 -0500,
>   Tom Lane <[EMAIL PROTECTED]> wrote:
> > 
> > As already discussed upthread, anyone who wants the path can get it from
> > `pwd` or local equivalent --- and that mechanism is robust (as long as
> > the directory move doesn't happen while any particular instance of the
> > script is running).  I don't see why we should go out of our way to
> > provide a bad substitute for pwd.
> 
> I think you also still need read access to the intervening directories.
> If the command works by walking up and matching inode numbers with names,
> then it will break if it can't read the names. (For example /bin/pwd
> breaks when it can't read a parent directories filenames.)

That's system dependant though, Linux getcwd doesn't have that problem
for example. Should probably dig up some documention on which systems
would be affected by this.

Have a ncie day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Teodor Sigaev

I'm testing how GIN scales.


Have a look at http://www.sigaev.ru/cvsweb/cvsweb.cgi/ftsbench/ - utility is 
specially developed for measuring performance of full-text solutions ( now it 
supports PgSQL( GiST, GIN ) and MySQL ). Right now I'm searching good query 
statistic for simulate load, but this data is a closed information in 
internet-wide search engines :(



? I thought GIN is superior than tsearch2. 


From your GIN proposal posted to pgsql-hackers:

"The primary goal of the Gin index is a scalable full text search in
PostgreSQL"


GIN itself is a just a tool for speedup searches, linguistic part is still in 
tsearch2.


It's possible to use tsearch2 without any indexes at all. GiST and GIN is  a way 
to speedup searches.


Of course, you can develop another framework for full text search and framework 
may use GIN as it wish :)





--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Magnus Hagander
> > > I'm playing with GIN to make a full text search system. GIN comes 
> > > with built-in TEXT[] support and I use string_to_array() 
> to make a 
> > > TEXT[]. Problem is, if there's large number of array elemets,
> > > string_to_array() consumes too much memory. For example, to make 
> > > ~70k array elements, string_to_array seems to eat several 
> Gig bytes 
> > > of memory. ~70k array elements means there are same 
> number of words 
> > > in a document which is not too big in a large text IMO.
> > 
> > Do you mean 70k unique lexemes? Ugh.
> 
> I'm testing how GIN scales.
> 
> > Why do not you use tsearch framework?
> 
> ? I thought GIN is superior than tsearch2. 
> 
> From your GIN proposal posted to pgsql-hackers:
> 
> "The primary goal of the Gin index is a scalable full text 
> search in PostgreSQL"

tsearch2 *uses* GIN in 8.2. Just CREATE INDEX foo ON bar USING
gin(mytsvector).

And tsearch2 in 8.2 with GIN can be a *lot* faster than with GIST. I've
been running experiments on the website search with tsearch2/GIN and
i've been seeing fantastic performance compared top revious versions.



//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Tatsuo Ishii
> > I'm playing with GIN to make a full text search system. GIN comes with
> > built-in TEXT[] support and I use string_to_array() to make a
> > TEXT[]. Problem is, if there's large number of array elemets,
> > string_to_array() consumes too much memory. For example, to make ~70k
> > array elements, string_to_array seems to eat several Gig bytes of
> > memory. ~70k array elements means there are same number of words in a
> > document which is not too big in a large text IMO.
> 
> Do you mean 70k unique lexemes? Ugh.

I'm testing how GIN scales.

> Why do not you use tsearch framework?

? I thought GIN is superior than tsearch2. 

>From your GIN proposal posted to pgsql-hackers:

"The primary goal of the Gin index is a scalable full text search in
PostgreSQL"

What do you think?:-)
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 8.2 Beta 3 Now Available for Download / Testing ...

2006-11-08 Thread Heikki Linnakangas

Andrew Dunstan wrote:

Marc G. Fournier wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


As a result of there being two *known* outstanding bugs, we have just 
bundled up a Beta3, to allow for testing of the recent patch 
concerning WAL replay ...



What are the bugs?


AFAIK:

1. Tuple freezing and hint bits were not WAL-logged, which could lead to 
data corruption if database crashed after VACUUM. See recent thread 
titled "WAL logging freezing".


2. B-tree page deletion bug, which could lead to VACUUM erroring with 
"failed to re-find parent key" message. See thread "Nasty btree deletion 
bug": http://archives.postgresql.org/pgsql-hackers/2006-10/msg01373.php


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Tsearch Index Size and GiST vs. GIN

2006-11-08 Thread Teodor Sigaev



Richard Whidden wrote:
Due to popular demand (1 person), I've compared sizes with 90 and 100 
fillfactors, along with using the new GIN index.


First, GIN doesn't utilize fillfactor option yet.



Findings were not surprising, except for the GIN indexes, which doubled 
in size.
Second, In opposite to GiST, size of GIN index strongly depends on document 
collection. Each unique word in collection has list of pointer to document in 
which it occur. Size of pointer is a 6 byte. So word with 4 bytes length will 
occupy at least 4 + N_docs_with_word * 6.


Fortunately, for searching it's needed to read very small part of index, so GIN 
will be faster.





After several ALTER/RE INDEXes, here they are:

GiST

8.1 = 94990
8.2 FF90 = 106244 relpages (8k)
8.2 FF100 = 95049

GIN
---
FF100 = 197702


Richard Whidden

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] string_to_array eats too much memory?

2006-11-08 Thread Teodor Sigaev

I'm playing with GIN to make a full text search system. GIN comes with
built-in TEXT[] support and I use string_to_array() to make a
TEXT[]. Problem is, if there's large number of array elemets,
string_to_array() consumes too much memory. For example, to make ~70k
array elements, string_to_array seems to eat several Gig bytes of
memory. ~70k array elements means there are same number of words in a
document which is not too big in a large text IMO.


Do you mean 70k unique lexemes? Ugh.
Why do not you use tsearch framework?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Index ignored with "is not distinct from", 8.2 beta2

2006-11-08 Thread Teodor Sigaev

There's been work on it. Theodor cleaned it up for HEAD and looked at
adding GiST support. I beleive he's waiting for 8.2 to release.


Yep, I have bundle of patches and I'm waiting for 8.2 branch split out of HEAD.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend