Re: [HACKERS] Why is there a tsquery data type?

2008-03-12 Thread Bruce Momjian

Added to TODO:

* Consider a special data type for regular expressions

  http://archives.postgresql.org/pgsql-hackers/2007-08/msg01067.php


---

Tom Lane wrote:
 Ron Mayer [EMAIL PROTECTED] writes:
  Could similar logic argue that we'd want special types for regular
  expressions too? That seems quite parallel to the tsquery type to me.
 
 Yeah, it certainly seems like something we might want to consider in
 future --- we could get rid of that klugy cache for compiled regexps,
 for one thing.  An implicit cast from text to the regexp type would
 cover backwards compatibility issues, I think.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is there a tsquery data type?

2007-08-30 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 There is no question things would be clearer with only one text search
 data type.  The only value I can see to having a tsquery data type is
 that you can store a tsquery value in a column, but why would that be
 much better than just storing it in a TEXT field?

When you try storing a tsquery in a column does it alert you if you have an
invalid syntax at that point? Storing it as text would mean not finding out
until you try to use the query.

Is converting a text query into the internal format faster or less memory
intensive than converting text into the internal representation? When you run
a query like WHERE '...' @@ col if there wasn't a tsquery data type then
'...' would have to be parsed over and over again for each row.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Why is there a tsquery data type?

2007-08-30 Thread Bruce Momjian
Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  There is no question things would be clearer with only one text search
  data type.  The only value I can see to having a tsquery data type is
  that you can store a tsquery value in a column, but why would that be
  much better than just storing it in a TEXT field?
 
 When you try storing a tsquery in a column does it alert you if you have an
 invalid syntax at that point? Storing it as text would mean not finding out
 until you try to use the query.

Yes it does check syntax:

test= select 'lkjadsf kjfdsa'::tsquery;
ERROR:  syntax error in tsearch query: lkjadsf kjfdsa

A larger question is how many people store queries in the database to
make it worth the complexity.

 Is converting a text query into the internal format faster or less memory
 intensive than converting text into the internal representation? When you run
 a query like WHERE '...' @@ col if there wasn't a tsquery data type then
 '...' would have to be parsed over and over again for each row.

No, internally the TEXT string would be converted to something the
system could deal with for that query, which is probably what 99% of all
queries are going to do anyway by calling to_tsquery().

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Why is there a tsquery data type?

2007-08-30 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  There is no question things would be clearer with only one text search
  data type.  The only value I can see to having a tsquery data type is
  that you can store a tsquery value in a column, but why would that be
  much better than just storing it in a TEXT field?
 
 When you try storing a tsquery in a column does it alert you if you have an
 invalid syntax at that point? Storing it as text would mean not finding out
 until you try to use the query.

 Yes it does check syntax:

   test= select 'lkjadsf kjfdsa'::tsquery;
   ERROR:  syntax error in tsearch query: lkjadsf kjfdsa

 A larger question is how many people store queries in the database to
 make it worth the complexity.

So would this still happen if you didn't have a tsquery type? Or would it
throw the error only when the actual matching operator executed and tried to
parse the text?

 Is converting a text query into the internal format faster or less memory
 intensive than converting text into the internal representation? When you run
 a query like WHERE '...' @@ col if there wasn't a tsquery data type then
 '...' would have to be parsed over and over again for each row.

 No, internally the TEXT string would be converted to something the
 system could deal with for that query, which is probably what 99% of all
 queries are going to do anyway by calling to_tsquery().

How would that happen if there wasn't a tsquery type?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Why is there a tsquery data type?

2007-08-30 Thread Bruce Momjian
Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  Gregory Stark wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
  
   There is no question things would be clearer with only one text search
   data type.  The only value I can see to having a tsquery data type is
   that you can store a tsquery value in a column, but why would that be
   much better than just storing it in a TEXT field?
  
  When you try storing a tsquery in a column does it alert you if you have an
  invalid syntax at that point? Storing it as text would mean not finding out
  until you try to use the query.
 
  Yes it does check syntax:
 
  test= select 'lkjadsf kjfdsa'::tsquery;
  ERROR:  syntax error in tsearch query: lkjadsf kjfdsa
 
  A larger question is how many people store queries in the database to
  make it worth the complexity.
 
 So would this still happen if you didn't have a tsquery type? Or would it
 throw the error only when the actual matching operator executed and tried to
 parse the text?

Well, if you didn't have a tsquery data type then you would get the
error when the TEXT was cast to tsquery for the search itself.

  Is converting a text query into the internal format faster or less memory
  intensive than converting text into the internal representation? When you 
  run
  a query like WHERE '...' @@ col if there wasn't a tsquery data type then
  '...' would have to be parsed over and over again for each row.
 
  No, internally the TEXT string would be converted to something the
  system could deal with for that query, which is probably what 99% of all
  queries are going to do anyway by calling to_tsquery().
 
 How would that happen if there wasn't a tsquery type?

There is an internal C structure which holds the tsquery information. 
My guess is that we would internally have something like tsquery but it
wouldn't be user-visible perhaps.  Right now I am a little confused
about how to do this and keep the data-type-independent nature of the
backend.  You are right that we might have to end up re-parsing the TEXT
field every time it hits the @@ operator, which is a pain.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Why is there a tsquery data type?

2007-08-30 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 How would that happen if there wasn't a tsquery type?

I don't think Bruce is suggesting that the search operand can really
be plain text (or if he is, he's nuts).  The question here is whether
there's really a need for a distinction between tsvector and tsquery
datatypes; could we have tsvector serve both purposes instead?

I can see that there are differences: tsquery can tell the difference
between x AND y and x OR y, whereas tsvector just knows x, y.
A superset datatype that can do both is certainly possible, but whether
it's practical, or would be easier to use than the current design,
I dunno.

Perhaps a suitable analogy is regexp pattern matching.  Traditionally
regexps are conceived of as strings, but if they'd originated in more
strongly typed languages than they did, they'd certainly be thought
of as a distinct data type.  Had we implemented ~ as taking a right
operand of type 'regexp', we could win on a number of levels: entry-time
syntax checking for regexps, and a precompiled internal representation,
for instance.  For regexps it seems clear to me that the target text
string and the pattern really are different datatypes, and fuzzing that
distinction is not an improvement.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Why is there a tsquery data type?

2007-08-30 Thread Bruce Momjian
Tom Lane wrote:
 Gregory Stark [EMAIL PROTECTED] writes:
  How would that happen if there wasn't a tsquery type?
 
 I don't think Bruce is suggesting that the search operand can really
 be plain text (or if he is, he's nuts).  The question here is whether

Yes, that is what I was suggesting because as I mentioned TEXT already
functions fine as tsquery.

 there's really a need for a distinction between tsvector and tsquery
 datatypes; could we have tsvector serve both purposes instead?
 
 I can see that there are differences: tsquery can tell the difference
 between x AND y and x OR y, whereas tsvector just knows x, y.
 A superset datatype that can do both is certainly possible, but whether
 it's practical, or would be easier to use than the current design,
 I dunno.

Because of the special behavior of  and |, I assume tsquery and
tsvector cannot be the same data type.

 Perhaps a suitable analogy is regexp pattern matching.  Traditionally
 regexps are conceived of as strings, but if they'd originated in more
 strongly typed languages than they did, they'd certainly be thought
 of as a distinct data type.  Had we implemented ~ as taking a right
 operand of type 'regexp', we could win on a number of levels: entry-time
 syntax checking for regexps, and a precompiled internal representation,
 for instance.  For regexps it seems clear to me that the target text
 string and the pattern really are different datatypes, and fuzzing that
 distinction is not an improvement.

Yes, this is a good analogy.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Why is there a tsquery data type?

2007-08-30 Thread Oleg Bartunov

On Thu, 30 Aug 2007, Bruce Momjian wrote:


Gregory Stark wrote:

Bruce Momjian [EMAIL PROTECTED] writes:


There is no question things would be clearer with only one text search
data type.  The only value I can see to having a tsquery data type is
that you can store a tsquery value in a column, but why would that be
much better than just storing it in a TEXT field?


When you try storing a tsquery in a column does it alert you if you have an
invalid syntax at that point? Storing it as text would mean not finding out
until you try to use the query.


Yes it does check syntax:

test= select 'lkjadsf kjfdsa'::tsquery;
ERROR:  syntax error in tsearch query: lkjadsf kjfdsa

A larger question is how many people store queries in the database to
make it worth the complexity.


you forget about very powerfull query rewriting, which is table driven




Is converting a text query into the internal format faster or less memory
intensive than converting text into the internal representation? When you run
a query like WHERE '...' @@ col if there wasn't a tsquery data type then
'...' would have to be parsed over and over again for each row.


No, internally the TEXT string would be converted to something the
system could deal with for that query, which is probably what 99% of all
queries are going to do anyway by calling to_tsquery().




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Why is there a tsquery data type?

2007-08-30 Thread Ron Mayer
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 Why does text search need a tsquery data type?  I realize it needs
 tsvector so it can create indexes and updated trigger columns, but it
 seems tsquery could instead just be a simple text string.
 
 By that logic, we don't need any data types other than text.
 

Could similar logic argue that we'd want special types for regular
expressions too? That seems quite parallel to the tsquery type to me.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Why is there a tsquery data type?

2007-08-30 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes:
 Could similar logic argue that we'd want special types for regular
 expressions too? That seems quite parallel to the tsquery type to me.

Yeah, it certainly seems like something we might want to consider in
future --- we could get rid of that klugy cache for compiled regexps,
for one thing.  An implicit cast from text to the regexp type would
cover backwards compatibility issues, I think.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Why is there a tsquery data type?

2007-08-29 Thread Bruce Momjian
Why does text search need a tsquery data type?  I realize it needs
tsvector so it can create indexes and updated trigger columns, but it
seems tsquery could instead just be a simple text string.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Why is there a tsquery data type?

2007-08-29 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Why does text search need a tsquery data type?  I realize it needs
 tsvector so it can create indexes and updated trigger columns, but it
 seems tsquery could instead just be a simple text string.

By that logic, we don't need any data types other than text.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Why is there a tsquery data type?

2007-08-29 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Why does text search need a tsquery data type?  I realize it needs
  tsvector so it can create indexes and updated trigger columns, but it
  seems tsquery could instead just be a simple text string.
 
 By that logic, we don't need any data types other than text.

What is tsquery giving us that text would not?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Why is there a tsquery data type?

2007-08-29 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 By that logic, we don't need any data types other than text.

 What is tsquery giving us that text would not?

A preprocessed representation that can be compared to tsvector
efficiently.

Now, if you'd asked whether we need *both* tsvector and tsquery,
that'd be a fair question.  I'm not 100% clear on what the differences
are, but they seem pretty subtle.  Do you think that having only
one datatype would be clearer?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Why is there a tsquery data type?

2007-08-29 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  By that logic, we don't need any data types other than text.
 
  What is tsquery giving us that text would not?
 
 A preprocessed representation that can be compared to tsvector
 efficiently.
 
 Now, if you'd asked whether we need *both* tsvector and tsquery,
 that'd be a fair question.  I'm not 100% clear on what the differences
 are, but they seem pretty subtle.  Do you think that having only
 one datatype would be clearer?

There is no question things would be clearer with only one text search
data type.  The only value I can see to having a tsquery data type is
that you can store a tsquery value in a column, but why would that be
much better than just storing it in a TEXT field?

Internally I assume you would have to generate a tsquery structure from
a TEXT string, so the idea of a query representation wouldn't go away;
it would just be internal.

The one thing we would lose is the ability to process the query string
with a named configuration.  If we always cast to TEXT, I assume we
would always be using default_text_search_config, and I am a little
worried about queries in triggers that have to wire-down the
configuration name.  As I understand it the tsquery goes through the
configuration just like the tsvector.

Right now you can already do:

'query' @@ 'boy girl'::tsvector

and the system casts your text string to tsquery automatically.  Perhaps
we just need to minimize tsquery in the documentation and mention its
special purpose.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly