Oleg Bartunov wrote:
> On Wed, 20 Jun 2007, Bruce Momjian wrote:
> >> Comments to editorial work of Bruce Momjian.
> >>
> >> fulltext-intro.sgml:
> >>
> >> it is useful to have a predefined list of lexemes.
> >>
> >> Bruce, here should be a list of types of lexemes!
> >
> > Agreed.  Is the list of lexemes parser-specific?
> >
>
> Yes, it is the parser which defines the types of lexemes.

OK, how will users get a list of supported lexemes?  Do we need a list
per supported parser?

> >> fulltext-opfunc.sgml:
> >>
> >> All of the following functions that accept a configuration argument can
> >> use either an integer <!-- why an integer --> or a textual configuration
> >> name to select a configuration.
> >>
> >> originally it was integer id, probably better use <type>oid</type>
> >
> > Uh, my question is why are you allowing specification as an integer/oid
> > when the name works just fine.  I don't see the value in allowing
> > numbers here.
>
> For compatibility reasons. Hmm, indeed, I don't recall where oids could be
> important.

Well, if neither of us sees a reason for it, let's remove it.  We don't
need to support a feature that has no use.

> >> This returns the query used for searching an index. It can be used to test
> >> for an empty query. The <command>SELECT</> below returns <literal>'T'</>,
> >> <!-- lowercase? --> which corresponds to an empty query since GIN indexes
> >> do not support negate queries (a full index scan is inefficient):
> >>
> >>> capital case. This looks cumbersome, probably querytree() should
> >>> just return NULL.
> >
> > Agreed.
> >
> >> The integer option controls several behaviors which is done using bit-wise
> >> fields and <literal>|</literal> (for example, <literal>2|4</literal>):
> >> <!-- why so complex? -->
> >>
> >>> to avoid 2 arguments
> >
> > But I don't see why you would want to set two of those values --- they
> > seem mutually exclusive, e.g.
> >
> >     1 divides the rank by the 1 + logarithm of the document length
> >     2 divides the rank by the length itself
> >
> > I assume you do either one, not both.
>
> But what about the other variants?
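
On the querytree() point, here is a minimal Python sketch of one possible model of the function, written only to make the NULL suggestion concrete. It assumes (our assumption, not the tsquery internals) that negated operands cannot be answered by a GIN index and are pruned, that dropping one side of an AND is safe because it only widens the candidate set, and that an OR with a non-indexable side cannot be narrowed by the index at all. When nothing indexable remains it returns None, the NULL proposed above, instead of the string 'T'. The tuple representation and the names indexable_part, render and querytree are hypothetical.

```python
from typing import Optional, Union

# Hypothetical query representation: a lexeme is a str; operators are
# tuples ("!", node), ("&", left, right) or ("|", left, right).
Node = Union[str, tuple]


def indexable_part(node: Node) -> Optional[Node]:
    """Return the part of `node` a GIN index could answer, or None."""
    if isinstance(node, str):
        return node                      # a plain lexeme is indexable
    op = node[0]
    if op == "!":
        return None                      # negation would need a full scan
    if op not in ("&", "|"):
        raise ValueError(f"unknown operator {op!r}")
    left = indexable_part(node[1])
    right = indexable_part(node[2])
    if op == "&":
        # Dropping one side of an AND only widens the candidate set,
        # so the remaining side is still a valid (lossy) index filter.
        if left is None:
            return right
        if right is None:
            return left
        return ("&", left, right)
    # OR: the index can only bound the result if both sides are indexable.
    if left is None or right is None:
        return None
    return ("|", left, right)


def render(node: Node) -> str:
    """Print a node in a tsquery-like textual form."""
    if isinstance(node, str):
        return f"'{node}'"
    if node[0] == "!":
        return "!" + render(node[1])
    return f"( {render(node[1])} {node[0]} {render(node[2])} )"


def querytree(node: Node) -> Optional[str]:
    """Model of querytree(): None (NULL) rather than 'T' when nothing is indexable."""
    part = indexable_part(node)
    return None if part is None else render(part)
```

For example, querytree(("&", "foo", ("!", "bar"))) gives 'foo', while querytree(("!", "foo")) gives None.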

OK, here is the full list:

    0 (the default) ignores the document length
    1 divides the rank by 1 + the logarithm of the document length
    2 divides the rank by the document length itself
    4 divides the rank by the mean harmonic distance between extents
    8 divides the rank by the number of unique words in the document
    16 divides the rank by 1 + the logarithm of the number of unique words
       in the document

So which ones would be enabled together?

> > What I missed is the definition of extent.
>
> From http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking
> Extent is a shortest and non-nested sequence of words, which satisfy a query.

I don't understand how that relates to this.

> >
> >> its <replaceable>id</replaceable> or <replaceable>ts_name</replaceable>;
> >> <!-- n
> >> if none is specified that the current configuration is used.
> >>
> >>> I don't understand this question
> >
> > Same issue as above --- why allow a number here when the name works just
> > fine.  We don't allow tables to be specified by number, so why
> > configurations?
> >
> >> <para>
> >> <!-- why? -->
> >> Note that the cascade dropping of the <function>headline</function>
> >> function
> >> cause dropping of the <literal>parser</literal> used in fulltext
> >> configuration
> >> <replaceable>tsname</replaceable>.
> >> </para>
> >>
> >>> hmm, probably it should be reversed - cascade dropping of the parser causes
> >>> dropping of the headline function.
> >
> > Agreed.
> >
> >>
> >> In example below, <literal>fulltext_idx</literal> is
> >> a GIN index:<!-- why isn't this automatic -->
> >>
> >>> It's explained above. The problem is that the current index API doesn't
> >>> let us say whether a search was lossy or exact, so to preserve the
> >>> performance of the GIN index we had to introduce the @@@ operator, which
> >>> is the same as @@, but lossy.
> >
> > Well, then we have to fix the API.  Telling users to use a different
> > operator based on what index is defined is just bad style.
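
To show how the extent definition feeds option 4, and how several bits combine with |, here is a minimal Python sketch. It assumes a query that is a plain AND of lexemes, 0-based word positions, the natural logarithm, and that "mean harmonic distance" means the harmonic mean of the gaps between the midpoints of consecutive extents. These are illustrative readings, not the ts_rank code, and all function and constant names are hypothetical. The bit values are the ones in the list above.

```python
import math

# Normalization bits, values as listed in the message above.
NORM_LOG_LENGTH = 1    # divide by 1 + log(document length)
NORM_LENGTH = 2        # divide by document length
NORM_EXTENT_DIST = 4   # divide by mean harmonic distance between extents
NORM_UNIQUE = 8        # divide by number of unique words
NORM_LOG_UNIQUE = 16   # divide by 1 + log(number of unique words)


def extents(doc, query):
    """Shortest non-nested spans (start, end) of `doc` containing every query lexeme.

    Assumes the query is a plain AND of lexemes.
    """
    terms = set(query)
    if not terms:
        return []
    last = {}             # lexeme -> most recent position seen
    found = []
    prev_start = None
    for pos, word in enumerate(doc):
        if word in terms:
            last[word] = pos
        if len(last) == len(terms):
            # The shortest span ending here starts at the oldest required lexeme.
            start = min(last.values())
            # Emitting only when the start moves keeps just the minimal,
            # hence non-nested, spans.
            if start != prev_start:
                found.append((start, pos))
                prev_start = start
    return found


def mean_harmonic_distance(spans):
    """Harmonic mean of gaps between consecutive extent midpoints (an assumed reading)."""
    mids = [(s + e) / 2 for s, e in spans]
    gaps = [b - a for a, b in zip(mids, mids[1:]) if b > a]
    if not gaps:
        return None
    return len(gaps) / sum(1 / g for g in gaps)


def normalize(rank, flags, doc, query):
    """Apply each selected divisor in turn; flags == 0 leaves the rank unchanged."""
    if flags & NORM_LOG_LENGTH and doc:
        rank /= 1 + math.log(len(doc))
    if flags & NORM_LENGTH and doc:
        rank /= len(doc)
    if flags & NORM_EXTENT_DIST:
        dist = mean_harmonic_distance(extents(doc, query))
        if dist:
            rank /= dist
    unique = len(set(doc))
    if flags & NORM_UNIQUE and unique:
        rank /= unique
    if flags & NORM_LOG_UNIQUE and unique:
        rank /= 1 + math.log(unique)
    return rank
```

Because each bit is tested independently, the encoding does not stop a caller from setting options that look mutually exclusive, such as 1|2; this sketch simply applies both divisors in turn. For example, normalize(1.0, 2 | 8, doc, query) divides by the document length and then by the number of unique words.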
> >
>
> This was raised by Heikki and we discussed it a bit in Ottawa, but it's
> unclear if it's doable for 8.3. The @@@ operator is rarely used, so we could
> say it will be improved in future versions.

Uh, I am wondering if we just have to force heap access in all cases
until it is fixed.

> >> nly the <token>lword</token> lexeme, then a <acronym>TZ</acronym>
> >> definition like ' one 1:11' will not work since lexeme type
> >> <token>digit</token> is not assigned to the <acronym>TZ</acronym>.
> >> <!-- what do these numbers mean? -->
> >> </para>
> >
> > OK, I changed it to be clearer.
> >
> >>> nothing special, just numbers for example.
> >>
> >> <function>ts_debug</> displays information about every token of
> >> <replaceable class="PARAMETER">document</replaceable> as produced by the
> >> parser and processed by the configured dictionaries using the configuration
> >> specified by <replaceable class="PARAMETER">cfgname</replaceable> or
> >> <replaceable class="PARAMETER">oid</replaceable>. <!-- no need for oid
> >>
> >>> don't understand this comment. ts_debug accepts cfgname or its oid
> >
> > Again, no need for oid.
>
> We need to decide if we need oids as a user-visible argument. I don't see
> any value; probably Teodor thinks otherwise.

This is a good time to clean up the API because there are going to be
user-visible changes anyway.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
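
For the lossy-index question raised earlier in the thread, here is a minimal Python sketch of the two options. The first is an index API that marks each hit as exact or lossy, with the executor re-testing only the lossy ones against the stored document. The second is the interim idea of forcing heap access for every hit. IndexHit, its recheck flag, and scan() are hypothetical names, not the PostgreSQL index access method interface. The point is that with such a flag, users would not have to choose @@@ over @@ based on the index type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class IndexHit:
    row_id: int
    recheck: bool   # True when the index can only say "may match"


def scan(hits: Iterable[IndexHit],
         heap: Dict[int, Set[str]],
         matches: Callable[[Set[str]], bool],
         force_recheck: bool = False) -> List[int]:
    """Return the row ids the executor would accept.

    Exact hits are trusted without touching the heap; lossy hits are
    re-tested against the stored document.  force_recheck=True models
    "force heap access in all cases": every hit is re-tested.
    """
    result = []
    for hit in hits:
        if hit.recheck or force_recheck:
            if not matches(heap[hit.row_id]):
                continue            # lossy false positive, drop it
        result.append(hit.row_id)
    return result
```

Usage: with heap = {1: {"fat", "cat"}, 2: {"fat", "rat"}} and a predicate requiring both "fat" and "cat", two lossy hits for rows 1 and 2 come back as [1], because row 2 fails the recheck.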