Re: [rt-users] case insensitive searching in Content?

2010-06-08 Thread Václav Ovsík
... 3rd try to mail ...

On Sun, Jun 06, 2010 at 02:34:58PM -0500, Kenneth Marshall wrote:
 Hi Zito,
 
 You can look in the mailing list archives for several similar discoveries on
 how MySQL works with content searches and the longblob data type. I would
 definitely recommend using PostgreSQL for its full-text indexing available
 in the wiki. Of course, I am partial... :)

Hi Ken,
thanks for the suggestion. I red the wiki page
http://wiki.bestpractical.com/view/PostgreSQLFullText already.
I'm definitely going to try this way, also patching DBIx::SearchBuilder
is a bit hack, but ILIKE is too slow on Pg.
Regards
-- 
Zito

Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
Buy a copy at http://rtbook.bestpractical.com


Re: [rt-users] case insensitive searching in Content?

2010-06-08 Thread Kenneth Marshall
On Tue, Jun 08, 2010 at 09:10:23AM +0200, V?clav Ovs?k wrote:
 ... 3rd try to mail ...
 
 On Sun, Jun 06, 2010 at 02:34:58PM -0500, Kenneth Marshall wrote:
  Hi Zito,
  
  You can look in the mailing list archives for several similar discoveries 
  on
  how MySQL works with content searches and the longblob data type. I would
  definitely recommend using PostgreSQL for its full-text indexing available
  in the wiki. Of course, I am partial... :)
 
 Hi Ken,
 thanks for the suggestion. I red the wiki page
 http://wiki.bestpractical.com/view/PostgreSQLFullText already.
 I'm definitely going to try this way, also patching DBIx::SearchBuilder
 is a bit hack, but ILIKE is too slow on Pg.
 Regards
 -- 
 Zito
 

Hi Zito,

ILIKE is fine on PostgreSQL as long as the search is anchored on
the left and there is an index. Otherwise it will do a sequential
scan of the attachments table to find matching records. The wiki
entries for full-text indexing with PostgreSQL and Oracle can avoid
the sequential scan through the use of the database's full-text
index support. I do agree that patching the actual DBIx::SearchBuilder
is not preferable. Unfortunately, since that is the API that is used
to provide a database abstraction layer and it does not yet include
the ability to specify full-text indexing details, you have to patch
it. If it is any consolation, the patch is much, much simpler (trivial)
for the PostgreSQL support versus the version that is needed for
OracleText. :)

Regards,
Ken

Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
Buy a copy at http://rtbook.bestpractical.com


Re: [rt-users] case insensitive searching in Content?

2010-06-08 Thread Václav Ovsík
Hi Ken,

On Tue, Jun 08, 2010 at 07:48:15AM -0500, Kenneth Marshall wrote:
 ILIKE is fine on PostgreSQL as long as the search is anchored on
 the left and there is an index. Otherwise it will do a sequential
 scan of the attachments table to find matching records. The wiki
 entries for full-text indexing with PostgreSQL and Oracle can avoid
 the sequential scan through the use of the database's full-text

thanks for your work on PostgreSQLFullText wiki page! Good starting
point. I'm playing with the fulltext on one instance already. I must
discuss the possibility to use the fulltext on Pg with my leader. The
results are different from those get from ILIKE. I tried to find 'cisco'
for example :). Normal search returned 25 tickets, but fulltext only 15.
Fulltext parser did tokens from URL and www.cisco.com was a lexem as is
(not broken further). I can't say this is fine or not, it is difference.
Users must know this - how to write the query correctly.

 index support. I do agree that patching the actual DBIx::SearchBuilder
 is not preferable. Unfortunately, since that is the API that is used
 to provide a database abstraction layer and it does not yet include
 the ability to specify full-text indexing details, you have to patch
 it. If it is any consolation, the patch is much, much simpler (trivial)
 for the PostgreSQL support versus the version that is needed for
 OracleText. :)

I completely understand. I put the package libdbix-searchbuilder-perl
(Debian) on hold in aptitude, so sysadmin should remember, that this
package needs a special handling. I did a copy of SearchBuilder.pm file
into rt-prefix/local/lib/DBIx/ and did a modification. This perl path
is searched first, so a modification is beside other RT code...

I have prepared a script for converting data from Mysql to Pg and for
adding/removing fulltext support based on your wiki page. Maybe it can
be added to wiki for someone else to play with fulltext and Pg. I must
to test it a bit and switch some fixed parts into conditional blocks
controlled by command-line options.

Regards
-- 
Zito

Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
Buy a copy at http://rtbook.bestpractical.com


Re: [rt-users] case insensitive searching in Content?

2010-06-08 Thread Kenneth Marshall
On Tue, Jun 08, 2010 at 04:57:25PM +0200, V?clav Ovs?k wrote:
 Hi Ken,
 
 On Tue, Jun 08, 2010 at 07:48:15AM -0500, Kenneth Marshall wrote:
  ILIKE is fine on PostgreSQL as long as the search is anchored on
  the left and there is an index. Otherwise it will do a sequential
  scan of the attachments table to find matching records. The wiki
  entries for full-text indexing with PostgreSQL and Oracle can avoid
  the sequential scan through the use of the database's full-text
 
 thanks for your work on PostgreSQLFullText wiki page! Good starting
 point. I'm playing with the fulltext on one instance already. I must
 discuss the possibility to use the fulltext on Pg with my leader. The
 results are different from those get from ILIKE. I tried to find 'cisco'
 for example :). Normal search returned 25 tickets, but fulltext only 15.
 Fulltext parser did tokens from URL and www.cisco.com was a lexem as is
 (not broken further). I can't say this is fine or not, it is difference.
 Users must know this - how to write the query correctly.
 

Correct, the results depend on the parsing rules that are used
by the full-text configuration. These rules can be adapted to
your needs, but I do agree that some explanation about common
types of query problems would be worthwhile.

Regards,
Ken


Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
Buy a copy at http://rtbook.bestpractical.com


[rt-users] case insensitive searching in Content?

2010-06-04 Thread Václav Ovsík
Hi,
I'm preparing migration from RT 3.2.2 + Mysql 3.x to RT 3.8.8
+ Mysql or Pg.
I just discovered a problem with the case insensitive searching in the
column Attachments.Content (longblob) while playing with a new RT
instance.

Ticket SQL query
Content like 'outlook'
on our current (old) RT instance, returns some result set. Column Content has
data-type longtext in the old database schema, so every letter case
combination are found.

On a new testing platform RT 3.8.8 + Mysql only partial set of tickets is
returned with the same query. I must to modify the query to 
Content like 'outlook' or Content like 'Outlook'
at least to see the most of the tickets.

On the other hand RT 3.8.8 + Pg has Content column data-type text, so
case-insensitive searching works, but there is another issue with non
UTF-8 attachments encoded in quoted-printable, but this is a minor
problem. Pg database behaves better than Mysql in this aspect.

Is there any intention to detach non textual data from the Content
column into e.g. ContentB in future?

Have anyone some knowledge or suggestion about this?

Best Regards
-- 
Zito

Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
Buy a copy at http://rtbook.bestpractical.com