Re: [BUGS] [GENERAL] fuzzystrmatch module buggy? observations

2012-11-06 Thread Bruce Momjian
On Tue, Oct 30, 2012 at 02:29:09PM +0100, r d wrote:
> The fuzzystrmatch module (http://www.postgresql.org/docs/9.2/static/
> fuzzystrmatch.html) is currently, as of 9.2.1, documented with the caution "At
> present, the soundex, metaphone, dmetaphone, and dmetaphone_alt functions do
> not work well with multibyte encodings (such as UTF-8)". 
> 
> While the venerable algorithms contained in the module seem to generally work
> for Latin strings from European languages which all have accented/diacritic
> characters such as äöüñáéíóúàèìòù, for languages with non-Latin characters such
> as Cyrillic, Hebrew, Arabic, Chinese, these venerable algorithms return NULL
> (empty) or plain weirdness.
> 
> Some examples:
> 
> dmetaphone ('Новости') = 'NN'
> soundex ('Новости') = NULL
> 
> dmetaphone ('לפחות') = NULL
> soundex ('לפחות') = NULL
> 
> soundex ('相关搜索') = NULL
> dmetaphone ('相关搜索') = NULL
> 
> metaphone() crashes with SQL state: 42883 for all these strings (it tells me I
> should cast the 'unknown' input).
> 
> The string 'äöüñáéíóúàèìòù' causes metaphone(), dmetaphone(), dmetaphone_alt()
> and soundex() to fail.
> 
> Only levenshtein() appears to function correctly with all above inputs, even
> when I let it compare Hebrew against Chinese strings.
> 
> Summarizing my experience:
> * for English (ASCII equivalent), the module works,
> * for the rest of the Latin charsets (equivalent to ISO 8859-x) the module
> works unreliably,
> * for non-Latin chars (UTF8 with 2-4 bytes per char) the module does not work.
> 
> Note: My DB and the OS are set up for UTF-8.
> 
> This would appear to be less a problem of PostgreSQL and the fuzzystrmatch
> module itself than a consequence of there being no replacement algorithms
> adequate for a multilingual world - at least that is my impression after
> looking at the IPA and http://www.lt-world.org websites and branching out
> from there.

This is a very good summary.  I was not aware of all these behaviors.
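
The empty soundex() results quoted above are what one would expect from
the classic algorithm, which only defines codes for the letters A-Z. The
following is a minimal Python sketch of textbook American Soundex, not the
fuzzystrmatch C source; the name `soundex` and the choice to drop every
non-ASCII character are assumptions made for illustration.

```python
# Textbook American Soundex, restricted to ASCII letters A-Z.
# Illustrative sketch only; this is NOT the fuzzystrmatch implementation.
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(text: str) -> str:
    # Assumption: anything that is not an ASCII letter is simply dropped.
    letters = [c.upper() for c in text if c.isascii() and c.isalpha()]
    if not letters:
        # Nothing the algorithm can encode, e.g. Cyrillic or Hebrew input.
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "HW":
            # H and W do not separate two letters with the same code.
            prev = code
    # Pad or truncate to one letter followed by three digits.
    return "".join(out)[:4].ljust(4, "0")


if __name__ == "__main__":
    for word in ("Robert", "Новости", "לפחות", "相关搜索"):
        print(repr(word), "->", repr(soundex(word)))
```

With this sketch, any input that contains no A-Z letters has nothing left
to encode and yields an empty string, which matches the pattern in the
Cyrillic, Hebrew and Chinese examples.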

> Given all this I have no idea if this is a bug at all or whether the
> state of the art around this topic is inadequate.

I have no idea either.

> Questions (to the developers):
> - Is there anything in work or planned for the fuzzystrmatch module?
> - Does anybody know about adequate replacements or upgrades of the soundex,
> metaphone etc. algorithms from academia?

I have not heard of anyone working in this area.  What usually happens
is some expert in the field shows up and submits a patch to improve it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Introducing floating point cast into filter drastically changes row estimate

2012-11-06 Thread Merlin Moncure
On Wed, Oct 24, 2012 at 5:40 PM, Merlin Moncure wrote:
> On Wed, Oct 24, 2012 at 3:51 PM, Merlin Moncure wrote:
>> On Wed, Oct 24, 2012 at 3:33 PM, Tom Lane wrote:
>>> Merlin Moncure writes:
>>>> Yeah -- I have a case where a large number of joins are happening that
>>>> have a lot of filtering based on expressions and things like that.
>>>
>>> Might be worth your while to install some indexes on those expressions,
>>> if only to trigger collection of stats about them.
>>
>> Not practical -- these expressions are all about 'outlier culling'.
>> It's just wasteful to materialize indexes for statistical purposes only.
>> Anyways, in this case, I just refactored the query into a CTE.

Apologies for blabbing, but I was wondering if a solution to this
problem might be to have the planner identify low cost/high impact
scenarios that would qualify for simply running some of the stored
statistical values through qualifying stable expressions, particularly
when the input variables are constant or single sourced from a table.
Over the years, the planner has been getting very precise in terms of
algorithm choice and this is making the costs of statistics misses
increasingly dangerous, a trend which I think has been reflected by
regression reports on -performance.

merlin




Re: [BUGS] BUG #7637: postgres92-9.2.1-1.x86_64 requires libuuid.so.16()(64bit)

2012-11-06 Thread Devrim GÜNDÜZ

Hi,

On Thu, 2012-11-01 at 21:13 +, abail...@aol.com wrote:

> I'm trying to install postgres 9.2 on RHEL 6.1 and it gives me the error
> 'postgres92-9.2.1-1.x86_64 requires libuuid.so.16()(64bit)' 
> I am trying to install offline to our internal server. How do I get the
> libuuid.so.16 where it will be found so it will install?
> 

Not sure which package you are using (I don't recall any package named
just "postgres", and that package should not require uuid), but

uuid

is the package you are looking for.

Regards,
-- 
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

