Re: [HACKERS] Strange errors from 9.2.1 and 9.2.2 (I hope I'm missing something obvious)
On Dec 11, 2012 9:28 PM, David Gould da...@sonic.net wrote:
> Thank you. I got the example via cut and paste from email and pasted it into psql on different hosts. od tells me it ends each line with: \n followed by 0xC2 0xA0 and then normal spaces. The C2A0 thing is apparently NO-BREAK SPACE. Invisible, silent, odorless but still deadly. Which will teach me not to accept text files from the sort of people who write code in Word I guess.

It's not just Word... I was bitten by this last week by a WYSIWYG HTML widget I was using to write some documentation. When I copied the examples I had created out of said environment during a final technical accuracy pass and they failed to run in psql, I panicked for a few minutes. I eventually determined that, rather than just wrapping my code in <pre> tags, the widget had created &nbsp; entities that were faithfully converted into Unicode non-breaking spaces in the psql input.
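For anyone hitting the same thing, here is a minimal sketch (Python, not part of the original thread) of how pasted text could be checked for NO-BREAK SPACE and similar space-like characters before it reaches psql. The function names and the sample string are made up for illustration; the only facts relied on are that U+00A0 encodes as 0xC2 0xA0 in UTF-8 and belongs to the Unicode "Zs" (space separator) category.

```python
import unicodedata

NBSP = "\u00a0"  # NO-BREAK SPACE; its UTF-8 encoding is 0xC2 0xA0


def find_invisible_spaces(text):
    """Return (line, column, Unicode name) for every space-like
    character that is not a plain ASCII space."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch != " " and unicodedata.category(ch) == "Zs":
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


def normalize_spaces(text):
    """Replace every Unicode space separator with an ASCII space."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch for ch in text
    )


# Hypothetical pasted input: the second line starts with a NO-BREAK SPACE.
sample = "SELECT 1\n" + NBSP + " FROM t;"

print(NBSP.encode("utf-8"))           # the byte pair that od reported
print(find_invisible_spaces(sample))  # where the offending characters are
print(normalize_spaces(sample).encode("utf-8"))
```

Running the cleaned string through psql instead of the raw paste should avoid the problem, assuming the stray spaces were the only issue.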
Re: [HACKERS] Extending range of to_tsvector et al
On Sun, Sep 30, 2012 at 1:56 PM, johnkn63 john.knight...@gmail.com wrote:
> When using to_tsvector a number of newer unicode characters and pua characters are not included. How do I add the characters which I desire to be found?

I've just started digging into this code a bit, but from what I've found src/backend/tsearch/wparser_def.c defines much of the parser functionality, and in the area of Unicode includes a number of comments like:

 * with multibyte encoding and C-locale isw* function may fail or give wrong result.
 * multibyte encoding and C-locale often are used for Asian languages.
 * any non-ascii symbol with multibyte encoding with C-locale is an alpha character

... in concert with ifdefs around WIDE_UPPER_LOWER (in effect if WCSTOMBS and TOWLOWER are available) to complicate testing scenarios :)

Also note that src/test/regress/sql/tsearch.sql and regress/sql/tsdicts.sql currently focus on English, ASCII-only data.

Perhaps this is a good opportunity for you to describe what your environment looks like (OS, PostgreSQL version, encoding and locale settings for the database) and show some sample to_tsquery() @@ to_tsvector() queries that don't behave the way you think they should behave - and we could start building some test cases as a first step?

--
Dan Scott
Laurentian University

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doc patch, normalize search_path in index
On Fri, Sep 28, 2012 at 1:40 PM, Karl O. Pinc k...@meme.com wrote:
> Hi,
>
> The attached patch (against git head) normalizes search_path as the thing indexed and uses a secondary index term to distinguish the configuration parameter from the run-time setting.

Makes sense to me, although I suspect the conceptual material is better served by the search path-the-concept index entry and the reference material by the search_path configuration parameter entry (so, from that perspective, perhaps the patch should just be to remove the search_path index entry from the DDL schemas conceptual section).

> search path the concept remains distinguished in the index from search_path the setting/config param. It's hard to say whether it's useful to make this distinction.

I think that indexing search path-the-concept is useful for translations, and the Japanese translation includes an index (I couldn't find the index for the French translation).

--
Dan Scott
Laurentian University
Re: [HACKERS] Extending range of to_tsvector et al
Hi John:

On Sun, Sep 30, 2012 at 11:45 PM, john knightley john.knight...@gmail.com wrote:
> Dear Dan,
>
> thank you for your reply.
>
> The OS I am using is Ubuntu 12.04, with PostgreSQL 9.1.5 installed on a utf8 locale.
>
> A short 5 line dictionary file is sufficient to test:-
>
> raeuz
> 我们
> 昭厵
> 꽖떂
> 撘䮬
>
> line 1 raeuz: Zhuang word written using English letters; shows up under ts_vector ok
> line 2 我们: everyday Chinese word; shows up under ts_vector ok
> line 3 昭厵: Zhuang word written using rather old Chinese characters found in Unicode 3.1, which came in about the year 2000; shows up under ts_vector ok
> line 4 꽖떂: Zhuang word written using rather old Chinese characters found in Unicode 5.2, which came in about the year 2009; does not show up under ts_vector
> line 5 撘䮬: Zhuang word written using rather old Chinese characters found in the PUA area of the font Sawndip.ttf; does not show up under ts_vector
>
> (Font can be downloaded from http://gdzhdb.l10n-support.com/sawndip-fonts/Sawndip.ttf)
>
> The last two words, even though included in a dictionary, do not get accepted by ts_vector.

Hmm. Fedora 17 x86-64 w/ PostgreSQL 9.1.5 here, the latter seems to work using the default text search configuration (albeit with one crucial note: I created the database with the lc_ctype=C lc_collate=C options):

WORKING:

    createdb --template=template0 --lc-ctype=C --lc-collate=C foobar

    foobar=# select ts_debug('撘䮬');
     ts_debug
    ----------
     (word,Word, all letters,撘䮬,{english_stem},english_stem,{撘䮬})
    (1 row)

NOT WORKING AS EXPECTED:

    foobaz=# SHOW LC_CTYPE;
     lc_ctype
    -------------
     en_US.UTF-8
    (1 row)

    foobaz=# select ts_debug('撘䮬');
     ts_debug
    ----------
     (blank,Space symbols,撘䮬,{},,)
    (1 row)

So... perhaps LC_CTYPE=C is a possible workaround for you?
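As a rough aid for building test cases, the following Python sketch (not from the thread) reports how a Unicode character database classifies individual code points. It says nothing definitive about PostgreSQL: per the wparser_def.c comments quoted earlier, the parser relies on the C library's isw* functions under the database locale, and those may use older or different tables. The helper name and the sample code points (U+E000 from the Private Use Area, U+2A700 from the CJK Extension C block added in Unicode 5.2) are illustrative assumptions, not the characters from John's dictionary file.

```python
import unicodedata


def describe(ch):
    """Return (code point label, general category, rough kind) for one character."""
    category = unicodedata.category(ch)
    if category == "Co":
        kind = "private use"
    elif category == "Cn":
        kind = "unassigned in this Unicode database"
    elif category.startswith("L"):
        kind = "letter"
    else:
        kind = "other"
    return "U+%04X" % ord(ch), category, kind


# Illustrative code points only (assumed, not taken from the dictionary file).
samples = ["a", "\u6211", "\U0002A700", "\ue000"]

for ch in samples:
    print(describe(ch))
```

A locale whose character tables predate Unicode 5.2, or that has no notion of what a private-use code point means, could plausibly report such characters as non-alphabetic, which would be consistent with the "blank" token type seen under en_US.UTF-8 above.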
Re: [HACKERS] plpgsql gram.y make rule
On Mon, Sep 24, 2012 at 10:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Peter Eisentraut pete...@gmx.net writes:
>> I wanted to refactor the highly redundant flex and bison rules throughout the source into common pattern rules. (Besides saving some redundant code, this could also help some occasionally flaky code in pgxs modules.) The only outlier that breaks this is in plpgsql
>>
>>     pl_gram.c: gram.y
>>
>> I would like to either rename the intermediate file(s) to gram.{c,h}, or possibly rename the source file to pl_gram.y. Any preferences or other comments?
>
> Hmmm ... it's annoyed me for a long time that that file is named the same as the core backend's gram.y. So renaming to pl_gram.y might be better. On the other hand I have very little confidence in git's ability to preserve change history if we do that. Has anyone actually done a file rename in a project with lots of history, and how well did it turn out? (For instance, does git blame still provide any useful tracking of pre-rename changes? If you try to cherry-pick a patch against the new file into a pre-rename branch, does it work?)

git handles renaming just fine with cherry-picks, no special options necessary. (Well, there are probably corner cases, but it's code, there are always corner cases!)

For git log, you'll want to add the --follow parameter if you're asking for the history of a specific file or directory beyond a renaming event.

git blame will show you the commit that renamed the file, by default, but then you can request the revision prior to that using the commit hash || '^', for example:

    git blame 2fb6cc90^ -- src/backend/parser/gram.y

to work your way back through history.

--
Dan Scott
Laurentian University
[HACKERS] Doc typo: lexems -> lexemes
I ran across a minor typo while reviewing the full-text search documentation. Attached is a patch to address the one usage of lexems in a sea of lexemes.

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
new file mode 100644
index 978aa54..5305198
*** a/doc/src/sgml/textsearch.sgml
--- b/doc/src/sgml/textsearch.sgml
*** ts_rank(optional replaceable class=P
*** 867,873 ****
  <listitem>
  <para>
! Ranks vectors based on the frequency of their matching lexems.
  </para>
  </listitem>
  </varlistentry>
--- 867,873 ----
  <listitem>
  <para>
! Ranks vectors based on the frequency of their matching lexemes.
  </para>
  </listitem>
  </varlistentry>