Buildfarm member hamerkop has a niggle about this patch:
c:\\build-farm-local\\buildroot\\head\\pgsql.build\\contrib\\fuzzystrmatch\\daitch_mokotoff.c
: warning C4819: The file contains a character that cannot be represented in
the current code page (932). Save the file in Unicode format to
On 2023-04-07 Fr 23:25, Tom Lane wrote:
I wrote:
Anyway, I assume this is just syntactic sugar for something
we can do another way? If it's at all fundamental, I'll have
to back the patch out.
On closer inspection, this script is completely devoid of any
need to deal in non-ASCII data at
I wrote:
> Anyway, I assume this is just syntactic sugar for something
> we can do another way? If it's at all fundamental, I'll have
> to back the patch out.
On closer inspection, this script is completely devoid of any
need to deal in non-ASCII data at all. So I just nuked the
"use" lines.
Andrew Dunstan writes:
> On 2023-04-07 Fr 21:52, Tom Lane wrote:
>> pipit appears to be running a reasonably current system (RHEL8), so
>> the claim that "open" is a Perl core module appears false. We need
>> to rewrite this to not use that.
> I think it is a core module (See
On 2023-04-07 Fr 21:52, Tom Lane wrote:
I wrote:
I pushed this after some mostly-cosmetic fiddling. Most of the
buildfarm seems okay with it,
Spoke too soon [1]:
make[1]: Entering directory
'/home/linux1/build-farm-16-pipit/buildroot/HEAD/pgsql.build/contrib/fuzzystrmatch'
'/usr/bin/perl'
I wrote:
> I pushed this after some mostly-cosmetic fiddling. Most of the
> buildfarm seems okay with it,
Spoke too soon [1]:
make[1]: Entering directory
'/home/linux1/build-farm-16-pipit/buildroot/HEAD/pgsql.build/contrib/fuzzystrmatch'
'/usr/bin/perl' daitch_mokotoff_header.pl
Andres Freund writes:
> On 2023-04-07 21:13:43 -0400, Tom Lane wrote:
>> I pushed this after some mostly-cosmetic fiddling. Most of the
>> buildfarm seems okay with it, but crake's perlcritic run is not:
>>
>> ./contrib/fuzzystrmatch/daitch_mokotoff_header.pl: I/O layer ":utf8" used at
>> line
Hi,
On 2023-04-07 21:13:43 -0400, Tom Lane wrote:
> I wrote:
> > That seems fine to me. I'll check this over and see if I can get
> > it pushed today.
>
> I pushed this after some mostly-cosmetic fiddling. Most of the
> buildfarm seems okay with it, but crake's perlcritic run is not:
>
>
I wrote:
> That seems fine to me. I'll check this over and see if I can get
> it pushed today.
I pushed this after some mostly-cosmetic fiddling. Most of the
buildfarm seems okay with it, but crake's perlcritic run is not:
./contrib/fuzzystrmatch/daitch_mokotoff_header.pl: I/O layer ":utf8"
Tomas Vondra writes:
> Hi, I think from the technical point of view it's sound and ready for
> commit. The patch stalled on the copyright/credit stuff, which is
> somewhat separate and mostly non-technical aspect of patches. Sorry for
> that, I'm sure it's annoying/frustrating :-(
> I see the
On 4/3/23 15:19, Dag Lem wrote:
> Dag Lem writes:
>
>> I sincerely hope this resolves any blocking issues with copyright /
>> legalese / credits.
>>
>
> Can this now be considered ready for committer, so that Paul or someone
> else can flip the bit?
>
Hi, I think from the technical point of
Dag Lem writes:
> I sincerely hope this resolves any blocking issues with copyright /
> legalese / credits.
>
Can this now be considered ready for committer, so that Paul or someone
else can flip the bit?
Best regards
Dag Lem
Dag Lem writes:
> Tomas Vondra writes:
>
>> On 2/7/23 18:08, Paul Ramsey wrote:
>>>
>>>
On Feb 7, 2023, at 6:47 AM, Dag Lem wrote:
I just went by to check the status of the patch, and I noticed that
you've added yourself as reviewer earlier - great!
Please tell
I sincerely hope this resolves any blocking issues with copyright /
legalese / credits.
Best regards
Dag Lem
diff --git a/contrib/fuzzystrmatch/Makefile b/contrib/fuzzystrmatch/Makefile
index 0704894f88..12baf2d884 100644
--- a/contrib/fuzzystrmatch/Makefile
+++ b/contrib/fuzzystrmatch/Makefile
Hi,
On 2023-02-09 10:28:36 +0100, Dag Lem wrote:
> I'll ask again, would the proposed credits be acceptable? In this case,
> the code already existed elsewhere (as in your example for double
> metaphone) as a separate extension. The copyright owner is OK with
> copyright assignment, however I
Tomas Vondra writes:
> On 2/8/23 15:31, Dag Lem wrote:
>> Alvaro Herrera writes:
>>
>>> On 2023-Jan-17, Dag Lem wrote:
>>>
+ * Daitch-Mokotoff Soundex
+ *
+ * Copyright (c) 2021 Finance Norway
+ * Author: Dag Lem
>>>
>>> Hmm, I don't think we accept copyright lines that
On 2/8/23 15:31, Dag Lem wrote:
> Alvaro Herrera writes:
>
>> On 2023-Jan-17, Dag Lem wrote:
>>
>>> + * Daitch-Mokotoff Soundex
>>> + *
>>> + * Copyright (c) 2021 Finance Norway
>>> + * Author: Dag Lem
>>
>> Hmm, I don't think we accept copyright lines that aren't "PostgreSQL
>> Global
Alvaro Herrera writes:
> On 2023-Jan-17, Dag Lem wrote:
>
>> + * Daitch-Mokotoff Soundex
>> + *
>> + * Copyright (c) 2021 Finance Norway
>> + * Author: Dag Lem
>
> Hmm, I don't think we accept copyright lines that aren't "PostgreSQL
> Global Development Group". Is it okay to use that, and
Tomas Vondra writes:
> On 2/7/23 18:08, Paul Ramsey wrote:
>>
>>
>>> On Feb 7, 2023, at 6:47 AM, Dag Lem wrote:
>>>
>>> I just went by to check the status of the patch, and I noticed that
>>> you've added yourself as reviewer earlier - great!
>>>
>>> Please tell me if there is anything I can
On 2023-Jan-17, Dag Lem wrote:
> + * Daitch-Mokotoff Soundex
> + *
> + * Copyright (c) 2021 Finance Norway
> + * Author: Dag Lem
Hmm, I don't think we accept copyright lines that aren't "PostgreSQL
Global Development Group". Is it okay to use that, and update the year
to 2023? (Note that
On 2/7/23 18:08, Paul Ramsey wrote:
>
>
>> On Feb 7, 2023, at 6:47 AM, Dag Lem wrote:
>>
>> I just went by to check the status of the patch, and I noticed that
>> you've added yourself as reviewer earlier - great!
>>
>> Please tell me if there is anything I can do to help bring this across
>>
> On Feb 7, 2023, at 6:47 AM, Dag Lem wrote:
>
> I just went by to check the status of the patch, and I noticed that
> you've added yourself as reviewer earlier - great!
>
> Please tell me if there is anything I can do to help bring this across
> the finish line.
Honestly, I had set it to
Hi Paul,
I just went by to check the status of the patch, and I noticed that
you've added yourself as reviewer earlier - great!
Please tell me if there is anything I can do to help bring this across
the finish line.
Best regards,
Dag Lem
Alvaro Herrera writes:
> On 2023-Jan-05, Dag Lem wrote:
>
>> Is there anything else I should do here, to avoid the status being
>> incorrectly stuck at "Waiting for Author" again.
>
> Just mark it Needs Review for now. I'll be back from vacation on Jan
> 11th and can have a look then (or
Paul Ramsey writes:
>> On Jan 12, 2023, at 7:30 AM, Dag Lem wrote:
>>
[...]
>>
>> Sure, I can do that. You don't think this much example text will be
>> TL;DR?
>
> I can only speak for myself, but examples are the meat of
> documentation learning, so as long as they come with enough
>
> On Jan 12, 2023, at 7:30 AM, Dag Lem wrote:
>
> Paul Ramsey writes:
>
>> On Mon, Jan 2, 2023 at 2:03 PM Dag Lem wrote:
>>
>>> I also improved on the documentation example (using Full Text Search).
>>> AFAIK you can't make general queries like that using arrays, however in
>>> any case I
Paul Ramsey writes:
> On Mon, Jan 2, 2023 at 2:03 PM Dag Lem wrote:
>
>> I also improved on the documentation example (using Full Text Search).
>> AFAIK you can't make general queries like that using arrays, however in
>> any case I must admit that text arrays seem like more natural building
>>
On Mon, Jan 2, 2023 at 2:03 PM Dag Lem wrote:
> I also improved on the documentation example (using Full Text Search).
> AFAIK you can't make general queries like that using arrays, however in
> any case I must admit that text arrays seem like more natural building
> blocks than space delimited
On 2023-Jan-05, Dag Lem wrote:
> Is there anything else I should do here, to avoid the status being
> incorrectly stuck at "Waiting for Author" again.
Just mark it Needs Review for now. I'll be back from vacation on Jan
11th and can have a look then (or somebody else can, perhaps.)
--
Álvaro
Is there anything else I should do here, to avoid the status being
incorrectly stuck at "Waiting for Author" again?
Best regards
Dag Lem
Sorry about the latest unfinished email - don't know what key
combination I managed to hit there.
Alvaro Herrera writes:
> Hello
>
> On 2022-Dec-23, Dag Lem wrote:
>
[...]
>
> So, yes, I'm proposing that we return those as array elements and that
> @> is used to match them.
>
Looking into
Alvaro Herrera writes:
> Hello
>
> On 2022-Dec-23, Dag Lem wrote:
>
[...]
> So, yes, I'm proposing that we return those as array elements and that
> @> is used to match them.
Looking into the array operators I guess that to match such arrays
directly one would actually use && (overlaps)
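To make the difference between the two array operators mentioned here concrete, the following is a minimal Python sketch of their semantics using sets. The function names and the code values are hypothetical placeholders, not real Daitch-Mokotoff output, and the sketch only mirrors the documented meaning of @> (contains) and && (overlaps) for arrays; it is not how PostgreSQL implements them.

```python
def contains(left, right):
    """Analogue of the array operator @>: every element of `right` appears in `left`."""
    return set(right) <= set(left)


def overlaps(left, right):
    """Analogue of the array operator &&: the two arrays share at least one element."""
    return not set(left).isdisjoint(right)


# Placeholder code lists for two names (made-up values, for illustration only).
codes_a = ["111111", "222222"]
codes_b = ["222222", "333333"]

# The names share one alternative code, so they "sound alike" under &&,
# but neither list contains all of the other's codes, so @> does not match.
print(overlaps(codes_a, codes_b))  # True
print(contains(codes_a, codes_b))  # False
```

Under this reading, matching two names that each produce several alternative codes would rely on overlap, whereas containment fits the case where one side is a single code being searched for.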
Hello
On 2022-Dec-23, Dag Lem wrote:
> It seems to me like you're trying to use soundex coding for something it
> was never designed for.
I'm not trying to use it for anything, actually. I'm just reading the
pages your patch links to, to try and understand how this algorithm can
be best
Dag Lem writes:
> Alvaro Herrera writes:
>
>> On 2022-Dec-23, Alvaro Herrera wrote:
>>
>
> [...]
>
>> I tried downloading a list of surnames from here
>> https://www.bibliotecadenombres.com/apellidos/apellidos-espanoles/
>> pasted that in a text file and \copy'ed it into a table. Then I ran
>>
Alvaro Herrera writes:
> On 2022-Dec-23, Alvaro Herrera wrote:
>
[...]
> I tried downloading a list of surnames from here
> https://www.bibliotecadenombres.com/apellidos/apellidos-espanoles/
> pasted that in a text file and \copy'ed it into a table. Then I ran
> this query
>
> select
Alvaro Herrera writes:
> I wonder why you have it return the multiple alternative codes as a
> space-separated string. Maybe an array would be more appropriate. Even
> on your documented example use, the first thing you do is split it on
> spaces.
In the example, the *input* is split on
Tom Lane writes:
> Alvaro Herrera writes:
>> On 2022-Dec-22, Dag Lem wrote:
>>> This should hopefully fix the last Cfbot failures, by exclusion of
>>> daitch_mokotoff.h from headerscheck and cpluspluscheck.
>
>> Hmm, maybe it'd be better to move the typedefs to the .h file instead.
>
> Indeed,
Andres Freund writes:
> On 2022-12-22 14:27:54 +0100, Dag Lem wrote:
>> This should hopefully fix the last Cfbot failures, by exclusion of
>> daitch_mokotoff.h from headerscheck and cpluspluscheck.
>
> Btw, you can do the same tests as cfbot in your own repo by enabling CI
> in a github repo.
Alvaro Herrera writes:
> On 2022-Dec-22, Dag Lem wrote:
>> This should hopefully fix the last Cfbot failures, by exclusion of
>> daitch_mokotoff.h from headerscheck and cpluspluscheck.
> Hmm, maybe it'd be better to move the typedefs to the .h file instead.
Indeed, that sounds like exactly the
On 2022-Dec-23, Alvaro Herrera wrote:
> I wonder why you have it return the multiple alternative codes as a
> space-separated string. Maybe an array would be more appropriate. Even
> on your documented example use, the first thing you do is split it on
> spaces.
I tried downloading a list
I wonder why you have it return the multiple alternative codes as a
space-separated string. Maybe an array would be more appropriate. Even
on your documented example use, the first thing you do is split it on
spaces.
--
Álvaro Herrera PostgreSQL Developer —
On 2022-Dec-22, Dag Lem wrote:
> This should hopefully fix the last Cfbot failures, by exclusion of
> daitch_mokotoff.h from headerscheck and cpluspluscheck.
Hmm, maybe it'd be better to move the typedefs to the .h file instead.
--
Álvaro Herrera PostgreSQL Developer —
On 2022-12-22 14:27:54 +0100, Dag Lem wrote:
> This should hopefully fix the last Cfbot failures, by exclusion of
> daitch_mokotoff.h from headerscheck and cpluspluscheck.
Btw, you can do the same tests as cfbot in your own repo by enabling CI
in a github repo. See src/tools/ci/README
Dag Lem writes:
> Hi Ian,
>
> Ian Lawrence Barwick writes:
>
[...]
>> I see you provided some feedback on
>> https://commitfest.postgresql.org/36/3468/,
>> though the patch seems to have not been accepted (but not
>> conclusively rejected
>> either). If you still have the chance to review
Dag Lem writes:
> I noticed that the Meson builds failed in Cfbot; the updated patch adds
> a missing "include_directories" line to meson.build.
>
This should hopefully fix the last Cfbot failures, by exclusion of
daitch_mokotoff.h from headerscheck and cpluspluscheck.
Best regards
Dag Lem
I noticed that the Meson builds failed in Cfbot; the updated patch adds
a missing "include_directories" line to meson.build.
Best regards
Dag Lem
diff --git a/contrib/fuzzystrmatch/Makefile b/contrib/fuzzystrmatch/Makefile
index 0704894f88..12baf2d884 100644
--- a/contrib/fuzzystrmatch/Makefile
Hi Andres,
Thank you for your detailed and constructive review!
I have made a conscientious effort to address all the issues you point
out, please see comments below.
Andres Freund writes:
> Hi,
>
> On 2022-02-03 15:27:32 +0100, Dag Lem wrote:
[...]
> [23:43:34.796]
Hi,
On 2022-02-03 15:27:32 +0100, Dag Lem wrote:
> Just some minor adjustments to the patch:
>
> * Removed call to locale-dependent toupper()
> * Cleaned up input normalization
This patch currently fails in cfbot, likely because meson.build needs to be
adjusted (this didn't exist at the time
Hi Ian,
Ian Lawrence Barwick writes:
> Hi Dag
>
> On Thu, Feb 3, 2022 at 23:27, Dag Lem wrote:
>>
>> Hi,
>>
>> Just some minor adjustments to the patch:
>>
>> * Removed call to locale-dependent toupper()
>> * Cleaned up input normalization
>
> This patch was marked as "Waiting on Author" in the CommitFest
Hi Dag
On Thu, Feb 3, 2022 at 23:27, Dag Lem wrote:
>
> Hi,
>
> Just some minor adjustments to the patch:
>
> * Removed call to locale-dependent toupper()
> * Cleaned up input normalization
This patch was marked as "Waiting on Author" in the CommitFest entry [1]
but I see you provided an updated version which
Hi,
Just some minor adjustments to the patch:
* Removed call to locale-dependent toupper()
* Cleaned up input normalization
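As a rough illustration of why a locale-dependent toupper() is a concern for input normalization, here is a minimal Python sketch of ASCII-only uppercasing. The name ascii_upper is hypothetical, and the patch's actual C replacement is not shown in this message; the sketch only demonstrates the general idea of a case conversion whose result does not depend on locale or Unicode case-mapping rules.

```python
def ascii_upper(text: str) -> str:
    """Uppercase only the ASCII letters a-z; leave every other character untouched."""
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in text
    )


name = "Stra\u00dfburg"  # 'Straßburg'

# ASCII-only conversion keeps the sharp s as it is.
print(ascii_upper(name))  # STRAßBURG

# Python's built-in upper() applies full Unicode case mapping instead,
# which expands the sharp s to "SS" and changes the string length.
print(name.upper())  # STRASSBURG
```

A conversion like this always produces the same result for the same input, which is what a coding function needs if its output is to be stable across installations.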
I have been asked to sign up to review a commitfest patch or patches -
unfortunately I've been ill with COVID-19 and it's not until now that
I feel well enough to have a
Dag Lem writes:
> Thomas Munro writes:
>
>> On Wed, Jan 5, 2022 at 2:49 AM Dag Lem wrote:
>>> However I guess this won't make any difference wrt. actually running the
>>> tests, as long as there seems to be an encoding problem in the cfbot
>>
>> Fixed -- I told it to pull down patches as
Dag Lem writes:
> Please find attached a patch to run the previously commented-out
> UTF8-dependent tests for citext, according to best practice. For now I
> don't dare to touch the unaccent module, which seems to be UTF8-only
> anyway.
I tried this on a bunch of different locale settings and
Tom Lane writes:
> Dag Lem writes:
>
>> Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that
>> two other modules are using UTF8 characters in tests - citext and
>> unaccent.
>
> Yeah, neither of those have been upgraded to said best practice.
> (If you feel like doing the
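The ack search quoted above looks for any byte in the range 0x80-0xff. A rough Python equivalent is sketched below for readers without ack; the function name files_with_non_ascii is hypothetical, and unlike ack this version does not skip binary or ignored files, so it is only an approximation of that command.

```python
import re
from pathlib import Path

# Any byte with the high bit set, i.e. anything outside 7-bit ASCII.
NON_ASCII = re.compile(rb"[\x80-\xff]")


def files_with_non_ascii(root):
    """Return the sorted paths of regular files under `root` that contain a non-ASCII byte."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and NON_ASCII.search(path.read_bytes()):
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Assumes the script is run from the top of a source tree that has a contrib/ directory.
    for hit in files_with_non_ascii("contrib"):
        print(hit)
```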
Thomas Munro writes:
> On Wed, Jan 5, 2022 at 2:49 AM Dag Lem wrote:
>> However I guess this won't make any difference wrt. actually running the
>> tests, as long as there seems to be an encoding problem in the cfbot
>
> Fixed -- I told it to pull down patches as binary, not text. Now it
>
On Wed, Jan 5, 2022 at 2:49 AM Dag Lem wrote:
> However I guess this won't make any difference wrt. actually running the
> tests, as long as there seems to be an encoding problem in the cfbot
Fixed -- I told it to pull down patches as binary, not text. Now it
makes commits that look healthier,
Andres Freund writes:
> Hi,
>
> On 2022-01-02 21:41:53 -0500, Tom Lane wrote:
>> ... so, that test case is guaranteed to fail in non-UTF8 encodings,
>> I suppose? I wonder what the LANG environment is in that cfbot
>> instance.
>
> LANG="en_US.UTF-8"
>
> But it looks to me like the problem is
Hi,
On 2022-01-02 21:41:53 -0500, Tom Lane wrote:
> ... so, that test case is guaranteed to fail in non-UTF8 encodings,
> I suppose? I wonder what the LANG environment is in that cfbot
> instance.
LANG="en_US.UTF-8"
But it looks to me like the problem is in the commit cfbot creates, rather
Dag Lem writes:
> Tom Lane writes:
>> (We do have methods for dealing with non-ASCII test cases, but
>> I can't see that this patch is using any of them.)
> I naively assumed that tests would be run in a UTF8 environment.
Nope, not necessarily.
Our current best practice for this is to
Tom Lane writes:
> Thomas Munro writes:
>> Erm, it looks like something weird is happening somewhere in cfbot's
>> pipeline, because Dag's patch says:
>
>> +SELECT daitch_mokotoff('Straßburg');
>> + daitch_mokotoff
>> +-
>> + 294795
>> +(1 row)
>
> ... so, that test case is
Thomas Munro writes:
> Erm, it looks like something weird is happening somewhere in cfbot's
> pipeline, because Dag's patch says:
> +SELECT daitch_mokotoff('Straßburg');
> + daitch_mokotoff
> +-
> + 294795
> +(1 row)
... so, that test case is guaranteed to fail in non-UTF8
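The byte-level reason a UTF-8 test file cannot be read the same way under a single-byte encoding can be sketched in a few lines of Python. This only illustrates the encoding mismatch for the test string quoted above; it makes no claim about how the regression test framework or cfbot actually handles the file.

```python
text = "Stra\u00dfburg"  # 'Straßburg', as used in the quoted test case

utf8_bytes = text.encode("utf-8")      # sharp s becomes the two bytes 0xC3 0x9F
latin1_bytes = text.encode("latin-1")  # sharp s is the single byte 0xDF

print(len(utf8_bytes))    # 10
print(len(latin1_bytes))  # 9

# Reading the UTF-8 bytes as if they were Latin-1 turns the one character
# into two unrelated ones, so the function under test sees different input.
misread = utf8_bytes.decode("latin-1")
print(misread == text)  # False
print(len(misread))     # 10
```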
On Mon, Jan 3, 2022 at 10:32 AM Andres Freund wrote:
> On 2021-12-21 22:41:18 +0100, Dag Lem wrote:
> > This is my very first code contribution to PostgreSQL, and I would be
> > grateful for any advice on how to proceed in order to get the patch
> > accepted.
>
> Currently the tests don't seem to
On 2021-12-21 22:41:18 +0100, Dag Lem wrote:
> This is my very first code contribution to PostgreSQL, and I would be
> grateful for any advice on how to proceed in order to get the patch
> accepted.
Currently the tests don't seem to pass on any platform:
Hello again,
It turns out that there actually exists an(other) implementation of
the Daitch-Mokotoff Soundex System which gets it right; the JOS
Soundex Calculator at https://www.jewishgen.org/jos/jossound.htm
Other implementations I have been able to find, like the one in Apache
Commons Codec
Tomas Vondra writes:
[...]
>
> Thanks, looks interesting. A couple generic comments, based on a quick
> code review.
Thank you for the constructive review!
>
> 1) Can the extension be marked as trusted, just like fuzzystrmatch?
I have now moved the daitch_mokotoff function into the
On 12/13/21 16:05, Andrew Dunstan wrote:
On 12/13/21 09:26, Tomas Vondra wrote:
On 12/13/21 14:38, Dag Lem wrote:
Please find attached an updated patch, with the following fixes:
* Replaced remaining malloc/free with palloc/pfree.
* Made "make check" pass.
* Updated notes on other
On 12/13/21 09:26, Tomas Vondra wrote:
> On 12/13/21 14:38, Dag Lem wrote:
>> Please find attached an updated patch, with the following fixes:
>>
>> * Replaced remaining malloc/free with palloc/pfree.
>> * Made "make check" pass.
>> * Updated notes on other implementations.
>>
>
> Thanks, looks
On 12/13/21 14:38, Dag Lem wrote:
Please find attached an updated patch, with the following fixes:
* Replaced remaining malloc/free with palloc/pfree.
* Made "make check" pass.
* Updated notes on other implementations.
Thanks, looks interesting. A couple generic comments, based on a quick
+SELECT daitch_mokotoff('OBr''ien');
+SELECT daitch_mokotoff('OBri''en');
+SELECT daitch_mokotoff('OBrie''n');
+SELECT daitch_mokotoff('OBrien''');
+
+-- testEncodeIgnoreHyphens
+SELECT daitch_mokotoff('KINGSMITH');
+SELECT daitch_mokotoff('-KINGSMITH');
+SELECT daitch_mokotoff('K-INGSMITH');
+SELECT daitch_mokotoff
Hello,
Please find attached a patch for the daitch_mokotoff module.
This implements the Daitch-Mokotoff Soundex System, as described in
https://www.avotaynu.com/soundex.htm
The module is used in production at Finance Norway.
In order to verify correctness, I have compared generated soundex