At Tue, 28 Feb 2017 15:20:06 +0900, Michael Paquier <[email protected]> wrote in <cab7npqr49krgp6qaakal2v3hcnn+dnzv8dq_ysgbdsr6b_y...@mail.gmail.com> > On Mon, Feb 27, 2017 at 5:37 PM, Kyotaro HORIGUCHI > <[email protected]> wrote: > > At Wed, 22 Feb 2017 16:06:14 +0900, Michael Paquier > > <[email protected]> wrote in > > <cab7npqrtq+7zjxuptbsr18mxvw7mtd29mn+91n7ag8fe5ac...@mail.gmail.com> > >> In order to conduct sanity checks on the shape of the radix tree maps > >> compared to the existing maps, having map_checker surely makes sense. > >> Now in the final result I don't think we need it. The existing map > >> files ought to be replaced by their radix versions at the end, and > >> map_checker should be removed. This leads to a couple of > >> simplifications, like Makefile, and reduces the maintenance to one > >> mechanism. > > > > Hmm.. Though I don't remember clearly what the radix map of the > > first version looked like, the current radix map seems > > human-readable for me. It might be by practice or by additional > > comments in map files. Anyway I removed all of the stuff so as > > not to generate the plain maps. But I didn't change the names of > > _radix.map and just commented out the line to output the plain > > maps in UCS_to_*.pl. Combined maps are still in the plain format > > so print_tables was changed to take character tables separately > > for regular (non-combined) characters and combined characters. > > Do others have thoughts to offer on the matter? I would think that the > new radix maps should just replace by the old plain ones, and that the > only way to build the maps going forward is to use the new methods. > The radix trees is the only thing used in the backend code as well > (conv.c). We could keep the way to build the old maps, with the > map_checker in module out of the core code. FWIW, I am fine to add the > old APIs in my plugin repository on github and have the sanity checks > in that as well. And of course also publish on this thread a module to > do that.
I couldn't make out my mind to move to radix tree completely, but
UtfToLocal/LocalToUtf no longer handle the "plain map"s for
non-combined character so they have lost their planground. Okay,
I think I removed all the trace of the plain map era.
Every characters in a mapping has a comment that describes what
the character is or where it is defined. This information is no
longer useful (radix map doesn't have a plance to show it) but
I left it for debug use. (This might just be justification..)
> > - Split the property {direction} into two boolean properties
> > {to_unicode} and {from_unicode}.
> >
> > - Make the {direction} property an integer and compared with
> > defined constants $BOTH, $TO_UNICODE and $FROM_UNICODE using
> > the '=' operator.
> >
> > I choosed the former in this patch.
>
> Fine for me.
Thanks.
> >> + $charmap{ ucs2utf($src) } = $dst;
> >> + }
> >> +
> >> + }
> >> Unnecessary newline here.
> >
> > removed in convutils.pm.
> >
> > Since Makefile ignores old .map files, the steps to generate a
> > patch for map files was a bit chaged.
> >
> > $ rm *.map
> > $ make distclean maintainer-clean all
> > $ make distclean
> > $ git add .
> > $ git commit
>
> +# ignore generated files
> +/map_checker
> +/map_checker.h
> [...]
> +map_checker.h: make_mapchecker.pl $(MAPS) $(RADIXMAPS)
> + $(PERL) $<
> +
> +map_checker.o: map_checker.c map_checker.h ../char_converter.c
> +
> +map_checker: map_checker.o
> With map_checker out of the game, those things are not needed.
Ouch! Thanks for pointing out it. Removed.
> +++ b/src/backend/utils/mb/char_converter.c
> @@ -0,0 +1,116 @@
> +/*-------------------------------------------------------------------------
> + *
> + * Character converter function using radix tree
> In the simplified version of the patch, pg_mb_radix_conv() being only
> needed in conv.c I think that this could just be a static local
> routine.
>
> -#include "../../Unicode/utf8_to_koi8r.map"
> -#include "../../Unicode/koi8r_to_utf8.map"
> -#include "../../Unicode/utf8_to_koi8u.map"
> -#include "../../Unicode/koi8u_to_utf8.map"
> +#include "../../Unicode/utf8_to_koi8r_radix.map"
> +#include "../../Unicode/koi8r_to_utf8_radix.map"
> +#include "../../Unicode/utf8_to_koi8u_radix.map"
> +#include "../../Unicode/koi8u_to_utf8_radix.map"
> FWIW, I am fine to use those new names as include points.
>
> -distclean: clean
> +distclean:
> rm -f $(TEXTS)
> -maintainer-clean: distclean
> +# maintainer-clean intentionally leaves $(TEXTS)
> +maintainer-clean:
> Why is that? There is also a useless diff down that code block.
It *was* for convenience but now it is automatically downloaded
so such distinction donsn't offer anything good. Changed it to
remove $(TEXTS).
> +conv.o: conv.c char_converter.c
> This also can go away.
Touching char_converter.c will be ignored if it is removed. Did
you mistake it for map_checker?
> -print_tables("EUC_JIS_2004", \@all, 1);
> +# print_tables("EUC_JIS_2004", \@regular, undef, 1);
> +print_radix_trees("EUC_JIS_2004", \@regular);
> +print_tables("EUC_JIS_2004", undef, \@combined, 1);
> [...]
> sub print_tables
> {
> - my ($charset, $table, $verbose) = @_;
> + my ($charset, $regular, $combined, $verbose) = @_;
> print_tables is only used for combined maps, you could remove $regular
> from it and just keep $combined around, perhaps renaming print_tables
> to print_combined_maps?
Renamed to print_combied_maps.
And the code-comment pointed in the comment by the previous mail
is rewritten as Robert's suggestion.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
0001-Use-radix-tree-for-character-conversion.patch.gz
Description: Binary data
-- Sent via pgsql-hackers mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
