Re: "ICU - International Components for Unicode"

2020-09-29 Thread Matthew Stuckwisch
In #raku it was mentioned that it would be nice to have a $*UNICODE variable of 
sorts that reports back the version, but I'm not sure how that would work from 
an implementation POV.
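
For comparison only, here is how another language surfaces the same information. This is Python, not Raku, and it says nothing about how Rakudo or MoarVM would implement such a variable; the helper name unicode_version is made up for illustration. Python's standard library records the UCD version it was built against in a module-level constant, which is roughly the shape a $*UNICODE-style variable could take:

```python
import unicodedata


def unicode_version() -> tuple:
    """Return the UCD version bundled with this Python as a tuple of ints."""
    # unidata_version is a dotted string (for example "13.0.0") fixed at build time.
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


if __name__ == "__main__":
    print("Unicode", ".".join(str(part) for part in unicode_version()))
```

Returning a tuple rather than a string makes comparisons such as "at least Unicode 12.1" a plain tuple comparison.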

I'm also late to the discussion, so pardon me jumping back a bit.  Basically, 
ICU is something that lets you quickly add in robust Unicode support.  But it's 
also a Swiss Army knife and overkill for what Raku generally needs (at 
whichever level it's implemented in), and also limiting in some ways because 
you become beholden to its structures, which, as Samantha pointed out, don't 
work with MoarVM's approach.  Rolling your own has a lot of advantages.

Beyond the UCD and the UCA (sorting), everything else really should go into 
module land since it's heavily based on an ever-changing and growing CLDR, and 
even then, there are good arguments for putting sorting in module space too.  
For reasons like performance, code clarity, data size, etc., companies have 
rolled their own ICU-like libraries (Google's Closure for JS, TwitterCLDR in 
Ruby, etc.) running on the same CLDR data.  In Raku (shameless self-plug), a lot 
is already available in the Intl namespace.  There are actually some very cool 
things that can be done mixing CLDR and Raku, like creating new 
character-class-like tokens, or even extending built-ins; they just don't have 
any business being near core, just... core-like :-)
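
To make the sorting point concrete, here is a minimal Python sketch (Python rather than Raku, purely as an illustration) of why plain code point order is not enough. The function toy_collation_key is a made-up stand-in: it is not the Unicode Collation Algorithm and uses no CLDR locale data; it only strips accents and case so the difference from code point order is visible.

```python
import unicodedata


def toy_collation_key(word: str) -> str:
    """Crude stand-in for a collation key: drop combining marks, ignore case.

    This is NOT the Unicode Collation Algorithm and knows nothing about
    locales; it only shows why code point order is not enough.
    """
    decomposed = unicodedata.normalize("NFD", word)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


# "\u00c9mile" is "Emile" with a precomposed capital E-acute.
words = ["zebra", "\u00c9mile", "apple", "Zoo", "eclair"]

by_codepoint = sorted(words)                      # uppercase first, accented last
by_toy_key = sorted(words, key=toy_collation_key)  # closer to what a reader expects

print(by_codepoint)
print(by_toy_key)
```

Real collation also has to handle locale-specific tailoring (which is where CLDR comes in), so a module doing this properly carries far more data than this toy key suggests.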

Matéu


PS: For understanding some of Samantha's incredible work, her talks at the 
Amsterdam convention are really great, and Perl Weekly has an archive of her 
grant write-ups:
  Articles: https://perlweekly.com/a/samantha-mcvey.html
  High End Unicode in Perl 6: https://www.youtube.com/watch?v=Oj_lgf7A2LM
  Unicode Internals of Perl 6: https://www.youtube.com/watch?v=9Vv7nUUDdeA
  

> On Sep 29, 2020, at 3:14 PM, William Michels via perl6-users 
>  wrote:
> 
> Thank you, Samantha!
> 
> An outstanding question is one posed by Joseph Brenner--that
> is--knowing which version of the Unicode standard is supported by
> Raku. I grepped through two files, one called "unicode.c" and the
> other called "unicode_db.c". They're both located in rakudo at:
> /rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/ .
> 
> Below are the first 4 lines of my grep results. As you can see
> (above/below), rakudo-2020.06 supports Unicode12.1.0:
> 
> ~$ raku -ne '.say if .grep(/unicode/)' ~/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/unicode_db.c
> # For terms of use, see http://www.unicode.org/terms_of_use.html
> # The UAXes can be accessed at http://www.unicode.org/versions/Unicode12.1.0/
> From http://unicode.org/copyright.html#Exhibit1 on 2017-11-28:
> Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
> 
> 
> It would be really interesting to follow your Unicode work, Samantha.
> The ideas you propose are interesting and everyone hopes for speed
> improvements. Is there any place Raku-uns can go to read
> updates--maybe a grant report, blog, or Github issue? Or maybe right
> here, on the Perl6-Users mailing list? Thanks in advance.
> 
> Best, Bill.
> 
> W. Michels, Ph.D.
> 
> 
> 
> On Sun, Sep 27, 2020 at 4:03 AM Samantha McVey  wrote:
>> 
>> So MoarVM uses its own database of the UCD. One nice thing is this can
>> probably be faster than calling to the ICU to look up information of each
>> codepoint in a long string. Secondly it implements its own text data
>> structures, so the nice features of ICU to do that would be difficult to
>> use.
>> 
>> In my opinion, it could make sense to use ICU for things like localized
>> collation (sorting). It also could make sense to use ICU for unicode
>> properties lookup for properties that don't have to do with grapheme
>> segmentation or casing. This would be a lot of work but if something like 
>> this
>> were implemented it would probably happen in the context of a larger
>> rethinking of how we use unicode. Though everything is complicated by that we
>> support lots of complicated regular expressions on different unicode
>> properties. I guess first I'd start by benchmarking the speed of ICU and
>> comparing to the current implementation.
>> 
>> 


Re: "ICU - International Components for Unicode"

2020-09-29 Thread William Michels via perl6-users
Thank you, Samantha!

An outstanding question is one posed by Joseph Brenner--that
is--knowing which version of the Unicode standard is supported by
Raku. I grepped through two files, one called "unicode.c" and the
other called "unicode_db.c". They're both located in rakudo at:
/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/ .

Below are the first 4 lines of my grep results. As you can see
(above/below), rakudo-2020.06 supports Unicode 12.1.0:

~$ raku -ne '.say if .grep(/unicode/)' ~/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/unicode_db.c
# For terms of use, see http://www.unicode.org/terms_of_use.html
# The UAXes can be accessed at http://www.unicode.org/versions/Unicode12.1.0/
From http://unicode.org/copyright.html#Exhibit1 on 2017-11-28:
Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
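
The same check can be scripted. The sketch below is in Python rather than Raku, and it assumes the header comment keeps the ".../versions/UnicodeX.Y.Z/" URL form seen in the grep output above; the function name ucd_version_from_header is made up for illustration, and this is not an official way to query the version.

```python
import re
from typing import Optional

# Matches the version embedded in URLs like .../versions/Unicode12.1.0/
VERSION_URL = re.compile(r"unicode\.org/versions/Unicode(\d+\.\d+\.\d+)/")


def ucd_version_from_header(text: str) -> Optional[str]:
    """Return the first Unicode version found in a UCD-style header, or None."""
    match = VERSION_URL.search(text)
    return match.group(1) if match else None


# Two of the lines from the grep output, used as sample input.
sample = (
    "# For terms of use, see http://www.unicode.org/terms_of_use.html\n"
    "# The UAXes can be accessed at http://www.unicode.org/versions/Unicode12.1.0/\n"
)

print(ucd_version_from_header(sample))  # prints 12.1.0
```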


It would be really interesting to follow your Unicode work, Samantha.
The ideas you propose are interesting and everyone hopes for speed
improvements. Is there any place Raku-uns can go to read
updates--maybe a grant report, blog, or Github issue? Or maybe right
here, on the Perl6-Users mailing list? Thanks in advance.

Best, Bill.

W. Michels, Ph.D.



On Sun, Sep 27, 2020 at 4:03 AM Samantha McVey  wrote:
>
> So MoarVM uses its own database of the UCD. One nice thing is this can
> probably be faster than calling to the ICU to look up information of each
> codepoint in a long string. Secondly it implements its own text data
> structures, so the nice features of ICU to do that would be difficult to
> use.
>
> In my opinion, it could make sense to use ICU for things like localized
> collation (sorting). It also could make sense to use ICU for unicode
> properties lookup for properties that don't have to do with grapheme
> segmentation or casing. This would be a lot of work but if something like this
> were implemented it would probably happen in the context of a larger
> rethinking of how we use unicode. Though everything is complicated by that we
> support lots of complicated regular expressions on different unicode
> properties. I guess first I'd start by benchmarking the speed of ICU and
> comparing to the current implementation.
>
>


Re: "ICU - International Components for Unicode"

2020-09-27 Thread Samantha McVey
So MoarVM uses its own database of the UCD. One nice thing is this can 
probably be faster than calling into ICU to look up information on each 
codepoint in a long string. Secondly, it implements its own text data 
structures, so the nice features ICU has for that would be difficult to 
use.

In my opinion, it could make sense to use ICU for things like localized 
collation (sorting). It also could make sense to use ICU for unicode 
properties lookup for properties that don't have to do with grapheme 
segmentation or casing. This would be a lot of work but if something like this 
were implemented it would probably happen in the context of a larger 
rethinking of how we use Unicode. Though everything is complicated by the fact that we 
support lots of complicated regular expressions on different unicode 
properties. I guess first I'd start by benchmarking the speed of ICU and 
comparing to the current implementation.


Re: "ICU - International Components for Unicode"

2020-09-25 Thread Patrick R. Michaud
On Fri, Sep 25, 2020 at 12:37:49PM +0200, Elizabeth Mattijsen wrote:
> > On 25 Sep 2020, at 04:25, Brad Gilbert  wrote:
> > Rakudo does not use ICU
> > 
> > It used to though.
> > 
> > Rakudo used to run on Parrot.
> > Parrot used ICU for its Unicode features.
> 
> I do remember that in the Parrot days, any non-ASCII character in 
> any string, would have a significant negative effect on grammar parsing.  
> This was usually not that visible when trying to run a script, but the 
> time needed to compile the core setting (which already took a few minutes 
> then) rose (probably exponentially) to: well, I don't know.  

Part of this is because Parrot/ICU was using UTF-8 and/or UTF-16 to
encode non-ASCII strings.  As a result, indexing into a string often 
became an O(n) operation instead of O(1).  For short strings, no problem;
for long strings (such as the core setting) it was really painful.

We did work on some ways in Parrot/NQP to reduce the amount of string
scanning involved, such as caching certain index-points in the string, 
but it was always a bit of a hack.  Switching to a fixed-width encoding
(NFG, which MoarVM implements) was definitely the correct path to take
there.
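
The O(n) versus O(1) difference can be sketched in a few lines of Python (an illustration only, not Parrot or MoarVM code; utf8_index is a made-up helper). Note that the fixed-width array here holds one code point per slot, whereas NFG works on graphemes and is more involved; the indexing argument is the same.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 `data` by scanning from the start: O(n)."""
    if n < 0:
        raise IndexError(n)
    seen = -1      # index of the most recent code point whose lead byte we passed
    start = None   # byte offset where code point n begins
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:   # not a continuation byte, so a code point starts here
            seen += 1
            if seen == n:
                start = pos
            elif seen == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")   # code point n was the last one


text = "a\u00e9\u20acz"                 # code points taking 1, 2, 3 and 1 bytes
encoded = text.encode("utf-8")          # variable width: 7 bytes for 4 code points
fixed_width = [ord(ch) for ch in text]  # fixed width: one slot per code point

print(utf8_index(encoded, 2))   # has to walk the bytes from the start
print(chr(fixed_width[2]))      # direct array access, same character
```

Caching index points, as mentioned above, amounts to remembering a few (code point index, byte offset) pairs so the scan can start from the nearest one instead of from byte 0.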

Pm


Re: "ICU - International Components for Unicode"

2020-09-25 Thread Elizabeth Mattijsen
> On 25 Sep 2020, at 04:25, Brad Gilbert  wrote:
> Rakudo does not use ICU
> 
> It used to though.
> 
> Rakudo used to run on Parrot.
> Parrot used ICU for its Unicode features.

Ah, the days.

I do remember that in the Parrot days, any non-ASCII character in any string 
would have a significant negative effect on grammar parsing.  This was usually 
not that visible when trying to run a script, but the time needed to compile 
the core setting (which already took a few minutes then) rose (probably 
exponentially) to: well, I don't know.  The last time I tried to see if it 
would actually complete, I killed the compilation process after an hour.

People complain about compilation taking long these days, and they're right: it 
should be better.  But still, compared to the Parrot days...  it's orders of 
magnitude better now.


Liz

Re: "ICU - International Components for Unicode"

2020-09-24 Thread Brad Gilbert
Rakudo does not use ICU

It used to though.

Rakudo used to run on Parrot.
Parrot used ICU for its Unicode features.

(Well maybe the JVM backend does currently, I don't actually know.)

MoarVM just has Unicode as one of its features.
Basically it has something similar to ICU already.

---

The purpose of ICU is to be able to add Unicode abilities to systems that
don't already have them.

As such, it does not really make sense to add support for the ICU library
in Raku as I don't think it adds anything that isn't already present.

If there is some feature that ICU has that Raku doesn't then it would make
more sense to add that feature directly to Raku itself.

On Thu, Sep 24, 2020 at 2:15 PM William Michels via perl6-users <
perl6-users@perl.org> wrote:

> Thanks everyone for the replies. I guess the two questions I have
> pertain mainly to 1) lineage and 2) versioning:
>
> Regarding lineage, I'm interested in knowing if
> Pugs/Parrot/Niecza/STD/Perlito/viv/JVM/Rakudo ever used the ICU
> Libraries--even if now that data has been extracted into a Raku-native
> data structure. I'm fairly certain one principal Rakudo developer is a
> C++ expert, so this idea isn't too far-fetched.
>
> Regarding versioning, it would be great to tell people that Raku
> conforms to the latest-and-greatest ICU Library version, currently
> sitting at version# ICU_67. That way when people are weighing Raku vs
> Ruby or Python or Haskell or Go, we can tell them "Raku v6.d extracts
> ICU_67 thus it conforms to the most current (and most widely accepted)
> Unicode Library release (ICU 67 / CLDR 37 locale data / Unicode 13)."
> I've read over Daniel's blog post but I don't recall explicit mention
> of Unicode version 12, or 13, etc., although it does seem that
> following his links takes you to references for Unicode 13.0.0 (see
> https://www.unicode.org/reports/tr44/). Does Rakudo roll its own UCD?
> Is there no reliance on ICU?
>
> Anyway, if Daniel or Samantha or Joseph or Liz can confirm/refute
> Raku's use of the (widely-adopted) ICU C-Library and/or Java-Library,
> I will have learned something.
>
> Thanks, Bill.
>
> http://site.icu-project.org/download/67
>
> "ICU 67 updates to CLDR 37 locale data with many additions and
> corrections. This release also includes the updates to Unicode 13,
> subsuming the special CLDR 36.1 and ICU 66 releases. ICU 67 includes
> many bug fixes for date and number formatting, including enhanced
> support for user preferences in the locale identifier. The
> LocaleMatcher code and data are improved, and number skeletons have a
> new “concise” form that can be used in MessageFormat strings. This is
> the first regular release after ICU 65. ICU 66 was a low-impact
> release with just Unicode 13 and a few bug fixes."
>
>
> Library/Language support for ICU:
>
> Objective C CocoaICU A set of Objective-C classes that encapsulate parts
> of ICU.
> C# GenICUWrapper A tool that generates a rudimentary C# wrapper around
> the C API of ICU4C. This could be used to generate headers for other
> ICU wrappers.
> C# ICU Dotnet - .NET bindings for ICU
> D Mango.icu is a set of wrappers for the D programming language
> Erlang icu4e is a set of bindings for Erlang to ICU4C
> Cobol COBOL A page on how ICU could be used from a COBOL application.
> Go icu4go provides a Go binding for the icu4c library
> Haskell Data.Text.ICU Haskell bindings for ICU4C.
> Lua ICU-Lua ICU for the Lua language
> Pascal ICU4PAS An Object Pascal wrapper around ICU4C.
> Perl PICU Perl wrapper for ICU
> PHP PHP intl A PHP wrapper around core ICU4C APIs.
> Python PyICU A Python extension wrapper around ICU4C.
> R stringi An R language wrapper for ICU4C.
> Ruby icu4r ICU4C binding for MRI ruby.
> Smalltalk VA Smalltalk Wrappers
> Parrot Virtual Machine This is a virtual machine for Perl 6 and other
> various programming languages. ICU4C is used to improve the Unicode
> support.
> PHP The upcoming PHP 6 language is expected to support Unicode through
> ICU4C.
>
> Companies and Organizations using ICU:
>
> ABAS Software, Adobe, Amazon (Kindle), Amdocs, Apache, Appian, Apple,
> Argonne National Laboratory, Avaya, BAE Systems Geospatial
> eXploitation Products, BEA, BluePhoenix Solutions, BMC Software,
> Boost, BroadJump, Business Objects, caris, CERN, CouchDB, Debian
> Linux, Dell, Eclipse, eBay, EMC Corporation, ESRI, Facebook (HHVM),
> Firebird RDBMS, FreeBSD, Gentoo Linux, Google, GroundWork Open Source,
> GTK+, Harman/Becker Automotive Systems GmbH, HP, Hyperion, IBM,
> Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS,
> Jikes, Library of Congress, LibreOffice, Mathworks, Microsoft,
> Mozilla, Netezza, Node.js, Oracle (Solaris, Java), Lawson Software,
> Leica Geosystems GIS & Mapping LLC, Mandrake Linux, OCLC, Progress
> Software, Python, QNX, Rogue Wave, SAP, SIL, SPSS, Software AG, SuSE,
> Sybase, Symantec, Teradata (NCR), ToolAware, Trend Micro, Virage,
> webMethods, Wine, WMS Gaming, XyEnterprise, Yahoo!, Vuo, and 

Re: "ICU - International Components for Unicode"

2020-09-24 Thread Joseph Brenner
I think more to the point is which version of Unicode is supported,
rather than the ICU libraries.   It might be worth writing some tests
that check that Raku's unicode handling matches the ICU libraries.

On 9/24/20, William Michels  wrote:
> Thanks everyone for the replies. I guess the two questions I have
> pertain mainly to 1) lineage and 2) versioning:
>
> Regarding lineage, I'm interested in knowing if
> Pugs/Parrot/Niecza/STD/Perlito/viv/JVM/Rakudo ever used the ICU
> Libraries--even if now that data has been extracted into a Raku-native
> data structure. I'm fairly certain one principal Rakudo developer is a
> C++ expert, so this idea isn't too far-fetched.
>
> Regarding versioning, it would be great to tell people that Raku
> conforms to the latest-and-greatest ICU Library version, currently
> sitting at version# ICU_67. That way when people are weighing Raku vs
> Ruby or Python or Haskell or Go, we can tell them "Raku v6.d extracts
> ICU_67 thus it conforms to the most current (and most widely accepted)
> Unicode Library release (ICU 67 / CLDR 37 locale data / Unicode 13)."
> I've read over Daniel's blog post but I don't recall explicit mention
> of Unicode version 12, or 13, etc., although it does seem that
> following his links takes you to references for Unicode 13.0.0 (see
> https://www.unicode.org/reports/tr44/). Does Rakudo roll its own UCD?
> Is there no reliance on ICU?
>
> Anyway, if Daniel or Samantha or Joseph or Liz can confirm/refute
> Raku's use of the (widely-adopted) ICU C-Library and/or Java-Library,
> I will have learned something.
>
> Thanks, Bill.
>
> http://site.icu-project.org/download/67
>
> "ICU 67 updates to CLDR 37 locale data with many additions and
> corrections. This release also includes the updates to Unicode 13,
> subsuming the special CLDR 36.1 and ICU 66 releases. ICU 67 includes
> many bug fixes for date and number formatting, including enhanced
> support for user preferences in the locale identifier. The
> LocaleMatcher code and data are improved, and number skeletons have a
> new “concise” form that can be used in MessageFormat strings. This is
> the first regular release after ICU 65. ICU 66 was a low-impact
> release with just Unicode 13 and a few bug fixes."
>
>
> Library/Language support for ICU:
>
> Objective C CocoaICU A set of Objective-C classes that encapsulate parts of
> ICU.
> C# GenICUWrapper A tool that generates a rudimentary C# wrapper around
> the C API of ICU4C. This could be used to generate headers for other
> ICU wrappers.
> C# ICU Dotnet - .NET bindings for ICU
> D Mango.icu is a set of wrappers for the D programming language
> Erlang icu4e is a set of bindings for Erlang to ICU4C
> Cobol COBOL A page on how ICU could be used from a COBOL application.
> Go icu4go provides a Go binding for the icu4c library
> Haskell Data.Text.ICU Haskell bindings for ICU4C.
> Lua ICU-Lua ICU for the Lua language
> Pascal ICU4PAS An Object Pascal wrapper around ICU4C.
> Perl PICU Perl wrapper for ICU
> PHP PHP intl A PHP wrapper around core ICU4C APIs.
> Python PyICU A Python extension wrapper around ICU4C.
> R stringi An R language wrapper for ICU4C.
> Ruby icu4r ICU4C binding for MRI ruby.
> Smalltalk VA Smalltalk Wrappers
> Parrot Virtual Machine This is a virtual machine for Perl 6 and other
> various programming languages. ICU4C is used to improve the Unicode
> support.
> PHP The upcoming PHP 6 language is expected to support Unicode through
> ICU4C.
>
> Companies and Organizations using ICU:
>
> ABAS Software, Adobe, Amazon (Kindle), Amdocs, Apache, Appian, Apple,
> Argonne National Laboratory, Avaya, BAE Systems Geospatial
> eXploitation Products, BEA, BluePhoenix Solutions, BMC Software,
> Boost, BroadJump, Business Objects, caris, CERN, CouchDB, Debian
> Linux, Dell, Eclipse, eBay, EMC Corporation, ESRI, Facebook (HHVM),
> Firebird RDBMS, FreeBSD, Gentoo Linux, Google, GroundWork Open Source,
> GTK+, Harman/Becker Automotive Systems GmbH, HP, Hyperion, IBM,
> Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS,
> Jikes, Library of Congress, LibreOffice, Mathworks, Microsoft,
> Mozilla, Netezza, Node.js, Oracle (Solaris, Java), Lawson Software,
> Leica Geosystems GIS & Mapping LLC, Mandrake Linux, OCLC, Progress
> Software, Python, QNX, Rogue Wave, SAP, SIL, SPSS, Software AG, SuSE,
> Sybase, Symantec, Teradata (NCR), ToolAware, Trend Micro, Virage,
> webMethods, Wine, WMS Gaming, XyEnterprise, Yahoo!, Vuo, and many
> others.
>
>
> On Thu, Sep 24, 2020 at 11:14 AM Joseph Brenner  wrote:
>>
>> Elizabeth Mattijsen  wrote:
>> > https://www.codesections.com/blog/raku-unicode/
>>
>> Thanks, yes I was just reading through that.  It makes it clear that
>> the "Unicode Character Database" is built-in to the MoarVM, but I'm
>> not that clear what the ICU libraries do for you, and I thought there
>> might be some point in using them for something or other.
>>
>>
>> On 9/24/20, Elizabeth Mattijsen  wrote:
>> > 

Re: "ICU - International Components for Unicode"

2020-09-24 Thread William Michels via perl6-users
Thanks everyone for the replies. I guess the two questions I have
pertain mainly to 1) lineage and 2) versioning:

Regarding lineage, I'm interested in knowing if
Pugs/Parrot/Niecza/STD/Perlito/viv/JVM/Rakudo ever used the ICU
Libraries--even if now that data has been extracted into a Raku-native
data structure. I'm fairly certain one principal Rakudo developer is a
C++ expert, so this idea isn't too far-fetched.

Regarding versioning, it would be great to tell people that Raku
conforms to the latest-and-greatest ICU Library version, currently
sitting at version# ICU_67. That way when people are weighing Raku vs
Ruby or Python or Haskell or Go, we can tell them "Raku v6.d extracts
ICU_67 thus it conforms to the most current (and most widely accepted)
Unicode Library release (ICU 67 / CLDR 37 locale data / Unicode 13)."
I've read over Daniel's blog post but I don't recall explicit mention
of Unicode version 12, or 13, etc., although it does seem that
following his links takes you to references for Unicode 13.0.0 (see
https://www.unicode.org/reports/tr44/). Does Rakudo roll its own UCD?
Is there no reliance on ICU?

Anyway, if Daniel or Samantha or Joseph or Liz can confirm/refute
Raku's use of the (widely-adopted) ICU C-Library and/or Java-Library,
I will have learned something.

Thanks, Bill.

http://site.icu-project.org/download/67

"ICU 67 updates to CLDR 37 locale data with many additions and
corrections. This release also includes the updates to Unicode 13,
subsuming the special CLDR 36.1 and ICU 66 releases. ICU 67 includes
many bug fixes for date and number formatting, including enhanced
support for user preferences in the locale identifier. The
LocaleMatcher code and data are improved, and number skeletons have a
new “concise” form that can be used in MessageFormat strings. This is
the first regular release after ICU 65. ICU 66 was a low-impact
release with just Unicode 13 and a few bug fixes."


Library/Language support for ICU:

Objective C CocoaICU A set of Objective-C classes that encapsulate parts of ICU.
C# GenICUWrapper A tool that generates a rudimentary C# wrapper around
the C API of ICU4C. This could be used to generate headers for other
ICU wrappers.
C# ICU Dotnet - .NET bindings for ICU
D Mango.icu is a set of wrappers for the D programming language
Erlang icu4e is a set of bindings for Erlang to ICU4C
Cobol COBOL A page on how ICU could be used from a COBOL application.
Go icu4go provides a Go binding for the icu4c library
Haskell Data.Text.ICU Haskell bindings for ICU4C.
Lua ICU-Lua ICU for the Lua language
Pascal ICU4PAS An Object Pascal wrapper around ICU4C.
Perl PICU Perl wrapper for ICU
PHP PHP intl A PHP wrapper around core ICU4C APIs.
Python PyICU A Python extension wrapper around ICU4C.
R stringi An R language wrapper for ICU4C.
Ruby icu4r ICU4C binding for MRI ruby.
Smalltalk VA Smalltalk Wrappers
Parrot Virtual Machine This is a virtual machine for Perl 6 and other
various programming languages. ICU4C is used to improve the Unicode
support.
PHP The upcoming PHP 6 language is expected to support Unicode through ICU4C.

Companies and Organizations using ICU:

ABAS Software, Adobe, Amazon (Kindle), Amdocs, Apache, Appian, Apple,
Argonne National Laboratory, Avaya, BAE Systems Geospatial
eXploitation Products, BEA, BluePhoenix Solutions, BMC Software,
Boost, BroadJump, Business Objects, caris, CERN, CouchDB, Debian
Linux, Dell, Eclipse, eBay, EMC Corporation, ESRI, Facebook (HHVM),
Firebird RDBMS, FreeBSD, Gentoo Linux, Google, GroundWork Open Source,
GTK+, Harman/Becker Automotive Systems GmbH, HP, Hyperion, IBM,
Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS,
Jikes, Library of Congress, LibreOffice, Mathworks, Microsoft,
Mozilla, Netezza, Node.js, Oracle (Solaris, Java), Lawson Software,
Leica Geosystems GIS & Mapping LLC, Mandrake Linux, OCLC, Progress
Software, Python, QNX, Rogue Wave, SAP, SIL, SPSS, Software AG, SuSE,
Sybase, Symantec, Teradata (NCR), ToolAware, Trend Micro, Virage,
webMethods, Wine, WMS Gaming, XyEnterprise, Yahoo!, Vuo, and many
others.


On Thu, Sep 24, 2020 at 11:14 AM Joseph Brenner  wrote:
>
> Elizabeth Mattijsen  wrote:
> > https://www.codesections.com/blog/raku-unicode/
>
> Thanks, yes I was just reading through that.  It makes it clear that
> the "Unicode Character Database" is built-in to the MoarVM, but I'm
> not that clear what the ICU libraries do for you, and I thought there
> might be some point in using them for something or other.
>
>
> On 9/24/20, Elizabeth Mattijsen  wrote:
> > https://www.codesections.com/blog/raku-unicode/
> >
> >> On 24 Sep 2020, at 20:00, Joseph Brenner  wrote:
> >>
> >> I'm not sure myself, but my first guess would be probably not...I
> >> *think*  Raku is doing its own Unicode thing, and isn't using any
> >> system ICU libraries (but I'm willing to stand corrected on that).
> >>
> >> As far as perl (the-language-formerly-known-as-perl5) is concerned:
> >>
> >> That page 

Re: "ICU - International Components for Unicode"

2020-09-24 Thread Joseph Brenner
Elizabeth Mattijsen  wrote:
> https://www.codesections.com/blog/raku-unicode/

Thanks, yes I was just reading through that.  It makes it clear that
the "Unicode Character Database" is built-in to the MoarVM, but I'm
not that clear what the ICU libraries do for you, and I thought there
might be some point in using them for something or other.


On 9/24/20, Elizabeth Mattijsen  wrote:
> https://www.codesections.com/blog/raku-unicode/
>
>> On 24 Sep 2020, at 20:00, Joseph Brenner  wrote:
>>
>> I'm not sure myself, but my first guess would be probably not...I
>> *think*  Raku is doing its own Unicode thing, and isn't using any
>> system ICU libraries (but I'm willing to stand corrected on that).
>>
>> As far as perl (the-language-formerly-known-as-perl5) is concerned:
>>
>> That page http://site.icu-project.org/related is a little strange in
>> any case.  If you follow the links for "perl" it goes to J. Briggs's
>> personal web page, and if you comb through that there's a link to his
>> PICU just in tarball form.  He has a CPAN account, but doesn't seem to
>> have put this code there.
>>
>> (On the other hand there's this cpan module that uses the system icu
>> libraries:   https://metacpan.org/pod/Unicode::Transliterate)
>>
>> Anyway, I don't think perl has an ICU dependency either, it does its
>> own unicode thing as well (i.e. the Unicode "database" ships with it).
>>
>>
>> On 9/24/20, William Michels  wrote:
>>> Hi,
>>>
>>> I stumbled across the "ICU - International Components for Unicode"
>>> website:
>>>
>>> http://site.icu-project.org/
>>> https://github.com/unicode-org/icu
>>>
>>> There's a list of programming languages using the ICU libraries here:
>>>
>>> http://site.icu-project.org/related
>>>
>>> Should Raku be added to the list above?
>>> I see Perl and Parrot listed, but not Raku.
>>>
>>> Best, Bill.
>>>
>


Re: "ICU - International Components for Unicode"

2020-09-24 Thread Elizabeth Mattijsen
https://www.codesections.com/blog/raku-unicode/

> On 24 Sep 2020, at 20:00, Joseph Brenner  wrote:
> 
> I'm not sure myself, but my first guess would be probably not...I
> *think*  Raku is doing its own Unicode thing, and isn't using any
> system ICU libraries (but I'm willing to stand corrected on that).
> 
> As far as perl (the-language-formerly-known-as-perl5) is concerned:
> 
> That page http://site.icu-project.org/related is a little strange in
> any case.  If you follow the links for "perl" it goes to J. Briggs's
> personal web page, and if you comb through that there's a link to his
> PICU just in tarball form.  He has a CPAN account, but doesn't seem to
> have put this code there.
> 
> (On the other hand there's this cpan module that uses the system icu
> libraries:   https://metacpan.org/pod/Unicode::Transliterate)
> 
> Anyway, I don't think perl has an ICU dependency either, it does its
> own unicode thing as well (i.e. the Unicode "database" ships with it).
> 
> 
> On 9/24/20, William Michels  wrote:
>> Hi,
>> 
>> I stumbled across the "ICU - International Components for Unicode" website:
>> 
>> http://site.icu-project.org/
>> https://github.com/unicode-org/icu
>> 
>> There's a list of programming languages using the ICU libraries here:
>> 
>> http://site.icu-project.org/related
>> 
>> Should Raku be added to the list above?
>> I see Perl and Parrot listed, but not Raku.
>> 
>> Best, Bill.
>> 


Re: "ICU - International Components for Unicode"

2020-09-24 Thread Joseph Brenner
I'm not sure myself, but my first guess would be probably not...I
*think*  Raku is doing its own Unicode thing, and isn't using any
system ICU libraries (but I'm willing to stand corrected on that).

As far as perl (the-language-formerly-known-as-perl5) is concerned:

That page http://site.icu-project.org/related is a little strange in
any case.  If you follow the links for "perl" it goes to J. Briggs's
personal web page, and if you comb through that there's a link to his
PICU just in tarball form.  He has a CPAN account, but doesn't seem to
have put this code there.

(On the other hand there's this cpan module that uses the system icu
libraries:   https://metacpan.org/pod/Unicode::Transliterate)

Anyway, I don't think perl has an ICU dependency either, it does its
own unicode thing as well (i.e. the Unicode "database" ships with it).


On 9/24/20, William Michels  wrote:
> Hi,
>
> I stumbled across the "ICU - International Components for Unicode" website:
>
> http://site.icu-project.org/
> https://github.com/unicode-org/icu
>
> There's a list of programming languages using the ICU libraries here:
>
> http://site.icu-project.org/related
>
> Should Raku be added to the list above?
> I see Perl and Parrot listed, but not Raku.
>
> Best, Bill.
>