[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-24 Thread GitBox


xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-648653322


   > The R ones probably?
   
   For these, we need to add `utf8proc` to rtools40 and rtools35 and add it 
to the linker line of the R build.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-24 Thread GitBox


xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-648649038


   The R ones probably?







[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-24 Thread GitBox


xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-648648745


   @kou Which CI job shows the problem you are seeing? The MinGW ones 
seem fine.







[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-17 Thread GitBox


xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-645401816


   > Would a lookup table in the order of 256kb (generated at runtime, not in 
the binary) per case mapping be acceptable for Arrow?
   
   I would find that acceptable if the mapping is only generated when needed 
(so you pay a one-off cost the first time a UTF-8 kernel is used). I would, 
though, prefer that `utf8proc` implement it this way on their side. Can you 
open an issue there?
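
   To make the "only generated if needed" idea concrete, here is a minimal 
sketch of a lazily built table. Everything in it is an assumption for 
illustration: `SlowToUpper` is a hypothetical stand-in for a per-code-point 
library call (it only covers ASCII and a slice of Latin-1), and the table is 
assumed to cover the 65,536 code points below U+10000 with 4-byte entries, 
which is one way to arrive at roughly 256 KiB. It is not the code from this PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a slower per-code-point lookup. It only covers
// ASCII and part of Latin-1 and is NOT a correct Unicode case mapping.
inline uint32_t SlowToUpper(uint32_t cp) {
  if (cp >= 0x61u && cp <= 0x7Au) return cp - 32u;  // a..z
  // U+00E0..U+00FE map to U+00C0..U+00DE, except U+00F7 (division sign).
  if (cp >= 0xE0u && cp <= 0xFEu && cp != 0xF7u) return cp - 32u;
  return cp;
}

// Assumption: the table covers code points below U+10000 only.
// 65536 entries * 4 bytes = 256 KiB.
constexpr uint32_t kTableSize = 1u << 16;

// The table is built on the first call and reused afterwards. Initialization
// of a function-local static is thread-safe since C++11.
inline const std::vector<uint32_t>& UpperTable() {
  static const std::vector<uint32_t> table = [] {
    std::vector<uint32_t> t(kTableSize);
    for (uint32_t cp = 0; cp < kTableSize; ++cp) t[cp] = SlowToUpper(cp);
    return t;
  }();
  return table;
}

// Fast path through the table; code points outside it use the slow lookup.
inline uint32_t ToUpperCodepoint(uint32_t cp) {
  return cp < kTableSize ? UpperTable()[cp] : SlowToUpper(cp);
}
```

   A process that never calls `ToUpperCodepoint` never allocates the table, 
so the memory and the build time are paid only by users of the kernel.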







[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-17 Thread GitBox


xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-645279445


   Also cross-referenced this in 
https://github.com/JuliaStrings/utf8proc/issues/12 to make the `utf8proc` 
maintainers aware of what we're doing in case they are interested.







[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-17 Thread GitBox


xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-645253060


   The major difference between `unilib` and `utf8proc` in uppercasing a 
character seems to be that [unilib looks up the uppercase value 
directly](https://github.com/ufal/unilib/blob/d8276e70b7c11c677897f71030de7258cbb1f99e/unilib/unicode.h#L81)
 whereas [utf8proc first gets a struct with all 
properties](https://github.com/JuliaStrings/utf8proc/blob/08fa0698639f15d07b12c0065a4494f2d504/utf8proc.c#L377)
 from which it extracts the uppercase value. Pre-computing the uppercase 
dictionary first could bring `utf8proc` on par with `unilib` in performance.
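
   The two lookup styles described above can be contrasted in a small sketch. 
The `Properties` struct and `LookupProperties` function are hypothetical 
stand-ins covering ASCII letters only; they are not the actual data layout of 
either library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical property record; a real library stores many more fields.
struct Properties {
  int16_t category;
  int32_t uppercase;  // -1 means "no uppercase mapping"
  int32_t lowercase;  // -1 means "no lowercase mapping"
};

// Hypothetical stand-in for a full property lookup (ASCII letters only).
inline Properties LookupProperties(int32_t cp) {
  if (cp >= 'a' && cp <= 'z') return {1, cp - 32, -1};
  if (cp >= 'A' && cp <= 'Z') return {2, -1, cp + 32};
  return {0, -1, -1};
}

// Style 1: fetch the whole record, then extract the one field needed.
inline int32_t UpperViaProperties(int32_t cp) {
  const Properties p = LookupProperties(cp);
  return p.uppercase >= 0 ? p.uppercase : cp;
}

// Style 2: flatten that one field into a dense array up front, so each
// later lookup is a single indexed load.
inline std::vector<int32_t> BuildUpperTable(int32_t limit) {
  std::vector<int32_t> table(static_cast<std::size_t>(limit));
  for (int32_t cp = 0; cp < limit; ++cp) {
    table[static_cast<std::size_t>(cp)] = UpperViaProperties(cp);
  }
  return table;
}
```

   With a table from `BuildUpperTable`, uppercasing a code point below `limit` 
becomes `table[cp]`, and the property lookup runs only while the table is built.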







[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-16 Thread GitBox


xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-644795203


   > We'll need to make utf8proc a proper toolchain library, @pitrou should be 
able to help you with that.
   
   I can take care of that!


