Re: How to remove emoji's from unicode string

2019-01-13 Thread Stephen MacLean via use-livecode
Hi Kee,

codepointOffset() doesn’t seem to work as expected, at least for me on my data, 
although I’m not sure why. The results are way different. I had thought it 
would be as simple as using that, but as per usual, nothing is!

Best,

Steve MacLean
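
One possible reason for the different numbers, offered as a guess rather than a diagnosis: a position in a string depends on the unit being counted. As far as I know, in LiveCode 7 and later offset() counts chars while codepointOffset() counts code points, so the two need not agree once an emoji sits before the match. The sketch below is Python, not LiveCode, and only illustrates the same effect with code points, UTF-16 code units and UTF-8 bytes. The sample string and the helper name utf16_index are made up for illustration.

```python
def utf16_index(text: str, cp_index: int) -> int:
    """Convert a code point index into a UTF-16 code unit index."""
    # Each code point above U+FFFF takes two UTF-16 code units (a surrogate pair).
    return len(text[:cp_index].encode("utf-16-le")) // 2

sample = "hi \U0001F600 <tag>"        # hypothetical data: an emoji before a tag
cp_pos = sample.find("<tag>")          # Python counts code points
u16_pos = utf16_index(sample, cp_pos)  # same place, counted in UTF-16 code units
u8_pos = len(sample[:cp_pos].encode("utf-8"))  # same place, counted in UTF-8 bytes

print(cp_pos, u16_pos, u8_pos)
```

The three numbers point at the same place in the text. Code that finds a position in one unit and then uses it in another will land in the wrong spot, or past the end of the string.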




___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: How to remove emoji's from unicode string

2019-01-13 Thread Stephen MacLean via use-livecode
Hi Richmond,

Thanks for posting this!

There are multiple ranges for emojis, as more were added in each version of 
Unicode.

Here is the list for the latest version, Unicode 12.0:

https://www.unicode.org/Public/emoji/12 ... i-data.txt 


I think that between your stripper and an updated Unicode map, it should be 
possible to find and strip them all.


From my Forum reply.
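
A rough sketch of how a range list in the layout of emoji-data.txt could drive a stripper. It is written in Python rather than LiveCode, and the SAMPLE text is a hand-written excerpt in the file's "codepoints ; property # comment" layout, not lines copied from the real file. One thing to watch: the plain Emoji property in that file also covers the ASCII digits, # and *, so this sketch filters on Emoji_Presentation instead. The function names are hypothetical.

```python
import re

# Hand-written sample in the layout used by emoji-data.txt.
# Illustrative only; these lines are not copied from the real file.
SAMPLE = """\
# comment lines start with '#'
0030..0039    ; Emoji                # digits
1F431         ; Emoji_Presentation   # cat face
1F600..1F64F  ; Emoji_Presentation   # emoticons
"""

# One code point, or first..last, followed by "; PropertyName".
LINE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")

def load_ranges(data, prop):
    """Return (first, last) code point pairs for lines tagged with prop."""
    ranges = []
    for line in data.splitlines():
        m = LINE.match(line)
        if m and m.group(3) == prop:
            first = int(m.group(1), 16)
            last = int(m.group(2) or m.group(1), 16)  # single value: last == first
            ranges.append((first, last))
    return ranges

def strip_ranges(text, ranges):
    """Drop every code point that falls inside one of the ranges."""
    return "".join(
        ch for ch in text
        if not any(first <= ord(ch) <= last for first, last in ranges)
    )

ranges = load_ranges(SAMPLE, "Emoji_Presentation")
print(strip_ranges("call 911 \U0001F431\U0001F600 now", ranges))
```

With the real file the list would be much longer, and it would need refreshing whenever a new Unicode version adds emoji.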


Re: How to remove emoji's from unicode string

2019-01-13 Thread scott--- via use-livecode
Hello Richmond. I have found that emojis also cause the  
command  to fail silently. Being able to strip emojis would be helpful for that 
as well.

I've been fooling about with your emoji stripping stack. 
Using codePointToNum(tEmojiChar) > 128511 doesn't seem to catch all the 
emoji characters... 
this cat head < 🐱 > returns "128049"
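
The numbers explain the miss. 128511 is hex 1F5FF, so a "greater than 128511" test only catches U+1F600 and above, while 128049 is U+1F431, which sits in the Miscellaneous Symbols and Pictographs block (U+1F300 to U+1F5FF). A sketch in Python (not LiveCode) of a test against several blocks follows. The block list is hand-picked and incomplete, and these blocks also contain some symbols that are not emoji, so treat it as an assumption rather than a definition of "emoji".

```python
# A few Unicode blocks that hold many emoji. Hand-picked and incomplete.
EMOJI_BLOCKS = [
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
]

def in_emoji_block(code_point: int) -> bool:
    """True if the code point lies in one of the blocks listed above."""
    return any(lo <= code_point <= hi for lo, hi in EMOJI_BLOCKS)

cat = 128049
print(hex(128511), hex(cat))   # the threshold and the cat face, in hex
print(cat > 128511)            # the single-threshold test
print(in_emoji_block(cat))     # the multi-block test
```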

Also, some emojis now have multiple skin colors < 🧜‍♂️🧜‍♂️ > and that seems to 
throw a monkey wrench into the works, too. When I posted this to the forum, I 
saw that the second merman was followed by a brown swatch and then a male 
symbol. The second merman I pasted was actually slightly browner than the 
first (and looked correct when it was originally pasted), but this does not 
seem to pass through the posting mechanism correctly: the brown swatch and male 
symbol seem to be incorrectly parsed away from the second merman.

The “browner merman” is reported as 4 characters. This can even be seen by 
using the delete key (I’m on a Mac using LC 9.0.2) and deleting backwards over 
the merman: it changes color as the delete key removes characters which the 
field may or may not display. The merman doesn’t necessarily display at the 
size that the field is set to. Selecting the merman and choosing “Use Owner’s 
Size” from the Text menu can break the emoji if the field isn’t wide enough to 
contain all the “hidden characters” on the same line.
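
This behaviour fits how such emoji are encoded: a skin-toned merman is one visible glyph built from several code points (base emoji, skin tone modifier, zero width joiner, male sign, variation selector), so deleting them one at a time changes what is drawn. The Python sketch below (not LiveCode) lists the parts. The tone U+1F3FE is an arbitrary pick, since the original character did not survive posting, and is_emoji_glue is a hypothetical helper name.

```python
import unicodedata

# Merman with a skin tone, written as escapes so the source stays ASCII.
# The tone (U+1F3FE) is an arbitrary pick for illustration.
merman = "\U0001F9DC\U0001F3FE\u200D\u2642\uFE0F"

for ch in merman:
    # Print each code point with its official Unicode name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

def is_emoji_glue(ch: str) -> bool:
    """True for code points that only modify or join neighbouring emoji."""
    cp = ord(ch)
    return (
        0x1F3FB <= cp <= 0x1F3FF   # skin tone modifiers
        or cp == 0x200D            # zero width joiner
        or cp == 0xFE0F            # variation selector-16
    )

# Total code points in the sequence, and how many of them are "glue".
print(len(merman), sum(is_emoji_glue(ch) for ch in merman))
```

A stripper that removes only the base emoji would leave the joiner, the male sign and the variation selector behind, so those need handling as well.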

Hmm… I just looked for a bug report and didn’t find one exactly like this. I’m 
pretty ignorant about how Unicode actually operates, but on the assumption that 
it should “just work” in LiveCode, I guess a bug report is my next stop.

--cross posted to forums—

Scott Morrow

Elementary Software
(Now with 20% less chalk dust!)
web   http://elementarysoftware.com/
email sc...@elementarysoftware.com
booth 1-800-615-0867
--



Re: How to remove emoji's from unicode string

2019-01-13 Thread Kee Nethery via use-livecode
I’m on my phone, so please verify this.
Shouldn’t you be using codepointOffset(), not offset()?

Kee Nethery


Re: How to remove emoji's from unicode string

2019-01-13 Thread Richmond via use-livecode

Cop a look at this:

*http://forums.livecode.com/viewtopic.php?f=7=32030*


Re: How to remove emoji's from unicode string

2019-01-13 Thread Richmond via use-livecode

Well . . . as the emojis are stored in a Unicode range (Hex 1F600 - 1F64F)

https://www.unicode.org/charts/

I'd "just" strip out any characters within that range.
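
A minimal sketch of that idea in Python (not LiveCode), with hypothetical names. Note that U+1F600 to U+1F64F is the Emoticons block only. Emoji in other blocks, such as the cat face at U+1F431, pass straight through.

```python
EMOTICONS_FIRST = 0x1F600
EMOTICONS_LAST = 0x1F64F

def strip_emoticons(text: str) -> str:
    """Remove every code point in the Emoticons block, keep the rest."""
    return "".join(
        ch for ch in text
        if not (EMOTICONS_FIRST <= ord(ch) <= EMOTICONS_LAST)
    )

print(strip_emoticons("nice \U0001F600 one"))
```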



How to remove emoji's from unicode string

2019-01-13 Thread Stephen MacLean via use-livecode
Hi All,

The recent conversations on using offset() with Unicode strings were very 
enlightening; thanks to all who took part!

I have data stored in UTF8mb4. I use textDecode after loading it from the DB to 
put it into a format that LC understands. I then use offset() to find certain 
tags, text, etc. to work with. However, if there are emoji in that string, the 
offset() function hard crashes with an out-of-range error.

Due to the trouble with offset(), I’m looking for a way to remove the emojis 
before I have to use the offset function.

Short of compiling a list of emoji and their decimal equivalents, does anyone 
have a way to do this in LC?

My offset code has been rock solid, except for these rare instances where there 
are emoji in the text, and I am not really looking to change it if I don’t have 
to, preferring to just remove the emoji if possible.

TIA,

Steve MacLean
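
One blunt option that avoids compiling a list: the "mb4" in utf8mb4 refers to characters that need four bytes in UTF-8, which are exactly the code points above U+FFFF, and most emoji live there. Dropping everything above U+FFFF removes them in one pass. This is a sketch in Python, not LiveCode, under two assumptions: losing other characters above U+FFFF (rare CJK, some symbols) is acceptable, and emoji inside the Basic Multilingual Plane, such as the heart at U+2764, will remain. The byte string and function name are made up for illustration.

```python
def strip_supplementary(text: str) -> str:
    """Keep only code points in the Basic Multilingual Plane (<= U+FFFF)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)

# Hypothetical UTF-8 bytes as they might come from the DB:
# a tag, then U+1F600 (four bytes) and U+2764 (three bytes).
raw = b"<b>hi</b> \xf0\x9f\x98\x80 \xe2\x9d\xa4"
decoded = raw.decode("utf-8")           # stands in for the textDecode step
cleaned = strip_supplementary(decoded)  # the four-byte character is gone

print(cleaned.find("</b>"), repr(cleaned))
```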
