Re: Merge and unicode
On 9/9/19 1:08 PM, J. Landman Gay via use-livecode wrote:
> It seems that the merge command doesn't respect unicode. Does anyone have a workaround? The text I'm inserting is already decoded to UTF16.

I misspoke, sorry. It's the metadata that doesn't respect unicode.

-- 
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Merge and unicode
On 9/9/2019 2:13 PM, J. Landman Gay via use-livecode wrote:
> I misspoke, sorry. It's the metadata that doesn't respect unicode.

Can you clarify what you mean when you say the "metadata" doesn't respect Unicode? I'm in the middle of a big Unicode problem and have found and reported a ton of bugs where Unicode is not yet everywhere. I'm keenly interested in any I don't know about.
Re: Merge and unicode
I'm not sure I understand.

Do you mean "encoded to UTF-16"? In that case you should decode that to convert it to internal text. And then try merge. (Which still might have problems, I suppose.)

> On Sep 9, 2019, at 12:08 PM, J. Landman Gay via use-livecode wrote:
>
> It seems that the merge command doesn't respect unicode. Does anyone have a workaround? The text I'm inserting is already decoded to UTF16.
Re: Merge and unicode
On 9/9/19 2:39 PM, Paul Dupuis via use-livecode wrote:
> Can you clarify what you mean when you say the "metadata" doesn't respect Unicode?

Actually I just double-checked, and both merge and metadata may be wrong. I get UTF8 text from a server, textDecode it to UTF16, and merge parts of that text into an HTML template. In the variable watcher the merged template looks correct, but when I set a field's htmlText to it the result is wrong: diacriticals and curly quotes become question marks. My solution was to urlEncode the content before merging and urlDecode it when extracting it for display. That works.

In another part of the app I use the same (UTF16) text to set the metadata of a line in a field. When the script gets the metadata later, the diacriticals and curly quotes come back as strange characters with very high code point numbers.
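The urlEncode workaround described above can be sketched as a pair of helpers. This is an illustration, not Jacque's actual code: the names encodeForMerge and decodeFromMerge are hypothetical, and the sketch converts the text to UTF-8 bytes before calling urlEncode, on the assumption that urlEncode may not preserve characters outside the platform's native character set. Decoding reverses the same steps in the opposite order.

```livecode
-- Hypothetical helpers (not from the original post) sketching the
-- urlEncode-before-merge workaround. Converting to UTF-8 bytes first means
-- urlEncode only ever sees byte values 0-255, whatever the text contains.
function encodeForMerge pText
   return urlEncode(textEncode(pText, "UTF-8"))
end encodeForMerge

-- Reverse the steps: percent-decode back to bytes, then decode UTF-8 to LC text.
function decodeFromMerge pEncoded
   return textDecode(urlDecode(pEncoded), "UTF-8")
end decodeFromMerge
```

With these, decodeFromMerge(encodeForMerge(tText)) should give back tText unchanged, curly quotes included.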
Re: Merge and unicode
It's UTF8 text from a server, which I textDecode to UTF16. When I use the UTF16 text in a merge, diacriticals and/or curly quotes get mangled. The same happens when I set metadata on field text.

On 9/9/19 4:16 PM, Dar Scott Consulting via use-livecode wrote:
> I'm not sure I understand.
> Do you mean "encoded to UTF-16"? In that case you should decode that to convert it to internal text. And then try merge. (Which still might have problems, I suppose.)
Re: Merge and unicode
Doesn't any Unicode in the htmlText of a field need to be in HTML form (i.e. &#nnnn;)? I thought htmlText turns any non-ASCII character into either a hex-encoded HTML reference or, where an HTML entity name exists, the entity name.

On 9/9/2019 6:35 PM, J. Landman Gay via use-livecode wrote:
> In the variable watcher, the merged template looks correct but when a field is set to the htmltext the result is wrong, diacriticals and curly quotes are question marks.
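Following Paul's point, one way to keep non-ASCII text intact inside an htmlText template is to turn every code point above 127 into a decimal numeric character reference (for example &#8220; for a left curly quote) before merging. This is a sketch, not a confirmed fix: htmlEscape, showEntry and the field name "Entry" are hypothetical, and it assumes the merged values are plain text rather than HTML fragments, so it also escapes &, < and >.

```livecode
-- Hypothetical helper: escape plain text for insertion into htmlText.
-- Markup characters become named entities; every code point above 127
-- becomes a decimal numeric reference such as &#8220;.
function htmlEscape pText
   local tOut, tCP, tNum
   repeat for each codepoint tCP in pText
      put codepointToNum(tCP) into tNum
      if tCP is "&" then
         put "&amp;" after tOut
      else if tCP is "<" then
         put "&lt;" after tOut
      else if tCP is ">" then
         put "&gt;" after tOut
      else if tNum > 127 then
         put "&#" & tNum & ";" after tOut
      else
         put tCP after tOut
      end if
   end repeat
   return tOut
end htmlEscape

-- Hypothetical usage: pTemplate holds the htmlText template with
-- [[tSECTION]] and [[tCONCEPT]] placeholders; merge evaluates them
-- against the locals of this handler.
on showEntry pTemplate, pSection, pConcept
   local tSECTION, tCONCEPT
   put htmlEscape(pSection) into tSECTION
   put htmlEscape(pConcept) into tCONCEPT
   set the htmlText of field "Entry" to merge(pTemplate)
end showEntry
```

Escaping before the merge means the template only ever contains ASCII, so nothing depends on how the htmlText parser treats raw non-native characters.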
Re: Merge and unicode
I think you are trying to think too much about the LC implementation of text. Maybe.

Text in LC is an abstraction of a sequence of code points. Whether it is UTF-16 underneath is (mostly) hidden from me.

So,

   get textDecode(binaryFromServer, "UTF-8")

should put that into the correct form, if it really is UTF-8.

Data (binary bytes) is interpreted in the native encoding if you try to use it as text. I recommend against this. I try to always textDecode() everything coming in, but I sometimes make exceptions for ASCII.

I'm not sure what you mean by metadata. Are you referring to the HTTP Content-Type?

Sorry if I am off on a bunny trail...

Dar

> On Sep 9, 2019, at 4:38 PM, J. Landman Gay via use-livecode wrote:
>
> It's UTF8 text from a server, which I textDecode to UTF16. When I use the UTF16 text in a merge, diacriticals and/or curly quotes get mangled. (Same with setting metadata on field text too.)
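Dar's advice to decode everything as it comes in can be illustrated without a server by encoding a string to UTF-8 and decoding it again. A minimal sketch: tBytes only simulates a server response (no network call is made), and the handler assumes the card has a field 1.

```livecode
-- Minimal sketch of decoding at the boundary. tBytes stands in for a
-- hypothetical UTF-8 server response; no network call is made here.
on mouseUp
   local tText, tBytes, tDecoded
   -- the string “Dare to reason!” built from code points U+201C and U+201D
   put numToCodepoint(0x201C) & "Dare to reason!" & numToCodepoint(0x201D) into tText
   -- what would arrive on the wire: each curly quote takes 3 bytes in UTF-8
   put textEncode(tText, "UTF-8") into tBytes
   -- decode once, on the way in, back to LC text
   put textDecode(tBytes, "UTF-8") into tDecoded
   -- shows "21 17": 21 bytes of data versus 17 characters of text
   put the number of bytes in tBytes && the number of chars in tDecoded into field 1
end mouseUp
```

The gap between the byte count and the character count is exactly what goes wrong when the bytes are used as text directly: each 3-byte quote would turn into three native characters.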
Re: Merge and unicode
Well, I've made some changes to the code since I started urlEncoding the text before merging, so I'll check that again. Paul is right that Unicode in htmlText needs to be in hex, but the numbers I'm getting back are very high (8,000+) and render in the field as strange pictographs. Elsewhere, where there is no merge, curly quotes translate to the named quote or apostrophe entities and are correct.

By metadata I mean the LC term (see the dictionary) that lets you attach some text to a chunk of field text. The metadata isn't displayed in the field, but you can use it for anything you want. In my case the field is a list of clickable entries in a table of contents, each with its own metadata attached that provides a path to the stack and card the entry needs to open.

When I use normal LC text as metadata, diacriticals aren't rendered correctly (curly quotes become question marks), so the path is incorrect and the click goes nowhere.

Since LC is supposed to be Unicode throughout, I'd expect metadata to be compatible. The same text appears correctly when not used as metadata.

On September 9, 2019 7:25:28 PM Dar Scott Consulting via use-livecode wrote:
> I'm not sure what you mean by metadata. Are you referring to HTTP content-type?
Re: Merge and unicode
This quick check seems to work for me.

   on mouseUp
      put "A" into field 1
      set the metadata of char 1 of field 1 to "é"
      -- append the stored metadata; the field should then read "Aé"
      put the metadata of char 1 of field 1 after field 1
   end mouseUp

> On Sep 9, 2019, at 8:32 PM, J. Landman Gay via use-livecode wrote:
>
> By metadata I mean the LC term (see the dictionary) that allows you to attach some text to a field text chunk.
Re: Merge and unicode
And this, too, looks OK to me.

   on mouseUp
      put empty into field 1
      put "A" into field 1
      -- U+2200 plus U+1040F, a code point outside the Basic Multilingual Plane
      get numToCodepoint(0x2200) & numToCodepoint(0x1040F) & "V-"
      set the metadata of char 1 of field 1 to it
      put the metadata of char 1 of field 1 after field 1
   end mouseUp

I guess the problem is in the merge, as you thought.

I did notice in the dictionary that setting the metadata of a line is not the same as setting the metadata of all of the characters of the line.

Dar Scott

> On Sep 9, 2019, at 8:58 PM, Dar Scott Consulting via use-livecode wrote:
>
> This quick check seems to work for me.
Re: Merge and unicode
I think I'm doing this wrong. This seems to work, too.

   on mouseUp
      local x, y
      put empty into field 1
      put numToCodepoint(0x2200) into x
      put numToCodepoint(0x1040F) & "V-" into y
      -- merge replaces [[x]] and [[y]] with the values of the local variables
      put merge(" é{ [[x]] }é [[y]]") into field 1
   end mouseUp

> On Sep 9, 2019, at 10:25 PM, dsc--- via use-livecode wrote:
>
> And this, too, looks OK to me.
Re: Merge and unicode
Trust me, it's better than a feral gander pursuit.

Bob S

> On Sep 9, 2019, at 17:23 , Dar Scott Consulting via use-livecode wrote:
>
> Sorry, if I am off on a bunny trail...
>
> Dar
Re: Merge and unicode
Trusting... Also, interpreting Latin-1 as UTF-8 can generate some weird characters and lots of ?-diamond symbols. > On Sep 10, 2019, at 8:36 AM, Bob Sneidar via use-livecode > wrote: > > Trust me it's better than a feral gander persuit. > > Bob S > > >> On Sep 9, 2019, at 17:23 , Dar Scott Consulting via use-livecode >> wrote: >> >> Sorry, if I am off on a bunny trail... >> >> Dar > > > ___ > use-livecode mailing list > use-livecode@lists.runrev.com > Please visit this url to subscribe, unsubscribe and manage your subscription > preferences: > http://lists.runrev.com/mailman/listinfo/use-livecode > ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Merge and unicode
Jacque, these are my latest thoughts on possible problems.

1. Dar is very confused and off in the wrong direction. Use big stick.
2. The binary data is in an 8-bit character set encoding, causing problems with the UTF-8 decode. Check the encoding.
3. Field, line and character metadata are interfering. Clear all, then set and get consistently.
4. Merge is not handling binary data as text. Use textDecode first.

Dar Scott
Mad Scientist

> On Sep 10, 2019, at 11:04 AM, Dar Scott Consulting via use-livecode wrote:
>
> Trusting...
>
> Also, interpreting Latin-1 as UTF-8 can generate some weird characters and lots of ?-diamond symbols.
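Dar's third suggestion (clear, then set and get metadata consistently) might look like this for a table of contents like Jacque's. This is a sketch under assumptions: buildTOC, the field name "TOC" and the tab-delimited "title<tab>card path" input format are all hypothetical. The point is that the path is stored and read back through the same chunk type (line), so character-level and line-level metadata are never mixed.

```livecode
-- Hypothetical TOC builder: each line of pEntries is "title<tab>card path".
-- The path is stored as line metadata and later read back the same way.
on buildTOC pEntries
   local tEntry, tLineNum
   set the itemDelimiter to tab
   put empty into field "TOC" -- start clean: no stale metadata left over
   repeat for each line tEntry in pEntries
      add 1 to tLineNum
      put item 1 of tEntry into line tLineNum of field "TOC"
      set the metadata of line tLineNum of field "TOC" to item 2 of tEntry
   end repeat
end buildTOC

-- In the script of field "TOC": read the metadata with the same chunk type.
on mouseUp
   local tLineNum, tPath
   if the clickLine is empty then exit mouseUp
   put word 2 of the clickLine into tLineNum -- "line 3 of field 1" -> 3
   put the metadata of line tLineNum of me into tPath
   -- tPath now holds the card path stored by buildTOC
end mouseUp
```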
Re: Merge and unicode
I extracted an example. The main issue is curly quotes. The text came from FileMaker in UTF8, which I textDecode to UTF16. You can assume that all text is LC native throughout the app. Here is the template I use for merge: size="16" color="#C77C02">[[tSECTION]][[tCONCEPT]] In the field, this text is displayed accurately with curly quotes: The New Testament Scholar “Dare to reason!” Here is a result of the merge: The New Testament Scholar“Dare to reason!” Notice that the displayed text uses entity names (&ldquo, &rdquo) while the metadata which was created from the same text block as the field text has changed the quotes to two numbers in the high 5000s with no difference between left and right quotes. I was unable to paste the actual text here, as my mail client refused to render it, but the two numerical references appear as a single pictograph in LC's variable watcher, and do not match the card path I need, which in this case is: EN07_The New Testament Scholar“Dare to reason!” Maybe you can make sense of this? I've written an ugly workaround that pieces together the reference I need, but it would be better if I could just use the metadata. The metadata works fine as long as there are no quotes. On 9/9/19 11:35 PM, dsc--- via use-livecode wrote: I think I'm doing this wrong. This seems to work, too. on mouseup put empty into field 1 put numToCodepoint(0x2200) into x put numToCodepoint(0x1040F) & "V-" into y put merge(" é{ [[x]] }é [[y]]") into field 1 end mouseup On Sep 9, 2019, at 10:25 PM, dsc--- via use-livecode wrote: And this, too, looks OK to me. on mouseup put empty into field 1 put "A" into field 1 get numToCodepoint(0x2200) & numToCodepoint(0x1040F) & "V-" set the metadata of char 1 of field 1 to it put the metadata of char 1 of field 1 after field 1 end mouseup I guess the problem is in the merge as you thought. I did notice in the dictionary that setting the metadata of a line is not the same as setting the metadata of all of the characters of the line. 
Dar Scott

On Sep 9, 2019, at 8:58 PM, Dar Scott Consulting via use-livecode wrote:
This quick check seems to work for me.

on mouseup
   put "A" into field 1
   set the metadata of char 1 of field 1 to "é"
   put the metadata of char 1 of field 1 after field 1
end mouseup

On Sep 9, 2019, at 8:32 PM, J. Landman Gay via use-livecode wrote:
Well, I've made some changes to the code since I started urlEncoding the text before merging, so I'll check that again. Paul is right that Unicode in htmlText needs to be in hex, but the numbers I'm getting back are very high (8,000+) and render in the field as strange pictographs. Elsewhere, where there is no merge, curly quotes translate to the named quote or apostrophe entities and are correct.

By metadata I mean the LC term (see the dictionary) that lets you attach some text to a chunk of field text. The metadata isn't displayed in the field, but you can use it for anything you want. In my case the field is a list of clickable entries in a table of contents, each with its own metadata attached that provides a path to the stack and card the entry needs to open. When I use normal LC text as metadata, diacriticals aren't rendered correctly (curly quotes become question marks), so the path is incorrect and the click goes nowhere. Since LC is supposed to be Unicode throughout, I'd expect metadata to be compatible. The same text appears correctly when not used as metadata.

-- 
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com

On September 9, 2019 7:25:28 PM Dar Scott Consulting via use-livecode wrote:
I think you are trying to think too much about the LC implementation of text. Maybe. Text in LC is an abstraction of a sequence of code points. Whether it is UTF16 or not is (mostly) hidden from me.

So,
   get textDecode( binaryFromServer, "UTF-8" )
should put that into the correct form, if it is really UTF-8.
Data (binary bytes) is interpreted as native encoding if one tries to use it as text. I recommend against this. I try to always textDecode() everything coming in, but I make exceptions at times for ASCII.

I'm not sure what you mean by metadata. Are you referring to HTTP content-type? Sorry if I am off on a bunny trail...

Dar

On Sep 9, 2019, at 4:38 PM, J. Landman Gay via use-livecode wrote:
It's UTF8 text from a server, which I textDecode to UTF16. When I use the UTF16 text in a merge, diacriticals and/or curly quotes get mangled. (Same with setting metadata on field text too.)
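Dar's advice to textDecode everything coming in, and Jacque's expectation that the quotes correspond to &ldquo; and &rdquo;, can be illustrated at the byte level. The sketch below is Python, not LiveCode, and is only an analogy: Latin-1 stands in for a single-byte "native" encoding, and the title is the example from this thread. It shows that correctly decoded curly quotes are two distinct code points (8220 and 8221), so metadata holding one identical value for both quotes has presumably lost information somewhere before it was stored.

```python
import html.entities

# UTF-8 bytes as they might arrive from a server (example title from this thread).
title = "The New Testament Scholar \u201cDare to reason!\u201d"
raw = title.encode("utf-8")

# Decoding with the encoding the bytes were actually written in restores the text.
good = raw.decode("utf-8")

# Reading the same bytes as a single-byte charset (Latin-1 here, standing in for
# a "native" encoding) turns each 3-byte curly quote into three junk characters.
bad = raw.decode("latin-1")

# Once decoded, the quotes are two distinct code points, U+201C and U+201D,
# which are exactly what the named entities &ldquo; and &rdquo; stand for.
left, right = good[26], good[-1]
assert ord(left) == html.entities.name2codepoint["ldquo"]
assert ord(right) == html.entities.name2codepoint["rdquo"]

print(good)
print(bad)
print(ord(left), ord(right))
```

In LiveCode, codepointToNum applied to each quote of the decoded text should presumably report the same two values, which gives a quick way to check what the metadata actually contains.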
Re: Merge and unicode
:)

1. Jacque is very confused too, but is afraid of big sticks.
2. Encoding should be identical throughout. I'm working with a large text block, pulling out sections to create a list. All data is retrieved from the same variable, which is UTF16 native LC text.
3. The metadata is only set at the "paragraph" level, which I need instead of "line" because there is a soft return in each entry.
4. I did try to textDecode the metadata, but since it was already decoded in the source, decoding came out as garbage. I even tried encoding it too, knowing it wouldn't work, and I was right.

Solution: urlEncode the metadata before merging, and urlDecode it after retrieval. When my example is urlEncoded it becomes a simple string:

PP04_The+%D2Mystery%D3+of+Marriage

I suppose anything that makes ASCII out of the text would work, like base64.

On 9/10/19 1:04 PM, dsc--- via use-livecode wrote:
Jacque, these are my latest thoughts as far as possible problems.
1. Dar is very confused and off in the wrong direction. Use big stick.
2. Binary data is in an 8-bit char set encoding, causing problems with UTF-8 decode. Check encoding.
3. Field, line and character metadata are interfering. Clear all, then set and get consistently.
4. Merge is not handling binary data as text. Use textDecode first.

Dar Scott
Mad Scientist

-- 
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com
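The %D2 and %D3 in that string are the Mac Roman byte values of the left and right curly quotes, which suggests that urlEncode on the Mac is working on the text's single-byte native encoding rather than on UTF-8. The Python sketch below (an analogy, not LiveCode) reproduces the desktop string exactly from the path it decodes to, and shows the UTF-8 alternative. In LiveCode the corresponding idea would presumably be urlEncode(textEncode(tPath, "UTF-8")) before merging and textDecode(urlDecode(tMeta), "UTF-8") after retrieval; tPath and tMeta are hypothetical names, and this variant was not tested in the thread.

```python
from urllib.parse import quote_plus, unquote_plus

# Card path that the urlEncoded string in the thread decodes to (via Mac Roman).
path = "PP04_The \u201cMystery\u201d of Marriage"

# Percent-encoding the Mac Roman bytes: in Mac Roman the left and right
# double quotes are the single bytes 0xD2 and 0xD3.
mac_encoded = quote_plus(path, encoding="mac_roman")

# Percent-encoding the UTF-8 bytes instead gives a result that does not
# depend on which single-byte encoding the platform treats as native.
utf8_encoded = quote_plus(path, encoding="utf-8")

# Decoding with the same encoding that was used for encoding restores the path.
restored = unquote_plus(utf8_encoded, encoding="utf-8")

print(mac_encoded)
print(utf8_encoded)
```

The design point is that percent-encoding only makes the text ASCII-safe; the bytes underneath still come from some character encoding, so both ends have to agree on it.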
Re: Merge and unicode
Blah. URLEncode/decode works fine on desktop (Mac) but on Android it fails. For some reason on Android I get this:

PP04_The%2B%253FMystery%253F%2Bof%2BMarriage

All tests are in LC 9.5 so there should be no difference, right?

-- 
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com
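Read closely, the Android string looks like the result of two separate problems. First, the curly quotes became "?" (%3F) before encoding. Second, the result was urlEncoded a second time, so "+" became %2B and "%" became %25. The Python sketch below (not LiveCode) reproduces the exact Android string under those two assumptions. It uses Latin-1 with "?" replacement as a stand-in for a native encoding that cannot represent curly quotes. It only shows that the string is consistent with this explanation, not that this is what the LiveCode Android engine actually does.

```python
from urllib.parse import quote_plus

path = "PP04_The \u201cMystery\u201d of Marriage"

# Assumption: the platform's single-byte encoding cannot represent curly
# quotes, so each one degrades to "?" (Latin-1 with errors="replace" here).
once = quote_plus(path, encoding="latin-1", errors="replace")

# Percent-encoding that result a second time escapes "+" as %2B and "%" as %25.
twice = quote_plus(once)

print(once)
print(twice)
```

If this reading is right, two things are worth checking: that the text is encoded to UTF-8 before urlEncode, and that urlEncode is applied only once on the Android code path.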
Re: Merge and unicode
I looked at this some more on OS X. I'm not seeing a problem with merge. And I'm not seeing a problem with metadata per se, I don't think. But I am seeing a problem with setting metadata with htmlText.
Re: Merge and unicode
Because setting htmlText does not put interesting characters into the metadata correctly, you can consider this workaround: change the quotes in tCONCEPT to HTML entity references for “ and ”, and thus also in tMETADATA.
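Dar's suggested workaround amounts to keeping the curly quotes out of the htmlText as literal characters by writing them as entity references instead. As a language-neutral illustration (Python, not LiveCode), the sketch below turns every non-ASCII character in the example path into a decimal numeric character reference and back. Whether LiveCode decodes such references inside the metadata of htmlText was not confirmed in this thread, so treat it as something to test.

```python
import html

path = "PP04_The \u201cMystery\u201d of Marriage"

# Replace every non-ASCII character with a decimal numeric character
# reference, so only plain ASCII reaches the HTML template.
ascii_safe = path.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# An HTML parser turns the references back into the original characters.
restored = html.unescape(ascii_safe)

print(ascii_safe)
```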
Re: Merge and unicode
I wondered about using htmlText in the merge too, but before I started using merge I was setting the properties one by one in a handler. Here's part of my original handler, where pResults is a list of lines that match search criteria:

repeat for each line l in pResults
   put item 1 of l & tLineBreak & char 6 to -1 of item 2 of l & cr after tList
end repeat
lock screen
put tList into fld "searchResults"
repeat with x = 1 to the num of lines in fld "searchResults"
   set the leftIndent of paragraph x of fld "searchResults" to 10
   set the spaceBelow of paragraph x of fld "searchResults" to 20
   set the metadata of paragraph x of fld "searchResults" to line x of pResults
   set the textcolor of char 1 to offset(tLineBreak,line x of fld "searchResults") of line x of fld "searchResults" to tHiliteColor
   set the textsize of char 1 to offset(tLineBreak,line x of fld "searchResults") of line x of fld "searchResults" to 16
end repeat
unlock screen

The metadata still came out wrong. It also took a long time if there were many lines, due to all the field access, so I switched over to merge to create htmlText.

I'll try the replacement method you mentioned in another post. Thanks for chiming in here.

-- 
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com
Re: Merge and unicode
I'll wager using a styledText array for this will be fun to write and will perform very well.

Richard Gaskin
Fourth World Systems
Re: Merge and unicode
On 9/11/19 12:28 AM, Richard Gaskin via use-livecode wrote:
> I'll wager using a styledText array for this will be fun to write and will perform very well.

I took a look. You'd win that wager. I didn't test performance (haven't written the handler yet), but getting the styledText of an existing list retains the correct characters.

-- 
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com
RE: Merge and unicode
I have to look back and find my timings, but I think I got a 10x or better speedup by creating the styledText array and then setting the field to it.

Ralph DiMola
IT Director
Evergreen Information Services
rdim...@evergreeninfo.net