New chunks

2014-03-11 Thread Benjamin Beaumont
Hi All,

We're in the process of adding some new chunk types in LiveCode 7 and we
would appreciate suggestions for a particular chunk name.

The new chunk types are:

naturalword (breaks on unicode word boundaries)
sentence (breaks on unicode sentence boundaries)
paragraph (Same behaviour as current 'line' chunk)

The first chunk is called 'naturalword' because 'word' is already in use.
Renaming the current 'word' chunk to 'token' to free up 'word' is not an
option for backward compatibility. We are also limited by the current
parser which doesn't allow us to use the form:

put natural word 1 of "this is a string of words"

'naturalword' is the clearest internal suggestion at the moment and we'd
love to get the input from community members if there is an even clearer
option.
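
To make the distinction concrete, here is a rough sketch in Python (not LiveCode) that contrasts a whitespace-delimited 'word' with something closer to a Unicode word boundary. The function names are made up for illustration, and the regular expression is only an approximation assumed for this example; it is not the Unicode word-break algorithm itself.

```python
import re

def whitespace_words(text):
    # Roughly the existing 'word' behaviour: split on runs of whitespace only.
    # Assumption: any special treatment of quoted strings is ignored here.
    return text.split()

def approx_natural_words(text):
    # Crude stand-in for Unicode word breaking: runs of letters/digits,
    # allowing a single apostrophe or period between them ("It's", "3.14").
    return re.findall(r"\w+(?:['.]\w+)*", text)

sample = "Hello, world! It's 3.14."
print(whitespace_words(sample))      # ['Hello,', 'world!', "It's", '3.14.']
print(approx_natural_words(sample))  # ['Hello', 'world', "It's", '3.14']
```

The whitespace version keeps punctuation attached to each chunk, while the approximation drops it.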

Warm regards and thank you for your input.

Ben

_

Benjamin Beaumont . RunRev Ltd

LiveCode Product Manager
mail : 25a Thistle Street Lane South West, Edinburgh, EH2 1EW
email : b...@runrev.com
company : +44(0) 845 219 89 23
fax : +44(0) 845 458 8487
web : www.runrev.com

LiveCode - Programming made simple
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: New chunks

2014-03-11 Thread Roger Eller
uword
or
wordup (up = unicode part)

~Roger


On Tue, Mar 11, 2014 at 2:15 PM, Benjamin Beaumont  wrote:

> Hi All,
>
> We're in the process of adding some new chunk types in LiveCode 7 and we
> would appreciate suggestions for a particular chunk name.
>


Re: New chunks

2014-03-11 Thread dunbarx
Ben.


Why not "unicodeWord" or "uniWord"? Roger takes that a step further and, 
well, why not?



For the sentence, is this a chunk delimited by a period and a space?
For the paragraph, is this a chunk delimited by two returns?



Craig Newman



-Original Message-
From: Roger Eller 
To: How to use LiveCode 
Sent: Tue, Mar 11, 2014 2:41 pm
Subject: Re: New chunks


uword
or
wordup (up = unicode part)

~Roger




Re: New chunks

2014-03-11 Thread Richmond

On 11/03/14 20:15, Benjamin Beaumont wrote:

> Hi All,
>
> We're in the process of adding some new chunk types in LiveCode 7 and we
> would appreciate suggestions for a particular chunk name.
>
> The new chunk types are:
>
> naturalword (breaks on unicode word boundaries)

Well; in theory that looks good until you start to think about languages
which are written (such as Sanskrit) with no obvious word boundaries and
both vowel mutation (Sandhi) at what would be word boundaries, and
consonant fusion.

Languages such as Inuit and Hungarian are agglutinative, and in some
cases what we (speakers of West European languages) would term a sentence
consists of a single word with loads of affixes; some at the front
(prefixes).

Many Austronesian languages use infixes (i.e. twiddly bits shoved into
the middle of 'words').

These also crop up in Afro-Asiatic languages such as Arabic.

There are also some examples in English such as "fan-f*cking-tabulous".

We could also get sweaty about circumfixes, where a bit gets put on the
front and a bit gets put on the back as a sort of split morpheme (not to
be confused with split-pea bara).

> sentence (breaks on unicode sentence boundaries)

That looks a bit fishy.

How are you going to work out what marks a sentence boundary in every
language that one can write with Unicode? And there are languages where
the idea of a 'sentence' is absent.

> paragraph (Same behaviour as current 'line' chunk)
>
> The first chunk is called 'naturalword' because 'word' is already in use.
> Renaming the current 'word' chunk to 'token' to free up 'word' is not an
> option for backward compatibility. We are also limited by the current
> parser which doesn't allow us to use the form:
>
> put natural word 1 of "this is a string of words"
>
> 'naturalword' is the clearest internal suggestion at the moment and we'd
> love to get the input from community members if there is an even clearer
> option.

I'm sorry to be such a "pill", but word and sentence boundaries are such
culture-bound concepts that they will only be any good for languages that
mark word and sentence boundaries.

This is about the same as stating dogmatically that "all bananas are
yellow", when they are not.

> Warm regards and thank you for your input.

You may not thank me.

Richmond.



Re: New chunks

2014-03-11 Thread Peter M. Brigham
On Mar 11, 2014, at 2:15 PM, Benjamin Beaumont wrote:

> Hi All,
> 
> We're in the process of adding some new chunk types in LiveCode 7 and we
> would appreciate suggestions for a particular chunk name.
> 
> The new chunk types are:
> 
> naturalword (breaks on unicode word boundaries)


Will this be implemented so that naturalword excludes punctuation fore and aft? 
E.g.,
   naturalword 2 of "one (unusual, but illustrative) suggestion would be"   --> "unusual"

-- Peter
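
As a rough illustration of the behaviour being asked about, the following Python sketch (not LiveCode) compares the second whitespace-delimited chunk of that string with the second punctuation-free chunk. The helper names are hypothetical, and the `\w+` pattern is an assumed approximation rather than the actual Unicode word-break rules.

```python
import re

text = "one (unusual, but illustrative) suggestion would be"

def nth_whitespace_word(s, n):
    # 1-based index into whitespace-delimited chunks (punctuation stays attached).
    return s.split()[n - 1]

def nth_punctuation_free_word(s, n):
    # 1-based index into runs of letters/digits (punctuation is skipped).
    return re.findall(r"\w+", s)[n - 1]

print(nth_whitespace_word(text, 2))        # (unusual,
print(nth_punctuation_free_word(text, 2))  # unusual
```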

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig




Re: New chunks

2014-03-11 Thread Mark Wieder
Richmond  writes:

> How are you going to work out what marks a sentence boundary in every 
> language that one can write
> with Unicode? And there are languages where the idea of a 'sentence' is 
> absent.

My initial take on this was the same as yours. But unicode separators seem
to have been well thought-out (or at least standardized):

http://www.unicode.org/reports/tr29/
http://www.unicode.org/Public/6.3.0/ucd/auxiliary/SentenceBreakTest.html

and I think any discussion or arguments should be taken up with the unicode
working group, and not have LC create Yet Another Variation on the way folks
will expect things to work.

-- 
 Mark Wieder
 ahsoftw...@gmail.com





Re: New chunks

2014-03-11 Thread Mark Wieder
Benjamin Beaumont  writes:

> 'naturalword' is the clearest internal suggestion at the moment and we'd
> love to get the input from community members if there is an even clearer
> option.

My preferences, in order, would be:

1. change the parser so that it accepts modifiers, and specifically so that
it accepts "unicode word".

2. break on word boundaries with "unicodeWord". I think "unicode" makes it
much clearer what is going on, rather than having to process what "natural"
means.

-- 
 Mark Wieder
 ahsoftw...@gmail.com





Re: New chunks

2014-03-11 Thread Jim Hurley
Can someone explain how the “sentence” chunk would work?
How are decimal points and the points in abbreviations distinguished from the 
“period” that delineates the end of a “sentence”?
Does it presume that the existing text has special embedded “periods”?

I’ve written my own, but it is very cumbersome and not flawless. I use it to do 
manuscript analysis.
Like: Find all sentences in which “time” and “party” occur anywhere in the same 
sentence.
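
A minimal Python sketch of that kind of query, assuming a deliberately naive splitter (a sentence ends at ".", "!" or "?" followed by whitespace). The function names are invented for the example, and the splitter will mis-handle abbreviations such as "Dr. Smith", which is exactly the difficulty raised above.

```python
import re

def naive_sentences(text):
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    # Decimal points survive (no whitespace follows), abbreviations do not.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sentences_with_all(text, words):
    # Keep only sentences in which every search word occurs as a whole word.
    hits = []
    for sentence in naive_sentences(text):
        if all(re.search(r"\b" + re.escape(w) + r"\b", sentence, re.IGNORECASE)
               for w in words):
            hits.append(sentence)
    return hits

manuscript = ("The party ran out of time. Nobody minded. "
              "Time flies at a good party!")
print(sentences_with_all(manuscript, ["time", "party"]))
# ['The party ran out of time.', 'Time flies at a good party!']
```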

My ignorance on unicode is profound.
Jim

> Message: 15
> Date: Tue, 11 Mar 2014 18:15:18 +
> From: Benjamin Beaumont 
> To: LiveCode Developer List ,  How to
>   use LiveCode 
> Subject: New chunks
> Message-ID:
>   
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hi All,
> 
> We're in the process of adding some new chunk types in LiveCode 7 and we
> would appreciate suggestions for a particular chunk name.
> 


Re: New chunks

2014-03-11 Thread Phil Davis
What are the pros and cons of making the 'useUnicode' property 
persistent and true by default, and letting it determine the meaning of 
'word'? To my own immature understanding, that seems like a better 
approach than trying to come up with a new vocabulary item.


Phil Davis





Re: New chunks

2014-03-11 Thread Phil Davis
Or maybe introduce a persistent 'useUnicodeChunks' property that 
controls which kind of chunking is used.




Re: New chunks

2014-03-11 Thread Phil Davis

Sorry, I didn't read all emails before suggesting this.
:-)



Re: New chunks

2014-03-11 Thread Mark Wieder
Phil Davis  writes:

> 
> What are the pros and cons of making the 'useUnicode' property 
> persistent and true by default, and letting it determine the meaning of 
> 'word'? To my own immature understanding, that seems like a better 
> approach than trying to come up with a new vocabulary item.

I like this as well.

-- 
 Mark Wieder
 ahsoftw...@gmail.com







Re: New chunks

2014-03-11 Thread Mark Schonewille

Hi Ben,

I think that the chunk names for Unicode should be exactly the same as 
for English text, but they should act differently if either the text 
being operated on is Unicode text or the useUnicode property has been set 
to true.


Note that other development tools don't have this problem at all. 
Operations on English text work the same way as on Unicode text and 
there is no need to even think about the encoding.


--
Best regards,

Mark Schonewille

Economy-x-Talk Consulting and Software Engineering
Homepage: http://economy-x-talk.com
Twitter: http://twitter.com/xtalkprogrammer
KvK: 50277553

Use Color Converter to convert CMYK, RGB, RAL, XYZ, H.Lab and other 
colour spaces. http://www.color-converter.com


Buy my new book "Programming LiveCode for the Real Beginner" 
http://qery.us/3fi


LiveCode on Facebook:
https://www.facebook.com/groups/runrev/



Re: New chunks

2014-03-12 Thread Richmond
I think this might be useful for anyone who wants to detect word breaks 
in Unicode, although, personally, it makes me feel very queasy indeed:

http://www.unicode.org/Public/6.3.0/ucd/auxiliary/WordBreakTest.html

This comment; "The material here is informative, not normative." doesn't
exactly inspire confidence either. But, rather than some of the RunRev team
taking a year to have a world tour interviewing speakers of the
100 most-spoken languages to find out how they type their stuff, this is all
there is.

"If your browser handles titles (tool tips), then hovering the mouse 
over the row header will show a sample character of that type."


Well, Firefox handles them; and in some ways the tooltips are the most 
useful things on that page.


[ Mind you, if I were the person at RunRev having to deal with implementing
Unicode I would jump at the chance to have a world tour interviewing
people . . . LOL . . . however, I might just "softly and silently vanish
away" somewhere near Singapore! ]

This might, as well, be a "right bu**er" when one comes to typing
languages that go from right to left [ Arabic, Manda, Hebrew, et al ].

And here is a bit about sentences:

http://www.unicode.org/reports/tr29/#Sentence_Boundaries

Richmond.



Re: New chunks

2014-03-12 Thread Fraser Gordon

On 11 Mar 2014, at 19:26, Richmond  wrote:
> 
> Well; in theory that looks good until you start to think about languages 
> which are
> written (such as Sanskrit) with no obvious word boundaries and both vowel 
> mutation (Sandhi)
> at what would be word boundaries, and consonant fusion.

The library that we use for low-level Unicode stuff (ICU) provides a facility 
called "break iterators" - basically, these functions break up text according 
to various rules and variants are provided for graphemes, words, sentences, 
etc. ICU has a (very large) database of rules and (for some languages) 
dictionaries in order to properly break words even in complex languages. Not 
all languages are supported but a large number are.

> 
>> sentence (breaks on unicode sentence boundaries)
> 
> That looks a bit fishy.
> 
> How are you going to work out what marks a sentence boundary in every 
> language that one can write
> with Unicode? And there are languages where the idea of a 'sentence' is 
> absent.

Again, ICU does the hard work. In a language without sentences, text will only 
contain one sentence. 

There is also enough intelligence in ICU that it can tell the difference 
between a decimal point and a full-stop/period. Some languages use different 
marks as sentence separators and ICU also knows about them.
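
The two points above (a decimal point is not a full stop, and some scripts use other terminators) can be sketched in Python with a hand-written regular expression. This is only an illustration under stated assumptions: it does not use ICU, the terminator list is a small invented sample, and the function name is hypothetical.

```python
import re

# Assumed rules for this sketch only:
#  - "。", "！", "？" and "।" always end a sentence, even with no following space;
#  - ".", "!" and "?" end a sentence only before whitespace or end of text,
#    so the "." inside "3.14" is not a boundary.
SENTENCE = re.compile(r".+?(?:[。！？।]|[.!?](?=\s|$)|$)", re.S)

def sentences(text):
    found = (m.group().strip() for m in SENTENCE.finditer(text))
    return [s for s in found if s]

print(sentences("Pi is roughly 3.14. これはペンです。それは本です。"))
# ['Pi is roughly 3.14.', 'これはペンです。', 'それは本です。']
```

A real break iterator relies on a far larger rule set (and, for some languages, dictionaries), which is why delegating to a library is attractive.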

> 
> I'm sorry to be such a "pill", but word and sentence boundaries are such 
> culture-bound concepts
> that they will only be any good for languages that mark word and sentence 
> boundaries.
> 
> This is about the same as stating dogmatically that "all bananas are yellow", 
> when they are not.

Paragraphs are defined in the Unicode standard. They are runs of text 
terminated by the Paragraph Separator character or (optionally) any other 
newline character. While it may not make sense linguistically, this is how we 
delimit paragraphs in LiveCode fields.
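
For a feel of this definition, Python's built-in `str.splitlines` happens to treat the Paragraph Separator (U+2029) and the common newline sequences as boundaries. The sketch below is an analogy only, not a description of how LiveCode implements the chunk, and the wrapper name is made up; note that `splitlines` also breaks on a few other control characters.

```python
def paragraphs(text):
    # str.splitlines splits on \n, \r, \r\n, U+2028, U+2029 and a few
    # other control characters, and drops the separators themselves.
    return text.splitlines()

sample = "first paragraph\u2029second paragraph\nthird paragraph\r\nfourth"
print(paragraphs(sample))
# ['first paragraph', 'second paragraph', 'third paragraph', 'fourth']
```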


Regards,
Fraser


Fwd: New chunks

2014-03-12 Thread Björnke von Gierke
sent to wrong list.. resend.


I like this approach. However, I think two problems remain:

Libraries and substacks are still a problem. A stack property that is 
able to set a whole stack to some kind of 'word legacy mode' would therefore be 
very useful.

Word was always unintuitive with punctuation and quotes, so while we're at it, 
why not make word more intuitive? But if you change that, the legacy way of 
word handling should not be removed, because it was somewhat useful. The 
existing word could also theoretically be improved to work with CSV parsing, 
which would be very neat. However, neither of these ideas about changing how word 
works can be handled by a simple find/replace in the IDE.
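
As a sketch of why the find/replace migration discussed in this message needs some care, here is a small Python example (not IDE code) that swaps `word` for the proposed `legacyWord`. Everything in it is an assumption for illustration: the function names are invented, and it assumes string literals are delimited by double quotes with no escape sequences. It does not handle comments.

```python
import re

# Whole-word, case-insensitive match for "word" or "words".
WORD_CHUNK = re.compile(r"\bword(s?)\b", re.IGNORECASE)

def naive_migrate(line):
    # Plain find/replace: also rewrites text inside string literals.
    return WORD_CHUNK.sub(r"legacyWord\1", line)

def migrate_outside_strings(line):
    # Split into alternating code / "string literal" pieces;
    # even indexes are code, odd indexes are quoted literals.
    parts = re.split(r'("[^"\n]*")', line)
    for i in range(0, len(parts), 2):
        parts[i] = WORD_CHUNK.sub(r"legacyWord\1", parts[i])
    return "".join(parts)

line = 'put word 1 of "a word here" into tFirst'
print(naive_migrate(line))            # put legacyWord 1 of "a legacyWord here" into tFirst
print(migrate_outside_strings(line))  # put legacyWord 1 of "a word here" into tFirst
```

The naive version changes the user's data as well as the script, which is one reason a purely textual swap is risky.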

On 12.03.2014, at 11:00, Kevin Miller  wrote:

> I think having slept on it my initial suggestion is this:
> 
> We introduce a new legacyWord property for the current word behavior
> 
> What you propose as naturalWord is simply the “word" chunk. Most existing 
> users – in time – and all new users, will want to use this. It is the only 
> way to deliver transparent Unicode. I hate having to make new users use 
> naturalWord, unicodeWord or anything else like that, from now until the dawn 
> of time.
> 
> When the IDE opens a stack in pre 7.0 format, a dialog pops up informing the 
> user of the change and asking if they want to update their scripts. If yes, 
> we use find and replace to swap out word for legacyWord in their scripts.
> 
> Kind regards,
> 
> Kevin
> 
> Kevin Miller ~ ke...@runrev.com ~ http://www.livecode.com/
> LiveCode: Everyone can code
> 



-- 

Use an alternative Dictionary viewer:
http://bjoernke.com/bvgdocu/

Chat with other RunRev developers:
http://bjoernke.com/chatrev/





Re: New chunks

2014-03-12 Thread Richmond

On 12/03/14 12:12, Fraser Gordon wrote:

> Paragraphs are defined in the Unicode standard. They are runs of text 
> terminated by the Paragraph Separator character or (optionally) any other 
> newline character. While it may not make sense linguistically, this is how we 
> delimit paragraphs in LiveCode fields.

A pretty comprehensive answer to all my points.

Thanks.

Richmond.




Re: New chunks

2014-03-12 Thread J. Landman Gay

On 3/12/14, 5:12 AM, Fraser Gordon wrote:

> The library that we use for low-level Unicode stuff


Just curious, do you know how much this will inflate the file size of a 
standalone? (Not that it matters really.) I'm thinking about the 
dictionary size(s) as well as the library itself.


--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com



Re: New chunks

2014-03-12 Thread Richmond

On 12/03/14 18:07, J. Landman Gay wrote:

> On 3/12/14, 5:12 AM, Fraser Gordon wrote:
>> The library that we use for low-level Unicode stuff
>
> Just curious, do you know how much this will inflate the file size of
> a standalone? (Not that it matters really.) I'm thinking about the
> dictionary size(s) as well as the library itself.



This is going to be the price of having an IDE that is capable of
singing, dancing and brewing coffee at 5 in the morning: standalone bloat.

I wonder if all the jazzy new additions could not be hived off, so that
everything is modularised, so one could run down a checklist of capabilities
one's standalone uses before build-time? Then one could "trim the fat" to
whatever one's standalone required, rather than carting around a lot of
excess baggage.

For instance, I might be making something that only deals with ASCII
script, so I would uncheck all the modules to do with Unicode capabilities
and they would not be built into my standalone.

There are already one or two choices about inclusions to be made in the
standalone builder, so this would not be setting a precedent.


Just a thought.

Richmond.



Re: New chunks

2014-03-12 Thread Jim Hurley
Let me try this again.

In “sentence (breaks on unicode sentence boundaries)” how does RR distinguish 
between the multiple uses of the period?

Jim


Re: New chunks

2014-03-12 Thread Fraser Gordon

On 12 Mar 2014, at 17:02, Richmond  wrote:

> On 12/03/14 18:07, J. Landman Gay wrote:
>> On 3/12/14, 5:12 AM, Fraser Gordon wrote:
>>> The library that we use for low-level Unicode stuff
>> 
>> Just curious, do you know how much this will inflate the file size of
>> a standalone? (Not that it matters really.) I'm thinking about the
>> dictionary size(s) as well as the library itself.
>> 

If you include everything… about 30MB. Not exactly small.

> 
> This is going to be the price of having an IDE that is capable of singing, 
> dancing and brewing coffee at 5 in the morning: standalone bloat.
> 
> I wonder if all the jazzy new additions could not be hived off, so that 
> everything is modularised, so one could run down a checklist of capabilities
> one's standalone uses before build-time? Then one could "trim the fat" to 
> whatever one's standalone required, rather than carting around
> a lot of excess baggage.

We definitely don't want to force anyone to include this mass of data if they 
don't need it. As you suggest, the standalone builder will allow you to 
configure how much of this data you want, though we've not yet decided the best 
way to do this. Some will always be necessary for the engine to function, but 
we're trying to minimise that.

Regards,
Fraser


Re: New chunks

2014-03-12 Thread Bob Sneidar
Pretty sure LiveCode is going to use a simple period delimiter. You would
have to prep the data first by replacing the periods in any number with a
placeholder, processing your sentences, then restoring the periods
(if you need to).

You could get fancy by setting the lineDelimiter to space, then finding every 
line that ends in a period and processing everything in-between. It’s doubtful 
a number would end in a period without it being the end of a sentence. 

Bob


On Mar 11, 2014, at 15:34 , Jim Hurley  wrote:

> Can someone explain how the “sentence” chunk would work?
> How are decimal points and points in an abbreviation distinguished from the 
> “period” that delineates the end of a “sentence”?
> Does it presume that the existing text has special embedded “periods”?
> 
> I’ve written my own, but it is very cumbersome and not flawless. I use it to 
> do manuscript analysis.
> Like: Find all sentences in which “time” and “party” occur anywhere in the 
> same sentence.
> 
> My ignorance on unicode is profound.
> Jim
> 


Re: New chunks

2014-03-12 Thread Vokey, John
As they are unicode “words”, why not: wurd?

put the 2nd wurd of “word word” into naturalword.

Just a thought.

(Now ducking and running because this email is obviously about cheese).

--
Please avoid sending me Word or PowerPoint attachments.
See 

-Dr. John R. Vokey


Re: New chunks

2014-03-12 Thread Paul Dupuis
Actually, no, it is not based on simple period delimiters. It is using a
code library that has built-in rules to understand sentence structure in
most languages and can recognize real ends of sentences. It
deals with numbers, abbreviations, etc. correctly.

Can someone probably construct some sequence of characters that could be
called a sentence that might get mis-parsed? Possibly - I am familiar
with the library RunRev is using only by reputation, so I can't say for
sure. However for most text you will work with where you want to return
"sentence 2 of paragraph 5 of fld X" you will get exactly what you expect.

On 3/12/2014 7:58 PM, Bob Sneidar wrote:
> Pretty sure Livecode is going to do a simple delimiter on period. You would 
> have to prep the data first by replacing periods in any word that is a number 
> with a placeholder, processing your sentences, then restoring the 
> placeholders (if you need to). 


Re: New chunks

2014-03-12 Thread Dick Kriesel
How about "revword" for the pre-7 behavior?

+ frees "word" to make learning easier post-6
+ tribute to heritage
+ easily recognizable and sensible reference for current and future livecoders
+ no reference to encoding
+ no reference to versions
+ no unexpected "u"
+ precedent for similar changes in the future

-- Dick


Re: New chunks

2014-03-12 Thread dunbarx
Jim,

Me, too.

I asked early on if it was envisioned that a sentence be delimited by a period 
and a space (or a paragraph by two successive returns). These seem unreliable 
as large-scale text separators.


Craig



-Original Message-
From: Jim Hurley 
To: use-livecode 
Sent: Wed, Mar 12, 2014 1:07 pm
Subject: Re: New chunks


Let me try this again.

In “sentence (breaks on unicode sentence boundaries)” how does RR distinguish 
between the multiple uses of the period?

Jim

Re: New chunks

2014-03-12 Thread Jerry Jensen
+1 for revword. But I still think wurd is the funniest.
.Jerry



Re: New chunks

2014-03-12 Thread Jim Hurley
Hi Bob,

I have an app that I use for manuscript analysis and have done some of this 
"prep."

For example to deal with decimal points, since the digit following the decimal 
point is always a number (0123456789):

 repeat with i = 0 to 9
    replace "." & i with "\" & i in tText
 end repeat

So that 3.14 becomes 3\14 and 2.3456 becomes 2\3456

But that's only the beginning.  There are a massive number of abbreviations: 
Mr. Smith, etc. Jan. , A.D., Dr.  

I have a list of about 70.

But then there are names: P. G. Wodehouse and on and on.
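
A minimal sketch of how the same placeholder trick might be extended to a list of abbreviations. The handler names and the return-delimited pAbbrevs list are hypothetical and not taken from the script described above; initials such as "P. G." are not handled at all.

```livecode
-- Sketch only. pAbbrevs is a hypothetical return-delimited list,
-- e.g. "Mr." & return & "Dr." & return & "Jan." & return & "A.D."
function protectAbbreviations pText, pAbbrevs
   local tAbbrev, tMasked
   set the caseSensitive to true  -- so "Jan." does not also match "jan."
   repeat for each line tAbbrev in pAbbrevs
      put tAbbrev into tMasked
      replace "." with "\" in tMasked
      -- only mask the abbreviation where a space follows it
      replace tAbbrev & space with tMasked & space in pText
   end repeat
   return pText
end protectAbbreviations

-- Undo the masking once the sentences have been pulled out.
-- Note: this also turns any genuine backslash in the text into a period.
function restorePeriods pText
   replace "\" with "." in pText
   return pText
end restorePeriods
```

An abbreviation that really does end a sentence gets masked as well, so this only trades one kind of error for another.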

I find my script useful for my own needs, but I don't see any commercial 
application. Just too many unexpected places where a period will show up.

So I really can't see the purpose of RR's "sentence chunk". I wish they would 
explain.

Jim



Re: New chunks

2014-03-12 Thread J. Landman Gay

On 3/12/14, 9:25 PM, Dick Kriesel wrote:

> How about "revword" for the pre-7 behavior?


That's not bad. I wouldn't mind it.

As an aside, I'm working late tonight and doing a lot of text parsing. 
I've got some text like this:


  3/13/14  text  more text

and without even thinking about it, I grab the date:

  put word 1 of line x of fld y into tDate

And then I stop short and think about what I'd need to do if "word" 
wasn't a word any more. 


This is going to take some getting used to. I've been having this kind 
of double-take all night. I didn't realize at first just how much I use 
that token. Changing "word" is going to be a huge hit for a while.


--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com



Re: New chunks

2014-03-13 Thread Fraser Gordon

On 13 Mar 2014, at 04:48, Jim Hurley  wrote:
> 
> So I really can't see the purpose of RR's "sentence chunk". I wish they would 
> explain.
> 

We'd be using ICU's sentence breaking code. They include a whole bunch of 
language-related knowledge with the library and can use that to tell the 
difference between decimal points, full stops, abbreviations, etc. You're right 
about it not being perfect but it does seem pretty reliable.
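
For illustration, a short sketch of how a script might walk the sentences of a text once the chunk exists. It assumes the 'sentence' chunk ships as proposed in this thread and can be used with "repeat for each"; numberedSentences is a hypothetical name. Where the breaks actually fall is decided by ICU's rules, so no output is shown.

```livecode
-- Sketch only: relies on the proposed 'sentence' chunk.
function numberedSentences pText
   local tSentence, tCount, tList
   put 0 into tCount
   repeat for each sentence tSentence in pText
      add 1 to tCount
      put tCount & ":" && tSentence & return after tList
   end repeat
   delete the last char of tList  -- drop the trailing return
   return tList
end numberedSentences
```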

Regards,
Fraser




Re: New chunks

2014-03-13 Thread Alan Stenhouse
Wouldn't it be simple to

set the itemDel to tab
put item 1 of line x of fld y into tDate


Granted, slightly more work, but since itemDel changes are limited to only the 
local handler, it's not too big a deal, is it?

Or am I missing something else?

Re: the change to "word" - I'd love for it to function seamlessly with Unicode 
and actually work properly according to what "normal" people's idea of a word 
is. Isn't that the reason we use LiveCode?

Regarding using another term for the previous behaviour - legacyWord or revWord 
seem reasonable. Some scripts will need changing, but such are the side-effects 
of progress…

cheers

Alan



Re: New chunks

2014-03-13 Thread Peter Brigham
On Wed, Mar 12, 2014 at 9:32 PM, Paul Dupuis  wrote:

> Can someone probably construct some sequence of characters that could be
> called a sentence that might get mis-parsed? Possibly - I am familiar
> with the library RunRev is using only by reputation, so I can't say for
> sure. However for most text you will work with where you want to return
> "sentence 2 of paragraph 5 of fld X" you will get exactly what you expect.
>

How about this:

"In later years, P.G. Wodehouse always went by P.G. Wodehouse might have
thought that Pelham Grenville sounded snooty."

No algorithm is going to manage this kind of thing, where the reader has to
understand the meaning of the sentences to parse them correctly. The
question is how seldom will mistakes occur.

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig


Re: New chunks

2014-03-13 Thread J. Landman Gay
Sure, that's what I'd need to do, convert each instance to items.  I was 
bemoaning the sheer number of times it would have been necessary in just one 
night's work.  It does point out the necessity for me to have a global switch 
of some sort.  

Think of all the swapping you'd have to do within a single repeat loop just to 
pull out chunks composed of different types of characters. A date followed by a 
currency followed by a quoted string for example.  

On March 13, 2014 6:58:27 AM CDT, Alan Stenhouse  
wrote:
>Wouldn't it be simple to
>
>set the itemDel to tab
>put item 1 of line x of fld y into tDate
>
>
>Granted, slightly more work, but since itemDel changes are limited to
>only the local handler, it's not too big a deal, is it?
>
>Or am I missing something else?
>
>

-- 
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com



Re: New chunks

2014-03-13 Thread Bob Sneidar
Office products stumble on names and standard street abbreviations as well. 
Floating point numbers however are ignored. Apparently what they are looking 
for is period-space or period-. Not very comprehensive, but 
of course nothing could accommodate all abbreviations. The example, "I used to 
know H. G. Wells." will produce 3 sentences. But if you ignore a single letter 
followed by a period, then the phrase, “A is greater than B. Therefore, B is 
not greater than A.” will only produce one sentence. 

This evokes in me a severe doubt that the number of sentences can ever be 
absolutely determined with 100% confidence. I suppose had the English language 
been developed in the digital age, someone would have thought of this conundrum 
and used a different character for abbreviations, decimal indicators and 
sentences. 

Things like this make me ponder in what scenario would it be necessary to 
isolate sentences at all. If Microsoft Word, the de facto word processor of the 
world, cannot absolutely detect all sentences in all situations, they obviously 
don’t think there is a real need for it. Can anyone cite an application that 
can detect sentences with 100% certainty? If so, figure out what they are 
using. 

Bob




Re: New chunks

2014-03-13 Thread Jim Hurley
Thanks Fraser. I’m looking forward to it.

I have a poor man’s version that is workable.

However just ran across a sentence in today’s NYT that I will have to include:

(“What they’re offering people is a full stomach and an empty soul.”)

I had dealt with the quote beyond the period and the paren beyond the period, 
but not both.

I find it quite useful in lookups. I can find all sentences in a book-length 
ms. in which any number of words occur, regardless of the order.
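
A rough sketch of that kind of lookup, written against the proposed 'sentence' chunk rather than a hand-rolled splitter. It assumes the chunk ships as described in this thread; sentencesWithAll and its comma-delimited pWords parameter are hypothetical names. "contains" is a plain substring test, so "time" would also match "timer".

```livecode
-- Sketch only: returns every sentence of pText that contains all of
-- the comma-delimited search terms in pWords, in any order.
function sentencesWithAll pText, pWords
   local tSentence, tWord, tAllFound, tHits
   repeat for each sentence tSentence in pText
      put true into tAllFound
      repeat for each item tWord in pWords
         if tSentence contains tWord then next repeat
         put false into tAllFound
         exit repeat
      end repeat
      if tAllFound then put tSentence & return after tHits
   end repeat
   return tHits
end sentencesWithAll
```

A call might look like sentencesWithAll(tManuscript, "time,party"), with no spaces after the commas because each item is matched as written.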

The ICU library will be a welcome addition.

Jim



Fwd: New chunks

2014-03-13 Thread Benjamin Beaumont
Dear List Members,

This discussion has been very interesting and there have been a lot of
suggestions made. Ultimately I do not feel that now is the correct time to
change the fundamental meaning of any of the syntax of LiveCode. My team
and I have been very carefully crafting the 7.0 release so that existing
applications will run and (with virtually no changes in most cases) work
exactly the same as before except that they will allow arbitrary language
strings as input rather than ones which are just limited to the 'native'
character set of the platform (i.e. MacRoman / Latin-1). As Richard pointed
out in a recent post, this has been a huge endeavour and if we were to
change fundamental syntax then it would make it a lot harder to determine
where problems might lie as we stabilize the 7.0 engine and get it
release-worthy.

One of the next big projects we will be undertaking when 7.0 is
release-worthy is integrating the Open Language ideas that have been
discussed and mentioned (albeit in not much depth) before. At this point we
will be completely at liberty to experiment with, adapt and even change
syntax to ensure it is the best it can be, and the most appropriate. One
part of this project will be the refinement of all existing syntax - there
will be two parsers, the existing one and the new, together with a script
translation system that should mean upgrading to the new syntax will be
straightforward (and only required when you wish to use the new Open
Language syntax in an existing script - you won't have to update all
scripts at once, and perhaps never at all for some). This is one point at
which we can perhaps correct some historical syntax which is perhaps not
the best.

Moreover, eventually the Open Language project will mean that on (at the
very least) a project-by-project basis you will be able to tailor your
syntax environment to how you want it. If you prefer the current definition
of 'word' then you will be able to continue to use that - just plug in a
module that maps the 'word' syntax to the existing semantics.

It is also important to stress just how important the current 'word' chunk
actually is in script - it's been interesting to see people go from "we
should change the meaning of word" to, "hmmm, perhaps we shouldn't - I
looked at my scripts last night and it would be a nightmare to change".
Currently LiveCode's 'word' chunk is inherited from HyperCard and is deeply
ingrained in the language - it is a programmatic construct which is
convenient for numerous things, it is not an attempt at proper word boundary
analysis. [ A good example of usage, originally cited by Monte, is things
like 'word 2 of the name of tObject' ].

So with that in mind, I really do think the only option we have now if we
want a more word-like word chunk is to choose a different name for it. This
gives existing scripts access to the ability without having to change them
in any way, but doesn't close the door to a more radical change in the
future (when we have Open Language).

The original suggestion was 'naturalWord' to suggest 'natural language' -
however that does not seem popular (I could mention 'lead balloon' here,
but I think the discussion on the lists speaks for itself).

The suggestion of 'unicodeWord' (or variants thereof) I do think would be
an incorrect path - the notion of word boundaries is not somehow just
applicable to 'Unicode' text, it applies to existing (non-Unicode) text
also. Indeed, in 7.0 there will not be a difference - all of the tokens
which mention 'unicode' will be deprecated as they are no longer needed for
new code. i.e. The idea is that you just have text - word boundary analysis
applies to all text, regardless of whatever internal encoding you might
need to store it.

Another suggestion has been 'wordUnit' - however (to me at least) this just
does not seem to suggest anything meaningful. We are talking about 'words'
which is something everyone has some idea about - indeed, as has been
pointed out, the fact that LiveCode's current 'word' is not really like a
'word' as we might intuitively expect can trip-up people new to the
language and its history. I'm not sure adding 'unit' on the end of 'word'
adds anything really related to the underlying concept nor helps quantify
the difference.

Given that we are talking about adding a chunk type that is more
'real-world' or 'true' to intuitive expectation of what a word should be -
the best suggestion I've seen so far is 'trueWord' (thanks Richard!).

Warm regards,

Mark


Re: New chunks

2014-03-13 Thread Benjamin Beaumont
Dear list member,

Thank you for your input on this important matter. Having taken all the
posts into account and discussed it internally we are inclined to implement
the following in LiveCode 7.0:

1) Add a 'trueWord' chunk type to provide access to the Unicode Standard's
notion of word-breaking.
2) Add a 'sentence' chunk type to provide access to the Unicode Standard's
notion of sentence-breaking.
3) Add a 'paragraph' chunk type to provide access to the Unicode Standard's
notion of paragraph-breaking.
4) Add a synonym 'part' for the current 'word' chunk type.

This will allow LiveCoders to update their scripts to use the newer syntax
in anticipation of a future change to make the behaviour of the 'word'
chunk match the new 'trueWord' behaviour.
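
A small sketch of how the proposed chunks might sit alongside the existing 'word' chunk. It assumes the names in the list above ship unchanged in 7.0; compareChunks is a hypothetical handler name. Only the result of the existing 'word' chunk is certain; the other comments describe what the proposal implies, not tested output.

```livecode
-- Sketch only: trueWord, sentence and paragraph are the proposed chunks.
on compareChunks
   local tText, tOldWord, tTrueWord, tSentence, tParagraph
   put "Hello, world! Pi is about 3.14." & return & "Second line." into tText
   put word 1 of tText into tOldWord         -- space-delimited today, gives: Hello,
   put trueWord 1 of tText into tTrueWord    -- proposed: Unicode word boundary
   put sentence 2 of tText into tSentence    -- proposed: Unicode sentence boundary
   put paragraph 2 of tText into tParagraph  -- proposed: same as line 2 of tText
end compareChunks
```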

We would anticipate changing the meaning of 'word' with our 'Open Language'
project. It requires us to create a highly accurate script translation
system to allow old scripts to be rewritten in new revised and cleaner
syntax. It is at this point we can seriously think about changing the
meaning of existing tokens including word. Existing scripts will continue
to run using the existing parser, and they can be converted (by the user)
over time to use the 'newer' / 'more refined' syntax we are planning.

Warm regards and thank you again for your participation in this discussion.

Ben



Re: New chunks

2014-03-13 Thread Richard Gaskin

On 3/13/14 9:51 AM, Benjamin Beaumont wrote:

> Dear List Members,
>
> This discussion has been very interesting and there have been a lot of
> suggestions made. Ultimately I do not feel that now is the correct time
> to change the fundamental meaning of any of the syntax of LiveCode. My
> team and I have been very carefully crafting the 7.0 release so that
> existing applications will run and (with virtually no changes in most
> cases) work exactly the same as before except that they will allow
> arbitrary language strings as input rather than ones which are just
> limited to the 'native' character set of the platform (i.e. MacRoman /
> Latin-1). As Richard pointed out in a recent post, this has been a huge
> endeavour and if we were to change fundamental syntax then it would make
> it a lot harder to determine where problems might lie as we stabilize
> the 7.0 engine and get it release-worthy.


Giving credit where due, it was really Joseba Aguayo's post that 
prompted me to consider the impact on testing.  Sometimes it's the brief 
post that gets to the point quickly that helps us keep our priorities clear.


Now we have a challenge for the community:

The LiveCode team has done their part, postponing a change so we can 
expect our existing code to run in v7 as well as it does in v6.


This means that it's up to us to actually do the testing.

If any of you aren't currently using v6.6 RC1, please note that it's a 
Release Candidate, so it's critical that you work with it now.


If you put off using it until after release, any bugs found will just be 
more expensive to fix, and an embarrassment to the community.


Let's step up our game and give v6.6 a thorough workout today.

When it's released, we'll have established a solid baseline for 
evaluating v7.


Given v7's scope of changes, we'll really need that solid baseline for 
the thorough testing I hope you'll all give it.


In the past I've been lax myself about testing some RCs, but after 
seeing bugs post-release that I could have found earlier, I've learned.


So let's please do our part to ensure that the current Release Candidate 
fully meets our needs.  If we test thoroughly enough it may even be the 
most solid release ever, which benefits everyone.


6.6 RC1 is available at the top of this page:


--
 Richard Gaskin
 Fourth World
 LiveCode training and consulting: http://www.fourthworld.com
 Webzine for LiveCode developers: http://www.LiveCodeJournal.com
 Follow me on Twitter:  http://twitter.com/FourthWorldSys




Re: New chunks

2014-03-13 Thread J. Landman Gay

On 3/13/14, 10:45 AM, Bob Sneidar wrote:

Things like this make me ponder in what scenario would it be
necessary to isolate sentences at all.


I've had more than one case where I wanted to extract portions of a 
paragraph. Right now lines = paragraphs and it gets messy trying to do 
that. Parse a legal document, for example:


Section III. This is contract language for something or other. Blah blah 
blah.


Section IIIii. Under no circumstances is this to be implied that blah 
blah blah.


"put sentence 2 of line 1 of tText into tLegalese"

--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com
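
The extraction described in the message above could be sketched as follows. This assumes the 'sentence' chunk ships in LiveCode 7 as proposed in this thread; tText is hypothetical sample data, and where the sentence boundaries actually fall depends on the Unicode rules the engine applies (an abbreviation-like "Section III." may or may not count as a break).

```livecode
on mouseUp
   -- tText is hypothetical sample data: two paragraphs of contract language
   put "Section III. This is contract language. Blah blah blah." & return & \
         "Section IIIii. Under no circumstances is this implied." into tText

   -- 'sentence' is the chunk type proposed for LiveCode 7 (not available in 6.x)
   put sentence 2 of line 1 of tText into tLegalese

   -- chunk counts work the same way as for words and lines
   put the number of sentences of line 1 of tText into tCount
end mouseUp
```

Checking tCount against a few real documents is probably the quickest way to see how the engine treats abbreviations and section numbers.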



Re: New chunks

2014-03-13 Thread Bob Sneidar
I had to read it 3 times to get it right! No computer is THAT advanced!

Bob


On Mar 13, 2014, at 07:31 , Peter Brigham  wrote:

> On Wed, Mar 12, 2014 at 9:32 PM, Paul Dupuis  wrote:
> 
>> Can someone probably construct some sequence of characters that could be
>> called a sentence that might get mis-parsed? Possibly - I am familiar
>> with the library RunRev is using only by reputation, so I can't say for
>> sure. However for most text you will work with where you want to return
>> "sentence 2 of paragraph 5 of fld X" you will get exactly what you expect.
>> 
> 
> How about this:
> 
> "In later years, P.G. Wodehouse always went by P.G. Wodehouse might have
> thought that Pelham Grenville sounded snooty."
> 
> No algorithm is going to manage this kind of thing, where the reader has to
> understand the meaning of the sentences to parse them correctly. The
> question is how seldom will mistakes occur.
> 
> -- Peter
> 
> Peter M. Brigham
> pmb...@gmail.com
> http://home.comcast.net/~pmbrig


Re: New chunks

2014-03-13 Thread John Craig
Will point 4 - 'part' synonym break anything that already uses 'part' as 
a synonym of 'control'?  I've only got one project that uses this 
keyword - not a big deal for me to change it, but others may have more 
substantial work to change.



On 13/03/2014 16:54, Benjamin Beaumont wrote:

Dear list member,

Thank you for your input on this important matter. Having taken all the
posts into account and discussed it internally we are inclined to implement
the following in LiveCode 7.0:

1) Add a 'trueWord' chunk type to provide access to the Unicode Standard's
notion of word-breaking.
2) Add a 'sentence' chunk type to provide access to the Unicode Standard's
notion of sentence-breaking.
3) Add a 'paragraph' chunk type to provide access to the Unicode Standard's
notion of paragraph-breaking.
4) Add a synonym 'part' for the current 'word' chunk type.

This will allow LiveCoders to update their scripts to use the newer syntax
in anticipation of a future change to make the behaviour of the 'word'
chunk match the new 'trueWord' behaviour.

We would anticipate changing the meaning of 'word' with our 'Open Language'
project. It requires us to create a highly accurate script translation
system to allow old scripts to be rewritten in new revised and cleaner
syntax. It is at this point we can seriously think about changing the
meaning of existing tokens including word. Existing scripts will continue
to run using the existing parser, and they can be converted (by the user)
over time to use the 'newer' / 'more refined' syntax we are planning.

Warm regards and thank you again for your participation in this discussion.

Ben



Re: New chunks

2014-03-13 Thread Mark Wieder
John-

Thursday, March 13, 2014, 3:25:19 PM, you wrote:

> Will point 4 - 'part' synonym break anything that already uses 'part' as
> a synonym of 'control'?  I've only got one project that uses this 
> keyword - not a big deal for me to change it, but others may have more
> substantial work to change.

That was my thought as well. I don't think I've used 'part'
already, but I can't guarantee I haven't. I'd rather not use a real
word for this, but instead have something that doesn't have another
meaning... "wurd" comes to mind.

-- 
-Mark Wieder
 ahsoftw...@gmail.com

This communication may be unlawfully collected and stored by the National 
Security Agency (NSA) in secret. The parties to this email do not 
consent to the retrieving or storing of this communication and any 
related metadata, as well as printing, copying, re-transmitting, 
disseminating, or otherwise using it. If you believe you have received 
this communication in error, please delete it immediately.




Re: New chunks

2014-03-13 Thread Stephen MacLean

On Mar 13, 2014, at 1:10 PM, Richard Gaskin  wrote:

> So let's please do our part to ensure that the current Release Candidate 
> fully meets our needs.  If we test thoroughly enough it may even be the most 
> solid release ever, which benefits everyone.
> 
> 6.6 RC1 is available at the top of this page:
> 
> 
> --
> Richard Gaskin

Hi All,

In testing on 6.6 RC1, it seems there is an automatic scaling going on. I see 
it in the release notes, but am looking for more info on it.

LC graphics are for sure super crisp under iOS 7 and retina iPhone, going to 
have to work in those hi-res graphics I have :)

So far, very stable and fast!

Everything I've tested so far works fine, but mergMK seems to have a problem 
when you go to set its rect to the rect of a graphic that's been scaled: it 
doesn't see it as scaled and is drawn at a quarter of the size. 

How do we turn that off for now?

Posting here so both RR and Monte see this as well.

Best,

Steve MacLean





Re: New chunks

2014-03-13 Thread Monte Goulding

On 14/03/2014, at 2:05 PM, Stephen MacLean wrote:

> Everything I've tested so far works fine, but mergMK seems to have a problem 
> when you go to set its rect to the rect of a graphic that's been scaled: it 
> doesn't see it as scaled and is drawn at a quarter of the size. 

Hmm... LCInterfaceQueryViewScale must be returning the wrong value. We resolved 
this pre-6.5, but from the looks of things there are some changes marked HiDPI 
at the end of Jan that must have broken it. I won't have time to follow up on 
this, so could you file a bug report with RunRev? The basic problem is that 
MCIPhoneGetResolutionScale is no longer returning a scaling factor from 
LiveCode points/pixels to UIKit points.

Cheers

--
M E R Goulding 
Software development services
Bespoke application development for vertical markets

mergExt - There's an external for that!



Re: New chunks

2014-03-14 Thread Benjamin Beaumont
Hi Stephen, Monte.

We're looking at this today.

Warm regards,

Ben




Re: New chunks

2014-03-14 Thread Stephen MacLean
Thanks Ben!

Best,

Steve



Re: New chunks

2014-03-15 Thread Peter M. Brigham
On Mar 13, 2014, at 10:17 AM, J. Landman Gay wrote:

> Sure, that's what I'd need to do, convert each instance to items.  I was 
> bemoaning the sheer number of times it would have been necessary in just one 
> night's work.  It does point out the necessity for me to have a global switch 
> of some sort.  
> 
> Think of all the swapping you'd have to do within a single repeat loop just 
> to pull out chunks composed of different types of characters. A date followed 
> by a currency followed by a quoted string for example.  
> 
> On March 13, 2014 6:58:27 AM CDT, Alan Stenhouse  
> wrote:
>> Wouldn't it be simple to
>> 
>> set the itemDel to tab
>> put item 1 of line x of fld y into tDate
>> 
>> 
>> Granted, slightly more work, but since itemDel changes are limited to
>> only the local handler, it's not too big a deal, is it?
>> 
>> Or am I missing something else?

I use the following:

get getItem(line x of fld y,1,tab)

function getItem tList,tIndex,tDelim
   -- returns item # tIndex of tList, given itemdelimiter = tDelim
   -- could just "get item tIndex of tList" in the calling handler but
   -- then have to set and restore the itemDelimiter, so this is less hassle
   -- defaults to tDelim = comma
   
   if tDelim = empty then put comma into tDelim
   set the itemdelimiter to tDelim
   return item tIndex of tList
end getItem

Very handy for pulling strings out of containers.
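
A short usage sketch of the function above, assuming getItem is in the message path. The record layout (date, amount, quoted note, separated by tabs) is hypothetical sample data. Because the itemDelimiter is local to each handler, the change made inside getItem does not leak back into the caller.

```livecode
on mouseUp
   -- tRecord is hypothetical sample data: date, currency, quoted string
   put "2014-03-15" & tab & "$42.50" & tab & quote & "paid in full" & quote \
         into tRecord

   put getItem(tRecord, 1, tab) into tDate
   put getItem(tRecord, 2, tab) into tAmount
   put getItem(tRecord, 3, tab) into tNote

   -- the itemDelimiter in this handler is still the default comma
   put item 2 of "a,b,c" into tMiddle
end mouseUp
```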

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig




Re: New chunks

2014-03-15 Thread Peter M. Brigham
On Mar 13, 2014, at 11:51 AM, Jim Hurley wrote:

> I have a poor man’s version that is workable.
> 
> However just ran across a sentence in today’s NYT that I will have to include:
> 
>(“What they’re offering people is a full stomach and an empty soul.”)
> 
> I had dealt with the quote beyond the period and the paren beyond the period, 
> but not both.

Here's a simple function I've been using. The new LC grammar revisions will 
make it obsolete eventually.

function naturalWord tWord
   -- strips punctuation from HC-style "words" fore and aft
   -- to return something closer to what is normally understood as a word
   put "abcdefghijklmnopqrstuvwxyz1234567890" into tAlphabet
   -- numerals included to cope with numbers and things like "HTML5"
   repeat while char 1 of tWord is not in tAlphabet
      delete char 1 of tWord
   end repeat
   repeat while char -1 of tWord is not in tAlphabet
      delete char -1 of tWord
   end repeat
   return tWord
end naturalWord

It's simple, and therefore somewhat simple-minded -- it will miss some cases, 
but I've found it useful.
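
One case worth guarding against is a "word" made up entirely of punctuation, such as "--" or "...". The variant below is only a defensive sketch, not the author's code: the name naturalWordGuarded is hypothetical, and the extra emptiness test is there so the loops end once the string is used up, however the engine evaluates an empty character against tAlphabet.

```livecode
function naturalWordGuarded tWord
   -- same idea as naturalWord, but stops as soon as tWord is exhausted
   put "abcdefghijklmnopqrstuvwxyz1234567890" into tAlphabet
   repeat while (tWord is not empty) and (char 1 of tWord is not in tAlphabet)
      delete char 1 of tWord
   end repeat
   repeat while (tWord is not empty) and (char -1 of tWord is not in tAlphabet)
      delete char -1 of tWord
   end repeat
   return tWord
end naturalWordGuarded
```

For ordinary words the result should match the original function; the difference only matters for input with no letters or digits at all, which comes back as empty.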

"Every complicated problem has a simple, easy, obvious, wrong answer."
-- H. L. Mencken

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig




Re: New chunks

2014-03-15 Thread Jim Hurley

Hi Peter,

Always good to hear from you. I like your approach, but as you say, it will 
miss some cases, for example: “Don’t tell Ms. Fitz-Williams that pi is equal 
to 3.15”

I have built an app I find very useful. I do a lot of writing and I often want 
to go back and search through a manuscript for sentences or paragraphs that 
contain a certain combination of words, in whatever order. 

The following app allows one to put any text into a field and search for any or 
all words (whole words or partial) in a sentence or paragraph in whatever 
order, even MS Word text with all its curly quotes and apostrophes etc. It 
handles decimal numbers and also checks for 70 of the most common 
abbreviations. But still I am looking forward to LC's implementation of these 
new word and sentence chunks. No telling how many special cases I have missed.

In the msg. box:

  go url 
“https://dl.dropboxusercontent.com/u/47044230/Find%20sentences2.livecode”

Warm regards,

Jim





Re: New chunks

2014-03-15 Thread Jim Hurley
Just thought of a situation I didn’t deal with. The apostrophe is tricky.

In doing a word search, you would want to keep the apostrophe in the word 
“don’t” but probably not in “Howard’s best friend is his dog.” “Howard’s” 
should appear in the found sentence but not in the word search, i.e., if you 
searched for sentences in which “Howard” appeared, you would want to find 
“Howard’s best….” 
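
One rough way to approach this is to compare search terms against a key with any trailing possessive removed, while displaying the original word. The sketch below is an illustration only: the name possessiveKey is hypothetical, it handles just the straight apostrophe, and it cannot tell a possessive from a contraction such as "it's".

```livecode
function possessiveKey tWord
   -- "Howard's" becomes "Howard"; "don't" is left untouched
   -- assumption: straight apostrophes only; curly ones need separate handling
   if (the number of chars of tWord > 2) and (char -2 to -1 of tWord is "'s") then
      delete char -2 to -1 of tWord
   end if
   return tWord
end possessiveKey
```

A search for "Howard" would then test possessiveKey of each word in the sentence, and the sentence itself would be reported unchanged.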

 


Re: New chunks

2014-03-15 Thread Peter Haworth
Hi Jim,
Have you ever tried the SQLite FTS virtual table?  It's specifically
designed for text searching and is lightning fast.  Certainly an extra
step to load stuff into a database, but might be worth a look.

Pete
lcSQL Software


Re: New chunks

2014-03-15 Thread Peter M. Brigham
On Mar 15, 2014, at 7:05 PM, Jim Hurley wrote:

> The following app allows one to put any text into a field and search for any 
> or all words (whole words or partial) in a sentence or paragraph in 
> whatever order, even MS Word text with all its curly quotes and apostrophes 
> etc. It handles decimal numbers and also checks for 70 of the most common 
> abbreviations. But still I am looking forward to LC's implementation of these 
> new word and sentence chunks. No telling how many special cases I have missed.
> 
> In the msg. box:
> 
>  go url 
> “https://dl.dropboxusercontent.com/u/47044230/Find%20sentences2.livecode”

I get "script compile error: Expression bad factor" in the message box, and 
when I click on the link itself I get a 404 in the browser.

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig




Re: New chunks

2014-03-15 Thread stephen barncard
On Sat, Mar 15, 2014 at 7:19 PM, Peter M. Brigham  wrote:

> I get "script compile error: Expression bad factor" in the message box,
> and when I click on the link itself I get a 404 in the browser.


yeah something odd about the URL here too.

*--*
*Stephen Barncard - San Francisco Ca. USA - Deeds Not Words*


Re: New chunks

2014-03-16 Thread Jim Hurley
Sorry Peter and Stephen,

That should have been:

open stack  url 
"https://dl.dropboxusercontent.com/u/47044230/FindSentences2.livecode"

Jim



Re: New chunks

2014-03-16 Thread Jim Hurley
Hi Pete,

I’ve never gotten into SQL. Will it search for any or all words in any order in 
a sentence?

Jim




Re: New chunks

2014-03-16 Thread Peter Haworth
If you're not into SQL, probably not worth the learning curve to go down
that path.

To answer your question though, I believe that could be done with the right
database structure but you'd probably have to pre-parse your manuscript
into sentences then load each sentence as a separate row in the database.

It also feels like a regexp could do this without having to worry about
stripping out punctuation.  The key is to define exactly what constitutes
 a sentence.

Pete
lcSQL Software <http://www.lcsql.com>
Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>

