Searching is discussed in UTS#10. It does need to be correlated with users' expectations for matching, as you observe.

Mark

*— Il meglio è l’inimico del bene —*

On Mon, Mar 28, 2011 at 14:13, Shawn Steele <shawn.ste...@microsoft.com> wrote:

> Searching gets tricky. Is the result greedy or not (matches as much as
> possible or as little as possible), etc. There are lots of variations,
> which is why it was skipped from the initial v0.5.
>
> Comparison, Search and Casing are all dependent on each other. If search
> finds a substring, we’d expect comparison to match that substring.
> Similarly, if one is using Turkish I, we expect all of them to do so.
>
> - Shawn
>
> *From:* Nebojša Ćirić [mailto:c...@google.com]
> *Sent:* Monday, March 28, 2011 1:36 PM
> *To:* Mark Davis ☕
> *Cc:* es-discuss@mozilla.org; Shawn Steele; Phillips, Addison
> *Subject:* Re: Collation API not complete for search
>
> Shawn, would you be ok with adding this new API to the list for 0.5 so we
> can support collation search?
>
> I'll edit the strawman in case nobody objects to this addition.
>
> On March 25, 2011 at 16:34, Nebojša Ćirić <c...@google.com> wrote:
>
> In that case I wouldn't put this new functionality in the Collator object.
> A new StringSearch or StringIterator object would make more sense:
>
> options = {
>   collator[optional - default, collatorType=search],
>   source[required],
>   pattern[required]
> }
> LocaleInfo.StringIterator = function(options) {}
> LocaleInfo.StringIterator.prototype.first = function() { find first occurrence }
> LocaleInfo.StringIterator.prototype.next = function() { get me next occurrence of pattern in source }
> LocaleInfo.StringIterator.prototype.matchLength = function() { length of the match }
> ... (reset, setPosition...)
>
> On March 25, 2011 at 15:14, Mark Davis ☕ <m...@macchiato.com> wrote:
>
> I think an iterator is a cleaner interface; we were just trying to minimize
> new API.
>
> In general, collation is context sensitive, so searching on substrings
> isn't a good idea.
> You want to search from a location, but have the rest of
> the text available to you.
>
> For the iterator, you would need to be able to reset to a location, but the
> context beforehand could affect what happens.
>
> Mark
>
> *— Il meglio è l’inimico del bene —*
>
> On Fri, Mar 25, 2011 at 14:22, Mike Samuel <mikesam...@gmail.com> wrote:
>
> 2011/3/25 Mike Samuel <mikesam...@gmail.com>:
> > 2011/3/25 Nebojša Ćirić <c...@google.com>:
> >> find method wouldn't return boolean but an array of two values:
> >
> > Sorry if I wasn't clear. The !! at the beginning of the call to find
> > is important.
> > The undefined value you mentioned below as possible no match result is
> > falsey because !!undefined === false.
> >
> >> myCollator.find('gaard', 'ard', 2) -> [2, 5] // 4 or 5 as a bound
> >> myCollator.find('ard', 'ard', 0) -> [0, 3] // 2 or 3 as a bound
> >> I guess [2, 5] !== [0, 3]
> >
> > True, but also [2, 5] !== [2, 5].
> >
> >> We could return [-1, undefined] for not found state, or just undefined.
> >
> >> I agree that returning a boolean makes for easier tests in loops.
> >
> >> On March 25, 2011 at 14:00, Mike Samuel <mikesam...@gmail.com> wrote:
> >>>
> >>> 2011/3/25 Nebojša Ćirić <c...@google.com>:
> >>> > Looking through the notes from the meeting I also found some problems
> >>> > with the collator. We did specify the collatorType: search, but we
> >>> > didn't offer a function that would make use of it. Mark and I are
> >>> > thinking about:
> >>> > /**
> >>> >  * string - string to search over.
> >>> >  * substring - string to look for in "string"
> >>> >  * index - start search from index
> >>> >  * @return {Array} [first, last] - first is index of the match or -1,
> >>> >  * last is end of the match or undefined.
> >>> >  */
> >>> > LocaleInfo.Collator.prototype.find(string, substring, index)
> >>> > We could also opt for iterator solution where we keep the state.
> >>>
> >>> Assuming find returns a falsey value when nothing is found, is it the
> >>> case that for all (string, index) pairs,
> >>>
> >>>     !!myCollator.find(string, substring, index) ===
> >>>     !!myCollator.find(string.substring(index), substring, 0)
>
> Maybe a better way to phrase this relation is
>
> will any collator ever look at a code-unit to the left of index when
> trying to determine whether there is a match at or after index?
>
> E.g. if the code-unit at index might be a strict suffix of a substring
> that could be represented as a one codepoint ligature.
>
> >>> This would be false if the substring 'ard' should be found in 'gard',
> >>> but not 'gaard' because then
> >>>
> >>>     !!myCollator.find('gaard', 'ard', 2) !== !!myCollator.find('ard', 'ard', 0)
> >>>
> >>> If that relation does not hold, then exposing find as an iterator
> >>> might help prevent a profusion of subtly wrong loops.
> >>>
> >>> > The reason we need to return both begin and end part of the found
> >>> > string is:
> >>> > Look for gaard and we find gård - which may be equivalent in Danish,
> >>> > but substring lengths don't match (5 vs. 4) so we need to tell user
> >>> > the next index position.
> >>> > The other problem Jungshik found is that there is a combinatorial
> >>> > explosion with all ignoreXXX options we defined. My proposal is to
> >>> > define only N that make sense (and can be supported by all
> >>> > implementors) and fall back the rest to some predefined default.
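
For illustration only, here is a minimal sketch of the find contract discussed in this thread. It is not the proposed LocaleInfo API: the helper names find and findAll are hypothetical, it assumes a collator object with a compare(a, b) method (Intl.Collator is used here), and it picks "return undefined" as the not-found value, which is only one of the two options mentioned above. It also compares candidate substrings in isolation, so it ignores context to the left of the start index, which is exactly the limitation raised in the thread. The point is just to show why the end of the match must be returned and how an iterator hides the index bookkeeping.

```javascript
// Sketch only: `find` and `findAll` are hypothetical helpers, not the
// proposed LocaleInfo API. Candidate substrings are compared in isolation,
// so any context before `index` is ignored.
function find(collator, string, substring, index = 0) {
  for (let first = index; first < string.length; first++) {
    for (let last = first + 1; last <= string.length; last++) {
      if (collator.compare(string.slice(first, last), substring) === 0) {
        return [first, last]; // shortest match at the earliest start
      }
    }
  }
  return undefined; // assumption: "not found" is a falsey value
}

// Iterator form: the caller never computes the next start position itself.
function* findAll(collator, string, substring) {
  let index = 0;
  let match;
  while ((match = find(collator, string, substring, index)) !== undefined) {
    yield match;
    index = match[1]; // resume at the end of the match, not first + substring.length
  }
}

// Default options; canonically equivalent strings are expected to compare equal.
const collator = new Intl.Collator('en');
// The same word twice: decomposed (e + U+0301) first, then precomposed (U+00E9).
const source = 're\u0301sume\u0301, r\u00e9sum\u00e9';
const matches = [...findAll(collator, source, 'r\u00e9sum\u00e9')];
console.log(matches);
```

The pattern is 6 code units long, but the first occurrence in the source spans 8, so a loop that advances by the pattern length would resume in the wrong place. Note also that two result arrays with equal contents are still distinct objects, so results have to be compared element by element rather than with ===.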
_______________________________________________ es-discuss mailing list es-discuss@mozilla.org https://mail.mozilla.org/listinfo/es-discuss