Hi Brandon -
Thanks for your patch.
Overall comments:
- Thanks for creating a unit test, good idea
- It's fine with me to return "" in cases when the requested range
is out of bounds, and to limit the range to the string.
- You should either support your primitive on compile-time constants
or implement it as an extern proc. If the index_of primitive is
not supported in compile-time constant folding, you should implement
it as an extern proc instead of as a primitive. The other string
functions are primitives for historical reasons. Since we already
have a different string_contains function, I'd recommend just
using an extern proc.
- Go ahead and add a string method to run your new routine. I think
it would be nice to return a range, since string.substring
takes in a range.
Your existing one in the test case could easily be added to ChapelBase.chpl,
around line 664 where the other string methods are.
Of course, other Chapel developers might have an opinion about the
name or return type of this routine...
- You'll need to run the existing unit tests to make sure your patch
doesn't change functionality or break something. (it's normal
code review/pre-commit process to run all the unit tests single-locale).
When you do your testing, also test the new test case as well as
at least test/release with --no-local or with a GASNET configuration,
since the string type currently changes in multilocale configurations.
- I'll ask Sung about returning "". I don't see any issues currently
(since I think strings are still leaked).
Besides those general comments, I have a question about this change to
ChapelRangeBase.chpl:
Index: modules/internal/ChapelRangeBase.chpl
===================================================================
--- modules/internal/ChapelRangeBase.chpl (revision 22367)
+++ modules/internal/ChapelRangeBase.chpl (working copy)
@@ -1014,13 +1014,9 @@
// Return a substring of a string with a range of indices.
inline proc string.substring(r: rangeBase(?))
{
- if r.boundedType != BoundedRangeType.bounded then
- compilerError("substring indexing undefined on unbounded ranges");
-
- if r.stride != 1 then
- return __primitive("string_strided_select", this, r.alignedLow,
r.alignedHigh, r.stride);
- else
- return __primitive("string_select", this, r.low, r.high);
+ var r2 = r[(1..this.length)._base]; // Intersect with string bounds.
+ var lo:int = r2.alignedLow, hi:int = r2.alignedHigh;
+ return __primitive("string_strided_select", this, lo, hi, r2.stride);
}
Why is the intersection done both in string.substring and also in
string_strided_select? Can you handle it in just one place? Is there
a way to write your range computation using only the standard-supported
notation (ie ._base is probably not the best name for it).
Also, you have removed an optimization that calls the simpler string_select
routine when stride == 1. It would be OK with me if string_strided_select
included that optimization, but I think it's important.
Thanks,
-michael
On 12/02/2013 06:06 PM, Brandon Ross wrote:
> Hello, all. Forgive me if I'm doing something wrong; I'm not familiar
> with the patch submission procedure, or which parts of the code are OK
> to touch.
>
> I added a compiler primitive for finding the index of a substring within
> a string. I didn't see any other way of doing the same thing
> efficiently. It's basically just a wrapper around strstr. It returns 0
> if the substring couldn't be found. I didn't add a method to the string
> type (outside of the module I'm working on) for calling it, but I'm
> using it in the module for simple parsing procedures.
>
> I also changed string.substring to more gracefully handle unbounded and
> out-of-bounds range limits by taking the intersection of the passed
> range and the bounding range of the string (e.g.,
> "hello".substring(-5..30) == "hello"). Ranges that don't overlap the
> string at all will return an empty string (e.g.,
> "hello".substring(50..100) == ""). It takes striding into account as
> well. I also made string_index (single character substrings) return an
> empty string if the index is out of range. I don't see the danger in
> handling substrings like this instead of crashing, and other languages
> (e.g., Python) do this as well. All in all I think it makes string
> handling a little friendlier.
>
> Also, for edge cases that return empty strings, I simply returned an
> empty string literal. I have no idea how safe that is or how it affects
> garbage collection. Let me know if it's better practice to allocate a
> new string instead.
>
> I put a test case in /test/trivial/bwross/string_index.chpl.
>
> -- Brandon
>
------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT
organizations don't have a clear picture of how application performance
affects their revenue. With AppDynamics, you get 100% visibility into your
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349351&iu=/4140/ostg.clktrk
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers