OIIO (parts of it anyway) tends to be used in fairly performance-critical ways, 
so I don't think there's a strong argument to be made for intentionally having 
bad string performance.

Perhaps you meant, "isn't OIIO mostly image processing, which is all float 
math, so why do strings come up much at all?"

The answer is that we still use strings all the time. Two spots that come to 
mind are (1) names of texture and other image files; (2) the image headers 
themselves are chock full of strings (metadata names, and often the metadata 
values).  We are constantly rummaging through metadata, searching for things by 
name, making and destroying lists of such things, or specifying texture lookups 
by name.

But even if this never became significant in the runtime profile, there are 
also (non-performance) issues of APIs -- both convenience and safety.  Consider 
a mundane utility function that extracts the file extension from an image 
filename.  What kind of parameter should extension() take? A const char*? A 
const std::string&?

If it takes a const char* and the app has a const char*, all is good (though, 
if it's a string literal, extension() will probably needlessly call strlen() 
internally to find its length, which is weird because it started life as a 
constant with known length). But if the app is trying to pass a std::string, it 
will have to pass as mystring.c_str(), and then extension() will again 
needlessly call strlen(), which is especially galling because the length was 
already known when it was a std::string.
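A hypothetical char*-only version makes the cost concrete. The sketch below (extension_cstr() is an illustrative name, not an OIIO function) returns a pointer into the caller's buffer, so it never copies. Because it receives no length, though, it must rescan for the terminating '\0' with strlen even when the caller already knew the length.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical char*-only extension(). Returns a pointer to the extension
// (including the '.') inside the caller's buffer, or a pointer to the
// terminating '\0' (an empty C string) if there is no extension.
const char* extension_cstr(const char* filename) {
    assert(filename != nullptr);
    // No length was passed in, so it has to be rediscovered by scanning,
    // even if the caller already knew it.
    std::size_t len = std::strlen(filename);
    for (std::size_t i = len; i > 0; --i) {
        char c = filename[i - 1];
        if (c == '.')
            return filename + (i - 1);
        if (c == '/')
            break;  // a '.' in a directory name is not an extension
    }
    return filename + len;
}
```

A std::string caller has to write extension_cstr(name.c_str()), and the strlen inside then recomputes what name.size() already knew.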

If, on the other hand, extension() takes a const std::string&, then that's great for 
an app that already has the data as a std::string, but for a different app that 
has it as a C string or a string literal, passing it to extension() will 
pointlessly do a strlen to find its length, a malloc and copy to build a 
std::string, pass that string (which is just a temporary!), and then free it 
after the call. Ick.
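The hidden temporary is easy to observe. In this hypothetical sketch, extension_str() is the std::string-only variant, and callee_got_callers_buffer() reports whether a function received the caller's own characters or a copy. A std::string argument binds directly; a C string first becomes a temporary std::string, which means a strlen and a copy (modern small-string optimization can keep a very short name off the heap, but the scan and the copy still happen).

```cpp
#include <cassert>
#include <string>

// Hypothetical std::string-only extension(): returns a newly allocated copy
// of the extension (including the '.'), or an empty string.
std::string extension_str(const std::string& filename) {
    std::string::size_type pos = filename.find_last_of("./");
    if (pos == std::string::npos || filename[pos] != '.')
        return std::string();  // no '.' after the last '/'
    return filename.substr(pos);
}

// Hypothetical probe: true if the callee is looking at the caller's own
// characters, false if it was handed a copy.
bool callee_got_callers_buffer(const std::string& s, const char* original) {
    return s.data() == original;
}
```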

Should we have both varieties of extension()? Have two copies of every 
string-handling function, one for char* and one for std::string? What about a 
function that takes 3 strings as arguments... do we need 8 versions of the 
function, to handle the full cross product of every one of those parameters 
being either a char* or std::string&?
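With a single non-owning parameter type, one definition covers all eight combinations. This sketch uses C++17's std::string_view (the name under which the string_ref proposal was eventually standardized) as a stand-in for string_ref; join_path() is a hypothetical function chosen only because it takes three strings.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical three-string function. Each argument may independently be a
// string literal, a const char*, or a std::string; no overloads are needed,
// and no temporary std::string is built just to make the call.
std::string join_path(std::string_view dir, std::string_view base,
                      std::string_view ext) {
    std::string result;
    result.reserve(dir.size() + 1 + base.size() + ext.size());
    result.append(dir.data(), dir.size());
    if (!dir.empty() && dir.back() != '/')
        result.push_back('/');
    result.append(base.data(), base.size());
    result.append(ext.data(), ext.size());
    return result;
}
```

The only allocation left is the result itself, which genuinely has to be new storage.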

Or, consider the return type. If you have a function that returns a 
std::string, it must allocate and copy that string, and the caller must 
eventually free it. But for extension() and many other functions, the returned 
string will by definition ALWAYS be a subset of the string that was passed as 
the input parameter. However, that observation does nothing to prevent a new 
malloc/strcpy/free from happening. These little things add up.
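Returning a view instead of a std::string captures the "always a subset of the input" observation directly. This is a sketch with C++17's std::string_view (the standardized form of string_ref) standing in for string_ref; extension() here is hypothetical, not OIIO's.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical extension() returning a view of the caller's characters: the
// extension (including the '.'), or an empty view if there is none. Nothing
// is allocated or copied; the result is valid only while the input's
// characters are alive.
std::string_view extension(std::string_view filename) {
    std::string_view::size_type pos = filename.find_last_of("./");
    if (pos == std::string_view::npos || filename[pos] != '.')
        return std::string_view();
    return filename.substr(pos);
}
```

The trade-off is lifetime: a caller that needs to keep the extension after the input goes away copies it into a std::string at that point, and only then is anything allocated.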

The neat thing is that string_ref cuts through all of this mess, and presents a 
SINGLE interface, through which it's fine (and efficient!) for the app to pass 
a std::string, a char*, or even one of our ustrings. It NEVER does a 
malloc/free in order to pass a string, and although it can occasionally trigger 
an unneeded strlen, it probably saves more strlen calls than it creates, because 
any function that internally needs the length will already have it.  Also, in 
cases where a return string value is guaranteed to be one of (or a substring 
of) the inputs, the function can also return a string_ref, again completely 
avoiding any allocation or copying.
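A minimal sketch of such a class, assuming only C++11; it is not OIIO's actual string_ref. Ustring is a hypothetical stand-in for OIIO's ustring, assumed here only to know its own length, so a string_ref built from it (or from a std::string) needs no strlen; only the bare const char* constructor has to scan. count_dots() is an illustrative caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for an interned string type that knows its length.
struct Ustring {
    const char* chars;
    std::size_t len;
    const char* c_str() const { return chars; }
    std::size_t length() const { return len; }
};

// Minimal non-owning reference to characters: a pointer plus a length.
class string_ref {
public:
    string_ref() : m_chars(nullptr), m_len(0) {}
    string_ref(const char* chars, std::size_t len) : m_chars(chars), m_len(len) {}
    // Implicit conversions, so callers can pass any of these directly.
    string_ref(const char* chars)  // the only constructor that must scan
        : m_chars(chars), m_len(chars ? std::strlen(chars) : 0) {}
    string_ref(const std::string& s) : m_chars(s.data()), m_len(s.size()) {}
    string_ref(const Ustring& u) : m_chars(u.c_str()), m_len(u.length()) {}

    const char* data() const { return m_chars; }
    std::size_t size() const { return m_len; }
    bool empty() const { return m_len == 0; }
    char operator[](std::size_t i) const { return m_chars[i]; }

    // Sub-range view; no allocation. Clamps n like std::string::substr.
    string_ref substr(std::size_t pos, std::size_t n = std::string::npos) const {
        assert(pos <= m_len);
        return string_ref(m_chars + pos, n < m_len - pos ? n : m_len - pos);
    }
    // Allocation happens only when the caller explicitly asks for ownership.
    std::string str() const {
        return m_chars ? std::string(m_chars, m_len) : std::string();
    }

private:
    const char* m_chars;
    std::size_t m_len;
};

// One signature serves std::string, char*, and Ustring callers alike.
std::size_t count_dots(string_ref s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == '.')
            ++n;
    return n;
}
```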

It's a very slick idea. (Not originally mine.)



On Jan 7, 2014, at 1:56 AM, Gregor Mückl <[email protected]> wrote:

> Just to get a better understanding of this discussion and OIIO in general: 
> why do you place such an emphasis on optimizing string operations in OIIO? 
> Why is decent performance in this area so important?
> 
> Gregor
> 
> On 12/31/2013 12:57 AM, Larry Gritz wrote:
>> Here's my stab at it: https://github.com/OpenImageIO/oiio/pull/772
>> 
>> Feedback appreciated. Now that I've done it, I rather like it.
>> 
>> 
>> On Dec 26, 2013, at 12:35 PM, Larry Gritz <[email protected]> wrote:
>> 
>>> Here's the C++ draft document: 
>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html
>>> It's being considered for C++14.
>>> 
>>> Boost 1.55 has an implementation, but not everybody (including my 
>>> employers) is on such a recent boost, so I don't think we can depend on 
>>> that.  It's a simple class, I don't mind rolling our own, and in fact if we 
>>> do we can also make it a bit more optimized in dealing with ustring.
>>> 
>>> The basic gist is that, say you have a library function foo(x), where x is 
>>> a character sequence. What type should x be?  Let's assume that foo needs 
>>> to read the characters but doesn't wish to assume ownership or modify them 
>>> in place. And also let's assume that the library function may be used by 
>>> apps that like char*, and other apps that like std::string.
>>> 
>>> If x is const char*, then (a) it's technically unsafe because no length is 
>>> passed, so it assumes that the characters are valid and are 0-terminated, 
>>> and also (b) an app that is dealing with std::string needs to call it as 
>>> foo(s.c_str()), which is a little clunky and may rub some people wrong.
>>> 
>>> On the other hand, if x is a const std::string&, that works really nicely for an 
>>> app already using std::string, but if the caller just has a char array, 
>>> it'll end up converting to a std::string to pass (which involves allocating 
>>> memory, copying the chars, then deallocating after the call, very wasteful).
>>> 
>>> So the idea is that you'd declare a non-owning (char* + length) object and 
>>> pass that around with implied conversions.  You can just pass either a 
>>> char* or a std::string, works fine both ways, and in neither case will it 
>>> allocate, copy, or deallocate. For the char* case, if the app knows the 
>>> length of the string already, it further provides safety by passing the 
>>> length without having to scan the string.
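The explicit-length case also covers character data that is not '\0'-terminated at all, such as a fixed-width field in a binary header. In this sketch C++17's std::string_view (the standardized form of the proposal) stands in for string_ref, and the four-byte channel tag and is_rgba() are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical fixed-width field as it might appear in a binary header:
// exactly four bytes, with no terminating '\0'. Handing this to a function
// that takes a bare const char* and calls strlen would read past the array.
const char channel_tag[4] = {'R', 'G', 'B', 'A'};

// Hypothetical reader: it only inspects the characters it was given and
// never looks for a terminator.
bool is_rgba(std::string_view channels) {
    return channels == "RGBA";
}
```

A caller passes the length it already knows with is_rgba(std::string_view(channel_tag, sizeof(channel_tag))), while is_rgba(s) for a std::string s neither scans nor copies.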
>>> 
>>> Also, there are examples of functions that currently return a string where 
>>> the string will necessarily be a contiguous subregion of another existing 
>>> string that was passed as an argument (example: substr). Instead of 
>>> allocating std::strings, it can return string_refs.  The allocation, if 
>>> ever needed, is done on the assignment after return, if you know what I 
>>> mean.
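A sketch of the "allocation happens at the assignment" pattern, again with C++17's std::string_view standing in for string_ref: a hypothetical split() returns views into its input (for example, a comma-separated metadata value), and a std::string is created only when the caller decides to keep a piece.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical split(): each piece is a view into the caller's characters,
// so no per-piece std::string is allocated. The views are valid only while
// the input's characters are alive.
std::vector<std::string_view> split(std::string_view s, char sep) {
    std::vector<std::string_view> pieces;
    std::size_t start = 0;
    while (true) {
        std::size_t end = s.find(sep, start);
        if (end == std::string_view::npos) {
            pieces.push_back(s.substr(start));
            break;
        }
        pieces.push_back(s.substr(start, end - start));
        start = end + 1;
    }
    return pieces;
}
```

A caller that wants to hold on to one piece writes std::string kept(parts[2]); that line is where the copy happens, and only for the piece being kept.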
>>> 
>>> To answer your question, you can have an empty string_ref, yes.
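On the null question: with C++17's std::string_view standing in for string_ref, a default-constructed view has data() == nullptr and size() == 0, so a callee can tell "no string given" apart from an explicitly empty string by checking data(), even though the two compare equal. Constructing a std::string_view from a null const char* is undefined behavior, which is one reason a home-grown string_ref might choose to map a null char* to an empty reference. describe_name() is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical callee that distinguishes "no name supplied" (a null view)
// from an explicitly empty name.
std::string describe_name(std::string_view name) {
    if (name.data() == nullptr)
        return "null";
    if (name.empty())
        return "empty";
    return std::string(name);
}
```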
>>> 
>>> 
>>> 
>>> On Dec 26, 2013, at 3:02 PM, Mark Visser <[email protected]> wrote:
>>> 
>>>> I'm really only familiar with the oiiotool part of the codebase, but I 
>>>> noticed there are a few places that pass const char* instead of 
>>>> std::string because the latter can't represent NULL. Does string_ref allow 
>>>> the null string?
>>>> 
>>>> Is this the same thing? 
>>>> http://www.boost.org/doc/libs/1_55_0/libs/utility/doc/html/string_ref.html
>>>> 
>>>> cheers
>>>> 
>>>> On Dec 26, 2013, at 1:10 PM, Larry Gritz <[email protected]> wrote:
>>>> 
>>>>> Just floating the idea to see what people think.
>>>>> 
>>>>> Many packages have some version of a "string reference" class that is a 
>>>>> non-owning (and non-allocating) reference to character data that 
>>>>> auto-converts to char* or std::string when assigned.  C++14 is drafting 
>>>>> it into its standard library.
>>>>> 
>>>>> I've been toying with the idea of doing an implementation of this in 
>>>>> OIIO, and changing many of the utility and other functions that take 
>>>>> strings -- which are currently a bit of a mish-mash of std::string, 
>>>>> char*, and ustring -- to accepting a string_ref, then internally it can 
>>>>> do what it wishes. For many uses that just need to look at the 
>>>>> characters, this can eliminate many transitory std::string allocations 
>>>>> and copies.  Since we can't count on C++14 (hah, we can't even count on 
>>>>> C++11), I'd have to write my own (it's very simple) and so it could also 
>>>>> be aware of ustring.
>>>>> 
>>>>> Does anybody have strong opinions?  Including:
>>>>> 
>>>>> 1. Don't change anything ever, we hate string_ref.
>>>>> 
>>>>> 2. A nice idea to use it when it's available from C++14, but wait until 
>>>>> then.
>>>>> 
>>>>> 3. Yes, do it and convert anyplace where it's appropriate. Making it 
>>>>> ustring-aware is great, even if it means deviating from what C++14 will 
>>>>> eventually offer as std::string_ref.
>>>>> 
>>>>> 4. Do it, but only if it's exactly like what C++14 std::string_ref will 
>>>>> end up being.
>>>>> 
>>>>> 5. Something else.
>>>>> 
>>>>> If you don't care one way or the other, as long as it won't really affect 
>>>>> what the source code looks like on the app side, you don't need to chime 
>>>>> in, that's the default answer.
>>>>> 
>>>>> Similarly, there are a lot of places where we pass arrays as just a T*.  
>>>>> It sometimes bugs me how unsafe this is, there are always assumptions 
>>>>> about how many T's it points to.  We could make an array_ref<T> template, 
>>>>> which is just a bundling of the pointer and a length, and use that in 
>>>>> many places instead of raw pointers. There's no allocation issue here, 
>>>>> but forcing the callers to pass an explicit length (and have the called 
>>>>> function given an explicit length) could be useful in improving code 
>>>>> safety.
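A minimal sketch of the array_ref<T> idea, assuming C++11. The class and the scale() caller are hypothetical (C++20 later standardized a similar type as std::span). The pointer and the length travel together, the implicit constructors let callers pass a std::vector or a C array directly, and operator[] asserts the bound in debug builds. A fuller version would also let array_ref<const T> bind to const containers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view of a contiguous array of T: just a pointer
// and an element count, so the length always travels with the data.
template <typename T>
class array_ref {
public:
    array_ref() : m_data(nullptr), m_size(0) {}
    array_ref(T* ptr, std::size_t count) : m_data(ptr), m_size(count) {}
    // Implicit conversions, so callers can pass common containers directly.
    template <std::size_t N>
    array_ref(T (&arr)[N]) : m_data(arr), m_size(N) {}
    array_ref(std::vector<T>& v) : m_data(v.data()), m_size(v.size()) {}

    std::size_t size() const { return m_size; }
    T* data() const { return m_data; }
    T& operator[](std::size_t i) const {
        assert(i < m_size && "array_ref index out of range");
        return m_data[i];
    }
    T* begin() const { return m_data; }
    T* end() const { return m_data + m_size; }

private:
    T* m_data;
    std::size_t m_size;
};

// Hypothetical caller: the element count arrives with the pointer, so the
// loop cannot run past the end of the caller's array.
void scale(array_ref<float> pixels, float k) {
    for (float& p : pixels)
        p *= k;
}
```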
>>>>> 
>>>>> --
>>>>> Larry Gritz
>>>>> [email protected]
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Oiio-dev mailing list
>>>>> [email protected]
>>>>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
>>>> 
>>> 
>>> --
>>> Larry Gritz
>>> [email protected]
>>> 
>>> 
>>> 
>> 
>> --
>> Larry Gritz
>> [email protected]
>> 
>> 
>> 
>> 
> 

--
Larry Gritz
[email protected]


