> On 09 Mar 2018, at 20:10, Junio C Hamano <gits...@pobox.com> wrote:
> 
> lars.schnei...@autodesk.com writes:
> 
>> +static const char *default_encoding = "UTF-8";
>> +
>> ...
>> +static const char *git_path_check_encoding(struct attr_check_item *check)
>> +{
>> +    const char *value = check->value;
>> +
>> +    if (ATTR_UNSET(value) || !strlen(value))
>> +            return NULL;
>> +
>> +    if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
>> +            error(_("working-tree-encoding attribute requires a value"));
>> +            return NULL;
>> +    }
> 
> Hmph, so we decide to be loud but otherwise ignore an undefined
> configuration?  Shouldn't we rather die instead to avoid touching
> the user data in unexpected ways?

OK.


>> +
>> +    /* Don't encode to the default encoding */
>> +    if (!strcasecmp(value, default_encoding))
>> +            return NULL;
> 
> Is this an optimization to avoid "recode one encoding to the same
> encoding" no-op overhead?

Correct.

>  We already have the optimization in the
> same spirit in may existing codepaths that has nothing to do with
> w-t-e, and I think we should share the code.  Two pieces of thought
> comes to mind.
> 
> One is a lot smaller in scale: Is same_encoding() sufficient for
> this callsite instead of strcasecmp()?

Yes!


> The other one is a lot bigger: Looking at all the existing callers
> of same_encoding() that call reencode_string() when it returns false,
> would it make sense to drop same_encoding() and move the optimization
> to reencode_string() instead?
> 
> I suspect that the answer to the smaller one is "yes, and even if
> not, it should be easy to enhance/extend same_encoding() to make it
> do what we want it to, and such a change will benefit even existing
> callers."  The answer to the larger one is likely "the optimization
> is not about skipping only reencode_string() call but other things
> are subtly different among callers of same_encoding(), so such a
> refactoring would not be all that useful."

I agree. reencode_string() would need to signal 3 cases:
1. reencode performed
2. reencode not necessary
3. reencode failed

We could model "reencode not necessary" as "char *in == char *return".
However, I think this should be tackled in a separate series.

Thanks
Lars

Reply via email to