20: eina_tmpstr: add eina_tmpstr_strftime

Tom Hacohen Mon, 28 Sep 2015 07:07:52 -0700

On 28/09/15 09:53, Tom Hacohen wrote:
> On 24/09/15 18:53, Cedric BAIL wrote:
>> On Thu, Sep 24, 2015 at 10:20 AM, Tom Hacohen <[email protected]> wrote:
>>> On 23/09/15 19:56, Cedric BAIL wrote:
>>>> On Wed, Sep 23, 2015 at 1:18 AM, Tom Hacohen <[email protected]> wrote:
>>>>> On 22/09/15 18:31, Cedric BAIL wrote:
>>>>>> Le 22 sept. 2015 09:40, "Tom Hacohen" <[email protected]> a écrit :
>>>>>>>
>>>>>>> On 22/09/15 17:32, Cedric BAIL wrote:
>>>>>>>> Le 22 sept. 2015 02:30, "Daniel Kolesa" <[email protected]> a écrit 
>>>>>>>> :
>>>>>>>>>
>>>>>>>>> On Mon, Sep 21, 2015 at 11:24 PM, Shilpa Singh <
>>>>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>>> cedric pushed a commit to branch master.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>> http://git.enlightenment.org/core/efl.git/commit/?id=abaf29cb768375957c9ee0b64d36034c21c618ea
>>>>>>>>>>
>>>>>>>>>> commit abaf29cb768375957c9ee0b64d36034c21c618ea
>>>>>>>>>> Author: Shilpa Singh <[email protected]>
>>>>>>>>>> Date:   Mon Sep 21 23:48:16 2015 +0200
>>>>>>>>>>
>>>>>>>>>>          eina_tmpstr: add eina_tmpstr_strftime
>>>>>>>>>
>>>>>>>>> This API seems awfully arbitrary and I don't really see the point.
>>>>>>>>> Might as well be any other function that prints to strings - are you
>>>>>>>>> gonna add these too? Sounds horrible to me.
>>>>>>>>
>>>>>>>> Is it better to have our code cluttered by static buffer and no 
>>>>>>>> overflow
>>>>>>>> check ? Obviously not. Yes,I expect we refactor all our buffer print
>>>>>> code
>>>>>>>> as it is pretty bad right now and may even lead to security issue in
>>>>>> some
>>>>>>>> case.
>>>>>>>
>>>>>>> Just to chime in, I think this doomsday scenario is a bit exaggerated.
>>>>>>> I think having code like:
>>>>>>>
>>>>>>> char buf[SOME_LEN];
>>>>>>> strftime(...buf...);
>>>>>>> return tmpstr_add(buf);
>>>>>>>
>>>>>>> Is a much cleaner solution than adding all of the string functions of
>>>>>>> the world to tmpstr, strbuf, stringshare and etc.
>>>>>>>
>>>>>>> I really don't like code duplication, and I think that's where Daniel is
>>>>>>> coming from.
>>>>>>>
>>>>>>> What do you think?
>>>>>>
>>>>>> It's a good example that won't always work and increase at best the stack
>>>>>> size requirement for absolutely no good reason.
>>>>>
>>>>> Why wouldn't it always work?
>>>>> Also if you enclose it in its own sub {}, every compiler will treat the
>>>>> stack sanely, and even without it, I'd assume optimising compilers will.
>>>>
>>>> The previous implementation used in Elementary was not working with
>>>> some local where it used way more charactere than expected. Work
>>>> around solution is to be over generous on memory...
>>>>
>>>> But compiler have no way to optimize it at all. How could they know
>>>> before jumping to a function call, that is going to generate data at
>>>> runtime, how much that function is going to use to limit the size of
>>>> the buffer on the stack they give to it. No compiler can optimize
>>>> that. It's just impossible. Every time we put a 4K buffer on the stack
>>>> all further function call will force that page allocation, that's how
>>>> things are.
>>>
>>> As implied by my comment about the own sub {}, I was talking about
>>> optimising the over-stack-usage.
>>>
>>> char buf[SOMELEN];
>>> strftime(buf);
>>> ret = tmpstr_add(buf);
>>> // buf is not used anymore, stack can be cleared.
>>>
>>> That's what I meant. It's true though that compilers can't assume we
>>> haven't kept the buf pointer and used it somewhere else (which we are
>>> allowed to until the function ended). Though that's not what you were
>>> talking about.
>>
>> That is another case that a compiler can not optimize. It has no clue
>> that buf is not stored somewhere globally, or that ret doesn't point
>> to somewhere in it. That's why stack size vary depending on the block
>> you are in. Also stack never shrink, so once you get those 4K on it,
>> they will stay there.
>
> As I said, this case is unlikely, but the other case I mentioned does
> allow the compiler to optimise:
>
> int foo(void)
> {
>      const char *bla;
>      {
>         char buf[LONGBUF];
>         strftime(...);
>         bla = eina_tmpstr_add;
>      }
>      // rest of code
>      printf("%s\n", bla);
>      return 0;
> }
>
> this can even be simplified into a macro that accepts a parameter saying
> which function to use to duplicate the string. strdup, eina_tmpstr_add,
> eina_stringshare_add or whatever.
>
>>
>>> However, looking at the tmpstr code, the solution there is no better. It
>>> reallocs all the time and implements a whole mechanism there. Wouldn't
>>> it be much better to create a eina_strftime_alloc (bad name) that
>>> returns an allocated string of the right size and that can be used in
>>> strbuf (strbuf_string_steal), tmpstr (add a string steal) and etc?
>>> That's writing the function once, and having one API for all of eina,
>>> instead of the proposed method which will add plenty and duplicate all
>>> around.
>>
>> strftime doesn't tell you the size of the string you need. It only
>> tells you if it failed to put the generated string inside the buffer
>> you gave it. There is no way around doing a realloc loop with that
>> API.
>>
>> As for writing the function once, it is the case right now. If I was
>> to add that function to strbuf or stringshare, then I would obviously
>> find a way to share the code. If not by just calling the tmpstr
>> function. In general for Eina, we do not add a function until we have
>> at least 2 users in our code base. So for now, I do not plan to add
>> strftime to any other part of Eina, as we do not have the need for it.
>
> You are adding eina_tmpstr_add() without any users in the code base...
> Anyhow, I think it would be better to add API in a common place, as we
> know it'll be used more, than to add a specific API and then duplicating
> it to a common place.
>
>>
>>>>>> I see no reason to have bad code like that just for fewer line of code.
>>>>>
>>>>> It's not a fewer lines of code, it's fewer API and a lot less lines of
>>>>> code. Also, adding such API implies to developers that they should
>>>>> expect such API, so you are just feeding the evil loop.
>>>>
>>>> And they should use it, as the way they usually manually do it lead to
>>>> bug. This is actually reducing our code base and also reduce our
>>>> number of actual bugs.
>>>
>>> Have a general purpose function that does that (as above), no manual
>>> work. Still reduces the codebase, much cleaner.
>>
>> eina_strftime is I guess what you are trying to say here. Basically
>> not making it a tmpstr and just use free to get rid of it. I guess
>> that's an acceptable solution.
>
> Yes, it's exactly what I'm saying. Let's revert this patch and work on
> an eina_strftime instead.
>
>>
>>>>> Btw, I was thinking about the whole tmpstr solution (which you know I am
>>>>> not a fan off) last night, and I came up with two solutions I think are
>>>>> better, and anyhow, are massively more efficient.
>>>>> The first involves creating new API (but it's still less API than
>>>>> tmpstr) for genlist and etc that accept a struct instead of a string.
>>>>> This way we can add more info that will hint who should free it.
>>>>> This is the more involved, clean solution, that we should maybe adopt
>>>>> for EFL 2.0.
>>>>
>>>> So your solution is to pass a struct where there will be one byte
>>>> dedicated to say if that string need to be freed ? I think that's far
>>>> from elegant and fully error prone with no way for the compiler to
>>>> help on that.
>>>
>>> Maybe we'll have more metadata in the future. The struct will be created
>>> automatically with an helper function, so not error prone at all.
>>
>> I will have to see it, but I have yet to see what would all those
>> metadata be. For the moment, I don't see how this could be an
>> improvement over current code base.
>
> I don't think eina_tmpstr is a good solution that will scale nicely,
> these are just alternative ideas.
>
>>
>>>>> The easy hacky solution, which is a replacement to tmpstr (which is also
>>>>> a hack) and is much simpler and more efficient, is to use this scheme:
>>>>> For static strings you just return them:
>>>>> return "foo";
>>>>> for dynamic you return this:
>>>>> return eina_tomstr_add("bar");
>>>>>
>>>>> You can free both with eina_tomstr_free(buf);
>>>>>
>>>>> The implementation is where the magic is at, in memory, it looks like 
>>>>> this:
>>>>> "\x02bar", the first char being the STX (start of text) char. We just
>>>>> use eina_tomstr_str_get(buf) whenever we want to use it. It just returns
>>>>> "buf + 1" if the first char is STX. It's infinitely faster and more
>>>>> memory efficient than tmpstr, I think also much cleaner. The only bad
>>>>> thing is that you need to use the str function, but I think that's a
>>>>> reasonable compromise.
>>>>
>>>> That would obviously lead to random bug. If you free a buf that has
>>>> not been allocated by your tomstr, it will lead to an off by one
>>>> access, if you are unlucky you will fall on a non allocated page. If
>>>> you are really unlucky it will fall on an existing data that does have
>>>> the expected value and call free on a static buffer. Also it won't be
>>>> that much faster or slower as the current implementation walk a short
>>>> list that should definitively be only a few elements long.
>>>
>>> How so? As I said, genlist gets "\x02bar", so genlist can safely check
>>> the *first* (not -1) character of the string. It can then pass the
>>> result of string_get() to where it needs actual text, but it keeps the
>>> real reference for future processing. I don't see why or how we'll have
>>> text that starts with "\0x02". If we want to make it even more rare, we
>>> can use two bytes. Still much more efficient than tmpstr. Why would the
>>> the current implementation be a short list? You want this to be used all
>>> around with dynamic strings, this list will grow as usage will grow.
>>
>> Oh, so you never let the API user know that it is indeed a string and
>> allways make it a hidden structure. That would make the code base
>> quite hugly with all those str_get arround for little benefit. Also
>> having the risk to have a confused API with random behavior sounds bad
>> to me. If you want to hide information in a less problematic way, you
>> can hide easily one bit in the pointer itself. As your API wouldn't
>> autorize the use of the string directly anyway, having the least
>> significant bit used to say if the pointer is to be freed or not will
>> be clearly less error prone. Still it's hugly as hell and I think
>> should be avoided in Efl code base (That's not the first time we have
>> ruled against using that trick in Eina code base, and I am still
>> against this kind of trick).
>
> I think the proposed mechanism is much safer than encoding in the
> pointer. Encoding in the pointer will result in bad memory access and
> thus a crash. My suggestion can cause some issues with comparison,
> that's it. You can create mechanisms that make it completely error free.
> For example, you could save it internally (when you do) as a type that
> can't be cast to a string, and then you'd have to use the functions
> otherwise it wouldn't compile. This is just a clean way to maintain the
> string ABI but still making it safe internally.
>
>>
>> As for why it should stay a short list it is due to what tmpstr is
>> used for. Code pattern for it is of the form :
>>
>> tmp = function_returning_tmpstr();
>> function_call(tmp);
>> tmpstr_del(tmp);
>>
>> So either we have a massive recursion (which is wrong and need to be
>> fixed), or we are leaking tmpstr. tmpstr is not intended to be used
>> for outside of a block and should not be in a structure for example.
>> As I said, tmpstr alive at any point in time should be pretty low and
>> walking that said list never be a performance issue.
>>
>
> True. If we only use it in genlist (and similar), this is the usage
> pattern and it won't be the end of the world. However, as I said, I
> proposed above a solution I find more clean and more general purpose.
> Can be used in many places and saved for the long term. This is a bit
> diverging from the purpose of this thread though (my bad). It's more
> important that we deal with the issue at hand before we freeze.
>
>
>
>
> Conclusion: let's add eina_strftime and eina_tmpstr_manage_new and
> revert this before the freeze.


Btw, the rewrite should also fix the warnings.

src/lib/elm_calendar.c:175:11: warning: return discards ‘const’ 
qualifier from pointer target type [-Wdiscarded-qualifiers]
     return eina_tmpstr_strftime(E_("%B %Y"), selected_time);
            ^
--
Tom.


------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

Re: [E-devel] [EGIT] [core/efl] master 17/20: eina_tmpstr: add eina_tmpstr_strftime

Reply via email to