Re: [Kicad-developers] 6.0 string proposal

Dick Hollenbeck Fri, 03 May 2019 08:24:23 -0700

Could that damage have already been done by prior concurrent access?  Maybe a 
subsequent
read only operation fails because your linked list is already hay wired...


Walking a linked list at a later point in time than when it was damaged can do 
this..



On 5/3/19 10:18 AM, Jeff Young wrote:
> I don’t believe that’s the case.  Neither of the two crashes that I tracked 
> down involved
> direct index access (or any change to the string).  One was calling 
> foo.IsEmpty(), and the
> other foo.Length().  Both use const iterators under the hood.
> 
> When wxUSE_UNICODE_UTF8 is off, wxWidgets gets around parsing the string by 
> just not doing it:
> 
> Remarks
>     Note that while the behaviour of wxString
>     <https://docs.wxwidgets.org/trunk/classwx_string.html> when 
> |wxUSE_UNICODE_WCHAR==1| resembles
>     UCS-2 encoding, it's not completely correct to refer to wxString
>     <https://docs.wxwidgets.org/trunk/classwx_string.html> as UCS-2 encoded 
> since you can
>     encode code points outside the /BMP/ in a wxString
>     <https://docs.wxwidgets.org/trunk/classwx_string.html> as two code units 
> (i.e. as a
>     surrogate pair; as already mentioned however wxString
>     <https://docs.wxwidgets.org/trunk/classwx_string.html> will "see" them as 
> two
>     different code points)
> 
> 
>> On 3 May 2019, at 16:06, John Beard <john.j.be...@gmail.com
>> <mailto:john.j.be...@gmail.com>> wrote:
>>
>> Hi Jeff,
>>
>> I think it is the index access operator that performs this caching, to allow 
>> you to
>> access the n'th code point any number of times while only iterating the 
>> string only once.
>>
>> However, you can still use the iterator access safely. It is only index 
>> based access
>> that is cached and thread-unsafe.
>>
>> This is what the wxString documention recommends. Furthermore, in any 
>> Unicode string,
>> regardless of encoding (8, 16, 32), index access is almost entirely useless 
>> anyway, as
>> code units/points are only indirectly related to glyphs and/or perceived 
>> characters
>> anyway. If you need to parse a Unicode string, you must iterate from the 
>> start. There is
>> no way around it.
>>
>> If we're crashing due to cross thread access by index, the bug is probably 
>> that we
>> access the string by index at all. If this was accessed by iterator, cross 
>> thread, and
>> the string is not changed, it's fine. If the string is changed in another 
>> thread, cached
>> iterators are invalid (same as if you change an C++ container in a single 
>> thread. The
>> standard tells you what iterators are invalidated for each operation on a 
>> container).
>>
>> I may have got the wrong end of the wxStick here (I can't check it for 
>> myself right
>> now), but as far as I can tell, this is fixable by just never caching 
>> indices, as if we
>> were looking at a C-style char array, and using iterators instead.
>>
>> We should probably also turn off the unsafe string conversions by defining
>> wxNO_UNSAFE_WXSTRING_CONV, if it is not already define.
>>
>> Cheers,
>>
>> John
>>
>> On 3 May 2019 16:35:30 CEST, Jeff Young <j...@rokeby.ie 
>> <mailto:j...@rokeby.ie>> wrote:
>>
>>     Yes, we know exactly why it crashes: in order to speed up iterator 
>> access each
>>     iterator keeps a pointer into the last location accessed (so that i-1 
>> and i+1 can be
>>     fast).  These pointers are kept in a linked-list.  Adding and removing 
>> pointers from
>>     this list is not thread-protected.
>>
>>     Note that wxWidgets will add/remove a pointer even for something 
>> seemingly innocuous
>>     like an Empty() check.  So doing mutex locks on our side for non-const 
>> iterator
>>     access is not sufficient.
>>
>>     The worst part is that since two threads collide on the same string only 
>> rarely, we
>>     don’t even know how many of these bugs we have.  We’ve fixed 3 or 4 of 
>> them (by
>>     adding our own mutex checking on any access), but are there 0 or 10 
>> more?  Haven’t a
>>     clue.
>>
>>>>     It is between sad and breath taking.
>>
>>     Indeed.
>>
>>     Cheers,
>>     Jeff.
>>
>>>     On 3 May 2019, at 15:16, Dick Hollenbeck <d...@softplc.com
>>>     <mailto:d...@softplc.com>> wrote:
>>>
>>>     Thanks Jeff.
>>>
>>>     On 5/3/19 4:22 AM, Jeff Young wrote:
>>>>     Hi Dick,
>>>>
>>>>>>     h) What is the list of deficiencies with current string usage?
>>>>
>>>>     I only have one issue with the current use of wxString, but it’s a big 
>>>> one: it crashes
>>>>     (unpredictably) when used multi-threaded in UTF8 mode.
>>>
>>>     The fact that it is onely *One* issue is an important data point.
>>>
>>>     Since you know it is crashing in this class, you must know 
>>> approximately where, and
>>>     under
>>>     what kind of read/write activity.  Of course, if read activity triggers 
>>> a lazy
>>>     (deferred)
>>>     transformation, then this distinction can get blurred.  But more 
>>> information on source
>>>     file locations would be very helpful to me.
>>>
>>>     Another important data point you brought is that the wx library 
>>> designers are advising
>>>     against using wxString for core application.  It will take a couple of 
>>> hours to even
>>>     contemplate that, it is basically staggering to me.  It is between sad 
>>> and breath
>>>     taking.
>>>     Sounds like they designed themselves into a corner and are now 
>>> acknowledging that what
>>>     they designed is more of an API commitment that they want to disavow 
>>> than a real
>>>     solution.
>>>
>>>     I can see where that can happen.  Superior designs come from experience.
>>>      Experience comes
>>>     with usage and time, neither of which are always available up front.
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>     This design document makes for fascinating
>>>>     reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It 
>>>> appears that the
>>>>     current wxString is at least in part modelled on QtString.
>>>>
>>>>     There’s also a bunch of interesting info
>>>>     here: https://docs.wxwidgets.org/trunk/overview_string.html, which I 
>>>> believe is more
>>>>     up-to-date than the previous link.  In particular, there’s the mention 
>>>> that wxString
>>>>     handles extra-BMP characters transparently when compiled in UTF8 mode 
>>>> (currently
>>>>     used by
>>>>     Kicad), but does NOT when compiled in default mode (in which case the 
>>>> app must handle
>>>>     surrogate pairs).  This of course directly leads to your point (d):
>>>>
>>>>>>>     d) What does the set of characters that don't fall into UCS2 
>>>>>>> actually look
>>>>>>>     like?  How big
>>>>>>>     is this set, really?  (UTF16 is bigger than UCS2 and picks up the 
>>>>>>> difference.)
>>>>
>>>>     Do we really need to handle extra-BMP characters?
>>>>
>>>>     An even more recent version of the second document
>>>>     (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes 
>>>> an oblique
>>>>     reference
>>>>     to the multi-threading issue by starting with this (rather unhelpful) 
>>>> suggestion:
>>>>
>>>>     Note
>>>>        While the use of wxString 
>>>> <https://docs.wxwidgets.org/trunk/classwx_string.html> is
>>>>        unavoidable in wxWidgets program, you are encouraged to use the 
>>>> standard string
>>>>        classes |std::string| or |std::wstring| in your applications and 
>>>> convert them
>>>>     to and
>>>>        from wxString 
>>>> <https://docs.wxwidgets.org/trunk/classwx_string.html> only when
>>>>        interacting with wxWidgets.
>>>>
>>>>
>>>>     Cheers,
>>>>     Jeff.
>>>>
>>>>
>>>>>     On 3 May 2019, at 02:03, Dick Hollenbeck <d...@softplc.com
>>>>>     <mailto:d...@softplc.com> <mailto:d...@softplc.com>> wrote:
>>>>>
>>>>>     On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>>>>     On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>>>>     We had talked earlier about throwing the wxWidgets UTF8 compile 
>>>>>>> switch to get
>>>>>>>     rid of
>>>>>>>     our wxString re-entrancy problems.  However, I noticed that the 6.0 
>>>>>>> work
>>>>>>>     packages doc
>>>>>>>     includes an item for std::string-ization of the BOARD.  (While a 
>>>>>>> lot more work,
>>>>>>>     this
>>>>>>>     is a better solution because it also increases our 
>>>>>>> gui-toolkit-choice flexibility.)
>>>>>>>
>>>>>>>     I’d like to propose that we use std::wstring for that.  UTF8 should 
>>>>>>> *only* be an
>>>>>>>     encoding format (similar to s-expr).  It should never be used 
>>>>>>> internally.
>>>>>>>     That’s what
>>>>>>>     unicode wchar_t’s are for.
>>>>>>>
>>>>>>>     And I’d like to propose that we extend std::wstring-ization to 
>>>>>>> SCH_ITEM and
>>>>>>>     LIB_ITEM.
>>>>>>>      (Then we can get rid of a bunch of our ugly mutex hacks.)
>>>>>>
>>>>>>
>>>>>>     I've been looking at this for a few months now.  I think it is so 
>>>>>> important, that a
>>>>>>     sub-committee should be formed, and if that committee takes as long 
>>>>>> as 4 months
>>>>>>     to come to
>>>>>>     a recommendation, this would not be too long.  This issue is simply 
>>>>>> too critical.
>>>>>>
>>>>>>     I would like to volunteer to be on that committee.  For the entire 
>>>>>> list to
>>>>>>     participate in
>>>>>>     this simply does not make sense to me.  I would welcome the 
>>>>>> opportunity to study
>>>>>>     this with
>>>>>>     a team of 5-6 players.  More than that probably leads to anxiety.  
>>>>>> Then, given the
>>>>>>     recommendations, the list would of course have an opportunity to 
>>>>>> raise questions
>>>>>>     and take
>>>>>>     shots, before a strategy is formulated, and before anything is 
>>>>>> implemented.
>>>>>>
>>>>>>     Again, approximately:
>>>>>>
>>>>>>      committee recommendations -> list approval -> strategy formulation 
>>>>>> ->
>>>>>>     implementation
>>>>>>
>>>>>>
>>>>>>     Up to now I have looked at many libraries and have [way *too* much] 
>>>>>> experience
>>>>>>     in multiple
>>>>>>     languages on multiple platforms, so I think I can be valuable 
>>>>>> contributor.
>>>>>>
>>>>>>     The final work product initially would simply be a list of 
>>>>>> recommendations, that
>>>>>>     quickly
>>>>>>     transforms to a strategy thereafter.  This is an enormous 
>>>>>> undertaking, so I suggest
>>>>>>     against racing to a solution.  It could look a lot easier than it 
>>>>>> will
>>>>>>     ultimately be, as
>>>>>>     is typical in software development.  But the return on investment 
>>>>>> needs to be
>>>>>>     near optimal
>>>>>>     in the end.
>>>>>>
>>>>>>     Some questions to answer are:
>>>>>>
>>>>>>     a) How did wxString get to its current state?  Is is merely a 
>>>>>> conglomeration of
>>>>>>     after
>>>>>>     thought, or is is anywhere near optimal.
>>>>>>
>>>>>>     b) Why so many forms of it?  Can one form be chosen for all 
>>>>>> platforms?
>>>>>>
>>>>>>     c) How does wxString it compare to QtString?
>>>>>>
>>>>>>     d) What does the set of characters that don't fall into UCS2 
>>>>>> actually look like?
>>>>>>      How big
>>>>>>     is this set, really?  (UTF16 is bigger than UCS2 and picks up the 
>>>>>> difference.)
>>>>>>
>>>>>>     e) For data files, I think UTF8 is fine.  So the change is for RAM 
>>>>>> manipulation of
>>>>>>     strings.  Aren't we talking about a RAM resident string that bridges 
>>>>>> into the GUI
>>>>>>     seamlessly?
>>>>>>
>>>>>>     f) What does new C++ language support offer?
>>>>>>
>>>>>>     g) What do C++ language designers suggest?
>>>>>
>>>>>     h) What is the list of deficiencies with current string usage?
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>     etc.
>>>>>>
>>>>>>     But this is best continued in a smaller group, as said.
>>>>>>
>>>>>>
>>>>>>     The other thing that I bring to this is vast familiarity with 
>>>>>> KiCad's internal
>>>>>>     workings,
>>>>>>     string use cases, and goals.
>>>>>>
>>>>>>     Let me know if I can help.
>>>>>>
>>>>>>     Regards,
>>>>>>
>>>>>>     Dick
>>>>>>
>>>>>>
>>>>>>     _______________________________________________
>>>>>>     Mailing list: https://launchpad.net/~kicad-developers
>>>>>>     Post to     : kicad-developers@lists.launchpad.net
>>>>>>     <mailto:kicad-developers@lists.launchpad.net>
>>>>>>     <mailto:kicad-developers@lists.launchpad.net>
>>>>>>     Unsubscribe : https://launchpad.net/~kicad-developers
>>>>>>     More help   : https://help.launchpad.net/ListHelp
>>>>>>
>>>>>
>>>>>
>>>>>     _______________________________________________
>>>>>     Mailing list: https://launchpad.net/~kicad-developers
>>>>>     Post to     : kicad-developers@lists.launchpad.net
>>>>>     <mailto:kicad-developers@lists.launchpad.net>
>>>>>     <mailto:kicad-developers@lists.launchpad.net>
>>>>>     Unsubscribe : https://launchpad.net/~kicad-developers
>>>>>     More help   : https://help.launchpad.net/ListHelp
>>
> 


_______________________________________________
Mailing list: https://launchpad.net/~kicad-developers
Post to     : kicad-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kicad-developers
More help   : https://help.launchpad.net/ListHelp

Re: [Kicad-developers] 6.0 string proposal

Reply via email to