Re: [Kicad-developers] 6.0 string proposal

Dick Hollenbeck Fri, 03 May 2019 08:15:11 -0700

John, I got this too from reading the class documentation an hour ago.

To smoke these out, a person could comment out the undesirable calls in a wx header,
perhaps one that was temporarily moved into a place at a higher priority in the INCLUDE
file search space.


Then "make -i"

perhaps on a subset of multi-threaded source files (object file make targets), or the
whole shebang for maximum pain.
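
For illustration only, here is the same trick on a toy class rather than on a real wx
header. AuditedString, VoidT and HasIndexAccess are made-up names, and nothing here is
wxString's actual interface. Deleting the index operator plays the role of commenting
the call out of the header: every remaining call site turns into a compile error, while
iterator access keeps working.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a string class; this is NOT wxString.
class AuditedString
{
public:
    explicit AuditedString( std::string aText ) : m_text( std::move( aText ) ) {}

    // Iterator access stays available.
    std::string::const_iterator begin() const { return m_text.begin(); }
    std::string::const_iterator end() const { return m_text.end(); }

    // Index access is "commented out": any call site now fails to compile.
    char operator[]( std::size_t ) const = delete;

private:
    std::string m_text;
};

// Small detection helper: does indexing a const T compile?
template <typename...> struct MakeVoid { using type = void; };
template <typename... Ts> using VoidT = typename MakeVoid<Ts...>::type;

template <typename T, typename = void>
struct HasIndexAccess : std::false_type {};

template <typename T>
struct HasIndexAccess<T, VoidT<decltype( std::declval<const T&>()[std::size_t{ 0 }] )>>
        : std::true_type {};

static_assert( HasIndexAccess<std::string>::value, "std::string can be indexed" );
static_assert( !HasIndexAccess<AuditedString>::value, "AuditedString cannot be indexed" );

// Counts characters through the iterator interface only.
inline std::size_t CountChars( const AuditedString& aString )
{
    std::size_t count = 0;

    for( char c : aString )
    {
        (void) c;
        ++count;
    }

    return count;
}
```

With a real header the compiler output under "make -i" would then be the list of
places to look at.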



On 5/3/19 10:06 AM, John Beard wrote:
> Hi Jeff,
> 
> I think it is the index access operator that performs this caching, to allow you to access
> the n'th code point any number of times while iterating the string only once.
> 
> However, you can still use the iterator access safely. It is only index-based access that
> is cached and thread-unsafe.
> 
> This is what the wxString documentation recommends. Furthermore, in any Unicode string,
> regardless of encoding (8, 16, 32), index access is almost entirely useless, as
> code units/points are only indirectly related to glyphs and/or perceived characters.
> If you need to parse a Unicode string, you must iterate from the start. There is
> no way around it.
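
A minimal sketch of that point on a plain std::string holding UTF-8, assuming
well-formed input. The helper names IsContinuation, CountCodePoints and ByteOffsetOf are
made up for this example. The byte index and the code point index drift apart as soon
as one non-ASCII character appears, so finding the n'th code point means walking from
the start.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
inline bool IsContinuation( unsigned char aByte )
{
    return ( aByte & 0xC0 ) == 0x80;
}

// Number of code points in a UTF-8 string; assumes well-formed input.
inline std::size_t CountCodePoints( const std::string& aUtf8 )
{
    std::size_t count = 0;

    for( unsigned char byte : aUtf8 )
    {
        if( !IsContinuation( byte ) )
            ++count;
    }

    return count;
}

// Byte offset of the n'th code point (0-based), found by walking from the start.
// Returns aUtf8.size() if the string holds fewer than aIndex + 1 code points.
inline std::size_t ByteOffsetOf( const std::string& aUtf8, std::size_t aIndex )
{
    std::size_t seen = 0;

    for( std::size_t i = 0; i < aUtf8.size(); ++i )
    {
        if( IsContinuation( static_cast<unsigned char>( aUtf8[i] ) ) )
            continue;

        if( seen == aIndex )
            return i;

        ++seen;
    }

    return aUtf8.size();
}
```

For a micro sign followed by "F" the string is three bytes long but holds two code
points, and the "F" sits at byte offset 2, not 1.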
> 
> If we're crashing due to cross thread access by index, the bug is probably that we access
> the string by index at all. If this was accessed by iterator, cross thread, and the string
> is not changed, it's fine. If the string is changed in another thread, cached iterators
> are invalid (same as if you change a C++ container in a single thread: the standard tells
> you what iterators are invalidated for each operation on a container).
> 
> I may have got the wrong end of the wxStick here (I can't check it for myself right now),
> but as far as I can tell, this is fixable by just never caching indices, as if we were
> looking at a C-style char array, and using iterators instead.
> 
> We should probably also turn off the unsafe string conversions by defining
> wxNO_UNSAFE_WXSTRING_CONV, if it is not already defined.
> 
> Cheers,
> 
> John
> 
> On 3 May 2019 16:35:30 CEST, Jeff Young <j...@rokeby.ie> wrote:
> 
>     Yes, we know exactly why it crashes: in order to speed up iterator access each
>     iterator keeps a pointer into the last location accessed (so that i-1 and i+1 can be
>     fast).  These pointers are kept in a linked list.  Adding and removing pointers from
>     this list is not thread-protected.
> 
>     Note that wxWidgets will add/remove a pointer even for something seemingly innocuous
>     like an Empty() check.  So doing mutex locks on our side for non-const iterator access
>     is not sufficient.
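
A toy model of the mechanism as described here, not wxString's actual code. The class
CachingString and its members are invented for the sketch. It shows why locking only
around non-const access cannot help: the bookkeeping list is mutable and gets written
inside a const call, so two threads that "only read" still both write to it. In the toy
the lock lives inside the class, which is the only place that covers every caller.

```cpp
#include <cstddef>
#include <list>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical class; a model of "const access that mutates shared state".
class CachingString
{
public:
    explicit CachingString( std::string aText ) : m_text( std::move( aText ) ) {}

    // Looks read-only to the caller, but registers and unregisters a cache node.
    bool IsEmpty() const
    {
        // Without this lock, concurrent callers would race on m_cache.
        std::lock_guard<std::mutex> lock( m_cacheMutex );

        m_cache.push_back( 0 );     // register a cached position
        ++m_cacheWrites;

        const bool empty = m_text.empty();

        m_cache.pop_back();         // unregister it again
        ++m_cacheWrites;

        return empty;
    }

    // How many times the "read-only" calls have written to the cache list.
    std::size_t CacheWrites() const
    {
        std::lock_guard<std::mutex> lock( m_cacheMutex );
        return m_cacheWrites;
    }

private:
    std::string                    m_text;
    mutable std::list<std::size_t> m_cache;
    mutable std::mutex             m_cacheMutex;
    mutable std::size_t            m_cacheWrites = 0;
};
```

One IsEmpty() call on a const object already accounts for two writes to the list.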
> 
>     The worst part is that since two threads collide on the same string only rarely, we
>     don’t even know how many of these bugs we have.  We’ve fixed 3 or 4 of them (by adding
>     our own mutex checking on any access), but are there 0 or 10 more?  Haven’t a clue.
> 
>>>     It is between sad and breath taking.
> 
>     Indeed.
> 
>     Cheers,
>     Jeff.
> 
>>     On 3 May 2019, at 15:16, Dick Hollenbeck <d...@softplc.com> wrote:
>>
>>     Thanks Jeff.
>>
>>     On 5/3/19 4:22 AM, Jeff Young wrote:
>>>     Hi Dick,
>>>
>>>>>     h) What is the list of deficiencies with current string usage?
>>>
>>>     I only have one issue with the current use of wxString, but it’s a big one: it crashes
>>>     (unpredictably) when used multi-threaded in UTF8 mode.
>>
>>     The fact that it is only *One* issue is an important data point.
>>
>>     Since you know it is crashing in this class, you must know approximately where, and
>>     under what kind of read/write activity.  Of course, if read activity triggers a lazy
>>     (deferred) transformation, then this distinction can get blurred.  But more information
>>     on source file locations would be very helpful to me.
>>
>>     Another important data point you brought is that the wx library designers are advising
>>     against using wxString for core application.  It will take a couple of hours to even
>>     contemplate that; it is basically staggering to me.  It is between sad and breath
>>     taking.
>>     Sounds like they designed themselves into a corner and are now acknowledging that what
>>     they designed is more of an API commitment that they want to disavow than a real
>>     solution.
>>
>>     I can see where that can happen.  Superior designs come from experience.  Experience
>>     comes with usage and time, neither of which are always available up front.
>>
>>
>>
>>
>>
>>>
>>>     This design document makes for fascinating
>>>     reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It appears that the
>>>     current wxString is at least in part modelled on QtString.
>>>
>>>     There’s also a bunch of interesting info
>>>     here: https://docs.wxwidgets.org/trunk/overview_string.html, which I believe is more
>>>     up-to-date than the previous link.  In particular, there’s the mention that wxString
>>>     handles extra-BMP characters transparently when compiled in UTF8 mode (currently
>>>     used by KiCad), but does NOT when compiled in default mode (in which case the app must
>>>     handle surrogate pairs).  This of course directly leads to your point (d):
>>>
>>>>>>     d) What does the set of characters that don't fall into UCS2 actually look like?
>>>>>>     How big is this set, really?  (UTF16 is bigger than UCS2 and picks up the difference.)
>>>
>>>     Do we really need to handle extra-BMP characters?
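
For reference, this is what "the app must handle surrogate pairs" comes down to,
sketched with standard C++ types only. The function names IsOutsideBmp and ToUtf16 are
made up for the example, and the encoder assumes it is handed a valid code point (not
itself a surrogate, and no larger than 0x10FFFF). Anything above U+FFFF is outside the
BMP and takes two 16-bit units in UTF16, which UCS2 cannot express.

```cpp
#include <string>

// True if the code point lies outside the Basic Multilingual Plane.
inline bool IsOutsideBmp( char32_t aCodePoint )
{
    return aCodePoint > 0xFFFF;
}

// Encodes one code point as UTF-16.
// Assumes a valid scalar value: not a surrogate, and no larger than 0x10FFFF.
inline std::u16string ToUtf16( char32_t aCodePoint )
{
    if( !IsOutsideBmp( aCodePoint ) )
        return std::u16string( 1, static_cast<char16_t>( aCodePoint ) );

    // Split the 20 bits above 0x10000 into a high and a low surrogate.
    const char32_t offset = aCodePoint - 0x10000;

    std::u16string pair;
    pair += static_cast<char16_t>( 0xD800 + ( offset >> 10 ) );
    pair += static_cast<char16_t>( 0xDC00 + ( offset & 0x3FF ) );
    return pair;
}
```

The ohm sign U+03A9 stays a single unit, while U+1F600 becomes the pair D83D DE00. Any
code that indexes or truncates a 16-bit string has to avoid cutting between those two.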
>>>
>>>     An even more recent version of the second document
>>>     (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an oblique
>>>     reference to the multi-threading issue by starting with this (rather unhelpful)
>>>     suggestion:
>>>
>>>     Note
>>>        While the use of wxString is unavoidable in wxWidgets program, you are
>>>        encouraged to use the standard string classes std::string or std::wstring in
>>>        your applications and convert them to and from wxString only when interacting
>>>        with wxWidgets.
>>>
>>>
>>>     Cheers,
>>>     Jeff.
>>>
>>>
>>>>     On 3 May 2019, at 02:03, Dick Hollenbeck <d...@softplc.com> wrote:
>>>>
>>>>     On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>>>     On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>>>     We had talked earlier about throwing the wxWidgets UTF8 compile switch to get rid of
>>>>>>     our wxString re-entrancy problems.  However, I noticed that the 6.0 work
>>>>>>     packages doc includes an item for std::string-ization of the BOARD.  (While a lot
>>>>>>     more work, this is a better solution because it also increases our
>>>>>>     gui-toolkit-choice flexibility.)
>>>>>>
>>>>>>     I’d like to propose that we use std::wstring for that.  UTF8 should *only* be an
>>>>>>     encoding format (similar to s-expr).  It should never be used internally. That’s
>>>>>>     what unicode wchar_t’s are for.
>>>>>>
>>>>>>     And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and
>>>>>>     LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex hacks.)
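
A rough sketch of "UTF8 as an encoding format only": decode at the file boundary, work
on fixed-width code points in RAM, encode again on save. It uses std::u32string rather
than std::wstring, which is a choice made for this example and not part of the
proposal, because wchar_t is 16 bits wide on Windows and would bring the surrogate
question back. DecodeUtf8 and EncodeUtf8 are made-up names, and the decoder assumes
well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Decodes well-formed UTF-8 into code points. No validation is done: malformed
// input produces meaningless code points, but never reads out of bounds.
inline std::u32string DecodeUtf8( const std::string& aUtf8 )
{
    std::u32string out;
    std::size_t    i = 0;

    while( i < aUtf8.size() )
    {
        const unsigned char lead = static_cast<unsigned char>( aUtf8[i] );
        std::size_t         extra = 0;      // continuation bytes still expected
        char32_t            cp = lead;      // ASCII bytes pass through unchanged

        if( lead >= 0xF0 )      { extra = 3; cp = lead & 0x07; }
        else if( lead >= 0xE0 ) { extra = 2; cp = lead & 0x0F; }
        else if( lead >= 0xC0 ) { extra = 1; cp = lead & 0x1F; }

        ++i;

        for( ; extra > 0 && i < aUtf8.size(); --extra, ++i )
            cp = ( cp << 6 ) | ( static_cast<unsigned char>( aUtf8[i] ) & 0x3F );

        out += cp;
    }

    return out;
}

// Encodes code points as UTF-8. Assumes every value is a valid code point.
inline std::string EncodeUtf8( const std::u32string& aText )
{
    std::string out;

    for( char32_t cp : aText )
    {
        if( cp < 0x80 )
        {
            out += static_cast<char>( cp );
        }
        else if( cp < 0x800 )
        {
            out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else if( cp < 0x10000 )
        {
            out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else
        {
            out += static_cast<char>( 0xF0 | ( cp >> 18 ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
    }

    return out;
}
```

In the decoded form, index access really is per code point, at the cost of four bytes
per character and a conversion at every file and GUI boundary.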
>>>>>
>>>>>
>>>>>     I've been looking at this for a few months now.  I think it is so important that a
>>>>>     sub-committee should be formed, and if that committee takes as long as 4 months
>>>>>     to come to a recommendation, this would not be too long.  This issue is simply too
>>>>>     critical.
>>>>>
>>>>>     I would like to volunteer to be on that committee.  For the entire list to
>>>>>     participate in this simply does not make sense to me.  I would welcome the
>>>>>     opportunity to study this with a team of 5-6 players.  More than that probably leads
>>>>>     to anxiety.  Then, given the recommendations, the list would of course have an
>>>>>     opportunity to raise questions and take shots, before a strategy is formulated, and
>>>>>     before anything is implemented.
>>>>>
>>>>>     Again, approximately:
>>>>>
>>>>>      committee recommendations -> list approval -> strategy formulation -> implementation
>>>>>
>>>>>
>>>>>     Up to now I have looked at many libraries and have [way *too* much] experience in
>>>>>     multiple languages on multiple platforms, so I think I can be a valuable contributor.
>>>>>
>>>>>     The final work product initially would simply be a list of recommendations, that
>>>>>     quickly transforms to a strategy thereafter.  This is an enormous undertaking, so I
>>>>>     suggest against racing to a solution.  It could look a lot easier than it will
>>>>>     ultimately be, as is typical in software development.  But the return on investment
>>>>>     needs to be near optimal in the end.
>>>>>
>>>>>     Some questions to answer are:
>>>>>
>>>>>     a) How did wxString get to its current state?  Is it merely a conglomeration of
>>>>>     afterthought, or is it anywhere near optimal?
>>>>>
>>>>>     b) Why so many forms of it?  Can one form be chosen for all platforms?
>>>>>
>>>>>     c) How does wxString compare to QtString?
>>>>>
>>>>>     d) What does the set of characters that don't fall into UCS2 actually look like?
>>>>>     How big is this set, really?  (UTF16 is bigger than UCS2 and picks up the difference.)
>>>>>
>>>>>     e) For data files, I think UTF8 is fine.  So the change is for RAM manipulation of
>>>>>     strings.  Aren't we talking about a RAM resident string that bridges into the GUI
>>>>>     seamlessly?
>>>>>
>>>>>     f) What does new C++ language support offer?
>>>>>
>>>>>     g) What do C++ language designers suggest?
>>>>
>>>>     h) What is the list of deficiencies with current string usage?
>>>>
>>>>
>>>>>
>>>>>
>>>>>     etc.
>>>>>
>>>>>     But this is best continued in a smaller group, as said.
>>>>>
>>>>>
>>>>>     The other thing that I bring to this is vast familiarity with KiCad's internal
>>>>>     workings, string use cases, and goals.
>>>>>
>>>>>     Let me know if I can help.
>>>>>
>>>>>     Regards,
>>>>>
>>>>>     Dick
>>>>>
>>>>>
>>>>>     _______________________________________________
>>>>>     Mailing list: https://launchpad.net/~kicad-developers
>>>>>     Post to     : kicad-developers@lists.launchpad.net
>>>>>     Unsubscribe : https://launchpad.net/~kicad-developers
>>>>>     More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>
>>>>
> 


