Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread José Ignacio

When you implement command line switches you will probably want --help to
emit a translated message.

On Fri, May 3, 2019 at 1:51 PM Wayne Stambaugh  wrote:

> On 5/3/2019 11:27 AM, Dick Hollenbeck wrote:
> > On 5/3/19 9:41 AM, Wayne Stambaugh wrote:
> >> There is a secondary goal of removing wxWidgets from our low level
> >> objects.  Maybe some day we can build the low level KiCad non-ui
> >> libraries sans wxWidgets.  My thinking is that wxString should only come
> >> into play at the UI level when dealing with wxWidgets UI code.  Being
> >> able to use a standard C++ string implementation would (may?) go a long
> >> way in helping with that goal.
> >
> > That goal is what I had in mind when I wrote the UTF8 class, and its
> > bidirectional conversions to and from wxString.  I think you can pass
> > instances of UTF8 to wx functions in many cases, and assign to UTF8 on
> > function returns.
> >
> > But you don't have all the nice translation support there.  You could,
> > however, use wx translation support and simply assign to UTF8.
> >
>
> I don't think translations are an issue because AFAIK we don't have any
> strings in the low level non-ui objects that need to be translated, but I
> could be wrong.
___
Mailing list: https://launchpad.net/~kicad-developers
Post to : kicad-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kicad-developers
More help   : https://help.launchpad.net/ListHelp

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Wayne Stambaugh

On 5/3/2019 11:27 AM, Dick Hollenbeck wrote:
> On 5/3/19 9:41 AM, Wayne Stambaugh wrote:
>> There is a secondary goal of removing wxWidgets from our low level
>> objects.  Maybe some day we can build the low level KiCad non-ui
>> libraries sans wxWidgets.  My thinking is that wxString should only come
>> into play at the UI level when dealing with wxWidgets UI code.  Being
>> able to use a standard C++ string implementation would (may?) go a long
>> way in helping with that goal.
> 
> That goal is what I had in mind when I wrote the UTF8 class, and its
> bidirectional conversions to and from wxString.  I think you can pass
> instances of UTF8 to wx functions in many cases, and assign to UTF8 on
> function returns.
> 
> But you don't have all the nice translation support there.  You could,
> however, use wx translation support and simply assign to UTF8.
> 

I don't think translations are an issue because AFAIK we don't have any
strings in the low level non-ui objects that need to be translated, but I
could be wrong.


Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Adam Wolf

Yes, we build wxWidgets ourselves!  Let me know what settings I should
set, and we can get test builds out shortly!

https://github.com/KiCad/kicad-mac-builder/blob/master/kicad-mac-builder/wxpython.cmake

Adam

On Fri, May 3, 2019 at 11:52 AM Wayne Stambaugh  wrote:
>
> It seems like this is macos specific.  Don't we have our own custom
> wxwidgets builds for macos?  Maybe we should change the build flags.
>
> On 5/3/2019 12:34 PM, Jeff Young wrote:
> > This is the only bug I could find in the database: 
> > https://bugs.launchpad.net/kicad/+bug/1822678.  Note that the Michaels 
> > (both Geselbracht and Kavanagh) are on OSX.
> >
> > There’s also the two I fixed (neither of which was logged as I couldn’t 
> > figure out how to reproduce them).  Once again, OSX.

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Dick Hollenbeck

It takes some lead to hit a moving target.
You know this if you have ever shot clay pigeons.

Do we have any evidence that computers are going to have more memory in the
future?

If so, then this class might be useful:

http://www.cplusplus.com/reference/string/u32string/


I actually don't know what UTF32 is.  It seems like an oxymoron.  If a 32 bit
field can hold every character on earth, why do you imply with the "UTF"
prefix that this is a multi element per character animal?  Could just be a bad
name selection.  "Real unicode" would have been a better name.

In any case, that class probably has some decent typedefs that make it easy to
use, and with some conversion functions to and from wxString, might be easiest
to deal with long term.

Note that wchar_t is a classic example of bad software design.  Somebody made a
bet on wchar_t being wider than char, but did not ensure that it was big
enough.  (Did not gather all the concerns up before racing forward.)

char32_t does not make that mistake.  Well, it should work until we get invaded
by space aliens.

And really, if the next wxWidgets major release does not use this as their
string foundation, then I would remain more confused than I am now.  And it's
confusing how that might be possible.

Because I bought what I would have called a super computer 12 years ago for
under $35 recently, with a lot of RAM.

Dick
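
To make the memory trade-off above concrete, here is a minimal sketch of
turning UTF-8 bytes into a std::u32string.  It is not taken from KiCad or
wxWidgets: decodeUtf8 is a hypothetical helper, and it assumes the input is
well-formed UTF-8 (no validation is attempted).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes well-formed UTF-8 into one char32_t per code
// point. Malformed input is NOT detected; this is an illustration only.
std::u32string decodeUtf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;

    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);

        // Number of continuation bytes implied by the lead byte.
        const std::size_t extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;

        // Payload bits of the lead byte (7, 5, 4 or 3 bits).
        char32_t cp = lead;
        if (extra > 0)
            cp = lead & (0x3F >> extra);

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k <= extra && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For the input "a", U+03A9 and U+1F600 the UTF-8 form is 7 bytes long, while
the decoded string has 3 elements of sizeof(char32_t) bytes each (4 on common
platforms).  Indexing the decoded string by code point is then constant time,
which is the convenience being paid for with memory.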





Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Wayne Stambaugh

It seems like this is macos specific.  Don't we have our own custom
wxwidgets builds for macos?  Maybe we should change the build flags.

On 5/3/2019 12:34 PM, Jeff Young wrote:
> This is the only bug I could find in the database: 
> https://bugs.launchpad.net/kicad/+bug/1822678.  Note that the Michaels (both 
> Geselbracht and Kavanagh) are on OSX.
> 
> There’s also the two I fixed (neither of which was logged as I couldn’t 
> figure out how to reproduce them).  Once again, OSX.

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Jeff Young

This is the only bug I could find in the database: 
https://bugs.launchpad.net/kicad/+bug/1822678.  Note that the Michaels (both 
Geselbracht and Kavanagh) are on OSX.

There’s also the two I fixed (neither of which was logged as I couldn’t figure 
out how to reproduce them).  Once again, OSX.

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Wayne Stambaugh

On 5/3/2019 10:49 AM, Jeff Young wrote:
> @Wayne, are you sure about those settings?  (That’s the flag I proposed
> we flip in the previous incarnation of this thread.)  I’m fairly sure
> (although not 100% certain) that the bug only exists in
> wxUSE_UNICODE_UTF8 mode.

If that is the case, could it be as simple as rebuilding wxWidgets with the
appropriate settings?  That would be something, wouldn't it?  I double checked
my systems (Ubuntu on my laptop, and both the 32 and 64 bit builds on Windows)
and they are all configured the same way.  Keep in mind that I am just looking
at the packaged versions of wxWidgets on these platforms.  If folks are
building their own custom wxWidgets libraries, I cannot vouch for them.  I
wonder if Tom's new crash reporter can grab the wx/setup.h file so we can have
another data point in our bug reports, particularly for those unexplained
random crashes that we seem to have.
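
As a rough illustration of how a build could report this data point itself
rather than relying on someone reading wx/setup.h by hand, the sketch below
checks the macro at compile time.  describeWxStringBuild is a made-up name,
__has_include needs a C++17 compiler, and the snippet deliberately falls back
to a neutral message when the wxWidgets headers are not on the include path.

```cpp
#include <string>

// Pull in the wxWidgets string header only if the compiler can find it.
#if __has_include(<wx/string.h>)
#include <wx/string.h>
#endif

// Hypothetical helper: describes which wxString flavour this translation
// unit was compiled against, based on the wxUSE_UNICODE_UTF8 macro.
std::string describeWxStringBuild()
{
#if !defined(wxUSE_UNICODE_UTF8)
    return "wxUSE_UNICODE_UTF8 not defined (wxWidgets headers not on the include path?)";
#elif wxUSE_UNICODE_UTF8
    return "wxUSE_UNICODE_UTF8 == 1 (UTF-8 build)";
#else
    return "wxUSE_UNICODE_UTF8 == 0 (wchar_t build)";
#endif
}
```

The returned text could be attached to a crash report or shown in a version
dialog; how that is wired up is left open here.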

> 
> @Jon & @Tom, I know you guys have also fixed some of these crashes.
>  What platform did you find them on?  The ones I’ve fixed have been
> encountered on OSX (which is the only platform I have).
> 
>> On 3 May 2019, at 15:41, Wayne Stambaugh wrote:
>>
>> On 5/3/19 5:22 AM, Jeff Young wrote:
>>> Hi Dick,
>>>
>>>> h) What is the list of deficiencies with current string usage?
>>>
>>> I only have one issue with the current use of wxString, but it’s a big
>>> one: it crashes (unpredictably) when used multi-threaded in UTF8 mode.
>>
>> I thought it was wxString itself that was not thread safe not
>> necessarily the utf-8 build but thread safety is the primary goal now
>> that we are using threads in multiple places within KiCad.
>>
>> On my Debian system wx/setup.h shows
>>
>> #define wxUSE_UNICODE 1
>>
>> and
>>
>> #define wxUSE_UNICODE_UTF8 0
>>
>> so it would appear that wxString is built for unicode not utf8 mode on
>> linux.  I'm also pretty sure windows builds are unicode as well.
>>
>> There is a secondary goal of removing wxWidgets from our low level
>> objects.  Maybe some day we can build the low level KiCad non-ui
>> libraries sans wxWidgets.  My thinking is that wxString should only come
>> into play at the UI level when dealing with wxWidgets UI code.  Being
>> able to use a standard C++ string implementation would (may?) go a long
>> way in helping with that goal.
>>
>>>
>>> This design document makes for fascinating
>>> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It
>>> appears that the current wxString is at least in part modelled on
>>> QtString.
>>>
>>> There’s also a bunch of interesting info
>>> here: https://docs.wxwidgets.org/trunk/overview_string.html, which I
>>> believe is more up-to-date than the previous link.  In particular,
>>> there’s the mention that wxString handles extra-BMP characters
>>> transparently when compiled in UTF8 mode (currently used by Kicad), but
>>> does NOT when compiled in default mode (in which case the app must
>>> handle surrogate pairs).  This of course directly leads to your point
>>> (d):
>>>
>>>> d) What does the set of characters that don't fall into UCS2
>>>> actually look like?  How big is this set, really?  (UTF16 is bigger
>>>> than UCS2 and picks up the difference.)
>>>
>>> Do we really need to handle extra-BMP characters?
>>>
>>> An even more recent version of the second document
>>> (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an
>>> oblique reference to the multi-threading issue by starting with this
>>> (rather unhelpful) suggestion:
>>>
>>> Note
>>>    While the use of wxString is unavoidable in wxWidgets program, you
>>>    are encouraged to use the standard string classes |std::string| or
>>>    |std::wstring| in your applications and convert them to and from
>>>    wxString only when interacting with wxWidgets.
>>>
>>>
>>> Cheers,
>>> Jeff.
>>>
>>>
>>>> On 3 May 2019, at 02:03, Dick Hollenbeck wrote:
>>>>
>>>> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>>> We had talked earlier about throwing the wxWidgets UTF8 compile
>>>>>> switch to get rid of our wxString re-entrancy problems.  However, I
>>>>>> noticed that the 6.0 work packages doc includes an item for
>>>>>> std::string-ization of the BOARD.  (While a lot more work, this is a
>>>>>> better solution because it also increases our gui-toolkit-choice
>>>>>> flexibility.)
>>>>>>
>>>>>> I’d like to propose that we use std::wstring for that.  UTF8 should
>>>>>> *only* be an encoding format (similar to s-expr).  It should never
>>>>>> be used internally. That’s what unicode wchar_t’s are for.
>>>>>>
>>>>>> And I’d like to propose that we extend std::wstring-ization to
>>>>>> SCH_ITEM and LIB_ITEM.  (Then we can get rid of a bunch of our ugly
>>>>>> mutex hacks.)

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Jeff Young

I did a bit more sleuthing.  Turns out wxUSE_UNICODE_UTF8 is on for OSX.  I 
misread it because wxUSE_STRING_POS_CACHE is Linux-only:
#if wxUSE_UNICODE_UTF8 && !defined(__WINDOWS__) && !defined(__WXOSX__)
    #define wxUSE_STRING_POS_CACHE 1
#else
    #define wxUSE_STRING_POS_CACHE 0
#endif
… and it works just like John Beard described (ie: it’s only used for index 
access).

So why was I still getting the crash in IsEmpty() and Length()?  Because it’s 
not the position cache that’s the problem, it’s the iterators themselves.  
Later in string.h we find this (note that all iterators are put in a linked 
list in UTF8 mode, const or otherwise):
#if wxUSE_UNICODE_UTF8
// see the comment near wxString::iterator for why we need this
class WXDLLIMPEXP_BASE wxStringIteratorNode
{
public:
    wxStringIteratorNode()
        : m_str(NULL), m_citer(NULL), m_iter(NULL), m_prev(NULL), m_next(NULL) {}
    wxStringIteratorNode(const wxString *str,
                         wxStringImpl::const_iterator *citer)
        { DoSet(str, citer, NULL); }
    wxStringIteratorNode(const wxString *str, wxStringImpl::iterator *iter)
        { DoSet(str, NULL, iter); }
    ~wxStringIteratorNode()
        { clear(); }

    inline void set(const wxString *str, wxStringImpl::const_iterator *citer)
        { clear(); DoSet(str, citer, NULL); }
    inline void set(const wxString *str, wxStringImpl::iterator *iter)
        { clear(); DoSet(str, NULL, iter); }

    const wxString *m_str;
    wxStringImpl::const_iterator *m_citer;
    wxStringImpl::iterator *m_iter;
    wxStringIteratorNode *m_prev, *m_next;

private:
    inline void clear();
    inline void DoSet(const wxString *str,
                      wxStringImpl::const_iterator *citer,
                      wxStringImpl::iterator *iter);

    // the node belongs to a particular iterator instance, it's not copied
    // when a copy of the iterator is made
    wxDECLARE_NO_COPY_CLASS(wxStringIteratorNode);
};
#endif // wxUSE_UNICODE_UTF8

While this looks bad, there is some very good news here:

1) We don’t need to support extra-BMP characters.  We know this because we 
never have on Linux.
2) The memory constraints of using hobbled-UTF16 (UCS-2 in reality) aren’t too 
dire.  (They’re not killing our Linux implementation, anyway.)

So, is wxUSE_UNICODE_UTF8 really only set on OSX?  If so, were all the random 
crash reports really OSX-only?  (Doesn’t seem likely, but maybe.)

Or is wxUSE_UNICODE_UTF8 also set on Windows?
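
The mechanism described above (every iterator linking a node into a list
owned by the string, even for calls that look read-only) can be modelled in a
few lines.  This is a simplified sketch and not wxWidgets code: IterNode,
TrackedString, readOnlyCall and hammer are all hypothetical names, and the
mutex shown here is exactly the protection that the discussion says is
missing from the real list.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-iterator bookkeeping node.
struct IterNode
{
    IterNode* prev = nullptr;
    IterNode* next = nullptr;
};

// Hypothetical stand-in for a string that tracks its live iterators in an
// intrusive doubly linked list. All list updates happen under a mutex.
class TrackedString
{
public:
    void attach(IterNode& node)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        node.prev = nullptr;
        node.next = m_head;
        if (m_head)
            m_head->prev = &node;
        m_head = &node;
    }

    void detach(IterNode& node)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (node.prev)
            node.prev->next = node.next;
        else
            m_head = node.next;
        if (node.next)
            node.next->prev = node.prev;
        node.prev = node.next = nullptr;
    }

    bool hasLiveIterators() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_head != nullptr;
    }

private:
    mutable std::mutex m_mutex;
    IterNode* m_head = nullptr;
};

// Models a call that looks read-only to the caller but still links and
// unlinks a node, i.e. it writes to shared state.
void readOnlyCall(TrackedString& text)
{
    IterNode node;
    text.attach(node);
    text.detach(node);
}

// Runs readOnlyCall from several threads at once and reports whether the
// list is empty again afterwards.
bool hammer(int threadCount, int iterations)
{
    TrackedString text;
    std::vector<std::thread> pool;
    for (int t = 0; t < threadCount; ++t)
        pool.emplace_back([&text, iterations] {
            for (int i = 0; i < iterations; ++i)
                readOnlyCall(text);
        });
    for (std::thread& worker : pool)
        worker.join();
    return !text.hasLiveIterators();
}
```

Without the two lock_guard lines in attach() and detach(), concurrent calls
to readOnlyCall() would be a data race on m_head and the prev/next pointers,
which is undefined behaviour and would plausibly show up as the kind of rare,
hard-to-reproduce crash reported in this thread.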

> On 3 May 2019, at 16:24, Dick Hollenbeck  wrote:
> 
> Could that damage have already been done by prior concurrent access?  Maybe a
> subsequent read only operation fails because your linked list is already
> haywire...
> 
> Walking a linked list at a later point in time than when it was damaged can
> do this.
> 
> 
> 
> On 5/3/19 10:18 AM, Jeff Young wrote:
>> I don’t believe that’s the case.  Neither of the two crashes that I tracked
>> down involved direct index access (or any change to the string).  One was
>> calling foo.IsEmpty(), and the other foo.Length().  Both use const iterators
>> under the hood.
>> 
>> When wxUSE_UNICODE_UTF8 is off, wxWidgets gets around parsing the string by
>> just not doing it:
>> 
>> Remarks
>>    Note that while the behaviour of wxString when |wxUSE_UNICODE_WCHAR==1|
>>    resembles UCS-2 encoding, it's not completely correct to refer to
>>    wxString as UCS-2 encoded since you can encode code points outside the
>>    /BMP/ in a wxString as two code units (i.e. as a surrogate pair; as
>>    already mentioned however wxString will "see" them as two different
>>    code points)
>> 
>> 
>>> On 3 May 2019, at 16:06, John Beard wrote:
>>> 
>>> Hi Jeff,
>>> 
>>> I think it is the index access operator that performs this caching, to
>>> allow you to access the n'th code point any number of times while
>>> iterating the string only once.
>>> 
>>> However, you can still use the iterator access safely. It is only index
>>> based access that is cached and thread-unsafe.
>>> 
>>> This is what the wxString documentation recommends. Furthermore, in any
>>> Unicode string, regardless of encoding (8, 16, 32), index access is almost
>>> entirely useless, as code units/points are only indirectly related to
>>> glyphs and/or perceived characters anyway. If you need to parse a Unicode
>>> string, you must iterate from the
>>>

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Dick Hollenbeck

On 5/3/19 9:41 AM, Wayne Stambaugh wrote:
> There is a secondary goal of removing wxWidgets from our low level
> objects.  Maybe some day we can build the low level KiCad non-ui
> libraries sans wxWidgets.  My thinking is that wxString should only come
> into play at the UI level when dealing with wxWidgets UI code.  Being
> able to use a standard C++ string implementation would (may?) go a long
> way in helping with that goal.

That goal is what I had in mind when I wrote the UTF8 class, and its
bidirectional conversions to and from wxString.  I think you can pass instances
of UTF8 to wx functions in many cases, and assign to UTF8 on function returns.

But you don't have all the nice translation support there.  You could, however,
use wx translation support and simply assign to UTF8.


Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Dick Hollenbeck

Could that damage have already been done by prior concurrent access?  Maybe a
subsequent read only operation fails because your linked list is already
haywire...

Walking a linked list at a later point in time than when it was damaged can do
this.



On 5/3/19 10:18 AM, Jeff Young wrote:
> I don’t believe that’s the case.  Neither of the two crashes that I tracked
> down involved direct index access (or any change to the string).  One was
> calling foo.IsEmpty(), and the other foo.Length().  Both use const iterators
> under the hood.
> 
> When wxUSE_UNICODE_UTF8 is off, wxWidgets gets around parsing the string by
> just not doing it:
> 
> Remarks
>    Note that while the behaviour of wxString when |wxUSE_UNICODE_WCHAR==1|
>    resembles UCS-2 encoding, it's not completely correct to refer to
>    wxString as UCS-2 encoded since you can encode code points outside the
>    /BMP/ in a wxString as two code units (i.e. as a surrogate pair; as
>    already mentioned however wxString will "see" them as two different
>    code points)
> 
> 
>> On 3 May 2019, at 16:06, John Beard wrote:
>>
>> Hi Jeff,
>>
>> I think it is the index access operator that performs this caching, to
>> allow you to access the n'th code point any number of times while
>> iterating the string only once.
>>
>> However, you can still use the iterator access safely. It is only index
>> based access that is cached and thread-unsafe.
>>
>> This is what the wxString documentation recommends. Furthermore, in any
>> Unicode string, regardless of encoding (8, 16, 32), index access is almost
>> entirely useless, as code units/points are only indirectly related to
>> glyphs and/or perceived characters anyway. If you need to parse a Unicode
>> string, you must iterate from the start. There is no way around it.
>>
>> If we're crashing due to cross thread access by index, the bug is probably
>> that we access the string by index at all. If this was accessed by
>> iterator, cross thread, and the string is not changed, it's fine. If the
>> string is changed in another thread, cached iterators are invalid (same as
>> if you change a C++ container in a single thread. The standard tells you
>> what iterators are invalidated for each operation on a container).
>>
>> I may have got the wrong end of the wxStick here (I can't check it for
>> myself right now), but as far as I can tell, this is fixable by just never
>> caching indices, as if we were looking at a C-style char array, and using
>> iterators instead.
>>
>> We should probably also turn off the unsafe string conversions by defining
>> wxNO_UNSAFE_WXSTRING_CONV, if it is not already defined.
>>
>> Cheers,
>>
>> John
>>
>> On 3 May 2019 16:35:30 CEST, Jeff Young wrote:
>>
>> Yes, we know exactly why it crashes: in order to speed up iterator access,
>> each iterator keeps a pointer into the last location accessed (so that i-1
>> and i+1 can be fast).  These pointers are kept in a linked list.  Adding
>> and removing pointers from this list is not thread-protected.
>>
>> Note that wxWidgets will add/remove a pointer even for something seemingly
>> innocuous like an IsEmpty() check.  So doing mutex locks on our side for
>> non-const iterator access is not sufficient.
>>
>> The worst part is that since two threads collide on the same string only
>> rarely, we don’t even know how many of these bugs we have.  We’ve fixed 3
>> or 4 of them (by adding our own mutex checking on any access), but are
>> there 0 or 10 more?  Haven’t a clue.
>>
>>>> It is between sad and breath taking.
>>
>> Indeed.
>>
>> Cheers,
>> Jeff.
>>
>>> On 3 May 2019, at 15:16, Dick Hollenbeck wrote:
>>>
>>> Thanks Jeff.
>>>
>>> On 5/3/19 4:22 AM, Jeff Young wrote:
>>>> Hi Dick,
>>>>
>>>>> h) What is the list of deficiencies with current string usage?
>>>>
>>>> I only have one issue with the current use of wxString, but it’s a big
>>>> one: it crashes (unpredictably) when used multi-threaded in UTF8 mode.
>>>
>>> The fact that it is only *One* issue is an important data point.
>>>
>>> Since you know it is crashing in this class, you must know approximately
>>> where, and under what kind of read/write activity.  Of course, if read
>>> activity triggers a lazy (deferred) transformation, then this distinction
>>> can get blurred.  But more information on source file locations would be
>>> very helpful to me.
>>>
>>> Another important data point you brought is

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Jon Evans

I have not yet had a chance to flip any switches anywhere.
I could never reproduce these string crashes locally on my Linux machine; I
just relied on others' reports.

On Fri, May 3, 2019 at 11:16 AM Jeff Young  wrote:

> More and more strange by the day.  My OSX build also shows that
> wxUSE_UNICODE_UTF8 is off (and thereby wxUSE_STRING_POS_CACHE).  Or at
> least CLion thinks it’s off (and highlights the code accordingly).
>
> I had earlier encouraged Jon to flip the switch to see what happened.  I
> had meant locally, but perhaps he did it in master to see what he could
> smoke out?
>
> Or perhaps someone else did it earlier?  Or perhaps the bug still exists
> even with wxUSE_STRING_POS_CACHE off?
>
>
> On 3 May 2019, at 15:49, Jeff Young  wrote:
>
> @Wayne, are you sure about those settings?  (That’s the flag I proposed we
> flip in the previous incarnation of this thread.)  I’m fairly sure
> (although not 100% certain) that the bug only exists in wxUSE_UNICODE_UTF8
> mode.
>
> @Jon & @Tom, I know you guys have also fixed some of these crashes.  What
> platform did you find them on?  The ones I’ve fixed have been encountered
> on OSX (which is the only platform I have).
>
> On 3 May 2019, at 15:41, Wayne Stambaugh  wrote:
>
> On 5/3/19 5:22 AM, Jeff Young wrote:
>
> Hi Dick,
>
> h) What is the list of deficiencies with current string usage?
>
>
> I only have one issue with the current use of wxString, but it’s a big
> one: it crashes (unpredictably) when used multi-threaded in UTF8 mode.
>
>
> I thought it was wxString itself that was not thread safe not
> necessarily the utf-8 build but thread safety is the primary goal now
> that we are using threads in multiple places within KiCad.
>
> On my Debian system wx/setup.h shows
>
> #define wxUSE_UNICODE 1
>
> and
>
> #define wxUSE_UNICODE_UTF8 0
>
> so it would appear that wxString is built for unicode not utf8 mode on
> linux.  I'm also pretty sure windows builds are unicode as well.
>
> There is a secondary goal of removing wxWidgets from our low level
> objects.  Maybe some day we can build the low level KiCad non-ui
> libraries sans wxWidgets.  My thinking is that wxString should only come
> into play at the UI level when dealing with wxWidgets UI code.  Being
> able to use a standard C++ string implementation would (may?) go a long
> way in helping with that goal.
>
>
> This design document makes for fascinating
> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It
> appears that the current wxString is at least in part modelled on QtString.
>
> There’s also a bunch of interesting info
> here: https://docs.wxwidgets.org/trunk/overview_string.html, which I
> believe is more up-to-date than the previous link.  In particular,
> there’s the mention that wxString handles extra-BMP characters
> transparently when compiled in UTF8 mode (currently used by Kicad), but
> does NOT when compiled in default mode (in which case the app must
> handle surrogate pairs).  This of course directly leads to your point (d):
>
> d) What does the set of characters that don't fall into UCS2
> actually look like?  How big
> is this set, really?  (UTF16 is bigger than UCS2 and picks up the
> difference.)
>
>
> Do we really need to handle extra-BMP characters?
>
> An even more recent version of the second document
> (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an
> oblique reference to the multi-threading issue by starting with this
> (rather unhelpful) suggestion:
>
> Note
>    While the use of wxString is unavoidable in wxWidgets program, you
>    are encouraged to use the standard string classes |std::string| or
>    |std::wstring| in your applications and convert them to and from
>    wxString only when interacting with wxWidgets.
>
>
> Cheers,
> Jeff.
>
>
> On 3 May 2019, at 02:03, Dick Hollenbeck wrote:
>
> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>
> On 4/30/19 4:36 AM, Jeff Young wrote:
>
> We had talked earlier about throwing the wxWidgets UTF8 compile
> switch to get rid of our wxString re-entrancy problems.  However, I
> noticed that the 6.0 work packages doc includes an item for
> std::string-ization of the BOARD.  (While a lot more work, this is a
> better solution because it also increases our gui-toolkit-choice
> flexibility.)
>
> I’d like to propose that we use std::wstring for that.  UTF8 should
> *only* be an encoding format (similar to s-expr).  It should never
> be used internally. That’s what unicode wchar_t’s are for.
>
> And I’d like to propose that we extend std::wstring-ization to
> SCH_ITEM and LIB_ITEM.  (Then we can get rid of a bunch of our ugly
> mutex hacks.)
>
>
>
> I've been looking at this for a few months now.  I think it is so
> important that a sub-committee should be formed, and if that committee
> takes as long as 4 months to come to a
Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Jeff Young

I don’t believe that’s the case.  Neither of the two crashes that I tracked 
down involved direct index access (or any change to the string).  One was 
calling foo.IsEmpty(), and the other foo.Length().  Both use const iterators 
under the hood.

When wxUSE_UNICODE_UTF8 is off, wxWidgets gets around parsing the string by 
just not doing it:

Remarks
    Note that while the behaviour of wxString when wxUSE_UNICODE_WCHAR==1
    resembles UCS-2 encoding, it's not completely correct to refer to wxString
    as UCS-2 encoded since you can encode code points outside the BMP in a
    wxString as two code units (i.e. as a surrogate pair; as already mentioned
    however wxString will "see" them as two different code points)
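
For readers unfamiliar with surrogate pairs, the following sketch shows what
"two code units" means in practice when code units are 16 bits wide (as
wchar_t is on Windows).  It uses only the standard library; toUtf16 and
isSurrogate are hypothetical helper names, and toUtf16 assumes its argument
is a valid code point no larger than U+10FFFF.

```cpp
#include <string>

// True for UTF-16 code units in the surrogate range U+D800..U+DFFF.
bool isSurrogate(char16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDFFF;
}

// Hypothetical helper: encodes one code point as UTF-16.
// Assumes cp is a valid code point (<= U+10FFFF, not itself a surrogate).
std::u16string toUtf16(char32_t cp)
{
    // Code points inside the BMP fit in a single 16-bit unit.
    if (cp < 0x10000)
        return std::u16string(1, static_cast<char16_t>(cp));

    // Anything above the BMP is split into a high and a low surrogate,
    // each carrying 10 bits of the offset from U+10000.
    cp -= 0x10000;
    const char16_t high = static_cast<char16_t>(0xD800 + (cp >> 10));
    const char16_t low  = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    return std::u16string{ high, low };
}
```

For example, U+1D11E (a character outside the BMP) becomes the two units
0xD834 and 0xDD1E, so a container that counts 16-bit units reports a length
of 2 for what a user perceives as one character.  That is the situation the
remark above describes as wxString "seeing" two different code points.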


> On 3 May 2019, at 16:06, John Beard  wrote:
> 
> Hi Jeff,
> 
> I think it is the index access operator that performs this caching, to allow 
> you to access the n'th code point any number of times while iterating the 
> string only once.
> 
> However, you can still use the iterator access safely. It is only index based 
> access that is cached and thread-unsafe.
> 
> This is what the wxString documentation recommends. Furthermore, in any 
> Unicode string, regardless of encoding (8, 16, 32), index access is almost 
> entirely useless, as code units/points are only indirectly related to glyphs 
> and/or perceived characters anyway. If you need to parse a Unicode string, 
> you must iterate from the start. There is no way around it.
> 
> If we're crashing due to cross thread access by index, the bug is probably 
> that we access the string by index at all. If this was accessed by iterator, 
> cross thread, and the string is not changed, it's fine. If the string is 
> changed in another thread, cached iterators are invalid (same as if you 
> change a C++ container in a single thread. The standard tells you what 
> iterators are invalidated for each operation on a container).
> 
> I may have got the wrong end of the wxStick here (I can't check it for myself 
> right now), but as far as I can tell, this is fixable by just never caching 
> indices, as if we were looking at a C-style char array, and using iterators 
> instead.
> 
> We should probably also turn off the unsafe string conversions by defining 
> wxNO_UNSAFE_WXSTRING_CONV, if it is not already defined.
> 
> Cheers,
> 
> John
> 
> On 3 May 2019 16:35:30 CEST, Jeff Young  wrote:
> Yes, we know exactly why it crashes: in order to speed up iterator access 
> each iterator keeps a pointer into the last location accessed (so that i-1 
> and i+1 can be fast).  These pointers are kept in a linked-list.  Adding and 
> removing pointers from this list is not thread-protected.
> 
> Note that wxWidgets will add/remove a pointer even for something seemingly 
> innocuous like an IsEmpty() check.  So doing mutex locks on our side for 
> non-const iterator access is not sufficient.
> 
> The worst part is that since two threads collide on the same string only 
> rarely, we don’t even know how many of these bugs we have.  We’ve fixed 3 or 
> 4 of them (by adding our own mutex checking on any access), but are there 0 
> or 10 more?  Haven’t a clue.
> 
>>> It is between sad and breath taking.
> 
> Indeed.
> 
> Cheers,
> Jeff.
> 
>> On 3 May 2019, at 15:16, Dick Hollenbeck wrote:
>> 
>> Thanks Jeff.
>> 
>> On 5/3/19 4:22 AM, Jeff Young wrote:
>>> Hi Dick,
>>> 
>>>> h) What is the list of deficiencies with current string usage?
>>> 
>>> I only have one issue with the current use of wxString, but it’s a big
>>> one: it crashes (unpredictably) when used multi-threaded in UTF8 mode.
>> 
>> The fact that it is only *One* issue is an important data point.
>> 
>> Since you know it is crashing in this class, you must know approximately
>> where, and under what kind of read/write activity.  Of course, if read
>> activity triggers a lazy (deferred) transformation, then this distinction
>> can get blurred.  But more information on source file locations would be
>> very helpful to me.
>> 
>> Another important data point you brought is that the wx library designers
>> are advising against using wxString in core application code.  It will take
>> a couple of hours to even contemplate that; it is basically staggering to
>> me.  It is between sad and breath taking.  Sounds like they designed
>> themselves into a corner and are now acknowledging that what they designed
>> is more of an API commitment that they want to disavow than a real solution.
>> 
>> I can see where that can happen.  Superior designs come from experience.
>> Experience comes with usage and time, neither of which are always available
>> up front.
>> 
>> 
>> 
>> 
>> 
>>> 
>>> This design document makes for fascinating
>>> reading:

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Dick Hollenbeck

John, I got this too from reading the class documentation an hour ago.

To smoke these out, a person could comment out the undesirable calls in a wx
header, perhaps one that was temporarily moved into a place at a higher
priority in the INCLUDE file search space.

Then "make -i"

perhaps on a subset of multi-threaded source files (object file make targets),
or the whole shebang for maximum pain.
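
A compiler-enforced variant of the same idea, as a minimal sketch only: rather
than commenting calls out of a header, mark the unwanted member as deleted so
that every call site becomes a compile error. AuditedString and CountSpaces are
hypothetical names invented for this illustration, and std::string stands in
for the real class; nothing below is wxWidgets API.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a string class whose indexed access we want to find.
class AuditedString
{
public:
    explicit AuditedString( std::string aText ) : m_text( std::move( aText ) ) {}

    // Deleted on purpose: any code that writes text[i] now fails to compile,
    // which lists the call sites without editing a third-party header.
    char operator[]( std::size_t ) const = delete;

    // Iterator access stays available.
    std::string::const_iterator begin() const { return m_text.begin(); }
    std::string::const_iterator end() const { return m_text.end(); }

private:
    std::string m_text;
};

// A caller that only uses iteration keeps compiling.
std::size_t CountSpaces( const AuditedString& aText )
{
    std::size_t count = 0;

    for( char c : aText )
    {
        if( c == ' ' )
            ++count;
    }

    return count;
}
```

Building with "make -i", as suggested above, would then keep going past each
failing object file so that all offending call sites show up in one pass.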



On 5/3/19 10:06 AM, John Beard wrote:
> Hi Jeff,
> 
> I think it is the index access operator that performs this caching, to allow 
> you to access
> the n'th code point any number of times while only iterating the string only 
> once.
> 
> However, you can still use the iterator access safely. It is only index based 
> access that
> is cached and thread-unsafe.
> 
> This is what the wxString documention recommends. Furthermore, in any Unicode 
> string,
> regardless of encoding (8, 16, 32), index access is almost entirely useless 
> anyway, as
> code units/points are only indirectly related to glyphs and/or perceived 
> characters
> anyway. If you need to parse a Unicode string, you must iterate from the 
> start. There is
> no way around it.
> 
> If we're crashing due to cross thread access by index, the bug is probably 
> that we access
> the string by index at all. If this was accessed by iterator, cross thread, 
> and the string
> is not changed, it's fine. If the string is changed in another thread, cached 
> iterators
> are invalid (same as if you change an C++ container in a single thread. The 
> standard tells
> you what iterators are invalidated for each operation on a container).
> 
> I may have got the wrong end of the wxStick here (I can't check it for myself 
> right now),
> but as far as I can tell, this is fixable by just never caching indices, as 
> if we were
> looking at a C-style char array, and using iterators instead.
> 
> We should probably also turn off the unsafe string conversions by defining
> wxNO_UNSAFE_WXSTRING_CONV, if it is not already define.
> 
> Cheers,
> 
> John
> 
> On 3 May 2019 16:35:30 CEST, Jeff Young  wrote:
> 
> Yes, we know exactly why it crashes: in order to speed up iterator access 
> each
> iterator keeps a pointer into the last location accessed (so that i-1 and 
> i+1 can be
> fast).  These pointers are kept in a linked-list.  Adding and removing 
> pointers from
> this list is not thread-protected.
> 
> Note that wxWidgets will add/remove a pointer even for something 
> seemingly innocuous
> like an Empty() check.  So doing mutex locks on our side for non-const 
> iterator access
> is not sufficient.
> 
> The worst part is that since two threads collide on the same string only 
> rarely, we
> don’t even know how many of these bugs we have.  We’ve fixed 3 or 4 of 
> them (by adding
> our own mutex checking on any access), but are there 0 or 10 more?  
> Haven’t a clue.
> 
>>> It is between sad and breath taking.
> 
> Indeed.
> 
> Cheers,
> Jeff.
> 
>> On 3 May 2019, at 15:16, Dick Hollenbeck wrote:
>>
>> Thanks Jeff.
>>
>> On 5/3/19 4:22 AM, Jeff Young wrote:
>>> Hi Dick,
>>>
>>>>> h) What is the list of deficiencies with current string usage?
>>>
>>> I only have one issue with the current use of wxString, but it’s a big 
>>> one: it crashes
>>> (unpredictably) when used multi-threaded in UTF8 mode.
>>
>> The fact that it is onely *One* issue is an important data point.
>>
>> Since you know it is crashing in this class, you must know approximately 
>> where, and
>> under
>> what kind of read/write activity.  Of course, if read activity triggers 
>> a lazy
>> (deferred)
>> transformation, then this distinction can get blurred.  But more 
>> information on source
>> file locations would be very helpful to me.
>>
>> Another important data point you brought is that the wx library 
>> designers are advising
>> against using wxString for core application.  It will take a couple of 
>> hours to even
>> contemplate that, it is basically staggering to me.  It is between sad 
>> and breath
>> taking.
>> Sounds like they designed themselves into a corner and are now 
>> acknowledging that what
>> they designed is more of an API commitment that they want to disavow 
>> than a real
>> solution.
>>
>> I can see where that can happen.  Superior designs come from experience. 
>>  Experience
>> comes
>> with usage and time, neither of which are always available up front.
>>
>>
>>
>>
>>
>>>
>>> This design document makes for fascinating
>>> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It 
>>> appears that the
>>> current wxString is at least in part modelled on QtString.
>>>
>>> There’s also a bunch of interesting info
>>> here: https://docs.wxwidgets.org/trunk/overview_string.html, which I 
>>> believe is more
>>> up-to-date than the previous link.  In

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread John Beard

Hi Jeff,

I think it is the index access operator that performs this caching, to allow 
you to access the n'th code point any number of times while iterating the 
string only once.

However, you can still use the iterator access safely. It is only index based 
access that is cached and thread-unsafe.

This is what the wxString documentation recommends. Furthermore, in any Unicode 
string, regardless of encoding (8, 16, 32), index access is almost entirely 
useless, as code units/points are only indirectly related to glyphs and/or 
perceived characters anyway. If you need to parse a Unicode string, you must 
iterate from the start. There is no way around it.
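
To make the "iterate from the start" point concrete, here is a minimal sketch
of a single forward pass over a UTF-8 byte string. DecodeUtf8 and NthCodePoint
are hypothetical helpers written for this illustration (they are not wxWidgets
or KiCad functions), and the decoder assumes well-formed input; a real one
would have to validate continuation bytes and reject overlong forms.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into code points in one forward pass.
// Assumption: the input is well-formed UTF-8.
std::vector<char32_t> DecodeUtf8( const std::string& aText )
{
    std::vector<char32_t> out;
    std::size_t           i = 0;

    while( i < aText.size() )
    {
        unsigned char lead = static_cast<unsigned char>( aText[i] );
        std::size_t   len = 1;
        char32_t      cp = lead;

        // The lead byte tells us how many bytes this code point occupies.
        if( lead >= 0xF0 )      { len = 4; cp = lead & 0x07; }
        else if( lead >= 0xE0 ) { len = 3; cp = lead & 0x0F; }
        else if( lead >= 0xC0 ) { len = 2; cp = lead & 0x1F; }

        // Each continuation byte contributes its low six bits.
        for( std::size_t k = 1; k < len && i + k < aText.size(); ++k )
            cp = ( cp << 6 ) | ( static_cast<unsigned char>( aText[i + k] ) & 0x3F );

        out.push_back( cp );
        i += len;
    }

    return out;
}

// Finding the n-th code point means walking from the start: O(n), not O(1).
// Throws std::out_of_range if aIndex is past the last code point.
char32_t NthCodePoint( const std::string& aText, std::size_t aIndex )
{
    return DecodeUtf8( aText ).at( aIndex );
}
```

Because the byte offset of code point n depends on the width of every code
point before it, any O(1) index operator on such a string needs some kind of
cache, which is where the shared mutable state comes from.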

If we're crashing due to cross thread access by index, the bug is probably that 
we access the string by index at all. If this was accessed by iterator, cross 
thread, and the string is not changed, it's fine. If the string is changed in 
another thread, cached iterators are invalid (same as if you change a C++ 
container in a single thread; the standard tells you what iterators are 
invalidated for each operation on a container).

I may have got the wrong end of the wxStick here (I can't check it for myself 
right now), but as far as I can tell, this is fixable by just never caching 
indices, as if we were looking at a C-style char array, and using iterators 
instead.

We should probably also turn off the unsafe string conversions by defining 
wxNO_UNSAFE_WXSTRING_CONV, if it is not already defined.

Cheers,

John

On 3 May 2019 16:35:30 CEST, Jeff Young  wrote:
>Yes, we know exactly why it crashes: in order to speed up iterator
>access each iterator keeps a pointer into the last location accessed
>(so that i-1 and i+1 can be fast).  These pointers are kept in a
>linked-list.  Adding and removing pointers from this list is not
>thread-protected.
>
>Note that wxWidgets will add/remove a pointer even for something
>seemingly innocuous like an Empty() check.  So doing mutex locks on our
>side for non-const iterator access is not sufficient.
>
>The worst part is that since two threads collide on the same string
>only rarely, we don’t even know how many of these bugs we have.  We’ve
>fixed 3 or 4 of them (by adding our own mutex checking on any access),
>but are there 0 or 10 more?  Haven’t a clue.
>
>>> It is between sad and breath taking.
>
>Indeed.
>
>Cheers,
>Jeff.
>
>> On 3 May 2019, at 15:16, Dick Hollenbeck  wrote:
>> 
>> Thanks Jeff.
>> 
>> On 5/3/19 4:22 AM, Jeff Young wrote:
>>> Hi Dick,
>>> 
>>>>> h) What is the list of deficiencies with current string usage?
>>> 
>>> I only have one issue with the current use of wxString, but it’s a
>big one: it crashes
>>> (unpredictably) when used multi-threaded in UTF8 mode.
>> 
>> The fact that it is onely *One* issue is an important data point.
>> 
>> Since you know it is crashing in this class, you must know
>approximately where, and under
>> what kind of read/write activity.  Of course, if read activity
>triggers a lazy (deferred)
>> transformation, then this distinction can get blurred.  But more
>information on source
>> file locations would be very helpful to me.
>> 
>> Another important data point you brought is that the wx library
>designers are advising
>> against using wxString for core application.  It will take a couple
>of hours to even
>> contemplate that, it is basically staggering to me.  It is between
>sad and breath taking.
>> Sounds like they designed themselves into a corner and are now
>acknowledging that what
>> they designed is more of an API commitment that they want to disavow
>than a real solution.
>> 
>> I can see where that can happen.  Superior designs come from
>experience.  Experience comes
>> with usage and time, neither of which are always available up front.
>> 
>> 
>> 
>> 
>> 
>>> 
>>> This design document makes for fascinating
>>> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It
>appears that the
>>> current wxString is at least in part modelled on QtString.
>>> 
>>> There’s also a bunch of interesting info
>>> here: https://docs.wxwidgets.org/trunk/overview_string.html, which I
>believe is more
>>> up-to-date than the previous link.  In particular, there’s the
>mention that wxString
>>> handles extra-BMP characters transparently when compiled in UTF8
>mode (currently used by
>>> Kicad), but does NOT when compiled in default mode (in which case
>the app must handle
>>> surrogate pairs).  This of course directly leads to your point (d):
>>> 
>> d) What does the set of characters that don't fall into UCS2
>actually look like?  How big
>> is this set, really?  (UTF16 is bigger than UCS2 and picks up the
>difference.)
>>> 
>>> Do we really need to handle extra-BMP characters?
>>> 
>>> An even more recent version of the second document
>>> (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes
>an oblique reference
>>> to the multi-threading issue by starting with this (rather
>unhelpful) suggestion:
>>> 
>>> Note
>>>

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Jeff Young

More and more strange by the day.  My OSX build also shows that 
wxUSE_UNICODE_UTF8 is off (and thereby wxUSE_STRING_POS_CACHE).  Or at least 
CLion thinks it’s off (and highlights the code accordingly).

I had earlier encouraged Jon to flip the switch to see what happened.  I had 
meant locally, but perhaps he did it in master to see what he could smoke out?

Or perhaps someone else did it earlier?  Or perhaps the bug still exists even 
with wxUSE_STRING_POS_CACHE off? 


> On 3 May 2019, at 15:49, Jeff Young  wrote:
> 
> @Wayne, are you sure about those settings?  (That’s the flag I proposed we 
> flip in the previous incantation of this thread.)  I’m fairly sure (although 
> not 100% certain) that the bug only exists in wxUSE_UNICODE_UTF8 mode.
> 
> @Jon & @Tom, I know you guys have also fixed some of these crashes.  What 
> platform did you find them on?  The ones I’ve fixed have been encountered on 
> OSX (which is the only platform I have).
> 
>> On 3 May 2019, at 15:41, Wayne Stambaugh wrote:
>> 
>> On 5/3/19 5:22 AM, Jeff Young wrote:
>>> Hi Dick,
>>> 
>>>>> h) What is the list of deficiencies with current string usage?
>>> 
>>> I only have one issue with the current use of wxString, but it’s a big
>>> one: it crashes (unpredictably) when used multi-threaded in UTF8 mode.
>> 
>> I thought it was wxString itself that was not thread safe not
>> necessarily the utf-8 build but thread safety is the primary goal now
>> that we are using threads in multiple places within KiCad.
>> 
>> On my Debian system wx/setup.h shows
>> 
>> #define wxUSE_UNICODE 1
>> 
>> and
>> 
>> #define wxUSE_UNICODE_UTF8 0
>> 
>> so it would appear that wxString is built for unicode not utf8 mode on
>> linux.  I'm also pretty sure windows builds are unicode as well.
>> 
>> There is a secondary goal of removing wxWidgets from our low level
>> objects.  Maybe some day we can build the low level KiCad non-ui
>> libraries sans wxWdigets.  My thinking is that wxString should only come
>> into play at the UI level when dealing with wxWidgets UI code.  Being
>> able to use a standard C++ string implementation would (may?) go a long
>> way in helping with that goal.
>> 
>>> 
>>> This design document makes for fascinating
>>> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support 
>>> .  It
>>> appears that the current wxString is at least in part modelled on QtString.
>>> 
>>> There’s also a bunch of interesting info
>>> here: https://docs.wxwidgets.org/trunk/overview_string.html 
>>> , which I
>>> believe is more up-to-date than the previous link.  In particular,
>>> there’s the mention that wxString handles extra-BMP characters
>>> transparently when compiled in UTF8 mode (currently used by Kicad), but
>>> does NOT when compiled in default mode (in which case the app must
>>> handle surrogate pairs).  This of course directly leads to your point (d):
>>> 
>>>>>> d) What does the set of characters that don't fall into UCS2
>>>>>> actually look like?  How big
>>>>>> is this set, really?  (UTF16 is bigger than UCS2 and picks up the
>>>>>> difference.)
>>> 
>>> Do we really need to handle extra-BMP characters?
>>> 
>>> An even more recent version of the second document
>>> (https://docs.wxwidgets.org/trunk/classwx_string.html 
>>> ) finally makes an
>>> oblique reference to the multi-threading issue by starting with this
>>> (rather unhelpful) suggestion:
>>> 
>>> Note
>>>    While the use of wxString is
>>>    unavoidable in wxWidgets program, you are encouraged to use the
>>>    standard string classes |std::string| or |std::wstring| in your
>>>    applications and convert them to and from wxString only when
>>>    interacting with wxWidgets.
>>> 
>>> 
>>> Cheers,
>>> Jeff.
>>> 
>>> 
>>>> On 3 May 2019, at 02:03, Dick Hollenbeck wrote:
>>>>
>>>> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>>> We had talked earlier about throwing the wxWidgets UTF8 compile
>>>>>> switch to get rid of our wxString re-entrancy problems.  However, I
>>>>>> noticed that the 6.0 work packages doc includes an item for
>>>>>> std::string-ization of the BOARD.  (While a lot more work, this is a
>>>>>> better solution because it also increases our gui-toolkit-choice
>>>>>> flexibility.)
>>>>>>
>>>>>> I’d like to propose that we use std::wstring for that.  UTF8 should
>>>>>> *only* be an encoding format (similar to s-expr).  It should never
>>>>>> be used internally. That’s what unicode
Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Wayne Stambaugh

On 5/3/19 5:22 AM, Jeff Young wrote:
> Hi Dick,
> 
>>> h) What is the list of deficiencies with current string usage?
> 
> I only have one issue with the current use of wxString, but it’s a big
> one: it crashes (unpredictably) when used multi-threaded in UTF8 mode.

I thought it was wxString itself that was not thread safe, not
necessarily the utf-8 build, but thread safety is the primary goal now
that we are using threads in multiple places within KiCad.

On my Debian system wx/setup.h shows

#define wxUSE_UNICODE 1

and

#define wxUSE_UNICODE_UTF8 0

so it would appear that wxString is built for unicode not utf8 mode on
linux.  I'm also pretty sure windows builds are unicode as well.

There is a secondary goal of removing wxWidgets from our low level
objects.  Maybe some day we can build the low level KiCad non-ui
libraries sans wxWidgets.  My thinking is that wxString should only come
into play at the UI level when dealing with wxWidgets UI code.  Being
able to use a standard C++ string implementation would (may?) go a long
way in helping with that goal.

> 
> This design document makes for fascinating
> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It
> appears that the current wxString is at least in part modelled on QtString.
> 
> There’s also a bunch of interesting info
> here: https://docs.wxwidgets.org/trunk/overview_string.html, which I
> believe is more up-to-date than the previous link.  In particular,
> there’s the mention that wxString handles extra-BMP characters
> transparently when compiled in UTF8 mode (currently used by Kicad), but
> does NOT when compiled in default mode (in which case the app must
> handle surrogate pairs).  This of course directly leads to your point (d):
> 
>>>> d) What does the set of characters that don't fall into UCS2
>>>> actually look like?  How big
>>>> is this set, really?  (UTF16 is bigger than UCS2 and picks up the
>>>> difference.)
> 
> Do we really need to handle extra-BMP characters?
> 
> An even more recent version of the second document
> (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an
> oblique reference to the multi-threading issue by starting with this
> (rather unhelpful) suggestion:
> 
> Note
> While the use of wxString is
> unavoidable in wxWidgets program, you are encouraged to use the
> standard string classes |std::string| or |std::wstring| in your
> applications and convert them to and from wxString only when
> interacting with wxWidgets.
> 
> 
> Cheers,
> Jeff.
> 
> 
>> On 3 May 2019, at 02:03, Dick Hollenbeck wrote:
>>
>> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>>> We had talked earlier about throwing the wxWidgets UTF8 compile
>>>> switch to get rid of our wxString re-entrancy problems.  However, I
>>>> noticed that the 6.0 work packages doc includes an item for
>>>> std::string-ization of the BOARD.  (While a lot more work, this is a
>>>> better solution because it also increases our gui-toolkit-choice
>>>> flexibility.)
>>>>
>>>> I’d like to propose that we use std::wstring for that.  UTF8 should
>>>> *only* be an encoding format (similar to s-expr).  It should never
>>>> be used internally. That’s what unicode wchar_t’s are for.
>>>>
>>>> And I’d like to propose that we extend std::wstring-ization to
>>>> SCH_ITEM and LIB_ITEM.  (Then we can get rid of a bunch of our ugly
>>>> mutex hacks.)
>>>
>>>
>>> I've been looking at this for a few months now.  I think it is so
>>> important, that a
>>> sub-committee should be formed, and if that committee takes as long
>>> as 4 months to come to
>>> a recommendation, this would not be too long.  This issue is simply
>>> too critical.
>>>
>>> I would like to volunteer to be on that committee.  For the entire
>>> list to participate in
>>> this simply does not make sense to me.  I would welcome the
>>> opportunity to study this with
>>> a team of 5-6 players.  More than that probably leads to anxiety.
>>>  Then, given the
>>> recommendations, the list would of course have an opportunity to
>>> raise questions and take
>>> shots, before a strategy is formulated, and before anything is
>>> implemented.
>>>
>>> Again, approximately:
>>>
>>>  committee recommendations -> list approval -> strategy formulation
>>> -> implementation
>>>
>>>
>>> Up to now I have looked at many libraries and have [way *too* much]
>>> experience in multiple
>>> languages on multiple platforms, so I think I can be valuable
>>> contributor.
>>>
>>> The final work product initially would simply be a list of
>>> recommendations, that quickly
>>> transforms to a strategy thereafter.  This is an enormous
>>> undertaking, so I suggest
>>> against racing to a solution.  It could look a lot easier than it
>>> will ultimately be, as
>>> is typical in software development.  But the

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Jeff Young

Yes, we know exactly why it crashes: in order to speed up iterator access each 
iterator keeps a pointer into the last location accessed (so that i-1 and i+1 
can be fast).  These pointers are kept in a linked-list.  Adding and removing 
pointers from this list is not thread-protected.

Note that wxWidgets will add/remove a pointer even for something seemingly 
innocuous like an Empty() check.  So doing mutex locks on our side for 
non-const iterator access is not sufficient.

The worst part is that since two threads collide on the same string only 
rarely, we don’t even know how many of these bugs we have.  We’ve fixed 3 or 4 
of them (by adding our own mutex checking on any access), but are there 0 or 10 
more?  Haven’t a clue.
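
For readers following along, the "mutex on any access" workaround has roughly
this shape. This is only a sketch under stated assumptions: GuardedText is a
hypothetical wrapper invented here (it is not the actual KiCad fix), and
std::string stands in for wxString, so the hidden cache update itself is not
reproduced. The point it illustrates is that reads, including the emptiness
check, take the lock as well.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical wrapper: every access, reads included, is serialised.
class GuardedText
{
public:
    void Set( std::string aText )
    {
        std::lock_guard<std::mutex> lock( m_mutex );
        m_text = std::move( aText );
    }

    // Returns a copy, so the caller never touches the shared object unlocked.
    std::string Get() const
    {
        std::lock_guard<std::mutex> lock( m_mutex );
        return m_text;
    }

    // Even a "harmless" query is locked, because with the real class a read
    // may still modify internal bookkeeping.
    bool IsEmpty() const
    {
        std::lock_guard<std::mutex> lock( m_mutex );
        return m_text.empty();
    }

private:
    mutable std::mutex m_mutex;   // mutable so const readers can lock it
    std::string        m_text;
};
```

The cost is that every reader pays for the lock and a copy, and the wrapper
only helps at the call sites that actually use it, which is the "how many more
are there" problem described above.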

>> It is between sad and breath taking.

Indeed.

Cheers,
Jeff.

> On 3 May 2019, at 15:16, Dick Hollenbeck  wrote:
> 
> Thanks Jeff.
> 
> On 5/3/19 4:22 AM, Jeff Young wrote:
>> Hi Dick,
>> 
>>>> h) What is the list of deficiencies with current string usage?
>> 
>> I only have one issue with the current use of wxString, but it’s a big one: 
>> it crashes
>> (unpredictably) when used multi-threaded in UTF8 mode.
> 
> The fact that it is onely *One* issue is an important data point.
> 
> Since you know it is crashing in this class, you must know approximately 
> where, and under
> what kind of read/write activity.  Of course, if read activity triggers a 
> lazy (deferred)
> transformation, then this distinction can get blurred.  But more information 
> on source
> file locations would be very helpful to me.
> 
> Another important data point you brought is that the wx library designers are 
> advising
> against using wxString for core application.  It will take a couple of hours 
> to even
> contemplate that, it is basically staggering to me.  It is between sad and 
> breath taking.
> Sounds like they designed themselves into a corner and are now acknowledging 
> that what
> they designed is more of an API commitment that they want to disavow than a 
> real solution.
> 
> I can see where that can happen.  Superior designs come from experience.  
> Experience comes
> with usage and time, neither of which are always available up front.
> 
> 
> 
> 
> 
>> 
>> This design document makes for fascinating
>> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It appears 
>> that the
>> current wxString is at least in part modelled on QtString.
>> 
>> There’s also a bunch of interesting info
>> here: https://docs.wxwidgets.org/trunk/overview_string.html, which I believe 
>> is more
>> up-to-date than the previous link.  In particular, there’s the mention that 
>> wxString
>> handles extra-BMP characters transparently when compiled in UTF8 mode 
>> (currently used by
>> Kicad), but does NOT when compiled in default mode (in which case the app 
>> must handle
>> surrogate pairs).  This of course directly leads to your point (d):
>> 
>>>>> d) What does the set of characters that don't fall into UCS2 actually
>>>>> look like?  How big is this set, really?  (UTF16 is bigger than UCS2 and
>>>>> picks up the difference.)
>> 
>> Do we really need to handle extra-BMP characters?
>> 
>> An even more recent version of the second document
>> (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an 
>> oblique reference
>> to the multi-threading issue by starting with this (rather unhelpful) 
>> suggestion:
>> 
>> Note
>>    While the use of wxString is unavoidable in wxWidgets program, you are
>>    encouraged to use the standard string classes |std::string| or
>>    |std::wstring| in your applications and convert them to and from wxString
>>    only when interacting with wxWidgets.
>> 
>> 
>> Cheers,
>> Jeff.
>> 
>> 
>>> On 3 May 2019, at 02:03, Dick Hollenbeck wrote:
>>> 
>>> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>> We had talked earlier about throwing the wxWidgets UTF8 compile switch to
>>>>> get rid of our wxString re-entrancy problems.  However, I noticed that the
>>>>> 6.0 work packages doc includes an item for std::string-ization of the
>>>>> BOARD.  (While a lot more work, this is a better solution because it also
>>>>> increases our gui-toolkit-choice flexibility.)
>>>>>
>>>>> I’d like to propose that we use std::wstring for that.  UTF8 should *only*
>>>>> be an encoding format (similar to s-expr).  It should never be used
>>>>> internally.  That’s what unicode wchar_t’s are for.
>>>>>
>>>>> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and
>>>>> LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Dick Hollenbeck

Thanks Wayne for giving me a chance to participate.

Jeff has been very helpful so far.  Before we set up a small group communications
mechanism, I look forward to Jeff's further input.  I think the needs analysis
is important before we build a solution work environment.

Maybe there's a simple solution set, without too many ripples.  But to get that
lucky would require knowing what the immediate problems are in more detail.


Dick


On 5/3/19 8:25 AM, Wayne Stambaugh wrote:
> Dick,
> 
> On 5/2/19 6:32 PM, Dick Hollenbeck wrote:
>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>> We had talked earlier about throwing the wxWidgets UTF8 compile switch to 
>>> get rid of our wxString re-entrancy problems.  However, I noticed that the 
>>> 6.0 work packages doc includes an item for std::string-ization of the 
>>> BOARD.  (While a lot more work, this is a better solution because it also 
>>> increases our gui-toolkit-choice flexibility.)
>>>
>>> I’d like to propose that we use std::wstring for that.  UTF8 should *only* 
>>> be an encoding format (similar to s-expr).  It should never be used 
>>> internally.  That’s what unicode wchar_t’s are for.
>>>
>>> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and 
>>> LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex hacks.)
>>
>>
>> I've been looking at this for a few months now.  I think it is so important, 
>> that a
>> sub-committee should be formed, and if that committee takes as long as 4 
>> months to come to
>> a recommendation, this would not be too long.  This issue is simply too 
>> critical.
>>
>> I would like to volunteer to be on that committee.  For the entire list to 
>> participate in
>> this simply does not make sense to me.  I would welcome the opportunity to 
>> study this with
>> a team of 5-6 players.  More than that probably leads to anxiety.  Then, 
>> given the
>> recommendations, the list would of course have an opportunity to raise 
>> questions and take
>> shots, before a strategy is formulated, and before anything is implemented.
>>
>> Again, approximately:
>>
>>   committee recommendations -> list approval -> strategy formulation -> 
>> implementation
>>
>>
>> Up to now I have looked at many libraries and have [way *too* much] 
>> experience in multiple
>> languages on multiple platforms, so I think I can be valuable contributor.
>>
>> The final work product initially would simply be a list of recommendations, 
>> that quickly
>> transforms to a strategy thereafter.  This is an enormous undertaking, so I 
>> suggest
>> against racing to a solution.  It could look a lot easier than it will 
>> ultimately be, as
>> is typical in software development.  But the return on investment needs to 
>> be near optimal
>> in the end.
> 
> I have no intention of just winging a solution and hoping it works.  We
> are just in the very early stages of brainstorming.  We know that in the
> long run we will have to do something to improve our current handling of
> strings so carefully defining what that looks like is important.  Once
> we have a well defined strategy, implementation will be clear to all
> developers.
> 
>>
>> Some questions to answer are:
>>
>> a) How did wxString get to its current state?  Is is merely a conglomeration 
>> of after
>> thought, or is is anywhere near optimal.
>>
>> b) Why so many forms of it?  Can one form be chosen for all platforms?
>>
>> c) How does wxString it compare to QtString?
>>
>> d) What does the set of characters that don't fall into UCS2 actually look 
>> like?  How big
>> is this set, really?  (UTF16 is bigger than UCS2 and picks up the 
>> difference.)
>>
>> e) For data files, I think UTF8 is fine.  So the change is for RAM 
>> manipulation of
>> strings.  Aren't we talking about a RAM resident string that bridges into 
>> the GUI seamlessly?
> 
> UTF8 is definitely not going to change for file I/O.
> 
>>
>> f) What does new C++ language support offer?
>>
>> g) What do C++ language designers suggest?
>>
>>
>> etc.
>>
>> But this is best continued in a smaller group, as said.
> 
> I'm fine with keeping this limited to the lead dev team and yourself
> since it most likely the responsibility to implement this will fall on
> one of our shoulders.  There is no hurry on this.  Everyone has plenty
> to do as V6 development is in full swing.  I would prefer to take our
> time and get the strategy correct before we attempt to implement anything.
> 
> Cheers,
> 
> Wayne
> 
>>
>>
>> The other thing that I bring to this is vast familiarity with KiCad's 
>> internal workings,
>> string use cases, and goals.
>>
>> Let me know if I can help.
>>
>> Regards,
>>
>> Dick
>>
>>
>> ___
>> Mailing list: https://launchpad.net/~kicad-developers
>> Post to : kicad-developers@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~kicad-developers
>> More help   : https://help.launchpad.net/ListHelp
>>
> 
> ___

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Dick Hollenbeck

Thanks Jeff.

On 5/3/19 4:22 AM, Jeff Young wrote:
> Hi Dick,
> 
>>> h) What is the list of deficiencies with current string usage?
> 
> I only have one issue with the current use of wxString, but it’s a big one: 
> it crashes
> (unpredictably) when used multi-threaded in UTF8 mode.

The fact that it is only *One* issue is an important data point.

Since you know it is crashing in this class, you must know approximately where,
and under what kind of read/write activity.  Of course, if read activity
triggers a lazy (deferred) transformation, then this distinction can get
blurred.  But more information on source file locations would be very helpful
to me.

Another important data point you brought is that the wx library designers are
advising against using wxString for core application code.  It will take a
couple of hours to even contemplate that; it is basically staggering to me.  It
is between sad and breath taking.  Sounds like they designed themselves into a
corner and are now acknowledging that what they designed is more of an API
commitment that they want to disavow than a real solution.

I can see where that can happen.  Superior designs come from experience.
Experience comes with usage and time, neither of which are always available up
front.





> 
> This design document makes for fascinating
> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It appears 
> that the
> current wxString is at least in part modelled on QtString.
> 
> There’s also a bunch of interesting info
> here: https://docs.wxwidgets.org/trunk/overview_string.html, which I believe 
> is more
> up-to-date than the previous link.  In particular, there’s the mention that 
> wxString
> handles extra-BMP characters transparently when compiled in UTF8 mode 
> (currently used by
> Kicad), but does NOT when compiled in default mode (in which case the app 
> must handle
> surrogate pairs).  This of course directly leads to your point (d):
> 
>>>> d) What does the set of characters that don't fall into UCS2 actually look
>>>> like?  How big is this set, really?  (UTF16 is bigger than UCS2 and picks
>>>> up the difference.)
> 
> Do we really need to handle extra-BMP characters?
> 
> An even more recent version of the second document
> (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an 
> oblique reference
> to the multi-threading issue by starting with this (rather unhelpful) 
> suggestion:
> 
> Note
> While the use of wxString is unavoidable in wxWidgets program, you are
> encouraged to use the standard string classes |std::string| or
> |std::wstring| in your applications and convert them to and from wxString
> only when interacting with wxWidgets.
> 
> 
> Cheers,
> Jeff.
> 
> 
>> On 3 May 2019, at 02:03, Dick Hollenbeck wrote:
>>
>> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>>> We had talked earlier about throwing the wxWidgets UTF8 compile switch to
>>>> get rid of our wxString re-entrancy problems.  However, I noticed that the
>>>> 6.0 work packages doc includes an item for std::string-ization of the BOARD.
>>>> (While a lot more work, this is a better solution because it also increases
>>>> our gui-toolkit-choice flexibility.)
>>>>
>>>> I’d like to propose that we use std::wstring for that.  UTF8 should *only*
>>>> be an encoding format (similar to s-expr).  It should never be used
>>>> internally.  That’s what unicode wchar_t’s are for.
>>>>
>>>> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and
>>>> LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex hacks.)
>>>
>>>
>>> I've been looking at this for a few months now.  I think it is so 
>>> important, that a
>>> sub-committee should be formed, and if that committee takes as long as 4 
>>> months to come to
>>> a recommendation, this would not be too long.  This issue is simply too 
>>> critical.
>>>
>>> I would like to volunteer to be on that committee.  For the entire list to 
>>> participate in
>>> this simply does not make sense to me.  I would welcome the opportunity to 
>>> study this with
>>> a team of 5-6 players.  More than that probably leads to anxiety.  Then, 
>>> given the
>>> recommendations, the list would of course have an opportunity to raise 
>>> questions and take
>>> shots, before a strategy is formulated, and before anything is implemented.
>>>
>>> Again, approximately:
>>>
>>>  committee recommendations -> list approval -> strategy formulation -> 
>>> implementation
>>>
>>>
>>> Up to now I have looked at many libraries and have [way *too* much]
>>> experience in multiple languages on multiple platforms, so I think I can be
>>> a valuable contributor.
>>>
>>> The final work product initially would simply be a list of recommendations,

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Wayne Stambaugh

Dick,

On 5/2/19 6:32 PM, Dick Hollenbeck wrote:
> On 4/30/19 4:36 AM, Jeff Young wrote:
>> We had talked earlier about throwing the wxWidgets UTF8 compile switch to 
>> get rid of our wxString re-entrancy problems.  However, I noticed that the 
>> 6.0 work packages doc includes an item for std::string-ization of the BOARD. 
>>  (While a lot more work, this is a better solution because it also increases 
>> our gui-toolkit-choice flexibility.)
>>
>> I’d like to propose that we use std::wstring for that.  UTF8 should *only* 
>> be an encoding format (similar to s-expr).  It should never be used 
>> internally.  That’s what unicode wchar_t’s are for.
>>
>> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and 
>> LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex hacks.)
> 
> 
> I've been looking at this for a few months now.  I think it is so important, 
> that a
> sub-committee should be formed, and if that committee takes as long as 4 
> months to come to
> a recommendation, this would not be too long.  This issue is simply too 
> critical.
> 
> I would like to volunteer to be on that committee.  For the entire list to 
> participate in
> this simply does not make sense to me.  I would welcome the opportunity to 
> study this with
> a team of 5-6 players.  More than that probably leads to anxiety.  Then, 
> given the
> recommendations, the list would of course have an opportunity to raise 
> questions and take
> shots, before a strategy is formulated, and before anything is implemented.
> 
> Again, approximately:
> 
>   committee recommendations -> list approval -> strategy formulation -> 
> implementation
> 
> 
> Up to now I have looked at many libraries and have [way *too* much]
> experience in multiple languages on multiple platforms, so I think I can be
> a valuable contributor.
> 
> The final work product initially would simply be a list of recommendations, 
> that quickly
> transforms to a strategy thereafter.  This is an enormous undertaking, so I 
> suggest
> against racing to a solution.  It could look a lot easier than it will 
> ultimately be, as
> is typical in software development.  But the return on investment needs to be 
> near optimal
> in the end.

I have no intention of just winging a solution and hoping it works.  We
are just in the very early stages of brainstorming.  We know that in the
long run we will have to do something to improve our current handling of
strings so carefully defining what that looks like is important.  Once
we have a well defined strategy, implementation will be clear to all
developers.

> 
> Some questions to answer are:
> 
> a) How did wxString get to its current state?  Is it merely a conglomeration
> of afterthoughts, or is it anywhere near optimal?
>
> b) Why so many forms of it?  Can one form be chosen for all platforms?
>
> c) How does wxString compare to QtString?
> 
> d) What does the set of characters that don't fall into UCS2 actually look 
> like?  How big
> is this set, really?  (UTF16 is bigger than UCS2 and picks up the difference.)
> 
> e) For data files, I think UTF8 is fine.  So the change is for RAM 
> manipulation of
> strings.  Aren't we talking about a RAM resident string that bridges into the 
> GUI seamlessly?

UTF8 is definitely not going to change for file I/O.

> 
> f) What does new C++ language support offer?
> 
> g) What do C++ language designers suggest?
> 
> 
> etc.
> 
> But this is best continued in a smaller group, as said.

I'm fine with keeping this limited to the lead dev team and yourself
since it most likely the responsibility to implement this will fall on
one of our shoulders.  There is no hurry on this.  Everyone has plenty
to do as V6 development is in full swing.  I would prefer to take our
time and get the strategy correct before we attempt to implement anything.

Cheers,

Wayne

> 
> 
> The other thing that I bring to this is vast familiarity with KiCad's 
> internal workings,
> string use cases, and goals.
> 
> Let me know if I can help.
> 
> Regards,
> 
> Dick
> 
> 

___
Mailing list: https://launchpad.net/~kicad-developers
Post to : kicad-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kicad-developers
More help   : https://help.launchpad.net/ListHelp

Re: [Kicad-developers] 6.0 string proposal

2019-05-03 Thread Jeff Young

Hi Dick,

>> h) What is the list of deficiencies with current string usage?

I only have one issue with the current use of wxString, but it’s a big one: it 
crashes (unpredictably) when used multi-threaded in UTF8 mode.

This design document makes for fascinating reading: 
https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It appears that the 
current wxString is at least in part modelled on QtString.

There’s also a bunch of interesting info here: 
https://docs.wxwidgets.org/trunk/overview_string.html, which I believe is more 
up-to-date than the previous link.  In particular, there’s the mention that 
wxString handles extra-BMP characters transparently when compiled in UTF8 mode 
(currently used by Kicad), but does NOT when compiled in default mode (in which 
case the app must handle surrogate pairs).  This of course directly leads to 
your point (d):

>>> d) What does the set of characters that don't fall into UCS2 actually look 
>>> like?  How big
>>> is this set, really?  (UTF16 is bigger than UCS2 and picks up the 
>>> difference.)

Do we really need to handle extra-BMP characters?

An even more recent version of the second document 
(https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an oblique 
reference to the multi-threading issue by starting with this (rather unhelpful) 
suggestion:
Note
While the use of wxString is unavoidable in wxWidgets program, you are
encouraged to use the standard string classes std::string or std::wstring in
your applications and convert them to and from wxString only when interacting
with wxWidgets.

Cheers,
Jeff.


> On 3 May 2019, at 02:03, Dick Hollenbeck  wrote:
> 
> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>> We had talked earlier about throwing the wxWidgets UTF8 compile switch to 
>>> get rid of our wxString re-entrancy problems.  However, I noticed that the 
>>> 6.0 work packages doc includes an item for std::string-ization of the 
>>> BOARD.  (While a lot more work, this is a better solution because it also 
>>> increases our gui-toolkit-choice flexibility.)
>>> 
>>> I’d like to propose that we use std::wstring for that.  UTF8 should *only* 
>>> be an encoding format (similar to s-expr).  It should never be used 
>>> internally. That’s what unicode wchar_t’s are for.
>>> 
>>> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and 
>>> LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex hacks.)
>> 
>> 
>> I've been looking at this for a few months now.  I think it is so important, 
>> that a
>> sub-committee should be formed, and if that committee takes as long as 4 
>> months to come to
>> a recommendation, this would not be too long.  This issue is simply too 
>> critical.
>> 
>> I would like to volunteer to be on that committee.  For the entire list to 
>> participate in
>> this simply does not make sense to me.  I would welcome the opportunity to 
>> study this with
>> a team of 5-6 players.  More than that probably leads to anxiety.  Then, 
>> given the
>> recommendations, the list would of course have an opportunity to raise 
>> questions and take
>> shots, before a strategy is formulated, and before anything is implemented.
>> 
>> Again, approximately:
>> 
>>  committee recommendations -> list approval -> strategy formulation -> 
>> implementation
>> 
>> 
>> Up to now I have looked at many libraries and have [way *too* much]
>> experience in multiple languages on multiple platforms, so I think I can be
>> a valuable contributor.
>> 
>> The final work product initially would simply be a list of recommendations, 
>> that quickly
>> transforms to a strategy thereafter.  This is an enormous undertaking, so I 
>> suggest
>> against racing to a solution.  It could look a lot easier than it will 
>> ultimately be, as
>> is typical in software development.  But the return on investment needs to 
>> be near optimal
>> in the end.
>> 
>> Some questions to answer are:
>> 
>> a) How did wxString get to its current state?  Is it merely a conglomeration
>> of afterthoughts, or is it anywhere near optimal?
>>
>> b) Why so many forms of it?  Can one form be chosen for all platforms?
>>
>> c) How does wxString compare to QtString?
>> 
>> d) What does the set of characters that don't fall into UCS2 actually look 
>> like?  How big
>> is this set, really?  (UTF16 is bigger than UCS2 and picks up the 
>> difference.)
>> 
>> e) For data files, I think UTF8 is fine.  So the change is for RAM 
>> manipulation of
>> strings.  Aren't we talking about a RAM resident string that bridges into 
>> the GUI seamlessly?
>> 
>> f) What does new C++ language support offer?
>> 
>> g) What do C++ language designers suggest?
> 
> h) What is the list of deficiencies with current string usage?
> 
> 
>> 
>> 
>> etc.
>> 
>> But this is best continued in a smaller group, as said.

Re: [Kicad-developers] 6.0 string proposal

2019-05-02 Thread Dick Hollenbeck

On 4/30/19 4:36 AM, Jeff Young wrote:
> We had talked earlier about throwing the wxWidgets UTF8 compile switch to get 
> rid of our wxString re-entrancy problems.  However, I noticed that the 6.0 
> work packages doc includes an item for std::string-ization of the BOARD.  
> (While a lot more work, this is a better solution because it also increases 
> our gui-toolkit-choice flexibility.)
> 
> I’d like to propose that we use std::wstring for that.  UTF8 should *only* be 
> an encoding format (similar to s-expr).  It should never be used internally.  
> That’s what unicode wchar_t’s are for.
> 
> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and 
> LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex hacks.)

I've been looking at this for a few months now.  I think it is so important, 
that a
sub-committee should be formed, and if that committee takes as long as 4 months 
to come to
a recommendation, this would not be too long.  This issue is simply too 
critical.

I would like to volunteer to be on that committee.  For the entire list to 
participate in
this simply does not make sense to me.  I would welcome the opportunity to 
study this with
a team of 5-6 players.  More than that probably leads to anxiety.  Then, given 
the
recommendations, the list would of course have an opportunity to raise 
questions and take
shots, before a strategy is formulated, and before anything is implemented.

Again, approximately:

  committee recommendations -> list approval -> strategy formulation -> 
implementation

Up to now I have looked at many libraries and have [way *too* much] experience
in multiple languages on multiple platforms, so I think I can be a valuable
contributor.

The final work product initially would simply be a list of recommendations, 
that quickly
transforms to a strategy thereafter.  This is an enormous undertaking, so I 
suggest
against racing to a solution.  It could look a lot easier than it will 
ultimately be, as
is typical in software development.  But the return on investment needs to be 
near optimal
in the end.

Some questions to answer are:

a) How did wxString get to its current state?  Is it merely a conglomeration of
afterthoughts, or is it anywhere near optimal?

b) Why so many forms of it?  Can one form be chosen for all platforms?

c) How does wxString compare to QtString?

d) What does the set of characters that don't fall into UCS2 actually look 
like?  How big
is this set, really?  (UTF16 is bigger than UCS2 and picks up the difference.)

e) For data files, I think UTF8 is fine.  So the change is for RAM manipulation 
of
strings.  Aren't we talking about a RAM resident string that bridges into the 
GUI seamlessly?

f) What does new C++ language support offer?

g) What do C++ language designers suggest?

etc.

But this is best continued in a smaller group, as said.

The other thing that I bring to this is vast familiarity with KiCad's internal 
workings,
string use cases, and goals.

Let me know if I can help.

Regards,

Dick


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Jon Evans

String access is a factor in the performance of the new real-time
connectivity algorithm in eeschema, since all connectivity is established
by parsing labels and pin names.  I have not done benchmarks comparing
various options for string storage, but we would need to watch that space
too if we change how strings work.

-Jon

On Tue, Apr 30, 2019 at 8:41 PM John Beard  wrote:

> On 30/04/2019 16:01, Jeff Young wrote:
> > Primarily for performance reasons.
>
> WRT performance, I did a few benchmarks for reference (on Linux)
>
> Loading this large CIAA PCB[1] allocates, out of a peak usage of 467MB
> of heap with a 0.01% threshold:
>
> * 9.6MB of std::basic_string::_M_assign
>   * 9.4MB of this is from wxString operator= assignments
> * ~600kB of std::basic_string::_M_construct, (wxString ctor)
>
> So I'm not sure memory usage is a major factor to worry about (strings
> allocate storage on the heap, so we should see basically all the
> interesting things in the heap profile). UTF-8 could be as little as 1/4
> UTF-32 (all strings are ASCII), but even then, it's a few MB saved.
>
> Now, in terms of performance, opening Pcbnew with no file gives:
>
> #4  3.36%   __gconv_transform_utf8_internal
> #5  2.51%   __mbsrtowcs_l
> #6  2.50%   wxMBConv::ToWChar
> #8  2.07%   std::basic_string::_M_assign
> #9  1.88%   wxMBConvStrictUTF8::ToWChar
> #14 1.27%   EscapeString (kicad function)
> #17 0.85%   __GI___strlen_sse2
> #18 0.85%   wxUniChar::From8bit
> #19 0.84%   wxUniChar::operator==
>
> And plenty more string-y things in the top 50 or so lines. So it seems
> the biggest cost for strings is converting them from UTF-8 to wchar_t
> strings in WX (this is probably not the same on Windows). But it's not
> really a stunning cost.
>
> However, loading the CIAA board, and there are basically no string
> operations above 0.5%, and only a handful even above 0.25%. When doing
> DRC, strings don't break 0.1%: nearly all the significant work is
> looking things up in std::maps and geometry.
>
> So string performance doesn't seem to be *that* critical, as it's
> quickly drowned out under real workloads. It looks to me (and I'm happy
> to be corrected, I'm not a perf expert), like string operations in KiCad
> are not much of a bottleneck.
>
>  > Because characters are different lengths, you have to scan the string
>  > to find the n’th character.
>
> Even with UTF-32, you can only do an O(1) lookup of the n'th *code
> point* or *code unit* (the same in UTF-32, not in UTF-8), not the n'th
> *encoded character*.
>
> That's true even if you normalise the strings first. Not all code points
> map one-to-one to an encoded character (it can be one-to-none,
> one-to-one, many-to-one). And that's even without considering grapheme
> clustering.
>
> Cheers,
>
> John
>
> PS / OT: If we had to optimise one thing,
> PolygonTriangulation::Vertex::inTriangle is the single hungriest
> function, chewing 6.19% of all CPU time, double that of each of the next
> 3: __gnu_cxx::__exchange_and_add (2.76%),  PolygonTriangulation::isEar
> (2.73%) and even malloc (2.27%).
>
> Other than that fairly mundane 6%-er, there are no eye-popping
> performance hogs simply on loading a PCB. Which is nice.
>
> [1]:
>
> https://github.com/ciaa/Hardware/blob/master/PCB/ACC/CIAA_ACC/ciaa_acc.kicad_pcb
>

Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread John Beard


On 30/04/2019 21:55, Jeff Young wrote:

> I think we’re a long way from handling Hiragana, Katakana or Kanji.  Probably
> the same for Tamil or Telugu, although I know nothing about them.


We can already accept Unicode from many string user inputs. For example 
a net name can be "🇺🇸" (this is 2 code points, but might render as a 
flag or a "US"). It doesn't render on the schematic (you see ??) but it 
does make it into the netlist just fine. At least on Linux with a UTF-8 
locale.


If we get user fonts in schematics (e.g. to support Chinese users 
without making our own 1+ char Hershey font[1], this will Just Work 
(TM). This is on the Road Map V6[2].


Anywhere text comes in from the user, could be any kind of weird 
Unicode. Generally 95% of the Unicode handling is taking data from the 
user, keeping it safe, and then passing it onto something that can deal 
with the hellish mess of glyphs. Like HarfBuzz[3] (used by toolkits like 
GTK+ and, directly, by Inkscape). As far as we are concerned, it is 
opaque data.



> So why do we scan strings?  We do it when tokenizing, but all our tokens are
> roman (if not ascii), so that should be OK.


In this case we'd be iterating the strings anyway, not random access. 
Parsing s-expressions should be as well defined in Unicode as in ASCII, 
it's a matter of what the grammar accepts (not that there is a formal 
grammar, but if there were one, it would probably say ASCII only symbols 
and anything goes between the quotes of a string, and we'd check for 
valid UTF-8 at some point, but maybe not in the tokenising stage).



> We also do it looking for numbers to increment.  We’d like this to work for
> other languages, but as long as their subsequent-code-points don’t look like
> roman digits I think we’re OK.  (Subsequent-code-points are all above ascii,
> right?).


Subsequent code *units* in a code point (in UTF-8, where a unit is 1 
byte and a point is 1-4 bytes) start with 0b10. There's no rule about 
what order code *points* can come in, but some orders make sense, other 
orders are nonsense to humans[4].
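
As a minimal sketch of the continuation-byte rule (the function names are illustrative only, not KiCad or wxWidgets API, and the input is assumed to be valid UTF-8): every byte that does not match the bit pattern 10xxxxxx starts a new code point, so counting code points is a single pass over the bytes.

```cpp
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool isContinuationByte( unsigned char aByte )
{
    return ( aByte & 0xC0 ) == 0x80;
}

// Count code points in a string that is assumed to hold valid UTF-8.
// Each byte that is not a continuation byte begins a new code point.
inline std::size_t countCodePoints( const std::string& aUtf8 )
{
    std::size_t count = 0;

    for( unsigned char byte : aUtf8 )
    {
        if( !isContinuationByte( byte ) )
            ++count;
    }

    return count;
}
```

Note that this counts code points, not user-perceived characters; grapheme clusters would still need a library such as ICU or HarfBuzz.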


Parsing some arbitrary sequence of code points and getting something 
semantically useful out is "hard", and highly domain-specific (like, is 
0xAB a number? I would say so, but my non-computery friends will say no. 
What about 一百零五?). But it would still be done on an iterative basis.



> We do some case conversions when doing compares.  But again, as long as
> subsequent code points don’t look like ascii we should be OK.  I assume
> capitalization algorithms don’t try to do it on Romaji or other
> non-ascii-coded roman characters?


They do (e.g. Greek, Cyrillic and there are many other bicameral 
scripts) and there are sometimes special rules. A common example is 
ß->SS, so you can't even be sure of the length! This is locale-dependent 
(e.g. std::toupper listens to the std::locale). This, again, is a "hard" 
problem, and there are libraries for it, e.g. ICU, if you really need to 
get it right (e.g. normalizing Unicode sequences to NFC first, etc).


In any case, the capitalization algorithm is an iteration of the string.


> When else do we scan strings?


The question is when do we randomly index into strings without having 
scanned for the index point beforehand. This is actually not a common 
action when you're dealing with arbitrary user input. You will normally 
be using some kind of iterative process like "find the offset of the 
first colon" or "split on the second space" or "uppercase this string" 
or "replace illegal characters" or something.
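
A hedged sketch of one such iterative operation, splitting at the first colon (the helper name is made up for illustration). It relies on a property of UTF-8: bytes below 0x80 never occur inside a multi-byte sequence, so a byte-wise search for an ASCII delimiter cannot land in the middle of a code point.

```cpp
#include <string>
#include <utility>

// Split a UTF-8 string at the first ':'.  A plain byte search is safe here
// because ASCII bytes never appear inside a multi-byte UTF-8 sequence.
// If there is no colon, the whole input is returned as the first part.
inline std::pair<std::string, std::string> splitAtFirstColon( const std::string& aUtf8 )
{
    const std::string::size_type pos = aUtf8.find( ':' );

    if( pos == std::string::npos )
        return { aUtf8, std::string() };

    return { aUtf8.substr( 0, pos ), aUtf8.substr( pos + 1 ) };
}
```

The offset returned by find() is a byte offset, which is all that substr() needs; no code point index is ever computed.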


Things like string sorting will also "just work" in UTF-8. It's designed 
that way so that lexicographically sorting by byte is the same as 
lexicographically sorting by code point[5].
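
The ordering property can be checked with a small sketch. The encoder below is a hand-written illustration (not a KiCad or wxWidgets function) and assumes its input is a valid Unicode scalar value; sameOrder() then compares the same two strings once as UTF-32 code points and once as UTF-8 bytes.

```cpp
#include <string>

// Encode one Unicode scalar value as UTF-8.  Assumes aCodePoint is valid
// (at most 0x10FFFF and not a surrogate); no validation is performed.
inline std::string encodeUtf8( char32_t aCodePoint )
{
    std::string out;

    if( aCodePoint < 0x80 )
    {
        out += static_cast<char>( aCodePoint );
    }
    else if( aCodePoint < 0x800 )
    {
        out += static_cast<char>( 0xC0 | ( aCodePoint >> 6 ) );
        out += static_cast<char>( 0x80 | ( aCodePoint & 0x3F ) );
    }
    else if( aCodePoint < 0x10000 )
    {
        out += static_cast<char>( 0xE0 | ( aCodePoint >> 12 ) );
        out += static_cast<char>( 0x80 | ( ( aCodePoint >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( aCodePoint & 0x3F ) );
    }
    else
    {
        out += static_cast<char>( 0xF0 | ( aCodePoint >> 18 ) );
        out += static_cast<char>( 0x80 | ( ( aCodePoint >> 12 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( ( aCodePoint >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( aCodePoint & 0x3F ) );
    }

    return out;
}

// Convert a UTF-32 string to UTF-8, one code point at a time.
inline std::string toUtf8( const std::u32string& aText )
{
    std::string out;

    for( char32_t cp : aText )
        out += encodeUtf8( cp );

    return out;
}

// True if comparing by code point and comparing by UTF-8 byte agree.
inline bool sameOrder( const std::u32string& aLhs, const std::u32string& aRhs )
{
    return ( aLhs < aRhs ) == ( toUtf8( aLhs ) < toUtf8( aRhs ) );
}
```

std::string compares its elements as unsigned char, which is what makes the byte-wise comparison line up with code point order.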


If you're dealing with known or expected text, you can certainly still 
index into a UTF-8/32 string. But never for text that's come from some 
Unicode source. It could be anything, even just 5 zero-width joiners 
in a row and that silly poo emoji at the end. That is a problem for 
HarfBuzz.


Yes, it's extremely annoying, but human language is a very complex thing.

Cheers,

John

[1]: https://bugs.launchpad.net/kicad/+bug/594064 (though I think a 
Hershey Chinese font would be "fun", I don't see it happening soon).

[2]: http://docs.kicad-pcb.org/doxygen/v6_road_map.html#v6_sch_sys_fonts
[3]: https://en.wikipedia.org/wiki/HarfBuzz
[4]: The iPhone SMS of Death was caused by a "nonsense" Unicode code 
point sequence.
[5]: And if you want "real" sorting, well, that's *also* locale 
dependent: in German DIN 5007-1, ö=o, 5007-2, ö=oe, in Swedish, ö is at 
the end, after ä.



Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Jeff Young

I think we’re a long way from handling Hiragana, Katakana or Kanji.  Probably 
the same for Tamil or Telugu, although I know nothing about them.

So why do we scan strings?  We do it when tokenizing, but all our tokens are 
roman (if not ascii), so that should be OK.

We also do it looking for numbers to increment.  We’d like this to work for 
other languages, but as long as their subsequent-code-points don’t look like 
roman digits I think we’re OK.  (Subsequent-code-points are all above ascii, 
right?).

We do some case conversions when doing compares.  But again, as long as 
subsequent code points don’t look like ascii we should be OK.  I assume 
capitalization algorithms don’t try to do it on Romaji or other 
non-ascii-coded roman characters?

When else do we scan strings?

> On 30 Apr 2019, at 21:35, John Beard  wrote:
> 
> On 30/04/2019 18:19, Jeff Young wrote:
>> I was referring to UCS-2 or UCS-4.  I’m evidently behind the times, though, 
>> because I now see that UTF-32 and UCS-4 are equivalent.
>> (Which means that both some of John’s original premises and my quote in teal 
>> below were wrong: UTF32 is indeed a one:one map between code points and 
>> chars.)
> 
> Kind of, depending on the definition of character. As long as you never get 
> any multi-code point "characters".
> 
>> So my proposal (in 2019) should be std::u32string (using UTF32 encoding, for 
>> which myString[3] still works).
> 
> By "works", what do you mean? Sure you can index into a UTF-32 string and 
> come up with a valid (whole) code point (and a valid code unit). But that 
> doesn't mean a lot: it could be the "ᄀ" (\u1100) from 가, which is actually 2 
> code points.
> 
> How often do we actually index into a string buffer by code point anyway, 
> without iterating the string to find something first? What does that even 
> mean in the context of a Unicode string?
> 
> Graphemes are not a strange and ignorable edge case: emojis may sound silly, 
> but lots of actual languages use grapheme clusters perfectly casually (Tamil, 
> Telugu[1], Hangul as above, etc). You either support Unicode or you don't, 
> you cannot pick and choose what is "reasonable" to support.
> 
> BTW, UTF-8 does allow you to index into it by byte and see if you're on a 
> code point boundary (if the byte starts 0b10xx, you are not). You can't 
> index to the n'th code point (but for what purpose?) and you still can't 
> index to the n'th grapheme, but you can't do that in *any* encoding.
> 
>> Better?
> 
> As long as we save our files as UTF-8, I don't really mind what we use 
> internally. But if you actually plan to manipulate strings that could be 
> Unicode and it comes from a user, you cannot do it only by code point, 
> regardless of representation.
> 
> Cheers,
> 
> John
> 
> [1]: Mishandling of Telugu produced the iPhone SMS of Death bug.



Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread John Beard


On 30/04/2019 18:19, Jeff Young wrote:

> I was referring to UCS-2 or UCS-4.  I’m evidently behind the times, though,
> because I now see that UTF-32 and UCS-4 are equivalent.
>
> (Which means that both some of John’s original premises and my quote in teal
> below were wrong: UTF32 is indeed a one:one map between code points and chars.)


Kind of, depending on the definition of character. As long as you never 
get any multi-code point "characters".



> So my proposal (in 2019) should be std::u32string (using UTF32 encoding, for
> which myString[3] still works).


By "works", what do you mean? Sure you can index into a UTF-32 string 
and come up with a valid (whole) code point (and a valid code unit). But 
that doesn't mean a lot: it could be the "ᄀ" (\u1100) from 가, which is 
actually 2 code points.
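
To make the Hangul example concrete, here is a small sketch with std::u32string. The helper names are illustrative, and the second code point (U+1161) and the precomposed form (U+AC00) are added here from the Unicode charts rather than taken from the thread.

```cpp
#include <string>

// 가 written in decomposed form: U+1100 (initial consonant) followed by
// U+1161 (vowel).  One visible syllable, two code points.
inline std::u32string decomposedGa()
{
    return U"\u1100\u1161";
}

// The same syllable in precomposed form: the single code point U+AC00.
inline std::u32string precomposedGa()
{
    return U"\uAC00";
}
```

Indexing decomposedGa()[0] returns a whole, valid code point (U+1100), yet it is only part of what the user sees as one character, and the two spellings do not compare equal unless they are normalised first.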


How often do we actually index into a string buffer by code point 
anyway, without iterating the string to find something first? What does 
that even mean in the context of a Unicode string?


Graphemes are not a strange and ignorable edge case: emojis may sound 
silly, but lots of actual languages use grapheme clusters perfectly 
casually (Tamil, Telugu[1], Hangul as above, etc). You either support 
Unicode or you don't, you cannot pick and choose what is "reasonable" to 
support.


BTW, UTF-8 does allow you to index into it by byte and see if you're 
on a code point boundary (if the byte starts 0b10xx, you are not). 
You can't index to the n'th code point (but for what purpose?) and you 
still can't index to the n'th grapheme, but you can't do that in *any* 
encoding.
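
A minimal sketch of that boundary check, assuming valid UTF-8 and an offset inside the string (the function name is hypothetical): starting from an arbitrary byte offset, step back while the byte is a continuation byte to reach the start of the enclosing code point.

```cpp
#include <cstddef>
#include <string>

// Given a byte offset into valid UTF-8 (aOffset < aUtf8.size()), return the
// offset of the first byte of the code point that contains it.  Continuation
// bytes have the bit pattern 10xxxxxx, so we step back past them.
inline std::size_t codePointStart( const std::string& aUtf8, std::size_t aOffset )
{
    while( aOffset > 0
           && ( static_cast<unsigned char>( aUtf8[aOffset] ) & 0xC0 ) == 0x80 )
    {
        --aOffset;
    }

    return aOffset;
}
```

At most three steps back are needed for valid input, since a UTF-8 sequence is never longer than four bytes.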



> Better?


As long as we save our files as UTF-8, I don't really mind what we use 
internally. But if you actually plan to manipulate strings that could be 
Unicode and it comes from a user, you cannot do it only by code point, 
regardless of representation.


Cheers,

John

[1]: Mishandling of Telugu produced the iPhone SMS of Death bug.


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Jeff Young

I was referring to UCS-2 or UCS-4.  I’m evidently behind the times, though, 
because I now see that UTF-32 and UCS-4 are equivalent.

(Which means that both some of John’s original premises and my quote in teal 
below were wrong: UTF32 is indeed a one:one map between code points and chars.)

So my proposal (in 2019) should be std::u32string (using UTF32 encoding, for 
which myString[3] still works).

Better?

Cheers,
Jeff.

PS: I was last in deep with this stuff during the early days of PDF & Acrobat — 
which was 30 years ago. ;)


> On 30 Apr 2019, at 18:05, Seth Hillbrand  wrote:
> 
> Am 2019-04-30 12:49, schrieb Jeff Young:
> 
>> You are correct that you also can’t do it with UTF32 strings, but
>> I’m not suggesting those.  I’m suggesting *unicode* strings.
>> That’s 1 code-point per character.  So myString[3] still works.
> 
> Sorry Jeff, I'm being slow here and must be missing an important point.  I 
> had thought that unicode was encoded by UTF-8, UTF-16, etc.  But it sounds 
> like you are referring to something different.  Is there a good place to look 
> for more information on the specific encoding you are suggesting?
> 
> -Seth



Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Seth Hillbrand


Am 2019-04-30 12:49, schrieb Jeff Young:


> You are correct that you also can’t do it with UTF32 strings, but
> I’m not suggesting those.  I’m suggesting *unicode* strings.
> That’s 1 code-point per character.  So myString[3] still works.


Sorry Jeff, I'm being slow here and must be missing an important point.  
I had thought that unicode was encoded by UTF-8, UTF-16, etc.  But it 
sounds like you are referring to something different.  Is there a good 
place to look for more information on the specific encoding you are 
suggesting?


-Seth


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Jeff Young

Thanks for the analysis, John.  However, those numbers are with wxWidgets’ 
performance optimizations (the ones that crash when multi-threaded), so we 
don’t really know how bad they would be without it.

Also, wxString hides the UTF8 serialization from us and makes for simpler code. 
 You can use myString[3] and get the 3rd character.  You can’t do that with 
UTF8 strings.
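
For illustration, this is roughly what indexing by code point costs on a plain UTF-8 std::string (a sketch with a made-up helper name, assuming valid UTF-8): operator[] yields a byte, so finding the n'th code point means scanning from the start.

```cpp
#include <cstddef>
#include <string>

// Return the byte offset of the aIndex'th code point (0-based) in a string
// assumed to hold valid UTF-8, or std::string::npos if the string has fewer
// code points than that.  This is a linear scan, not an O(1) lookup.
inline std::size_t nthCodePointOffset( const std::string& aUtf8, std::size_t aIndex )
{
    std::size_t seen = 0;

    for( std::size_t i = 0; i < aUtf8.size(); ++i )
    {
        // Bytes of the form 10xxxxxx continue the previous code point.
        if( ( static_cast<unsigned char>( aUtf8[i] ) & 0xC0 ) != 0x80 )
        {
            if( seen == aIndex )
                return i;

            ++seen;
        }
    }

    return std::string::npos;
}
```

For pure ASCII input the returned offset equals the index, which is why byte indexing appears to work until non-ASCII text shows up.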

You are correct that you also can’t do it with UTF32 strings, but I’m not 
suggesting those.  I’m suggesting *unicode* strings.  That’s 1 code-point per 
character.  So myString[3] still works. 

(I don’t think graphemes, ligatures, extended-code-points, etc. are a real 
problem for us.  Heck, our stroke font doesn’t even support 10% of unicode.)

Cheers,
Jeff.

Note: if there’s a std::string library that hides the UTF8 serialization from 
us I could be talked into using that.  I do agree that it looks like the 
performance wouldn’t be a deal-breaker.


> On 30 Apr 2019, at 17:22, John Beard  wrote:
> 
> On 30/04/2019 16:01, Jeff Young wrote:
>> Primarily for performance reasons.
> 
> WRT performance, I did a few benchmarks for reference (on Linux)
> 
> Loading this large CIAA PCB[1] allocates, out of a peak usage of 467MB of 
> heap with a 0.01% threshold:
> 
> * 9.6MB of std::basic_string::_M_assign
>   * 9.4MB of this is from wxString operator= assignments
> * ~600kB of std::basic_string::_M_construct, (wxString ctor)
> 
> So I'm not sure memory usage is a major factor to worry about (strings 
> allocate storage on the heap, so we should see basically all the interesting 
> things in the heap profile). UTF-8 could be as little as 1/4 UTF-32 (all 
> strings are ASCII), but even then, it's a few MB saved.
> 
> Now, in terms of performance, opening Pcbnew with no file gives:
> 
> #4  3.36%   __gconv_transform_utf8_internal
> #5  2.51%   __mbsrtowcs_l
> #6  2.50%   wxMBConv::ToWChar
> #8  2.07%   std::basic_string::_M_assign
> #9  1.88%   wxMBConvStrictUTF8::ToWChar
> #14 1.27%   EscapeString (kicad function)
> #17 0.85%   __GI___strlen_sse2
> #18 0.85%   wxUniChar::From8bit
> #19 0.84%   wxUniChar::operator==
> 
> And plenty more string-y things in the top 50 or so lines. So it seems the 
> biggest cost for strings is converting them from UTF-8 to wchar_t strings in 
> WX (this is probably not the same on Windows). But it's not really a stunning 
> cost.
> 
> However, loading the CIAA board, and there are basically no string operations 
> above 0.5%, and only a handful even above 0.25%. When doing DRC, strings 
> don't break 0.1%: nearly all the significant work is looking things up in 
> std::maps and geometry.
> 
> So string performance doesn't seem to be *that* critical, as it's quickly 
> drowned out under real workloads. It looks to me (and I'm happy to be 
> corrected, I'm not a perf expert), like string operations in KiCad are not 
> much of a bottleneck.
> 
> > Because characters are different lengths, you have to scan the string
> > to find the n’th character.
> 
> Even with UTF-32, you can only do an O(1) lookup of the n'th *code point* or 
> *code unit* (the same in UTF-32, not in UTF-8), not the n'th *encoded 
> character*.
> 
> That's true even if you normalise the strings first. Not all code points map 
> one-to-one to an encoded character (it can be one-to-none, one-to-one, 
> many-to-one). And that's even without considering grapheme clustering.
> 
> Cheers,
> 
> John
> 
> PS / OT: If we had to optimise one thing, 
> PolygonTriangulation::Vertex::inTriangle is the single hungriest function, 
> chewing 6.19% of all CPU time, double that of each of the next 3: 
> __gnu_cxx::__exchange_and_add (2.76%),  PolygonTriangulation::isEar (2.73%) 
> and even malloc (2.27%).
> 
> Other than that fairly mundane 6%-er, there are no eye-popping performance 
> hogs simply on loading a PCB. Which is nice.
> 
> [1]: 
> https://github.com/ciaa/Hardware/blob/master/PCB/ACC/CIAA_ACC/ciaa_acc.kicad_pcb


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Seth Hillbrand


On 2019-04-30 12:22, John Beard wrote:

> Even with UTF-32, you can only do an O(1) lookup of the n'th *code
> point* or *code unit* (the same in UTF-32, not in UTF-8), not the n'th
> *encoded character*.


+1 here.  I'd be in favor of standardizing on a clean, 
standards-compliant string library for internal work, converting to 
wxString only for user interaction.


My main beef with UTF-16 (and UTF-32) is that they don't display as 
human readable files without a UTF-16/UTF-32 compatible viewer.  All of 
our file formatting is ASCII with the exception of user-generated 
content.  So, right now, I can use any text viewer to read the files.  
Using UTF-8 preserves this ability but we'd lose this with u32string 
(unless we convert back for writing)
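
To make the readability concern concrete, here is a sketch (a hypothetical helper, with little-endian byte order chosen arbitrarily) that serialises ASCII text as UTF-32LE. Every character is followed by three NUL bytes, which is why a plain text viewer would not show such a file cleanly, whereas the UTF-8 encoding of ASCII text is the ASCII bytes themselves.

```cpp
#include <cassert>
#include <string>

// Serialise code points as UTF-32LE: four bytes each, least significant first.
inline std::string to_utf32le_bytes( const std::u32string& text )
{
    std::string bytes;
    bytes.reserve( text.size() * 4 );

    for( char32_t cp : text )
    {
        for( int shift = 0; shift < 32; shift += 8 )
            bytes += static_cast<char>( ( cp >> shift ) & 0xFF );
    }

    return bytes;
}

// Usage sketch: two ASCII characters become eight bytes, six of them NUL.
inline void check_utf32le_layout()
{
    const std::string bytes = to_utf32le_bytes( U"(k" );

    assert( bytes.size() == 8 );
    assert( bytes == std::string( "(\0\0\0k\0\0\0", 8 ) );
}
```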


There are some other, minor issues including byte-order marking, 
corruption re-synchronization and external library support that we'd need
to think closely about if we wanted to change.



> PS / OT: If we had to optimise one thing,
> PolygonTriangulation::Vertex::inTriangle is the single hungriest
> function, chewing 6.19% of all CPU time, double that of each of the
> next 3: __gnu_cxx::__exchange_and_add (2.76%),
> PolygonTriangulation::isEar (2.73%) and even malloc (2.27%).


FYI, I am currently working on modifying the triangulation.

-Seth


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread John Beard

On 30/04/2019 16:01, Jeff Young wrote:

> Primarily for performance reasons.

WRT performance, I did a few benchmarks for reference (on Linux)

Loading this large CIAA PCB[1] allocates, out of a peak usage of 467MB 
of heap with a 0.01% threshold:

* 9.6MB of std::basic_string::_M_assign
   * 9.4MB of this is from wxString operator= assignments
* ~600kB of std::basic_string::_M_construct, (wxString ctor)

So I'm not sure memory usage is a major factor to worry about (strings 
allocate storage on the heap, so we should see basically all the 
interesting things in the heap profile). UTF-8 could be as little as 1/4 
UTF-32 (all strings are ASCII), but even then, it's a few MB saved.
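
To put the 1/4 figure in concrete terms, here is a small sketch (not KiCad code; the function names are made up for illustration) that compares the payload size of an ASCII-only string stored as UTF-8 bytes against the same text stored as 32-bit code units. It ignores allocator overhead and small-string optimisation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Payload bytes of a UTF-8 string held in std::string (1 byte per code unit).
inline std::size_t payload_bytes( const std::string& s )
{
    return s.size() * sizeof( char );
}

// Payload bytes of text held as UTF-32 (one char32_t per code point).
inline std::size_t payload_bytes( const std::u32string& s )
{
    return s.size() * sizeof( char32_t );
}

// Widen an ASCII-only string to UTF-32; each byte maps to one code point.
inline std::u32string widen_ascii( const std::string& ascii )
{
    std::u32string out;
    out.reserve( ascii.size() );

    for( unsigned char c : ascii )
    {
        assert( c < 0x80 ); // precondition: ASCII only
        out.push_back( static_cast<char32_t>( c ) );
    }

    return out;
}
```

For a three-character ASCII name, `payload_bytes()` reports 3 bytes for the std::string and `3 * sizeof( char32_t )` for the widened copy, which is where the factor of four comes from on platforms with a 4-byte `char32_t`.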

Now, in terms of performance, opening Pcbnew with no file gives:

#4  3.36%   __gconv_transform_utf8_internal
#5  2.51%   __mbsrtowcs_l
#6  2.50%   wxMBConv::ToWChar
#8  2.07%   std::basic_string::_M_assign
#9  1.88%   wxMBConvStrictUTF8::ToWChar
#14 1.27%   EscapeString (kicad function)
#17 0.85%   __GI___strlen_sse2
#18 0.85%   wxUniChar::From8bit
#19 0.84%   wxUniChar::operator==

And plenty more string-y things in the top 50 or so lines. So it seems 
the biggest cost for strings is converting them from UTF-8 to wchar_t 
strings in WX (this is probably not the same on Windows). But it's not 
really a stunning cost.

However, loading the CIAA board, and there are basically no string 
operations above 0.5%, and only a handful even above 0.25%. When doing 
DRC, strings don't break 0.1%: nearly all the significant work is 
looking things up in std::maps and geometry.

So string performance doesn't seem to be *that* critical, as it's 
quickly drowned out under real workloads. It looks to me (and I'm happy 
to be corrected, I'm not a perf expert), like string operations in KiCad 
are not much of a bottleneck.

> Because characters are different lengths, you have to scan the string
> to find the n’th character.

Even with UTF-32, you can only do an O(1) lookup of the n'th *code 
point* or *code unit* (the same in UTF-32, not in UTF-8), not the n'th 
*encoded character*.

That's true even if you normalise the strings first. Not all code points 
map one-to-one to an encoded character (it can be one-to-none, 
one-to-one, many-to-one). And that's even without considering grapheme 
clustering.
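
The point about code points versus characters can be illustrated with a combining sequence. In this sketch (sample data chosen for illustration, function names hypothetical), "é" is written once as the single precomposed code point U+00E9 and once as "e" followed by U+0301 COMBINING ACUTE ACCENT. Both render as one character, but the second occupies two elements of a UTF-32 string, so O(1) indexing yields code points rather than user-perceived characters.

```cpp
#include <cassert>
#include <string>

// Precomposed form: one code point, U+00E9 LATIN SMALL LETTER E WITH ACUTE.
inline std::u32string e_acute_precomposed()
{
    return U"\u00E9";
}

// Decomposed form: U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT.
inline std::u32string e_acute_decomposed()
{
    return U"e\u0301";
}

// Usage sketch: same visible character, different element counts.
inline void check_code_points_vs_characters()
{
    assert( e_acute_precomposed().size() == 1 );
    assert( e_acute_decomposed().size() == 2 );

    // Index 1 of the decomposed form is the accent alone, not a character.
    assert( e_acute_decomposed()[1] == U'\u0301' );

    // Compared element by element, the two spellings are not equal.
    assert( e_acute_precomposed() != e_acute_decomposed() );
}
```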

Cheers,

John

PS / OT: If we had to optimise one thing, 
PolygonTriangulation::Vertex::inTriangle is the single hungriest 
function, chewing 6.19% of all CPU time, double that of each of the next 
3: __gnu_cxx::__exchange_and_add (2.76%),  PolygonTriangulation::isEar 
(2.73%) and even malloc (2.27%).

Other than that fairly mundane 6%-er, there are no eye-popping 
performance hogs simply on loading a PCB. Which is nice.

[1]: 
https://github.com/ciaa/Hardware/blob/master/PCB/ACC/CIAA_ACC/ciaa_acc.kicad_pcb


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Jeff Young

Primarily for performance reasons.  Because characters are different lengths, 
you have to scan the string to find the n’th character.

(wxString solves this with cached iterators, but they’re not thread-safe and so 
we get random crashes from them.)
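
For illustration, a minimal sketch of the linear scan being described, assuming well-formed UTF-8 held in a std::string (the function name is hypothetical). Finding the byte offset of the n'th code point costs O(n) because each preceding code point may occupy one to four bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the byte offset at which code point number n (0-based) starts, or
// std::string::npos if the string holds n or fewer code points.
inline std::size_t offset_of_code_point( const std::string& utf8, std::size_t n )
{
    std::size_t seen = 0;

    for( std::size_t i = 0; i < utf8.size(); ++i )
    {
        const unsigned char byte = static_cast<unsigned char>( utf8[i] );

        // Continuation bytes (10xxxxxx) belong to the current code point.
        if( ( byte & 0xC0 ) == 0x80 )
            continue;

        if( seen == n )
            return i;

        ++seen;
    }

    return std::string::npos;
}

// Usage sketch with "µF1": U+00B5 is two bytes (0xC2 0xB5), so 'F' starts at byte 2.
inline void check_offset_scan()
{
    const std::string text = "\xC2\xB5" "F1";

    assert( offset_of_code_point( text, 0 ) == 0 );
    assert( offset_of_code_point( text, 1 ) == 2 );
    assert( offset_of_code_point( text, 2 ) == 3 );
    assert( offset_of_code_point( text, 3 ) == std::string::npos );
}
```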

> On 30 Apr 2019, at 15:59, Dmitry Salychev  wrote:
> 
> On Tue, Apr 30, 2019 at 08:59:46AM -0400, Wayne Stambaugh wrote:
>> Given that std::wstring is platform dependent, I would be opposed to
>> using it.  I'm not opposed to std::u32string but UTF8 is pretty well
>> vetted so please keep that in mind.  I think the possibility of breakage
>> is low but I'm not naive enough to think that it's zero.  You would have
>> to do some serious testing to ensure the conversion of std::u32string to
>> and from UTF8 isn't broken before I would be comfortable merging it into
>> master.
>> 
>> Wayne
>> 
> These are just thoughts of a stranger.
> 
> Why not use std::string to hold the byte array that represents a UTF-8
> string?  The size of the string is then its length in bytes, and
> utf8::distance() [1] returns the number of code points, i.e. the
> length in symbols.
> 
> [1] http://utfcpp.sourceforge.net/
> 
> Regards,
> Dmitry
> 



Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Dmitry Salychev

On Tue, Apr 30, 2019 at 08:59:46AM -0400, Wayne Stambaugh wrote:
> Given that std::wstring is platform dependent, I would be opposed to
> using it.  I'm not opposed to std::u32string but UTF8 is pretty well
> vetted so please keep that in mind.  I think the possibility of breakage
> is low but I'm not naive enough to think that it's zero.  You would have
> to do some serious testing to ensure the conversion of std::u32string to
> and from UTF8 isn't broken before I would be comfortable merging it into
> master.
> 
> Wayne
> 
These are just thoughts of a stranger.

Why not use std::string to hold the byte array that represents a UTF-8
string?  The size of the string is then its length in bytes, and
utf8::distance() [1] returns the number of code points, i.e. the
length in symbols.

[1] http://utfcpp.sourceforge.net/
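
A self-contained sketch of this idea, assuming well-formed UTF-8 input. The helper below is a hand-written stand-in that counts code points by skipping continuation bytes; it is not the UTF8-CPP implementation and does no validation, and its name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points in a well-formed UTF-8 string: every byte that is not a
// continuation byte (10xxxxxx) starts a new code point.
inline std::size_t count_code_points( const std::string& utf8 )
{
    std::size_t count = 0;

    for( unsigned char byte : utf8 )
    {
        if( ( byte & 0xC0 ) != 0x80 )
            ++count;
    }

    return count;
}

// Usage sketch with "Ω10k": U+03A9 takes two bytes (0xCE 0xA9), the rest are ASCII.
inline void check_length_in_bytes_vs_code_points()
{
    const std::string value = "\xCE\xA9" "10k";

    assert( value.size() == 5 );               // length in bytes
    assert( count_code_points( value ) == 4 ); // length in code points
}
```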

Regards,
Dmitry


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Jeff Young

The GUI is strongly bound to wxWidgets anyway, so I don’t have an issue using 
wxString there.  Although I’m sure we could create our own “_()” macro at some 
point if it comes to that.

> On 30 Apr 2019, at 14:55, Wayne Stambaugh  wrote:
> 
> What about translated strings?  Is changing them from wxString to
> u32string going to be an issue?  I know this doesn't affect the file I/O
> strings but it is something we are going to have to consider if we are
> going to punt wxString.
> 
> On 4/30/19 9:27 AM, Jeff Young wrote:
>> Sure, but we’re going to be re-writing the parsers and formatters for
>> s-expr so it’s going to be all different code anyway.  (Granted the new
>> code could have used the old infrastructure, but I think we need to wean
>> ourselves from wxString either way.)
>> 
>>> On 30 Apr 2019, at 13:59, Wayne Stambaugh wrote:
>>> 
>>> Given that std::wstring is platform dependent, I would be opposed to
>>> using it.  I'm not opposed to std::u32string but UTF8 is pretty well
>>> vetted so please keep that in mind.  I think the possibility of breakage
>>> is low but I'm not naive enough to think that it's zero.  You would have
>>> to do some serious testing to ensure the conversion of std::u32string to
>>> and from UTF8 isn't broken before I would be comfortable merging it into
>>> master.
>>> 
>>> Wayne
>>> 
>>> On 4/30/19 7:32 AM, Jeff Young wrote:
>>>> I suspect all our platforms use at least 32 bit ints, but even so
>>>> std::u32string does communicate the intent better.
>>>>
>>>> So change the proposal to that….
>>>>
>>>> Cheers,
>>>> Jeff.
>>>>
>>>>> On 30 Apr 2019, at 10:52, Andrew Lutsenko wrote:
>>>>>
>>>>> Hi,
>>>>> I have no opinion on the matter but would add a reminder that wchar_t
>>>>> is platform and compiler dependent.
>>>>> Consider using std::u32string instead of std::wstring if you want all
>>>>> code points to fit into one element.
>>>>>
>>>>> Regards,
>>>>> Andrew
>>>>>
>>>>> On Tue, Apr 30, 2019 at 2:36 AM Jeff Young wrote:
>>>>>
>>>>>    We had talked earlier about throwing the wxWidgets UTF8 compile
>>>>>    switch to get rid of our wxString re-entrancy problems.  However,
>>>>>    I noticed that the 6.0 work packages doc includes an item for
>>>>>    std::string-ization of the BOARD.  (While a lot more work, this is
>>>>>    a better solution because it also increases our gui-toolkit-choice
>>>>>    flexibility.)
>>>>>
>>>>>    I’d like to propose that we use std::wstring for that.  UTF8
>>>>>    should *only* be an encoding format (similar to s-expr).  It
>>>>>    should never be used internally.  That’s what unicode wchar_t’s
>>>>>    are for.
>>>>>
>>>>>    And I’d like to propose that we extend std::wstring-ization to
>>>>>    SCH_ITEM and LIB_ITEM.  (Then we can get rid of a bunch of our
>>>>>    ugly mutex hacks.)



Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Wayne Stambaugh

What about translated strings?  Is changing them from wxString to
u32string going to be an issue?  I know this doesn't affect the file I/O
strings but it is something we are going to have to consider if we are
going to punt wxString.

On 4/30/19 9:27 AM, Jeff Young wrote:
> Sure, but we’re going to be re-writing the parsers and formatters for
> s-expr so it’s going to be all different code anyway.  (Granted the new
> code could have used the old infrastructure, but I think we need to wean
> ourselves from wxString either way.)
> 
>> On 30 Apr 2019, at 13:59, Wayne Stambaugh wrote:
>>
>> Given that std::wstring is platform dependent, I would be opposed to
>> using it.  I'm not opposed to std::u32string but UTF8 is pretty well
>> vetted so please keep that in mind.  I think the possibility of breakage
>> is low but I'm not naive enough to think that it's zero.  You would have
>> to do some serious testing to ensure the conversion of std::u32string to
>> and from UTF8 isn't broken before I would be comfortable merging it into
>> master.
>>
>> Wayne
>>
>> On 4/30/19 7:32 AM, Jeff Young wrote:
>>> I suspect all our platforms use at least 32 bit ints, but even so
>>> std::u32string does communicate the intent better.
>>>
>>> So change the proposal to that….
>>>
>>> Cheers,
>>> Jeff.
>>>
>>>> On 30 Apr 2019, at 10:52, Andrew Lutsenko wrote:
>>>>
>>>> Hi,
>>>> I have no opinion on the matter but would add a reminder that wchar_t
>>>> is platform and compiler dependent.
>>>> Consider using std::u32string instead of std::wstring if you want all
>>>> code points to fit into one element.
>>>>
>>>> Regards,
>>>> Andrew
>>>>
>>>> On Tue, Apr 30, 2019 at 2:36 AM Jeff Young wrote:
>>>>
>>>>     We had talked earlier about throwing the wxWidgets UTF8 compile
>>>>     switch to get rid of our wxString re-entrancy problems.  However,
>>>>     I noticed that the 6.0 work packages doc includes an item for
>>>>     std::string-ization of the BOARD.  (While a lot more work, this is
>>>>     a better solution because it also increases our gui-toolkit-choice
>>>>     flexibility.)
>>>>
>>>>     I’d like to propose that we use std::wstring for that.  UTF8
>>>>     should *only* be an encoding format (similar to s-expr).  It
>>>>     should never be used internally.  That’s what unicode wchar_t’s
>>>>     are for.
>>>>
>>>>     And I’d like to propose that we extend std::wstring-ization to
>>>>     SCH_ITEM and LIB_ITEM.  (Then we can get rid of a bunch of our
>>>>     ugly mutex hacks.)


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Jeff Young

Sure, but we’re going to be re-writing the parsers and formatters for s-expr so 
it’s going to be all different code anyway.  (Granted the new code could have 
used the old infrastructure, but I think we need to wean ourselves from 
wxString either way.)

> On 30 Apr 2019, at 13:59, Wayne Stambaugh  wrote:
> 
> Given that std::wstring is platform dependent, I would be opposed to
> using it.  I'm not opposed to std::u32string but UTF8 is pretty well
> vetted so please keep that in mind.  I think the possibility of breakage
> is low but I'm not naive enough to think that it's zero.  You would have
> to do some serious testing to ensure the conversion of std::u32string to
> and from UTF8 isn't broken before I would be comfortable merging it into
> master.
> 
> Wayne
> 
> On 4/30/19 7:32 AM, Jeff Young wrote:
>> I suspect all our platforms use at least 32 bit ints, but even so
>> std::u32string does communicate the intent better.
>> 
>> So change the proposal to that….
>> 
>> Cheers,
>> Jeff.
>> 
>>> On 30 Apr 2019, at 10:52, Andrew Lutsenko wrote:
>>> 
>>> Hi,
>>> I have no opinion on the matter but would add a reminder that wchar_t
>>> is platform and compiler dependent.
>>> Consider using std::u32string instead of std::wstring if you want all
>>> code points to fit into one element.
>>> 
>>> Regards,
>>> Andrew
>>> 
>>> On Tue, Apr 30, 2019 at 2:36 AM Jeff Young wrote:
>>> 
>>>    We had talked earlier about throwing the wxWidgets UTF8 compile
>>>    switch to get rid of our wxString re-entrancy problems.  However,
>>>    I noticed that the 6.0 work packages doc includes an item for
>>>    std::string-ization of the BOARD.  (While a lot more work, this is
>>>    a better solution because it also increases our gui-toolkit-choice
>>>    flexibility.)
>>> 
>>>    I’d like to propose that we use std::wstring for that.  UTF8
>>>    should *only* be an encoding format (similar to s-expr).  It
>>>    should never be used internally.  That’s what unicode wchar_t’s
>>>    are for.
>>> 
>>>    And I’d like to propose that we extend std::wstring-ization to
>>>    SCH_ITEM and LIB_ITEM.  (Then we can get rid of a bunch of our
>>>    ugly mutex hacks.)

Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Wayne Stambaugh

Given that std::wstring is platform dependent, I would be opposed to
using it.  I'm not opposed to std::u32string but UTF8 is pretty well
vetted so please keep that in mind.  I think the possibility of breakage
is low but I'm not naive enough to think that it's zero.  You would have
to do some serious testing to ensure the conversion of std::u32string to
and from UTF8 isn't broken before I would be comfortable merging it into
master.
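
As a sketch of the kind of round-trip check being asked for, the following hand-written encoder and decoder convert between std::u32string and UTF-8 bytes. This is illustrative only: the function names are hypothetical, the decoder assumes well-formed input and performs no validation, and a real conversion layer would need to reject overlong forms, surrogates and truncated sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode code points as UTF-8. Assumes every value is a valid Unicode scalar
// value (at most 0x10FFFF and not a surrogate).
inline std::string to_utf8( const std::u32string& in )
{
    std::string out;

    for( char32_t cp : in )
    {
        if( cp < 0x80 )
        {
            out += static_cast<char>( cp );
        }
        else if( cp < 0x800 )
        {
            out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else if( cp < 0x10000 )
        {
            out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else
        {
            out += static_cast<char>( 0xF0 | ( cp >> 18 ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
    }

    return out;
}

// Decode well-formed UTF-8 into code points. No validation is performed.
inline std::u32string from_utf8( const std::string& in )
{
    std::u32string out;
    std::size_t    i = 0;

    while( i < in.size() )
    {
        const unsigned char lead = static_cast<unsigned char>( in[i] );
        std::size_t         extra = 0; // continuation bytes that follow
        char32_t            cp = lead;

        if( lead >= 0xF0 )      { extra = 3; cp = lead & 0x07; }
        else if( lead >= 0xE0 ) { extra = 2; cp = lead & 0x0F; }
        else if( lead >= 0xC0 ) { extra = 1; cp = lead & 0x1F; }

        ++i;

        // Fold in six payload bits from each continuation byte.
        for( std::size_t k = 0; k < extra && i < in.size(); ++k, ++i )
            cp = ( cp << 6 ) | ( static_cast<unsigned char>( in[i] ) & 0x3F );

        out.push_back( cp );
    }

    return out;
}

// Usage sketch: text covering 1-, 2-, 3- and 4-byte sequences survives a round trip.
inline void check_round_trip()
{
    const std::u32string text = U"a\u00EF\u20AC\U0001F600";

    assert( to_utf8( text ).size() == 1 + 2 + 3 + 4 );
    assert( from_utf8( to_utf8( text ) ) == text );
}
```

A serious test suite would go well beyond this, for example by iterating over every scalar value and by feeding the decoder deliberately malformed input.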

Wayne

On 4/30/19 7:32 AM, Jeff Young wrote:
> I suspect all our platforms use at least 32 bit ints, but even so
> std::u32string does communicate the intent better.
> 
> So change the proposal to that….
> 
> Cheers,
> Jeff.
> 
>> On 30 Apr 2019, at 10:52, Andrew Lutsenko wrote:
>>
>> Hi,
>> I have no opinion on the matter but would add a reminder that wchar_t
>> is platform and compiler dependent.
>> Consider using std::u32string instead of std::wstring if you want all
>> code points to fit into one element.
>>
>> Regards,
>> Andrew
>>
>> On Tue, Apr 30, 2019 at 2:36 AM Jeff Young wrote:
>>
>> We had talked earlier about throwing the wxWidgets UTF8 compile
>> switch to get rid of our wxString re-entrancy problems.  However,
>> I noticed that the 6.0 work packages doc includes an item for
>> std::string-ization of the BOARD.  (While a lot more work, this is
>> a better solution because it also increases our gui-toolkit-choice
>> flexibility.)
>>
>> I’d like to propose that we use std::wstring for that.  UTF8
>> should *only* be an encoding format (similar to s-expr).  It
>> should never be used internally.  That’s what unicode wchar_t’s
>> are for.
>>
>> And I’d like to propose that we extend std::wstring-ization to
>> SCH_ITEM and LIB_ITEM.  (Then we can get rid of a bunch of our
>> ugly mutex hacks.)


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Jeff Young

I suspect all our platforms use at least 32 bit ints, but even so 
std::u32string does communicate the intent better.

So change the proposal to that….

Cheers,
Jeff.

> On 30 Apr 2019, at 10:52, Andrew Lutsenko  wrote:
> 
> Hi,
> I have no opinion on the matter but would add a reminder that wchar_t is 
> platform and compiler dependent.
> Consider using std::u32string instead of std::wstring if you want all code 
> points to fit into one element.
> 
> Regards,
> Andrew
> 
> On Tue, Apr 30, 2019 at 2:36 AM Jeff Young wrote:
> We had talked earlier about throwing the wxWidgets UTF8 compile switch to get 
> rid of our wxString re-entrancy problems.  However, I noticed that the 6.0 
> work packages doc includes an item for std::string-ization of the BOARD.  
> (While a lot more work, this is a better solution because it also increases 
> our gui-toolkit-choice flexibility.)
> 
> I’d like to propose that we use std::wstring for that.  UTF8 should *only* be 
> an encoding format (similar to s-expr).  It should never be used internally.  
> That’s what unicode wchar_t’s are for.
> 
> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and 
> LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex hacks.)


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Thomas Pointhuber

Hi,

I would like to drop this website into the discussion: https://utf8everywhere.org/

There are good points, and I would suggest storing strings internally with either 8-bit or 32-bit code units.

Regards, Thomas

Sent: Tuesday, 30 April 2019 at 11:52
From: "Andrew Lutsenko"
To: "Jeff Young"
Cc: "KiCad Developers"
Subject: Re: [Kicad-developers] 6.0 string proposal

Hi,
I have no opinion on the matter but would add a reminder that wchar_t is platform and compiler dependent.
Consider using std::u32string instead of std::wstring if you want all code points to fit into one element.

Regards,
Andrew

On Tue, Apr 30, 2019 at 2:36 AM Jeff Young <j...@rokeby.ie> wrote:

We had talked earlier about throwing the wxWidgets UTF8 compile switch to get rid of our wxString re-entrancy problems.  However, I noticed that the 6.0 work packages doc includes an item for std::string-ization of the BOARD.  (While a lot more work, this is a better solution because it also increases our gui-toolkit-choice flexibility.)

I’d like to propose that we use std::wstring for that.  UTF8 should *only* be an encoding format (similar to s-expr).  It should never be used internally.  That’s what unicode wchar_t’s are for.

And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex hacks.)


Re: [Kicad-developers] 6.0 string proposal

2019-04-30 Thread Andrew Lutsenko

Hi,
I have no opinion on the matter but would add a reminder that wchar_t is
platform and compiler dependent.
Consider using std::u32string instead of std::wstring if you want all code
points to fit into one element.
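
The platform dependence can be checked at compile time. A small sketch (the constant and function names are made up for illustration): `WCHAR_MAX` from `<cwchar>` tells you whether one `wchar_t` can hold every Unicode code point up to U+10FFFF, which is typically true with glibc (32-bit `wchar_t`) and false on Windows (16-bit `wchar_t`, UTF-16). A `char32_t` element can hold any code point on every conforming implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// True when a single wchar_t can represent any code point up to U+10FFFF.
constexpr bool wchar_holds_any_code_point = ( WCHAR_MAX >= 0x10FFFF );

// Number of elements needed for U+1F600, which lies outside the Basic
// Multilingual Plane, in a std::u32string.
inline std::size_t elements_u32()
{
    return std::u32string( U"\U0001F600" ).size();
}

// The same code point in a std::wstring; the answer depends on the platform.
inline std::size_t elements_wide()
{
    return std::wstring( L"\U0001F600" ).size();
}

// Usage sketch: u32string is always one element per code point.
inline void check_element_counts()
{
    assert( elements_u32() == 1 );

    // One element with a 32-bit wchar_t; typically two (a surrogate pair)
    // with a 16-bit wchar_t.
    assert( elements_wide() == ( wchar_holds_any_code_point ? 1u : 2u ) );
}
```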

Regards,
Andrew

On Tue, Apr 30, 2019 at 2:36 AM Jeff Young  wrote:

> We had talked earlier about throwing the wxWidgets UTF8 compile switch to
> get rid of our wxString re-entrancy problems.  However, I noticed that the
> 6.0 work packages doc includes an item for std::string-ization of the
> BOARD.  (While a lot more work, this is a better solution because it also
> increases our gui-toolkit-choice flexibility.)
>
> I’d like to propose that we use std::wstring for that.  UTF8 should *only*
> be an encoding format (similar to s-expr).  It should never be used
> internally.  That’s what unicode wchar_t’s are for.
>
> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM
> and LIB_ITEM.  (Then we can get rid of a bunch of our ugly mutex hacks.)