Hi, sorry for the delay - I had missed your reply mail, and then was AWOL for a couple days ;-)

On Thu, Aug 21, 2025 at 09:47:35AM +0300, Anders Gustafsson wrote:
> Yes. I do make assumptions based on how this app is used and yes it is
> C++/MFC on windows going from an old version with no specified coding
> to UTF.

"no specified coding"
Indeed - this often means that one "naturally"(?) has ACP behaviour,
IOW encountering all sorts of different encodings (depending on
current machine configuration global state crap) -
with potential DATA CORRUPTION niceties ensuing.
- as opposed to properly defined encoding handling boundaries where
various implementation layers do have firmly defined - and capable! -
encoding specs (UTF8, *or* UTF16...... *or*
specific legacy codepage stuff where somehow needed).

> The certificate data in this case is "regular" ie a PEM payload that is not
> UTF encoded, just:
>
> -----BEGIN CERTIFICATE-----
> xxxxxxx
> -----END CERTIFICATE-----
>
> Ie one byte per character. Filenames OTOH might be a different kettle of fish.

"one byte per character"
Imprecise [spec] characterization. I'd prefer "ASCII" for that.
Since if that PEM spec weren't ASCII[-compatible] *), you'd see
really funny characters on your monitor ;-)

> I probably should include a
> testcase where the path has non-ascii characters.

Yup, it probably is a very good idea to have a program (and its config
location) [made to be] rooted in a filesystem path which has
the most egregiously weird set of Unicode codepoints
(clapping-hands smileys, multiple languages) that one can imagine.
Then when having non-Unicode-compliant handling somewhere, one
hopefully *will* notice sufficiently soonish (Fail-Fast / Shift-Left).

> The string passed to CURLOPT_SSL_CTX_DATA is a CStringA so it is OK to
> convert to char, especially assuming
> that the data is just PEM

"convert to char"
Well, not a conversion I'd think, since the container type payload *is*
char-[byte-]typed
[[- which doesn't say anything at all about its actual encoding of course...]]

> Yes. String handling in Windows is weird.
"Q: Are you a linuxer? Is this a concealed religious fight against Windows?" https://utf8everywhere.org/#faq.linuxer [[[yup, me is a Linuxer :-)]]] > I sort of understand the performance aspect of using UTF-16, but it > makes other things harder. I guess another option would be to use std::string > all the way. Though note that being std::string-typed is not in any way, shape or form related to encoding. (while utf8everywhere.org says that std::string is [to always be] UTF8 encoding **), which is an appreciable recommendation, of course in itself the std::string type doesn't really have such a fixed meaning - unfortunately? c.f. u8string though). IMHO the main point is being aligned to usual protocol [comfort zone]. In implementation areas very closely related to Win32 APIs, do UTF16 (...W()). In most other areas, don't (thus, UTF8 of course - to keep up Unicode-compliant encodings fundamental requirements...). That way one will be able to directly [conveniently] consume various provided APIs (e.g.: - CStringW::MakeUpper() etc. - window SetWindowText[W]() etc. - registry SetPrivateProfile*[W]() etc. - perhaps other woefully Win32-specific stuff such as shlwapi Path*[W]() ) "How to do text on Windows" https://utf8everywhere.org/#windows > Which is pretty much what I do. All certificate data is assumed to be PEM and > coded as ASCII with one byte > per character. Yup, probably encoding (thus: transcoding) {c|sh}ould be disregarded here since this PEM stuff ought to be ASCII subset only *) (nevermind certain rather exotic encodings which actually are *NOT* ASCII compatible, such as EBCDIC - where one *would* need proper full transcoding, to have the correct encoding representation established!). "one byte per character" Don't. IMHO that's some unhelpful thinking / unrelated noise (if I am planning a nice BBQ meal then I certainly won't be analyzing each single charcoal with a scanning electron microscope on its carbon content either... :-)). 

Things are a certain specific encoding [spec] [name], always.
Whether that's *realized* via some SBCS or MBCS (/DBCS) mechanism or
UTF8 ***) or UTF16 or UTF32 or whichever other "bits fumbling" mechanics
DOES NOT MATTER AT ALL ****).

Thus simply "either correct or wrong"
(there's one encoding which is the correct one and
9999 others which are wrong :-)).
Thus, transcoding. Unless I definitely know that
a certain encoding already is compatible
(since e.g.: supporting ASCII subset, where I am needing ASCII compat)
and thus can [afford to] take the shortcut and skip it (in such cases).

*) which it most certainly is (simple base64-based patterns):
https://serverfault.com/questions/9708/what-is-a-pem-file-and-how-does-it-differ-from-other-openssl-generated-key-file
"base64 translation of the x509 ASN.1 keys."

**) most certainly because that's the [almost] only way to have
a byte-typed container provide a Unicode-compliant encoding...

***) also an "MBCS", of course...
((just a VERY INCOMPATIBLE one in case of
Win32 CP_ACP protocol affected areas...))

****) well, except where working on string data
(doing active _string processing_) - with
corresponding properly compatible APIs, of course...

Greetings

Andreas Mohr

> --
> Kind regards
>
> Anders Gustafsson, engineer
> [email protected] | Support +358 18 12060 | Direct +358 9 315 45 121 | Mobile +358 40506 7099
>
> Pedago interaktiv ab, Nygatan 7 B , AX-22100 MARIEHAMN, ÅLAND, FINLAND
>
> >>> Andreas Mohr <[email protected]> 2025-08-20 17:44 >>>
> Hi,
>
> disclaimer:
> quite experienced in certain areas, yet not too much in others (CURL).
>
> TL;DR:
> thus discussing potential string encoding issues only.
>
> On Wed, Aug 20, 2025 at 03:58:14PM +0300, Anders Gustafsson via curl-library wrote:
> > So, yes this is windows ?????? libcurl/8.15.0-DEV OpenSSL/3.5.2 zlib/1.3.1
> >
> > I had some issues and I just want to check whether I am going about this
> > the right way.
> > The function calls an
> > API where the client certificate is used to authenticate the caller so in
> > the original version I used the
> > sslctx_function(). To complicate matters does my app support PEM
> > certificates and keys in two different ways:
> > 1. As files (Say on a removable secure media) and 2. As strings in the
> > database for ease of use.
> >
> > The first way (filenames) worked right away, ie:
> >
> > m_Certificate.Trim();
> > if (m_Certificate.IsEmpty())
> >     curl_easy_setopt(curl, CURLOPT_SSLCERT,
> >                      m_CertificateFile.GetString());
>
> Bleeep - ATLMFC CStringT::GetString() encountered.
> This might thus be
> dirty "encoding shortcutting" here
> (simply invoking .GetString() to
> "quickly" "get at" some "char"-compatibly-typed - hah! - input, rather than
> doing active transcoding to
> the actually *correct* encoding spec of
> some char-typed handling).
>
> Thus, consulting this one:
>
> > Where m_Certificate and m_Key are regular (char) strings with the PEM coded
> > data.
>
> What would "regular" mean?
>
> Considering that
>   CString errfilename;
> with
>   CT2A(errfilename),
> one would think that this is
> a CString[T] with UNICODE config setting (put differently, CStringW)
> environment, however
> since you said "regular (char) strings", I am assuming that
> you are on !UNICODE config (i.e., CStringA).
>
>
> https://manpages.ubuntu.com/manpages/kinetic/man3/CURLOPT_SSLCERT.3.html
> Yeah nice - that page does not specify at all which
> encoding char *cert (a filesystem item argument!! - which could be
> containing all the smileys available in this universe, with
> only a bit of luck...) is expected to have.
> Thus on Windows one would tend to
> assume ACP crap - which of course would mean that it is
> Unicode-compliance-broken (since: neither UTF16 nor UTF8 nor UTF7 nor
> UTF-EBCDIC or whatever ;-)).
>
> > errno_t fileerr = fopen_s(&errfile, CT2A(errfilename), "w+, ccs=UTF-8");
>
> Unicode-compliance-broken filesystem item handling!
>
> CT2A(errfilename) will be
> wide-typed to CP_ACP transition (in UNICODE config setting), and
> "nothing" *) (in non-UNICODE).
>
> *) BTW *HORRIBLE* atlconv.h comment "// Code page doesn't matter" atlconv.h
> transcoding protocol breakage!!!
> (Yeah, as if that would be the case for
> e.g. CP_ACP to CP_UTF8 transcoding, which *is* a valid transcoding use
> case...)
> (think of
>   CT2CA(..., CP_UTF8)
> protocol behaviour **DIFFERENCE**)
>
>
> Thus, your errfilename possibly is ACP (CP_ACP, GetACP()) content,
> which would be
> "compatible" since
> fopen_s() API is equally ACP-specced on Windows (yuck).
> ...but: then it would be
> Unicode-compliance-broken (due to
> being ACP crap, rather than
> e.g. UTF8 as usually on Linux).
>
> Since fopen_s() has a wide-typed counterpart (_wfopen_s()),
> the way to go would at least be
>   CT2W(errfilename) - thereby
> properly preserving Unicode-compliant (since wide-typed!) encoding (when
> on UNICODE config setting - and ACP crap on !UNICODE).
>
> Or, better do utf8everywhere.org (i.e., std::string[-means-utf8] -
> to have ensured that
> **every** string traffic anywhere is Unicode-compliant), and thus do
>   std::string errfilename = "myUtf8InputStuffStringFromSomewhere"; // e.g. std::filesystem **) API
>   _wfopen_s(... CA2W(errfilename.c_str(), CP_UTF8) ...);
>
>
> **) rather *horribly* Unicode-compliance-broken (on Windows!) - I digress...
> "<filesystem>: prevent filesystem::path dangerous conversions to/from default
> code page encoding"
> https://github.com/microsoft/STL/issues/909
>
>
> > In the second scenario, PEM in database, I had some problems and I just
> > wanted to check that the code I came
> > up with is sane.
> > Ie the authentication will not happen unless I have both
> > certificate and key, so:
> >
> > if (!m_Certificate.IsEmpty())
> > {
> >     curl_easy_setopt(curl, CURLOPT_SSL_CTX_FUNCTION, sslctx_function);
> >     curl_easy_setopt(curl, CURLOPT_SSL_CTX_DATA, m_Certificate.GetString());
>
> WARNING CORRUPTION: CURLOPT_SSL_CTX_DATA has a void*-typed ptr arg
> (thus .GetString() of either CString[T] variant is accepted, *silently*).
> IOW, once on UNICODE config setting it would be
> *broken*.
>
> So, questionable encoding stuff again.
> According to
> https://curl.se/libcurl/c/CURLOPT_SSL_CTX_DATA.html
>   "char *mypem = /* CA cert in PEM format"
> it seems to *appear* that
> "PEM format" means some plain ASCII-only payload.
>
> Now to be maximally precise one could do
>   const UINT nCP_PEM = 20127 /*CP_ASCII*/ /* these clearly are [to be!] all ASCII-only, right!!!? */;
>   std::string strCertificate(CW2A(CA2W(m_Certificate), nCP_PEM));
> (this transition expects that m_Certificate has system ACP content, of course)
>
>
> (OTOH one could just assume that
> all [relevant] ACP encodings are ASCII-compatible
> (i.e. contain ASCII as a subset), thus
> simply NOT do transcoding since it then ought to be
> ASCII-compliant content already anyway).
>
>
> > Then, below, which seems to work OK. I first used the example here:
> > https://curl.se/libcurl/c/CURLOPT_SSL_CTX_FUNCTION.html
> > but that one did not fix my key for me. Yes, this code still leaves
> > allocated memory in case of errors.
>
> "fix my key" - that wording might hint at
> encoding issues.
> But perhaps we're talking about
> a plain CURL certificate config issue only after all.
>
> I could not precisely identify (thus: discuss) particular ***) issues in
> your handling, but I'd hope that
> this will give you some ideas (*if* it is an encoding issue).
>
> ***) well, except for the broken fopen_s() filesystem item handling...
>
> Greetings
>
> Andreas Mohr

--
Epidämliche Plage rationaler Schlagseite
