We started with SHA256 (see
https://gitlab.com/kicad/code/kicad/-/blob/master/include/embedded_files.h?ref_type=heads#L67
).

The problem with cryptographic hashes is that they are slow.  By design.
This affected the load time for KiCad files.  So we decided to use a
known-good, fast hash function.  It is used internally by libstdc++, nginx,
npm, Elasticsearch and a number of other projects.  The lack of a
command-line tool was not a consideration, and even if it had been, I do not
think that trading slower load times for everyone for easy command-line
decoding by the limited number of people who would want it would have been
a good choice.  The same goes for zstd: gzip output was about 40% larger
for many files.  There's just not a good reason to make everyone deal with
larger file sizes in order to accommodate a niche use case.
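
For anyone who wants to check such a hash without a dedicated command-line
tool, here is a minimal pure-Python sketch of the 128-bit x64 variant of
MurmurHash3, following the public reference code in smhasher.  Several things
here are assumptions rather than statements about KiCad: that the 128-bit x64
variant is the relevant one, that the seed 0xABBA2345 (taken from the question
quoted below) is correct, and how the two 64-bit halves end up formatted in
the file.  The function name murmur3_x64_128 is made up for this example.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x, r):
    # Rotate a 64-bit value left by r bits.
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k):
    # Final avalanche step from the reference implementation.
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def _mix_k1(k1):
    k1 = (k1 * C1) & MASK64
    k1 = _rotl64(k1, 31)
    return (k1 * C2) & MASK64


def _mix_k2(k2):
    k2 = (k2 * C2) & MASK64
    k2 = _rotl64(k2, 33)
    return (k2 * C1) & MASK64


def murmur3_x64_128(data, seed=0):
    """Return (h1, h2), the two unsigned 64-bit halves of the hash."""
    h1 = h2 = seed & 0xFFFFFFFF
    nblocks = len(data) // 16

    # Body: consume the input in 16-byte little-endian blocks.
    for i in range(nblocks):
        k1, k2 = struct.unpack_from("<QQ", data, i * 16)
        h1 ^= _mix_k1(k1)
        h1 = _rotl64(h1, 27)
        h1 = (h1 + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(k2)
        h2 = _rotl64(h2, 31)
        h2 = (h2 + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64

    # Tail: the remaining 0..15 bytes.  A zero k1/k2 leaves h1/h2
    # unchanged, so no branching on the tail length is needed.
    tail = data[nblocks * 16:]
    h2 ^= _mix_k2(int.from_bytes(tail[8:], "little"))
    h1 ^= _mix_k1(int.from_bytes(tail[:8], "little"))

    # Finalization.
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix64(h1)
    h2 = _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return h1, h2


if __name__ == "__main__":
    # Seed taken from the question in this thread; treat it as unverified.
    h1, h2 = murmur3_x64_128(b"example payload", seed=0xABBA2345)
    print(f"{h1:016X} {h2:016X}")
```

If a result does not match what another library reports, the usual suspects
are the order in which the two halves are combined, signed versus unsigned
output, and whether the hash was taken over the raw bytes or over the encoded
or compressed form.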

You seem to be annoyed by the changes to the KiCad file format.  The changes that
you called out were specifically made in order to standardize the format.
This is well-documented on the dev-docs page.

The choice of SEXPR was made in 2011 and committed in 2012.  JSON was
standardized in 2013.  JSON libraries for C++ weren't released until 2015.
SEXPR had been well-established for many years including usage in other EDA
tools.  It was the right choice for the time and it remains a decent
platform.  Everyone likes to Monday-morning quarterback old decisions but
it shows a lack of historical understanding around the project.

Seth
Seth Hillbrand
*Lead Developer*
+1-530-302-5483
Long Beach, CA
www.kipro-pcb.com    [email protected]


On Fri, Jan 17, 2025 at 1:58 AM Salvador E. Tropea <[email protected]>
wrote:

> Hi Seth:
> On 15/1/25 15:06, 'Seth Hillbrand' via KiCad Developers wrote:
>
> The data for embedded files follows the SEXPR format (
> https://datatracker.ietf.org/doc/draft-rivest-sexp/).  Base64 is supposed
> to be bracketed by the pipes.  This allows third-party sexpr parsers to
> more easily handle our data format when we follow conventions.  We did not
> do this for the images and that was an oversight.  Eventually, images will
> be added to the embedded files format and the distinction will go away.
>
>
> Ok, note that images already changed in the past, a pity they didn't get
> the correct format. (Note: data was "xxxx" and changed to xxxx)
>
> This is not the only thing that is constantly changing, things like "hide
> -> (hide yes)" or "(uuid xxxx) -> (uuid "xxxx")" pop up quite often. I guess
> somebody should be in charge of approving the way things are implemented in
> the file formats. Not to mention documenting it before a release, and I mean
> documenting the new release, not the previous one.
>
> BTW: This is related to the popularity issue, if the change of format from
> custom to Sexp had been from custom to JSON (IMHO far more popular than
> Sexp) these errors would not have happened. You have plenty of libs and
> tools to implement and verify JSON.
>
>
> We use MurMur3 hash -- unmodified from the source at
> https://github.com/aappleby/smhasher.  You might look at things like
> https://stackoverflow.com/questions/75921577/murmur3-hash-compatibility-between-go-and-python
> to determine why your method is different.  Yes, it is fast.  No, it is not
> worse.  We do have robust and popular hashes.  This is one of them.  That
> is why we use it.
>
>
> I see we have a quite different idea of what is popular. Let me clarify:
> if you get a minimal Linux core, let's say the docker image for
> "debian:bookworm-slim" (a slim version of Debian Bookworm intended to be
> the base for other docker images) you'll find MD5, SHA256, SHA512, SHA224,
> SHA384 and a few more hashes implemented with command line commands. If you
> take a language like Python (included in KiCad) and take a look at the
> standard hashlib module you'll find SHA1, SHA224, SHA256, SHA384, SHA512,
> SHA-3 and MD5. These are popular hash algorithms.
>
> Now if you take a look at MMH3 ... even the command line tool is rare and
> hard to find! Not supported by core Python, more than one competing
> module on PyPI, and the most popular implements MMH2, not MMH3. The one that
> implements MMH3 isn't popular enough to be part of Debian. For me this
> isn't a popular hash.
>
> The compression used (Zstandard) is becoming popular, but isn't really
> popular. If you use Base64 + GZip + MD5 your data can be processed by a
> shell script on most (if not all) modern Unix style OSs and you don't need
> extra dependencies for Python.
>
>
> Bug reports for preferred behavior are great to receive at GitLab.
>
>
> You mean the image data vs embedded file inconsistency?
>
>
> Regards, SET
>
>
>
>
> On Wed, Jan 15, 2025 at 5:03 AM Salvador E. Tropea <[email protected]>
> wrote:
>
>> Hi All!
>>
>> Given the lack of documentation, I have some questions:
>>
>> 1) Why does the data for embedded files seem to be so different from the
>> data for images? I mean, images are base64 encoded and stored as (data
>> STRING) with the string separated in chunks; before KiCad 8 it wasn't a
>> string, so it looked like keywords. KiCad 8 fixed it. And now embedded
>> files are (data |KEYWORDS|) ... Why the |? Why not strings? Can someone
>> explain it?
>>
>> 2) The checksum seems to be a really rare one, is it MurMur Hash 3 with
>> seed 0xABBA2345? I can't find a popular command line tool to verify it.
>> I tried the "mmh3" Python module using `mmh3.hash128(c,
>> seed=0xABBA2345)` (with c as the bytes from the file decoded and
>> uncompressed) and couldn't reproduce the checksum. I guess this rare
>> hash is fast, but is it worse? I mean: we have various robust and popular
>> hashes, why this?
>>
>> BTW: I find it strange that after choosing to embed fonts the dialog
>> doesn't immediately show them. They are there when I save, but I think
>> they should be there before.
>>
>> Regards, SET
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "KiCad Developers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/a/kicad.org/d/msgid/devlist/b4479a17-148b-490d-8058-4c82225e4e11%40inti.gob.ar
>> .
>>

-- 
You received this message because you are subscribed to the Google Groups 
"KiCad Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/a/kicad.org/d/msgid/devlist/CAFdeG-qae16LWcBo5BpN46GBrS%2BZ_4AAQVruup57nnkQn-Ufkw%40mail.gmail.com.
