On Thursday, 31 March 2016 at 16:46:42 UTC, Adam D. Ruppe wrote:
> On Thursday, 31 March 2016 at 16:38:59 UTC, Anon wrote:
>> I've been spending my D time thinking about potential changes to how template string value parameters are encoded.
>
> How does it compare to simply gzipping the string and writing it out with base62?

My encoding is shorter in the typical use case, at least when using xz instead of gzip. (With xz it was quicker and easier to get raw compressed data without a header.)

1 = raw UTF-8, 2 = my encoder, 3 = `echo -n "$1" | xz -Fraw | base64`

---
1. some_identifier
2. some_identifier_
3. AQAOc29tZV9pZGVudGlmaWVyAA==

1. /usr/include/d/std/stdio.d
2. usrincludedstdstdiod_jqacdhbd
3. AQAZL3Vzci9pbmNsdWRlL2Qvc3RkL3N0ZGlvLmQA

1. Hello, World!
2. HelloWorld_0far4i
3. AQAMSGVsbG8sIFdvcmxkIQA=

1. こんにちは世界
2. XtdCDr5mL02g3rv
3. AQAU44GT44KT44Gr44Gh44Gv5LiW55WMAA==
---

The problem is that compression isn't magical, and a string needs to be long enough and have enough repetition to compress well. If it isn't, compression causes the data to grow, and base64 compounds that. For the sake of fairness, let's also do a larger (compressible) string.

Input: 1000 lines, each with the text "Hello World"

1. 12000 bytes
2. 12008 bytes
3. 94 bytes
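
For anyone who wants to reproduce the column 3 measurements without a shell, here is a rough Python sketch of the same pipeline. It assumes that `xz -Fraw` with no explicit filter chain amounts to a single LZMA2 filter at preset 6. The helper names are made up for illustration, and I have not checked that the output is byte-identical to the command line tool.

```python
import base64
import lzma

# Assumption: a single LZMA2 filter at xz's default preset (6),
# which is what `xz -Fraw` is taken to use when no filters are given.
RAW_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def xz_raw_base64(text: str) -> str:
    """Compress the UTF-8 bytes as a headerless LZMA2 stream, then base64 it."""
    raw = lzma.compress(
        text.encode("utf-8"), format=lzma.FORMAT_RAW, filters=RAW_FILTERS
    )
    return base64.b64encode(raw).decode("ascii")


def xz_raw_base64_decode(encoded: str) -> str:
    """Inverse of xz_raw_base64; the decoder needs the same filter chain."""
    raw = base64.b64decode(encoded)
    data = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=RAW_FILTERS)
    return data.decode("utf-8")


short = "some_identifier"
long_text = "Hello World\n" * 1000  # 1000 lines, 12000 bytes

for sample in (short, long_text):
    encoded = xz_raw_base64(sample)
    # Print input size in bytes and encoded size in characters.
    print(len(sample.encode("utf-8")), "->", len(encoded))
```

The short input should come out longer than it went in, and the long one far shorter, which is the same pattern as in the tables above.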

However, my encoding is still fairly compressible, so we *could* route it through the same compression if/when a symbol is determined to be compressible. That yields 114 bytes.

The other thing I really like about my encoder is that plain C identifiers are left visible verbatim in the result. That would be especially nice with, e.g., opDispatch.

Would a hybrid approach (my encoding, optionally using compression when it would be advantageous) make sense? My encoder already has to process the whole string, so it could do some sort of analysis to estimate how compressible the result would be. I don't know what that would look like, but it could work.
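
As a very rough sketch of the hybrid idea, the simplest version skips the estimate entirely: compress the already-encoded string and keep whichever form is shorter. This is Python rather than D, and everything in it is an assumption for illustration. The one-character tags are hypothetical, the input stands in for my encoder's output (which is not reproduced here), and base64 is only a placeholder, since a real mangling would need an alphabet restricted to identifier characters.

```python
import base64
import lzma

# Assumption: same raw LZMA2 setup as the xz pipeline above.
RAW_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

# Hypothetical tags so a demangler could tell the two forms apart.
TAG_PLAIN = "p"
TAG_COMPRESSED = "z"


def hybrid(encoded: str) -> str:
    """Return the shorter of the plain form and its compressed form.

    `encoded` stands in for the output of the custom encoder.
    """
    raw = lzma.compress(
        encoded.encode("utf-8"), format=lzma.FORMAT_RAW, filters=RAW_FILTERS
    )
    packed = base64.b64encode(raw).decode("ascii")
    # Only pay for compression when it actually saves space.
    if len(packed) < len(encoded):
        return TAG_COMPRESSED + packed
    return TAG_PLAIN + encoded


print(hybrid("some_identifier_"))
print(len(hybrid("Hello World\n" * 1000)))
```

Short symbols stay readable in the plain form, and only long repetitive ones take the compressed path. A cheaper up-front estimate could replace the trial compression later if it turns out to be too slow.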

Alternatively, we could do the compression on whole mangled names, not just the string values, but I don't know how desirable that is.
