>On Sat, Mar 23, 2024 at 1:44 AM <to...@tuxteam.de> wrote:
>> On Sat, Mar 23, 2024 at 12:53:24AM -0500, Albretch Mueller wrote:
>> out of a HAR file containing lots of obfuscating js cr@p and all kinds of
>> nonsense I was able to extract line looking like:

>It's not "js cr@p", It is called JSON. And there's a spec for
>it.

 Well, I am old enough to remember when JSON meant "JavaScript Object
Notation", in the form of human-readable attribute:value text files.

 a) using a Chromium-derived browser, which can dump the log of the
network back and forth as a HAR file, go to, e.g.:

 https://en.wikipedia.org/wiki/Anaxagoras

 b) click on the link that says: "Works by or about Anaxagoras" (at
Internet Archive)

 c) on the archive.org page, select "texts" and "always available"
(meaning texts in the public domain; he died 25 centuries ago)

 d) then, to produce the HAR file:
 d.1) More Tools > Developer Tools;
 d.2) click on the "Network" tab;
 d.3) Filter: GET
 d.4) check: "Preserve Log"
 d.5) scroll all the way down the page to make the client-server back
and forth cascade
 d.6) save the network log as a HAR file, then open it and eyeball it!

>> I have tried substring substitution, sed et tr to no avail.

>You might have a lot of fun trying to parse JSON with sed and
>tr.

 1) That HAR file is not properly formatted. Instead of
"attribute":value pairs in the standard way, they have used backslash +
quote pairs (\" instead of just ") erratically all around the file. That
is why you can't use jq.

 2) Since they (archive.org) keep changing the format they use on their
pages (to avoid HTML scrapers?), I don't try to make sense of what they
do. I would just use quick hacks and "keep moving":

 2.a) make an editing copy of the file

 2.b) using sed, I would break out the lines with the data I need:

 sed --in-place --expression 's/{\\"index\\":\\"/\n{\\"index\\":\\"/g' "<editing copy>"

 2.c) once you extract them, you then need to parse the fields for
post-processing. I have tried substring substitution, sed and tr to
first replace all backslash + quote pairs with plain quotes, to then be
able to use jq in the happy way you should. I haven't been successful
(is that the reason why they obfuscate their pages in that way?)

 lbrtchx
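
 A minimal sketch of step 2.c in Python instead of sed, under one
assumption that has not been checked against this particular file: that
the \" pairs are ordinary JSON string escaping. As far as I know, a HAR
file is itself JSON and keeps each response body as a string under
log.entries[].response.content.text, so a JSON body would appear there
with every quote escaped. The function names and the sample data below
are made up for illustration.

```python
import json


def embedded_json_bodies(har):
    """Yield every response body in a parsed HAR dict that is itself JSON."""
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        text = content.get("text")
        # Skip empty bodies and bodies the browser stored base64-encoded.
        if not text or content.get("encoding") == "base64":
            continue
        try:
            yield json.loads(text)
        except json.JSONDecodeError:
            continue  # body is not JSON (HTML, JavaScript, ...)


def unescape_fragment(fragment):
    """Decode a cut-out piece such as {\\"index\\":\\"x\\"}.

    The piece is treated as the inside of a JSON string literal: the first
    json.loads removes the backslashes, the second parses the result.
    """
    return json.loads(json.loads('"' + fragment + '"'))


# Hypothetical stand-in for a saved HAR file: one JSON body, one HTML body.
inner = {"index": "texts", "hits": 3}
har_text = json.dumps({"log": {"entries": [
    {"response": {"content": {"mimeType": "application/json",
                              "text": json.dumps(inner)}}},
    {"response": {"content": {"mimeType": "text/html",
                              "text": "<html></html>"}}},
]}})

# The serialized file shows the same \" pairs seen when eyeballing it.
assert '\\"index\\"' in har_text

har = json.loads(har_text)
bodies = list(embedded_json_bodies(har))
print(bodies)
```

 With a real file, har would come from json.load() on the saved HAR
instead of the in-memory sample. unescape_fragment() is only for pieces
already cut out with sed as in 2.b; it fails on a piece that was cut in
the middle of a record.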