On 3/22/24 22:53, Albretch Mueller wrote:
Out of a HAR file containing lots of obfuscating JS cr@p and all kinds of
nonsense, I was able to extract lines looking like:
var00='{\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAAAAAMAAJ\",\"title\":\"Die
Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\",
\"Mr. ABC123\"],\"collection\":[\"europeanlibraries\",
\"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]}'
echo "// __ \$var00: |$var00|"
The final result that I need would look like:
var02='bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl
Rosenkranz", "Mr. ABC123"]|["europeanlibraries",
"americana"]|1843|["German"]|797368506|[50.629513]'
echo "// __ \$var02: |$var02|"
I have tried substring substitution, sed, and tr, to no avail.
lbrtchx
My daily driver:
2024-03-23 04:02:27 dpchrist@laalaa
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ cat /etc/debian_version; uname -a; perl -v | head -n 2 | grep .
11.9
Linux laalaa 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31)
x86_64 GNU/Linux
This is perl 5, version 32, subversion 1 (v5.32.1) built for
x86_64-linux-gnu-thread-multi
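The var00 value in the question carries \" escapes inside single quotes, so the shell keeps the backslashes and the string is not valid JSON as it stands. A minimal sketch of one way to strip them, assuming no string value itself contains an escaped quote or a backslash (the shortened sample record and the variable name json are hypothetical, for illustration only):

```shell
#!/bin/sh
# Hypothetical sample: a shortened record with the same \" escaping as var00.
var00='{\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAAAAAMAAJ\",\"year\":1843}}'

# Replace every backslash-quote pair with a plain double quote.
# Assumption: no value contains an escaped quote or backslash of its own;
# if one did, this blanket substitution would corrupt it.
json=$(printf '%s\n' "$var00" | sed 's/\\"/"/g')

# One record per line, ready to be appended to a data file.
printf '%s\n' "$json"
```

If real data can contain embedded quotes, a blanket sed substitution is too blunt and the unescaping would need to be done by something that understands the quoting.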
Put the JSON into a data file, one record per line (my mailer line-wraps
the listing below; the actual data.json contains exactly two lines):
2024-03-23 04:22:20 dpchrist@laalaa
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ cat data.json
{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die
Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz", "Mr.
ABC123"],"collection":["europeanlibraries",
"americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}
{"index":"prod-h-007","fields":{"identifier":"abc_de_12FGHIJKLMNO","title":"My
Title","creator":["Some Body", "Somebody
Else"],"collection":["europeanlibraries",
"americana"],"year":2024,"language":["English"],"item_size":1234567890},"_score":[12.345678]}
A Perl script to read newline-delimited JSON records and print each as a
pipe-delimited line (add --debug to also dump the decoded structure):
2024-03-23 04:28:59 dpchrist@laalaa
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ cat munge-json
#!/usr/bin/perl
# $Id: munge-json,v 1.3 2024/03/23 11:28:58 dpchrist Exp $
# Refer to debian-user 3/22/24 22:53 Albretch Mueller
# "trying to parse lines from an awkwardly formatted HAR file"
# by David Paul Christensen dpchr...@holgerdanske.com
# Public Domain

use strict;
use warnings;

use Data::Dumper;
use JSON;
use Getopt::Long;

$Data::Dumper::Sortkeys = 1;

my $debug;

GetOptions('debug|d' => \$debug) or die;

# Read one JSON record per line from the named files or from stdin.
while (<>) {
    my $rh = decode_json $_;

    # With --debug, dump the decoded structure before the output line.
    print Data::Dumper->Dump([$rh], [qw(rh)]) if $debug;

    # Scalars print as-is; arrays are rebuilt in bracketed, quoted form.
    print
        join('|',
            $rh->{fields}{identifier},
            $rh->{fields}{title},
            '["' . join('", "', @{$rh->{fields}{creator}}) . '"]',
            '["' . join('", "', @{$rh->{fields}{collection}}) . '"]',
            $rh->{fields}{year},
            '["' . join('", "', @{$rh->{fields}{language}}) . '"]',
            $rh->{fields}{item_size},
            '[' . join(', ', @{$rh->{_score}}) . ']',
        ), "\n";
}
Run the script with a file argument, or as a Unix filter reading standard input:
2024-03-23 04:30:16 dpchrist@laalaa
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ ./munge-json data.json
bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl
Rosenkranz", "Mr. ABC123"]|["europeanlibraries",
"americana"]|1843|["German"]|797368506|[50.629513]
abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody
Else"]|["europeanlibraries",
"americana"]|2024|["English"]|1234567890|[12.345678]
2024-03-23 04:30:18 dpchrist@laalaa
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ cat data.json | ./munge-json
bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl
Rosenkranz", "Mr. ABC123"]|["europeanlibraries",
"americana"]|1843|["German"]|797368506|[50.629513]
abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody
Else"]|["europeanlibraries",
"americana"]|2024|["English"]|1234567890|[12.345678]
David