Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Albretch Mueller
> Archive.org has a well-documented API at
> https://archive.org/developers/. There's even a command-line tool
> (assuming one doesn't want to use, say, the python library).

I had read through their API fairly thoroughly some time ago, but
didn’t find anything that interesting, and I was thinking of
developing a Java GraalVM API which would be more customizable and
easily reusable for other text banks. I took a second look at it and
they still don’t address their own problems: duplicated texts (the
exact same text/publication under different identifiers) and
non-standardized metadata definitions, e.g. "fr.", "french", "French",
"fr", … to specify the language. Author names are entered as free text
as well ... so what is the point of even having an API when the
metadata is neither well defined nor well kept?
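Just to illustrate the kind of normalization such sloppy metadata forces on every client, a throwaway sketch (the label variants come from the examples above; the mappings and target codes are my own guesses, not anything archive.org defines):

```shell
# Fold the free-text language labels seen in the metadata into one code.
# The variants and the target codes here are illustrative guesses only.
normalize_lang() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '. ')" in
        fr|french|francais)    echo "fr" ;;
        de|ger|german|deutsch) echo "de" ;;
        *)                     echo "und" ;;  # "undetermined"
    esac
}

normalize_lang "French"   # prints: fr
normalize_lang "fr."      # prints: fr
```

Every consumer of the API ends up writing (and disagreeing about) some table like this, which is exactly the point: it belongs in the registry, not in the clients.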



Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Greg Wooledge
On Sat, Mar 23, 2024 at 02:05:06PM -0500, Albretch Mueller wrote:
> Actually, in order to deX-Y it in case anyone can offer any help, it
> is more like "I want an index of all the books which have ever been
> written/published" in order to read all of them ;-)

First of all, you will not achieve this goal.  It is not possible for
a human to read every book that has ever been written.  You'll die
before you can even finish a tiny fraction of them.

So, let's say you have a more realistic goal: you want a list of all the
books written by Charles Dickens.

I tried to figure out how to get this out of archive.org but it looks
like their documentation doesn't match their web page.  I started at
 which shows how to
get a list of "items" which all share a common "parent".  I figure
an author might be a reasonable parent.  So then the next question is
how to get the author ID for Charles Dickens.

Next I went to

which tells me I should perform a search on their front page, and
then on the result page, click something called "Media List".

This is where it all falls apart for me.  I can't find a "Media List"
thing to click on.

The documentation also mentions an "ABOUT" that I should be able
to click on to get an Identifier.  Well, that's not a thing I could
find either.  There's an ABOUT link in the top menu bar, which goes to
 which is clearly not what the documentation
was talking about.

All this is far too much of my time wasted trying to help some random
person with an off-topic question on debian-user, so... good luck.



Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Albretch Mueller
Greg Wooledge via lists.debian.org

> Furthermore, whatever method you are using to *create* this HAR file
> is questionable, since apparently you aren't even getting a properly
> formatted file in the end.
>
> So, putting these together, it looks like you are taking a file that
> was intended to be used for diagnosing browser/network performance
> issues, and attempting to use this in place of a downloadable index
> of documents from archive.org.

Well, the Chromium HAR log utility captured that file as a
HAR-formatted log of sorts describing the client-server back and
forth, and the Linux file utility tells me it is: "JSON text data".
You may also go to:

https://archive.org/search?query=Euklid+OR+Euclid+OR+Euclides&and%5B%5D=lending%3A%22is_readable%22

save that page, and tell me what you can do with its content. This is
what I mean by hellishly obfuscated "js cr@p", and I can't understand
why archive.org would do that.


> Do you have one of these HAR files in a *DIRECTLY DOWNLOADABLE URL*?

The sample JSON file (the HAR file from archive.org) I am using right
now was uploaded to:


https://ergosumus.files.wordpress.com/2024/03/karl_rosenkranz02_ia.har_.odt


date
url="https://ergosumus.files.wordpress.com/2024/03/karl_rosenkranz02_ia.har_.odt";
time wget -q --spider --no-verbose --server-response "${url}"; _wgetq=$?
echo "// __ \$_wgetq: |$_wgetq|"

Sat Mar 23 01:39:17 PM CDT 2024
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 23 Mar 2024 18:38:16 GMT
Content-Type: application/vnd.oasis.opendocument.text
Content-Length: 686303
Connection: keep-alive
Last-Modified: Sat, 23 Mar 2024 17:01:03 GMT
Expires: Thu, 18 Apr 2024 19:04:42 GMT
X-Orig-Src: 01_mogdir
X-nc: MISS mdw 24 np
X-Content-Type-Options: nosniff
Alt-Svc: h3=":443"; ma=86400
Accept-Ranges: bytes

real 0m0.582s
user 0m0.080s
sys 0m0.069s

// __ $_wgetq: |0|

~

$ date
Sat Mar 23 11:59:53 AM CDT 2024

$ ls -l Karl_Rosenkranz02_IA.har.*
-rw-r--r-- 1 user user  686303 Mar 23 11:59 Karl_Rosenkranz02_IA.har.odt
-rw-r--r-- 1 user user 4290474 Mar 21 19:17 Karl_Rosenkranz02_IA.har.txt
-rw-r--r-- 1 user user  686303 Mar 23 11:59 Karl_Rosenkranz02_IA.har.zip

$ file --brief Karl_Rosenkranz02_IA.har.*
Zip archive data, at least v2.0 to extract, compression method=deflate
JSON text data
Zip archive data, at least v2.0 to extract, compression method=deflate

$ file Karl_Rosenkranz02_IA.har.*
Karl_Rosenkranz02_IA.har.odt: Zip archive data, at least v2.0 to extract, compression method=deflate
Karl_Rosenkranz02_IA.har.txt: JSON text data
Karl_Rosenkranz02_IA.har.zip: Zip archive data, at least v2.0 to extract, compression method=deflate

$ sha256sum Karl_Rosenkranz02_IA.har.*
95c2bf849d67b6812193b72fc8504fcab71b49da7937ea8fd9421bee4075ac86  Karl_Rosenkranz02_IA.har.odt
79dd5a23748db1a7270927b6c16fc28cfff59eaf804ba24b2443da578903ede2  Karl_Rosenkranz02_IA.har.txt
95c2bf849d67b6812193b72fc8504fcab71b49da7937ea8fd9421bee4075ac86  Karl_Rosenkranz02_IA.har.zip

~

or you could:


a) go: https://en.wikipedia.org/wiki/Karl_Rosenkranz

b) click on: Works by or about Karl Rosenkranz (at Internet Archive)

c) on the archive.org page, select "texts" and "always available"
(meaning text which is public domain)

d) open "More Tools" ... as I explained before (by d.5 I meant you
may have to scroll down or use key-press combinations to "manually"
get all records); in Rosenkranz's case I got 169 texts.

~

> This tells me we're deep inside an X-Y problem. The original goal is
> possibly something like "I want an index of all the books about this
> Greek dude". Maybe start from there, and see what answers you get.

Actually, in order to deX-Y it in case anyone can offer any help, it
is more like "I want an index of all the books which have ever been
written/published" in order to read all of them ;-)


Data registries mind only their own extant entries. There is no
general, "orbis unum" registry of all texts ("texts" meant broadly, in
a philological, semiological sense: videos, paintings, ...), just the
registry itself, not the extant data. Terribly persuasive silly me
tried to explain this idea to the archive.org folks and they told me
off.

What would that registry be good for? Well, let me use self-serving
metaphors: some time ago people didn't know how many people lived in
their countries or even their cities, where the Nile started, or what
a map of the Earth would look like ... There was a moment in the
history of humankind when one person could actually have read all
extant literature (at least relating to one culture, say, "natural
philosophy"). Technically it is not so hard: according to Google, some
130 million books have been printed since the invention of the
printing press. Not that many, anyway. The idea of reading them all
seized me when I was little, after reading a one-liner by some
Perugian dude (as cannibalized by me):


"the greatest of all gifts and graces that God has granted us is the
capacity of overcoming oneself".


No

Re: How does the 64bits time_t transition work?

2024-03-23 Thread Jeffrey Walton
On Wed, Mar 20, 2024 at 4:23 AM Brad Rogers  wrote:
>
> On Wed, 20 Mar 2024 08:22:16 +0100
> Detlef Vollmann  wrote:
>
> >Is there a description anywhere how the 64bit time transition works?
>
> I'm far from an expert, but from what I've read, this transition is
> *huge*.  Possibly the largest that has ever occurred in Debian.  It's
> going to take time to get it done.  Lots, and lots, of time.  In the
> meanwhile, it means a good deal of disruption in Sid/unstable.
>
> You should already be aware that running sid comes with certain
> difficulties, and if you're not prepared/willing to deal with them then,
> in all likelihood, Sid isn't for you.

Some folks don't have a choice. To run Debian ports in a Debian
QEMU/Chroot, you have to run Unstable in the guest. You cannot run
Stable or Testing in the guest.

I guess the other choice is to forgo testing on various Debian
architectures. But that seems like a worse choice for everyone
involved. Personally, I would not feel good about this path. I don't
want Debian users and Debian packagers to experience problems I should
have caught during testing.

> Following Marco's advice would be a good first step, IMO.

I don't think this migration was planned well. Debian should have
created a temporary *-t64 port, and then released the appropriate
ISOs. Later, when things got stable, the *-t64 port could have been
merged back into the standard port all at once.

Jeff



Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Darac Marjal


On 23/03/2024 16:34, Greg Wooledge wrote:
> On Sat, Mar 23, 2024 at 11:55:04AM -0400, Greg Wooledge wrote:
>> On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
>>>  1) That HAR file is not properly formatted. Instead of
>>> "attribute":value pairs in the standard way, they have used front
>>> slash + quote pairs (instead of just quotes) erratically all around
>>> the file. That is why you can't use jq.
>>
>> That is not what I see in the file which I pasted here.
>
> Further investigation:
>
> https://google.com/search?q=what+is+a+HAR+file
>
>   https://www.keycdn.com/support/what-is-a-har-file
>   Jan 12, 2023 — A HAR file is primarily used for identifying
>   performance issues, such as bottlenecks and slow load times, and page
>   rendering problems.
>
>   https://en.wikipedia.org/wiki/HAR_(file_format)
>   The HTTP Archive format, or HAR, is a JSON-formatted archive file
>   format for logging of a web browser's interaction with a site.
>   ...
>   This document was never published by the Web Performance Working Group
>   and has been abandoned.
>
> So, putting these together, it looks like you are taking a file that
> was intended to be used for diagnosing browser/network performance
> issues, and attempting to use this in place of a downloadable index
> of documents from archive.org.
>
> Furthermore, whatever method you are using to *create* this HAR file
> is questionable, since apparently you aren't even getting a properly
> formatted file in the end.
>
> This tells me we're deep inside an X-Y problem.  The original goal is
> possibly something like "I want an index of all the books about this
> Greek dude".  Maybe start from there, and see what answers you get.


If someone were looking to query a Web service programmatically,
wouldn't the first place to start be checking whether the service has
an API?


Archive.org has a well-documented API at 
https://archive.org/developers/. There's even a command-line tool 
(assuming one doesn't want to use, say, the python library).
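For instance, the advanced-search endpoint can be hit straight from the shell; a sketch (endpoint and parameter names taken on trust from the developer docs, with jq's `@uri` doing the percent-encoding):

```shell
# Build a query URL for archive.org's advanced-search endpoint
# (endpoint and parameters assumed from https://archive.org/developers/).
q='creator:"Charles Dickens" AND mediatype:texts'
enc=$(jq -rn --arg q "$q" '$q | @uri')
echo "https://archive.org/advancedsearch.php?q=${enc}&fl[]=identifier&fl[]=title&rows=5&output=json"
```

Fetching the printed URL with `curl -s` and piping the result through `jq '.response.docs'` should yield identifier/title pairs, assuming the endpoint still behaves as documented.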






Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Greg Wooledge
On Sat, Mar 23, 2024 at 11:55:04AM -0400, Greg Wooledge wrote:
> On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
> >  1) That HAR file is not properly formatted. Instead of
> > "attribute":value pairs in the standard way, they have used front
> > slash + quote pairs (instead of just quotes) erratically all around
> > the file. That is why you can't use jq.
> 
> That is not what I see in the file which I pasted here.

Further investigation:

https://google.com/search?q=what+is+a+HAR+file

  https://www.keycdn.com/support/what-is-a-har-file
  Jan 12, 2023 — A HAR file is primarily used for identifying
  performance issues, such as bottlenecks and slow load times, and page
  rendering problems.

  https://en.wikipedia.org/wiki/HAR_(file_format)
  The HTTP Archive format, or HAR, is a JSON-formatted archive file
  format for logging of a web browser's interaction with a site.
  ...
  This document was never published by the Web Performance Working Group
  and has been abandoned.

So, putting these together, it looks like you are taking a file that
was intended to be used for diagnosing browser/network performance
issues, and attempting to use this in place of a downloadable index
of documents from archive.org.

Furthermore, whatever method you are using to *create* this HAR file
is questionable, since apparently you aren't even getting a properly
formatted file in the end.

This tells me we're deep inside an X-Y problem.  The original goal is
possibly something like "I want an index of all the books about this
Greek dude".  Maybe start from there, and see what answers you get.



Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Greg Wooledge
On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
>  a) using a chromium-derived browser, which can be used to dump the
> HAR file log of the network back and forth, go, e. g.:
>   https://en.wikipedia.org/wiki/Anaxagoras
>  b) click on the link that says: "Works by or about Anaxagoras" (at
> Internet Archive)
>  c) on the archive.org page, select "texts" and "always available"
> (meaning text which is public domain, he died 25 centuries ago)
>  d) then to produce the HAR file, go:
>  d.1) More Tools > Developer Tools;
>  d.2) click on "Network" tab;
>  d.3) Filter: GET
>  d.4) check: "Preserve Log"
>  d.5) scroll down the page all the way to make the client-server back
> and forth cascade
>  d.6) save the network log as HAR file to then open and eyeball it!

This is incomprehensible to me.  What the hell is d.5 supposed to be?
Even if I close the Shift-Ctrl-I window, and Ctrl-R to reload the page,
and then reopen Shift-Ctrl-I, and click the down-arrow-in-a-dish icon
whose tooltip says "Export HAR..." all I get in the resulting file
is this:

hobbit:~$ cat Downloads/archive.org.har 
{
  "log": {
"version": "1.2",
"creator": {
  "name": "WebInspector",
  "version": "537.36"
},
"pages": [],
"entries": []
  }
}hobbit:~$ 

Do you have one of these HAR files in a *DIRECTLY DOWNLOADABLE URL*?
Something that doesn't take 12 manual steps that are impossible to
perform?

Or can you *attach* one to a message to this mailing list?  Make sure
it's small.

>  1) That HAR file is not properly formatted. Instead of
> "attribute":value pairs in the standard way, they have used front
> slash + quote pairs (instead of just quotes) erratically all around
> the file. That is why you can't use jq.

That is not what I see in the file which I pasted here.



Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Albretch Mueller
> On Sat, Mar 23, 2024 at 1:44 AM  wrote:
>> On Sat, Mar 23, 2024 at 12:53:24AM -0500, Albretch Mueller wrote:
>>> out of a HAR file containing lots of obfuscating js cr@p and all kinds of
>>> nonsense I was able to extract lines looking like:
>
> It's not "js cr@p"; it is called JSON. And there's a spec for
> it.

 Well, I am old enough to remember when JSON meant: "JavaScript Object
Notation" in the form of human-readable attribute:value text files.

 a) using a chromium-derived browser, which can be used to dump the
HAR file log of the network back and forth, go, e. g.:
  https://en.wikipedia.org/wiki/Anaxagoras
 b) click on the link that says: "Works by or about Anaxagoras" (at
Internet Archive)
 c) on the archive.org page, select "texts" and "always available"
(meaning text which is public domain, he died 25 centuries ago)
 d) then to produce the HAR file, go:
 d.1) More Tools > Developer Tools;
 d.2) click on "Network" tab;
 d.3) Filter: GET
 d.4) check: "Preserve Log"
 d.5) scroll down the page all the way to make the client-server back
and forth cascade
 d.6) save the network log as HAR file to then open and eyeball it!

>> I have tried substring substitution, sed et tr to no avail.
>
> You might have a lot of fun trying to parse JSON with sed and
> tr.

 1) That HAR file is not properly formatted. Instead of
"attribute":value pairs in the standard way, they have used front
slash + quote pairs (instead of just quotes) erratically all around
the file. That is why you can't use jq.
 2) since they (archive.org) have been changing the format they use on
their pages (to avoid html scrapers?), I don't try to make sense of
what they do. I would just use quick hacks and "keep moving".
 2.a) make an editing copy of the file
 2.b) using sed, I would parse out the lines with the data I need:
  sed --in-place --expression
's/{\\"index\\":\\"/\n{\\"index\\":\\"/g' ""
 2.c) once you have extracted them, you then need to parse the fields
for post-processing.

 I have tried substring substitution, sed et tr to first replace all
front slash + quote pairs with plain quotes, so as then to be able to
use jq in the happy way you should. I haven't been successful (is that
the reason why they obfuscate their pages in that way?)
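That said, on a single extracted record the substitution does seem workable; a minimal sketch (using fields from the var00 sample in my earlier message, and assuming the only backslashes in the line are the escaped quotes):

```shell
# The line as extracted (single quotes keep the backslashes literal):
line='{\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAMAAJ\",\"year\":1843}}'

# Turn \" back into plain quotes, then let jq do the real parsing:
printf '%s\n' "$line" | sed 's/\\"/"/g' | jq -r '.fields.identifier'
# prints: bub_gb_O2EAMAAJ
```

Whether that holds up across the whole file depends on no stray backslashes appearing anywhere else in the data.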

 lbrtchx



Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread David Christensen

On 3/22/24 22:53, Albretch Mueller wrote:

out of a HAR file containing lots of obfuscating js cr@p and all kinds of
nonsense I was able to extract lines looking like:

var00='{\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAMAAJ\",\"title\":\"Die
Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\",
\"Mr. ABC123\"],\"collection\":[\"europeanlibraries\",
\"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]}'
echo "// __ \$var00: |$var00|"

The final result that I need would look like:
var02='bub_gb_O2EAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl
Rosenkranz", "Mr. ABC123"]|["europeanlibraries",
"americana"]|1843|["German"]|797368506|[50.629513]'
echo "// __ \$var02: |$var02|"

I have tried substring substitution, sed et tr to no avail.

lbrtchx



My daily driver:

2024-03-23 04:02:27 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ cat /etc/debian_version; uname -a; perl -v | head -n 2 | grep .
11.9
Linux laalaa 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) 
x86_64 GNU/Linux
This is perl 5, version 32, subversion 1 (v5.32.1) built for 
x86_64-linux-gnu-thread-multi



Put the JSON into a data file, one record per line (my mailer is 
line-wrapping data.json -- it contains two lines):


2024-03-23 04:22:20 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ cat data.json
{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz", "Mr. ABC123"],"collection":["europeanlibraries", "americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}
{"index":"prod-h-007","fields":{"identifier":"abc_de_12FGHIJKLMNO","title":"My Title","creator":["Some Body", "Somebody Else"],"collection":["europeanlibraries", "americana"],"year":2024,"language":["English"],"item_size":1234567890},"_score":[12.345678]}



A Perl script to read newline-delimited JSON records and pretty print each:

2024-03-23 04:28:59 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ cat munge-json
#!/usr/bin/perl
# $Id: munge-json,v 1.3 2024/03/23 11:28:58 dpchrist Exp $
# Refer to debian-user 3/22/24 22:53 Albretch Mueller
# "trying to parse lines from an awkwardly formatted HAR file"
# by David Paul Christensen dpchr...@holgerdanske.com
# Public Domain
use strict;
use warnings;
use Data::Dumper;
use JSON;
use Getopt::Long;

$Data::Dumper::Sortkeys = 1;

my $debug;
GetOptions('debug|d' => \$debug) or die;

while (<>) {
    my $rh = decode_json $_;
    print Data::Dumper->Dump([$rh], [qw(rh)]) if $debug;
    print join('|',
        $rh->{fields}{identifier},
        $rh->{fields}{title},
        '["' . join('", "', @{$rh->{fields}{creator}}) . '"]',
        '["' . join('", "', @{$rh->{fields}{collection}}) . '"]',
        $rh->{fields}{year},
        '["' . join('", "', @{$rh->{fields}{language}}) . '"]',
        $rh->{fields}{item_size},
        '[' . join(', ', @{$rh->{_score}}) . ']',
    ), "\n";
}


Run the script as a Unix filter:

2024-03-23 04:30:16 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ ./munge-json data.json
bub_gb_O2EAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]
abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody Else"]|["europeanlibraries", "americana"]|2024|["English"]|1234567890|[12.345678]


2024-03-23 04:30:18 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ cat data.json | ./munge-json
bub_gb_O2EAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]
abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody Else"]|["europeanlibraries", "americana"]|2024|["English"]|1234567890|[12.345678]



David



Re: Root password strength

2024-03-23 Thread Michael Kjörling
On 22 Mar 2024 20:01 -0400, from ler...@gmail.com (Lee):
> The IPv4 address space is only 32 bits long.  Scanning 2^32 = about
> 4,000,000,000 addresses for an open port is easily doable.
> The IPv6 address space is a bit harder...  Let's just say that 7/8th
> of the IPv6 address space is reserved[1] so that means 2^125 addresses
> would need to be scanned .. which just isn't going to happen.
> There are ways for attackers to get the IPv6 address scan space down
> to a reasonable number.  I probably don't know most of them..

You are correct that the globally assigned unicast IPv6 address range
is a /3 out of 128 bits so 2^125 addresses. (2000::/3 out of ::/0.)

But only a tiny sliver of that address space is actually assigned to
anyone on the global Internet.

One can start by looking at the core routing tables and routing
announcements that form the Internet backbone. My guess, without
having looked, would be that you'd be looking at maybe _at most_ say a
/10 (although likely not contiguous) which actually routes anywhere at
all in the default-free zone. It might well be significantly less than
that.

If you're already willing to do something like this, I strongly
suspect DNS in particular can help narrow the range down further. For
example, you could iterate over /32s and see which of those have any
reverse DNS set up by looking for corresponding delegations in
ip6.arpa. That'll miss some, but should catch the majority of actively
used assignments.

You can probably eliminate most /64s more or less immediately by
trying to reach _any_ address within each, because most /64s likely
won't be in use and therefore won't route.

Also, while addresses within each /64 look random, there is probably
ample opportunity to optimize the search there, for example through
EUI assignment prefix tables and IPv6 address node-portion generation
rules. And once someone connects anywhere directly (that is, not
through something like a VPN concentrator, which will substitute its
own outgoing address), whatever system was connected to at a minimum
has a known-good address to check.

And all this is just things I can think of right now. I wouldn't be
the least surprised if there are many more optimizations that can be
made by someone who actually spends some time looking into this.

So while scanning the IPv6 address space certainly is a larger
undertaking than it is for IPv4, **scanning the IPv6 address space is
far less than the 2^93 times harder** than scanning the IPv4 address
space that one might expect looking only at _possible_ address length.
IPv6 addresses look random to the human eye, but especially in the
network /64 half of the address, they are far from randomly assigned.

Also, IPv6 typically being used with globally routable addresses
everywhere (as the Internet was meant to be) means that having good
firewalling is a _must_ in the present-day environment. If you do,
then having a globally routable IP address assigned to an end node is
not much of an issue.

-- 
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”



Re: Root password strength

2024-03-23 Thread Michael Kjörling
On 22 Mar 2024 17:26 +0500, from avbe...@gmail.com (Alexander V. Makartsev):
>     This is because of how IPv4 network address translation (NAT) works, to
> allow multiple LAN hosts to connect to Internet with single IP address
> assigned by Internet Service Provider (ISP).

A NAT router might also implement firewalling functionality, but _NAT
is not a firewall_.

Dropping traffic because it is prohibited (or because it's not
allowed) is _not_ the same thing as dropping traffic because the
device doesn't know what to do with it.


> Now, I don't want to scaremonger and feed anyone's paranoia, but for the
> sake of completion, there are known cases in history when router/firewall
> had vulnerabilities, or firmware flaws, or configuration negligence, that
> allowed perpetrators to 'hack' them, as in gain full access and control over
> their firmware and gain network access to LAN hosts.
> These cases are extremely rare nowadays and very hard to pull off
> successfully, especially if the device owner keeps firmware up-to-date and
> configuration tidy.

Sure, firewalls can have bugs (which may or may not affect security).
But so can software running on a PC. The solution is much the same:
use supported software, and install updates promptly. For a firewall,
get one where the vendor offers, or can at least be expected to offer,
upgrades for a significant amount of time.

-- 
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”



Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread mgr...@grant.org

Here's a hint at a start of what you need to do; it should be pretty
easy to extend this. If anything is unclear, let me know:

for starters, run your "gunk" into jq like this:

$ echo {\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAMAAJ\",\"title\":\"Die Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\", \"Mr. ABC123\"],\"collection\":[\"europeanlibraries\", \"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]} | jq
{
  "index": "prod-h-006",
  "fields": {
"identifier": "bub_gb_O2EAMAAJ",
"title": "Die Wissenschaft vom subjectiven Geist",
"creator": [
  "Karl Rosenkranz",
  "Mr. ABC123"
],
"collection": [
  "europeanlibraries",
  "americana"
],
"year": 1843,
"language": [
  "German"
],
"item_size": 797368506
  },
  "_score": [
50.629513
  ]
}

then, start building your output like this:

echo {\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAMAAJ\",\"title\":\"Die Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\", \"Mr. ABC123\"],\"collection\":[\"europeanlibraries\", \"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]} | jq '.fields.identifier + "|" + .fields.title'

jq is an amazing tool; it's a full-fledged programming language. You just
need to continue concatenating your desired output. You might even find
you can do what you want entirely inside a jq script instead of what
you're doing. Consider writing a jq script whose first line is
#!/usr/bin/jq -f
(the -f flag makes jq read the script file as its filter program).
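For example, a filter along those lines that emits the whole pipe-delimited record in one go (sketch only; jq's compact tostring output drops the space after commas inside the arrays, so it differs cosmetically from the desired var02 target):

```shell
json='{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz","Mr. ABC123"],"collection":["europeanlibraries","americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}'

# Collect every field as a string, then join with "|":
printf '%s\n' "$json" | jq -r '
  .fields as $f
  | [ $f.identifier, $f.title,
      ($f.creator   | tostring), ($f.collection | tostring),
      ($f.year      | tostring), ($f.language   | tostring),
      ($f.item_size | tostring), (._score      | tostring) ]
  | join("|")'
```

That prints bub_gb_O2EAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz","Mr. ABC123"]|... and so on, one line per record.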

Hope this gets you on the right path!

Michael Grant


From: to...@tuxteam.de
Sent: Friday, March 22, 2024 23:44
To: Albretch Mueller
Cc: debian-user
Subject: Re: trying to parse lines from an awkwardly formatted HAR file ...

On Sat, Mar 23, 2024 at 12:53:24AM -0500, Albretch Mueller wrote:
> out of a HAR file containing lots of obfuscating js cr@p and all kinds of
> nonsense I was able to extract line looking like:

It's not "js cr@p"; it is called JSON. And there's a spec for
it.

[...]

> I have tried substring substitution, sed et tr to no avail.

You might have a lot of fun trying to parse JSON with sed and
tr.

If you are serious about it, you should try a proper parser
and extractor. I'd recommend jq [1], available in Debian under
the same-named package. I have written a few shell scripts
reaching into the innards of

You'll have to wrap your brain around it, but in the time you
have implemented a parser for js in "sed and tr" (you might
need a dash of "proper programming language" around that, some
luck and a ton of elbow grease) you might have wrapped your
brain like 16 times around jq (or some other appropriate tool).

Cheers
--
tomás