Assume I've been cursed to scrape HTML. If I convert the pages to Hickory, I 
end up with a big mass of data that, sadly, has few "class" or "id" 
attributes that would let me easily pick out the data I need. However, for 
the most part, the only thing I really need from this page is the CVE 
identifiers, which look like this:

CVE-2021-40539

I'm thinking I might run a regex against the plain text of the page, but 
I'm also curious: is it common to take something like Hiccup, Hickory, or 
a zipper and run a regex through it? If so, how is that done? 
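
For what it's worth, here is a rough sketch of what "running a regex through 
Hickory" might look like: walk the tree with tree-seq, keep only the string 
nodes, and apply re-seq to each one. The names cve-re, text-nodes and cves are 
made up for illustration, and the pattern assumes a CVE ID is "CVE-", a 
four-digit year, then four or more digits. It only looks at text under 
:content, so an ID that appears in an attribute such as :href, or one that is 
split across two text nodes, would be missed.

```clojure
;; Assumed CVE shape: "CVE-", a four-digit year, then four or more digits.
(def cve-re #"CVE-\d{4}-\d{4,}")

(defn text-nodes
  "Returns every string found under :content anywhere in a Hickory-style tree."
  [tree]
  (->> (tree-seq #(or (map? %) (sequential? %))        ; maps and vectors can have children
                 #(if (map? %) (:content %) (seq %))   ; a map's children live under :content
                 tree)
       (filter string?)))

(defn cves
  "Returns the distinct CVE identifiers in the tree, in first-seen order."
  [tree]
  (->> (text-nodes tree)
       (mapcat #(re-seq cve-re %))
       distinct))
```

Calling (cves parsed-page) on the Hickory map for a page should give back a 
lazy sequence of ID strings. The plain-text approach is the same re-seq call 
applied to one big string instead of many small ones.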

A small part of the data looks like this:

                :content
                [{:type :element,
                  :attrs
                  {:class "tip-intro", :style "font-size: 15px;"},
                  :tag :p,
                  :content
                  [{:type :element,
                    :attrs nil,
                    :tag :em,
                    :content
                    ["This Joint Cybersecurity Advisory uses the MITRE 
Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK®) framework, 
Version 8. See the "
                     {:type :element,
                      :attrs
                      {:href
                       "https://attack.mitre.org/versions/v9/techniques/enterprise/"},
                      :tag :a,
                      :content ["ATT&CK for Enterprise"]}
                     " for  referenced threat actor tactics and for 
techniques."]}]}
                 "\n\n"
                 {:type :element,
                  :attrs nil,
                  :tag :p,
                  :content
                  ["This joint advisory is the result of analytic efforts 
between the Federal Bureau of Investigation (FBI), United States Coast 
Guard Cyber Command (CGCYBER), and the Cybersecurity and Infrastructure 
Security Agency (CISA) to highlight the cyber threat associated with active 
exploitation of a newly identified vulnerability (CVE-2021-40539) in 
ManageEngine ADSelfService Plus—a self-service password management and 
single sign-on solution."]}
                 "\n\n"
                 {:type :element,
                  :attrs nil,
                  :tag :p,
                  :content
                  ["CVE-2021-40539, rated critical by the Common 
Vulnerability Scoring System (CVSS), is an authentication bypass 
vulnerability affecting representational state transfer (REST) application 
programming interface (API) URLs that could enable remote code execution. 
The FBI, CISA, and CGCYBER assess that advanced persistent threat (APT) 
cyber actors are likely among those exploiting the vulnerability. The 
exploitation of ManageEngine ADSelfService Plus poses a serious risk to 
critical infrastructure companies, U.S.-cleared defense contractors, 
academic institutions, and other entities that use the software. Successful 
exploitation of the vulnerability allows an attacker to place webshells, 
which enable the adversary to conduct post-exploitation activities, such as 
compromising administrator credentials, conducting lateral movement, and 
exfiltrating registry hives and Active Directory files."]}
                 "\n\n"

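With data shaped like the excerpt above, a zipper could also report which 
element each match sits in, which might help given the lack of class and id 
attributes. This is only a sketch built on clojure.zip; element-zip and 
cves-with-parent-tag are hypothetical names, and the CVE pattern is the same 
assumed shape (four-digit year, four or more digits). Hickory also ships a 
hickory.zip namespace that may be a better fit than this hand-rolled zipper.

```clojure
(require '[clojure.zip :as zip])

;; Assumed CVE shape: "CVE-", a four-digit year, then four or more digits.
(def cve-re #"CVE-\d{4}-\d{4,}")

(defn element-zip
  "Builds a zipper over a Hickory-style map; strings are leaves."
  [root]
  (zip/zipper map?                                   ; only element maps can have children
              :content                               ; children live under :content
              (fn [node children] (assoc node :content (vec children)))
              root))

(defn cves-with-parent-tag
  "Returns one {:cve ... :tag ...} map per match, where :tag is the tag of the
  element that directly contains the matching text node."
  [root]
  (for [loc (take-while (complement zip/end?)
                        (iterate zip/next (element-zip root)))
        :let [node (zip/node loc)]
        :when (string? node)
        cve (re-seq cve-re node)]
    {:cve cve
     :tag (some-> loc zip/up zip/node :tag)}))
```

For the two paragraphs in the excerpt that mention CVE-2021-40539, each 
result would carry :tag :p. Whether the extra context is worth it over a 
plain re-seq on the page text depends on what else needs to come off the page.
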
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
To view this discussion on the web visit 
https://groups.google.com/d/msgid/clojure/5f2bd2a4-5c35-463b-9cb4-eecb9148fc89n%40googlegroups.com.
