Assume I've been cursed to scrape HTML. If I convert the pages to Hickory, I end up with a big mass of data which, sadly, lacks many of the "class" or "id" attributes that would let me easily pick out the data I need. However, for the most part, the only things I really need off this page are the CVEs, which look like this:

CVE-2021-40539

I'm thinking I might write a regex against the plain text of the page, but I'm also curious: is it common to take something like Hiccup, Hickory, or a zipper and run a regex through it? If so, how is that done?

A small part of the data looks like this:

:content
[{:type :element,
  :attrs {:class "tip-intro", :style "font-size: 15px;"},
  :tag :p,
  :content
  [{:type :element,
    :attrs nil,
    :tag :em,
    :content
    ["This Joint Cybersecurity Advisory uses the MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK®) framework, Version 8. See the "
     {:type :element,
      :attrs {:href "https://attack.mitre.org/versions/v9/techniques/enterprise/"},
      :tag :a,
      :content ["ATT&CK for Enterprise"]}
     " for referenced threat actor tactics and for techniques."]}]}
 "\n\n"
 {:type :element,
  :attrs nil,
  :tag :p,
  :content
  ["This joint advisory is the result of analytic efforts between the Federal Bureau of Investigation (FBI), United States Coast Guard Cyber Command (CGCYBER), and the Cybersecurity and Infrastructure Security Agency (CISA) to highlight the cyber threat associated with active exploitation of a newly identified vulnerability (CVE-2021-40539) in ManageEngine ADSelfService Plus—a self-service password management and single sign-on solution."]}
 "\n\n"
 {:type :element,
  :attrs nil,
  :tag :p,
  :content
  ["CVE-2021-40539, rated critical by the Common Vulnerability Scoring System (CVSS), is an authentication bypass vulnerability affecting representational state transfer (REST) application programming interface (API) URLs that could enable remote code execution. The FBI, CISA, and CGCYBER assess that advanced persistent threat (APT) cyber actors are likely among those exploiting the vulnerability. The exploitation of ManageEngine ADSelfService Plus poses a serious risk to critical infrastructure companies, U.S.-cleared defense contractors, academic institutions, and other entities that use the software. 
Successful exploitation of the vulnerability allows an attacker to place webshells, which enable the adversary to conduct post-exploitation activities, such as compromising administrator credentials, conducting lateral movement, and exfiltrating registry hives and Active Directory files."]} "\n\n"
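
To make the question concrete, here is a rough sketch of the "run a regex through the tree" idea as I imagine it, using only clojure.core. I don't know whether this is the common approach. The names `cve-re`, `sample`, `text-nodes`, and `cves` are my own placeholders, `sample` is a tiny hand-made stand-in for the real Hickory output, and the pattern assumes a CVE ID is "CVE-", a four-digit year, then four or more digits.

```clojure
;; Assumed CVE shape: "CVE-", four-digit year, "-", four or more digits.
(def cve-re #"CVE-\d{4}-\d{4,}")

;; Hypothetical, hand-made stand-in for a Hickory tree (not the real page).
(def sample
  {:type :element, :attrs nil, :tag :div,
   :content [{:type :element, :attrs nil, :tag :p,
              :content ["a newly identified vulnerability (CVE-2021-40539) in "
                        {:type :element,
                         :attrs {:href "https://example.com/"},
                         :tag :a,
                         :content ["ManageEngine ADSelfService Plus"]}]}
             "\n\n"
             {:type :element, :attrs nil, :tag :p,
              :content ["CVE-2021-40539, rated critical by CVSS"]}]})

(defn text-nodes
  "Lazy seq of every string reachable through :content, in document order.
  Attribute values (e.g. :href) are deliberately not visited."
  [node]
  (->> node
       (tree-seq #(or (map? %) (sequential? %))        ; branch on elements and child vectors
                 #(if (map? %) (:content %) %))        ; an element's children live under :content
       (filter string?)))

(defn cves
  "Distinct CVE IDs found in the text nodes of a Hickory-style tree."
  [node]
  (->> (text-nodes node)
       (mapcat #(re-seq cve-re %))
       distinct))

(cves sample)
;; => ("CVE-2021-40539")
```

Matching each text node separately would miss an ID that happens to be split across two nodes (say, by an inline tag); joining the text nodes into one string before calling `re-seq` would be the plain-text variant of the same idea.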