I know this isn’t exactly what you’re asking about, but I wanted to
suggest using a tool that’s designed to work with XML rather than with
line-oriented text.
One such tool is
[xml_grep2](https://metacpan.org/pod/distribution/App-xml_grep2/bin/xml_grep2).
Installing it on macOS is a little involved so you may not want to do
it, but if you're interested here's how I did it:
```
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install cpanminus libxml2
export LDFLAGS="-L/usr/local/opt/libxml2/lib"
export CPPFLAGS="-I/usr/local/opt/libxml2/include"
export PKG_CONFIG_PATH="/usr/local/opt/libxml2/lib/pkgconfig"
cpanm App::Xml_grep2
```
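If you want to confirm the install worked before going further, a check along these lines should do it. This is only a sketch: `have_tool` is a name I made up for illustration, not something Homebrew or cpanm provides.

```shell
# have_tool: hypothetical helper; succeeds if the named command is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool xml_grep2; then
  echo "xml_grep2 found at: $(command -v xml_grep2)"
else
  # cpanm may have put the script in a directory that is not on PATH.
  echo "xml_grep2 not found on PATH" >&2
fi
```

If it reports "not found", the script may still have been installed, just in a directory your shell isn't searching.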
Once the install finishes, you should be able to run xml_grep2:
```
xml_grep2 --hR --text-only '//DataItem' * | bbedit
```
You can pipe that output to grep if you want to further filter the tag
contents:
```
xml_grep2 --hR --text-only '//DataItem' * \
  | grep -oE '(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])' \
  | bbedit
```
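Since you mentioned removing duplicates and sorting afterwards, that step can probably live in the same pipeline. Here's a rough sketch, with some assumptions: `unique_tokens` is a name I made up, and I've swapped `\s` and `.+?` for POSIX bracket expressions so it doesn't depend on grep extensions, which means the pattern is close to yours but not identical. The `printf` line is stand-in input so the sketch runs without xml_grep2.

```shell
# unique_tokens: hypothetical helper; reads text on stdin and prints each
# {name 123} or [name 123|extra] token once, sorted in byte order.
unique_tokens() {
  grep -oE '[{[][[:space:]]*[a-z]+[[:space:]]*[0-9]+[[:space:]]*(\|[^]}]+)?[]}]' \
    | LC_ALL=C sort -u
}

# Stand-in input; in practice, pipe the xml_grep2 output in instead.
printf '%s\n' 'see {abc 12} and [xyz 7|note]' 'again {abc 12}' | unique_tokens
```

In real use you would put `unique_tokens` where the `grep -oE` stage sits in the command above and send the result on to bbedit.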
Hope this helps! :-)
-sam
On 19 Apr 2019, at 8:53, Gustave Stresen-Reuter wrote:
Hi,
Given I've got dozens of folders with more than 1GB of XML documents
in them, I'm doing the following grep search from the command line:
grep -h -o -R -E '(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])' * | bbedit
I'm then removing duplicates and sorting. This gives me a very nice
list of the data I'm looking for. However, it also includes data that
is not wrapped in specific XML elements. Since grep only does line by
line search (processes each line of input) and since the XML elements
span several lines, I'm hoping I can use BBEdit to do the search,
something like:
(?ms)(?\<DataItem\>)(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])(?\<\/DataItem\>)
The problem here, though, is that the search results are sent to a
search results window. Is there any way to recreate the behavior of
the grep command above (the output is just the matching data with no
filename, line number, nor context)? I am fairly certain that the data
I am looking for (everything between [ or { and } or ]) does not span
multiple lines (and if it does it is a mistake on the input so it can
be ignored).
I know I can fiddle with the IFS and do tricks like tr \n '' to put
everything onto a single line, but those create other problems of
their own.
I know awk can do something similar but if I've understood what I've
seen online, it only simulates multiline search by using a loop (and I
find the syntax to be way, way too hard to follow in spite of how
powerful it is, so, not really a choice for me).
Any help is greatly appreciated.
Kind regards from the Canary Islands.
Ted Stresen-Reuter
--
This is the BBEdit Talk public discussion group. If you have a
feature request or need technical support, please email
"supp...@barebones.com" rather than posting to the group.
Follow @bbedit on Twitter: <https://www.twitter.com/bbedit>
---
You received this message because you are subscribed to the Google
Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to bbedit+unsubscr...@googlegroups.com.
To post to this group, send email to bbedit@googlegroups.com.
Visit this group at https://groups.google.com/group/bbedit.