I know this isn’t exactly what you’re asking about, but I wanted to suggest using a tool that’s designed to work with XML rather than with line-oriented text.

One such tool is [xml_grep2](https://metacpan.org/pod/distribution/App-xml_grep2/bin/xml_grep2).

Installing it on macOS is a little involved so you may not want to do it, but if you're interested here's how I did it:

```
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install cpanminus libxml2
export LDFLAGS="-L/usr/local/opt/libxml2/lib"
export CPPFLAGS="-I/usr/local/opt/libxml2/include"
export PKG_CONFIG_PATH="/usr/local/opt/libxml2/lib/pkgconfig"
cpanm App::xml_grep2
```

Once that finishes, you should be able to run xml_grep2:

```
xml_grep2 -h -R --text-only '//DataItem' * | bbedit
```

You can pipe that output to grep if you want to further filter the tag contents:

```
xml_grep2 -h -R --text-only '//DataItem' * | grep -oE '(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])' | bbedit
```
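
If installing xml_grep2 turns out to be more trouble than it's worth, roughly the same extract-then-filter idea can be sketched in Python with only the standard library. Treat this as a rough sketch rather than a drop-in replacement: the names `extract_tokens` and `main` are made up for illustration, and it assumes the files end in `.xml`, are well-formed, and that `DataItem` is not in an XML namespace. It also removes duplicates and sorts, since that was the next step anyway.

```python
import re
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# The pattern from the grep command, using non-capturing groups.
TOKEN_RE = re.compile(r'(?:\{|\[)\s*[a-z]+\s*[0-9]+\s*(?:\|.+?)?(?:\}|\])')


def extract_tokens(xml_bytes, tag="DataItem"):
    """Return the set of pattern matches found inside <tag> elements."""
    root = ET.fromstring(xml_bytes)
    found = set()
    for elem in root.iter(tag):
        # itertext() also yields text from nested child elements.
        text = "".join(elem.itertext())
        found.update(m.group(0) for m in TOKEN_RE.finditer(text))
    return found


def main(top):
    """Walk `top` recursively and print unique matches, sorted."""
    tokens = set()
    for path in Path(top).rglob("*.xml"):  # assumption: .xml extension
        try:
            tokens |= extract_tokens(path.read_bytes())
        except ET.ParseError as err:
            # Report malformed files on stderr and keep going.
            print(f"skipping {path}: {err}", file=sys.stderr)
    print("\n".join(sorted(tokens)))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Saved under a hypothetical name such as `dataitems.py`, it could be run as `python3 dataitems.py . | bbedit`. Note that it reads each file fully into memory, so it may be slow on very large individual documents.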

Hope this helps! :-)
-sam

On 19 Apr 2019, at 8:53, Gustave Stresen-Reuter wrote:

Hi,

Given I've got dozens of folders with more than 1GB of XML documents in
them, I'm doing the following grep search from the command line:

grep -h -o -R -E '(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])' * | bbedit

I'm then removing duplicates and sorting. This gives me a very nice list of
the data I'm looking for. However, it also includes data that is not
wrapped in specific XML elements. Since grep only does line by line search
(processes each line of input) and since the XML elements span several
lines, I'm hoping I can use BBEdit to do the search, something like:

(?ms)(?\<DataItem\>)(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])(?\<\/DataItem\>)

The problem here, though, is that the search results are sent to a search
results window. Is there any way to recreate the behavior of the grep
command above (the output is just the matching data with no filename,
line number, or context)? I am fairly certain that the data I am looking
for (everything between [ or { and } or ]) does not span multiple lines
(and if it does, it is a mistake in the input, so it can be ignored).

I know I can fiddle with the IFS and do tricks like tr \n '' to put
everything onto a single line, but those create other problems of their own.

I know awk can do something similar, but if I've understood what I've seen
online, it only simulates multiline search by using a loop (and I find the
syntax way too hard to follow in spite of how powerful it is, so it's not
really an option for me).

Any help is greatly appreciated.

Kind regards from the Canary Islands.

Ted Stresen-Reuter

--
This is the BBEdit Talk public discussion group. If you have a
feature request or need technical support, please email
"supp...@barebones.com" rather than posting to the group.
Follow @bbedit on Twitter: <https://www.twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+unsubscr...@googlegroups.com.
To post to this group, send email to bbedit@googlegroups.com.
Visit this group at https://groups.google.com/group/bbedit.


