Re: How to get truncated output from RegEx search

Sam Hathaway Fri, 19 Apr 2019 07:32:17 -0700

I know this isn’t exactly what you’re asking about, but I wanted to suggest using a tool that’s designed to work with XML rather than with line-oriented text.

One such tool is [xml_grep2](https://metacpan.org/pod/distribution/App-xml_grep2/bin/xml_grep2).

Installing it on macOS is a little involved, so you may not want to do it, but if you're interested, here's how I did it:

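```
# Install Homebrew (skip this step if it is already installed)
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

# cpanminus installs Perl modules; libxml2 is needed to build the XML modules xml_grep2 depends on
brew install cpanminus libxml2
# Point the compiler and pkg-config at Homebrew's libxml2
export LDFLAGS="-L/usr/local/opt/libxml2/lib"
export CPPFLAGS="-I/usr/local/opt/libxml2/include"
export PKG_CONFIG_PATH="/usr/local/opt/libxml2/lib/pkgconfig"
# Install xml_grep2 from CPAN
cpanm App::Xml_grep2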
```

Eventually, you should be able to run xml_grep2:

```
xml_grep2 --hR --text-only '//DataItem' * | bbedit
```

You can pipe that output to grep if you want to further filter the tag contents:

```
xml_grep2 --hR --text-only '//DataItem' * | grep -oE '(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])' | bbedit
```
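Since the question mentions removing duplicates and sorting as a separate step afterwards, that can happen in the same pipeline with `sort -u`. One caveat worth checking: POSIX extended regular expressions (what `grep -E` uses) define no lazy quantifiers, so depending on the grep implementation `.+?` may act greedily, and on a line holding two tokens the match could run from the first opening bracket to the last closing one. The sketch below is only one way to handle that, assuming a token's optional `|...` part never contains `]` or `}`; it also swaps `\s` for the portable `[[:space:]]`. The function name `extract_tokens` is hypothetical.

```shell
# extract_tokens is a hypothetical helper name, not part of grep or xml_grep2.
# It reads text on stdin and prints each distinct {name 123|...} or
# [name 123] token once, sorted.
extract_tokens() {
  # [^]}]* replaces the original .+? so the optional |... part stops at the
  # first ] or } instead of relying on a lazy quantifier.
  # [[:space:]] replaces \s, which not every grep supports.
  grep -oE '[{[][[:space:]]*[a-z]+[[:space:]]*[0-9]+[[:space:]]*([|][^]}]*)?[]}]' |
    sort -u
}

# Usage, assuming xml_grep2 is installed as described above:
# xml_grep2 --hR --text-only '//DataItem' * | extract_tokens | bbedit
```

The sort order of `sort -u` follows the current locale; set `LC_ALL=C` if you want plain byte order.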

Hope this helps! :-)
-sam

On 19 Apr 2019, at 8:53, Gustave Stresen-Reuter wrote:

Hi,

Given that I've got dozens of folders with more than 1GB of XML documents in them, I'm doing the following grep search from the command line:

grep -h -o -R -E '(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])' * | bbedit

I'm then removing duplicates and sorting. This gives me a very nice list of the data I'm looking for. However, it also includes data that is not wrapped in specific XML elements. Since grep only does line-by-line search (it processes each line of input separately) and the XML elements span several lines, I'm hoping I can use BBEdit to do the search, something like:

(?ms)(?\<DataItem\>)(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])(?\<\/DataItem\>)
The problem here, though, is that the search results are sent to a search results window. Is there any way to recreate the behavior of the grep command above (the output is just the matching data, with no filename, line number, or context)? I am fairly certain that the data I am looking for (everything between [ or { and } or ]) does not span multiple lines (and if it does, it is a mistake in the input, so it can be ignored).

I know I can fiddle with IFS and do tricks like tr -d '\n' to put everything onto a single line, but those create other problems of their own. I know awk can do something similar, but if I've understood what I've seen online, it only simulates multiline search by using a loop (and I find the syntax way, way too hard to follow in spite of how powerful it is, so it's not really a choice for me).

Any help is greatly appreciated.

Kind regards from the Canary Islands.

Ted Stresen-Reuter

--
This is the BBEdit Talk public discussion group. If you have a
feature request or need technical support, please email
"supp...@barebones.com" rather than posting to the group.
Follow @bbedit on Twitter: <https://www.twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+unsubscr...@googlegroups.com.
To post to this group, send email to bbedit@googlegroups.com.
Visit this group at https://groups.google.com/group/bbedit.
