Re: How to get truncated output from RegEx search

2019-04-22 Thread ThePorgie
If I'm understanding correctly, try:
Find:
((?s).+?)
Replace:
\1
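
A minimal sketch of what this find/replace pair seems to be aiming at, written in Python rather than BBEdit's grep dialect. It assumes the pattern was originally wrapped in `<DataItem>` tags that the archive appears to have stripped (the tag name is borrowed from elsewhere in the thread), and the sample text is made up:

```python
import re

# Hypothetical sample: two elements, each spanning several lines.
text = """<DataItem>
[abc 12] first
</DataItem>
<DataItem>
{xyz 7|note} second
</DataItem>"""

# (?s) lets "." match newlines; ".+?" is lazy, so each match stops at the
# nearest closing tag instead of running on to the last one.
lazy = re.compile(r"(?s)<DataItem>(.+?)</DataItem>")
greedy = re.compile(r"(?s)<DataItem>(.+)</DataItem>")

lazy_bodies = [m.group(1).strip() for m in lazy.finditer(text)]
greedy_bodies = [m.group(1).strip() for m in greedy.finditer(text)]

# Equivalent of Find + Replace with \1: keep only the captured contents.
stripped = lazy.sub(r"\1", text)
```

The lazy version yields one body per element, while the greedy version swallows both elements in a single match, which is the "too greedy" behaviour mentioned in the thread.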




-- 
This is the BBEdit Talk public discussion group. If you have a 
feature request or need technical support, please email
"supp...@barebones.com" rather than posting to the group.
Follow @bbedit on Twitter: 
--- 
You received this message because you are subscribed to the Google Groups 
"BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to bbedit+unsubscr...@googlegroups.com.
To post to this group, send email to bbedit@googlegroups.com.
Visit this group at https://groups.google.com/group/bbedit.


Re: How to get truncated output from RegEx search

2019-04-22 Thread Gustave Stresen-Reuter
Thanks to everyone.

I had forgotten about bbfind but the solution I ended up using was what
Rich had suggested: Extract (and then a bit more processing on the net
result). Also, I had to fiddle a bit with the RegEx as it was being too
greedy. I'm still struggling to attain expertise with non-capturing parens
and positional assertions. Specifically, given:

 Fusce rhoncus, elit.
[Aenean] eget es?
   Nulla et arcu.


I'd like to find "Nulla" where the text before and after can be anything
except  (in other words, the contents of an element).
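
One way to express "anything except the closing tag" is a negative lookahead repeated one character at a time. The sketch below is in Python, not BBEdit's grep dialect, and it assumes the sample above was wrapped in `<DataItem>` tags that the archive seems to have dropped; the trailing line outside the element is added purely for illustration:

```python
import re

# Hypothetical reconstruction of the sample; the <DataItem> tags are assumed.
text = """<DataItem> Fusce rhoncus, elit.
[Aenean] eget es?
   Nulla et arcu.
</DataItem>
Nulla outside any element."""

# "Any character that does not start a closing tag", repeated lazily.
inside = r"(?:(?!</DataItem>).)*?"

pattern = re.compile(
    r"<DataItem>" + inside + r"(Nulla)" + inside + r"</DataItem>",
    re.DOTALL,
)

# Each hit is the captured word plus its offset in the text.
hits = [(m.group(1), m.start(1)) for m in pattern.finditer(text)]
```

Only the occurrence inside the element is reported. Note that this pattern finds at most one "Nulla" per element, so repeated hits within one element would need a different approach.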

Years ago (20?) I saw an article that suggested breaking the text up into
(repeatable) fields using RegEx. At the time this seemed so advanced it was
more than my little brain could process, but the idea has stuck with me and
I now wonder if it would be a better approach than what I'm trying to do.
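
A rough sketch of that "fields" idea in Python, offered as one possible reading of it rather than the approach from the article. The `<DataItem>` tag name, the sample text, and the group names are assumptions, and the token pattern is a slightly reshaped version of the one in the original grep command:

```python
import re

# Hypothetical input; the <DataItem> tag name is an assumption.
text = """<DataItem>
[abc 12] and {xyz 7|note}
</DataItem>
[skip 99] sits outside any element"""

# Outer field: one element body per match.
ELEMENT = re.compile(r"<DataItem>(?P<body>.*?)</DataItem>", re.DOTALL)
# Inner fields: the parts of one bracketed token, each with a name.
TOKEN = re.compile(
    r"[\[{]\s*(?P<name>[a-z]+)\s*(?P<number>[0-9]+)\s*(?:\|(?P<extra>[^\]}]*))?[\]}]"
)

# Search for tokens only inside element bodies, one dict of fields per token.
records = [
    token.groupdict()
    for element in ELEMENT.finditer(text)
    for token in TOKEN.finditer(element.group("body"))
]
```

Splitting the work into two simple patterns avoids lookbehind and lookahead entirely: the outer pattern decides where to look, and the inner one decides what to pull out.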

Thanks!





Re: How to get truncated output from RegEx search

2019-04-19 Thread Christopher Stone
On 04/19/2019, at 07:53, Gustave Stresen-Reuter <tedmaster...@gmail.com> wrote:
> The problem here, though, is that the search results are sent to a search 
> results window. Is there any way to recreate the behavior of the grep command 
> above (the output is just the matching data with no filename, line number, 
> nor context)?


Hey Ted,

Try using the `bbfind` command line tool instead of `grep`.

--
Best Regards,
Chris



Re: How to get truncated output from RegEx search

2019-04-19 Thread Rich Siegel
On 4/19/19 at 8:53 AM, tedmaster...@gmail.com (Gustave Stresen-Reuter) wrote:

> The problem here, though, is that the search results are sent to a search
> results window. Is there any way to recreate the behavior of the grep
> command above (the output is just the matching data with no filename, line
> number, nor context)?


Give the "Extract" button in the multi-file search window a try. 
It may get you closer to what you're looking for.


R.
--
Rich Siegel Bare Bones Software, Inc.
  

Someday I'll look back on all this and laugh... until they 
sedate me.




Re: How to get truncated output from RegEx search

2019-04-19 Thread Sam Hathaway
I know this isn’t exactly what you’re asking about, but I wanted to 
suggest using a tool that’s designed to work with XML rather than with 
line-oriented text.


One such tool is 
[xml_grep2](https://metacpan.org/pod/distribution/App-xml_grep2/bin/xml_grep2).


Installing it on macOS is a little involved so you may not want to do 
it, but if you're interested here's how I did it:


```
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

brew install cpanminus libxml2
export LDFLAGS="-L/usr/local/opt/libxml2/lib"
export CPPFLAGS="-I/usr/local/opt/libxml2/include"
export PKG_CONFIG_PATH="/usr/local/opt/libxml2/lib/pkgconfig"
cpanm App::Xml_grep2
```

Eventually, you should be able to run xml_grep2:

```
xml_grep2 --hR --text-only '//DataItem' * | bbedit
```

You can pipe that output to grep if you want to further filter the tag 
contents:


```
xml_grep2 --hR --text-only '//DataItem' * | grep -oE '(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])' | bbedit
```
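
For comparison, roughly the same pipeline can be sketched with only Python's standard library. This is an illustration, not a replacement for xml_grep2: it assumes each input is well-formed XML without namespaces, the helper name `data_item_tokens` is made up, and the regular expression is the one from the original post.

```python
import re
import xml.etree.ElementTree as ET

# Bracketed-token pattern copied from the original grep command.
TOKEN = re.compile(r"(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])")

def data_item_tokens(xml_source):
    """Return bracketed tokens found inside any <DataItem> element."""
    root = ET.fromstring(xml_source)
    found = []
    # iter("DataItem") visits every DataItem at any depth, like //DataItem.
    for element in root.iter("DataItem"):
        # itertext() also picks up text inside nested child elements.
        body = "".join(element.itertext())
        found.extend(m.group(0) for m in TOKEN.finditer(body))
    return found

# Hypothetical sample document.
sample = """<root>
  <DataItem>
    [abc 12] text
  </DataItem>
  <Other>[skip 99]</Other>
</root>"""
tokens = data_item_tokens(sample)
```

Because the XML is parsed rather than pattern-matched, it does not matter how many lines an element spans. If the documents use namespaces, the tag passed to `iter()` would need the `{namespace-uri}DataItem` form.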

Hope this helps! :-)
-sam



How to get truncated output from RegEx search

2019-04-19 Thread Gustave Stresen-Reuter
Hi,

Given I've got dozens of folders with more than 1GB of XML documents in
them, I'm doing the following grep search from the command line:

grep -h -o -R -E '(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])' * | bbedit

I'm then removing duplicates and sorting. This gives me a very nice list of
the data I'm looking for. However, it also includes data that is not
wrapped in specific XML elements. Since grep only does line by line search
(processes each line of input) and since the XML elements span several
lines, I'm hoping I can use BBEdit to do the search, something like:

(?ms)(?\<DataItem\>)(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])(?\<\/DataItem\>)

The problem here, though, is that the search results are sent to a search
results window. Is there any way to recreate the behavior of the grep
command above (the output is just the matching data with no filename, line
number, nor context)? I am fairly certain that the data I am looking for
(everything between [ or { and } or ]) does not span multiple lines (and if
it does it is a mistake on the input so it can be ignored).
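
As a point of comparison outside BBEdit, the behaviour asked for here (bare matches only, restricted to element contents, across a folder tree) can be sketched in Python. This is only a sketch under assumptions: the files are taken to end in `.xml`, the element is assumed to be a plain `<DataItem>` with no attributes, and the function name `unique_tokens` is invented for the example.

```python
import re
from pathlib import Path

# Element bodies may span lines, so "." must match newlines here.
ELEMENT = re.compile(r"<DataItem>(.*?)</DataItem>", re.DOTALL)
# Token pattern copied from the grep command above; it stays on one line.
TOKEN = re.compile(r"(\{|\[)\s*[a-z]+\s*([0-9]+)\s*(\|.+?)?(\}|\])")

def unique_tokens(root):
    """Collect tokens inside <DataItem> elements from every .xml file under root."""
    seen = set()
    for path in Path(root).rglob("*.xml"):
        # Reading the whole file at once lets the element pattern span lines.
        text = path.read_text(encoding="utf-8", errors="replace")
        for body in ELEMENT.findall(text):
            seen.update(m.group(0) for m in TOKEN.finditer(body))
    # The set removes duplicates; sorted() gives a stable order.
    return sorted(seen)
```

Printing each item of `unique_tokens(".")` on its own line and piping the result to `bbedit` would give output similar to the `grep -h -o` pipeline, with the duplicate removal and sorting already done. Each file is read into memory whole, which may matter with very large documents.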

I know I can fiddle with the IFS and do tricks like tr \n '' to put
everything onto a single line, but those create other problems of their own.

I know awk can do something similar but if I've understood what I've seen
online, it only simulates multiline search by using a loop (and I find the
syntax to be way, way too hard to follow in spite of how powerful it is,
so, not really a choice for me).

Any help is greatly appreciated.

Kind regards from the Canary Islands.

Ted Stresen-Reuter
