On Sat, Jun 20, 2020 at 12:20:41PM +0200, Albretch Mueller wrote:
> _X=".\(html\|txt\)"
> _SDIR="$(pwd)"
>
> _AR_TERMS=(
> Kant
> "Gilbert Ryle"
> Hegel
> )
>
> for iZ in ${!_AR_TERMS[@]}; do
> find "${_SDIR}" -type f -iregex .*"${_X}" -exec grep -il "${_AR_TERMS[$iZ]}" {} \;
> done # iZ: terms search/grep'ped inside text files; echo "~";
>
>
> # this would be much faster
>
> find "${_SDIR}" -type f -iregex .*"${_X}" -exec grep -il "Kant\|Gilbert Ryle\|Hegel" {} \;
>
> but how do I know which match happened in order to save it into separate
> files?

Hm. The first approach goes three times through your files, once for
each term. The second goes once, for a combined regular expression. So
no wonder the second approach is faster.

But to actually attack the problem you should be aware that the second
method is doing *something different* from the first one: "grep -l"
will stop at the first hit, so even if you could ask grep which one of
the alternatives it found, it'll miss Hegel in a file where Kant
figures first.

Is that what you want? Once you have answered that question, you'll be
able to proceed.

One possibility is postprocessing your output: grep outputs the hit
line, and you can match that against the individual terms; you'd have
to drop the "-l" for that, making things somewhat slower.

Another possibility is to keep the "-l" and to re-grep the files found
against all the individual patterns.

Cheers
-- t
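
A rough sketch of the second possibility (keep the "-l" for one combined
pass, then re-grep only the files found against each individual term)
might look like the following. The function name classify_hits, the
output directory argument and the one-".list"-file-per-term naming are
all made up for illustration. It also assumes file names without
embedded newlines, and it swaps the -iregex test for two -iname tests,
which only accept names ending in ".html" or ".txt".

```sh
#!/bin/sh
# Hypothetical helper: classify_hits SEARCH_DIR OUT_DIR TERM...
# The body runs in a subshell "( ... )" so its variables stay local.
classify_hits() (
    sdir=$1
    outdir=$2
    shift 2                       # "$@" now holds only the search terms
    mkdir -p "$outdir"

    # grep accepts several patterns in one -e argument when they are
    # separated by newlines; a line matching any of them is a hit.
    pats=$(printf '%s\n' "$@")

    # Pass 1: list every file that matches at least one term.
    find "$sdir" -type f \( -iname '*.html' -o -iname '*.txt' \) \
        -exec grep -il -e "$pats" {} + |
    while IFS= read -r file; do
        # Pass 2: only for the files found, test each term on its own.
        for term in "$@"; do
            if grep -qi -e "$term" -- "$file"; then
                # One list per term; blanks in the term become "_".
                name=$(printf '%s' "$term" | tr ' ' '_')
                printf '%s\n' "$file" >> "$outdir/$name.list"
            fi
        done
    done
)
```

Called as, say, classify_hits "$(pwd)" /tmp/hits Kant "Gilbert Ryle" Hegel,
this appends each matching file name to /tmp/hits/Kant.list,
/tmp/hits/Gilbert_Ryle.list and /tmp/hits/Hegel.list, so a file
mentioning both Kant and Hegel ends up in both lists. Files without any
hit are read only once.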

