Hello. I believe this is a common case and must have been discussed before on various other forums, such as the awk, sed, or regular-expression groups. However, I could not find those discussions with Google. It would help me a lot if you could simply point me to a reference for a solution.
I want to remove all advertisements from my 100 HTML files. They are pretty neatly classed, like the following:

    <div class="advertisement"> ... </div>

However, I cannot simply do this:

    s/<div class="advertisement">.*<\/div>//

because it is too greedy: it matches up to the last "</div>", which is almost always after the end of the advertisement. If I make it non-greedy, it also fails, because it stops at the first "</div>" inside the advertisement.

Consider this case, where both greedy and non-greedy matching fail:

    <div class="page-content">
      <div class="advertisement">
        <div>Our product is the best</div>
        <div>Contact us now!</div>
      </div>
    </div>

Greedy output:

    <div class="page-content">

Non-greedy output:

    <div class="page-content">
        <div>Contact us now!</div>
      </div>
    </div>

Expected output:

    <div class="page-content">
    </div>

The only way to get this right seems to be to give the replace/remove expression the ability to "count" the number of <div and </div> tags it encounters. I could program such a thing in C thanks to my college education, but that sounds like overkill for such a common task. What would you do in this case?
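
To make the "counting" idea concrete, here is a minimal sketch in Python rather than C. It is only an illustration, not a known standard solution: the function name strip_ads is made up, and it assumes that the advertisement always opens with exactly <div class="advertisement"> and that the <div> tags in each file are balanced. It walks over every <div ...> and </div> tag, tracks the nesting depth once it is inside an advertisement, and drops everything up to the matching </div>.

```python
import re

# Matches any opening <div ...> tag or closing </div> tag.
DIV_TAG = re.compile(r'<div\b[^>]*>|</div\s*>', re.IGNORECASE)
# Assumption: the advertisement opener is written exactly like this.
AD_OPEN = re.compile(r'<div\s+class="advertisement"\s*>', re.IGNORECASE)


def strip_ads(html):
    """Remove each advertisement div together with the divs nested in it."""
    out = []
    pos = 0    # start of the text that has not been copied to out yet
    depth = 0  # nesting depth inside an advertisement; 0 means outside
    for m in DIV_TAG.finditer(html):
        tag = m.group(0)
        if depth == 0:
            if AD_OPEN.fullmatch(tag):
                out.append(html[pos:m.start()])  # keep text before the ad
                pos = m.start()
                depth = 1
        elif tag.startswith('</'):
            depth -= 1
            if depth == 0:
                pos = m.end()  # resume copying after the matching </div>
        else:
            depth += 1
    # If an advertisement was never closed, pos still points at its opening
    # tag, so the unterminated block is left in place rather than lost.
    out.append(html[pos:])
    return ''.join(out)


if __name__ == '__main__':
    sample = ('<div class="page-content"> <div class="advertisement"> '
              '<div>Our product is the best</div> '
              '<div>Contact us now!</div> </div> </div>')
    print(strip_ads(sample))
```

On the example above this leaves only the outer page-content div, with the whitespace that surrounded the advertisement still in place. A real HTML parser would be more robust against attributes in a different order, comments, or scripts that contain the text "<div".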