I have one large file (8 GB) in the following format:

  <DOC> <DOCNO> 93489fjdf -adsf0a-t9-4q </DOCNO> sdf0934lkrsjfamkf-q39qjkfrev-dafkvad ,43-0=-toqtgegedag=d0fga </DOC>
  <DOC> <DOCNO> 9348943jikfsdf0adfa-4q </DOCNO> sdf0934lkrsjfamkf-q39qjkfrev-dafkvad,34 r09mkfas0923rfs;a[qr0qfsfvsdsaf </DOC>
Note that the file looks like XML at first glance, but it isn't. Newline characters can appear anywhere in the file, which makes a pattern based on .* hard to use: sed and awk read the input one line at a time by default, so a single pattern never sees a whole record.

What I need to do is extract the data between </DOCNO> and </DOC> in each record and store it in a file named after the text between <DOCNO> and </DOCNO>.

At first it looked like a job for awk, but after some unsuccessful attempts I tried sed and couldn't quite get the regex pattern right. Can anyone help with a regex pattern, or a sed or awk solution?

Pracheer Gupta

_______________________________________________
ilugd mailinglist -- ilugd@lists.linux-delhi.org
http://frodo.hserus.net/mailman/listinfo/ilugd
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi http://www.mail-archive.com/ilugd@lists.linux-delhi.org/
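
For illustration only, here is one way the task described above could be sketched. It uses Python rather than sed or awk, which is a swap on my part and not what the post asked for. The idea is to read the file in fixed-size chunks, keep any incomplete record in a buffer, and match whole records with a regex whose "." is allowed to span newlines. The names split_docs, out_dir and chunk_size are hypothetical. The sketch assumes that records are never nested, that a single record fits in memory, that newlines inside a DOCNO are artifacts to be dropped, and that every DOCNO is non-empty and unique.

```python
import os
import re

# One record: <DOC> <DOCNO> name </DOCNO> body </DOC>.
# re.DOTALL lets "." match newlines, since they can occur anywhere in the file.
RECORD = re.compile(rb"<DOC>\s*<DOCNO>(.*?)</DOCNO>(.*?)</DOC>", re.DOTALL)


def split_docs(stream, out_dir, chunk_size=1 << 20):
    """Write each record body to out_dir/<docno>; return the number of records."""
    os.makedirs(out_dir, exist_ok=True)
    buf = b""
    count = 0
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        last = 0
        for m in RECORD.finditer(buf):
            # Assumption: newlines inside the DOCNO are noise, so remove them
            # and trim surrounding whitespace. Inner spaces are kept.
            raw = m.group(1).replace(b"\r", b"").replace(b"\n", b"").strip()
            # Assumption: "/" is replaced so the name cannot point outside out_dir.
            name = raw.decode("utf-8", "replace").replace("/", "_")
            with open(os.path.join(out_dir, name), "wb") as out:
                out.write(m.group(2).strip())
            count += 1
            last = m.end()
        # Keep only the unmatched tail (a partial record) for the next round.
        buf = buf[last:]
        if not chunk:  # end of input
            break
    return count
```

It could be called as split_docs(open("big.txt", "rb"), "out"), where both names are placeholders. Reading in chunks keeps memory use near the size of one chunk plus one record, so the 8 GB file never has to be loaded whole.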