On Friday, April 22, 2016 at 4:41:08 PM UTC+5:30, Jussi Piitulainen wrote:
> Peter Otten writes:
>
> > harirammano...@gmail.com wrote:
> >
> >> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
> >> wrote:
> >>> harirammano...@gmail.com writes:
> >>>
> >>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> >>> > hariram...@gmail.com wrote:
> >>> >> HI All,
> >>> >>
> >>> >> can you help me out in doing below.
> >>> >>
> >>> >> file:
> >>> >> <start>
> >>> >> guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> mango
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> orange
> >>> >> fruit
> >>> >> <end>
> >>> >>
> >>> >> need to delete from start to end if it contains mango in a file...
> >>> >>
> >>> >> output should be:
> >>> >>
> >>> >> <start>
> >>> >> guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> orange
> >>> >> fruit
> >>> >> <end>
> >>> >>
> >>> >> Thank you
> >>> >
> >>> > any one can guide me ? why xml tree parsing is not working if i have
> >>> > root.tag and root.attrib as mentioned in earlier post...
> >>>
> >>> Assuming the real data consists of lines between a start marker and end
> >>> marker, a winning plan is to collect a group of lines, deal with it, and
> >>> move on.
> >>>
> >>> The following code implements something close to the plan. You need to
> >>> adapt it a bit to have your own source of lines and to restore the end
> >>> marker in the output and to account for your real use case and for
> >>> differences in taste and judgment. - The plan is as described above, but
> >>> there are many ways to implement it.
> >>>
> >>> from io import StringIO
> >>>
> >>> text = '''\
> >>> <start>
> >>> guava
> >>> fruit
> >>> <end>
> >>> <start>
> >>> mango
> >>> fruit
> >>> <end>
> >>> <start>
> >>> orange
> >>> fruit
> >>> <end>
> >>> '''
> >>>
> >>> def records(source):
> >>>     current = []
> >>>     for line in source:
> >>>         if line.startswith('<end>'):
> >>>             yield current
> >>>             current = []
> >>>         else:
> >>>             current.append(line)
> >>>
> >>> def hasmango(record):
> >>>     return any('mango' in it for it in record)
> >>>
> >>> for record in records(StringIO(text)):
> >>>     hasmango(record) or print(*record)
> >>
> >> Hi,
> >>
> >> not working....this is the output i am getting...
> >>
> >> \
> >
> > This means that the line
> >
> >>> text = '''\
> >
> > has trailing whitespace in your copy of the script.
>
> That's a nuisance. I wish otherwise undefined escape sequences in
> strings raised an error, similar to a stray space after a line
> continuation character.
>
> >> <start>
> >> guava
> >> fruit
> >>
> >> <start>
> >> orange
> >> fruit
> >
> > Jussi forgot to add the "<end>..." line to the group.
>
> I didn't forget. I meant what I said when I said the OP needs to adapt
> the code to (among other things) restore the end marker in the output.
> If they can't be bothered to do anything at all, it's their problem.
>
> It was already known that this is not the actual format of the data.
>
> > To fix this change the generator to
> >
> > def records(source):
> >     current = []
> >     for line in source:
> >         current.append(line)
> >         if line.startswith('<end>'):
> >             yield current
> >             current = []
>
> Oops, I notice that I forgot to start a new record only on encountering
> a '<start>' line. That should probably be done, unless the format is
> intended to be exactly a sequence of "<start>\n- -\n<end>\n".
>
> >>> hasmango(record) or print(*record)
> >
> > The
> >
> > print(*record)
> >
> > inserts spaces between record entries (i. e. at the beginning of all
> > lines except the first) and adds a trailing newline.
>
> Yes, I forgot about the space. Sorry about that.
>
> The final newline was intentional. Perhaps I should have added the end
> marker there instead (given my preference to not drag it together with
> the data lines), like so:
>
> print(*record, sep = "", end = "<end>\n")
>
> Or so:
>
> print(*record, sep = "")
> print("<end>")
>
> Or so:
>
> for line in record:
>     print(line.rstrip("\n"))
> else:
>     print("<end>")
>
> Or:
>
> for line in record:
>     print(line.rstrip("\n"))
> else:
>     if record and not record[-1].strip() == "<end>":
>         print("<end>")
>
> But all this is beside the point that to deal with the stated problem
> one might want to obtain access to a whole record *first*, then check if
> it contains "mango" in the intended way (details missing but at least
> "mango\n" as a full line counts as an occurrence), and only *then* print
> the whole record (if it doesn't contain "mango").
>
> I can think of two other ways - one if the data can be accessed only
> once - but they seem more complicated to me. Hm, well, if it's XML, as
> stated in another branch of this thread and contrary to the form of the
> example data in this branch, there's a third way that may be good, but
> here I'm responding to a line-oriented format.
>
> > You can avoid this by specifying the delimiters explicitly:
> >
> > if not hasmango(record):
> >     print(*record, sep="", end="")
> >
> > Even with these changes code still looks somewhat brittle...
>
> That depends on the actual data format, and on what really is intended
> to trigger the filter. This approach is a complete waste of effort if
> there are no guarantees of things being there on their own lines, for
> example.
>
> Ok, that "\ " not only looks brittle but actually is brittle. The one
> time I used that slash, I now regret doing so. Here's a fixed version.
> (Not sure of the significance of the number of spaces that start the
> first data line. They seem to have doubled along the way.)
>
> text = '''<start>
> guava
> fruit
> <end>
> <start>
> mango
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
> '''

Hi Jussi,

I see you have written a function to fulfil the requirement. Can we do the same thing using an XML parser? I have failed to implement it with Python's XML parser when the file has content like this:

<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>

The whole thing works if the file has this instead:

<!DOCTYPE web-app
<web-app>

What I observe is that XML tree parsing does not work when the PUBLIC identifier and the http URL are present between "<!DOCTYPE web-app" and "<web-app>"...
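
For reference, here is a minimal sketch of the XML route asked about above, using the standard library's xml.etree.ElementTree. The sample document is an assumption: the thread never shows the real file contents, so the servlet and servlet-name elements, and the helper name drop_matching, are hypothetical stand-ins. In this sketch the full DOCTYPE with the PUBLIC identifier and URL is accepted by the parser (the DTD is not downloaded), so a failure on the real file may have another cause.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real file contents are not shown in the thread.
SAMPLE = '''<!DOCTYPE web-app PUBLIC
 "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
 "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <servlet><servlet-name>guava</servlet-name></servlet>
  <servlet><servlet-name>mango</servlet-name></servlet>
  <servlet><servlet-name>orange</servlet-name></servlet>
</web-app>
'''

def drop_matching(root, word):
    """Remove direct children of root whose text content contains word."""
    # Iterate over a copy, because root is modified inside the loop.
    for child in list(root):
        if any(word in text for text in child.itertext()):
            root.remove(child)
    return root

# Parse the whole document, DOCTYPE included, into an element tree.
root = ET.fromstring(SAMPLE)
drop_matching(root, 'mango')

# Collect the names that survived the filter.
remaining = [el.findtext('servlet-name') for el in root]
print(remaining)
```

One thing to keep in mind: ElementTree does not keep the DOCTYPE when the tree is written back out, so if the declaration must survive it has to be re-added by hand when saving.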