Re: [Tutor] Fwd: find second occurance of string in line

2015-09-08 Thread Peter Otten
Albert-Jan Roskam wrote:

>> import lxml.etree
>>
>> tree = lxml.etree.parse("example.xml")
>> print tree.xpath("//objectdata/general/timestamp/text()")
> 
> Nice. I do need to try lxml some time. Is the "text()" part xpath as well?
 
Yes. I think ElementTree supports a subset of XPath.
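
For comparison, here is a rough standard-library sketch. As far as I know, ElementTree's XPath subset does not include text(), so the text is read from each matched element's .text attribute instead. The XML string is a made-up stand-in for the real file; only the objectdata/general/timestamp path is taken from the xpath call quoted above.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; the real file's layout is only assumed here.
SAMPLE = """\
<log>
  <objectdata>
    <general><timestamp>2015-06-18T14:27:50</timestamp></general>
  </objectdata>
  <objectdata>
    <general><timestamp>2015-06-18T14:28:06</timestamp></general>
  </objectdata>
</log>
"""

root = ET.fromstring(SAMPLE)

# findall() accepts a limited XPath; take .text from each matched element.
timestamps = [el.text for el in root.findall(".//objectdata/general/timestamp")]
print(timestamps)
```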

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] find second occurance of string in line

2015-09-08 Thread Alan Gauld

On 08/09/15 17:00, richard kappler wrote:

> I need to find the index of the second occurrence of a string in an xml file
> for parsing.


Do you want to find just the second occurrence in the *file*
or the second occurrence within a given tag in the file (and
there could be multiple such tags)?



> I understand re well enough to do what I want to do


Using re to parse XML is usually the wrong way to go about it.
Fortunately you are not using re in the code below.
However, a real XML parser such as ElementTree (from the standard
library) or lxml might work better.


> first instance, but despite many Google searches have yet to find something
> to get the index of the second instance, because split won't really work on
> my xml file (if I understand split properly) as there are no spaces.


split can split on any separator string you want; whitespace
just happens to be the default.
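
A minimal sketch of that idea, assuming the string being searched for is the literal tag <timestamp> and using a made-up line with no spaces in it. The pieces before the second occurrence are added up to recover its index:

```python
# Hypothetical sample line with no spaces; the tag name is an assumption.
tag = "<timestamp>"
line = "<objectdata><timestamp>A</timestamp><x><timestamp>B</timestamp></x></objectdata>"

parts = line.split(tag)  # split on the tag itself, not on whitespace

if len(parts) >= 3:
    # Two or more occurrences: the second one starts after the text before
    # the first tag, the first tag itself, and the text between the two tags.
    second = len(parts[0]) + len(tag) + len(parts[1])
else:
    second = None  # fewer than two occurrences in this line

print(second)
```

This works, but passing a start position to index() is more direct when all you want is the position.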


> Specifically I'm looking for the second <timestamp> in an objectdata line.


Is objectdata within a specific tag? Usually when parsing XML it's
the tags you look for first, since "lines" can be broken over
multiple lines and multiple tags can exist on one literal line.


> Not all lines are objectdata lines, though all objectdata lines do have
> more than one <timestamp>.


This implies there are many objectdata lines within your file? See the 
first comment above... do you want the second index for the first 
objectdata line or do you want it for every objectdata line?



> import re


You don't use this.


> with open("example.xml", 'r') as f:
>     for line in f:
>         if "objectdata" in line:
>             if "<timestamp>" in line:
>                 x = "<timestamp>"


You should assign this once above the loops; it saves a lot of
duplicated work.



>                 first = x.index(line)


This is looking for the index of line within x.
I suspect you really want

first = line.index(x)


>                 second = x[first+1:].index(line)


You can specify a start position for index directly:

second = line.index(x,first+1)
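
Put together, the two calls might look like this. It is only a sketch: the function name second_index, the search string and the sample line are all made up for illustration.

```python
def second_index(line, needle):
    """Return the index of the second occurrence of needle in line.

    Raises ValueError if there are fewer than two occurrences.
    """
    first = line.index(needle)
    # Start the second search one character past the start of the first match.
    return line.index(needle, first + 1)


needle = "<timestamp>"  # assumed search string
line = "<objectdata><timestamp>A</timestamp><timestamp>B</timestamp></objectdata>"
print(second_index(line, needle))
```

Starting at first + 1 also finds matches that overlap the first one; use first + len(needle) if overlapping matches should be skipped.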



>                 print first, second
>             else:
>                 print "no timestamp"
>         else:
>             print "no objectdata"
>
> my traceback:
>
> Traceback (most recent call last):
>   File "2iter.py", line 10, in <module>
>     first = x.index(line)
> ValueError: substring not found


That's what you get when the search fails.
You should use try/except when using index()

Alternatively, try using str.find(), which returns -1
when the substring is not found. But you need to check the result
before using it, because -1 is, of course, a valid string index!
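
Both approaches side by side, as a rough sketch. The function names and the choice to return None when there is no second occurrence are my own assumptions, not anything from the original code.

```python
def second_index_find(line, needle):
    """Return the index of the second occurrence, or None, using find()."""
    first = line.find(needle)
    if first == -1:  # check before using it: line[-1] is a valid index
        return None
    second = line.find(needle, first + 1)
    return None if second == -1 else second


def second_index_try(line, needle):
    """Same result using index() wrapped in try/except."""
    try:
        first = line.index(needle)
        return line.index(needle, first + 1)
    except ValueError:  # raised when either search fails
        return None


print(second_index_find("a-b-c", "-"))
print(second_index_try("a-b-c", "-"))
```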

HTH

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] Fwd: find second occurance of string in line

2015-09-08 Thread Peter Otten
richard kappler wrote:

>> Do you want to find just the second occurrence in the *file* or the second
>> occurrence within a given tag in the file (and there could be multiple such
>> tags)?
> 
> There are multiple objectdata lines in the file and I wish to find the
> second occurrence of timestamp in each of those lines.
> 
>> Is objectdata within a specific tag? Usually when parsing XML it's the
>> tags you look for first since "lines" can be broken over multiple lines
>> and multiple tags can exist on one literal line.
> 
> objectdata is within a tag as is timestamp. Here's an example:
> 
> http://www.w3.org/2001/XMLSchema-instance;
> xsi:noNamespaceSchemaLocation="Logging.xsd"
> version="1.0">0381UDI132
> 2015-06-18T14:28:06.570
> 531630381UDI12015-06-18T14:27:50379
> 1306 oi="607360" on="379" ox="02503" oc="0" 
> is="49787" ie="50312" lftf="N"
> lfts="7" errornb="0"
> iostate="DC00">2015-06-18T14:27:50.811 unit="inch">51.45 unit="ms">0... part.

Here's a way to get these (all of them) with lxml:

import lxml.etree

tree = lxml.etree.parse("example.xml")
print tree.xpath("//objectdata/general/timestamp/text()")

