I think the "search" and "match" methods of compiled regular expression objects accept optional "pos" and "endpos" arguments to limit the search range.
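
A minimal sketch of what I mean, assuming Python 3 and a made-up stand-in string for the dump contents (the helper name find_after and the sample text are mine, not from the original script). Only the methods of a compiled pattern take "pos"; the module-level re.search() does not. A nice side effect is that the match positions stay relative to the whole string, which is not the case when you slice.

```python
import re

def find_after(text, first, *rest):
    """Find `first`, then each pattern in `rest`, each search starting
    where the previous match ended.

    Returns a list of match objects, or None if any pattern is missing.
    """
    matches = []
    pos = 0
    for pat in (first,) + rest:
        # pos limits where the search starts; an optional third
        # argument (endpos) would limit where it stops.
        m = re.compile(pat).search(text, pos)
        if m is None:
            return None
        matches.append(m)
        pos = m.end()
    return matches

# Hypothetical sample: an early s_paddr that must be skipped,
# then the header, then the s_paddr/s_size we actually want.
sample = "s_paddr 0x10\nSECTION HEADER\ns_paddr 0x20\ns_size 0x40\n"
found = find_after(sample, r"SECTION HEADER", r"s_paddr", r"s_size")
```

If the file is read in binary mode, the data is a bytes object in Python 3, so the patterns would have to be bytes as well (for example rb"s_paddr").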
On Wed, Jan 20, 2010 at 3:47 PM, Yitzhak Wiener <[email protected]> wrote:

> Wow, Benny, this was great coaching.
>
> I appreciate it so much.
>
> The reason I opened it as an array is that I do need to edit 16-bit
> ints in the raw data section of this file.
>
> Best Regards,
>
> Yitzhak
> ------------------------------
>
> *From:* [email protected] [mailto:[email protected]] *On
> Behalf Of *Beni Cherniavsky
> *Sent:* Wednesday, January 20, 2010 12:19 PM
> *To:* Yitzhak Wiener
> *Cc:* [email protected]
> *Subject:* Re: [Python-il] problem in script
>
> On Tue, Jan 19, 2010 at 18:11, Yitzhak Wiener <[email protected]>
> wrote:
>
> Hi Guys,
>
> May I ask you a question?
>
> I am trying to write a script that looks for some string expression
> (expression A) in a file, and after it finds it, searches for 2 other
> expressions (B & C) which are located a few lines after the first expression.
>
> These 2 expressions appear a few times in this file, which is why I need to
> search for expression A first; the next occurrence of B & C after it is what
> I am looking for.
>
> If the expressions are fixed strings, you don't really need regexps - just
> use str.index() which takes optional start,stop parameters:
>
> *a_pos = s.index("multiprogpage_c...@c0 - SECTION HEADER")*
> *b_pos = s.index("s_paddr", a_pos)*
> *c_pos = s.index("s_size", b_pos) # or a_pos?*
>
> [If any of these never occurs, .index() will raise ValueError]
>
> If you need the flexibility of regexps, they don't take start,stop
> parameters, but you can slice the string itself:
>
> *a_match = re.search("multiprogpage_c...@c0 - SECTION HEADER", s)*
> *b_match = re.search("s_paddr", s[a_match.start():])*
> *c_match = re.search("s_size", s[a_match.start() + b_match.start():]) # or a_match?*
>
> [Positions reported by b_match and c_match are relative to the slice that
> was searched, not to s, so add the slice's start offset back.]
>
> But the whole point of regular expressions is that you can also express "A,
> then B, then C" at once:
>
> *match = re.search("multiprogpage_c...@c0 - SECTION HEADER.*?(s_paddr).*?(s_size)", s, re.DOTALL)*
> *b_pos = match.start(1)*
> *c_pos = match.start(2)*
>
> [re.DOTALL lets "." cross line breaks, and the non-greedy ".*?" stops at the
> first s_paddr / s_size after the header instead of the last.]
>
> If you don't know the order of s_paddr/s_size, the regexp is much trickier.
>
> I guess you want to look for things after "s_paddr", "s_size", so you want
> match.end(1).
>
> => Of these 3 ways, the first is probably simplest and cleanest.
>
> You seem to be parsing a COFF file, right?
>
> Regexps are not well-suited to parsing binary formats.
>
> The manual way to parse them is to work with strings, and use the
> array/struct modules to parse specific parts.
>
> (See my advice below mixed with your code.)
>
> If you intend to do a lot with COFF, consider the
> hachoir <http://bit.ly/hachoir> and
> Construct <http://construct.wikispaces.com/> frameworks.
>
> They allow parsing/modifying binary formats in a *declarative* way - your
> code looks like a *description* of the format, not like *actions* needed
> to parse it.
>
> And they have built-in definitions for a lot of formats. E.g. both have
> ELF and PE (windows exe format) though not COFF.
>
> *Note however that PE is based on COFF, so I guess you can massage it a
> little and get a full COFF parser...*
>
> I attached the script I use for finding expression A, but now I don’t know
> how to tell the script to start searching for expression B & C from point A.
>
> Some notes on how your code can be simplified in Python:
>
> *from array import array*
>
> *import os, stat, re*
>
> *#get coff file size*
> *file_size = os.stat("project_release.dump")[stat.ST_SIZE]*
>
> Since python 2.2, the result of os.stat still pretends to be a tuple but
> can also be accessed with named attributes:
>
> *os.stat("project_release.dump").st_size*
>
> *a = array('H')*
> *f = open("project_release.dump","rb")*
> *f2 = open("project_release_out.dump","wb")*
>
> IMHO, it's cleaner to write a function that takes a string and returns a
> string, and do all file reading/writing at the end, where you call the
> function.
>
> This one is a question of taste, you might well disagree...
>
> *a.fromfile( f,(file_size/2) )*
> *s = a.tostring()*
>
> Why use an array object to read the file, when all you seem to do with it
> is convert it to a string?
> I'd simply do:
>
> *s = open("project_release.dump","rb").read()*
>
> Then, if/when you need to parse parts of it as 16-bit ints, convert those
> parts to arrays: *array('H', s[start:stop])*
>
> This also gives you the flexibility to parse different parts as different
> types. See also the struct module.
>
> Note that reading the file, then constructing the array() also saves
> checking the size and calling a.fromfile() separately!
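
[Inline note on the advice just above: a minimal sketch, assuming Python 3 and little-endian 16-bit values. The byte order and the helper names read_u16_words and patch_u16 are my assumptions, not something stated in the thread. It uses the struct module mentioned in the quoted text to read a slice of the raw bytes as 16-bit ints and to patch a single value.]

```python
import struct

def read_u16_words(data, start, stop):
    """Interpret data[start:stop] as little-endian unsigned 16-bit ints."""
    count = (stop - start) // 2
    return list(struct.unpack_from("<%dH" % count, data, start))

def patch_u16(data, offset, value):
    """Return a copy of data with the 16-bit word at offset replaced."""
    buf = bytearray(data)                       # bytes objects are immutable
    struct.pack_into("<H", buf, offset, value)  # overwrite two bytes in place
    return bytes(buf)

# Tiny hand-made buffer standing in for the file contents
raw = bytes([1, 0, 2, 0, 3, 0, 4, 0])
words = read_u16_words(raw, 2, 6)
patched = patch_u16(raw, 4, 0xABCD)
```

Working on the bytes as a whole and converting only the slices you need keeps the text search (s.index or a regexp) and the numeric editing independent of each other.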
>
> *#search in coff for beginning of "MultiProgPage_Code" code section in coff file.*
> *#We need the beginning address and size of this section*
> *pattern = re.compile ("multiprogpage_c...@c0 - SECTION HEADER")*
> *result = pattern.search(s)*
>
> You don't have to separately compile regexps - just directly call
> functions like re.search(regexp_string, s).
>
> [Compilation was supposed to improve performance when you use the same
> regexp a lot, but the re module has a cache of compiled regexps, so it
> usually doesn't matter.]
>
> And as I said above, s.index() is probably simpler than regexps for your
> needs.
>
> *#result is MatchObject, and therefore result.start() holds the location of expression A in the file.*
> *#now we need to find the value of the first time s_paddr and s_size are found after expression A*
>
> --
> Beni Cherniavsky-Paskin <[email protected]>
>
> ______________________________________________________________________
> DSP Group, Inc. automatically scans all emails and attachments using
> MessageLabs Email Security System.
> _____________________________________________________________________
>
> _______________________________________________
> Python-il mailing list
> [email protected]
> http://hamakor.org.il/cgi-bin/mailman/listinfo/python-il

--
Check out my blog: http://orip.org
