Trying to find an element's XPath and store it as an attribute

2006-10-01 Thread provowallis
Hi all,

I've been struggling with this for a while so I'm hoping that someone
could point me in the right direction. Here's my problem: I'm trying to
get the XPath for a given node in my document and then store that XPath
as an attribute of the element itself. If anyone has a recommendation
I'd be happy to hear it.

Thanks,

Provo

For instance, I would take this XML

###before



An XSLT Programmer
Hello, World!


###after



An XSLT Programmer
Hello, World!


###

import sets
import amara
from amara import binderytools

doc = amara.parse('hello.xml')
elems = {}

for e in doc.xml_xpath('//*'):

 paths = elems.setdefault((e.namespaceURI, e.localName),
sets.Set())
 path = u'/'.join([n.nodeName for n in
e.xml_xpath(u'ancestor::*')])
 paths.add(u'/' + path)

for name in elems:

 doc.name.km = elems[name]
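
One way to get this without Amara is the standard library's ElementTree. The sketch below is Python 3 rather than the Python 2 used above. It walks the tree, builds a simple positional path for each element, and stores it in an attribute. The attribute name "xpath", the function name, and the sample document are made up for illustration, and namespaces are not handled.

```python
import xml.etree.ElementTree as ET


def add_xpath_attributes(root, attr="xpath"):
    """Store a simple absolute path on every element (attr name is hypothetical)."""

    def walk(elem, path):
        elem.set(attr, path)
        # Count same-named siblings so repeated tags get a [n] position.
        totals = {}
        for child in elem:
            totals[child.tag] = totals.get(child.tag, 0) + 1
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            step = child.tag
            if totals[child.tag] > 1:
                step += "[%d]" % seen[child.tag]
            walk(child, path + "/" + step)

    walk(root, "/" + root.tag)
    return root


# Made-up sample document, not the one from the post.
sample = ET.fromstring("<doc><title>Hello</title><para>a</para><para>b</para></doc>")
add_xpath_attributes(sample)
print(ET.tostring(sample, encoding="unicode"))
```

The position suffix is only added when a tag repeats among its siblings, which keeps the stored paths short for unique children.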

-- 
http://mail.python.org/mailman/listinfo/python-list


Looking for help with Regular Expression

2006-05-23 Thread ProvoWallis
Hi,

I'm looking for a little advice about regular expressions. I want to
capture a string of text that falls between an opening square bracket
and a closing square bracket (e.g., "[" and "]") but I've run into a
small problem.

I've been using this: '''\[(.*?)\]''' as my pattern. I was expecting
this to be greedy but the funny thing is that it's not greedy enough in
some situations.

Here's my problem: The end of my string sometimes contains a cross
reference to a section in a book and the subsections are cited using
square brackets exactly like the one I'm using as the ending point in
my original regular expression.

E.g., the text string in my data looks like this: see discussion in
§ 512.16[3][b]]

But my regular expression is stopping after the first "]" so after I
add the new markup the output looks like this:

see discussion in
§ 512.16[3][b]]

So the last subsection is outside of the note tag. I want something
like this:

see discussion in
§ 512.16[3][b]]

I'm not sure how to make my capture more greedy so I've resorted to
cleaning up the data after I make the first round of replacements:

data = re.sub(r'''\[(\d*?)\]\[(\w)\]\]''',
'''[\1][\2]]''', data)

There's got to be a better way but I'm not sure what it is.

Thanks,

Greg
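
A note on the pattern: ".*?" is the non-greedy (minimal) form, which is why the match stops at the first "]". A plain greedy ".*" would instead run to the last "]" on the line. One alternative is to describe the allowed content explicitly, so that bracketed pieces like [3] and [b] count as part of the note. A minimal Python 3 sketch, assuming at most one level of nesting; the <note> tag name and the sample text are placeholders, not the real markup:

```python
import re

# Outer [...] whose content may contain bracketed pieces such as [3] or [b],
# nested one level deep only.
NOTE = re.compile(r"\[((?:[^\[\]]|\[[^\[\]]*\])*)\]")

sample = "Text [see discussion in \u00a7 512.16[3][b]] more text"
# <note> is a placeholder tag name, not the poster's real markup.
result = NOTE.sub(r"<note>\1</note>", sample)
print(result)
```

Note that a stand-alone "[3]" elsewhere in the data would also match this pattern, so it may need tightening for real input.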



Re: NewB question on text manipulation

2006-05-03 Thread ProvoWallis
Thanks again and sorry about the lack of examples. It didn't even occur
to me that my example wasn't comprehensive enough when I posted my
first message but I can see the issue now.

Your solution is really helpful for me to see. I can't tell you how
much I appreciate it. I thought that adding more values to the tuple
was the way to go but couldn't get my mind around how to capture the
info that I needed.

Thanks!



Re: NewB question on text manipulation

2006-05-03 Thread ProvoWallis
Thanks very much for this; I really appreciate it. I've pasted what I've
got now thanks to you.

I only have one issue that I can't figure out. When I print the new
string I'm getting all of the values in the lt list rather than just
the one that corresponds to the original entry.

E.g.,

My original data looks like this:

<1>FAM LAW ENF259-232-687

<1>APPEAL40-38; 40-44; 44-18; 45-151

I want my output to look like this:

<1>FAM LAW ENF259-232-687
<1>APPEAL40-381
<1>APPEAL40-441
<1>APPEAL44-181
<1>APPEAL45-151

But instead I'm getting this -- all of the entries in the lt list are
being added to my string when I just want one. I'm not sure how to
select just the entry in the lt list that I want.

<1>FAM LAW ENF259-232-6871
<1>APPEAL40-38-6871
<1>APPEAL40-44-6871
<1>APPEAL44-18-6871
<1>APPEAL45-15-6871


###


Here's what I've got so far:


s_space = " "  # a single space
s_empty = ""  # empty string

pat = re.compile("\s*([^<]+)([^<]+)")

lst = []

while True:
    m = pat.search(s)
    if not m:
        break

    title = m.group(1).strip()
    xc = m.group(2)
    xc = xc.replace(s_space, s_empty)
    tup = (title, xc)
    lst.append(tup)
    s = pat.sub(s_empty, s, 1)

lt = s.strip()

for title, xc in lst:
    lst_pp = xc.split(";")
    for pp in lst_pp:
        print "<1>%s%s%s" % (title, pp, lt)
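
One possible fix is to keep the trailing value with each entry instead of in a single variable shared by all of them. A minimal Python 3 sketch, assuming the parsing step already yields (title, page ranges, trailing value) triples. The function name is hypothetical, the way the sample values are split is a guess, and the markup is left out entirely:

```python
def expand_entries(entries):
    """Return one (title, page_range, code) record per page range."""
    records = []
    for title, ranges, code in entries:
        for page_range in ranges.split(";"):
            page_range = page_range.strip()
            if page_range:  # skip empty pieces from a trailing ';'
                records.append((title, page_range, code))
    return records


# Sample values loosely based on the post; the split into fields is assumed.
entries = [
    ("FAM LAW ENF", "259-232", "687"),
    ("APPEAL", "40-38; 40-44; 44-18; 45-15", "1"),
]
for record in expand_entries(entries):
    print(record)
```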



NewB question on text manipulation

2006-05-02 Thread ProvoWallis
I'm totally stumped by this problem so I'm hoping someone can give me a
little advice or point me in the right direction.

I have a file that looks like this:

APPEAL40-24; 40-46; 42-46; 42-48; 42-62; 42-63 PROC
GUIDE921(b)(1)

(i.e., <[chapter name][multiple or single book page
ranges][chapter name][multiple or single book page
ranges][code]

but I want to change it so that it looks like this

<1>APPEAL40-241(b)(1)
<1>APPEAL40-461(b)(1)
<1>APPEAL42-461(b)(1)
<1>APPEAL42-481(b)(1)
<1>APPEAL42-621(b)(1)
<1>APPEAL42-631(b)(1)
<1>PROC GUIDE921(b)(1)

but I'm not at all sure how to do it.

I've come up with a simple function that will change the order of the
text but I'm not sure how to break out

 def Switch(m):

  return '%s%s' % (m.group(2), m.group(1))

 data = re.sub(r'''<1>(.*?)(.*?)\n''', Switch, data)

But I'm still a long way from what I need.

Any pointers would be greatly appreciated.

Thanks,

Greg



Re: noobie mkdir problem/question

2006-03-25 Thread ProvoWallis
I understand that but I'm still puzzled. Is this the reason why I can't
write files to this directory?

The xrefs directory is created the way I expect it would be using mkdir
but I can't seem to write to it. I thought that my results would be
written to the xrefs directory here but they're ending up in the
original folder not the subfolder.

  outputFile = open(os.path.join(xrefs,outputFname), 'w')
  outputFile.write(data)
  outputFile.close()

What am I missing?

[EMAIL PROTECTED] wrote:
> if (os.path.isdir(xrefs) == 0):
>  os.mkdir(xrefs)
>
> 
> 
> os.path.isdir(stuff) returns 
> True or False



noobie mkdir problem/question

2006-03-25 Thread ProvoWallis
Hi,

I'm trying to write a script that will create a new directory and then
write the results to this newly created directory but it doesn't seem
to work for me and I don't know why. I'm hoping someone can see my
mistake or at least point me in the right direction.

I start like this capturing the root directory and making my new
"xrefs" directory (I can see the new folder in windows explorer):

root = raw_input("Enter the path where the program should run: ")

xrefs = os.path.join(root,'xrefs')

if (os.path.isdir(xrefs) == 0):
 os.mkdir(xrefs)
else:
 sys.exit('LOG folder already exists. Exiting program.')

...I do everything else...

And then I'm trying to write the results out to xrefs. But instead of
writing to xrefs they're written to the original directory, i.e., root,
and I'm not sure why.

outputFname = given + '.log'
outputFile = open(os.path.join(xrefs,outputFname), 'w')
outputFile.write(data)
outputFile.close()
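
The lines as posted look like they should write into xrefs, so the cause may be in the part of the script that is not shown. One thing worth checking, offered as a guess: os.path.join discards its earlier components when a later one is an absolute path, so if "given" carries a full path the file lands back in that directory. A Python 3 sketch that guards against this; write_log and the sample names are hypothetical:

```python
import os
import tempfile


def write_log(root, given, data, subdir="xrefs"):
    """Write data to <root>/<subdir>/<basename of given>.log and return the path."""
    out_dir = os.path.join(root, subdir)
    if not os.path.isdir(out_dir):
        os.mkdir(out_dir)
    # basename() guards against 'given' carrying a directory part: os.path.join
    # discards earlier components when a later one is an absolute path.
    out_path = os.path.join(out_dir, os.path.basename(given) + ".log")
    with open(out_path, "w") as output_file:
        output_file.write(data)
    return out_path


with tempfile.TemporaryDirectory() as root:
    full = os.path.join(root, "chapter1")  # a name that includes the directory
    path = write_log(root, full, "some results")
    relative = os.path.relpath(path, root)
    with open(path) as check:
        contents = check.read()
print(relative)
```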

Anyone?

Thanks,

Greg



Newbie Class/Counter question

2006-03-14 Thread ProvoWallis
Hi,

I've always struggled with classes and this one is no exception.

I'm working in an SGML file and I want to renumber a couple of elements
in the hierarchy based on the previous level.

E.g.,

My document looks like this

A. Title Text
1. Title Text
1. Title Text
1. Title Text
B. Title Text
1. Title Text
1. Title Text

but I want to change the numbering of the second level to sequential
numbers like 1, 2, 3, etc. so my output would look like this

A. Title Text
1. Title Text
2. Title Text
3. Title Text
B. Title Text
1. Title Text
2. Title Text

This is what I've come up with on my own but it doesn't work. I was
hoping someone could critique this and point me in the right or better
direction.

Thanks,

Greg

###


def Fix(m):

 new = m.group(1)

 class ReplacePtSubNumber(object):

  def __init__(self):
   self._count = 0
   self._ptsubtwo_re = re.compile(r'', re.IGNORECASE| re.UNICODE)
  # self._ptsubone_re = re.compile(r'' % (self._count)


 new = ReplacePtSubNumber().sub(new)
 return 'http://mail.python.org/mailman/listinfo/python-list
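
A class is not strictly needed for this kind of renumbering. A minimal Python 3 sketch, assuming the headings can be recognised on plain text lines like those in the example above. The real SGML markup is not shown in the post, so the two patterns and the function name are placeholders:

```python
import re

LEVEL_ONE = re.compile(r"^[A-Z]\.\s")      # e.g. "A. Title Text"
LEVEL_TWO = re.compile(r"^\d+\.(\s.*)$")   # e.g. "1. Title Text"


def renumber(lines):
    """Number second-level headings 1, 2, 3... and restart after each first level."""
    count = 0
    result = []
    for line in lines:
        if LEVEL_ONE.match(line):
            count = 0                      # new first-level heading: restart
            result.append(line)
            continue
        match = LEVEL_TWO.match(line)
        if match:
            count += 1
            result.append("%d.%s" % (count, match.group(1)))
        else:
            result.append(line)            # anything else passes through
    return result


document = ["A. Title Text", "1. Title Text", "1. Title Text", "1. Title Text",
            "B. Title Text", "1. Title Text", "1. Title Text"]
print("\n".join(renumber(document)))
```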


Re: regular expressions, unicode and XML

2006-01-26 Thread ProvoWallis
Thanks for this but I'm still getting an "empty" character (I don't
know what else to call it) rather than the text captured by my regular
expression in my replaced text.

I even added the utf encoding declaration to my input data but still no
luck.

Any suggestions?



regular expressions, unicode and XML

2006-01-25 Thread ProvoWallis
Hi,

I'm hoping someone can help me. I'm hopelessly lost.

I'm trying to make a change in some XML files using a regular
expression (re.sub). I can capture the text I want to replace OK but
when I replace it I end up with nothing: i.e., just a "" character in my
file.

data = re.sub(r'(?i)(?u)Sample
Title\—(.*?):', ' Sample
Title—\1:', data)

I think my problem is that I don't understand unicode or even know how
my XML is encoded b/c there is nothing in the XML declaration at the
top of the file.

I'd be grateful if someone could give a little advice or point me in the
right direction. I've read a bunch of stuff on the board but nothing
seems to click. I'm guessing I have to decode my file when I read it
something like this

raw = inputFile.read()
fileencoding = "utf-8"
data =  raw.decode(fileencoding)

and then write it out similarly but this doesn't seem to work.

Any help appreciated,

Greg
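
One thing that may explain the stray character, offered as a guess since the full script is not shown: in a normal (non-raw) Python string literal, '\1' is an octal escape for the character with code 1, not a backreference. Writing the replacement as a raw string keeps the backslash for re.sub to interpret. A Python 3 sketch; the sample text and the bracket replacement are made up:

```python
import re

raw = "Sample Title\u2014Widgets:".encode("utf-8")   # bytes as read from a file
data = raw.decode("utf-8")                           # decode before matching

pattern = re.compile(r"(?i)Sample Title\u2014(.*?):")

wrong = pattern.sub("[\1]", data)    # "\1" is the control character chr(1)
right = pattern.sub(r"[\1]", data)   # r"\1" is a backreference to group 1

print(repr(wrong))
print(repr(right))
out_bytes = right.encode("utf-8")    # encode again before writing
```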



Is it possible to combine handle_data and regular expressions?

2006-01-19 Thread ProvoWallis
Hi,

I've experimented with regular expressions to solve my problems in the
past but I have seen so many comments about HTMLParser and sgmllib that
I thought I would try a different approach this time so I tried using
HTMLParser.

I want to search through my SGML file for various strings of text and
find out what section they're in. What I have here does this to a
certain extent but I was wondering if I could make handle_data and
regular expressions work together to make this work a little better.

For instance, when I search for "above" as I am here, I just get
something like this: '174.114[1]':'above' but this isn't very useful
b/c I want to know the context of above (i.e., the information on
either side of the above) and maybe even use a regular expression to filter
the search a little more.

Any ideas?

As always, I'd appreciate feedback on my efforts.

Thanks,

Greg

###

from HTMLParser import HTMLParser
import os, re
root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the file: ")
print


given,ext = os.path.splitext(fname)

inputFile = open(os.path.join(root,fname), 'r')

data =  inputFile.read()

class PartFinder(HTMLParser):

    _full = None
    _secDict = dict()

    def found(self):
        return self._secDict

    def handle_starttag(self, tag, attrs):
        if tag == "sec-main":
            self._main = dict(attrs).get('no')
            self._full = self._main

        if tag == "sec-sub1":
            self._subone = dict(attrs).get('no')
            self._full = self._main + '[' + self._subone + ']'

        if tag == "sec-sub2":
            self._subtwo = dict(attrs).get('no')
            self._full = self._main + '[' + self._subone + ']' + '[' + self._subtwo + ']'


    def handle_data(self, data):
        if "Pt" in data:
            if not self._secDict.has_key(self._main):
                self._secDict[self._full] = [data]
                print self._secDict


if __name__ == "__main__":
 parser = PartFinder()
 parser.feed(data)
 x = parser.found()

 output_part = given + '.parts'
 outputFile = file(os.path.join(root,output_part), 'w')
 outputFile.write(str(x))
 outputFile.close()
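
handle_data and regular expressions can work together: the parser tracks which section is current, and a compiled pattern run inside handle_data picks out the word along with some text on either side. A minimal Python 3 sketch (the module is html.parser there, not HTMLParser). The class name, the width parameter, and the sample markup are hypothetical, and only sec-main is tracked to keep it short:

```python
import re
from html.parser import HTMLParser


class ContextFinder(HTMLParser):
    """Record each hit of a word with some surrounding text, keyed by section."""

    def __init__(self, word, width=30):
        super().__init__()
        # Up to `width` characters on either side of the whole word.
        self._pattern = re.compile(
            r".{0,%d}\b%s\b.{0,%d}" % (width, re.escape(word), width),
            re.IGNORECASE | re.DOTALL,
        )
        self._section = None
        self.hits = {}

    def handle_starttag(self, tag, attrs):
        if tag == "sec-main":
            self._section = dict(attrs).get("no")

    def handle_data(self, data):
        for match in self._pattern.finditer(data):
            self.hits.setdefault(self._section, []).append(match.group(0).strip())


finder = ContextFinder("above")
finder.feed('<sec-main no="174.114"><p>See the discussion above for details.</p></sec-main>')
finder.close()
print(finder.hits)
```

Keeping the hits in a per-instance dictionary (rather than a class attribute) also avoids sharing results between parser objects.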



Re: Newbie ? -- SGML metadata extraction

2006-01-17 Thread ProvoWallis
Thanks very much for your help. It's greatly appreciated.

It took a couple of tries to see what was happening but I've figured
it out.

Greg



Re: Newbie ? -- SGML metadata extraction

2006-01-16 Thread ProvoWallis
Thanks. One more question, though.

I'm not sure how to limit the scope of my search so that I'm just
extracting the id attribute from the sections that I want. I.e., I want
the id attributes from the forms in sections 1 and 3 but not from 2.

Maybe I'm missing something.



Newbie ? -- SGML metadata extraction

2006-01-16 Thread ProvoWallis
Hi,

I'm trying to write a script that will extract the value of an
attribute from an element using the attribute value of another element
as the basis for extraction.

For example, in my situation I have a pre-defined list of main sections
and I want to extract the id attribute of the form element and create a
dictionary of graphic ID and section number pairs but only for the
sections in my pre-defined list but I want to exclude the id value from
any section that does not appear on my list. I.e., I want to know the
id value for the forms that appear in sections 1 and 3 but not in 2.

Boiled down my SGML looks something like this:
















This is what I have come up with on my own so far. My problem is that I
can't seem to pick up the value of the id attribute.

Any advice appreciated.

Greg

###

import os, re, csv

root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the CSV file containing the section
numbers: ")
sgmlname = raw_input("Enter name of the SGML file to search: ")
print

given,ext = os.path.splitext(fname)
root_name = os.path.join(root,fname)
n = given + '.new'
outputName = os.path.join(root,n)

reader = csv.reader(open(root_name, 'r'), delimiter=',')

sections = []

for row in reader:
 sections.append(row[0])


inputFile = open(os.path.join(root,sgmlname), 'r')

illoList ={}

while 1:
 lines = inputFile.readlines()
 if not lines:
  break
 for line in lines:

   main = re.search(r'(?i)(?m)(?s)http://mail.python.org/mailman/listinfo/python-list
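
An alternative to line-by-line regular expressions is to let a parser track the current section and record form ids only while that section is on the list. A minimal Python 3 sketch using html.parser. The form element and its id attribute come from the description above, but the sec-main tag, its no attribute, and the class name are assumptions, since the SGML sample did not survive in the post:

```python
from html.parser import HTMLParser


class FormIdCollector(HTMLParser):
    """Map form ids to section numbers, for wanted sections only."""

    def __init__(self, wanted_sections):
        super().__init__()
        self._wanted = set(wanted_sections)
        self._current = None          # number of the section being read
        self.form_ids = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "sec-main":         # hypothetical section tag
            self._current = attrs.get("no")
        elif tag == "form" and self._current in self._wanted:
            self.form_ids[attrs.get("id")] = self._current


sgml = ('<sec-main no="1"><form id="f1"></form></sec-main>'
        '<sec-main no="2"><form id="f2"></form></sec-main>'
        '<sec-main no="3"><form id="f3"></form></sec-main>')
collector = FormIdCollector(["1", "3"])
collector.feed(sgml)
collector.close()
print(collector.form_ids)
```

End tags are not tracked here, so a form that follows a closed section would still be attributed to it; that may or may not matter for the real data.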


Newbie Question: CSV to XML

2006-01-07 Thread ProvoWallis
Hi,

Would anyone be willing to give me some feedback about this little
script that I wrote to convert CSV to XML. I'll happily admit that I
still have a lot to learn about Python so I'm always grateful for
constructive feedback.

Thanks,

Greg

###

#csv to XML conversion utility

import os, re, csv
root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the unconverted file: ")
print

given,ext = os.path.splitext(fname)
root_name = os.path.join(root,fname)
n = given + '.xml'
outputName = os.path.join(root,n)

reader = csv.reader(open(root_name, 'r'), delimiter=',')

output = open(outputName, 'w')

output.write('\n')

output.write('\n %s %s  \n\n\n' % ('TAS input file for ', given))

for row in reader:
 for i in range(0, len(row)):

  if i == 0:
   output.write('\n\n%s' % (i, row[i]))
  if i > 0 and i < len(row) - 1:
   output.write('\n%s' % (i,
row[i]))
  if i == len(row) - 1:
   output.write('\n%s\n' % (i, row[i]))

output.write('\n\n\n\n')

output.close()



Newbie Question: CSV to XML

2006-01-06 Thread ProvoWallis
Hi,

I'm learning more and more about Python all the time but I'm still a
real newbie. I wrote this little script to convert CSV to XML and I was
hoping to get some feedback on it if anyone was willing to comment.

It works but I was wondering if there was anything I could do better.
E.g., incorporate minidom somehow? But I'm totally in the dark as how I
would do this.

Thanks,

Greg


###

#csv to XML conversion utility

import os, re, csv
root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the unconverted file: ")
print

given,ext = os.path.splitext(fname)
root_name = os.path.join(root,fname)
n = given + '.xml'
outputName = os.path.join(root,n)

reader = csv.reader(open(root_name, 'r'), delimiter=',')

output = open(outputName, 'w')

output.write('\n\n')

output.write('\n %s %s  \n\n\n' % ('TAS input file for ', given))

for row in reader:
 for i in range(0, len(row)):

  if i == 0:
   output.write('\n\n%s' % (i, row[i]))
  if i > 0 and i < len(row) - 1:
   output.write('\n%s' % (i,
row[i]))
  if i == len(row) - 1:
   output.write('\n%s\n' % (i, row[i]))

output.write('\n\n\n\n')

output.close()
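
On the minidom question: one option is to build the document as a tree and let minidom do the serialising, which also takes care of escaping characters such as "&" in the data. A minimal Python 3 sketch; the element names (table, row, col), the n attribute, and the function name are hypothetical, since the markup in the script above did not survive:

```python
import csv
import io
from xml.dom import minidom


def csv_to_xml(csv_text, root_tag="table", row_tag="row", cell_tag="col"):
    """Build a minidom Document with one row element per CSV line."""
    doc = minidom.Document()
    root = doc.createElement(root_tag)
    doc.appendChild(root)
    for row in csv.reader(io.StringIO(csv_text)):
        row_el = doc.createElement(row_tag)
        root.appendChild(row_el)
        for i, value in enumerate(row):
            cell = doc.createElement(cell_tag)
            cell.setAttribute("n", str(i))             # column position
            cell.appendChild(doc.createTextNode(value))
            row_el.appendChild(cell)
    return doc


doc = csv_to_xml("a,b\n1,2 & 3\n")
xml_text = doc.toxml()
print(xml_text)
```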



Re: join dictionaries using keys from one & values

2005-12-06 Thread ProvoWallis
Thanks again. This is very helpful.



Re: join dictionaries using keys from one & values

2005-12-05 Thread ProvoWallis
Thanks so much. I never would have been able to figure this out on my
own.

def dictionary_join(one, two):

 # invert the second dictionary (value -> key), then look up each value of the first
 dict2x = dict( ((two[k], k) for k in two.iterkeys()))
 dict3 = dict(((k, dict2x[v]) for k,v in one.iteritems()))
 print dict3

dict1 = {1:'bbb', 2:'aaa', 3:'ccc'}

dict2 = {'5.01':'bbb', '6.01':'ccc', '7.01':'aaa'}

dictionary_join(dict1, dict2)



join dictionaries using keys from one & values

2005-12-05 Thread ProvoWallis
I'm still learning python so this might be a crazy question but I
thought I would ask anyway. Can anyone tell me if it is possible to
join two dictionaries together to create a new dictionary using the
keys from the old dictionaries?

The keys in the new dictionary would be the keys from the old
dictionary one (dict1) and the values in the new dictionary would be
the keys from the old dictionary two (dict2). The keys would be joined
by matching the values from dict1 and dict2. The keys in each
dictionary are unique.

dict1 = {1:'bbb', 2:'aaa', 3:'ccc'}

dict2 = {5.01:'bbb', 6.01:'ccc', 7.01:'aaa'}

dict3 = {1 : 5.01, 3 : 6.01, 2 : 7.01}

I looked at "update" but I don't think it's what I'm looking for.

Thanks,

Greg
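
The join described here amounts to inverting the second dictionary (value to key) and then looking up each value of the first in it. A minimal Python 3 sketch; the function name is hypothetical, and it assumes the values in the second dictionary are unique, which the post does not state (only the keys are said to be unique):

```python
def join_on_values(first, second):
    """Map each key of `first` to the key of `second` that shares its value."""
    inverted = {value: key for key, value in second.items()}
    # Keys of `first` whose value has no partner in `second` are skipped.
    return {key: inverted[value] for key, value in first.items() if value in inverted}


dict1 = {1: 'bbb', 2: 'aaa', 3: 'ccc'}
dict2 = {5.01: 'bbb', 6.01: 'ccc', 7.01: 'aaa'}
dict3 = join_on_values(dict1, dict2)
print(dict3)
```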



newbie write to file question

2005-12-03 Thread ProvoWallis
Hi,

I'm trying to create a script that will search an SGML file for the
numbers and titles of the hierarchical elements (section level
headings) and create a dictionary with the section number as the key
and the title as the value.

I've managed to make some progress but I'd like to get some general
feedback on my progress so far plus ask a question. When I run this
script on a directory that contains multiple files even the files that
don't contain any matches generate log files and usually with the
contents of the last file that contained matches. I'm not sure what I'm
missing so I'd appreciate some advice.

Thanks,

Greg


Here's a very simplified version of my SGML:

section title 1.01
title 1
title 2
title a
title b
title i
section title 2.02
section title 3.03
title 1
title 2
section title 4.04
section title 5.05

And here's what I've written so far:

import os
import re

setpath = raw_input("Enter the path where the program should run: ")
print

table ={}

for root, dirs, files in os.walk(setpath):
    fname = files
    for fname in files:
        inputFile = file(os.path.join(root,fname), 'r')


        while 1:
            lines = inputFile.readlines(1)
            if not lines:
                break
            for line in lines:
                main = re.search(r'(?i)\n?(.*?)\n' , line)
                sub_one = re.search(r'(?i)\n?(.*?)\n' , line)
                sub_two = re.search(r'(?i)\n?(.*?)\n' , line)
                sub_three = re.search(r'(?i)\n?(.*?)\n' , line)
                if main is not None:
                    table[main.group(1)] = main.group(2)
                    m = main.group(1)
                if main is None:
                    pass
                if sub_one is not None:
                    one = m + '[' + sub_one.group(1) + ']'
                    table[one] = sub_one.group(2)
                if sub_one is None:
                    pass
                if sub_two is not None:
                    two = one + '[' + sub_two.group(1) + ']'
                    table[two] = sub_two.group(2)
                if sub_two is None:
                    pass
                if sub_three is not None:
                    three = two + '[' + sub_three.group(1) + ']'
                    table[three] = sub_three.group(2)
                if sub_three is None:
                    pass

        str_table = str(table)
        (name,ext) = os.path.splitext(fname)
        output_name = name + '.log'
        outputFile = file(os.path.join(root,output_name), 'w')
        outputFile.write(str_table)
        outputFile.close()
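
A likely reason every file gets a log with stale contents: the table dictionary is created once, before the loop, so it is never emptied between files, and the log is written whether or not anything matched. A minimal Python 3 sketch that starts a fresh table for each file and skips the log when nothing was found. The HEADING pattern and the sample markup are placeholders, since the real tags are not visible in the post, and only the top-level headings are handled:

```python
import os
import re
import tempfile

# Placeholder pattern: section number in a "no" attribute, title in <title>.
HEADING = re.compile(r'<sec-main\s+no="([^"]+)"[^>]*>\s*<title>(.*?)</title>',
                     re.IGNORECASE | re.DOTALL)


def write_tables(setpath):
    """Write one .log per file that contains headings; return the paths written."""
    written = []
    for root, dirs, files in os.walk(setpath):
        for fname in files:
            if fname.endswith(".log"):
                continue                       # do not re-read earlier output
            with open(os.path.join(root, fname)) as input_file:
                data = input_file.read()
            table = {}                         # fresh dictionary for every file
            for number, title in HEADING.findall(data):
                table[number] = title
            if not table:
                continue                       # no matches: no log file
            log_path = os.path.join(root, os.path.splitext(fname)[0] + ".log")
            with open(log_path, "w") as output_file:
                output_file.write(str(table))
            written.append(log_path)
    return written


with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "a.sgm"), "w") as f:
        f.write('<sec-main no="1.01"><title>section title</title></sec-main>')
    with open(os.path.join(demo_dir, "b.sgm"), "w") as f:
        f.write("no headings here")
    written = [os.path.basename(p) for p in write_tables(demo_dir)]
print(written)
```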



Newbie Count Question

2005-10-09 Thread ProvoWallis
I have a newbie count question.

I have a number of SGML documents divided into sections but over the
course of editing them some sections have been deleted (and perhaps
others added). I'd like to renumber them. The input documents look like
this:















and after renumbering I would like the sections to look like this:















so they are basically numbered sequentially from 1 thru to the end of
the number of sections.

I've managed to get this far thanks to looking at other posts on the
board but no matter what I try all of the sections end up being
numbered for the total number of sections in the document. e.g., if
there are 100 sections in the document the "no" attribute is "1.100"
for each one.

import os, re

setpath = raw_input("Enter the path where the program should run: ")
print

for root, folders, files in os.walk(setpath):
    for name in files:
        filepath = os.path.join(root, name)
        fileopen = open(filepath, 'r')
        data =  fileopen.read()
        fileopen.close()

        secmain_pattern = re.compile(r'',
                                     re.IGNORECASE)
        m = secmain_pattern.search(data)
        all = secmain_pattern.findall(data)

        counter = 0
        for i in range(0,len(all)):
            counter = counter + 1
            print counter

        if m is not None:
            def new_number(match):
                return '' % (match.group(1),
                             counter)
            data = secmain_pattern.sub(new_number, data)

        outputFile = file(os.path.join(root,name), 'w')
        outputFile.write(data)
        outputFile.close()
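
The symptom described (every section ending up with the total) fits the counting loop running to completion before sub() is called, so each replacement sees the final value of counter. Advancing the counter inside the replacement function gives each match its own number. A minimal Python 3 sketch; the SECMAIN pattern and the sample markup are placeholders, since the real tags are not visible in the post:

```python
import itertools
import re

# Placeholder markup: <sec-main no="CHAPTER.NUMBER">
SECMAIN = re.compile(r'(<sec-main\s+no=")(\d+)\.\d+(")', re.IGNORECASE)


def renumber_sections(data):
    """Renumber the part after the dot sequentially, starting from 1."""
    counter = itertools.count(1)

    def new_number(match):
        # next(counter) runs once per match, so each section gets its own number.
        return "%s%s.%d%s" % (match.group(1), match.group(2),
                              next(counter), match.group(3))

    return SECMAIN.sub(new_number, data)


sample = '<sec-main no="1.01"><sec-main no="1.05"><sec-main no="1.09">'
renumbered = renumber_sections(sample)
print(renumbered)
```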


Thanks for your help!
