Okay, so I have gone back to the drawing board and have the following
predicament (my apologies in advance for the indentation).
Here is my sample:
<Response ID="24856-775" RequestType="Moverview">
<MonthDayCount>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
</MonthDayCount>
<FeeCount>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
</FeeCount>
<PaymentBucketAmount>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
</PaymentBucketAmount>
<PaymentBucketDueDate>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
</PaymentBucketDueDate>
<Warnings />
<SList />
<LList />
<PA>False</PA>
<PL>False</PL>
<PC>False</PC>
<PCs>False</PCs>
<PJ>False</PJ>
<OITC>0</OITC>
<MG />
<R />
<CCGoods />
</Response>
Using this:
import xml.etree.ElementTree as ElementTree  # cElementTree was removed in Python 3.9
from xml.etree.ElementTree import XMLParser
import csv
def flatten_list(aList, prefix=''):
    # Number the repeated children 1, 2, 3, ... and append that to the prefix.
    for i, element in enumerate(aList, 1):
        eprefix = "{}{}".format(prefix, i)
        if len(element):  # element has children
            # treat like dict
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, eprefix)
            # treat like list
            elif element[0].tag == element[1].tag:
                yield from flatten_list(element, eprefix)
        elif element.text:
            text = element.text.strip()
            if text:
                yield eprefix.rstrip('.'), text
def flatten_dict(parent_element, prefix=''):
    prefix = prefix + parent_element.tag
    if parent_element.items():
        for k, v in parent_element.items():
            yield prefix + k, v
    for element in parent_element:
        eprefix = prefix + element.tag
        if len(element):  # element has children
            # treat like dict - we assume that if the first two tags
            # in a series are different, then they are all different.
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, prefix=prefix)
            # treat like list - we assume that if the first two tags
            # in a series are the same, then the rest are the same.
            else:
                # here, we put the list in dictionary; the key is the
                # tag name the list elements all share in common, and
                # the value is the list itself
                yield from flatten_list(element, prefix=eprefix)
            # if the tag has attributes, add those to the dict
            if element.items():
                for k, v in element.items():
                    # yield a (key, value) pair so callers can unpack it
                    yield eprefix + k, v
        # this assumes that if you've got an attribute in a tag,
        # you won't be having any text. This may or may not be a
        # good idea -- time will tell. It works for the way we are
        # currently doing XML configuration files...
        elif element.items():
            for k, v in element.items():
                yield eprefix + k, v
        # finally, if there are no child tags and no attributes, extract
        # the text
        else:
            yield eprefix, element.text
def makerows(pairs):
    headers = []
    columns = {}
    for k, v in pairs:
        if k in columns:
            columns[k].extend((v,))
        else:
            headers.append(k)
            columns[k] = [k, v]
    m = max(len(c) for c in columns.values())
    for c in columns.values():
        c.extend(' ' for i in range(len(c), m))
    L = [columns[k] for k in headers]
    rows = list(zip(*L))
    return rows
def main():
    with open('sample.xml', 'r', encoding='utf-8') as f:
        xml_string = f.read()
    xml_string = xml_string.replace('�', '')  # optional to remove ampersands.
    root = ElementTree.XML(xml_string)
    for key, value in flatten_dict(root):
        key = key.rstrip('.').rsplit('.', 1)[-1]
        print(key, value)
if __name__ == "__main__":
    main()
I receive this output:
ResponseRequestType Moverview
ResponseID 24856-775
ResponseMonthDayCount1 0
ResponseMonthDayCount2 0
ResponseMonthDayCount3 0
ResponseMonthDayCount4 0
ResponseMonthDayCount5 0
ResponseMonthDayCount6 0
ResponseMonthDayCount7 0
ResponseMonthDayCount8 0
ResponseMonthDayCount9 0
ResponseMonthDayCount10 0
ResponseMonthDayCount11 0
ResponseMonthDayCount12 0
ResponseMonthDayCount13 0
ResponseMonthDayCount14 0
ResponseMonthDayCount15 0
ResponseMonthDayCount16 0
ResponseMonthDayCount17 0
ResponseMonthDayCount18 0
ResponseMonthDayCount19 0
ResponseMonthDayCount20 0
ResponseMonthDayCount21 0
ResponseMonthDayCount22 0
ResponseMonthDayCount23 0
ResponseMonthDayCount24 0
ResponseMonthDayCount25 0
ResponseFeeCount1 0
ResponseFeeCount2 0
ResponseFeeCount3 0
ResponseFeeCount4 0
ResponseFeeCount5 0
ResponseFeeCount6 0
ResponsePaymentBucketAmount1 0
ResponsePaymentBucketAmount2 0
ResponsePaymentBucketAmount3 0
ResponsePaymentBucketAmount4 0
ResponsePaymentBucketAmount5 0
ResponsePaymentBucketAmount6 0
ResponsePaymentBucketAmount7 0
ResponsePaymentBucketAmount8 0
ResponsePaymentBucketDueDate1 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate2 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate3 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate4 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate5 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate6 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate7 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate8 1/1/0001 12:00:00 AM
ResponseWarnings None
ResponseSList None
ResponseLList None
ResponsePA False
ResponsePL False
ResponsePC False
ResponsePCs False
ResponsePJ False
ResponseOITC 0
ResponseMG None
ResponseR None
ResponseCCGoods None
When I write it out to the CSV, using:
with open("try2.csv", 'wt', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(makerows(flatten_dict(root)))
I still receive the headers with **Response** chained to the sub-elements
of the root, with the tags' text as the values (which is just fine). My goal is
to have only the sub-elements as the headers (along with their values), sans the
Response prefix (unless it is the tag name). For example, ResponsePaymentBucketAmount1 0
should just have PaymentBucketAmount1 as the header. Thank you all for your
feedback!
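One possible direction, offered only as a sketch: strip the root tag from the front of each key before the pairs reach the CSV writer. The helper name strip_root_prefix and the hand-written sample pairs below are assumptions made for illustration; they are not part of the code above.

```python
import csv
import io


def strip_root_prefix(pairs, root_tag):
    """Yield (key, value) pairs with one leading root_tag removed from each key.

    A key that is exactly root_tag is left alone, so the root's own name
    can still appear as a header.
    """
    for key, value in pairs:
        if key != root_tag and key.startswith(root_tag):
            key = key[len(root_tag):]
        yield key, value


# Hand-written sample pairs in the same shape as the printed output above.
pairs = [
    ("ResponseID", "24856-775"),
    ("ResponsePaymentBucketAmount1", "0"),
    ("ResponsePaymentBucketAmount2", "0"),
    ("ResponsePA", "False"),
]

cleaned = list(strip_root_prefix(pairs, "Response"))

# Write one header row and one value row to an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([key for key, _ in cleaned])
writer.writerow([value for _, value in cleaned])
csv_text = buffer.getvalue()
```

With the functions from this post, the same idea would presumably be wired in as makerows(strip_root_prefix(flatten_dict(root), root.tag)). Only the first occurrence of the prefix is removed, so a child tag that itself begins with Response would keep its own name.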