On Saturday, August 29, 2015 at 11:04:53 PM UTC-4, Ben Finney wrote:
> kbtyo writes:
>
> > I am using Jupyter Notebook and Python 3.4.
>
> Thank you for saying so! It is not always required, but when it matters,
> this information is important to state up front.
>
> > I have a data structure in the format, (type list):
> >
> > [{'AccountNumber': N,
> > 'Amount': '0',
> > 'Answer': '12:00:00 PM',
> > 'ID': None,
> > 'Type': 'WriteLetters',
> > 'Amount': '10',
> > {'AccountNumber': Y,
> > 'Amount': '0',
> > 'Answer': ' 12:00:00 PM',
> > 'ID': None,
> > 'Type': 'Transfer',
> > 'Amount': '2'}]
> >
> > The end goal is to write this out to CSV.
>
> So that assumes that *every* item will be a mapping with all the same
> keys. CSV is limited to a sequence of "records" which all have the same
> fields in the same order.

This clue tipped me off that I wasn't collecting the newly generated key-value pairs from my XML parser properly. I was using the dictionary's built-in update method to update the keys. The terrible thing was that the returned dictionary only held the last keys and values. What a couple of hours of shut-eye can do for the mind and body.

>
> > The list comprehension "data" is to maintain the integrity of the
> > column headers and the values for each new instance of the data
> > structure (where the keys in the dictionary are the headers and values
> > - row instances). The keys in this specific data structure are meant
> > to check if there is a value instance, and if there is not - place an
> > ''.
> >
> > [...]
> > for row in results:
> >     data = [row.get(index, '') for index in results]
>
> The 'for' statement iterates over 'results', getting an item each time.
> The name 'row' is bound to each item in turn.
>
> Then, each time through the 'for' loop, you iterate *again* over
> 'results'. The name 'index' is bound to each item.
>
> You then attempt to use the dict (each item from 'results' is itself a
> dict) as a key into that same dict. A dict is not a valid key; it is not
> a "hashable type" (i.e. a type with a fixed value, that can produce a
> hash of the value).

I discovered that. I need to iterate again to access the keys and values.

>
> So you're getting dicts and attempting to use those dicts as keys into
> dicts. That will give the error "TypeError: unhashable type: 'dict'".
>
> I think what you want is not items from the original sequence, but the
> keys from the mapping::
>
>     for input_record in results:
>         output_record = [input_record.get(key, "") for key in input_record]
>
> But you're then throwing away the constructed list, since you do nothing
> with it before the end of the loop.
>
> > writer.writerow(data)
>
> This statement occurs only *after* all the items from 'results' have
> been iterated.
> You will only have the most recent constructed row.
>
> Perhaps you want::
>
>     for input_record in results:
>         output_record = [input_record.get(key, "") for key in input_record]
>         writer.writerow(output_record)
>

I tried this, and some of the values maintained their integrity while others were overwritten by a previous dictionary's values.

> --
>  \     "An idea isn't responsible for the people who believe in it." |
>   `\                                   --Donald Robert Perry Marquis |
> _o__)                                                                |
> Ben Finney

@BenFinney:

I feel that I need to provide some context to avoid any confusion over my motivations for choosing to do something.

My original task was to parse an XML data structure stored in a CSV file with other data types, and then add the elements back as headers and the text as row values.

I went back to the drawing board and created a "results" list of dictionaries where the keys have lists as values, using this:

    def convert_list_to_dict(get_just_xml_data):
        d = {}
        for item in get_just_xml_data(get_all_data):
            for k, v in item.items():
                try:
                    d[k].append(v)
                except KeyError:
                    d[k] = [v]
        return d

This creates a dictionary with an entry for each XML tag, for example:

    {'Number1': ['0'],
     'Number2': ['0'],
     'Number3': ['0'],
     'Number4': ['0'],
     'Number5': ['0'],
     'RepgenName': [None],
     'RTpes': ['Execution', 'Letters'],
     'RTID': ['3', '5']}

I then used this to create a "headers" set (to prevent duplicates from being added) and the list of dictionaries that I mentioned in my OP.
I achieve this via:

    import csv

    #just headers
    def construct_headers(convert_list_to_dict):
        header = set()
        with open('real.csv', 'rU') as infile:
            reader = csv.DictReader(infile)
            for row in reader:
                xml_data = convert_list_to_dict(get_just_xml_data) #get_just_xml_data(get_all_data)
                row.update(xml_data)
                header.update(row.keys())
        return header

    #get all of the results
    def construct_results(convert_list_to_dict):
        header = set()
        results = []
        with open('real.csv', 'rU') as infile:
            reader = csv.DictReader(infile)
            for row in reader:
                xml_data = convert_list_to_dict(get_just_xml_data) #get_just_xml_data(get_all_data)
                # print(row)
                row.update(xml_data)
                # print(row)
                results.append(row)
                # print(results)
                header.update(row.keys())
        # print(type(results))
        return results

I guess I am using the headers list originally written out. My initial thought is to just write out the values corresponding to each transaction. For example, citing this data structure:

    {'Number1': ['0'],
     'Number2': ['0'],
     'Number3': ['0'],
     'Number4': ['0'],
     'Number5': ['0'],
     'RPN': [None],
     'RTypes': ['Execution', 'Letters'],
     'RTID': ['3', '5']}

I would get a CSV:

    Number1, Number2, Number3, Number4, Number5, RPN, RTypes, RTID
    0, 0, 0, 0, 0, None, Execution, 3
    None, None, None, None, None, Letters, 5

I am wondering how I would achieve this when the headers set is not sorted (should I sort it before writing this out?). Also, since I have millions of transactions, I want to make sure that the values for each of the headers are placed in sequence.

Any guidance would be very helpful. Thanks.

--
https://mail.python.org/mailman/listinfo/python-list
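
For the row layout described above, one possible approach is sketched below. It is not code from this thread: the helper names `columns_to_rows` and `write_columns` are hypothetical, and sorting the headers is only one assumed way to get a repeatable column order. The idea is to fix the header order once, then use `itertools.zip_longest` to pad the shorter value lists so that every row has exactly one value per header. Note that `csv.writer` writes `None` as an empty field rather than the text "None".

```python
import csv
import io
from itertools import zip_longest


def columns_to_rows(columns, fillvalue=None):
    """Turn {header: [values, ...]} into (sorted headers, list of rows)."""
    # Sorting gives a fixed, repeatable column order (an assumption;
    # any explicit ordering of the headers would work the same way).
    headers = sorted(columns)
    # zip_longest pads the shorter columns with fillvalue, so each row
    # carries one value per header, in header order.
    value_lists = [columns[h] for h in headers]
    rows = [list(r) for r in zip_longest(*value_lists, fillvalue=fillvalue)]
    return headers, rows


def write_columns(columns, outfile):
    """Write the header line followed by the padded rows as CSV."""
    headers, rows = columns_to_rows(columns)
    writer = csv.writer(outfile)
    writer.writerow(headers)
    writer.writerows(rows)  # None values are written as empty fields


# The example structure from the message above.
example = {
    'Number1': ['0'], 'Number2': ['0'], 'Number3': ['0'],
    'Number4': ['0'], 'Number5': ['0'], 'RPN': [None],
    'RTypes': ['Execution', 'Letters'], 'RTID': ['3', '5'],
}

headers, rows = columns_to_rows(example)

# Writing to an in-memory buffer here; a real file opened with
# newline='' could be passed instead.
buffer = io.StringIO()
write_columns(example, buffer)
csv_text = buffer.getvalue()
```

With millions of transactions, the full set of headers would presumably need to be collected in a first pass, since the header line has to be written before any rows.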