Suggestions for best practices when automating geocoding tasks

2016-02-11 Thread kbtyo
Good Morning,

I welcome feedback and suggestions for libraries or resources to help 
automate the following:

1. Given a directory of CSV files (each containing an address field)
   
   a. Read each CSV file
   b. Use the address in each row as part of a query and send a request to 
      an external API in order to geocode the address
   c. Write the response back to each row and return the updated file


I have been wondering whether a series of decorators could be used here. 
Moreover, for the request component, has anyone explored using Tornado or 
Twisted to create a queue for requests?
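
For reference, here is a minimal synchronous sketch of step 1 (the geocoder 
URL, its query parameter, and the response fields are placeholders - any 
real service will differ):

import csv
import glob

import requests  # third-party library; assumed installed

GEOCODE_URL = "https://geocoder.example.com/v1/geocode"  # placeholder endpoint

def geocode(address):
    """Send one address to the (hypothetical) geocoding API."""
    resp = requests.get(GEOCODE_URL, params={"q": address}, timeout=10)
    resp.raise_for_status()
    return resp.json()  # assumed to contain 'lat' and 'lon'

for path in glob.glob("*.csv"):
    with open(path, newline="") as fin:
        reader = csv.DictReader(fin)
        rows = list(reader)
        fields = reader.fieldnames + ["lat", "lon"]
    for row in rows:
        result = geocode(row["address"])  # an 'address' column is assumed
        row["lat"] = result.get("lat", "")
        row["lon"] = result.get("lon", "")
    with open(path.replace(".csv", "_geocoded.csv"), "w", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

The decorator/queue question above is really about replacing the geocode() 
call in this sketch with something asynchronous.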

Thank you again for your feedback. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Improving code to reduce verbosity, and perhaps using a decorator?

2016-01-10 Thread kbtyo
Hello everyone:

A member on the Stack Overflow community advised me to post my question on this 
forum:

http://codereview.stackexchange.com/questions/116395/opening-the-same-csv-file-in-two-different-ways-in-order-to-transform-data-in-on

I appreciate your feedback immensely. 

Sincerely,
Saran
-- 
https://mail.python.org/mailman/listinfo/python-list


Understanding " 'xml.etree.ElementTree.Element' does not support the buffer interface"

2016-01-10 Thread kbtyo
Hello Everyone:

I am curious to know why I receive the aforementioned message. I am using 
Python 3.4.3 and Windows 7. I am running the following script from Windows 
PowerShell:


Response = 's.csv'
with open(Response, 'rU', encoding='utf-8') as data:
    separated = data.read().split('","')
    x = ElementTree.XML(separated[3])
    y = ElementTree.XML(separated[4])
    print(dict(flatten_dict(x)))
    print(dict(flatten_dict(y)))

I am importing ElementTree as follows:

import xml.etree.cElementTree as ElementTree
from xml.etree.ElementTree import XMLParser


The input data is as follows:

A,B,C,D,E,F,G,H,I,J
"3","8","1","2312285SChecking10","TrueFalseFalseFalseFalse',0001,0070,","1967-12-25 22:18:13.471000","2005-12-25 22:18:13.768000","2","70","0"

(The XML tags inside columns D and E have been stripped by the list archive.)

Oddly, when I run the same script via WinPython's Jupyter Notebook, it works 
perfectly. The input string is an XML string, and I am only interested in the 
4th and 5th columns (using zero-based indexing). 
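
As an aside, a more robust way to pull out those two columns is to let the 
csv module handle the quoting instead of splitting on '","' by hand, and 
then hand the strings to ElementTree - a sketch, assuming the two fields 
really are well-formed XML:

import csv
import xml.etree.cElementTree as ElementTree

with open('s.csv', 'r', encoding='utf-8', newline='') as data:
    reader = csv.reader(data)
    next(reader)  # skip the header row
    for row in reader:
        x = ElementTree.fromstring(row[3])  # fromstring() expects str, not bytes
        y = ElementTree.fromstring(row[4])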

Thank you, in advance for your feedback and support. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Understanding how to quote XML string in order to serialize using Python's ElementTree

2016-01-09 Thread kbtyo
My specs:

Python 3.4.3
Windows 7
IDE is Jupyter Notebooks

What I have referenced:

1) 
http://stackoverflow.com/questions/1546717/python-escaping-strings-for-use-in-xml

2)
http://stackoverflow.com/questions/7802418/how-to-properly-escape-single-and-double-quotes

3)http://stackoverflow.com/questions/4972210/escaping-characters-in-a-xml-file-with-python


Here are the data (in CSV format) and the script, respectively (I have tried 
variations on serializing column 'E' using both SAX and ElementTree):

i)

A,B,C,D,E,F,G,H,I,J
"3","8","1","2312285SChecking10","',0001,0070,","1967-12-25 22:18:13.471000","2005-12-25 22:18:13.768000","2","70","0"

(The XML tags inside column 'E' have been stripped by the list archive.)

ii)

#!/usr/bin/python
# -*- coding: utf-8 -*-
import os.path
import sys
import csv
from io import StringIO
import xml.etree.cElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
import xml
import xml.sax
from xml.sax import ContentHandler

class MyHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        self._charBuffer = []
        self._result = []

    def _getCharacterData(self):
        data = ''.join(self._charBuffer).strip()
        self._charBuffer = []
        return data.strip()  # remove strip() if whitespace is important

    def parse(self, f):
        xml.sax.parse(f, self)
        return self._result

    def characters(self, data):
        self._charBuffer.append(data)

    def startElement(self, name, attrs):
        if name == 'Response':
            self._result.append({})

    def endElement(self, name):
        if not name == 'Response':
            self._result[-1][name] = self._getCharacterData()

def read_data(path):
    with open(path, 'rU', encoding='utf-8') as data:
        reader = csv.DictReader(data, delimiter=',', quotechar="'",
                                skipinitialspace=True)
        for row in reader:
            yield row

if __name__ == "__main__":
    empty = ''
    Response = 'sample.csv'
    for idx, row in enumerate(read_data(Response)):
        if idx > 10: break
        data = row['E']
        print(data)  # The before
        data = data[1:-1]
        data = ""'{}'"".format(data)
        print(data)  # Sanity check
        # data = '',0001,0070,'
        try:
            root = ElementTree.XML(data)
            # print(root)
        except StopIteration:
            raise
            pass
        # xmlstring = StringIO(data)
        # print(xmlstring)
        # Handler = MyHandler().parse(xmlstring)


Specifically, due to the quoting in the CSV file (which is beyond my control), 
I have had to resort to slicing the string (line 51) and then formatting it 
(line 52).

However, the printout from the above attempt is as follows:

"'


  File "<string>", line unknown
ParseError: no element found: line 1, column 69

Interestingly, if I assign the variable "data" directly (as in line 54) I receive this:

  File "<ipython-input>", line 56
data = '',0001,0070,'
 ^
SyntaxError: invalid token

I am seeking feedback on the most Pythonic way to address this. Ideally, is 
there an approach that can leverage ElementTree? Thank you, in advance, for 
your feedback and guidance.
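
For reference, these are the standard library escaping helpers I have been 
looking at - a minimal sketch, independent of the CSV handling above:

from xml.sax.saxutils import escape, quoteattr, unescape

raw = "text with <angle brackets> & 'quotes'"
print(escape(raw))            # escapes &, < and > for use as element text
print(quoteattr(raw))         # quotes and escapes for use as an attribute value
print(unescape(escape(raw)))  # round-trips back to the original text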
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pandas Left Merge with xlsx with CSV producing null value columns in output

2015-10-02 Thread kbtyo
On Thursday, October 1, 2015 at 7:47:18 PM UTC-4, Mark Lawrence wrote:
> On 01/10/2015 16:03, kbtyo wrote:
> > I would appreciate any feedback on the following question that I have 
> > raised here:
> >
> > http://stackoverflow.com/questions/32889129/pandas-left-merge-with-xlsx-with-csv-producing-null-value-columns-in-output
> >
> > Thank you for your feedback and support.
> >
> 
> I was going to suggest that you ask on the pandas mailing list/google 
> group, but as you've already done so, is there anywhere that you 
> haven't asked?  Is there a subreddit for pandas you could also try, just 
> in case?

I appreciate the suggestion. I was able to solve the issue. Thank you again. 

> 
> -- 
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
> 
> Mark Lawrence
-- 
https://mail.python.org/mailman/listinfo/python-list


Pandas Left Merge with xlsx with CSV producing null value columns in output

2015-10-01 Thread kbtyo
I would appreciate any feedback on the following question that I have raised 
here:

http://stackoverflow.com/questions/32889129/pandas-left-merge-with-xlsx-with-csv-producing-null-value-columns-in-output

Thank you for your feedback and support. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Potential Solution for AssertionError: invalid dtype determination in get_concat_dtype when running a concatenation operation on a list of DataFrames?

2015-09-09 Thread kbtyo
I have a list of Pandas Dataframes that I am attempting to combine using the 
concatenation function.

dataframe_lists = [df1, df2, df3]

result = pd.concat(dataframe_lists, keys=['one', 'two', 'three'],
                   ignore_index=True)

The full traceback that I receive when I execute this function is:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 result = pd.concat(dataframe_lists, keys=['one', 'two', 'three'], ignore_index=True)
      2 check(dataframe_lists)

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tools\merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    753                        verify_integrity=verify_integrity,
    754                        copy=copy)
--> 755     return op.get_result()
    756 
    757 

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tools\merge.py in get_result(self)
    924 
    925             new_data = concatenate_block_managers(
--> 926                 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)
    927             if not self.copy:
    928                 new_data._consolidate_inplace()

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4061                                 copy=copy),
   4062                      placement=placement)
-> 4063               for placement, join_units in concat_plan]
   4064 
   4065     return BlockManager(blocks, axes)

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in <listcomp>(.0)
   4061                                 copy=copy),
   4062                      placement=placement)
-> 4063               for placement, join_units in concat_plan]
   4064 
   4065     return BlockManager(blocks, axes)

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in concatenate_join_units(join_units, concat_axis, copy)
   4150         raise AssertionError("Concatenating join units along axis0")
   4151 
-> 4152     empty_dtype, upcasted_na = get_empty_dtype_and_na(join_units)
   4153 
   4154     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in get_empty_dtype_and_na(join_units)
   4139         return np.dtype('m8[ns]'), tslib.iNaT
   4140     else:  # pragma
-> 4141         raise AssertionError("invalid dtype determination in get_concat_dtype")
   4142 
   4143 

AssertionError: invalid dtype determination in get_concat_dtype


I believe that the error lies in the fact that one of the data frames is empty. 
As a temporary workaround for this rather perplexing error, I used the simple 
function check() to verify the dataframes and return just the headers of any 
empty dataframe: 

def check(list_of_df):
    headers = []
    for df in list_of_df:
        if not df.empty:
            continue
        else:
            headers.append(df.columns)
    return headers

I am wondering whether it is possible to use this function so that, in the 
case of an empty dataframe, just that dataframe's headers are returned and 
appended to the concatenated dataframe. The output would have a single header 
row (and, in the case of a repeating column name, just a single instance of 
that header, as with the concatenation function). I have two sample data 
sources, both non-empty: 

df1: https://gist.github.com/ahlusar1989/42708e6a3ca0aed9b79b
df2: https://gist.github.com/ahlusar1989/26eb4ce1578e0844eb82

Here is an empty dataframe.


df3 (empty dataframe): https://gist.github.com/ahlusar1989/0721bd8b71416b54eccd

I would like the resulting concatenation to have the column headers (with 
their values) reflecting df1 and df2...

'AT','AccountNum', 'AcctType', 'Amount', 'City', 'Comment', 
'Country','DuplicateAddressFlag', 'FromAccount', 'FromAccountNum', 
'FromAccountT','PN', 'PriorCity', 'PriorCountry', 'PriorState', 
'PriorStreetAddress','PriorStreetAddress2', 'PriorZip', 'RTID', 'State', 
'Street1','Street2', 'Timestamp', 'ToAccount', 'ToAccountNum', 'ToAccountT', 
'TransferAmount', 'TransferMade', 'TransferTimestamp', 'Ttype', 'WA','WC', 'Zip'

as follows: 

'A', 'AT','AccountNum', 'AcctType', 'Amount', 'B', 'C', 'City', 'Comment', 
'Country', 'D', 'DuplicateAddressFlag', 'E', 'F' 'FromAccount', 
'FromAccountNum', 'FromAccountT', 'G', 'PN', 'PriorCity', 'PriorCountry', 
'PriorState', 'PriorStreetAddress','PriorStreetAddress2', 'PriorZip', 'RTID', 
'State', 'Street1','Street2', 'Timestamp', 'ToAccount', 'ToAccountNum', 
'ToAccountT', 'TransferAmount', 'TransferMade', 'TransferTimestamp', 'Ttype', 
'WA','WC', 'Zip'

I welcome any feedback on how best to do this. Thank you.
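
For what it is worth, here is the direction I am considering - a minimal 
sketch (it assumes only that the inputs are ordinary pandas DataFrames): 
concatenate the non-empty frames, then append the empty frames' headers as 
extra all-NaN columns, keeping a single instance of any repeated name:

import pandas as pd

def concat_keeping_empty_headers(frames):
    non_empty = [df for df in frames if not df.empty]
    result = pd.concat(non_empty, ignore_index=True)
    # Columns that appear only in the empty frames are appended as
    # all-NaN columns, one instance per name.
    extra = []
    for df in frames:
        if df.empty:
            for col in df.columns:
                if col not in result.columns and col not in extra:
                    extra.append(col)
    return result.reindex(columns=list(result.columns) + extra)

# Hypothetical usage:
# result = concat_keeping_empty_headers([df1, df2, df3])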

Re: continue vs. pass in this IO reading and writing

2015-09-03 Thread kbtyo
On Thursday, September 3, 2015 at 12:12:04 PM UTC-4, Chris Angelico wrote:
> On Fri, Sep 4, 2015 at 1:57 AM, kbtyo  wrote:
> > I have used CSV and collections. For some reason when I apply this 
> > algorithm, all of my files are not added (the output is ridiculously small 
> > considering how much goes in - think KB output vs MB input):
> >
> > from glob import iglob
> > import csv
> > from collections import OrderedDict
> >
> > files = sorted(iglob('*.csv'))
> > header = OrderedDict()
> > data = []
> >
> > for filename in files:
> > with open(filename, 'r') as fin:
> > csvin = csv.DictReader(fin)
> > header.update(OrderedDict.fromkeys(csvin.fieldnames))
> > data.append(next(csvin))
> >
> > with open('output_filename_version2.csv', 'w') as fout:
> > csvout = csv.DictWriter(fout, fieldnames=list(header))
> > csvout.writeheader()
> > csvout.writerows(data)
> 
> You're collecting up just one row from each file. Since you say your
> input is measured in MB (not GB or anything bigger), the simplest
> approach is probably fine: instead of "data.append(next(csvin))", just
> use "data.extend(csvin)", which should grab them all. That'll store
> all your input data in memory, which should be fine if it's only a few
> meg, and probably not a problem for anything under a few hundred meg.
> 
> ChrisA

Hmm - good point. However, I may have to deal with larger files; thank 
you for the tip. 

I am also wondering, based on what you stated - that I am only "collecting up 
just one row from each file" - whether I am still fulfilling this requirement: 

"I have files that may have different headers. If they are different, they 
should be appended (along with their values) into the output. If there are 
duplicate headers, then their values should just be added sequentially."

Additionally, I am wondering whether DictReader skips empty rows by default, 
and whether that might be happening here and affecting the other rows as well.
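
For the record, here is the collection loop with your suggestion applied 
(a sketch of just the change; everything else as in the code above):

for filename in files:
    with open(filename, 'r') as fin:
        csvin = csv.DictReader(fin)
        header.update(OrderedDict.fromkeys(csvin.fieldnames))
        data.extend(csvin)  # every remaining row, not just the first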
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: continue vs. pass in this IO reading and writing

2015-09-03 Thread kbtyo
On Thursday, September 3, 2015 at 11:52:16 AM UTC-4, Chris Angelico wrote:
> On Fri, Sep 4, 2015 at 1:38 AM, kbtyo wrote:
> > Thank you for the elaboration. So, what I hear you saying is that (citing, 
> > "In this case, there's no further body, so it's going to be the same as 
> > "pass" (which
> > means "do nothing")") that the else block is not entered. For exma
> 
> Seems like a cut-off paragraph here, but yes. In a try/except/else
> block, the 'else' block executes only if the 'try' didn't raise an
> exception of the specified type(s).
> 
> > Do you mind elaborating on what you meant by "compatible headers?". The 
> > files that I am processing may or may not have the same headers (but if 
> > they do they should add the respective values only).
> >
> 
> Your algorithm is basically: Take the entire first file, including its
> header, and then append all other files after skipping their first
> lines. If you want a smarter form of CSV merge, I would recommend
> using the 'csv' module, and probably doing a quick check of all files
> before you begin, so as to collect up the full set of headers. That'll
> also save you the hassle of playing around with StopIteration as you
> read in the headers.
> 
> ChrisA


I have files that may have different headers. If they are different, they 
should be appended (along with their values). If there are duplicate headers, 
then their values should just be added. 

I have used csv and collections. For some reason, when I apply this algorithm, 
not all of my files are added (the output is ridiculously small considering how 
much goes in - think KB of output vs. MB of input):

from glob import iglob
import csv
from collections import OrderedDict

files = sorted(iglob('*.csv'))
header = OrderedDict()
data = []

for filename in files:
    with open(filename, 'r') as fin:
        csvin = csv.DictReader(fin)
        header.update(OrderedDict.fromkeys(csvin.fieldnames))
        data.append(next(csvin))

with open('output_filename_version2.csv', 'w') as fout:
    csvout = csv.DictWriter(fout, fieldnames=list(header))
    csvout.writeheader()
    csvout.writerows(data)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: continue vs. pass in this IO reading and writing

2015-09-03 Thread kbtyo
On Thursday, September 3, 2015 at 11:27:58 AM UTC-4, Chris Angelico wrote:
> On Fri, Sep 4, 2015 at 1:05 AM, kbtyo wrote:
> > However, I am uncertain as to how this executes in a context like this:
> >
> > import glob
> > import csv
> > from collections import OrderedDict
> >
> > interesting_files = glob.glob("*.csv")
> >
> > header_saved = False
> > with open('merged_output_mod.csv','w') as fout:
> >
> > for filename in interesting_files:
> > print("execution here again")
> > with open(filename) as fin:
> > try:
> > header = next(fin)
> > print("Entering Try and Except")
> > except:
> > StopIteration
> > continue
> 
> I think what you want here is:
> 
> except StopIteration:
> continue
> 
> The code you have will catch _any_ exception, and then look up the
> name StopIteration (and discard it).
> 
> > else:
> > if not header_saved:
> > fout.write(header)
> > header_saved = True
> > print("We got here")
> > for line in fin:
> > fout.write(line)
> >
> > My questions are (for some reason my interpreter does not print out any 
> > readout):
> >
> > 1. after the exception is raised does the continue return back up to the 
> > beginning of the for loop (and the "else" conditional is not even 
> > encountered)?
> >
> > 2. How would a pass behave in this situation?
> 
> The continue statement means "skip the rest of this loop's body and go
> to the next iteration of the loop, if there is one". In this case,
> there's no further body, so it's going to be the same as "pass" (which
> means "do nothing").


So what I hear you saying is that I am not entering the "else" block? Hence, 
when each file is read, the rest of the suite is not applied - specifically: 

    if not header_saved:
        fout.write(header)
        header_saved = True
        print("We got here")

> 
> For the rest, I think your code should be broadly functional. Of
> course, it assumes that your files all have compatible headers, but
> presumably you know that that's safe.
> 
> ChrisA

Would you mind elaborating on what you meant by "compatible headers"? I have 
files that may have different headers. If they are different, they should be 
appended (along with their values). If there are duplicate headers, then their 
values should just be added. 



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: continue vs. pass in this IO reading and writing

2015-09-03 Thread kbtyo
On Thursday, September 3, 2015 at 11:27:58 AM UTC-4, Chris Angelico wrote:
> On Fri, Sep 4, 2015 at 1:05 AM, kbtyo  wrote:
> > However, I am uncertain as to how this executes in a context like this:
> >
> > import glob
> > import csv
> > from collections import OrderedDict
> >
> > interesting_files = glob.glob("*.csv")
> >
> > header_saved = False
> > with open('merged_output_mod.csv','w') as fout:
> >
> > for filename in interesting_files:
> > print("execution here again")
> > with open(filename) as fin:
> > try:
> > header = next(fin)
> > print("Entering Try and Except")
> > except:
> > StopIteration
> > continue
> 
> I think what you want here is:
> 
> except StopIteration:
> continue
> 
> The code you have will catch _any_ exception, and then look up the
> name StopIteration (and discard it).
> 
> > else:
> > if not header_saved:
> > fout.write(header)
> > header_saved = True
> > print("We got here")
> > for line in fin:
> > fout.write(line)
> >
> > My questions are (for some reason my interpreter does not print out any 
> > readout):
> >
> > 1. after the exception is raised does the continue return back up to the 
> > beginning of the for loop (and the "else" conditional is not even 
> > encountered)?
> >
> > 2. How would a pass behave in this situation?
> 
> The continue statement means "skip the rest of this loop's body and go
> to the next iteration of the loop, if there is one". In this case,
> there's no further body, so it's going to be the same as "pass" (which
> means "do nothing").
> 
> For the rest, I think your code should be broadly functional. Of
> course, it assumes that your files all have compatible headers, but
> presumably you know that that's safe.
> 
> ChrisA

Hi ChrisA:

Thank you for the elaboration. So, what I hear you saying (citing, "In 
this case, there's no further body, so it's going to be the same as "pass" 
(which means "do nothing")") is that the else block is not entered.

Do you mind elaborating on what you meant by "compatible headers"? The files 
that I am processing may or may not have the same headers (but if they do, 
only the respective values should be added). 
-- 
https://mail.python.org/mailman/listinfo/python-list


continue vs. pass in this IO reading and writing

2015-09-03 Thread kbtyo
Good Morning:

I am experimenting with exception handling and with continue vs. pass. 
After poring over a lot of material on SO and other forums, I am still unclear 
as to the difference when setting variables and applying functions within 
multiple "for" loops. 

Specifically, I understand that the general format in the case of pass and 
using else is the following:

try:
    doSomething()
except Exception:
    pass
else:
    stuffDoneIf()
    TryClauseSucceeds()

However, I am uncertain as to how this executes in a context like this:

import glob
import csv
from collections import OrderedDict

interesting_files = glob.glob("*.csv") 

header_saved = False
with open('merged_output_mod.csv', 'w') as fout:

    for filename in interesting_files:
        print("execution here again")
        with open(filename) as fin:
            try:
                header = next(fin)
                print("Entering Try and Except")
            except:
                StopIteration
                continue
            else:
                if not header_saved:
                    fout.write(header)
                    header_saved = True
                    print("We got here")
                for line in fin:
                    fout.write(line)

My questions are (for some reason my interpreter does not print any of the 
readouts):

1. After the exception is raised, does the continue return control to the top 
of the for loop (so that the "else" clause is never encountered)?

2. How would a pass behave in this situation?
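
For concreteness, here is a tiny self-contained loop illustrating what I am 
asking about (not my real code, just the shape of it):

for n in (0, 1, 2):
    try:
        inverse = 1 / n
    except ZeroDivisionError:
        continue  # question 2: would `pass` here behave any differently?
    else:
        print(n, inverse)  # runs only when the try body raised nothing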

Thanks for your feedback. 

Sincerely,

Saran
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to decipher the "NoneType" error

2015-09-02 Thread kbtyo
On Wednesday, September 2, 2015 at 6:29:05 PM UTC-4, Chris Angelico wrote:
> On Thu, Sep 3, 2015 at 8:15 AM, kbtyo wrote:
> > However, when I hit line 40 (referencing the gist), I receive the following 
> > error:
> >
> > ---
> > TypeError Traceback (most recent call last)
> >  in ()
> >  23 # to ensure that the field names in the XML don't match 
> > (and override) the
> >  24 # field names already in the dictionary
> > ---> 25 row.update(xml_data)
> >  26 # ensure that the headers have all the right fields
> >  27 headers.update(row.keys())
> >
> > TypeError: 'NoneType' object is not iterable
> >
> > I can only infer I am passing or converting an empty key or empty row. I 
> > welcome feedback.
> 
> NoneType is the type of the singleton object None. So what this means
> is that xml_data is None at this point.
> 
> In your just_xml_data() function, which is what provides xml_data at
> that point, there are two code branches that matter here: one is the
> "else: return xml", and the other is the implicit "return None" at the
> end of the function. If you catch an exception, you print out a
> message and then allow the function to return None. Is that really
> your intention? It seems an odd way to do things, but if that is what
> you want, you'll need to cope with the None return in the main
> routine.
> 
> BTW, your gist has a couple of non-comments on lines 48 and 49. If you
> can make sure that your code is functionally correct, it'll be easier
> for us to test. (Even more so if you can include a sample input file
> in the gist, though preferably not a huge one.) At the moment, I'm
> just eyeballing the code itself, but if someone can actually run the
> script and reproduce the exact error you're seeing, it makes debugging
> that much easier.
> 
> All the best!
> 
> ChrisA

Hi Chris:

I made changes to the gist: 
https://gist.github.com/ahlusar1989/de2381c1fb77e96ae601

Ahh. Thanks for catching that. No, I want to skip over the exception 
and return the xml_data from the try block. I didn't realize that was the 
behaviour; I don't want to break the iterative loop. Any advice on how to 
resolve this?
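
In the meantime, would the simplest guard in the main routine be something 
like this? (A sketch; just_xml_data() is the function from the gist, and I 
am guessing at how it is called.)

xml_data = just_xml_data(row)
if xml_data is not None:  # skip rows whose XML failed to parse
    row.update(xml_data)
    headers.update(row.keys())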

Thanks.
-- 
https://mail.python.org/mailman/listinfo/python-list


How to decipher the "NoneType" error

2015-09-02 Thread kbtyo
I am currently using Jupyter and Python 3.4. I am using the following script to 
parse XML data structures nested in a CSV-formatted file, convert them into a 
dictionary, and then write the result back into the same CSV. I am using the 
following (I added comments to explain the thought process): 

https://gist.github.com/ahlusar1989/de2381c1fb77e96ae601

However, when I hit line 40 (referencing the gist), I receive the following 
error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
     23     # to ensure that the field names in the XML don't match (and
     24     # override) the field names already in the dictionary
---> 25     row.update(xml_data)
     26     # ensure that the headers have all the right fields
     27     headers.update(row.keys())

TypeError: 'NoneType' object is not iterable 

I can only infer that I am passing or converting an empty key or an empty row. 
I welcome feedback.
-- 
https://mail.python.org/mailman/listinfo/python-list


AttributeError: 'module' object has no attribute '__path__'

2015-08-31 Thread kbtyo

I am using Jupyter notebooks with Python 3.4. The error below references the 
Anaconda distribution package. This error occurred quite suddenly (only two 
minutes earlier I was able to import the modules). I am using Windows 7. My 
path in the console uses Python 2.7; I have Python 3.4 installed as well. I am 
not sure where to start modifying the path, and I fear that I will break my 
initial setup. I welcome feedback on next steps.


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Program Files\New\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)

AttributeError: 'module' object has no attribute '__path__'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
<ipython-input> in <module>()
      8 import fnmatch
      9 # import xml.etree.cElementTree as ElementTree
---> 10 from xml.etree.ElementTree import XMLParser
     11 import xml.etree.ElementTree as ElementTree
     12 import glob

ImportError: No module named 'xml.etree'; 'xml' is not a package
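
As a first diagnostic before modifying any paths, I plan to run the 
following in the same notebook (a sketch): it checks which interpreter is 
running and where the xml package is actually imported from, since I 
understand a local file or directory named "xml" can shadow the standard 
library package:

import sys
print(sys.executable)  # the interpreter actually running the notebook

import xml
print(xml.__file__)    # should point inside the standard library;
                       # anything else suggests a shadowing module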
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: TypeError: unhashable type: 'dict' when attempting to hash list - advice sought

2015-08-30 Thread kbtyo
On Sunday, August 30, 2015 at 1:16:12 PM UTC-4, MRAB wrote:
> On 2015-08-30 17:31, kbtyo wrote:
> > On Saturday, August 29, 2015 at 10:50:18 PM UTC-4, MRAB wrote:
> >> On 2015-08-30 03:05, kbtyo wrote:
> >> > I am using Jupyter Notebook and Python 3.4. I have a data structure in 
> >> > the format, (type list):
> >> >
> >> > [{'AccountNumber': N,
> >> > 'Amount': '0',
> >> >   'Answer': '12:00:00 PM',
> >> >'ID': None,
> >> >'Type': 'WriteLetters',
> >> >'Amount': '10',
> >> >{'AccountNumber': Y,
> >> >'Amount': '0',
> >> >'Answer': ' 12:00:00 PM',
> >> > 'ID': None,
> >> >'Type': 'Transfer',
> >> >'Amount': '2'}]
> >> >
> >> > The end goal is to write this out to CSV.
> >> >
> >> > For the above example the output would look like:
> >> >
> >> > AccountNumber, Amount, Answer, ID, Type, Amount
> >> > N,0,12:00:00 PM,None,WriteLetters,10
> >> > Y,2,12:00:00 PM,None,Transfer,2
> >> >
> >> > Below is the function that I am using to write out this data structure. 
> >> > Please excuse any indentation formatting issues. The data structure is 
> >> > returned through the function "construct_results(get_just_xml_data)".
> >> >
> >> > The data that is returned is in the format as above. 
> >> > "construct_headers(get_just_xml_data)" returns a list of headers. 
> >> > Writing out the row for "headers_list" works.
> >> >
> >> > The list comprehension "data" is to maintain the integrity of the column 
> >> > headers and the values for each new instance of the data structure 
> >> > (where the keys in the dictionary are the headers and values - row 
> >> > instances). The keys in this specific data structure are meant to check 
> >> > if there is a value instance, and if there is not - place an ''.
> >> >
> >> > def write_to_csv(results, headers):
> >> >
> >> >  headers = construct_headers(get_just_xml_data)
> >> >  results = construct_results(get_just_xml_data)
> >> >  headers_list = list(headers)
> >> >
> >> >  with open('real_csv_output.csv', 'wt') as f:
> >> >  writer = csv.writer(f)
> >> >  writer.writerow(headers_list)
> >> >  for row in results:
> >> >  data = [row.get(index, '') for index in results]
> >> >  writer.writerow(data)
> >> >
> >> >
> >> >
> >> > However, when I run this, I receive this error:
> >> >
> >> > ---
> >> > TypeError Traceback (most recent call 
> >> > last)
> >> >  in ()
> >> > > 1 write_to_csv(results, headers)
> >> >
> >> >  in write_to_csv(results, headers)
> >> >9 writer.writerow(headers_list)
> >> >   10 for item in results:
> >> > ---> 11 data = [item.get(index, '') for index in results]
> >> >   12 writer.writerow(data)
> >> >
> >> >  in (.0)
> >> >9 writer.writerow(headers_list)
> >> >   10 for item in results:
> >> > ---> 11 data = [item.get(index, '') for index in results]
> >> >   12 writer.writerow(data)
> >> >
> >> > TypeError: unhashable type: 'dict'
> >> >
> >> >
> >> > I have done some research, namely, the following:
> >> >
> >> > https://mail.python.org/pipermail//tutor/2011-November/086761.html
> >> >
> >> > http://stackoverflow.com/questions/27435798/unhashable-type-dict-type-error
> >> >
> >> > http://stackoverflow.com/questions/1957396/why-dict-objects-are-unhashable-in-python
> >> >
> >> > However, I am still perplexed by this error. Any feedback is welcomed. 
> >> > Thank you.
> >> >
>

Re: TypeError: unhashable type: 'dict' when attempting to hash list - advice sought

2015-08-30 Thread kbtyo
On Saturday, August 29, 2015 at 11:04:53 PM UTC-4, Ben Finney wrote:
> kbtyo writes:
> 
> > I am using Jupyter Notebook and Python 3.4.
> 
> Thank you for saying so! It is not always required, but when it matters,
> this information is important to state up front.
> 
> > I have a data structure in the format, (type list):
> >
> > [{'AccountNumber': N,
> > 'Amount': '0',
> >  'Answer': '12:00:00 PM',
> >   'ID': None,
> >   'Type': 'WriteLetters',
> >   'Amount': '10',
> >   {'AccountNumber': Y,
> >   'Amount': '0',
> >   'Answer': ' 12:00:00 PM',   
> >'ID': None,
> >   'Type': 'Transfer',
> >   'Amount': '2'}]
> >
> > The end goal is to write this out to CSV.
> 
> So that assumes that *every* item will be a mapping with all the same
> keys. CSV is limited to a sequence of "records" which all have the same
> fields in the same order.

This clue tipped me off that I wasn't collecting the newly generated key-value 
pairs from my XML parser properly. I was using the dictionary's built-in 
update method to update the keys. The terrible thing was that the returned 
dictionary ended up holding only the last keys and values. What a couple of 
hours of shut-eye can do for the mind and body. 

> 
> > The list comprehension "data" is to maintain the integrity of the
> > column headers and the values for each new instance of the data
> > structure (where the keys in the dictionary are the headers and values
> > - row instances). The keys in this specific data structure are meant
> > to check if there is a value instance, and if there is not - place an
> > ''.
> >
> 
> [...]
> > for row in results:
> > data = [row.get(index, '') for index in results]
> 
> The 'for' statement iterates over 'results', getting an item each time.
> The name 'row' is bound to each item in turn.
> 
> Then, each time through the 'for' loop, you iterate *again* over
> 'results'. The name 'index' is bound to each item.
> 
> You then attempt to use the dict (each item from 'results' is itself a
> dict) as a key into that same dict. A dict is not a valid key; it is not
> a "hashable type" i.e. a type with a fixed value, that can produce a
> hash of the value).

I discovered that. I need to iterate again to access the keys and values. 
> 
> So you're getting dicts and attempting to use those dicts as keys into
> dicts. That will give the error "TypeError: unhashable type: 'dict'".
> 
> I think what you want is not items from the original sequence, but the
> keys from the mapping::
> 
> for input_record in results:
> output_record = [input_record.get(key, "") for key in input_record]
> 
> But you're then throwing away the constructed list, since you do nothing
> with it before the end of the loop.
> 
> > writer.writerow(data)
> 
> This statement occurs only *after* all the items from 'results' have
> been iterated. You will only have the most recent constructed row.
> 
> Perhaps you want::
> 
> for input_record in results:
> output_record = [input_record.get(key, "") for key in input_record]
> writer.writerow(output_record)
> 

I tried this, and some of the values maintained integrity while others were 
overwritten by a previous dictionary's values. 


> -- 
>  \   "An idea isn't responsible for the people who believe in it." |
>   `\  --Donald Robert Perry Marquis |
> _o__)  |
> Ben Finney

@BenFinney:

I feel that I need to provide some context to avoid any confusion over my 
motivations for choosing to do something. 

My original task was to parse an XML data structure stored in a CSV file with 
other data types and then add the elements back as headers and the text as row 
values. I went back to the drawing board and created a "results" list of 
dictionaries where the keys have lists as values, using this: 

def convert_list_to_dict(get_just_xml_data):
    d = {}
    for item in get_just_xml_data(get_all_data):
        for k, v in item.items():
            try:
                d[k].append(v)
            except KeyError:
                d[k] = [v]
    return d

This creates a dictionary for each XML tag - for example: 
{
 'Number1': [

Re: TypeError: unhashable type: 'dict' when attempting to hash list - advice sought

2015-08-30 Thread kbtyo
On Saturday, August 29, 2015 at 10:50:18 PM UTC-4, MRAB wrote:
> On 2015-08-30 03:05, kbtyo wrote:
> > I am using Jupyter Notebook and Python 3.4. I have a data structure in the 
> > format, (type list):
> >
> > [{'AccountNumber': N,
> > 'Amount': '0',
> >   'Answer': '12:00:00 PM',
> >'ID': None,
> >'Type': 'WriteLetters',
> >'Amount': '10',
> >{'AccountNumber': Y,
> >'Amount': '0',
> >'Answer': ' 12:00:00 PM',
> > 'ID': None,
> >'Type': 'Transfer',
> >'Amount': '2'}]
> >
> > The end goal is to write this out to CSV.
> >
> > For the above example the output would look like:
> >
> > AccountNumber, Amount, Answer, ID, Type, Amount
> > N,0,12:00:00 PM,None,WriteLetters,10
> > Y,2,12:00:00 PM,None,Transfer,2
> >
> > Below is the function that I am using to write out this data structure. 
> > Please excuse any indentation formatting issues. The data structure is 
> > returned through the function "construct_results(get_just_xml_data)".
> >
> > The data that is returned is in the format as above. 
> > "construct_headers(get_just_xml_data)" returns a list of headers. Writing 
> > out the row for "headers_list" works.
> >
> > The list comprehension "data" is to maintain the integrity of the column 
> > headers and the values for each new instance of the data structure (where 
> > the keys in the dictionary are the headers and values - row instances). The 
> > keys in this specific data structure are meant to check if there is a value 
> > instance, and if there is not - place an ''.
> >
> > def write_to_csv(results, headers):
> >
> >  headers = construct_headers(get_just_xml_data)
> >  results = construct_results(get_just_xml_data)
> >  headers_list = list(headers)
> >
> >  with open('real_csv_output.csv', 'wt') as f:
> >  writer = csv.writer(f)
> >  writer.writerow(headers_list)
> >  for row in results:
> >  data = [row.get(index, '') for index in results]
> >  writer.writerow(data)
> >
> >
> >
> > However, when I run this, I receive this error:
> >
> > ---
> > TypeError Traceback (most recent call last)
> >  in ()
> > > 1 write_to_csv(results, headers)
> >
> >  in write_to_csv(results, headers)
> >9 writer.writerow(headers_list)
> >   10 for item in results:
> > ---> 11 data = [item.get(index, '') for index in results]
> >   12 writer.writerow(data)
> >
> >  in (.0)
> >9 writer.writerow(headers_list)
> >   10 for item in results:
> > ---> 11 data = [item.get(index, '') for index in results]
> >   12 writer.writerow(data)
> >
> > TypeError: unhashable type: 'dict'
> >
> >
> > I have done some research, namely, the following:
> >
> > https://mail.python.org/pipermail//tutor/2011-November/086761.html
> >
> > http://stackoverflow.com/questions/27435798/unhashable-type-dict-type-error
> >
> > http://stackoverflow.com/questions/1957396/why-dict-objects-are-unhashable-in-python
> >
> > However, I am still perplexed by this error. Any feedback is welcomed. 
> > Thank you.
> >
> You're taking the index values from 'results' instead of 'headers'.

Would you be able to elaborate on this? I partially understand what you mean; 
however, each dictionary in results has the same keys to map to (i.e., the 
headers when written out to CSV). Would you be able to explain how the index 
is being used in this case?
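
If I follow, the fix would be to key the comprehension off the headers 
rather than off results - a sketch of just the changed lines:

for row in results:
    data = [row.get(key, '') for key in headers_list]  # keys from the headers
    writer.writerow(data)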
-- 
https://mail.python.org/mailman/listinfo/python-list


TypeError: unhashable type: 'dict' when attempting to hash list - advice sought

2015-08-29 Thread kbtyo
I am using Jupyter Notebook and Python 3.4. I have a data structure in the 
format, (type list):

[{'AccountNumber': 'N',
  'Amount': '0',
  'Answer': '12:00:00 PM',
  'ID': None,
  'Type': 'WriteLetters',
  'Amount': '10'},
 {'AccountNumber': 'Y',
  'Amount': '0',
  'Answer': ' 12:00:00 PM',
  'ID': None,
  'Type': 'Transfer',
  'Amount': '2'}]

The end goal is to write this out to CSV.

For the above example the output would look like:

AccountNumber, Amount, Answer, ID, Type, Amount
N,0,12:00:00 PM,None,WriteLetters,10
Y,2,12:00:00 PM,None,Transfer,2

Below is the function that I am using to write out this data structure. Please 
excuse any indentation formatting issues. The data structure is returned 
through the function "construct_results(get_just_xml_data)". 

The data that is returned is in the format as above. 
"construct_headers(get_just_xml_data)" returns a list of headers. Writing out 
the row for "headers_list" works.

The list comprehension "data" is to maintain the integrity of the column 
headers and the values for each new instance of the data structure (where the 
keys in the dictionary are the headers and values - row instances). The keys in 
this specific data structure are meant to check if there is a value instance, 
and if there is not - place an ''.

def write_to_csv(results, headers):

    headers = construct_headers(get_just_xml_data)
    results = construct_results(get_just_xml_data)
    headers_list = list(headers)

    with open('real_csv_output.csv', 'wt') as f:
        writer = csv.writer(f)
        writer.writerow(headers_list)
        for row in results:
            data = [row.get(index, '') for index in results]
            writer.writerow(data)



However, when I run this, I receive this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 write_to_csv(results, headers)

<ipython-input> in write_to_csv(results, headers)
      9         writer.writerow(headers_list)
     10         for item in results:
---> 11             data = [item.get(index, '') for index in results]
     12             writer.writerow(data)

<ipython-input> in <listcomp>(.0)
      9         writer.writerow(headers_list)
     10         for item in results:
---> 11             data = [item.get(index, '') for index in results]
     12             writer.writerow(data)

TypeError: unhashable type: 'dict'


I have done some research, namely, the following:

https://mail.python.org/pipermail//tutor/2011-November/086761.html

http://stackoverflow.com/questions/27435798/unhashable-type-dict-type-error

http://stackoverflow.com/questions/1957396/why-dict-objects-are-unhashable-in-python

However, I am still perplexed by this error. Any feedback is welcomed. Thank 
you.
-- 
https://mail.python.org/mailman/listinfo/python-list


enumerate XML tags (keys that will become headers) along with text (values) and write to CSV in one row (as opposed to "stacked" values with one header)

2015-06-25 Thread kbtyo
My question can be found here:


http://stackoverflow.com/questions/31058100/enumerate-column-headers-in-csv-that-belong-to-the-same-tag-key-in-python


Here is an additional sample of the XML that I am working with: 



[The XML sample did not survive the list archive: the tags were stripped, 
leaving only the element text - blocks of numeric "0" values, a block of 
"1/1/0001 12:00:00 AM" timestamps, and a block of "False" flags.]

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: To write headers once with different values in separate row in CSV

2015-06-25 Thread kbtyo
Okay, so I have gone back to the drawing board and have the following 
predicament (my apologies, in advance for the indentation):

Here is my sample:


[The XML sample did not survive the list archive here either: the tags were 
stripped, leaving only the element text - blocks of "0" values, "1/1/0001 
12:00:00 AM" timestamps, and "False" flags.]


Using this:


import xml.etree.cElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
import csv

def flatten_list(aList, prefix=''):
    for i, element in enumerate(aList, 1):
        eprefix = "{}{}".format(prefix, i)
        if element:
            # treat like dict
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, eprefix)
            # treat like list
            elif element[0].tag == element[1].tag:
                yield from flatten_list(element, eprefix)
        elif element.text:
            text = element.text.strip()
            if text:
                yield eprefix[:].rstrip('.'), element.text

def flatten_dict(parent_element, prefix=''):
    prefix = prefix + parent_element.tag
    if parent_element.items():
        for k, v in parent_element.items():
            yield prefix + k, v
    for element in parent_element:
        eprefix = prefix + element.tag
        if element:
            # treat like dict - we assume that if the first two tags
            # in a series are different, then they are all different.
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, prefix=prefix)
            # treat like list - we assume that if the first two tags
            # in a series are the same, then the rest are the same.
            else:
                # here, we put the list in a dictionary; the key is the
                # tag name the list elements all share in common, and
                # the value is the list itself
                yield from flatten_list(element, prefix=eprefix)
            # if the tag has attributes, add those to the dict
            if element.items():
                for k, v in element.items():
                    yield eprefix + k, v
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
        elif element.items():
            for k, v in element.items():
                yield eprefix + k, v
        # finally, if there are no child tags and no attributes, extract
        # the text
        else:
            yield eprefix, element.text

def makerows(pairs):
    headers = []
    columns = {}
    for k, v in pairs:
        if k in columns:
            columns[k].extend(
Re: To write headers once with different values in separate row in CSV

2015-06-24 Thread kbtyo
On Wednesday, June 24, 2015 at 8:38:24 AM UTC-4, Steven D'Aprano wrote:
> On Wed, 24 Jun 2015 09:37 pm, kbtyo wrote:
> 
> > On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote:
> >> On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote:
> >> 
> >> > That is not the underlying issue. Any thoughts or suggestions would be
> >> > very helpful.
> >> 
> >> 
> >> Thank you for spending over 100 lines to tell us what is NOT the
> >> underlying issue. I will therefore tell you what is NOT the solution to
> >> your problem (whatever it is, since I can't tell). The solution is NOT to
> >> squeeze lemon juice into your keyboard.
> >> 
> >> If someday you feel like telling us what the issue actually IS, instead
> >> of what it IS NOT, then perhaps we will have a chance to help you find a
> >> solution.
> >> 
> >> 
> >> 
> >> --
> >> Steven
> > 
> > Curious - what should I have provided? 
> 
> To start with, you should tell us what is the problem you are having. You
> gave us some code, and then said "That is not the underlying issue". Okay,
> so what is the underlying issue? What is the problem you want help solving?
> 
> In another post, you responded to John Gordon's question:
> 
> # John
> Have you tried creating some dummy data by hand and seeing 
> how makerows() handles it?
> 
> 
> by answering:
> 
> Yes I did do this.
> 
> 
> Okay. What was the result? Do you want us to guess what result you got?
> 
> 
> John also suggested that you provide sample data, and an implementation of
> flatten_dict, and your answer is:
> 
> Yes, unfortunately, due to NDA protocols I cannot share this.
> 
> 
> You don't have to provide your *actual* data. You can provide *sample* data,
> that does not contain any of your actual confidential values. If your XML
> file looks like this:
> 
> 
> [sample XML elided by the list archive - the tags were stripped, leaving 
> only the element text: an author ("Gambardella, Matthew"), a title ("XML 
> Developer's Guide"), a genre, a price, a date, and a description]
> 
> you can replace the data:
> 
> [the same sample with the data replaced: "Smith, John", "ABCDEF", 
> "Widgets", ".99", "1900-01-01", "blah blah blah blah" - again with the 
> tags stripped by the archive]
> 
> You can even change the tags:
> 
> [the same replaced data shown once more under different tag names - the 
> tags themselves were stripped by the archive]
> 
> If you're still worried that the sample XML has the same structure as your
> real data, you can remove some fields and add new ones:
> 
> [a final variant with some fields removed and new ones added: "ABCDEF", 
> ".99", "1900-01-01", "fe fi fo fum", "blah blah blah blah" - tags again 
> stripped by the archive]
> 
> If you can't share the flatten_dict() function, either: 
> 
> (1) get permission to share it from your manager or project leader.
> flatten_dict is not a trade secret or valuable in any way, and
> half-competent Python programmer can probably come up with two or three
> different ways to flatten a dict in five minutes. They're all going to look
> more or less the same, because there's only so many ways to flatten a dict.
> 
> (2) Or accept that we can't help you, and deal with it on your own.
> 
> 
> 
> > Detailed and constructive feedback 
> > (like your reply to my post regarding importing functions) is more useful
> > than to "squeeze lemon juice" into one's keyboard.
> 
> Of course. That is why I said it was NOT the solution. Don't waste your time
> squeezing lemon juice over your keyboard, it won't solve your problem.
> 
> But you can't expect us to guess what your problem is, or debug code we
> can't see, or read your mind and understand your data.
> 
> Before you ask any more questions, please read this:
> 
> http://sscce.org/
> 
> 
> 
> -- 
> Steven




Re: Organizing function calls once files have been moved to a directory

2015-06-24 Thread kbtyo
On Tuesday, June 23, 2015 at 10:18:43 PM UTC-4, Steven D'Aprano wrote:
> On Wed, 24 Jun 2015 06:16 am, kbtyo wrote:
> 
> > I am working on a workflow module that will allow one to recursively check
> > for file extensions and if there is a match move them to a folder for
> > processing (parsing, data wrangling etc).
> > 
> > I have a simple search process, and log for the files that are present
> > (see below). However, I am puzzled by what the most efficient
> > method/syntax is to call functions once the selected files have been
> > moved? 
> 
> The most efficient syntax is the regular syntax that you always use when
> calling a file:
> 
> function(arg, another_arg)
> 
> 
> What else would you use?
> 
> 
> > I have the functions and classes written in another file. Should I 
> > import them or should I include them in the same file as the following
> > mini-script?
> 
> That's entirely up to you. Some factors you might consider:
> 
> - Are these functions and classes reusable by other code? then you might
> want to keep them separate in another file, treated as a library, and
> import the library into your application.
> 
> - If you merge the two files together, will it be so big that it is
> difficult to work with? Then don't merge them together. My opinion is that
> the decimal module from the standard library is about as big as a single
> module should every be, and it is almost 6,500 lines. So if your
> application is bigger than that, you might want to split it.
> 
> 
> 
> > Moreover, should I create another log file for processing? If so, what is
> > an idiomatically correct method to do so?
> 
> I don't know. Do you want a second log file? How will it be different from
> the first?
> 
> As for creating another log file, I guess the most correct way to do so
> would be the same way you created the first log file.
> 
> I'm not sure I actually understand your questions so far.
> 
> Some further comments on your code:
> 
> > if __name__ == '__main__':
> > 
> > # The top argument for name in files
> > topdir = '.'
> > dest = 'C:\\Users\\wynsa2\\Desktop\\'
> 
> Rather than escaping backslashes, you can use regular forward slashes:
> 
> dest = 'C:/Users/wynsa2/Desktop/'
> 
> 
> Windows will accept either.
> 
> 
> > extens = ['docs', 'docx', 'pdf'] # the extensions to search for
> > found = {x: [] for x in extens} # lists of found files
> >  
> > # Directories to ignore
> > ignore = ['docs', 'doc', 'py', 'pdf']
> > logname = "file_search.log"
> > print('Beginning search for files in %s' % os.path.realpath(topdir))
> >   
> > # Walk the tree
> > for dirpath, dirnames, files in os.walk(topdir):
> > # Remove directories in ignore
> > # directory names must match exactly!
> > for idir in ignore:
> > if idir in dirnames:
> > dirnames.remove(idir)
> >  
> > # Loop through the file names for the current step
> > for name in files:
> >  #Calling str.rsplit on name then
> > #splits the string into a list (from the right)
> > #with the first argument "."" delimiting it,
> > #and only making as many splits as the second argument (1).
> > #The third part ([-1]) retrieves the last element of the list--we
> > #use this instead of an index of 1 because if no splits are made
> > #(if there is no "."" in name), no IndexError will be raised
> > 
> > ext = name.lower().rsplit('.', 1)[-1]
> 
> The better way to split the extension from the file name is to use
> os.path.splitext(name):
> 
> 
> py> import os
> py> os.path.splitext("this/file.txt")
> ('this/file', '.txt')
> py> os.path.splitext("this/file")  # no extension
> ('this/file', '')
> py> os.path.splitext("this/file.tar.gz")
> ('this/file.tar', '.gz')
> 
> 
> -- 
> Steven




Re: To write headers once with different values in separate row in CSV

2015-06-24 Thread kbtyo
On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote:
> On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote:
> 
> > That is not the underlying issue. Any thoughts or suggestions would be
> > very helpful.
> 
> 
> Thank you for spending over 100 lines to tell us what is NOT the underlying
> issue. I will therefore tell you what is NOT the solution to your problem
> (whatever it is, since I can't tell). The solution is NOT to squeeze lemon
> juice into your keyboard.
> 
> If someday you feel like telling us what the issue actually IS, instead of
> what it IS NOT, then perhaps we will have a chance to help you find a
> solution.
> 
> 
> 
> -- 
> Steven

Curious - what should I have provided? Detailed and constructive feedback (like 
your reply to my post regarding importing functions) is more useful than to 
"squeeze lemon juice" into one's keyboard. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: To write headers once with different values in separate row in CSV

2015-06-24 Thread kbtyo
On Tuesday, June 23, 2015 at 3:12:40 PM UTC-4, John Gordon wrote:
> Sahlusar writes:
> 
> > However, when I extrapolate this same logic with a list like:
> 
> > ('Response.MemberO.PMembers.PMembers.Member.CurrentEmployer.EmployerAddress
> > .TimeAtPreviousAddress.', None), where the headers/columns are the first
> > item (only to be written out once) with different values. I receive an
> > output CSV with repeating headers and values all printed in one long string
> 
> First, I would try to determine if the problem is in the makerows()
> function, or if the problem is elsewhere.
> 
> Have you tried creating some dummy data by hand and seeing how makerows()
> handles it?
>


Yes I did do this.  


> (By the way, if your post had included some sample data that illustrates
> the problem, it would have been much easier to figure out a solution.
> Instead, we are left guessing at your XML format, and at the particular
> implementation of flatten_dict().)

Yes, unfortunately, due to NDA protocols I cannot share this. 
> 
> -- 
> John Gordon   A is for Amy, who fell down the stairs
> gor...@panix.com  B is for Basil, assaulted by bears
> -- Edward Gorey, "The Gashlycrumb Tinies"

-- 
https://mail.python.org/mailman/listinfo/python-list


Organizing function calls once files have been moved to a directory

2015-06-23 Thread kbtyo
I am working on a workflow module that will allow one to recursively check for 
file extensions and, if there is a match, move the files to a folder for 
processing (parsing, data wrangling, etc.). 

I have a simple search process and a log for the files that are present (see 
below). However, I am puzzled by what the most efficient method/syntax is to 
call functions once the selected files have been moved (see the sketch after 
the code below). I have the functions and classes written in another file. 
Should I import them, or should I include them in the same file as the 
following mini-script?

Moreover, should I create another log file for processing? If so, what is an 
idiomatically correct method to do so? 

if __name__ == '__main__':

    # The top argument for name in files
    topdir = '.'
    dest = 'C:\\Users\\wynsa2\\Desktop\\'
    extens = ['docs', 'docx', 'pdf']  # the extensions to search for
    found = {x: [] for x in extens}   # lists of found files

    # Directories to ignore
    ignore = ['docs', 'doc', 'py', 'pdf']
    logname = "file_search.log"
    print('Beginning search for files in %s' % os.path.realpath(topdir))

    # Walk the tree
    for dirpath, dirnames, files in os.walk(topdir):
        # Remove directories in ignore
        # directory names must match exactly!
        for idir in ignore:
            if idir in dirnames:
                dirnames.remove(idir)

        # Loop through the file names for the current step
        for name in files:
            # Calling str.rsplit on name splits the string into a list
            # (from the right) with the first argument "." delimiting it,
            # and only making as many splits as the second argument (1).
            # The third part ([-1]) retrieves the last element of the
            # list--we use this instead of an index of 1 because if no
            # splits are made (if there is no "." in name), no IndexError
            # will be raised.
            ext = name.lower().rsplit('.', 1)[-1]

            # Save the full name if ext matches
            # log_results, errlog and batchcopy are functions
            if ext in extens:
                found[ext].append(os.path.join(dirpath, name))
                log_results(logname, found)
                batchcopy(found, dest, errlog=None)
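
To make the first question concrete, here is the reorganization I am 
considering - a sketch using the same names and imports as the script above 
(log_results and batchcopy are the helper functions mentioned in the 
comments): walk first, then log, copy, and process once after all matches 
have been collected.

    # Walk first, collecting matches only.
    for dirpath, dirnames, files in os.walk(topdir):
        for idir in ignore:
            if idir in dirnames:
                dirnames.remove(idir)
        for name in files:
            ext = os.path.splitext(name)[1].lstrip('.').lower()
            if ext in extens:
                found[ext].append(os.path.join(dirpath, name))

    # Then log and copy once, and hand off to the parsing/wrangling
    # functions (imported from the other file) afterwards.
    log_results(logname, found)
    batchcopy(found, dest, errlog=None)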

Thank you for your help. 
  
-- 
https://mail.python.org/mailman/listinfo/python-list