Re: Looking for direction

2015-05-21 Thread 20/20 Lab
You're the second to recommend this to me.  I ended up picking it up last 
week, so I need to sit down with it.  I was able to get a working 
project; however, I don't fully grasp the details of how.  So the book 
will help, I'm sure.


Thank you.

On 05/20/2015 05:50 AM, darnold via Python-list wrote:

I recommend getting your hands on "Automate The Boring Stuff With Python" from 
No Starch Press:

http://www.nostarch.com/automatestuff

I've not read it in its entirety, but it's very beginner-friendly and is 
targeted at just the sort of processing you appear to be doing.

HTH,
Don


--
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-20 Thread darnold via Python-list
I recommend getting your hands on "Automate The Boring Stuff With Python" from 
No Starch Press:

http://www.nostarch.com/automatestuff

I've not read it in its entirety, but it's very beginner-friendly and is 
targeted at just the sort of processing you appear to be doing.

HTH,
Don
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-15 Thread Ziqi Xiong
Maybe we can change this list to a dict, using (item[0], item[1]) as the key
and the whole item as the value. Then you can update by the same key, I think.
Tim Chase wrote on Friday, 2015-05-15 at 01:17:

> On 2015-05-14 09:57, 20/20 Lab wrote:
> > On 05/13/2015 06:23 PM, Steven D'Aprano wrote:
> >>> I have a LARGE csv file that I need to process.  110+ columns,
> >>> 72k rows.  I managed to write enough to reduce it to a few
> >>> hundred rows, and the five columns I'm interested in.
> > I actually stumbled across the csv module after coding enough to
> > make a list of lists.  So that is more the reason I approached the
> > list; nothing like spending hours (or days) coding something that
> > already exists and you just don't know about.
> >>> Now is where I have my problem:
> >>>
> >>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
> >>>  [72976, "YYY", "Item", "Qty", "Noise"],
> >>>  [123, "XXX", "ItemTypo", "Qty", "Noise"]]
> >>>
> >>> Basically, I need to check for rows with duplicate accounts
> >>> (row[0]) and staff (row[1]), and if so, remove that row, and add
> >>> its Qty to the original row. I really don't have a clue how to
> >>> go about this.
> >>
> >> processed = {}  # hold the processed data in a dict
> >>
> >> for row in myList:
> >>     account, staff = row[0:2]
> >>     key = (account, staff)  # Put them in a tuple.
> >>     if key in processed:
> >>         # We've already seen this combination.
> >>         processed[key][3] += row[3]  # Add the quantities.
> >>     else:
> >>         # Never seen this combination before.
> >>         processed[key] = row
> >>
> >> newlist = list(processed.values())
> >>
> > It does, immensely.  I'll make this work.  Thank you again for the
> > link from yesterday and apologies for hitting the wrong reply
> > button.  I'll have to study more on the usage and implementations
> > of dictionaries and tuples.
>
> In processing the initial CSV file, I suspect that using a
> csv.DictReader would make the code a bit cleaner.  Additionally,
> as you're processing through the initial file, unless you need
> the intermediate data, you should be able to do it in one pass.
> Something like
>
>   import csv  # needed for csv.DictReader
>
>   HEADER_ACCOUNT = "account"
>   HEADER_STAFF = "staff"
>   HEADER_QTY = "Qty"
>
>   processed = {}
>   with open("data.csv") as f:
>       reader = csv.DictReader(f)
>       for row in reader:
>           if should_process_row(row):
>               account = row[HEADER_ACCOUNT]
>               staff = row[HEADER_STAFF]
>               qty = row[HEADER_QTY]
>               try:
>                   row[HEADER_QTY] = qty = int(qty)
>               except Exception:
>                   # not a numeric quantity?
>                   continue
>               # from Steven's code
>               key = (account, staff)
>               if key in processed:
>                   processed[key][HEADER_QTY] += qty
>               else:
>                   processed[key] = row
>   do_something_with(processed.values())
>
> I find that using names is a lot clearer than using arbitrary
> indexing.  Barring that, using indexes-as-constants still would
> add further clarity.
>
> -tkc
>
>
>
>
> .
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-15 Thread 20/20 Lab



On 05/13/2015 06:12 PM, Dave Angel wrote:

On 05/13/2015 08:45 PM, 20/20 Lab wrote:>

You accidentally replied to me, rather than the mailing list. Please 
use reply-list, or if your mailer can't handle that, do a Reply-All, 
and remove the parts you don't want.


>
> On 05/13/2015 05:07 PM, Dave Angel wrote:
>> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
>>> I'm a beginner to python.  Reading here and there.  Written a couple of
>>> short and simple programs to make life easier around the office.
>>>
>> Welcome to Python, and to this mailing list.
>>
>>> That being said, I'm not even sure what I need to ask for. I've never
>>> worked with external data before.
>>>
>>> I have a LARGE csv file that I need to process.  110+ columns, 72k
>>> rows.
>>
>> That's not very large at all.
>>
> In the grand scheme, I guess not.  However I'm currently doing this
> whole process using office.  So it can be a bit daunting.

I'm not familiar with the "office" operating system.

>>>  I managed to write enough to reduce it to a few hundred rows, and
>>> the five columns I'm interested in.
>>
>>>
>>> Now is where I have my problem:
>>>
>>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>> [72976, "YYY", "Item", "Qty", "Noise"],
>>> [123, "XXX", "ItemTypo", "Qty", "Noise"]]
>>>
>>
>> It'd probably be useful to identify names for your columns, even if
>> it's just in a comment.  Guessing from the paragraph below, I figure
>> the first two columns are "account" & "staff"
>
> The columns that I pull are Account, Staff, Item Sold, Quantity sold,
> and notes about the sale (notes aren't particularly needed, but the
> higher-ups would like them in the report).
>>
>>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>>> staff (row[1]), and if so, remove that row, and add its Qty to the
>>> original row.
>>
>> And which column is that supposed to be?  Shouldn't there be a number
>> there, rather than a string?
>>
>>> I really don't have a clue how to go about this.  The
>>> number of rows changes based on which run it is, so I couldn't even get
>>> away with using hundreds of compare loops.
>>>
>>> If someone could point me to some documentation on the functions I would
>>> need, or a tutorial, it would be a great help.
>>>
>>
>> Is the order significant?  Do you have to preserve the order that the
>> accounts appear?  I'll assume not.
>>
>> Have you studied dictionaries?  Seems to me the way to handle the
>> problem is to read in a row, create a dictionary with key of (account,
>> staff), and data of the rest of the line.
>>
>> Each time you read a row, you check if the key is already in the
>> dictionary.  If not, add it.  If it's already there, merge the data as
>> you say.
>>
>> Then when you're done, turn the dict back into a list of lists.
>>
> The order is irrelevant.  No, I've not really studied dictionaries, but
> a few people have mentioned it.  I'll have to read up on them and, more
> importantly, their applications.  Seems that they are more versatile
> than I thought.
>
> Thank you.

You have to realize that a tuple can be used as a key, in your case a 
tuple of Account and Staff.


You'll have to decide how you're going to merge the ItemSold, 
QuantitySold, and notes.


Tells you how often I actually talk in mailing lists.  My apologies, and 
thank you again.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-15 Thread 20/20 Lab



On 05/13/2015 06:12 PM, Dave Angel wrote:

On 05/13/2015 08:45 PM, 20/20 Lab wrote:>

You accidentally replied to me, rather than the mailing list. Please 
use reply-list, or if your mailer can't handle that, do a Reply-All, 
and remove the parts you don't want.


...and now that you mention it, I appear to have done that with all of 
my replies yesterday.


My deepest apologies for that.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-14 Thread Tim Chase
On 2015-05-14 09:57, 20/20 Lab wrote:
> On 05/13/2015 06:23 PM, Steven D'Aprano wrote:
>>> I have a LARGE csv file that I need to process.  110+ columns,
>>> 72k rows.  I managed to write enough to reduce it to a few
>>> hundred rows, and the five columns I'm interested in.
> I actually stumbled across the csv module after coding enough to
> make a list of lists.  So that is more the reason I approached the
> list; nothing like spending hours (or days) coding something that
> already exists and you just don't know about.
>>> Now is where I have my problem:
>>>
>>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>>  [72976, "YYY", "Item", "Qty", "Noise"],
>>>  [123, "XXX", "ItemTypo", "Qty", "Noise"]]
>>>
>>> Basically, I need to check for rows with duplicate accounts
>>> (row[0]) and staff (row[1]), and if so, remove that row, and add
>>> its Qty to the original row. I really don't have a clue how to
>>> go about this.
>>
>> processed = {}  # hold the processed data in a dict
>>
>> for row in myList:
>>     account, staff = row[0:2]
>>     key = (account, staff)  # Put them in a tuple.
>>     if key in processed:
>>         # We've already seen this combination.
>>         processed[key][3] += row[3]  # Add the quantities.
>>     else:
>>         # Never seen this combination before.
>>         processed[key] = row
>>
>> newlist = list(processed.values())
>>
> It does, immensely.  I'll make this work.  Thank you again for the
> link from yesterday and apologies for hitting the wrong reply
> button.  I'll have to study more on the usage and implementations
> of dictionaries and tuples.

In processing the initial CSV file, I suspect that using a
csv.DictReader would make the code a bit cleaner.  Additionally,
as you're processing through the initial file, unless you need
the intermediate data, you should be able to do it in one pass.
Something like

  import csv  # needed for csv.DictReader

  HEADER_ACCOUNT = "account"
  HEADER_STAFF = "staff"
  HEADER_QTY = "Qty"

  processed = {}
  with open("data.csv") as f:
      reader = csv.DictReader(f)
      for row in reader:
          if should_process_row(row):
              account = row[HEADER_ACCOUNT]
              staff = row[HEADER_STAFF]
              qty = row[HEADER_QTY]
              try:
                  row[HEADER_QTY] = qty = int(qty)
              except Exception:
                  # not a numeric quantity?
                  continue
              # from Steven's code
              key = (account, staff)
              if key in processed:
                  processed[key][HEADER_QTY] += qty
              else:
                  processed[key] = row
  do_something_with(processed.values())
  
I find that using names is a lot clearer than using arbitrary
indexing.  Barring that, using indexes-as-constants still would
add further clarity.

-tkc




.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-14 Thread Albert-Jan Roskam via Python-list

-
On Thu, May 14, 2015 3:35 PM CEST Dennis Lee Bieber wrote:

>On Wed, 13 May 2015 16:24:30 -0700, 20/20 Lab  declaimed
>the following:
>
>>Now is where I have my problem:
>>
>>myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>[72976, "YYY", "Item", "Qty", "Noise"],
>>[123, "XXX", "ItemTypo", "Qty", "Noise"]]
>>
>>Basically, I need to check for rows with duplicate accounts (row[0]) and 
>>staff (row[1]), and if so, remove that row, and add its Qty to the 
>>original row. I really don't have a clue how to go about this.  The 
>>number of rows changes based on which run it is, so I couldn't even get 
>>away with using hundreds of compare loops.
>>
>>If someone could point me to some documentation on the functions I would 
>>need, or a tutorial it would be a great help.
>>
>
>   This appears to be a matter of algorithm development -- there won't be
>a pre-made "function" for it. The closest would be the summing functions
>(control break http://en.wikipedia.org/wiki/Control_break ) of a report
>writer application.
>
>   The short gist would be:
>
>   SORT the data by the account field
>   Initialize sum using first record
>   loop
>       read next record
>       if end of data
>           output sum record
>           exit
>       if record is same account as sum
>           add quantity to sum
>       else
>           output sum record
>           reset sum to the new record
>
>   Granted -- loading the data into an SQL capable database would make
>this simple...
>
>   select account, sum(quantity) from table
>   order by account
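
For completeness, an untested sqlite3 sketch of that SQL idea, extended to
group by both account and staff (which is what this thread actually needs).
The table name, column names, and the assumption that the CSV has a header
row and its fourth column is the quantity are all just for illustration:

import csv
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales "
             "(account INTEGER, staff TEXT, item TEXT, qty INTEGER, notes TEXT)")

with open("data.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    rows = [(r[0], r[1], r[2], int(r[3]), r[4]) for r in reader]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

for account, staff, total in conn.execute(
        "SELECT account, staff, SUM(qty) FROM sales "
        "GROUP BY account, staff ORDER BY account"):
    print(account, staff, total)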

You could also use pandas. Read the data into a DataFrame, create a groupby 
object, and use the sum() and first() methods.

http://pandas.pydata.org/pandas-docs/version/0.15.2/groupby.html
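
An untested sketch of that, assuming the relevant columns are named
"account", "staff", "item", "qty" and "notes" (substitute the real headers):

import pandas as pd

df = pd.read_csv("data.csv")
grouped = df.groupby(["account", "staff"])
totals = grouped["qty"].sum()                 # summed quantity per (account, staff)
firsts = grouped[["item", "notes"]].first()   # first item/notes seen for each pair
result = firsts.join(totals).reset_index()
print(result)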


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-14 Thread 20/20 Lab



On 05/13/2015 06:23 PM, Steven D'Aprano wrote:

On Thu, 14 May 2015 09:24 am, 20/20 Lab wrote:


I'm a beginner to python.  Reading here and there.  Written a couple of
short and simple programs to make life easier around the office.

That being said, I'm not even sure what I need to ask for. I've never
worked with external data before.

I have a LARGE csv file that I need to process.  110+ columns, 72k
rows.  I managed to write enough to reduce it to a few hundred rows, and
the five columns I'm interested in.

That's not large. Large is millions of rows, or tens of millions if you have
enough memory. What's large to you and me is usually small to the computer.

You should use the csv module for handling the CSV file, if you aren't
already doing so. Do you need a url to the docs?

I actually stumbled across the csv module after coding enough to make a 
list of lists.  So that is all the more reason I approached the list; 
nothing like spending hours (or days) coding something that already 
exists and you just don't know about.

Now is where I have my problem:

myList = [ [123, "XXX", "Item", "Qty", "Noise"],
 [72976, "YYY", "Item", "Qty", "Noise"],
 [123, "XXX", "ItemTypo", "Qty", "Noise"]]

Basically, I need to check for rows with duplicate accounts (row[0]) and
staff (row[1]), and if so, remove that row, and add its Qty to the
original row. I really don't have a clue how to go about this.

Is the order of the rows important? If not, the problem is simpler.


processed = {}  # hold the processed data in a dict

for row in myList:
    account, staff = row[0:2]
    key = (account, staff)  # Put them in a tuple.
    if key in processed:
        # We've already seen this combination.
        processed[key][3] += row[3]  # Add the quantities.
    else:
        # Never seen this combination before.
        processed[key] = row

newlist = list(processed.values())


Does that help?



It does, immensely.  I'll make this work.  Thank you again for the link 
from yesterday and apologies for hitting the wrong reply button.  I'll 
have to study more on the usage and implementations of dictionaries and 
tuples.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-13 Thread Steven D'Aprano
On Thu, 14 May 2015 09:24 am, 20/20 Lab wrote:

> I'm a beginner to python.  Reading here and there.  Written a couple of
> short and simple programs to make life easier around the office.
> 
> That being said, I'm not even sure what I need to ask for. I've never
> worked with external data before.
> 
> I have a LARGE csv file that I need to process.  110+ columns, 72k
> rows.  I managed to write enough to reduce it to a few hundred rows, and
> the five columns I'm interested in.

That's not large. Large is millions of rows, or tens of millions if you have
enough memory. What's large to you and me is usually small to the computer.

You should use the csv module for handling the CSV file, if you aren't
already doing so. Do you need a url to the docs?


> Now is where I have my problem:
> 
> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
> [72976, "YYY", "Item", "Qty", "Noise"],
> [123, "XXX", "ItemTypo", "Qty", "Noise"]]
> 
> Basically, I need to check for rows with duplicate accounts (row[0]) and
> staff (row[1]), and if so, remove that row, and add its Qty to the
> original row. I really don't have a clue how to go about this.

Is the order of the rows important? If not, the problem is simpler.


processed = {}  # hold the processed data in a dict

for row in myList:
    account, staff = row[0:2]
    key = (account, staff)  # Put them in a tuple.
    if key in processed:
        # We've already seen this combination.
        processed[key][3] += row[3]  # Add the quantities.
    else:
        # Never seen this combination before.
        processed[key] = row

newlist = list(processed.values())
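
And if you then want the merged rows back in a CSV file, something like
this should do it (untested; Python 3, and the output file name is just
an example):

import csv

with open("merged.csv", "w", newline="") as f:
    csv.writer(f).writerows(newlist)  # one CSV row per merged record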


Does that help?



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-13 Thread Dave Angel

On 05/13/2015 08:45 PM, 20/20 Lab wrote:>

You accidentally replied to me, rather than the mailing list.  Please 
use reply-list, or if your mailer can't handle that, do a Reply-All, and 
remove the parts you don't want.


>
> On 05/13/2015 05:07 PM, Dave Angel wrote:
>> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
>>> I'm a beginner to python.  Reading here and there.  Written a couple of
>>> short and simple programs to make life easier around the office.
>>>
>> Welcome to Python, and to this mailing list.
>>
>>> That being said, I'm not even sure what I need to ask for. I've never
>>> worked with external data before.
>>>
>>> I have a LARGE csv file that I need to process.  110+ columns, 72k
>>> rows.
>>
>> That's not very large at all.
>>
> In the grand scheme, I guess not.  However I'm currently doing this
> whole process using office.  So it can be a bit daunting.

I'm not familiar with the "office" operating system.

>>>  I managed to write enough to reduce it to a few hundred rows, and
>>> the five columns I'm interested in.
>>
>>>
>>> Now is where I have my problem:
>>>
>>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>> [72976, "YYY", "Item", "Qty", "Noise"],
>>> [123, "XXX", "ItemTypo", "Qty", "Noise"]]
>>>
>>
>> It'd probably be useful to identify names for your columns, even if
>> it's just in a comment.  Guessing from the paragraph below, I figure
>> the first two columns are "account" & "staff"
>
> The columns that I pull are Account, Staff, Item Sold, Quantity sold,
> and notes about the sale (notes aren't particularly needed, but the
> higher-ups would like them in the report).
>>
>>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>>> staff (row[1]), and if so, remove that row, and add its Qty to the
>>> original row.
>>
>> And which column is that supposed to be?  Shouldn't there be a number
>> there, rather than a string?
>>
>>> I really don't have a clue how to go about this.  The
>>> number of rows changes based on which run it is, so I couldn't even get
>>> away with using hundreds of compare loops.
>>>
>>> If someone could point me to some documentation on the functions I would
>>> need, or a tutorial, it would be a great help.
>>>
>>
>> Is the order significant?  Do you have to preserve the order that the
>> accounts appear?  I'll assume not.
>>
>> Have you studied dictionaries?  Seems to me the way to handle the
>> problem is to read in a row, create a dictionary with key of (account,
>> staff), and data of the rest of the line.
>>
>> Each time you read a row, you check if the key is already in the
>> dictionary.  If not, add it.  If it's already there, merge the data as
>> you say.
>>
>> Then when you're done, turn the dict back into a list of lists.
>>
> The order is irrelevant.  No, I've not really studied dictionaries, but
> a few people have mentioned it.  I'll have to read up on them and, more
> importantly, their applications.  Seems that they are more versatile
> than I thought.
>
> Thank you.

You have to realize that a tuple can be used as a key, in your case a 
tuple of Account and Staff.


You'll have to decide how you're going to merge the ItemSold, 
QuantitySold, and notes.


--
DaveA


--
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-13 Thread MRAB

On 2015-05-14 01:06, Ethan Furman wrote:

On 05/13/2015 04:24 PM, 20/20 Lab wrote:

I'm a beginner to python.  Reading here and there.  Written a couple of
short and simple programs to make life easier around the office.

That being said, I'm not even sure what I need to ask for. I've never
worked with external data before.

I have a LARGE csv file that I need to process.  110+ columns, 72k
rows.  I managed to write enough to reduce it to a few hundred rows, and
the five columns I'm interested in.

Now is where I have my problem:

myList = [ [123, "XXX", "Item", "Qty", "Noise"],
[72976, "YYY", "Item", "Qty", "Noise"],
[123, "XXX", "ItemTypo", "Qty", "Noise"]]

Basically, I need to check for rows with duplicate accounts (row[0]) and
staff (row[1]), and if so, remove that row, and add its Qty to the
original row. I really don't have a clue how to go about this.  The
number of rows changes based on which run it is, so I couldn't even get
away with using hundreds of compare loops.

If someone could point me to some documentation on the functions I would
need, or a tutorial it would be a great help.


You could try using a dictionary, combining when needed:

# untested
data = {}
for row in all_rows:
    key = row[0], row[1]
    if key in data:
        item, qty, noise = data[key]
        qty += row[3]
    else:
        item, qty, noise = row[2:]
    data[key] = item, qty, noise

for (account, staff), (item, qty, noise) in data.items():
    do_stuff_with(account, staff, item, qty, noise)

At the end, data should have what you want.  It won't, however, be in
the same order, so hopefully that's not an issue for you.


Starting from that, if the order matters, you can do it this way:

data = {}
order = {}
for index, row in enumerate(all_rows):
    key = row[0], row[1]
    if key in data:
        item, qty, noise = data[key]
        qty += row[3]
    else:
        item, qty, noise = row[2:]
    data[key] = item, qty, noise
    order.setdefault(key, index)

merged_rows = [(account, staff, item, qty, noise)
               for (account, staff), (item, qty, noise) in data.items()]


def original_order(row):
    key = row[0], row[1]
    return order[key]

merged_rows.sort(key=original_order)

--
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-13 Thread Ethan Furman

On 05/13/2015 04:24 PM, 20/20 Lab wrote:

I'm a beginner to python.  Reading here and there.  Written a couple of
short and simple programs to make life easier around the office.

That being said, I'm not even sure what I need to ask for. I've never
worked with external data before.

I have a LARGE csv file that I need to process.  110+ columns, 72k
rows.  I managed to write enough to reduce it to a few hundred rows, and
the five columns I'm interested in.

Now is where I have my problem:

myList = [ [123, "XXX", "Item", "Qty", "Noise"],
[72976, "YYY", "Item", "Qty", "Noise"],
[123, "XXX", "ItemTypo", "Qty", "Noise"]]

Basically, I need to check for rows with duplicate accounts (row[0]) and
staff (row[1]), and if so, remove that row, and add its Qty to the
original row. I really don't have a clue how to go about this.  The
number of rows changes based on which run it is, so I couldn't even get
away with using hundreds of compare loops.

If someone could point me to some documentation on the functions I would
need, or a tutorial it would be a great help.


You could try using a dictionary, combining when needed:

# untested
data = {}
for row in all_rows:
    key = row[0], row[1]
    if key in data:
        item, qty, noise = data[key]
        qty += row[3]
    else:
        item, qty, noise = row[2:]
    data[key] = item, qty, noise

for (account, staff), (item, qty, noise) in data.items():
    do_stuff_with(account, staff, item, qty, noise)

At the end, data should have what you want.  It won't, however, be in 
the same order, so hopefully that's not an issue for you.


--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-13 Thread Ben Finney
20/20 Lab  writes:

> I'm a beginner to python. Reading here and there. Written a couple of
> short and simple programs to make life easier around the office.

Welcome, and congratulations on self-educating to this point.

> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
> [72976, "YYY", "Item", "Qty", "Noise"],
> [123, "XXX", "ItemTypo", "Qty", "Noise"]]
>
> Basically, I need to check for rows with duplicate accounts (row[0]) and
> staff (row[1]), and if so, remove that row, and add its Qty to the
> original row. I really don't have a clue how to go about this.

You might benefit from doing some simple study of algorithms, with
exercises so you can test your knowledge and learn new ways of thinking
about basic algorithms.

I say that because what will be most helpful in the situation you're
facing is to *already* have learned to think about already-solved
algorithms like a toolkit. And the only way to get that toolkit is to do
some study, rather than solving each problem in the real world as it
comes along.

In this case, you are stuck IMO because the process you want to perform
on the data needs to be formally expressed. Here's an attempt:

For each unique pair (‘account_nr’, ‘staff_name’):
    Sum all the ‘qty’ values as ‘total_qty’
    Emit a record (‘account_nr’, ‘staff_name’, ‘total_qty’)

Once expressed that way, it becomes clear to me that the requirements as
stated have a gap. What becomes of ‘item’, ‘noise’, etc. values for the
same (‘account_nr’, ‘staff_name’) pair? Are they simply discarded as
uninteresting? If not discarded, how are they processed to make a single
record for that (‘account_nr’, ‘staff_name’) pair?

You don't have to respond to me with the answers. But you will need to
deal with that issue, and probably others. The advantage of formally
stating the process you want is to debug it before even writing a line
of code to solve it.

Here is a course on problem solving with algorithms that uses Python
<http://interactivepython.org/runestone/static/pythonds/index.html>.

Good hunting!

-- 
 \“We cannot solve our problems with the same thinking we used |
  `\   when we created them.” —Albert Einstein |
_o__)  |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-13 Thread Dave Angel

On 05/13/2015 07:24 PM, 20/20 Lab wrote:

I'm a beginner to python.  Reading here and there.  Written a couple of
short and simple programs to make life easier around the office.


Welcome to Python, and to this mailing list.


That being said, I'm not even sure what I need to ask for. I've never
worked with external data before.

I have a LARGE csv file that I need to process.  110+ columns, 72k
rows.


That's not very large at all.


 I managed to write enough to reduce it to a few hundred rows, and
the five columns I'm interested in.




Now is where I have my problem:

myList = [ [123, "XXX", "Item", "Qty", "Noise"],
[72976, "YYY", "Item", "Qty", "Noise"],
[123, "XXX", "ItemTypo", "Qty", "Noise"]]



It'd probably be useful to identify names for your columns, even if it's 
just in a comment.  Guessing from the paragraph below, I figure the 
first two columns are "account" & "staff"



Basically, I need to check for rows with duplicate accounts (row[0]) and
staff (row[1]), and if so, remove that row, and add its Qty to the
original row.


And which column is that supposed to be?  Shouldn't there be a number 
there, rather than a string?



I really don't have a clue how to go about this.  The
number of rows changes based on which run it is, so I couldn't even get
away with using hundreds of compare loops.

If someone could point me to some documentation on the functions I would
need, or a tutorial it would be a great help.



Is the order significant?  Do you have to preserve the order that the 
accounts appear?  I'll assume not.


Have you studied dictionaries?  Seems to me the way to handle the 
problem is to read in a row, create a dictionary with key of (account, 
staff), and data of the rest of the line.


Each time you read a row, you check if the key is already in the 
dictionary.  If not, add it.  If it's already there, merge the data as 
you say.


Then when you're done, turn the dict back into a list of lists.
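
Untested, but a sketch of that approach, assuming the reduced file you
described has exactly those five columns in the order account, staff,
item, qty, notes (the file name is made up):

import csv

merged = {}   # (account, staff) -> [item, qty, notes]
with open("reduced.csv", newline="") as f:
    for account, staff, item, qty, notes in csv.reader(f):
        key = (account, staff)
        if key in merged:
            merged[key][1] += int(qty)            # combine the quantities
        else:
            merged[key] = [item, int(qty), notes]

# and back to a list of lists
rows = [[account, staff] + rest for (account, staff), rest in merged.items()]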

--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for direction

2015-05-13 Thread Mark Lawrence

On 14/05/2015 00:24, 20/20 Lab wrote:

I'm a beginner to python.  Reading here and there.  Written a couple of
short and simple programs to make life easier around the office.


Welcome :)



That being said, I'm not even sure what I need to ask for. I've never
worked with external data before.

I have a LARGE csv file that I need to process.  110+ columns, 72k
rows.  I managed to write enough to reduce it to a few hundred rows, and
the five columns I'm interested in.

Now is where I have my problem:

myList = [ [123, "XXX", "Item", "Qty", "Noise"],
[72976, "YYY", "Item", "Qty", "Noise"],
[123, "XXX", "ItemTypo", "Qty", "Noise"]]

Basically, I need to check for rows with duplicate accounts (row[0]) and
staff (row[1]), and if so, remove that row, and add its Qty to the
original row. I really don't have a clue how to go about this.  The
number of rows changes based on which run it is, so I couldn't even get
away with using hundreds of compare loops.

If someone could point me to some documentation on the functions I would
need, or a tutorial it would be a great help.

Thank you.


Check this out http://pandas.pydata.org/

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list