Hi Peter,
First off - many (many!) thanks.

There's some error I don't understand.
Here's the amended script I used:

import csv

#open CSVs and read first column with product IDs into variables pointing to sets
with open("Afile.csv", "rb") as f: 
    a = {row[0] for row in csv.reader(f)}
with open("Bfile.csv", "rb") as g: 
    b = {row[0] for row in csv.reader(g)} 

#create variables pointing to sets with unique product IDs in A and B 
in_a_not_b = a-b 
in_b_not_a = b-a 

print in_a_not_b
print in_b_not_a

with open("inAnotB.csv", "wb") as f: 
    writer = csv.writer(f) 
    writer.writerows([item] for item in_a_not_b)

with open("inBnotA.csv", "wb") as g: 
    writer = csv.writer(g) 
    writer.writerows([item] for item in_b_not_a)

print "done!" 

and when I run it I get an "invalid syntax" error, and (as a true newbie I used a 
GUI) in_a_not_b is highlighted in this block:
with open("inAnotB.csv", "wb") as f: 
    writer = csv.writer(f) 
    writer.writerows([item] for item in_a_not_b)


Could you please point out what I'm doing wrong?

Thanks again :)
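
A note on the highlighted name, offered as a likely explanation rather than a definitive diagnosis: in "for item in_a_not_b" Python reads in_a_not_b as one identifier, so the generator expression has no "in" keyword and fails to parse. Since the set itself is named in_a_not_b, the loop presumably needs to read "for item in in_a_not_b". Below is a minimal sketch of the whole comparison with that change. It assumes Python 3 (text-mode files opened with newline="", where the script above uses "rb"/"wb" for Python 2), and the helper names first_column and write_ids plus the sample rows are made up for illustration.

```python
import csv
import os
import tempfile


def first_column(path):
    """Return the set of values found in the first column of a CSV file."""
    with open(path, newline="") as f:
        # "if row" skips completely empty lines, which csv.reader yields as [].
        return {row[0] for row in csv.reader(f) if row}


def write_ids(path, ids):
    """Write one ID per row; sorted so the output order is stable."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Note the keyword: "for item in <name>", with "in" separate from the name.
        writer.writerows([item] for item in sorted(ids))


# Hypothetical sample data standing in for Afile.csv and Bfile.csv.
workdir = tempfile.mkdtemp()
a_path = os.path.join(workdir, "Afile.csv")
b_path = os.path.join(workdir, "Bfile.csv")
with open(a_path, "w", newline="") as f:
    csv.writer(f).writerows([["101", "5"], ["102", "7"], ["103", "1"]])
with open(b_path, "w", newline="") as f:
    csv.writer(f).writerows([["102", "7"], ["104", "2"]])

a = first_column(a_path)
b = first_column(b_path)
in_a_not_b = a - b
in_b_not_a = b - a

# Each result goes to its own output file.
out_a = os.path.join(workdir, "inAnotB.csv")
out_b = os.path.join(workdir, "inBnotA.csv")
write_ids(out_a, in_a_not_b)
write_ids(out_b, in_b_not_a)
```

Sorting before writing is only there to make the output reproducible; sets have no defined order, so without it the rows may come out in any sequence.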

On Tuesday, June 18, 2013 11:39:41 AM UTC+3, Peter Otten wrote:
> Alan Newbie wrote:
> > Hello,
> > Let's say I want to compare two csv files: file A and file B. They are
> > both similarly built - the first column has product IDs (one product per
> > row) and the columns provide some stats about the products such as sales
> > in # and $.
> > 
> > I want to compare these files - see which product IDs appear in the first
> > column of file A and not in B, and which in B and not A. Finally, it would
> > be very great if the result could be written into two new CSV files - one
> > product ID per row in the first column. (no other data in the other
> > columns needed)
> > 
> > This is the script I tried:
> > ==========================
> > 
> > import csv
> > 
> > #open CSV's and read first column with product IDs into variables pointing
> > #to lists
> > A = [line.split(',')[0] for line in open('Afile.csv')]
> > B = [line.split(',')[0] for line in open('Bfile.csv')]
> > 
> > #create variables pointing to lists with unique product IDs in A and B
> > #respectively
> > inAnotB = list(set(A)-set(B))
> > inBnotA = list(set(B)-set(A))
> > 
> > print inAnotB
> > print inBnotA
> > 
> > c = csv.writer(open("inAnotB.csv", "wb"))
> > c.writerow([inAnotB])
> > 
> > 
> > d = csv.writer(open("inBnotA.csv", "wb"))
> > d.writerow([inBnotA])
> > 
> > print "done!"
> > 
> > =====================================================
> > 
> > But it doesn't produce the required results.
> > It prints IDs in this format:
> > 247158132\n
> Python reads lines from a file with the trailing newline included, and 
> line.split(",") with only one column (i. e. no comma) keeps the whole line. 
> As you already know about the csv module you should use it to read your 
> data, e. g. instead of
> > A = [line.split(',')[0] for line in open('Afile.csv')]
> try
> with open("Afile.csv", "rb") as f:
>     a = {row[0] for row in csv.reader(f)}
> ...
> I used {...} instead of [...], so a is already a set and you can proceed:
> in_a_not_b = a - b
> Finally as a shortcut for
> for item in in_a_not_b:
>     writer.writerow([item])
> use the writerows() method to write your data:
> with open("inAnotB.csv", "wb") as f:
>     writer = csv.writer(f)
>     writer.writerows([item] for item in_a_not_b)
> Note that I'm wrapping every item in the set rather than the complete set as 
> a whole. If you wanted to be clever you could spell that even more succinctly 
> as
>     writer.writerows(zip(in_a_not_b))
> > and nothing to the csv files.
> > 
> > You could probably tell I'm a newbie.
> > Could you help me out?
> > 
> > here's some dummy data:
> > 
> https://docs.google.com/file/d/0BwziqsHUZOWRYU15aEFuWm9fajA/edit?usp=sharing
> > 
> > 
> https://docs.google.com/file/d/0BwziqsHUZOWRQVlTelVveEhsMm8/edit?usp=sharing
> > 
> > Thanks a bunch in advance! :)
