Hi there, Python newbie here.

I am working with large files. For this reason I figured I would capture the large input in a list and serialize it with pickle for later (faster) use. Everything worked beautifully until today, when a large (1 GB) data file caused a MemoryError :(

Question for the experts: is there a way to refactor this so that the data is filled/written/released as the script goes, avoiding the problem?
Code below.

Thanks

import sys
import pickle

data = list()
for line in sys.stdin:
    try:
        parts = line.strip().split("\t")
        t = parts[0]
        w = parts[1]
        u = parts[2]

        # let's retain an in-memory copy of the data
        data.append({"ta": t,
                     "wa": w,
                     "ua": u})

    except IndexError:
        print("Problem with line :" + line, file=sys.stderr)

# time to save the data object into a pickle file
with open(filename, "wb") as fileObject:
    pickle.dump(data, fileObject)
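
For illustration, here is a rough, untested sketch of what I mean by "filled/written/released as the script goes": dump each record to the pickle file as soon as it is parsed, instead of keeping the whole list in memory, and read the records back one at a time later. (filename is the same variable as above; the field names are just placeholders.)

import sys
import pickle

# write each record immediately; repeated pickle.dump() calls simply
# append one pickled object after another to the same file
with open(filename, "wb") as fileObject:
    for line in sys.stdin:
        try:
            parts = line.strip().split("\t")
            record = {"ta": parts[0], "wa": parts[1], "ua": parts[2]}
        except IndexError:
            print("Problem with line :" + line, file=sys.stderr)
            continue
        pickle.dump(record, fileObject)

# later, stream the records back without loading them all at once
with open(filename, "rb") as fileObject:
    while True:
        try:
            record = pickle.load(fileObject)
        except EOFError:
            break
        # ... process record here ...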