Hi, John:
I think your code is right, except that "Doc.object" should be "Doc.objects".

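The following sketch may be faster than what you wrote (xml_files and parse_xml are placeholders for your own input and XML extraction code):

doc_map = {}
for xml in xml_files:
    # parse_xml is a placeholder for your own extraction code
    mydoc_code, mydoc_text, myRelated_doc_codes = parse_xml(xml)
    doc = Doc.objects.create(doc_code=mydoc_code, doc_text=mydoc_text)
    doc_map[mydoc_code] = (doc, myRelated_doc_codes)

# Second pass: every Doc exists now, so link them from the cache
for doc, rcodes in doc_map.values():
    # doc_map[rcode] is a (doc, codes) tuple, so take element [0];
    # codes that had no XML of their own are skipped
    related = [doc_map[rcode][0] for rcode in rcodes if rcode in doc_map]
    doc.related_doc.add(*related)  # add() writes M2M rows; no save() needed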

I have checked it, and it works.
The Doc objects are cached in doc_map, so there is no need to query the database
again for each related code; this should be noticeably faster.
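The thread never shows the XML itself, so the extraction step stays abstract above. As a rough sketch only, assuming a hypothetical layout where each document is one <doc code="..."> element with a <text> child and <related code="..."/> children, Python's standard xml.etree.ElementTree could pull out the three values (extract_doc is a hypothetical name):

```python
import xml.etree.ElementTree as ET

# Hypothetical layout (the real schema is not shown in the thread):
#   <doc code="A1">
#     <text>Body of the document</text>
#     <related code="B2"/>
#     <related code="C3"/>
#   </doc>


def extract_doc(xml_string):
    """Return (doc_code, doc_text, related_codes) for one document."""
    root = ET.fromstring(xml_string)
    doc_code = root.get("code")
    # findtext returns None when <text> is missing, which fits null=True on doc_text
    doc_text = root.findtext("text")
    related_codes = [el.get("code") for el in root.findall("related")]
    return doc_code, doc_text, related_codes


sample = """<doc code="A1">
  <text>Body of the document</text>
  <related code="B2"/>
  <related code="C3"/>
</doc>"""

print(extract_doc(sample))
```

For files on disk, ET.parse(path).getroot() would take the place of ET.fromstring, and the element and attribute names have to be adjusted to the real schema.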

With Regards.




[email protected]

From: John Carlo
Date: 2014-06-11 21:14
To: django-users
Subject: Massive import in Django database
Hello everybody,


I fell in love with Django two years ago and have been using it for my work
projects ever since. In the past I found very useful information in this group,
so a big thank you, guys!


I have a small question.
I have to import about 1,000,000 XML documents into a Django database (SQLite
for local development, MySQL on the server).


The model class is the following:


from django.db import models


class Doc(models.Model):
    # primary_key=True already implies unique=True and an index
    doc_code = models.CharField(max_length=20, primary_key=True)
    doc_text = models.TextField(null=True, blank=True)
    # null and db_index have no effect on a ManyToManyField
    related_doc = models.ManyToManyField('self', blank=True)



From what I know, bulk insertion is not possible because I have a
ManyToManyField relation.
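It is true that bulk_create does not set many-to-many links, but that need not rule it out: the Doc rows could be bulk-inserted first and the links inserted afterwards as rows of the automatically created through table. A minimal pure-Python sketch of preparing those rows, assuming the default symmetrical ManyToManyField('self') and a hypothetical {doc_code: related_codes} dict collected while parsing (build_link_pairs is a hypothetical helper):

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_link_pairs(related: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Turn {doc_code: related_codes} into (from_code, to_code) rows.

    Assumes a symmetrical ManyToManyField('self'): add() stores both
    directions, so a direct insert into the through table must do the same.
    Codes that were never imported are skipped, since the through table's
    foreign keys must point at existing Doc rows.
    """
    known = set(related)
    seen: Set[Tuple[str, str]] = set()
    pairs: List[Tuple[str, str]] = []
    for code, rcodes in related.items():
        for rcode in rcodes:
            if rcode not in known:
                continue
            for pair in ((code, rcode), (rcode, code)):
                if pair not in seen:  # avoid duplicate rows in the link table
                    seen.add(pair)
                    pairs.append(pair)
    return pairs
```

The pairs could then be written with something like Through = Doc.related_doc.through followed by Through.objects.bulk_create([Through(from_doc_id=a, to_doc_id=b) for a, b in pairs], batch_size=1000). The from_doc/to_doc field names are what Django generates for a self-referential through table, but they are worth checking against the installed version, and batch_size=1000 is only an arbitrary starting point.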


So I have this simple loop (in pseudocode):


for each xml:
    extract from the xml data -> mydoc_code, mydoc_text, myRelated_doc_codes

    myDoc = Doc.object.get_or_create(doc_code = mydoc_code)[0]
    myDoc.doc_text = mydoc_text

    for reldoc_code in myRelated_doc_codes:
        myRelDoc = Doc.object.get_or_create(doc_code = reldoc_code)[0]
        myDoc.related_doc.add(myRelDoc)

    myDoc.save()




Am I doing it right? Do you have any suggestions or recommendations? Since I
have 1,000,000 docs to import, I fear it will take a loooot of time, especially
in the get_or_create calls.
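The loop above issues at least one query per get_or_create call, plus the related_doc.add() queries, for every document and every related code. One common way to cut that down, offered here as an assumption rather than something tested in this thread, is to process the input in fixed-size chunks and write each chunk with a single Doc.objects.bulk_create(...) call, optionally inside transaction.atomic(). A small generic chunking helper (chunked is a hypothetical name):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))  # take up to `size` items lazily
        if not chunk:
            return
        yield chunk
```

For example, looping over chunked(xml_files, 1000) (xml_files being a hypothetical iterable of inputs) and calling bulk_create once per chunk would turn 1,000,000 documents into 1,000 inserts of 1,000 rows each. The chunk size of 1000 is only a guess to tune per database.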


Thank you in advance, everybody!


John








             
-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/5b88deaf-d806-4a64-9e8d-528d95599c80%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/2014061123474049956470%40gmail.com.
For more options, visit https://groups.google.com/d/optout.
