Re: update_or_create() always creates (or recreates)
Thanks - you've definitely given me some things to think about.

I'm doing XHR requests that return JSON for the scraping (later I will probably have normal pages as well, so I will definitely look at your ETag suggestion - I'm not familiar with it, so I will look into it). Given it's XHR and JSON, I presume ETag isn't relevant, so I think your idea of setting a flag is a good one.

So for each row in each table (e.g. a Supplier) that I rescrape, I get the instance from the database based on the unique_id, compare each attribute to the re-scraped JSON, and set the flag and update the instance if they differ.

The data will only change a fraction of a percent of the time (most of the time it is constant), and there will be about 70k rows with 50-100 fields. The DB is Postgres (on Heroku for now).

On Friday, 6 November 2015 17:12:05 UTC, Dan Tagg wrote:
>
> If you are web scraping you really need your code to be as efficient as possible and to do as little as possible. Firstly, make sure you are using everything the servers of the websites you are scraping give you to decide whether to bother downloading the page. For example, check the ETag and only bother to scrape if it is different from the last time you scraped the data. If you don't trust the server's ETag, you can hash the page when you download it and compare that against your stored hash to see whether it changed and whether it's worth processing.
>
> Your approach of trying a 'get' with all the properties set and picking up the exception has costs. Assuming your tables have enough rows that scanning the entire table for every "get" won't be efficient, you will need to have every column you use in your "get" indexed in the database. This obviously has a storage cost as well as an additional insert/update cost, and the query costs more to run than a simple select against a single key. Whether that is more efficient than getting the result and comparing the fields in Python I don't know. I imagine it will depend on what your RDBMS is and how it is hosted, as well as how many rows and columns will be in your database table.
>
> You could initialise a flag to False and, as you process your scraped data, compare it to the attributes of your instance and set the flag to True if they have changed. Then don't bother saving if you get to the end of processing your scraped data and the modified flag has not been set to True.
>
> Dan
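The flag approach described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `Supplier` class here is a stand-in object rather than a real Django model, and `apply_changes` and the field names are hypothetical.

```python
class Supplier:
    """Stand-in for a model instance (assumption: not a real Django model)."""

    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)


def apply_changes(instance, scraped):
    """Copy differing values onto the instance and return the changed names."""
    changed = []
    for name, new_value in scraped.items():
        if getattr(instance, name, None) != new_value:
            setattr(instance, name, new_value)
            changed.append(name)
    return changed


supplier = Supplier(unique_id="s-1", name="Acme", rating=4)
scraped = {"name": "Acme", "rating": 5}  # hypothetical re-scraped JSON

changed = apply_changes(supplier, scraped)
if changed:
    # With a real model, this is the only place save() would be called.
    print("changed fields:", changed)
```

With a real Django model, the returned list could be passed as `save(update_fields=changed)` so that only the changed columns are written, and an empty list would mean no write at all.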
Re: update_or_create() always creates (or recreates)
If you are web scraping you really need your code to be as efficient as possible and to do as little as possible. Firstly, make sure you are using everything the servers of the websites you are scraping give you to decide whether to bother downloading the page. For example, check the ETag and only bother to scrape if it is different from the last time you scraped the data. If you don't trust the server's ETag, you can hash the page when you download it and compare that against your stored hash to see whether it changed and whether it's worth processing.

Your approach of trying a 'get' with all the properties set and picking up the exception has costs. Assuming your tables have enough rows that scanning the entire table for every "get" won't be efficient, you will need to have every column you use in your "get" indexed in the database. This obviously has a storage cost as well as an additional insert/update cost, and the query costs more to run than a simple select against a single key. Whether that is more efficient than getting the result and comparing the fields in Python I don't know. I imagine it will depend on what your RDBMS is and how it is hosted, as well as how many rows and columns will be in your database table.

You could initialise a flag to False and, as you process your scraped data, compare it to the attributes of your instance and set the flag to True if they have changed. Then don't bother saving if you get to the end of processing your scraped data and the modified flag has not been set to True.

Dan

On 6 November 2015 at 16:12, Yunti wrote:
> Hi Dan,
>
> Thanks for the suggestion. It's a web scraper (run as a Django management command) which then saves the data to the database via the Django ORM. Given it's a scraper rather than a form (or view), is the function suggested above an OK way to proceed, or would you suggest something else as more appropriate/best practice?
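The suggestion of hashing what was downloaded and comparing it with a stored hash could look roughly like this for JSON payloads. It is a sketch under assumptions: `fingerprint` and `is_unchanged` are hypothetical helper names, and serialising with sorted keys is one possible way to keep the hash stable when a server reorders keys.

```python
import hashlib
import json


def fingerprint(payload):
    """Return a SHA-256 hex digest of a JSON-serialisable payload."""
    # Sorted keys and fixed separators give the same text for equal data.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unchanged(payload, stored_fingerprint):
    """True when the payload hashes to the fingerprint stored last time."""
    return fingerprint(payload) == stored_fingerprint


# Hypothetical scraped records: same data, different key order.
first_scrape = {"supplierId": 1, "supplierName": "Acme", "supplierRating": 4}
second_scrape = {"supplierRating": 4, "supplierName": "Acme", "supplierId": 1}
third_scrape = {"supplierId": 1, "supplierName": "Acme", "supplierRating": 5}

stored = fingerprint(first_scrape)
print(is_unchanged(second_scrape, stored))
print(is_unchanged(third_scrape, stored))
```

Storing one fingerprint per row means an unchanged record can be skipped after a single indexed lookup, without comparing 50-100 fields individually. Where a server does send an ETag, returning it in an `If-None-Match` request header allows a `304 Not Modified` reply with no body at all.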
Re: update_or_create() always creates (or recreates)
Hi Dan,

Thanks for the suggestion. It's a web scraper (run as a Django management command) which then saves the data to the database via the Django ORM. Given it's a scraper rather than a form (or view), is the function suggested above an OK way to proceed, or would you suggest something else as more appropriate/best practice?

On Friday, 6 November 2015 14:40:59 UTC, Dan Tagg wrote:
>
> Hi Yunti,
>
> You could go up a level in the structure of your application and apply the logic there, where there is more support.
>
> Are you using Django forms? The ModelForm class pretty much does what you want: it examines form data, validates it against its type and any validation rules you have set in the form or your model, and compares it to the instance's data in the database, so you can save only if there has been some kind of change (check form.has_changed() before calling save()).
>
> Dan
Re: update_or_create() always creates (or recreates)
Hi Yunti,

You could go up a level in the structure of your application and apply the logic there, where there is more support.

Are you using Django forms? The ModelForm class pretty much does what you want: it examines form data, validates it against its type and any validation rules you have set in the form or your model, and compares it to the instance's data in the database, so you can save only if there has been some kind of change (check form.has_changed() before calling save()).

Dan

On 6 November 2015 at 13:47, Yunti wrote:
> Jani,
>
> Thanks for your reply - you explained it much more concisely than I did. :)
>
> Good to have it confirmed that update_or_create() doesn't quite do what I needed - I was confused as to whether it would or not.
>
> Thanks for taking the time to write that function, it looks ideal. I'll test it out.
Re: update_or_create() always creates (or recreates)
Jani,

Thanks for your reply - you explained it much more concisely than I did. :)

Good to have it confirmed that update_or_create() doesn't quite do what I needed - I was confused as to whether it would or not.

Thanks for taking the time to write that function, it looks ideal. I'll test it out.

On Friday, 6 November 2015 12:52:11 UTC, Jani Tiainen wrote:
>
> Your problem lies in the way Django actually carries out create-or-update.
>
> As the name suggests, update_or_create() does either one. But that's not what you want - you want a conditional update: only update if certain fields have changed. This can be done in a few ways.
>
> So you want to do "update_only_if_at_least_one_of_default_fields_changed_or_create".
>
> The operation is simple: if the object is not found, create a new one using the defaults; if it is found, pull its values, compare them against the default values, and if at least one differs, do an update. Otherwise don't do anything.
>
> So basically the code would look something like this:
>
>     def update_if_changed_or_create(**kwargs):
>         defaults = kwargs.pop('defaults', None) or {}
>
>         qs = MyModel.objects.filter(**kwargs)
>
>         if not qs:
>             # save() returns None, so use create(); include the defaults too.
>             obj = MyModel.objects.create(**dict(kwargs, **defaults))
>             return obj, True  # Created object
>         elif len(qs) == 1:
>             obj = qs[0]
>             changed = False
>             for k, v in defaults.items():
>                 if getattr(obj, k) != v:
>                     changed = True
>                     setattr(obj, k, v)
>             if changed:
>                 obj.save()
>                 return obj, False  # Updated object
>         else:
>             # Multiple objects... (raising is one way to handle this)
>             raise MyModel.MultipleObjectsReturned()
>
>         return obj, None  # No change.
Re: update_or_create() always creates (or recreates)
Your problem lies in the way Django actually carries out create-or-update.

As the name suggests, update_or_create() does either one. But that's not what you want - you want a conditional update: only update if certain fields have changed. This can be done in a few ways.

So you want to do "update_only_if_at_least_one_of_default_fields_changed_or_create".

The operation is simple: if the object is not found, create a new one using the defaults; if it is found, pull its values, compare them against the default values, and if at least one differs, do an update. Otherwise don't do anything.

So basically the code would look something like this:

    def update_if_changed_or_create(**kwargs):
        defaults = kwargs.pop('defaults', None) or {}

        qs = MyModel.objects.filter(**kwargs)

        if not qs:
            # save() returns None, so use create(); include the defaults too.
            obj = MyModel.objects.create(**dict(kwargs, **defaults))
            return obj, True  # Created object
        elif len(qs) == 1:
            obj = qs[0]
            changed = False
            for k, v in defaults.items():
                if getattr(obj, k) != v:
                    changed = True
                    setattr(obj, k, v)
            if changed:
                obj.save()
                return obj, False  # Updated object
        else:
            # Multiple objects... (raising is one way to handle this)
            raise MyModel.MultipleObjectsReturned()

        return obj, None  # No change.

On 06.11.2015 14:08, Yunti wrote:
> Carsten,
>
> Thanks for your reply.
>
>> A note about the last statement: If a Supplier object has the same unique_id, and all other fields (in `defaults`) are the same as well, logically there is no difference between updating and not updating - the result is the same.
>
> The entry in the database is the same - apart from the last_updated flag, if it's not rewritten over the top of it. This means I can check for new data often and be alerted when there is an actual update (i.e. a change to the data). If it rewrites the data every time it checks, then I have no idea when the data was actually updated.
>
>> Have you checked? How?
>> In your create_or_update_if_diff() you seem to try to re-invent update_or_create(), but have you actually examined the results of the
>>
>>     supplier, created = Supplier.objects.update_or_create(...)
>>
>> call?
>
> I checked by seeing that the last_updated field in the database was updated every time. (I suppose the issue could be with how that field gets reset the next time it's run - I didn't eliminate that possibility.)
>
> Yes, I was worried that I might be recreating (a poor version of) update_or_create(), but it didn't seem to have the option of not writing to the database if there was no change to the data. Can it do this? And how would I verify when an item has been updated or created (or neither) - could I output to the console?
>
> If it can, how do I call it so that it checks against all fields (unique_id and defaults), updates using the defaults if it finds a difference, and creates if it doesn't find the unique_id?
>
> I'm still not sure if this is possible and how to call the function, in particular how to pass in the remaining defaults to check against - **kwargs=defaults isn't right, but I'm not sure what it should be.
>
>     supplier, created = Supplier.objects.update_or_create(
>         unique_id=product_detail['supplierId'],
>         **kwargs=defaults,
>         defaults={
>             'name': product_detail['supplierName'],
>             'entity_name_1': entity_name_1,
>             'entity_name_2': entity_name_1,
>             'rating': product_detail['supplierRating']})
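On the question of how to see whether each item was created, updated, or left alone: the three-valued result from a function like update_if_changed_or_create() (True, False, or None as its second element) can simply be tallied and printed. The snippet below is a sketch only; `summarise`, `LABELS`, and the sample results are hypothetical and stand in for real return values.

```python
from collections import Counter

# Maps the second element of each (obj, flag) result to a readable label.
LABELS = {True: "created", False: "updated", None: "unchanged"}


def summarise(results):
    """Count how many results were created, updated, or unchanged."""
    return Counter(LABELS[flag] for _obj, flag in results)


# Hypothetical results, as if collected from four rescraped suppliers.
results = [("supplier-1", True), ("supplier-2", None),
           ("supplier-3", None), ("supplier-4", False)]

summary = summarise(results)
for label in ("created", "updated", "unchanged"):
    print(label, summary[label])
```

Inside a Django management command the same lines could be written with `self.stdout.write()` instead of `print()`.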
Re: update_or_create() always creates (or recreates)
Carsten,

Thanks for your reply.

> A note about the last statement: If a Supplier object has the same unique_id, and all other fields (in `defaults`) are the same as well, logically there is no difference between updating and not updating - the result is the same.

The entry in the database is the same - apart from the last_updated flag, if it's not rewritten over the top of it. This means I can check for new data often and be alerted when there is an actual update (i.e. a change to the data). If it rewrites the data every time it checks, then I have no idea when the data was actually updated.

> Have you checked? How?
> In your create_or_update_if_diff() you seem to try to re-invent update_or_create(), but have you actually examined the results of the
>
>     supplier, created = Supplier.objects.update_or_create(...)
>
> call?

I checked by seeing that the last_updated field in the database was updated every time. (I suppose the issue could be with how that field gets reset the next time it's run - I didn't eliminate that possibility.)

Yes, I was worried that I might be recreating (a poor version of) update_or_create(), but it didn't seem to have the option of not writing to the database if there was no change to the data. Can it do this? And how would I verify when an item has been updated or created (or neither) - could I output to the console?

If it can, how do I call it so that it checks against all fields (unique_id and defaults), updates using the defaults if it finds a difference, and creates if it doesn't find the unique_id?

I'm still not sure if this is possible and how to call the function, in particular how to pass in the remaining defaults to check against - **kwargs=defaults isn't right, but I'm not sure what it should be.

    supplier, created = Supplier.objects.update_or_create(
        unique_id=product_detail['supplierId'],
        **kwargs=defaults,
        defaults={
            'name': product_detail['supplierName'],
            'entity_name_1': entity_name_1,
            'entity_name_2': entity_name_1,
            'rating': product_detail['supplierRating']})

On Thursday, 5 November 2015 20:05:39 UTC, Carsten Fuchs wrote:
>
> Hi Yunti,
>
> On 05.11.2015 at 18:19, Yunti wrote:
> > I have tried to use the update_or_create() method assuming that it would either create a new entry in the db if it found none, or update an existing one if it found one and it had differences to the defaults passed in - or wouldn't update if there was no difference.
>
> A note about the last statement: If a Supplier object has the same unique_id, and all other fields (in `defaults`) are the same as well, logically there is no difference between updating and not updating - the result is the same.
>
> > However it just seemed to recreate entries each time even if there were no changes.
>
> Have you checked? How?
> In your create_or_update_if_diff() you seem to try to re-invent update_or_create(), but have you actually examined the results of the
>
>     supplier, created = Supplier.objects.update_or_create(...)
>
> call?
>
> > I think the issue was that I wanted to:
> > 1) get an entry if all fields were the same,
>
> update_or_create() updates an object with the given kwargs; the match is not made against *all* fields (i.e. for the match the fields in `defaults` are not accounted for).
>
> > 2) or create a new entry if it didn't find an existing entry with the unique_id
> > 3) or if there was an entry with the same unique_id, update that entry with remaining fields.
>
> update_or_create() should achieve this. It's hard to tell more without additional information, but
>
> https://docs.djangoproject.com/en/1.8/ref/models/querysets/#update-or-create
>
> explains the function well, including how it works. If you work through this in small steps, check examples and their (intermediate) results, you should be able to find what the original problem was.
>
> Best regards,
> Carsten
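The behaviour discussed in this exchange (the row is matched on the lookup kwargs only, and the defaults are then written whether or not they differ) can be illustrated with a small in-memory model. This is a toy sketch, not Django's implementation: `ToyTable`, its `writes` counter, and the sample values are assumptions made for illustration.

```python
class ToyTable:
    """In-memory stand-in for a table (assumption: not Django's own code)."""

    def __init__(self):
        self.rows = []
        self.writes = 0  # counts every simulated write to the "database"

    def update_or_create(self, defaults=None, **lookup):
        defaults = defaults or {}
        for row in self.rows:
            # The match uses only the lookup fields, never the defaults.
            if all(row.get(k) == v for k, v in lookup.items()):
                row.update(defaults)
                self.writes += 1  # written even when nothing differs
                return row, False
        row = dict(lookup, **defaults)
        self.rows.append(row)
        self.writes += 1
        return row, True


table = ToyTable()
_, created_first = table.update_or_create(unique_id=7, defaults={"name": "Acme"})
_, created_second = table.update_or_create(unique_id=7, defaults={"name": "Acme"})

print(created_first, created_second, len(table.rows), table.writes)
```

The second call reports `created` as False and leaves a single row, so nothing is recreated, but it still counts as a write. That is consistent with a last_updated field moving on every run if it is refreshed on each save.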
Re: update_or_create() always creates (or recreates)
Carsten,

Thanks for your reply.

> A note about the last statement: If a Supplier object has the same unique_id, and all
> other fields (in `defaults`) are the same as well, logically there is no difference
> between updating and not updating – the result is the same.

The entry in the database is the same, apart from the last_updated field, as long as the row is not rewritten. That lets me check for new data often and be alerted when there is an actual update (i.e. a change to the data). If the row is rewritten every time it is checked, I have no idea when the data was actually updated.

> Have you checked? How?
> In your create_or_update_if_diff() you seem to try to re-invent update_or_create(), but
> have you actually examined the results of the
>
> supplier, created = Supplier.objects.update_or_create(...)
>
> call?

I checked by seeing that the last_updated field in the database was updated every time. (I suppose the issue could be with how that field gets reset the next time it's run - I didn't)
Re: update_or_create() always creates (or recreates)
Hi Yunti,

On 05.11.2015 at 18:19, Yunti wrote:
> I have tried to use the update_or_create() method assuming that it would either, create
> a new entry in the db if it found none or update an existing one if it found one and had
> differences to the defaults passed in - or wouldn't update if there was no difference.

A note about the last statement: If a Supplier object has the same unique_id, and all other fields (in `defaults`) are the same as well, logically there is no difference between updating and not updating – the result is the same.

> However it just seemed to recreate entries each time even if there were no changes.

Have you checked? How?
In your create_or_update_if_diff() you seem to try to re-invent update_or_create(), but have you actually examined the results of the

    supplier, created = Supplier.objects.update_or_create(...)

call?

> I think the issue was that I wanted to:
> 1) get an entry if all fields were the same,

update_or_create() updates an object with the given kwargs; the match is not made against *all* fields (i.e. for the match the fields in `defaults` are not accounted for).

> 2) or create a new entry if it didn't find an existing entry with the unique_id
> 3) or if there was an entry with the same unique_id, update that entry with remaining
> fields.

update_or_create() should achieve this. It's hard to tell more without additional information, but
https://docs.djangoproject.com/en/1.8/ref/models/querysets/#update-or-create
explains the function well, including how it works. If you work through this in small steps, check examples and their (intermediate) results, you should be able to find what the original problem was.

Best regards,
Carsten
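To make the matching rule concrete, here is a toy, in-memory imitation of the behaviour described in this reply. It is not Django code: `ToyTable`, its `writes` counter and the integer used as `last_updated` are all made up for illustration, on the assumption that a field declared with auto_now=True is refreshed on every save.

```python
# Toy, in-memory stand-in for a database table. NOT Django code:
# ToyTable and everything in it is hypothetical, for illustration only.
class ToyTable:
    def __init__(self):
        self.rows = []    # each row is a plain dict
        self.writes = 0   # counts every save

    def _save(self, row):
        # Stand-in for an auto_now field: refreshed on every save.
        self.writes += 1
        row["last_updated"] = self.writes

    def update_or_create(self, defaults=None, **kwargs):
        defaults = defaults or {}
        for row in self.rows:
            # The match uses only the lookup kwargs, never the defaults.
            if all(row.get(k) == v for k, v in kwargs.items()):
                row.update(defaults)
                self._save(row)  # saved even if no value actually changed
                return row, False
        # No match on the lookup kwargs: build a new row from both parts.
        row = {**kwargs, **defaults}
        self.rows.append(row)
        self._save(row)
        return row, True


table = ToyTable()
first, created_first = table.update_or_create(
    unique_id=1, defaults={"name": "Acme"})
stamp_after_create = first["last_updated"]

# Same data again: no new row, but the row is written a second time.
second, created_second = table.update_or_create(
    unique_id=1, defaults={"name": "Acme"})
```

In this sketch the second call returns created=False and leaves a single row in the table, yet the stand-in timestamp still moves. That would be consistent with a last_updated value changing on every run without any row being recreated.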
update_or_create() always creates (or recreates)
I have tried to use the update_or_create() method, assuming that it would either create a new entry in the db if it found none, or update an existing one if it found one and had differences to the defaults passed in, or wouldn't update if there was no difference. However, it just seemed to recreate entries each time even if there were no changes.

I think the issue was that I wanted to:

1) get an entry if all fields were the same,
2) or create a new entry if it didn't find an existing entry with the unique_id
3) or if there was an entry with the same unique_id, update that entry with remaining fields.

The update_or_create() method doesn't seem to work as I had hoped when called as shown below: it always seems to do an update if it finds a match on the given kwargs. The alternative would be to pass all the fields in as keyword args to check that nothing had changed, but that would miss option 3), finding an existing entry that has the same unique_id but different values in the other fields.

    supplier, created = Supplier.objects.update_or_create(
        unique_id=product_detail['supplierId'],
        defaults={
            'name': product_detail['supplierName'],
            'entity_name_1': entity_name_1,
            'entity_name_2': entity_name_2,
            'rating': product_detail['supplierRating'],
        })

    class Supplier(models.Model):
        unique_id = models.IntegerField(unique=True)
        name = models.CharField(max_length=255, unique=True)
        entity_name_1 = models.CharField(max_length=255, blank=True)
        entity_name_2 = models.CharField(max_length=255, blank=True)
        rating = models.CharField(max_length=255)
        last_updated = models.DateTimeField(auto_now=True)

        def __str__(self):
            return self.name

Not being convinced that update_or_create() would give me what I needed, I made the function below:

    def create_or_update_if_diff(defaults, model):
        try:
            instance = model.objects.get(**defaults)
            # if no exception, the product doesn't need to be updated
        except model.DoesNotExist:
            # the product needs to be created or updated
            try:
                model.objects.get(unique_id=defaults['unique_id'])
            except model.DoesNotExist:
                # needs to be created
                instance = model.objects.create(**defaults)
                # model(**defaults).save()
                sys.stdout.write('New {} created: {}\n'.format(model, instance.name))
                return instance, True
            else:
                # needs to be updated
                instance = model.objects.update(**defaults)
                sys.stdout.write('{}:' ' {} updated \n'.format(model, instance.unique_id))
                return instance, True
        return instance, False

However, I can't get it quite right. I get a KeyError on update, possibly because the defaults passed in now include unique_id. Should the unique_id be separated and both passed into the function to fix this? (And should I have created a function to achieve this, or would update_or_create() have been able to do this?)

    supplier_defaults = {
        'unique_id': product_detail['supplierId'],
        'name': product_detail['supplierName'],
        'entity_name_1': entity_name_1,
        'entity_name_2': entity_name_2,
        'rating': product_detail['supplierRating'],
    }
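One way to answer the question about separating unique_id is sketched below. It is only a sketch. The argument order (model, unique_id, defaults) is an assumption and differs from the function posted above, and the only calls it makes on the model are objects.get(), objects.create(), DoesNotExist and save(). Note that model.objects.update(**defaults) in the posted function applies to every row in the table and returns a row count rather than an instance, so this version fetches the instance and saves it instead.

```python
def create_or_update_if_diff(model, unique_id, defaults):
    """Return (instance, changed); write to the database only when needed.

    `defaults` holds every field except the lookup key `unique_id`.
    """
    try:
        instance = model.objects.get(unique_id=unique_id)
    except model.DoesNotExist:
        # No row with this unique_id yet: create one.
        instance = model.objects.create(unique_id=unique_id, **defaults)
        return instance, True

    # Keep only the fields whose stored value differs from the new one.
    changed = {
        field: value
        for field, value in defaults.items()
        if getattr(instance, field) != value
    }
    if not changed:
        # Nothing to write, so no save and last_updated is left alone.
        return instance, False

    for field, value in changed.items():
        setattr(instance, field, value)
    instance.save()  # an auto_now field is refreshed by this save only
    return instance, True
```

It would be called as create_or_update_if_diff(Supplier, product_detail['supplierId'], {...}) with the remaining fields in the dict. The comparison is a plain `!=`, so the scraped values need to be of the same type as the stored ones (for example a string for the CharField rating), otherwise every row would look changed.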