I tested the EXISTS query and it also works correctly for me. I suspected that conn.fetchone()[0] might be returning the string "0" instead of the integer 0, which would break the if statement, but that isn't the case.

There is a potential race condition if your spider reuses the Item object, or if the item is modified by subsequent pipelines that run after MySQLStorePipeline. Such a race condition could cause item['name'] to change between the EXISTS and the INSERT (because _do_upsert runs in a separate thread), but it should manifest itself only rarely. If updates are never executed, I suspect something else is the cause. I'm also still not sure the DB lib can bind the datetime.now() result to the Updated column without the appropriate '=%s' format directive, but that problem would manifest itself differently.

I think the way to debug this is to add more logging: log which 'name' is used in the EXISTS query, what the query returns (to confirm that it is indeed an int, not a string), and which 'name' is later used in the INSERT query.
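For example, something along these lines (an untested sketch of your _do_upsert, reusing the table name and item fields from your snippet below) would log all three in one place:

    def _do_upsert(self, conn, item, spider):
        name = item['name'][0]
        conn.execute("""SELECT EXISTS(
            SELECT 1 FROM tblHandicaps WHERE Player = %s)""",
            (name,))  # trailing comma: pass an explicit one-element tuple
        ret = conn.fetchone()[0]
        # %r exposes stray whitespace or quotes in the name and shows the
        # string "0" differently from the integer 0; type() settles any doubt.
        spider.log("EXISTS check: Player=%r -> %r (%s)" % (name, ret, type(ret)))
        if ret:
            spider.log("Updating Player=%r" % name)
            # ... your UPDATE here ...
        else:
            spider.log("Inserting Player=%r" % name)
            # ... your INSERT here ...

If the name logged before the INSERT ever differs from the one used in the EXISTS check (trailing whitespace is a common culprit), that would explain why the lookup finds nothing.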
Cheers,
Jan

On Wed, Dec 18, 2013 at 10:00 AM, Shaun Marchant <[email protected]> wrote:
> Hi Jan,
>
> Thanks, there isn't supposed to be a %s there, as I'm using datetime.now()
> for that field. The problem lies before the UPDATE statement though, as it
> is never getting triggered.
>
> 1. The pipeline starts.
> 2. It connects to the database and checks whether a record matching Player
>    already exists.
> 3. If a matching Player record exists then the UPDATE command should be
>    executed; if not, a new record should be INSERTed.
>
> It is at point 3 where either there is a slight variance in the
> Player/item['name'] match (returning no records and causing the INSERT to
> happen) or my IF statement logic is not correct.
> I have manually tested the SELECT EXISTS statement and it works, so unless
> Scrapy is altering it somehow I think the problem is with the IF statement.
>
>     def process_item(self, item, spider):
>         d = self.dbpool.runInteraction(self._do_upsert, item, spider)
>         d.addErrback(self._handle_error, item, spider)
>         d.addBoth(lambda _: item)
>         return d
>
>     def _do_upsert(self, conn, item, spider):
>         conn.execute("""SELECT EXISTS(
>             SELECT 1 FROM tblHandicaps WHERE Player = %s)""",
>             (item['name'][0]))
>         database.commit()
>         ret = conn.fetchone()[0]
>
>         if ret:
>             conn.execute("""
>                 UPDATE tblHandicaps.......
>
> On Wednesday, December 18, 2013 8:45:48 AM UTC, Jan Wrobel wrote:
>>
>> Isn't '=%s' missing after 'Updated' in the UPDATE query?
>>
>> Jan
>>
>> On Wed, Dec 18, 2013 at 9:29 AM, Shaun Marchant
>> <[email protected]> wrote:
>> > Hi Arvin,
>> >
>> > Thanks for your suggestion. I have added in the database commit
>> > statements, but it makes no noticeable difference. The pipeline does
>> > write to the database OK: if I delete all records it will add them back
>> > in no problem. It's purely the UPDATE routine that is not triggering,
>> > and I cannot see why.
>> >
>> > Cheers.
>> >
>> > Shaun
>> >
>> > On Wednesday, December 18, 2013 12:55:54 AM UTC, Arvin Cao wrote:
>> >>
>> >> Hi,
>> >> Maybe you did not commit your database. After an "insert" or "update"
>> >> to your MySQL database from Python, you should call database.commit()
>> >> to commit the change.
>> >>
>> >> 2013/12/18 Shaun Marchant <[email protected]>
>> >>>
>> >>> Hi Group,
>> >>>
>> >>> I've managed to get a pipeline working that is INSERTing my scraped
>> >>> data into a MySQL database, but I am unable to update records as it
>> >>> appears the update logic is not working correctly... any suggestions
>> >>> as to why it's not triggering? I have tested the SQL statement etc.
>> >>> directly and it is working fine.
>> >>>
>> >>> Pipeline.py:
>> >>>
>> >>> import sys
>> >>> import MySQLdb
>> >>> import hashlib
>> >>> from datetime import datetime
>> >>> from twisted.enterprise import adbapi
>> >>> from scrapy import log
>> >>> from scrapy.exceptions import DropItem
>> >>> from scrapy.http import Request
>> >>>
>> >>> class EmptyItemPipeline(object):
>> >>>
>> >>>     def process_item(self, item, spider):
>> >>>         if item['handicap']:
>> >>>             return item
>> >>>         else:
>> >>>             item['handicap'] = '99'
>> >>>             return item
>> >>>
>> >>> class MySQLStorePipeline(object):
>> >>>
>> >>>     def __init__(self, dbpool):
>> >>>         self.dbpool = dbpool
>> >>>
>> >>>     @classmethod
>> >>>     def from_settings(cls, settings):
>> >>>         dbargs = dict(
>> >>>             host=settings['localhost'],
>> >>>             db=settings['database'],
>> >>>             user=settings['username'],
>> >>>             passwd=settings['password'],
>> >>>             charset='utf8',
>> >>>             use_unicode=True,
>> >>>         )
>> >>>         dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
>> >>>         return cls(dbpool)
>> >>>
>> >>>     def process_item(self, item, spider):
>> >>>         # run the db query in the thread pool
>> >>>         d = self.dbpool.runInteraction(self._do_upsert, item, spider)
>> >>>         d.addErrback(self._handle_error, item, spider)
>> >>>         # at the end return the item in case of success or failure
>> >>>         d.addBoth(lambda _: item)
>> >>>         # return the deferred instead of the item. This makes the
>> >>>         # engine process the next item (according to the
>> >>>         # CONCURRENT_ITEMS setting) after this operation (deferred)
>> >>>         # has finished.
>> >>>         return d
>> >>>
>> >>>     def _do_upsert(self, conn, item, spider):
>> >>>         conn.execute("""SELECT EXISTS(
>> >>>             SELECT 1 FROM tblHandicaps WHERE Player = %s)""",
>> >>>             (item['name'][0]))
>> >>>         ret = conn.fetchone()[0]
>> >>>
>> >>>         if ret:
>> >>>             conn.execute("""
>> >>>                 UPDATE tblHandicaps
>> >>>                 SET Player=%s, Handicap=%s, Exact=%s, Category=%s, Updated
>> >>>                 WHERE Player=%s
>> >>>             """, (item['name'][0], item['handicap'][0], item['exact'][0],
>> >>>                   item['category'][0], datetime.now(), item['name'][0]))
>> >>>             spider.log("Item updated in db: %s %r" % (item['name'][0], item))
>> >>>         else:
>> >>>             conn.execute("""INSERT INTO tblHandicaps (Player, Handicap,
>> >>>                 Exact, Category, Sex, Updated)
>> >>>                 VALUES (%s, %s, %s, %s, 'M', %s)""",
>> >>>                 (item['name'][0],
>> >>>                  item['handicap'][0],
>> >>>                  item['exact'][0],
>> >>>                  item['category'][0],
>> >>>                  datetime.now()))
>> >>>             spider.log("Item stored in db: %s %r" % (item['name'][0], item))
>> >>>
>> >>>     def _handle_error(self, failure, item, spider):
>> >>>         """Handle errors that occurred on db interaction."""
>> >>>         # do nothing, just log
>> >>>         log.err(failure)
>> >>>
>> >>> I don't see a specific error in the console log other than an SQL
>> >>> primary key error, as it is trying to INSERT rather than UPDATE.
>> >>>
>> >>> Cheers
>> >>>
>> >>> Shaun
