I tested the EXISTS query and it also works correctly for me. I
suspected that conn.fetchone()[0] might be returning the string "0"
instead of the integer 0, which would break the if (any non-empty
string is truthy in Python), but that isn't the case.

There is a potential race condition if your spider reuses Item
objects, or if the item is modified by subsequent pipelines that run
after MySQLStorePipeline. Such a race condition could cause
item['name'] to change between the EXISTS and the INSERT (because
_do_upsert runs in a separate thread), but it should manifest itself
only rarely. If updates are never executed, I suspect something else
is causing it.
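
If you want to rule the race out, one option is to snapshot the
fields before handing the item to the thread pool. A minimal sketch
based on your process_item (untested; adjust to your item fields):

    def process_item(self, item, spider):
        # Copy the fields into a plain dict so that later pipelines
        # (or a reused Item) cannot mutate them while _do_upsert runs
        # in the adbapi thread pool.
        fields = {
            'name': item['name'][0],
            'handicap': item['handicap'][0],
            'exact': item['exact'][0],
            'category': item['category'][0],
        }
        d = self.dbpool.runInteraction(self._do_upsert, fields, spider)
        d.addErrback(self._handle_error, item, spider)
        d.addBoth(lambda _: item)
        return d

_do_upsert would then read fields['name'] instead of item['name'][0].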

I'm still not convinced that the DB lib can match the datetime.now()
value to the Updated column without an appropriate '=%s' format
directive, but such a problem would manifest itself differently.
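
For reference, the UPDATE I would expect to see (assuming you do want
to set the Updated column from datetime.now()) is:

    conn.execute("""
        UPDATE tblHandicaps
        SET Player=%s, Handicap=%s, Exact=%s, Category=%s, Updated=%s
        WHERE Player=%s
    """, (item['name'][0], item['handicap'][0], item['exact'][0],
          item['category'][0], datetime.now(), item['name'][0]))

Without the '=%s' you pass six parameters for five placeholders,
which MySQLdb should reject, so if that branch ever ran you would
likely see an error from it.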

I think the way to debug this is to add more debug logs: log which
'name' is used in the EXISTS query, what the result of the query is
(to confirm that it is indeed an int, not a string), and which 'name'
is later inserted by the INSERT query.
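
Something like this inside _do_upsert would narrow it down (a rough
sketch; spider.log logs at DEBUG level by default):

    conn.execute("""SELECT EXISTS(
            SELECT 1 FROM tblHandicaps WHERE Player = %s)""",
        (item['name'][0],))  # trailing comma: (x,) is a 1-tuple,
                             # while (x) is just x
    ret = conn.fetchone()[0]
    # repr() makes stray whitespace and a string "0" vs. the
    # integer 0 visible in the log
    spider.log("EXISTS check: name=%r ret=%r type=%s"
               % (item['name'][0], ret, type(ret)))

As an aside, the database.commit() in your latest _do_upsert
references a name that isn't defined in that function (at least in
the snippet you pasted), and with twisted.enterprise.adbapi the
transaction is committed for you when the interaction returns
successfully, so it shouldn't be needed there.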

Cheers,
Jan

On Wed, Dec 18, 2013 at 10:00 AM, Shaun Marchant
<[email protected]> wrote:
> Hi Jan,
>
> Thanks, there isn't supposed to be a %s there as I'm using
> datetime.now() for that field. The problem lies before the UPDATE
> statement, though, as it's never getting triggered.
>
> 1. The pipeline starts.
> 2. It connects to the database and checks whether a record matching
>    Player already exists.
> 3. If a matching Player record exists then the UPDATE command should
>    be executed; if not, then INSERT a new record.
>
> It is point 3 where either there is a slight variance in the
> Player/item['name'] match (returning no records and causing the
> INSERT to happen) or my IF statement logic is not correct.
> I have manually tested the SELECT EXISTS statement and it works, so
> unless Scrapy is altering it somehow I think the problem is with the
> IF statement.
>
>     def process_item(self, item, spider):
>         d = self.dbpool.runInteraction(self._do_upsert, item, spider)
>         d.addErrback(self._handle_error, item, spider)
>         d.addBoth(lambda _: item)
>
>         return d
>
>     def _do_upsert(self, conn, item, spider):
>
>         conn.execute("""SELECT EXISTS(
>                 SELECT 1 FROM tblHandicaps WHERE Player = %s)""",
>             (item['name'][0]))
>         database.commit()
>         ret = conn.fetchone()[0]
>
>         if ret:
>             conn.execute("""
>                 UPDATE tblHandicaps.......
>
> On Wednesday, December 18, 2013 8:45:48 AM UTC, Jan Wrobel wrote:
>>
>> Isn't '=%s' missing after 'Updated' in the UPDATE query?
>>
>> Jan
>>
>> On Wed, Dec 18, 2013 at 9:29 AM, Shaun Marchant
>> <[email protected]> wrote:
>> > Hi Arvin,
>> >
>> > Thanks for your suggestion. I have added the database commit
>> > statements, but they make no noticeable difference. The pipeline
>> > does write to the database OK: if I delete all records it will add
>> > them back in, no problem. It's purely the UPDATE routine that is
>> > not triggering, and I cannot see why.
>> >
>> >
>> > Cheers.
>> >
>> > Shaun
>> >
>> >
>> > On Wednesday, December 18, 2013 12:55:54 AM UTC, Arvin Cao wrote:
>> >>
>> >> Hi,
>> >>    Maybe you did not commit your database changes. After an
>> >> "insert" or "update" against MySQL from Python, you should call
>> >> database.commit() to commit the transaction.
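>> >>
>> >> For example, a minimal sketch with MySQLdb directly (connection
>> >> parameters and values are made up for illustration):
>> >>
>> >>     db = MySQLdb.connect(host='localhost', user='user',
>> >>                          passwd='secret', db='scraping')
>> >>     cursor = db.cursor()
>> >>     cursor.execute("UPDATE tblHandicaps SET Handicap=%s "
>> >>                    "WHERE Player=%s", ('12', 'Smith'))
>> >>     db.commit()  # without this the change is rolled back when
>> >>                  # the connection closes (autocommit is off)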
>> >>
>> >>
>> >> 2013/12/18 Shaun Marchant <[email protected]>
>> >>>
>> >>> Hi Group,
>> >>>
>> >>> I've managed to get a pipeline working that is INSERTing my
>> >>> scraped data into a MySQL database, but I am unable to update
>> >>> records as it appears the update logic is not working
>> >>> correctly... any suggestions as to why it's not triggering? I
>> >>> have tested the SQL statements directly and they work fine.
>> >>>
>> >>> Pipeline.py:
>> >>> import sys
>> >>> import MySQLdb
>> >>> import hashlib
>> >>> from datetime import datetime
>> >>> from twisted.enterprise import adbapi  # used by from_settings
>> >>> from scrapy import log  # used by _handle_error
>> >>> from scrapy.exceptions import DropItem
>> >>> from scrapy.http import Request
>> >>>
>> >>> class EmptyItemPipeline(object):
>> >>>
>> >>>     def process_item(self, item, spider):
>> >>>         if item['handicap']:
>> >>>             return item
>> >>>         else:
>> >>>             item['handicap'] = '99'
>> >>>             return item
>> >>>
>> >>> class MySQLStorePipeline(object):
>> >>>
>> >>>     def __init__(self, dbpool):
>> >>>         self.dbpool = dbpool
>> >>>
>> >>>     @classmethod
>> >>>     def from_settings(cls, settings):
>> >>>         dbargs = dict(
>> >>>             host=settings['localhost'],
>> >>>             db=settings['database'],
>> >>>             user=settings['username'],
>> >>>             passwd=settings['password'],
>> >>>             charset='utf8',
>> >>>             use_unicode=True,
>> >>>         )
>> >>>         dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
>> >>>         return cls(dbpool)
>> >>>
>> >>>     def process_item(self, item, spider):
>> >>>         # run db query in the thread pool
>> >>>         d = self.dbpool.runInteraction(self._do_upsert, item, spider)
>> >>>         d.addErrback(self._handle_error, item, spider)
>> >>>         # at the end return the item in case of success or failure
>> >>>         d.addBoth(lambda _: item)
>> >>>         # return the deferred instead of the item. This makes
>> >>>         # the engine process the next item (according to the
>> >>>         # CONCURRENT_ITEMS setting) after this operation
>> >>>         # (deferred) has finished.
>> >>>         return d
>> >>>
>> >>>     def _do_upsert(self, conn, item, spider):
>> >>>
>> >>>         conn.execute("""SELECT EXISTS(
>> >>>                 SELECT 1 FROM tblHandicaps WHERE Player = %s)""",
>> >>>             (item['name'][0]))
>> >>>         ret = conn.fetchone()[0]
>> >>>
>> >>>         if ret:
>> >>>             conn.execute("""
>> >>>                 UPDATE tblHandicaps
>> >>>                 SET Player=%s, Handicap=%s, Exact=%s, Category=%s,
>> >>>                     Updated
>> >>>                 WHERE Player=%s
>> >>>             """, (item['name'][0], item['handicap'][0],
>> >>>                   item['exact'][0], item['category'][0],
>> >>>                   datetime.now(), item['name'][0]))
>> >>>             spider.log("Item updated in db: %s %r"
>> >>>                        % (item['name'][0], item))
>> >>>         else:
>> >>>             conn.execute("""INSERT INTO tblHandicaps
>> >>>                 (Player, Handicap, Exact, Category, Sex, Updated)
>> >>>                 VALUES (%s, %s, %s, %s, 'M', %s)""",
>> >>>                 (item['name'][0],
>> >>>                  item['handicap'][0],
>> >>>                  item['exact'][0],
>> >>>                  item['category'][0],
>> >>>                  datetime.now()))
>> >>>
>> >>>             spider.log("Item stored in db: %s %r"
>> >>>                        % (item['name'][0], item))
>> >>>
>> >>>     def _handle_error(self, failure, item, spider):
>> >>>         """Handle occurred on db interaction."""
>> >>>         # do nothing, just log
>> >>>         log.err(failure)
>> >>>
>> >>>
>> >>>
>> >>> I don't see a specific error in the console log other than an
>> >>> SQL primary key error as it's trying to INSERT rather than
>> >>> UPDATE.
>> >>>
>> >>> Cheers
>> >>>
>> >>>
>> >>> Shaun
>> >>>
