Thanks again, Jan.

Your comments are very useful. I think perhaps what is happening is that
when the SELECT 1 query runs, more than just the player name is being
passed, i.e. the value is in its raw stored format of

[u'Avins, K']

which is why I had the [0] after item:

conn.execute("""SELECT EXISTS(
    SELECT 1 FROM tblHandicaps WHERE Player = %s)""",
    (item['name'][0]))

Perhaps this is not the right option to use here?
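For what it's worth, here is a small standalone sketch of how I'd expect that lookup to behave. It uses sqlite3 (bundled with Python) in place of MySQLdb, so placeholders are ? rather than %s; the tblHandicaps table, Player column, and 'Avins, K' value are just the ones from this thread, and the item dict is a stand-in for a Scrapy item:

```python
# Standalone sketch of the EXISTS lookup, using sqlite3 instead of
# MySQLdb (so ? placeholders instead of %s). Table and column names
# are taken from this thread; the item dict is a stand-in.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tblHandicaps (Player TEXT PRIMARY KEY)")
cur.execute("INSERT INTO tblHandicaps (Player) VALUES (?)", (u"Avins, K",))
conn.commit()

# Scrapy items typically store extracted field values as lists, so
# item['name'] looks like [u'Avins, K'] and item['name'][0] is the
# plain string.
item = {"name": [u"Avins, K"]}

# Per DB-API, the parameters argument should be a sequence, normally a
# one-element tuple: (value,). Note the trailing comma -- without it,
# (item['name'][0]) is just the bare string in parentheses, not a tuple.
cur.execute(
    "SELECT EXISTS(SELECT 1 FROM tblHandicaps WHERE Player = ?)",
    (item["name"][0],),
)
ret = cur.fetchone()[0]
print(ret)  # 1 when the player exists, 0 otherwise
```

If the stored value really were the whole list rather than the string, the comparison could never match a bare name, which is consistent with EXISTS returning 0 and the INSERT branch always running.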
On Wednesday, December 18, 2013 10:18:10 AM UTC, Jan Wrobel wrote:
>
> I tested the EXISTS query and it also works correctly for me. I was
> suspecting that maybe conn.fetchone()[0] is returning a string "0"
> instead of 0, which would break the if, but it isn't the case.
>
> There is a potential race condition if your spider reused Item object,
> or if the item was modified by subsequent pipelines that are executed
> after MySQLStorePipeline. Such race condition could cause item['name']
> to change between an execution of EXISTS and INSERT (because
> _do_upsert runs in a separate thread), but it should rather manifest
> itself rarely. If updates are never executed, I suspect something else
> is causing it.
>
> I'm still concerned whether the DB lib can match the datetime.now() result
> to the Updated column without an appropriate '=%s' format directive, but
> such a problem would manifest itself differently.
>
> I think a way to debug this could be to add more debug logs to see
> what 'name' is used in the EXISTS query, what is a result of the query
> (to confirm that this is indeed an int, not a string) and what 'name'
> is later inserted in the 'INSERT' query.
>
> Cheers,
> Jan
>
> On Wed, Dec 18, 2013 at 10:00 AM, Shaun Marchant
> <[email protected] <javascript:>> wrote:
> > Hi Jan,
> >
> > Thanks, there's not supposed to be a %s there as I'm using
> > datetime.now() for that field. The problem lies before the UPDATE
> > statement though, as it's never getting triggered.
> >
> > 1. The pipeline starts.
> > 2. It connects to the database and checks whether a record matching
> > Player already exists.
> > 3. If a matching Player record exists then the UPDATE command should be
> > executed; if not, then INSERT a new record.
> >
> > It is point 3 where either there is a slight variance in the
> > Player/item['name'] match, which returns no records and causes the
> > INSERT to happen, or my IF statement logic is not correct.
> > I have manually tested the SELECT EXISTS statement and it works, so
> > unless Scrapy is altering it somehow I think the problem is with the IF
> > statement.
> >
> > def process_item(self, item, spider):
> >     d = self.dbpool.runInteraction(self._do_upsert, item, spider)
> >     d.addErrback(self._handle_error, item, spider)
> >     d.addBoth(lambda _: item)
> >
> >     return d
> >
> > def _do_upsert(self, conn, item, spider):
> >     conn.execute("""SELECT EXISTS(
> >         SELECT 1 FROM tblHandicaps WHERE Player = %s)""",
> >         (item['name'][0]))
> >     database.commit()
> >     ret = conn.fetchone()[0]
> >
> >     if ret:
> >         conn.execute("""
> >             UPDATE tblHandicaps ...
> >
> > On Wednesday, December 18, 2013 8:45:48 AM UTC, Jan Wrobel wrote:
> >>
> >> Isn't '=%s' missing after 'Updated' in the UPDATE query?
> >>
> >> Jan
> >>
> >> On Wed, Dec 18, 2013 at 9:29 AM, Shaun Marchant
> >> <[email protected]> wrote:
> >> > Hi Arvin,
> >> >
> >> > Thanks for your suggestion. I have added in the database commit
> >> > statements, but it makes no noticeable difference. The pipeline does
> >> > write to the database OK, as if I delete all records it will add them
> >> > back in, no problem; it's purely the UPDATE routine that is not
> >> > triggering, but I cannot see why.
> >> >
> >> >
> >> > Cheers.
> >> >
> >> > Shaun
> >> >
> >> >
> >> > On Wednesday, December 18, 2013 12:55:54 AM UTC, Arvin Cao wrote:
> >> >>
> >> >> Hi,
> >> >> Maybe you did not commit your database. After an "insert" or
> >> >> "update" to your MySQL database with Python, you should use
> >> >> database.commit() to commit the change.
> >> >>
> >> >>
> >> >> 2013/12/18 Shaun Marchant <[email protected]>
> >> >>>
> >> >>> Hi Group,
> >> >>>
> >> >>> I've managed to get a pipeline working that is INSERTing my scraped
> >> >>> data into a MySQL database, but I am unable to update records, as it
> >> >>> appears the update logic is not working correctly... any suggestions
> >> >>> as to why it's not triggering? I have tested the SQL statement etc.
> >> >>> directly and it is working fine.
> >> >>>
> >> >>> Pipeline.py (note: I've restored the indentation here; adbapi and
> >> >>> log are used below, so their imports are included as well):
> >> >>>
> >> >>> import sys
> >> >>> import MySQLdb
> >> >>> import hashlib
> >> >>> from datetime import datetime
> >> >>> from twisted.enterprise import adbapi
> >> >>> from scrapy import log
> >> >>> from scrapy.exceptions import DropItem
> >> >>> from scrapy.http import Request
> >> >>>
> >> >>> class EmptyItemPipeline(object):
> >> >>>
> >> >>>     def process_item(self, item, spider):
> >> >>>         if item['handicap']:
> >> >>>             return item
> >> >>>         else:
> >> >>>             item['handicap'] = '99'
> >> >>>             return item
> >> >>>
> >> >>> class MySQLStorePipeline(object):
> >> >>>
> >> >>>     def __init__(self, dbpool):
> >> >>>         self.dbpool = dbpool
> >> >>>
> >> >>>     @classmethod
> >> >>>     def from_settings(cls, settings):
> >> >>>         dbargs = dict(
> >> >>>             host=settings['localhost'],
> >> >>>             db=settings['database'],
> >> >>>             user=settings['username'],
> >> >>>             passwd=settings['password'],
> >> >>>             charset='utf8',
> >> >>>             use_unicode=True,
> >> >>>         )
> >> >>>         dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
> >> >>>         return cls(dbpool)
> >> >>>
> >> >>>     def process_item(self, item, spider):
> >> >>>         # run db query in the thread pool
> >> >>>         d = self.dbpool.runInteraction(self._do_upsert, item, spider)
> >> >>>         d.addErrback(self._handle_error, item, spider)
> >> >>>         # at the end return the item in case of success or failure
> >> >>>         d.addBoth(lambda _: item)
> >> >>>         # return the deferred instead of the item. This makes the
> >> >>>         # engine process the next item (according to the
> >> >>>         # CONCURRENT_ITEMS setting) after this operation (deferred)
> >> >>>         # has finished.
> >> >>>         return d
> >> >>>
> >> >>>     def _do_upsert(self, conn, item, spider):
> >> >>>         conn.execute("""SELECT EXISTS(
> >> >>>             SELECT 1 FROM tblHandicaps WHERE Player = %s)""",
> >> >>>             (item['name'][0]))
> >> >>>         ret = conn.fetchone()[0]
> >> >>>
> >> >>>         if ret:
> >> >>>             conn.execute("""
> >> >>>                 UPDATE tblHandicaps
> >> >>>                 SET Player=%s, Handicap=%s, Exact=%s, Category=%s,
> >> >>>                     Updated
> >> >>>                 WHERE Player=%s
> >> >>>                 """, (item['name'][0], item['handicap'][0],
> >> >>>                 item['exact'][0], item['category'][0],
> >> >>>                 datetime.now(), item['name'][0]))
> >> >>>             spider.log("Item updated in db: %s %r" %
> >> >>>                 (item['name'][0], item))
> >> >>>         else:
> >> >>>             conn.execute("""INSERT INTO tblHandicaps (Player,
> >> >>>                 Handicap, Exact, Category, Sex, Updated)
> >> >>>                 VALUES (%s, %s, %s, %s, 'M', %s)""",
> >> >>>                 (item['name'][0],
> >> >>>                 item['handicap'][0],
> >> >>>                 item['exact'][0],
> >> >>>                 item['category'][0],
> >> >>>                 datetime.now()))
> >> >>>
> >> >>>             spider.log("Item stored in db: %s %r" %
> >> >>>                 (item['name'][0], item))
> >> >>>
> >> >>>     def _handle_error(self, failure, item, spider):
> >> >>>         """Handle errors raised during db interaction."""
> >> >>>         # do nothing, just log
> >> >>>         log.err(failure)
> >> >>>
> >> >>>
> >> >>>
> >> >>> I don't see a specific error in the console log other than an SQL
> >> >>> primary key error, as it's trying to INSERT rather than UPDATE.
> >> >>>
> >> >>> Cheers
> >> >>>
> >> >>>
> >> >>> Shaun
> >> >>>
> >> >>> --
> >> >>> You received this message because you are subscribed to the Google
> >> >>> Groups "scrapy-users" group.
> >> >>> To unsubscribe from this group and stop receiving emails from it,
> >> >>> send an email to [email protected].
> >> >>> To post to this group, send email to [email protected].
> >> >>>
> >> >>> Visit this group at http://groups.google.com/group/scrapy-users.
> >> >>> For more options, visit https://groups.google.com/groups/opt_out.
> >> >>
> >> >>