On Feb 01, Bjørn von Rimscha <b.vonrims...@gmail.com> wrote:

>   * SPLITTING (run 1 of 2), recursion: 1
  [...]
>    File "C:\Python26\lib\site-packages\MySQLdb\connections.py", line 36, in defaulterrorhandler
>      raise errorclass, errorvalue
>  _mysql_exceptions.IntegrityError: (1062, "Duplicate entry '262918' for key 'PRIMARY'")
> 
> Does that mean splitting created duplicates?

Something like that. :-/
It goes without saying that it shouldn't. :-)

> Or is there a mistake in the imdb data?

Improbable: the plain text data files contain some crap, but
imdbpy2sql.py is supposed to work around these problems.
Moreover, the IDs used as primary key in the 'title' (and other)
tables are made up by imdbpy2sql.py itself, and not taken from
the plain text data files.

> I also increased the max_allowed_packet value as suggested in the README
> file, but with the same result.
  [...]
> MySQL 5.1
> Python 2.6.4
> Windows 7

Honestly, I'm getting a lot of bug reports about MySQL on Windows 7
or Vista.
It's crazy that nobody is able to make it work: splitting the data
we send to the database is just a fail-safe measure, and as
you can imagine it would be better if it never happened.
Under Linux I've never seen these problems: tomorrow I'll try to
set up my own MySQL to reproduce the problem, if I can (as you can
guess, the code used to split the data set is not extensively tested).
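
To make the idea of a split-on-failure fallback concrete, here is a
minimal sketch in Python. It is NOT the code in imdbpy2sql.py: the
names insert_with_split and LimitedTable are invented for illustration,
and the toy table only imitates a server that rejects large batches.
It shows one general way such a scheme can end in a duplicate-key
error: if a failed batch was partly stored before the error was
raised, resending its halves repeats rows that are already there.

```python
def insert_with_split(rows, insert_many, depth=0, max_depth=20):
    """Try to insert all rows at once; on failure, halve the batch and retry.

    insert_many is a hypothetical callable taking a list of (key, value) rows.
    """
    if not rows:
        return
    try:
        insert_many(rows)
    except Exception:
        # A single row that still fails (or too many splits) is a real error.
        if len(rows) == 1 or depth >= max_depth:
            raise
        middle = len(rows) // 2
        insert_with_split(rows[:middle], insert_many, depth + 1, max_depth)
        insert_with_split(rows[middle:], insert_many, depth + 1, max_depth)


class LimitedTable:
    """Toy stand-in for a table that rejects batches above max_batch rows."""

    def __init__(self, max_batch, atomic=True):
        self.max_batch = max_batch
        self.atomic = atomic  # False: a rejected batch is partly stored
        self.rows = {}

    def insert_many(self, rows):
        too_big = len(rows) > self.max_batch
        if too_big and self.atomic:
            raise ValueError("batch too large")
        for key, value in rows[:self.max_batch]:
            if key in self.rows:
                raise KeyError("Duplicate entry %r for key 'PRIMARY'" % key)
            self.rows[key] = value
        if too_big:
            raise ValueError("batch too large")


data = [(i, "title %d" % i) for i in range(10)]

# All-or-nothing batches: splitting recovers and every row is stored once.
table = LimitedTable(max_batch=3)
insert_with_split(data, table.insert_many)

# Partly stored batches: the retry resends rows that already exist.
leaky = LimitedTable(max_batch=3, atomic=False)
try:
    insert_with_split(data, leaky.insert_many)
except KeyError as error:
    print("retry failed:", error)
```

Whether this is what actually happens here is only a guess until the
problem is reproduced; the sketch just shows why splitting is safe only
when a rejected batch leaves nothing behind.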

I have some solutions you can try (besides moving to a sane Unix
environment, I mean ;-) :

1. use CSV files, as described in the README.sqldb file.
   Basically, add to your command line something like:
     -c C:/path/to/an/empty/directory/
   But also read the notes in README.sqldb about CSV files
   and Windows paths.
   PS: right now, in SVN, I'm improving the CSV support,
   so that you can decouple the creation of the CSV files from the
   insertion of the data into the database.

2. use PostgreSQL or another supported database.

In the meantime, if anyone has any idea why MySQL on Vista/7
accepts so little data at a time, please share your thoughts. :-)
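
One thing that may be worth ruling out is whether the server really
uses the max_allowed_packet value that was set (for example, the
configuration file that was edited might not be the one the server
reads). The sketch below asks the server for the effective value with
the standard MySQL statement SHOW VARIABLES. The helper name
get_max_allowed_packet is made up for this example, and the connection
parameters in the usage comment are placeholders.

```python
def get_max_allowed_packet(cursor):
    """Return the server's effective max_allowed_packet, in bytes.

    cursor is any DB-API cursor connected to a MySQL server
    (for instance one obtained from MySQLdb).
    """
    cursor.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
    row = cursor.fetchone()
    if row is None:
        raise RuntimeError("server did not report max_allowed_packet")
    # The row is (variable_name, value); the value arrives as a string.
    return int(row[1])


# Usage, assuming MySQLdb is installed; credentials are placeholders:
#   import MySQLdb
#   conn = MySQLdb.connect(host="localhost", user="user",
#                          passwd="secret", db="imdb")
#   print(get_max_allowed_packet(conn.cursor()))
```

If the number printed is still the default, the new setting was not
picked up by the server.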


Let me know if/how you fix the problem!

-- 
Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
