[sqlalchemy] Errors on using Postgres' COPY FROM

2009-05-25 Thread nuin

Hi everyone

First message to the list. I've been using SQLAlchemy for quite some
time, mostly with MySQL and some SQLite. Now we are moving our DB
system to Postgres, and I'm converting the scripts that parse data
into our databases.

I'm having a recurring error when using COPY FROM. The Python code is
the following:

# after starting and having a successful connection
curr_path2 = os.getcwd()
trans = conn.begin()
conn.execute("""COPY precursor (spectrum_number, scan_begin, scan_end,
    mass, mtoz, elution_time_begin, spectral_analysis_id, charge) FROM
    '%s'""" % curr_path2)
trans.commit()

The error I'm getting, with different input files, sometimes on
different lines, is the following:

Traceback (most recent call last):
  File "pp_ipad_parser3.py", line 434, in <module>
    FROM '%s'""" % curr_path2)
  File "C:\Python25\lib\site-packages\sqlalchemy-0.4.7p1-py2.5.egg\sqlalchemy\engine\base.py", line 844, in execute
    return Connection.executors[c](self, object, multiparams, params)
  File "C:\Python25\lib\site-packages\sqlalchemy-0.4.7p1-py2.5.egg\sqlalchemy\engine\base.py", line 854, in _execute_text
    self.__execute_raw(context)
  File "C:\Python25\lib\site-packages\sqlalchemy-0.4.7p1-py2.5.egg\sqlalchemy\engine\base.py", line 916, in __execute_raw
    self._cursor_execute(context.cursor, context.statement, context.parameters[0], context=context)
  File "C:\Python25\lib\site-packages\sqlalchemy-0.4.7p1-py2.5.egg\sqlalchemy\engine\base.py", line 960, in _cursor_execute
    self._handle_dbapi_exception(e, statement, parameters, cursor)
  File "C:\Python25\lib\site-packages\sqlalchemy-0.4.7p1-py2.5.egg\sqlalchemy\engine\base.py", line 942, in _handle_dbapi_exception
    raise exceptions.DBAPIError.instance(statement, parameters, e, connection_invalidated=is_disconnect)
sqlalchemy.exceptions.DataError: (DataError) missing data for column
"spectral_analysis_id"
CONTEXT:  COPY precursor, line 360: "13177	13177	13177	555.2051	1662.59362792969	64.374"
 'COPY precursor (spectrum_number, scan_begin, scan_end, mass, mtoz, elution_time_begin, spectral_analysis_id, charge) FROM 'C:/Users/nuin/workspace/OCBN/src/ipad_parser_1/temp2.csv''  {}

The files are tab-delimited, and I have checked their integrity: the
lines where the errors occur are fine. Running the same command via
psql's -c parameter doesn't give any errors and the copy finishes
successfully. I have searched different forums and web sites for a
solution to this problem, but I couldn't find anything resembling the
error I'm getting.

Thanks in advance for any help.

Paulo Nuin

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Errors on using Postgres' COPY FROM

2009-05-25 Thread Michael Bayer


On May 25, 2009, at 11:13 AM, nuin wrote:

 sqlalchemy.exceptions.DataError: (DataError) missing data for column
 "spectral_analysis_id"
 CONTEXT:  COPY precursor, line 360: "13177	13177	13177	555.2051	1662.59362792969	64.374"
 COPY precursor (spectrum_number, scan_begin, scan_end, mass, mtoz,
 elution_time_begin, spectral_analysis_id, charge) FROM
 'C:/Users/nuin/workspace/OCBN/src/ipad_parser_1/temp2.csv'  {}


I'm not familiar with direct usage of COPY FROM, but it seems here like
line 360 of your input file only has six elements and the COPY
statement is asking for eight.  It's not a SQLAlchemy issue in any
case, and if anything would be specific to psycopg2, if at all.
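A quick way to double-check that theory is to count fields per line in the tab-delimited file before handing it to COPY. A sketch (the function name is made up; the expected count of eight comes from the COPY column list above):

```python
import csv

def check_column_counts(path, expected=8, delimiter="\t"):
    """Return (line_number, field_count) for every line whose
    field count differs from the expected one."""
    bad = []
    with open(path, newline="") as f:
        for lineno, row in enumerate(csv.reader(f, delimiter=delimiter), start=1):
            if len(row) != expected:
                bad.append((lineno, len(row)))
    return bad

# e.g. check_column_counts("temp2.csv") would report [(360, 6)]
# if line 360 really were short
```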





[sqlalchemy] Re: Errors on using Postgres' COPY FROM

2009-05-25 Thread Paulo Nuin

Hi

The line on temp2.csv has all the values, and the delimiters are fine:

13177	13177	13177	555.2051	1662.59362792969	64.374802	109	0

If I remove the line, it sometimes fails again on the same line with
different values, or fails on another line. I have a series of
tab-delimited files to input, and the command fails with different
types of inputs: integers, floats, etc. A larger file fails in the
middle; a shorter file also fails. Testing all the possible input
sizes on psql, everything is smooth and the inputs are accepted.
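When a COPY fails on lines that look fine in an editor, hidden bytes (a stray carriage return from Windows line endings, a BOM, an odd whitespace character) are a common culprit. One way to rule that out is to look at the raw bytes of the failing line; a sketch (the path and line number are illustrative):

```python
def show_raw_line(path, lineno):
    """Return repr() of the exact bytes of one line, making stray
    carriage returns, NUL bytes or odd whitespace visible."""
    with open(path, "rb") as f:
        for n, raw in enumerate(f, start=1):
            if n == lineno:
                return repr(raw)
    return None

# e.g. show_raw_line("temp2.csv", 360)
```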

Thanks

Paulo







[sqlalchemy] Re: querying into objects from a large flat table or have multiple processed tables mapped to objects?

2009-05-25 Thread Chris Withers

I wish I could make pathetic whining noises and puppydog eyes...

If anyone with some relational nous could give me even some vague 
pointers, that would be great...

Chris

Chris Withers wrote:
 Hi All,
 
 I'm looking to do (something like) weblog analysis and thought it'd be 
 interesting to try an rdb through SQLAlchemy.
 
 Apologies in advance, my knowledge of rdb and sa is much poorer than I'd 
 like, so simple pointers may help...
 
 So, assuming I have data something like:
 
 2009-05-20-00:01:02.345 user1 192.168.1.1  /some/folder/some/path
 2009-05-20-00:01:03.452 user1 192.168.1.1  /some/folder/other/path
 2009-05-20-00:01:02.345 user2 192.168.1.10 /some/folder/some/path
 
 ...what would people recommend for storage?
 
 One table mapped to a Row class or some such?
 What would the primary key be? (since two entries could theoretically 
 happen at exactly the same time, I can't use the date...)
 
 I'd like to then aggregate first by user, then by month, giving a UI like:
 
 Users
 =
 
 Username   Total Pages   Last Viewed
 --------   -----------   --------------------
 user1      32434         27th July 2009 16:05
 user2      15000         1st Jan 2009 01:05
 ...lots more users, probably batched...
 
 Clicking on a username gives:
 
 User1
 =
 
 Month       Total Pages  Last Viewed
 ----------  -----------  ---------------------
 May 2009    1000         15th May 2009 13:50
 April 2009  1000         30th April 2009 23:55
 ...lots of months...
 
 Clicking on a month gives:
 
 User1 - May 2009
 =
 
 Page                     Hits  Last Viewed
 -----------------------  ----  -------------------
 /some/folder/some/path   20    15th May 2009 13:50
 /some/folder/other/path  33    1st May 2009 13:50
 
 What queries would I use to generate these views? I'm envisaging some 
 group-by's here, but don't know how I'd plug those into SA. Would I have 
 classes for users, months and pages that had selects map to them or 
 something else?
 
 How would I batch them such that if there are several thousand rows in a 
 table I can show the old "1 to 20 of 3000" without having to load all 
 3000 rows?
 
 Another option I thought of was to not actually store the raw rows, but 
 have several tables:
 
 class Users(Base):
     __tablename__ = 'users'
     name = Column(String, primary_key=True)
     total_pages = Column(Integer)
     last_viewed = Column(DateTime)
 
 class Month(Base):
     __tablename__ = 'months'
     username = Column(
         String, ForeignKey('users.name'), primary_key=True
     )
     monthname = Column(String, primary_key=True)
     total_pages = Column(Integer)
     last_viewed = Column(DateTime)
 
 class Page(Base):
     __tablename__ = 'pages'
     username = Column(
         String, ForeignKey('months.username'), primary_key=True
     )
     monthname = Column(
         String, ForeignKey('months.monthname'), primary_key=True
     )
     hits = Column(Integer)
     last_viewed = Column(DateTime)
 
 ...and then populate them while parsing the logs. I can see this storing 
 a lot less data, but would it be faster? Still, how would I do batching 
 of results?
 Ideally, I'd prefer not to go down this route as it limits different 
 types of analysis later down the line but...
 
 What do people think? What's the normal/correct way of handling large 
 amounts of data to be sliced and diced like this?
 
 cheers,
 
 Chris
 
 

-- 
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk




[sqlalchemy] Re: Errors on using Postgres' COPY FROM

2009-05-25 Thread Michael Bayer

As I said, the issue lies with the usage of the psycopg2 DBAPI.  Write
a test case using only psycopg2:

import psycopg2
conn = psycopg2.connect(dbname="test", user="user", password="password")
cursor = conn.cursor()
cursor.execute("my statement")

Assuming the behavior reproduces, check with their mailing list.
There is nothing SQLAlchemy does outside of the above.
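If the plain-psycopg2 test reproduces the failure, it may also be worth trying psycopg2's own COPY support: cursor.copy_expert() with COPY ... FROM STDIN streams the file from the client side instead of having the server open the path itself. A sketch, with the function name and the cursor/file handling as illustrative assumptions:

```python
def load_precursors(cursor, csv_path):
    """Stream a tab-delimited file into the precursor table via
    COPY ... FROM STDIN, using psycopg2's copy_expert()."""
    copy_sql = (
        "COPY precursor (spectrum_number, scan_begin, scan_end, mass, "
        "mtoz, elution_time_begin, spectral_analysis_id, charge) "
        "FROM STDIN"
    )
    with open(csv_path) as f:
        cursor.copy_expert(copy_sql, f)
```

With a real connection this would be something like `load_precursors(conn.cursor(), 'temp2.csv')` followed by `conn.commit()`.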







[sqlalchemy] Re: querying into objects from a large flat table or have multiple processed tables mapped to objects?

2009-05-25 Thread Michael Bayer

The best thing to do would be to experiment with various schemas
and see what works best (and maybe read some SQL books, or books about
web log data mining, if you feel you want some background).  The
primary key of your table would be easiest as a surrogate integer key;
GROUP BY is provided by the group_by() method of select() or
Query.  You also might want to investigate the star schema approach,
which is popular for slicing and dicing:
http://en.wikipedia.org/wiki/Star_schema





[sqlalchemy] Re: Errors on using Postgres' COPY FROM

2009-05-25 Thread Paulo Nuin

Thanks a lot. I will check it.

Paulo

On Mon, May 25, 2009 at 6:19 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 As I said, the issue lies with the usage of the psycopg2 DBAPI.  Write
 a test case using only psycopg2:

 import psycopg2
 conn = psycopg2.connect(dbname="test", user="user", password="password")
 cursor = conn.cursor()
 cursor.execute("my statement")

 Assuming the behavior reproduces, check with their mailing list.
 There is nothing SQLAlchemy does outside of the above.







[sqlalchemy] Re: querying into objects from a large flat table or have multiple processed tables mapped to objects?

2009-05-25 Thread Adrian von Bidder
[web logs - db]

On Tuesday 26 May 2009 00.27:03 Michael Bayer wrote:
 the best thing to do would be to experiment with some various schemas  
 and see what works best

Also, it's extremely important to keep in mind that SQL databases can only 
work well with big tables if you create the right indices.  What kind of 
index to create depends on the database you'll use and on which queries 
you'll run.  (And: don't use a small example data set to decide which 
indices to build.  A database often will change its query plan, sometimes 
drastically, depending on how much data is in a table.)  So: create your 
schema, fill it with a few million log entries and then look at which 
queries might need which indices.

In the case of web logs, you'll probably want an index on every base field 
(timestamp, user, IP, URL string), but depending on your exact queries, 
combined indices on multiple columns, or indices on functions of fields 
(like, perhaps, an index on the ending of the URL to quickly separate 
jpg/png requests vs. html/php requests vs. requests on directories, 
ending in '/') might speed up your queries dramatically.
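The scan-vs-index difference described above is easy to see even with SQLite's EXPLAIN QUERY PLAN (used here only because it needs no server; on Postgres, EXPLAIN plays the same role, and expression indices are created with CREATE INDEX ... ((expression))):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (ts TEXT, user TEXT, ip TEXT, url TEXT)")
conn.execute(
    "INSERT INTO hits VALUES ('2009-05-20 00:01:02', 'user1', '192.168.1.1', '/a/b.html')"
)

query = "SELECT * FROM hits WHERE user = 'user1'"

# Without an index, the per-user query is a full table scan...
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# ...after adding one, it becomes an index search.
conn.execute("CREATE INDEX hits_user_idx ON hits (user)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
```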

cheers
-- vbi

-- 
Even though I use zsh, holding down the TAB key just doesn’t write my
code for me, so I have to use a text editor.
-- Scott James Remnant


