Re: [sqlalchemy] Generic File Loader issue with Booleans

2020-10-28 Thread Mike Bayer
OK please include:

database backend, version

database driver (DBAPI) in use, version

use of special libraries, int() called on 0 or 1 suggests these are not Python 
integers, please specify any numpy or pandas use and read 
https://docs.sqlalchemy.org/en/13/faq/thirdparty.html 

python version

etc



On Wed, Oct 28, 2020, at 10:33 AM, Andrew Martin wrote:
> I will reproduce this when I get done with work this evening and give the 
> specifics.
> 
> On Wednesday, October 28, 2020 at 7:46:26 AM UTC-5, Mike Bayer wrote:
>> 
>> 
>> On Tue, Oct 27, 2020, at 11:56 PM, Andrew Martin wrote:
>>> This is probably a weirdly specific question, but I have a workflow that 
>>> involves loading lots of CSVs into Postgres. Some of them are very large, 
>>> so I want to cut overhead and not use CSV Dictreader. Wanted to use named 
>>> tuples instead.
>>> 
>>> Here's a header and a line from a file:
>>> 
>>> date_dim_id,date_actual,epoch,day_suffix,day_name,day_of_week,day_of_month,day_of_quarter,day_of_year,week_of_month,week_of_year,week_of_year_iso,month_actual,month_name,month_name_abbreviated,quarter_actual,quarter_name,year_actual,first_day_of_week,last_day_of_week,first_day_of_month,last_day_of_month,first_day_of_quarter,last_day_of_quarter,first_day_of_year,last_day_of_year,mm,mmdd,weekend_indr
>>> 
>>> 
>>> Here's a line:
>>> 
>>> 20150101,1/1/15,1420070400,1st,Thursday ,4,1,1,1,1,1,2015-W01-4,1,January  
>>> ,Jan,1,First,2015,12/29/14,1/4/15,1/1/15,1/31/15,1/1/15,3/31/15,1/1/15,12/31/15,12015,1012015,0
>>> 
>>> 
>>> The idea was a generic file loader that takes a filename and a class and 
>>> then loads it. Here's the class:
>>> 
>>> 
>>> class DimDate(Base):
>>> __tablename__ = "dim_date"
>>> 
>>> date_dim_id = Column(Integer, primary_key=True)
>>> date_actual = Column(Date, nullable=False, index=True)
>>> epoch = Column(BigInteger, nullable=False)
>>> day_suffix = Column(String(4), nullable=False)
>>> day_name = Column(String(9), nullable=False)
>>> day_of_week = Column(Integer, nullable=False)
>>> day_of_month = Column(Integer, nullable=False)
>>> day_of_quarter = Column(Integer, nullable=False)
>>> day_of_year = Column(Integer, nullable=False)
>>> week_of_month = Column(Integer, nullable=False)
>>> week_of_year = Column(Integer, nullable=False)
>>> week_of_year_iso = Column(CHAR(10), nullable=False)
>>> month_actual = Column(Integer, nullable=False)
>>> month_name = Column(String(9), nullable=False)
>>> month_name_abbreviated = Column(CHAR(3), nullable=False)
>>> quarter_actual = Column(Integer, nullable=False)
>>> quarter_name = Column(String(9), nullable=False)
>>> year_actual = Column(Integer, nullable=False)
>>> first_day_of_week = Column(Date, nullable=False)
>>> last_day_of_week = Column(Date, nullable=False)
>>> first_day_of_month = Column(Date, nullable=False)
>>> last_day_of_month = Column(Date, nullable=False)
>>> first_day_of_quarter = Column(Date, nullable=False)
>>> last_day_of_quarter = Column(Date, nullable=False)
>>> first_day_of_year = Column(Date, nullable=False)
>>> last_day_of_year = Column(Date, nullable=False)
>>> mm = Column(CHAR(6), nullable=False)
>>> mmdd = Column(CHAR(10), nullable=False)
>>> weekend_indr = Column(Boolean, nullable=False)
>>> 
>>> 
>>> What I ran into is a couple of things. 1. Python True/False throws errors 
>>> when writing to a Boolean. 2. You seem to have to int() a 1 or 0 to avoid a 
>>> dbapi error. 
>>> 
>>> Is this expected?
>> 
>> no, it's not expected, boolean True/False works on all backends without 
>> issue, assuming the target column is using the Boolean datatype at the 
>> SQLAlchemy level as well as that the actual column in the database is of the 
>> appropriate type, for PostgreSQL this would be BOOLEAN.   when you say 
>> "throws errors" please share complete stack traces and complete error 
>> messages, thanks.
>> 
>> can't do much more here without more specifics.  
>> 
>> 
>>> 
>>> 
>>> 
>>> 
>>> And does it have more to do with Postgres than it does SQLA?
>>> 
>>> Here's the code I came up with for generic file loader that actually works. 
>>> I'm sure it's garbage, and I'd like to refactor it, but it works.
>>> 
>>> def load_file(file_name, cls):
>>> """assumes that csv file has headers and that headers match the
>>> names of the columns for a given sqla class. Also assumes that the
>>> sqla class is inheriting from meta.Base"""
>>> inst = inspect(cls)
>>> # attr_names = sorted([c_attr.key for c_attr in 
>>> inst.mapper.column_attrs])
>>> attr_vals = sorted([(c.name, c.type) for c in inst.c])
>>> 
>>> 
>>> s = get_sqla_session()
>>> 
>>> 
>>> with open(file_name, newline="") as infile:
>>> reader = csv.reader(infile)
>>> Data = namedtuple("Data", next(reader))
>>> tuple_fields = Data._fields
>>> for data in map(Data._make, [(c.strip() for c 

Re: [sqlalchemy] Generic File Loader issue with Booleans

2020-10-28 Thread Andrew Martin
I will reproduce this when I get done with work this evening and give the 
specifics.

On Wednesday, October 28, 2020 at 7:46:26 AM UTC-5, Mike Bayer wrote:
>
>
>
> On Tue, Oct 27, 2020, at 11:56 PM, Andrew Martin wrote:
>
> This is probably a weirdly specific question, but I have a workflow that 
> involves loading lots of CSVs into Postgres. Some of them are very large, 
> so I want to cut overhead and not use CSV Dictreader. Wanted to use named 
> tuples instead.
>
> Here's a header and a line from a file:
>
> date_dim_id,date_actual,epoch,day_suffix,day_name,day_of_week,day_of_month
> ,day_of_quarter,day_of_year,week_of_month,week_of_year,week_of_year_iso,
> month_actual,month_name,month_name_abbreviated,quarter_actual,quarter_name
> ,year_actual,first_day_of_week,last_day_of_week,first_day_of_month,
> last_day_of_month,first_day_of_quarter,last_day_of_quarter,
> first_day_of_year,last_day_of_year,mm,mmdd,weekend_indr
>
>
> Here's a line:
>
> 20150101,1/1/15,1420070400,1st,Thursday ,4,1,1,1,1,1,2015-W01-4,1,January 
>  
> ,Jan,1,First,2015,12/29/14,1/4/15,1/1/15,1/31/15,1/1/15,3/31/15,1/1/15,12/31/15,12015,1012015,0
>
>
> The idea was a generic file loader that takes a filename and a class and 
> then loads it. Here's the class:
>
>
> class DimDate(Base):
> __tablename__ = "dim_date"
>
> date_dim_id = Column(Integer, primary_key=True)
> date_actual = Column(Date, nullable=False, index=True)
> epoch = Column(BigInteger, nullable=False)
> day_suffix = Column(String(4), nullable=False)
> day_name = Column(String(9), nullable=False)
> day_of_week = Column(Integer, nullable=False)
> day_of_month = Column(Integer, nullable=False)
> day_of_quarter = Column(Integer, nullable=False)
> day_of_year = Column(Integer, nullable=False)
> week_of_month = Column(Integer, nullable=False)
> week_of_year = Column(Integer, nullable=False)
> week_of_year_iso = Column(CHAR(10), nullable=False)
> month_actual = Column(Integer, nullable=False)
> month_name = Column(String(9), nullable=False)
> month_name_abbreviated = Column(CHAR(3), nullable=False)
> quarter_actual = Column(Integer, nullable=False)
> quarter_name = Column(String(9), nullable=False)
> year_actual = Column(Integer, nullable=False)
> first_day_of_week = Column(Date, nullable=False)
> last_day_of_week = Column(Date, nullable=False)
> first_day_of_month = Column(Date, nullable=False)
> last_day_of_month = Column(Date, nullable=False)
> first_day_of_quarter = Column(Date, nullable=False)
> last_day_of_quarter = Column(Date, nullable=False)
> first_day_of_year = Column(Date, nullable=False)
> last_day_of_year = Column(Date, nullable=False)
> mm = Column(CHAR(6), nullable=False)
> mmdd = Column(CHAR(10), nullable=False)
> weekend_indr = Column(Boolean, nullable=False)
>
>
> What I ran into is a couple of things. 1. Python True/False throws errors 
> when writing to a Boolean. 2. You seem to have to int() a 1 or 0 to avoid a 
> dbapi error. 
>
> Is this expected?
>
>
> no, it's not expected, boolean True/False works on all backends without 
> issue, assuming the target column is using the Boolean datatype at the 
> SQLAlchemy level as well as that the actual column in the database is of 
> the appropriate type, for PostgreSQL this would be BOOLEAN.   when you say 
> "throws errors" please share complete stack traces and complete error 
> messages, thanks.
>
> can't do much more here without more specifics.  
>
>
>
>
>
>
> And does it have more to do with Postgres than it does SQLA?
>
> Here's the code I came up with for generic file loader that actually 
> works. I'm sure it's garbage, and I'd like to refactor it, but it works.
>
> def load_file(file_name, cls):
> """assumes that csv file has headers and that headers match the
> names of the columns for a given sqla class. Also assumes that the
> sqla class is inheriting from meta.Base"""
> inst = inspect(cls)
> # attr_names = sorted([c_attr.key for c_attr in 
> inst.mapper.column_attrs])
> attr_vals = sorted([(c.name, c.type) for c in inst.c])
>
>
> s = get_sqla_session()
>
>
> with open(file_name, newline="") as infile:
> reader = csv.reader(infile)
> Data = namedtuple("Data", next(reader))
> tuple_fields = Data._fields
> for data in map(Data._make, [(c.strip() for c in row) for row in 
> reader]):
> new_obj = cls()
> for attr in attr_vals:
> for field in tuple_fields:
> if attr[0] == field:
> if str(attr[1]) == "BOOLEAN":
> setattr(new_obj, field, int(getattr(data, 
> field)))
> else:
> # print(data, attr[1])
> setattr(new_obj, field, getattr(data, field))
>
>
> new_obj.id = IDService.create_snowflake_id()
>   

Re: [sqlalchemy] Generic File Loader issue with Booleans

2020-10-28 Thread Mike Bayer


On Tue, Oct 27, 2020, at 11:56 PM, Andrew Martin wrote:
> This is probably a weirdly specific question, but I have a workflow that 
> involves loading lots of CSVs into Postgres. Some of them are very large, so 
> I want to cut overhead and not use CSV Dictreader. Wanted to use named tuples 
> instead.
> 
> Here's a header and a line from a file:
> 
> date_dim_id,date_actual,epoch,day_suffix,day_name,day_of_week,day_of_month,day_of_quarter,day_of_year,week_of_month,week_of_year,week_of_year_iso,month_actual,month_name,month_name_abbreviated,quarter_actual,quarter_name,year_actual,first_day_of_week,last_day_of_week,first_day_of_month,last_day_of_month,first_day_of_quarter,last_day_of_quarter,first_day_of_year,last_day_of_year,mm,mmdd,weekend_indr
> 
> 
> Here's a line:
> 
> 20150101,1/1/15,1420070400,1st,Thursday ,4,1,1,1,1,1,2015-W01-4,1,January  
> ,Jan,1,First,2015,12/29/14,1/4/15,1/1/15,1/31/15,1/1/15,3/31/15,1/1/15,12/31/15,12015,1012015,0
> 
> 
> The idea was a generic file loader that takes a filename and a class and then 
> loads it. Here's the class:
> 
> 
> class DimDate(Base):
> __tablename__ = "dim_date"
> 
> date_dim_id = Column(Integer, primary_key=True)
> date_actual = Column(Date, nullable=False, index=True)
> epoch = Column(BigInteger, nullable=False)
> day_suffix = Column(String(4), nullable=False)
> day_name = Column(String(9), nullable=False)
> day_of_week = Column(Integer, nullable=False)
> day_of_month = Column(Integer, nullable=False)
> day_of_quarter = Column(Integer, nullable=False)
> day_of_year = Column(Integer, nullable=False)
> week_of_month = Column(Integer, nullable=False)
> week_of_year = Column(Integer, nullable=False)
> week_of_year_iso = Column(CHAR(10), nullable=False)
> month_actual = Column(Integer, nullable=False)
> month_name = Column(String(9), nullable=False)
> month_name_abbreviated = Column(CHAR(3), nullable=False)
> quarter_actual = Column(Integer, nullable=False)
> quarter_name = Column(String(9), nullable=False)
> year_actual = Column(Integer, nullable=False)
> first_day_of_week = Column(Date, nullable=False)
> last_day_of_week = Column(Date, nullable=False)
> first_day_of_month = Column(Date, nullable=False)
> last_day_of_month = Column(Date, nullable=False)
> first_day_of_quarter = Column(Date, nullable=False)
> last_day_of_quarter = Column(Date, nullable=False)
> first_day_of_year = Column(Date, nullable=False)
> last_day_of_year = Column(Date, nullable=False)
> mm = Column(CHAR(6), nullable=False)
> mmdd = Column(CHAR(10), nullable=False)
> weekend_indr = Column(Boolean, nullable=False)
> 
> 
> What I ran into is a couple of things. 1. Python True/False throws errors 
> when writing to a Boolean. 2. You seem to have to int() a 1 or 0 to avoid a 
> dbapi error. 
> 
> Is this expected?

no, it's not expected, boolean True/False works on all backends without issue, 
assuming the target column is using the Boolean datatype at the SQLAlchemy 
level as well as that the actual column in the database is of the appropriate 
type, for PostgreSQL this would be BOOLEAN.   when you say "throws errors" 
please share complete stack traces and complete error messages, thanks.

can't do much more here without more specifics.  


> 
> 
> 
> 
> And does it have more to do with Postgres than it does SQLA?
> 
> Here's the code I came up with for generic file loader that actually works. 
> I'm sure it's garbage, and I'd like to refactor it, but it works.
> 
> def load_file(file_name, cls):
> """assumes that csv file has headers and that headers match the
> names of the columns for a given sqla class. Also assumes that the
> sqla class is inheriting from meta.Base"""
> inst = inspect(cls)
> # attr_names = sorted([c_attr.key for c_attr in inst.mapper.column_attrs])
> attr_vals = sorted([(c.name, c.type) for c in inst.c])
> 
> 
> s = get_sqla_session()
> 
> 
> with open(file_name, newline="") as infile:
> reader = csv.reader(infile)
> Data = namedtuple("Data", next(reader))
> tuple_fields = Data._fields
> for data in map(Data._make, [(c.strip() for c in row) for row in 
> reader]):
> new_obj = cls()
> for attr in attr_vals:
> for field in tuple_fields:
> if attr[0] == field:
> if str(attr[1]) == "BOOLEAN":
> setattr(new_obj, field, int(getattr(data, field)))
> else:
> # print(data, attr[1])
> setattr(new_obj, field, getattr(data, field))
> 
> 
> new_obj.id = IDService.create_snowflake_id()
> new_obj.is_deleted = 0
> 
> 
> s.add(new_obj)
> s.commit()
> 
> 
> This isn't really a big deal, and maybe not even appropriate, but I had an 
> absolute shit 12 

[sqlalchemy] Generic File Loader issue with Booleans

2020-10-27 Thread Andrew Martin
This is probably a weirdly specific question, but I have a workflow that 
involves loading lots of CSVs into Postgres. Some of them are very large, 
so I want to cut overhead and not use CSV Dictreader. Wanted to use named 
tuples instead.

Here's a header and a line from a file:

date_dim_id,date_actual,epoch,day_suffix,day_name,day_of_week,day_of_month,
day_of_quarter,day_of_year,week_of_month,week_of_year,week_of_year_iso,
month_actual,month_name,month_name_abbreviated,quarter_actual,quarter_name,
year_actual,first_day_of_week,last_day_of_week,first_day_of_month,
last_day_of_month,first_day_of_quarter,last_day_of_quarter,first_day_of_year
,last_day_of_year,mm,mmdd,weekend_indr


Here's a line:

20150101,1/1/15,1420070400,1st,Thursday ,4,1,1,1,1,1,2015-W01-4,1,January 
 
,Jan,1,First,2015,12/29/14,1/4/15,1/1/15,1/31/15,1/1/15,3/31/15,1/1/15,12/31/15,12015,1012015,0


The idea was a generic file loader that takes a filename and a class and 
then loads it. Here's the class:


class DimDate(Base):
__tablename__ = "dim_date"

date_dim_id = Column(Integer, primary_key=True)
date_actual = Column(Date, nullable=False, index=True)
epoch = Column(BigInteger, nullable=False)
day_suffix = Column(String(4), nullable=False)
day_name = Column(String(9), nullable=False)
day_of_week = Column(Integer, nullable=False)
day_of_month = Column(Integer, nullable=False)
day_of_quarter = Column(Integer, nullable=False)
day_of_year = Column(Integer, nullable=False)
week_of_month = Column(Integer, nullable=False)
week_of_year = Column(Integer, nullable=False)
week_of_year_iso = Column(CHAR(10), nullable=False)
month_actual = Column(Integer, nullable=False)
month_name = Column(String(9), nullable=False)
month_name_abbreviated = Column(CHAR(3), nullable=False)
quarter_actual = Column(Integer, nullable=False)
quarter_name = Column(String(9), nullable=False)
year_actual = Column(Integer, nullable=False)
first_day_of_week = Column(Date, nullable=False)
last_day_of_week = Column(Date, nullable=False)
first_day_of_month = Column(Date, nullable=False)
last_day_of_month = Column(Date, nullable=False)
first_day_of_quarter = Column(Date, nullable=False)
last_day_of_quarter = Column(Date, nullable=False)
first_day_of_year = Column(Date, nullable=False)
last_day_of_year = Column(Date, nullable=False)
mm = Column(CHAR(6), nullable=False)
mmdd = Column(CHAR(10), nullable=False)
weekend_indr = Column(Boolean, nullable=False)


What I ran into is a couple of things. 1. Python True/False throws errors 
when writing to a Boolean. 2. You seem to have to int() a 1 or 0 to avoid a 
dbapi error. 

Is this expected? And does it have more to do with Postgres than it does 
SQLA?

Here's the code I came up with for generic file loader that actually works. 
I'm sure it's garbage, and I'd like to refactor it, but it works.

def load_file(file_name, cls):
"""assumes that csv file has headers and that headers match the
names of the columns for a given sqla class. Also assumes that the
sqla class is inheriting from meta.Base"""
inst = inspect(cls)
# attr_names = sorted([c_attr.key for c_attr in 
inst.mapper.column_attrs])
attr_vals = sorted([(c.name, c.type) for c in inst.c])


s = get_sqla_session()


with open(file_name, newline="") as infile:
reader = csv.reader(infile)
Data = namedtuple("Data", next(reader))
tuple_fields = Data._fields
for data in map(Data._make, [(c.strip() for c in row) for row in 
reader]):
new_obj = cls()
for attr in attr_vals:
for field in tuple_fields:
if attr[0] == field:
if str(attr[1]) == "BOOLEAN":
setattr(new_obj, field, int(getattr(data, field
)))
else:
# print(data, attr[1])
setattr(new_obj, field, getattr(data, field))


new_obj.id = IDService.create_snowflake_id()
new_obj.is_deleted = 0


s.add(new_obj)
s.commit()


This isn't really a big deal, and maybe not even appropriate, but I had an 
absolute shit 12 hours trying to figure out why SQLA booleans are such a 
hassle.

If anyone has thoughts, I'm happy to hear them.

Thank you!

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit