Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

2019-03-31 Thread Ian Wagner
Thanks. I’ll pursue those avenues.

Cheers,
Ian

On Mon, Apr 1, 2019 at 11:30, Mike Bayer wrote:

> On Sun, Mar 31, 2019 at 10:12 PM Ian Wagner 
> wrote:
> >
> >
> > My suggestion would be a pymssql dialect-level patch to send bytestrings
> for String columns, and of course continue passing str/unicode for Unicode
> columns. I'm on the mailing list looking for help with why my solution
> doesn't work as intended with enums (see my GitHub repo). For the moment,
> I've actually just implemented this column myself as shown, and replaced
> all String columns with it (rather than "replacing" String using colspecs),
> and banned all devs on our project from using the regular String column
> type until it's fixed.
>
> Please just use a TypeDecorator for all your String columns where this
> issue is apparent:
>
> class UTF8String(TypeDecorator):
>     impl = String
>
>     def process_bind_param(self, value, dialect):
>         if value is not None:
>             value = value.encode(dialect.encoding)
>         return value
>
> next, the issue should be reported to pymssql, where they should
> likely include options to modify this behavior:
>
> https://github.com/pymssql/pymssql
>
> next, I need this to be confirmed as an issue for pyodbc.  As my
> understanding is that Microsoft is funding pyodbc's development, I'd
> like to see what their approach to this issue is.
>
> *if* it is truly the canonical solution that applications that code
> against these DBAPIs *must* send byte arrays in order to avoid
> crushing performance costs, the drivers need to make that clear.
> then we can look into introducing either flags or permanent behavior
> such that we encode all Python unicode objects for a *non unicode*
> String datatype under the MSSQL dialects.
>
>
>
> >
> >> Also note pymssql is not well maintained right now due to lack of
> >> funding
> >
> >
> > Noted. We'll look into switching drivers yet again, but the official
> driver lacked several features the last time we evaluated it (I think
> stored proc output parameters were not supported, and it would require
> quite a few syntax changes in areas where we have to write raw SQL).
> >
> > Cheers,
> > Ian
> >
> > On Friday, March 29, 2019 at 11:10:21 PM UTC+9, Mike Bayer wrote:
> >>
> >> Also note pymssql is not well maintained right now due to lack of
> >> funding, so please confirm you can reproduce your performance concerns
> >> using PyODBC with Microsoft's ODBC drivers.  That should be considered
> >> the canonically supported driver right now; it works very well on all
> >> platforms now.
> >>
> >> On Fri, Mar 29, 2019 at 10:07 AM Mike Bayer 
> wrote:
> >> >
> >> > OK so I saw that the "N" prefix is not generated with your test case
> >> > either, so I re-read your email.  Can you clarify what you mean by
> >> > "always encoded as NVARCHAR"?   are you referring to the simple fact
> >> > that a Python string object is passed to the driver, and that the
> >> > driver is behind the scenes applying the "N" in any case or is
> >> > otherwise binding it as unicode in such a way that performance is
> >> > impacted?   SQLAlchemy for many years passed bytestrings to drivers
> >> > like pyodbc because they would simply crash if you passed them a
> >> > unicode object, but once they supported it, SQLAlchemy was eager to
> >> > get out of the business of doing this encoding.  In 1.3 we've just
> >> > deprecated all the flags that allow it to do this
> >> > (convert_unicode=True).  Using that flag would be your quickest way
> >> > to get it back for now, but we'd have to develop an all-new behavior
> >> > for 1.4 if we are to start encoding these binds again; however, the
> >> > current behavior has been this way for many years and this is the
> >> > first time it's being reported in this way.  I would want to look into
> >> > driver configuration options for this as well.
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Mar 29, 2019 at 9:56 AM Mike Bayer 
> wrote:
> >> > >
> >> > > On Fri, Mar 29, 2019 at 6:20 AM Ian Wagner 
> wrote:
> >> > > >
> >> > > > Hello all,
> >> > > >
> >> > > > I'm trying to get to the bottom of an issue in which Python 3
> (unicode by definition) strings are always encoded as NVARCHAR for at least
> two backends (pymssql and pyodbc). Using bytestrings as comparison arguments
> (for example Table.column == value.encode('utf-8')) sends a regular string
> literal as expected, but regular strings are encoded as NVARCHAR literals.
> >> > > >
> >> > > > This behavior is fairly logical at the underlying driver (pymssql
> or pyodbc) level, which is why I'm posting here. I believe the use of a
> String column (as opposed to a Unicode column) type should not pass an
> NVARCHAR literal. Doing so has disastrous performance implications, as SQL
> Server ends up casting the whole column up. This will wreak havoc when
> regularly dealing with large-ish tables (1.7 million rows or so in our
> case).
> >> > > >
> >> > > > I have previously posted with a LOT more details on 

Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

2019-03-31 Thread Mike Bayer
On Sun, Mar 31, 2019 at 10:12 PM Ian Wagner  wrote:
>
>
> My suggestion would be a pymssql dialect-level patch to send bytestrings for 
> String columns, and of course continue passing str/unicode for Unicode 
> columns. I'm on the mailing list looking for help with why my solution 
> doesn't work as intended with enums (see my GitHub repo). For the moment, 
> I've actually just implemented this column myself as shown, and replaced all 
> String columns with it (rather than "replacing" String using colspecs), and 
> banned all devs on our project from using the regular String column type 
> until it's fixed.

Please just use a TypeDecorator for all your String columns where this
issue is apparent:

class UTF8String(TypeDecorator):
    impl = String

    def process_bind_param(self, value, dialect):
        if value is not None:
            value = value.encode(dialect.encoding)
        return value
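
For illustration, a minimal usage sketch of the decorator above, assuming
SQLAlchemy 1.3 and a hypothetical Customer model (the class is repeated so
the snippet stands alone; dialect.encoding defaults to "utf-8"):

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.types import TypeDecorator

Base = declarative_base()

class UTF8String(TypeDecorator):
    # encodes bound str values to bytes before they reach the driver
    impl = String

    def process_bind_param(self, value, dialect):
        if value is not None:
            value = value.encode(dialect.encoding)
        return value

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    # comparisons and inserts against this column now bind bytes, so the
    # driver emits a plain VARCHAR literal instead of an N'...' literal
    code = Column(UTF8String(20))

# e.g. session.query(Customer).filter(Customer.code == "ABC123")
# binds b"ABC123" rather than "ABC123"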

next, the issue should be reported to pymssql, where they should
likely include options to modify this behavior:

https://github.com/pymssql/pymssql

next, I need this to be confirmed as an issue for pyodbc.  As my
understanding is that Microsoft is funding pyodbc's development, I'd
like to see what their approach to this issue is.

*if* it is truly the canonical solution that applications that code
against these DBAPIs *must* send byte arrays in order to avoid
crushing performance costs, the drivers need to make that clear.
then we can look into introducing either flags or permanent behavior
such that we encode all Python unicode objects for a *non unicode*
String datatype under the MSSQL dialects.



>
>> Also note pymssql is not well maintained right now due to lack of
>> funding
>
>
> Noted. We'll look into switching drivers yet again, but the official driver 
> lacked several features the last time we evaluated it (I think stored proc 
> output parameters were not supported, and it would require quite a few syntax 
> changes in areas where we have to write raw SQL).
>
> Cheers,
> Ian
>
> On Friday, March 29, 2019 at 11:10:21 PM UTC+9, Mike Bayer wrote:
>>
>> Also note pymssql is not well maintained right now due to lack of
>> funding, so please confirm you can reproduce your performance concerns
>> using PyODBC with Microsoft's ODBC drivers.  That should be considered
>> the canonically supported driver right now; it works very well on all
>> platforms now.
>>
>> On Fri, Mar 29, 2019 at 10:07 AM Mike Bayer  wrote:
>> >
>> > OK so I saw that the "N" prefix is not generated with your test case
>> > either, so I re-read your email.  Can you clarify what you mean by
>> > "always encoded as NVARCHAR"?   are you referring to the simple fact
>> > that a Python string object is passed to the driver, and that the
>> > driver is behind the scenes applying the "N" in any case or is
>> > otherwise binding it as unicode in such a way that performance is
>> > impacted?   SQLAlchemy for many years passed bytestrings to drivers
>> > like pyodbc because they would simply crash if you passed them a
>> > unicode object, but once they supported it, SQLAlchemy was eager to
>> > get out of the business of doing this encoding.  In 1.3 we've just
>> > deprecated all the flags that allow it to do this
>> > (convert_unicode=True).  Using that flag would be your quickest way
>> > to get it back for now, but we'd have to develop an all-new behavior
>> > for 1.4 if we are to start encoding these binds again; however, the
>> > current behavior has been this way for many years and this is the
>> > first time it's being reported in this way.  I would want to look into
>> > driver configuration options for this as well.
>> >
>> >
>> >
>> >
>> > On Fri, Mar 29, 2019 at 9:56 AM Mike Bayer  wrote:
>> > >
>> > > On Fri, Mar 29, 2019 at 6:20 AM Ian Wagner  wrote:
>> > > >
>> > > > Hello all,
>> > > >
>> > > > I'm trying to get to the bottom of an issue in which Python 3 (unicode 
>> > > > by definition) strings are always encoded as NVARCHAR for at least two 
>> > > > backends (pymssql and pyodbc). Using bytestrings as comparison 
>> > > > arguments (for example Table.column == value.encode('utf-8')) sends a 
>> > > > regular string literal as expected, but regular strings are encoded as 
>> > > > NVARCHAR literals.
>> > > >
>> > > > This behavior is fairly logical at the underlying driver (pymssql or 
>> > > > pyodbc) level, which is why I'm posting here. I believe the use of 
>> > > > a String column (as opposed to a Unicode column) type should not pass 
>> > > > an NVARCHAR literal. Doing so has disastrous performance implications, 
>> > > > as SQL Server ends up casting the whole column up. This will wreak 
>> > > > havoc when regularly dealing with large-ish tables (1.7 million rows 
>> > > > or so in our case).
>> > > >
>> > > > I have previously posted with a LOT more details on StackOverflow 
>> > > > (https://stackoverflow.com/questions/55098426/strings-used-in-query-always-sent-with-nvarchar-syntax-even-if-the-underlying-c).
>> > > >  I 

Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

2019-03-31 Thread Ian Wagner
Thanks for the reply, Mike!

the NVARCHAR thing should not happen if you are comparing to a 
> non-NVARCHAR column.   it only occurs when there is no other context 
> from which SQLAlchemy can determine the correct datatype for the Unicode 
> object being passed.


This was my impression as well. I am, as you can see in my MCVE, using the 
ORM, not Core. I am using a String column, which is supposed to map to a 
VARCHAR, whereas Unicode would map to NVARCHAR.
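
For reference, a quick sketch of that mapping against the MSSQL dialect
(table and column names are made up):

from sqlalchemy import Column, Integer, MetaData, String, Table, Unicode
from sqlalchemy.dialects import mssql
from sqlalchemy.schema import CreateTable

metadata = MetaData()
demo = Table(
    "demo", metadata,
    Column("id", Integer, primary_key=True),
    Column("code", String(20)),    # String  -> VARCHAR(20) on MSSQL
    Column("label", Unicode(20)),  # Unicode -> NVARCHAR(20) on MSSQL
)

# prints the CREATE TABLE statement with VARCHAR for "code" and
# NVARCHAR for "label"
print(CreateTable(demo).compile(dialect=mssql.dialect()))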

OK so I saw that the "N" prefix is not generated with your test case 
> either, so I re-read your email.  Can you clarify what you mean by 
> "always encoded as NVARCHAR"?   are you referring to the simple fact 
> that a Python string object is passed to the driver, and that the 
> driver is behind the scenes applying the "N" in any case or is 
> otherwise binding it as unicode in such a way that performance is 
> impacted?


That's correct; I made it very clear in my SO post that the underlying 
issue is passing a Python 3 str to the driver, and that under the hood the 
driver is prefixing it. The SQL you see generated by SQLAlchemy is NOT the 
same SQL that actually gets sent to SQL Server (this is not the fault of 
SQLAlchemy, and is beyond the scope of this discussion, but in case you're 
trying to debug it this way, you won't have much luck; you need to check 
the SQL Server profiler or recent expensive queries list or some other 
method). Here's a direct link to the offending line in 
pymssql: 
https://github.com/pymssql/pymssql/blob/891b20e29e4e247c17b202e8e34e5c739b6090ef/src/_mssql.pyx#L1779.
 
pymssql isn't necessarily wrong in its behavior; it's a low-level driver 
and doesn't have the same kind of context SQLAlchemy does. This will *very* 
negatively impact performance on a table with a lot of rows (in our case, 
it was a very simple index lookup of a ~1.7 million row table).
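
To see the driver-level difference in isolation, here is a hedged sketch
against pymssql directly (connection details and the table are placeholders;
the comments restate the behavior described above rather than anything
verified here):

import pymssql  # assumes a reachable SQL Server and valid credentials

conn = pymssql.connect(server="host", user="user",
                       password="password", database="dbname")
cur = conn.cursor()

# str parameter: pymssql interpolates it as a unicode (N'...') literal,
# which triggers the implicit NVARCHAR conversion on a VARCHAR column
cur.execute("SELECT * FROM some_table WHERE code = %s", ("ABC123",))

# bytes parameter: interpolated as a plain '...' literal, so the
# VARCHAR column and its index are used as-is
cur.execute("SELECT * FROM some_table WHERE code = %s",
            ("ABC123".encode("utf-8"),))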

I noticed the unicode options, and also their deprecation status; hence my 
hesitation in using those. I also completely understand not wanting to pass 
bytestrings in general. However, this is a clear limitation in pymssql and, 
by some reports, pyodbc as well. I don't know which underlying ODBC driver 
the other poster on SO was using beneath the Python layer, though.

but we'd have to develop an all-new behavior 
> for 1.4 if we are to start encoding these binds again; however, the current 
> behavior has been this way for many years and this is the first time it's 
> being reported in this way.  I would want to look into driver 
> configuration options for this as well. 


If you check git blame for the pymssql line in question, it's been this way 
for six years; it's a very old "problem." I would suggest that nobody has 
noticed it because they haven't hit it at scale against a badly designed 
database schema. We noticed because we're working with a 20-year-old schema 
that evolved over the years without any oversight and has a mess of things 
like string primary keys that we can't easily change, and we only ran into 
this while transitioning the main application logic from classic ASP to 
Python on top of those rather poorly structured core tables.

My suggestion would be a pymssql dialect-level patch to send bytestrings 
for String columns, and of course continue passing str/unicode for Unicode 
columns. I'm on the mailing list looking for help with why my solution 
doesn't work as intended with enums (see my GitHub repo). For the moment, 
I've actually just implemented this column myself as shown, and replaced 
all String columns with it (rather than "replacing" String using colspecs), 
and banned all devs on our project from using the regular String column 
type until it's fixed.
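
For reference, a rough, untested sketch of what the colspecs-based
replacement alluded to above might look like (assumes SQLAlchemy 1.3; note
that Enum subclasses String, so it would be swept up by this mapping too,
which is likely related to the enum breakage mentioned):

from sqlalchemy import String, util
from sqlalchemy.dialects.mssql.pymssql import MSDialect_pymssql

class _EncodedString(String):
    # String variant whose bind processor encodes str values to bytes
    def bind_processor(self, dialect):
        def process(value):
            if isinstance(value, str):
                value = value.encode(dialect.encoding)
            return value
        return process

# adapt every generic String to _EncodedString on the pymssql dialect;
# caveat: String subclasses such as Enum are adapted as well unless they
# get their own entries here
MSDialect_pymssql.colspecs = util.update_copy(
    MSDialect_pymssql.colspecs, {String: _EncodedString}
)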

Also note pymssql is not well maintained right now due to lack of 
> funding


Noted. We'll look into switching drivers yet again, but the official driver 
lacked several features the last time we evaluated it (I think stored proc 
output parameters were not supported, and it would require quite a few 
syntax changes in areas where we have to write raw SQL).

Cheers,
Ian

On Friday, March 29, 2019 at 11:10:21 PM UTC+9, Mike Bayer wrote:
>
> Also note pymssql is not well maintained right now due to lack of 
> funding, so please confirm you can reproduce your performance concerns 
> using PyODBC with Microsoft's ODBC drivers.  That should be considered 
> the canonically supported driver right now; it works very well on all 
> platforms now. 
>
> On Fri, Mar 29, 2019 at 10:07 AM Mike Bayer wrote: 
> > 
> > OK so I saw that the "N" prefix is not generated with your test case 
> > either, so I re-read your email.  Can you clarify what you mean by 
> > "always encoded as NVARCHAR"?   are you referring to the simple fact 
> > that a Python string object is passed to the driver, and that the 
> > driver is behind the scenes applying the "N" in any case or is 
> > otherwise binding it as unicode in such a way that performance is 
> > impacted?   SQLAlchemy for many years passed bytestrings to drivers 
> > like pyodbc 

Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

2019-03-29 Thread Mike Bayer
Also note pymssql is not well maintained right now due to lack of
funding, so please confirm you can reproduce your performance concerns
using PyODBC with Microsoft's ODBC drivers.  That should be considered
the canonically supported driver right now; it works very well on all
platforms now.

On Fri, Mar 29, 2019 at 10:07 AM Mike Bayer  wrote:
>
> OK so I saw that the "N" prefix is not generated with your test case
> either, so I re-read your email.  Can you clarify what you mean by
> "always encoded as NVARCHAR"?   are you referring to the simple fact
> that a Python string object is passed to the driver, and that the
> driver is behind the scenes applying the "N" in any case or is
> otherwise binding it as unicode in such a way that performance is
> impacted?   SQLAlchemy for many years passed bytestrings to drivers
> like pyodbc because they would simply crash if you passed them a
> unicode object, but once they supported it, SQLAlchemy was eager to
> get out of the business of doing this encoding.  In 1.3 we've just
> deprecated all the flags that allow it to do this
> (convert_unicode=True).  Using that flag would be your quickest way
> to get it back for now, but we'd have to develop an all-new behavior
> for 1.4 if we are to start encoding these binds again; however, the
> current behavior has been this way for many years and this is the
> first time it's being reported in this way.  I would want to look into
> driver configuration options for this as well.
>
>
>
>
> On Fri, Mar 29, 2019 at 9:56 AM Mike Bayer  wrote:
> >
> > On Fri, Mar 29, 2019 at 6:20 AM Ian Wagner  wrote:
> > >
> > > Hello all,
> > >
> > > I'm trying to get to the bottom of an issue in which Python 3 (unicode by 
> > > definition) strings are always encoded as NVARCHAR for at least two 
> > > backends (pymssql and pyodbc). Using bytestrings as comparison arguments 
> > > (for example Table.column == value.encode('utf-8')) sends a regular 
> > > string literal as expected, but regular strings are encoded as NVARCHAR 
> > > literals.
> > >
> > > This behavior is fairly logical at the underlying driver (pymssql or 
> > > pyodbc) level, which is why I'm posting here. I believe the use of a 
> > > String column (as opposed to a Unicode column) type should not pass an 
> > > NVARCHAR literal. Doing so has disastrous performance implications, as 
> > > SQL Server ends up casting the whole column up. This will wreak havoc 
> > > when regularly dealing with large-ish tables (1.7 million rows or so in 
> > > our case).
> > >
> > > I have previously posted with a LOT more details on StackOverflow 
> > > (https://stackoverflow.com/questions/55098426/strings-used-in-query-always-sent-with-nvarchar-syntax-even-if-the-underlying-c).
> > >  I also have an MCVE over on GitHub 
> > > (https://github.com/ianthetechie/pymssql_sqlalchemy_55098426).
> > >
> > > In my MCVE, I outline a possible approach for fixing this, but it appears 
> > > to have some problems. I'm posting here asking for feedback on what's 
> > > wrong with my approach, and what would be the best way to go about 
> > > getting this fixed.
> >
> > seems like we will need some documentation for this as it is confusing
> > a lot of people.   The issue that introduced this behavior is
> > https://github.com/sqlalchemy/sqlalchemy/issues/4442 and then that
> > same user felt it was happening too often in
> > https://github.com/sqlalchemy/sqlalchemy/issues/4561, however I
> > clarified that the N prefix only generates if there is no other
> > context to determine that this is a non-unicode context.
> >
> > the NVARCHAR thing should not happen if you are comparing to a
> > non-NVARCHAR column.   it only occurs when there is no other context
> > from which SQLAlchemy can determine the correct datatype for the Unicode
> > object being passed.
> >
> > However, the example case you have on github there seems to be using a
> > unicode in a VARCHAR comparison so should not see the N.  If it does,
> > it's a bug.  I will try your test case now.
> >
> > In the future, please go straight to the SQLAlchemy github issue
> > tracker with a succinct test case, as this N thing is obviously still
> > ongoing.
> >
> >
> >
> > >
> > > Thanks!
> > > Ian
> > >


Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

2019-03-29 Thread Mike Bayer
OK so I saw that the "N" prefix is not generated with your test case
either, so I re-read your email.  Can you clarify what you mean by
"always encoded as NVARCHAR"?   are you referring to the simple fact
that a Python string object is passed to the driver, and that the
driver is behind the scenes applying the "N" in any case or is
otherwise binding it as unicode in such a way that performance is
impacted?   SQLAlchemy for many years passed bytestrings to drivers
like pyodbc because they would simply crash if you passed them a
unicode object, but once they supported it, SQLAlchemy was eager to
get out of the business of doing this encoding.  In 1.3 we've just
deprecated all the flags that allow it to do this
(convert_unicode=True).  Using that flag would be your quickest way
to get it back for now, but we'd have to develop an all-new behavior
for 1.4 if we are to start encoding these binds again; however, the
current behavior has been this way for many years and this is the
first time it's being reported in this way.  I would want to look into
driver configuration options for this as well.
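
For completeness, the deprecated flag referenced above would sit either on
the type or on the engine (a sketch only; it warns in 1.3, and whether it
actually restores bind-time encoding depends on the dialect's unicode
settings, so verify against your driver):

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class LegacyItem(Base):  # hypothetical model
    __tablename__ = "legacy_item"
    id = Column(Integer, primary_key=True)
    code = Column(String(20, convert_unicode=True))  # per-type flag

# the flag also exists engine-wide, likewise deprecated in 1.3
engine = create_engine("mssql+pymssql://user:password@host/dbname",
                       convert_unicode=True)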




On Fri, Mar 29, 2019 at 9:56 AM Mike Bayer  wrote:
>
> On Fri, Mar 29, 2019 at 6:20 AM Ian Wagner  wrote:
> >
> > Hello all,
> >
> > I'm trying to get to the bottom of an issue in which Python 3 (unicode by 
> > definition) strings are always encoded as NVARCHAR for at least two 
> > backends (pymssql and pyodbc). Using bytestrings as comparison arguments 
> > (for example Table.column == value.encode('utf-8')) sends a regular string 
> > literal as expected, but regular strings are encoded as NVARCHAR literals.
> >
> > This behavior is fairly logical at the underlying driver (pymssql or 
> > pyodbc) level, which is why I'm posting here. I believe the use of a 
> > String column (as opposed to a Unicode column) type should not pass an 
> > NVARCHAR literal. Doing so has disastrous performance implications, as SQL 
> > Server ends up casting the whole column up. This will wreak havoc when 
> > regularly dealing with large-ish tables (1.7 million rows or so in our 
> > case).
> >
> > I have previously posted with a LOT more details on StackOverflow 
> > (https://stackoverflow.com/questions/55098426/strings-used-in-query-always-sent-with-nvarchar-syntax-even-if-the-underlying-c).
> >  I also have an MCVE over on GitHub 
> > (https://github.com/ianthetechie/pymssql_sqlalchemy_55098426).
> >
> > In my MCVE, I outline a possible approach for fixing this, but it appears 
> > to have some problems. I'm posting here asking for feedback on what's wrong 
> > with my approach, and what would be the best way to go about getting this 
> > fixed.
>
> seems like we will need some documentation for this as it is confusing
> a lot of people.   The issue that introduced this behavior is
> https://github.com/sqlalchemy/sqlalchemy/issues/4442 and then that
> same user felt it was happening too often in
> https://github.com/sqlalchemy/sqlalchemy/issues/4561, however I
> clarified that the N prefix only generates if there is no other
> context to determine that this is a non-unicode context.
>
> the NVARCHAR thing should not happen if you are comparing to a
> non-NVARCHAR column.   it only occurs when there is no other context
> from which SQLAlchemy can determine the correct datatype for the Unicode
> object being passed.
>
> However, the example case you have on github there seems to be using a
> unicode in a VARCHAR comparison so should not see the N.  If it does,
> it's a bug.  I will try your test case now.
>
> In the future, please go straight to the SQLAlchemy github issue
> tracker with a succinct test case, as this N thing is obviously still
> ongoing.
>
>
>
> >
> > Thanks!
> > Ian
> >


Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

2019-03-29 Thread Mike Bayer
On Fri, Mar 29, 2019 at 6:20 AM Ian Wagner  wrote:
>
> Hello all,
>
> I'm trying to get to the bottom of an issue in which Python 3 (unicode by 
> definition) strings are always encoded as NVARCHAR for at least two backends 
> (pymssql and pyodbc). Using bytestrings as comparison arguments (for example 
> Table.column == value.encode('utf-8')) sends a regular string literal as 
> expected, but regular strings are encoded as NVARCHAR literals.
>
> This behavior is fairly logical at the underlying driver (pymssql or pyodbc) 
> level, which is why I'm posting here. I believe the use of a String 
> column (as opposed to a Unicode column) type should not pass an NVARCHAR 
> literal. Doing so has disastrous performance implications, as SQL Server ends 
> up casting the whole column up. This will wreak havoc when regularly dealing 
> with large-ish tables (1.7 million rows or so in our case).
>
> I have previously posted with a LOT more details on StackOverflow 
> (https://stackoverflow.com/questions/55098426/strings-used-in-query-always-sent-with-nvarchar-syntax-even-if-the-underlying-c).
>  I also have an MCVE over on GitHub 
> (https://github.com/ianthetechie/pymssql_sqlalchemy_55098426).
>
> In my MCVE, I outline a possible approach for fixing this, but it appears to 
> have some problems. I'm posting here asking for feedback on what's wrong with 
> my approach, and what would be the best way to go about getting this fixed.

seems like we will need some documentation for this as it is confusing
a lot of people.   The issue that introduced this behavior is
https://github.com/sqlalchemy/sqlalchemy/issues/4442 and then that
same user felt it was happening too often in
https://github.com/sqlalchemy/sqlalchemy/issues/4561, however I
clarified that the N prefix only generates if there is no other
context to determine that this is a non-unicode context.

the NVARCHAR thing should not happen if you are comparing to a
non-NVARCHAR column.   it only occurs when there is no other context
from which SQLAlchemy can determine the correct datatype for the Unicode
object being passed.

However, the example case you have on github there seems to be using a
unicode in a VARCHAR comparison so should not see the N.  If it does,
it's a bug.  I will try your test case now.

In the future, please go straight to the SQLAlchemy github issue
tracker with a succinct test case, as this N thing is obviously still
ongoing.
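
To make the "context" point concrete, a sketch with made-up names (the
comments describe the intended typing behavior, not verified output):

from sqlalchemy import Column, Integer, String, literal
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))  # plain VARCHAR on MSSQL

# typed context: comparing against a String column lets the bound
# parameter take that column's non-unicode type, so no N'' prefix
typed = Item.name == "widget"

# no context: a bare literal has no column to borrow a type from, so the
# type is inferred from the Python str and may render with the N'' prefix
untyped = literal("widget")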



>
> Thanks!
> Ian
>



[sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

2019-03-29 Thread Ian Wagner
Hello all,

I'm trying to get to the bottom of an issue in which Python 3 (unicode by 
definition) strings are always encoded as NVARCHAR for at least two 
backends (pymssql and pyodbc). Using bytestrings as comparison arguments 
(for example Table.column == value.encode('utf-8')) sends a regular string 
literal as expected, but regular strings are encoded as NVARCHAR literals.
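
A minimal sketch of that comparison (hypothetical model, engine URL, and
data; only the shape of the query matters here):

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))  # VARCHAR(50) on SQL Server

engine = create_engine("mssql+pymssql://user:password@host/dbname")
session = sessionmaker(bind=engine)()
value = "widget"

# Python str: the driver binds it as unicode, so SQL Server sees an
# NVARCHAR value and implicitly converts the VARCHAR column
session.query(Item).filter(Item.name == value).all()

# bytestring: sent as a plain string literal, so the VARCHAR column
# (and its index) is used directly
session.query(Item).filter(Item.name == value.encode("utf-8")).all()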

This behavior is fairly logical at the underlying driver (pymssql or 
pyodbc) level, which is why I'm posting here. I believe the use of a 
String column (as opposed to a Unicode column) type should not pass an 
NVARCHAR literal. Doing so has disastrous performance implications, as SQL 
Server ends up casting the whole column up. This will wreak havoc when 
regularly dealing with large-ish tables (1.7 million rows or so in our 
case).

I have previously posted with a LOT more details on StackOverflow 
(https://stackoverflow.com/questions/55098426/strings-used-in-query-always-sent-with-nvarchar-syntax-even-if-the-underlying-c).
 
I also have an MCVE over on GitHub 
(https://github.com/ianthetechie/pymssql_sqlalchemy_55098426).

In my MCVE, I outline a possible approach for fixing this, but it appears 
to have some problems. I'm posting here asking for feedback on what's wrong 
with my approach, and what would be the best way to go about getting this 
fixed.

Thanks!
Ian

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.