Re: [sqlalchemy] simple query takes to long

2022-06-09 Thread Simon King
How many rows are you fetching, and how many columns in each row?

On Thu, Jun 9, 2022 at 8:37 AM Trainer Go  wrote:

> Hello Jonathan,
>
> i already executed the query without using pandas in my programm
>
> query = "SELECT"
> for row in conn.execute(query).fetchall():
> pass
>
> the result was the same runtime with pandas.
>
> So this cant be the problem. I think so.
>
> Greetings Manuel
>
> Jonathan Vanasco schrieb am Mittwoch, 8. Juni 2022 um 17:28:01 UTC+2:
>
>> When you select in the database ui tool, you are just displaying raw data.
>>
>> When you select within your code snippets above, Python is creating
>> pandas' DataFrame objects for the results.
>>
>> These two concepts are not comparable at all.  Converting the SQL data to
>> Python data structures in Pandas (and SQLAlchemy's ORM) is a lot of
>> overhead - and that grows with the result size.
>>
>> You can use memory and code profiling tools to explore this and see where
>> the issues are. The best approach is what Philip suggested above though,
>> and not use pandas, so you can see how Python/SqlAlchemy handles the raw
>> data.
>>
>>
>>
>>
>>
>> On Wednesday, June 8, 2022 at 9:28:38 AM UTC-4 Trainer Go wrote:
>>
>>> Hello Phil,
>>>
>>> i tested both and without printing the result.
>>>
>>> table_df = pd.read_sql_query(''SELECT, engine)
>>> #print(table_df)
>>> #query = "SELECT"
>>> #for row in conn.execute(query).fetchall():
>>> #pass
>>>
>>>
>>> both have nearly the same runtime. So this is not my problem. And yes,
>>> they are the same queries cause i copy pasted the select from my DBUI where
>>> is tested first the results and the runtime and i expected the same runtime
>>> in my program but no ;)
>>>
>>> Greeting Manuel
>>>
>>> Philip Semanchuk schrieb am Mittwoch, 8. Juni 2022 um 15:04:08 UTC+2:
>>>


 > On Jun 8, 2022, at 8:29 AM, Trainer Go  wrote:
 >
 > When im using pandas with pd.read_sql_query()
 > with chunksize to minimiza the memory usage there is no difference
 between both runtimes..

 Do you know that, or is that speculation?

 >
 > table_df = pd.read_sql_query('''select , engine, chunksize = 3)
 >
 > for df in table_df:
 > print(df)
 >
 > the runtime is nearly the same like 5 minutes

 Printing to the screen also takes time, and your terminal probably
 buffers the results, which requires memory allocation. I’m not saying this
 is your problem (it probably isn’t), but your test still involves pandas
 and your terminal, both of which cloud the issue. You would benefit from
 simplifying your tests.

 Did you try this suggestion from my previous email?


 > for row in conn.execute(my_query).fetchall():
 > pass

 Also, are you 100% sure you’re executing the same query from SQLAlchemy
 that you’re pasting into your DB UI?

 Cheers
 Philip



 >
 >
 >
 > #print(table_df) result: #generator object
 SQLDatabase._query_iterator at 0x0DC69C30>
 > I dont know if the query will be triggered by using print(table_df)
 the result is generator object SQLDatabase._query_iterator at 0x0DC69C30>
 >
 > but the runtime is 6 seconds like in the DBUI im using.
 >
 > I have no clue what to do.
 >
 > Greetings Manuel
 >
 > Trainer Go schrieb am Mittwoch, 8. Juni 2022 um 09:27:04 UTC+2:
 > thank you Philip,
 >
 > I will test it today.
 >
 >
 > Greetings Manuel
 >
 > Philip Semanchuk schrieb am Dienstag, 7. Juni 2022 um 17:13:28 UTC+2:
 >
 >
 > > On Jun 7, 2022, at 5:46 AM, Trainer Go  wrote:
 > >
 > > Hello guys,
 > >
 > > Im executing 2 queries in my python program with sqlalchemy using
 the pyodbc driver.
 > > The database is a Adaptive SQL Anywhere Version 7 32 Bit.
 > >
 > > When im executing the queries in a DB UI it takes 5-6 seconds for
 both together and when im using the same queries in my python programm it
 takes 5-6 minutes instead of 6 seconds. What im doing wrong? Im new at
 this.
 >
 > To start, debug one query at a time, not two.
 >
 > Second, when you test a query in your DB UI, you’re probably already
 connected to the database. Your Python program has to make the connection —
 that’s an extra step, and it might be slow. If you step through the Python
 program in the debugger, you can execute one statement at a time (the
 connection and the query) to understand how long each step takes. That will
 help to isolate the problem.
 >
 > Third, keep in mind that receiving results takes time too. If your DB
 UI is written in C or some other language that allocates memory very
 efficiently, it might be a lot faster than building a Pandas dataframe.
 >
 > You might want to eliminate Pandas entirely so you don’t have to
 question whether or not that’s the source of your 

Re: [sqlalchemy] simple query takes to long

2022-06-09 Thread Trainer Go
Hello Jonathan,

i already executed the query without using pandas in my programm

query = "SELECT"
for row in conn.execute(query).fetchall():
pass

the result was the same runtime with pandas.

So this cant be the problem. I think so.

Greetings Manuel

Jonathan Vanasco schrieb am Mittwoch, 8. Juni 2022 um 17:28:01 UTC+2:

> When you select in the database ui tool, you are just displaying raw data.
>
> When you select within your code snippets above, Python is creating 
> pandas' DataFrame objects for the results. 
>
> These two concepts are not comparable at all.  Converting the SQL data to 
> Python data structures in Pandas (and SQLAlchemy's ORM) is a lot of 
> overhead - and that grows with the result size.
>
> You can use memory and code profiling tools to explore this and see where 
> the issues are. The best approach is what Philip suggested above though, 
> and not use pandas, so you can see how Python/SqlAlchemy handles the raw 
> data.
>
>
>
>
>
> On Wednesday, June 8, 2022 at 9:28:38 AM UTC-4 Trainer Go wrote:
>
>> Hello Phil,
>>
>> i tested both and without printing the result.
>>
>> table_df = pd.read_sql_query(''SELECT, engine)
>> #print(table_df)
>> #query = "SELECT"
>> #for row in conn.execute(query).fetchall():
>> #pass
>>
>>
>> both have nearly the same runtime. So this is not my problem. And yes, 
>> they are the same queries cause i copy pasted the select from my DBUI where 
>> is tested first the results and the runtime and i expected the same runtime 
>> in my program but no ;)
>>
>> Greeting Manuel
>>
>> Philip Semanchuk schrieb am Mittwoch, 8. Juni 2022 um 15:04:08 UTC+2:
>>
>>>
>>>
>>> > On Jun 8, 2022, at 8:29 AM, Trainer Go  wrote: 
>>> > 
>>> > When im using pandas with pd.read_sql_query() 
>>> > with chunksize to minimiza the memory usage there is no difference 
>>> between both runtimes.. 
>>>
>>> Do you know that, or is that speculation? 
>>>
>>> > 
>>> > table_df = pd.read_sql_query('''select , engine, chunksize = 3) 
>>> > 
>>> > for df in table_df: 
>>> > print(df) 
>>> > 
>>> > the runtime is nearly the same like 5 minutes 
>>>
>>> Printing to the screen also takes time, and your terminal probably 
>>> buffers the results, which requires memory allocation. I’m not saying this 
>>> is your problem (it probably isn’t), but your test still involves pandas 
>>> and your terminal, both of which cloud the issue. You would benefit from 
>>> simplifying your tests. 
>>>
>>> Did you try this suggestion from my previous email? 
>>>
>>>
>>> > for row in conn.execute(my_query).fetchall(): 
>>> > pass 
>>>
>>> Also, are you 100% sure you’re executing the same query from SQLAlchemy 
>>> that you’re pasting into your DB UI? 
>>>
>>> Cheers 
>>> Philip 
>>>
>>>
>>>
>>> > 
>>> > 
>>> > 
>>> > #print(table_df) result: #generator object SQLDatabase._query_iterator 
>>> at 0x0DC69C30> 
>>> > I dont know if the query will be triggered by using print(table_df) 
>>> the result is generator object SQLDatabase._query_iterator at 0x0DC69C30> 
>>> > 
>>> > but the runtime is 6 seconds like in the DBUI im using. 
>>> > 
>>> > I have no clue what to do. 
>>> > 
>>> > Greetings Manuel 
>>> > 
>>> > Trainer Go schrieb am Mittwoch, 8. Juni 2022 um 09:27:04 UTC+2: 
>>> > thank you Philip, 
>>> > 
>>> > I will test it today. 
>>> > 
>>> > 
>>> > Greetings Manuel 
>>> > 
>>> > Philip Semanchuk schrieb am Dienstag, 7. Juni 2022 um 17:13:28 UTC+2: 
>>> > 
>>> > 
>>> > > On Jun 7, 2022, at 5:46 AM, Trainer Go  wrote: 
>>> > > 
>>> > > Hello guys, 
>>> > > 
>>> > > Im executing 2 queries in my python program with sqlalchemy using 
>>> the pyodbc driver. 
>>> > > The database is a Adaptive SQL Anywhere Version 7 32 Bit. 
>>> > > 
>>> > > When im executing the queries in a DB UI it takes 5-6 seconds for 
>>> both together and when im using the same queries in my python programm it 
>>> takes 5-6 minutes instead of 6 seconds. What im doing wrong? Im new at 
>>> this. 
>>> > 
>>> > To start, debug one query at a time, not two. 
>>> > 
>>> > Second, when you test a query in your DB UI, you’re probably already 
>>> connected to the database. Your Python program has to make the connection — 
>>> that’s an extra step, and it might be slow. If you step through the Python 
>>> program in the debugger, you can execute one statement at a time (the 
>>> connection and the query) to understand how long each step takes. That will 
>>> help to isolate the problem. 
>>> > 
>>> > Third, keep in mind that receiving results takes time too. If your DB 
>>> UI is written in C or some other language that allocates memory very 
>>> efficiently, it might be a lot faster than building a Pandas dataframe. 
>>> > 
>>> > You might want to eliminate Pandas entirely so you don’t have to 
>>> question whether or not that’s the source of your slowdown. You could do 
>>> this instead - 
>>> > 
>>> > for row in conn.execute(my_query).fetchall(): 
>>> > pass 
>>> > 
>>> > That will force your Python program to iterate over the