Re: Django Channels Load Testing Results

2016-09-27 Thread Erik Cederstrand
Hi Robert,

> On 26 Sep 2016, at 03:03, Robert Roskam wrote:
> 
> > The unit in the second graph is requests per minute, which is inconsistent 
> > since the first graph is requests per second. This also makes comparison 
> > difficult. Also, it doesn't actually show the requests per minute value 
> > unless you break out a calculator, since the X axis stops at 50 seconds. 
> 
> So is your request basically for me to give the slope on each one so that you 
> can interpolate the results from one graph to the other?

One way to compare results from the two tests would be to find out how far from 
max throughput on graph #2 you measured the latency from graph #1. It's 
difficult for me to add a 300 rps line in graph #2 because the 1-second mark on 
the X-axis is impossible to read, and the 1-minute mark is off the axis. Does 
that make sense?


> > Also, the lines in the second graph are suspiciously linear - how many 
> > measurements were made, at which points in time, what is the jitter? Just 
> > show the actual measurements as dots, then I can live with the straight 
> > line. That would also show any effects of the autothrottle algorithm. 
> 
> I'll have to regather that data. I had not logged every single response. I 
> was aggregating the results every 5 seconds. I can resample this one.

It's not a big deal, it's just that the straight lines are hiding a lot of 
possibly interesting information in the distribution of the sample data - is it 
really linear, what's the variability, does the server slow to a crawl after 15 
seconds, etc. Thanks for the attached graph, which clears a lot of this up.


> > Finally, I have a hard time understanding the latency values - the config 
> > shows 1 worker, so I'm assuming serial output. But for Gunicorn, the graph 
> > shows ~80.000 rpm which corresponds to a latency of 0,75ms, while the text 
> > says 6ms. Likewise, according to the graph, Redis has a latency of 1,7ms 
> > and IPC 6ms, which does not align with the text. Is this the effect of an 
> > async I/O behind the scenes, or are there multiple threads within the 
> > worker? 
> 
> I'm pressed to understand what you're trying to say. I'm not sure where you 
> got the 80 (or is it 80 thousand?) rps from. If you're trying to sum the 
> values of gunicorn through time, I guess that exposes something else that I 
> either misrepresented or said incorrectly. The values presented are 
> accumulated. Perhaps, that's not the correct way to present this.


Sorry, I'm in a locale where we use '.' as thousands separator and ',' as 
decimal separator. I'll switch to American notation now :-)

I read the blue line for Gunicorn in graph #2 as having returned a total of 
80,000 requests after a duration of 60 seconds (just over the 75,000 mark after 
ca. 55 seconds). Is that an incorrect interpretation of the graph? If not, then 
60 seconds / 80,000 requests * 1000 = 0.75 ms per request, if the requests are 
processed in serial.
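
To make the arithmetic explicit, here is a small Python sketch of the conversion I'm doing in my head. The function names are my own, and it assumes a single worker handling requests strictly one after another, which may not be what actually happens behind the scenes:

```python
def requests_per_second(total_requests, duration_s):
    """Average throughput over the whole measurement window."""
    return total_requests / duration_s


def serial_latency_ms(total_requests, duration_s):
    """Average time per request in milliseconds, assuming strictly serial processing."""
    return duration_s * 1000 / total_requests


# Roughly 80,000 requests read off graph #2 after 60 seconds
print(serial_latency_ms(80_000, 60))      # 0.75
# 300K requests in 10 minutes
print(requests_per_second(300_000, 600))  # 500.0
```

If the measured latency is much higher than this figure, the requests are probably not being processed serially.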

Erik

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/5FDA33F2-9A7B-4D76-83CE-A64B0ACD2BE1%40cederstrand.dk.
For more options, visit https://groups.google.com/d/optout.


Re: Should the Django session-id be hashed?

2016-09-22 Thread Erik Cederstrand

> On 22 Sep 2016, at 13:38, Alex Gaynor wrote:
> 
> If Django were a different framework, I'd probably think this was a 
> reasonable idea. However, Django's ORM is _incredibly_ good at deterring SQL 
> injection. In many many years of using and reviewing Django applications, SQL 
> injection is vanishingly rare in my experience; therefore I think this adds 
> complexity for limited gain. Another relevant factor is that this is only 
> applicable to the database sessions backend.

The attacker would only need read access for this to work, not write access. 
That could possibly be achieved even without SQL injection. If the 
attacker can just put another person's session ID in her cookie, then session 
IDs are basically passwords. Passwords should not be stored clear-text. The 
only difference is that session IDs are more short-lived than passwords.

It's the same issue with API key authentication for REST APIs. Not many people 
remember to hash the keys before storing them in the DB.

If the attacker gains write access to the DB, then you're doomed anyway, hashes 
or not. The attacker just makes up her own session ID, hashes it and writes it 
to the database. Or makes up her own password and writes it to the Users table.
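
As a minimal sketch of the idea (not how Django's session backends are implemented): keep the random session ID in the cookie, but store only its hash. The dict below is a stand-in for the sessions table, and all names are made up for illustration. Since session IDs are long random strings rather than human-chosen passwords, a plain SHA-256 should be sufficient here; I'm assuming no salting or key stretching is needed.

```python
import hashlib
import secrets

# Stand-in for the sessions table: maps hashed session ID -> session data
store = {}


def new_session_key():
    """Generate a high-entropy session ID to hand to the client."""
    return secrets.token_urlsafe(32)


def storage_key(session_key):
    """Hash the session ID; only this value is written to the store."""
    return hashlib.sha256(session_key.encode('ascii')).hexdigest()


def save_session(session_key, data):
    store[storage_key(session_key)] = data


def load_session(cookie_value):
    """Look up a session by hashing whatever the client sent in the cookie."""
    return store.get(storage_key(cookie_value))


cookie_value = new_session_key()
save_session(cookie_value, {'user_id': 42})
```

An attacker with read access to the store only sees the hashes. Putting a hash in the cookie does not work, because the server hashes the cookie value again before the lookup.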

Erik



Re: Django Channels Load Testing Results

2016-09-13 Thread Erik Cederstrand

> On 13 Sep 2016, at 09:28, Erik Cederstrand <erik+li...@cederstrand.dk> wrote:
> 
> First of all, thanks for taking the time to actually do the measurements! 
> It's insightful and very much appreciated.
> 
> [...]300K requests in 10 minutes is 500 rps, but the text says 500 rps. Which 
> is it?
^^^
300 rps

Jeez, not even the email whining about inconsistencies can get the numbers 
right :-)

Erik



Re: Django Channels Load Testing Results

2016-09-13 Thread Erik Cederstrand

> On 13 Sep 2016, at 03:41, Robert Roskam wrote:
> 
> Hey Chris,
> 
> The goal of these tests is to see how channels performs with normal HTTP 
> traffic under heavy load with a control. In order to compare accurately, I 
> tried to eliminate variances as much as possible. 
> 
> So yes, there was one worker for both Redis and IPC setups. I provided the 
> supervisor configs, as I figured those would be helpful in describing exactly 
> what commands were run on each system.
> 
> Does that help bring some context? Or would you like for me to elaborate 
> further on some point?

First of all, thanks for taking the time to actually do the measurements! It's 
insightful and very much appreciated.

At least for me, it took a long time to find out what the graphs were actually 
showing. In the first graph, maybe the title could be more descriptive, e.g. 
"Request latency at 300 rps". Also, 300K requests in 10 minutes is 500 rps, but 
the text says 500 rps. Which is it?

The unit in the second graph is requests per minute, which is inconsistent 
since the first graph is requests per second. This also makes comparison 
difficult. Also, it doesn't actually show the requests per minute value unless 
you break out a calculator, since the X axis stops at 50 seconds.

Also, the lines in the second graph are suspiciously linear - how many 
measurements were made, at which points in time, what is the jitter? Just show 
the actual measurements as dots, then I can live with the straight line. That 
would also show any effects of the autothrottle algorithm.

Finally, I have a hard time understanding the latency values - the config shows 
1 worker, so I'm assuming serial output. But for Gunicorn, the graph shows 
~80.000 rpm which corresponds to a latency of 0,75ms, while the text says 6ms. 
Likewise, according to the graph, Redis has a latency of 1,7ms and IPC 6ms, 
which does not align with the text. Is this the effect of an async I/O behind 
the scenes, or are there multiple threads within the worker?

Erik



Re: Need help with MySQL 5.7 crashing on Django's Jenkins

2016-08-02 Thread Erik Cederstrand
I think this is better directed at a MySQL list. MySQL shouldn't crash, and 
nothing I see indicates that this is a Django issue.

Of course, it's best if you can reproduce the error. Barring that, you'll get a 
much more useful stack trace if you build MySQL with debugging symbols. A quick 
look at the stack trace does indicate that your problem is in either libpthread 
or libc, which is unlikely - is it possible that you have hardware issues?

Erik

> On 2 Aug 2016, at 02:05, Tim Graham wrote:
> 
> Sometimes the MySQL 5.7.13 builds on Ubuntu 16.04 are failing with "Lost 
> connection to MySQL server during query" because the MySQL server restarts 
> during the tests. I wonder if anyone has an idea about how to solve this. 
> Looking through the MySQL error log, I think this is the root cause:
> 
> 2016-08-01T23:02:56.636617Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has 
> lasted > 600 seconds. We intentionally crash the server because it appears to 
> be hung.
> 2016-08-01 23:02:56 0x7f5fb75d8700  InnoDB: Assertion failure in thread 
> 140049074980608 in file ut0ut.cc line 920
> InnoDB: We intentionally generate a memory trap.
> InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
> InnoDB: If you get repeated assertion failures or crashes, even
> InnoDB: immediately after the mysqld startup, there may be
> InnoDB: corruption in the InnoDB tablespace. Please refer to
> InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
> InnoDB: about forcing recovery.
> 23:02:56 UTC - mysqld got signal 6 ;
> This could be because you hit a bug. It is also possible that this binary
> or one of the libraries it was linked against is corrupt, improperly built,
> or misconfigured. This error can also be caused by malfunctioning hardware.
> Attempting to collect some information that could help diagnose the problem.
> As this is a crash and something is definitely wrong, the information
> collection process might fail.
> 
> key_buffer_size=536870912
> read_buffer_size=131072
> max_used_connections=4
> max_threads=151
> thread_count=4
> connection_count=4
> It is possible that mysqld could use up to 
> key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 584285 
> K  bytes of memory
> Hope that's ok; if not, decrease some variables in the equation.
> 
> Thread pointer: 0x0
> Attempting backtrace. You can use the following information to find out
> where mysqld died. If you see no messages after this, something went
> terribly wrong...
> stack_bottom = 0 thread_stack 0x20
> /usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xe7bdab]
> /usr/sbin/mysqld(handle_fatal_signal+0x489)[0x783759]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7f60131dc3d0]
> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f6012596418]
> /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f601259801a]
> /usr/sbin/mysqld[0x759764]
> /usr/sbin/mysqld(_ZN2ib5fatalD1Ev+0x145)[0x110c905]
> /usr/sbin/mysqld(srv_error_monitor_thread+0xe2d)[0x10aa34d]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa)[0x7f60131d26fa]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6012667b5d]
> The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
> information that should help you find out what is causing the crash.



Re: Extend support for long surnames in Django Auth

2016-07-29 Thread Erik Cederstrand
Hello Raony,

I'm sure I'm not aware of all the implications of changing the field length, 
but the first question should be "how long is long enough"? In answering this 
question, this Quora question comes to mind: 
https://www.quora.com/Why-are-South-Indian-names-often-long

Kind regards,
Erik
a.k.a. Bommiraju Sitaramanjaneyulu Rajasekhara Srinivasulu Laxminarayana Siva 
Venkata Sai :-)

> On 29 Jul 2016, at 09:18, Raony Guimaraes Corrêa Do Carmo Lisboa 
> Cardenas wrote:
> 
> Hello everyone,
> 
> For a long time I had problems logging in to djangopackages.com using my 
> GitHub account (pydanny/djangopackages#338). After investigating, I discovered 
> the problem was that my surname is longer than 30 characters. I don't know 
> why both the first_name and last_name fields have the same size limit of 30 
> characters in Django. That doesn't sound very reasonable.
> 
> I'm sure there are other people in the same situation, and this has already 
> happened to me when trying to log in to other Django websites.
> 
> Tim Graham suggested I should first ask on this mailing list 
> (https://github.com/django/django/pull/6988#issuecomment-235945422) to see if 
> there is consensus to make the change.
> 
> I would like to ask your opinion about an increase from 30 to 60 characters 
> on last_name field so that my login and others won't break again in the 
> future. I can create a Trac ticket if the response is positive.
> 
> Kind Regards,



Re: Django Model Fields a repr_output property, to include a field in the string representation of an object

2016-07-01 Thread Erik Cederstrand
Plus, it's 3 lines of code and faster to implement than to look up the 
documentation:


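class ReprFieldsMixIn:
    class Meta:
        # Assumes the repr_fields Meta option proposed in this thread;
        # stock Django rejects unknown Meta attributes.
        repr_fields = ('bar', 'baz')

    def __repr__(self):
        return '<%s: %s>' % (self.__class__.__name__, ', '.join(
            '%s=%s' % (f, repr(getattr(self, f))) for f in self._meta.repr_fields))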
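
For something that runs today without touching Meta, a plain class attribute does the job. This is only a sketch with plain Python classes standing in for models; the attribute name repr_fields and the Person class are just for illustration:

```python
class ReprFieldsMixin:
    # Names of the attributes to include in repr(); override in subclasses
    repr_fields = ()

    def __repr__(self):
        pairs = ', '.join(
            '%s=%r' % (name, getattr(self, name)) for name in self.repr_fields
        )
        return '<%s: %s>' % (self.__class__.__name__, pairs)


class Person(ReprFieldsMixin):
    repr_fields = ('first_name', 'last_name')

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name


print(repr(Person('Ben', 'Friedland')))
# <Person: first_name='Ben', last_name='Friedland'>
```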


Erik

> On 1 Jul 2016, at 21:52, Tim Graham wrote:
> 
> Since no one else expressed interest in two weeks, I think it's safe to say 
> it isn't a use case for 80% of users. That's the rough bar I think of as to 
> whether or not something needs to be in core.
> 
> On Friday, July 1, 2016 at 3:05:13 PM UTC-4, Ben Friedland wrote:
> Well in that case wouldn't it make sense to add it to the meta class? Just 
> like how there's unique_together, etc., there could be repr_fields. In which 
> case I think it would be great if it were part of Django. 
> 
> Maybe reconsider as a part of the meta class?
> 
> Ben Friedland
> www.bugben.com
> 
> On Monday, June 27, 2016 at 6:24:57 AM UTC-7, Tim Graham wrote:
> A new model field option doesn't seem necessary. I think a cleaner solution 
> would be something like a decorator that takes a list of fields, e.g.
> 
> @repr_fields('first_name', 'last_name')
> class Person(...):
>     ...
> 
> This doesn't need to live in Django itself though.
> 
> On Thursday, June 23, 2016 at 7:30:50 PM UTC-4, Ben Friedland wrote:
> Has a feature like this ever been considered? 
> 
> If a model has no __unicode__, __str__ or __repr__ representation, then maybe 
> it could devise a string representation by collecting fields which have this 
> value set to True. 
> 
> Example:
> 
> Without the feature:
> 
> class Person(models.Model):
>     first_name = models.CharField(max_length=50)
>     last_name = models.CharField(max_length=50)
> 
> person = Person(first_name='Ben', last_name='Friedland')
> print person
> # fairly useless object representation
> 
> 
> This feature would work something like: 
> 
> class Person(models.Model):
>     first_name = models.CharField(max_length=50, repr_output=True)
>     last_name = models.CharField(max_length=50, repr_output=True)
> 
> person = Person(first_name='Ben', last_name='Friedland')
> print person
> # includes fields specified via repr_output=True
> 
> If this would be useful I'd be happy to formally create an issue and even 
> implement the feature. 
> 
> Thanks!
> 
> Ben Friedland
> www.bugben.com



Re: Unicode normalization for username field

2016-04-25 Thread Erik Cederstrand

> On 24 Apr 2016, at 20:58, Claude Paroz wrote:
> 
> - I'm afraid this change may result in boilerplate as most custom user models 
> will revert to Django's historical (and in my opinion sensible) username 
> validation rules. 
> 
> That's a tough question to estimate. This might be true for most English 
> monolingual web sites, but not necessarily for the majority of Django sites. 
> Hopefully we'll get some more user inputs in this thread.

From a security point of view, I understand the reasoning here. Everyone 
expects ASCII-only usernames. There were the same discussions when IDN was 
introduced. But even with ASCII, people are attempting to spoof (googel.com, 
gogle.com etc). We need a way to normalize unicode - preferably an external 
library or method, since this is not a Django-specific problem.
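
Python's standard library already has a building block for this in unicodedata.normalize(). The sketch below uses the NFKC form; the choice of NFKC and the function name are my assumptions, not a statement of what Django should or does use:

```python
import unicodedata


def normalize_username(username):
    """Fold visually equivalent Unicode spellings into one canonical form."""
    return unicodedata.normalize('NFKC', username)


# 'A' + combining ring above vs. the precomposed letter U+00C5
decomposed = 'A\u030age'
precomposed = '\u00c5ge'
print(decomposed == precomposed)                                          # False
print(normalize_username(decomposed) == normalize_username(precomposed))  # True

# Fullwidth Latin letters fold to plain ASCII under NFKC
print(normalize_username('\uff41\uff44\uff4d\uff49\uff4e'))               # admin
```

Note that normalization alone does not catch look-alikes from other scripts. A Cyrillic 'а' (U+0430) stays distinct from a Latin 'a', so spoofing of the googel.com kind still needs separate handling.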

Being from .dk, we're used to translating "Åge Æbelø" to "aage_aebeloe" when 
creating usernames. But just like 8.3 filenames were the norm in DOS and we got 
around that, we now expect to be able to create long filenames with spaces and 
unicode characters. There was a 15-year transition period where unicode 
filenames would maybe work and often break things in unexpected ways (I *still* 
find issues from time to time). Unicode was for the brave and those with lots 
of free time to waste. But I think that in 2016, software that cannot handle 
unicode is simply broken and must be fixed. Sure, unicode can be a hassle. I 
still need to hexdump strings once in a while to find out what is going on. But 
there is no way we can continue to not support the billions of people in the 
world that use a language that doesn't fit into ASCII.

For usernames, most people may still want the old behavior, and they can do 
that. It could even be the default (remember POLA). But being able to create 
unicode usernames should be possible and supported.

Erik



Re: Vendoring multipledispatch

2016-04-07 Thread Erik Cederstrand

> On 6 Apr 2016, at 13:42, Marc Tamlyn wrote:
> 
> Does anyone (potentially from OS packaging worlds maybe) have a good reason 
> NOT to have a dependency?

Here is a list off the top of my head. This is not necessarily an argument 
against dependencies, just some things to consider.


1: Availability. If Django depends on version x.y.z and x.y.z is removed from 
PyPI, or the whole package is deleted, then Django is no longer installable 
(google "NPM kik" for a recent example).

2: Customization. We need to tweak functionality in some non-upstreamable way, 
cherry-pick new functionality, or fix security issues before they are published 
on PyPI.

3: Version conflicts, as mentioned by Sylvain.

4: Security/stability. We depend on version x.y and a witty developer uploads 
dependency x.y.z+1 with an Easter egg, or the PyPI developer account is hacked 
and x.y is replaced.


These issues are amplified in a world where many people have automated 
production deployments running 'pip install -U -r requirements.txt'. Issues 
could spread very fast.

This may not be much different than what people are already exposed to with 
their own project dependencies, but vendoring (directly or by dependency) is 
endorsement by the Project, so any issues in the dependencies will fall back on 
the Project.


Erik



Re: Add documentation to address OWASP Top 10?

2016-04-06 Thread Erik Cederstrand

> On 6 Apr 2016, at 07:29, Anssi Kääriäinen wrote:
> 
> It is notable that if the number of items is a secret (say, you don't
> want to reveal how many sales items you have), just having information
> about sequential numbers is bad. In that case you should use UUID,
> which the documentation could point out.

If anything about your data is sensitive, then there are a pile of side 
channels that putting your data online could expose. URLs are just one. For an 
entertaining read, google "German tank problem".
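
To give a flavour of it, here is the classic estimator from that problem as a Python sketch. The observed IDs are invented for illustration, and the formula assumes IDs are handed out sequentially from 1 and that the observer sees a random sample of them:

```python
def estimate_total(observed_ids):
    """Estimate the highest ID in use from a sample of sequential IDs.

    Uses the standard German tank problem estimate m + m/k - 1, where m is
    the largest observed ID and k is the number of observations.
    """
    k = len(observed_ids)
    m = max(observed_ids)
    return m + m / k - 1


# Four object IDs scraped from public URLs (made-up values)
print(estimate_total([19, 40, 42, 60]))  # 74.0
```

A handful of leaked sequential IDs is enough to get a usable estimate of the total, which is why the item count is hard to keep secret.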

Giving specific security advice in the documentation that doesn't strictly 
refer to Django features could IMO lead to the false expectation that you're 
magically secure if you follow the advice. I would prefer that the 
documentation simply pointed to further reading, e.g. OWASP.

Erik



Re: [Question] MySQL Microseconds stripping

2015-12-20 Thread Erik Cederstrand

> On 20 Dec 2015, at 01:04, Cristiano Coelho wrote:
> 
> About using a custom datetime field that strips microseconds, that won't work 
> for raw queries I believe, not even .update statements as they ignore 
> pre-save? As the stripping happens (or used to happen) at the sql query 
> compile level.
> This is really a bummer, because it seems like the only option is to convert 
> all my datetime columns into datetime(6), which increases the table size and 
> index by around 10%, for something I will never use. Any other work around 
> that can work with both normal and raw queries? 

While I understand that you'd like this to Just Work, you're sending 
microseconds to the DB, knowing they will get lost, and expecting comparisons 
to still work *with* microseconds. It's like expecting 12.34 == int(12.34).

Why not strip the microseconds explicitly as soon as you're handed a datetime 
with microseconds? That way you make it explicit that you really don't want 
microseconds. That's less head-scratching for the next person to work with your 
code. Just dt.replace(microsecond=0) all date values before you issue a 
.filter(), .save(), .update(), .raw() or whatever.
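
In plain Python, the mismatch and the fix look roughly like this. The helper name is mine, and the two datetimes stand in for the value stored in a DATETIME(0) column and the value you'd pass to the query:

```python
from datetime import datetime


def strip_microseconds(dt):
    """Return a copy of dt with the microsecond part set to zero."""
    return dt.replace(microsecond=0)


stored = datetime(2015, 12, 24, 12, 34, 56)            # what the column holds
queried = datetime(2015, 12, 24, 12, 34, 56, 123456)   # what the code passes in

print(stored == queried)                      # False
print(stored == strip_microseconds(queried))  # True
```

Note that replace() returns a new datetime, so the result has to be assigned or passed on; the original object is left unchanged.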

> Should I complain at mysql forums?

You could try, but since Oracle took over, all my reports have been answered 
with WONTFIX. Anyway, it'll be months or years before you get something you can 
install on your server.

Erik



Re: [Question] MySQL Microseconds stripping

2015-12-19 Thread Erik Cederstrand

> On 19 Dec 2015, at 16:01, Aymeric Augustin wrote:
> 
> To be fair, this has a lot to do with MySQL’s lax approach to storing data. 
> There’s so many situations where it just throws data away happily that one 
> can’t really expect to read back data written to MySQL.
> 
> That said, this change in Django sets up this trap for unsuspecting users. 
> Providing a way for users to declare which fields use the old format won't 
> work because of pluggable apps: they cannot know what version of MySQL their 
> users were running when they first created a given datetime column. The best 
> solution may be to provide a conversion script to upgrade all datetime 
> columns from the old to the new format.

One simple solution could be for Cristiano to subclass the DateTimeField to 
handle the microsecond precision explicitly. Something like this to strip:


from django.db.models import DateTimeField


class DateTimeFieldWithPrecision(DateTimeField):
    def __init__(self, *args, **kwargs):
        # pop() so the parent constructor never sees the extra keyword
        self.precision = kwargs.pop('precision', 6)
        assert 0 <= self.precision <= 6
        super().__init__(*args, **kwargs)

    def pre_save(self, model_instance, add):
        dt = super().pre_save(model_instance, add)
        if dt is not None:
            # Truncate to the configured number of fractional digits
            step = 10 ** (6 - self.precision)
            dt = dt.replace(microsecond=dt.microsecond // step * step)
            setattr(model_instance, self.attname, dt)
        return dt
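
The truncation step can be tried without Django. This is a standalone sketch of the same arithmetic, with a function name of my own choosing; precision is the number of fractional-second digits to keep, matching the N in MySQL's DATETIME(N):

```python
from datetime import datetime


def truncate_microseconds(dt, precision):
    """Keep only the first `precision` fractional-second digits of dt."""
    if not 0 <= precision <= 6:
        raise ValueError('precision must be between 0 and 6')
    step = 10 ** (6 - precision)
    return dt.replace(microsecond=dt.microsecond // step * step)


dt = datetime(2015, 12, 24, 12, 34, 56, 123456)
print(truncate_microseconds(dt, 0).microsecond)  # 0
print(truncate_microseconds(dt, 3).microsecond)  # 123000
print(truncate_microseconds(dt, 6).microsecond)  # 123456
```

This truncates rather than rounds, which avoids having to carry into the seconds field when the fraction is close to 1.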


Erik



Re: [Question] MySQL Microseconds stripping

2015-12-19 Thread Erik Cederstrand

> On 19 Dec 2015, at 13:15, Cristiano Coelho wrote:
> 
> Erik,
> I'm using MySQL 5.6.x and indeed it has microseconds support, but that's not 
> the issue.
> 
> The issue is that every datetime column created has no microseconds (since 
> they were created with django 1.7, so it is actually a datetime(0) column) 
> and I would like to keep it that way, however, since django 1.8+ will always 
> send microseconds in the query, inserts will go fine (mysql will just ignore 
> them) but SELECTS will all fail since mysql will not strip microseconds from 
> the where clause (even if the column is defined as 0 datetime, duh, really 
> mysql?), so basically everything that uses datetime equality in the query 
> stopped working.

Can you elaborate on that? You're doing something like:

  SELECT foo_date FROM my_table WHERE foo_date = '2015-12-24 12:34:56.123456';

and expecting it to return rows where foo_date is '2015-12-24 12:34:56', but it 
doesn't?

I'm not sure that's a bug; the behaviour doesn't surprise me at all. Why aren't 
you stripping microseconds from the datetime values before issuing the query, 
if your data never has microseconds?
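
As a rough illustration of the effect, here is a sketch that uses an in-memory SQLite table with text timestamps as a stand-in for a MySQL DATETIME(0) column. That substitution is an assumption made so the example is self-contained; my_table and foo_date are the names from the query above, and count_matches() is a hypothetical helper:

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (foo_date TEXT)")
# The stored value has second precision only
conn.execute("INSERT INTO my_table VALUES ('2015-12-24 12:34:56')")


def count_matches(dt):
    """Count rows whose foo_date equals dt, passed as an ISO-style string."""
    param = dt.isoformat(sep=" ")
    cursor = conn.execute(
        "SELECT COUNT(*) FROM my_table WHERE foo_date = ?", (param,))
    return cursor.fetchone()[0]


now = datetime(2015, 12, 24, 12, 34, 56, 123456)
print(count_matches(now))                         # 0
print(count_matches(now.replace(microsecond=0)))  # 1
```

Stripping the microseconds on the Python side before the lookup makes the equality match again, whatever precision the backend sends.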

Erik



Re: [Question] MySQL Microseconds stripping

2015-12-18 Thread Erik Cederstrand

> On 19 Dec 2015, at 07:52, Cristiano Coelho wrote:
> 
> Hello,
> 
> Since Django 1.8, the MySQL backend no longer strips microseconds. 
> This is giving me some issues when upgrading from 1.7 (I actually upgraded to 
> 1.9 directly), since datetimes are not stored with microsecond precision in 
> MySQL, but the queries are sent with them.
> As I see it, my only option is to update all existing datetime columns of 
> all existing tables, which is quite boring since there are many tables.
> Is there a way I can explicitly set the model datetime precision? Will this 
> work with raw queries also? Could this be a global setting or monkey patch? 
> This new behaviour basically breaks any '=' query on datetimes, at least raw 
> queries (I haven't tested the others), since it sends microseconds which are 
> not stripped.

MySQL as of version 5.6.4 (and MariaDB) is able to store microseconds in 
datetime fields, but you need to set the precision when you create the column. 
In Django, this should "just work". Which version of MySQL are you using, and 
are your columns created as DATETIME(6)?

Erik



Skip IN clause on prefetch queries?

2015-09-25 Thread Erik Cederstrand
Hi devs,

My lengthy email to this list about improving performance of 
prefetch_related() seems to have disappeared. Instead, I created a ticket 
explaining why I need Prefetch() to be able to tell the prefetcher to 
trust the queryset provided by Prefetch() and not generate a huge and 
unnecessary IN clause:

https://code.djangoproject.com/ticket/25464

I'd appreciate any feedback so I can improve the patch!

Thanks,
Erik



Prefetch and avoiding huge IN clauses

2015-09-25 Thread Erik Cederstrand
Hi devs,

When prefetching related items for a queryset returning a large number of 
items, the generated SQL can be quite inefficient. Here's an example:

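from django.db import models


class Category(models.Model):
    type = models.PositiveIntegerField(db_index=True)


class Item(models.Model):
    # '%(class)ss' resolves to 'items' for this model
    category = models.ForeignKey(Category, on_delete=models.CASCADE,
                                 related_name='%(class)ss')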


If I have 50,000 categories of type=5, then 
"Category.objects.filter(type=5).prefetch_related('items')" will generate 
an SQL query on the Item table with an IN clause containing the 50,000 
Category IDs. This is because the get_prefetch_queryset() methods on the 
manager classes in django.db.models.fields.related all do this:

def get_prefetch_queryset(self, instances, queryset=None):
    [...]
    query = {'%s__in' % self.query_field_name: instances}
    queryset = queryset._next_is_sticky().filter(**query)
    [...]


While this is great when instances is a short list, we can hope to do 
better than that when instances is large. A much more efficient query in 
the above case would be queryset._next_is_sticky().filter(category__type=5). 
We just need to make sure this alternative query will fetch (at least) the 
same items as the query with the IN clause would.
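
To make the equivalence concrete, here is a small sketch using sqlite3 directly instead of the ORM. The table and column names are assumptions that only mimic the Category and Item models above, and the dataset is kept tiny so the IN list stays within SQLite's parameter limit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, type INTEGER);
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        category_id INTEGER REFERENCES category(id)
    );
""")
# 200 categories alternating between type 7 and type 5, two items each
conn.executemany("INSERT INTO category (id, type) VALUES (?, ?)",
                 [(i, 5 if i % 2 == 0 else 7) for i in range(1, 201)])
conn.executemany("INSERT INTO item (category_id) VALUES (?)",
                 [(i,) for i in range(1, 201) for _ in range(2)])

category_ids = [row[0] for row in
                conn.execute("SELECT id FROM category WHERE type = 5")]

# Variant 1: one placeholder per category ID, like the generated IN clause
placeholders = ", ".join("?" * len(category_ids))
in_sql = ("SELECT id FROM item WHERE category_id IN (%s) ORDER BY id"
          % placeholders)
in_rows = conn.execute(in_sql, category_ids).fetchall()

# Variant 2: filter through the join, like filter(category__type=5)
join_sql = ("SELECT item.id FROM item "
            "JOIN category ON item.category_id = category.id "
            "WHERE category.type = 5 ORDER BY item.id")
join_rows = conn.execute(join_sql).fetchall()

assert in_rows == join_rows
print(len(category_ids), "categories,", len(in_rows), "items")
```

Both variants return the same rows here, but the length of the first statement grows with the number of categories while the second stays constant.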

I thought I could use Prefetch('items', 
queryset=Item.objects.filter(category__type=5)) to accomplish this, but 
while the custom queryset *is* used, get_prefetch_queryset() still adds the 
IN clause unconditionally. This A) is redundant, B) sends a huge SQL string 
over the wire for the database to process, and C) seriously messes up the 
database query planner, often generating an inefficient execution plan.

I would like to fix this problem. The easiest way seems to be to let 
Prefetch() tell get_prefetch_queryset() to skip the IN clause. 
django.db.models.query.prefetch_one_level() is in charge of passing the 
Prefetch queryset to get_prefetch_queryset(), so one option would be to add a 
skip_in_clause attribute to the Prefetch class, which prefetch_one_level() 
would pass to get_prefetch_queryset(). The latter could then do:

def get_prefetch_queryset(self, instances, queryset=None,
                          skip_in_clause=False):
    [...]
    if not skip_in_clause:
        query = {'%s__in' % self.query_field_name: instances}
        queryset = queryset._next_is_sticky().filter(**query)
    [...]
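
The reason "at least the same items" is enough can be shown with a toy model in plain Python. This is not Django code; prefetch_items() and its arguments are hypothetical stand-ins, under the assumption that fetched rows are matched back to the parent instances by key after the query has run:

```python
from collections import defaultdict


def prefetch_items(category_ids, item_rows, skip_in_clause=False):
    """Toy stand-in for a prefetch: group item ids under their category id.

    item_rows is a list of (item_id, category_id) tuples, standing in for
    the rows returned by the custom Prefetch queryset.
    """
    if not skip_in_clause:
        # Equivalent of the IN clause: keep only rows for the given instances
        wanted = set(category_ids)
        item_rows = [row for row in item_rows if row[1] in wanted]
    by_category = defaultdict(list)
    for item_id, category_id in item_rows:
        by_category[category_id].append(item_id)
    # Only the requested instances get results; surplus rows are ignored
    return {cid: by_category.get(cid, []) for cid in category_ids}


categories = [1, 2]
rows = [(10, 1), (11, 1), (12, 2), (13, 3)]  # category 3 was not requested
assert prefetch_items(categories, rows) == prefetch_items(
    categories, rows, skip_in_clause=True)
print(prefetch_items(categories, rows, skip_in_clause=True))
# {1: [10, 11], 2: [12]}
```

In this model a queryset that returns surplus rows only costs some wasted transfer, while a queryset that returns too few rows would silently leave related items out. That is the risk the caller accepts by setting the flag.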


In my preliminary testing on real data, this:

Category.objects.filter(type=5).prefetch_related(
    Prefetch('items', queryset=Item.objects.filter(category__type=5),
             skip_in_clause=True))

is about 20x faster than the current implementation when the query returns 
50,000 categories. The improvement would be greater on larger datasets.

Any comments? If needed, I can provide a pull request with a suggested 
implementation. Apart from the Prefetch class and prefetch_one_level(), 
there are four implementations of get_prefetch_queryset() in 
django.db.models.fields.related that would need to be changed.


Thanks,
Erik
