Re: Can we remove FILE_CHARSET?

2018-10-05 Thread Jon Dufresne
> Is that always available these days? (I'd guess yes.)

I too would guess yes. I believe any reasonably modern text editor will support
UTF-8 and even likely default to saving in that encoding. I know mine does.

> Is it something we want to impose? Not sure. Are there people doing
> otherwise? (No idea.)

For templates, it wouldn't be imposed. Users can still override the template
engine's encoding with the 'file_charset' option.

For static files, without imposing it, we're back to the third-party app
concern. Just like DEFAULT_CHARSET, it would be difficult to change FILE_CHARSET
_and_ integrate third-party apps. The third-party apps have likely encoded
their static files using UTF-8, so setting FILE_CHARSET to some other value
will break them.

Cheers,
Jon


On Wed, Oct 3, 2018 at 12:14 PM Carlton Gibson wrote:

> Thanks for the follow-up Jon.
>
> I'll let Vasili follow-up on his use-case if possible/relevant.
>
> TBH I'm not at all sure about the SQL data files bit, which is in part why
> I asked here.
> (Encoding issues!)
>
> > Maybe that sentence should be rephrased to "template
> > files, static files, and translation catalogs".
>
> OK, so IF it's just this, then I'm on Windows doing development in UTF-8
> no problem (and can't really envisage doing much different as it stands)
> but:
>
> * Is that always available these days? (I'd guess yes.)
> * Is it something we want to impose? Not sure. Are there people doing
> otherwise? (No idea.)
>
> (If we can drop a setting, that'd be 💃🏼)
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/13cc5f04-967b-4f53-92f6-cbe155014edc%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CADhq2b4-7vc-hZgKP_%2BYOmUkbRC%2B8r7i2t74d0-9b0vGrVcmkA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Can we remove FILE_CHARSET?

2018-10-04 Thread Vasili Korol
I guess, my statement doesn't apply if FILE_CHARSET only affects Django 
text files, so disregard. My point was that non-UTF data is still actively 
used despite the fact that "the whole world moved to Unicode".



Re: Can we remove FILE_CHARSET?

2018-10-03 Thread Carlton Gibson
Thanks for the follow-up Jon. 

I'll let Vasili follow-up on his use-case if possible/relevant. 

TBH I'm not at all sure about the SQL data files bit, which is in part why 
I asked here. 
(Encoding issues!) 

> Maybe that sentence should be rephrased to "template
> files, static files, and translation catalogs".

OK, so IF it's just this, then I'm on Windows doing development in UTF-8 no 
problem (and can't really envisage doing much different as it stands) but: 

* Is that always available these days? (I'd guess yes.)
* Is it something we want to impose? Not sure. Are there people doing
otherwise? (No idea.)

(If we can drop a setting, that'd be 💃🏼)



Re: Can we remove FILE_CHARSET?

2018-10-03 Thread Jon Dufresne
> So Jon, are you basically saying that Vasili's concern shouldn't come up?

Yeah, I think it shouldn't come up. But I'm not sure I fully understand
Vasili's concern. Maybe if it were more specific, with more details, I could
better understand it.

Django's documentation states:

https://docs.djangoproject.com/en/dev/ref/unicode/#creating-the-database

> Make sure your database is configured to be able to store arbitrary string
> data. Normally, this means giving it an encoding of UTF-8 or UTF-16. If you
> use a more restrictive encoding – for example, latin1 (iso8859-1) – you won’t
> be able to store certain characters in the database, and information will be
> lost.
>
> ...
>
> All of Django’s database backends automatically convert strings into the
> appropriate encoding for talking to the database. They also automatically
> convert strings retrieved from the database into strings. You don’t even need
> to tell Django what encoding your database uses: that is handled
> transparently.

So, if these non-UTF-8 articles are stored in the database, this doesn't
involve FILE_CHARSET. Are the articles stored as text or binary data? If text,
this violates existing Django documentation & assumptions. The database is
expected to be configured for UTF-8. If binary data, then the project's code
will be responsible for decoding it to a text string.
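
As a rough illustration of the binary case: the byte string and the KOI8-R
encoding below are assumptions made up for this example, not details of any
project mentioned in the thread. The point is only that the project's own code
picks the encoding, and FILE_CHARSET plays no part:

```python
# Hypothetical data: article text stored as bytes in KOI8-R.
# In a real project this value would come from a binary database column.
raw = "Привет".encode("koi8-r")

# The project's code decides which encoding to use when decoding.
text = raw.decode("koi8-r")
print(text)
```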

If, on the other hand, these articles are stored as files, how are they being
loaded? If they are being loaded through a Django code path, which one, such
that FILE_CHARSET is involved? Or are these articles loaded by project code,
such that the encoding can be specified?
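
For the case where project code loads the files itself, a minimal sketch could
look like the following. The helper name load_article, the file name, and the
KOI8-R default are all hypothetical; the only real API used is the encoding
argument of Python's built-in open():

```python
import os
import tempfile


def load_article(path, encoding="koi8-r"):
    """Read a text file with an explicitly chosen encoding (hypothetical helper)."""
    with open(path, encoding=encoding) as fp:
        return fp.read()


# Write a small KOI8-R file to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "article.txt")
    with open(path, "w", encoding="koi8-r") as fp:
        fp.write("Статья")
    article = load_article(path)
```

Because the encoding is passed explicitly, this code path does not consult any
Django setting.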

So, IIUC, it doesn't seem like FILE_CHARSET should be involved for this use
case.

> That the whole "SQL data files" bit is misleading...?

I was unable to find any code with an interaction between FILE_CHARSET & "SQL
data files". If it exists, do you have a link? I think this text may be
outdated or obsolete. Maybe that sentence should be rephrased to "template
files, static files, and translation catalogs".


On Wed, Oct 3, 2018 at 7:55 AM Carlton Gibson wrote:

> Thanks for the input everyone.
>
> So Jon, are you basically saying that Vasili's concern shouldn't come up?
> (That the whole "SQL data files" bit is misleading...?)



Re: Can we remove FILE_CHARSET?

2018-10-03 Thread Adam Johnson
Jon's logic seems right to me. I find the lack of tests disturbing, and I
wouldn't be surprised if there were other places where Django loads files
from disk without using FILE_CHARSET when a user of that setting would
expect it to be used.

On Wed, 3 Oct 2018 at 15:55, Carlton Gibson wrote:

> Thanks for the input everyone.
>
> So Jon, are you basically saying that Vasili's concern shouldn't come up?
> (That the whole "SQL data files" bit is misleading...?)


-- 
Adam



Re: Can we remove FILE_CHARSET?

2018-10-03 Thread Carlton Gibson
Thanks for the input everyone. 

So Jon, are you basically saying that Vasili's concern shouldn't come up? 
(That the whole "SQL data files" bit is misleading...?)



Re: Can we remove FILE_CHARSET?

2018-10-03 Thread Jon Dufresne
I'm the one that proposed this setting be removed.

The setting is used in the following areas:

> ./django/template/backends/django.py:23: options.setdefault('file_charset', settings.FILE_CHARSET)

I suppose this is its main use case. The Django template engine defaults to
loading files from disk using the encoding specified by FILE_CHARSET. If a
project needs to load templates using a different encoding, it can continue
to do so by specifying the 'file_charset' option in the TEMPLATES setting:

    TEMPLATES = [
        {
            'BACKEND': 'django.template.backends.django.DjangoTemplates',
            'OPTIONS': {
                'file_charset': 'latin1',
            },
        },
    ]


> ./django/core/management/commands/makemessages.py:106: encoding = settings.FILE_CHARSET if self.command.settings_available else 'utf-8'

The makemessages management command loads files to preprocess using the
encoding specified by FILE_CHARSET.


> ./django/contrib/staticfiles/storage.py:287: content = original_file.read().decode(settings.FILE_CHARSET)

The HashedFilesMixin loads files to preprocess using the encoding specified
by FILE_CHARSET.


> ./django/template/backends/dummy.py:31: with open(template_file, encoding=settings.FILE_CHARSET) as fp:

The dummy template backend loads files using the encoding specified by
FILE_CHARSET. This dummy backend is used for internal testing purposes only
and is not a documented or public API. So I think it could safely be
modified without affecting projects or users.


That's it!

I think this setting has the same issue that was identified with
DEFAULT_CONTENT_TYPE. That is, if a project sets FILE_CHARSET to a
different value, interactions with third-party apps may be problematic.
Third-party apps likely encode their templates and static files using UTF-8,
so the use cases above may not work properly.
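
A small sketch of that failure mode, using plain Python rather than Django
itself. The sample string and the choice of latin1 and ascii as the mismatched
charsets are assumptions picked for illustration; the decode step stands in for
a call like original_file.read().decode(settings.FILE_CHARSET):

```python
# Bytes as a third-party app would likely ship them: UTF-8 encoded.
shipped = "café €5".encode("utf-8")

# Decoding with a non-matching single-byte charset does not raise,
# but silently produces mojibake instead of the original text.
as_latin1 = shipped.decode("latin1")

# Decoding with a stricter non-matching codec fails outright.
try:
    shipped.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Either outcome is a problem for the project: one corrupts content quietly, the
other raises at load time.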

Projects using a different encoding will still have a deprecation period to
see the Django warnings, adjust the setting, and re-encode files. The
removal won't be immediate. If such projects re-encode files to UTF-8
early, the projects will be both backwards and forwards compatible with
current and future Django versions.
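
A minimal sketch of what re-encoding a file could look like. The helper
reencode_to_utf8 is hypothetical and not part of Django, and the latin1 source
encoding and file name are assumptions chosen for the example:

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(path, source_encoding):
    """Rewrite one text file as UTF-8 (hypothetical helper, not part of Django)."""
    path = Path(path)
    # Decoding raises UnicodeDecodeError if the bytes do not fit source_encoding.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


# Demonstration on a temporary file that starts out encoded as latin1.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "index.html"
    template.write_bytes("<p>café</p>".encode("latin1"))
    reencode_to_utf8(template, "latin1")
    converted = template.read_bytes()
```

Once files are stored as UTF-8, they can be read with the default
FILE_CHARSET value of 'utf-8', so no project-level override is needed.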

FWIW, I was unable to find examples of a changed FILE_CHARSET by searching
GitHub.

Using a different value for FILE_CHARSET is currently untested internally
(although I believe it works as designed).

On Wed, Oct 3, 2018 at 5:03 AM Claude Paroz wrote:

> We are not talking about general data encodings here, FILE_CHARSET is used
> to read Django text files from disk (template files, static files (css, js)
> or translation catalogs). So the question is mainly about encoding usage in
> text editors.
>
> Claude



Re: Can we remove FILE_CHARSET?

2018-10-03 Thread Claude Paroz
We are not talking about general data encodings here, FILE_CHARSET is used 
to read Django text files from disk (template files, static files (css, js) 
or translation catalogs). So the question is mainly about encoding usage in 
text editors.

Claude



Re: Can we remove FILE_CHARSET?

2018-10-03 Thread Vasili Korol
Some Russian companies still store their old data (in databases and/or
files) in KOI8-R. I'm not sure how many of them may be using Django, but I
personally worked for a company in 2014-2015 that maintained a huge
database of articles stored in KOI8-R. I assume that, similarly, KOI8-U
may be used in Ukraine. This is just how it turned out historically.
The Windows encoding CP1251 is found less often, but even in the mid-2000s it
was still competing with KOI8, so there may be some old databases in this
encoding somewhere, too.
I would suggest keeping this setting for now.



Can we remove FILE_CHARSET?

2018-10-03 Thread Carlton Gibson
> FILE_CHARSET (default: 'utf-8')
> The character encoding used to decode any files read from disk. This
> includes template files and initial SQL data files.

Is there anywhere where this isn't UTF-8? (Or can't be decreed to be so?)

Jon has a suggestion to remove it:

Ticket: https://code.djangoproject.com/ticket/29817
PR: https://github.com/django/django/pull/10472/

Claude on GitHub:

> You preach to a convert! However it's not about not being able to encode
> in UTF-8, but about the common file encoding on some platforms, especially
> Windows. I haven't used Windows for a long time now, so I can't say if UTF-8
> is a common encoding nowadays or if it needs special handling (say, changing
> a program preference) in most Windows text editors.

Do you know about this? Can I ask for your input here? 

Thanks! 
