Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-11-17 Thread Chris Jerdonek
[Apologies for resurrecting a few-weeks old thread.]

On Thu, Oct 4, 2012 at 2:46 PM,  mar...@v.loewis.de wrote:

 Quoting Victor Stinner victor.stin...@gmail.com:

 I only see one argument against such refactoring: it will be harder to
 backport/forwardport bugfixes.

 I'm opposed for a different reason: I think it will be *harder* to maintain.
 The amount of code will not be reduced, but now you also need to guess what
 file some piece of functionality may be in. Instead of having my text editor
 (Emacs) search in one file, it will have to search across multiple files -
 but not across all open buffers, but only some of them (since I will have
 many other source files open as well).

 I really fail to see what problem people have with large source files.
 What is it that you want to do that can be done easier if it's multiple
 files?

One thing is browse or link to such code files on the web (e.g. from
within a tracker comment or from within our online documentation).
For example, today I was unable to open the following page from within
a browser to link to one of its lines on a tracker comment:

http://hg.python.org/cpython/file/27c20650aeab/Objects/unicodeobject.c

My laptop's fan simply turns on and the page hangs indefinitely while loading.

I don't think this point was ever mentioned.

--Chris
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-11-17 Thread Chris Angelico
On Sun, Nov 18, 2012 at 5:47 AM, Chris Jerdonek
chris.jerdo...@gmail.com wrote:
 On Thu, Oct 4, 2012 at 2:46 PM,  mar...@v.loewis.de wrote:
 I really fail to see what problem people have with large source files.
 What is it that you want to do that can be done easier if it's multiple
 files?

 One thing is browse or link to such code files on the web (e.g. from
 within a tracker comment or from within our online documentation).
 For example, today I was unable to open the following page from within
 a browser to link to one of its lines on a tracker comment:

 http://hg.python.org/cpython/file/27c20650aeab/Objects/unicodeobject.c

 My laptop's fan simply turns on and the page hangs indefinitely while loading.

Curious. This sounds like a web browser issue - I can pull it up in
either Chrome or Firefox on Windows on my 2GHz/2GB RAM laptop with a
visible pause, but not more than half a second. However, this page is
rather more significant, and is affected equally by the file size:

http://hg.python.org/cpython/annotate/27c20650aeab/Objects/unicodeobject.c

ChrisA


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-11-17 Thread Chris Jerdonek
On Sat, Nov 17, 2012 at 10:55 AM, Chris Angelico ros...@gmail.com wrote:
 On Sun, Nov 18, 2012 at 5:47 AM, Chris Jerdonek
 chris.jerdo...@gmail.com wrote:
 On Thu, Oct 4, 2012 at 2:46 PM,  mar...@v.loewis.de wrote:
 I really fail to see what problem people have with large source files.
 What is it that you want to do that can be done easier if it's multiple
 files?

 One thing is browse or link to such code files on the web (e.g. from
 within a tracker comment or from within our online documentation).
 For example, today I was unable to open the following page from within
 a browser to link to one of its lines on a tracker comment:

 http://hg.python.org/cpython/file/27c20650aeab/Objects/unicodeobject.c

 My laptop's fan simply turns on and the page hangs indefinitely while 
 loading.

 Curious. This sounds like a web browser issue - I can pull it up in
 either Chrome or Firefox on Windows on my 2GHz/2GB RAM laptop with a
 visible pause, but not more than half a second.

I'm also using Chrome and on a fairly new Mac.  Perhaps.  I tried
again and it froze up several open *.python.org tabs (mail.python.org,
bugs.python.org, etc).  However, later it worked as you describe.  The
problem seems sporadic.

--Chris


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Nick Coghlan
On Thu, Oct 25, 2012 at 2:22 PM, Stephen J. Turnbull step...@xemacs.org wrote:
 Nick Coghlan writes:

   OK, I need to weigh in after seeing this kind of reply. Large source files
   are discouraged in general because they're a code smell that points
   strongly towards a *lack of modularity* within a *complex piece of
   functionality*.

 Sure, but large numbers of tiny source files are also a code smell,
 the smell of purist adherence to the literal principle of modularity
 without application of judgment.

Absolutely. The classic example of this is Java's unfortunate
insistence on only-one-public-top-level-class-per-file. Bleh.

 If you want to argue that the pragmatic point of view nevertheless is
 to break up the file, I can see that, but I think Victor is going too
 far.  (Full disclosure dept.: the call graph of the Emacs equivalents
 is isomorphic to the Dungeon of Zork, so I may be a bit biased.)  You
 really should speak to the question of how many and what partition.

Yes, I agree I was too hasty in calling the specifics of Victor's
current proposal a good idea. What raised my ire was the raft of
replies objecting to the refactoring *in principle* for completely
specious reasons like being able to search within a single file
instead of having to use tools that can search across multiple files.

unicodeobject.c is too big, and should be restructured to make any
natural modularity explicit, and provide an easier path for users that
want to understand how the unicode implementation works.

   the real gain is in *modularity*, making it clear to readers which
   parts can be understood and worked on separately from each other.

 Yeah, so which do you think they are?  It seems to me that there are
 three modules to be carved out of unicodeobject.c:

 1.  The internal object management that is not exposed to Python:
 allocation, deallocation, and PEP 393 transformations.

 2.  The public interface to Python implementation: methods and
 properties, including operators.

 3.  Interaction with the outside world: codec implementations.  But
 conceptually, these really don't have anything to do with internal
 implementation of Unicode objects.  They're just functions that
 convert bytes to Unicode and vice versa.  In principle they can be
 written in terms of ord(), chr(), and bytes().  On the other hand,
 they're rather repetitive: When you've seen one codec
 implementation, you've seen them all.  I see no harm in grouping
 them in one file, and possibly a gain from proximity: casual
 passers-by might see refactorings that reduce redundancy.
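To make point 3 concrete, here is a minimal sketch (in Python rather than the C code under discussion) of a Latin-1 codec written purely in terms of ord(), chr(), and bytes(). The names latin1_encode and latin1_decode are hypothetical and chosen only for illustration; nothing here reflects how unicodeobject.c actually implements its codecs.

```python
def latin1_encode(text):
    """Encode str -> bytes using only ord() and bytes()."""
    code_points = [ord(ch) for ch in text]
    for cp in code_points:
        if cp > 0xFF:
            # Latin-1 can only represent code points U+0000..U+00FF.
            raise ValueError("code point U+%04X is not encodable in Latin-1" % cp)
    return bytes(code_points)


def latin1_decode(data):
    """Decode bytes -> str using only chr(); iterating bytes yields ints."""
    return "".join(chr(b) for b in data)
```

A codec like this needs no knowledge of the internal string representation, which is the sense in which codecs are conceptually separate from the object internals. Whether that separation costs performance in C is the question the rest of the thread argues about.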

I suspect you and Victor are in a much better position to thrash out
the details than I am. It was the trend in the discussion to treat the
question as "split or don't split?" rather than "how should we split
it?" when a file that large should already contain some natural
splitting points if the implementation isn't a tangled monolithic
mess.

 Why are any of these codecs here in unicodeobjectland in the first
 place?  Sure, they're needed so that Python can find its own stuff,
 but in principle *any* codec could be needed.  Is it just an heuristic
 that the codecs needed for 99% of the world are here, and other codecs
 live in separate modules?

I believe it's a combination of history and whether or not they're
needed by the interpreter during the bootstrapping process before the
encodings namespace is importable.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread M.-A. Lemburg
On 25.10.2012 08:42, Nick Coghlan wrote:
 Why are any of these codecs here in unicodeobjectland in the first
 place?  Sure, they're needed so that Python can find its own stuff,
 but in principle *any* codec could be needed.  Is it just an heuristic
 that the codecs needed for 99% of the world are here, and other codecs
 live in separate modules?
 
 I believe it's a combination of history and whether or not they're
 needed by the interpreter during the bootstrapping process before the
 encodings namespace is importable.

They are in unicodeobject.c so that the compilers can inline the
code in the various other places where they are used in the Unicode
implementation directly as necessary and because the codecs use
a lot of functions from the Unicode API (obviously), so the other
direction of inlining (Unicode API in codecs) is needed as well.

BTW: When discussing compiler optimizations, please remember that
there are more compilers out there than just GCC and also the fact
that not everyone is using the latest and greatest version of it.
Link time inlining will usually not be as efficient as compile time
optimization and we need every bit of performance we can get
for Unicode in Python 3.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 25 2012)
 Python Projects, Consulting and Support ...   http://www.egenix.com/
 mxODBC.Zope/Plone.Database.Adapter ...   http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2012-09-27: Released eGenix PyRun 1.1.0 ...   http://egenix.com/go35
2012-09-26: Released mxODBC.Connect 2.0.1 ... http://egenix.com/go34
2012-10-29: PyCon DE 2012, Leipzig, Germany ... 4 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread M.-A. Lemburg
On 25.10.2012 08:42, Nick Coghlan wrote:
 unicodeobject.c is too big, and should be restructured to make any
 natural modularity explicit, and provide an easier path for users that
 want to understand how the unicode implementation works.

You can also achieve that goal by structuring the code in unicodeobject.c
in a more modular way. It is already structured in sections, but
there's always room for improvement, of course.

As mentioned before, it is also possible to split out various sections
into separate .c or .h files which then get included in the main
unicodeobject.c. If that's where consensus is going, I'm with Stephen
here in that such a separation should be done in higher level
chunks, rather than creating 10 new files.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Maciej Fijalkowski
On Thu, Oct 25, 2012 at 8:57 AM, M.-A. Lemburg m...@egenix.com wrote:
 On 25.10.2012 08:42, Nick Coghlan wrote:
 Why are any of these codecs here in unicodeobjectland in the first
 place?  Sure, they're needed so that Python can find its own stuff,
 but in principle *any* codec could be needed.  Is it just an heuristic
 that the codecs needed for 99% of the world are here, and other codecs
 live in separate modules?

 I believe it's a combination of history and whether or not they're
 needed by the interpreter during the bootstrapping process before the
 encodings namespace is importable.

 They are in unicodeobject.c so that the compilers can inline the
 code in the various other places where they are used in the Unicode
 implementation directly as necessary and because the codecs use
 a lot of functions from the Unicode API (obviously), so the other
 direction of inlining (Unicode API in codecs) is needed as well.

I'm sorry to interrupt, but have you actually measured? What effect
the lack of said inlining has on *any* benchmark is definitely beyond
my ability to guess and I suspect is beyond the ability to guess of
anyone else on this list.

I challenge you to find a benchmark that is being significantly
affected (15%) with the split proposed by Victor. It does not even
have to be a real-world one, although that would definitely buy it
more credibility.

Cheers,
fijal


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread M.-A. Lemburg
On 25.10.2012 11:18, Maciej Fijalkowski wrote:
 On Thu, Oct 25, 2012 at 8:57 AM, M.-A. Lemburg m...@egenix.com wrote:
 On 25.10.2012 08:42, Nick Coghlan wrote:
 Why are any of these codecs here in unicodeobjectland in the first
 place?  Sure, they're needed so that Python can find its own stuff,
 but in principle *any* codec could be needed.  Is it just an heuristic
 that the codecs needed for 99% of the world are here, and other codecs
 live in separate modules?

 I believe it's a combination of history and whether or not they're
 needed by the interpreter during the bootstrapping process before the
 encodings namespace is importable.

 They are in unicodeobject.c so that the compilers can inline the
 code in the various other places where they are used in the Unicode
 implementation directly as necessary and because the codecs use
 a lot of functions from the Unicode API (obviously), so the other
 direction of inlining (Unicode API in codecs) is needed as well.
 
 I'm sorry to interrupt, but have you actually measured? What effect
 the lack of said inlining has on *any* benchmark is definitely beyond
 my ability to guess and I suspect is beyond the ability to guess of
 anyone else on this list.
 
 I challenge you to find a benchmark that is being significantly
 affected (15%) with the split proposed by Victor. It does not even
 have to be a real-world one, although that would definitely buy it
 more credibility.

I think you misunderstood. What I described is the reason for having
the base codecs in unicodeobject.c.

I think we all agree that inlining has a positive effect on
performance. The scale of the effect depends on the used compiler
and platform.

Victor already mentioned that he'll check the impact of his
proposal, so let's wait for that.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Serhiy Storchaka

On 25.10.12 12:18, Maciej Fijalkowski wrote:

I challenge you to find a benchmark that is being significantly
affected (15%) with the split proposed by Victor. It does not even
have to be a real-world one, although that would definitely buy it
more credibility.


I see 10% slowdown for UTF-8 decoding for UCS2 strings, but 10% speedup 
for mostly-BMP UCS4 strings.  For encoding the situation is reversed 
(but up to +27%).  Charmap decoding speedups 10-30%.


GCC 4.4.3, 32-bit Linux.

https://bitbucket.org/storchaka/cpython-stuff/src/default/bench
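For readers who want to try this kind of measurement themselves, the following is a rough sketch using the standard timeit module. It is not the benchmark from the repository linked above; the helper names make_samples and bench_utf8, the sample strings, and the iteration counts are all assumptions made for illustration.

```python
import timeit


def make_samples(length=10000):
    """Build test strings: BMP-only text and mostly-BMP text with one astral char."""
    ucs2 = "\u20ac" * length
    ucs4 = "\u20ac" * (length - 1) + "\U0001F40D"
    return {"ucs2": ucs2, "mostly-BMP ucs4": ucs4}


def bench_utf8(samples, number=200, repeat=5):
    """Return {name: (decode_seconds, encode_seconds)}, best of `repeat` runs."""
    results = {}
    for name, text in samples.items():
        data = text.encode("utf-8")
        dec = min(timeit.repeat(lambda: data.decode("utf-8"),
                                number=number, repeat=repeat))
        enc = min(timeit.repeat(lambda: text.encode("utf-8"),
                                number=number, repeat=repeat))
        results[name] = (dec, enc)
    return results


if __name__ == "__main__":
    for name, (dec, enc) in bench_utf8(make_samples()).items():
        print("%-16s decode %.6fs  encode %.6fs" % (name, dec, enc))
```

Running the same script against interpreters built before and after the split, on the same machine and compiler, gives the kind of comparison reported above. The absolute numbers mean little on their own.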



Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Maciej Fijalkowski

 I think you misunderstood. What I described is the reason for having
 the base codecs in unicodeobject.c.

 I think we all agree that inlining has a positive effect on
 performance. The scale of the effect depends on the used compiler
 and platform.


Well. Inlining can have positive or negative effects, depending on
various details. Too much inlining causes more cache misses for
example. However, this is absolutely irrelevant if you don't create
benchmarks and run them. Guessing is seriously not a very good
optimization strategy.

Cheers,
fijal


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Serhiy Storchaka

On 25.10.12 12:49, M.-A. Lemburg wrote:

I think you misunderstood. What I described is the reason for having
the base codecs in unicodeobject.c.


For example, PyUnicode_FromStringAndSize and PyUnicode_FromString are 
thin wrappers around PyUnicode_DecodeUTF8Stateful.  I think this is a 
reason to keep these functions together.
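The relationship can be observed from Python, at least on CPython, by calling the C API through ctypes. The sketch below is only an illustration: it uses the public PyUnicode_DecodeUTF8 rather than the Stateful variant, the helper names from_string_and_size and decode_utf8 are hypothetical, and it shows that the two entry points produce the same result, not how they are wired together internally.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
api = ctypes.pythonapi

# PyObject *PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
api.PyUnicode_FromStringAndSize.restype = ctypes.py_object
api.PyUnicode_FromStringAndSize.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]

# PyObject *PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
api.PyUnicode_DecodeUTF8.restype = ctypes.py_object
api.PyUnicode_DecodeUTF8.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t,
                                     ctypes.c_char_p]


def from_string_and_size(data):
    """Build a str from UTF-8 bytes via PyUnicode_FromStringAndSize."""
    return api.PyUnicode_FromStringAndSize(data, len(data))


def decode_utf8(data):
    """Build a str from UTF-8 bytes via PyUnicode_DecodeUTF8 (errors=NULL, i.e. strict)."""
    return api.PyUnicode_DecodeUTF8(data, len(data), None)
```

Both helpers return equal strings for the same valid UTF-8 input, which is consistent with the constructor being a wrapper around the UTF-8 decoder.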





Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Nick Coghlan
On Thu, Oct 25, 2012 at 8:07 PM, Maciej Fijalkowski fij...@gmail.com wrote:

 I think you misunderstood. What I described is the reason for having
 the base codecs in unicodeobject.c.

 I think we all agree that inlining has a positive effect on
 performance. The scale of the effect depends on the used compiler
 and platform.


 Well. Inlining can have positive or negative effects, depending on
 various details. Too much inlining causes more cache misses for
 example. However, this is absolutely irrelevant if you don't create
 benchmarks and run them. Guessing is seriously not a very good
 optimization strategy.

Yep, that's why I made the point that speed.python.org should be a
going concern well before 3.4 release, and will be able to let us know
if we have a problem relative to 3.3.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Antoine Pitrou

On 25/10/2012 02:03, Nick Coghlan wrote:


speed.python.org is also making progress, and once that is up and
running (which will happen well before any Python 3.4 release) it will
be possible to compare the numbers between 3.3 and trunk to help
determine the validity of any concerns regarding optimisations that
can be performed within a module but not across modules.


Nobody needs speed.python.org to run benchmarks before and after a 
specific change, though.  Cloning http://hg.python.org/benchmarks and 
using the perf.py runner is everything that is needed.


Moreover, you would want to run benchmarks *before* committing and 
pushing the changes. We don't want the huge splitting to be recorded and 
then backed out in the repository history.


Regards

Antoine.




Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Antoine Pitrou

On 25/10/2012 00:15, Nick Coghlan wrote:


However, -1 on the faux modularity idea of breaking up the files on
disk, but still exposing them to the compiler and linker as a monolithic
block, though. That would be completely missing the point of why large
source files are bad.


I disagree with you. Source files are meant to be read by humans, we 
don't really care whether the compiler has a modular view of the source 
code.


Regards

Antoine.




Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Larry Hastings


On 10/24/2012 03:15 PM, Nick Coghlan wrote:
Breaking such files up into separately compiled modules serves two 
purposes:


1. It proves that the code *isn't* a tangled monolithic mess;
2. It enlists the compilation toolchain's assistance in ensuring that 
remains the case in the future.




Either the code is a tangled monolithic mess or it isn't.  If it is, 
then let's fix that, regardless of the size of the file.  If it isn't, I 
don't see breaking up the code among multiple files as providing any 
benefit.  And I see no need for the toolchain's assistance to help us do 
something without benefit.  The line count of the file is essentially 
unrelated to its inherent quality / maintainability.



We are not special snow flakes - good software engineering practice is 
advisable for us as well, so a big +1 from me for breaking up the 
monstrosity that is unicodeobject.c and lowering the barrier to entry 
for hacking on the individual pieces. This should come with a large 
block comment in unicodeobject.c explaining how the pieces are put 
back together again.




I'm all for good software engineering practice.  But can you cite 
objective reasons why large source files are provably bad?  Not tangled 
monolithic messes, not poorly-factored code.  I agree that those are 
bad--but so far nobody has proposed that either of those is true about 
unicodeobject.c (unless you are implicitly doing so above), nor have 
they proposed credible remedies.  All I've seen is that unicodeobject.c 
is a large file, and some people want to break it up into smaller 
files.  I have yet to see anything but handwaving as justification.  For 
example, what is this barrier to entry you suggest exists to hacking on 
the str object, that will apparently be dispelled simply by splitting 
one file into multiple files?


Someone proposed breaking up unicodeobject.c into three distinct 
subsystems and putting those in separate files.  I still don't agree.  
It seems natural to me to have everything associated with the str object 
in one file, just as we do with every other object I can think of.  If 
this were a genuinely good idea, we should consider doing it with every 
similar object.  But nobody is proposing that.  My guess is because the 
other files in CPython are small enough.  At which point we're right 
back to the primary motivation simply being the line count of 
unicodeobject.c, as a purely aesthetic and subjective judgment.



//arry/


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Antoine Pitrou
On Thu, 25 Oct 2012 08:13:53 -0700
Larry Hastings la...@hastings.org wrote:
 
 I'm all for good software engineering practice.  But can you cite 
 objective reasons why large source files are provably bad?  Not tangled 
 monolithic messes, not poorly-factored code.  I agree that those are 
 bad--but so far nobody has proposed that either of those is true about 
 unicodeobject.c (unless you are implicitly doing so above)

Well, tangled monolithic mess is quite true about unicodeobject.c,
IMO.
Seriously, I agree with Victor: navigating around unicodeobject.c is a
PITA. Perhaps it isn't if you are using emacs, or you have 35 fingers,
or just a lot of spare time, but in my experience it's painful.

Regards

Antoine.




Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-25 Thread Stephen J. Turnbull
Antoine Pitrou writes:

  Well, tangled monolithic mess is quite true about unicodeobject.c,
  IMO.

s/object.c// and your point remains valid.  Just reading the table of
contents for UTR#17 (http://www.unicode.org/reports/tr17/) should
convince you that it's not going to be easy to produce an elegant
implementation!

  Seriously, I agree with Victor: navigating around unicodeobject.c is a
  PITA. Perhaps it isn't if you are using emacs, or you have 35 fingers,
  or just a lot of spare time, but in my experience it's painful.

Sure, but I don't know of a Unicode implementation which isn't.

I don't think that having a unicode/*.[ch] with a dozen files
(including the README etc) in it is going to make it much more
navigable.  If there are too many files, it's going to be a PITA to
maintain because there won't be an obvious place to put certain
functions.  Eg, I've already mentioned my suspicions about the charmap
code (I apologize for not reading Victor's code to confirm them).

I don't object in principle to splitting the unicodeobject.c.  At the
very least, with all due respect to MAL, XEmacs experience with coding
systems (the Emacs equivalent of Python codecs) suggests that there is
very little to be lost by moving the codec implementations to a
separate file from the Unicode object implementation.  (Here I'm
talking about codecs in the narrow sense of wire-format to Python3 str
and back, not the more general Python2 sense that included zip and
base64 and so on.  Ie, PyUnicode_Translate is not a codec in the
relevant sense.)

On the other hand, I wouldn't be surprised if (despite my earlier
suggestion) codecs and unicode object internals need a close
relationship.  (My intuition and sense of style says splitting codecs
from the low level memory management and PEP 393 stuff is a good idea,
but I'm not confident it would have no impact on performance.)


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Larry Hastings

On 10/23/2012 09:29 AM, Georg Brandl wrote:

Especially since you're suggesting a huge number of new files, I question the
argument of better navigability.


FWIW I'm -1 on it too.  I don't see what the big deal is with large 
source files.  If you have difficulty finding your way around 
unicodeobject.c, that seems more like a tooling issue to me, not a 
source code structural issue.



//arry/


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Nick Coghlan
On Oct 25, 2012 2:06 AM, Larry Hastings la...@hastings.org wrote:

 On 10/23/2012 09:29 AM, Georg Brandl wrote:

 Especially since you're suggesting a huge number of new files, I question
 the argument of better navigability.


 FWIW I'm -1 on it too.  I don't see what the big deal is with large
 source files.  If you have difficulty finding your way around
 unicodeobject.c, that seems more like a tooling issue to me, not a
 source code structural issue.

OK, I need to weigh in after seeing this kind of reply. Large source files
are discouraged in general because they're a code smell that points
strongly towards a *lack of modularity* within a *complex piece of
functionality*.

Breaking such files up into separately compiled modules serves two purposes:
1. It proves that the code *isn't* a tangled monolithic mess;
2. It enlists the compilation toolchain's assistance in ensuring that
remains the case in the future.

I find complaints about the ease of searching within the file to be
misguided and irrelevant, as I can just as easily reply with "if searching
across multiple files is hard for you, use better tools, like grep, or
'Find in Files'".

Note that I also consider the pro argument about better navigability
inaccurate - the real gain is in *modularity*, making it clear to readers
which parts can be understood and worked on separately from each other.

We are not special snow flakes - good software engineering practice is
advisable for us as well, so a big +1 from me for breaking up the
monstrosity that is unicodeobject.c and lowering the barrier to entry for
hacking on the individual pieces. This should come with a large block
comment in unicodeobject.c explaining how the pieces are put back together
again.

However, -1 on the faux modularity idea of breaking up the files on disk,
but still exposing them to the compiler and linker as a monolithic block,
though. That would be completely missing the point of why large source
files are bad.

Regards,
Nick.

--
Sent from my phone, thus the relative brevity :)





Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Barry Warsaw
On Oct 25, 2012, at 08:15 AM, Nick Coghlan wrote:

OK, I need to weigh in after seeing this kind of reply. Large source files
are discouraged in general because they're a code smell that points
strongly towards a *lack of modularity* within a *complex piece of
functionality*.

Modularity is good, and the file system structure of the project should
reflect that, but to be effective, it needs to be obvious.  It's pretty
obvious what's generally in intobject.c.  I've worked with code bases where
there's no rhyme nor reason as to what you'd find in a particular file, and
this really hurts.

It hurts even with good tools.  Remember that sometimes you don't even know
what you're looking for, so search tools may not be very useful.  For example,
sometimes you want to understand how all the pieces fit together, what the
holistic view of the subsystem is, or where the entry points are.  Search
tools are not very good at this, and if it's a subsystem you only interact
with occasionally, having a file system organization that makes it easier
to remember what you learned the last time you were there helps enormously.

Another point: rather than large files (or maybe in addition to them), large
functions can also be painful to navigate.  So just splitting a file into
subfiles may not be the only modularity improvement you can make.

While I'm personally -0 about splitting up unicodeobject.c, if the folks
advocating for it go ahead with it, I just ask that you do it very carefully,
with an eye toward the casual and newbie reader of our code base.

Cheers,
-Barry


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Nick Coghlan
On Thu, Oct 25, 2012 at 8:37 AM, Barry Warsaw ba...@python.org wrote:
 On Oct 25, 2012, at 08:15 AM, Nick Coghlan wrote:

OK, I need to weigh in after seeing this kind of reply. Large source files
are discouraged in general because they're a code smell that points
strongly towards a *lack of modularity* within a *complex piece of
functionality*.

 Modularity is good, and the file system structure of the project should
 reflect that, but to be effective, it needs to be obvious.  It's pretty
 obvious what's generally in intobject.c.  I've worked with code bases where
 there's no rhyme nor reason as to what you'd find in a particular file, and
 this really hurts.

 It hurts even with good tools.  Remember that sometimes you don't even know
 what you're looking for, so search tools may not be very useful.  For example,
 sometimes you want to understand how all the pieces fit together, what the
 holistic view of the subsystem is, or where the entry points are.  Search
 tools are not very good at this, and if it's a subsystem you only interact
 with occasionally, having a file system organization that makes it easier
 to remember what you learned the last time you were there helps enormously.

And if we were talking in the abstract, I think these would be
reasonable concerns to bring up. However, Victor's proposed division
*is* logical (especially if he goes down the path of a separate
subdirectory which will better support easy searching across all of
the unicode object related files), and I conditioned my +1 with the
requirement that a road map be provided in a leading block comment in
unicodeobject.c.

speed.python.org is also making progress, and once that is up and
running (which will happen well before any Python 3.4 release) it will
be possible to compare the numbers between 3.3 and trunk to help
determine the validity of any concerns regarding optimisations that
can be performed within a module but not across modules.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Stephen J. Turnbull
Nick Coghlan writes:

  OK, I need to weigh in after seeing this kind of reply. Large source files
  are discouraged in general because they're a code smell that points
  strongly towards a *lack of modularity* within a *complex piece of
  functionality*.

Sure, but large numbers of tiny source files are also a code smell,
the smell of purist adherence to the literal principle of modularity
without application of judgment.

If you want to argue that the pragmatic point of view nevertheless is
to break up the file, I can see that, but I think Victor is going too
far.  (Full disclosure dept.: the call graph of the Emacs equivalents
is isomorphic to the Dungeon of Zork, so I may be a bit biased.)  You
really should speak to the question of how many and what partition.

  the real gain is in *modularity*, making it clear to readers which
  parts can be understood and worked on separately from each other.

Yeah, so which do you think they are?  It seems to me that there are
three modules to be carved out of unicodeobject.c:

1.  The internal object management that is not exposed to Python:
allocation, deallocation, and PEP 393 transformations.

2.  The public interface to Python implementation: methods and
properties, including operators.

3.  Interaction with the outside world: codec implementations.  But
conceptually, these really don't have anything to do with internal
implementation of Unicode objects.  They're just functions that
convert bytes to Unicode and vice versa.  In principle they can be
written in terms of ord(), chr(), and bytes().  On the other hand,
they're rather repetitive: When you've seen one codec
implementation, you've seen them all.  I see no harm in grouping
them in one file, and possibly a gain from proximity: casual
passers-by might see refactorings that reduce redundancy.
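
To make point 3 concrete: a codec is, at bottom, a function from code
points to bytes (or back) that needs nothing from the object internals.
A minimal sketch of a UTF-8 encoder for one code point follows. The name
sketch_utf8_encode is made up for illustration; it is not a CPython
function and is not how unicodeobject.c implements its codecs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython API: encode one code point as UTF-8.
   Writes up to 4 bytes into out and returns the number of bytes written,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t
sketch_utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

Nothing here touches allocation or the PEP 393 representation, which is
the sense in which such code could live in its own file.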

I'm not sure what to do with the charmap stuff.  In current CPython
head it seems incoherent to me: there's an IO codec, but there's also
unicode-to-unicode stuff (PyUnicode_Translate).  I haven't had time to
look at Victor's reorganization to see what he actually did with it,
but in terms of modularity, it seems to me that refactoring this stuff
would be a real win, as opposed to splitting the files which is
presentational improvement for the rest of the code which is pretty
modular.

As for Victor's proposal itself:

  1176 Objects/unicodecharmap.c
  1678 Objects/unicodecodecs.c
  1362 Objects/unicodeformat.c
   253 Objects/unicodeimpl.h
   733 Objects/unicodelegacy.c
  1836 Objects/unicodenew.c
  2777 Objects/unicodeobject.c
  2421 Objects/unicodeoperators.c
  1235 Objects/unicodeoscodecs.c
  1288 Objects/unicodeutfcodecs.c

As Victor himself admits, unicodelegacy and unicodenew are not
descriptive of what they contain.  In I18N discussions, legacy is
usually a deprecatory reference to non-Unicode encodings, and I would
tend to guess this file contains codecs from the name.  A better name
might be unicodedeprecated (if what he really means is deprecated
APIs).

I don't understand why splitting out unicodeoperators is a great
idea; it's done nowhere else in CPython.  If that makes sense, why not
split out unicodemethods (for methods normally invoked explicitly
rather than by syntax) too?  N.B. For bytes, the corresponding file is
spelled bytes_methods.

unicodecodecs vs unicodeutfcodecs: Say what?  I would forever be
looking in the wrong one.

unicodeoscodecs suggests to me that these codecs are only usable on
some OSes.  If so, shouldn't the relevant OS be in the name?  If not,
the name is basically misleading IMO.

Why are any of these codecs here in unicodeobjectland in the first
place?  Sure, they're needed so that Python can find its own stuff,
but in principle *any* codec could be needed.  Is it just a heuristic
that the codecs needed for 99% of the world are here, and other codecs
live in separate modules?

Steve


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-23 Thread Benjamin Peterson
2012/10/22 Victor Stinner victor.stin...@gmail.com:
 Hi,

 I forked CPython repository to work on my split unicodeobject.c project:
 http://hg.python.org/sandbox/split-unicodeobject.c

 The result is 10 files (including the existing unicodeobject.c):

    1176 Objects/unicodecharmap.c
    1678 Objects/unicodecodecs.c
    1362 Objects/unicodeformat.c
     253 Objects/unicodeimpl.h
     733 Objects/unicodelegacy.c
    1836 Objects/unicodenew.c
    2777 Objects/unicodeobject.c
    2421 Objects/unicodeoperators.c
    1235 Objects/unicodeoscodecs.c
    1288 Objects/unicodeutfcodecs.c
   14759 total

 This is just a proposition (and work in progress). Everything can be changed 
 :-)

 unicodenew.c is not a good name. Content of this file may be moved
 somewhere else.

 Some files may be merged again if the separation is not justified.

 I don't like the unicode prefix for filenames, I would prefer a new 
 directory.

 --

 Shorter files are easier to review and maintain. The compilation is
 faster if only one file is modified.

 The MBCS codec requires windows.h. The whole unicodeobject.c includes
 it just for this codec. With the split, only unicodeoscodecs.c
 includes this file.

 The MBCS codec also needs a winver variable. This variable is
 defined between the BLOOM filter and the unicode_result_unchanged()
 function. How can you explain how these things are sorted? Where
 should I add a new function or variable? With the split, the variable
 is now defined very close to where it is used. You don't have to
 scroll 7000 lines to see where it is used.

 If you would like to work on a specific function, you don't have to
 use the search function of your editor to skip thousands of lines. For
 example, the 18 functions and 2 types related to the charmap codec are
 now grouped into a single short C file.

 It was already possible to extend and maintain unicodeobject.c (some
 people proved it!), but it should now be much simpler with shorter
 files.

I would like to repeat my opposition to splitting unicodeobject.c. I
don't think the benefits of such a split have been well justified,
certainly not to the point that the claim about much simpler
maintenance is true.


-- 
Regards,
Benjamin


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-23 Thread M.-A. Lemburg
On 23.10.2012 10:22, Benjamin Peterson wrote:
 2012/10/22 Victor Stinner victor.stin...@gmail.com:
 Hi,

 I forked CPython repository to work on my split unicodeobject.c project:
 http://hg.python.org/sandbox/split-unicodeobject.c

 The result is 10 files (including the existing unicodeobject.c):

    1176 Objects/unicodecharmap.c
    1678 Objects/unicodecodecs.c
    1362 Objects/unicodeformat.c
     253 Objects/unicodeimpl.h
     733 Objects/unicodelegacy.c
    1836 Objects/unicodenew.c
    2777 Objects/unicodeobject.c
    2421 Objects/unicodeoperators.c
    1235 Objects/unicodeoscodecs.c
    1288 Objects/unicodeutfcodecs.c
   14759 total

 This is just a proposition (and work in progress). Everything can be changed 
 :-)

 unicodenew.c is not a good name. Content of this file may be moved
 somewhere else.

 Some files may be merged again if the separation is not justified.

 I don't like the unicode prefix for filenames, I would prefer a new 
 directory.

 --

 Shorter files are easier to review and maintain. The compilation is
 faster if only one file is modified.

 The MBCS codec requires windows.h. The whole unicodeobject.c includes
 it just for this codec. With the split, only unicodeoscodecs.c
 includes this file.

 The MBCS codec also needs a winver variable. This variable is
 defined between the BLOOM filter and the unicode_result_unchanged()
 function. How can you explain how these things are sorted? Where
 should I add a new function or variable? With the split, the variable
 is now defined very close to where it is used. You don't have to
 scroll 7000 lines to see where it is used.

 If you would like to work on a specific function, you don't have to
 use the search function of your editor to skip thousands of lines. For
 example, the 18 functions and 2 types related to the charmap codec are
 now grouped into a single short C file.

 It was already possible to extend and maintain unicodeobject.c (some
 people proved it!), but it should now be much simpler with shorter
 files.
 
 I would like to repeat my opposition to splitting unicodeobject.c. I
 don't think the benefits of such a split have been well justified,
 certainly not to the point that the claim about much simpler
 maintenance is true.

Same feelings here.

If you do go ahead with such a split, please only split the source
files and keep the unicodeobject.c file which then includes all
the other files. Such a restructuring should not result in compilers
no longer being able to optimize code by inlining functions
in one of the most important basic types we have in Python 3.

Also note that splitting the file into multiple smaller ones will
actually create more maintenance overhead, since patches will
likely no longer be easy to merge from 3.3 to 3.4.

BTW: The positive effect of having everything in one file is
that you don't have to figure out which file to look in when
trying to find a piece of logic... it's just a ctrl-f or
ctrl-s away :-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-23 Thread Victor Stinner
 Such a restructuring should not result in compilers
 no longer being able to optimize code by inlining functions
 in one of the most important basic types we have in Python 3.

I agree that performance is important. But I'm not convinced that
moving functions has a real impact on performance, nor that such
issues cannot be fixed.

I tried to limit changes impacting performances. Inlining is (only?)
interesting for short functions. PEP 393 introduces many macros for
this. I also added some Fast functions
(_PyUnicode_FastCopyCharacters() and _PyUnicode_FastFill()) which
don't check parameters and do the real work. I don't think that it's
really useful to inline _PyUnicode_FastFill() in the caller for
example.

I will check performances of all str methods. For example, str.count()
is now calling PyUnicode_Count() instead of the static count().
PyUnicode_Count() adds some extra checks, some of them are not
necessary, and it's not a static function, so it cannot(?) be inlined.
But I bet that the overhead is really low.
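
The split between an unchecked static worker and a checked entry point
with external linkage can be sketched as follows. The names below are
hypothetical stand-ins and are not the CPython functions; the sketch only
shows why the checked wrapper costs little when the worker stays visible
to the compiler in the same translation unit.

```c
#include <stddef.h>

/* Hypothetical names; these are not the CPython functions. */

/* Unchecked worker: trusts its arguments and does the real work.
   Being static inline, it can be inlined into callers in this file. */
static inline size_t
sketch_fast_count(const char *s, size_t len, char ch)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == ch)
            n++;
    }
    return n;
}

/* Checked entry point with external linkage: validates its arguments,
   then delegates. Returns -1 on invalid arguments. Callers in other
   files pay for one real call plus the checks. */
long
sketch_count(const char *s, size_t len, char ch)
{
    if (s == NULL)
        return -1;
    return (long)sketch_fast_count(s, len, ch);
}
```

Whether the extra call and checks are measurable depends on the workload,
which is why benchmarking the str methods is the sensible next step.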

Note: Since GCC 4.5, Link Time Optimization is possible. I don't know
if GCC is able to inline functions defined in different files, but C
compilers get better with each release.

--

I will check the performance impact on _PyUnicode_Widen() and
_PyUnicode_Putchar(), which are no longer static. _PyUnicode_Widen() and
_PyUnicode_Putchar() are used in Unicode codecs when it's more
expensive to compute the exact length and maximum character of the
output string. These functions are optimistic (they hope that the output
will not grow too much and that the string is not widened too many times,
so it should be faster for ASCII).

I implemented a similar approach in my PyUnicodeWriter API, and I plan
to reuse this API to simplify the API. PyUnicodeWriter uses some macros
to limit the overhead of having to check before each write if we need
to enlarge or widen the internal buffer, and allows writing directly
into the buffer using low level functions like PyUnicode_WRITE.

I also hope for a performance improvement because the PyUnicodeWriter API
can also overallocate the internal buffer to reduce the number of
calls to realloc() (which is usually slow).
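
The overallocation idea can be sketched with a plain byte buffer. This is
not the PyUnicodeWriter API: the type, the function names, and the growth
policy (25% plus 8 bytes) are all assumptions picked for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not the PyUnicodeWriter API. */
typedef struct {
    char *data;
    size_t len;       /* bytes in use */
    size_t cap;       /* bytes allocated */
    size_t reallocs;  /* number of realloc() calls, kept for illustration */
} sketch_writer;

void
sketch_writer_init(sketch_writer *w)
{
    w->data = NULL;
    w->len = 0;
    w->cap = 0;
    w->reallocs = 0;
}

/* Append n bytes from src. Returns 0 on success, -1 if realloc() fails.
   Size overflow is not handled in this sketch. */
int
sketch_writer_write(sketch_writer *w, const char *src, size_t n)
{
    if (n > w->cap - w->len) {
        size_t need = w->len + n;
        /* Ask for more than needed so that the next few writes fit
           without another realloc(). */
        size_t newcap = need + need / 4 + 8;
        char *p = realloc(w->data, newcap);
        if (p == NULL)
            return -1;
        w->data = p;
        w->cap = newcap;
        w->reallocs++;
    }
    memcpy(w->data + w->len, src, n);
    w->len += n;
    return 0;
}

void
sketch_writer_free(sketch_writer *w)
{
    free(w->data);
    sketch_writer_init(w);
}
```

With this policy, appending one byte at a time triggers a realloc() only
when the slack runs out, instead of once per write.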

 Also note that splitting the file in multiple smaller ones will
 actually create more maintenance overhead, since patches will
 likely no longer be easy to merge from 3.3 to 3.4.

I'm a candidate to maintain unicodeobject.c. If you check
unicodeobject.c's (recent) history, I have been one of the most active
developers on this file for the past two years (especially in 2012). I'm
not sure that merges on this file are so hard.

Victor


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-23 Thread Antoine Pitrou

On 23/10/2012 12:05, Victor Stinner wrote:

Such a restructuring should not result in compilers
no longer being able to optimize code by inlining functions
in one of the most important basic types we have in Python 3.


I agree that performance is important. But I'm not convinced that
moving functions has a real impact on performance, nor that such
issues cannot be fixed.


I agree with Marc-André, there's no point in compiling those files 
separately. #include'ing them in the master unicodeobject.c file is fine.


Regards

Antoine.




Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-23 Thread Amaury Forgeot d'Arc
2012/10/23 Antoine Pitrou solip...@pitrou.net:
 I agree with Marc-André, there's no point in compiling those files
 separately. #include'ing them in the master unicodeobject.c file is fine.

I also find unicodeobject.c difficult to navigate.
Even if we don't split the file, I'd advocate a better presentation of
its content.

Could we have at least clear sections, with titles and descriptions?
And use the ^L page separator for Emacs users?

Code in posixmodule.c could also benefit from a better layout.

-- 
Amaury Forgeot d'Arc


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-07 Thread Victor Stinner
 The amount of code will not be reduced, but now you also need to guess what
 file some piece of functionality may be in.

How do you search for a piece of code? If you search for a function by its
name, it does not matter in which file it is defined if you use an IDE or
vim/emacs with a correct configuration. For example, I type :tag
PyUnicode_Format to go to the PyUnicode_Format() function.

 Instead of having my text editor
 (Emacs) search in one file, it will have to search across multiple files -
 but not across all open buffers, but only some of them (since I will have
 many other source files open as well).

Does it mean that it would be more practical to merge all C files into
one unique file?

 I really fail to see what problem people have with large source files.
 What is it that you want to do that can be done easier if it's multiple
 files?

Another problem with huge files is to handle dependencies with
static functions. If the function A calls the function B which calls
the function C, you have to order A, B and C correctly if these
functions are private and not declared at the top of the file.

If functions are grouped correctly, you just have to add the function
to the right file, or reorder the files.

I also prefer short files because it's easier to review/audit a small
file. My brain cannot store too many functions :-)

Victor


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-07 Thread Benjamin Peterson
2012/10/7 Victor Stinner victor.stin...@gmail.com:
 Another problem with huge files is to handle dependencies with
 static functions. If the function A calls the function B which calls
 the function C, you have to order A, B and C correctly if these
 functions are private and not declared at the top of the file.

Having separate files doesn't alleviate this, though. If they're in
separate files, you have to have header files of prototypes.



-- 
Regards,
Benjamin


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-07 Thread Chris Angelico
On Mon, Oct 8, 2012 at 8:17 AM, Victor Stinner victor.stin...@gmail.com wrote:
 Another problem with huge files is to handle dependencies with
 static functions. If the function A calls the function B which calls
 the function C, you have to order A, B and C correctly if these
 functions are private and not declared at the top of the file.

 If functions are grouped correctly, you just have to add the function
 to the right file, or reorder the files.

This isn't a fundamental problem, since you can always declare a
private function if it's mutually recursive with another private
function. But - forgive me if this is false in CPython - this isn't
usually that common. Also, ordering the functions in (at least an
approximation of) Define Before Use makes it easy to locate the one
you're calling, even in a non-smart editor: just go to the top of the
file and search for the function's name; the first hit will be the
definition. It's not usually difficult to sort functions
appropriately, and can pay dividends in readability.
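
Both cases can be sketched in a few lines of C. The function names are
made up for illustration. The first pair is mutually recursive, so one
forward declaration is unavoidable; the second pair is simply ordered
Define Before Use and needs none.

```c
/* Mutual recursion between private functions: one of the two has to be
   declared before it is defined. */
static int sketch_is_odd(unsigned n);

static int
sketch_is_even(unsigned n)
{
    return n == 0 ? 1 : sketch_is_odd(n - 1);
}

static int
sketch_is_odd(unsigned n)
{
    return n == 0 ? 0 : sketch_is_even(n - 1);
}

/* Define Before Use: the callee comes first, so no prototype is needed
   and the first hit for the name when searching from the top of the
   file is its definition. */
static int
sketch_twice(int x)
{
    return 2 * x;
}

static int
sketch_twice_plus_one(int x)
{
    return sketch_twice(x) + 1;
}
```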

ChrisA


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-07 Thread martin


Quoting Victor Stinner victor.stin...@gmail.com:


The amount of code will not be reduced, but now you also need to guess what
file some piece of functionality may be in.


How do you search for a piece of code?


I type /pattern in vim, or Ctrl-s (incremental search) in Emacs.


If you search for a function by its
name, it does not matter in which file it is defined if you use an IDE or
vim/emacs with a correct configuration. For example, I type :tag
PyUnicode_Format to go to the PyUnicode_Format() function.


I don't like tag files. I want to search in all source code (including
comments and strings), and I want to do a substring search (not sure
whether that is supported in tag files).


Instead of having my text editor
(Emacs) search in one file, it will have to search across multiple files -
but not across all open buffers, but only some of them (since I will have
many other source files open as well).


Does it mean that it would be more practical to merge all C files into
one unique file?


That would be extreme, of course. It may cause problems with the
responsiveness of the editor, and with compile times; it may also cause
problems with merging in version control. In addition, there might
be naming conflicts which make it impractical (e.g. many structures
containing the same tp_* struct slots, so when you search for tp_new,
for example, you would get too many hits).

But in principle, I don't mind maintaining *very* large source files.
unicodeobject.c isn't really *that* large.



What is it that you want to do that can be done easier if it's multiple
files?


Another problem with huge files is to handle dependencies with
static functions. If the function A calls the function B which calls
the function C, you have to order A, B and C correctly if these
functions are private and not declared at the top of the file.

If functions are grouped correctly, you just have to add the function
to the right file, or reorder the files.


I don't understand. Do you envision that A, B, and C are in separate files?
If so, they cannot all be static anymore, unless you still combine all files
with #include directives, or unless you still put them all in the same file.
I don't see how multiple files give any improvement. It seems to make matters
worse:
- if you put A, B, C in the same file, you have the same issue that you
  had when unicodeobject.c was a large file - you have to order them
  correctly.
- if you put them in different files, it gets worse: you need to place
  A in a file that gets included after the file that has B, even if it
  would be more logical to put them in the reverse order.


I also prefer short files because it's easier to review/audit a small
file. My brain cannot store too many functions :-)


This is what I don't understand. Why do you have to remember all functions
when reviewing or auditing a file? You can safely ignore all functions
but the one you are reviewing - whether the other functions are in a different
file or in the same file.

Why can you ignore the functions only if they are stored in a different
file?

Regards,
Martin





Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-05 Thread M.-A. Lemburg
Victor Stinner wrote:
 Hi,
 
 I would like to split the huge unicodeobject.c file into smaller
 files. It's just the longest C file of CPython: 14,849 lines.
 
 I don't know exactly how to split it, but first I would like to know
 if you would agree with the idea.
 
 Example:
  - Objects/unicode/codecs.c
  - Objects/unicode/mod_format.c
  - Objects/unicode/methods.c
  - Objects/unicode/operators.c
  - etc.
 
 I don't know if it's better to use a subdirectory, or use a prefix for
 new files: Objects/unicode_methods.c, Objects/unicode_codecs.c, etc.
 There is already a Python/codecs.c file for example (same filename).

Better follow the already existing pattern of using unicode as
prefix, e.g. unicodectype.c and unicodetype_db.h.

 I would like to split the unicodeobject.c because it's hard to
 navigate in this huge file between all functions, variables, types,
 macros, etc. It's hard to add new code and to fix bugs. For example,
 the implementation of str%args takes 1000 lines, 2 types and 10
 functions (since my refactor yesterday, in Python 3.3 the main
 function is 500 lines long :-)).
 
 I only see one argument against such refactoring: it will be harder to
 backport/forwardport bugfixes.

When making such a change, you have to pay close attention to
functions that the compiler can potentially inline. AFAIK, moving
such functions into a separate file would prevent such
inlining/optimizations, e.g. the str formatter wouldn't be
able to inline codec calls if placed in separate .c files.

It may be better to split the file into multiple .h files which
then get recombined into the one unicodeobject.c file.
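
A rough sketch of that layout, collapsed into a single listing so it can
be compiled as is. The banner comments mark where each fragment file
would begin; the file and function names are invented for illustration
and do not exist in CPython.

```c
#include <stddef.h>

/* Everything below is one translation unit. The banner comments mark
   where each hypothetical fragment file would start. */

/* ---- fragment: sketch_codecs.h ---- */
/* Upper-case one character (assumes an ASCII-compatible charset). */
static inline char
sketch_ascii_upper(char ch)
{
    return (ch >= 'a' && ch <= 'z') ? (char)(ch - 'a' + 'A') : ch;
}

/* ---- fragment: sketch_format.h ---- */
/* Calls into the codecs fragment; both are static inline in the same
   translation unit, so the compiler is free to inline the call. */
static inline void
sketch_upper_in_place(char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i] = sketch_ascii_upper(s[i]);
}

/* ---- master file ----
   In the split layout this file would contain little more than
       #include "sketch_codecs.h"
       #include "sketch_format.h"
   plus the externally visible entry points such as the one below. */
void
sketch_upper(char *s, size_t n)
{
    sketch_upper_in_place(s, n);
}
```

The fragments must be included in dependency order, and the whole unit is
recompiled when any fragment changes.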

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-05 Thread Chris Jerdonek
On Thu, Oct 4, 2012 at 6:49 PM, Stephen J. Turnbull step...@xemacs.org wrote:
 Chris Jerdonek writes:

   You can create multiple files this way.  I just verified it.  But the
   problem happens with merging.  You will create merge conflicts in the
   deleted portions of every split file on every merge.  There may be a
   way to avoid this that I don't know about though (i.e. to record that
   merges into the deleted portions should no longer occur).
 ...
 There's no other way to do it that I know of in any VCS because they
 all track conflicts at the file level.  (It would be straightforward
 to generalize git to handle this gracefully, but it would be a hugely
 disruptive change.  I don't know if Mercurial would be susceptible to
 such an extension.)

FWIW, I filed an issue in Mercurial's tracker to add support for
splitting files and copying subsets of files:

http://bz.selenic.com/show_bug.cgi?id=3649

As I thought it might be, the idea was rejected.

--Chris


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Andrew Svetlov
I like the idea. From my perspective it is better to use a subdirectory,
for the sake of easy grep-style searching.

On Thu, Oct 4, 2012 at 11:30 PM, Victor Stinner
victor.stin...@gmail.com wrote:
 Hi,

 I would like to split the huge unicodeobject.c file into smaller
 files. It's just the longest C file of CPython: 14,849 lines.

 I don't know exactly how to split it, but first I would like to know
 if you would agree with the idea.

 Example:
  - Objects/unicode/codecs.c
  - Objects/unicode/mod_format.c
  - Objects/unicode/methods.c
  - Objects/unicode/operators.c
  - etc.

 I don't know if it's better to use a subdirectory, or use a prefix for
 new files: Objects/unicode_methods.c, Objects/unicode_codecs.c, etc.
 There is already a Python/codecs.c file for example (same filename).

 I would like to split the unicodeobject.c because it's hard to
 navigate in this huge file between all functions, variables, types,
 macros, etc. It's hard to add new code and to fix bugs. For example,
 the implementation of str%args takes 1000 lines, 2 types and 10
 functions (since my refactor yesterday, in Python 3.3 the main
 function is 500 lines long :-)).

 I only see one argument against such refactoring: it will be harder to
 backport/forwardport bugfixes.

 Victor



-- 
Thanks,
Andrew Svetlov


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Benjamin Peterson
2012/10/4 Victor Stinner victor.stin...@gmail.com:
 I only see one argument against such refactoring: it will be harder to
 backport/forwardport bugfixes.

I imagine it could also prevent inlining of hot paths.



-- 
Regards,
Benjamin


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Victor Stinner
2012/10/4 Benjamin Peterson benja...@python.org:
 2012/10/4 Victor Stinner victor.stin...@gmail.com:
 I only see one argument against such refactoring: it will be harder to
 backport/forwardport bugfixes.

 I imagine it could also prevent inlining of hot paths.

It depends on how the code is compiled. The stringlib is split into many
.h files, yet it is still able to use Py_LOCAL_INLINE.

If the code is grouped correctly, we may not lose any nice optimization at all.

FYI #include "test.c" is allowed in C ;-)
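
As a rough illustration of the inlining point (a sketch only: the file
names in the comments and the function names are hypothetical, not
proposed CPython code). If the pieces are pulled back into a single
translation unit with #include, a static inline helper stays visible to
every caller, so the compiler can still inline it as it could in the one
big file:

```c
/* Sketch only: hypothetical names, not actual CPython code.
 * In a real split each section below would be its own file and one
 * translation unit would #include them. They are kept in a single
 * listing here so the example compiles as-is. */
#include <assert.h>
#include <stddef.h>

/* ---- would live in e.g. unicode_helpers.h (hypothetical) ---- */

/* Small hot-path helper. It is static inline and defined before its
 * callers, so it remains a candidate for inlining. */
static inline int
is_ascii_byte(unsigned char c)
{
    return c < 0x80;
}

/* ---- would live in e.g. unicode_methods.c (hypothetical) ---- */

/* Public entry point: count the ASCII bytes in a buffer of length n. */
size_t
count_ascii(const unsigned char *s, size_t n)
{
    size_t count = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (is_ascii_byte(s[i])) {
            count++;
        }
    }
    return count;
}
```

If the pieces were compiled as separate translation units instead, the
helper would have to sit in a shared header (as stringlib does) to stay
inlinable across files, unless link-time optimization is used.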

Victor


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Benjamin Peterson
2012/10/4 Victor Stinner victor.stin...@gmail.com:
 2012/10/4 Benjamin Peterson benja...@python.org:
 2012/10/4 Victor Stinner victor.stin...@gmail.com:
 I only see one argument against such refactoring: it will be harder to
 backport/forwardport bugfixes.

 I imagine it could also prevent inlining of hot paths.

 It depends on how the code is compiled. The stringlib is split into many
 .h files, yet it is still able to use Py_LOCAL_INLINE.

 If the code is grouped correctly, we may not lose any nice optimization at all.

 FYI #include "test.c" is allowed in C ;-)

Yes, but then compilation won't be any faster. ;)



-- 
Regards,
Benjamin


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Chris Jerdonek
On Thu, Oct 4, 2012 at 1:30 PM, Victor Stinner victor.stin...@gmail.com wrote:
 I would like to split the huge unicodeobject.c file into smaller
 files. It's just the longest C file of CPython: 14,849 lines.
 ...
 I only see one argument against such refactoring: it will be harder to
 backport/forwardport bugfixes.

I am not siding with either side of the change yet, but an additional
argument against is that history may become less convenient to
navigate and track (e.g. hg annotate may lose information depending on
how the split is done).

Do we have a preferred way to split files?  For example, hg rename
could be used just for the largest chunk.  Or hg copy could be used on
all chunks but one.  I imagine (but have not confirmed) that the
latter would preserve hg annotate and let merges propagate to all
files, but it would also result in spurious merge conflicts in every
one of the files resulting from the split.

--Chris


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Victor Stinner
 I am not siding with either side of the change yet, but an additional
 argument against is that history may become less convenient to
 navigate and track (e.g. hg annotate may lose information depending on
 how the split is done).

If new files are created using "hg cp unicodeobject.c
unicode/newfile.c", the history is kept.

Victor


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Benjamin Peterson
2012/10/4 Victor Stinner victor.stin...@gmail.com:
 I am not siding with either side of the change yet, but an additional
 argument against is that history may become less convenient to
 navigate and track (e.g. hg annotate may lose information depending on
 how the split is done).

 If new files are created using "hg cp unicodeobject.c
 unicode/newfile.c", the history is kept.

Yes, but you can only create one file that way.



-- 
Regards,
Benjamin


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Victor Stinner
2012/10/5 Benjamin Peterson benja...@python.org:
 2012/10/4 Victor Stinner victor.stin...@gmail.com:
 If new files are created using "hg cp unicodeobject.c
 unicode/newfile.c", the history is kept.

 Yes, but you can only create one file that way.

You can create as many files as you want. Try:
---
hg cp unicodeobject.c unicode2.c
hg cp unicodeobject.c unicode3.c
hg ci -m "add new files"
# edit unicode2.c (remove most lines)
# edit unicode3.c (remove most lines, but other lines)
hg ci -m "modify"
hg blame unicode2.c
hg blame unicode3.c
---

Victor


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Chris Jerdonek
On Thu, Oct 4, 2012 at 4:31 PM, Benjamin Peterson benja...@python.org wrote:
 2012/10/4 Victor Stinner victor.stin...@gmail.com:
 I am not siding with either side of the change yet, but an additional
 argument against is that history may become less convenient to
 navigate and track (e.g. hg annotate may lose information depending on
 how the split is done).

 If new files are created using "hg cp unicodeobject.c
 unicode/newfile.c", the history is kept.

 Yes, but you can only create one file that way.

You can create multiple files this way.  I just verified it.  But the
problem happens with merging.  You will create merge conflicts in the
deleted portions of every split file on every merge.  There may be a
way to avoid this that I don't know about though (i.e. to record that
merges into the deleted portions should no longer occur).

--Chris


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Eric V. Smith
On 10/4/2012 4:30 PM, Victor Stinner wrote:
 Hi,
 
 I would like to split the huge unicodeobject.c file into smaller
 files. It's just the longest C file of CPython: 14,849 lines.

What problem are you trying to solve?

-- 
Eric.


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Antoine Pitrou
On Thu, 04 Oct 2012 23:46:57 +0200
mar...@v.loewis.de wrote:
 
 Quoting Victor Stinner victor.stin...@gmail.com:
 
  I only see one argument against such refactoring: it will be harder to
  backport/forwardport bugfixes.
 
 I'm opposed for a different reason: I think it will be *harder* to maintain.
 The amount of code will not be reduced, but now you also need to guess what
 file some piece of functionality may be in. Instead of having my text editor
 (Emacs) search in one file, it will have to search across multiple files -
 but not across all open buffers, but only some of them (since I will have
 many other source files open as well).
 
 I really fail to see what problem people have with large source files.
 What is it that you want to do that can be done easier if it's multiple
 files?

Navigate, basically. That is, switch between different pieces of code,
without having to type in some text to search for.

Regards

Antoine.


-- 
Software development and contracting: http://pro.pitrou.net




Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Stephen J. Turnbull
Chris Jerdonek writes:

  You can create multiple files this way.  I just verified it.  But the
  problem happens with merging.  You will create merge conflicts in the
  deleted portions of every split file on every merge.  There may be a
  way to avoid this that I don't know about though (i.e. to record that
  merges into the deleted portions should no longer occur).

hg commit will do that automatically, but you need to resolve that
conflict once manually.  If you also happen to be merging *from* a
feature branch *into* the trunk, you need to close the feature branch.
If it needs more work, close it and make a new branch.  Alternatively,
merge from trunk into the feature branch immediately to minimize the
accumulation of conflicts.  Then the eventual merge back to trunk will
only have real conflicts in it.

Note that "immediately" in the sense needed can be done at any time
because what you need to do is merge the revision created by the "hg
cp" operations.  If Victor tags with "hg tag unicode-refactored", then
you just do "hg merge -r unicode-refactored", and if you haven't made
any changes to the relevant files, you shouldn't get any conflicts.
If you do, then use "hg revert -r unicode-refactored file ...",
followed by "hg resolve --mark file ...", then fix other conflicts,
resolve, and commit as usual.

There's no other way to do it that I know of in any VCS because they
all track conflicts at the file level.  (It would be straightforward
to generalize git to handle this gracefully, but it would be a hugely
disruptive change.  I don't know if Mercurial would be susceptible to
such an extension.)

Specifically, AFAIK this kind of merge conflict will occur:

- if the branch being merged was forked before the hg cp, and

- for each file in the branch containing changes in the deleted region.

I don't advocate this, just want to make the costs clearer.


Re: [Python-Dev] Split unicodeobject.c into subfiles?

2012-10-04 Thread Benjamin Peterson
2012/10/4 Antoine Pitrou solip...@pitrou.net:
 On Thu, 04 Oct 2012 23:46:57 +0200
 mar...@v.loewis.de wrote:

 Quoting Victor Stinner victor.stin...@gmail.com:

  I only see one argument against such refactoring: it will be harder to
  backport/forwardport bugfixes.

 I'm opposed for a different reason: I think it will be *harder* to maintain.
 The amount of code will not be reduced, but now you also need to guess what
 file some piece of functionality may be in. Instead of having my text editor
 (Emacs) search in one file, it will have to search across multiple files -
 but not across all open buffers, but only some of them (since I will have
 many other source files open as well).

 I really fail to see what problem people have with large source files.
 What is it that you want to do that can be done easier if it's multiple
 files?

 Navigate, basically. That is, switch between different pieces of code,
 without having to type in some text to search for.

I find it's only possible to navigate without searching in extremely
small files.


-- 
Regards,
Benjamin