Re: [Python-Dev] Split unicodeobject.c into subfiles?
[Apologies for resurrecting a thread that is a few weeks old.]

On Thu, Oct 4, 2012 at 2:46 PM, mar...@v.loewis.de wrote:
> Quoting Victor Stinner victor.stin...@gmail.com:
>> I only see one argument against such refactoring: it will be harder to backport/forwardport bugfixes.
>
> I'm opposed for a different reason: I think it will be *harder* to maintain. The amount of code will not be reduced, but now you also need to guess what file some piece of functionality may be in. Instead of having my text editor (Emacs) search in one file, it will have to search across multiple files - but not across all open buffers, but only some of them (since I will have many other source files open as well).
>
> I really fail to see what problem people have with large source files. What is it that you want to do that can be done easier if it's multiple files?

One thing is browse or link to such code files on the web (e.g. from within a tracker comment or from within our online documentation). For example, today I was unable to open the following page from within a browser to link to one of its lines on a tracker comment:

http://hg.python.org/cpython/file/27c20650aeab/Objects/unicodeobject.c

My laptop's fan simply turns on and the page hangs indefinitely while loading. I don't think this point was ever mentioned.

--Chris

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Split unicodeobject.c into subfiles?
On Sun, Nov 18, 2012 at 5:47 AM, Chris Jerdonek chris.jerdo...@gmail.com wrote:
> On Thu, Oct 4, 2012 at 2:46 PM, mar...@v.loewis.de wrote:
>> I really fail to see what problem people have with large source files. What is it that you want to do that can be done easier if it's multiple files?
>
> One thing is browse or link to such code files on the web (e.g. from within a tracker comment or from within our online documentation). For example, today I was unable to open the following page from within a browser to link to one of its lines on a tracker comment:
>
> http://hg.python.org/cpython/file/27c20650aeab/Objects/unicodeobject.c
>
> My laptop's fan simply turns on and the page hangs indefinitely while loading.

Curious. This sounds like a web browser issue - I can pull it up in either Chrome or Firefox on Windows on my 2GHz/2GB RAM laptop with a visible pause, but not more than half a second.

However, this page is rather more significant, and is affected equally by the file size:

http://hg.python.org/cpython/annotate/27c20650aeab/Objects/unicodeobject.c

ChrisA
Re: [Python-Dev] Split unicodeobject.c into subfiles?
On Sat, Nov 17, 2012 at 10:55 AM, Chris Angelico ros...@gmail.com wrote:
> On Sun, Nov 18, 2012 at 5:47 AM, Chris Jerdonek chris.jerdo...@gmail.com wrote:
>> On Thu, Oct 4, 2012 at 2:46 PM, mar...@v.loewis.de wrote:
>>> I really fail to see what problem people have with large source files. What is it that you want to do that can be done easier if it's multiple files?
>>
>> One thing is browse or link to such code files on the web (e.g. from within a tracker comment or from within our online documentation). For example, today I was unable to open the following page from within a browser to link to one of its lines on a tracker comment:
>>
>> http://hg.python.org/cpython/file/27c20650aeab/Objects/unicodeobject.c
>>
>> My laptop's fan simply turns on and the page hangs indefinitely while loading.
>
> Curious. This sounds like a web browser issue - I can pull it up in either Chrome or Firefox on Windows on my 2GHz/2GB RAM laptop with a visible pause, but not more than half a second.

I'm also using Chrome and on a fairly new Mac. Perhaps. I tried again and it froze up several open *.python.org tabs (mail.python.org, bugs.python.org, etc). However, later it worked as you describe. The problem seems sporadic.

--Chris
Re: [Python-Dev] Split unicodeobject.c into subfiles
On Thu, Oct 25, 2012 at 2:22 PM, Stephen J. Turnbull step...@xemacs.org wrote:
> Nick Coghlan writes:
>> OK, I need to weigh in after seeing this kind of reply. Large source files are discouraged in general because they're a code smell that points strongly towards a *lack of modularity* within a *complex piece of functionality*.
>
> Sure, but large numbers of tiny source files are also a code smell, the smell of purist adherence to the literal principle of modularity without application of judgment.

Absolutely. The classic example of this is Java's unfortunate insistence on only-one-public-top-level-class-per-file. Bleh.

> If you want to argue that the pragmatic point of view nevertheless is to break up the file, I can see that, but I think Victor is going too far. (Full disclosure dept.: the call graph of the Emacs equivalents is isomorphic to the Dungeon of Zork, so I may be a bit biased.) You really should speak to the question of how many and what partition.

Yes, I agree I was too hasty in calling the specifics of Victor's current proposal a good idea. What raised my ire was the raft of replies objecting to the refactoring *in principle* for completely specious reasons like being able to search within a single file instead of having to use tools that can search across multiple files.

unicodeobject.c is too big, and should be restructured to make any natural modularity explicit, and provide an easier path for users that want to understand how the unicode implementation works.

>> the real gain is in *modularity*, making it clear to readers which parts can be understood and worked on separately from each other.
>
> Yeah, so which do you think they are? It seems to me that there are three modules to be carved out of unicodeobject.c:
>
> 1. The internal object management that is not exposed to Python: allocation, deallocation, and PEP 393 transformations.
> 2. The public interface to Python implementation: methods and properties, including operators.
> 3. Interaction with the outside world: codec implementations.
>
> But conceptually, these really don't have anything to do with internal implementation of Unicode objects. They're just functions that convert bytes to Unicode and vice versa. In principle they can be written in terms of ord(), chr(), and bytes(). On the other hand, they're rather repetitive: When you've seen one codec implementation, you've seen them all. I see no harm in grouping them in one file, and possibly a gain from proximity: casual passers-by might see refactorings that reduce redundancy.

I suspect you and Victor are in a much better position to thrash out the details than I am. It was the trend in the discussion to treat the question as "split or don't split?" rather than "how should we split it?" when a file that large should already contain some natural splitting points if the implementation isn't a tangled monolithic mess.

> Why are any of these codecs here in unicodeobjectland in the first place? Sure, they're needed so that Python can find its own stuff, but in principle *any* codec could be needed. Is it just an heuristic that the codecs needed for 99% of the world are here, and other codecs live in separate modules?

I believe it's a combination of history and whether or not they're needed by the interpreter during the bootstrapping process before the encodings namespace is importable.

Cheers, Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
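
Stephen's remark that a codec can in principle be written in terms of ord(), chr(), and bytes() can be illustrated with a minimal sketch. The function names latin1_encode and latin1_decode are hypothetical and are not part of CPython; Latin-1 is used only because its byte values coincide with the first 256 code points, which keeps the example short. This is not how unicodeobject.c implements the codec.

```python
def latin1_encode(text: str) -> bytes:
    """Encode text as Latin-1 using only ord() and bytes()."""
    values = []
    for index, char in enumerate(text):
        code_point = ord(char)
        if code_point > 0xFF:
            # Mirror the error type the built-in codec raises.
            raise UnicodeEncodeError(
                "latin-1", text, index, index + 1, "ordinal not in range(256)"
            )
        values.append(code_point)
    return bytes(values)


def latin1_decode(data: bytes) -> str:
    """Decode Latin-1 bytes using only chr(); every byte is a valid code point."""
    return "".join(chr(byte) for byte in data)
```

A pure-Python version like this is far slower than the C implementation, which is part of why the inlining and benchmarking questions in this thread matter.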
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 25.10.2012 08:42, Nick Coghlan wrote:
>> Why are any of these codecs here in unicodeobjectland in the first place? Sure, they're needed so that Python can find its own stuff, but in principle *any* codec could be needed. Is it just an heuristic that the codecs needed for 99% of the world are here, and other codecs live in separate modules?
>
> I believe it's a combination of history and whether or not they're needed by the interpreter during the bootstrapping process before the encodings namespace is importable.

They are in unicodeobject.c so that the compilers can inline the code in the various other places where they are used in the Unicode implementation directly as necessary and because the codecs use a lot of functions from the Unicode API (obviously), so the other direction of inlining (Unicode API in codecs) is needed as well.

BTW: When discussing compiler optimizations, please remember that there are more compilers out there than just GCC and also the fact that not everyone is using the latest and greatest version of it. Link time inlining will usually not be as efficient as compile time optimization and we need every bit of performance we can get for Unicode in Python 3.

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 25.10.2012 08:42, Nick Coghlan wrote:
> unicodeobject.c is too big, and should be restructured to make any natural modularity explicit, and provide an easier path for users that want to understand how the unicode implementation works.

You can also achieve that goal by structuring the code in unicodeobject.c in a more modular way. It is already structured in sections, but there's always room for improvement, of course.

As mentioned before, it is also possible to split out various sections into separate .c or .h files which then get included in the main unicodeobject.c. If that's where consensus is going, I'm with Stephen here in that such a separation should be done in higher level chunks, rather than creating 10 new files.

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] Split unicodeobject.c into subfiles
On Thu, Oct 25, 2012 at 8:57 AM, M.-A. Lemburg m...@egenix.com wrote:
> On 25.10.2012 08:42, Nick Coghlan wrote:
>>> Why are any of these codecs here in unicodeobjectland in the first place? Sure, they're needed so that Python can find its own stuff, but in principle *any* codec could be needed. Is it just an heuristic that the codecs needed for 99% of the world are here, and other codecs live in separate modules?
>>
>> I believe it's a combination of history and whether or not they're needed by the interpreter during the bootstrapping process before the encodings namespace is importable.
>
> They are in unicodeobject.c so that the compilers can inline the code in the various other places where they are used in the Unicode implementation directly as necessary and because the codecs use a lot of functions from the Unicode API (obviously), so the other direction of inlining (Unicode API in codecs) is needed as well.

I'm sorry to interrupt, but have you actually measured? What effect the lack of said inlining has on *any* benchmark is definitely beyond my ability to guess and I suspect is beyond the ability to guess of anyone else on this list.

I challenge you to find a benchmark that is being significantly affected (15%) with the split proposed by Victor. It does not even have to be a real-world one, although that would definitely buy it more credibility.

Cheers, fijal
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 25.10.2012 11:18, Maciej Fijalkowski wrote:
> On Thu, Oct 25, 2012 at 8:57 AM, M.-A. Lemburg m...@egenix.com wrote:
>> On 25.10.2012 08:42, Nick Coghlan wrote:
>>>> Why are any of these codecs here in unicodeobjectland in the first place? Sure, they're needed so that Python can find its own stuff, but in principle *any* codec could be needed. Is it just an heuristic that the codecs needed for 99% of the world are here, and other codecs live in separate modules?
>>>
>>> I believe it's a combination of history and whether or not they're needed by the interpreter during the bootstrapping process before the encodings namespace is importable.
>>
>> They are in unicodeobject.c so that the compilers can inline the code in the various other places where they are used in the Unicode implementation directly as necessary and because the codecs use a lot of functions from the Unicode API (obviously), so the other direction of inlining (Unicode API in codecs) is needed as well.
>
> I'm sorry to interrupt, but have you actually measured? What effect the lack of said inlining has on *any* benchmark is definitely beyond my ability to guess and I suspect is beyond the ability to guess of anyone else on this list.
>
> I challenge you to find a benchmark that is being significantly affected (15%) with the split proposed by Victor. It does not even have to be a real-world one, although that would definitely buy it more credibility.

I think you misunderstood. What I described is the reason for having the base codecs in unicodeobject.c. I think we all agree that inlining has a positive effect on performance. The scale of the effect depends on the used compiler and platform.

Victor already mentioned that he'll check the impact of his proposal, so let's wait for that.

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 25.10.12 12:18, Maciej Fijalkowski wrote:
> I challenge you to find a benchmark that is being significantly affected (15%) with the split proposed by Victor. It does not even have to be a real-world one, although that would definitely buy it more credibility.

I see a 10% slowdown for UTF-8 decoding of UCS2 strings, but a 10% speedup for mostly-BMP UCS4 strings. For encoding the situation is reversed (but up to +27%). Charmap decoding speeds up by 10-30%. GCC 4.4.3, 32-bit Linux.

https://bitbucket.org/storchaka/cpython-stuff/src/default/bench
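
Measurements of the kind reported above can be roughly approximated from Python with the standard timeit module. The following is only a sketch and is not the benchmark suite linked in the message; the helper name time_decode and the sample strings are assumptions chosen to exercise ASCII, UCS2 (BMP-only), and mostly-BMP UCS4 data.

```python
import timeit


def time_decode(text: str, encoding: str = "utf-8",
                number: int = 200, repeat: int = 5) -> float:
    """Return the best observed time in seconds for one decode of `text`."""
    data = text.encode(encoding)
    timer = timeit.Timer(lambda: data.decode(encoding))
    # Taking the minimum over several repeats reduces scheduling noise.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Hypothetical inputs, one per internal string representation of interest.
SAMPLES = {
    "ascii": "a" * 10000,
    "ucs2 (BMP only)": "\u20ac" * 10000,
    "ucs4 (mostly BMP)": "\u20ac" * 9999 + "\U0001F600",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        seconds = time_decode(text)
        print(f"{label:20s} {seconds * 1e6:8.2f} us per decode")
```

To compare a split and an unsplit build, the same script would be run under both interpreters on the same machine; absolute numbers from a single run say little on their own.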
Re: [Python-Dev] Split unicodeobject.c into subfiles
> I think you misunderstood. What I described is the reason for having the base codecs in unicodeobject.c. I think we all agree that inlining has a positive effect on performance. The scale of the effect depends on the used compiler and platform.

Well. Inlining can have positive or negative effects, depending on various details. Too much inlining causes more cache misses for example. However, this is absolutely irrelevant if you don't create benchmarks and run them. Guessing is seriously not a very good optimization strategy.

Cheers, fijal
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 25.10.12 12:49, M.-A. Lemburg wrote:
> I think you misunderstood. What I described is the reason for having the base codecs in unicodeobject.c.

For example PyUnicode_FromStringAndSize and PyUnicode_FromString are thin wrappers around PyUnicode_DecodeUTF8Stateful. I think this is a reason to keep these functions together.
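
The relationship described above can be observed, though not proven, from Python through ctypes.pythonapi. The sketch below assumes a CPython interpreter and only checks that the three C API functions produce the same str for the same UTF-8 input; it says nothing about how they are wired together internally.

```python
import ctypes

api = ctypes.pythonapi  # CPython only: handle to the running interpreter's C API

# Declare signatures so ctypes converts arguments and results correctly.
api.PyUnicode_FromString.argtypes = [ctypes.c_char_p]
api.PyUnicode_FromString.restype = ctypes.py_object

api.PyUnicode_FromStringAndSize.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]
api.PyUnicode_FromStringAndSize.restype = ctypes.py_object

api.PyUnicode_DecodeUTF8Stateful.argtypes = [
    ctypes.c_char_p,                  # UTF-8 bytes
    ctypes.c_ssize_t,                 # length in bytes
    ctypes.c_char_p,                  # error handler name
    ctypes.POINTER(ctypes.c_ssize_t)  # "consumed" out-parameter, NULL here
]
api.PyUnicode_DecodeUTF8Stateful.restype = ctypes.py_object

raw = "na\u00efve \u20ac".encode("utf-8")

from_string = api.PyUnicode_FromString(raw)
from_sized = api.PyUnicode_FromStringAndSize(raw, len(raw))
decoded = api.PyUnicode_DecodeUTF8Stateful(raw, len(raw), b"strict", None)

print(from_string == from_sized == decoded)
```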
Re: [Python-Dev] Split unicodeobject.c into subfiles
On Thu, Oct 25, 2012 at 8:07 PM, Maciej Fijalkowski fij...@gmail.com wrote:
>> I think you misunderstood. What I described is the reason for having the base codecs in unicodeobject.c. I think we all agree that inlining has a positive effect on performance. The scale of the effect depends on the used compiler and platform.
>
> Well. Inlining can have positive or negative effects, depending on various details. Too much inlining causes more cache misses for example. However, this is absolutely irrelevant if you don't create benchmarks and run them. Guessing is seriously not a very good optimization strategy.

Yep, that's why I made the point that speed.python.org should be a going concern well before 3.4 release, and will be able to let us know if we have a problem relative to 3.3.

Cheers, Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 25/10/2012 02:03, Nick Coghlan wrote:
> speed.python.org is also making progress, and once that is up and running (which will happen well before any Python 3.4 release) it will be possible to compare the numbers between 3.3 and trunk to help determine the validity of any concerns regarding optimisations that can be performed within a module but not across modules.

Nobody needs speed.python.org to run benchmarks before and after a specific change, though. Cloning http://hg.python.org/benchmarks and using the perf.py runner is everything that is needed.

Moreover, you would want to run benchmarks *before* committing and pushing the changes. We don't want the huge splitting to be recorded and then backed out in the repository history.

Regards

Antoine.
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 25/10/2012 00:15, Nick Coghlan wrote:
> However, -1 on the faux modularity idea of breaking up the files on disk, but still exposing them to the compiler and linker as a monolithic block, though. That would be completely missing the point of why large source files are bad.

I disagree with you. Source files are meant to be read by humans, we don't really care whether the compiler has a modular view of the source code.

Regards

Antoine.
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 10/24/2012 03:15 PM, Nick Coghlan wrote:
> Breaking such files up into separately compiled modules serves two purposes:
>
> 1. It proves that the code *isn't* a tangled monolithic mess;
> 2. It enlists the compilation toolchain's assistance in ensuring that remains the case in the future.

Either the code is a tangled monolithic mess or it isn't. If it is, then let's fix that, regardless of the size of the file. If it isn't, I don't see breaking up the code among multiple files as providing any benefit. And I see no need for the toolchain's assistance to help us do something without benefit. The line count of the file is essentially unrelated to its inherent quality / maintainability.

> We are not special snow flakes - good software engineering practice is advisable for us as well, so a big +1 from me for breaking up the monstrosity that is unicodeobject.c and lowering the barrier to entry for hacking on the individual pieces. This should come with a large block comment in unicodeobject.c explaining how the pieces are put back together again.

I'm all for good software engineering practice. But can you cite objective reasons why large source files are provably bad? Not tangled monolithic messes, not poorly-factored code. I agree that those are bad--but so far nobody has proposed that either of those is true about unicodeobject.c (unless you are implicitly doing so above), nor have they proposed credible remedies. All I've seen is that unicodeobject.c is a large file, and some people want to break it up into smaller files. I have yet to see anything but handwaving as justification.

For example, what is this "barrier to entry" you suggest exists to hacking on the str object, that will apparently be dispelled simply by splitting one file into multiple files?

Someone proposed breaking up unicodeobject.c into three distinct subsystems and putting those in separate files. I still don't agree. It seems natural to me to have everything associated with the str object in one file, just as we do with every other object I can think of. If this were a genuinely good idea, we should consider doing it with every similar object. But nobody is proposing that. My guess is because the other files in CPython are small enough. At which point we're right back to the primary motivation simply being the line count of unicodeobject.c, as a purely aesthetic and subjective judgment.

/arry
Re: [Python-Dev] Split unicodeobject.c into subfiles
On Thu, 25 Oct 2012 08:13:53 -0700 Larry Hastings la...@hastings.org wrote:
> I'm all for good software engineering practice. But can you cite objective reasons why large source files are provably bad? Not tangled monolithic messes, not poorly-factored code. I agree that those are bad--but so far nobody has proposed that either of those is true about unicodeobject.c (unless you are implicitly doing so above)

Well, tangled monolithic mess is quite true about unicodeobject.c, IMO. Seriously, I agree with Victor: navigating around unicodeobject.c is a PITA. Perhaps it isn't if you are using emacs, or you have 35 fingers, or just a lot of spare time, but in my experience it's painful.

Regards

Antoine.
Re: [Python-Dev] Split unicodeobject.c into subfiles
Antoine Pitrou writes:
> Well, tangled monolithic mess is quite true about unicodeobject.c, IMO.

s/object.c// and your point remains valid. Just reading the table of contents for UTR#17 (http://www.unicode.org/reports/tr17/) should convince you that it's not going to be easy to produce an elegant implementation!

> Seriously, I agree with Victor: navigating around unicodeobject.c is a PITA. Perhaps it isn't if you are using emacs, or you have 35 fingers, or just a lot of spare time, but in my experience it's painful.

Sure, but I don't know of a Unicode implementation which isn't. I don't think that having a unicode/*.[ch] with a dozen files (including the README etc) in it is going to make it much more navigable. If there are too many files, it's going to be a PITA to maintain because there won't be an obvious place to put certain functions. Eg, I've already mentioned my suspicions about the charmap code (I apologize for not reading Victor's code to confirm them).

I don't object in principle to splitting the unicodeobject.c. At the very least, with all due respect to MAL, XEmacs experience with coding systems (the Emacs equivalent of Python codecs) suggests that there is very little to be lost by moving the codec implementations to a separate file from the Unicode object implementation. (Here I'm talking about codecs in the narrow sense of wire-format to Python3 str and back, not the more general Python2 sense that included zip and base64 and so on. Ie, PyUnicode_Translate is not a codec in the relevant sense.)

On the other hand, I wouldn't be surprised if (despite my earlier suggestion) codecs and unicode object internals need a close relationship. (My intuition and sense of style says splitting codecs from the low level memory management and PEP 393 stuff is a good idea, but I'm not confident it would have no impact on performance.)
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 10/23/2012 09:29 AM, Georg Brandl wrote:
> Especially since you're suggesting a huge number of new files, I question the argument of better navigability.

FWIW I'm -1 on it too. I don't see what the big deal is with large source files. If you have difficulty finding your way around unicodeobject.c, that seems more like a tooling issue to me, not a source code structural issue.

/arry
Re: [Python-Dev] Split unicodeobject.c into subfiles
On Oct 25, 2012 2:06 AM, Larry Hastings la...@hastings.org wrote:
> On 10/23/2012 09:29 AM, Georg Brandl wrote:
>> Especially since you're suggesting a huge number of new files, I question the argument of better navigability.
>
> FWIW I'm -1 on it too. I don't see what the big deal is with large source files. If you have difficulty finding your way around unicodeobject.c, that seems more like a tooling issue to me, not a source code structural issue.
>
> /arry

OK, I need to weigh in after seeing this kind of reply. Large source files are discouraged in general because they're a code smell that points strongly towards a *lack of modularity* within a *complex piece of functionality*. Breaking such files up into separately compiled modules serves two purposes:

1. It proves that the code *isn't* a tangled monolithic mess;
2. It enlists the compilation toolchain's assistance in ensuring that remains the case in the future.

I find complaints about the ease of searching within the file to be misguided and irrelevant, as I can just as easily reply with "if searching across multiple files is hard for you, use better tools, like grep, or 'Find in Files'". Note that I also consider the pro argument about better navigability inaccurate - the real gain is in *modularity*, making it clear to readers which parts can be understood and worked on separately from each other.

We are not special snow flakes - good software engineering practice is advisable for us as well, so a big +1 from me for breaking up the monstrosity that is unicodeobject.c and lowering the barrier to entry for hacking on the individual pieces. This should come with a large block comment in unicodeobject.c explaining how the pieces are put back together again.

However, -1 on the faux modularity idea of breaking up the files on disk, but still exposing them to the compiler and linker as a monolithic block, though. That would be completely missing the point of why large source files are bad.

Regards, Nick.

--
Sent from my phone, thus the relative brevity :)
Re: [Python-Dev] Split unicodeobject.c into subfiles
On Oct 25, 2012, at 08:15 AM, Nick Coghlan wrote:
> OK, I need to weigh in after seeing this kind of reply. Large source files are discouraged in general because they're a code smell that points strongly towards a *lack of modularity* within a *complex piece of functionality*.

Modularity is good, and the file system structure of the project should reflect that, but to be effective, it needs to be obvious. It's pretty obvious what's generally in intobject.c. I've worked with code bases where there's no rhyme nor reason as to what you'd find in a particular file, and this really hurts. It hurts even with good tools.

Remember that sometimes you don't even know what you're looking for, so search tools may not be very useful. For example, sometimes you want to understand how all the pieces fit together, what the holistic view of the subsystem is, or where the entry points are. Search tools are not very good at this, and if it's a subsystem you only interact with occasionally, having a file system organization that makes things easier to remember what you learned the last time you were there helps enormously.

Another point: rather than large files (or maybe in addition to them), large functions can also be painful to navigate. So just splitting a file into subfiles may not be the only modularity improvement you can make.

While I'm personally -0 about splitting up unicodeobject.c, if the folks advocating for it go ahead with it, I just ask that you do it very carefully, with an eye toward the casual and newbie reader of our code base.

Cheers, -Barry
Re: [Python-Dev] Split unicodeobject.c into subfiles
On Thu, Oct 25, 2012 at 8:37 AM, Barry Warsaw ba...@python.org wrote:
> On Oct 25, 2012, at 08:15 AM, Nick Coghlan wrote:
>> OK, I need to weigh in after seeing this kind of reply. Large source files are discouraged in general because they're a code smell that points strongly towards a *lack of modularity* within a *complex piece of functionality*.
>
> Modularity is good, and the file system structure of the project should reflect that, but to be effective, it needs to be obvious. It's pretty obvious what's generally in intobject.c. I've worked with code bases where there's no rhyme nor reason as to what you'd find in a particular file, and this really hurts. It hurts even with good tools.
>
> Remember that sometimes you don't even know what you're looking for, so search tools may not be very useful. For example, sometimes you want to understand how all the pieces fit together, what the holistic view of the subsystem is, or where the entry points are. Search tools are not very good at this, and if it's a subsystem you only interact with occasionally, having a file system organization that makes things easier to remember what you learned the last time you were there helps enormously.

And if we were talking in the abstract, I think these would be reasonable concerns to bring up. However, Victor's proposed division *is* logical (especially if he goes down the path of a separate subdirectory which will better support easy searching across all of the unicode object related files), and I conditioned my +1 with the requirement that a road map be provided in a leading block comment in unicodeobject.c.

speed.python.org is also making progress, and once that is up and running (which will happen well before any Python 3.4 release) it will be possible to compare the numbers between 3.3 and trunk to help determine the validity of any concerns regarding optimisations that can be performed within a module but not across modules.

Cheers, Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Split unicodeobject.c into subfiles
Nick Coghlan writes:

> OK, I need to weigh in after seeing this kind of reply. Large source files are discouraged in general because they're a code smell that points strongly towards a *lack of modularity* within a *complex piece of functionality*.

Sure, but large numbers of tiny source files are also a code smell, the smell of purist adherence to the literal principle of modularity without application of judgment. If you want to argue that the pragmatic point of view nevertheless is to break up the file, I can see that, but I think Victor is going too far. (Full disclosure dept.: the call graph of the Emacs equivalents is isomorphic to the Dungeon of Zork, so I may be a bit biased.) You really should speak to the question of how many and what partition.

> the real gain is in *modularity*, making it clear to readers which parts can be understood and worked on separately from each other.

Yeah, so which do you think they are? It seems to me that there are three modules to be carved out of unicodeobject.c:

1. The internal object management that is not exposed to Python: allocation, deallocation, and PEP 393 transformations.
2. The public interface to Python implementation: methods and properties, including operators.
3. Interaction with the outside world: codec implementations.

Conceptually, though, the codecs in (3) really don't have anything to do with the internal implementation of Unicode objects. They're just functions that convert bytes to Unicode and vice versa. In principle they can be written in terms of ord(), chr(), and bytes(). On the other hand, they're rather repetitive: when you've seen one codec implementation, you've seen them all. I see no harm in grouping them in one file, and possibly a gain from proximity: casual passers-by might see refactorings that reduce redundancy.

I'm not sure what to do with the charmap stuff. In current CPython head it seems incoherent to me: there's an IO codec, but there's also unicode-to-unicode stuff (PyUnicode_Translate).
I haven't had time to look at Victor's reorganization to see what he actually did with it, but in terms of modularity, it seems to me that refactoring this stuff would be a real win, as opposed to splitting the files, which is only a presentational improvement for the rest of the code, which is already pretty modular.

As for Victor's proposal itself:

1176 Objects/unicodecharmap.c
1678 Objects/unicodecodecs.c
1362 Objects/unicodeformat.c
253 Objects/unicodeimpl.h
733 Objects/unicodelegacy.c
1836 Objects/unicodenew.c
2777 Objects/unicodeobject.c
2421 Objects/unicodeoperators.c
1235 Objects/unicodeoscodecs.c
1288 Objects/unicodeutfcodecs.c

As Victor himself admits, unicodelegacy and unicodenew are not descriptive of what they contain. In I18N discussions, legacy is usually a deprecatory reference to non-Unicode encodings, and I would tend to guess from the name that this file contains codecs. A better name might be unicodedeprecated (if what he really means is deprecated APIs).

I don't understand why splitting out unicodeoperators is a great idea; it's done nowhere else in CPython. If that makes sense, why not split out unicodemethods (for methods normally invoked explicitly rather than by syntax) too? N.B. For bytes, the corresponding file is spelled bytes_methods.

unicodecodecs vs unicodeutfcodecs: Say what? I would forever be looking in the wrong one.

unicodeoscodecs suggests to me that these codecs are only usable on some OSes. If so, shouldn't the relevant OS be in the name? If not, the name is basically misleading IMO.

Why are any of these codecs here in unicodeobjectland in the first place? Sure, they're needed so that Python can find its own stuff, but in principle *any* codec could be needed. Is it just a heuristic that the codecs needed for 99% of the world are here, and other codecs live in separate modules?
Steve
Re: [Python-Dev] Split unicodeobject.c into subfiles
2012/10/22 Victor Stinner victor.stin...@gmail.com: Hi, I forked CPython repository to work on my split unicodeobject.c project: http://hg.python.org/sandbox/split-unicodeobject.c The result is 10 files (included the existing unicodeobject.c): 1176 Objects/unicodecharmap.c 1678 Objects/unicodecodecs.c 1362 Objects/unicodeformat.c 253 Objects/unicodeimpl.h 733 Objects/unicodelegacy.c 1836 Objects/unicodenew.c 2777 Objects/unicodeobject.c 2421 Objects/unicodeoperators.c 1235 Objects/unicodeoscodecs.c 1288 Objects/unicodeutfcodecs.c 14759 total This is just a proposition (and work in progress). Everything can be changed :-) unicodenew.c is not a good name. Content of this file may be moved somewhere else. Some files may be merged again if the separation is not justified. I don't like the unicode prefix for filenames, I would prefer a new directory. -- Shorter files are easier to review and maintain. The compilation is faster if only one file is modified. The MBCS codec requires windows.h. The whole unicodeobject.c includes it just for this codec. With the split, only unicodeoscodecs.c includes this file. The MBCS codec needs also a winver variable. This variable is defined between the BLOOM filter and the unicode_result_unchanged() function. How can you explain how these things are sorted? Where should I add a new function or variable? With the split, the variable is now defined very close to where is it used. You don't have to scroll 7000 lines to see where it is used. If you would like to work on a specific function, you don't have to use the search function of your editor to skip thousands to lines. For example, the 18 functions and 2 types related to the charmap codec are now grouped into one unique and short C file. It was already possible to extend and maintain unicodeobject.c (some people proved it!), but it should now be much simpler with shorter files. I would like to repeat my opposition to splitting unicodeobject.c. 
I don't think the benefits of such a split have been well justified, certainly not to the point that the claim about much simpler maintenance is true. -- Regards, Benjamin
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 23.10.2012 10:22, Benjamin Peterson wrote: 2012/10/22 Victor Stinner victor.stin...@gmail.com: Hi, I forked CPython repository to work on my split unicodeobject.c project: http://hg.python.org/sandbox/split-unicodeobject.c The result is 10 files (included the existing unicodeobject.c): 1176 Objects/unicodecharmap.c 1678 Objects/unicodecodecs.c 1362 Objects/unicodeformat.c 253 Objects/unicodeimpl.h 733 Objects/unicodelegacy.c 1836 Objects/unicodenew.c 2777 Objects/unicodeobject.c 2421 Objects/unicodeoperators.c 1235 Objects/unicodeoscodecs.c 1288 Objects/unicodeutfcodecs.c 14759 total This is just a proposition (and work in progress). Everything can be changed :-) unicodenew.c is not a good name. Content of this file may be moved somewhere else. Some files may be merged again if the separation is not justified. I don't like the unicode prefix for filenames, I would prefer a new directory. -- Shorter files are easier to review and maintain. The compilation is faster if only one file is modified. The MBCS codec requires windows.h. The whole unicodeobject.c includes it just for this codec. With the split, only unicodeoscodecs.c includes this file. The MBCS codec needs also a winver variable. This variable is defined between the BLOOM filter and the unicode_result_unchanged() function. How can you explain how these things are sorted? Where should I add a new function or variable? With the split, the variable is now defined very close to where is it used. You don't have to scroll 7000 lines to see where it is used. If you would like to work on a specific function, you don't have to use the search function of your editor to skip thousands to lines. For example, the 18 functions and 2 types related to the charmap codec are now grouped into one unique and short C file. It was already possible to extend and maintain unicodeobject.c (some people proved it!), but it should now be much simpler with shorter files. 
I would like to repeat my opposition to splitting unicodeobject.c. I don't think the benefits of such a split have been well justified, certainly not to the point that the claim about much simpler maintenance is true. Same feelings here. If you do go ahead with such a split, please only split the source files and keep the unicodeobject.c file which then includes all the other files. Such a restructuring should not result in compilers no longer being able to optimize code by inlining functions in one of the most important basic types we have in Python 3. Also note that splitting the file in multiple smaller ones will actually create more maintenance overhead, since patches will likely no longer be easy to merge from 3.3 to 3.4. BTW: The positive effect of having everything in one file is that you no longer have to figure which files to look when trying to find a piece of logic... it's just a ctrl-f or ctrl-s away :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 23 2012) Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ 2012-09-27: Released eGenix PyRun 1.1.0 ... http://egenix.com/go35 2012-09-26: Released mxODBC.Connect 2.0.1 ... http://egenix.com/go34 2012-09-25: Released mxODBC 3.2.1 ... http://egenix.com/go33 2012-10-23: Python Meeting Duesseldorf ... today eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Split unicodeobject.c into subfiles
> Such a restructuring should not result in compilers no longer being able to optimize code by inlining functions in one of the most important basic types we have in Python 3.

I agree that performance is important. But I'm not convinced that moving functions has a real impact on performance, nor that any such issues cannot be fixed. I tried to limit changes impacting performance.

Inlining is (only?) interesting for short functions. PEP 393 introduces many macros for this. I also added some Fast functions (_PyUnicode_FastCopyCharacters() and _PyUnicode_FastFill()) which don't check parameters and do the real work. I don't think that it's really useful to inline _PyUnicode_FastFill() in the caller, for example.

I will check the performance of all str methods. For example, str.count() now calls PyUnicode_Count() instead of the static count(). PyUnicode_Count() adds some extra checks, some of which are not necessary, and it's not a static function, so it cannot(?) be inlined. But I bet that the overhead is really low.

Note: Since GCC 4.5, Link Time Optimization is possible. I don't know if GCC is able to inline functions defined in different files, but C compilers get better with each release.

--

I will check the performance impact on _PyUnicode_Widen() and _PyUnicode_Putchar(), which are no longer static. _PyUnicode_Widen() and _PyUnicode_Putchar() are used in Unicode codecs when it's more expensive to compute the exact length and maximum character of the output string. These functions are optimistic (they hope that the output will not grow too much and that the string is not widened too many times, so it should be faster for ASCII).

I implemented a similar approach in my PyUnicodeWriter API, and I plan to reuse this API to simplify the API. PyUnicodeWriter uses some macros to limit the overhead of having to check before each write whether we need to enlarge or widen the internal buffer, and allows writing directly into the buffer using low-level functions like PyUnicode_WRITE. I also hope for a performance improvement because the PyUnicodeWriter API can also overallocate the internal buffer to reduce the number of calls to realloc() (which is usually slow).

> Also note that splitting the file in multiple smaller ones will actually create more maintenance overhead, since patches will likely no longer be easy to merge from 3.3 to 3.4.

I'm a candidate to maintain unicodeobject.c. If you check the recent history of unicodeobject.c, I'm one of the most active developers on this file over the last two years (especially in 2012). I'm not sure that merges on this file are so hard.

Victor
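[Editorial illustration] The overallocation idea Victor mentions can be sketched in plain C as below. This is only a minimal sketch under stated assumptions: the Writer type and the writer_* functions are hypothetical names invented for illustration and are NOT the real PyUnicodeWriter API, and the 50% growth factor is an arbitrary choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; NOT the real PyUnicodeWriter API. */
typedef struct {
    char    *data;
    size_t   len;       /* bytes in use */
    size_t   cap;       /* bytes allocated */
    unsigned reallocs;  /* how many times realloc() was called */
} Writer;

static void
writer_init(Writer *w)
{
    w->data = NULL;
    w->len = 0;
    w->cap = 0;
    w->reallocs = 0;
}

/* Append n bytes. Returns 0 on success, -1 on allocation failure.
   Size overflow is not handled in this sketch. */
static int
writer_append(Writer *w, const char *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > w->cap - w->len) {
        size_t need = w->len + n;
        /* Ask for about 50% more than needed (arbitrary factor), so
           that many later appends fit without calling realloc(). */
        size_t newcap = need + need / 2 + 16;
        char *p = realloc(w->data, newcap);
        if (p == NULL)
            return -1;
        w->data = p;
        w->cap = newcap;
        w->reallocs++;
    }
    memcpy(w->data + w->len, src, n);
    w->len += n;
    return 0;
}

static void
writer_free(Writer *w)
{
    free(w->data);
    writer_init(w);
}
```

With this growth policy, appending one byte a thousand times triggers only a handful of realloc() calls instead of a thousand; the trade-off is some unused memory at the end of the buffer.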
Re: [Python-Dev] Split unicodeobject.c into subfiles
On 23/10/2012 12:05, Victor Stinner wrote:
>> Such a restructuring should not result in compilers no longer being able to optimize code by inlining functions in one of the most important basic types we have in Python 3.
> I agree that performance is important. But I'm not convinced that moving functions has a real impact on performance, nor that any such issues cannot be fixed.

I agree with Marc-André, there's no point in compiling those files separately. #include'ing them in the master unicodeobject.c file is fine.

Regards

Antoine.
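[Editorial illustration] The "split the sources but #include them into one master file" approach discussed in this subthread might look roughly like the sketch below. It is shown as a single file so that it compiles on its own; the fragment name unicode_count.h and the functions count_char() and public_count() are hypothetical and do not come from CPython.

```c
#include <stddef.h>

/* ---- Part 1: would live in a fragment, e.g. unicode_count.h
        (hypothetical file name) ---- */

/* Static helper: does the real work and does not check its arguments. */
static size_t
count_char(const char *s, char ch)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == ch)
            n++;
    }
    return n;
}

/* ---- Part 2: would live in the master .c file, right after a line
        such as:  #include "unicode_count.h"  ---- */

/* Public entry point: validates arguments, then calls the helper.
   Because both parts end up in the same translation unit, the helper
   can stay static and the compiler remains free to inline it. */
size_t
public_count(const char *s, char ch)
{
    if (s == NULL)
        return 0;
    return count_char(s, ch);
}
```

The design trade-off raised elsewhere in the thread still applies: since everything is compiled as one unit, editing any fragment recompiles the whole master file.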
Re: [Python-Dev] Split unicodeobject.c into subfiles
2012/10/23 Antoine Pitrou solip...@pitrou.net: I agree with Marc-André, there's no point in compiling those files separately. #include'ing them in the master unicodeobject.c file is fine. I also find the unicodeobject.c difficult to navigate. Even if we don't split the file, I'd advocate a better presentation of its content. Could we have at least clear sections, with titles and descriptions? And use the ^L page separator for Emacs users? Code in posixmodule.c could also benefit of a better layout. -- Amaury Forgeot d'Arc ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Split unicodeobject.c into subfiles?
> The amount of code will not be reduced, but now you also need to guess what file some piece of functionality may be in.

How do you search for a piece of code? If you search for a function by its name, it does not matter in which file it is defined if you use an IDE or vim/emacs with a correct configuration. For example, I type :tag PyUnicode_Format to go to the PyUnicode_Format() function.

> Instead of having my text editor (Emacs) search in one file, it will have to search across multiple files - but not across all open buffers, but only some of them (since I will have many other source files open as well).

Does it mean that it would be more practical to merge all C files into one unique file?

> I really fail to see what problem people have with large source files. What is it that you want to do that can be done easier if it's multiple files?

Another problem with huge files is handling dependencies between static functions. If the function A calls the function B which calls the function C, you have to order A, B and C correctly if these functions are private and not declared at the top of the file. If functions are grouped correctly, you just have to add the function to the right file, or reorder the files.

I also prefer short files because it's easier to review/audit a small file. My brain cannot store too many functions :-)

Victor
Re: [Python-Dev] Split unicodeobject.c into subfiles?
2012/10/7 Victor Stinner victor.stin...@gmail.com: Another problem with huge files is to handle dependencies with static functions. If the function A calls the function B which calls the function C, you have to order A, B and C correctly if these functions are private and not declared at the top of the file. Having separate files doesn't alleviate this, though. If they're in separate files, you have to have header files of prototypes. -- Regards, Benjamin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Split unicodeobject.c into subfiles?
On Mon, Oct 8, 2012 at 8:17 AM, Victor Stinner victor.stin...@gmail.com wrote:
> Another problem with huge files is to handle dependencies with static functions. If the function A calls the function B which calls the function C, you have to order A, B and C correctly if these functions are private and not declared at the top of the file. If functions are grouped correctly, you just have to add the function to the right file, or reorder the files.

This isn't a fundamental problem, since you can always declare a private function if it's mutually recursive with another private function. But - forgive me if this is false in CPython - this isn't usually that common. Also, ordering the functions in (at least an approximation of) Define Before Use makes it easy to locate the one you're calling, even in a non-smart editor: just go to the top of the file and search for the function's name; the first hit will be the definition. It's not usually difficult to sort functions appropriately, and can pay dividends in readability.

ChrisA
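[Editorial illustration] A minimal sketch of the point about declaring a private function up front when two static functions are mutually recursive. The functions is_even() and is_odd() are hypothetical examples, not taken from unicodeobject.c.

```c
#include <stdbool.h>

/* Forward declaration: lets is_even() call is_odd() before the
   definition of is_odd() appears later in the file. */
static bool is_odd(unsigned n);

static bool
is_even(unsigned n)
{
    return n == 0 ? true : is_odd(n - 1);
}

/* Defined after its first use; the declaration above makes that legal. */
static bool
is_odd(unsigned n)
{
    return n == 0 ? false : is_even(n - 1);
}
```

Only the mutually recursive pair needs the extra declaration; every other static function can simply be defined before its first use.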
Re: [Python-Dev] Split unicodeobject.c into subfiles?
Quoting Victor Stinner victor.stin...@gmail.com:
>> The amount of code will not be reduced, but now you also need to guess what file some piece of functionality may be in.
> How do you search for a piece of code?

I type /pattern in vim, or Ctrl-s (incremental search) in Emacs.

> If you search for a function by its name, it does not matter in which file it is defined if you use an IDE or vim/emacs with a correct configuration. For example, I type :tag PyUnicode_Format to go to the PyUnicode_Format() function.

I don't like tag files. I want to search in all source code (including comments and strings), and I want to do a substring search (not sure whether that is supported in tag files).

>> Instead of having my text editor (Emacs) search in one file, it will have to search across multiple files - but not across all open buffers, but only some of them (since I will have many other source files open as well).
> Does it mean that it would be more practical to merge all C files into one unique file?

That would be extreme, of course. It may cause problems with the responsiveness of the editor, and with compile times; it may also cause problems with merging in version control. In addition, there might be naming conflicts which make it impractical (e.g. many structures containing the same tp_* struct slots, so when you search for tp_new, for example, you would get too many hits). But in principle, I don't mind maintaining *very* large source files. unicodeobject.c isn't really *that* large.

>> What is it that you want to do that can be done easier if it's multiple files?
> Another problem with huge files is to handle dependencies with static functions. If the function A calls the function B which calls the function C, you have to order A, B and C correctly if these functions are private and not declared at the top of the file. If functions are grouped correctly, you just have to add the function to the right file, or reorder the files.

I don't understand. Do you envision that A, B, and C are in separate files? If so, they cannot all be static anymore, unless you still combine all files with #include directives, or unless you still put them all in the same file. I don't see how multiple files give any improvement. It seems to make matters worse:

- if you put A, B, C in the same file, you have the same issue that you had when unicodeobject.c was a large file - you have to order them correctly.
- if you put them in different files, it gets worse: you need to place A in a file that gets included after the file that has B, even if it would be more logical to put them the other way round.

> I also prefer short files because it's easier to review/audit a small file. My brain cannot store too many functions :-)

This is what I don't understand. Why do you have to remember all functions when reviewing or auditing a file? You can safely ignore all functions but the one you are reviewing - whether the other functions are in a different file or in the same file. Why can you ignore the functions only if they are stored in a different file?

Regards, Martin
Re: [Python-Dev] Split unicodeobject.c into subfiles?
Victor Stinner wrote: Hi, I would like to split the huge unicodeobject.c file into smaller files. It's just the longest C file of CPython: 14,849 lines. I don't know exactly how to split it, but first I would like to know if you would agree with the idea. Example: - Objects/unicode/codecs.c - Objects/unicode/mod_format.c - Objects/unicode/methods.c - Objects/unicode/operators.c - etc. I don't know if it's better to use a subdirectory, or use a prefix for new files: Objects/unicode_methods.c, Objects/unicode_codecs.c, etc. There is already a Python/codecs.c file for example (same filename). Better follow the already existing pattern of using unicode as prefix, e.g. unicodectype.c and unicodetype_db.h. I would like to split the unicodeobject.c because it's hard to navigate in this huge file between all functions, variables, types, macros, etc. It's hard to add new code and to fix bugs. For example, the implementation of str%args takes 1000 lines, 2 types and 10 functions (since my refactor yesterday, in Python 3.3 the main function is 500 lines long :-)). I only see one argument against such refactoring: it will be harder to backport/forwardport bugfixes. When making such a change, you have to pay close attention to functions that the compiler can potentially inline. AFAIK, moving such functions into a separate file would prevent such inlining/optimizations, e.g. the str formatter wouldn't be able to inline codec calls if placed in separate .c files. It may be better to split the file into multiple .h files which then get recombined into the one unicodeobject.c file. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 05 2012) Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ 2012-09-27: Released eGenix PyRun 1.1.0 ... 
http://egenix.com/go35 2012-09-26: Released mxODBC.Connect 2.0.1 ... http://egenix.com/go34 2012-09-25: Released mxODBC 3.2.1 ... http://egenix.com/go33 2012-10-23: Python Meeting Duesseldorf ... 18 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Split unicodeobject.c into subfiles?
On Thu, Oct 4, 2012 at 6:49 PM, Stephen J. Turnbull step...@xemacs.org wrote: Chris Jerdonek writes: You can create multiple files this way. I just verified it. But the problem happens with merging. You will create merge conflicts in the deleted portions of every split file on every merge. There may be a way to avoid this that I don't know about though (i.e. to record that merges into the deleted portions should no longer occur). ... There's no other way to do it that I know of in any VCS because they all track conflicts at the file level. (It would be straightforward to generalize git to handle this gracefully, but it would be a hugely disruptive change. I don't know if Mercurial would be susceptible to such an extension.) FWIW, I filed an issue in Mercurial's tracker to add support for splitting files and copying subsets of files: http://bz.selenic.com/show_bug.cgi?id=3649 As I thought it might be, the idea was rejected. --Chris ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Split unicodeobject.c into subfiles?
I like the idea. From my perspective, it is better to use a subdirectory, for the sake of easy searching with grep-style tools.

On Thu, Oct 4, 2012 at 11:30 PM, Victor Stinner victor.stin...@gmail.com wrote:
> Hi, I would like to split the huge unicodeobject.c file into smaller files. It's just the longest C file of CPython: 14,849 lines. I don't know exactly how to split it, but first I would like to know if you would agree with the idea.
>
> Example:
> - Objects/unicode/codecs.c
> - Objects/unicode/mod_format.c
> - Objects/unicode/methods.c
> - Objects/unicode/operators.c
> - etc.
>
> I don't know if it's better to use a subdirectory, or use a prefix for new files: Objects/unicode_methods.c, Objects/unicode_codecs.c, etc. There is already a Python/codecs.c file for example (same filename).
>
> I would like to split the unicodeobject.c because it's hard to navigate in this huge file between all functions, variables, types, macros, etc. It's hard to add new code and to fix bugs. For example, the implementation of str%args takes 1000 lines, 2 types and 10 functions (since my refactor yesterday, in Python 3.3 the main function is 500 lines long :-)).
>
> I only see one argument against such refactoring: it will be harder to backport/forwardport bugfixes.
>
> Victor

-- Thanks, Andrew Svetlov
Re: [Python-Dev] Split unicodeobject.c into subfiles?
2012/10/4 Victor Stinner victor.stin...@gmail.com: I only see one argument against such refactoring: it will be harder to backport/forwardport bugfixes. I imagine it could also prevent inlining of hot paths. -- Regards, Benjamin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Split unicodeobject.c into subfiles?
2012/10/4 Benjamin Peterson benja...@python.org:
> 2012/10/4 Victor Stinner victor.stin...@gmail.com:
>> I only see one argument against such refactoring: it will be harder to backport/forwardport bugfixes.
> I imagine it could also prevent inlining of hot paths.

It depends on how the code is compiled. The stringlib is split into many .h files, but it is able to use Py_LOCAL_INLINE. If the code is grouped correctly, we may not lose any nice optimization at all. FYI #include "test.c" is allowed in C ;-)

Victor
Re: [Python-Dev] Split unicodeobject.c into subfiles?
2012/10/4 Victor Stinner victor.stin...@gmail.com:
> 2012/10/4 Benjamin Peterson benja...@python.org:
>> 2012/10/4 Victor Stinner victor.stin...@gmail.com:
>>> I only see one argument against such refactoring: it will be harder to backport/forwardport bugfixes.
>> I imagine it could also prevent inlining of hot paths.
> It depends on how the code is compiled. The stringlib is split into many .h files, but it is able to use Py_LOCAL_INLINE. If the code is grouped correctly, we may not lose any nice optimization at all. FYI #include "test.c" is allowed in C ;-)

Yes, but then compilation won't be any faster. ;)

-- Regards, Benjamin
Re: [Python-Dev] Split unicodeobject.c into subfiles?
On Thu, Oct 4, 2012 at 1:30 PM, Victor Stinner victor.stin...@gmail.com wrote: I would like to split the huge unicodeobject.c file into smaller files. It's just the longest C file of CPython: 14,849 lines. ... I only see one argument against such refactoring: it will be harder to backport/forwardport bugfixes. I am not siding with either side of the change yet, but an additional argument against is that history may become less convenient to navigate and track (e.g. hg annotate may lose information depending on how the split is done). Do we have a preferred way to split files? For example, hg rename could be used just for the largest chunk. Or hg copy could be used on all chunks but one. I imagine (but have not confirmed) that the latter would preserve hg annotate and let merges propagate to all files, but it would also result in spurious merge conflicts in every one of the files resulting from the split. --Chris ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Split unicodeobject.c into subfiles?
> I am not siding with either side of the change yet, but an additional argument against is that history may become less convenient to navigate and track (e.g. hg annotate may lose information depending on how the split is done).

If new files are created using hg cp unicodeobject.c unicode/newfile.c, the history is kept.

Victor
Re: [Python-Dev] Split unicodeobject.c into subfiles?
2012/10/4 Victor Stinner victor.stin...@gmail.com:
>> I am not siding with either side of the change yet, but an additional argument against is that history may become less convenient to navigate and track (e.g. hg annotate may lose information depending on how the split is done).
> If new files are created using hg cp unicodeobject.c unicode/newfile.c, the history is kept.

Yes, but you can only create one file that way.

-- Regards, Benjamin
Re: [Python-Dev] Split unicodeobject.c into subfiles?
2012/10/5 Benjamin Peterson benja...@python.org:
> 2012/10/4 Victor Stinner victor.stin...@gmail.com:
>> If new files are created using hg cp unicodeobject.c unicode/newfile.c, the history is kept.
> Yes, but you can only create one file that way.

You can create as many files as you want. Try:

---
hg cp unicodeobject.c unicode2.c
hg cp unicodeobject.c unicode3.c
hg ci -m "add new files"
edit unicode2.c (remove most lines)
edit unicode3.c (remove most lines, but other lines)
hg ci -m "modify"
hg blame unicode2.c
hg blame unicode3.c
---

Victor
Re: [Python-Dev] Split unicodeobject.c into subfiles?
On Thu, Oct 4, 2012 at 4:31 PM, Benjamin Peterson benja...@python.org wrote:
> 2012/10/4 Victor Stinner victor.stin...@gmail.com:
>>> I am not siding with either side of the change yet, but an additional
>>> argument against is that history may become less convenient to
>>> navigate and track (e.g. hg annotate may lose information depending on
>>> how the split is done).
>>
>> If new files are created using hg cp unicodeobject.c
>> unicode/newfile.c, the history is kept.
>
> Yes, but you can only create one file that way.

You can create multiple files this way. I just verified it. But the problem happens with merging. You will create merge conflicts in the deleted portions of every split file on every merge. There may be a way to avoid this that I don't know about though (i.e. to record that merges into the deleted portions should no longer occur).

--Chris
Re: [Python-Dev] Split unicodeobject.c into subfiles?
On 10/4/2012 4:30 PM, Victor Stinner wrote:
> Hi,
>
> I would like to split the huge unicodeobject.c file into smaller
> files. It's just the longest C file of CPython: 14,849 lines.

What problem are you trying to solve?

--
Eric.
Re: [Python-Dev] Split unicodeobject.c into subfiles?
On Thu, 04 Oct 2012 23:46:57 +0200 mar...@v.loewis.de wrote:
> Quoting Victor Stinner victor.stin...@gmail.com:
>
>> I only see one argument against such refactoring: it will be harder to
>> backport/forwardport bugfixes.
>
> I'm opposed for a different reason: I think it will be *harder* to
> maintain. The amount of code will not be reduced, but now you also
> need to guess what file some piece of functionality may be in. Instead
> of having my text editor (Emacs) search in one file, it will have to
> search across multiple files - but not across all open buffers, but
> only some of them (since I will have many other source files open as
> well).
>
> I really fail to see what problem people have with large source files.
> What is it that you want to do that can be done easier if it's
> multiple files?

Navigate, basically. That is, switch between different pieces of code, without having to type in some text to search for.

Regards

Antoine.

--
Software development and contracting: http://pro.pitrou.net
Re: [Python-Dev] Split unicodeobject.c into subfiles?
Chris Jerdonek writes:

> You can create multiple files this way. I just verified it. But the
> problem happens with merging. You will create merge conflicts in the
> deleted portions of every split file on every merge. There may be a
> way to avoid this that I don't know about though (i.e. to record that
> merges into the deleted portions should no longer occur).

hg commit will do that automatically, but you need to resolve that conflict once manually. If you also happen to be merging *from* a feature branch *into* the trunk, you need to close the feature branch. If it needs more work, close it and make a new branch.

Alternatively, merge from trunk into the feature branch immediately to minimize the accumulation of conflicts. Then the eventual merge back to trunk will only have real conflicts in it. Note that "immediately" in the sense needed can be done at any time, because what you need to do is merge the revision created by the hg cp operations. If Victor tags with hg tag unicode-refactored, then you just do hg merge -r unicode-refactored, and if you haven't made any changes to the relevant files, you shouldn't get any conflicts. If you do, then use hg revert -r unicode-refactored file ..., followed by hg resolve --mark file ..., then fix other conflicts, resolve, and commit as usual.

There's no other way to do it that I know of in any VCS because they all track conflicts at the file level. (It would be straightforward to generalize git to handle this gracefully, but it would be a hugely disruptive change. I don't know if Mercurial would be susceptible to such an extension.)

Specifically, AFAIK this kind of merge conflict will occur:

- if the branch being merged was forked before the hg cp, and
- for each file in the branch containing changes in the deleted region.

I don't advocate this, just want to make the costs clearer.
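The merge sequence described in the message above can be wrapped in a small shell function. This is a sketch, not an endorsed workflow: the function name `merge_split` and the `HG` override variable are hypothetical and not part of Mercurial, and the tag name is simply the one proposed in the message. It relies on `hg merge` returning 1 when files are left unresolved, and it deliberately stops on any other failure so that `hg revert` never runs outside an in-progress merge.

```sh
# Hypothetical helper, not part of Mercurial.
# HG may be overridden (e.g. HG="echo hg") for a dry run.
HG=${HG:-hg}

# usage: merge_split TAG FILE...
# Merge the tagged split revision into the current branch. If the merge
# leaves unresolved files, take the tagged version of each FILE and mark
# it resolved.
merge_split() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_split TAG FILE..." >&2
        return 2
    fi
    tag=$1
    shift

    status=0
    $HG merge -r "$tag" || status=$?
    case $status in
        0) echo "merged $tag without conflicts; review, then commit"
           return 0 ;;
        1) ;;                     # unresolved files: continue with the fix-up
        *) return "$status" ;;    # merge aborted: leave the working copy alone
    esac

    $HG revert -r "$tag" "$@" || return 1
    $HG resolve --mark "$@" || return 1
    echo "split files resolved; fix any other conflicts, then commit"
}
```

A call would look like `merge_split unicode-refactored FILE...`, naming only the split files whose conflicts lie in the deleted regions. Reverting a file discards the branch's changes to it, so any other file should be resolved by hand.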
Re: [Python-Dev] Split unicodeobject.c into subfiles?
2012/10/4 Antoine Pitrou solip...@pitrou.net:
> On Thu, 04 Oct 2012 23:46:57 +0200 mar...@v.loewis.de wrote:
>> Quoting Victor Stinner victor.stin...@gmail.com:
>>
>>> I only see one argument against such refactoring: it will be harder to
>>> backport/forwardport bugfixes.
>>
>> I'm opposed for a different reason: I think it will be *harder* to
>> maintain. The amount of code will not be reduced, but now you also
>> need to guess what file some piece of functionality may be in. Instead
>> of having my text editor (Emacs) search in one file, it will have to
>> search across multiple files - but not across all open buffers, but
>> only some of them (since I will have many other source files open as
>> well).
>>
>> I really fail to see what problem people have with large source files.
>> What is it that you want to do that can be done easier if it's
>> multiple files?
>
> Navigate, basically. That is, switch between different pieces of code,
> without having to type in some text to search for.

I find it's only possible to navigate without searching for extremely small files.

--
Regards,
Benjamin