Re: [Python-ideas] Introduce typing.SupportsFsPath
In typeshed there is os.PathLike, which is close. You should be able to use Union[str, os.PathLike[str]] for what you want (or define an alias).

We generally don't want to add more things to typing that aren't closely related to the type system. (Adding the io and re classes was already less than ideal, and we don't want to do more of those.)

On Mon, Oct 8, 2018 at 3:10 PM wrote:
> Hello,
>
> Since __fspath__ was introduced in PEP 519 it is possible to create
> object classes that are representing file system paths.
> But there is no corresponding type object in the "typing" module. Thus I
> cannot specify functions, that accept any kind of object which supports
> the __fspath__ protocol.
>
> Please note that "Path" is not a replacement for "SupportsFsPath", since
> the concept of PEP 519 is, that I could implement new objects (without
> dependency to "Path")
> that are implementing the __fspath__ protocol.
>
> robert

--
--Guido van Rossum (python.org/~guido)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
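A minimal sketch of the alias suggested above, assuming Python 3.6 or later (where os.PathLike and os.fspath exist). The names StrPath, ReportDir and read_text are invented for illustration and are not part of typing or os.

```python
import os
import tempfile
from typing import Union

# Hypothetical alias name. "os.PathLike[str]" is quoted so it is never
# subscripted at runtime; type checkers resolve it through typeshed.
StrPath = Union[str, "os.PathLike[str]"]


class ReportDir:
    """Illustrative path-like class with no dependency on pathlib.Path."""

    def __init__(self, root: str) -> None:
        self._root = root

    def __fspath__(self) -> str:
        # PEP 519 protocol: return the str form of the path.
        return os.path.join(self._root, "report.txt")


def read_text(path: StrPath) -> str:
    # open() accepts path-like objects directly, so no conversion is needed.
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Any object implementing __fspath__ should satisfy the annotation, which seems to cover the use case in the original request without a new class in typing.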
[Python-ideas] Introduce typing.SupportsFsPath
Hello,

Since __fspath__ was introduced in PEP 519, it is possible to create classes whose objects represent file system paths. But there is no corresponding type object in the "typing" module. Thus I cannot annotate functions that accept any kind of object which supports the __fspath__ protocol.

Please note that "Path" is not a replacement for "SupportsFsPath", since the concept of PEP 519 is that I could implement new objects (without a dependency on "Path") that implement the __fspath__ protocol.

robert
Re: [Python-ideas] Support parsing stream with `re`
On 08Oct2018 13:36, Ram Rachum wrote:
> I'm not an expert on memory. I used Process Explorer to look at the
> Process. The Working Set of the current run is 11GB. The Private Bytes
> is 708MB. Actually, see all the info here:
> https://www.dropbox.com/s/tzoud028pzdkfi7/screenshot_TURING_2018-10-08_133355.jpg?dl=0

And the process' virtual size is about 353GB, which matches having your file mmapped (its contents are now part of your process's virtual memory space).

> I've got 16GB of RAM on this computer, and Process Explorer says it's
> almost full, just ~150MB left. This is physical memory.

I'd say this is expected behaviour. As you access the memory it is paged into physical memory, and since it may be wanted again (the OS can't tell) it isn't paged out until that becomes necessary to make room for other virtual pages.

I suspect (but would need to test to find out) that sequentially reading the file instead of memory mapping it might not be so aggressive, because your process would be reusing that same small pool of memory to hold data as you scan the file.

Cheers,
Cameron Simpson
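The two access patterns contrasted above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the thread: the function names are made up, and the chunked variant only counts a literal byte value, because matching a real pattern across chunk boundaries would need extra overlap handling.

```python
import mmap
import os
import re
import tempfile


def count_matches_mmap(path, pattern):
    """Count regex matches over a memory-mapped file (bytes pattern)."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # re accepts any bytes-like object, so the mmap is scanned in place.
            return sum(1 for _ in re.finditer(pattern, mapped))


def count_byte_chunked(path, byte_value, chunk_size=1 << 20):
    """Count a literal byte value while reusing one fixed-size buffer."""
    buffer = bytearray(chunk_size)
    total = 0
    with open(path, "rb") as handle:
        while True:
            read = handle.readinto(buffer)
            if not read:
                break
            # Only the first `read` bytes of the buffer are valid this round.
            total += buffer.count(byte_value, 0, read)
    return total
```

With the mmap version the touched pages are attributed to the process and stay resident until the OS needs them back; the chunked version keeps the process footprint near chunk_size, although the OS may still hold the file in its own cache.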
Re: [Python-ideas] add a time decorator to timeit.py
> Summary: Python's timeit.timeit() has an undocumented feature /
> implementation detail that gives much of what the original poster has
> asked for. Perhaps revising the docs will solve the problem.

Although timeit can be used with a callable, you need to create a lambda expression if the function has args:

```
def func_to_time(a, b):
    ...

timeit.timeit(lambda: func_to_time(a, b), globals=globals())
```

and you can't use it as a decorator.

From: Python-ideas, on behalf of Jonathan Fine
Sent: Sunday, 7 October 2018 09:15
To: python-ideas
Subject: Re: [Python-ideas] add a time decorator to timeit.py

Summary: Python's timeit.timeit() has an undocumented feature / implementation detail that gives much of what the original poster has asked for. Perhaps revising the docs will solve the problem.

This thread has prompted me to look at timeit again. Usually, I look at the command line help first.

    >>> import timeit
    >>> help(timeit)
    Classes:
        Timer
    Functions:
        timeit(string, string) -> float
        repeat(string, string) -> list
        default_timer() -> float

This time, to my surprise, I found the following works:

    >>> def fn(): return 2 + 2
    >>> timeit.timeit(fn)
    0.10153918000287376

Until today, as I recall, I didn't know this. Now for:
https://docs.python.org/3/library/timeit.html

I don't see any examples there that show that timeit.timeit can take a callable as its first argument. So my ignorance can, I hope, be forgiven. Now for:
https://github.com/python/cpython/blob/3.7/Lib/timeit.py#L100

This contains, for both the stmt and setup parameters, explicit tests such as

    if isinstance(stmt, str):
        # string case
    elif callable(stmt):
        # callable case

So I think it's an undocumented feature, rather than an implementation detail.

And if you're a software historian, now perhaps look at
https://github.com/python/cpython/commits/3.7/Lib/timeit.py

And also, if you wish, for the tests for timeit.py.

--
Jonathan
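As a rough sketch of the two points raised in this exchange: a callable with arguments can be bound with functools.partial instead of a lambda, and a decorator can be layered on top of timeit.Timer. The decorator name timed and the attached .timeit helper are hypothetical; nothing like them exists in the timeit module.

```python
import functools
import timeit


def func_to_time(a, b):
    return a + b


# A callable with bound arguments can be passed directly. No globals= is
# needed here because nothing is compiled from a string.
elapsed = timeit.timeit(functools.partial(func_to_time, 2, 3), number=1000)


def timed(number=1000):
    """Hypothetical decorator: attaches a .timeit() helper to the function."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Normal calls are passed through untouched.
            return func(*args, **kwargs)

        def run_timeit(*args, **kwargs):
            timer = timeit.Timer(functools.partial(func, *args, **kwargs))
            return timer.timeit(number=number)

        wrapper.timeit = run_timeit
        return wrapper
    return decorate


@timed(number=500)
def add(a, b):
    return a + b
```

Calling add(1, 2) behaves as usual, while add.timeit(1, 2) returns the total time in seconds for 500 calls. Keeping the timing opt-in avoids slowing down every ordinary call, which is one possible design among several.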
Re: [Python-ideas] Why is design-by-contracts not widely adopted?
On 10/8/2018 10:26 AM, Steven D'Aprano wrote:
> On Sun, Oct 07, 2018 at 04:24:58PM -0400, Terry Reedy wrote:
>> https://www.win.tue.nl/~wstomv/edu/2ip30/references/design-by-contract/index.html
>> defines contracts as "precise (legally unambiguous) specifications" (5.2
>> Business Contracting/Sub-contracting Metaphor)
>
> You are quoting that out of context. The full context says (emphasis
> added):
>
>     IN THE BUSINESS WORLD, contracts are precise (legally unambiguous)
>     specifications that define the obligations and benefits of the
>     (usually two) parties involved.

This is silly. Every quote that is not complete is literally 'out of context'. However, 'quoting out of context', in the colloquial sense, means selectively quoting so as to distort the original meaning, whereas I attempted to focus on the core meaning I was about to discuss.

Marko asked an honest question about why things obvious to him are not obvious to others. I attempted to give an honest answer. If my answer suggested that I have not understood Marko properly, as is likely, he can use it as a hint as to how to communicate his position better.

>> I said above that functions may be specified by
>> process rather than result.
>
> Fine. What of it? Can you describe what the function does?
>
> "It sorts the list in place."
>
> "It deletes the given record from the database."
>
> These are all post-conditions.

No they are not. They are descriptions of the process. Additional mental work is required to turn them into formal descriptions of the result that can be coded. Marko appears to claim that such coded formal descriptions are easier to read and understand than the short English description. I disagree. It is therefore not obvious to me that the extra work is worthwhile.

>> def append_first(seq):
>>     "Append seq[0] to seq."
> [...]

The snipped body (revised to omit irrelevant 'return'):

    seq.append(seq[0])

>> But with duck-typing, no post condition is possible.
>
> That's incorrect.
>
> def append_first(seq):
>     require:
>         len(seq) > 0

seq does not necessarily have a __len__ method.

>         hasattr(seq, "append")

The actual precondition is that seq[0] be in the domain of seq.append. The only absolutely sure way to test this is to run the code body. Or one could run seq[0] and check it against the preconditions, if formally specified, of seq.append.

>     ensure:
>         len(seq) == len(OLD.seq) + 1
>         seq[0] == seq[-1]

Not even all sequences implement negative indexing.

This is true for lists, as I said, but not for every object that meets the preconditions. As others have said, duck typing means that we don't know what unexpected things methods of user-defined classes might do.

    class Unexpected():
        def __init__(self, first):
            self.first = first
        def __getitem__(self, key):
            if key == 0:
                return self.first
            else:
                raise ValueError(f'key {key} does not equal 0')
        def append(self, item):
            if isinstance(item, int):
                self.last = item
            else:
                raise TypeError(f'item {item} is not an int')

    def append_first(seq):
        seq.append(seq[0])

    x = Unexpected(42)
    append_first(x)
    print(x.first, x.last)  # 42 42

A more realistic example:

    def add(a, b): return a + b

The simplified precondition is that a.__add__ exists and applies to b, or that b.__radd__ exists and applies to a. I see no point in formally specifying this as part of 'def add', as it is part of the language definition. It is not just laziness that makes me averse to such redundancy.

Even ignoring user classes, a *useful* post-condition that applies to both numbers and sequences is hard to write. I believe + is associative for both, so that a + (b + b) = (a + b) + b, but

--
Terry Jan Reedy
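To make the disagreement concrete, here is a small sketch that writes the proposed require/ensure clauses as plain assert statements. It is not the syntax of any contracts library; append_first_checked and FirstOnly are names invented for this illustration, with FirstOnly being a cut-down stand-in for the Unexpected class above.

```python
def append_first_checked(seq):
    """Append seq[0] to seq, checking list-style contracts with asserts."""
    # Precondition: only expressible for objects that support len().
    assert len(seq) > 0, "seq must be non-empty"
    old_len = len(seq)  # snapshot standing in for OLD.seq

    seq.append(seq[0])

    # Postconditions: these hold for lists, not for every duck-typed object.
    assert len(seq) == old_len + 1
    assert seq[0] == seq[-1]


class FirstOnly:
    """Hypothetical duck type: has [0] and append(), but no __len__."""

    def __init__(self, first):
        self.first = first
        self.last = None

    def __getitem__(self, key):
        if key == 0:
            return self.first
        raise ValueError(f"key {key} does not equal 0")

    def append(self, item):
        self.last = item
```

With a list the checks pass. With FirstOnly the precondition itself raises TypeError, because len() is unsupported, even though the unchecked body seq.append(seq[0]) would have run without error. That is the gap between list contracts and duck-typed callers described above.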
Re: [Python-ideas] Better error messages for missing optional stdlib packages
On 10/08/2018 12:29 AM, Terry Reedy wrote:
> On 10/3/2018 4:29 PM, Marcus Harnisch wrote:
>> When trying to import lzma on one of my machines, I was surprised to get
>> a normal import error like for any other module.
>
> What was the traceback and message? Did you get an import error for one
> of the three imports in lzma.py? I don't know why you would expect
> anything else. Any import in any stdlib module can potentially fail if
> the file is buggy, corrupted, or missing.

    $ /usr/bin/python3
    Python 3.7.0 (default, Oct 4 2018, 03:21:59)
    [GCC 8.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import lzma
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.7/lzma.py", line 27, in <module>
    ModuleNotFoundError: No module named '_lzma'
    >>>

>> According to the docs lzma has been part of stdlib since 3.3. Further
>> digging revealed that the error is due to the fact that xz wasn't
>> compiled in when building Python.
>
> Perhaps this is a buggy build.

This, I reckon, depends on the perspective and the definition of “buggy”. If the build process finishes without error, can we assume that the build is not buggy? If we make claims along the lines of “nobody in their right mind would build Python without lzma”, it would only be fair to break the build if liblzma can't be detected.

Unless I missed something, it is only after the build has finished successfully that a message is printed listing the modules that setup.py couldn't detect.

Here is a list of modules which I believe are affected:

    $ grep -F missing.append setup.py
    missing.append('spwd')
    missing.append('readline')
    missing.append('_ssl')
    missing.append('_hashlib')
    missing.append('_sqlite3')
    missing.append('_dbm')
    missing.append('_gdbm')
    missing.append('nis')
    missing.append('_curses')
    missing.append('_curses_panel')
    missing.append('zlib')
    missing.append('zlib')
    missing.append('zlib')
    missing.append('_bz2')
    missing.append('_lzma')
    missing.append('_elementtree')
    missing.append('ossaudiodev')
    missing.append('_tkinter')
    missing.append('_uuid')

> Have you complained to the distributor?

After finding the root cause of the missing import I did file a request for including lzma in future releases of the distribution.

All I am asking is that unsuspecting users not be left in the dark when it comes to debugging unexpected import errors. I believe a missing stdlib module qualifies as “unexpected”. This could take the form of documentation, or of an import error handler that prints a helpful message when a stdlib module can't be found.

Regards,
Marcus
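One way the "helpful message" idea could look in user code is sketched below. This is only an illustration of the request, not an existing CPython mechanism; import_with_hint and the OPTIONAL_BACKENDS table (including its library names) are assumptions made for the example.

```python
import importlib

# Hypothetical mapping from extension module to the system library it needs.
OPTIONAL_BACKENDS = {
    "_lzma": "liblzma (xz)",
    "_bz2": "libbz2",
    "_sqlite3": "libsqlite3",
}


def import_with_hint(name):
    """Import a module, adding a build hint if an optional backend is missing."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        hint = OPTIONAL_BACKENDS.get(exc.name)
        if hint is None:
            # Not one of the known optional backends: leave the error alone.
            raise
        raise ModuleNotFoundError(
            f"{exc}. The standard library module {name!r} needs {exc.name!r}, "
            f"which is only built when {hint} is available at build time.",
            name=exc.name,
        ) from exc
```

On a build like the one shown above, import_with_hint("lzma") would still fail, but the message would point at the missing build-time dependency rather than only at '_lzma'.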
Re: [Python-ideas] support toml for pyproject support
On Mon, Oct 8, 2018 at 4:53 AM Erik Bray wrote:
> If I had the energy to argue it I would also argue against using TOML
> in those PEPs. I personally don't especially care for TOML and what's
> "obvious" to Tom is not at all obvious to me. I'd rather just stick
> with YAML or perhaps something even simpler than either one.

I feel the same way. (Somebody was requesting extensive TOML support for mypy and was also waving those PEPs in front of us.)

--
--Guido van Rossum (python.org/~guido)
Re: [Python-ideas] support toml for pyproject support
On Mon, Oct 8, 2018 at 12:49 PM Jimmy Girardet wrote:
>
> Hi,

Hi Jimmy and welcome! :)

> I don't know if this was already debated but I don't know how to search
> in the whole archive of the list.
>
> For now the adoption of pyproject.toml file is more difficult because
> toml is not in the standard library.
>
> Each tool which wants to use pyproject.toml has to add a toml lib as a
> conditional or hard dependency.
>
> Since toml is now the standard configuration file format, it's strange
> that Python does not support it in the stdlib, like it would have been
> strange to not have the configparser module.

Let's wait till TOML hits 1.0 before adding it to the standard library. It's still at 0.5 right now.

I am personally in favor of adding a standard library module for TOML, after it hits 1.0 and there's some stability after the release.

> I know it's complicated to add more and more things to the stdlib but I
> really think it is necessary for python packaging being more consistent.

TOML has a fairly unambiguous specification, so I don't think the choice of library should really affect what data gets loaded.

If there are differences across implementations, due to the TOML specification unintentionally being ambiguous, please do file an issue on GitHub. :)

> Maybe we could think of a read-only lib to limit the added code.

I don't think that would be as helpful as a round-tripping parser-writer combination, but I'll refrain from pushing for that *right now*.

> If it's conceivable, I'd be happy to help in it.
>
> Nice Day guys and girls.
>
> Jimmy

Cheers,
Pradyun (pip maintainer, TOML Core member)
Re: [Python-ideas] Why is design-by-contracts not widely adopted?
On Tue, Oct 09, 2018 at 01:21:57AM +1100, Chris Angelico wrote:
> > > Yet we keep having use-cases shown to us involving one person with one
> > > module, and another person with another module, and the interaction
> > > between the two.
> >
> > Do we? I haven't noticed anything that matches that description,
> > although I admit I haven't read every single post in these threads
> > religiously.
>
> Try this:

Thanks for the example, that's from one of the posts I haven't read.

> If you're regularly changing your function contracts, such that you
> need to continually test in case something in the other package
> changed, then yes, that's exactly what I'm talking about.

Presumably you're opposed to continuous integration testing too.

> I'm tired of debating this.

Is that what you were doing? I had wondered.

http://www.montypython.net/scripts/argument.php

*wink*

--
Steve
Re: [Python-ideas] Why is design-by-contracts not widely adopted?
Hi Chris,

I hope you don't mind me responding though you would like to stop participating. This message is meant for other readers in case they are interested.

> > Alice tests her package A with some test data D_A. Now assume Betty did
> > not write any contracts for her package B. When Alice tests her package,
> > she is actually making an integration test. While she controls the inputs
> > to B from A, she can only observe the results from B, but not whether they
> > are correct by coincidence or B did its job correctly. Let's denote D'_B
> > the data that is given to B from her original test data D_A during Alice's
> > integration testing.
>
> If you're regularly changing your function contracts, such that you
> need to continually test in case something in the other package
> changed, then yes, that's exactly what I'm talking about.

The user story I put above had nothing to do with change. I was telling how manually performing integration tests between A and B is tedious for us (since it involves some form or other of manually recording the inputs/outputs of module B and adapting the unit tests of B), while contracts are much better (*for us*) since they incur little overhead (write them once for B, anybody runs them automatically).

I did not want to highlight the *change* in my user story, but the ease of integration tests with contracts. If it were not for contracts, we would have never performed them.

Cheers,
Marko

On Mon, 8 Oct 2018 at 16:22, Chris Angelico wrote:
> On Mon, Oct 8, 2018 at 11:11 PM Steven D'Aprano wrote:
> >
> > On Mon, Oct 08, 2018 at 09:32:23PM +1100, Chris Angelico wrote:
> > > On Mon, Oct 8, 2018 at 9:26 PM Steven D'Aprano wrote:
> > > > > In other words, you change the *public interface* of your functions
> > > > > all the time? How do you not have massive breakage all the time?
> > > >
> > > > I can't comment about Marko's actual use-case, but *in general*
> > > > contracts are aimed at application *internal* interfaces, not so much
> > > > library *public* interfaces.
> > >
> > > Yet we keep having use-cases shown to us involving one person with one
> > > module, and another person with another module, and the interaction
> > > between the two.
> >
> > Do we? I haven't noticed anything that matches that description,
> > although I admit I haven't read every single post in these threads
> > religiously.
>
> Try this:
>
> On Mon, Oct 8, 2018 at 5:11 PM Marko Ristin-Kaufmann wrote:
> > Alice tests her package A with some test data D_A. Now assume Betty did
> > not write any contracts for her package B. When Alice tests her package,
> > she is actually making an integration test. While she controls the inputs
> > to B from A, she can only observe the results from B, but not whether they
> > are correct by coincidence or B did its job correctly. Let's denote D'_B
> > the data that is given to B from her original test data D_A during Alice's
> > integration testing.
>
> If you're regularly changing your function contracts, such that you
> need to continually test in case something in the other package
> changed, then yes, that's exactly what I'm talking about.
>
> I'm tired of debating this. Have fun. If you love contracts so much,
> marry them. I'm not interested in using them, because nothing in any
> of these threads has shown me any good use-cases that aren't just
> highlighting bad coding practices.
>
> ChrisA
Re: [Python-ideas] Support parsing stream with `re`
On Mon, Oct 8, 2018 at 11:15 PM Anders Hovmöller wrote:
> > > However, another possibility is that the regexp is consuming lots of memory.
> > >
> > > The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
> > > mad; I'm guessing you're just seeing the OS page in as much of the file as it
> > > can.
> >
> > Yup. Windows will aggressively fill up your RAM in cases like this
> > because after all why not? There's no use to having memory just
> > sitting around unused. For read-only, non-anonymous mappings it's not
> > much problem for the OS to drop pages that haven't been recently
> > accessed and use them for something else. So I wouldn't be too
> > worried about the process chewing up RAM.
> >
> > I feel like this is veering more into python-list territory for
> > further discussion though.
>
> Last time I worked on Windows, which admittedly was a long time ago, the file
> cache was not attributed to a process, so this doesn't seem to be relevant to
> this situation.

Depends whether it's a file cache or a memory-mapped file, though. On Linux, if I open a file, read it, then close it, I'm not using that file any more, but it might remain in cache (which will mean that re-reading it will be fast, regardless of whether that's from the same or a different process). That usage shows up as either "buffers" or "cache", and doesn't belong to any process.

In contrast, a mmap'd file is memory that you do indeed own. If the system runs short of physical memory, it can simply discard those pages (rather than saving them to the swap file), but they're still owned by one specific process, and should count in that process's virtual memory.

(That's based on my knowledge of Linux today and OS/2 back in the 90s. It may or may not be accurate to Windows, but I suspect it won't be very far wrong.)

ChrisA
Re: [Python-ideas] Why is design-by-contracts not widely adopted?
On Sun, Oct 07, 2018 at 04:24:58PM -0400, Terry Reedy wrote:

> A mathematical function is defined or specified by an input domain,
> output range, and a mapping from inputs to outputs. The mapping can be
> defined either by an explicit listing of input-output pairs or by a rule
> specifying either a) the process, what is done to inputs to produce
> outputs or, b) the result, how the output relates to the input.

Most code does not define pure mathematical functions, unless you're writing in Haskell :-)

> https://www.win.tue.nl/~wstomv/edu/2ip30/references/design-by-contract/index.html
>
> defines contracts as "precise (legally unambiguous) specifications" (5.2
> Business Contracting/Sub-contracting Metaphor)

You are quoting that out of context. The full context says (emphasis added):

    IN THE BUSINESS WORLD, contracts are precise (legally unambiguous)
    specifications that define the obligations and benefits of the
    (usually two) parties involved.

and later goes on to say:

    How does this apply to software correctness? Consider the execution
    of a routine. The called routine provides a service - it is a
    supplier. The caller is the client that is requesting the service.
    We can impose a contract that spells out precisely the obligations
    and benefits of both the caller (client) and the callee (supplier).
    This contract SERVES AS THE INTERFACE SPECIFICATION FOR THE ROUTINE.

(I would add *executable* interface specification.)

> It is not obvious to me
> that the metaphor of contracts adds anything worthwhile to the idea of
> 'function'.

It doesn't. That's not what the metaphor is for.

Design By Contract is not a redefinition of "function", it is a software methodology, a paradigm for helping programmers reason better about functions and specify the interface so that bugs are discovered earlier.

> 1. Only a small sliver of human interactions are governed by formal
> legal contracts read, understood, and agreed to by both (all) parties.

Irrelevant.

> 2. The idealized definition is naive in practice. Most legal contracts,
> unlike the example in the link article, are written in language that
> most people cannot read.

Irrelevant.

Dicts aren't actual paper books filled with definitions of words, floats don't actually float, neural nets are not made of neurons nor can you catch fish in them, and software contracts are code, not legally binding contracts. It is a *metaphor*.

> Many contracts are imprecise and legally
> ambiguous, which is why we have contract dispute courts. And even then,
> the expense means that most people who feel violated in a transaction do
> not use courts.

Is this a critique of the legal system? What relevance does it have to Design By Contract?

Honestly Terry, you seem to be arguing:

"Hiring a lawyer is too expensive, and that's why Design By Contract doesn't work as a software methodology."

> Post-conditions specify a function by result. I claim that this is not
> always sensible.

In this context, "result" can mean either "the value returned by the function" OR "the action performed by the function (its side-effect)".

Post-conditions can check both.

> I said above that functions may be specified by
> process rather than result.

Fine. What of it? Can you describe what the function does?

"It sorts the list in place."

"It deletes the given record from the database."

"It deducts the given amount from Account A and transfers it to Account B, guaranteeing that either both transactions occur or neither of them, but never one and not the other."

These are all post-conditions. Write them as code, and they are contracts. If you can't write them as code, okay, move on to the next function.

(By the way, since you started off talking about mathematical functions, functions which perform a process rather than return a result aren't mathematical functions.)

> Ironically, the contract metaphor
> reinforces my claim. Many contracts, such as in teaching and medicine,
> only specify process and explicitly disclaim any particular result of
> concern to the client.

Irrelevant.

> > b) If you write contracts in text, they will become stale over time
>
> Not true for good docstrings. We very seldom change the essential
> meaning of public functions.

What about public functions while they are still under active development with an unstable interface?

> How has "Return the sine of x (measured in radians).", for math.sin,
> become stale? Why would it ever?

Of course a stable function with a fixed API is unlikely to change. What's your point? The sin() function implementation on many platforms probably hasn't changed in 10 or even 20 years. (It probably just calls the hardware routines.) Should we conclude that unit testing is therefore bunk and nobody needs to write unit tests?

> What formal executable post condition
> would help someone who does not understand 'sine', or
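As an illustration of "write them as code", the first description ("It sorts the list in place.") might be turned into executable post-conditions roughly like this. Plain assert statements stand in for a contracts library, and sort_in_place is a made-up name; the Counter snapshot assumes hashable items.

```python
from collections import Counter


def sort_in_place(items):
    """Sort a list in place, checking the post-conditions afterwards."""
    before = Counter(items)  # snapshot of the OLD contents (hashable items only)

    items.sort()

    # Post-condition 1: neighbours are in non-decreasing order.
    assert all(a <= b for a, b in zip(items, items[1:]))
    # Post-condition 2: the same elements as before, nothing added or lost.
    assert Counter(items) == before
```

Here the contract checks a side effect rather than a return value, which matches the point that "result" covers both. Whether these two asserts are easier to read than the one-line English description is exactly what the thread disagrees about.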
Re: [Python-ideas] Why is design-by-contracts not widely adopted?
On Mon, Oct 8, 2018 at 11:11 PM Steven D'Aprano wrote:
>
> On Mon, Oct 08, 2018 at 09:32:23PM +1100, Chris Angelico wrote:
> > On Mon, Oct 8, 2018 at 9:26 PM Steven D'Aprano wrote:
> > > > In other words, you change the *public interface* of your functions
> > > > all the time? How do you not have massive breakage all the time?
> > >
> > > I can't comment about Marko's actual use-case, but *in general*
> > > contracts are aimed at application *internal* interfaces, not so much
> > > library *public* interfaces.
> >
> > Yet we keep having use-cases shown to us involving one person with one
> > module, and another person with another module, and the interaction
> > between the two.
>
> Do we? I haven't noticed anything that matches that description,
> although I admit I haven't read every single post in these threads
> religiously.

Try this:

On Mon, Oct 8, 2018 at 5:11 PM Marko Ristin-Kaufmann wrote:
> Alice tests her package A with some test data D_A. Now assume Betty did not
> write any contracts for her package B. When Alice tests her package, she is
> actually making an integration test. While she controls the inputs to B from
> A, she can only observe the results from B, but not whether they are correct
> by coincidence or B did its job correctly. Let's denote D'_B the data that is
> given to B from her original test data D_A during Alice's integration testing.

If you're regularly changing your function contracts, such that you need to continually test in case something in the other package changed, then yes, that's exactly what I'm talking about.

I'm tired of debating this. Have fun. If you love contracts so much, marry them. I'm not interested in using them, because nothing in any of these threads has shown me any good use-cases that aren't just highlighting bad coding practices.

ChrisA
Re: [Python-ideas] support toml for pyproject support
I agree here. I briefly urged against using the less used TOML format, but I have no real skin in the game around packaging. I like YAML, but that's also not in the standard library, even if more widely used.

But given that packaging is committed to TOML, I think that's a strong case for including a library in stdlib. The PEP 517/518 authors had their reasons that were accepted. Now there is a broad ecosystem that is built on that choice. Let's support it.

On Mon, Oct 8, 2018, 8:03 AM Anders Hovmöller wrote:
> >> He's referring to PEPs 518 and 517 [1], which indeed standardize on
> >> TOML as a file format for Python package build metadata.
> >>
> >> I think moving anything into the stdlib would be premature though –
> >> TOML libraries are under active development, and the general trend in
> >> the packaging space has been to move things *out* of the stdlib (e.g.
> >> there's repeated rumblings about moving distutils out), because the
> >> stdlib release cycle doesn't work well for packaging infrastructure.
> >
> > If I had the energy to argue it I would also argue against using TOML
> > in those PEPs. I personally don't especially care for TOML and what's
> > "obvious" to Tom is not at all obvious to me. I'd rather just stick
> > with YAML or perhaps something even simpler than either one.
>
> This thread isn't about regretting past decisions but what makes sense
> given current realities though.
>
> / Anders
Re: [Python-ideas] Debugging: some problems and possible solutions
On 04/10/18 19:10, Jonathan Fine wrote:
> In response to my problem-solution pair (fixing a typo)
>
>     TITLE: Debug print() statements cause doctests to fail
>
> Rhodri James wrote:
>> Or write your debug output to stderr?
>
> Perhaps I've been too concise. If so, I apologise. My proposal is that
> the system be set up so that
>     debug(a, b, c)
> sends output to the correct stream, whatever it should be.
>
> Rhodri: Thank you for your contribution. Are you saying that because the
> developer can write
>     print(a, b, c, file=sys.stderr)
> there's not a problem to solve here?

Exactly so. If you want a quick drop of debug information, print() will do that just fine. If you want detailed or tunable information, that's what the logging module is for. I'm not sure where on the line between the two your debug() sits and what it's supposed to offer that is better than either of the alternatives.

--
Rhodri James *-* Kynesim Ltd
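The two alternatives mentioned in the reply can be sketched side by side. The helper debug_print, the logger name "example.debug" and the function double are made up for this illustration; only print(..., file=sys.stderr) and the logging module come from the discussion.

```python
import logging
import sys


def debug_print(*args):
    """Quick debug output that stays out of stdout, and so out of doctests."""
    print(*args, file=sys.stderr)


logger = logging.getLogger("example.debug")  # hypothetical logger name


def double(value):
    """Return twice the value.

    >>> double(21)
    42
    """
    debug_print("double called with", value)  # goes to stderr, not stdout
    logger.debug("double(%r)", value)         # silent unless logging is configured
    return value * 2
```

Because doctest only compares what is written to stdout, the docstring example above should keep passing with both debug lines in place. The logging call can additionally be switched on per module through the usual logging configuration.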
Re: [Python-ideas] Support parsing stream with `re`
Thanks for your help everybody! I'm very happy to have learned about mmap.

On Mon, Oct 8, 2018 at 3:27 PM Richard Damon wrote:
> On 10/8/18 8:11 AM, Ram Rachum wrote:
> > " Windows will aggressively fill up your RAM in cases like this
> > because after all why not? There's no use to having memory just
> > sitting around unused."
> >
> > Two questions:
> >
> > 1. Is the "why not" sarcastic, as in you're agreeing it's a waste?
> > 2. Will this be different on Linux? Which command do I run on Linux to
> > verify that the process isn't taking too much RAM?
> >
> > Thanks,
> > Ram.
>
> I would say the 'why not' isn't being sarcastic but pragmatic. (And I
> would expect Linux to work similarly). After all if you have a system
> with X amount of memory, and total memory demand for the other processes
> is 10% of X, what is the issue with letting one process use 80% of X
> with memory usage that is easy to clear out if something else wants it.
> A read only page that is already backed on the disk is trivial to make
> available for another usage.
>
> Memory just sitting idle is the real waste.
>
> --
> Richard Damon
Re: [Python-ideas] Support parsing stream with `re`
On 10/8/18 8:11 AM, Ram Rachum wrote:
> "Windows will aggressively fill up your RAM in cases like this
> because after all why not? There's no use to having memory just
> sitting around unused."
>
> Two questions:
>
> 1. Is the "why not" sarcastic, as in you're agreeing it's a waste?
> 2. Will this be different on Linux? Which command do I run on Linux to
>    verify that the process isn't taking too much RAM?
>
> Thanks,
> Ram.

I would say the 'why not' isn't being sarcastic but pragmatic. (And I would expect Linux to work similarly.) After all, if you have a system with X amount of memory, and total memory demand for the other processes is 10% of X, what is the issue with letting one process use 80% of X with memory usage that is easy to clear out if something else wants it? A read-only page that is already backed on the disk is trivial to make available for another usage.

Memory just sitting idle is the real waste.

--
Richard Damon
Re: [Python-ideas] Support parsing stream with `re`
>> However, another possibility is that the regexp is consuming lots of memory.
>>
>> The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
>> mad; I'm guessing you're just seeing the OS page in as much of the file as it
>> can.
>
> Yup. Windows will aggressively fill up your RAM in cases like this
> because after all why not? There's no use to having memory just
> sitting around unused. For read-only, non-anonymous mappings it's not
> much problem for the OS to drop pages that haven't been recently
> accessed and use them for something else. So I wouldn't be too
> worried about the process chewing up RAM.
>
> I feel like this is veering more into python-list territory for
> further discussion though.

Last time I worked on Windows, which admittedly was a long time ago, the file cache was not attributed to a process, so this doesn't seem to be relevant to this situation.

/ Anders
Re: [Python-ideas] Support parsing stream with `re`
" Windows will aggressively fill up your RAM in cases like this because after all why not? There's no use to having memory just sitting around unused." Two questions: 1. Is the "why not" sarcastic, as in you're agreeing it's a waste? 2. Will this be different on Linux? Which command do I run on Linux to verify that the process isn't taking too much RAM? Thanks, Ram. On Mon, Oct 8, 2018 at 3:02 PM Erik Bray wrote: > On Mon, Oct 8, 2018 at 12:20 PM Cameron Simpson wrote: > > > > On 08Oct2018 10:56, Ram Rachum wrote: > > >That's incredibly interesting. I've never used mmap before. > > >However, there's a problem. > > >I did a few experiments with mmap now, this is the latest: > > > > > >path = pathlib.Path(r'P:\huge_file') > > > > > >with path.open('r') as file: > > >mmap = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) > > > > Just a remark: don't tromp on the "mmap" name. Maybe "mapped"? > > > > >for match in re.finditer(b'.', mmap): > > >pass > > > > > >The file is 338GB in size, and it seems that Python is trying to load it > > >into memory. The process is now taking 4GB RAM and it's growing. I saw > the > > >same behavior when searching for a non-existing match. > > > > > >Should I open a Python bug for this? > > > > Probably not. First figure out what is going on. BTW, how much RAM have > you > > got? > > > > As you access the mapped file the OS will try to keep it in memory in > case you > > need that again. In the absense of competition, most stuff will get > paged out > > to accomodate it. That's normal. All the data are "clean" (unmodified) > so the > > OS can simply release the older pages instantly if something else needs > the > > RAM. > > > > However, another possibility is the the regexp is consuming lots of > memory. > > > > The regexp seems simple enough (b'.'), so I doubt it is leaking memory > like > > mad; I'm guessing you're just seeing the OS page in as much of the file > as it > > can. > > Yup. 
Windows will aggressively fill up your RAM in cases like this > because after all why not? There's no use to having memory just > sitting around unused. For read-only, non-anonymous mappings it's not > much problem for the OS to drop pages that haven't been recently > accessed and use them for something else. So I wouldn't be too > worried about the process chewing up RAM. > > I feel like this is veering more into python-list territory for > further discussion though. > > ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Why is design-by-contracts not widely adopted?
On Mon, Oct 08, 2018 at 09:32:23PM +1100, Chris Angelico wrote:
> On Mon, Oct 8, 2018 at 9:26 PM Steven D'Aprano wrote:
> > > In other words, you change the *public interface* of your functions
> > > all the time? How do you not have massive breakage all the time?
> >
> > I can't comment about Marko's actual use-case, but *in general*
> > contracts are aimed at application *internal* interfaces, not so much
> > library *public* interfaces.
>
> Yet we keep having use-cases shown to us involving one person with one
> module, and another person with another module, and the interaction
> between the two.

Do we? I haven't noticed anything that matches that description, although I admit I haven't read every single post in these threads religiously.

But "application" != "one module" or "one developer". I fail to see the contradiction. An application can be split over dozens of modules, written by teams of developers. Whether one or a dozen modules, it still has no public interface that third-party code can call. It is *all* internal.

Obviously if you are using contracts in public library code, the way you will manage them is different from the way you would manage them if you are using them for private or internal code. That's no different from (say) docstrings and doctests: there are implied stability promises for those in *some* functions (the public ones) but not *other* functions (the private ones).

Of course some devs don't believe in stability promises, and treat all APIs as unstable. So what? That has nothing to do with contracts. People can "move fast and break everything" in any programming style they like.

> Which way is it? Do the contracts change frequently or not?

"Mu."

https://en.wikipedia.org/wiki/Mu_(negative)

They change as frequently as you, the developer writing them, choose to change them. Just like your tests, your type annotations, your docstrings, and every other part of your code.

> Are they public or not?

That's up to you. Contracts were originally designed for application development, where the concept of "public" versus "private" is meaningless. The philosophy of DbC is always going to be biased towards that mind-set.

Nevertheless, people can choose to use them for library code where there is a meaningful distinction. If they do so, then how they choose to manage the contracts is up to them. If you want to make a contract a public part of the interface, then you can (but that would rule out disabling that specific contract, at least for pre-conditions). If you only want to use it for internal interfaces, you can do that too. If you want to mix and match and make some contracts internal and some public, there is no DbC Police to tell you that you can't.

> How are we supposed to understand the point of contracts

You could start by reading the explanations given on the Eiffel page, which I've linked to about a bazillion times. Then you could read about another bazillion blog posts and discussions that describe it (some pro, some con, some mixed). And you can read the Wikipedia page that shows how DbC is supported natively by at least 17 languages (not just Eiffel) and via libraries in at least 15 others. Not just new experimental languages, but old, established and conservative languages like Java, C and Ada.

There are heaps of discussions on DbC on Stackoverflow:

https://stackoverflow.com/search?q=design%20by%20contract

and a good page on wiki.c2:

http://wiki.c2.com/?DesignByContract

TIL: Pre- and postconditions were first supported natively by Barbara Liskov's CLU in the 1970s.

This is not some "weird bizarre Eiffel thing", as people seem to believe. If it hasn't quite gone mainstream, it is surely at least as common as functional programming style. It has been around for over forty years in one way or another, not four weeks, and is a standard, well-established if minority programming style and development process.

Of course it is always valid to debate the pros and cons of DbC versus other development paradigms, but questioning the very basis of DbC as people here keep doing is as ludicrous and annoying as questioning the basis of OOP or FP or TDD would be.

Just as functional programming is a paradigm that says (among other things) "no side effects", "no global variables holding state" etc, and we can choose to apply that paradigm even in non-FP languages, so DbC is in part a paradigm that tells you how to design the internals of your application.

We can apply the same design concepts to any code we want, even if we're not buying into the whole Contract metaphor:

- pre-conditions can be considered argument validation;
- post-conditions can be considered a kind of test;
- class invariants can be considered a kind of defensive assertion.

> if the use-cases being shown all involve bad code
> and/or bad coding practices?

How do you draw that conclusion?

> Contracts, apparently, allow people to violate vers
Re: [Python-ideas] Support parsing stream with `re`
On Mon, Oct 8, 2018 at 12:20 PM Cameron Simpson wrote:
>
> On 08Oct2018 10:56, Ram Rachum wrote:
> >That's incredibly interesting. I've never used mmap before.
> >However, there's a problem.
> >I did a few experiments with mmap now, this is the latest:
> >
> >path = pathlib.Path(r'P:\huge_file')
> >
> >with path.open('r') as file:
> >    mmap = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
>
> Just a remark: don't tromp on the "mmap" name. Maybe "mapped"?
>
> >for match in re.finditer(b'.', mmap):
> >    pass
> >
> >The file is 338GB in size, and it seems that Python is trying to load it
> >into memory. The process is now taking 4GB RAM and it's growing. I saw the
> >same behavior when searching for a non-existing match.
> >
> >Should I open a Python bug for this?
>
> Probably not. First figure out what is going on. BTW, how much RAM have you
> got?
>
> As you access the mapped file the OS will try to keep it in memory in case you
> need that again. In the absence of competition, most stuff will get paged out
> to accommodate it. That's normal. All the data are "clean" (unmodified) so the
> OS can simply release the older pages instantly if something else needs the
> RAM.
>
> However, another possibility is that the regexp is consuming lots of memory.
>
> The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
> mad; I'm guessing you're just seeing the OS page in as much of the file as it
> can.

Yup. Windows will aggressively fill up your RAM in cases like this because after all why not? There's no use to having memory just sitting around unused. For read-only, non-anonymous mappings it's not much problem for the OS to drop pages that haven't been recently accessed and use them for something else. So I wouldn't be too worried about the process chewing up RAM.

I feel like this is veering more into python-list territory for further discussion though.
Re: [Python-ideas] support toml for pyproject support
>> He's referring to PEPs 518 and 517 [1], which indeed standardize on
>> TOML as a file format for Python package build metadata.
>>
>> I think moving anything into the stdlib would be premature though –
>> TOML libraries are under active development, and the general trend in
>> the packaging space has been to move things *out* of the stdlib (e.g.
>> there's repeated rumblings about moving distutils out), because the
>> stdlib release cycle doesn't work well for packaging infrastructure.
>
> If I had the energy to argue it I would also argue against using TOML
> in those PEPs. I personally don't especially care for TOML and what's
> "obvious" to Tom is not at all obvious to me. I'd rather just stick
> with YAML or perhaps something even simpler than either one.

This thread isn't about regretting past decisions but what makes sense given current realities though.

/ Anders
Re: [Python-ideas] Why is design-by-contracts not widely adopted?
Hi Chris,

> > In other words, you change the *public interface* of your functions
> > all the time? How do you not have massive breakage all the time?
>
> I can't comment about Marko's actual use-case, but *in general*
> contracts are aimed at application *internal* interfaces, not so much
> library *public* interfaces.

Sorry, I might have misunderstood the question -- I was referring to modules used within the company, not outside. Of course, public libraries put on pypi don't change their interfaces weekly.

Just to clear up the confusion, both Steve and I would claim that the contracts do count as part of the interface.

For everything internal, we make changes frequently (including the interface) and more often than not, the docstring is not updated when the implementation of the function is. Contracts help our team catch breaking changes more easily. When we change the behavior of the function, we use "Find usage" in Pycharm, manually fix what we can obviously see was affected by the changed implementation, then statically check with mypy that the changed return type did not affect the callers, and contracts (of other functions!) catch some of the bugs during testing that we missed when we changed the implementation. End-to-end tests with testing contracts turned off catch some more bugs on the real data, and then it goes into production where hopefully we see no errors.

Cheers,
Marko

On Mon, 8 Oct 2018 at 12:32, Chris Angelico wrote:
> On Mon, Oct 8, 2018 at 9:26 PM Steven D'Aprano wrote:
> > > In other words, you change the *public interface* of your functions
> > > all the time? How do you not have massive breakage all the time?
> >
> > I can't comment about Marko's actual use-case, but *in general*
> > contracts are aimed at application *internal* interfaces, not so much
> > library *public* interfaces.
>
> Yet we keep having use-cases shown to us involving one person with one
> module, and another person with another module, and the interaction
> between the two. Which way is it? Do the contracts change frequently
> or not? Are they public or not? How are we supposed to understand the
> point of contracts if the use-cases being shown all involve bad code
> and/or bad coding practices?
>
> Contracts, apparently, allow people to violate versioning expectations
> and feel good about it.
>
> (Am I really exaggerating all that much here?)
>
> ChrisA
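The workflow Marko describes, where a contract on one function catches a caller that was missed during a change, and where contracts can be switched off for some runs, might look roughly like this. It is a minimal sketch and not the library his team uses: `require`, `CONTRACTS_ENABLED`, `average` and `report` are all hypothetical names.

```python
import functools

CONTRACTS_ENABLED = True  # hypothetical switch; could be False in production


def require(predicate, message):
    """Hypothetical pre-condition decorator: check arguments before the call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if CONTRACTS_ENABLED and not predicate(*args, **kwargs):
                raise AssertionError(f"pre-condition violated: {message}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@require(lambda items: len(items) > 0, "items must not be empty")
def average(items):
    return sum(items) / len(items)


def report(items):
    # An internal caller. If it ever passes an empty list, the contract on
    # average() names the violated condition during testing instead of
    # leaving a bare ZeroDivisionError to be traced back.
    return f"mean={average(items):.1f}"
```

With the switch on, `report([])` fails with the contract message; with it off, the same call falls through to the underlying `ZeroDivisionError`.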
Re: [Python-ideas] support toml for pyproject support
On Mon, Oct 8, 2018 at 12:23 PM Nathaniel Smith wrote:
>
> On Mon, Oct 8, 2018 at 2:55 AM, Steven D'Aprano wrote:
> >
> > On Mon, Oct 08, 2018 at 09:10:40AM +0200, Jimmy Girardet wrote:
> >> Each tool which wants to use pyproject.toml has to add a toml lib as a
> >> conditional or hard dependency.
> >>
> >> Since toml is now the standard configuration file format,
> >
> > It is? Did I miss the memo? Because I've never even heard of TOML before
> > this very moment.
>
> He's referring to PEPs 518 and 517 [1], which indeed standardize on
> TOML as a file format for Python package build metadata.
>
> I think moving anything into the stdlib would be premature though –
> TOML libraries are under active development, and the general trend in
> the packaging space has been to move things *out* of the stdlib (e.g.
> there's repeated rumblings about moving distutils out), because the
> stdlib release cycle doesn't work well for packaging infrastructure.

If I had the energy to argue it I would also argue against using TOML in those PEPs. I personally don't especially care for TOML and what's "obvious" to Tom is not at all obvious to me. I'd rather just stick with YAML or perhaps something even simpler than either one.
Re: [Python-ideas] Support parsing stream with `re`
I'm not an expert on memory. I used Process Explorer to look at the Process. The Working Set of the current run is 11GB. The Private Bytes is 708MB. Actually, see all the info here:

https://www.dropbox.com/s/tzoud028pzdkfi7/screenshot_TURING_2018-10-08_133355.jpg?dl=0

I've got 16GB of RAM on this computer, and Process Explorer says it's almost full, just ~150MB left. This is physical memory.

To your question: The loop does iterate, i.e. finding multiple matches.

On Mon, Oct 8, 2018 at 1:20 PM Cameron Simpson wrote:
> On 08Oct2018 10:56, Ram Rachum wrote:
> >That's incredibly interesting. I've never used mmap before.
> >However, there's a problem.
> >I did a few experiments with mmap now, this is the latest:
> >
> >path = pathlib.Path(r'P:\huge_file')
> >
> >with path.open('r') as file:
> >    mmap = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
>
> Just a remark: don't tromp on the "mmap" name. Maybe "mapped"?
>
> >for match in re.finditer(b'.', mmap):
> >    pass
> >
> >The file is 338GB in size, and it seems that Python is trying to load it
> >into memory. The process is now taking 4GB RAM and it's growing. I saw the
> >same behavior when searching for a non-existing match.
> >
> >Should I open a Python bug for this?
>
> Probably not. First figure out what is going on. BTW, how much RAM have you
> got?
>
> As you access the mapped file the OS will try to keep it in memory in case you
> need that again. In the absence of competition, most stuff will get paged out
> to accommodate it. That's normal. All the data are "clean" (unmodified) so the
> OS can simply release the older pages instantly if something else needs the
> RAM.
>
> However, another possibility is that the regexp is consuming lots of memory.
>
> The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
> mad; I'm guessing you're just seeing the OS page in as much of the file as it
> can.
>
> Also, does the loop iterate? i.e. does it find multiple matches as the memory
> gets consumed, or is the first iteration blocking and consuming gobs of memory
> before the first match comes back? A print() call will tell you that.
>
> Cheers,
> Cameron Simpson
Re: [Python-ideas] Why is design-by-contracts not widely adopted?
On Mon, Oct 8, 2018 at 9:26 PM Steven D'Aprano wrote:
> > In other words, you change the *public interface* of your functions
> > all the time? How do you not have massive breakage all the time?
>
> I can't comment about Marko's actual use-case, but *in general*
> contracts are aimed at application *internal* interfaces, not so much
> library *public* interfaces.

Yet we keep having use-cases shown to us involving one person with one module, and another person with another module, and the interaction between the two. Which way is it? Do the contracts change frequently or not? Are they public or not? How are we supposed to understand the point of contracts if the use-cases being shown all involve bad code and/or bad coding practices?

Contracts, apparently, allow people to violate versioning expectations and feel good about it.

(Am I really exaggerating all that much here?)

ChrisA
Re: [Python-ideas] Why is design-by-contracts not widely adopted?
On Mon, Oct 08, 2018 at 04:29:34PM +1100, Chris Angelico wrote:
> On Mon, Oct 8, 2018 at 4:26 PM Marko Ristin-Kaufmann wrote:
> >> Not true for good docstrings. We very seldom change the essential
> >> meaning of public functions.
> >
> > In my team, we have a stale docstring once every two weeks or even more
> > often.

"At Resolver we've found it useful to short-circuit any doubt and just refer to comments in code as 'lies'."
--Michael Foord paraphrases Christian Muirhead on python-dev, 2009-03-22

> > If it weren't for doctests and contracts, I could imagine we would
> > have them even more often :)
>
> In other words, you change the *public interface* of your functions
> all the time? How do you not have massive breakage all the time?

I can't comment about Marko's actual use-case, but *in general* contracts are aimed at application *internal* interfaces, not so much library *public* interfaces.

That's not to say that contracts can't be used for libraries at all, but they're not so useful for public interfaces that could be called by arbitrary third-parties. They are more useful for internal interfaces, where you don't break anyone's code but your own if you change the API.

Think about it this way: you probably wouldn't hesitate much to change the interface of a _private method or function, aside from discussing it with your dev team. Sure it will break some code, but you have tests to identify the breakage, and maybe refactoring tools to help. And of course the contracts themselves are de facto tests. Such changes are manageable. And since it's a private function, nobody outside of your team need care.

Same with contracts. (At least in the ideal case.)

--
Steve
Re: [Python-ideas] support toml for pyproject support
On Mon, Oct 8, 2018 at 2:55 AM, Steven D'Aprano wrote:
>
> On Mon, Oct 08, 2018 at 09:10:40AM +0200, Jimmy Girardet wrote:
>> Each tool which wants to use pyproject.toml has to add a toml lib as a
>> conditional or hard dependency.
>>
>> Since toml is now the standard configuration file format,
>
> It is? Did I miss the memo? Because I've never even heard of TOML before
> this very moment.

He's referring to PEPs 518 and 517 [1], which indeed standardize on TOML as a file format for Python package build metadata.

I think moving anything into the stdlib would be premature though – TOML libraries are under active development, and the general trend in the packaging space has been to move things *out* of the stdlib (e.g. there's repeated rumblings about moving distutils out), because the stdlib release cycle doesn't work well for packaging infrastructure.

-n

[1] https://www.python.org/dev/peps/pep-0518/
    https://www.python.org/dev/peps/pep-0517

--
Nathaniel J. Smith -- https://vorpus.org
Re: [Python-ideas] Support parsing stream with `re`
On 08Oct2018 10:56, Ram Rachum wrote:
>That's incredibly interesting. I've never used mmap before.
>However, there's a problem.
>I did a few experiments with mmap now, this is the latest:
>
>path = pathlib.Path(r'P:\huge_file')
>
>with path.open('r') as file:
>    mmap = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)

Just a remark: don't tromp on the "mmap" name. Maybe "mapped"?

>for match in re.finditer(b'.', mmap):
>    pass
>
>The file is 338GB in size, and it seems that Python is trying to load it
>into memory. The process is now taking 4GB RAM and it's growing. I saw the
>same behavior when searching for a non-existing match.
>
>Should I open a Python bug for this?

Probably not. First figure out what is going on. BTW, how much RAM have you got?

As you access the mapped file the OS will try to keep it in memory in case you need that again. In the absence of competition, most stuff will get paged out to accommodate it. That's normal. All the data are "clean" (unmodified) so the OS can simply release the older pages instantly if something else needs the RAM.

However, another possibility is that the regexp is consuming lots of memory.

The regexp seems simple enough (b'.'), so I doubt it is leaking memory like mad; I'm guessing you're just seeing the OS page in as much of the file as it can.

Also, does the loop iterate? i.e. does it find multiple matches as the memory gets consumed, or is the first iteration blocking and consuming gobs of memory before the first match comes back? A print() call will tell you that.

Cheers,
Cameron Simpson
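Cameron's two suggestions (do not reuse the `mmap` name, and print something to see whether the loop is iterating) could be applied to the experiment roughly as follows. This is a sketch under assumptions: `count_matches` and `report_every` are hypothetical names, the file is opened in binary mode, and a small temporary file stands in for the 338GB file from the thread.

```python
import mmap
import os
import re
import tempfile


def count_matches(path, pattern, report_every=200):
    """Count regex matches in a file through a read-only memory map."""
    count = 0
    # Binary mode, and a name ("mapped") that does not shadow the mmap module.
    with open(path, "rb") as file:
        with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            for match in re.finditer(pattern, mapped):
                count += 1
                if count % report_every == 0:
                    # Progress output shows the loop iterates rather than
                    # blocking before the first match.
                    print(f"{count} matches, last at offset {match.start()}")
    return count


# Small stand-in file for demonstration only.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc\n" * 500)
print(count_matches(tmp.name, rb"abc"))
os.remove(tmp.name)
```

The matches are produced lazily, so the loop itself holds only one match object at a time; any growth in the process's reported memory would then come from the pages the OS has mapped in, as described above.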
Re: [Python-ideas] support toml for pyproject support
Hi Jimmy, and welcome,

On Mon, Oct 08, 2018 at 09:10:40AM +0200, Jimmy Girardet wrote:
> Hi,
>
> I don't know if this was already debated but I don't know how to search
> in the whole archive of the list.
>
> For now the adoption of pyproject.toml file is more difficult because
> toml is not in the standard library.

It is true that using third-party libraries is more difficult than using the std lib. That alone is not a reason to add a library to the std lib.

> Each tool which wants to use pyproject.toml has to add a toml lib as a
> conditional or hard dependency.
>
> Since toml is now the standard configuration file format,

It is? Did I miss the memo? Because I've never even heard of TOML before this very moment. Google Trends doesn't really support your assertion that TOML has become "the standard" for config files:

# compare TOML, JSON and YAML
https://trends.google.com/trends/explore?q=%2Fg%2F11c5zwr35t,%2Fm%2F05cntt,%2Fm%2F01w6k2

although it is trending upwards:

https://trends.google.com/trends/explore?q=%2Fg%2F11c5zwr35t

> it's strange
> the python does not support it in the stdlib like it would have been
> strange to not have the configparser module.

We don't even ship a YAML library, and that seems to be far more popular than TOML. On the other hand, we do ship a plist library.

> I know it's complicated to add more and more thing to the stdlib but I
> really think it is necessary for python packaging being more consistent.
>
> Maybe we could thought to a readonly lib to limit the added code.

What is a readonly lib?

--
Steve
Re: [Python-ideas] Support parsing stream with `re`
That's incredibly interesting. I've never used mmap before. However, there's a problem.

I did a few experiments with mmap now, this is the latest:

path = pathlib.Path(r'P:\huge_file')

with path.open('r') as file:
    mmap = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)

for match in re.finditer(b'.', mmap):
    pass

The file is 338GB in size, and it seems that Python is trying to load it into memory. The process is now taking 4GB RAM and it's growing. I saw the same behavior when searching for a non-existing match.

Should I open a Python bug for this?

On Sun, Oct 7, 2018 at 7:49 PM <2...@jmunch.dk> wrote:
> On 18-10-07 16.15, Ram Rachum wrote:
> > I tested it now and indeed bytes patterns work on memoryview objects.
> > But how do I use this to scan for patterns through a stream without
> > loading it to memory?
>
> An mmap object is one of the things you can make a memoryview of,
> although looking again, it seems you don't even need to, you can
> just re.search the mmap object directly.
>
> re.search'ing the mmap object means the operating system takes care of
> the streaming for you, reading in parts of the file only as necessary.
>
> regards, Anders
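Anders's remark that `re.search` can be applied to the mmap object directly, without wrapping it in a memoryview, can be checked on a small file. A minimal sketch, assuming a temporary file as a stand-in for a large one; the `NEEDLE` marker and the padding are made up for the illustration.

```python
import mmap
import re
import tempfile

# Small stand-in for a large file: padding, a marker, more padding.
with tempfile.TemporaryFile() as file:
    file.write(b"header\n" + b"x" * 10_000 + b"NEEDLE" + b"y" * 10_000)
    file.flush()

    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        # A bytes pattern is applied to the mapping itself; the file
        # contents are never read into a bytes object first.
        match = re.search(rb"NEEDLE", mapped)
        offset = match.start()
        found = match.group()  # copies only the matched bytes

print(offset, found)
```

Only bytes patterns work here, since the mapping exposes raw bytes; a `str` pattern would raise a `TypeError`.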
[Python-ideas] support toml for pyproject support
Hi,

I don't know if this was already debated but I don't know how to search in the whole archive of the list.

For now the adoption of the pyproject.toml file is more difficult because toml is not in the standard library.

Each tool which wants to use pyproject.toml has to add a toml lib as a conditional or hard dependency.

Since toml is now the standard configuration file format, it's strange that Python does not support it in the stdlib, just as it would have been strange not to have the configparser module.

I know it's complicated to add more and more things to the stdlib but I really think it is necessary for Python packaging to be more consistent.

Maybe we could think about a read-only lib to limit the added code.

If it's conceivable, I'd be happy to help with it.

Nice day, guys and girls.

Jimmy
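The "conditional dependency" pattern mentioned in this message can be sketched as below. The choice of parser is an assumption of the sketch: it uses `tomllib`, a read-only TOML parser that is part of the standard library in Python 3.11 and later (well after this thread), and degrades to "no parser available" elsewhere. `SAMPLE` and `build_requires` are hypothetical names; the `[build-system]` table and its `requires` key come from PEP 518.

```python
try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:
    tomllib = None  # a tool would fall back to a third-party parser here

# Stand-in for the contents of a pyproject.toml file.
SAMPLE = """
[build-system]
requires = ["setuptools", "wheel"]
"""


def build_requires(text):
    """Return the [build-system] requires list, or None without a parser."""
    if tomllib is None:
        return None
    data = tomllib.loads(text)
    return data.get("build-system", {}).get("requires", [])


print(build_requires(SAMPLE))
```

A tool written this way has to decide what to do when no parser can be imported, which is the burden the message describes.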