pdb which handles threads

2018-11-19 Thread Andy Valencia
I had yet another program where I accidentally had more than one
thread enter pdb at once, leaving me with the "pdb's battling for
the keyboard" syndrome.  So I extended pdb to recognize and handle
threads.  I added:

"jobs"

List threads, with one current one being the only one involved
with the keyboard.  All others wait politely.

"fg "

To switch to a different thread.
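
For readers who have not looked at the patch: the sketch below only illustrates the kind of listing a "jobs" command could print. It is not taken from the code at sources.vsta.org; list_threads is a hypothetical helper built on the standard threading module alone.

```python
import threading


def list_threads():
    """Return (ident, name, is_current) for every live thread."""
    current = threading.current_thread()
    return [(t.ident, t.name, t is current) for t in threading.enumerate()]


if __name__ == "__main__":
    # Mark the calling thread with "*", the way a shell marks the current job.
    for ident, name, is_current in list_threads():
        marker = "*" if is_current else " "
        print("%s %s %s" % (marker, ident, name))
```

The %-formatting keeps the sketch usable on Python 2 as well as Python 3.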

I welcome comments (it's for Python 2), under:

http://sources.vsta.org/

Regards,
Andy Valencia
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: on the prng behind random.random()

2018-11-19 Thread Dan Sommers

On 11/19/18 6:49 PM, Robert Girault wrote:

> I think I disagree with your take here.  With mt19937, given ANY seed,
> I can eventually predict all the sequence without having to query the
> oracle any further.

Even if that's true, and I use mt19937 inside my program, you don't
[usually|necessarily] have access to the raw output from it.

> If you're just writing a toy software, even K&R PRNG works just fine.
> If you're writing a weather simulation, I suppose you need real
> random-like properties and still need your generator to be reproducible.
> If you're using random Quicksort, you do need unpredictability and
> reproducibility.  If you're writing a crypto application, then you need
> something way stronger.  We need all of them ...

Agreed.  Mostly.  IIRC, though, your question was about *replacing*
mt19937, not adding a new RNG.

Please use the right tool for the job at hand.

> ... But mt19937 is now useful only in toy software.

It has "real random-like" properties (for certain definitions of "real"
and "random-like") and it's reproducible.  Therefore, it's good for
weather simulations, too.
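
A small sketch of the reproducibility point, assuming a simulation that draws from its own random.Random instance (the simulate function and the seed values are made up for illustration):

```python
import random


def simulate(seed, steps=5):
    """Draw a short sequence from a private, explicitly seeded generator."""
    rng = random.Random(seed)  # Mersenne Twister, independent of the module-level one
    return [rng.random() for _ in range(steps)]


run_a = simulate(2018)
run_b = simulate(2018)  # same seed, so the same sequence
run_c = simulate(2019)  # different seed, so a different sequence
```

Rerunning with the same seed replays the same draws, which is what makes a simulation result checkable.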


Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread Thomas Jollans
On 2018-11-18 19:22, Martin Schöön wrote:
> On 2018-11-18, Shakti Kumar wrote:
>> On Sun, 18 Nov 2018 at 18:18, Martin Schöön  wrote:
>>>
>>> Now I hit a bump in the road when some of the data is not in plain
>>> decimal notation (xxx,xx) but in 'scientific' (xx,xxxe-xx) notation.
>>>
>>
>> Martin, I believe this should be done by pandas itself while reading
>> the csv file,
>> I took an example in scientific notation and checked this out,
>>
>> my sample.csv file is,
>> col1,col2
>> 1.1,0
>> 10.24e-05,1
>> 9.492e-10,2
>>
> That was a quick answer!
> 
> My pandas is up to date.
> 
> In your example you use the US convention of using "." for decimals
> and "," to separate data. This works perfect for me too.
> 
> However, my data files use European conventions: decimal "," and TAB
> to separate data:
> 
> col1  col2
> 1,1   0
> 10,24e-05 1
> 9,492e-10 2
> 
> I use 
> 
> EUData = pd.read_csv('file.csv', skiprows=1, sep='\t',
> decimal=',', engine='python')
> 
> to read from such files. This works so so. 'Common floats' (3,1415 etc)
> works just fine but 'scientific' stuff (1,6023e23) does not work.
> 
> /Martin
> 


This looks like a bug in the 'python' engine specifically. I suggest you
write a bug report at https://github.com/pandas-dev/pandas/issues

(conda:nb) /tmp
0:jollans@mn70% cat test.csv
Index   Value
0   1,674
1   3,48e+3
2   8,1834e-10
3   3984,109
4   2830812370

(conda:nb) /tmp
0:jollans@mn70% ipython
Python 3.7.0 (default, Oct  9 2018, 10:31:47)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd

In [2]: pd.read_csv('test.csv', header=[0], index_col=0, decimal=',', sep='\t')
Out[2]:
              Value
Index
0      1.674000e+00
1      3.480000e+03
2      8.183400e-10
3      3.984109e+03
4      2.830812e+09

In [3]: pd.read_csv('test.csv', header=[0], index_col=0, decimal=',', sep='\t', engine='python')
Out[3]:
            Value
Index
0           1.674
1         3,48e+3
2      8,1834e-10
3        3984.109
4      2830812370

In [4]: pd.__version__
Out[4]: '0.23.4'



-- 
Cheers,
 Thomas


Re: on the prng behind random.random()

2018-11-19 Thread Chris Angelico
On Tue, Nov 20, 2018 at 10:51 AM Robert Girault  wrote:
> If you're just writing a toy software, even K&R PRNG works just fine.
> If you're writing a weather simulation, I suppose you need real
> random-like properties and still need your generator to be reproducible.
> If you're using random Quicksort, you do need unpredictability and
> reproducibility.  If you're writing a crypto application, then you need
> something way stronger.  We need all of them.  But mt19937 is now useful
> only in toy software.

I disagree. Yes, in a crypto-sensitive situation, you can't depend on
the Twister... but you shouldn't be relying on *any* PRNG for that.
There are plenty of situations where you need something unpredictable
but it doesn't have to be THAT safe. Your example of picking a random
pivot for quicksort is a perfect example. Let's suppose I am sorting
by that method... how are you going to get 624 consecutive outputs? If
you can provide a custom comparison function, you can DoS the sort
just by making that inefficient. If you can't, how are you going to
reconstruct the randomness? Is this REALLY a viable attack vector?

It's different if, say, you're operating a virtual casino, and letting
people watch the roulette wheel spins. (Though even then,
reconstructing the twister's state from a series of 1-in-38 results
isn't going to be trivial.) But it's overly paranoid to say that every
single PRNG needs to be cryptographically secure.

ChrisA


Re: on the prng behind random.random()

2018-11-19 Thread Robert Girault
Dennis Lee Bieber  writes:

> On Mon, 19 Nov 2018 19:05:44 -0200, Robert Girault  declaimed
> the following:
>
>>I mean the fact that with 624 samples from the generator, you can
>>determine the rest of the sequence completely.
>
>   Being able to predict the sequence after a large sampling does not mean
> that the /distribution of values/ is not (pseudo-) random.

The problem with determining its sequence is that it might defeat its
purpose.  If you use mt19937 to select a pivot in random Quicksort for
example (where you plan to spend n lg n time in sorting), we can
frustrate your plans and force it into n^2 every time, an effective DoS
attack on your software.
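
To make the pivot argument concrete, here is a minimal sketch of a randomized Quicksort with a pluggable pivot source. The function name and the rng parameter are illustrative assumptions. By default it draws pivots from random.SystemRandom (backed by os.urandom), so observing outputs does not let anyone reconstruct a generator state; passing a seeded random.Random gives a reproducible run instead.

```python
import random


def quicksort(items, rng=None):
    """Return a sorted copy of items, choosing each pivot at random."""
    if rng is None:
        rng = random.SystemRandom()  # OS entropy; no Mersenne Twister state to recover
    items = list(items)
    if len(items) <= 1:
        return items
    pivot = items[rng.randrange(len(items))]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller, rng) + equal + quicksort(larger, rng)


data = [5, 3, 8, 1, 9, 2, 8]
unpredictable = quicksort(data)
reproducible = quicksort(data, random.Random(42))
```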

>   After all, pretty much all random number generators will produce the
> same sequence if given the same starting seed... You are, in effect,
> treating your 624 samples as a very large seed...

I think I disagree with your take here.  With mt19937, given ANY seed, I
can eventually predict all the sequence without having to query the
oracle any further.

If you're just writing a toy software, even K&R PRNG works just fine.
If you're writing a weather simulation, I suppose you need real
random-like properties and still need your generator to be reproducible.
If you're using random Quicksort, you do need unpredictability and
reproducibility.  If you're writing a crypto application, then you need
something way stronger.  We need all of them.  But mt19937 is now useful
only in toy software.


Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread MRAB

On 2018-11-19 20:44, Martin Schöön wrote:

> Too many files to go through them with an editor :-(


If only Python could read and write files... :-)
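
Taking the hint literally, a standard-library-only sketch might look like the following. It assumes tab-separated files in which commas occur only as decimal separators, and UTF-8 encoding; the function names and the "*.csv" pattern are made up for the example.

```python
import io
from pathlib import Path


def convert_decimal_commas(src, dst):
    """Copy text from src to dst, turning decimal commas into decimal points."""
    for line in src:
        dst.write(line.replace(",", "."))


def convert_folder(src_dir, dst_dir, pattern="*.csv"):
    """Convert every matching file in src_dir, writing the results into dst_dir."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob(pattern)):
        with path.open(encoding="utf-8") as src, \
                (dst_dir / path.name).open("w", encoding="utf-8") as dst:
            convert_decimal_commas(src, dst)


# In-memory demonstration using the sample data from this thread.
sample = io.StringIO("col1\tcol2\n1,1\t0\n10,24e-05\t1\n9,492e-10\t2\n")
converted = io.StringIO()
convert_decimal_commas(sample, converted)
```

The converted files can then be read with the default read_csv settings apart from sep='\t'.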


Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread MRAB

On 2018-11-19 21:32, Martin Schöön wrote:

> On 2018-11-19, Martin Schöön wrote:
>> On 2018-11-19, Peter Otten <__pete...@web.de> wrote:
>>>
>>> The engine="python" produces an exception over here:
>>>
>>> """
>>> ValueError: The 'decimal' option is not supported with the 'python' engine
>>> """
>>>
>>> Maybe you can try and omit that option?
>>
>> Bingo!
>> No, I don't remember why I added that engine thing. It was two days ago!
>>
>>> If that doesn't work you can specify a converter:
>>>
>>> >>> pd.read_csv("file.csv", sep="\t", converters={0: lambda s: float(s.replace(",", "."))})
>>>            col1  col2
>>> 0  1.100000e+00     0
>>> 1  1.024000e-04     1
>>> 2  9.492000e-10     2
>>>
>>> [3 rows x 2 columns]
>>
> I spoke too early. Upon closer inspection I get the first column with
> decimal '.' and the rest with decimal ','. I have tried the converter
> thing to no avail :-(

You passed {0: lambda s: float(s.replace(",", "."))} as the converters 
argument, which means that it applies only to column 0.
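
One way to cover every column is to build the converters dict from a list of columns. This is only a sketch: the helper names are hypothetical, and the column labels are taken from the sample file earlier in the thread. pandas accepts either integer positions or column labels as converter keys.

```python
def comma_float(text):
    """Parse a number written with a decimal comma, e.g. '9,492e-10'."""
    return float(text.replace(",", "."))


def converters_for(columns):
    """Map every listed column (label or position) to comma_float."""
    return {column: comma_float for column in columns}


converters = converters_for(["col1", "col2"])

# Hypothetical call, assuming pandas is installed and the file has these columns:
#   pd.read_csv("file.csv", sep="\t", converters=converters)
```

Note that this turns integer columns such as col2 into floats as well; list only the columns that actually contain decimal commas if that matters.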



Re: on the prng behind random.random()

2018-11-19 Thread Ian Kelly
On Mon, Nov 19, 2018 at 2:12 PM Robert Girault  wrote:
>
> Chris Angelico  writes:
>
> > On Tue, Nov 20, 2018 at 7:31 AM Robert Girault  wrote:
> >> Nice.  So Python's random.random() does indeed use mt19937.  Since it's
> >> been broken for years, why isn't it replaced by something newer like
> >> ChaCha20?  Is it due to backward compatibility?  That would make sense.
> >
> > What exactly do you mean by "broken"?
>
> I mean the fact that with 624 samples from the generator, you can
> determine the rest of the sequence completely.
>
> Sorry about mentioning ChaCha20.  That was misleading.  I should've said
> something newer like MRG32k3a or xorshift*.

If you wish to propose replacing it, that topic is probably best
brought up at python-dev.


Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread Martin Schöön
On 2018-11-19, Martin Schöön wrote:
> On 2018-11-19, Peter Otten <__pete...@web.de> wrote:
>>
>> The engine="python" produces an exception over here:
>>
>> """
>> ValueError: The 'decimal' option is not supported with the 'python' engine
>> """
>>
>> Maybe you can try and omit that option?
>
> Bingo!
> No, I don't remember why I added that engine thing. It was two days ago!
>
>> If that doesn't work you can specify a converter:
>>
>> >>> pd.read_csv("file.csv", sep="\t", converters={0: lambda s: float(s.replace(",", "."))})
>>            col1  col2
>> 0  1.100000e+00     0
>> 1  1.024000e-04     1
>> 2  9.492000e-10     2
>>
>> [3 rows x 2 columns]
>
I spoke too early. Upon closer inspection I get the first column with
decimal '.' and the rest with decimal ','. I have tried the converter
thing to no avail :-(

/Martin


Re: on the prng behind random.random()

2018-11-19 Thread Robert Girault
Chris Angelico  writes:

> On Tue, Nov 20, 2018 at 7:31 AM Robert Girault  wrote:
>> Nice.  So Python's random.random() does indeed use mt19937.  Since it's
>> been broken for years, why isn't it replaced by something newer like
>> ChaCha20?  Is it due to backward compatibility?  That would make sense.
>
> What exactly do you mean by "broken"? 

I mean the fact that with 624 samples from the generator, you can
determine the rest of the sequence completely.
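
For the record, a sketch of what that looks like in practice. It assumes the observer sees 624 consecutive raw 32-bit outputs (here via getrandbits(32)), which is a strong assumption; random.random() itself exposes fewer bits per call. The helper names are made up; the shift and mask constants are the published MT19937 tempering parameters.

```python
import random


def _undo_right(value, shift):
    """Invert y ^= y >> shift on a 32-bit value by fixed-point iteration."""
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ (result >> shift)
    return result


def _undo_left(value, shift, mask):
    """Invert y ^= (y << shift) & mask on a 32-bit value."""
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ ((result << shift) & mask)
    return result


def untemper(output):
    """Recover the internal state word behind one 32-bit output."""
    y = _undo_right(output, 18)
    y = _undo_left(y, 15, 0xEFC60000)
    y = _undo_left(y, 7, 0x9D2C5680)
    return _undo_right(y, 11)


def clone_from_outputs(outputs):
    """Build a random.Random that continues the observed sequence."""
    if len(outputs) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    words = tuple(untemper(value) for value in outputs)
    clone = random.Random()
    # Index 624 means "state exhausted": the next draw regenerates it.
    clone.setstate((3, words + (624,), None))
    return clone


victim = random.Random(2018)
victim.getrandbits(32)  # start observing at an arbitrary position
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_from_outputs(observed)
```

From here on clone and victim produce the same values, with no further queries to the victim.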

Sorry about mentioning ChaCha20.  That was misleading.  I should've said
something newer like MRG32k3a or xorshift*.

> If you're generating random numbers for any sort of security purpose,
> you probably should look at this:
>
> https://docs.python.org/3/library/secrets.html
>
> (New in 3.6, though, hence the "probably". If you need to support 3.5
> or older - including 2.7 - then you can't use that.)

Thanks for the reference!  

I'm not particularly interested in security at the moment, but I would
like an expert's confirmation that some of these algorithms aren't
replaced due to backward compatibility.  We could easily replace them,
but I think we shouldn't: some people still depend on these algorithms
for their experiment.

Are there other reasons?


Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread Martin Schöön
On 2018-11-19, Peter Otten <__pete...@web.de> wrote:
> Martin Schöön wrote:
>
>> My pandas is up to date.
>> 
>
> The engine="python" produces an exception over here:
>
> """
> ValueError: The 'decimal' option is not supported with the 'python' engine
> """
>
> Maybe you can try and omit that option?

Bingo!
No, I don't remember why I added that engine thing. It was two days ago!

> If that doesn't work you can specify a converter:
>
> >>> pd.read_csv("file.csv", sep="\t", converters={0: lambda s: float(s.replace(",", "."))})
>            col1  col2
> 0  1.100000e+00     0
> 1  1.024000e-04     1
> 2  9.492000e-10     2
>
> [3 rows x 2 columns]
>
I'll save that one for later. One never knows...

/Martin


Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread Martin Schöön
Too many files to go through them with an editor :-(

/Martin


Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread Chris Angelico
On Tue, Nov 20, 2018 at 7:46 AM Martin Schöön  wrote:
> Thanks, I just tried this. The line locale.setlocale... throws an
> error:
>
> "locale.Error: unsupported locale setting"
>
> Trying other ideas instead of 'de' results in more of the same.
> '' results in no errors.

Haven't been reading in detail, but maybe "de_DE" will work better,
assuming you have that locale installed.
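
Since locale names differ between systems, one defensive approach is to try several candidates and keep the first that the system accepts. A minimal sketch; the helper name and the candidate list are assumptions, not something from this thread:

```python
import locale


def first_available_locale(candidates, category=locale.LC_NUMERIC):
    """Set the first locale in candidates that the system accepts.

    Returns the accepted name, or None if every candidate was rejected.
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
        except locale.Error:
            continue
        return name
    return None


chosen = first_available_locale(["de_DE.UTF-8", "de_DE", "de", "C"])
```

If chosen ends up as one of the German names, locale.atof('10,24e-05') should parse the decimal comma; under the "C" locale it will not. On Debian-style systems, `locale -a` lists what is installed.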

ChrisA


Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread Martin Schöön
On 2018-11-18, Stefan Ram wrote:
> Martin Schöön writes:
>>to read from such files. This works so so. 'Common floats' (3,1415 etc)
>>works just fine but 'scientific' stuff (1,6023e23) does not work.
>
>   main.py
>
> import sys
> import pandas
> import locale
> print( sys.version )
> print( pandas.__version__ )
> with open( 'schoon20181118232102.csv', 'w' ) as file:
>     print( 'col0\tcol1', file=file, flush=True )
>     print( '1,1\t0', file=file, flush=True )
>     print( '10,24e-05\t1', file=file, flush=True )
>     print( '9,492e-10\t2', file=file, flush=True )
> EUData = pandas.read_csv\
> ( 'schoon20181118232102.csv', sep='\t', decimal=',', engine='python' )
> locale.setlocale( locale.LC_ALL, 'de' )
> print( 2 * locale.atof( EUData[ 'col0' ][ 1 ]))
>
>   transcript
>
> 3.7.0
> 0.23.4
> 0.0002048
>
Thanks, I just tried this. The line locale.setlocale... throws an
error:

"locale.Error: unsupported locale setting"

Trying other ideas instead of 'de' results in more of the same.
'' results in no errors.

The output I get is this:

3.4.2 (default, Oct  8 2014, 10:45:20) 
[GCC 4.9.1]
0.22.0
0.0002048

Scratching my head and speculating: I run this in a Virtualenv
I have created for Jupyter and pandas and whatever I feel I need
for this. Could locale be undefined or something that causes this?

/Martin


Re: on the prng behind random.random()

2018-11-19 Thread Chris Angelico
On Tue, Nov 20, 2018 at 7:31 AM Robert Girault  wrote:
> Nice.  So Python's random.random() does indeed use mt19937.  Since it's
> been broken for years, why isn't it replaced by something newer like
> ChaCha20?  Is it due to backward compatibility?  That would make sense.

What exactly do you mean by "broken"? If you're generating random
numbers for any sort of security purpose, you probably should look at
this:

https://docs.python.org/3/library/secrets.html

(New in 3.6, though, hence the "probably". If you need to support 3.5
or older - including 2.7 - then you can't use that.)
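
A short sketch of that advice, with a fallback for interpreters that lack the secrets module. The wrapper names are hypothetical; random.SystemRandom and os.urandom are used as the fallback because both draw from the operating system's entropy source.

```python
import binascii
import os
import random

try:
    import secrets  # Python 3.6+
except ImportError:  # older interpreters
    secrets = None


def secure_randbelow(limit):
    """Return an unpredictable integer in range(limit)."""
    if secrets is not None:
        return secrets.randbelow(limit)
    return random.SystemRandom().randrange(limit)


def secure_token_hex(nbytes=16):
    """Return nbytes of OS randomness as a hex string."""
    if secrets is not None:
        return secrets.token_hex(nbytes)
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")
```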

ChrisA


Re: Question about the definition of the value of an object

2018-11-19 Thread Iwo Herka
> Attempting to define value here would be at best a massive
> distraction from the concepts the documentation is trying
> to get across.

> There is one very simple definition of "value" which is entirely
> accurate, but probably not helpful, and that is: An object's
> value is whatever it is equal to.

> Generally, Python objects have their values defined by an abstract
> concept that is being represented.

That confirms my intuition. Thank you for the responses.

Sincerely,
Iwo Herka
On Mon, 19 Nov 2018 at 20:46, Chris Angelico wrote:
>
> On Tue, Nov 20, 2018 at 3:08 AM Iwo Herka  wrote:
> >
> > Hello everyone,
> >
> > I've been looking for something in the documentation
> > (https://docs.python.org/3.8/reference/datamodel.html) recently
> > and I've noticed something weird. Documentation states that every
> > object has a value, but doesn’t provide any definition
> > whatsoever of what the value is. Now, I'm sure that every reasonably
> > fluent Python programmer has an intuitive
> > understanding of the term, nonetheless, I would expect the
> > documentation defines it somehow (not necessarily
> > in a formal fashion), especially considering that "the value of an
> > object" is used to explain other concepts, such as
> > mutability:
> >
> > > The value of some objects can change. Objects whose value can change are 
> > > said to be mutable; objects whose
> > value is unchangeable once they are created are called immutable.
> >
> > So, why is documentation silent on this? One reason I can think of is
> > to avoid answering inconvenient questions.
>
> Sorta kinda, yeah. There is one very simple definition of "value"
> which is entirely accurate, but probably not helpful, and that is: An
> object's value is whatever it is equal to. That is to say, you can ask
> two basic questions about an object:
>
> x is y # identity: are x and y the SAME object?
> x == y # equality: do x and y have the same value?
>
> Value and equality are intrinsically linked, but unfortunately that
> doesn't really explain what either one actually IS. As Rhodri says,
> you can ask a philosopher about that, and will be stuck for weeks :)
>
> The concept of "immutable" vs "mutable" object, therefore, is that
> some objects may compare equal now and unequal later. Since the same
> object is able to change in "value" over time, it may become (un)equal
> to something while still being the same object. Thus mutable objects
> can't be used as dict keys, as their values could change, and dict
> lookups have to match based on equality. (Imagine putting two keys
> into a dict while they have different values, and then mutating one of
> them to have the same value. Now try to look up that new value using a
> third object. The Logic Police will arrest you before you can say
> "hashability"!)
>
> Generally, Python objects have their values defined by an abstract
> concept that is being represented. For instance, the integer 32 and
> the float 32.0 have the same value, even though they're different
> types; they both represent the abstract number equal to
> two-to-the-fifth. But right here in that sentence, you can see how
> hard it is to actually *define* that value. At best, all you can
> really say is that the value is equal to the value of int("32") or
> some other way of getting an object with that value.
>
> Yep, it's hard. But the cool thing is, it usually doesn't matter - you
> can say "x has the value 32" without worrying about representations,
> data types, etc.
>
> ChrisA


Re: on the prng behind random.random()

2018-11-19 Thread Robert Girault
Peter Otten <__pete...@web.de> writes:

> Robert Girault wrote:
>
>> Looking at its source code, it seems the PRNG behind random.random() is
>> Mersenne Twister, but I'm not sure.  It also seems that random.random()
>> is using /dev/urandom.  Can someone help me to read that source code?
>> 
>> I'm talking about CPython, by the way.  I'm reading
>> 
>>   https://github.com/python/cpython/blob/master/Lib/random.py
>> 
>> The initial comment clearly says it's Mersenne Twister, but the only
>> random() function there seems to call _urandom(), which I suppose is an
>> interface to /dev/urandom.
>> 
>> What am I missing here?
>
> There's a class random.Random which is instantiated at the end of the file, 
> and random() is bound to the corresponding method:
>
> _inst = Random()
> ...
> random = _inst.random
>
> The Random class inherits from _random.Random [...]

Thanks.  I missed that.

> which is implemented in C and does most of the actual work. If you can
> read C:
>
> https://github.com/python/cpython/blob/master/Modules/_randommodule.c
>
> The most relevant part seems to be genrand_int32() which is wrapped by 
> random_random() that actually implements the _random.Random.random() method.

Nice.  So Python's random.random() does indeed use mt19937.  Since it's
been broken for years, why isn't it replaced by something newer like
ChaCha20?  Is it due to backward compatibility?  That would make sense.

Do you know who broke mt19937 and when?  I'd love to read the reference.
Thank you!


Re: Question about the definition of the value of an object

2018-11-19 Thread Chris Angelico
On Tue, Nov 20, 2018 at 3:08 AM Iwo Herka  wrote:
>
> Hello everyone,
>
> I've been looking for something in the documentation
> (https://docs.python.org/3.8/reference/datamodel.html) recently
> and I've noticed something weird. Documentation states that every
> object has a value, but doesn’t provide any definition
> whatsoever of what the value is. Now, I'm sure that every reasonably
> fluent Python programmer has an intuitive
> understanding of the term, nonetheless, I would expect the
> documentation defines it somehow (not necessarily
> in a formal fashion), especially considering that "the value of an
> object" is used to explain other concepts, such as
> mutability:
>
> > The value of some objects can change. Objects whose value can change are 
> > said to be mutable; objects whose
> value is unchangeable once they are created are called immutable.
>
> So, why is documentation silent on this? One reason I can think of is
> to avoid answering inconvenient questions.

Sorta kinda, yeah. There is one very simple definition of "value"
which is entirely accurate, but probably not helpful, and that is: An
object's value is whatever it is equal to. That is to say, you can ask
two basic questions about an object:

x is y # identity: are x and y the SAME object?
x == y # equality: do x and y have the same value?

Value and equality are intrinsically linked, but unfortunately that
doesn't really explain what either one actually IS. As Rhodri says,
you can ask a philosopher about that, and will be stuck for weeks :)

The concept of "immutable" vs "mutable" object, therefore, is that
some objects may compare equal now and unequal later. Since the same
object is able to change in "value" over time, it may become (un)equal
to something while still being the same object. Thus mutable objects
can't be used as dict keys, as their values could change, and dict
lookups have to match based on equality. (Imagine putting two keys
into a dict while they have different values, and then mutating one of
them to have the same value. Now try to look up that new value using a
third object. The Logic Police will arrest you before you can say
"hashability"!)

Generally, Python objects have their values defined by an abstract
concept that is being represented. For instance, the integer 32 and
the float 32.0 have the same value, even though they're different
types; they both represent the abstract number equal to
two-to-the-fifth. But right here in that sentence, you can see how
hard it is to actually *define* that value. At best, all you can
really say is that the value is equal to the value of int("32") or
some other way of getting an object with that value.

Yep, it's hard. But the cool thing is, it usually doesn't matter - you
can say "x has the value 32" without worrying about representations,
data types, etc.
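
A few lines that exercise the distinctions above (the variable names are just for illustration):

```python
a = [1, 2, 3]
b = [1, 2, 3]

equal_before = (a == b)  # same value
identical = (a is b)     # but two distinct objects

b.append(4)              # mutate one of them...
equal_after = (a == b)   # ...and the values no longer match

# 32 and 32.0 are different types with the same value, so either finds the key.
lookup = {32: "two-to-the-fifth"}
found = lookup[32.0]

# Mutable built-ins such as lists refuse to be dict keys at all.
try:
    {a: "nope"}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
```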

ChrisA


Re: Extend NTFS with "version" of file and "version" of folder, also optionally GIT integration or something like it.

2018-11-19 Thread skybuck2000
Described also as:

(Versioning System Integration with Windows Explorer)

Anyway 

Googling NTFS and GIT turned up this:

https://blogs.msdn.microsoft.com/devops/2017/02/03/announcing-gvfs-git-virtual-file-system/

The objective of this project seems to be a bit different. To handle very large 
projects.

Which in itself is great. But for small projects like mine this is perhaps 
somewhat overkill.

But the people working on this project do have some experience integrating GIT 
with a file system and creating some virtual file system.

I highly recommend that Microsoft expand these kinds of projects massively, or
expand this project in a big way, and involve Windows Explorer programmers to
get in on this action: extend Windows Explorer to work with this
file system and versioning system, and perhaps provide some slight new
features for Windows Explorer to work in tandem with such a new file system.

I want something like this to be usable for small projects too.

Perhaps it's already usable, not sure... It would be nice if this software could
be made available for Windows 7, until Microsoft sorts out the troubles with
Windows 10 updates ;)

I think this continuous Windows 10 updating approach might be a bit too much for
people to handle.

Perhaps it's better to have an older version and a new version, so people aren't
bothered by new version f*ck ups.

I sure don't have time to mess with Windows updates and troubles, except for very
urgent security fixes.

Bye,
  Skybuck.



Re: Question about the definition of the value of an object

2018-11-19 Thread Terry Reedy

On 11/19/2018 9:08 AM, Iwo Herka wrote:

> Hello everyone,
>
> I've been looking for something in the documentation
> (https://docs.python.org/3.8/reference/datamodel.html) recently
> and I've noticed something weird. Documentation states that every
> object has a value, but doesn’t provide any definition
> whatsoever of what the value is.


Python is a language for manipulating information stored in Python 
objects.  Abstractly, object values are the mostly implementation- and 
even language-independent information that we wish to manipulate.  Note 
that concrete types are somewhat implementation dependent and ids are 
implementation and session dependent.


Bools represent binary choices, not 'truth' per se.  We call the choices 
'True' and 'False' because that (with or without capitals) is the 
default binary choice in propositional logic.  For numbers, 0 versus not 
0 is often an important choice.  Ditto for 'empty' versus 'not empty' 
for collections.
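
The binary choices described above can be listed directly with bool(); the sample values are arbitrary:

```python
samples = [0, 0.0, 7, "", "text", [], [0], {}, None]

# Pair each value's repr with the binary choice Python makes for it.
choices = [(repr(value), bool(value)) for value in samples]

for text, chosen in choices:
    print("%-8s -> %s" % (text, chosen))
```

Zero and empty collections land on False; everything else in the list lands on True, including [0], which is non-empty even though its only element is zero.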


--
Terry Jan Reedy




Re: on the prng behind random.random()

2018-11-19 Thread Peter Otten
Robert Girault wrote:

> Looking at its source code, it seems the PRNG behind random.random() is
> Mersenne Twister, but I'm not sure.  It also seems that random.random()
> is using /dev/urandom.  Can someone help me to read that source code?
> 
> I'm talking about CPython, by the way.  I'm reading
> 
>   https://github.com/python/cpython/blob/master/Lib/random.py
> 
> The initial comment clearly says it's Mersenne Twister, but the only
> random() function there seems to call _urandom(), which I suppose is an
> interface to /dev/urandom.
> 
> What am I missing here?

There's a class random.Random which is instantiated at the end of the file, 
and random() is bound to the corresponding method:

_inst = Random()
...
random = _inst.random

The Random class inherits from _random.Random which is implemented in C and 
does most of the actual work. If you can read C:

https://github.com/python/cpython/blob/master/Modules/_randommodule.c

The most relevant part seems to be genrand_int32() which is wrapped by 
random_random() that actually implements the _random.Random.random() method.
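
The wiring can also be checked from the interpreter. This sketch assumes CPython, where the C module is importable as _random; the dict of checks is just a convenient way to show the results.

```python
import random
import _random  # CPython's C implementation of the Mersenne Twister core

# random.random is a bound method, so __self__ is the hidden module-level instance.
hidden_instance = random.random.__self__

checks = {
    "module-level random() is bound to a random.Random instance":
        isinstance(hidden_instance, random.Random),
    "random.Random inherits from the C class _random.Random":
        issubclass(random.Random, _random.Random),
    "SystemRandom defines its own random()":
        "random" in vars(random.SystemRandom),
}

for description, result in checks.items():
    print(description, "->", result)
```

The last check points at the likely source of the confusion: the random() visible in random.py that calls _urandom() belongs to SystemRandom, not to the default generator.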



on the prng behind random.random()

2018-11-19 Thread Robert Girault
Looking at its source code, it seems the PRNG behind random.random() is
Mersenne Twister, but I'm not sure.  It also seems that random.random()
is using /dev/urandom.  Can someone help me to read that source code?

I'm talking about CPython, by the way.  I'm reading 

  https://github.com/python/cpython/blob/master/Lib/random.py

The initial comment clearly says it's Mersenne Twister, but the only
random() function there seems to call _urandom(), which I suppose is an
interface to /dev/urandom.

What am I missing here?


Re: Extend NTFS with "version" of file and "version" of folder, also optionally GIT integration or something like it.

2018-11-19 Thread Rhodri James

On 19/11/2018 16:42, skybuck2...@hotmail.com wrote:

> As far as I know currently NTFS is missing a key feature for code development and
> compare: "versioning information" per file and per folder.


While I appreciate your desire for Files-11 (the OpenVMS filing system), 
I'm struggling to see how this is relevant to Python.


--
Rhodri James *-* Kynesim Ltd


Re: Extend NTFS with "version" of file and "version" of folder, also optionally GIT integration or something like it.

2018-11-19 Thread Ethan Furman

On 11/19/2018 08:42 AM, skybuck2...@hotmail.com wrote:


> As far as I know currently NTFS is missing a key feature for code development and
> compare: "versioning information" per file and per folder.


This is not a mailing list for the purpose of discussing Microsoft 
Windows enhancements.


How is this related to Python?

--
~Ethan~


Re: bottledaemon stop/start doesn't work if killed elsewhere

2018-11-19 Thread Adam Funk
On 2018-11-19, Dennis Lee Bieber wrote:

> On Sun, 18 Nov 2018 15:33:47 -0600, Dan Sommers
><2qdxy4rzwzuui...@potatochowder.com> declaimed the following:
>
>>
>>What if the oom-killer kills the watchdog?
>>
>
>   Then you have TWO processes with out-of-control memory growth.
>
>   The out-of-memory killer should only be killing processes that are
> requesting obscene amounts of memory. You could put a USB hard-drive on the
> system and create a swap partition on the hard drive (you don't want to
> swap to an SD card, it will rapidly kill the card).

This pi has an external USB drive (with its own power supply) for
everything except /boot, including a 46 GB swap partition!

>   More important -- try to find out what your daemon is doing that is
> increasing its memory usage (Firefox on Windows is a known hog; I have to
> kill it periodically as it grows to 1.5GB [it's the 32-bit version due to
> my favored plug-ins that are no longer supported in 64-bit, so has a 2GB
> process limit]).

AFAICT the oom-killer only fires when the nightly texpire cron job (a
component of the leafnode local news server) is running, & even then
only once a week or so.  Usually when that happens, it kills texpire,
which doesn't really matter, since that runs again the next night.
Occasionally it kills some other thing.  I don't see how this
bottledaemon could be the memory hog --- it has one endpoint that
accepts a few hundred bytes of JSON, validates it, & then appends a
line to a TSV file.
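
For scale, the core of such an endpoint is roughly the following. This is a sketch, not the actual daemon: the field names are hypothetical and the bottle routing is left out. Nothing here holds on to data between requests.

```python
import io
import json

REQUIRED_FIELDS = ("sensor", "value")  # hypothetical field names


def append_record(raw_json, tsv_file):
    """Validate a small JSON object and append it as one TSV line."""
    record = json.loads(raw_json)  # raises ValueError on malformed JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError("missing fields: %s" % ", ".join(missing))
    # Keep each record on one line by flattening embedded tabs and newlines.
    row = [str(record[field]).replace("\t", " ").replace("\n", " ")
           for field in REQUIRED_FIELDS]
    tsv_file.write("\t".join(row) + "\n")


log = io.StringIO()  # stands in for the real TSV file opened in append mode
append_record('{"sensor": "attic", "value": 21.5}', log)
```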

Thanks,
Adam


Re: Question about the definition of the value of an object

2018-11-19 Thread Rhodri James

On 19/11/2018 14:08, Iwo Herka wrote:

> I've been looking for something in the documentation
> (https://docs.python.org/3.8/reference/datamodel.html) recently
> and I've noticed something weird. Documentation states that every
> object has a value, but doesn’t provide any definition
> whatsoever of what the value is. Now, I'm sure that every reasonably
> fluent Python programmer has an intuitive
> understanding of the term, nonetheless, I would expect the
> documentation defines it somehow (not necessarily
> in a formal fashion), especially considering that "the value of an
> object" is used to explain other concepts, such as
> mutability:


I don't think it's weird at all.  "Value" is a complex concept, just ask 
a philosopher next time you have a few weeks free :-)  You even 
demonstrated as much, when you noted that ordering is part of the value 
of a tuple.  Attempting to define value here would be at best a massive 
distraction from the concepts the documentation is trying to get across.


--
Rhodri James *-* Kynesim Ltd


Extend NTFS with "version" of file and "version" of folder, also optionally GIT integration or something like it.

2018-11-19 Thread skybuck2000
As far as I know currently NTFS is missing a key feature for code development 
and compare: "versioning information" per file and per folder.

This sucks badly.

Currently I have files as follows:

folder version 0.01\
some_source_code_file_version_1.pas
some_other_source_code_file_version1.pas

and

folder version 0.02\
some_source_code_file_version_2.pas
some_other_source_code_file_version2.pas

Now it's impossible to "code compare" these files with "code compare" tool.

It does not recognize these files.

For code compare to work it would have to be:

folder version 0.01\
some_source_code_file_version_1.pas
some_other_source_code_file_version1.pas

and

folder version 0.02\
some_source_code_file_version_1.pas
some_other_source_code_file_version1.pas

However losing versioning information per file is dangerous in case folders are 
renamed or files are moved.

Encoding versioning information in project files is also highly undesired in 
case project files are lost or corrupted.

There is a very clear and distinct need to include versioning in folders and 
files.

I hope to see NTFS extended in the future to incorporate these 
changes/additional features.

I also highly recommend to include some GIT like versioning system integration 
with windows explorer.

It would be highly desirable to "view/browse" and even edit or change previous
GIT versions via a windows explorer and also to be able to open different
version folders/branches for code compare purposes and other editing tools.

Currently this GIT integration is missing. TortoiseGit does offer some
integration/icon overlays but that's about it; advanced browsing of NTFS/GIT
related folder tree structure is currently not possible because this
functionality is missing.

Furthermore deep branching is desirable to allow very deep folders.

Also file path and folder path lengths must be incredibly large to allow deep 
branching and deep versioning of source code files.

GIT on top of NTFS would currently never be possible because of very limited 
path and folders length. A little bit of branching and it hits limits.

Also for a next-generation GIT version the following features are desired:

1A. Splitting of original source code file into multiple small files.
+
1B. Rebasing changes in original split file into multiple small files.
Currently GIT is too limited in this and does not recognize these changes in
original to be copied/rebased into split files. A very serious GIT limitation.
This is "pattern recognition" technology that must be advanced to the next
level of technology. Perhaps DEEP learning might offer some possibilities here
or new algorithms to detect "movements and code changes" of 1 file into
multiple files.

This feature will allow:

1.1. Splitting files into multiple files, without approval of original fork 
maintainer by "rebel" fork maintainer.

1.2. Continuing to work on multiple files by "rebel" fork, and rebasing
changes of original fork maintainer, without the original fork maintainer
having to commit these changes (from "rebel fork").

For as long as the pattern recognizer can handle these differences it would be
nice to have.

Currently GIT is limited to splitting files by the maintainer; forks cannot split
files and then continue to benefit from the original unsplit source, if the original
source does not split.

Splitting files is in my opinion necessary to keep software projects
maintainable and functioning. Compilers/Editors/Analyzers/Visualizers/Parsers
are all limited in their technology and implementation and can handle
only so much due to limitations and especially bugs. The larger the source file,
the more likely it is to contain some weird text which bugs out these tools,
especially if the large file was written by a person who does not know the
tools or the language well. Strange bugs will occur, especially with these kinds
of programmers not knowing tool limitations and bugs.

Splitting off files can reduce these problems and solve them one by one better
and offer more possibility for diagnosis of tool-related bugs and problems,
plus possibly faster recompile times since not the entire file needs to be
re-compiled.

Also splitting files in relation to history and changing history/rebasing works
badly currently in git as already indicated. Somebody did write a script to try
and solve it via branching per file, but as far as I know this script did not
work.

In principle I am against changing history in a versioning system, though the 
re-base feature itself could be interesting if it works.

Think of this as "applying a change in history towards the future", like time
travel or something, perhaps a different term could be used for it.

"Apply change in past to future".

2. Better visualization of history/branches.

Currently I cannot make sense of how branches are merged; it would be nice if
this was better indicated. Of course visualizing history of different
time-related branches is more diffi

Question about the definition of the value of an object

2018-11-19 Thread Iwo Herka
Hello everyone,

I've been looking for something in the documentation
(https://docs.python.org/3.8/reference/datamodel.html) recently
and I've noticed something weird. Documentation states that every
object has a value, but doesn’t provide any definition
whatsoever of what the value is. Now, I'm sure that every reasonably
fluent Python programmer has an intuitive
understanding of the term, nonetheless, I would expect the
documentation to define it somehow (not necessarily
in a formal fashion), especially considering that "the value of an
object" is used to explain other concepts, such as
mutability:

> The value of some objects can change. Objects whose value can change are said
> to be mutable; objects whose value is unchangeable once they are created are
> called immutable.

So, why is documentation silent on this? One reason I can think of is
to avoid answering inconvenient questions.
For example, it is trivial that the value of an object of type
"bool" (either object "True" or object "False") is
either truth or falsity, but what about tuples? For the tuple "(a, b,
c)", is the value {"a", "b", "c"} or ("a", "b", "c")?
In other words, does the value retain information about the order? I
would think so, since "(a, b, c) != (a, c, b)"
but it's not obvious if we define value as "all the data the object
holds" or something similar. Same question can
be extended to things such as lists, dictionaries or - even more
problematic - user-defined types.
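
As a small illustration of the ordering question (a sketch of observable behaviour, not a definition taken from the documentation), the snippet below shows how == treats tuples, sets, lists and a user-defined type. The class "Point" and its "label" attribute are hypothetical, chosen only to show that a type can decide which of its data counts for equality.

```python
a, b, c = 1, 2, 3

# Tuples compare element by element, so order is part of what == sees.
assert (a, b, c) != (a, c, b)

# Sets ignore order, so the same elements compare equal.
assert {a, b, c} == {a, c, b}

# Lists are mutable: the contents change while the identity stays the same.
items = [a, b]
before = id(items)
items.append(c)
assert id(items) == before and items == [1, 2, 3]


# A user-defined type decides for itself what == compares.
class Point:
    def __init__(self, x, y, label):
        self.x, self.y, self.label = x, y, label

    def __eq__(self, other):
        # Hypothetical choice: the label is not treated as part of the "value".
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


same_value = Point(0, 0, "origin") == Point(0, 0, "start")
```

Here the two Point objects hold different data yet compare equal, which is exactly why "all the data the object holds" does not obviously coincide with what comparison treats as the value.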

Moreover, there are paragraphs in the documentation where the word
"value" is used in different,
seemingly confusing, contexts:

> Ellipsis
> This type has a single value. There is a single object with this value. This
> object is accessed through the literal ... or the built-in name Ellipsis.
> Its truth value is true.

This would suggest that:
1. Value of a type is an object of that type (this one is pretty standard).
2. There is a single value of type "Ellipsis", which is the object
accessed via built-in name "Ellipsis".
3. Value of the object "Ellipsis" is a unique value denoting the
omission from speech or writing.
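
The statements in that paragraph can at least be checked at the interpreter. A minimal sketch, assuming Python 3 (where the literal ... is allowed anywhere an expression is):

```python
# The literal ... and the built-in name Ellipsis refer to the same object.
assert ... is Ellipsis

# The type of that object is named "ellipsis".
assert type(Ellipsis).__name__ == "ellipsis"

# Its truth value is true.
assert bool(Ellipsis) is True

ellipsis_repr = repr(...)
```

This confirms points 1 and 2 as far as identity goes; it says nothing about point 3, since the language gives the object no meaning beyond being that single object.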

Do I have it all backwards or am I missing something obvious here?
Thank you.

Sincerely,
Iwo Herka


Re: What Python related git pre-commit hooks are you using?

2018-11-19 Thread Jon Ribbens
On 2018-11-18, Malcolm Greene  wrote:
> Curious to learn what Python related git pre-commit hooks people are
> using? What hooks have you found useful and which hooks have you tried
> and abandoned? Appreciate any suggestions for those new to this process.
> Background: Windows, macOS, and Linux dev environments, PyCharm professional
> edition IDE, 64-bit Python 3.6, private Github repos. Considering black
> (standardize formatting), pylama (multiple static code tests) and possibly a
> hook into our pytest test runner.

I wrote the following pre-commit hook, which is fairly specific to my
requirements but also fairly easy to customize. It (configurably)
checks text-type files for tabs, trailing whitespace and non-Unix
line endings, python files for syntax and with flake8 and pylint,
and JavaScript files with 'standard'. It works with Python 2 or 3.

https://github.com/jribbens/voting/blob/master/.githooks/pre-commit
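
To give a flavour of the text-file checks described above, here is a minimal sketch. It is not the code from the linked hook; the function name "check_text" and the message strings are made up for illustration, and a real hook would run this over the staged files and exit non-zero if anything is reported.

```python
def check_text(data):
    """Return (line number, problem) pairs found in the bytes of one text file."""
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if line.endswith(b"\r"):
            problems.append((number, "non-Unix line ending"))
            line = line[:-1]  # do not report the CR again as trailing whitespace
        if b"\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems


sample = b"ok\na\tb\nwindows\r\ntrailing \n"
found = check_text(sample)
```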


Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread Peter Otten
Martin Schöön wrote:

> My pandas is up to date.
> 
> In your example you use the US convention of using "." for decimals
> and "," to separate data. This works perfect for me too.
> 
> However, my data files use European conventions: decimal "," and TAB
> to separate data:
> 
> col1  col2
> 1,1   0
> 10,24e-05 1
> 9,492e-10 2
> 
> I use
> 
> EUData = pd.read_csv('file.csv', skiprows=1, sep='\t',
> decimal=',', engine='python')
> 
> to read from such files. This works so so. 'Common floats' (3,1415 etc)
> works just fine but 'scientific' stuff (1,6023e23) does not work.

With

>>> with open("file.csv", "w") as f:
...     f.write("col1\tcol2\n"
...             "1,1\t0\n"
...             "10,24e-05\t1\n"
...             "9,492e-10\t2\n")
... 
40

the following works on my system:

>>> pd.read_csv("file.csv", delimiter="\t", decimal=",")
           col1  col2
0  1.100000e+00     0
1  1.024000e-04     1
2  9.492000e-10     2

[3 rows x 2 columns]

The version is a bit old, though:

>>> pd.__version__
'0.13.1'

The engine="python" produces an exception over here:

"""
ValueError: The 'decimal' option is not supported with the 'python' engine
"""

Maybe you can try and omit that option?
If that doesn't work you can specify a converter:

>>> pd.read_csv("file.csv", sep="\t",
...             converters={0: lambda s: float(s.replace(",", "."))})
           col1  col2
0  1.100000e+00     0
1  1.024000e-04     1
2  9.492000e-10     2

[3 rows x 2 columns]




Re: Reading 'scientific' csv using Pandas?

2018-11-19 Thread Shakti Kumar
Hi Martin,

On Sun, 18 Nov 2018 at 23:59, Martin Schöön  wrote:
>
> Den 2018-11-18 skrev Shakti Kumar :
> > On Sun, 18 Nov 2018 at 18:18, Martin Schöön  wrote:
> >>
> >> Now I hit a bump in the road when some of the data is not in plain
> >> decimal notation (xxx,xx) but in 'scientific' (xx,xxxe-xx) notation.
> >>
> >
> > Martin, I believe this should be done by pandas itself while reading
> > the csv file,
> > I took an example in scientific notation and checked this out,
> >
> > my sample.csv file is,
> > col1,col2
> > 1.1,0
> > 10.24e-05,1
> > 9.492e-10,2
> >
> That was a quick answer!
>
> My pandas is up to date.
>
> In your example you use the US convention of using "." for decimals
> and "," to separate data. This works perfect for me too.
>
> However, my data files use European conventions: decimal "," and TAB
> to separate data:
>
> col1          col2
> 1,1           0
> 10,24e-05     1
> 9,492e-10     2
>

A quick fix would be to replace all commas in your file with stops (.)
In case you have other stops in your file not necessarily in your
scientific notation columns only, you may do this replace process only
for your interested columns.
Meanwhile I'll look for a cleaner way of loading this csv in
pandas; I've never come across this comma notation :)
Members of @python-list@python.org, any better solution?
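
A rough sketch of that per-column replacement, using only the standard library so it does not depend on any particular pandas version. The function name "read_decimal_comma" and its "float_columns" parameter are made up for this example; the sample data is the one from Martin's mail.

```python
import csv
import io


def read_decimal_comma(tsv_file, float_columns):
    """Read a TAB-separated file, converting decimal commas in chosen columns."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    rows = []
    for row in reader:
        for name in float_columns:
            # float() already understands exponents such as "10.24e-05".
            row[name] = float(row[name].replace(",", "."))
        rows.append(row)
    return rows


sample = io.StringIO("col1\tcol2\n1,1\t0\n10,24e-05\t1\n9,492e-10\t2\n")
rows = read_decimal_comma(sample, float_columns=["col1"])
```

Columns not listed in float_columns are left as strings, so commas that appear elsewhere in the file are not touched.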

> I use
>
> EUData = pd.read_csv('file.csv', skiprows=1, sep='\t',
> decimal=',', engine='python')
>
> to read from such files. This works so so. 'Common floats' (3,1415 etc)
> works just fine but 'scientific' stuff (1,6023e23) does not work.
>
> /Martin
> --
> https://mail.python.org/mailman/listinfo/python-list



-- 
Shakti.