Re: Bulletproof json.dump?

2020-07-09 Thread Adam Funk
On 2020-07-07, Stephen Rosen wrote:

> On Mon, Jul 6, 2020 at 6:37 AM Adam Funk  wrote:
>
>> Is there a "bulletproof" version of json.dump somewhere that will
>> convert bytes to str, any other iterables to list, etc., so you can
>> just get your data into a file & keep working?
>>
>
> Is the data only being read by python programs? If so, consider using
> pickle: https://docs.python.org/3/library/pickle.html
> Unlike json dumping, the goal of pickle is to represent objects as exactly
> as possible and *not* to be interoperable with other languages.
>
>
> If you're using json to pass data between python and some other language,
> you don't want to silently convert bytes to strings.
> If you have a bytestring of utf-8 data, you want to utf-8 decode it before
> passing it to json.dumps.
> Likewise, if you have latin-1 data, you want to latin-1 decode it.
> There is no universal and correct bytes-to-string conversion.
>
> On Mon, Jul 6, 2020 at 9:45 AM Chris Angelico  wrote:
>
>> Maybe what we need is to fork out the default JSON encoder into two,
>> or have a "strict=True" or "strict=False" flag. In non-strict mode,
>> round-tripping is not guaranteed, and various types will be folded to
>> each other - mainly, many built-in and stdlib types will be
>> represented in strings. In strict mode, compliance with the RFC is
>> ensured (so ValueError will be raised on inf/nan), and everything
>> should round-trip safely.
>>
>
> Wouldn't it be reasonable to represent this as an encoder which is provided
> by `json`? i.e.
>
> from json import dumps, UnsafeJSONEncoder
> ...
> json.dumps(foo, cls=UnsafeJSONEncoder)
>
> Emphasizing the "Unsafe" part of this and introducing people to the idea of
> setting an encoder also seems nice.
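Sketched concretely, such an encoder might look like the following. The class name `UnsafeJSONEncoder` is hypothetical (the stdlib only documents subclassing `JSONEncoder`, as in its complex-number example), and the bytes handling assumes UTF-8:

```python
import datetime
import json


class UnsafeJSONEncoder(json.JSONEncoder):
    """Hypothetical lossy encoder: fold common non-JSON types to JSON ones."""

    def default(self, o):
        if isinstance(o, (set, frozenset)):
            return sorted(o)                     # sets come back as lists
        if isinstance(o, bytes):
            return o.decode('utf-8', 'replace')  # assumes the bytes are UTF-8
        if isinstance(o, (datetime.date, datetime.datetime)):
            return o.isoformat()                 # one-way: loads back as str
        return super().default(o)                # anything else still raises


print(json.dumps({'ids': {2, 1}, 'when': datetime.date(2020, 7, 6)},
                 cls=UnsafeJSONEncoder))
# {"ids": [1, 2], "when": "2020-07-06"}
```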
>
>
> On Mon, Jul 6, 2020 at 9:12 AM Chris Angelico  wrote:
>
>> On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list
>>  wrote:
>> >
>
>> > The 'json' module already fails to provide round-trip functionality:
>> >
>> > >>> for data in ({True: 1}, {1: 2}, (1, 2)):
>> > ...     if json.loads(json.dumps(data)) != data:
>> > ...         print('oops', data, json.loads(json.dumps(data)))
>> > ...
>> > oops {True: 1} {'true': 1}
>> > oops {1: 2} {'1': 2}
>> > oops (1, 2) [1, 2]
>>
>> There's a fundamental limitation of JSON in that it requires string
>> keys, so this is an obvious transformation. I suppose you could call
>> that one a bug too, but it's very useful and not too dangerous. (And
>> then there's the tuple-to-list transformation, which I think probably
>> shouldn't happen, although I don't think that's likely to cause issues
>> either.)
>
>
> Ideally, all of these bits of support for non-JSON types should be opt-in,
> not opt-out.
> But it's not worth making a breaking change to the stdlib over this.
>
> Especially for new programmers, the notion that
> deserialize(serialize(x)) != x
> just seems like a recipe for subtle bugs.
>
> You're never guaranteed that the deserialized object will match the
> original, but shouldn't one of the goals of a de/serialization library be
> to get it as close as is reasonable?
>
>
> I've seen people do things which boil down to
>
> json.loads(x)["some_id"] == UUID(...)
>
> plenty of times. It's obviously wrong and the fix is easy, but isn't making
> the default json encoder less strict just encouraging this type of bug?
>
> Comparing JSON data against non-JSON types is part of the same category of
> errors: conflating JSON with dictionaries.
> It's very easy for people to make this mistake, especially since JSON
> syntax is a subset of python dict syntax, so I don't think `json.dumps`
> should be encouraging it.
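The UUID confusion described above, made concrete (the id value is invented for the example):

```python
import json
import uuid

uid = uuid.UUID('12345678-1234-5678-1234-567812345678')
payload = json.dumps({'some_id': str(uid)})  # UUIDs must be stringified by hand

data = json.loads(payload)
print(data['some_id'] == uid)             # False: a str never equals a UUID object
print(uuid.UUID(data['some_id']) == uid)  # True: compare like types
```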
>
> On Tue, Jul 7, 2020 at 6:52 AM Adam Funk  wrote:
>
>> Here's another "I'd expect to have to deal with this sort of thing in
>> Java" example I just ran into:
>>
>> >>> r = requests.head(url, allow_redirects=True)
>> >>> print(json.dumps(r.headers, indent=2))
>> ...
>> TypeError: Object of type CaseInsensitiveDict is not JSON serializable
>> >>> print(json.dumps(dict(r.headers), indent=2))
>> {
>>   "Content-Type": "text/html; charset=utf-8",
>>   "Server": "openresty",
>> ...
>> }
>>
>
> Why should the JSON encoder know about an arbitra

Re: Bulletproof json.dump?

2020-07-07 Thread Adam Funk
On 2020-07-06, Adam Funk wrote:

> On 2020-07-06, Chris Angelico wrote:
>> On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list
>> wrote:

>>> While I agree entirely with your point, there is however perhaps room
>>> for a bit more helpfulness from the json module. There is no sensible
>>> reason I can think of that it refuses to serialize sets, for example.
>>
>> Sets don't exist in JSON. I think that's a sensible reason.
>
> I don't agree.  Tuples & lists don't exist separately in JSON, but
> both are serializable (to the same thing).  Non-string keys aren't
> allowed in JSON, but it silently converts numbers to strings instead
> of barfing.  Typically, I've been using sets to deduplicate values as
> I go along, & having to walk through the whole object changing them to
> lists before serialization strikes me as the kind of pointless labor
> that I expect when I'm using Java.  ;-)

Here's another "I'd expect to have to deal with this sort of thing in
Java" example I just ran into:


>>> r = requests.head(url, allow_redirects=True)
>>> print(json.dumps(r.headers, indent=2))
...
TypeError: Object of type CaseInsensitiveDict is not JSON serializable
>>> print(json.dumps(dict(r.headers), indent=2))
{
  "Content-Type": "text/html; charset=utf-8",
  "Server": "openresty",
...
}


-- 
I'm after rebellion --- I'll settle for lies.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bulletproof json.dump?

2020-07-06 Thread Adam Funk
On 2020-07-06, Chris Angelico wrote:

> On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list
> wrote:
>>
>> On 2020-07-06, Chris Angelico  wrote:
>> > On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
>> >> Is there a "bulletproof" version of json.dump somewhere that will
>> >> convert bytes to str, any other iterables to list, etc., so you can
>> >> just get your data into a file & keep working?
>> >
>> > That's the PHP definition of "bulletproof" - whatever happens, no
>> > matter how bad, just keep right on going.
>>
>> While I agree entirely with your point, there is however perhaps room
>> for a bit more helpfulness from the json module. There is no sensible
>> reason I can think of that it refuses to serialize sets, for example.
>
> Sets don't exist in JSON. I think that's a sensible reason.

I don't agree.  Tuples & lists don't exist separately in JSON, but
both are serializable (to the same thing).  Non-string keys aren't
allowed in JSON, but it silently converts numbers to strings instead
of barfing.  Typically, I've been using sets to deduplicate values as
I go along, & having to walk through the whole object changing them to
lists before serialization strikes me as the kind of pointless labor
that I expect when I'm using Java.  ;-)
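One way to avoid that walk (a sketch, not a stdlib feature) is a `default` hook, which `json.dumps` calls only for types it cannot serialize itself. The conversion is one-way: the sets load back as lists.

```python
import json


def fold_sets(obj):
    """Fallback for json.dumps: turn sets into sorted lists, reject the rest."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    raise TypeError(f'Object of type {type(obj).__name__} '
                    'is not JSON serializable')


data = {'seen': {'c', 'a', 'b'}, 'count': 3}
print(json.dumps(data, default=fold_sets))
# {"seen": ["a", "b", "c"], "count": 3}
```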



>> Going a bit further and, for example, automatically calling isoformat()
>> on date/time/datetime objects would perhaps be a bit more controversial,
>> but would frequently be useful, and there's no obvious downside that
>> occurs to me.
>
> They wouldn't round-trip without some way of knowing which strings
> represent date/times. If you just want a one-way output format, it's
> not too hard to subclass the encoder - there's an example right there
> in the docs (showing how to create a representation for complex
> numbers). The vanilla JSON encoder shouldn't do any of this. In fact,
> just supporting infinities and nans is fairly controversial - see
> other threads happening right now.
>
> Maybe what people want is a pretty printer instead?
>
> https://docs.python.org/3/library/pprint.html
>
> Resilient against recursive data structures, able to emit Python-like
> code for many formats, is as readable as JSON, and is often
> round-trippable. It lacks JSON's interoperability, but if you're
> trying to serialize sets and datetimes, you're forfeiting that anyway.
>
> ChrisA
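The pprint route can indeed round-trip, via `ast.literal_eval`, as long as the data is built from Python literals (sets, tuples, and bytes included; `inf`/`nan` and arbitrary objects still won't parse back):

```python
import ast
import pprint

data = {'ids': {1, 2, 3}, 'pair': (4, 5), 'raw': b'bytes survive'}
text = pprint.pformat(data)  # Python-literal syntax, about as readable as JSON
restored = ast.literal_eval(text)

print(restored == data)  # True: sets stay sets, tuples stay tuples
```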


-- 
"It is the role of librarians to keep government running in difficult
times," replied Dramoren.  "Librarians are the last line of defence
against chaos."   (McMullen 2001)


Re: Bulletproof json.dump?

2020-07-06 Thread Adam Funk
On 2020-07-06, Frank Millman wrote:

> On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
>> On 2020-07-06, Chris Angelico  wrote:
>>> On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
>>>> Is there a "bulletproof" version of json.dump somewhere that will
>>>> convert bytes to str, any other iterables to list, etc., so you can
>>>> just get your data into a file & keep working?
>>>
>>> That's the PHP definition of "bulletproof" - whatever happens, no
>>> matter how bad, just keep right on going.
>> 
>> While I agree entirely with your point, there is however perhaps room
>> for a bit more helpfulness from the json module. There is no sensible
>> reason I can think of that it refuses to serialize sets, for example.
>> Going a bit further and, for example, automatically calling isoformat()
>> on date/time/datetime objects would perhaps be a bit more controversial,
>> but would frequently be useful, and there's no obvious downside that
>> occurs to me.
>> 
>
> I may be missing something, but that would cause a downside for me.
>
> I store Python lists and dicts in a database by calling dumps() when 
> saving them to the database and loads() when retrieving them.
>
> If a date was 'dumped' using isoformat(), then on retrieval I would not 
> know whether it was originally a string, which must remain as is, or was 
> originally a date object, which must be converted back to a date object.
>
> There is no perfect answer, but my solution works fairly well. When 
> dumping, I use 'default=repr'. This means that dates get dumped as 
> 'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to 
> detect that it is actually a date object.
>
> I use the same trick for Decimal objects.
>
> Maybe the OP could do something similar.

Aha, I think the default=repr option is probably just what I need;
maybe (at least in the testing stages) something like this:

try:
    with open(output_file, 'w') as f:
        json.dump(data, f)
except TypeError:
    print('unexpected item in the bagging area!')
    with open(output_file, 'w') as f:
        json.dump(data, f, default=repr)

and then I'd know when I need to go digging through the output for
bytes, sets, etc., but at least I'd have the output to examine.
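Frank's default=repr scheme, together with the pattern-based restore on load, might look like this; the regex is one guess at his detection pattern, not his actual code:

```python
import datetime
import json
import re

# Matches the repr of a datetime.date, e.g. 'datetime.date(2020, 7, 6)'
DATE_RE = re.compile(r'^datetime\.date\((\d+), (\d+), (\d+)\)$')


def restore(value):
    """Turn strings matching repr(datetime.date(...)) back into date objects."""
    if isinstance(value, str):
        m = DATE_RE.match(value)
        if m:
            return datetime.date(*map(int, m.groups()))
    return value


dumped = json.dumps({'due': datetime.date(2020, 7, 6)}, default=repr)
print(dumped)  # {"due": "datetime.date(2020, 7, 6)"}

loaded = {k: restore(v) for k, v in json.loads(dumped).items()}
print(loaded['due'] == datetime.date(2020, 7, 6))  # True
```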


-- 
Well, we had a lot of luck on Venus
We always had a ball on Mars


Re: Bulletproof json.dump?

2020-07-06 Thread Adam Funk
On 2020-07-06, Chris Angelico wrote:

> On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
>>
>> Hi,
>>
>> I have a program that does a lot of work with URLs and requests,
>> collecting data over about an hour, & then writing the collated data
>> to a JSON file.  The first time I ran it, the json.dump failed because
>> there was a bytes value instead of a str, so I had to figure out where
>> that was coming from before I could get any data out.  I've previously
>> run into the problem of collecting values in sets (for deduplication)
>> & forgetting to walk through the big data object changing them to
>> lists before serializing.
>>
>> Is there a "bulletproof" version of json.dump somewhere that will
>> convert bytes to str, any other iterables to list, etc., so you can
>> just get your data into a file & keep working?
>>
>
> That's the PHP definition of "bulletproof" - whatever happens, no
> matter how bad, just keep right on going. If you really want some way

Well played!

> to write "just anything" to your file, I recommend not using JSON -
> instead, write out the repr of your data structure. That'll give a
> decent result for bytes, str, all forms of numbers, and pretty much
> any collection, and it won't break if given something that can't
> safely be represented.

Interesting point.  At least the TypeError message does say what the
unacceptable type is ("Object of type set is not JSON serializable").


-- 
"It is the role of librarians to keep government running in difficult
times," replied Dramoren.  "Librarians are the last line of defence
against chaos."   (McMullen 2001)


Bulletproof json.dump?

2020-07-06 Thread Adam Funk
Hi,

I have a program that does a lot of work with URLs and requests,
collecting data over about an hour, & then writing the collated data
to a JSON file.  The first time I ran it, the json.dump failed because
there was a bytes value instead of a str, so I had to figure out where
that was coming from before I could get any data out.  I've previously
run into the problem of collecting values in sets (for deduplication)
& forgetting to walk through the big data object changing them to
lists before serializing.

Is there a "bulletproof" version of json.dump somewhere that will
convert bytes to str, any other iterables to list, etc., so you can
just get your data into a file & keep working?

(I'm using Python 3.7.)

Thanks!

-- 
Slade was the coolest band in England. They were the kind of guys
that would push your car out of a ditch.  ---Alice Cooper


Creating LF, NEL line terminators by accident? (python3)

2019-03-26 Thread Adam Funk
Hi,

I have a Python 3 (using 3.6.7) program that reads a TSV file, does
some churning with the data, and writes a TSV file out.

#v+
print('reading', options.input_file)
with open(options.input_file, 'r', encoding='utf-8-sig') as f:
    for line in f.readlines():
        row = line.split('\t')
        # DO STUFF WITH THE CELLS IN THE ROW

# ...

print('writing', options.output_file)
with open(options.output_file, 'w', encoding='utf-8') as f:
    # MAKE THE HEADER list of str
    f.write('\t'.join(header) + '\n')

    for doc_id in sorted(all_ids):
        # CREATE A ROW list of str FOR EACH DOCUMENT ID
        f.write('\t'.join(row) + '\n')
#v-

I noticed that the file command on the output returns "UTF-8 Unicode
text, with very long lines, with LF, NEL line terminators".

I'd never come across NEL terminators until now, and I've never
(AFAIK) created a file with them before.  Any idea why this is
happening?

(I tried changing the input encoding from 'utf-8-sig' to 'utf-8' but
got the same results with the output.)

Thanks,
Adam
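For reference: NEL is U+0085 (Next Line), a legal Unicode line break that `file` counts as a terminator. One plausible cause (an assumption, not verified against the data above) is stray U+0085 characters inside the input cells, since Python's text mode does not translate NEL on read and so passes it through to the output. A quick check-and-scrub sketch:

```python
# NEL (U+0085) survives text-mode reading untouched: universal newlines only
# handles \n, \r, and \r\n, so a NEL inside a cell lands in the output file,
# where `file` then reports "LF, NEL line terminators".
line = 'value one\u0085value two\n'
print('\u0085' in line)  # True: this line would trip `file`

cleaned = line.replace('\u0085', ' ')  # or '' to drop it entirely
print(cleaned)
```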


-- 
I am at the moment writing a lengthy indictment against our
century. When my brain begins to reel from my literary labors, I make
an occasional cheese dip.---Ignatius J Reilly


Re: Basic pynomo instructions not working

2018-12-10 Thread Adam Funk
On 2018-11-22, dieter wrote:

> The "pynomo" version you have installed may have been developped for
> Python 2 and not run in "python3".
>
> In Python 2, you have implicit relative imports.
> As an example, it allows modules in the package "pynomo"
> to use "import nomo_wrapper" to import the submodule "nomo_wrapper".
> Python 3 has discarded implicit relative imports. In
> the example above, "import nomo_wrapper" must become
> "from . import nomo_wrapper" (explicit relative import)
> or "import pynomo.nomo_wrapper as nomo_wrapper" (absolute import).
>
> For the time being, you still find many packages which run
> only under Python 2. Failing relative imports or syntax errors
> are a frequent indication towards this.

Yup, that's the problem --- it does work in Python 2.  I didn't think
of that at first because there was a pip3 package for it!


-- 
Growth for growth's sake is the ideology of the cancer cell.
 ---Edward Abbey


Re: bottledaemon stop/start doesn't work if killed elsewhere

2018-11-20 Thread Adam Funk
On 2018-11-18, Dan Sommers wrote:

> On 11/18/18 1:21 PM, MRAB wrote:
> > On 2018-11-18 17:50, Adam Funk wrote:
> >> Hi,
> >>
> >> I'm using bottledaemon to run a little REST service on a Pi that takes
> >> input from other machines on the LAN and stores stuff in a database.
> >> I have a cron job to call 'stop' and 'start' on it daily, just in case
> >> of problems.
> >>
> >> Occasionally the oom-killer runs overnight and kills the process using
> >> bottledaemon; when this happens (unlike properly stopping the daemon),
> >> the pidfile and its lockfile are left on the filesystem, so the 'stop'
> >> does nothing and the 'start' gets refused because the old pidfile and
> >> lockfile are present.  At the moment, I eventually notice something
> >> wrong with the output data, ssh into the Pi, and rm the two files then
> >> call 'start' on the daemon again.
> >>
> >> Is there a recommended or good way to handle this situation
> >> automatically?
> >>
> > Could you write a watchdog daemon that checks whether bottledaemon is
> > running, and deletes those files if it isn't (or hasn't been for a 
> > while)?
>
> What if the oom-killer kills the watchdog?
>
> Whatever runs in response to the start command has to be smarter:  if
> the pid and lock files exist, then check whether they refer to a
> currently running bottledaemon.  If so, then all is well, and refuse to
> start a redundant daemon.  If not, then remove the pid and lock files
> and start the daemon.

I've reported this as an issue on github.  It seems to me that the
'stop' subcommand should delete the pidfile and lockfile if the pid is
no longer running.
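Dan's suggestion (check whether the recorded pid is actually alive before refusing to start) can be sketched with `os.kill(pid, 0)`, which probes a process without sending a real signal. The file paths here are hypothetical, not bottledaemon's actual ones:

```python
import os


def pid_is_running(pid):
    """Probe a pid without signalling it (POSIX)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False     # no such process: the pidfile is stale
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def clear_stale_pidfile(pidfile='/var/run/mydaemon.pid',
                        lockfile='/var/run/mydaemon.lock'):
    """Remove leftover pid/lock files if the recorded process is gone."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return
    if not pid_is_running(pid):
        for path in (pidfile, lockfile):
            try:
                os.remove(path)
            except FileNotFoundError:
                pass


print(pid_is_running(os.getpid()))  # True: this process is certainly running
```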


-- 
It's like a pair of eyes. You're looking at the umlaut, and it's
looking at you. ---David St. Hubbins


Re: Suggestions for plotting slide rule & sector scales?

2018-11-20 Thread Adam Funk
On 2018-11-08, Stefan Ram wrote:

> Adam Funk  writes:
>>and get a line 100 mm long with a log scale on the top and a linear
>>scale on the bottom.
>
>   From what I have heard,
>
> pyqt.sourceforge.net/Docs/PyQt4/qx11info.html#appDpiX
>
>   will give you the dots per inch (link not validated).
>
>   matplotlib.axis.Axis handles drawing of the tick lines,
>   grid lines, tick and axis labels (information not verified).

I hadn't thought of using matplotlib to do axes only, without the rest
of the graph --- interesting idea.

I've also found pynomo, which looks more appropriate, but I'm having
problems with that (see other post).


-- 
Physics is like sex. Sure, it may give some practical results, but 
that's not why we do it.---Richard Feynman


Basic pynomo instructions not working

2018-11-20 Thread Adam Funk
Hi,

I'm trying to use the basic stuff in pynomo



which I've installed with pip3, but I run into this problem trying
the basic stuff in the documentation:

#v+
$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pynomo.nomographer import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adam/.local/lib/python3.6/site-packages/pynomo/nomographer.py", line 16, in <module>
    from nomo_wrapper import *
ModuleNotFoundError: No module named 'nomo_wrapper'
>>> import pynomo
>>> import pynomo.nomographer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adam/.local/lib/python3.6/site-packages/pynomo/nomographer.py", line 16, in <module>
    from nomo_wrapper import *
ModuleNotFoundError: No module named 'nomo_wrapper'
>>>
#v-

Any ideas?

Thanks.


Re: bottledaemon stop/start doesn't work if killed elsewhere

2018-11-19 Thread Adam Funk
On 2018-11-19, Dennis Lee Bieber wrote:

> On Sun, 18 Nov 2018 15:33:47 -0600, Dan Sommers
><2qdxy4rzwzuui...@potatochowder.com> declaimed the following:
>
>>
>>What if the oom-killer kills the watchdog?
>>
>
>   Then you have TWO processes with out-of-control memory growth.
>
>   The out-of-memory killer should only be killing processes that are
> requesting obscene amounts of memory. You could put a USB hard-drive on the
> system and create a swap partition on the hard drive (you don't want to
> swap to an SD card, it will rapidly kill the card).

This pi has an external USB drive (with its own power supply) for
everything except /boot, including a 46 GB swap partition!

>   More important -- try to find out what your daemon is doing that is
> increasing its memory usage (Firefox on Windows is a known hog; I have to
> kill it periodically as it grows to 1.5GB [it's the 32-bit version due to
> my favored plug-ins that are no longer supported in 64-bit, so has a 2GB
> process limit]).

AFAICT the oom-killer only fires when the nightly texpire cron job (a
component of the leafnode local news server) is running, & even then
only once a week or so.  Usually when that happens, it kills texpire,
which doesn't really matter, since that runs again the next night.
Occasionally it kills some other thing.  I don't see how this
bottledaemon could be the memory hog --- it has one endpoint that
accepts a few hundred bytes of JSON, validates it, & then appends a
line to a TSV file.

Thanks,
Adam


bottledaemon stop/start doesn't work if killed elsewhere

2018-11-18 Thread Adam Funk
Hi,

I'm using bottledaemon to run a little REST service on a Pi that takes
input from other machines on the LAN and stores stuff in a database.
I have a cron job to call 'stop' and 'start' on it daily, just in case
of problems.

Occasionally the oom-killer runs overnight and kills the process using
bottledaemon; when this happens (unlike properly stopping the daemon),
the pidfile and its lockfile are left on the filesystem, so the 'stop'
does nothing and the 'start' gets refused because the old pidfile and
lockfile are present.  At the moment, I eventually notice something
wrong with the output data, ssh into the Pi, and rm the two files then
call 'start' on the daemon again.

Is there a recommended or good way to handle this situation
automatically?

Thanks,
Adam


Suggestions for plotting slide rule & sector scales?

2018-11-08 Thread Adam Funk
I like old scientific instruments & that sort of thing, & am wondering
what libraries I could use for programmatically generating
mathematical scales, ideally able to display the result in the GUI &
save it as a png (or other standard graphics format).  Ideally, I'd
like to be able to write code like this:

line = foo_library.draw_line(0, 0, 100, 0)

for x in range(1, 101):
    line.add_minor_tick(math.log10(x) * 100/2, side=top)

for x in range(1, 11):
    line.add_major_tick(math.log10(x*10) * 100/2, label=str(x), side=top)

for x in range(0, 100):
    line.add_minor_tick(x, side=bottom)

for x in range(0, 10):
    line.add_major_tick(x*10, label=str(x), side=bottom)

and get a line 100 mm long with a log scale on the top and a linear
scale on the bottom.
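Nothing in the stdlib draws scales, but the wished-for API can be approximated without any library by emitting SVG directly; every name below is invented for the sketch, and the tick geometry just mirrors the pseudocode above:

```python
import math


def scale_svg(length_mm=100, height_mm=10):
    """Emit a minimal SVG: log-scale ticks above a baseline, linear below."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{length_mm}mm" height="{height_mm}mm" '
             f'viewBox="0 0 {length_mm} {height_mm}">']
    y = height_mm / 2
    parts.append(f'<line x1="0" y1="{y}" x2="{length_mm}" y2="{y}" '
                 'stroke="black" stroke-width="0.2"/>')
    # top: log scale, 1..100 mapped onto two decades over the full length
    for x in range(1, 101):
        pos = math.log10(x) * length_mm / 2
        parts.append(f'<line x1="{pos:.2f}" y1="{y - 2}" x2="{pos:.2f}" '
                     f'y2="{y}" stroke="black" stroke-width="0.1"/>')
    # bottom: linear scale, one tick per mm
    for x in range(0, length_mm + 1):
        parts.append(f'<line x1="{x}" y1="{y}" x2="{x}" y2="{y + 2}" '
                     'stroke="black" stroke-width="0.1"/>')
    parts.append('</svg>')
    return '\n'.join(parts)


print(scale_svg()[:60])  # the file can be opened in any browser or Inkscape
```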

Thanks,
Adam


-- 
It takes a thousand men to invent a telegraph, or a steam engine, or a
phonograph, or a telephone or any other important thing --- and the
last man gets the credit and we forget the others.   ---Mark Twain


managing the RDFLib NamespaceManager

2017-03-14 Thread Adam Funk
I'm using RDFLib 4.2.2 (installed with pip3) in Python 3.5.2, and the
automatic behaviour of the NamespaceManager is giving me conniptions,
because the automatically generated namespaces go too far to the right
in the URIs and make the right-hand parts of the QNames meaningless.

For example, I have this code:

nsMgr = NamespaceManager(rdflib.Graph())
nsMgr.bind('dbpedia', Namespace('http://dbpedia.org/resource/'))
# plus some others


but in the serialized RDF I get these (and some other ns-numbered
ones):

@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix ns3:  .
@prefix ns5:  .

and obviously the QName outputs are wrong.  Is there any way to make
an RDFLib NamespaceManager *not* generate any namespaces
automatically?

Thanks,
Adam


Re: Making IDLE3 ignore non-BMP characters instead of throwing an exception?

2016-10-21 Thread Adam Funk
On 2016-10-17, eryk sun wrote:

> On Mon, Oct 17, 2016 at 2:20 PM, Adam Funk  wrote:
>> I'm using IDLE 3 (with python 3.5.2) to work interactively with
>> Twitter data, which of course contains emojis.  Whenever the running
>> program tries to print the text of a tweet with an emoji, it barfs
>> this & stops running:
>>
>>   UnicodeEncodeError: 'UCS-2' codec can't encode characters in
>>   position 102-102: Non-BMP character not supported in Tk
>>
>> Is there any way to set IDLE to ignore these characters (either drop
>> them or replace them with something else) instead of throwing the
>> exception?
>>
>> If not, what's the best way to strip them out of the string before
>> printing?
>
> You can patch print() to transcode non-BMP characters as surrogate
> pairs. For example:
>
> import builtins
>
> def print_ucs2(*args, print=builtins.print, **kwds):
>     args2 = []
>     for a in args:
>         a = str(a)
>         if max(a) > '\uffff':
>             b = a.encode('utf-16le', 'surrogatepass')
>             chars = [b[i:i+2].decode('utf-16le', 'surrogatepass')
>                      for i in range(0, len(b), 2)]
>             a = ''.join(chars)
>         args2.append(a)
>     print(*args2, **kwds)
>
> builtins._print = builtins.print
> builtins.print = print_ucs2
>
> On Windows this should allow printing non-BMP characters such as
> emojis (e.g. U+0001F44C). On Linux it prints a non-BMP character as a
> pair of empty boxes. If you're not using Windows you can modify this
> to print something else for non-BMP characters, such as a replacement
> character or \U literals.

Clever, thanks.  (I'm actually using Linux.)

-- 
Consistently separating words by spaces became a general custom about
the tenth century A. D., and lasted until about 1957, when FORTRAN
abandoned the practice.  --- Sun FORTRAN Reference Manual


Re: Making IDLE3 ignore non-BMP characters instead of throwing an exception?

2016-10-17 Thread Adam Funk
On 2016-10-17, Adam Funk wrote:

> I'm using IDLE 3 (with python 3.5.2) to work interactively with
> Twitter data, which of course contains emojis.  Whenever the running
> program tries to print the text of a tweet with an emoji, it barfs
> this & stops running:
>
>   UnicodeEncodeError: 'UCS-2' codec can't encode characters in
>   position 102-102: Non-BMP character not supported in Tk
>
> Is there any way to set IDLE to ignore these characters (either drop
> them or replace them with something else) instead of throwing the
> exception?
>
> If not, what's the best way to strip them out of the string before
> printing?

Well, to answer part of my own question, this works for stripping them
out:

 s = ''.join(c for c in s if ord(c) < 65536)



-- 
Master Foo said: "A man who mistakes secrets for knowledge is like
a man who, seeking light, hugs a candle so closely that he smothers
it and burns his hand."--- Eric Raymond


Making IDLE3 ignore non-BMP characters instead of throwing an exception?

2016-10-17 Thread Adam Funk
I'm using IDLE 3 (with python 3.5.2) to work interactively with
Twitter data, which of course contains emojis.  Whenever the running
program tries to print the text of a tweet with an emoji, it barfs
this & stops running:

  UnicodeEncodeError: 'UCS-2' codec can't encode characters in
  position 102-102: Non-BMP character not supported in Tk

Is there any way to set IDLE to ignore these characters (either drop
them or replace them with something else) instead of throwing the
exception?

If not, what's the best way to strip them out of the string before
printing?

Thanks,
Adam


pyicloud: TypeError: 'dict_values' object does not support indexing

2016-09-30 Thread Adam Funk
I'm trying to use pyicloud in idle3 (installed by pip3 on Ubuntu).



The basic stuff works, but access to photos (following the
instructions) fails:


>>> photos = api.photos.all
>>> for photo in photos:
...     print(photo.filename)
...
Traceback (most recent call last):
  File "", line 2, in 
    print(photo.filename)
  File "/usr/local/lib/python3.5/dist-packages/pyicloud/services/photos.py", line 242, in filename
    return self.data['details'].get('filename')
  File "/usr/local/lib/python3.5/dist-packages/pyicloud/services/photos.py", line 237, in data
    self._data = self.album._fetch_asset_data_for(self)
  File "/usr/local/lib/python3.5/dist-packages/pyicloud/services/photos.py", line 203, in _fetch_asset_data_for
    client_ids.append(self._photo_assets[index].client_id)
TypeError: 'dict_values' object does not support indexing

which points at this bit of the source code

240    @property
241    def filename(self):
242        return self.data['details'].get('filename')

And I get the same exception trying to do anything with a single
photo.  Is this code not really Python 3 compatible?  Or am I doing
something stupid?

Thanks,
Adam


-- 
A firm rule must be imposed upon our nation before it destroys
itself. The United States needs some theology and geometry, some taste
and decency. I suspect that we are teetering on the edge of the abyss.
 --- Ignatius J Reilly


Re: Why does pathlib not have is_readable() & things like that?

2016-04-29 Thread Adam Funk
On 2016-04-28, Grant Edwards wrote:

> On 2016-04-28, Adam Funk  wrote:
>> On 2016-04-26, Random832 wrote:
>>
>>> On Tue, Apr 26, 2016, at 09:30, Adam Funk wrote:
>>>> I recently discovered pathlib in the Python 3 standard library, & find
>>>> it very useful, but I'm a bit surprised that it doesn't offer things
>>>> like is_readable() and is_writable.  Is there a good reason for that?
>>>
>>> Well, one reason would be EAFP. Just try to open the file and see if it
>>> gives you a PermissionError.
>>
>> I understand that in general, but the tool I'm working on here takes a
>> command-line option to specify an output directory, & I'd rather not
>> start processing the data (which involves GETting from a REST service,
>> processing, and PUTting back modifications to the data) only to crash
>> after the first batch because of a user error.
>
> Then open the output file before you do the GET.

I guess I could, but fetching the data actually involves a whole lot
of GET requests (the first one includes cross-references to the URLs
where the rest of the data is found), some BeautifulSoup processing, &
a lot of other processing to produce a big dict, which I then write
out as json using what I think is the best way (output_file is an
instance of pathlib.Path):

with output_file.open(mode='w', encoding='UTF-8', errors='replace') as f:
    json.dump(output, f, sort_keys=True, indent=2)

> Or just do os.access("directory/where/you/want/to/open/a/file",os.W_OK)

That's what I'm doing now, but I prefer to give the user the error
message early on.
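A compromise between EAFP and the early `os.access` check: actually create (and immediately delete) a probe file in the output directory at startup, so a user error surfaces before any data is modified. A minimal sketch, assuming the directory comes from the command-line option:

```python
import pathlib
import tempfile


def check_output_dir(path):
    """Fail fast: prove the directory is writable by writing a real file."""
    directory = pathlib.Path(path)
    try:
        # the temp file is created and removed on exit from the with-block;
        # success means we could write here *right now*
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError as e:
        raise SystemExit(f'cannot write to output directory {directory}: {e}')


check_output_dir(tempfile.gettempdir())  # hypothetical stand-in for the option
print('output directory ok')
```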


Re: Why does pathlib not have is_readable() & things like that?

2016-04-28 Thread Adam Funk
On 2016-04-26, Random832 wrote:

> On Tue, Apr 26, 2016, at 09:30, Adam Funk wrote:
>> I recently discovered pathlib in the Python 3 standard library, & find
>> it very useful, but I'm a bit surprised that it doesn't offer things
>> like is_readable() and is_writable.  Is there a good reason for that?
>
> Well, one reason would be EAFP. Just try to open the file and see if it
> gives you a PermissionError.

I understand that in general, but the tool I'm working on here takes a
command-line option to specify an output directory, & I'd rather not
start processing the data (which involves GETting from a REST service,
processing, and PUTting back modifications to the data) only to crash
after the first batch because of a user error.


-- 
Specifications are for the weak & timid!
  --- Klingon Programmer's Guide
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does pathlib not have is_readable() & things like that?

2016-04-28 Thread Adam Funk
On 2016-04-26, Steven D'Aprano wrote:

> On Tue, 26 Apr 2016 11:30 pm, Adam Funk wrote:

>> I've been improvising with things like this:
>> 
>> import pathlib, os
>> 
>> path = pathlib.Path('some/directory')
>> writable = os.access(str(path), os.W_OK | os.X_OK)
>> 
>> Is that the best way to do it?
>
> No. All you have learned is that the directory is writable *now*. In a
> millisecond, or a minute, when you actually go to write to it, it may no
> longer be writable -- it may not even exist.
>
> There is a whole class of serious security vulnerabilities and bugs caused
> by the difference between the time you check something and the time you
> actually use it. "Time of check to time of use" bugs can be best avoided by
> not checking ahead of time whether the directory is writable, but just
> *attempting to write to it*, and catching the error if you can't.

I appreciate the general principle, but the situation here is a tool
that iterates over a loop of the following: GET some chunks of data
from a REST service, process them, PUT something back to existing
documents through the REST service, & save the result in a directory
specified as a command-line option.  I don't want the tool to modify
the first batch of data in the REST service, be unable to store the
results locally, & crash as a result.
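A compromise that keeps the spirit of EAFP while still failing early (my sketch, not something proposed in the thread) is to actually try creating a throwaway file in the output directory before starting the GET/PUT loop:

```python
import tempfile

def output_dir_writable(directory):
    # EAFP-style early check: really create (and discard) a file there,
    # instead of inspecting permission bits with os.access().
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False
```

It is still racy in the TOCTOU sense, but at least it tests "can I write right now" rather than metadata.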


-- 
A mathematical formula should never be "owned" by anybody! Mathematics
belongs to God.   --- Donald Knuth
-- 
https://mail.python.org/mailman/listinfo/python-list


Why does pathlib not have is_readable() & things like that?

2016-04-26 Thread Adam Funk
I recently discovered pathlib in the Python 3 standard library, & find
it very useful, but I'm a bit surprised that it doesn't offer things
like is_readable() and is_writable().  Is there a good reason for that?

I've been improvising with things like this:

import pathlib, os

path = pathlib.Path('some/directory')
writable = os.access(str(path), os.W_OK | os.X_OK) 

Is that the best way to do it?


-- 
Unix is a user-friendly operating system. It's just very choosy about
its friends.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Testing whether the VPN is running?

2016-02-25 Thread Adam Funk
On 2016-02-23, Cameron Simpson wrote:

> On 18Feb2016 10:03, Adam Funk  wrote:
>>On 2016-02-18, Ervin Hegedüs wrote:
>>> I think that the psutil module could be better for you for this
>>> task:
>>> https://pypi.python.org/pypi/psutil/
>>>
>>> and see the "Network" section.
>>
>>if 'tun0' in psutil.net_if_addrs():
>>   # vpn is running
>>
>>Brilliant!  I've used psutil for something else, but I didn't know it
>>did that too.  My excuse is that the version on my system was 2.2.1,
>>which does not do that, so I installed the newer version with pip3.
>>Thanks for pointing me to that.
>
> You might also want to check that the interface is up.
>
> My personal hack (not for a VPN, but for "being online", which turns my ssh 
> tunnels on and off) is to look in the output of "netstat -rn" for a default 
> route. This may imply that an alternative test for you is to test for a route 
> to your VPN's address range?  Just an idea.

Also interesting to know, thanks.
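(Cameron's default-route idea can be sketched with nothing but the stdlib on Linux -- my illustration; it just parses /proc/net/route, so it won't work on other systems:)

```python
def has_default_route():
    # Linux-only: the kernel routing table lists the default route
    # with an all-zero destination field (second column).
    try:
        with open('/proc/net/route') as routes:
            next(routes)  # skip the header line
            for line in routes:
                fields = line.split()
                if len(fields) > 1 and fields[1] == '00000000':
                    return True
    except (OSError, StopIteration):
        pass
    return False
```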


-- 
Mrs CJ and I avoid clichés like the plague.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Testing whether the VPN is running?

2016-02-18 Thread Adam Funk
On 2016-02-18, Ervin Hegedüs wrote:

> Hi Adam,
>
> On Thu, Feb 18, 2016 at 09:26:58AM +0000, Adam Funk wrote:
>> I'd like to test (inside a python 3 program) whether the VPN is
>> running or not.  The only thing I can think of so far is to use
>> subprocess to run the 'ifconfig' command, then check its output for
>> 'tun0'.  Is there a better way?
>
> you didn't write which system (OS) you want to use - based on
> the "ifconfig" and "tun0" keywords, possible that's a Linux.

Oops, sorry!  But your educated guess is right.  :-)

> I think that the psutil module could be better for you for this
> task:
>
> https://pypi.python.org/pypi/psutil/
>
> and see the "Network" section.

if 'tun0' in psutil.net_if_addrs():
   # vpn is running

Brilliant!  I've used psutil for something else, but I didn't know it
did that too.  My excuse is that the version on my system was 2.2.1,
which does not do that, so I installed the newer version with pip3.
Thanks for pointing me to that.


-- 
XML is like violence: if it doesn't solve the problem,
use more.
-- 
https://mail.python.org/mailman/listinfo/python-list


Testing whether the VPN is running?

2016-02-18 Thread Adam Funk
I'd like to test (inside a python 3 program) whether the VPN is
running or not.  The only thing I can think of so far is to use
subprocess to run the 'ifconfig' command, then check its output for
'tun0'.  Is there a better way?

Thanks.
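(For the record, there is also a stdlib-only way on Unix with Python 3.3+ -- a sketch of mine, not something suggested in the thread:)

```python
import socket

def vpn_running(iface='tun0'):
    # socket.if_nameindex() returns (index, name) pairs for the
    # network interfaces the kernel currently knows about.
    return any(name == iface for _, name in socket.if_nameindex())
```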


-- 
Nam Sibbyllam quidem Cumis ego ipse oculis meis vidi in ampulla 
pendere, et cum illi pueri dicerent: beable beable beable; respondebat 
illa: doidy doidy doidy. --- plorkwort
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: trying to force stdout to utf-8 with errors='ignore' or 'replace'

2015-12-11 Thread Adam Funk
On 2015-12-11, Peter Otten wrote:

> Adam Funk wrote:

>> but with either or both of those, I get the dreaded
>> "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>> 562: ordinal not in range(128)".  How can I force the output to be in
>> UTF-8 & silently suppress invalid characters?
>
> (I'm assuming you are using Python 2 and that main_body is a unicode 
> instance)

The short answer turned out to be 'switch to Python 3', which I think
is what I'll do from now on unless I absolutely need a library that
isn't available there.

(AFAICT, the email parser in 2.7 returns the body as a bytestring &
doesn't actually look at the Content-Type header, & trying to decode
the body with that just made it barf in different places.)
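(In Python 3 the "force stdout to UTF-8 with replacement" part becomes a one-liner: wrap the binary buffer. A self-contained sketch against an in-memory buffer:)

```python
import io

buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding='utf-8', errors='replace')
# A lone surrogate like '\udcff' (how Python 3 APIs smuggle undecodable
# bytes) would crash a strict UTF-8 writer; errors='replace' emits '?'.
out.write('caf\u00e9 \udcff\n')
out.flush()
# For real stdout:
#     sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
#                                   encoding='utf-8', errors='replace')
```

On 3.7+ you can instead call sys.stdout.reconfigure(errors='replace').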


-- 
Science is what we understand well enough to explain to a computer.  
Art is everything else we do.  --- Donald Knuth


trying to force stdout to utf-8 with errors='ignore' or 'replace'

2015-12-11 Thread Adam Funk
I'm fiddling with a program that reads articles in the news spool
using email.parser (standard library) &
email_reply_parser.EmailReplyParser (installed with pip).  Reading is
fine, & I don't get any errors writing output extracted from article
bodies *until* I try to suppress invalid characters.  This works:

if message.is_multipart():
    body = message.get_payload(0, True)
else:
    body = message.get_payload()
main_body = EmailReplyParser.parse_reply(body)
# fix quoted-printable stuff
if equals_regex.search(main_body):
    main_body = quopri.decodestring(main_body)
# suppress attribution before quoted text
main_body = attrib_regex.sub('>', main_body)
# suppress sig
main_body = sig_regex.sub('\n', main_body)
main_body = main_body.strip()
stdout.write(main_body + '\n\n')

but the stdout includes invalid characters.  I tried adding this at
the beginning

if stdout.encoding is None:
    writer = codecs.getwriter("utf-8")
    stdout = writer(stdout, errors='replace')

and changing the output line to 

stdout.write(main_body.encode('utf-8', errors='replace') + '\n\n')

but with either or both of those, I get the dreaded
"UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
562: ordinal not in range(128)".  How can I force the output to be in
UTF-8 & silently suppress invalid characters?


-- 
Unit tests are like the boy who cried wolf.


Re: writing an email.message.Message in UTF-8

2015-12-08 Thread Adam Funk
On 2015-12-07, Terry Reedy wrote:

> On 12/7/2015 9:57 AM, Adam Funk wrote:
>> I'm trying to write an instance of email.message.Message, whose body
>> contains unicode characters, to a UTF-8 file.  (Python 2.7.3 & 2.7.10
>> again.)
>
> The email package was rewritten for, I believe, 3.3.  I believe it 
> should handle unicode email encoded as utf-8 more easily.

Actually it works in Python 3.2.3, & fortunately my program doesn't
depend on anything that isn't available for python 3 yet.  Thanks!
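(For anyone finding this thread later: with the rewritten package's email.message.EmailMessage class, available since Python 3.6, the charset handling is automatic. A sketch:)

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'test'
# set_content() picks the charset itself; non-ASCII text comes out
# as UTF-8 without any set_charset()/codecs.open() juggling.
msg.set_content('b\u00f6dy with non-ASCII text\n')

text = msg.as_string()  # no UnicodeDecodeError, unlike the 2.7 dance
```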


-- 
Most Americans are too civilized to hang skulls from baskets, having
been headhunters, of course, only as recently as Vietnam.
  --- Kinky Friedman


Re: writing an email.message.Message in UTF-8

2015-12-08 Thread Adam Funk
On 2015-12-08, dieter wrote:

> Adam Funk  writes:
>
>> I'm trying to write an instance of email.message.Message, whose body
>> contains unicode characters, to a UTF-8 file.  (Python 2.7.3 & 2.7.10
>> again.)
>>
>> reply = email.message.Message()
>> reply.set_charset('utf-8')
>> ... # set various headers
>> reply.set_payload('\n'.join(body_lines) + '\n')
>> ...
>> outfile = codecs.open(outfilename, 'w', encoding='utf-8', 
>> errors='ignore')
>> outfile.write(reply.as_string())
>> outfile.close()
>>
>> Then reply.as_string() barfs a UnicodeDecodeError.  I look in the
>> documentation, which says the generator is better.  So I replace the
>> outfile.write(...) line with the following:
>>
>> g = email.generator.Generator(outfile, mangle_from_=False)
>> g.flatten(reply)
>>
>> which still barfs a UnicodeDecodeError.  Looking closer at the first
>> error, I see that the exception was in g.flatten(...) already & thrown
>> up to reply.as_string().  How can I force the thing to do UTF-8
>> output?
>
> You could try replacing "reply.set_payload('\n'.join(body_lines) + '\n')"
> by "reply.set_payload(('\n'.join(body_lines) + '\n').encode('utf-8'))",
> i.e. you would not pass in a unicode payload but an "utf-8" encode
> "str" payload.

That didn't work (I got the same error) but switching to python 3.2
did.  Thanks, though.


-- 
A mathematical formula should never be "owned" by anybody! Mathematics
belongs to God.   --- Donald Knuth


Re: writing an email.message.Message in UTF-8

2015-12-07 Thread Adam Funk
On 2015-12-07, Adam Funk wrote:

> I'm trying to write an instance of email.message.Message, whose body
> contains unicode characters, to a UTF-8 file.  (Python 2.7.3 & 2.7.10
> again.)
>
> reply = email.message.Message()
> reply.set_charset('utf-8')
> ... # set various headers
> reply.set_payload('\n'.join(body_lines) + '\n')

I've also tried changing that to
 reply.set_payload('\n'.join(body_lines) + '\n', 'utf-8')
but I get the same error on output.

> ...
> outfile = codecs.open(outfilename, 'w', encoding='utf-8', errors='ignore')
> outfile.write(reply.as_string())
> outfile.close()
>
> Then reply.as_string() barfs a UnicodeDecodeError.  I look in the
> documentation, which says the generator is better.  So I replace the
> outfile.write(...) line with the following:
>
> g = email.generator.Generator(outfile, mangle_from_=False)
> g.flatten(reply)
>
> which still barfs a UnicodeDecodeError.  Looking closer at the first
> error, I see that the exception was in g.flatten(...) already & thrown
> up to reply.as_string().  How can I force the thing to do UTF-8
> output?
>
> Thanks.
>
>


-- 
Cats don't have friends.  They have co-conspirators.
 http://www.gocomics.com/getfuzzy/2015/05/31


writing an email.message.Message in UTF-8

2015-12-07 Thread Adam Funk
I'm trying to write an instance of email.message.Message, whose body
contains unicode characters, to a UTF-8 file.  (Python 2.7.3 & 2.7.10
again.)

reply = email.message.Message()
reply.set_charset('utf-8')
... # set various headers
reply.set_payload('\n'.join(body_lines) + '\n')
...
outfile = codecs.open(outfilename, 'w', encoding='utf-8', errors='ignore')
outfile.write(reply.as_string())
outfile.close()

Then reply.as_string() barfs a UnicodeDecodeError.  I look in the
documentation, which says the generator is better.  So I replace the
outfile.write(...) line with the following:

g = email.generator.Generator(outfile, mangle_from_=False)
g.flatten(reply)

which still barfs a UnicodeDecodeError.  Looking closer at the first
error, I see that the exception was in g.flatten(...) already & thrown
up to reply.as_string().  How can I force the thing to do UTF-8
output?

Thanks.


-- 
  $2.95!
 PLATE O' SHRIMP
Luncheon Special


Re: getting fileinput to do errors='ignore' or 'replace'?

2015-12-07 Thread Adam Funk
On 2015-12-04, Oscar Benjamin wrote:

> Or you can use fileinput which is designed to be exactly this kind of
> context manager and to be used in this way. Although fileinput is slightly
> awkward in defaulting to reading stdin.

That default is what I specifically like about fileinput --- it's a
normal way for command-line tools to work:

$ sort file0 file1 file2 >sorted.txt
$ generate_junk | sort >sorted_junk.txt


-- 
  $2.95!
 PLATE O' SHRIMP
Luncheon Special


Re: getting fileinput to do errors='ignore' or 'replace'?

2015-12-03 Thread Adam Funk
On 2015-12-03, Laura Creighton wrote:

> In a message of Thu, 03 Dec 2015 15:12:15 +0000, Adam Funk writes:
>>I'm having trouble with some input files that are almost all proper
>>UTF-8 but with a couple of troublesome characters mixed in, which I'd
>>like to ignore instead of throwing ValueError.  I've found the
>>openhook for the encoding
>>
>>for line in fileinput.input(options.files,
>>                            openhook=fileinput.hook_encoded("utf-8")):
>>    do_stuff(line)
>>
>>which the documentation describes as "a hook which opens each file
>>with codecs.open(), using the given encoding to read the file", but
>>I'd like codecs.open() to also have the errors='ignore' or
>>errors='replace' effect.  Is it possible to do this?
>>
>>Thanks.
>
> This should be both easy to add, and useful, and I happen to know that
> fileinput is being hacked on by Serhiy Storchaka right now, who agrees
> that this would be easy.  So, with his approval, I stuck this into the
> tracker.  http://bugs.python.org/issue25788  
>
> Future Pythons may not have the problem.

Good to know, thanks.


-- 
You cannot really appreciate Dilbert unless you've read it in the
original Klingon.  --- Klingon Programmer's Guide


Re: getting fileinput to do errors='ignore' or 'replace'?

2015-12-03 Thread Adam Funk
On 2015-12-03, Terry Reedy wrote:

> fileinput is an ancient module that predates iterators (and generators) 
> and context managers. Since by 2.7 open files are both context managers 
> and line iterators, you can easily write your own multi-file line 
> iteration that does exactly what you want.  At minimum:
>
> for file in files:
>     with codecs.open(file, errors='ignore') as f:  # did not look up signature
>         for line in f:
>             do_stuff(line)
>
> To make this reusable, wrap in 'def filelines(files):' and replace 
> 'do_stuff(line)' with 'yield line'.

I like fileinput because if the file list is empty, it reads from
stdin instead (so I can pipe something else's output into it).
Unfortunately, the fix I got elsewhere in this thread doesn't seem to
work for that!


-- 
Science is what we understand well enough to explain to a computer.  
Art is everything else we do.  --- Donald Knuth


Re: getting fileinput to do errors='ignore' or 'replace'?

2015-12-03 Thread Adam Funk
On 2015-12-03, Peter Otten wrote:

> def my_hook_encoded(encoding, errors=None):
>     import io
>     def openhook(filename, mode):
>         mode = mode.replace('U', '').replace('b', '') or 'r'
>         return io.open(
>             filename, mode,
>             encoding=encoding, newline='',
>             errors=errors)
>     return openhook
>
> for line in fileinput.input(
>         options.files,
>         openhook=my_hook_encoded("utf-8", errors="ignore")):
>     do_stuff(line)

Perfect, thanks!


> (codecs.open() instead of io.open() should also work)

OK.


-- 
The internet is quite simply a glorious place. Where else can you find
bootlegged music and films, questionable women, deep seated xenophobia
and amusing cats all together in the same place?   --- Tom Belshaw


Re: getting fileinput to do errors='ignore' or 'replace'?

2015-12-03 Thread Adam Funk
On 2015-12-03, Adam Funk wrote:

> I'm having trouble with some input files that are almost all proper
> UTF-8 but with a couple of troublesome characters mixed in, which I'd
> like to ignore instead of throwing ValueError.  I've found the
> openhook for the encoding
>
> for line in fileinput.input(options.files,
>                             openhook=fileinput.hook_encoded("utf-8")):
>     do_stuff(line)
>
> which the documentation describes as "a hook which opens each file
> with codecs.open(), using the given encoding to read the file", but
> I'd like codecs.open() to also have the errors='ignore' or
> errors='replace' effect.  Is it possible to do this?

I forgot to mention: this is for Python 2.7.3 & 2.7.10 (on different
machines).


-- 
...the reason why so many professional artists drink a lot is not
necessarily very much to do with the artistic temperament, etc.  It is
simply that they can afford to, because they can normally take a large
part of a day off to deal with the ravages.--- Amis _On Drink_


getting fileinput to do errors='ignore' or 'replace'?

2015-12-03 Thread Adam Funk
I'm having trouble with some input files that are almost all proper
UTF-8 but with a couple of troublesome characters mixed in, which I'd
like to ignore instead of throwing ValueError.  I've found the
openhook for the encoding

for line in fileinput.input(options.files,
                            openhook=fileinput.hook_encoded("utf-8")):
    do_stuff(line)

which the documentation describes as "a hook which opens each file
with codecs.open(), using the given encoding to read the file", but
I'd like codecs.open() to also have the errors='ignore' or
errors='replace' effect.  Is it possible to do this?

Thanks.


-- 
Why is it drug addicts and computer afficionados are both 
called users?  --- Clifford Stoll


Re: sqlite3 and dates

2015-02-19 Thread Adam Funk
On 2015-02-18, Chris Angelico wrote:

> On Thu, Feb 19, 2015 at 9:17 AM,   wrote:
>>> SQLite3 is fine for something that's basically just a more structured
>>> version of a flat file. You assume that nobody but you has the file
>>> open, and you manipulate it just the same as if it were a big fat blob
>>> of JSON, but thanks to SQLite, you don't have to rewrite the whole
>>> file every time you make a small change. That's fine.
>>
>> That's bullshit.  Sqlite offers a lot more than that including
>> a SQL interface, transactions, referential integrity, constraints
>> indexes, triggers and other general relational database features.
>>
>> That you would equate that to a JSON blob would indicate either
>> a profound ignorance about Sqlite or (more likely) a need to
>> defend your preference with complete disregard of fact.
>
> I didn't equate them. I said that SQLite3 is great if you look on it
> as an upgrade over a JSON blob. Of course it offers more features than
> that, and you don't need to swear at me to make your point.
>
> But SQLite3 is *not* great if you look on it as a database engine
> comparable with DB2, PostgreSQL, and even MySQL.

I certainly agree with that bit, but in my own code I can almost never
justify the hassle (set-up, security considerations, &c.) of using a
database server.  TBH, one reason I like SQLite3 is that I can easily
move the data file around in the filesystem or between machines.


-- 
"It is the role of librarians to keep government running in difficult
times," replied Dramoren.  "Librarians are the last line of defence
against chaos."   (McMullen 2001)


Re: sqlite3 and dates

2015-02-19 Thread Adam Funk
On 2015-02-18, Johannes Bauer wrote:

> On 18.02.2015 12:21, Chris Angelico wrote:
>
>> SQLite3 is fine for something that's basically just a more structured
>> version of a flat file. You assume that nobody but you has the file
>> open, and you manipulate it just the same as if it were a big fat blob
>> of JSON, but thanks to SQLite, you don't have to rewrite the whole
>> file every time you make a small change. That's fine. But it's the
>> wrong tool for any job involving multiple users over a network, and
>> quite probably the wrong tool for a lot of other jobs too.
>
> Your assessment that some tools fit certain problems and don't fit
> different problems is entirely correct. SQLite does the job that it is
> supposed to do and it fills that nieche well.
>
>> It's the
>> smallest-end piece of software that can truly be called a database. I
>> would consider it to be the wrong database for serious accounting
>> work, and that's based on the ranting of a majorly-annoyed accountant
>> who had to deal with issues in professional systems that had made
>> similar choices in back-end selection.
>
> It probably is the wrong database for serious accounting work, and it's
> probably also the wrong database for doing multivariate statistical
> analysis on sparse matrices that you store in tables.
>
> You could similarly argue that a hammer is the wrong tool to drive in a
> screw and you'd be correct in that assessment. But it's completely
> besides the point.

"If your only tool is a hammer, every problem looks like a nail."
;-)


-- 
In the 1970s, people began receiving utility bills for
-£999,999,996.32 and it became harder to sustain the 
myth of the infallible electronic brain. (Verity Stob)


Re: Why does argparse return None instead of [] if an append action isn't used?

2015-01-26 Thread Adam Funk
On 2015-01-26, Peter Otten wrote:

> Adam Funk wrote:
>
>> On 2015-01-09, Ned Batchelder wrote:
>> 
>>> On 1/9/15 9:44 AM, Adam Funk wrote:
>>>> This makes it a bit more trouble to use:
>>>>
>>>>if options.bar:
>>>>   for b in options.bar:
>>>>  do_stuff(b)
>>>>
>>>> instead of
>>>>
>>>>for b in options.bar:
>>>>   do_stuff(b)
>>>
>>> This doesn't answer why the value defaults to None, and some people may
>>> recoil at it, but I've used:
>>>
>>>  for b in options.bar or ():
>>>  do_stuff(b)
>> 
>> Do you mean "for b in options.bar or []:" ?
>
> Doesn't matter; in the context of a for loop any empty iterable would do.

Of course it would.  Doh!
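(The idiom in full, for anyone skimming -- a two-line demo of my own:)

```python
def do_stuff_on(bar):
    # bar may be None (option never given) or a list from argparse's
    # 'append' action; any empty iterable works as the fallback.
    return [b.upper() for b in (bar or ())]
```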


-- 
A recent study conducted by Harvard University found that the average
American walks about 900 miles a year. Another study by the AMA found
that Americans drink, on average, 22 gallons of alcohol a year. This
means, on average, Americans get about 41 miles to the gallon.
 http://www.cartalk.com/content/average-americans-mpg


Re: Why does argparse return None instead of [] if an append action isn't used?

2015-01-26 Thread Adam Funk
On 2015-01-09, Ned Batchelder wrote:

> On 1/9/15 9:44 AM, Adam Funk wrote:
>> This makes it a bit more trouble to use:
>>
>>if options.bar:
>>   for b in options.bar:
>>  do_stuff(b)
>>
>> instead of
>>
>>for b in options.bar:
>>   do_stuff(b)
>
> This doesn't answer why the value defaults to None, and some people may 
> recoil at it, but I've used:
>
>  for b in options.bar or ():
>  do_stuff(b)

Do you mean "for b in options.bar or []:" ?


-- 
War is God's way of teaching Americans geography.
 [Ambrose Bierce]


Re: Why does argparse return None instead of [] if an append action isn't used?

2015-01-26 Thread Adam Funk
On 2015-01-09, Wolfgang Maier wrote:

> On 01/09/2015 03:44 PM, Adam Funk wrote:
>> I noticed in use that if an option with the 'append' action isn't
>> used, argparse assigns None to it rather than an empty list, &
>> confirmed this interactively:
>>
>> #v+
>>>>> import argparse
>>>>> parser = argparse.ArgumentParser()
>>>>> parser.add_argument('--foo', action='append')
>> _AppendAction(option_strings=['--foo'], dest='foo', nargs=None, const=None, 
>> default=None, type=None, choices=None, help=None, metavar=None)
>>>>> parser.add_argument('--bar', action='append')
>> _AppendAction(option_strings=['--bar'], dest='bar', nargs=None, const=None, 
>> default=None, type=None, choices=None, help=None, metavar=None)
>>>>> parser.parse_args('--foo 1 --foo 2'.split())
>> Namespace(bar=None, foo=['1', '2'])
>> #v-
>>
>
> Isn't that the exact behaviour documented here:
>
> https://docs.python.org/3/library/argparse.html#default
>
> where it says that the default for the default argument is None ?
>
> I think Skip is right: you should be able to just add
>
> default = []
>
> to your arguments in the add_argument call.

Yes, it works.


-- 
Master Foo said: "A man who mistakes secrets for knowledge is like
a man who, seeking light, hugs a candle so closely that he smothers
it and burns his hand."--- Eric Raymond


Re: Why does argparse return None instead of [] if an append action isn't used?

2015-01-09 Thread Adam Funk
On 2015-01-09, Skip Montanaro wrote:

>> I noticed in use that if an option with the 'append' action isn't
>> used, argparse assigns None to it rather than an empty list, &
>> confirmed this interactively:
>
> I don't use argparse (or optparse), being a getopt Luddite myself, but
> can you set the default for an action in the add_argument call?

Well, duh!  That works, thanks.  (I can't explain why I didn't think
of that.)


>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='append',default=[])
_AppendAction(option_strings=['--foo'], dest='foo', nargs=None, const=None, 
default=[], type=None, choices=None, help=None, metavar=None)
>>> parser.add_argument('--bar', action='append',default=[])
_AppendAction(option_strings=['--bar'], dest='bar', nargs=None, const=None, 
default=[], type=None, choices=None, help=None, metavar=None)
>>> parser.parse_args('--foo 1 --foo 2'.split())
Namespace(bar=[], foo=['1', '2'])
>>> parser.parse_args('--foo 1 --bar 2'.split())
Namespace(bar=['2'], foo=['1'])
>>> parser.parse_args([])
Namespace(bar=[], foo=[])





-- 
Slade was the coolest band in England. They were the kind of guys
that would push your car out of a ditch. --- Alice Cooper


Why does argparse return None instead of [] if an append action isn't used?

2015-01-09 Thread Adam Funk
I noticed in use that if an option with the 'append' action isn't
used, argparse assigns None to it rather than an empty list, &
confirmed this interactively:

#v+
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='append')
_AppendAction(option_strings=['--foo'], dest='foo', nargs=None, const=None, 
default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.add_argument('--bar', action='append')
_AppendAction(option_strings=['--bar'], dest='bar', nargs=None, const=None, 
default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.parse_args('--foo 1 --foo 2'.split())
Namespace(bar=None, foo=['1', '2'])
#v-

This makes it a bit more trouble to use:

    if options.bar:
        for b in options.bar:
            do_stuff(b)

instead of

    for b in options.bar:
        do_stuff(b)

which is (of course) what I was doing when I discovered the None.  Is
there any benefit to the user from this, or is it just an "accident"
of the way argparse is written?


-- 
The history of the world is the history of a privileged few.
--- Henry Miller


Re: Permissions on files installed by pip?

2014-10-22 Thread Adam Funk
On 2014-10-17, Jean-Michel Pichavant wrote:

> - Original Message -
>> From: "Adam Funk" 
>> To: python-list@python.org
>> Sent: Thursday, 16 October, 2014 9:29:46 PM
>> Subject: Permissions on files installed by pip?
>> 
>> I've been using the python-nltk package on Ubuntu, but I need nltk
>> 3.0
>> now.  I used 'sudo aptitude purge python-nltk' to get rid of my
>> existing installation, & followed instructions on the nltk website
>> [1]
>> starting at step 4 (since I already have python-pip & python-numpy
>> packages installed).
>> 
>> $ sudo pip install -U
>> 
>> I couldn't get it to work, until I realized that the permissions &
>> ownership on /usr/local/lib/python2.7/dist-packages were 'drwx--S---
>> root staff'.  A 'chmod -R a+rX' on that directory seems to have fixed
>> it.  Is it normal for sudo pip install to set the permissions that
>> way, or did I do something wrong?
>
> On debian wheezy:
>
> ls -al /usr/local/lib/python2.7/dist-packages  
>
> drwxrwsr-x 5 root staff 4.0K Jun 30 15:16 ./
>
> I'm not sure pip is responsible for this anyway, so my money goes on "you did 
> something wrong" :)

Probably something to do with the way I have sudo set up then.
Thanks.


-- 
Everybody says sex is obscene. The only true obscenity 
is war.   --- Henry Miller


Permissions on files installed by pip?

2014-10-16 Thread Adam Funk
I've been using the python-nltk package on Ubuntu, but I need nltk 3.0
now.  I used 'sudo aptitude purge python-nltk' to get rid of my
existing installation, & followed instructions on the nltk website [1]
starting at step 4 (since I already have python-pip & python-numpy
packages installed).

$ sudo pip install -U 

I couldn't get it to work, until I realized that the permissions &
ownership on /usr/local/lib/python2.7/dist-packages were 'drwx--S---
root staff'.  A 'chmod -R a+rX' on that directory seems to have fixed
it.  Is it normal for sudo pip install to set the permissions that
way, or did I do something wrong?



[1]
http://www.nltk.org/install.html

-- 
Master Foo once said to a visiting programmer: "There is more
Unix-nature in one line of shell script than there is in ten
thousand lines of C."--- Eric Raymond


Re: Searching for lots of similar strings (filenames) in sqlite3 database

2014-07-02 Thread Adam Funk
On 2014-07-02, Chris Angelico wrote:

> On Wed, Jul 2, 2014 at 7:32 PM, Adam Funk  wrote:
>> Well, I've changed it to the following anyway.
>>
>> subdir_glob = subdir + '/*'
>> cursor.execute('SELECT filename FROM files WHERE filename GLOB ?',
>>(subdir_glob,))
>> rows = cursor.fetchall()
>> known_files = {row[0] for row in rows}
>>
>> I see what you mean about paths containing '%', but I don't see why
>> you were concerned about underscores, though.
>
> With GLOB, presumably ? matches a single character and * matches any
> number of characters. With LIKE, _ matches a single character and %
> matches any number. So, for instance, WHERE filename LIKE
> '/foo/bar/spam_spam/%' will match '/foo/bar/spam2spam/1234', which may
> be a little surprising. It's not going to be a serious problem in most
> cases, as it'll also match '/foo/bar/spam_spam/1234', but the false
> positives will make one of those "Huh" moments if you don't
> keep an eye on your magic characters.
>
> In your specific case, you happen to be safe, but as I look over the
> code, my paranoia kicks in and tells me to check :) It's just one of
> those things that flags itself to the mind - anything that might help
> catch bugs early is a good feature of the mind, in my opinion!

Oh, I'd just missed the '_' in the LIKE documentation.  Doh!
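(A self-contained demonstration of the difference, with made-up paths -- LIKE's '_' matches any single character, while GLOB treats it literally:)

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE files (filename TEXT)')
conn.executemany('INSERT INTO files VALUES (?)',
                 [('/spool/spam_spam/1',), ('/spool/spam2spam/2',)])

# LIKE: '_' is a single-character wildcard, so BOTH rows match...
like_hits = conn.execute(
    "SELECT filename FROM files "
    "WHERE filename LIKE '/spool/spam_spam/%'").fetchall()

# ...whereas GLOB matches '_' literally (and uses '*' instead of '%').
glob_hits = conn.execute(
    "SELECT filename FROM files "
    "WHERE filename GLOB '/spool/spam_spam/*'").fetchall()
```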


-- 
Indentation is for enemy skulls, not code!
--- Klingon Programmer's Guide


Re: Searching for lots of similar strings (filenames) in sqlite3 database

2014-07-02 Thread Adam Funk
On 2014-07-01, Chris Angelico wrote:

> On Wed, Jul 2, 2014 at 1:15 AM, Adam Funk  wrote:
>> On 2014-07-01, Chris Angelico wrote:

>>> There is one critical consideration, though. What happens if the
>>> directory name contains an underscore or percent sign? Or can you
>>> absolutely guarantee that they won't? You may need to escape them, and
>>> I'm not sure how SQLite handles that. (Possibly \_ will match literal
>>> _, and \\ will match literal \, or something like that.)
>>
>> I can guarantee that the directory names are all
>> '/var/spool/news/message.id/' then 3 digits.  (The filenames are
>> pretty wild, since they are MIDs.)  AIUI, using the '?' substitution
>> in the sqlite3 library is supposed to be safe.
>
> This is nothing to do with question-mark substitution. There are two
> separate levels of character significance here - it's like a quoted
> string with a regex. Suppose you want to make a regex that searches
> for an apostrophe. If you try to define that in a single-quoted
> string, you need to escape it:
>
> regex = '^\'$'
>
> However, if you ask the user to enter a regex, that wouldn't be necessary:
>
> regex = input("Enter a pattern: ") # raw_input in Python 2
> Enter a pattern: ^'$
>
> This is what the question mark substitution is like - it avoids the
> need to carefully manage string delimiters and so on. However, if you
> want to make a regex that searches for a backslash, then you need to
> escape it, because the backslash is important to the regex itself. In
> the same way, the underscore and percent sign are significant to the
> LIKE operator. If it were possible to have a directory name with a
> percent sign in it, it would match far too much - because you'd
> construct a LIKE pattern something like (ahem)
> "/var/spool/news/message%20id/142/%" - and as you can see, the percent
> sign at the end is no different from the percent sign in the middle.
>
> But you're safe because you know your data, unrelated to your
> substitution method. Possibly merits a comment... but possibly not
> worth it.

Well, I've changed it to the following anyway.

subdir_glob = subdir + '/*'
cursor.execute('SELECT filename FROM files WHERE filename GLOB ?',
   (subdir_glob,))
rows = cursor.fetchall()
known_files = {row[0] for row in rows}

I see what you mean about paths containing '%', but I don't see why
you were concerned about underscores.
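
If a path ever could contain '%' or '_' and GLOB weren't an option, SQLite's
LIKE also takes an ESCAPE clause. A sketch (the helper name is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (filename TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO files VALUES (?)",
                 [("/spool/a_b/1",), ("/spool/axb/1",)])

def like_escape(s, esc="\\"):
    """Escape LIKE metacharacters so they match literally."""
    return (s.replace(esc, esc + esc)
             .replace("%", esc + "%")
             .replace("_", esc + "_"))

pattern = like_escape("/spool/a_b") + "/%"
rows = conn.execute(
    "SELECT filename FROM files WHERE filename LIKE ? ESCAPE ?",
    (pattern, "\\")).fetchall()
print(rows)  # only the literal a_b path, not axb
```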


-- 
You know, there are many people in the country today who, through no
fault of their own, are sane. Some of them were born sane. Some of
them became sane later in their lives.--― Graham Chapman


Re: Searching for lots of similar strings (filenames) in sqlite3 database

2014-07-02 Thread Adam Funk
On 2014-07-01, Chris Angelico wrote:

> On Wed, Jul 2, 2014 at 1:15 AM, Adam Funk  wrote:
>> On 2014-07-01, Chris Angelico wrote:
>>
>>> On Tue, Jul 1, 2014 at 9:26 PM, Adam Funk  wrote:
>>>> cursor.execute('SELECT filename FROM files WHERE filename IS ?', 
>>>> (filename,))
>>>
>>> Shouldn't this be an equality check rather than IS, which normally I'd
>>> expect to be "IS NULL" or "IS NOT NULL"?
>>
>> Oh, it probably should be in "heavy" SQL.  In SQLite, '==', '=', &
>> 'IS' are interchangeable.
>>
>> http://www.tutorialspoint.com/sqlite/sqlite_operators.htm
>
> Ah, okay. In that case, I'd advise going with either == for
> consistency with the rest of Python, or (preferably) = for consistency
> with other SQL engines. You wouldn't use "is" to test if two Python
> strings are equal, so there's no particular reason to use it here :)

I agree.

>> Oh, even better:
>>
>> add_files = listing - known_files
>> delete_files = known_files - listing
>>
>> and then I can remove files that have disappeared off the spool from
>> the table.  Thanks very much!
>
> Ah! Didn't know that was a valuable feature for you, but getting that
> "for free" is an extra little bonus, so that's awesome!

I didn't know it was a valuable feature until I saw that easy way to
do it!  This will keep the database from growing indefinitely, too.



-- 
'...and Tom [Snyder] turns to him and says, "so Alice [Cooper], is it
true you kill chickens on stage?"  That was the opening question, and
Alice looks at him real serious and goes, "Oh no, no no.  That's
Colonel Sanders.  Colonel Sanders kills chickens."'


Re: Searching for lots of similar strings (filenames) in sqlite3 database

2014-07-01 Thread Adam Funk
On 2014-07-01, Chris Angelico wrote:

> On Tue, Jul 1, 2014 at 9:26 PM, Adam Funk  wrote:
>> cursor.execute('SELECT filename FROM files WHERE filename IS ?', 
>> (filename,))
>
> Shouldn't this be an equality check rather than IS, which normally I'd
> expect to be "IS NULL" or "IS NOT NULL"?

Oh, it probably should be in "heavy" SQL.  In SQLite, '==', '=', &
'IS' are interchangeable.

http://www.tutorialspoint.com/sqlite/sqlite_operators.htm

Looking at that page again, I see that 'GLOB' is a case-sensitive
version of 'LIKE'.  I can't help but wonder if that makes it faster.
;-)


> As to your actual question: Your two database lookups are doing
> distinctly different things, so there's no surprise that they perform
> very differently. B asks the database "Do you have this? Do you have
> this?" for every file you have, and C asks the database "What do you
> have?", and then comparing that against the list of files. By the way
> - the A+C technique could be done quite tidily as a set difference:
>
> # assume you have listing1 and cursor set up
> # as per your above code
> listing = {os.path.join(directory, x) for x in listing1}
> cursor.execute(...) # as per above
> known_files = {row[0] for row in cursor} # cursors are iterable
> needed_files = listing - known_files
> cursor.executemany('INSERT INTO files VALUES (?, ?)', ((filename,
> 0) for filename in needed_files))

Oh, even better:

add_files = listing - known_files
delete_files = known_files - listing

and then I can remove files that have disappeared off the spool from
the table.  Thanks very much!
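
Put together, the two-way sync might look something like this (a sketch
assuming the files(filename TEXT PRIMARY KEY, used INTEGER) schema from the
original post; sync_subdir is a hypothetical helper name):

```python
import os
import sqlite3

def sync_subdir(db_conn, directory, listing1):
    """Insert files that appeared and delete rows for files that vanished."""
    listing = {os.path.join(directory, x) for x in listing1}
    cursor = db_conn.cursor()
    cursor.execute("SELECT filename FROM files WHERE filename GLOB ?",
                   (directory + "/*",))
    known_files = {row[0] for row in cursor}
    add_files = listing - known_files
    delete_files = known_files - listing
    cursor.executemany("INSERT INTO files VALUES (?, ?)",
                       ((f, 0) for f in add_files))
    cursor.executemany("DELETE FROM files WHERE filename = ?",
                       ((f,) for f in delete_files))
    db_conn.commit()
    return len(add_files), len(delete_files)
```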


> Anyway. The significant thing is the performance of the database on
> two different workloads: either "give me everything that matches this
> pattern" (where the pattern ends with a percent sign), or "do you have
> this? do you have this? do you have this?". Generally, database
> indexing is fairly efficient at handling prefix searches, so the first
> query will basically amount to an index search, which is a lot faster
> than the repeated separate searching; it takes advantage of the fact
> that all the strings you're looking at will have the same prefix.
>
> There is one critical consideration, though. What happens if the
> directory name contains an underscore or percent sign? Or can you
> absolutely guarantee that they won't? You may need to escape them, and
> I'm not sure how SQLite handles that. (Possibly \_ will match literal
> _, and \\ will match literal \, or something like that.)

I can guarantee that the directory names are all
'/var/spool/news/message.id/' then 3 digits.  (The filenames are
pretty wild, since they are MIDs.)  AIUI, using the '?' substitution
in the sqlite3 library is supposed to be safe.

> This is not bypassing the database's optimization; in fact, it's
> working tidily within it. 

That's reassuring!

...
> But doing the set difference in Python is just as good a way of doing the job.

I like it.  Thanks very much.


-- 
Specifications are for the weak & timid!
  --- Klingon Programmer's Guide


Searching for lots of similar strings (filenames) in sqlite3 database

2014-07-01 Thread Adam Funk
I have some code that reads files in a leafnode2 news spool & needs to
check for new files periodically.  The full paths are all like
'/var/spool/news/message.id/345/<123...@example.com>' with a 3-digit
subdirectory & a Message-ID for the filename itself.  I'm using Python
3 & sqlite3 in the standard library.

I have a table of filenames created with the following command:

   cursor.execute('CREATE TABLE files (filename TEXT PRIMARY KEY, used INTEGER)')

To check for new files in one of the subdirectories, I run A then
either B or C below (I've tried both).

A.
listing1 = os.listdir(directory)
listing = [os.path.join(directory, x) for x in listing1]

B.
cursor = db_conn.cursor()
for filename in listing:
    cursor.execute('SELECT filename FROM files WHERE filename IS ?',
                   (filename,))
    row = cursor.fetchone()
    if not row:
        cursor.execute('INSERT INTO files VALUES (?, ?)', (filename, 0))
        files_new += 1
db_conn.commit()

C.
cursor = db_conn.cursor()
subdir_like = directory + '/%'
cursor.execute('SELECT filename FROM files WHERE filename LIKE ?',
               (subdir_like,))
rows = cursor.fetchall()
known_files = [row[0] for row in rows]
for filename in listing:
    if filename not in known_files:
        cursor.execute('INSERT INTO files VALUES (?, ?)', (filename, 0))
        files_new += 1
db_conn.commit()

A+B was the first method I came up with, because it looks like the
"keep it simple & let the database do its job" approach, but it was
very time-consuming, so I tested A+C out.  A is quick (a second); B
can drag on for over an hour to check 2000 filenames (for example) in
a subdirectory; C always takes less than a minute.  So C is much
better than B, but it looks (to me) like one of those attempts to
bypass & ignore the database's built-in optimizations.

Comments?
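
(A third option, not tried above, would be to let the PRIMARY KEY constraint
do the deduplication itself with INSERT OR IGNORE, so there's no SELECT at
all. A sketch, assuming the same files table; record_new_files is a
hypothetical helper name:)

```python
import sqlite3

def record_new_files(db_conn, listing):
    """Let SQLite reject duplicates itself via the PRIMARY KEY."""
    cursor = db_conn.cursor()
    before = db_conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    # OR IGNORE skips rows whose filename already exists, in one statement.
    cursor.executemany("INSERT OR IGNORE INTO files VALUES (?, ?)",
                       ((f, 0) for f in listing))
    db_conn.commit()
    after = db_conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    return after - before   # number of genuinely new files
```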


-- 
No sport is less organized than Calvinball!


Re: Standard way to generate mail/news reply?

2014-06-26 Thread Adam Funk
On 2014-06-24, Skip Montanaro wrote:

> On Tue, Jun 24, 2014 at 6:46 AM, Adam Funk  wrote:
>> Is there some standard library or code for taking an e-mail or
>> newsgroup message & generating a reply to it?
>
> You might try searching for "mail reply" on pypi.python.org. That will
> return a number of hits. I know the python.org replybot is there and
> used frequently. It might be a good starting point.

It looks like I can use the email_reply_parser to do half the job, &
modify code from replybot to do the other half.  Thanks!


-- 
svn ci -m 'come back make, all is forgiven!' build.xml


Standard way to generate mail/news reply?

2014-06-24 Thread Adam Funk
Is there some standard library or code for taking an e-mail or
newsgroup message & generating a reply to it?  (I mean things like
quoting the original message, >> quoting etc. where necessary, &
generating the right References & In-Reply-To headers.)

I homebrewed some code for this in Perl (sorry) years ago, but before
I reimplement it in Python, I thought I should ask if there's a "good"
way.

Thanks,
Adam


-- 
A recent study conducted by Harvard University found that the average
American walks about 900 miles a year. Another study by the AMA found
that Americans drink, on average, 22 gallons of alcohol a year. This
means, on average, Americans get about 41 miles to the gallon.
 http://www.cartalk.com/content/average-americans-mpg


Re: hashing strings to integers

2014-06-03 Thread Adam Funk
On 2014-05-27, Steven D'Aprano wrote:

> On Tue, 27 May 2014 16:13:46 +0100, Adam Funk wrote:

>> Well, here's the way it works in my mind:
>> 
>>I can store a set of a zillion strings (or a dict with a zillion
>>string keys), but every time I test "if new_string in seen_strings",
>>the computer hashes the new_string using some kind of "short hash",
>>checks the set for matching buckets (I'm assuming this is how python
>>tests set membership --- is that right?), 
>
> So far so good. That applies to all objects, not just strings.
>
>
>>then checks any
>>hash-matches for string equality.  Testing string equality is slower
>>than integer equality, and strings (unless they are really short)
>>take up a lot more memory than long integers.
>
> But presumably you have to keep the string around anyway. It's going to 
> be somewhere, you can't just throw the string away and garbage collect 
> it. The dict doesn't store a copy of the string, it stores a reference to 
> it, and extra references don't cost much.

In the case where I did something like that, I wasn't keeping copies
of the strings in memory after hashing (& otherwise processing them).
I know that putting the strings' pointers in the set is a light memory
load.



[snipping the rest because...]

You've convinced me.  Thanks.



-- 
I heard that Hans Christian Andersen lifted the title for "The Little
Mermaid" off a Red Lobster Menu. [Bucky Katt]


Re: hashing strings to integers

2014-06-03 Thread Adam Funk
On 2014-05-28, Dan Sommers wrote:

> On Tue, 27 May 2014 17:02:50 +, Steven D'Aprano wrote:
>
>> - rather than "zillions" of them, there are few enough of them that
>>  the chances of an MD5 collision is insignificant;
>
>>   (Any MD5 collision is going to play havoc with your strategy of
>>   using hashes as a proxy for the real string.)
>
>> - and you can arrange matters so that you never need to MD5 hash a
>>   string twice.
>
> Hmmm...  I'll use the MD5 hashes of the strings as a key, and the
> strings as the value (to detect MD5 collisions) ...

Hey, I'm not *that* stupid.


-- 
In the 1970s, people began receiving utility bills for
-£999,999,996.32 and it became harder to sustain the 
myth of the infallible electronic brain. (Verity Stob)


Re: hashing strings to integers

2014-05-27 Thread Adam Funk
On 2014-05-23, Terry Reedy wrote:

> On 5/23/2014 6:27 AM, Adam Funk wrote:
>
>> that.  The only thing that really bugs me in Python 3 is that execfile
>> has been removed (I find it useful for testing things interactively).
>
> The spelling has been changed to exec(open(...).read(), ... .  If you use 
> it a lot, add a customized def execfile(filename, ... to your site 
> module or local utils module.

Are you talking about this?

https://docs.python.org/3/library/site.html

Is there a dummies/quick-start guide to using USER_SITE stuff?
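
In the meantime, a minimal shim along the lines Terry suggests (name and
signature chosen to mimic the old built-in; an assumption, not stdlib code):

```python
def execfile(filename, globals=None, locals=None):
    """Python 3 stand-in for the removed built-in execfile()."""
    with open(filename, "rb") as f:
        source = f.read()
    # Compiling with the real filename keeps tracebacks readable.
    code = compile(source, filename, "exec")
    exec(code, globals, locals)
```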


-- 
No sport is less organized than Calvinball!


Re: hashing strings to integers

2014-05-27 Thread Adam Funk
On 2014-05-23, Chris Angelico wrote:

> On Fri, May 23, 2014 at 8:27 PM, Adam Funk  wrote:
>> I've also used hashes of strings for other things involving
>> deduplication or fast lookups (because integer equality is faster than
>> string equality).  I guess if it's just for deduplication, though, a
>> set of byte arrays is as good as a set of int?
>
> Want a better way to do that? Use a set or dict and let Python do the
> hashing. It'll be every bit as fast as explicit hashing, plus you get
> the bonus of simplicity.

Well, here's the way it works in my mind:

   I can store a set of a zillion strings (or a dict with a zillion
   string keys), but every time I test "if new_string in
   seen_strings", the computer hashes the new_string using some kind
   of "short hash", checks the set for matching buckets (I'm assuming
   this is how python tests set membership --- is that right?), then
   checks any hash-matches for string equality.  Testing string
   equality is slower than integer equality, and strings (unless they
   are really short) take up a lot more memory than long integers.

So for that kind of thing, I tend to store MD5 hashes or something
similar.  Is that wrong?


-- 
With the breakdown of the medieval system, the gods of chaos, lunacy,
and bad taste gained ascendancy.
--- Ignatius J Reilly


Re: hashing strings to integers

2014-05-23 Thread Adam Funk
On 2014-05-23, Adam Funk wrote:

> On 2014-05-22, Peter Otten wrote:

>> In Python 3 there's int.from_bytes()
>>
>>>>> h = hashlib.sha1(b"Hello world")
>>>>> int.from_bytes(h.digest(), "little")
>> 538059071683667711846616050503420899184350089339
>
> Excellent, thanks for pointing that out.  I've just recently started
> using Python 3 instead of 2, & appreciate pointers to new things like
> that.

BTW, I just tested that & it should be "big" for consistency with the
hexdigest:

Python 3.3.2+ (default, Feb 28 2014, 00:52:16) 
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> h0 = hashlib.sha1(bytes('pants', 'UTF-8')).digest()
>>> h1 = hashlib.sha1(bytes('pants', 'UTF-8')).hexdigest()
>>> int.from_bytes(h0, 'big')
1315090007003469710610607131160586168131917298749
>>> int.from_bytes(h0, 'little')
352462323236431222976527983157432783788229548774
>>> int(h1, 16)
1315090007003469710610607131160586168131917298749

Thanks.


-- 
The kid's a hot prospect. He's got a good head for merchandising, an
agent who can take you downtown and one of the best urine samples I've
seen in a long time.   [Dead Kennedys t-shirt]


hashing strings to integers (was: hashing strings to integers for sqlite3 keys)

2014-05-23 Thread Adam Funk
On 2014-05-22, Peter Otten wrote:

> Adam Funk wrote:

>> Well, J*v* returns a byte array, so I used to do this:
>> 
>> digester = MessageDigest.getInstance("MD5");
>> ...
>> digester.reset();
>> byte[] digest = digester.digest(bytes);
>> return new BigInteger(+1, digest);
>
> In Python 3 there's int.from_bytes()
>
>>>> h = hashlib.sha1(b"Hello world")
>>>> int.from_bytes(h.digest(), "little")
> 538059071683667711846616050503420899184350089339

Excellent, thanks for pointing that out.  I've just recently started
using Python 3 instead of 2, & appreciate pointers to new things like
that.  The only thing that really bugs me in Python 3 is that execfile
has been removed (I find it useful for testing things interactively).


>> I dunno why language designers don't make it easy to get a single big
>> number directly out of these things.
>  
> You hardly ever need to manipulate the numerical value of the digest. And on 
> its way into the database it will be re-serialized anyway.

I now agree that my original plan to hash strings for the SQLite3
table was pointless, so I've changed the subject header.  :-)

I have had good reason to use int hashes in the past, however.  I was
doing some experiments with Andrei Broder's "sketches of shingles"
technique for finding partial duplication between documents, & you
need integers for that so you can do modulo arithmetic.

I've also used hashes of strings for other things involving
deduplication or fast lookups (because integer equality is faster than
string equality).  I guess if it's just for deduplication, though, a
set of byte arrays is as good as a set of int?


-- 
Classical Greek lent itself to the promulgation of a rich culture,
indeed, to Western civilization.  Computer languages bring us
doorbells that chime with thirty-two tunes, alt.sex.bestiality, and
Tetris clones. (Stoll 1995)


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Chris Angelico wrote:

> On Thu, May 22, 2014 at 11:54 PM, Adam Funk  wrote:

>> That ties in with a related question I've been wondering about lately
>> (using MD5s & SHAs for other things) --- getting a hash value (which
>> is internally numeric, rather than string, right?) out as a hex string
>> & then converting that to an int looks inefficient to me --- is there
>> any better way to get an int?  (I haven't seen any other way in the
>> API.)
>
> I don't know that there is, at least not with hashlib. You might be
> able to use digest() followed by the struct module, but it's no less
> convoluted. It's the same in several other languages' hashing
> functions; the result is a string, not an integer.

Well, J*v* returns a byte array, so I used to do this:

digester = MessageDigest.getInstance("MD5");
...
digester.reset();
byte[] digest = digester.digest(bytes);
return new BigInteger(+1, digest);

I dunno why language designers don't make it easy to get a single big
number directly out of these things.


I just had a look at the struct module's fearsome documentation &
think it would present a good shoot(self, foot) opportunity.
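
For comparison, here are both routes from digest bytes to a single big int;
they agree, but int.from_bytes (Python 3 only) avoids the struct fiddling
entirely:

```python
import hashlib
import struct

digest = hashlib.md5(b"pants").digest()   # 16 raw bytes

# struct route: unpack two unsigned 64-bit halves, then recombine.
hi, lo = struct.unpack(">QQ", digest)
via_struct = (hi << 64) | lo

# int.from_bytes route: one call, no hex-string detour.
via_from_bytes = int.from_bytes(digest, "big")

assert via_struct == via_from_bytes
```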


-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Chris Angelico wrote:

> On Thu, May 22, 2014 at 11:41 PM, Adam Funk  wrote:
>> On further reflection, I think I asked for that.  In fact, the table
>> I'm using only has one column for the hashes --- I wasn't going to
>> store the strings at all in order to save disk space (maybe my mind is
>> stuck in the 1980s).
>
> That's a problem, then, because you will see hash collisions. Maybe
> not often, but they definitely will occur if you have enough strings
> (look up the birthday paradox - with a 32-bit arbitrarily selected
> integer (such as a good crypto hash that you then truncate to 32
> bits), you have a 50% chance of a collision at just 77,000 strings).

Ah yes, there's a handy table for that:

https://en.wikipedia.org/wiki/Birthday_attack#Mathematics
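
The usual approximation behind that table is easy to check directly (a
sketch; collision_probability is just an illustrative helper):

```python
import math

def collision_probability(bits, n):
    """Approximate P(at least one collision) for n random bits-bit values."""
    buckets = 2 ** bits
    # Standard birthday approximation: 1 - exp(-n*(n-1) / (2*buckets))
    return 1.0 - math.exp(-n * (n - 1) / (2 * buckets))

print(round(collision_probability(32, 77_000), 2))  # ~0.5, Chris's figure
print(collision_probability(64, 10**9))  # small but real at a billion strings
```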


> Do you have enough RAM to hold all the strings directly? Just load 'em
> all up into a Python set. Set operations are fast, clean, and easy.
> Your already_seen function becomes a simple 'in' check. These days you
> can get 16GB or 32GB of RAM in a PC inexpensively enough; with an
> average string size of 80 characters, and assuming Python 3.3+, that's
> about 128 bytes each - close enough, and a nice figure. 16GB divided
> by 128 gives 128M strings - obviously you won't get all of that, but
> that's your ball-park. Anything less than, say, a hundred million
> strings, and you can dump the lot into memory. Easy!

Good point, & since (as I explained in my other post) the substrings
are being deduplicated in their own table anyway it's probably not
worth bothering with persistence between runs for this bit.


-- 
Some say the world will end in fire; some say in segfaults.
 [XKCD 312]


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Tim Chase wrote:

> On 2014-05-22 12:47, Adam Funk wrote:
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other
>> things, values of headers in e-mail & news messages) and suppressing
>> duplicates using a table of seen strings in the database.
>> 
>> It seems to me --- from past experience with other things, where
>> testing integers for equality is faster than testing strings, as
>> well as from reading the SQLite3 documentation about INTEGER
>> PRIMARY KEY --- that the SELECT tests should be faster if I am
>> looking up an INTEGER PRIMARY KEY value rather than TEXT PRIMARY
>> KEY.  Is that right?
>
> If sqlite can handle the absurd length of a Python long, you *can* do
> it as ints:

It can't.  SQLite3 INTEGER is an 8-byte signed one.

https://www.sqlite.org/datatype3.html

But after reading the other replies to my question, I've concluded
that what I was trying to do is pointless.


>  >>> from hashlib import sha1
>  >>> s = "Hello world"
>  >>> h = sha1(s)
>  >>> h.hexdigest()
>   '7b502c3a1f48c8609ae212cdfb639dee39673f5e'
>  >>> int(h.hexdigest(), 16)
>   703993777145756967576188115661016000849227759454L

That ties in with a related question I've been wondering about lately
(using MD5s & SHAs for other things) --- getting a hash value (which
is internally numeric, rather than string, right?) out as a hex string
& then converting that to an int looks inefficient to me --- is there
any better way to get an int?  (I haven't seen any other way in the
API.)


-- 
A firm rule must be imposed upon our nation before it destroys
itself. The United States needs some theology and geometry, some taste
and decency. I suspect that we are teetering on the edge of the abyss.
 --- Ignatius J Reilly


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Chris Angelico wrote:

> On Thu, May 22, 2014 at 9:47 PM, Adam Funk  wrote:
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other things,
>> values of headers in e-mail & news messages) and suppressing
>> duplicates using a table of seen strings in the database.
>>
>> It seems to me --- from past experience with other things, where
>> testing integers for equality is faster than testing strings, as well
>> as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
>> --- that the SELECT tests should be faster if I am looking up an
>> INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY.  Is that
>> right?
>
> It might be faster to use an integer primary key, but the possibility
> of even a single collision means you can't guarantee uniqueness
> without a separate check. I don't know sqlite3 well enough to say, but
> based on what I know of PostgreSQL, it's usually best to make your
> schema mimic your logical structure, rather than warping it for the
> sake of performance. With a good indexing function, the performance of
> a textual PK won't be all that much worse than an integral one, and
> everything you do will read correctly in the code - no fiddling around
> with hashes and collision checks.
>
> Stick with the TEXT PRIMARY KEY and let the database do the database's
> job. If you're processing a really large number of strings, you might
> want to consider moving from sqlite3 to PostgreSQL anyway (I've used
> psycopg2 quite happily), as you'll get better concurrency; and that
> might solve your performance problem as well, as Pg plays very nicely
> with caches.

Well, actually I'm thinking about doing away with checking for
duplicates at this stage, since the substrings that I pick out of the
deduplicated header values go into another table as the TEXT PRIMARY
KEY anyway, with deduplication there.  So I think this stage reeks of
premature optimization.


-- 
The history of the world is the history of a privileged few.
--- Henry Miller


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Peter Otten wrote:

> Adam Funk wrote:
>
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other things,
>> values of headers in e-mail & news messages) and suppressing
>> duplicates using a table of seen strings in the database.
>> 
>> It seems to me --- from past experience with other things, where
>> testing integers for equality is faster than testing strings, as well
>> as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
>> --- that the SELECT tests should be faster if I am looking up an
>> INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY.  Is that
>> right?
>
> My gut feeling tells me that this would matter more for join operations than 
> lookup of a value. If you plan to do joins you could use an autoinc integer 
> as the primary key and an additional string key for lookup.

I'm not doing any join operations.  I'm using sqlite3 for storing big
piles of data & persistence between runs --- not really "proper
relational database use".  In this particular case, I'm getting header
values out of messages & doing this:

  for this_string in these_strings:
      if not already_seen(this_string):
          process(this_string)
      # ignore if already seen
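
An already_seen backed by a plain in-memory set would be the simplest
check-and-record version (a sketch, without the cross-run persistence the
database gives):

```python
seen = set()

def already_seen(s):
    """Return True if s was seen before; record it otherwise."""
    if s in seen:
        return True
    seen.add(s)
    return False

processed = [s for s in ["a", "b", "a", "c", "b"] if not already_seen(s)]
print(processed)  # ['a', 'b', 'c']
```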

...
> and only if you can demonstrate a significant speedup keep the complication 
> in your code.
>
> If you find such a speedup I'd like to see the numbers because this cries 
> PREMATURE OPTIMIZATION...

On further reflection, I think I asked for that.  In fact, the table
I'm using only has one column for the hashes --- I wasn't going to
store the strings at all in order to save disk space (maybe my mind is
stuck in the 1980s).


-- 
But the government always tries to coax well-known writers into the
Establishment; it makes them feel educated. [Robert Graves]


hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
I'm using Python 3.3 and the sqlite3 module in the standard library.
I'm processing a lot of strings from input files (among other things,
values of headers in e-mail & news messages) and suppressing
duplicates using a table of seen strings in the database.

It seems to me --- from past experience with other things, where
testing integers for equality is faster than testing strings, as well
as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
--- that the SELECT tests should be faster if I am looking up an
INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY.  Is that
right?

If so, what sort of hashing function should I use?  The "maxint" for
SQLite3 is a lot smaller than the size of even MD5 hashes.  The only
thing I've thought of so far is to use MD5 or SHA-something modulo the
maxint value.  (Security isn't an issue --- i.e., I'm not worried
about someone trying to create a hash collision.)

Thanks,
Adam
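
For the record, the truncation idea sketched above might look like this
(int_key is a hypothetical helper; the thread's eventual conclusion is that
a TEXT PRIMARY KEY is the better design anyway):

```python
import hashlib

SQLITE_MAX = 2 ** 63 - 1   # SQLite INTEGER is a signed 64-bit value

def int_key(s):
    """Fold an MD5 digest into SQLite's signed 64-bit INTEGER range.

    Truncating to 63 bits raises the collision rate (see the birthday
    bound), which is why this needs a separate uniqueness check if the
    strings themselves aren't stored."""
    digest = hashlib.md5(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & SQLITE_MAX
```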


-- 
"It is the role of librarians to keep government running in difficult
times," replied Dramoren.  "Librarians are the last line of defence
against chaos."   (McMullen 2001)


Re: Significant digits in a float?

2014-05-08 Thread Adam Funk
On 2014-05-02, Dennis Lee Bieber wrote:

> On Thu, 01 May 2014 21:55:20 +0100, Adam Funk 
> declaimed the following:
>
>>On 2014-05-01, Dennis Lee Bieber wrote:

>>> Math teacher was selling them in my 10th grade... Actually I already
>>> owned a Faber-Castell 57/22 "Business" ruler (which did NOT have the CF/DF
>>> scales set for *PI) and a Pickett N-1010-ES Trig rule.
>>
>>What does a "business" slide-rule do?  Depreciation?
>>
>
>   Special markers for: dozen, gross; a scale for "non-metric measures" to
> metric equivalents -- US Bushel, UK ("brit") bushel, US gallon, UK gallon,
> short and long tons, a few Russian units, "Pud" and "R.t." which appear to
> map to cubic inch and cubic foot; markings for % (discount and mark-up)
>
>   And a scheme for simple interest calculations (which may explain why
> the CF/DF scales are longer than the C/D scales): "Move the main cursor
> line over the principal on scale DF -- the principal must be taken only on
> scale DF -- set the rate per cent on the scale CI, under the short cursor
> line, and read the interest on the scale DF or D in line with the number of
> days on the scale CF or C." {yes, just to the left of the normal cursor is
> a short line only over the inverted C scale}

Interesting, thanks.

-- 
I used to be better at logic problems, before I just dumped
them all into TeX and let Knuth pick out the survivors.
   -- plorkwort


Re: Significant digits in a float?

2014-05-01 Thread Adam Funk
On 2014-05-01, Larry Hudson wrote:

> On 05/01/2014 05:56 AM, Roy Smith wrote:

>> For those who have no idea what we're talking about, take a look at
>> http://www.ted.com/talks/clifford_stoll_on_everything.  If you just want
>> to see what you do with a slide rule, fast forward to 14:20, but you
>> really owe it to yourself to invest the 18 minutes to watch the whole
>> thing.
>>
>
> Anyone (besides me) ever seen a cylindrical slide rule?  I have one --
> unfortunately misplaced at the moment.  :-(
>
> The scales were helical around a cylinder, giving (it was claimed) the
> equivalent of a five-foot rule.  But that still only gave one additional
> significant digit.  Only two scales, however, which limited its use to
> multiply/divide and logs.  But interesting.

I have a "circular" (really spiral) slide rule that I inherited from
my grandfather.

http://www.ducksburg.com/atlas_slide_rule/

One of my uncles told me that he took it (or a similar model) to
university (ca. 1960, I guess) & got an F on a calculus test because
his answers were too accurate & precise to be honest.  He went to the
professor's office, showed him the circular slide rule, & got an A.


-- 
To live without killing is a thought which could electrify the world,
if men were only capable of staying awake long enough to let the idea
soak in. --- Henry Miller
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Off-topic circumnavigating the earth in a mile or less

2014-05-01 Thread Adam Funk
On 2014-04-30, Ethan Furman wrote:

> Wow.  It's amazing how writing something down, wrongly (I originally had
> north and south reversed), correcting it, letting some time pass (enough
> to post the message so one can be properly embarrassed ;), and then
> rereading it later can make something so much clearer!

It's amazing how big a subthread appeared in response to a gag that I
think I got from a Car Talk puzzler.


-- 
A recent study conducted by Harvard University found that the average
American walks about 900 miles a year. Another study by the AMA found
that Americans drink, on average, 22 gallons of alcohol a year. This
means, on average, Americans get about 41 miles to the gallon.
 http://www.cartalk.com/content/average-americans-mpg
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Off-topic circumnavigating the earth in a mile or less

2014-05-01 Thread Adam Funk
On 2014-05-01, Terry Reedy wrote:

> On 4/30/2014 7:46 PM, Ian Kelly wrote:
>
>> It also works if your starting point is (precisely) the north pole.  I
>> believe that's the canonical answer to the riddle, since there are no
>> bears in Antarctica.
>
> For the most part, there are no bears within a mile of the North Pole 
> either. "they are rare north of 88°" (ie, 140 miles from pole).
> https://en.wikipedia.org/wiki/Polar_bears
> They mostly hunt in or near open water, near the coastlines.
>
> I find it amusing that someone noticed and posted an alternate, 
> non-canonical  solution. How might a bear be near the south pole? As 
> long as we are being creative, suppose some jokester mounts a near 
> life-size stuffed black bear, made of cold-tolerant artificial 
> materials, near but not at the South Pole. The intent is to give fright 
> to naive newcomers. Someone walking in a radius 1/2pi circle about the 
> pole might easily see it.

OK, change bear to bird & the question to "What kind of bird is it?"


-- 
There's no money in poetry, but there's no poetry in
money either.  --- Robert Graves
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Significant digits in a float?

2014-05-01 Thread Adam Funk
On 2014-05-01, Dennis Lee Bieber wrote:

> On Tue, 29 Apr 2014 20:42:33 -0400, Roy Smith  declaimed the
> following:
>
>>In article ,
>> Dennis Lee Bieber  wrote:

>>> (one reason slide-rules were acceptable for so long -- and even my high
>>> school trig course only required slide-rule significance even though half
>>> the class had scientific calculators [costing >$100, when a Sterling
>>> slide-rule could still be had for <$10]) 
>>
>>Sterling?  Snort.  K&E was the way to go.
>
>   Math teacher was selling them in my 10th grade... Actually I already
> owned a Faber-Castell 57/22 "Business" ruler (which did NOT have the CF/DF
> scales set for *PI) and a Pickett N-1010-ES Trig rule.

What does a "business" slide-rule do?  Depreciation?


-- 
"Mrs CJ and I avoid clichés like the plague."
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Significant digits in a float?

2014-04-29 Thread Adam Funk
On 2014-04-29, Roy Smith wrote:

> Another possibility is that they're latitude/longitude coordinates, some 
> of which are given to the whole degree, some of which are given to 
> greater precision, all the way down to the ten-thousandth of a degree.

That makes sense.  1° of longitude is about 111 km at the equator,
78 km at 45°N or S, & 0 km at the poles.
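Those figures fall straight out of the spherical-earth approximation (111.32 km per degree at the equator, scaled by the cosine of the latitude); a quick sketch:

```python
from math import cos, radians

# Spherical-earth approximation: one degree of longitude spans about
# 111.32 km at the equator, shrinking with the cosine of the latitude.
def km_per_degree_lon(lat_degrees):
    return 111.32 * cos(radians(lat_degrees))

print(round(km_per_degree_lon(0)))    # 111 at the equator
print(round(km_per_degree_lon(45)))   # 79 at 45 degrees N or S
print(round(km_per_degree_lon(90)))   # 0 at the poles
```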


"A man pitches his tent, walks 1 km south, walks 1 km east, kills a
bear, & walks 1 km north, where he's back at his tent.  What color is
the bear?"  ;-)


-- 
War is God's way of teaching Americans geography.
 [Ambrose Bierce]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Detecting a click on the turtle screen when the turtle isn't doing anything?

2013-02-06 Thread Adam Funk
On 2013-02-05, Dennis Lee Bieber wrote:

>   I'll echo the "Ugh" about the use of global AND ADD a dislike of the
> busy loop that only exits if some other return sets a value. If the busy
> loop were performing some action that changed the test state within the
> loop itself, okay...

TBH, I was originally going to use 

input('press RETURN to continue')

but I thought it would be nicer to click on the plot window.


> -=-=-=-
> import threading
>
> evtFlag = threading.Event()

This is a global variable, but here we're calling a method on it
rather than changing its value (boolean in my original case).  Why is
that better in principle?


> def clicked(x, y):
>   print("clicked at %f %f" % (x, y))
>   evtFlag.set()
>   #don't need "global" as not binding to the name evtFlag
>   #don't need "return" as falling off the end of the function
>   #   implements return

Does it do any harm to put an empty "return" at the end of a method?
(It looks tidier to me with the return.)


> def wait_for_clicks(s):
>   evtFlag.clear()
>   s.listen()
>   evtFlag.wait()
>   #don't need "global" or "return"
>   #don't need busy loop; .wait() blocks until some other thread
>   #   (GUI callback) sets the Event

I tried these and got even worse results --- even Ctrl-C in the xterm
I was running the program from wouldn't kill it; I had to use Ctrl-Z
and kill %1.
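Stripped of the turtle/tkinter parts, the Event pattern Dennis describes looks like the sketch below; the "click" is simulated here with a timer thread, since the GUI callback is just another thread calling set():

```python
import threading
import time

# The waiting code blocks on evt.wait() instead of spinning in a busy
# loop; a callback (faked here by a timer thread) wakes it with evt.set().
evt = threading.Event()

def fake_click():
    time.sleep(0.1)   # stand-in for the user clicking after 100 ms
    evt.set()

threading.Thread(target=fake_click).start()
evt.wait()            # blocks until fake_click() fires; no polling
print(evt.is_set())   # True
```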


-- 
It would be unfair to detect an element of logic in the siting of the
Pentagon alongside the National Cemetery, but the subject seems at
least worthy of investigation.  --- C Northcote Parkinson
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Detecting a click on the turtle screen when the turtle isn't doing anything?

2013-02-06 Thread Adam Funk
On 2013-02-05, Dave Angel wrote:

> I'm no fan of Java.  But it's not about a "main" method, it's about 
> sharing data between functions.  Most of the time non-constant globals 
> are a mistake.  If the data can't be passed as an argument, then it 
> should probably be part of the instance data of some class.  Which class 
> is a design decision, and unlike Java, I wouldn't encourage writing a 
> class for unrelated functions, just to bundle them together.

Well, I understand the OO principle there, but it seems practical to
accept a few global variables in the "main" code of a program.
Anyway...


> Anyway, back to your problem.  Since your code doesn't have threads, it 
> must have an event loop somewhere.  Event loops don't coexist at all 
> well with calls to sleep().
>
>  while waiting:
>  time.sleep(1)
>
> If you start that code with waiting being true, it will never terminate.

Right.  But the following *does* work (although it's probably
offensive):

#v+
def wait_for_click(s, t):
    global waiting
    waiting = True
    s.listen()
    t.hideturtle()
    t.penup()
    while waiting:
        t.forward(5)
        t.right(5)
    return
#v-



-- 
A lot of people never use their intiative because no-one
told them to. --- Banksy
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Detecting a click on the turtle screen when the turtle isn't doing anything?

2013-02-05 Thread Adam Funk
On 2013-02-05, Adam Funk wrote:

> I'm trying to get a program to do some plotting with turtle graphics,
> then wait for the user to click on the graphics window, then do some
> more plotting, &c.  So far I have the following, which doesn't work:
>
> #v+
> waiting = False
>
> def clicked(x, y):
>     global waiting
>     print('clicked at %f %f' % (x,y))
>     waiting = False
>     return
>
> def wait_for_click(s):
>     global waiting
>     waiting = True
>     s.listen()
>     while waiting:
>         time.sleep(1)
>     return
>
>
> ...
> t = turtle.Pen()
> s = turtle.Screen()

Oops, I snipped out two important lines:

#v+
s.onclick(clicked, btn=1)
wait_for_click(s, t)
#v-

> ...
> traverse.plot(s, t, "black", scale, adjx, adjy)
> wait_for_click(s)
> bowditch.plot(s, t, "red", scale, adjx, adjy)
> wait_for_click(s)
> transit.plot(s, t, "blue", scale, adjx, adjy)
> wait_for_click(s)
> #v-
>
>
> Each of my plot(..) calls does some turtle movement, and I want the
> program to sit and wait for the user to click the graphics window,
> then add the next plot.  I've played around with some event handling
> examples I found [1], and concluded that the onclick binding only
> works while the turtle is doing something.  Is that correct?  Is there
> a way to wait for the click & hear it while the turtle is not doing
> anything?
>
>
> [1] 
> <http://csil-web.cs.surrey.sfu.ca/cmpt120fall2010/wiki/IntroToEventHandling/>
>
>
> Thanks,
> Adam
>
>


-- 
Master Foo said: "A man who mistakes secrets for knowledge is like
a man who, seeking light, hugs a candle so closely that he smothers
it and burns his hand."--- Eric Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Detecting a click on the turtle screen when the turtle isn't doing anything?

2013-02-05 Thread Adam Funk
On 2013-02-05, woo...@gmail.com wrote:

> Note that the code you posted does not call onclick().  

It does, actually, but I accidentally snipped it when C&Ping code into
my original post.  Sorry!

> Globals are
>  confusing IMHO.  Code becomes cleaner and easier to write and read
>  when you become familiar with classes.

I've already got a module with Course and Traverse classes (this is
for surveying problems).  If I have to write a class just to create
one instance of it & call the main() method, I might as well use Java!
;-)

But I'll work through the example you posted --- thanks.


-- 
Master Foo said: "A man who mistakes secrets for knowledge is like
a man who, seeking light, hugs a candle so closely that he smothers
it and burns his hand."--- Eric Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Detecting a click on the turtle screen when the turtle isn't doing anything?

2013-02-05 Thread Adam Funk
I'm trying to get a program to do some plotting with turtle graphics,
then wait for the user to click on the graphics window, then do some
more plotting, &c.  So far I have the following, which doesn't work:

#v+
waiting = False

def clicked(x, y):
    global waiting
    print('clicked at %f %f' % (x,y))
    waiting = False
    return

def wait_for_click(s):
    global waiting
    waiting = True
    s.listen()
    while waiting:
        time.sleep(1)
    return


...
t = turtle.Pen()
s = turtle.Screen()
...
traverse.plot(s, t, "black", scale, adjx, adjy)
wait_for_click(s)
bowditch.plot(s, t, "red", scale, adjx, adjy)
wait_for_click(s)
transit.plot(s, t, "blue", scale, adjx, adjy)
wait_for_click(s)
#v-


Each of my plot(..) calls does some turtle movement, and I want the
program to sit and wait for the user to click the graphics window,
then add the next plot.  I've played around with some event handling
examples I found [1], and concluded that the onclick binding only
works while the turtle is doing something.  Is that correct?  Is there
a way to wait for the click & hear it while the turtle is not doing
anything?


[1] 



Thanks,
Adam


-- 
But the government always tries to coax well-known writers into the
Establishment; it makes them feel educated. [Robert Graves]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeEncodeError when piping stdout, but not when printing directly to the console

2012-01-06 Thread Adam Funk
On 2012-01-06, Peter Otten wrote:

> Adam Funk wrote:
>
>> On 2012-01-04, Peter Otten wrote:
>> 
>>> Adam Funk wrote:
>> 
>>>> How can I force python (preferably within my python program, rather
>>>> than having to set something externally) to treat stdout as UTF-8?
>>>
>>>
>>> $ cat force_utf8.py
>>> # -*- coding: utf-8 -*-
>>> import sys
>>>
>>> if sys.stdout.encoding is None:
>>>     import codecs
>>>     writer = codecs.getwriter("utf-8")
>>>     sys.stdout = writer(sys.stdout)
>>>
>>> print u"Ähnlich üblich nötig"
>> 
>> That's great, thanks!
>> 
>> I guess issues like this will magically go away when I eventually move
>> to Python 3?
>
> Not "magically", but UTF-8 has become the default encoding...

Close enough!
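For reference, the codecs.getwriter() trick is essentially what io.TextIOWrapper does; the effect can be demonstrated on an in-memory byte stream (and Python 3.7+ also offers sys.stdout.reconfigure(encoding='utf-8') for the same purpose):

```python
import io

# Wrap a byte stream so that text written to it is UTF-8-encoded on the
# way out, which is what codecs.getwriter("utf-8") arranged for stdout.
raw = io.BytesIO()                         # stands in for sys.stdout.buffer
out = io.TextIOWrapper(raw, encoding="utf-8")
out.write(u"\u0107")                       # the character from the original error
out.flush()
print(raw.getvalue())                      # b'\xc4\x87'
```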



-- 
When Elaine turned 11, her mother sent her to train under
Donald Knuth in his mountain hideaway. [XKCD 342]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeEncodeError when piping stdout, but not when printing directly to the console

2012-01-06 Thread Adam Funk
On 2012-01-04, Peter Otten wrote:

> Adam Funk wrote:

>> How can I force python (preferably within my python program, rather
>> than having to set something externally) to treat stdout as UTF-8?
>
>
> $ cat force_utf8.py
> # -*- coding: utf-8 -*-
> import sys
>
> if sys.stdout.encoding is None:
>     import codecs
>     writer = codecs.getwriter("utf-8")
>     sys.stdout = writer(sys.stdout)
>
> print u"Ähnlich üblich nötig"

That's great, thanks!

I guess issues like this will magically go away when I eventually move
to Python 3?


-- 
Physics is like sex.  Sure, it may give some practical results,
but that's not why we do it.  [Richard Feynman]
-- 
http://mail.python.org/mailman/listinfo/python-list


UnicodeEncodeError when piping stdout, but not when printing directly to the console

2012-01-04 Thread Adam Funk
(I'm using Python 2.7.2+ on Ubuntu.)

When I'm running my program in an xterm, the print command with an
argument containing unicode works fine (it correctly detects my UTF-8
environment).  But when I run it with a pipe or redirect to a file (|
or >), unicode strings fail with the following (for example):

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0107' in position 
21: ordinal not in range(128)

How can I force python (preferably within my python program, rather
than having to set something externally) to treat stdout as UTF-8?


Thanks,
Adam


-- 
Nam Sibbyllam quidem Cumis ego ipse oculis meis vidi in ampulla 
pendere, et cum illi pueri dicerent: beable beable beable; respondebat 
illa: doidy doidy doidy.   [plorkwort]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: mutually exclusive arguments to a constructor

2011-12-31 Thread Adam Funk
On 2011-12-30, Günther Dietrich wrote:

> Adam Funk  wrote:
>
>>Suppose I'm creating a class that represents a bearing or azimuth,
>>created either from a string of traditional bearing notation
>>("N24d30mE") or from a number indicating the angle in degrees as
>>usually measured in trigonometry (65.5, measured counter-clockwise
>>from the x-axis).  The class will have methods to return the same
>>bearing in various formats.
...
> You can determine the type of the input data by using isinstance() and 
> take the appropriate actions depending on this decision:
>
>>>> class MyClass(object):
> ...     def __init__(self, input_data):
> ...         if isinstance(input_data, basestring):
> ...             print "Do actions for string type input"
> ...         elif isinstance(input_data, float):
> ...             print "Do actions for float type input"
> ...     def get_output_data(self):
> ...         return "output data"

Aha, I think I like this approach best, partly because I realized
after writing my post that it might also be good to accept strings
representing "pure" angles (e.g., "65d30m").  So I think I'll use
isinstance *and then* check the input string against some regexes to
determine whether it's in traditional surveying notation or trig
notation in DMS.
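A sketch of that isinstance-then-regex dispatch (the regexes and names here are illustrative, simplified to one form of each notation, and use Python 3's str rather than basestring):

```python
import re

TRAD = re.compile(r'^[NS]\d+d\d+m[EW]$')   # e.g. N24d30mE
DMS = re.compile(r'^\d+d\d+m$')            # e.g. 65d30m

def classify(value):
    # dispatch first on type, then on notation within strings
    if isinstance(value, float):
        return 'trig-degrees'
    if isinstance(value, str):
        if TRAD.match(value):
            return 'surveying'
        if DMS.match(value):
            return 'trig-dms'
    raise ValueError('unrecognised bearing: %r' % (value,))

print(classify(65.5))        # trig-degrees
print(classify('N24d30mE'))  # surveying
print(classify('65d30m'))    # trig-dms
```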


-- 
The generation of random numbers is too important to be left to
chance. [Robert R. Coveyou]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: mutually exclusive arguments to a constructor

2011-12-31 Thread Adam Funk
On 2011-12-30, Roy Smith wrote:

> "But!", some C++/Java type bondage addicts might cry, "there's nothing 
> to prevent somebody from creating a DirectionIndicatingThingie directly, 
> bypassing the factory functions.  There's no way to make the constructor 
> private!".  To which the free-willed pythonistas would respond, "If it 
> hurts when you do that, don't do that".

Actually one problem that can occur in large Java projects is that the
package structure requires some things to have public constructors
(even when you'd rather not do that) so the Factory class in the main
package has access to them.


-- 
English has perfect phonetic spelling. It just doesn't have phonetic
pronunciation.[Peter Moylan]
-- 
http://mail.python.org/mailman/listinfo/python-list


mutually exclusive arguments to a constructor

2011-12-30 Thread Adam Funk
(Warning: this question obviously reflects the fact that I am more
accustomed to using Java than Python.)

Suppose I'm creating a class that represents a bearing or azimuth,
created either from a string of traditional bearing notation
("N24d30mE") or from a number indicating the angle in degrees as
usually measured in trigonometry (65.5, measured counter-clockwise
from the x-axis).  The class will have methods to return the same
bearing in various formats.

In Java, I would write two constructors, one taking a single String
argument and one taking a single Double argument.  But in Python, a
class can have only one __init__ method, although it can have a lot of
optional arguments with default values.  What's the correct way to
deal with a situation like the one I've outlined above?
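The usual Python substitute for overloaded constructors is a single __init__ plus @classmethod factory methods. A sketch (the names and the one-quadrant parser are illustrative, not a full bearing parser):

```python
import re

class Bearing(object):
    def __init__(self, degrees):
        # internal form: trig angle in degrees, CCW from the x-axis
        self.degrees = float(degrees)

    @classmethod
    def from_angle(cls, degrees):
        return cls(degrees)

    @classmethod
    def from_bearing(cls, text):
        # handles only the N...E quadrant, for brevity
        m = re.match(r'^N(\d+)d(\d+)mE$', text)
        if m is None:
            raise ValueError('unsupported bearing: %r' % (text,))
        north_based = int(m.group(1)) + int(m.group(2)) / 60.0
        return cls(90.0 - north_based)

print(Bearing.from_bearing('N24d30mE').degrees)  # 65.5
```

Callers then say which construction they mean explicitly, instead of the class guessing from the argument type.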


-- 
Unix is a user-friendly operating system. It's just very choosy about
its friends.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: suppressing bad characters in output PCDATA (converting JSON to XML)

2011-12-02 Thread Adam Funk
On 2011-11-29, Stefan Behnel wrote:

> Adam Funk, 29.11.2011 13:57:
>> On 2011-11-28, Stefan Behnel wrote:

>>> If the name "big_json" is supposed to hint at a large set of data, you may
>>> want to use something other than minidom. Take a look at the
>>> xml.etree.cElementTree module instead, which is substantially more memory
>>> efficient.
>>
>> Well, the input file in this case contains one big JSON list of
>> reasonably sized elements, each of which I'm turning into a separate
>> XML file.  The output files range from 600 to 6000 bytes.
>
> It's also substantially easier to use, but if your XML writing code works 
> already, why change it.

That module looks useful --- thanks for the tip.  (TBH, I'm using
minidom mainly because I've used it before and the API is similar to
the DOM APIs I've used in other languages.)


> You should read up on Unicode a bit.

It wouldn't do me any harm.  :-)


>>>> I thought this would force all the output to be valid, but xmlstarlet
>>>> gives some errors like these on a few documents:
>>>>
>>>> PCDATA invalid Char value 7
>>>> PCDATA invalid Char value 31
>>>
>>> This strongly hints at a broken encoding, which can easily be triggered by
>>> your erroneous encode-and-encode cycles above.
>>
>> No, I've checked the JSON input and those exact control characters are
>> there too.
>
> Ah, right, I didn't look closely enough. Those are forbidden in XML:
>
> http://www.w3.org/TR/REC-xml/#charsets
>
> It's sad that minidom (apparently) lets them pass through without even a 
> warning.

Yes, it is!  I've now found this, which seems to fix the problem:

http://bitkickers.blogspot.com/2011/05/stripping-control-characters-in-python.html


-- 
The internet is quite simply a glorious place. Where else can you find
bootlegged music and films, questionable women, deep seated xenophobia
and amusing cats all together in the same place? [Tom Belshaw]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: suppressing bad characters in output PCDATA (converting JSON to XML)

2011-11-29 Thread Adam Funk
On 2011-11-28, Steven D'Aprano wrote:

> On Fri, 25 Nov 2011 13:50:01 +0000, Adam Funk wrote:
>
>> I'm converting JSON data to XML using the standard library's json and
>> xml.dom.minidom modules.  I get the input this way:
>> 
>> input_source = codecs.open(input_file, 'rb', encoding='UTF-8',
>> errors='replace') big_json = json.load(input_source)
>> input_source.close()
>> 
>> Then I recurse through the contents of big_json to build an instance of
>> xml.dom.minidom.Document (the recursion includes some code to rewrite
>> dict keys as valid element names if necessary), 
>
> How are you doing that? What do you consider valid?

Regex-replacing all whitespace ('\s+') with '_', and adding 'a_' to
the beginning of any potential tag that doesn't start with a letter.
This is good enough for my purposes.
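That rewriting step can be sketched like this (the helper name is hypothetical):

```python
import re

def make_element_name(key):
    # whitespace -> '_'; prefix 'a_' if it doesn't start with a letter
    name = re.sub(r'\s+', '_', key)
    if not name[:1].isalpha():
        name = 'a_' + name
    return name

print(make_element_name('first name'))  # first_name
print(make_element_name('123 count'))   # a_123_count
```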

>> I thought this would force all the output to be valid, but xmlstarlet
>> gives some errors like these on a few documents:
>
> It will force the output to be valid UTF-8 encoded to bytes, not 
> necessarily valid XML.

Yes!

>> PCDATA invalid Char value 7
>> PCDATA invalid Char value 31
>
> What's xmlstarlet, and at what point does it give this error? It doesn't 
> appear to be in the standard library.

It's a command-line tool I use a lot for finding the bad bits in XML,
nothing to do with python.

http://xmlstar.sourceforge.net/

>> I guess I need to process each piece of PCDATA to clean out the control
>> characters before creating the text node:
>> 
>>   text = doc.createTextNode(j)
>>   root.appendChild(text)
>> 
>> What's the best way to do that, bearing in mind that there can be
>> multibyte characters in the strings?
>
> Are you mixing unicode and byte strings?

I don't think I am.

> Are you sure that the input source is actually UTF-8? If not, then all 
> bets are off: even if the decoding step works, and returns a string, it 
> may contain the wrong characters. This might explain why you are getting 
> unexpected control characters in the output: they've come from a badly 
> decoded input.

I'm pretty sure that the input is really UTF-8, but has a few control
characters (fairly rare).

> Another possibility is that your data actually does contain control 
> characters where there shouldn't be any.

I think that's the problem, and I'm looking for an efficient way to
delete them from unicode strings.
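One way to do that, keeping the control characters XML 1.0 does allow (tab, LF, CR) and deleting the rest, which covers the values 7 and 31 from the error messages:

```python
import re

# XML 1.0 forbids the C0 controls except tab (0x09), LF (0x0A), CR (0x0D).
XML_INVALID = re.compile(u'[\x00-\x08\x0b\x0c\x0e-\x1f]')

def strip_invalid(text):
    return XML_INVALID.sub(u'', text)

print(strip_invalid(u'bell:\x07 unit-sep:\x1f done'))  # bell: unit-sep: done
```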


-- 
Some say the world will end in fire; some say in segfaults.
 [XKCD 312]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: suppressing bad characters in output PCDATA (converting JSON to XML)

2011-11-29 Thread Adam Funk
On 2011-11-28, Stefan Behnel wrote:

> Adam Funk, 25.11.2011 14:50:
>> I'm converting JSON data to XML using the standard library's json and
>> xml.dom.minidom modules.  I get the input this way:
>>
>> input_source = codecs.open(input_file, 'rb', encoding='UTF-8', 
>> errors='replace')
>
> It doesn't make sense to use codecs.open() with a "b" mode.

OK, thanks.

>> big_json = json.load(input_source)
>
> You shouldn't decode the input before passing it into json.load(), just 
> open the file in binary mode. Serialised JSON is defined as being UTF-8 
> encoded (or BOM-prefixed), not decoded Unicode.

So just do
  input_source = open(input_file, 'rb')
  big_json = json.load(input_source)
?

>> input_source.close()
>
> In case of a failure, the file will not be closed safely. All in all, use 
> this instead:
>
>  with open(input_file, 'rb') as f:
>  big_json = json.load(f)

OK, thanks.
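A round-trip sketch of that advice, with io.BytesIO standing in for the binary-mode file:

```python
import io
import json

# json.load() is happy with a binary file object; the UTF-8 decoding
# happens inside the json module, so no codecs wrapper is needed.
data = {u'key': u'\u0107'}
binary = io.BytesIO(json.dumps(data).encode('utf-8'))  # like open(f, 'rb')
loaded = json.load(binary)
print(loaded == data)  # True
```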

>> Then I recurse through the contents of big_json to build an instance
>> of xml.dom.minidom.Document (the recursion includes some code to
>> rewrite dict keys as valid element names if necessary)
>
> If the name "big_json" is supposed to hint at a large set of data, you may 
> want to use something other than minidom. Take a look at the 
> xml.etree.cElementTree module instead, which is substantially more memory 
> efficient.

Well, the input file in this case contains one big JSON list of
reasonably sized elements, each of which I'm turning into a separate
XML file.  The output files range from 600 to 6000 bytes.


>> and I save the document:
>>
>> xml_file = codecs.open(output_fullpath, 'w', encoding='UTF-8', 
>> errors='replace')
>> doc.writexml(xml_file, encoding='UTF-8')
>> xml_file.close()
>
> Same mistakes as above. Especially the double encoding is both unnecessary 
> and likely to fail. This is also most likely the source of your problems.

Well actually, I had the problem with the occasional control
characters in the output *before* I started sticking encoding="UTF-8"
all over the place (in an unsuccessful attempt to beat them down).


>> I thought this would force all the output to be valid, but xmlstarlet
>> gives some errors like these on a few documents:
>>
>> PCDATA invalid Char value 7
>> PCDATA invalid Char value 31
>
> This strongly hints at a broken encoding, which can easily be triggered by 
> your erroneous encode-and-encode cycles above.

No, I've checked the JSON input and those exact control characters are
there too.  I want to suppress them (delete or replace with spaces).

> Also, the kind of problem you present here makes it pretty clear that you 
> are using Python 2.x. In Python 3, you'd get the appropriate exceptions 
> when trying to write binary data to a Unicode file.

Sorry, I forgot to mention the version I'm using, which is "2.7.2+".


-- 
In the 1970s, people began receiving utility bills for
-£999,999,996.32 and it became harder to sustain the 
myth of the infallible electronic brain.  (Stob 2001)
-- 
http://mail.python.org/mailman/listinfo/python-list


suppressing bad characters in output PCDATA (converting JSON to XML)

2011-11-25 Thread Adam Funk
I'm converting JSON data to XML using the standard library's json and
xml.dom.minidom modules.  I get the input this way:

input_source = codecs.open(input_file, 'rb', encoding='UTF-8', errors='replace')
big_json = json.load(input_source)
input_source.close()

Then I recurse through the contents of big_json to build an instance
of xml.dom.minidom.Document (the recursion includes some code to
rewrite dict keys as valid element names if necessary), and I save the
document:

xml_file = codecs.open(output_fullpath, 'w', encoding='UTF-8', errors='replace')
doc.writexml(xml_file, encoding='UTF-8')
xml_file.close()


I thought this would force all the output to be valid, but xmlstarlet
gives some errors like these on a few documents:

PCDATA invalid Char value 7
PCDATA invalid Char value 31

I guess I need to process each piece of PCDATA to clean out the
control characters before creating the text node:

  text = doc.createTextNode(j)
  root.appendChild(text)

What's the best way to do that, bearing in mind that there can be
multibyte characters in the strings?  I found some suggestions on the
WWW involving filter with string.printable, which AFAICT isn't
unicode-friendly --- is there a unicode.printable or something like
that?


-- 
"Mrs CJ and I avoid clichés like the plague."
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: recommend a graphics library for plotting by the pixel?

2011-10-07 Thread Adam Funk
On 2011-10-05, Westley Martínez wrote:

> On Wed, Oct 05, 2011 at 02:29:38PM +0100, Adam Funk wrote:

>> I only know PyGame because we did an exercise in recreating the old
>> breakout game and messing around with it at a local Python group.
>> 
>> I was under the mistaken impression from that exercise that you have
>> to maintain a set of all the objects on the screen and redraw them all
>> every time through the loop that ends with pygame.display.flip() ---
>> *but* I now see that the loop starts with these:
>> 
>> clock.tick(tick_rate)
>> screen.fill((0,0,0))
>> # comes from screen = pygame.display.set_mode((screen_width,screen_height))
>> # before the loop
>> 
>> and that I was then deleting hit bricks, calculating the new positions
>> of the balls, and then redrawing everything that was left on the
>> secondary screen because things were moving around and disappearing.
>> 
>> I guess if I don't clear the screen at the beginning of the loop but
>> just blit pixels onto it, when I call display.flip(), it will add the
>> new blittings to what was already there?  If that's true, this will be
>> much easier than I thought.

> Yep.  Blitting is replacing the old colors with new colors.  It doesn't
> replace colors unless you tell it to.

My mistake was in sample code, running with it, & not looking at it
too closely.  ;-)


-- 
Mathematicians are like Frenchmen: whatever you say to them they
translate into their own language, and forthwith it becomes something
entirely different.   [Goethe]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: recommend a graphics library for plotting by the pixel?

2011-10-07 Thread Adam Funk
On 2011-10-04, Ian Kelly wrote:

> On Tue, Oct 4, 2011 at 3:56 AM, Adam Funk  wrote:
>> I'd like to create a window with a "pause" button and a large plotting
>> area, in which I'd like to draw a polygon, detect the pixel
>> coördinates of a mouse click, and then start setting the colors of
>> pixels by (x,y) coördinates.  (This is just for my own amusement, to
>> play with some formulas for generating fractals using random numbers.)
>>
>> There seems to be a large number of different python GUI libraries,
>> and I wonder if someone can recommend the easiests one to learn for
>> this purpose?  (The only python graphics library I've used is PyGame,
>> which I don't think is the way to go here.)
>
> You could use wxPython.  You'll need to create a custom control and
> have it paint itself by blitting from a wx.Bitmap, which you'll draw
> on by using a wx.MemoryDC and then refreshing the control.
>
> I would probably just use pygame myself.  I guess you're avoiding it
> because of the requirement for a button, but there are GUI libraries
> available for it, or if all you need are a couple of buttons you could
> easily roll your own.

Excellent suggestion.  I got it to work, but using keypresses (pause,
step, quit) instead of buttons, and a mouse event for letting the user
pick the start point on the screen.


-- 
I worry that 10 or 15 years from now, [my daughter] will come to me
and say 'Daddy, where were you when they took freedom of the press
away from the Internet?'  [Mike Godwin]
http://www.eff.org/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: recommend a graphics library for plotting by the pixel?

2011-10-05 Thread Adam Funk
On 2011-10-04, Derek Simkowiak wrote:

>  If this is strictly for 2D pixel graphics, I recommend using PyGame 
> (aka SDL).  Why do you not think it's the way to go?  It was built for 
> this type of thing.

I only know PyGame because we did an exercise in recreating the old
breakout game and messing around with it at a local Python group.

I was under the mistaken impression from that exercise that you have
to maintain a set of all the objects on the screen and redraw them all
every time through the loop that ends with pygame.display.flip() ---
*but* I now see that the loop starts with these:

clock.tick(tick_rate)
screen.fill((0,0,0))
# comes from screen = pygame.display.set_mode((screen_width,screen_height))
# before the loop

and that I was then deleting hit bricks, calculating the new positions
of the balls, and then redrawing everything that was left on the
secondary screen because things were moving around and disappearing.

I guess if I don't clear the screen at the beginning of the loop but
just blit pixels onto it, when I call display.flip(), it will add the
new blittings to what was already there?  If that's true, this will be
much easier than I thought.
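The accumulate-instead-of-clear idea can be seen without PyGame at all; a plain pixel buffer standing in for the screen surface behaves the same way if nothing clears it between frames:

```python
# A 4x3 'screen' that is never cleared between frames.
WIDTH, HEIGHT = 4, 3
screen = [[(0, 0, 0)] * WIDTH for _ in range(HEIGHT)]

def plot(x, y, color):
    screen[y][x] = color

plot(0, 0, (255, 0, 0))   # drawn on "frame 1"
plot(1, 1, (0, 255, 0))   # drawn on "frame 2"; frame 1's pixel survives
print(screen[0][0], screen[1][1])
```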

The only buttons I have in mind are "pause", "step", "go", and "quit",
and I can just as easily do those with keypresses.
-- 
http://mail.python.org/mailman/listinfo/python-list


recommend a graphics library for plotting by the pixel?

2011-10-04 Thread Adam Funk
I'd like to create a window with a "pause" button and a large plotting
area, in which I'd like to draw a polygon, detect the pixel
coördinates of a mouse click, and then start setting the colors of
pixels by (x,y) coördinates.  (This is just for my own amusement, to
play with some formulas for generating fractals using random numbers.)

There seems to be a large number of different python GUI libraries,
and I wonder if someone can recommend the easiest one to learn for
this purpose?  (The only python graphics library I've used is PyGame,
which I don't think is the way to go here.)


-- 
In the 1970s, people began receiving utility bills for
-£999,999,996.32 and it became harder to sustain the 
myth of the infallible electronic brain.  (Stob 2001)


Re: bash: testing whether anyone is or has recently been logged in?

2011-07-07 Thread Adam Funk
On 2011-04-20, Bill Marcum wrote:

> On 2011-04-20, Adam Funk  wrote:
>> I'd appreciate any suggestions for testing (preferably from a bash
>> script, although perl or python would be OK too) whether anyone is
>> currently logged in, and whether anyone has been logged in during the
>> past $N minutes.  
>>
>> (The idea is to use this in a cron script that only does its job when
>> the machine is not "in live use".)
>>
> A script could parse the output of "who", "w" or "last", and the stored
> output from previous commands. 

[adding comp.lang.python]

Here's what I came up with in python.  If you call it with no
arguments or with 0 as the argument, it gives a 0 exit code if no-one
is logged in.  If you call it with a non-zero argument N, it gives a 0
exit code if no-one has been logged in in the past N minutes.

I welcome suggestions, corrections, &c.



#v+
#!/usr/bin/python
# -*- coding: utf-8 -*-

import subprocess, re
from dateutil.parser import *
from sys import stdout, stderr, argv
from datetime import datetime, timedelta
from optparse import OptionParser


oparser = OptionParser(usage="usage: %prog [options] [N]")

oparser.add_option("-v", dest="verbose", default=False, action="store_true",
                   help="Verbose output")

(options, args) = oparser.parse_args()


query = 0

try:
    if len(args) > 0:
        query = int(args[0])
except ValueError:
    stderr.write('Invalid argument %s\n' % argv[1])
    exit(-1)

if options.verbose:
    stdout.write('query %i\n' % query)


last_proc = subprocess.Popen(args=['last'], stdout=subprocess.PIPE)
last_out = last_proc.stdout.read().split('\n')

still_logged = re.compile(r'still logged in')
line_pattern = re.compile(r'^\S+\s+\S+\s+\S+\s+(\S+\s+\S+\s+\S+) ..:.. - (..:..)\s+.*$')
timestamps = []
minutes = 1


for line in last_out:
    if line.startswith('reboot'):
        pass

    elif still_logged.search(line):
        minutes = 0
        if options.verbose:
            stdout.write('still logged in\n')
        break

    else:
        matcher = line_pattern.match(line)
        # user term host Tue Apr 26 13:49 - 14:52  (01:02)

        if matcher:
            date_string = matcher.group(1) + ' ' + matcher.group(2)
            timestamp = parse(date_string)
            stdout.write('d> ' + date_string + '  -->  ' + str(timestamp) + '\n')
            timestamps.append(timestamp)

if len(timestamps) > 0:
    latest = max(timestamps)
    stderr.write(str(latest) + '\n')
    now = datetime.now()
    delta = now - latest
    minutes = delta.days * 24 * 60 + delta.seconds / 60


diff_value = query + 1 - minutes
exit_value = max(0, diff_value)

if options.verbose:
    stdout.write('min   %i\n' % minutes)
    stdout.write('diff  %i\n' % diff_value)
    stdout.write('exit  %i\n' % exit_value)

exit(exit_value)
#v-
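One small suggestion: the days/seconds arithmetic at the end can be done with timedelta.total_seconds() (available since Python 2.7). A hedged sketch, with a hypothetical helper name:

```python
from datetime import datetime, timedelta

def minutes_since(then, now=None):
    """Whole minutes elapsed since `then` (hypothetical helper)."""
    now = now or datetime.now()
    delta = now - then
    # total_seconds() replaces delta.days * 24 * 60 + delta.seconds / 60
    # and sidesteps Python 2 integer-division surprises
    return int(delta.total_seconds() // 60)
```

It also behaves sensibly for deltas longer than a day, since total_seconds() folds the days component in automatically.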




Re: Does fileinput.input() read STDIN all at once?

2007-12-21 Thread Adam Funk
On 2007-12-20, Tim Roberts wrote:

>>As a test, I tried this:
>>
>>   for line in fileinput.input():
>>       print '**', line
>>
>>and found that it would print nothing until I hit Ctl-D, then print
>>all the lines, then wait for another Ctl-D, and so on (until I pressed
>>Ctl-D twice in succession to end the loop).
>
> Note that you are doing a lot more than just reading from a file here. This
> is calling a function in a module.  Fortunately, you have the source to
> look at.
>
> As I read the fileinput.py module, this will eventually call
> FileInput.next(), which eventually calls FileInput.readline(), which
> eventually calls stdin.readlines(_bufsize).  The default buffer size is
> 8,192 bytes.

OK, I think I've got this figured out!

I'll keep using fileinput.input() for the normal case (reading from
files named as arguments) and use sys.stdin.readline() for testing
(reading from the keyboard, when no filenames are given).
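That two-path approach could be wrapped up roughly like this (a sketch only; the helper name is mine, not part of either module):

```python
import sys
import fileinput

def input_lines(filenames):
    """Yield lines from the named files, or line by line from stdin
    when no filenames are given (hypothetical helper)."""
    if filenames:
        for line in fileinput.input(filenames):
            yield line
    else:
        # readline() hands back each line as soon as Enter is pressed,
        # unlike fileinput's block-buffered reads from stdin
        while True:
            line = sys.stdin.readline()
            if not line:  # empty string means EOF (Ctrl-D)
                break
            yield line
```

The caller would pass it `sys.argv[1:]` (or the positional args from an option parser) and iterate over the result either way.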

Thanks to you and Jonathan for your advice.


Re: Does fileinput.input() read STDIN all at once?

2007-12-19 Thread Adam Funk
On 2007-12-18, Jonathan Gardner wrote:

>> As a test, I tried this:
>>
>>    for line in fileinput.input():
>>        print '**', line
>>
>> and found that it would print nothing until I hit Ctl-D, then print
>> all the lines, then wait for another Ctl-D, and so on (until I pressed
>> Ctl-D twice in succession to end the loop).

> There is probably a 1024 byte buffer. Try filling it up and see if you
> get something before you hit CTRL-D.

Thanks; I'm looking into the buffering question.


> It sounds like you want to write some kind of interactive program for
> the terminal. Do yourself a favor and use curses or go with a full-
> blown GUI.

No, I'm really interested in reading the input from files!

This just came up because I was trying to test the program by giving
it no filename arguments and typing sample input.


Does fileinput.input() read STDIN all at once?

2007-12-18 Thread Adam Funk
I'm using this sort of standard thing:

    for line in fileinput.input():
        do_stuff(line)

and wondering whether it reads until it hits an EOF and then passes
lines (one at a time) into the variable line.  This appears to be the
behaviour when it's reading STDIN interactively (i.e. from the
keyboard).

As a test, I tried this:

    for line in fileinput.input():
        print '**', line

and found that it would print nothing until I hit Ctl-D, then print
all the lines, then wait for another Ctl-D, and so on (until I pressed
Ctl-D twice in succession to end the loop).

Is it possible to configure this to pass each line of input into line
as it comes?

Thanks,
Adam


Re: Equivalent of perl's Pod::Usage?

2007-12-13 Thread Adam Funk
On 2007-12-10, Nick Craig-Wood wrote:

> That said, python does a good job of turning doc strings and class
> descriptions into man pages even without any special markup, if you
> wrote docstrings everywhere.  Try pydoc on any bit of python (without
> the .py) and you'll see what I mean
>
> As for Pod::Usage - write the instructions for your script as a
> docstring at the top of your file, then use this little function...
>
> def usage(error):
> """
> Print the usage, an error message, then exit with an error
> """
> print >>sys.stderr, globals()['__doc__']
> print >>sys.stderr, error
> sys.exit(1)

That looks useful; thanks.
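For anyone reading this later under Python 3, the same helper might look like the following (a sketch only; it assumes the calling script's usage text lives in its module docstring, as in the quoted version):

```python
import sys

def usage(error, exit_code=1):
    """Print the module docstring and an error message, then exit
    with the given code (Python 3 rendering of the helper above)."""
    print(__doc__ or "", file=sys.stderr)
    print(error, file=sys.stderr)
    sys.exit(exit_code)
```

The `or ""` guards against a module with no docstring, where `__doc__` is None.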


Re: Is a "real" C-Python possible?

2007-12-11 Thread Adam Funk
On 2007-12-10, sturlamolden wrote:

> We have seen several examples that 'dynamic' and 'interpreted'
> languages can be quite efficient: There is an implementation of Common
> Lisp - CMUCL - that can compete with Fortran in efficiency for
> numerical computing. There are also versions of Lisp than can compete
> with the latest versions of JIT-compiled Java, e.g. SBCL and Allegro.
> As it happens, SBCL and CMUCL are mostly implemented in Lisp. The issue
> of speed for a language like Python has a lot to do with the quality
> of the implementation. What really makes CMUCL shine is the compiler
> that emits efficient native code on the fly. If it is possible to make
> a very fast Lisp, it should be possible to make a very fast Python as
> well. I remember people complaining 10 years ago that 'Lisp is so
> slow'. A huge effort has been put into making Lisp efficient enough
> for AI. I hope Python some day will gain a little from that effort as
> well.

I've been told that Torbjörn Lager's implementation of the Brill
tagger in Prolog is remarkably fast, but that it uses some
counter-intuitive arrangements of the predicate and argument
structures in order to take advantage of the way Prolog databases are
indexed.


Re: Equivalent of perl's Pod::Usage?

2007-12-10 Thread Adam Funk
On 2007-12-08, Dennis Lee Bieber wrote:

> On Fri, 7 Dec 2007 20:12:21 +0000, Adam Funk <[EMAIL PROTECTED]>
> declaimed the following in comp.lang.python:
>
>> I'm used to using Pod::Usage in my Perl programs (a snipped example
>> is shown below, if you're interested) to generate a little man page
>> when they are called with the -h option.
>> 
>> Is there an equivalent in Python?
>>
>   I'd suggest you look in the Python references for docstring and/or
> __doc__

Thanks.


Re: Equivalent of perl's Pod::Usage?

2007-12-10 Thread Adam Funk
On 2007-12-08, Neil Cerutti wrote:

> On 2007-12-08, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
>> On Fri, 7 Dec 2007 20:12:21 +, Adam Funk <[EMAIL PROTECTED]>
>> declaimed the following in comp.lang.python:
>>
>>> I'm used to using Pod::Usage in my Perl programs (a snipped example
>>> is shown below, if you're interested) to generate a little man page
>>> when they are called with the -h option.
>>> 
>>> Is there an equivalent in Python?
>>>
>>  I'd suggest you look in the Python references for docstring and/or
>> __doc__
>
> I found the example incomprehensible, so I looked it up in
> perldoc. 

Sorry about that!  POD is a mark-up language that Perl's Pod::Usage
module can translate into man pages (and other documentation formats).

So what I'm really after is an easy way to generate something that
looks like a man page.


> Anyhow, Python doesn't have it. Combining printing
> various verboseness of usage messages with setting exit codes
> with calling the exit function seems a little bizarre.
>
> But I believe optparse will handle parsing arguments and printing
> usage messages, though not, I think, setting verbosity levels and
> exiting the program.

Thanks.
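For later readers: argparse (optparse's successor) builds the whole -h/--help page from the option declarations themselves, which covers most of what the POD example below does. A hedged sketch (the option names are illustrative, not from the original program):

```python
import argparse

# The help text, defaults, and metavars are all generated into the
# -h output automatically; no separate usage document is needed.
parser = argparse.ArgumentParser(
    description="COMMAND OPTION(S) [FILE(S)]",
    epilog="Roughly equivalent to a man-page SYNOPSIS/OPTIONS section.")
parser.add_argument("-a", metavar="VALUE", help="hypothetical option a")
parser.add_argument("-c", action="store_true", help="hypothetical flag c")
parser.add_argument("files", nargs="*", help="input files")

args = parser.parse_args(["-a", "x", "f1"])
```

Calling the script with -h prints the generated page and exits, much like pod2usage.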


Equivalent of perl's Pod::Usage?

2007-12-07 Thread Adam Funk
I'm used to using Pod::Usage in my Perl programs (a snipped example
is shown below, if you're interested) to generate a little man page
when they are called with the -h option.

Is there an equivalent in Python?

Thanks,
Adam


##

use Pod::Usage;

getopts("ha:b:c", \%option) ;

if ($option{h}) {
pod2usage(-verbose => 2);
die("\n\n");
}

# REST OF PROGRAM HERE!

#
=head1 Documentation

=head2 Synopsis

COMMAND OPTION(S) [FILE(S)]

=head2 Options

=over

=item -h

Get this help.

...

=back

=cut
#

