webbrowser module + urls ending in .py = a security hole?

2006-01-29 Thread Blair P. Houghton
I'm just learning Python, so bear with.

I was messing around with the webbrowser module and decided it was
pretty cool to have the browser open a URL from within a python script,
so I wrote a short script to open a local file the same way, using the
script file as an example target:

# browser-test.py
import webbrowser
import sys
pathname = sys.argv[0]
protocol = 'file://'
url = protocol + pathname
webbrowser.open(url)

And what I got, instead of a browser window with the text of my script,
was a sequence of DOS windows popping up and disappearing.

Apparently that's because either Windows (XP SP2) or the browser
(Firefox) was interpreting the .py file extension and running Python to
execute it.

So is this a known (mis)feature, and will it happen if I chance to use
webbrowser.open() on a remote .py file?

Because if so, it's a king-hell security hole.

--Blair

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: webbrowser module + urls ending in .py = a security hole?

2006-01-29 Thread Blair P. Houghton
Oh, uh, Python version 2.4.2, in case you're wondering.

--Blair



Re: webbrowser module + urls ending in .py = a security hole?

2006-01-30 Thread Blair P. Houghton
I'm going to try it out on a remote server later today.

I did use this script to fetch remote HTML
(url='http://www.python.org') before I tired the remote file, and it
opened the webpage in Firefox.

I may also try to poke around in webbrowser.py, if possible, to see if
I can see whether it's selecting the executable for the given
extension, or passing it off to the OS.  I would think, since Python is
not /supposed/ to have client-side scripting powers, that even when the
script is on the client this is bad behavior.

Just don't have the bandwidth, just now.

Anyone got a good regex that will always detect an extension that might
be considered a script? Or reject all but known non-scripted
extensions? Because wrapping the webbrowser.open() call would be the
workaround, and upgrading webbrowser.py would be a solution.
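
A wrapper along those lines might look like the sketch below.  It is
written for current Python 3 rather than the 2.4 discussed here, and the
names SAFE_SCHEMES, SAFE_EXTENSIONS, is_safe_url and safe_open are made
up for illustration.  The allowlist itself is an assumption, not a vetted
security policy.  It takes the "reject all but known non-scripted
extensions" route instead of trying to enumerate every script extension
with a regex.

```python
import posixpath
import webbrowser
from urllib.parse import urlparse

# Hypothetical allowlists; these are assumptions, adjust to taste.
SAFE_SCHEMES = {"http", "https"}
SAFE_EXTENSIONS = {"", ".html", ".htm", ".txt"}


def is_safe_url(url):
    """Return True only for allowlisted schemes and path extensions."""
    parts = urlparse(url)
    if parts.scheme.lower() not in SAFE_SCHEMES:
        return False
    ext = posixpath.splitext(parts.path)[1].lower()
    return ext in SAFE_EXTENSIONS


def safe_open(url):
    """Hand url to the browser only if it passes the allowlist check."""
    if not is_safe_url(url):
        raise ValueError("refusing to open %r" % (url,))
    return webbrowser.open(url)
```

An allowlist fails closed: anything it doesn't recognize, including
file:// URLs and bare Windows paths, is refused rather than passed on.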

--Blair



Re: Find directory name of file?

2006-01-30 Thread Blair P. Houghton

Grant Edwards wrote:

> Try something like this at the beginning of your program and
> see if it does what you want:
>
>   print os.path.abspath(sys.argv[0])

Wanna see something freaky?

In IDLE, I type the following:

>>> import sys
>>> import os.path
>>> os.path.abspath(sys.argv[0])
'C:\\Program Files\\Python\\2.4'
>>>

Pretty normal, right?  Then I type this:

>>> print sys.argv[0]

>>>

Yup.  No output.  Even if I wrap the sys.argv[0] in something visible,
there's nothing there:

>>> print "x"+str(sys.argv[0])+"y"
xy
>>>

How does it know what the path of the argument is if it doesn't know
what the argument is?  Seems a bit presumptive to assume that CWD is
the path to a blank string...unless abspath presumes that anything you
pass it is an executable, even if it's got no bytes...
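
One way to check what abspath does with an empty string, as a small
sketch for current Python 3: a relative path is resolved against the
current working directory, and an empty string is simply a relative path
with nothing in it, so no guess about executables is needed.

```python
import os
import os.path

cwd = os.getcwd()

# abspath() resolves relative paths against the current working
# directory, so an empty string comes back as that directory.
print(os.path.abspath("") == cwd)

# Roughly the same computation spelled out by hand.
print(os.path.normpath(os.path.join(cwd, "")) == cwd)
```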

--Blair



Re: webbrowser module + urls ending in .py = a security hole?

2006-01-30 Thread Blair P. Houghton
Sorry...should read:

"I did use the script to fetch remote HTML
(url='http://www.python.org') before I tried the local file, and it
opened the webpage in Firefox."

Too many chars, too few fingers.

--Blair



Re: webbrowser module + urls ending in .py = a security hole?

2006-02-01 Thread Blair P. Houghton
>Would it be sufficient in your case merely to allow only .html files to
>be loaded?  Or URLs without .extensions?  Or even just permit only the
>http: protocol?

Personally, I'm just noodling around with this right now.
So "my case" is the abstract case.  I think the solution if
one was needed would be to look at how something like
Firefox implements script detection and warns about it,
so all forms of scripts would be rejected.

I did try loading the .py file over a remote connection, and
it does seem to work as expected that way; i.e., I get a
browser window with the text of the script.  So the
webbrowser.py module's handling of http:// accesses
is definitely different from its handling of  file://  accesses.

--Blair



Re: webbrowser module + urls ending in .py = a security hole?

2006-02-02 Thread Blair P. Houghton
Peter Hansen wrote:
> It appears the correct approach might be something along the lines of
> reading the registry to find what application is configured for the
> "HTTP" protocol (HKCR->HTTP->shell->open->command) and run that, passing
> it the URL.  I think that would do what most people expect, even when
> the URL actually passed specifies the "file" protocol and not "http".

Yeah...but here's where my mind splits.  I like security, but I'm not
sure I like the idea of breaking URL syntax and treating "file" as
"http" when it's explicitly specified...although in the context of a
URL, that might be the user's intended use-case... so do we go with "do
the secure, probably expected thing" or "do the thing Tim Berners-Lee
designed it to do"?

Since the behavior is "correct" in the "http://" case (the text is
displayed in the browser), and any "file://" access has physical and
network security built into it by nature of never accessing outside the
user's already-accessible file domain, maybe it is "correct" that the
"file://" access be treated as though it was issued from a shell
command or file-explorer window.  Which makes it no security hole at
all, it would seem...

--Blair



Re: webbrowser module + urls ending in .py = a security hole?

2006-02-02 Thread Blair P. Houghton

Blair P. Houghton wrote:
> Which makes it no security hole at
> all, it would seem...

Well, no, that's a little strong.  No *new* security hole, maybe.  It
would be on the order of having ./ in the PATH for root, and getting
trapped by a hacker who named his rootkit "ls" or "pwd".  I.e., it puts
the onus on the calling user to determine what file is really being
accessed and what's really in it before it's ever opened for default
action.

So it's an insecurity that produces an annoyance that maybe could be
handled by the webbrowser.py module...

--Blair



Re: OO conventions

2006-02-02 Thread Blair P. Houghton
Image would be a superclass to JPGImage, BMPImage, PNGImage, etc...

But which to use could only be determined AFTER opening the file,
because "file.jpg" doesn't have type JPG, it has type string and
semantic value "maybe a jpeg file or maybe something misnamed as a jpeg
file".

So Image.open(filename) seems right as a factory function that opens
the file, figures out what it really is, constructs the appropriate
subclass object (likely by passing a string to the constructor, e.g.,
JPGImage(filename)), and returns the object via the superclass type.
The caller can then either check a flag in the superclass to see what
type the subclass is, or just assume it's the right type of image for
the filename extension (or does Python have RTTI? I don't recall if
I've seen it, yet...).

Though if the filename doesn't match the content, someone should raise
an exception...

But this means that Image.open(filename) is a static method of the
superclass, not requiring instantiation.  Image(string) could easily
default to assuming string is a filename, doing the same work as
Image.open(filename), though it would at least partially construct an
Image instance each time it's called, which isn't what you want.
Image.open(filename) defined as a static method (wait...is that
possible in Python? I hate being the newbie) would not do any
constructing until after it knew what to construct.
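
The factory idea above can be sketched roughly as follows, in current
Python 3.  The SIGNATURES table and the empty subclasses are placeholders
invented for illustration, and only three formats are sniffed.  It uses a
classmethod, though a staticmethod or a plain function would do the same
job.

```python
import io


class Image(object):
    """Superclass; callers use Image.open() instead of a constructor."""

    def __init__(self, filename):
        self.filename = filename

    @classmethod
    def open(cls, filename):
        """Sniff the leading bytes, then construct the matching subclass."""
        with io.open(filename, "rb") as fp:
            header = fp.read(8)
        for magic, subclass in SIGNATURES:
            if header.startswith(magic):
                return subclass(filename)
        raise ValueError("unrecognized image data in %r" % (filename,))


class JPGImage(Image):
    pass


class PNGImage(Image):
    pass


class BMPImage(Image):
    pass


# Leading "magic" bytes for each format (an illustrative subset).
SIGNATURES = [
    (b"\xff\xd8", JPGImage),
    (b"\x89PNG\r\n\x1a\n", PNGImage),
    (b"BM", BMPImage),
]
```

Nothing is constructed until the content has been inspected, so a PNG
misnamed "file.jpg" comes back as a PNGImage, and unrecognized content
raises an exception.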

--Blair



Re: OO conventions

2006-02-03 Thread Blair P. Houghton

bruno at modulix wrote:
> Blair P. Houghton wrote:
> > So Image.open(filename) seems right as a factory function that opens
> > the file, figures out what it really is, constructs the appropriate
> > subclass object (likely by passing a string to the constructor, e.g.,
> > JPGImage(filename)), and returns the object via the superclass type.
>
> Why "via the superclass type" ? "returns the object" is enough.

I'm just being pedantic; because we can't tell from the return
type what subclass type it is, we see it as an object of the
superclass type and then ask it what subclass type it is.

> > The caller can then either check a flag in the superclass to see what
> > type the subclass is,
>
> Why the h... ? We don't care what type it is, as long as it does what we
> expect it to do.

We might want to know what to expect.  A function returning a Person
may return a Man or a Woman.  It could make a difference to our
Libido, so it's in our best interest to find out what was returned, so
we can call the right methods specific to that subclass.

> > or just assume it's the right type of image
>
> Yes
>
> (snip)
> > (or does Python have RTTI?
>
> Much better than 'RTTI'.
>
> > I don't recall if
> > I've seen it, yet...).
>
> obj.__class__ is a reference to the class (which is itself an object...)

Shortly after I posted that I came across other posts mentioning
__class__ and I'm trying to grok it in fullness now.
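
For anyone else grokking along, a minimal sketch of asking an object
what it is, using the Person example from earlier in this post.  The
make_person factory is hypothetical, and the code is current Python 3.

```python
class Person(object):
    pass


class Man(Person):
    pass


class Woman(Person):
    pass


def make_person(kind):
    """Hypothetical factory returning some Person subclass."""
    return {"man": Man, "woman": Woman}[kind]()


p = make_person("woman")
print(p.__class__.__name__)   # the class is itself an object with a name
print(type(p) is Woman)       # exact type check
print(isinstance(p, Person))  # also true for any subclass of Person
```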

> > Though if the filename doesn't match the content, someone should raise
> > an exception...
>
> Why ? filenames and extensions are nothing more than conventions.

Ostensibly.  But they're also a means of deceiving people, so the
handling of a mismatch deserves care.  Whether that care takes the
form of an exception or of defensive coding (which is kind of what
exceptions simplify) is up to you.

> >  Image(string) could easily
> > default to assuming string is a filename, doing the same work as
> > Image.open(filename), though it would at least partially construct an
> > Image instance each time it's called, which isn't what you want.
> > Image.open(filename) defined as a static method (wait...is that
> > possible in Python? I hate being the newbie)
>
> It is (search for 'staticmethod' and 'classmethod'). But there's not
> much use for 'static methods' in Python - we usually just use plain
> functions ('classmethods' are another beast - much more useful than
> staticmethods)

Does it make any noticeable difference in efficiency, or does nobody
care much about efficiency in Python?

--Blair



Re: Does Python support a peek like method for its file objects?

2006-02-05 Thread Blair P. Houghton
Avi Kak wrote:
> Hello:
>
>   Does Python support a peek like method for its file objects?
>
>   I'd like to be able to look at the next byte in a disk file before
>   deciding whether I should read it with, say, the read() method.
>   Is it possible to do so in Python?
>
>   Your answer would be much appreciated.

If it's a seekable file (not a stream input) then you can use
file.tell() to get the current position, then file.read() to read
some data, then file.seek(), giving it the position you got from
file.tell(), to rewind to the same position.  This is the safe version;
in the unsafe version you can skip the file.tell() stuff and just use
relative positioning in the file.seek() operation.

If it's a socket, you can use recv() or recvfrom() if you
set the flags argument to MSG_PEEK.

If it's a stream, you're out of luck, and you'll have to buffer the
data yourself, although you can use select() or poll() to check on
availability of data if that's what you really want.

At least, in theory.  I haven't tried any of this in Python yet.
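
The socket case can be tried without a network, as in this sketch for
current Python 3.  socket.socketpair() stands in for a real connection
and the payload is made up.

```python
import socket

# A connected pair of sockets stands in for a real network connection.
left, right = socket.socketpair()
try:
    left.sendall(b"hello")
    # MSG_PEEK returns data without removing it from the receive queue.
    peeked = right.recv(1, socket.MSG_PEEK)
    # A normal recv() afterwards still starts at the peeked byte.
    data = right.recv(5)
finally:
    left.close()
    right.close()

print(peeked, data)
```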

--Blair



Re: Does Python support a peek like method for its file objects?

2006-02-05 Thread Blair P. Houghton
Here's a version that should work in text mode as well:

fp = open("some file on disk", "r")
b = ""
while True:
    # remember the current position so the read below can be undone
    p = fp.tell()
    # peek at the next byte; moves file position only if a byte is read
    c = fp.read(1)
    # decide whether to read it
    if c == "?":
        # pretend we never read the byte
        fp.seek(p)
        break
    # now read the byte "for real"
    b = c
    if not b:
        # we've reached the end of the file
        break
fp.close()

--Blair



Re: OO conventions

2006-02-05 Thread Blair P. Houghton

Alex Martelli wrote:
> As you see, static methods have a small extra lookup cost (a couple
> hundred nanoseconds on my oldish laptop);

I would've expected the opposite effect...I guess the runtime
considers instances more active than the static portion of
a class.
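
This is easy enough to measure rather than guess at.  The sketch below
(current Python 3) times a call through a static method against a call
to a plain module-level function.  That pairing is an assumption about
what was compared, and the numbers will differ by machine and version,
so none are quoted here.

```python
import timeit


class C(object):
    @staticmethod
    def static_f():
        return None


def plain_f():
    return None


number = 100000
# Each statement includes the name lookup as well as the call itself.
t_static = timeit.timeit("C.static_f()", globals={"C": C}, number=number)
t_plain = timeit.timeit("plain_f()", globals={"plain_f": plain_f},
                        number=number)

print("static method: %.4f s for %d calls" % (t_static, number))
print("plain function: %.4f s for %d calls" % (t_plain, number))
```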

> "Premature
> optimization is the root of all evil in programming", as Knuth wrote
> quoting Hoare -- and anybody who's read Knuth's work knows he is
> anything BUT indifferent to real optimization; the key is avoiding that
> "premature" part!-)

Apropos of which someone mentioned in a thread on Slashdot
today about writing an entire program's code in Python first and
then optimizing portions to C or C++ only as performance warrants.

Seems like a good idea.  I've noticed Python is a lot easier to
get up-and-running with, even if you're as unfamiliar with it as
I am, compared to the other two.

--Blair



Re: Learning Python

2006-02-05 Thread Blair P. Houghton

Xavier Morel wrote:
> Where the hell did you get the idea of stacking input on a raw_input in
> the first place?

I'm guessing it goes something like:  "input is a verb, but raw_input
is a noun, so raw_input looks like a cast or conversion or stream
constructor, and input looks like an action..."

raw_input is a bad name for get_interactive_input, anyway...

--Blair



Re: numeric expression from string?

2006-02-06 Thread Blair P. Houghton
Steven wrote:
>Do you think your users might enter something Evil and break their own system?

That's not usually how it works.

How it usually works is:

1.  Innocent code-monkey writes nifty applet, posts on usenet.
2.  Innocent but dull-witted framework manufacturer includes nifty
applet in Next Big Thing framework.
3.  Innocent webmaster uses framework to design entire website,
dragging and dropping input boxes validated by nifty applet all over
the place.
4.  Budding malevolent self-deceived "just fooling around" script
kiddie enters evil string into vulnerable buffer passed nifty applet,
taking down innocent webmaster's system.  Posts astonishment on
#dickwar3z irc channel.
5.  Genuinely malevolent wiseguy/blackmailer/terrorist blackhat stores
sploit for later inclusion in rootkit-laying worm suite.
6.  Randal Schwartz goes to jail.

--Blair



Re: in over my head ascii

2006-02-06 Thread Blair P. Houghton
[EMAIL PROTECTED] wrote:
> What do you bet the server software was written by someone
> who thought ASCII STX meant literally the characters "STX"?

Wouldn't explain the "ENX" instead of "ETX".

> I've seen stupider things.

I give it 25% probability of being what you said, and 75% probability
that they didn't want to send any characters that couldn't be printed
in plaintext.

I've seen literally dozens of this kind of roll-your-own serial
protocol in the past few years, and they're all indicative that no two
are alike.  Ever read the ARINC 429 documents? They'll curl your toes.

--Blair
  "It's protocols all the way down."



Re: in over my head ascii

2006-02-10 Thread Blair P. Houghton

[EMAIL PROTECTED] wrote:

> so. how do i make 200 occupy 4 bytes ?

Did you double-check that they didn't want "0200" in ascii
in the message?

As in:

STX0200ENX

Because that's always possible.  And, if you're _lucky_, they designed
the innards of the message so that "ENX" can never appear at random.
Though with a length parameter, you shouldn't have to worry about
that...though with a length parameter the ENX is redundant...

Argh!

I'm having flashbacks.
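
For the record, the two readings of "200 in 4 bytes" look like this in
current Python 3.  The STX...ENX framing is the one from this thread.
The big-endian byte order in the binary version is purely an assumption,
since nothing here says which order the server wants.

```python
import struct

length = 200

# Reading 1: four ASCII digits, zero-padded, inside the printable framing.
ascii_field = "%04d" % length
frame = ("STX" + ascii_field + "ENX").encode("ascii")

# Reading 2: a 4-byte unsigned binary integer (big-endian assumed).
binary_field = struct.pack(">I", length)

print(frame)
print(len(ascii_field), len(binary_field))
```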

--Blair



Re: Python - Web Display Technology

2006-05-22 Thread Blair P. Houghton

Sybren Stuvel wrote:
> Heiko Wundram enlightened us with:
> > And: the web is a platform to offer _information_. Not to offer
> > shiny graphics/sound [...]
>
> Many would disagree...
>
> Not me, but I know a lot of people that would.

I would.  Most people would, once they realize that shiny/flashy is
information too.

High "production values" affect value-determining centers of the brain,
bypassing the linguistic and logical centers.  They make you understand
that the thing you're being presented is "worth something".

Most of the time, it's only worth a fat cash profit to the person doing
the presenting, who is giving you a piece of junk at an inflated price.
 But your brain doesn't care.  It's got a shortcut to your wallet, and
the information on the screen is accessing that.

--Blair



Re: Python - Web Display Technology

2006-05-23 Thread Blair P. Houghton

[EMAIL PROTECTED] wrote:
> SamFeltus wrote:
> > Here is a visual argument,
> > http://samfeltus.com/swf/contact_globes.swf
>
> Here's a text-based argument.
>
> If I search Google for "gardener, Athens, GA" then Google's spiders
> won't have recorded your contact page. So I don't find you as a local
> gardener, so I don't hire you for my mansion in Athens.
>
> Your contact page is arguably pretty, but pretty just isn't selling for
> that particular sort of page.

That's why Flash often comes with a heapin' helpin' o' metadata.

--Blair



Re: Timeline for Python?

2006-09-23 Thread Blair P. Houghton

wesley chun wrote:
>
> 1. never write against older versions of Python... you will only
> obsolete your book even faster (well, "sooner")

I believe there is some market for documentation of older
versions of software.  Many installations are constrained
by the cost of upgrading and can not migrate to a newer
version.  Meanwhile, they are laboring under the old
documentation, which, in the case of open-source stuff,
is often thoroughly inadequate.  Books with accurate
information would help them, and would have helped me, in
many cases.

But I don't think it's a big enough market that many people
would consider skipping the new-version market, which is
much bigger, when trying to decide what to write about.

--Blair



Re: Timeline for Python?

2006-09-24 Thread Blair P. Houghton

Lawrence D'Oliveiro wrote:
> In message <[EMAIL PROTECTED]>, Blair P.
> Houghton wrote:
>
> > wesley chun wrote:
> >>
> >> 1. never write against older versions of Python... you will only
> >> obsolete your book even faster (well, "sooner")
> >
> > I believe there is some market for documentation of older
> > versions of software.  Many installations are constrained
> > by the cost of upgrading and can not migrate to a newer
> > version.
>
> And they can afford to buy books??

Buying a book costs $40, max.

An hour of upgrade wrangling costs $40, min.  And it's never just one
person involved.  And it's never just an hour.  And yes, there are
plenty of places using software, especially "free" software, for which
any expense greater than a book is a major decision.

> If they're that strapped for cash, it's cheaper to access documentation on
> the Web.

Yes, it is, but the documentation on the web doesn't make the author
any money, which is why it's poorly constructed, poorly edited,
difficult to read, inaccurate, and abandoned when the next version of
the software comes out.

If book authors are reluctant to get paid to write new books for old
versions of systems, can you imagine how few people want to maintain
old web-docs for old versions of systems?

But there's a market there for someone willing to make a few bucks, and
a publisher with the savvy to find it and serve it.

--Blair



Re: Timeline for Python?

2006-09-25 Thread Blair P. Houghton

Aahz wrote:
> You did see my advice, seconded by Wes, that any book should cover the
> version differences?  How is that sufficiently inadequate that new books
> should specifically target older versions?

I think it's a good idea, but I also think that it may cause authors to
rely on the old documentation.  The problem is the old documentation
is inadequate.  New documentation should account for new things that
are known about the old versions, not just the implemented changes in
the software.

--Blair



Re: does anybody earn a living programming in python?

2006-09-27 Thread Blair P. Houghton

walterbyrd wrote:
> If so, I doubt there are many.
>
> I wonder why that is?

Because Java has Sun's crazy-money behind it, and that pisses Microsoft
off, so C# has MS's crazy-money behind it.  And long before that, C was
/the/ language because it was the only one that would allow you to
actually program systems properly.

I happen to know that Google does most of its admin scripting in
Python.  It can't be a small job, running a few hundred thousand
servers worldwide and keeping them all up to date for system and
security.

--Blair



Re: How can I correct an error in an old post?

2006-10-01 Thread Blair P. Houghton

[EMAIL PROTECTED] wrote:
> Hi.
> I was searching for some information regarding a problem and found an
> interesting post that includes an answer to the problem; though the
> post is very helpful it is based on a wrong assumption and thus the
> solution it suggests is incorrect. It took me some time to understand
> that the suggested solution is wrong and to find the correct solution
> and I wish to add my findings to the post, to prevent others from
> taking the wrong path.
> When I tried to reply to the post I received a reject message stating
> that it is impossible to reply to the topic since it is old and was
> closed by a manager.
> The question is how can I add this correction?
> Thanks.

Start a new thread and include the old post with your
corrections.

Yes, this means the erroneous post is still out there, and
does not link directly to yours, but that's the fault of people
who "close" threads, for they are among the stupidest
people on the net.

But if, as you did, someone else searches for the information,
the old post will come up in the search results, and yours will
too, because it contains the old post.  So anyone who bothers
to look at both will figure out what happened.

--Blair



Re: How can I correct an error in an old post?

2006-10-02 Thread Blair P. Houghton

Steve Holden wrote:
> Since this message was never on topic, I'd appreciate it if all
> concerned would close this thread now.

I already did.  How did you get in here?

--Blair



Re: How can I correct an error in an old post?

2006-10-05 Thread Blair P. Houghton

Tim Roberts wrote:
> Although it might be mirrored on a web site somewhere, this is a Usenet
> newsgroup.  It is impossible to "close" a thread.  The concept simply does
> not exist.

Google, the new de facto website of record for Usenet, disagrees.

But they do about 10 things totally wrong with Google groups that
I'd've fixed in my spare time in my first week if they'd hired me back
when I was interviewing with them.

So if they want it to work, they know where to find me.

--Blair

P.S.  Did I mention? Google's distributed systems are managed with
Python scripts.



Re: Google breaks Usenet (was Re: How can I correct an error in an old post?)

2006-10-05 Thread Blair P. Houghton

Aahz wrote:
> In article <[EMAIL PROTECTED]>,
> Blair P. Houghton <[EMAIL PROTECTED]> wrote:
> >
> >But they do about 10 things totally wrong with Google groups that
> >I'd've fixed in my spare time in my first week if they'd hired me back
> >when I was interviewing with them.
>
> Only ten?

I'm giving them the benefit of the doubt.

Their security must have /some/ sort of procedure for extracting
people repairing flaws...

I figure 7-10 days max.

--Blair



Re: Google breaks Usenet (was Re: How can I correct an error in an old post?)

2006-10-06 Thread Blair P. Houghton
Aahz wrote:
> In article <[EMAIL PROTECTED]>,
> Bryan Olson  <[EMAIL PROTECTED]> wrote:
> >Blair P. Houghton wrote:
> >>
> >> But they do about 10 things totally wrong with Google groups that
> >> I'd've fixed in my spare time in my first week if they'd hired me back
> >> when I was interviewing with them.
> >>
> >> So if they want it to work, they know where to find me.
> >
> >Doesn't seem likely, does it? But don't let it stop you. You don't
> >need Google's permission to build a better Usenet service. They
> >don't have any copyright on the posts, or other special protection.
> >I'm a former Googler myself and I use their service all the time,
> >but if yours is better I'll switch.
>
> The problem is the network effect.  In this case, what Google has that
> can't be replicated is the history of posts.

Exactly.

Usenet isn't just the "send this message to all leaf nodes via tree"
behavior, it's the "show me the message from 1987 or 1988 written by
dickie sexton where he invents the '(*plonk*)' meme" behavior, and a
lot of others.

It would be an interesting script that would crawl through Google's
online copy of the DejaNews archive (which itself was incomplete, by
the way) to replicate all of that, with complete headers, minus
Google's header munging.

--Blair



Re: Google breaks Usenet (was Re: How can I correct an error in an old post?)

2006-10-08 Thread Blair P. Houghton

Bryan Olson wrote:
> Blair P. Houghton wrote:
> > Usenet isn't just the "send this message to all leaf nodes via tree"
> > behavior, it's the "show me the message from 1987 or 1988 written by
> > dickie sexton where he invents the '(*plonk*)' meme" behavior, and a
> > lot of others.
>
> That makes Google the only non-broken Usenet service,
> the opposite of what the retitling of this thread claims.

It takes more than /one/ non-broken behavior to make
a non-broken system.

--Blair



Re: Google breaks Usenet (was Re: How can I correct an error in an old post?)

2006-10-08 Thread Blair P. Houghton

Bryan Olson wrote:
> Aahz wrote:
> > The problem is the network effect.  In this case, what Google has that
> > can't be replicated is the history of posts.
>
> There's no magic there. Get them the same way Google and
> Dejanews got them, plus you might scrape Google, from some
> locality with favorable laws.

You can do it in America, as long as you don't use their value-added
formatting or data.

Hence, stripping their headers from every message, and un-framing
the messages from their bins.

The original data, with its owner-copyrighted,
revokably-licensed-by-default attributes, can be recovered.

But some messages can't be seen because their owners already
asked Google not to display them.  They're still in the archive:
Google has as much right as anyone to keep what you sent them.
They just don't have the right to display them next to their ads if
you revoke their license.  But missing messages are comparatively
rare.

--Blair



Re: hundreds of seconds?

2006-10-11 Thread Blair P. Houghton

Tim Peters wrote:
> On Windows 98, time.time() typically updates only once per 0.055
> seconds (18.2 Hz), but time.clock() typically updates more than a
> million times per second.  You do /not/ want to use time.time() for
> sub-second time measurement on Windows.  Use time.clock() for this
> purpose on Windows.

Windows is not a real-time operating system.

Let me say that again:

Windows is not a real-time operating system.

The times you get from those functions will not always be
"the time I called the function".

They will always be "some time between the time
I called the function and it returned".

The difference being, sometimes when you call a function
it takes a lot longer to return because Windows has gone
and done something else unrelated to your program for
several seconds in-between.

You can reduce how often this happens by jacking up
the process priority for your program, but it never goes
away completely, and, because Windows is not a real-time
operating system, you can not predict with certainty
when these delays will occur and how long they will be.

Which may or may not matter, but people were talking
about trusting the number of bits in a floating-point
number to tell them the precision of the clock, so I
figured I should clear up another misconception while
they were learning not to do that, too...
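
A rough way to see the effect, sketched for current Python 3: time the
same call several times and look at the spread.  time.perf_counter() is
used because time.clock() was removed in Python 3.8.  The time_call
helper and the workload are made up for illustration.

```python
import time


def time_call(func, repeats=5):
    """Run func several times; return (fastest, slowest) in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return min(samples), max(samples)


fastest, slowest = time_call(lambda: sum(range(10000)))
print("fastest %.6f s, slowest %.6f s" % (fastest, slowest))

# The clock's advertised resolution says nothing about scheduling delays.
print(time.get_clock_info("perf_counter").resolution)
```

The fastest sample is the one least disturbed by whatever else the OS
was doing, and the gap to the slowest shows how much the disturbance
cost.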

--Blair



Re: merits of Lisp vs Python

2006-12-09 Thread Blair P. Houghton
Python doesn't annoyingly rip you out of the real world to code in it.

Anyone looking at a python script can get a sense of where it's going.

--Blair



Re: EVIDENCE: 911 was CONTROLLED DEMOLITION to make muslims 2nd class

2006-09-05 Thread Blair P. Houghton

>thermate

So the guy found burned aluminum on iron.

That doesn't mean there were military-grade incendiary devices anywhere
near the WTC.

You idiot.

--Blair



Re: best small database?

2006-09-11 Thread Blair P. Houghton

Larry Bates wrote:
> The filesystem is almost always the
> most efficient place to store files, not as blobs in a
> database.

I could get all theoretical about why that's not so in most cases,
but there are plenty of cases where it is so (especially when the
person doing the DB doesn't get the idea behind all filesystems,
which is that they are themselves simplified databases), so
I won't*.

In this case, the filesystem may be the best place to
do the work, because it's the cheapest to implement
and maintain.

--Blair

* - okay, I will

1.  Since the filesystem is a database, making accesses
to it after being directed there by a database means you're
using two database systems (and an intervening operating
system) to do one thing.  Serious databases work from
disks with no filesystem to get rid of that extra layer entirely.
There are benefits to having things in files reachable by
ordinary tools, and to having the OS mediating access to
the data, but you need to be sure you need those benefits
and can afford the overhead.  Academic in most cases,
including the one that started this thread.

2.  When using the filesystem as the database
you only get one kind of native association, and have to
use semantics in the directory and filenames to give you
hints as to the type stored at a particular location.  You get a
few pieces of accounting data (mod times, etc.) in the
directory listing, but can't associate anything else with
the file directly.  You'd have to create another
file that has the associated data in it, or stuff the extra
data in the file itself, and then each file becomes
a database...see where it goes?  Sometimes it's better
to come up with a schema you can extend rationally to
fit the problem you are trying to solve.
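
To make point 2 concrete, here is a minimal sketch of the two options:
a sidecar file holding the extra attributes next to each data file, and
a small catalog table that keeps the attributes in one queryable place.
Everything named here is hypothetical and only for illustration: the
".meta.json" suffix, the "items" table and its columns, and the helper
functions.  It assumes a Python with the sqlite3 and json modules in the
standard library.

```python
import json
import sqlite3


def write_with_sidecar(path, payload, attrs):
    """Store the data in one file and its attributes in a second file."""
    with open(path, "wb") as f:
        f.write(payload)
    # Hypothetical convention: attributes live in "<path>.meta.json".
    with open(path + ".meta.json", "w") as f:
        json.dump(attrs, f)


def read_sidecar(path):
    """Read back the attributes stored beside the data file."""
    with open(path + ".meta.json") as f:
        return json.load(f)


def make_catalog(conn):
    """Create a catalog table; the columns are illustrative assumptions."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "path TEXT PRIMARY KEY, kind TEXT, author TEXT)"
    )


def add_item(conn, path, kind, author):
    """Record one file's attributes in the catalog."""
    conn.execute("INSERT INTO items VALUES (?, ?, ?)", (path, kind, author))


def find_by_kind(conn, kind):
    """Answer an attribute query without opening any data file."""
    rows = conn.execute(
        "SELECT path FROM items WHERE kind = ? ORDER BY path", (kind,)
    )
    return [row[0] for row in rows]
```

With sidecars, finding every file of a given kind means opening every
sidecar; with the catalog it is one query, and the data files can still
live in the filesystem if that is the cheapest place for them.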

--Blair



Re: best small database?

2006-09-12 Thread Blair P. Houghton

Larry Bates wrote:
> As far as "rational extension" is concerned, I think I can relate.
> As a developer of imaging systems that store multiple-millions of
> scanned pieces of paper online for customers, I can promise you
> the file system is quite efficient at storing files (and that is
> what the OP asked for in the original post) and way better than
> storing in Oracle blobs.  Can you store them in the database,
> absolutely.  Is it efficient and manageable.  It has been our
> experience that it is not.  Ever tried to upgrade Oracle 9 to
> Oracle 10 with a Tb of blobs?

Can't be any harder than switching between incompatible filesystems,
unless you assume it should "just work...".

--Blair



Re: best small database?

2006-09-13 Thread Blair P. Houghton

Fredrik Lundh wrote:
> Blair P. Houghton wrote:
>
> > Can't be any harder than switching between incompatible filesystems,
> > unless you assume it should "just work...".
>
> so what file systems are you using that don't support file names and
> binary data ?

Mmmm, no.

I'm saying that the change from Oracle 9 to Oracle 10 is like changing
from ffs to fat32.

They have different structures related to the location and
identification of every stored object.  Sometimes different storage
structures (block sizes, block organization, fragmentation rules, etc.)
for the insides of a file.

A filesystem is a specialized database that stores generalized data.

The value of a database program and its data storage system is that you
can get the filesystem out of the way, and deal only in one layer of
searching and retrieval.

A DB may be only trivially more efficient when the data are a
collection of very large objects with a few externally associated
attributes that can all be found in the average filesystem's directory
structures; but a DB doing raw accesses on a bare disk is a big
improvement in speed when dealing with a huge collection of relatively
small data, each with a relatively large number of inconsistently
associated attributes.

The tradeoff is that you give your DB vendor the power to make you
offload and reload that disk whenever they change their system
between versions.

--Blair



Re: best small database?

2006-09-14 Thread Blair P. Houghton

Fredrik Lundh wrote:
> Blair P. Houghton wrote:
> > I'm saying that the change from Oracle 9 to Oracle 10 is like changing
> > from ffs to fat32.
>
> well, I'm quite sure that the people I know who's spending a lot of
> their time moving stuff from Oracle N to Oracle N+1 (and sometimes
> getting stuck, due to incompatibilities between SQL and SQL and a lack
> of infinite resources) would say you're completely and utterly nuts.

Maybe they'd just be hyperbolic from the frustration.  Filesystems
/are/ databases, and incompatibilities /are/ incompatibilities.  And
without ANSI, the SQL problem could be like incompatibilities in C.
Not unheard-of.  Not at all.

--Blair



Re: RegEx conditional search and replace

2006-07-06 Thread Blair P. Houghton

mbstevens wrote:
> In such a case you may need to make the page
> into one string to search if you don't want to use some complex
> method of tracking state with variables as you move from
> string to string.

In general it's a very hard problem to do stateful regexes.

I recall something from last year about the new Perl implementation
that tried to address this sort of problem.  But I may have been
reading old docs and it could have been done years ago.

Parsing the HTML would be the only sure way to accomplish
it.  Let something that already knows the hierarchy tell you
that you're entering a URL and you can skip past all of its
recursive inclusions of strings with URLs with strings that
have URLs and so on...

Of course, that means reconstructing the HTML from the
parse tree afterward...
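
A rough sketch of that approach, assuming the goal is to replace a
string everywhere except inside links: let the standard library's
html.parser track when we are inside an <a> element, and rebuild the
HTML as we go.  The class and function names are hypothetical, and the
rebuilt markup is only as faithful as the parser's events allow (end
tags come back lowercased, for instance), so treat it as an
illustration rather than a finished tool.

```python
from html.parser import HTMLParser


class OutsideLinkReplacer(HTMLParser):
    """Replace text outside <a>...</a> while re-emitting the markup."""

    def __init__(self, old, new):
        # Keep entity and character references as separate events so
        # they can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.old = old
        self.new = new
        self.depth = 0   # how many <a> elements we are currently inside
        self.out = []    # pieces of the reconstructed document

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1
        # Emit the tag exactly as it appeared in the source.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a link.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Only touch text that is not inside a link.
        if self.depth == 0:
            data = data.replace(self.old, self.new)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def replace_outside_links(html, old, new):
    """Return html with old replaced by new, except within <a> elements."""
    parser = OutsideLinkReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The state that a regex would have to carry from string to string is
here just the depth counter, and attribute values such as href are
never touched because they arrive as part of the tag, not as text.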

--Blair
