Re: Custom alphabetical sort

2012-12-24 Thread Dave Angel
On 12/24/2012 06:19 PM, Pander Musubi wrote:
> 

> to prevent
>
> Traceback (most recent call last):
>   File "./sort.py", line 23, in <module>
>     things_to_sort.sort(key=string2sortlist)
>   File "./sort.py", line 15, in string2sortlist
>     return [hashindex[s] for s in string]
> KeyError: '\xc3'
>
> Thanks very much for this efficient code.

Perhaps you missed Ian Kelly's correction of Thomas Bach's approach:

d = { k: v for v, k in enumerate(cs) }


def collate(x):
    return list(map(d.get, x))

sorted(data, key=collate)

I'd use Ian Kelly's approach.  It's not only more compact, it shouldn't
give an exception for a character not in the table.  At least, not for
Python 2.x.  I'm not sure about Python 3, since it can give an exception
comparing None to int.
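
To make the Python 3 concern concrete, here is a minimal sketch (written for Python 3, with a shortened, made-up alphabet standing in for the real cs). Instead of letting d.get return None for an unknown character, it supplies a default rank past the end of the table, so unknown characters sort last and never get compared as None. The default rank is my assumption, not something from Ian's post.

```python
# Shortened, hypothetical alphabet; the real table from the thread is much longer.
cs = " .'-0123456789aAäÄbB"
d = {k: v for v, k in enumerate(cs)}

def collate(x):
    # Characters missing from the table get rank len(cs), i.e. they sort last,
    # so the key never contains None (which Python 3 refuses to compare to int).
    return [d.get(c, len(cs)) for c in x]

data = ["b.", "äa", "a-", "zz", "Aa"]
result = sorted(data, key=collate)
print(result)
```

Whether unknown characters should sort last or raise an error is a design choice; raising is probably safer for a dictionary project, since a silent default can hide bad input.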


-- 

DaveA

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Newbie] Require help migrating from Perl to Python 2.7 (namespaces)

2012-12-24 Thread Ranting Rick
On Dec 24, 9:48 am, Dave Angel  wrote:
> Pep8 recommends a particular style within a function name, separating
> words of a name by underscores.  I happen to loathe that style, so I'm
> clearly not the one who would critique someone for not following the
> guideline.  I say getFile(), the pep says get_file().

Slightly off topic, but still quite relevant: I happen to like that
style for public methods (even though Python has no real public/
private methods).

class Foo():
    def __init__(self): pass
    def __secretMethod(self): pass            # Secret handshake required!
    def _privateMethodOrAccessor(self): pass  # Self only.
    def sharedMethod(self): pass              # Self and/or descendants only.
    def public_method(self): pass             # Total whore.


Re: Custom alphabetical sort

2012-12-24 Thread Pander Musubi
On Monday, December 24, 2012 7:12:43 PM UTC+1, Joshua Landau wrote:
> On 24 December 2012 16:18, Roy Smith  wrote:
>
> In article <40d108ec-b019-4829-a969-c8ef51386...@googlegroups.com>,
>  Pander Musubi  wrote:
>
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.
>
> Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts.  One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>             '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
>             [...]
>             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}
>
> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
>    "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
>    return [map1[c] for c in s]
>
> def decode(l):
>    "decode([34, 19, 19]) ==> 'foo'"
>    return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> This isn't needed and the not-so-new way to do this is through .sort's key
> argument.
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> changes to
>
> original_list.sort(key=encode)
>
> [Which happens to be faster ]
>
> Hence you need neither map2 nor decode:
>
> ## CODE ##
>
> alphabet = (
>   ' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
>   'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â',
>   'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E',
>   'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
>   'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î',
>   'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L',
>   'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô',
>   'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q',
>   'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û',
>   'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X',
>   'y', 'Y', 'z', 'Z'
> )
>
> hashindex = {character:index for index, character in enumerate(alphabet)}
>
> def string2sortlist(string):
>     return [hashindex[s] for s in string]
>
> # Quickly make some stuff to sort. Let's try 200k, as that's what's suggested.
> import random
> things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6)))
>                   for _ in range(20)]
>
> print(things_to_sort[:15])
>
> things_to_sort.sort(key=string2sortlist)
>
> print(things_to_sort[:15])
>
> ## END CODE ##
>
> Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to
> Tomas Bach's method.
With Python2.7 I had to use

alphabet = (
u' ', u'.', u'\'', u'-', u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', 
u'9', u'a', u'A', u'ä', u'Ä', u'á', u'Á', u'â', u'Â',
u'à', u'À', u'å', u'Å', u'b', u'B', u'c', u'C', u'ç', u'Ç', u'd', u'D', u'e', 
u'E', u'ë', u'Ë', u'é', u'É', u'ê', u'Ê', u'è', u'È',
u'f', u'F', u'g', u'G', u'h', u'H', u'i', u'I', u'ï', u'Ï', u'í', u'Í', u'î', 
u'Î', u'ì', u'Ì', u'j', u'J', u'k', u'K', u'l', u'L',
u'm', u'M', u'n', u'ñ', u'N', u'Ñ', u'o', u'O', u'ö', u'Ö', u'ó', u'Ó', u'ô', 
u'Ô', u'ò', u'Ò', u'ø', u'Ø', u'p', u'P', u'q', u'Q',
u'r', u'R', u's', u'S', u't', u'T', u'u', u'U', u'ü', u'Ü', u'ú', u'Ú', u'û', 
u'Û', u'ù', u'Ù', u'v', u'V', u'w', u'W', u'x', u'X',
u'y', u'Y', u'z', u'Z'
)

to prevent

Traceback (most recent call last):
  File "./sort.py", line 23, in <module>
    things_to_sort.sort(key=string2sortlist)
  File "./sort.py", line 15, in string2sortlist
    return [hashindex[s

Re: [Help] [Newbie] Require help migrating from Perl to Python 2.7 (namespaces)

2012-12-24 Thread Dave Angel
(Part 3 of my dissertation;  I hope it's useful for you in particular)

Up to now in my discussion, it wasn't usually important to know that
everything is a class.  You just know that everything has attributes,
and that you use the dot notation to get at an attribute.   So what if
"{:x}".format() is a method of the class str, you can just learn
"this is how you can format text".  But once you learn how to write
classes, it starts to all come together.

class  MyClass (myParentClass):

This line, and all indented lines below it, constitute a class.  The
defs inside are functions, but they're also methods, with special
behavior of the first parameter, called 'self' by convention.  So a def
might look like:
    def myMethod(self, arg1):
        return arg1 * 2

Now, if somebody has an instance of your class, they can call this
method like:
  myinstance.myMethod(42)
and the return value will be 84.  This is remarkably like the notation
we used before inside a module.  So a class *could* be used just to
encode the namespace. But the power comes when there are attributes on
the object, and when self is used to get at those attributes.  In that
situation, the self can refer to all kinds of data.  In a sense you
could think of that data as being like the globals of a module, except
for one very important thing.  There's only one instance of the module,
so those globals are shared between everyone.  But with an instance, you
can have many instances, and each has its own set of attributes.
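
A small sketch of that last point, with a made-up Counter class (the name and its bump() method are purely illustrative, not from any library). Each instance carries its own count, where a module-level global would be shared by everyone:

```python
class Counter(object):
    def __init__(self):
        self.count = 0          # attribute stored on this particular instance

    def bump(self):
        self.count += 1         # self picks out the instance the method was called on
        return self.count

a = Counter()
b = Counter()
a.bump()
a.bump()
b.bump()
print(a.count, b.count)         # the two instances do not share state
```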

Simplest way to illustrate this is with a MyFile class.  There's already
a very nice class in the standard library, with a builtin function
open() to create a new instance.  But we can pretend to write such a
class, and see how making it a class is probably better than any other
way to code the functionality.  And in fact, many people have done
something like that, to reference something analogous to a file.

When we open a file, some code somewhere has to keep track of the
system's file handle, the file position, the mode of operation, any
buffers that might be used, etc.  Whatever that data is, if it's kept in
an instance, then it's possible to open multiple files at the same time,
some for writing, some for reading, etc.  So how might we go about doing
that?

class MyFile(object):
    def __init__(self, filename, mode):
        self.filename = filename    # remember the filename, in case someone wants it
        self.handle = someOScall(filename, mode)    # do whatever it might take to actually open a file
        self.position = 0
        self.opened = True

    def read(self, size):
        data = someOScallToRead(self.handle, size)    # do whatever it takes to read some data
        self.position += size    # probably it's more complicated than this
        return data

Now __init__() is a special method name.  It's called implicitly when an
object of this class is made.  A newly created dummy object of the right
type is passed to __init__() as self, and we want to stuff that object
with the appropriate attributes (typically a dozen or more).

read() is a simple method, although it could be *much* more complex for
a real file.  It might do buffering, or it might translate characters,
or almost anything.  But by the time it's called, we know the object's
__init__() method has been called, a file is open, the handle
initialized, etc.

So now the user can create two different objects:
 file1 = MyFile("firstfile.txt", 12)
 file2 = MyFile("otherdirectory/secondfile.txt", 49)

and can unambiguously read from whichever one she likes.
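
For anyone who wants to run something, here is one possible concrete version of the sketch above. It is only an illustration under my own assumptions: the placeholder OS calls are swapped for os.open/os.read, the second constructor argument becomes an os.open flags value rather than the "mode" above, and a close() method is added so the example can clean up after itself.

```python
import os
import tempfile

class MyFile(object):
    """Toy file wrapper; each instance keeps its own handle and position."""
    def __init__(self, filename, flags=os.O_RDONLY):
        self.filename = filename
        self.handle = os.open(filename, flags)   # OS-level file descriptor
        self.position = 0
        self.opened = True

    def read(self, size):
        data = os.read(self.handle, size)        # may return fewer bytes than asked for
        self.position += len(data)
        return data

    def close(self):
        os.close(self.handle)
        self.opened = False

# Create two scratch files so the example is self-contained.
paths = []
for text in (b"first file contents", b"second file contents"):
    fd, path = tempfile.mkstemp()
    os.write(fd, text)
    os.close(fd)
    paths.append(path)

file1 = MyFile(paths[0])
file2 = MyFile(paths[1])
chunk1 = file1.read(5)      # advances only file1's position
chunk2 = file2.read(6)      # advances only file2's position
positions = (file1.position, file2.position)

for f in (file1, file2):
    f.close()
    os.remove(f.filename)
```

The two reads do not interfere with each other because the handle and position live on the instances, not in module globals.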

This notion that the attributes of the object carry all its necessary
data, and that the user of the class need not know any of the details is
the reason one can readily write code in separate modules that knows
little about the internals.  Just pass the necessary object around, and
if the class was done right, it'll be ready to do all the operations
defined on it.

Hope this helps. It's just barely scratched the surface of what's possible.


-- 

DaveA



Re: Custom alphabetical sort

2012-12-24 Thread Steven D'Aprano
On Mon, 24 Dec 2012 11:18:37 -0500, Roy Smith wrote:

> In article <40d108ec-b019-4829-a969-c8ef51386...@googlegroups.com>,
>  Pander Musubi  wrote:
> 
>> Hi all,
>>
>> I would like to sort according to this order:
[...]
> I'm assuming that doesn't correspond to some standard locale's collating
> order, so we really do need to roll our own encoding (and that you have
> a good reason for wanting to do this).  I'm also assuming that what I'm
> seeing as question marks are really accented characters in some encoding
> that my news reader just isn't dealing with (it seems to think your post
> was in ISO-2022-CN (Simplified Chinese).

Good lord man, what sort of crappy newsreader software are you using? (It 
claims to be "MT-NewsWatcher/3.5.3b3 (Intel Mac OS X)" -- I think 
anything as bad as that shouldn't advertise what it is.) The OP's post 
was correctly labelled with an encoding, and not an obscure one:

Content-Type: text/plain; charset=ISO-8859-1

which if I remember correctly is Latin-1. If your newsreader can't handle 
that, surely it should default to UTF-8, which should give you the right 
results sans question marks.




-- 
Steven




Re: Custom alphabetical sort

2012-12-24 Thread Mark Lawrence

On 24/12/2012 17:40, Roy Smith wrote:
> In article <46db479a-d16f-4f64-aaf2-76de65418...@googlegroups.com>,
>   Pander Musubi  wrote:
>
>>> I'm assuming that doesn't correspond to some standard locale's collating
>>> order, so we really do need to roll our own encoding (and that you have
>>> a good reason for wanting to do this).
>>
>> It is for creating a Dutch dictionary.
>
> Wait a minute.  You're telling me that Python, of all languages, doesn't
> have a built-in way to sort Dutch words???



There's a built-in called secret that's only available to those who are 
Dutch and members of the PSU.


A slight aside, I understand that the BDFL is currently on holiday.  For 
those who want a revolution now is as good a time as any :)


--
Cheers.

Mark Lawrence.



Re: Integer as raw hex string?

2012-12-24 Thread MRAB

On 2012-12-24 15:58, Tim Chase wrote:
> On 12/24/12 09:36, Roy Smith wrote:
>> I have an integer that I want to encode as a hex string, but I don't
>> want "0x" at the beginning, nor do I want "L" at the end if it happened
>> to be a long.  The result needs to be something I can pass to int(h, 16)
>> to get back my original integer.
>>
>> The brute force way works:
>>
>>    h = hex(i)
>>    assert h.startswith('0x')
>>    h = h[2:]
>>    if h.endswith('L'):
>>        h = h[:-1]
>>
>> but I'm wondering if there's some built-in call which gives me what I
>> want directly.  Python 2.7.
>
> Would something like
>
>    h = "%08x" % i
>
> or
>
>    h = "%x" % i
>
> work for you?

Or:

h = "{:x}".format(i)
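
A quick round-trip check of that, as a sketch (the helper name to_hex is mine, just for illustration). The format call emits neither the "0x" prefix nor a trailing "L", so the result goes straight back through int(h, 16):

```python
def to_hex(i):
    # Bare hex digits: no "0x" prefix and no "L" suffix, even for big values.
    return "{:x}".format(i)

values = [0, 255, 2**64 + 1, -255]
encoded = [to_hex(v) for v in values]
round_trip = [int(h, 16) for h in encoded]
print(encoded)
```

Negative numbers come out with a leading minus sign, which int(h, 16) also accepts.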



Re: Custom alphabetical sort

2012-12-24 Thread Joshua Landau
On 24 December 2012 16:18, Roy Smith  wrote:

> In article <40d108ec-b019-4829-a969-c8ef51386...@googlegroups.com>,
>  Pander Musubi  wrote:
>
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.
>
> Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts.  One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>             '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
>             [...]
>             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}
>
> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
>    "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
>    return [map1[c] for c in s]
>
> def decode(l):
>    "decode([34, 19, 19]) ==> 'foo'"
>    return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>

This isn't needed and the not-so-new way to do this is through .sort's key
argument.

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]

changes to

original_list.sort(key=encode)

[Which happens to be faster ]

Hence you need neither map2 nor decode:

## CODE ##

alphabet = (
' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â',
 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë',
'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì',
'Ì', 'j', 'J', 'k', 'K', 'l', 'L',
 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò',
'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q',
 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù',
'Ù', 'v', 'V', 'w', 'W', 'x', 'X',
 'y', 'Y', 'z', 'Z'
)

hashindex = {character:index for index, character in enumerate(alphabet)}
def string2sortlist(string):
    return [hashindex[s] for s in string]

# Quickly make some stuff to sort. Let's try 200k, as that's what's suggested.
import random
things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6)))
                  for _ in range(20)]

print(things_to_sort[:15])

things_to_sort.sort(key=string2sortlist)

print(things_to_sort[:15])

## END CODE ##

Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to
Tomas Bach's method.
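
For comparison, here is a small sketch that runs the decorate-sort-undecorate version next to the key= version on a toy alphabet (five characters I picked for illustration, not the real table). Both give the same order; key= just does the decorating behind the scenes.

```python
# Toy alphabet, purely illustrative; the real one is the long tuple above.
alphabet = "aAäbB"
rank = {c: n for n, c in enumerate(alphabet)}

def encode(s):
    return [rank[c] for c in s]

words = ["Ba", "äb", "ab", "Ab"]

# Decorate-sort-undecorate: pair each word with its encoded form, sort, unpack.
decorated = sorted((encode(w), w) for w in words)
dsu = [w for _, w in decorated]

# Same ordering with the key argument; no decode step or reverse map needed.
keyed = sorted(words, key=encode)
print(keyed)
```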


Re: help with making my code more efficient

2012-12-24 Thread larry.mart...@gmail.com
On Friday, December 21, 2012 11:47:10 PM UTC-7, Dave Angel wrote:
> On 12/21/2012 11:47 PM, larry.mart...@gmail.com wrote: 
> > On Friday, December 21, 2012 8:19:37 PM UTC-7, Dave Angel wrote:
> >> On 12/21/2012 03:36 PM, larry.mart...@gmail.com wrote:
> >>
> > I think you're misunderstanding what I need to do. I have a set of rows 
> > from the database with tool, time, and message. The user has specified a 
> > string and a time threshold. From that set of rows I need to return all the 
> > rows that contain the user's string and all the other rows that match the 
> > tool from the matched rows and have a time within the threshold. 
> >
> > cdata has all the rows. messageTimes has the times of all the matched 
> > messages, keyed by tool. In determine() I don't look though cdata - it gets 
> > one element from cdata and I see if that should be selected because it 
> > either matches the user's message, or it's within the threshold of one that 
> > did match.
> > 
> > Here's my original code: 
> >
> > # record time for each message matching the specified message for each tool
> > messageTimes = {}
> > for row in cdata:   # tool, time, message
> >     if self.message in row[2]:
> >         messageTimes[row[0], row[1]] = 1
> >
> > # now pull out each message that is within the time diff for each matched
> > # message, as well as the matched messages themselves
> >
> > def determine(tup):
> >     if self.message in tup[2]: return True  # matched message
> >
> >     for (tool, date_time) in messageTimes:
> >         if tool == tup[0]:
> >             if abs(date_time-tup[1]) <= tdiff:
> >                 return True
> >
> >     return False
> >
> > cdata[:] = [tup for tup in cdata if determine(tup)]
> >
> > Here's the code now: 
> > 
> > 
> > # Parse data and make a list of the time for each message matching
> > # the specified message for each tool
> > messageTimes = defaultdict(list)    # a dict with sorted lists
> >
> > for row in cdata:   # tool, time, message
> >     if self.message in row[2]:
> >         messageTimes[row[0]].append(row[1])
> >
> > # now pull out each message that is within the time context for
> > # each matched message
> > # as well as the matched messages themselves
> >
> > # return true if we should keep this message
> > def determine(tup):
> >     if self.message in tup[2]: return True    # matched message
> >     if seconds == 0: return False             # no time context specified
> >
> >     times = messageTimes[tup[0]]    # get the list of matched messages for this tool
> >
> >     le = bisect.bisect_right(times, tup[1])   # find time less than or equal to tup[1]
> >     ge = bisect.bisect_left(times, tup[1])    # find time greater than or equal to tup[1]
> >     return ((le and tup[1]-times[le-1] <= tdiff) or
> >             (ge != len(times) and times[ge]-tup[1] <= tdiff))
> >
> > cdata = [tup for tup in cdata if determine(tup)]
> > 
> > In my test case, cdata started with 600k rows, 30k matched the users 
> > string, and a total of 110k needed to be returned (which is what cdata 
> > ended up with) - the 30k that matched the string, and 80k that were within 
> > the time threshold.  
> >
> > I think the point you may have missed is the tool - I only return a row if 
> > it's the same tool as a matched message and within the threshold. 
> > 
> > I hope I've explained this better. Thanks again.   
> 
> That is better, and the point I missed noticing before is that 
> messageTimes is substantially smaller than cdata;  it's already been 
> filtered down by looking for self.message in its row[2].  The code was 
> there, but I didn't relate.  Remember I was bothered that you didn't 
> look at tup[2] when you were comparing for time-similarity.  Well, you 
> did that implicitly, since messageTimes was already filtered.  Sorry 
> about that. 
> 
> That also lowers my expectations for improvement ratio, since instead of 
> 600,000 * 600,000, we're talking "only" 600,000 * 30,000, 5% as much. So
> now my expectations are only 4:1 to 10:1. 
>
> Still, there's room for improvement.  (1) You should only need one
> bisect in determine, and (2) if you remember the last result for each 
> tool, you could speed that one up some. 
>
> (1) Instead of getting both an le and a ge, get just one, by searching 
> for tup[1]-tdiff.  Then by comparing that row's value against 
> tup[1]+tdiff, you can return immediately the boolean, the expression 
> being about half of the one you've now got.

Dave, I cannot thank you enough. With this change it went from 20 minutes to 10.
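
For the archives, this is roughly what suggestion (1) looks like as a standalone sketch. The function name within_threshold is made up for illustration, and it uses plain numbers where my real code has datetimes and a timedelta; it assumes the times list for the tool is already sorted.

```python
import bisect

def within_threshold(times, t, tdiff):
    """True if some value in the sorted list `times` lies in [t - tdiff, t + tdiff]."""
    # One bisect: index of the first matched time that is >= t - tdiff.
    i = bisect.bisect_left(times, t - tdiff)
    # If that entry exists and is not past t + tdiff, it is inside the window.
    return i < len(times) and times[i] <= t + tdiff

times = [10, 20, 30]                         # sorted times of matched messages for one tool
near = within_threshold(times, 22, 3)        # 20 is within 3 of 22
far = within_threshold(times, 25, 3)         # nothing between 22 and 28
print(near, far)
```

This only works if each per-tool list really is sorted, which depends on the rows coming back from the database in time order.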

> (2) Make a dict of ints, keyed by the tool, and initialized to zero. 
> Call that dict "found."  Each time you do a bisect, specify a range 
> starting at found[tool].  Similarly, store the result of th

Re: Custom alphabetical sort

2012-12-24 Thread Pander Musubi
> > > I'm assuming that doesn't correspond to some standard locale's collating
> > > order, so we really do need to roll our own encoding (and that you have
> > > a good reason for wanting to do this).
> >
> > It is for creating a Dutch dictionary.
>
> Wait a minute.  You're telling me that Python, of all languages, doesn't
> have a built-in way to sort Dutch words???

Not when you want Roman characters with diacritics to be sorted in the normal 
a-Z range.


Re: Custom alphabetical sort

2012-12-24 Thread Roy Smith
In article <46db479a-d16f-4f64-aaf2-76de65418...@googlegroups.com>,
 Pander Musubi  wrote:

> > I'm assuming that doesn't correspond to some standard locale's collating 
> > order, so we really do need to roll our own encoding (and that you have 
> > a good reason for wanting to do this).
> 
> It is for creating a Dutch dictionary.

Wait a minute.  You're telling me that Python, of all languages, doesn't 
have a built-in way to sort Dutch words???


Re: Custom alphabetical sort

2012-12-24 Thread Ian Kelly
On Dec 24, 2012 9:37 AM, "Pander Musubi"  wrote:

> > >>> ''.join(sorted(random.sample(cs, 20), key=d.get))
> >
> > '5aAàÀåBCçËÉíÎLÖøquùx'
>
> This doesn't work for words with more than one character:

Try this instead:

def collate(x):
    return list(map(d.get, x))

sorted(data, key=collate)

I would also probably change "d.get" to "d.__getitem__" for a clearer error
message in the case the string contains characters that it doesn't know how
to sort.
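
A minimal sketch of what that buys you, using a three-letter stand-in for the real table (the names cs and collate_strict are illustrative only). With d.__getitem__, the first unknown character raises KeyError naming the offending character, rather than putting a None into the sort key:

```python
cs = "abc"                                  # stand-in alphabet for illustration
d = {k: v for v, k in enumerate(cs)}

def collate_strict(x):
    # d.__getitem__ raises KeyError for characters missing from the table.
    return list(map(d.__getitem__, x))

unknown = None
try:
    sorted(["ab", "a?"], key=collate_strict)
except KeyError as exc:
    unknown = exc.args[0]                   # the character that was not in the table
print(unknown)
```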


Re: [Help] [Newbie] Require help migrating from Perl to Python 2.7 (namespaces)

2012-12-24 Thread Dave Angel
Python is a flexible language, but manages to let one write readable
code even while using that flexibility.  It does, however, require that
one gets a grasp of some concepts that may differ greatly, either in
implementation or in name, from other languages.  Every language has its
quirks and constraints, but Python is usually about self-discipline,
rather than the language forcing it down our throat.

So you can write object-oriented code, or you can (for the most part)
forget that objects exist, and use them without realizing.  You can
write functional code (mostly), or you can ignore that concept.  The
main thing I can think of that you can't do is "goto."   Unlike C, 
BASIC, and assembler.

Python 2.7, if not stated otherwise.  3.x has some new rules.  I'm going
to assume you're familiar with lots of concepts, and have played with
the language.  So I'm not going to define print, or the plus sign.

So what are the building blocks?  Everything in Python is an object, or
becomes one soon after you look at it.  Open a file, you get an object. 
Do some arithmetic, you get an object.

So how do you create one?  As soon as you launch the compiler on your
source script, you're creating a module object.  Inside that module
object are attributes that we usually call global variables, and other
attributes we call functions.  And maybe attributes we call classes. 
These are created while compiling that script.  How do we reference
those objects?  By using their names.  That's what's meant by global,
they're universally accessible.  Oops, that's until we start making more
modules, which we'll get to later.

Python also has a bunch of modules it already loaded for us, including
the one called builtins.  Those objects appear magically in every
module, so they appear global as well.  What about objects in some
library function?  First we have to import the module, then we reference
the object by both the module name and the name in THAT global space. 
(That's why the word global was probably unfortunate).  So to get at the
function object called "sin" in the module called "math", we do the
following two statements:

import math
print  math.sin(1)

What did that import do?  It simply created a new global called math,
and bound it to the module by that name, compiling it and loading it if
necessary.  Once we have imported it, we can use ALL of its globals
(functions, variables, and classes, mostly) simply by the magic of the
dot.   math.sin   means look up the global math, then within that object
look up the attribute sin.  The syntax doesn't care that math is a
module, the same syntax would work if math was a file you just opened,
and sin was a File method.  So to read data from a file, we might do

myfile = open("filename")
print myfile.read(10)

Now, we glibly used 1,10, and "filename" in the above.  What are they? 
They're literal objects.  In the source code you give enough information
to create an object of a particular builtin type (like int and str), and
by the time the code is running, these objects are created in some
magical space we might call "the literal pool."  How that works doesn't
matter to anybody except to the guys building the compiler/interpreter.

So some names are global, and some are attributes.  I think that's it
for names.  There are constraints on what characters make up a name
(though in 3.x, that restricts one to a few hundred thousand symbols or
so), but that's about it.
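
The import and dot lookups described earlier can be poked at directly. A small sketch (written with Python 3 syntax, though the idea is the same in 2.7): the name math is bound to the module object the interpreter keeps in sys.modules, and the dot is an ordinary attribute lookup that getattr() spells out.

```python
import sys
import math

# "import math" bound the name math to the module object the interpreter loaded.
bound_to_module = sys.modules["math"] is math

# math.sin is an attribute lookup on that object; getattr() does the same thing.
sin = getattr(math, "sin")
same_object = sin is math.sin
value = sin(0)
print(bound_to_module, same_object, value)
```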

Do all objects have names?  Nope.  We also have the notion of
containers, and we have special syntax to access them.  So a list can
reference dozens, or millions of objects, and none of them have an
independent name.  And the list might not either, if it came from a
literal  print [1,2,3].

What about a dictionary.  Funny thing, a dictionary works much like
attributes, only with a special syntax, and no restrictions on what the
keys can be.  In fact, under the covers, attributes are just items in an
invisible dictionary of an ordinary object.  (Note that invisible means
you don't usually care, in Python it's nearly always possible to peek
through the keyhole to see these things, but it can be very confusing,
and even harder to discuss)
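
If you do want to peek through the keyhole, here is a short sketch with a throwaway class (Thing is a made-up name). For an ordinary instance, vars() hands back the dictionary that holds its attributes, and changes made through either route show up in the other:

```python
class Thing(object):
    pass

t = Thing()
t.color = "red"             # ordinary attribute assignment

attrs = vars(t)             # the instance's attribute dictionary (t.__dict__)
attrs["size"] = 3           # adding a dictionary item creates an attribute
print(attrs, t.size)
```

Not every object has such a dictionary (many builtin types do not), so treat this as a look under the hood rather than something to rely on.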

Now we've been glibly referring to global items "having values" and
attributes "having values"  and in fact much of the literature refers to
variables.  But in a real sense, there are no variables at all.  There
is what's called binding.  A name is bound to an object.  Any object,
created any way, and located any way.  A list entry is bound to an
object.  So the object itself has no name.  It has a type, and it probably
has attributes (actually it always does, but they may be "invisible"). 
But a name might be bound to that object at some time, or ten names, or
none at all.  And the object (probably) ceases to exist once there's no
way to find it.  If it's been bound to three names, and you rebind all
three names, then the object goes away after the third name is rebound. 
If its bound 

Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Roy Smith
In article ,
 Alister  wrote:

> Indeed due to the poor quality of most websites it is not possible to be 
> 100% accurate for all sites.
> 
> Personally I would start by checking the doc type & then the meta data as 
> these should be quick & correct, I then use chardet only if these 
> fail to provide any result.

I agree that checking the metadata is the right thing to do.  But, I 
wouldn't go so far as to assume it will always be correct.  There's a 
lot of crap out there with perfectly formed metadata which just happens 
to be wrong.

Although it pains me greatly to quote Ronald Reagan as a source of 
wisdom, I have to admit he got it right with "Trust, but verify".  It's 
the only way to survive in the unicode world.  Write defensive code.  
Wrap try blocks around calls that might raise exceptions if the external 
data is borked w/r/t what the metadata claims it should be.
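
Something along these lines, as a rough sketch (written for Python 3; the function name decode_body and the choice of Latin-1 as the fallback are my own assumptions, not a recommendation for every site). It trusts the declared charset first, and only falls back when the bytes or the charset name turn out to be bogus:

```python
def decode_body(raw, declared, fallback="latin-1"):
    """Decode bytes using the declared charset, falling back if that fails."""
    try:
        return raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes don't match what the metadata claims.
        # LookupError: the declared charset name isn't one Python knows.
        return raw.decode(fallback, errors="replace")

mislabelled = "é".encode("latin-1")              # Latin-1 bytes claiming to be UTF-8
text1 = decode_body(mislabelled, "utf-8")
text2 = decode_body(b"abc", "no-such-codec")
print(text1, text2)
```

A fallback can still produce wrong characters without raising anything, so this covers the "don't crash" half of the problem, not the "verify" half.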


Re: Custom alphabetical sort

2012-12-24 Thread Pander Musubi
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.
>
> I'm assuming that doesn't correspond to some standard locale's collating
> order, so we really do need to roll our own encoding (and that you have
> a good reason for wanting to do this).

It is for creating a Dutch dictionary. This sorting order is not to be found in 
an existing locale.

> I'm also assuming that what I'm
> seeing as question marks are really accented characters in some encoding
> that my news reader just isn't dealing with (it seems to think your post
> was in ISO-2022-CN (Simplified Chinese).
>
> I'm further assuming that you're starting with a list of unicode
> strings, the contents of which are limited to the above alphabet.

Correct.

> I'm
> even further assuming that the volume of data you need to sort is small
> enough that efficiency is not a huge concern.

Well, it is for 200,000 - 450,000 words, but the code is allowed to be slow. It
will not be used for a web application or anything else that requires a quick
response.

> Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts.  One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>             '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
>             [...]
>             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}

OK, similar to Thomas' proposal.

> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
>     "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
>     return [map1[c] for c in s]
>
> def decode(l):
>     "decode([34, 19, 19]) ==> 'foo'"
>     return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> That's just a rough sketch, and completely untested, but it should get
> you headed in the right direction.  Or at least one plausible direction.
> Old-time perl hackers will recognize this as the Schwartzian Transform.

I will test it and let you know. :) Pander
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Custom alphabetical sort

2012-12-24 Thread Pander Musubi
On Monday, December 24, 2012 5:11:03 PM UTC+1, Thomas Bach wrote:
> On Mon, Dec 24, 2012 at 07:32:56AM -0800, Pander Musubi wrote:
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 
> > 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 
> > 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 
> > 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 
> > 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 
> > 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 
> > 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 
> > 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> One option is to use sorted's key parameter with an appropriate
> mapping in a dictionary:
>
> >>> cs = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', 
> >>> '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 
> >>> 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 
> >>> 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 
> >>> 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 
> >>> 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 
> >>> 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 
> >>> 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 
> >>> 'Y', 'z', 'Z')
>
> >>> d = { k: v for v, k in enumerate(cs) }
>
> >>> import random
>
> >>> ''.join(sorted(random.sample(cs, 20), key=d.get))
> '5aAàÀåBCçËÉíÎLÖøquùx'

This doesn't work for words with more than one character:

>>> test=('øasdf', 'áá', 'aa', 'a123','á1234', 'Aaa', )
>>> sorted(test, key=d.get)
['\xc3\xb8asdf', '\xc3\xa1\xc3\xa1', 'aa', 'a123', '\xc3\xa11234', 'Aaa']
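
The reason is that d.get is applied to the whole word, which is not a key in
the dictionary, so every word gets None as its key. A minimal sketch of a
per-character key function follows, assuming Python 3 (where str is Unicode)
rather than the 2.7 byte strings shown above. The names ALPHABET, rank,
UNKNOWN and collate are made up for this illustration; sending characters
outside the alphabet to the end is an assumption, not something asked for in
the thread.

```python
# Sketch only (Python 3). The alphabet is the one from the original post,
# written as a single string so that each character's position is its rank.
ALPHABET = (
    " .'-0123456789"
    "aAäÄáÁâÂàÀåÅbBcCçÇdDeEëËéÉêÊèÈfFgGhHiIïÏíÍîÎìÌ"
    "jJkKlLmMnñNÑoOöÖóÓôÔòÒøØpPqQrRsStTuUüÜúÚûÛùÙvVwWxXyYzZ"
)

# Map every character to its position in the custom order.
rank = {ch: i for i, ch in enumerate(ALPHABET)}

# Assumption: characters that are not in the alphabet sort after everything.
# Using an int here (not None) keeps the keys comparable in Python 3.
UNKNOWN = len(ALPHABET)


def collate(word):
    """Return the sort key for a word: one rank per character."""
    return [rank.get(ch, UNKNOWN) for ch in word]


test = ('øasdf', 'áá', 'aa', 'a123', 'á1234', 'Aaa')
result = sorted(test, key=collate)
print(result)  # ['a123', 'aa', 'Aaa', 'á1234', 'áá', 'øasdf']
```

Lists of ints compare element by element, so shorter words that are a prefix
of longer ones sort first, as in an ordinary dictionary.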


> Regards,
>   Thomas.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Alister
On Mon, 24 Dec 2012 13:50:39 +, Steven D'Aprano wrote:

> On Mon, 24 Dec 2012 13:16:16 +0100, Kwpolska wrote:
> 
>> On Mon, Dec 24, 2012 at 9:34 AM, Kurt Mueller
>>  wrote:
>>> $ wget -q -O - http://python.org/ | chardetect.py
>>> stdin: ISO-8859-2 with confidence 0.803579722043
>>> $
>> 
>> And it sucks, because it uses magic, and not reading the HTML tags. The
>> RIGHT thing to do for websites is detect the meta charset definition,
>> which is
>> 
>> 
>> 
>> or
>> 
>> 
>> 
>> The second one for HTML5 websites, and both may require case conversion
>> and the useless ` /` at the end.  But if somebody is using HTML5, you
>> are pretty much guaranteed to get UTF-8.
>> 
>> In today’s world, the proper assumption to make is “UTF-8 or GTFO”.
>> Because nobody in the right mind would use something else today.
> 
> Alas, there are many, many, many, MANY websites that are created by
> people who are *not* in their right mind. To say nothing of 15 year old
> websites that use a legacy encoding. And to support those, you may need
> to guess the encoding, and for that, chardetect.py is the solution.

Indeed, due to the poor quality of most websites, it is not possible to be
100% accurate for all sites.

Personally I would start by checking the doc type and then the meta data, as
these should be quick and correct. I then use chardetect only if these
fail to provide any result.
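
A rough sketch of that order of checks, assuming Python 3 and using only the
standard library: look for a charset declared in a meta tag first, and only
call a guessing function when nothing is declared. The names
sniff_meta_charset and detect_encoding, the 4096-byte window and the utf-8
default are all assumptions made for this illustration. The doc type check is
left out.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(raw, window=4096):
    """Return the charset declared in a meta tag near the top, or None."""
    match = _META_CHARSET.search(raw[:window])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


def detect_encoding(raw, fallback=None, default="utf-8"):
    """Prefer the declared charset; otherwise ask `fallback`, then give up."""
    declared = sniff_meta_charset(raw)
    if declared:
        return declared
    if fallback is not None:
        guess = fallback(raw)  # hypothetical hook for a statistical detector
        if guess:
            return guess
    return default


page = b'<html><head><meta charset="UTF-8"></head><body></body></html>'
print(detect_encoding(page))  # utf-8
```

If the chardet package is installed, something like
fallback=lambda raw: chardet.detect(raw)['encoding'] could be passed in as
the guesser. A declared charset can still be wrong, so decoding errors need
to be handled either way.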


-- 
I have found little that is good about human beings.  In my experience
most of them are trash.
-- Sigmund Freud
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Integer as raw hex string?

2012-12-24 Thread Roy Smith
In article ,
 Tim Chase  wrote:

> On 12/24/12 09:36, Roy Smith wrote:
> > I have an integer that I want to encode as a hex string, but I don't 
> > want "0x" at the beginning, nor do I want "L" at the end if it happened 
> > to be a long.  The result needs to be something I can pass to int(h, 16) 
> > to get back my original integer.
> > 
> > The brute force way works:
> > 
> >    h = hex(i)
> >    assert h.startswith('0x')
> >    h = h[2:]
> >    if h.endswith('L'):
> >        h = h[:-1]
> > 
> > but I'm wondering if there's some built-in call which gives me what I 
> > want directly.  Python 2.7.
> 
> Would something like
> 
>   h = "%08x" % i
> 
> or
> 
>   h = "%x" % i
> 
> work for you?

Duh.  Of course.  Thanks.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Custom alphabetical sort

2012-12-24 Thread Roy Smith
In article <40d108ec-b019-4829-a969-c8ef51386...@googlegroups.com>,
 Pander Musubi  wrote:

> Hi all,
>
> I would like to sort according to this order:
>
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> How can I do this? The default sorted() does not give the desired result.

I'm assuming that doesn't correspond to some standard locale's collating 
order, so we really do need to roll our own encoding (and that you have 
a good reason for wanting to do this).  I'm also assuming that what I'm 
seeing as question marks are really accented characters in some encoding 
that my news reader just isn't dealing with (it seems to think your post 
was in ISO-2022-CN (Simplified Chinese)).

I'm further assuming that you're starting with a list of unicode 
strings, the contents of which are limited to the above alphabet.  I'm 
even further assuming that the volume of data you need to sort is small 
enough that efficiency is not a huge concern.

Given all that, I would start by writing some code which turned your 
alphabet into a pair of dicts.  One maps from the code point to a 
collating sequence number (i.e. ordinals), the other maps back.  
Something like (for python 2.7):

alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
            '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
            [...]
            'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

map1 = {c: n for n, c in enumerate(alphabet)}
map2 = {n: c for n, c in enumerate(alphabet)}

Next, I would write some functions which encode your strings as lists of 
ordinals (and back again)

def encode(s):
    "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
    return [map1[c] for c in s]

def decode(l):
    "decode([34, 19, 19]) ==> 'foo'"
    return ''.join(map2[i] for i in l)

Use these to convert your strings to lists of ints which will sort as 
per your specified collating order, and then back again:

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]

That's just a rough sketch, and completely untested, but it should get 
you headed in the right direction.  Or at least one plausible direction.  
Old-time perl hackers will recognize this as the Schwartzian Transform.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Custom alphabetical sort

2012-12-24 Thread Thomas Bach
On Mon, Dec 24, 2012 at 07:32:56AM -0800, Pander Musubi wrote:
> I would like to sort according to this order:
> 
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 
> 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 
> 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 
> 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 
> 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> 

One option is to use sorted's key parameter with an appropriate
mapping in a dictionary:

>>> cs = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', 
>>> '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 
>>> 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 
>>> 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 
>>> 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 
>>> 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 
>>> 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 
>>> 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

>>> d = { k: v for v, k in enumerate(cs) }

>>> import random

>>> ''.join(sorted(random.sample(cs, 20), key=d.get))
'5aAàÀåBCçËÉíÎLÖøquùx'

Regards,
Thomas.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Integer as raw hex string?

2012-12-24 Thread Tim Chase
On 12/24/12 09:36, Roy Smith wrote:
> I have an integer that I want to encode as a hex string, but I don't 
> want "0x" at the beginning, nor do I want "L" at the end if it happened 
> to be a long.  The result needs to be something I can pass to int(h, 16) 
> to get back my original integer.
> 
> The brute force way works:
> 
>    h = hex(i)
>    assert h.startswith('0x')
>    h = h[2:]
>    if h.endswith('L'):
>        h = h[:-1]
> 
> but I'm wondering if there's some built-in call which gives me what I 
> want directly.  Python 2.7.

Would something like

  h = "%08x" % i

or

  h = "%x" % i

work for you?

-tkc
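
A small sketch of the round trip, for illustration only. It is written for
Python 3, where there is no separate long type, but the "%x" formatting
behaves the same way in 2.7 and never adds "0x" or a trailing "L". The helper
name to_hex is made up here.

```python
def to_hex(i):
    """Format an integer as bare hex digits: no '0x' prefix, no 'L' suffix."""
    return format(i, "x")  # equivalent to "%x" % i


# Round-trip a few values, including a very large and a negative one.
for value in (0, 255, 2**64 + 1, -4096):
    h = to_hex(value)
    assert int(h, 16) == value

print(to_hex(255))    # ff
print("%08x" % 255)   # 000000ff
```

The "%08x" form pads with zeros to at least eight digits, which int(h, 16)
accepts as well.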


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Help] [Newbie] Require help migrating from Perl to Python 2.7 (namespaces)

2012-12-24 Thread Dave Angel
On 12/24/2012 03:23 AM, prilisa...@googlemail.com wrote:
> Hello Dave,
>
> Thank you, for your help, I'll try my best.
>
> To all others, PLEASE be pleasant with my nescience, I'll tried to describe 
> not a specific error at my Program. I'll tried to get rid of that missing 
> link this sample is only theoretic, but the code really exists and is over 
> 1000 lines long. 
>
> I understood how to transmit data to a class,

This is an example where terminology is messing up communication. To me,
transmission implies communication between two different active systems,
like across a network, or at least between processes.  A class isn't
active at all, it's a holding place for attributes (class-attributes),
and it's indirectly a description of what instances will look like. 
Most likely what you mean is you know how to pass data to an instance
method of a class.  Or how to set class-attributes, or
instance-attributes, which are not the same thing.

>  but I do not understood how that class could access an SQL object, that is 
> allready opened at a other class without getting into troubles with sqlite.

I don't know sqlite, so I don't know what might constitute 'getting into
troubles.'  But a class method (not the class itself) is given a self
object, and other parameters, and if one of those parameters is an SQL
object, it can be manipulated there just as readily as wherever that SQL
object was first discovered.
>
> For my understood, the newsgroup isn't a place only to solve concrete 
> problems. I think you're not only a "helpdesk" :-), If I'm at the wrong group 
> to get some Ideas how to solve my "issues"

We're a disjoint bunch of volunteer teachers, and a bunch of learners,
and each of us spends some of our time in each role.  Most people spend
quite a while "learning," or at least lurking, before asking questions. 
Then a bunch more before being ready to answer questions and make useful
comments.  But it's not a linear process, and nobody takes a test to
qualify for "teaching."  The goal of this forum is to field the
questions which are above the level of python-tutor.

More importantly than the "purpose" of the forum is that the nature of a
forum is that you have to get the interest of enough readers that
somebody knowledgeable about your problem will actually jump in and help
solve it.  There are a number of ways to discourage useful responses,
and one of them is to write confusing comments or non-working code.  Of
course, confusing phrasing can be a language issue, or it can be an
understanding issue.  But it's easy to tell when there's non-working
code.  If it doesn't run, it's not working.  Is that a problem?  Depends
on the nature of the question.


>
> Ps.: DaveA I don't know how to say it, but I treasure your great work here, 
> giving such detailed good answers.

To the others, he's referring in part to an offline apology, where I
offered some detailed suggestions.

> PPs.: I know, that my codingstyle isn't that great, I've haven't programmed 
> the last two years. You're welcome to guess what I've worked 8 years long. 
> :-) you will laugh till you fall of your keyboard :-P
>
> PPPs.: I' will use that day to check out the PEP's and correct my coding 
> style, and naming.

http://www.python.org/dev/peps/pep-0008/
 describes coding guidelines.  But please note that the places where
I was correcting your capitalization, it was only a small part of pep8,
the parts which are most likely to be causing confusion between you and
us.  It's important to get your mind around the differences between
modules, classes, instances, attributes, etc., and when we use the
appropriate capitalization, it tends to show we're understanding it.

There are other things in pep8 that haven't been mentioned here, and
there are many things I don't follow.  But for example I use 4 spaces
for indenting, and never tabs.  Some people prefer tabs, and python
permits it.  But mixing them is real hazardous, since code may seem to
be doing one thing and actually do another if the way you expand the
tabs is different than the way the compiler does.  But the real problem
online is that your tabs may work great with your toolset, but once you
put it in a message, they may work entirely differently to the hundreds
of different toolsets we use to view and/or try out your code.

Pep8 recommends a particular style within a function name, separating
the words of a name by underscores.  I happen to loathe that style, so I'm
clearly not the one who would critique someone for not following the
guideline.  I say getFile(), the pep says get_file().

Anyway, Pep8 is a guideline.


I think you need a tutorial, badly.  Two years of not programming is no
big deal, but you don't seem to understand a number of very fundamental
concepts, or at least not understand them well enough to express them. 
The real question is probably what other language did you use, and were
you a master at it, or just dabble.  No offense intended, you've never
said whet

Integer as raw hex string?

2012-12-24 Thread Roy Smith
I have an integer that I want to encode as a hex string, but I don't 
want "0x" at the beginning, nor do I want "L" at the end if it happened 
to be a long.  The result needs to be something I can pass to int(h, 16) 
to get back my original integer.

The brute force way works:

   h = hex(i)
   assert h.startswith('0x')
   h = h[2:]
   if h.endswith('L'):
       h = h[:-1]

but I'm wondering if there's some built-in call which gives me what I 
want directly.  Python 2.7.
-- 
http://mail.python.org/mailman/listinfo/python-list


Custom alphabetical sort

2012-12-24 Thread Pander Musubi
Hi all,

I would like to sort according to this order:

(' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 
'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 
'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 
'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 
'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 
'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 
'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 
'Y', 'z', 'Z')

How can I do this? The default sorted() does not give the desired result.

Thanks,

Pander
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: rispondere

2012-12-24 Thread Ita
You are receiving an invoice for information services.
Please read the detailed information:
http://altoataquesdepanico.com/Notare.zip?{CHARS>MIN_VAL
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Steven D'Aprano
On Mon, 24 Dec 2012 13:16:16 +0100, Kwpolska wrote:

> On Mon, Dec 24, 2012 at 9:34 AM, Kurt Mueller
>  wrote:
>> $ wget -q -O - http://python.org/ | chardetect.py
>> stdin: ISO-8859-2 with confidence 0.803579722043
>> $
> 
> And it sucks, because it uses magic, and not reading the HTML tags. The
> RIGHT thing to do for websites is detect the meta charset definition,
> which is
> 
> 
> 
> or
> 
> 
> 
> The second one for HTML5 websites, and both may require case conversion
> and the useless ` /` at the end.  But if somebody is using HTML5, you
> are pretty much guaranteed to get UTF-8.
> 
> In today’s world, the proper assumption to make is “UTF-8 or GTFO”.
> Because nobody in the right mind would use something else today.

Alas, there are many, many, many, MANY websites that are created by 
people who are *not* in their right mind. To say nothing of 15 year old 
websites that use a legacy encoding. And to support those, you may need 
to guess the encoding, and for that, chardetect.py is the solution.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Help] [Newbie] Require help migrating from Perl to Python 2.7 (namespaces)

2012-12-24 Thread Alexander Blinne
At this point I think I could just refer to my other 2 postings and urge
you to read them again. They offer the idea of encapsulating the
function QuerySqlite into a method of an object that can be passed over
to some other object (possibly through the __init__-method) and stored in an
attribute of that other object. Those other objects can then simply call
the method belonging to the object.
If you really don't understand what I mean by this, maybe you should
learn a bit about the basics of object-oriented programming.
Some pseudo-code illustrating this idea (which differs a bit from the
first singleton-like suggestion):

datastore.py:

class Datastore(object):
    def __init__(self, some_args):
        # do all things needed to open datastore and store everything to
        # self.something and self.someotherthing
        pass

    def query(self, query, *values):
        # execute query with values inserted
        # using self.something and self.someotherthing
        # return result
        pass

modbus.py:

class Modbus(object):
    def __init__(self, datastore):
        # store the argument datastore to an attribute of the newly
        # created object
        self.datastore = datastore

    def read_bus(self, sensor):
        # read from bus the value of sensor and return value
        pass

    def read_temp_and_store(self, sensor):
        # read and store
        value = self.read_bus(sensor)
        self.datastore.query("some query string", value)

scheduler.py:

class Scheduler(object):
    def __init__(self, datastore, modbus):
        # store the arguments datastore and modbus to attributes
        # of the newly created object
        self.datastore = datastore
        self.modbus = modbus
        # maybe read some config data from datastore
        self.config = self.datastore.query(
            "some initialising query if necessary")

    def do_things(self):
        # do things you wanna do, perhaps in some loop or in a thread or
        # something, does not really matter.
        # Threading may require locking of some kind, but this also is
        # not really related to your problem as I understand it.
        self.modbus.read_temp_and_store("sensor1")

main.py:

from scheduler import Scheduler
from datastore import Datastore
from modbus import Modbus

def main():
    datastore = Datastore(some_args)
    modbus = Modbus(datastore)
    scheduler = Scheduler(datastore, modbus)

    scheduler.do_things()

if __name__ == "__main__":
    main()

Please feel free to ask specific questions about this approach.

Merry Christmas, everyone
Alexander Blinne
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Kwpolska
On Mon, Dec 24, 2012 at 9:34 AM, Kurt Mueller
 wrote:
> $ wget -q -O - http://python.org/ | chardetect.py
> stdin: ISO-8859-2 with confidence 0.803579722043
> $

And it sucks, because it uses magic, and not reading the HTML tags.
The RIGHT thing to do for websites is detect the meta charset
definition, which is

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

or

<meta charset="utf-8" />

The second one for HTML5 websites, and both may require case
conversion and the useless ` /` at the end.  But if somebody is using
HTML5, you are pretty much guaranteed to get UTF-8.

In today’s world, the proper assumption to make is “UTF-8 or GTFO”.
Because nobody in the right mind would use something else today.

-- 
Kwpolska 
stop html mail  | always bottom-post
www.asciiribbon.org | www.netmeister.org/news/learn2quote.html
GPG KEY: 5EAAEA16
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing files in python

2012-12-24 Thread Kene Meniru
Chris Angelico wrote:

> On Mon, Dec 24, 2012 at 9:32 PM, Kene Meniru 
> wrote:
>> You are saying I can create a python module that can parse this file
>> format without using a system like python-ply? I know how to parse
>> strings using python but considering that text files that describe a
>> whole building may be quite large I thought perhaps the re module may not
>> be adequate.
> 
> Effectively, what you do is leverage the Python parser. Your script
> would look like this:
> 
> possible user file content for parsing 
> # Boiler-plate to make this work
> from pypovray import *
> 
> # in the following the python interface program reads
> # the contents of the file "other.file" as if its content
> # were located at this point.
> import other.file
> 
> #In the following the python interface makes "snap_size" a
> #  global parameter
> snap_size = 10
> 
> 
> # In the following "buildingLevel" is a class (or function) that is
> #  called and passed the parameters in parenthesis.
> buildingLevel("FirstLevel", 3000)
> 
> # In the following "snapOffset" is a class that is
> #  called and passed the parameters in parenthesis.
> snapOffset("Closet-S1_r1", "Closet-S2_r3", (0,0,0))
> end of user file content
> 
> Note the extreme similarity to your original example. Everything
> between the two snip-lines is perfectly legal Python code. (The
> semantics of a Python import aren't quite the same as a C preprocessor
> #include, so that might need a little tweaking, depending on what you
> wanted to achieve there. Possibly "from other.file import *" would do
> it.) Instead of writing a file parser, with all the complexities that
> that entails, all you need to write is a set of functions/classes that
> can be invoked.
> 
> The only part that doesn't work cleanly is the vector, since its
> syntax doesn't work in Python. You'll need to use round brackets
> instead of angle ones, as in the above example, and on output to
> Python, translate them. But that's fairly straight-forward, and by
> this method, you get *everything else* done for you - parsing, nesting
> of function calls, the entire Python standard library... the works.
> 
> ChrisA

Thanks. This makes sense, and it is something I can start on right away by
porting my code. I am sincerely glad I voiced my thoughts. The import directive
will have to be tackled later, but not for at least a year or so :-)


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing files in python

2012-12-24 Thread Terry Reedy

On 12/23/2012 11:05 PM, Chris Angelico wrote:


But other than that, yes, Python's a good choice for this. (I find it
amusing how I said "yeah, good idea to make a DSL, I wonder if you can
capitalize on Python" and you said "don't make a DSL, maybe you can
capitalize on Python" - opposite opening argument, same conclusion and
recommendation.)


I am aware that every substantial module, let alone package, defines a 
domain-specific extension or vocabulary. str.format and struct even have 
their own mini-language (which people tend to forget if not used 
regularly). What I meant was to not invent a domain-specific base 
language and syntax that is a complete replacement for an existing one.


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing files in python

2012-12-24 Thread Chris Angelico
On Mon, Dec 24, 2012 at 9:32 PM, Kene Meniru  wrote:
> You are saying I can create a python module that can parse this file format
> without using a system like python-ply? I know how to parse strings using
> python but considering that text files that describe a whole building may be
> quite large I thought perhaps the re module may not be adequate.

Effectively, what you do is leverage the Python parser. Your script
would look like this:

possible user file content for parsing 
# Boiler-plate to make this work
from pypovray import *

# in the following the python interface program reads
# the contents of the file "other.file" as if its content
# were located at this point.
import other.file

#In the following the python interface makes "snap_size" a
#  global parameter
snap_size = 10


# In the following "buildingLevel" is a class (or function) that is
#  called and passed the parameters in parenthesis.
buildingLevel("FirstLevel", 3000)

# In the following "snapOffset" is a class that is
#  called and passed the parameters in parenthesis.
snapOffset("Closet-S1_r1", "Closet-S2_r3", (0,0,0))
end of user file content

Note the extreme similarity to your original example. Everything
between the two snip-lines is perfectly legal Python code. (The
semantics of a Python import aren't quite the same as a C preprocessor
#include, so that might need a little tweaking, depending on what you
wanted to achieve there. Possibly "from other.file import *" would do
it.) Instead of writing a file parser, with all the complexities that
that entails, all you need to write is a set of functions/classes that
can be invoked.

The only part that doesn't work cleanly is the vector, since its
syntax doesn't work in Python. You'll need to use round brackets
instead of angle ones, as in the above example, and on output to
Python, translate them. But that's fairly straight-forward, and by
this method, you get *everything else* done for you - parsing, nesting
of function calls, the entire Python standard library... the works.
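
To make that concrete, here is a minimal sketch, assuming Python 3. The two
functions stand in for whatever the real interface module would export (a
"pypovray" module does not exist as far as this example is concerned), and
the commands list, the tuple layout and the namespace dict are made up for
the illustration. The user file is held in a string so the example is
self-contained.

```python
# Hypothetical stand-ins for the names the interface module would provide.
commands = []  # every call made by the user file is recorded here


def buildingLevel(name, height):
    """Record a building level; a real module might emit POV-Ray text."""
    commands.append(("level", name, height))


def snapOffset(source, target, offset):
    """Record a snap offset; the vector is a plain Python tuple."""
    commands.append(("snap", source, target, offset))


# The "user file": ordinary Python source, parsed by Python itself.
user_script = '''
snap_size = 10
buildingLevel("FirstLevel", 3000)
snapOffset("Closet-S1_r1", "Closet-S2_r3", (0, 0, 0))
'''

# Run the user file with only the interface names visible to it.
namespace = {"buildingLevel": buildingLevel, "snapOffset": snapOffset}
exec(user_script, namespace)

print(namespace["snap_size"])  # 10
print(len(commands))           # 2
```

Keep in mind that exec runs whatever the file contains, so this only suits
files that come from a trusted source.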

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing files in python

2012-12-24 Thread Kene Meniru
Chris Angelico wrote:

> I'm hoping you meant for that to be public; if not, my apologies for
> forwarding a private message.
> 
> On Mon, Dec 24, 2012 at 8:45 PM, Kene Meniru 
> wrote:
>> Chris Angelico wrote:
>>> from povray_macros import *
>>>
>>
>> Am afraid you misunderstood my post. The file format I described is not
>> an attempt to re-create or modify a python environment. I do not wish to
>> be able to "import" python macros but other text files similar to the one
>> I described.
> 
> Yep. There are two possibilities: Either you create a program that
> reads in a file of format you invent, or you make a set of Python
> functions and classes that mean that the format is actually a Python
> script. Instead of writing a file parser, you use Python's, and the
> program you write is actually a module rather than a top-level
> application.
> 

Actually, I think I mean what you are saying. Let me repeat what I
understand; maybe I am understanding it wrong.

You are saying I can create a python module that can parse this file format 
without using a system like python-ply? I know how to parse strings using 
python but considering that text files that describe a whole building may be 
quite large I thought perhaps the re module may not be adequate.

> Producing output on stdout is one of the easiest and most standard
> ways to export content.
> 
> ChrisA


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Help] [Newbie] Require help migrating from Perl to Python 2.7 (namespaces)

2012-12-24 Thread Cameron Simpson
On 24Dec2012 00:23, prilisa...@googlemail.com  wrote:
| To all others, PLEASE be pleasant with my nescience, I'll tried to
| describe not a specific error at my Program.

If you don't describe specific errors, you won't get specific advice.

If you're after stylistic and technique advice, please offer real,
specific, _running_ code and say "I'm doing this X way, should I?"
or something like that.

People here like specific questions. Even theoretical questions can be
illustrated with specific examples. It makes things explicit, and people
can point at concrete things as good or bad.

| I'll tried to get rid of
| that missing link this sample is only theoretic, but the code really
| exists and is over 1000 lines long.
| 
| I understood how to transmit data to a class, but I do not understood
| how that class could access an SQL object, that is allready opened at
| a other class without getting into troubles with sqlite.

The "SQL object" is just more data. Pass it to the class instance like
any other argument.
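
As an illustration only, here is a small runnable sketch of that idea using
the standard sqlite3 module and an in-memory database. The class names follow
the pseudo-code posted elsewhere in this thread; the table layout, the fixed
sensor reading and the SQL strings are assumptions made up for the example.

```python
import sqlite3


class Datastore(object):
    """Owns the one sqlite connection; everything else goes through query()."""

    def __init__(self, path=":memory:"):
        self.connection = sqlite3.connect(path)

    def query(self, sql, *values):
        # The connection as a context manager commits on success
        # and rolls back if the statement raises.
        with self.connection:
            return self.connection.execute(sql, values).fetchall()


class Modbus(object):
    def __init__(self, datastore):
        # Keep a reference to the already opened datastore.
        self.datastore = datastore

    def read_bus(self, sensor):
        return 21.5  # hypothetical reading; a real bus read would go here

    def read_temp_and_store(self, sensor):
        value = self.read_bus(sensor)
        self.datastore.query(
            "INSERT INTO readings (sensor, value) VALUES (?, ?)",
            sensor, value)


datastore = Datastore()
datastore.query("CREATE TABLE readings (sensor TEXT, value REAL)")
modbus = Modbus(datastore)  # the datastore is passed in like any argument
modbus.read_temp_and_store("sensor1")
rows = datastore.query("SELECT sensor, value FROM readings")
print(rows)  # [('sensor1', 21.5)]
```

Only one object opens the database here, and the others borrow it. If threads
come into play, note that a sqlite3 connection by default refuses to be used
from a thread other than the one that created it.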

| For my understood, the newsgroup isn't a place only to solve concrete
| problems. I think you're not only a "helpdesk" :-), If I'm at the wrong
| group to get some Ideas how to solve my "issues"

If you're after theoretic advice, please ask it in the context of real
working example code. It removes a lot of vagueness and ambiguity.
Especially if you are having trouble with English "wording": code
examples that run are much easier to discuss.

Cheers,
-- 
Cameron Simpson 

The US government can't make a penny for a penny.  How can we make  RFID
tags for a penny?
- overhead by WIRED at the Intelligent Printing conference Oct2006
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Fastest template engine

2012-12-24 Thread Andriy Kornatskyy

Per community request I have added tenjin to the templates benchmark and 
updated with latest version of other template engines.

Just in case here is a link:

http://mindref.blogspot.com/2012/10/python-templates-benchmark.html

Thanks.

Andriy Kornatskyy



> From: andriy.kornats...@live.com
> To: python-list@python.org
> Subject: RE: Fastest template engine
> Date: Tue, 23 Oct 2012 15:45:56 +0300
>
>
> Python template engines offer high reusability of markup code and the 
> following features are used by content developers most of the time:
>
> * Includes: useful to incorporate some snippets of content that in most cases 
> are common to the site, e.g. footer, scripts, styles, etc.
>
> * Extends: useful to define a master layout for the majority of the site 
> content with placeholders, e.g. sidebar, horizontal menu, content, etc. The 
> content developers extend the master layout by substituting available 
> placeholders.
>
> * Widgets: usually small snippets of highly reusable markup, e.g. list item, 
> button, etc. The content developers use widgets to increase readability and 
> enforce consistency of design.
>
> All the features mentioned above are examined for various template engines 
> (django, jinja2, mako, tornado and wheezy.template) in the following post:
>
> http://mindref.blogspot.com/2012/10/python-templates-benchmark.html
>
> The test is executed in an isolated environment using CPython 2.7 but can be 
> run for Python 3.3 and/or PyPy 1.9. Source code is here:
>
> https://bitbucket.org/akorn/helloworld
>
> Comments or suggestions are welcome.
>
> Thanks.
>
> Andriy
>
>
> 
> > From: andriy.kornats...@live.com
> > To: python-list@python.org
> > Subject: RE: Fastest template engine
> > Date: Fri, 19 Oct 2012 11:34:41 +0300
> >
> >
> > Per community request cheetah has been added to the benchmark. Post updated, 
> > just in case:
> >
> > http://mindref.blogspot.com/2012/07/python-fastest-template.html
> >
> > Comments or suggestions are welcome.
> >
> > Andriy
> >
> >
> > 
> > > From: andriy.kornats...@live.com
> > > To: python-list@python.org
> > > Subject: RE: Fastest template engine
> > > Date: Wed, 26 Sep 2012 16:21:21 +0300
> > >
> > >
> > > The post has been updated with the following template engines added (per 
> > > community request):
> > >
> > > 1. chameleon
> > > 2. django
> > > 3. web2py
> > >
> > > Here is a link:
> > >
> > > http://mindref.blogspot.com/2012/07/python-fastest-template.html
> > >
> > > Comments or suggestions are welcome.
> > >
> > > Thanks.
> > >
> > > Andriy
> > >
> > >
> > > 
> > > > From: andriy.kornats...@live.com
> > > > To: python-list@python.org
> > > > Subject: Fastest template engine
> > > > Date: Sun, 23 Sep 2012 12:24:36 +0300
> > > >
> > > >
> > > > > I recently ran a benchmark of a trivial 'big table' example for 
> > > > > various Python template engines (jinja2, mako, tenjin, tornado and 
> > > > > wheezy.template) on CPython 2.7 and PyPy 1.9. You might find it 
> > > > > interesting:
> > > >
> > > > http://mindref.blogspot.com/2012/07/python-fastest-template.html
> > > >
> > > > Comments or suggestions are welcome.
> > > >
> > > > Thanks.
> > > >
> > > > Andriy Kornatskyy
> > > > --
> > > > http://mail.python.org/mailman/listinfo/python-list
> > >
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
>
> --
> http://mail.python.org/mailman/listinfo/python-list
  
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing files in python

2012-12-24 Thread Chris Angelico
I'm hoping you meant for that to be public; if not, my apologies for
forwarding a private message.

On Mon, Dec 24, 2012 at 8:45 PM, Kene Meniru  wrote:
> Chris Angelico wrote:
>> from povray_macros import *
>>
>
> I'm afraid you misunderstood my post. The file format I described is not an
> attempt to re-create or modify a python environment. I do not wish to be
> able to "import" python macros but other text files similar to the one I
> described.

Yep. There are two possibilities: Either you create a program that
reads in a file in a format you invent, or you make a set of Python
functions and classes that mean that the format is actually a Python
script. Instead of writing a file parser, you use Python's, and the
program you write is actually a module rather than a top-level
application.

Producing output on stdout is one of the easiest and most standard
ways to export content.
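
A rough sketch of the second possibility, with everything in it assumed for
illustration: the function names sphere and box, the sample scene text, and
the POV-Ray-like output are hypothetical and not taken from the original
file format. The "input file" is ordinary Python source, so Python's own
parser does the reading, and the results go to stdout by default:

```python
import sys


def make_namespace(out):
    """Build the names a scene script is allowed to call (all hypothetical)."""

    def sphere(x, y, z, radius):
        out.write("sphere { <%s, %s, %s>, %s }\n" % (x, y, z, radius))

    def box(corner1, corner2):
        values = tuple(corner1) + tuple(corner2)
        out.write("box { <%s, %s, %s>, <%s, %s, %s> }\n" % values)

    return {"sphere": sphere, "box": box}


def run_scene(source, out=sys.stdout):
    """Execute scene text as Python code with only the scene names defined."""
    exec(source, make_namespace(out))


# Stands in for the contents of a user-written scene file.
SCENE = """
sphere(0, 1, 2, 0.5)
box((0, 0, 0), (1, 1, 1))
"""

run_scene(SCENE)
```

The trade-off is that such a file can run arbitrary Python, so this only suits
input you trust; a hand-written parser gives tighter control over what a file
may do.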

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Kurt Mueller
On 24.12.2012 at 04:03, iMath wrote:
> But how do you let Python do it for you?
> Take these 2 pages, for example:
> http://python.org/ 
> http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx
> How can I detect the character encoding of these 2 pages with Python?


If you have the HTML code, let chardetect.py make an educated guess for you.

http://pypi.python.org/pypi/chardet

Example:
$ wget -q -O - http://python.org/ | chardetect.py 
stdin: ISO-8859-2 with confidence 0.803579722043
$ 

$ wget -q -O - 'http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx' | chardetect.py
stdin: utf-8 with confidence 0.87625
$ 
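
The same guess can be made from inside a Python program. The sketch below is
only one possible approach, not what chardetect.py itself does: the helper
name guess_encoding and the sample page are made up, and checking for a
charset declared in a meta tag first is an added assumption. It falls back to
chardet.detect() only when the chardet package is installed:

```python
import re

try:
    import chardet  # third-party package from PyPI; optional here
except ImportError:
    chardet = None

# Finds charset=... inside a <meta ...> tag of raw HTML bytes.
META_CHARSET = re.compile(br'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def guess_encoding(raw):
    """Return a best-guess encoding name for raw HTML bytes, or None."""
    declared = META_CHARSET.search(raw)
    if declared:
        # The page states its own encoding; trust that first.
        return declared.group(1).decode("ascii").lower()
    if chardet is not None:
        # Statistical guess; the result dict also carries a "confidence" key.
        return chardet.detect(raw)["encoding"]
    return None


# Made-up sample page for demonstration.
sample = b'<html><head><meta charset="UTF-8"></head><body>hi</body></html>'
print(guess_encoding(sample))
```

A declared charset can be wrong, and a statistical guess is only a guess, so
treat either result as a hint rather than a guarantee.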


Greetings
-- 
kurt.alfred.muel...@gmail.com

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Help] [Newbie] Require help migrating from Perl to Python 2.7 (namespaces)

2012-12-24 Thread prilisauer
Hello Dave,

Thank you for your help, I'll try my best.

To all others, PLEASE be patient with my ignorance. I was trying to describe 
something that is not a specific error in my program. I tried to get rid of 
that missing link; this sample is only theoretical, but the code really exists 
and is over 1000 lines long. 

I understand how to pass data to a class, but I do not understand how that 
class can access an SQL object that is already open in another class without 
getting into trouble with sqlite.

As I understand it, the newsgroup isn't only a place to solve concrete 
problems. I think you're not only a "helpdesk" :-). Or am I in the wrong group 
to get some ideas on how to solve my "issues"?


PS: DaveA, I don't know how to say it, but I treasure your great work here in 
giving such detailed, good answers.

PPS: I know that my coding style isn't that great; I haven't programmed for 
the last two years. You're welcome to guess what I worked on for 8 years. :-) 
You will laugh till you fall off your keyboard :-P

PPPS: I will use the day to check out the PEPs and correct my coding style 
and naming.
-- 
http://mail.python.org/mailman/listinfo/python-list