Re: New to PSF
Yeah, I mean the Python Software Foundation. I am a developer and I want to contribute. So, can you please help me get started? Thanks.

On Sunday, December 28, 2014 4:27:54 AM UTC+5:30, Steven D'Aprano wrote:
> prateek pandey wrote:
> > Hey, I'm new to PSF. Can someone please help me in getting started.
>
> Can we have some context? What do you mean by PSF? The Python Software
> Foundation? Something else?
>
> --
> Steven
-- https://mail.python.org/mailman/listinfo/python-list
New to PSF
Hey, I'm new to PSF. Can someone please help me get started? -- https://mail.python.org/mailman/listinfo/python-list
Natural Language Processing in Python
Hi, can somebody please point me to a good online resource or e-book for natural language processing programming in Python? Thanks, Prateek -- http://mail.python.org/mailman/listinfo/python-list
Re: Python
Check this out: http://www.swaroopch.com/notes/Python. It's a great guide for beginners. Cheers! Prateek -- http://mail.python.org/mailman/listinfo/python-list
Re: False and 0 in the same dictionary
On Nov 5, 1:52 am, Duncan Booth <[EMAIL PROTECTED]> wrote:
> Prateek <[EMAIL PROTECTED]> wrote:
> > I've been using Python for a while (4 years) so I feel like a moron
> > writing this post because I think I should know the answer to this
> > question:
> >
> > How do I make a dictionary which has distinct key-value pairs for 0,
> > False, 1 and True.
>
> How about using (x, type(x)) as the key instead of just x?

Yup, I thought of that, although it seems kinda unpythonic to do so, especially since the dictionary is basically a cache mostly containing strings. Adding the memory overhead of the extra tuples seems like a waste just for those four keys. Is there a better way? I also thought of using a custom __eq__ method in a custom class extending the dict type, but decided that was even worse.

Prateek
-- http://mail.python.org/mailman/listinfo/python-list
False and 0 in the same dictionary
I've been using Python for a while (4 years) so I feel like a moron writing this post because I think I should know the answer to this question: how do I make a dictionary which has distinct key-value pairs for 0, False, 1 and True? As I have learnt, 0 and False hash to the same value (likewise 1 and True):

>>> b = {0: 'xyz', False: 'abc'}
>>> b
{0: 'abc'}  # Am I the only one who thinks this is weird?

This obviously stems from the fact that 0 == False but 0 is not False, etc. That doesn't help my case, where I need to distinguish between the two. The same issue applies in a list:

>>> a = [0, 1, True, False]
>>> a.index(False)
0

Wha??? Help.
-- http://mail.python.org/mailman/listinfo/python-list
Re: building a linux executable
On Oct 24, 5:25 pm, Paul Boddie <[EMAIL PROTECTED]> wrote:
> On 24 Okt, 14:20, Bjoern Schliessmann <[EMAIL PROTECTED]> wrote:
> > I'm sorry I cannot help, but how many linux distros have no python
> > installed or no packages of it?
>
> It's not usually the absence of Python that's the problem. What if
> your application uses various extension modules which in turn rely on
> various libraries (of the .so or .a kind)? It may be more convenient
> to bundle all these libraries instead of working out the package
> dependencies for all the target distributions, even if you know them
> all.
>
> Paul

Thanks Paul,

So I've bundled all the extension modules (cx_Freeze helped me out with that). Here is what I did:

    if sys.platform.startswith("linux"):
        import [add a bunch of imports here]

This import statement immediately imports all modules which will be required, so cx_Freeze is easily able to find them and I can ship all the .so files with the distro. The problem is that some of these .so files (_hashlib.so) are dynamically linked against /lib/libssl.so and /lib/libcrypto.so. I cannot simply copy those (libssl/libcrypto) files into the distribution folder (and cx_Freeze won't do anything about them because they are not Python modules). I need a way to get _hashlib.so to refer to a libssl.so which sits in the same directory (I would have thought the dynamic linker would consult LD_LIBRARY_PATH, but that variable isn't set when the frozen binary starts; the loader certainly doesn't use PATH). This seems like it should be a simple enough problem, doesn't it?

NB: Has the community come up with a better way to distribute Python executables on linux? I'd also like to hear from folks who've got linux distributables of medium to large scale python programs...
-- http://mail.python.org/mailman/listinfo/python-list
building a linux executable
Hello,

I'm trying to package my python program into a linux executable using cx_Freeze. The goal is that the user should not require Python on their system. I've managed to make the binaries on Fedora Core 6 and they run fine. However, when I move to Ubuntu (tested on Ubuntu Server 7.04 and xUbuntu Desktop 7.04), the program fails to run with the following error:

ImportError: no module named _md5

_md5 is being imported by the hashlib module because the import of _hashlib.so failed. When I try to import the _hashlib module manually, I see that it cannot find libssl.so.6 (the _hashlib module in the standard python installation which came with Ubuntu works just fine). If I manually copy the libssl.so.6 file from FC6 (it's really a symlink pointing to libssl.so.0.9.8b) to Ubuntu and make the symlink in /lib, it works fine (the next error is related to libcrypto, libxslt and libjpeg; I'm also using libxml, libxsl and PIL in my application). I've scoured the net and it seems that _hashlib.so is dynamically linked to libssl, which is not being "integrated" into the build by cx_Freeze. Similarly, there are other files with the same problem. Does anyone have any idea how to make cx_Freeze produce a linux executable which is portable across *nix distros?

Thanks in advance,
-Prateek Sureka
-- http://mail.python.org/mailman/listinfo/python-list
Re: Python Database Apps
On Sep 14, 1:00 am, Jonathan Gardner <[EMAIL PROTECTED]> wrote:
> On Sep 12, 9:38 pm, Prateek <[EMAIL PROTECTED]> wrote:
> > Have you checked out Brainwave? http://www.brainwavelive.com
> >
> > We provide a schema-free non-relational database bundled with an app
> > server which is basically CherryPy with a few enhancements (rich JS
> > widgets, Cheetah/Clearsilver templates). Free for non-commercial use.
>
> You might want to rethink how you are handling databases. What sets
> your database apart from hierarchical or object-oriented databases? Do
> you understand why people prefer relational databases over the other
> options? There's a reason why SQL has won out over all the other
> options available. You would do well to understand it rather than
> trying out things we already know do not work.

I didn't mention any details in my previous post. Basically, the Brainwave database is very similar to an RDF triple store, which captures semantic information. There are some fundamental differences, however (e.g. a simplified API, no schema, the ability to store objects by introspecting them and breaking them up into their primitives, etc.). In general, it works well if you're dealing with a lot of unstructured data, or even semi-structured/structured data where the structure is constantly changing. Key areas of benefit are integrating legacy databases and analytics-heavy applications. This is because there is no concept of normalized/denormalized schemas, so your data does not need to go through any structural transformations in preparation for OLAP use.

Since we're new, I probably wouldn't recommend using it with very large data-sets yet. However, we've seen significant benefits with small and medium sized data-sets. At any rate, whatever you choose, you can import and export all data as CSV/XML at any time, so your data is safe. Some of the benefits I have described above have been shown to be areas of concern when dealing with SQL stores. There is a lot of research and many opinion pieces available on this subject. I'd be happy to discuss any aspect with you further.

Sorry for the delayed response; I've been in and around airplanes for the last 24 hours (and it's not over yet).

-Prateek
-- http://mail.python.org/mailman/listinfo/python-list
Re: Possible suggestion for removing the GIL
On Sep 13, 1:36 pm, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> Prateek wrote:
> > Hi,
> >
> > Recently there was some talk on removing the GIL and even the BDFL has
> > written a blog post on it.
> > I was trying to come up with a scalable and backwards compatible
> > approach for how to do it.
> >
> > I've put my thoughts up in a blog post - and I'd really like to hear
> > what the community thinks of it.
> > Mainly it revolves around dedicating one core for executing
> > synchronized code and doing context switches instead of acquiring/
> > releasing locks.
>
> Where is the gain? Having just one core doesn't give you true parallelism -
> which is the main reason behind the cries for a GIL-less Python.
>
> Diez

Diez,

I was talking of dedicating one core to synchronized code. In the case of a threaded app on two cores, one core would be dedicated to synchronized code and the other would run non-sync code (effectively behaving like a single-threaded machine). However, on machines with n cores where n > 2, we would have n-1 cores available to run code in parallel.

Prateek
-- http://mail.python.org/mailman/listinfo/python-list
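The idea can be caricatured in pure Python: ship every "synchronized" block to one dedicated worker thread, so such blocks serialize against each other without taking a lock per block, while other threads keep running. This is only a toy model of the proposal (made-up class names, no interpreter changes), not its actual implementation:

```python
import queue
import threading

class SyncCore:
    """Toy model: one dedicated thread runs all 'synchronized' code,
    so synchronized blocks serialize without explicit locking."""

    def __init__(self):
        self._q = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The single 'synchronized core': executes submitted callables
        # one at a time, in arrival order.
        while True:
            fn, args, done, box = self._q.get()
            box.append(fn(*args))
            done.set()

    def call(self, fn, *args):
        # Submit fn to the dedicated thread and block until it has run.
        done, box = threading.Event(), []
        self._q.put((fn, args, done, box))
        done.wait()
        return box[0]

core = SyncCore()
counter = {'n': 0}

def bump():
    counter['n'] += 1  # only ever runs on the dedicated thread

def worker():
    for _ in range(100):
        core.call(bump)

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter['n'] == 1000  # no lost updates, and no per-update lock
```

The cost, as the quoted objection notes, is that all synchronized work is limited to one core's throughput; the claimed win only appears when most code is non-synchronized and n > 2.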
Re: Python Database Apps
On Sep 13, 4:55 am, Ed Leafe <[EMAIL PROTECTED]> wrote:
> On Sep 12, 2007, at 10:53 AM, [EMAIL PROTECTED] wrote:
> > Thanks for ideas Ed. I am checking out dabo now. I do have a few
> > questions about it. Packaging. Is it easy to package into a quick
> > install for windows. The users are going to want to get too in
> > depth.
>
> py2exe is your friend here. I know several developers who have used
> this to distribute Dabo apps, so we could certainly help you get your
> setup.py working.
>
> > Second, data sources. When I'm adding a data source to the
> > window in class designer, it always picks up the one I created (which
> > incidentally was a sample, my form for connection manager isn't
> > working at the moment.) My idea is to have the the local sqlite
> > database as the only viewable data source, and the server side only
> > for syncing. So they logon, sync up, sync down, and view. I'm
> > worried about having to get them to install python, dabo, and the app.
>
> The users would never see any of the Class Designer, connection
> editor, or any of the other development tools. I would imagine that
> you would need to code the sync parts by getting the current changed
> data from the local SQLite database, creating a connection to the
> server DB, doing the insert/update as needed, grabbing the latest
> from the server, disconnecting from the server, and then updating the
> local data. The user would probably need to do nothing more than
> click a button to start running your code.
>
> As far as what the user must install, that's what will happen with
> any Python solution. py2exe takes care of all of that, bundling
> Python, Dabo, your database modules, and any other dependencies into
> a single .exe file. You can then use something like Inno Setup to
> create a basic installer that will look and work like any other
> Windows application installer.
>
> -- Ed Leafe
> -- http://leafe.com
> -- http://dabodev.com

Have you checked out Brainwave? http://www.brainwavelive.com

We provide a schema-free non-relational database bundled with an app server which is basically CherryPy with a few enhancements (rich JS widgets, Cheetah/Clearsilver templates). Free for non-commercial use.

--Prateek Sureka
-- http://mail.python.org/mailman/listinfo/python-list
Possible suggestion for removing the GIL
Hi,

Recently there was some talk of removing the GIL, and even the BDFL has written a blog post on it. I was trying to come up with a scalable and backwards compatible approach for how to do it. I've put my thoughts up in a blog post, and I'd really like to hear what the community thinks of it. Mainly it revolves around dedicating one core to executing synchronized code and doing context switches instead of acquiring/releasing locks.

http://www.brainwavelive.com/blog/index.php?/archives/12-Suggestion-for-removing-the-Python-Global-Interpreter-Lock.html

Thanks,
Prateek Sureka
-- http://mail.python.org/mailman/listinfo/python-list
MROW Locking
Can anyone direct me to a good resource on how to do MROW (multiple-readers, one-writer) locking efficiently in Python? The recipe at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/413393 is actually quite inefficient (not sure if it is the code or MROW itself). I have a dictionary (a cache, let's say, or some sort of index) which needs to be accessed by multiple threads. Most of the time it's just being read. Very rarely I have to iterate over it (sometimes writing, mostly reading), and sometimes I have to update a single entry or multiple entries (a la dict.update()). I'd like to know the best way to make this happen (i.e. is MROW really what I am looking for, or is there something else?). Is there a good way to do this using the built-in Lock and RLock objects? This project is part of a commercial database product.

Prateek
-- http://mail.python.org/mailman/listinfo/python-list
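For a read-mostly dictionary like this, a small readers-writer lock built on the built-in Condition object is often enough. A sketch (no writer preference or fairness handling, which a production MROW lock would want):

```python
import threading

class MROWLock:
    """Many-readers/one-writer lock: any number of readers may hold it
    at once; a writer waits until all readers leave, then excludes
    both readers and other writers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()

lock = MROWLock()
cache = {}

# Reader: many of these may run concurrently.
lock.acquire_read()
try:
    value = cache.get('key')
finally:
    lock.release_read()

# Writer: excludes readers and other writers for the update.
lock.acquire_write()
try:
    cache.update({'key': 'value'})
finally:
    lock.release_write()
```

Note that under CPython's GIL, single dict reads and writes are already atomic; a lock like this only pays off when iteration or multi-key updates must see a consistent snapshot.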
Re: fastest way to find the intersection of n lists of sets
On Apr 30, 12:37 pm, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> [Prateek]
> > The reason why I'm casting to a list first is because I found that
> > creating a long list which I convert to a set in a single operation is
> > faster (although probably less memory efficient - which I can deal
> > with) than doing all the unions.
>
> That would be a surprising result because set-to-set operations do
> not have to recompute hash values. Also, underneath-the-hood,
> both approaches share the exact same implementation for inserting
> new values once the hash value is known.
>
> If you've seen an unfavorable speed comparison, then you most likely
> had code that built new intermediate sets between steps:
>
>     common = s1 | s2 | s3 | s4 | s5 | s6 | s7 | s8 | s9
>
> Instead, it is faster to build-up a single result set:
>
>     common = set()
>     for s in s1, s2, s3, s4, s5, s6, s7, s8, s9:
>         common |= s
>
> Raymond Hettinger

Thanks Raymond,

This was my old code (self.lv is a dictionary which retrieves data from the disk or cache):

    v_r = reduce(operator.or_, [self.lv[x.id] for x in v], set())

This code ran faster:

    v_r = reduce(operator.add, [list(self.lv.get(x.id, [])) for x in v], [])
    v_r = set(v_r)

I was doing 3 of these and then intersecting them. Now, I'm doing:

    v_r = set()
    _efs = frozenset()
    for y in [self.lv.get(x.id, _efs) for x in v]:
        v_r |= y

Since the number of sets is always 2 or 3, I just do the intersections explicitly, like so:

    if len(list_of_unioned_sets) == 3:
        result = list_of_unioned_sets[0]
        result &= list_of_unioned_sets[1]
        result &= list_of_unioned_sets[2]
    elif len(list_of_unioned_sets) == 2:
        result = list_of_unioned_sets[0]
        result &= list_of_unioned_sets[1]
    else:
        # Do something else...

Sorry for the relatively non-descript variable names.

Prateek
-- http://mail.python.org/mailman/listinfo/python-list
Re: help in debugging file.seek, file.read
On Apr 30, 3:20 am, Prateek <[EMAIL PROTECTED]> wrote:
> Sorry, I forgot to mention - the RTH line only prints when the time
> taken is > 0.1 seconds (so that I don't pollute the output with other
> calls that complete normally)

I have some more information on this problem. It turns out the issue is with buffer syncing. I was already doing a flush operation on the file after every commit (which flushes the user buffers). I added an os.fsync call, which fixed the erratic behavior. But the code is still horribly slow: 122s vs. around 100s (without the fsync). Since fsync flushes the kernel buffers, I'm assuming this has something to do with my OS (Mac OS 10.4.9 Intel - MacBook Pro 2.33GHz Core 2 Duo, 2GB RAM). Can anybody help?
-- http://mail.python.org/mailman/listinfo/python-list
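For reference, the flush-plus-fsync sequence being discussed looks like this: flush() empties Python's user-space buffer into the kernel, and os.fsync() forces the kernel buffer to the device, which is the expensive, durability-buying step.

```python
import os
import tempfile

def durable_write(f, data):
    """Write data and force it all the way to disk."""
    f.write(data)
    f.flush()             # user-space buffer -> kernel page cache
    os.fsync(f.fileno())  # kernel page cache -> device (slow but durable)

# Round-trip check with a throwaway file.
with tempfile.NamedTemporaryFile('w+b', delete=False) as f:
    durable_write(f, b'commit record')
    path = f.name

with open(path, 'rb') as f:
    contents = f.read()
assert contents == b'commit record'
os.remove(path)
```

Paying an fsync per commit is exactly the trade-off seen in the timings above; batching several commits per fsync is the usual way databases soften it.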
Re: fastest way to find the intersection of n lists of sets
On Apr 30, 5:08 am, John Nagle <[EMAIL PROTECTED]> wrote:
> Prateek wrote:
> > > For the above example, it's worth sorting lists_of_sets by the
> > > length of the sets, and doing the short ones first.
> >
> > Thanks. I thought so - I'm doing just that using a simple Decorate-
> > Sort-Undecorate idiom.
> >
> > > How big are the sets? If they're small, but you have a lot of
> > > them, you may be better off with a bit-set representation, then
> > > using AND operations for intersection. If they're huge (tens of millions
> > > of entries), you might be better off doing sorts and merges on the
> > > sets.
> >
> > I have either 2 or 3 sets (never more) which can be arbitrarily large.
> > Most of them are small (between 0 and a few elements - say less than 5).
> > A few will be megamonstrous (> 100,000 items)
> >
> > > When you ask questions like this, it's helpful to give some
> > > background. We don't know whether this is a homework assignment, or
> > > some massive application that's slow and you need to fix it, even
> > > if it requires heavy implementation effort.
> >
> > It's definitely not a homework assignment - it's part of a commercial
> > database query engine. Heavy implementation effort is no problem.
> >
> > Prateek
>
> If you're intersecting a set of 5 vs a set of 100,000, the
> intersection time won't be the problem. That's just five lookups.
> It's building a set of 100,000 items that may be time consuming.
>
> Does the big set stay around for a while, or do you have to pay
> that cost on each query?
>
> Those really aren't big data sets on modern machines.
>
> John Nagle

100,000 is an arbitrary number - it is potentially equivalent to the number of unique cells in all tables of a typical database (that's the best analogy I can come up with, since this isn't a typical RDBMS). The big set does stay around for a while - I've implemented an LRU-based caching algorithm in the code that does the I/O. Since the db is transactioned, I keep one copy in the current transaction cache (which is a simple dictionary) and one in the main read cache (LRU-based), which obviously survives across transactions. Since the largest sets also tend to be the most frequently used, this basically solves my I/O caching issue.

My problem is that I had ugly code doing a lot of unnecessary list <-> set casting. Here is what I've got now:

    from itertools import chain

    ids1 = [...]
    ids2 = [...]
    ids3 = [...]
    _efs = frozenset()
    # dbx.get calls return sets
    l1 = frozenset(chain(*[db1.get(x, _efs) for x in ids1]))
    l2 = frozenset(chain(*[db2.get(x, _efs) for x in ids2]))
    l3 = frozenset(chain(*[db3.get(x, _efs) for x in ids3]))
    decorated = [(len(x), x) for x in [l1, l2, l3]]
    decorated.sort()
    result = reduce(frozenset.intersection, [x[1] for x in decorated])

What do you think?

Prateek
-- http://mail.python.org/mailman/listinfo/python-list
Re: Launching an independent Python program in a cross-platform way (including mac)
On Apr 30, 4:32 am, André <[EMAIL PROTECTED]> wrote:
> I would like to find out how I can launch an independent Python
> program from an existing one in a cross-platform way. The result I am
> after is that a new terminal window should open (for io independent of
> the original script).
>
> The following seems to work correctly under Ubuntu and Windows ... but
> I haven't been able to find a way to make it work under Mac OS.
>
> def exec_external(code, path):
>     """execute code in an external process
>     currently works under:
>     * Windows NT (tested)
>     * GNOME (tested) [January 2nd and 15th change untested]
>     This also needs to be implemented for OS X, KDE
>     and some form of linux fallback (xterm?)
>     """
>     if os.name == 'nt':
>         current_dir = os.getcwd()
>         target_dir, fname = os.path.split(path)
>
>     filename = open(path, 'w')
>     filename.write(code)
>     filename.close()
>
>     if os.name == 'nt':
>         os.chdir(target_dir)  # change dir so as to deal with paths
>                               # that include spaces
>         Popen(["cmd.exe", ('/c start python %s' % fname)])
>         os.chdir(current_dir)
>     elif os.name == 'posix':
>         try:
>             os.spawnlp(os.P_NOWAIT, 'gnome-terminal', 'gnome-terminal',
>                        '-x', 'python', '%s' % path)
>         except:
>             raise NotImplementedError
>     else:
>         raise NotImplementedError
>
> Any help would be greatly appreciated.
>
> André

Well, you need to check sys.platform on the Mac instead of os.name. os.name returns 'posix' on all *nix based systems; sys.platform helpfully returns "darwin" on the Mac. I'm not sure how to start Terminal, though. Here's what I got when I tried it:

>>> if sys.platform == "darwin":
...     os.spawnlp(os.P_NOWAIT, '/Applications/Utilities/Terminal.app/Contents/MacOS/Terminal')
9460
>>> 2007-04-30 05:19:59.255 [9460] No Info.plist file in application bundle or no NSPrincipalClass in the Info.plist file, exiting

Maybe I'm just calling it wrong and you'll have more luck.

Prateek
-- http://mail.python.org/mailman/listinfo/python-list
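An untested alternative sketch: rather than exec'ing the binary inside Terminal.app's bundle (which bypasses the bundle and is what produces the "No Info.plist file" complaint above), ask Terminal to run the script via AppleScript. The osascript incantation here is an assumption, not something verified in the thread:

```python
import subprocess
import sys

def exec_in_terminal_mac(path):
    """Sketch: open a new Terminal window on macOS and run the script
    there. osascript drives Terminal.app through AppleScript, so the
    app is launched with its proper bundle context."""
    if sys.platform != 'darwin':
        raise NotImplementedError
    subprocess.Popen([
        'osascript', '-e',
        'tell application "Terminal" to do script "python %s"' % path,
    ])
```

`open -a Terminal <script>` is a simpler variant when the script file is executable.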
Re: fastest way to find the intersection of n lists of sets
On Apr 30, 3:48 am, James Stroud <[EMAIL PROTECTED]> wrote:
> Prateek wrote:
> > I have 3 variable length lists of sets. I need to find the common
> > elements in each list (across sets) really really quickly.
> >
> > Here is some sample code:
> >
> > # Doesn't make sense to union the sets - we're going to do
> > # intersections later anyway
> > l1 = reduce(operator.add, (list(x) for x in l1))
> > l2 = reduce(operator.add, (list(x) for x in l2))
> > l3 = reduce(operator.add, (list(x) for x in l3))
> >
> > # Should I do this in two steps? Maybe by intersecting the two
> > # shortest lists first?
> > s = frozenset(l1) & frozenset(l2) & frozenset(l3)
> >
> > I'm assuming frozensets are (somehow) quicker than sets because
> > they're immutable.
> >
> > Any code suggestions? Maybe using something in the new fancy-schmancy
> > itertools module?
> >
> > Thanks,
> > Prateek
>
> I don't understand why you cast to list. I would propose:

The reason why I'm casting to a list first is because I found that creating a long list which I convert to a set in a single operation is faster (although probably less memory efficient - which I can deal with) than doing all the unions.

Prateek
-- http://mail.python.org/mailman/listinfo/python-list
Re: fastest way to find the intersection of n lists of sets
> For the above example, it's worth sorting lists_of_sets by the
> length of the sets, and doing the short ones first.

Thanks. I thought so - I'm doing just that using a simple Decorate-Sort-Undecorate idiom.

> How big are the sets? If they're small, but you have a lot of
> them, you may be better off with a bit-set representation, then
> using AND operations for intersection. If they're huge (tens of millions
> of entries), you might be better off doing sorts and merges on the
> sets.

I have either 2 or 3 sets (never more) which can be arbitrarily large. Most of them are small (between 0 and a few elements - say less than 5). A few will be megamonstrous (> 100,000 items).

> When you ask questions like this, it's helpful to give some
> background. We don't know whether this is a homework assignment, or
> some massive application that's slow and you need to fix it, even
> if it requires heavy implementation effort.

It's definitely not a homework assignment - it's part of a commercial database query engine. Heavy implementation effort is no problem.

Prateek
-- http://mail.python.org/mailman/listinfo/python-list
Re: help in debugging file.seek, file.read
Sorry, I forgot to mention - the RTH line only prints when the time taken is > 0.1 seconds (so that I don't pollute the output with other calls that complete normally).
-- http://mail.python.org/mailman/listinfo/python-list
help in debugging file.seek, file.read
I have a weird sort of problem. I'm writing a bunch of sets to a file (each element is a fixed-length string). I was originally using the built-in sets type, but due to a processing issue I had to shift to a Python implementation (see http://groups.google.com/group/comp.lang.python/browse_thread/thread/77e06005e897653c/12270083be9a67f6). I'm using Raymond Hettinger's very graciously provided TransactionSet recipe from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511496

My old code was pretty speedy, but the new code is quite slow, and according to some hotshot profiles it has nothing to do with the new TransactionSet. *Some* of the file.seek and file.read calls occasionally block for insane amounts (>10s) of time when reading no more than 45 bytes of data from the file. So when I'm running load-tests, this eventually happens like so: each i_id line is the time taken for 100 successive commits; each RTH (Read Table Header) line is the time taken for a single read call for 45 bytes of data.

    # sz__TABLE_HEADER_FORMAT__ is 45
    hdr = os.read(f.fileno(), sz__TABLE_HEADER_FORMAT__)
    #hdr = f.read(sz__TABLE_HEADER_FORMAT__)  # Tried this as well

Loading items...
i_id: 0 0.00003057861
i_id: 100 1.01557397842
i_id: 200 1.14013886452
i_id: 300 1.16142892838
i_id: 400 1.16356801987
i_id: 500 1.36410307884
i_id: 600 1.34421014786
i_id: 700 1.30385017395
i_id: 800 1.48079919815
i_id: 900 1.41147589684
RTH: 0.582525968552
RTH: 2.77490496635
i_id: 1000 5.16863512993
i_id: 1100 1.73725795746
i_id: 1200 1.56621193886
i_id: 1300 1.81338000298
i_id: 1400 1.69464302063
i_id: 1500 1.74725604057
i_id: 1600 2.3591946
i_id: 1700 1.85096788406
i_id: 1800 2.20518493652
i_id: 1900 1.94831299782
i_id: 2000 2.03350806236
i_id: 2100 2.32529306412
i_id: 2200 2.44498205185
RTH: 0.105868816376
i_id: 2300 3.65522289276
i_id: 2400 4.2119910717
i_id: 2500 4.21354198456
RTH: 0.115046024323
RTH: 0.122591972351
RTH: 2.88115119934
RTH: 10.5908679962
i_id: 2600 18.8498170376
i_id: 2700 2.42577004433
i_id: 2800 2.47392010689
i_id: 2900 2.88293218613

So I have no idea why this is happening (it is also happening with seek operations). Any guidance on how to debug this?

thanks,
Prateek
-- http://mail.python.org/mailman/listinfo/python-list
fastest way to find the intersection of n lists of sets
I have 3 variable-length lists of sets. I need to find the common elements in each list (across sets) really really quickly.

Here is some sample code:

    # Doesn't make sense to union the sets - we're going to do
    # intersections later anyway
    l1 = reduce(operator.add, (list(x) for x in l1))
    l2 = reduce(operator.add, (list(x) for x in l2))
    l3 = reduce(operator.add, (list(x) for x in l3))

    # Should I do this in two steps? Maybe by intersecting the two
    # shortest lists first?
    s = frozenset(l1) & frozenset(l2) & frozenset(l3)

I'm assuming frozensets are (somehow) quicker than sets because they're immutable.

Any code suggestions? Maybe using something in the new fancy-schmancy itertools module?

Thanks,
Prateek
-- http://mail.python.org/mailman/listinfo/python-list
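Following the advice given later in this thread, the unions can be built without intermediate lists, and the intersection can start from the smallest set so the running result only shrinks. A sketch with made-up sample data:

```python
from itertools import chain

# Three "variable length lists of sets" (made-up sample data).
l1 = [{1, 2, 3}, {3, 4}]
l2 = [{3, 4, 5}, {2, 3}]
l3 = [{0, 3}, {3, 9}]

# Union each list of sets via chain() -- no intermediate lists needed.
u1, u2, u3 = (frozenset(chain(*lst)) for lst in (l1, l2, l3))

# Intersect starting from the smallest union: the running result can
# only shrink, so later intersections against big sets stay cheap.
smallest, *rest = sorted((u1, u2, u3), key=len)
result = smallest
for s in rest:
    result &= s

assert result == {3}  # 3 is the only element common to all three lists
```

With only 2 or 3 unions, the sort costs almost nothing, but ordering by size matters when one set is tiny and another holds 100,000+ items.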
Re: Support for new items in set type
> I don't see where your SeaSet class is used.

Actually, that is the point. According to the hotshot profile, the problem code doesn't use the SeaSet implementation, yet that same code was running much faster earlier. I tried multiple times (2-3 runs). From what I can fathom, nothing else changed - just the set implementation. It seems obvious that the read() call in the __readTableHeader method is blocking for longer periods, although it isn't exactly obvious why that might be. I was hoping someone with an idea of Python-C interaction could shed some light on this. I'm on a different computer right now; I'll log back in later and post more code if that helps. Again, thanks to anyone who can help.

Prateek
-- http://mail.python.org/mailman/listinfo/python-list
Re: Support for new items in set type
Oh dear god. I implemented this and it overall killed performance by about 50% - 100%. The same script (entering 3000 items) takes between 88 - 109s (it was running in 55s earlier). Here is the new Set implementation:

class SeaSet(set):
    __slots__ = ['master', 'added', 'deleted']

    def __init__(self, s=None):
        if s is None:
            s = []
        self.master = set(s)
        self.added = set()
        self.deleted = set()

    def add(self, l):
        if l not in self.master:
            self.added.add(l)
        try:
            self.deleted.remove(l)
        except KeyError:
            pass

    def remove(self, l):
        try:
            self.master.remove(l)
            self.deleted.add(l)
        except KeyError:
            try:
                self.added.remove(l)
            except KeyError:
                pass

    def __contains__(self, l):
        if l in self.deleted:
            return False
        elif l in self.added:
            return True
        else:
            return l in self.master

    def __len__(self):
        return len(self.master) + len(self.added)

    def __iter__(self):
        for x in self.master:
            yield x
        for x in self.added:
            yield x

    def __or__(self, s):
        return self.union(s)

    def __and__(self, s):
        return self.intersection(s)

    def __sub__(self, s):
        return self.difference(s)

    def union(self, s):
        """docstring for union"""
        if isinstance(s, (set, frozenset)):
            return s | self.master | self.added
        elif isinstance(s, SeaSet):
            return self.master | self.added | s.master | s.added
        else:
            raise TypeError

    def intersection(self, s):
        """docstring for intersection"""
        if isinstance(s, (set, frozenset)):
            return s & self.master & self.added
        elif isinstance(s, SeaSet):
            return self.master & self.added & s.master & s.added
        else:
            raise TypeError

    def difference(self, s):
        """docstring for difference"""
        if isinstance(s, (set, frozenset)):
            self.deleted |= (self.master - s)
            self.master -= s
            self.added -= s
        elif isinstance(s, SeaSet):
            t = (s.master | s.deleted)
            self.deleted |= self.master - t
            self.master -= t
            self.added -= t
        else:
            raise TypeError

The surprising thing is that commits *ARE* running about 50% faster (according to the time column in the hotshot profiler). But now the longest running operations seem to be the I/O operations, which are taking 10 times longer, even if they're only reading or writing a few bytes. Could this have something to do with the set implementation being in Python as opposed to C? For instance, this method:

    def __readTableHeader(self, f):
        hdr = f.read(sz__TABLE_HEADER_FORMAT__)
        if len(hdr) < sz__TABLE_HEADER_FORMAT__:
            raise EOFError
        t = THF_U(hdr)
        #t = unpack(__TABLE_HEADER_FORMAT__, hdr)
        return t

is now taking > 13s when it was taking less than 0.8s before! (Same number of calls; nothing changed except the set implementation.) sz__TABLE_HEADER_FORMAT__ is a constant = struct.calcsize(...)
-- http://mail.python.org/mailman/listinfo/python-list
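For readers unfamiliar with the pattern being timed, here is the fixed-size-header read in isolation. The format string is an invented stand-in (the real one was lost from the post), chosen only so that calcsize matches the 45-byte reads mentioned elsewhere in the thread:

```python
import struct

# Made-up layout: id, flags, offset, 29-byte name; '<' disables padding.
TABLE_HEADER_FORMAT = '<I I Q 29s'
HDR_SIZE = struct.calcsize(TABLE_HEADER_FORMAT)
assert HDR_SIZE == 45  # 4 + 4 + 8 + 29, matching the 45-byte reads above

def read_table_header(f):
    """Read and unpack one fixed-size table header from file object f."""
    hdr = f.read(HDR_SIZE)
    if len(hdr) < HDR_SIZE:
        raise EOFError
    return struct.unpack(TABLE_HEADER_FORMAT, hdr)
```

Precomputing `struct.Struct(TABLE_HEADER_FORMAT).unpack` (as the post's `THF_U` appears to do) avoids re-parsing the format string on every call.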
Re: Python and Javascript equivalence
Try creating a dict with sequential numeric keys. If you already have a list called my_list, you can do:

    com_array = dict(zip(range(len(my_list)), my_list))

This works when you want to convert Python objects to Javascript using JSON. It may work for you.

-Prateek
-- http://mail.python.org/mailman/listinfo/python-list
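A quick demonstration of the trick, including the JSON round trip (the sample list is made up); note that dict(enumerate(...)) builds the same mapping more directly, and that JSON object keys are always strings, so the numeric keys are stringified on output:

```python
import json

my_list = ['a', 'b', 'c']
com_array = dict(zip(range(len(my_list)), my_list))

# dict(enumerate(...)) produces the identical index -> value mapping.
assert com_array == dict(enumerate(my_list)) == {0: 'a', 1: 'b', 2: 'c'}

# JSON objects only allow string keys, so the ints come out quoted.
assert json.dumps(com_array, sort_keys=True) == '{"0": "a", "1": "b", "2": "c"}'
```

On the Javascript side the result behaves like an object indexed by "0", "1", ..., not a true Array; serializing the plain list gives a real JS Array.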
Re: pickled object, read and write..
On Apr 22, 11:40 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> Hi all.
>
> I have to put together some code that reads high scores from a saved
> file, then gives the user the opportunity to add their name and score
> to the high scores list, which is then saved.
>
> Trouble is, I can't tell the program to read a file that doesn't
> exist, that generates an error.
>
> So I must have a file created, problem HERE is everytime the program
> is run, it will overwrite the current list of saved high scores.
>
> Advice would be much appreciated.

Try the following idiom:

    try:
        try:
            fp = open("filename", 'r+')
        except IOError:
            fp = open("filename", 'w+')
        fp.write(high_score)
    finally:
        fp.close()

-Prateek
-- http://mail.python.org/mailman/listinfo/python-list
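Since the thread is about pickled objects, a slightly more direct pattern for the high-scores case is "load if the file exists, else start empty", with pickle handling the serialization. A sketch with hypothetical helper and file names:

```python
import os
import pickle
import tempfile

def load_scores(path):
    """Return the saved high-score list, or [] on the very first run."""
    if not os.path.exists(path):
        return []
    with open(path, 'rb') as fp:
        return pickle.load(fp)

def save_scores(path, scores):
    with open(path, 'wb') as fp:
        pickle.dump(scores, fp)

# First run: no file yet, so we start empty instead of crashing.
path = os.path.join(tempfile.mkdtemp(), 'highscores.pkl')
scores = load_scores(path)
scores.append(('Player1', 9001))
save_scores(path, scores)

# Second run: the earlier scores come back instead of being overwritten.
assert load_scores(path) == [('Player1', 9001)]
os.remove(path)
```

Opening with 'wb' only inside save_scores means the file is never truncated before the old scores have been read.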
Re: Support for new items in set type
> > 2) Maintaining a copy wastes memory
> > 3) I don't have a good solution if I delete items from the set
> > (calculating the difference will return an empty set but I need to
> > actually delete stuff).
>
> (3) is easy -- the difference originalset-finalset is the set of things
> you have to delete.

Now why didn't I think of that?

> > Does anyone have any ideas?
>
> Your problem is really a thinly disguised version of one of the most
> ancient and venerable ones in data processing -- the "master file /
> transaction file" problem. With sets (which are built as hash tables)
> you have very powerful weapons for this fray (it used to be, many years
> ago, that sorting and "in-step processing of sorted files" was the only
> feasible approach to such problems), but you still have to wield them
> properly.
>
> In your shoes, I would write a class whose instances hold three sets:
> -- the "master set" is what you originally read from the file
> -- the "added set" is the set of things you've added since then
> -- the "deleted set" is the set of things you've deleted since then
>
> You can implement a set-like interface; for example, your .add method
> will need to remove the item from the deleted set (if it was there),
> then add it to the added set (if it wasn't already in the master set);
> and so on, for each and every method you actually desire (including no
> doubt __contains__ for membership testing). E.g., with suitably obvious
> names for the three sets, we could have...:
>
>     class cleverset(object):
>         ...
>         def __contains__(self, item):
>             if item in self.deleted: return False
>             elif item in self.added: return True
>             else: return item in self.master
>
> When the time comes to save back to disk, you'll perform deletions and
> additions based on self.deleted and self.added, of course.

Ok. This class sounds like a good idea. I'll try it.
> I'm not addressing the issue of how to best represent such sets on disk
> because it's not obvious to me what operations you need to perform on
> the on-disk representation -- for example, deletions can be very costly
> unless you can afford to simply set a "logically deleted" flag on
> deleted items; but, depending on how often you delete stuff, that will
> eventually degrade performance due to fragmentation, and maintaining the
> indexing (if you need one) can be a bear.

I actually don't have very many transactions I need to perform on disk. Once I've loaded these sets, I normally use the contains operation and union/intersection/difference operations. I can definitely use a tombstone on the record to mark it as logically deleted.

The problem right now is that I don't need indexing within sets (I either load the whole set or don't bother at all) - I'm only indexing the entire file (one entry in the index per tablespace). So even if I use tombstones, I'll still have to go through the whole tablespace to find the tombstoned entries.

Alternatively, I could write the deleted data at the end of the tablespace and mark it with the tombstone (since I don't care about space so much), and when I load the data, I can eliminate the tombstoned entries. I'll give that a try and report my performance results on this thread - I absolutely love that I can make a change this important in one or two days using Python!
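One way the append-tombstones-at-the-end variant could look. This is purely an illustration, not the poster's engine: the one-byte flag per fixed-length record, the 'L'/'D' markers, and the function names are all assumptions.

```python
import os
import tempfile
import uuid

REC = 36  # fixed-length 36-char uuid strings, as in the original post

def append_records(path, items, tombstone=False):
    # Each record: 1 flag byte ('L' = live, 'D' = tombstone) + 36-char key.
    # Deletions are simply appended at the end of the tablespace.
    flag = b'D' if tombstone else b'L'
    with open(path, 'ab') as fp:
        for item in items:
            fp.write(flag + item.encode('ascii'))

def load_records(path):
    # Replay the tablespace; a later tombstone cancels an earlier live record.
    live = set()
    with open(path, 'rb') as fp:
        data = fp.read()
    for i in range(0, len(data), REC + 1):
        flag = data[i:i + 1]
        key = data[i + 1:i + 1 + REC].decode('ascii')
        if flag == b'L':
            live.add(key)
        else:
            live.discard(key)
    return live

path = os.path.join(tempfile.mkdtemp(), 'space.bin')
a, b = str(uuid.uuid4()), str(uuid.uuid4())
append_records(path, [a, b])
append_records(path, [b], tombstone=True)
print(load_records(path) == {a})  # True
```

Deletes become appends (cheap, no rewriting the tablespace), at the cost of the replay pass at load time - which matches the "eliminate the tombstoned entries when I load" plan above.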
> Normally, I would use a good relational database (which has no doubt
> already solved on my behalf all of these issues) for purpose of
> persistent storage of such data -- usually sqlite for reasonably small
> amounts of data, PostgreSQL if I need enormous scalability and other
> "enterprise" features, but I hear other people are happy with many other
> engines for relational DBs, such as Oracle (used to be superb if you
> could afford full-time specialized db admins), IBM's DB2, SAP's database
> (now opensourced), Firebird, even MySQL (I've never, but NEVER, had any
> good experience with the latter, but I guess maybe it's only me, as
> otherwise it could hardly be so popular). The main point here is that a
> relational DB's tables _should_ be already very well optimized for
> storing "sets", since those tables ARE exactly nothing but sets of tuples
> (and a 1-item tuple is a tuple for all that:-).

Thanks Alex, but we're actually implementing a (non-relational) database engine. This particular file is one of many files (which are all kept in sync by the engine) I need to persist (some files use other methods - including pickling - for persistence). I've already solved most of our other problems and, according to hotshot, this is one of the next hurdles (this commit is responsible for 8% of the total CPU time @ 3000 items and 3 files).

If you're interested, you can check it out at http://www.brainwavelive.com or email me.

Prateek
--
http://mail.python.org/mailman/listinfo/python-list
Re: Support for new items in set type
On Apr 22, 11:09 am, Steven D'Aprano <[EMAIL PROTECTED]> wrote:
> On Sat, 21 Apr 2007 20:13:44 -0700, Prateek wrote:
> > I have a bit of a specialized request.
> >
> > I'm reading a table of strings (specifically fixed length 36 char
> > uuids generated via uuid.uuid4() in the standard library) from a file
> > and creating a set out of it.
> > Then my program is free to make whatever modifications to this set.
> >
> > When I go back to save this set, I'd like to be able to only save the
> > new items.
>
> This may be a silly question, but why? Why not just save the modified set,
> new items and old, and not mess about with complicated transactions?

I tried just that. Basically I ignored all the difficulties of difference calculation and just overwrote the entire tablespace with the new set. At about 3000 entries per file (and 3 files), along with all the indexing etc., just the extra I/O cost me 28% performance. I got 3000 entries committed in 53s with difference calculation, but in 68s with writing the whole thing.

> After all, you say:
>
> > PS: Yes - I need blazing fast performance - simply pickling/unpickling
> > won't do. Memory constraints are important but definitely secondary.
> > Disk space constraints are not very important.
>
> Since disk space is not important, I think that you shouldn't care that
> you're duplicating the original items. (Although maybe I'm missing
> something.)
>
> Perhaps what you should be thinking about is writing a custom pickle-like
> module optimized for reading/writing sets quickly.

I already did this. I'm not using the pickle module at all. Since I'm guaranteed that my sets contain a variable number of fixed-length strings, I write a header at the start of each tablespace (using struct.pack) marking the number of rows, and then simply save each string one after the other without delimiters. I can do this simply by issuing "".join(list(set_in_question)) and then saving the string after the header.
There are a few more things that I handle (such as automatic tablespace overflow).

Prateek
--
http://mail.python.org/mailman/listinfo/python-list
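A sketch of the on-disk format described above. The exact header layout (a little-endian unsigned int) is an assumption; the post only says the header stores the row count via struct.pack:

```python
import io
import struct
import uuid

REC = 36  # fixed-length 36-char uuid strings

def dump_set(fp, s):
    # Header: row count, then the strings back to back with no delimiters,
    # i.e. "".join(list(set_in_question)) as described in the post.
    fp.write(struct.pack('<I', len(s)))
    fp.write(''.join(s).encode('ascii'))

def load_set(fp):
    (n,) = struct.unpack('<I', fp.read(4))
    data = fp.read(n * REC).decode('ascii')
    return set(data[i:i + REC] for i in range(0, n * REC, REC))

s = set(str(uuid.uuid4()) for _ in range(100))
buf = io.BytesIO()
dump_set(buf, s)
buf.seek(0)
loaded = load_set(buf)
print(loaded == s)  # True
```

Because every record is exactly 36 bytes, loading is just one read and a fixed-stride slice - no parsing, which is where the speedup over pickle comes from.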
Support for new items in set type
I have a bit of a specialized request.

I'm reading a table of strings (specifically fixed-length 36-char uuids generated via uuid.uuid4() in the standard library) from a file and creating a set out of it. Then my program is free to make whatever modifications to this set.

When I go back to save this set, I'd like to be able to only save the new items. Currently I am creating a copy of the set as soon as I load it, and when I go back to save it, I'm calculating the difference and saving just the difference.

There are many problems with this approach so far:
1) Calculating the difference via the standard set implementation is not very scalable -> O(n) I presume
2) Maintaining a copy wastes memory
3) I don't have a good solution if I delete items from the set (calculating the difference will return an empty set but I need to actually delete stuff).

I was thinking of writing a specialized set implementation (I have no idea how to start on this) which tracks new items (added after initialization) in a separate space and exposes a new API call which would enable me to directly get those values. This is kind of ugly and doesn't solve problem 3.

I also thought of using a hashtable, but I'm storing many (> 1 million) of these sets in the same file (most of them will be empty or contain just a few values, but a few of them will be very large - in excess of 10,000 items). The file is already separated into tablespaces.

Does anyone have any ideas? Thanks in advance.

Prateek

PS: Yes - I need blazing fast performance - simply pickling/unpickling won't do. Memory constraints are important but definitely secondary. Disk space constraints are not very important.
--
http://mail.python.org/mailman/listinfo/python-list
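For concreteness, here is the copy-and-diff approach in miniature, including the deletion case from point 3 (the keys are placeholders, not real uuids):

```python
original = {'u1', 'u2', 'u3'}    # the set as loaded from the file
working = set(original)          # the copy kept around for diffing

working.add('u4')                # a new item
working.remove('u2')             # a deletion

new_items = working - original   # {'u4'}: the rows to append on save
deleted = original - working     # {'u2'}: the reverse difference
                                 # recovers the deletions
print(new_items, deleted)
```

The forward difference alone misses deletions (point 3), but the reverse difference `original - working` yields exactly the items that must be removed from disk.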
Re: Fast Imaging for Webserver
Thanks for all the responses so far.

The 'imaging' app that I mentioned is actually serving as a library for a bunch of other apps. A typical scenario is icon processing. The source image *could* be large (in this case it was 128x128 PNGA). Think of it as a Mac Finder/Win Explorer style view (grid + imagelist) where you're showing a bunch of data in icon form (16x16). For small amounts of data, everything is OK... but as soon as I get to 600-700 items, the page takes forever to load.

The obvious solution is pagination, but I'm doing some aggregate analysis on the data (like a frequency distribution) where, if I were to paginate the data, I'd lose the client-side sorting ability (I'd have to sort server-side, which is a whole other can of worms).

I'm gonna take Paul's advice and report as soon as I can. One other alternative is to save this information on the filesystem (as temp files) and route all future requests via the static file processing mechanism (which can be handled by Apache)... do you think that is a good idea?

Prateek

On Jan 25, 3:11 pm, "Fredrik Lundh" <[EMAIL PROTECTED]> wrote:
> "Prateek" wrote:
> > Hi. I'm creating a web-application using CherryPy 2.2.1. My application
> > needs to process images (JPG/PNG files) to
> >
> > 1) create thumbnails (resize them)
> > 2) overlay them on a custom background (a simple frame)
> > 3) Overlay 'badges' (small 16x16 images) on top of the final thumbnail
> >
> > I am using PIL 1.1.5 which I have custom compiled on my development
> > machine (MacBook Pro 2.33Ghz Core Duo).
> > I am using im.thumbnail for step 1 and im.paste for steps 2 and 3.
> >
> > The problem is that this thing is just way too slow.
> > For ab -n 1000 -C session_id=2f55ae2dfefa896f67a80f73045aadfa4b4269f1
> > http://localhost:8080/imaging/icon/def/128/255 (where def is the name
> > of the image - default in this case - 128 is the size in pixels and 255
> > is the background color), I am getting:
> >
> > Document Path:          /imaging/icon/def/128/255
> > Document Length:        14417 bytes
> >
> > Concurrency Level:      1
> > Time taken for tests:   18.664 seconds
> > Complete requests:      1000
> > Failed requests:        0
> > Broken pipe errors:     0
> > Total transferred:      1468 bytes
> > HTML transferred:       14417000 bytes
> > Requests per second:    53.58 [#/sec] (mean)
> > Time per request:       18.66 [ms] (mean)
> > Time per request:       18.66 [ms] (mean, across all concurrent requests)
> > Transfer rate:          786.54 [Kbytes/sec] received
> >
> > FYI: This request returns a PNG image (image/png) and not html
> >
> > My understanding is that the problem is either with the CherryPy setup
> > (which is likely because even in other cases, i don't get much more
> > than 65 requests per second) or PIL itself (even though I'm caching the
> > background images and source images)
>
> without knowing more about what kind of source images you're using, the
> amount of data they represent etc, it's not entirely obvious to me that 50+
> images per second on a single server is that bad, really. unless the source
> images are rather tiny, that's plenty of pixels per second to crunch.
>
> are you really receiving more than 450 source images per day per
> server?

--
http://mail.python.org/mailman/listinfo/python-list
Fast Imaging for Webserver
Hi. I'm creating a web-application using CherryPy 2.2.1. My application needs to process images (JPG/PNG files) to

1) create thumbnails (resize them)
2) overlay them on a custom background (a simple frame)
3) Overlay 'badges' (small 16x16 images) on top of the final thumbnail

I am using PIL 1.1.5 which I have custom compiled on my development machine (MacBook Pro 2.33Ghz Core Duo). I am using im.thumbnail for step 1 and im.paste for steps 2 and 3.

The problem is that this thing is just way too slow.

For ab -n 1000 -C session_id=2f55ae2dfefa896f67a80f73045aadfa4b4269f1 http://localhost:8080/imaging/icon/def/128/255 (where def is the name of the image - default in this case - 128 is the size in pixels and 255 is the background color), I am getting:

Document Path:          /imaging/icon/def/128/255
Document Length:        14417 bytes

Concurrency Level:      1
Time taken for tests:   18.664 seconds
Complete requests:      1000
Failed requests:        0
Broken pipe errors:     0
Total transferred:      1468 bytes
HTML transferred:       14417000 bytes
Requests per second:    53.58 [#/sec] (mean)
Time per request:       18.66 [ms] (mean)
Time per request:       18.66 [ms] (mean, across all concurrent requests)
Transfer rate:          786.54 [Kbytes/sec] received

FYI: This request returns a PNG image (image/png) and not html.

My understanding is that the problem is either with the CherryPy setup (which is likely, because even in other cases I don't get much more than 65 requests per second) or PIL itself (even though I'm caching the background images and source images).

Does anyone have a better solution? Is there a faster replacement for PIL?

Thanks in advance.
Prateek
--
http://mail.python.org/mailman/listinfo/python-list
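For reference, the three steps map onto PIL roughly as follows. This is a sketch, not the original code: the function name, the sizes and positions, and passing in already-opened (cached) images are all assumptions. It uses the modern Pillow resampling name `Image.LANCZOS` (the PIL 1.1.5 era spelled it `Image.ANTIALIAS`).

```python
from PIL import Image

def make_icon(src, frame, badge, size=128):
    """src, frame and badge are already-opened (cached) PIL images."""
    im = src.copy().convert('RGBA')
    im.thumbnail((size, size), Image.LANCZOS)        # step 1: resize

    out = frame.convert('RGBA').resize((size, size))
    x = (size - im.width) // 2
    y = (size - im.height) // 2
    out.paste(im, (x, y), im)                        # step 2: onto the frame

    out.paste(badge, (size - 16, size - 16), badge)  # step 3: 16x16 badge
    return out

# Synthetic inputs just to exercise the pipeline:
src = Image.new('RGB', (512, 384), (200, 0, 0))
frame = Image.new('RGBA', (256, 256), (255, 255, 255, 255))
badge = Image.new('RGBA', (16, 16), (0, 0, 255, 255))
icon = make_icon(src, frame, badge, size=128)
print(icon.size)  # (128, 128)
```

Passing the image as the third `paste` argument makes it act as its own alpha mask, so transparent badge pixels don't overwrite the thumbnail underneath.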
Re: How can I get involved
Hey all,

As promised, I'm releasing v0.1 of JUnpickler - an unpickler for Python pickle data (currently Protocol 2 only) for Java.

http://code.brainwavelive.com/unpickler

Do check it out and let me have your comments.

Prateek Sureka

Fredrik Lundh wrote:
> Paul Boddie wrote:
> > I find it interesting that you've been using Python for so long and yet
> > haven't posted to this group before.
>
> c.l.python is just a small speck at the outer parts of the python
> universe. most python programmers don't even read this newsgroup,
> except, perhaps, when they stumble upon it via a search engine.

--
http://mail.python.org/mailman/listinfo/python-list
Python Unpickler for Java
Hey guys,

I've started work on JUnpickler - an Unpickler for Java. The Jython project has been going a little slow lately, so I thought this would be a nice addition to the community (plus I'll probably need it for my work anyway). So here is a bit of code I wrote (it's my first open source contribution) which takes pickle data (protocol 2) and deserializes it into a Java object.

http://code.brainwavelive.com/unpickler

I plan to use this in conjunction with a different process running a Pyro daemon for IPC-based object passing.

Thoughts? Comments? Contributors?

Prateek Sureka
http://www.brainwavelive.com
--
http://mail.python.org/mailman/listinfo/python-list
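On the Python side, producing protocol-2 data for an unpickler like this is a single call (the payload below is an arbitrary example):

```python
import pickle

payload = {'name': 'brainwave', 'version': 2, 'tags': ['db', 'python']}
data = pickle.dumps(payload, protocol=2)

# Protocol-2 streams start with the PROTO opcode (0x80) and version byte 2,
# which is what a protocol-2-only unpickler would check for first.
print(data[:2] == b'\x80\x02')        # True
print(pickle.loads(data) == payload)  # True: round-trips in Python
```

That fixed two-byte prefix makes protocol-2 streams easy to validate before handing them to a foreign-language unpickler.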
Re: how to determine Operating System in Use?
eeps! typo.

> if sys.platform == "darwin":
>     macStuff()
> elif sys.platform == "win32":
>     winStuff()

Not sure what the string is on Linux. Just fire up the interpreter and try it.

Prateek

Prateek wrote:
> also try:
>
> sys.platform
>
> if sys.platform == "darwin":
>     macStuff()
> elif sys.platform == "win32":
>     linuxStuff()
>
> James Cunningham wrote:
> > On 2006-12-13 19:28:14 -0500, [EMAIL PROTECTED] said:
> > > On Dec 13, 6:32 pm, "Ian F. Hood" <[EMAIL PROTECTED]> wrote:
> > >> Hi
> > >> In typically windows environments I have used:
> > >> if 'Windows' in os.environ['OS']...
> > >> to prove it, but now I need to properly support different environments.
> > >> To do so I must accurately determine what system the python instance is
> > >> running on (linux, win, mac, etc).
> > >> Is there a best practises way to do this?
> > >> TIA
> > >> Ian
> > >
> > > I would do this:
> > >
> > > if os.name == 'posix':
> > >     linuxStuff()
> > > elif os.name == 'nt':
> > >     windowsStuff()
> > > elif os.name == 'os2': ...
> > >
> > > os.name is 'posix', 'nt', 'os2', 'mac', 'ce' or 'riscos'
> > >
> > > -N
> >
> > Bearing in mind, of course, that Mac will return "posix", too. And
> > Cygwin might. Erg.
> >
> > Best,
> > James
--
http://mail.python.org/mailman/listinfo/python-list
Re: how to determine Operating System in Use?
also try:

sys.platform

if sys.platform == "darwin":
    macStuff()
elif sys.platform == "win32":
    linuxStuff()

James Cunningham wrote:
> On 2006-12-13 19:28:14 -0500, [EMAIL PROTECTED] said:
> > On Dec 13, 6:32 pm, "Ian F. Hood" <[EMAIL PROTECTED]> wrote:
> >> Hi
> >> In typically windows environments I have used:
> >> if 'Windows' in os.environ['OS']...
> >> to prove it, but now I need to properly support different environments.
> >> To do so I must accurately determine what system the python instance is
> >> running on (linux, win, mac, etc).
> >> Is there a best practises way to do this?
> >> TIA
> >> Ian
> >
> > I would do this:
> >
> > if os.name == 'posix':
> >     linuxStuff()
> > elif os.name == 'nt':
> >     windowsStuff()
> > elif os.name == 'os2': ...
> >
> > os.name is 'posix', 'nt', 'os2', 'mac', 'ce' or 'riscos'
> >
> > -N
>
> Bearing in mind, of course, that Mac will return "posix", too. And
> Cygwin might. Erg.
>
> Best,
> James
--
http://mail.python.org/mailman/listinfo/python-list
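A combined, runnable version of the two approaches in this thread. The handlers like macStuff() are placeholders in the posts, so this sketch returns a label instead:

```python
import os
import sys

def platform_name():
    # sys.platform is finer-grained than os.name: 'darwin' on the Mac,
    # 'win32' on Windows, 'linux' (or 'linux2' on old Pythons) on Linux.
    if sys.platform == 'darwin':
        return 'mac'
    if sys.platform.startswith('win'):
        return 'windows'
    if sys.platform.startswith('linux'):
        return 'linux'
    # Fall back to the coarser os.name ('posix', 'nt', 'os2', ...).
    return os.name

print(platform_name())
```

Checking sys.platform first avoids the pitfall James points out: os.name reports 'posix' on the Mac (and under Cygwin), so it can't distinguish Mac from Linux on its own.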
Re: How can I get involved
Hey everyone... Thanks a million for the warm welcome.

Not sure WHY I haven't posted before (although I have been lurking for a few weeks). I guess I've been learning via practice and other random documentation online.

In case anyone is interested, the URL of my organization is http://www.brainwavelive.com
There isn't too much technical documentation there yet, but more is coming soon - I promise. Let me know if anyone wants to participate. There are a couple of sub-projects which we're open sourcing which we'll put up soon.

In the meantime, I had an interesting project I thought some of you may be interested in... I've been checking out Jython and I've been kinda disappointed to see nothing released in a while (so I'm eager to see what comes next). In the meantime, I need a way to access my database server (written in Python) from Java. Since the server process uses Pyro for IPC and I don't want to switch to XML (purely because of the overhead), I thought it might be fun to write something in Java. So I've started work on an Unpickler in Java.

I'll post again soon with the URL (haven't uploaded it yet). If anyone is interested, email me.

Prateek

Paul Boddie wrote:
> I find it interesting that you've been using Python for so long and yet
> haven't posted to this group before. Have you been participating in
> other forums or groups? I suppose this shows how big the community is,
> and that comp.lang.python is just the tip of the iceberg.
>
> Anyway, welcome! :-)
>
> > Key points:
> > 1) It comes with a non-relational schema-free database we designed
> > 2) The application server is based on CherryPy
> > 3) The UI engine is XSLT based (but it also supports Cheetah and
> > Clearsilver via plugins - out of the box)
>
> I get along fairly well with XSLT, so this sounds rather interesting.
>
> > Basically, I really love the language and I'm looking for ways to get
> > involved in the community and contribute.
> There are so many things you can do that it's hard to describe them
> all. If you're free to contribute to open source projects, you might
> choose some that interest you, or which might benefit the work you do
> in your day job, and contribute some effort to those projects.
> Alternatively, you could start your own interesting projects and share
> them with us. Perhaps you want to help other people to understand
> Python or its libraries, and you could decide to write some
> documentation or some "how to" documents or guides. If you're really
> enthusiastic, you could help the core developers with developing
> CPython, but if you're more of a Java person then perhaps the
> developers of Jython might appreciate the help a bit more.
>
> > My past (pre-Python) experience has been mainly in web-technologies -
> > Java, PHP, ASP and a little bit of C.
> >
> > Any ideas?
>
> The python.org Wiki gives a few starting points, based on what I've
> written above:
>
> http://wiki.python.org/moin/FrontPage
>
> It's a bit of a mess, unfortunately, despite some of us trying to keep
> some areas reasonably tidy, but if you dive in and look around you'll
> probably find something to inspire you. If not, just tell us and we'll
> try and suggest something. ;-)
>
> Paul
--
http://mail.python.org/mailman/listinfo/python-list
How can I get involved
Hey all,

I'm messaging this group for the first time. Basically I've been a (pretty intensive) Python programmer for the past 2 years. I started a software company which has just released an SDK (v1.0b - developer preview) for developing web applications in Python.

Key points:
1) It comes with a non-relational schema-free database we designed
2) The application server is based on CherryPy
3) The UI engine is XSLT based (but it also supports Cheetah and Clearsilver via plugins - out of the box)

Basically, I really love the language and I'm looking for ways to get involved in the community and contribute. My past (pre-Python) experience has been mainly in web-technologies - Java, PHP, ASP and a little bit of C.

Any ideas?

Prateek
--
http://mail.python.org/mailman/listinfo/python-list