Re: Function to Print a nicely formatted Dictionary or List?
If you want tables laid out in a framing grid, you might want to take a look at this: https://bitbucket.org/astanin/python-tabulate/pull-requests/31/allow-specifying-float-formats-per-column/diff

Frederic

On 6/9/22 12:43, Dave wrote: Hi, Before I write my own I was wondering if anyone knows of a function that will print a nicely formatted dictionary? By nicely formatted I mean not all on one line! Cheers Dave -- https://mail.python.org/mailman/listinfo/python-list
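For the plain "not all on one line" case, the standard library already covers this; a minimal sketch using pprint and json (both stdlib, no third-party install needed):

```python
import json
import pprint

d = {"name": "Dave", "langs": ["python", "perl"], "scores": {"a": 1, "b": 2}}

# pprint wraps nested structures across lines once they exceed `width`
pprint.pprint(d, width=40)

# json.dumps gives an indented, one-key-per-line rendering
print(json.dumps(d, indent=4, sort_keys=True))
```

tabulate (as linked above) is the better fit when you want an actual grid with column alignment; pprint/json only give indentation.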
Re: Global VS Local Subroutines
I believe I have observed a difference which also might be worth noting: the embedded function a() (second example) has access to all of the enclosing function's variables, which might be an efficiency factor with lots of variables. The access is effectively read-only, though: if the inner function assigns to one of the readable outer names, that name becomes local to the inner function (unless it is declared nonlocal or global).

Frederic

On 2/10/22 1:13 PM, BlindAnagram wrote: Is there any difference in performance between these two program layouts:

    def a(): ...
    def f(b):
        c = a(b)

or

    def f(b):
        def a(): ...
        c = a(b)

I would appreciate any insights on which layout to choose in which circumstances.
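The scoping claim above is easy to check directly; a small sketch (Python 3, where nonlocal is available to rebind the outer name explicitly):

```python
def outer():
    x = 1

    def reader():
        return x          # free variable: reads outer's x

    def writer():
        x = 99            # assignment makes this x local to writer
        return x

    def rebinder():
        nonlocal x        # explicitly rebind outer's x instead
        x = 42

    assert reader() == 1
    assert writer() == 99
    assert x == 1         # writer did not touch outer's x
    rebinder()
    assert x == 42        # rebinder did
    return x

print(outer())
```

This is the behavior described above: a plain assignment in the inner function shadows the outer name rather than updating it.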
Re: Update a specific element in all a list of N lists
On 12/16/21 3:00 PM, hanan lamaazi wrote: Dear All, I really need your assistance. I have a dataset with 1005000 rows and 25 columns. The main columns that I repeatedly use are Time, ID, and Reputation. First I sliced the data based on the time, and I appended the sliced data to a list called "df_list", so I get 201 lists with 25 columns. The main code starts from here:

    for elem in df_list:
        {do something}
        {here I'm trying to calculate the outliers}
        Out.append(outliers)

Now my problem is that I need to locate those outliers in the df_list and then update another column, which is the "Reputation". Note that there are duplicated IDs, but at different time slots. For example, if ID = 1 is an outlier, I need to select all rows with ID = 1 in the list and update their Reputation column. I tried these solutions:

1)

    grp = data11.groupby(['ID'])
    for i in GlobalNotOutliers.ID:
        data11.loc[grp.get_group(i).index, 'Reput'] += 1
    for j in GlobalOutliers.ID:
        data11.loc[grp.get_group(j).index, 'Reput'] -= 1

It works for a dataframe but not for a list.

2)

    for elem in df_list:
        elem.loc[elem['ID'].isin(Outlier['ID'])]

It doesn't select the right IDs; it gives all the values in elem.

3) Here I set the index using IDs:

    for i in Outlier.index:
        for elem in df_list:
            print(elem.Reput)
            if i in elem.index:
                # elem.loc[elem[i], 'Reput'] += 1
                m = elem.iloc[i, :]
                print(m)

It gives this error: IndexError: single positional indexer is out-of-bounds. I'm greatly thankful to anyone who can help me.

I'd suggest you group your records by date and put each group into a dict whose key is the date. Collecting each record into its group, append to it the index of the respective record in the original list. Then go through all your groups, record by record, finding outliers. The last item in each record is its index in the original list, identifying the record you want to update.
Something like this:

    dictionary = {}
    for i, record in enumerate(original_list):
        date = record[DATE_INDEX]
        if date in dictionary:
            dictionary[date].append((record, i))
        else:
            dictionary[date] = [(record, i)]

    reputation_indexes = set()
    for date, records in dictionary.items():
        for record, i in records:
            if has_outlier(record):
                reputation_indexes.add(i)

    for i in reputation_indexes:
        update_reputation(original_list[i])

Frederic
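A runnable version of that sketch, with hypothetical stand-ins for has_outlier and update_reputation (the record layout and the outlier threshold are invented purely for illustration):

```python
DATE_INDEX = 0
VALUE_INDEX = 1
REPUTATION_INDEX = 2

# (date, value, reputation) -- a toy record layout
original_list = [
    ["d1", 5, 0],
    ["d1", 500, 0],   # the outlier
    ["d2", 7, 0],
    ["d2", 6, 0],
]

def has_outlier(record):
    # hypothetical rule: any value above 100 is an outlier
    return record[VALUE_INDEX] > 100

def update_reputation(record):
    record[REPUTATION_INDEX] -= 1

# Group (record, original_index) pairs by date
groups = {}
for i, record in enumerate(original_list):
    groups.setdefault(record[DATE_INDEX], []).append((record, i))

# Collect the original indexes of all outlier records
bad = {i for records in groups.values()
         for record, i in records if has_outlier(record)}

for i in bad:
    update_reputation(original_list[i])

print(original_list[1])  # → ['d1', 500, -1]
```

Because the groups carry the original index along, the update lands on the right row of the original list regardless of how the records were grouped.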
Re: How to get dynamic data in html (javascript?)
On 1/11/20 2:39 PM, Friedrich Rentsch wrote: Hi all, I'm pretty good at hacking HTML text. But I have no clue how to get dynamic data like this: "At close: {date} {time}". I would appreciate a starting push to narrow my focus, currently awfully unfocused. Thanks. Frederic

Thanks for bothering. I'm sorry for having been too terse. The snippet was from an HTML download (wget https://finance.yahoo.com/quote/HYT/history?p=HYT). Here's a little more of it: ". . . ED_SHORT":"At close: {date} {time}","MARKET_TIME_NOTICE_CLOSED":"As of {date} {time}. {marketState}","MARKE . . .". I suppose the browser gets the values whose names appear in braces, dialoguing with the server. I believe it is JavaScript. I see sections marked . . . section here . . . If it is JavaScript, the question would be how to run JavaScript in Python, unless it runs on the server, driven by requests sent by Python from my end.

Frederic
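Often such "dynamic" values are not computed in the browser at all: the page ships a large JSON blob inside a script section, and the braces are just template placeholders. A common approach is to cut the JSON out of the HTML with a regex and hand it to the json module, with no JavaScript execution needed. A minimal sketch on a synthetic page (the variable name root.App.main illustrates the pattern only; it is not a guaranteed feature of any particular site):

```python
import json
import re

# A synthetic stand-in for a downloaded page
html = """
<html><body>
<script>
root.App.main = {"price": {"regularMarketPrice": 143.74, "symbol": "HYT"}};
</script>
</body></html>
"""

# Grab everything between the assignment and the closing semicolon
match = re.search(r"root\.App\.main\s*=\s*(\{.*\});", html, re.DOTALL)
data = json.loads(match.group(1))
print(data["price"]["regularMarketPrice"])  # → 143.74
```

When the data really is produced by client-side script, the usual fallbacks are driving a real browser (e.g. Selenium) or finding the underlying API the page calls.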
How to get dynamic data in html (javascript?)
Hi all, I'm pretty good at hacking html text. But I have no clue how to get dynamic data like this : "At close: {date} {time}". I would appreciate a starting push to narrow my focus, currently awfully unfocused. Thanks. Frederic -- https://mail.python.org/mailman/listinfo/python-list
Re: pre-edit stuff persists in a reloaded module
On 10/5/19 1:48 PM, Friedrich Rentsch wrote: Hi all, Python 2.7. I habitually work interactively in an IDLE window. Occasionally I correct code, reload, and find that the edits fail to load. I understand that reloading is not guaranteed to reload everything, but I don't understand the exact mechanism and would appreciate some illumination. Right now I am totally bewildered, having deleted and garbage-collected a module and an object, reloaded the module and remade the object, and when I inspect the corrected source (inspect.getsource (Object.run)) I see the uncorrected source, which isn't even on the disk anymore. The command 'reload' correctly displays the name of the source, ending '.py', indicating that it recognizes the source as being newer than the compile ending '.pyc'. After the reload, the pyc-file is newer, indicating that it has been recompiled. But the runtime error persists. So the recompile must have used the uncorrected old code. I could kill Python with signal 15, but would prefer a targeted purge that doesn't wipe clean my IDLE workbench. (I know I should upgrade to version 3. I will as soon as I get around to it. Hopefully that will fix the problem.) Thanks for comments. Frederic

Closing the thread with thanks to all who responded, offering excellent advice.

Frederic
Re: pre-edit stuff persists in a reloaded module
On 10/5/19 2:48 PM, Peter Otten wrote: Friedrich Rentsch wrote: Hi all, Python 2.7. I habitually work interactively in an IDLE window. Occasionally I correct code, reload, and find that the edits fail to load. I understand that reloading is not guaranteed to reload everything, but I don't understand the exact mechanism and would appreciate some illumination. Right now I am totally bewildered, having deleted and garbage-collected a module and an object, reloaded the module and remade the object, and when I inspect the corrected source (inspect.getsource (Object.run)) I see the uncorrected source, which isn't even on the disk anymore. The command 'reload' correctly displays the name of the source, ending '.py', indicating that it recognizes the source as being newer than the compile ending '.pyc'. After the reload, the pyc-file is newer, indicating that it has been recompiled. But the runtime error persists. So the recompile must have used the uncorrected old code. I could kill Python with signal 15, but would prefer a targeted purge that doesn't wipe clean my IDLE workbench. (I know I should upgrade to version 3. I will as soon as I get around to it. Hopefully that will fix the problem.) Thanks for comments

(1) stay away from reload()

(2) inspect.getsource() uses a cache that you should be able to clear with linecache.clearcache():

    $ echo 'def f(): return "old"' > tmp.py
    $ python
    [...]
    >>> import inspect, tmp
    >>> inspect.getsource(tmp.f)
    'def f(): return "old"\n'
    [1]+ Stopped    python
    $ echo 'def f(): return "new"' > tmp.py
    $ fg
    python
    >>> reload(tmp)
    >>> reload(tmp)
    >>> inspect.getsource(tmp.f)
    'def f(): return "old"\n'
    >>> import linecache; linecache.clearcache()
    >>> inspect.getsource(tmp.f)
    'def f(): return "new"\n'

(3) see 1 ;)

Thank you, Peter. I guess, then, that not only 'inspect' but the compiler as well reads source off the line cache, and clearing the latter would make 'reload' work as expected. Are there other snags lurking, that you advise against using 'reload'?
What are the alternatives for developing iteratively, alternating between running and editing?

Frederic
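For what it's worth, the same cache interaction is easy to demonstrate in Python 3, where reload lives in importlib; a sketch using a throwaway module (the file and module name are invented for the demo):

```python
import importlib
import inspect
import linecache
import os
import sys
import tempfile

# Write a throwaway module to a temp directory
d = tempfile.mkdtemp()
path = os.path.join(d, "tmpmod_demo.py")
with open(path, "w") as f:
    f.write('def f(): return "old"\n')

sys.path.insert(0, d)
import tmpmod_demo

assert '"old"' in inspect.getsource(tmpmod_demo.f)

# Edit the file on disk, then reload and clear the stale line cache
with open(path, "w") as f:
    f.write('def f(): return "new"\n')

importlib.reload(tmpmod_demo)
linecache.clearcache()          # without this, getsource may show stale text
print(inspect.getsource(tmpmod_demo.f))
```

The general advice from the thread still applies: for iterative development, restarting the interpreter (IDLE's Restart Shell, or rerunning the script) is more predictable than reload.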
pre-edit stuff persists in a reloaded module
Hi all, Python 2.7. I habitually work interactively in an IDLE window. Occasionally I correct code, reload, and find that the edits fail to load. I understand that reloading is not guaranteed to reload everything, but I don't understand the exact mechanism and would appreciate some illumination. Right now I am totally bewildered, having deleted and garbage-collected a module and an object, reloaded the module and remade the object, and when I inspect the corrected source (inspect.getsource (Object.run)) I see the uncorrected source, which isn't even on the disk anymore. The command 'reload' correctly displays the name of the source, ending '.py', indicating that it recognizes the source as being newer than the compile ending '.pyc'. After the reload, the pyc-file is newer, indicating that it has been recompiled. But the runtime error persists. So the recompile must have used the uncorrected old code. I could kill Python with signal 15, but would prefer a targeted purge that doesn't wipe clean my IDLE workbench. (I know I should upgrade to version 3. I will as soon as I get around to it. Hopefully that will fix the problem.) Thanks for comments. Frederic
Re: Regex to extract multiple fields in the same line
On 06/15/2018 12:37 PM, Ganesh Pal wrote: Hey Friedrich, the proposed solution worked nicely, thank you for the reply, I really appreciate it. The only thing I think would need a review is whether the assignment of the values of one dictionary to the other dictionary is done correctly (lines 17 to 25 in the code below). Here is my code:

    root@X1:/Play_ground/SPECIAL_TYPES/REGEX# vim Friedrich.py
     1  import re
     2  from collections import OrderedDict
     3
     4  keys = ["struct", "loc", "size", "mirror",
     5          "filename", "final_results"]
     6
     7  stats = OrderedDict.fromkeys(keys)
     8
     9
    10  line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt --size=8'
    11
    12  regex = re.compile(r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
    13  result = dict(re.findall(regex, line))
    14  print result
    15
    16  if result['log_file']:
    17      stats['filename'] = result['log_file']
    18  if result['struct']:
    19      stats['struct'] = result['struct']
    20  if result['size']:
    21      stats['size'] = result['size']
    22  if result['loc']:
    23      stats['loc'] = result['loc']
    24  if result['mirror']:
    25      stats['mirror'] = result['mirror']
    26
    27  print stats
    28

Looks okay to me. If you'd read 'result' using 'get' you wouldn't need to test for the key; 'stats' would then have all keys, with the value None for keys missing in 'result':

    stats['filename'] = result.get('log_file')
    stats['struct'] = result.get('struct')

This may or may not suit your purpose.

Also, I think the regex can just be

    (r"--(struct|loc|size|mirror|log_file)=([^\s]+)")

no need to match whitespace characters (\s*) before and after the = symbol, because that would never happen (this line is actually a key=value pair of a dictionary getting logged).

You are right. I thought your sample line had a space in one of the groups and didn't reread to verify, letting the false impression take hold. Sorry about that.
Frederic

Regards, Ganesh

On Fri, Jun 15, 2018 at 12:53 PM, Friedrich Rentsch < anthra.nor...@bluewin.ch> wrote: Hi Ganesh. Having proposed a solution to your problem, it would be kind of you to let me know whether it has helped. In case you missed my response, I repeat it:

    regex = re.compile(r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
    regex.findall(line)
    [('struct', 'data_block'), ('log_file', '/var/1000111/test18.log'), ('loc', '0'), ('mirror', '10')]

Frederic

On 06/13/2018 07:32 PM, Ganesh Pal wrote: On Wed, Jun 13, 2018 at 5:59 PM, Rhodri James wrote: On 13/06/18 09:08, Ganesh Pal wrote: Hi Team, I wanted to parse a file and extract a few fields that are present after "=" in a text file. For example, from the below line I need to extract the values present after --struct=, --loc=, --size= and --log_file=. Sample input:

    line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt size=8'

Did you mean "--size=8" at the end? That's what your explanation implied.
Yes James, you got it right, I meant "--size=8".

Hi Team, I played further with Python's re.findall() and I am able to extract all the required fields. I have 2 further questions too, please suggest.

Question 1: Please let me know the mistakes in the below code and suggest if it can be optimized further with a better regex.

    # This code has to extract the various fields from a single line (assuming
    # the line is matched here) of a log file that contains various values
    # (and then store the extracted values in a dictionary)

    import re

    line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt --size=8'

    # loc is a number
    r_loc = r"--loc=([0-9]+)"
    r_size = r'--size=([0-9]+)'
    r_struct = r'--struct=([A-Za-z_]+)'
    r_log_file = r'--log_file=([A-Za-z0-9_/.]+)'

    if re.findall(r_loc, line):
        print re.findall(r_loc, line)
    if re.findall(r_size, line):
        print re.findall(r_size, line)
    if re.findall(r_struct, line):
        print re.findall(r_struct, line)
    if re.findall(r_log_file, line):
        print re.findall(r_log_file, line)

o/p:

    root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
    ['0']
    ['8']
    ['data_block']
    ['/var/1000111/test18.log']

Question 2: I tried to see if I can use re.search with a look-behind assertion; it seems to work, any comments or suggestions?

Example:

    import re

    line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt --size=8'

    match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))', line)
Re: Regex to extract multiple fields in the same line
On 06/13/2018 07:32 PM, Ganesh Pal wrote: On Wed, Jun 13, 2018 at 5:59 PM, Rhodri James wrote: On 13/06/18 09:08, Ganesh Pal wrote: Hi Team, I wanted to parse a file and extract a few fields that are present after "=" in a text file. For example, from the below line I need to extract the values present after --struct=, --loc=, --size= and --log_file=. Sample input:

    line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt size=8'

Did you mean "--size=8" at the end? That's what your explanation implied.

How's this? (Supposing that the values contain no spaces):

    >>> regex = re.compile (r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
    >>> regex.findall (line)
    [('struct', 'data_block'), ('log_file', '/var/1000111/test18.log'), ('loc', '0'), ('mirror', '10')]

Frederic

Yes James, you got it right, I meant "--size=8".

Hi Team, I played further with Python's re.findall() and I am able to extract all the required fields. I have 2 further questions too, please suggest.

Question 1: Please let me know the mistakes in the below code and suggest if it can be optimized further with a better regex.

    # This code has to extract the various fields from a single line (assuming
    # the line is matched here) of a log file that contains various values
    # (and then store the extracted values in a dictionary)

    import re

    line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt --size=8'

    # loc is a number
    r_loc = r"--loc=([0-9]+)"
    r_size = r'--size=([0-9]+)'
    r_struct = r'--struct=([A-Za-z_]+)'
    r_log_file = r'--log_file=([A-Za-z0-9_/.]+)'

    if re.findall(r_loc, line):
        print re.findall(r_loc, line)
    if re.findall(r_size, line):
        print re.findall(r_size, line)
    if re.findall(r_struct, line):
        print re.findall(r_struct, line)
    if re.findall(r_log_file, line):
        print re.findall(r_log_file, line)
o/p:

    root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
    ['0']
    ['8']
    ['data_block']
    ['/var/1000111/test18.log']

Question 2: I tried to see if I can use re.search with a look-behind assertion; it seems to work, any comments or suggestions?

Example:

    import re

    line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt --size=8'

    match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))', line)
    if match:
        print match.group('loc')

o/p:

    root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
    0

I want to build the sub-patterns and use match.group() to get the values, something as shown below, but it doesn't seem to work:

    match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
                      r'(?P<size>(?<=--size=)([0-9]+))', line)
    if match:
        print match.group('loc')
        print match.group('size')

Regards, Ganesh
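The concatenated pattern in Question 2 can never match: re.search needs one contiguous match, and --loc=... and --size=... are separated by other text in the line. Two alternatives that do work are one findall collected into a dict (the approach proposed earlier in the thread), or a single alternation with named groups applied via finditer. A sketch in Python 3 syntax:

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 '
        '--mirror=10 --path=/tmp/data_block.txt --size=8')

# Variant 1: one findall, collected into a dict
pairs = dict(re.findall(r"--(struct|loc|size|mirror|log_file)=(\S+)", line))
print(pairs["loc"], pairs["size"])   # → 0 8

# Variant 2: named groups, one per alternative, collected via finditer
pattern = re.compile(r"--loc=(?P<loc>\d+)|--size=(?P<size>\d+)")
found = {}
for m in pattern.finditer(line):
    found.update({k: v for k, v in m.groupdict().items() if v is not None})
print(found)   # → {'loc': '0', 'size': '8'}
```

Variant 1 is usually the simpler choice here, since the keys in the line already name the fields.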
Re: stock quotes off the web, py style
On 05/16/2018 06:21 PM, Mike McClain wrote: On Wed, May 16, 2018 at 02:33:23PM +0200, Friedrich Rentsch wrote: I didn't know the site you mention. I've been getting quotes from Yahoo daily. The service they discontinued was for up to 50 symbols per page. I now parse a separate page of some 500K of html for each symbol! This site is certainly more concise and surely a lot faster.

Thank you sir for the response and code snippet. As it turns out, iextrading.com doesn't supply data on mutuals, which are the majority of my portfolio, so they are not going to do me much good after all. If you please, what is the URL of one stock you're getting from Yahoo that requires parsing 500K of html per symbol? That's better than not getting the quotes. If AlphaVantage ever comes back up, they send 100 days of quotes for each symbol and I only use today's and yesterday's, but it is easy to parse.

    You would do multiple symbols in a loop which you enter with an open urllib object, rather than opening a new one for each symbol inside the loop.

At the moment I can't see how to do that but will figure it out. Thanks for the pointer. Mike

-- "There are three kinds of men. The ones who learn by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." --- Will Rogers

I meant to check out AlphaVantage myself and registered, since it appears to be a kind of interest group. I wasn't aware it is down, because I haven't yet tried to log on. But I hope to do so when it comes back. The way I get quotes from Yahoo is a hack:

1. Get a quote on the Yahoo web page.
2. Copy the url. (https://finance.yahoo.com/quote/IBM?p=IBM=1).
3. Compose such urls in a loop, one symbol at a time, and read nearly 600K of html text for each of them.
4. Parse the text for the numbers I want to extract. Needles in a haystack. Slow for a large set of symbols and grossly inefficient in terms of data traffic.

Forget my last suggestion "You would do multiple symbols . . ."
that was wrong. You have to open a urllib object for every symbol, the same way you'd open a file for every file name. And thanks to the practitioners for the warnings against using 'eval'. I have hardly ever used it, never in online communications. So my awareness level is low. But I understand the need to be careful.

Frederic
Re: stock quotes off the web, py style
On 05/16/2018 02:23 AM, Mike McClain wrote: Initially I got my quotes from a broker daily to plug into a spreadsheet. Then I found Yahoo and wrote a perl script to grab them. When Yahoo quit supplying quotes I found AlphaVantage.co and rewrote the perl script. AlphaVantage.co has been down since last week, and I found iextrading.com has a freely available interface. Since it needs a rewrite and I'm trying to get a handle on python, this seems like a good opportunity to explore. Would someone please suggest modules to explore? Are there any upper-level modules that would allow me to do something like:

    from module import get

    def getAquote(symbol):
        url = 'https://api.iextrading.com/1.0/stock/{}/quote'.format(symbol)
        reply = module.get(url)
        return my_parse(reply)

Thanks, Mike

-- Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing ever happened. - Churchill

I didn't know the site you mention. I've been getting quotes from Yahoo daily. The service they discontinued was for up to 50 symbols per page. I now parse a separate page of some 500K of html for each symbol! This site is certainly more concise and surely a lot faster.
It serves a naked set of data which happens to conform to the Python source-code specification for dictionaries and consequently can be compiled into a dictionary with 'eval', like so:

    >>> ibm = urllib2.urlopen ("https://api.iextrading.com/1.0/stock/IBM/quote").read()
    >>> ibm = eval (ibm)
    >>> for item in sorted (ibm.items()):
            print '%-24s%s' % item

    avgTotalVolume          5331869
    calculationPrice        close
    change                  -0.56
    changePercent           -0.00388
    close                   143.74
    closeTime               1526414517398
    companyName             International Business Machines Corporation
    delayedPrice            143.74
    delayedPriceTime        1526414517398
    high                    143.99
    iexAskPrice             0
    iexAskSize              0
    iexBidPrice             0
    iexBidSize              0
    iexLastUpdated          0
    iexMarketPercent        0
    iexRealtimePrice        0
    iexRealtimeSize         0
    iexVolume               0
    latestPrice             143.74
    latestSource            Close
    latestTime              May 15, 2018
    latestUpdate            1526414517398
    latestVolume            4085996
    low                     142.92
    marketCap               131948764304
    open                    143.5
    openTime                1526391000646
    peRatio                 10.34
    previousClose           144.3
    primaryExchange         New York Stock Exchange
    sector                  Technology
    symbol                  IBM
    week52High              171.13
    week52Low               139.13
    ytdChange               -0.0485148849103

You would do multiple symbols in a loop which you enter with an open urllib object, rather than opening a new one for each symbol inside the loop.

Frederic
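Since the thread elsewhere warns against 'eval' on data fetched from the network, it is worth noting that the payload is ordinary JSON, which the stdlib json module parses safely. A sketch on a canned response (the sample string stands in for the bytes the quote URL would return):

```python
import json

# Stand-in for the bytes read from the quote URL
reply = b'{"symbol": "IBM", "latestPrice": 143.74, "previousClose": 144.3}'

quote = json.loads(reply)        # safe: no code execution, unlike eval()
change = quote["latestPrice"] - quote["previousClose"]
print(quote["symbol"], round(change, 2))   # → IBM -0.56
```

json.loads rejects anything that is not valid JSON, whereas eval would happily execute arbitrary expressions a malicious or compromised server might send.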
Re: General Purpose Pipeline library?
On 11/22/2017 10:54 AM, Friedrich Rentsch wrote: On 11/21/2017 03:26 PM, Jason wrote: On Monday, November 20, 2017 at 10:49:01 AM UTC-5, Jason wrote: a pipeline can be described as a sequence of functions that are applied to an input, with each subsequent function getting the output of the preceding function:

    out = f6(f5(f4(f3(f2(f1(in))))))

However this isn't very readable and does not support conditionals. Tensorflow has tensor-focused pipelines:

    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

I have some code which allows me to mimic this, but with an implied parameter:

    def executePipeline(steps, collection_funcs=[map, filter, reduce]):
        results = None
        for step in steps:
            func = step[0]
            params = step[1]
            if func in collection_funcs:
                print func, params[0]
                results = func(functools.partial(params[0], *params[1:]), results)
            else:
                print func
                if results is None:
                    results = func(*params)
                else:
                    results = func(*(params + (results,)))
        return results

    executePipeline([
        (read_rows, (in_file,)),
        (map, (lower_row, field)),
        (stash_rows, ('stashed_file',)),
        (map, (lemmatize_row, field)),
        (vectorize_rows, (field, min_count,)),
        (evaluate_rows, (weights, None)),
        (recombine_rows, ('stashed_file',)),
        (write_rows, (out_file,)),
    ])

Which gets me close, but I can't control where rows get passed in. In the above code, it is always the last parameter. I feel like I'm reinventing a wheel here. I was wondering if there's already something that exists? Why do I want this? Because I'm tired of writing code that is locked away in a bespoke function. I'd have an army of functions, all slightly different in functionality. I require flexibility in defining pipelines, and I don't want a custom pipeline to require any low-level coding. I just want to feed a sequence of functions to a script and have it process it.
A middle ground between the shell | operator and bespoke python code. Sure, I could write many binaries bound by shell, but there are some things done far easier in python because of its extensive libraries, and it can exist throughout the execution of the pipeline, whereas any temporary persistence has to be through environment variables or files.

Well, after examining your feedback, it looks like Grapevine has 99% of the concepts that I wanted to invent, even if the | operator seems a bit clunky. I personally prefer the fluent interface convention. But this should work. Kamaelia could also work, but it seems a little bit more grandiose. Thanks everyone who chimed in!

This looks very much like what I have been working on of late: a generic processing paradigm based on chainable building blocks. I call them Workshops, because the base class can be thought of as a workshop that takes some raw material, processes it and delivers the product (to the next in line). Your example might look something like this:

    >>> import workshops as WS
    >>> Vectorizer = WS.Chain (
            WS.File_Reader (),        # WS provides
            WS.Map (lower_row),       # WS provides (wrapped builtin)
            Row_Stasher (),           # You provide
            WS.Map (lemmatize_row),   # WS provides
            Row_Vectorizer (),        # Yours
            Row_Evaluator (),         # Yours
            Row_Recombiner (),
            WS.File_Writer (),
            _name = 'Vectorizer'
        )

Parameters are process-control settings that travel through a subscription-based mailing system separate from the payload pipe.

    >>> Vectorizer.post (min_count = ..., )  # Set all parameters that control the entire run.
    >>> Vectorizer.post ("File_Writer", file_name = 'output_file_name')  # Addressed, not meant for File_Reader

Run:

    >>> Vectorizer ('input_file_name')  # File_Writer returns 0 if the Chain completes successfully.
    0

If you would provide a list of your functions (input, output, parameters) I'd be happy to show a functioning solution.
Writing a Shop follows a simple standard pattern: naming the subscriptions, if any, and writing a single method that reads the subscribed parameters, if any, then takes the payload, processes it and returns the product. I intend to share the system, provided there's an interest. I'd have to tidy it up quite a bit, though, before daring to release it. There's a lot more to it . . .

Frederic

I'm sorry, I made a mistake with the "From" item. My address is obviously not "
Re: General Purpose Pipeline library?
On 11/21/2017 03:26 PM, Jason wrote: On Monday, November 20, 2017 at 10:49:01 AM UTC-5, Jason wrote: a pipeline can be described as a sequence of functions that are applied to an input, with each subsequent function getting the output of the preceding function:

    out = f6(f5(f4(f3(f2(f1(in))))))

However this isn't very readable and does not support conditionals. Tensorflow has tensor-focused pipelines:

    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

I have some code which allows me to mimic this, but with an implied parameter:

    def executePipeline(steps, collection_funcs=[map, filter, reduce]):
        results = None
        for step in steps:
            func = step[0]
            params = step[1]
            if func in collection_funcs:
                print func, params[0]
                results = func(functools.partial(params[0], *params[1:]), results)
            else:
                print func
                if results is None:
                    results = func(*params)
                else:
                    results = func(*(params + (results,)))
        return results

    executePipeline([
        (read_rows, (in_file,)),
        (map, (lower_row, field)),
        (stash_rows, ('stashed_file',)),
        (map, (lemmatize_row, field)),
        (vectorize_rows, (field, min_count,)),
        (evaluate_rows, (weights, None)),
        (recombine_rows, ('stashed_file',)),
        (write_rows, (out_file,)),
    ])

Which gets me close, but I can't control where rows get passed in. In the above code, it is always the last parameter. I feel like I'm reinventing a wheel here. I was wondering if there's already something that exists? Why do I want this? Because I'm tired of writing code that is locked away in a bespoke function. I'd have an army of functions, all slightly different in functionality. I require flexibility in defining pipelines, and I don't want a custom pipeline to require any low-level coding. I just want to feed a sequence of functions to a script and have it process it.
A middle ground between the shell | operator and bespoke python code. Sure, I could write many binaries bound by shell, but there are some things done far easier in python because of its extensive libraries, and it can exist throughout the execution of the pipeline, whereas any temporary persistence has to be through environment variables or files.

Well, after examining your feedback, it looks like Grapevine has 99% of the concepts that I wanted to invent, even if the | operator seems a bit clunky. I personally prefer the fluent interface convention. But this should work. Kamaelia could also work, but it seems a little bit more grandiose. Thanks everyone who chimed in!
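For the record, the core of such a pipeline, with explicit control over where the running value is injected into each step, can be sketched in a few lines of plain Python. The PLACEHOLDER marker and the step layout are invented for this illustration:

```python
# Marker object: shows where the running value goes in each step's arguments
PLACEHOLDER = object()

def run_pipeline(value, steps):
    """Apply each (func, args) step in order, substituting PLACEHOLDER with
    the running value; if no placeholder is present, append the value last."""
    for func, args in steps:
        if PLACEHOLDER in args:
            call_args = [value if a is PLACEHOLDER else a for a in args]
        else:
            call_args = list(args) + [value]
        value = func(*call_args)
    return value

# Toy steps: parse, scale, then sum with a non-default start value
result = run_pipeline("1 2 3", [
    (str.split, (PLACEHOLDER,)),                  # "1 2 3" -> ["1", "2", "3"]
    (lambda xs: [int(x) * 10 for x in xs], ()),   # -> [10, 20, 30]
    (sum, (PLACEHOLDER, 5)),                      # sum(xs, 5) -> 65
])
print(result)  # → 65
```

The placeholder answers the "I can't control where rows get passed in" complaint above: each step declares the position of the piped value instead of it always going last.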
Re: General Purpose Pipeline library?
On 11/21/2017 03:26 PM, Jason wrote: On Monday, November 20, 2017 at 10:49:01 AM UTC-5, Jason wrote: a pipeline can be described as a sequence of functions that are applied to an input, with each subsequent function getting the output of the preceding function:

    out = f6(f5(f4(f3(f2(f1(in))))))

However this isn't very readable and does not support conditionals. Tensorflow has tensor-focused pipelines:

    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

I have some code which allows me to mimic this, but with an implied parameter:

    def executePipeline(steps, collection_funcs=[map, filter, reduce]):
        results = None
        for step in steps:
            func = step[0]
            params = step[1]
            if func in collection_funcs:
                print func, params[0]
                results = func(functools.partial(params[0], *params[1:]), results)
            else:
                print func
                if results is None:
                    results = func(*params)
                else:
                    results = func(*(params + (results,)))
        return results

    executePipeline([
        (read_rows, (in_file,)),
        (map, (lower_row, field)),
        (stash_rows, ('stashed_file',)),
        (map, (lemmatize_row, field)),
        (vectorize_rows, (field, min_count,)),
        (evaluate_rows, (weights, None)),
        (recombine_rows, ('stashed_file',)),
        (write_rows, (out_file,)),
    ])

Which gets me close, but I can't control where rows get passed in. In the above code, it is always the last parameter. I feel like I'm reinventing a wheel here. I was wondering if there's already something that exists? Why do I want this? Because I'm tired of writing code that is locked away in a bespoke function. I'd have an army of functions, all slightly different in functionality. I require flexibility in defining pipelines, and I don't want a custom pipeline to require any low-level coding. I just want to feed a sequence of functions to a script and have it process it.
A middle ground between the shell | operator and bespoke python code. Sure, I could write many binaries bound by shell, but there are some things done far easier in python because of its extensive libraries, and it can exist throughout the execution of the pipeline, whereas any temporary persistence has to be through environment variables or files.

Well, after examining your feedback, it looks like Grapevine has 99% of the concepts that I wanted to invent, even if the | operator seems a bit clunky. I personally prefer the fluent interface convention. But this should work. Kamaelia could also work, but it seems a little bit more grandiose. Thanks everyone who chimed in!

This looks very much like what I have been working on of late: a generic processing paradigm based on chainable building blocks. I call them Workshops, because the base class can be thought of as a workshop that takes some raw material, processes it and delivers the product (to the next in line). Your example might look something like this:

    >>> import workshops as WS
    >>> Vectorizer = WS.Chain (
            WS.File_Reader (),        # WS provides
            WS.Map (lower_row),       # WS provides (wrapped builtin)
            Row_Stasher (),           # You provide
            WS.Map (lemmatize_row),   # WS provides. Name for addressed Directions sending.
            Row_Vectorizer (),        # Yours
            Row_Evaluator (),         # Yours
            Row_Recombiner (),
            WS.File_Writer (),
            _name = 'Vectorizer'
        )

Parameters are process-control settings that travel through a subscription-based mailing system separate from the payload pipe.

    >>> Vectorizer.post (min_count = ..., )  # Set all parameters that control the entire run.
    >>> Vectorizer.post ("File_Writer", file_name = 'output_file_name')  # Addressed, not meant for File_Reader

Run:

    >>> Vectorizer ('input_file_name')  # File_Writer returns 0 if the Chain completes successfully.
    0

If you would provide a list of your functions (input, output, parameters) I'd be happy to show a functioning solution.
Writing a Shop follows a simple standard pattern: Naming the subscriptions, if any, and writing a single method that reads the subscribed parameters, if any, then takes payload, processes it and returns the product. I intend to share the system, provided there's an interest. I'd have to tidy
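The Workshops system described above is unreleased, so its real API is not shown here; but the "single method that takes payload, processes it and returns the product" pattern can be sketched in a hypothetical minimal form (all class names invented for illustration):

```python
class Shop:
    """One processing stage: override process() to transform the payload."""
    def process(self, payload):
        raise NotImplementedError

class Upper(Shop):
    def process(self, payload):
        return payload.upper()

class Exclaim(Shop):
    def process(self, payload):
        return payload + "!"

class Chain(Shop):
    """A Chain is itself a Shop: it threads the payload through its members."""
    def __init__(self, *shops):
        self.shops = shops
    def process(self, payload):
        for shop in self.shops:
            payload = shop.process(payload)
        return payload

Chain(Upper(), Exclaim()).process("hello")  # → 'HELLO!'
```

Making the chain itself a Shop is what allows nesting chains inside chains, which is presumably how larger assemblies like the Vectorizer example are built.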
execfile and import not working
Hi, I am setting up Python 2.7 after an upgrade to Ubuntu 16.04, a thorough one, leaving no survivors. Everything is fine, IDLE opens, ready to go. Alas, execfile and import commands don't do my bidding, but hang IDLE. All I can do is kill the process named "python" from a bash terminal. IDLE then is still open, says "=== RESTART: Shell ===" and is again ready for action. It works interactively, but no imports . . . What could be the problem? Thanks for ideas Frederic -- https://mail.python.org/mailman/listinfo/python-list
Re: Redirecting input of IDLE window
On 08/14/2017 10:47 AM, Friedrich Rentsch wrote: Hi, I work interactively in an IDLE window most of the time and find "help (...)" very useful to summarize things. The display comes up directly (doesn't return a text, which I could edit, assign or store). I suspect that there are ways to redirect the display, say to a file. Thanks for suggestions. Frederic Peter Otten's "mypager" works well. All suggestions provide a welcome opportunity to learn more about the inner workings. Thank you all for your responses. Frederic -- https://mail.python.org/mailman/listinfo/python-list
Redirecting input of IDLE window
Hi, I work interactively in an IDLE window most of the time and find "help (...)" very useful to summarize things. The display comes up directly (doesn't return a text, which I could edit, assign or store). I suspect that there are ways to redirect the display, say to a file. Thanks for suggestions. Frederic -- https://mail.python.org/mailman/listinfo/python-list
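Besides redirecting the display, the stdlib can hand back help()'s text directly: `pydoc.render_doc()` returns the same text help() pages, as a string you can edit, assign or store. A sketch (Python 3 syntax):

```python
import pydoc

# render_doc returns the help() text instead of paging it;
# pydoc.plain() strips the backspace-overstrike "bold" control characters
text = pydoc.plain(pydoc.render_doc(str.split))
print(text.splitlines()[0])
```

From there, writing `text` to a file is an ordinary `open(...).write(text)`.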
Re: Finding the name of an object's source file
On 06/06/2017 03:52 PM, Matt Wheeler wrote: On Tue, 6 Jun 2017 at 11:20 Peter Otten <__pete...@web.de> wrote: import os inspect.getsourcefile(os.path.split) '/usr/lib/python3.4/posixpath.py' And so much more fun than scanning the documentation :) Alternatively, without using inspect, we can get around `Object.__module__` being a string by importing it as a string: import importlib, os importlib.import_module(os.path.split.__module__).__file__ '/Users/matt/.pyenv/versions/3.6.0/lib/python3.6/posixpath.py' Stupendous! Thanks to both of you. I tried Peter's inspect-based method on a hierarchical assembly of objects: >>> def sources (S, indent = 0): print indent * '\t' + '%-60s%s' % (S.__class__, inspect.getsourcefile (S.__class__)) if isinstance (S, WS._Association): for s in S: sources (s, indent + 1) >>> sources (M[1][1]) /home/fr/python/util/workshops.py /home/fr/python/util/workshops.py /home/fr/python/finance/position.py positions_initializer.Positions_Initializer /home/fr/python/finance/positions_initializer.py position.Position_Activity /home/fr/python/finance/position.py position.normalizer /home/fr/python/finance/position.py position.split_adjuster /home/fr/python/finance/position.py position.Journalizer /home/fr/python/finance/position.py current_work.report_unmerger /home/fr/temp/current_work.py /home/fr/python/finance/position.py position.report_name_sender /home/fr/python/finance/position.py position.Summary_Report /home/fr/python/finance/position.py workshops.File_Writer /home/fr/python/util/workshops.py Wonderful! Awesome! Matt's solution works well too. Thanks a million Frederic -- https://mail.python.org/mailman/listinfo/python-list
Finding the name of an object's source file
Hi all, Developing a project, I have portions that work and should be assembled into a final program. Some parts don't interconnect when they should, because of my lack of rigor in managing versions. So in order to get on, I should next tidy up the mess and the way I can think of to do it is to read out the source file names of all of a working component's elements, then delete unused files and consolidate redundancy. So, the task would be to find source file names. inspect.getsource () knows which file to take the source from, but as far as I can tell, none of its methods reveals that name, if called on a command line (>>> inspect.(get source file name) ()). Object.__module__ works. Module.__file__ works, but Object.__module__.__file__ doesn't, because Object.__module__ is only a string. After one hour of googling, I believe inspect() is used mainly at runtime (introspection?) for tracing purposes. An alternative to inspect() has not come up. I guess I could grep inspect.(getsource ()), but that doesn't feel right. There's probably a simpler way. Any suggestions? Frederic -- https://mail.python.org/mailman/listinfo/python-list
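Since the module named by `Object.__module__` must already have been imported for the object to exist, `sys.modules` can resolve the string to the live module object, which does carry `__file__`. A small sketch (Python 3 syntax; the printed path naturally varies by installation):

```python
import sys
import os.path

# __module__ is only a string, but sys.modules maps that string to the
# already-imported module object, and module objects have __file__
mod = sys.modules[os.path.split.__module__]
print(mod.__file__)  # e.g. .../posixpath.py
```

This avoids both `inspect` and a redundant `importlib.import_module()` call, at the cost of only working for modules that are in fact loaded.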
Re: use regex to search the page one time to get two types of Information
On 08/19/2016 09:02 AM, iMath wrote: I need to use regex to search two types of Information within a web page, while it seems searching the page two times rather than one is much time consuming , is it possible to search the page one time to get two or more types of Information? >>> r = re.compile ('page|Information|time') >>> r.findall ( (your post) ) ['Information', 'page', 'page', 'time', 'time', 'page', 'time', 'Information'] Does that look right? Frederic -- https://mail.python.org/mailman/listinfo/python-list
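To keep the two kinds of hits apart while still scanning the page only once, named alternatives with `finditer` work well; `match.lastgroup` tells you which branch matched. A sketch with made-up patterns and sample text (the real page and patterns would differ):

```python
import re

text = "Contact: alice@example.com, phone 555-1234"
# one alternation, one pass; each branch is a named group
pattern = re.compile(r'(?P<email>\w+@\w+\.\w+)|(?P<phone>\d{3}-\d{4})')

emails, phones = [], []
for m in pattern.finditer(text):
    if m.lastgroup == 'email':
        emails.append(m.group())
    else:
        phones.append(m.group())

emails  # → ['alice@example.com']
phones  # → ['555-1234']
```

Compared with two separate searches, this halves the scanning work and preserves the relative order of the two kinds of information.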
Re: Scraping email to make invoice
On 04/24/2016 08:58 PM, CM wrote: I would like to write a Python script to automate a tedious process and could use some advice. The source content will be an email that has 5-10 PO (purchase order) numbers and information for freelance work done. The target content will be an invoice. (There will be an email like this every week). Right now, the "recommended" way to go (from the company) from source to target is manually copying and pasting all the tedious details of the work done into the invoice. But this is laborious, error-prone...and just begging for automation. There is no human judgment necessary whatsoever in this. I'm comfortable with "scraping" a text file and have written scripts for this, but could use some pointers on other parts of this operation. 1. INPUT: What's the best way to scrape an email like this? The email is to a Gmail account, and the content shows up in the email as a series of basically 6x7 tables (HTML?), one table per PO number/task. I know if the freelancer were to copy and paste the whole set of tables into a text file and save it as plain text, Python could easily scrape that file, but I'd much prefer to save the user those steps. Is there a relatively easy way to go from the Gmail email to generating the invoice directly? (I know there is, but wasn't sure what is state of the art these days). 2. OUTPUT: The invoice will have boilerplate content on top and then an Excel table at bottom that is mostly the same information from the source content. Ideally, so that the invoice looks good, the invoice should be a Word document. For the first pass at this, it looked best by laying out the entire invoice in Excel and then copying and pasting it into a Word doc as an image (since otherwise the columns ran over for some reason). In any case, the goal is to create a single page invoice that looks like a clean, professional looking invoice. 3. 
UI: I am comfortable with making GUI apps, so could use this as the interface for the (somewhat computer-uncomfortable) user. But the less user actions necessary, the better. The emails always come from the same sender, and always have the same boilerplate language ("Below please find your Purchase Order (PO)"), so I'm envisioning a small GUI window with a single button that says "MAKE NEWEST INVOICE" and the user presses it and it automatically searches the user's email for PO # emails and creates the newest invoice. I'm guessing I could keep a sqlite database or flat file on the computer to just track what is meant by "newest", and then the output would have the date created in the file, so the user can be sure what has been invoiced. I'm hoping I can write this in a couple of days. Any suggestions welcome! Thanks. INPUT: What's the best way to scrape an email like this? -- Like what? You need to explain what exactly your input is or show an example. Frederic -- https://mail.python.org/mailman/listinfo/python-list
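If the PO emails really are HTML tables, the stdlib's html.parser can pull the cell text out without the copy-into-a-text-file step. A hedged sketch with invented table content (the real PO tables would have different cells, and fetching the message body from Gmail, e.g. via imaplib, is a separate step not shown here):

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the cell text of every <tr> row in an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False
    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th'):
            self._in_cell = True
    def handle_endtag(self, tag):
        if tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ('td', 'th'):
            self._in_cell = False
    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

body = ("<table><tr><td>PO-1001</td><td>Editing</td></tr>"
        "<tr><td>PO-1002</td><td>Layout</td></tr></table>")
s = TableScraper()
s.feed(body)
s.rows  # → [['PO-1001', 'Editing'], ['PO-1002', 'Layout']]
```

With the rows in hand, writing the invoice table out as CSV or feeding it to a document library is straightforward.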
Re: Review Request of Python Code
On 03/09/2016 05:18 AM, subhabangal...@gmail.com wrote: Dear Group, I am trying to write a code for pulling data from MySQL at the backend and annotating words and trying to put the results as separated sentences with each line. The code is generally running fine but I am feeling it may be better in the end of giving out sentences, and for small data sets it is okay but with 50,000 news articles it is performing dead slow. I am using Python2.7.11 on Windows 7 with 8GB RAM. I am trying to copy the code here, for your kind review. import MySQLdb import nltk def sql_connect_NewTest1(): db = MySQLdb.connect(host="localhost", user="*", passwd="*", db="abcd_efgh") cur = db.cursor() #cur.execute("SELECT * FROM newsinput limit 0,5;") #REPORTING RUNTIME ERROR cur.execute("SELECT * FROM newsinput limit 0,50;") dict_open=open("/python27/NewTotalTag.txt","r") #OPENING THE DICTIONARY FILE dict_read=dict_open.read() dict_word=dict_read.split() a4=dict_word #Assignment for code. list1=[] flist1=[] nlist=[] for row in cur.fetchall(): #print row[2] var1=row[3] #print var1 #Printing lines #var2=len(var1) # Length of file var3=var1.split(".") #SPLITTING INTO LINES #print var3 #Printing The Lines #list1.append(var1) var4=len(var3) #Number of all lines #print "No",var4 for line in var3: #print line #flist1.append(line) linew=line.split() for word in linew: if word in a4: windex=a4.index(word) windex1=windex+1 word1=a4[windex1] word2=word+"/"+word1 nlist.append(word2) #print list1 #print nlist elif word not in a4: word3=word+"/"+"NA" nlist.append(word3) #print list1 #print nlist else: print "None" #print "###",flist1 #print len(flist1) #db.close() #print nlist lol = lambda lst, sz: [lst[i:i+sz] for i in range(0, len(lst), sz)] #TRYING TO SPLIT THE RESULTS AS SENTENCES nlist1=lol(nlist,7) #print nlist1 for i in nlist1: string1=" ".join(i) print i #print string1 Thanks in Advance. 
I have a modular processing framework in its final stages of completion whose purpose is to save (a lot of) time coding the kind of problem you describe. I intend to upload the system and am currently interested in real-world cases for the manual. I tried coding your problem, thinking it would take no more than a minute. It wasn't that easy, because you don't say what input you have, nor what you expect your program to do. Inferring the missing info from your code takes more time than I can spare. So, if you would give a few lines of your input and explain your purpose, I'd be happy to help. Frederic -- https://mail.python.org/mailman/listinfo/python-list
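One concrete slowness in the posted code is `a4.index(word)` inside the inner loop: a linear scan of the whole tag list for every word. Reading the flat word/tag list into a dict makes each lookup constant-time. A sketch, assuming (as the posted code implies) that NewTotalTag.txt alternates word, tag, word, tag:

```python
def build_tag_table(pairs):
    # pairs: flat list alternating word, tag (the apparent NewTotalTag.txt layout)
    it = iter(pairs)
    return dict(zip(it, it))  # consumes two items per dict entry

def tag_words(words, table):
    # dict lookup replaces the O(n) list.index() per word
    return [w + "/" + table.get(w, "NA") for w in words]

table = build_tag_table(["dog", "NN", "runs", "VBZ"])
tag_words(["the", "dog", "runs"], table)  # → ['the/NA', 'dog/NN', 'runs/VBZ']
```

For 50,000 articles this changes the tagging step from quadratic-ish to linear in the number of words.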
Re: Finding Blank Columns in CSV
On 10/05/2015 03:29 PM, Jaydip Chakrabarty wrote: Hello, I have a csv file like this. Name,Surname,Age,Sex abc,def,,M ,ghi,,F jkl,mno,, pqr,,,F I want to find out the blank columns, that is, fields where all the values are blank. Here is my python code. fn = "tmp1.csv" fin = open(fn, 'rb') rdr = csv.DictReader(fin, delimiter=',') data = list(rdr) flds = rdr.fieldnames fin.close() mt = [] flag = 0 for i in range(len(flds)): for row in data: if len(row[flds[i]]): flag = 0 break else: flag = 1 if flag: mt.append(flds[i]) flag = 0 print mt I need to know if there is better way to code this. Thanks. Operations on columns are often simpler, if a table is rotated beforehand. Columns become lists. def find_empty_columns (table): number_of_records = len (table) rotated_table = zip (*table) indices_of_empty_columns = [] for i in range (len (rotated_table)): # Column indices if rotated_table[i].count ('') == number_of_records: indices_of_empty_columns.append (i) return indices_of_empty_columns Frederic -- https://mail.python.org/mailman/listinfo/python-list
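The rotate-then-test idea above, restated for Python 3 (where zip returns an iterator rather than an indexable list) and compressed to a comprehension; data rows shown without the header line, as in the original question:

```python
def find_empty_columns(table):
    # zip(*table) rotates the table: columns become tuples
    return [i for i, col in enumerate(zip(*table)) if all(v == '' for v in col)]

rows = [['abc', 'def', '', 'M'],
        ['',    'ghi', '', 'F'],
        ['jkl', 'mno', '', ''],
        ['pqr', '',    '', 'F']]
find_empty_columns(rows)  # → [2]
```

Mapping the indices back through the fieldnames list then yields the blank column names.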
Re: Reading \n unescaped from a file
On 09/06/2015 09:51 AM, Peter Otten wrote: Friedrich Rentsch wrote: My response was meant for the list, but went to Peter by mistake. So I repeat it with some delay: On 09/03/2015 04:24 PM, Peter Otten wrote: Friedrich Rentsch wrote: On 09/03/2015 11:24 AM, Peter Otten wrote: Friedrich Rentsch wrote: I appreciate your identifying two mistakes. I am curious to know what they are. Sorry for not being explicit. substitutes = [self.table [item] for item in hits if item in valid_hits] + [] # Make lengths equal for zip to work right That looks wrong... You are adding an empty list here. I wondered what you were trying to achieve with that. Right you are! It doesn't do anything. I remember my idea was to pad the substitutes list by one, because the list of intervening text segments is longer by one element and zip uses the least common length, discarding all overhang. The remedy was totally ineffective and, what's more, not needed, judging by the way the editor performs as expected. That's because you are getting the same effect later by adding nohits[-1] You could avoid that by replacing [] with [""]. substitutes = list("12") nohits = list("abc") zipped = zip(nohits, substitutes) "".join(list(reduce(lambda a, b: a+b, [zipped][0]))) + nohits[-1] 'a1b2c' zipped = zip(nohits, substitutes + [""]) "".join(list(reduce(lambda a, b: a+b, [zipped][0]))) 'a1b2c' By the way, even those who are into functional programming might find "".join(map("".join, zipped)) 'a1b2c' more readable. But there's a more general change that I suggest: instead of processing the string twice, first to search for matches, then for the surrounding text you could achieve the same in one pass with a cool feature of the re.sub() method -- it accepts a function: def replace(text, replacements): ... table = dict(replacements) ... def substitute(match): ... return table[match.group()] ... regex = "|".join(re.escape(find) for find, replace in replacements) ... return re.compile(regex).sub(substitute, text) ... 
replace("1 foo 2 bar 1 baz", [("1", "one"), ("2", "two")]) 'one foo two bar one baz' I didn't think of using sub. But you're right. It is better, likely faster too. Building the regex reversed sorted will make it handle overlapping targets correctly, e.g.: r = ( ("1", "one"), ("2", "two"), ("12", "twelve"), ) Your function as posted: replace ('1 foo 2 bar 12 baz', r) 'one foo two bar onetwo baz' regex = "|".join(re.escape(find) for find, replace in reversed (sorted (replacements))) replace ('1 foo 2 bar 12 baz', r) 'one foo two bar twelve baz' Thanks for the hints Frederic -- https://mail.python.org/mailman/listinfo/python-list
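Putting the two halves of the thread together, Peter's callback-based `re.sub()` plus ordering so longer targets win, gives a compact runnable version. Sorting by length descending is used here as a variant of the reversed lexical sort above; both put "12" ahead of "1":

```python
import re

def replace(text, replacements):
    table = dict(replacements)
    # longest targets first, so the alternation tries "12" before "1"
    regex = "|".join(re.escape(t) for t in sorted(table, key=len, reverse=True))
    return re.sub(regex, lambda m: table[m.group()], text)

replace("1 foo 2 bar 12 baz", [("1", "one"), ("2", "two"), ("12", "twelve")])
# → 'one foo two bar twelve baz'
```

One pass over the text, no zip bookkeeping, and overlapping targets handled correctly.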
Re: Reading \n unescaped from a file
My response was meant for the list, but went to Peter by mistake. So I repeat it with some delay: On 09/03/2015 04:24 PM, Peter Otten wrote: Friedrich Rentsch wrote: On 09/03/2015 11:24 AM, Peter Otten wrote: Friedrich Rentsch wrote: I appreciate your identifying two mistakes. I am curious to know what they are. Sorry for not being explicit. substitutes = [self.table [item] for item in hits if item in valid_hits] + [] # Make lengths equal for zip to work right That looks wrong... You are adding an empty list here. I wondered what you were trying to achieve with that. Right you are! It doesn't do anything. I remember my idea was to pad the substitutes list by one, because the list of intervening text segments is longer by one element and zip uses the least common length, discarding all overhang. The remedy was totally ineffective and, what's more, not needed, judging by the way the editor performs as expected. output = input ...and so does this. That seems to be the only occurence of the name "input" in your code. Did you mean "text" or do you really want to return the built-in? Right you are again! I did mean text. I changed a few names to make them more suggestive, and apparently missed this one. Frederic -- https://mail.python.org/mailman/listinfo/python-list
Re: Reading \n unescaped from a file
On 09/03/2015 11:24 AM, Peter Otten wrote: Friedrich Rentsch wrote: On 09/02/2015 04:03 AM, Rob Hills wrote: Hi, I am developing code (Python 3.4) that transforms text data from one format to another. As part of the process, I had a set of hard-coded str.replace(...) functions that I used to clean up the incoming text into the desired output format, something like this: dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds dataIn = dataIn.replace('','<') # Tidy up < character dataIn = dataIn.replace('','>') # Tidy up < character dataIn = dataIn.replace('','o') # No idea why but lots of these: convert to 'o' character dataIn = dataIn.replace('','f') # .. and these: convert to 'f' character dataIn = dataIn.replace('','e') # .. 'e' dataIn = dataIn.replace('','O') # .. 'O' These statements transform my data correctly, but the list of statements grows as I test the data so I thought it made sense to store the replacement mappings in a file, read them into a dict and loop through that to do the cleaning up, like this: with open(fileName, 'r+t', encoding='utf-8') as mapFile: for line in mapFile: line = line.strip() try: if (line) and not line.startswith('#'): line = line.split('#')[:1][0].strip() # trim any trailing comments name, value = line.split('=') name = name.strip() self.filterMap[name]=value.strip() except: self.logger.error('exception occurred parsing line [{0}] in file [{1}]'.format(line, fileName)) raise Elsewhere, I use the following code to do the actual cleaning up: def filter(self, dataIn): if dataIn: for token, replacement in self.filterMap.items(): dataIn = dataIn.replace(token, replacement) return dataIn My mapping file contents look like this: \r = \\n â = = < = > = = F = o = f = e = O This all works "as advertised" */except/* for the '\r' => '\\n' replacement. Debugging the code, I see that my '\r' character is "escaped" to '\\r' and the '\\n' to 'n' when they are read in from the file. 
I've been googling hard and reading the Python docs, trying to get my head around character encoding, but I just can't figure out how to get these bits of code to do what I want. It seems to me that I need to either: * change the way I represent '\r' and '\\n' in my mapping file; or * transform them somehow when I read them in However, I haven't figured out how to do either of these. TIA, I have had this problem too and can propose a solution ready to run out of my toolbox: class editor: def compile (self, replacements): targets, substitutes = zip (*replacements) re_targets = [re.escape (item) for item in targets] re_targets.sort (reverse = True) self.targets_set = set (targets) self.table = dict (replacements) regex_string = '|'.join (re_targets) self.regex = re.compile (regex_string, re.DOTALL) def edit (self, text, eat = False): hits = self.regex.findall (text) nohits = self.regex.split (text) valid_hits = set (hits) & self.targets_set # Ignore targets with illegal re modifiers. Can you give an example of an ignored target? I don't see the light... if valid_hits: substitutes = [self.table [item] for item in hits if item in valid_hits] + [] # Make lengths equal for zip to work right That looks wrong... if eat: output = ''.join (substitutes) else: zipped = zip (nohits, substitutes) output = ''.join (list (reduce (lambda a, b: a + b, [zipped][0]))) + nohits [-1] else: if eat: output = '' else: output = input ...and so does this. return output >>> substitutions = ( ('\r', '\n'), ('', '<'), ('', '>'), ('', 'o'), ('', 'f'), ('', 'e'), ('', 'O'), ) Order doesn't matter. Add new ones at the end. 
>>> e = editor () >>> e.compile (substitutions) A simple way of testing is running the substitutions through the editor >>> print e.edit (repr (substitutions)) (('\r', '\n'), ('<', '<'), ('>', '>'), ('o', 'o'), ('f', 'f'), ('e', 'e'), ('O', 'O')) The escapes need to be tested separately >>> print e.edit ('abc\rdef') abc def Note: This editor's compiler compiles the substitution list to a regular expression which the editor uses to find all matches in the text passed to edit. There has got to be a limit to the size of a text which a regular expression can handle. I don't know what this limit is. To be on the safe side, edit a large text line by line or at least in sensible chunks. Frederic
Re: Reading \n unescaped from a file
On 09/03/2015 06:12 PM, Rob Hills wrote: Hi Friedrich, On 03/09/15 16:40, Friedrich Rentsch wrote: On 09/02/2015 04:03 AM, Rob Hills wrote: Hi, I am developing code (Python 3.4) that transforms text data from one format to another. As part of the process, I had a set of hard-coded str.replace(...) functions that I used to clean up the incoming text into the desired output format, something like this: dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds dataIn = dataIn.replace('','<') # Tidy up < character dataIn = dataIn.replace('','>') # Tidy up < character dataIn = dataIn.replace('','o') # No idea why but lots of these: convert to 'o' character dataIn = dataIn.replace('','f') # .. and these: convert to 'f' character dataIn = dataIn.replace('','e') # .. 'e' dataIn = dataIn.replace('','O') # .. 'O' These statements transform my data correctly, but the list of statements grows as I test the data so I thought it made sense to store the replacement mappings in a file, read them into a dict and loop through that to do the cleaning up, like this: with open(fileName, 'r+t', encoding='utf-8') as mapFile: for line in mapFile: line = line.strip() try: if (line) and not line.startswith('#'): line = line.split('#')[:1][0].strip() # trim any trailing comments name, value = line.split('=') name = name.strip() self.filterMap[name]=value.strip() except: self.logger.error('exception occurred parsing line [{0}] in file [{1}]'.format(line, fileName)) raise Elsewhere, I use the following code to do the actual cleaning up: def filter(self, dataIn): if dataIn: for token, replacement in self.filterMap.items(): dataIn = dataIn.replace(token, replacement) return dataIn My mapping file contents look like this: \r = \\n â = = < = > = = F = o = f = e = O This all works "as advertised" */except/* for the '\r' => '\\n' replacement. Debugging the code, I see that my '\r' character is "escaped" to '\\r' and the '\\n' to 'n' when they are read in from the file. 
I've been googling hard and reading the Python docs, trying to get my head around character encoding, but I just can't figure out how to get these bits of code to do what I want. It seems to me that I need to either: * change the way I represent '\r' and '\\n' in my mapping file; or * transform them somehow when I read them in However, I haven't figured out how to do either of these. TIA, I have had this problem too and can propose a solution ready to run out of my toolbox: class editor: def compile (self, replacements): targets, substitutes = zip (*replacements) re_targets = [re.escape (item) for item in targets] re_targets.sort (reverse = True) self.targets_set = set (targets) self.table = dict (replacements) regex_string = '|'.join (re_targets) self.regex = re.compile (regex_string, re.DOTALL) def edit (self, text, eat = False): hits = self.regex.findall (text) nohits = self.regex.split (text) valid_hits = set (hits) & self.targets_set # Ignore targets with illegal re modifiers. if valid_hits: substitutes = [self.table [item] for item in hits if item in valid_hits] + [] # Make lengths equal for zip to work right if eat: output = ''.join (substitutes) else: zipped = zip (nohits, substitutes) output = ''.join (list (reduce (lambda a, b: a + b, [zipped][0]))) + nohits [-1] else: if eat: output = '' else: output = input return output substitutions = ( ('\r', '\n'), ('', '<'), ('', '>'), ('', 'o'), ('', 'f'), ('', 'e'), ('', 'O'), ) Order doesn't matter. Add new ones at the end. e = editor () e.compile (substitutions) A simple way of testing is running the substitutions through the editor print e.edit (repr (substitutions)) (('\r', '\n'), ('<', '<'), ('>', '>'), ('o', 'o'), ('f', 'f'), ('e', 'e'), ('O', 'O')) The escapes need to be tested separately print e.edit ('abc\rdef') abc def Note: This editor's compiler compiles the substitution list to a regular expression which the editor uses to find all matches in the text passed to edit. 
There has got to be a limit to the size of a text which a regular expression can handle. I don't know what this limit is. To be on the safe side, edit a large text line by line or at least in sensible chunks. Frederic Thanks for the suggestion. I had ori
Re: Reading \n unescaped from a file
On 09/02/2015 04:03 AM, Rob Hills wrote: Hi, I am developing code (Python 3.4) that transforms text data from one format to another. As part of the process, I had a set of hard-coded str.replace(...) functions that I used to clean up the incoming text into the desired output format, something like this: dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds dataIn = dataIn.replace('','<') # Tidy up < character dataIn = dataIn.replace('','>') # Tidy up < character dataIn = dataIn.replace('','o') # No idea why but lots of these: convert to 'o' character dataIn = dataIn.replace('','f') # .. and these: convert to 'f' character dataIn = dataIn.replace('','e') # .. 'e' dataIn = dataIn.replace('','O') # .. 'O' These statements transform my data correctly, but the list of statements grows as I test the data so I thought it made sense to store the replacement mappings in a file, read them into a dict and loop through that to do the cleaning up, like this: with open(fileName, 'r+t', encoding='utf-8') as mapFile: for line in mapFile: line = line.strip() try: if (line) and not line.startswith('#'): line = line.split('#')[:1][0].strip() # trim any trailing comments name, value = line.split('=') name = name.strip() self.filterMap[name]=value.strip() except: self.logger.error('exception occurred parsing line [{0}] in file [{1}]'.format(line, fileName)) raise Elsewhere, I use the following code to do the actual cleaning up: def filter(self, dataIn): if dataIn: for token, replacement in self.filterMap.items(): dataIn = dataIn.replace(token, replacement) return dataIn My mapping file contents look like this: \r = \\n â = = < = > = = F = o = f = e = O This all works "as advertised" */except/* for the '\r' => '\\n' replacement. Debugging the code, I see that my '\r' character is "escaped" to '\\r' and the '\\n' to 'n' when they are read in from the file. 
I've been googling hard and reading the Python docs, trying to get my head around character encoding, but I just can't figure out how to get these bits of code to do what I want. It seems to me that I need to either: * change the way I represent '\r' and '\\n' in my mapping file; or * transform them somehow when I read them in However, I haven't figured out how to do either of these. TIA, I have had this problem too and can propose a solution ready to run out of my toolbox: class editor: def compile (self, replacements): targets, substitutes = zip (*replacements) re_targets = [re.escape (item) for item in targets] re_targets.sort (reverse = True) self.targets_set = set (targets) self.table = dict (replacements) regex_string = '|'.join (re_targets) self.regex = re.compile (regex_string, re.DOTALL) def edit (self, text, eat = False): hits = self.regex.findall (text) nohits = self.regex.split (text) valid_hits = set (hits) & self.targets_set # Ignore targets with illegal re modifiers. if valid_hits: substitutes = [self.table [item] for item in hits if item in valid_hits] + [] # Make lengths equal for zip to work right if eat: output = ''.join (substitutes) else: zipped = zip (nohits, substitutes) output = ''.join (list (reduce (lambda a, b: a + b, [zipped][0]))) + nohits [-1] else: if eat: output = '' else: output = input return output >>> substitutions = ( ('\r', '\n'), ('', '<'), ('', '>'), ('', 'o'), ('', 'f'), ('', 'e'), ('', 'O'), ) Order doesn't matter. Add new ones at the end. >>> e = editor () >>> e.compile (substitutions) A simple way of testing is running the substitutions through the editor >>> print e.edit (repr (substitutions)) (('\r', '\n'), ('<', '<'), ('>', '>'), ('o', 'o'), ('f', 'f'), ('e', 'e'), ('O', 'O')) The escapes need to be tested separately >>> print e.edit ('abc\rdef') abc def Note: This editor's compiler compiles the substitution list to a regular expression which the editor uses to find all matches in the text passed to edit. 
There has got to be a limit to the size of a text which a regular expression can handle. I don't know what this limit is. To be on the safe side, edit a large text line by line or at least in sensible chunks. Frederic -- https://mail.python.org/mailman/listinfo/python-list
Re: How to model government organization hierarchies so that the list can expand and compress
On 08/13/2015 09:10 PM, Alex Glaros wrote: It's like the desktop folder/directory model where you can create unlimited folders and put folders within other folders. Instead of folders, I want to use government organizations. Example: Let user create agency names: Air Force, Marines, Navy, Army. Then let them create an umbrella collection called Pentagon, and let users drag Air Force, Marines, Navy, etc. into the umbrella collection. User may wish to add smaller sub-sets of Army, such as Army Jeep Repair Services User may also want to add a new collection Office of the President and put OMB and Pentagon under that as equals. What would the data model look like for this? If I have a field: next_higher_level_parent that lets children records keep track of parent record, it's hard for me to imagine anything but an inefficient bubble sort to produce a hierarchical organizational list. Am using Postgres, not graph database. I'm hoping someone else has worked on this problem, probably not with government agency names, but perhaps the same principle with other objects. Thanks! Alex Glaros After struggling for years with a tree-like estate management system (owner at the top, next level: real estate, banks, art collection, etc., third level: real estate units, bank accounts, etc. fourth level: investment positions, currency accounts, etc)--it recently occurred to me that I had such a system all along: the file system. The last folder at the bottom end of each branch names its contents (AAPL or USD or Lamborghini, etc) the contents is a csv file recording an in and out, revenue, expense history (date, quantity, paid or received, memo, . . .). Any documentation on the respective value item may also be stored in the same folder, easy to find without requiring cross referencing. Managing the data is not as awkward as one might fear. A bash wizard could probably do it quite efficiently with bash scripts. Bash dummies, like me, are more comfortable with python. 
Moving, say, a portfolio from one bank to another is a matter of mv i/banks/abc/account-123 i/banks/xyz (system call). With the tabular database system (MySQL) I have, simple operations like this one are quite awkward. Well, you might laugh. Or others might. If your task is a commercial order, then this approach will hardly do. Anyway, I thought I'd toss it up. If it won't help it won't hurt. Frederic -- https://mail.python.org/mailman/listinfo/python-list
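On the original parent-pointer question: no sort is needed at all. Grouping rows by parent once and then walking the tree recursively produces the indented hierarchical list in linear time. A sketch with the agency names from the question (the rows stand in for a SELECT name, parent FROM agencies result):

```python
from collections import defaultdict

# (name, parent) rows, as they might come out of a Postgres table
rows = [
    ("Office of the President", None),
    ("OMB", "Office of the President"),
    ("Pentagon", "Office of the President"),
    ("Air Force", "Pentagon"),
    ("Army", "Pentagon"),
    ("Army Jeep Repair Services", "Army"),
]

children = defaultdict(list)
for name, parent in rows:
    children[parent].append(name)

def listing(parent=None, depth=0):
    out = []
    for name in children[parent]:
        out.append("  " * depth + name)
        out.extend(listing(name, depth + 1))
    return out

print("\n".join(listing()))
```

Each row is visited exactly once, so the cost is proportional to the number of agencies, not quadratic as with repeated scanning.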
Re: Who uses IDLE -- please answer if you ever do, know, or teach
On 08/06/2015 03:21 AM, Rustom Mody wrote: On Thursday, August 6, 2015 at 6:36:56 AM UTC+5:30, Terry Reedy wrote: There have been discussions, such as today on Idle-sig , about who uses Idle and who we should design it for. If you use Idle in any way, or know of or teach classes using Idle, please answer as many of the questions below as you are willing, and as are appropriate Private answers are welcome. They will be deleted as soon as they are tallied (without names). I realized that this list is a biased sample of the universe of people who have studied Python at least, say, a month. But biased data should be better than my current vague impressions. 0. Classes where Idle is used: Where? Level? Idle users: 1. Are you grade school (1-12)? undergraduate (Freshman-Senior)? post-graduate (from whatever)? 2. Are you beginner (1st class, maybe 2nd depending on intensity of first)? post-beginner? 3. With respect to programming, are you amateur (unpaid) professional (paid for programming) -- Terry Jan Reedy, Idle maintainer I used idle to teach a 2nd year engineering course last sem It was a more pleasant experience than I expected One feature that would help teachers: It would be nice to (have setting to) auto-save the interaction window [Yeah I tried to see if I could do it by hand but could not find where] Useful for giving as handouts of the class So students rest easy and don't need to take 'literal' notes of the session I will now be teaching more advanced students and switching back to emacs -- python, C, and others -- so really no option to emacs. Not ideal at all but nothing else remotely comparable I've been using Idle full time to simultaneously manage my financial holdings, develop the management system and manually fix errors. While the ultimate goal is a push-button system, I have not reached that stage and am compelled to work trial-and-error style. 
For this way of working I found Idle well suited, since the majority of jobs I do are hacks and quick fixes, not production runs that must run reliably. I recently came up with a data-transformation framework that greatly expedites interactive development. It is based on transformer objects that wrap a transformation function. The base class Transformer handles the flow of the data in a manner that allows linking the transformer modules together in chains. With a toolbox of often-used standards, a great variety of transformation tasks can be accomplished by simply lining up a bunch of toolbox transformers in chains. Bridging a gap now and then is a relatively simple matter of writing a transformation function that converts the output format upstream of the gap to the required input format downstream of it. The system works very well and saves me a lot of time. I am currently writing a manual with the intention of uploading it for comment, and of uploading the system as well, if the comments are not too discouraging. If I may show a few examples below . . .
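To make the chaining idea concrete before the TYX examples: a transformer base class like the one described can be sketched in a few lines. This is my own minimal Python 3 illustration (the class bodies and the toy subclasses Upper and Tag are made up), not the actual TYX code:

```python
class Transformer:
    """Base class wrapping a transformation function.  Instances keep
    their parameters, input and output, and can be nested in chains,
    e.g. TAB (CSVP (FR (file_name)))."""

    def __init__(self, **params):
        self.params = params        # declared parameters
        self.last_input = None      # retained input
        self.last_output = None     # retained output

    def set(self, **params):
        self.params.update(params)

    def get(self, name):
        return self.params.get(name)

    def transform(self, data):      # override in subclasses
        return data

    def __call__(self, data=None):
        if data is None:            # no argument: re-run on retained input
            data = self.last_input
        self.last_input = data
        self.last_output = self.transform(data)
        return self.last_output


# Two toy transformers, for illustration only
class Upper(Transformer):
    def transform(self, data):
        return data.upper()

class Tag(Transformer):
    def transform(self, data):
        return '<%s>%s</%s>' % (self.get('tag'), data, self.get('tag'))
```

Chaining then reads exactly like the nested TYX calls: Tag(tag='b')(Upper()('hello')) runs the data through both stages, and calling a transformer with no argument re-runs it on its retained input.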
Frederic (moderately knowledgeable non-professional)

--

import TYX

FR   = TYX.File_Reader ()
CSVP = TYX.CSV_Parser ()
TAB  = TYX.Tabulator ()

print TAB (CSVP (FR ('Downloads/xyz.csv')))    # Calls nest

---
Date,Open,Close,High,Low,Volume
07/18/2014,34.36,34.25,34.36,34.25,485
07/17/2014,34.55,34.50,34.55,34.47,2,415
07/16/2014,34.65,34.63,34.68,34.52,83,477
---

CSVP.get ()    # display all parameters

CSV_Parser
   dialect       None
   delimiter     '\t'
   quote         ''
   has_header    False
   strip_fields  True
   headers       []

CSVP.set (delimiter = ',')
TAB.set (table_format = 'pipe')
print TAB (CSVP ())    # Transformers retain their input

| Date       | Open  | Close | High  | Low   | Volume |
|:-----------|:------|:------|:------|:------|:-------|
| 07/18/2014 | 34.36 | 34.25 | 34.36 | 34.25 | 485    |
| 07/17/2014 | 34.55 | 34.50 | 34.55 | 34.47 | 2,415  |
| 07/16/2014 | 34.65 | 34.63 | 34.68 | 34.52 | 83,477 |

class formatter (TYX.Transformer):

    def __init__ (self):
        TYX.Transformer.__init__ (self, symbol = None)    # declare parameter

    def transform (self, records):
        symbol = self.get ('symbol')
        if symbol:
            out = []
            for d, o, c, h, l, v in records [1:]:    # Clip headers
                month, day, year = d.split ('/')
                d = '%s-%s-%s' % (year, month, day)
                v = v.replace (',', '')
                out.append ((d, symbol, o, c, h, l, v))
            return out

fo = formatter ()
fo.set (symbol = 'XYZ')
TAB.set (float_format = 'f')
print TAB (fo (CSVP ()))    # Transformers also retain their output

| 2014-07-18 | XYZ | 34.36
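Since the TYX toolbox hasn't been published yet, here is a rough standard-library equivalent of the formatter step above, in modern Python 3. This is my own sketch, not TYX; note I have quoted the comma-grouped volume fields so the csv module can parse them, which the original sample did not do:

```python
import csv
import io

def reformat(csv_text, symbol):
    # Parse the quote data, then rewrite each record as
    # (ISO date, symbol, open, close, high, low, volume-without-commas)
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = []
    for d, o, c, h, l, v in rows[1:]:            # clip the header row
        month, day, year = d.split('/')
        out.append(('%s-%s-%s' % (year, month, day),
                    symbol, o, c, h, l, v.replace(',', '')))
    return out

quotes = '''Date,Open,Close,High,Low,Volume
07/18/2014,34.36,34.25,34.36,34.25,485
07/17/2014,34.55,34.50,34.55,34.47,"2,415"
'''

print(reformat(quotes, 'XYZ'))
```

The body is essentially the transform method of the formatter class, minus the framework plumbing.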
Re: Extract email address from Java script in html source using python
On 05/23/2015 04:15 PM, savitha devi wrote: What I exactly want: the JavaScript is in the HTML code. I am trying for a regular expression to find the email address embedded within the JavaScript.

On Sat, May 23, 2015 at 2:31 PM, Chris Angelico ros...@gmail.com wrote: On Sat, May 23, 2015 at 4:46 PM, savitha devi savith...@gmail.com wrote: I am developing a web-scraper code using HTMLParser. I need to extract text/email addresses from JavaScript within the HTML code. I am at beginner level in Python coding and totally lost here. Need some help on this. The JavaScript code is as below:

script type='text/javascript'
//!--
document.getElementById('cloak48218').innerHTML = '';
var prefix = '#109;a' + 'i#108;' + '#116;o';
var path = 'hr' + 'ef' + '=';
var addy48218 = '#105;nf#111;' + '#64;';
addy48218 = addy48218 + 'tsv-n#101;#117;r#105;#101;d' + '#46;' + 'd#101;';
document.getElementById('cloak48218').innerHTML += 'a ' + path + '\'' + prefix + ':' + addy48218 + '\'' + addy48218+'\/a';
//--

This is deliberately being done to prevent scripted usage. What exactly are you needing to do this for? You're basically going to have to execute the entire block of JavaScript code, and then decode the entities to get to what you want. Doing it manually is pretty easy; doing it automatically will virtually require a language interpreter. ChrisA -- https://mail.python.org/mailman/listinfo/python-list

This is just about nuts and bolts, not about the ethics of presumed intentions.
Hope it helps one way or other

Frederic

---
sample = '''//!--
document.getElementById('cloak48218').innerHTML = '';
var prefix = '#109;a' + 'i#108;' + '#116;o';
var path = 'hr' + 'ef' + '=';
var addy48218 = '#105;nf#111;' + '#64;';
addy48218 = addy48218 + 'tsv-n#101;#117;r#105;#101;d' + '#46;' + 'd#101;';
document.getElementById('cloak48218').innerHTML += 'a ' + path + '\'' + prefix + ':' + addy48218 + ''' + addy48218+'\/a';
//--'''

import SE    # Download from PyPI at https://pypi.python.org/pypi/SE

def make_se_translator ():
    # Make SE substitutions
    subs_list = []
    # Make # code substitutions
    for i in range (256):
        subs_list.append ('#%d;=%c' % (i, chr(i)))
    # Delete Java stuff
    subs_list.append (' document.getElementById(\'cloak48218\').= ')
    subs_list.append (' var = \n= //!--= //--= ')
    # Java syntax?  Tweaks needed to get the sample working
    subs_list.append (' + \'\'\'= \'\'\'=\'\' \/=/ ')
    # Add more as needed, trial-and-error style
    # subs_list.append ( . . . )    # format: ' old=new delete this= '
    # Make text
    subs = '\n'.join (subs_list)
    # Make SE translator
    translator = SE.SE (subs)
    # return translator, subs    # print subs, if you want to see what they look like
    return translator

translator = make_se_translator ()
translation = translator (sample)
print translation

# See:
innerHTML = '';
prefix = 'ma' + 'il' + 'to';
path = 'hr' + 'ef' + '=';
addy48218 = 'info' + '@';
addy48218 = addy48218 + 'tsv-neuried' + '.' + 'de';
innerHTML += 'a ' + path + prefix + ':' + addy48218 + '' + addy48218+'/a';

exec (translation.lstrip ())
print innerHTML

a href=mailto:i...@tsv-neuried.dei...@tsv-neuried.de/a

-- https://mail.python.org/mailman/listinfo/python-list
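A lighter alternative for this particular obfuscation pattern, without SE or an exec: decode the numeric character references with a regex, splice the concatenated string fragments, and fish the address parts out directly. A Python 3 sketch of mine, tailored to this one sample; note the references appear here without the leading '&' (an archive artifact), so the pattern matches the bare '#NNN;' form shown above:

```python
import re

SAMPLE = r"""
var prefix = '#109;a' + 'i#108;' + '#116;o';
var addy = '#105;nf#111;' + '#64;';
addy = addy + 'tsv-n#101;#117;r#105;#101;d' + '#46;' + 'd#101;';
"""

def extract_email(js_source):
    # 1. Decode bare numeric character references: #NNN; -> character
    decoded = re.sub(r'#(\d+);', lambda m: chr(int(m.group(1))), js_source)
    # 2. Splice adjacent string literals: '...' + '...'  ->  '......'
    joined = re.sub(r"'\s*\+\s*'", '', decoded)
    # 3. The address is built as local part ('info@') plus domain ('host.tld');
    #    pick both literals out and glue them together.
    local = re.search(r"'([\w.+-]+@)'", joined)
    domain = re.search(r"'([\w-]+(?:\.[\w-]+)+)'", joined)
    return local.group(1) + domain.group(1) if local and domain else None

print(extract_email(SAMPLE))
```

This is a heuristic, not a JavaScript interpreter: it works because this cloaking script always assembles local part and domain as separate quoted literals. A different obfuscator would need different seams, which is exactly Chris's point about ultimately needing a language interpreter.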