Re: Function to Print a nicely formatted Dictionary or List?

2022-06-09 Thread Friedrich Rentsch
If you want tables laid out in a framing grid, you might want to take a
look at this:


https://bitbucket.org/astanin/python-tabulate/pull-requests/31/allow-specifying-float-formats-per-column/diff
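
For the simpler part of the question -- just getting a dict onto multiple
lines -- the stdlib's pprint already does it, and tabulate itself draws
framing grids. A minimal sketch (assumes pip install tabulate; the
pull request above adds per-column float formats on top of this):

    from pprint import pprint
    from tabulate import tabulate

    d = {'alpha': 1.0, 'beta': 22.5, 'gamma': 333.125}
    pprint(d)                     # one key per line
    print(tabulate(d.items(), headers=['key', 'value'], tablefmt='grid'))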

Frederic



On 6/9/22 12:43, Dave wrote:

Hi,

Before I write my own I wondering if anyone knows of a function that will print 
a nicely formatted dictionary?

By nicely formatted I mean not all on one line!

Cheers
Dave



--
https://mail.python.org/mailman/listinfo/python-list


Re: Global VS Local Subroutines

2022-02-10 Thread Friedrich Rentsch
I believe I have observed a difference that might also be worth
noting: the nested function a() (second example) has access to all of
the enclosing function's variables, which might be an efficiency factor
with lots of variables. That access is read-only, though: if the inner
function assigns to one of the readable outer variables, that variable
becomes local to the inner function (unless it is declared nonlocal).
A minimal demonstration follows.
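
A sketch of that behaviour (in Python 3 a nonlocal declaration lifts
the restriction):

    def outer():
        x = 1
        def reader():
            return x        # reading the enclosing x works
        def writer():
            x = 2           # assignment makes this x local to writer
            return x
        def broken():
            x = x + 1       # read before the local assignment completes
            return x        # -> UnboundLocalError
        print(reader())     # 1
        print(writer())     # 2 -- outer's x is untouched
        try:
            broken()
        except UnboundLocalError as e:
            print(e)

    outer()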


Frederic

On 2/10/22 1:13 PM, BlindAnagram wrote:

Is there any difference in performance between these two program layouts:

   def a():
       ...

   def f(b):
       c = a(b)

or

   def f(b):
       def a():
           ...
       c = a(b)

I would appreciate any insights on which layout to choose in which 
circumstances.




--
https://mail.python.org/mailman/listinfo/python-list


Re: Update a specific element in all a list of N lists

2021-12-19 Thread Friedrich Rentsch



On 12/16/21 3:00 PM, hanan lamaazi wrote:

Dear All,

I really need your assistance,

I have a dataset with 1005000 rows and 25 columns.

The main columns that I repeatedly use are Time, ID, and Reputation.

First I sliced the data based on the time, and I appended the sliced data
to a list called "df_list". So I get 201 lists with 25 columns.

The main code is starting for here:

for elem in df_list:

{do something.}

{Here I'm trying to calculate the outliers}

Out.append(outliers)

Now my problem is that I need to locate those outliers in the df_list and
then update another column, which is the "Reputation".

Note that there are duplicated IDs, but at different time slots.

For example, if ID = 1 is an outlier, I need to select all rows with ID = 1
in the list and update their Reputation column.

I tried those solutions:
1)

grp = data11.groupby(['ID'])
for i in GlobalNotOutliers.ID:
    data11.loc[grp.get_group(i).index, 'Reput'] += 1

for j in GlobalOutliers.ID:
    data11.loc[grp.get_group(j).index, 'Reput'] -= 1


It works for a dataframe but not for a list

2)

for elem in df_list:
    elem.loc[elem['ID'].isin(Outlier['ID'])]


It doesn't select the right IDs; it gives back all the values in elem

3) Here I set the index using IDs:

for i in Outlier.index:
    for elem in df_list:
        print(elem.Reput)
        if i in elem.index:
            # elem.loc[elem[i], 'Reput'] += 1
            m = elem.iloc[i, :]
            print(m)


It gives this error:

IndexError: single positional indexer is out-of-bounds


I'm greatly thankful to anyone who can help me,


I'd suggest you group your records by date and put each group into a
dict keyed by date. As you collect each record into its group, append to
it the index of that record in the original list. Then go through all
your groups, record by record, finding outliers. The last item in each
record is the index of the record in the original list, identifying the
record you want to update. Something like this:


    dictionary = {}
    for i, record in enumerate (original_list):
        date = record [DATE_INDEX]
        if date in dictionary:
            dictionary [date].append ((record, i))
        else:
            dictionary[date] = [(record, i)]

    reputation_indexes = set ()
    for date, records in dictionary.items ():
        for record, i in records:
            if has_outlier (record):
                reputation_indexes.add (i)

    for i in reputation_indexes:
        update_reputation (original_list [i])
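
If the slices are pandas DataFrames, as the groupby attempts suggest, a
boolean mask updates the column in place directly. A sketch, assuming
Outlier['ID'] holds the IDs flagged as outliers:

    outlier_ids = set(Outlier['ID'])
    for elem in df_list:
        mask = elem['ID'].isin(outlier_ids)
        elem.loc[mask, 'Reput'] -= 1      # penalize outliers
        elem.loc[~mask, 'Reput'] += 1     # reward the rest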

Frederic



--
https://mail.python.org/mailman/listinfo/python-list


Re: How to get dynamic data in html (javascript?)

2020-01-13 Thread Friedrich Rentsch



On 1/11/20 2:39 PM, Friedrich Rentsch wrote:

Hi all,

I'm pretty good at hacking html text. But I have no clue how to get 
dynamic data like this : "At close: {date} {time}". I would appreciate 
a starting push to narrow my focus, currently awfully unfocused. Thanks.


Frederic

Thanks for bothering. I'm sorry for having been too terse. The snippet
was from an html download (wget
https://finance.yahoo.com/quote/HYT/history?p=HYT). Here's a little more
of it:  ".  .  .  ED_SHORT":"At close: {date}
{time}","MARKET_TIME_NOTICE_CLOSED":"As of {date} {time}.
{marketState}","MARKE . . .". I suppose the browser gets the values
whose names appear in braces by dialoguing with the server. I believe it
is javascript; the page contains script sections. If it is javascript,
the question would be how to run javascript in python, unless it runs on
the server, driven by requests sent by python from my end.
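
One way in without running any javascript (a sketch, Python 3): pages
like this usually embed their state as one big JSON literal inside a
script block -- Yahoo's pages at the time carried it as
"root.App.main = {...};" -- so a regex plus json.loads can pull it out.
The pattern is an assumption about the page's layout, not a stable API:

    import json
    import re
    import urllib.request

    url = "https://finance.yahoo.com/quote/HYT/history?p=HYT"
    html = urllib.request.urlopen(url).read().decode("utf-8")
    # Assumed layout: page state embedded as root.App.main = {...};
    m = re.search(r"root\.App\.main\s*=\s*(\{.*?\});", html, re.DOTALL)
    if m:
        state = json.loads(m.group(1))
        # the {date}/{time} templates are filled from values in here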


Frederic


--
https://mail.python.org/mailman/listinfo/python-list


How to get dynamic data in html (javascript?)

2020-01-11 Thread Friedrich Rentsch

Hi all,

I'm pretty good at hacking html text. But I have no clue how to get 
dynamic data like this : "At close: {date} {time}". I would appreciate a 
starting push to narrow my focus, currently awfully unfocused. Thanks.


Frederic

--
https://mail.python.org/mailman/listinfo/python-list


Re: pre-edit stuff persists in a reloaded module

2019-10-07 Thread Friedrich Rentsch




On 10/5/19 1:48 PM, Friedrich Rentsch wrote:

Hi all,

Python 2.7. I habitually work interactively in an Idle window. 
Occasionally I correct code, reload and find that edits fail to load. 
I understand that reloading is not guaranteed to reload everything, 
but I don't understand the exact mechanism and would appreciate some 
illumination. Right now I am totally bewildered, having deleted and 
garbage collected a module and an object, reloaded the module and 
remade the object and when I inspect the corrected source 
(inspect.getsource (Object.run)) I see the uncorrected source, which 
isn't even on the disk anymore. The command 'reload' correctly 
displays the name of the source, ending '.py', indicating that it 
recognizes the source being newer than the compile ending '.pyc'. 
After the reload, the pyc-file is newer, indicating that it has been 
recompiled. But the runtime error persists. So the recompile must have 
used the uncorrected old code. I could kill python with signal 15, but 
would prefer a targeted purge that doesn't wipe clean my Idle 
workbench. (I know I should upgrade to version 3. I will as soon as I 
get around to it. Hopefully that will fix the problem.)


Thanks for comments

Frederic

Closing the thread with thanks to all who responded, offering excellent 
advice.


Frederic

--
https://mail.python.org/mailman/listinfo/python-list


Re: pre-edit stuff persists in a reloaded module

2019-10-05 Thread Friedrich Rentsch




On 10/5/19 2:48 PM, Peter Otten wrote:

Friedrich Rentsch wrote:


Hi all,

Python 2.7. I habitually work interactively in an Idle window.
Occasionally I correct code, reload and find that edits fail to load. I
understand that reloading is not guaranteed to reload everything, but I
don't understand the exact mechanism and would appreciate some
illumination. Right now I am totally bewildered, having deleted and
garbage collected a module and an object, reloaded the module and remade
the object and when I inspect the corrected source (inspect.getsource
(Object.run)) I see the uncorrected source, which isn't even on the disk
anymore. The command 'reload' correctly displays the name of the source,
ending '.py', indicating that it recognizes the source being newer than
the compile ending '.pyc'. After the reload, the pyc-file is newer,
indicating that it has been recompiled. But the runtime error persists.
So the recompile must have used the uncorrected old code. I could kill
python with signal 15, but would prefer a targeted purge that doesn't
wipe clean my Idle workbench. (I know I should upgrade to version 3. I
will as soon as I get around to it. Hopefully that will fix the problem.)

Thanks for comments

(1) stay away from reload()
(2) inspect.getsource() uses a cache that you should be able to clear with
linecache.clearcache():

$ echo 'def f(): return "old"' > tmp.py
$ python
[...]
>>> import inspect, tmp
>>> inspect.getsource(tmp.f)
'def f(): return "old"\n'
[1]+  Angehalten  python
$ echo 'def f(): return "new"' > tmp.py
$ fg
python
>>> reload(tmp)
>>> reload(tmp)
>>> inspect.getsource(tmp.f)
'def f(): return "old"\n'
>>> import linecache; linecache.clearcache()
>>> inspect.getsource(tmp.f)
'def f(): return "new"\n'

(3) see 1 ;)

Thank you, Peter. I guess, then, that not only 'inspect' but the
compiler as well reads source off the line cache, and clearing the
latter would make 'reload' work as expected. Are there other snags
lurking that make you advise against using 'reload'? What are the
alternatives for developing iteratively, alternating between running
and editing?
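
Pending alternatives, a small helper along the lines Peter shows can
bundle the two steps (a sketch for Python 2, where reload is a builtin;
Python 3 has importlib.reload):

    import linecache

    def fresh_reload(module):
        # Clear the source-line cache first, so inspect.getsource()
        # rereads the file from disk, then reload the module in place.
        linecache.clearcache()
        return reload(module)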


Frederic

--
https://mail.python.org/mailman/listinfo/python-list


pre-edit stuff persists in a reloaded module

2019-10-05 Thread Friedrich Rentsch

Hi all,

Python 2.7. I habitually work interactively in an Idle window. 
Occasionally I correct code, reload and find that edits fail to load. I 
understand that reloading is not guaranteed to reload everything, but I 
don't understand the exact mechanism and would appreciate some 
illumination. Right now I am totally bewildered, having deleted and 
garbage collected a module and an object, reloaded the module and remade 
the object and when I inspect the corrected source (inspect.getsource 
(Object.run)) I see the uncorrected source, which isn't even on the disk 
anymore. The command 'reload' correctly displays the name of the source, 
ending '.py', indicating that it recognizes the source being newer than 
the compile ending '.pyc'. After the reload, the pyc-file is newer, 
indicating that it has been recompiled. But the runtime error persists. 
So the recompile must have used the uncorrected old code. I could kill 
python with signal 15, but would prefer a targeted purge that doesn't 
wipe clean my Idle workbench. (I know I should upgrade to version 3. I 
will as soon as I get around to it. Hopefully that will fix the problem.)


Thanks for comments

Frederic

--
https://mail.python.org/mailman/listinfo/python-list


Re: Regex to extract multiple fields in the same line

2018-06-15 Thread Friedrich Rentsch



On 06/15/2018 12:37 PM, Ganesh Pal wrote:

Hey Friedrich,

The proposed solution worked nicely, thank you for the reply, I really
appreciate it.


The only thing I think would need a review is whether the assignment of
values from one dictionary to the other dictionary is done correctly
(lines 17 to 25 in the below code).


Here is my code :

root@X1:/Play_ground/SPECIAL_TYPES/REGEX# vim Friedrich.py
   1 import re
   2 from collections import OrderedDict
   3
   4 keys = ["struct", "loc", "size", "mirror",
   5         "filename", "final_results"]
   6
   7 stats = OrderedDict.fromkeys(keys)
   8
   9
  10 line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt --size=8'
  11
  12 regex = re.compile(r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
  13 result = dict(re.findall(regex, line))
  14 print result
  15
  16 if result['log_file']:
  17     stats['filename'] = result['log_file']
  18 if result['struct']:
  19     stats['struct'] = result['struct']
  20 if result['size']:
  21     stats['size'] = result['size']
  22 if result['loc']:
  23     stats['loc'] = result['loc']
  24 if result['mirror']:
  25     stats['mirror'] = result['mirror']
  26
  27 print stats
  28
Looks okay to me. If you'd read 'result' using 'get' you wouldn't need 
to test for the key. 'stats' would then have all keys and value None for 
keys missing in 'result':


stats['filename'] = result.get ('log_file')
stats['struct']   = result.get ('struct')

This may or may not suit your purpose.
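
The whole block of ifs could also collapse into one mapping pass; a
sketch (the pairing list is just illustrative):

    stats = OrderedDict.fromkeys(keys)
    for stats_key, result_key in [('filename', 'log_file'),
                                  ('struct', 'struct'), ('loc', 'loc'),
                                  ('size', 'size'), ('mirror', 'mirror')]:
        stats[stats_key] = result.get(result_key)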


Also, I think the regex can just be
(r"--(struct|loc|size|mirror|log_file)=([^\s]+)")
-- no need to match whitespace (\s*) before and after the = symbol,
because that would never happen (this line is actually a key=value pair
of a dictionary getting logged).

You are right. I thought your sample line had a space in one of the 
groups and didn't reread to verify, letting the false impression take 
hold. Sorry about that.


Frederic



Regards,
Ganesh






On Fri, Jun 15, 2018 at 12:53 PM, Friedrich Rentsch <
anthra.nor...@bluewin.ch> wrote:


Hi Ganesh. Having proposed a solution to your problem, it would be kind
of you to let me know whether it has helped. In case you missed my
response, I repeat it:


regex = re.compile (r"--(struct|loc|size|mirror|l

og_file)\s*=\s*([^\s]+)")

regex.findall (line)

[('struct', 'data_block'), ('log_file', '/var/1000111/test18.log'),
('loc', '0'), ('mirror', '10')]

Frederic


On 06/13/2018 07:32 PM, Ganesh Pal wrote:


On Wed, Jun 13, 2018 at 5:59 PM, Rhodri James 
wrote:

On 13/06/18 09:08, Ganesh Pal wrote:

Hi Team,

I wanted to parse a file and extract a few fields that are present after
"=" in a text file.


For example, from the below line I need to extract the values present
after --struct=, --loc=, --size= and --log_file=

Sample input

line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
--path=/tmp/data_block.txt size=8'

Did you mean "--size=8" at the end?  That's what your explanation

implied.




Yes James, you got it right, I meant "--size=8".


Hi Team,


I played further with python's re.findall() and I am able to extract all
the required fields. I have 2 further questions too, please suggest.


Question 1:

Please let me know the mistakes in the below code and suggest if it can
be optimized further with a better regex


# This code has to extract various fields from a single line (assuming
the line is matched here) of a log file that contains various values
(and then store the extracted values in a dictionary)

import re

line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
--path=/tmp/data_block.txt --size=8'

# loc is a number
r_loc = r"--loc=([0-9]+)"
r_size = r'--size=([0-9]+)'
r_struct = r'--struct=([A-Za-z_]+)'
r_log_file = r'--log_file=([A-Za-z0-9_/.]+)'


if re.findall(r_loc, line):
    print re.findall(r_loc, line)

if re.findall(r_size, line):
    print re.findall(r_size, line)

if re.findall(r_struct, line):
    print re.findall(r_struct, line)

if re.findall(r_log_file, line):
    print re.findall(r_log_file, line)


o/p:
root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
['0']
['8']
['data_block']
['/var/1000111/test18.log']


Question 2:

I tried to see if I can use re.search with a look-behind assertion; it
seems to work. Any comments or suggestions?

Example:

import re

line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
--path=/tmp/data_block.txt --size=8'

match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))', line)

Re: Regex to extract multiple fields in the same line

2018-06-13 Thread Friedrich Rentsch




On 06/13/2018 07:32 PM, Ganesh Pal wrote:

On Wed, Jun 13, 2018 at 5:59 PM, Rhodri James  wrote:


On 13/06/18 09:08, Ganesh Pal wrote:


   Hi Team,

I wanted to parse a file and extract a few fields that are present after "="
in a text file.


For example, from the below line I need to extract the values present after
--struct=, --loc=, --size= and --log_file=

Sample input

line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
--path=/tmp/data_block.txt size=8'


Did you mean "--size=8" at the end?  That's what your explanation implied.

How's this? (Supposing that the values contain no spaces):

>>> regex = re.compile (r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
>>> regex.findall (line)
[('struct', 'data_block'), ('log_file', '/var/1000111/test18.log'),
('loc', '0'), ('mirror', '10')]


Frederic








Yes James you got it right ,  I  meant  "--size=8 " .,


Hi Team,


I played further with python's re.findall() and I am able to extract all
the required fields. I have 2 further questions too, please suggest.


Question 1:

Please let me know the mistakes in the below code and suggest if it can
be optimized further with a better regex


# This code has to extract various fields from a single line (assuming
the line is matched here) of a log file that contains various values
(and then store the extracted values in a dictionary)

import re

line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
--path=/tmp/data_block.txt --size=8'

# loc is a number
r_loc = r"--loc=([0-9]+)"
r_size = r'--size=([0-9]+)'
r_struct = r'--struct=([A-Za-z_]+)'
r_log_file = r'--log_file=([A-Za-z0-9_/.]+)'


if re.findall(r_loc, line):
    print re.findall(r_loc, line)

if re.findall(r_size, line):
    print re.findall(r_size, line)

if re.findall(r_struct, line):
    print re.findall(r_struct, line)

if re.findall(r_log_file, line):
    print re.findall(r_log_file, line)


o/p:
root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
['0']
['8']
['data_block']
['/var/1000111/test18.log']


Question 2:

I tried to see if I can use re.search with a look-behind assertion; it
seems to work. Any comments or suggestions?

Example:

import re

line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
--path=/tmp/data_block.txt --size=8'

match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))', line)
if match:
    print match.group('loc')


o/p: root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py

0


I want to build the sub-patterns and use match.group() to get the values,
something as shown below, but it doesn't seem to work:


match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
                  r'(?P<size>(?<=--size=)([0-9]+))', line)
if match:
    print match.group('loc')
    print match.group('size')

Regards,
Ganesh
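
For question 2: the two patterns concatenated as above can only match
where the loc digits are immediately followed by the size digits, which
never happens in this line. Bridging the gap with a non-greedy .*? makes
the combined search work; a sketch:

    match = re.search(r'(?P<loc>(?<=--loc=)[0-9]+)'
                      r'.*?'
                      r'(?P<size>(?<=--size=)[0-9]+)', line)
    if match:
        print match.group('loc')   # 0
        print match.group('size')  # 8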


--
https://mail.python.org/mailman/listinfo/python-list


Re: stock quotes off the web, py style

2018-05-16 Thread Friedrich Rentsch



On 05/16/2018 06:21 PM, Mike McClain wrote:

On Wed, May 16, 2018 at 02:33:23PM +0200, Friedrich Rentsch wrote:


I didn't know the site you mention. I've been getting quotes from
Yahoo daily. The service they discontinued was for up to 50 symbols
per page. I now parse a separate page of some 500K of html for each
symbol! This site is certainly more concise and surely a lot faster.

 Thank you sir for the response and code snippet.
As it turns out iextrading.com doesn't supply data on mutuals, which
are the majority of my portfolio, so they are not going to do me much
good after all.
 If you please, what is the URL of one stock you're getting from
Yahoo that requires parsing 500K of html per symbol? That's better
than not getting the quotes.
 If AlphaVantage ever comes back up, they send 100 days quotes for
each symbol and I only use today's and yesterday's, but it is easy to
parse.


You would do multiple symbols in a loop which you enter with an open
urllib object, rather than opening a new one for each symbol inside
the loop.

 At the moment I can't see how to do that but will figure it out.
Thanks for the pointer.

Mike
--
"There are three kinds of men. The ones who learn by reading. The
few who learn by observation. The rest of them have to pee on the
electric fence for themselves." --- Will Rogers
I meant to check out AlphaVantage myself and registered, since it 
appears to be a kind of interest group. I wasn't aware it is down, 
because I haven't yet tried to log on. But I hope to do so when it comes 
back.


The way I get quotes from Yahoo is a hack: 1. Get a quote on the Yahoo 
web page. 2. Copy the url. 
(https://finance.yahoo.com/quote/IBM?p=IBM=1). 3. Compose 
such urls in a loop one symbol at a time and read nearly 600K of html 
text for each of them. 4. Parse the text for the numbers I want to 
extract. Needles in a haystack. Slow for a large set of symbols and 
grossly inefficient in terms of data traffic.


Forget my last suggestion "You would do multiple symbols . . ." that was 
wrong. You have to open a urllib object for every symbol, the same way 
you'd open a file for every file name.


And thanks to the practitioners for the warnings against using 'eval'. I 
have hardly ever used it, never in online communications. So my 
awareness level is low. But I understand the need to be careful.
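
For these iextrading replies the standard json module is a drop-in, safe
alternative (a sketch, Python 2 to match the urllib2 code elsewhere in
this thread):

    import json
    import urllib2

    reply = urllib2.urlopen(
        "https://api.iextrading.com/1.0/stock/IBM/quote").read()
    ibm = json.loads(reply)   # same dict as eval(reply), minus the risk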


Frederic




 



--
https://mail.python.org/mailman/listinfo/python-list


Re: stock quotes off the web, py style

2018-05-16 Thread Friedrich Rentsch



On 05/16/2018 02:23 AM, Mike McClain wrote:

 Initially I got my quotes from a broker daily to plug into a
spreadsheet, Then I found Yahoo and wrote a perl script to grab them.
When Yahoo quit supplying quotes I found AlphaVantage.co and rewrote
the perl script.
 AlphaVantage.co has been down since last week and I found
iextrading.com has a freely available interface. Since it needs
a rewrite and I'm trying to get a handle on python this seems
like a good opportunity to explore.
 If someone would please suggest modules to explore. Are there any
upper level modules that would allow me to do something like:

from module import get
def getAquote(symbol):
 url = 'https://api.iextrading.com/1.0/stock/{}/quote'.format(symbol)
 reply = module.get(url)
 return my_parse(reply)

Thanks,
Mike
--
Men occasionally stumble over the truth, but most of them pick
themselves up and hurry off as if nothing ever happened.
 - Churchill


I didn't know the site you mention. I've been getting quotes from Yahoo 
daily. The service they discontinued was for up to 50 symbols per page. 
I now parse a separate page of some 500K of html for each symbol! This 
site is certainly more concise and surely a lot faster. It serves a 
naked set of data, which happens to conform to the python source code 
specification for dictionaries and consequently can be compiled into a 
dictionary with 'eval', like so:


>>> ibm = urllib2.urlopen ("https://api.iextrading.com/1.0/stock/IBM/quote").read()
>>> ibm = eval (ibm)
>>> for item in sorted (ibm.items()): print '%-24s%s' % item

avgTotalVolume          5331869
calculationPrice        close
change                  -0.56
changePercent           -0.00388
close                   143.74
closeTime               1526414517398
companyName             International Business Machines Corporation
delayedPrice            143.74
delayedPriceTime        1526414517398
high                    143.99
iexAskPrice             0
iexAskSize              0
iexBidPrice             0
iexBidSize              0
iexLastUpdated          0
iexMarketPercent        0
iexRealtimePrice        0
iexRealtimeSize         0
iexVolume               0
latestPrice             143.74
latestSource            Close
latestTime              May 15, 2018
latestUpdate            1526414517398
latestVolume            4085996
low                     142.92
marketCap               131948764304
open                    143.5
openTime                1526391000646
peRatio                 10.34
previousClose           144.3
primaryExchange         New York Stock Exchange
sector                  Technology
symbol                  IBM
week52High              171.13
week52Low               139.13
ytdChange               -0.0485148849103

You would do multiple symbols in a loop which you enter with an open 
urllib object, rather than opening a new one for each symbol inside the 
loop.


Frederic

--
https://mail.python.org/mailman/listinfo/python-list


Re: General Purpose Pipeline library?

2017-11-22 Thread Friedrich Rentsch



On 11/22/2017 10:54 AM, Friedrich Rentsch wrote:



On 11/21/2017 03:26 PM, Jason wrote:

On Monday, November 20, 2017 at 10:49:01 AM UTC-5, Jason wrote:
a pipeline can be described as a sequence of functions that are 
applied to an input with each subsequent function getting the output 
of the preceding function:


out = f6(f5(f4(f3(f2(f1(in))))))

However this isn't very readable and does not support conditionals.

Tensorflow has tensor-focused pipelines:
 fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
 fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
 out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')


I have some code which allows me to mimic this, but with an implied 
parameter.


def executePipeline(steps, collection_funcs = [map, filter, reduce]):
    results = None
    for step in steps:
        func = step[0]
        params = step[1]
        if func in collection_funcs:
            print func, params[0]
            results = func(functools.partial(params[0], *params[1:]), results)
        else:
            print func
            if results is None:
                results = func(*params)
            else:
                results = func(*(params+(results,)))
    return results

executePipeline( [
    (read_rows, (in_file,)),
    (map, (lower_row, field)),
    (stash_rows, ('stashed_file', )),
    (map, (lemmatize_row, field)),
    (vectorize_rows, (field, min_count,)),
    (evaluate_rows, (weights, None)),
    (recombine_rows, ('stashed_file', )),
    (write_rows, (out_file,))
    ]
)

Which gets me close, but I can't control where rows gets passed in. 
In the above code, it is always the last parameter.


I feel like I'm reinventing a wheel here.  I was wondering if 
there's already something that exists?
Why do I want this? Because I'm tired of writing code that is locked 
away in a bespoke function. I'd  have an army of functions all 
slightly different in functionality. I require flexibility in 
defining pipelines, and I don't want a custom pipeline to require any 
low-level coding. I just want to feed a sequence of functions to a 
script and have it process it. A middle ground between the shell | 
operator and bespoke python code. Sure, I could write many binaries 
bound by shell, but there are some things done far easier in python 
because of its extensive libraries and it can exist throughout the 
execution of the pipeline whereas any temporary persistence  has to 
be though environment variables or files.


Well, after examining your feedback, it looks like Grapevine has 99%
of the concepts that I wanted to invent, even if the | operator seems
a bit clunky. I personally prefer the fluent interface convention.
But this should work.


Kamaelia could also work, but it seems a little bit more grandiose.


Thanks everyone who chimed in!
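
As a baseline for the frameworks discussed here, the unreadable
f6(f5(...)) nesting itself folds into a tiny left-to-right composer
(a sketch; conditionals still need more machinery):

    from functools import reduce   # reduce is a builtin in Python 2

    def pipeline(*steps):
        # pipeline(f1, f2, f3)(x) == f3(f2(f1(x)))
        return lambda value: reduce(lambda acc, step: step(acc), steps, value)

    inc = lambda n: n + 1
    double = lambda n: n * 2
    print(pipeline(inc, double, inc)(3))   # ((3 + 1) * 2) + 1 = 9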


This looks very much like what I have been working on of late: a
generic processing paradigm based on chainable building blocks. I call
them Workshops, because the base class can be thought of as a workshop
that takes some raw material, processes it and delivers the product
(to the next in line). Your example might look something like this:


    >>> import workshops as WS

    >>> Vectorizer = WS.Chain (
            WS.File_Reader (),        # WS provides
            WS.Map (lower_row),       # WS provides (wrapped builtin)
            Row_Stasher (),           # You provide
            WS.Map (lemmatize_row),   # WS provides
            Row_Vectorizer (),        # Yours
            Row_Evaluator (),         # Yours
            Row_Recombiner (),
            WS.File_Writer (),
            _name = 'Vectorizer'
        )

    Parameters are process-control settings that travel through a
subscription-based mailing system separate from the payload pipe.


    >>> Vectorizer.post (min_count = ...,  )    # Set all parameters that control the entire run.
    >>> Vectorizer.post ("File_Writer", file_name = 'output_file_name')    # Addressed, not meant for File_Reader


    Run

    >>> Vectorizer ('input_file_name')    # File_Writer returns 0 if the Chain completes successfully.

    0

    If you would provide a list of your functions (input, output, 
parameters) I'd be happy to show a functioning solution. Writing a 
Shop follows a simple standard pattern: Naming the subscriptions, if 
any, and writing a single method that reads the subscribed parameters, 
if any, then takes payload, processes it and returns the product.


    I intend to share the system, provided there's an interest. I'd 
have to tidy it up quite a bit, though, before daring to release it.


    There's a lot more to it . . .

Frederic

I'm sorry, I made a mistake with the "From" item.

Re: General Purpose Pipeline library?

2017-11-22 Thread Friedrich Rentsch



On 11/21/2017 03:26 PM, Jason wrote:

On Monday, November 20, 2017 at 10:49:01 AM UTC-5, Jason wrote:

a pipeline can be described as a sequence of functions that are applied to an 
input with each subsequent function getting the output of the preceding 
function:

out = f6(f5(f4(f3(f2(f1(in))))))

However this isn't very readable and does not support conditionals.

Tensorflow has tensor-focused pipelines:
 fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
 fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
 out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

I have some code which allows me to mimic this, but with an implied parameter.

def executePipeline(steps, collection_funcs = [map, filter, reduce]):
    results = None
    for step in steps:
        func = step[0]
        params = step[1]
        if func in collection_funcs:
            print func, params[0]
            results = func(functools.partial(params[0], *params[1:]), results)
        else:
            print func
            if results is None:
                results = func(*params)
            else:
                results = func(*(params+(results,)))
    return results

executePipeline( [
    (read_rows, (in_file,)),
    (map, (lower_row, field)),
    (stash_rows, ('stashed_file', )),
    (map, (lemmatize_row, field)),
    (vectorize_rows, (field, min_count,)),
    (evaluate_rows, (weights, None)),
    (recombine_rows, ('stashed_file', )),
    (write_rows, (out_file,))
    ]
)

Which gets me close, but I can't control where rows gets passed in. In the 
above code, it is always the last parameter.

I feel like I'm reinventing a wheel here.  I was wondering if there's already 
something that exists?

Why do I want this? Because I'm tired of writing code that is locked away in a 
bespoke function. I'd  have an army of functions all slightly different in 
functionality. I require flexibility in defining pipelines, and I don't want a 
custom pipeline to require any low-level coding. I just want to feed a sequence 
of functions to a script and have it process it. A middle ground between the 
shell | operator and bespoke python code. Sure, I could write many binaries 
bound by shell, but there are some things done far easier in python because of 
its extensive libraries and it can exist throughout the execution of the 
pipeline whereas any temporary persistence  has to be though environment 
variables or files.

Well, after examining your feedback, it looks like Grapevine has 99% of the
concepts that I wanted to invent, even if the | operator seems a bit clunky. I
personally prefer the fluent interface convention. But this should work.

Kamaelia could also work, but it seems a little bit more grandiose.


Thanks everyone who chimed in!


--
https://mail.python.org/mailman/listinfo/python-list


Re: General Purpose Pipeline library?

2017-11-22 Thread Friedrich Rentsch



On 11/21/2017 03:26 PM, Jason wrote:

On Monday, November 20, 2017 at 10:49:01 AM UTC-5, Jason wrote:

a pipeline can be described as a sequence of functions that are applied to an 
input with each subsequent function getting the output of the preceding 
function:

out = f6(f5(f4(f3(f2(f1(in))))))

However this isn't very readable and does not support conditionals.

Tensorflow has tensor-focused pipelines:
 fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
 fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
 out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

I have some code which allows me to mimic this, but with an implied parameter.

def executePipeline(steps, collection_funcs = [map, filter, reduce]):
    results = None
    for step in steps:
        func = step[0]
        params = step[1]
        if func in collection_funcs:
            print func, params[0]
            results = func(functools.partial(params[0], *params[1:]), results)
        else:
            print func
            if results is None:
                results = func(*params)
            else:
                results = func(*(params+(results,)))
    return results

executePipeline( [
    (read_rows, (in_file,)),
    (map, (lower_row, field)),
    (stash_rows, ('stashed_file', )),
    (map, (lemmatize_row, field)),
    (vectorize_rows, (field, min_count,)),
    (evaluate_rows, (weights, None)),
    (recombine_rows, ('stashed_file', )),
    (write_rows, (out_file,))
    ]
)

Which gets me close, but I can't control where rows gets passed in. In the 
above code, it is always the last parameter.

I feel like I'm reinventing a wheel here.  I was wondering if there's already 
something that exists?

Why do I want this? Because I'm tired of writing code that is locked away in a 
bespoke function. I'd  have an army of functions all slightly different in 
functionality. I require flexibility in defining pipelines, and I don't want a 
custom pipeline to require any low-level coding. I just want to feed a sequence 
of functions to a script and have it process it. A middle ground between the 
shell | operator and bespoke python code. Sure, I could write many binaries 
bound by shell, but there are some things done far easier in python because of 
its extensive libraries and it can exist throughout the execution of the 
pipeline whereas any temporary persistence  has to be though environment 
variables or files.

Well, after examining your feedback, it looks like Grapevine has 99% of the
concepts that I wanted to invent, even if the | operator seems a bit clunky. I
personally prefer the fluent interface convention. But this should work.

Kamaelia could also work, but it seems a little bit more grandiose.


Thanks everyone who chimed in!


This looks very much like what I have been working on of late: a
generic processing paradigm based on chainable building blocks. I call
them Workshops, because the base class can be thought of as a workshop
that takes some raw material, processes it and delivers the product (to
the next in line). Your example might look something like this:


    >>> import workshops as WS

    >>> Vectorizer = WS.Chain (
            WS.File_Reader (),        # WS provides
            WS.Map (lower_row),       # WS provides (wrapped builtin)
            Row_Stasher (),           # You provide
            WS.Map (lemmatize_row),   # WS provides. Name for addressed Directions sending.
            Row_Vectorizer (),        # Yours
            Row_Evaluator (),         # Yours
            Row_Recombiner (),
            WS.File_Writer (),
            _name = 'Vectorizer'
        )

    Parameters are process-control settings that travel through a
subscription-based mailing system separate from the payload pipe.


    >>> Vectorizer.post (min_count = ...,  )    # Set all parameters that control the entire run.
    >>> Vectorizer.post ("File_Writer", file_name = 'output_file_name')    # Addressed, not meant for File_Reader


    Run

    >>> Vectorizer ('input_file_name')    # File_Writer returns 0 if the Chain completes successfully.

    0

    If you would provide a list of your functions (input, output, 
parameters) I'd be happy to show a functioning solution. Writing a Shop 
follows a simple standard pattern: Naming the subscriptions, if any, and 
writing a single method that reads the subscribed parameters, if any, 
then takes payload, processes it and returns the product.


    I intend to share the system, provided there's an interest. I'd 
have to tidy 

execfile and import not working

2017-09-06 Thread Friedrich Rentsch
Hi, I am setting up Python 2.7 after an upgrade to Ubuntu 16.04, a 
thorough one, leaving no survivors. Everything is fine, IDLE opens, 
ready to go. Alas, execfile and import commands don't do my bidding, but 
hang IDLE. All I can do is kill the process named "python" from a bash 
terminal. IDLE then is still open, says "=== RESTART: Shell ===" and is 
again ready for action. It works interactively, but no imports . . . 
What could be the problem?


Thanks for ideas

Frederic


--
https://mail.python.org/mailman/listinfo/python-list


Re: Redirecting input of IDLE window

2017-08-15 Thread Friedrich Rentsch



On 08/14/2017 10:47 AM, Friedrich Rentsch wrote:

Hi,

I work interactively in an IDLE window most of the time and find 
"help (...)" very useful to summarize things. The display comes up 
directly (doesn't return a text, which I could edit, assign or store). 
I suspect that there are ways to redirect the display, say to a file. 
Thanks for suggestions.



Frederic




Peter Otten's "mypager" works well. All suggestions provide a welcome 
opportunity to learn more about the inner workings. Thank you all for 
your responses.
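
For the record, the text that help() pages to the screen can also be
captured without any pager, via pydoc (a sketch):

    import pydoc

    # render_doc() returns the text help(str) would display;
    # plain() strips the overstrike-style bolding.
    text = pydoc.plain(pydoc.render_doc(str))
    with open('help_str.txt', 'w') as f:
        f.write(text)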


Frederic



--
https://mail.python.org/mailman/listinfo/python-list


Redirecting input of IDLE window

2017-08-14 Thread Friedrich Rentsch

Hi,

I work interactively in an IDLE window most of the time and find 
"help (...)" very useful to summarize things. The display comes up 
directly (doesn't return a text, which I could edit, assign or store). I 
suspect that there are ways to redirect the display, say to a file. 
Thanks for suggestions.



Frederic


--
https://mail.python.org/mailman/listinfo/python-list


Re: Finding the name of an object's source file

2017-06-06 Thread Friedrich Rentsch



On 06/06/2017 03:52 PM, Matt Wheeler wrote:

On Tue, 6 Jun 2017 at 11:20 Peter Otten <__pete...@web.de> wrote:


>>> import os
>>> inspect.getsourcefile(os.path.split)
'/usr/lib/python3.4/posixpath.py'

And so much more fun than scanning the documentation :)


Alternatively, without using inspect, we can get around `Object.__module__`
being a string by importing it as a string:


>>> import importlib, os
>>> importlib.import_module(os.path.split.__module__).__file__
'/Users/matt/.pyenv/versions/3.6.0/lib/python3.6/posixpath.py'



Stupendous! Thanks both of you. I tried Peter's inspect-based method
on a hierarchical assembly of objects:


>>> def sources (S, indent = 0):
        print indent * '\t' + '%-60s%s' % (S.__class__, inspect.getsourcefile (S.__class__))
        if isinstance (S, WS._Association):
            for s in S:
                sources (s, indent + 1)


>>> sources (M[1][1])
 /home/fr/python/util/workshops.py
 /home/fr/python/util/workshops.py
 
/home/fr/python/finance/position.py
positions_initializer.Positions_Initializer 
/home/fr/python/finance/positions_initializer.py

position.Position_Activity /home/fr/python/finance/position.py
position.normalizer /home/fr/python/finance/position.py
position.split_adjuster /home/fr/python/finance/position.py
position.Journalizer /home/fr/python/finance/position.py
current_work.report_unmerger /home/fr/temp/current_work.py
 /home/fr/python/finance/position.py
position.report_name_sender /home/fr/python/finance/position.py
position.Summary_Report /home/fr/python/finance/position.py
workshops.File_Writer /home/fr/python/util/workshops.py


Wonderful! Awesome! Matt's solution works well too.

Thanks a million

Frederic

--
https://mail.python.org/mailman/listinfo/python-list


Finding the name of an object's source file

2017-06-06 Thread Friedrich Rentsch

Hi all,

Developing a project, I have portions that work and should be assembled 
into a final program. Some parts don't interconnect when they should, 
because of my lack of rigor in managing versions. So in order to get on, 
I should next tidy up the mess and the way I can think of to do it is to 
read out the source file names of all of a working component's elements, 
then delete unused files and consolidate redundancy.


So, the task would be to find source file names. inspect.getsource () 
knows which file to take the source from, but as far as I can tell, none 
of its methods reveals that name, if called on a command line (>>> 
inspect.(get source file name) ()). Object.__module__ works. 
Module.__file__ works, but Object.__module__.__file__ doesn't, because 
Object.__module__ is only a string.


After one hour of googling, I believe inspect() is used mainly at 
runtime (introspection?) for tracing purposes. An alternative to 
inspect() has not come up. I guess I could grep inspect.(getsource ()), 
but that doesn't feel right. There's probably a simpler way. Any 
suggestions?


Frederic


--
https://mail.python.org/mailman/listinfo/python-list


Re: use regex to search the page one time to get two types of Information

2016-08-19 Thread Friedrich Rentsch

On 08/19/2016 09:02 AM, iMath wrote:

I need to use regex to search two types of Information within a web page,
while it seems searching the page two times rather than one is much time
consuming, is it possible to search the page one time to get two or more
types of Information?


>>> r = re.compile ('page|Information|time')
>>> r.findall ( (your post) )
['Information', 'page', 'page', 'time', 'time', 'page', 'time', 'Information']


Does that look right?
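
Seriously though, one pass with one compiled alternation is the way to
do it, and named groups tell the kinds of information apart as you go.
A sketch with made-up token types:

    import re

    pattern = re.compile(r'(?P<number>\d+)|(?P<word>[A-Za-z]+)')
    for m in pattern.finditer('page 1, Information 2, time 3'):
        # m.lastgroup names the alternative that matched
        print m.lastgroup, m.group()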

Frederic


--
https://mail.python.org/mailman/listinfo/python-list


Re: Scraping email to make invoice

2016-04-24 Thread Friedrich Rentsch



On 04/24/2016 08:58 PM, CM wrote:

I would like to write a Pythons script to automate a tedious process and could 
use some advice.

The source content will be an email that has 5-10 PO (purchase order) numbers 
and information for freelance work done. The target content will be an invoice. 
(There will be an email like this every week).

Right now, the "recommended" way to go (from the company) from source to target 
is manually copying and pasting all the tedious details of the work done into the 
invoice. But this is laborious, error-prone...and just begging for automation. There is 
no human judgment necessary whatsoever in this.

I'm comfortable with "scraping" a text file and have written scripts for this, 
but could use some pointers on other parts of this operation.

1. INPUT: What's the best way to scrape an email like this? The email is to a 
Gmail account, and the content shows up in the email as a series of basically 
6x7 tables (HTML?), one table per PO number/task. I know if the freelancer were 
to copy and paste the whole set of tables into a text file and save it as plain 
text, Python could easily scrape that file, but I'd much prefer to save the 
user those steps. Is there a relatively easy way to go from the Gmail email to 
generating the invoice directly? (I know there is, but wasn't sure what is 
state of the art these days).

2. OUTPUT: The invoice will have boilerplate content on top and then an Excel 
table at bottom that is mostly the same information from the source content. 
Ideally, so that the invoice looks good, the invoice should be a Word document. 
For the first pass at this, it looked best by laying out the entire invoice in 
Excel and then copy and pasting it into a Word doc as an image (since otherwise 
the columns ran over for some reason). In any case, the goal is to create a 
single page invoice that looks like a clean, professional looking invoice.

3. UI: I am comfortable with making GUI apps, so could use this as the interface for the (somewhat 
computer-uncomfortable) user. But the less user actions necessary, the better. The emails always come from 
the same sender, and always have the same boilerplate language ("Below please find your Purchase Order 
(PO)"), so I'm envisioning a small GUI window with a single button that says "MAKE NEWEST 
INVOICE" and the user presses it and it automatically searches the user's email for PO # emails and 
creates the newest invoice. I'm guessing I could keep a sqlite database or flat file on the computer to just 
track what is meant by "newest", and then the output would have the date created in the file, so 
the user can be sure what has been invoiced.

I'm hoping I can write this in a couple of days.

Any suggestions welcome! Thanks.


INPUT: What's the best way to scrape an email like this?  --  Like what? You 
need to explain what exactly your input is or show an example.

Frederic




--
https://mail.python.org/mailman/listinfo/python-list


Re: Review Request of Python Code

2016-03-09 Thread Friedrich Rentsch



On 03/09/2016 05:18 AM, subhabangal...@gmail.com wrote:

Dear Group,

I am trying to write code for pulling data from MySQL at the backend,
annotating words, and putting the results out as separate sentences, one per
line. The code generally runs fine, but I feel the sentence output at the end
could be better, and while it is okay for small data sets, with 50,000 news
articles it performs dead slow. I am using Python 2.7.11 on Windows 7 with
8GB RAM.

I am trying to copy the code here, for your kind review.

import MySQLdb
import nltk

def sql_connect_NewTest1():
    db = MySQLdb.connect(host="localhost",
                         user="*",
                         passwd="*",
                         db="abcd_efgh")
    cur = db.cursor()
    #cur.execute("SELECT * FROM newsinput limit 0,5;") #REPORTING RUNTIME ERROR
    cur.execute("SELECT * FROM newsinput limit 0,50;")
    dict_open = open("/python27/NewTotalTag.txt", "r") #OPENING THE DICTIONARY FILE
    dict_read = dict_open.read()
    dict_word = dict_read.split()
    a4 = dict_word #Assignment for code.
    list1 = []
    flist1 = []
    nlist = []
    for row in cur.fetchall():
        #print row[2]
        var1 = row[3]
        #print var1 #Printing lines
        #var2=len(var1) # Length of file
        var3 = var1.split(".") #SPLITTING INTO LINES
        #print var3 #Printing The Lines
        #list1.append(var1)
        var4 = len(var3) #Number of all lines
        #print "No",var4
        for line in var3:
            #print line
            #flist1.append(line)
            linew = line.split()
            for word in linew:
                if word in a4:
                    windex = a4.index(word)
                    windex1 = windex + 1
                    word1 = a4[windex1]
                    word2 = word + "/" + word1
                    nlist.append(word2)
                    #print list1
                    #print nlist
                elif word not in a4:
                    word3 = word + "/" + "NA"
                    nlist.append(word3)
                    #print list1
                    #print nlist
                else:
                    print "None"

    #print "###",flist1
    #print len(flist1)
    #db.close()
    #print nlist
    lol = lambda lst, sz: [lst[i:i+sz] for i in range(0, len(lst), sz)] #TRYING TO SPLIT THE RESULTS AS SENTENCES
    nlist1 = lol(nlist, 7)
    #print nlist1
    for i in nlist1:
        string1 = " ".join(i)
        print i
        #print string1
 

Thanks in Advance.
 
 


I have a modular processing framework in its final stages of completion
whose purpose is to save (a lot of) time coding the kind of problem you
describe. I intend to upload the system and am currently interested in
real-world cases for the manual. I tried coding your problem, thinking
it would take no more than a minute. It wasn't that easy, because you
don't say what input you have, nor what you expect your program to do.
Inferring the missing info from your code takes more time than I can
spare. So, if you would give a few lines of your input and explain your
purpose, I'd be happy to help.
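
One bottleneck is visible without the input, though: every word triggers
up to two linear scans of the whole tag list ('word in a4', then
a4.index(word)). Building a dict once from the word/tag pairs makes each
lookup constant-time. A sketch, assuming NewTotalTag.txt alternates word
and tag, which is what a4.index(word) + 1 implies:

    tokens = dict_read.split()
    tag_of = dict(zip(tokens[0::2], tokens[1::2]))

    for line in var3:
        for word in line.split():
            nlist.append(word + "/" + tag_of.get(word, "NA"))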


Frederic

--
https://mail.python.org/mailman/listinfo/python-list


Re: Finding Blank Columns in CSV

2015-10-05 Thread Friedrich Rentsch



On 10/05/2015 03:29 PM, Jaydip Chakrabarty wrote:

Hello,

I have a csv file like this.

Name,Surname,Age,Sex
abc,def,,M
,ghi,,F
jkl,mno,,
pqr,,,F

I want to find out the blank columns, that is, fields where all the
values are blank. Here is my python code.

fn = "tmp1.csv"
fin = open(fn, 'rb')
rdr = csv.DictReader(fin, delimiter=',')
data = list(rdr)
flds = rdr.fieldnames
fin.close()
mt = []
flag = 0
for i in range(len(flds)):
 for row in data:
 if len(row[flds[i]]):
 flag = 0
 break
 else:
 flag = 1
 if flag:
 mt.append(flds[i])
 flag = 0
print mt

I need to know if there is better way to code this.

Thanks.

Operations on columns are often simpler, if a table is rotated 
beforehand. Columns become lists.


def find_empty_columns (table):
number_of_records = len (table)
rotated_table = zip (*table)
indices_of_empty_columns = []
for i in range (len (rotated_table)):  # Column indices
if rotated_table[i].count ('') == number_of_records:
indices_of_empty_columns.append (i)
return indices_of_empty_columns
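
For the DictReader rows of the original post, the same idea reads even
shorter (a sketch):

    # A field is blank when no row has a non-empty value for it.
    empty_fields = [f for f in flds if not any(row[f] for row in data)]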

Frederic
--
https://mail.python.org/mailman/listinfo/python-list


Re: Reading \n unescaped from a file

2015-09-06 Thread Friedrich Rentsch



On 09/06/2015 09:51 AM, Peter Otten wrote:

Friedrich Rentsch wrote:


My response was meant for the list, but went to Peter by mistake. So I
repeat it with some delay:

On 09/03/2015 04:24 PM, Peter Otten wrote:

Friedrich Rentsch wrote:


On 09/03/2015 11:24 AM, Peter Otten wrote:

Friedrich Rentsch wrote:

I appreciate your identifying two mistakes. I am curious to know what
they are.

Sorry for not being explicit.


substitutes = [self.table [item] for item in hits if item in valid_hits] + []  # Make lengths equal for zip to work right

That looks wrong...

You are adding an empty list here. I wondered what you were trying to
achieve with that.

Right you are! It doesn't do anything. I remember my idea was to pad the
substitutes list by one, because the list of intervening text segments
is longer by one element and zip uses the least common length,
discarding all overhang. The remedy was totally ineffective and, what's
more, not needed, judging by the way the editor performs as expected.

That's because you are getting the same effect later by adding

nohits[-1]

You could avoid that by replacing [] with [""].


substitutes = list("12")
nohits = list("abc")
zipped = zip(nohits, substitutes)
"".join(list(reduce(lambda a, b: a+b, [zipped][0]))) + nohits[-1]

'a1b2c'

zipped = zip(nohits, substitutes + [""])
"".join(list(reduce(lambda a, b: a+b, [zipped][0])))

'a1b2c'

By the way, even those who are into functional programming might find

>>> "".join(map("".join, zipped))
'a1b2c'

more readable.

But there's a more general change that I suggest: instead of processing the
string twice, first to search for matches, then for the surrounding text,
you could achieve the same in one pass with a cool feature of the re.sub()
method -- it accepts a function:

>>> def replace(text, replacements):
...     table = dict(replacements)
...     def substitute(match):
...         return table[match.group()]
...     regex = "|".join(re.escape(find) for find, replace in replacements)
...     return re.compile(regex).sub(substitute, text)
...
>>> replace("1 foo 2 bar 1 baz", [("1", "one"), ("2", "two")])
'one foo two bar one baz'




I didn't think of using sub. But you're right, it is better, likely
faster too. Building the regex reverse-sorted will make it handle
overlapping targets correctly, e.g.:


r = (
    ("1", "one"),
    ("2", "two"),
    ("12", "twelve"),
)

Your function as posted:

>>> replace ('1 foo 2 bar 12 baz', r)
'one foo two bar onetwo baz'

With

regex = "|".join(re.escape(find) for find, replace in reversed (sorted (replacements)))

>>> replace ('1 foo 2 bar 12 baz', r)
'one foo two bar twelve baz'
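
Folding that longest-first ordering into the sub()-based function gives
(a sketch):

    import re

    def replace(text, replacements):
        table = dict(replacements)
        # Longest targets first, so '12' wins over '1' and '2'.
        regex = '|'.join(sorted((re.escape(t) for t in table),
                                key=len, reverse=True))
        return re.sub(regex, lambda m: table[m.group()], text)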

Thanks for the hints

Frederic

--
https://mail.python.org/mailman/listinfo/python-list


Re: Reading \n unescaped from a file

2015-09-04 Thread Friedrich Rentsch
My response was meant for the list, but went to Peter by mistake. So I 
repeat it with some delay:


On 09/03/2015 04:24 PM, Peter Otten wrote:

Friedrich Rentsch wrote:


On 09/03/2015 11:24 AM, Peter Otten wrote:

Friedrich Rentsch wrote:

I appreciate your identifying two mistakes. I am curious to know what
they are.

Sorry for not being explicit.


   substitutes = [self.table [item] for item in hits if item in valid_hits] + []  # Make lengths equal for zip to work right

That looks wrong...

You are adding an empty list here. I wondered what you were trying to
achieve with that.
Right you are! It doesn't do anything. I remember my idea was to pad the 
substitutes list by one, because the list of intervening text segments 
is longer by one element and zip uses the least common length, 
discarding all overhang. The remedy was totally ineffective and, what's 
more, not needed, judging by the way the editor performs as expected.

   output = input

...and so does this.

That seems to be the only occurrence of the name "input" in your code. Did
you mean "text" or do you really want to return the built-in?

Right you are again! I did mean text. I changed a few names to make them 
more suggestive, and apparently missed this one.


Frederic
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Reading \n unescaped from a file

2015-09-03 Thread Friedrich Rentsch



On 09/03/2015 11:24 AM, Peter Otten wrote:

Friedrich Rentsch wrote:



On 09/02/2015 04:03 AM, Rob Hills wrote:

Hi,

I am developing code (Python 3.4) that transforms text data from one
format to another.

As part of the process, I had a set of hard-coded str.replace(...)
functions that I used to clean up the incoming text into the desired
output format, something like this:

  dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds
  dataIn = dataIn.replace('','<') # Tidy up < character
  dataIn = dataIn.replace('','>') # Tidy up > character
  dataIn = dataIn.replace('','o') # No idea why but lots of these: convert to 'o' character
  dataIn = dataIn.replace('','f') # .. and these: convert to 'f' character
  dataIn = dataIn.replace('','e') # ..  'e'
  dataIn = dataIn.replace('','O') # ..  'O'

These statements transform my data correctly, but the list of statements
grows as I test the data so I thought it made sense to store the
replacement mappings in a file, read them into a dict and loop through
that to do the cleaning up, like this:

  with open(fileName, 'r+t', encoding='utf-8') as mapFile:
      for line in mapFile:
          line = line.strip()
          try:
              if (line) and not line.startswith('#'):
                  line = line.split('#')[:1][0].strip() # trim any trailing comments
                  name, value = line.split('=')
                  name = name.strip()
                  self.filterMap[name]=value.strip()
          except:
              self.logger.error('exception occurred parsing line [{0}] in file [{1}]'.format(line, fileName))
              raise

Elsewhere, I use the following code to do the actual cleaning up:

  def filter(self, dataIn):
      if dataIn:
          for token, replacement in self.filterMap.items():
              dataIn = dataIn.replace(token, replacement)
      return dataIn


My mapping file contents look like this:

\r = \\n
“ = 
 = <
 = >
 = 
 = F
 = o
 = f
 = e
 = O

This all works "as advertised" */except/* for the '\r' => '\\n'
replacement. Debugging the code, I see that my '\r' character is
"escaped" to '\\r' and the '\\n' to 'n' when they are read in from
the file.

I've been googling hard and reading the Python docs, trying to get my
head around character encoding, but I just can't figure out how to get
these bits of code to do what I want.

It seems to me that I need to either:

* change the way I represent '\r' and '\\n' in my mapping file; or
* transform them somehow when I read them in

However, I haven't figured out how to do either of these.

TIA,



I have had this problem too and can propose a solution ready to run out
of my toolbox:


class editor:

  def compile (self, replacements):
      targets, substitutes = zip (*replacements)
      re_targets = [re.escape (item) for item in targets]
      re_targets.sort (reverse = True)
      self.targets_set = set (targets)
      self.table = dict (replacements)
      regex_string = '|'.join (re_targets)
      self.regex = re.compile (regex_string, re.DOTALL)

  def edit (self, text, eat = False):
      hits = self.regex.findall (text)
      nohits = self.regex.split (text)
      valid_hits = set (hits) & self.targets_set  # Ignore targets with illegal re modifiers.

Can you give an example of an ignored target? I don't see the light...


      if valid_hits:
          substitutes = [self.table [item] for item in hits if item in valid_hits] + []  # Make lengths equal for zip to work right

That looks wrong...


          if eat:
              output = ''.join (substitutes)
          else:
              zipped = zip (nohits, substitutes)
              output = ''.join (list (reduce (lambda a, b: a + b, [zipped][0]))) + nohits [-1]
      else:
          if eat:
              output = ''
          else:
              output = input

...and so does this.


      return output

  >>> substitutions = (
  ('\r', '\n'),
  ('', '<'),
  ('', '>'),
  ('', 'o'),
  ('', 'f'),
  ('', 'e'),
  ('', 'O'),
  )

Order doesn't matter. Add new ones at the end.

  >>> e = editor ()
  >>> e.compile (substitutions)

A simple way of testing is running the substitutions through the editor

  >>> print e.edit (repr (substitutions))
(('\r', '\n'), ('<', '<'), ('>', '>'), ('o', 'o'), ('f', 'f'), ('e', 'e'), ('O', 'O'))

The escapes need to be tested separately

  >>> print e.edit ('abc\rdef')
abc
def

Note: This editor's compiler compiles the substitution list to a regular
expression which the editor uses to find all matches in the text passed
to edit. There has got to be a limit to the size of a text which

Re: Reading \n unescaped from a file

2015-09-03 Thread Friedrich Rentsch



On 09/03/2015 06:12 PM, Rob Hills wrote:

Hi Friedrich,

On 03/09/15 16:40, Friedrich Rentsch wrote:

On 09/02/2015 04:03 AM, Rob Hills wrote:

Hi,

I am developing code (Python 3.4) that transforms text data from one
format to another.

As part of the process, I had a set of hard-coded str.replace(...)
functions that I used to clean up the incoming text into the desired
output format, something like this:

    dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds
    dataIn = dataIn.replace('','<') # Tidy up < character
    dataIn = dataIn.replace('','>') # Tidy up > character
    dataIn = dataIn.replace('','o') # No idea why but lots of these: convert to 'o' character
    dataIn = dataIn.replace('','f') # .. and these: convert to 'f' character
    dataIn = dataIn.replace('','e') # ..  'e'
    dataIn = dataIn.replace('','O') # ..  'O'

These statements transform my data correctly, but the list of statements
grows as I test the data so I thought it made sense to store the
replacement mappings in a file, read them into a dict and loop through
that to do the cleaning up, like this:

    with open(fileName, 'r+t', encoding='utf-8') as mapFile:
        for line in mapFile:
            line = line.strip()
            try:
                if (line) and not line.startswith('#'):
                    line = line.split('#')[:1][0].strip() # trim any trailing comments
                    name, value = line.split('=')
                    name = name.strip()
                    self.filterMap[name]=value.strip()
            except:
                self.logger.error('exception occurred parsing line [{0}] in file [{1}]'.format(line, fileName))
                raise

Elsewhere, I use the following code to do the actual cleaning up:

    def filter(self, dataIn):
        if dataIn:
            for token, replacement in self.filterMap.items():
                dataIn = dataIn.replace(token, replacement)
        return dataIn


My mapping file contents look like this:

\r = \\n
“ = 
 = <
 = >
 = 
 = F
 = o
 = f
 = e
 = O

This all works "as advertised" *except* for the '\r' => '\\n'
replacement. Debugging the code, I see that my '\r' character is
"escaped" to '\\r' and the '\\n' to 'n' when they are read in from
the file.

I've been googling hard and reading the Python docs, trying to get my
head around character encoding, but I just can't figure out how to get
these bits of code to do what I want.

It seems to me that I need to either:

* change the way I represent '\r' and '\\n' in my mapping file; or
* transform them somehow when I read them in

However, I haven't figured out how to do either of these.

TIA,



I have had this problem too and can propose a solution ready to run
out of my toolbox:


import re

class editor:

    def compile (self, replacements):
        targets, substitutes = zip (*replacements)
        re_targets = [re.escape (item) for item in targets]
        re_targets.sort (reverse = True)
        self.targets_set = set (targets)
        self.table = dict (replacements)
        regex_string = '|'.join (re_targets)
        self.regex = re.compile (regex_string, re.DOTALL)

    def edit (self, text, eat = False):
        hits = self.regex.findall (text)
        nohits = self.regex.split (text)
        valid_hits = set (hits) & self.targets_set  # Ignore targets with illegal re modifiers.
        if valid_hits:
            substitutes = [self.table [item] for item in hits if item in valid_hits] + []  # Make lengths equal for zip to work right
            if eat:
                output = ''.join (substitutes)
            else:
                zipped = zip (nohits, substitutes)
                output = ''.join (list (reduce (lambda a, b: a + b, [zipped][0]))) + nohits [-1]
        else:
            if eat:
                output = ''
            else:
                output = input
        return output


substitutions = (
    ('\r', '\n'),
    ('', '<'),
    ('', '>'),
    ('', 'o'),
    ('', 'f'),
    ('', 'e'),
    ('', 'O'),
    )

Order doesn't matter. Add new ones at the end.


e = editor ()
e.compile (substitutions)

A simple way of testing is running the substitutions through the editor


print e.edit (repr (substitutions))

(('\r', '\n'), ('<', '<'), ('>', '>'), ('o', 'o'), ('f', 'f'), ('e',
'e'), ('O', 'O'))

The escapes need to be tested separately


print e.edit ('abc\rdef')

abc
def

Note: This editor's compiler compiles the substitution list to a
regular expression which the editor uses to find all matches in the
text passed to edit. There has got to be a limit to the size of a text
which a regular expression can handle. I don't know what this limit
is. To be on the safe side, edit a large text line by line or at least
in sensible chunks.
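
The same single-pass idea is also available straight from the standard
library: re.sub accepts a function as the replacement argument, which
sidesteps the zip/reduce interleaving used above. A minimal sketch of
that variant (just the core technique, not the SE-style editor):

    import re

    def make_editor (replacements):
        table = dict (replacements)
        # Longer targets first, so they win over shorter prefixes
        keys = sorted (table, key = len, reverse = True)
        regex = re.compile ('|'.join (re.escape (k) for k in keys), re.DOTALL)
        def edit (text):
            return regex.sub (lambda match: table [match.group (0)], text)
        return edit

    edit = make_editor ([('\r', '\n'), ('<', '&lt;')])
    print (edit ('abc\rdef<ghi'))   # abc, a real newline, then def&lt;ghi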

Frederic


Thanks for the suggestion.  I had ori

Re: Reading \n unescaped from a file

2015-09-03 Thread Friedrich Rentsch



On 09/02/2015 04:03 AM, Rob Hills wrote:

Hi,

I am developing code (Python 3.4) that transforms text data from one
format to another.

As part of the process, I had a set of hard-coded str.replace(...)
functions that I used to clean up the incoming text into the desired
output format, something like this:

    dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds
    dataIn = dataIn.replace('','<') # Tidy up < character
    dataIn = dataIn.replace('','>') # Tidy up > character
    dataIn = dataIn.replace('','o') # No idea why but lots of these: convert to 'o' character
    dataIn = dataIn.replace('','f') # .. and these: convert to 'f' character
    dataIn = dataIn.replace('','e') # ..  'e'
    dataIn = dataIn.replace('','O') # ..  'O'

These statements transform my data correctly, but the list of statements
grows as I test the data so I thought it made sense to store the
replacement mappings in a file, read them into a dict and loop through
that to do the cleaning up, like this:

    with open(fileName, 'r+t', encoding='utf-8') as mapFile:
        for line in mapFile:
            line = line.strip()
            try:
                if (line) and not line.startswith('#'):
                    line = line.split('#')[:1][0].strip() # trim any trailing comments
                    name, value = line.split('=')
                    name = name.strip()
                    self.filterMap[name]=value.strip()
            except:
                self.logger.error('exception occurred parsing line [{0}] in file [{1}]'.format(line, fileName))
                raise

Elsewhere, I use the following code to do the actual cleaning up:

    def filter(self, dataIn):
        if dataIn:
            for token, replacement in self.filterMap.items():
                dataIn = dataIn.replace(token, replacement)
        return dataIn


My mapping file contents look like this:

\r = \\n
“ = 
 = <
 = >
 = 
 = F
 = o
 = f
 = e
 = O

This all works "as advertised" *except* for the '\r' => '\\n'
replacement. Debugging the code, I see that my '\r' character is
"escaped" to '\\r' and the '\\n' to 'n' when they are read in from
the file.

I've been googling hard and reading the Python docs, trying to get my
head around character encoding, but I just can't figure out how to get
these bits of code to do what I want.

It seems to me that I need to either:

   * change the way I represent '\r' and '\\n' in my mapping file; or
   * transform them somehow when I read them in

However, I haven't figured out how to do either of these.

TIA,




I have had this problem too and can propose a solution ready to run out 
of my toolbox:



import re

class editor:

    def compile (self, replacements):
        targets, substitutes = zip (*replacements)
        re_targets = [re.escape (item) for item in targets]
        re_targets.sort (reverse = True)
        self.targets_set = set (targets)
        self.table = dict (replacements)
        regex_string = '|'.join (re_targets)
        self.regex = re.compile (regex_string, re.DOTALL)

    def edit (self, text, eat = False):
        hits = self.regex.findall (text)
        nohits = self.regex.split (text)
        valid_hits = set (hits) & self.targets_set  # Ignore targets with illegal re modifiers.
        if valid_hits:
            substitutes = [self.table [item] for item in hits if item in valid_hits] + []  # Make lengths equal for zip to work right
            if eat:
                output = ''.join (substitutes)
            else:
                zipped = zip (nohits, substitutes)
                output = ''.join (list (reduce (lambda a, b: a + b, [zipped][0]))) + nohits [-1]
        else:
            if eat:
                output = ''
            else:
                output = input
        return output

>>> substitutions = (
('\r', '\n'),
('', '<'),
('', '>'),
('', 'o'),
('', 'f'),
('', 'e'),
('', 'O'),
)

Order doesn't matter. Add new ones at the end.

>>> e = editor ()
>>> e.compile (substitutions)

A simple way of testing is running the substitutions through the editor

>>> print e.edit (repr (substitutions))
(('\r', '\n'), ('<', '<'), ('>', '>'), ('o', 'o'), ('f', 'f'), ('e', 
'e'), ('O', 'O'))


The escapes need to be tested separately

>>> print e.edit ('abc\rdef')
abc
def

Note: This editor's compiler compiles the substitution list to a regular 
expression which the editor uses to find all matches in the text passed 
to edit. There has got to be a limit to the size of a text which a 
regular expression can handle. I don't know what this limit is. To be on 
the safe side, edit a large text line by line or at least in sensible 
chunks.


Frederic

--
https://mail.python.org/mailman/listinfo/python-list


Re: How to model government organization hierarchies so that the list can expand and compress

2015-08-14 Thread Friedrich Rentsch



On 08/13/2015 09:10 PM, Alex Glaros wrote:

It's like the desktop folder/directory model where you can create unlimited 
folders and put folders within other folders. Instead of folders, I want to use 
government organizations.

Example: Let user create agency names: Air Force, Marines, Navy, Army. Then let them 
create an umbrella collection called Pentagon, and let users drag Air Force, 
Marines, Navy, etc. into the umbrella collection.

User may wish to add smaller sub-sets of Army, such as Army Jeep Repair 
Services

User may also want to add a new collection Office of the President and put 
OMB and Pentagon under that as equals.

What would the data model look like for this?  If I have a field: 
next_higher_level_parent that lets children records keep track of parent 
record, it's hard for me to imagine anything but an inefficient bubble sort to 
produce a hierarchical organizational list. Am using Postgres, not graph 
database.

I'm hoping someone else has worked on this problem, probably not with 
government agency names, but perhaps the same principle with other objects.

Thanks!

Alex Glaros


After struggling for years with a tree-like estate management system
(owner at the top; next level: real estate, banks, art collection, etc.;
third level: real estate units, bank accounts, etc.; fourth level:
investment positions, currency accounts, etc.), it recently occurred to
me that I had such a system all along: the file system. The last folder
at the bottom end of each branch names its contents (AAPL or USD or
Lamborghini, etc.); the contents are a csv file recording an in-and-out,
revenue and expense history (date, quantity, paid or received, memo,
. . .). Any documentation on the respective value item may also be
stored in the same folder, easy to find without requiring cross
referencing.


Managing the data is not as awkward as one might fear. A bash wizard
could probably do it quite efficiently with bash scripts. Bash dummies,
like me, are more comfortable with python. Moving, say, a portfolio from
one bank to another is a matter of mv i/banks/abc/account-123
i/banks/xyz (system call). With the tabular data base system (MySQL) I
have, simple operations like this one are quite awkward.
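
A minimal sketch of the idea, with made-up agency names and a throwaway
root directory (none of this is the actual management system):

    import os, shutil

    root = '/tmp/gov'
    # Folders are organizations; nesting is the hierarchy
    os.makedirs (os.path.join (root, 'Pentagon', 'Army', 'Jeep Repair Services'))
    os.makedirs (os.path.join (root, 'Pentagon', 'Navy'))
    os.makedirs (os.path.join (root, 'OMB'))

    # Reorganize: put Pentagon and OMB under Office of the President
    office = os.path.join (root, 'Office of the President')
    os.makedirs (office)
    shutil.move (os.path.join (root, 'Pentagon'), office)
    shutil.move (os.path.join (root, 'OMB'), office)

    # A hierarchical listing falls out of os.walk, no sorting needed
    for dirpath, dirnames, filenames in os.walk (root):
        depth = dirpath [len (root):].count (os.sep)
        print ('    ' * depth + os.path.basename (dirpath))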


Well, you might laugh. Or others might. If your task is a commercial 
order, then this approach will hardly do. Anyway, I thought I'd toss it 
up. If it won't help it won't hurt.
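
And for completeness: even with the parent-pointer field Alex describes,
no bubble sort is needed. Group the rows by parent once, then recurse; a
sketch with made-up rows (not from the original posting):

    rows = [(1, 'Office of the President', None),
            (2, 'Pentagon', 1),
            (3, 'OMB', 1),
            (4, 'Army', 2),
            (5, 'Jeep Repair Services', 4)]

    children = {}
    for id, name, parent in rows:
        children.setdefault (parent, []).append ((id, name))

    def show (parent = None, depth = 0):
        for id, name in children.get (parent, []):
            print ('    ' * depth + name)
            show (id, depth + 1)

    show ()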


Frederic







--
https://mail.python.org/mailman/listinfo/python-list


Re: Who uses IDLE -- please answer if you ever do, know, or teach

2015-08-07 Thread Friedrich Rentsch



On 08/06/2015 03:21 AM, Rustom Mody wrote:

On Thursday, August 6, 2015 at 6:36:56 AM UTC+5:30, Terry Reedy wrote:

There have been discussions, such as today on Idle-sig , about who uses
Idle and who we should design it for.  If you use Idle in any way, or
know of or teach classes using Idle, please answer as many of the
questions below as you are willing, and as are appropriate

Private answers are welcome. They will be deleted as soon as they are
tallied (without names).

I realized that this list is a biased sample of the universe of people
who have studied Python at least, say, a month.  But biased data should
be better than my current vague impressions.

0. Classes where Idle is used:
Where?
Level?

Idle users:

1. Are you
grade school (1-12)?
undergraduate (Freshman-Senior)?
post-graduate (from whatever)?

2. Are you
beginner (1st class, maybe 2nd depending on intensity of first)?
post-beginner?

3. With respect to programming, are you
amateur (unpaid)
professional (paid for programming)

--
Terry Jan Reedy, Idle maintainer

I used idle to teach a 2nd year engineering course last sem
It was a more pleasant experience than I expected
One feature that would help teachers:
It would be nice to (have setting to) auto-save the interaction window
[Yeah I tried to see if I could do it by hand but could not find where]
Useful for giving as handouts of the class
So students rest easy and don't need to take 'literal' notes of the session

I will now be teaching more advanced students and switching back to emacs
-- python, C, and others -- so really there is no alternative to emacs.
Not ideal at all, but nothing else is remotely comparable


I've been using Idle full time to simultaneously manage my financial 
holdings, develop the management system and manually fix errors. While 
the ultimate goal is a push-button system, I have not reached that stage 
and am compelled to work trial-and-error style. For this way of working 
I found Idle well-suited, since the majority of jobs I do are hacks and 
quick fixes, not production runs that must run reliably.


I recently came up with a data transformation framework that greatly 
expedites interactive development. It is based on transformer objects 
that wrap a transformation function. The base class Transformer handles 
the flow of the data in a manner that allows linking the transformer 
modules together in chains. With a toolbox of often used standards, a 
great variety of transformation tasks can be accomplished by simply 
lining up a bunch of toolbox transformers in chains. Bridging a gap now 
and then is a relatively simple matter of writing a transformation 
function that converts the output format upstream of the gap to the 
required input format downstream of the gap.
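
The TYX code itself is not reproduced here, but the chaining behaviour
just described can be sketched in a few lines. This is a guess at the
shape of such a base class, not the actual TYX.Transformer:

    class Transformer:

        def __init__ (self, **params):
            self.params = dict (params)    # declared parameters
            self.input = None              # transformers retain their input

        def set (self, **params):
            self.params.update (params)

        def get (self, name):
            return self.params.get (name)

        def __call__ (self, data = None):
            if data is not None:
                self.input = data          # calling with no argument re-runs the last input
            return self.transform (self.input)

        def transform (self, data):        # subclasses override this
            return data

Because each call returns the transformed data, transformers nest like
ordinary function calls, which is what makes chains such as
TAB (CSVP (FR (...))) work.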


The system works very well. It saves me a lot of time. I am currently 
writing a manual with the intention to upload it for comment and also to 
upload the system, if the comments are not too discouraging. If I may 
show a few examples below . . .


Frederic (moderately knowledgeable non-professional)

--

import TYX

FR = TYX.File_Reader ()
CSVP = TYX.CSV_Parser ()
TAB = TYX.Tabulator ()

print TAB (CSVP (FR ('Downloads/xyz.csv')))   # Calls nest
   ---
   Date,Open,Close,High,Low,Volume
   07/18/2014,34.36,34.25,34.36,34.25,485
   07/17/2014,34.55,34.50,34.55,34.47,2,415
   07/16/2014,34.65,34.63,34.68,34.52,83,477
   ---

CSVP.get ()   # display all parameters
   CSV_Parser
dialect   None
delimiter '\t'
quote ''
has_headerFalse
strip_fields  True
headers   []

CSVP.set (delimiter = ',')
TAB.set (table_format = 'pipe')
print TAB (CSVP ())   # Transformers retain their input
   |:---|:--|:--|:--|:--|:---|
   | Date   | Open  | Close | High  | Low   | Volume |
   | 07/18/2014 | 34.36 | 34.25 | 34.36 | 34.25 | 485|
   | 07/17/2014 | 34.55 | 34.50 | 34.55 | 34.47 | 2,415  |
   | 07/16/2014 | 34.65 | 34.63 | 34.68 | 34.52 | 83,477 |

class formatter (TYX.Transformer):

    def __init__ (self):
        TYX.Transformer.__init__ (self, symbol = None)  # declare parameter

    def transform (self, records):
        symbol = self.get ('symbol')
        if symbol:
            out = []
            for d, o, c, h, l, v in records [1:]:  # Clip headers
                month, day, year = d.split ('/')
                d = '%s-%s-%s' % (year, month, day)
                v = v.replace (',', '')
                out.append ((d, symbol, o, c, h, l, v))
            return out

fo = formatter ()
fo.set (symbol = 'XYZ')
TAB.set (float_format = 'f')
print TAB (fo (CSVP()))   # Transformers also retain their output
|:---|:|--:|--:|--:|--:|--:|
   | 2014-07-18 | XYZ | 34.36 

Re: Extract email address from Java script in html source using python

2015-05-24 Thread Friedrich Rentsch



On 05/23/2015 04:15 PM, savitha devi wrote:

What I exactly want: the java script is in the html code. I am trying for
a regular expression to find the email address embedded within the java
script.

On Sat, May 23, 2015 at 2:31 PM, Chris Angelico ros...@gmail.com wrote:


On Sat, May 23, 2015 at 4:46 PM, savitha devi savith...@gmail.com wrote:

I am developing a web scraper code using HTMLParser. I need to extract
text/email addresses from java script within the HTML code. I am beginner
level in python coding and totally lost here. Need some help on this. The
java script code is as below:

script type='text/javascript'
  //!--
  document.getElementById('cloak48218').innerHTML = '';
  var prefix = '#109;a' + 'i#108;' + '#116;o';
  var path = 'hr' + 'ef' + '=';
  var addy48218 = '#105;nf#111;' + '#64;';
  addy48218 = addy48218 + 'tsv-n#101;#117;r#105;#101;d' + '#46;' +
'd#101;';
  document.getElementById('cloak48218').innerHTML += 'a ' + path + '\'' +
prefix + ':' + addy48218 + '\'' + addy48218+'\/a';
  //--

This is deliberately being done to prevent scripted usage. What
exactly are you needing to do this for?

You're basically going to have to execute the entire block of
JavaScript code, and then decode the entities to get to what you want.
Doing it manually is pretty easy; doing it automatically will
virtually require a language interpreter.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list



This is just about nuts and bolts, not about the ethics of presumed 
intentions.


Hope it helps one way or the other.

Frederic


--- 



sample = '''//!--
 document.getElementById('cloak48218').innerHTML = '';
 var prefix = '#109;a' + 'i#108;' + '#116;o';
 var path = 'hr' + 'ef' + '=';
 var addy48218 = '#105;nf#111;' + '#64;';
 addy48218 = addy48218 + 'tsv-n#101;#117;r#105;#101;d' + '#46;' +
'd#101;';
 document.getElementById('cloak48218').innerHTML += 'a ' + path + '\'' +
prefix + ':' + addy48218 + ''' + addy48218+'\/a';
 //--'''

 import SE  # Download from PyPi at https://pypi.python.org/pypi/SE

 def make_se_translator ():

    # Make SE substitutions
    subs_list = []

    # Make # code substitutions
    for i in range (256):
        subs_list.append ('#%d;=%c' % (i, chr(i)))

    # Delete Java stuff
    subs_list.append (' document.getElementById(\'cloak48218\').= ')
    subs_list.append (' var = \n= //!--= //--= ')

    # Java syntax? Tweaks needed to get the sample working
    subs_list.append (' + \'\'\'= \'\'\'=\'\' \/=/ ')

    # Add more as needed trial and error style
    # subs_list.append ( . . . format: ' old=new delete this= '

    # Make text
    subs = '\n'.join (subs_list)

    # Make SE translator
    translator = SE.SE (subs)

    # return translator, subs   # print subs, if you want to see what they look like

    return translator


 translator = make_se_translator ()

 translation = translator (sample)

 print translation   # See
 innerHTML = ''; prefix = 'ma' + 'il' + 'to'; path = 'hr' + 'ef' + '='; 
addy48218 = 'info' + '@'; addy48218 = addy48218 + 'tsv-neuried' + '.' 
+'de'; innerHTML += 'a ' + path  +prefix + ':' + addy48218 + '' + 
addy48218+'/a';


 exec (translation.lstrip ())

 print innerHTML
a href=mailto:i...@tsv-neuried.dei...@tsv-neuried.de/a
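
For reference, the numeric character references can also be decoded with
nothing but the standard library. A minimal sketch that works on the
sample as displayed above (the function name is illustrative):

    import re

    def decode_refs (text):
        # '#105;' and friends become their characters
        return re.sub (r'#(\d+);', lambda m: chr (int (m.group (1))), text)

    print (decode_refs ("'#105;nf#111;' + '#64;'"))   # 'info' + '@'

The string concatenations still have to be evaluated afterwards, which
is what the exec step above takes care of.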

--
https://mail.python.org/mailman/listinfo/python-list