Re: convert script awk in python
Michael Torrie writes: > On 3/25/21 1:14 AM, Loris Bennett wrote: >> Does any one have a better approach? > > Not as such. Running a command and parsing its output is a relatively > common task. Years ago I wrote my own simple python wrapper function > that would make it easier to run a program with arguments, and capture > its output. I ended up using that wrapper many times, which saved a lot > of time. > > When it comes to converting a bash pipeline process to Python, it's > worth considering that most of pipelines seem to involve parsing using > sed or awk (as yours do), which is way easier to do from python without > that kind of pipelining. However there is a fantastic article I read > years ago about how generators are python's equivalent to a pipe. > Anyone wanting to replace a bash script with python should read this: > > https://www.dabeaz.com/generators/Generators.pdf Thanks for the link - very instructive. > Also there's an interesting shell scripting language based on Python > called xonsh which makes it much easier to interact with processes like > bash does, but still leveraging Python to process the output. > https://xon.sh/ . That looks very interesting, too. Cheers, Loris -- This signature is currently under construction. -- https://mail.python.org/mailman/listinfo/python-list
RE: convert script awk in python
Many thanks for the link to that document. Most helpful. Peter > -Original Message- > From: Michael Torrie > Sent: Friday, March 26, 2021 8:32 PM > To: python-list@python.org > Subject: Re: convert script awk in python > > On 3/25/21 1:14 AM, Loris Bennett wrote: > > Does any one have a better approach? > > Not as such. Running a command and parsing its output is a relatively > common task. Years ago I wrote my own simple python wrapper function > that would make it easier to run a program with arguments, and capture > its output. I ended up using that wrapper many times, which saved a lot > of time. > > When it comes to converting a bash pipeline process to Python, it's > worth considering that most of pipelines seem to involve parsing using > sed or awk (as yours do), which is way easier to do from python without > that kind of pipelining. However there is a fantastic article I read > years ago about how generators are python's equivalent to a pipe. > Anyone wanting to replace a bash script with python should read this: > > https://www.dabeaz.com/generators/Generators.pdf > > Also there's an interesting shell scripting language based on Python > called xonsh which makes it much easier to interact with processes like > bash does, but still leveraging Python to process the output. > https://xon.sh/ . -- -- https://mail.python.org/mailman/listinfo/python-list
RE: convert script awk in python
Dan,

Yes, fileinput sounds like what I described and more. It does indeed seem to emulate the interface in programs like AWK, including using "-" as a placeholder for standard input. Now all you need is to have it also do the split!

∀vi ∃. Grθß

-----Original Message-----
From: Python-list On Behalf Of 2qdxy4rzwzuui...@potatochowder.com
Sent: Friday, March 26, 2021 9:43 PM
To: python-list@python.org
Subject: Re: convert script awk in python

On 2021-03-26 at 21:06:19 -0400, Avi Gross via Python-list wrote:

> A generator that opens one file at a time (or STDIN) in a consistent
> manner, would be a reasonable thing to have as part of emulating AWK.

https://docs.python.org/3/library/fileinput.html

-- https://mail.python.org/mailman/listinfo/python-list
Re: convert script awk in python
On 2021-03-26 at 21:06:19 -0400, Avi Gross via Python-list wrote: > A generator that opens one file at a time (or STDIN) in a consistent > manner, would be a reasonable thing to have as part of emulating AWK. https://docs.python.org/3/library/fileinput.html -- https://mail.python.org/mailman/listinfo/python-list
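As a rough illustration of the fileinput suggestion, the sketch below wraps `fileinput.input()` in a generator that also does the split, in the spirit of AWK's `$0`, `$1`, `$2`. The name `awk_records` and its `sep` parameter are invented for this example; only `fileinput.input` comes from the standard library.

```python
import fileinput


def awk_records(files=None, sep=None):
    """Yield (line, fields) for every line of the given files.

    With files=None, fileinput reads the paths in sys.argv[1:], or
    standard input if there are none; "-" also means standard input.
    sep=None splits on runs of whitespace, similar to AWK's default FS.
    (awk_records is a hypothetical helper, not part of any library.)
    """
    with fileinput.input(files=files) as stream:
        for raw in stream:
            line = raw.rstrip("\n")
            # fields[0] is the whole line and fields[1:] are the parts,
            # so fields[1] plays the role of AWK's $1.
            yield line, [line] + line.split(sep)
```

A caller would loop with `for line, fields in awk_records(): ...` and guard against short lines (for example `len(fields) > 2`) before indexing, since Python raises `IndexError` where AWK would quietly return an empty field.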
RE: convert script awk in python
Michael,

A generator that opens one file at a time (or STDIN) in a consistent manner would be a reasonable thing to have as part of emulating AWK. As I see it, you may want a bit more, including having it know how to parse each line it reads into some version of names that in Python might not be $1 and $2 types of names but may be an array of strings, with the complete line perhaps being in array[0], followed by each of the parts.

Clearly you would place whatever equivalent BEGIN statements in your code above the call to the generator, then have something like a for loop assigning the result of the generator to a variable, with your multiple condition/action parts in the loop. You then have the END outside the loop.

But it is far from as simple as that to emulate what AWK does, such as deciding whether you stop matching patterns once the first match is found and executed. As I noted, some AWK features do not line up with normal python, such as assuming variables not initialized are zero or "" depending on context. There may well be scoping issues and other things to consider. And clearly you need to do things by hand if you want a character string to be treated as an integer, ...

But all fairly doable, although I am not sure that translating an AWK script into python is trivial, or even a good idea.

You could do a similar concept with other utilities like sed or grep or other such filter utilities where the same generator, or a variant, might automate things. I am pretty sure some module or other has done things like this.

It is common in a language like PERL to do something like this:

    while(<>)
    {
        # get rid of the pesky newline character
        chomp;

        # read the fields in the current record into an array
        @fields = split(':', $_);

        # DO stuff
    }

The <> diamond operator is a sort of generator that reads in a line at a time from as many files as needed and sticks it in $_ by default and then you throw away the newline and split the line and then do what you wish after that.
No reason python cannot have something similar, maybe more wordy. Disclaimer: I am not suggesting people use AWK or PERL or anything else. The focus is if people come from other programming environments and are looking at how to do common tasks in python. -Original Message- From: Python-list On Behalf Of Michael Torrie Sent: Friday, March 26, 2021 8:32 PM To: python-list@python.org Subject: Re: convert script awk in python On 3/25/21 1:14 AM, Loris Bennett wrote: > Does any one have a better approach? Not as such. Running a command and parsing its output is a relatively common task. Years ago I wrote my own simple python wrapper function that would make it easier to run a program with arguments, and capture its output. I ended up using that wrapper many times, which saved a lot of time. When it comes to converting a bash pipeline process to Python, it's worth considering that most of pipelines seem to involve parsing using sed or awk (as yours do), which is way easier to do from python without that kind of pipelining. However there is a fantastic article I read years ago about how generators are python's equivalent to a pipe. Anyone wanting to replace a bash script with python should read this: https://www.dabeaz.com/generators/Generators.pdf Also there's an interesting shell scripting language based on Python called xonsh which makes it much easier to interact with processes like bash does, but still leveraging Python to process the output. https://xon.sh/ . -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
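The BEGIN / condition-action / END layout described in the message above could be sketched roughly as follows. Everything here (`run_rules`, the `state` dict, the `first_match_only` switch, the sample rule) is hypothetical and only meant to show the shape; it does not reproduce AWK's actual semantics, where every matching rule runs unless the action says otherwise.

```python
def run_rules(lines, rules, begin=None, end=None, first_match_only=False):
    """Apply AWK-style (pattern, action) rules to each line.

    rules is a list of (pattern, action) pairs. pattern(fields) returns a
    bool and action(fields, state) does the work; fields[0] is the whole
    line. state is a plain dict standing in for AWK's global variables.
    """
    state = {}
    if begin is not None:
        begin(state)                      # the BEGIN block
    for raw in lines:
        line = raw.rstrip("\n")
        fields = [line] + line.split()
        for pattern, action in rules:
            if pattern(fields):
                action(fields, state)
                if first_match_only:      # optional: stop at the first hit
                    break
    if end is not None:
        end(state)                        # the END block
    return state


def start(state):
    state["total"] = 0                    # no implicit zero in Python


def is_data(fields):
    return len(fields) > 2 and fields[1] == "data"


def add_value(fields, state):
    state["total"] += int(fields[2])      # strings are converted by hand


result = run_rules(["data 3", "skip 9", "data 4"], [(is_data, add_value)], begin=start)
print(result["total"])                    # prints 7
```

Passing the state around explicitly is one way of sidestepping the scoping questions raised above; a class with attributes would serve equally well.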
Re: convert script awk in python
On 3/25/21 1:14 AM, Loris Bennett wrote: > Does any one have a better approach? Not as such. Running a command and parsing its output is a relatively common task. Years ago I wrote my own simple python wrapper function that would make it easier to run a program with arguments, and capture its output. I ended up using that wrapper many times, which saved a lot of time. When it comes to converting a bash pipeline process to Python, it's worth considering that most of pipelines seem to involve parsing using sed or awk (as yours do), which is way easier to do from python without that kind of pipelining. However there is a fantastic article I read years ago about how generators are python's equivalent to a pipe. Anyone wanting to replace a bash script with python should read this: https://www.dabeaz.com/generators/Generators.pdf Also there's an interesting shell scripting language based on Python called xonsh which makes it much easier to interact with processes like bash does, but still leveraging Python to process the output. https://xon.sh/ . -- https://mail.python.org/mailman/listinfo/python-list
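To make the two ideas in the message above concrete (a small wrapper that runs a program and captures its output, and generators chained like a pipe), here is a minimal sketch. It is not the wrapper mentioned in the message, which was never posted; `run_lines`, `grep` and `field` are names invented for illustration, and only `subprocess.run` is a real library call.

```python
import subprocess
import sys


def run_lines(*args):
    """Run a command (without a shell) and yield its stdout line by line."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    yield from result.stdout.splitlines()


def grep(needle, lines):
    """Pass through only the lines containing needle, like a plain grep."""
    return (line for line in lines if needle in line)


def field(n, lines):
    """Yield the n-th whitespace-separated field, like awk '{print $n}'."""
    for line in lines:
        parts = line.split()
        if len(parts) >= n:
            yield parts[n - 1]


# Each stage consumes the previous one lazily, much as a shell pipe does.
# The child process here is just Python printing three lines.
cmd = [sys.executable, "-c", "print('a 1'); print('b 2'); print('a 3')"]
print(list(field(2, grep("a", run_lines(*cmd)))))   # ['1', '3']
```

Note that `run_lines` as written waits for the command to finish before yielding anything; streaming from a long-running process would need `subprocess.Popen` instead.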
Re: convert script awk in python
Christian Gollwitzer writes:

> The closest equivalent I can come up with in Python is this:
>
> ==
> import sys
>
> s=0
> for line in sys.stdin:
>     try:
>         s += float(line.split()[1])
>     except:
>         pass
> print(s)
> ===
>
> I don't want to cram this into a python -c " " line, if it even is
> possible; how do you handle indentation levels and loops??

I agree. Perhaps we need an ‘awk’ module/package. I see that there is one in PyPI but that was last updated in 2016.

--
Regards,
Pankaj Jangid

-- https://mail.python.org/mailman/listinfo/python-list
Re: convert script awk in python
... funny thing is that OP never contributed to this discussion. Several people provided very valuable inputs but OP did not even bother to say "thank you". just saying ...

On Wed, Mar 24, 2021 at 11:22:02AM -0400, Avi Gross via Python-list wrote:

Cameron, I agree with you. I first encountered AWK in 1982 when I went to work for Bell Labs. I have not had any reason to use AWK since before the year 2000 so I was not sure that unused variables were initialized to zero. The code seemed to assume that. I have learned quite a few languages since and after a while, they tend to blend into each other. I think it would indeed have been more AWKthonic (or should that be called AWKward?) to have a BEGIN section in which functions were declared and variables clearly initialized but the language does allow some quick and dirty ways to do things and clearly the original programmer used some.

Which brings us back to languages like python. When I started using AWK and a slew of other UNIX programs years ago, what I found interesting is how much AWK was patterned a bit on the C language, not a surprise as the K in AWK is Brian Kernighan who had a hand in C. But unlike C that made me wait around as it compiled, AWK was a bit more of an interpreted language and I could write one-liner shell scripts (well, stretched over a few lines if needed) that did things. True, if you stuck an entire program in a BEGIN statement and did not actually loop over data, it seems a tad wasteful. But sometimes it was handy to use it to test out a bit of C code I was writing without waiting for the whole compile thing. In a sense, it was a bit like using the python REPL and getting rapid feedback. Of course, when I was an early adopter of C++, too many things were not in AWK!

What gets me is the original question which made it sound a bit like asking how you would translate some fairly simple program from language A to language B. For some fairly simple programs, the translation effort could be minimal.
There are often trivial mappings between similar constructs. Quite a bit of python simply takes a block of code in another language that is between curly braces, and lines it up indented below whatever it modifies and after a colon. The reverse may be similarly trivial. There are of course many such changes needed for some languages but when some novel twist is used that the language does not directly support, you may need to innovate or do a rewrite that avoids it. But still, except in complicated expressions, you can rewrite x++ to "x += 1" if that is available or "x = x + 1" or "x -> x + 1" or whatever. What gets me here is that AWK in his program was being used exactly for what it was designed. Python is more general-purpose. Had we been asked (not on this forum) to convert that AWK script to PERL, it would have been much more straightforward because PERL was also designed to be able to read in lines and break them into parts and act on them. It has constructs like the diamond operator or split that make it easy. Hence, at the end, I suggested Tomasz may want to do his task not using just basic python but some module others have already shared that emulates some of the filter aspects of AWK. That may make it easier to just translate the bits of code to python while largely leaving the logic in place, depending on the module. Just to go way off the rails, was our annoying cross-poster from a while back also promising to include a language like AWK into their universal translator by just saving some JSON descriptions? -Original Message- From: Python-list On Behalf Of Cameron Simpson Sent: Tuesday, March 23, 2021 6:38 PM To: Tomasz Rola Cc: Avi Gross via Python-list Subject: Re: convert script awk in python On 23Mar2021 16:37, Tomasz Rola wrote: On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list wrote: [...] I am a tod concerned as to where any of the variables x, y or z have been defined at this point. 
I have not seen a BEGIN {...} pattern/action or anywhere these have been initialized but they are set in a function that as far as I know has not been called. Weird. Maybe awk is allowing an uninitialized variable to be tested for in your code but if so, you need to be cautious how you do this in python. As far as I can say, the type of uninitialised variable is groked from the first operation on it. I.e., "count += 1" first initializes count to 0 and then adds 1. This might depend on exact awk being used. There were few of them during last 30+ years. I just assume it does as I wrote above. I'm pretty sure this behaviour's been there since very early times. I think it was there when I learnt awk, decades ago. Using BEGIN would be in better style, of course. Aye. Always good to be up front about initial values. There is a very nice book, "The AWK Programming Language" by Aho, Kernighan and Weinberger. First printed in 1988, n
Re: convert script awk in python
Peter Otten <__pete...@web.de> writes: > On 25/03/2021 08:14, Loris Bennett wrote: > >> I'm not doing that, but I am trying to replace a longish bash pipeline >> with Python code. >> >> Within Emacs, often I use Org mode[1] to generate date via some bash >> commands and then visualise the data via Python. Thus, in a single Org >> file I run >> >>/usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | >> \ >>xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print >> $3 " " $9}' | sed 's/%//g' >> >> The raw numbers are formatted by Org into a table >> >>| cpu_eff | mem_eff | >>|-+-| >>|96.6 | 99.11 | >>| 93.43 | 100.0 | >>|91.3 | 100.0 | >>| 88.71 | 100.0 | >>| 89.79 | 100.0 | >>| 84.59 | 100.0 | >>| 83.42 | 100.0 | >>| 86.09 | 100.0 | >>| 92.31 | 100.0 | >>| 90.05 | 100.0 | >>| 81.98 | 100.0 | >>| 90.76 | 100.0 | >>| 75.36 | 64.03 | >> >> I then read this into some Python code in the Org file and do something like >> >>df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0]) >>cpu_data = df.loc[: , "cpu_eff"] >>mem_data = df.loc[: , "mem_eff"] >> >>... >> >>n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5)) >>n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5)) >> >> which generates nice histograms. >> >> I decided rewrite the whole thing as a stand-alone Python program so >> that I can run it as a cron job. However, as a novice Python programmer >> I am finding translating the bash part slightly clunky. 
I am in the >> middle of doing this and started with the following: >> >> sacct = subprocess.Popen(["/usr/bin/sacct", >>"-u", user, >>"-S", period[0], "-E", period[1], >>"-o", "jobid", "-X", >>"-s", "COMPLETED", "-n"], >> stdout=subprocess.PIPE, >> ) >> >> jobids = [] >> >> for line in sacct.stdout: >> jobid = str(line.strip(), 'UTF-8') >> jobids.append(jobid) >> >> for jobid in jobids: >> seff = subprocess.Popen(["/usr/bin/seff", jobid], >> stdin=sacct.stdout, >> stdout=subprocess.PIPE, >> ) > > The statement above looks odd. If seff can read the jobids from stdin > there should be no need to pass them individually, like: > > sacct = ... > seff = Popen( > ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE, > universal_newlines=True > ) > for line in seff.communicate()[0].splitlines(): > ... Indeed, seff cannot read multiple jobids. That's why had 'xargs' in the original bash code. Initially I thought of calling 'xargs' via Popen, but this seemed very fiddly (I didn't manage to get it working) and anyway seemed a bit weird to me as it is really just a loop, which I can implement perfectly well in Python. Cheers, Loris >> seff_output = [] >> for line in seff.stdout: >> seff_output.append(str(line.strip(), "UTF-8")) >> >> ... >> >> but compared the to the bash pipeline, this all seems a bit laboured. >> >> Does any one have a better approach? >> >> Cheers, >> >> Loris >> >> >>> -Original Message- >>> From: Cameron Simpson >>> Sent: Wednesday, March 24, 2021 6:34 PM >>> To: Avi Gross >>> Cc: python-list@python.org >>> Subject: Re: convert script awk in python >>> >>> On 24Mar2021 12:00, Avi Gross wrote: >>>> But I wonder how much languages like AWK are still used to make new >>>> programs as compared to a time they were really useful. >>> >>> You mentioned in an adjacent post that you've not used AWK since 2000. >>> By contrast, I still use it regularly. 
>>> >>> It's great for proof of concept at the command line or in small scripts, and >>> as the innards of quite useful scripts. I've a trite "colsum" script which >>> does nothing but generate and run a little awk programme to sum a column, >>> and routinely type "blah | colsum 2" or the like to get a tally. >>> >>> I totally agree that once you're processing a lot of data from places or >>> where a shell script is making long pipelines or many command invocations, >>> if that's a performance issue it is time to recode. >>> >>> Cheers, >>> Cameron Simpson >> >> Footnotes: >> [1] https://orgmode.org/ >> > -- Dr. Loris Bennett (Hr./Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de -- https://mail.python.org/mailman/listinfo/python-list
Re: convert script awk in python
On 25/03/2021 08:14, Loris Bennett wrote:

> I'm not doing that, but I am trying to replace a longish bash pipeline
> with Python code.
>
> Within Emacs, often I use Org mode[1] to generate data via some bash
> commands and then visualise the data via Python. Thus, in a single Org
> file I run
>
>   /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
>   xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'
>
> The raw numbers are formatted by Org into a table
>
>   | cpu_eff | mem_eff |
>   |---------+---------|
>   |    96.6 |   99.11 |
>   |   93.43 |   100.0 |
>   |    91.3 |   100.0 |
>   |   88.71 |   100.0 |
>   |   89.79 |   100.0 |
>   |   84.59 |   100.0 |
>   |   83.42 |   100.0 |
>   |   86.09 |   100.0 |
>   |   92.31 |   100.0 |
>   |   90.05 |   100.0 |
>   |   81.98 |   100.0 |
>   |   90.76 |   100.0 |
>   |   75.36 |   64.03 |
>
> I then read this into some Python code in the Org file and do something like
>
>   df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
>   cpu_data = df.loc[: , "cpu_eff"]
>   mem_data = df.loc[: , "mem_eff"]
>
>   ...
>
>   n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
>   n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
>
> which generates nice histograms.
>
> I decided to rewrite the whole thing as a stand-alone Python program so
> that I can run it as a cron job. However, as a novice Python programmer
> I am finding translating the bash part slightly clunky. I am in the
> middle of doing this and started with the following:
>
>   sacct = subprocess.Popen(["/usr/bin/sacct",
>                             "-u", user,
>                             "-S", period[0], "-E", period[1],
>                             "-o", "jobid", "-X",
>                             "-s", "COMPLETED", "-n"],
>                            stdout=subprocess.PIPE,
>                            )
>
>   jobids = []
>
>   for line in sacct.stdout:
>       jobid = str(line.strip(), 'UTF-8')
>       jobids.append(jobid)
>
>   for jobid in jobids:
>       seff = subprocess.Popen(["/usr/bin/seff", jobid],
>                               stdin=sacct.stdout,
>                               stdout=subprocess.PIPE,
>                               )

The statement above looks odd. If seff can read the jobids from stdin there should be no need to pass them individually, like:

    sacct = ...
    seff = Popen(
        ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
        universal_newlines=True
    )
    for line in seff.communicate()[0].splitlines():
        ...

>   seff_output = []
>   for line in seff.stdout:
>       seff_output.append(str(line.strip(), "UTF-8"))
>
>   ...
>
> but compared to the bash pipeline, this all seems a bit laboured.
>
> Does any one have a better approach?
>
> Cheers,
>
> Loris
>
>> -----Original Message-----
>> From: Cameron Simpson
>> Sent: Wednesday, March 24, 2021 6:34 PM
>> To: Avi Gross
>> Cc: python-list@python.org
>> Subject: Re: convert script awk in python
>>
>> On 24Mar2021 12:00, Avi Gross wrote:
>>> But I wonder how much languages like AWK are still used to make new
>>> programs as compared to a time they were really useful.
>>
>> You mentioned in an adjacent post that you've not used AWK since 2000.
>> By contrast, I still use it regularly.
>>
>> It's great for proof of concept at the command line or in small scripts, and
>> as the innards of quite useful scripts. I've a trite "colsum" script which
>> does nothing but generate and run a little awk programme to sum a column,
>> and routinely type "blah | colsum 2" or the like to get a tally.
>>
>> I totally agree that once you're processing a lot of data from places or
>> where a shell script is making long pipelines or many command invocations,
>> if that's a performance issue it is time to recode.
>>
>> Cheers,
>> Cameron Simpson
>
> Footnotes:
> [1] https://orgmode.org/

-- https://mail.python.org/mailman/listinfo/python-list
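For what it is worth, the grep/sed/awk part of the pipeline quoted above might be folded into plain Python along these lines. This is only a sketch under assumptions: the sacct options and the /usr/bin paths are copied from the quoted message, but the layout of the seff report ("CPU Efficiency: NN.NN% of ..." and "Memory Efficiency: NN.NN% of ...") is inferred from the awk field numbers rather than taken from any documentation, and the function names are made up.

```python
import re
import subprocess

# Assumed report lines, inferred from awk's $3 and $9 in the pipeline:
#   CPU Efficiency: 96.60% of 00:10:00 core-walltime
#   Memory Efficiency: 99.11% of 4.00 GB
EFFICIENCY = re.compile(r"^(CPU|Memory) Efficiency:\s*([\d.]+)%", re.MULTILINE)


def parse_seff(text):
    """Return (cpu_eff, mem_eff) as floats from one seff report, or None."""
    found = {kind: float(value) for kind, value in EFFICIENCY.findall(text)}
    if "CPU" in found and "Memory" in found:
        return found["CPU"], found["Memory"]
    return None


def completed_jobids(user, start, end):
    """Ask sacct for the IDs of completed jobs (same options as the bash)."""
    out = subprocess.run(
        ["/usr/bin/sacct", "-u", user, "-S", start, "-E", end,
         "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def efficiencies(jobids):
    """Run seff once per job ID (the xargs step) and yield parsed pairs."""
    for jobid in jobids:
        report = subprocess.run(
            ["/usr/bin/seff", jobid],
            capture_output=True, text=True, check=True,
        ).stdout
        pair = parse_seff(report)
        if pair is not None:
            yield pair
```

The resulting pairs could then go straight into the DataFrame, e.g. `pd.DataFrame(efficiencies(jobids), columns=["cpu_eff", "mem_eff"])`, without passing through an Org table.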
Re: convert script awk in python
On 25.03.21 at 00:30, Avi Gross wrote:

> It [awk] is, as noted, a great tool and if you only had one or a few tools
> like it available, it can easily be bent and twisted to do much of what the
> others do as it is more programmable than most. But following that line of
> reasoning, fairly simple python scripts can be written with python -c "..."
> or by pointing to a script

The thing with awk is that lots of useful text processing is directly built into the main syntax; whereas in Python, you can certainly do it as well, but it requires loading a library. The simple column summation mentioned before by Cameron would be

    awk ' {sum += $2 } END {print sum}'

which can be easily typed into a command line, with the benefit that it skips every line where the 2nd col is not a valid number. This is important because often there are empty lines, often there is an empty line at the end, some ascii headers whatever.

The closest equivalent I can come up with in Python is this:

==
import sys

s=0
for line in sys.stdin:
    try:
        s += float(line.split()[1])
    except:
        pass
print(s)
===

I don't want to cram this into a python -c " " line, if it even is possible; how do you handle indentation levels and loops??

Of course, for big fancy programs Python is a much better choice than awk, no questions asked - but awk has a place for little things which fit the special programming model, and there are surprisingly many applications where this is just the easiest and fastest way to do the job. It's like regexes - a few simple characters can do the job which otherwise requires a bulky program, but once the parsing gets to certain complexity, a true parsing language, or even just handcoded Python is much more maintainable.

Christian

PS: Exercise - handle lines commented out with a '#', i.e. skip those. In awk:

    gawk '!/^\s*#/ {sum += $2 } END {print sum}'

-- https://mail.python.org/mailman/listinfo/python-list
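One possible answer to the exercise above, as a small reusable function rather than a one-liner. The name `colsum` is borrowed from the script mentioned earlier in the thread, but this version is only a sketch; it also narrows the bare `except` to the two errors that a short or non-numeric line can raise.

```python
def colsum(lines, col=2):
    """Sum column `col` (1-based), skipping comments and unusable lines."""
    total = 0.0
    for line in lines:
        if line.lstrip().startswith("#"):    # the '#' exercise: skip comments
            continue
        fields = line.split()
        try:
            total += float(fields[col - 1])
        except (IndexError, ValueError):     # too few fields, or not a number
            pass
    return total


print(colsum(["a 1", "  # c 5", "", "header", "b 2.5"]))   # prints 3.5
```

Fed from standard input it would be called as `colsum(sys.stdin)`. One behavioural difference from awk is worth keeping in mind: awk converts a field such as `12abc` using its leading numeric part, whereas `float()` rejects it, so that line is skipped here.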
Re: convert script awk in python
"Avi Gross" writes: > Just to be clear, Cameron, I retired very early and thus have had no reason > to use AWK in a work situation and for a while was not using UNIX-based > machines. I have no doubt I would have continued using WK as one part of my > toolkit for years albeit less often as I found other tools better for some > situations, let alone the kind I mentioned earlier that are not text-file > based such as databases. > > It is, as noted, a great tool and if you only had one or a few tools like it > available, it can easily be bent and twisted to do much of what the others > do as it is more programmable than most. But following that line of > reasoning, fairly simple python scripts can be written with python -c "..." > or by pointing to a script > > Anyone have a collection of shell scripts that can be used in pipelines > where each piece is just a call to python to do something simple? I'm not doing that, but I am trying to replace a longish bash pipeline with Python code. Within Emacs, often I use Org mode[1] to generate date via some bash commands and then visualise the data via Python. Thus, in a single Org file I run /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \ xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g' The raw numbers are formatted by Org into a table | cpu_eff | mem_eff | |-+-| |96.6 | 99.11 | | 93.43 | 100.0 | |91.3 | 100.0 | | 88.71 | 100.0 | | 89.79 | 100.0 | | 84.59 | 100.0 | | 83.42 | 100.0 | | 86.09 | 100.0 | | 92.31 | 100.0 | | 90.05 | 100.0 | | 81.98 | 100.0 | | 90.76 | 100.0 | | 75.36 | 64.03 | I then read this into some Python code in the Org file and do something like df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0]) cpu_data = df.loc[: , "cpu_eff"] mem_data = df.loc[: , "mem_eff"] ... n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5)) n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5)) which generates nice histograms. 
I decided rewrite the whole thing as a stand-alone Python program so that I can run it as a cron job. However, as a novice Python programmer I am finding translating the bash part slightly clunky. I am in the middle of doing this and started with the following: sacct = subprocess.Popen(["/usr/bin/sacct", "-u", user, "-S", period[0], "-E", period[1], "-o", "jobid", "-X", "-s", "COMPLETED", "-n"], stdout=subprocess.PIPE, ) jobids = [] for line in sacct.stdout: jobid = str(line.strip(), 'UTF-8') jobids.append(jobid) for jobid in jobids: seff = subprocess.Popen(["/usr/bin/seff", jobid], stdin=sacct.stdout, stdout=subprocess.PIPE, ) seff_output = [] for line in seff.stdout: seff_output.append(str(line.strip(), "UTF-8")) ... but compared the to the bash pipeline, this all seems a bit laboured. Does any one have a better approach? Cheers, Loris > -Original Message- > From: Cameron Simpson > Sent: Wednesday, March 24, 2021 6:34 PM > To: Avi Gross > Cc: python-list@python.org > Subject: Re: convert script awk in python > > On 24Mar2021 12:00, Avi Gross wrote: >>But I wonder how much languages like AWK are still used to make new >>programs as compared to a time they were really useful. > > You mentioned in an adjacent post that you've not used AWK since 2000. > By contrast, I still use it regularly. > > It's great for proof of concept at the command line or in small scripts, and > as the innards of quite use
RE: convert script awk in python
Just to be clear, Cameron, I retired very early and thus have had no reason to use AWK in a work situation and for a while was not using UNIX-based machines. I have no doubt I would have continued using AWK as one part of my toolkit for years, albeit less often as I found other tools better for some situations, let alone the kind I mentioned earlier that are not text-file based such as databases.

It is, as noted, a great tool and if you only had one or a few tools like it available, it can easily be bent and twisted to do much of what the others do as it is more programmable than most. But following that line of reasoning, fairly simple python scripts can be written with python -c "..." or by pointing to a script.

Anyone have a collection of shell scripts that can be used in pipelines where each piece is just a call to python to do something simple?

-----Original Message-----
From: Cameron Simpson
Sent: Wednesday, March 24, 2021 6:34 PM
To: Avi Gross
Cc: python-list@python.org
Subject: Re: convert script awk in python

On 24Mar2021 12:00, Avi Gross wrote:
>But I wonder how much languages like AWK are still used to make new
>programs as compared to a time they were really useful.

You mentioned in an adjacent post that you've not used AWK since 2000. By contrast, I still use it regularly.

It's great for proof of concept at the command line or in small scripts, and as the innards of quite useful scripts. I've a trite "colsum" script which does nothing but generate and run a little awk programme to sum a column, and routinely type "blah | colsum 2" or the like to get a tally.

I totally agree that once you're processing a lot of data from places or where a shell script is making long pipelines or many command invocations, if that's a performance issue it is time to recode.

Cheers,
Cameron Simpson

-- https://mail.python.org/mailman/listinfo/python-list
Re: convert script awk in python
On 24Mar2021 12:00, Avi Gross wrote: >But I wonder how much languages like AWK are still used to make new >programs >as compared to a time they were really useful. You mentioned in an adjacent post that you've not used AWK since 2000. By contrast, I still use it regularly. It's great for proof of concept at the command line or in small scripts, and as the innards of quite useful scripts. I've a trite "colsum" script which does nothing but generate and run a little awk programme to sum a column, and routinely type "blah | colsum 2" or the like to get a tally. I totally agree that once you're processing a lot of data from places or where a shell script is making long pipelines or many command invocations, if that's a performance issue it is time to recode. Cheers, Cameron Simpson -- https://mail.python.org/mailman/listinfo/python-list
Re: convert script awk in python
On 24/03/2021 16:00, Avi Gross via Python-list wrote: > But I wonder how much languages like AWK are still used to make new programs > as compared to a time they were really useful. True. I first discovered awk from a Byte article around 1988/9 and it became my goto tool for text munching right up until I found Python in 1998. I still use it as part of a unix command pipeline but I rarely write awk scripts in a file anymore - if it's that complex I reach for Python. But at one time I had a dozen or more awk scripts in my ~/bin folder. I also used awk on a real-world project to process csv files from an Excel spreadsheet and create site-specific config files for some shiny new WindowsNT(v3.1) boxes we were using. They had twin network connections and hard coded IP settings(for resilience) and the network designers delivered the site settings by Excel. We turned them into .BAT files using awk. Eventually, we figured out how to write Excel macros and converted it all to VBA. Happy days. :-) -- Alan G Author of the Learn to Program web site http://www.alan-g.me.uk/ http://www.amazon.com/author/alan_gauld Follow my photo-blog on Flickr at: http://www.flickr.com/photos/alangauldphotos -- https://mail.python.org/mailman/listinfo/python-list
RE: convert script awk in python
Alan, Back when various UNIX (later also included in other Operating environments like Linux and the Mac OS and even Microsoft) utilities came along, the paradigm was a bit different and some kinds of tasks were seen as being done with a pipeline of often small and focused utilities. You mentioned SED which at first seems like a very simple tool but if you look again, it can replace lots of other tools mostly as you can write one-liners with lots of power. AWK, in some sense, was even more powerful and can emulate so many others. But it came with a cost compared to some modern languages where by attaching a few modules, you can do much of the same in fewer passes over the data. I am not sure if I mentioned it here, but I was once on a project that stored all kinds of billing information in primitive text files using a vertical bar as record separator. My boss, who was not really a programmer, started looking at analyzing the data fairly primitively ended up writing huge shell scripts (ksh, I think) that remotely went to our computers around the world and gathered the files and processed them through pipelines that often were 10 or more parts as he selectively broke each line into parts, removed some and so on. He would use /bin/echo, cut, grep, sed, and so on. The darn thing ran for hours which was fine when it was running at midnight in Missouri, but not so much when it ran the same time in countries like Japan and Israel where the users were awake. I got lots of complaints and showed him how his entire mess could be replaced mostly by a single AWK script and complete in minutes. Of course, now, with a fast internet and modern languages that can run threads in parallel, it probably would complete in seconds. Maybe I would have translated that AWK to python after all, but these days I am studying Kotlin so maybe ... As I see it, many languages have a trade-off. 
The fact that AWK decided to allow a variable to be used without any other form of declaration was a feature. It could easily lead to errors if you spelled something wrong. But look at Python. You can use a variable to hold anything just by using it. If you spell it wrong later when putting something else in it, no problem. You now have two variables. If you try to access the value of a non-initialized variable, you get an error. But many more strongly-typed languages would catch more potential errors. If you store an int in a variable and later mistakenly put a string in the same variable name, python is happy. And that can be a GOOD feature for programmers but will not catch some errors.

Initializing variables to 0 really only makes sense for numeric variables. When a language allows all kinds of "objects" you might need an object-specific default initialization and for some objects, that makes no sense. As you note, the POSIX compliant versions of AWK do also initialize, if needed, to empty strings.

But I wonder how much languages like AWK are still used to make new programs as compared to a time they were really useful. So many people sort of live within one application in a GUI rather than work at a textual level in a shell where many problems can rapidly be done with a few smaller tools, often in a pipeline.

Avi

-----Original Message-----
From: Python-list On Behalf Of Alan Gauld via Python-list
Sent: Wednesday, March 24, 2021 5:28 AM
To: python-list@python.org
Subject: Re: convert script awk in python

On 23/03/2021 14:40, Avi Gross via Python-list wrote:
> $1 == 113 {
>     if (x || y || z)
>         print "More than one type $8 atom.";
>     else {
>         x = $2; y = $3; z = $4;
>         istep++;
>     }
> }
>
> I am a tad concerned as to where any of the variables x, y or z have
> been defined at this point.

They haven't been, they are using awk's auto-initialization feature. The variables are defined in this bit of code. The first time we see $1 == 113 we define the variables. On subsequent appearances we print the warning.

> far as I know has not been called. Weird. Maybe awk is allowing an
> uninitialized variable to be tested for in your code but if so, you
> need to be cautious how you do this in python.

It's standard behaviour in any POSIX compliant awk: variables are initialised to empty strings/arrays or zero as appropriate to first use.

The original AWK book has already been mentioned, which covers nawk. I'll add the O'Reilly book "sed & awk" which covers the POSIX version and includes several extensions not covered in the original book. (It also covers sed but that's irrelevant here)

--
Alan G
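The points made at the top of this message about how Python treats names (rebinding to another type is allowed, a misspelled assignment creates a second variable, and reading an unassigned name is an error) can be seen in a few lines. This is only an illustrative sketch; the names `total`, `count`, `cuont` and `never_assigned` are made up for the example.

```python
# Rebinding a name to a value of a different type is allowed.
total = 0
total = "zero"          # no error: a name is not tied to one type

# A misspelled name on assignment silently creates a second variable.
count = 1
cuont = count + 1       # typo for 'count'; 'count' itself is still 1


def read_unset():
    """Reading a name that was never assigned raises NameError.

    awk would instead treat the name as an empty string or 0.
    """
    try:
        return never_assigned + 1
    except NameError as exc:
        return f"NameError: {exc}"


message = read_unset()
```

A linter or static type checker would usually flag the `cuont` typo and the rebinding of `total`, which is one way to recover some of the checks a stricter language gives for free.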
Re: convert script awk in python
On 23/03/2021 14:40, Avi Gross via Python-list wrote:
> $1 == 113 {
>     if (x || y || z)
>         print "More than one type $8 atom.";
>     else {
>         x = $2; y = $3; z = $4;
>         istep++;
>     }
> }
>
> I am a tad concerned as to where any of the variables x, y or z have been
> defined at this point.

They haven't been, they are using awk's auto-initialization feature. The variables are defined in this bit of code. The first time we see $1 == 113 we define the variables. On subsequent appearances we print the warning.

> far as I know has not been called. Weird. Maybe awk is allowing an
> uninitialized variable to be tested for in your code but if so, you need to
> be cautious how you do this in python.

It's standard behaviour in any POSIX compliant awk: variables are initialised to empty strings/arrays or zero as appropriate to first use.

The original AWK book has already been mentioned, which covers nawk. I'll add the O'Reilly book "sed & awk" which covers the POSIX version and includes several extensions not covered in the original book. (It also covers sed but that's irrelevant here)

--
Alan G
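One way the `$1 == 113` rule discussed above might be carried over to Python is sketched below. It is not taken from the thread: the class name `Tracker` and the method `on_reference_atom` are hypothetical, and the sketch assumes whitespace-separated fields with the coordinates in columns 2 to 4. Python has no auto-initialisation, so the state is set up explicitly, and `None` stands in for "not seen yet".

```python
class Tracker:
    """Holds the state that the awk script keeps in implicit global variables."""

    def __init__(self):
        self.ref = None   # (x, y, z) of the reference atom; None means "not seen yet"
        self.istep = 0    # awk starts this at 0 without being told to

    def on_reference_atom(self, fields):
        """Mirror of the awk action for `$1 == 113`; fields[0] corresponds to $1.

        Returns the warning text on a repeated sighting, otherwise None.
        """
        if self.ref is not None:
            return "More than one type $8 atom."
        self.ref = tuple(float(value) for value in fields[1:4])
        self.istep += 1
        return None


tracker = Tracker()
first = tracker.on_reference_atom("113 1.0 2.0 3.0".split())
second = tracker.on_reference_atom("113 4.0 5.0 6.0".split())
```

Using `None` is a deliberate difference from the awk test: as far as I can tell, `if (x || y || z)` cannot distinguish "never set" from a reference atom sitting exactly at the origin, whereas `None` can.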
RE: convert script awk in python
Cameron,

I agree with you. I first encountered AWK in 1982 when I went to work for Bell Labs. I have not had any reason to use AWK since before the year 2000 so I was not sure that unused variables were initialized to zero. The code seemed to assume that. I have learned quite a few languages since and after a while, they tend to blend into each other.

I think it would indeed have been more AWKthonic (or should that be called AWKward?) to have a BEGIN section in which functions were declared and variables clearly initialized but the language does allow some quick and dirty ways to do things and clearly the original programmer used some.

Which brings us back to languages like python. When I started using AWK and a slew of other UNIX programs years ago, what I found interesting is how much AWK was patterned a bit on the C language, not a surprise as the K in AWK is Brian Kernighan who had a hand in C. But unlike C, which made me wait around as it compiled, AWK was a bit more of an interpreted language and I could write one-liner shell scripts (well, stretched over a few lines if needed) that did things. True, if you stuck an entire program in a BEGIN statement and did not actually loop over data, it seems a tad wasteful. But sometimes it was handy to use it to test out a bit of C code I was writing without waiting for the whole compile thing. In a sense, it was a bit like using the python REPL and getting rapid feedback. Of course, when I was an early adopter of C++, too many things were not in AWK!

What gets me is the original question, which made it sound a bit like asking how you would translate some fairly simple program from language A to language B. For some fairly simple programs, the translation effort could be minimal. There are often trivial mappings between similar constructs. Quite a bit of python simply takes a block of code in another language that is between curly braces, and lines it up indented below whatever it modifies and after a colon.
The reverse may be similarly trivial. There are of course many such changes needed for some languages but when some novel twist is used that the language does not directly support, you may need to innovate or do a rewrite that avoids it. But still, except in complicated expressions, you can rewrite x++ to "x += 1" if that is available or "x = x + 1" or "x -> x + 1" or whatever.

What gets me here is that AWK in his program was being used exactly for what it was designed to do. Python is more general-purpose. Had we been asked (not on this forum) to convert that AWK script to PERL, it would have been much more straightforward because PERL was also designed to be able to read in lines and break them into parts and act on them. It has constructs like the diamond operator or split that make it easy.

Hence, at the end, I suggested Tomasz may want to do his task not using just basic python but some module others have already shared that emulates some of the filter aspects of AWK. That may make it easier to just translate the bits of code to python while largely leaving the logic in place, depending on the module.

Just to go way off the rails, was our annoying cross-poster from a while back also promising to include a language like AWK into their universal translator by just saving some JSON descriptions?

-----Original Message-----
From: Python-list On Behalf Of Cameron Simpson
Sent: Tuesday, March 23, 2021 6:38 PM
To: Tomasz Rola
Cc: Avi Gross via Python-list
Subject: Re: convert script awk in python

On 23Mar2021 16:37, Tomasz Rola wrote:
>On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list wrote:
>[...]
>> I am a tad concerned as to where any of the variables x, y or z have
>> been defined at this point. I have not seen a BEGIN {...}
>> pattern/action or anywhere these have been initialized but they are
>> set in a function that as far as I know has not been called. Weird.
>> Maybe awk is allowing an uninitialized variable to be tested for in
>> your code but if so, you need to be cautious how you do this in python.
>
>As far as I can say, the type of an uninitialised variable is grokked from
>the first operation on it. I.e., "count += 1" first initializes count
>to 0 and then adds 1.
>
>This might depend on the exact awk being used. There were a few of them
>during the last 30+ years. I just assume it does as I wrote above.

I'm pretty sure this behaviour's been there since very early times. I think it was there when I learnt awk, decades ago.

>Using BEGIN would be in better style, of course.

Aye. Always good to be up front about initial values.

>There is a very nice book, "The AWK Programming Language" by Aho,
>Kernighan and Weinberger. First printed in 1988, now free and in pdf
>format. Go search.

Yes, a really nice book. [... walks into the other room to get his copy ...] October 1988. Wow. There're 11 pages
Re: convert script awk in python
On 23Mar2021 16:37, Tomasz Rola wrote:
>On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list wrote:
>[...]
>> I am a tad concerned as to where any of the variables x, y or z have been
>> defined at this point. I have not seen a BEGIN {...} pattern/action or
>> anywhere these have been initialized but they are set in a function that as
>> far as I know has not been called. Weird. Maybe awk is allowing an
>> uninitialized variable to be tested for in your code but if so, you need to
>> be cautious how you do this in python.
>
>As far as I can say, the type of an uninitialised variable is grokked from
>the first operation on it. I.e., "count += 1" first initializes count
>to 0 and then adds 1.
>
>This might depend on the exact awk being used. There were a few of them
>during the last 30+ years. I just assume it does as I wrote above.

I'm pretty sure this behaviour's been there since very early times. I think it was there when I learnt awk, decades ago.

>Using BEGIN would be in better style, of course.

Aye. Always good to be up front about initial values.

>There is a very nice book, "The AWK Programming Language" by Aho,
>Kernighan and Weinberger. First printed in 1988, now free and in pdf
>format. Go search.

Yes, a really nice book. [... walks into the other room to get his copy ...] October 1988. Wow.

There're 11 pages of good example programmes before any need for user variables at all. But at "1.5, Counting" is the sentence:

    Awk variables used as numbers begin life with the value 0, so we
    don't need to initialise emp.

Which is great for writing ad hoc scripts, particularly on the command line. But not a great style for anything complex.

Cheers,
Cameron Simpson
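For comparison, here is a small sketch of the same counting idiom in Python, where the starting value has to be stated. The function names `count_over` and `tally` and the sample data are invented for illustration; `collections.defaultdict(int)` is the closest standard-library analogue to counters that "begin life with the value 0".

```python
from collections import defaultdict


def count_over(lines, column, threshold):
    """Count lines whose numeric field `column` (1-based, like awk's $N) exceeds threshold."""
    emp = 0                      # Python needs this line; awk would not
    for line in lines:
        fields = line.split()
        if len(fields) >= column and float(fields[column - 1]) > threshold:
            emp += 1
    return emp


def tally(lines):
    """Count occurrences of the first field; each counter starts at 0 on first use."""
    counts = defaultdict(int)    # missing keys default to int() == 0
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return dict(counts)


sample = ["a 1 20", "b 2 10", "a 3 16"]
over_15 = count_over(sample, 3, 15)
per_key = tally(sample)
```

The explicit `emp = 0` is the "up front about initial values" style mentioned above; `defaultdict(int)` trades a little of that explicitness for awk-like convenience when the set of counters is not known in advance.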
Re: convert script awk in python
On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list wrote:
> Alberto,
>
[...]
> I am a tad concerned as to where any of the variables x, y or z have been
> defined at this point. I have not seen a BEGIN {...} pattern/action or
> anywhere these have been initialized but they are set in a function that as
> far as I know has not been called. Weird. Maybe awk is allowing an
> uninitialized variable to be tested for in your code but if so, you need to
> be cautious how you do this in python.

As far as I can say, the type of an uninitialised variable is grokked from the first operation on it. I.e., "count += 1" first initializes count to 0 and then adds 1.

This might depend on the exact awk being used. There were a few of them during the last 30+ years. I just assume it does as I wrote above.

Using BEGIN would be in better style, of course.

There is a very nice book, "The AWK Programming Language" by Aho, Kernighan and Weinberger. First printed in 1988, now free and in pdf format. Go search.

Perhaps it is easier to make the script work rather than rewriting it in another language. Both ways require deep understanding of the current code, and with a rewrite one also has to make sure the new code is a drop-in replacement for the old.

There is very nice documentation for gawk (GNU AWK).

--
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.      **
** As the answer, master did "rm -rif" on the programmer's home    **
** directory. And then the C programmer became enlightened...      **
**                                                                 **
** Tomasz Rola          mailto:tomasz_r...@bigfoot.com             **
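If the aim is to keep the working awk program and only drive it from Python, as suggested above, a minimal sketch could look like the following. It assumes the awk code has been saved to a program file; the helper names `awk_command` and `run_awk` and the file names in the example are hypothetical. `awk -f progfile file` is the standard way to run a program file, and `subprocess.run` with `capture_output` needs Python 3.7 or later.

```python
import subprocess


def awk_command(script_path, data_path):
    """Build the argument list for running an awk program file on one input file."""
    return ["awk", "-f", str(script_path), str(data_path)]


def run_awk(script_path, data_path):
    """Run awk and return its standard output as text.

    Raises subprocess.CalledProcessError if awk exits with a non-zero status.
    """
    result = subprocess.run(
        awk_command(script_path, data_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical file names, shown only to illustrate the call shape.
example = awk_command("mindist.awk", "run1.lammpstrj")
```

Passing the arguments as a list avoids shell quoting problems, and the returned text can then be post-processed in Python while the tested awk logic stays untouched.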
RE: convert script awk in python
ou: https://www.google.com/search?q=python+awk+module

-----Original Message-----
From: Python-list On Behalf Of alberto
Sent: Tuesday, March 23, 2021 7:32 AM
To: python-list@python.org
Subject: convert script awk in python

Hi to everyone
I have an awk script that calculate minimum distances between points

## atom type frag - atom type surface
#!/bin/bash

FILE1=$1.lammpstrj

if [ -f $FILE1 ];
then

awk 'function sq(x) {
    return x * x;
}
function dist(x1, y1, z1, x2, y2, z2) {
    return sqrt(sq(x1 - x2) + sq(y1 - y2) + sq(z1 - z2));
}
function print_distances() {
    if (na == 0)
        print "No type 8 atoms.";
    else {
        min = 1000;
        for (a = 0; a < na; a++) {
            d = dist(x, y, z, pos[a,"x"], pos[a,"y"], pos[a,"z"]);
            #printf "%7.5f ", d;
            if (d < min) min = d;
        }
        printf "%6i%7.5f\n", istep, min;
        x = y = z = 0;
        delete pos;
        na = 0;
    }
}
$1 == 113 {
    if (x || y || z)
        print "More than one type $8 atom.";
    else {
        x = $2; y = $3; z = $4;
        istep++;
    }
}
$8 == 10 {
    pos[na,"x"] = $2; pos[na,"y"] = $3; pos[na,"z"] = $4;
    na += 1;
}
/^ITEM: ATOMS/ && na != 0 { print_distances(); }
END { print_distances(); }
' $1.lammpstrj > $1_mindist.txt
fi

where $1 is a particular atom and $8 is a other type of atoms

How could I prepare a python script

regards

A
convert script awk in python
Hi to everyone,
I have an awk script that calculates minimum distances between points

## atom type frag - atom type surface
#!/bin/bash

FILE1=$1.lammpstrj

if [ -f $FILE1 ];
then

awk 'function sq(x) {
    return x * x;
}
function dist(x1, y1, z1, x2, y2, z2) {
    return sqrt(sq(x1 - x2) + sq(y1 - y2) + sq(z1 - z2));
}
function print_distances() {
    if (na == 0)
        print "No type 8 atoms.";
    else {
        min = 1000;
        for (a = 0; a < na; a++) {
            d = dist(x, y, z, pos[a,"x"], pos[a,"y"], pos[a,"z"]);
            #printf "%7.5f ", d;
            if (d < min) min = d;
        }
        printf "%6i%7.5f\n", istep, min;
        x = y = z = 0;
        delete pos;
        na = 0;
    }
}
$1 == 113 {
    if (x || y || z)
        print "More than one type $8 atom.";
    else {
        x = $2; y = $3; z = $4;
        istep++;
    }
}
$8 == 10 {
    pos[na,"x"] = $2; pos[na,"y"] = $3; pos[na,"z"] = $4;
    na += 1;
}
/^ITEM: ATOMS/ && na != 0 { print_distances(); }
END { print_distances(); }
' $1.lammpstrj > $1_mindist.txt
fi

where $1 is a particular atom and $8 is another type of atoms

How could I prepare a python script?

regards

A
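One possible line-by-line translation of the script above is sketched here. It is not a tested drop-in replacement: it assumes whitespace-separated columns with x, y, z in columns 2 to 4, and the names `num_eq`, `min_distances`, `ref_id` and `surface_type` are invented for the sketch. `math.dist` needs Python 3.8 or later. The three tests in the loop follow the order of the awk rules, and the final `yield` plays the role of the `END` block.

```python
import math
import os
import sys


def num_eq(field, value):
    """Compare a text field with a number, roughly like awk's `$n == value`."""
    try:
        return float(field) == value
    except ValueError:
        return False


def min_distances(lines, ref_id=113, surface_type=10):
    """Yield the lines the awk script would print."""
    ref = None        # (x, y, z) of the reference atom; None until seen
    surface = []      # positions of atoms whose 8th column equals surface_type
    istep = 0

    def report():
        nonlocal ref, surface
        if not surface:
            return "No type 8 atoms."
        # awk would use 0 for unset x, y, z, so fall back to the origin.
        origin = ref if ref is not None else (0.0, 0.0, 0.0)
        # The awk code starts from min = 1000, so larger distances are capped.
        best = min([1000.0] + [math.dist(origin, p) for p in surface])
        ref, surface = None, []
        return f"{istep:6d}{best:7.5f}"

    for line in lines:
        fields = line.split()
        if fields and num_eq(fields[0], ref_id):            # $1 == 113
            if ref is not None:
                yield "More than one type $8 atom."
            else:
                ref = tuple(float(v) for v in fields[1:4])
                istep += 1
        if len(fields) >= 8 and num_eq(fields[7], surface_type):   # $8 == 10
            surface.append(tuple(float(v) for v in fields[1:4]))
        if line.startswith("ITEM: ATOMS") and surface:      # frame boundary
            yield report()
    yield report()                                          # END block


if __name__ == "__main__" and len(sys.argv) == 2:
    name = sys.argv[1]
    if os.path.isfile(f"{name}.lammpstrj"):
        with open(f"{name}.lammpstrj") as src, \
                open(f"{name}_mindist.txt", "w") as out:
            for text in min_distances(src):
                print(text, file=out)
```

Saved under a hypothetical name such as `mindist.py`, it would be called as `python mindist.py NAME`, reading `NAME.lammpstrj` and writing `NAME_mindist.txt` like the shell wrapper does. Comparing its output with the awk output on a real trajectory would be the sensible check before relying on it, since the handling of `None` versus awk's zero test differs for a reference atom at the origin.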