Re: convert script awk in python

2021-03-29 Thread Loris Bennett
Michael Torrie  writes:

> On 3/25/21 1:14 AM, Loris Bennett wrote:
>> Does any one have a better approach?
>
> Not as such.  Running a command and parsing its output is a relatively
> common task. Years ago I wrote my own simple python wrapper function
> that would make it easier to run a program with arguments, and capture
> its output.  I ended up using that wrapper many times, which saved a lot
> of time.
>
> When it comes to converting a bash pipeline process to Python, it's
> worth considering that most pipelines seem to involve parsing using
> sed or awk (as yours do), which is way easier to do from python without
> that kind of pipelining. However there is a fantastic article I read
> years ago about how generators are python's equivalent to a pipe.
> Anyone wanting to replace a bash script with python should read this:
>
> https://www.dabeaz.com/generators/Generators.pdf

Thanks for the link - very instructive.

> Also there's an interesting shell scripting language based on Python
> called xonsh which makes it much easier to interact with processes like
> bash does, but still leveraging Python to process the output.
> https://xon.sh/ .

That looks very interesting, too.

Cheers,

Loris

-- 
This signature is currently under construction.
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: convert script awk in python

2021-03-27 Thread pjfarley3
Many thanks for the link to that document.  Most helpful.

Peter

> -Original Message-
> From: Michael Torrie 
> Sent: Friday, March 26, 2021 8:32 PM
> To: python-list@python.org
> Subject: Re: convert script awk in python
> 
> On 3/25/21 1:14 AM, Loris Bennett wrote:
> > Does any one have a better approach?
> 
> Not as such.  Running a command and parsing its output is a relatively
> common task. Years ago I wrote my own simple python wrapper function
> that would make it easier to run a program with arguments, and capture
> its output.  I ended up using that wrapper many times, which saved a lot
> of time.
> 
> When it comes to converting a bash pipeline process to Python, it's
> worth considering that most pipelines seem to involve parsing using
> sed or awk (as yours do), which is way easier to do from python without
> that kind of pipelining. However there is a fantastic article I read
> years ago about how generators are python's equivalent to a pipe.
> Anyone wanting to replace a bash script with python should read this:
> 
> https://www.dabeaz.com/generators/Generators.pdf
> 
> Also there's an interesting shell scripting language based on Python
> called xonsh which makes it much easier to interact with processes like
> bash does, but still leveraging Python to process the output.
> https://xon.sh/ .


RE: convert script awk in python

2021-03-26 Thread Avi Gross via Python-list
https://docs.python.org/3/library/fileinput.html

Dan,

Yes, fileinput sounds like what I described and more. It does indeed seem
to emulate the interface in programs like AWK including using "-" as a
placeholder for standard input. Now all you need is to have it also do the
split!
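A minimal sketch of layering the split on top of fileinput. The helper
name `records` and the convention of keeping the whole line at index 0
are assumptions for illustration; neither is part of the standard library.

```python
import fileinput


def records(files=None, sep=None):
    """Yield one list per input line: [whole_line, field1, field2, ...].

    files=None makes fileinput read the files named in sys.argv[1:], or
    standard input if there are none; "-" also means standard input.
    sep=None splits on runs of whitespace, much like awk's default.
    """
    with fileinput.input(files=files) as lines:
        for line in lines:
            line = line.rstrip("\n")
            yield [line] + line.split(sep)
```

With this, `for rec in records(): print(rec[2])` is roughly
awk '{ print $2 }', except that Python raises IndexError on a short line
where awk would print an empty string.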

∀vi ∃. Grθß

-Original Message-
From: Python-list  On
Behalf Of 2qdxy4rzwzuui...@potatochowder.com
Sent: Friday, March 26, 2021 9:43 PM
To: python-list@python.org
Subject: Re: convert script awk in python

On 2021-03-26 at 21:06:19 -0400,
Avi Gross via Python-list  wrote:

> A generator that opens one file at a time (or STDIN) in a consistent 
> manner, would be a reasonable thing to have as part of emulating AWK.

https://docs.python.org/3/library/fileinput.html


Re: convert script awk in python

2021-03-26 Thread 2QdxY4RzWzUUiLuE
On 2021-03-26 at 21:06:19 -0400,
Avi Gross via Python-list  wrote:

> A generator that opens one file at a time (or STDIN) in a consistent
> manner, would be a reasonable thing to have as part of emulating AWK.

https://docs.python.org/3/library/fileinput.html


RE: convert script awk in python

2021-03-26 Thread Avi Gross via Python-list
Michael,

A generator that opens one file at a time (or STDIN) in a consistent manner,
would be a reasonable thing to have as part of emulating AWK.

As I see it, you may want a bit more, including having it know how to
parse each line it reads into some version of names. In Python these might
not be $1 and $2 types of names but an array of strings, with the complete
line perhaps being in array[0] and each of the parts in the elements after it.

Clearly you would place whatever equivalent BEGIN statements in your code
above the call to the generator, then have something like a for loop
assigning the result of the generator to a variable and your multiple
condition/action parts in the loop. You then have the END outside the loop.
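The layout described above can be sketched as follows. The function name
`summarize` and the three rules in it are made up for illustration; only
the BEGIN / loop / END shape is the point.

```python
def summarize(lines):
    """Awk-style BEGIN / per-line rules / END over any iterable of lines."""
    # BEGIN { ... }: runs once, before any input is read.
    count = 0
    longest = ""

    for line in lines:
        line = line.rstrip("\n")
        fields = [line] + line.split()  # fields[0] is the whole line, like $0

        # /^#/ { next }: a rule that stops further matching for this line.
        if line.startswith("#"):
            continue

        # NF >= 2 { count++ }: a rule with a condition and an action.
        if len(fields) - 1 >= 2:
            count += 1

        # length($0) > length(longest) { longest = $0 }: rules are not
        # exclusive, so this one also runs for lines matched above.
        if len(line) > len(longest):
            longest = line

    # END { ... }: runs once, after the input is exhausted.
    return count, longest
```

Passing `sys.stdin` or an open file as `lines` gives the usual filter
behaviour; a plain list of strings works just as well for testing.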

But it is far from as simple as that to emulate what AWK does such as
deciding whether you stop matching patterns once the first match is found
and executed. As I noted, some AWK features do not line up with normal
python such as assuming variables not initialized are zero or "" depending
on context. There may well be scoping issues and other things to consider.
And clearly you need to do things by hand if you want a character string to
be treated as an integer, ...
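A small sketch of doing those two things by hand. The helper names
`to_number` and `totals_by_key` are hypothetical, and the coercion is
only a rough stand-in: real awk also accepts a leading numeric prefix
(so "3abc" counts as 3), which this version treats as 0.

```python
from collections import defaultdict


def to_number(text):
    """Roughly mimic awk's string-to-number coercion: non-numeric text is 0."""
    try:
        return float(text)
    except ValueError:
        return 0.0


def totals_by_key(lines):
    """Sum the second field per distinct first field."""
    # defaultdict(float) hands out 0.0 for keys never seen before, which
    # stands in for awk treating an unset variable as zero.
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            totals[fields[0]] += to_number(fields[1])
    return dict(totals)
```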

But it is all fairly doable, although I am not sure that translating an AWK
script into python is trivial, or even a good idea.

You could do a similar concept with other utilities like sed or grep or
other such filter utilities where the same generator, or a variant, might
automate things. I am pretty sure some module or other has done things like
this.

It is common in a language like PERL to do something like this:

while(<>)
{
  # get rid of the pesky newline character
  chomp;

  # read the fields in the current record into an array
  @fields = split(':', $_);

  # DO stuff
}

The <> diamond operator is a sort of generator that reads in a line at a
time from as many files as needed and sticks it in $_ by default and then
you throw away the newline and split the line and then do what you wish
after that. No reason python cannot have something similar, maybe more
wordy.

Disclaimer: I am not suggesting people use AWK or PERL or anything else. The
focus is if people come from other programming environments and are looking
at how to do common tasks in python.


-Original Message-
From: Python-list  On
Behalf Of Michael Torrie
Sent: Friday, March 26, 2021 8:32 PM
To: python-list@python.org
Subject: Re: convert script awk in python

On 3/25/21 1:14 AM, Loris Bennett wrote:
> Does any one have a better approach?

Not as such.  Running a command and parsing its output is a relatively
common task. Years ago I wrote my own simple python wrapper function that
would make it easier to run a program with arguments, and capture its
output.  I ended up using that wrapper many times, which saved a lot of
time.

When it comes to converting a bash pipeline process to Python, it's worth
considering that most pipelines seem to involve parsing using sed or awk
(as yours do), which is way easier to do from python without that kind of
pipelining. However there is a fantastic article I read years ago about how
generators are python's equivalent to a pipe.
Anyone wanting to replace a bash script with python should read this:

https://www.dabeaz.com/generators/Generators.pdf

Also there's an interesting shell scripting language based on Python called
xonsh which makes it much easier to interact with processes like bash does,
but still leveraging Python to process the output.
https://xon.sh/ .


Re: convert script awk in python

2021-03-26 Thread Michael Torrie
On 3/25/21 1:14 AM, Loris Bennett wrote:
> Does any one have a better approach?

Not as such.  Running a command and parsing its output is a relatively
common task. Years ago I wrote my own simple python wrapper function
that would make it easier to run a program with arguments, and capture
its output.  I ended up using that wrapper many times, which saved a lot
of time.
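The wrapper itself was not posted, so the following is only a guess at
the general shape such a helper might take, built on subprocess.run. The
name `run_lines` and its behaviour (raise on failure, return a list of
lines) are assumptions.

```python
import subprocess


def run_lines(*args):
    """Run a command without a shell and return its stdout as a list of lines.

    Raises subprocess.CalledProcessError if the command exits with a
    non-zero status.
    """
    result = subprocess.run(
        list(args),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode the output to str
        check=True,
    )
    return result.stdout.splitlines()
```

A call then looks like `run_lines("ls", "-l", "/tmp")`, with each
argument passed separately so that no shell quoting is involved.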

When it comes to converting a bash pipeline process to Python, it's
worth considering that most pipelines seem to involve parsing using
sed or awk (as yours do), which is way easier to do from python without
that kind of pipelining. However there is a fantastic article I read
years ago about how generators are python's equivalent to a pipe.
Anyone wanting to replace a bash script with python should read this:

https://www.dabeaz.com/generators/Generators.pdf
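As a rough illustration of the idea (not code taken from that document),
each pipeline stage can be a generator that consumes one iterable and
yields another. The stage names `grep`, `field` and `strip_chars` and
the sample data are made up here.

```python
def grep(pattern, lines):
    """Yield only the lines containing the substring `pattern` (like grep -F)."""
    return (line for line in lines if pattern in line)


def field(index, lines):
    """Yield the whitespace-separated field at 1-based `index`, like awk's $N."""
    for line in lines:
        parts = line.split()
        if len(parts) >= index:
            yield parts[index - 1]


def strip_chars(chars, values):
    """Yield each value with every character in `chars` removed."""
    table = str.maketrans("", "", chars)
    return (value.translate(table) for value in values)


# Stages are chained by nesting; nothing runs until the result is consumed.
log = ["cpu 96.6%", "note: skipped", "cpu 93.43%"]
pipeline = strip_chars("%", field(2, grep("cpu", log)))
print(list(pipeline))
```

Because every stage is lazy, the same chain works on a list, an open
file, or the stdout of a subprocess without reading everything into
memory first.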

Also there's an interesting shell scripting language based on Python
called xonsh which makes it much easier to interact with processes like
bash does, but still leveraging Python to process the output.
https://xon.sh/ .


Re: convert script awk in python

2021-03-26 Thread Pankaj Jangid
Christian Gollwitzer  writes:

> The closest equivalent I can come up with in Python is this:
>
> ==
> import sys
>
> s=0
> for line in sys.stdin:
>     try:
>         s += float(line.split()[1])
>     except:
>         pass
> print(s)
> ===
>
>
> I don't want to cram this into a python -c " "  line, if it even is
> possible; how do you handle indentation levels and loops??
>

I agree. Perhaps we need a ‘awk’ module/package. I see that there is one
in PyPI but that was last updated in 2016.

-- 
Regards,
Pankaj Jangid



Re: convert script awk in python

2021-03-25 Thread Dan Ciprus (dciprus) via Python-list
... funny thing is that OP never contributed to this discussion. Several people 
provided very valuable inputs but OP did not even bother to say "thank you".


just saying ...

On Wed, Mar 24, 2021 at 11:22:02AM -0400, Avi Gross via Python-list wrote:

Cameron,

I agree with you. I first encountered AWK in 1982 when I went to work for
Bell Labs.

I have not had any reason to use AWK since before the year 2000 so I was not
sure that unused variables were initialized to zero. The code seemed to
assume that. I have learned quite a few languages since and after a while,
they tend to blend into each other.

I think it would indeed have been more AWKthonic (or should that be called
AWKward?) to have a BEGIN section in which functions were declared and
variables clearly initialized but the language does allow some quick and
dirty ways to do things and clearly the original programmer used some.

Which brings us back to languages like python. When I started using AWK and
a slew of other UNIX programs years ago, what I found interesting is how
much AWK was patterned a bit on the C language, not a surprise as the K in
AWK is Brian Kernighan who had a hand in C. But unlike C that made me wait
around as it compiled, AWK was a bit more of an interpreted language and I
could write one-liner shell scripts (well, stretched over a few lines if
needed) that did things. True, if you stuck an entire program in a BEGIN
statement and did not actually loop over data, it seems a tad wasteful. But
sometimes it was handy to use it to test out a bit of C code I was writing
without waiting for the whole compile thing. In a sense, it was a bit like
using the python REPL and getting rapid feedback. Of course, when I was an
early adopter of C++, too many things were not in AWK!

What gets me is the original question which made it sound a bit like asking
how you would translate some fairly simple program from language A to
language B. For some fairly simple programs, the translation effort could be
minimal. There are often trivial mappings between similar constructs. Quite
a bit of python simply takes a block of code in another language that is
between curly braces, and lines it up indented below whatever it modifies
and after a colon. The reverse may be similarly trivial. There are of course
many such changes needed for some languages but when some novel twist is
used that the language does not directly support, you may need to innovate
or do a rewrite that avoids it. But still, except in complicated
expressions, you can rewrite x++ to "x += 1" if that is available or "x = x
+ 1" or "x -> x + 1" or whatever.

What gets me here is that AWK in his program was being used for exactly
what it was designed for. Python is more general-purpose. Had we been asked (not
on this forum) to convert that AWK script to PERL, it would have been much
more straightforward because PERL was also designed to be able to read in
lines and break them into parts and act on them. It has constructs like the
diamond operator or split that make it easy.

Hence, at the end, I suggested Tomasz may want to do his task not using just
basic python but some module others have already shared that emulates some
of the filter aspects of AWK. That may make it easier to just translate the
bits of code to python while largely leaving the logic in place, depending
on the module.

Just to go way off the rails, was our annoying cross-poster from a while
back also promising to include a language like AWK into their universal
translator by just saving some JSON descriptions?

-Original Message-
From: Python-list  On
Behalf Of Cameron Simpson
Sent: Tuesday, March 23, 2021 6:38 PM
To: Tomasz Rola 
Cc: Avi Gross via Python-list 
Subject: Re: convert script awk in python

On 23Mar2021 16:37, Tomasz Rola  wrote:

On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list wrote:
[...]

I am a tad concerned as to where any of the variables x, y or z have
been defined at this point. I have not seen a BEGIN {...}
pattern/action or anywhere these have been initialized but they are
set in a function that as far as I know has not been called. Weird.
Maybe awk is allowing an uninitialized variable to be tested for in
your code but if so, you need to be cautious how you do this in python.


As far as I can say, the type of an uninitialised variable is grokked from
the first operation on it. I.e., "count += 1" first initializes count
to 0 and then adds 1.

This might depend on the exact awk being used. There have been a few of them
over the last 30+ years. I just assume it does as I wrote above.


I'm pretty sure this behaviour's been there since very early times. I think
it was there when I learnt awk, decades ago.


Using BEGIN would be in better style, of course.


Aye. Always good to be up front about initial values.


There is a very nice book, "The AWK Programming Language" by Aho,
Kernighan and Weinberger. First printed in 1988, n

Re: convert script awk in python

2021-03-25 Thread Loris Bennett
Peter Otten <__pete...@web.de> writes:

> On 25/03/2021 08:14, Loris Bennett wrote:
>
>> I'm not doing that, but I am trying to replace a longish bash pipeline
>> with Python code.
>>
>> Within Emacs, often I use Org mode[1] to generate data via some bash
>> commands and then visualise the data via Python.  Thus, in a single Org
>> file I run
>>
>>/usr/bin/sacct  -u $user -o jobid -X -S $start -E $end -s COMPLETED -n  | \
>>xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'
>>
>> The raw numbers are formatted by Org into a table
>>
>>| cpu_eff | mem_eff |
>>|---------+---------|
>>|    96.6 |   99.11 |
>>|   93.43 |   100.0 |
>>|    91.3 |   100.0 |
>>|   88.71 |   100.0 |
>>|   89.79 |   100.0 |
>>|   84.59 |   100.0 |
>>|   83.42 |   100.0 |
>>|   86.09 |   100.0 |
>>|   92.31 |   100.0 |
>>|   90.05 |   100.0 |
>>|   81.98 |   100.0 |
>>|   90.76 |   100.0 |
>>|   75.36 |   64.03 |
>>
>> I then read this into some Python code in the Org file and do something like
>>
>>df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
>>cpu_data = df.loc[: , "cpu_eff"]
>>mem_data = df.loc[: , "mem_eff"]
>>
>>...
>>
>>n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
>>n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
>>
>> which generates nice histograms.
>>
>> I decided to rewrite the whole thing as a stand-alone Python program so
>> that I can run it as a cron job.  However, as a novice Python programmer
>> I am finding translating the bash part slightly clunky.  I am in the
>> middle of doing this and started with the following:
>>
>>  sacct = subprocess.Popen(["/usr/bin/sacct",
>>"-u", user,
>>"-S", period[0], "-E", period[1],
>>"-o", "jobid", "-X",
>>"-s", "COMPLETED", "-n"],
>>   stdout=subprocess.PIPE,
>>  )
>>
>>  jobids = []
>>
>>  for line in sacct.stdout:
>>      jobid = str(line.strip(), 'UTF-8')
>>      jobids.append(jobid)
>>
>>  for jobid in jobids:
>>      seff = subprocess.Popen(["/usr/bin/seff", jobid],
>>                              stdin=sacct.stdout,
>>                              stdout=subprocess.PIPE,
>>                              )
>
> The statement above looks odd. If seff can read the jobids from stdin
> there should be no need to pass them individually, like:
>
> sacct = ...
> seff = Popen(
>   ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
>   universal_newlines=True
> )
> for line in seff.communicate()[0].splitlines():
>     ...

Indeed, seff cannot read multiple jobids.  That's why I had 'xargs' in the
original bash code.  Initially I thought of calling 'xargs' via
Popen, but this seemed very fiddly (I didn't manage to get it working)
and anyway seemed a bit weird to me as it is really just a loop, which I
can implement perfectly well in Python.
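A minimal sketch of that loop, assuming seff takes one job id as its
last argument, as the bash pipeline in this thread implies. The helper
name `run_for_each` is made up, and the command is a parameter so the
function can be tried with any program.

```python
import subprocess


def run_for_each(command, items):
    """Run `command + [item]` once per item and yield each output line.

    Mirrors `xargs -I {} command {}` for the simple case where the
    placeholder is the last argument.
    """
    for item in items:
        result = subprocess.run(
            command + [item],
            stdout=subprocess.PIPE,
            universal_newlines=True,  # decode the output to str
        )
        yield from result.stdout.splitlines()


# Hypothetical usage with the commands from this thread:
#   for line in run_for_each(["/usr/bin/seff"], jobids):
#       ...
```

subprocess.run waits for each command to finish and collects its
output, so no stdin needs to be wired up between sacct and seff.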

Cheers,

Loris


>>      seff_output = []
>>      for line in seff.stdout:
>>          seff_output.append(str(line.strip(), "UTF-8"))
>>
>>  ...
>>
>> but compared to the bash pipeline, this all seems a bit laboured.
>>
>> Does any one have a better approach?
>>
>> Cheers,
>>
>> Loris
>>
>>
>>> -Original Message-
>>> From: Cameron Simpson 
>>> Sent: Wednesday, March 24, 2021 6:34 PM
>>> To: Avi Gross 
>>> Cc: python-list@python.org
>>> Subject: Re: convert script awk in python
>>>
>>> On 24Mar2021 12:00, Avi Gross  wrote:
>>>> But I wonder how much languages like AWK are still used to make new
>>>> programs as compared to a time they were really useful.
>>>
>>> You mentioned in an adjacent post that you've not used AWK since 2000.
>>> By contrast, I still use it regularly.
>>>
>>> It's great for proof of concept at the command line or in small scripts, and
>>> as the innards of quite useful scripts. I've a trite "colsum" script which
>>> does nothing but generate and run a little awk programme to sum a column,
>>> and routinely type "blah  | colsum 2" or the like to get a tally.
>>>
>>> I totally agree that once you're processing a lot of data from places or
>>> where a shell script is making long pipelines or many command invocations,
>>> if that's a performance issue it is time to recode.
>>>
>>> Cheers,
>>> Cameron Simpson 
>>
>> Footnotes:
>> [1]  https://orgmode.org/
>>
>
-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de


Re: convert script awk in python

2021-03-25 Thread Peter Otten

On 25/03/2021 08:14, Loris Bennett wrote:


I'm not doing that, but I am trying to replace a longish bash pipeline
with Python code.

Within Emacs, often I use Org mode[1] to generate data via some bash
commands and then visualise the data via Python.  Thus, in a single Org
file I run

   /usr/bin/sacct  -u $user -o jobid -X -S $start -E $end -s COMPLETED -n  | \
   xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'

The raw numbers are formatted by Org into a table


   | cpu_eff | mem_eff |
   |---------+---------|
   |    96.6 |   99.11 |
   |   93.43 |   100.0 |
   |    91.3 |   100.0 |
   |   88.71 |   100.0 |
   |   89.79 |   100.0 |
   |   84.59 |   100.0 |
   |   83.42 |   100.0 |
   |   86.09 |   100.0 |
   |   92.31 |   100.0 |
   |   90.05 |   100.0 |
   |   81.98 |   100.0 |
   |   90.76 |   100.0 |
   |   75.36 |   64.03 |

I then read this into some Python code in the Org file and do something like

   df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
   cpu_data = df.loc[: , "cpu_eff"]
   mem_data = df.loc[: , "mem_eff"]

   ...

   n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
   n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))

which generates nice histograms.

I decided to rewrite the whole thing as a stand-alone Python program so
that I can run it as a cron job.  However, as a novice Python programmer
I am finding translating the bash part slightly clunky.  I am in the
middle of doing this and started with the following:

 sacct = subprocess.Popen(["/usr/bin/sacct",
   "-u", user,
   "-S", period[0], "-E", period[1],
   "-o", "jobid", "-X",
   "-s", "COMPLETED", "-n"],
  stdout=subprocess.PIPE,
 )

 jobids = []

 for line in sacct.stdout:
     jobid = str(line.strip(), 'UTF-8')
     jobids.append(jobid)

 for jobid in jobids:
     seff = subprocess.Popen(["/usr/bin/seff", jobid],
                             stdin=sacct.stdout,
                             stdout=subprocess.PIPE,
                             )


The statement above looks odd. If seff can read the jobids from stdin 
there should be no need to pass them individually, like:


sacct = ...
seff = Popen(
  ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
  universal_newlines=True
)
for line in seff.communicate()[0].splitlines():
    ...



     seff_output = []
     for line in seff.stdout:
         seff_output.append(str(line.strip(), "UTF-8"))

 ...

but compared to the bash pipeline, this all seems a bit laboured.

Does any one have a better approach?

Cheers,

Loris



-----Original Message-
From: Cameron Simpson 
Sent: Wednesday, March 24, 2021 6:34 PM
To: Avi Gross 
Cc: python-list@python.org
Subject: Re: convert script awk in python

On 24Mar2021 12:00, Avi Gross  wrote:

But I wonder how much languages like AWK are still used to make new
programs as compared to a time they were really useful.


You mentioned in an adjacent post that you've not used AWK since 2000.
By contrast, I still use it regularly.

It's great for proof of concept at the command line or in small scripts, and
as the innards of quite useful scripts. I've a trite "colsum" script which
does nothing but generate and run a little awk programme to sum a column,
and routinely type "blah  | colsum 2" or the like to get a tally.

I totally agree that once you're processing a lot of data from places or
where a shell script is making long pipelines or many command invocations,
if that's a performance issue it is time to recode.

Cheers,
Cameron Simpson 


Footnotes:
[1]  https://orgmode.org/






Re: convert script awk in python

2021-03-25 Thread Christian Gollwitzer

Am 25.03.21 um 00:30 schrieb Avi Gross:

It [awk] is, as noted, a great tool and if you only had one or a few tools like it
available, it can easily be bent and twisted to do much of what the others
do as it is more programmable than most. But following that line of
reasoning, fairly simple python scripts can be written with python -c "..."
or by pointing to a script


The thing with awk is that lots of useful text processing is directly 
built into the main syntax; whereas in Python, you can certainly do it 
as well, but it requires loading a library. The simple column summation 
mentioned before by Cameron would be


   awk ' {sum += $2 } END {print sum}'

which can be easily typed into a command line, with the benefit that it 
skips every line where the 2nd col is not a valid number. This is 
important because often there are empty lines, often there is an empty 
line at the end, some ascii headers whatever.


The closest equivalent I can come up with in Python is this:

==
import sys

s=0
for line in sys.stdin:
    try:
        s += float(line.split()[1])
    except:
        pass
print(s)
===


I don't want to cram this into a python -c " "  line, if it even is 
possible; how do you handle indentation levels and loops??


Of course, for big fancy programs Python is a much better choice than 
awk, no questions asked - but awk has a place for little things which 
fit the special programming model, and there are surprisingly many 
applications where this is just the easiest and fastest way to do the job.


It's like regexes - a few simple characters can do the job which 
otherwise requires a bulky program, but once the parsing gets to certain 
complexity, a true parsing language, or even just handcoded Python is 
much more maintainable.


Christian

PS: Exercise - handle lines commented out with a '#', i.e. skip those. 
In awk:


gawk '!/^\s*#/ {sum += $2 } END {print sum}'
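One possible Python answer to the exercise, written as a function so it
can be reused. The name `column_sum` is made up here. It narrows the
exception handling to the two errors a short or non-numeric line can
raise, which is a deliberate change from the bare `except` in the
script earlier in this message.

```python
def column_sum(lines, column=2):
    """Sum a 1-based column, ignoring '#' comments and non-numeric lines."""
    total = 0.0
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # same effect as the !/^\s*#/ pattern
        try:
            total += float(line.split()[column - 1])
        except (IndexError, ValueError):
            pass  # too few fields, or the field is not a number
    return total
```

Used as a filter it would be `print(column_sum(sys.stdin))` after an
`import sys`, which is still a good deal longer than the gawk one-liner.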



Re: convert script awk in python

2021-03-25 Thread Loris Bennett
"Avi Gross"  writes:

> Just to be clear, Cameron, I retired very early and thus have had no reason
> to use AWK in a work situation and for a while was not using UNIX-based
> machines. I have no doubt I would have continued using AWK as one part of my
> toolkit for years albeit less often as I found other tools better for some
> situations, let alone the kind I mentioned earlier that are not text-file
> based such as databases.
>
> It is, as noted, a great tool and if you only had one or a few tools like it
> available, it can easily be bent and twisted to do much of what the others
> do as it is more programmable than most. But following that line of
> reasoning, fairly simple python scripts can be written with python -c "..."
> or by pointing to a script
>
> Anyone have a collection of shell scripts that can be used in pipelines
> where each piece is just a call to python to do something simple?

I'm not doing that, but I am trying to replace a longish bash pipeline
with Python code.

Within Emacs, often I use Org mode[1] to generate data via some bash
commands and then visualise the data via Python.  Thus, in a single Org
file I run

  /usr/bin/sacct  -u $user -o jobid -X -S $start -E $end -s COMPLETED -n  | \
  xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'

The raw numbers are formatted by Org into a table

  | cpu_eff | mem_eff |
  |---------+---------|
  |    96.6 |   99.11 |
  |   93.43 |   100.0 |
  |    91.3 |   100.0 |
  |   88.71 |   100.0 |
  |   89.79 |   100.0 |
  |   84.59 |   100.0 |
  |   83.42 |   100.0 |
  |   86.09 |   100.0 |
  |   92.31 |   100.0 |
  |   90.05 |   100.0 |
  |   81.98 |   100.0 |
  |   90.76 |   100.0 |
  |   75.36 |   64.03 |

I then read this into some Python code in the Org file and do something like

  df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
  cpu_data = df.loc[: , "cpu_eff"]
  mem_data = df.loc[: , "mem_eff"]

  ...

  n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
  n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))

which generates nice histograms.

I decided to rewrite the whole thing as a stand-alone Python program so
that I can run it as a cron job.  However, as a novice Python programmer
I am finding translating the bash part slightly clunky.  I am in the
middle of doing this and started with the following:

sacct = subprocess.Popen(["/usr/bin/sacct",
                          "-u", user,
                          "-S", period[0], "-E", period[1],
                          "-o", "jobid", "-X",
                          "-s", "COMPLETED", "-n"],
                         stdout=subprocess.PIPE,
                         )

jobids = []

for line in sacct.stdout:
    jobid = str(line.strip(), 'UTF-8')
    jobids.append(jobid)

for jobid in jobids:
    seff = subprocess.Popen(["/usr/bin/seff", jobid],
                            stdin=sacct.stdout,
                            stdout=subprocess.PIPE,
                            )
    seff_output = []
    for line in seff.stdout:
        seff_output.append(str(line.strip(), "UTF-8"))

...

but compared to the bash pipeline, this all seems a bit laboured.

Does any one have a better approach?
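For the text-processing half (grep, sed, awk, sed), one possible
translation is a single generator that follows the same steps. The name
`efficiency_pairs` is made up, and the sample lines below are invented
to match the shape that the awk field numbers $3 and $9 imply; real seff
output may differ.

```python
def efficiency_pairs(seff_lines):
    """Yield (cpu_eff, mem_eff) floats from the concatenated seff output."""
    # grep 'Efficiency'
    hits = [line for line in seff_lines if "Efficiency" in line]
    # sed '$!N;s/\n/ /': join each pair of consecutive lines
    for first, second in zip(hits[0::2], hits[1::2]):
        fields = (first.rstrip("\n") + " " + second).split()
        # awk '{print $3 " " $9}' followed by sed 's/%//g'
        yield float(fields[2].replace("%", "")), float(fields[8].replace("%", ""))


# Invented sample in the assumed format, one job's worth of output.
sample = [
    "Job ID: 123456",
    "CPU Efficiency: 96.60% of 01:00:00 core-walltime",
    "Memory Efficiency: 99.11% of 4.00 GB",
]
```

`list(efficiency_pairs(sample))` then gives one (cpu, mem) tuple per
job, which can be handed to pd.DataFrame with
columns=["cpu_eff", "mem_eff"]. Unlike the sed command, an unpaired
trailing line is dropped rather than passed through.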

Cheers,

Loris


> -Original Message-
> From: Cameron Simpson  
> Sent: Wednesday, March 24, 2021 6:34 PM
> To: Avi Gross 
> Cc: python-list@python.org
> Subject: Re: convert script awk in python
>
> On 24Mar2021 12:00, Avi Gross  wrote:
>>But I wonder how much languages like AWK are still used to make new 
>>programs as compared to a time they were really useful.
>
> You mentioned in an adjacent post that you've not used AWK since 2000.  
> By contrast, I still use it regularly.
>
> It's great for proof of concept at the command line or in small scripts, and
> as the innards of quite use

RE: convert script awk in python

2021-03-24 Thread Avi Gross via Python-list
Just to be clear, Cameron, I retired very early and thus have had no reason
to use AWK in a work situation and for a while was not using UNIX-based
machines. I have no doubt I would have continued using AWK as one part of my
toolkit for years albeit less often as I found other tools better for some
situations, let alone the kind I mentioned earlier that are not text-file
based such as databases.

It is, as noted, a great tool and if you only had one or a few tools like it
available, it can easily be bent and twisted to do much of what the others
do as it is more programmable than most. But following that line of
reasoning, fairly simple python scripts can be written with python -c "..."
or by pointing to a script

Anyone have a collection of shell scripts that can be used in pipelines
where each piece is just a call to python to do something simple?

-Original Message-
From: Cameron Simpson  
Sent: Wednesday, March 24, 2021 6:34 PM
To: Avi Gross 
Cc: python-list@python.org
Subject: Re: convert script awk in python

On 24Mar2021 12:00, Avi Gross  wrote:
>But I wonder how much languages like AWK are still used to make new 
>programs as compared to a time they were really useful.

You mentioned in an adjacent post that you've not used AWK since 2000.  
By contrast, I still use it regularly.

It's great for proof of concept at the command line or in small scripts, and
as the innards of quite useful scripts. I've a trite "colsum" script which
does nothing but generate and run a little awk programme to sum a column,
and routinely type "blah  | colsum 2" or the like to get a tally.

I totally agree that once you're processing a lot of data from places or
where a shell script is making long pipelines or many command invocations,
if that's a performance issue it is time to recode.

Cheers,
Cameron Simpson 



Re: convert script awk in python

2021-03-24 Thread Cameron Simpson
On 24Mar2021 12:00, Avi Gross  wrote:
>But I wonder how much languages like AWK are still used to make new 
>programs
>as compared to a time they were really useful.

You mentioned in an adjacent post that you've not used AWK since 2000.  
By contrast, I still use it regularly.

It's great for proof of concept at the command line or in small scripts, 
and as the innards of quite useful scripts. I've a trite "colsum" script 
which does nothing but generate and run a little awk programme to sum a 
column, and routinely type "blah  | colsum 2" or the like to get a 
tally.

I totally agree that once you're processing a lot of data from places or 
where a shell script is making long pipelines or many command 
invocations, if that's a performance issue it is time to recode.

Cheers,
Cameron Simpson 


Re: convert script awk in python

2021-03-24 Thread Alan Gauld via Python-list
On 24/03/2021 16:00, Avi Gross via Python-list wrote:

> But I wonder how much languages like AWK are still used to make new programs
> as compared to a time they were really useful.

True. I first discovered awk from a Byte article around 1988/9
and it became my goto tool for text munching right up until
I found Python in 1998.

I still use it as part of a unix command pipeline but I rarely
write awk scripts in a file anymore - if it's that complex I
reach for Python.

But at one time I had a dozen or more awk scripts in my ~/bin folder.

I also used awk on a real-world project to process csv files from an
Excel spreadsheet and create site-specific config files for some shiny
new WindowsNT(v3.1) boxes we were using. They had twin network
connections and hard coded IP settings(for resilience) and the network
designers delivered the site settings by Excel. We turned them into .BAT
files using awk.

Eventually, we figured out how to write Excel macros and converted
it all to VBA. Happy days. :-)

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


-- 
https://mail.python.org/mailman/listinfo/python-list


RE: convert script awk in python

2021-03-24 Thread Avi Gross via Python-list
Alan,

Back when various UNIX (later also included in other Operating environments
like Linux and the Mac OS and even Microsoft) utilities came along, the
paradigm was a bit different and some kinds of tasks were seen as being done
with a pipeline of often small and focused utilities. You mentioned SED
which at first seems like a very simple tool but if you look again, it can
replace lots of other tools mostly as you can write one-liners with lots of
power. AWK, in some sense, was even more powerful and can emulate so many
others.

But it came with a cost compared to some modern languages where by attaching
a few modules, you can do much of the same in fewer passes over the data.

I am not sure if I mentioned it here, but I was once on a project that
stored all kinds of billing information in primitive text files using a
vertical bar as a record separator. My boss, who was not really a programmer,
started out analyzing the data fairly primitively and ended up writing
huge shell scripts (ksh, I think) that remotely went to our computers around
the world and gathered the files and processed them through pipelines that
often were 10 or more parts as he selectively broke each line into parts,
removed some and so on. He would use /bin/echo, cut, grep, sed, and so on.
The darn thing ran for hours which was fine when it was running at midnight
in Missouri, but not so much when it ran the same time in countries like
Japan and Israel where the users were awake. I got lots of complaints and
showed him how his entire mess could be replaced mostly by a single AWK
script and complete in minutes.
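
As a rough illustration of the single-pass idea in Python (not the script described above, which is not shown here), a sketch that filters and trims vertical-bar-separated lines in one loop. The field layout, the sample data and the name `select_fields` are invented for the example.

```python
import csv
import io


def select_fields(lines, keep, predicate=lambda row: True):
    """Yield the columns listed in `keep` for each '|'-separated line
    that satisfies `predicate`, reading the input exactly once."""
    for row in csv.reader(lines, delimiter="|"):
        if row and predicate(row):
            yield [row[i] for i in keep]


# Made-up sample data: id | city | amount
sample = io.StringIO("1001|Tokyo|42.50\n1002|Haifa|17.25\n1003|Tokyo|8.00\n")
tokyo = list(select_fields(sample, keep=(0, 2),
                           predicate=lambda r: r[1] == "Tokyo"))
print(tokyo)
```

One process and one pass over the data replace a chain of cut, grep and sed invocations, each of which would otherwise re-read and re-split every line.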

Of course, now, with a fast internet and modern languages that can run
threads in parallel, it probably would complete in seconds. Maybe I would
have translated that AWK to python after all, but these days I am studying
Kotlin so maybe ...

As I see it, many languages have a trade-off. The fact that AWK decided to
allow a variable to be used without any other form of declaration, was a
feature. It could easily lead to errors if you spelled something wrong. But
look at Python. You can use a variable to hold anything just by using it. If
you spell it wrong later when putting something else in it, no problem. You
now have two variables. If you try to access the value of a non-initialized
variable, you get an error. But many more strongly-typed languages would
catch more potential errors. If you store an int in a variable and later
mistakenly put a string in the same variable name, python is happy. And that
can be a GOOD feature for programmers but will not catch some errors.
Initializing variables to 0 really only makes sense for numeric variables.
When a language allows all kinds of "objects" you might need an
object-specific default initialization and for some objects, that makes no
sense. As you note, the POSIX compliant versions of AWK do also initialize,
if needed, to empty strings.
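
The Python behaviour described above can be seen in a few lines. This is only a demonstration; the names `count`, `cuont` and `never_assigned` are made up.

```python
def demo():
    count = 0
    cuont = count + 1        # misspelled target: a second variable appears
    try:
        value = never_assigned   # reading a name that was never bound
    except NameError as exc:
        error = str(exc)
    count = "ten"            # rebinding an int name to a str is accepted
    return count, cuont, error


print(demo())
```

So the misspelling on assignment and the change of type both pass silently, and only the read of an unbound name is reported, at run time.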

But I wonder how much languages like AWK are still used to make new programs
as compared to a time they were really useful. So many people sort of live
within one application in a GUI rather than work at a textual level in a
shell where many problems can rapidly be done with a few smaller tools,
often in a pipeline.

Avi

-Original Message-
From: Python-list  On
Behalf Of Alan Gauld via Python-list
Sent: Wednesday, March 24, 2021 5:28 AM
To: python-list@python.org
Subject: Re: convert script awk in python

On 23/03/2021 14:40, Avi Gross via Python-list wrote:

> $1 == 113 {
> if (x || y || z)
> print "More than one type $8 atom.";
> else {
> x = $2; y = $3; z = $4;
> istep++;
> }
> }
> 
> I am a tad concerned as to where any of the variables x, y or z have
> been defined at this point.

They haven't been, they are using awk's auto-initialization feature.
The variables are defined in this bit of code. The first time we see $1 ==
113 we define the variables. On subsequent appearances we print the warning.

> far as I know has not been called. Weird. Maybe awk is allowing an 
> uninitialized variable to be tested for in your code but if so, you 
> need to be cautious how you do this in python.

It's standard behaviour in any POSIX compliant awk, variables are
initialised to empty strings/arrays or zero as appropriate to first use.

The original AWK book has already been mentioned, which covers nawk.
I'll add the O'Reilly book "sed & awk" which covers the POSIX version and
includes several extensions not covered in the original book. (It also
covers sed but that's irrelevant here)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


-- 
https://mail.python.org/mailman/listinfo/python-list



Re: convert script awk in python

2021-03-24 Thread Alan Gauld via Python-list
On 23/03/2021 14:40, Avi Gross via Python-list wrote:

> $1 == 113 {
> if (x || y || z)
> print "More than one type $8 atom.";
> else {
> x = $2; y = $3; z = $4;
> istep++;
> }
> }
> 
> I am a tad concerned as to where any of the variables x, y or z have been
> defined at this point. 

They haven't been, they are using awk's auto-initialization feature.
The variables are defined in this bit of code. The first time we see $1
== 113 we define the variables. On subsequent appearances we print the
warning.

> far as I know has not been called. Weird. Maybe awk is allowing an
> uninitialized variable to be tested for in your code but if so, you need to
> be cautious how you do this in python.

It's standard behaviour in any POSIX compliant awk, variables are
initialised to empty strings/arrays or zero as appropriate to first use.
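
In Python that implicit initialisation has to be written out. A minimal sketch of the `$1 == 113` rule, assuming each record has already been split into a list of string fields; the function name and the string comparison against "113" are choices made for this example (awk compares numerically, so "113.0" would also match there).

```python
def track_reference_atom(records):
    """Mimic the `$1 == 113 { ... }` rule over pre-split records."""
    x = y = z = 0.0      # awk does this implicitly; Python needs it stated
    istep = 0
    warnings = []
    for fields in records:
        if len(fields) >= 4 and fields[0] == "113":
            if x or y or z:
                # Same test as awk's (x || y || z): true once any is non-zero
                warnings.append("More than one type $8 atom.")
            else:
                x, y, z = (float(v) for v in fields[1:4])
                istep += 1
    return (x, y, z), istep, warnings


records = [["113", "1.0", "2.0", "3.0"],
           ["5", "0", "0", "0"],
           ["113", "4", "5", "6"]]
print(track_reference_atom(records))
```

As in the awk original, an atom sitting exactly at the origin would be indistinguishable from "not yet seen"; using None as the initial value would avoid that, at the cost of no longer being a literal translation.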

The original AWK book has already been mentioned, which covers nawk.
I'll add the O'Reilly book "sed & awk" which covers the POSIX version
and includes several extensions not covered in the original book. (It
also covers sed but that's irrelevant here)

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


-- 
https://mail.python.org/mailman/listinfo/python-list


RE: convert script awk in python

2021-03-24 Thread Avi Gross via Python-list
Cameron,

I agree with you. I first encountered AWK in 1982 when I went to work for
Bell Labs.

I have not had any reason to use AWK since before the year 2000 so I was not
sure that unused variables were initialized to zero. The code seemed to
assume that. I have learned quite a few languages since and after a while,
they tend to blend into each other. 

I think it would indeed have been more AWKthonic (or should that be called
AWKward?) to have a BEGIN section in which functions were declared and
variables clearly initialized but the language does allow some quick and
dirty ways to do things and clearly the original programmer used some.

Which brings us back to languages like python. When I started using AWK and
a slew of other UNIX programs years ago, what I found interesting is how
much AWK was patterned a bit on the C language, not a surprise as the K in
AWK is Brian Kernighan who had a hand in C. But unlike C that made me wait
around as it compiled, AWK was a bit more of an interpreted language and I
could write one-liner shell scripts (well, stretched over a few lines if
needed) that did things. True, if you stuck an entire program in a BEGIN
statement and did not actually loop over data, it seems a tad wasteful. But
sometimes it was handy to use it to test out a bit of C code I was writing
without waiting for the whole compile thing. In a sense, it was a bit like
using the python REPL and getting rapid feedback. Of course, when I was an
early adopter of C++, too many things were not in AWK!

What gets me is the original question which made it sound a bit like asking
how you would translate some fairly simple program from language A to
language B. For some fairly simple programs, the translation effort could be
minimal. There are often trivial mappings between similar constructs. Quite
a bit of python simply takes a block of code in another language that is
between curly braces, and lines it up indented below whatever it modifies
and after a colon. The reverse may be similarly trivial. There are of course
many such changes needed for some languages but when some novel twist is
used that the language does not directly support, you may need to innovate
or do a rewrite that avoids it. But still, except in complicated
expressions, you can rewrite x++ to "x += 1" if that is available or "x = x
+ 1" or "x -> x + 1" or whatever.
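
To make that concrete, here is the minimum-finding loop from the awk script translated twice: once almost token for token, and once the way it would more usually be written in Python. Both function names are invented for this sketch.

```python
def min_literal(distances):
    # awk: min = 1000; for (a = 0; a < na; a++) { if (d < min) min = d; }
    minimum = 1000
    a = 0
    while a < len(distances):
        d = distances[a]
        if d < minimum:
            minimum = d
        a += 1               # the awk a++
    return minimum


def min_idiomatic(distances):
    # Same result, including the 1000 ceiling inherited from the awk code.
    return min([1000, *distances])


print(min_literal([3.0, 1.5, 2.0]), min_idiomatic([3.0, 1.5, 2.0]))
```

The literal version shows how mechanical the mapping can be; the second shows why a straight transliteration is rarely where one wants to stop.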

What gets me here is that AWK in his program was being used for exactly
what it was designed to do. Python is more general-purpose. Had we been asked (not
on this forum) to convert that AWK script to PERL, it would have been much
more straightforward because PERL was also designed to be able to read in
lines and break them into parts and act on them. It has constructs like the
diamond operator or split that make it easy.

Hence, at the end, I suggested Tomasz may want to do his task not using just
basic python but some module others have already shared that emulates some
of the filter aspects of AWK. That may make it easier to just translate the
bits of code to python while largely leaving the logic in place, depending
on the module.
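
Even without a third-party module, the standard library gets fairly close to the read-a-line, split-it, act-on-it loop. A minimal sketch using `fileinput`; the helper names `records` and `main` are made up, and the rule shown inside `main` is only an example.

```python
import fileinput


def records(lines):
    """Yield (line, fields) pairs; fields[0] plays the role of awk's $1."""
    for line in lines:
        line = line.rstrip("\n")
        yield line, line.split()


def main():
    # fileinput.input() reads the files named on the command line, or
    # standard input if there are none, much like Perl's <> operator.
    for line, fields in records(fileinput.input()):
        if len(fields) > 1 and fields[0] == "113":   # awk: $1 == 113 { print $2 }
            print(fields[1])
```

Note that Python indexes fields from 0, so awk's $1 becomes fields[0], which is an easy source of off-by-one mistakes when translating.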

Just to go way off the rails, was our annoying cross-poster from a while
back also promising to include a language like AWK into their universal
translator by just saving some JSON descriptions?

-Original Message-
From: Python-list  On
Behalf Of Cameron Simpson
Sent: Tuesday, March 23, 2021 6:38 PM
To: Tomasz Rola 
Cc: Avi Gross via Python-list 
Subject: Re: convert script awk in python

On 23Mar2021 16:37, Tomasz Rola  wrote:
>On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list wrote:
>[...]
>> I am a tad concerned as to where any of the variables x, y or z have
>> been defined at this point. I have not seen a BEGIN {...} 
>> pattern/action or anywhere these have been initialized but they are 
>> set in a function that as far as I know has not been called. Weird. 
>> Maybe awk is allowing an uninitialized variable to be tested for in 
>> your code but if so, you need to be cautious how you do this in python.
>
>As far as I can tell, the type of an uninitialised variable is grokked from
>the first operation on it. I.e., "count += 1" first initializes count
>to 0 and then adds 1.
>
>This might depend on the exact awk being used. There have been a few of them
>during the last 30+ years. I just assume it behaves as I wrote above.

I'm pretty sure this behaviour's been there since very early times. I think
it was there when I learnt awk, decades ago.

>Using BEGIN would be in better style, of course.

Aye. Always good to be up front about initial values.

>There is a very nice book, "The AWK Programming Language" by Aho, 
>Kernighan and Weinberger. First printed in 1988, now free and in pdf 
>format. Go search.

Yes, a really nice book. [... walks into the other room to get his copy ...]
October 1988.

Wow. There're 11 pages 

Re: convert script awk in python

2021-03-23 Thread Cameron Simpson
On 23Mar2021 16:37, Tomasz Rola  wrote:
>On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list wrote:
>[...]
>> I am a tad concerned as to where any of the variables x, y or z have been
>> defined at this point. I have not seen a BEGIN {...} pattern/action or
>> anywhere these have been initialized but they are set in a function that as
>> far as I know has not been called. Weird. Maybe awk is allowing an
>> uninitialized variable to be tested for in your code but if so, you need to
>> be cautious how you do this in python.
>
>As far as I can tell, the type of an uninitialised variable is grokked from
>the first operation on it. I.e., "count += 1" first initializes count
>to 0 and then adds 1.
>
>This might depend on the exact awk being used. There have been a few of them
>during the last 30+ years. I just assume it behaves as I wrote above.

I'm pretty sure this behaviour's been there since very early times. I 
think it was there when I learnt awk, decades ago.

>Using BEGIN would be in better style, of course.

Aye. Always good to be up front about initial values.

>There is a very nice book, "The AWK Programming Language" by Aho,
>Kernighan and Weinberger. First printed in 1988, now free and in pdf
>format. Go search.

Yes, a really nice book. [... walks into the other room to get his copy 
...] October 1988.

Wow. There're 11 pages of good example programmes before any need for 
user variables at all. But at "1.5, Counting" is the sentence:

Awk variables used as numbers begin life with the value 0, so we 
don't need to initialise emp.

Which is great for writing ad hoc scripts, particularly on the command 
line. But not a great style for anything complex.
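
For contrast, a small Python sketch of the same counting idiom. Python needs the counter initialised explicitly, although `collections.Counter` gives something close to awk's "starts at 0" behaviour for keyed tallies. The function names and the sample data are invented for this example.

```python
from collections import Counter


def count_over(lines, column, threshold):
    """Count lines whose 1-based `column` exceeds `threshold`."""
    emp = 0                      # must be spelled out in Python
    for line in lines:
        fields = line.split()
        if len(fields) >= column and float(fields[column - 1]) > threshold:
            emp += 1
    return emp


def tally_first_field(lines):
    """Count how often each first field occurs."""
    counts = Counter()           # missing keys read as 0, as in awk arrays
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


sample = ["a 1 20", "b 2 5", "a 3 18"]
print(count_over(sample, 3, 15), tally_first_field(sample))
```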

Cheers,
Cameron Simpson 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: convert script awk in python

2021-03-23 Thread Tomasz Rola
On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list wrote:
> Alberto,
> 
[...]
> I am a tad concerned as to where any of the variables x, y or z have been
> defined at this point. I have not seen a BEGIN {...} pattern/action or
> anywhere these have been initialized but they are set in a function that as
> far as I know has not been called. Weird. Maybe awk is allowing an
> uninitialized variable to be tested for in your code but if so, you need to
> be cautious how you do this in python.

As far as I can tell, the type of an uninitialised variable is grokked from
the first operation on it. I.e., "count += 1" first initializes count
to 0 and then adds 1.

This might depend on the exact awk being used. There have been a few of them
during the last 30+ years. I just assume it behaves as I wrote above.

Using BEGIN would be in better style, of course.

There is a very nice book, "The AWK Programming Language" by Aho,
Kernighan and Weinberger. First printed in 1988, now free and in pdf
format. Go search. Perhaps it is easier to make the script work rather
than rewriting it in another language. Both ways require a deep
understanding of the current code, and with a rewrite one also has to make
sure the new code is a drop-in replacement for the old.
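
For anyone who does go the rewrite route, here is one possible Python sketch of the posted awk logic. It is an untested-against-real-data translation under several assumptions: fields are compared as strings ("113" and "10") rather than numerically as awk does, lines too short to hold the needed fields are ignored, and `math.dist` requires Python 3.8 or later. The function name `min_distances` and its parameters are invented here.

```python
import math


def min_distances(lines, ref_id="113", surface_type="10"):
    """Return the output lines the awk script would print."""
    out = []
    x = y = z = 0.0          # awk's implicit zero initialisation
    istep = 0
    pos = []                 # positions of the atoms matched by $8 == 10

    def flush():             # the awk print_distances()
        nonlocal x, y, z
        if not pos:
            out.append("No type 8 atoms.")
            return
        # 1000.0 reproduces the ceiling used by the awk code (min = 1000)
        smallest = min([1000.0] + [math.dist((x, y, z), p) for p in pos])
        out.append("%6i%7.5f" % (istep, smallest))
        x = y = z = 0.0
        pos.clear()

    for line in lines:
        f = line.split()
        if len(f) >= 4 and f[0] == ref_id:             # $1 == 113
            if x or y or z:
                out.append("More than one type $8 atom.")
            else:
                x, y, z = float(f[1]), float(f[2]), float(f[3])
                istep += 1
        if len(f) >= 8 and f[7] == surface_type:       # $8 == 10
            pos.append((float(f[1]), float(f[2]), float(f[3])))
        if line.startswith("ITEM: ATOMS") and pos:     # /^ITEM: ATOMS/ && na != 0
            flush()
    flush()                                            # END
    return out
```

Whether this is a drop-in replacement would have to be checked by running both versions over the same .lammpstrj file and comparing the outputs, for the reasons given above.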

There is also very nice documentation for gawk (GNU AWK).

-- 
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.  **
** As the answer, master did "rm -rif" on the programmer's home**
** directory. And then the C programmer became enlightened...  **
** **
** Tomasz Rola  mailto:tomasz_r...@bigfoot.com **
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: convert script awk in python

2021-03-23 Thread Avi Gross via Python-list
ou:

https://www.google.com/search?q=python+awk+module

-Original Message-----
From: Python-list  On
Behalf Of alberto
Sent: Tuesday, March 23, 2021 7:32 AM
To: python-list@python.org
Subject: convert script awk in python

Hi to everyone, I have an awk script that calculates minimum distances between
points

## atom type frag - atom type surface
#!/bin/bash

FILE1=$1.lammpstrj

if [ -f $FILE1 ];
then

awk 'function sq(x) {
return x * x;
}
function dist(x1, y1, z1, x2, y2, z2) {
return sqrt(sq(x1 - x2) + sq(y1 - y2) + sq(z1 - z2));
}
function print_distances() {
if (na == 0)
print "No type 8 atoms.";
else {
min = 1000;
for (a = 0; a < na; a++) {
d = dist(x, y, z, pos[a,"x"], pos[a,"y"], pos[a,"z"]);
#printf "%7.5f ", d;
if (d < min) min = d;
}
printf "%6i%7.5f\n", istep, min;
x = y = z = 0;
delete pos;
na = 0;
}
}
$1 == 113 {
if (x || y || z)
print "More than one type $8 atom.";
else {
x = $2; y = $3; z = $4;
istep++;
}
}
$8 == 10 {
pos[na,"x"] = $2; pos[na,"y"] = $3; pos[na,"z"] = $4;
na += 1;
}
/^ITEM: ATOMS/ && na != 0 { print_distances(); }
END   { print_distances(); }
' $1.lammpstrj > $1_mindist.txt
fi

where $1 is a particular atom and $8 is another type of atom

How could I prepare a python script?

regards

A
--
https://mail.python.org/mailman/listinfo/python-list



convert script awk in python

2021-03-23 Thread alberto
Hi to everyone, I have an awk script that calculates minimum distances between
points

## atom type frag - atom type surface
#!/bin/bash

FILE1=$1.lammpstrj

if [ -f $FILE1 ];
then

awk 'function sq(x) {
return x * x;
}
function dist(x1, y1, z1, x2, y2, z2) {
return sqrt(sq(x1 - x2) + sq(y1 - y2) + sq(z1 - z2));
}
function print_distances() {
if (na == 0)
print "No type 8 atoms.";
else {
min = 1000;
for (a = 0; a < na; a++) {
d = dist(x, y, z, pos[a,"x"], pos[a,"y"], pos[a,"z"]);
#printf "%7.5f ", d;
if (d < min) min = d;
}
printf "%6i%7.5f\n", istep, min;
x = y = z = 0;
delete pos;
na = 0;
}
}
$1 == 113 {
if (x || y || z)
print "More than one type $8 atom.";
else {
x = $2; y = $3; z = $4;
istep++;
}
}
$8 == 10 {
pos[na,"x"] = $2; pos[na,"y"] = $3; pos[na,"z"] = $4;
na += 1;
}
/^ITEM: ATOMS/ && na != 0 { print_distances(); }
END   { print_distances(); }
' $1.lammpstrj > $1_mindist.txt
fi

where $1 is a particular atom and $8 is another type of atom

How could I prepare a python script?

regards

A
-- 
https://mail.python.org/mailman/listinfo/python-list