On 25/03/2021 08:14, Loris Bennett wrote:

I'm not doing that, but I am trying to replace a longish bash pipeline
with Python code.

Within Emacs, I often use Org mode[1] to generate data via some bash
commands and then visualise the data via Python.  Thus, in a single Org
file I run

   /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
       xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | \
       awk '{print $3 " " $9}' | sed 's/%//g'

The raw numbers are formatted by Org into a table

   | cpu_eff | mem_eff |
   |---------+---------|
   |    96.6 |   99.11 |
   |   93.43 |   100.0 |
   |    91.3 |   100.0 |
   |   88.71 |   100.0 |
   |   89.79 |   100.0 |
   |   84.59 |   100.0 |
   |   83.42 |   100.0 |
   |   86.09 |   100.0 |
   |   92.31 |   100.0 |
   |   90.05 |   100.0 |
   |   81.98 |   100.0 |
   |   90.76 |   100.0 |
   |   75.36 |   64.03 |

I then read this into some Python code in the Org file and do something like

   df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
   cpu_data = df.loc[: , "cpu_eff"]
   mem_data = df.loc[: , "mem_eff"]

   ...

   n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
   n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))

which generates nice histograms.
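
For reference, a self-contained version of that plotting step might
look like the sketch below; the two-panel figure setup is a guess,
since the original elides it, and eff_tab is assumed to arrive from
Org as a list of rows with the header row first:

   import matplotlib.pyplot as plt
   import pandas as pd

   # eff_tab comes in from the Org table: header row, then data rows.
   df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])

   fig, axis = plt.subplots(2)  # assumed: two stacked panels
   axis[0].hist(df["cpu_eff"], bins=range(0, 110, 5))
   axis[1].hist(df["mem_eff"], bins=range(0, 110, 5))
   axis[0].set_xlabel("cpu_eff (%)")
   axis[1].set_xlabel("mem_eff (%)")
   fig.tight_layout()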

I decided to rewrite the whole thing as a stand-alone Python program so
that I can run it as a cron job.  However, as a novice Python programmer
I am finding the translation of the bash part slightly clunky.  I am in
the middle of doing this and started with the following:

         sacct = subprocess.Popen(["/usr/bin/sacct",
                                   "-u", user,
                                   "-S", period[0], "-E", period[1],
                                   "-o", "jobid", "-X",
                                   "-s", "COMPLETED", "-n"],
                                  stdout=subprocess.PIPE,
         )

         jobids = []

         for line in sacct.stdout:
             jobid = str(line.strip(), 'UTF-8')
             jobids.append(jobid)

         for jobid in jobids:
             seff = subprocess.Popen(["/usr/bin/seff", jobid],
                                     stdin=sacct.stdout,
                                     stdout=subprocess.PIPE,
             )

The statement above looks odd. If seff can read the jobids from stdin, there should be no need to pass them individually, like this:

sacct = ...
seff = subprocess.Popen(
    ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
    universal_newlines=True,
)
for line in seff.communicate()[0].splitlines():
    ...
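
If, on the other hand, seff only takes the job id as an argument, which
the `xargs -I {}` in the original pipeline suggests, a per-id
subprocess.run call (Python 3.7+ for these keyword arguments) is still
tidier than wiring Popen objects together:

# Assumes seff accepts the job id as an argument, as the original
# pipeline's `xargs -I {}` usage suggests.
for jobid in jobids:
    result = subprocess.run(
        ["/usr/bin/seff", jobid],
        capture_output=True, text=True, check=True,
    )
    seff_output = result.stdout.splitlines()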


             seff_output = []
             for line in seff.stdout:
                 seff_output.append(str(line.strip(), "UTF-8"))

             ...

but compared to the bash pipeline, this all seems a bit laboured.

Does anyone have a better approach?

Cheers,

Loris
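
As a sketch of a less laboured structure: the grep/sed/awk tail of the
pipeline collapses into one regular expression over each seff report,
and the results can feed straight into pandas. This assumes seff prints
lines such as "CPU Efficiency: 96.60% of ..." and "Memory Efficiency:
99.11% of ...", which is what the grep 'Efficiency' and the awk field
positions in the original pipeline imply; user, start and end are the
same variables as in the script above:

import re
import subprocess

import pandas as pd

def completed_job_ids(user, start, end):
    # One sacct call, with the same flags as the original pipeline.
    result = subprocess.run(
        ["/usr/bin/sacct", "-u", user, "-S", start, "-E", end,
         "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()

def efficiencies(jobid):
    # Assumed seff output format: "CPU Efficiency: 96.60% of ..."
    # and "Memory Efficiency: 99.11% of ...".
    result = subprocess.run(
        ["/usr/bin/seff", jobid],
        capture_output=True, text=True, check=True,
    )
    effs = dict(re.findall(r"(CPU|Memory) Efficiency: ([\d.]+)%",
                           result.stdout))
    return float(effs["CPU"]), float(effs["Memory"])

rows = [efficiencies(j) for j in completed_job_ids(user, start, end)]
df = pd.DataFrame(rows, columns=["cpu_eff", "mem_eff"])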


-----Original Message-----
From: Cameron Simpson <c...@cskk.id.au>
Sent: Wednesday, March 24, 2021 6:34 PM
To: Avi Gross <avigr...@verizon.net>
Cc: python-list@python.org
Subject: Re: convert script awk in python

On 24Mar2021 12:00, Avi Gross <avigr...@verizon.net> wrote:
But I wonder how much languages like AWK are still used to make new
programs, as compared to a time when they were really useful.

You mentioned in an adjacent post that you've not used AWK since 2000.
By contrast, I still use it regularly.

It's great for proof of concept at the command line or in small scripts, and
as the innards of quite useful scripts. I've a trite "colsum" script which
does nothing but generate and run a little awk programme to sum a column,
and routinely type "blah .... | colsum 2" or the like to get a tally.
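
For comparison, a hypothetical Python analogue of such a colsum, run as
"blah ... | python3 colsum.py 2", is noticeably wordier than the awk
version:

# colsum.py -- hypothetical Python analogue: sum one
# whitespace-separated column (1-based) of stdin.
import sys

col = int(sys.argv[1]) - 1
print(sum(float(line.split()[col]) for line in sys.stdin))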

I totally agree that once you're processing a lot of data, or a shell
script is making long pipelines or many command invocations, and that
becomes a performance issue, it is time to recode.

Cheers,
Cameron Simpson <c...@cskk.id.au>

Footnotes:
[1]  https://orgmode.org/


