Hi,

I'm using Python to parse metrics out of log files and ship them to a 
database called InfluxDB, using its Python driver 
(https://github.com/influxdb/influxdb-python).

With InfluxDB, it's more efficient to pack more points into each message.

Hence, I'm using the grouper() recipe from the itertools documentation 
(https://docs.python.org/3.6/library/itertools.html) to process the data in 
chunks, and then shipping off the points at the end of each chunk:

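  from itertools import zip_longest

  def grouper(iterable, n, fillvalue=None):
      "Collect data into fixed-length chunks or blocks"
      # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
      # The same iterator repeated n times, so zip_longest pulls n items per chunk.
      args = [iter(iterable)] * n
      return zip_longest(*args, fillvalue=fillvalue)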
  ....
  for chunk in grouper(parse_iostat(f), 500):
      json_points = []
      for block in chunk:
          if block:
              try:
                  for i, line in enumerate(block):
                      # DO SOME STUFF
              except ValueError as e:
                  print("Bad output seen - skipping")
      client.write_points(json_points)
      print("Wrote in {} points to InfluxDB".format(len(json_points)))


However, for some parsers, not every line will yield a datapoint.
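
To illustrate the problem, here is a minimal sketch with made-up sample data (the `lines` list and the 'cat' filter are placeholders, not my real parser). Chunking the input into groups of 3 gives batches of very uneven size once the non-matching lines are dropped:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks (itertools recipe)."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Hypothetical sample input: only lines containing 'cat' yield a point.
lines = ["cat", "dog", "dog", "cat", "cat", "cat", "dog", "dog", "dog"]

batch_sizes = []
for chunk in grouper(lines, 3):
    # Skip the None padding and any line that does not produce a point.
    points = [line for line in chunk if line is not None and "cat" in line]
    batch_sizes.append(len(points))

print(batch_sizes)  # [1, 3, 0]
```

So with sparse input, a chunk of 500 lines could end up sending only a handful of points, or none at all.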

I'm wondering if, rather than trying to chunk the input, it might be better 
to just check len() on the points list each time round the loop, and send it 
off once it reaches the batch size. E.g.:

    #!/usr/bin/env python3

    json_points = []
    _BATCH_SIZE = 2

    # Use a context manager so the file is closed when the loop ends.
    with open('blah.txt', 'r') as f:
        for line_number, line in enumerate(f):
            if 'cat' in line:
                print('Found cat on line {}'.format(line_number + 1))
                json_points.append(line_number)
                print("json_points contains {} points".format(len(json_points)))
            if len(json_points) >= _BATCH_SIZE:
                # print("json_points contains {} points".format(len(json_points)))
                print('Sending off points!')
                json_points = []
            
    print("Loop finished. json_points contains {} points".format(len(json_points)))
    print('Sending off points!')

Does the above seem reasonable? Any issues you see? Or are there any other more 
efficient approaches to doing this?
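
One alternative I've been toying with is to make the parser a generator that yields only the valid points, and then batch that output with itertools.islice. This is only a rough sketch: `batches_of` and `parse_points` are names I made up for illustration, and the 'cat' check stands in for the real parsing.

```python
from itertools import islice

def batches_of(iterable, n):
    """Yield lists of up to n items; the last list may be shorter."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, n))
        if not batch:
            return
        yield batch

def parse_points(lines):
    """Hypothetical parser: yield a point only for lines that produce one."""
    for line_number, line in enumerate(lines, start=1):
        if "cat" in line:
            yield line_number

sample = ["cat", "dog", "cat", "cat", "dog", "cat", "cat"]
batches = list(batches_of(parse_points(sample), 2))
print(batches)  # [[1, 3], [4, 6], [7]]
```

Each batch would then go to client.write_points(), and the leftover partial batch comes out of the same loop rather than needing a separate send after it.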