Re: Merging Related Information from 2 Tables

2009-10-30 Thread Jonathan McKeown
On Thursday 29 October 2009 20:44:12 Martin McCormick wrote:
 Giorgos Keramidas writes:
  You should use a Perl or Python script, and a hash...
 
  If you show us a few sample lines from the input file and how you want
  the output to look, it shouldn't be too hard to quickly hack one of those
  together.

The alternative is to use join(1).

   A records look like:

 hydrogen.cis.osu. 43200   IN  A   192.168.2.123

 Text or TXT records look similar [...]

 hydrogen.cis.osu. 5   IN  TXT cordell-north,009,192.168.2.123

This will work well since the default join field is the first field in the 
line.

Jonathan
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Merging Related Information from 2 Tables

2009-10-29 Thread Martin McCormick
This is probably going to be a hashing exercise but I am
checking to see if any of the building blocks needed are already
out there.

The problem is simple to describe in that there are 2
tables. One is a DNS zone transfer table of all the A or Address
records in a given zone or from several zones for that matter.
the other table is from the same zones and consists of text or
TXT records. The only thing the 2 tables have in common is that
some of the TXT records share the exact same name field as the A
records so we should be able to display the important contents
of the A and TXT records on the same line if their names match.
The challenge is to do this quickly so some sort of hash
function is needed to locate A and TXT records having
the same name.

Grep does this beautifully for single entries across multiple
files, but I need to merge the text part of the TXT record with
the IP address and host name from the A record with the same
name. The only hard part is finding the quickest way to match
the roughly 25,000 host names in the A records with around half
as many TXT records. This is basically a bucket list problem in
which we can either have an A record name in a bucket by itself
or an A record in a given bucket and a TXT record in another
bucket with the same name as the A record.

In the interest of standing on the shoulders of giants,
I am checking to see how much tried and tested tools already
exist and how much needs to be home-grown.

It is also possible to use egrep to search for A and TXT
records in 1 pass through a file in which case one would search
from the same file for both record types but the problem is the
same. In case anybody wonders:

egrep '([[:space:]]IN([[:space:]]TXT[[:space:]]|[[:space:]]A[[:space:]]))' 
okstate.zone ATXT.txt

The line break here is for Email consideration. The above
command should all be on one line.

Thanks for any suggestions.

Martin McCormick WB5AGZ  Stillwater, OK 
Systems Engineer
OSU Information Technology Department Telecommunications Services Group
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Merging Related Information from 2 Tables

2009-10-29 Thread Giorgos Keramidas
On Thu, 29 Oct 2009 10:38:56 -0500, Martin McCormick 
mar...@dc.cis.okstate.edu wrote:
 This is probably going to be a hashing exercise but I am checking to see
 if any of the building blocks needed are already out there.

 The problem is simple to describe in that there are 2 tables. One is a
 DNS zone transfer table of all the A or Address records in a given zone
 or from several zones for that matter.  the other table is from the same
 zones and consists of text or TXT records. The only thing the 2 tables
 have in common is that some of the TXT records share the exact same name
 field as the A records so we should be able to display the important
 contents of the A and TXT records on the same line if their names match.
 The challenge is to do this quickly so some sort of hash function is
 needed to locate A and TXT records having the same name.

Hi Martin,

You should use a Perl or Python script, and a hash...

If you show us a few sample lines from the input file and how you want the
output to look, it shouldn't be too hard to quickly hack one of those
together.

With a short input file like this:

: keram...@kobe:/tmp$ cat input-file
: localhost   IN  A   127.0.0.1
: kobeIN  A   127.0.0.1
: kobeIN  TXT This is a test

You can construct a hash map of hostname - list of records in
Python with a relatively short script:

: #!/usr/bin/env python
:
: import re
: import sys
:
: are = None  # a regexp for matching 'A' records
: txtre = None# a regexp for matching 'TXT' records
:
: try:
: are = 
re.compile(r'^\s*(\S+)\s+[iI][nN]\s+[aA]\s+(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)).*$')
: txtre = re.compile(r'^\s*(\S+)\s+[iI][nN]\s+[tT][xX][tT]\s+(.*)$')
: except Exception, inst:
: sys.stderr.write('regexp error: %s' % str(inst))
: sys.exit(1)
:
: hosts = {}
:
: for l in sys.stdin.readlines():
: l = l.rstrip('\n\r')
: # Is this an A record?
: m = are.match(l)
: if m:
: (name, addr) = (m.group(1), m.group(2))
: rec = ('A', addr)
: if not name in hosts:
: hosts[name] = [rec]
: else:
: hosts[name].append(rec)
: # Is this a TXT record?
: m = txtre.match(l)
: if m:
: (name, text) = (m.group(1), m.group(2))
: rec = ('TXT', text)
: if not name in hosts:
: hosts[name] = [rec]
: else:
: hosts[name].append(rec)
:
: print hosts

Running this script should produce something like:

: keram...@kobe:/tmp$ python martin.py  input-file 
: {'kobe': [('A', '127.0.0.1'), ('TXT', 'This is a test')],
:  'localhost': [('A', '127.0.0.1')]}

When you have the hash map of hostname to record-list for each host, you
can select and print any combination of host=record from this hash.

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Merging Related Information from 2 Tables

2009-10-29 Thread Martin McCormick
Giorgos Keramidas writes:
 You should use a Perl or Python script, and a hash...
 
 If you show us a few sample lines from the input file and how you want the
 output to look, it shouldn't be too hard to quickly hack one of those
 together.

Perl and python-- I wasn't even thinking of that! Thank
you. I have installed python now on the FreeBSD system and will
start learning it.

A records look like:

hydrogen.cis.osu. 43200 IN  A   192.168.2.123

Text or TXT records look similar except that the data they
convey are ASCII text strings of various information that are
either read by people or maybe tell servers how to behave toward
that particular client.

hydrogen.cis.osu. 5 IN  TXT cordell-north,009,192.168.2.123

Our hope is to have an output line looking like:

192.168.2.123 hydrogen.cis.osu cordell-north,009,192.168.2.123

We will actually run that output through sed to convert
the 's to blanks and also the ,'s to blanks but that is
trivial.

Thanks for the examples.

Martin McCormick
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Merging Related Information from 2 Tables

2009-10-29 Thread Giorgos Keramidas
On Thu, 29 Oct 2009 18:37:09 +0200, Giorgos Keramidas 
keram...@ceid.upatras.gr wrote:
 You should use a Perl or Python script, and a hash...
 ...
 Running this script should produce something like:

 : keram...@kobe:/tmp$ python martin.py  input-file
 : {'kobe': [('A', '127.0.0.1'), ('TXT', 'This is a test')],
 :  'localhost': [('A', '127.0.0.1')]}

 When you have the hash map of hostname to record-list for each host, you
 can select and print any combination of host=record from this hash.

On Thu, 29 Oct 2009 13:44:12 -0500, Martin McCormick 
mar...@dc.cis.okstate.edu wrote:
 Perl and python-- I wasn't even thinking of that! Thank you. I have
 installed python now on the FreeBSD system and will start learning it.

   A records look like:

 hydrogen.cis.osu. 43200   IN  A   192.168.2.123

 Text or TXT records look similar except that the data they
 convey are ASCII text strings of various information that are
 either read by people or maybe tell servers how to behave toward
 that particular client.

 hydrogen.cis.osu. 5   IN  TXT cordell-north,009,192.168.2.123

Once you slurp all the A and TXT records in a hash-map or another data
structure of your own with Python, you can iterate over the hash and
print parts or all of it.  For example, if you have the hash I printed
in my previous reply, you can print all addresses and text records with
a small bit of code:

: keram...@kobe:/home/keramida$ cat hello.py
: #!/usr/bin/env python
:
: hosts = {'kobe': [('A', '127.0.0.1'),
:   ('TXT', 'This is a test')],
:  'localhost': [('A', '127.0.0.1')]}
:
: for h in sorted(hosts):
: addrs = [x[1] for x in hosts[h] if x[0] == 'A']
: txts = [x[1] for x in hosts[h] if x[0] == 'TXT']
: for a in addrs:
: if len(txts) == 0:
: txts = []
: for t in txts:
: print %-20s %-30s %s % (a, h, t)
: keram...@kobe:/home/keramida$ python hello.py
: 127.0.0.1kobe   This is a test
: 127.0.0.1localhost
: keram...@kobe:/home/keramida$

Add or remove formatting as you see fit :-)

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org