Add a field in both files that is the uint32 version of the IP.
Sort both files ascending by the new field.
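Roughly (an untested sketch; file names taken from your mail below, and I'm assuming the IP column is named "IP" in both files):

# Load each file, add a numeric sort key, and sort ascending by it.
$list = Import-Csv C:\IP_Addresses.csv |
    Select-Object *, @{ n = 'IpNum'; e = { ([IpAddress]$_.IP).Address } } |
    Sort-Object IpNum
$db = Import-Csv C:\DB_Export.csv |
    Select-Object *, @{ n = 'IpNum'; e = { ([IpAddress]$_.IP).Address } } |
    Sort-Object IpNum

You can Export-Csv these back out if you want the files themselves updated, but for the comparison it's fine to keep both in memory.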
Since both files are then sorted, Compare-Object will be very fast. If you
want it even faster, you can write a couple of while() loops yourself to do
the comparisons.
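The hand-rolled version boils down to one merge pass over the two sorted arrays, something like this (untested sketch; $list and $db as above, and assuming each IP appears at most once per file):

$i = 0
$j = 0
$found = while ($i -lt $list.Count -and $j -lt $db.Count) {
    if ($list[$i].IpNum -lt $db[$j].IpNum) { $i++ }
    elseif ($list[$i].IpNum -gt $db[$j].IpNum) { $j++ }
    else {
        $db[$j]   # matched: emit the full DB row (IP, Name, FirstSeen, LastSeen)
        $i++
        $j++
    }
}

That's a single pass over 9k + 260k rows, so it stays linear instead of doing 9k scans of the big list.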
There are a variety of mechanisms for converting the text IP to a uint32 IP.
The easiest is not the fastest. :) Whether that matters depends on how often
you will need to do this.
Probably the easiest:
( '192.168.231.10' -as [IpAddress] ).Address
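One caveat: .Address returns the IPv4 as a long in network byte order, so the numeric value has the octets reversed and won't sort in natural dotted-quad order. It's still a perfectly good join key as long as both files use it. If you want a key in natural order, or just want it faster, packing the octets yourself is one option (untested; the function name is mine):

function ConvertTo-UInt32Ip([string]$Ip) {
    $o = $Ip.Split('.')
    # Pack big-endian so numeric order matches natural IP order.
    [uint32](([int64]$o[0] -shl 24) -bor ([int64]$o[1] -shl 16) -bor ([int64]$o[2] -shl 8) -bor [int64]$o[3])
}

ConvertTo-UInt32Ip '192.168.231.10'   # 3232294666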
From: [email protected] [mailto:[email protected]] On
Behalf Of Orlebeck, Geoffrey
Sent: Tuesday, May 24, 2016 6:03 PM
To: '[email protected]'
Subject: [powershell] CSV Comparisons:
I have two CSV files. One is a list of IP addresses and DNS hostnames. The
other has IP, DNS name (if it resolves) as well as "firstseen" and "lastseen"
on the network. I am attempting to compare the list of IP addresses to the
second list that contains additional fields.
The dataset has the following format:
[inline screenshot of sample rows; columns are IP, Name, FirstSeen, LastSeen]
The goal is to loop through the list of IP addresses and check for matches in
the CSV file. If found, I want to note the entire row with all fields
(IP,Name,FirstSeen,LastSeen). I'm not sure of the most efficient way to process
this. The smaller list is ~9k entries. The larger list has over 260k entries.
On a small scale I am able to see this logic will at least give me a list of IP
addresses found in both files:
$List = Import-Csv C:\IP_Addresses.csv
$CSV = Import-Csv C:\DB_Export.csv
# $Matches is an automatic variable in PowerShell, so use a different name.
$Found = $List | % { $CSV.ip -like $_.ip }
The efficiency of this method is where I'm concerned. I'm currently working
with a limited subset of the data (about 500 entries each), and it takes a few
minutes to parse. However, with the full list being 9,000 IP addresses against
260,000+ rows in $CSV, I have a feeling it will take quite some time to process.
I'm wondering if anyone has any examples of such data comparisons. My end goal
is to parse the 9k IP addresses, compare them against our 260k entries, and
eventually filter based on the 'LastSeen' field. My gut says using
Compare-Object or the above example is a bad process, but I'm not sure how
else to go about processing such large files. I read about the differences
between Get-Content and StreamReader here
(https://foxdeploy.com/2016/03/23/coding-for-speed/) and thought there is
probably a better way than the built-in cmdlets I am attempting to use.
Thank you.
-Geoff