I appreciate all of your responses. I plan on reviewing both the uint32 and
binary options for future use, but in this instance I just needed the data for
a one-off. I’m sure others will pick this apart for its lack of refinement, but
it got the job done for me:
$DNS = Import-Csv 'C:\dns.csv'
$CSV = Import-Csv 'C:\db_dump.csv'
$Valid = New-Object System.Collections.ArrayList
$NotValid = New-Object System.Collections.ArrayList
$Stale = New-Object System.Collections.ArrayList

# Populate hashtable with "IP = Hostname" from the DNS dump file
$Hash = @{}
Foreach($Entry in $DNS)
{
    # Index assignment avoids the exception Add() throws on a duplicate IP
    $Hash[$Entry.IP] = $Entry.Name
}

# Iterate through db_dump entries and find matches
Foreach($Object in $CSV)
{
    If($Hash.ContainsKey($Object.IP))
    {
        [Void]$Valid.Add($Object)
    }
}

# Iterate through the DNS entries; if an IP was never matched, add it to $Stale
Foreach($Item in $DNS)
{
    If(!($Valid.IP -contains $Item.IP))
    {
        [Void]$Stale.Add($Item)
    }
}

# Iterate $Stale entries, ping each host, record results, and export to CSV
$PingResults = @()
Foreach($Server in $Stale.Name)
{
    $Status = @{ "ServerName" = $Server; "TimeStamp" = (Get-Date -f s) }
    if (Test-Connection $Server -Count 1 -ErrorAction SilentlyContinue -Quiet)
    {
        $Status["Results"] = "Up"
    }
    else
    {
        $Status["Results"] = "Down"
    }
    $PingResults += New-Object -TypeName PSObject -Property $Status
}
$PingResults | Export-Csv -Path 'C:\stale_ip_results.csv' -NoTypeInformation
For future projects and/or anything that will require scheduled refreshes, I
really like the idea of the additional field to compare, as well as possibly
tacking on multi-threading. Thanks to Mike for the hash table idea, as it helped
speed up my querying. I'm all about learning and take no offense when people
point out issues with my code, so I always appreciate feedback, so long as it's
not just "that sucks". If you can offer other suggestions, or even just "hey,
look at ____ method for handling such a task", it's likely I didn't use it
because I didn't know about it or how to use it.
Thanks again one and all!
-Geoff
From: [email protected] [mailto:[email protected]] On
Behalf Of Devin Rich
Sent: Wednesday, May 25, 2016 12:55 PM
To: [email protected]
Subject: Re: [powershell] RE: CSV Comparisons:
Ok, time to pull out the black magic!
There are the new magic .Where() and .ForEach() methods that are MUCH faster
than anything you have used. This is a simple way to make your code go many
times faster. How many times faster? Many times. Syntax details:
http://www.powershellmagazine.com/2014/10/22/foreach-and-where-magic-methods/
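To make that concrete, here is a minimal sketch of the magic methods next to the classic pipeline, assuming a $CSV collection with IP and Name properties like Geoff's (the literal IP is just an example):

```powershell
# Classic pipeline: objects stream through Where-Object one at a time
$matched1 = $CSV | Where-Object { $_.IP -eq '192.168.1.10' }

# Magic .Where() method (PowerShell 4+): filters the whole collection in-process
$matched2 = $CSV.Where({ $_.IP -eq '192.168.1.10' })

# Magic .ForEach() method: same idea for per-item work
$names = $CSV.ForEach({ $_.Name })
```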
Next idea: BINARY SEARCH! yay! If you sort your 260k CSV file by IP address,
why should you have to search through the whole thing? Just looking in the
middle lets you skip at least half of the remaining array in each quick cycle.
Create an array to search through. For each IP in your 9k list, search for it
binary-style in the 260k list. This may be best to do, as previously mentioned,
with an IP-to-decimal conversion (check out [ipaddress]"192.168.1.1" in
PowerShell for the Address property that you can add to each. BONUS POINTS!!!
Use the magic .ForEach() instead of a crappy normal foreach.)
(Note that there is some more magic you can do if you have duplicate IP
addresses in your 260k CSV to get ALL matching IP lines without having to
search through the whole list.)
Basically:
$MatchedIPAddresses = New-Object System.Collections.ArrayList($null)
$BigAList = $BigAList | Sort-Object IPAddressDecimal
# This line means the binary search doesn't have to unroll the
# IPAddressDecimal property from the list on each of the 9k searches.
$array = $BigAList.IPAddressDecimal
foreach ($IP in $LittleAList.IPAddressDecimal)
{
    $index = [array]::BinarySearch($array, $IP)
    if ($index -ge 0)
    {
        $Null = $MatchedIPAddresses.Add($BigAList[$index])
    }
}
But be warned!!!!! This is just theory; I haven't tried it to see whether it
works. Also, I have often used $MatchedIPAddresses += $BigAList[$index] instead
of the $MatchedIPAddresses.Add() method. That can be a VERY easy way to add
objects to an array, but each time you do it, PowerShell creates a new array
with a length of the current array + 1, copies all the data from the old array
to the new one, and then adds your match to the very end. As your array of
objects gets longer, each += takes longer to run. You can get much better
performance from an ArrayList.
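A quick way to see that difference for yourself; the 50k iteration count is just an illustrative number, and exact timings will vary by machine:

```powershell
# Growing a plain array with += copies the whole array on every add: O(n^2) overall
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$plain = @()
foreach ($i in 1..50000) { $plain += $i }
$sw.Stop()
"Array +=  : $($sw.Elapsed.TotalSeconds) s"

# ArrayList.Add() appends in place, so the same loop stays roughly linear
$sw.Restart()
$list = New-Object System.Collections.ArrayList
foreach ($i in 1..50000) { [void]$list.Add($i) }
$sw.Stop()
"ArrayList : $($sw.Elapsed.TotalSeconds) s"
```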
Ok, next idea: MULTITHREADING!!! Why, if you have 9k independent IP addresses,
do you need to wait for each one to complete before you start the next? Check
out: https://github.com/RamblingCookieMonster/Invoke-Parallel If that is too
much, consider just importing 1/5th of your 9k addresses from CSV into each of
5 different PowerShell sessions. Since the 5 sessions together cover all the IP
addresses, have them output to 5 files and combine them at the very end. You'll
love it!!! If you can get a 9k-core computer, you could do the whole thing in
the time it takes you to do 1 right now!!! :)
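A rough sketch of the manual split using built-in background jobs instead of separate windows; the file paths and the Name column are illustrative assumptions, not Geoff's actual data:

```powershell
# Split the server list into 5 roughly equal chunks and ping each chunk in a background job
$servers   = Import-Csv 'C:\stale.csv'
$chunkSize = [math]::Ceiling($servers.Count / 5)
$jobs = for ($i = 0; $i -lt $servers.Count; $i += $chunkSize) {
    $chunk = $servers[$i..([math]::Min($i + $chunkSize - 1, $servers.Count - 1))]
    # The leading comma passes the chunk as a single array argument
    Start-Job -ArgumentList (,$chunk) -ScriptBlock {
        param($chunk)
        foreach ($s in $chunk) {
            [pscustomobject]@{
                ServerName = $s.Name
                Up         = Test-Connection $s.Name -Count 1 -Quiet -ErrorAction SilentlyContinue
            }
        }
    }
}
# Wait for all jobs, then combine every job's output into one CSV
$jobs | Wait-Job | Receive-Job |
    Export-Csv 'C:\ping_results.csv' -NoTypeInformation
```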
I think that the binary search will give you the greatest performance boost. I
think that multithreading (even the very manual way) will also drastically
reduce your processing time. Lastly, the magic .ForEach() and .Where() will
also let you loop through each entry much more quickly.
Play with it a bit and PLEASE PLEASE PLEASE take a moment to update us with how
it went. You could then turn this into a function to make loading the data
faster, so that you open up each PowerShell window and run
Calculate-IPAddressMatches to do the job. Then you can even just use something
like this to tell us how long it took:
((history)[-1].EndExecutionTime - (history)[-1].StartExecutionTime).TotalSeconds
Thanks,
Devin Rich
Systems Administrator
On Wed, May 25, 2016 at 12:00 PM, its.mike
<[email protected]<mailto:[email protected]>> wrote:
2 ¢:
I'd try a hash table, created from the 9k DNS entries.
Then spin the 260k file checking its IP addresses against the hash table keys.
From: [email protected]<mailto:[email protected]>
[mailto:[email protected]<mailto:[email protected]>]
On Behalf Of Michael B. Smith
Sent: Tuesday, May 24, 2016 4:03 pm
To: [email protected]<mailto:[email protected]>
Subject: [powershell] RE: CSV Comparisons:
Add an additional field in both files that is the uint32 version of the IP.
Sort both files ascending by the new field.
Once the files are sorted, Compare-Object will be very fast. If you want it
even faster, you can write a couple of while() loops yourself to do the
comparisons.
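The two-pointer while() approach Michael describes might look something like this sketch; $small and $big, and the IPNum property, are assumed names, and both arrays must already be sorted ascending by IPNum:

```powershell
# Merge-join two lists that are both sorted by a numeric IP field
$i = 0; $j = 0
$matched = New-Object System.Collections.ArrayList
while ($i -lt $small.Count -and $j -lt $big.Count) {
    if     ($small[$i].IPNum -lt $big[$j].IPNum) { $i++ }   # small side behind: advance it
    elseif ($small[$i].IPNum -gt $big[$j].IPNum) { $j++ }   # big side behind: advance it
    else {
        [void]$matched.Add($big[$j])   # record the full row from the big list
        $i++; $j++
    }
}
```

Because each pointer only ever moves forward, this is a single linear pass over both files rather than 9k separate scans.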
There are a variety of mechanisms for converting the text IP to a uint32 IP.
The easiest is not the fastest. ☺ Whether that matters depends on how often you
will need to do this.
Probably the easiest:
('192.168.231.10' -as [IpAddress]).Address
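One hedged caveat: the Address property packs the four octets into an Int64 in network byte order, so it is a perfectly consistent key for sorting and matching, but the resulting numbers do not order addresses the way humans read them. If true numeric ordering matters, a sketch of a big-endian uint32 conversion (the function name is made up for illustration):

```powershell
function ConvertTo-IPUInt32 {
    param([string]$IPAddress)
    # GetAddressBytes returns the 4 octets in network (big-endian) order;
    # weight them into a single unsigned 32-bit value
    $bytes = ([System.Net.IPAddress]$IPAddress).GetAddressBytes()
    [uint32](($bytes[0] * 16777216) + ($bytes[1] * 65536) + ($bytes[2] * 256) + $bytes[3])
}

ConvertTo-IPUInt32 '192.168.231.10'   # 3232294666
```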
From: [email protected]<mailto:[email protected]>
[mailto:[email protected]] On Behalf Of Orlebeck, Geoffrey
Sent: Tuesday, May 24, 2016 6:03 PM
To: '[email protected]<mailto:[email protected]>'
Subject: [powershell] CSV Comparisons:
I have two CSV files. One is a list of IP addresses and DNS hostnames. The
other has IP, DNS name (if it resolves) as well as “firstseen” and “lastseen”
on the network. I am attempting to compare the list of IP addresses to the
second list that contains additional fields.
The dataset has the following format:
[cid:[email protected]]
The goal is to loop through the list of IP addresses and check for matches in
the CSV file. If found, I want to note the entire row with all fields
(IP,Name,FirstSeen,LastSeen). I’m not sure of the most efficient way to process
this. The smaller list is ~9k entries. The larger list has over 260k entries.
On a small scale I am able to see this logic will at least give me a list of IP
addresses found in both files:
$List = Import-csv C:\IP_Addresses.csv
$CSV = Import-Csv C:\DB_Export.csv
$Matches = $List | % {$CSV.ip -like $_.ip}
The efficiency of this method is where I’m concerned. I’m currently working
with a limited subset of the data (about 500 entries each), and it takes a few
minutes to parse. However, with the full list being 9,000 IP addresses against
260,000+ rows in $CSV, I have a feeling it will take quite some time to process.
I’m wondering if anyone has any examples of such data comparisons. My end goal
is to parse the 9k IP addresses, compare them against our 260k entries, and
eventually filter based on the ‘LastSeen’ field. My gut says using
Compare-Object or the above example is a bad process, I’m just not sure how
else to go about processing such large files. Basically, I read about the
differences between Get-Content and StreamReader here
(https://foxdeploy.com/2016/03/23/coding-for-speed/) and thought there is
probably a better way than the built-in cmdlets I am attempting to use.
Thank you.
-Geoff
Confidentiality Notice: This is a transmission from Community Hospital of the
Monterey Peninsula. This message and any attached documents may be confidential
and contain information protected by state and federal medical privacy
statutes. They are intended only for the use of the addressee. If you are not
the intended recipient, any disclosure, copying, or distribution of this
information is strictly prohibited. If you received this transmission in error,
please accept our apologies and notify the sender. Thank you.