I appreciate all of your responses. I plan on reviewing both the uint32 and 
binary-search options for future use, but in this instance I just needed the 
data for a one-off. I'm sure others will pick this apart for its lack of 
refinement, but it got the job done for me:


$DNS = Import-Csv 'C:\dns.csv'
$CSV = Import-Csv 'C:\db_dump.csv'
$Valid = New-Object System.Collections.ArrayList
$Stale = New-Object System.Collections.ArrayList

# Populate hashtable with "IP,Hostname" from DNS dump file
$Hash = @{}
Foreach($Entry in $DNS)
{
    $Hash.Add($Entry.IP,$Entry.Name)
}

# Iterate through db_dump entries and find matches.
Foreach($Object in $CSV)
{
    If($Hash.ContainsKey($Object.IP))
    {
        [Void]$Valid.Add($Object)
    }
}

# Iterate through DNS entries; if the IP was not matched in db_dump, add to $Stale
Foreach($Item in $DNS)
{
    If(!($Valid.IP -contains $Item.IP))
    {
        [Void]$Stale.Add($Item)
    }
}

# Iterate $Stale entries, ping address, record results, and export to CSV file
$PingResults = @()
Foreach($Server in $Stale.Name)
{
    $Status = @{ "ServerName" = $Server; "TimeStamp" = (Get-Date -f s) }
    if (Test-Connection $Server -Count 1 -ea 0 -Quiet)
    {
        $Status["Results"] = "Up"
    }
    else
    {
        $Status["Results"] = "Down"
    }
    $PingResults += New-Object -TypeName PSObject -Property $Status
}
$PingResults | Export-Csv -Path 'C:\stale_ip_results.csv' -NoTypeInformation

For future projects and/or anything that will require scheduled refreshes, I 
really like the idea of the additional field to compare, as well as possibly 
tacking on multi-threading. Thanks to Mike for the hash table idea, as it 
helped speed up my querying. I'm all about learning and take no offense when 
people point out issues with my code, so I always appreciate feedback, so long 
as it's not just "that sucks". If you can offer other suggestions, or even 
just "hey, look at ____ method for handling such a task", it's likely I didn't 
use it because I didn't know about it or how to use it.

Thanks again one and all!

-Geoff

From: [email protected] [mailto:[email protected]] On 
Behalf Of Devin Rich
Sent: Wednesday, May 25, 2016 12:55 PM
To: [email protected]
Subject: Re: [powershell] RE: CSV Comparisons:

Ok, time to pull out the black magic!

There are the new magic .Where and .ForEach methods that are MUCH faster than 
anything you have used. This is a simple way to make your code go many times 
faster. How many times faster? Many times. Syntax details: 
http://www.powershellmagazine.com/2014/10/22/foreach-and-where-magic-methods/
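For anyone who hasn't seen the syntax, here's a minimal sketch of both styles (the $CSV objects, the IP property, and the $Hash lookup table are assumed from Geoff's script earlier in the thread):

```powershell
# Classic pipeline style: each object streams through Where-Object one at a time.
$matched = $CSV | Where-Object { $Hash.ContainsKey($_.IP) }

# Magic-method style (PowerShell v4+): operates on the whole collection
# in-process, which is typically several times faster on large arrays.
$matched = $CSV.Where({ $Hash.ContainsKey($_.IP) })

# .ForEach works the same way for per-item processing:
$names = $CSV.ForEach({ $_.Name })
```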

Next idea: BINARY SEARCH! yay! If you sort by the IP address in your 260k CSV 
file, why should you have to search through the whole thing? Just looking in 
the middle lets you skip at least half of the remaining array on each quick 
cycle. Create an array to search through. For each IP in your 9k list, search 
for it binary-style in the 260k list. This may be best to do, as previously 
mentioned, after an IP-to-decimal conversion. (Check out 
[ipaddress]"192.168.1.1" in PowerShell for the Address property that you can 
add to each. BONUS POINTS!!! Use the magic .ForEach instead of a crappy normal 
foreach.)
(Note that there is some more magic you can do, if you have duplicate IP 
addresses in your 260k CSV, to get ALL matching IP lines without having to 
search through the whole list.)
Basically:
$MatchedIPAddresses = New-Object System.Collections.ArrayList($null)
$BigAList = $BigAList | Sort-Object IPAddressDecimal
# Unroll the IPAddressDecimal property once, so the binary search doesn't
# have to re-expand it from the list for each of the 9k lookups.
$array = $BigAList.IPAddressDecimal
foreach ($IP in $LittleAList.IPAddressDecimal)
{
    $index = [array]::BinarySearch($array, $IP)
    if ($index -ge 0)
    {
        $Null = $MatchedIPAddresses.Add($BigAList[$index])
    }
}
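One caveat on the conversion: on the systems I've used, the (deprecated) .Address property returns the octets low-byte-first, so sorting by it does not follow dotted-quad order. That's fine for exact-match binary search as long as both lists use the same conversion, but if you want the numeric order to match the dotted-quad order you can assemble the uint32 yourself (ConvertTo-UInt32Ip is just an illustrative name):

```powershell
function ConvertTo-UInt32Ip {
    param([string]$DottedQuad)
    # GetAddressBytes returns the octets in network (big-endian) order, so
    # weighting them left-to-right keeps numeric order aligned with dotted-quad order.
    $bytes = ([System.Net.IPAddress]::Parse($DottedQuad)).GetAddressBytes()
    [uint32]$bytes[0] * 16777216 + [uint32]$bytes[1] * 65536 +
        [uint32]$bytes[2] * 256 + [uint32]$bytes[3]
}

ConvertTo-UInt32Ip '192.168.1.1'   # 3232235777
```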


But be warned!!!!! This is just theory; I haven't tried this to see if it 
would work or not. Also, I have often used $MatchedIPAddresses += 
$BigAList[$index] instead of the $MatchedIPAddresses.Add() method. This can be 
a VERY easy way to add objects to an array of objects. However, each time you 
do that, PowerShell creates a new array with a length of the current array + 1, 
copies all the data from the old array to the new one, and then adds your 
match to the very end. As your array of objects gets longer, each += takes 
longer. You get better performance from an ArrayList.
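To see the difference concretely, something like this should show it (exact timings will vary by machine):

```powershell
# Plain array: += reallocates and copies on every add (O(n^2) overall).
$slow = Measure-Command {
    $arr = @()
    1..20000 | ForEach-Object { $arr += $_ }
}

# ArrayList: .Add() appends in place (amortized O(1) per add).
$fast = Measure-Command {
    $list = New-Object System.Collections.ArrayList
    1..20000 | ForEach-Object { $null = $list.Add($_) }
}

"{0:N0} ms vs {1:N0} ms" -f $slow.TotalMilliseconds, $fast.TotalMilliseconds
```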

Ok, next idea:  MULTITHREADING!!! If you have 9k independent IP addresses, why 
do you need to wait for each one to complete before you start the next? Check 
out: https://github.com/RamblingCookieMonster/Invoke-Parallel If doing that is 
too much, consider importing 1/5th of your 9k addresses from the CSV into each 
of 5 different PowerShell sessions. Since the 5 sessions together cover all 
the IP addresses, have each output to its own file and combine the 5 files at 
the very end. You'll love it!!!
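A rough sketch of the manual split using background jobs instead of 5 separate consoles (the file path and column names are assumptions carried over from earlier in the thread):

```powershell
$all    = Import-Csv 'C:\IP_Addresses.csv'
$chunks = 5
$size   = [math]::Ceiling($all.Count / $chunks)

# Launch one background job per slice of the list.
$jobs = 0..($chunks - 1) | ForEach-Object {
    $slice = $all | Select-Object -Skip ($_ * $size) -First $size
    Start-Job -ArgumentList (,$slice) -ScriptBlock {
        param($ips)
        foreach ($entry in $ips) {
            [pscustomobject]@{
                IP = $entry.IP
                Up = Test-Connection $entry.IP -Count 1 -Quiet -ErrorAction SilentlyContinue
            }
        }
    }
}

# Combine the results from all jobs at the end.
$jobs | Wait-Job | Receive-Job |
    Export-Csv 'C:\ping_results.csv' -NoTypeInformation
```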

If you can get a 9k core computer, you could do the whole thing in the time it 
takes you to do 1 right now!!! :)


I think that the binary search will give you the greatest performance boost. I 
think that multi threading (even the very manual way) will also drastically 
reduce your processing time. Lastly, the magic .foreach and .where will also 
make you loop through each entry much more quickly.

Play with it a bit and PLEASE PLEASE PLEASE take a moment to update us with how 
it went. You could turn these into functions to make loading the data faster, 
so that you open up each PowerShell window and say Calculate-IPAddressMatches 
to do the job. Then you can even use something like this to tell us how long 
it took: ((history)[-1].EndExecutionTime - 
(history)[-1].StartExecutionTime).TotalSeconds

Thanks,

Devin Rich
Systems Administrator


On Wed, May 25, 2016 at 12:00 PM, its.mike <[email protected]> wrote:
2 ¢:

I'd try a hash table, created from the 9k DNS entries.

Then spin the 260k file checking its IP addresses against the hash table keys.



From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]<mailto:[email protected]>] 
On Behalf Of Michael B. Smith
Sent: Tuesday, May 24, 2016 4:03 pm
To: [email protected]<mailto:[email protected]>
Subject: [powershell] RE: CSV Comparisons:

Add an additional field in both files that is the uint32 version of the IP. 
Sort both files ascending by the new field.

Since the files are already sorted, compare-object will be very fast. If you 
want it even faster, you can create a couple of while() loops yourself to do 
the comparisons.
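For what it's worth, that while() loop over the two sorted lists might look roughly like this ($small, $big, and the IPNum field are placeholder names; both lists are assumed sorted ascending on the uint32 field):

```powershell
$i = 0; $j = 0
$matched = New-Object System.Collections.ArrayList
while ($i -lt $small.Count -and $j -lt $big.Count) {
    if ($small[$i].IPNum -eq $big[$j].IPNum) {
        # Match found: keep the full row from the big list, advance both pointers.
        $null = $matched.Add($big[$j])
        $i++; $j++
    }
    elseif ($small[$i].IPNum -lt $big[$j].IPNum) { $i++ }
    else { $j++ }
}
```

Each list is walked once, so the comparison is O(n + m) instead of the O(n * m) of a nested loop.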

There are a variety of mechanisms for converting the text IP to a uint32 IP. 
The easiest is not the fastest. ☺ Whether that's relevant depends on how often 
you will need to do this.

Probably the easiest:

               ( '192.168.231.10' -as [IpAddress] ).Address

From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of Orlebeck, Geoffrey
Sent: Tuesday, May 24, 2016 6:03 PM
To: '[email protected]<mailto:[email protected]>'
Subject: [powershell] CSV Comparisons:

I have two CSV files. One is a list of IP addresses and DNS hostnames. The 
other has IP, DNS name (if it resolves) as well as “firstseen” and “lastseen” 
on the network. I am attempting to compare the list of IP addresses to the 
second list that contains additional fields.

The dataset has the following format:
[inline image of the sample dataset not shown]

The goal is to loop through the list of IP addresses and check for matches in 
the CSV file. If found, I want to note the entire row with all fields 
(IP,Name,FirstSeen,LastSeen). I’m not sure of the most efficient way to process 
this. The smaller list is ~9k entries. The larger list has over 260k entries. 
On a small scale I am able to see this logic will at least give me a list of IP 
addresses found in both files:

$List = Import-Csv C:\IP_Addresses.csv
$CSV = Import-Csv C:\DB_Export.csv
$Matched = $List | % {$CSV.ip -like $_.ip}  # avoid $Matches, a PowerShell automatic variable

The efficiency of this method is where I’m concerned. I’m currently working 
with a limited subset of the data (about 500 entries each), and it takes a few 
minutes to parse. However, with the full list being 9,000 IP addresses against 
260,000+ rows in $CSV, I have a feeling it will take quite some time to process.

I’m wondering if anyone has any examples of such data comparisons. My end goal 
is to parse the 9k IP addresses, compare them against our 260k entries, and 
eventually filter based on the ‘LastSeen’ field. My gut says using 
Compare-Object or the above example is a bad process; I’m just not sure how 
else to go about processing such large files. Basically, I read about the 
differences between Get-Content versus StreamReader here 
(https://foxdeploy.com/2016/03/23/coding-for-speed/) and thought there is 
probably a better way than the built-in cmdlets I am attempting to use.

Thank you.

-Geoff
Confidentiality Notice: This is a transmission from Community Hospital of the 
Monterey Peninsula. This message and any attached documents may be confidential 
and contain information protected by state and federal medical privacy 
statutes. They are intended only for the use of the addressee. If you are not 
the intended recipient, any disclosure, copying, or distribution of this 
information is strictly prohibited. If you received this transmission in error, 
please accept our apologies and notify the sender. Thank you.




