1) A flat file DB (array) is probably the way to go
in your case.
2) To ignore the header, handle it in your Perl code: read and
discard the first line of each file. If the header is static, you
can hard-code the column names once and skip the header row on
every import.
3) Use a hash to remove EXACT data matches. That is easy.
It gets much harder if you want to remove SIMILAR, but not
exact data.
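Points 2 and 3 can be sketched together. This is a minimal,
hedged example assuming Text::CSV from CPAN is installed; the
inline data and column subset are made up for illustration (your
real files have the full 18-column header). The header row is read
and thrown away, and a %seen hash drops exact duplicate rows:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;   # CPAN module; assumed installed

# Hypothetical sample data standing in for one day's CSV file.
my $data = <<'CSV';
Timestamp,SenderFromAddress,DeliveryAction
"Jan 27, 2026 3:30:56 PM",notification@domain.com,Delivered
"Jan 27, 2026 3:30:56 PM",notification@domain.com,Delivered
"Jan 27, 2026 3:31:29 PM",paradox@domain.com,Delivered
CSV

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<', \$data or die $!;   # in-memory filehandle

$csv->getline($fh);                   # read and discard the header row

my ( %seen, @rows );
while ( my $row = $csv->getline($fh) ) {
    my $key = join "\x1F", @$row;     # whole row joined on a separator
    next if $seen{$key}++;            # seen before => exact duplicate
    push @rows, $row;
}
close $fh;
printf "%d unique rows kept\n", scalar @rows;   # prints "2 unique rows kept"
```

For real files you would open the CSV file instead of the scalar
ref, and persist %seen (or rebuild it from the stored data) so the
duplicate check also works across daily imports.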
To keep the Perl code simple, you might want to use filtering
in LibreOffice Calc for your queries.
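If you do want the queries in Perl rather than Calc, point 1's
flat array approach makes "query any combination of fields" a
grep. A small sketch, with made-up rows keyed by the posted column
names (in practice you would build the hashes from the parsed CSV
rows):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows, one hashref per CSV record.
my @rows = (
    { Timestamp         => 'Jan 27, 2026 3:30:56 PM',
      SenderFromAddress => 'notification@domain.com',
      Connectors        => '',
      OrgLevelPolicy    => 'Connection policy' },
    { Timestamp         => 'Jan 27, 2026 3:31:29 PM',
      SenderFromAddress => 'paradox@domain.com',
      Connectors        => 'Outbound',
      OrgLevelPolicy    => '' },
);

# "Show me the last time a SenderFromAddress had an empty Connectors"
my @hits = grep {
    $_->{SenderFromAddress} eq 'notification@domain.com'
        && $_->{Connectors} eq ''
} @rows;

printf "%d match(es)\n", scalar @hits;   # prints "1 match(es)"
print $hits[-1]{Timestamp}, "\n" if @hits;
```

Any other field combination is just a different condition inside
the grep block.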
Mike
On 1/27/26 15:52, Gomes, Rich via beginners wrote:
I am working on an ongoing project with email where I will need to
import daily CSV files into Perl and create a searchable database of
all the files.
Here is some example data:
Header:
(Some of these fields\columns may or may not be removed in future
CSVs, but this is what we have for now)
Timestamp,SenderFromDomain,SenderFromAddress,DMARC,RecipientEmailAddress,Subject,SenderIPv4,Connectors,DeliveryAction,EmailActionPolicy,OrgLevelAction,OrgLevelPolicy,UserLevelAction,UserLevelPolicy,AuthenticationDetails,Context,ReportId,SenderObjectId
Example rows:
"Jan 27, 2026 3:30:56
PM",domain.com,[email protected],pass,[email protected],Thank you
for your application,20.1.130.13,,Delivered,,Allow,Connection
policy,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,4647d63d-1f9d-4982-6c39-08de5de2f778-18193297287602271192-1,1d3478ee-351f-4ee9-b6ec-7b03ee68e334
"Jan 27, 2026 3:33:04 PM", domain.ar,notifica@
domain.ar,pass,[email protected],Envío de Orden de Compra Aramark
Nro.
115615,149.72.150.13,,Delivered,,,,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,976717e0-23ac-4538-a058-08de5de33a88-6451908357547151849-1,
"Jan 27, 2026 3:31:29 PM", domain.com,paradox@
domain.com,pass,[email protected],Please confirm your interview with
HR
Reps,159.183.2.108,,Delivered,,,,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,f8d7f41e-fb08-491c-43f4-08de5de30c16-11061410221252786783-1,5767d814-45d6-4a03-bb3b-434692b8edc3
My initial question is:
Since the data will stay for some time (at least a year), is a
database the best to import the data “into”? Or would an array be a
better approach?
Some of the queries I expect to perform are:
“Show me the last time that a specific value in SenderFromAddress had
a Connector value of “empty””
“Show me the last time that SenderFromAddress had an OrgLevelPolicy
value of “xyz””
Things like that. Basically, query any combination of fields.
Also, since all the files are in the same format, how do you “ignore”
the header after the “first import”?
Also, there is a potential for some overlap in data, albeit small (I
am pulling this data from a KQL query in O365). Is there a “routine” I
can run against the data to detect and remove any duplicate data?
I would like to learn how to do this both during the import and also
run it against existing data. That may seem “extra”, but this is all
about me learning how to do each of these things.
Is this a good starting place for what I am looking to do?
_How to read a CSV file using Perl?_
<https://perlmaven.com/how-to-read-a-csv-file-using-perl>
Thank you,
Rich