1) A flat file DB (array) is probably the way to go
in your case.
2) To ignore the header, handle it in your Perl code: read and
discard the first line of each file. If the header is static, you
can hard-code the column names once and skip the header row on
every import.
3) Use a hash to remove EXACT data matches. That is easy.
It gets much harder if you want to remove SIMILAR, but not
exact data.
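Points 2 and 3 can be sketched together. This is a minimal,
hedged example assuming Text::CSV from CPAN is installed; the
inline data and column subset are made up for illustration (your
real files have the full 18-column header). The header row is read
and thrown away, and a %seen hash drops exact duplicate rows:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;   # CPAN module; assumed installed

# Hypothetical sample data standing in for one day's CSV file.
my $data = <<'CSV';
Timestamp,SenderFromAddress,DeliveryAction
"Jan 27, 2026 3:30:56 PM",notification@domain.com,Delivered
"Jan 27, 2026 3:30:56 PM",notification@domain.com,Delivered
"Jan 27, 2026 3:31:29 PM",paradox@domain.com,Delivered
CSV

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<', \$data or die $!;   # in-memory filehandle

$csv->getline($fh);                   # read and discard the header row

my ( %seen, @rows );
while ( my $row = $csv->getline($fh) ) {
    my $key = join "\x1F", @$row;     # whole row joined on a separator
    next if $seen{$key}++;            # seen before => exact duplicate
    push @rows, $row;
}
close $fh;
printf "%d unique rows kept\n", scalar @rows;   # prints "2 unique rows kept"
```

For real files you would open the CSV file instead of the scalar
ref, and persist %seen (or rebuild it from the stored data) so the
duplicate check also works across daily imports.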
To keep the Perl code simple, you might want to use filtering
in LibreOffice Calc for your queries.
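If you do want the queries in Perl rather than Calc, point 1's
flat array approach makes "query any combination of fields" a
grep. A small sketch, with made-up rows keyed by the posted column
names (in practice you would build the hashes from the parsed CSV
rows):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows, one hashref per CSV record.
my @rows = (
    { Timestamp         => 'Jan 27, 2026 3:30:56 PM',
      SenderFromAddress => 'notification@domain.com',
      Connectors        => '',
      OrgLevelPolicy    => 'Connection policy' },
    { Timestamp         => 'Jan 27, 2026 3:31:29 PM',
      SenderFromAddress => 'paradox@domain.com',
      Connectors        => 'Outbound',
      OrgLevelPolicy    => '' },
);

# "Show me the last time a SenderFromAddress had an empty Connectors"
my @hits = grep {
    $_->{SenderFromAddress} eq 'notification@domain.com'
        && $_->{Connectors} eq ''
} @rows;

printf "%d match(es)\n", scalar @hits;   # prints "1 match(es)"
print $hits[-1]{Timestamp}, "\n" if @hits;
```

Any other field combination is just a different condition inside
the grep block.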
Mike
On 1/27/26 15:52, Gomes, Rich via beginners wrote:
I am working on an ongoing project with email where I will need to
import daily CSV files into Perl and create a searchable database of
all the files.
Here is some example data:
Header:
(Some of these fields\columns may or may not be removed in future
CSVs, but this is what we have for now)
Timestamp,SenderFromDomain,SenderFromAddress,DMARC,RecipientEmailAddress,Subject,SenderIPv4,Connectors,DeliveryAction,EmailActionPolicy,OrgLevelAction,OrgLevelPolicy,UserLevelAction,UserLevelPolicy,AuthenticationDetails,Context,ReportId,SenderObjectId
Example rows:
"Jan 27, 2026 3:30:56
PM",domain.com,[email protected],pass,[email protected],Thank you
for your application,20.1.130.13,,Delivered,,Allow,Connection
policy,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,4647d63d-1f9d-4982-6c39-08de5de2f778-18193297287602271192-1,1d3478ee-351f-4ee9-b6ec-7b03ee68e334
"Jan 27, 2026 3:33:04 PM", domain.ar,notifica@
domain.ar,pass,[email protected],Envío de Orden de Compra Aramark
Nro.
115615,149.72.150.13,,Delivered,,,,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,976717e0-23ac-4538-a058-08de5de33a88-6451908357547151849-1,
"Jan 27, 2026 3:31:29 PM", domain.com,paradox@
domain.com,pass,[email protected],Please confirm your interview with
HR
Reps,159.183.2.108,,Delivered,,,,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,f8d7f41e-fb08-491c-43f4-08de5de30c16-11061410221252786783-1,5767d814-45d6-4a03-bb3b-434692b8edc3
My initial question is:
Since the data will stay for some time (at least a year), is a
database the best to import the data “into”? Or would an array be a
better approach?
Some of the queries I expect to perform are:
“Show me the last time that a specific value in SenderFromAddress had
a Connector value of “empty””
“Show me the last time that SenderFromAddress had an OrgLevelPolicy
value of “xyz””
Things like that. Basically, query any combination of fields.
Also, since all the files are in the same format, how do you “ignore”
the header after the “first import”?
Also, there is a potential for some overlap in data, albeit small (I
am pulling this data from a KQL query in O365). Is there a “routine” I
can run against the data to detect and remove any duplicate data?
I would like to learn how to do this both during the import and also
run it against existing data. That may seem “extra”, but this is all
about me learning how to do each of these things.
Is this a good starting place for what I am looking to do?
_How to read a CSV file using Perl?_
<https://perlmaven.com/how-to-read-a-csv-file-using-perl>
Thank you,
Rich