Hi Folks,
I have a typical ENVI header metadata file with the following values:

ENVI
description = {
  Georeferenced Image built from input GLT. [Tue Aug 11 15:39:46 2015] [Tue
  Aug 11 16:17:51 2015]}
samples = 1363
lines = 22826
bands = 432
header offset = 0
file type = ENVI Standard
data type = 4
interleave = bil
sensor type = Unknown
byte order = 0
map info = { UTM , 1.000 , 1.000 , 201603.137 , 4061363.983 ,
2.7000000000e+00 , 2.7000000000e+00 , 13 , North , WGS-84 , units=Meters ,
rotation=90.00000000 }
wavelength units = Nanometers
...

I have not yet worked on Workflow or on automating data staging and ingest,
so for now I use the following command to ingest the file:

./filemgr-client --operation --ingestProduct \
  --productName ang20150420t182050_corr_v1e_img.hdr \
  --productStructure Flat \
  --productTypeName GenericFile \
  --metadataFile file:///usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr.met \
  --refs file:///usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr \
  --url http://localhost:9000

This generates the following record:

{
  "id": "07193c17-67f8-4f8c-ac2e-5a281c7ee48c",
  "CAS.ProductStructure": "Flat",
  "CAS.ProductTypeName": "GenericFile",
  "CAS.ProductName": "ang20150419t155032_corr_v1f_img.hdr",
  "CAS.ProductReceivedTime": "2018-09-16T19:26:52Z",
  "CAS.ProductTypeId": "urn:oodt:GenericFile",
  "CAS.ProductTransferStatus": "RECEIVED",
  "CAS.ProductId": "07193c17-67f8-4f8c-ac2e-5a281c7ee48c",
  "FileLocation": [
    "/usr/local/coal-sds-deploy/data/archive/ang20150419t155032_corr_v1f_img.hdr"
  ],
  "Filename": [ "ang20150419t155032_corr_v1f_img.hdr" ],
  "MimeType": [ "application/octet-stream", "application", "octet-stream" ],
  "CAS.ReferenceMimeType": [ "application/octet-stream" ],
  "CAS.ReferenceDatastore": [
    "file:/usr/local/coal-sds-deploy/data/archive/ang20150419t155032_corr_v1f_img.hdr/ang20150419t155032_corr_v1f_img.hdr"
  ],
  "CAS.ReferenceFileSize": [ 20732 ],
  "CAS.ReferenceOriginal": [
    "file:///usr/local/coal-sds-deploy/data/staging/ang20150419t155032_corr_v1f_img.hdr"
  ],
  "_version_": 1611819870151770000
}

I would now also like to extract the values contained in the header file as
top-level metadata. How can I ensure that 1) the Tika extractor is being
used to extract metadata, and 2) if Tika does not pick up the metadata, I
can pick it up some other way? Preferably this would be done server-side so
that the client arguments stay simple.
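
To make point 2 concrete, here is a rough sketch of the kind of fallback I
have in mind, written in Python purely for illustration. It is not OODT or
Tika code. The names parse_envi_header and clean are my own, and it assumes
the header follows the simple "key = value" layout shown above, with
brace-delimited values that may span several lines.

```python
import re


def clean(value):
    """Strip one pair of enclosing braces and collapse runs of whitespace."""
    value = value.strip()
    if value.startswith("{") and value.endswith("}"):
        value = value[1:-1]
    return re.sub(r"\s+", " ", value).strip()


def parse_envi_header(text):
    """Parse ENVI header text into a dict of field name -> string value.

    Hypothetical helper: values are kept as strings, and a brace-delimited
    value that spans several lines is joined into a single string.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "ENVI":
        raise ValueError("not an ENVI header: first line must be 'ENVI'")

    fields = {}
    open_key = None   # name of a multi-line field still being collected
    buffer = []       # pieces of that multi-line value

    for line in lines[1:]:
        if open_key is None:
            if "=" not in line:
                continue  # skip blank or unrecognised lines such as "..."
            name, value = line.split("=", 1)
            name = name.strip().lower()
            value = value.strip()
            if value.startswith("{") and "}" not in value:
                # The value continues on the following lines.
                open_key, buffer = name, [value]
            else:
                fields[name] = clean(value)
        else:
            buffer.append(line.strip())
            if "}" in line:
                fields[open_key] = clean(" ".join(buffer))
                open_key = None

    return fields
```

Each key in the resulting dict (samples, lines, bands, map info and so on)
would then become one top-level metadata field. How to wire something like
this in on the server side is exactly what I am unsure about.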

Thanks
Lewis

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
