Hi Folks,

I have a typical ENVI header metadata file with the following values:

ENVI
description = { Georeferenced Image built from input GLT. [Tue Aug 11 15:39:46 2015] [Tue Aug 11 16:17:51 2015]}
samples = 1363
lines = 22826
bands = 432
header offset = 0
file type = ENVI Standard
data type = 4
interleave = bil
sensor type = Unknown
byte order = 0
map info = { UTM , 1.000 , 1.000 , 201603.137 , 4061363.983 , 2.7000000000e+00 , 2.7000000000e+00 , 13 , North , WGS-84 , units=Meters , rotation=90.00000000 }
wavelength units = Nanometers
...

Before I work on Workflow and automate data staging and ingest, I currently use the following syntax to ingest the file:

./filemgr-client --operation --ingestProduct \
  --productName ang20150420t182050_corr_v1e_img.hdr \
  --productStructure Flat \
  --productTypeName GenericFile \
  --metadataFile file:///usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr.met \
  --refs file:///usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr \
  --url http://localhost:9000

This generates the following record:

{
  "id": "07193c17-67f8-4f8c-ac2e-5a281c7ee48c",
  "CAS.ProductStructure": "Flat",
  "CAS.ProductTypeName": "GenericFile",
  "CAS.ProductName": "ang20150419t155032_corr_v1f_img.hdr",
  "CAS.ProductReceivedTime": "2018-09-16T19:26:52Z",
  "CAS.ProductTypeId": "urn:oodt:GenericFile",
  "CAS.ProductTransferStatus": "RECEIVED",
  "CAS.ProductId": "07193c17-67f8-4f8c-ac2e-5a281c7ee48c",
  "FileLocation": [
    "/usr/local/coal-sds-deploy/data/archive/ang20150419t155032_corr_v1f_img.hdr"
  ],
  "Filename": [
    "ang20150419t155032_corr_v1f_img.hdr"
  ],
  "MimeType": [
    "application/octet-stream",
    "application",
    "octet-stream"
  ],
  "CAS.ReferenceMimeType": [
    "application/octet-stream"
  ],
  "CAS.ReferenceDatastore": [
    "file:/usr/local/coal-sds-deploy/data/archive/ang20150419t155032_corr_v1f_img.hdr/ang20150419t155032_corr_v1f_img.hdr"
  ],
  "CAS.ReferenceFileSize": [
    20732
  ],
  "CAS.ReferenceOriginal": [
    "file:///usr/local/coal-sds-deploy/data/staging/ang20150419t155032_corr_v1f_img.hdr"
  ],
  "_version_": 1611819870151770000
}

I would now like to also extract the various values contained in the file (samples, lines, bands, map info, and so on) as top-level metadata. How can I ensure that

1) the Tika extractor is being used to extract metadata, and
2) if Tika does not pick up the metadata, I can pick it up some other way?

Preferably this would be done server-side, so that the client arguments stay simple.

Thanks
Lewis

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc