Hi Pavan,

If you need to store both the binary itself and the meta info plus textual contents, there are two general approaches:

- put the meta info and textual contents in document properties
- store them separately as normal documents, with a reference to the database URI of the actual binary

MarkLogic 9 also allows storing simple key/value pairs in hidden document metadata. That is more efficient than document properties or separate docs, but it is probably too limited for this use case.

You can store transcripts of videos, including timestamps, as XML, which would work for both the two-doc and the doc-prop approach. Document properties allow storing complete XML fragments, and they are associated with the same database URI as the actual document (in this case the binary data). They are included in indexing automatically; you just need to indicate that you would like to include properties fragments in searching and faceting.

There are out-of-the-box CPF pipelines for Document Filtering: one saves the result in document properties, and one saves the result in a separate doc. It should be possible to enable those via the Admin UI.

Kind regards,
Geert

From: GUPTA Pavan <[email protected]>
Date: Thursday, July 20, 2017 at 11:07 AM
To: MarkLogic Developer Discussion <[email protected]>, Geert Josten <[email protected]>
Subject: RE: [MarkLogic Dev General] Binary Document Ingestion in MP4 and MP3 format

Hello Geert,

Thanks for the information. I would also like to know how I can store the content (i.e. the spoken words) of a video and find the time when each word was spoken, similar to how we load the content of a document file into metadata. Is there a CPF pipeline I need to apply, or can you suggest a library?

Thanks in advance!
Regards,
Pavan

From: [email protected] [mailto:[email protected]] On Behalf Of Geert Josten
Sent: Thursday, July 20, 2017 2:27 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Binary Document Ingestion in MP4 and MP3 format

Hi Pavan,

You can apply xdmp:document-filter to many binary formats, including MP3 and MP4. It will extract meta information such as file size and content MIME type, as well as, for instance, document properties from Office documents and EXIF tags from images. It will also attempt to extract actual text, but that only works if the text is stored inside the file in machine-readable form. Text contained inside images or video streams will not be captured. That includes images embedded in Office docs, image-only PDFs, and captions and subtitles rendered on images and videos. You would need some kind of OCR solution for that.

Kind regards,
Geert

From: <[email protected]> on behalf of GUPTA Pavan <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, July 20, 2017 at 9:19 AM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] Binary Document Ingestion in MP4 and MP3 format

Hi Team,

I am trying to ingest .mp4 and .mp3 files and make them searchable. I have read that these files are treated as binary files. I have also seen how to make binary files searchable, and I have done that for .doc, .ppt, .pdf, etc., but I could not do it for .mp4 or .mp3. Can you please direct me on how to achieve this, and tell me whether I need to enable or set up the Content Processing Framework (CPF) for this?

Thanks in advance!
Regards,
Pavan
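Geert's suggestion to keep a timestamped video transcript as XML, either in the binary's properties or in a separate document that references the binary's database URI, can be illustrated with a minimal Python sketch. It assumes the transcript segments already exist: as noted in the thread, document filtering only extracts text that is already in the file in machine-readable form, so spoken words would have to come from some external transcription step. The element names `transcript`, `binary-uri` and `segment`, the `start`/`end` attributes (in seconds), the URI `/videos/demo.mp4` and both helper functions are hypothetical. The sketch only shows the document shape and how per-segment timestamps answer "when was this word spoken"; inside MarkLogic you would query such documents with its own search APIs rather than Python.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# A transcript segment: (start_seconds, end_seconds, spoken_text).
# Assumption: produced by some external transcription step, not by MarkLogic.
Segment = Tuple[float, float, str]


def build_transcript_doc(binary_uri: str, segments: List[Segment]) -> str:
    """Serialize a companion XML doc that references the binary's URI.

    Element and attribute names are hypothetical, not a MarkLogic schema.
    """
    root = ET.Element("transcript")
    ET.SubElement(root, "binary-uri").text = binary_uri
    for start, end, text in segments:
        # Extra keyword arguments become XML attributes on the new element.
        seg = ET.SubElement(root, "segment", start=f"{start:.1f}", end=f"{end:.1f}")
        seg.text = text
    return ET.tostring(root, encoding="unicode")


def find_word_times(transcript_xml: str, word: str) -> List[float]:
    """Return the start times (seconds) of segments that contain `word`."""
    root = ET.fromstring(transcript_xml)
    target = word.lower()
    hits: List[float] = []
    for seg in root.iter("segment"):
        # Lowercase and strip simple punctuation so "files." matches "files".
        tokens = [t.strip(".,!?;:") for t in (seg.text or "").lower().split()]
        if target in tokens:
            hits.append(float(seg.get("start")))
    return hits


if __name__ == "__main__":
    # Made-up example data for illustration only.
    segments = [
        (0.0, 4.5, "Welcome to this MarkLogic demo."),
        (4.5, 9.0, "Today we ingest binary files."),
        (9.0, 14.2, "Binary files need filtering before search."),
    ]
    doc = build_transcript_doc("/videos/demo.mp4", segments)
    print(doc)
    print(find_word_times(doc, "binary"))  # [4.5, 9.0]
```

Keeping one element per segment, with its own start/end times, is what makes the "when was it spoken" question answerable: a word match resolves to the enclosing segment, and its attributes give the time offset in the video.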
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
