Hi Geert,

> MarkLogic 9 also allows storing simple key/value pairs in hidden document
> metadata, which is more efficient than document properties

I am interested in that new feature. Is there an explanation somewhere of how
it works (regarding reindexing, ...)?

Thanks,
Andreas

2017-07-20 11:33 GMT+02:00 Geert Josten <geert.jos...@marklogic.com>:

> Hi Pavan,
>
> If you need to store both the binary itself, and the meta info + textual
> contents, there are two general approaches:
>
> - put meta info and textual contents in document properties
> - store them separately as normal documents with a reference to the
>   database URI of the actual binary
>
> MarkLogic 9 also allows storing simple key/value pairs in hidden document
> metadata, which is more efficient than document properties or separate
> docs, but it is probably too limited for this use case.
>
> You can store transcripts of videos including timestamps as XML, which
> would work for both the two-doc, and the doc-prop approach.
>
> Document properties allows storing complete XML fragments, and is
> associated with the same database URI as the actual document (in this case
> the binary data). It is included in indexing automatically. You just need
> to indicate you would like to include properties fragments in searching and
> faceting.
>
> There are out of the box CPF pipelines for Document Filtering. There is
> one that saves the result in doc properties, and one that saves the
> result in a separate doc. It should be possible to enable those via the
> Admin UI.
>
> Kind regards,
> Geert
>
> From: GUPTA Pavan <pavan.gu...@soprasteria.com>
> Date: Thursday, July 20, 2017 at 11:07 AM
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>,
> Geert Josten <geert.jos...@marklogic.com>
> Subject: RE: [MarkLogic Dev General] Binary Document Ingestion in MP4 and
> MP3 format
>
> Hello Geert,
>
> Thanks for information. I would also know how I can store the content
> (means spoken words) of a video and find the time when it was spoken as we
> load the content of any document file in metadata.
>
> Is there any CPF I need to apply or suggest some library.
>
> Thanks In Advance!
>
> Regards,
> Pavan
>
> *From:* general-boun...@developer.marklogic.com [mailto:general-bounces@
> developer.marklogic.com <general-boun...@developer.marklogic.com>] *On
> Behalf Of *Geert Josten
> *Sent:* Thursday, July 20, 2017 2:27 PM
> *To:* MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Binary Document Ingestion in MP4
> and MP3 format
>
> Hi Pavan,
>
> You can apply xdmp:document-filter on many binary formats, including mp3
> and mp4. It will extract meta information like file size and content mime
> type, and for instance document properties from office documents, and exif
> tags from images. It will also attempt to extract actual text, but that will
> only work if such text is inside the file in a machine readable form. E.g.
> text contained inside images or video streams will not be captured. This
> includes images embedded in office docs, image pdf, and also captions and
> subtitles on images and videos. You would need an OCR kind of solution for
> that.
>
> Kind regards,
> Geert
>
> *From: *<general-boun...@developer.marklogic.com> on behalf of GUPTA
> Pavan <pavan.gu...@soprasteria.com>
> *Reply-To: *MarkLogic Developer Discussion <general@developer.marklogic.
> com>
> *Date: *Thursday, July 20, 2017 at 9:19 AM
> *To: *"general@developer.marklogic.com" <general@developer.marklogic.com>
> *Subject: *[MarkLogic Dev General] Binary Document Ingestion in MP4 and
> MP3 format
>
> Hi Team,
>
> I am trying to ingest the .mp4 and .mp3 file and make them searchable. I
> have studied that these files are considered as binary files.
>
> I have also seen how to make the binary files searchable but I have done
> for .doc, .ppt, .pdf etc file but could not do for .mp4 or .mp3.
>
> Actually I want to make the files searchable.
>
> Can you please direct me how to achieve this and tell me if I need to
> enable or set up any content processing framework for same.
>
> Thanks In Advance!
>
> Regards,
> Pavan
>
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
>
>

--
Andreas Hubmer
Senior IT Consultant

EBCONT enterprise technologies GmbH
Millennium Tower
Handelskai 94-96
A-1200 Vienna

Mobile: +43 664 60651861
Fax: +43 2772 512 69-9
Email: andreas.hub...@ebcont.com
Web: http://www.ebcont.com

OUR TEAM IS YOUR SUCCESS

UID-Nr. ATU68135644
HG St.Pölten - FN 399978 d
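
For readers of the archive: the suggestion quoted above, storing a video transcript with timestamps as XML next to the binary, could look roughly like the sketch below. It is only an illustration of the data shape in plain Python, not MarkLogic code. The element and attribute names (`transcript`, `segment`, `binary-uri`, `start`, `end`), the sample URI, and the helper `find_spoken` are all made up for this example; in MarkLogic the lookup itself would be done by the database's own search.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical transcript document: element and attribute names are
# illustrative assumptions, not a MarkLogic convention.
TRANSCRIPT_XML = """
<transcript binary-uri="/videos/demo.mp4">
  <segment start="00:00:05" end="00:00:09">Welcome to the product demo</segment>
  <segment start="00:00:10" end="00:00:17">First we load the binary document</segment>
  <segment start="00:01:02" end="00:01:08">Then we search the transcript text</segment>
</transcript>
"""


def find_spoken(transcript_xml: str, word: str) -> List[Tuple[str, str]]:
    """Return (binary URI, start time) for every segment containing `word`."""
    root = ET.fromstring(transcript_xml.strip())
    uri = root.get("binary-uri")
    needle = word.lower()
    hits = []
    for segment in root.iter("segment"):
        # Case-insensitive whole-word match on the spoken text of the segment.
        words = (segment.text or "").lower().split()
        if needle in words:
            hits.append((uri, segment.get("start")))
    return hits


if __name__ == "__main__":
    for uri, start in find_spoken(TRANSCRIPT_XML, "transcript"):
        print(uri, start)
```

Keeping the URI of the binary on the transcript root matches the two-document approach from the thread: a hit in the transcript leads back to both the media file and the time at which the word was spoken.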