That is how it sounds to me too, but I don’t know the details. The metadata is 
indexed, so changing it must involve updating indexes. There may be subtleties 
about what is actually reindexed and what is not, though.

I’ll forward your question, though.

Cheers

From: 
<general-boun...@developer.marklogic.com<mailto:general-boun...@developer.marklogic.com>>
 on behalf of Andreas Hubmer 
<andreas.hub...@ebcont.com<mailto:andreas.hub...@ebcont.com>>
Reply-To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Date: Thursday, July 20, 2017 at 4:42 PM
To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Subject: Re: [MarkLogic Dev General] Binary Document Ingestion in MP4 and MP3 
format

Thanks, Geert.

In the release notes I've found the following statement 
(https://docs.marklogic.com/guide/relnotes/chap3#id_45632):
"Storing the axes times in metadata enables MarkLogic to update the axes 
timestamps without changing the documents and invoking reindexing."
To me it seems that the metadata is connected to the fragment but stored 
somehow differently. Do you know any more details?

Cheers,
Andreas



2017-07-20 16:35 GMT+02:00 Geert Josten 
<geert.jos...@marklogic.com<mailto:geert.jos...@marklogic.com>>:
Hi Andreas,

I tried to look for a nice Guide section, but couldn’t find one. There isn’t 
too much to say about it, actually.

It starts with adding metadata to a doc using 
http://docs.marklogic.com/xdmp:document-set-metadata. It takes a map:map, and 
non-string values will be converted to quoted strings. The metadata effectively 
lives inside the same document fragment as the document’s contents, but it is 
neither included nor embedded when you pull up the contents with, for instance, 
fn:doc.

You can also search on it using so-called metadata fields. That is a new, third 
type of field. You can create them with the Admin UI, or for instance with the 
admin functions. The Temporal guide spends a few words on it: 
http://docs.marklogic.com/guide/temporal/temporal-quick-start#id_50302. They 
are very useful for storing temporal properties, but you can use them for other 
purposes too.

In search constraints you just refer to the field by name, like any other 
field. You can range index metadata fields too, like other fields, and even 
index them as dateTime and such, but you cannot store a fragment of XML inside 
one and index on a sub-element of that. The fragment will simply get stored as 
quoted XML, and full-text search will run against that quoted string instead.

Cheers,
Geert

From: 
<general-boun...@developer.marklogic.com<mailto:general-boun...@developer.marklogic.com>>
 on behalf of Andreas Hubmer 
<andreas.hub...@ebcont.com<mailto:andreas.hub...@ebcont.com>>
Reply-To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Date: Thursday, July 20, 2017 at 11:53 AM
To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Subject: Re: [MarkLogic Dev General] Binary Document Ingestion in MP4 and MP3 
format

Hi Geert,

"MarkLogic 9 also allows storing simple key/value pairs in hidden document 
metadata, which is more efficient than document properties"
I am interested in that new feature. Is there an explanation somewhere of how 
it works (regarding reindexing, ...)?

Thanks,
Andreas



2017-07-20 11:33 GMT+02:00 Geert Josten 
<geert.jos...@marklogic.com<mailto:geert.jos...@marklogic.com>>:
Hi Pavan,

If you need to store both the binary itself, and the meta info + textual 
contents, there are two general approaches:

- put meta info and textual contents in document properties
- store them separately as normal documents with a reference to the database 
URI of the actual binary
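
To make the second approach a bit more concrete, here is a minimal Python 
sketch that only builds the companion XML document on the client side. The 
element names (media-info, binary-uri, meta, entry, text) and the ".meta.xml" 
URI suffix are assumptions made up for illustration, not a MarkLogic 
convention, and loading the result into the database is left out.

```python
import xml.etree.ElementTree as ET


def sidecar_uri(binary_uri: str) -> str:
    """Derive a URI for the companion document (hypothetical naming scheme)."""
    return binary_uri + ".meta.xml"


def build_sidecar(binary_uri: str, meta: dict, text: str) -> str:
    """Serialize meta info and extracted text, with a reference to the binary."""
    root = ET.Element("media-info")  # hypothetical root element name
    # The reference back to the binary is just its database URI.
    ET.SubElement(root, "binary-uri").text = binary_uri
    meta_el = ET.SubElement(root, "meta")
    for key, value in sorted(meta.items()):
        ET.SubElement(meta_el, "entry", name=key).text = str(value)
    ET.SubElement(root, "text").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    uri = "/media/talk.mp4"  # example URI, not taken from this thread
    print(sidecar_uri(uri))
    print(build_sidecar(uri, {"content-type": "video/mp4"}, "spoken words go here"))
```

Keeping the binary's URI inside the companion document means a search hit on 
the text can be resolved back to the binary with a single lookup.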

MarkLogic 9 also allows storing simple key/value pairs in hidden document 
metadata, which is more efficient than document properties or separate docs, 
but it is probably too limited for this use case.

You can store transcripts of videos including timestamps as XML, which would 
work for both the two-doc, and the doc-prop approach.
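
As a rough illustration of what such a transcript could look like, the Python 
sketch below builds a small XML transcript and looks up when a word was spoken. 
The element and attribute names (transcript, segment, start, end) are 
assumptions, the times are seconds from the start of the recording, and the 
lookup runs in plain Python rather than as a MarkLogic search.

```python
import re
import xml.etree.ElementTree as ET


def build_transcript(segments):
    """Build <transcript> from (start_seconds, end_seconds, text) tuples."""
    root = ET.Element("transcript")  # hypothetical element names throughout
    for start, end, text in segments:
        seg = ET.SubElement(root, "segment",
                            start=f"{start:.1f}", end=f"{end:.1f}")
        seg.text = text
    return root


def find_spoken(root, word):
    """Return (start, end) pairs of every segment that contains the word."""
    wanted = word.lower()
    hits = []
    for seg in root.iter("segment"):
        # Split on word characters so trailing punctuation does not block a match.
        words = re.findall(r"\w+", (seg.text or "").lower())
        if wanted in words:
            hits.append((float(seg.get("start")), float(seg.get("end"))))
    return hits


if __name__ == "__main__":
    transcript = build_transcript([
        (0.0, 4.5, "Welcome to the demo."),
        (4.5, 9.0, "Today we search video."),
    ])
    print(ET.tostring(transcript, encoding="unicode"))
    print(find_spoken(transcript, "search"))  # [(4.5, 9.0)]
```

Putting the times in attributes on each segment keeps the spoken text itself 
clean for full-text search, while a hit still carries its position in the 
recording.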

Document properties allow storing complete XML fragments, and are associated 
with the same database URI as the actual document (in this case the binary 
data). They are included in indexing automatically. You just need to indicate 
that you want to include properties fragments in searching and faceting.

There are out-of-the-box CPF pipelines for Document Filtering. There is one 
that saves the result in doc properties, and one that saves the result in a 
separate doc. It should be possible to enable those via the Admin UI.

Kind regards,
Geert

From: GUPTA Pavan 
<pavan.gu...@soprasteria.com<mailto:pavan.gu...@soprasteria.com>>
Date: Thursday, July 20, 2017 at 11:07 AM
To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>, 
Geert Josten <geert.jos...@marklogic.com<mailto:geert.jos...@marklogic.com>>
Subject: RE: [MarkLogic Dev General] Binary Document Ingestion in MP4 and MP3 
format

Hello Geert,

Thanks for the information. I would also like to know how I can store the 
content (meaning the spoken words) of a video and find the time at which it was 
spoken, in the same way that we load the content of any document file into 
metadata.
Is there a CPF pipeline I need to apply, or can you suggest a library?

Thanks In Advance!


Regards,
Pavan

From:general-boun...@developer.marklogic.com<mailto:general-boun...@developer.marklogic.com>
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Geert Josten
Sent: Thursday, July 20, 2017 2:27 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Binary Document Ingestion in MP4 and MP3 
format

Hi Pavan,

You can apply xdmp:document-filter to many binary formats, including mp3 and 
mp4. It will extract meta information like file size and content mime type, and 
for instance document properties from office documents, and exif tags from 
images. It will also attempt to extract actual text, but that will only work if 
such text is inside the file in a machine-readable form. E.g. text contained 
inside images or video streams will not be captured. This includes images 
embedded in office docs, image PDFs, and also captions and subtitles on images 
and videos. You would need an OCR kind of solution for that.

Kind regards,
Geert

From: 
<general-boun...@developer.marklogic.com<mailto:general-boun...@developer.marklogic.com>>
 on behalf of GUPTA Pavan 
<pavan.gu...@soprasteria.com<mailto:pavan.gu...@soprasteria.com>>
Reply-To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Date: Thursday, July 20, 2017 at 9:19 AM
To: "general@developer.marklogic.com<mailto:general@developer.marklogic.com>" 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Subject: [MarkLogic Dev General] Binary Document Ingestion in MP4 and MP3 format

Hi Team,

I am trying to ingest .mp4 and .mp3 files and make them searchable. I have 
read that these files are treated as binary files.

I have also seen how to make binary files searchable, and I have done it for 
.doc, .ppt, .pdf etc. files, but could not do it for .mp4 or .mp3.

What I actually want is to make these files searchable.

Can you please point me to how to achieve this, and tell me whether I need to 
enable or set up any content processing framework for it?

Thanks In Advance!


Regards,
Pavan

_______________________________________________
General mailing list
General@developer.marklogic.com<mailto:General@developer.marklogic.com>
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general



