Re: [MarkLogic Dev General] MarkLogic use case

Rob Szkutak Mon, 15 Dec 2014 09:22:14 -0800

Something else to keep in mind as you bring in data is whether or not you want
to perform any content enrichment or transformation on it. If you do, you'll
want to look into using CPF (the Content Processing Framework, which supports
various input formats including PDF) or writing your own XQuery to handle the
ingestion. Other methods of ingesting data will generally just insert it into
MarkLogic as-is.


For more information on CPF, see: https://docs.marklogic.com/guide/cpf/overview

Best,

Rob

From: [email protected] 
[mailto:[email protected]] On Behalf Of Dave Cassel
Sent: Monday, December 15, 2014 11:11 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] MarkLogic use case

First, JSON. In MarkLogic 7, documents are one of XML, text, or binary. If you 
insert and retrieve documents through the REST API, they will be quietly 
converted to XML and you can build range indexes on those converted documents. 
Users of the API will just see them as JSON documents. In MarkLogic 8 
(currently available in Early Access), JSON will become a native type and 
you'll be able to build range indexes on it in that native form.
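To illustrate why the conversion matters for indexing: once each JSON property
becomes an XML element, there is an element that an element range index can
target. The sketch below is a minimal Python illustration of that idea only. The
function name `json_to_xml` and the `type` attributes are hypothetical, and the
output is NOT the XML format that the MarkLogic 7 REST API actually produces.

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(value, tag="json"):
    """Illustrative JSON-to-XML mapping (hypothetical; not MarkLogic's format).

    Object keys become element names, so this assumes every key is a valid
    XML name; a real converter would have to escape keys that are not.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        elem.set("type", "object")
        for key, child in value.items():
            elem.append(json_to_xml(child, key))
    elif isinstance(value, list):
        elem.set("type", "array")
        for child in value:
            elem.append(json_to_xml(child, "item"))
    elif value is None:
        elem.set("type", "null")
    elif isinstance(value, bool):  # check bool before int: bool subclasses int
        elem.set("type", "boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        elem.set("type", "number")
        elem.text = str(value)
    else:
        elem.set("type", "string")
        elem.text = value
    return elem

doc = json.loads('{"title": "MarkLogic", "year": 2014, "tags": ["xml", "json"]}')
print(ET.tostring(json_to_xml(doc), encoding="unicode"))
```

In this toy mapping, the `year` property ends up as a `<year>` element, which is
the kind of element a range index could be configured on.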

Now for binaries. Binary content such as PDF and images will need to be 
converted to some form that has text and structure that MarkLogic can work 
with. MarkLogic provides some tools for this, like
xdmp:pdf-convert() (http://docs.marklogic.com/xdmp:pdf-convert) and
xdmp:document-filter() (http://docs.marklogic.com/xdmp:document-filter).
Functions like these will extract metadata and text from your binaries and
allow you to store the results, typically in another document alongside the
binary.

For CSV, your choices are to convert it to XML or to ingest it simply as text.
The latter would just give you one large text document with no structure, which
you almost certainly don't want. Take a look at MLCP
(http://developer.marklogic.com/products/mlcp) for a tool that converts CSV
into XML during ingest.
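MLCP handles this conversion for you; the following Python sketch only shows
the idea in miniature. Each CSV data row becomes its own small XML document
whose child elements are named after the column headers, so a column such as
`year` turns into an element that could carry a range index. The function name
`csv_rows_to_xml` and the `row` root element are hypothetical and are not
MLCP's actual options or output.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_rows_to_xml(text, root_name="row"):
    """Turn each CSV data row into a separate XML document string (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))  # first line supplies column names
    docs = []
    for record in reader:
        root = ET.Element(root_name)
        for column, value in record.items():
            # Assumes header names are valid XML element names.
            child = ET.SubElement(root, column)
            child.text = value
        docs.append(ET.tostring(root, encoding="unicode"))
    return docs

sample = "id,title,year\n1,Moby-Dick,1851\n2,Dracula,1897\n"
for doc in csv_rows_to_xml(sample):
    print(doc)
```

One document per row (rather than one document per file) keeps each record
independently searchable and indexable.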

--
Dave Cassel
Developer Community Manager
MarkLogic Corporation (http://www.marklogic.com/)
Cell:  +1-484-798-8720



From: Maisnam Ns <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Monday, December 15, 2014 at 11:16 AM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] MarkLogic use case

Hi,
I have to insert data that comes in various formats: XML, CSV, JSON, BIB,
images, and PDF.
As far as I know, I can use XML as it is and create range indexes on its
elements. But can someone tell me whether PDF, JSON, and BIB can have range
indexes? I am of the view that these formats need to be converted to XML before
creating the range indexes.
Is this a correct statement? How can I create range indexes on PDF, JSON, and
CSV without converting them to XML?
Regards
ns

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
