Something else to keep in mind as you bring in data is whether you want to perform any content enrichment or transformation on it. If you do, look into CPF (which supports various input formats, including PDF) or write your own XQuery to handle the ingestion. Other ingestion methods will generally just insert the data into MarkLogic as-is.
For more information on CPF, see: https://docs.marklogic.com/guide/cpf/overview

Best,
Rob

From: [email protected] [mailto:[email protected]] On Behalf Of Dave Cassel
Sent: Monday, December 15, 2014 11:11 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] MarkLogic use case

First, JSON. In MarkLogic 7, every document is one of XML, text, or binary. If you insert and retrieve JSON documents through the REST API, they are quietly converted to XML, and you can build range indexes on those converted documents. Users of the API will just see them as JSON documents. In MarkLogic 8 (currently available in Early Access), JSON will become a native type and you'll be able to build range indexes on it in that native form.

Now for binaries. Binary content such as PDFs and images needs to be converted to a form that has text and structure that MarkLogic can work with. MarkLogic provides some tools for this, like xdmp:pdf-convert() <http://docs.marklogic.com/xdmp:pdf-convert> and xdmp:document-filter() <http://docs.marklogic.com/xdmp:document-filter>. Functions like these extract metadata and text from your binaries and allow you to store them, typically in another document alongside the binary.

For CSV, your choices are to convert it to XML or to ingest it simply as text. The latter would just give you a large text document with no structure -- you almost certainly don't want that. Take a look at MLCP <http://developer.marklogic.com/products/mlcp> for a tool that converts CSV into XML during ingest.
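
To make the CSV-to-XML idea concrete, here is a minimal Python sketch of the general technique: each CSV row becomes its own small XML document, with the column headers used as element names. This is not MLCP's implementation or its output format; the function name csv_rows_to_xml, the "row" root element, and the sample data are all hypothetical and chosen only for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_rows_to_xml(csv_text, row_element="row"):
    """Turn each CSV data row into its own small XML document (as a string).

    Hypothetical helper for illustration; not part of MLCP or MarkLogic.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for record in reader:
        root = ET.Element(row_element)
        for column, value in record.items():
            if column is None:
                # Row had more fields than the header line; skip the extras.
                continue
            # Headers become element names; spaces are not legal in XML names.
            child = ET.SubElement(root, column.strip().replace(" ", "_"))
            child.text = value
        documents.append(ET.tostring(root, encoding="unicode"))
    return documents


# Made-up sample data: one header line and two data rows.
sample = "title,year\nMoby Dick,1851\nDracula,1897\n"
for doc in csv_rows_to_xml(sample):
    print(doc)
```

Once each row is an XML document like this, a column such as year is an ordinary element, which is the kind of structure an element range index can be defined on.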
-- Dave Cassel
Developer Community Manager
MarkLogic Corporation <http://www.marklogic.com/>
Cell: +1-484-798-8720

From: Maisnam Ns <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Monday, December 15, 2014 at 11:16 AM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] MarkLogic use case

Hi,

I have to insert data that comes in various formats: XML, CSV, JSON, bib, images, and PDF. As far as I know, I can use XML as-is and create range indexes on elements. But can someone tell me whether PDF, JSON, and bib can have range indexes? My view is that these formats need to be converted to XML before range indexes can be created. Is that correct? How can I create range indexes on PDF, JSON, and CSV without converting to XML?

Regards,
ns
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
