Thanks, Erik. Your reply helped me get into the right frame of mind for thinking critically about where certain ingestion logic should reside. And thanks for digging into the node-xlsx example and pointing out that it is an async layer built on an underlying sync library.

I definitely looked at the binary extract for xlsx and the Open Office pipeline, but these seem to allow only coarse-grained text searches. I need to be able to create indexes and run fine-grained queries on the data. Plus, xlsx has the nasty behavior of putting any repeated strings into a separate sharedStrings.xml file, and there didn't seem to be any MarkLogic server-side solution to remedy this. I also need to automate, or at least control, the shredding process from an external tier as much as possible, because there will be a lot of different sets of xlsx files. I'm thinking of massaging the xlsx into JSON, sending it to MarkLogic, and using CPF to split each "row" into a document, since the transform function can't do an xdmp.documentInsert().
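The sharedStrings.xml indirection can be undone in the external tier before anything is sent to MarkLogic. Below is a minimal sketch of that step in plain JavaScript. It assumes the workbook has already been unzipped and parsed into plain objects; the function names `resolveCell` and `rowsToDocuments` and the cell shape `{ type, value }` are made up for illustration and are not part of node-xlsx or any MarkLogic API. The one xlsx detail it relies on is that a cell of type "s" stores an index into the shared strings table rather than the text itself.

```javascript
// Hypothetical helper: resolve one parsed cell to a plain value.
//   sharedStrings: array of strings, in sharedStrings.xml order
//   cell: { type, value }, where type "s" means value is an index
//         into sharedStrings and type "n" means value is numeric
function resolveCell(sharedStrings, cell) {
  if (cell === undefined || cell === null) {
    return null; // empty cell
  }
  if (cell.type === 's') {
    return sharedStrings[Number(cell.value)];
  }
  if (cell.type === 'n') {
    return Number(cell.value);
  }
  return cell.value;
}

// Hypothetical helper: turn parsed worksheet rows into one object per data
// row. Treating the first row as column headers is an assumption about the
// sheet layout, not something xlsx guarantees.
function rowsToDocuments(sharedStrings, rows) {
  if (rows.length === 0) {
    return [];
  }
  const headers = rows[0].map((cell) => String(resolveCell(sharedStrings, cell)));
  return rows.slice(1).map((row) => {
    const doc = {};
    headers.forEach((header, i) => {
      doc[header] = resolveCell(sharedStrings, row[i]);
    });
    return doc;
  });
}

// Example input: "widget" is stored once and referenced twice.
const sharedStrings = ['name', 'qty', 'widget'];
const rows = [
  [{ type: 's', value: '0' }, { type: 's', value: '1' }],
  [{ type: 's', value: '2' }, { type: 'n', value: '3' }],
  [{ type: 's', value: '2' }, { type: 'n', value: '5' }],
];
const docs = rowsToDocuments(sharedStrings, rows);
console.log(JSON.stringify(docs));
// [{"name":"widget","qty":3},{"name":"widget","qty":5}]
```

Each resulting object is self-contained, so it could be sent as one JSON array for a server-side split or written as separate documents from the client; which of those fits better depends on how much control the external tier should keep.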
Ok, back to the node/npm/JavaScript libraries. Here's a knowledgebase page I just came across that offers additional explanation of what you pretty much nailed:

<https://help.marklogic.com/knowledgebase/article/View/222/0/server-side-javascript-implementation-and-module-reuse>

I've also included my troubleshooting steps for requiring a library server side, using 'lodash.js' as the example.

I tried to send lodash.js to the modules database and then use it in a transform with a require("lodash.js") statement, but it failed with:

> "message": "JS-JAVASCRIPT: var _ = require('lodash.js'); -- Error running
> JavaScript request: XDMP-NOEXECUTE: Document is not of executable mimetype.
> URI: lodash.js

So I needed to write it as lodash.sjs and use require("lodash.sjs"). But then this failed with:

> "message": "JS-JAVASCRIPT: var _ = require('lodash.sjs'); -- Error running
> JavaScript request: XDMP-MODNOTFOUND: Module lodash.sjs not found

To fix this, send it with the uri "/lodash.sjs" (note the leading slash) and use require("/lodash.sjs").

Note: I used contentType: "application/vnd.marklogic-javascript" when sending lodash.sjs to the server, and I used the Node.js client API's modulesDb.documents.write instead of the more specialized db.config.extlibs.write, because with the latter I couldn't get the transform's require statement to work. Plus, the former feels like it gives more flexibility without having to learn a special set of write and read calls. Maybe my perspective on this will change with time.

Regards,
Will

------------------------------

Message: 2
Date: Mon, 13 Jul 2015 02:55:54 +0000
From: Erik Hennum <[email protected]>
Subject: Re: [MarkLogic Dev General] Can node libraries be installed server-side?
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <dfdf2fd50bf5aa42adaf93ff2e3ca185070ea...@exchg10-be01.marklogic.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi, Will:

There are some significant differences between Node.js and MarkLogic as a JavaScript runtime environment (even though both make use of V8).

First and foremost, Node.js emphasizes asynchronous IO. As a transactional database, MarkLogic emphasizes synchronous IO. You can execute asynchronous actions in MarkLogic (via the task server), but when you call xdmp.documentInsert(), the call blocks until the operation succeeds or fails.

Stepping back, the tier where you implement an action is not arbitrary. In the database, it's best to write short actions (similar to stored procedures) for query expansion, query composition, inbound or outbound data transformation, and so on. The middle tier is great for information bus operations, business logic, and so on.

With that perspective, the libraries that make sense to use as dependencies for server-side JavaScript actions are those that finish synchronous actions quickly. For that reason, in this particular case, my guess would be that js-xlsx (the core library wrapped by node-xlsx) might be a better fit for server-side processing than node-xlsx (which adds asynchronous IO conveniences that would not work in the server).

At present, you would need to either modify the mimetypes configuration to identify *.js as an extension for server-side JavaScript (so the server knows that it's not static JavaScript to send to the client) or rename the library extension to sjs.

You could put the library in the modules database as described in:

http://docs.marklogic.com/guide/rest-dev/extensions#id_55309

Then, require the library in your transform or main module.

The speculations about package management for such dependencies are very interesting.
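To make the sync-versus-async distinction above a little more concrete, here is a rough sketch in plain JavaScript. It is not code from js-xlsx or node-xlsx; `parseDelimited` and `parseDelimitedFile` are hypothetical names chosen for illustration. The first function is the kind of dependency that should carry over to a server-side module, since it takes its input as an argument and returns without touching IO. The second is the kind of Node.js convenience that relies on `fs` and callbacks, and so would belong in the middle tier.

```javascript
// Synchronous core (hypothetical): pure input-to-output, no IO, no timers.
function parseDelimited(text, delimiter) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => line.split(delimiter));
}

// Asynchronous wrapper (hypothetical): depends on Node's fs module and its
// callback-based IO, so it only makes sense inside a Node.js process.
const fs = require('fs');

function parseDelimitedFile(path, delimiter, callback) {
  fs.readFile(path, 'utf8', (err, text) => {
    if (err) {
      callback(err);
      return;
    }
    callback(null, parseDelimited(text, delimiter));
  });
}

// The core can be used directly wherever the text is already in memory.
const table = parseDelimited('a,b\n1,2\n', ',');
console.log(JSON.stringify(table)); // [["a","b"],["1","2"]]
```

Keeping the parsing logic in a function like the first one means the same code could, in principle, be required from a transform or called from a Node.js client, with only the IO wrapper differing between tiers.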
By the way, the server can extract metadata from spreadsheets without installing an external library:

http://docs.marklogic.com/guide/search-dev/binary-document-metadata#id_74790

Hoping that helps,

Erik Hennum

------------------------------

Message: 1
Date: Sun, 12 Jul 2015 22:19:41 -0400
From: Will Lawrence <[email protected]>
Subject: [MarkLogic Dev General] Can node libraries be installed server-side?
To: [email protected]
Message-ID: <cagehxqseol3dqogobk-t6fze6fx-m8dhylbnlw3lc6t1c0m...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I tried but couldn't find any examples or guidance for using node libraries within .sjs files on the MarkLogic server. How could we use, for example, the npm module 'node-xlsx' in a transform?

It would be great to be able to leverage the power of the npm and node micro-library ecosystem within .sjs files.

Perhaps there could be an .npmrc file, controlled via the MarkLogic admin, to specify whether the server is allowed to talk to registry.npmjs.com, an enterprise npm registry, or none at all. Then a REST API could be exposed to write dependencies to MarkLogic's package.json, which would automatically run an 'npm install' so that when an .sjs file is installed, it can execute the line:

```
spreadsheetShredder = require('node-xlsx');
```

Regards,
Will
_______________________________________________
General mailing list
[email protected]
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
