Re: [MarkLogic Dev General] Extracting ML Documents to a zip file using Java Client Api ?
I shared an answer in your duplicate SO question<https://stackoverflow.com/questions/48636279/is-there-compress-option-for-exportlistener-in-marklogic-java-client-api/48649442>, but I'll repeat it here in case it helps anyone.

Your onDocumentReady listener is run for each document, so it doesn't make sense to create a new FileOutputStream("F:/Json/file.zip") for each document. Each new FileOutputStream truncates the file, which is why you're only seeing the last document when you're done. Try moving these two lines to before you initialize your batcher:

    final FileOutputStream dest = new FileOutputStream("F:/Json/file.zip");
    final ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));

That way they'll only run once. Also, move this to after dmm.stopJob(batcher);:

    out.close();

Also, surround your listener code in a synchronized(out) {...} block so the threads won't overwrite each other as they write to the stream. Remember, your listener code is going to run in 10 threads in parallel, so your code in the listener needs to be thread-safe.

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

This e-mail and any accompanying attachments are confidential. The information is intended solely for the use of the individual to whom it is addressed. Any review, disclosure, copying, distribution, or use of this e-mail communication by others is strictly prohibited. If you are not the intended recipient, please notify us immediately by returning this message to the sender and delete all copies. Thank you for your cooperation.

From: general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> on behalf of C. Yaswanth <rocking...@gmail.com>
Sent: Wednesday, February 7, 2018 7:13 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Extracting ML Documents to a zip file using Java Client Api ?

Hi,

I am exporting all the documents from a collection to a local directory. Below is my code.

    public class Extract {
        // replace with your MarkLogic Server connection information
        static DatabaseClient client = DatabaseClientFactory.newClient("x", x, "x", "x", Authentication.DIGEST);
        private static String EX_DIR = "F:/JavaExtract";

        // Loading files into the database asynchronously
        public static void exportByQuery() {
            DataMovementManager dmm = client.newDataMovementManager();

            // Construct a directory query with which to drive the job.
            QueryManager qm = client.newQueryManager();
            StringQueryDefinition query = qm.newStringDefinition();
            query.setCollections("GOT");

            // Create and configure the batcher
            QueryBatcher batcher = dmm.newQueryBatcher(query);
            batcher.withBatchSize(10)
                .withThreadCount(1)
                .onUrisReady(
                    new ExportListener()
                        .onDocumentReady(doc-> {
                            String uriParts[] = doc.getUri().split("/");
                            try {
                                FileOutputStream dest = new FileOutputStream("F:/Json/file.zip");
                                ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));
                                ZipEntry e = new ZipEntry(uriParts[uriParts.length - 1]);
                                out.putNextEntry(e);
                                byte[] data = doc.getContent(
                                    new StringHandle()).toBuffer();
                                doc.getFormat();
                                out.write(data, 0, data.length);
                                out.closeEntry();
                                out.close();
                            } catch (Exception e) {
                                e.printStackTrace();
                            }
                        }))
                .onQueryFailure( exception -> exception.printStackTrace() );
            dmm.startJob(batcher);

            // Wait for the job to complete, and then stop it.
            batcher.awaitCompletion();
            dmm.stopJob(batcher);
        }

        public static void main(String[] args) {
            exportByQuery();
        }
    }

When I run it, only the last document in the `GOT` collection ends up in the zip rather than all of them. I can't figure out where I am going wrong.

Any help is appreciated.

Thanks

___
General mailing list
General@developer.marklogic.com
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
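The advice in the reply (open the zip stream once, guard each entry with synchronized(out), close the stream only after the job has stopped) can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not the MarkLogic API: a plain ExecutorService stands in for the DMSDK listener threads, an in-memory buffer stands in for F:/Json/file.zip, and the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class SharedZipSketch {

    // Writes one zip entry per name from several worker threads.
    // The worker pool is a stand-in for the DMSDK listener threads.
    public static byte[] zipAll(List<String> names, int threads) throws Exception {
        ByteArrayOutputStream dest = new ByteArrayOutputStream();
        // Opened once, before any worker runs (not once per document).
        final ZipOutputStream out = new ZipOutputStream(dest);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String name : names) {
            pool.submit(() -> {
                // Hypothetical document body, built outside the lock.
                byte[] data = ("{\"name\":\"" + name + "\"}").getBytes(StandardCharsets.UTF_8);
                // putNextEntry/write/closeEntry must not interleave across threads.
                synchronized (out) {
                    try {
                        out.putNextEntry(new ZipEntry(name));
                        out.write(data, 0, data.length);
                        out.closeEntry();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        // Equivalent of awaitCompletion() followed by stopJob(): wait for all workers.
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        // Closed once, only after every entry has been written.
        out.close();
        return dest.toByteArray();
    }

    // Reads the entry names back so the result can be checked.
    public static List<String> entryNames(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        byte[] zip = zipAll(Arrays.asList("a.json", "b.json", "c.json"), 3);
        System.out.println(entryNames(zip));
    }
}
```

The entries may appear in any order, because the workers race for the lock. In the real listener the same three pieces apply: the two stream declarations move above the batcher, the body of onDocumentReady goes inside synchronized(out), and out.close() moves below dmm.stopJob(batcher).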
Re: [MarkLogic Dev General] Is there compress option for exportListener in Marklogic Java Client API?
Take a look at ExportToZipJob in ml-javaclient-util<https://github.com/marklogic-community/ml-javaclient-util/wiki/DMSDK-Jobs> and mlExportToZip and mlExportBatchesToZips in ml-gradle<https://github.com/marklogic-community/ml-gradle/wiki/Exporting-data>. Do those do what you need? If not, could you help describe your requirements in more detail?

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

From: general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> on behalf of C. Yaswanth <rocking...@gmail.com>
Sent: Tuesday, February 6, 2018 5:52:45 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Is there compress option for exportListener in Marklogic Java Client API?

I want to export all the documents from my MarkLogic db using the Data Movement SDK. I exported them as files, but I want to try compressing them into a zip file through DMSDK. I searched the documentation for a `compress` option but didn't find any.

I am looking for an option like `compress` while exporting data, as in CORB and MLCP.

Any help is appreciated.

Thanks
Re: [MarkLogic Dev General] mlcp for multiple host
Vikas, I'm sorry I can't answer on MLCPBean. But I think the more supported way to do this is using the Data Movement SDK<https://developer.marklogic.com/learn/data-movement-sdk>. Have you looked at that?

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

From: general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> on behalf of vikas.sin...@cognizant.com <vikas.sin...@cognizant.com>
Sent: Wednesday, January 31, 2018 12:12:51 PM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] mlcp for multiple host

Hi All,

I am customizing the mlcp bean to run content pump through Java code. I am able to set the host name where I have a single host. In a cluster environment, I changed my code to set the host as a comma-separated list, but I am getting an IllegalArgumentException.

Is there any other way to connect mlcp to multiple hosts?

Regards,
Vikas Singh

This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
Re: [MarkLogic Dev General] Regarding DMSDK
Sundaravadivel, that is a different question than what OP is asking about. The original question was about how to connect to all the nodes in the cluster.

You can use DMSDK with an F5, but you lose the performance benefit of talking directly to the best host in the cluster. When using QueryBatcher, remember that documents are stored in forests on a specific primary host, and the F5 doesn't know which forest or host that is, but DMSDK does know. So inserting the F5 into the mix adds an extra redirect, as the communication is forced to a random host rather than the node which has the document.

For an example of how to configure DMSDK if you must use it with a load balancer, see https://docs.marklogic.com/guide/java/data-movement#id_26583 (Asynchronous Multi-Document Operations, Java Application Developer's Guide).

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

From: general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> on behalf of Sundaravadivel kandaswamy <mumbaisun...@gmail.com>
Sent: Monday, January 29, 2018 12:13:47 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Regarding DMSDK

If we use F5 or other tools for load balancing, then how will DMSDK distribute the load across all the hosts? F5 will have only one URL, which is connected to multiple nodes.

On Jan 29, 2018 2:07 PM, "Sam Mefford" <sam.meff...@marklogic.com> wrote:

You normally don't have to specify the hosts. When you connect to a specific host/port/database, DMSDK figures out what hosts also have a REST server available on that port and forests for that database. So DMSDK will connect directly to all appropriate hosts.

Does that answer your question?

Sam Mefford
Senior Engineer
MarkLogic Corporation

From: general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> on behalf of C. Yaswanth <rocking...@gmail.com>
Sent: Monday, January 29, 2018 11:38:21 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Regarding DMSDK

I want to run the MarkLogic Data Movement SDK for transformation in distributed mode on my MarkLogic cluster, which is running on 3 nodes.

Usually in mlcp we use the `-host` parameter to specify our hostnames and the `-mode` parameter to define our mode type.

Is it possible in DMSDK to list all the hostnames of the nodes in our ML cluster like this:

    newClient("host1,host2,host3", port, username, password, authentication)

so that it will distribute the task efficiently? I haven't seen any documentation of DMSDK with multiple hosts.

1. If I don't give all the hostnames, how will it distribute the work in parallel?

Any help is appreciated.

Thanks
Re: [MarkLogic Dev General] Regarding DMSDK
You normally don't have to specify the hosts. When you connect to a specific host/port/database, DMSDK figures out what hosts also have a REST server available on that port and forests for that database. So DMSDK will connect directly to all appropriate hosts.

Does that answer your question?

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

From: general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> on behalf of C. Yaswanth <rocking...@gmail.com>
Sent: Monday, January 29, 2018 11:38:21 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Regarding DMSDK

I want to run the MarkLogic Data Movement SDK for transformation in distributed mode on my MarkLogic cluster, which is running on 3 nodes.

Usually in mlcp we use the `-host` parameter to specify our hostnames and the `-mode` parameter to define our mode type.

Is it possible in DMSDK to list all the hostnames of the nodes in our ML cluster like this:

    newClient("host1,host2,host3", port, username, password, authentication)

so that it will distribute the task efficiently? I haven't seen any documentation of DMSDK with multiple hosts.

1. If I don't give all the hostnames, how will it distribute the work in parallel?

Any help is appreciated.

Thanks
Re: [MarkLogic Dev General] Noob query question..
I should point out that this is not the fastest way to do it. A faster way would be to index "date-taken" as a dateTime element range index and use cts:search with cts:element-range-query.

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Sam Mefford [sam.meff...@marklogic.com]
Sent: Thursday, August 24, 2017 2:56 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Noob query question..

XQuery is an extension of XPath. Here's an example in XPath. These things are easiest to understand if we know the structure of your docs. Suppose I insert:

    xdmp:document-insert("test.xml", <note><date-taken>2015-01-01</date-taken></note>)

I could find the count of docs more than two years old like this:

    count(/note[fn:days-from-duration(fn:current-date() - date-taken) > (365 * 2)])

Sam Mefford
Senior Engineer
MarkLogic Corporation

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Ladner, Eric (Eric.Ladner) [eric.lad...@chevron.com]
Sent: Thursday, August 24, 2017 2:24 PM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Noob query question..

I’m still rather new to MarkLogic and apparently have a lot to learn. When doing research on a proof of concept, I ran across a situation that would be trivial to solve in SQL, but I’m having problems wrapping my head around how to do that in XQuery. Or, is XQuery even the right place for this?

Basically, I want the number of notes per subject for any note that’s less than two years old.

If I were to do this in SQL, it’d look something like:

    select subject, count(*) from notes where date_taken > sysdate-(365*2) group by subject;

There’s some additional WHERE clause stuff for filtering, but on average, the number of results shouldn’t be large.

Any guidance on building up more complex queries like this? The documentation is semi-helpful, but the examples it gives for usage are usually very simplistic.

Eric Ladner
Systems Analyst
eric.lad...@chevron.com
Re: [MarkLogic Dev General] Noob query question..
XQuery is an extension of XPath. Here's an example in XPath. These things are easiest to understand if we know the structure of your docs. Suppose I insert:

    xdmp:document-insert("test.xml", <note><date-taken>2015-01-01</date-taken></note>)

I could find the count of docs more than two years old like this:

    count(/note[fn:days-from-duration(fn:current-date() - date-taken) > (365 * 2)])

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Ladner, Eric (Eric.Ladner) [eric.lad...@chevron.com]
Sent: Thursday, August 24, 2017 2:24 PM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Noob query question..

I’m still rather new to MarkLogic and apparently have a lot to learn. When doing research on a proof of concept, I ran across a situation that would be trivial to solve in SQL, but I’m having problems wrapping my head around how to do that in XQuery. Or, is XQuery even the right place for this?

Basically, I want the number of notes per subject for any note that’s less than two years old.

If I were to do this in SQL, it’d look something like:

    select subject, count(*) from notes where date_taken > sysdate-(365*2) group by subject;

There’s some additional WHERE clause stuff for filtering, but on average, the number of results shouldn’t be large.

Any guidance on building up more complex queries like this? The documentation is semi-helpful, but the examples it gives for usage are usually very simplistic.

Eric Ladner
Systems Analyst
eric.lad...@chevron.com
Re: [MarkLogic Dev General] Large job processing question.
Delete those documents (and associated sidecar files) from MarkLogic Server.

Solution Adjustment 3

The source documents can be read from a staging area containing at least the uri and the up-to-date hashcode for each document. This will reduce the read load on the source system to only documents found to be missing from MarkLogic or updated from what is in MarkLogic.

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Ladner, Eric (Eric.Ladner) [eric.lad...@chevron.com]
Sent: Tuesday, August 22, 2017 8:36 AM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Large job processing question.

We have some large jobs (ingestion and validation of unstructured documents) that have timeout issues.

The way the jobs are structured is that the first job checks that all the existing documents are valid (still exist on the file system). It does this in two steps:

1) gather all documents to be validated from the DB
2) check that list against the file system.

The second job is:

1) the filesystem is traversed to find any new documents (or documents that have been modified in the last X days),
2) those new/modified documents are ingested.

The problem in the second step is there could be tens of thousands of documents in a hundred thousand folders (don’t ask). The job will typically time out after an hour during the “go find all the new documents” phase.

I’m trying to find out if there’s a way to re-structure the job so that it runs faster and doesn’t time out, or maybe breaks up the task into different parts that run in parallel or something.

Any thoughts welcome.

Eric Ladner
Systems Analyst
eric.lad...@chevron.com
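The hashcode comparison in "Solution Adjustment 3" can be sketched as a diff between two uri-to-hash maps. The earlier part of the reply is not included in this archive, so this is only one reading of it: the assumption here is that documents whose uri is missing from MarkLogic or whose hash differs get (re)ingested, and documents whose uri is no longer in the staging area get deleted. The class name, method names, and sample uris are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class HashDiffSketch {

    // URIs that are in the staging area but missing from MarkLogic,
    // or present in both with a different hash.
    public static Set<String> toIngest(Map<String, String> staged, Map<String, String> stored) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : staged.entrySet()) {
            String storedHash = stored.get(e.getKey()); // null when the uri is not in MarkLogic
            if (!e.getValue().equals(storedHash)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // URIs that are in MarkLogic but no longer in the staging area.
    public static Set<String> toDelete(Map<String, String> staged, Map<String, String> stored) {
        Set<String> result = new TreeSet<>(stored.keySet());
        result.removeAll(staged.keySet());
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical uri -> hash pairs read from the staging area.
        Map<String, String> staged = new HashMap<>();
        staged.put("/docs/a.txt", "h1");
        staged.put("/docs/b.txt", "h2-new");
        staged.put("/docs/c.txt", "h3");

        // Hypothetical uri -> hash pairs already recorded in MarkLogic.
        Map<String, String> stored = new HashMap<>();
        stored.put("/docs/a.txt", "h1");
        stored.put("/docs/b.txt", "h2-old");
        stored.put("/docs/old.txt", "h9");

        System.out.println("ingest: " + toIngest(staged, stored));
        System.out.println("delete: " + toDelete(staged, stored));
    }
}
```

Only the uris in the two result sets need to touch the source system or the database, which is the load reduction the reply describes.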
Re: [MarkLogic Dev General] Logging Java Api
I don't see this code anywhere creating any log files nor connecting the out stream to any logging framework. To see the contents of "out", you'll need to write them to a file, or print them to standard out, or connect a logging framework to do that for you.

All RequestLogger is trying to do is allow you to capture the requests sent to the REST server if that's desirable. If all you need is logging, please use your preferred logging framework. We're not attempting to be a logging framework; we're attempting to be ready to connect with and work in harmony with your preferred logging framework.

Here's an important relevant excerpt from the javadoc summary page<https://docs.marklogic.com/javadoc/client/overview-summary.html?hq=slf4j>:

Enabling Logging

We use slf4j<http://www.slf4j.org/manual.html> for logging. This means you can choose any slf4j-compliant logging framework such as Logback, AVSL, JDK logging, Log4j, or Simple. If you don't know which to choose, we recommend Logback since it is a native implementation and the easiest to configure. Please follow the instructions on the slf4j website to configure your logging framework. It should take no more than 15 minutes. Once your logging framework is configured with slf4j, you should be able to see and manage logging from the Java-client API. This is especially important for long-running QueryBatcher and WriteBatcher jobs.

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Andreas Holzgethan [andreas.holzget...@ebcont.com]
Sent: Friday, May 19, 2017 3:18 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Logging Java Api

We tried different ways:

1.

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    RequestLogger logger = databaseClient.newLogger(out);
    xmlDocumentManager.startLogging(logger);
    try {
        out.write("Test!".getBytes());
        out.flush();
    } catch (IOException e) {
        e.printStackTrace();
    }
    DocumentDescriptor e = this.xmlDocumentManager.newDescriptor(this.getUri(id));
    StringHandle content = (StringHandle)this.xmlDocumentManager.read(e, (new StringHandle()).withFormat(Format.XML));

2.

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    RequestLogger logger = databaseClient.newLogger(out);
    xmlDocumentManager.startLogging(logger);
    try {
        logger.getPrintStream().write("Test".getBytes());
        logger.getPrintStream().flush();
    } catch (IOException e) {
        e.printStackTrace();
    }
    DocumentDescriptor e = this.xmlDocumentManager.newDescriptor(this.getUri(id));
    StringHandle content = (StringHandle)this.xmlDocumentManager.read(e, (new StringHandle()).withFormat(Format.XML));

We also tried calling startLogging after the try/catch block. We also tried calling append instead of write, but we never found "Test" in the log files.

Best regards
Andreas Holzgethan

Andreas Holzgethan BSc.
IT Consultant
EBCONT enterprise technologies GmbH
Millennium Tower
Handelskai 94-96
1200 Wien
Mobil: +43 664 606 517 05
Email: andreas.holzget...@ebcont.com
Web: http://www.ebcont-et.com/<http://www.ebcont.com/>
OUR TEAM IS YOUR SUCCESS
HG St. Pölten - FN 293731 h
UID: ATU63444589

2017-05-18 18:40 GMT+02:00 Sam Mefford <sam.meff...@marklogic.com>:

Can you please share the code you tried that did not work?

Sam Mefford
Senior Engineer
MarkLogic Corporation

From: general-boun...@developer.marklogic.com<mailto:general-boun...@developer.marklogic.com> [general-boun...@developer.marklogic.com<mailto:general-boun...@deve
Re: [MarkLogic Dev General] MarkLogic unexpected REST authentication (digest)
Consider sending the curl output with -v

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

On 4/12/2017 8:59 AM, Dave Cassel wrote:

Could you send the output of your script?

--
Dave Cassel, @dmcassel<https://twitter.com/dmcassel>
Technical Community Manager
MarkLogic Corporation<http://www.marklogic.com/>
http://developer.marklogic.com/

From: Nikunj Vekariya <nikunjdvekar...@yahoo.com>
Reply-To: Nikunj Vekariya <nikunjdvekar...@yahoo.com>
Date: Wednesday, April 12, 2017 at 7:33 AM
To: Dave Cassel <dave.cas...@marklogic.com>, MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] MarkLogic unexpected REST authentication (digest)

Hi Dave,

Yes, it is a 401 error. I tried running the same script multiple times, but it still throws the same error.

Best,
Nikunj

On Wednesday, 12 April 2017 4:59 PM, Dave Cassel <dave.cas...@marklogic.com> wrote:

Does it return a 401 and then try again and succeed? That's actually normal operation for a digest request. See Example with Explanation<https://en.wikipedia.org/wiki/Digest_access_authentication#Example_with_explanation> on the Digest access authentication Wikipedia page.

--
Dave Cassel

From: <general-boun...@developer.marklogic.com> on behalf of Nikunj Vekariya <nikunjdvekar...@yahoo.com>
Reply-To: Nikunj Vekariya <nikunjdvekar...@yahoo.com>, MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Wednesday, April 12, 2017 at 7:24 AM
To: "general@developer.marklogic.com" <general@developer.marklogic.com>
Subject: [MarkLogic Dev General] MarkLogic unexpected REST authentication (digest)

I am using MarkLogic version 8.0-5.4. I created a REST instance on port 8100. The default HTTP configuration for authentication is digest and the default user is nobody. When I run the following curl command, it returns a 401 Unauthorized error.

    curl --anyauth --user admin:admin -X PUT -d "" -H "Content-type: application/xml" http://localhost:8100/v1/documents?uri=/shakespeare/plays/a_and_c.xml

Even if I replace --anyauth with --digest, it returns the same error.

Also, if I change the default user to admin, keep authentication set to digest in the admin interface, and run the same curl command, it returns the same error.

Basically, whenever authentication is set to digest, I get a 401 Unauthorized error message. I want to use digest in my application. How should I proceed?

Best,
Nikunj Vekariya
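Dave's point that an initial 401 is normal for digest can be made concrete: the client cannot build its Authorization header until the server's 401 challenge has supplied the realm and nonce. The sketch below computes the digest response for the qop=auth, MD5 case from RFC 2617, using the values of the worked example on the Wikipedia page linked above. It is an illustration of the protocol arithmetic only; which digest parameters a given MarkLogic server sends is not stated in this thread, and the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestResponseSketch {

    // Lowercase hex MD5 of a string.
    public static String md5Hex(String s) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(s.getBytes(StandardCharsets.ISO_8859_1));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // RFC 2617 response value for qop=auth:
    //   HA1 = MD5(username:realm:password)
    //   HA2 = MD5(method:uri)
    //   response = MD5(HA1:nonce:nc:cnonce:qop:HA2)
    // realm and nonce come from the server's 401 challenge, which is why
    // the first request has to be rejected before the second can succeed.
    public static String response(String username, String realm, String password,
                                  String method, String uri, String nonce,
                                  String nc, String cnonce, String qop)
            throws NoSuchAlgorithmException {
        String ha1 = md5Hex(username + ":" + realm + ":" + password);
        String ha2 = md5Hex(method + ":" + uri);
        return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":" + qop + ":" + ha2);
    }

    public static void main(String[] args) throws Exception {
        // Values from the RFC 2617 worked example.
        System.out.println(response(
                "Mufasa", "testrealm@host.com", "Circle Of Life",
                "GET", "/dir/index.html",
                "dcd98b7102dd2f0e8b11d0f600bfb0c093",
                "00000001", "0a4f113b", "auth"));
    }
}
```

With curl -v, the same exchange shows up as a first response with a WWW-Authenticate: Digest header, followed by a second request carrying an Authorization: Digest header. A 401 on that second request is the case worth investigating.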
Re: [MarkLogic Dev General] Trying to find equivalent to MongoDB aggregate( match -> project[include] -> project[exclude] -> unwind )
Can you give some context as to what "a better way" should do better than your solution?

Sam Mefford
Senior Engineer, MarkLogic Corporation

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of John Muehlhausen [j...@esseforma.com]
Sent: Tuesday, April 11, 2017 4:08 PM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Trying to find equivalent to MongoDB aggregate( match -> project[include] -> project[exclude] -> unwind )

I have a MarkLogic solution in server-side JavaScript, but I'm thinking there must be a better way.

The document is in a bitemporal store called "zzz". I want to do the following: Search documents in both the "zzz" and "latest" collections (i.e. the most recent version of "zzz"). Find the documents where papa="Dan". Of these results, eliminate the "child" arrays from "children" items. Return {yoyo,hoop} documents where there will be as many return documents as there are children. In other words, there is denormalization where yoyo is duplicated. A source document with two children would produce two output documents.

JavaScript follows example document:

    {
      "systemStart": "2017-04-11T17:46:48.468112Z",
      "systemEnd": "-12-31T11:59:59Z",
      "validStart": "2014-04-03T11:00:00",
      "validEnd": "2014-04-03T16:00:00",
      "id": "2017-04-11/34567",
      "date": "2017-04-11",
      "yoyo": "12345",
      "papa": "Dan",
      "children": [
        { "hoop": "A", "child": [ { "a": "a" } ] },
        { "hoop": "B", "child": [ { "a": "b" } ] }
      ]
    }

My current solution:

    var query = cts.andQuery([
      cts.collectionQuery("latest"),
      cts.collectionQuery("zzz"),
      cts.jsonPropertyValueQuery("papa", "Dan")
    ]);
    var results = cts.search(query);
    var arr = [];
    for (var result of results) {
      result = result.toObject();
      var len = result.children.length;
      for (var i = 0; i < len; i++) {
        delete result.children[i].child;
        result.children[i].yoyo = result.yoyo;
        arr.push(result.children[i]);
      }
    }
    arr;

Current output in Query Console, assuming two documents are found:

    [
      { "hoop": "A", "yoyo": "12345" },
      { "hoop": "B", "yoyo": "12345" },
      { "hoop": "A", "yoyo": "12345" },
      { "hoop": "B", "yoyo": "12345" }
    ]

In MongoDB the query looks something like this (without the bitemporal of course):

    db.zzz.aggregate( [
      {$match: {"papa": "Dan"}},
      {$project: {_id: false, "yoyo": true, "children": true}},
      {$project: {"children.child": false}},
      {$unwind: "$children"}
    ] )

Thanks,
John
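One direction a "better way" could take, sketched in plain JavaScript: build the {hoop, yoyo} rows as new objects rather than deleting properties from the search results in place. This is only an illustration under assumptions. unwindChildren is a hypothetical helper name, and the input here is an ordinary array of objects; in MarkLogic it would be the cts.search results after toObject(), as in the code above.

```javascript
// Hypothetical helper: emulates the $project + $unwind steps on in-memory objects.
// Input: an array of plain objects shaped like the example document.
function unwindChildren(docs) {
  var rows = [];
  docs.forEach(function (doc) {
    (doc.children || []).forEach(function (item) {
      var row = {};
      // Copy every property of the children item except "child".
      Object.keys(item).forEach(function (key) {
        if (key !== "child") {
          row[key] = item[key];
        }
      });
      // Denormalize: repeat the parent's yoyo on every row.
      row.yoyo = doc.yoyo;
      rows.push(row);
    });
  });
  return rows;
}

// Trimmed copy of the example document (illustrative data only).
var sample = {
  yoyo: "12345",
  papa: "Dan",
  children: [
    { hoop: "A", child: [{ a: "a" }] },
    { hoop: "B", child: [{ a: "b" }] }
  ]
};

var rows = unwindChildren([sample]);
console.log(JSON.stringify(rows));
// prints [{"hoop":"A","yoyo":"12345"},{"hoop":"B","yoyo":"12345"}]
```

Because the rows are copies, the source objects keep their "child" arrays, which matters if the same results are reused later in the request.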
Re: [MarkLogic Dev General] How to pass parameters between JavaScript functions?
I think you need to add "return" before parseFlights(resp); in the getFlightsInAir function.

Sam Mefford
Senior Engineer, MarkLogic Corporation

On 2/8/2017 2:09 PM, Matt Moody wrote:
> ML JavaScript newbie question...
>
> I'm simply trying to pass the response from my httpGet call into my
> parseFlights function. The httpGet call returns a valid JSON document.
> However, it looks like the parameter that gets passed to the parseFlights
> function is always blank.
>
> Why?
>
> declareUpdate();
>
> var username = 'moody1';
> var apiKey = '#';
>
> var fxml_base_url = 'http://flightxml.flightaware.com/json/FlightXML2/';
> var url_endpoint = 'SearchBirdseyeInFlight?howMany';
> var url = fxml_base_url + url_endpoint;
>
> function getFlightsInAir() {
>   var resp = fn.subsequence(xdmp.httpGet(url,
>     {"authentication":{"method":"basic","username":username,"password":apiKey}}),
>     2, 1);
>
>   // return resp; // this displays a valid JSON document
>   parseFlights(resp); // pass the JSON document to parseFlights function
> };
>
> function parseFlights(flights) {
>   return flights; // this data is always blank
> };
>
> getFlightsInAir();
>
> Any help would be appreciated!!
>
> Matt Moody
> Sales Engineer
> MarkLogic Pty Ltd
> matt.mo...@marklogic.com
> Mobile: +61 415 564 355
> Skype: matt.moody.ML
> www.marklogic.com
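The effect of the missing "return" can be reproduced without MarkLogic. The sketch below is a simplified stand-in, not the original module: fakeResponse is hypothetical data replacing the xdmp.httpGet result, and the two getFlightsInAir variants are named only to contrast the behaviors.

```javascript
// Hypothetical stand-in for the HTTP response; the real code gets it from xdmp.httpGet.
var fakeResponse = { flights: ["QF1", "QF2"] };

function parseFlights(flights) {
  return flights;
}

// Original shape: parseFlights runs, but its result is discarded,
// so the caller receives undefined.
function getFlightsInAirWithoutReturn() {
  var resp = fakeResponse;
  parseFlights(resp);
}

// Fixed shape: the result of parseFlights is handed back to the caller.
function getFlightsInAirWithReturn() {
  var resp = fakeResponse;
  return parseFlights(resp);
}

console.log(getFlightsInAirWithoutReturn());               // prints undefined
console.log(getFlightsInAirWithReturn() === fakeResponse); // prints true
```

The parameter itself was never blank inside parseFlights; the value was simply dropped on the way back out of getFlightsInAir, which is why the module's final expression evaluated to nothing.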
Re: [MarkLogic Dev General] XML to JSON using MLCP
Have you looked at using a transform? Here's an example from the docs <https://docs.marklogic.com/guide/mlcp/import#id_17589> that converts from binary to XML, so I'm guessing you can convert from XML to JSON.

Sam Mefford
Senior Engineer, MarkLogic Corporation

On 2/7/2017 3:16 PM, Shiv Shankar wrote:

Hi,

I am loading nested XML, so I used -input_file_type aggregates. I see the records being inserted as XML documents. Is there a way I can insert them as JSON documents using MLCP?

Thanks,
Shan.
Re: [MarkLogic Dev General] How to get the number of records ingested by a MLCP hit
The Data Movement SDK is also available via the Early Access program or via GitHub: https://github.com/marklogic/java-client-api/tree/4.0.0-EA4

Sam Mefford
Senior Engineer, MarkLogic Corporation

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Dave Cassel [dave.cas...@marklogic.com]
Sent: Thursday, January 12, 2017 9:09 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get the number of records ingested by a MLCP hit

An additional thought for the benefit of the list archive: in MarkLogic 9, you should be able to get this functionality through the Data Movement SDK (currently still in development).

-- Dave Cassel <http://davidcassel.net>, @dmcassel <https://twitter.com/dmcassel>
Technical Community Manager
MarkLogic Corporation <http://www.marklogic.com/>
http://developer.marklogic.com/

From: <general-boun...@developer.marklogic.com> on behalf of David Lee <david@marklogic.com>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Thursday, January 12, 2017 at 10:41 AM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] How to get the number of records ingested by a MLCP hit

The right approach depends on your use case and scenarios. mlcp is an application layer built on the XCC and HDFS libraries, as well as application logic.

If you like the feature set of mlcp and just want to call it from Java, then I would recommend, as did Sam, calling it as a sub-process (Runtime.exec()). I do *not* recommend calling it in the same JVM as your Java code -- there are no significant performance or feature gains, and there is a risk of 'contamination' of the global memory space between your app and MLCP. If you follow the 'best practice' for running sub-processes in Java, MLCP will work well in that mode. In particular, make sure that the input and output streams of the process don’t stall or deadlock. This is well documented in the Oracle Java APIs for sub-processes and is not particular to MLCP. You will need to read the stdout of mlcp concurrently with writing any input provided to it, or redirect the sub-process's input, output, and error streams to a file. This will give you the full feature set of mlcp.

If you are looking for a Java library intended to be embedded into a Java app, the suggested library is the Java SDK, which is based on the REST framework. It is different from MLCP but at about the same abstraction level, and is a library intended for embedded use.

If you are looking for the same underlying protocol that mlcp uses (but minus the application-level features and HDFS support), MLCP uses the XCC library as the transport layer for direct access to ML. XCC is a library intended for embedded use. It’s a low-level library -- so it doesn’t have the high-level features that MLCP has, nor is it as easy to use or debug, but it is what MLCP uses for direct ML communication, and it is publicly downloadable and documented. I would recommend it only if the Java API does not do what you need. In that case we would appreciate any input into what issues you have, as the Java API is designed to be much more usable at the 'application layer' in a Java app than XCC is. It directly supports high-level features like complex search, data mapping, etc., and in general is a good abstraction layer for middle-tier Java applications, analogous to, say, Hibernate -- whereas XCC is more analogous to JDBC.

From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Sam Mefford
Sent: Wednesday, January 11, 2017 2:33 PM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] How to get the number of records ingested by a MLCP hit

I'm sure you could find a way. But using mlcp in a Java application is not a supported usage. mlcp is designed to run at the command-line.

Sam Mefford Senior Engineer MarkLogic Cor
Re: [MarkLogic Dev General] How to get the number of records ingested by a MLCP hit
I'm sure you could find a way. But using mlcp in a Java application is not a supported usage. mlcp is designed to run at the command-line.

Sam Mefford
Senior Engineer, MarkLogic Corporation

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Yogesh Kumar [yogiman...@gmail.com]
Sent: Wednesday, January 11, 2017 4:10 AM
To: general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] How to get the number of records ingested by a MLCP hit

Gentle reminder!!

On 09-Jan-2017 3:44 PM, "Yogesh Kumar" <yogiman...@gmail.com> wrote:

Gentle reminder!!

On 07-Jan-2017 10:13 PM, "Yogesh Kumar" <yogiman...@gmail.com> wrote:

Hi Team,

I am using MLCP in my Java code to ingest data into MarkLogic. How can I get the following details in my Java application?

OUTPUT_RECORDS
OUTPUT_RECORDS_COMMITTED
OUTPUT_RECORDS_FAILED

Thanks,
Yogesh
Re: [MarkLogic Dev General] Execute Xquery code using REST API
Can you share more details about what you've tried and how you've verified that it's not working?

Sam Mefford
Senior Engineer, MarkLogic Corporation

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Bhushan Suryawanshi [bsuryawan...@xenomorph.com]
Sent: Wednesday, January 11, 2017 9:41 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Execute Xquery code using REST API

Hi Guys,

Greetings of the day. I am using MarkLogic 8 with the REST API. I was trying to execute the XQuery code below through the REST API.

    let $size := 1
    let $distinct-element-qnames :=
      distinct-values(
        for $i in doc("BTU_NZ_Equity.xml")[1 to $size]//*
        return node-name($i)
      )
    for $qn in $distinct-element-qnames
    return element element { attribute local-name { local-name-from-QName($qn) } }

I was referring to the link below for content transformations:
https://docs.marklogic.com/guide/rest-dev/transforms#id_17421

The example provided uses the documents service, whereas I want to use the content transformation with the search service. Below is the code I have written following the guidelines in the online example.

    xquery version "1.0-ml";
    module namespace example = "http://marklogic.com/rest-api/transform/element_root";

    declare function example:transform(
      $context as map:map,
      $params as map:map,
      $content as document-node()
    ) as document-node()
    {
      if (fn:empty($content/*)) then $content
      else
        let $size := (map:get($params, "size"), "UNDEFINED")[1]
        let $category := (map:get($params, "category"), "transformed")[1]
        let $distinct-element-qnames :=
          distinct-values(
            for $i in collection($category)[1 to $size]//*
            return node-name($i)
          )
        for $qn in $distinct-element-qnames
        return element element { attribute local-name { local-name-from-QName($qn) } }
    };

However, I couldn’t get it working. Can anyone help me with the above problem and let me know if I am missing something? Thanks for your help in advance.

Thanks & Regards,
Bhushan Suryawanshi
Developer
t: +1-212-401-7894
www.xenomorph.com
Re: [MarkLogic Dev General] How to pull data out of marklogic quickly?
The reason it is slower without the tag around it is that you return a sequence, which serializes as a multi-part HTTP response with headers for each part. That, of course, is slower than one response (surrounded by tags) with the headers only once. There is no back-and-forth when using result.hasNext() and result.next(), just the small overhead of the multi-part response.

Have you tried using the values endpoint? With Java you use queryManager.values, or queryManager.tuples if you have a view defined on your range indexes. For more info see "Search On Tuples (Tuples Query / Values Query)" <http://docs.marklogic.com/guide/java/searches#id_65191> in the Java Application Developer's Guide and "Finding Value Co-Occurrences in Lexicons" <http://docs.marklogic.com/guide/rest-dev/search#id_94885> in the REST Application Developer's Guide. That might be one of the fastest options for you since the answers come directly out of the in-memory range indexes.

Sam Mefford
Senior Engineer, MarkLogic Corporation

On 10/11/2016 2:27 PM, Mark Shanks wrote:

I previously tried not wrapping the output in the element. Running the following code:

    for $x in cts:search(fn:doc(),
      cts:and-query((
        cts:element-value-query(xs:QName('Department'), 'Sales'),
        cts:element-range-query(xs:QName('Date'), '>', xs:date('2015-01-01')),
        cts:element-range-query(xs:QName('Date'), '<', xs:date('2015-01-03')),
        cts:not-query(cts:element-value-query(xs:QName('Date'), 'NULL'))
      )), 'unfiltered', 0.0)
    return fn:concat($x//Department, '|', $x//Total, '|', $x//Location, '')

It would return the required documents in the console. However, when I ran the same code using the REST API and Java using:

    theCall.xquery(query);
    out.println(theCall.evalAs(String.class));

It would print out only a single document. I then tried the iterator instead:

    theCall.xquery(query);
    EvalResultIterator result = theCall.eval();
    while (result.hasNext()) {
        out.println(result.next().getString());
    }

This did retrieve all of the documents, but was benchmarked as slower, presumably because there is so much back and forth between Java and the server. Is there another way to get the results into Java that does not involve the iterator but returns all of the documents?

From: general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> on behalf of Geert Josten <geert.jos...@marklogic.com>
Sent: Tuesday, 11 October 2016 6:15:57 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to pull data out of marklogic quickly?

Hi Mark,

The best way to tackle this would be to parallelize output. Have 10 or more worker threads consume parts of the total (how many might depend on your cluster size and the total number of records you need to produce), and make each write a CSV on its own.

The cts:search is a good starting point, but if you want to emit CSV anyhow, then don’t wrap the results of cts:search in a element. Instead let each doc found by cts:search return one or more line-strings, which you don’t join either. MarkLogic will insert line-ends between such strings automatically, and this way it will allow for streaming. Done right, one worker should be able to produce a 1 million record CSV file in a few minutes on an average laptop.

At this point, I would worry less about using $x//Department, but assuming $x holds the document node, you could write $x/Record/Department. That would indeed be a little quicker.

I'm not sure if Corb(2) can produce CSV, or if it would leverage parallelism in the way I meant, but it could be worth taking a look at cluster-based tools like Hadoop. Apache Camel might allow parallel processing too.

Cheers,
Geert

From: <general-boun...@developer.marklogic.com> on behalf of Mark Shanks <markshanks...@hotmail.com>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Tuesday, October 11, 2016 at 12:27 AM
To: MarkLogic Developer Discussion <general@deve
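To make the "worker threads consume parts of the total" suggestion concrete, here is a minimal sketch of splitting a result set into contiguous 1-based position ranges, one per worker. It is an assumption-laden illustration rather than anything from the thread: partitionRanges is a hypothetical helper, and how each worker turns its range into a query (for example with a position predicate such as [$start to $end]) is left open.

```javascript
// Hypothetical helper: split `total` records into at most `workers`
// contiguous, 1-based, inclusive ranges of near-equal size.
function partitionRanges(total, workers) {
  if (workers < 1) {
    throw new Error("workers must be at least 1");
  }
  var ranges = [];
  var base = Math.floor(total / workers); // minimum records per worker
  var extra = total % workers;            // first `extra` workers take one more
  var start = 1;
  for (var i = 0; i < workers; i++) {
    var size = base + (i < extra ? 1 : 0);
    if (size === 0) {
      continue; // more workers than records: skip empty ranges
    }
    ranges.push({ start: start, end: start + size - 1 });
    start += size;
  }
  return ranges;
}

console.log(JSON.stringify(partitionRanges(10, 3)));
// prints [{"start":1,"end":4},{"start":5,"end":7},{"start":8,"end":10}]
```

With 1,000,000 records and 10 workers this yields ten ranges of 100,000 positions each, so every worker can write its own CSV file independently.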
Re: [MarkLogic Dev General] mlcp and loading multiple files
From what I can tell, that means you are not seeing a bug in mlcp. You simply have an assumption in your transform that is effectively a race condition. If your transform depends on a set of files being loaded before another set of files, you must completely load the first set first. I have written more advanced transforms in the past that can merge all dependencies whenever the last one arrives, but that's only necessary when you can't find a way to just load the first set first.

Sam Mefford
Senior Engineer, MarkLogic Corporation

On 7/5/2016 9:30 AM, Hans Hübner wrote:

Hi,

just to let you know: The problem that I had was entirely caused by the fact that I was loading files in parallel that depended on each other, by way of the loader transformation that I've posted. The mlcp percentage display is still confusing, though, as it apparently shows the percentage of the input data that was loaded into the database, not the number of records read from the input. That could be improved, I think, but it does not seem to be very important.

Thank you Indy and Geert for looking at this!

-Hans

On Sun, Jul 3, 2016 at 7:52 PM, Hans Hübner <hans.hueb...@lambdawerk.com> wrote:

Hi,

I'm trying to load a bunch of files into MarkLogic using mlcp, but for some reason, it seems that it skips some of the files. I'm using a command line like this:

    mlcp.sh import \
      -database tx-claims \
      -host marklogic -port 8884 -username XXX -password XXX -mode local \
      -input_file_path 2015/277ca/ \
      -input_file_type aggregates -aggregate_record_element TRANSACTION \
      -transform_module /transform-in.xquery \
      -transform_function transform-response \
      -transform_namespace http://lambdawerk.com/tx-claims

The transform-response function looks like this:

    declare function tx-claims:transform-response(
      $content as map:map,
      $context as map:map
    ) as map:map*
    {
      let $doc := map:get($content, 'value')
      let $icn := $doc/TRANSACTION/LOOP2000D/LOOP2200D/TRN/TRN02/text()
      let $uri := concat('/responses/', $icn, '.xml')
      return (map:put($content, 'uri', $uri), $content)
    };

The mlcp output looks like this at the end:

    16/07/03 18:59:22 INFO contentpump.LocalJobRunner: completed 76%
    16/07/03 18:59:28 INFO contentpump.LocalJobRunner: completed 77%
    16/07/03 18:59:30 INFO contentpump.LocalJobRunner: completed 78%
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: completed 80%
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: completed 81%
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec

After the load operation completes, nothing unusual is in the ErrorLog.txt file. However, when I look into the database, I find that some files are missing. When I load one of the missing files into the database explicitly (specifying its name as the -input_file_path argument), it is correctly loaded.

Now, the mlcp output looks kind of fishy to me in that it apparently loads the last 19% of the work in under one second. It seems that it is skipping a whole bunch of files. It also seems that some output records could not be written. The manual says that this could be caused by a server-side transformation, but our function does not seem to be at fault: when I load the missing file by specifying its file name, it is correctly loaded, so it seems to be something else.

I would greatly appreciate any ideas or advice.

Thanks!
Hans

--
LambdaWerk GmbH
Oranienburger Straße 87/89
10178 Berlin
Phone: +49 30 555 7335 0
Fax: +49 30 555 7335 99
HRB 169991 B Amtsgericht Charlottenburg
USt-ID: DE301399951
Geschäftsführer: Hans Hübner
http://lambdawerk.com/
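The most concrete signal in the job summary quoted above is the gap between OUTPUT_RECORDS and OUTPUT_RECORDS_COMMITTED. A small sketch of pulling the counters out of the log text and computing that gap follows. It assumes the log line format shown in Hans's message and nothing more; parseMlcpCounters is a hypothetical helper, not part of mlcp.

```javascript
// Hypothetical helper: collect "NAME: 123" counters from mlcp log text.
// Assumes each counter line ends with an upper-case name, a colon, and digits,
// as in the summary lines quoted in the message.
function parseMlcpCounters(logText) {
  var counters = {};
  logText.split(/\r?\n/).forEach(function (line) {
    var match = /([A-Z][A-Z_]*):\s*(\d+)\s*$/.exec(line);
    if (match) {
      counters[match[1]] = parseInt(match[2], 10);
    }
  });
  return counters;
}

// Summary lines copied from the quoted mlcp output.
var log = [
  "16/07/03 18:59:31 INFO contentpump.LocalJobRunner: completed 81%",
  "16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471",
  "16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471",
  "16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192",
  "16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0",
  "16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec"
].join("\n");

var counters = parseMlcpCounters(log);
var notCommitted = counters.OUTPUT_RECORDS - counters.OUTPUT_RECORDS_COMMITTED;
console.log(notCommitted); // prints 17279
```

So 17,279 of the 1,421,471 output records were never committed even though OUTPUT_RECORDS_FAILED is 0, which is the discrepancy the thread goes on to explain.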
Re: [MarkLogic Dev General] Avoiding Facets in search:snippet highlight
To do this I think you'll need a custom snippeting function <https://docs.marklogic.com/guide/search-dev/query-options#id_61707> with custom highlighting <https://docs.marklogic.com/guide/search-dev/highlight#id_79734>.

Sam Mefford
Senior Engineer, MarkLogic Corporation

On 6/14/2016 7:34 AM, Erik Hennum wrote:

Hi, Tabish:

This might not be what you're asking but...

If you want to turn off the facet output for a range index, set the facet attribute or subproperty to false:
http://docs.marklogic.com/guide/search-dev/appendixa#id_42752

If you want to prefer other elements or properties to properties with a range index, use preferred-matches:
http://docs.marklogic.com/guide/search-dev/appendixa#id_50098

If you need to implement a custom snippeting function that skips a blacklist of elements or properties, see:
http://docs.marklogic.com/guide/search-dev/query-options#id_58295

If none of that helps, please provide a specific example of the current and desired output.

Erik Hennum

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of ShamsTabish Sheikh [shams4...@hotmail.com]
Sent: Tuesday, June 14, 2016 4:49 AM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Avoiding Facets in search:snippet highlight

Hello team,

I'm trying to display the search keyword matches to the user in our app. I have used search:snippet to highlight the matches. In some cases facet values appear in the results, which we don't want. Please suggest a way to avoid facet values in search:highlight.

Thanks & Regards,
Tabish.
Re: [MarkLogic Dev General] Banking - ml vs ms sql (or oracle) | performance story
Yeah, and that list can be broken down almost infinitely. For example:

* faster to query data using exact match on multiple fields
* faster to query data using fuzzy match, wildcard match, date range match, etc.
* faster to query data joining five tables
* faster to query data with very large (or wide) records
* faster to query data with very large result set
* faster to query data using full-text search with phrase matching
* faster with all of the above in a single query

As I'm sure you know, there are many other ways to compare databases that might align better with tangible value:

* are there core features that would benefit the application and clearly differentiate the databases?
  * search
  * faceted search
  * tunable relevance ranking
  * fuzzy matching to de-duplicate customers across systems despite name misspellings
  * schema flexibility
    * since CRM benefits from integrating data from many sources
    * since tradestore data formats change frequently <http://www.marklogic.com/blog/nosql-operational-trade-store/>
* is there any impedance mismatch between persistence data model and application data model?
* does the trade-store require any advanced server-side rules at index time / query time?
  * if so, are there features that will help implement those rules
* etc., etc.

Sam Mefford
Senior Engineer, MarkLogic Corporation

On 5/18/2016 9:28 AM, Anthony Coates wrote:

Classification: Public

Hi. I'm not allowed to make technology recommendations, and so I'm not going to recommend any particular database here. However, historically many database vendors have been uncomfortable with people publishing benchmark comparisons, because the actual performance of a database often depends on exactly how you use it. Sometimes you may be hitting the sweet spot of a particular database, sometimes you may not be, and the database that is faster for you might not be the database that is faster for someone else.

There are also different ways to measure "faster":

* faster to store data
* faster to query data
* faster to develop/test/debug applications (i.e. "faster to market").

All of those can be important, but in different cases a different one may be the most important. So, if you can define which area of database performance or use is the most critical one to be "faster" for your use case, it might be easier for you to make a clear statement about how your short-listed databases compare to each other for your own needs.

Cheers, Tony.

-Original Message-
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Sebastien Vige
Sent: 18 May 2016 16:14
To: General@developer.marklogic.com
Subject: [MarkLogic Dev General] Banking - ml vs ms sql (or oracle) | performance story

Team,

We are battling with a dev team. We need to show them ML is faster than MSSQL. In which user story (banking or other) can we reference / present a factual performance gain? This will help us to convince our prospect. Our use cases are a CRM web app fueled by ML as well as a mini tradestore.

Any input welcome!

Tx Seb

Kind Regards
Sebastien
Re: [MarkLogic Dev General] Exporting Xquery files using MLCP
Hey Danny,

When I try it, it works fine for me. mlcp 1.3-3 and 8.0-5. Here's the command I ran:

mlcp.bat export -host localhost -database Modules -port 8000 -username admin -password admin -mode local -output_file_path modules -directory_filter /marklogic.rest.transform/ -content_encoding system

Also, you asked if there is another way to programmatically extract the said files. Here's a way to do it using the Java Client API. Note the constructor for DatabaseClient specifies the "Modules" database as the third arg.

DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000, "Modules", "admin", "admin", DIGEST);
DocumentManager docMgr = client.newDocumentManager();
boolean infinite = true;
StructuredQueryDefinition query = new StructuredQueryBuilder().directory(infinite, "/marklogic.rest.transform/");
int start = 1;
DocumentPage results = docMgr.search(query, start);
System.out.println("results.size()=[" + results.size() + "]");
for ( DocumentRecord result : results ) {
    String docContents = result.getContentAs(String.class);
    System.out.println("docContents=[" + docContents + "]");
}

Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.com

On 3/31/2016 1:28 PM, Danny Sinang wrote: Hi, Has anyone here tried to export Xquery files from the Modules database using MLCP ?
We tried the config below, but the files generated had numbers at the start and end 014392 xquery version "1.0-ml"; module namespace cds="http://www.company.org/cds;; [2:04] cds:replace-subject-id-with-value($updated-xpaths, $updated-doc) else $updated-doc }; 20 If MLCP can't do this properly, can anyone suggest another way to programmatically extract the said files ? Regards, Danny export -host localhost -database Modules -port 8006 -username admin -password admin -mode local -output_file_path /Users/danny/Downloads/cds/ -directory_filter /cds/ -content_encoding system ___ General mailing list General@developer.marklogic.com<mailto:General@developer.marklogic.com> Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] New to Marklogic, Java REST API client
Do not use QueryOptionsBuilder as it is deprecated. Instead use your favorite XML builder API. XML builder libraries for Java are prevalent and several good ones are directly supported by the Java Client API: Jackson<https://docs.marklogic.com/javadoc/client/com/marklogic/client/io/JacksonHandle.html>, JDOM<https://docs.marklogic.com/javadoc/client/com/marklogic/client/extra/jdom/JDOMHandle.html>, DOM4J<https://docs.marklogic.com/javadoc/client/com/marklogic/client/extra/dom4j/DOM4JHandle.html>, XOM<https://docs.marklogic.com/javadoc/client/com/marklogic/client/extra/xom/XOMHandle.html>, DOM<https://docs.marklogic.com/javadoc/client/com/marklogic/client/io/DOMHandle.html>, and JAXB<https://docs.marklogic.com/javadoc/client/com/marklogic/client/io/JAXBHandle.html>. Pick your favorite. Also, there are XML Builders that create an InputStream or String and can thus be supported using StringHandle<https://docs.marklogic.com/javadoc/client/com/marklogic/client/io/StringHandle.html> or InputStreamHandle<https://docs.marklogic.com/javadoc/client/com/marklogic/client/io/InputStreamHandle.html>. 
For example, you could use XMLStreamWriter:

ByteArrayOutputStream baos = new ByteArrayOutputStream();
XMLOutputFactory factory = XMLOutputFactory.newInstance();
factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, true);
XMLStreamWriter w = factory.createXMLStreamWriter(baos, "UTF-8");
w.setDefaultNamespace("http://marklogic.com/appservices/search");
w.writeStartElement("options");
// collection constraint named "tag"
w.writeStartElement("constraint");
w.writeAttribute("name", "tag");
w.writeStartElement("collection");
w.writeEndElement();
w.writeEndElement();
// value constraint named "company" on the JSON property "affiliation"
w.writeStartElement("constraint");
w.writeAttribute("name", "company");
w.writeStartElement("value");
w.writeAttribute("type", "string");
w.writeStartElement("json-property");
w.writeCharacters("affiliation");
w.writeEndElement();
w.writeEndElement();
w.writeEndElement();
w.writeEndElement();
// flush so nothing is left in the writer's buffer before reading the bytes
w.flush();
System.out.println( baos.toString("UTF-8") );

Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.com

On 2/22/2016 4:08 AM, Hamza Bennani wrote: Hi, I'm new to MarkLogic and I'm facing a problem when I try to define a JSON key/value constraint via the Java REST API client.
I'm using MarkLogic v8 My Java code is : QueryOptionsManager optionsMgr = client.newServerConfigManager().newQueryOptionsManager(); QueryOptionsBuilder qob = new QueryOptionsBuilder(); QueryOptionsHandle optsHandle = new QueryOptionsHandle().withConstraints( qob.constraint("tag", qob.collection(""))); // add a JSON value constraint optsHandle.addConstraint( qob.constraint("company", qob.value( qob.jsonTermIndex("affiliation"; The error thrown is : Status 500: RESTAPI-INVALIDCONTENT: (err:FOER) Invalid content: Operation results in invalid Options: Use "json-property" instead of "json-key" to specify structures in JSON. Validation detail: XDMP-VALIDATEUNEXPECTED: (err:XQDY0027) validate strict { $opt } -- Invalid node: Found search:json-key but expected ((search:element)|search:attribute)|search:json-property|search:field)|search:fragment-scope)*)|((search:term-option|search:weight)*))* at fn:doc("")/search:options/search:constraint[2]/search:value/search:json-key using schema "search.xsd"An , or specification is required on . My question is how to use "json-property" instead of "json-key" as suggested ? Thanks, -- Hamza BENNANI ___ General mailing list General@developer.marklogic.com<mailto:General@developer.marklogic.com> Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] MarkLogic highlights wrong content when searched
I answered this question on Stack Overflow: http://stackoverflow.com/a/34206432/3582140

Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.com

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of tanaya.mon...@cognizant.com [tanaya.mon...@cognizant.com] Sent: Thursday, December 10, 2015 4:08 AM To: general@developer.marklogic.com Subject: [MarkLogic Dev General] MarkLogic highlights wrong content when searched

Hi, I am trying to search some content and highlight the search strings present in the content (like Google) in MarkLogic using the REST API. The problem is that when I include "ME" in the search string, it highlights the (html italic tags) along with the "Me" in the content. I have created a field with some elements and am running a word-query on the field. For example: some data from me more data from somewhere by me I have created a field called 'suggestions' with elements 'title' and 'desc' and am searching for the search strings within the field using word-query.
Now when i search for "some me" ,its retrieving the content like <some data from <me more data <i> from <i> somewhere by <me Url: localhost:9000/v1/search?q=some me=Data=0=10=Transformation=json I am using cts:highlight for highlighting, something like : cts:highlight($final-result, $query, fn:concat('',$cts:text,'')), $custom-config) Trying to understand if there is any grammar for which ‘Me’ and ‘I’ are treated as same? Or there is some problem with HTML tag parsing. Looking forward for a prompt reply. Thanks in Advance. Regards, Tanaya Mondal Programmer Analyst This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored. ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] search MarkLogic Database using Regular Expressions
+1 to what Erik said. To give an example, at ingestion time you could add to each document an element (or JSON property) with the VIN number, like:

<vin>Pn5123456</vin>

Then at query time you could look for any document with the vin element. To extract the VIN from each document and add the element at ingest time, you could use any ETL tool or scripting language, or XQuery via CPF, a trigger, or MLCP.

Several natural language processors try to solve this kind of enrichment problem. Maybe someone on the list can recommend specific NLP tools for VIN recognition based on their experience. In my experience, don't use an NLP tool if you can solve the problem with a simple regex.

Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.com

On 7/14/2015 10:01 PM, Erik Hennum wrote: Hi, Javier: If it's a smallish set of documents, you can write a loop that reads each document and applies a regex to all of the text in the document, but if it is a substantial corpus, you should look at enriching the documents to support searching for VIN numbers. To search over a set of values with performance at scale requires an index over the values. To recognize the values within JSON or XML documents, the indexer looks for a specified JSON property or XML element or attribute. That requires modifying the documents on or after ingestion to identify the VIN numbers.
(It's easiest if you can specify a unique JSON property or XML element or attribute, but if that's not possible, fields can support unions and path range indexes can support containment.) Several natural language processors try to solve this kind of enrichment problem. Maybe someone on the list can recommend specific NLP tools for VIN recognition based on their experience. Hoping that helps, Erik Hennum From: general-boun...@developer.marklogic.commailto:general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.commailto:general-boun...@developer.marklogic.com] on behalf of Javier Lizarraga [jlizarr...@gennet.commailto:jlizarr...@gennet.com] Sent: Tuesday, July 14, 2015 5:21 PM To: general@developer.marklogic.commailto:general@developer.marklogic.com Subject: [MarkLogic Dev General] search MarkLogic Database using Regular Expressions Is there a way to issue a search using a regular expression in MarkLogic? For example the following regular expression identifies a vin number: (([a-h,A-H,j-n,J-N,p-z,P-Z,0-9]{9})([a-h,A-H,j-n,J-N,p,P,r-t,R-T,v-z,V-Z,0-9])([a-h,A-H,j-n,J-N,p-z,P-Z,0-9])(\d{6})) I would like to issue a query that would search the entire database returning documents that contain valid vin numbers. Similar to the MarkLogic fn:match which takes in a string and outputs a Boolean value. fn:matches(this is my string 2T3JK4DV1AW023473 , (([a-h,A-H,j-n,J-N,p-z,P-Z,0-9]{9})([a-h,A-H,j-n,J-N,p,P,r-t,R-T,v-z,V-Z,0-9])([a-h,A-H,j-n,J-N,p-z,P-Z,0-9])(\d{6}))) I’d like to do something like this cts:search(“(([a-h,A-H,j-n,J-N,p-z,P-Z,0-9]{9})([a-h,A-H,j-n,J-N,p,P,r-t,R-T,v-z,V-Z,0-9])([a-h,A-H,j-n,J-N,p-z,P-Z,0-9])(\d{6}))) Any help would be greatly appreciated!! 
Javier ___ General mailing list General@developer.marklogic.commailto:General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
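As a follow-up to the ingest-time enrichment suggested in this thread, here is a minimal Java sketch of the regex step only. The class and method names (VinEnricher, findVins, toVinElements) are hypothetical and not part of any MarkLogic API, and the pattern is the one from the question with the commas removed from the character classes, since a comma inside [...] matches a literal comma. How the resulting element is merged into each document before loading is left out and depends on the ETL tool used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a MarkLogic API): finds VIN-like tokens in plain text. */
final class VinEnricher {
    // Same structure as the pattern in the question: 9 characters, 1 restricted
    // character, 1 character, then 6 digits. Commas dropped from the classes.
    private static final Pattern VIN = Pattern.compile(
        "\\b[a-hj-np-z0-9]{9}[a-hj-npr-tv-z0-9][a-hj-np-z0-9]\\d{6}\\b",
        Pattern.CASE_INSENSITIVE);

    /** Returns every VIN-like token found in the text, upper-cased. */
    static List<String> findVins(String text) {
        List<String> vins = new ArrayList<>();
        Matcher m = VIN.matcher(text);
        while (m.find()) {
            vins.add(m.group().toUpperCase(Locale.ROOT));
        }
        return vins;
    }

    /** Renders the tokens as vin elements, ready to be added to a document. */
    static String toVinElements(List<String> vins) {
        StringBuilder sb = new StringBuilder();
        for (String vin : vins) {
            sb.append("<vin>").append(vin).append("</vin>");
        }
        return sb.toString();
    }
}
```

Calling VinEnricher.findVins on the text of each document before it is loaded, and storing the output of toVinElements in the document, would give the indexer a dedicated element to work with, so the regex runs once per document at ingest rather than over the whole database at query time.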
Re: [MarkLogic Dev General] Rest API extract-document-data
The response for extract-document-data doesn't come through as metadata. It comes through as a <search:extracted> element. We'll be enhancing MatchDocumentSummary (via SearchHandle) to have an accessor method for that, but it's not there yet. Here's some sample code you can use immediately:

Document results = queryMgr.search(query, new DOMHandle()).get();
// "*" matches any namespace; "extracted" is the local name of search:extracted
NodeList extracts = results.getElementsByTagNameNS("*", "extracted");
for (int i = 0; i < extracts.getLength(); i++) {
    Node extract = extracts.item(i);
}

Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.com

On 4/17/2015 10:38 AM, Köhler, Klaus wrote: Hi Geert, thanks a lot. This is working now! Is there a way to get the extracted data via the Java API? When using the QBE mode I could fetch the extracted data via

MatchDocumentSummary[] results = resultsHandle.getMatchResults();
for (MatchDocumentSummary result : results) {
    Document doc = result.getMetadata();
}

When using combined search getMetadata() returns null. Klaus

From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Geert Josten Sent: Friday, April 17, 2015 12:58 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Rest API extract-document-data

Hi Klaus, There was a fix in 8.0-2 that addressed this.
The REST-api is kind of assuming you want multipart/mixed response, while you didn’t actually ask for that. In 8.0-2 this is handled differently. Cheers, Geert From: Köhler, Klaus k.koeh...@klopotek.demailto:k.koeh...@klopotek.de Reply-To: MarkLogic Developer Discussion general@developer.marklogic.commailto:general@developer.marklogic.com Date: Friday, April 17, 2015 at 12:24 PM To: General@developer.marklogic.commailto:General@developer.marklogic.com General@developer.marklogic.commailto:General@developer.marklogic.com Subject: [MarkLogic Dev General] Rest API extract-document-data Hi, when using the Rest API and the search option extract-document-data I got the error XDMP-AS: (err:XPTY0004) searchmodq:resolve($structured-query, $options, $params) -- Invalid coercion: (search:response snippet-format=snippet total=153 start=1 page-length=10 xmlns:search=http://marklogic.com/appservices/search;search:resulthttp://marklogic.com/appservices/search%22%3E%3Csearch:result index=1 uri=/programs/bef81c54-03ab-4bd8-b73c-.../search:response, fn:doc(/programs/bef81c54-03ab-4bd8-b73c-16a2b6980455.xml), fn:doc(/programs/763865fe-050c-4512-9ce6-b44ab8a8cd12.xml), ...) as element(search:response) . See the MarkLogic server error log for further detail. The search options are as following: search xmlns=http://marklogic.com/appservices/search;http://marklogic.com/appservices/search%22 options constraint name=test value element ns= name=componentUuid / /value /constraint extract-document-data selected=all/ /options /search Using the XQuery console everything works fine. Any idea? Klaus Köhler Software Architect Klopotek Partner GmbH Schlüterstraße 39 10629 Berlin Tel. + 49 30.884 53-204 Fax + 49 30.884 53-100 mailto:k.koeh...@klopotek.de http://www.klopotek.de Geschäftsführung: Ulrich Klopotek von Glowczewski (Vorsitzender) Stefan Jacob, Wolf-Michael Mehl AG Charlottenburg HRB 45287 Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. 
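To illustrate the DOM approach from the reply at the top of this thread without needing a running server, here is a small self-contained sketch that parses a search response held in a string and collects the text of each extracted element. The class name ExtractedReader and the SAMPLE document are assumptions made up for illustration; the sample is a simplified hand-written response, not actual server output. In a real application the Document would come from queryMgr.search(query, new DOMHandle()).get() as shown in the reply.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper: pulls search:extracted content out of a search response. */
final class ExtractedReader {
    static final String SEARCH_NS = "http://marklogic.com/appservices/search";

    // Simplified, hand-written sample; a real response carries more elements.
    static final String SAMPLE =
        "<search:response xmlns:search=\"" + SEARCH_NS + "\">"
      + "<search:result index=\"1\" uri=\"/a.xml\">"
      + "<search:extracted><componentUuid>abc-123</componentUuid></search:extracted>"
      + "</search:result>"
      + "</search:response>";

    /** Returns the text content of every search:extracted element. */
    static List<String> extractedText(String responseXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = dbf.newDocumentBuilder().parse(
                new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
            NodeList extracts = doc.getElementsByTagNameNS(SEARCH_NS, "extracted");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < extracts.getLength(); i++) {
                out.add(extracts.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch easy to call
            throw new IllegalStateException("could not parse search response", e);
        }
    }
}
```

The sketch matches on the search namespace explicitly rather than on "*", which avoids picking up an unrelated element that happens to be named extracted inside the returned document data.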
Re: [MarkLogic Dev General] search:search constraint for document URI
This use case looks perfect for what Erik is describing, using search:document-query. Your application can add that directly as part of the structured query so there's no need for a constraint. Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.com Cell: +1 801 706 9731 www.marklogic.com This e-mail and any accompanying attachments are confidential. The information is intended solely for the use of the individual to whom it is addressed. Any review, disclosure, copying, distribution, or use of this e-mail communication by others is strictly prohibited. If you are not the intended recipient, please notify us immediately by returning this message to the sender and delete all copies. Thank you for your cooperation. On 4/1/2015 10:08 AM, Murray, Gregory wrote: My use case is a web application for a digital library of books where each XML document is a book (metadata + full text). Normally the user searches the entire database/library of books, but if the user finds a specific book of interest, we want to allow the user to search only within the text of that book and see snippets and page numbers where the term occurs. I actually didn't include the entire search options previously. We also use the following to return search results at the page level rather than the document/book level: !-- only search page elements; thus the @path of each search result will indicate a page, not an entire document -- searchable-expression xmlns:ia=http://digital.library.ptsem.edu/ia;/ia:doc/ia:text/ia:page/searchable-expression For an example, go here and perform a search, let's say for the word city http://commons.ptsem.edu/id/acacianlyricsmis00mund On Apr 1, 2015, at 11:56 AM, Erik Hennum wrote: Hi, Danny and Gregory: You can also use a search:document-query in the Search API without using an additional query. http://docs.marklogic.com/guide/search-dev/structured-query#id_27172 What's the use case for defining a constraint for document URIs? 
I wouldn't expect users to type document URIs into a search box or an application to build facets over document URIs (which by definition have one fragment per URI). Erik Hennum

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Murray, Gregory [gregory.mur...@ptsem.edu] Sent: Wednesday, April 01, 2015 8:47 AM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] search:search constraint for document URI

You can do this with cts:document-query() like so:

(: Performs a search for the specified query text within the specified document. :)
declare function m:document-search(
  $qtext as xs:string,
  $uri as xs:anyURI,
  $start as xs:unsignedLong?,
  $page-length as xs:unsignedLong?)
as element(search:response)
{
  let $options :=
    <options xmlns="http://marklogic.com/appservices/search">
      <!-- limit the search to the specified document -->
      <additional-query>{cts:document-query($uri)}</additional-query>
    </options>
  return search:search($qtext, $options, $start, $page-length)
};

On Apr 1, 2015, at 11:06 AM, Danny Sinang wrote: Is it possible to define a search:search constraint to match a document URI without having to resort to writing/using a custom constraint ? Regards, Danny

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Getting null directory using JAVA API
We want to encourage participation as much as possible. We've all been there: done the tutorials, read over a hundred-pages of guide, thumbed through all the javadocs, and tried exhaustive google searches and still can't figure something out. We encourage everyone in that boat to use this forum rather than stay stuck. The good news about this question from Ns is the follow-up post sharing the answer when it was found. So now when someone else faces the same question and they use google, the answer will be readily available. With that said, good pointer to the NoSQL for Dummies book. Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.commailto:sam.meff...@marklogic.com Cell: +1 801 706 9731 www.marklogic.comhttp://www.marklogic.com This e-mail and any accompanying attachments are confidential. The information is intended solely for the use of the individual to whom it is addressed. Any review, disclosure, copying, distribution, or use of this e-mail communication by others is strictly prohibited. If you are not the intended recipient, please notify us immediately by returning this message to the sender and delete all copies. Thank you for your cooperation. On 2/24/2015 11:48 AM, Peter Gomez wrote: Get a book called NoSQL for Dummies and some MarkLogic documentation before clogging this forum man Date: Wed, 25 Feb 2015 00:14:20 +0530 From: maisnam...@gmail.commailto:maisnam...@gmail.com To: general@developer.marklogic.commailto:general@developer.marklogic.com Subject: Re: [MarkLogic Dev General] Getting null directory using JAVA API After getting the uris from cts:uris() , I am setting it like the below and now it is working querydef.setDirectory(/contentD:/marklogic/data/TopSongs/); On Tue, Feb 24, 2015 at 11:56 PM, Maisnam Ns maisnam...@gmail.commailto:maisnam...@gmail.com wrote: Hi, I uploaded roughly 1100 files from my local directory1(d:\marklogic\d1) and directory2 (d:\marklogic\d2) to marklogic database 'TestDB' . 
My requirement is to search from either from d1 or d2 based on checking a checkbox. If d1 is checked I need to search from d1 and vice versa. But as given in the marklogic API Java docs , I am trying to get the directory path by the below code but it is giving null. How can I set and unset the directories to search for and why querydef.getDirectory() is null. QueryManager queryMgr = conn.getClient().newQueryManager(); // create a search definition StringQueryDefinition querydef = queryMgr.newStringDefinition(options); querydef.setCriteria(arg); System.out.println(directory+ querydef.getDirectory()); Thanks ___ General mailing list General@developer.marklogic.commailto:General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.commailto:General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Converting Marklogic Java options to xml options
Are you looking for this?

<search:facet-option>frequency-order</search:facet-option>

Here it is in a complete example from the docs (http://docs.marklogic.com/search:search?q=search:search&v=8.0&api=true):

<options xmlns="http://marklogic.com/appservices/search">
  ...
  <constraint name="color-facet">
    <range type="xs:string" facet="true">
      <element ns="" name="bodycolor"/>
      <!-- the facet-option values are passed directly to the underlying lexicon calls -->
      <facet-option>frequency-order</facet-option>
      <facet-option>descending</facet-option>
    </range>
  </constraint>
  ...
</options>

Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.com
On 2/22/2015 9:00 PM, Maisnam Ns wrote: Hi , Can someone help me in converting this JAVA API Options to xml options, I am stuck with the 'frequency-order' which is there below in Java formation but couldn't get how to put this 'frequency-order' in xml query options syntax: JAVA : QueryOptionsHandle options = new QueryOptionsHandle().withValues( qob.values(country, qob.range( qob.elementRangeIndex(new QName(country), qob.stringRangeType(QueryOptions.DEFAULT_COLLATION))), frequency-order)); optionsMgr.writeOptions(optionsName, options); QueryManager queryMgr = client.newQueryManager(); ValuesDefinition valuesDef = queryMgr.newValuesDefinition(country, optionsName); valuesDef.setFrequency(Frequency.ITEM); XML: search:options xmlns=http://marklogic.com/appservices/search; search:values name=country search:range type=xs:string collation=http://marklogic.com/collation/; search:element ns= name=country/ /search:range search:values/search:values/ /search:options Above where to put frequency-order element? As I want to get the same result using the above xml to that of Java formation Thanks ___ General mailing list General@developer.marklogic.commailto:General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Error trying to create query options in xml
Ah well, my last reply was redundant as you seem to have found the answer yourself. Good on you! Here's an update with a working sample. Notice I changed search:constraint to search:values. And notice I pass the options name to the newValuesDefinition method.

import com.marklogic.client.DatabaseClientFactory;
import static com.marklogic.client.DatabaseClientFactory.Authentication.DIGEST;
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.admin.QueryOptionsManager;
import com.marklogic.client.query.CountedDistinctValue;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.ValuesDefinition;
import com.marklogic.client.io.StringHandle;
import com.marklogic.client.io.ValuesHandle;

public class Test {
    public static void main(String[] args) {
        DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000, "admin", "admin", DIGEST);
        QueryOptionsManager optionsMgr = client.newServerConfigManager().newQueryOptionsManager();
        // construct the query options
        String optionXml =
            "<search:options " +
                "xmlns:search='http://marklogic.com/appservices/search'>" +
              "<search:values name='country'>" +
                "<search:range collation='http://marklogic.com/collation/' type='xs:string' facet='true'>" +
                  "<search:facet-option>frequency-order</search:facet-option>" +
                  "<search:facet-option>descending</search:facet-option>" +
                  "<search:facet-option>limit=10</search:facet-option>" +
                  "<search:element ns='' name='country'/>" +
                "</search:range>" +
              "</search:values>" +
            "</search:options>";
        // create a handle to send the query options
        StringHandle writeHandle = new StringHandle(optionXml);
        // write the query options to the database
        optionsMgr.writeOptions("myOptions", writeHandle);
        QueryManager queryMgr = client.newQueryManager();
        ValuesDefinition query = queryMgr.newValuesDefinition("country", "myOptions");
        ValuesHandle values = queryMgr.values(query, new ValuesHandle());
        // print each distinct country value with its count
        for (CountedDistinctValue value : values.getValues() ) {
            String textValue = value.get("xs:string", String.class);
            System.out.println(textValue + " " + value.getCount());
        }
    }
}

Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.com

On 2/22/2015 10:58 PM, Maisnam Ns wrote: Hi, Can someone help me with creating the query options? The first one, using QueryOptionsBuilder in the Java API, works, but with XML it is not working. The result I am getting is 1. US 100 2. JP 49 3. ES 23 Basically, I am getting the country and the counts.

1.
QueryOptionsBuilder qob = new QueryOptionsBuilder();
// expose the SPEAKER element range index as speaker values
QueryOptionsHandle options = new QueryOptionsHandle().withValues(
    qob.values("country",
        qob.range(
            qob.elementRangeIndex(new QName("country"),
                qob.stringRangeType(QueryOptions.DEFAULT_COLLATION))),
        "frequency-order"));

2.
    QueryOptionsManager optionsMgr =
        client.newServerConfigManager().newQueryOptionsManager();

    // construct the query options
    String optionXml =
        "<search:options " +
            "xmlns:search='http://marklogic.com/appservices/search'>" +
            "<search:constraint name='country'>" +
                "<search:range collation='http://marklogic.com/collation/' type='xs:string' facet='true'>" +
                    "<search:facet-option>frequency-order</search:facet-option>" +
                    "<search:facet-option>descending</search:facet-option>" +
                    "<search:facet-option>limit=10</search:facet-option>" +
                    "<search:element ns='' name='country'/>" +
                "</search:range>" +
            "</search:constraint>" +
        "</search:options>";

    // create
Re: [MarkLogic Dev General] Problem designing DB Architecture for MarkLogic
I don't want to overwhelm you, but it's worth pointing out that in order to design a schema I usually take much more into account:

1. What kinds of queries will you run? Which data items need value queries, range queries, facets, term queries, wildcard queries, etc.? What will be your peak query throughput?
2. Where does your external data come from? How often will you get updates? What transformations will you perform as you get it?
3. What data is created/managed in MarkLogic? How do you keep data managed in MarkLogic distinct from data managed elsewhere but copied to MarkLogic?
4. How is data presented through the application? What transformations are needed for search results, list pages, and detail pages?

Since I've done mostly search applications, one of the most important things I'm trying to determine is the granularity of search results, so that ideally one document maps to one search result.

Since there's a lot to account for, you might engage some consulting to help you through the process until you're ready to do it yourself.

With that said, to just try things out you can model your data several different ways. Here's one example:

    <User>
      <Id>1</Id>
      <Group_id>Group1</Group_id>
      <Name>Mark</Name>
      <Project>
        <Name>Project1</Name>
      </Project>
    </User>

    <User>
      <Id>2</Id>
      <Group_id>Group2</Group_id>
      <Name>Lisa</Name>
      <Gallery>
        <album_Id>1</album_Id>
        <Album_Name>Myalbum1</Album_Name>
      </Gallery>
    </User>

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com

On 2/13/2015 12:45 AM, Srinivas wrote:

Hi All,

Can somebody assist me with building an architecture for this in MarkLogic? The application is built on MySQL, and we are now trying to migrate it to MarkLogic. If somebody could assist me in building this in MarkLogic, that would help me a lot.

How do I relate one document collection to another? For example:

- User Group1 has Projects and Products as its sub-level of data storage
- User Group2 has a Gallery associated with it

User Table:

    Id   Group_id   Name
    1    Group1     Mark
    2    Group2     Lisa

Project Table:

    Project_Id   User_id   Name
    1            1         Project1
    2            1         Project2

Gallery:

    album_Id   User_id   Album Name
    1          2         Myalbum1

[Registration,Profile1,Profile2,projects,Products,achievements,Postings,Gallery]

Thanks & Regards,
Srinivas | Sr CakePHP Programmer
+91 - 9538025790 | srini...@nervecentrex.com
nerve centrex software (India) pvt. Ltd.
122 soudhamini 3rd main gruhalakshmi layout II stage Kamalanagar Bangalore 560079 Karnataka India

Disclaimer: This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of Nerve Centrex Software (India) Pvt. Ltd., its subsidiaries and affiliates. This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify Nerve Centrex immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. Nerve Centrex Software (India) Pvt. Ltd., its subsidiaries and affiliates therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by Nerve Centrex Software (India) Pvt. Ltd., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use.

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman
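
As a rough illustration of the one-document-per-user model suggested in the reply above, here is a minimal Java sketch that folds the Project rows from the question into the User document that owns them, so the User_id foreign key becomes containment. The class and method names (UserDocumentBuilder, buildUserDocument, escape) are hypothetical and not part of any MarkLogic API; the sketch only builds the XML string and does not show loading it into a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Folds relational rows into one XML document per user (illustrative only). */
class UserDocumentBuilder {

    /** Escapes the characters that are unsafe in XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Builds a User document with the user's projects nested inside it,
     * replacing the Project.User_id foreign key with containment.
     */
    static String buildUserDocument(int id, String groupId, String name,
                                    List<String> projectNames) {
        StringBuilder xml = new StringBuilder();
        xml.append("<User>");
        xml.append("<Id>").append(id).append("</Id>");
        xml.append("<Group_id>").append(escape(groupId)).append("</Group_id>");
        xml.append("<Name>").append(escape(name)).append("</Name>");
        for (String projectName : projectNames) {
            xml.append("<Project><Name>")
               .append(escape(projectName))
               .append("</Name></Project>");
        }
        xml.append("</User>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Rows of the Project table from the question: {Project_Id, User_id, Name}
        String[][] projectRows = { {"1", "1", "Project1"}, {"2", "1", "Project2"} };

        // Group project names by the user that owns them
        Map<String, List<String>> projectsByUser = new LinkedHashMap<>();
        for (String[] row : projectRows) {
            projectsByUser.computeIfAbsent(row[1], key -> new ArrayList<>()).add(row[2]);
        }

        // Build the document for user 1 (Mark, Group1)
        System.out.println(buildUserDocument(1, "Group1", "Mark", projectsByUser.get("1")));
    }
}
```

Whether projects belong inside the user document or in documents of their own depends on the questions listed in the reply, in particular on what a single search result should be.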
[MarkLogic Dev General] How to track collection last modified?
Is there a good general way to track when a collection was last modified? I'd like to use this for app-layer (Java) caching of responses to complex queries. Rather than re-run queries, I'd like to cache the responses unless the underlying data has been modified (and thus the collection last-modified stamp is newer than the one associated with the cache).

I looked at last-modified timestamps for the database and directories, and those might be viable alternatives, but I see no API to get the values MarkLogic is tracking.

I considered updating a document with the last-modified timestamp each time a document is updated in the collection, but that could become a bottleneck when indexing high volumes.

I considered creating a range index on a last-modified timestamp element on each document and querying it to find the highest last-modified value. While this may work, I wonder if it's wasted indexing effort and memory, since a timestamp is about the worst fit for a range index: very few documents will share the same value.

Anyone ever try something similar? Do you think one of the ideas above is best, or is there a better way?

--
Sam Mefford
Avalon Consulting, LLC
(801) 706-9731

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
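
The app-layer side of the caching idea described above could look roughly like the following Java sketch. It assumes some way of obtaining the collection's last-modified stamp, which is exactly the open question in this message; here that lookup is a hypothetical LongSupplier passed in by the caller. StampedQueryCache and its members are illustrative names, not an existing API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Caches query responses and re-runs a query once the collection stamp moves forward. */
class StampedQueryCache<V> {

    /** A cached response together with the collection stamp it was computed under. */
    private static final class Entry<V> {
        final long stamp;
        final V value;

        Entry(long stamp, V value) {
            this.stamp = stamp;
            this.value = value;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();

    // Hypothetical lookup: returns the collection's current last-modified stamp.
    private final LongSupplier lastModified;

    StampedQueryCache(LongSupplier lastModified) {
        this.lastModified = lastModified;
    }

    /** Returns the cached response unless the collection changed after it was cached. */
    V get(String queryKey, Supplier<V> runQuery) {
        // Read the stamp before running the query, so an update that lands
        // while the query runs makes the new entry look stale, not fresh.
        long current = lastModified.getAsLong();
        Entry<V> cached = entries.get(queryKey);
        if (cached != null && cached.stamp >= current) {
            return cached.value;
        }
        V fresh = runQuery.get();
        entries.put(queryKey, new Entry<>(current, fresh));
        return fresh;
    }

    public static void main(String[] args) {
        AtomicLong stamp = new AtomicLong(100); // stands in for the real stamp lookup
        StampedQueryCache<String> cache = new StampedQueryCache<>(stamp::get);

        System.out.println(cache.get("q1", () -> "first run"));  // runs the query
        System.out.println(cache.get("q1", () -> "second run")); // served from the cache
        stamp.set(200);                                          // collection was modified
        System.out.println(cache.get("q1", () -> "third run"));  // runs the query again
    }
}
```

The cache is only as cheap as the stamp lookup it calls on every request, so the trade-offs discussed above (a single stamp document versus a range index) still decide whether the approach pays off.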
[MarkLogic Dev General] Support for direct-child queries?
I need to retrieve relevance-ranked documents containing text matches in TITLE tags that are direct children of INFO-OBJ tags. I'm using cts:search to get matching documents in relevance order. Using cts:element-query I can match in TITLE tags that are descendants of INFO-OBJ tags, but I see no way to limit the matching to direct children only. Thus I'm also matching TITLE tags that are children of descendant TABLE tags, rather than matching only the direct-child INFO-OBJ/TITLE tags I want.

Any ideas?

Sam Mefford
sam.meff...@marklogic.com
801-706-9731
[MarkLogic Dev General] general@developer.marklogic.com
> You can change the searchable expression (the first arg to cts:search or the searchable-expression option in search:search).

Yeah, I thought of those, but neither gives me relevance-ranked documents. Each will return relevance-ranked titles, which isn't what I'm looking for in this case.

Sam

Date: Fri, 17 Sep 2010 10:29:42 -0700
From: Danny Sokolsky danny.sokol...@marklogic.com
Subject: Re: [MarkLogic Dev General] Support for direct-child queries?
To: General Mark Logic Developer Discussion general@developer.marklogic.com
Message-ID: c9924d15b04672479b089f7d55ffc1325531d...@exchg-be.marklogic.com
Content-Type: text/plain; charset=us-ascii

You can change the searchable expression (the first arg to cts:search or the searchable-expression option in search:search). For example:

    cts:search(//INFO-OBJ/TITLE, $query)

-Danny

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Sam Mefford
Sent: Friday, September 17, 2010 10:21 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Support for direct-child queries?

I have a need to retrieve relevance-ranked documents containing text matching in TITLE tags that are direct children of INFO-OBJ tags. I'm using cts:search to get matching documents in relevance order. Using cts:element-query I can match in TITLE tags that are descendants of INFO-OBJ tags, but I see no way to limit the matching only to direct children. Thus I'm matching TITLE tags that are children of descendant TABLE tags, rather than only match the direct-child INFO-OBJ/TITLE tags I want.

Any ideas?

Sam Mefford
sam.meff...@marklogic.com
801-706-9731
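
To make the descendant-versus-child distinction in this thread concrete, here is a small Java sketch using the JDK's built-in XPath engine rather than MarkLogic. The sample document and the ChildAxisDemo class are made up for illustration. It shows only how the two path steps differ; it does not address the follow-up point that a searchable expression of //INFO-OBJ/TITLE returns titles rather than whole documents.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Compares the descendant step (//) with the child step (/) on a sample document. */
class ChildAxisDemo {

    // Made-up sample: one TITLE directly under INFO-OBJ, one nested inside TABLE.
    static final String SAMPLE =
        "<INFO-OBJ>"
        + "<TITLE>Top title</TITLE>"
        + "<TABLE><TITLE>Table title</TITLE></TABLE>"
        + "</INFO-OBJ>";

    /** Counts the nodes that the given XPath expression selects in the given XML. */
    static int count(String path, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Descendant step: selects both TITLE elements
        System.out.println(count("//INFO-OBJ//TITLE", SAMPLE)); // prints 2
        // Child step: selects only the TITLE directly under INFO-OBJ
        System.out.println(count("//INFO-OBJ/TITLE", SAMPLE));  // prints 1
    }
}
```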