Re: [MarkLogic Dev General] Marklogic Write Profiling

2014-07-09 Thread Michael Blakeley
Not easily. There are elements for fragments-added and fragments-deleted in the
output from xdmp:query-meters, but normally they are zero, because the output is
generated before the commit phase has run.

declare namespace qm="http://marklogic.com/xdmp/query-meters";
xdmp:document-insert('test', <test/>),
xdmp:query-meters()/(qm:fragments-added|qm:fragments-deleted)/string()
=> 0 0

Commit doesn't run until the request is done, so it's difficult to get any 
information about what happened during the commit phase. One way around this is 
to wrap your update work in a read-only request, so that it runs with 
different-transaction isolation. Here's an example using xdmp:eval - but I'd 
prefer xdmp:invoke for a real implementation.

declare namespace qm="http://marklogic.com/xdmp/query-meters";
xdmp:eval("xdmp:document-insert('test', <test/>)"),
xdmp:query-meters()/(qm:fragments-added|qm:fragments-deleted)/string()
=> 2 0
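
For a real implementation, the same pattern with xdmp:invoke looks roughly like
this (just a sketch - the module path /insert-test.xqy and its $uri external
variable are made up):

declare namespace qm="http://marklogic.com/xdmp/query-meters";
(: /insert-test.xqy (hypothetical) would contain:
     declare variable $uri as xs:string external;
     xdmp:document-insert($uri, <test/>)
:)
xdmp:invoke(
  "/insert-test.xqy",
  (xs:QName("uri"), "test"),
  <options xmlns="xdmp:eval">
    <isolation>different-transaction</isolation>
  </options>),
xdmp:query-meters()/(qm:fragments-added|qm:fragments-deleted)/string()

That should report the same counts as the xdmp:eval version above.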

I'm pretty sure that will work with ML5. But with ML7 you could use
xdmp:invoke-function instead, which is pretty slick.

declare namespace qm="http://marklogic.com/xdmp/query-meters";
xdmp:invoke-function(
  function() {
    xdmp:document-insert('test1', <test/>),
    (: When using invoke-function, be sure to commit. :)
    xdmp:commit() },
  <options xmlns="xdmp:eval">
    <transaction-mode>update</transaction-mode>
  </options>),
xdmp:query-meters()/(qm:fragments-added|qm:fragments-deleted)/string()
=> 2 2

Both times we see that two fragments were added. That's the document fragment 
plus the properties fragment, because I had maintain-last-modified enabled. And 
the second time, two old fragments were deleted.
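
Incidentally, if those extra properties-fragment writes are unwanted,
maintain-last-modified can be turned off. A rough Admin API sketch - the
database name "Documents" is a placeholder for the database in question:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";
(: "Documents" is a placeholder; use the database you are measuring :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")
return
  if (admin:database-get-maintain-last-modified($config, $db))
  then admin:save-configuration(
         admin:database-set-maintain-last-modified($config, $db, fn:false()))
  else ()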

This doesn't tell us everything. We can say that there was I/O from timestamp 
updates. If journaling was enabled then there were also journal writes. After 
that the state of any existing in-memory stands might drive more I/O, including 
saves and merges. For journal traffic it would also be important to have an 
idea how large the fragments were.
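
For a rough idea of fragment sizes, one crude approach is to total the
serialized size of the documents the tagging query touches. This is only an
approximation of on-disk fragment size, and the collection name here is made
up:

(: Rough serialized size (in characters) of the documents a tagging
   query touches. "tagged" is a made-up collection name. :)
fn:sum(
  for $doc in fn:collection("tagged")
  return fn:string-length(xdmp:quote($doc)))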

-- Mike

On 9 Jul 2014, at 13:12, Timothy Pearce wrote:

> Is there a way to profile what file operations happen during a given query? 
> Much like the profiler you can run to see time taken per query, I'm interested 
> in getting a record of file I/O. In particular, the question is where and when 
> the data inside the on-disk stands is being changed.
> 
> The premise of the issue is that the backup solution keeps an I/O transaction 
> log and replays it to remote storage, so any extraneous writes can balloon 
> this log, and the goal is to identify whether such writes are happening. There 
> is a selection process which tags certain documents into collections; these 
> initial queries may return thousands of documents within the MarkLogic DB, but 
> only tag a select number. The amount of data going to the backup is much 
> larger than expected for this operation, hence the need to identify whether a 
> query is writing to files inside the DB that it should not.
> 
> Currently running MarkLogic Server Enterprise Edition 5.0.
> If this functionality has been added in a newer version, that would also be a 
> useful solution.
> 
> Thanks,
> Tim
> 
> 
> Nothing in this message is intended to constitute an electronic signature 
> unless a specific statement to the contrary is included in this message. 
> Confidentiality Note: This message is intended only for the person or entity 
> to which it is addressed. It may contain confidential and/or proprietary 
> material. Any review, transmission, dissemination or other use, or taking of 
> any action in reliance upon this message by persons or entities other than 
> the intended recipient is prohibited. If you received this message in error, 
> please contact the sender and delete it from your computer.
> ___
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
> 

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


[MarkLogic Dev General] Marklogic Write Profiling

2014-07-09 Thread Timothy Pearce
Is there a way to profile what file operations happen during a given query? 
Much like the profiler you can run to see time taken per query, I'm interested 
in getting a record of file I/O. In particular, the question is where and when 
the data inside the on-disk stands is being changed.

The premise of the issue is that the backup solution keeps an I/O transaction 
log and replays it to remote storage, so any extraneous writes can balloon this 
log, and the goal is to identify whether such writes are happening. There is a 
selection process which tags certain documents into collections; these initial 
queries may return thousands of documents within the MarkLogic DB, but only tag 
a select number. The amount of data going to the backup is much larger than 
expected for this operation, hence the need to identify whether a query is 
writing to files inside the DB that it should not.

Currently running MarkLogic Server Enterprise Edition 5.0.
If this functionality has been added in a newer version, that would also be a 
useful solution.

Thanks,
Tim


Nothing in this message is intended to constitute an electronic signature 
unless a specific statement to the contrary is included in this message. 
Confidentiality Note: This message is intended only for the person or entity to 
which it is addressed. It may contain confidential and/or proprietary material. 
Any review, transmission, dissemination or other use, or taking of any action 
in reliance upon this message by persons or entities other than the intended 
recipient is prohibited. If you received this message in error, please contact 
the sender and delete it from your computer.
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] WebDAV app service code available or other implementation for WebDAV to MarkLogic?

2014-07-09 Thread Casey Jordan
So I have a requirement that all content which goes into the database gets
pre-processed with some Java code which is already written. There is also some
markup that will likely need to be added to the content (default attributes
defined by the schema, etc.). And because ML does not support saving doctype
declarations, those will probably need to be converted to PIs in this process.
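
As a rough idea of the doctype piece, something like this sketch might do it on
the way in (the function name and PI target are made up, and it assumes the raw
string has no XML declaration and only a simple external-ID doctype, no
internal subset):

xquery version "1.0-ml";
(: Hypothetical sketch. local:doctype-to-pi and the original-doctype PI
   target are made up. Assumes no XML declaration in the raw string and
   no internal subset in the DOCTYPE. :)
declare function local:doctype-to-pi($raw as xs:string) as document-node()*
{
  if (fn:matches($raw, "<!DOCTYPE[^>\[]*>"))
  then
    let $decl := fn:replace($raw, "^.*?<!DOCTYPE([^>\[]*)>.*$", "$1", "s")
    return xdmp:unquote(fn:concat(
      "<?original-doctype", $decl, "?>",
      fn:replace($raw, "<!DOCTYPE[^>\[]*>", "")))
  else xdmp:unquote($raw)
};
local:doctype-to-pi('<!DOCTYPE topic SYSTEM "topic.dtd"><topic/>')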

Then on the way out, similar but reverse processes need to happen via some
other Java code. These are complex enough that a simple filter through
nginx wouldn't work.

So what I think I am looking at is that my application servers will need to
act as the gateway for any content going in and out of the database that needs
this processing. So I will have to have a custom WebDAV implementation, as
well as REST, etc. This does not bother me so much, except that the current
open-source WebDAV implementations are notoriously buggy.


On Wed, Jul 9, 2014 at 12:47 PM, Justin Makeig wrote:

> No, that's not possible. MarkLogic's WebDAV app server is implemented in
> C++, deep in the guts of the Server.
> What types of customization are you looking to do? Is WebDAV a hard and
> fast requirement or would a custom REST service suffice?
> As was discussed on a recent thread, you could stick a reverse HTTP proxy
> in between your WebDAV client and MarkLogic and do processing there. You
> could also implement your own WebDAV server in XQuery with an HTTP app
> server. If WebDAV is not a hard and fast requirement, you could also create
> your own custom (likely simpler) REST services or extend the ones that are
> built into MarkLogic <http://docs.marklogic.com/guide/rest-dev/intro#chapter>.
>
> Justin
>
> Justin Makeig
> Director, Product Management
> MarkLogic Corporation
> justin.mak...@marklogic.com
> www.marklogic.com
>
>
>
> On Jul 9, 2014, at 9:24 AM, Casey Jordan wrote:
>
> If someone wanted to customize the WebDAV app service, does MarkLogic make
> this code available so that it could be modified and run on a separate
> application server (i.e. inside a servlet container)?
>
> --
> --
> Casey Jordan
> easyDITA a product of Jorsek LLC
> "CaseyDJordan" on LinkedIn, Twitter & Facebook
> (585) 348 7399
> easydita.com
>
>
> This message is intended only for the use of the Addressee(s) and may
> contain information that is privileged, confidential, and/or exempt from
> disclosure under applicable law.  If you are not the intended recipient,
> please be advised that any disclosure, copying, distribution, or use of
> the information contained herein is prohibited.  If you have received
> this communication in error, please destroy all copies of the message,
> whether in electronic or hard copy format, as well as attachments, and
> immediately contact the sender by replying to this e-mail or by phone.
> Thank you.
>  ___
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
>
>
>
> ___
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
>
>


-- 
--
Casey Jordan
easyDITA a product of Jorsek LLC
"CaseyDJordan" on LinkedIn, Twitter & Facebook
(585) 348 7399
easydita.com


This message is intended only for the use of the Addressee(s) and may
contain information that is privileged, confidential, and/or exempt from
disclosure under applicable law.  If you are not the intended recipient,
please be advised that any disclosure, copying, distribution, or use of
the information contained herein is prohibited.  If you have received
this communication in error, please destroy all copies of the message,
whether in electronic or hard copy format, as well as attachments, and
immediately contact the sender by replying to this e-mail or by phone.
Thank you.
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] WebDAV app service code available or other implementation for WebDAV to MarkLogic?

2014-07-09 Thread Justin Makeig
No, that's not possible. MarkLogic's WebDAV app server is implemented in C++, 
deep in the guts of the Server. 
What types of customization are you looking to do? Is WebDAV a hard and fast 
requirement or would a custom REST service suffice?
As was discussed on a recent thread, you could stick a reverse HTTP proxy in 
between your WebDAV client and MarkLogic and do processing there. You could 
also implement your own WebDAV server in XQuery with an HTTP app server. If 
WebDAV is not a hard and fast requirement, you could also create your own 
custom (likely simpler) REST services or extend the ones that are built into 
MarkLogic <http://docs.marklogic.com/guide/rest-dev/intro#chapter>.
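
As a very rough sketch of the XQuery-on-an-HTTP-app-server direction, a main
module along these lines handles just GET and PUT (nowhere near full WebDAV -
no PROPFIND, no locking - and the preprocessing hook is a placeholder for
whatever custom logic you need):

xquery version "1.0-ml";
(: dav.xqy - hypothetical main module for an HTTP app server; maps the
   request path straight to a document URI. Handles only GET and PUT. :)
declare function local:preprocess($doc as document-node()) as document-node()
{
  (: placeholder for the custom ingest processing :)
  $doc
};

(: note: the document-insert below makes every request an update transaction;
   a real service would split reads and writes into separate modules :)
let $uri := xdmp:get-request-path()
let $method := xdmp:get-request-method()
return
  if ($method eq "GET") then (
    if (fn:doc-available($uri)) then fn:doc($uri)
    else xdmp:set-response-code(404, "Not Found"))
  else if ($method eq "PUT") then (
    xdmp:document-insert($uri, local:preprocess(xdmp:get-request-body("xml"))),
    xdmp:set-response-code(201, "Created"))
  else xdmp:set-response-code(405, "Method Not Allowed")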

Justin

Justin Makeig
Director, Product Management
MarkLogic Corporation
justin.mak...@marklogic.com
www.marklogic.com



On Jul 9, 2014, at 9:24 AM, Casey Jordan wrote:

> If someone wanted to customize the WebDAV app service, does MarkLogic make 
> this code available so that it could be modified and run on a separate 
> application server (i.e. inside a servlet container)?
> 
> -- 
> --
> Casey Jordan
> easyDITA a product of Jorsek LLC
> "CaseyDJordan" on LinkedIn, Twitter & Facebook
> (585) 348 7399
> easydita.com
> 
> 
> This message is intended only for the use of the Addressee(s) and may
> contain information that is privileged, confidential, and/or exempt from
> disclosure under applicable law.  If you are not the intended recipient,
> please be advised that any disclosure, copying, distribution, or use of
> the information contained herein is prohibited.  If you have received
> this communication in error, please destroy all copies of the message,
> whether in electronic or hard copy format, as well as attachments, and
> immediately contact the sender by replying to this e-mail or by phone.
> Thank you.
> ___
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general



___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


[MarkLogic Dev General] WebDAV app service code available or other implementation for WebDAV to MarkLogic?

2014-07-09 Thread Casey Jordan
If someone wanted to customize the WebDAV app service, does MarkLogic make
this code available so that it could be modified and run on a separate
application server (i.e. inside a servlet container)?

-- 
--
Casey Jordan
easyDITA a product of Jorsek LLC
"CaseyDJordan" on LinkedIn, Twitter & Facebook
(585) 348 7399
easydita.com


This message is intended only for the use of the Addressee(s) and may
contain information that is privileged, confidential, and/or exempt from
disclosure under applicable law.  If you are not the intended recipient,
please be advised that any disclosure, copying, distribution, or use of
the information contained herein is prohibited.  If you have received
this communication in error, please destroy all copies of the message,
whether in electronic or hard copy format, as well as attachments, and
immediately contact the sender by replying to this e-mail or by phone.
Thank you.
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Guides on database design for multi-tenancy?

2014-07-09 Thread Will Thompson
Just wanted to add one note about sharing a schema database. Although schema 
validation is explicit, schema type assessment is implicit and automatic. If 
two schemas defined conflicting types on the same element name, I assume that 
would throw a dynamic error. If every schema is namespaced, though, it 
shouldn’t be a problem.
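
For example (a sketch only - the URIs and namespaces are made up), each
tenant's schema can live under its own target namespace in the schemas
database, so the automatic type assignment never sees two global declarations
for the same expanded name:

xquery version "1.0-ml";
(: Sketch only - URIs and namespaces are made up. Each tenant's schema
   lives under its own target namespace in the schemas database. :)
let $schema :=
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             targetNamespace="http://example.com/tenant-a"
             xmlns="http://example.com/tenant-a">
    <xs:element name="invoice" type="xs:string"/>
  </xs:schema>
return
  xdmp:eval(
    'declare variable $schema external;
     xdmp:document-insert("/schemas/tenant-a/invoice.xsd", $schema)',
    (xs:QName("schema"), $schema),
    <options xmlns="xdmp:eval">
      <database>{xdmp:schema-database()}</database>
    </options>)

A second tenant can then declare its own "invoice" element with a different
type without colliding.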

-Will

On Jul 7, 2014, at 2:27 PM, Michael Blakeley wrote:

> That suggests a raw tree size of about 5 GB for a large customer. With a high 
> level of text indexing it might approach 20 GB or even 40 GB. That's a medium 
> size for a forest. It's best to limit them to about 200 GB, but short of that 
> a larger forest is more efficient than a smaller one. Since those are your 
> larger customers, that suggests you could combine quite a few smaller 
> customers. To me this points to a shared database.
> 
> Forest storage is basically schemaless. Simply ingesting XML doesn't validate 
> it against a schema. You do that explicitly using a validate { ... } 
> expression. It's possible to make that happen using a trigger, if you want 
> automatic validation. But usually it's better to accept documents even when 
> they don't validate, so that fixing them is a database operation.
> 
> Your next question may be: should I map specific customers to specific 
> forests? Usually no. Usually it's better to let the database spread documents 
> around. Think of the forests as disks in a RAID volume, rather than 
> sub-databases.
> 
> -- Mike
> 
> On 7 Jul 2014, at 10:58, Casey Jordan wrote:
> 
>> Thanks, I figured that there would be more resources that were not shared 
>> when having multiple dbs. That being said, I am not sure it would be a big 
>> impact in my case. I would say that a big client might have 500k documents 
>> that are around 10kb each. 
>> 
>> Also, another consideration is that each client needs to have separate 
>> schemas for their content. So this might force me into the multi-DB design, 
>> unless I made the default content store forest schemaless.
>> 
>> Is it even possible to have a schemaless forest?
>> 
>> 
>> On Mon, Jul 7, 2014 at 1:37 PM, Gene Thomas wrote:
>> I think the overall performance would be best with your content in separate 
>> databases.
>> 
>> Gene
>> 
>> 
>> On Monday, July 7, 2014 10:33 AM, Casey Jordan wrote:
>> 
>> 
>> Thanks guys that is really helpful information.
>> 
>> Are there any significant performance or resource tradeoffs when choosing 
>> between putting everything in one big database vs. splitting it into one for 
>> each "client"? Personally I like the idea of keeping everything as separate 
>> as possible, but if this means some major tradeoff, that would be good to 
>> know.
>> 
>> 
>> On Mon, Jul 7, 2014 at 1:28 PM, Justin Makeig wrote:
>> Casey,
>> There are two ways in MarkLogic 7 to query a specific database:
>> 1. Create a separate app server (HTTP or XDBC) for each database. An app 
>> server has a default database that you can set in configuration. Each 
>> query/update evaluated for that app server runs against that database. Many 
>> app servers can point to one database, but an app server can only be 
>> associated with one database.
>> 2. Another, lower-level means is to use xdmp:eval or xdmp:invoke. These 
>> allow you to specify a database at runtime and evaluate specific code 
>> against it. I wouldn't recommend this as a general approach, though. It will 
>> make your code less readable and, in certain scenarios, will prevent 
>> MarkLogic from maximizing some performance optimizations it does under the 
>> covers.
>> 
>> Another approach might be to create protected collections for each "tenant" 
>> within the same database. With MarkLogic's role-based security, you can be 
>> assured that you can completely restrict viewing and editing to very 
>> specific roles. You can take a similar approach to running privileged code 
>> with amps. Take a look at the Security Guide for more details.
>> 
>> Justin
>> 
>> 
>> 
>> Justin Makeig
>> Director, Product Management
>> MarkLogic Corporation
>> justin.mak...@marklogic.com
>> www.marklogic.com
>> 
>> 
>> 
>> On Jul 7, 2014, at 10:14 AM, Casey Jordan wrote:
>> 
>>> Hi all,
>>> 
>>> I am checking out MarkLogic for the first time and I was wondering whether 
>>> there is any information on designing a cluster for multi-tenancy.
>>> 
>>> I assumed that I could create a separate database for each "client" that 
>>> would be using the application, and then segment data that way. However, 
>>> right away it became a little unclear to me how I query a specific database 
>>> (I couldn't find an example of this in the docs), or manage users, triggers, 
>>> schemas, etc. for a specific database.
>>> 
>>> I know this is a fairly general question, but any advice would be helpful.
>>> 
>>> Thanks
>>> 
>>> -- 
>>> --
>>> Casey Jordan
>>> easyDITA a product