Re: [MarkLogic Dev General] Viewing Documents with WebDAV

Tim Meagher Fri, 17 Jun 2011 11:07:59 -0700

Hi Danny,


I think that last-modified is enabled by default.  I haven't tried
disabling it to improve performance, and I can't say I have a good feel for
all of the ramifications of doing so.

 

Regardless of the state of last-modified, I understand the problem of
browsing a huge directory in a file system.  My MarkLogic applications are a
port from a file system-based application in which I hierarchically
organized the content for faster access and to avoid pushing the limits of
the number of files in a directory.  For the port I created hierarchical
directory URIs especially so that I can use WebDAV and as a result I can use
oXygen and a variety of other WebDAV clients to simply read and post content
(but I won't use it for moving or renaming content).  I don't see any
noticeable delays using WebDAV to access documents and sub-directory URIs.
It is not something I do on a regular basis, but nonetheless I am being
advised not to use WebDAV as such because of performance issues.  I just
don't have any way to measure the impact of using WebDAV on our MarkLogic
database.

 

I've been looking into writing an xquery app using cts:uris as you
suggested, but it seems to be a waste of time without understanding at what
point my use of WebDAV negatively affects the ability of our database to
keep up with all its other processing.  Is there some way to measure or to
anticipate that based on the number of documents or subdirectories in a
directory URI?  What underlying calls does MarkLogic use to service a WebDAV
request?
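
One rough way to approach the measurement question is to time the directory
listing request itself from the client side.  The sketch below is only an
illustration under stated assumptions: it assumes the WebDAV server answers a
standard PROPFIND with a Depth: 1 header, and the function names
(build_listing_request, list_directory, time_calls, summarize) are made up
for this example.  It measures wall-clock latency seen by the client, not the
load placed on the database, so it can only hint at server-side impact.

```python
import statistics
import time
import urllib.request

# Minimal PROPFIND body that asks only for the resource type of each entry.
PROPFIND_BODY = (
    b'<D:propfind xmlns:D="DAV:">'
    b"<D:prop><D:resourcetype/></D:prop>"
    b"</D:propfind>"
)


def build_listing_request(url):
    """Build a PROPFIND request for a directory and its immediate children."""
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        headers={"Depth": "1", "Content-Type": "application/xml"},
        method="PROPFIND",
    )


def list_directory(url, opener=None):
    """Send one listing request and return the raw response body.

    Pass an opener built with an auth handler if the server needs credentials.
    """
    open_fn = opener.open if opener is not None else urllib.request.urlopen
    with open_fn(build_listing_request(url)) as response:
        return response.read()


def time_calls(fn, repeats=5, clock=time.perf_counter):
    """Call fn() `repeats` times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = clock()
        fn()
        durations.append(clock() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

With a hypothetical directory URL such as http://localhost:8005/content/2011/
(host, port, and path are placeholders), calling
summarize(time_calls(lambda: list_directory(url))) for directories of
different sizes would give a first idea of how listing time grows with the
number of entries.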

 

Thank you!

 

Tim

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Danny Sokolsky
Sent: Friday, June 17, 2011 1:34 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Viewing Documents with WebDAV

 

Hi Tim,

 

There is nothing inherently slow about WebDAV per se, but as I see it, there
are 2 issues that people tend to run up against:

 

1)      Scale:  WebDAV requires last-modified to be enabled, which in turn
creates properties fragments on each document in the database.  This tends
to be fine for smaller data sets (say a few million documents), but is less
fine for larger data sets (hundreds of millions or billions of documents).

2)      Many WebDAV clients have problems: Some WebDAV clients do strange
things.  This tends to manifest itself in weird behavior when you do things
like rename a directory.  For example, on Windows 7 WebDAV, when you create
a document via WebDAV, it first creates an empty document, then updates it
with the contents.

 

Now many of the problems you see in WebDAV clients you would also see if you
just used a filesystem browser.  Try opening a filesystem browser on a
directory that contains 1 million documents (actually, I would not try it if
I were you).  It is hard on both the client and on the server.

 

If what you want is directory browsing, I would recommend writing something
in XQuery that uses cts:uris.

 

So it really depends on what you are doing.

 

I'm not sure that really answers your question, but maybe it will begin to
chip away at it.

 

-Danny

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Tim Meagher
Sent: Friday, June 17, 2011 9:34 AM
To: 'General MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Viewing Documents with WebDAV

 

Hi Folks,

 

I haven't had any response to this yet - did I hit a nerve? :-)

 

My experience with WebDAV and MarkLogic has been interesting.  There are
things I know cause problems, such as trying to move or rename a directory
URI.  It makes me wonder if there are some WebDAV commands that should be
disallowed, unless it depends on the client.  In any case, it would be very
beneficial to get some answers about my original question.

 

~Tim M.

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Tim Meagher
Sent: Sunday, June 12, 2011 4:57 PM
To: 'General MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Viewing Documents with WebDAV

 

Hi Folks,

 

The MarkLogic documentation clearly states that:

 

The main purpose of a WebDAV server is to make it easy for people to store,
retrieve, and modify documents in a database. The documents can be any type,
whether they are text documents such as .txt files or source code, binary
documents such as image files or Microsoft Word files, or XML documents.
Because the documents are stored in a database, you can create applications
that use the content in those documents for whatever purpose you need. You
can also use the database backup and restore features to easily back up the
content in the database.

 

WebDAV is pretty useless when it comes to browsing directory URIs that
contain too many documents and/or subdirectories, but I have been operating
on the assumption that by organizing a large set of XML documents in a
hierarchical directory URI structure I can limit the number of
documents and subdirectories that are accessible via the Data Source
Explorer in oXygen's WebDAV client (and other WebDAV clients as well).

 

It has been extremely valuable for me to use the oXygen WebDAV Data Source
Explorer to quickly drill down, locate documents, compare the input and
output of transforms, to debug updated transforms, and when necessary to
manually correct and save an errant XML document in the input stream of a
CPF pipeline.  This functionality is simply not available in CQ.  However, I
was recently informed that even if directory structures are organized
hierarchically, WebDAV clients still cause the server to incur
significant performance hits when opening a directory in the oXygen or other
WebDAV treeview to explore its contents.  The risk may be more pronounced on
a production system than on a development system, but full content sets for
evaluation may only be available in a production environment.  I haven't
seen any documentation that discusses these details and I'm not familiar
with the WebDAV APIs for browsing and reading directory URIs in MarkLogic,
so I would like to go a little deeper and try to determine:

 

1.       How much of a performance hit is incurred and at what point, e.g.,
when there are 1,000 subdirectories and/or document URIs within a given
directory URI?  Is there a given number at which the performance hit becomes
negligible so that using a hierarchical directory URI structure is feasible?
Is there a way to measure that performance for any given WebDAV client?

 

2.       Can the WebDAV client be tuned to explicitly prevent any
significant hits to the server, e.g., by limiting threads, timeouts, etc.?

 

3.       The 4.1 documentation refers to tested WebDAV clients.  I'm
surprised that list doesn't include oXygen or some other freeware clients.
I'm assuming that by saying that the listed WebDAV clients were tested that
they also passed some form of acceptance testing.  I'd like to suggest that
the list be updated to reflect oXygen, EnGinSite DataFreeway, and BitKinex.

 

4.       Are there protocols in WebDAV that allow for limited directory
viewing - that is to only request the first N subdirectories and/or
documents within a given directory URI so as to explicitly limit the load on
the server when trying to get a directory listing?
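
For context on the last question: the WebDAV specification (RFC 4918) lets a
PROPFIND request carry a Depth header of 0, 1, or infinity, which bounds how
deep a listing goes but not how many entries come back at one level.  As far
as I know, core WebDAV defines no "first N entries" paging, so any cap a
client applies happens after the server has already built the full reply.
The sketch below illustrates this by parsing a Depth: 1 reply on the client
side and counting what it contains.  The sample XML and its URIs are
hand-written and hypothetical, not captured from MarkLogic, and split_listing
is a name invented for this example.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # Clark-notation prefix for the WebDAV XML namespace


def split_listing(multistatus_xml):
    """Return (collections, documents) as lists of hrefs from a multistatus body.

    An entry counts as a collection when its resourcetype contains a
    DAV:collection element; everything else is treated as a document.
    """
    root = ET.fromstring(multistatus_xml)
    collections, documents = [], []
    for response in root.iter(DAV + "response"):
        href = response.findtext(DAV + "href")
        marker = response.find(".//" + DAV + "resourcetype/" + DAV + "collection")
        if marker is not None:
            collections.append(href)
        else:
            documents.append(href)
    return collections, documents


# Hand-written example of a Depth: 1 reply. The first entry is the
# requested directory itself, followed by its immediate children.
SAMPLE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/content/</D:href>
    <D:propstat><D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop></D:propstat>
  </D:response>
  <D:response>
    <D:href>/content/2011/</D:href>
    <D:propstat><D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop></D:propstat>
  </D:response>
  <D:response>
    <D:href>/content/a.xml</D:href>
    <D:propstat><D:prop><D:resourcetype/></D:prop></D:propstat>
  </D:response>
  <D:response>
    <D:href>/content/b.xml</D:href>
    <D:propstat><D:prop><D:resourcetype/></D:prop></D:propstat>
  </D:response>
</D:multistatus>"""

collections, documents = split_listing(SAMPLE)
print(len(collections), "collections,", len(documents), "documents")
```

Counting entries this way, per directory, is one way to check whether a
hierarchical URI layout is actually keeping each level small.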

 

My options for replacing the use of oXygen and to avoid performance
penalties associated with WebDAV are to:

 

1.       Build my own WebDAV client (if I can limit the directory listings),
or

 

2.       Build an xquery web app that uses optimized queries such as
cts:uris to drill down into directory URIs with an expandable TreeView to
locate documents (and as a means of using the oXygen WebDAV-based editor to
create the URL of the document I want to open and to paste it into the
oXygen OpenURL feature).

 

3.       Find existing code that can hopefully be ported to perform either
of the above.

 

Thank you all for any help with this!

 

Tim Meagher

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
