Re: [MarkLogic Dev General] Marklogic Webdav not connecting

2012-06-27 Thread DJaun Maclin
Thanks for the reply, Danny!
Automatic directory creation is enabled for the database.

From: Danny Sokolsky <danny.sokol...@marklogic.com>
Reply-To: General Discussion <general@developer.marklogic.com>
Date: Tue, 26 Jun 2012 16:12:05 -0700
To: General Discussion <general@developer.marklogic.com>
Cc: Robert Tuten <rtu...@wattnet.net>, Matt Leshinskie 
<matt.leshins...@marklogic.com>
Subject: Re: [MarkLogic Dev General] Marklogic Webdav not connecting

Is automatic directory creation enabled for the database?
___
General mailing list
General@developer.marklogic.com
http://community.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Marklogic Webdav not connecting

2012-06-27 Thread Danny Sokolsky
Here are a few other things to look at:

* Make sure the root directory exists in the database.  For example, if the 
root is / on the WebDAV server, make sure the directory / exists 
(xdmp:directory-properties("/") should return a properties document with a 
directory element, for example).
* Make sure the user you are logging into the WebDAV server as has 
permissions on that root directory.  Try logging in as a user with the admin 
role; if that works but another user does not, then the user probably does 
not have permissions on the root directory.

-Danny



Re: [MarkLogic Dev General] Marklogic Webdav not connecting

2012-06-27 Thread Geert Josten
Hi Djaun,

It might also be worth trying to connect to the webdav server from a
non-windows client, or by using a real webdav-client under windows. There
are lots of known issues with the windows network wizard with regard to
webdav. It typically involves doing various authentication attempts, but
giving up too early. Even something as silly as adding or removing the
trailing slash can make a difference.
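
One way to take the Windows wizard out of the picture is to probe the server
with a plain HTTP client. The following is only a sketch, assuming curl is
installed; webdav_variants and probe_webdav are helper names made up for this
example, and the host, port, and user in the usage line are hypothetical. It
sends a PROPFIND with Depth: 0 to the URL both without and with the trailing
slash and prints the HTTP status for each.

```shell
# Print the URL without and with a trailing slash, one per line.
webdav_variants() {
  base="${1%/}"
  printf '%s\n%s/\n' "$base" "$base"
}

# Send PROPFIND (Depth: 0) to each variant and print "<status> <url>".
# $1 = WebDAV root URL (hypothetical), $2 = user name (curl prompts for
# the password).
probe_webdav() {
  for url in $(webdav_variants "$1"); do
    curl --silent --output /dev/null \
         --write-out "%{http_code} $url\n" \
         --anyauth --user "$2" \
         -X PROPFIND -H 'Depth: 0' "$url"
  done
}
```

Usage would look like: probe_webdav http://localhost:8005/ someuser. A 207
(Multi-Status) means the PROPFIND succeeded, a 401 points at authentication,
and a 404 suggests the root directory is missing.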

Kind regards,
Geert



Re: [MarkLogic Dev General] Strict validation fails: why?

2012-06-27 Thread Geert Josten
Hi Henrik,

 I am not an expert on schema, but I would add
 xmlns="http://www.irhb.org/sources" to the schema root element. I might
 also try removing the elementFormDefault="unqualified" attribute from the
 schema root element.

Yes, declaring the default namespace should do the trick. Elements are
declared like:

<xs:element name="sources">

i.e. without a prefix in the name attribute. You either need to make clear
that these should be tied to a default namespace,
http://www.irhb.org/sources, or you should declare a prefix for that
namespace and use the prefix in the element declarations.

Kind regards,
Geert

 -Original Message-
 From: general-boun...@developer.marklogic.com
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
 Sent: Tuesday, 26 June 2012 23:46
 To: MarkLogic Developer Discussion
 Subject: Re: [MarkLogic Dev General] Strict validation fails: why?

 It looks like a namespace problem. The error seems to be saying that your
 schema specifies the empty namespace, but the actual node is in the
 http://www.irhb.org/sources namespace.

 I am not an expert on schema, but I would add
 xmlns="http://www.irhb.org/sources" to the schema root element. I might
 also try removing the elementFormDefault="unqualified" attribute from the
 schema root element.

 -- Mike

 On 26 Jun 2012, at 14:36 , Henrik Nielsen wrote:

  I am new to MarkLogic and by no means an expert on XML. I would be very
  grateful if members of this list would help me over the current stumbling
  block. I have tried without success to validate a document in the Query
  Console as follows:-
 
  xquery version "1.0-ml";
  import schema "http://www.irhb.org/sources" at "sources.xsd";
  (:
  this will validate against the specified schema, and will fail
  if the schema does not exist (or if it is not valid according to
  the schema)
  :)
  let $node := xdmp:document-get("c:/irhb/xmldocs/sources_y.xml")
  return
  try { xdmp:document-insert("sources_y.xml",
  validate strict { $node } )
  }
  catch ($e) { "Validation failed: ",
  $e/error:format-string/text() }
 
  I get this error message:-
 
  <?xml version="1.0" encoding="UTF-8"?>
  <results warning="more than one root item">Validation failed:
  XDMP-VALIDATEUNEXPECTED: (err:XQDY0027) validate strict { $node } --
  Invalid node: Found {http://www.irhb.org/sources}originator but expected
  (originator*) at
  fn:doc("c:/irhb/xmldocs/sources_y.xml")/*:sources/*:originator[1] using
  schema "sources.xsd"</results>
 
  This is the beginning of my schema:-
 
  <?xml version="1.0" encoding="utf-8"?>
  <xs:schema targetNamespace="http://www.irhb.org/sources"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="unqualified"
      attributeFormDefault="unqualified">
    <xs:element name="sources">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="originator" minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="originatorid" type="xs:string" />
 
  And here is the beginning of sources_y.xml:-
 
  <sources xsi:schemaLocation="http://www.irhb.org/sources sources.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://www.irhb.org/sources">
    <originator>
      <originatorid>Y</originatorid>
      <surname>Y.</surname>
 
  I have understood that strict validation requires the outermost element to
  be a root element, but as far as I can see that is also the case here, so
  what can be wrong?
 
 
  Henrik Thiil Nielsen
  (Copenhagen)


Re: [MarkLogic Dev General] Marklogic Webdav not connecting

2012-06-27 Thread Erik Zander
Hi Djaun!

I don't know your Windows version, but it sounds like you may be connecting 
from Windows 7 (most of this should also apply to Vista).

Could it be that your previous successful connections were made from a 
Windows XP machine?

If so, here are some things to take note of:

* The WebClient service isn't started automatically by default in Windows 7. 
WebDAV needs it, so set it to start automatically.

* Windows 7 does not support basic authentication except over SSL. This can 
be changed in the registry with the BasicAuthLevel value found under 
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\WebClient\Parameters: to 
enable basic authentication without SSL, set the value to 2.

* To speed things up, open Internet Explorer, open the settings, and on the 
Connections tab click "LAN settings". In the window that opens, uncheck the 
"Automatically detect settings" box.

* Lastly, restart so the settings take effect.


Hope this helps 
Regards
Erik



Re: [MarkLogic Dev General] Database Status Troubles

2012-06-27 Thread Mike Sokolov
I would agree that you are bumping up against hardware limits.

I get this regularly on a largish database ( 10M docs, ~ 680 GB on one 
server).  Whenever we re-index a significant amount of content the 
server becomes unusable for days - all it can handle is the reindexing 
and merging.  Even if we throttle reindexing down to 1 there will be 
periods of downtime (and of course the whole process takes longer).  The 
admin console responds fine except for the status screens, which 
(mostly) time out, as Alex reports.  If you keep trying, you can usually 
eventually get a response, although this can take 1/2 hour of retrying 
and waiting, which is painful.

The strategy we are pursuing is to maintain two servers; we reindex them 
in sequence, switching production from one to the other and back again.  
Another thing we are trying is increasing RAM (because we can, and the 
system doesn't *seem* to be CPU bound), but I suspect the real issue is 
disk I/O.

I'd like to add more disk, or faster ones (or ideally more servers, but 
that is challenging for a variety of reasons).  I'm also not sure how to 
prove that more I/O bandwidth would help dramatically.  I've looked at 
vmstat: we're not swapping, but I'm also not sure how to read the stats 
clearly enough to understand in detail what's going on with the disks.
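
For reading the vmstat numbers, the columns that matter most here are si/so
(memory swapped in and out per second) and wa (percentage of CPU time spent
waiting on I/O). A rough sketch, assuming the usual Linux vmstat layout with
a header row naming those columns; vmstat_pressure is a helper name made up
for this example:

```shell
# Reduce vmstat output to the swap and I/O-wait columns.
# Reads vmstat output on stdin; column positions are taken from the
# header row ("r b swpd ...") rather than hard-coded.
vmstat_pressure() {
  awk '
    $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
    ("si" in col) && $1 ~ /^[0-9]+$/ {
      printf "swap_in=%s swap_out=%s iowait=%s%%\n",
             $(col["si"]), $(col["so"]), $(col["wa"])
    }
  '
}
```

Run as: vmstat 15 | vmstat_pressure. Non-zero swap_in/swap_out means the box
is swapping; si/so at zero with a consistently high iowait would be
consistent with the disks, not RAM, being the limit.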

Also wondering if you have recommendations as to how to configure 
disks.  Do you use RAID?  What kind?  A centralized SAN or other storage 
device?  I plan to ask support for help, too, but any general 
suggestions from folks on the list here would be appreciated.

-Mike

On 6/26/2012 6:16 PM, Danny Sokolsky wrote:
 Another thing you might try is turning the reindexer throttle down.  Go to 
 the database config page and try setting reindexer throttle to 1 (I think the 
 default is 5).  Then give your system a while to catch up, and see if you can 
 then get to the forest status page.


 -Danny

 -Original Message-
 From: general-boun...@developer.marklogic.com 
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
 Sent: Tuesday, June 26, 2012 2:37 PM
 To: MarkLogic Developer Discussion
 Subject: Re: [MarkLogic Dev General] Database Status Troubles

 On 26 Jun 2012, at 14:12 , Alex Milowski wrote:

 On Tue, Jun 26, 2012 at 2:01 PM, Michael Blakeley m...@blakeley.com wrote:
 On 26 Jun 2012, at 13:52 , Alex Milowski wrote:

 The forest status page doesn't return.  The browser just hangs out
 until something times out.
 That is decidedly abnormal. I have seen a few systems where database-status 
 was slow and would time out during reindexing, but forest-status should 
 not. Admittedly I don't do much with geospatial, so this might be a 
 geospatial-specific bug.
 Here's what I don't get.  I dropped a large index.  Shouldn't that be
 an easy operation?

 Or is it the index I added that is causing the problem?  It should
 only hit a very small number of documents/elements.  Is it still
 required to touch everything?
 MarkLogic never updates in place (except for timestamps). So it doesn't drop 
 existing indexes: it only reindexes. When the indexes for an old fragment are 
 out of date with respect to the current database configuration, the fragment 
 is deleted and its content reinserted as a new fragment. That is happening to 
 all the fragments affected by your latest configuration changes.

 But my hunch is that the OS is overloaded, so I would start by checking the 
 OS status: top and iostat, on Linux or Solaris. The OS may be out of RAM 
 and swapping, or possibly reindexing puts too much demand on the available 
 disk I/O capacity. Either way, I suspect that your database has outgrown 
 the hardware.
 Yes, that is certainly true.  I don't have enough memory.  Funny bit
 is that the application works reasonably well for my research
 purposes.  For production, it would need far more memory to support a
 load.
 Reindexing is essentially a large series of batched updates. Updates are much 
 more expensive than reads, and unlike reads they cannot be cached.

 You can do some monitoring of the reindexer status with 'tail -F' on the 
 error log, while setting the file-log-level to Debug. However if the OS is 
 paging, there won't be much progress to monitor.
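
 As a sketch of the 'tail -F' suggestion: the log path below is the usual
 Linux default and is an assumption, as is the guess that the relevant Debug
 lines contain the word "reindex". reindex_lines and watch_reindex are
 helper names made up for this example.

```shell
# Keep only log lines that mention reindexing (case-insensitive).
reindex_lines() {
  grep --line-buffered -i 'reindex'
}

# Follow the error log and show reindexer activity as it is written.
# $1 = path to the error log (assumed default shown below).
watch_reindex() {
  log="${1:-/var/opt/MarkLogic/Logs/ErrorLog.txt}"
  tail -F "$log" | reindex_lines
}
```

 Run watch_reindex with no arguments, or pass the path to ErrorLog.txt if the
 installation keeps its logs elsewhere.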

 If you like, you could handle a situation like this by adding the new index 
 and allow reindexing, then turn off reindexing and delete the obsolete index. 
 The obsolete index won't disappear from the old stands, but new ones won't 
 have it. I'm not sure if that's useful in your situation, but it is an 
 option. It breaks the problem into a small one that you solve, and a larger 
 one that you ignore.

 Or you might reduce the sizes of your in-memory limits and host-level caches 
 to a bare minimum, to reduce demands on the OS. Reindexing performance would 
 suffer, but you might be able to get your work done. This isn't easy to get 
 right, though. Usually the best answer is to upgrade the hardware.

 -- 

Re: [MarkLogic Dev General] Database Status Troubles

2012-06-27 Thread Damon Feldman
Mike and Alex,

MarkLogic can do both at the same time, but an underlying system issue may be 
triggered by the intensified activity. I agree with a previous message 
suggesting the OS is overloaded, swapping or otherwise in serious trouble. The 
system should re-index without affecting queries very much, particularly if the 
reindex throttle is turned down. 

Can you check the OS and/or VM status for CPU, swap and I/O activity?

Yours,
Damon


Re: [MarkLogic Dev General] Database Status Troubles

2012-06-27 Thread Mike Sokolov
We will need to trigger a reindex in the next couple of days, so I will 
capture some stats then and post back - thanks.

-Mike


Re: [MarkLogic Dev General] Database Status Troubles

2012-06-27 Thread Michael Blakeley
You can collect good data manually with something like 'iostat -mxz 15' (remove 
the 'z' or the 'm' if not supported, but the 'x' is important).

Or you can configure your sysstat cron job to collect disk statistics 
automatically, for sar. In /etc/cron.d/sysstat you want something like this:

 */10 * * * * root /usr/lib64/sa/sa1 -S DISK 60 10
 53 23 * * * root /usr/lib64/sa/sa2 -A

The -S DISK is important, and isn't there by default.

Also make sure sysstat is installed, of course. It includes iostat too.
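
To turn the iostat output into something easier to compare between runs, a
small filter can list only the devices whose %util is above a threshold. This
is a sketch, assuming the extended (-x) report where %util is one of the
named columns; busy_devices is a helper name made up for this example, and
the threshold of 80 is arbitrary.

```shell
# List devices whose %util exceeds a threshold (default 80).
# Reads 'iostat -x' output on stdin; the %util column is located from
# the "Device" header row, since its position varies between versions.
busy_devices() {
  threshold="${1:-80}"
  awk -v limit="$threshold" '
    /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
    col && NF >= col && $col + 0 > limit { print $1, $col }
  '
}
```

Run as: iostat -mxz 15 | busy_devices 80. A device that stays near 100 %util
for the whole reindex is a hint, though not proof, that disk I/O is the
bottleneck.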

-- Mike

On 27 Jun 2012, at 07:11 , Mike Sokolov wrote:

 We will need to trigger a reindex in the next couple of days, so I will 
 capture some stats then and post back - thanks.
 
 -Mike
 
 On 06/27/2012 09:05 AM, Damon Feldman wrote:
 Mike and Alex,
 
 MarkLogic can do both at the same time, but an underlying system issue may 
 be triggered by the intensified activity. I agree with a previous message 
 suggesting the OS is overloaded, swapping or otherwise in serious trouble. 
 The system should re-index without affecting queries very much, particularly 
 if the reindex throttle is turned down.
 
 Can you check the OS and/or VM status for CPU, swap and I/O activity?
 
 Yours,
 Damon
 
 -Original Message-
 From: general-boun...@developer.marklogic.com 
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Mike Sokolov
 Sent: Wednesday, June 27, 2012 7:52 AM
 To: MarkLogic Developer Discussion
 Cc: Danny Sokolsky
 Subject: Re: [MarkLogic Dev General] Database Status Troubles
 
 I would agree that you are bumping up against hardware limits.
 
 I get this regularly on a largish database (  10M docs, ~ 680 GB on one
 server).  Whenever we re-index a significant amount of content the
 server becomes unusable for days - all it can handle is the reindexing
 and merging.  Even if we throttle reindexing down to 1 there will be
 periods of downtime (and of course the whole process takes longer).  The
 admin console responds fine except for the status screens, which
 (mostly) time out, as Alex reports.  If you keep trying, you can usually
 eventually get a response, although this can take 1/2 hour of retrying
 and waiting, which is painful.
 
 The strategy we are pursuing is to maintain two servers; we reindex them
 in sequence, switching production from one to the other and back again.
 Another thing we are trying is increasing RAM (because we can, and the
 system doesn't *seem* to be CPU bound), but I suspect the real issue is
 disk I/O.
 
 I'd like to add more disk, or faster ones (or ideally more servers, but
 that is challenging for a variety of reasons).  I'm also not sure how to
 prove that more I/O bandwidth would help dramatically.  I've looked at
 vmstat: we're not swapping, but I'm also not sure how to read the stats
 clearly enough to understand in detail what's going on with the disks.
 
 Also wondering if you have recommendations as to how to configure
 disks.  Do you use RAID?  What kind?  A centralized SAN or other storage
 device?  I plan to ask support for help, too, but any general
 suggestions from folks on the list here would be appreciated.
 
 -Mike
 
 On 6/26/2012 6:16 PM, Danny Sokolsky wrote:
 
 Another thing you might try is turning the reindexer throttle down.  Go to 
 the database config page and try setting reindexer throttle to 1 (I think 
 the default is 5).  Then give your system a while to catch up, and see if 
 you can then get to the forest status page.
 
 
 -Danny
 
 -Original Message-
 From: general-boun...@developer.marklogic.com 
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael 
 Blakeley
 Sent: Tuesday, June 26, 2012 2:37 PM
 To: MarkLogic Developer Discussion
 Subject: Re: [MarkLogic Dev General] Database Status Troubles
 
 On 26 Jun 2012, at 14:12 , Alex Milowski wrote:
 
 
 On Tue, Jun 26, 2012 at 2:01 PM, Michael Blakeley m...@blakeley.com wrote:
 
 On 26 Jun 2012, at 13:52 , Alex Milowski wrote:
 
 
 The forest status page doesn't return.  The browser just hangs out
 until something times out.
 
 That is decidedly abnormal. I have seen a few systems where 
 database-status was slow and would time out during reindexing, but 
 forest-status should not. Admittedly I don't do much with geospatial, so 
 this might be a geospatial-specific bug.
 
 Here's what I don't get.  I dropped a large index.  Shouldn't that be
 an easy operation?
 
 Or is it the index I added that is causing the problem?  It should
 only hit a very small number of documents/elements.  Is it still
 required it to touch everything?
 
 MarkLogic never updates in place (except for timestamps). So it doesn't 
 drop existing indexes: it only reindexes. When the indexes for an old 
 fragment are out of date with respect to the current database 
 configuration, the fragment is deleted and its content reinserted as a new 
 fragment. That is happening to all the fragments affected by your latest 
 configuration changes.
 
 
 But my hunch is that the OS is 

[MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread David Lee
Has anyone seen an XQuery or XSLT parser for MediaWiki markup (the wiki markup used by Wikipedia)?

I found this list

http://www.mediawiki.org/wiki/Alternative_parsers


What I'm looking for is a way to take the XML dump of Wikipedia and enrich it 
into something more useful.  Right now the entire body of each article is in MediaWiki 
markup and is largely opaque to ML except as one long string.


-
David Lee
Lead Engineer
MarkLogic Corporation
d...@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com

This e-mail and any accompanying attachments are confidential. The information 
is intended solely for the use of the individual to whom it is addressed. Any 
review, disclosure, copying, distribution, or use of this e-mail communication 
by others is strictly prohibited. If you are not the intended recipient, please 
notify us immediately by returning this message to the sender and delete all 
copies. Thank you for your cooperation.

___
General mailing list
General@developer.marklogic.com
http://community.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] General Digest, Vol 96, Issue 70

2012-06-27 Thread Henrik Nielsen
Geert and Michael,

Thank you very much for your help. 

I added 'xmlns="http://www.irhb.org"' to the schema, and now the docs
validate and load beautifully. 
In the meantime I also discovered the validator at
http://xsdvalidation.utilities-online.info/ which gives excellent feedback.

Henrik



[MarkLogic Dev General] cts:search / geospatial queries - combining criteria

2012-06-27 Thread Alex Milowski
I've been restructuring my indices, for better or worse, for my weather
database, and I've got a basic question about making optimal use of a
geospatial index.  In summary, I have reports for each weather station
stored in their own collection by id, and each report has a location
(latitude/longitude attribute pair).  Previously, I had an index on
that pair, but I've dropped it in favor of a summary document for
each station that tracks its current position.

The rationale is that there are roughly 10,000 stations and there are
tens of millions of reports (eventually to be hundreds of millions).  So
an index on a smaller set of documents should be faster/better/etc.,
right?

One difficulty is that I need to produce a report based on quadrangles
(think rectangles mapped on the surface of the earth).  For each
quadrangle, I want to count the number of reports contained in the
database for a certain period of time (e.g. the last 30 minutes).  The
core of this query used to be:

xdmp:estimate(
   cts:search(
      collection("http://.../weather/")/s:report,
      cts:and-query((
         cts:element-attribute-range-query(xs:QName("s:report"),
            QName("","received"), ">=", $dtstart),
         cts:element-attribute-range-query(xs:QName("s:report"),
            QName("","received"), "<=", $dtend),
         cts:element-attribute-pair-geospatial-query(xs:QName("s:report"),
            QName("","latitude"), QName("","longitude"), $quad) ))))

but now, with the index change, it becomes:

sum(
   for $s in
      cts:search(
         collection("http://.../stations/"),
         cts:element-attribute-pair-geospatial-query(xs:QName("s:station"),
            QName("","latitude"), QName("","longitude"), $quad) )
   return
      xdmp:estimate(
         collection(concat("http://.../weather/", $s/s:station/@id))
            /s:report[@received >= $dtstart and @received <= $dtend]))

which actually performs worse--probably due to the for loop and sum bit.

Is there some way to combine this into one cts:search() statement
where I get the relevant set of ids from the stations collections via
the geospatial index and then the reports by id, received time, or
collection?

Keep in mind that every report belongs to its station's collection of
weather reports as well as a database-wide collection of weather
reports.

-- 
--Alex Milowski
The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered.

Bertrand Russell in a footnote of Principles of Mathematics


Re: [MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread Michael Blakeley
Not in XQuery: it would be much too ugly for my taste. I've used 
http://code.google.com/p/gwtwiki/ and contributed a couple of patches. Hsiao 
could show you some sample code with xhtml-like output.

If you need to use it from XQuery, I suppose you could wrap it in a web service.

-- Mike



Re: [MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread David Lee
Thanks,  I tried the online tool on a sample I have and it strips out much of 
the meaningful stuff :( 

---  Input
{{Infobox settlement
&lt;!--See the Table at Infobox Settlement for all fields and descriptions of 
usage--&gt;
&lt;!-- Basic info  &gt;
|name  = Teichibe 
|other_name =
|native_name=  &lt;!-- for cities whose native name is not in 
English --&gt;
|nickname   = 
|settlement_type=Village
|motto  =
&lt;!-- images and maps  ---&gt;
|image_skyline  = 
|imagesize  = 
|image_caption  = 
|image_flag = 
|flag_size  =
|image_seal = 
|seal_size  =
|image_shield   = 
|shield_size=
|image_map  = 
|mapsize= 
|map_caption= 
|pushpin_map=Mali&lt;!-- the name of a location map as per 
http://en.wikipedia.org/wiki/Template:Location_map --&gt;
|pushpin_label_position =bottom
|pushpin_mapsize=300
|pushpin_map_caption=Location in Mali
&lt;!-- Location --&gt;
|coordinates_display= inline,title
|coordinates_region = ML
|subdivision_type   = Country
|subdivision_name   = {{flag|Mali}}
|subdivision_type1  = [[Regions of Mali|Region]]
|subdivision_name1  = [[Kayes Region]]
|subdivision_type2  =[[Cercles of Mali|Cercle]]
|subdivision_name2  = [[Kayes Cercle]]
|subdivision_type3  =[[Communes of Mali|Commune]]
|subdivision_name3  = [[Karakoro]]
|&lt;!-- Politics -&gt;
|government_footnotes   =
|government_type=
|leader_title   =
|leader_name=
|leader_title1  =  &lt;!-- for places with, say, both a mayor and a 
city manager --&gt;
|leader_name1   =
|established_title  =  &lt;!-- Settled --&gt;
|established_date   = 
&lt;!-- Area-&gt;
|area_magnitude = 
|unit_pref=Imperial &lt;!--Enter: Imperial, if Imperial 
(metric) is desired--&gt;
|area_footnotes   =
|area_total_km2   =  &lt;!-- ALL fields dealing with a measurements are 
subject to automatic unit conversion--&gt;
|area_land_km2= &lt;!--See table @ Template:Infobox Settlement for 
details on automatic unit conversion--&gt;
&lt;!-- Population   ---&gt;
|population_as_of   =
|population_footnotes   =
|population_note=
|population_total   =
|population_density_km2 =
|population_density_sq_mi   =
|population_metro   =
|population_density_metro_km2   =
|population_density_metro_sq_mi =
|population_blank1_title=Ethnicities
|population_blank1  =
|population_density_blank1_km2 =   
|population_density_blank1_sq_mi =
&lt;!-- General information  ---&gt;
|timezone   =[[GMT]] 
|utc_offset = +0
|timezone_DST   = 
|utc_offset_DST = 
|latd=15|latm=16|lats=30 |latNS=N
|longd=11|longm=42|longs=25|longEW=W
|elevation_footnotes=  &lt;!--for references: use &lt;ref&gt; &lt;/ref&gt; 
tags--&gt;
|elevation_m= 
|elevation_ft   =
&lt;!-- Area/postal codes &amp; others &gt;
|postal_code_type   =  &lt;!-- enter ZIP code, Postcode, Post code, Postal 
code... --&gt;
|postal_code=
|area_code  =
|blank_name =
|blank_info =
|blank1_name=
|blank1_info=
|website= 
|footnotes  = 
}} 

'''Teichibe''' is a village and principal settlement (''[[chef-lieu]]'') of the 
[[Karakoro|commune of Karakoro]] in the [[Kayes Cercle|Cercle of Kayes]] in the 
[[Kayes Region]] of south-western [[Mali]].&lt;ref&gt;{{citation | 
title=Communes de la Région de Kayes | publisher= Ministère de l'administration 
territoriale et des collectivités locales, République du Mali | 
url=http://www.matcl.gov.ml/pdf/ComRegKayes.pdf | language=French 
}}.&lt;/ref&gt;


==References==
{{reflist}}

[[Category:Populated places in the Kayes Region]]


{{Kayes-geo-stub}}

--  Output

<p>{{Infobox settlement
&#60;!--See the Table at Infobox Settlement for all fields and descriptions of 
usage--&#62;
&#60;!-- Basic info  &#62;}} </p>
<p><b>Teichibe</b> is a village and principal settlement (<i><a 
href="Chef-lieu" title="chef-lieu">chef-lieu</a></i>) of the <a href="Karakoro" 
title="Karakoro">commune of Karakoro</a> in the <a href="Kayes_Cercle" 
title="Kayes Cercle">Cercle of Kayes</a> in the <a href="Kayes_Region" 
title="Kayes Region">Kayes Region</a> of south-western <a href="Mali" 
title="Mali">Mali</a>.&#60;ref&#62;{{citation}}.&#60;/ref&#62;</p>

<h2><span class="mw-headline" id="References">References</span></h2>
<p>{{reflist}}</p>

<p>{{Kayes-geo-stub}}</p>


-
David Lee
Lead Engineer
MarkLogic Corporation
d...@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com


Re: [MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread Michael Blakeley
Yes, it does. The API gives you access to much of that wiki-structured markup, 
but you have to decide what to do with it. Naturally the online tool and much 
of the sample code doesn't do anything interesting.
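
To make "deciding what to do with it" concrete, here is a minimal Python sketch that pulls the non-empty `|key = value` fields out of an infobox in XML-escaped wikitext. It does not use gwtwiki and is not a real MediaWiki parser: the function name parse_infobox, the regular expressions, and the shortened sample are all assumptions made for illustration. It only handles one field per line, so a line such as `|latd=15|latm=16|lats=30 |latNS=N` would come back as a single field.

```python
import re

# Matches XML-escaped wiki comments such as "&lt;!-- Location --&gt;".
ESCAPED_COMMENT = re.compile(r"&lt;!--.*?--&gt;", re.DOTALL)


def parse_infobox(wikitext):
    """Return a dict of non-empty infobox fields from escaped wikitext.

    Naive by design: it looks only at the first "{{Infobox ...}}" block
    whose closing "}}" starts a line, and treats each line that begins
    with "|" as one field.
    """
    text = ESCAPED_COMMENT.sub("", wikitext)
    match = re.search(r"\{\{Infobox[^\n]*\n(.*?)\n\}\}", text, re.DOTALL)
    if match is None:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue
        key, _, value = line[1:].partition("=")
        if value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical, shortened version of the sample from this thread.
sample = """{{Infobox settlement
&lt;!-- Basic info --&gt;
|name  = Teichibe
|other_name =
|settlement_type=Village
|subdivision_name1  = [[Kayes Region]]
}}
'''Teichibe''' is a village."""

print(parse_infobox(sample))
```

The resulting dictionary could then be serialized as XML elements alongside the article text, which is the kind of enrichment asked about at the start of the thread.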

-- Mike


Re: [MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread David Lee
Ah, the API does!!! Woo hoo.
Maybe I can get XML out of this after all ... I smell an xmlsh extension in the 
making :)

I actually have a similar problem with the xmlsh docs ... they are all currently in 
WakiWiki ... but that's a black box ... I want to turn them into XML like 
DocBook ...
Someone (Dave Pawson, I think?) wrote me a Python lib to do that, but only 90% 
... the last 10%, as usual, is 99% 

-
David Lee
Lead Engineer
MarkLogic Corporation
d...@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com




Re: [MarkLogic Dev General] cts:search / geospatial queries - combining criteria

2012-06-27 Thread Alex Milowski
On Wed, Jun 27, 2012 at 10:21 AM, Michael Blakeley m...@blakeley.com wrote:
 I'm not sure that was a good change. Sometimes more documents are better - 
 and MarkLogic can handle billions, albeit with the right hardware.

 Anyway, look into fetching the station ids using cts:element-attribute-values 
 instead of cts:search. You might also be able to move the received date-range 
 constraints into a cts:query term on that call, which would allow you to use 
 cts:frequency instead of xdmp:estimate.


I'm not sure it was a great move but having to index millions of
weather reports just to get one report working better doesn't feel
like a great solution either.

Just as another example, this one performs reasonably well for one quadrangle:

for $s in cts:search(
   collection("http://.../stations/"),
   cts:element-attribute-pair-geospatial-query(xs:QName("s:station"),
      QName("","latitude"), QName("","longitude"), $quad) )
order by $s/s:station/@id
return
   if (not(collection(concat("http://.../weather/", $s/s:station/@id))
              /s:report[@received > $dtstart and @received < $dtend])[1])
   then ()
   else
      let $id := string($s/s:station/@id)
      ...

even with the order by and testing for no reports.

I've also looked at the query plans for some of these.  There is an
index on @received, so the expression [@received > $dtstart and
@received < $dtend] hits the index quite nicely.  It only becomes
faster if you can combine it within one query (via an and) as I had
originally.

There's a twist here: the report I'm having trouble with is for
all quadrangles, whereas the above example is for only one quadrangle.
That is, the query in my original question runs many times (10368
times for 2.5° quadrangles).  In relational terms, it is a join that
I'm performing multiple times because I'm grouping by quadrangle.
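
The figure of 10368 is consistent with tiling the whole globe in 2.5° cells: 180/2.5 = 72 rows of latitude times 360/2.5 = 144 columns of longitude. A minimal Python sketch of that arithmetic follows; the helper names quadrangle_count and quadrangles are hypothetical, and the (south, west, north, east) ordering is an assumption chosen for illustration, not taken from the actual application.

```python
def quadrangle_count(size_deg):
    """Number of size_deg x size_deg cells needed to tile the globe."""
    rows = round(180 / size_deg)   # latitude spans -90..90
    cols = round(360 / size_deg)   # longitude spans -180..180
    return rows * cols


def quadrangles(size_deg):
    """Yield (south, west, north, east) bounds for every cell."""
    rows = round(180 / size_deg)
    cols = round(360 / size_deg)
    for r in range(rows):
        for c in range(cols):
            south = -90 + r * size_deg
            west = -180 + c * size_deg
            yield (south, west, south + size_deg, west + size_deg)


print(quadrangle_count(2.5))        # 72 rows * 144 columns
print(next(quadrangles(2.5)))       # bounds of the first cell
```

Each tuple corresponds to one value of $quad, so a per-quadrangle estimate over the full grid means one query per cell.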

-- 
--Alex Milowski
The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered.

Bertrand Russell in a footnote of Principles of Mathematics


Re: [MarkLogic Dev General] Marklogic Webdav not connecting

2012-06-27 Thread Joseph Bryan
What OS are you running?

I've had trouble on Windows 7 browsing the root of a WebDAV app
server. When I instead accessed a specific sub-directory via WebDAV, I
was able to view the contents in Windows Explorer.

Thanks.

-jb

On Wed, Jun 27, 2012 at 1:16 PM, DJaun Maclin dmac...@wattnet.net wrote:
 Thanks for the suggestions, Danny!

 My Webdav was set to root and the root properties existed after testing in CQ
 Tried different users of varying permissions, but it didn't get me anywhere.
 Thanks though.




Re: [MarkLogic Dev General] Database Status Troubles

2012-06-27 Thread Mike Sokolov
Thanks that's a big help, Michael

On 06/27/2012 12:31 PM, Michael Blakeley wrote:
 You can collect good data manually with something like 'iostat -mxz 15' 
 (remove the 'z' or the 'm' if not supported, but the 'x' is important).

 Or you can configure your sysstat cron job to collect disk statistics 
 automatically, for sar. In /etc/cron.d/sysstat you want something like this:


 */10 * * * * root /usr/lib64/sa/sa1 -S DISK 60 10

 53 23 * * * root /usr/lib64/sa/sa2 -A
  
 The -S DISK is important, and isn't there by default.

 Also make sure sysstat is installed, of course. It includes iostat too.

 -- Mike

 On 27 Jun 2012, at 07:11 , Mike Sokolov wrote:


 We will need to trigger a reindex in the next couple of days, so I will
 capture some stats then and post back - thanks.

 -Mike

 On 06/27/2012 09:05 AM, Damon Feldman wrote:
  
 Mike and Alex,

 MarkLogic can do both at the same time, but an underlying system issue may 
 be triggered by the intensified activity. I agree with a previous message 
 suggesting the OS is overloaded, swapping or otherwise in serious trouble. 
 The system should re-index without affecting queries very much, 
 particularly if the reindex throttle is turned down.

 Can you check the OS and/or VM status for CPU, swap and I/O activity?

 Yours,
 Damon


Re: [MarkLogic Dev General] Marklogic Webdav not connecting

2012-06-27 Thread deekshant belwal
Try connecting with BitKinex if you are using Windows 7. You can also try
changing the WebDAV server's authentication configuration to basic, if it is
not set that way already.



Re: [MarkLogic Dev General] Marklogic Webdav not connecting

2012-06-27 Thread DJaun Maclin
Thanks, JB.
I will give that a try!
My OS is Windows Server 2003 R2 64-bit.





[MarkLogic Dev General] Thesaurus question

2012-06-27 Thread Shannon
Hi,

Will MarkLogic have search issues if I leave the part-of-speech element out of
my thsr doc? It's not required by the schema, but it appears in all of your
examples. I will be replicating many entries for which that information is
unknown. Is it added value rather than a requirement?

Thanks,

Shannon (shifl...@virginia.edu)