David,

Directories with millions of documents aren't necessarily a problem: I create them frequently. Last week I built a 20M-document database, and the largest directory contained 9.2M documents.

I see the 32-bit kernel as more of a problem. A 32-bit kernel is limited to a 32-bit address space, and the server process only gets 3 GB of that address space, no matter how much RAM or swap you have. So why not install a 64-bit Linux? Your CPU is probably 64-bit capable, unless it pre-dates AMD Opteron or Intel's EM64T technology.

Also, Jason reminded me that you've done some past tuning of your database's in-memory limits to accommodate those giant fragmented documents. Now that you're loading smaller documents, you should reset those to the default values. There's a button for this toward the bottom of the database config screen, labeled "get default values". Returning to the defaults might help you avoid the XDMP-MEMORY error.
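
If you want to see what those limits are currently set to before you reset them, a quick read-only Admin API query can show you. This is only a sketch: I'm writing the getter names from memory, so check them against the Admin API docs, and substitute your real database name for the placeholder.

    xquery version "1.0-ml";
    import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

    (: read-only inspection of two of the in-memory limits for one database.
       'your-database-name' is a placeholder -- use the real database name. :)
    let $config := admin:get-configuration()
    let $db := xdmp:database('your-database-name')
    return (
      admin:database-get-in-memory-list-size($config, $db),
      admin:database-get-in-memory-tree-size($config, $db)
    )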

Getting back to the query in my last message: it is probably slow because it has to read-lock all the documents in the directory, even though it is only deleting 1000 of them. You can get around this with some xdmp:eval() trickery (caution: sharp tools!). This version uses an outer read-only query to gather the URIs and an inner update to delete them. So instead of needing millions of read locks and 1000 write locks, it only needs 1000 read locks and 1000 write locks.

This is essentially a way to relax the query's ACID guarantees. Normally we guarantee that documents present at the start of a transaction, and not affected by it, will still be available at the end of the transaction; hence the need to read-lock all of them. But by telling the update to run in a different transaction (the different-transaction isolation option), we relax that requirement and allow the xdmp:directory() portion to run in lockless (timestamped) mode. The assert at the top of the query ensures that the xdmp:directory() part really does run in timestamped mode.

(: fail fast unless this outer query really is read-only (timestamped) :)
let $assert :=
  if (xdmp:request-timestamp()) then ()
  else error((), 'NOTIMESTAMP', text { 'outer query is not read-only' })
(: the directory to empty -- adjust as needed :)
let $path := '/'
let $map := map:map()
(: gather up to 1000 URIs; lockless, because the outer query is timestamped :)
let $list-uris :=
  for $i in xdmp:directory($path, 'infinity')[1 to 1000]
  return map:put($map, xdmp:node-uri($i), true())
(: delete that batch in a separate update transaction :)
let $do := xdmp:eval('
  declare variable $URIS as map:map external;
  xdmp:document-delete(map:keys($URIS))
',
  (xs:QName('URIS'), $map),
  <options xmlns="xdmp:eval">
    <isolation>different-transaction</isolation>
    <prevent-deadlocks>true</prevent-deadlocks>
  </options>
)
(: report how many URIs were gathered for deletion, and the elapsed time :)
return count(map:keys($map))
, xdmp:elapsed-time()

You could keep running that until it returns 0, and you could tinker with the '1 to 1000' range if you like.
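
If you'd rather not re-submit that by hand, you could automate it with a task that re-spawns itself until the directory is empty. Here's a rough sketch, not tested: the module name delete-batch.xqy is just a placeholder, and it assumes the URI lexicon is enabled so cts:uris() can list documents from the lexicon without read-locking them.

    xquery version "1.0-ml";
    (: delete-batch.xqy -- placeholder name; save it under the App Server's modules root :)
    declare variable $DIR as xs:string external;

    let $uris := cts:uris((), 'limit=1000',
      cts:directory-query($DIR, 'infinity'))
    return
      if (empty($uris)) then ()
      else (
        (: delete this batch of up to 1000 documents... :)
        for $uri in $uris
        return xdmp:document-delete($uri),
        (: ...then queue another pass on the task server :)
        xdmp:spawn('delete-batch.xqy', (xs:QName('DIR'), $DIR))
      )

You'd kick it off with something like xdmp:spawn('delete-batch.xqy', (xs:QName('DIR'), '/RxNorm/rxnsat/')), using the directory from your error message.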

-- Mike

On 2009-12-09 09:46, Lee, David wrote:
Thanks for the suggestion.
I am running 4.1-3, and I have plenty of swap space.

I tried the bulk deletes, but they were taking about 1 minute per 1000 documents
to delete ...
I gave up after a few hours.

I've created a new DB and am starting the process of reloading now; about 2/3
through, then I'll delete the old forest.

I've come to the conclusion that, at least on my system, which is admittedly not
that powerful (32-bit Linux, 4 GB RAM, 2.8 GHz), ML doesn't handle directories
with > 1 million entries very well.
I try to add more than that and run into all sorts of memory problems.
I try to *delete* that directory and can't.

It also doesn't handle individual files with > 1 million fragments that well, but
at least it handles them.
For my experimental case, I'm now trying a hybrid approach: bulking up 1000
"rows" per file and keeping the number of files per directory in the thousands,
not the millions ...



-----Original Message-----
From: Michael Blakeley [mailto:michael.blake...@marklogic.com]
Sent: Wednesday, December 09, 2009 12:33 PM
To: General Mark Logic Developer Discussion
Cc: Lee, David
Subject: Re: [MarkLogic Dev General] Cannot delete directory with 1mil docs - 
XDMP-MEMORY

The XDMP-MEMORY message does mean that the host couldn't allocate the
needed memory. In this case that was probably because the transaction
was too large to fit in memory. If you aren't already using 4.1-3, I'd
upgrade, just in case this is a known problem that has already been fixed.

If 4.1-3 doesn't help, then I suppose you could increase the swap
space... but I don't think you'd like the performance. You might be able
to reduce the sizes of the group-level caches, but that might lead to
*CACHEFULL errors.

So as Geert suggested, clearing the forest is probably the fastest
solution (there's a one-liner sketch for that just below the block-delete
example). Or if you don't mind spending more time on it, you could
delete in blocks of 1000 documents:

    for $i in xdmp:directory($path, 'infinity')[1 to 1000]
    return xdmp:document-delete(xdmp:node-uri($i))
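
If you do decide to clear the forest instead, it can be a one-liner. This is
only a sketch with a placeholder forest name, and remember it throws away
everything in that forest, not just the one directory:

    xdmp:forest-clear(xdmp:forest('your-forest-name'))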

You could automate the block deletes using xdmp:spawn(). You could also
use cts:uris() with a cts:directory-query(), if you have the URI lexicon
available.
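
For example, something along these lines (an untested sketch, using the
directory from your error message) would delete a block of 1000 without
touching xdmp:directory() at all:

    for $uri in cts:uris((), 'limit=1000',
        cts:directory-query('/RxNorm/rxnsat/', 'infinity'))
    return xdmp:document-delete($uri)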

-- Mike

On 2009-12-09 05:59, Lee, David wrote:
My joys of success were premature.
I ran into memory problems trying to load the full set of documents; it died
after about 1 million.
So I tried to delete the directory and now I’m getting

Exception running: :query
com.marklogic.xcc.exceptions.XQueryException: XDMP-MEMORY: xdmp:directory-delete
("/RxNorm/rxnsat/") -- Memory exhausted
in /eval, on line 1

Arg !!!!

I’ve tried to change various memory settings to no avail. Any clue how to
delete this directory, or should I start deleting the files piecemeal?

Suggestions welcome.

-David


----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
d...@epocrates.com
812-482-5224

_______________________________________________
General mailing list
General@developer.marklogic.com
http://xqzone.com/mailman/listinfo/general
