David,

Directories with millions of documents aren't necessarily a problem: I create them frequently. Last week I built a 20M-document database, and the largest directory contained 9.2M documents.

I see the 32-bit kernel as more of a problem. A 32-bit kernel is limited to a 32-bit address space, and the server process only gets 3 GB of that address space, no matter how much RAM or swap you have. So why not install a 64-bit Linux? Your CPU is probably 64-bit capable, unless it pre-dates AMD Opteron or Intel's EM64T technology.

Also, Jason reminded me that you've done some past tuning of your database's in-memory limits to accommodate those giant fragmented documents. Now that you're loading smaller documents, you should reset those to the default values. There's a button for this toward the bottom of the database config screen, labeled "get default values". Returning to the defaults might help you avoid the XDMP-MEMORY error.
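
If you want to see what those limits are currently set to before you reset them, a quick read-only Admin API query can show you. This is only a sketch: I'm writing the getter names from memory, so check them against the Admin API docs, and substitute your real database name for the placeholder.

    xquery version "1.0-ml";
    import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

    (: read-only inspection of two of the in-memory limits for one database.
       'your-database-name' is a placeholder -- use the real database name. :)
    let $config := admin:get-configuration()
    let $db := xdmp:database('your-database-name')
    return (
      admin:database-get-in-memory-list-size($config, $db),
      admin:database-get-in-memory-tree-size($config, $db)
    )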

Getting back to the query in my last message: it is probably slow because it has to read-lock all the documents in the directory, even though it is only deleting 1000 of them. You can get around this with some xdmp:eval() trickery (caution: sharp tools!). This version uses an outer read-only query to gather the URIs and an inner update to delete them. So instead of needing millions of read locks and 1000 write locks, it only needs 1000 read locks and 1000 write locks.

This is essentially a way to relax the query's ACID guarantees. Normally we guarantee that documents present at the start of a transaction, and not affected by it, will still be available at the end of the transaction; hence the need to read-lock all of them. But by telling the update to run in a different transaction (the different-transaction isolation option), we relax that requirement and allow the xdmp:directory() portion to run in lockless (timestamped) mode. The assert at the top of the query ensures that the xdmp:directory() part really does run in timestamped mode.

(: fail fast unless this outer query really is read-only (timestamped) :)
let $assert :=
  if (xdmp:request-timestamp()) then ()
  else error((), 'NOTIMESTAMP', text { 'outer query is not read-only' })
(: the directory to empty -- adjust as needed :)
let $path := '/'
let $map := map:map()
(: gather up to 1000 URIs; lockless, because the outer query is timestamped :)
let $list-uris :=
  for $i in xdmp:directory($path, 'infinity')[1 to 1000]
  return map:put($map, xdmp:node-uri($i), true())
(: delete that batch in a separate update transaction :)
let $do := xdmp:eval('
  declare variable $URIS as map:map external;
  xdmp:document-delete(map:keys($URIS))
',
  (xs:QName('URIS'), $map),
  <options xmlns="xdmp:eval">
    <isolation>different-transaction</isolation>
    <prevent-deadlocks>true</prevent-deadlocks>
  </options>
)
(: report how many URIs were gathered for deletion, and the elapsed time :)
return count(map:keys($map))
, xdmp:elapsed-time()

You could keep running that until it returns 0, and you could tinker with the '1 to 1000' range if you like.
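
If you'd rather not re-submit that by hand, you could automate it with a task that re-spawns itself until the directory is empty. Here's a rough sketch, not tested: the module name delete-batch.xqy is just a placeholder, and it assumes the URI lexicon is enabled so cts:uris() can list documents from the lexicon without read-locking them.

    xquery version "1.0-ml";
    (: delete-batch.xqy -- placeholder name; save it under the App Server's modules root :)
    declare variable $DIR as xs:string external;

    let $uris := cts:uris((), 'limit=1000',
      cts:directory-query($DIR, 'infinity'))
    return
      if (empty($uris)) then ()
      else (
        (: delete this batch of up to 1000 documents... :)
        for $uri in $uris
        return xdmp:document-delete($uri),
        (: ...then queue another pass on the task server :)
        xdmp:spawn('delete-batch.xqy', (xs:QName('DIR'), $DIR))
      )

You'd kick it off with something like xdmp:spawn('delete-batch.xqy', (xs:QName('DIR'), '/RxNorm/rxnsat/')), using the directory from your error message.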

-- Mike

On 2009-12-09 09:46, Lee, David wrote:
Thanks for the suggestion.
I am running 4.1-3, and I have plenty of swap space.

I tried the bulk deletes, but they were taking about 1 minute per 1000 documents
to delete ...
I gave up after a few hours.

I've created a new DB and am starting the process of reloading now; about 2/3
through, then I'll delete the old forest.

I've come to the conclusion that, at least on my system, which is admittedly not
that powerful (32-bit Linux, 4 GB RAM, 2.8 GHz), ML doesn't handle directories
with > 1 million entries very well.
I try to add more than that and run into all sorts of memory problems.
I try to *delete* that directory and can't.

It also doesn't handle individual files with > 1 million fragments that well, but
at least it handles them.
For my experimental case, I'm now trying a hybrid approach: bulking up 1000
"rows" per file and keeping the number of files per directory in the thousands,
not the millions ...



-----Original Message-----
From: Michael Blakeley [mailto:michael.blake...@marklogic.com]
Sent: Wednesday, December 09, 2009 12:33 PM
To: General Mark Logic Developer Discussion
Cc: Lee, David
Subject: Re: [MarkLogic Dev General] Cannot delete directory with 1mil docs - 
XDMP-MEMORY

The XDMP-MEMORY message does mean that the host couldn't allocate the
needed memory. In this case that was probably because the transaction
was too large to fit in memory. If you aren't already using 4.1-3, I'd
upgrade, just in case this is a known problem that has already been fixed.

If 4.1-3 doesn't help, then I suppose you could increase the swap
space... but I don't think you'd like the performance. You might be able
to reduce the sizes of the group-level caches, but that might lead to
*CACHEFULL errors.

So as Geert suggested, clearing the forest is probably the fastest
solution (there's a one-liner sketch for that just below the block-delete
example). Or if you don't mind spending more time on it, you could
delete in blocks of 1000 documents:

    for $i in xdmp:directory($path, 'infinity')[1 to 1000]
    return xdmp:document-delete(xdmp:node-uri($i))
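
If you do decide to clear the forest instead, it can be a one-liner. This is
only a sketch with a placeholder forest name, and remember it throws away
everything in that forest, not just the one directory:

    xdmp:forest-clear(xdmp:forest('your-forest-name'))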

You could automate the block deletes using xdmp:spawn(). You could also
use cts:uris() with a cts:directory-query(), if you have the URI lexicon
available.
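
For example, something along these lines (an untested sketch, using the
directory from your error message) would delete a block of 1000 without
touching xdmp:directory() at all:

    for $uri in cts:uris((), 'limit=1000',
        cts:directory-query('/RxNorm/rxnsat/', 'infinity'))
    return xdmp:document-delete($uri)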

-- Mike

On 2009-12-09 05:59, Lee, David wrote:
My joys of success were premature.
I ran into memory problems trying to load the full set of documents; it died
after about 1 million.
So I tried to delete the directory and now I’m getting

Exception running: :query
com.marklogic.xcc.exceptions.XQueryException: XDMP-MEMORY: xdmp:directory-delete
("/RxNorm/rxnsat/") -- Memory exhausted
in /eval, on line 1

Arg !!!!

I’ve tried to change various memory settings to no avail. Any clue how to
delete this directory, or should I start deleting the files piecemeal?

Suggestions welcome.

-David


----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
d...@epocrates.com
812-482-5224

_______________________________________________
General mailing list
General@developer.marklogic.com
http://xqzone.com/mailman/listinfo/general
