David,
Directories with millions of documents aren't necessarily a problem: I
create them frequently. Last week I built a 20M-document database, and
the largest directory contained 9.2M documents.
I see the 32-bit kernel as more of a problem. A 32-bit kernel is limited
to a 32-bit address space, and the server process only gets 3 GB of that
address space, no matter how much RAM or swap you have. So why not
install a 64-bit Linux? Your CPU is probably 64-bit capable, unless it
pre-dates AMD Opteron or Intel's EM64T technology.
Also, Jason reminded me that you've done some past tuning of your
database in-memory limits, to accommodate those giant fragmented
documents. Now that you're loading smaller documents, you should reset
those to the default values. There's a button for this, toward the
bottom of the database config screen: it's labeled "get default values".
Returning to the default values might help you avoid the XDMP-MEMORY error.
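If you'd like to double-check those settings from Query Console rather than the
Admin UI, something along these lines should work. It's a read-only sketch (I'm
assuming the usual Admin API getters here), and 'your-db' is just a placeholder
for your database name:
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";
(: report the current in-memory sizes for one database :)
let $config := admin:get-configuration()
let $db := xdmp:database('your-db')
return (
  admin:database-get-in-memory-list-size($config, $db),
  admin:database-get-in-memory-tree-size($config, $db),
  admin:database-get-in-memory-range-index-size($config, $db)
)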
Getting back to the query in my last message, it is probably slow
because it has to read-lock all the documents in the directory, even
when the query is only deleting 1000 of them. You can get around this
with some xdmp:eval() trickery (caution - sharp tools!). This version
uses an outer read-only query to gather the URIs, and an inner update to
delete them. So instead of needing millions of read locks and 1000 write
locks, it only needs 1000 read locks and 1000 write locks.
This is essentially a way to relax the query's ACID guarantees. Normally
we guarantee that the documents that are present at the start of a
transaction, and aren't affected by the transaction, will still be
available at the end of the transaction. Hence the need to read-lock all
of them. But by telling the update to run with different-transaction
isolation, we can relax this requirement and allow the xdmp:directory()
portion to run in lockless (timestamped) mode. The assert at the top of
the query ensures that the xdmp:directory() part really does run in
timestamped mode.
let $assert :=
  (: fail fast unless this outer query runs lock-free, at a timestamp :)
  if (xdmp:request-timestamp()) then ()
  else error((), 'NOTIMESTAMP', text { 'outer query is not read-only' })
let $path := '/'
let $map := map:map()
let $list-uris :=
  (: gather up to 1000 URIs from the directory, without taking read locks :)
  for $i in xdmp:directory($path, 'infinity')[1 to 1000]
  return map:put($map, xdmp:node-uri($i), true())
let $do := xdmp:eval('
  declare variable $URIS as map:map external;
  (: the delete runs as its own update transaction,
     so it only locks the documents it touches :)
  for $uri in map:keys($URIS)
  return xdmp:document-delete($uri)
  ',
  (xs:QName('URIS'), $map),
  <options xmlns="xdmp:eval">
    <isolation>different-transaction</isolation>
    <prevent-deadlocks>true</prevent-deadlocks>
  </options>
)
return count(map:keys($map))
, xdmp:elapsed-time()
You could keep running that until it returns 0, and you could tinker
with the '1 to 1000' range if you like.
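If you'd rather not babysit it, you could also turn the query above into a task
that respawns itself until the directory is empty. Roughly, and these names are
just placeholders of mine: save the query as, say, /delete-batch.xqy under the
App Server root, and change its final return to something like
let $deleted := count(map:keys($map))
return
  if ($deleted eq 0) then 0
  else ($deleted, xdmp:spawn('/delete-batch.xqy'))
then kick it off once with xdmp:spawn('/delete-batch.xqy'). Each run deletes
another batch on the task server and stops once the directory is empty.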
-- Mike
On 2009-12-09 09:46, Lee, David wrote:
Thanks for the suggestion
I am running 4.1-3, and I have plenty of swap space.
I tried the bulk deletes but they were taking about 1 minute per 1000 documents
to delete ...
I gave up after a few hours.
I've created a new DB and am reloading now; I'm about 2/3 of the way through,
and then I'll delete the old forest.
I've come to the conclusion that, at least on my system, which is admittedly not
that powerful (32-bit Linux, 4 GB RAM, 2.8 GHz), ML doesn't handle directories
with > 1 million entries very well.
I try to add more than that and run into all sorts of memory problems.
I try to *delete* that directory and can't.
It also doesn't handle individual files with > 1 million fragments that well, but
at least it handles them.
For my experimental case, I'm now trying a hybrid approach: bulking up 1000
"rows" per file and keeping the number of files per directory in the thousands,
not the millions ...
-----Original Message-----
From: Michael Blakeley [mailto:michael.blake...@marklogic.com]
Sent: Wednesday, December 09, 2009 12:33 PM
To: General Mark Logic Developer Discussion
Cc: Lee, David
Subject: Re: [MarkLogic Dev General] Cannot delete directory with 1mil docs -
XDMP-MEMORY
The XDMP-MEMORY message does mean that the host couldn't allocate the
needed memory. In this case that was probably because the transaction
was too large to fit in memory. If you aren't already using 4.1-3, I'd
upgrade - just in case this is a known problem that has already been fixed.
If 4.1-3 doesn't help, then I suppose you could increase the swap
space... but I don't think you'd like the performance. You might be able
to reduce the sizes of the group-level caches, but that might lead to
*CACHEFULL errors.
So as Geert suggested, clearing the forest is probably the fastest
solution. Or if you don't mind spending more time on it, you could
delete in blocks of 1000 documents.
let $path := '/RxNorm/rxnsat/'  (: the directory from your error message :)
for $i in xdmp:directory($path, 'infinity')[1 to 1000]
return xdmp:document-delete(xdmp:node-uri($i))
You could automate this using xdmp:spawn(). You could also use
cts:uris() with a cts:directory-query(), if you have the uri lexicon
available.
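For example, with the uri lexicon enabled, one batch could look like this
(a sketch only; I'm borrowing the directory path from your error message below):
for $uri in cts:uris((), (),
  cts:directory-query('/RxNorm/rxnsat/', 'infinity'))[1 to 1000]
return xdmp:document-delete($uri)
Because cts:uris() reads from the uri lexicon, it avoids fetching the documents
just to learn their URIs.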
-- Mike
On 2009-12-09 05:59, Lee, David wrote:
My joys of success were premature.
I ran into memory problems trying to load the full set of documents; it died
after about 1 million.
So I tried to delete the directory and now I’m getting
Exception running: :query
com.marklogic.xcc.exceptions.XQueryException: XDMP-MEMORY: xdmp:directory-delete
("/RxNorm/rxnsat/") -- Memory exhausted
in /eval, on line 1
Arg !!!!
I’ve tried to change various memory settings, to no avail. Any clue how to
delete this directory? Or should I start deleting the files piecemeal?
Suggestions welcome.
-David
----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
d...@epocrates.com
812-482-5224