Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54

Geert Josten Sun, 21 Feb 2016 03:42:49 -0800

Hi Ravinder,

Thanks for the info. So you have 12 physical cores in total, and an equal 
number of forests. With about 250 mln docs, that should mean you have roughly 
21 mln docs per forest. That should be a nice number for fast faceting, and 
getting value frequencies.
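
To check how the documents are actually spread over the forests, something 
like the following could be run in Query Console. It is only a sketch: it 
assumes it is evaluated against the content database in question, and it 
reports index-based fragment estimates, which equal document counts only if no 
fragmentation rules are configured.

```xquery
xquery version "1.0-ml";

(: Sketch: estimated fragment count per forest of the current database. :)
for $forest in xdmp:database-forests(xdmp:database())
let $count :=
  xdmp:estimate(
    (: Match-all search, restricted to a single forest via the 5th argument. :)
    cts:search(fn:doc(), cts:and-query(()), (), (), $forest))
order by $count descending
return
  <forest name="{xdmp:forest-name($forest)}" docs="{$count}"/>
```

A badly skewed distribution would show up as one or two forests holding most 
of the documents.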


I am rather surprised about the 30 seconds though, and especially because the 
above sounds right. I ran a little comparison on an average demo server over 
here, with a single forest containing 16 mln docs. I restarted the server to 
make sure the caches are cold, and then ran the same code as you, only for a 
slightly different element index. It returned in 0.06 sec, which is kind of the 
order of magnitude I’d typically expect from MarkLogic. Using a cluster 
shouldn’t add much more, regardless of the number of nodes or forests. Are the 
numbers consistent if you rerun your test?

You should always be able to get sub-second results for this. And because that 
is clearly not happening, something else must be causing issues here. High 
network latency for instance, or maybe your indexes are taking more memory than 
MarkLogic is getting, meaning it could be swapping or such. How much free 
memory is available on the three nodes, and how fast is the network connection 
between them? Also, is anything else competing for cpu, memory, or network 
bandwidth perhaps?
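
To see whether the numbers are consistent between runs, a sketch along these 
lines could be rerun a few times in Query Console. ELEMENT_NAME is the 
placeholder from the original query, and the summary element is made up for 
illustration. Because MarkLogic evaluates lazily, the reported time is a rough 
indication rather than a precise measurement.

```xquery
xquery version "1.0-ml";

(: Sketch: run the top-10 frequency lookup, then report timing and meters. :)
let $results :=
  for $word in cts:element-values(
      xs:QName("ELEMENT_NAME"), (), ("frequency-order", "limit=10"))
  return <word count="{cts:frequency($word)}">{$word}</word>
(: fn:count is used so the lookup is consumed before the elapsed time is
   read; lazy evaluation may still blur the figure. :)
let $n := fn:count($results)
return (
  $results,
  <summary results="{$n}" elapsed="{xdmp:elapsed-time()}"/>,
  xdmp:query-meters()
)
```

If the elapsed time drops sharply on the second and third run, cold caches are 
the likely cause; if it stays around 30 seconds, memory or network are the 
more probable suspects.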

Cheers,
Geert

From: [email protected] on behalf of RAVINDER MAAN <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Saturday, February 20, 2016 at 11:08 PM
To: [email protected]
Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54

Hi Geert,

Thanks for the reply. In ML it takes about 30 seconds, and in Elasticsearch it 
takes 4 seconds. It is a cluster of 3 nodes. Each node has 16GB RAM, and "cat 
/proc/cpuinfo" shows 8 cores (I think that is because of hyper-threading; the 
actual number of cores is 4). I have configured 4 forests per node. Do you 
think increasing or decreasing the number of forests will help? As this is a 
range index query, I guess the entire index is in memory, so other cache 
settings should not affect this query.

If I run the query with query meters, I see only the cache counters below; all 
other cache hits/misses are 0.

<qm:value-cache-misses>194</qm:value-cache-misses>
<qm:regexp-cache-hits>181</qm:regexp-cache-hits>
<qm:regexp-cache-misses>5</qm:regexp-cache-misses>


Thanks & regards,
Ravinder Singh Maan

On Sat, Feb 20, 2016 at 7:33 PM, <[email protected]> wrote:

Today's Topics:

   1. Re: Best way to find most occuring word or sort by frequency
      (Geert Josten)
   2. Re: [1.0-ml] XDMP-TRPLIDXNOTFOUND: cts:triples() -- Triple
      index not enabled (Geert Josten)


----------------------------------------------------------------------

Message: 1
Date: Sat, 20 Feb 2016 18:44:50 +0000
From: Geert Josten <[email protected]>
Subject: Re: [MarkLogic Dev General] Best way to find most occuring
        word or sort by frequency
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d2ee7209.c3d73%[email protected]>
Content-Type: text/plain; charset="us-ascii"

Hi,

I think this is the right approach.

If you talk about it being slow, how slow is that exactly? And how did you 
configure MarkLogic? More specifically, how many forests do you have? Also, how 
much memory, and how many cpu cores do you have?

Kind regards,
Geert


From: [email protected] on behalf of RAVINDER MAAN <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Saturday, February 20, 2016 at 11:34 AM
To: [email protected]
Subject: [MarkLogic Dev General] Best way to find most occuring word or sort by 
frequency

Hello all

I want to sort element values by frequency. I have tried the following:

for $word in cts:element-values(xs:QName("ELEMENT_NAME"),  (), 
("frequency-order", "limit=10"))
return <word count="{cts:frequency($word)}">{$word}</word>


But for a very large index this is slow in comparison to Elasticsearch. I did 
this comparison on the same machine with the same data, and of course only one 
of them was running at a time. There are about 250 million documents, and the 
frequencies range from 1 million down to hundreds, i.e. if I run the above 
query, the word at the top has a count of 1000000.

Is there any other way of doing the same?


Thanks & regards,
Ravinder Singh Maan

------------------------------

Message: 2
Date: Sat, 20 Feb 2016 19:33:41 +0000
From: Geert Josten <[email protected]>
Subject: Re: [MarkLogic Dev General] [1.0-ml] XDMP-TRPLIDXNOTFOUND:
        cts:triples() -- Triple index not enabled
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d2ee7d69.c3ddc%[email protected]>
Content-Type: text/plain; charset="iso-8859-1"

Hi Gaël,

You need to enable the triple-index. You can do that by going to the Admin UI 
of your MarkLogic installation, navigating to the relevant content database, 
and toggling the triple index from false to true there. It should be around the 
10th edit option, so close to the top. Confirm the change by clicking OK at the 
top or bottom of the page, and then wait for the reindex to complete. You can 
follow the progress on the Status tab of that database. Refresh it once in a 
while to get it updated.
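
If you prefer scripting over the Admin UI, the same setting can be changed 
with the Admin API. This is a sketch only: "Documents" is a placeholder 
database name, and it assumes the user running it has admin privileges. The 
reindex then runs as it would after the UI change, provided reindexing is 
enabled for the database.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder; use the name of your content database. :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()
(: Turn the triple index on in the in-memory configuration. :)
let $config := admin:database-set-triple-index($config, $db, fn:true())
(: Persist the change to the cluster configuration. :)
return admin:save-configuration($config)
```

Once the reindex has completed, rerunning fn:count(cts:triples()) should no 
longer raise XDMP-TRPLIDXNOTFOUND.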

Kind regards,
Geert

From: [email protected] on behalf of Gaël YIMEN YIMGA <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Saturday, February 20, 2016 at 5:46 PM
To: MarkLogic Developer Discussion <[email protected]>
Subject: [MarkLogic Dev General] [1.0-ml] XDMP-TRPLIDXNOTFOUND: cts:triples() 
-- Triple index not enabled

Hello All,

I'm facing an issue in MarkLogic.
I successfully ran the following query:
===================
import module namespace sem = "http://marklogic.com/semantics"
      at "/MarkLogic/semantics.xqy";

sem:rdf-insert(
  (
  sem:triple(
    sem:iri("http://example.org/marklogic/people/John_Smith"),
    sem:iri("http://example.org/marklogic/predicate/livesIn"),
    "London"
    )
  ,
  sem:triple(
    sem:iri("http://example.org/marklogic/people/Jane_Smith"),
    sem:iri("http://example.org/marklogic/predicate/livesIn"),
    "London"
    )
  ,
  sem:triple(
    sem:iri("http://example.org/marklogic/people/Jack_Smith"),
    sem:iri("http://example.org/marklogic/predicate/livesIn"),
    "Glasgow"
    )
  )
)
===================

But in a second step, I ran the following to count the number of triples:
=======
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
fn:count(cts:triples());
=======
I got the error shown in the image below.

[Embedded image 1]

Any help fixing this would be greatly appreciated.

Thanks in advance!

Gaël.
--

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 17507 bytes
Desc: image.png
Url : 
http://developer.marklogic.com/pipermail/general/attachments/20160220/a4c6b935/attachment.png

------------------------------

_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general


End of General Digest, Vol 140, Issue 54
****************************************
