Re: [MarkLogic Dev General] Clarification on MarkLogic

Aiswarya Wed, 20 Jul 2011 23:17:32 -0700

Thanks, guys. Both machines have the same platform and bitness. I will test
it out and let you know.


Thanks
Aiswarya V

-----Original Message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of
general-requ...@developer.marklogic.com
Sent: Wednesday, July 20, 2011 9:26 PM
To: general@developer.marklogic.com
Subject: General Digest, Vol 85, Issue 71

Send General mailing list submissions to
        general@developer.marklogic.com

To subscribe or unsubscribe via the World Wide Web, visit
        http://developer.marklogic.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
        general-requ...@developer.marklogic.com

You can reach the person managing the list at
        general-ow...@developer.marklogic.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of General digest..."


Today's Topics:

   1. Re: Clarification on MarkLogic (Danny Sokolsky)
   2. Re: Search using 100k terms (Danny Sokolsky)
   3. Re: Lots of collections ... (Evan Lenz)


----------------------------------------------------------------------

Message: 1
Date: Wed, 20 Jul 2011 07:56:41 -0700
From: Danny Sokolsky <danny.sokol...@marklogic.com>
Subject: Re: [MarkLogic Dev General] Clarification on MarkLogic
To: General MarkLogic Developer Discussion
        <general@developer.marklogic.com>
Message-ID:
        <c9924d15b04672479b089f7d55ffc132029075c...@exchg-be.marklogic.com>
Content-Type: text/plain; charset="us-ascii"

This is not correct.  A database restore uses the configuration of the
database to which you are restoring; it does not take the configuration from
the backup.  The db backup does contain a copy of the config files for
convenience, but they are not applied during the restore.  If the settings in
the backup differ from those of the db to which you are restoring, the
database will start to reindex (if reindexing is enabled).
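
Because the restore relies on the target database's own configuration, it can
help to record the index settings of the source database before the machine is
wiped. A minimal sketch using the Admin API, assuming a database named
"Documents" (a placeholder, not taken from this thread) and a user with admin
privileges; it returns the element and attribute range index definitions as
XML so they can be saved and compared later:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder database name; substitute your own :)
let $config := admin:get-configuration()
let $db     := xdmp:database("Documents")
return (
  (: element range indexes: scalar type, namespace, localname, collation :)
  admin:database-get-range-element-indexes($config, $db),
  (: attribute range indexes :)
  admin:database-get-range-element-attribute-indexes($config, $db)
)
```

This covers only range indexes; other settings would need their own getters
from the same module.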

Here are a few other points worth looking at:

* Is your new machine the same platform and bits (64 or 32) as the machine
the backup was taken on?  Backups are platform (and bits) specific, so if they
differ, the restore will not work.
* You can take a snapshot of your data directory after uninstalling
MarkLogic, then put that data directory in the same location on the new
machine before installing MarkLogic.  When you then install, MarkLogic will
treat that as an upgrade (keeping all of your config info, which is in the
data dir).  Then you can apply your db backups.
* I would test whatever you do carefully first, and take good backups.
* xqsync can help out, especially if your source and target are on different
platforms, but it can be useful in any case.  Xqsync to a 2nd machine, test,
clean the old machine, then xqsync from the 2nd machine back to the newly
cleaned 1st machine.

As with anything like this, a mistake can lead to data loss, so be careful
and test it.

-Danny

-----Original Message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Khan, Kashif
Sent: Wednesday, July 20, 2011 6:19 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Clarification on MarkLogic

One thing I forgot to mention is that when you recreate the database in
step 3 you do not have to configure it. Just give it the same name as
before and do the basic settings. It will inherit all the indexes and
search configurations etc. once you restore the database from the backup.

I am sure you know this, but just an FYI: when you take a backup of a
database, all the content is also backed up. So restoring the database will
also restore the content.


Best Regards,
Kashif Khan



On 7/20/11 9:09 AM, "Khan, Kashif" <kashif.k...@hmhpub.com> wrote:

>When you take a backup of a database all the settings you mentioned are
>also backed up. So when you restore the database all the settings will be
>restored.
>
>
>Best Regards,
>Kashif Khan
>
>
>
>On 7/20/11 9:22 AM, "Aiswarya" <aiswarya.venkatachalapa...@laserwords.com>
>wrote:
>
>>Hi Kashif Khan,
>>
>>Thanks for your quick reply. I can take a backup of the content and
>>restore it. But is there any way to get a full report on the configuration
>>of the available databases, such as the searches enabled, the range indexes
>>created (with their scalar type, namespace, collation, and localname), the
>>lexicons created, the pipelines enabled and created, and so on? I will need
>>all of these to manually recreate the databases. Could you please help me
>>out?
>>
>>Thanks
>>Aiswarya V
>>
>>-----Original Message-----
>>From: general-boun...@developer.marklogic.com
>>[mailto:general-boun...@developer.marklogic.com] On Behalf Of
>>general-requ...@developer.marklogic.com
>>Sent: Wednesday, July 20, 2011 6:02 PM
>>To: general@developer.marklogic.com
>>Subject: General Digest, Vol 85, Issue 68
>>
>>Today's Topics:
>>
>>   1. Reg: Wildcarded search doesn't return   fitness (ambika arumugam)
>>   2. Re: Clarification on MarkLogic (Khan, Kashif)
>>
>>
>>----------------------------------------------------------------------
>>
>>Message: 1
>>Date: Wed, 20 Jul 2011 16:40:27 +0530
>>From: ambika arumugam <ambikaarumuga...@gmail.com>
>>Subject: [MarkLogic Dev General] Reg: Wildcarded search doesn't return
>>      fitness
>>To: General MarkLogic Developer Discussion
>>      <general@developer.marklogic.com>
>>Message-ID:
>>      <CAESiW4HcttaTF=Pvky=fqvrmcb7oecfy4z-oqmd3xazbsoc...@mail.gmail.com>
>>Content-Type: text/plain; charset="iso-8859-1"
>>
>>Hi all,
>>
>>I am performing a search that returns results with their fitness values:
>>
>>let $options := <options xmlns="http://marklogic.com/appservices/search">
>>    <term>
>>          <term-option>wildcarded</term-option>
>>    </term>
>></options>
>>
>>return search:search("the",$options)//search:result/@fitness
>>
>>This returns the results with their fitness.
>>
>>But when I perform a wildcarded search,
>>
>> search:search("the*",$options)//search:result/@fitness
>>
>>it gives me the results, but the values of fitness, score and confidence
>>are zero.
>>
>>Am I missing something?
>>
>>I would like to get the relevance fitness values for all results when I do
>>the wildcarded search.
>>
>>Regards
>>Ambika
>>
>>------------------------------
>>
>>Message: 2
>>Date: Wed, 20 Jul 2011 08:28:27 -0400
>>From: "Khan, Kashif" <kashif.k...@hmhpub.com>
>>Subject: Re: [MarkLogic Dev General] Clarification on MarkLogic
>>To: General MarkLogic Developer Discussion
>>      <general@developer.marklogic.com>
>>Message-ID: <ca4c4095.699e%kashif.k...@hmhpub.com>
>>Content-Type: text/plain; charset="windows-1252"
>>
>>This should work for you unless someone else disagrees:
>>
>> 1.  Take a backup of the databases that you need. Store it on an
>>external drive.
>> 2.  Restore your machine.
>> 3.  Manually recreate the databases. I think you will have to keep the
>>same names as before. At this point you will not have any content in the
>>databases.
>> 4.  Restore each database from the backup that you took in step 1.
>>
>>Best Regards,
>>Kashif Khan
>>Sr. Solutions Architect
>>Houghton Mifflin Harcourt, Orlando, FL
>>Office: (407) 345-3420
>>Cell: (407) 949-4697
>>
>>"The water you touch in the river is the last of that which has passed
>>and
>>the first of that which is coming" --Leonardo da Vinci
>>
>>
>>
>>From: Aiswarya <aiswarya.venkatachalapa...@laserwords.com>
>>Reply-To: General MarkLogic Developer Discussion
>><general@developer.marklogic.com>
>>Date: Wed, 20 Jul 2011 04:00:00 -0400
>>To: "general@developer.marklogic.com" <general@developer.marklogic.com>
>>Subject: [MarkLogic Dev General] Clarification on MarkLogic
>>
>>
>>Hi Guys,
>>
>>I desperately need your help.
>>
>>I need to format my machine, but I don't want to lose any of the content
>>or configuration of MarkLogic (such as the database, forest, and pipeline
>>configuration) on my machine. Is there a way to take a backup of the
>>whole MarkLogic installation on my machine and restore it after formatting
>>the machine? Please help me out.
>>
>>Thanks
>>Aiswarya V
>>
>>
>>
>>------------------------------
>>
>>_______________________________________________
>>General mailing list
>>General@developer.marklogic.com
>>http://developer.marklogic.com/mailman/listinfo/general
>>
>>
>>End of General Digest, Vol 85, Issue 68
>>***************************************
>>

------------------------------

Message: 2
Date: Wed, 20 Jul 2011 08:05:57 -0700
From: Danny Sokolsky <danny.sokol...@marklogic.com>
Subject: Re: [MarkLogic Dev General] Search using 100k terms
To: General MarkLogic Developer Discussion
        <general@developer.marklogic.com>
Message-ID:
        <c9924d15b04672479b089f7d55ffc132029075c...@exchg-be.marklogic.com>
Content-Type: text/plain; charset="utf-8"

As always, Darin has lots of great ideas; I recommend trying them.

Given the nature of your data, though, I would try using the "exact" option
to cts:element-value-query.

I think the range index idea is really good, and then you can use a
range-query (cts:element-range-query) either with search:search or cts:uris.
I am not sure I would classify it as a "heavyweight" solution...it will use
a little more memory, but will be well worth it, I suspect.
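
A minimal sketch of the range-query idea, assuming a string element range
index has been created on ce:pii, the URI lexicon is enabled, and the ce
prefix is bound to a placeholder namespace URI (the real one is not given in
this thread). The two PII values stand in for the uploaded list:

```xquery
xquery version "1.0-ml";

(: placeholder namespace URI; replace with the real binding for ce :)
declare namespace ce = "http://example.com/ns/ce";

(: stand-in for the PIIs read from the uploaded file :)
let $piis := ("S0016-5085(68)70198-0", "S0016-5085(68)70199-2")

(: one range query that matches any of the values, resolved from the index :)
let $query := cts:element-range-query(xs:QName("ce:pii"), "=", $piis)

(: return only the URIs of the matching documents, from the URI lexicon :)
return cts:uris((), (), $query)
```

For the value-query route, the "exact" option would be passed as the third
argument: cts:element-value-query(xs:QName("ce:pii"), $piis, "exact").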

Let us know how it goes.

-Danny

-----Original Message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of McBeath, Darin
W (ELS-STL)
Sent: Wednesday, July 20, 2011 7:28 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Search using 100k terms

I would also make sure that the options for cts:element-value-query
include 'punctuation-insensitive' and 'case-insensitive'.
I'm assuming that punctuation/case will not matter in your situation.

On 7/20/11 9:54 AM, "McBeath, Darin W (ELS-STL)" <d.mcbe...@elsevier.com>
wrote:

>A couple of thoughts...
>
>Consider using cts:uris (assuming you have the URI lexicon enabled for your
>content).  This is a lower-level API than search:search and could get you
>better performance.  My guess is that search:search is likely using
>cts:search under the covers.  I don't know for sure as I typically use
>the lower-level APIs (such as cts:search, cts:uris, etc.).  Those more
>familiar with search:search can elaborate on whether cts:search is being
>used by search:search.
>
>Continue to use cts:element-value-query, but I would consider breaking
>the list of 100,000 terms into chunks of 10,000 or something a bit more
>reasonable.  For these smaller chunks of work, I would consider spawning
>them on the task server so that they could potentially be done in
>parallel.  Of course, try 100,000 first and see if you can meet your
>performance criteria (< 10s).
>
>One last thought is that you might want to investigate creating a range
>index on ce:pii and using cts:element-range-query.  I'm not sure if this
>will be faster than cts:element-value-query, but I seem to recall that
>range indexes are supposed to be kept in memory.  This is a fairly
>heavyweight solution as there could be implications for your DB sizing and
>your XML, as the ce:pii element would need to be unique within your XML
>document (which is likely not the case), and I wouldn't recommend fields
>in this situation as a workaround.
>
>Darin.
>
>
>
>From: Vijayasekar Padmanaban <vijayaseka...@infosys.com>
>Reply-To: General MarkLogic Developer Discussion
><general@developer.marklogic.com>
>Date: Wed, 20 Jul 2011 13:28:05 +0530
>To: General MarkLogic Developer Discussion
><general@developer.marklogic.com>
>Subject: Re: [MarkLogic Dev General] Search using 100k terms
>
>Hi Jason,
>
>Sorry for the confusion.
>
>Please find below a snippet of the XML we have in the DB. (The DB has 10
>million XML documents.)
>
><ja:item-info>
><ja:jid>YMSG</ja:jid>
><ja:aid>0103883</ja:aid>
><ce:pii>S0011-3840(01)70009-3</ce:pii>
><ce:doi>10.1016/S0011-3840(01)70009-3</ce:doi>
><ce:copyright type="other" year="2001"/>
></ja:item-info>
>
>The file we upload will contain the PIIs (which I had referred to as
>terms in my earlier email) as shown below. (There could be 100k PIIs in
>the file.)
>S0016-5085(68)70198-0
>S0016-5085(68)70199-2
>S0016-5085(68)70200-6
>S0016-5085(68)70201-8
>S0016-5085(68)70202-X
>S0016-5085(68)70203-1
>S0016-5085(68)70204-3
>...
>
>I need to identify the documents that match the PIIs in the file.
>
>Currently we are using the search:search() API in our application. Hence I
>tried using the additional-query option of the Search API as shown below:
>cts:element-value-query(xs:QName("ce:pii"), $uploadedPIIs as xs:string*)
>
>But this additional query takes a lot of time to yield results.
>
>So is there a better way to do this? Please suggest.
>
>Regards,
>Vijay
>
>From: general-boun...@developer.marklogic.com
>[mailto:general-boun...@developer.marklogic.com] On Behalf Of Jason Hunter
>Sent: Wednesday, July 20, 2011 12:32 PM
>To: General MarkLogic Developer Discussion
>Subject: Re: [MarkLogic Dev General] Search using 100k terms
>
>You say "the term" but you also say you have 300,000 terms.  So I'm
>confused.
>
>You want to find documents that have all 300,000 terms?
>
>Or for each term you want to find documents having just that term?  And
>you want to do that basic query 300,000 times across all terms in less
>than 10 seconds?
>
>-jh-
>
>On Jul 19, 2011, at 11:13 PM, Vijayasekar Padmanaban wrote:
>
>
>Hi Jason,
>
>Thanks for your response.
>
>My DB has 10 million documents in it. I need to identify the
>documents which have the term.
>I would expect the search to return results in less than 10 seconds.
>
>Regards,
>Vijay
>
>From: general-boun...@developer.marklogic.com
>[mailto:general-boun...@developer.marklogic.com] On Behalf Of Jason Hunter
>Sent: Wednesday, July 20, 2011 11:33 AM
>To: General MarkLogic Developer Discussion
>Subject: Re: [MarkLogic Dev General] Search using 100k terms
>
>I'm a little unclear on what you're trying to do.
>
>You want to take a list of 300,000 terms and identify which documents
>have each term?  Or do you only need to identify which terms are present
>in one or more documents and which terms aren't present anywhere?
>Something else?
>
>How long are you willing to wait for the answer?
>
>-jh-
>
>On Jul 19, 2011, at 10:45 PM, Vijayasekar Padmanaban wrote:
>
>
>
>Hi All,
>
>We have a use case to perform a search based on the contents uploaded as a
>file. The file would have a maximum of 100,000 terms in it. We need to
>validate the contents of the file against our repository contents and
>produce results. Our repository contains 10 million documents. Each term
>in the file needs to be validated against an element in the enhanced XML.
>
>Below are the two approaches I have tried:
>1. Using search constraints
>   a. Each search term is concatenated with the constraint and
>      joined using the "OR" delimiter as shown below:
>      e.g., "const:<term1> OR const:<term2> OR const:<term3> OR
>      const:<term4> OR ..."
>      This ended in a stack overflow error when the number of search terms
>      exceeded 1000.
>2. Using an element value query
>   a. All the search terms are passed as text to
>      cts:element-value-query as shown below:
>      cts:element-value-query(<Qualifier-Name>, text as xs:string*)
>      This worked well when the DB contained a smaller number of documents,
>      say 300,000. But when used with a DB that has 10 million documents it
>      failed with "Time limit exceeded".
>
>Could you suggest the best possible approach to resolve this issue?
>
>Thanks,
>Vijay
>

------------------------------

Message: 3
Date: Wed, 20 Jul 2011 08:52:32 -0700
From: Evan Lenz <evan.l...@marklogic.com>
Subject: Re: [MarkLogic Dev General] Lots of collections ...
To: General MarkLogic Developer Discussion
        <general@developer.marklogic.com>
Message-ID: <ca4c4042.1034c%evan.l...@marklogic.com>
Content-Type: text/plain; charset="us-ascii"

Interesting approach. I have a few questions. First of all, do you even need
either (range indexes or collections)? The query example you gave should be
resolvable from the Universal Index alone. I tried this in CQ (after
creating 300 sample <logfile> documents):

xdmp:query-trace(true()),
//logfile[@host eq 'host1']

And then I looked in the logfile:

Analyzing path: fn:collection()/descendant::logfile[@host eq "host1"]
Step 1 is searchable: fn:collection()
Step 2 is searchable: descendant::logfile[@host eq "host1"]
Path is fully searchable.
Gathering constraints.
Comparison contributed hash value constraint: logfile/@host = "host1"
Step 2 predicate 1 contributed 1 constraint: @host eq "host1"
Comparison contributed hash value constraint: logfile/@host = "host1"
Step 2 predicate 1 contributed 1 constraint: @host eq "host1"
Step 2 contributed 2 constraints: descendant::logfile[@host eq "host1"]
Executing search.
Selected 100 fragments to filter

The above told me that the result was completely resolved from the Universal
Index since I haven't enabled any range indexes (and I know that exactly
100 of my sample docs have host="host1").

My other two questions:

 *   What is your main motivation for using collections rather than
attribute range indexes?
 *   How do you plan to associate the documents with the collection URIs?

Thanks,

Evan Lenz
Software Developer, Community
developer.marklogic.com

From: "Lee, David" <d...@epocrates.com>
Reply-To: General MarkLogic Developer Discussion
<general@developer.marklogic.com>
Date: Tue, 19 Jul 2011 14:34:39 -0700
To: "General Mark Logic Developer Discussion"
<general@developer.marklogic.com>
Subject: [MarkLogic Dev General] Lots of collections ...

Thanks to some tips from this group (and especially Kelly !)
I've started leveraging collections instead of directories.  So far really
fantastic results !!!
Thank you all !!

Of course one success opens the doors to a million questions ...

Question ... Is there a significant cost to having a 'large' number of
overlapping  documents in collections ?
In my use case I may have millions of very similar small documents all with
some basic set of attributes which have a small set of possible values.
I've implemented attribute value range indexes, but was wondering if
collections might work better ?
A typical use case would be to filter a result set by only those documents
with a particular attribute set to one value.
If I had collections for each attribute/value combination  (maybe 100
collections max) A collection query could do the equivalent of a range
index.
Example:

<logfile host="host1" system="tomcat" ...>
   ...

Instead of making a range index on logfile/@host and logfile/@system
Make collections called    host-host1  host-host2  host-host3  ... and
system-tomcat system-mysql ...
Then this xpath
//logfile[@host eq 'host1']

would be equivalent to a collection search on 'host-host1'
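
A minimal sketch of that equivalence, assuming every logfile document was
added to a collection named "host-" plus its @host value when it was loaded
(the naming scheme above). The two counts should agree if the collections
were assigned consistently:

```xquery
xquery version "1.0-ml";

(: attribute predicate, as in the XPath above :)
let $by-attribute  := fn:collection()//logfile[@host eq "host1"]

(: collection lookup; assumes "host-host1" was assigned at load time :)
let $by-collection := fn:collection("host-host1")//logfile

return (fn:count($by-attribute), fn:count($by-collection))
```

In a cts:search, the same filter would be expressed with
cts:collection-query("host-host1").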

Is this brilliant or stupid ?  Obviously there will be a tradeoff ... but
I'm thinking in this case since the number of possible values is very small
that collections might actually be a good thing.

-David





----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
d...@epocrates.com
812-482-5224


------------------------------


End of General Digest, Vol 85, Issue 71
***************************************
