[MarkLogic Dev General] rollback && commit problem

Sanjeev Medishetty Wed, 01 Aug 2007 21:18:42 -0700

Hi,

We are facing a problem with rollback and commit statements. We 
have an application built on Struts which uses the xcc.jar API provided by 
Mark Logic. The API provides commit() and rollback(), but they have no 
effect. The queries are implicitly committed when I submit the request. 
If I call setAutocommit() with false, it throws an exception saying that 
multi requests are not supported.


But we really need this for our project: there is a group of updates that 
should happen all or none (atomicity). We don't want to use 
invokeModules() because there is a lot of work we need to do between the 
updates. Please help us in this regard; it is extremely urgent for our 
project.

Thanks,

Sanjeev Kumar Medishetty.

The information contained in this mail is classified as
() L&T Infotech General Business
(x) L&T Infotech Internal Use
() L&T Infotech Confidential
() L&T Infotech Proprietary & Confidential

Larsen & Toubro Infotech Ltd.
www.Lntinfotech.com


This Email may contain confidential or privileged information for the 
intended recipient (s) If you are not the intended recipient, please do 
not use or disseminate the information, notify the sender and delete it 
from your system. 



Andy Townsend <[EMAIL PROTECTED]> 
Sent by: [EMAIL PROTECTED]
08/01/2007 07:44 PM
Please respond to
General Mark Logic Developer Discussion <[email protected]>


To
General Mark Logic Developer Discussion <[email protected]>
cc

Subject
RE: [MarkLogic Dev General] Relevance and Fields







Stephen, 

Thanks for the response - I confess I had relegated this to the pile of 
unknowns.  I have tried to recreate the scenario this morning and have not 
yet fully recreated it - I suspect it is some kind of edge case that 
changes to my DB have affected, however I have repeated some pieces and 
have attached an annotated ErrorLog.txt extract.  To touch on your 
responses first - 

> Presumably when you say you ran 'cts:query(doc(), "myword")', you mean 
'cts:search(doc(), "myword")' ?? 
Yes of course I mean cts:search() - sorry for the confusion, clearly 
typing way too quickly. 

> "a) what the creation of a field is really doing to my DB in order to 
affect TF " 
> -- as described above, the creation of a field creates additional, 
field-specific termlists, so that TF on a cts:field-word-query() is 
> based on the number of times the term appears in the field. 
Okay - but perhaps we can clarify more with regard to the attachment. 

> b) what the TF normalization function is   
> -- the TF normalization function adjusts the count of the occurrences of 
a term according to the length of the document (strictly, the fragment). 
I do (and did) understand the principle; I guess I was asking what the 
algorithm is to see if that helped me understand other things. Of course 
I respect that you consider the algorithm to be "secret sauce", though can 
you indicate whether it is a "well-behaved" function or whether there are 
'transition document sizes' where the function might cause quirky 
behaviour? 

And so to the attachment - from my ML installation this morning, Windows, 
version 3.2-1 

It seems to show IDF changing from log(316/2) to log(508/4) depending on 
the existence of the field. 

It also shows TF for the two matching documents/fragments changing before 
and after the creation of the field, though not currently (unlike my 
earlier example) changing back again after the field is deleted. 

Can you explain why/how these should change? 
Can you respond to / comment on the lines marked with "-- ??" ? 



Thanks in advance for any cycles that you can engage. 

Andy 






"Stephen Buxton" <[EMAIL PROTECTED]> 
Sent by: [EMAIL PROTECTED] 
01/08/2007 06:30 

Please respond to
General Mark Logic Developer Discussion <[email protected]>


To
"General Mark Logic Developer Discussion" 
<[email protected]> 
cc

Subject
RE: [MarkLogic Dev General] Relevance and Fields








Andy, 
  
  I finally managed to find a few cycles to try this out, and I'm puzzled. 

  
  You said: "... creating a Field appears to create a new index from which 
TF is calculated ..." 
  Creating a Field causes new termlists to be created. So if you create a 
field f1 that includes an element called title that contains the word 
"pig", a new termlist for "the-word-pig-in-the-field-f1" is created (in 
much the same way as when you turn on fast element word searches, a new 
termlist such as "the-word-pig-in-the-element-title" is created). You can 
think of this as creating "a new index", though we don't normally describe 
it that way - it's just creating a set of new termlists. 
  
  Then you described an experiment - here's where I'm puzzled. 
  Presumably when you say you ran 'cts:query(doc(), "myword")', you mean 
'cts:search(doc(), "myword")' ?? Or maybe 'cts:search(fn:collection(), 
"myword")' ?? 
  
  If you ran the same word query over the same corpus with the same 
database index settings, you should've seen the same scores. 
  If you ran a different query - e.g. if you used cts:field-word-query() 
instead of cts:word-query() - then, as you described in your "simple 
tests", you should see a different score. Now the TF is the number of 
times the term occurs *in the field*, not in the whole fragment. 
  I tried to reproduce your results with just a few documents - the "pig" 
documents I used in the User Conference presentation - and, as expected, I 
got the same score for a simple word query whether or not a field existed. 

  Could you possibly send me a test case? Or at least an excerpt from the 
trace? 
  The existence of a field should not affect the scores returned by a 
simple word query. 
  
  You asked: 
"a) what the creation of a field is really doing to my DB in order to 
affect TF " 
-- as described above, the creation of a field creates additional, 
field-specific termlists, so that TF on a cts:field-word-query() is based 
on the number of times the term appears in the field. 

b) what the TF normalization function is   
-- the TF normalization function adjusts the count of the occurrences of a 
term according to the length of the document (strictly, the fragment). If 
we didn't adjust for document length, then longer documents would always 
dominate the results since they are more likely to contain more 
occurrences of any given term. We don't publish the exact algorithm - 
partly because it's "secret sauce", and partly because we may tweak it 
from time to time. 
  
You said: 
  
"P.S.  As an aside - the developer docs describes "inverse document 
frequency" as "log(1/df) where df (document frequency) is the number of 
documents in which the term occurs." 
I think this is a little misleading  - it really means log( D/df) where D 
is the total number of documents (a.k.a fragments) or a variant definition 
of df is needed.  This is the behaviour that can be seen in the log trace. 
 Also, just to be pedantic (who me?) it should probably be ln(D/df) rather 
than log(D/df)  since it's the natural log :-) " 
  
Yes, correct. IDF is about the percentage of documents that contain a 
term, not the absolute number of documents that contain that term. 
I'll log a doc bug. 
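
The corrected definition can be checked against the relevance trace 
attached to this thread, where the IDF events read 
"scale=8*weight*idf=8*1*log(316/2)=41" and "...log(508/4)=39". The sketch 
below is only a sanity check in Python, not Mark Logic's implementation: 
it assumes the logged scale is 8 * weight * ln(D/df) rounded to the 
nearest integer (the rounding rule is a guess that happens to fit both 
trace values), and the name idf_scale is made up for illustration.

```python
import math

def idf_scale(total_fragments, matching_fragments, weight=1):
    """Scaled IDF as it appears in the trace: 8 * weight * ln(D / df).

    Rounding to the nearest integer is an assumption; the trace only
    shows the final integer value.
    """
    idf = math.log(total_fragments / matching_fragments)  # natural log
    return round(8 * weight * idf)

# Values of D and df taken from the two IDF events in the trace.
print(idf_scale(316, 2))  # trace shows 41
print(idf_scale(508, 4))  # trace shows 39

# The documented form log(1/df) is negative for any df > 1,
# which is why it cannot be the quantity shown in the trace.
print(math.log(1 / 2))
```

With the natural log both trace values are reproduced; with log base 10 
the results would be far smaller, which supports the ln(D/df) reading.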
  
- Steve B. 
  
Stephen Buxton 
Director of Product Management 
Mark Logic Corporation 
999 Skyway Road 
Suite 200 
San Carlos, CA 94070 
+1 650 655 2317 Phone 
[EMAIL PROTECTED] 
www.marklogic.com 
This e-mail and any accompanying attachments are confidential. The 
information is intended solely for the use of the individual to whom it is 
addressed. Any review, disclosure, copying, distribution, or use of this 
e-mail communication by others is strictly prohibited. If you are not the 
intended recipient, please notify us immediately by returning this message 
to the sender and delete all copies.  Thank you for your cooperation. 
  
 

From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Andy 
Townsend
Sent: Thursday, May 31, 2007 9:16 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Relevance and Fields 


Hi folks, 

Could some kind soul (probably a kindly ML soul) please expand a little on 
how the new 3.2 Fields and Relevance interplay. 

Slide 14 from Stephen's presentation on relevance from the User Conference 
(I'm afraid I was in another session) hints that Fields can have an effect 
as it says down the bottom: 

       Relevance may be calculated with respect to 
       an element or a field 
               More focused relevance measurement 

However all the rest of the slides and the 3.2 developers guide (section 
23.2) refer only to fragments and the calculation of TF and IDF from 
fragment based stats. 

I ran some very simple tests in a DB with about a hundred documents and 
turned on the Relevance trace (as explained at the conference).  I was 
able to demonstrate that creating a Field appears to create a new index 
from which TF is calculated: when doing a cts:field-word-query() 
I could see a lower TF value in the trace output (for a document 
where some term occurrences fell in the field and some fell outside). 
Marvellous! 

However......  when doing a simple word-query across all docs I found that 
relevance actually varied depending on whether the Field actually existed. 


i.e. 
- DB, no fields, run cts:query(doc(), "myword") and docA gets relevance X 
- create field, wait for DB to settle down after reindexing 
- DB, with field, re-run cts:query(doc(), "myword") and now docA gets 
relevance Y where Y < X   (!!) 
- drop field, wait for reindexing to settle 
- DB, no fields, re-run cts:query(doc(), "myword") and now docA gets 
relevance X again.     (!!!) 

The Relevance trace shows that the only value changing is the value for TF 
(so IDF still the same, number of total fragments still the same); however, 
the number of term occurrences has not changed, and neither (as far as I 
know) has the fragment size.  This makes me wonder: 
a) what the creation of a field is really doing to my DB in order to 
affect TF 
b) what the TF normalization function is  - this function is referred to on 
slide 12 (normalization for fragment length) and in section 23.1.1 of the 
developer docs, where it also says: 

       "a word that occurs 10 times in a 100 word document will get a 
higher score than a word that occurs 100 times in a 1,000 word document" 

but gives no further details of what this function is and why docs with 
10/100 should count more than docs with 100/1000 

Any clarifications on Fields, Field indexes and how these interplay with 
relevance calculations? 

Thanks in advance, 

Andy 

P.S.  As an aside - the developer docs describes "inverse document 
frequency" as "log(1/df) where df (document frequency) is the number of 
documents in which the term occurs." 

I think this is a little misleading  - it really means log( D/df) where D 
is the total number of documents (a.k.a fragments) or a variant definition 
of df is needed.  This is the behaviour that can be seen in the log trace. 
 Also, just to be pedantic (who me?) it should probably be ln(D/df) rather 
than log(D/df)  since it's the natural log :-) 




The information contained in this e-mail and any subsequent
correspondence is private and confidential and intended solely 
for the named recipient(s).  If you are not a named recipient, 
you must not copy, distribute, or disseminate the information, 
open any attachment, or take any action in reliance on it.  If you 
have received the e-mail in error, please notify the sender and delete
the e-mail. 

Any views or opinions expressed in this e-mail are those of the 
individual sender, unless otherwise stated.  Although this e-mail has 
been scanned for viruses you should rely on your own virus check, as 
the sender accepts no liability for any damage arising out of any bug 
or virus infection.

John Wiley & Sons Limited is a private limited company registered in
England with registered number 641132.

Registered office address: The Atrium, Southern Gate, Chichester,
West Sussex, PO19 8SQ.

 _______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general




______________________________________________________________________

** REINDEX TO GET TO A BASELINE **

2007-08-01 11:02:57.515 Info: Saving D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000002f
2007-08-01 11:03:04.304 Info: Saved 19 MB in 6 sec at 3 MB/sec to D:\Documents 
and Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000002f
2007-08-01 11:03:04.705 Info: Merging D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000002e and 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000002f to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000031
2007-08-01 11:03:24.243 Info: Saving D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000030
2007-08-01 11:03:24.804 Info: Merged 68 MB in 21 sec at 3 MB/sec to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000031
2007-08-01 11:03:27.057 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000002e
2007-08-01 11:03:27.407 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000002f
2007-08-01 11:03:30.892 Info: Saved 19 MB in 6 sec at 3 MB/sec to D:\Documents 
and Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000030
2007-08-01 11:03:31.283 Info: Merging D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000031 and 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000030 to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000033
2007-08-01 11:03:39.254 Info: Refragmented 316 fragments in 53 sec at 5 
fragments/sec on forest ForestOne
2007-08-01 11:03:52.373 Info: Merged 74 MB in 22 sec at 3 MB/sec to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000033
2007-08-01 11:03:52.663 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000031
2007-08-01 11:03:52.804 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000030
2007-08-01 11:03:55.738 Info: Saving D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000032
2007-08-01 11:03:57.560 Info: Saved 13 MB in 1 sec at 13 MB/sec to D:\Documents 
and Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000032
2007-08-01 11:03:57.731 Info: Merging D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000033 and 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000032 to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000034
2007-08-01 11:04:06.864 Info: Merged 75 MB in 10 sec at 7 MB/sec to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000034
2007-08-01 11:04:07.154 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000033
2007-08-01 11:04:07.294 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000032

** SIMPLE SEARCH FOR THE TERM "Hermite" **
(:
let $q2 := cts:word-query("hermite")
return cts:search(fn:doc(),$q2)
:)
Notice second line "8*weight*idf=8*1*log(316/2)=41" indicating idf = log(316/2)
        Is that 316 fragments, with 2 containing the search term -- ??
Notice third line - TF for ".../WCM287.xml"
        "score=scale*logtf=41*3=123" indicating logtf=3 for this fragment
Notice fifth line - TF for ".../WCM280.xml"
        "score=scale*logtf=41*9=369" indicating logtf=9 for this fragment

2007-08-01 11:04:35.104 Info: [Event:id=Relevance Term] key=0xb7543dddd0628c2e 
text="hermite"
2007-08-01 11:04:35.114 Info: [Event:id=Relevance IDF] key=0xb7543dddd0628c2e 
scale=8*weight*idf=8*1*log(316/2)=41
2007-08-01 11:04:35.134 Info: [Event:id=Relevance TF] key=0xb7543dddd0628c2e 
score=scale*logtf=41*3=123 
fragment=doc("/content/journals/WCM/WCM287/WCM287.xml")
2007-08-01 11:04:35.134 Info: [Event:id=Relevance Quality] 
score=scoreSum/weightSum+weight*quality=123/1+1*0=123 
confidence=sqrt(score/(8*maxlogtf*maxidf))=sqrt(123/(8*18*log(316)))=0.385231 
fitness=sqrt(score/(8*maxlogtf*avgidf))=sqrt(123/(8*18*(5.06259/1)))=0.410757 
fragment=doc("/content/journals/WCM/WCM287/WCM287.xml")
2007-08-01 11:04:35.144 Info: [Event:id=Relevance TF] key=0xb7543dddd0628c2e 
score=scale*logtf=41*9=369 
fragment=doc("/content/journals/WCM/WCM280/WCM280.xml")
2007-08-01 11:04:35.144 Info: [Event:id=Relevance Quality] 
score=scoreSum/weightSum+weight*quality=369/1+1*0=369 
confidence=sqrt(score/(8*maxlogtf*maxidf))=sqrt(369/(8*18*log(316)))=0.667239 
fitness=sqrt(score/(8*maxlogtf*avgidf))=sqrt(369/(8*18*(5.06259/1)))=0.711452 
fragment=doc("/content/journals/WCM/WCM280/WCM280.xml")
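
The confidence and fitness figures in the "Relevance Quality" events above 
can be recomputed directly from the formulas printed in the trace. The 
Python sketch below does only that; the function names are made up for 
illustration, "log" is taken to be the natural log, and the constants 
(maxlogtf=18, 316 fragments, avgidf=5.06259) are copied from the trace 
rather than derived.

```python
import math

def confidence(score, max_logtf, total_fragments):
    """sqrt(score / (8 * maxlogtf * maxidf)), with maxidf = ln(total fragments)."""
    return math.sqrt(score / (8 * max_logtf * math.log(total_fragments)))

def fitness(score, max_logtf, avg_idf):
    """sqrt(score / (8 * maxlogtf * avgidf)), as printed in the trace."""
    return math.sqrt(score / (8 * max_logtf * avg_idf))

# Scores for the two matching fragments in the first search.
for name, score in [("WCM287.xml", 123), ("WCM280.xml", 369)]:
    c = confidence(score, 18, 316)
    f = fitness(score, 18, 5.06259)
    print(f"{name}: confidence={c:.6f} fitness={f:.6f}")
```

The trace reports 0.385231 / 0.410757 for WCM287.xml and 
0.667239 / 0.711452 for WCM280.xml, which is what this arithmetic gives 
to within rounding of the logged constants.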

** NOW CREATE FIELD "field_sect1" on element "sect1" which contains some of the 
occurrences of the search term **
** ALLOW REINDEXER TO DO ITS THING **

2007-08-01 11:05:54.377 Info: Saving D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000035
2007-08-01 11:06:01.838 Info: Saved 18 MB in 7 sec at 2 MB/sec to D:\Documents 
and Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000035
2007-08-01 11:06:02.519 Info: Merging D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000034 and 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000035 to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000037
2007-08-01 11:06:16.629 Info: Saving D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000036
2007-08-01 11:06:24.450 Info: Saved 18 MB in 7 sec at 2 MB/sec to D:\Documents 
and Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000036
2007-08-01 11:06:28.466 Info: Merged 54 MB in 26 sec at 2 MB/sec to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000037
2007-08-01 11:06:29.117 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000035
2007-08-01 11:06:29.678 Info: Merging D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000037 and 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000036 to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000039
2007-08-01 11:06:42.446 Info: Saving D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000038
2007-08-01 11:06:50.167 Info: Saved 19 MB in 7 sec at 2 MB/sec to D:\Documents 
and Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000038
2007-08-01 11:06:55.094 Info: Merged 63 MB in 26 sec at 2 MB/sec to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000039
2007-08-01 11:06:55.905 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000037
2007-08-01 11:06:55.935 Info: Merging D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000039 and 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000038 to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003b
2007-08-01 11:06:56.135 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000036
2007-08-01 11:07:02.695 Info: Refragmented 190 fragments in 79 sec at 2 
fragments/sec on forest ForestOne
2007-08-01 11:07:03.506 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000034
2007-08-01 11:07:15.633 Info: Merged 70 MB in 20 sec at 3 MB/sec to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003b
2007-08-01 11:07:16.094 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000039
2007-08-01 11:07:16.244 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\00000038
2007-08-01 11:07:19.819 Info: Saving D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003a
2007-08-01 11:07:23.144 Info: Saved 18 MB in 3 sec at 6 MB/sec to D:\Documents 
and Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003a
2007-08-01 11:07:23.314 Info: Merging D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003b and 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003a to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003c

** REPEAT SIMPLE SEARCH FOR THE TERM "Hermite" **
(:
let $q2 := cts:word-query("hermite")
return cts:search(fn:doc(),$q2)
:)
Notice second line "8*weight*idf=8*1*log(508/4)=39" indicating idf = log(508/4)
        Is that 508 fragments, with 4 containing the search term -- ??
        idf has changed from log(316/2)=41 to log(508/4)=39 -- ??
Notice third line - TF for ".../WCM287.xml"
        "score=scale*logtf=39*2=78" indicating logtf=2 for this fragment 
        logtf apparently changed from 3 to 2 -- ??
Notice fifth line - TF for ".../WCM280.xml"
        "score=scale*logtf=39*8=312" indicating logtf=8 for this fragment
        logtf apparently changed from 9 to 8 -- ??

2007-08-01 11:07:36.934 Info: [Event:id=Relevance Term] key=0xb7543dddd0628c2e 
text="hermite"
2007-08-01 11:07:37.024 Info: [Event:id=Relevance IDF] key=0xb7543dddd0628c2e 
scale=8*weight*idf=8*1*log(508/4)=39
2007-08-01 11:07:37.044 Info: [Event:id=Relevance TF] key=0xb7543dddd0628c2e 
score=scale*logtf=39*2=78 
fragment=doc("/content/journals/WCM/WCM287/WCM287.xml")
2007-08-01 11:07:37.044 Info: [Event:id=Relevance IDF] key=0xb7543dddd0628c2e 
scale=8*weight*idf=8*1*log(508/4)=39
2007-08-01 11:07:37.044 Info: [Event:id=Relevance Quality] 
score=scoreSum/weightSum+weight*quality=78/1+1*0=78 
confidence=sqrt(score/(8*maxlogtf*maxidf))=sqrt(78/(8*18*log(508)))=0.294853 
fitness=sqrt(score/(8*maxlogtf*avgidf))=sqrt(78/(8*18*(4.84419/1)))=0.334392 
fragment=doc("/content/journals/WCM/WCM287/WCM287.xml")
2007-08-01 11:07:37.054 Info: [Event:id=Relevance TF] key=0xb7543dddd0628c2e 
score=scale*logtf=39*8=312 
fragment=doc("/content/journals/WCM/WCM280/WCM280.xml")
2007-08-01 11:07:37.054 Info: [Event:id=Relevance Quality] 
score=scoreSum/weightSum+weight*quality=312/1+1*0=312 
confidence=sqrt(score/(8*maxlogtf*maxidf))=sqrt(312/(8*18*log(508)))=0.589706 
fitness=sqrt(score/(8*maxlogtf*avgidf))=sqrt(312/(8*18*(4.84419/1)))=0.668784 
fragment=doc("/content/journals/WCM/WCM280/WCM280.xml")

** DROP FIELD field_sect1 **

2007-08-01 11:07:37.474 Info: Merged 76 MB in 15 sec at 5 MB/sec to 
D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003c
2007-08-01 11:07:37.755 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003b
2007-08-01 11:07:37.885 Info: Deleted D:\Documents and 
Settings\atownsen\MarkLogic\ForestOne\Forests\ForestOne\0000003a


** REPEAT SIMPLE SEARCH FOR THE TERM "Hermite" **
(:
let $q2 := cts:word-query("hermite")
return cts:search(fn:doc(),$q2)
:)
Notice second line "8*weight*idf=8*1*log(316/2)=41" indicating idf = 
log(316/2)
        Is that 316 fragments, with 2 containing the search term??
        idf has changed back to log(316/2)=41 after deleting field -- ??
Notice third line - TF for ".../WCM287.xml"
        "score=scale*logtf=41*2=82" indicating logtf=2 for this fragment 
        logtf apparently still 2 
Notice fifth line - TF for ".../WCM280.xml"
        "score=scale*logtf=41*8=328" indicating logtf=8 for this fragment
        logtf apparently still 8

2007-08-01 11:09:06.612 Info: [Event:id=Relevance Term] key=0xb7543dddd0628c2e 
text="hermite"
2007-08-01 11:09:06.632 Info: [Event:id=Relevance IDF] key=0xb7543dddd0628c2e 
scale=8*weight*idf=8*1*log(316/2)=41
2007-08-01 11:09:06.632 Info: [Event:id=Relevance TF] key=0xb7543dddd0628c2e 
score=scale*logtf=41*2=82 
fragment=doc("/content/journals/WCM/WCM287/WCM287.xml")
2007-08-01 11:09:06.632 Info: [Event:id=Relevance Quality] 
score=scoreSum/weightSum+weight*quality=82/1+1*0=82 
confidence=sqrt(score/(8*maxlogtf*maxidf))=sqrt(82/(8*18*log(316)))=0.314539 
fitness=sqrt(score/(8*maxlogtf*avgidf))=sqrt(82/(8*18*(5.06259/1)))=0.335381 
fragment=doc("/content/journals/WCM/WCM287/WCM287.xml")
2007-08-01 11:09:06.632 Info: [Event:id=Relevance TF] key=0xb7543dddd0628c2e 
score=scale*logtf=41*8=328 
fragment=doc("/content/journals/WCM/WCM280/WCM280.xml")
2007-08-01 11:09:06.632 Info: [Event:id=Relevance Quality] 
score=scoreSum/weightSum+weight*quality=328/1+1*0=328 
confidence=sqrt(score/(8*maxlogtf*maxidf))=sqrt(328/(8*18*log(316)))=0.629079 
fitness=sqrt(score/(8*maxlogtf*avgidf))=sqrt(328/(8*18*(5.06259/1)))=0.670763 
fragment=doc("/content/journals/WCM/WCM280/WCM280.xml")
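
The products in the "Relevance TF" events can be re-checked mechanically 
when comparing runs. The Python sketch below pulls the three integers out 
of a "score=scale*logtf=a*b=c" expression and reports whether a*b equals 
c. The regular expression and the function name check_tf_expr are 
illustrative only; the sample strings are copied from the trace events 
above.

```python
import re

# Matches the arithmetic in a "Relevance TF" event,
# e.g. score=scale*logtf=41*3=123
TF_EXPR = re.compile(r"score=scale\*logtf=(\d+)\*(\d+)=(\d+)")

def check_tf_expr(text):
    """Return (scale, logtf, score, consistent) for one TF expression."""
    match = TF_EXPR.search(text)
    if match is None:
        raise ValueError("no score=scale*logtf expression found")
    scale, logtf, score = (int(group) for group in match.groups())
    return scale, logtf, score, scale * logtf == score

# The two TF events logged after the field was dropped.
for expr in ["score=scale*logtf=41*2=82", "score=scale*logtf=41*8=328"]:
    print(check_tf_expr(expr))
```

Running the same check over the TF events from all three searches makes 
it easy to see which of scale and logtf moved between runs.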
