Re: [MarkLogic Dev General] Trying to add a rule to xqdt plug-in

2012-01-03 Thread Raghu
I would like to validate my XQuery against a set of rules when saving the
file in Eclipse. Say I declare a variable in XQuery and never use
it; I would like to raise a warning saying this variable is never read
locally, similar to how Java is compiled and warnings are displayed.

On Jan 3, 2012 9:51 PM, "Geert Josten"  wrote:

Hi Raghu,



Not quite sure I understand what you are trying to achieve. Can you
elaborate on what you are trying to do? Are you talking about something
like the Java imports cleanup of Eclipse, but applied to declarations in an
XQuery module? Or is it more like a cleanup of your output?



Did you consider posting an issue at
https://bugs.eclipse.org/bugs/enter_bug.cgi?product=WTP%20Incubator&component=wtp.inc.xquery?



Kind regards,

Geert



*From:* general-boun...@developer.marklogic.com [mailto:
general-boun...@developer.marklogic.com] *On behalf of* Raghu
*Sent:* Tuesday, 3 January 2012 16:26
*To:* General MarkLogic Developer Discussion
*Subject:* [MarkLogic Dev General] Trying to add a rule to xqdt plug-in





Hi All,



  I am trying to add a rule to xqdt plugin (e.g) to remove unused
namespace...

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Search documents without a node tag

2012-01-03 Thread Evan Lenz
You should be able to pass in the not-query using search:search's 
<additional-query> option, if that's what you want.

But if you want to translate "title:null" to a not query, I think you'd need to 
create a custom constraint. Here's a blog article by Dave Cassel about those: 
http://blog.davidcassel.net/2011/07/a-custom-facet-for-the-search-api/

If you only need it to be a constraint (and not a facet), all you need to 
implement is the search:parse() function, so it would look something like this 
(untested):

declare function facet:parse(
  $constraint-qtext as xs:string,
  $right as schema-element(cts:query))
as schema-element(cts:query)
{
  let $text := string($right//cts:text),
      $query :=
        if ($text eq 'null')
        then cts:not-query(cts:element-query(xs:QName("TITLE"), cts:and-query(())))
        else cts:element-query(xs:QName("TITLE"), $text)
  return
    (: wrap the cts:query value in a temporary parent element and select
       its child; the wrapper name "x" is arbitrary :)
    <x>{$query}</x>/*
};

The <x> element is just used as a temporary parent element for converting 
the cts:query value to its XML element form.

If you're getting unexpected results from negated queries, you need to make 
sure that the original query does not include any false positives. False 
positives are normally fine (because they can get filtered out later). But if 
you negate the query using cts:not-query (which means get all documents except 
those matching this query), then false positives turn into false negatives and 
hence missing results (and filtering can't help you there as it only removes 
candidate results). Moral of the story with cts:not-query() is to use it very 
cautiously—only in places where you know the negated query won't return false 
positives. A query for the presence of <TITLE> should fall into that category 
(fully resolvable from the Universal Index).
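
As a rough illustration of the first suggestion (passing the not-query through
the additional-query option), something like the following might work. It is an
untested sketch: the query text "EFE" is only a placeholder, and the TITLE
element name is taken from the original question.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Assumption: documents use a TITLE element, as in the original question. :)
let $no-title :=
  cts:not-query(
    cts:element-query(xs:QName("TITLE"), cts:and-query(())))
return
  search:search(
    "EFE",  (: placeholder query text :)
    <options xmlns="http://marklogic.com/appservices/search">
      <additional-query>{$no-title}</additional-query>
    </options>)
```

The additional query is combined with whatever the query text parses to, so it
restricts the results without appearing in the user's search string.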

Evan Lenz
Software Developer, Community
MarkLogic Corporation
http://developer.marklogic.com


From: Michael Sokolov <soko...@ifactory.com>
Reply-To: General MarkLogic Developer Discussion 
<general@developer.marklogic.com>
Date: Tue, 3 Jan 2012 06:41:00 -0800
To: General MarkLogic Developer Discussion 
<general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Search documents without a node tag

You can do cts:not-query(cts:element-query(xs:QName("TITLE"), 
cts:and-query(()))) to find documents that don't have any TITLE elements.  I 
don't know how to do that in the search api, though.

-Mike

On 1/3/2012 8:48 AM, Mariano Grau Calín wrote:
Hi all,

We want to find documents in which a given element does not exist.
For example, this query returns documents where TITLE has a value, which is the 
opposite of what we want:

doc()/DOC[AGENCY='EFE' and TITLE]

If we try to negate the query, the result is not what we expect:

doc()/DOC[AGENCY='Grupo Joly' and TITLE[text() is null]] (: same result :)

doc()/DOC[AGENCY='EFE' and not(TITLE)] (: all documents, whether or not TITLE 
exists :)

doc()/DOC[AGENCY='EFE' and TITLE='']  (: zero results!!! :)

Ideally, we'd like to define a constraint in the Search API and write something 
like:

search:search('title:null age:efe')

Or otherwise an additional-query with cts:query code.


Regards,

Mariano Grau
mgrau @ grupojoly.com
Dpto. Sistemas
Grupo Joly






Re: [MarkLogic Dev General] How to use near query in search:search function

2012-01-03 Thread Mariano Grau Calín
Thanks,

I had not seen any NEAR examples for the search:search function. 
Fortunately we are running that version.


Mariano Grau
Dpto. Sistemas
Grupo Joly



-----Original message-----
From: general-boun...@developer.marklogic.com on behalf of Colleen Whitney
Sent: Tue 03/01/2012 18:17
To: General MarkLogic Developer Discussion
CC: general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] How to use near query in search:search function
 
Mariano, starting with version 4.2 there are two near query operators built in: 
 NEAR and NEAR/# (where # is a distance).

So "dog NEAR cat" parses to a near query with the default distance (10), and 
"dog NEAR/2 cat" parses to a near query with a distance of 2.

Sent from my iPhone

On Jan 3, 2012, at 6:07 AM, "Mariano Grau Calín" 
<mg...@grupojoly.com> wrote:

Hi,

Is there any operator in the search:search function for running near-query searches?

Regards,

Mariano Grau
mgrau @  grupojoly.com
Dpto. Sistemas
Grupo Joly



Re: [MarkLogic Dev General] search:parse/unparse not reversible?

2012-01-03 Thread Micah Dubinko
J.J.,

Your understanding of the relationship between parse and unparse is correct.

Which version of the server are you running? This looks like a bug that was 
recently fixed in 4.2-8 and 4.1-12.

-m


On Jan 3, 2012, at 10:22 AM, J.J. Larrea wrote:

> Please correct any misunderstanding I may have, but from the 
> documentation I concluded that search:unparse was meant to reverse the 
> action of search:parse and construct a query string functionally 
> identical to the original (functionally identical because it is 
> reconstructed from the cts:query tree and might not have redundant 
> parentheses and such); If the output of search:unparse is indeed 
> functionally identical then one should be able to reparse it and obtain 
> an identical cts:query tree as obtained from the original search:parse.
> 
> However it looks to me like there are cases where search:unparse does 
> not properly render even a trivial cts:query tree created via 
> search:parse, even using the default grammar returned by 
> search:get-default-options().  Referring to the output from the attached 
> snippet, where $query := 'pos -(neg1 OR neg2)', with a simplified tree 
> notation:
> 
>  (a)  search:parse( $query, $opts ) => AND( pos, NOT( OR( neg1, neg2 ) 
> ) ) as expected.
> 
>  (b)  search:unparse of (a) => omits the parentheses grouping the 
> target of the NOT eg. 'pos -neg1 OR neg2'
> 
>  (c)  search:parse of ( b ) => parses the misrendering as expected eg. 
> AND( pos, OR( NOT( neg1 ), neg2 ) )
> 
> Query trees (a) and (c) are obviously not functionally identical, since 
> in the first case neg1 is prohibited and the second it is required. The 
> same behavior can be seen with -(neg1 neg2), or other cases with a 
> complex cts:not-query target.
> 
> As a workaround I created my own implementation of unparse, and it seems 
> simple enough to parenthesize clauses which don't conform to the default 
> operator precedence simply by passing the parent clause strength down 
> the recursion and parenthesizing whenever a sub-clause's binding 
> strength is less than the parent's; it works for a few test cases at 
> least. Can't the built-in search:unparse do that?
> 
> - J.J. Larrea
> 


[MarkLogic Dev General] search:parse/unparse not reversible?

2012-01-03 Thread J.J. Larrea
Please correct any misunderstanding I may have, but from the 
documentation I concluded that search:unparse was meant to reverse the 
action of search:parse and construct a query string functionally 
identical to the original (functionally identical because it is 
reconstructed from the cts:query tree and might not have redundant 
parentheses and such); If the output of search:unparse is indeed 
functionally identical then one should be able to reparse it and obtain 
an identical cts:query tree as obtained from the original search:parse.


However it looks to me like there are cases where search:unparse does 
not properly render even a trivial cts:query tree created via 
search:parse, even using the default grammar returned by 
search:get-default-options().  Referring to the output from the attached 
snippet, where $query := 'pos -(neg1 OR neg2)', with a simplified tree 
notation:


 (a)  search:parse( $query, $opts ) => AND( pos, NOT( OR( neg1, neg2 ) 
) ) as expected.


 (b)  search:unparse of (a) => omits the parentheses grouping the 
target of the NOT eg. 'pos -neg1 OR neg2'


 (c)  search:parse of ( b ) => parses the misrendering as expected eg. 
AND( pos, OR( NOT( neg1 ), neg2 ) )


Query trees (a) and (c) are obviously not functionally identical, since 
in the first case neg1 is prohibited and the second it is required. The 
same behavior can be seen with -(neg1 neg2), or other cases with a 
complex cts:not-query target.


As a workaround I created my own implementation of unparse, and it seems 
simple enough to parenthesize clauses which don't conform to the default 
operator precedence simply by passing the parent clause strength down 
the recursion and parenthesizing whenever a sub-clause's binding 
strength is less than the parent's; it works for a few test cases at 
least. Can't the built-in search:unparse do that?
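
A minimal sketch of that approach, for illustration only. It is neither the
implementation referred to above nor the built-in search:unparse: the local:
function names and the strength values are assumptions, and only and/or/not and
single-word queries are handled (no phrases, constraints, or options).

```xquery
xquery version "1.0-ml";

(: Binding strength per node type; higher binds tighter.
   The numbers are illustrative assumptions, not read from the grammar. :)
declare function local:strength($q as element()) as xs:integer {
  typeswitch ($q)
    case element(cts:or-query)  return 10
    case element(cts:and-query) return 20
    case element(cts:not-query) return 40
    default return 100
};

(: Render $q as query text; $parent is the strength of the enclosing clause. :)
declare function local:unparse($q as element(), $parent as xs:integer)
as xs:string {
  let $mine := local:strength($q)
  let $text :=
    typeswitch ($q)
      case element(cts:or-query) return
        string-join(for $c in $q/* return local:unparse($c, $mine), " OR ")
      case element(cts:and-query) return
        string-join(for $c in $q/* return local:unparse($c, $mine), " ")
      case element(cts:not-query) return
        concat("-", local:unparse($q/*[1], $mine))
      default return string($q/cts:text)
  return
    (: parenthesize only when this clause binds more weakly than its parent :)
    if ($mine lt $parent) then concat("(", $text, ")") else $text
};

local:unparse(
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query><cts:text>pos</cts:text></cts:word-query>
    <cts:not-query>
      <cts:or-query>
        <cts:word-query><cts:text>neg1</cts:text></cts:word-query>
        <cts:word-query><cts:text>neg2</cts:text></cts:word-query>
      </cts:or-query>
    </cts:not-query>
  </cts:and-query>,
  0)
```

For tree (a) this should give back 'pos -(neg1 OR neg2)', because the OR clause
(strength 10) sits under a NOT (strength 40) and is therefore parenthesized.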


- J.J. Larrea

xquery version "1.0-ml";

import module namespace search="http://marklogic.com/appservices/search";
at "/MarkLogic/appservices/search/search.xqy";

declare variable $query := 'pos -(neg1 OR neg2)';

let $options := search:get-default-options()
let $parsed := search:parse( $query, $options )
let $unparsed := search:unparse( $parsed )
let $reparsed := search:parse( $unparsed, $options )
return (
"Original:",
$query,
"Unparsed:",
$unparsed,
"Parsed Tree:",
$parsed,
"Reparsed Tree:",
$reparsed,
"Options (default):",
$options
)


Re: [MarkLogic Dev General] Invalid entity reference "ndash"

2012-01-03 Thread Geert Josten
Hi Dan,

I think that if you search the docs you will see MarkLogic is quite clear
about what is supported and what not (you just need to find the right
sections ;-).

I disagree that the fact that MarkLogic accepts a DOCTYPE should mean it
should also use it to validate on read (however simple that may seem, which
it isn't). Most systems I worked with during my roughly 15 years of XML
experience didn't do so, nor is *any* XML parser required or even supposed
to do so.

But I do agree that at least reading entities from external DTDs (as well
as handling the encoding in the XML declaration correctly), as most ordinary
XML parsers do, would have been very convenient when uploading DTD-style and
non-Unicode XML, which I have faced a lot myself during those years.
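
For reference, the two workarounds discussed in this thread (declaring the
entity in the internal subset, and the repair option) could look roughly like
this. Both are sketches under assumptions: the /test.xml path is borrowed from
the test earlier in the thread, the inline document is made up, and whether
full repair covers a given entity should be checked on your server version.

```xquery
xquery version "1.0-ml";

(
  (: Workaround 1: declare the entity in the internal (local) subset, the only
     part of the DOCTYPE the server reads. The document is built as a string
     here, so "&amp;" stands for a literal "&" in the XML text. :)
  xdmp:unquote(
    '<!DOCTYPE doc [ <!ENTITY ndash "&amp;#8211;"> ]><doc>1&amp;ndash;2</doc>'),

  (: Workaround 2: request full repair when loading from the filesystem. :)
  xdmp:document-get("/test.xml",
    <options xmlns="xdmp:document-get">
      <repair>full</repair>
    </options>)
)
```

The first expression avoids touching the external DTD at all; the second leaves
the file unchanged but may also "fix" things you did not intend to change.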

Kind regards,
Geert

-----Original message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On behalf of dv...@dvint.com
Sent: Tuesday, 3 January 2012 17:31
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Invalid entity reference "ndash"

Do they say schema support or XML support? Validation is based upon
reading either the schema or the DTD in any fashion it might be referenced
or constructed. So if they provide any DOCTYPE support which would only be
DTDs, then I would expect it to be very easy to say read and validate as
configured.

This would have complicated my project had it gone through. It would not have
shut it down, but I would have been disappointed in the support for something I
would consider key for any documentation project.

..dan


> Hi Dan,
>
> It surprised me a bit too. But I am not sure the XML rec requires XML parsers
> to support DTDs at all (I can't seem to find the relevant section). But
> MarkLogic Server has very good XML Schema support, so I wouldn't say it
> doesn't validate at all. It just focuses on XML Schema instead of DTD
> (not both).
>
> Kind regards,
> Geert
>
> -----Original message-----
> From: general-boun...@developer.marklogic.com
> [mailto:general-boun...@developer.marklogic.com] On behalf of dv...@dvint.com
> Sent: Tuesday, 3 January 2012 16:53
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Invalid entity reference "ndash"
>
> That is an interesting limitation I was not aware of. It works with XML
> documents but does not provide full validation capabilities, or the
> ability to work with valid documents as is.
>
> I got some initial training and an introduction to MarkLogic but then never
> got the project to actually implement anything.
>
> ..dan
>
>
>> Hi John,
>>
>>
>>
>> MarkLogic Server has only very limited handling of DOCTYPE declarations. Only
>> entity declarations in the local subset are parsed and used. References to any
>> external entity or DTD file are ignored. That is why a DTD reference doesn't
>> work.
>> Ron gave a workaround (I posted similar code to handle mixed
>> encodings some while ago, by the way), but that is pretty expensive if you
>> need to load many docs. If you need to load many docs, you might prefer to
>> use xmlsh or recordloader or any of the other available tools to insert
>> your data. These have better support for DOCTYPEs.
>>
>>
>>
>> I do recall another workaround, which might be acceptable for you. There is
>> a repair option that defaults to none. If you change it to full, it
>> should allow most of the ISO entities and convert them to the appropriate
>> Unicode characters automatically. The full repair might do more than you
>> need though, in case the XML is not well-formed.
>>
>>
>>
>> Kind regards,
>>
>> Geert
>>
>>
>>
>> *From:* general-boun...@developer.marklogic.com [mailto:
>> general-boun...@developer.marklogic.com] *On behalf of* John Zhong
>> *Sent:* Friday, 30 December 2011 18:17
>> *To:* General MarkLogic Developer Discussion
>> *Subject:* Re: [MarkLogic Dev General] Invalid entity reference "ndash"
>>
>>
>>
>> Yes, I made sure the DTD and the associated .ent files are in the correct
>> location. And I was saying the xdmp:document-get function does not work in
>> this case.
>>
>> Actually, I did a simple test by defining a simple XML file and DTD:
>>
>> test.dtd:
>>
>> 
>> 
>>
>> test.xml:
>>
>> 
>> 
>> –
>>
>> I can open the XML file in IE without a problem (it shows an error if I delete
>> the entity definition in test.dtd). I then tested the xdmp:document-get
>> function again, and it shows an error:
>>
>> [1.0-ml] XDMP-DOCENTITYREF: xdmp:document-get("/test.xml") -- Invalid
>> entity reference "ndash" at /test.xml line 3
>>
>> John
>>
>> On Sat, Dec 31, 2011 at 12:17 AM, Dan Vint  wrote:
>>
>> You need to make sure the external text entity is
>> being read (or that the file being referenced is
>> being found). You may not actually be reading in
>> the DTD file itself. The way it is configured is
>> correct, your code just needs to make sure it is
>> expanding the doctype via the system or public
>> identifier (journalpublishing3.dtd or via XML
>> catalog //NLM//DTD Jo

Re: [MarkLogic Dev General] How to use near query in search:search function

2012-01-03 Thread Colleen Whitney
Mariano, starting with version 4.2 there are two near query operators built in: 
 NEAR and NEAR/# (where # is a distance).

So "dog NEAR cat" parses to a near query with the default distance (10), and 
"dog NEAR/2 cat" parses to a near query with a distance of 2.
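
A quick way to see what these operators produce is to parse the query text and
inspect the result. A minimal sketch, assuming version 4.2 or later and the
default options; the search terms are just the ones from the example above.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(
  (: near query with the default distance :)
  search:parse("dog NEAR cat"),
  (: near query with an explicit distance of 2 :)
  search:parse("dog NEAR/2 cat"),
  (: the same text can be passed straight to search:search :)
  search:search("dog NEAR/2 cat")
)
```

The first two items are the serialized cts:query trees, which makes it easy to
confirm the distance that was actually applied.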

Sent from my iPhone

On Jan 3, 2012, at 6:07 AM, "Mariano Grau Calín" 
<mg...@grupojoly.com> wrote:

Hi,

Is there any operator in the search:search function for running near-query searches?

Regards,

Mariano Grau
mgrau @  grupojoly.com
Dpto. Sistemas
Grupo Joly



Re: [MarkLogic Dev General] Invalid entity reference "ndash"

2012-01-03 Thread dvint
Do they say schema support or XML support? Validation is based upon
reading either the schema or the DTD in any fashion it might be referenced
or constructed. So if they provide any DOCTYPE support which would only be
DTDs, then I would expect it to be very easy to say read and validate as
configured.

This would have complicated my project had it gone through. It would not have
shut it down, but I would have been disappointed in the support for something I
would consider key for any documentation project.

..dan


> Hi Dan,
>
> It surprised me a bit too. But I am not sure the XML rec requires XML parsers
> to support DTDs at all (I can't seem to find the relevant section). But
> MarkLogic Server has very good XML Schema support, so I wouldn't say it
> doesn't validate at all. It just focuses on XML Schema instead of DTD
> (not both).
>
> Kind regards,
> Geert
>
> -----Original message-----
> From: general-boun...@developer.marklogic.com
> [mailto:general-boun...@developer.marklogic.com] On behalf of dv...@dvint.com
> Sent: Tuesday, 3 January 2012 16:53
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Invalid entity reference "ndash"
>
> That is an interesting limitation I was not aware of. It works with XML
> documents but does not provide full validation capabilities, or the
> ability to work with valid documents as is.
>
> I got some initial training and an introduction to MarkLogic but then never
> got the project to actually implement anything.
>
> ..dan
>
>
>> Hi John,
>>
>>
>>
>> MarkLogic Server has only very limited handling of DOCTYPE declarations. Only
>> entity declarations in the local subset are parsed and used. References to any
>> external entity or DTD file are ignored. That is why a DTD reference doesn't
>> work.
>> Ron gave a workaround (I posted similar code to handle mixed
>> encodings some while ago, by the way), but that is pretty expensive if you
>> need to load many docs. If you need to load many docs, you might prefer to
>> use xmlsh or recordloader or any of the other available tools to insert
>> your data. These have better support for DOCTYPEs.
>>
>>
>>
>> I do recall another workaround, which might be acceptable for you. There is
>> a repair option that defaults to none. If you change it to full, it
>> should allow most of the ISO entities and convert them to the appropriate
>> Unicode characters automatically. The full repair might do more than you
>> need though, in case the XML is not well-formed.
>>
>>
>>
>> Kind regards,
>>
>> Geert
>>
>>
>>
>> *From:* general-boun...@developer.marklogic.com [mailto:
>> general-boun...@developer.marklogic.com] *On behalf of* John Zhong
>> *Sent:* Friday, 30 December 2011 18:17
>> *To:* General MarkLogic Developer Discussion
>> *Subject:* Re: [MarkLogic Dev General] Invalid entity reference "ndash"
>>
>>
>>
>> Yes, I made sure the DTD and the associated .ent files are in the correct
>> location. And I was saying the xdmp:document-get function does not work in
>> this case.
>>
>> Actually, I did a simple test by defining a simple XML file and DTD:
>>
>> test.dtd:
>>
>> 
>> 
>>
>> test.xml:
>>
>> 
>> 
>> –
>>
>> I can open the XML file in IE without a problem (it shows an error if I delete
>> the entity definition in test.dtd). I then tested the xdmp:document-get
>> function again, and it shows an error:
>>
>> [1.0-ml] XDMP-DOCENTITYREF: xdmp:document-get("/test.xml") -- Invalid
>> entity reference "ndash" at /test.xml line 3
>>
>> John
>>
>> On Sat, Dec 31, 2011 at 12:17 AM, Dan Vint  wrote:
>>
>> You need to make sure the external text entity is
>> being read (or that the file being referenced is
>> being found). You may not actually be reading in
>> the DTD file itself. The way it is configured is
>> correct, your code just needs to make sure it is
>> expanding the doctype via the system or public
>> identifier (journalpublishing3.dtd or via XML
>> catalog //NLM//DTD Journal Publishing DTD v3.0
>> 20080202//EN") and associated files before it starts parsing the
>> content.
>>
>> ..dan
>>
>>
>>
>> At 07:28 AM 12/30/2011, you wrote:
>>>Thank you for your note, Erik.
>>>
>>>And thank you for your code, Ron. Yes, I did try that and it worked.
>>>
>>>The xml file has already had a dtd declaration
>Publishing DTD v3.0 20080202//EN"
>>>"journalpublishing3.dtd">, and the dtd file also
>>>reference to a isopub.ent file that defined this
>>>entity.
>>>
>>>So, it seems I have to make the entity definition appear in the xml:
>>>
>Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd"
>>>[]
>>> >
>>>
>>>John
>>>
>>>On Fri, Dec 30, 2011 at 10:42 PM, Ron Hitchens
>>><r...@ronsoft.com> wrote:
>>>
>>>  Try this code, adjusting the name of your root
>>>node as needed.  Other entities could also be defined
>>>in the doc-type header.
>>>
>>>  This loads the XML first as text, prepends a doc-type
>>>header that defines the entity and then parses the result
>>>as XML.  This requires making an extra copy of the 

Re: [MarkLogic Dev General] Is MarkLogic susceptible to the hash collision attack?

2012-01-03 Thread David Lee
From the details of the report, it looks like you need to know details of the 
hashcode implementation as well as the hash table code, if in fact a hash table 
is used.
It is very unlikely the same exact exploit would work across systems.
Also, I'm very skeptical. Even a badly written hash table shouldn't perform 
as badly as indicated with only thousands of collisions: 90 seconds of CPU for 
a few thousand entries?


Sent from my iPad (excuse the terseness) 
David A Lee
d...@calldei.com

On Jan 3, 2012, at 11:12 AM, Geert Josten  wrote:

> Ryan,
>  
> Do you recall whether Apache HTTPD was mentioned, by any chance?
>  
> Kind regards,
> Geert
>  
> From: general-boun...@developer.marklogic.com 
> [mailto:general-boun...@developer.marklogic.com] On behalf of seme...@hotmail.com
> Sent: Tuesday, 3 January 2012 16:56
> To: general@developer.marklogic.com
> Subject: Re: [MarkLogic Dev General] Is MarkLogic susceptible to the hash 
> collision attack?
>  
> I haven't been able to reproduce this problem on a MarkLogic instance. My 
> concerns about it for MarkLogic have been assuaged.
> 
> From: geert.jos...@dayon.nl
> Date: Tue, 3 Jan 2012 15:54:47 +0100
> To: general@developer.marklogic.com
> Subject: Re: [MarkLogic Dev General] Is MarkLogic susceptible to the hash 
> collision attack?
> 
> Hi Ryan,
>  
> Have you tried? (at home preferably ;)
>  
> Kind regards,
> Geert
>  
> From: general-boun...@developer.marklogic.com 
> [mailto:general-boun...@developer.marklogic.com] On behalf of seme...@hotmail.com
> Sent: Thursday, 29 December 2011 18:16
> To: general@developer.marklogic.com
> Subject: [MarkLogic Dev General] Is MarkLogic susceptible to the hash 
> collision attack?
>  
> Quote:
> 
> Researchers have shown how a flaw that is common to most popular Web 
> programming languages can be used to launch denial-of-service attacks by 
> exploiting hash tables. Announced publicly on Wednesday at the Chaos 
> Communication Congress event in Germany, the flaw affects a long list of 
> technologies, including PHP, ASP.NET, Java, Python, Ruby, Apache Tomcat, 
> Apache Geronimo, Jetty, and Glassfish, as well as Google's open source 
> JavaScript engine V8. The vendors and developers behind these technologies 
> are working to close the vulnerability, with Microsoft warning of "imminent 
> public release of exploit code" for what is known as a hash collision attack.
> 
> ...
> 
> "Hash tables are a commonly used data structure in most programming 
> languages," they explained. "Web application servers or platforms commonly 
> parse attacker-controlled POST form data into hash tables automatically, so 
> that they can be accessed by application developers. If the language does not 
> provide a randomized hash function or the application server does not 
> recognize attacks using multi-collisions, an attacker can degenerate the hash 
> table by sending lots of colliding keys. The algorithmic complexity of 
> inserting n elements into the table then goes to O(n**2), making it possible 
> to exhaust hours of CPU time using a single HTTP request."
> 
> more-> 
> http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars
> 
> Seems to be a big deal with a lot of servers. Is MarkLogic affected?
> 
> thanks,
> Ryan
> 


Re: [MarkLogic Dev General] Trying to add a rule to xqdt plug-in

2012-01-03 Thread Geert Josten
Hi Raghu,



Not quite sure I understand what you are trying to achieve. Can you
elaborate on what you are trying to do? Are you talking about something
like the Java imports cleanup of Eclipse, but applied to declarations in an
XQuery module? Or is it more like a cleanup of your output?



Did you consider posting an issue at
https://bugs.eclipse.org/bugs/enter_bug.cgi?product=WTP%20Incubator&component=wtp.inc.xquery?



Kind regards,

Geert



*From:* general-boun...@developer.marklogic.com [mailto:
general-boun...@developer.marklogic.com] *On behalf of* Raghu
*Sent:* Tuesday, 3 January 2012 16:26
*To:* General MarkLogic Developer Discussion
*Subject:* [MarkLogic Dev General] Trying to add a rule to xqdt plug-in



Hi All,



  I am trying to add a rule to the xqdt plugin, e.g. to remove unused
namespaces/variables. Is there a simpler way to do it? Can somebody please
point me to an XQuery parser or anything else that would help me accomplish
this task? I already tried downloading the xqdt code, but there are too many
dependencies.



Thanks in advance

Raghu


Re: [MarkLogic Dev General] Invalid entity reference "ndash"

2012-01-03 Thread Geert Josten
Hi Dan,

It surprised me a bit too. But I am not sure the XML rec requires XML parsers
to support DTDs at all (I can't seem to find the relevant section). But
MarkLogic Server has very good XML Schema support, so I wouldn't say it
doesn't validate at all. It just focuses on XML Schema instead of DTD
(not both).

Kind regards,
Geert

-----Original message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On behalf of dv...@dvint.com
Sent: Tuesday, 3 January 2012 16:53
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Invalid entity reference "ndash"

That is an interesting limitation I was not aware of. It works with XML
documents but does not provide full validation capabilities, or the
ability to work with valid documents as is.

I got some initial training and an introduction to MarkLogic but then never
got the project to actually implement anything.

..dan


> Hi John,
>
>
>
> MarkLogic Server has only very limited handling of DOCTYPE declarations. Only
> entity declarations in the local subset are parsed and used. References to any
> external entity or DTD file are ignored. That is why a DTD reference doesn't
> work.
> Ron gave a workaround (I posted similar code to handle mixed
> encodings some while ago, by the way), but that is pretty expensive if you
> need to load many docs. If you need to load many docs, you might prefer to
> use xmlsh or recordloader or any of the other available tools to insert
> your data. These have better support for DOCTYPEs.
>
>
>
> I do recall another workaround, which might be acceptable for you. There is
> a repair option that defaults to none. If you change it to full, it
> should allow most of the ISO entities and convert them to the appropriate
> Unicode characters automatically. The full repair might do more than you
> need though, in case the XML is not well-formed.
>
>
>
> Kind regards,
>
> Geert
>
>
>
> *From:* general-boun...@developer.marklogic.com [mailto:
> general-boun...@developer.marklogic.com] *On behalf of* John Zhong
> *Sent:* Friday, 30 December 2011 18:17
> *To:* General MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Invalid entity reference "ndash"
>
>
>
> Yes, I made sure the DTD and the associated .ent files are in the correct
> location. And I was saying the xdmp:document-get function does not work in
> this case.
>
> Actually, I did a simple test by defining a simple XML file and DTD:
>
> test.dtd:
>
> 
> 
>
> test.xml:
>
> 
> 
> –
>
> I can open the XML file in IE without a problem (it shows an error if I delete
> the entity definition in test.dtd). I then tested the xdmp:document-get
> function again, and it shows an error:
>
> [1.0-ml] XDMP-DOCENTITYREF: xdmp:document-get("/test.xml") -- Invalid
> entity reference "ndash" at /test.xml line 3
>
> John
>
> On Sat, Dec 31, 2011 at 12:17 AM, Dan Vint  wrote:
>
> You need to make sure the external text entity is
> being read (or that the file being referenced is
> being found). You may not actually be reading in
> the DTD file itself. The way it is configured is
> correct, your code just needs to make sure it is
> expanding the doctype via the system or public
> identifier (journalpublishing3.dtd or via XML
> catalog //NLM//DTD Journal Publishing DTD v3.0
> 20080202//EN") and associated files before it starts parsing the
> content.
>
> ..dan
>
>
>
> At 07:28 AM 12/30/2011, you wrote:
>>Thank you for your note, Erik.
>>
>>And thank you for your code, Ron. Yes, I did try that and it worked.
>>
>>The xml file has already had a dtd declaration
>>>Publishing DTD v3.0 20080202//EN"
>>"journalpublishing3.dtd">, and the dtd file also
>>reference to a isopub.ent file that defined this
>>entity.
>>
>>So, it seems I have to make the entity definition appear in the xml:
>>
>>>Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd"
>>[]
>> >
>>
>>John
>>
>>On Fri, Dec 30, 2011 at 10:42 PM, Ron Hitchens
>><r...@ronsoft.com> wrote:
>>
>>  Try this code, adjusting the name of your root
>>node as needed.  Other entities could also be defined
>>in the doc-type header.
>>
>>  This loads the XML first as text, prepends a doc-type
>>header that defines the entity and then parses the result
>>as XML.  This requires making an extra copy of the document
>>in memory, so it could bump against memory limits if you
>>do it in volume with lots of large documents.
>>
>>  Handy list of entity definitions here:
>>http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
>>
>>
>>xquery version '1.0-ml';
>>
>>declare variable $file-path := "/tmp/z.xml";
>>Â  (: path of XML doc in filesystem :)
>>
>>declare variable $document-uri :=
>
>>"/test/mydoc.xml"; Â  (: URI to insert it as in MarkLogic :)
>>
>>declare variable $load-options :=
>>Â  Â  Â  Â  Â  Â (: force loading as text, not as XML :)
>>Â  Â 
>>Â  Â  Â text
>

Re: [MarkLogic Dev General] Is MarkLogic susceptible to the hash collision attack?

2012-01-03 Thread Geert Josten
Ryan,



Do you recall whether there was any mention of Apache HTTPD, by any chance?



Kind regards,

Geert



*Van:* general-boun...@developer.marklogic.com [mailto:
general-boun...@developer.marklogic.com] *Namens *seme...@hotmail.com
*Verzonden:* dinsdag 3 januari 2012 16:56
*Aan:* general@developer.marklogic.com
*Onderwerp:* Re: [MarkLogic Dev General] Is MarkLogic susceptible to the
hash collision attack?



I haven't been able to produce this problem on a MarkLogic instance. My
concerns have been assuaged about it for MarkLogic.
--

From: geert.jos...@dayon.nl
Date: Tue, 3 Jan 2012 15:54:47 +0100
To: general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] Is MarkLogic susceptible to the hash
collision attack?

Hi Ryan,



Have you tried? (at home preferably ;)



Kind regards,

Geert



*Van:* general-boun...@developer.marklogic.com [mailto:
general-boun...@developer.marklogic.com] *Namens *seme...@hotmail.com
*Verzonden:* donderdag 29 december 2011 18:16
*Aan:* general@developer.marklogic.com
*Onderwerp:* [MarkLogic Dev General] Is MarkLogic susceptible to the hash
collision attack?



Quote:

Researchers have shown how a flaw that is common to most popular Web
programming languages can be used to launch denial-of-service attacks by
exploiting hash tables. Announced publicly on Wednesday at the Chaos
Communication Congress event in Germany, the flaw affects a long list of
technologies, including
PHP, ASP.NET, Java, Python, Ruby, Apache Tomcat, Apache Geronimo, Jetty,
and Glassfish, as well as Google's open source JavaScript engine V8. The
vendors and developers behind these technologies are working to close the
vulnerability, with Microsoft warning of "imminent public release of
exploit code" for what is known as a hash collision attack.

...

"Hash tables are a commonly used data structure in most programming
languages," they explained. "Web application servers or platforms commonly
parse attacker-controlled POST form data into hash tables automatically, so
that they can be accessed by application developers. If the language does
not provide a randomized hash function or the application server does not
recognize attacks using multi-collisions, an attacker can degenerate the
hash table by sending lots of colliding keys. The algorithmic complexity of
inserting n elements into the table then goes to O(n**2), making it
possible to exhaust hours of CPU time using a single HTTP request."

more->
http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars

Seems to be a big deal with a lot of servers. Is MarkLogic affected?

thanks,
Ryan


___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Is MarkLogic susceptible to the hash collision attack?

2012-01-03 Thread seme...@hotmail.com

I haven't been able to produce this problem on a MarkLogic instance. My 
concerns have been assuaged about it for MarkLogic.

From: geert.jos...@dayon.nl
Date: Tue, 3 Jan 2012 15:54:47 +0100
To: general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] Is MarkLogic susceptible to the hash 
collision attack?



Hi Ryan,
 Have you tried? (at home preferably ;)
 Kind regards,
Geert 
Van: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] Namens seme...@hotmail.com

Verzonden: donderdag 29 december 2011 18:16
Aan: general@developer.marklogic.com
Onderwerp: [MarkLogic Dev General] Is MarkLogic susceptible to the hash 
collision attack?
 Quote:

Researchers have shown how a flaw that is common to most popular Web 
programming languages can be used to launch denial-of-service attacks by 
exploiting hash tables. Announced publicly on Wednesday at the Chaos 
Communication Congress event in Germany, the flaw affects a long list of 
technologies, including PHP, ASP.NET, Java, Python, Ruby, Apache Tomcat, Apache 
Geronimo, Jetty, and Glassfish, as well as Google's open source JavaScript 
engine V8. The vendors and developers behind these technologies are working to 
close the vulnerability, with Microsoft warning of "imminent public release of 
exploit code" for what is known as a hash collision attack.


...

"Hash tables are a commonly used data structure in most programming languages," 
they explained. "Web application servers or platforms commonly parse 
attacker-controlled POST form data into hash tables automatically, so that they 
can be accessed by application developers. If the language does not provide a 
randomized hash function or the application server does not recognize attacks 
using multi-collisions, an attacker can degenerate the hash table by sending 
lots of colliding keys. The algorithmic complexity of inserting n elements into 
the table then goes to O(n**2), making it possible to exhaust hours of CPU time 
using a single HTTP request."


more-> 
http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars


Seems to be a big deal with a lot of servers. Is MarkLogic affected?

thanks,
Ryan

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general 


Re: [MarkLogic Dev General] Invalid entity reference "ndash"

2012-01-03 Thread dvint
That is an interesting limitation I was not aware of. It works with XML
documents but does not provide full validation capabilities, or the
ability to work with valid documents as is.

I got some initial training and an introduction to MarkLogic but then never
got the project to actually implement anything.

..dan


> Hi John,
>
>
>
> MarkLogic Server has only very limited support for DOCTYPE declarations.
> Only entity declarations in the local subset are parsed and used.
> References to any external entity or dtd file are ignored. That is why a
> dtd ref doesn’t work.
> Ron gave a work-around (I posted similar code to handle mixed encodings
> a while ago, by the way), but that is pretty expensive if you need to
> load many docs. In that case, you might prefer to use xmlsh or
> recordloader or any of the other available tools to insert your data.
> These have better support for DOCTYPEs.
>
>
>
> I do recall another workaround, which might be acceptable for you. There
> is a repair option that defaults to none. If you change it to full, it
> should allow most of the iso entities and convert them to the appropriate
> Unicode characters automatically. The full repair might do more than you
> need, though, in case the xml is not well-formed.
>
>
>
> Kind regards,
>
> Geert
>
>
>
> *Van:* general-boun...@developer.marklogic.com [mailto:
> general-boun...@developer.marklogic.com] *Namens *John Zhong
> *Verzonden:* vrijdag 30 december 2011 18:17
> *Aan:* General MarkLogic Developer Discussion
> *Onderwerp:* Re: [MarkLogic Dev General] Invalid entity reference "ndash"
>
>
>
> Yes, I make sure the dtd and the associated ent files are in the correct
> location. And I was saying the xdmp:document-get function does not work in
> this case.
>
> Actually, I did a simple test by defining a simple xml and dtd:
>
> test.dtd:
>
> 
> 
>
> test.xml:
>
> 
> 
> –
>
> I can open the xml by ie without problem (it shows error if I delete the
> entity definition in test.dtd), then tested xdmp:document-get function
> again, it shows a error:
>
> [1.0-ml] XDMP-DOCENTITYREF: xdmp:document-get("/test.xml") -- Invalid
> entity reference "ndash" at /test.xml line 3
>
> John
>
> On Sat, Dec 31, 2011 at 12:17 AM, Dan Vint  wrote:
>
> You need to make sure the external text entity is
> being read (or that the file being referenced is
> being found). You may not actually be reading in
> the DTD file itself. The way it is configured is
> correct, your code just needs to make sure it is
> expanding the doctype via the system or public
> identifier (journalpublishing3.dtd or via XML
> catalog //NLM//DTD Journal Publishing DTD v3.0
> 20080202//EN") and associated files before it starts parsing the content.
>
> ..dan
>
>
>
> At 07:28 AM 12/30/2011, you wrote:
>>Thank you for your note, Erik.
>>
>>And thank you for your code, Ron. Yes, I did try that and it worked.
>>
>>The xml file already has a dtd declaration,
>><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN"
>>"journalpublishing3.dtd">, and the dtd file also
>>references an isopub.ent file that defines this
>>entity.
>>
>>So, it seems I have to make the entity definition appear in the xml:
>>
>><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd"
>>[<!ENTITY ndash "–">]
>> >
>>
>>John
>>
>>On Fri, Dec 30, 2011 at 10:42 PM, Ron Hitchens <r...@ronsoft.com> wrote:
>>
>>  Try this code, adjusting the name of your root
>>node as needed.  Other entities could also be defined
>>in the doc-type header.
>>
>>  This loads the XML first as text, prepends a doc-type
>>header that defines the entity and then parses the result
>>as XML.  This requires making an extra copy of the document
>>in memory, so it could bump against memory limits if you
>>do it in volume with lots of large documents.
>>
>>  Handy list of entity definitions here:
>><http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>
>>
>>
>>xquery version '1.0-ml';
>>
>>declare variable $file-path := "/tmp/z.xml";
>>  (: path of XML doc in filesystem :)
>>
>>declare variable $document-uri :=
>>"/test/mydoc.xml";   (: URI to insert it as in MarkLogic :)
>>
>>declare variable $load-options :=
>>           (: force loading as text, not as XML :)
>>   <options xmlns="xdmp:document-get">
>>     <format>text</format>
>>   </options>;
>>
>>declare variable $doctype-decl as xs:string :=
>>  (: adjust root node name and add entities as needed :)
>>  '<!DOCTYPE root [<!ENTITY ndash "–">]>';
>>
>>let $doc-as-text := xdmp:document-get ($file-path, $load-options)
>>let $doc-with-decl := fn:concat ($doctype-decl, $doc-as-text)
>>let $doc := xdmp:unquote ($doc-with-decl)
>>
>>return xdmp:document-insert ($document-uri, $doc)
>>
>>
>>On Dec 30, 2011, at 6:05 AM, John Zhong wrote:
>>
>> > Thanks for your quick answer, Harry.
>> >
>> > But how if I don't want to modify the original xml?
>> >
>> > Thanks,
>> > John
>> >
>> > On Fri, Dec 30, 2011 at 1:59 PM, Harry B.
>
>> <

[MarkLogic Dev General] Trying to add a rule to xqdt plug-in

2012-01-03 Thread Raghu
Hi All,

  I am trying to add a rule to the xqdt plugin, e.g. to remove unused
namespaces/variables. Is there a simpler way to do it? Can somebody please
point me to an xquery parser or anything which would help me accomplish
this task? I already tried downloading the xqdt code but there are too many
dependencies.

Thanks in advance
Raghu
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Calculated field, search and sorting

2012-01-03 Thread Geert Josten
Hi Matt,



Do discounts apply to specific collections of products? And could you
divide all products into a reasonable number of collections that way? You
could always say something like: (collection1 and discountbucket1) or
(collection2 and discountbucket2) or … (using cts:and-query and
cts:or-query). You would still have to do various calculations, determining
the correct bucket for each collection of products that applies to the
particular user, as well as recalculate each price that is being displayed,
but it should be manageable.
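
A rough sketch of that or-of-ands query, under several assumptions that are
not from this thread: the collection names, the per-user discount table, and
a denormalized element per discount bucket (price-10, price-25, ...) with a
decimal range index on each. Treat all of these names as hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical per-user table: which discount applies to which collection. :)
declare variable $user-discounts :=
  <discounts>
    <entry collection="books" percent="10"/>
    <entry collection="journals" percent="25"/>
  </discounts>;

(: One (collection AND bucket) clause. Assumes every product stores a
   precomputed price per bucket in an element named price-NN, and that a
   decimal range index exists on each of those elements. :)
declare function local:bucket-query(
  $collection as xs:string,
  $percent as xs:integer,
  $min-price as xs:decimal
) as cts:query
{
  cts:and-query((
    cts:collection-query($collection),
    cts:element-range-query(
      xs:QName(fn:concat("price-", $percent)), ">", $min-price)
  ))
};

(: All products priced above 10.00 for this user, across all collections. :)
cts:search(fn:doc(),
  cts:or-query(
    for $entry in $user-discounts/entry
    return local:bucket-query(
      $entry/@collection, xs:integer($entry/@percent), 10.00)))
```

The displayed price would still have to be recalculated per result, as noted
above; the query only narrows the candidate set.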



Kind regards,

Geert



*Van:* general-boun...@developer.marklogic.com [mailto:
general-boun...@developer.marklogic.com] *Namens *
matt.broekh...@thomsonreuters.com
*Verzonden:* vrijdag 30 december 2011 21:43
*Aan:* general@developer.marklogic.com
*Onderwerp:* Re: [MarkLogic Dev General] Calculated field, search and
sorting



Yes the discounts are integers. I was thinking along those lines, but then
there needs to be a way to tie the group to the bucket since you can’t say
“this group uses this bucket across the board”



Say there are 10,000 groups, there are 100 buckets. How do you get around
having 10,000 entries spread across each bucket -- for each product?



*From:* general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] *On Behalf Of *Rick Pelton
*Sent:* Friday, December 30, 2011 12:32 PM
*To:* General MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Calculated field, search and sorting



Hi Matt,



You might still be able to use the bucket approach.  Are the discounts
integers?  If so, then you are down to 100 buckets.  If you make additional
observations about the discounts, such as none are greater than 50%, or
discounts occur in multiples of 5% …, you could limit the buckets even
further.  Even if the discounts aren’t integers, you can use hybrid
solution - still round to an integer and post process – this would help you
limit the number of documents you have to post process. Anything that you
could do to limit the # of documents that you have to post process will
help with performance.




Rick











*From:* general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] *On Behalf Of *
matt.broekh...@thomsonreuters.com
*Sent:* Friday, December 30, 2011 9:40 AM
*To:* general@developer.marklogic.com
*Subject:* Re: [MarkLogic Dev General] Calculated field, search and sorting



There are > 10,000 groups. Each has a percentage discount. The vast
majority of these percentages are 100%, but some are not.



It looks like denormalizing is just not going to work, so the trade off is
a O(n) operation to determine price.



Thanks for your input.



*From:* general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] *On Behalf Of *Jason Hunter
*Sent:* Thursday, December 29, 2011 11:49 PM
*To:* General MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Calculated field, search and sorting



Hi Matt,



There's no magic bullet.



If you can group people's discounts into nice stable buckets ("friends" vs
"family" vs "public") you can denormalize each group's price into the
document and query against that using a range index.  That'll work for up
to say 10 or maybe 50 buckets.  But if every customer has a distinct
discount schedule, that won't be feasible.



In that case I'd probably use heuristics to guess at ranges, limit to that
range, then use calculations to refine the results.  For example, to find
all items over $10 you first find all items over $10 retail using indexes
and then apply discounts live to see which are still over $10.  Sort by the
calculated value.  It's all very easy to write in code, just a matter of
how fast you can do the calculations.



If you want below $10 it's a two-step process.  You can immediately include
all that started below $10, then look at the max discount the person gets
(say 25%) and so include everything up to about $13.33 ($10 / 0.75) too for
dynamic price consideration.
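
A minimal sketch of the two-step "below $10" search, assuming a "products"
collection, product documents with a price element, and a decimal range
index on price (all hypothetical names). The retail cutoff that cannot miss
anything is target / (1 - max discount); for a $10 target and a 25% maximum
discount that is 10 div 0.75, about $13.33. For brevity the sketch computes
the live price for every candidate, including those already below the
target at retail.

```xquery
xquery version "1.0-ml";

declare variable $target as xs:decimal := 10.00;
declare variable $max-discount as xs:decimal := 0.25;

(: Highest retail price that can still end up below the target. :)
declare variable $cutoff as xs:decimal := $target div (1 - $max-discount);

(: Hypothetical stand-in for the real per-user discount lookup;
   here every product simply gets the maximum discount. :)
declare function local:discounted-price(
  $product as element(product)
) as xs:decimal
{
  xs:decimal($product/price) * (1 - $max-discount)
};

(: Step 1: use the range index to limit candidates by retail price.
   Step 2: apply the discount live, filter, and sort on the result. :)
for $product in cts:search(
  fn:collection("products")/product,
  cts:element-range-query(xs:QName("price"), "<=", $cutoff))
let $price := local:discounted-price($product)
where $price lt $target
order by $price
return $product
```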



-jh-



On Dec 29, 2011, at 8:27 AM, matt.broekh...@thomsonreuters.com wrote:



Hello marklogic gurus. I’m still getting my feet wet, but am starting to
get a decent grip on how this system works.



However, I have one requirement that I cannot seem to develop a strategy
for using the documentation.



Say I have 1-N many product documents. Each product has a title, a
description, etc. Marklogic searches and sorts this very well.

My situation involves the price of these products. Some arbitrary users
have an arbitrary percentage discount on an arbitrary set of products.



It would be easy enough to post process the results to  reflect the correct
price based on the user, however that doesn’t help me when I need to find
all products that are > $XX , or sort the results based on price, since
this isn’t “in” the data. It is a calculated field.



If the discount was applicable across the board to all the products, again
that would be eas

Re: [MarkLogic Dev General] Is MarkLogic susceptible to the hash collision attack?

2012-01-03 Thread Geert Josten
Hi Ryan,



Have you tried? (at home preferably ;)



Kind regards,

Geert



*Van:* general-boun...@developer.marklogic.com [mailto:
general-boun...@developer.marklogic.com] *Namens *seme...@hotmail.com
*Verzonden:* donderdag 29 december 2011 18:16
*Aan:* general@developer.marklogic.com
*Onderwerp:* [MarkLogic Dev General] Is MarkLogic susceptible to the hash
collision attack?



Quote:

Researchers have shown how a flaw that is common to most popular Web
programming languages can be used to launch denial-of-service attacks by
exploiting hash tables. Announced publicly on Wednesday at the Chaos
Communication Congress event in Germany, the flaw affects a long list of
technologies, including
PHP, ASP.NET, Java, Python, Ruby, Apache Tomcat, Apache Geronimo, Jetty,
and Glassfish, as well as Google's open source JavaScript engine V8. The
vendors and developers behind these technologies are working to close the
vulnerability, with Microsoft warning of "imminent public release of
exploit code" for what is known as a hash collision attack.

...

"Hash tables are a commonly used data structure in most programming
languages," they explained. "Web application servers or platforms commonly
parse attacker-controlled POST form data into hash tables automatically, so
that they can be accessed by application developers. If the language does
not provide a randomized hash function or the application server does not
recognize attacks using multi-collisions, an attacker can degenerate the
hash table by sending lots of colliding keys. The algorithmic complexity of
inserting n elements into the table then goes to O(n**2), making it
possible to exhaust hours of CPU time using a single HTTP request."
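
The quadratic bound quoted above can be checked with a short count. This
assumes a chained hash table in which every attacker-chosen key lands in the
same bucket, so inserting the i-th key compares it against the i keys
already there (an illustrative model, not a description of any specific
server):

```latex
\sum_{i=0}^{n-1} i \;=\; \frac{n(n-1)}{2} \;=\; O(n^2)
```

With well-distributed keys the same n inserts cost O(n) in total, which is
why a single request full of colliding keys is so much more expensive.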

more->
http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars

Seems to be a big deal with a lot of servers. Is MarkLogic affected?

thanks,
Ryan
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Invalid entity reference "ndash"

2012-01-03 Thread Geert Josten
Hi John,



MarkLogic Server has only very limited support for DOCTYPE declarations.
Only entity declarations in the local subset are parsed and used.
References to any external entity or dtd file are ignored. That is why a
dtd ref doesn’t work.
Ron gave a work-around (I posted similar code to handle mixed encodings
a while ago, by the way), but that is pretty expensive if you need to
load many docs. In that case, you might prefer to use xmlsh or
recordloader or any of the other available tools to insert your data.
These have better support for DOCTYPEs.



I do recall another workaround, which might be acceptable for you. There is
a repair option that defaults to none. If you change it to full, it
should allow most of the iso entities and convert them to the appropriate
Unicode characters automatically. The full repair might do more than you
need, though, in case the xml is not well-formed.
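
A minimal sketch of the repair option mentioned above, passed through the
options node of xdmp:document-get. The file path is hypothetical, and
whether full repair resolves every entity in a given document is something
to verify on your own data.

```xquery
xquery version "1.0-ml";

(: Hypothetical filesystem path; adjust to your environment. :)
declare variable $file-path := "/tmp/test.xml";

(: Request full repair while parsing. As described above, this is
   expected to turn common ISO entities such as ndash into the
   corresponding Unicode characters instead of raising
   XDMP-DOCENTITYREF. :)
declare variable $load-options :=
  <options xmlns="xdmp:document-get">
    <repair>full</repair>
  </options>;

xdmp:document-get($file-path, $load-options)
```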



Kind regards,

Geert



*Van:* general-boun...@developer.marklogic.com [mailto:
general-boun...@developer.marklogic.com] *Namens *John Zhong
*Verzonden:* vrijdag 30 december 2011 18:17
*Aan:* General MarkLogic Developer Discussion
*Onderwerp:* Re: [MarkLogic Dev General] Invalid entity reference "ndash"



Yes, I make sure the dtd and the associated ent files are in the correct
location. And I was saying the xdmp:document-get function does not work in
this case.

Actually, I did a simple test by defining a simple xml and dtd:

test.dtd:




test.xml:



–

I can open the xml by ie without problem (it shows error if I delete the
entity definition in test.dtd), then tested xdmp:document-get function
again, it shows a error:

[1.0-ml] XDMP-DOCENTITYREF: xdmp:document-get("/test.xml") -- Invalid
entity reference "ndash" at /test.xml line 3

John

On Sat, Dec 31, 2011 at 12:17 AM, Dan Vint  wrote:

You need to make sure the external text entity is
being read (or that the file being referenced is
being found). You may not actually be reading in
the DTD file itself. The way it is configured is
correct, your code just needs to make sure it is
expanding the doctype via the system or public
identifier (journalpublishing3.dtd or via XML
catalog //NLM//DTD Journal Publishing DTD v3.0
20080202//EN") and associated files before it starts parsing the content.

..dan



At 07:28 AM 12/30/2011, you wrote:
>Thank you for your note, Erik.
>
>And thank you for your code, Ron. Yes, I did try that and it worked.
>
>The xml file already has a dtd declaration,
><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN"
>"journalpublishing3.dtd">, and the dtd file also
>references an isopub.ent file that defines this
>entity.
>
>So, it seems I have to make the entity definition appear in the xml:
>
><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd"
>[<!ENTITY ndash "–">]
> >
>
>John
>
>On Fri, Dec 30, 2011 at 10:42 PM, Ron Hitchens <r...@ronsoft.com> wrote:
>
>  Try this code, adjusting the name of your root
>node as needed.  Other entities could also be defined
>in the doc-type header.
>
>  This loads the XML first as text, prepends a doc-type
>header that defines the entity and then parses the result
>as XML.  This requires making an extra copy of the document
>in memory, so it could bump against memory limits if you
>do it in volume with lots of large documents.
>
>  Handy list of entity definitions here:
><http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>
>
>
>xquery version '1.0-ml';
>
>declare variable $file-path := "/tmp/z.xml";
>  (: path of XML doc in filesystem :)
>
>declare variable $document-uri :=
>"/test/mydoc.xml";   (: URI to insert it as in MarkLogic :)
>
>declare variable $load-options :=
>           (: force loading as text, not as XML :)
>   <options xmlns="xdmp:document-get">
>     <format>text</format>
>   </options>;
>
>declare variable $doctype-decl as xs:string :=
>  (: adjust root node name and add entities as needed :)
>  '<!DOCTYPE root [<!ENTITY ndash "–">]>';
>
>let $doc-as-text := xdmp:document-get ($file-path, $load-options)
>let $doc-with-decl := fn:concat ($doctype-decl, $doc-as-text)
>let $doc := xdmp:unquote ($doc-with-decl)
>
>return xdmp:document-insert ($document-uri, $doc)
>
>
>On Dec 30, 2011, at 6:05 AM, John Zhong wrote:
>
> > Thanks for your quick answer, Harry.
> >
> > But how if I don't want to modify the original xml?
> >
> > Thanks,
> > John
> >
> > On Fri, Dec 30, 2011 at 1:59 PM, Harry B. <dna...@gmail.com> wrote:
> > Try using the numeric character reference instead:
> >
> > &#8211;
> >
> > I can't remember why, but this usually works.
> >
> > On Dec 29, 2011 10:53 PM, "John Zhong" <j...@yuxipacific.com> wrote:
> > Hi all,
> >
> > I am having a problem using the xdmp:document-get function to read an
> > xml file with the entity reference &ndash;. How can I fix this
> > problem? I am using ML version 5.0-1.2.
> >
> > [1.0-ml] XDMP-DOCENTITYREF:
> > xdmp:document-get("D:\test.xml") -- Invalid
> > entity reference "ndash" 

Re: [MarkLogic Dev General] Search documents without a node tag

2012-01-03 Thread Michael Sokolov
You can do cts:not-query(cts:element-query(xs:QName("TITLE"), 
cts:and-query(()))) to find documents that don't have any TITLE 
elements.  I don't know how to do that in the search api, though.
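
Spelled out as a runnable query, a sketch might look like the following. The
AGENCY value-query is an assumption about how the agency filter from the
original question would be expressed; adjust element names to your data.

```xquery
xquery version "1.0-ml";

(: Matches documents that contain no TITLE element at all.
   cts:and-query(()) matches everything, so the inner element-query
   matches any TITLE element regardless of its content. :)
let $no-title :=
  cts:not-query(
    cts:element-query(xs:QName("TITLE"), cts:and-query(())))

(: Assumption: AGENCY holds the exact value "EFE". :)
let $agency :=
  cts:element-value-query(xs:QName("AGENCY"), "EFE")

return cts:search(fn:doc(), cts:and-query(($agency, $no-title)))
```

The same $no-title query could presumably be handed to the search api as an
additional-query in the options node, but that is untested here.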


-Mike

On 1/3/2012 8:48 AM, Mariano Grau Calín wrote:

Hi all,
We want to find documents where a given element does not exist.
For example, this query returns documents where TITLE has a value, which is
the opposite of what we want.

doc()/DOC[AGENCY='EFE' and TITLE]
If we try to negate the query, the result is not what we expected.
doc()/DOC[AGENCY='Grupo Joly' and TITLE[text() is null]] (: same result :)
doc()/DOC[AGENCY='EFE' and not(TITLE)] (: all documents, whether TITLE
exists or not :)

doc()/DOC[AGENCY='EFE' and TITLE='']  (: zero results!!! :)
Really, we'd like to define a constraint in the search api and write
something like:

search:search('title:null age:efe')
Or otherwise an additional-query with cts:query code.
Mariano Grau
mgrau @ grupojoly.com
Dpto. Sistemas
Grupo Joly


___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general




Re: [MarkLogic Dev General] Xquery review tool

2012-01-03 Thread Geert Josten
Hi Arti,



What kind of reviewing are you looking for? Code style? Validity?



Kind regards,

Geert



*Van:* general-boun...@developer.marklogic.com [mailto:
general-boun...@developer.marklogic.com] *Namens *Arti Paramanantham
*Verzonden:* woensdag 28 december 2011 9:31
*Aan:* general@developer.marklogic.com
*Onderwerp:* [MarkLogic Dev General] Xquery review tool




Hi all,



Does anyone know of a tool/plugin for reviewing xquery files?



Thanks,

Arti
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


[MarkLogic Dev General] How to use near query in search:search function

2012-01-03 Thread Mariano Grau Calín
Hi,
 
Is there an operator in the search:search function for running near-query searches?
 
Regards,
 
Mariano Grau
mgrau @ grupojoly.com
Dpto. Sistemas
Grupo Joly
 
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


[MarkLogic Dev General] Search documents without a node tag

2012-01-03 Thread Mariano Grau Calín
Hi all,
 
We want to find documents where a given element does not exist.
For example, this query returns documents where TITLE has a value, which is
the opposite of what we want.
 
doc()/DOC[AGENCY='EFE' and TITLE]
 
If we try to negate the query, the result is not what we expected.
 
doc()/DOC[AGENCY='Grupo Joly' and TITLE[text() is null]] (: same result :)
 
doc()/DOC[AGENCY='EFE' and not(TITLE)] (: all documents, whether TITLE exists
or not :)
 
doc()/DOC[AGENCY='EFE' and TITLE='']  (: zero results!!! :)
 
Really, we'd like to define a constraint in the search api and write something
like:
 
search:search('title:null age:efe')
 
Or otherwise an additional-query with cts:query code.
 
 
Regards,
 
Mariano Grau
mgrau @ grupojoly.com
Dpto. Sistemas
Grupo Joly
 
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general