Re: [MarkLogic Dev General] XDMP:http-get and 304 responses

2015-01-08 Thread David Ennis
This is not an ideal solution, but if it does turn out to be a bug that you
have to live with for now, one workaround avoids the delay at the expense of
an extra round trip:

- first do a HEAD request using xdmp:http-head()
- decide locally whether the doc needs downloading (assuming the
Last-Modified header is returned)
- then do a regular fetch without the If-Modified-Since header (since the
HEAD request has already told you whether the doc changed)

Again, not ideal, but if xdmp:http-get truly waits for the timeout, then
perhaps this is of value in the meantime.
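
The steps above can be sketched roughly as follows. This is only an illustration: the URL and the saved Last-Modified value are hypothetical, and it assumes the remote server answers HEAD requests with a Last-Modified header. The header is compared as a plain string against the value saved at the last download, so no date parsing is needed.

```xquery
xquery version "1.0-ml";
declare namespace http = "xdmp:http";

(: Hypothetical values: the download URL and the Last-Modified string
   saved when the item was last fetched. :)
let $url := "http://example.com/repo/items/123/content"
let $known-last-modified := "Fri, 21 Nov 2014 16:53:12 GMT"

(: Step 1: HEAD request; the first item returned is the response element. :)
let $head := xdmp:http-head($url)[1]

(: Step 2: read Last-Modified, matching the element name case-insensitively. :)
let $last-modified :=
  fn:string(($head/http:headers/*
    [fn:lower-case(fn:local-name(.)) eq "last-modified"])[1])

(: Step 3: plain GET, with no If-Modified-Since header, only when the
   header is missing or differs from the saved value. :)
return
  if ($last-modified eq "" or $last-modified ne $known-last-modified)
  then xdmp:http-get($url)
  else ()
```

An empty result means the saved copy is still current. Whether the HEAD request itself avoids the delay would need to be checked against the same server.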





Kind Regards,
David Ennis



Re: [MarkLogic Dev General] XDMP:http-get and 304 responses

2015-01-08 Thread Geert Josten
Hi Chris,

It is not uncommon to be strict in sending and tolerant in receiving with such
things. I would recommend sending your case to Support. The delay sounds
unnecessary and inconvenient.

Kind regards,
Geert


Re: [MarkLogic Dev General] XDMP:http-get and 304 responses

2015-01-08 Thread Chris Hudson-Silver
Hi Geert,

Thanks for your reply.

I checked and the response did not have a Content-Length, so I checked the HTTP
spec to see what the Content-Length should be set to for a 304 response.
"The 304 response MUST NOT contain a message-body":
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.5
"The presence of a message-body in a request is signaled by the inclusion of a 
Content-Length or Transfer-Encoding header":
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.3

which makes me think that a 304 response should not have a Content-Length.
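
One way to make this check from within MarkLogic is to look at the headers element of the response that xdmp:http-get returns. The sketch below is a minimal illustration with a hypothetical URL; it assumes the first item returned is the response element in the "xdmp:http" namespace and matches the header name case-insensitively rather than relying on a particular spelling.

```xquery
xquery version "1.0-ml";
declare namespace http = "xdmp:http";

(: Reports the status code and whether a Content-Length header came back. :)
declare function local:describe($response as element()) as xs:string*
{
  let $content-length :=
    $response/http:headers/*
      [fn:lower-case(fn:local-name(.)) eq "content-length"]
  return (
    fn:concat("status: ", fn:string($response/http:code)),
    if (fn:exists($content-length))
    then fn:concat("Content-Length: ", fn:string($content-length[1]))
    else "no Content-Length header"
  )
};

(: Hypothetical URL. :)
local:describe(xdmp:http-get("http://example.com/repo/items/123/content")[1])
```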

Even so, I tried forcing my remote repository to send a Content-Length of 0,
but the header was not getting back to MarkLogic. I thought this might be
because it was detecting the lack of a message body and therefore not setting
the header, so I decided to break the HTTP spec and include a body, but again
the Content-Length and body were missing from the response. Looking into the
application server itself (Tomcat), it looks as if a response with a 304
status will always have the content body and Content-Length header filtered
out (probably to force it to comply with the HTTP spec):
https://github.com/apache/tomcat60/blob/94b4cf497377e48b116c0bed7ccada0f47cd9c10/java/org/apache/coyote/http11/Http11NioProcessor.java#L1649
indicates that the following is used to remove the content body
https://github.com/apache/tomcat60/blob/94b4cf497377e48b116c0bed7ccada0f47cd9c10/java/org/apache/coyote/http11/filters/VoidOutputFilter.java
and the following will cause the content-length to be removed
https://github.com/apache/tomcat60/blob/94b4cf497377e48b116c0bed7ccada0f47cd9c10/java/org/apache/coyote/http11/Http11NioProcessor.java#L1442

So it looks like I can't force the remote server to set a zero Content-Length,
even if that would stop the MarkLogic call from waiting for the full timeout.
What do you think: is this a bug in Tomcat or in MarkLogic?

Thanks again,
Chris


___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] XDMP:http-get and 304 responses

2015-01-08 Thread Geert Josten
Hi Chris,

Does the response contain a Content-Length? If not, maybe MarkLogic waits the
full timeout before it decides there is none. If it has one (with a value of
zero), that might be a bug.

Kind regards,
Geert



[MarkLogic Dev General] XDMP:http-get and 304 responses

2015-01-08 Thread Chris Hudson-Silver
Hi All,

Recently I was working on a project that tracks a repository by calling a REST
web service that returns metadata and download URLs for items that have
changed in the remote repository since the last call. It then checks whether
the item has already been downloaded and, if so, calls the download URL with
the HTTP cache headers set, as the modification could have been to metadata
only and not to content. E.g.:

let $options := Fri, 21 
Nov 2014 16:53:12 
GMT"1416588792000"full
let $response := xdmp:http-get($url, $options)
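
The options element above has lost its markup, so the exact headers sent are not recoverable. As a minimal sketch of a conditional request of this kind, the following assumes that the date was sent as If-Modified-Since and the quoted number as an If-None-Match entity tag. Both header names, and the URL, are assumptions and not confirmed by the post; the trailing "full" value is left out because its element is unknown.

```xquery
xquery version "1.0-ml";
declare namespace http = "xdmp:http";

(: Hypothetical URL; header names are assumed, see the note above. :)
let $url := "http://example.com/repo/items/123/content"
let $options :=
  <options xmlns="xdmp:http">
    <headers>
      <If-Modified-Since>Fri, 21 Nov 2014 16:53:12 GMT</If-Modified-Since>
      <If-None-Match>"1416588792000"</If-None-Match>
    </headers>
  </options>
let $response := xdmp:http-get($url, $options)
let $code := xs:integer($response[1]/http:code)
return
  if ($code eq 304) then "not modified, keep the stored copy"
  else if ($code eq 200) then $response[2]  (: the downloaded content :)
  else fn:concat("unexpected status ", $code)
```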

I noticed that the run time was considerably longer when some of the items
returned a 304 Not Modified response, so I decided to test whether the remote
repository or MarkLogic was adding the overhead. I did this by creating a
script that generated curl commands, so I could make exactly the same requests
from the command line and from MarkLogic.
The calls from the command line and from MarkLogic returned exactly the same
response, including the correct 304 code and an empty response body.
The calls from the command line took about 20 seconds less than the calls from
MarkLogic, and since the global timeout was set to 20 seconds, I decided to
try the MarkLogic calls with a 1-second timeout, e.g.:


let $options := 1Fri, 21 Nov 
2014 16:53:12 
GMT"1416588792000"full

and now they are taking approximately 1 second longer than the calls from the 
command line.
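
A rough way to observe this from inside MarkLogic is to run a single request as its own query and report xdmp:elapsed-time() once the response is in hand. The sketch below is an assumption-laden illustration: the URL is hypothetical, only the If-Modified-Since header is shown, and the elapsed time covers the whole query rather than the HTTP call alone.

```xquery
xquery version "1.0-ml";
declare namespace http = "xdmp:http";

(: Hypothetical URL; the 1-second timeout and the date come from the post. :)
let $url := "http://example.com/repo/items/123/content"
let $options :=
  <options xmlns="xdmp:http">
    <timeout>1</timeout>
    <headers>
      <If-Modified-Since>Fri, 21 Nov 2014 16:53:12 GMT</If-Modified-Since>
    </headers>
  </options>
let $response := xdmp:http-get($url, $options)[1]
return (
  (: status code first, so the request has completed before timing is read :)
  fn:string($response/http:code),
  xdmp:elapsed-time()
)
```

Changing the timeout value and rerunning should show whether the elapsed time for a 304 response tracks the timeout.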

Has anyone else encountered this?
Could it be that MarkLogic is waiting for the response body even though it has
received a valid response header? The HTTP 1.1 standard states that a response
does not necessarily need a response body, so if this is the case it may be a
bug in MarkLogic's HTTP module.
Or am I missing something vital in my request options?

Thanks in advance,

Chris
