Dear Wiki user, You have subscribed to a wiki page or wiki category on "Jakarta-httpclient Wiki" for change notification.
The following page has been changed by RolandWeber: http://wiki.apache.org/jakarta-httpclient/FrequentlyAskedApplicationDesignQuestions ------------------------------------------------------------------------------ - #pragma section-numbers 2 + #DEPRECATED - = Application Design FAQ = + This page has been [http://wiki.apache.org/HttpComponents/FrequentlyAskedApplicationDesignQuestions moved] + to the new [http://wiki.apache.org/HttpComponents/ HttpComponents Wiki]. - This document addresses questions about application design which - have been raised repeatedly on the HttpClient and HttpComponents mailing lists. - As it addresses design issues rather than API or other HttpClient or - HttpComponents specific problems, much of the information presented - is equally applicable for HttpURLConnection or non-Java APIs. - - If you are just getting your feet wet and want to understand the basics - of client HTTP programming rather than read about application design alternatives, - check out our [wiki:Self:ForAbsoluteBeginners primer]. - - ---- - [[TableOfContents]] - - - ------- - == Sending Parameters and Uploading Files == - - A question that is asked on a regular basis is: - - ''How do I upload a file along with some parameters?'' - - This section presents different ways to upload parameters, files, and both. - It assumes that you are implementing both a client application and - a server application to which the client application connects. - The client application might be a Java program using HttpClient, - while the server application is assumed to be a - [http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/http/HttpServlet.html Servlet]. - - Parameters are name/value pairs, where both name and value are strings. - Names should always be in US-ASCII, values may use other character - encodings, depending on the technique used for sending the parameters. - Files or rather file parameters are name/value pairs, where the name is - a string and the value is binary data read from a file. Binary values - from other sources can be handled similar to files from a design perspective, - though the details of the API will vary. - - - === GET with Query === - - The simplest way to transfer parameters to the server is the - query string of the URL. That's the part after the question mark, - for example in: - - http:''''''//my.server.name/my/servlet'''?param1=value1¶m2=value2''' - - Query strings can be used with any HTTP method, but they are most - frequently used with the GET method. A query string is also the - only way to send parameters with a GET method. - (Unless your application encodes parameters into the URL path.) - - The names and values of a query string must be URL-escaped. Each space character needs to - be replaced by a + character. Reserved characters like = & % + : / need to be - URL-escaped (%xx sequences) with their byte representation (see below). - URL-escaping is automatically handled for example by the [http://java.sun.com/j2se/1.5.0/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20int,%20java.lang.String,%20java.lang.String,%20java.lang.String) java.net.URI] - and [http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/URI.html#setRawQuery(char%5b%5d) org.apache.commons.httpclient.URI] classes. - - HTTP Request lines and thus query strings are confined to the ASCII character encoding. Only ASCII names/values - can reliably be transferred in a query string. However it is possible to use a non-ASCII character encoding by - URL-escaping the characters. Character encodings (like UTF-8, ISO-8859-1; and unlike EBCDIC) whose lower 7-bit are - compatible with ASCII only need to escape the non-ASCII characters. The character encoding used to create and - interprete the % escape sequences must be the same on the server and on the client. It is common practice to - agree on UTF-8. But it is more a recommendation than a standard and must be verified in individual cases. - To avoid problems it is strongly discouraged to send non-ASCII values in the query string. - - Depending on the encoding the following characters are character encoded and URL-escaped as follows: - || char || ASCII || ISO-8859-1 || UTF-8 || EBCDIC || - || A || A || A || A || %E1 || - || ä || N/A || %E4 || %C3%A4 || N/A || - || & || %26 || %26 || %26 || %70 || - - On the server, name/value pairs sent in a query string are available - as parameters of the [http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/ServletRequest.html#getParameter(java.lang.String) ServletRequest]. - %xx escape sequences and + characters are automatically decoded by the Servlet API. - If non-ASCII values are sent in the query string, the outcome depends - on the implementation of the Servlet API, the Content-Type header and possibly also on - configuration parameters, such as the JVM default character set. - That's why it is strongly discouraged to send non-ASCII values - in the query string. - - - === POST with URL-encoded Form Data === - - Unlike the GET method, a POST method has a message body or entity - which can hold any kind of binary or non-binary data. - A simple way to send parameters with string values to the server - is to put the query string into the message body instead of the URL. - This avoids URL length restrictions, problems with parameters being - logged where they shouldn't, and it also allows for non-ASCII characters - in the values. While a URL is confined to ASCII characters, the - message body is not. The character set can be specified in a header field. - The encoding of special characters is automatically handled by - HttpClient's [http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/methods/PostMethod.html PostMethod]. - - On the server, name/value pairs sent in a message body with - content type "application/x-www-form-urlencoded" are available - as parameters of the [http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/ServletRequest.html#getParameter(java.lang.String) ServletRequest]. - If there are parameters in both the message body and the query string, - all of them are available in the !ServletRequest. - - - === POST with Multipart Form Data === - - In order to upload binary data such as files, the data can be encoded as - multipart [http://www.ietf.org/rfc/rfc1521.txt MIME]. - That's the same format which is used for sending email attachments. - HTML forms for uploading files have to specify the content type - "multipart/form-data" so the browser knows that the multipart MIME encoding - must be applied. That content type is also sent to the server. - HttpClient provides the [http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/methods/multipart/MultipartRequestEntity.html MultipartRequestEntity] - class to perform multipart MIME encoding. - - On the server, parameters sent as multipart MIME are ''not'' available - as parameters of the - [http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/ServletRequest.html#getParameter(java.lang.String) ServletRequest]. - The servlet has to read and interpret the message body explicitly. - There are libraries for parsing multipart MIME data, for example - [http://jakarta.apache.org/commons/fileupload/ Commons FileUpload]. - It should also be possible to parse multipart/form-data using the - [http://java.sun.com/products/javamail/FAQ.html#attach JavaMail API]. - If there are parameters in both the message body and the query string, - only those from the query string are available in the !ServletRequest. - - - === POST with Query and Data === - - In the special case that you need to upload only parameters with - ASCII string values and a single file, there is another option. - You can send the parameters in a query string and the file contents - as the message body. This does not require special encoding or - decoding of the binary data. - This kind of request can not be generated by an HTML form. - [[BR]] - With this approach, the file is not sent as a name/value pair. - Only the value, that is the file contents, will be transferred. - The servlet has to know what to do with the file based only on - the information in the URL (and HTTP headers). The information in - the URL can come from the query string (?name=!MyImage.png), - from the URL path (/my/servlet/save), or both. - [[BR]] - You should specify a content type that indicates the type of data - you are sending in the message body, such as "application/octet-stream" - or "image/png" or whatever else is appropriate for the file you are uploading. - If you are uploading a text file, you should also specify the - character set as part of the content type. - - On the server, the parameters sent in the query string are available - as parameters of the - [http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/ServletRequest.html#getParameter(java.lang.String) ServletRequest]. - The file contents is available as from the !ServletRequest as either - [http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/ServletRequest.html#getInputStream() binary] - or - [http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/ServletRequest.html#getReader() character] - data. - - - === Further Reading === - - [http://java.sun.com/j2se/1.5.0/docs/api/java/net/HttpURLConnection.html Java Standard Edition 5.0, HttpURLConnection] - - [http://www.iana.org/assignments/media-types/ IANA: Registered MIME Media Types] - - [http://www.w3.org/TR/html401/interact/forms.html#form-content-type HTML 4.01: Form Content Types] - - [http://java.sun.com/products/servlet/docs.html Sun: Java Servlet Technology Documentation] - - [http://java.sun.com/products/servlet/2.3/javadoc/ Servlet API 2.3] - - [http://java.sun.com/javaee/5/docs/api/javax/servlet/package-summary.html Java Enterprise Edition 5.0, Servlet API 2.4] - - [http://java.sun.com/products/javamail/reference/index.html JavaMail] - - [http://java.sun.com/javaee/5/docs/api/javax/mail/package-summary.html Java Enterprise Edition 5.0, JavaMail API] - - - - - ------- - == Client Authentication == - - There are different techniques by which a client can establish - it's identity to a server in a web environment. - We repeatedly got questions from users who were not aware of - the differences between those techniques. - This section presents the common authentication techniques and - puts them in the appropriate context. - - === Protocol Layers === - - Whenever a web client communicates with a web server, there - are communications on different protocol layers. The following - diagram shows the layers relevant for this discussion: - - ||<:>Application Layer|| - ||<:>HTTP Layer|| - ||<:>Transport Layer|| - - At the top, there is the application layer. That is your - client application communicating with a web application. - The web application can for example be comprised of some - servlets and JSPs. Your application will create and send - requests and interpret the responses to achieve some - purpose specific to your application. - [[BR]] - The middle layer is where HTTP communication takes place. - That is HttpClient or HttpComponents (or - [http://java.sun.com/j2se/1.5.0/docs/api/java/net/HttpURLConnection.html java.net.HttpURLConnection]) - exchanging HTTP messages with an HTTP server. !HttpClient - is not aware of the purpose of the messages. It merely - knows how to generate and parse the HTTP message format, and - how to handle some HTTP protocol details such as redirects and cookies. - [[BR]] - At the bottom is the transport layer. That is the operation system - connecting sockets to the computer on which the HTTP server is running. - Or it is a TLS/SSL library creating a secure connection to that computer. - On this layer, it is just binary data passing between the machines. - The transport layer is not aware of the data being HTTP messages, - let alone of the purpose for which the application is communicating. - - This layered structure is not obvious to the casual user. - When using a browser, you don't care whether your input is - interpreted by the HTML engine on the application layer, - or by the HTTP implementation on the HTTP layer, or by - the TLS/SSL implementation on the bottom layer. - Likewise, if you are developing a web application for example - in a J2EE environment, you're looking at the protocol layers - from the top down and don't care where the functionality is - provided. - [[BR]] - However, !HttpClient is [wiki:Self:ForAbsoluteBeginners not a browser]. - If you are developing an application with !HttpClient or - some other HTTP implementation, you have to be aware of this structure. - Client authentication can be performed on each of the three layers, - but !HttpClient is only responsible for the middle layer. - You will need to use other APIs for authentication on the application - or transport layer. - - - === Basic, Digest, NTLM Authentication === - - These authentication techniques operate on the HTTP layer and are - supported to some degree by HttpClient. - Basic and Digest authentication are specified in - [http://www.ietf.org/rfc/rfc2617.txt RFC 2617]. - Both are fully supported by !HttpClient. - A browser will typically pop up an authentication dialog asking - for the password to a specific server. The password will be asked - only once for each session. If NTLM authentication is used and - the password is the same as the Windows password, there may be - no authentication dialog at all (single sign-on, SSO). - - Basic authentication is considered insecure because it sends the - user password in plain text (unprotected) with each request. - That is only acceptable to some degree in intranets or when using - TLS/SSL secure connections (HTTPS). It is generally not acceptable - when using insecure connections over the internet. - [[BR]] - Digest authentication is more secure than basic authentication - because the password itself is not sent to the server. Instead, - a hash of the password is created and sent. Digest authentication - is rarely used, since in order to verify the hash, the server - needs to know the user password in plain text. User repositories - will typically not store passwords in plain text, but rather - hashes of the password. Therefore, digest authentication can - not be performed using such repositories. Storing passwords in - plain text on the server backend systems introduces a weak spot - into the server side architecture. - [[BR]] - NTLM authentication comes in several varieties, all of which are - proprietary authentication protocols by Microsoft. !HttpClient - partially supports NTLM authentication, as explained in the - [wiki:Self:FrequentlyAskedNTLMQuestions NTLM FAQ]. - The older versions, or lower levels, of NTLM authentication suffer - from the same weakness as Basic authentication. The newer versions - rectify this, but the protocols are not publicly documented. - There are no open source implementations of the newer versions. - - - === Form Based Authentication === - - The form based authentication technique operates on the application layer. - When using a browser, username and password have to be entered - in an HTML form. They will be sent to the server only once. - After successful authentication, the server remembers that this - client is authenticated and will not ask for the password again - during that session. Session tracking often requires a cookie. - [[BR]] - From the HttpClient perspective, submitting a form for client - authentication is no different from submitting a form for a - search query or any other purpose. - Instructions on how to support session tracking and simulate - form submission are available in the - [wiki:Self:ForAbsoluteBeginners Client HTTP Programming Guide]. - - Form based authentication is more secure than basic authentication. - Although it also transmits the password in plain text, it does so - only once and not with every request. Still, when used over the - internet, from based authentication should use a secure TLS/SSL - connection at least for the login procedure. Afterwards, the session - can be continued over plain connections, as the password is not - sent again. - - - === Certificate Based Authentication === - - Certificate based (client) authentication operates on the transport layer. - It is part of the TLS/SSL protocol. Instead of using a password, - certificate based authentication relies on public key cryptography. - A private key is stored on the client and used to authenticate - against the server. - A browser will typically pop up a password dialog, asking for the - password to the local key store. The password is never sent to the - server, it is only used locally to gain access to the private key. - The private key itself is never sent either. - [[BR]] - From the HttpClient perspective, certificate based authentication - is performed transparently when a secure TLS/SSL connection is - established to a server. Transparently means that !HttpClient doesn't - know anything about the authentication at all. You have to install a - [http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/protocol/SecureProtocolSocketFactory.html SecureProtocolSocketFactory] - that automatically authenticates the client if requested by the server. - This includes asking the user for the password to the local key store. - - Certificate based client authentication is the most secure of the - authentication techniques discussed here. The drawback is that it - requires a complex public key infrastructure - ([http://en.wikipedia.org/wiki/Public_key_infrastructure PKI]) - to be put in place. - Certificates holding the public keys for each client need to be - available in the server's user repository, and the private keys - have to be deployed on each client machine. Issuing the certificates - for all clients is itself a complex task. - - === Further Reading === - - [http://java.sun.com/j2ee/1.4/docs/tutorial/doc/Overview6.html The J2EE 1.4 Tutorial, Security] (SUN) - - [http://www.ietf.org/rfc/rfc2617.txt RFC 2617: HTTP Authentication: Basic and Digest Access Authentication] - - [http://www.onjava.com/pub/a/onjava/2002/06/12/form.html J2EE Form Based Authentication] (onJava.com) - - [http://en.wikipedia.org/wiki/Public_key_infrastructure Wikipedia: Public Key Infrastructure] - - [http://www.ietf.org/rfc/rfc2246.txt RFC 2246: The TLS Protocol Version 1.0] - - [http://www.ietf.org/rfc/rfc3546.txt RFC 3546: Transport Layer Security (TLS) Extensions] - - - - ------- - == Server Performing Login for Client == - - Once in a while, somebody wants a server or proxy to perform login to a different site on behalf of the client, - then handing the session over to the client. Since the authentication is already performed by the server or proxy, - the client is not supposed to ask the user for credentials. - - In general, this is '''not possible'''. We mean it. It is '''not''' possible. Seriously. - Unless very specific conditions are met, there is '''no way'''. - - === Why It Should Not Work === - - Imagine you are a server called Bob. - Alice logs in to you, providing her credentials. - Then Charlie appears, trying to access Alice's data. - Charlie has no credentials, he just says: "Alice logged in on my behalf." - Sounds fishy, does it not? - ''Would you believe Charlie?'' - - So, what are the conditions which might allow this to happen anyway? - Firstly, authentication must apply to a session rather than the individual requests. - This usually implies form-based authentication (see above) and the existence of a session ID. - Secondly, the server configuration must be a bit negligent regarding security. - The rest depends on the type of session tracking. - - === URL-based Session Tracking === - - If the user session is tracked in the URL, the handover is simple. - Just send the URL including the session ID from the proxy or server to the client. - If the server does not notice the change of the client IP address, you are lucky. - A URL with session identifier could look like this: - http''''''://webmail.where.ever/xml/webmail;jsessionid=89702CCE20F2401326843985B0FB546F.TC159b - - === Cookie-based Session Tracking === - - If the user session is tracked with a session cookie, the handover is problematic. - If your server or proxy is in the same domain as the site you want to login to, - you can send the session cookie obtained from the target site on to the client, - setting it at the domain level. - This may or may not work, depending on additional security checks by the server. - It may interfere with session tracking of other servers in the same domain, - causing 'inexplicable' malfunctions of seemingly unrelated web applications for the client. - [[BR]] - A better solution would be to create a Single Sign-On (SSO) domain for your server or proxy and the target site. - Check the documentation of your application server(s) for information on Single Sign-On. - - If your server or proxy is ''not'' in the same domain as the site you want to login to, you are out of luck. - [[BR]] - ''If you find a way to make this work across domains, please report a security vulnerability against the browser.'' - - If you don't know what all that stuff about cookies and domains means, - you shouldn't implement this kind of security sensitive application in the first place. - - === Further Reading === - - [http://en.wikipedia.org/wiki/Alice_and_Bob Wikipedia: Alice and Bob] - - [http://java.sun.com/javaee/5/docs/tutorial/doc/Servlets11.html#wp64784 Java EE 5 Tutorial: Session Tracking] - - [http://en.wikipedia.org/wiki/Cross-site_cooking Wikipedia: Cross-site Cooking] - - [http://en.wikipedia.org/wiki/Single_sign-on Wikipedia: Single sign-on] - - - - ------- - == Proxy Configuration == - - HttpClient takes proxy configuration data from - [http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/HostConfiguration.html HostConfiguration] - objects. These can either be passed explicitly when a method is executed (!HttpClient 3.1), or the default - configuration stored in the !HttpClient object is used. - Some of our users have the requirement to pick up external proxy configurations. - The following sections discuss some options for obtaining external proxy configuration data. - [[BR]] - Please note that !HttpClient is designed to yield predictable results for applications in need of an embedded - HTTP implementation. If !HttpClient would automatically pick up external configuration data, predictability - would be lost. Therefore, it remains the responsibility of the ''application'' to obtain proxy configuration data - and to pass it to !HttpClient. We will consider to provide helpers for this task if patches are contributed, - but the responsibility for calling such helpers would still remain with the application. - - - === System Properties === - - Up to and including Java 1.4, the standard Java implementation of HTTP, which is accessible through the - [http://java.sun.com/j2se/1.4.2/docs/api/java/net/HttpURLConnection.html HttpURLConnection] - class, expects proxy configuration data in system properties. The names of the properties - that affect different protocols (HTTP, HTTPS, FTP,...) have changed over time. The two most - prominent examples are {{{http.proxyHost}}} and {{{http.proxyPort}}}. - You can read the values of these properties and supply the configuration as shown in the example below. - [[BR]] - Note that other properties will also affect the standard Java HTTP implementation, for example - a list of proxy exemptions in {{{http.nonProxyHost}}}. It is your application which must decide - whether the external proxy configuration is applicable or not. - - {{{ - - String proxyHost = System.getProperty("http.proxyHost"); - int proxyPort = Integer.parseInt(System.getProperty("http.proxyPort")); - - String url = "http://www.google.com"; - - HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager()); - client.getHttpConnectionManager().getParams().setConnectionTimeout(30000); - - client.getHostConfiguration().setProxy(proxyHost,proxyPort); - - GetMethod get = new GetMethod(url); - get.setFollowRedirects(true); - String strGetResponseBody = null; - - try { - int iGetResultCode = client.executeMethod(get); - strGetResponseBody = get.getResponseBodyAsString(); - } catch (Exception ex) { - ex.printStackTrace(); - } finally { - get.releaseConnection(); - } - - }}} - - Since Java 5.0, the - [http://java.sun.com/j2se/1.5.0/docs/api/java/net/ProxySelector.html ProxySelector] - class allows for a more flexible, per-connection proxy configuration of the default HTTP implementation. - Because !HttpClient 3.1 is compatible with Java 1.2, it cannot support that class directly. - However, your application can make use of the default {{{ProxySelector}}} to pick up the - standard proxy configuration and pass it to !HttpClient. - !HttpClient 4.0 requires Java 5 and will include an optional proxy selection mechanism - based on {{{ProxySelector}}}. If you choose to obtain your proxy configuration elsewhere, - you will of course still be able to do that, too. - - - === Operating System Settings === - - On Linux and Unix systems, a proxy on the operating system level is typically set in environment variables. - The Java method for reading environment variables is - [http://java.sun.com/j2se/1.4.2/docs/api/java/lang/System.html#getenv(java.lang.String) System.getenv]. - Unfortunately, it is deprecated and not even implemented in some Java versions (JDK 1.4?). - The recommended replacement is to pass relevant environment variables as system properties by using - {{{-Dname=value}}} options when starting your application. See above for reading a proxy configuration - from system properties. Of course you can use custom property names in order to pass values without - affecting the default HTTP implementation. - [[BR]] - If using {{{-D}}} options is not feasible and you are stuck with a JVM that does not implement {{{System.getenv}}}, - you can try to run a shell script using - [http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Runtime.html#exec(java.lang.String%5b%5d) Runtime.exec]. - The shell script should print the relevant environment variables to standard output, from where your - application can parse them. - - On Windows systems, the proxy configuration is typically set in the registry. - You can either use - [http://java.sun.com/j2se/1.4.2/docs/guide/jni/index.html native code] - to read the registry, or try to run a shell script (batch file) as mentioned for the Linux/Unix case. - - ''If you know something about proxy settings on Mac OS, please share that information. - You can edit this Wiki page directly, or send a mail to one of our - [http://jakarta.apache.org/httpcomponents/mail-lists.html mailing lists].'' - - - === Browser Settings === - - When an applet uses - [http://java.sun.com/j2se/1.4.2/docs/api/java/net/HttpURLConnection.html HttpURLConnection], - the Java plug-in running the applet will automatically pick up the proxy configuration of the browser, - and also cookies stored in the browser. This is described in the - [http://java.sun.com/j2se/1.4.2/docs/guide/plugin/developer_guide/contents.html Java plug-in Developer Guide] - for JDK 1.4, - [http://java.sun.com/j2se/1.4.2/docs/guide/plugin/developer_guide/proxie_config.html chapter 5] - and - [http://java.sun.com/j2se/1.4.2/docs/guide/plugin/developer_guide/cookie_support.html chapter 7] - respectively. While this documentation explains the complexity of obtaining the proxy configuration, - it does not mention a public API from which an application could pick it up. - - Since Java 5.0, you can use the default - [http://java.sun.com/j2se/1.5.0/docs/api/java/net/ProxySelector.html ProxySelector] - mentioned in the section on system properties above. When running in the Java plug-in, - it will provide access to the browser proxy configuration. - [[BR]] - ''If you know how to access the browser proxy configuration in previous versions of the Java plug-in, - please share that information. You can edit this Wiki page directly, or send a mail to one of our - [http://jakarta.apache.org/httpcomponents/mail-lists.html mailing lists].'' - [[BR]] - ''If you know how to access the browser cookie store from an applet, please share that information too. - Send a mail to one of our [http://jakarta.apache.org/httpcomponents/mail-lists.html mailing lists] - or start a new section in this Wiki page.'' - - - === Further Reading === - - [http://java.sun.com/j2se/1.4.2/docs/guide/net/properties.html Networking Properties], Java 1.4 - - [http://java.sun.com/j2se/1.5.0/docs/guide/net/proxies.html Java Networking and Proxies], Java 5.0 - - [http://java.sun.com/j2se/1.4.2/docs/guide/jni/index.html Java Native Interface] - - [http://java.sun.com/j2se/1.4.2/docs/guide/plugin/index.html Java Plug-in], Java 1.4 - - [http://java.sun.com/j2se/1.5.0/docs/guide/deployment/deployment-guide/proxie_config.html Proxy Configuration], Deployment Guide for Java 5.0 - --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]