On 12/6/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > example: <tag>&#0;</tag> is not valid XML
>
> Can you give an example of a query that needs binary information?

It's never an absolute need; one could always work around the problem,
for sure. The issue was more a desire to be able to represent everything
that *currently* works in Lucene (as far as queries go):
 - hacking the bits of numerics directly into chunks (7 or 15 bits, for
   example) (I actually do this)
 - representing separation of values or sentences with a null byte

Previously, all I had to watch out for was UTF-16 surrogates: as long as
I stayed below 0xD800, everything worked fine.

> Also I'd be curious to see a problem with Unicode code points in XML,
> if you have one handy.

The definition of valid XML 1.0 characters:
  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

The simplest example is code point 0. It's a valid Unicode character, but
it's not a valid XML character (even when you replace it with an entity).
Example: <tag>NullTerminated&#0;</tag> is not valid XML

> http://www.fawcette.com/javapro/2003_02/magazine/features/ehatcher/
> (must register to see the full article, unfortunately)
>
> I'm confident that XML can accommodate our needs just fine, and any
> other text transmission would have to re-solve many issues that XML
> has already solved.

Agreed. It wasn't a blocker, but it was something I wanted to see tackled
up front. It means adding a little more application logic to handle
escaping/unescaping.

The bottom line is I want to be able to represent the perfectly valid
Lucene query new TermQuery(new Term("field","\u0000")).

-Yonik
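
For illustration, the XML 1.0 character definition quoted above and the extra escaping/unescaping logic could be sketched in Java roughly as follows. This is only a sketch under assumptions: the class name XmlCharCheck, its method names, and the choice of Java-style backslash-u escapes are all hypothetical, not part of Lucene and not something agreed on in this thread.

```java
final class XmlCharCheck {

    /** True if the code point is allowed by the XML 1.0 character definition. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Hypothetical escaping scheme: every code point that XML 1.0 rejects,
     * and the backslash itself, is written as a backslash, the letter u,
     * and four hex digits. Everything else is copied through unchanged.
     */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp == '\\' || !isValidXmlChar(cp)) {
                // Rejected code points (controls, lone surrogates, U+FFFE,
                // U+FFFF) all fit in four hex digits.
                sb.append(String.format("\\u%04X", cp));
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    /** Reverses escape(); assumes the input was produced by escape(). */
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 6 <= s.length() && s.charAt(i + 1) == 'u') {
                // Decode the four hex digits back into one UTF-16 code unit.
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }
}
```

With a scheme along these lines, the term text would be escaped before it is written into the XML and unescaped again before the TermQuery is constructed, so the null character never appears in the XML itself.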
