Re: Namespaces in response (SOLR-1586)
: eh ... agree to disagree i guess. it seems just as valid to say that
: UpdateCommand -- what type of data does it update? ... or that
: RequestHandler is ambiguous because it can only handle Solr requests,
: so it should be titled SolrRequestHandler.
:
: True! I guess it's just aesthetics. I can go either way, but I dunno. (and
: yes, just to be a pest, What type of data does that UpdateCommand update?)

Isn't it obvious from the context? ... Solr Data :)

(i think that's the first, and last, time i've used an emoticon on a lucene mailing list)

: You give a little, you get a little back. Maybe a compromise is to call it
: NamedListResponseWriter, b/c that's really what it writes, no? Naming can be

By that logic every ResponseWriter is a NamedListResponseWriter, and a StringResponseWriter and a MapResponseWriter ... at a certain point you have to just trust that people will read the docs, you can't encode every bit of knowledge about the code base into the names.

-Hoss
Re: Namespaces in response (SOLR-1586)
: a SolrQueryResponse, no one has ever accused any of those response writers
: of not being flexible enough to generate a *different* type of response in
: those formats.
:
: You may be right, but actually quite a few issues have referenced even non
: XMLWriters of similar issues. See:

I honestly don't understand what you're getting at here, this list of issues is all over the map and almost none of them relate to the extensibility of any request handlers...

: http://issues.apache.org/jira/browse/SOLR-1616

...this was from someone who didn't notice json.nl=arrarr and felt like the default way of representing a NamedList in JSON was odd. they didn't disagree with the JSON structure, they just don't like the default.

: http://issues.apache.org/jira/browse/SOLR-358

...this was an improvement issue to track adding the ruby response writer ... which didn't exist before this.

: http://issues.apache.org/jira/browse/SOLR-1555

...this is a bug in how the term component adds the terms to the response ... it's completely orthogonal to the response output structure.

: http://issues.apache.org/jira/browse/SOLR-431

...this is from one of my coworkers who had some really old, really hideously hackish plugins from before Solr was open sourced and was trying to find a way to work around a bug fixed in the xml escaping -- i could maybe see this as a "response writers need to be more flexible" type issue, except they knew from the start they were abusing a bug.

: http://issues.apache.org/jira/browse/SOLR-912

...this is an issue Kay opened to revamp NamedList to be more typesafe ... also has absolutely nothing to do with how flexible the output representation is.

: Maybe, maybe not. I'm not sure the effect is to make it crystal clear as
: much as it is to make it clearer. XMLWriter is totally ambiguous -- what
: type of XML does it generate? I would argue SOLR response XML, hence the
: SolrXmlResponseWriter.

eh ... agree to disagree i guess.

it seems just as valid to say that UpdateCommand is ambiguous -- what type of data does it update? ... or that RequestHandler is ambiguous because it can only handle Solr requests, so it should be titled SolrRequestHandler.

we have enough ambiguity and confusion with some of our config file options and names that non-java users see ... the ones that only plugin writers see i'm less concerned with ... better to beef up the javadocs than deal with a bunch of deprecation headaches just to add Solr to the front of a class name.

-Hoss
Re: Namespaces in response (SOLR-1586)
Hi Hoss: On 12/15/09 6:39 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : a SolrQueryResponse, no one has ever accused any of those response writers : of not being flexible enough to generate a *different* type of response in : those formats. : : You may be right, but actually quite a few issues have referenced even non : XMLWriters of similar issues. See: I honeslty don't understand what you're getting at here, this list of issues is all over the map and almost none of them relate to the extensibility of any request handlers... They may be all over the map, but in general they address your statement about non-XML response writers being flexible enough to generate a different type of response (although admittedly, none are as clear at the XMLWriter examples, I'll give you that). The examples I gave were just based on a quick search of JIRA. : Maybe, maybe not. I'm not sure the effect is to make it crystal clear as : much as it is to make it clearer. XMLWriter is totally ambiguous -- what : type of XML does it generate? I would argue SOLR response XML, hence the : SorlXmlResponseWriter. eh ... agree to disagree i guess. it seems just as valid to say that UpdateCommand -- what type of data does it update? ... or that RequestHandler is ambigious because it can only handle Solr requests, so it should be title SolrRequestHandler. True! I guess it's just aesthetics. I can go either way, but I dunno. (and yes, just to be a pest, What type of data does that UpdateCommand update?) we have enough ambiguity and confusion with some of our config file options and names that non-java users see ... the ones that only plugin writers see i'm less concerned with ... better to beef up the javadocs that deal with a bunch of deprecation headaches just to add Solr to the front of a class name. You give a little, you get a little back. Maybe a compromise is to called it NamedListResponseWriter, b/c that's really what it writes, no? 
Naming can be a pain -- I'll try and think of a good one when I'm preparing the patch for SOLR-1649. Thanks for the discussion. Helps to clarify things! Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: Namespaces in response (SOLR-1586)
: I'm conflicted here. In simple semantics, sure it's just an array of
: float/double numbers. A, if a string must be used a comma is probably OK, so
: long as it maps to some existing known approach to represent points. I've
: asked several times if there are examples. I can point to one that uses
: spaces to separate the coordinates in the point (georss). What others use
: comma?

I have no opinion about the details ... space separated string, comma separated string, list of ints ... they are all the same to me. As a layman, my limited knowledge of geo coordinates has a vague notion that comma is the separator used when discussing latitude and longitude, but i have no real knowledge of anything GIS related.

(i think i remember that KML uses comma, but KML also has some weird idea that longitude comes first because that's what the guys writing graphics rendering engines apparently like: y-axis first)

: Well, I actually would disagree. What's the point of #toInternal and
: #toExternal then, other than to convert from the external representation to
: an internal Lucene index representation, and then to do the opposite coming
: out of the index?

that is what they are for -- but they deal purely in string representations of the data itself -- they don't (and shouldn't) know/care whether the data is then being encapsulated in JSON, thrift, Avro, Solr XML, RSS, KML, etc. The String limitation of toExternal is one of the reasons toObject was added (and the reason the BinaryResponseWriter uses toObject()).

: class final which it once was). We should rename that to
: SolrXmlResponseWriter, but it's not really generic XML (as the name
: suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since

Eh... i don't know that the name suggests that it can generate generic XML, it generates a (particular) one to one mapping from the SolrQueryResponse to XML ... just like the JSONResponseWriter generates a one to one mapping from the SolrQueryResponse to JSON, and ditto for the ruby/php/python writers ... there are an infinite number of possible XML/JSON/Ruby/PHP/Python/etc. structures that *could* be generated from a SolrQueryResponse, no one has ever accused any of those response writers of not being flexible enough to generate a *different* type of response in those formats.

And practically speaking: slapping Solr in front of a response writer classname isn't going to make it crystal clear that it produces a solr specific type of . It's going to make people think it's the Solr implementation of . Solr is the prefix of enough classnames that eyeballs are just going to gloss over it.

: suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since
: it's undocumented, I'd be happy to throw it together for its XML format.

we actually went round and round on documenting it back in the early days ... frequently it was deemed self documenting enough for end users so not much effort was ever put into it. there was a Jira issue to create an XSD, but even once we had one, no one really had any idea what to *do* with it...

https://issues.apache.org/jira/browse/SOLR-17

: Would that also be welcomed? Then, we should develop an easy extension point
: mechanism for people who want to develop their own XML response writers and
: write their own clients (or leverage existing clients that understand that
: XML).

+1

I think the crux of this would be an XML based response writer similar to the BinaryResponseWriter that can use a codec type system for outputting known types of objects, using FieldType.toObject() to get field values. Then we just have to provide default codecs for all the types of objects we produce out of the box, but people can customize with their own codecs if they want a different representation.

-Hoss
Re: Namespaces in response (SOLR-1586)
Hi Hoss, On 12/14/09 3:18 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Well, I actually would disagree. What's the point of #toInternal and : #toExternal then, other than to convert from the external representation to : an internal Lucene index representation, and then to do the opposite coming : out of the index? that is what they are for -- but they deal purely in string representations of hte data itself -- they don't (and shouldn't) know/care wether the data is then being encapsulted in JSON, thrift, Avro, Solr XML, RSS, KML, etc The String limitation of toExternal is on of the reasons toObject was added (and the reason the BinaryResponseWRiter uses toObject()). Conceptually I think that the best approach would be to do something similar to the functionality of #toObject, but to not call it that. #toInternal and #toExternal are actually good names, their interface is just off (they shouldn't return Strings). : class final which it once was). We should rename that to : SolrXmlResponseWriter, but it's not really generic XML (as the name : suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since Eh... i don't know that the name suggests that it can generate generic XML, it generates a (particular) one to one mapping from the SolrQueryResponse to XML .. just like the JSONResponseWriter generates a one to one mapping fromthe SolrQueryResponse to JSON, and ditoo for the ruby/php/python writers ... there an infinite number of possible XML/JSON/Ruby/PHP/Python/etc. structures that *could* be generated from a SolrQueryResponse, no one has ever accused any of those response writers of not being flexible enough to generate a *different* type of response in those formats. You may be right, but actually quite a few issues have referenced even non XMLWriters of similar issues. 
See: http://issues.apache.org/jira/browse/SOLR-1616 http://issues.apache.org/jira/browse/SOLR-358 http://issues.apache.org/jira/browse/SOLR-1555 http://issues.apache.org/jira/browse/SOLR-431 http://issues.apache.org/jira/browse/SOLR-912 And practicle speaking: slapping Solr in front of a response writer classname isn't going to make it crystal clear that it produces a solr specific type of . It's oging to make people think it's the Solr implemntation of . Solr is hte prefix of enough classnames that eyeballs are just going to gloss over it. Maybe, maybe not. I'm not sure the effect is to make it crystal clear as much as it is to make it clearer. XMLWriter is totally ambiguous -- what type of XML does it generate? I would argue SOLR response XML, hence the SorlXmlResponseWriter. : suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since : it's undocumented, I'd be happy to throw it together for it's XML format. we actaully went round and round on documenting it back in the early days .. frequently it was deemed self documenting enough for end users so not much effort was ever put into it. there was a Jira issue to create and XSD, but even once we had one, no one really had any idea what to *do* with it... https://issues.apache.org/jira/browse/SOLR-17 I commented on SOLR-17 on what could be done with it, and I linked it to the new issue I threw up: SOLR-1646. Both can be closed at the same time, or even better, I can close SOLR-1646 and then work diligently on trying to get SOLR-17 committed. Even for documentation purposes it's well worth while. : Would that also be welcomed? Then, we should develop an easy extension point : mechanism for people who want to develop their own XML response writers and : write their own clients (or leverage existing clients that understand that : XML). 
+1 I think the crux of this would be XML based response writer similar to hte BinaryResponseWriter that can use a codec type system for outputing known types of objects, using FiledType.toOBject() to get field values. Then we just have to provide default codecs for all the types of objects we produce out of the box, but people can customize with their own codecs if they want differnet representation. +1! Thanks, Hoss. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: Namespaces in response (SOLR-1586)
: : I think the initial geosearch feature can start off with
: : <str>10,20</str> for a point.
:
: +1.
:
: Fundamentally, how is a string a point?

Fundamentally a string is not a point, and a point is not a string -- but if you want to express the concept of a point in a manner that only uses very simple primitive types, then a string containing comma separated numbers is a pretty decent way to do it. If you'd prefer, a pair of numbers would work just as well...

  <arr><float>10</float><float>20</float></arr>

: The current XML format Solr uses was designed to be extremely simple, very
: JSON-esque, and easily parsable by *anyone* in any language, without
: needing special knowledge of types .
:
: Whoah. I'm totally confused now. Why have FieldTypes then? When not just use
: Lucene? The use case for FieldTypes is _not_ just for indexing, or querying.
: It's also for representation?

No, actually the use case for FieldTypes is entirely about the internal logic of how Solr should deal with those fields, and how various operations should work on them. FieldTypes can dictate the internal representation within the confines of a Lucene index, but they should not circumvent the contracts of the response writers in dictating what is/isn't a legal response.

XMLWriter.writePrim may be public, which means there is a loophole that plugin writers can exploit to add new tag names to the Solr XML response that violate the contract (and no we don't have a formal XSD or DTD for our XML response format, but we still have a very well advertised contract) -- but that doesn't mean that code which ships with Solr should exploit those loopholes to violate that contract.

People should expect that if they use Solr as is without any custom code that the XMLResponseWriter won't all of a sudden start including new, non-primitive-ish, XML tags/attributes that weren't there before. That's the entire point of the format as it was designed: break down whatever complex data might be involved in a response into easily digestible maps/lists of maps/lists of very primitive types that can easily be used in any programming language.

: allowed for a while I think), why prevent it? Allowing namespaces does _not_
: break anything.
...
: introducing a new 'point' concept, whether as <point> or as
: <georss:point/>, is going to break things for people.
:
: Show me an example, I fundamentally disagree with this.

Ok. Let's start with SolrJ then: take a look at the KnownType enum (line 151) in XMLResponseParser...

http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/XMLResponseParser.java?revision=819403&view=markup

...or let's do a random google code search for "solr xml lst" -- check out ResponseContentHandler in solrpy...

http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841

...I can't write python code to save my life, but I have a pretty good idea what that code will do if it sees an unexpected tag.

This is how a *LOT* of Solr client libraries are implemented ... it's not an issue of broken XML parsers freaking out about namespaces, it's an issue of having a long standing, heavily advertised schema for the XML response that promises to only ever use a handful of types. Adding any new tags to this format (regardless of how easy it may be because of that stupid fucking public modifier on XMLWriter.writePrim) will absolutely break things for people.

: And why is that? Isn't the point of SOLR to expand to use cases brought up
: by users of the system? As long as those use cases can be principally
: supported, without breaking backwards compatibility (or in that case, if
: they do, with large blinking red text that says it), then you're shutting
: people out for 0 benefit? It's aesthetics we're talking about here.

I don't know if i'd say that's the point of Solr, but yes we should absolutely try to grow the capabilities of the system as new use cases come along.

I am 100% in agreement that the existing simple XMLResponseWriter is not for everyone -- Historically we've tried to maintain a sense of equality between all of the Response writers, so that they all contained the same data just with different markup -- but there are clearly cases where it would be nice to have a response writer that is allowed to know more about the real structure of the data and represent it in a manner that more closely represents its purpose. This was the entire point behind adding FieldType.toObject, and UUIDField w/the BinaryResponseWriter is a good example of the model we should follow in the future.

There is a clear push for Solr to natively be able to generate responses that incorporate more industry standard XML schemas, and i would love to see us start adding functionality to do that, but bastardizing the existing XMLResponseWriter format is not the way to do it.

Bottom Line: I am a big fat -1 on any patch to Solr that adds new xml tags to the output
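The client-breakage argument above can be illustrated with a minimal sketch. This is not the SolrJ or solrpy code linked in the message; StrictClientSketch, KnownTag, and countElements are hypothetical names, and the tag list is only an assumption about what such a client might hard-code. It shows how a parser written against a closed set of tag names rejects a well-formed, correctly namespaced element it was never told about.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical client sketch; these names are illustrative, not SolrJ or solrpy code.
class StrictClientSketch {

    // The closed set of element names this client was written against (assumed list).
    enum KnownTag { RESPONSE, RESULT, DOC, LST, ARR, STR, INT, LONG, FLOAT, DOUBLE, BOOL, DATE }

    // Walks the response and returns the number of elements seen.
    // Throws IllegalStateException on the first element outside KnownTag.
    static int countElements(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String prefix = reader.getPrefix();
                String name = (prefix == null || prefix.isEmpty())
                    ? reader.getLocalName()
                    : prefix + ":" + reader.getLocalName();
                try {
                    KnownTag.valueOf(name.toUpperCase(Locale.ROOT));
                } catch (IllegalArgumentException unknown) {
                    // The XML itself is fine; the client simply has no case for this tag.
                    throw new IllegalStateException("unexpected tag: " + name);
                }
                count++;
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String plain = "<response><result><doc><str name=\"point\">10,20</str></doc></result></response>";
        String namespaced = "<response xmlns:georss=\"http://www.georss.org/georss\"><result><doc>"
            + "<georss:point name=\"point\">10 20</georss:point></doc></result></response>";
        System.out.println(countElements(plain));
        try {
            countElements(namespaced);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The XML parser accepts both documents; the failure comes from the client's own lookup, which is the distinction the message draws between broken XML parsers and a broken response contract.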
Re: Namespaces in response (SOLR-1586)
: themselves ... because of the back-ass-wards way we have FieldTypes write
: their values directly to an XMLWriter or a TextWriter the idea of using an
: object that stringifies itself as needed doesn't really apply very well
:
: I think it's rather powerful. You insulate the following variations into 1
: single place to change them (FieldType):
:
: * output representation
: * indexing
: * validation
:
: To remove this from FieldType would be to strew the same functionality
: across multiple classes, which doesn't make sense IMHO.

it's a damned-if-you-do/damned-if-you-don't situation though ... you look at it as insulating the response writers because all of the logic about serializing data is in the FieldType, but i look at it as polluting the FieldType with knowledge about the output formats -- there's a reason we didn't add writeBinary to the FieldType when the BinaryResponseWriter was added ... the toObject abstraction lets the FieldType do whatever it wants internally, and provide its best face to the world when asked. the ResponseWriters can then apply heuristics to decide the most compatible type they know of to use when representing it:

  is it something complex i have a codec for?
  no; oh well, then is it something that implements Collection?
  no; oh well, then is it something that is an instanceof Number?
  no; oh well, as a last resort we can stringify

: In the long run, this might be nice, and +1 on getting there in the long
: run. In the short, a compromise is to allow namespacing on fields in the
: existing XmlWriter, which is allowed anyways, whether by oversight or not.

I'm sure if we look hard enough at the existing internal APIs, we can find a way to generate completely broken XML that no DOM, SAX or pull parser could possibly deal with cleanly -- but that doesn't mean we should do that just because it would allow us to start outputting a bunch of metadata that we think is useful.

breaking the (implicit) XML Schema is just as bad as breaking the XML itself.

-Hoss
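The fallback heuristics described in this message (codec, then Collection, then Number, then stringify) can be sketched roughly as follows. This is an illustration under assumptions, not Solr's implementation: FallbackWriterSketch, register, and write are hypothetical names, and the value passed to write stands in for whatever a FieldType's toObject would hand back.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; FallbackWriterSketch and its methods are not Solr APIs.
class FallbackWriterSketch {

    // Codecs for complex types, checked in registration order.
    private final Map<Class<?>, Function<Object, String>> codecs = new LinkedHashMap<>();

    <T> void register(Class<T> type, Function<T, String> codec) {
        codecs.put(type, value -> codec.apply(type.cast(value)));
    }

    String write(Object value) {
        // 1. Something complex we have a codec for?
        for (Map.Entry<Class<?>, Function<Object, String>> entry : codecs.entrySet()) {
            if (entry.getKey().isInstance(value)) {
                return entry.getValue().apply(value);
            }
        }
        // 2. A Collection? Write each element with the same rules.
        if (value instanceof Collection) {
            StringBuilder out = new StringBuilder("<arr>");
            for (Object item : (Collection<?>) value) {
                out.append(write(item));
            }
            return out.append("</arr>").toString();
        }
        // 3. A Number? Pick a primitive tag; anything unlisted is written as a double here.
        if (value instanceof Number) {
            String tag = value instanceof Integer ? "int"
                : value instanceof Long ? "long"
                : value instanceof Float ? "float"
                : "double";
            return "<" + tag + ">" + value + "</" + tag + ">";
        }
        // 4. Last resort: stringify.
        return "<str>" + escape(String.valueOf(value)) + "</str>";
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        FallbackWriterSketch writer = new FallbackWriterSketch();
        System.out.println(writer.write(Arrays.asList(10f, 20f)));
        System.out.println(writer.write(42));
        System.out.println(writer.write("a < b"));
    }
}
```

With no codec registered, every value still degrades to one of the primitive tags, so the field type never needs to know which output format is in use.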
Re: Namespaces in response (SOLR-1586)
Hi Hoss, : : I think the initial geosearch feature can start off with : : str10,20/str for a point. : : +1. : : Fundamentally, how is a string a point? Fundementally a string is not a point, and a point is not a string -- but if you want express the concept of a point in a manner that only uses very simple primative types, then a string containing comma seperated numbers is a pretty dencet way to do it. If you'd prefer, a pair of numbers would workd just as well... arrfloat10/floatfloat20/float/arr I'm conflicted here. In simple semantics, sure it's just an array of float/double numbers. A, if a string must be used a comma is probably OK, so long as it maps to some existing known approach to represent points. I've asked several times if there are examples. I can point to one that uses spaces to separate the coordinates in the point (georss). What others use comma? : The current XML format SOlr uses was designed to be extremely simple, very : JSON-esque, and easily parsable by *anyone* in any langauge, without : needing special knowledge of types . : : Whoah. I'm totally confused now. Why have FieldTypes then? When not just use : Lucene? The use case for FieldTypes is _not_ just for indexing, or querying. : It's also for representation? No, actually the use case for FieldTYpes is entirely about the internal logic of how Solr should deal with those fields, and how various operations should work on them. FieldTypes can dictate the internal representation within the confines of a Lucene index, but they should not circumvent the contracts of the response writers in dictating what is/isn't a legal response. Well, I actually would disagree. What's the point of #toInternal and #toExternal then, other than to convert from the external representation to an internal Lucene index representation, and then to do the opposite coming out of the index? : allowed for a while I think), why prevent it? Allowing namespaces does _not_ : break anything. ... 
: introducing a new 'point concept, wether as point or as : georss:point/, is going to break things for people. : : Show me an example, I fundamentally disagree with this. Ok. Let's start with SolrJ then: take a look at the KnownType enum (line 151) in XMLResponseParser... http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/clien t/solrj/impl/XMLResponseParser.java?revision=819403view=markup Got it. OK, sure, well thanks for actually being able to identify somewhere where it would be and for taking the time to provide a link. So what you are saying is that this breaks the SolrJ and python clients and people who develop clients to parse and read the (undocumented) SOLR response schema. ...or let's do a random google code search for solr xml lst -- check out ResponseContentHandler in solrpy... http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841 ...I can't write python code to save my life, but I have pretty good idea what that code will do if it sees an unexpected tag. Gotcha. : And why is that? Isn't the point of SOLR to expand to use cases brought up : by users of the system? As long as those use cases can be principally : supported, without breaking backwards compatibility (or in that case, if : they do, with large blinking red text that says it), then you're shutting : people out for 0 benefit? It's aesthetics we're talking about here. I don't know if i'd say that's the point of Solr, but yes we should absolutely try to grow the capabilities of the system as new use cases come along. Well that's what I was trying to do, but all I was hearing was a lot of hollering without any help to understand why. Thanks for being the one to finally provide that information. 
I am 100% in agreement that the existing simple XMLRresponseWriter is not for everyone -- Historicly we've tried to maintain a sense of equality between all of hte Response writers, so that they all contained the same data just with different markup -- but there are clearly cases where it would be nice to have a response writer that is allowed to know more about teh real structure of the data and represent it in a manner that more closely represents it's purpose. I'd like to refactor the whole thing to be a bit less brittle, and also to close off people that shouldn't be dealing with SOLR's XML in/out (by taking away your favorite writePrim method and its public modifier and making the class final which it once was). We should rename that to SolrXmlResponseWriter, but it's not really generic XML (as the name suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since it's undocumented, I'd be happy to throw it together for it's XML format. Would that also be welcomed? Then, we should develop an easy extension point mechanism for people who want to develop their own XML response writers and write their own clients (or leverage existing clients that
Re: Namespaces in response (SOLR-1586)
Hi Hoss, : I think it's rather powerful. You insulate the following variations into 1 : single place to change them (FieldType): : : * output representation : * indexing : * validation : : To remove this from FieldType would be to strew the same functionality : across multiple classes, which doesn't make sense IMHO. it's a damned-if-you-do/damned-if-you-don't situation though ... you look at as insulating the response writers because all of the logic about serializing data is in the FieldType, but i look at it as poluting the FieldType with knowledge about the output formats -- there's a reason we didn't add writeBinary to the FieldTYpe when the BinaryResponseWriter was added ... the toObject abstraction let's the FieldType do whatever it wants internally, and provide it's best face to the world when asked. the ResponseWriters can then apply hueristics to decide the most compatible type they know of to use when representing it: is it something complex i have a codec for? no; oh well, then is it soemthing that implemnets COllection? no; oh well, then is it something that is an instanceof Number? no; oh well, as a last resort we can stringify Sure, it's just that it's half-way on both sides right now like you said. There's probably a middle ground. I like the insulation but I also understand the clutter (i.e., what you're saying). : In the long run, this might be nice, and +1 on getting there in the long : run. In the short, a compromise is to allow namespacing on fields in the : existing XmlWriter, which is allowed anyways, whether by oversight or not. I'm sure if we look hard enough at teh existing internal APIs, we can find a way to generate completley broken XML that no DOM, SAX or pull parser could possibly deal with cleanly -- but that doesn't mean we should do that just because it would allow us to start outputing a bunch of metadata that we think is useful. breaking the (implicit) XML Schema is just as bad as breaking the XML itself. Agreed. 
Let's document that (implicit) schema so loud people like me don't keep bugging you guys when it's so obvious to you. I'm just trying to help. I'll take an action. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: Namespaces in response (SOLR-1586)
Hi Grant, and others,

My 2 cents (and of course I'm biased, having prepared the patch):

In SOLR-1586, the proposed patch introduces the concept that a Solr response can declare a namespace for part of the response (in this case, it is using the tags defined by georss.org to specify a point, etc.).

The patch doesn't introduce this concept -- it makes use of it. XMLWriter#writePrim took care of that for me, see Hostetter's comment:

http://www.lucidimagination.com/search/document/be6fb7ce53c2922d/jira_created_solr_1592_refactor_xmlwriter_starttag_to_allow_arbitrary_attributes_to_be_writ

Since that method is public, anyone could have done this in the past, they just chose not to. Moreover, they chose not to in the committed source for SOLR, but others who took SOLR, prepared their own XML response writers, etc., may have done this same thing as well.

Discussion points:

1. If there are standard namespaces, then people can use them to do fun XML things

+1. This includes things like validation, strong typing (see SOLR-912 for others who also believe that the NamedList BagOfObjects structure, while robust, introduces type confusion when unraveling the response), and plugging in to other tools. Imagine a GIS tool that required a georss:point to be returned back somehow. You could argue XSLT could do this, but as you note below, it's an extra step. It also _implicitly_ ties the representation and typing of a FieldType to something that isn't really tied to a field type at all (an XSLT file?)

2. If we allow them, we get all of the other benefits of namespaces...

For sure -- see above for some examples.

3. The indexing side doesn't support them, so it seems odd to put in something like <field name="point">55.3 27.9</field> and get back <georss:point name="point">55.3 27.9</georss:point>. At the same time, it seems equally weird to get back <str name="point">...</str> when there is in fact more semantic information available about this particular field that would otherwise require more work by an application to make sense of.

You got it. I'm not sure why it seems weird -- the translation from docs/fields to external representation (via response writers or field type representation) is one of the benefits of SOLR IMHO.

4. If we let in other namespaces, we then are opening ourselves to longer responses, etc. It is also likely the case that there isn't just one standard. This likely could mean slower responses, etc.

How does adding in some characters (e.g., an ns tag and an associated URL) add anything other than noise? We're talking the difference between O(n) versus O(n+20) here. Also it's perfectly legit IMHO to say, well if you introduce 10,000 namespaces, well, that's on you, and be prepared for slower client/server interactions.

5. If people wanted them, they could just do XSLT, but that is an extra step too.

Yep, that's an extra step, and it's not explicit, like the patch I attached is. I tried to take advantage of one of SOLR's extension points in the architecture to explicitly tie a representation of a Field to its external and internal representation (aka, the point of a FieldType, no?)

An alternative is that we could refactor things a bit and allow the FieldType to specify the tag name instead of it being hardcoded in the writers. This way people writing FieldTypes could define them. For instance, we could have FieldType.getTagName() that could be overridden and clients could have tools for introspecting this.

This is basically what I did, right? I did an inline namespace using a variant of #writePrim in XMLWriter (#writeCdata) and had the FieldType#toExternal method set the tag name, which is allowed by the API.

Cheers,
Chris

++ Chris Mattmann, Ph.D.
Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: Namespaces in response (SOLR-1586)
Inline...

On Dec 9, 2009, at 9:33 AM, Mattmann, Chris A (388J) wrote:

: Hi Grant, and others,
:
: My 2 cents (and of course I'm biased, having prepared the patch):

: : In SOLR-1586, the proposed patch introduces the concept that a Solr response can declare a namespace for part of the response (in this case, it is using the tags defined by georss.org to specify a point, etc.).

: The patch doesn't introduce this concept -- it makes use of it. XMLWriter#writePrim took care of that for me, see Hostetter's comment:
:
: http://www.lucidimagination.com/search/document/be6fb7ce53c2922d/jira_created_solr_1592_refactor_xmlwriter_starttag_to_allow_arbitrary_attributes_to_be_writ
:
: Since that method is public, anyone could have done this in the past, they just chose not to. Moreover, they chose not to in the committed source for SOLR, but others who took SOLR, prepared their own XML response writers, etc., may have done this same thing as well.

: : Discussion points:
: : 1. If there are standard namespaces, then people can use them to do fun XML things

: +1. This includes things like validation,

Yeah, but the rest of Solr's response doesn't have it, so...

: strong typing (see SOLR-912 for others who also believe that the NamedList BagOfObjects structure, while robust, introduces type confusion when unraveling the response), and plugging in to other tools. Imagine a GIS tool that required a georss:point to be returned back somehow. You could argue XSLT could do this, but as you note below, it's an extra step. It also _implicitly_ ties the representation and typing of a FieldType to something that isn't really tied to a field type at all (an XSLT file?)

Agreed.

: : 2. If we allow them, we get all of the other benefits of namespaces...

: For sure -- see above for some examples.

: : 3. The indexing side doesn't support them, so it seems odd to put in something like <field name="point">55.3 27.9</field> and get back <georss:point name="point">55.3 27.9</georss:point>. At the same time, it seems equally weird to get back <str name="point">...</str> when there is in fact more semantic information available about this particular field that would otherwise require more work by an application to make sense of.

: You got it. I'm not sure why it seems weird -- the translation from docs/fields to external representation (via response writers or field type representation) is one of the benefits of SOLR IMHO.

It's weird b/c no XML type was specified upfront, but a type was given out on the back end. It's not a show stopper or anything, just an interesting point, I think.

: : 4. If we let in other namespaces, we then are opening ourselves to longer responses, etc. It is also likely the case that there isn't just one standard. This likely could mean slower responses, etc.

: How does adding in some characters (e.g., an ns tag and an associated URL) add anything other than noise? We're talking the difference between O(n) versus O(n+20) here. Also it's perfectly legit IMHO to say, well, if you introduce 10,000 namespaces, that's on you, and be prepared for slower client/server interactions.

You'd be surprised how slow XML parsing often is. Especially for larger responses, XML processing can be quite expensive, and most of the information is verbose at best. I've seen this on a number of occasions, and it is why we switched to a binary response format in SolrJ and why I think all clients should speak the binary protocol.

: : 5. If people wanted them, they could just do XSLT, but that is an extra step too.

: Yep, that's an extra step, and it's not explicit, like the patch I attached is. I tried to take advantage of one of SOLR's extension points in the architecture to explicitly tie a representation of a Field to its external and internal representation (aka, the point of a FieldType, no?)

: : An alternative is that we could refactor things a bit and allow the FieldType to specify the tag name instead of it being hardcoded in the writers. This way people writing FieldTypes could define them. For instance, we could have FieldType.getTagName() that could be overridden and clients could have tools for introspecting this.

: This is basically what I did, right? I did an inline namespace using a variant of #writePrim in XMLWriter (#writeCdata) and had the FieldType#toExternal method set the tag name, which is allowed by the API.

As Hoss points out on the thread, I think the longer-term goal seems to be to be more agnostic of the FieldType, so this would argue against my proposal.

-Grant
Re: Namespaces in response (SOLR-1586)
Hi Grant,

My replies inline as well:

: : : Discussion points:
: : : 1. If there are standard namespaces, then people can use them to do fun XML things

: : +1. This includes things like validation,

: Yeah, but the rest of Solr's response doesn't have it, so...

You mean the rest of SOLR's default response and the components that add to it. As a user of SOLR, I can introduce as many inline xmlns attributes (and thus declare an arbitrary number of namespaces) as I want; my point was that nothing precludes me from doing so.

: : : 3. The indexing side doesn't support them, so it seems odd to put in something like <field name="point">55.3 27.9</field> and get back <georss:point name="point">55.3 27.9</georss:point>. At the same time, it seems equally weird to get back <str name="point">...</str> when there is in fact more semantic information available about this particular field that would otherwise require more work by an application to make sense of.

: : You got it. I'm not sure why it seems weird -- the translation from docs/fields to external representation (via response writers or field type representation) is one of the benefits of SOLR IMHO.

: It's weird b/c no XML type was specified upfront, but a type was given out on the back end. It's not a show stopper or anything, just an interesting point, I think.

I actually disagree with this. FieldTypes, if we agree on a data type representation, e.g., georss point format, or line format, etc., define their XML representation. So, if we have a FieldType of type georss:point, then a type _is_ given up front, it's just defined in the standard that defines the field element. Imagine if you wanted to standardize on something like Dublin Core, for titles, formats, etc. SOLR expects a fairly simple XML structure (Documents, with Fields, with attributes), but the advantage of SOLR over traditional Lucene is that via FieldTypes, you can understand what the true type of the field you are indexing is.

In other words, we can say in a schema file that, e.g., this incoming title is DublinCore, so its field type is solr.DublinCoreAuthor, which, inside of the FieldType definition, tells us how to go from the given representation to the index representation (#toInternal) and subsequently tells us how to go from the index representation to the external representation (#toExternal). I'm not advocating changing SOLR's input doc format for indexing -- I'm arguing that what you guys have done is actually a great idea. Keeping FieldTypes and SolrInputDocuments separate allows each to evolve independently of the other, but at the same time be brought back together for the purpose of, e.g., validation (see the lat/lon validation I did in the attached patch), response writing (for plugging into external tools), and representation in the Lucene index outside of plain ol' Strings.

: : : 4. If we let in other namespaces, we then are opening ourselves to longer responses, etc. It is also likely the case that there isn't just one standard. This likely could mean slower responses, etc.

: : How does adding in some characters (e.g., an ns tag and an associated URL) add anything other than noise? We're talking the difference between O(n) versus O(n+20) here. Also it's perfectly legit IMHO to say, well, if you introduce 10,000 namespaces, that's on you, and be prepared for slower client/server interactions.

: You'd be surprised how slow XML parsing often is. Especially for larger responses, XML processing can be quite expensive, and most of the information is verbose at best. I've seen this on a number of occasions, and it is why we switched to a binary response format in SolrJ and why I think all clients should speak the binary protocol.

Sure, XML parsing can be slow, but from your point above, you guys have standardized on using a binary request/response format in things like SolrJ, so what does the XML have to do with this anyway, and why is performance a concern then? In the case where people want XML, in their particular format, it's up to them to parse (and in most cases, if they are outputting a format, there are likely already readers/etc. that exist for that format, where things like optimizations can be delegated to). On the other hand, let's consider XSLT, which is a big performance hit as well, in many cases more of a hit than simply outputting XML with the namespaces inline.

Also, let's qualify this. I'm not saying we should make SOLR's default response (and all its Components that add to the response) be forced to use namespaces. However, it should definitely not be precluded.

: : : 5. If people wanted them, they could just do XSLT, but that is an extra step too.

: : Yep, that's an extra step, and it's not explicit, like the patch I attached is. I tried to take advantage of one of SOLR's extension points in the architecture to explicitly tie a representation of a Field to its external and internal representation (aka, the point of a
Re: Namespaces in response (SOLR-1586)
Hey All,

1. Namespaces are fun, especially when you have some target format you are trying to work towards. Many target formats use namespaces extensively, so having the ability to map to them on the back end (response) would be great. This does not mean that Solr would have to utilize namespaces at all, and supporting them internally is a different issue. I think that was the spirit of the original patch.
2. From what I'm gathering, this is a discussion of whether Solr supports them internally. Hopefully, there is a differentiation between internal/external namespace usage with Solr.
3. Why must the response dictate what is done internally within Solr?
4. Internally it would seem that these are just string mappings, so how much impact would there really be on writing out the response?
5. If the shift is just to have them use XSLT, my guess is that it would cause a slower response than direct mappings. This is solely my opinion, as I have not done any tests, but NamedList -> XML -> XSLT would seem logically slower than NamedList -> (mapped) XML.

Thanks,
Paul Ramirez

On 12/9/09 5:30 AM, Grant Ingersoll gsing...@apache.org wrote:

: In SOLR-1586, the proposed patch introduces the concept that a Solr response can declare a namespace for part of the response (in this case, it is using the tags defined by georss.org to specify a point, etc.). I'm not sure what to make of this. My gut reaction says no, but I'm not a namespace expert and I also don't feel strongly about it.
:
: Discussion points:
: 1. If there are standard namespaces, then people can use them to do fun XML things
: 2. If we allow them, we get all of the other benefits of namespaces...
: 3. The indexing side doesn't support them, so it seems odd to put in something like <field name="point">55.3 27.9</field> and get back <georss:point name="point">55.3 27.9</georss:point>. At the same time, it seems equally weird to get back <str name="point">...</str> when there is in fact more semantic information available about this particular field that would otherwise require more work by an application to make sense of.
: 4. If we let in other namespaces, we then are opening ourselves to longer responses, etc. It is also likely the case that there isn't just one standard. This likely could mean slower responses, etc.
: 5. If people wanted them, they could just do XSLT, but that is an extra step too.
:
: An alternative is that we could refactor things a bit and allow the FieldType to specify the tag name instead of it being hardcoded in the writers. This way people writing FieldTypes could define them. For instance, we could have FieldType.getTagName() that could be overridden and clients could have tools for introspecting this.
:
: I'm not sure what effect any of this would have on downstream clients, either.
:
: Thoughts?
:
: -Grant
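
To make Grant's getTagName() alternative concrete, here is a minimal sketch, written in Python rather than Java for brevity. Everything named below (FieldType, GeoRssPointType, tag_name, namespaces, write_field) is hypothetical and is not Solr's API; the GeoRSS namespace URI is assumed as well. The only thing taken from the thread is the idea that the field type, not the writer, chooses the element name.

```python
from xml.sax.saxutils import escape, quoteattr


class FieldType:
    """Hypothetical stand-in for a field type; not Solr's real class."""

    def tag_name(self):
        # Default: the generic string tag a writer would otherwise hardcode.
        return "str"

    def namespaces(self):
        # Mapping of prefix -> namespace URI required by tag_name(), if any.
        return {}


class GeoRssPointType(FieldType):
    """Hypothetical field type that asks for a namespaced tag."""

    def tag_name(self):
        return "georss:point"

    def namespaces(self):
        # Assumed GeoRSS namespace URI, declared inline on the element.
        return {"georss": "http://www.georss.org/georss"}


def write_field(field_type, name, value):
    """Serialize one field, asking the field type which tag to use."""
    attrs = "".join(
        f" xmlns:{prefix}={quoteattr(uri)}"
        for prefix, uri in field_type.namespaces().items()
    )
    tag = field_type.tag_name()
    return f"<{tag}{attrs} name={quoteattr(name)}>{escape(value)}</{tag}>"


plain = write_field(FieldType(), "point", "55.3 27.9")
rich = write_field(GeoRssPointType(), "point", "55.3 27.9")
print(plain)
print(rich)
```

The writer stays generic here: it never mentions georss, which is roughly the introspection point Grant raises, since a client tool could call the same two methods to learn which tags to expect.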
Re: Namespaces in response (SOLR-1586)
My gut feeling is that we should not be introducing namespaces by default. It introduces a new requirement of XML parsers in clients, and some parsers would start validating by default, and going out to the web to retrieve the referenced namespace/schema, etc.

I think the initial geosearch feature can start off with <str>10,20</str> for a point. If we wish to introduce a point type in the XML and binary response writers at a later point in time, it seems like it might require a version bump of the output format anyway, and we could go to something simple like <point>10,20</point>.

It is worth using standards when they buy you enough... I'm not sure this is one of those times. I'm sure there are standards for numeric types like int too... but using namespaces for that seems like overkill.

But if someone wants to supply patches that can optionally enable sticking in schema, namespaces, etc, w/o significant impact to the default, that's OK too. Or perhaps a custom response writer that uses namespaces for every single type for those who want that.

-Yonik
http://www.lucidimagination.com
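
For comparison, this is roughly what a client has to do with the plain-string form Yonik suggests. It is only a sketch: the sample document is made up in the style of Solr's XML response, the field name "point" is illustrative, and the "lat,lon" ordering is an assumption the client must know out of band, since nothing in the markup says so.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the style of Solr's XML response; not real server output.
SAMPLE = """
<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">doc1</str>
      <str name="point">10,20</str>
    </doc>
  </result>
</response>
"""


def read_point(doc, field="point"):
    """Return (lat, lon) from a <str> field assumed to hold 'lat,lon'."""
    elem = doc.find(f"str[@name='{field}']")
    if elem is None:
        return None
    lat, lon = (float(part) for part in elem.text.split(","))
    return lat, lon


root = ET.fromstring(SAMPLE)
points = [read_point(doc) for doc in root.iter("doc")]
print(points)
```

The parsing code is short, which is the argument for the simple format; the cost is that the meaning of the string lives in the client rather than in the response.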
Re: Namespaces in response (SOLR-1586)
On Wed, Dec 9, 2009 at 11:44 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:

: How does it introduce any new requirements? Namespaces are easily ignored by any XML client, as if they weren't present. In other words, unless the XML client has setValidating=true, then this isn't an issue.

I've run across cases where I added a schema declaration to an XML file and then things started failing. I think some parsers may default to validating if they see that they can?

Namespaces are to avoid name clashes. Solr XML is well defined and not arbitrary... adding <point> if we wish to do so won't introduce any clashes.

: The only difference between what you call simple above and what I've proposed (and correct me if I'm wrong but others have too) is that your point tag would include a namespace prefix and an xmlns attribute. What's the difference?

: : It is worth using standards when they buy you enough... I'm not sure this is one of those times. I'm sure there are standards for numeric types like int too... but using namespaces for that seems like overkill.

: There's a difference between a primitive type like int, and one like point. Also, it all comes down to your use case. If the only thing you're ever going to do with SOLR is have a SOLR client talk to it (Java, Ruby, whatever PL you want) then namespaces/etc. might be overkill. But why open up the response format then and advertise SOLR as something that provides REST-ful services for search?

REST-ful doesn't say anything about customizing the response format.

: If that's the case, then users consuming those responses need the flexibility to customize them for their use case (validation, plugging into external GIS tools, etc.). So, I don't agree with this.

What GIS tool could deal with a Solr XML response format w/o any other knowledge of everything else in the response? Are there some real use cases that using a namespace vs not for point makes easier (an honest question... I don't know much about GIS stuff).

: All I've done is use what already exists. There doesn't need to be any patches. XmlWriter#writePrim allowed you to do this before, see:

Yeah, you can use that to output <long>false</long> too... but it will cause certain clients to barf.

-Yonik
http://www.lucidimagination.com
Re: Namespaces in response (SOLR-1586)
Should have tried this before... I just created a small XML file:

  <foo>
    <bar>hi</bar>
  </foo>

I pointed both Firefox and IE at this file and it displays as XML fine. I then changed the file to this:

  <foo>
    <zoo:bar>hi</zoo:bar>
  </foo>

That made both of them barf. That alone makes me lean pretty strongly against using a namespace for this.

-Yonik
http://www.lucidimagination.com
Re: Namespaces in response (SOLR-1586)
Hi Yonik,

: Should have tried this before... I just created a small XML file:
:
:   <foo>
:     <bar>hi</bar>
:   </foo>
:
: I pointed both Firefox and IE at this file and it displays as XML fine. I then changed the file to this:
:
:   <foo>
:     <zoo:bar>hi</zoo:bar>
:   </foo>

Sure, of course it does. It's because that's not valid XML syntax. You have to declare the namespace for zoo. You can do it at the top of the XML file in the root XML tag. Or, you can do it inline (like I've done in SOLR). Try this:

  <foo>
    <zoo:bar xmlns:zoo="http://example.com/zoo">hi</zoo:bar>
  </foo>

Cheers,
Chris
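
The three variants discussed in this exchange can be checked with any namespace-aware parser. Below is a small sketch using Python's standard xml.etree.ElementTree, standing in for the browsers used in the original test; the `parses` helper is just an illustrative name.

```python
import xml.etree.ElementTree as ET

CASES = {
    "no prefix": "<foo><bar>hi</bar></foo>",
    "undeclared prefix": "<foo><zoo:bar>hi</zoo:bar></foo>",
    "declared inline": (
        '<foo><zoo:bar xmlns:zoo="http://example.com/zoo">hi</zoo:bar></foo>'
    ),
}


def parses(xml_text):
    """Return True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


results = {label: parses(text) for label, text in CASES.items()}
for label, ok in results.items():
    print(f"{label}: {'parses' if ok else 'rejected'}")
```

Only the undeclared-prefix case is rejected: a prefix with no matching xmlns declaration is an error for a namespace-aware parser, while the inline declaration is accepted without the parser needing to know anything about the URI.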
Re: Namespaces in response (SOLR-1586)
On Wed, Dec 9, 2009 at 12:40 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:

:   <foo>
:     <zoo:bar xmlns:zoo="http://example.com/zoo">hi</zoo:bar>
:   </foo>

If you're forced to declare the namespace / put the URI, I'm just afraid of what clients / XML parsers out there may start trying to validate by default. And I'm still trying to figure out what we gain.

If one does want validation, it seems like we should have an (optional) schema for the XML response as a whole?

-Yonik
http://www.lucidimagination.com
Re: Namespaces in response (SOLR-1586)
: I think the initial geosearch feature can start off with
: <str>10,20</str> for a point.

+1.

The current XML format Solr uses was designed to be extremely simple, very JSON-esque, and easily parsable by *anyone* in any language, without needing special knowledge of types. It has been heavily advertised as only containing a very small handful of tags, representing primitive types (int, long, float, date, double, str) and basic collections (arr, lst, doc) ... even if it never had a formal schema/DTD. adding new tags to that -- name spaced or otherwise -- is a very VERY bad idea for clients who have come to expect that they can use very simple parsing code to access all the data. introducing a new 'point' concept, whether as <point> or as <georss:point/>, is going to break things for people.

As discussed with Mattmann in another thread -- some public methods in XMLWriter have inadvertently made it possible for plugin writers to add their own XML tags -- but that doesn't mean we should do it in the core Solr distribution. If you write your own custom XMLWriter you aren't allowed to be surprised when it contains new tags, but our out of the box users shouldn't have to deal with such surprises.

As also discussed in that same thread: it makes a lot of sense in the long run to start having Response Writers that can generate more rich XML based responses, and if there are already well defined standards for some of these concepts (like georss) then by all means we should support them -- but the existing XmlResponseWriter should NOT start generating new tags.

The contract for SolrQueryResponse has always said:

  A SolrQueryResponse may contain the following types of Objects generated by the SolrRequestHandler that processed the request. ... Other data types may be added to the SolrQueryResponse, but there is no guarantee that QueryResponseWriters will be able to deal with unexpected types.

...unless things have changed since the last time i looked, all of the out of the box response writers call toString() on any object they don't understand. So the best way to move forward in a flexible manner seems like it would be to add a new GeoPoint object to Solr, which toStrings to a simple "-34.56,67.89" for use by existing response writers as a string, but some newer smarter response writer could output it in some more sophisticated manner.

-Hoss
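
The toString() fallback Hoss describes can be sketched as follows. This is Python rather than Solr's Java, and GeoPoint, write_simple and write_georss are hypothetical names invented for the example; the thread only proposes the idea. The GeoRSS namespace URI and the space-separated "lat lon" body are likewise assumptions.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class GeoPoint:
    """Hypothetical value object; stringifies to a simple 'lat,lon'."""
    lat: float
    lon: float

    def __str__(self):
        return f"{self.lat},{self.lon}"


def write_simple(name, value):
    """Existing-style writer: known primitives, otherwise fall back to str()."""
    if isinstance(value, bool):
        tag, text = "bool", str(value).lower()
    elif isinstance(value, int):
        tag, text = "int", str(value)
    elif isinstance(value, float):
        tag, text = "float", str(value)
    else:
        tag, text = "str", str(value)  # unknown types are just stringified
    return f"<{tag} name={quoteattr(name)}>{escape(text)}</{tag}>"


def write_georss(name, value):
    """A 'smarter' writer that recognizes GeoPoint and delegates the rest."""
    if isinstance(value, GeoPoint):
        return (
            '<georss:point xmlns:georss="http://www.georss.org/georss"'
            f" name={quoteattr(name)}>{value.lat} {value.lon}</georss:point>"
        )
    return write_simple(name, value)


point = GeoPoint(-34.56, 67.89)
simple_xml = write_simple("location", point)
georss_xml = write_georss("location", point)
print(simple_xml)
print(georss_xml)
```

The existing-style writer keeps emitting only the tags clients already expect, while the richer output is opt-in through a different writer, which is the separation argued for above.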
Re: Namespaces in response (SOLR-1586)
: ...unless things have changed since the last time i looked, all of the
: out of the box response writers call toString() on any object they
: don't understand. So the best way to move forward in a flexible manner
: seems like it would be to add a new GeoPoint object to Solr, which
: toStrings to a simple "-34.56,67.89" for use by existing response writers
: as a string, but some newer smarter response writer could output it in
: some more sophisticated manner.

The caveat to that, now that i've skimmed SOLR-1586, is that it currently only applies to objects added to the SolrQueryResponse (or one of the containers in it) datastructure that the ResponseWriters walk themselves ... because of the back-ass-wards way we have FieldTypes write their values directly to an XMLWriter or a TextWriter, the idea of using an object that stringifies itself as needed doesn't really apply very well ... and it won't unless we switch all of the ResponseWriters to follow the BinaryResponseWriter model of using FieldType.toObject(...) to get the field value as an object that can be sent over the wire -- then the existing XmlResponseWriter, and the Text ResponseWriters, can call toString() on Objects they don't understand, and some newer/hipper/cooler response writers that understand georss can do fancier things with it.

-Hoss
Re: Namespaces in response (SOLR-1586)
Hi Yonik,

: I've run across cases where I added a schema declaration to an XML file and then things started failing. I think some parsers may default to validating if they see that they can?

I've seen this too. But it won't affect the interaction we're talking about. Like I said, SOLR-1586 outputs valid XML, so this isn't an issue.

: Namespaces are to avoid name clashes. Solr XML is well defined and not arbitrary... adding <point> if we wish to do so won't introduce any clashes.

Actually there are quite a few use cases for namespacing beyond name clashes. Namespaces enable validation, understanding and definition for elements (understanding units, ranges, etc.). For instance, you and I both use the term mass, but in my domain, mass refers to the planetary science definition of mass, while in your domain you mean earth science. mass does not always mean the same thing (variation in units, representation, etc.) See here:

http://www.w3.org/TR/2006/REC-xml-names11-20060816/

: : The only difference between what you call simple above and what I've proposed (and correct me if I'm wrong but others have too) is that your point tag would include a namespace prefix and an xmlns attribute. What's the difference?

: : : It is worth using standards when they buy you enough... I'm not sure this is one of those times. I'm sure there are standards for numeric types like int too... but using namespaces for that seems like overkill.

: : There's a difference between a primitive type like int, and one like point. Also, it all comes down to your use case. If the only thing you're ever going to do with SOLR is have a SOLR client talk to it (Java, Ruby, whatever PL you want) then namespaces/etc. might be overkill. But why open up the response format then and advertise SOLR as something that provides REST-ful services for search?

: REST-ful doesn't say anything about customizing the response format.

So are you saying that the intention is not to allow customization of the response format? Also, you've put out how many releases of SOLR that have the capability to do this, and now you're suddenly going to change it? I'm sorry, I disagree.

: : If that's the case, then users consuming those responses need the flexibility to customize them for their use case (validation, plugging into external GIS tools, etc.). So, I don't agree with this.

: What GIS tool could deal with a Solr XML response format w/o any other knowledge of everything else in the response? Are there some real use cases that using a namespace vs not for point makes easier (an honest question... I don't know much about GIS stuff).

Using standards enables standard tool development. The alternative is for everyone to develop their own custom tools for SOLR (or be tied to using whatever is provided by SOLR _only_), and I don't think that's the intent. I also don't think that's a very friendly, open strategy for users. What I'm proposing does _not_ break backwards compatibility, anywhere. If you've got an example, then speak up.

: : All I've done is use what already exists. There doesn't need to be any patches. XmlWriter#writePrim allowed you to do this before, see:

: Yeah, you can use that to output <long>false</long> too... but it will cause certain clients to barf.

That's a ResponseWriter issue. That's not a client issue. Clients don't arbitrarily connect to servers for which they don't speak the protocol language.

Cheers,
Chris
Re: Namespaces in response (SOLR-1586)
: Any parser that does that is so broken that you should stop using it immediately.
:
: --wunder

Walter, totally agree here.

Cheers,
Chris
Re: Namespaces in response (SOLR-1586)
Hey All,

I think Eric is right on here, and that is what I thought the intent of the patch was: facilitating integration of Solr into environments where there is not one true XML output. In addition, there shouldn't be one true JSON output for cases where your existing code already has a way it expects the JSON. Why not allow someone to write a JSON output that feeds directly into that tool without having to change that tool? Flexibility is what makes Solr so cool, and to limit that would be a shame. None of this really has to limit the internal representation or what the Solr community builds to support its format, but don't unnecessarily relegate that functionality to XSLT.

--Paul

On 12/9/09 11:22 AM, Eric Pugh ep...@opensourceconnections.com wrote:

: Is this the opportunity of having more than one XML output type? I mean, XML is meant to be a transport medium for data, and maybe moving from a one true XML output for Solr to being able to support multiple outputs dependent on the consumer would be useful.
:
: I can see it making it easier to plug Solr into environments that expect data in certain formats, without doing an extra XSL transformation?
:
: Eric
Re: Namespaces in response (SOLR-1586)
On Dec 9, 2009, at 11:11 AM, Mattmann, Chris A (388J) wrote:

: : Any parser that does that is so broken that you should stop using it immediately.
: :
: : --wunder

: Walter, totally agree here.

To elaborate my position:

1. Validation is a user option. The XML spec makes that very clear. We've had 10 years to get that right, and anyone who auto-validates is not paying attention. Validation is very useful when you are creating XML, rarely useful when reading it.

2. XML namespaces are string prefixes that use the URL syntax. They do not follow URI rules for anything but syntax and there is no guarantee that they can be resolved. In fact, an XML parser can't do anything standard with the result if they do resolve. Again, we've had 10 years to figure that out.

Yes, this can be confusing, but if a parser author can't figure it out, don't use their parser because they are already getting the simple stuff wrong.

wunder
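
Walter's second point, that a namespace name is only an identifying string, can be illustrated with a short sketch using Python's standard xml.etree.ElementTree, which is non-validating. The URI is deliberately unresolvable (the .invalid top-level domain is reserved for that purpose), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Deliberately unresolvable namespace URI; parsing never tries to fetch it.
DOC = '<foo><zoo:bar xmlns:zoo="http://namespace.invalid/zoo">hi</zoo:bar></foo>'

root = ET.fromstring(DOC)  # no network access, no validation
child = root[0]
print(child.tag)           # the namespace name becomes part of the element name

# The same namespace name bound to a different prefix yields the same name ...
other = ET.fromstring(
    '<foo><z:bar xmlns:z="http://namespace.invalid/zoo">hi</z:bar></foo>'
)[0]
same_name = child.tag == other.tag

# ... while a trailing slash makes it a different name, because namespace
# names are compared as plain strings rather than resolved as URLs.
slash = ET.fromstring(
    '<foo><zoo:bar xmlns:zoo="http://namespace.invalid/zoo/">hi</zoo:bar></foo>'
)[0]
different_name = child.tag != slash.tag
print(same_name, different_name)
```

The prefix is irrelevant and the URI is never dereferenced; whether other parsers in the wild behave this way by default is exactly the concern raised earlier in the thread, so this shows the specified behavior rather than a guarantee about every client.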
Re: Namespaces in response (SOLR-1586)
Hi Yonik,

: : Using standards enables standard tool development.

: We do use standards... lots of them :-)
:
: Let's be a bit more specific though - I was asking about using a namespace for the point type by *default*, and in isolation (i.e. the rest of solr xml isn't namespaced), and if/how that made things easier?

Let's ask a different question -- how does it make things harder?

: At first blush it doesn't really seem to since any tool would need to deal with the Solr XML response in general.

I've got use cases where folks writing APIs in Javascript/Ajax are querying SOLR (as a REST-ful web service) and elements of the response are being dropped into a web page via DHTML. Having the ability to drop tags that include namespaces helps out those folks because they want to have:

(a) expected representations using standards they like (GeoRSS is on the list).

(b) understanding of the elements they are dropping in (i.e., there is one use case where separately, after dropping in the georss:point tag, the tag definition (e.g., via the namespace at: http://www.w3.org/2003/01/geo/wgs84_pos#) is looked up and displayed).

Cheers,
Chris