Re: Offer to submit some custom enhancements
If you wish to write out only the known parts (or the parts that you need), say the responseHeader, the results, or the output of any known handler, that would be fine. But it cannot be a standard response writer unless it supports the NamedList format. That is OK, though. One quick question: what is the client platform the library is going to run on, such that you need Protocol Buffers?

--Noble Paul

On Thu, Oct 16, 2008 at 9:27 PM, Feak, Todd [EMAIL PROTECTED] wrote:
RE: Offer to submit some custom enhancements
Both Java and C++ clients would potentially be using the Protocol Buffers. Performance and ease of adoption will determine what actually gets used. I'm implementing a QueryResponseWriter right now and am able to handle the NamedList, but due to the lack of inheritance in the Protocol Buffer object model, each type that is placed into the NamedList needs special handling. However, that doesn't appear to be any different than in the JSON or XML response writers, so I am hopeful this could work. The biggest stumbling block is the lack of access to an OutputStream instead of a Writer, but I saw a patch to address that.

-Todd

-----Original Message-----
From: Noble Paul നോബിള് नोब्ळ् [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 17, 2008 4:07 AM
To: solr-dev@lucene.apache.org
Subject: Re: Offer to submit some custom enhancements
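Todd notes that, because Protocol Buffer messages have no inheritance, each type placed into the NamedList needs its own handling, and that the writer wants a raw OutputStream. A minimal sketch of that per-type handling follows, under stated assumptions: an ordered List of Map.Entry pairs stands in for Solr's NamedList, and the TaggedPairWriter class and its tag bytes are hypothetical. It is a hand-rolled tagged encoding built on java.io.DataOutputStream, not the Protocol Buffers wire format and not Solr's NamedListCodec.

```java
import java.io.*;
import java.util.*;

/**
 * Sketch only: writes ordered name/value pairs (a stand-in for Solr's
 * NamedList) to a raw OutputStream. Each supported value type gets its own
 * hypothetical tag byte and encoding; NOT protobuf or NamedListCodec.
 */
final class TaggedPairWriter {
    static final byte TAG_NULL = 0, TAG_STRING = 1, TAG_INT = 2,
                      TAG_LONG = 3, TAG_FLOAT = 4, TAG_BOOL = 5, TAG_PAIRS = 6;

    static void writePairs(List<Map.Entry<String, Object>> pairs, OutputStream os) throws IOException {
        DataOutputStream out = new DataOutputStream(os);
        writePairsTo(pairs, out);
        out.flush(); // flush but do not close: the caller owns the stream
    }

    static List<Map.Entry<String, Object>> readPairs(InputStream is) throws IOException {
        return readPairsFrom(new DataInputStream(is));
    }

    private static void writePairsTo(List<Map.Entry<String, Object>> pairs, DataOutputStream out) throws IOException {
        out.writeInt(pairs.size());
        for (Map.Entry<String, Object> e : pairs) {
            String name = e.getKey();
            out.writeBoolean(name != null);       // names may be null, as in a NamedList
            if (name != null) out.writeUTF(name); // writeUTF caps strings at 65535 encoded bytes
            writeValue(e.getValue(), out);
        }
    }

    @SuppressWarnings("unchecked")
    private static void writeValue(Object v, DataOutputStream out) throws IOException {
        // One branch per type: there is no shared base class to dispatch on.
        if (v == null) {
            out.writeByte(TAG_NULL);
        } else if (v instanceof String) {
            out.writeByte(TAG_STRING); out.writeUTF((String) v);
        } else if (v instanceof Integer) {
            out.writeByte(TAG_INT); out.writeInt((Integer) v);
        } else if (v instanceof Long) {
            out.writeByte(TAG_LONG); out.writeLong((Long) v);
        } else if (v instanceof Float) {
            out.writeByte(TAG_FLOAT); out.writeFloat((Float) v);
        } else if (v instanceof Boolean) {
            out.writeByte(TAG_BOOL); out.writeBoolean((Boolean) v);
        } else if (v instanceof List) {
            out.writeByte(TAG_PAIRS); // nested lists are assumed to hold name/value pairs
            writePairsTo((List<Map.Entry<String, Object>>) v, out);
        } else {
            throw new IllegalArgumentException("unsupported type: " + v.getClass().getName());
        }
    }

    private static List<Map.Entry<String, Object>> readPairsFrom(DataInputStream in) throws IOException {
        int n = in.readInt();
        List<Map.Entry<String, Object>> pairs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String name = in.readBoolean() ? in.readUTF() : null;
            pairs.add(new AbstractMap.SimpleEntry<String, Object>(name, readValue(in)));
        }
        return pairs;
    }

    private static Object readValue(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case TAG_NULL:   return null;
            case TAG_STRING: return in.readUTF();
            case TAG_INT:    return in.readInt();
            case TAG_LONG:   return in.readLong();
            case TAG_FLOAT:  return in.readFloat();
            case TAG_BOOL:   return in.readBoolean();
            case TAG_PAIRS:  return readPairsFrom(in);
            default: throw new IOException("unknown tag: " + tag);
        }
    }
}
```

Reading back with readPairs over a ByteArrayInputStream reproduces the original list, so an encoding like this can be round-trip tested without a running Solr instance.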
RE: Offer to submit some custom enhancements
: But it cannot be a standard responsewriter unless it supports NamedList format

It has to be able to handle the NamedLists contained in the SolrQueryResponse, but it can output them in whatever format it wants for going over the wire ... whether the client on the other side of the Protocol Buffer knows how to make sense of the data you send it is another matter.

: biggest stumbling block is the lack of access to OutputStream instead of
: Writer, but I saw a patch to address that.

No patch needed: implement BinaryQueryResponseWriter and you'll be given a raw OutputStream.

-Hoss
Re: Offer to submit some custom enhancements
Hi Todd,

All of these sound good. Personally, I think analyzers like these belong in Lucene's contrib/analyzers package, with Solr factory implementations built on top of them, but that's your call.

As for the Protocol Buffers, I am assuming you mean: http://code.google.com/p/protobuf/ That is under an Apache license, so it is fine to incorporate. It sounds like it might be a contrib to start, but that's just my take. They might also be worth using in SolrJ and for distributed search, but I am interested in how they compare to other similar technologies. Can you share your use case for them?

-Grant

--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

On Oct 15, 2008, at 2:48 PM, Feak, Todd wrote:
Re: Offer to submit some custom enhancements
Hi Todd,

AFAIK, Protocol Buffers cannot be used for Solr because they cannot support the NamedList structure that all Solr components use. The binary protocol (NamedListCodec) that SolrJ uses to communicate with the Solr server is extremely optimized for our response format. However, it is Java only.

There are other projects, such as Apache Thrift (http://incubator.apache.org/thrift/) and Etch (both in incubation), which could be looked at. There are a few issues in Thrift which may help us in the future:
https://issues.apache.org/jira/browse/THRIFT-110
https://issues.apache.org/jira/browse/THRIFT-122

--
Regards,
Shalin Shekhar Mangar.

On Thu, Oct 16, 2008 at 12:18 AM, Feak, Todd [EMAIL PROTECTED] wrote:
Re: Offer to submit some custom enhancements
The Python marshal format supports everything we need and is easy to implement in Java. It is roughly equivalent to JSON, but binary.

http://docs.python.org/library/marshal.html

wunder

On 10/16/08 8:16 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote:
RE: Offer to submit some custom enhancements
Regarding the location of the Filters and Factories: I agree that the Filters would be best located in Lucene, as users of both packages would then have access. What I'm struggling with is the timing of putting the Filters into Lucene and then the Factories into Solr. The Factories in Solr would be useless until the Filters had been accepted and released in Lucene, and the Lucene version had then been upgraded in Solr. What I'm inclined to do is release the Filters to both, and have the Factories point to the Solr versions until they become available in Lucene, then switch them over and drop the Solr versions. How is this handled with other new Filter/Factory sets? Just let me know, and I'll get the ball rolling on those.

I'm going to follow up on Protocol Buffers in response to some other messages I see coming in.

Thanks,
Todd Feak

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 16, 2008 7:12 AM
To: solr-dev@lucene.apache.org
Subject: Re: Offer to submit some custom enhancements
RE: Offer to submit some custom enhancements
Answering Grant Ingersoll's question about the use case as well, which may clarify things. Without revealing TOO much about our internal structure: we are in the process of replacing SOAP communications in-house with Protocol Buffers. We did evaluate Thrift as well, but decided on Protocol Buffers. A large effort for that conversion is well under way. I've been asked whether Solr can support this, and to create a prototype to see if there are similar gains. I don't expect gains as large as those we've seen over SOAP, but I do foresee some increase in throughput.

So, in response to the suggestions of other binary formatting technologies, my hands are tied. This is the prototype I have to work on for now. If it works out, I will gladly share it. If not, I will share why, and hopefully save others some time.

As for Protocol Buffers not supporting the NamedList structure: Google's documentation strongly suggests creating intermediate (bean) classes instead of trying to marshal and unmarshal your object model directly. This intermediate model doesn't have to precisely mirror the NamedList; it can be *any* compromise that gets the data from A to B, as long as the NamedList can be reconstituted on the other side. I'm sure something can be done.

Thanks,
Todd Feak

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 16, 2008 8:17 AM
To: solr-dev@lucene.apache.org
Subject: Re: Offer to submit some custom enhancements
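Todd's suggestion of an intermediate (bean) model can be sketched as follows. This assumes plain Java beans standing in for generated Protocol Buffers classes, and an ordered List of Map.Entry pairs standing in for Solr's NamedList; PairBeans, ValueBean and PairBean are hypothetical names. Because the messages cannot inherit from a common type, each value becomes a single bean carrying a kind tag plus one field per supported type. The model is deliberately not an exact mirror: Integer and Long are both carried as long, and Float and Double as double, so the reconstituted pairs hold Long and Double values.

```java
import java.util.*;

/**
 * Sketch only: an intermediate "bean" model standing in for generated
 * Protocol Buffers message classes. With no inheritance available, every
 * value is one bean with a kind tag and one field per supported type.
 */
final class PairBeans {
    enum Kind { NULL, STRING, LONG, DOUBLE, BOOL, PAIRS }

    static final class ValueBean {
        Kind kind = Kind.NULL;
        String str;
        long num;
        double dbl;
        boolean flag;
        List<PairBean> pairs;
    }

    static final class PairBean {
        String name;      // may be null, as in a NamedList
        ValueBean value;
    }

    /** Converts ordered name/value pairs (a NamedList stand-in) into beans. */
    static List<PairBean> toBeans(List<Map.Entry<String, Object>> pairs) {
        List<PairBean> beans = new ArrayList<>(pairs.size());
        for (Map.Entry<String, Object> e : pairs) {
            PairBean b = new PairBean();
            b.name = e.getKey();
            b.value = toValue(e.getValue());
            beans.add(b);
        }
        return beans;
    }

    /** Reconstitutes the name/value pairs on the receiving side. */
    static List<Map.Entry<String, Object>> fromBeans(List<PairBean> beans) {
        List<Map.Entry<String, Object>> pairs = new ArrayList<>(beans.size());
        for (PairBean b : beans) {
            pairs.add(new AbstractMap.SimpleEntry<String, Object>(b.name, fromValue(b.value)));
        }
        return pairs;
    }

    @SuppressWarnings("unchecked")
    private static ValueBean toValue(Object v) {
        ValueBean b = new ValueBean();
        if (v == null) {
            b.kind = Kind.NULL;
        } else if (v instanceof String) {
            b.kind = Kind.STRING; b.str = (String) v;
        } else if (v instanceof Integer || v instanceof Long) {
            b.kind = Kind.LONG; b.num = ((Number) v).longValue();     // compromise: widen to long
        } else if (v instanceof Float || v instanceof Double) {
            b.kind = Kind.DOUBLE; b.dbl = ((Number) v).doubleValue(); // compromise: widen to double
        } else if (v instanceof Boolean) {
            b.kind = Kind.BOOL; b.flag = (Boolean) v;
        } else if (v instanceof List) {
            b.kind = Kind.PAIRS; b.pairs = toBeans((List<Map.Entry<String, Object>>) v);
        } else {
            throw new IllegalArgumentException("unsupported type: " + v.getClass().getName());
        }
        return b;
    }

    private static Object fromValue(ValueBean b) {
        switch (b.kind) {
            case STRING: return b.str;
            case LONG:   return b.num;
            case DOUBLE: return b.dbl;
            case BOOL:   return b.flag;
            case PAIRS:  return fromBeans(b.pairs);
            default:     return null; // Kind.NULL
        }
    }
}
```

Whether that widening is an acceptable compromise depends on what the client needs; a stricter model would add one kind per Java type.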
Offer to submit some custom enhancements
Hi all,

I have a handful of custom classes that we've created for our purposes here. I'd like to share them if you think they have value for the rest of the community, but I wanted to check here before creating JIRA tickets and patches. Here's what I have:

1. DoubleMetaphoneFilter and Factory. This replaces usage of the PhoneticFilter and Factory, allowing access to setMaxCodeLen() on the DoubleMetaphone encoder and to the alternate encodings that the encoder provides for some words.
2. JapaneseHalfWidthFilter and Factory. Some Japanese characters (and the Latin alphabet) exist in both a FullWidth and a HalfWidth form. This filter normalizes by switching all characters to the FullWidth form. I have seen at least one JIRA ticket about this issue. This implementation doesn't rely on Java 1.6.
3. JapaneseHiraganaFilter and Factory. Japanese Hiragana can be translated to Katakana. This filter normalizes to Katakana so that data and queries can come in either way and get hits.

Also, I have been asked to create a prototype that you may be interested in. I'm to construct a QueryResponseWriter that returns documents using Google's Protocol Buffers. This would rely on an existing patch that exposes the OutputStream, but I would like to start the work soon. Are there license concerns that would block sharing this with you? Is there any interest in this?

Thanks for your consideration,
Todd Feak
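Item 2 above can be illustrated with a small String-based sketch that, like the filter described, does not depend on java.text.Normalizer (added in Java 6). HalfToFullWidth is a hypothetical class, not Todd's JapaneseHalfWidthFilter; a real filter would apply the same mapping to each token's term text. It maps printable ASCII to the Fullwidth Forms block by the fixed offset 0xFEE0, maps half-width Katakana (U+FF61 to U+FF9F) through a lookup table, and folds a following half-width voiced (ﾞ) or semi-voiced (ﾟ) mark into the preceding kana, so ｶﾞ becomes ガ. The rarer voiced forms of ﾜ and ｦ are not handled.

```java
/**
 * Sketch only: maps half-width characters to their full-width forms
 * without java.text.Normalizer (added in Java 6).
 */
final class HalfToFullWidth {
    // Full-width equivalents of U+FF61..U+FF9F, in code point order (63 chars).
    private static final String KANA =
        "。「」、・ヲァィゥェォャュョッー"
        + "アイウエオカキクケコサシスセソタチツテト"
        + "ナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゛゜";

    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x21 && c <= 0x7E) {
                sb.append((char) (c + 0xFEE0)); // ASCII '!'..'~' -> U+FF01..U+FF5E
            } else if (c >= 0xFF61 && c <= 0xFF9F) {
                char full = KANA.charAt(c - 0xFF61);
                char next = (i + 1 < s.length()) ? s.charAt(i + 1) : '\0';
                if (next == '\uFF9E' && takesVoicedMark(full)) {   // half-width voiced mark
                    full = (full == 'ウ') ? 'ヴ' : (char) (full + 1);
                    i++; // the mark has been folded into the base character
                } else if (next == '\uFF9F' && isHaRow(full)) {    // half-width semi-voiced mark
                    full = (char) (full + 2);
                    i++;
                }
                sb.append(full);
            } else {
                sb.append(c); // everything else (including full-width text) is unchanged
            }
        }
        return sb.toString();
    }

    private static boolean takesVoicedMark(char k) {
        // カ..ト alternate base/voiced in Unicode, except small ッ at U+30C3.
        return (k >= 'カ' && k <= 'ト' && k != 'ッ') || isHaRow(k) || k == 'ウ';
    }

    private static boolean isHaRow(char k) {
        return k == 'ハ' || k == 'ヒ' || k == 'フ' || k == 'ヘ' || k == 'ホ';
    }
}
```

Normalizing toward the full-width side means already full-width input passes through untouched, so indexed data and queries converge on one form.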
Offer to submit some custom enhancements
Reposting, as I inadvertently hijacked a thread with the first one. My bad.

Hi all,

I have a handful of custom classes that we've created for our purposes here. I'd like to share them if you think they have value for the rest of the community, but I wanted to check here before creating JIRA tickets and patches. Here's what I have:

1. DoubleMetaphoneFilter and Factory. This replaces usage of the PhoneticFilter and Factory, allowing access to setMaxCodeLen() on the DoubleMetaphone encoder and to the alternate encodings that the encoder provides for some words.
2. JapaneseHalfWidthFilter and Factory. Some Japanese characters (and the Latin alphabet) exist in both a FullWidth and a HalfWidth form. This filter normalizes by switching all characters to the FullWidth form. I have seen at least one JIRA ticket about this issue. This implementation doesn't rely on Java 1.6.
3. JapaneseHiraganaFilter and Factory. Japanese Hiragana can be translated to Katakana. This filter normalizes to Katakana so that data and queries can come in either way and get hits.

Also, I have been asked to create a prototype that you may be interested in. I'm to construct a QueryResponseWriter that returns documents using Google's Protocol Buffers. This would rely on an existing patch that exposes the OutputStream, but I would like to start the work soon. Are there license concerns that would block sharing this with you? Is there any interest in this?

Thanks for your consideration,
Todd Feak
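Item 3 can likewise be sketched on a plain String, assuming a hypothetical HiraganaToKatakana helper rather than Todd's actual JapaneseHiraganaFilter. It relies on the Unicode layout in which every Hiragana letter from U+3041 to U+3096 sits exactly 0x60 below its Katakana counterpart (U+30A1 to U+30F6), so indexing "ひらがな" and querying "ヒラガナ" produce the same term.

```java
/**
 * Sketch only: folds Hiragana to Katakana. Each Hiragana letter in
 * U+3041..U+3096 sits exactly 0x60 below its Katakana counterpart
 * (U+30A1..U+30F6), so a fixed offset is enough.
 */
final class HiraganaToKatakana {
    private static final int OFFSET = 0x30A1 - 0x3041; // = 0x60

    static String toKatakana(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Only letters in the Hiragana range move; Katakana, kanji,
            // Latin text, and marks such as the long vowel mark pass through.
            sb.append(c >= '\u3041' && c <= '\u3096' ? (char) (c + OFFSET) : c);
        }
        return sb.toString();
    }
}
```

Folding in this direction keeps the long vowel mark (U+30FC), which has no Hiragana counterpart, valid in the output.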