Re: Offer to submit some custom enhancements

2008-10-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you wish to write out only the known parts (or the parts that you
need), say the responseHeader, the results, or the output of any known
handler, that would be fine.

But it cannot be a standard response writer unless it supports the
NamedList format.

But it is OK.

One quick question: what client platform is the library going to run on,
such that you need Protocol Buffers?



On Thu, Oct 16, 2008 at 9:27 PM, Feak, Todd [EMAIL PROTECTED] wrote:
 Answering Grant Ingersoll's question for use case as well, which may
 clarify.

 Without revealing TOO much about our internal structure, we are in the
 process of replacing SOAP communications in house with Protocol Buffers.
 We did evaluate Thrift as well, but decided on Protocol Buffers. A large
 effort for that conversion is well under way. I've been asked if Solr
 can support this, and to create a prototype to see if there are similar
 gains. I don't imagine it will be the gains that we've seen over SOAP,
 but I do foresee some amount of throughput increase.

 So, in response to suggestion for other binary formatting technologies,
 my hands are tied. This is the prototype I have to work on for now. If
 it works out, I will gladly share it. If not, I will share why, and
 hopefully save others some time.

 As for Protocol Buffers not supporting the NamedList structure. Google's
 documentation strongly suggests that intermediate (bean) classes be
 created, instead of trying to marshall and de-marshall your object model
 directly. This intermediate model doesn't have to precisely mirror the
 NamedList, it can be *any* compromise that gets the data from A to B, as
 long as the NamedList can be reconstituted on the other side. I'm sure
 something can be done.

 Thanks,
 Todd Feak



-- 
--Noble Paul


RE: Offer to submit some custom enhancements

2008-10-17 Thread Feak, Todd
Both Java and C++ clients would potentially be using the Protocol Buffers. 

Performance and ease of adoption will determine what actually gets used.

I'm implementing a QueryResponseWriter right now and am able to handle the
NamedList, but due to the lack of inheritance in the Protocol Buffer object
model, each type that is placed into the NamedList needs special handling.
However, that doesn't appear to be any different from the JSON or XML response
writers, so I am hopeful this could work. The biggest stumbling block is the
lack of access to an OutputStream instead of a Writer, but I saw a patch to
address that.
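
To illustrate the per-type handling described above, here is a minimal sketch
in plain Java. It is not the prototype discussed in this thread and it uses
neither Solr nor Protocol Buffers: the class TaggedValueWriter, its tag
constants, and the use of a List of Map.Entry pairs as a stand-in for
NamedList are all assumptions made for the example. It only shows the shape of
the dispatch, where every value type gets its own branch and unknown types are
rejected.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TaggedValueWriter {
    // Hypothetical one-byte type tags; not taken from Solr or Protocol Buffers.
    static final byte TAG_NULL = 0, TAG_STRING = 1, TAG_INT = 2, TAG_LONG = 3,
            TAG_DOUBLE = 4, TAG_BOOLEAN = 5, TAG_NESTED = 6;

    /** Convenience factory for one name/value pair of the stand-in list. */
    static Map.Entry<String, Object> pair(String name, Object value) {
        return new SimpleEntry<>(name, value);
    }

    /** Encodes the list to bytes, wrapping the in-memory IOException. */
    static byte[] encode(List<Map.Entry<String, Object>> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeEntries(entries, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Writes the pair count, then each name followed by its tagged value. */
    static void writeEntries(List<Map.Entry<String, Object>> entries,
                             DataOutputStream out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, Object> e : entries) {
            out.writeUTF(e.getKey()); // this sketch assumes non-null names
            writeValue(e.getValue(), out);
        }
    }

    /** One branch per supported type: the "special handling" in question. */
    @SuppressWarnings("unchecked")
    static void writeValue(Object v, DataOutputStream out) throws IOException {
        if (v == null) {
            out.writeByte(TAG_NULL);
        } else if (v instanceof String) {
            out.writeByte(TAG_STRING);
            out.writeUTF((String) v);
        } else if (v instanceof Integer) {
            out.writeByte(TAG_INT);
            out.writeInt((Integer) v);
        } else if (v instanceof Long) {
            out.writeByte(TAG_LONG);
            out.writeLong((Long) v);
        } else if (v instanceof Double) {
            out.writeByte(TAG_DOUBLE);
            out.writeDouble((Double) v);
        } else if (v instanceof Boolean) {
            out.writeByte(TAG_BOOLEAN);
            out.writeBoolean((Boolean) v);
        } else if (v instanceof List) {
            out.writeByte(TAG_NESTED);
            writeEntries((List<Map.Entry<String, Object>>) v, out);
        } else {
            throw new IllegalArgumentException(
                    "No handler for type " + v.getClass().getName());
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> header = new ArrayList<>();
        header.add(pair("status", 0));
        header.add(pair("QTime", 12L));
        System.out.println(encode(header).length + " bytes written");
    }
}
```

A real writer would need further branches for the other value types that can
appear in a response (dates, document lists, and so on); the point here is
only that each one has to be named explicitly.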

-Todd

-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 17, 2008 4:07 AM
To: solr-dev@lucene.apache.org
Subject: Re: Offer to submit some custom enhancements

if you wish to write out only the known parts (or parts that you
need). say responseheader, results or the output of any known handler,
it would be fine.

But it cannot be a standard responsewriter unless it supports NamedList format

But it is OK.

One quick question? what is the client platform on which the library
is going to run on that you need protocol buffers




RE: Offer to submit some custom enhancements

2008-10-17 Thread Chris Hostetter

: But it cannot be a standard responsewriter unless it supports NamedList
: format

It has to be able to handle the NamedLists contained in a SolrQueryResponse,
but it can output them in whatever format it wants for going over the wire.
Whether the client on the other side of the Protocol Buffer knows how to make
sense of the data you send it is another matter.

: biggest stumbling block is the lack of access to OutputStream instead of 
: Writer, but I saw a patch to address that.

No patch needed: implement BinaryQueryResponseWriter and you'll be given a
raw OutputStream.
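
Why the raw OutputStream matters for a binary format can be shown with a
small, self-contained sketch. The BinaryWriter interface below is a
hypothetical stand-in and not Solr's BinaryQueryResponseWriter; the three-byte
payload is simply an example that contains a byte above 0x7F. Pushing those
bytes through a character Writer re-encodes them, while the byte stream leaves
them untouched.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class RawStreamDemo {
    /** Hypothetical stand-in for a response writer that is handed a byte stream. */
    interface BinaryWriter {
        void write(OutputStream out, byte[] encodedResponse) throws IOException;
    }

    /** Binary path: the bytes reach the stream unchanged. */
    static byte[] viaOutputStream(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BinaryWriter writer = (out, bytes) -> out.write(bytes);
        try {
            writer.write(sink, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Text path: each byte is treated as a character and re-encoded as UTF-8. */
    static byte[] viaWriter(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer chars = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            for (byte b : payload) {
                chars.write(b & 0xFF);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = {0x08, (byte) 0x96, 0x01};
        System.out.println("OutputStream: " + viaOutputStream(payload).length + " bytes");
        System.out.println("Writer:       " + viaWriter(payload).length + " bytes");
    }
}
```

The Writer path produces more bytes than it was given because the byte 0x96
becomes a two-byte UTF-8 sequence, which a binary decoder on the client side
could no longer parse.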


-Hoss



Re: Offer to submit some custom enhancements

2008-10-16 Thread Grant Ingersoll

Hi Todd,

All of these sound good.  Personally, I think analyzers like these  
belong in Lucene's contrib/analyzers package, with Solr factory  
implementations built on those, but that's your call.


As for the Protocol Buffers, I am assuming you mean:
http://code.google.com/p/protobuf/
That is an Apache license, so it is fine to incorporate.  Sounds
like it might be a contrib to start, but that's just my take.


Sounds like they might be worth using in SolrJ and for distributed,  
but am interested in how it compares to other similar technologies.   
Can you share your use case for them?


-Grant

On Oct 15, 2008, at 2:48 PM, Feak, Todd wrote:

Reposting, as I inadvertently thread hijacked on the first one. My bad.

Hi all,

I have a handful of custom classes that we've created for our purposes
here. I'd like to share them if you think they have value for the rest
of the community, but I wanted to check here before creating JIRA
tickets and patches.

Here's what I have:

1. DoubleMetaphoneFilter and Factory. This replaces usage of the
PhoneticFilter and Factory allowing access to set maxCodeLength() on the
DoubleMetaphone encoder and access to the alternate encodings that the
encoder provides for some words.

2. JapaneseHalfWidthFilter and Factory. Some Japanese characters (and
Latin alphabet) exist in both a FullWidth and HalfWidth form. This
filter normalizes by switching to the FullWidth form for all the
characters. I have seen at least one JIRA ticket about this issue. This
implementation doesn't rely on Java 1.6.

3. JapaneseHiraganaFilter and Factory. Japanese Hiragana can be
translated to Katakana. This filter normalizes to Katakana so that data
and queries can come in either way and get hits.


Also, I have been requested to create a prototype that you may be
interested in. I'm to construct a QueryResponseWriter that returns
documents using Google's Protocol Buffers. This would rely on an
existing patch that exposes the OutputStream, but I would like to start
the work soon. Are there license concerns that would block sharing this
with you? Is there any interest in this?

Thanks for your consideration,
Todd Feak


--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











Re: Offer to submit some custom enhancements

2008-10-16 Thread Shalin Shekhar Mangar
Hi Todd,

AFAIK, Protocol Buffers cannot be used for Solr because they are unable to
support the NamedList structure that all Solr components use.

The binary protocol (NamedListCodec) that SolrJ uses to communicate with
the Solr server is extremely optimized for our response format. However,
it is Java only.

There are other projects such as Apache Thrift (
http://incubator.apache.org/thrift/) and Etch (both in incubation) which can
be looked at. There are a few issues in Thrift which may help us in the
future:

https://issues.apache.org/jira/browse/THRIFT-110
https://issues.apache.org/jira/browse/THRIFT-122

On Thu, Oct 16, 2008 at 12:18 AM, Feak, Todd [EMAIL PROTECTED] wrote:

 Reposting, as I inadvertently thread hijacked on the first one. My bad.

 Hi all,

 I have a handful of custom classes that we've created for our purposes
 here. I'd like to share them if you think they have value for the rest
 of the community, but I wanted to check here before creating JIRA
 tickets and patches.

 Here's what I have:

 1. DoubleMetaphoneFilter and Factory. This replaces usage of the
 PhoneticFilter and Factory allowing access to set maxCodeLength() on the
 DoubleMetaphone encoder and access to the alternate encodings that the
 encoder provides for some words.

 2. JapaneseHalfWidthFilter and Factory. Some Japanese characters (and
 Latin alphabet) exist in both a FullWidth and HalfWidth form. This
 filter normalizes by switching to the FullWidth form for all the
 characters. I have seen at least one JIRA ticket about this issue. This
 implementation doesn't rely on Java 1.6.

 3. JapaneseHiraganaFilter and Factory. Japanese Hiragana can be
 translated to Katakana. This filter normalizes to Katakana so that data
 and queries can come in either way and get hits.


 Also, I have been requested to create a prototype that you may be
 interested in. I'm to construct a QueryResponseWriter that returns
 documents using Google's Protocol Buffers. This would rely on an
 existing patch that exposes the OutputStream, but I would like to start
 the work soon. Are there license concerns that would block sharing this
 with you? Is there any interest in this?

 Thanks for your consideration,
 Todd Feak




-- 
Regards,
Shalin Shekhar Mangar.


Re: Offer to submit some custom enhancements

2008-10-16 Thread Walter Underwood
Python marshal format supports everything we need and is easy to implement
in Java. It is roughly equivalent to JSON, but binary.

http://docs.python.org/library/marshal.html

wunder

On 10/16/08 8:16 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote:

 Hi Todd,
 
 AFAIK, protocol buffers cannot be used for Solr because it is unable to
 support the NamedList structure that all Solr components use.
 
 The binary protocol (NamedListCodec) that SolrJ uses to communicate with
 Solr server is extremely optimized for our response format. However it is
 Java only.
 
 There are other projects such as Apache Thrift (
 http://incubator.apache.org/thrift/) and Etch (both in incubation) which can
 be looked at. There are a few issues in Thrift which may help us in the
 future:
 
 https://issues.apache.org/jira/browse/THRIFT-110
 https://issues.apache.org/jira/browse/THRIFT-122
 



RE: Offer to submit some custom enhancements

2008-10-16 Thread Feak, Todd
Regarding the location of the Filters and Factories ... I agree that the
Filters would be best located in Lucene, as users of both packages would
then have access. 

What I'm struggling with is the timing of putting Filters into Lucene,
and then Factories into Solr. The Factories in Solr would be useless
until the Filters had been accepted and released in Lucene, then the
Lucene version upgraded in Solr. What I'm inclined to do is release the
Filters to both, and have the Factories point to the Solr version, until
they become available in the Lucene version, then switch them over and
drop the Solr version.

How is this handled with other new Filter/Factory sets? 

Just let me know, and I'll get the ball rolling on those.

I'm going to follow up on Protocol Buffers in response to some other
messages I see coming in.

Thanks,
Todd Feak

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 16, 2008 7:12 AM
To: solr-dev@lucene.apache.org
Subject: Re: Offer to submit some custom enhancements

Hi Todd,

All of these sound good.  Personally, I think analyzers like these  
belong in Lucene's contrib/analyzers package, with Solr factory  
implementations built on those, but that's your call.

As for the Protocol Buffers, I am assuming you mean:
http://code.google.com/p/protobuf/ 
That is an Apache license, so it is fine to incorporate.  Sounds  
like it might be a contrib to start, but that's just my take.

Sounds like they might be worth using in SolrJ and for distributed,  
but am interested in how it compares to other similar technologies.   
Can you share your use case for them?

-Grant













RE: Offer to submit some custom enhancements

2008-10-16 Thread Feak, Todd
Answering Grant Ingersoll's question for use case as well, which may
clarify.

Without revealing TOO much about our internal structure, we are in the
process of replacing SOAP communications in house with Protocol Buffers.
We did evaluate Thrift as well, but decided on Protocol Buffers. A large
effort for that conversion is well under way. I've been asked if Solr
can support this, and to create a prototype to see if there are similar
gains. I don't imagine it will be the gains that we've seen over SOAP,
but I do foresee some amount of throughput increase.

So, in response to the suggestions of other binary formatting technologies,
my hands are tied. This is the prototype I have to work on for now. If
it works out, I will gladly share it. If not, I will share why, and
hopefully save others some time.

As for Protocol Buffers not supporting the NamedList structure: Google's
documentation strongly suggests that intermediate (bean) classes be
created, instead of trying to marshal and unmarshal your object model
directly. This intermediate model doesn't have to precisely mirror the
NamedList; it can be *any* compromise that gets the data from A to B, as
long as the NamedList can be reconstituted on the other side. I'm sure
something can be done.
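
The intermediate-model idea can be sketched in plain Java as a round trip.
Everything here is assumed for illustration: the Pair bean stands in for
whatever message class a Protocol Buffers schema would generate, the List of
Map.Entry pairs stands in for NamedList, and only String, Long, and nested
lists are handled. The sketch shows that an ordered list with repeated names
survives conversion to beans and back.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IntermediateModel {
    enum Kind { STRING, LONG, NESTED }

    /** Hypothetical intermediate bean: one named value of a known kind. */
    static final class Pair {
        final String name;
        final Kind kind;
        final String text;         // set when kind == STRING
        final long number;         // set when kind == LONG
        final List<Pair> children; // set when kind == NESTED

        Pair(String name, Kind kind, String text, long number, List<Pair> children) {
            this.name = name;
            this.kind = kind;
            this.text = text;
            this.number = number;
            this.children = children;
        }
    }

    /** Convenience factory for one name/value pair of the stand-in list. */
    static Map.Entry<String, Object> entry(String name, Object value) {
        return new SimpleEntry<>(name, value);
    }

    /** Source structure -> intermediate beans (order and duplicates kept). */
    @SuppressWarnings("unchecked")
    static List<Pair> toBeans(List<Map.Entry<String, Object>> source) {
        List<Pair> beans = new ArrayList<>();
        for (Map.Entry<String, Object> e : source) {
            Object v = e.getValue();
            if (v instanceof String) {
                beans.add(new Pair(e.getKey(), Kind.STRING, (String) v, 0L, null));
            } else if (v instanceof Long) {
                beans.add(new Pair(e.getKey(), Kind.LONG, null, (Long) v, null));
            } else if (v instanceof List) {
                List<Pair> nested = toBeans((List<Map.Entry<String, Object>>) v);
                beans.add(new Pair(e.getKey(), Kind.NESTED, null, 0L, nested));
            } else {
                throw new IllegalArgumentException("Unsupported value: " + v);
            }
        }
        return beans;
    }

    /** Intermediate beans -> reconstituted structure on the receiving side. */
    static List<Map.Entry<String, Object>> fromBeans(List<Pair> beans) {
        List<Map.Entry<String, Object>> rebuilt = new ArrayList<>();
        for (Pair p : beans) {
            Object value;
            switch (p.kind) {
                case STRING: value = p.text; break;
                case LONG:   value = p.number; break;
                default:     value = fromBeans(p.children); break;
            }
            rebuilt.add(entry(p.name, value));
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> header = new ArrayList<>();
        header.add(entry("status", 0L));
        List<Map.Entry<String, Object>> response = new ArrayList<>();
        response.add(entry("responseHeader", header));
        response.add(entry("q", "solr"));
        System.out.println(fromBeans(toBeans(response)).equals(response));
    }
}
```

In a real prototype the beans would be serialized between the two steps; the
sketch leaves that out and only checks that nothing is lost in the mapping.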

Thanks,
Todd Feak


-Original Message-
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 16, 2008 8:17 AM
To: solr-dev@lucene.apache.org
Subject: Re: Offer to submit some custom enhancements

Hi Todd,

AFAIK, protocol buffers cannot be used for Solr because it is unable to
support the NamedList structure that all Solr components use.

The binary protocol (NamedListCodec) that SolrJ uses to communicate with
Solr server is extremely optimized for our response format. However it is
Java only.

There are other projects such as Apache Thrift (
http://incubator.apache.org/thrift/) and Etch (both in incubation) which can
be looked at. There are a few issues in Thrift which may help us in the
future:

https://issues.apache.org/jira/browse/THRIFT-110
https://issues.apache.org/jira/browse/THRIFT-122


-- 
Regards,
Shalin Shekhar Mangar.


Offer to submit some custom enhancements

2008-10-15 Thread Feak, Todd
Hi all,

I have a handful of custom classes that we've created for our purposes
here. I'd like to share them if you think they have value for the rest
of the community, but I wanted to check here before creating JIRA
tickets and patches.

Here's what I have:

1. DoubleMetaphoneFilter and Factory. This replaces usage of the
PhoneticFilter and Factory allowing access to set maxCodeLength() on the
DoubleMetaphone encoder and access to the alternate encodings that the
encoder provides for some words.

2. JapaneseHalfWidthFilter and Factory. Some Japanese characters (and
Latin alphabet) exist in both a FullWidth and HalfWidth form. This
filter normalizes by switching to the FullWidth form for all the
characters. I have seen at least one JIRA ticket about this issue. This
implementation doesn't rely on Java 1.6.

3. JapaneseHiraganaFilter and Factory. Japanese Hiragana can be
translated to Katakana. This filter normalizes to Katakana so that data
and queries can come in either way and get hits.
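
For readers unfamiliar with the two Japanese normalizations in items 2 and 3,
here is a minimal sketch of the character mapping only. It is not the filter
code offered in this message: the class KanaNormalizer and its method names
are made up for the example, and it works on plain strings rather than on a
Lucene token stream. It relies on two fixed offsets in Unicode: Hiragana
U+3041 to U+3096 sit 0x60 below the corresponding Katakana, and printable
ASCII U+0021 to U+007E sits 0xFEE0 below the FullWidth forms. HalfWidth
Katakana is deliberately left out, since it has no constant offset and needs a
lookup table that also combines the separate voiced-sound marks.

```java
final class KanaNormalizer {
    private KanaNormalizer() {}

    /** Maps Hiragana (U+3041..U+3096) to Katakana; other characters pass through. */
    static String hiraganaToKatakana(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= 0x3041 && c <= 0x3096) {
                c += 0x60; // fixed distance between the two kana blocks
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Maps printable ASCII (U+0021..U+007E) to its FullWidth form; spaces are left alone. */
    static String asciiToFullWidth(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= 0x21 && c <= 0x7E) {
                c += 0xFEE0; // fixed distance to the FullWidth forms block
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hiraganaToKatakana("\u3072\u3089\u304c\u306a"));
        System.out.println(asciiToFullWidth("Solr 1"));
    }
}
```

Because neither method uses java.text.Normalizer, the sketch is consistent
with the stated goal of not relying on Java 1.6.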


Also, I have been requested to create a prototype that you may be
interested in. I'm to construct a QueryResponseWriter that returns
documents using Google's Protocol Buffers. This would rely on an
existing patch that exposes the OutputStream, but I would like to start
the work soon. Are there license concerns that would block sharing this
with you? Is there any interest in this?

Thanks for your consideration,
Todd Feak


Offer to submit some custom enhancements

2008-10-15 Thread Feak, Todd
Reposting, as I inadvertently thread hijacked on the first one. My bad.

Hi all,

I have a handful of custom classes that we've created for our purposes
here. I'd like to share them if you think they have value for the rest
of the community, but I wanted to check here before creating JIRA
tickets and patches.

Here's what I have:

1. DoubleMetaphoneFilter and Factory. This replaces usage of the
PhoneticFilter and Factory allowing access to set maxCodeLength() on the
DoubleMetaphone encoder and access to the alternate encodings that the
encoder provides for some words.

2. JapaneseHalfWidthFilter and Factory. Some Japanese characters (and
Latin alphabet) exist in both a FullWidth and HalfWidth form. This
filter normalizes by switching to the FullWidth form for all the
characters. I have seen at least one JIRA ticket about this issue. This
implementation doesn't rely on Java 1.6.

3. JapaneseHiraganaFilter and Factory. Japanese Hiragana can be
translated to Katakana. This filter normalizes to Katakana so that data
and queries can come in either way and get hits.


Also, I have been requested to create a prototype that you may be
interested in. I'm to construct a QueryResponseWriter that returns
documents using Google's Protocol Buffers. This would rely on an
existing patch that exposes the OutputStream, but I would like to start
the work soon. Are there license concerns that would block sharing this
with you? Is there any interest in this?

Thanks for your consideration,
Todd Feak