Re: Proposal: Lucene indexing/searching for nested objects

2017-07-20 Thread Jacob Barrett
Good point! Sounds good then.

> On Jul 20, 2017, at 11:15 AM, Dan Smith wrote:
> 
>> On Thu, Jul 20, 2017 at 10:57 AM, Jacob Barrett wrote:
>> 
>> I really feel like an annotation would make the most sense. How likely is
>> it that the object could not be annotated or the serializer for the object
>> is not tightly coupled with the object? Having to map objects to
>> serializers externally then becomes a lot more complicated to keep
>> consistent.
>> 
> 
> Well, with PDX serialization there may not even be a Java class, or it may
> not be present on the server. So annotations don't really cover all of the
> use cases. With the proposed API, you could plug in an annotation-based
> serializer, if you wanted to.
> 
> -Dan


Re: Proposal: Lucene indexing/searching for nested objects

2017-07-20 Thread Dan Smith
On Thu, Jul 20, 2017 at 10:57 AM, Jacob Barrett wrote:

> I really feel like an annotation would make the most sense. How likely is
> it that the object could not be annotated or the serializer for the object
> is not tightly coupled with the object? Having to map objects to
> serializers externally then becomes a lot more complicated to keep
> consistent.
>

Well, with PDX serialization there may not even be a Java class, or it may
not be present on the server. So annotations don't really cover all of the
use cases. With the proposed API, you could plug in an annotation-based
serializer, if you wanted to.
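
A minimal sketch of what such an annotation-based serializer might look
like. The @Indexed annotation, the SketchCustomer and AnnotationSketch
classes, and the plain Map standing in for a Lucene document are all
hypothetical names chosen for illustration; none of them are part of the
proposal or of Geode.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical marker annotation; not part of Geode or the proposal.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Indexed {}

// Example domain class: only the annotated field should be indexed.
class SketchCustomer {
  @Indexed String name;
  String password;

  SketchCustomer(String name, String password) {
    this.name = name;
    this.password = password;
  }
}

class AnnotationSketch {
  // Collects the annotated, non-null fields of a value by reflection.
  // A real serializer would build Lucene documents instead of a Map.
  static Map<String, String> toFields(Object value) {
    Map<String, String> fields = new TreeMap<>();
    for (Field field : value.getClass().getDeclaredFields()) {
      if (!field.isAnnotationPresent(Indexed.class)) {
        continue;
      }
      field.setAccessible(true);
      try {
        Object fieldValue = field.get(value);
        if (fieldValue != null) {
          fields.put(field.getName(), fieldValue.toString());
        }
      } catch (IllegalAccessException e) {
        throw new IllegalStateException("Cannot read " + field.getName(), e);
      }
    }
    return fields;
  }
}
```

Because this relies on reflection over a Java class, it only works where
that class is available, which is exactly the limitation with PDX.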

-Dan


Re: Proposal: Lucene indexing/searching for nested objects

2017-07-20 Thread Jacob Barrett
I really feel like an annotation would make the most sense. How likely is it 
that the object could not be annotated or the serializer for the object is not 
tightly coupled with the object? Having to map objects to serializers 
externally then becomes a lot more complicated to keep consistent.

> On Jul 20, 2017, at 10:38 AM, Dan Smith wrote:
> 
> This proposal doesn't really talk about XML or gfsh support.
> 
> The XML should probably just be a nested XML element, like this. It should
> have the same support for declarables that other callbacks in the XML do.
> 
> 
>   com.mycompany.MySerializer 
> 
> 
> The gfsh command to create an index should also accept a serializer, like
> this
> 
> create lucene index --serializer=com.mycompany.MySerializer
> 
> If there are no objections I'll update the proposal.
> 
> -Dan
> 
>> On Tue, Jul 18, 2017 at 10:38 AM, Dan Smith wrote:
>> 
>> I think this LuceneSerializer API needs a slight tweak. In order to
>> implement the proposed FlatFormatSerializer, the serializer needs access to
>> the index configuration to see what fields the user wants to index. We
>> should also pass the LuceneIndex to the serializer.
>> 
>> public interface LuceneSerializer {
>>  Collection toDocuments(Object value, *LuceneIndex index*);
>> }
>> 
>>> On Thu, Jul 13, 2017 at 2:19 PM, Dan Smith wrote:
>>> 
>>> On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett 
>>> wrote:
>>> 
>>>> Collections are really tough in Lucene because you have to flatten the
>>>> document. I struggled against it for some time on a project a few years ago
>>>> and ultimately decided to index the relationships separately and then merge
>>>> the results.
>>>>
>>> 
>>> Yeah, this is part of the motivation for providing the LuceneSerializer
>>> API. We can provide a built-in serializer that just flattens all nested
>>> collections into a single field, but users could also write their own
>>> implementation that converts the nested objects into separate Lucene
>>> documents and use some of the query classes in
>>> org.apache.lucene.search.join if they really need to.
>>> 
>>> It's not part of the goal here, but I think this LuceneSerializer API
>>> could also make it easier to do spatial indexing, because users could
>>> create a serializer that converts their gemfire object into a Lucene
>>> document with GeoPointFields.
>>> 
>>> -Dan
>>> 
>>> 
>> 


Re: Proposal: Lucene indexing/searching for nested objects

2017-07-20 Thread Dan Smith
This proposal doesn't really talk about XML or gfsh support.

The XML should probably just be a nested XML element, like this. It should
have the same support for declarables that other callbacks in the XML do.


   com.mycompany.MySerializer 


The gfsh command to create an index should also accept a serializer, like
this

create lucene index --serializer=com.mycompany.MySerializer

If there are no objections I'll update the proposal.

-Dan

On Tue, Jul 18, 2017 at 10:38 AM, Dan Smith wrote:

> I think this LuceneSerializer API needs a slight tweak. In order to
> implement the proposed FlatFormatSerializer, the serializer needs access to
> the index configuration to see what fields the user wants to index. We
> should also pass the LuceneIndex to the serializer.
>
> public interface LuceneSerializer {
>   Collection toDocuments(Object value, *LuceneIndex index*);
> }
>
> On Thu, Jul 13, 2017 at 2:19 PM, Dan Smith wrote:
>
>> On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett 
>> wrote:
>>
>>> Collections are really tough in Lucene because you have to flatten the
>>> document. I struggled against it for some time on a project a few years ago
>>> and ultimately decided to index the relationships separately and then merge
>>> the results.
>>>
>>
>> Yeah, this is part of the motivation for providing the LuceneSerializer
>> API. We can provide a built-in serializer that just flattens all nested
>> collections into a single field, but users could also write their own
>> implementation that converts the nested objects into separate Lucene
>> documents and use some of the query classes in
>> org.apache.lucene.search.join if they really need to.
>>
>> It's not part of the goal here, but I think this LuceneSerializer API
>> could also make it easier to do spatial indexing, because users could
>> create a serializer that converts their gemfire object into a Lucene
>> document with GeoPointFields.
>>
>> -Dan
>>
>>
>


Re: Proposal: Lucene indexing/searching for nested objects

2017-07-18 Thread Dan Smith
I think this LuceneSerializer API needs a slight tweak. In order to
implement the proposed FlatFormatSerializer, the serializer needs access to
the index configuration to see what fields the user wants to index. We
should also pass the LuceneIndex to the serializer.

public interface LuceneSerializer {
  Collection toDocuments(Object value, *LuceneIndex index*);
}
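
A rough sketch of why the extra parameter helps: with the index in hand,
the serializer can copy only the fields the user asked to index. The
types below (SketchIndex, SketchSerializer, ConfiguredFieldsSerializer)
are stand-ins invented for illustration, with a Map playing the role of
a Lucene document; they are not the Geode or Lucene classes, and the
assumption that values arrive as maps is mine.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the index configuration; hypothetical, not Geode's LuceneIndex.
interface SketchIndex {
  List<String> fieldNames();
}

// Stand-in for the proposed interface, with a Map in place of a Lucene document.
interface SketchSerializer {
  Collection<Map<String, String>> toDocuments(Object value, SketchIndex index);
}

// Copies only the fields that the index was configured with.
class ConfiguredFieldsSerializer implements SketchSerializer {
  @Override
  public Collection<Map<String, String>> toDocuments(Object value, SketchIndex index) {
    Map<?, ?> source = (Map<?, ?>) value; // assumption: values arrive as maps
    Map<String, String> doc = new LinkedHashMap<>();
    for (String field : index.fieldNames()) {
      Object fieldValue = source.get(field);
      if (fieldValue != null) {
        doc.put(field, fieldValue.toString());
      }
    }
    return Collections.singletonList(doc);
  }
}
```

Without the index argument, the serializer would need some other way to
learn the configured field names.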

On Thu, Jul 13, 2017 at 2:19 PM, Dan Smith wrote:

> On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett 
> wrote:
>
>> Collections are really tough in Lucene because you have to flatten the
>> document. I struggled against it for some time on a project a few years ago
>> and ultimately decided to index the relationships separately and then merge
>> the results.
>>
>
> Yeah, this is part of the motivation for providing the LuceneSerializer
> API. We can provide a built-in serializer that just flattens all nested
> collections into a single field, but users could also write their own
> implementation that converts the nested objects into separate Lucene
> documents and use some of the query classes in
> org.apache.lucene.search.join if they really need to.
>
> It's not part of the goal here, but I think this LuceneSerializer API
> could also make it easier to do spatial indexing, because users could
> create a serializer that converts their gemfire object into a Lucene
> document with GeoPointFields.
>
> -Dan
>
>


Re: Proposal: Lucene indexing/searching for nested objects

2017-07-13 Thread Dan Smith
On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett wrote:

> Collections are really tough in Lucene because you have to flatten the
> document. I struggled against it for some time on a project a few years ago
> and ultimately decided to index the relationships separately and then merge
> the results.
>

Yeah, this is part of the motivation for providing the LuceneSerializer
API. We can provide a built-in serializer that just flattens all nested
collections into a single field, but users could also write their own
implementation that converts the nested objects into separate Lucene
documents and use some of the query classes in
org.apache.lucene.search.join if they really need to.
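
To make the flattening idea concrete, here is a small sketch that walks
nested maps and collections and gathers every leaf value under one field
name. The FlattenSketch class, the dotted naming scheme ("phones.number"),
and the use of plain maps for the nested object are assumptions made for
illustration, not something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlattenSketch {
  // Flattens nested maps and collections into dotted field names,
  // each holding every value found under that name.
  static Map<String, List<String>> flatten(Map<String, ?> value) {
    Map<String, List<String>> fields = new LinkedHashMap<>();
    collect("", value, fields);
    return fields;
  }

  private static void collect(String path, Object node, Map<String, List<String>> fields) {
    if (node instanceof Map) {
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
        String child = path.isEmpty()
            ? String.valueOf(entry.getKey())
            : path + "." + entry.getKey();
        collect(child, entry.getValue(), fields);
      }
    } else if (node instanceof Collection) {
      for (Object element : (Collection<?>) node) {
        collect(path, element, fields); // elements share the parent's field name
      }
    } else if (node != null) {
      fields.computeIfAbsent(path, key -> new ArrayList<>()).add(node.toString());
    }
  }
}
```

Note what gets lost: once two phone entries are merged into one field,
a query can no longer tell which values came from the same nested
element. That is the case where separate documents and the join queries
would be the better fit.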

It's not part of the goal here, but I think this LuceneSerializer API could
also make it easier to do spatial indexing, because users could create a
serializer that converts their gemfire object into a Lucene document with
GeoPointFields.

-Dan


Re: Proposal: Lucene indexing/searching for nested objects

2017-07-13 Thread Jacob Barrett
Collections are really tough in Lucene because you have to flatten the 
document. I struggled against it for some time on a project a few years ago and 
ultimately decided to index the relationships separately and then merge the 
results.


> On Jul 13, 2017, at 11:13 AM, Dan Smith wrote:
> 
> +1 Looks good. I think we should consider adding support for collections as
> well, but that doesn't have to be in the first cut.
> 
> -Dan
> 
>> On Wed, Jul 12, 2017 at 10:37 AM, Diane Hardman wrote:
>> 
>> The Geode 1.2.0 release includes Lucene text search fully integrated and
>> tested (no longer experimental). We are now proposing enhancements to
>> improve Lucene usability in Geode.
>> 
>> Some Geode users create data models that include nested and complex
>> objects. The current Geode Lucene integration supports indexing and
>> querying only the top-level fields in the data object. The objective of
>> this proposal is to support indexing and querying an arbitrary depth of
>> nested objects.
>> 
>> 
>> Please review the proposal in the following wiki page and give us your
>> feedback.
>> 
>> https://cwiki.apache.org/confluence/display/GEODE/Lucene+Text+Search+on+Nested+Object
>> 


Re: Proposal: Lucene indexing/searching for nested objects

2017-07-13 Thread Dan Smith
+1 Looks good. I think we should consider adding support for collections as
well, but that doesn't have to be in the first cut.

-Dan

On Wed, Jul 12, 2017 at 10:37 AM, Diane Hardman wrote:

> The Geode 1.2.0 release includes Lucene text search fully integrated and
> tested (no longer experimental). We are now proposing enhancements to
> improve Lucene usability in Geode.
>
> Some Geode users create data models that include nested and complex
> objects. The current Geode Lucene integration supports indexing and
> querying only the top-level fields in the data object. The objective of
> this proposal is to support indexing and querying an arbitrary depth of
> nested objects.
>
>
> Please review the proposal in the following wiki page and give us your
> feedback.
>
> https://cwiki.apache.org/confluence/display/GEODE/Lucene+Text+Search+on+Nested+Object
>


Proposal: Lucene indexing/searching for nested objects

2017-07-12 Thread Diane Hardman
The Geode 1.2.0 release includes Lucene text search fully integrated and
tested (no longer experimental). We are now proposing enhancements to
improve Lucene usability in Geode.

Some Geode users create data models that include nested and complex
objects. The current Geode Lucene integration supports indexing and
querying only the top-level fields in the data object. The objective of
this proposal is to support indexing and querying an arbitrary depth of
nested objects.


Please review the proposal in the following wiki page and give us your
feedback.

https://cwiki.apache.org/confluence/display/GEODE/Lucene+Text+Search+on+Nested+Object