Re: Proposal: Lucene indexing/searching for nested objects
Good point! Sounds good then.

> On Jul 20, 2017, at 11:15 AM, Dan Smith wrote:
>
>> On Thu, Jul 20, 2017 at 10:57 AM, Jacob Barrett wrote:
>>
>> I really feel like an annotation would make the most sense. How likely is
>> it that the object could not be annotated or the serializer for the object
>> is not tightly coupled with the object? Having to map objects to
>> serializers externally then becomes a lot more complicated to keep
>> consistent.
>
> Well, with PDX serialization there may not even be a java class, or it may
> not be present on the server. So annotations don't really cover all of the
> use cases. With the proposed API, you could plug in an annotation based
> serializer, if you wanted to.
>
> -Dan
Re: Proposal: Lucene indexing/searching for nested objects
On Thu, Jul 20, 2017 at 10:57 AM, Jacob Barrett wrote:

> I really feel like an annotation would make the most sense. How likely is
> it that the object could not be annotated or the serializer for the object
> is not tightly coupled with the object? Having to map objects to
> serializers externally then becomes a lot more complicated to keep
> consistent.

Well, with PDX serialization there may not even be a java class, or it may not be present on the server. So annotations don't really cover all of the use cases. With the proposed API, you could plug in an annotation based serializer, if you wanted to.

-Dan
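
To illustrate the last point, here is a minimal sketch of what an annotation-driven serializer might look like if it were plugged in behind the proposed callback. Everything named here is hypothetical: `IndexedField` is an invented marker annotation, a plain `Map` stands in for a Lucene document, and `Customer` is a made-up domain class. As noted above, this approach only works when the Java class is actually present on the server, which is not guaranteed with PDX.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker annotation (not part of Geode or Lucene).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexedField {}

class AnnotationSerializerSketch {
    // Copies every field marked @IndexedField into a stand-in "document"
    // (field name -> string value). Unmarked and null fields are skipped.
    static Map<String, String> toDocument(Object value) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Field field : value.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(IndexedField.class)) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object fieldValue = field.get(value);
                if (fieldValue != null) {
                    doc.put(field.getName(), fieldValue.toString());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
        }
        return doc;
    }
}

// Made-up domain class used only to exercise the sketch.
class Customer {
    @IndexedField String name = "Jane";
    String internalNote = "not indexed";
}
```

Calling `AnnotationSerializerSketch.toDocument(new Customer())` yields a single entry for `name`; `internalNote` is left out because it carries no annotation.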
Re: Proposal: Lucene indexing/searching for nested objects
I really feel like an annotation would make the most sense. How likely is it that the object could not be annotated or the serializer for the object is not tightly coupled with the object? Having to map objects to serializers externally then becomes a lot more complicated to keep consistent.

> On Jul 20, 2017, at 10:38 AM, Dan Smith wrote:
>
> This proposal doesn't really talk about XML or gfsh support.
>
> The XML should probably just be a nested xml element, like this. It should
> have the same support for declarables that other callbacks in the xml do.
>
>     com.mycompany.MySerializer
>
> The gfsh command to create an index should also accept a serializer, like
> this
>
>     create lucene index --serializer=com.mycompany.MySerializer
>
> If there are no objections I'll update the proposal.
>
> -Dan
>
>> On Tue, Jul 18, 2017 at 10:38 AM, Dan Smith wrote:
>>
>> I think this LuceneSerializer API needs a slight tweak. In order to
>> implement the proposed FlatFormatSerializer, the serializer needs access to
>> the index configuration to see what fields the user wants to index. We
>> should also pass the LuceneIndex to the serializer.
>>
>> public interface LuceneSerializer {
>>   Collection toDocuments(Object value, *LuceneIndex index*);
>> }
>>
>>> On Thu, Jul 13, 2017 at 2:19 PM, Dan Smith wrote:
>>>
>>>> On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett wrote:
>>>>
>>>> Collections are really tough in Lucene because you have to flatten the
>>>> document. I struggled against it for some time on a project a few years
>>>> ago and ultimately decided to index the relationships separately and
>>>> then merge the results.
>>>
>>> Yeah, this is part of the motivation for providing the LuceneSerializer
>>> API. We can provide a built in serializer that just flattens all nested
>>> collections into a single field, but users could also write their own
>>> implementation that converts the nested objects into separate lucene
>>> documents and use some of query classes in org.apache.lucene.search.join if
>>> they really need to.
>>>
>>> It's not part of the goal here, but I think this LuceneSerializer API
>>> could also make it easier to do spatial indexing, because users could
>>> create a serializer that converts their gemfire object into a Lucene
>>> document with GeoPointFields.
>>>
>>> -Dan
Re: Proposal: Lucene indexing/searching for nested objects
This proposal doesn't really talk about XML or gfsh support.

The XML should probably just be a nested xml element, like this. It should have the same support for declarables that other callbacks in the xml do.

    com.mycompany.MySerializer

The gfsh command to create an index should also accept a serializer, like this

    create lucene index --serializer=com.mycompany.MySerializer

If there are no objections I'll update the proposal.

-Dan

On Tue, Jul 18, 2017 at 10:38 AM, Dan Smith wrote:

> I think this LuceneSerializer API needs a slight tweak. In order to
> implement the proposed FlatFormatSerializer, the serializer needs access to
> the index configuration to see what fields the user wants to index. We
> should also pass the LuceneIndex to the serializer.
>
> public interface LuceneSerializer {
>   Collection toDocuments(Object value, *LuceneIndex index*);
> }
>
> On Thu, Jul 13, 2017 at 2:19 PM, Dan Smith wrote:
>
>> On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett wrote:
>>
>>> Collections are really tough in Lucene because you have to flatten the
>>> document. I struggled against it for some time on a project a few years ago
>>> and ultimately decided to index the relationships separately and then merge
>>> the results.
>>
>> Yeah, this is part of the motivation for providing the LuceneSerializer
>> API. We can provide a built in serializer that just flattens all nested
>> collections into a single field, but users could also write their own
>> implementation that converts the nested objects into separate lucene
>> documents and use some of query classes in org.apache.lucene.search.join if
>> they really need to.
>>
>> It's not part of the goal here, but I think this LuceneSerializer API
>> could also make it easier to do spatial indexing, because users could
>> create a serializer that converts their gemfire object into a Lucene
>> document with GeoPointFields.
>>
>> -Dan
Re: Proposal: Lucene indexing/searching for nested objects
I think this LuceneSerializer API needs a slight tweak. In order to implement the proposed FlatFormatSerializer, the serializer needs access to the index configuration to see what fields the user wants to index. We should also pass the LuceneIndex to the serializer.

public interface LuceneSerializer {
  Collection toDocuments(Object value, *LuceneIndex index*);
}

On Thu, Jul 13, 2017 at 2:19 PM, Dan Smith wrote:

> On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett wrote:
>
>> Collections are really tough in Lucene because you have to flatten the
>> document. I struggled against it for some time on a project a few years ago
>> and ultimately decided to index the relationships separately and then merge
>> the results.
>
> Yeah, this is part of the motivation for providing the LuceneSerializer
> API. We can provide a built in serializer that just flattens all nested
> collections into a single field, but users could also write their own
> implementation that converts the nested objects into separate lucene
> documents and use some of query classes in org.apache.lucene.search.join if
> they really need to.
>
> It's not part of the goal here, but I think this LuceneSerializer API
> could also make it easier to do spatial indexing, because users could
> create a serializer that converts their gemfire object into a Lucene
> document with GeoPointFields.
>
> -Dan
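
As a rough sketch of why the serializer needs the index, the following shows a flat-format serializer that reads the configured field names and pulls only those values out of a nested object. It is not the proposed implementation: `IndexConfig`, `FlatDocument`, and `SketchSerializer` are hypothetical stand-ins for `LuceneIndex`, a Lucene document, and `LuceneSerializer`, nested objects are modeled as plain `Map`s, and the dotted field names (`contact.name`) are an assumed notation for addressing nested fields.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the index configuration; only the field names are modeled.
interface IndexConfig {
    List<String> getFieldNames();
}

// Stand-in for a Lucene document: field name -> indexed values.
class FlatDocument {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

// Same shape as the callback in the message above, with the index passed in.
interface SketchSerializer {
    Collection<FlatDocument> toDocuments(Object value, IndexConfig index);
}

class FlatFormatSketch implements SketchSerializer {
    @Override
    public Collection<FlatDocument> toDocuments(Object value, IndexConfig index) {
        FlatDocument doc = new FlatDocument();
        for (String fieldName : index.getFieldNames()) {
            // Walk the dotted path one segment at a time.
            Object current = value;
            for (String segment : fieldName.split("\\.")) {
                if (!(current instanceof Map)) {
                    current = null; // path does not exist in this object
                    break;
                }
                current = ((Map<?, ?>) current).get(segment);
            }
            // Only leaf values are indexed; nested objects themselves are skipped.
            if (current != null && !(current instanceof Map)) {
                doc.add(fieldName, current.toString());
            }
        }
        return Collections.singletonList(doc);
    }
}
```

Without the second parameter, the serializer would have no way to know that the user asked for `contact.name` and nothing else, which is the gap the tweak above addresses.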
Re: Proposal: Lucene indexing/searching for nested objects
On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett wrote:

> Collections are really tough in Lucene because you have to flatten the
> document. I struggled against it for some time on a project a few years ago
> and ultimately decided to index the relationships separately and then merge
> the results.

Yeah, this is part of the motivation for providing the LuceneSerializer API. We can provide a built-in serializer that just flattens all nested collections into a single field, but users could also write their own implementation that converts the nested objects into separate Lucene documents and use some of the query classes in org.apache.lucene.search.join if they really need to.

It's not part of the goal here, but I think this LuceneSerializer API could also make it easier to do spatial indexing, because users could create a serializer that converts their gemfire object into a Lucene document with GeoPointFields.

-Dan
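
The two options for collections can be sketched side by side. This is only an illustration of the data shapes, not of any Lucene API: plain `Map`s stand in for documents, and the `id` and `parentId` field names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedCollectionSketch {
    // Option 1: a single document; every element of the nested collection
    // lands in the same multi-valued field.
    static Map<String, List<String>> flatten(String parentId, String field, List<String> values) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", Collections.singletonList(parentId));
        doc.put(field, new ArrayList<>(values));
        return doc;
    }

    // Option 2: one document per element, each carrying the parent's key so
    // that hits can later be merged back to the owning object.
    static List<Map<String, String>> split(String parentId, String field, List<String> values) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String value : values) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("parentId", parentId);
            doc.put(field, value);
            docs.add(doc);
        }
        return docs;
    }
}
```

Flattening is simpler but loses which values belonged to the same nested element once an element has more than one field; splitting keeps that association at the cost of a merge or join step at query time.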
Re: Proposal: Lucene indexing/searching for nested objects
Collections are really tough in Lucene because you have to flatten the document. I struggled against it for some time on a project a few years ago and ultimately decided to index the relationships separately and then merge the results.

> On Jul 13, 2017, at 11:13 AM, Dan Smith wrote:
>
> +1 Looks good. I think we should consider adding support for collections as
> well, but that doesn't have to be in the first cut.
>
> -Dan
>
>> On Wed, Jul 12, 2017 at 10:37 AM, Diane Hardman wrote:
>>
>> The Geode 1.2.0 release includes Lucene text search fully integrated and
>> tested (no longer experimental). We are now proposing enhancements to
>> improve Lucene usability in Geode.
>>
>> Some Geode users create data models that include nested and complex
>> objects. The current Geode Lucene integration supports indexing and
>> querying only the top-level fields in the data object. The objective of
>> this proposal is to support indexing and querying an arbitrary depth of
>> nested objects.
>>
>> Please review the proposal in the following wiki page and give us your
>> feedback.
>>
>> https://cwiki.apache.org/confluence/display/GEODE/Lucene+Text+Search+on+Nested+Object
Re: Proposal: Lucene indexing/searching for nested objects
+1 Looks good. I think we should consider adding support for collections as well, but that doesn't have to be in the first cut.

-Dan

On Wed, Jul 12, 2017 at 10:37 AM, Diane Hardman wrote:

> The Geode 1.2.0 release includes Lucene text search fully integrated and
> tested (no longer experimental). We are now proposing enhancements to
> improve Lucene usability in Geode.
>
> Some Geode users create data models that include nested and complex
> objects. The current Geode Lucene integration supports indexing and
> querying only the top-level fields in the data object. The objective of
> this proposal is to support indexing and querying an arbitrary depth of
> nested objects.
>
> Please review the proposal in the following wiki page and give us your
> feedback.
>
> https://cwiki.apache.org/confluence/display/GEODE/Lucene+Text+Search+on+Nested+Object
Proposal: Lucene indexing/searching for nested objects
The Geode 1.2.0 release includes Lucene text search fully integrated and tested (no longer experimental). We are now proposing enhancements to improve Lucene usability in Geode.

Some Geode users create data models that include nested and complex objects. The current Geode Lucene integration supports indexing and querying only the top-level fields in the data object. The objective of this proposal is to support indexing and querying an arbitrary depth of nested objects.

Please review the proposal in the following wiki page and give us your feedback.

https://cwiki.apache.org/confluence/display/GEODE/Lucene+Text+Search+on+Nested+Object