I think there might not be enough time. Bertrand, WDYT?
Critical for project success or an add-on?

Ian

On 2 May 2013 16:34, Ilya Velesevich <ilya.velesev...@gmail.com> wrote:

> Hi Ian,
>
> Many thanks for your reply!
>
> Also, one additional clarification about "using DataImportHandler or
> ManifoldCF to provide search for Sling resources using Solr". Could you
> share some thoughts about this task? Or do you think that this task
> should not be part of GSoC, as there may not be enough time to
> implement such support?
>
> Thanks,
> Ilya
>
>
> On Wed, May 1, 2013 at 2:05 AM, Ian Boston <i...@tfd.co.uk> wrote:
>
> > Hi
> > Some comments inline, but please remember to submit this proposal at
> > the GSoC site so that it can be reviewed.
> > The deadline is
> >
> > 3rd May 2013
> >
> > I.e. this Friday.
> >
> > Ian
> > (More below).
> >
> >
> > On 30 April 2013 19:15, Ilya Velesevich <ilya.velesev...@gmail.com>
> > wrote:
> >
> > > Hi Everyone,
> > >
> > > I’m working on a proposal for the “Apache Solr backend for Apache
> > > Sling” task as part of Google Summer of Code 2013:
> > > https://issues.apache.org/jira/browse/SLING-2795. Thus far I have
> > > been reading articles, watching videos and looking through source
> > > code to investigate the topic in more depth. Now I want to describe
> > > my vision of the task and the implementation approach. All your
> > > comments/suggestions would be very helpful in order to improve my
> > > proposal and bring more value from implementing the task.
> > >
> > > I see several parts of the task.
> > >
> > > *1. Provide CRUDL operations for Solr data through Sling API.*
> > >
> > > This will allow creating Sling resources residing in a Solr server
> > > and querying them through the Sling API using Solr search
> > > capabilities. Solr query syntax should be used for queries.
> > >
> > > From the Sling API perspective, a custom *ResourceProvider* (and
> > > *Resource*) implementation will be created, additionally
> > > implementing *QueriableResourceProvider* and
> > > *ModifyingResourceProvider*. (If necessary, the
> > > *RefreshableResourceProvider* and *DynamicResourceProvider*
> > > interfaces will also be implemented.) To communicate with the Solr
> > > server, the SolrJ API will be used.
> > >
> >
> > yes (and you might want to think about running Solr embedded for dev
> > purposes).
> >
> > >
> > > *2. Provide convenient ways to create Solr resources based on
> > > different data.*
> > >
> > > *2.1. Create a Solr resource based on an arbitrary Sling resource.*
> > > This will allow adding Sling resources to the Solr server for
> > > efficient search. The created Solr resource will also hold a
> > > reference (most likely, the resource path) to the original Sling
> > > resource. The *Adaptable* concept seems to be a reasonable way of
> > > implementing this functionality: to “convert” an arbitrary Sling
> > > resource to a Solr resource and to resolve the original Sling
> > > resource based on the Solr resource.
> > >
> > > Also I think that not all metadata of a Sling resource should be
> > > used when creating the corresponding Solr resource, so this task
> > > should also include some configuration to specify which metadata is
> > > passed to the Solr resource. Additionally, some transformations on
> > > resource metadata could be supported here.
> > >
> >
> > I think you should think initially about just getting or resolving
> > Solr resources using the ResourceResolver.
> >
> > Later you can add creating those resources via the
> > ModifyingResourceProvider. If you think of a Resource as a map of
> > properties, then it fits the Solr document model reasonably well.
> > I.e. a Resource maps 1:1 with a Solr Document.
> >
> > >
> > > *2.2.* When creating Solr resources, not all data can be stored
> > > efficiently in Solr, for instance large binary files. In this
> > > situation, one could create a Sling resource (for instance,
> > > FileSystem or Jackrabbit) and then create a Solr resource based on
> > > that Sling resource; this will allow both efficient search through
> > > Solr and effective storage options. As an optimization, these steps
> > > could be done automatically based on some configuration. So *when a
> > > Solr resource is created we could analyze it* (analyze metadata,
> > > try to adapt to certain types) *and create additional supporting
> > > resources in other parts of the Sling virtual resource tree if
> > > necessary*. What do you think: is it necessary to implement such
> > > functionality, or will option 2.1 be sufficient? What useful
> > > scenarios do you see for this task besides the “large binary”
> > > scenario?
> > >
> >
> > Resources may have properties that are streams. How the stream is
> > stored and delivered is an implementation detail of the
> > ResourceProvider and the object it provides. So a
> > SolrResourceProvider might provide SolrResource objects, which expose
> > a SolrResourceDocument when
> > resource.adaptTo(SolrResourceDocument.class) is invoked.
> >
> > The SolrResourceDocument might then have a getBodyStream() method.
> >
> > >
> > > *3. Provide a solution to support search for arbitrary Sling
> > > resources through the Sling API using Solr capabilities.*
> > >
> > > From my point of view, this one needs some external solution to
> > > support things like full indexing, incremental indexing, creating
> > > different schedules, etc. I see that Solr DataImportHandler or
> > > Apache ManifoldCF could be utilized for this task. So the concept
> > > of the solution here would be to write the necessary implementation
> > > so that the Sling virtual resource tree could be used as a data
> > > source for one of the components mentioned above. What do you think
> > > about this approach? Could you advise some other alternatives to
> > > Solr DataImportHandler and Apache ManifoldCF for implementing this
> > > task?
> > >
> > >
> > > Also I’ve got a couple of questions on the Sling API:
> > >
> > >    - Am I right that the “best practice” way to provide a bundle
> > >    with a custom *ResourceProvider* implementation is to use the
> > >    Apache Felix Maven SCR Plugin and specify certain SCR annotations
> > >    (like *@Component*, *@Service* and some others) on the
> > >    corresponding classes, i.e. the *ResourceProvider* or
> > >    *ResourceProviderFactory* implementation in this case?
> > >
> >
> > IIRC you will implement a ResourceProviderFactory as a @Component
> > with a @Service annotation indicating it implements the
> > ResourceProviderFactory interface. It will then build
> > ResourceProvider objects. To check I would need to have a quick look
> > at the API.
> >
> > >
> > >    - I see that *ResourceResolver* is intended to be used by clients
> > >    to obtain and work with Sling resources. Also it seems to me that
> > >    it is unlikely to be necessary to create a custom
> > >    *ResourceResolver* implementation for the Solr integration task.
> > >    But still, could you please specify some valid typical cases when
> > >    one would need to create a custom *ResourceResolver*?
> > >
> >
> > Correct, you won't need to create a ResourceResolver.
> >
> > >
> > >    - Suppose I have configured the same resource provider
> > >    implementation (like the file system resource provider or a
> > >    possible Solr resource provider) under two URLs, “/url1” and
> > >    “/url2”. Now I want to perform *findResources*/*queryResources*
> > >    but only for the resources residing under “/url1”. Is it possible
> > >    to limit search results in such a way? (Probably I missed
> > >    something, but looking through the source code it seems that
> > >    query results from all queriable resource providers supporting
> > >    the given query language will be combined regardless of where in
> > >    the resource tree the corresponding provider is configured.)
> > >
> >
> > You may decide to limit searches to path subtrees in the query
> > language itself.
> >
> > >
> > > Please write any feedback/thoughts you have after reading this
> > > vision; this will really help me to understand the details further.
> > >
> >
> > Sounds like you're getting there, please remember to submit a
> > proposal before the deadline if you're still interested.
> >
> > Thanks
> > Ian
> >
> > >
> > > Many thanks in advance,
> > >
> > > Ilya
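
Two of Ian's suggestions in the thread (a Resource mapping 1:1 to a Solr document, and restricting a search to a path subtree inside the query itself) can be illustrated with a rough sketch. It deliberately uses plain Java maps and strings rather than real Sling or SolrJ types. The class name, the method names and the `path` field are assumptions made for illustration, not anything from the proposal or the Sling API. The `{!prefix f=...}` syntax is Solr's prefix query parser, which would typically be passed as a filter query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative only: plain-Java stand-ins, no Sling or SolrJ types.
 * The "path" field and all class/method names here are hypothetical.
 */
public class SolrResourceSketch {

    /** Copies a resource's properties into a flat field map: one document per resource. */
    public static Map<String, Object> toDocument(String resourcePath,
                                                 Map<String, Object> properties) {
        Map<String, Object> doc = new LinkedHashMap<>();
        // Reference back to the original Sling resource.
        doc.put("path", resourcePath);
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            // Do not let a property overwrite the reserved "path" field.
            if (!"path".equals(e.getKey())) {
                doc.put(e.getKey(), e.getValue());
            }
        }
        return doc;
    }

    /**
     * Builds a Solr prefix filter that keeps only documents below rootPath.
     * The root resource itself is not matched, only its descendants.
     */
    public static String subtreeFilter(String rootPath) {
        String prefix = rootPath.endsWith("/") ? rootPath : rootPath + "/";
        return "{!prefix f=path}" + prefix;
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("title", "Hello");
        props.put("sling:resourceType", "sample/page");

        System.out.println(toDocument("/url1/hello", props));
        System.out.println(subtreeFilter("/url1"));
    }
}
```

With a filter like this, a query issued for “/url1” would not return documents stored under “/url2”, even if both subtrees are served by the same provider. This assumes the path is indexed as a single untokenized string field.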