Hi everyone, I just updated my proposal with the new features suggested by Luciano and Adriano.
Any more comments will be appreciated ; )

Best Regards,
Phillipe Ramalho

On Wed, Apr 1, 2009 at 11:34 PM, Phillipe Ramalho <[email protected]> wrote:
> Hi Adriano,
>
> Thanks for the comments, they are really helpful : )
>
> Some comments inline:
>
> In addition, with every artifact the indexed artifact is related to, extra
> information can be added using a Lucene feature called a payload; this
> information could describe the relationship between the elements.
>
> I liked this relationship idea, have you thought about extending the Lucene
> query parser so new syntax could be provided? We could extend it and add
> support for something like: isreferenced("StoreCatalog") ...so every
> component that is referenced by StoreCatalog would be returned. Well, maybe
> we could also do this using a Lucene field, it would be much faster. Anyway,
> there are cool features that could be built using payloads, we just need to
> come up with some good ideas : )
>
> I have never extended the Lucene query parser syntax myself, but I liked
> the idea too. I will do some more investigation on it and add it to the
> proposal ; )
>
> To handle different file types, file analyzers will be implemented to
> extract the text from them. For example, a .class file is a binary file,
> but the method names (mainly the ones annotated with SCA annotations) could
> be extracted using the Java Reflection API. File analyzers could also call
> other analyzers recursively; for example, a .composite file could be
> analyzed using a CompositeAnalyzer, and when it reaches the
> implementation.java node it could invoke a JavaClassAnalyzer, and so on.
> This way each type of file will have only its significant text indexed;
> otherwise, if the file is parsed using a common text file analyzer, every
> search for "component" would find every composite file, because it contains
> a "<component>" node declaration.
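The reflection idea above can be sketched in plain Java. Note that @Indexed below is a hypothetical stand-in for the real SCA annotations, and the StoreCatalog class and its methods are made-up examples, not Tuscany code:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a JavaClassAnalyzer: collects the names of methods carrying a
 * given annotation so that only significant text gets indexed.
 */
class JavaClassAnalyzer {

    /** Hypothetical stand-in for the SCA annotations mentioned in the thread. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Indexed {}

    /** Example component class as it might appear in a contribution. */
    public static class StoreCatalog {
        @Indexed
        public List<String> get() { return new ArrayList<String>(); }

        // Not annotated, so it should not be indexed.
        public void internalHelper() {}
    }

    /** Returns the names of methods annotated with @Indexed. */
    public static List<String> extractIndexableMethodNames(Class<?> clazz) {
        List<String> names = new ArrayList<String>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Indexed.class)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(extractIndexableMethodNames(StoreCatalog.class));
    }
}
```

In a real analyzer the class would first be loaded from the .class bytes in the contribution; the extraction step itself would stay the same.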
>
> This is really what I had in mind, do something that extracts only the
> relevant information, because search is also about good results; it is not
> as simple as just finding them, otherwise Google would not be so famous and
> you probably would never be applying for GSoC : )... I think we should also
> implement an analyzer for compressed files, there are many jars on a
> domain, we cannot just ignore them.
>
> Good idea, so we could browse compressed files like browsing a folder. I
> will also add it to the proposal.
>
> Now, about the "searching" section of your proposal, it's fine, I think
> Lucene already gives us a good query parser for user input. It's a good
> idea to implement everything as an SCA component, and one of the services
> it could provide is to search not only using a query text, but also
> accepting Lucene query objects as input. An app using the search component
> could have a very user-friendly interface where the user could check many
> checkboxes and other high-level GUI components to refine a query; in these
> cases, when the app executes the search it would probably generate the
> Lucene objects directly instead of creating a query string.
>
> OK, I think it's going to be easy, the query text is converted to Lucene
> query objects anyway; the only thing this new functionality needs to do is
> skip the parsing step and execute the query objects directly against the
> index : )
>
> Hey, this is a good way to display a result, because in the results you
> can already see the artifacts' relationships. Maybe we could work on
> expanding the result tree down to files inside compressed files or methods
> inside class files. I think this display model could be extended not only
> for displaying results, but also to display every artifact in the domain
> manager web app.
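A minimal sketch of the two search entry points discussed in the thread: one takes a raw query string and parses it, the other accepts already-built query objects so GUI clients can skip the parsing step. The Query class and the matching logic here are placeholders standing in for Lucene's real Query/QueryParser types and an index search, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the SCA search service idea with two entry points. */
class SearchServiceSketch {

    /** Placeholder standing in for org.apache.lucene.search.Query. */
    public static class Query {
        final String term;
        public Query(String term) { this.term = term; }
    }

    public interface SearchService {
        List<String> search(String queryText);   // parse, then execute
        List<String> search(Query query);        // execute directly
    }

    /** Trivial in-memory implementation over a list of artifact names. */
    public static class InMemorySearchService implements SearchService {
        private final List<String> artifacts;

        public InMemorySearchService(List<String> artifacts) {
            this.artifacts = artifacts;
        }

        public List<String> search(String queryText) {
            // Stand-in for Lucene's QueryParser: normalize, then delegate
            // to the query-object entry point.
            return search(new Query(queryText.trim().toLowerCase()));
        }

        public List<String> search(Query query) {
            // Stand-in for running the query against the Lucene index.
            List<String> hits = new ArrayList<String>();
            for (String a : artifacts) {
                if (a.toLowerCase().contains(query.term)) {
                    hits.add(a);
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        List<String> artifacts = new ArrayList<String>();
        artifacts.add("store.composite");
        artifacts.add("StoreCatalog.class");
        SearchService service = new InMemorySearchService(artifacts);
        System.out.println(service.search("catalog"));
    }
}
```

The point of the overload is exactly what the thread describes: a checkbox-driven UI builds Query objects directly and calls the second method, bypassing the string parser entirely.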
>
> That's the idea, to expand down to every artifact we can parse and index
> : )
>
> I think you might want to double the "Implementing text and file analyzer
> for indexing" phase time.
>
> Agreed, I will do that ; )
>
> Regards,
> Phillipe Ramalho
>
>
> On Wed, Apr 1, 2009 at 1:27 AM, Adriano Crestani
> <[email protected]> wrote:
>
>> Hi Phillipe,
>>
>> Very good and detailed proposal : )
>>
>> In addition, with every artifact the indexed artifact is related to,
>> extra information can be added using a Lucene feature called a payload;
>> this information could describe the relationship between the elements.
>>
>> I liked this relationship idea, have you thought about extending the
>> Lucene query parser so new syntax could be provided? We could extend it
>> and add support for something like: isreferenced("StoreCatalog") ...so
>> every component that is referenced by StoreCatalog would be returned.
>> Well, maybe we could also do this using a Lucene field, it would be much
>> faster. Anyway, there are cool features that could be built using
>> payloads, we just need to come up with some good ideas : )
>>
>> To handle different file types, file analyzers will be implemented to
>> extract the text from them. For example, a .class file is a binary file,
>> but the method names (mainly the ones annotated with SCA annotations)
>> could be extracted using the Java Reflection API. File analyzers could
>> also call other analyzers recursively; for example, a .composite file
>> could be analyzed using a CompositeAnalyzer, and when it reaches the
>> implementation.java node it could invoke a JavaClassAnalyzer, and so on.
>> This way each type of file will have only its significant text indexed;
>> otherwise, if the file is parsed using a common text file analyzer, every
>> search for "component" would find every composite file, because it
>> contains a "<component>" node declaration.
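Browsing a compressed file like a folder, as suggested in the thread, needs only java.util.zip from the JDK. In this sketch each jar entry is simply listed; a real compressed-file analyzer would then hand each entry to the analyzer for its file type (the recursive-analyzer idea above). The archive contents here are made up so the example is self-contained:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Sketch of a compressed-file analyzer that walks a jar entry by entry. */
class ZipAnalyzer {

    /** Lists all entry names in the archive, as if browsing a folder. */
    public static List<String> listEntries(File archive) throws IOException {
        List<String> names = new ArrayList<String>();
        ZipFile zip = new ZipFile(archive);
        try {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } finally {
            zip.close();
        }
        return names;
    }

    /** Builds a small throwaway archive so the sketch is runnable as-is. */
    static File sampleArchive() throws IOException {
        File f = File.createTempFile("contribution", ".jar");
        f.deleteOnExit();
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(f));
        out.putNextEntry(new ZipEntry("store.composite"));
        out.closeEntry();
        out.putNextEntry(new ZipEntry("services/StoreCatalog.class"));
        out.closeEntry();
        out.close();
        return f;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listEntries(sampleArchive()));
    }
}
```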
>>
>> This is really what I had in mind, do something that extracts only the
>> relevant information, because search is also about good results; it is
>> not as simple as just finding them, otherwise Google would not be so
>> famous and you probably would never be applying for GSoC : )... I think
>> we should also implement an analyzer for compressed files, there are
>> many jars on a domain, we cannot just ignore them.
>>
>> Now, about the "searching" section of your proposal, it's fine, I think
>> Lucene already gives us a good query parser for user input. It's a good
>> idea to implement everything as an SCA component, and one of the
>> services it could provide is to search not only using a query text, but
>> also accepting Lucene query objects as input. An app using the search
>> component could have a very user-friendly interface where the user could
>> check many checkboxes and other high-level GUI components to refine a
>> query; in these cases, when the app executes the search it would
>> probably generate the Lucene objects directly instead of creating a
>> query string.
>>
>> The results will be displayed using a tree layout, something like the
>> Eclipse IDE does [see image below] in its text search results, but
>> instead of a tree like project -> package -> class -> text fragment that
>> contains the searched text, it would be, for example, node ->
>> contribution -> component -> file.composite file -> text fragment that
>> contains the searched text. This is just an example; the way the results
>> are displayed can still be discussed on the community mailing list.
>>
>> Hey, this is a good way to display a result, because in the results you
>> can already see the artifacts' relationships. Maybe we could work on
>> expanding the result tree down to files inside compressed files or
>> methods inside class files. I think this display model could be extended
>> not only for displaying results, but also to display every artifact in
>> the domain manager web app.
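The tree-shaped result display described above could sit on a small model like this one: each hit is attached under a path of artifacts (node -> contribution -> component -> file -> fragment), similar to the Eclipse search results view. The artifact names and the indented rendering are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the tree-shaped search result model from the thread. */
class ResultTree {

    private final String label;
    private final List<ResultTree> children = new ArrayList<ResultTree>();

    public ResultTree(String label) { this.label = label; }

    /** Walks down the path, creating missing children along the way. */
    public void add(String... path) {
        ResultTree current = this;
        for (String label : path) {
            current = current.child(label);
        }
    }

    private ResultTree child(String label) {
        for (ResultTree c : children) {
            if (c.label.equals(label)) {
                return c;
            }
        }
        ResultTree c = new ResultTree(label);
        children.add(c);
        return c;
    }

    /** Indented rendering, one line per artifact. */
    public String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(label).append('\n');
        for (ResultTree c : children) c.render(sb, depth + 1);
    }

    public static void main(String[] args) {
        ResultTree root = new ResultTree("domain");
        root.add("nodeA", "store-contribution", "StoreCatalog",
                 "store.composite", "<component name=\"StoreCatalog\">");
        System.out.print(root.render());
    }
}
```

Because paths are merged as they are added, hits under the same node or contribution collapse into one subtree, which is what makes the artifact relationships visible at a glance; extending the tree down into jar entries or class methods is just a longer path.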
>>
>> I think you might want to double the "Implementing text and file
>> analyzer for indexing" phase time.
>>
>> +1 from me too :)
>>
>> Adriano Crestani
>>
>>
>> On Wed, Apr 1, 2009 at 12:02 AM, Phillipe Ramalho
>> <[email protected]> wrote:
>>
>>> Thanks Luciano,
>>>
>>> You might start thinking about how you are going to integrate with the
>>> runtime, possibly into the contribution processing as a new phase or a
>>> new type of processor?
>>>
>>> OK, I will investigate more about that and add some details to my
>>> proposal. I will let everyone know when I update it.
>>>
>>> Best Regards,
>>> Phillipe Ramalho
>>>
>>> On Tue, Mar 31, 2009 at 10:29 AM, Luciano Resende
>>> <[email protected]> wrote:
>>>
>>>> On Tue, Mar 31, 2009 at 1:04 AM, Phillipe Ramalho
>>>> <[email protected]> wrote:
>>>> > Hi everyone,
>>>> >
>>>> > This is my proposal for the project "Add search capability to
>>>> > index/search artifacts in the SCA domain" described at [1]. I have
>>>> > already submitted the proposal on the GSoC webpage and added it to
>>>> > the Tuscany Wiki proposals at [2].
>>>> >
>>>> > Any critique, suggestion, comment, or review will be appreciated.
>>>> >
>>>> > I think there are some points that could be improved in the
>>>> > proposal and I'm still working on that, mainly the points I say
>>>> > should be discussed with the community, so any comments about those
>>>> > will also be appreciated : )
>>>>
>>>> Looks really good, and very detailed...
>>>>
>>>> You might start thinking about how you are going to integrate with the
>>>> runtime, possibly into the contribution processing as a new phase or a
>>>> new type of processor?
>>>>
>>>> Anyway, +1 from me.
>>>>
>>>> > Thanks in advance,
>>>> > Phillipe Ramalho
>>>>
>>>> --
>>>> Luciano Resende
>>>> Apache Tuscany, Apache PhotArk
>>>> http://people.apache.org/~lresende
>>>> http://lresende.blogspot.com/
>>>
>>> --
>>> Phillipe Ramalho
>
> --
> Phillipe Ramalho

--
Phillipe Ramalho
