Hi everyone, I just updated my proposal with the new features suggested by Luciano and Adriano.
Any more comments will be appreciated ; )

Best Regards,
Phillipe Ramalho

On Wed, Apr 1, 2009 at 11:34 PM, Phillipe Ramalho <[email protected]> wrote:
> Hi Adriano,
>
> Thanks for the comments, they are really helpful : )
>
> Some comments inline:
>
> In addition, with every artifact the indexed artifact is related to, extra
> information can be added using a Lucene feature called a payload; this
> information could describe the relationship between the elements.
>
> I liked this relationship idea, have you thought about extending the Lucene
> query parser so new syntax could be provided? We could extend it and add
> support for something like: isreferenced("StoreCatalog") ...so every
> component that is referenced by StoreCatalog would be returned. Well, maybe
> we could also do this using a Lucene field, it would be much faster. Anyway,
> there are cool features that could be built using payloads, we just need to
> come up with some good ideas : )
>
> I have never extended the Lucene query parser syntax myself, but I liked
> the idea too. I will do some more investigation on it and add it to the
> proposal ; )
>
> To handle different file types, file analyzers will be implemented to
> extract the text from them. For example, a .class file is a binary file,
> but the method names (mainly the ones annotated with SCA annotations) could
> be extracted using the Java Reflection API. File analyzers could also call
> other analyzers recursively; for example, a .composite file could be
> analyzed using a CompositeAnalyzer, and when it reaches the
> implementation.java node it could invoke a JavaClassAnalyzer, and so on.
> This way each type of file will have only its significant text indexed;
> otherwise, if the file is parsed using a common text file analyzer, every
> search for "component" would find every composite file, because it contains
> a "<component>" node declaration.
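The reflection idea above can be sketched in plain Java. Note that @Indexed below is a hypothetical stand-in for the real SCA annotations, and the StoreCatalog class and its methods are made-up examples, not Tuscany code:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a JavaClassAnalyzer: collects the names of methods carrying a
 * given annotation so that only significant text gets indexed.
 */
class JavaClassAnalyzer {

    /** Hypothetical stand-in for the SCA annotations mentioned in the thread. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Indexed {}

    /** Example component class as it might appear in a contribution. */
    public static class StoreCatalog {
        @Indexed
        public List<String> get() { return new ArrayList<String>(); }

        // Not annotated, so it should not be indexed.
        public void internalHelper() {}
    }

    /** Returns the names of methods annotated with @Indexed. */
    public static List<String> extractIndexableMethodNames(Class<?> clazz) {
        List<String> names = new ArrayList<String>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Indexed.class)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(extractIndexableMethodNames(StoreCatalog.class));
    }
}
```

In a real analyzer the class would first be loaded from the .class bytes in the contribution; the extraction step itself would stay the same.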
>
> This is really what I had in mind, do something that extracts only the
> relevant information, because search is also about good results; it is not
> as simple as just finding them, otherwise Google would not be so famous and
> you probably would never be applying for GSoC : )... I think we should also
> implement an analyzer for compressed files, there are many jars on a
> domain, we cannot just ignore them.
>
> Good idea, so we could browse compressed files like browsing a folder. I
> will also add it to the proposal.
>
> Now, about the "searching" section of your proposal, it's fine, I think
> Lucene already gives us a good query parser for user input. It's a good
> idea to implement everything as an SCA component, and one of the services
> it could provide is to search not only using a query text, but also
> accepting Lucene query objects as input. An app using the search component
> could have a very user-friendly interface where the user could check many
> checkboxes and other high-level GUI components to refine a query; in these
> cases, when the app executes the search it would probably generate the
> Lucene objects directly instead of creating a query string.
>
> OK, I think it's going to be easy, the query text is converted to Lucene
> query objects anyway; the only thing this new functionality needs to do is
> skip the parsing step and execute the query objects directly against the
> index : )
>
> Hey, this is a good way to display a result, because in the results you
> can already see the artifacts' relationships. Maybe we could work on
> expanding the result tree down to files inside compressed files or methods
> inside class files. I think this display model could be extended not only
> for displaying results, but also to display every artifact in the domain
> manager web app.
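A minimal sketch of the two search entry points discussed in the thread: one takes a raw query string and parses it, the other accepts already-built query objects so GUI clients can skip the parsing step. The Query class and the matching logic here are placeholders standing in for Lucene's real Query/QueryParser types and an index search, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the SCA search service idea with two entry points. */
class SearchServiceSketch {

    /** Placeholder standing in for org.apache.lucene.search.Query. */
    public static class Query {
        final String term;
        public Query(String term) { this.term = term; }
    }

    public interface SearchService {
        List<String> search(String queryText);   // parse, then execute
        List<String> search(Query query);        // execute directly
    }

    /** Trivial in-memory implementation over a list of artifact names. */
    public static class InMemorySearchService implements SearchService {
        private final List<String> artifacts;

        public InMemorySearchService(List<String> artifacts) {
            this.artifacts = artifacts;
        }

        public List<String> search(String queryText) {
            // Stand-in for Lucene's QueryParser: normalize, then delegate
            // to the query-object entry point.
            return search(new Query(queryText.trim().toLowerCase()));
        }

        public List<String> search(Query query) {
            // Stand-in for running the query against the Lucene index.
            List<String> hits = new ArrayList<String>();
            for (String a : artifacts) {
                if (a.toLowerCase().contains(query.term)) {
                    hits.add(a);
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        List<String> artifacts = new ArrayList<String>();
        artifacts.add("store.composite");
        artifacts.add("StoreCatalog.class");
        SearchService service = new InMemorySearchService(artifacts);
        System.out.println(service.search("catalog"));
    }
}
```

The point of the overload is exactly what the thread describes: a checkbox-driven UI builds Query objects directly and calls the second method, bypassing the string parser entirely.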
>
> That's the idea, to expand down to every artifact we can parse and index
> : )
>
> I think you might want to double the "Implementing text and file analyzer
> for indexing" phase time.
>
> Agreed, I will do that ; )
>
> Regards,
> Phillipe Ramalho
>
>
> On Wed, Apr 1, 2009 at 1:27 AM, Adriano Crestani
> <[email protected]> wrote:
>
>> Hi Phillipe,
>>
>> Very good and detailed proposal : )
>>
>> In addition, with every artifact the indexed artifact is related to,
>> extra information can be added using a Lucene feature called a payload;
>> this information could describe the relationship between the elements.
>>
>> I liked this relationship idea, have you thought about extending the
>> Lucene query parser so new syntax could be provided? We could extend it
>> and add support for something like: isreferenced("StoreCatalog") ...so
>> every component that is referenced by StoreCatalog would be returned.
>> Well, maybe we could also do this using a Lucene field, it would be much
>> faster. Anyway, there are cool features that could be built using
>> payloads, we just need to come up with some good ideas : )
>>
>> To handle different file types, file analyzers will be implemented to
>> extract the text from them. For example, a .class file is a binary file,
>> but the method names (mainly the ones annotated with SCA annotations)
>> could be extracted using the Java Reflection API. File analyzers could
>> also call other analyzers recursively; for example, a .composite file
>> could be analyzed using a CompositeAnalyzer, and when it reaches the
>> implementation.java node it could invoke a JavaClassAnalyzer, and so on.
>> This way each type of file will have only its significant text indexed;
>> otherwise, if the file is parsed using a common text file analyzer, every
>> search for "component" would find every composite file, because it
>> contains a "<component>" node declaration.
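Browsing a compressed file like a folder, as suggested in the thread, needs only java.util.zip from the JDK. In this sketch each jar entry is simply listed; a real compressed-file analyzer would then hand each entry to the analyzer for its file type (the recursive-analyzer idea above). The archive contents here are made up so the example is self-contained:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Sketch of a compressed-file analyzer that walks a jar entry by entry. */
class ZipAnalyzer {

    /** Lists all entry names in the archive, as if browsing a folder. */
    public static List<String> listEntries(File archive) throws IOException {
        List<String> names = new ArrayList<String>();
        ZipFile zip = new ZipFile(archive);
        try {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } finally {
            zip.close();
        }
        return names;
    }

    /** Builds a small throwaway archive so the sketch is runnable as-is. */
    static File sampleArchive() throws IOException {
        File f = File.createTempFile("contribution", ".jar");
        f.deleteOnExit();
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(f));
        out.putNextEntry(new ZipEntry("store.composite"));
        out.closeEntry();
        out.putNextEntry(new ZipEntry("services/StoreCatalog.class"));
        out.closeEntry();
        out.close();
        return f;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listEntries(sampleArchive()));
    }
}
```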
>>
>> This is really what I had in mind, do something that extracts only the
>> relevant information, because search is also about good results; it is
>> not as simple as just finding them, otherwise Google would not be so
>> famous and you probably would never be applying for GSoC : )... I think
>> we should also implement an analyzer for compressed files, there are
>> many jars on a domain, we cannot just ignore them.
>>
>> Now, about the "searching" section of your proposal, it's fine, I think
>> Lucene already gives us a good query parser for user input. It's a good
>> idea to implement everything as an SCA component, and one of the
>> services it could provide is to search not only using a query text, but
>> also accepting Lucene query objects as input. An app using the search
>> component could have a very user-friendly interface where the user could
>> check many checkboxes and other high-level GUI components to refine a
>> query; in these cases, when the app executes the search it would
>> probably generate the Lucene objects directly instead of creating a
>> query string.
>>
>> The results will be displayed using a tree layout, something like the
>> Eclipse IDE does [see image below] in its text search results, but
>> instead of a tree like project -> package -> class -> text fragment that
>> contains the searched text, it would be, for example, node ->
>> contribution -> component -> file.composite file -> text fragment that
>> contains the searched text. This is just an example; the way the results
>> are displayed can still be discussed on the community mailing list.
>>
>> Hey, this is a good way to display a result, because in the results you
>> can already see the artifacts' relationships. Maybe we could work on
>> expanding the result tree down to files inside compressed files or
>> methods inside class files. I think this display model could be extended
>> not only for displaying results, but also to display every artifact in
>> the domain manager web app.
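The tree-shaped result display described above could sit on a small model like this one: each hit is attached under a path of artifacts (node -> contribution -> component -> file -> fragment), similar to the Eclipse search results view. The artifact names and the indented rendering are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the tree-shaped search result model from the thread. */
class ResultTree {

    private final String label;
    private final List<ResultTree> children = new ArrayList<ResultTree>();

    public ResultTree(String label) { this.label = label; }

    /** Walks down the path, creating missing children along the way. */
    public void add(String... path) {
        ResultTree current = this;
        for (String label : path) {
            current = current.child(label);
        }
    }

    private ResultTree child(String label) {
        for (ResultTree c : children) {
            if (c.label.equals(label)) {
                return c;
            }
        }
        ResultTree c = new ResultTree(label);
        children.add(c);
        return c;
    }

    /** Indented rendering, one line per artifact. */
    public String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(label).append('\n');
        for (ResultTree c : children) c.render(sb, depth + 1);
    }

    public static void main(String[] args) {
        ResultTree root = new ResultTree("domain");
        root.add("nodeA", "store-contribution", "StoreCatalog",
                 "store.composite", "<component name=\"StoreCatalog\">");
        System.out.print(root.render());
    }
}
```

Because paths are merged as they are added, hits under the same node or contribution collapse into one subtree, which is what makes the artifact relationships visible at a glance; extending the tree down into jar entries or class methods is just a longer path.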
>>
>> I think you might want to double the "Implementing text and file
>> analyzer for indexing" phase time.
>>
>> +1 from me too :)
>>
>> Adriano Crestani
>>
>>
>> On Wed, Apr 1, 2009 at 12:02 AM, Phillipe Ramalho
>> <[email protected]> wrote:
>>
>>> Thanks Luciano,
>>>
>>> You might start thinking about how you are going to integrate with the
>>> runtime, possibly into the contribution processing as a new phase or a
>>> new type of processor?
>>>
>>> OK, I will investigate more about that and add some details to my
>>> proposal. I will let everyone know when I update it.
>>>
>>> Best Regards,
>>> Phillipe Ramalho
>>>
>>> On Tue, Mar 31, 2009 at 10:29 AM, Luciano Resende
>>> <[email protected]> wrote:
>>>
>>>> On Tue, Mar 31, 2009 at 1:04 AM, Phillipe Ramalho
>>>> <[email protected]> wrote:
>>>> > Hi everyone,
>>>> >
>>>> > This is my proposal for the project "Add search capability to
>>>> > index/search artifacts in the SCA domain" described at [1]. I have
>>>> > already submitted the proposal on the GSoC webpage and added it to
>>>> > the Tuscany Wiki proposals at [2].
>>>> >
>>>> > Any critique, suggestion, comment, or review will be appreciated.
>>>> >
>>>> > I think there are some points that could be improved in the
>>>> > proposal and I'm still working on that, mainly the points I say
>>>> > should be discussed with the community, so any comments about those
>>>> > will also be appreciated : )
>>>>
>>>> Looks really good, and very detailed...
>>>>
>>>> You might start thinking about how you are going to integrate with the
>>>> runtime, possibly into the contribution processing as a new phase or a
>>>> new type of processor?
>>>>
>>>> Anyway, +1 from me.
>>>>
>>>> > Thanks in advance,
>>>> > Phillipe Ramalho
>>>>
>>>> --
>>>> Luciano Resende
>>>> Apache Tuscany, Apache PhotArk
>>>> http://people.apache.org/~lresende
>>>> http://lresende.blogspot.com/
>>>
>>> --
>>> Phillipe Ramalho
>
> --
> Phillipe Ramalho

--
Phillipe Ramalho
