Hi, Can we take a decision regarding whether all parsers need to be in one place ? ( either malhar lib / contrib )
In this PR https://github.com/apache/incubator-apex-malhar/pull/137 we are moving XML and JSON parser from contrib to lib. I suggest we move CSV parsers as well . I understand Thomas's point of dependencies, but shouldn't we also think about users that use malhar that would wish to have all parsers under common bucket ? Thanks, Shubham On Fri, Dec 18, 2015 at 2:44 AM, Thomas Weise <[email protected]> wrote: > This has changed over time. Pieces that cannot be covered by unit tests > should still be kept out of library. > > In addition, we cannot keep on expanding the number of dependencies in a > single monolithic module. Hence, we are going start to create smaller > modules with their dependencies and tests setup correctly. The first of > those will be Kafka, for which a PR is currently open. > > For lib, we can add more operators as long as they don't introduce new > dependencies. When building an app and using one operator from lib, > everything else comes with it. That's a problem for the application > assembly we are trying to address. > > Thomas > > > On Thu, Dec 17, 2015 at 1:06 PM, Sandesh Hegde <[email protected]> > wrote: > > > Even my understanding was along the lines, "if an unit test can't be run > on > > a developer machine without installing dependencies ( RabbitMq/redis ec), > > it should go into contrib". > > > > Documenting the exact process as to what goes where helps. > > > > On Thu, Dec 17, 2015 at 12:16 PM Chinmay Kolhatkar < > > [email protected]> > > wrote: > > > > > Hi Isha, > > > > > > I think what Shubham meant to say is any operator which has a > dependency > > on > > > external entity (not library) should go in contrib. Others should go in > > > library. > > > > > > For eg. DB related operators needs an instance/server of respect > database > > > to be running. Without that running server, the operator cannot satisfy > > the > > > functionality expected. Hence it goes into contrib. > > > > > > I think none of the parsers will have dependency on external entity, > > hence > > > they should go in library and not contrib. > > > > > > - Chinmay. > > > On 18 Dec 2015 00:42, "Isha Arkatkar" <[email protected]> wrote: > > > > > > > Hi Shubham, > > > > > > > > I think it is somewhat subjective what goes in Contrib Vs > > Library. > > > > Here is what I understood about general guideline: > > > > An operator would go in malhar-library if it does not have > any > > > > other library dependency than what is already available. If operator > > > needs > > > > to include a dependency, it could be added to Contrib but not > library. > > > > > > > > If we add an external library dependency in Malhar-library, the > > size > > > > of the lib jar keeps on growing. So, if we refer malhar-lib in a > > project, > > > > the size of the total package would increase, even if we may not > > directly > > > > use all external libs. > > > > > > > > For this reason, in pull request #137 > > > > <https://github.com/apache/incubator-apex-malhar/pull/137> , I move > > only > > > > Json and XML parsers to malhar-library. Csv one had library reference > > to > > > > supercsv, so it is still in contrib. > > > > > > > > Please correct if I missed something. :) > > > > > > > > Thanks! > > > > Isha > > > > > > > > On Wed, Dec 16, 2015 at 11:27 PM, Shubham Pathak < > > > [email protected]> > > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > What do we mean by "additional dependency" in case of deciding > > contrib > > > vs > > > > > lib ? > > > > > As i understand, "additional dependency" is when an operator is > > > > > interacting with external technologies . For e.g kafka, MQ , HBase > > etc. > > > > > By this definition even CSV parsers need to be moved to malhar-lib. > > > > > > > > > > Thanks, > > > > > Shubham > > > > > > > > > > > > > > > On Thu, Dec 17, 2015 at 5:39 AM, Isha Arkatkar < > [email protected] > > > > > > > > wrote: > > > > > > > > > > > Hi Chandni, > > > > > > > > > > > > I have moved that as well to lib, as the parsers depended on > > that. > > > > > > > > > > > > Thanks, > > > > > > Isha > > > > > > > > > > > > On Wed, Dec 16, 2015 at 1:01 PM, Chandni Singh < > > > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > There is a converter package under com.datatorrent.contrib > which > > > has > > > > a > > > > > > > Converter API. This belongs in library as well. > > > > > > > > > > > > > > Thanks, > > > > > > > Chandni > > > > > > > > > > > > > > On Tue, Dec 15, 2015 at 1:29 PM, Chandni Singh < > > > > > [email protected]> > > > > > > > wrote: > > > > > > > > > > > > > > > Isha, > > > > > > > > > > > > > > > > Thanks for moving this. When you move these files, please > place > > > > then > > > > > > > under > > > > > > > > a package which reflects its functionality. I don't see the > > need > > > > for > > > > > > > > package called schema. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Chandni > > > > > > > > > > > > > > > > On Tue, Dec 15, 2015 at 12:31 PM, Isha Arkatkar < > > > > > [email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > >> Hi, > > > > > > > >> > > > > > > > >> For csv parser there is an additional dependency. So, I'll > > > move > > > > > only > > > > > > > >> json > > > > > > > >> and xml to new location. > > > > > > > >> > > > > > > > >> Thanks, > > > > > > > >> Isha > > > > > > > >> > > > > > > > >> On Tue, Dec 15, 2015 at 11:42 AM, Thomas Weise < > > > > > > [email protected]> > > > > > > > >> wrote: > > > > > > > >> > > > > > > > >> > As long as the operators don't introduce additional > > > dependencies > > > > > > they > > > > > > > >> > should be in lib. > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > On Tue, Dec 15, 2015 at 9:34 AM, Shubham Pathak < > > > > > > > >> [email protected]> > > > > > > > >> > wrote: > > > > > > > >> > > > > > > > > >> > > Hi Chandni, > > > > > > > >> > > > > > > > > > >> > > I had written those operators. > > > > > > > >> > > Here is the jira for that > > > > > > > >> https://malhar.atlassian.net/browse/MLHR-1838 > > > > > > > >> > > You would find the entire discussion there. > > > > > > > >> > > > > > > > > > >> > > Why are all these operator under Malhar/contrib and not > > > > > Malhar/lib > > > > > > > >> > > When i was writing the code i saw AbstractCsvParser in > > > > contriib > > > > > > and > > > > > > > >> hence > > > > > > > >> > > added there. > > > > > > > >> > > > > > > > > > >> > > Recently i got to know which operators must go in > contrib > > > and > > > > > what > > > > > > > >> must > > > > > > > >> > go > > > > > > > >> > > in lib. > > > > > > > >> > > By that definition, these operators must belong to lib. > > > > > > > >> > > > > > > > > > >> > > Thanks, > > > > > > > >> > > Shubham > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > On Tue, Dec 15, 2015 at 1:32 PM, Chandni Singh < > > > > > > > >> [email protected]> > > > > > > > >> > > wrote: > > > > > > > >> > > > > > > > > > >> > > > Hi, > > > > > > > >> > > > > > > > > > > >> > > > I just came across couple of formatter and parser > > > operators > > > > > > which > > > > > > > >> are > > > > > > > >> > > under > > > > > > > >> > > > Malhar/contrib/schema. > > > > > > > >> > > > > > > > > > > >> > > > I have couple of questions: > > > > > > > >> > > > 1. What does schema denote here? > > > > > > > >> > > > 2. Why formatter/parser which are functions are placed > > > under > > > > > > > schema > > > > > > > >> > > > package? > > > > > > > >> > > > 2. Why are all these operator under Malhar/contrib and > > not > > > > > > > >> Malhar/lib > > > > > > > >> > > > > > > > > > > >> > > > Thanks, > > > > > > > >> > > > Chandni > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
