Hi, I see HDFSFileCopyModule and HDFSFileMerger in the library as well. Since we are so close to the release and I am not sure whether these classes are specific to HDFS, I am going to mark them Evolving so that we can address this afterwards and rename them if appropriate.
Thanks,
Chandni

On Sat, May 7, 2016 at 2:17 PM, Chandni Singh wrote:

> I can help Dev.
>
> Thanks,
> Chandni

On Sat, May 7, 2016 at 1:23 PM, Amol Kekre wrote:

> We do have docs on apache.org. Would love to see a very extensive and deep doc on this topic.
>
> Should we add "How to ..." sections?
>
> @dev, thks for volunteering. Any more volunteers?
>
> Thks,
> Amol

On Sat, May 7, 2016 at 12:20 PM, Devendra Tagare wrote:

> @Thomas, @Amol I would like to contribute/collaborate on this.
>
> Will create a ticket for the same.
>
> Thanks,
> Dev

On Sat, May 7, 2016 at 11:04 AM, Thomas Weise wrote:

> The documentation is here and is indexed:
>
> http://apex.apache.org/docs/malhar/
>
> I think this is a matter of enhancing it.

On Sat, May 7, 2016 at 9:18 AM, Amol Kekre wrote:

> Thomas and I talked. Both of us agree that a white paper is due and we should get it going. A Google index clearly beats "find . | grep ..." in this day and age.
>
> The white paper would walk through and have data on HDFS, FTP, NFS, S3, maybe even example apps (could be app properties) accompanying this.
>
> So any volunteers?
>
> Thks
> Amol

On Thu, May 5, 2016 at 5:10 PM, Thomas Weise wrote:

> Do we have other projects that create dummy classes for every possible mounted file system just so that the user knows that's possible? The capability that matters here from an app perspective is the local file system, and every developer in the Hadoop ecosystem should understand that.
>
> If the operator doesn't have anything specific to NFS, then there is no place for it in the library (it would be confusing, not helpful).
>
> There should be a different approach for pre-configured operators that doesn't involve writing Java code.
>
> Thomas

On Thu, May 5, 2016 at 3:10 PM, Amol Kekre wrote:

> I am not suggesting duplicating code; extend the operators. Just add something (may not even be a function) that can be viewed as specific to a particular source. Say for NFS, it may be as simple as changing a default. A file with NFS in its name helps a great deal with adoption.
>
> Thks
> Amol

On Thu, May 5, 2016 at 11:45 AM, Chandni Singh wrote:

> IMO this is not a good idea.
>
> We are proposing to add additional Java code which is generic (works with HDFS, NFS, local FS) but calling it something specific: NFS. IMO this is much more confusing to users.
>
> If we want to make it easier for users to find out that the FS Module supports writing to NFS, then maybe we need to improve the documentation or highlight it somewhere else.
>
> Adding Java classes means more maintenance overhead, and here these classes are not doing anything additional.
>
> Thanks,
> Chandni

On Thu, May 5, 2016 at 11:24 AM, Mohit Jotwani wrote:

> +1 on Sandeep's suggestion. This would make an end user's life a lot easier!
>
> Regards,
> Mohit

On Thu, May 5, 2016 at 11:51 PM, Sandeep Deshmukh wrote:

> I do agree with Amol on having clear and explicit modules. This is more from an end user perspective. For someone who is new to Apex, having separate NFS, HDFS, FTP, etc. modules would make a lot more sense than one generic FS module. However small the changes in these modules may be, even just a couple of small functions, I would like to have them separate for the end user.
>
> It is finally about the perspective and the user experience :)
>
> Regards,
> Sandeep

On Thu, May 5, 2016 at 8:48 PM, Thomas Weise wrote:

> I don't think we should name something NFS* when it isn't specific to NFS. It is just like any other local FS for this purpose, and that's already covered by the Hadoop file system abstraction.
>
> Why can't a single FS Input module accommodate all of this? Once you know the FS URL, you can automatically optimize the configuration, if appropriate.
>
> Thanks,
> Thomas

On Thu, May 5, 2016 at 12:08 AM, Chaitanya Chebolu wrote:

> Hi Chandni,
>
> It's a good point. I created the hierarchy from the user's perspective, especially for non-Java users. If I return FileSplitter and BlockReader from FS Input Module, then this module works for NFS. But from the user's perspective, it would be difficult to tell whether this module works for NFS or any other file system.
>
> Regards,
> Chaitanya

On Thu, May 5, 2016 at 11:05 AM, Chandni Singh wrote:

> I am sorry Chaitanya, but I have more questions about this:
>
> 1. Why is the FS Input Module abstract when by default it can return FileSplitter & BlockReader in com.datatorrent.lib.io.fs? These implementations are not specific to NFS.
>
> 2. In the NFS module that you have suggested creating, what is specific to NFS?
>
> Please note: I have created a ticket APEXMALHAR-2081 to remove FSFileSplitter from the library and move its feature to the base operator.
>
> Thanks,
> Chandni

On Wed, May 4, 2016 at 10:29 PM, Chaitanya Chebolu wrote:

> FSFileSplitter & BlockReader are available in the com.datatorrent.lib.io.fs package.

On Thu, May 5, 2016 at 10:47 AM, Chandni Singh wrote:

> Ok. What is specific about the fileSplitter and blockReader returned by this implementation?

On May 4, 2016 9:43 PM, "Chaitanya Chebolu" wrote:

> Hi Chandni,
>
> Properties-wise, nothing specific. FS Input Module is an abstract Module, and NFS Module implements the abstract methods createFileSplitter() and createBlockReader().
>
> Regards,
> Chaitanya

On Wed, May 4, 2016 at 9:45 PM, Chandni Singh wrote:

> Hi Chaitanya,
>
> What will be specific in NFS Input Module that is not provided by FS Input Module?
>
> Thanks,
> Chandni

On Wed, May 4, 2016 at 7:12 AM, Amol Kekre wrote:

> +1
>
> Thks
> Amol

On Tue, May 3, 2016 at 10:06 PM, Sandeep Deshmukh wrote:

> +1
>
> Regards,
> Sandeep

On Fri, Apr 29, 2016 at 3:26 PM, Mohit Jotwani wrote:

> +1
>
> Regards,
> Mohit

On Fri, Apr 29, 2016 at 2:09 PM, Chaitanya Chebolu wrote:

> Hi All,
>
> I am proposing an NFS Input Module. The use case is to read large files from NFS in parallel.
>
> Design of NFS input module:
>
> There is a common interface "FSInputModule" in Malhar for the input Modules. The NFS input Module extends FSInputModule and can be achieved by using the FSFileSplitter and BlockReader operators.
>
> Please share your thoughts on this.
>
> Regards,
> Chaitanya
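The two designs debated in this thread can be contrasted in a small Java sketch. This is an illustration under stated assumptions, not Malhar code: FsInputModule, NfsInputModule, FileSplitter and BlockReader below are hypothetical stand-ins that only borrow names from the thread, and the block sizes and partition counts are made-up values. Option 1 follows Thomas's suggestion (one concrete module whose createFileSplitter()/createBlockReader() hooks pick defaults from the FS URL scheme); option 2 is the thin NFS-named subclass that Amol and Sandeep argued for, which only changes a default.

```java
import java.net.URI;

/**
 * Sketch only. FsInputModule, NfsInputModule, FileSplitter and BlockReader are
 * hypothetical stand-ins that borrow names from the thread; they are not the
 * Malhar classes, and the sizes and partition counts are made-up values.
 */
public class FsInputModuleSketch {

  /** Hypothetical splitter: records only the block size it would use. */
  static final class FileSplitter {
    final long blockSize;
    FileSplitter(long blockSize) { this.blockSize = blockSize; }
  }

  /** Hypothetical reader: records only how many parallel partitions it would use. */
  static final class BlockReader {
    final int partitions;
    BlockReader(int partitions) { this.partitions = partitions; }
  }

  /** Option 1: one concrete module for every file system, tuned by URL scheme. */
  static class FsInputModule {
    private final String scheme;

    FsInputModule(String files) {
      String s = URI.create(files).getScheme();
      // A bare path such as /mnt/nfs/data has no scheme; an NFS mount is just a local path.
      this.scheme = (s == null) ? "file" : s.toLowerCase();
    }

    // Factory hooks stay overridable but are not abstract: the defaults
    // already cover HDFS, local disk and NFS mounts.
    protected FileSplitter createFileSplitter() {
      return new FileSplitter(defaultBlockSizeBytes());
    }

    protected BlockReader createBlockReader() {
      return new BlockReader(defaultReaderPartitions());
    }

    long defaultBlockSizeBytes() {
      switch (scheme) {
        case "hdfs": return 128L * 1024 * 1024;
        case "file": return 64L * 1024 * 1024;
        default:     return 32L * 1024 * 1024;
      }
    }

    int defaultReaderPartitions() {
      return "hdfs".equals(scheme) ? 8 : 4;
    }
  }

  /** Option 2: a thin NFS-named subclass that only changes a default. */
  static class NfsInputModule extends FsInputModule {
    NfsInputModule(String files) { super(files); }

    @Override
    int defaultReaderPartitions() { return 2; }
  }

  public static void main(String[] args) {
    FsInputModule[] modules = {
      new FsInputModule("hdfs://namenode:8020/data"),
      new FsInputModule("/mnt/nfs/data"),
      new NfsInputModule("/mnt/nfs/data"),
    };
    for (FsInputModule m : modules) {
      System.out.println(m.getClass().getSimpleName()
          + " blockSize=" + m.createFileSplitter().blockSize
          + " partitions=" + m.createBlockReader().partitions);
    }
  }
}
```

Both options share the same split-and-read path; the only difference is whether the NFS-specific tuning lives in a named subclass or in a scheme check. Note that the scheme check cannot tell an NFS mount from local disk, since both are plain `file` paths, which is the point that NFS is "just like any other local FS" for this purpose.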
