Re: [Discuss] Storm hdfs spout improvements
Sure will do that. Get Outlook for Android<https://aka.ms/ghei36> From: Arun Iyer <ai...@hortonworks.com> on behalf of Arun Mahadevan <ar...@apache.org> Sent: Wednesday, February 15, 2017 10:35:43 AM To: dev@storm.apache.org Cc: Anudeep Kumar; Bobby Evans Subject: Re: [Discuss] Storm hdfs spout improvements Can you please raise a pull request with your proposal? That way it will be easier to review and comment. Thanks, Arun On 2/15/17, 9:04 AM, "Sachin Pasalkar" <sachin_pasal...@symantec.com> wrote: >Can any one take a look at this? I have attached my code in JIRA. > >On 14/02/17, 7:38 AM, "Sachin Pasalkar" <sachin_pasal...@symantec.com> >wrote: > >>I have created JIRA for this >>https://issues.apache.org/jira/browse/STORM-2358. >>For point 1: >> >>Its specific use case just to support why it needs to be public >> >>For point 2: >>We are limiting code to be very specific to these 2 implementations we >>should have generic implementation. I see there is another check-in >>happening for ZippedTextFileReader.I have attached my code changes in >>JIRA, please take a look, where you need to provide class. >> >>For point 3: >> >>Lets assume I have multiple topologies with different readers. So I >>defined the a base topology class with HDFSSpout in it. Now I always needs >>to pass the outputFields as separate array. This actually can be part of >>every reader class as its very specific to it. >> >>Thanks, >>Sachin >> >>On 14/02/17, 4:52 AM, "Roshan Naik" <ros...@hortonworks.com> wrote: >> >>> >>> >>>On 2/13/17, 12:14 PM, "Sachin Pasalkar" <sachin_pasal...@symantec.com> >>>wrote: >>> >>>>I have attached updated source code of HDFSSpout for more reference. I >>>>have updated respective classes (not attached) >>> >>> >>>Don¹t see any attachment. Answers are below. Better to do this discussion >>>on a JIRA. >>> >>> >>>On 2/13/17, 8:32 AM, "Sachin Pasalkar" <sachin_pasal...@symantec.com> >>>wrote: >>> >>>>Hi, >>>> >>>>I was looking at storm hdfs spout code in 1.x branch, I found below >>>>improvements can be made in below code. >>>> >>>> 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so >>>>that it can be used in generics. >>> >>>Java generics and making a class public are unrelated to my knowledge. >>>But >>>making it public sounds ok to me if its useful for "user defined² readers >>>Š although it doesn¹t really have that much going on in it. For future >>>built-in reader types it is immaterial as they can derive from it anyway >>>just like the existing ones. HdfsSpout class itself doesn¹t care about >>>the >>>ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface. >>> >>> >>> >>>> 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as >>>>String. It will be great to have class >>>>readerType; So we will not use Class.forName at multiple places also it >>>>will help in below point. >>> >>>The reason it is a string, is that, for built-in readers, we wanted to >>>support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN.. >>> >>> >>>> 3. HdfsSpout also needs to provide outFields which are declared as >>>>constants in each reader(e.g.SequenceFileReader). We can have abstract >>>>API AbstractFileReader in which return them to user to make it generic. >>> >>> >>>These consts can¹t go into the AbstractFileReader as they are reader >>>specific. >>> >>>They are there just for convenience. Users can call withOutputFields() >>>on >>>the spout and set it to these predefined names or anything else. >>> >>> >>>-Roshan >>> >> >
Re: [Discuss] Storm hdfs spout improvements
Can you please raise a pull request with your proposal? That way it will be easier to review and comment. Thanks, Arun On 2/15/17, 9:04 AM, "Sachin Pasalkar"wrote: >Can any one take a look at this? I have attached my code in JIRA. > >On 14/02/17, 7:38 AM, "Sachin Pasalkar" >wrote: > >>I have created JIRA for this >>https://issues.apache.org/jira/browse/STORM-2358. >>For point 1: >> >>Its specific use case just to support why it needs to be public >> >>For point 2: >>We are limiting code to be very specific to these 2 implementations we >>should have generic implementation. I see there is another check-in >>happening for ZippedTextFileReader.I have attached my code changes in >>JIRA, please take a look, where you need to provide class. >> >>For point 3: >> >>Lets assume I have multiple topologies with different readers. So I >>defined the a base topology class with HDFSSpout in it. Now I always needs >>to pass the outputFields as separate array. This actually can be part of >>every reader class as its very specific to it. >> >>Thanks, >>Sachin >> >>On 14/02/17, 4:52 AM, "Roshan Naik" wrote: >> >>> >>> >>>On 2/13/17, 12:14 PM, "Sachin Pasalkar" >>>wrote: >>> I have attached updated source code of HDFSSpout for more reference. I have updated respective classes (not attached) >>> >>> >>>Don¹t see any attachment. Answers are below. Better to do this discussion >>>on a JIRA. >>> >>> >>>On 2/13/17, 8:32 AM, "Sachin Pasalkar" >>>wrote: >>> Hi, I was looking at storm hdfs spout code in 1.x branch, I found below improvements can be made in below code. 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so that it can be used in generics. >>> >>>Java generics and making a class public are unrelated to my knowledge. >>>But >>>making it public sounds ok to me if its useful for "user defined² readers >>>Š although it doesn¹t really have that much going on in it. For future >>>built-in reader types it is immaterial as they can derive from it anyway >>>just like the existing ones. HdfsSpout class itself doesn¹t care about >>>the >>>ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface. >>> >>> >>> 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as String. It will be great to have class readerType; So we will not use Class.forName at multiple places also it will help in below point. >>> >>>The reason it is a string, is that, for built-in readers, we wanted to >>>support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN.. >>> >>> 3. HdfsSpout also needs to provide outFields which are declared as constants in each reader(e.g.SequenceFileReader). We can have abstract API AbstractFileReader in which return them to user to make it generic. >>> >>> >>>These consts can¹t go into the AbstractFileReader as they are reader >>>specific. >>> >>>They are there just for convenience. Users can call withOutputFields() >>>on >>>the spout and set it to these predefined names or anything else. >>> >>> >>>-Roshan >>> >> >
Re: [Discuss] Storm hdfs spout improvements
Can any one take a look at this? I have attached my code in JIRA. On 14/02/17, 7:38 AM, "Sachin Pasalkar"wrote: >I have created JIRA for this >https://issues.apache.org/jira/browse/STORM-2358. >For point 1: > >Its specific use case just to support why it needs to be public > >For point 2: >We are limiting code to be very specific to these 2 implementations we >should have generic implementation. I see there is another check-in >happening for ZippedTextFileReader.I have attached my code changes in >JIRA, please take a look, where you need to provide class. > >For point 3: > >Lets assume I have multiple topologies with different readers. So I >defined the a base topology class with HDFSSpout in it. Now I always needs >to pass the outputFields as separate array. This actually can be part of >every reader class as its very specific to it. > >Thanks, >Sachin > >On 14/02/17, 4:52 AM, "Roshan Naik" wrote: > >> >> >>On 2/13/17, 12:14 PM, "Sachin Pasalkar" >>wrote: >> >>>I have attached updated source code of HDFSSpout for more reference. I >>>have updated respective classes (not attached) >> >> >>Don¹t see any attachment. Answers are below. Better to do this discussion >>on a JIRA. >> >> >>On 2/13/17, 8:32 AM, "Sachin Pasalkar" >>wrote: >> >>>Hi, >>> >>>I was looking at storm hdfs spout code in 1.x branch, I found below >>>improvements can be made in below code. >>> >>> 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so >>>that it can be used in generics. >> >>Java generics and making a class public are unrelated to my knowledge. >>But >>making it public sounds ok to me if its useful for "user defined² readers >>Š although it doesn¹t really have that much going on in it. For future >>built-in reader types it is immaterial as they can derive from it anyway >>just like the existing ones. HdfsSpout class itself doesn¹t care about >>the >>ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface. >> >> >> >>> 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as >>>String. It will be great to have class >>>readerType; So we will not use Class.forName at multiple places also it >>>will help in below point. >> >>The reason it is a string, is that, for built-in readers, we wanted to >>support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN.. >> >> >>> 3. HdfsSpout also needs to provide outFields which are declared as >>>constants in each reader(e.g.SequenceFileReader). We can have abstract >>>API AbstractFileReader in which return them to user to make it generic. >> >> >>These consts can¹t go into the AbstractFileReader as they are reader >>specific. >> >>They are there just for convenience. Users can call withOutputFields() >>on >>the spout and set it to these predefined names or anything else. >> >> >>-Roshan >> >
Re: [Discuss] Storm hdfs spout improvements
I have created JIRA for this https://issues.apache.org/jira/browse/STORM-2358. For point 1: Its specific use case just to support why it needs to be public For point 2: We are limiting code to be very specific to these 2 implementations we should have generic implementation. I see there is another check-in happening for ZippedTextFileReader.I have attached my code changes in JIRA, please take a look, where you need to provide class. For point 3: Lets assume I have multiple topologies with different readers. So I defined the a base topology class with HDFSSpout in it. Now I always needs to pass the outputFields as separate array. This actually can be part of every reader class as its very specific to it. Thanks, Sachin On 14/02/17, 4:52 AM, "Roshan Naik"wrote: > > >On 2/13/17, 12:14 PM, "Sachin Pasalkar" >wrote: > >>I have attached updated source code of HDFSSpout for more reference. I >>have updated respective classes (not attached) > > >Don¹t see any attachment. Answers are below. Better to do this discussion >on a JIRA. > > >On 2/13/17, 8:32 AM, "Sachin Pasalkar" >wrote: > >>Hi, >> >>I was looking at storm hdfs spout code in 1.x branch, I found below >>improvements can be made in below code. >> >> 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so >>that it can be used in generics. > >Java generics and making a class public are unrelated to my knowledge. But >making it public sounds ok to me if its useful for "user defined² readers >Š although it doesn¹t really have that much going on in it. For future >built-in reader types it is immaterial as they can derive from it anyway >just like the existing ones. HdfsSpout class itself doesn¹t care about the >ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface. > > > >> 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as >>String. It will be great to have class >>readerType; So we will not use Class.forName at multiple places also it >>will help in below point. > >The reason it is a string, is that, for built-in readers, we wanted to >support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN.. > > >> 3. HdfsSpout also needs to provide outFields which are declared as >>constants in each reader(e.g.SequenceFileReader). We can have abstract >>API AbstractFileReader in which return them to user to make it generic. > > >These consts can¹t go into the AbstractFileReader as they are reader >specific. > >They are there just for convenience. Users can call withOutputFields() on >the spout and set it to these predefined names or anything else. > > >-Roshan >
Re: [Discuss] Storm hdfs spout improvements
On 2/13/17, 12:14 PM, "Sachin Pasalkar"wrote: >I have attached updated source code of HDFSSpout for more reference. I >have updated respective classes (not attached) Don¹t see any attachment. Answers are below. Better to do this discussion on a JIRA. On 2/13/17, 8:32 AM, "Sachin Pasalkar" wrote: >Hi, > >I was looking at storm hdfs spout code in 1.x branch, I found below >improvements can be made in below code. > > 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so >that it can be used in generics. Java generics and making a class public are unrelated to my knowledge. But making it public sounds ok to me if its useful for "user defined² readers Š although it doesn¹t really have that much going on in it. For future built-in reader types it is immaterial as they can derive from it anyway just like the existing ones. HdfsSpout class itself doesn¹t care about the ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface. > 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as >String. It will be great to have class >readerType; So we will not use Class.forName at multiple places also it >will help in below point. The reason it is a string, is that, for built-in readers, we wanted to support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN.. > 3. HdfsSpout also needs to provide outFields which are declared as >constants in each reader(e.g.SequenceFileReader). We can have abstract >API AbstractFileReader in which return them to user to make it generic. These consts can¹t go into the AbstractFileReader as they are reader specific. They are there just for convenience. Users can call withOutputFields() on the spout and set it to these predefined names or anything else. -Roshan
Re: [Discuss] Storm hdfs spout improvements
I have attached updated source code of HDFSSpout for more reference. I have updated respective classes (not attached) On 13/02/17, 10:02 PM, "Sachin Pasalkar"wrote: >Hi, > >I was looking at storm hdfs spout code in 1.x branch, I found below >improvements can be made in below code. > > 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so >that it can be used in generics. > 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as >String. It will be great to have class >readerType; So we will not use Class.forName at multiple places also it >will help in below point. > 3. HdfsSpout also needs to provide outFields which are declared as >constants in each reader(e.g.SequenceFileReader). We can have abstract >API AbstractFileReader in which return them to user to make it generic. > >Let me know your thoughts on this. > >Thanks, >Sachin
[Discuss] Storm hdfs spout improvements
Hi, I was looking at storm hdfs spout code in 1.x branch, I found below improvements can be made in below code. 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so that it can be used in generics. 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as String. It will be great to have class readerType; So we will not use Class.forName at multiple places also it will help in below point. 3. HdfsSpout also needs to provide outFields which are declared as constants in each reader(e.g.SequenceFileReader). We can have abstract API AbstractFileReader in which return them to user to make it generic. Let me know your thoughts on this. Thanks, Sachin