Re: [Discuss] Storm hdfs spout improvements

2017-02-15 Thread Sachin Pasalkar
Sure will do that.

Get Outlook for Android<https://aka.ms/ghei36>



From: Arun Iyer <ai...@hortonworks.com> on behalf of Arun Mahadevan 
<ar...@apache.org>
Sent: Wednesday, February 15, 2017 10:35:43 AM
To: dev@storm.apache.org
Cc: Anudeep Kumar; Bobby Evans
Subject: Re: [Discuss] Storm hdfs spout improvements

Can you please raise a pull request with your proposal? That way it will be 
easier to review and comment.

Thanks,
Arun


On 2/15/17, 9:04 AM, "Sachin Pasalkar" <sachin_pasal...@symantec.com> wrote:

>Can any one take a look at this? I have attached my code in JIRA.
>
>On 14/02/17, 7:38 AM, "Sachin Pasalkar" <sachin_pasal...@symantec.com>
>wrote:
>
>>I have created JIRA for this
>>https://issues.apache.org/jira/browse/STORM-2358.
>>For point 1:
>>
>>Its specific use case just to support why it needs to be public
>>
>>For point 2:
>>We are limiting code to be very specific to these 2 implementations we
>>should have generic implementation. I see there is another check-in
>>happening for ZippedTextFileReader.I have attached my code changes in
>>JIRA, please take a look, where you need to provide class.
>>
>>For point 3:
>>
>>Lets assume I have multiple topologies with different readers. So I
>>defined the a base topology class with HDFSSpout in it. Now I always needs
>>to pass the outputFields as separate array. This actually can be part of
>>every reader class as its very specific to it.
>>
>>Thanks,
>>Sachin
>>
>>On 14/02/17, 4:52 AM, "Roshan Naik" <ros...@hortonworks.com> wrote:
>>
>>>
>>>
>>>On 2/13/17, 12:14 PM, "Sachin Pasalkar" <sachin_pasal...@symantec.com>
>>>wrote:
>>>
>>>>I have attached updated source code of HDFSSpout for more reference. I
>>>>have updated respective classes (not attached)
>>>
>>>
>>>Don¹t see any attachment. Answers are below. Better to do this discussion
>>>on a JIRA.
>>>
>>>
>>>On 2/13/17, 8:32 AM, "Sachin Pasalkar" <sachin_pasal...@symantec.com>
>>>wrote:
>>>
>>>>Hi,
>>>>
>>>>I was looking at storm hdfs spout code in 1.x branch, I found below
>>>>improvements can be made in below code.
>>>>
>>>>  1.  Make org.apache.storm.hdfs.spout.AbstractFileReader as public so
>>>>that it can be used in generics.
>>>
>>>Java generics and making a class public are unrelated to my knowledge.
>>>But
>>>making it public sounds ok to me if its useful for "user defined² readers
>>>Š although it doesn¹t really have that much going on in it. For future
>>>built-in reader types it is immaterial as they can derive from it anyway
>>>just like the existing ones. HdfsSpout class itself doesn¹t care about
>>>the
>>>ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface.
>>>
>>>
>>>
>>>>  2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as
>>>>String. It will be great to have class
>>>>readerType; So we will not use Class.forName at multiple places also it
>>>>will help in below point.
>>>
>>>The reason it is a string, is that, for built-in readers,  we wanted to
>>>support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN..
>>>
>>>
>>>>  3.  HdfsSpout also needs to provide outFields which are declared as
>>>>constants in each reader(e.g.SequenceFileReader). We can have abstract
>>>>API AbstractFileReader in which return them to user to make it generic.
>>>
>>>
>>>These consts can¹t go into the AbstractFileReader as they are reader
>>>specific.
>>>
>>>They are there just for convenience.  Users can call withOutputFields()
>>>on
>>>the spout and set it to these predefined names or anything else.
>>>
>>>
>>>-Roshan
>>>
>>
>



Re: [Discuss] Storm hdfs spout improvements

2017-02-14 Thread Arun Mahadevan
Can you please raise a pull request with your proposal? That way it will be 
easier to review and comment.

Thanks,
Arun


On 2/15/17, 9:04 AM, "Sachin Pasalkar"  wrote:

>Can any one take a look at this? I have attached my code in JIRA.
>
>On 14/02/17, 7:38 AM, "Sachin Pasalkar" 
>wrote:
>
>>I have created JIRA for this
>>https://issues.apache.org/jira/browse/STORM-2358.
>>For point 1:
>>
>>Its specific use case just to support why it needs to be public
>>
>>For point 2:
>>We are limiting code to be very specific to these 2 implementations we
>>should have generic implementation. I see there is another check-in
>>happening for ZippedTextFileReader.I have attached my code changes in
>>JIRA, please take a look, where you need to provide class.
>>
>>For point 3:
>>
>>Lets assume I have multiple topologies with different readers. So I
>>defined the a base topology class with HDFSSpout in it. Now I always needs
>>to pass the outputFields as separate array. This actually can be part of
>>every reader class as its very specific to it.
>>
>>Thanks,
>>Sachin
>>
>>On 14/02/17, 4:52 AM, "Roshan Naik"  wrote:
>>
>>>
>>>
>>>On 2/13/17, 12:14 PM, "Sachin Pasalkar" 
>>>wrote:
>>>
I have attached updated source code of HDFSSpout for more reference. I
have updated respective classes (not attached)
>>>
>>>
>>>Don¹t see any attachment. Answers are below. Better to do this discussion
>>>on a JIRA.
>>>
>>>
>>>On 2/13/17, 8:32 AM, "Sachin Pasalkar" 
>>>wrote:
>>>
Hi,

I was looking at storm hdfs spout code in 1.x branch, I found below
improvements can be made in below code.

  1.  Make org.apache.storm.hdfs.spout.AbstractFileReader as public so
that it can be used in generics.
>>>
>>>Java generics and making a class public are unrelated to my knowledge.
>>>But
>>>making it public sounds ok to me if its useful for "user defined² readers
>>>Š although it doesn¹t really have that much going on in it. For future
>>>built-in reader types it is immaterial as they can derive from it anyway
>>>just like the existing ones. HdfsSpout class itself doesn¹t care about
>>>the
>>>ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface.
>>>
>>>
>>>
  2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as
String. It will be great to have class
readerType; So we will not use Class.forName at multiple places also it
will help in below point.
>>>
>>>The reason it is a string, is that, for built-in readers,  we wanted to
>>>support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN..
>>>
>>>
  3.  HdfsSpout also needs to provide outFields which are declared as
constants in each reader(e.g.SequenceFileReader). We can have abstract
API AbstractFileReader in which return them to user to make it generic.
>>>
>>>
>>>These consts can¹t go into the AbstractFileReader as they are reader
>>>specific.
>>>
>>>They are there just for convenience.  Users can call withOutputFields()
>>>on
>>>the spout and set it to these predefined names or anything else.
>>>
>>>
>>>-Roshan
>>>
>>
>




Re: [Discuss] Storm hdfs spout improvements

2017-02-14 Thread Sachin Pasalkar
Can any one take a look at this? I have attached my code in JIRA.

On 14/02/17, 7:38 AM, "Sachin Pasalkar" 
wrote:

>I have created JIRA for this
>https://issues.apache.org/jira/browse/STORM-2358.
>For point 1:
>
>Its specific use case just to support why it needs to be public
>
>For point 2:
>We are limiting code to be very specific to these 2 implementations we
>should have generic implementation. I see there is another check-in
>happening for ZippedTextFileReader.I have attached my code changes in
>JIRA, please take a look, where you need to provide class.
>
>For point 3:
>
>Lets assume I have multiple topologies with different readers. So I
>defined the a base topology class with HDFSSpout in it. Now I always needs
>to pass the outputFields as separate array. This actually can be part of
>every reader class as its very specific to it.
>
>Thanks,
>Sachin
>
>On 14/02/17, 4:52 AM, "Roshan Naik"  wrote:
>
>>
>>
>>On 2/13/17, 12:14 PM, "Sachin Pasalkar" 
>>wrote:
>>
>>>I have attached updated source code of HDFSSpout for more reference. I
>>>have updated respective classes (not attached)
>>
>>
>>Don¹t see any attachment. Answers are below. Better to do this discussion
>>on a JIRA.
>>
>>
>>On 2/13/17, 8:32 AM, "Sachin Pasalkar" 
>>wrote:
>>
>>>Hi,
>>>
>>>I was looking at storm hdfs spout code in 1.x branch, I found below
>>>improvements can be made in below code.
>>>
>>>  1.  Make org.apache.storm.hdfs.spout.AbstractFileReader as public so
>>>that it can be used in generics.
>>
>>Java generics and making a class public are unrelated to my knowledge.
>>But
>>making it public sounds ok to me if its useful for "user defined² readers
>>Š although it doesn¹t really have that much going on in it. For future
>>built-in reader types it is immaterial as they can derive from it anyway
>>just like the existing ones. HdfsSpout class itself doesn¹t care about
>>the
>>ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface.
>>
>>
>>
>>>  2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as
>>>String. It will be great to have class
>>>readerType; So we will not use Class.forName at multiple places also it
>>>will help in below point.
>>
>>The reason it is a string, is that, for built-in readers,  we wanted to
>>support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN..
>>
>>
>>>  3.  HdfsSpout also needs to provide outFields which are declared as
>>>constants in each reader(e.g.SequenceFileReader). We can have abstract
>>>API AbstractFileReader in which return them to user to make it generic.
>>
>>
>>These consts can¹t go into the AbstractFileReader as they are reader
>>specific.
>>
>>They are there just for convenience.  Users can call withOutputFields()
>>on
>>the spout and set it to these predefined names or anything else.
>>
>>
>>-Roshan
>>
>



Re: [Discuss] Storm hdfs spout improvements

2017-02-13 Thread Sachin Pasalkar
I have created JIRA for this
https://issues.apache.org/jira/browse/STORM-2358.
For point 1:

Its specific use case just to support why it needs to be public

For point 2:
We are limiting code to be very specific to these 2 implementations we
should have generic implementation. I see there is another check-in
happening for ZippedTextFileReader.I have attached my code changes in
JIRA, please take a look, where you need to provide class.

For point 3:

Lets assume I have multiple topologies with different readers. So I
defined the a base topology class with HDFSSpout in it. Now I always needs
to pass the outputFields as separate array. This actually can be part of
every reader class as its very specific to it.

Thanks,
Sachin

On 14/02/17, 4:52 AM, "Roshan Naik"  wrote:

>
>
>On 2/13/17, 12:14 PM, "Sachin Pasalkar" 
>wrote:
>
>>I have attached updated source code of HDFSSpout for more reference. I
>>have updated respective classes (not attached)
>
>
>Don¹t see any attachment. Answers are below. Better to do this discussion
>on a JIRA.
>
>
>On 2/13/17, 8:32 AM, "Sachin Pasalkar" 
>wrote:
>
>>Hi,
>>
>>I was looking at storm hdfs spout code in 1.x branch, I found below
>>improvements can be made in below code.
>>
>>  1.  Make org.apache.storm.hdfs.spout.AbstractFileReader as public so
>>that it can be used in generics.
>
>Java generics and making a class public are unrelated to my knowledge. But
>making it public sounds ok to me if its useful for "user defined² readers
>Š although it doesn¹t really have that much going on in it. For future
>built-in reader types it is immaterial as they can derive from it anyway
>just like the existing ones. HdfsSpout class itself doesn¹t care about the
>ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface.
>
>
>
>>  2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as
>>String. It will be great to have class
>>readerType; So we will not use Class.forName at multiple places also it
>>will help in below point.
>
>The reason it is a string, is that, for built-in readers,  we wanted to
>support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN..
>
>
>>  3.  HdfsSpout also needs to provide outFields which are declared as
>>constants in each reader(e.g.SequenceFileReader). We can have abstract
>>API AbstractFileReader in which return them to user to make it generic.
>
>
>These consts can¹t go into the AbstractFileReader as they are reader
>specific.
>
>They are there just for convenience.  Users can call withOutputFields() on
>the spout and set it to these predefined names or anything else.
>
>
>-Roshan
>



Re: [Discuss] Storm hdfs spout improvements

2017-02-13 Thread Roshan Naik


On 2/13/17, 12:14 PM, "Sachin Pasalkar" 
wrote:

>I have attached updated source code of HDFSSpout for more reference. I
>have updated respective classes (not attached)


Don¹t see any attachment. Answers are below. Better to do this discussion
on a JIRA.


On 2/13/17, 8:32 AM, "Sachin Pasalkar" 
wrote:

>Hi,
>
>I was looking at storm hdfs spout code in 1.x branch, I found below
>improvements can be made in below code.
>
>  1.  Make org.apache.storm.hdfs.spout.AbstractFileReader as public so
>that it can be used in generics.

Java generics and making a class public are unrelated to my knowledge. But
making it public sounds ok to me if its useful for "user defined² readers
Š although it doesn¹t really have that much going on in it. For future
built-in reader types it is immaterial as they can derive from it anyway
just like the existing ones. HdfsSpout class itself doesn¹t care about the
ŒAbstractFileReader¹ type. For that there is the ŒFileReader¹ interface.



>  2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as
>String. It will be great to have class
>readerType; So we will not use Class.forName at multiple places also it
>will help in below point.

The reason it is a string, is that, for built-in readers,  we wanted to
support Œshort aliases¹ like Œtext¹ and Œseq¹ instead of FQCN..


>  3.  HdfsSpout also needs to provide outFields which are declared as
>constants in each reader(e.g.SequenceFileReader). We can have abstract
>API AbstractFileReader in which return them to user to make it generic.


These consts can¹t go into the AbstractFileReader as they are reader
specific.

They are there just for convenience.  Users can call withOutputFields() on
the spout and set it to these predefined names or anything else.


-Roshan



Re: [Discuss] Storm hdfs spout improvements

2017-02-13 Thread Sachin Pasalkar
I have attached updated source code of HDFSSpout for more reference. I
have updated respective classes (not attached)

On 13/02/17, 10:02 PM, "Sachin Pasalkar" 
wrote:

>Hi,
>
>I was looking at storm hdfs spout code in 1.x branch, I found below
>improvements can be made in below code.
>
>  1.  Make org.apache.storm.hdfs.spout.AbstractFileReader as public so
>that it can be used in generics.
>  2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as
>String. It will be great to have class
>readerType; So we will not use Class.forName at multiple places also it
>will help in below point.
>  3.  HdfsSpout also needs to provide outFields which are declared as
>constants in each reader(e.g.SequenceFileReader). We can have abstract
>API AbstractFileReader in which return them to user to make it generic.
>
>Let me know your thoughts on this.
>
>Thanks,
>Sachin



[Discuss] Storm hdfs spout improvements

2017-02-13 Thread Sachin Pasalkar
Hi,

I was looking at storm hdfs spout code in 1.x branch, I found below 
improvements can be made in below code.

  1.  Make org.apache.storm.hdfs.spout.AbstractFileReader as public so that it 
can be used in generics.
  2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as String. It 
will be great to have class readerType; So we 
will not use Class.forName at multiple places also it will help in below point.
  3.  HdfsSpout also needs to provide outFields which are declared as constants 
in each reader(e.g.SequenceFileReader). We can have abstract API 
AbstractFileReader in which return them to user to make it generic.

Let me know your thoughts on this.

Thanks,
Sachin