Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-22 Thread Zhiliang Zhu
Dear Sujit,
Since you are experienced with Spark, I wonder whether it would be convenient 
for you to comment on my dilemma in using Spark for an R-background application ...
Thank you very much!
Zhiliang
 


Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Romi Kuntsman
An RDD is a set of data rows (in your case, numbers); there is no inherent
meaning to the order of the items.
What exactly are you trying to accomplish?

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com


How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
Dear all,

I have spent many days thinking about this issue, without any success...
I would appreciate your kind help.
There is an RDD rdd1; I would like to get a new RDD rdd2 in which each row is
rdd2[ i ] = rdd1[ i ] - rdd1[ i - 1 ].
What kind of API or function would I use...

Thanks very much!
John
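A plain-Python sketch (hypothetical numbers, a local list standing in for the RDD) of the requested transformation, just to pin down the semantics:

```python
rdd1 = [10, 12, 15, 11]  # hypothetical rows of rdd1

# each row minus the previous row: rdd1[i] - rdd1[i-1]
rdd2 = [b - a for a, b in zip(rdd1, rdd1[1:])]
print(rdd2)  # [2, 3, -4]
```

Note the result has one element fewer than the input, since the first row has no predecessor.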


Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
Hi Romi,
Thanks very much for your kind comment~~
In fact there is a valid background for the application; it is about R data analysis:

# fund_nav_daily is an M x N (or M x 1) matrix or data.frame; each column is a
# fund's daily value, each row is a daily date.
# fund_return_daily computes each fund's daily return as the difference from
# the previous day's value.
fund_return_daily <- diff(log(fund_nav_daily))

# The first row should be all 0, since there is no previous row before the first.
fund_return_daily <- rbind(matrix(0, ncol = ncol(fund_return_daily)), fund_return_daily)
...

I need to code exactly this R program by way of Spark, with RDD/DataFrame used to
replace the R data.frame. However, I have just found it VERY difficult to make a
Spark program flexibly describe and transform R-background applications.
I think I have taken on serious risk with this...
Would you help direct me on the above coding issue... and on the risk of
practicing Spark for R applications...
I must show all my sincere thanks for your kind help.

P.S. With SparkR in Spark 1.4.1, there are many bugs in the
createDataFrame/except/unionAll APIs, and it seems that the Spark Java API has
more functions than SparkR. Also, no R-specific regression algorithm is included
in SparkR.

Best Regards,
Zhiliang
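To make the R snippet's semantics concrete, here is the same computation in plain Python for a single fund (the NAV values are hypothetical):

```python
import math

fund_nav_daily = [100.0, 102.0, 101.0, 105.0]  # hypothetical daily NAVs, one fund

# diff(log(nav)): each day's log return relative to the previous day
log_nav = [math.log(v) for v in fund_nav_daily]
returns = [b - a for a, b in zip(log_nav, log_nav[1:])]

# rbind(matrix(0, ...), ...): pad a leading 0 for the first day
fund_return_daily = [0.0] + returns

print(len(fund_return_daily) == len(fund_nav_daily))  # True
```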



Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Sujit Pal
Hi Zhiliang,

Would something like this work? (sliding comes from MLlib's RDDFunctions, via
the implicit conversion)

import org.apache.spark.mllib.rdd.RDDFunctions._
val rdd2 = rdd1.sliding(2).map(v => v(1) - v(0))

-sujit
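The semantics of sliding(2) plus the map, sketched locally in plain Python (the real call runs on a distributed RDD; this just shows what each window contains):

```python
def sliding(xs, n):
    # all contiguous windows of length n, like MLlib's sliding(n)
    return [xs[i:i + n] for i in range(len(xs) - n + 1)]

rdd1 = [3, 7, 8, 20]  # hypothetical rows
rdd2 = [v[1] - v[0] for v in sliding(rdd1, 2)]
print(rdd2)  # [4, 1, 12]
```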




Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
Hi Romi,
I must show my sincere appreciation for your kind and helpful replies.
One more question: currently I am using Spark for financial data analysis, so
lots of operations on R data.frame/matrix and stat/regression functions are
always called. However, SparkR is currently not that strong; most of its
functions come from Spark SQL and MLlib. SQL and DataFrame are not as flexible
and easy as R operating on data.frame/matrix; moreover, I have not yet decided
how much of MLlib can be used for the R-specific stat/regression work.
I have also thought of operating on the data purely through the Spark Java API,
but it is quite hard to make it act like data.frame/matrix from R. I think I
have taken on risk with all of this.
Would you comment on my points...
Thank you very much!
Zhiliang


 



Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
Hi Sujit,
I appreciate your kind help very much~
It seems to be OK; however, do you know the corresponding Spark Java API for
this... Is there any Java API like Scala's sliding? Also, I did not manage to
find the Spark Scala doc about sliding ...
Thank you very much~
Zhiliang



Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
Hi Sujit,
Thanks very much for your kind help. I have found the sliding doc in both the
Scala and Java Spark API; it comes from MLlib's RDDFunctions, though the doc
never has quite enough examples.

Best Regards,
Zhiliang

 



Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Sujit Pal
Hi Zhiliang,

Haven't used the Java API but found this Javadoc page, may be helpful to
you.

https://spark.apache.org/docs/1.3.1/api/java/org/apache/spark/mllib/rdd/RDDFunctions.html

I think the equivalent Java code snippet might go something like this:

RDDFunctions.fromRDD(rdd1, ClassTag$.MODULE$.apply(Double.class)).sliding(2)

(the second parameter of fromRDD, the ClassTag, comes from this discussion thread:)
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-construct-a-ClassTag-object-as-a-method-parameter-in-Java-td6768.html

There is also the SlidingRDD decorator:
https://spark.apache.org/docs/1.3.1/api/java/org/apache/spark/mllib/rdd/SlidingRDD.html

So maybe something like this:

new SlidingRDD<Double>(rdd1, 2, ClassTag$.MODULE$.apply(Double.class))

-sujit
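For intuition about what SlidingRDD has to do across partition boundaries, here is a plain-Python sketch (not the actual implementation): each partition borrows the first window-1 elements of the following partitions, so no window is lost at a boundary:

```python
def sliding_over_partitions(partitions, window):
    # Extend each partition with up to (window - 1) elements taken from the
    # following partitions, then emit every full window starting inside the
    # original partition; a sketch of the idea behind MLlib's SlidingRDD.
    result = []
    n = len(partitions)
    for i, part in enumerate(partitions):
        tail = []
        j = i + 1
        while len(tail) < window - 1 and j < n:
            need = window - 1 - len(tail)
            tail.extend(partitions[j][:need])
            j += 1
        ext = part + tail
        for k in range(len(part)):
            if k + window <= len(ext):
                result.append(ext[k:k + window])
    return result

parts = [[1, 2, 3], [4, 5], [6]]   # hypothetical RDD partitions
windows = sliding_over_partitions(parts, 2)
print(windows)  # [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
diffs = [b - a for a, b in windows]
print(diffs)    # [1, 1, 1, 1, 1]
```

Note the windows [3, 4] and [5, 6] straddle partition boundaries, which is exactly what a per-partition map alone could not produce.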
