Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Mich Talebzadeh
Apologies, it should read Jacek. Confused with my friend's name, Jared :(

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Mich Talebzadeh
Thanks Jared for your kind words. I don't think I am anywhere near there
yet :)

In general I subtract one character before getting to "CD". That is the way
debits from debit cards are marked in a bank's statement.

I get an out-of-bounds error when

select(mySubstr($"transactiondescription", lit(0), instr($"transactiondescription", "CD") - 1))

fails on the length. So I did

ll_18740868.where($"transactiontype" === "DEB" && $"transactiondescription" > " ")
  .select(mySubstr($"transactiondescription", lit(0),
    instr($"transactiondescription", "CD") - 1), $"debitamount")
  .collect.foreach(println)

which basically checks that $"transactiondescription" > " " before doing the
substring.

Now, are there better options than that? Say, make the UDF handle the error
when length($"transactiondescription") < 2, or when it is null, etc., and
return something to avoid the program crashing?
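
For instance (a sketch, not from this thread), the udf itself could guard
against null or short input, and against instr returning 0 when "CD" is not
found, and return null instead of throwing:

// sketch only; the name mySafeSubstr is made up for illustration
import org.apache.spark.sql.functions.udf

val mySafeSubstr = udf { (s: String, start: Int, end: Int) =>
  // return null when input is null/too short, or when the end position is
  // out of range (covers instr(...) == 0, i.e. "CD" not found)
  if (s == null || end <= start || end > s.length) null
  else s.substring(start, end)
}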

Thanks again for your help.









Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Jacek Laskowski
Congrats! You made it. A serious Spark dev badge unlocked :)

Regards (Pozdrawiam),
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski



Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Mich Talebzadeh
it should be lit(0) :)

rs.select(mySubstr($"transactiondescription", lit(0),
instr($"transactiondescription", "CD"))).show(1)
+--+
|UDF(transactiondescription,0,instr(transactiondescription,CD))|
+--+
|  OVERSEAS TRANSACTI C|
+--+
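
The off-by-one traces back to mixed index conventions: instr is 1-based (like
SQL INSTR), while Scala's String.substring is 0-based with an exclusive end. A
plain-Scala illustration with a made-up sample value (not thread data):

val s = "OVERSEAS TRANSACTION CD 1234"  // hypothetical description
val pos = s.indexOf("CD") + 1           // 1-based position, as instr returns: 22
s.substring(0, pos)                     // "OVERSEAS TRANSACTION C" (the lit(0) version)
s.substring(1, pos)                     // "VERSEAS TRANSACTION C": lit(1) dropped the "O"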







Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Mich Talebzadeh
No thinking on my part!!!

rs.select(mySubstr($"transactiondescription", lit(1),
instr($"transactiondescription", "CD"))).show(2)
+--+
|UDF(transactiondescription,1,instr(transactiondescription,CD))|
+--+
|   VERSEAS TRANSACTI C|
|   XYZ.COM 80...|
+--+
only showing top 2 rows

Let me test it.

Cheers







Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Mich Talebzadeh
Thanks Jacek.

It sounds like the issue is the position of the second variable in
substring().
This works

scala> val wSpec2 =
Window.partitionBy(substring($"transactiondescription",1,20))
wSpec2: org.apache.spark.sql.expressions.WindowSpec =
org.apache.spark.sql.expressions.WindowSpec@1a4eae2

Using udf as suggested

scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
 |  s.substring(start, end) }
mySubstr: org.apache.spark.sql.UserDefinedFunction =
UserDefinedFunction(<function3>,StringType,List(StringType, IntegerType,
IntegerType))


This was throwing an error:

val wSpec2 = Window.partitionBy(substring("transactiondescription",1,
indexOf("transactiondescription",'CD')-2))


So I tried using udf

scala> val wSpec2 =
Window.partitionBy($"transactiondescription".select(mySubstr('s, lit(1),
instr('s, "CD")))
 | )
<console>:28: error: value select is not a member of
org.apache.spark.sql.ColumnName
 val wSpec2 =
Window.partitionBy($"transactiondescription".select(mySubstr('s, lit(1),
instr('s, "CD")))

Obviously I am not doing this correctly :(
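
For reference, what was probably intended (a sketch, untested):
Window.partitionBy takes Column expressions, so the udf call can be passed
directly instead of calling .select on a Column:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{instr, lit, udf}

val mySubstr = udf { (s: String, start: Int, end: Int) => s.substring(start, end) }

// $-notation assumes the spark-shell implicits, as in the rest of this thread
val wSpec2 = Window.partitionBy(
  mySubstr($"transactiondescription", lit(0), instr($"transactiondescription", "CD"))
)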

cheers







Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Jacek Laskowski
Hi,

Interesting...

I'm tempted to think that the substring function should accept columns
that hold the numbers for start and end. I'd love to hear people's
thoughts on this.

For now, I'd say you need to define a udf to do the substring, as follows:

scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
s.substring(start, end) }
mySubstr: org.apache.spark.sql.expressions.UserDefinedFunction =
UserDefinedFunction(<function3>,StringType,Some(List(StringType,
IntegerType, IntegerType)))

scala> df.show
+-----------+
|          s|
+-----------+
|hello world|
+-----------+

scala> df.select(mySubstr('s, lit(1), instr('s, "ll"))).show
+-----------------------+
|UDF(s, 1, instr(s, ll))|
+-----------------------+
|                     el|
+-----------------------+
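
An alternative worth noting (an assumption, not something verified in this
thread): the SQL expression form of substring does accept expressions for its
arguments, so one can route through expr. Mind that SQL substring takes
(pos, len), both 1-based, so the semantics differ from the (start, end) udf
above:

import org.apache.spark.sql.functions.expr

// takes the first instr(s, 'll') characters: "hel" for "hello world"
df.select(expr("substring(s, 1, instr(s, 'll'))")).show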

Regards (Pozdrawiam),
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Mich Talebzadeh
Thanks Jacek,

Do I have any other way of writing this with functional programming?

select substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),


Cheers,






Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Jacek Laskowski
Hi Mich,

There's no indexOf UDF -
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
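
The closest built-ins are instr and locate, both 1-based and returning 0 when
the substring is not found (a note added for reference; df and the column
name are assumed from this thread):

import org.apache.spark.sql.functions.{col, instr, locate}

df.select(instr(col("transactiondescription"), "CD"),
  locate("CD", col("transactiondescription"))).show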


Regards (Pozdrawiam),
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





Re: The equivalent for INSTR in Spark FP

2016-08-01 Thread Mich Talebzadeh
Can any programming expert shed some light on this?

Thanks




On 1 August 2016 at 18:24, Mich Talebzadeh 
wrote:

> Hi,
>
> What is the FP equivalent of the following window/analytic query that works
> OK in Spark SQL?
>
> This one using INSTR
>
> select
> substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>
>
> select distinct *
> from (
>   select
> substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>   SUM(debitamount) OVER (PARTITION BY
> substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2))
> AS spent
>   from accounts.ll_18740868 where transactiontype = 'DEB'
>  ) tmp
>
>
> I tried indexOf but it does not work!
>
> val wSpec2 =
> Window.partitionBy(substring(col("transactiondescription"),1,indexOf(col("transactiondescription"),"CD")))
> <console>:26: error: not found: value indexOf
>  val wSpec2 =
> Window.partitionBy(substring(col("transactiondescription"),1,indexOf(col("transactiondescription"),"CD")))
>
>
> Thanks
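
Putting together the pieces that emerge later in this thread, a sketch of the
FP equivalent of the SQL above (same table and column names; the "narrative"
alias is made up, and the code is untested against the thread's data):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, instr, lit, sum, udf}

val mySubstr = udf { (s: String, start: Int, end: Int) => s.substring(start, end) }

// SQL substring(td, 1, INSTR(td, 'CD') - 2) maps to s.substring(0, instr - 2)
val narrative = mySubstr(col("transactiondescription"), lit(0),
  instr(col("transactiondescription"), "CD") - 2)

val wSpec = Window.partitionBy(narrative)

ll_18740868.where(col("transactiontype") === "DEB")
  .select(narrative.as("narrative"),
    sum(col("debitamount")).over(wSpec).as("spent"))
  .distinct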