Spark Streaming RDD Cleanup too slow

2018-09-05 Thread Prashant Sharma
I have a Spark Streaming job which takes too long to delete temp RDDs. I
collect about 4MM telemetry metrics per minute and do minor aggregations in
the Streaming job.

I am using Amazon R4 instances. The driver RPC call, although async I
believe, is slow to get a handle on the future object at the askAsync call.
Here is the Spark code which does the cleanup:
https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala#L125

Has anyone else encountered a similar issue with their Streaming jobs?
About 20% of our time (~60 secs) is spent cleaning the temp RDDs.
best,
Prashant
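
A minimal sketch of two standard knobs relevant here, assuming the temp RDDs
are the cached per-batch RDDs of the streaming job; the config key and the
non-blocking unpersist flag are standard Spark APIs, while the app name and
helper below are purely illustrative:

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

// Streaming cleans up generated RDDs automatically when
// spark.streaming.unpersist is true (the default); shown here only to make
// the setting explicit.
val conf = new SparkConf()
  .setAppName("telemetry-aggregation")   // illustrative app name
  .set("spark.streaming.unpersist", "true")

// For RDDs cached by the job itself, a non-blocking unpersist avoids waiting
// on the BlockManagerMaster round trip in the foreground.
def releaseTemp(rdd: RDD[_]): Unit = rdd.unpersist(blocking = false)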


Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Manu Zhang
Have you tried adding an Encoder for columns, as suggested by Jungtaek Lim?
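
For reference, a minimal sketch of the pattern that usually compiles cleanly,
assuming Spark 2.3 and the case class declared at the top level of the file
(not inside the method that calls toDF); the object and sample values are
illustrative:

import org.apache.spark.sql.SparkSession

// Top-level case class so the implicit Encoder (and TypeTag) can be derived.
case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Float)

object ToDfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("toDF example").getOrCreate()
    import spark.implicits._   // provides the Seq[T] -> DatasetHolder conversion

    val df = Seq(columns("34e07d9f", "SAP", "2018-09-05T23:22:34", 56.89f)).toDF()
    df.show()
  }
}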

On Thu, Sep 6, 2018 at 6:24 AM Mich Talebzadeh 
wrote:

>
> I can rebuild the comma separated list as follows:
>
>
>case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
> PRICE: Float)
> val sqlContext= new org.apache.spark.sql.SQLContext(sparkContext)
> import sqlContext.implicits._
>
>
>  for(line <- pricesRDD.collect.toArray)
>  {
>var key = line._2.split(',').view(0).toString
>var ticker =  line._2.split(',').view(1).toString
>var timeissued = line._2.split(',').view(2).toString
>var price = line._2.split(',').view(3).toFloat
>var allInOne = key+","+ticker+","+timeissued+","+price
>println(allInOne)
>
> and the print shows the columns separated by ","
>
>
> 34e07d9f-829a-446a-93ab-8b93aa8eda41,SAP,2018-09-05T23:22:34,56.89
>
> So I just need to convert that row into a DataFrame
>
> I try this conversion to DF to write to MongoDB document with 
> MongoSpark.save(df,
> writeConfig)
>
> var df = sparkContext.parallelize(Seq(columns(key, ticker, timeissued,
> price))).toDF
>
> [error]
> /data6/hduser/scala/md_streaming_mongoDB/src/main/scala/myPackage/md_streaming_mongoDB.scala:235:
> value toDF is not a member of org.apache.spark.rdd.RDD[columns]
> [error] var df = sparkContext.parallelize(Seq(columns(key,
> ticker, timeissued, price))).toDF
> [
>
>
> frustrating!
>
>  has anyone come across this?
>
> thanks
>
> On Wed, 5 Sep 2018 at 13:30, Mich Talebzadeh 
> wrote:
>
>> yep already tried it and it did not work.
>>
>> thanks
>>
>>
>> On Wed, 5 Sep 2018 at 10:10, Deepak Sharma  wrote:
>>
>>> Try this:
>>>
>>> *import **spark*.implicits._
>>>
>>> df.toDF()
>>>
>>>
>>> On Wed, Sep 5, 2018 at 2:31 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 With the following

 case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
 PRICE: Float)

  var key = line._2.split(',').view(0).toString
  var ticker =  line._2.split(',').view(1).toString
  var timeissued = line._2.split(',').view(2).toString
  var price = line._2.split(',').view(3).toFloat

   var df = Seq(columns(key, ticker, timeissued, price))
  println(df)

 I get


 List(columns(ac11a78d-82df-4b37-bf58-7e3388aa64cd,MKS,2018-09-05T10:10:15,676.5))

 So just need to convert that list to DF

 On Wed, 5 Sep 2018 at 09:49, Mich Talebzadeh 
 wrote:

> Thanks!
>
> The spark  is version 2.3.0
>

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
I can rebuild the comma-separated list as follows:


   case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
PRICE: Float)
val sqlContext= new org.apache.spark.sql.SQLContext(sparkContext)
import sqlContext.implicits._


 for(line <- pricesRDD.collect.toArray)
 {
   var key = line._2.split(',').view(0).toString
   var ticker =  line._2.split(',').view(1).toString
   var timeissued = line._2.split(',').view(2).toString
   var price = line._2.split(',').view(3).toFloat
   var allInOne = key+","+ticker+","+timeissued+","+price
   println(allInOne)

and the print shows the columns separated by ","


34e07d9f-829a-446a-93ab-8b93aa8eda41,SAP,2018-09-05T23:22:34,56.89

So I just need to convert that row into a DataFrame.

I try this conversion to a DF in order to write to a MongoDB document with
MongoSpark.save(df, writeConfig):

var df = sparkContext.parallelize(Seq(columns(key, ticker, timeissued,
price))).toDF

[error]
/data6/hduser/scala/md_streaming_mongoDB/src/main/scala/myPackage/md_streaming_mongoDB.scala:235:
value toDF is not a member of org.apache.spark.rdd.RDD[columns]
[error] var df = sparkContext.parallelize(Seq(columns(key,
ticker, timeissued, price))).toDF
[


Frustrating!

Has anyone come across this?

thanks
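
One way to sidestep the implicit conversion entirely is to build the
DataFrame through the session; a sketch, assuming a SparkSession named
spark and the case class columns declared at the top level:

// createDataFrame derives the schema from the case class by reflection,
// so no toDF implicit is needed here.
val df = spark.createDataFrame(Seq(columns(key, ticker, timeissued, price)))

// then, as in the original code:
// MongoSpark.save(df, writeConfig)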

On Wed, 5 Sep 2018 at 13:30, Mich Talebzadeh 
wrote:

> yep already tried it and it did not work.
>
> thanks
>
> On Wed, 5 Sep 2018 at 10:10, Deepak Sharma  wrote:
>
>> Try this:
>>
>> *import **spark*.implicits._
>>
>> df.toDF()
>>
>>
>> On Wed, Sep 5, 2018 at 2:31 PM Mich Talebzadeh 
>> wrote:
>>
>>> With the following
>>>
>>> case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
>>> PRICE: Float)
>>>
>>>  var key = line._2.split(',').view(0).toString
>>>  var ticker =  line._2.split(',').view(1).toString
>>>  var timeissued = line._2.split(',').view(2).toString
>>>  var price = line._2.split(',').view(3).toFloat
>>>
>>>   var df = Seq(columns(key, ticker, timeissued, price))
>>>  println(df)
>>>
>>> I get
>>>
>>>
>>> List(columns(ac11a78d-82df-4b37-bf58-7e3388aa64cd,MKS,2018-09-05T10:10:15,676.5))
>>>
>>> So just need to convert that list to DF
>>>
>>> On Wed, 5 Sep 2018 at 09:49, Mich Talebzadeh 
>>> wrote:
>>>
 Thanks!

 The spark  is version 2.3.0

 On Wed, 5 Sep 2018 at 09:41, Jungtaek Lim  wrote:

> You may also find below link useful (though it looks far old), since
> case class is the thing 

Re: deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Marcelo Vanzin
See SPARK-4160. Long story short: you need to upload the files and
jars to some shared storage (like HDFS) manually.
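
In lieu of the usual hdfs dfs -put, a programmatic sketch of that upload step
using the Hadoop FileSystem API; it assumes fs.defaultFS points at the
cluster's HDFS, and the target paths and file names are purely illustrative:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// One-off copy of the config and application jar into HDFS, after which the
// spark-submit --files argument and the application jar can use hdfs:// URIs.
val fs = FileSystem.get(new Configuration())
fs.copyFromLocalFile(new Path("conf/devConf.json"),
  new Path("/apps/iris/conf/devConf.json"))
fs.copyFromLocalFile(new Path("iris-core-0.0.1-SNAPSHOT.jar"),
  new Path("/apps/iris/iris-core-0.0.1-SNAPSHOT.jar"))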
On Wed, Sep 5, 2018 at 2:17 AM Guillermo Ortiz Fernández
 wrote:
>
> I'm using standalone cluster and the final command I'm trying is:
> spark-submit --verbose --deploy-mode cluster --driver-java-options 
> "-Dlogback.configurationFile=conf/i${1}Logback.xml" \
> --class com.example.Launcher --driver-class-path 
> lib/spark-streaming-kafka-0-10_2.11-2.0.2.jar:lib/kafka-clients-0.10.0.1.jar  
> \
> --files conf/${1}Conf.json iris-core-0.0.1-SNAPSHOT.jar conf/${1}Conf.json
>
> El mié., 5 sept. 2018 a las 11:11, Guillermo Ortiz Fernández 
> () escribió:
>>
>> I want to execute my processes in cluster mode. As I don't know where the 
>> driver has been executed I have to do available all the file it needs. I 
>> undertand that they are two options. Copy all the files to all nodes of copy 
>> them to HDFS.
>>
>> My doubt is,, if I want to put all the files in HDFS, isn't it automatic 
>> with --files and --jar parameters in the spark-submit command? or do I have 
>> to copy to HDFS manually?
>>
>> My idea is to execute something like:
>> spark-submit --driver-java-options 
>> "-Dlogback.configurationFile=conf/${1}Logback.xml" \
>> --class com.example.Launcher --driver-class-path 
>> lib/spark-streaming-kafka-0-10_2.11-2.0.2.jar:lib/kafka-clients-1.0.0.jar \
>> --files /conf/${1}Conf.json example-0.0.1-SNAPSHOT.jar conf/${1}Conf.json
>> I have tried to with --files hdfs:// without copying anything to hdfs 
>> and it doesn't work either.
>>


-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[ML] Setting Non-Transform Params for a Pipeline & PipelineModel

2018-09-05 Thread Aleksander Eskilson
I had originally sent this to the Dev list since the API discussed here is
still marked as experimental in portions, but it occurs to me this may
still be a general use question, sorry for the cross-listing.

In a nutshell, what I'd like to do is instantiate a Pipeline (or extension
class of Pipeline) with metadata that is copied to the PipelineModel when
fitted, and can be read again when the fitted model is persisted and loaded
by another consumer. These metadata are specific to the PipelineModel more
than any particular Transform or the Estimator declared as part of the
Pipeline: the intent is that the PipelineModel params can be read by a
downstream consumer of the loaded model, but the value that the params
should take will only be known to the creator of the Pipeline / trainer of
the PipelineModel.

It seems that Pipeline and PipelineModel support the Params interface, like
Transform and Estimator do. It seems I can extend Pipeline to a custom
class MyPipeline, where the constructor could enforce that my metadata
Params are set. However, when the Pipeline is *fit*, the resultant
PipelineModel doesn't seem to include the original CustomPipeline's params,
only params from the individual Transform steps.

From a read of the code, it seems that the *fit* method will copy over the
Stages to the PipelineModel, and those will be persisted (along with the
Stages' Params) during *write*, *but* any Params belonging to the Pipeline
are not copied to the PipelineModel (as only Stages are considered during
copy, not the ParamMap of the Pipeline) [1].

Is this a correct read of the flow here? That a CustomPipeline extension of
Pipeline with member Params does not get those non-Transform Params copied
into the fitted PipelineModel?

If so, would a feature enhancement including Pipeline-specific Params being
copyable into the fitted PipelineModel be considered acceptable?

Or should there be another way to include metadata *about* the Pipeline
such that the metadata is copyable to the fitted PipelineModel, and able to
be persisted with PipelineModel *write* and read again with PipelineModel
*load*? My first attempt at this has been to extend the Pipeline class
itself with member params, but this doesn't seem to do the trick given how
Params are actually copied only for Stages between Pipeline and the fitted
PipelineModel.

It occurs to me I could write a custom *withMetadata* transform Stage which
would really just be an identity function but with the desired Params built
in, and that those Params would get copied with the other Stages, but as
discussed at the top, this particular use-case for metadata isn't about any
particular Transform, but more about metadata for the whole Pipeline.
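
A rough sketch of what such an identity stage could look like against the
Spark 2.x ml API; the class and param names (WithMetadata, modelVersion) are
purely illustrative, and persisting it would additionally need MLWritable
support:

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

// Identity stage whose only job is to carry pipeline-level metadata params,
// so they survive Pipeline.fit's copy of the stages into the PipelineModel.
class WithMetadata(override val uid: String) extends Transformer {

  def this() = this(Identifiable.randomUID("withMetadata"))

  final val modelVersion: Param[String] =
    new Param[String](this, "modelVersion", "version tag for the fitted model")

  def setModelVersion(value: String): this.type = set(modelVersion, value)

  // Identity: data passes through untouched.
  override def transform(dataset: Dataset[_]): DataFrame = dataset.toDF()

  override def transformSchema(schema: StructType): StructType = schema

  override def copy(extra: ParamMap): WithMetadata = defaultCopy(extra)
}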

Alek

[1] --
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala#L135


Re: Unsubscribe

2018-09-05 Thread Sunil Prabhakara


Re: Spark hive udf: no handler for UDAF analysis exception

2018-09-05 Thread Swapnil Chougule
Looks like SparkSession only has an implementation for UDAF but not for UDF.
Is it a bug, or is there a workaround?
T. Gaweda has opened a JIRA for this: SPARK-25334.

Thanks,
Swapnil
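
For what it's worth, the workaround the exception message itself points to is
registering a plain Scala function on the session; a minimal sketch, assuming
a SparkSession named spark and a table or view called cars:

// Register a native Scala UDF instead of the Hive UDF class.
spark.udf.register("uppercase", (input: String) => input.toUpperCase)

spark.sql("select uppercase(Car) as NAME from cars").show()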

On Tue, Sep 4, 2018 at 4:20 PM Swapnil Chougule 
wrote:

> Created a project 'spark-udf' and wrote a Hive UDF as below:
>
> package com.spark.udf
> import org.apache.hadoop.hive.ql.exec.UDF
>
> class UpperCase extends UDF with Serializable {
>   def evaluate(input: String): String = {
> input.toUpperCase
>   }
> }
>
> Built it and created a jar for it. Then I tried to use this UDF in another
> Spark program:
>
> spark.sql("CREATE OR REPLACE FUNCTION uppercase AS
> 'com.spark.udf.UpperCase' USING JAR
> '/home/swapnil/spark-udf/target/spark-udf-1.0.jar'")
>
> But the following line is giving me an exception:
>
> spark.sql("select uppercase(Car) as NAME from cars").show
>
> *Exception:*
>
> > Exception in thread "main" org.apache.spark.sql.AnalysisException: No
> > handler for UDAF 'com.dcengines.fluir.udf.Strlen'. Use
> > sparkSession.udf.register(...) instead.; line 1 pos 7 at
> >
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeFunctionExpression(SessionCatalog.scala:1105)
> > at
> >
> org.apache.spark.sql.catalyst.catalog.SessionCatalog$$anonfun$org$apache$spark$sql$catalyst$catalog$SessionCatalog$$makeFunctionBuilder$1.apply(SessionCatalog.scala:1085)
> > at
> >
> org.apache.spark.sql.catalyst.catalog.SessionCatalog$$anonfun$org$apache$spark$sql$catalyst$catalog$SessionCatalog$$makeFunctionBuilder$1.apply(SessionCatalog.scala:1085)
> > at
> >
> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:115)
> > at
> >
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction(SessionCatalog.scala:1247)
> > at
> >
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$16$$anonfun$applyOrElse$6$$anonfun$applyOrElse$52.apply(Analyzer.scala:1226)
> > at
> >
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$16$$anonfun$applyOrElse$6$$anonfun$applyOrElse$52.apply(Analyzer.scala:1226)
> > at
> >
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>
> Any help around this is really appreciated.
>
>
>


Padova Apache Spark Meetup

2018-09-05 Thread Matteo Durighetto
Hello,
we are creating a new meetup for enthusiastic Apache Spark users
in Italy, in Padova.


https://www.meetup.com/Padova-Apache-Spark-Meetup/

Is it possible to add the meetup link to the web page
https://spark.apache.org/community.html ?

Moreover, is it possible to announce future events on this mailing list?


Kind Regards

Matteo Durighetto
e-mail: m.durighe...@miriade.it


Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
yep already tried it and it did not work.

thanks

On Wed, 5 Sep 2018 at 10:10, Deepak Sharma  wrote:

> Try this:
>
> *import **spark*.implicits._
>
> df.toDF()
>
>
> On Wed, Sep 5, 2018 at 2:31 PM Mich Talebzadeh 
> wrote:
>
>> With the following
>>
>> case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
>> PRICE: Float)
>>
>>  var key = line._2.split(',').view(0).toString
>>  var ticker =  line._2.split(',').view(1).toString
>>  var timeissued = line._2.split(',').view(2).toString
>>  var price = line._2.split(',').view(3).toFloat
>>
>>   var df = Seq(columns(key, ticker, timeissued, price))
>>  println(df)
>>
>> I get
>>
>>
>> List(columns(ac11a78d-82df-4b37-bf58-7e3388aa64cd,MKS,2018-09-05T10:10:15,676.5))
>>
>> So just need to convert that list to DF
>>
>> On Wed, 5 Sep 2018 at 09:49, Mich Talebzadeh 
>> wrote:
>>
>>> Thanks!
>>>
>>> The spark  is version 2.3.0
>>>
>>> On Wed, 5 Sep 2018 at 09:41, Jungtaek Lim  wrote:
>>>
 You may also find below link useful (though it looks far old), since
 case class is the thing which Encoder is available, so there may be another
 reason which prevent implicit conversion.


 https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Scala-Error-value-toDF-is-not-a-member-of-org-apache/m-p/29994/highlight/true#M973

 And which Spark version do you use?


 2018년 9월 5일 (수) 오후 5:32, Jungtaek Lim 님이 작성:

> Sorry I guess I pasted another method. the code is...
>
> implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): 
> DatasetHolder[T] = {
>   DatasetHolder(_sqlContext.createDataset(s))
> }
>
>
> 2018년 9월 5일 (수) 오후 5:30, Jungtaek Lim 님이 작성:
>
>> I guess you need to have encoder for the type of result for columns().
>>
>>
>> https://github.com/apache/spark/blob/2119e518d31331e65415e0f817a6f28ff18d2b42/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L227-L229
>>
>> implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): 
>> DatasetHolder[T] = {
>>   DatasetHolder(_sqlContext.createDataset(rdd))
>> }
>>
>> You can see lots of Encoder implementations in the scala code. If
>> your type doesn't match anything it may not work and you need to provide
>> custom Encoder.
>>
>> -Jungtaek Lim (HeartSaVioR)
>>
>> 2018년 9월 5일 (수) 오후 5:24, Mich Talebzadeh 님이
>> 작성:
>>
>>> Thanks
>>>
>>> I already do that as below
>>>
>>> val sqlContext= new
>>> org.apache.spark.sql.SQLContext(sparkContext)
>>>   import sqlContext.implicits._
>>>
>>> but still getting the error!
>>>

Re: deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Guillermo Ortiz Fernández
I'm using a standalone cluster and the final command I'm trying is:
spark-submit --verbose --deploy-mode cluster --driver-java-options
"-Dlogback.configurationFile=conf/i${1}Logback.xml" \
--class com.example.Launcher --driver-class-path
lib/spark-streaming-kafka-0-10_2.11-2.0.2.jar:lib/kafka-clients-0.10.0.1.jar
\
--files conf/${1}Conf.json iris-core-0.0.1-SNAPSHOT.jar conf/${1}Conf.json

El mié., 5 sept. 2018 a las 11:11, Guillermo Ortiz Fernández (<
guillermo.ortiz.f...@gmail.com>) escribió:

> I want to execute my processes in cluster mode. As I don't know where the
> driver has been executed I have to do available all the file it needs. I
> undertand that they are two options. Copy all the files to all nodes of
> copy them to HDFS.
>
> My doubt is,, if I want to put all the files in HDFS, isn't it automatic
> with --files and --jar parameters in the spark-submit command? or do I have
> to copy to HDFS manually?
>
> My idea is to execute something like:
> spark-submit --driver-java-options
> "-Dlogback.configurationFile=conf/${1}Logback.xml" \
> --class com.example.Launcher --driver-class-path
> lib/spark-streaming-kafka-0-10_2.11-2.0.2.jar:lib/kafka-clients-1.0.0.jar \
> --files /conf/${1}Conf.json example-0.0.1-SNAPSHOT.jar conf/${1}Conf.json
> I have tried to with --files hdfs:// without copying anything to hdfs
> and it doesn't work either.
>
>


deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Guillermo Ortiz Fernández
I want to execute my processes in cluster mode. As I don't know where the
driver will be executed, I have to make all the files it needs available. I
understand that there are two options: copy all the files to all nodes, or
copy them to HDFS.

My doubt is: if I want to put all the files in HDFS, isn't it automatic
with the --files and --jars parameters in the spark-submit command, or do I
have to copy them to HDFS manually?

My idea is to execute something like:
spark-submit --driver-java-options
"-Dlogback.configurationFile=conf/${1}Logback.xml" \
--class com.example.Launcher --driver-class-path
lib/spark-streaming-kafka-0-10_2.11-2.0.2.jar:lib/kafka-clients-1.0.0.jar \
--files /conf/${1}Conf.json example-0.0.1-SNAPSHOT.jar conf/${1}Conf.json
I have tried with --files hdfs:// without copying anything to HDFS
and it doesn't work either.


Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Deepak Sharma
Try this:

import spark.implicits._

df.toDF()


On Wed, Sep 5, 2018 at 2:31 PM Mich Talebzadeh 
wrote:

> With the following
>
> case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE:
> Float)
>
>  var key = line._2.split(',').view(0).toString
>  var ticker =  line._2.split(',').view(1).toString
>  var timeissued = line._2.split(',').view(2).toString
>  var price = line._2.split(',').view(3).toFloat
>
>   var df = Seq(columns(key, ticker, timeissued, price))
>  println(df)
>
> I get
>
>
> List(columns(ac11a78d-82df-4b37-bf58-7e3388aa64cd,MKS,2018-09-05T10:10:15,676.5))
>
> So just need to convert that list to DF
>
> On Wed, 5 Sep 2018 at 09:49, Mich Talebzadeh 
> wrote:
>
>> Thanks!
>>
>> The spark  is version 2.3.0
>>
>> On Wed, 5 Sep 2018 at 09:41, Jungtaek Lim  wrote:
>>
>>> You may also find below link useful (though it looks far old), since
>>> case class is the thing which Encoder is available, so there may be another
>>> reason which prevent implicit conversion.
>>>
>>>
>>> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Scala-Error-value-toDF-is-not-a-member-of-org-apache/m-p/29994/highlight/true#M973
>>>
>>> And which Spark version do you use?
>>>
>>>
>>> 2018년 9월 5일 (수) 오후 5:32, Jungtaek Lim 님이 작성:
>>>
 Sorry I guess I pasted another method. the code is...

 implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): 
 DatasetHolder[T] = {
   DatasetHolder(_sqlContext.createDataset(s))
 }


 2018년 9월 5일 (수) 오후 5:30, Jungtaek Lim 님이 작성:

> I guess you need to have encoder for the type of result for columns().
>
>
> https://github.com/apache/spark/blob/2119e518d31331e65415e0f817a6f28ff18d2b42/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L227-L229
>
> implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): 
> DatasetHolder[T] = {
>   DatasetHolder(_sqlContext.createDataset(rdd))
> }
>
> You can see lots of Encoder implementations in the scala code. If your
> type doesn't match anything it may not work and you need to provide custom
> Encoder.
>
> -Jungtaek Lim (HeartSaVioR)
>
> 2018년 9월 5일 (수) 오후 5:24, Mich Talebzadeh 님이
> 작성:
>
>> Thanks
>>
>> I already do that as below
>>
>> val sqlContext= new org.apache.spark.sql.SQLContext(sparkContext)
>>   import sqlContext.implicits._
>>
>> but still getting the error!
>>
>> On Wed, 5 Sep 2018 at 09:17, Jungtaek Lim  wrote:
>>
>>> You may need to import implicits from your spark session like below:
>>> (Below code is borrowed from
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html)
>>>
>>> import org.apache.spark.sql.SparkSession
>>> val spark = SparkSession
>>>   .builder()
>>>   .appName("Spark SQL basic example")
>>>   .config("spark.some.config.option", "some-value")
>>>   .getOrCreate()
>>> // For implicit conversions like converting RDDs 

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
With the following

case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE:
Float)

 var key = line._2.split(',').view(0).toString
 var ticker =  line._2.split(',').view(1).toString
 var timeissued = line._2.split(',').view(2).toString
 var price = line._2.split(',').view(3).toFloat

  var df = Seq(columns(key, ticker, timeissued, price))
 println(df)

I get

List(columns(ac11a78d-82df-4b37-bf58-7e3388aa64cd,MKS,2018-09-05T10:10:15,676.5))

So just need to convert that list to DF

On Wed, 5 Sep 2018 at 09:49, Mich Talebzadeh 
wrote:

> Thanks!
>
> The spark  is version 2.3.0
>
> On Wed, 5 Sep 2018 at 09:41, Jungtaek Lim  wrote:
>
>> You may also find below link useful (though it looks far old), since case
>> class is the thing which Encoder is available, so there may be another
>> reason which prevent implicit conversion.
>>
>>
>> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Scala-Error-value-toDF-is-not-a-member-of-org-apache/m-p/29994/highlight/true#M973
>>
>> And which Spark version do you use?
>>
>>
>> 2018년 9월 5일 (수) 오후 5:32, Jungtaek Lim 님이 작성:
>>
>>> Sorry I guess I pasted another method. the code is...
>>>
>>> implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): 
>>> DatasetHolder[T] = {
>>>   DatasetHolder(_sqlContext.createDataset(s))
>>> }
>>>
>>>
>>> 2018년 9월 5일 (수) 오후 5:30, Jungtaek Lim 님이 작성:
>>>
 I guess you need to have encoder for the type of result for columns().


 https://github.com/apache/spark/blob/2119e518d31331e65415e0f817a6f28ff18d2b42/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L227-L229

 implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): 
 DatasetHolder[T] = {
   DatasetHolder(_sqlContext.createDataset(rdd))
 }

 You can see lots of Encoder implementations in the scala code. If your
 type doesn't match anything it may not work and you need to provide custom
 Encoder.

 -Jungtaek Lim (HeartSaVioR)

 2018년 9월 5일 (수) 오후 5:24, Mich Talebzadeh 님이
 작성:

> Thanks
>
> I already do that as below
>
> val sqlContext= new org.apache.spark.sql.SQLContext(sparkContext)
>   import sqlContext.implicits._
>
> but still getting the error!
>
> On Wed, 5 Sep 2018 at 09:17, Jungtaek Lim  wrote:
>
>> You may need to import implicits from your spark session like below:
>> (Below code is borrowed from
>> https://spark.apache.org/docs/latest/sql-programming-guide.html)
>>
>> import org.apache.spark.sql.SparkSession
>> val spark = SparkSession
>>   .builder()
>>   .appName("Spark SQL basic example")
>>   .config("spark.some.config.option", "some-value")
>>   .getOrCreate()
>> // For implicit conversions like converting RDDs to DataFramesimport 
>> spark.implicits._
>>
>>
>> 2018년 9월 5일 (수) 오후 5:11, Mich Talebzadeh 님이
>> 작성:
>>
>>> Hi,
>>>
>>> I have spark streaming that send data and I need to put that data
>>> into MongoDB for test purposes. The easiest way is to create a DF from 

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
Thanks!

The Spark version is 2.3.0.

On Wed, 5 Sep 2018 at 09:41, Jungtaek Lim  wrote:

> You may also find below link useful (though it looks far old), since case
> class is the thing which Encoder is available, so there may be another
> reason which prevent implicit conversion.
>
>
> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Scala-Error-value-toDF-is-not-a-member-of-org-apache/m-p/29994/highlight/true#M973
>
> And which Spark version do you use?
>
>
> 2018년 9월 5일 (수) 오후 5:32, Jungtaek Lim 님이 작성:
>
>> Sorry I guess I pasted another method. the code is...
>>
>> implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): 
>> DatasetHolder[T] = {
>>   DatasetHolder(_sqlContext.createDataset(s))
>> }
>>
>>
>> 2018년 9월 5일 (수) 오후 5:30, Jungtaek Lim 님이 작성:
>>
>>> I guess you need to have encoder for the type of result for columns().
>>>
>>>
>>> https://github.com/apache/spark/blob/2119e518d31331e65415e0f817a6f28ff18d2b42/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L227-L229
>>>
>>> implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T] 
>>> = {
>>>   DatasetHolder(_sqlContext.createDataset(rdd))
>>> }
>>>
>>> You can see lots of Encoder implementations in the scala code. If your
>>> type doesn't match anything it may not work and you need to provide custom
>>> Encoder.
>>>
>>> -Jungtaek Lim (HeartSaVioR)
>>>
>>> 2018년 9월 5일 (수) 오후 5:24, Mich Talebzadeh 님이
>>> 작성:
>>>
 Thanks

 I already do that as below

 val sqlContext= new org.apache.spark.sql.SQLContext(sparkContext)
   import sqlContext.implicits._

 but still getting the error!

 On Wed, 5 Sep 2018 at 09:17, Jungtaek Lim  wrote:

> You may need to import implicits from your spark session like below:
> (Below code is borrowed from
> https://spark.apache.org/docs/latest/sql-programming-guide.html)
>
> import org.apache.spark.sql.SparkSession
> val spark = SparkSession
>   .builder()
>   .appName("Spark SQL basic example")
>   .config("spark.some.config.option", "some-value")
>   .getOrCreate()
> // For implicit conversions like converting RDDs to DataFramesimport 
> spark.implicits._
>
>
> 2018년 9월 5일 (수) 오후 5:11, Mich Talebzadeh 님이
> 작성:
>
>> Hi,
>>
>> I have spark streaming that send data and I need to put that data
>> into MongoDB for test purposes. The easiest way is to create a DF from 
>> the
>> individual list of columns as below
>>
>> I loop over individual rows in RDD and perform the following
>>
>> case class columns(KEY: String, TICKER: String, TIMEISSUED:
>> String, PRICE: Float)
>>
>>  for(line <- pricesRDD.collect.toArray)
>>  {
>> var key = line._2.split(',').view(0).toString
>>var ticker =  line._2.split(',').view(1).toString
>>var timeissued = line._2.split(',').view(2).toString
>>var price = line._2.split(',').view(3).toFloat
>>val priceToString = line._2.split(',').view(3)
>>if (price > 90.0)
>>{
>>  println ("price > 90.0, saving to MongoDB collection!")
>> // Save prices to mongoDB collection
>>* var df = Seq(columns(key, ticker, timeissued,
>> price)).toDF*
>>
>> but it fails with message
>>
>>  value toDF is not a member of Seq[columns].
>>
>> What would be the easiest way of resolving this please?
>>
>> thanks
>>

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Jungtaek Lim
You may also find the below link useful (though it looks fairly old). Since a
case class is something for which an Encoder is available, there may be
another reason preventing the implicit conversion.

https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Scala-Error-value-toDF-is-not-a-member-of-org-apache/m-p/29994/highlight/true#M973

And which Spark version do you use?


On Wed, 5 Sep 2018 at 5:32 PM, Jungtaek Lim wrote:

> Sorry I guess I pasted another method. the code is...
>
> implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): 
> DatasetHolder[T] = {
>   DatasetHolder(_sqlContext.createDataset(s))
> }
>
>
> 2018년 9월 5일 (수) 오후 5:30, Jungtaek Lim 님이 작성:
>
>> I guess you need to have encoder for the type of result for columns().
>>
>>
>> https://github.com/apache/spark/blob/2119e518d31331e65415e0f817a6f28ff18d2b42/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L227-L229
>>
>> implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T] 
>> = {
>>   DatasetHolder(_sqlContext.createDataset(rdd))
>> }
>>
>> You can see lots of Encoder implementations in the scala code. If your
>> type doesn't match anything it may not work and you need to provide custom
>> Encoder.
>>
>> -Jungtaek Lim (HeartSaVioR)
>>
>> 2018년 9월 5일 (수) 오후 5:24, Mich Talebzadeh 님이
>> 작성:
>>
>>> Thanks
>>>
>>> I already do that as below
>>>
>>> val sqlContext= new org.apache.spark.sql.SQLContext(sparkContext)
>>>   import sqlContext.implicits._
>>>
>>> but still getting the error!
>>>
>>>
>>>
>>>
>>>
>>> On Wed, 5 Sep 2018 at 09:17, Jungtaek Lim  wrote:
>>>
 You may need to import implicits from your spark session like below:
 (Below code is borrowed from
 https://spark.apache.org/docs/latest/sql-programming-guide.html)

 import org.apache.spark.sql.SparkSession
 val spark = SparkSession
   .builder()
   .appName("Spark SQL basic example")
   .config("spark.some.config.option", "some-value")
   .getOrCreate()
 // For implicit conversions like converting RDDs to DataFramesimport 
 spark.implicits._


 2018년 9월 5일 (수) 오후 5:11, Mich Talebzadeh 님이
 작성:

> Hi,
>
> I have spark streaming that send data and I need to put that data into
> MongoDB for test purposes. The easiest way is to create a DF from the
> individual list of columns as below
>
> I loop over individual rows in RDD and perform the following
>
> case class columns(KEY: String, TICKER: String, TIMEISSUED:
> String, PRICE: Float)
>
>  for(line <- pricesRDD.collect.toArray)
>  {
> var key = line._2.split(',').view(0).toString
>var ticker =  line._2.split(',').view(1).toString
>var timeissued = line._2.split(',').view(2).toString
>var price = line._2.split(',').view(3).toFloat
>val priceToString = line._2.split(',').view(3)
>if (price > 90.0)
>{
>  println ("price > 90.0, saving to MongoDB collection!")
> // Save prices to mongoDB collection
>* var df = Seq(columns(key, ticker, timeissued,
> price)).toDF*
>
> but it fails with message
>
>  value toDF is not a member of Seq[columns].
>
> What would be the easiest way of resolving this please?
>
> thanks
>



Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Jungtaek Lim
Sorry, I guess I pasted another method. The code is:

implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]):
DatasetHolder[T] = {
  DatasetHolder(_sqlContext.createDataset(s))
}


On Wed, 5 Sep 2018 at 5:30 PM, Jungtaek Lim wrote:

> I guess you need to have encoder for the type of result for columns().
>
>
> https://github.com/apache/spark/blob/2119e518d31331e65415e0f817a6f28ff18d2b42/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L227-L229
>
> implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T] = 
> {
>   DatasetHolder(_sqlContext.createDataset(rdd))
> }
>
> You can see lots of Encoder implementations in the scala code. If your
> type doesn't match anything it may not work and you need to provide custom
> Encoder.
>
> -Jungtaek Lim (HeartSaVioR)
>
> 2018년 9월 5일 (수) 오후 5:24, Mich Talebzadeh 님이 작성:
>
>> Thanks
>>
>> I already do that as below
>>
>> val sqlContext= new org.apache.spark.sql.SQLContext(sparkContext)
>>   import sqlContext.implicits._
>>
>> but still getting the error!
>>
>> On Wed, 5 Sep 2018 at 09:17, Jungtaek Lim  wrote:
>>
>>> You may need to import implicits from your spark session like below:
>>> (Below code is borrowed from
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html)
>>>
>>> import org.apache.spark.sql.SparkSession
>>> val spark = SparkSession
>>>   .builder()
>>>   .appName("Spark SQL basic example")
>>>   .config("spark.some.config.option", "some-value")
>>>   .getOrCreate()
>>> // For implicit conversions like converting RDDs to DataFramesimport 
>>> spark.implicits._
>>>
>>>
>>> 2018년 9월 5일 (수) 오후 5:11, Mich Talebzadeh 님이
>>> 작성:
>>>
 Hi,

 I have spark streaming that send data and I need to put that data into
 MongoDB for test purposes. The easiest way is to create a DF from the
 individual list of columns as below

 I loop over individual rows in RDD and perform the following

 case class columns(KEY: String, TICKER: String, TIMEISSUED:
 String, PRICE: Float)

  for(line <- pricesRDD.collect.toArray)
  {
 var key = line._2.split(',').view(0).toString
var ticker =  line._2.split(',').view(1).toString
var timeissued = line._2.split(',').view(2).toString
var price = line._2.split(',').view(3).toFloat
val priceToString = line._2.split(',').view(3)
if (price > 90.0)
{
  println ("price > 90.0, saving to MongoDB collection!")
 // Save prices to mongoDB collection
* var df = Seq(columns(key, ticker, timeissued,
 price)).toDF*

 but it fails with message

  value toDF is not a member of Seq[columns].

 What would be the easiest way of resolving this please?

 thanks




>>>


Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Jungtaek Lim
I guess you need to have an Encoder for the result type of columns().

https://github.com/apache/spark/blob/2119e518d31331e65415e0f817a6f28ff18d2b42/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L227-L229

implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T] = {
  DatasetHolder(_sqlContext.createDataset(rdd))
}

You can see lots of Encoder implementations in the Scala code. If your type
doesn't match any of them, it may not work and you need to provide a custom
Encoder.

-Jungtaek Lim (HeartSaVioR)
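
A sketch of supplying the Encoder explicitly, assuming a SparkSession named
spark and the case class columns visible at the top level of the enclosing
object:

import org.apache.spark.sql.Encoders

// Supply the Encoder explicitly instead of relying on implicit derivation.
val ds = spark.createDataset(
  Seq(columns(key, ticker, timeissued, price)))(Encoders.product[columns])
val df = ds.toDF()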

On Wed, 5 Sep 2018 at 5:24 PM, Mich Talebzadeh wrote:

> Thanks
>
> I already do that as below
>
> val sqlContext= new org.apache.spark.sql.SQLContext(sparkContext)
>   import sqlContext.implicits._
>
> but still getting the error!
>
> On Wed, 5 Sep 2018 at 09:17, Jungtaek Lim  wrote:
>
>> You may need to import implicits from your spark session like below:
>> (Below code is borrowed from
>> https://spark.apache.org/docs/latest/sql-programming-guide.html)
>>
>> import org.apache.spark.sql.SparkSession
>> val spark = SparkSession
>>   .builder()
>>   .appName("Spark SQL basic example")
>>   .config("spark.some.config.option", "some-value")
>>   .getOrCreate()
>> // For implicit conversions like converting RDDs to DataFramesimport 
>> spark.implicits._
>>
>>
>> 2018년 9월 5일 (수) 오후 5:11, Mich Talebzadeh 님이
>> 작성:
>>
>>> Hi,
>>>
>>> I have spark streaming that send data and I need to put that data into
>>> MongoDB for test purposes. The easiest way is to create a DF from the
>>> individual list of columns as below
>>>
>>> I loop over individual rows in RDD and perform the following
>>>
>>> case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
>>> PRICE: Float)
>>>
>>>  for(line <- pricesRDD.collect.toArray)
>>>  {
>>> var key = line._2.split(',').view(0).toString
>>>var ticker =  line._2.split(',').view(1).toString
>>>var timeissued = line._2.split(',').view(2).toString
>>>var price = line._2.split(',').view(3).toFloat
>>>val priceToString = line._2.split(',').view(3)
>>>if (price > 90.0)
>>>{
>>>  println ("price > 90.0, saving to MongoDB collection!")
>>> // Save prices to mongoDB collection
>>>* var df = Seq(columns(key, ticker, timeissued, price)).toDF*
>>>
>>> but it fails with message
>>>
>>>  value toDF is not a member of Seq[columns].
>>>
>>> What would be the easiest way of resolving this please?
>>>
>>> thanks
>>>
>>>
>>>
>>


Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
Thanks

I already do that as below

val sqlContext= new org.apache.spark.sql.SQLContext(sparkContext)
  import sqlContext.implicits._

but still getting the error!





On Wed, 5 Sep 2018 at 09:17, Jungtaek Lim  wrote:

> You may need to import implicits from your spark session like below:
> (Below code is borrowed from
> https://spark.apache.org/docs/latest/sql-programming-guide.html)
>
> import org.apache.spark.sql.SparkSession
> val spark = SparkSession
>   .builder()
>   .appName("Spark SQL basic example")
>   .config("spark.some.config.option", "some-value")
>   .getOrCreate()
> // For implicit conversions like converting RDDs to DataFramesimport 
> spark.implicits._
>
>
> 2018년 9월 5일 (수) 오후 5:11, Mich Talebzadeh 님이 작성:
>
>> Hi,
>>
>> I have spark streaming that send data and I need to put that data into
>> MongoDB for test purposes. The easiest way is to create a DF from the
>> individual list of columns as below
>>
>> I loop over individual rows in RDD and perform the following
>>
>> case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
>> PRICE: Float)
>>
>>  for(line <- pricesRDD.collect.toArray)
>>  {
>> var key = line._2.split(',').view(0).toString
>>var ticker =  line._2.split(',').view(1).toString
>>var timeissued = line._2.split(',').view(2).toString
>>var price = line._2.split(',').view(3).toFloat
>>val priceToString = line._2.split(',').view(3)
>>if (price > 90.0)
>>{
>>  println ("price > 90.0, saving to MongoDB collection!")
>> // Save prices to mongoDB collection
>>* var df = Seq(columns(key, ticker, timeissued, price)).toDF*
>>
>> but it fails with message
>>
>>  value toDF is not a member of Seq[columns].
>>
>> What would be the easiest way of resolving this please?
>>
>> thanks
>>
>>
>>
>>
>


Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Jungtaek Lim
You may need to import the implicits from your SparkSession, like below.
(The code below is borrowed from
https://spark.apache.org/docs/latest/sql-programming-guide.html)

import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._


On Wed, 5 Sep 2018 at 5:11 PM, Mich Talebzadeh wrote:

> Hi,
>
> I have spark streaming that send data and I need to put that data into
> MongoDB for test purposes. The easiest way is to create a DF from the
> individual list of columns as below
>
> I loop over individual rows in RDD and perform the following
>
> case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
> PRICE: Float)
>
>  for(line <- pricesRDD.collect.toArray)
>  {
> var key = line._2.split(',').view(0).toString
>var ticker =  line._2.split(',').view(1).toString
>var timeissued = line._2.split(',').view(2).toString
>var price = line._2.split(',').view(3).toFloat
>val priceToString = line._2.split(',').view(3)
>if (price > 90.0)
>{
>  println ("price > 90.0, saving to MongoDB collection!")
> // Save prices to mongoDB collection
>* var df = Seq(columns(key, ticker, timeissued, price)).toDF*
>
> but it fails with message
>
>  value toDF is not a member of Seq[columns].
>
> What would be the easiest way of resolving this please?
>
> thanks
>
>
>
>


getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
Hi,

I have Spark streaming that sends data, and I need to put that data into
MongoDB for test purposes. The easiest way is to create a DF from the
individual list of columns as below.

I loop over the individual rows in the RDD and perform the following:

case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
PRICE: Float)

 for(line <- pricesRDD.collect.toArray)
 {
var key = line._2.split(',').view(0).toString
   var ticker =  line._2.split(',').view(1).toString
   var timeissued = line._2.split(',').view(2).toString
   var price = line._2.split(',').view(3).toFloat
   val priceToString = line._2.split(',').view(3)
   if (price > 90.0)
   {
 println ("price > 90.0, saving to MongoDB collection!")
// Save prices to mongoDB collection
     var df = Seq(columns(key, ticker, timeissued, price)).toDF

but it fails with message

 value toDF is not a member of Seq[columns].

What would be the easiest way of resolving this please?

thanks
