Spark Streaming RDD Cleanup too slow

2018-09-05 Thread Prashant Sharma
I have a Spark Streaming job which takes too long to delete temp RDDs. I collect about 4MM telemetry metrics per minute and do minor aggregations in the streaming job. I am using Amazon R4 instances. The driver RPC call, although async I believe, is slow getting the handle for the future object at
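
A possible mitigation (my suggestion, not something confirmed in the thread) is to stop the ContextCleaner from blocking the driver on its cleanup RPCs; both properties below are standard Spark settings, but the app name is a placeholder and whether this helps here is untested:

    import org.apache.spark.SparkConf

    // Sketch only: let RDD/broadcast cleanup run fire-and-forget instead of
    // blocking the driver, and keep Streaming's automatic unpersist enabled.
    val conf = new SparkConf()
      .setAppName("telemetry-aggregation") // placeholder name
      .set("spark.cleaner.referenceTracking.blocking", "false")
      .set("spark.streaming.unpersist", "true") // default is already true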

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Manu Zhang
Have you tried adding an Encoder for columns, as suggested by Jungtaek Lim? On Thu, Sep 6, 2018 at 6:24 AM Mich Talebzadeh wrote: >

Re: deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Marcelo Vanzin
See SPARK-4160. Long story short: you need to upload the files and jars to some shared storage (like HDFS) manually. On Wed, Sep 5, 2018 at 2:17 AM Guillermo Ortiz Fernández wrote: > > I'm using standalone cluster and the final command I'm trying is: > spark-submit --verbose --deploy-mode cluster
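
For a standalone cluster, the application jar URL has to be visible from every node, so an hdfs:// path (or a file:// path present on all nodes) is the usual pattern; a hypothetical invocation, with placeholder paths and the class name taken from the thread:

    spark-submit \
      --verbose \
      --deploy-mode cluster \
      --class com.example.Launcher \
      --jars hdfs:///apps/myjob/lib/dependencies.jar \
      hdfs:///apps/myjob/myjob.jar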

[ML] Setting Non-Transform Params for a Pipeline & PipelineModel

2018-09-05 Thread Aleksander Eskilson
I had originally sent this to the Dev list since the API discussed here is still marked as experimental in portions, but it occurs to me this may still be a general use question, sorry for the cross-listing. In a nutshell, what I'd like to do is instantiate a Pipeline (or extension class of

Re: Spark hive udf: no handler for UDAF analysis exception

2018-09-05 Thread Swapnil Chougule
It looks like SparkSession only has an implementation for UDAF but not for UDF. Is it a bug, or is there a workaround? T.Gaweda has opened a JIRA for this: SPARK-25334. Thanks, Swapnil On Tue, Sep 4, 2018 at 4:20 PM Swapnil Chougule wrote: > Created one project 'spark-udf' & written a hive udf as
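
One possible workaround (my assumption; the thread doesn't confirm this setup) is to register the Hive UDF through HiveQL rather than the Scala API; the jar path, class name, and function name below are hypothetical:

    // Sketch only: assumes `spark` is a SparkSession built with
    // enableHiveSupport(); registers a Hive UDF via SQL instead of spark.udf.
    spark.sql("ADD JAR hdfs:///apps/udfs/spark-udf.jar")
    spark.sql("CREATE TEMPORARY FUNCTION my_upper AS 'com.example.udf.UpperUDF'")
    spark.sql("SELECT my_upper('hello')").show()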

Padova Apache Spark Meetup

2018-09-05 Thread Matteo Durighetto
Hello, we are creating a new meetup for Apache Spark enthusiasts in Padova, Italy: https://www.meetup.com/Padova-Apache-Spark-Meetup/ Would it be possible to add the meetup link to the web page https://spark.apache.org/community.html? Moreover, would it be possible to announce future

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
Yep, already tried it and it did not work. Thanks.

Re: deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Guillermo Ortiz Fernández
I'm using standalone cluster and the final command I'm trying is:

    spark-submit --verbose --deploy-mode cluster \
      --driver-java-options "-Dlogback.configurationFile=conf/i${1}Logback.xml" \
      --class com.example.Launcher --driver-class-path

deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Guillermo Ortiz Fernández
I want to execute my processes in cluster mode. As I don't know where the driver will be executed, I have to make all the files it needs available. I understand that there are two options: copy all the files to every node, or copy them to HDFS. My doubt is, if I want to put all the files in HDFS, isn't

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Deepak Sharma
Try this:

    import spark.implicits._
    df.toDF()

On Wed, Sep 5, 2018 at 2:31 PM Mich Talebzadeh wrote: > With the following > > case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Float) > > var key = line._2.split(',').view(0).toString > var ticker =

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
With the following

    case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Float)

    var key = line._2.split(',').view(0).toString
    var ticker = line._2.split(',').view(1).toString
    var timeissued = line._2.split(',').view(2).toString
    var price =

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
Thanks! Spark is version 2.3.0.

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Jungtaek Lim
You may also find the link below useful (though it looks quite old). Since a case class is exactly the kind of type for which an Encoder is available, there may be another reason preventing the implicit conversion.
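
One way to test that theory (my suggestion, not from the thread) is to materialize the Encoder explicitly, so a failure points at the type itself rather than at the implicits import; a minimal sketch using the columns case class from this thread:

    import org.apache.spark.sql.{Encoder, Encoders}

    // Sketch only: derive the product encoder by hand; this only compiles
    // when the case class is visible at top level.
    case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Float)
    implicit val columnsEncoder: Encoder[columns] = Encoders.product[columns]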

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Jungtaek Lim
Sorry, I guess I pasted another method. The code is:

    implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = {
      DatasetHolder(_sqlContext.createDataset(s))
    }

On Wed, Sep 5, 2018 at 5:30 PM, Jungtaek Lim wrote: > I guess you need to have an encoder for the type of result for

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Jungtaek Lim
I guess you need to have an Encoder for the result type of columns(). https://github.com/apache/spark/blob/2119e518d31331e65415e0f817a6f28ff18d2b42/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L227-L229

    implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]):

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
Thanks, I already do that, as below:

    val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
    import sqlContext.implicits._

but still getting the error!
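
A common cause of this exact symptom (my observation; the thread never states where the class is declared) is a case class defined inside the method that calls toDF, which prevents the implicit Encoder from being derived. A minimal sketch with the class at top level and made-up sample values:

    import org.apache.spark.sql.SparkSession

    // Sketch only: the case class must live at top level, outside any
    // method or closure, for toDF on Seq[columns] to resolve.
    case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Float)

    object ToDfDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("toDF-demo").getOrCreate()
        import spark.implicits._ // brings the Seq-to-Dataset conversion into scope

        val df = Seq(columns("key1", "IBM", "2018-09-05 10:00:00", 145.3f)).toDF()
        df.printSchema()
        spark.stop()
      }
    }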

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Jungtaek Lim
You may need to import implicits from your spark session, like below (the code is borrowed from https://spark.apache.org/docs/latest/sql-programming-guide.html):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
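
For completeness, the cited docs example continues roughly as below (reproduced from memory, so treat the exact config line as approximate):

    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()

    // This import is the piece that makes .toDF() available:
    import spark.implicits._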

getting error: value toDF is not a member of Seq[columns]

2018-09-05 Thread Mich Talebzadeh
Hi, I have Spark Streaming sending data and I need to put that data into MongoDB for test purposes. The easiest way is to create a DF from the individual list of columns as below. I loop over individual rows in the RDD and perform the following:

    case class columns(KEY: String, TICKER: String,
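
For the MongoDB side, the mongo-spark-connector's DataFrame writer is one option; this is my assumption about the setup (connector 2.x API recalled from memory, not stated in the thread), and the URI is a placeholder:

    import org.apache.spark.sql.DataFrame

    // Sketch only: verify option names against the mongo-spark-connector
    // docs for your version before relying on this.
    def saveToMongo(df: DataFrame): Unit = {
      df.write
        .format("com.mongodb.spark.sql.DefaultSource")
        .option("uri", "mongodb://localhost/test.prices") // placeholder URI
        .mode("append")
        .save()
    }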