help/suggestions to setup spark cluster

2017-04-26 Thread anna stax
I need to set up a spark cluster for Spark streaming, scheduled batch
jobs and ad hoc queries.
Please give me some suggestions. Can this be done in standalone mode?

Right now we have a spark cluster in standalone mode on AWS EC2 running a
spark streaming application. Can we run spark batch jobs and zeppelin on
the same cluster? Do we need a better resource manager like Mesos?

Are there any companies or individuals that can help in setting this up?

Thank you.
-Anna


Re: help/suggestions to setup spark cluster

2017-04-26 Thread anna stax
Hi Sam,

Thank you for the reply.

What do you mean by
I doubt people run spark in a single EC2 instance, certainly not in
production I don't think

What is wrong with having a data pipeline on EC2 that reads data from kafka,
processes it using spark and outputs to cassandra? Please explain.

Thanks
-Anna

On Wed, Apr 26, 2017 at 2:22 PM, Sam Elamin  wrote:

> Hi Anna
>
> There are a variety of options for launching spark clusters. I doubt
> people run spark in a single EC2 instance, certainly not in production I
> don't think
>
> I don't have enough information about what you are trying to do, but if you
> are just trying to set things up from scratch then I think you can just use
> EMR, which will create a cluster for you and attach a zeppelin instance as
> well
>
>
> You can also use databricks for ease of use and very little management but
> you will pay a premium for that abstraction
>
>
> Regards
> Sam
> On Wed, 26 Apr 2017 at 22:02, anna stax  wrote:
>
>> I need to setup a spark cluster for Spark streaming and scheduled batch
>> jobs and adhoc queries.
>> Please give me some suggestions. Can this be done in standalone mode.
>>
>> Right now we have a spark cluster in standalone mode on AWS EC2 running
>> spark streaming application. Can we run spark batch jobs and zeppelin on
>> the same. Do we need a better resource manager like Mesos?
>>
>> Are there any companies or individuals that can help in setting this up?
>>
>> Thank you.
>>
>> -Anna
>>
>


Re: help/suggestions to setup spark cluster

2017-04-26 Thread anna stax
Thanks Cody,

As I already mentioned, I am running spark streaming on an EC2 cluster in
standalone mode. Now, in addition to streaming, I want to be able to run a
spark batch job hourly and ad hoc queries using Zeppelin.

Can you please confirm that a standalone cluster is OK for this? Please
provide me some links to help me get started.

Thanks
-Anna

On Wed, Apr 26, 2017 at 7:46 PM, Cody Koeninger  wrote:

> The standalone cluster manager is fine for production.  Don't use Yarn
> or Mesos unless you already have another need for it.
>
> On Wed, Apr 26, 2017 at 4:53 PM, anna stax  wrote:
> > Hi Sam,
> >
> > Thank you for the reply.
> >
> > What do you mean by
> > I doubt people run spark in a. Single EC2 instance, certainly not in
> > production I don't think
> >
> > What is wrong in having a data pipeline on EC2 that reads data from
> kafka,
> > processes using spark and outputs to cassandra? Please explain.
> >
> > Thanks
> > -Anna
> >
> > On Wed, Apr 26, 2017 at 2:22 PM, Sam Elamin 
> wrote:
> >>
> >> Hi Anna
> >>
> >> There are a variety of options for launching spark clusters. I doubt
> >> people run spark in a. Single EC2 instance, certainly not in production
> I
> >> don't think
> >>
> >> I don't have enough information of what you are trying to do but if you
> >> are just trying to set things up from scratch then I think you can just
> use
> >> EMR which will create a cluster for you and attach a zeppelin instance
> as
> >> well
> >>
> >>
> >> You can also use databricks for ease of use and very little management
> but
> >> you will pay a premium for that abstraction
> >>
> >>
> >> Regards
> >> Sam
> >> On Wed, 26 Apr 2017 at 22:02, anna stax  wrote:
> >>>
> >>> I need to setup a spark cluster for Spark streaming and scheduled batch
> >>> jobs and adhoc queries.
> >>> Please give me some suggestions. Can this be done in standalone mode.
> >>>
> >>> Right now we have a spark cluster in standalone mode on AWS EC2 running
> >>> spark streaming application. Can we run spark batch jobs and zeppelin
> on the
> >>> same. Do we need a better resource manager like Mesos?
> >>>
> >>> Are there any companies or individuals that can help in setting this
> up?
> >>>
> >>> Thank you.
> >>>
> >>> -Anna
> >
> >
>
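
A note on the point above (this sketch is not from the thread; the master URL,
core count and memory are placeholders): the standalone cluster manager can run
the streaming app, hourly batch jobs and Zeppelin side by side, but each
application should cap its resources with spark.cores.max, otherwise the first
application to register takes every available core and later submissions just
wait for resources.

import org.apache.spark.sql.SparkSession

object HourlyBatchJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hourly-batch-job")
      .master("spark://<master-host>:7077")    // standalone master (placeholder)
      .config("spark.cores.max", "4")          // leave cores free for the always-on streaming app
      .config("spark.executor.memory", "4g")   // placeholder sizing
      .getOrCreate()

    // ... hourly batch logic goes here ...

    spark.stop()
  }
}

The same spark.cores.max setting can be put in Zeppelin's Spark interpreter
configuration so that ad hoc queries also stay within a fixed share of the
cluster.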


Spark standalone, client mode. How do I monitor?

2017-06-27 Thread anna stax
Hi all,

I have a spark standalone cluster. I am running a spark streaming
application on it and the deploy mode is client. I am looking for the best
way to monitor the cluster and application so that I will know when the
application/cluster is down. I cannot move to cluster deploy mode now.

I appreciate your thoughts.

Thanks
-Anna
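
One lightweight way to monitor this setup (a sketch, not from the thread; the
host, port and application name are placeholders): the standalone master serves
its status as JSON at http://<master-host>:8080/json, so a small check run from
cron can poll it and alert when the master is not ALIVE or the streaming
application is no longer listed. Since the driver runs in client mode on a
known host, its REST endpoint at http://<driver-host>:4040/api/v1/applications
can be polled the same way.

import scala.io.Source
import scala.util.Try

object ClusterHealthCheck {
  // Placeholders -- point these at your standalone master UI and your app name.
  val masterJsonUrl = "http://<master-host>:8080/json"
  val appName       = "my-streaming-app"

  def main(args: Array[String]): Unit = {
    val body    = Try(Source.fromURL(masterJsonUrl).mkString).getOrElse("")
    // Crude string checks; a JSON parser would be more robust.
    val healthy = body.contains("ALIVE") && body.contains(appName)
    if (!healthy) {
      // Hook alerting (email, CloudWatch alarm, PagerDuty, ...) in here.
      System.err.println(s"ALERT: master unreachable or $appName not running")
      sys.exit(1)
    }
  }
}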


Logging in Spark streaming application

2017-07-06 Thread anna stax
Do I need to include the log4j dependencies in the pom.xml of my spark
streaming application, or is it already included in the spark libraries?

I am running Spark in standalone mode on AWS EC2.

Thanks
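
For what it's worth (not from the thread): the Spark 2.x distribution already
ships log4j 1.2 and slf4j on its classpath, so a streaming application normally
does not need to bundle log4j itself -- the spark-core/spark-streaming
dependencies are enough, and log levels are controlled by conf/log4j.properties
on the cluster. A minimal usage sketch:

import org.apache.log4j.{Level, LogManager}

object MyStreamingApp {
  def main(args: Array[String]): Unit = {
    // log4j is already on the Spark classpath; no extra pom entry is usually needed.
    val log = LogManager.getLogger("MyStreamingApp")
    log.setLevel(Level.INFO)
    log.info("streaming application starting")
    // ... build the StreamingContext, start it, awaitTermination ...
  }
}

Note that log lines written inside closures run on the executors and end up in
each worker's stderr/stdout (visible from the standalone web UI), not in the
driver's console.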


Re: Spark streaming, Storage tab questions

2017-07-09 Thread anna stax
On Sun, Jul 9, 2017 at 4:33 PM, anna stax  wrote:

> Does each row represent the state of my app at a different time?
>
> When the fraction cached is 90% and the size on disk is 0, does that mean
> 10% of the data is lost? It's neither in memory nor on disk?
>
> I am running spark streaming in standalone mode, and I am using mapWithState.
> It runs fine for several hours and then fails, so I am trying to debug what is
> causing it to fail.
>
> Please share your thoughts and experiences.
>
> Thanks
>
>
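
Two notes on this, hedged since the actual failure cause is not shown in the
thread: a fraction cached below 100% with size on disk 0 usually means some
partitions were evicted from memory (the state RDDs are cached memory-only by
default) and will be recomputed from the checkpoint/lineage when needed, rather
than lost. Separately, a common reason mapWithState jobs degrade after several
hours is state that grows without bound; if idle keys can be dropped, a
StateSpec timeout keeps the state RDD from growing. A sketch, assuming the
existing job already has a keyed DStream[(String, Long)]:

import org.apache.spark.streaming.{Minutes, State, StateSpec}
import org.apache.spark.streaming.dstream.DStream

def withBoundedState(keyedStream: DStream[(String, Long)]): DStream[(String, Long)] = {
  val mappingFunc = (key: String, value: Option[Long], state: State[Long]) => {
    val sum = value.getOrElse(0L) + state.getOption.getOrElse(0L)
    if (!state.isTimingOut()) state.update(sum)   // a key that is timing out cannot be updated
    (key, sum)
  }
  // Drop state for keys that have been idle for 30 minutes (placeholder duration).
  keyedStream.mapWithState(StateSpec.function(mappingFunc).timeout(Minutes(30)))
}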


Spark streaming for CEP

2017-10-18 Thread anna stax
Hello all,

Has anyone used spark streaming for CEP (Complex Event Processing)? Are there
any CEP libraries that work well with spark? I have a use case for CEP and am
trying to see if spark streaming is a good fit.

Currently we have a data pipeline using Kafka, Spark streaming and
Cassandra for data ingestion and near real time dashboard.

Please share your experience.
Thanks much.
-Anna


Re: Spark streaming for CEP

2017-10-25 Thread anna stax
Thanks very much Mich, Thomas and Stephan. I will look into it.

On Tue, Oct 24, 2017 at 8:02 PM, lucas.g...@gmail.com 
wrote:

> This looks really interesting, thanks for linking!
>
> Gary Lucas
>
> On 24 October 2017 at 15:06, Mich Talebzadeh 
> wrote:
>
>> Great thanks Steve
>>
>> On 24 October 2017 at 22:58, Stephen Boesch  wrote:
>>
>>> Hi Mich, the github link has a brief intro - including a link to the
>>> formal docs http://logisland.readthedocs.io/en/latest/index.html .
>>>  They have an architectural overview, developer guide, tutorial, and pretty
>>> comprehensive api docs.
>>>
>>> 2017-10-24 13:31 GMT-07:00 Mich Talebzadeh :
>>>
>>>> thanks Thomas.
>>>>
>>>> do you have a summary write-up for this tool please?
>>>>
>>>>
>>>> regards,
>>>>
>>>> On 24 October 2017 at 13:53, Thomas Bailet 
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> we (@ hurence) have released an open source middleware based on
>>>>> Spark Streaming over Kafka to do CEP and log mining, called *logisland*
>>>>> (https://github.com/Hurence/logisland/). It has been deployed in
>>>>> production for 2 years now and does a great job. You should have a look.
>>>>>
>>>>>
>>>>> bye
>>>>>
>>>>> Thomas Bailet
>>>>>
>>>>> CTO : hurence
>>>>>
>>>>> On 18/10/17 at 22:05, Mich Talebzadeh wrote:
>>>>>
>>>>> As you may be aware the granularity that Spark streaming has is
>>>>> micro-batching and that is limited to 0.5 second. So if you have 
>>>>> continuous
>>>>> ingestion of data then Spark streaming may not be granular enough for CEP.
>>>>> You may consider other products.
>>>>>
>>>>> Worth looking at this old thread on mine "Spark support for Complex
>>>>> Event Processing (CEP)
>>>>>
>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201604.mbox/%3CCAJ3fcbB8eaf0JV84bA7XGUK5GajC1yGT3ZgTNCi8arJg56=LbQ@mail.gmail.com%3E
>>>>>
>>>>> HTH
>>>>>
>>>>> On 18 October 2017 at 20:52, anna stax  wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> Has anyone used spark streaming for CEP (Complex Event processing).
>>>>>> Any CEP libraries that works well with spark. I have a use case for CEP 
>>>>>> and
>>>>>> trying to see if spark streaming is a good fit.
>>>>>>
>>>>>> Currently we have a data pipeline using Kafka, Spark streaming and
>>>>>> Cassandra for data ingestion and near real time dashboard.
>>>>>>
>>>>>> Please share your experience.
>>>>>> Thanks much.
>>>>>> -Anna
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: How do I save the dataframe data as a pdf file?

2017-12-12 Thread anna stax
Thanks Anthony for the response.

Yes, the data in the dataframe represents a report and I want to create pdf
files.
I am using scala, so I am hoping to find an easier solution in scala; if not, I
will try out your suggestion.


On Tue, Dec 12, 2017 at 11:29 AM, Anthony Thomas 
wrote:

> Are you trying to produce a formatted table in a pdf file where the
> numbers in the table come from a dataframe? I.e. to present summary
> statistics or other aggregates? If so I would guess your best bet would be
> to collect the dataframe as a Pandas dataframe and use the to_latex method.
> You can then use a standard latex compiler to produce a pdf with a table
> containing that data. I don't know if there's any comparable built-in for
> Scala, but you could always collect the data as an array of arrays and
> write these to a tex file using standard IO. Maybe someone has an easier
> suggestion.
>
> On Tue, Dec 12, 2017 at 11:12 AM, shyla deshpande <
> deshpandesh...@gmail.com> wrote:
>
>> Hello all,
>>
>> Is there a way to write the dataframe data as a pdf file?
>>
>> Thanks
>> -Shyla
>>
>
>
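
A sketch of the Scala route described above: collect the (small, aggregated)
result, write a LaTeX tabular to a .tex file with standard IO, and compile it
to PDF with an external tool such as pdflatex. The helper name is made up for
illustration, and cell values are not escaped for LaTeX special characters.

import java.io.PrintWriter
import org.apache.spark.sql.DataFrame

def writeLatexTable(df: DataFrame, path: String): Unit = {
  val cols = df.columns
  val rows = df.collect()                      // only safe for small, aggregated results
  val sb = new StringBuilder
  sb.append("\\documentclass{article}\n\\begin{document}\n")
  sb.append("\\begin{tabular}{" + ("l" * cols.length) + "}\n\\hline\n")
  sb.append(cols.mkString(" & ")).append(" \\\\ \\hline\n")
  rows.foreach { r =>
    val cells = r.toSeq.map(v => Option(v).map(_.toString).getOrElse(""))
    sb.append(cells.mkString(" & ")).append(" \\\\\n")
  }
  sb.append("\\hline\n\\end{tabular}\n\\end{document}\n")
  val out = new PrintWriter(path)
  try out.write(sb.toString) finally out.close()
}

// writeLatexTable(reportDf, "report.tex"), then run pdflatex report.tex outside Spark.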


Unable to see the table created using saveAsTable From Beeline. Please help!

2018-07-06 Thread anna stax
I am running spark 2.1.0 on AWS EMR

In my Zeppelin Note I am creating a table

df.write
  .format("parquet")
  .saveAsTable("default.1test")

and I see the table when I

spark.catalog.listTables().show()
+-----+--------+-----------+---------+-----------+
| name|database|description|tableType|isTemporary|
+-----+--------+-----------+---------+-----------+
|1test| default|       null|  MANAGED|      false|
+-----+--------+-----------+---------+-----------+


From Beeline client, I don’t see the table

0: jdbc:hive2://localhost:10001/> show tables;
+-----------+------------+--------------+
| database  | tableName  | isTemporary  |
+-----------+------------+--------------+
+-----------+------------+--------------+
No rows selected (0.115 seconds)


Re: Unable to see the table created using saveAsTable From Beeline. Please help!

2018-07-07 Thread anna stax
Is some configuration missing? Appreciate any help.

On Fri, Jul 6, 2018 at 4:10 PM, anna stax  wrote:

> I am  running spark 2.1.0 on AWS EMR
>
> In my Zeppelin Note I am creating a table
>
> df.write
>   .format("parquet")
>   .saveAsTable("default.1test")
>
> and I see the table when I
>
> spark.catalog.listTables().show()
> +-----+--------+-----------+---------+-----------+
> | name|database|description|tableType|isTemporary|
> +-----+--------+-----------+---------+-----------+
> |1test| default|       null|  MANAGED|      false|
> +-----+--------+-----------+---------+-----------+
>
>
> From Beeline client, I don’t see the table
>
> 0: jdbc:hive2://localhost:10001/> show tables;
> +-----------+------------+--------------+
> | database  | tableName  | isTemporary  |
> +-----------+------------+--------------+
> +-----------+------------+--------------+
> No rows selected (0.115 seconds)
>
>
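
One thing worth checking -- this is an assumption about the cause, not
something confirmed in the thread -- is whether the SparkSession behind the
Zeppelin note and the thrift server that Beeline connects to are using the same
Hive metastore. If the notebook session is not Hive-enabled it saves the table
into its own catalog, and Beeline will never see it. A minimal sketch, with the
metastore host as a placeholder:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("save-as-table-example")
  .enableHiveSupport()   // persist tables in the Hive metastore, not the session-local catalog
  .config("hive.metastore.uris", "thrift://<metastore-host>:9083")   // must match the thrift server's metastore
  .getOrCreate()

// df is the DataFrame from the post above.
df.write
  .format("parquet")
  .saveAsTable("default.1test")

spark.sql("SHOW TABLES IN default").show()

On EMR it is also worth checking that Zeppelin's Spark interpreter and the
Spark Thrift Server are reading the same hive-site.xml.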


Writing the contents of spark dataframe to Kafka with Spark 2.2

2019-03-18 Thread anna stax
Hi all,
I am unable to write the contents of a spark dataframe to Kafka.
I am using Spark 2.2.

This is my code

val df = Seq(("1","One"),("2","two")).toDF("key","value")
df.printSchema()
df.show(false)
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "127.0.0.1:9092")
  .option("topic", "testtopic")
  .save()

and I am getting the following error message
[Stage 0:>  (0 + 2)
/ 2]Exception in thread "main" org.apache.spark.SparkException: Job aborted
due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent
failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver):
java.lang.NoSuchMethodError:
org.apache.spark.sql.catalyst.expressions.Cast$.apply$default$3()Lscala/Option;
at
org.apache.spark.sql.kafka010.KafkaWriteTask.createProjection(KafkaWriteTask.scala:112)
at
org.apache.spark.sql.kafka010.KafkaWriteTask.(KafkaWriteTask.scala:39)
at
org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(KafkaWriter.scala:90)
at
org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(KafkaWriter.scala:89)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I have added this dependency

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.2.2</version>
</dependency>

Appreciate any help. Thanks.
https://stackoverflow.com/questions/55229945/writing-the-contents-of-spark-dataframe-to-kafka-with-spark-2-2


Re: Writing the contents of spark dataframe to Kafka with Spark 2.2

2019-03-19 Thread anna stax
Hi Gabor,

Thank you for the response.

I do have those dependencies added.

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.2</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.2.2</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.2.2</version>
</dependency>

and my kafka version is kafka_2.11-1.1.0


On Tue, Mar 19, 2019 at 12:48 AM Gabor Somogyi 
wrote:

> Hi Anna,
>
>   Have you added spark-sql-kafka-0-10_2.11:2.2.0 package as well?
> Further info can be found here:
> https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html#deploying
> The same --packages option can be used with spark-shell as well...
>
> BR,
> G
>
>
> On Mon, Mar 18, 2019 at 10:07 PM anna stax  wrote:
>
>> Hi all,
>> I am unable to write the contents of spark dataframe to Kafka.
>> I am using Spark 2.2
>>
>> This is my code
>>
>> val df = Seq(("1","One"),("2","two")).toDF("key","value")
>> df.printSchema()
>> df.show(false)
>> df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
>>   .write
>>   .format("kafka")
>>   .option("kafka.bootstrap.servers", "127.0.0.1:9092")
>>   .option("topic", "testtopic")
>>   .save()
>>
>> and I am getting the following error message
>> [Stage 0:>  (0 +
>> 2) / 2]Exception in thread "main" org.apache.spark.SparkException: Job
>> aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
>> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor
>> driver): java.lang.NoSuchMethodError:
>> org.apache.spark.sql.catalyst.expressions.Cast$.apply$default$3()Lscala/Option;
>> at
>> org.apache.spark.sql.kafka010.KafkaWriteTask.createProjection(KafkaWriteTask.scala:112)
>> at
>> org.apache.spark.sql.kafka010.KafkaWriteTask.(KafkaWriteTask.scala:39)
>> at
>> org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(KafkaWriter.scala:90)
>> at
>> org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(KafkaWriter.scala:89)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>> at org.apache.spark.scheduler.Task.run(Task.scala:99)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> I have added this dependency
>>
>> 
>>   org.apache.spark
>>   spark-sql_2.11
>>   2.2.2
>> 
>>
>> Appreciate any help. Thanks.
>>
>> https://stackoverflow.com/questions/55229945/writing-the-contents-of-spark-dataframe-to-kafka-with-spark-2-2
>>
>


Re: Writing the contents of spark dataframe to Kafka with Spark 2.2

2019-03-19 Thread anna stax
Hi Gabor,

I am just trying things out, desperately trying to make writing to kafka work
using spark-sql-kafka.
In my deployment project I do use the provided scope.


On Tue, Mar 19, 2019 at 8:50 AM Gabor Somogyi 
wrote:

> Hi Anna,
>
> Looks like some sort of version mismatch.
>
> Presume scala version double checked...
>
> Not sure why the mentioned artifacts are not in provided scope.
> It will end-up in significantly smaller jar + these artifacts should be
> available on the cluster (either by default for example core or due to
> --packages for example the sql-kafka).
>
> BR,
> G
>
>
> On Tue, Mar 19, 2019 at 4:35 PM anna stax  wrote:
>
>> Hi Gabor,
>>
>> Thank you for the response.
>>
>> I do have those dependencies added.
>>
>>  
>>   org.apache.spark
>>   spark-core_2.11
>>   2.2.2
>> 
>> 
>>   org.apache.spark
>>   spark-sql_2.11
>>   2.2.2
>> 
>> 
>>   org.apache.spark
>>   spark-sql-kafka-0-10_2.11
>>   2.2.2
>> 
>>
>> and my kafka version is kafka_2.11-1.1.0
>>
>>
>> On Tue, Mar 19, 2019 at 12:48 AM Gabor Somogyi 
>> wrote:
>>
>>> Hi Anna,
>>>
>>>   Have you added spark-sql-kafka-0-10_2.11:2.2.0 package as well?
>>> Further info can be found here:
>>> https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html#deploying
>>> The same --packages option can be used with spark-shell as well...
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Mon, Mar 18, 2019 at 10:07 PM anna stax  wrote:
>>>
>>>> Hi all,
>>>> I am unable to write the contents of spark dataframe to Kafka.
>>>> I am using Spark 2.2
>>>>
>>>> This is my code
>>>>
>>>> val df = Seq(("1","One"),("2","two")).toDF("key","value")
>>>> df.printSchema()
>>>> df.show(false)
>>>> df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
>>>>   .write
>>>>   .format("kafka")
>>>>   .option("kafka.bootstrap.servers", "127.0.0.1:9092")
>>>>   .option("topic", "testtopic")
>>>>   .save()
>>>>
>>>> and I am getting the following error message
>>>> [Stage 0:>  (0
>>>> + 2) / 2]Exception in thread "main" org.apache.spark.SparkException: Job
>>>> aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
>>>> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor
>>>> driver): java.lang.NoSuchMethodError:
>>>> org.apache.spark.sql.catalyst.expressions.Cast$.apply$default$3()Lscala/Option;
>>>> at
>>>> org.apache.spark.sql.kafka010.KafkaWriteTask.createProjection(KafkaWriteTask.scala:112)
>>>> at
>>>> org.apache.spark.sql.kafka010.KafkaWriteTask.(KafkaWriteTask.scala:39)
>>>> at
>>>> org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(KafkaWriter.scala:90)
>>>> at
>>>> org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(KafkaWriter.scala:89)
>>>> at
>>>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
>>>> at
>>>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
>>>> at
>>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
>>>> at
>>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>>>> at org.apache.spark.scheduler.Task.run(Task.scala:99)
>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> I have added this dependency
>>>>
>>>> 
>>>>   org.apache.spark
>>>>   spark-sql_2.11
>>>>   2.2.2
>>>> 
>>>>
>>>> Appreciate any help. Thanks.
>>>>
>>>> https://stackoverflow.com/questions/55229945/writing-the-contents-of-spark-dataframe-to-kafka-with-spark-2-2
>>>>
>>>