Re: [Structured Streaming] More than 1 streaming in a code

2018-04-06 Thread Panagiotis Garefalakis
Hello Aakash,

When you call query.awaitTermination you block at that point until that query
stops or throws an exception, so in your case the second query never even
starts.
What you could do instead is drop the per-query blocking calls and use
spark.streams.awaitAnyTermination (which waits for either query1 or query2 to
terminate). Make sure you call it after the query2.start call.
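
A minimal sketch of what I mean, using the DataFrame names from the code
below (untested, just to show the structure):

query1 = df.writeStream \
    .format("console") \
    .outputMode("complete") \
    .trigger(processingTime='3 seconds') \
    .start()          # start the first query, but do not block on it

query2 = wordCounts.writeStream \
    .format("console") \
    .trigger(processingTime='3 seconds') \
    .start()          # start the second query as well

# block here until either query terminates or fails
spark.streams.awaitAnyTermination()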

I hope this helps.

Cheers,
Panagiotis

On Fri, Apr 6, 2018 at 11:23 AM, Aakash Basu 
wrote:

> Any help?
>
> Need urgent help. Could someone please clarify this doubt?
>
> -- Forwarded message --
> From: Aakash Basu 
> Date: Thu, Apr 5, 2018 at 3:18 PM
> Subject: [Structured Streaming] More than 1 streaming in a code
> To: user 
>
>
> Hi,
>
> If I have more than one writeStream in my code, both operating on the same
> readStream data, why does only the first writeStream produce output? I want
> the second one to be printed to the console as well.
>
> How to do that?
>
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import split, col
>
> class test:
>
>     spark = SparkSession.builder \
>         .appName("Stream_Col_Oper_Spark") \
>         .getOrCreate()
>
>     data = spark.readStream.format("kafka") \
>         .option("startingOffsets", "latest") \
>         .option("kafka.bootstrap.servers", "localhost:9092") \
>         .option("subscribe", "test1") \
>         .load()
>
>     ID = data.select('value') \
>         .withColumn('value', data.value.cast("string")) \
>         .withColumn("Col1", split(col("value"), ",").getItem(0)) \
>         .withColumn("Col2", split(col("value"), ",").getItem(1)) \
>         .drop('value')
>
>     ID.createOrReplaceTempView("transformed_Stream_DF")
>
>     df = spark.sql("select avg(col1) as aver from transformed_Stream_DF")
>
>     df.createOrReplaceTempView("abcd")
>
>     wordCounts = spark.sql("Select col1, col2, col2/(select aver from abcd) col3 from transformed_Stream_DF")
>
>     # ---#
>
>     query1 = df \
>         .writeStream \
>         .format("console") \
>         .outputMode("complete") \
>         .trigger(processingTime='3 seconds') \
>         .start()
>
>     query1.awaitTermination()
>     # ---#
>
>     query2 = wordCounts \
>         .writeStream \
>         .format("console") \
>         .trigger(processingTime='3 seconds') \
>         .start()
>
>     query2.awaitTermination()
>
> # /home/kafka/Downloads/spark-2.3.0-bin-hadoop2.7/bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0,com.databricks:spark-csv_2.10:1.0.3 /home/aakashbasu/PycharmProjects/AllMyRnD/Kafka_Spark/Stream_Col_Oper_Spark.py
>
>
>
>
> Thanks,
> Aakash.
>
>


Mesos Spark Tasks - Lost

2015-05-19 Thread Panagiotis Garefalakis
Hello all,

I have been facing a weird issue for the last couple of days running Spark on
top of Mesos and I need your help. I am running Mesos in a private cluster and
managed to successfully deploy HDFS, Cassandra, Marathon and Play, but Spark
is not working for some reason. So far I have tried:
different Java versions (1.6 and 1.7, Oracle and OpenJDK), different
spark-env configurations, different Spark versions (from 0.8.8 to 1.3.1),
different HDFS versions (Hadoop 5.1 and 4.6), and updating pom dependencies.

More specifically, while local tasks complete fine, in cluster mode all the
tasks get lost (both using spark-shell and spark-submit).
From the worker log I see something like this:

---
I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
'hdfs:/:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
'hdfs://X:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
Client
I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
'hdfs://:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
'/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
'/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
into
'/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
*Error: Could not find or load main class two*

---

And from the Spark Terminal:

---
15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
SparkPi.scala:35
15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
SparkPi.scala:35
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 7 in stage 0.0 failed 4 times, most recent failure:
Lost task 7.3 in stage 0.0 (TID 26, ): ExecutorLostFailure
(executor lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
..
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

---

Any help will be greatly appreciated!

Regards,
Panagiotis


Re: Mesos Spark Tasks - Lost

2015-05-20 Thread Panagiotis Garefalakis
Tim, thanks for your reply,

I am following this quite clear mesos-spark tutorial:
https://docs.mesosphere.com/tutorials/run-spark-on-mesos/
Mainly I tried running spark-shell, which works fine locally, but when the
jobs are submitted through Mesos something goes wrong!

My question is: is there some extra configuration needed for the workers
(that is not mentioned in the tutorial)?
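
For context, the client-mode setup from the tutorial boils down to roughly
the following on the driver machine (a rough sketch with placeholder
hostnames/paths, not my exact commands):

# make the Mesos native library visible to the driver (path depends on the install)
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so

# launch against the Mesos master and tell executors where to fetch the Spark
# distribution from (same tarball the worker log shows being fetched)
./bin/spark-shell --master mesos://<mesos-master>:5050 \
  --conf spark.executor.uri=hdfs://<namenode>:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz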

The Executor Lost message I get is really generic, so I don't know what's
going on.
Please check the attached mesos execution event log.

Thanks again,
Panagiotis


On Wed, May 20, 2015 at 8:21 AM, Tim Chen  wrote:

> Can you share your exact spark-submit command line?
>
> Also, cluster mode is not released yet (it's coming in 1.4) and doesn't
> support spark-shell, so I think you're just using client mode unless you're
> running the latest master.
>
> Tim
>
> On Tue, May 19, 2015 at 8:57 AM, Panagiotis Garefalakis <
> panga...@gmail.com> wrote:
>
>> Hello all,
>>
>> I have been facing a weird issue for the last couple of days running Spark
>> on top of Mesos and I need your help. I am running Mesos in a private
>> cluster and managed to successfully deploy HDFS, Cassandra, Marathon and
>> Play, but Spark is not working for some reason. So far I have tried:
>> different Java versions (1.6 and 1.7, Oracle and OpenJDK), different
>> spark-env configurations, different Spark versions (from 0.8.8 to 1.3.1),
>> different HDFS versions (Hadoop 5.1 and 4.6), and updating pom dependencies.
>>
>> More specifically, while local tasks complete fine, in cluster mode all
>> the tasks get lost (both using spark-shell and spark-submit).
>> From the worker log I see something like this:
>>
>> ---
>> I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
>> 'hdfs:/:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
>> 'hdfs://X:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
>> Client
>> I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
>> 'hdfs://:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> into
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
>> *Error: Could not find or load main class two*
>>
>> ---
>>
>> And from the Spark Terminal:
>>
>> ---
>> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
>> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
>> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
>> SparkPi.scala:35
>> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
>> SparkPi.scala:35
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent
>> failure: Lost task 7.3 in stage 0.0 (TID 26, ): ExecutorLostFailure
>> (executor lost)
>> Driver stacktrace:
>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> ..
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> ---
>>
>> Any help will be greatly appreciated!
>>
>> Regards,
>> Panagiotis
>>
>
>


-sparklogs-spark-shell-1431993674182-EVENT_LOG_1
Description: Binary data

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org