Exception in Shutdown-thread, bad file descriptor

2017-12-20 Thread Noorul Islam Kamal Malmiyoda
Hi all, We are getting the following exception and this somehow blocks the parent thread from proceeding further. 17/11/14 16:50:09 SPARK_APP WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/11/14 16:50:17 SPARK_APP

Controlling the number of Spark partitions in DataFrames

2017-10-26 Thread Noorul Islam Kamal Malmiyoda
Hi all, I have the following Spark configuration: spark.app.name=Test spark.cassandra.connection.host=127.0.0.1 spark.cassandra.connection.keep_alive_ms=5000 spark.cassandra.connection.port=1 spark.cassandra.connection.timeout_ms=3 spark.cleaner.ttl=3600 spark.default.parallelism=4
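
A minimal sketch of the usual knobs for controlling DataFrame partition counts, assuming Spark 2.x with a SparkSession; the Cassandra keyspace/table names and the target counts below are placeholders, not values from the thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("Test")
      .config("spark.sql.shuffle.partitions", "4") // partitions produced by shuffles (joins, aggregations)
      .getOrCreate()

    // Placeholder keyspace/table for the Cassandra source
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))
      .load()

    // Explicitly control the partition count of an existing DataFrame
    val repartitioned = df.repartition(4) // full shuffle into exactly 4 partitions
    val narrowed = df.coalesce(2)         // merge partitions without a full shuffle
    println(repartitioned.rdd.getNumPartitions)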

What is the equivalent of foreachRDD in DataFrames?

2017-10-26 Thread Noorul Islam Kamal Malmiyoda
Hi all, I have a DataFrame with 1000 records. I want to split them into batches of 100 each and post them to a REST API. If it were an RDD, I could use something like this: myRDD.foreachRDD { rdd => rdd.foreachPartition { partition => { This will ensure that the code is executed on the executors
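
A sketch of the DataFrame equivalent, assuming Spark 2.x; postBatch is a hypothetical stand-in for the actual REST call and is not part of any Spark API:

    import org.apache.spark.sql.{DataFrame, Row}

    // Hypothetical helper standing in for the HTTP POST to the REST API.
    def postBatch(rows: Seq[Row]): Unit = { /* post the batch here */ }

    def postInBatches(df: DataFrame, batchSize: Int = 100): Unit = {
      // Like rdd.foreachPartition, this function runs on the executors.
      val postPartition: Iterator[Row] => Unit =
        partition => partition.grouped(batchSize).foreach(postBatch)
      df.foreachPartition(postPartition)
    }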

Re: How best can we store streaming data on dashboards for a real-time user experience?

2017-03-29 Thread Noorul Islam Kamal Malmiyoda
I think a better place would be an in-memory cache for real time. Regards, Noorul On Thu, Mar 30, 2017 at 10:31 AM, Gaurav1809 wrote: > I am getting streaming data and want to show it on dashboards in real > time. > May I know how best we can handle these streaming
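
If "in-memory" is taken to mean keeping the latest results inside Spark itself, one way to sketch it is Structured Streaming's memory sink, which exposes the running aggregate as a table a dashboard backend can poll; the socket source and the count aggregation are assumptions for illustration, and an external cache such as Redis would be the other common reading of the advice:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("dashboard-feed").getOrCreate()

    // Placeholder source; any streaming source works the same way.
    val events = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val counts = events.groupBy("value").count()

    // The memory sink keeps the latest result in an in-memory table.
    val query = counts.writeStream
      .format("memory")
      .queryName("dashboard_counts")
      .outputMode("complete")
      .start()

    // A dashboard backend can poll this table for the current state.
    spark.sql("SELECT * FROM dashboard_counts").show()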

Application kill from UI does not propagate exception

2017-03-24 Thread Noorul Islam Kamal Malmiyoda
Hi all, I am trying to trap the UI kill event of a Spark application from the driver. Somehow the exception thrown is not propagated to the driver main program. See, for example, the spark-shell session below. Is there a way to get hold of this event and shut down the driver program? Regards, Noorul
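
One way to at least observe the event on the driver is a SparkListener; this is only a sketch, and the thread does not confirm that the callback fires for every kill scenario or that exiting from it is the right cleanup strategy:

    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

    def installShutdownHook(sc: SparkContext): Unit = {
      sc.addSparkListener(new SparkListener {
        override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
          // Driver-side cleanup, then stop the main program.
          println(s"Application ended at ${end.time}, shutting down driver")
          sys.exit(1)
        }
      })
    }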

Application kill from UI does not propagate exception

2017-03-23 Thread Noorul Islam Kamal Malmiyoda
Hi all, I am trying to trap the UI kill event of a Spark application from the driver. Somehow the exception thrown is not propagated to the driver main program. See, for example, the spark-shell session below. Is there a way to get hold of this event and shut down the driver program? Regards, Noorul

Re: How do I deal with an ever-growing application log

2017-03-05 Thread Noorul Islam Kamal Malmiyoda
Or you could use sinks like elasticsearch. Regards, Noorul On Mon, Mar 6, 2017 at 10:52 AM, devjyoti patra wrote: > Timothy, why are you writing application logs to HDFS? In case you want to > analyze these logs later, you can write to local storage on your slave nodes > and

Re: Testing --supervise flag

2016-08-02 Thread Noorul Islam Kamal Malmiyoda
Widening to dev@spark On Mon, Aug 1, 2016 at 4:21 PM, Noorul Islam K M wrote: > > Hi all, > > I was trying to test the --supervise flag of spark-submit. > > The documentation [1] says that the flag helps in restarting your > application automatically if it exited with a non-zero

Re: Application not showing in Spark History

2016-08-02 Thread Noorul Islam Kamal Malmiyoda
Have you tried https://github.com/spark-jobserver/spark-jobserver On Tue, Aug 2, 2016 at 2:23 PM, Rychnovsky, Dusan wrote: > Hi, > > > I am trying to launch my Spark application from within my Java application > via the SparkSubmit class, like this: > > > > List
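
Besides spark-jobserver, a programmatic alternative to calling the SparkSubmit class directly is the SparkLauncher API; the paths, class name, master URL and event-log directory below are placeholders, and event logging must be enabled for the application to show up in the History Server:

    import org.apache.spark.launcher.SparkLauncher

    val handle = new SparkLauncher()
      .setAppResource("/path/to/my-app.jar")
      .setMainClass("com.example.MyApp")
      .setMaster("spark://master-host:7077")
      .setDeployMode("cluster")
      .setConf("spark.eventLog.enabled", "true")          // required for the History Server
      .setConf("spark.eventLog.dir", "hdfs:///spark-events")
      .startApplication() // returns a SparkAppHandle that reports state changes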

Re: When worker is killed, driver continues to run, causing issues in supervise mode

2016-07-13 Thread Noorul Islam Kamal Malmiyoda
Adding dev list On Jul 13, 2016 5:38 PM, "Noorul Islam K M" wrote: > > Spark version: 1.6.1 > Cluster Manager: Standalone > > I am experimenting with cluster mode deployment along with supervise for > high availability of streaming applications. > > 1. Submit a streaming job

Cassandra read throughput using DataStax connector in Spark

2015-12-26 Thread Noorul Islam Kamal Malmiyoda
Hello all, I am using the DataStax connector to read data from Cassandra and write to another Cassandra cluster. The infrastructure is Amazon. I have a three-node cluster with a replication factor of 3 on both clusters, but the throughput seems to be very low. It takes 7 minutes to transfer around 2.5 GB/node. I
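
For reference, a sketch of a two-cluster copy with the DataStax connector, assuming an existing SparkContext sc; host addresses, keyspace and table names are placeholders, and throughput-related settings worth checking are noted in comments:

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.cql.CassandraConnector

    val sourceConf = sc.getConf.set("spark.cassandra.connection.host", "10.0.0.1")
    val targetConf = sc.getConf.set("spark.cassandra.connection.host", "10.0.1.1")

    // Read from the source cluster.
    // spark.cassandra.input.split.size_in_mb controls how many read partitions are created.
    val rows = {
      implicit val c = CassandraConnector(sourceConf)
      sc.cassandraTable("source_ks", "source_table")
    }

    // Write to the target cluster.
    // spark.cassandra.output.concurrent.writes and spark.cassandra.output.batch.size.rows affect write throughput.
    locally {
      implicit val c = CassandraConnector(targetConf)
      rows.saveToCassandra("target_ks", "target_table")
    }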