Exception in Shutdown-thread, bad file descriptor

2017-12-20 Thread Noorul Islam Kamal Malmiyoda
Hi all,

We are getting the following exception and this somehow blocks the parent
thread from proceeding further.

17/11/14 16:50:09 SPARK_APP WARN NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
17/11/14 16:50:17 SPARK_APP WARN SparkContext: Use an existing
SparkContext, some configuration may not take effect.
[Stage 4:> (13 + 1) / 200][Stage 5:> (1 + 0) / 8][Stage 8:> (0 + 0) / 8]
Exception in thread "Shutdown-checker" java.lang.RuntimeException: eventfd_write() failed: Bad file descriptor
    at io.netty.channel.epoll.Native.eventFdWrite(Native Method)
    at io.netty.channel.epoll.EpollEventLoop.wakeup(EpollEventLoop.java:106)
    at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:538)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:146)
    at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:69)
    at com.datastax.driver.core.NettyOptions.onClusterClose(NettyOptions.java:193)
    at com.datastax.driver.core.Connection$Factory.shutdown(Connection.java:902)
    at com.datastax.driver.core.Cluster$Manager$ClusterCloseFuture$1.run(Cluster.java:2539)

This is very hard to replicate, so I have not been able to come up with a
reproducible recipe. Has anyone faced a similar issue? Any help is
appreciated.

Spark Version: 2.0.1


Thanks and Regards
Noorul


Controlling number of spark partitions in dataframes

2017-10-26 Thread Noorul Islam Kamal Malmiyoda
Hi all,

I have the following spark configuration

spark.app.name=Test
spark.cassandra.connection.host=127.0.0.1
spark.cassandra.connection.keep_alive_ms=5000
spark.cassandra.connection.port=1
spark.cassandra.connection.timeout_ms=3
spark.cleaner.ttl=3600
spark.default.parallelism=4
spark.master=local[2]
spark.ui.enabled=false
spark.ui.showConsoleProgress=false

Because I am setting spark.default.parallelism to 4, I was expecting
only 4 Spark partitions, but that does not seem to be the case.

When I do the following

df.foreachPartition { partition =>
  val groupedPartition = partition.toList.grouped(3).toList
  println("Grouped partition " + groupedPartition)
}

There are many print statements showing an empty list at the top; only
the partitions that actually contain data appear at the bottom. Is
there a way to control the number of partitions?
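
For reference, a minimal sketch of the two standard knobs for this in
Spark 2.x (the numbers are illustrative; spark.default.parallelism mainly
affects RDD operations, while DataFrame shuffles default to
spark.sql.shuffle.partitions = 200, which would explain the many empty
partitions):

  // cap the number of partitions produced by DataFrame shuffles
  // (assumes `spark` is the SparkSession)
  spark.conf.set("spark.sql.shuffle.partitions", "4")

  // or repartition explicitly before the action
  val df4 = df.repartition(4)   // exactly 4 partitions, incurs a shuffle
  val dfC = df.coalesce(4)      // at most 4 partitions, avoids a full shuffle

  df4.foreachPartition { partition =>
    println("Grouped partition " + partition.toList.grouped(3).toList)
  }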

Regards,
Noorul

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



What is the equivalent of foreachRDD in DataFrames?

2017-10-26 Thread Noorul Islam Kamal Malmiyoda
Hi all,

I have a DataFrame with 1000 records. I want to split them into batches
of 100 each and post each batch to a REST API.

If this were a DStream of RDDs, I could use something like this:

myDStream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // post the records in this partition to the REST API in batches
  }
}

This ensures that the code runs on the executors and not on the driver.

Is there a similar approach that we can take for DataFrames? I see
examples on Stack Overflow that use collect(), which will bring the
whole data set to the driver.
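
A minimal sketch of the DataFrame-side equivalent, assuming Spark 2.x
Dataset.foreachPartition (postBatch is a placeholder for the actual REST
call, not a real API):

  import org.apache.spark.sql.Row

  // placeholder for the real REST client; here it only logs the batch size
  def postBatch(batch: Seq[Row]): Unit = println(s"posting ${batch.size} records")

  // the closure runs on the executors; nothing is collected back to the driver
  df.foreachPartition { rows: Iterator[Row] =>
    rows.grouped(100).foreach(postBatch)   // batches of 100 records each
  }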

Thanks and Regards
Noorul

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: How best we can store streaming data on dashboards for real time user experience?

2017-03-29 Thread Noorul Islam Kamal Malmiyoda
I think a better place would be an in-memory cache for real-time use.
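
For example, with Structured Streaming the latest aggregates can be kept
in an in-memory table that a dashboard backend polls; a minimal sketch,
assuming a socket source for illustration (a Redis or similar external
cache would be the more robust production choice):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("DashboardCache").master("local[2]").getOrCreate()
  import spark.implicits._

  // illustrative source: lines read from a socket; any streaming source works
  val events = spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", "9999")
    .load()

  // keep the latest counts in an in-memory table the dashboard layer can poll
  val query = events
    .groupBy($"value".as("metric"))
    .count()
    .writeStream
    .outputMode("complete")
    .format("memory")               // results live in driver memory, queryable via SQL
    .queryName("dashboard_counts")
    .start()

  // the dashboard backend then polls, e.g.:
  spark.sql("SELECT * FROM dashboard_counts").show()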

Regards,
Noorul

On Thu, Mar 30, 2017 at 10:31 AM, Gaurav1809  wrote:
> I am getting streaming data and want to show it on dashboards in real
> time.
> How best can we handle this streaming data? Where should it be stored (DB,
> HDFS, or something else)?
> I want to give users a real-time analytics experience.
>
> Please suggest possible ways. Thanks.
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/How-best-we-can-store-streaming-data-on-dashboards-for-real-time-user-experience-tp28548.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Application kill from UI does not propagate exception

2017-03-24 Thread Noorul Islam Kamal Malmiyoda
Hi all,

I am trying to trap the UI kill event of a Spark application from the
driver. Somehow the exception thrown is not propagated to the driver's
main program. See the spark-shell example below.

Is there a way to get hold of this event and shut down the driver program?
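
A minimal sketch of one idea, assuming a SparkListener registered on the
driver (whether onApplicationEnd actually fires for this kill path is
exactly the open question):

  import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

  // register on the driver's SparkContext; exit the JVM when the application ends
  sc.addSparkListener(new SparkListener {
    override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
      System.err.println(s"Application ended at ${end.time}, shutting down driver")
      sys.exit(1)
    }
  })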

Regards,
Noorul


spark@spark1:~/spark-2.1.0/sbin$ spark-shell --master spark://10.29.83.162:7077
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
17/03/23 15:16:47 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where
applicable
17/03/23 15:16:53 WARN ObjectStore: Failed to get database
global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.29.83.162:4040
Spark context available as 'sc' (master = spark://10.29.83.162:7077,
app id = app-20170323151648-0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 17/03/23 15:17:28 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: Master removed our application: KILLED
17/03/23 15:17:28 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: KILLED
    at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:459)
    at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.dead(StandaloneSchedulerBackend.scala:139)
    at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint.markDead(StandaloneAppClient.scala:254)
    at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(StandaloneAppClient.scala:168)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@25b8f9d2

scala>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: How do I deal with ever growing application log

2017-03-05 Thread Noorul Islam Kamal Malmiyoda
Or you could use sinks like elasticsearch.
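
For the rotation approach Dev describes below, a minimal log4j sketch
(assuming the log4j 1.x that ships with Spark; the path and sizes are
illustrative, and this covers driver/executor logs rather than the event
log under /var/log/spark/apps, which is governed by the spark.eventLog.*
settings):

  log4j.rootLogger=INFO, rolling
  log4j.appender.rolling=org.apache.log4j.RollingFileAppender
  log4j.appender.rolling.File=/var/log/spark/app.log
  log4j.appender.rolling.MaxFileSize=128MB
  log4j.appender.rolling.MaxBackupIndex=10
  log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
  log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n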

Regards,
Noorul

On Mon, Mar 6, 2017 at 10:52 AM, devjyoti patra  wrote:
> Timothy, why are you writing application logs to HDFS? In case you want to
> analyze these logs later, you can write to local storage on your slave nodes
> and later rotate those files to a suitable location. If they are only going
> to be useful for debugging the application, you can always remove them
> periodically.
> Thanks,
> Dev
>
> On Mar 6, 2017 9:48 AM, "Timothy Chan"  wrote:
>>
>> I'm running a single worker EMR cluster for a Structured Streaming job.
>> How do I deal with my application log filling up HDFS?
>>
>> /var/log/spark/apps/application_1487823545416_0021_1.inprogress
>>
>> is currently 21.8 GB
>>
>> Sent with Shift

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Testing --supervise flag

2016-08-02 Thread Noorul Islam Kamal Malmiyoda
Widening to dev@spark

On Mon, Aug 1, 2016 at 4:21 PM, Noorul Islam K M  wrote:
>
> Hi all,
>
> I was trying to test the --supervise flag of spark-submit.
>
> The documentation [1] says that the flag helps restart your
> application automatically if it exited with a non-zero exit code.
>
> I am looking for some clarification on that documentation. In this
> context, does application mean the driver?
>
> Will the driver be re-launched if an exception is thrown by the
> application? I tested this scenario and the driver is not re-launched.
>
> ~/spark-1.6.1/bin/spark-submit --deploy-mode cluster --master 
> spark://10.29.83.162:6066 --class 
> org.apache.spark.examples.ExceptionHandlingTest 
> /home/spark/spark-1.6.1/lib/spark-examples-1.6.1-hadoop2.6.0.jar
>
> I killed the driver java process using the 'kill -9' command, and the driver
> was re-launched.
>
> Is this the only scenario where the driver will be re-launched? Is there a
> way to simulate a non-zero exit code and test the --supervise flag?
>
> Regards,
> Noorul
>
> [1] 
> http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
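
(On the last question quoted above: a minimal driver that exits non-zero,
as a sketch of how one might simulate it; whether --supervise actually
restarts the driver on this exit path is exactly what needs to be
verified.)

  import org.apache.spark.{SparkConf, SparkContext}

  // a throwaway driver that exits with a non-zero code, to exercise --supervise
  object NonZeroExitTest {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("NonZeroExitTest"))
      sc.parallelize(1 to 10).count()   // run a trivial job first
      sc.stop()
      sys.exit(1)                       // simulate a failure exit code
    }
  }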

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Application not showing in Spark History

2016-08-02 Thread Noorul Islam Kamal Malmiyoda
Have you tried https://github.com/spark-jobserver/spark-jobserver

On Tue, Aug 2, 2016 at 2:23 PM, Rychnovsky, Dusan
 wrote:
> Hi,
>
>
> I am trying to launch my Spark application from within my Java application
> via the SparkSubmit class, like this:
>
>
>
> List<String> args = new ArrayList<>();
>
> args.add("--verbose");
> args.add("--deploy-mode=cluster");
> args.add("--master=yarn");
> ...
>
>
> SparkSubmit.main(args.toArray(new String[args.size()]));
>
>
>
> This works fine, with one catch - the application does not appear in Spark
> History after it's finished.
>
>
> If, however, I run the application using `spark-submit.sh`, like this:
>
>
>
> spark-submit \
>
>   --verbose \
>
>   --deploy-mode=cluster \
>
>   --master=yarn \
>
>   ...
>
>
>
> the application appears in Spark History correctly.
>
>
> What am I missing?
>
>
> Also, is this a good way to launch a Spark application from within a Java
> application or is there a better way?
>
> Thanks,
>
> Dusan
>
>
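
(On the "better way" question above: a minimal sketch using the launcher
API, org.apache.spark.launcher.SparkLauncher, which spawns a real
spark-submit process, so the usual spark-defaults.conf, including any
event-log settings, should apply; the paths and class names below are
placeholders.)

  import org.apache.spark.launcher.SparkLauncher

  // launch a Spark app from another JVM via a real spark-submit process
  val process = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")    // placeholder jar
    .setMainClass("com.example.MySparkJob")   // placeholder main class
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setVerbose(true)
    .launch()
  val exitCode = process.waitFor()
  println(s"spark-submit exited with code $exitCode")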

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: When worker is killed driver continues to run causing issues in supervise mode

2016-07-13 Thread Noorul Islam Kamal Malmiyoda
Adding dev list
On Jul 13, 2016 5:38 PM, "Noorul Islam K M"  wrote:

>
> Spark version: 1.6.1
> Cluster Manager: Standalone
>
> I am experimenting with cluster mode deployment along with supervise for
> high availability of streaming applications.
>
> 1. Submit a streaming job in cluster mode with supervise
> 2. Say that the driver is scheduled on worker1. The app starts
>    successfully.
> 3. Kill the worker1 java process. This does not kill the driver process,
>    and hence the application (context) is still alive.
> 4. Because of the supervise flag, the driver gets scheduled on a new
>    worker, worker2, and a new context is created, making it a duplicate.
>
> I think this is a bug.
>
> Regards,
> Noorul
>


Cassandra read throughput using DataStax connector in Spark

2015-12-26 Thread Noorul Islam Kamal Malmiyoda
Hello all,

I am using the DataStax connector to read data from one Cassandra
cluster and write to another. The infrastructure is on Amazon. I have a
three-node cluster with a replication factor of 3 on both sides.

But the throughput seems to be very low: it takes 7 minutes to transfer
around 2.5 GB per node. I think the bottleneck is on the read side, as
I can see that the Spark node (independent of the two clusters) is
lightly loaded in terms of memory and CPU.

I tried tweaking some of the parameters from
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-connection-parameters

Do you have any idea whether there is any parameter that I can tweak
to get better throughput?
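
For reference, a minimal sketch of the kind of read/write tuning the
reference document above describes (parameter names are from that
document; the values are illustrative, not validated):

  import org.apache.spark.SparkConf

  // illustrative DataStax connector tuning; adjust to the workload
  val conf = new SparkConf()
    .set("spark.cassandra.connection.host", "10.0.0.1")       // placeholder host
    .set("spark.cassandra.input.split.size_in_mb", "64")      // smaller splits -> more read tasks
    .set("spark.cassandra.input.fetch.size_in_rows", "5000")  // rows fetched per round trip
    .set("spark.cassandra.output.concurrent.writes", "8")     // parallel async writes per task
    .set("spark.cassandra.output.batch.size.rows", "200")     // rows per write batch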

Regards,
Noorul

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org