How to query on Cassandra and load results in Spark dataframe

2019-01-22 Thread Soheil Pourbafrani
Hi, Using the command val table = spark.read.format("org.apache.spark.sql.cassandra").options(Map("table" -> "A", "keyspace" -> "B")).load one can load an entire table into a dataframe. Instead, I want to run a query in Cassandra and load just the result into a dataframe (not
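Assuming the spark-cassandra-connector is in use, a common way to avoid pulling the whole table is to filter the loaded dataframe and let the connector push eligible predicates down to Cassandra. A minimal sketch (keyspace B and table A are taken from the question; the column event_time is hypothetical; the load itself needs a live SparkSession with the connector on the classpath, so it is shown in comments):

```scala
// Build the connector options; keyspace/table names from the question.
def cassandraOpts(keyspace: String, table: String): Map[String, String] =
  Map("keyspace" -> keyspace, "table" -> table)

// With a live SparkSession and the spark-cassandra-connector:
// val df = spark.read
//   .format("org.apache.spark.sql.cassandra")
//   .options(cassandraOpts("B", "A"))
//   .load()
//   .filter("event_time > '2019-01-01'") // eligible predicates are
//                                        // pushed down to Cassandra
```

Only rows matching a pushed-down predicate are fetched from Cassandra; predicates the connector cannot push are applied on the Spark side after loading.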

Re: How to sleep Spark job

2019-01-22 Thread Soheil Pourbafrani
Thanks for the tip! On Wed, Jan 23, 2019 at 10:28 AM Moein Hosseini wrote: > In this manner, your application should create distinct jobs each time. So > for the first time your driver creates a DAG and executes it with the help of the executors, > then finishes the job and the Driver/Application goes to sleep.

Re: How to sleep Spark job

2019-01-22 Thread Moein Hosseini
In this manner, your application should create distinct jobs each time. So for the first time your driver creates a DAG and executes it with the help of the executors, then finishes the job and the Driver/Application goes to sleep. When it wakes up, it will create a new job and DAG and ... somehow the same as create

dropping unused data from a stream

2019-01-22 Thread Paul Tremblay
I will be streaming data and am trying to understand how to get rid of old data from a stream so it does not become too large. I will stream in one large table of buying data and join it to another table of different data. I need the last 14 days from the second table. I will not need data that

Re: How to sleep Spark job

2019-01-22 Thread Kevin Mellott
I’d recommend using a scheduler of some kind to trigger your job each hour, and have the Spark job exit when it completes. Spark is not meant to run in any type of “sleep mode”, unless you want to run a structured streaming job and create a separate process to pull data from Cassandra and publish
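The scheduler route can be as simple as a cron entry that submits the job at the top of every hour (the paths and class name below are placeholders):

```shell
# Hypothetical crontab entry: submit the batch job hourly; the job exits when done
0 * * * * /opt/spark/bin/spark-submit --master yarn --class com.example.CassandraToHdfs /opt/jobs/job.jar
```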

Re: How to sleep Spark job

2019-01-22 Thread Moein Hosseini
Hi Soheil, Yes, it's possible to force your application to sleep after the job: do { // Your Spark job goes here Thread.sleep(60 * 60 * 1000); // one hour in milliseconds } while (true); But maybe Airflow is a better option if you need a scheduler for your Spark job. On Wed, Jan 23, 2019 at 9:26 AM Soheil Pourbafrani wrote: > Hi,
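Spelled out, the sleep-loop approach looks roughly like this; note that Thread.sleep takes milliseconds, so one hour is 60 * 60 * 1000 ms. This is a sketch of the idea, not a recommendation over an external scheduler:

```scala
// One hour expressed in milliseconds, the unit Thread.sleep expects.
val hourMillis: Long = 60L * 60L * 1000L

// Run the supplied job, then sleep an hour, forever. The driver
// stays alive (and holds its YARN resources) between runs.
def runForever(runJob: () => Unit): Unit = {
  while (true) {
    runJob()              // e.g. read from Cassandra, write to HDFS
    Thread.sleep(hourMillis)
  }
}
// Usage: runForever(() => myBatch(spark))  // myBatch is hypothetical
```

The trade-off Kevin raises elsewhere in the thread still applies: the sleeping driver occupies cluster resources for the whole hour, which a scheduler-triggered job would release.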

How to sleep Spark job

2019-01-22 Thread Soheil Pourbafrani
Hi, I want to submit a job to a YARN cluster that reads data from Cassandra and writes it to HDFS, every hour for example. Is it possible to make the Spark application sleep in a while(true) loop and wake up every hour to process data?

Local Storage Encryption - Spark ioEncryption

2019-01-22 Thread Sinha, Breeta (Nokia - IN/Bangalore)
Hi All, We are trying to enable encryption between the Spark shuffle and the local filesystem, and want to clarify our understanding of this. We are currently working with Spark 2.4. According to our understanding of Spark's support for Local Storage Encryption, that is, "Enabling local disk I/O
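For reference, local disk I/O encryption (covering shuffle and spill files) is controlled by the spark.io.* settings; a minimal sketch of the relevant configuration (the key size and algorithm shown are, to the best of our understanding, the documented defaults):

```properties
spark.io.encryption.enabled          true
spark.io.encryption.keySizeBits      128
spark.io.encryption.keygen.algorithm HmacSHA1
```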

Re:Increase time for Spark Job to be in Accept mode in Yarn

2019-01-22 Thread 大啊
Hi, please tell me why you need to increase the time? At 2019-01-22 18:38:29, "Chetan Khatri" wrote: Hello Spark Users, Can you please tell me how to increase the time for a Spark job to stay in Accept mode in Yarn. Thank you. Regards, Chetan

Re:Re:Spark Core InBox.scala has error

2019-01-22 Thread 大啊
I have checked the process method of the Inbox class. If the message is not null, it will continue to process the next message. If the message is null, it will exit the loop. This logic looks correct. At 2019-01-23 11:35:59, "大啊" wrote: Could you show how you hit this error? At 2019-01-23 09:50:16,
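A simplified sketch of the loop shape under discussion (a model for illustration, not the actual Inbox.scala source): each iteration processes the current message and then polls for the next one; the loop exits only when poll() returns null, so a non-null message is processed on the next iteration rather than lost.

```scala
import java.util.LinkedList

// Simplified model of the poll loop: process, then poll again;
// exit only when poll() returns null. A non-null result is kept
// and handled on the next iteration, so no message is dropped.
def drain(messages: LinkedList[String]): List[String] = {
  val processed = scala.collection.mutable.ListBuffer[String]()
  var message = messages.poll()
  while (message != null) {
    processed += message      // "process" the current message
    message = messages.poll() // non-null => loop again, nothing lost
  }
  processed.toList
}
```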

Re:Spark Core InBox.scala has error

2019-01-22 Thread 大啊
Could you show how you hit this error? At 2019-01-23 09:50:16, "kaishen" wrote: >Inbox.scala, line 158: >message = messages.poll() >if the message is not null, then it will be lost and never be executed. >Please help to verify this bug! > > > > >-- >Sent from:

Spark Core InBox.scala has error

2019-01-22 Thread kaishen
Inbox.scala, line 158: message = messages.poll() if the message is not null, then it will be lost and never be executed. Please help to verify this bug! -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: How to force-quit a Spark application?

2019-01-22 Thread Pola Yao
Hi Marcelo, I have taken a thread dump with jstack, and saw the ShutdownHookManager: ''' "Thread-1" #19 prio=5 os_prio=0 tid=0x7f9b6828e800 nid=0x77cb waiting on condition [0x7f9a123e3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking

Re: I have trained a ML model, now what?

2019-01-22 Thread Felix Cheung
About the deployment/serving SPIP: https://issues.apache.org/jira/browse/SPARK-26247 From: Riccardo Ferrari Sent: Tuesday, January 22, 2019 8:07 AM To: User Subject: I have trained a ML model, now what? Hi list! I am writing here to hear about your experience on

I have trained a ML model, now what?

2019-01-22 Thread Riccardo Ferrari
Hi list! I am writing here to hear about your experiences putting Spark ML models into production at scale. I know it is a very broad topic with many different faces depending on the use case, requirements, user base and whatever else is involved in the task. Still, I'd like to open a thread about

RE: Spark UI History server on Kubernetes

2019-01-22 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi, We’ve set up the spark-history service (based on Spark 2.4) on K8S. The UI works perfectly fine when running on NodePort. We’re facing some issues when on Ingress. Please let us know what inputs you need. Thanks and Regards, Abhishek From: Battini Lakshman Sent: Tuesday, January 22,

Spark UI History server on Kubernetes

2019-01-22 Thread Battini Lakshman
Hello, We are running Spark 2.4 on a Kubernetes cluster and are able to access the Spark UI using "kubectl port-forward". However, this Spark UI only shows currently running Spark application logs; we would like to retain the 'completed' Spark application logs as well. Could someone help us to set up
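Completed applications are served by the Spark History Server from the event log directory, so applications must write event logs to a location the history server also reads. The usual settings look like this (the HDFS path below is a placeholder; any shared filesystem the pods can reach works):

```properties
# On the applications: write event logs
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-events
# On the history server: read the same directory
spark.history.fs.logDirectory    hdfs:///spark-events
```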

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Jörn Franke
You can try with Yarn node labels: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html Then you can whitelist nodes. > Am 19.01.2019 um 00:20 schrieb Serega Sheypak : > > Hi, is there any possibility to tell Scheduler to blacklist specific nodes in > advance?
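Once nodes are labelled in YARN, Spark can be pointed at them via the node-label expressions; a sketch of the submit (the label name spark_nodes and the class/jar names are placeholders):

```shell
# Hypothetical submit restricting the AM and executors to labelled nodes
spark-submit \
  --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=spark_nodes \
  --conf spark.yarn.executor.nodeLabelExpression=spark_nodes \
  --class com.example.Main app.jar
```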

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
The new issue is https://issues.apache.org/jira/browse/SPARK-26688. On Tue, Jan 22, 2019 at 11:30 AM Attila Zsolt Piros wrote: > Hi, > > >> Is it this one: https://github.com/apache/spark/pull/23223 ? > > No. My old development was https://github.com/apache/spark/pull/21068, > which is closed.

Increase time for Spark Job to be in Accept mode in Yarn

2019-01-22 Thread Chetan Khatri
Hello Spark Users, Can you please tell me how to increase the time for Spark job to be in *Accept* mode in Yarn. Thank you. Regards, Chetan

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
Hi, >> Is it this one: https://github.com/apache/spark/pull/23223 ? No. My old development was https://github.com/apache/spark/pull/21068, which is closed. This would be a new improvement with a new Apache JIRA issue ( https://issues.apache.org) and with a new Github pull request. >> Can I try