Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
a small hint would be very helpful. On Wed, Feb 14, 2018 at 5:17 PM, akshay naidu wrote: Hello Siva, thanks for your reply. Actually I'm trying to generate online reports for my clients. For this I want the jobs to execute faster without putting any

stdout: org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in

2018-02-14 Thread kant kodali
Hi All, I get an AnalysisException when I run the following query: spark.sql("select current_timestamp() as tsp, count(*) from table group by window(tsp, '5 minutes')"). I just want to create a processing-time column and run a simple stateful query like the one above. I understand
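
One possible workaround (a sketch of my own, not from the thread): since nondeterministic expressions are allowed in a Project, materialize current_timestamp() as a column first and only then group on it. "table" is taken from the quoted query; "spark" is assumed to be the SparkSession and the view may be a (streaming) temp view.

    import org.apache.spark.sql.functions.{col, current_timestamp, window}

    // project the nondeterministic current_timestamp() first ...
    val withTsp = spark.table("table").withColumn("tsp", current_timestamp())
    // ... so the grouping expression only references a plain column
    val counts = withTsp.groupBy(window(col("tsp"), "5 minutes")).count()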

Re: [Structured Streaming] Avoiding multiple streaming queries

2018-02-14 Thread Tathagata Das
Of course, you can write to multiple Kafka topics from a single query. If the dataframe you want to write has a column named "topic" (along with "key" and "value" columns), the contents of each row are written to the topic named in that row. This works automatically. So the only thing you need to
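
A minimal sketch of the pattern TD describes, assuming "df" is a streaming dataframe with key and value columns; the bootstrap servers, checkpoint path, and routing expression are placeholders:

    // each row lands in the topic named by its "topic" column;
    // no fixed "topic" option is set on the sink
    df.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "CASE WHEN value LIKE 'a%' THEN 'topicA' ELSE 'topicB' END AS topic")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")
      .option("checkpointLocation", "/tmp/checkpoints/multi-topic")
      .start()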

Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Tathagata Das
1. Just loop like this: def startQuery(): StreamingQuery = { // define the dataframes and start the query } and then, on the main thread: while (notShutdown) { val query = startQuery(); query.awaitTermination(refreshIntervalMs); query.stop(); /* refresh static data */ } 2. Yes,
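
Spelled out, the loop TD sketches might look like this; startQuery(), notShutdown, and refreshIntervalMs are placeholders for the application's own pieces:

    import org.apache.spark.sql.streaming.StreamingQuery

    def startQuery(): StreamingQuery = {
      // (re)read the static dataframe, join it with the stream,
      // and start the streaming query
      ???
    }

    // on the main thread
    while (notShutdown) {
      val query = startQuery()
      // awaitTermination(timeoutMs) returns once the timeout elapses
      query.awaitTermination(refreshIntervalMs)
      query.stop()
      // the next iteration rebuilds the static dataframe with fresh data
    }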

Re: Why python cluster mode is not supported in standalone cluster?

2018-02-14 Thread Ashwin Sai Shankar
+dev mailing list (since I didn't get a response from the user DL). On Tue, Feb 13, 2018 at 12:20 PM, Ashwin Sai Shankar wrote: Hi Spark users! I noticed that Spark doesn't allow Python apps to run in cluster mode on a Spark standalone cluster. Does anyone know the reason? I

[Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-14 Thread sandeep-katta
SparkSubmit opens a port to communicate with the App Master and executors. This port does not close idle connections, so it is vulnerable to DoS attacks: I ran telnet IP port and the connection was never closed. In order to fix this I tried to handle it in the *userEventTriggered* of
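
For reference, closing idle connections from userEventTriggered usually pairs Netty's IdleStateHandler with a handler like the sketch below; this illustrates the general Netty pattern, not the poster's actual patch:

    import io.netty.channel.{ChannelHandlerContext, ChannelInboundHandlerAdapter}
    import io.netty.handler.timeout.IdleStateEvent

    class CloseIdleConnectionsHandler extends ChannelInboundHandlerAdapter {
      override def userEventTriggered(ctx: ChannelHandlerContext, evt: AnyRef): Unit =
        evt match {
          case _: IdleStateEvent => ctx.close() // drop the idle connection
          case other             => ctx.fireUserEventTriggered(other)
        }
    }

    // installed in the pipeline after an IdleStateHandler, e.g.
    //   pipeline.addLast(new IdleStateHandler(0, 0, 120)) // all-idle after 120s
    //   pipeline.addLast(new CloseIdleConnectionsHandler)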

Re: SparkR test script issue: unable to run run-tests.sh on spark 2.2

2018-02-14 Thread chandan prakash
Thanks a lot Hyukjin & Felix, it was helpful. Going to the older version worked. Regards, Chandan. On Wed, Feb 14, 2018 at 3:28 PM, Felix Cheung wrote: Yes, it is an issue with the newer release of testthat. To work around it, could you install an earlier version with

Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
Hello Siva, thanks for your reply. Actually I'm trying to generate online reports for my clients. For this I want the jobs to execute faster, without putting any job in the QUEUE, irrespective of the number of jobs different clients are running from different locations. Currently, a job

Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Appu K
TD, thanks a lot for the quick reply :) Did I understand right that, in the main thread, I won't be able to use outStream.awaitTermination() to wait for termination of the context [since I'll be stopping it inside another thread]? What would be a good approach to keep the main app

Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Tathagata Das
Let me fix my mistake :) What I suggested in that earlier thread does not work. A streaming query that joins a streaming dataset with a batch view does not correctly pick up when the view is updated; it works only when you restart the query. That is: stop the query, recreate the dataframes,

Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Appu K
More specifically, quoting TD from the previous thread: "Any streaming query that joins a streaming dataframe with the view will automatically start using the most updated data as soon as the view is updated." Wondering if I'm doing something wrong in

Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
yarn-site.xml:
    yarn.scheduler.fair.preemption.cluster-utilization-threshold  0.8
    yarn.scheduler.minimum-allocation-mb  3584
    yarn.scheduler.maximum-allocation-mb

Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu wrote: Hello, I'm trying to run multiple Spark jobs on a cluster running on YARN. The master is a 24GB server with 6 slaves of 12GB each. The fairscheduler.xml settings are: FAIR, 10, 2 (see the reconstruction below). I am running 8 jobs
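
The fairscheduler.xml values survive the archive but the tags do not; assuming the three values map to the standard pool properties schedulingMode, weight, and minShare, the file would look roughly like:

    <?xml version="1.0"?>
    <allocations>
      <pool name="default">
        <schedulingMode>FAIR</schedulingMode>
        <weight>10</weight>
        <minShare>2</minShare>
      </pool>
    </allocations>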

Re: SparkR test script issue: unable to run run-tests.sh on spark 2.2

2018-02-14 Thread Felix Cheung
Yes, it is an issue with the newer release of testthat. To work around it, could you install an earlier version with devtools? Will follow up for a fix. From: Hyukjin Kwon. Sent: Wednesday, February 14, 2018 6:49 PM. Subject: Re: SparkR test script
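
As a concrete example of the workaround Felix suggests (the exact version is a guess; any pre-2.0 testthat release should do):

    # pin testthat to a 1.x release with devtools
    devtools::install_version("testthat", version = "1.0.2",
                              repos = "https://cloud.r-project.org")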

Re: SparkR test script issue: unable to run run-tests.sh on spark 2.2

2018-02-14 Thread Hyukjin Kwon
From a very quick look, I think it is a testthat version issue with SparkR. I had to pin that version to 1.x before in AppVeyor. There are a few details in https://github.com/apache/spark/pull/20003 Can you check and lower the testthat version? On 14 Feb 2018 6:09 pm, "chandan prakash"

Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Appu K
Hi, I followed the instructions from the thread https://mail-archives.apache.org/mod_mbox/spark-user/201704.mbox/%3cd1315d33-41cd-4ba3-8b77-0879f3669...@qvantel.com%3E while trying to periodically reload a static data frame that gets joined to a structured streaming query. However, the

SparkR test script issue: unable to run run-tests.sh on spark 2.2

2018-02-14 Thread chandan prakash
Hi All, I am trying to run the R test script ./R/run-tests.sh but hit the same ERROR every time. I tried running on a Mac as well as a CentOS machine; the same issue comes up. I am using Spark 2.2 (branch-2.2). I followed the steps from the Apache docs: 1. installed R 2. installed packages