Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Aakash Basu
Thanks to TD, the savior! Shall look into it. On Thu, Mar 15, 2018 at 1:04 AM, Tathagata Das wrote: > Relevant: https://databricks.com/blog/2018/03/13/introducing-stream-stream-joins-in-apache-spark-2-3.html > > This is true stream-stream join which will

Re: How to start practicing Python Spark Streaming in Linux?

2018-03-14 Thread Felix Cheung
It’s best to start with Structured Streaming https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#tab_python_0 https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#tab_python_0 _ From: Aakash Basu

Spark Conf

2018-03-14 Thread Vinyas Shetty
Hi, I am trying to understand the Spark internals, so I was looking at the Spark code flow. Now in a scenario where I do a spark-submit in YARN cluster mode with --executor-memory 8g via the command line, how does Spark know about this executor memory value, since in SparkContext I see:
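For context on the question above: `--executor-memory` is shorthand for the `spark.executor.memory` configuration key; spark-submit writes the flag into the SparkConf, which SparkContext later reads. A sketch of the two equivalent invocations (the app file name is a placeholder):

```
# These two submissions are equivalent: spark-submit translates the
# --executor-memory flag into the spark.executor.memory conf key.
spark-submit --master yarn --deploy-mode cluster \
  --executor-memory 8g \
  my_app.py

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.executor.memory=8g \
  my_app.py
```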

Re: retention policy for spark structured streaming dataset

2018-03-14 Thread Lian Jiang
It is already partitioned by timestamp. But is the right retention process to stop the streaming job, trim the parquet files and restart the streaming job? Thanks. On Wed, Mar 14, 2018 at 12:51 PM, Sunil Parmar wrote: > Can you use partitioning ( by day ) ? That will

Re: retention policy for spark structured streaming dataset

2018-03-14 Thread Sunil Parmar
Can you use partitioning (by day)? That will make it easier to drop data older than x days outside the streaming job. Sunil Parmar On Wed, Mar 14, 2018 at 11:36 AM, Lian Jiang wrote: > I have a spark structured streaming job which dumps data into a parquet > file. To
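To make the suggestion above concrete: if the sink is partitioned by day (directories like `date=2018-03-14`), retention becomes deleting old partition directories from a separate scheduled job, without ever stopping the streaming query. A minimal pure-Python sketch for a local filesystem (the directory naming and the cutoff are assumptions; on HDFS or S3 the same idea applies via `hdfs dfs -rm -r` or bucket lifecycle rules):

```python
import os
import shutil
from datetime import date, datetime, timedelta

def prune_old_partitions(root, max_age_days, today=None):
    """Delete date=YYYY-MM-DD partition directories older than max_age_days."""
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    removed = []
    for name in os.listdir(root):
        if not name.startswith("date="):
            continue  # skip non-partition entries such as _SUCCESS markers
        part_date = datetime.strptime(name[len("date="):], "%Y-%m-%d").date()
        if part_date < cutoff:
            shutil.rmtree(os.path.join(root, name))
            removed.append(name)
    return sorted(removed)
```

Run it from cron (or any scheduler) alongside the always-on streaming job.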

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Tathagata Das
Relevant: https://databricks.com/blog/2018/03/13/introducing-stream-stream-joins-in-apache-spark-2-3.html This is a true stream-stream join which will automatically buffer delayed data and appropriately join it with SQL join semantics. Please check it out :) TD On Wed, Mar 14, 2018 at 12:07
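The blog post linked above describes the Spark 2.3 behavior: each side of the join buffers past input so late-arriving rows can still match, and a watermark bounds how long that state is kept. A toy pure-Python model of the buffering semantics (this is not the Spark API; the single join key and the eviction policy are simplifying assumptions):

```python
from collections import defaultdict

class ToyStreamStreamJoin:
    """Toy model of a watermarked stream-stream inner join on one key."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_event_time = 0
        self.buffers = {"left": defaultdict(list), "right": defaultdict(list)}

    def feed(self, side, key, event_time, value):
        other = "right" if side == "left" else "left"
        # Buffer this row so late rows from the other side can still match it.
        self.buffers[side][key].append((event_time, value))
        # Emit joined pairs against everything the other side has buffered.
        matches = [(value, v) if side == "left" else (v, value)
                   for (_, v) in self.buffers[other][key]]
        # Advance the watermark and evict state that can no longer match.
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.max_delay
        for buf in self.buffers.values():
            for k in list(buf):
                buf[k] = [(t, v) for (t, v) in buf[k] if t >= watermark]
        return matches
```

The point of the watermark is the eviction step: without it, both buffers would grow forever, which is exactly the unbounded-state problem the 2.3 release addresses.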

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
Do I need to set SPARK_DIST_CLASSPATH or SPARK_CLASSPATH? The latest version of Spark (2.3) only has SPARK_CLASSPATH. On Wed, Mar 14, 2018 at 11:37 AM, kant kodali wrote: > Hi, > > I am not using emr. And yes I restarted several times. > > On Wed, Mar 14, 2018 at 6:35 AM,
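For reference on the question above: SPARK_DIST_CLASSPATH is the variable that Spark's "Hadoop-free" builds read to locate the Hadoop jars. A common way to set it, assuming the `hadoop` command is on the PATH, is:

```
# In conf/spark-env.sh of a "without-hadoop" Spark distribution
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```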

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Dylan Guedes
I misread it, and thought that your question was whether pyspark supports Kafka lol. Sorry! On Wed, Mar 14, 2018 at 3:58 PM, Aakash Basu wrote: > Hey Dylan, > > Great! > > Can you revert back to my initial and also the latest mail? > > Thanks, > Aakash. > > On 15-Mar-2018

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Aakash Basu
Hey Dylan, Great! Can you revert back to my initial and also the latest mail? Thanks, Aakash. On 15-Mar-2018 12:27 AM, "Dylan Guedes" wrote: > Hi, > > I've been using the Kafka with pyspark since 2.1. > > On Wed, Mar 14, 2018 at 3:49 PM, Aakash Basu

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Dylan Guedes
Hi, I've been using Kafka with pyspark since 2.1. On Wed, Mar 14, 2018 at 3:49 PM, Aakash Basu wrote: > Hi, > > I'm yet to. > > Just want to know, when does Spark 2.3 with 0.10 Kafka Spark Package > allows Python? I read somewhere, as of now Scala and Java are

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Aakash Basu
Hi, I'm yet to. Just want to know, when does Spark 2.3 with the 0.10 Kafka Spark package allow Python? I read somewhere that, as of now, Scala and Java are the languages to be used. Please correct me if I am wrong. Thanks, Aakash. On 14-Mar-2018 8:24 PM, "Georg Heiler" wrote:

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
Hi, I am not using emr. And yes I restarted several times. On Wed, Mar 14, 2018 at 6:35 AM, Anthony, Olufemi < olufemi.anth...@capitalone.com> wrote: > After you updated your yarn-site.xml file, did you restart the YARN > resource manager ? > > > >

retention policy for spark structured streaming dataset

2018-03-14 Thread Lian Jiang
I have a spark structured streaming job which dumps data into a parquet file. To avoid the parquet file growing infinitely, I want to discard 3-month-old data. Does spark streaming support this? Or do I need to stop the streaming job, trim the parquet file and restart the streaming job? Thanks for any

Re: Spark Job Server application compilation issue

2018-03-14 Thread sujeet jog
Thanks for pointing that out. On Wed, Mar 14, 2018 at 11:19 PM, Vadim Semenov wrote: > This question should be directed to the `spark-jobserver` group: > https://github.com/spark-jobserver/spark-jobserver#contact > > They also have a gitter chat. > > Also include the errors you

Re: Spark Job Server application compilation issue

2018-03-14 Thread Vadim Semenov
This question should be directed to the `spark-jobserver` group: https://github.com/spark-jobserver/spark-jobserver#contact They also have a gitter chat. Also include the errors you get when you ask them the question. On Wed, Mar 14, 2018 at 1:37 PM, sujeet jog

Spark Job Server application compilation issue

2018-03-14 Thread sujeet jog
Input is a json request, which would be decoded in myJob() & processed further. Not sure what is wrong with the code below; it emits errors about unimplemented methods (runJob/validate). Any pointers on this would be helpful. jobserver-0.8.0 object MyJobServer extends SparkSessionJob { type JobData

Bisecting Kmeans Linkage Matrix Output (Cluster Indices)

2018-03-14 Thread GabeChurch
I have been working on a project to return a Linkage Matrix output from the Spark Bisecting Kmeans Algorithm output so that it is possible to plot the selection steps in a dendrogram. I am having trouble returning valid indices when I use more than 3-4 clusters in the algorithm and am hoping

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Georg Heiler
Did you try Spark 2.3 with structured streaming? There, watermarking and plain SQL might be really interesting for you. Aakash Basu wrote on Wed, 14 Mar 2018 at 14:57: > Hi, > > > > *Info (Using):Spark Streaming Kafka 0.8 package* > > *Spark 2.2.1* > *Kafka 1.0.1* >

Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Aakash Basu
Hi, *Info (Using): Spark Streaming Kafka 0.8 package* *Spark 2.2.1* *Kafka 1.0.1* As of now, I am feeding paragraphs into the Kafka console producer, and my Spark job, which is acting as a receiver, is printing the flattened words, which is a complete RDD operation. *My motive is to read two tables
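The "flattened words" step described above is a flatMap from lines to words; a plain-Python equivalent of the transformation (the streaming job itself would express this with `DStream.flatMap` over the Kafka messages):

```python
def flatten_words(lines):
    """Split each incoming line into words, as flatMap(lambda l: l.split()) would."""
    return [word for line in lines for word in line.split()]
```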

Re: How to run spark shell using YARN

2018-03-14 Thread Anthony, Olufemi
After you updated your yarn-site.xml file, did you restart the YARN resource manager ? https://aws.amazon.com/premiumsupport/knowledge-center/restart-service-emr/ Femi From: kant kodali Date: Wednesday, March 14, 2018 at 6:16 AM To: Femi Anthony Cc:

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
16GB RAM. AWS m4.xlarge. It's a three-node cluster and I only have YARN and HDFS running. Resources are barely used; however, I believe there is something in my config that is preventing YARN from seeing that I have a good amount of resources (that's my guess; I never worked with YARN before). My

Re: Insufficient memory for Java Runtime

2018-03-14 Thread Femi Anthony
Try specifying executor memory. On Tue, Mar 13, 2018 at 5:15 PM, Shiyuan wrote: > Hi Spark-Users, > I encountered the problem of "insufficient memory". The error is logged > in the file with a name " hs_err_pid86252.log"(attached in the end of this > email). > > I launched

Re: How to run spark shell using YARN

2018-03-14 Thread Femi Anthony
What's the hardware configuration of the box you're running on, i.e. how much memory does it have? Femi On Wed, Mar 14, 2018 at 5:32 AM, kant kodali wrote: > Tried this > > ./spark-shell --master yarn --deploy-mode client --executor-memory 4g > > > Same issue. Keeps going

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
Tried this ./spark-shell --master yarn --deploy-mode client --executor-memory 4g Same issue. Keeps going forever.. 18/03/14 09:31:25 INFO Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1521019884656

Re: How to run spark shell using YARN

2018-03-14 Thread Femi Anthony
Make sure you have enough memory allocated for Spark workers; try specifying executor memory by passing --executor-memory to spark-submit. On Wed, Mar 14, 2018 at 3:25 AM, kant kodali wrote: > I am using spark 2.3.0 and hadoop 2.7.3. > > Also I have done the following

Re: [EXT] Debugging a local spark executor in pycharm

2018-03-14 Thread Vitaliy Pisarev
Actually, I stumbled on this SO page. While it is not straightforward, it is a fairly simple solution. In short: - I made sure there is only one executing task at a time by calling repartition(1) - this

Re: Spark Application stuck

2018-03-14 Thread Femi Anthony
Have you taken a look at the EMR UI? What does your Spark setup look like? I assume you're on EMR on AWS. The various UI urls and ports are listed here: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-web-interfaces.html On Wed, Mar 14, 2018 at 4:23 AM, Mukund Big Data

Spark Application stuck

2018-03-14 Thread Mukund Big Data
Hi I am executing the following recommendation engine using Spark ML https://aws.amazon.com/blogs/big-data/building-a-recommendation-engine-with-spark-ml-on-amazon-emr-using-zeppelin/ When I am trying to save the model, the application hangs and doesn't respond. Any pointers to find where the

How to start practicing Python Spark Streaming in Linux?

2018-03-14 Thread Aakash Basu
Hi all, Any guide on how to kick-start learning PySpark Streaming on an Ubuntu standalone system? Step-wise, practical hands-on would be great. Also, connecting Kafka with Spark and getting real-time data and processing it in micro-batches... Any help? Thanks, Aakash.

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
I am using spark 2.3.0 and hadoop 2.7.3. Also I have done the following and restarted all. But I still see ACCEPTED: waiting for AM container to be allocated, launched and register with RM. And I am unable to spawn spark-shell. Editing $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml and change
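An application stuck in ACCEPTED ("waiting for AM container to be allocated") is often a scheduler resource limit. One commonly adjusted knob in capacity-scheduler.xml is the fraction of cluster resources that ApplicationMasters may use; a hedged sketch of such a change (the 0.5 value is an illustration, not a recommendation, and restarting the ResourceManager afterwards is required):

```xml
<!-- $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <!-- default is 0.1; raising it lets more or larger AM containers start -->
  <value>0.5</value>
</property>
```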

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
any idea? On Wed, Mar 14, 2018 at 12:12 AM, kant kodali wrote: > I set core-site.xml, hdfs-site.xml, yarn-site.xml as per this website > and these are the > only three files I changed. Do I need to set or change

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
I set core-site.xml, hdfs-site.xml and yarn-site.xml as per this website, and these are the only three files I changed. Do I need to set or change anything in mapred-site.xml (as of now I have not touched mapred-site.xml)? When I do yarn -node