Re: Use Arrow instead of Pickle without pandas_udf

2018-07-30 Thread Bryan Cutler
Here is a link to the JIRA for adding StructType support for scalar pandas_udf: https://issues.apache.org/jira/browse/SPARK-24579 On Wed, Jul 25, 2018 at 3:36 PM, Hichame El Khalfi wrote: > Hey Holden, > Thanks for your reply, > > We are currently using a python function that produces a
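For context, here is a minimal sketch of what a scalar pandas_udf looks like in Spark 2.3, where data moves between the JVM and the Python worker over Arrow rather than Pickle; returning a StructType from such a UDF is what SPARK-24579 tracks. The app name and column usage below are illustrative only, and pyarrow must be installed:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    spark = SparkSession.builder.appName("arrow-example").getOrCreate()
    # Arrow can also be used without pandas_udf for toPandas()/createDataFrame(pandas_df)
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")

    @pandas_udf("long", PandasUDFType.SCALAR)
    def plus_one(v):
        # v is a pandas.Series; the batch arrives and returns via Arrow
        return v + 1

    spark.range(10).withColumn("x", plus_one("id")).show()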

Executor lost for unknown reasons error Spark 2.3 on kubernetes

2018-07-30 Thread purna pradeep
Hello, I’m getting the below error in the Spark driver pod logs, and executor pods are getting killed midway while the job is running; even the driver pod is Terminated with the below intermittent error. This happens if I run multiple jobs in parallel. Not able to see executor logs as executor pods

Executor lost for unknown reasons error Spark 2.3 on kubernetes

2018-07-30 Thread Mamillapalli, Purna Pradeep
Hello, I’m getting the below error in the Spark driver pod logs, and executor pods are getting killed midway while the job is running; even the driver pod is Terminated with the below intermittent error. This happens if I run multiple jobs in parallel. Not able to see executor logs as executor pods

Re: How to Create one DB connection per executor and close it after the job is done?

2018-07-30 Thread Vadim Semenov
object MyDatabseSingleton {
  @transient lazy val dbConn = DB.connect(…)
}
`transient` marks the variable to be excluded from serialization, and `lazy` opens the connection only when it is first needed; it also makes sure the val is initialized in a thread-safe way.

Re: How to Create one DB connection per executor and close it after the job is done?

2018-07-30 Thread kant kodali
Hi Patrick, This object must be serializable, right? I wonder if I will have access to this object in my driver (since it is getting created on the executor side) so I can close it when I am done with my batch? Thanks! On Mon, Jul 30, 2018 at 7:37 AM, Patrick McGloin wrote: > You could use an object in

sorting on dataframe causes out of memory (java heap space)

2018-07-30 Thread msbreuer
While working with larger datasets I run into out of memory issues. Basically a Hadoop sequence file is read, its contents are sorted, and a Hadoop map file is written back. The code works fine for workloads greater than 20 GB. Then I changed one column in my dataset to store a large object and size of
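For reference, a minimal PySpark sketch of the pipeline described above, under assumptions: the paths are placeholders, and the output is written as a SequenceFile rather than a MapFile for simplicity. Increasing the number of sort partitions is one way to reduce per-task heap pressure during the sort:

    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName("sort-sequence-file"))

    # Read a Hadoop SequenceFile of key/value pairs (placeholder path)
    pairs = sc.sequenceFile("/data/input.seq")

    # Sort by key; more partitions mean smaller per-task sort buffers
    sorted_pairs = pairs.sortByKey(numPartitions=400)

    # Write the sorted pairs back out (placeholder path)
    sorted_pairs.saveAsSequenceFile("/data/output.seq")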

Re: Kafka backlog - spark structured streaming

2018-07-30 Thread Arun Mahadevan
Here's a proposal to add this: https://github.com/apache/spark/pull/21819 It's always good to set "maxOffsetsPerTrigger" unless you want Spark to process till the end of the stream in each micro-batch. Even without "maxOffsetsPerTrigger", the lag can be non-zero by the time the micro-batch completes.
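To illustrate the option being discussed, a minimal sketch of a Kafka source with "maxOffsetsPerTrigger" set; the broker, topic, and the limit of 10000 are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-rate-limit").getOrCreate()

    stream = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "events")                      # placeholder topic
        .option("maxOffsetsPerTrigger", 10000)              # cap on records read per micro-batch
        .load())

    query = (stream.selectExpr("CAST(value AS STRING) AS value")
        .writeStream
        .format("console")
        .start())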

Re: Kafka backlog - spark structured streaming

2018-07-30 Thread Burak Yavuz
If you don't set rate limiting through `maxOffsetsPerTrigger`, Structured Streaming will always process until the end of the stream. So the number of records waiting to be processed should be 0 at the start of each trigger. On Mon, Jul 30, 2018 at 8:03 AM, Kailash Kalahasti <

Kafka backlog - spark structured streaming

2018-07-30 Thread Kailash Kalahasti
Is there any way to find out the backlog on a Kafka topic while using Spark structured streaming? I checked a few consumer APIs, but those require enabling a group id for the stream, and it seems that is not allowed. Basically I want to know the number of records waiting to be processed. Any suggestions?
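Not the consumer-lag number asked about, but the per-trigger progress of a running query is available today and shows how many records each micro-batch pulled and the offset ranges it covered. A small sketch, assuming `query` is the StreamingQuery returned by writeStream.start():

    progress = query.lastProgress          # dict with the most recent trigger's metrics
    if progress is not None:
        print(progress["numInputRows"])    # rows ingested in the last micro-batch
        for source in progress["sources"]:
            print(source["startOffset"], source["endOffset"])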

Re: How to Create one DB connection per executor and close it after the job is done?

2018-07-30 Thread Patrick McGloin
You could use an object in Scala, of which only one instance will be created on each JVM / Executor. E.g. object MyDatabseSingleton { var dbConn = ??? } On Sat, 28 Jul 2018, 08:34 kant kodali, wrote: > Hi All, > > I understand creating a connection forEachPartition but I am wondering can >

Re: How to reduceByKeyAndWindow in Structured Streaming?

2018-07-30 Thread oripwk
Thanks guys, it really helps. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Using Spark Streaming for analyzing changing data

2018-07-30 Thread oripwk
We have a use case where there's a stream of events, where every event has an ID and its current state with a timestamp:
…
111,ready,1532949947
111,offline,1532949955
111,ongoing,1532949955
111,offline,1532949973
333,offline,1532949981
333,ongoing,1532949987
…
We want to ask questions about the
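As a starting point, a hedged sketch of parsing such an event stream into typed columns with structured streaming; the socket source, host, and port are placeholders (in practice this could be Kafka or files), and the actual questions to ask of the data are cut off above:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import split, col

    spark = SparkSession.builder.appName("changing-data").getOrCreate()

    # Placeholder source producing one CSV-like line per event
    lines = (spark.readStream
        .format("socket")
        .option("host", "localhost")
        .option("port", 9999)
        .load())

    # Each line looks like: 111,ready,1532949947
    events = (lines
        .withColumn("parts", split(col("value"), ","))
        .select(
            col("parts").getItem(0).alias("id"),
            col("parts").getItem(1).alias("state"),
            col("parts").getItem(2).cast("long").alias("timestamp")))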

How to add a new source to an existing structured streaming application, like a Kafka source

2018-07-30 Thread 杨浩
How to add a new source to an existing structured streaming application, like a Kafka source?
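The question is terse, but as a sketch under assumptions: one way to bring an additional Kafka source into an application is to define another readStream and union it with the existing streaming DataFrame before the query is started. Schemas must match, the broker and topic names are placeholders, and `existing_df` stands in for the stream the application already has:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("add-source").getOrCreate()

    new_source = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "new_topic")                   # placeholder topic
        .load()
        .selectExpr("CAST(value AS STRING) AS value"))

    # `existing_df` is the application's current streaming DataFrame with a matching schema
    combined = existing_df.union(new_source)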

How to read csv in dataframe

2018-07-30 Thread Lehak Dharmani
I am trying to read a csv into a spark dataframe. My OS = Ubuntu 18.04, spark version 2.3.1, python version 2.7.15. My code:
from pyspark import SparkConf
from pyspark import SparkContext
from pyspark import SQLContext
from pyspark.sql import SparkSession
conf = SparkConf()
sc = SparkContext(conf =
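The snippet above is cut off; for what it's worth, a minimal sketch of reading a CSV into a DataFrame in Spark 2.3 through SparkSession (the file path and options are placeholders; SQLContext, if it is needed at all, is imported from pyspark.sql):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-csv").getOrCreate()

    # Placeholder path and options
    df = (spark.read
        .option("header", "true")        # first line holds column names
        .option("inferSchema", "true")   # let Spark guess column types
        .csv("/path/to/file.csv"))

    df.show()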