RE: Query on Spark Hive with kerberos Enabled on Kubernetes

2018-07-23 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi Sandeep, Any inputs on this? Regards Surya From: Garlapati, Suryanarayana (Nokia - IN/Bangalore) Sent: Saturday, July 21, 2018 6:50 PM To: Sandeep Katta Cc: d...@spark.apache.org; user@spark.apache.org Subject: RE: Query on Spark Hive with kerberos Enabled on Kubernetes Hi Sandeep, Thx for

Re: Spark on Mesos - Weird behavior

2018-07-23 Thread Thodoris Zois
Hi Susan, This is exactly what we have used. Thank you for your interest! - Thodoris > On 23 Jul 2018, at 20:55, Susan X. Huynh wrote: > > Hi Thodoris, > > Maybe setting "spark.scheduler.minRegisteredResourcesRatio" to > 0 would > help? Default value is 0 with Mesos. > > "The minimum

Re: Spark on Mesos: Spark issuing hundreds of SUBSCRIBE requests / second and crashing Mesos

2018-07-23 Thread Nimi W
That does sound like it could be it - I checked our libmesos version and it is 1.4.1. I'll try upgrading libmesos. Thanks. On Mon, Jul 23, 2018 at 12:13 PM Susan X. Huynh wrote: > Hi Nimi, > > This sounds similar to a bug I have come across before. See: >

Re: Spark on Mesos: Spark issuing hundreds of SUBSCRIBE requests / second and crashing Mesos

2018-07-23 Thread Susan X. Huynh
Hi Nimi, This sounds similar to a bug I have come across before. See: https://jira.apache.org/jira/browse/SPARK-22342?focusedCommentId=16429950&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16429950 It turned out to be a bug in libmesos (the client library used to

Re: Interest in adding ability to request GPU's to the spark client?

2018-07-23 Thread Susan X. Huynh
There's some discussion and proposal of supporting GPUs in this Spark JIRA: https://jira.apache.org/jira/browse/SPARK-24615 "Accelerator-aware task scheduling for Spark" Susan On Thu, Jul 12, 2018 at 11:17 AM, Mich Talebzadeh wrote: > I agree. > > Adding GPU capability to Spark in my opinion

Re: Re: Re: spark sql data skew

2018-07-23 Thread Gourav Sengupta
https://docs.databricks.com/spark/latest/spark-sql/skew-join.html The above might help, in case you are using a join. On Mon, Jul 23, 2018 at 4:49 AM, 崔苗 wrote: > but how to get count(distinct userId) group by company from count(distinct > userId) group by company+x? > count(userId) is

Re: Spark on Mesos - Weird behavior

2018-07-23 Thread Susan X. Huynh
Hi Thodoris, Maybe setting "spark.scheduler.minRegisteredResourcesRatio" to > 0 would help? Default value is 0 with Mesos. "The minimum ratio of registered resources (registered resources / total expected resources) (resources are executors in yarn mode and Kubernetes mode, CPU cores in
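As a rough illustration of the suggestion above, here is a minimal Python sketch that assembles a `spark-submit` command line with the ratio set above 0. The application name `my_app.py`, the value `0.5`, and the helper `build_spark_submit` are hypothetical; `spark.scheduler.maxRegisteredResourcesWaitingTime` is a related Spark setting that the thread does not mention and is included only as an assumption about what one might tune alongside it.

```python
import shlex
from typing import List


def build_spark_submit(app: str, min_ratio: float, max_wait: str = "30s") -> List[str]:
    """Build a spark-submit argument list (hypothetical helper, not a Spark API)."""
    if not 0.0 <= min_ratio <= 1.0:
        raise ValueError("min_ratio must be between 0.0 and 1.0")
    conf = {
        # Minimum ratio of registered resources to wait for before scheduling begins.
        "spark.scheduler.minRegisteredResourcesRatio": str(min_ratio),
        # Assumption: upper bound on how long to wait for that ratio to be reached.
        "spark.scheduler.maxRegisteredResourcesWaitingTime": max_wait,
    }
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


cmd = build_spark_submit("my_app.py", 0.5)
print(shlex.join(cmd))
```

The same two keys could equally be placed in `spark-defaults.conf`; the command-line form is used here only because it keeps the example in one place.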

Re: [Spark Structured Streaming on K8S]: Debug - File handles/descriptor (unix pipe) leaking

2018-07-23 Thread Abhishek Tripathi
Hello Dev! A Spark Structured Streaming job with a simple window aggregation is leaking file descriptors when Kubernetes is the cluster manager. It seems to be a bug. I am using HDFS as the file system for checkpointing. Has anyone observed the same? Thanks for any help. Please find more details in the trailing email. For

Re: [Structured Streaming] Avoiding multiple streaming queries

2018-07-23 Thread Silvio Fiorito
Using the current Kafka sink that supports routing based on topic column, you could just duplicate the rows (e.g. explode rows with different topic, key values). That way you’re only reading and processing the source once and not having to resort to custom sinks, foreachWriter, or multiple
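The row-duplication idea can be illustrated without Spark. The following plain-Python sketch expands each input record into one output row per destination topic, producing the `topic`, `key`, and `value` fields that the Kafka sink reads. The topic names, the field names in the sample records, and the helpers `fan_out` and `fan_out_all` are all hypothetical; in a real query the same shape would presumably come from building an array of (topic, key) pairs per row and exploding it.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical routing table: (destination topic, record field used as the Kafka key).
ROUTES: List[Tuple[str, str]] = [
    ("events-by-user", "user_id"),
    ("events-by-device", "device_id"),
]


def fan_out(record: Dict[str, object], routes: List[Tuple[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield one output row per route, so a single source record reaches several topics."""
    value = json.dumps(record, sort_keys=True)
    for topic, key_field in routes:
        yield {"topic": topic, "key": str(record[key_field]), "value": value}


def fan_out_all(records: Iterable[Dict[str, object]], routes: List[Tuple[str, str]]) -> Iterator[Dict[str, str]]:
    """Apply fan_out to every record; the source is still read only once."""
    for record in records:
        yield from fan_out(record, routes)


source = [
    {"user_id": "u1", "device_id": "d9", "action": "login"},
    {"user_id": "u2", "device_id": "d3", "action": "logout"},
]
rows = list(fan_out_all(source, ROUTES))
for row in rows:
    print(row["topic"], row["key"])
```

Each source record appears once per route in the output, which is what lets a single streaming query feed several topics.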

Apache Spark Cluster

2018-07-23 Thread Uğur Sopaoğlu
We are trying to create a cluster consisting of 4 machines. The cluster will be used by multiple users. How can we configure it so that users can submit jobs from their personal computers, and is there any free tool you can suggest to simplify the procedure? -- Uğur Sopaoğlu

Re: [Structured Streaming] Avoiding multiple streaming queries

2018-07-23 Thread kant kodali
I understand each row has a topic column, but can we write one row to multiple topics? On Thu, Jul 12, 2018 at 11:00 AM, Arun Mahadevan wrote: > What I meant was the number of partitions cannot be varied with > ForeachWriter vs. if you were to write to each sink using independent > queries. Maybe