spark.sql.hive.exec.dynamic.partition description

2019-04-29 Thread Mike Chan
Hi guys, does anyone have detailed descriptions of the Hive parameters in Spark, such as spark.sql.hive.exec.dynamic.partition? I couldn't find any reference to it in my Spark 2.3.2 configuration. I'm looking into a problem where Spark cannot understand Hive partitions at all. In my Hive table it is
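A common pattern for Hive-format tables is to set Hive's own dynamic-partition keys through Spark SQL's `SET` command. Spark 2.3.0 and later also have a Spark-side setting for overwriting partitioned data source tables. This is a minimal PySpark sketch, assuming a Spark 2.3+ session built with Hive support. The app name is a placeholder. The message is cut off, so it is an assumption that these settings bear on the partition problem described here.

```python
from pyspark.sql import SparkSession

# Placeholder app name; Hive support is required to read and write Hive tables.
spark = (
    SparkSession.builder
    .appName("dynamic-partition-example")
    .enableHiveSupport()
    .getOrCreate()
)

# Hive's own keys (note: no "spark.sql." prefix), set through Spark SQL.
# "nonstrict" allows inserts where every partition column is dynamic.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Spark's own setting (2.3.0+) for INSERT OVERWRITE into partitioned
# data source tables: "dynamic" replaces only the partitions being written.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
```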

unsubscribe

2019-04-29 Thread Amrit Jangid

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-04-29 Thread Rishi Shah
Modified the subject; I would like to clarify that I am looking to create an Anaconda parcel with PyArrow and other libraries so that I can distribute it on the Cloudera cluster. On Tue, Apr 30, 2019 at 12:21 AM Rishi Shah wrote: > Hi All, > > I have been trying to figure out a way to build

Anaconda installation with Pyspark on cloudera managed server

2019-04-29 Thread Rishi Shah
Hi All, I have been trying to figure out a way to build an Anaconda parcel with PyArrow included for distribution on my Cloudera-managed server, but it doesn't seem to work. Could someone please help? I have tried to install Anaconda on one of the management nodes of the Cloudera cluster...

Re: [EXT] handling skewness issues

2019-04-29 Thread Jules Damji
Yes, indeed! A few of the developer and deep-dive talks address data skew and how to handle it. I shall let the group know when the talk sessions are available. Cheers Jules > On Apr 29, 2019, at 2:13 PM, Michael Mansour >

Re: Handle Null Columns in Spark Structured Streaming Kafka

2019-04-29 Thread Jason Nerothin
See also here: https://stackoverflow.com/questions/44671597/how-to-replace-null-values-with-a-specific-value-in-dataframe-using-spark-in-jav On Mon, Apr 29, 2019 at 5:27 PM Jason Nerothin wrote: > Spark SQL has had an na.fill function on it since at least 2.1. Would that > work for you? > > >

Re: Handle Null Columns in Spark Structured Streaming Kafka

2019-04-29 Thread Jason Nerothin
Spark SQL has had an na.fill function on it since at least 2.1. Would that work for you? https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/DataFrameNaFunctions.html On Mon, Apr 29, 2019 at 4:57 PM Shixiong(Ryan) Zhu wrote: > Hey Snehasish, > > Do you have a reproducer for this

Re: [EXT] handling skewness issues

2019-04-29 Thread Michael Mansour
There were recently some fantastic talks about this at the SparkSummit conference in San Francisco. I suggest you check out the SparkSummit YouTube channel after May 9th for a deep dive into this topic. From: rajat kumar Date: Monday, April 29, 2019 at 9:34 AM To: "user@spark.apache.org"

Re: spark hive concurrency

2019-04-29 Thread Mich Talebzadeh
That assertion seems to be true. Spark does not seem to hold locks when doing DML on a Hive table. I cannot recall whether I checked it in previous versions of Spark. However, in Spark 2.3 I can see that this is true using Hive 3.0. This may be a potential oversight as Spark SQL and Hive are drifting

Issue with offset management using Spark on Dataproc

2019-04-29 Thread Austin Weaver
Hey guys, relatively new Spark dev here. I'm seeing some Kafka offset issues and was wondering if you could help me out. I am currently running a Spark job on Dataproc and am getting errors trying to rejoin a group and read data from a Kafka topic. I have done some digging and am not

handling skewness issues

2019-04-29 Thread rajat kumar
Hi All, how can I overcome skew issues in Spark? I read that we can add some randomness to the key column before a join and remove that random part after the join. Is there a better way? The above method seems to be a workaround. Thanks, rajat
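The salting workaround described here can be sketched in plain Python, with lists of `(key, value)` pairs standing in for DataFrames. This is a minimal illustration under that assumption, not Spark code. The names `salted_join`, `num_salts` and `seed` are hypothetical. In Spark, the benefit comes from the shuffle partitioning on `(key, salt)`, which spreads a hot key's rows across up to `num_salts` tasks instead of one. The smaller side must be replicated once per salt value so that every salted row still finds its match.

```python
import random
from collections import defaultdict


def salted_join(left, right, num_salts=4, seed=0):
    """Inner-join two lists of (key, value) pairs on key using salting.

    left:  the large, skewed side; each row gets one random salt.
    right: the smaller side; each row is copied once per salt value.
    Returns (key, left_value, right_value) tuples with the salt removed.
    """
    rng = random.Random(seed)

    # 1. Salt the skewed side: a hot key becomes up to num_salts distinct keys.
    salted_left = [((key, rng.randrange(num_salts)), value) for key, value in left]

    # 2. Replicate the other side once per salt so every (key, salt) can match.
    salted_right = defaultdict(list)
    for key, value in right:
        for salt in range(num_salts):
            salted_right[(key, salt)].append(value)

    # 3. Join on the salted key, then drop the salt from the output.
    joined = []
    for (key, salt), left_value in salted_left:
        for right_value in salted_right.get((key, salt), []):
            joined.append((key, left_value, right_value))
    return joined
```

The trade-off is that the smaller side grows by a factor of `num_salts`. This cost is one reason salting is often described as a workaround rather than a fix.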

Spark 2.4.1 on Kubernetes - DNS resolution of driver fails

2019-04-29 Thread Olivier Girardot
Hi everyone, I have ~300 Spark jobs on Kubernetes (GKE) using the cluster autoscaler, and sometimes while these jobs are running something pretty bad happens: the driver (in cluster mode) gets scheduled on Kubernetes and launches many executor pods. So far so good, but the k8s "Service" associated with

Re: Getting EOFFileException while reading from sequence file in spark

2019-04-29 Thread Prateek Rajput
I checked for and removed the 0-sized files, but the error still occurs. It also sometimes happens when there are no 0-sized files. I also checked whether the data is corrupted by opening the file directly and inspecting it. I traced the whole data set but did not find any issue. For Hadoop MapReduce no such

Re: Getting EOFFileException while reading from sequence file in spark

2019-04-29 Thread Deepak Sharma
This can happen if the file size is 0. On Mon, Apr 29, 2019 at 2:28 PM Prateek Rajput wrote: > Hi guys, > I am getting this strange error again and again while reading from a > sequence file in spark. > User class threw exception: org.apache.spark.SparkException: Job aborted. > at >
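To test the zero-size hypothesis, you can list empty input files before the job reads them. This sketch assumes the sequence files are on a local or locally mounted filesystem. `find_empty_files` is a hypothetical helper. For data on HDFS, the same check would go through `hdfs dfs -ls` or the Hadoop FileSystem API instead.

```python
import os


def find_empty_files(root):
    """Return sorted paths of zero-byte regular files under root, recursively."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files; a 0-byte sequence file has no header to read.
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```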

Getting EOFFileException while reading from sequence file in spark

2019-04-29 Thread Prateek Rajput
Hi guys, I am getting this strange error again and again while reading from a sequence file in Spark. User class threw exception: org.apache.spark.SparkException: Job aborted. at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:100) at

spark hive concurrency

2019-04-29 Thread CPC
Hi All, does Spark 2 support concurrency on Hive tables? I mean that when we query with Hive and issue SHOW LOCKS, we can see shared locks. But when we query the tables with Spark SQL, we cannot see any locks on them. Thanks in advance.