Re: Excessive disk IO with Spark structured streaming

2020-10-07 Thread Jungtaek Lim
I can't spend too much time explaining every point one by one. I strongly encourage you to do a deep dive instead of just looking around, since you want to know the "details" - that's how open source works. I'll go through a general explanation instead of replying inline; probably I'd write a blog doc if

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-07 Thread Dongjoon Hyun
Thank you so much for your feedback, Koert. Yes, SPARK-20202 was created in April 2017 and has been targeted for 3.1.0 since Nov 2019. However, I believe Apache Spark 3.1.0 (the Hadoop 3.2/Hive 2.3 distribution) will work with old Hadoop 2.x clusters if you isolate the classpath via SPARK-31960.
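
(For the mechanics, here is a minimal sketch of my own, not from the thread: as far as I recall, SPARK-31960 introduced spark.yarn.populateHadoopClasspath. The app name and the idea of setting it from the session builder are assumptions; in practice you may prefer passing it via spark-submit --conf.)

    # Hedged sketch: run a Spark 3.x "with-Hadoop" build on a Hadoop 2.x
    # YARN cluster without picking up the cluster's Hadoop jars.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("classpath-isolation-sketch")  # hypothetical name
        # SPARK-31960: don't add the cluster's Hadoop classpath to the
        # YARN containers; use the Hadoop jars bundled with Spark instead.
        .config("spark.yarn.populateHadoopClasspath", "false")
        .getOrCreate()
    )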

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-07 Thread Koert Kuipers
It seems to me that with SPARK-20202 we are no longer planning to support Hadoop 2 + Hive 1.2. Is that correct? So basically Spark 3.1 will no longer run on, say, CDH 5.x or HDP 2.x with Hive? My use case is building Spark 3.1 and launching it on these existing clusters that are not managed by me. E.g. I do

Re: Hive on Spark in Kubernetes.

2020-10-07 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)
Thank you very much! Sent from my iPhone > On 7 Oct 2020, at 17:38, mykidong wrote: > > Hi all, > > I have recently written a blog about Hive on Spark in a Kubernetes > environment: > - https://itnext.io/hive-on-spark-in-kubernetes-115c8e9fa5c1 > > In this blog, you can find how to run

[Spark Core] - Installation issue - "java.lang.UnsatisfiedLinkError: no zstd-jni in java.library.path"

2020-10-07 Thread jelvis
Dear all, I have set up two Spark standalone test clusters, both of which suffered from the same problem. I have a workaround, but it's bad. I would appreciate some help and input. I'm too much of a beginner to conclude that it's a bug, but I found someone else having the exact same issue on Stack
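
(A hedged sketch of one common workaround, my illustration rather than the poster's: steer Spark away from zstd so the zstd-jni native library is never loaded. The config keys are standard Spark 3.x settings; whether this resembles the poster's "bad workaround" is an assumption.)

    # Hedged sketch: avoid loading zstd-jni by switching codecs to lz4.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("avoid-zstd-sketch")  # hypothetical name
        # Spark 3.x compresses shuffle map statuses with zstd by default.
        .config("spark.shuffle.mapStatus.compression.codec", "lz4")
        # General block/IO compression codec.
        .config("spark.io.compression.codec", "lz4")
        .getOrCreate()
    )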

Hive on Spark in Kubernetes.

2020-10-07 Thread mykidong
Hi all, I have recently written a blog about Hive on Spark in a Kubernetes environment: - https://itnext.io/hive-on-spark-in-kubernetes-115c8e9fa5c1 In this blog, you can find how to run Hive on Kubernetes using the Spark Thrift Server, which is compatible with HiveServer2. Cheers, - Kidong.

Re: Excessive disk IO with Spark structured streaming

2020-10-07 Thread Sergey Oboguev
Hi Jungtaek, *> I meant the subdirectory inside the directory you're providing as "checkpointLocation", as there're several directories in that directory...* There are two: *my-spark-checkpoint-dir/MainApp*, created by sparkSession.sparkContext().setCheckpointDir(), contains only an empty subdir
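
(To make the two locations concrete, a minimal sketch of my own assuming a rate source and console sink; all paths are hypothetical.)

    # Hedged sketch: the two distinct "checkpoint" settings in play.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("checkpoint-dirs-sketch").getOrCreate()

    # 1) RDD checkpoint directory, set via setCheckpointDir(); used by
    #    rdd.checkpoint(), not by structured streaming queries.
    spark.sparkContext.setCheckpointDir("/tmp/my-spark-checkpoint-dir/MainApp")

    # 2) Structured streaming checkpoint, set per query; this is where
    #    the offsets/, commits/ and state/ subdirectories appear.
    stream = spark.readStream.format("rate").load()
    query = (
        stream.writeStream
        .format("console")
        .option("checkpointLocation", "/tmp/my-spark-checkpoint-dir/query1")
        .start()
    )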

Re: Hive using Spark engine vs native spark with hive integration.

2020-10-07 Thread Patrick McCarthy
I think a lot will depend on what the scripts do. I've seen some legacy Hive scripts which were written in an awkward way (e.g. lots of subqueries, nested explodes) because pre-Spark it was the only way to express certain logic. For fairly straightforward operations I expect Catalyst would reduce
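
(An illustrative sketch, mine rather than Patrick's; the table and column names are invented.)

    # Hedged sketch: a legacy HiveQL pattern vs. the plain DataFrame
    # equivalent that Catalyst optimizes directly.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("legacy-vs-df-sketch").getOrCreate()

    # Legacy style: nested subquery + LATERAL VIEW explode, common in
    # pre-Spark Hive scripts.
    legacy = spark.sql("""
        SELECT t.id, x.item
        FROM (SELECT id, items FROM events WHERE id IS NOT NULL) t
        LATERAL VIEW explode(t.items) x AS item
    """)

    # Straightforward DataFrame equivalent of the same logic.
    direct = (
        spark.table("events")
        .where(F.col("id").isNotNull())
        .select("id", F.explode("items").alias("item"))
    )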

[SparkR] gapply with strings with arrow

2020-10-07 Thread Jacek Pliszka
Hi! Is there any place I can find information on how to use gapply with Arrow? I've tried something very simple:

    collect(gapply(
      df,
      c("ColumnA"),
      function(key, x) {
        data.frame(out = c("dfs"), stringAsFactors = FALSE)
      },
      "out String"
    ))

But it fails - similar code with integers or

reading a csv.gz file from sagemaker using pyspark kernel mode

2020-10-07 Thread cloudytech43
I am trying to read a compressed CSV file in PySpark, but I am unable to read it in PySpark kernel mode in SageMaker. The same file I can read using pandas when the kernel is conda-python3 (in SageMaker). What I tried:

    file1 = 's3://testdata/output1.csv.gz'
    file1_df = spark.read.csv(file1,
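
(A hedged sketch of what typically works outside EMR: the s3a:// scheme, which needs the hadoop-aws jar and AWS credentials available to the cluster. The header/inferSchema options are assumptions about the file, and an existing SparkSession named spark is assumed, as in the pyspark kernel.)

    # Hedged sketch: read a gzipped CSV from S3; Spark infers the gzip
    # codec from the .gz extension.
    file1 = "s3a://testdata/output1.csv.gz"
    file1_df = (
        spark.read
        .option("header", "true")        # assumption: header row present
        .option("inferSchema", "true")   # assumption: types unknown
        .csv(file1)
    )
    file1_df.show(5)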