I can't spend too much time explaining each point one by one. I strongly
encourage you to do a deep-dive yourself rather than just looking around,
since you want to know the "details"; that's how open source works.
I'll give a general explanation instead of replying inline; I'd probably
write a blog post if
Thank you so much for your feedback, Koert.
Yes, SPARK-20202 was created in April 2017
and has been targeted for 3.1.0 since Nov 2019.
However, I believe the Apache Spark 3.1.0 (Hadoop 3.2/Hive 2.3) distribution
will work with old Hadoop 2.x clusters
if you isolate the classpath via SPARK-31960.
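(A hedged illustration, not from the thread itself: SPARK-31960 concerns
whether Spark's YARN integration populates the container classpath from the
cluster's Hadoop installation. Assuming that reading, a launch against an old
Hadoop 2.x cluster might look like the sketch below; the application class
and jar are placeholders.)

  # Keep the cluster's Hadoop 2.x jars off the classpath of a Spark 3.1
  # (Hadoop 3.2/Hive 2.3) build by disabling Hadoop classpath population.
  spark-submit \
    --master yarn \
    --conf spark.yarn.populateHadoopClasspath=false \
    --class com.example.MyApp \
    my-app.jar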
It seems to me that with SPARK-20202 we are no longer planning to support
Hadoop 2 + Hive 1.2. Is that correct?
So basically Spark 3.1 will no longer run on, say, CDH 5.x or HDP 2.x with
Hive?
My use case is building Spark 3.1 and launching it on these existing clusters
that are not managed by me, e.g. I do
Thank you very much!
Sent from iPhone
> On 7 Oct 2020, at 17:38, mykidong wrote:
Dear all,
I have set up two Spark standalone test clusters, which both suffered from
the same problem. I have a workaround, but it's bad. I would appreciate some
help and input. I'm too much of a beginner to conclude that it's a bug, but I
found someone else having the exact same issue on Stack Overflow.
Hi all,
I have recently written a blog about Hive on Spark in a Kubernetes
environment:
- https://itnext.io/hive-on-spark-in-kubernetes-115c8e9fa5c1
In this blog, you can find how to run Hive on Kubernetes using the Spark
Thrift Server, which is compatible with HiveServer2.
Cheers,
- Kidong.
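(A hedged aside, not taken from the blog: since the Spark Thrift Server
speaks the HiveServer2 protocol, any HiveServer2 client should be able to
connect to it. The sketch below uses the third-party PyHive library; the host
name and table are made up.)

  from pyhive import hive  # client library for the HiveServer2 protocol

  # Hypothetical Kubernetes service fronting the Spark Thrift Server.
  conn = hive.connect(host="spark-thrift-server.example.com", port=10000)
  cursor = conn.cursor()
  cursor.execute("SELECT * FROM some_table LIMIT 10")
  for row in cursor.fetchall():
      print(row)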
Hi Jungtaek,
*> I meant the subdirectory inside the directory you're providing as
"checkpointLocation", as there are several directories in that directory...*
There are two:
*my-spark-checkpoint-dir/MainApp*
created by sparkSession.sparkContext().setCheckpointDir();
it contains only an empty subdirectory
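(For readers following the thread, a minimal PySpark sketch of the two
distinct settings being discussed; the paths are placeholders.)

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("MainApp").getOrCreate()

  # 1) setCheckpointDir() names the base directory for RDD/DataFrame
  #    checkpointing (rdd.checkpoint(), df.checkpoint()); Spark creates
  #    a subdirectory per application underneath it.
  spark.sparkContext.setCheckpointDir("my-spark-checkpoint-dir/MainApp")

  # 2) "checkpointLocation" is a per-query Structured Streaming option;
  #    it is the directory that holds the offsets/, commits/, sources/,
  #    and state/ subdirectories mentioned above.
  stream = spark.readStream.format("rate").load()
  query = (stream.writeStream
                 .format("console")
                 .option("checkpointLocation", "my-spark-checkpoint-dir/query1")
                 .start())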
I think a lot will depend on what the scripts do. I've seen some legacy
Hive scripts which were written in an awkward way (e.g. lots of subqueries,
nested explodes) because pre-Spark that was the only way to express certain
logic. For fairly straightforward operations I expect Catalyst would reduce
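(The message above is cut off. As a hedged aside: one way to see what
Catalyst actually does to such a script is to compare the parsed and
optimized plans; the nested query below is invented for illustration.)

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()
  spark.range(10).createOrReplaceTempView("t")

  # A needlessly nested query of the kind legacy Hive scripts often contain.
  q = spark.sql("""
      SELECT x FROM (
          SELECT id + 1 AS x FROM (
              SELECT id FROM t
          ) a
      ) b
  """)

  # Prints parsed, analyzed, optimized, and physical plans; the optimized
  # plan typically shows the nested projections collapsed into one.
  q.explain(extended=True)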
Hi!
Is there anywhere I can find information on how to use gapply with Arrow?
I've tried something very simple:

  collect(gapply(
    df,
    c("ColumnA"),
    function(key, x) {
      # return one constant row per group
      data.frame(out = c("dfs"), stringsAsFactors = FALSE)
    },
    "out String"
  ))

But it fails; similar code with integers or
I am trying to read a compressed CSV file in PySpark, but I am unable to read
it with the pyspark kernel in SageMaker.
I can read the same file using pandas when the kernel is conda-python3 (in
SageMaker).
What I tried:
file1 = 's3://testdata/output1.csv.gz'
file1_df = spark.read.csv(file1,
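(The message is cut off above. As a hedged completion under my own
assumptions: Spark infers the gzip codec from the .gz suffix, so a plain read
usually suffices; the header/schema options below are guesses at the poster's
intent.)

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # gzip-compressed CSVs are decompressed transparently based on the file
  # extension; note that a gzipped file is not splittable, so it is read
  # by a single task.
  file1 = 's3://testdata/output1.csv.gz'
  file1_df = spark.read.csv(file1, header=True, inferSchema=True)
  file1_df.show(5)

  # On some setups the failure is the path scheme rather than the codec:
  # outside EMR, the s3a:// scheme plus the hadoop-aws jars may be needed.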