I'm trying to implement a Structured Streaming source for a custom
connector. I'm wondering if it is possible to do predicate pushdown in the
streaming source? I'm aware this may be something native to the datastore in
question. However, I would really appreciate it if someone could redirect me to
Hi,
When reducing the number of partitions, it is better to use coalesce because
it doesn't need to shuffle the data.
dataframe.coalesce(1)
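A small runnable sketch of the difference; the range DataFrame and the
partition count of 8 are just stand-ins for illustration:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("coalesce-example").master("local[*]").getOrCreate()
val df = spark.range(1000).toDF("id")          // stand-in for the real dataframe

// coalesce(n) merges existing partitions without a shuffle;
// repartition(n) always shuffles but yields evenly sized partitions.
val singleFile = df.coalesce(1)                // cheap, but all data funnels through one task
val rebalanced = df.repartition(8)             // full shuffle; 8 is an arbitrary example count

println(singleFile.rdd.getNumPartitions)       // 1
println(rebalanced.rdd.getNumPartitions)       // 8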
On Tue., Jun 23, 2020, 23:54, Hichki wrote:
Hello Team,
I am new to the Spark environment. I have converted a Hive query to Spark Scala.
Now I am loading data and doing performance testing. Below are the details on
loading three weeks of data. The cluster-level small-file average size is set to 128 MB.
1. The new temp table where I am loading data is in ORC format
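A hedged sketch of the kind of ORC load being described, assuming the
converted query output is available as a DataFrame; the source query, table
names, and partition count are made up for illustration:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("orc-load").enableHiveSupport().getOrCreate()

// Stand-in for the converted Hive query; the source table name is hypothetical.
val threeWeeks = spark.sql(
  "SELECT * FROM source_db.events WHERE event_date >= date_sub(current_date(), 21)")

// Control the number of output files so each lands near the 128 MB small-file target;
// 24 is a hypothetical partition count.
threeWeeks
  .repartition(24)
  .write
  .format("orc")
  .mode("overwrite")
  .saveAsTable("tmp_db.new_temp_table")   // hypothetical temp table name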
Hi,
I prefer to do most of my projects in Python, and for that I use Jupyter.
I have been downloading the compiled version of Spark.
I do not normally like the source-code version because the build process
makes me nervous.
You know, with lines of stuff scrolling up the screen.
What am I going
As far as I know, in general, there isn't a way to distinguish explicit
null values from missing ones. (Someone please correct me if I'm wrong,
since I would love to be able to do this for my own reasons.) If you
really must do it, and don't care about performance at all (since it will
be horrible)
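For what it's worth, a rough sketch of that slow route: keep the raw JSON
text of each document and test key presence separately from null-ness. The
sample documents, the "age" field, and the string-based presence check are
all illustrative (the presence check is crude and can false-positive on
nested keys):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("null-vs-missing").master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in documents: one explicit null, one missing field, one real value.
val raw = Seq(
  """{"name":"a","age":null}""",
  """{"name":"b"}""",
  """{"name":"c","age":3}""").toDF("value")

val tagged = raw.select(
  get_json_object($"value", "$.age").as("age"),   // null whether the key is absent or explicitly null
  $"value".contains("\"age\"").as("hasAgeKey")    // crude presence check on the raw text
)
// hasAgeKey = true  and age is null  -> explicit null
// hasAgeKey = false                  -> field missing entirely
tagged.show()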
Hi
Please take a look at my issue at the link below.
https://stackoverflow.com/questions/62526118/how-to-differentiate-between-null-and-missing-mongogdb-values-in-a-spark-datafra
Kindly help.
Best
Mannat
Hi,
I'm testing our codebase against the Spark 3.0.0 stack, and I realized that
the elasticsearch-hadoop libraries are built against Scala 2.11 and thus do
not work with Spark 3.0.0 (and probably 2.4.2 as well).
Is anybody else facing this issue? How did you solve it?
The PR on the ES library is open s
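A hedged build.sbt sketch of the version alignment involved; the point is
only that every Spark-facing dependency has to carry the same Scala suffix
(2.12 for Spark 3.0.0), not that these exact coordinates are the fix:

// Spark 3.0.0 is built against Scala 2.12, so the project and every connector
// artifact must use a _2.12 suffix; a 2.11-only elasticsearch-hadoop build
// cannot be mixed into the same classpath.
scalaVersion := "2.12.10"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.0.0" % "provided"
  // the elasticsearch-spark dependency would go here once an artifact published
  // for Scala 2.12 / Spark 3.0 is available (tracked by the PR mentioned above)
)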