Found jars in /assembly/target/scala-2.12/jars

2020-06-23 Thread Anwar AliKhan

[Spark Streaming] predicate pushdown in custom connector source.

2020-06-23 Thread Rahul Kumar
I'm trying to implement a Spark Structured Streaming source for a custom connector. I'm wondering if it is possible to do predicate pushdown in the streaming source? I'm aware this may be something native to the datastore in question. However, I would really appreciate it if someone can redirect me

Re: Spark Small file issue

2020-06-23 Thread German SM
Hi, When reducing partitions, it is better to use coalesce because it doesn't need to shuffle the data. dataframe.coalesce(1) On Tue., Jun 23, 2020, 23:54, Hichki wrote: > Hello Team, > > > > I am new to Spark environment. I have converted Hive query to Spark Scala. > Now I am loading data and
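German's point can be illustrated with a toy model. This is plain Python, not Spark's actual implementation: each partition is modeled as a list, `coalesce` concatenates whole existing partitions (no per-record movement, hence no shuffle), while `repartition` hashes every individual record to a new target partition (a full shuffle).

```python
# Conceptual sketch only -- models why coalesce(n) avoids a shuffle
# while repartition(n) does not. Not Spark internals.

def coalesce(partitions, n):
    """Merge existing partitions into n groups. Whole partitions are
    concatenated, so individual records never move between hosts."""
    groups = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        groups[i % n].extend(part)  # append an entire partition at once
    return groups

def repartition(partitions, n):
    """Hash every single record to a target partition -- the per-record
    movement is what makes this a full shuffle in Spark."""
    groups = [[] for _ in range(n)]
    for part in partitions:
        for record in part:
            groups[hash(record) % n].append(record)
    return groups
```

For the small-file problem above, `dataframe.coalesce(1)` collapses many small output partitions into one file without reshuffling the data first.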

Spark Small file issue

2020-06-23 Thread Hichki
Hello Team, I am new to the Spark environment. I have converted a Hive query to Spark Scala. Now I am loading data and doing performance testing. Below are details on loading 3 weeks of data. The cluster-level small-file average size is set to 128 MB. 1. The new temp table where I am loading data is ORC

Where are all the jars gone ?

2020-06-23 Thread Anwar AliKhan
Hi, I prefer to do most of my projects in Python, and for that I use Jupyter. I have been downloading the pre-built (compiled) version of Spark. I do not normally like the source code version because the build process makes me nervous, you know, with lines of stuff scrolling up the screen. What am I am

Re: apache-spark mongodb dataframe issue

2020-06-23 Thread Jeff Evans
As far as I know, in general, there isn't a way to distinguish explicit null values from missing ones. (Someone please correct me if I'm wrong, since I would love to be able to do this for my own reasons). If you really must do it, and don't care about performance at all (since it will be
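Jeff's "slow but possible" idea can be sketched in plain Python: instead of relying on the flattened DataFrame (where both cases become null), inspect the raw JSON/BSON form of each document, where a missing key and an explicit null are still distinguishable. The field name `"age"` and the helper below are illustrative, not part of any Spark or MongoDB API.

```python
import json

# Sketch: classify a field in a raw JSON document as missing,
# explicitly null, or present. In Spark's flattened schema the first
# two cases are indistinguishable; at the raw-document level they are not.

def field_state(raw_json, field):
    doc = json.loads(raw_json)
    if field not in doc:
        return "missing"
    return "explicit null" if doc[field] is None else "present"
```

In a real pipeline this per-document inspection would run over the raw documents (hence the performance caveat) before, or instead of, the usual DataFrame load.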

apache-spark mongodb dataframe issue

2020-06-23 Thread Harmanat Singh
Hi, please take a look at my issue at the link below. https://stackoverflow.com/questions/62526118/how-to-differentiate-between-null-and-missing-mongogdb-values-in-a-spark-datafra Kindly help. Best, Mannat

elasticsearch-hadoop is not compatible with Spark 3.0 (Scala 2.12)

2020-06-23 Thread murat migdisoglu
Hi, I'm testing our codebase against the Spark 3.0.0 stack and I realized that the elasticsearch-hadoop libraries are built against Scala 2.11 and thus do not work with Spark 3.0.0 (and probably 2.4.2). Is anybody else facing this issue? How did you solve it? The PR on the ES library is open
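Since this is a binary-compatibility problem, the eventual fix is a build-configuration change rather than a code change: depend on a Scala 2.12 build of the connector once one is published. A hypothetical sbt sketch (the artifact name and version below are assumptions, not confirmed coordinates):

```scala
// build.sbt -- illustrative only; check the elasticsearch-hadoop release
// notes for the real Scala 2.12 / Spark 3.0 artifact coordinates.
scalaVersion := "2.12.10"

// %% appends the Scala binary suffix (_2.12), so a 2.11-only artifact
// would fail to resolve instead of failing at runtime.
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark-30" % "<2.12-compatible version>"
```

Until such an artifact exists, the usual workarounds are building the connector from source against 2.12 or pinning Spark to a 2.11-based release.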