Re: Why spark-submit works with package not with jar

2020-10-20 Thread Wim Van Leuven
Sean, the problem with --packages is that in enterprise settings security might not allow the data environment to link to the internet, or even to the internal proxying artefact repository. Also, weren't uber jars an antipattern? For some reason I don't like them... Kind regards -wim On Wed, 21 Oct
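For reference, in such locked-down environments the usual route is to resolve dependencies at build time against an internal mirror rather than at submit time; a minimal build.sbt sketch, where the mirror URL is a placeholder, not a real endpoint:

    // build.sbt: resolve everything through an internal artefact mirror
    // (the URL below is a placeholder)
    resolvers := Seq(
      "internal-mirror" at "https://artifactory.internal.example/maven"
    )
    // Spark is "provided" because the cluster already ships it
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1" % "provided"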

Re: Spark Structured streaming - Kafka - slowness with query 0

2020-10-20 Thread lec ssmi
Structured Streaming's bottom layer also uses a micro-batch mechanism. It seems that the first batch is slower than the later ones; I also often encounter this problem. It feels related to how the batches are divided. On the other hand, Spark's batch size is usually bigger than Flume's
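For what it's worth, one knob that often tames an oversized first batch is the Kafka source's maxOffsetsPerTrigger option; a minimal Scala sketch, with placeholder broker and topic names:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("kafka-read").getOrCreate()

    // Cap how many offsets each micro-batch may consume, so the first
    // batch cannot try to swallow the whole retained backlog at once.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "my_topic")                  // placeholder
      .option("maxOffsetsPerTrigger", "10000")
      .load()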

Re: Spark Structured streaming - Kafka - slowness with query 0

2020-10-20 Thread KhajaAsmath Mohammed
Yes. Changing back to latest worked, but I still see slowness compared to Flume. Sent from my iPhone > On Oct 20, 2020, at 10:21 PM, lec ssmi wrote: > > Do you start your application by reading the early Kafka data? > > Lalwani, Jayesh wrote on Wed, Oct 21, 2020 at 2:19 AM: >> Are you
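The setting being toggled here is presumably the Kafka source's startingOffsets option; a minimal sketch of the two variants, with placeholder broker and topic names:

    // "latest":   only data arriving after the query starts
    // "earliest": replay the whole retained backlog first
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "my_topic")                  // placeholder
      .option("startingOffsets", "latest")              // vs "earliest"
      .load()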

Re: Spark Structured streaming - Kafka - slowness with query 0

2020-10-20 Thread lec ssmi
Do you start your application by reading the early Kafka data? Lalwani, Jayesh wrote on Wed, Oct 21, 2020 at 2:19 AM: > Are you getting any output? Streaming jobs typically run forever, and keep > processing data as it arrives in the input. If a streaming job is working > well, it will typically generate

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Thanks again all. Anyway, as Nicolas suggested, I used the trench-war approach to sort this out by just using jars and working out their dependencies in the ~/.ivy2/jars directory using grep -lRi :) This now works with just jars (the newly added ones in grey) after resolving the dependencies

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Sean Owen
Rather, let --packages (via Ivy) worry about them, because they tell Ivy what they need. There's no 100% guarantee that conflicting dependencies are resolved in a way that works in every single case, which you run into sometimes when using incompatible libraries, but yes, this is the point of

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Or just use mvn or sbt to create an uber jar file.
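A minimal sbt-assembly sketch of such an uber jar, with illustrative names and versions (the plugin line goes in project/plugins.sbt):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

    // build.sbt
    name := "my-spark-job" // placeholder
    scalaVersion := "2.11.12"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1" % "provided"

    // Spark itself is "provided" so it stays out of the uber jar;
    // overlapping META-INF entries from merged jars are discarded.
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }

Running `sbt assembly` then yields a single jar to hand to spark-submit.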

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Thanks again all. Hi Sean, as I understood from your statement, you are suggesting just using --packages without worrying about individual jar dependencies?

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Sean Owen
From the looks of it, it's the com.google.http-client ones. But there may be more. You should not have to reason about this. That's why you let Maven / Ivy resolution figure it out. It is not true that everything in .ivy2 is on the classpath. On Tue, Oct 20, 2020 at 3:48 PM Mich Talebzadeh

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Nicolas Paris
You can proceed step by step. > java.lang.NoClassDefFoundError: > com/google/api/client/http/HttpRequestInitializer I would run `grep -lRi HttpRequestInitializer` in the ivy2 folder to spot the jar containing that class. After several other class-not-found rounds you should succeed. Mich Talebzadeh

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Hi Nicolas, I removed ~/.ivy2 and reran the spark job with the package included (the one working). Under ~/.ivy2/jars I have 37 jar files, including the one that I had before. /home/hduser/.ivy2/jars> ls com.databricks_spark-avro_2.11-4.0.0.jar

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Nicolas Paris
Once you have got the jars from --packages in the ~/.ivy2 folder, you can then pass the list to --jars; that way there is no missing dependency. ayan guha writes: > Hi > > One way to think of this is: --packages is better when you have third-party > dependencies and --jars is better when you have

Re: Why spark-submit works with package not with jar

2020-10-20 Thread ayan guha
Hi, one way to think of this is: --packages is better when you have third-party dependencies and --jars is better when you have custom in-house built jars. On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh wrote: > Thanks Sean and Russell. Much appreciated. > > Just to clarify recently I had issues

Re: Spark Structured streaming - Kafka - slowness with query 0

2020-10-20 Thread Lalwani, Jayesh
Are you getting any output? Streaming jobs typically run forever, and keep processing data as it arrives in the input. If a streaming job is working well, it will typically generate output at a certain cadence. From: KhajaAsmath Mohammed Date: Tuesday, October 20, 2020 at 1:23 PM To: "user
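A quick way to confirm whether any output is being produced at all is a console sink with an explicit trigger; a minimal sketch, assuming `df` is the streaming DataFrame in question:

    import org.apache.spark.sql.streaming.Trigger

    // Print each micro-batch to the console every 30 seconds; if nothing
    // ever appears, the job is likely still grinding through the backlog.
    val query = df.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("30 seconds"))
      .start()

    query.awaitTermination()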

Spark Structured streaming - Kafka - slowness with query 0

2020-10-20 Thread KhajaAsmath Mohammed
Hi, I have started using Spark Structured Streaming for reading data from Kafka and the job is very slow. The number of output rows keeps increasing in query 0 and the job is running forever. Any suggestions for this, please? Thanks, Asmath

Pyspark Framework for Apache Atlas (especially Tagging)

2020-10-20 Thread Dennis Suhari
Hi Spark Community, does somebody know of a Pyspark framework that integrates with Apache Atlas? I want to trigger tagging etc. through my Pyspark DataFrame operations. Atlas has an API which I could use, so I could write my own framework, but before I do this I wanted to ask whether anyone knows

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Thanks Sean and Russell. Much appreciated. Just to clarify: recently I had issues with different versions of Google Guava jar files when building the uber jar file (to evict the unwanted ones). These used to work a year and a half ago using Google Dataproc compute engines (which come with Spark preloaded) and
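One common hedge against such Guava clashes is shading them inside the uber jar; a sketch using sbt-assembly's ShadeRule, where the shaded prefix is arbitrary:

    // build.sbt: rename Guava's packages inside the uber jar so they
    // cannot clash with the Guava that Dataproc / Spark already ships.
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll
    )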

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Sean Owen
Probably because your JAR file requires other JARs which you didn't supply. If you specify a package, it reads metadata like a pom.xml file to understand what other dependent JARs also need to be loaded. On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh wrote: > Hi, > > I have a scenario that I

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Russell Spitzer
--jars adds only that jar; --packages adds the jar and its dependencies as listed in Maven. On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh wrote: > Hi, > > I have a scenario that I use with spark-submit as follows: > > spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars >

Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Hi, I have a scenario that I use with spark-submit as follows: spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,/home/hduser/jars/spark-bigquery_2.11-0.2.6.jar As you can see, the jar files needed

【The decimal result is incorrectly enlarged by 100 times】

2020-10-20 Thread 王长春
Hi, I have come across a problem with the correctness of Spark decimals, and I have been researching it for a few days. This problem is very curious. My Spark version is 2.3.1. I have SQL like this: Create table table_S stored as orc as Select a*b*c from table_a Union all Select d from table_B
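For reference, Spark 2.3 changed decimal arithmetic to allow precision loss on overflow (SPARK-22036, controlled by the spark.sql.decimalOperations.allowPrecisionLoss conf); a minimal sketch for probing whether that is the culprit, with placeholder precision and scale:

    // With the conf off, decimal arithmetic that overflows returns null
    // instead of silently rounding or rescaling the result.
    spark.sql("SET spark.sql.decimalOperations.allowPrecisionLoss=false")

    // Casting operands to an explicit precision/scale (placeholders)
    // keeps the inferred result type within decimal(38, x).
    spark.sql("""
      SELECT CAST(a AS DECIMAL(20,2)) * CAST(b AS DECIMAL(20,2)) * CAST(c AS DECIMAL(20,2))
      FROM table_a
    """).show()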

Organize a Meetup of Apache Spark

2020-10-20 Thread Raúl Martín Saráchaga Díaz
Hi, I would like to organize a meetup of Apache Spark in Lima, Peru. I love share with all the community. Regards, Raúl Saráchaga