Re: DataFrame joins with Spark-Java

2017-11-29 Thread Rishi Mishra
Hi Sushma, can you try a left anti join as below? In my example, name & id together form the key. df1.alias("a").join(df2.alias("b"), col("a.name").equalTo(col("b.name")).and(col("a.id").equalTo(col("b.id"))), "left_anti").selectExpr("name", "id").show(10,
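
For reference, here is a self-contained sketch of the suggested left anti join in the Java API. The join itself follows Rishi's snippet; the SparkSession setup, input paths, and file format are assumptions added purely for illustration:

    import static org.apache.spark.sql.functions.col;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class LeftAntiJoinExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("LeftAntiJoinExample")
                    .master("local[*]")  // assumption: local run for illustration
                    .getOrCreate();

            // df1 = today's records, df2 = yesterday's records (hypothetical paths)
            Dataset<Row> df1 = spark.read().json("today.json");
            Dataset<Row> df2 = spark.read().json("yesterday.json");

            // left_anti keeps only rows of df1 whose (name, id) key has no match in df2
            Dataset<Row> newRecords = df1.alias("a").join(
                    df2.alias("b"),
                    col("a.name").equalTo(col("b.name"))
                            .and(col("a.id").equalTo(col("b.id"))),
                    "left_anti");

            newRecords.selectExpr("name", "id").show(10, false);

            spark.stop();
        }
    }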

Kafka version support

2017-11-29 Thread Raghavendra Pandey
Just wondering if anyone has tried the Spark Structured Streaming Kafka connector (2.2) with Kafka 0.11 or Kafka 1.0. Thanks, Raghav
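
For context, a minimal sketch of how the Structured Streaming Kafka source is wired up in Java. The broker address and topic name are placeholders, and the spark-sql-kafka-0-10 artifact must be on the classpath; whether this works against 0.11/1.0 brokers is exactly the open question here:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class KafkaSourceSketch {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("KafkaSourceSketch")
                    .getOrCreate();

            // Subscribe to a topic via the kafka source
            Dataset<Row> stream = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "host1:9092")  // placeholder broker
                    .option("subscribe", "my-topic")                  // placeholder topic
                    .load();

            // Kafka keys and values arrive as binary; cast them for inspection
            stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
                    .writeStream()
                    .format("console")
                    .start()
                    .awaitTermination();
        }
    }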

spark2.2 org.apache.spark.sql.catalyst.errors.package$TreeNodeException

2017-11-29 Thread starstar
When I read Hive data with Spark 2.2.0 SQL, I get this exception: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange hashpartitioning(pid#1, 200) +- *HashAggregate(keys=[pid#1], functions=[partial_sum(expnum#0L), partial_sum(outnum#2L)], output=[pid#1, sum#655L,

Re: JDK1.8 for spark workers

2017-11-29 Thread KhajaAsmath Mohammed
This didn't work; I tried it but no luck. On Wed, Nov 29, 2017 at 7:49 PM, Vadim Semenov wrote: > You can pass `JAVA_HOME` environment variable > > `spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0` > > On Wed, Nov 29, 2017 at 10:54 AM, KhajaAsmath Mohammed < >

DataFrame joins with Spark-Java

2017-11-29 Thread sushma spark
Dear Friends, I am new to Spark DataFrames. My requirement: dataframe1 contains today's records and dataframe2 contains yesterday's records. I need to compare today's records with yesterday's records and find the new records which do not exist in yesterday's records, based

Re: JDK1.8 for spark workers

2017-11-29 Thread Vadim Semenov
You can pass the `JAVA_HOME` environment variable: `spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0` On Wed, Nov 29, 2017 at 10:54 AM, KhajaAsmath Mohammed < mdkhajaasm...@gmail.com> wrote: > Hi, > > I am running cloudera version of spark2.1 and our cluster is on JDK1.7. > For some of the
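
To make that concrete, here is one way to set the executor environment variable programmatically. This is a sketch only: the JDK path is an assumption, and as the follow-up in this thread notes, results with this approach were mixed on the reporter's cluster:

    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.SparkSession;

    public class ExecutorJavaHome {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("ExecutorJavaHome")
                    // Point executors at a JDK 1.8 install; the path is an assumption
                    .set("spark.executorEnv.JAVA_HOME", "/usr/lib/jvm/java-1.8.0");

            SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
            // ... job code ...
            spark.stop();
        }
    }

The same setting can also be passed at submit time with --conf spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0; on YARN, spark.yarn.appMasterEnv.JAVA_HOME covers the application master as well.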

JDK1.8 for spark workers

2017-11-29 Thread KhajaAsmath Mohammed
Hi, I am running the Cloudera version of Spark 2.1 and our cluster is on JDK 1.7. For some of the libraries I need JDK 1.8. Is there a way to run the Spark workers on JDK 1.8 without upgrading? I was able to run the driver on JDK 1.8 by setting the path, but not the workers. 17/11/28 20:22:27 WARN

Re: [Spark R]: dapply only works for very small datasets

2017-11-29 Thread Kunft, Andreas
Thanks a lot. I will have a look at the issues. From: Felix Cheung Sent: Wednesday, 29 November 2017 04:47 To: Kunft, Andreas; user@spark.apache.org Subject: Re: [Spark R]: dapply only works for very small datasets You can find more

[Structured Streaming] Continuous Processing Mode plan?

2017-11-29 Thread Marchant, Hayden
I was really excited by the demo in Summer 2017 on Continuous Processing Mode for Structured Streaming, and have been regularly checking the JIRA item (https://issues.apache.org/jira/browse/SPARK-20928) for activity. We have a project with very low latency requirements that can only be