Hi Sushma,
Can you try the following, using a left anti join? In my example, name and id
together form the key.
df1.alias("a").join(df2.alias("b"),
col("a.name").equalTo(col("b.name"))
.and(col("a.id").equalTo(col("b.id"))) ,
"left_anti").selectExpr("name", "id").show(10,
Just wondering if anyone has tried the Spark Structured Streaming Kafka
connector (2.2) with Kafka 0.11 or Kafka 1.0?
Thanks
Raghav
When I read Hive data with Spark SQL 2.2.0, I get the following exception:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
tree:
Exchange hashpartitioning(pid#1, 200)
+- *HashAggregate(keys=[pid#1], functions=[partial_sum(expnum#0L),
partial_sum(outnum#2L)], output=[pid#1, sum#655L,
This didn't work. I tried it, but no luck.
On Wed, Nov 29, 2017 at 7:49 PM, Vadim Semenov
wrote:
> You can pass `JAVA_HOME` environment variable
>
> `spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0`
>
> On Wed, Nov 29, 2017 at 10:54 AM, KhajaAsmath Mohammed <
>
Dear Friends,
I am new to Spark DataFrames. I have dataframe1, which contains today's
records, and dataframe2, which contains yesterday's records. I need to
compare today's records with yesterday's and find the new records that
do not exist in yesterday's records, based
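To make the requirement concrete, here is a minimal sketch in plain Java (no Spark) of what a left anti join computes: every row from today's data whose key has no match in yesterday's data. The `Row` class, the `leftAnti` helper, and the choice of (name, id) as the key are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class LeftAntiJoinSketch {
    /** Hypothetical row type; (name, id) is assumed to be the key. */
    static final class Row {
        final String name;
        final int id;

        Row(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Row)) {
                return false;
            }
            Row r = (Row) o;
            return id == r.id && Objects.equals(name, r.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, id);
        }

        @Override
        public String toString() {
            return name + ":" + id;
        }
    }

    /** Keeps each left row whose (name, id) key does not occur on the right. */
    static List<Row> leftAnti(List<Row> left, List<Row> right) {
        Set<Row> rightKeys = new HashSet<>(right);
        List<Row> result = new ArrayList<>();
        for (Row row : left) {
            if (!rightKeys.contains(row)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> today = Arrays.asList(new Row("a", 1), new Row("b", 2), new Row("c", 3));
        List<Row> yesterday = Arrays.asList(new Row("a", 1), new Row("b", 9));
        // Only a:1 has a matching key in yesterday's data, so it is dropped.
        System.out.println(leftAnti(today, yesterday)); // prints [b:2, c:3]
    }
}
```

One difference from Spark is worth keeping in mind: this sketch treats two null names as equal, whereas an equality condition in Spark SQL never matches on null, so rows with a null key column would always be kept by a Spark left anti join.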
You can pass `JAVA_HOME` environment variable
`spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0`
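As a sketch only, the same setting could be placed in `spark-defaults.conf` (or passed with `--conf` on `spark-submit`). The second line is an assumption that the job runs on YARN, where the application master takes its environment from a separate property. Both lines assume the JDK path exists on every worker node.

```
# spark-defaults.conf: key and value are separated by whitespace
spark.executorEnv.JAVA_HOME        /usr/lib/jvm/java-1.8.0
# Assumption: YARN deployment; sets JAVA_HOME for the application master
spark.yarn.appMasterEnv.JAVA_HOME  /usr/lib/jvm/java-1.8.0
```

Whether the executors actually pick up this JDK depends on the cluster manager and on how the distribution launches its containers.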
On Wed, Nov 29, 2017 at 10:54 AM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
> Hi,
>
> I am running cloudera version of spark2.1 and our cluster is on JDK1.7.
> For some of the
Hi,
I am running the Cloudera version of Spark 2.1, and our cluster is on JDK 1.7.
For some of the libraries I need JDK 1.8. Is there a way to run the Spark
workers on JDK 1.8 without upgrading?
I was able to run the driver on JDK 1.8 by setting the path, but not the workers.
17/11/28 20:22:27 WARN
Thanks a lot.
I will have a look at the issues.
From: Felix Cheung
Sent: Wednesday, 29 November 2017 04:47
To: Kunft, Andreas; user@spark.apache.org
Subject: Re: [Spark R]: dapply only works for very small datasets
You can find more
I was really excited by the demo in Summer 2017 on Continuous Processing Mode
for Structured Streaming, and have been regularly checking the JIRA item
(https://issues.apache.org/jira/browse/SPARK-20928) for activity.
We have a project with very low latency requirements that can only be