I have a structured streaming query that sinks to Kafka. This query has
complex aggregation logic.
I would like to sink the output DataFrame of this query to multiple Kafka
topics, each partitioned on a different 'key' column. I don't want to have
multiple Kafka sinks for each of the different
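One way around that (a sketch, assuming pyspark and Spark's Kafka integration) is to add a per-row `topic` column: when the sink's `topic` option is not set, each row is written to the topic named in its `topic` column, so a single Kafka sink can cover several topics. The column names, broker address, and checkpoint path below are placeholders.

```python
def write_to_topics(agg_df, bootstrap_servers, checkpoint_dir):
    """Sketch: one Kafka sink that routes each row to the topic named in
    its 'topic' column. agg_df is assumed to already carry a 'topic'
    column and a per-topic 'key' column."""
    out = agg_df.selectExpr(
        "CAST(key AS STRING) AS key",   # partitioning key within each topic
        "to_json(struct(*)) AS value",  # payload serialized as JSON
        "topic",                        # routing column used by the sink
    )
    return (out.writeStream
               .format("kafka")
               .option("kafka.bootstrap.servers", bootstrap_servers)
               .option("checkpointLocation", checkpoint_dir)
               .start())
```

The price of this pattern is that every topic shares one query (and one checkpoint), so per-topic key columns have to be folded into the single `key` expression upstream.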
We actually moved away from using the driver pod IP because of
https://github.com/apache-spark-on-k8s/spark/issues/482. The way this
currently works is that the driver URL is constructed from the value of
"spark.driver.host", which is set to the DNS name of the headless driver
service in the
So, we've run into something interesting. In our case, we've got some
proprietary networking hardware that is very feature-limited in the TCP/IP
space, so with Romana, executors can't seem to find the driver using the
hostname-lookup method they attempt. Is there any way to make them use an
IP address?
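If DNS resolution is the blocker, one thing to try (an assumption on my part, not something verified on Romana) is pointing the driver address settings at the pod IP directly; `spark.driver.host` and `spark.driver.bindAddress` are the relevant settings, with `<driver-pod-ip>` as a placeholder:

```properties
# Hypothetical override: advertise and bind the driver by IP instead of DNS name
spark.driver.host=<driver-pod-ip>
spark.driver.bindAddress=<driver-pod-ip>
```

Note the caveat from the message above: using the pod IP directly was abandoned because of apache-spark-on-k8s issue 482, so this may reintroduce that problem.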
Thanks,
Hi,
I am getting the error below:
Caused by: org.apache.kafka.clients.consumer.OffsetOutOfRangeException:
Offsets out of range with no configured reset policy for partitions:
{topic1-0=304337}
as soon as I submit a Spark app to my cluster.
I am using the dependency below:
name:
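That exception usually means the committed offsets have aged out of Kafka's retention and the consumer Spark creates has no reset policy configured. A sketch of the Kafka source options that control this (the option names are real Spark options; the broker address and topic are placeholders, and `startingOffsets` only applies when no checkpoint exists yet):

```python
def read_from_earliest(spark, bootstrap_servers, topic):
    """Sketch: start the source from the earliest retained offsets and
    tolerate data removed by retention. Note that failOnDataLoss=false
    silently skips over lost offsets rather than failing the query."""
    return (spark.readStream
                 .format("kafka")
                 .option("kafka.bootstrap.servers", bootstrap_servers)
                 .option("subscribe", topic)
                 .option("startingOffsets", "earliest")
                 .option("failOnDataLoss", "false")
                 .load())
```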
Can you try running your query with a static literal for the date filter
(join_date >= SOME 2 MONTH OLD DATE)? I cannot think of any reason why this
query should create more than 60 tasks.
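For reference, a minimal sketch of what "static literal" means here (the column name and the date value are arbitrary placeholders):

```python
def filter_recent(df):
    """Sketch: use a fixed literal in the filter instead of a computed
    date expression, to check whether the filter affects the task count."""
    return df.filter("join_date >= '2017-12-12'")  # arbitrary example date
```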
On 12 Feb 2018 6:26 am, "amit kumar singh" wrote:
Hi
create table emp as select *
Thanks Nicholas. It makes sense. Now that I have a hint, I can play with it too!
jg
> On Feb 11, 2018, at 19:15, Nicholas Hakobian
> wrote:
>
> I spent a few minutes poking around in the source code and found this:
>
> The data type representing None, used
See
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
and
https://stackoverflow.com/questions/42448564/spark-sql-window-function-with-complex-condition
for a more involved example.
KhajaAsmath Mohammed wrote on Mon, 12 Feb 2018
at
I am also looking for the same answer. Will this work in a streaming
application too?
Sent from my iPhone
> On Feb 12, 2018, at 8:12 AM, Debabrata Ghosh wrote:
>
> Georg - Thanks ! Will you be able to help me with a few examples please.
>
> Thanks in advance again !
>
Georg - Thanks! Will you be able to help me with a few examples, please?
Thanks in advance again!
Cheers,
D
On Mon, Feb 12, 2018 at 6:03 PM, Georg Heiler
wrote:
> You should look into window functions for spark sql.
> Debabrata Ghosh wrote
Hey people, sortByKey is not lazily evaluated in Spark because it uses a
range partitioner, but I have one confusion: will it load the whole data
from the source, or only a sample? If I use sortByKey, will it pull the
whole data into memory or only a sample, so that the second time, when I
run a real action, it will load the
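For what it's worth, sortByKey does run one small job eagerly: it samples the keys to compute range-partition boundaries, and only a subsequent action materializes the full sorted data. A toy, pure-Python illustration of that boundary-sampling step (not Spark's actual implementation):

```python
import random

def range_boundaries(keys, num_partitions, sample_size=20):
    """Toy sketch of what a range partitioner's sampling step does:
    sort a small sample of the keys and pick partition boundaries from it.
    Only the sample is materialized up front, never the full dataset."""
    sample = sorted(random.sample(list(keys), min(sample_size, len(keys))))
    step = max(1, len(sample) // num_partitions)
    # Every 'step'-th sampled key becomes a boundary between partitions.
    return sample[step::step][: num_partitions - 1]
```

So the eager part touches only a sample; the full data is read when an action runs.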
You should look into window functions for spark sql.
Debabrata Ghosh wrote on Mon, 12 Feb 2018 at
13:10:
> Hi,
> Greetings !
>
> I needed some efficient way in pyspark to execute a
> comparison (on all the attributes) between the
Hi,
Greetings !
I needed some efficient way in pyspark to execute a
comparison (on all the attributes) between the current row and the previous
row. My intent here is to leverage the distributed framework of Spark to
the best extent, so that I can achieve a good
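A sketch of the usual approach, assuming pyspark: the lag() window function exposes the previous row's value per column, so the comparison stays inside Spark. The column names here are placeholders, and note that a window with no partitionBy pulls all rows into a single partition.

```python
def flag_changes(df, order_col, compare_cols):
    """Sketch: add a boolean '<col>_changed' column per attribute by
    comparing each row with the previous one via lag(). pyspark is
    imported lazily so the sketch reads stand-alone."""
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    w = Window.orderBy(order_col)  # consider adding partitionBy(...) at scale
    for c in compare_cols:
        df = df.withColumn(c + "_changed",
                           F.col(c) != F.lag(c).over(w))
    return df
```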
Hi,
I have questions regarding a Spark Structured Streaming deployment
recommendation.
I have around 100 Kafka topics that can be processed using a similar code
block. I am using pyspark 2.2.1.
Here is the idea:
TOPIC_LIST = ["topic_a", "topic_b", "topic_c"]
stream = {}
for t in TOPIC_LIST:
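To sketch one deployment option (assuming pyspark; the sink format, paths, and broker address are placeholders): start one independent query per topic inside the loop, each with its own checkpoint directory, keeping the handles in the dict.

```python
def start_streams(spark, topic_list, bootstrap_servers):
    """Sketch: one streaming query per topic, all inside a single Spark
    application. Each query must get its own checkpoint location."""
    streams = {}
    for t in topic_list:
        src = (spark.readStream
                    .format("kafka")
                    .option("kafka.bootstrap.servers", bootstrap_servers)
                    .option("subscribe", t)
                    .load())
        streams[t] = (src.writeStream
                         .format("parquet")                 # placeholder sink
                         .option("path", "/data/" + t)      # placeholder path
                         .option("checkpointLocation", "/ckpt/" + t)
                         .start())
    return streams
```

The alternative is one query subscribing to many topics at once (the source's `subscribePattern` option), which trades per-topic isolation for fewer queries to operate.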