I took a quick look at the log attached to the SO post, and the problem
doesn't seem to be related to Kafka. The error stack trace is from
checkpointing to GCS, and the OutputStream implementation for GCS seems to
be provided by Google.
Could you please elaborate on the stack trace or upload the full log?
Hi,
I couldn't reproduce this error :/ I wonder if there is something else
underlying that is causing it...
*Input*
➜ kafka_2.12-2.5.0 ./bin/kafka-console-producer.sh --bootstrap-server
localhost:9092 --topic test1
{"name": "pedro", "age": 50}
>{"name": "pedro", "age": 50}
>{"name": "pedro", "age": 50}
Hi,
not sure if it is your case, but if the source data is heavy and deeply
nested, I'd recommend explicitly providing the schema when reading the JSON:
df = spark.read.schema(schema).json(updated_dataset)
On Thu, 21 Jan 2021 at 04:15, srinivasarao daruna
wrote:
> Hi,
> I am running a spark
Hi,
I am running a Spark job on a huge dataset. I have allocated 10 r5.16xlarge
machines (each with 64 cores and 512 GB of memory).
The source data is JSON and I need to do some JSON transformations, so I
read the files as text and then convert them to a dataframe.
ds = spark.read.textFile()
updated_dataset =
陆伯鹰
Information Technology Department, China Investment Corporation
Tel: +86 (0)10 84096521
Fax: +86 (0)10 64086851
8/F, New Poly Plaza, 1 Chaoyangmen North Street, Dongcheng District, Beijing 100010
Website: www.china-inv.cn
This email contains confidential information. If you are not the intended
recipient, please do not read, save, disclose, or copy any of its contents,
or open any of its attachments. Please reply to notify the sender and
immediately delete the email and its attachments from your computer system.
Thank you.
This SO post is pretty much the exact same issue:
https://stackoverflow.com/questions/59962680/spark-structured-streaming-error-while-sending-aggregated-result-to-kafka-topic
The user mentions it's an issue with
org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.4
Hi all:
I am having a strange issue incorporating `groupBy` statements into a
structured streaming job when trying to write to Kafka or Delta. Weirdly it
only appears to work if I write to console, or to memory...
I'm running Spark 3.0.1 with the following dependencies:
Heh, that could make sense, but that definitely was not my mental model of
how Python binds variables! It's definitely not how Scala works.
On Wed, Jan 20, 2021 at 10:00 AM Marco Wong wrote:
> Hmm, I think I got what Jingnan means. The lambda function is x != i and i
> is not evaluated when the
Hmm, I think I got what Jingnan means. The lambda function is x != i and i
is not evaluated when the lambda function was defined. So the pipelined rdd
is rdd.filter(lambda x: x != i).filter(lambda x: x != i), rather than
having the values of i substituted. Does that make sense to you, Sean?
On
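Jingnan's point can be demonstrated in plain Python, without Spark at all: a lambda closes over the variable i itself, not its value at definition time, so every pipelined filter ends up seeing the final i. Binding i as a default argument captures the value instead:

```python
# All three lambdas close over the same loop variable i:
preds = [lambda x: x != i for i in range(3)]
print([p(2) for p in preds])   # [False, False, False] -- every lambda sees i == 2

# Binding i as a default argument freezes its value at definition time:
preds = [lambda x, i=i: x != i for i in range(3)]
print([p(2) for p in preds])   # [True, True, False]
```

Applied to the original loop, rdd.filter(lambda x, i=i: x != i) would make the three chained filters use 0, 1, and 2 as intended.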
No, because the final rdd is really the result of chaining 3 filter
operations. They should all execute. It _should_ work like
"rdd.filter(...).filter(..).filter(...)"
On Wed, Jan 20, 2021 at 9:46 AM Zhu Jingnan wrote:
> I thought that was right result.
>
> As RDDs run on a lazy basis, so
RDDs are still relevant in a few ways - there is no Dataset in Python for
example, so RDD is still the 'typed' API. They still underpin DataFrames.
And of course it's still there because there's probably still a lot of code
out there that uses it. Occasionally it's still useful to drop into that
I thought that was right result.
As RDDs run on a lazy basis, every time rdd.collect() is executed, i will
have been updated to its latest value, so only one element will be filtered
out.
Regards
Jingnan
On Wed, Jan 20, 2021 at 9:01 AM Sean Owen wrote:
> That looks very odd indeed. Things like this
A. Global scope and global variables are bad habits in Python (this is
about the 'rdd' and 'i' variables used in the lambda).
B. Lambdas are usually misused and abused in Python, especially when they
are used in a global context: ideally you'd use pure functions, with
something like:
```
def
That looks very odd indeed. Things like this work as expected:

rdd = spark.sparkContext.parallelize([0, 1, 2])

def my_filter(data, i):
    return data.filter(lambda x: x != i)

for i in range(3):
    rdd = my_filter(rdd, i)
rdd.collect()

... as does unrolling the loop.
But your example behaves as if
Hi Marco,
A Scala dev here.
In short: yet another reason against Python :)
Honestly, I've got no idea why the code gives that output. I ran it with
3.1.1-rc1 and got the very same results. Hoping pyspark/python devs will
chime in and shed more light on this.
Pozdrawiam,
Jacek Laskowski
Hi Marco,
IMHO the RDD API is only for very sophisticated use cases that very few
Spark devs are capable of handling. I consider the RDD API a sort of Spark
assembler, and most Spark devs should stick to the Dataset API.
Speaking of HBase, see
The issue is resolved. The resolution is a little weird, but it worked: it
was due to a Scala version mismatch between projects in my package.
On Wed, 20 Jan 2021 at 18:07, Mich Talebzadeh
wrote:
> Hi Vikas,
>
> Are you running this on your local laptop etc or using some IDE etc?
>
> What is your available
Hi, my name is Marco and I'm one of the developers behind
https://github.com/unicredit/hbase-rdd
a project we are currently reviewing for various reasons.
We were basically wondering if RDD "is still a thing" nowadays (we see lots of
usage for DataFrames or Datasets) and we're not sure how
Dear Spark users,
I ran the Python code below on a simple RDD, but it gave strange results.
The filtered RDD contains elements that were supposedly filtered away
earlier. Any idea why this happens?
```
rdd = spark.sparkContext.parallelize([0,1,2])
for i in range(3):
print("RDD is ",
Hi Vikas,
Are you running this on your local laptop etc or using some IDE etc?
What is your available memory for Spark?
Start with a minimal setup like the one below:
def spark_session_local(appName):
    return SparkSession.builder \
        .master('local[1]') \
        .appName(appName) \
Hi Sachit,
I am running it in local mode. The IP mentioned is a private IP address and
is therefore useless to anyone.
On Wed, 20 Jan 2021 at 17:37, Sachit Murarka
wrote:
> Hi Vikas
>
> 1. Are you running in local mode? Master has local[*]
> 2. Pls mask the ip or confidential info while
Hi Vikas
1. Are you running in local mode? The master is local[*].
2. Please mask the IP or any confidential info while sharing logs.
Thanks
Sachit
On Wed, 20 Jan 2021, 17:35 Vikas Garg, wrote:
> Hi,
>
> I am facing an issue with the Spark executor. I have been struggling with
> this issue for many days and
Hi,
I want to apply custom logic to each row of data I am getting through
Kafka, and I want to do it with microbatches.
When I run it, it is not progressing.
kafka_stream_df \
    .writeStream \
    .foreach(process_records) \
    .outputMode("append") \
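One common reason a streaming query "is not progressing" is that the chain is only defined and never started: .start() (usually followed by awaitTermination()) is required. A minimal sketch follows; process_records and kafka_stream_df are taken from the snippet above, and the handler body is an assumption, so the query itself is shown as a comment:

```python
# Hypothetical per-row handler for .foreach(); a Row behaves like a named tuple.
def process_records(row):
    print(row.value)   # e.g. the raw Kafka message payload

# The query must be started, or nothing runs (commented out here because
# no Kafka source is available):
# query = (kafka_stream_df.writeStream
#          .foreach(process_records)
#          .outputMode("append")
#          .start())
# query.awaitTermination()
```

For batch-level logic (instead of per-row), foreachBatch(lambda df, batch_id: ...) is often the better fit.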