Re: Question related to asynchronously map transformation using java spark structured streaming

2023-03-26 Thread Emmanouil Kritharakis
Hello again,

Do we have any news for the above question?
I would really appreciate it.

Thank you,

--

Emmanouil (Manos) Kritharakis

Ph.D. candidate in the Department of Computer Science
<https://sites.bu.edu/casp/people/ekritharakis/>

Boston University


On Tue, Mar 14, 2023 at 12:04 PM Emmanouil Kritharakis <
kritharakismano...@gmail.com> wrote:

> Hello,
>
> I hope this email finds you well!
>
> I have a simple dataflow in which I read from a kafka topic, perform a map
> transformation and then I write the result to another topic. Based on your
> documentation here
> <https://spark.apache.org/docs/3.3.2/structured-streaming-kafka-integration.html#content>,
> I need to work with Dataset data structures. Even though my solution works,
> I need to utilize map transformation asynchronously. So my question is how
> can I asynchronously call map transformation with Dataset data structures
> in a java structured streaming environment? Can you please share a working
> example?
>
> I am looking forward to hearing from you as soon as possible. Thanks in
> advance!
>
> Kind regards
>
> --
>
> Emmanouil (Manos) Kritharakis
>
> Ph.D. candidate in the Department of Computer Science
> <https://sites.bu.edu/casp/people/ekritharakis/>
>
> Boston University
>


Question related to asynchronously map transformation using java spark structured streaming

2023-03-14 Thread Emmanouil Kritharakis
Hello,

I hope this email finds you well!

I have a simple dataflow in which I read from a kafka topic, perform a map
transformation and then I write the result to another topic. Based on your
documentation here
,
I need to work with Dataset data structures. Even though my solution works,
I need to utilize map transformation asynchronously. So my question is how
can I asynchronously call map transformation with Dataset data structures
in a java structured streaming environment? Can you please share a working
example?

I am looking forward to hearing from you as soon as possible. Thanks in
advance!

Kind regards

--

Emmanouil (Manos) Kritharakis

Ph.D. candidate in the Department of Computer Science


Boston University


Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Emmanouil Kritharakis
Hello,

I hope this email finds you well!

I have a simple dataflow in which I read from a kafka topic, perform a map
transformation and then I write the result to another topic. Based on your
documentation here
,
I need to work with Dataset data structures. Even though my solution works,
I need to increase the parallelism. The spark documentation includes a lot
of parameters that I can change based on specific data structures like
*spark.default.parallelism* or *spark.sql.shuffle.partitions*. The former
is the default number of partitions in RDDs returned by transformations
like join, reduceByKey while the later is not recommended for structured
streaming as it is described in documentation: "Note: For structured
streaming, this configuration cannot be changed between query restarts from
the same checkpoint location".

So my question is how can I increase the parallelism for a simple dataflow
based on datasets with a map transformation only?

I am looking forward to hearing from you as soon as possible. Thanks in
advance!

Kind regards,

--

Emmanouil (Manos) Kritharakis

Ph.D. candidate in the Department of Computer Science


Boston University