Thanks Burak! Appreciate it. This makes sense.
How do you suggest we make sure the resulting data doesn't end up as tiny files
if we are not on Databricks yet and cannot leverage Delta Lake features?
Also, for the checkpointing feature, do you have an active blog/article I can take
a look at to try out an
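One common way to limit tiny files without Delta Lake is to repartition each batch down to a few partitions, sized against a target file size, before writing. The sketch below only illustrates that idea. `target_partitions`, `write_compacted`, and the 128 MiB default are hypothetical names and values that do not come from this thread, and the caller has to estimate the input size.

```python
# Hypothetical helpers for illustration; names and the 128 MiB default are assumptions.

def target_partitions(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Return how many output files to aim for, given an estimated batch size."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Ceiling division, with a floor of one partition for very small batches.
    return max(1, -(-total_bytes // target_file_bytes))


def write_compacted(df, path, total_bytes):
    """Repartition a Spark DataFrame so each output file is roughly the target size."""
    n = target_partitions(total_bytes)
    # repartition(n) shuffles the data into n partitions; each one becomes one file.
    df.repartition(n).write.mode("append").parquet(path)
```

`write_compacted` expects an ordinary (batch) DataFrame. In a streaming job it would typically be called from inside a `foreachBatch` function.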
Hello Users,
I am using on-premise object storage and am able to perform operations on
different buckets using aws-cli.
However, when I try to use the same path from my Spark code, it
fails. Here are the details -
Added dependencies in build.sbt -
- hadoop-aws-2.7.4.ja
-
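For an on-premise, S3-compatible store, the S3A connector usually has to be told the endpoint explicitly. The sketch below collects the Hadoop properties that are typically involved. It is not a diagnosis of the failure described above, and the function names, endpoint, and credentials are placeholders. Note that `fs.s3a.path.style.access` was, to my knowledge, only added in Hadoop 2.8, so a 2.7.x `hadoop-aws` jar may ignore it.

```python
# Hypothetical helper for illustration; endpoint and credential values are placeholders.

def s3a_options(endpoint, access_key, secret_key, path_style=True, use_ssl=False):
    """Build Spark config entries that point the S3A connector at a custom endpoint."""
    return {
        # The "spark.hadoop." prefix copies the setting into the Hadoop configuration.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style URLs (endpoint/bucket/key); assumed to need Hadoop 2.8 or later.
        "spark.hadoop.fs.s3a.path.style.access": str(path_style).lower(),
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


def apply_options(builder, options):
    """Apply each key/value pair to a SparkSession builder and return the builder."""
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, this would be used as `apply_options(SparkSession.builder, s3a_options(...)).getOrCreate()`, after which paths of the form `s3a://bucket/key` go through the configured endpoint.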
Hi Rishi,
That is exactly why Trigger.Once was created for Structured Streaming. The
way we look at streaming is that it doesn't have to be always real time, or
24-7 always on. We see streaming as a workflow that you have to repeat
indefinitely. See this blog post for more details!
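As a rough illustration of the pattern described above, here is a minimal PySpark sketch of a run-once streaming job. In the Python API, `Trigger.Once` is spelled `trigger(once=True)`. The function name, the Parquet source and sink, and all paths are assumptions made for the example and are not taken from the blog post.

```python
# Minimal sketch; run_once and its parameters are hypothetical names for illustration.

def run_once(spark, schema, source_path, sink_path, checkpoint_path):
    """Process whatever data is new since the last run, then stop."""
    # File-based streaming sources need an explicit schema.
    stream = (
        spark.readStream
        .schema(schema)
        .format("parquet")
        .load(source_path)
    )
    query = (
        stream.writeStream
        .format("parquet")
        # The checkpoint records which input was already processed, so the
        # next scheduled run picks up only newer data.
        .option("checkpointLocation", checkpoint_path)
        # Process the available data in a single batch and then terminate.
        .trigger(once=True)
        .start(sink_path)
    )
    query.awaitTermination()
    return query
```

Scheduling this function from cron or a workflow tool gives the "repeat indefinitely" behaviour without keeping a cluster running between runs.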
Hi All,
I recently started playing with Spark streaming, and the checkpoint location
feature looks very promising. I wonder if anyone has an opinion about using
Spark streaming with the checkpoint location option as a slow batch processing
solution. What would be the pros and cons of utilizing streaming
Hi,
I think that we should stop using S3a, and use S3.
Please try reading about EMRFS and how it provides fantastic advantages :)
Regards,
Gourav Sengupta
On Thu, Apr 30, 2020 at 12:54 AM Aniruddha P Tekade
wrote:
> Hello,
>
> I am trying to run a spark job that is trying to write the data
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
With end no in sight.
NO SHAME