Is it a streaming job?
On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry wrote:
> I have a Spark job that consists of a large number of Window operations
> and hence involves large shuffles. I have roughly 900 GiBs of data,
> although I am using a large enough cluster (10 * m5.4xlarge instances). I
>
I have a Spark job that consists of a large number of Window operations and
hence involves large shuffles. I have roughly 900 GiBs of data, although I
am using a large enough cluster (10 * m5.4xlarge instances). I am using the
following configurations for the job, although I have tried various
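For a shuffle-heavy Window workload on a cluster like this (10 m5.4xlarge, i.e. 16 vCPU / 64 GiB each), the knobs usually worth sweeping are the shuffle partition count and the executor sizing. A hedged sketch of such a submission — every value below is an illustrative assumption to tune from, not a recommendation:

```shell
# Illustrative spark-submit for a shuffle-heavy job on 10 x m5.4xlarge.
# All numbers are starting-point assumptions, not tested settings.
spark-submit \
  --master yarn \
  --num-executors 20 \
  --executor-cores 7 \
  --executor-memory 20g \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf spark.memory.fraction=0.6 \
  --conf spark.shuffle.file.buffer=1m \
  my_job.jar
```

Raising spark.sql.shuffle.partitions keeps individual shuffle blocks small enough to avoid spilling or fetch failures on ~900 GiB of data; the right value depends on the skew of the Window partition keys.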
The difficult things in Spark are debugging and tuning.
Not yet. Still learning Spark.
On Fri, Sep 6, 2019 at 2:17 PM Shyam P wrote:
> Cool, but did you find a way, or any help or clue?
>
> On Fri, Sep 6, 2019 at 11:40 PM David Zhou wrote:
>
>> I have the same question as you.
>>
>> On Thu, Sep 5, 2019 at 9:18 PM Shyam P wrote:
>>
>>> Hi,
>>>
>>> I am
Cool, but did you find a way, or any help or clue?
On Fri, Sep 6, 2019 at 11:40 PM David Zhou wrote:
> I have the same question as you.
>
> On Thu, Sep 5, 2019 at 9:18 PM Shyam P wrote:
>
>> Hi,
>>
>> I am using spark-sql 2.4.1 for streaming in my PoC.
>>
>> how to refresh the loaded
Hi,
My streaming job consumes data from Kafka and writes it into Cassandra.
Current status:
Cassandra is not stable. The streaming job crashes when it can't write data
into Cassandra. The streaming job has a checkpoint. Usually, the Cassandra
cluster comes back within 4 hours. Finally, I start the
I have the same question as you.
On Thu, Sep 5, 2019 at 9:18 PM Shyam P wrote:
> Hi,
>
> I am using spark-sql 2.4.1 for streaming in my PoC.
>
> How do I refresh the loaded dataframe from an HDFS/Cassandra table every time a
> new batch of the stream is processed? What is the practice followed in general
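One common pattern for this question — sketched here assuming Spark 2.4's foreachBatch; the Kafka settings, paths, and join key are placeholders — is to re-read the reference table inside each micro-batch instead of loading it once before the query starts:

```scala
// Sketch: refresh a reference dataframe on every micro-batch.
// Broker address, topic, paths, and join key are placeholder assumptions.
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.appName("refresh-per-batch").getOrCreate()

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

stream.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Reload inside the closure: loading (or caching) this dataframe
    // outside the stream would pin the first snapshot for all batches.
    val ref = spark.read.parquet("/data/reference") // placeholder path
    batch.join(ref, Seq("key")).write.mode("append").parquet("/data/out")
  }
  .option("checkpointLocation", "/tmp/chk")
  .start()
```

The trade-off is one extra read of the reference table per batch; if that is too expensive, reload on a timer or on a batchId modulus instead of every batch.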
Gabor,
Thanks for the clarification.
Thanks
On Fri, Sep 6, 2019 at 12:38 AM Gabor Somogyi
wrote:
> Sethupathi,
>
> Let me extract the important part of what I've shared:
>
> 1. "This ensures that each Kafka source has its own consumer group that
> does not face interference from any other
Sethupathi,
Let me extract the important part of what I've shared:
1. "This ensures that each Kafka source has its own consumer group that
does not face interference from any other consumer"
2. Consumers may consume each other's data, and the offset calculation may give
back a wrong result (that's
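Point (1) refers to the Structured Streaming Kafka source generating a unique consumer group id (of the form spark-kafka-source-&lt;uuid&gt;) per query. A hedged sketch of what that means in practice — broker and topic names are placeholders:

```scala
// Two independent streaming reads of the same topic: each query gets its
// own auto-generated consumer group, so offsets are tracked per query and
// the consumers do not steal records from each other.
// Broker and topic names are placeholder assumptions.
val a = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

val b = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
// a and b, once started as queries, use separate consumer groups.
```

Forcing both queries into one shared group id (which point (2) warns against) is what makes consumers eat each other's records, because Kafka load-balances partitions across members of a group.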
I believe this issue was fixed in Spark 2.4.
Spark DataSource V2 is still under heavy development; it is not yet complete.
So, I think the feasible options at the moment are:
1. upgrade to higher Spark versions
2. disable filter push down at your
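How to disable filter pushdown depends on the source. For a custom V2 source it would be that source's own option; for the built-in Parquet reader, shown here purely as an illustration, it is a SQL conf:

```scala
// Illustration only: turning off filter pushdown for the built-in Parquet
// source. A custom DataSource V2 implementation may need its own option;
// this flag affects Parquet scans, not arbitrary V2 sources.
spark.conf.set("spark.sql.parquet.filterPushdown", "false")
```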
Hi
I'm trying to use the sparkContext.addJar method for adding new jar files,
like on Tomcat. But in some cases, it does not work well.
The error message says an executor cannot load an anonymous function.
Why can't anonymous functions be loaded even though the jar was added to all
executors?
This is
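For context, a sketch of the call in question (the path is a placeholder), with the usual caveat that likely explains the failure:

```scala
// Placeholder path; adjust to your environment.
// addJar ships the jar to executors and adds it to the task classloader,
// but it does not retroactively reload classes that were already resolved.
// Classes compiled into the application itself (including anonymous
// functions / closures defined in driver code) generally need to be in the
// application jar given to spark-submit, not added later via addJar.
spark.sparkContext.addJar("/path/to/dependency.jar")
```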
Hi,
I am using spark v2.3.2. I have an implementation of DSV2. Here is what is
happening:
1) Obtained a dataframe using MyDataSource
scala> val df1 = spark.read.format("com.shubham.MyDataSource").load
> MyDataSource.MyDataSource
> MyDataSource.createReader: Going to create a new
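To make the log lines above concrete, here is a hedged sketch of what a minimal Spark 2.3-era DataSource V2 read path looks like. The class and package names follow the thread; the bodies are assumptions about what MyDataSource might contain, not the poster's actual code:

```scala
// Sketch of a minimal Spark 2.3 DataSource V2 (the pre-2.4 reader API).
// MyDataSource matches the thread; everything else is an assumption.
import java.util.{ArrayList => JArrayList, List => JList}
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
import org.apache.spark.sql.sources.v2.reader.{DataReader, DataReaderFactory, DataSourceReader}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

class MyDataSource extends DataSourceV2 with ReadSupport {
  println("MyDataSource.MyDataSource")
  override def createReader(options: DataSourceOptions): DataSourceReader = {
    println("MyDataSource.createReader: Going to create a new reader")
    new MyReader
  }
}

class MyReader extends DataSourceReader {
  override def readSchema(): StructType =
    StructType(Seq(StructField("value", IntegerType)))
  // One factory per input partition; a single partition here.
  override def createDataReaderFactories(): JList[DataReaderFactory[Row]] = {
    val factories = new JArrayList[DataReaderFactory[Row]]()
    factories.add(new MyFactory)
    factories
  }
}

class MyFactory extends DataReaderFactory[Row] {
  override def createDataReader(): DataReader[Row] = new MyDataReader
}

class MyDataReader extends DataReader[Row] {
  private var i = 0
  override def next(): Boolean = { i += 1; i <= 3 } // emit three rows
  override def get(): Row = Row(i)
  override def close(): Unit = ()
}
```

In Spark 2.4 this API was reshaped (DataReaderFactory became InputPartition and rows moved to InternalRow), which is one reason v2.3.2 DSV2 code tends not to carry forward unchanged.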