Opened an issue. https://issues.apache.org/jira/browse/SPARK-24144
Since it is a major issue for us, I have marked its priority as Major. Feel
free to change that if it is not the case from Spark's perspective.
On Tue, May 1, 2018 at 4:34 AM, Michael Armbrust wrote:
> Please
Just wondering-
Given that V2 is currently less performant because it uses Row instead of
InternalRow (and other things?), is still evolving, and is missing some of
the features of V1, it might help to focus on closing those gaps first and
then look at porting the file sources over.
As for
I agree that Spark should fully handle state serialization and recovery for
most sources. This is how it works in V1, and we definitely wouldn't want
or need to change that in V2.* The question is just whether we should have
an escape hatch for the sources that don't want Spark to do that, and if
I think there's a difference. You're right that we wanted to clean up the
API in V2 to avoid file sources using side channels. But there's a big
difference between adding, for example, a way to report partitioning and
designing for sources that need unbounded state. It's a judgment call, but
I
Thank you Shane!!
On Tue, May 1, 2018 at 8:58 AM, Xiao Li wrote:
> Thank you very much, Shane! Yeah, it works now!
>
> Xiao
>
>
> 2018-05-01 8:40 GMT-07:00 shane knapp :
>
>> and we're back! there was apparently a firewall migration yesterday that
>>
This is usually caused by skew. Sometimes you can work around it by
increasing the number of partitions like you tried, but when that doesn’t
work you need to change the partitioning that you’re using.
If you’re aggregating, try adding an intermediate aggregation. For example,
if your query is
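The example above is cut off in the archive, so the exact query is unknown. One common pattern this advice refers to, two-phase (salted) aggregation, can be sketched in plain Python below; this is a schematic illustration of the idea rather than actual Spark code, and the function name `salted_sum` and the salt count are assumptions for the sketch:

```python
from collections import defaultdict
import random

def salted_sum(records, num_salts=4):
    """Two-phase aggregation: salt the key, partially aggregate,
    then merge. In Spark this spreads one hot key across many
    reducers instead of sending it all to a single partition."""
    # Phase 1: aggregate on (key, salt) so a hot key is split
    # across num_salts buckets.
    partial = defaultdict(int)
    for key, value in records:
        salt = random.randrange(num_salts)
        partial[(key, salt)] += value
    # Phase 2: drop the salt and merge the partial sums per key.
    final = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        final[key] += subtotal
    return dict(final)

# Skewed input: one hot key dominates.
data = [("hot", 1)] * 1000 + [("cold", 1)] * 10
print(salted_sum(data))  # {'hot': 1000, 'cold': 10}
```

In Spark SQL the same shape is a GROUP BY on the original key plus a random salt column, followed by a second GROUP BY on the key alone.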
Thank you very much, Shane! Yeah, it works now!
Xiao
2018-05-01 8:40 GMT-07:00 shane knapp :
> and we're back! there was apparently a firewall migration yesterday that
> went sideways.
>
> shane
>
> On Mon, Apr 30, 2018 at 8:27 PM, shane knapp wrote:
and we're back! there was apparently a firewall migration yesterday that
went sideways.
shane
On Mon, Apr 30, 2018 at 8:27 PM, shane knapp wrote:
> we just noticed that we're unable to connect to jenkins, and have reached
> out to our NOC support staff at our colo. until
Dear Apache Enthusiast,
We are pleased to announce our schedule for ApacheCon North America
2018. ApacheCon will be held September 23-27 at the Montreal Marriott
Chateau Champlain in Montreal, Canada.
Registration is open! The early bird rate of $575 lasts until July 21,
at which time it
Hi
I am getting the above error in Spark SQL. I have increased the number of
partitions (to 5000) but am still getting the same error.
My data is most probably skewed.
org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829
at
Hi Everyone,
I wonder if someone could be so kind as to shed some light on this problem:
[PySpark.sql.filter not performing as it
should](https://stackoverflow.com/q/49995538)
Cheers,
A.
Sent with [ProtonMail](https://protonmail.com) Secure Email.
Hi Everyone,
I wonder if someone could be so kind as to shed some light on this problem:
[spark.python.worker.reuse not working as
expected](https://stackoverflow.com/q/50043684)
Cheers,
A.
Hi Everyone,
I wonder if someone could be so kind as to shed some light on this problem:
[UnresolvedException: Invalid call to dataType on unresolved object when using
DataSet constructed from Seq.empty (since Spark
2.3.0)](https://stackoverflow.com/q/49757487)
Cheers,
A.