Rather than calling it hash64, it'd be better to just call it xxhash64. The
reason being that ten years from now, we'd probably look back and laugh at
tying a generic name to one specific hash implementation. It'd be better to
name the expression what it is.
On Wed, Mar 06, 2019 at 7:59 PM, <
Hi,
I’m working on something that requires deterministic randomness, i.e. a row
gets the same “random” value no matter the order of the DataFrame. A seeded
hash seems to be the perfect way to do this, but the existing hashes have
various limitations:
- hash: 32-bit output (only 4 billion
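The idea of deterministic, order-independent "randomness" via a seeded hash can be sketched in plain Python. This is only an illustration of the concept, not Spark's implementation: `blake2b` (stdlib) stands in for a 64-bit hash like xxhash64, and `seeded_uniform` is a hypothetical helper name.

```python
import hashlib

def seeded_uniform(key: str, seed: int) -> float:
    """Map a row key to a deterministic pseudo-random value in [0, 1).

    The value depends only on (key, seed), never on row order.
    blake2b stands in here for a 64-bit hash such as xxhash64.
    """
    h = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                        key=seed.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little") / 2**64

rows = ["alice", "bob", "carol"]
# Same per-row values regardless of iteration order:
vals_forward = {r: seeded_uniform(r, 42) for r in rows}
vals_reverse = {r: seeded_uniform(r, 42) for r in reversed(rows)}
assert vals_forward == vals_reverse
```

Changing the seed yields a fresh, but equally deterministic, assignment of values to rows.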
Do we have other block/critical issues for Spark 2.4.1 or waiting something
to be fixed? I roughly searched the JIRA, seems there's no block/critical
issues marked for 2.4.1.
Thanks
Saisai
shane knapp wrote on Thu, Mar 7, 2019 at 4:57 AM:
> i'll be popping in to the sig-big-data meeting on the 20th to talk
I think this was needed to add support for bucketed Hive tables. Like Tyson
noted, if the other side of a join can be bucketed the same way, then Spark
can use a bucketed join. I have long-term plans to support this in the
DataSourceV2 API, but I don't think we are very close to implementing it
I think they might be used in bucketing? Not 100% sure.
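Why co-bucketing helps a join can be shown with a toy sketch (not Spark's actual Murmur3 or Hive Hash; `crc32` is a stand-in). If both tables assign rows to buckets with the same hash function and bucket count, matching keys always land in the same bucket, so the join can proceed bucket-by-bucket without shuffling either side:

```python
from zlib import crc32  # stand-in for Murmur3 / Hive Hash

def bucket_of(key: str, num_buckets: int) -> int:
    # Same function and bucket count on both sides -> co-located keys.
    return crc32(key.encode()) % num_buckets

NUM_BUCKETS = 4
left = ["a", "b", "c", "d"]
right = ["b", "d", "e"]

left_buckets, right_buckets = {}, {}
for k in left:
    left_buckets.setdefault(bucket_of(k, NUM_BUCKETS), set()).add(k)
for k in right:
    right_buckets.setdefault(bucket_of(k, NUM_BUCKETS), set()).add(k)

# Join bucket-by-bucket: only co-bucketed data ever needs comparing.
matches = set()
for b, lkeys in left_buckets.items():
    matches |= lkeys & right_buckets.get(b, set())

assert matches == set(left) & set(right)  # {"b", "d"}
```

This is why the hash function matters: to co-bucket with Hive-written tables, Spark would need to bucket with Hive's hash, not Murmur3.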
On Wed, Mar 06, 2019 at 1:40 PM, < tcon...@gmail.com > wrote:
> Hi,
> I noticed the existence of a Hive Hash partitioning implementation in
> Spark, but also noticed that it’s not being used, and that the
Hi,
I noticed the existence of a Hive Hash partitioning implementation in Spark,
but also noticed that it's not being used, and that the Spark hash
partitioning function is presently hardcoded to Murmur3. My question is
whether Hive Hash is dead code or whether there are future plans to support
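Conceptually, hash partitioning assigns each key a partition via a non-negative modulus of its hash. A minimal sketch, with Python's built-in `hash()` standing in for Murmur3 (illustration only; Python's string hashing is not stable across runs, and the helper names are hypothetical):

```python
def non_negative_mod(x: int, mod: int) -> int:
    """Non-negative remainder. In Java, % can return negative values for a
    negative dividend, hence the explicit fix-up; Python's % is already
    non-negative for a positive modulus, so the branch mirrors the JVM
    semantics rather than being needed here."""
    r = x % mod
    return r + mod if r < 0 else r

def partition_for(key, num_partitions: int) -> int:
    # Spark's hash partitioning uses Murmur3 here; hash() is a stand-in.
    return non_negative_mod(hash(key), num_partitions)

p = partition_for(12345, 8)
assert 0 <= p < 8
```

Swapping Murmur3 for Hive's hash in this spot is what would let Spark's partitioning line up with Hive-bucketed data.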
i'll be popping in to the sig-big-data meeting on the 20th to talk about
stuff like this.
On Wed, Mar 6, 2019 at 12:40 PM Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:
> Yes it's a tough decision and as we discussed today (
>
Yes it's a tough decision and as we discussed today (
https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA
)
"Kubernetes support window is 9 months, Spark is two years". So we may end
up with old client versions on branches still supported like 2.4.x in the
future.
That
i'll be there (again) working the riselab booth april 23-25 in SF... come
by and say hi!
we'll also have demos and information about some of our ongoing research
projects... once we get the details hammered out i'll post more
information here.
looking forward to seeing everyone again. :)
I think the general philosophy here should be that Python is the most
liberal, supporting a column object or a literal value. It's also super
useful to support column names, but then we need to decide what happens for
a string argument: is a string a literal string value, or a column name?
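The ambiguity can be made concrete with a toy sketch. One common resolution (the classes and helpers below are illustrative, not PySpark's actual internals) is to treat a bare string as a column name and require an explicit `lit()` wrapper for literal values:

```python
class Column:
    """Toy stand-in for a DataFrame column reference."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Column) and self.name == other.name

class Literal:
    """Toy stand-in for a literal value expression."""
    def __init__(self, value):
        self.value = value

def lit(value):
    return Literal(value)

def to_expr(arg):
    """Resolve a user argument: bare strings become column names;
    anything else that isn't already an expression becomes a literal."""
    if isinstance(arg, (Column, Literal)):
        return arg
    if isinstance(arg, str):
        return Column(arg)   # bare string -> column name
    return Literal(arg)      # other Python values -> literal

assert to_expr("age") == Column("age")          # string means column
assert isinstance(to_expr(lit("age")), Literal)  # literal must be explicit
assert isinstance(to_expr(7), Literal)
```

The design trade-off: this convention is convenient for the common case but means a literal string always needs the `lit()` escape hatch.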
Two drivers can't be listening on port 4040 at the same time on the same
machine -- the OS wouldn't allow it. Are they actually on different machines,
or somehow on different interfaces? Or are you saying the reported port is
wrong?
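The underlying constraint is easy to demonstrate with stdlib sockets: a second bind to the same port on the same interface is rejected by the OS.

```python
import socket

s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind(("127.0.0.1", 0))          # let the OS pick a free port
port = s1.getsockname()[1]
s1.listen()

s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s2.bind(("127.0.0.1", port))   # same port, same interface -> fails
    conflict = False
except OSError:                    # EADDRINUSE
    conflict = True
finally:
    s2.close()
    s1.close()

assert conflict
```

This is also why Spark's UI falls back to the next port (4041, 4042, ...) when 4040 is already taken, so two drivers on one machine would normally report different UI ports.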
On Wed, Mar 6, 2019 at 12:23 PM Moein Hosseini wrote:
> I've
I've submitted two Spark applications to a cluster of 3 standalone nodes at
nearly the same time (I have a bash script that submits them one after
another without delay). But something goes wrong. In the master UI, the
Running Applications section shows both of my jobs with the correct
configuration (cores, memory and
If the old client is basically unusable with the versions of K8S
people mostly use now, and the new client still works with older
versions, I could see including this in 2.4.1.
Looking at https://github.com/fabric8io/kubernetes-client#compatibility-matrix
it seems like the 4.1.1 client is needed
Yes, Shane Knapp has done the work for that already, and the tests also pass.
I am working on a PR now; I could submit it for the 2.4 branch.
I understand that this is a major dependency update, but the problem I see
is that the client version is so old that I don't think it makes
much sense for
On Wed, Mar 6, 2019 at 7:17 AM Sean Owen wrote:
> The problem is that that's a major dependency upgrade in a maintenance
> release. It didn't seem to work when we applied it to master. I don't
> think it would block a release.
>
> i tested the k8s client 4.1.2 against master a couple of weeks
The problem is that that's a major dependency upgrade in a maintenance
release. It didn't seem to work when we applied it to master. I don't
think it would block a release.
On Wed, Mar 6, 2019 at 6:32 AM Stavros Kontopoulos
wrote:
>
> We need to resolve this
Dear Apache Enthusiast,
(You’re receiving this because you are subscribed to one or more user
mailing lists for an Apache Software Foundation project.)
TL;DR:
* Apache Roadshow DC is in 3 weeks. Register now at
https://apachecon.com/usroadshowdc19/
* Registration for Apache Roadshow Chicago is
We need to resolve this https://issues.apache.org/jira/browse/SPARK-26742
as well for 2.4.1, to make k8s support meaningful as many people are now on
1.11+
Stavros
On Tue, Mar 5, 2019 at 3:12 PM Saisai Shao wrote:
> Hi DB,
>
> I saw that we already have 6 RCs, but the vote I can search by now