Ah yeah sorry I got a bit mixed up.
On Wed, Mar 28, 2018 at 7:54 PM Ted Yu wrote:
> bq. this shuffle could outweigh the benefits of the organized data if the
> cardinality is lower.
>
> I wonder if you meant higher in place of the last word above.
>
> Cheers
>
On Wed, Mar 28, 2018 at 7:50 PM, Russell Spitzer wrote:
> For added color, one thing that I may want to consider as a data source
> implementer is the cost/benefit of applying a particular clustering. For
> example, a dataset with low cardinality in the clustering key could benefit
> greatly from clustering on that key before writing to Cassandra since
> Spark would always apply the required clustering and sort order because
> they are required by the data source. It is reasonable for a source to
> reject data that isn’t properly prepared. For example, data must be written
> to HTable files with keys in order or else the files are invalid.

How would Spark determine whether or not to apply a recommendation - a cost
threshold? And yes, it would be good to flesh out what information we get
from Spark in the datasource when providing these
recommendations/requirements - I could see statistics and the existing
outputPartitioning/Ordering
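To make the required-versus-recommended split concrete, here is a minimal
Scala sketch of what a write-side contract could look like. The trait and
method names are hypothetical, made up for illustration; this is not the
actual DataSourceV2 API under discussion in this thread:

    // Hypothetical sketch; names are illustrative, not a real Spark API.
    final case class SortField(column: String, ascending: Boolean = true)

    trait SupportsWriteOrganization {
      // Columns the incoming data must be clustered by before it reaches
      // the writer, e.g. the Cassandra partition key.
      def requiredClustering: Seq[String]

      // Per-partition sort order the writer needs, e.g. keys in order so
      // that HFiles are valid.
      def requiredOrdering: Seq[SortField]

      // A clustering the source merely recommends; Spark would be free to
      // skip it when the shuffle costs more than the organized data saves.
      def recommendedClustering: Seq[String] = Nil
    }

Under a contract like this, Spark would always satisfy requiredClustering
and requiredOrdering, while recommendedClustering is exactly where a cost
threshold or statistics could decide whether the extra shuffle pays off.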
Thanks for starting this discussion.
When I was troubleshooting Spark on K8s, I often faced a need to turn on
debug messages on the driver and executor pods of my jobs, which would be
possible if I somehow put the right log4j.properties file inside the pods.
I know I can build custom Docker
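For what it's worth, if the log4j.properties file is already present in the
image (or otherwise visible inside the pod), the driver and executor JVMs
can be pointed at it with the standard extraJavaOptions confs. A minimal
sketch in Scala; the /opt/spark/conf path is an assumption, not a Spark
default:

    import org.apache.spark.SparkConf

    // Point both JVMs at a log4j config already inside the pods; the path
    // below is assumed, adjust to wherever the file actually lives.
    val conf = new SparkConf()
      .set("spark.driver.extraJavaOptions",
        "-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties")
      .set("spark.executor.extraJavaOptions",
        "-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties")

The same two settings can also be passed as --conf flags to spark-submit.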
If you need the functionality, I would recommend just copying the code over
to your project and using it that way.

On Wed, Mar 28, 2018 at 9:02 AM Felix Cheung wrote:
> I think the difference is py4j is a public library whereas the R backend
> is specific to SparkR.
>
> Can you elaborate what you need JVMObjectTracker for? We have provided
> convenient R APIs to call into the JVM: sparkR.callJMethod, for example.
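For reference, a minimal sketch of those SparkR JVM APIs; the
java.util.ArrayList example is purely illustrative:

    library(SparkR)
    sparkR.session()

    # Create a JVM object and call instance methods on it from R.
    arr <- sparkR.newJObject("java.util.ArrayList")
    sparkR.callJMethod(arr, "add", 1L)
    sparkR.callJMethod(arr, "size")    # returns 1

    # Static methods work through sparkR.callJStatic.
    sparkR.callJStatic("java.lang.Math", "max", 1L, 2L)  # returns 2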
As Lucas said, those directories are generated and copied when you run a
full Maven build with the -Pkubernetes flag specified (or follow the
instructions at
https://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution
).
Also, using the Kubernetes integration in the main
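For reference, the runnable-distribution route from that page boils down to
something like the following; the Hadoop profile is an example and varies
with your environment:

    # Build a runnable distribution with the Kubernetes backend compiled in.
    ./dev/make-distribution.sh --name custom-spark --tgz -Phadoop-2.7 -Pkubernetes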
We discussed this early on in our fork and I think we should have this in a
JIRA and discuss it further. It's something we want to address in the
future.
One proposed method is using a StatefulSet of size 1 for the driver. This
ensures recovery but at the same time takes away from the completion
As a carry-over from the apache-spark-on-k8s project, it would be useful to
have a configurable restart policy for jobs submitted with the Kubernetes
resource manager. See the following issues:
https://github.com/apache-spark-on-k8s/spark/issues/133
Are you building on the fork or on the official release now? I built v2.3.0
from source without issue. One thing I noticed is that I needed to run the
build-image command from the bin/ directory placed in dist/, as opposed to
the one in the repo, since that's how it copies the necessary targets.
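Concretely, that means something like the following; the repo name and tag
are placeholders:

    # Run the tool from the generated distribution, not the repo checkout.
    cd dist
    ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 build
    ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 push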
Hi,
I built apache-spark-on-k8s from source on Ubuntu 16.04 and it built
without errors. Next, I wanted to create the Docker images, so, as explained
at
https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
I used sbin/build-push-docker-images.sh to create them. While using
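For reference, the fork's script takes a repo, a tag, and a build/push
action along these lines (exact flags may differ; repo and tag are
placeholders):

    ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t mytag build
    ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t mytag push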