So how do I run the check locally?
On the master tree, sbt mimaReportBinaryIssues seems to report a lot of
errors. Do we need to modify SparkBuild.scala etc. to run it locally? I could
not figure out from its console output how Jenkins runs the check.
Best Regards,
Raymond Liu
When inserting data into a table stored on Tachyon (the data is small, so it
will not be partitioned automatically), how can I control the data placement?
That is, how can I specify which machine the data should live on?
If we cannot control it, what is the data-placement strategy of Tachyon or
Spark?
I went ahead and created JIRAs.
JIRA for Hierarchical Clustering:
https://issues.apache.org/jira/browse/SPARK-2429
JIRA for Standardized Clustering APIs:
https://issues.apache.org/jira/browse/SPARK-2430
Before submitting a PR for the standardized API, I want to implement a
few clustering
Might be worth checking out scikit-learn and Mahout to get some broad ideas.
On Thu, Jul 10, 2014 at 4:25 PM, RJ Nowling rnowl...@gmail.com wrote:
Hi,
I've implemented a class that does chi-squared feature selection for
RDD[LabeledPoint]. It also computes basic class/feature occurrence statistics,
and other methods such as mutual information or information gain can be easily
implemented. I would like to make a pull request. However, MLlib
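For reference, the core chi-squared statistic over class/feature occurrence
counts can be sketched as below. This is a standalone illustration of the math,
not the proposed MLlib API; ChiSquaredSketch and chiSquared are hypothetical
names.

```scala
// Illustrative sketch, not MLlib code: chi-squared statistic for a
// contingency table of observed counts (rows = feature values,
// columns = class labels).
object ChiSquaredSketch {
  def chiSquared(observed: Array[Array[Double]]): Double = {
    val total   = observed.map(_.sum).sum
    val rowSums = observed.map(_.sum)
    val colSums = observed.transpose.map(_.sum)
    var stat = 0.0
    for (i <- observed.indices; j <- observed(i).indices) {
      // Expected count under independence of feature and class.
      val expected = rowSums(i) * colSums(j) / total
      if (expected > 0) {
        val d = observed(i)(j) - expected
        stat += d * d / expected
      }
    }
    stat
  }
}
```

A perfectly independent table scores 0, and the score grows as the feature
and class counts deviate from independence, which is what makes it usable as
a per-feature selection criterion.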
Just a heads up, we merged Prashant's work on having the sbt build read all
dependencies from Maven. Please report any issues you find on the dev list
or on JIRA.
One note here for developers, going forward the sbt build will use the same
configuration style as the Maven build (-D for options and
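As an illustration of that -D property style, an invocation might look like
this. The specific property name (hadoop.version) is an assumption drawn from
the Maven build of that era, not a guaranteed flag, so check the build docs
for the real set of properties.

```shell
# Hypothetical example of Maven-style -D options in the sbt build;
# hadoop.version is illustrative only.
sbt -Dhadoop.version=2.3.0 clean assembly
```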
Woot!
On Thu, Jul 10, 2014 at 11:15 AM, Patrick Wendell patr...@databricks.com
wrote:
Cool~
On Thu, Jul 10, 2014 at 1:29 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Woot!
On Thu, Jul 10, 2014 at 11:15 AM, Patrick Wendell patr...@databricks.com
wrote:
Hi devs!
Right now it takes a non-trivial amount of time to launch EC2 clusters.
Part of this time is spent starting the EC2 instances, which is out of our
control. Another part of this time is spent installing stuff on and
configuring the instances. This, we can control.
I’d like to explore
Had a few quick questions...
Just wondering if right now Spark SQL is expected to be thread-safe on
master?
Doing a simple hadoop file -> RDD -> schema RDD -> write parquet
will fail in reflection code if I run these in a thread pool.
The SparkSqlSerializer seems to create a new Kryo instance
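A common workaround when a serializer instance is not thread-safe is to keep
one instance per thread via ThreadLocal. A minimal sketch of that pattern,
using a hypothetical FakeSerializer stand-in rather than Spark's actual
SparkSqlSerializer or Kryo:

```scala
// Illustrative sketch only, not Spark's code: one serializer per thread.
// FakeSerializer stands in for a non-thread-safe serializer like Kryo.
final class FakeSerializer {
  def serialize(s: String): Array[Byte] = s.getBytes("UTF-8")
}

object PerThreadSerializer {
  private val local: ThreadLocal[FakeSerializer] =
    new ThreadLocal[FakeSerializer] {
      override def initialValue(): FakeSerializer = new FakeSerializer
    }
  // Repeated calls on one thread return the same instance;
  // different threads each get their own.
  def get: FakeSerializer = local.get()
}
```

With this pattern, a thread pool can run serialization concurrently without
two threads ever sharing one mutable serializer instance.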
Hey Ian,
Thanks for bringing these up! Responses in-line:
Just wondering if right now Spark SQL is expected to be thread-safe on
master?
Doing a simple hadoop file -> RDD -> schema RDD -> write parquet
will fail in reflection code if I run these in a thread pool.
You are probably hitting
You are partially correct.
It's not terribly complex, but also not easy to accomplish. It sounds like you
want to manage some partially/fully baked AMIs with the core Spark libs and
dependencies already on the image. The main issues that crop up are:
1) image sprawl, as libs/config/defaults/etc
-1. I honestly do not know the voting rules for the Spark community, so
please excuse me if I am out of line or if Mesos compatibility is not a
concern at this point.
We just tried to run this version built against 2.3.0-cdh5.0.2 on mesos
0.18.2. All of our jobs with data above a few gigabytes
Just realized the deadline was Monday, my apologies. The issue
nevertheless stands.
On Thu, Jul 10, 2014 at 9:28 PM, Gary Malouf malouf.g...@gmail.com wrote: