Hi all,
Does anyone know the reasoning behind implementing
org.apache.spark.graphx.TripletFields in Java instead of Scala? It doesn't
look like there's anything in there that couldn't be done in Scala.
Nothing serious, just curious. Thanks!
-Jay
The static fields - Scala can't express JVM static fields unfortunately.
Those will be important once we provide the Java API.
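For illustration, a rough sketch (not the actual implementation) of what a Scala-side version could look like, and why it falls short for Java callers:

// Hypothetical Scala sketch. The companion object's vals compile to members
// of the TripletFields$ singleton plus static forwarder *methods* on the
// class, so Java callers would have to write TripletFields.All() or
// TripletFields$.MODULE$.All rather than read a true `public static final`
// field like the Java version's TripletFields.All.
class TripletFields(val useSrc: Boolean, val useDst: Boolean, val useEdge: Boolean)

object TripletFields {
  val EdgeOnly = new TripletFields(false, false, true)
  val Src      = new TripletFields(true,  false, true)
  val Dst      = new TripletFields(false, true,  true)
  val All      = new TripletFields(true,  true,  true)
}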
On Thu, Jan 15, 2015 at 8:58 AM, Jay Hutfles jayhutf...@gmail.com wrote:
Alex,
I didn't communicate that properly. By private, I simply meant the expectation
that it is not a public API. The plan is still to omit them from the
scaladoc/javadoc generation, but no language visibility modifier will be
applied to them.
After 1.3, you will likely no longer need to use things in
It's a bunch of strategies defined here:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
In the most common use cases (e.g. an inner equi-join), filters are pushed below
or into the join. Doing a cartesian product followed
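To make the pushdown concrete, here is a rough sketch (Spark 1.2-era SQL API written from memory, so method names may differ slightly; the case classes, tables, and data are hypothetical, and an existing SparkContext `sc` is assumed):

import org.apache.spark.sql.SQLContext

case class User(id: Int, name: String)
case class Order(userId: Int, amount: Double)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD   // implicit RDD[case class] -> SchemaRDD conversion

sc.parallelize(Seq(User(1, "alice"), User(2, "bob"))).registerTempTable("users")
sc.parallelize(Seq(Order(1, 10.0), Order(2, 250.0))).registerTempTable("orders")

// Catalyst plans this as a hash join on the equi-join key and pushes the
// amount predicate down to the orders side, rather than materializing a
// cartesian product and filtering afterwards.
val result = sqlContext.sql("""
  SELECT u.name, o.amount
  FROM users u JOIN orders o ON u.id = o.userId
  WHERE o.amount > 100
""")
result.collect().foreach(println)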
Thanks, that helps a bit, at least with the NaN, but the MSE is still very
high even with that step size and 10k iterations:
training Mean Squared Error = 3.3322561285919316E7
Does this method need, say, 100k iterations?
On Thu, Jan 15, 2015 at 5:42 PM, Robin East robin.e...@xense.co.uk wrote:
-dev, +user
You’ll need to set the gradient descent step size to something small - a bit of
trial and error shows that 0.0001 works.
To do that, create a LinearRegressionWithSGD instance and set the step size
explicitly:
val lr = new LinearRegressionWithSGD()
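As a rough end-to-end sketch of the whole sequence (not the original poster's code; `training` is assumed to be an existing RDD[LabeledPoint], and the variable names are hypothetical):

import org.apache.spark.SparkContext._   // RDD[Double].mean() implicit on pre-1.3 Spark
import org.apache.spark.mllib.regression.LinearRegressionWithSGD

val lr = new LinearRegressionWithSGD()
lr.optimizer
  .setStepSize(0.0001)      // small step size keeps gradient descent from diverging (NaN)
  .setNumIterations(10000)
val model = lr.run(training)

// Mean squared error on the training set.
val mse = training.map { p =>
  val err = model.predict(p.features) - p.label
  err * err
}.mean()
println(s"training Mean Squared Error = $mse")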
I am new to Spark and GraphX; however, I use Tinkerpop-backed graphs and
think using Tinkerpop as the API for GraphX is a great idea, and I hope you
are still headed in that direction. I noticed that Tinkerpop 3 is moving into
the Apache family:
Reynold,
Thanks for the heads up. In general, I strongly oppose using private to
restrict access to certain parts of the API, the reason being that I might
need to use some of a library's internals from my own project. I find that a
@DeveloperAPI annotation serves the same
It looks like you're training on the non-scaled data but testing on the
scaled data. Have you tried training and testing on only the scaled data?
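For reference, a rough sketch of scaling both sets consistently (hypothetical variable names; `training` and `test` are assumed to be existing RDD[LabeledPoint]s):

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint

// Fit the scaler on the training features only (withMean = true assumes
// dense feature vectors), then apply the same transformation to both sets.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(training.map(_.features))

val scaledTraining = training.map(p => LabeledPoint(p.label, scaler.transform(p.features)))
val scaledTest     = test.map(p => LabeledPoint(p.label, scaler.transform(p.features)))
// Train on scaledTraining and evaluate on scaledTest, never mixing scaled and unscaled data.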
On Thu, Jan 15, 2015 at 10:42 AM, Devl Devel devl.developm...@gmail.com
wrote:
What Reynold is describing is a performance optimization in the implementation,
but the semantics of the join (a cartesian product plus a relational-algebra
filter) should be the same and produce the same results.
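A small sketch of that semantic point with the plain RDD API (hypothetical data; assumes an existing SparkContext `sc`): an inner join is logically a cartesian product followed by a filter on key equality, Spark just implements it more efficiently.

val left  = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
val right = sc.parallelize(Seq(2 -> "x", 3 -> "y"))

// The optimized path.
val viaJoin = left.join(right)                 // (2, ("b", "x"))

// The naive "cartesian product plus filter" definition of the same join.
val viaCartesian = left.cartesian(right)
  .filter { case ((k1, _), (k2, _)) => k1 == k2 }
  .map { case ((k, v), (_, w)) => (k, (v, w)) }

assert(viaJoin.collect().toSet == viaCartesian.collect().toSet)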
On Thu, Jan 15, 2015 at 1:36 PM, Reynold Xin r...@databricks.com wrote:
Reynold,
One thing I'd like worked into the public portion of the API is the JSON
schema inference logic that creates a Set[(String, StructType)] out of a
Map[String,Any]. SPARK-5260 addresses this, so that I can use Accumulators
to infer my schema instead of forcing a map/reduce phase to occur on an RDD
I'm not quite sure about your question, but SparkStrategies.scala and
Optimizer.scala are a good place to start if you want the details of the join
implementation or optimization.
-Original Message-
From: Andrew Ash [mailto:and...@andrewash.com]
Sent: Friday, January 16, 2015 4:52 AM
To: Reynold
Hi guys,
A few people seem to have the same problem with Spark 1.2.0, so I figured I
would raise it here.
see:
http://apache-spark-user-list.1001560.n3.nabble.com/MissingRequirementError-with-spark-td21149.html
In a nutshell, for sbt test to work, we now need to fork a JVM and also give
more
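For reference, a minimal build.sbt sketch of that kind of workaround (sbt 0.13-style settings; the exact options and sizes here are placeholders, see the linked thread for the specifics):

// Run tests in a forked JVM instead of sbt's own JVM.
fork in Test := true

// Give the forked test JVM more headroom (placeholder values).
javaOptions in Test ++= Seq("-Xmx2g", "-XX:MaxPermSize=256m")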