https://github.com/apache/spark/blob/master/docs/building-spark.md#speeding-up-compilation-with-zinc
Could someone summarize how they invoke zinc as part of a regular
build-test-etc. cycle?
I'll add it to the aforelinked page if appropriate.
Nick
+1 (non-binding)
Checked on CentOS 6.5, compiled from source.
Ran various examples with a standalone master and three slaves, and
browsed the web UI.
On Sat, Nov 29, 2014 at 2:16 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark
Hi all,
I created a JIRA to discuss adding RDDs for dimensional (not sure what
else to call it) data like time series and spatial data. Spark could be a
better time series and/or spatial database than existing approaches out
there.
https://issues.apache.org/jira/browse/SPARK-4727
I saw that
You just run it once with zinc -start and leave it running as a
background process on your build machine. You don't have to do
anything for each build.
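A minimal sketch of that workflow, assuming zinc is installed and on the
PATH (flags per the zinc README; adjust to your version):

    zinc -start                     # start the long-lived compile server once
    mvn -DskipTests clean package   # subsequent Maven builds reuse the warm compiler
    zinc -shutdown                  # optional: stop the server when done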
On Wed, Dec 3, 2014 at 3:44 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
Here's a fix: https://github.com/apache/spark/pull/3586
On Wed, Dec 3, 2014 at 11:05 AM, Michael Armbrust mich...@databricks.com
wrote:
Thanks for reporting. As a workaround you should be able to SET
spark.sql.hive.convertMetastoreParquet=false, but I'm going to try to fix
this before the
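A minimal sketch of applying that workaround from a HiveContext (the
variable name is illustrative):

    hiveContext.sql("SET spark.sql.hive.convertMetastoreParquet=false")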
I think the problem has to do with akka not picking up the
reference.conf file in the assembly jar.
We managed to make akka pick up the conf file by temporarily switching the
class loaders.
Thread.currentThread().setContextClassLoader(JavaSparkContext.class.getClassLoader());
The model gets built
Hi, I am wondering about the status of the Ooyala Spark Jobserver. Is there
any plan to get it into a Spark release?
Best Regards
Jun Feng Liu
IBM China Systems Technology Laboratory in Beijing
Phone: 86-10-82452683
E-mail: liuj...@cn.ibm.com
BLD 28, ZGC Software Park
No. 8 Rd. Dong Bei Wang West,
Hi,
I am using Spark SQL on the 1.1.0 branch.
The following code leads to a scala.MatchError
at
org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
val scm = StructType(inputRDD.schema.fields.init :+
  StructField("list",
    ArrayType(
      StructType(
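For reference, a self-contained sketch of the kind of nested schema being
built (field names illustrative; on the 1.1 branch the types live under
org.apache.spark.sql.catalyst.types, on later branches under
org.apache.spark.sql.types):

    import org.apache.spark.sql.types._

    // An array-of-struct column appended to the existing fields, as above.
    val element = StructType(Seq(StructField("value", StringType, nullable = true)))
    val scm = StructType(inputRDD.schema.fields.init :+
      StructField("list", ArrayType(element), nullable = true))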
Will do. Am on the road - will annotate an iPython notebook with what worked
and what didn't ...
Cheers
k/
On Wed, Dec 3, 2014 at 4:19 PM, Xiangrui Meng men...@gmail.com wrote:
Krishna, could you send me some code snippets for the issues you saw
in naive Bayes and k-means? -Xiangrui
On Sun,
We are trying to call Spark through an OSGi service (with an OSGi-fied
version of assembly.jar). Spark does not work (due to the way Spark
reads akka's reference.conf) unless we switch the class loader as follows.
Thread.currentThread().setContextClassLoader(JavaSparkContext.class.getClassLoader());
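A sketch of that workaround in context, restoring the original loader
afterwards (a sketch under OSGi assumptions; conf is illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.api.java.JavaSparkContext

    val conf = new SparkConf().setAppName("osgi-spark-example")
    val original = Thread.currentThread().getContextClassLoader
    try {
      // Use the loader that can see the assembly jar, so akka's
      // ConfigFactory finds the reference.conf bundled inside it.
      Thread.currentThread().setContextClassLoader(classOf[JavaSparkContext].getClassLoader)
      val sc = new JavaSparkContext(conf)
      // ... use sc ...
    } finally {
      Thread.currentThread().setContextClassLoader(original)
    }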
Thanks Marcelo; "this is just how Maven works (unfortunately)" answers my
question.
Another related question: I tried to use `mvn scala:cc` and discovered that
it only seems to scan the src/main and src/test directories (according to its
docs
I realize we're not voting, but +1 to this proposal since commit messages
can't be changed whereas JIRA issues can always be updated after the fact.
2014-12-02 13:05 GMT-08:00 Patrick Wendell pwend...@gmail.com:
Also a note on this for committers - it's possible to re-word the
title during
Hi !
I have the same issue that this one
https://issues.apache.org/jira/browse/SPARK-1693 but using the version
2.5 of Hadoop.
The fix bug is here
https://github.com/witgo/spark/commit/dc63905908cb7c84c741bb5fdc4ad7d4abdcb0b2
for Hadoop 2.4 and 2.3.
For now I just changed my version of
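For reference, a hedged sketch of pointing the build at another Hadoop
version (versions illustrative; the build docs of that era reuse the
hadoop-2.4 profile for 2.4 and later):

    mvn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package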
hi, all
When building a graph from edge tuples with the Graph.fromEdgeTuples API,
the edges are of type RDD[Edge]. Inside EdgeRDD.fromEdge,
it would be better for EdgePartitionBuilder.add to take the Edge object directly,
so there is no need to create a new Edge object again.
def fromEdgeTuples[VD: ClassTag](
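For context, a minimal usage sketch of the API under discussion (assumes an
existing SparkContext named sc):

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    val rawEdges: RDD[(VertexId, VertexId)] = sc.parallelize(Seq((1L, 2L), (2L, 3L)))
    // Vertex attributes take the supplied default; edge attributes default to 1.
    val graph: Graph[Int, Int] = Graph.fromEdgeTuples(rawEdges, defaultValue = 1)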
I agree we should separate out the integration tests so it's easy for devs
to just run the other, fast tests locally. I opened a JIRA for it:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4746
On Nov 30, 2014 3:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi Ryan,
As
Hey Jun,
The Ooyala server is being maintained by its original author (Evan Chan)
here:
https://github.com/spark-jobserver/spark-jobserver
This is likely to stay a standalone project for now, since it builds
directly on Spark's public APIs.
- Patrick
On Wed, Dec 3, 2014 at 9:02 PM, Jun
Oh, derp. I just assumed from looking at all the options that there was
something to it. Thanks Sean.
On Thu Dec 04 2014 at 7:47:33 AM Sean Owen so...@cloudera.com wrote:
You just run it once with zinc -start and leave it running as a
background process on your build machine. You don't have to
fwiw, when we did this work in HBase, we categorized the tests. Then some
tests can share a single JVM, while others need to be isolated in
their own JVM. Nevertheless surefire can still run them in parallel by
starting/stopping several JVMs.
I think we need to do this as well. Perhaps the
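A sketch of what categorization could look like with ScalaTest tags (tag and
suite names are illustrative, not the actual Spark setup):

    import org.scalatest.{FunSuite, Tag}

    object IntegrationTest extends Tag("org.apache.spark.tags.IntegrationTest")

    class ExampleSuite extends FunSuite {
      // Fast check: safe to run locally on every build.
      test("fast unit check") {
        assert(1 + 1 === 2)
      }
      // Tagged check: excludable locally, e.g. in sbt with
      //   test-only org.example.ExampleSuite -- -l org.apache.spark.tags.IntegrationTest
      test("slow end-to-end check", IntegrationTest) {
        // would spin up a mini cluster here
      }
    }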
Have you seen this thread http://search-hadoop.com/m/JW1q5xxSAa2 ?
Test categorization in HBase is done through maven-surefire-plugin
Cheers
On Thu, Dec 4, 2014 at 4:05 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
fwiw, when we did this work in HBase, we categorized the tests. Then
I got the following error during Spark startup (Yarn-client mode):
14/12/04 19:33:58 INFO Client: Uploading resource
file:/x/home/jianshuang/spark/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar
-
Actually my HADOOP_CLASSPATH has already been set to include
/etc/hadoop/conf/*
export HADOOP_CLASSPATH=/etc/hbase/conf/hbase-site.xml:/usr/lib/hbase/lib/hbase-protocol.jar:$(hbase classpath)
Jianshi
On Fri, Dec 5, 2014 at 11:54 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Looks like
Looks like the datanucleus*.jar shouldn't appear in the hdfs path in
Yarn-client mode.
Maybe this patch broke yarn-client.
https://github.com/apache/spark/commit/a975dc32799bb8a14f9e1c76defaaa7cfbaf8b53
Jianshi
On Fri, Dec 5, 2014 at 12:02 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Correction:
According to Liancheng, this hotfix might be the root cause:
https://github.com/apache/spark/commit/38cb2c3a36a5c9ead4494cbc3dde008c2f0698ce
Jianshi
On Fri, Dec 5, 2014 at 12:45 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Looks like the datanucleus*.jar shouldn't appear in
I created a ticket for this:
https://issues.apache.org/jira/browse/SPARK-4757
Jianshi
On Fri, Dec 5, 2014 at 1:31 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Correction:
According to Liancheng, this hotfix might be the root cause:
Thanks for flagging this. I reverted the relevant YARN fix in the Spark
1.2 release. We can try to debug this in master.
On Thu, Dec 4, 2014 at 9:51 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:
I created a ticket for this:
https://issues.apache.org/jira/browse/SPARK-4757
Jianshi
On Fri,
Sorry for the late follow-up.
I used Hao's DESC EXTENDED command and found a clue:
new (broadcast broken Spark build):
parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417763892,
COLUMN_STATS_ACCURATE=false, totalSize=0, numRows=-1, rawDataSize=-1}
old (broadcast working
Hi,
I got an exception saying Hive: NoSuchObjectException(message:table table
not found)
when running DROP TABLE IF EXISTS table.
Looks like a new regression in the Hive module.
Can anyone confirm this?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog:
With Liancheng's suggestion, I've tried setting
spark.sql.hive.convertMetastoreParquet to false,
but ANALYZE with NOSCAN still returns -1 for rawDataSize.
Jianshi
On Fri, Dec 5, 2014 at 3:33 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
If I run ANALYZE without NOSCAN, then Hive can successfully
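A sketch of the two commands being compared, from a HiveContext (table and
variable names illustrative):

    hiveContext.sql("ANALYZE TABLE my_table COMPUTE STATISTICS noscan") // rawDataSize reportedly stays -1
    hiveContext.sql("ANALYZE TABLE my_table COMPUTE STATISTICS")        // full scan, per the follow-up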