zinc invocation examples

2014-12-04 Thread Nicholas Chammas
https://github.com/apache/spark/blob/master/docs/building-spark.md#speeding-up-compilation-with-zinc Could someone summarize how they invoke zinc as part of a regular build-test-etc. cycle? I'll add it to the aforelinked page if appropriate. Nick

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-04 Thread Takeshi Yamamuro
+1 (non-binding) Checked on CentOS 6.5, compiled from source. Ran various examples on a standalone master with three slaves, and browsed the web UI. On Sat, Nov 29, 2014 at 2:16 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark

RDDs for dimensional (time series, spatial) data

2014-12-04 Thread RJ Nowling
Hi all, I created a JIRA to discuss adding RDDs for dimensional (not sure what else to call it) data like time series and spatial data. Spark could be a better time series and/or spatial database than existing approaches out there. https://issues.apache.org/jira/browse/SPARK-4727 I saw that

Re: zinc invocation examples

2014-12-04 Thread Sean Owen
You just run it once with zinc -start and leave it running as a background process on your build machine. You don't have to do anything for each build. On Wed, Dec 3, 2014 at 3:44 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:
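A minimal sketch of the workflow Sean describes, assuming the `zinc` launcher is on your PATH and a standard Maven build (the exact Maven goals are illustrative):

```shell
# Start the Zinc incremental-compile server once; it keeps running
# in the background and is picked up by subsequent Maven builds.
zinc -start

# Build as usual; the scala-maven-plugin finds the running Zinc
# server automatically and compiles incrementally.
mvn -DskipTests clean package

# When you are done, shut the server down.
zinc -shutdown
```

Nothing per-build is required beyond the normal `mvn` invocation.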

Re: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe

2014-12-04 Thread Michael Armbrust
Here's a fix: https://github.com/apache/spark/pull/3586 On Wed, Dec 3, 2014 at 11:05 AM, Michael Armbrust mich...@databricks.com wrote: Thanks for reporting. As a workaround you should be able to SET spark.sql.hive.convertMetastoreParquet=false, but I'm going to try to fix this before the

Re: packaging spark run time with osgi service

2014-12-04 Thread Lochana Menikarachchi
I think the problem has to do with akka not picking up the reference.conf file in the assembly.jar. We managed to make akka pick up the conf file by temporarily switching the class loader. Thread.currentThread().setContextClassLoader(JavaSparkContext.class.getClassLoader()); The model gets built
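The workaround above can be packaged as a small helper that restores the previous loader even if the body throws. A sketch (the class and method names are ours, not from the thread; `JavaSparkContext` stands in for any class loaded from the assembly jar):

```java
import java.util.concurrent.Callable;

public class WithContextClassLoader {
    // Run `body` with `loader` installed as the thread's context class
    // loader, so libraries that resolve resources through it (e.g. Akka
    // looking up reference.conf inside the assembly jar) find them, then
    // restore the previous loader in a finally block.
    public static <T> T run(ClassLoader loader, Callable<T> body) throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return body.call();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }
}
```

Usage inside the OSGi service would then look like `WithContextClassLoader.run(JavaSparkContext.class.getClassLoader(), () -> { /* create the SparkContext, build the model */ return null; });`.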

Ooyala Spark JobServer

2014-12-04 Thread Jun Feng Liu
Hi, I am wondering about the status of the Ooyala Spark JobServer; is there any plan to get it into a Spark release? Best Regards Jun Feng Liu IBM China Systems Technology Laboratory in Beijing Phone: 86-10-82452683 E-mail: liuj...@cn.ibm.com BLD 28,ZGC Software Park No.8 Rd.Dong Bei Wang West,

scala.MatchError on SparkSQL when creating ArrayType of StructType

2014-12-04 Thread invkrh
Hi, I am using SparkSQL on 1.1.0 branch. The following code leads to a scala.MatchError at org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247) val scm = StructType(*inputRDD*.schema.fields.init :+ StructField(list, ArrayType( StructType(

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-04 Thread Krishna Sankar
Will do. Am on the road - will annotate an iPython notebook with what works and what didn't work ... Cheers k/ On Wed, Dec 3, 2014 at 4:19 PM, Xiangrui Meng men...@gmail.com wrote: Krishna, could you send me some code snippets for the issues you saw in naive Bayes and k-means? -Xiangrui On Sun,

spark osgi class loading issue

2014-12-04 Thread Lochana Menikarachchi
We are trying to call Spark through an OSGi service (with an OSGi-fied version of assembly.jar). Spark does not work (due to the way Spark reads the akka reference.conf) unless we switch the class loader as follows. Thread.currentThread().setContextClassLoader(JavaSparkContext.class.getClassLoader());

Re: Spurious test failures, testing best practices

2014-12-04 Thread Ryan Williams
Thanks Marcelo, "this is just how Maven works (unfortunately)" answers my question. Another related question: I tried to use `mvn scala:cc` and discovered that it only seems to scan the src/main and src/test directories (according to its docs

Re: keeping PR titles / descriptions up to date

2014-12-04 Thread Andrew Or
I realize we're not voting, but +1 to this proposal since commit messages can't be changed whereas JIRA issues can always be updated after the fact. 2014-12-02 13:05 GMT-08:00 Patrick Wendell pwend...@gmail.com: Also a note on this for committers - it's possible to re-word the title during

Dependent on multiple versions of servlet-api jars lead to throw an SecurityException when Spark built for hadoop 2.5.0

2014-12-04 Thread nivdul
Hi ! I have the same issue as this one https://issues.apache.org/jira/browse/SPARK-1693 but using version 2.5 of Hadoop. The bug fix is here https://github.com/witgo/spark/commit/dc63905908cb7c84c741bb5fdc4ad7d4abdcb0b2 for Hadoop 2.4 and 2.3. For now I just changed my version of
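SPARK-1693 stems from two servlet-api versions (one sealed/signed) ending up on the classpath together. A hedged sketch of the usual fix, excluding the stale artifact from the Hadoop dependency in your own POM (the exact groupId/artifactId to exclude may differ in your tree; check `mvn dependency:tree` first):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0</version>
  <exclusions>
    <!-- Drop the old servlet-api so only one copy is on the classpath;
         mixing two copies is what triggers the SecurityException. -->
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```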

a question of Graph build api

2014-12-04 Thread jinkui.sjk
Hi all, when building a graph from edge tuples with the Graph.fromEdgeTuples API, the edges object type is RDD[Edge]. Inside EdgeRDD.fromEdge, it would be better for EdgePartitionBuilder.add's parameter to accept an Edge object, so there is no need to create a new Edge object again. def fromEdgeTuples[VD: ClassTag](

Re: Spurious test failures, testing best practices

2014-12-04 Thread Imran Rashid
I agree we should separate out the integration tests so it's easy for devs to run just the fast tests locally. I opened a JIRA for it https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4746 On Nov 30, 2014 3:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi Ryan, As

Re: Ooyala Spark JobServer

2014-12-04 Thread Patrick Wendell
Hey Jun, The Ooyala server is being maintained by its original author (Evan Chan) here: https://github.com/spark-jobserver/spark-jobserver This is likely to stay a standalone project for now, since it builds directly on Spark's public APIs. - Patrick On Wed, Dec 3, 2014 at 9:02 PM, Jun

Re: zinc invocation examples

2014-12-04 Thread Nicholas Chammas
Oh, derp. I just assumed from looking at all the options that there was something to it. Thanks Sean. On Thu Dec 04 2014 at 7:47:33 AM Sean Owen so...@cloudera.com wrote: You just run it once with zinc -start and leave it running as a background process on your build machine. You don't have to

Re: Unit tests in 5 minutes

2014-12-04 Thread Nicholas Chammas
fwiw, when we did this work in HBase, we categorized the tests. Then some tests can share a single JVM, while others need to be isolated in their own JVM. Nevertheless Surefire can still run them in parallel by starting/stopping several JVMs. I think we need to do this as well. Perhaps the

Re: Unit tests in 5 minutes

2014-12-04 Thread Ted Yu
Have you seen this thread http://search-hadoop.com/m/JW1q5xxSAa2 ? Test categorization in HBase is done through maven-surefire-plugin Cheers On Thu, Dec 4, 2014 at 4:05 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: fwiw, when we did this work in HBase, we categorized the tests. Then
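A sketch of the HBase-style setup Ted references, wiring JUnit categories through maven-surefire-plugin (the category interface name and fork counts are illustrative, not taken from the thread):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Run only tests annotated @Category(SmallTests.class). -->
    <groups>org.example.SmallTests</groups>
    <!-- forkCount/reuseForks control how many JVMs run in parallel
         and whether tests share a JVM or get a fresh one. -->
    <forkCount>4</forkCount>
    <reuseForks>true</reuseForks>
  </configuration>
</plugin>
```

Tests needing isolation would go in a second execution (or profile) with `reuseForks` set to `false`.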

Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
I got the following error during Spark startup (Yarn-client mode): 14/12/04 19:33:58 INFO Client: Uploading resource file:/x/home/jianshuang/spark/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar -

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Actually my HADOOP_CLASSPATH has already been set to include /etc/hadoop/conf/* export HADOOP_CLASSPATH=/etc/hbase/conf/hbase-site.xml:/usr/lib/hbase/lib/hbase-protocol.jar:$(hbase classpath) Jianshi On Fri, Dec 5, 2014 at 11:54 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: Looks like

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Looks like the datanucleus*.jar shouldn't appear in the hdfs path in Yarn-client mode. Maybe this patch broke yarn-client. https://github.com/apache/spark/commit/a975dc32799bb8a14f9e1c76defaaa7cfbaf8b53 Jianshi On Fri, Dec 5, 2014 at 12:02 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Correction: According to Liancheng, this hotfix might be the root cause: https://github.com/apache/spark/commit/38cb2c3a36a5c9ead4494cbc3dde008c2f0698ce Jianshi On Fri, Dec 5, 2014 at 12:45 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Looks like the datanucleus*.jar shouldn't appear in

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
I created a ticket for this: https://issues.apache.org/jira/browse/SPARK-4757 Jianshi On Fri, Dec 5, 2014 at 1:31 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Correction: According to Liancheng, this hotfix might be the root cause:

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Patrick Wendell
Thanks for flagging this. I reverted the relevant YARN fix in the Spark 1.2 release. We can try to debug this in master. On Thu, Dec 4, 2014 at 9:51 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: I created a ticket for this: https://issues.apache.org/jira/browse/SPARK-4757 Jianshi On Fri,

Re: Auto BroadcastJoin optimization failed in latest Spark

2014-12-04 Thread Jianshi Huang
Sorry for the late follow-up. I used Hao's DESC EXTENDED command and found some clues: new (broadcast broken Spark build): parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417763892, COLUMN_STATS_ACCURATE=false, totalSize=0, numRows=-1, rawDataSize=-1} old (broadcast working

drop table if exists throws exception

2014-12-04 Thread Jianshi Huang
Hi, I got an exception saying Hive: NoSuchObjectException(message:table table not found) when running DROP TABLE IF EXISTS table. This looks like a new regression in the Hive module. Can anyone confirm this? Thanks, -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github Blog:

Re: Auto BroadcastJoin optimization failed in latest Spark

2014-12-04 Thread Jianshi Huang
With Liancheng's suggestion, I've tried setting spark.sql.hive.convertMetastoreParquet to false, but ANALYZE with NOSCAN still returns -1 for rawDataSize. Jianshi On Fri, Dec 5, 2014 at 3:33 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: If I run ANALYZE without NOSCAN, then Hive can successfully
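For reference, the two Hive statements being compared (the table name is illustrative). With NOSCAN, Hive gathers only file-level stats, which is why row-level figures like rawDataSize can stay at -1 until a full scan runs:

```sql
-- Fast, metadata only: fills numFiles/totalSize, may leave
-- numRows and rawDataSize at -1.
ANALYZE TABLE my_table COMPUTE STATISTICS NOSCAN;

-- Full scan: also populates numRows and rawDataSize, which the
-- auto-broadcast-join size estimation can then rely on.
ANALYZE TABLE my_table COMPUTE STATISTICS;
```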