Re: zinc invocation examples

2014-12-05 Thread Ryan Williams
fwiw I've been using `zinc -scala-home $SCALA_HOME -nailed -start` which:

- starts a nailgun server as well,
- uses my installed Scala 2.{10,11}, as opposed to zinc's default 2.9.2.
  Per https://github.com/typesafehub/zinc#scala: "If no options are passed to
  locate a version of Scala then Scala 2.9.2 is used by default (which is
  bundled with zinc)."

The latter seems like it might be especially important.
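
For anyone who wants to copy this, a possible end-to-end cycle pieced together
from the commands in this thread (the -status and -shutdown flags are my
assumption about the zinc CLI, not something mentioned here):

  # Start a long-lived zinc/nailgun server against a locally installed Scala
  # (assumes SCALA_HOME points at a 2.10 or 2.11 install):
  zinc -scala-home $SCALA_HOME -nailed -start

  # Build as usual; with the server running, Maven compilation should go
  # through zinc per the building-spark.md section referenced below:
  mvn -DskipTests clean package

  # Check on / stop the server when you are done (assumed flags):
  zinc -status
  zinc -shutdown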


On Thu Dec 04 2014 at 4:25:32 PM Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 Oh, derp. I just assumed from looking at all the options that there was
 something to it. Thanks Sean.

 On Thu Dec 04 2014 at 7:47:33 AM Sean Owen so...@cloudera.com wrote:

  You just run it once with zinc -start and leave it running as a
  background process on your build machine. You don't have to do
  anything for each build.
 
  On Wed, Dec 3, 2014 at 3:44 PM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
   https://github.com/apache/spark/blob/master/docs/
  building-spark.md#speeding-up-compilation-with-zinc
  
   Could someone summarize how they invoke zinc as part of a regular
   build-test-etc. cycle?
  
   I'll add it in to the aforelinked page if appropriate.
  
   Nick
 



Re: Unit tests in 5 minutes

2014-12-05 Thread Andrew Or
@Patrick and Josh, actually we went even further than that. We simply
disable the UI for most tests, and the UI ports used to be the single largest
source of port conflicts.


Re: zinc invocation examples

2014-12-05 Thread Patrick Wendell
A while back I created a JIRA for a script, similar to sbt/sbt, that
transparently downloads Zinc, Scala, and Maven into a subdirectory of Spark
and sets them up correctly, i.e. build/mvn.

Outside of brew for MacOS there aren't good Zinc packages, and it's a
pain to figure out how to set it up.

https://issues.apache.org/jira/browse/SPARK-4501
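
To make the idea concrete, here is a rough sketch of what such a wrapper could
look like (the versions, download URLs, and zinc handling are illustrative
assumptions on my part, not the actual SPARK-4501 script):

  #!/usr/bin/env bash
  # build/mvn (sketch): download Maven and zinc into this directory on first
  # use, start a zinc server if needed, then delegate to the downloaded mvn.
  set -e

  BUILD_DIR="$(cd "$(dirname "$0")" && pwd)"
  MVN_VERSION="3.2.3"      # illustrative version
  ZINC_VERSION="0.3.5.3"   # illustrative version

  MVN_HOME="${BUILD_DIR}/apache-maven-${MVN_VERSION}"
  ZINC_HOME="${BUILD_DIR}/zinc-${ZINC_VERSION}"

  # Fetch Maven on first use (mirror URL is an assumption).
  if [ ! -x "${MVN_HOME}/bin/mvn" ]; then
    curl -L "https://archive.apache.org/dist/maven/maven-3/${MVN_VERSION}/binaries/apache-maven-${MVN_VERSION}-bin.tar.gz" \
      | tar xz -C "${BUILD_DIR}"
  fi

  # Fetch zinc on first use (download URL is an assumption).
  if [ ! -x "${ZINC_HOME}/bin/zinc" ]; then
    curl -L "https://downloads.typesafe.com/zinc/${ZINC_VERSION}/zinc-${ZINC_VERSION}.tgz" \
      | tar xz -C "${BUILD_DIR}"
  fi

  # Start a zinc server if one is not already running, then run Maven with
  # whatever arguments were passed to this script.
  "${ZINC_HOME}/bin/zinc" -status >/dev/null 2>&1 || "${ZINC_HOME}/bin/zinc" -start
  exec "${MVN_HOME}/bin/mvn" "$@"

Usage from the Spark root would then be something like ./build/mvn -DskipTests
clean package, with no system-wide Maven or zinc install required.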

Prashant Sharma looked at this for a bit but I don't think he's
working on it actively any more, so if someone wanted to do this, I'd
be extremely grateful.

- Patrick

On Fri, Dec 5, 2014 at 11:05 AM, Ryan Williams
ryan.blake.willi...@gmail.com wrote:
 fwiw I've been using `zinc -scala-home $SCALA_HOME -nailed -start` which:

 - starts a nailgun server as well,
 - uses my installed Scala 2.{10,11}, as opposed to zinc's default 2.9.2.
   Per https://github.com/typesafehub/zinc#scala: "If no options are passed to
   locate a version of Scala then Scala 2.9.2 is used by default (which is
   bundled with zinc)."

 The latter seems like it might be especially important.






Re: drop table if exists throws exception

2014-12-05 Thread Michael Armbrust
The command runs fine for me on master.  Note that Hive does print an
exception in the logs, but that exception does not propagate to user code.

On Thu, Dec 4, 2014 at 11:31 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:

 Hi,

 I got an exception saying "Hive: NoSuchObjectException(message:table table
 not found)"

 when running "DROP TABLE IF EXISTS table"

 Looks like a new regression in the Hive module.

 Can anyone confirm this?

 Thanks,
 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github & Blog: http://huangjs.github.com/



Re: drop table if exists throws exception

2014-12-05 Thread Mark Hamstra
And that is no different from how Hive has worked for a long time.

On Fri, Dec 5, 2014 at 11:42 AM, Michael Armbrust mich...@databricks.com
wrote:

  The command runs fine for me on master.  Note that Hive does print an
  exception in the logs, but that exception does not propagate to user code.




CREATE TABLE AS SELECT does not work with temp tables in 1.2.0

2014-12-05 Thread kb
I am having trouble getting CREATE TABLE AS SELECT or saveAsTable from a
HiveContext to work with temp tables in Spark 1.2. There are no issues in
1.1.0 or 1.1.1.

A simple modification to a test case in the Hive SQLQuerySuite.scala:

test("double nested data") {
  sparkContext.parallelize(Nested1(Nested2(Nested3(1))) ::
    Nil).registerTempTable("nested")
  checkAnswer(
    sql("SELECT f1.f2.f3 FROM nested"),
    1)
  checkAnswer(sql("CREATE TABLE test_ctas_1234 AS SELECT * from nested"),
    Seq.empty[Row])
  checkAnswer(
    sql("SELECT * FROM test_ctas_1234"),
    sql("SELECT * FROM nested").collect().toSeq)
}


output:

11:57:15.974 ERROR org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:45 Table not found
'nested'
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1243)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1192)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9209)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
at
org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$lzycompute(CreateTableAsSelect.scala:59)
at
org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation(CreateTableAsSelect.scala:55)
at
org.apache.spark.sql.hive.execution.CreateTableAsSelect.sideEffectResult$lzycompute(CreateTableAsSelect.scala:82)
at
org.apache.spark.sql.hive.execution.CreateTableAsSelect.sideEffectResult(CreateTableAsSelect.scala:70)
at
org.apache.spark.sql.hive.execution.CreateTableAsSelect.execute(CreateTableAsSelect.scala:89)
at
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
at
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
at 
org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:105)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:103)
at
org.apache.spark.sql.hive.execution.SQLQuerySuite$$anonfun$4.apply$mcV$sp(SQLQuerySuite.scala:122)
at
org.apache.spark.sql.hive.execution.SQLQuerySuite$$anonfun$4.apply(SQLQuerySuite.scala:117)
at
org.apache.spark.sql.hive.execution.SQLQuerySuite$$anonfun$4.apply(SQLQuerySuite.scala:117)
at
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
at
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
at
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
at
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
at org.scalatest.Suite$class.run(Suite.scala:1424)
at
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
at org.scalatest.FunSuite.run(FunSuite.scala:1555)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
at

Re: CREATE TABLE AS SELECT does not work with temp tables in 1.2.0

2014-12-05 Thread Michael Armbrust
Thanks for reporting.  This looks like a regression related to:
https://github.com/apache/spark/pull/2570

I've filed it here: https://issues.apache.org/jira/browse/SPARK-4769
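
For anyone who wants to reproduce this locally, running just that suite might
look like the following (the sbt project/task names are my assumption about the
build, not something stated in this thread):

  # Run only the Hive SQLQuerySuite containing the modified test:
  ./sbt/sbt -Phive "hive/test-only org.apache.spark.sql.hive.execution.SQLQuerySuite"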

On Fri, Dec 5, 2014 at 12:03 PM, kb kend...@hotmail.com wrote:

 I am having trouble getting CREATE TABLE AS SELECT or saveAsTable from a
 HiveContext to work with temp tables in Spark 1.2. There are no issues in
 1.1.0 or 1.1.1.


Protobuf version in mvn vs sbt

2014-12-05 Thread spark.dubovsky.jakub
Hi devs,

  I have been playing with your amazing Spark here in Prague for some time. I
have stumbled on something I would like to ask about. I create assembly jars
from source and then use them to run simple jobs on our 2.3.0-cdh5.1.3 cluster
using yarn; see [1] for an example of my usage. I originally used sbt to create
assemblies like [2], which runs just fine. Then, after reading the
Maven-is-preferred discussions here on the dev list, I found the
make-distribution.sh script in the root of the codebase and wanted to give it a
try. I used it to create an assembly by both [3] and [4].

  But I am not able to use the assemblies created by make-distribution because
they cannot be submitted to the cluster. Here is what happens:
- run [3] or [4]
- recompile the app against the new assembly
- submit the job using the new assembly with a command like [1]
- the submit fails, with the important parts of the stack trace in [5]

  My guess is that this is due to the wrong version of protobuf being included
in the assembly jar. My questions are:
- Can you confirm this hypothesis?
- What is the difference between the sbt and mvn ways of creating the assembly?
I mean, sbt works and mvn does not...
- What additional options do I need to pass to make-distribution to make it
work?

  Any help/explanation here would be appreciated.

  Jakub
--
[1] ./bin/spark-submit --num-executors 200 --master yarn-cluster --conf 
spark.yarn.jar=assembly/target/scala-2.10/spark-assembly-1.2.1-SNAPSHOT-
hadoop2.3.0-cdh5.1.3.jar --class org.apache.spark.mllib.
CreateGuidDomainDictionary root-0.1.jar ${args}

[2] ./sbt/sbt -Dhadoop.version=2.3.0-cdh5.1.3 -Pyarn -Phive assembly/
assembly

[3] ./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.1.3 -Pyarn -Phive -
DskipTests

[4] ./make-distribution.sh -Dyarn.version=2.3.0 -Dhadoop.version=2.3.0-cdh
5.1.3 -Pyarn -Phive -DskipTests

[5] Exception in thread "main" org.apache.hadoop.yarn.exceptions.
YarnRuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.
getClient(RpcClientFactoryPBImpl.java:79)
    at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getProxy
(HadoopYarnProtoRPC.java:48)
    at org.apache.hadoop.yarn.client.RMProxy$1.run(RMProxy.java:134)
...
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance
(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance
(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.
getClient(RpcClientFactoryPBImpl.java:76)
... 27 more
Caused by: java.lang.VerifyError: class org.apache.hadoop.yarn.proto.
YarnServiceProtos$SubmitApplicationRequestProto overrides final method 
getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)



Re: Protobuf version in mvn vs sbt

2014-12-05 Thread Marcelo Vanzin
When building against Hadoop 2.x, you need to enable the appropriate
profile, aside from just specifying the version. e.g. -Phadoop-2.3
for Hadoop 2.3.





-- 
Marcelo




Re: Protobuf version in mvn vs sbt

2014-12-05 Thread DB Tsai
As Marcelo said, CDH5.3 is based on hadoop 2.3, so please try

./make-distribution.sh -Pyarn -Phive -Phadoop-2.3
-Dhadoop.version=2.3.0-cdh5.1.3 -DskipTests

See the details of how to change profiles at
https://spark.apache.org/docs/latest/building-with-maven.html
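
If the rebuilt assembly still misbehaves, one illustrative way to check which
protobuf actually ended up in it (the dist/lib path and the includes filter are
assumptions on my part, not something from this thread):

  # List the protobuf classes bundled into the assembly built by make-distribution:
  unzip -l dist/lib/spark-assembly-*.jar | grep 'com/google/protobuf' | head

  # Or inspect which protobuf version Maven resolves for the chosen profiles:
  mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.1.3 \
    dependency:tree -Dincludes=com.google.protobuf:protobuf-java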

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Fri, Dec 5, 2014 at 12:54 PM, Marcelo Vanzin van...@cloudera.com wrote:
 When building against Hadoop 2.x, you need to enable the appropriate
 profile, aside from just specifying the version. e.g. -Phadoop-2.3
 for Hadoop 2.3.






Re: Protobuf version in mvn vs sbt

2014-12-05 Thread Sean Owen
(Nit: CDH *5.1.x*, including 5.1.3, is derived from Hadoop 2.3.x. 5.3
is based on 2.5.x)

On Fri, Dec 5, 2014 at 3:29 PM, DB Tsai dbt...@dbtsai.com wrote:
 As Marcelo said, CDH5.3 is based on hadoop 2.3, so please try




Re: Protobuf version in mvn vs sbt

2014-12-05 Thread DB Tsai
Oh, I meant to say that cdh5.1.3, used by Jakub's company, is based on Hadoop
2.3. You can see it from the first part of Cloudera's version number:
2.3.0-cdh5.1.3.
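
(If in doubt, the cluster itself reports this; for example:)

  # Prints the underlying Apache Hadoop version together with the CDH build
  # string; on Jakub's cluster something like "Hadoop 2.3.0-cdh5.1.3"
  # (illustrative output):
  hadoop version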


Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Fri, Dec 5, 2014 at 1:38 PM, Sean Owen so...@cloudera.com wrote:

 (Nit: CDH *5.1.x*, including 5.1.3, is derived from Hadoop 2.3.x. 5.3
 is based on 2.5.x)

 On Fri, Dec 5, 2014 at 3:29 PM, DB Tsai dbt...@dbtsai.com wrote:
  As Marcelo said, CDH5.3 is based on hadoop 2.3, so please try



build in IntelliJ IDEA

2014-12-05 Thread Judy Nash
Hi everyone,

Have a newbie question on using IntelliJ to build and debug.

I followed this wiki to set up IntelliJ:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA

Afterward I tried to build via the toolbar (Build > Rebuild Project).
The action fails with the error message:
"Cannot start compiler: the SDK is not specified."

What SDK do I need to specify to get the build working?

Thanks,
Judy


Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-05 Thread Patrick Wendell
Hey All,

Thanks all for the continued testing!

The issue I mentioned earlier, SPARK-4498, was fixed earlier this week
(hat tip to Mark Hamstra, who contributed the fix).

In the interim a few smaller blocker-level issues with Spark SQL were
found and fixed (SPARK-4753, SPARK-4552, SPARK-4761).

There is currently an outstanding issue (SPARK-4740[1]) in Spark core
that needs to be fixed.

I want to thank in particular Shopify and Intel China who have
identified and helped test blocker issues with the release. This type
of workload testing around releases is really helpful for us.

Once things stabilize I will cut RC2. I think we're pretty close with this one.

- Patrick

On Wed, Dec 3, 2014 at 5:38 PM, Takeshi Yamamuro linguin@gmail.com wrote:
 +1 (non-binding)

 Checked on CentOS 6.5, compiled from source.
 Ran various examples with a standalone master and three slaves, and
 browsed the web UI.

 On Sat, Nov 29, 2014 at 2:16 PM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.2.0!

 The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.2.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1048/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.2.0!

 The vote is open until Tuesday, December 02, at 05:15 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.2.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == What justifies a -1 vote for this release? ==
 This vote is happening very late into the QA period compared with
 previous votes, so -1 votes should only occur for significant
 regressions from 1.0.2. Bugs already present in 1.1.X, minor
 regressions, or bugs related to new features will not block this
 release.

 == What default changes should I be aware of? ==
 1. The default value of spark.shuffle.blockTransferService has been
 changed to "netty"
 -- Old behavior can be restored by switching to "nio"

 2. The default value of spark.shuffle.manager has been changed to "sort".
 -- Old behavior can be restored by setting spark.shuffle.manager to
 "hash".

 == Other notes ==
 Because this vote is occurring over a weekend, I will likely extend
 the vote if this RC survives until the end of the vote period.

 - Patrick







Re: build in IntelliJ IDEA

2014-12-05 Thread Josh Rosen
If you go to “File > Project Structure” and click on “Project” under the
“Project Settings” heading, do you see an entry for “Project SDK”? If not, you
should click “New…” and configure a JDK; by default, I think IntelliJ should
figure out a correct path to your system JDK, so you should just be able to hit
“OK” and then rebuild your project. For reference, here’s a screenshot showing
what my version of that window looks like: http://i.imgur.com/hRfQjIi.png


On December 5, 2014 at 1:52:35 PM, Judy Nash (judyn...@exchange.microsoft.com) 
wrote:
Hi everyone,

Have a newbie question on using IntelliJ to build and debug.

I followed this wiki to set up IntelliJ:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA

Afterward I tried to build via the toolbar (Build > Rebuild Project).
The action fails with the error message:
"Cannot start compiler: the SDK is not specified."

What SDK do I need to specify to get the build working?

Thanks,
Judy