Re: Support Hive 0.13.1 in Spark SQL
Hey Cheng, Right now we aren't using stable APIs to communicate with the Hive Metastore. We didn't want to drop support for Hive 0.12, so right now we are using a shim layer to support compiling for 0.12 and 0.13. This is very costly to maintain. If Hive has a stable metadata API for talking to a Metastore, we should use that (is HCatalog sufficient for this purpose?). Ideally we would be able to talk to multiple versions of the Hive metastore while keeping a single internal version of Hive for our use of SerDes, etc. I've created SPARK-4114 for this: https://issues.apache.org/jira/browse/SPARK-4114 This is a very important issue for Spark SQL, so I'd welcome comments on that JIRA from anyone who is familiar with Hive/HCatalog internals. - Patrick On Mon, Oct 27, 2014 at 9:54 PM, Cheng, Hao hao.ch...@intel.com wrote: Hi all, I have some PRs blocked by the Hive upgrade (e.g. https://github.com/apache/spark/pull/2570); the problem is that some internal Hive method signatures changed, and it's hard to maintain compatibility at the code level (sql/hive) when switching back and forth between Hive versions. I guess the motivation of the upgrade is to support the Metastore with different Hive versions. So, how about only upgrading the metastore-related Hive jars, or using HCatalog directly? And of course we can either leave hive-exec.jar, hive-cli.jar, etc. at 0.12 or upgrade them to 0.13.1, but not support both. Sorry if I missed some discussion of the Hive upgrade. Cheng Hao
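To make the idea concrete, here is a purely hypothetical sketch (all names below are invented for illustration and are not Spark's actual API) of the kind of narrow, version-agnostic metastore interface being discussed — a small surface that different Hive client versions could implement behind the scenes while Spark keeps one internal Hive version for SerDes:

    // Hypothetical sketch only: a minimal metastore-facing interface.
    trait MetastoreClient {
      // Only the handful of metadata operations Spark SQL actually needs,
      // so each supported Hive version can supply its own implementation.
      def getTable(db: String, table: String): CatalogTable
      def listPartitions(db: String, table: String): Seq[CatalogPartition]
    }
    case class CatalogTable(db: String, name: String, properties: Map[String, String])
    case class CatalogPartition(values: Seq[String], location: String)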
Re: best IDE for scala + spark development?
thanks everyone. i've been using vim and sbt recently, and i really like it. it's lightweight, fast. plus, ack, ctrl-t, nerdtree, etc. in vim do all the good work. but, as i'm not familiar with scala/spark api yet, i really wish to have these two things in vim + sbt. 1. code completion as in intellij (typing long method / class name in scala/spark isn't that fun!) 2. scala doc on the fly in the text editor (just so i don't have to switch back and forth between the text editor and the scala doc) does anyone have experience with adding these 2 things to vim? thanks! On Mon, Oct 27, 2014 at 5:14 PM, Will Benton wi...@redhat.com wrote: I'll chime in as yet another user who is extremely happy with sbt and a text editor. (In my experience, running ack from the command line is usually just as easy and fast as using an IDE's find-in-project facility.) You can, of course, extend editors with Scala-specific IDE-like functionality (in particular, I am aware of -- but have not used -- ENSIME for emacs or TextMate). Since you're new to Scala, you may not know that you can run any sbt command preceded by a tilde, which will watch files in your project and run the command when anything changes. Therefore, running ~compile from the sbt repl will get you most of the continuous syntax-checking functionality you can get from an IDE. best, wb - Original Message - From: ll duy.huynh@gmail.com To: d...@spark.incubator.apache.org Sent: Sunday, October 26, 2014 10:07:20 AM Subject: best IDE for scala + spark development? i'm new to both scala and spark. what IDE / dev environment do you find most productive for writing code in scala with spark? is it just vim + sbt? or does a full IDE like intellij work out better? thanks!
Re: HiveContext bug?
Hi Marcelo, yes this is a known Spark SQL bug and we've got PRs to fix it (2887 and 2967). Not merged yet because the newly merged Hive 0.13.1 support causes some conflicts. Thanks for reporting this :) On Tue, Oct 28, 2014 at 6:41 AM, Marcelo Vanzin van...@cloudera.com wrote: Well, looks like a huge coincidence, but this was just sent to github: https://github.com/apache/spark/pull/2967 On Mon, Oct 27, 2014 at 3:25 PM, Marcelo Vanzin van...@cloudera.com wrote: Hey guys, I've been using the HiveFromSpark example to test some changes and I ran into an issue that manifests itself as an NPE inside Hive code because some configuration object is null. Tracing back, it seems that `sessionState` being a lazy val in HiveContext is causing it. That variable is only evaluated in [1], while the call in [2] causes a Driver to be initialized by [3], which then tries to use the thread-local session state ([4]), which hasn't been set yet. This could be seen as a Hive bug ([3] should probably be calling the constructor that takes a conf object), but is there a reason why these fields are lazy in HiveContext? I explicitly called SessionState.setCurrentSessionState() before the CommandProcessorFactory call and that seems to fix the issue too. [1] https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L305 [2] https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L289 [3] https://github.com/apache/hive/blob/9c63b2fdc35387d735f4c9d08761203711d4974b/ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java#L104 [4] https://github.com/apache/hive/blob/9c63b2fdc35387d735f4c9d08761203711d4974b/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L286 -- Marcelo -- Marcelo
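For readers hitting the same NPE, a rough sketch of the workaround Marcelo mentions is below. The HiveConf construction here is simplified and assumed for illustration (it is not taken from HiveContext); the key step, grounded in the report above, is installing a SessionState in Hive's thread-local before any code path reaches CommandProcessorFactory / Driver:

    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.ql.session.SessionState

    // Simplified sketch: build a SessionState and register it as the thread-local
    // *before* Hive's Driver is created, so Driver's fallback to
    // SessionState.get().getConf no longer sees a null session state.
    val hiveconf = new HiveConf(classOf[SessionState])
    val state = new SessionState(hiveconf)
    SessionState.setCurrentSessionState(state)
    // ... only now call into CommandProcessorFactory / Driver ...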
Re: best IDE for scala + spark development?
My two cents for Mac Vim/Emacs users. Fixed a Scala ctags Mac compatibility bug months ago, and you may want to use the most recent version here https://github.com/scala/scala-dist/blob/master/tool-support/src/emacs/contrib/dot-ctags On Tue, Oct 28, 2014 at 4:26 PM, Duy Huynh duy.huynh@gmail.com wrote: thanks everyone. i've been using vim and sbt recently, and i really like it. it's lightweight, fast. plus, ack, ctrl-t, nerdtre, etc. in vim do all the good work. but, as i'm not familiar with scala/spark api yet, i really wish to have these two things in vim + sbt. 1. code completion as in intellij (typing long method / class name in scala/spark isn't that fun!) 2. scala doc on the fly in the text editor (just so i don't have to switch back and forth between the text editor and the scala doc) did anyone have experience with adding these 2 things to vim? thanks! On Mon, Oct 27, 2014 at 5:14 PM, Will Benton wi...@redhat.com wrote: I'll chime in as yet another user who is extremely happy with sbt and a text editor. (In my experience, running ack from the command line is usually just as easy and fast as using an IDE's find-in-project facility.) You can, of course, extend editors with Scala-specific IDE-like functionality (in particular, I am aware of -- but have not used -- ENSIME for emacs or TextMate). Since you're new to Scala, you may not know that you can run any sbt command preceded by a tilde, which will watch files in your project and run the command when anything changes. Therefore, running ~compile from the sbt repl will get you most of the continuous syntax-checking functionality you can get from an IDE. best, wb - Original Message - From: ll duy.huynh@gmail.com To: d...@spark.incubator.apache.org Sent: Sunday, October 26, 2014 10:07:20 AM Subject: best IDE for scala + spark development? i'm new to both scala and spark. what IDE / dev environment do you find most productive for writing code in scala with spark? is it just vim + sbt? or does a full IDE like intellij works out better? thanks! -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/best-IDE-for-scala-spark-development-tp8965.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [MLlib] Contributing Algorithm for Outlier Detection
Hi Anant, Thank you for reviewing and helping us out. Please find the following link where you can see the initial code. https://github.com/codeAshu/Outlier-Detection-with-AVF-Spark/blob/master/OutlierWithAVFModel.scala The input file for the code should be in csv format. We have provided a dataset there at the link. We are currently facing the following style issues in the code (the code is working fine though): At lines 62 and 79 we have redundant functions and variables (count_dataPoint, count_trimmedData) for giving line numbers within the function trimScores(). At lines 144 and 149, if we do not use two separate functions to increment line numbers we get erroneous results. Is there any alternative way of handling that? We think it is because of Scala closures, where any local variable that is not in an RDD doesn't get updated in subsequent PairRDDFunctions. Regards, Ashutosh Kaushik
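The closure behaviour described above can be reproduced in isolation. Below is a minimal sketch (using a toy RDD of strings, not the OutlierWithAVFModel code) showing why a driver-side counter never gets updated inside a transformation, and one common alternative, zipWithIndex, for assigning line numbers without mutable state:

    import org.apache.spark.{SparkConf, SparkContext}

    object ClosureCounterSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("closure-sketch").setMaster("local[2]"))
        val data = sc.parallelize(Seq("a", "b", "c"))

        // Problem: `counter` lives on the driver; each task works on its own
        // serialized copy of the closure, so increments made inside map()
        // never propagate back to the driver.
        var counter = 0L
        data.map { x => counter += 1; (counter, x) } // counter stays 0 on the driver

        // One workaround: let Spark assign stable indices instead of counting manually.
        val indexed = data.zipWithIndex().map { case (x, i) => (i, x) }
        indexed.collect().foreach(println)

        sc.stop()
      }
    }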
Re: jenkins downtime tomorrow morning ~6am-8am PDT
this is done, and jenkins is up and building again. On Mon, Oct 27, 2014 at 10:46 AM, shane knapp skn...@berkeley.edu wrote: i'll be bringing jenkins down tomorrow morning for some system maintenance and to get our backups kicked off. i do expect to have the system back up and running before 8am. please let me know ASAP if i need to reschedule this. thanks, shane
How to run tests properly?
Hi, I want to contribute to the MLlib library but I can't get the tests working. I've found three ways of running the tests on the command line. I just want to execute the MLlib tests. 1. via dev/run-tests script This script executes all tests and takes several hours to finish. Some tests failed but I can't say which of them. Should this really take that long? Can I specify to run only MLlib tests? 2. directly via maven I did the following described in the docs [0]. export MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive clean package mvn -Pyarn -Phadoop-2.3 -Phive test This also doesn't work. Why do I have to package Spark before running the tests? 3. via sbt I tried the following. I freshly cloned spark and checked out the tag v1.1.0-rc4. sbt/sbt project mllib test and got the following exception in several cluster tests. [info] - task size should be small in both training and prediction *** FAILED *** [info] org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED [info] at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185) [info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174) [info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173) [info] at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) [info] at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) [info] at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173) [info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688) [info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688) [info] at scala.Option.foreach(Option.scala:236) [info] at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688) summary: [error] Failed: Total 223, Failed 12, Errors 0, Passed 211 [error] Failed tests: [error] org.apache.spark.mllib.clustering.KMeansClusterSuite [error] org.apache.spark.mllib.classification.LogisticRegressionClusterSuite [error] org.apache.spark.mllib.optimization.GradientDescentClusterSuite [error] org.apache.spark.mllib.classification.SVMClusterSuite [error] org.apache.spark.mllib.linalg.distributed.RowMatrixClusterSuite [error] org.apache.spark.mllib.regression.LinearRegressionClusterSuite [error] org.apache.spark.mllib.classification.NaiveBayesClusterSuite [error] org.apache.spark.mllib.regression.LassoClusterSuite [error] org.apache.spark.mllib.regression.RidgeRegressionClusterSuite [error] org.apache.spark.mllib.optimization.LBFGSClusterSuite [error] (mllib/test:test) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 661 s, completed 28.10.2014 17:13:10 sbt/sbt project mllib test 761,74s user 22,86s system 109% cpu 11:59,57 total I tried several slightly different ways but I can't get the tests working. I observed that the tests are running __very__ slowly in some configurations. The cpu nearly idles and the ram usage is low. Am I doing something fundamentally wrong? After many hours of trial and error I'm stuck. Long build and test durations are making it difficult to investigate. Hopefully someone can give me a hint. Which one is the right way to flexibly run the tests of the different sub-projects?
Thanks, Niklas [0] https://spark.apache.org/docs/latest/building-with-maven.html
Re: How to run tests properly?
On Tue, Oct 28, 2014 at 6:18 PM, Niklas Wilcke 1wil...@informatik.uni-hamburg.de wrote: 1. via dev/run-tests script This script executes all tests and take several hours to finish. Some tests failed but I can't say which of them. Should this really take that long? Can I specify to run only MLlib tests? Yes, running all tests takes a long long time. It does print which tests failed, and you can see the errors in the test output. Did you read http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven ? This shows how to run just one test suite. In any Maven project you can try things like mvn test -pl [module] to run just one module's tests. 2. directly via maven I did the following described in the docs [0]. export MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive clean package mvn -Pyarn -Phadoop-2.3 -Phive test This also doesn't work. Why do I have to package spark bevore running the tests? What doesn't work? Some tests use the built assembly, which requires packaging. 3. via sbt I tried the following. I freshly cloned spark and checked out the tag v1.1.0-rc4. sbt/sbt project mllib test and get the following exception in several cluster tests. [info] - task size should be small in both training and prediction *** FAILED *** This just looks like a flaky test failure; I'd try again. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Breeze::DiffFunction not serializable
Hi, I'm trying to call Breeze's LBFGS from the master on each partition but I'm getting a *NotSerializable* error. I guess it's well known that the Breeze DiffFunction is not serializable.

    import breeze.linalg.{Vector => BV, DenseVector => BDV, SparseVector => BSV}
    import breeze.optimize.DiffFunction

    val lbfgs = new breeze.optimize.LBFGS[BDV[Double]]()
    val wInit: BDV[Double] = Array.fill(numFeatures)(0.0).toBreeze

    def localUpdate(d: Array[(Double, BV[Double])], w: BDV[Double]): BDV[Double] = {
      def getObj = new DiffFunction[BDV[Double]] {
        def calculate(w: BDV[Double]): (Double, BDV[Double]) = { ... }
      }
      lbfgs.minimize(getObj, w)
    }

    rdd.mapPartitions { iter: Iterator[(Double, BV[Double])] =>
      val d: Array[(Double, BV[Double])] = iter.toArray
      val w: BDV[Double] = localUpdate(d, wInit)
      Iterator(w)
    }

The following link talks about using the KryoSerializationWrapper as a solution: http://stackoverflow.com/questions/23050067/spark-task-not-serializable-how-to-work-with-complex-map-closures-that-call-o But I didn't have good luck yet. Can someone point to a workaround for the serialization? Thanks a lot. Xuepeng
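One workaround that usually fixes this pattern is to construct the LBFGS optimizer and the DiffFunction inside mapPartitions, so the task closure only captures serializable data and never a driver-side DiffFunction or an enclosing non-serializable class. The sketch below is minimal and self-contained, with an assumed toy least-squares objective and arbitrary maxIter/m values — it is not the poster's actual model code:

    import breeze.linalg.DenseVector
    import breeze.optimize.{DiffFunction, LBFGS}
    import org.apache.spark.{SparkConf, SparkContext}

    object LocalLbfgsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("lbfgs-sketch").setMaster("local[2]"))
        val numFeatures = 3
        val data = sc.parallelize(Seq(
          (1.0, DenseVector(1.0, 0.0, 2.0)),
          (0.0, DenseVector(0.5, 1.0, 0.0))))

        val models = data.mapPartitions { iter =>
          val d = iter.toArray
          // Build the optimizer and objective *inside* the task, so the closure
          // only captures the partition data, never a non-serializable
          // DiffFunction or LBFGS instance created on the driver.
          val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 10, m = 5)
          val obj = new DiffFunction[DenseVector[Double]] {
            def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
              // toy least-squares objective: sum over (w.x - y)^2
              val grad = DenseVector.zeros[Double](numFeatures)
              var loss = 0.0
              d.foreach { case (y, x) =>
                val err = (w dot x) - y
                loss += err * err
                grad += x * (2.0 * err)
              }
              (loss, grad)
            }
          }
          Iterator(lbfgs.minimize(obj, DenseVector.zeros[Double](numFeatures)))
        }
        models.collect().foreach(println)
        sc.stop()
      }
    }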
HiveShim not found when building in Intellij
I have run the build on the command line via Maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code, IntelliJ builds do not work. The following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^
Re: HiveShim not found when building in Intellij
Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: HiveShim not found when building in Intellij
Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects, then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: (1) this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails, which is quite a serious problem. 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^
Re: HiveShim not found when building in Intellij
Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. - Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
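To make the pattern Patrick describes concrete, here is a rough, hypothetical sketch of how a version-specific source root can expose a HiveShim object with a fixed surface that the shared sql/hive code compiles against. The path, the "totalSize" constant, and the members shown are simplified placeholders for illustration, not the actual Spark source files; only the HiveShim name and getStatsSetupConstTotalSize appear in the errors discussed in this thread:

    // Hypothetical sketch of a file like sql/hive/v0.12.0/src/main/scala/.../Shim12.scala,
    // which is only added to the build path when the hive-0.12.0 profile is active.
    object HiveShim {
      val version = "0.12.0"
      // Each version-specific HiveShim exposes the same members, wrapping whatever
      // Hive API exists in that release; shared code just calls HiveShim.xxx.
      def getStatsSetupConstTotalSize: String = "totalSize"
    }
    // A sibling source root (e.g. v0.13.1/...) defines an object with identical
    // members backed by the Hive 0.13.1 APIs, and the Maven build-helper plugin
    // selects exactly one of them per profile — which is why IntelliJ needs the
    // chosen directory registered as a source root to resolve HiveShim.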
Re: HiveShim not found when building in Intellij
Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. - Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^
Re: HiveShim not found when building in Intellij
-Phive is to enable hive-0.13.1 and -Phive -Phive-0.12.0” is to enable hive-0.12.0. Note that the thrift-server is not supported yet in hive-0.13, but expected to go to upstream soon (Spark-3720). Thanks. Zhan Zhang On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. - Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: HiveShim not found when building in Intellij
Yes, these two combinations work for me. On 10/29/14 12:32 PM, Zhan Zhang wrote: -Phive is to enable hive-0.13.1 and -Phive -Phive-0.12.0” is to enable hive-0.12.0. Note that the thrift-server is not supported yet in hive-0.13, but expected to go to upstream soon (Spark-3720). Thanks. Zhan Zhang On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. - Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: HiveShim not found when building in Intellij
I am interested specifically in how to build (and hopefully run/debug..) under Intellij. Your posts sound like command line maven - which has always been working already. Do you have instructions for building in IJ? 2014-10-28 21:38 GMT-07:00 Cheng Lian lian.cs@gmail.com: Yes, these two combinations work for me. On 10/29/14 12:32 PM, Zhan Zhang wrote: -Phive is to enable hive-0.13.1 and -Phive -Phive-0.12.0” is to enable hive-0.12.0. Note that the thrift-server is not supported yet in hive-0.13, but expected to go to upstream soon (Spark-3720). Thanks. Zhan Zhang On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. - Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^
Re: HiveShim not found when building in Intellij
Btw - we should have part of the official docs that describes a full from scratch build in IntelliJ including any gotchas. Then we can update it if there are build changes that alter it. I created this JIRA for it: https://issues.apache.org/jira/browse/SPARK-4128 On Tue, Oct 28, 2014 at 9:42 PM, Stephen Boesch java...@gmail.com wrote: I am interested specifically in how to build (and hopefully run/debug..) under Intellij. Your posts sound like command line maven - which has always been working already. Do you have instructions for building in IJ? 2014-10-28 21:38 GMT-07:00 Cheng Lian lian.cs@gmail.com: Yes, these two combinations work for me. On 10/29/14 12:32 PM, Zhan Zhang wrote: -Phive is to enable hive-0.13.1 and -Phive -Phive-0.12.0 is to enable hive-0.12.0. Note that the thrift-server is not supported yet in hive-0.13, but expected to go to upstream soon (Spark-3720). Thanks. Zhan Zhang On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. - Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. 
Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: HiveShim not found when building in Intellij
You may first open the root pom.xml file in IDEA, and then go for menu View / Tool Windows / Maven Projects, then choose desired Maven profile combination under the Profiles node (e.g. I usually use hadoop-2.4 + hive + hive-0.12.0). IDEA will ask you to re-import the Maven projects, confirm, then it should be OK. I can debug within IDEA with this approach. However, you have to clean the whole project before debugging Spark within IDEA if you compiled the project outside IDEA. Haven't got time to investigate this annoying issue. Also, you can remove sub projects unrelated to your tasks to accelerate compilation and/or avoid other IDEA build issues (e.g. Avro related Spark streaming build failure in IDEA). On 10/29/14 12:42 PM, Stephen Boesch wrote: I am interested specifically in how to build (and hopefully run/debug..) under Intellij. Your posts sound like command line maven - which has always been working already. Do you have instructions for building in IJ? 2014-10-28 21:38 GMT-07:00 Cheng Lian lian.cs@gmail.com mailto:lian.cs@gmail.com: Yes, these two combinations work for me. On 10/29/14 12:32 PM, Zhan Zhang wrote: -Phive is to enable hive-0.13.1 and -Phive -Phive-0.12.0” is to enable hive-0.12.0. Note that the thrift-server is not supported yet in hive-0.13, but expected to go to upstream soon (Spark-3720). Thanks. Zhan Zhang On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com mailto:java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com mailto:pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. - Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com mailto:java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com mailto:matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. 
For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM,
Re: HiveShim not found when building in Intellij
I just started a totally fresh IntelliJ project importing from our root pom. I used all the default options and I added hadoop-2.4, hive, hive-0.13.1 profiles. I was able to run spark core tests from within IntelliJ. Didn't try anything beyond that, but FWIW this worked. - Patrick On Tue, Oct 28, 2014 at 9:54 PM, Cheng Lian lian.cs@gmail.com wrote: You may first open the root pom.xml file in IDEA, and then go for menu View / Tool Windows / Maven Projects, then choose desired Maven profile combination under the Profiles node (e.g. I usually use hadoop-2.4 + hive + hive-0.12.0). IDEA will ask you to re-import the Maven projects, confirm, then it should be OK. I can debug within IDEA with this approach. However, you have to clean the whole project before debugging Spark within IDEA if you compiled the project outside IDEA. Haven't got time to investigate this annoying issue. Also, you can remove sub projects unrelated to your tasks to accelerate compilation and/or avoid other IDEA build issues (e.g. Avro related Spark streaming build failure in IDEA). On 10/29/14 12:42 PM, Stephen Boesch wrote: I am interested specifically in how to build (and hopefully run/debug..) under Intellij. Your posts sound like command line maven - which has always been working already. Do you have instructions for building in IJ? 2014-10-28 21:38 GMT-07:00 Cheng Lian lian.cs@gmail.com: Yes, these two combinations work for me. On 10/29/14 12:32 PM, Zhan Zhang wrote: -Phive is to enable hive-0.13.1 and -Phive -Phive-0.12.0 is to enable hive-0.12.0. Note that the thrift-server is not supported yet in hive-0.13, but expected to go to upstream soon (Spark-3720). Thanks. Zhan Zhang On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. - Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. 
Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: HiveShim not found when building in Intellij
Hao Cheng had just written such a from scratch guide for building Spark SQL in IDEA. Although it's written in Chinese, I think the illustrations are already descriptive enough. http://www.cnblogs.com//articles/4058371.html On 10/29/14 12:45 PM, Patrick Wendell wrote: Btw - we should have part of the official docs that describes a full from scratch build in IntelliJ including any gotchas. Then we can update it if there are build changes that alter it. I created this JIRA for it: https://issues.apache.org/jira/browse/SPARK-4128 On Tue, Oct 28, 2014 at 9:42 PM, Stephen Boesch java...@gmail.com wrote: I am interested specifically in how to build (and hopefully run/debug..) under Intellij. Your posts sound like command line maven - which has always been working already. Do you have instructions for building in IJ? 2014-10-28 21:38 GMT-07:00 Cheng Lian lian.cs@gmail.com: Yes, these two combinations work for me. On 10/29/14 12:32 PM, Zhan Zhang wrote: -Phive is to enable hive-0.13.1 and -Phive -Phive-0.12.0 is to enable hive-0.12.0. Note that the thrift-server is not supported yet in hive-0.13, but expected to go to upstream soon (Spark-3720). Thanks. Zhan Zhang On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. - Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. 
Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: HiveShim not found when building in Intellij
I have selected the same options as Cheng LIang: hadoop-2.4, hive, hive 0.12.0 . After a full Rebuild in IJ I still see the HiveShim errors. I really do not know what is different. I had pulled three hours ago from github upstream master. Just for kicks i am trying PW's combination which uses 0.13.1 now.. But it appears there is something else going on here. Patrick/ Cheng: did you build on the command line using Maven first? I do that since in the past that had been required. 2014-10-28 21:57 GMT-07:00 Patrick Wendell pwend...@gmail.com: I just started a totally fresh IntelliJ project importing from our root pom. I used all the default options and I added hadoop-2.4, hive, hive-0.13.1 profiles. I was able to run spark core tests from within IntelliJ. Didn't try anything beyond that, but FWIW this worked. - Patrick On Tue, Oct 28, 2014 at 9:54 PM, Cheng Lian lian.cs@gmail.com wrote: You may first open the root pom.xml file in IDEA, and then go for menu View / Tool Windows / Maven Projects, then choose desired Maven profile combination under the Profiles node (e.g. I usually use hadoop-2.4 + hive + hive-0.12.0). IDEA will ask you to re-import the Maven projects, confirm, then it should be OK. I can debug within IDEA with this approach. However, you have to clean the whole project before debugging Spark within IDEA if you compiled the project outside IDEA. Haven't got time to investigate this annoying issue. Also, you can remove sub projects unrelated to your tasks to accelerate compilation and/or avoid other IDEA build issues (e.g. Avro related Spark streaming build failure in IDEA). On 10/29/14 12:42 PM, Stephen Boesch wrote: I am interested specifically in how to build (and hopefully run/debug..) under Intellij. Your posts sound like command line maven - which has always been working already. Do you have instructions for building in IJ? 2014-10-28 21:38 GMT-07:00 Cheng Lian lian.cs@gmail.com: Yes, these two combinations work for me. On 10/29/14 12:32 PM, Zhan Zhang wrote: -Phive is to enable hive-0.13.1 and -Phive -Phive-0.12.0 is to enable hive-0.12.0. Note that the thrift-server is not supported yet in hive-0.13, but expected to go to upstream soon (Spark-3720). Thanks. Zhan Zhang On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories. We should come up with a good set of instructions on how to import the pom files + add the few extra source directories. Off hand I am not sure exactly what the correct sequence is. 
- Patrick On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven project. There are several steps to do surgery on the yarn-parent / yarn projects , then do a full rebuild. That was working until one week ago. Intellij/maven is presently broken in two ways: this hive shim (which may yet hopefully be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails -and which is quite a serious problem . 2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example sbt/sbt -Phive gen-idea. Matei On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors:
Re: HiveShim not found when building in Intellij
Cheng - to make it recognize the new HiveShim for 0.12, I had to click on spark-hive under packages in the left pane, then go to Open Module Settings, and then explicitly add the v0.12.0/src/main/scala folder to the sources by navigating to it and Ctrl+clicking to add it as a source. Did you have to do this?

On Tue, Oct 28, 2014 at 9:57 PM, Patrick Wendell pwend...@gmail.com wrote: I just started a totally fresh IntelliJ project importing from our root pom. I used all the default options and I added the hadoop-2.4, hive, and hive-0.13.1 profiles. I was able to run the Spark core tests from within IntelliJ. Didn't try anything beyond that, but FWIW this worked. - Patrick

On Tue, Oct 28, 2014 at 9:54 PM, Cheng Lian lian.cs@gmail.com wrote: You may first open the root pom.xml file in IDEA, then go to the menu View / Tool Windows / Maven Projects, and choose the desired Maven profile combination under the Profiles node (e.g. I usually use hadoop-2.4 + hive + hive-0.12.0). IDEA will ask you to re-import the Maven projects; confirm, and then it should be OK. I can debug within IDEA with this approach. However, you have to clean the whole project before debugging Spark within IDEA if you compiled the project outside IDEA. I haven't had time to investigate this annoying issue. Also, you can remove sub-projects unrelated to your tasks to speed up compilation and/or avoid other IDEA build issues (e.g. the Avro-related Spark Streaming build failure in IDEA).

On 10/29/14 12:42 PM, Stephen Boesch wrote: I am interested specifically in how to build (and hopefully run/debug) under IntelliJ. Your posts sound like command-line Maven, which has always been working already. Do you have instructions for building in IJ?

2014-10-28 21:38 GMT-07:00 Cheng Lian lian.cs@gmail.com: Yes, these two combinations work for me.

On 10/29/14 12:32 PM, Zhan Zhang wrote: -Phive enables hive-0.13.1, and -Phive -Phive-0.12.0 enables hive-0.12.0. Note that the Thrift server is not yet supported with hive-0.13, but it is expected to go upstream soon (SPARK-3720). Thanks. Zhan Zhang

On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been able to find a combination of profiles (i.e. enabling hive, hive-0.12.0, or hive-0.13.1) that works in IntelliJ with Maven. Anyone who knows how to handle this - a quick note here would be appreciated.

2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the Maven build we now have pluggable source directories based on profiles, using the Maven build-helper plugin. This is necessary to support cross-building against different Hive versions, and there will be additional instances of this due to supporting Scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to IntelliJ if you want the entire project to compile there. Unfortunately, as long as we support cross-building like this, it will be an issue: IntelliJ's Maven support does not correctly detect our use of the build-helper plugin to add source directories. We should come up with a good set of instructions on how to import the pom files and add the few extra source directories. Offhand I am not sure exactly what the correct sequence is. - Patrick

On Tue, Oct 28, 2014 at 7:57 PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now? I am not using sbt gen-idea. The way to open in IntelliJ has been to open the parent directory. IJ recognizes it as a Maven project. There are several steps to do surgery on the yarn-parent / yarn projects, then do a full rebuild. That was working until one week ago. IntelliJ/Maven is presently broken in two ways: (1) this Hive shim issue (which may yet turn out to be a small/simple fix - let us see) and (2) the NoClassDefFoundError on ThreadFactoryBuilder from my prior emails, which is quite a serious problem.

2014-10-28 19:46 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Hi Stephen, How did you generate your Maven workspace? You need to make sure the Hive profile is enabled for it. For example, sbt/sbt -Phive gen-idea. Matei

On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote: I have run on the command line via Maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install. But with the latest code, IntelliJ builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)) ^
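For context on the error above: the HiveShim being referenced is a version-specific Scala object that each Hive profile contributes from its own source root (v0.12.0/src/main/scala or v0.13.1/src/main/scala under sql/hive), so the symbol stays unresolved in IntelliJ until the matching folder is registered as a source root. A minimal sketch of the pattern, with simplified names and bodies rather than the actual Spark source:

    package org.apache.spark.sql.hive

    // Compiled from sql/hive/v0.12.0/src/main/scala when the hive-0.12.0
    // profile is active. The v0.13.1 directory provides an object with the
    // same name and interface, so only one of the two should be on the
    // compile path at a time.
    private[hive] object HiveShim {
      val version = "0.12.0"

      // In the real shim this delegates to a Hive StatsSetupConst constant;
      // a literal stands in here to keep the sketch self-contained.
      def getStatsSetupConstTotalSize: String = "totalSize"
    }

    // A call site elsewhere in sql/hive then looks roughly like:
    //   Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize))
    // which is exactly the line IntelliJ flags with "not found: value HiveShim"
    // when neither version folder is registered as a source root.

This is also why Patrick's correction below matters: the folder added by hand has to be the one matching the active Hive profile, otherwise the project either still fails to resolve HiveShim or compiles against the wrong version's shim.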
Re: HiveShim not found when building in IntelliJ
Thanks guys - adding the source root for the shim manually was the issue. For some reason the other issue I was struggling with (NoClassDefFoundError on ThreadFactoryBuilder) also disappeared. I am able to run tests now inside IJ. Woot.

2014-10-28 22:13 GMT-07:00 Patrick Wendell pwend...@gmail.com: Oops - I actually should have added v0.13.0 (i.e. to match whatever I did in the profile).
Re: HiveShim not found when building in IntelliJ
Oops - I actually should have added v0.13.0 (i.e. to match whatever I did in the profile).

On Tue, Oct 28, 2014 at 10:05 PM, Patrick Wendell pwend...@gmail.com wrote: Cheng - to make it recognize the new HiveShim for 0.12, I had to click on spark-hive under packages in the left pane, then go to Open Module Settings, and then explicitly add the v0.12.0/src/main/scala folder to the sources by navigating to it and Ctrl+clicking to add it as a source. Did you have to do this?
Re: HiveShim not found when building in IntelliJ
Hm, the shim source folder used to be recognized automatically, although at the wrong directory level (sql/hive/v0.12.0/src instead of sql/hive/v0.12.0/src/main/scala); it still compiled. I just tried against a fresh checkout, and indeed the shim source folder now needs to be added manually. Sorry for the confusion. Cheng