security testing on spark ?
Hi all, Does anyone know of any effort from the community on security testing spark clusters. I.e. Static source code analysis to find security flaws Penetration testing to identify ways to compromise spark cluster Fuzzing to crash spark Thanks, Judy
RE: Error building Spark on Windows with sbt
I have not had any success building using sbt/sbt on windows. However, I have been able to binary by using maven command directly. From: Richard Eggert [mailto:richard.egg...@gmail.com] Sent: Sunday, October 25, 2015 12:51 PM To: Ted YuCc: User Subject: Re: Error building Spark on Windows with sbt Yes, I know, but it would be nice to be able to test things myself before I push commits. On Sun, Oct 25, 2015 at 3:50 PM, Ted Yu > wrote: If you have a pull request, Jenkins can test your change for you. FYI On Oct 25, 2015, at 12:43 PM, Richard Eggert > wrote: Also, if I run the Maven build on Windows or Linux without setting -DskipTests=true, it hangs indefinitely when it gets to org.apache.spark.JavaAPISuite. It's hard to test patches when the build doesn't work. :-/ On Sun, Oct 25, 2015 at 3:41 PM, Richard Eggert > wrote: By "it works", I mean, "It gets past that particular error". It still fails several minutes later with a different error: java.lang.IllegalStateException: impossible to get artifacts when data has not been loaded. IvyNode = org.scala-lang#scala-library;2.10.3 On Sun, Oct 25, 2015 at 3:38 PM, Richard Eggert > wrote: When I try to start up sbt for the Spark build, or if I try to import it in IntelliJ IDEA as an sbt project, it fails with a "No such file or directory" error when it attempts to "git clone" sbt-pom-reader into .sbt/0.13/staging/some-sha1-hash. If I manually create the expected directory before running sbt or importing into IntelliJ, then it works. Why is it necessary to do this, and what can be done to make it not necessary? Rich -- Rich -- Rich -- Rich
spark thrift server supports timeout?
Hello everyone, Does spark thrift server support timeout? Is there a documentation I can reference for questions like these? I know it support cancels, but not sure about timeout. Thanks, Judy
Get a list of temporary RDD tables via Thrift
Hi, How can I get a list of temporary tables via Thrift? Have used thrift's startWithContext and registered a temp table, but not seeing the temp table/rdd when running show tables. Thanks, Judy
saveAsTable fails on Python with Unresolved plan found
Hello, I am following the tutorial code on sql programming guidehttps://spark.apache.org/docs/1.2.1/sql-programming-guide.html#inferring-the-schema-using-reflection to try out Python on spark 1.2.1. SaveAsTable function works on Scala bur fails on python with Unresolved plan found. Broken Python code: from pyspark.sql import SQLContext, Row sqlContext = SQLContext(sc) lines = sc.textFile(data.txt) parts = lines.map(lambda l: l.split(,)) people = parts.map(lambda p: Row(id=p[0], name=p[1])) schemaPeople = sqlContext.inferSchema(people) schemaPeople.saveAsTable(peopletable) saveAsTable fails with Unresolved plan found. org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree: 'CreateTableAsSelect None, pytable, false, None This scala code works fine: from pyspark.sql import SQLContext, Row sqlContext = SQLContext(sc) lines = sc.textFile(data.txt) parts = lines.map(lambda l: l.split(,)) people = parts.map(lambda p: Row(id=p[0], name=p[1])) schemaPeople = sqlContext.inferSchema(people) schemaPeople.saveAsTable(peopletable) Is this a known issue? Or am I not using Python correctly? Thanks, Judy
RE: saveAsTable fails on Python with Unresolved plan found
SPARK-4825https://issues.apache.org/jira/browse/SPARK-4825 looks like the right bug, but it should've been fixed on 1.2.1. Is a similar fix needed in Python? From: Judy Nash Sent: Thursday, May 7, 2015 7:26 AM To: user@spark.apache.org Subject: saveAsTable fails on Python with Unresolved plan found Hello, I am following the tutorial code on sql programming guidehttps://spark.apache.org/docs/1.2.1/sql-programming-guide.html#inferring-the-schema-using-reflection to try out Python on spark 1.2.1. SaveAsTable function works on Scala bur fails on python with Unresolved plan found. Broken Python code: from pyspark.sql import SQLContext, Row sqlContext = SQLContext(sc) lines = sc.textFile(data.txt) parts = lines.map(lambda l: l.split(,)) people = parts.map(lambda p: Row(id=p[0], name=p[1])) schemaPeople = sqlContext.inferSchema(people) schemaPeople.saveAsTable(peopletable) saveAsTable fails with Unresolved plan found. org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree: 'CreateTableAsSelect None, pytable, false, None This scala code works fine: from pyspark.sql import SQLContext, Row sqlContext = SQLContext(sc) lines = sc.textFile(data.txt) parts = lines.map(lambda l: l.split(,)) people = parts.map(lambda p: Row(id=p[0], name=p[1])) schemaPeople = sqlContext.inferSchema(people) schemaPeople.saveAsTable(peopletable) Is this a known issue? Or am I not using Python correctly? Thanks, Judy
RE: saveAsTable fails on Python with Unresolved plan found
Figured it out. It was because I was using HiveContext instead of SQLContext. FYI in case others saw the same issue. From: Judy Nash Sent: Thursday, May 7, 2015 7:38 AM To: 'user@spark.apache.org' Subject: RE: saveAsTable fails on Python with Unresolved plan found SPARK-4825https://issues.apache.org/jira/browse/SPARK-4825 looks like the right bug, but it should've been fixed on 1.2.1. Is a similar fix needed in Python? From: Judy Nash Sent: Thursday, May 7, 2015 7:26 AM To: user@spark.apache.orgmailto:user@spark.apache.org Subject: saveAsTable fails on Python with Unresolved plan found Hello, I am following the tutorial code on sql programming guidehttps://spark.apache.org/docs/1.2.1/sql-programming-guide.html#inferring-the-schema-using-reflection to try out Python on spark 1.2.1. SaveAsTable function works on Scala bur fails on python with Unresolved plan found. Broken Python code: from pyspark.sql import SQLContext, Row sqlContext = SQLContext(sc) lines = sc.textFile(data.txt) parts = lines.map(lambda l: l.split(,)) people = parts.map(lambda p: Row(id=p[0], name=p[1])) schemaPeople = sqlContext.inferSchema(people) schemaPeople.saveAsTable(peopletable) saveAsTable fails with Unresolved plan found. org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree: 'CreateTableAsSelect None, pytable, false, None This scala code works fine: from pyspark.sql import SQLContext, Row sqlContext = SQLContext(sc) lines = sc.textFile(data.txt) parts = lines.map(lambda l: l.split(,)) people = parts.map(lambda p: Row(id=p[0], name=p[1])) schemaPeople = sqlContext.inferSchema(people) schemaPeople.saveAsTable(peopletable) Is this a known issue? Or am I not using Python correctly? Thanks, Judy
RE: Using 'fair' scheduler mode with thrift server
The expensive query can take all executor slots, but no task occupy the executor permanently. i.e. The second job can possibly to take some resources to execute in-between tasks of the expensive queries. Can the fair scheduler mode help in this case? Or is it possible to setup thrift such that no query is taking all resources. From: Sean Owen [mailto:so...@cloudera.com] Sent: Wednesday, April 1, 2015 12:28 AM To: Asad Khan Cc: user@spark.apache.org Subject: Re: Using 'fair' scheduler mode Does the expensive query take all executor slots? Then there is nothing for any other job to use regardless of scheduling policy. On Mar 31, 2015 9:20 PM, asadrao as...@microsoft.commailto:as...@microsoft.com wrote: Hi, I am using the Spark ‘fair’ scheduler mode. I have noticed that if the first query is a very expensive query (ex: ‘select *’ on a really big data set) than any subsequent query seem to get blocked. I would have expected the second query to run in parallel since I am using the ‘fair’ scheduler mode not the ‘fifo’. I am submitting the query through thrift server. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-fair-scheduler-mode-tp22328.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.orgmailto:user-h...@spark.apache.org
Spark SQL does not read from cached table if table is renamed
Hi all, Noticed a bug in my current version of Spark 1.2.1. After a table is cached with cache table table command, query will not read from memory if SQL query renames the table. This query reads from in memory table i.e. select hivesampletable.country from default.hivesampletable group by hivesampletable.country This query with renamed table reads from hive i.e. select table.country from default.hivesampletable table group by table.country Is this a known bug? Most BI tools rename tables to avoid table name collision. Thanks, Judy
Matching Spark application metrics data to App Id
Hi, I want to get telemetry metrics on spark apps activities, such as run time and jvm activities. Using Spark Metrics I am able to get the following sample data point on the an app: type=GAUGE, name=application.SparkSQL::headnode0.1426626495312.runtime_ms, value=414873 How can I match this datapoint to the AppId? (i.e. app-20150317210815-0001) Spark App name is not an unique identifier. 1426626495312 appear to be unique, but I am unable to see how this is related to the AppId. Thanks, Judy
RE: configure number of cached partition in memory on SparkSQL
Thanks Cheng for replying. Meant to say to change number of partitions of a cached table. It doesn’t need to be re-adjusted after caching. To provide more context: What I am seeing on my dataset is that we have a large number of tasks. Since it appears each task is mapped to a partition, I want to see if matching partitions to available core count will make it faster. I’ll give your suggestion a try to see if it will help. Experiment is a great way to learn more about spark internals. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Monday, March 16, 2015 5:41 AM To: Judy Nash; user@spark.apache.org Subject: Re: configure number of cached partition in memory on SparkSQL Hi Judy, In the case of HadoopRDD and NewHadoopRDD, partition number is actually decided by the InputFormat used. And spark.sql.inMemoryColumnarStorage.batchSize is not related to partition number, it controls the in-memory columnar batch size within a single partition. Also, what do you mean by “change the number of partitions after caching the table”? Are you trying to re-cache an already cached table with a different partition number? Currently, I don’t see a super intuitive pure SQL way to set the partition number in this case. Maybe you can try this (assuming table t has a column s which is expected to be sorted): SET spark.sql.shuffle.partitions = 10; CACHE TABLE cached_t AS SELECT * FROM t ORDER BY s; In this way, we introduce a shuffle by sorting a column, and zoom in/out the partition number at the same time. This might not be the best way out there, but it’s the first one that jumped into my head. Cheng On 3/5/15 3:51 AM, Judy Nash wrote: Hi, I am tuning a hive dataset on Spark SQL deployed via thrift server. How can I change the number of partitions created by caching the table on thrift server? I have tried the following but still getting the same number of partitions after caching: Spark.default.parallelism spark.sql.inMemoryColumnarStorage.batchSize Thanks, Judy
RE: spark standalone with multiple executors in one work node
I meant from one app, yes. Was asking this because our previous tuning experiment shows spark-on-yarn runs faster when overloading workers with executors (i.e. if a worker has 4 cores, creating 2 executors each use 4 cores will see a speed boost from 1 executor with 4 cores). I have found an equivalent solution for standalone that have given me a speed boost. Instead of adding more executors, I overloaded SPARK_WORKER_CORES to 2x of CPU cores on the worker. We are seeing better performance due to CPU now has consistent 100% utilization. -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Thursday, February 26, 2015 2:11 AM To: Judy Nash Cc: user@spark.apache.org Subject: Re: spark standalone with multiple executors in one work node --num-executors is the total number of executors. In YARN there is not quite the same notion of a Spark worker. Of course, one worker has an executor for each running app, so yes, but you mean for one app? it's possible, though not usual, to run multiple executors for one app on one worker. This may be useful if your executor heap size is otherwise getting huge. On Thu, Feb 26, 2015 at 1:58 AM, Judy Nash judyn...@exchange.microsoft.com wrote: Hello, Does spark standalone support running multiple executors in one worker node? It seems yarn has the parameter --num-executors to set number of executors to deploy, but I do not find the equivalent parameter in spark standalone. Thanks, Judy
configure number of cached partition in memory on SparkSQL
Hi, I am tuning a hive dataset on Spark SQL deployed via thrift server. How can I change the number of partitions after caching the table on thrift server? I have tried the following but still getting the same number of partitions after caching: Spark.default.parallelism spark.sql.inMemoryColumnarStorage.batchSize Thanks, Judy
spark standalone with multiple executors in one work node
Hello, Does spark standalone support running multiple executors in one worker node? It seems yarn has the parameter --num-executors to set number of executors to deploy, but I do not find the equivalent parameter in spark standalone. Thanks, Judy
spark slave cannot execute without admin permission on windows
Hi, Is it possible to configure spark to run without admin permission on windows? My current setup run master slave successfully with admin permission. However, if I downgrade permission level from admin to user, SparkPi fails with the following exception on the slave node: Exception in thread main org.apache.spark.SparkException: Job aborted due to s tage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 9, workernode0.jnashsparkcurr2.d10.internal.cloudapp.net) : java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi$$anonfun$1 at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) Upon investigation, it appears that sparkPi jar under spark_home\worker\appname\*.jar does not have execute permission set, causing spark not able to find class. Advice would be very much appreciated. Thanks, Judy
RE: Is the Thrift server right for me?
It should relay the queries to spark (i.e. you shouldn't see any MR job on Hadoop you should see activities on the spark app on headnode UI). Check your hive-site.xml. Are you directing to the hive server 2 port instead of spark thrift port? Their default ports are both 1. From: Andrew Lee [mailto:alee...@hotmail.com] Sent: Wednesday, February 11, 2015 12:00 PM To: sjbrunst; user@spark.apache.org Subject: RE: Is the Thrift server right for me? I have ThriftServer2 up and running, however, I notice that it relays the query to HiveServer2 when I pass the hive-site.xml to it. I'm not sure if this is the expected behavior, but based on what I have up and running, the ThriftServer2 invokes HiveServer2 that results in MapReduce or Tez query. In this case, I could just connect directly to HiveServer2 if Hive is all you need. If you are programmer and want to mash up data from Hive with other tables and data in Spark, then Spark ThriftServer2 seems to be a good integration point at some use case. Please correct me if I misunderstood the purpose of Spark ThriftServer2. Date: Thu, 8 Jan 2015 14:49:00 -0700 From: sjbru...@uwaterloo.camailto:sjbru...@uwaterloo.ca To: user@spark.apache.orgmailto:user@spark.apache.org Subject: Is the Thrift server right for me? I'm building a system that collects data using Spark Streaming, does some processing with it, then saves the data. I want the data to be queried by multiple applications, and it sounds like the Thrift JDBC/ODBC server might be the right tool to handle the queries. However, the documentation for the Thrift server http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server seems to be written for Hive users who are moving to Spark. I never used Hive before I started using Spark, so it is not clear to me how best to use this. I've tried putting data into Hive, then serving it with the Thrift server. But I have not been able to update the data in Hive without first shutting down the server. This is a problem because new data is always being streamed in, and so the data must continuously be updated. The system I'm building is supposed to replace a system that stores the data in MongoDB. The dataset has now grown so large that the database index does not fit in memory, which causes major performance problems in MongoDB. If the Thrift server is the right tool for me, how can I set it up for my application? If it is not the right tool, what else can I use? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-the-Thrift-server-right-for-me-tp21044.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.orgmailto:user-h...@spark.apache.org
Spark Metrics Servlet for driver and executor
Hi all, Looking at spark metricsServlet. What is the url exposing driver executor json response? Found master and worker successfully, but can't find url that return json for the other 2 sources. Thanks! Judy
RE: spark 1.2 compatibility
Yes. It's compatible with HDP 2.1 -Original Message- From: bhavyateja [mailto:bhavyateja.potin...@gmail.com] Sent: Friday, January 16, 2015 3:17 PM To: user@spark.apache.org Subject: spark 1.2 compatibility Is spark 1.2 is compatibly with HDP 2.1 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-2-compatibility-tp21197.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: spark 1.2 compatibility
Should clarify on this. I personally have used HDP 2.1 + Spark 1.2 and have not seen a problem. However officially HDP 2.1 + Spark 1.2 is not a supported scenario. -Original Message- From: Judy Nash Sent: Friday, January 16, 2015 5:35 PM To: 'bhavyateja'; user@spark.apache.org Subject: RE: spark 1.2 compatibility Yes. It's compatible with HDP 2.1 -Original Message- From: bhavyateja [mailto:bhavyateja.potin...@gmail.com] Sent: Friday, January 16, 2015 3:17 PM To: user@spark.apache.org Subject: spark 1.2 compatibility Is spark 1.2 is compatibly with HDP 2.1 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-2-compatibility-tp21197.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: Spark SQL API Doc IsCached as SQL command
Thanks Cheng. Tried it out and saw the InMemoryColumnarTableScan word in the physical plan. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Friday, December 12, 2014 11:37 PM To: Judy Nash; user@spark.apache.org Subject: Re: Spark SQL API Doc IsCached as SQL command There isn’t a SQL statement that directly maps SQLContext.isCached, but you can use EXPLAIN EXTENDED to check whether the underlying physical plan is a InMemoryColumnarTableScan. On 12/13/14 7:14 AM, Judy Nash wrote: Hello, Few questions on Spark SQL: 1) Does Spark SQL support equivalent SQL Query for Scala command: IsCached(table name) ? 2) Is there a documentation spec I can reference for question like this? Closest doc I can find is this one: https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#deploying-in-existing-hive-warehouses Thanks, Judy
Spark SQL API Doc IsCached as SQL command
Hello, Few questions on Spark SQL: 1) Does Spark SQL support equivalent SQL Query for Scala command: IsCached(table name) ? 2) Is there a documentation spec I can reference for question like this? Closest doc I can find is this one: https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#deploying-in-existing-hive-warehouses Thanks, Judy
RE: Spark-SQL JDBC driver
Looks like you are wondering why you cannot see the RDD table you have created via thrift? Based on my own experience with spark 1.1, RDD created directly via Spark SQL (i.e. Spark Shell or Spark-SQL.sh) is not visible on thrift, since thrift has its own session containing its own RDD. Spark SQL experts on the forum can confirm on this though. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Tuesday, December 9, 2014 6:42 AM To: Anas Mosaad Cc: Judy Nash; user@spark.apache.org Subject: Re: Spark-SQL JDBC driver According to the stacktrace, you were still using SQLContext rather than HiveContext. To interact with Hive, HiveContext *must* be used. Please refer to this page http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables On 12/9/14 6:26 PM, Anas Mosaad wrote: Back to the first question, this will mandate that hive is up and running? When I try it, I get the following exception. The documentation says that this method works only on SchemaRDD. I though that countries.saveAsTable did not work for that a reason so I created a tmp that contains the results from the registered temp table. Which I could validate that it's a SchemaRDD as shown below. @Judy, I do really appreciate your kind support and I want to understand and off course don't want to wast your time. If you can direct me the documentation describing this details, this will be great. scala val tmp = sqlContext.sql(select * from countries) tmp: org.apache.spark.sql.SchemaRDD = SchemaRDD[12] at RDD at SchemaRDD.scala:108 == Query Plan == == Physical Plan == PhysicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36 scala tmp.saveAsTable(Countries) org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree: 'CreateTableAsSelect None, Countries, false, None Project [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29] Subquery countries LogicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36 at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83) at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135) at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78) at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60) at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34) at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51) at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411) at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411) at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412) at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412) at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413) at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425) at org.apache.spark.sql.SQLContext
RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava
To report back how I ultimately solved this issue and someone else can do: 1) Check each jar class path and make sure the jars are listed in the order of Guava class version (i.e. spark-assembly needs to list before Hadoop 2.4 because spark-assembly has guava 14 and Hadoop 2.4 has guava 11). May require update compute-classpath.sh to get the ordering right. 2) If the other jars uses a higher version, bump spark guava library to higher version. Guava supposedly to be very backward compatible. Hope this helps. -Original Message- From: Marcelo Vanzin [mailto:van...@cloudera.com] Sent: Tuesday, December 2, 2014 11:35 AM To: Judy Nash Cc: Patrick Wendell; Denny Lee; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava On Tue, Dec 2, 2014 at 11:22 AM, Judy Nash judyn...@exchange.microsoft.com wrote: Any suggestion on how can user with custom Hadoop jar solve this issue? You'll need to include all the dependencies for that custom Hadoop jar to the classpath. Those will include Guava (which is not included in its original form as part of the Spark dependencies). -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Sunday, November 30, 2014 11:06 PM To: Judy Nash Cc: Denny Lee; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Thanks Judy. While this is not directly caused by a Spark issue, it is likely other users will run into this. This is an unfortunate consequence of the way that we've shaded Guava in this release, we rely on byte code shading of Hadoop itself as well. And if the user has their own Hadoop classes present it can cause issues. On Sun, Nov 30, 2014 at 10:53 PM, Judy Nash judyn...@exchange.microsoft.com wrote: Thanks Patrick and Cheng for the suggestions. The issue was Hadoop common jar was added to a classpath. After I removed Hadoop common jar from both master and slave, I was able to bypass the error. This was caused by a local change, so no impact on the 1.2 release. -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Wednesday, November 26, 2014 8:17 AM To: Judy Nash Cc: Denny Lee; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Just to double check - I looked at our own assembly jar and I confirmed that our Hadoop configuration class does use the correctly shaded version of Guava. My best guess here is that somehow a separate Hadoop library is ending up on the classpath, possible because Spark put it there somehow. tar xvzf spark-assembly-1.3.0-SNAPSHOT-hadoop2.4.0.jar cd org/apache/hadoop/ javap -v Configuration | grep Precond Warning: Binary file Configuration contains org.apache.hadoop.conf.Configuration #497 = Utf8 org/spark-project/guava/common/base/Preconditions #498 = Class #497 // org/spark-project/guava/common/base/Preconditions #502 = Methodref #498.#501// org/spark-project/guava/common/base/Preconditions.checkArgument:(ZL j ava/lang/Object;)V 12: invokestatic #502// Method org/spark-project/guava/common/base/Preconitions.checkArgument:(ZLj a va/lang/Object;)V 50: invokestatic #502// Method org/spark-project/guava/common/base/Preconitions.checkArgument:(ZLj a va/lang/Object;)V On Wed, Nov 26, 2014 at 11:08 AM, Patrick Wendell pwend...@gmail.com wrote: Hi Judy, Are you somehow modifying Spark's classpath to include jars from Hadoop and Hive that you have running on the machine? 
The issue seems to be that you are somehow including a version of Hadoop that references the original guava package. The Hadoop that is bundled in the Spark jars should not do this. - Patrick On Wed, Nov 26, 2014 at 1:45 AM, Judy Nash judyn...@exchange.microsoft.com wrote: Looks like a config issue. I ran spark-pi job and still failing with the same guava error Command ran: .\bin\spark-class.cmd org.apache.spark.deploy.SparkSubmit --class org.apache.spark.examples.SparkPi --master spark://headnodehost:7077 --executor-memory 1G --num-executors 1 .\lib\spark-examples-1.2.1-SNAPSHOT-hadoop2.4.0.jar 100 Had used the same build steps on spark 1.1 and had no issue. From: Denny Lee [mailto:denny.g@gmail.com] Sent: Tuesday, November 25, 2014 5:47 PM To: Judy Nash; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava To determine if this is a Windows vs. other configuration, can you just try to call the Spark-class.cmd SparkSubmit without actually referencing the Hadoop or Thrift server classes? On Tue Nov 25 2014 at 5:42:09 PM Judy Nash judyn...@exchange.microsoft.com wrote: I
RE: Spark-SQL JDBC driver
You can use thrift server for this purpose then test it with beeline. See doc: https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server From: Anas Mosaad [mailto:anas.mos...@incorta.com] Sent: Monday, December 8, 2014 11:01 AM To: user@spark.apache.org Subject: Spark-SQL JDBC driver Hello Everyone, I'm brand new to spark and was wondering if there's a JDBC driver to access spark-SQL directly. I'm running spark in standalone mode and don't have hadoop in this environment. -- Best Regards/أطيب المنى, Anas Mosaad
monitoring for spark standalone
Hello, Are there ways we can programmatically get health status of master slave nodes, similar to Hadoop Ambari? Wiki seems to suggest there are only web UI or instrumentations (http://spark.apache.org/docs/latest/monitoring.html). Thanks, Judy
RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava
Any suggestion on how can user with custom Hadoop jar solve this issue? -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Sunday, November 30, 2014 11:06 PM To: Judy Nash Cc: Denny Lee; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Thanks Judy. While this is not directly caused by a Spark issue, it is likely other users will run into this. This is an unfortunate consequence of the way that we've shaded Guava in this release, we rely on byte code shading of Hadoop itself as well. And if the user has their own Hadoop classes present it can cause issues. On Sun, Nov 30, 2014 at 10:53 PM, Judy Nash judyn...@exchange.microsoft.com wrote: Thanks Patrick and Cheng for the suggestions. The issue was Hadoop common jar was added to a classpath. After I removed Hadoop common jar from both master and slave, I was able to bypass the error. This was caused by a local change, so no impact on the 1.2 release. -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Wednesday, November 26, 2014 8:17 AM To: Judy Nash Cc: Denny Lee; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Just to double check - I looked at our own assembly jar and I confirmed that our Hadoop configuration class does use the correctly shaded version of Guava. My best guess here is that somehow a separate Hadoop library is ending up on the classpath, possible because Spark put it there somehow. tar xvzf spark-assembly-1.3.0-SNAPSHOT-hadoop2.4.0.jar cd org/apache/hadoop/ javap -v Configuration | grep Precond Warning: Binary file Configuration contains org.apache.hadoop.conf.Configuration #497 = Utf8 org/spark-project/guava/common/base/Preconditions #498 = Class #497 // org/spark-project/guava/common/base/Preconditions #502 = Methodref #498.#501// org/spark-project/guava/common/base/Preconditions.checkArgument:(ZLj ava/lang/Object;)V 12: invokestatic #502// Method org/spark-project/guava/common/base/Preconitions.checkArgument:(ZLja va/lang/Object;)V 50: invokestatic #502// Method org/spark-project/guava/common/base/Preconitions.checkArgument:(ZLja va/lang/Object;)V On Wed, Nov 26, 2014 at 11:08 AM, Patrick Wendell pwend...@gmail.com wrote: Hi Judy, Are you somehow modifying Spark's classpath to include jars from Hadoop and Hive that you have running on the machine? The issue seems to be that you are somehow including a version of Hadoop that references the original guava package. The Hadoop that is bundled in the Spark jars should not do this. - Patrick On Wed, Nov 26, 2014 at 1:45 AM, Judy Nash judyn...@exchange.microsoft.com wrote: Looks like a config issue. I ran spark-pi job and still failing with the same guava error Command ran: .\bin\spark-class.cmd org.apache.spark.deploy.SparkSubmit --class org.apache.spark.examples.SparkPi --master spark://headnodehost:7077 --executor-memory 1G --num-executors 1 .\lib\spark-examples-1.2.1-SNAPSHOT-hadoop2.4.0.jar 100 Had used the same build steps on spark 1.1 and had no issue. From: Denny Lee [mailto:denny.g@gmail.com] Sent: Tuesday, November 25, 2014 5:47 PM To: Judy Nash; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava To determine if this is a Windows vs. other configuration, can you just try to call the Spark-class.cmd SparkSubmit without actually referencing the Hadoop or Thrift server classes? 
On Tue Nov 25 2014 at 5:42:09 PM Judy Nash judyn...@exchange.microsoft.com wrote: I traced the code and used the following to call: Spark-class.cmd org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal --hiveconf hive.server2.thrift.port=1 The issue ended up to be much more fundamental however. Spark doesn't work at all in configuration below. When open spark-shell, it fails with the same ClassNotFound error. Now I wonder if this is a windows-only issue or the hive/Hadoop configuration that is having this problem. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Tuesday, November 25, 2014 1:50 AM To: Judy Nash; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Oh so you're using Windows. What command are you using to start the Thrift server then? On 11/25/14 4:25 PM, Judy Nash wrote: Made progress but still blocked. After recompiling the code on cmd instead of PowerShell, now I can see all 5 classes as you mentioned. However I am still seeing the same error as before. Anything else I can check for? From
RE: Unable to compile spark 1.1.0 on windows 8.1
Have you checked out the wiki here? http://spark.apache.org/docs/latest/building-with-maven.html A couple things I did differently from you: 1) I got the bits directly from github (https://github.com/apache/spark/). Use branch 1.1 for spark 1.1 2) execute maven command on cmd (powershell misses libraries sometimes) 3) Increase maven memory per suggested by building with maven wiki Hope this helps. -Original Message- From: Ishwardeep Singh [mailto:ishwardeep.si...@impetus.co.in] Sent: Monday, December 1, 2014 1:50 AM To: u...@spark.incubator.apache.org Subject: RE: Unable to compile spark 1.1.0 on windows 8.1 Hi Judy, Thank you for your response. When I try to compile using maven mvn -Dhadoop.version=1.2.1 -DskipTests clean package I get an error Error: Could not find or load main class . I have maven 3.0.4. And when I run command sbt package I get the same exception as earlier. I have done the following steps: 1. Download spark-1.1.0.tgz from the spark site and unzip the compressed zip to a folder d:\myworkplace\software\spark-1.1.0 2. Then I downloaded sbt-0.13.7.zip and extract it to folder d:\myworkplace\software\sbt 3. Update the PATH environment variable to include d:\myworkplace\software\sbt\bin in the PATH. 4. Navigate to spark folder d:\myworkplace\software\spark-1.1.0 5. Run the command sbt assembly 6. As a side effect of this command a number of libraries are downloaded and I get an initial error that path C:\Users\ishwardeep.singh\.sbt\0.13\staging\ec3aa8f39111944cc5f2\sbt-pom-reader does not exist. 7. I manually create this subfolder ec3aa8f39111944cc5f2\sbt-pom-reader and retry to get the next error as described in my initial error. Is this the correct procedure to compile spark 1.1.0? Please let me know. Hoping to hear from you soon. Regards, ishwardeep -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-compile-spark-1-1-0-on-windows-8-1-tp19996p20075.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: Unable to compile spark 1.1.0 on windows 8.1
I have found the following to work for me on win 8.1: 1) run sbt assembly 2) Use Maven. You can find the maven commands for your build at : docs\building-spark.md -Original Message- From: Ishwardeep Singh [mailto:ishwardeep.si...@impetus.co.in] Sent: Thursday, November 27, 2014 11:31 PM To: u...@spark.incubator.apache.org Subject: Unable to compile spark 1.1.0 on windows 8.1 Hi, I am trying to compile spark 1.1.0 on windows 8.1 but I get the following exception. [info] Compiling 3 Scala sources to D:\myworkplace\software\spark-1.1.0\project\target\scala-2.10\sbt0.13\classes... [error] D:\myworkplace\software\spark-1.1.0\project\SparkBuild.scala:26: object sbt is not a member of package com.typesafe [error] import com.typesafe.sbt.pom.{PomBuild, SbtPomKeys} [error] ^ [error] D:\myworkplace\software\spark-1.1.0\project\SparkBuild.scala:53: not found: type PomBuild [error] object SparkBuild extends PomBuild { [error] ^ [error] D:\myworkplace\software\spark-1.1.0\project\SparkBuild.scala:121: not found: value SbtPomKeys [error] otherResolvers = SbtPomKeys.mvnLocalRepository(dotM2 = Seq(Resolver.file(dotM2, dotM2))), [error]^ [error] D:\myworkplace\software\spark-1.1.0\project\SparkBuild.scala:165: value projectDefinitions is not a member of AnyRef [error] super.projectDefinitions(baseDirectory).map { x = [error] ^ [error] four errors found [error] (plugins/compile:compile) Compilation failed I have also setup scala 2.10. Need help to resolve this issue. Regards, Ishwardeep -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-compile-spark-1-1-0-on-windows-8-1-tp19996.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava
Thanks Patrick and Cheng for the suggestions. The issue was Hadoop common jar was added to a classpath. After I removed Hadoop common jar from both master and slave, I was able to bypass the error. This was caused by a local change, so no impact on the 1.2 release. -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Wednesday, November 26, 2014 8:17 AM To: Judy Nash Cc: Denny Lee; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Just to double check - I looked at our own assembly jar and I confirmed that our Hadoop configuration class does use the correctly shaded version of Guava. My best guess here is that somehow a separate Hadoop library is ending up on the classpath, possible because Spark put it there somehow. tar xvzf spark-assembly-1.3.0-SNAPSHOT-hadoop2.4.0.jar cd org/apache/hadoop/ javap -v Configuration | grep Precond Warning: Binary file Configuration contains org.apache.hadoop.conf.Configuration #497 = Utf8 org/spark-project/guava/common/base/Preconditions #498 = Class #497 // org/spark-project/guava/common/base/Preconditions #502 = Methodref #498.#501// org/spark-project/guava/common/base/Preconditions.checkArgument:(ZLjava/lang/Object;)V 12: invokestatic #502// Method org/spark-project/guava/common/base/Preconitions.checkArgument:(ZLjava/lang/Object;)V 50: invokestatic #502// Method org/spark-project/guava/common/base/Preconitions.checkArgument:(ZLjava/lang/Object;)V On Wed, Nov 26, 2014 at 11:08 AM, Patrick Wendell pwend...@gmail.com wrote: Hi Judy, Are you somehow modifying Spark's classpath to include jars from Hadoop and Hive that you have running on the machine? The issue seems to be that you are somehow including a version of Hadoop that references the original guava package. The Hadoop that is bundled in the Spark jars should not do this. - Patrick On Wed, Nov 26, 2014 at 1:45 AM, Judy Nash judyn...@exchange.microsoft.com wrote: Looks like a config issue. I ran spark-pi job and still failing with the same guava error Command ran: .\bin\spark-class.cmd org.apache.spark.deploy.SparkSubmit --class org.apache.spark.examples.SparkPi --master spark://headnodehost:7077 --executor-memory 1G --num-executors 1 .\lib\spark-examples-1.2.1-SNAPSHOT-hadoop2.4.0.jar 100 Had used the same build steps on spark 1.1 and had no issue. From: Denny Lee [mailto:denny.g@gmail.com] Sent: Tuesday, November 25, 2014 5:47 PM To: Judy Nash; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava To determine if this is a Windows vs. other configuration, can you just try to call the Spark-class.cmd SparkSubmit without actually referencing the Hadoop or Thrift server classes? On Tue Nov 25 2014 at 5:42:09 PM Judy Nash judyn...@exchange.microsoft.com wrote: I traced the code and used the following to call: Spark-class.cmd org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal --hiveconf hive.server2.thrift.port=1 The issue ended up to be much more fundamental however. Spark doesn't work at all in configuration below. When open spark-shell, it fails with the same ClassNotFound error. Now I wonder if this is a windows-only issue or the hive/Hadoop configuration that is having this problem. 
From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Tuesday, November 25, 2014 1:50 AM To: Judy Nash; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Oh so you're using Windows. What command are you using to start the Thrift server then? On 11/25/14 4:25 PM, Judy Nash wrote: Made progress but still blocked. After recompiling the code on cmd instead of PowerShell, now I can see all 5 classes as you mentioned. However I am still seeing the same error as before. Anything else I can check for? From: Judy Nash [mailto:judyn...@exchange.microsoft.com] Sent: Monday, November 24, 2014 11:50 PM To: Cheng Lian; u...@spark.incubator.apache.org Subject: RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava This is what I got from jar tf: org/spark-project/guava/common/base/Preconditions.class org/spark-project/guava/common/math/MathPreconditions.class com/clearspring/analytics/util/Preconditions.class parquet/Preconditions.class I seem to have the line that reported missing, but I am missing this file: com/google/inject/internal/util/$Preconditions.class Any suggestion on how to fix this? Very much appreciate the help as I am very new to Spark and open source technologies. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Monday
RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava
Made progress but still blocked. After recompiling the code on cmd instead of PowerShell, now I can see all 5 classes as you mentioned. However I am still seeing the same error as before. Anything else I can check for? From: Judy Nash [mailto:judyn...@exchange.microsoft.com] Sent: Monday, November 24, 2014 11:50 PM To: Cheng Lian; u...@spark.incubator.apache.org Subject: RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava This is what I got from jar tf: org/spark-project/guava/common/base/Preconditions.class org/spark-project/guava/common/math/MathPreconditions.class com/clearspring/analytics/util/Preconditions.class parquet/Preconditions.class I seem to have the line that reported missing, but I am missing this file: com/google/inject/internal/util/$Preconditions.class Any suggestion on how to fix this? Very much appreciate the help as I am very new to Spark and open source technologies. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Monday, November 24, 2014 8:24 PM To: Judy Nash; u...@spark.incubator.apache.orgmailto:u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Hm, I tried exactly the same commit and the build command locally, but couldn’t reproduce this. Usually this kind of errors are caused by classpath misconfiguration. Could you please try this to ensure corresponding Guava classes are included in the assembly jar you built? jar tf assembly/target/scala-2.10/spark-assembly-1.2.1-SNAPSHOT-hadoop2.4.0.jar | grep Preconditions On my machine I got these lines (the first line is the one reported as missing in your case): org/spark-project/guava/common/base/Preconditions.class org/spark-project/guava/common/math/MathPreconditions.class com/clearspring/analytics/util/Preconditions.class parquet/Preconditions.class com/google/inject/internal/util/$Preconditions.class On 11/25/14 6:25 AM, Judy Nash wrote: Thank you Cheng for responding. Here is the commit SHA1 on the 1.2 branch I saw this failure in: commit 6f70e0295572e3037660004797040e026e440dbd Author: zsxwing zsxw...@gmail.commailto:zsxw...@gmail.com Date: Fri Nov 21 00:42:43 2014 -0800 [SPARK-4472][Shell] Print Spark context available as sc. only when SparkContext is created... ... successfully It's weird that printing Spark context available as sc when creating SparkContext unsuccessfully. Let me know if you need anything else. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Friday, November 21, 2014 8:02 PM To: Judy Nash; u...@spark.incubator.apache.orgmailto:u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Hi Judy, could you please provide the commit SHA1 of the version you're using? Thanks! On 11/22/14 11:05 AM, Judy Nash wrote: Hi, Thrift server is failing to start for me on latest spark 1.2 branch. I got the error below when I start thrift server. Exception in thread main java.lang.NoClassDefFoundError: com/google/common/bas e/Preconditions at org.apache.hadoop.conf.Configuration$DeprecationDelta.init(Configur ation.java:314)…. Here is my setup: 1) Latest spark 1.2 branch build 2) Used build command: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package 3) Added hive-site.xml to \conf 4) Version on the box: Hive 0.13, Hadoop 2.4 Is this a real bug or am I doing something wrong? 
--- Full Stacktrace: Exception in thread main java.lang.NoClassDefFoundError: com/google/common/bas e/Preconditions at org.apache.hadoop.conf.Configuration$DeprecationDelta.init(Configur ation.java:314) at org.apache.hadoop.conf.Configuration$DeprecationDelta.init(Configur ation.java:327) at org.apache.hadoop.conf.Configuration.clinit(Configuration.java:409) at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopU til.scala:82) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala: 42) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala :202) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.sca la) at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1784) at org.apache.spark.storage.BlockManager.init(BlockManager.scala:105) at org.apache.spark.storage.BlockManager.init(BlockManager.scala:180) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:292) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159) at org.apache.spark.SparkContext.init(SparkContext.scala:230) at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv. scala:38) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveTh riftServer2.scala:56) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThr
RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava
I traced the code and used the following to call: Spark-class.cmd org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal --hiveconf hive.server2.thrift.port=1 The issue ended up to be much more fundamental however. Spark doesn’t work at all in configuration below. When open spark-shell, it fails with the same ClassNotFound error. Now I wonder if this is a windows-only issue or the hive/Hadoop configuration that is having this problem. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Tuesday, November 25, 2014 1:50 AM To: Judy Nash; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Oh so you're using Windows. What command are you using to start the Thrift server then? On 11/25/14 4:25 PM, Judy Nash wrote: Made progress but still blocked. After recompiling the code on cmd instead of PowerShell, now I can see all 5 classes as you mentioned. However I am still seeing the same error as before. Anything else I can check for? From: Judy Nash [mailto:judyn...@exchange.microsoft.com] Sent: Monday, November 24, 2014 11:50 PM To: Cheng Lian; u...@spark.incubator.apache.orgmailto:u...@spark.incubator.apache.org Subject: RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava This is what I got from jar tf: org/spark-project/guava/common/base/Preconditions.class org/spark-project/guava/common/math/MathPreconditions.class com/clearspring/analytics/util/Preconditions.class parquet/Preconditions.class I seem to have the line that reported missing, but I am missing this file: com/google/inject/internal/util/$Preconditions.class Any suggestion on how to fix this? Very much appreciate the help as I am very new to Spark and open source technologies. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Monday, November 24, 2014 8:24 PM To: Judy Nash; u...@spark.incubator.apache.orgmailto:u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Hm, I tried exactly the same commit and the build command locally, but couldn’t reproduce this. Usually this kind of errors are caused by classpath misconfiguration. Could you please try this to ensure corresponding Guava classes are included in the assembly jar you built? jar tf assembly/target/scala-2.10/spark-assembly-1.2.1-SNAPSHOT-hadoop2.4.0.jar | grep Preconditions On my machine I got these lines (the first line is the one reported as missing in your case): org/spark-project/guava/common/base/Preconditions.class org/spark-project/guava/common/math/MathPreconditions.class com/clearspring/analytics/util/Preconditions.class parquet/Preconditions.class com/google/inject/internal/util/$Preconditions.class On 11/25/14 6:25 AM, Judy Nash wrote: Thank you Cheng for responding. Here is the commit SHA1 on the 1.2 branch I saw this failure in: commit 6f70e0295572e3037660004797040e026e440dbd Author: zsxwing zsxw...@gmail.commailto:zsxw...@gmail.com Date: Fri Nov 21 00:42:43 2014 -0800 [SPARK-4472][Shell] Print Spark context available as sc. only when SparkContext is created... ... successfully It's weird that printing Spark context available as sc when creating SparkContext unsuccessfully. Let me know if you need anything else. 
From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Friday, November 21, 2014 8:02 PM To: Judy Nash; u...@spark.incubator.apache.orgmailto:u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Hi Judy, could you please provide the commit SHA1 of the version you're using? Thanks! On 11/22/14 11:05 AM, Judy Nash wrote: Hi, Thrift server is failing to start for me on latest spark 1.2 branch. I got the error below when I start thrift server. Exception in thread main java.lang.NoClassDefFoundError: com/google/common/bas e/Preconditions at org.apache.hadoop.conf.Configuration$DeprecationDelta.init(Configur ation.java:314)…. Here is my setup: 1) Latest spark 1.2 branch build 2) Used build command: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package 3) Added hive-site.xml to \conf 4) Version on the box: Hive 0.13, Hadoop 2.4 Is this a real bug or am I doing something wrong? --- Full Stacktrace: Exception in thread main java.lang.NoClassDefFoundError: com/google/common/bas e/Preconditions at org.apache.hadoop.conf.Configuration$DeprecationDelta.init(Configur ation.java:314) at org.apache.hadoop.conf.Configuration$DeprecationDelta.init(Configur ation.java:327) at org.apache.hadoop.conf.Configuration.clinit(Configuration.java:409) at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopU til.scala:82) at org.apache.spark.deploy.SparkHadoopUtil.init
RE: beeline via spark thrift doesn't retain cache
Thanks Yanbo. My issue was 1) . I had spark thrift server setup, but it was running against hive instead of Spark SQL due a local change. After I fix this, beeline automatically caches rerun queries + accepts cache table. From: Yanbo Liang [mailto:yanboha...@gmail.com] Sent: Friday, November 21, 2014 12:42 AM To: Judy Nash Cc: u...@spark.incubator.apache.org Subject: Re: beeline via spark thrift doesn't retain cache 1) make sure your beeline client connected to Hiveserver2 of Spark SQL. You can found execution logs of Hiveserver2 in the environment of start-thriftserver.sh. 2) what about your scale of data. If cache with small data, it will take more time to schedule workload between different executors. Look the configuration of spark execution environment. Whether there are enough memory for RDD storage, if not, it will take some time to serialize/deserialize data between memory and disk. 2014-11-21 11:06 GMT+08:00 Judy Nash judyn...@exchange.microsoft.commailto:judyn...@exchange.microsoft.com: Hi friends, I have successfully setup thrift server and execute beeline on top. Beeline can handle select queries just fine, but it cannot seem to do any kind of caching/RDD operations. i.e. 1) Command “cache table” doesn’t work. See error: Error: Error while processing statement: FAILED: ParseException line 1:0 cannot recognize input near 'cache' 'table' 'hivesampletable' (state=42000,code=4) 2) Re-run SQL commands do not have any performance improvements. By comparison, Spark-SQL shell can execute “cache table” command and rerunning SQL command has a huge performance boost. Am I missing something or this is expected when execute through Spark thrift server? Thanks! Judy
RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava
Looks like a config issue. I ran the SparkPi job and it still fails with the same Guava error. Command run: .\bin\spark-class.cmd org.apache.spark.deploy.SparkSubmit --class org.apache.spark.examples.SparkPi --master spark://headnodehost:7077 --executor-memory 1G --num-executors 1 .\lib\spark-examples-1.2.1-SNAPSHOT-hadoop2.4.0.jar 100 I had used the same build steps on spark 1.1 and had no issue. From: Denny Lee [mailto:denny.g@gmail.com] Sent: Tuesday, November 25, 2014 5:47 PM To: Judy Nash; Cheng Lian; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava To determine whether this is a Windows issue or a configuration issue, can you try calling Spark-class.cmd SparkSubmit without actually referencing the Hadoop or Thrift server classes? On Tue Nov 25 2014 at 5:42:09 PM Judy Nash judyn...@exchange.microsoft.com wrote: I traced the code and used the following to call: Spark-class.cmd org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal --hiveconf hive.server2.thrift.port=1 The issue ended up being much more fundamental, however. Spark doesn't work at all in the configuration below. When I open spark-shell, it fails with the same ClassNotFound error. Now I wonder whether this is a Windows-only issue or a problem with the Hive/Hadoop configuration. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Tuesday, November 25, 2014 1:50 AM To: Judy Nash; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Oh, so you're using Windows. What command are you using to start the Thrift server then? On 11/25/14 4:25 PM, Judy Nash wrote: Made progress but still blocked. After recompiling the code on cmd instead of PowerShell, I can now see all 5 classes you mentioned. However, I am still seeing the same error as before. Anything else I can check for? From: Judy Nash [mailto:judyn...@exchange.microsoft.com] Sent: Monday, November 24, 2014 11:50 PM To: Cheng Lian; u...@spark.incubator.apache.org Subject: RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava This is what I got from jar tf: org/spark-project/guava/common/base/Preconditions.class org/spark-project/guava/common/math/MathPreconditions.class com/clearspring/analytics/util/Preconditions.class parquet/Preconditions.class I seem to have the class that was reported missing, but I am missing this file: com/google/inject/internal/util/$Preconditions.class Any suggestion on how to fix this? I very much appreciate the help, as I am very new to Spark and open source technologies. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Monday, November 24, 2014 8:24 PM To: Judy Nash; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Hm, I tried exactly the same commit and build command locally but couldn't reproduce this. This kind of error is usually caused by classpath misconfiguration. Could you please try this to ensure the corresponding Guava classes are included in the assembly jar you built?
jar tf assembly/target/scala-2.10/spark-assembly-1.2.1-SNAPSHOT-hadoop2.4.0.jar | grep Preconditions On my machine I got these lines (the first line is the one reported as missing in your case): org/spark-project/guava/common/base/Preconditions.class org/spark-project/guava/common/math/MathPreconditions.class com/clearspring/analytics/util/Preconditions.class parquet/Preconditions.class com/google/inject/internal/util/$Preconditions.class On 11/25/14 6:25 AM, Judy Nash wrote: Thank you Cheng for responding. Here is the commit SHA1 on the 1.2 branch where I saw this failure: commit 6f70e0295572e3037660004797040e026e440dbd Author: zsxwing zsxw...@gmail.com Date: Fri Nov 21 00:42:43 2014 -0800 [SPARK-4472][Shell] Print "Spark context available as sc." only when SparkContext is created successfully. It's weird that printing "Spark context available as sc" when creating SparkContext unsuccessfully. Let me know if you need anything else. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Friday, November 21, 2014 8:02 PM To: Judy Nash; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Hi Judy, could you please provide the commit SHA1 of the version you're using? Thanks! On 11/22/14 11:05 AM, Judy Nash wrote: Hi, the Thrift server is failing to start for me on the latest spark 1.2 branch. I get the error below when I start the Thrift server: Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/base/Preconditions
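Denny's isolation test amounts to something like the following (a sketch; --version exercises SparkSubmit without constructing a SparkContext, so no Hadoop or Thrift server classes should be touched):

.\bin\spark-class.cmd org.apache.spark.deploy.SparkSubmit --version

If even this fails with the Guava NoClassDefFoundError, the problem lies in how the assembly jar reaches the classpath rather than in the Hive/Thrift configuration.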
RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava
This is what I got from jar tf: org/spark-project/guava/common/base/Preconditions.class org/spark-project/guava/common/math/MathPreconditions.class com/clearspring/analytics/util/Preconditions.class parquet/Preconditions.class I seem to have the class that was reported missing, but I am missing this file: com/google/inject/internal/util/$Preconditions.class Any suggestion on how to fix this? I very much appreciate the help, as I am very new to Spark and open source technologies. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Monday, November 24, 2014 8:24 PM To: Judy Nash; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Hm, I tried exactly the same commit and build command locally but couldn't reproduce this. This kind of error is usually caused by classpath misconfiguration. Could you please try this to ensure the corresponding Guava classes are included in the assembly jar you built? jar tf assembly/target/scala-2.10/spark-assembly-1.2.1-SNAPSHOT-hadoop2.4.0.jar | grep Preconditions On my machine I got these lines (the first line is the one reported as missing in your case): org/spark-project/guava/common/base/Preconditions.class org/spark-project/guava/common/math/MathPreconditions.class com/clearspring/analytics/util/Preconditions.class parquet/Preconditions.class com/google/inject/internal/util/$Preconditions.class On 11/25/14 6:25 AM, Judy Nash wrote: Thank you Cheng for responding. Here is the commit SHA1 on the 1.2 branch where I saw this failure: commit 6f70e0295572e3037660004797040e026e440dbd Author: zsxwing zsxw...@gmail.com Date: Fri Nov 21 00:42:43 2014 -0800 [SPARK-4472][Shell] Print "Spark context available as sc." only when SparkContext is created successfully. It's weird that printing "Spark context available as sc" when creating SparkContext unsuccessfully. Let me know if you need anything else. From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: Friday, November 21, 2014 8:02 PM To: Judy Nash; u...@spark.incubator.apache.org Subject: Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava Hi Judy, could you please provide the commit SHA1 of the version you're using? Thanks! On 11/22/14 11:05 AM, Judy Nash wrote: Hi, the Thrift server is failing to start for me on the latest spark 1.2 branch. I get the error below when I start the Thrift server: Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/base/Preconditions at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:314)…. Here is my setup: 1) Latest spark 1.2 branch build 2) Build command used: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package 3) Added hive-site.xml to \conf 4) Versions on the box: Hive 0.13, Hadoop 2.4 Is this a real bug or am I doing something wrong?
--- Full Stacktrace:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/base/Preconditions
at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:314)
at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:327)
at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:409)
at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:82)
at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:42)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:202)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1784)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:180)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:292)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:230)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:38)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:56)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:353)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
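As an aside, the commit SHA1 Cheng asks for can be read directly from the checkout that produced the build (standard git commands, assuming the build ran from a git clone):

git rev-parse HEAD
git log -1 --oneline

The second command also shows the one-line commit summary quoted earlier in the thread.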
latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava
Hi, the Thrift server is failing to start for me on the latest spark 1.2 branch. I get the error below when I start the Thrift server: Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/base/Preconditions at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:314) Here is my setup: 1) Latest spark 1.2 branch build 2) Build command used: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package 3) Added hive-site.xml to \conf 4) Versions on the box: Hive 0.13, Hadoop 2.4 Is this a real bug or am I doing something wrong? --- Full Stacktrace:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/base/Preconditions
at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:314)
at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:327)
at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:409)
at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:82)
at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:42)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:202)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1784)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:180)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:292)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:230)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:38)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:56)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:353)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.google.common.base.Preconditions
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
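One more way to see where Guava enters the build is Maven's dependency report (a sketch using the same profiles as the build command above; -Dincludes filters by groupId:artifactId):

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver dependency:tree -Dincludes=com.google.guava:guava

This lists which modules pull Guava in, which helps distinguish a genuinely missing dependency from a shading or classpath problem.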
beeline via spark thrift doesn't retain cache
Hi friends, I have successfully set up the Thrift server and run beeline on top of it. Beeline handles select queries just fine, but it cannot seem to do any kind of caching/RDD operations, i.e.: 1) The "cache table" command doesn't work. See error: Error: Error while processing statement: FAILED: ParseException line 1:0 cannot recognize input near 'cache' 'table' 'hivesampletable' (state=42000,code=4) 2) Re-running SQL commands shows no performance improvement. By comparison, the Spark-SQL shell can execute the "cache table" command, and rerunning a SQL command gives a huge performance boost. Am I missing something, or is this expected when executing through the Spark Thrift server? Thanks! Judy
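For comparison, the working path through the Spark SQL shell that this message describes would look roughly like this (a sketch assuming the same hivesampletable):

./bin/spark-sql
spark-sql> CACHE TABLE hivesampletable;
spark-sql> SELECT count(*) FROM hivesampletable;

The same two statements failing through beeline is what points at the Thrift server endpoint rather than at Spark SQL's caching itself.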