Re: Can't access remote Hive table from spark
Hi Zhan,

Yes, I found there is an hdfs account, which is created by Ambari, but what's the password for this account, and how can I log in under it? Can I just change the password for the hdfs account?

Regards,

-- Original --
From: Zhan Zhang <zzh...@hortonworks.com>
Send time: Thursday, Feb 12, 2015 2:00 AM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org; Cheng Lian <lian.cs@gmail.com>
Subject: Re: Can't access remote Hive table from spark

You need the right hdfs account, e.g., hdfs, to create the directory and assign permissions.

Thanks.
Zhan Zhang

On Feb 11, 2015, at 4:34 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

Hi Zhan,

My single-node Hadoop cluster is installed by Ambari 1.7.0. I tried to create the /user/xiaobogu directory in HDFS, but failed with both user xiaobogu and root:

[xiaobogu@lix1 current]$ hadoop dfs -mkdir /user/xiaobogu
DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.
mkdir: Permission denied: user=xiaobogu, access=WRITE, inode=/user:hdfs:hdfs:drwxr-xr-x

[root@lix1 bin]# hadoop dfs -mkdir /user/xiaobogu
DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.
mkdir: Permission denied: user=root, access=WRITE, inode=/user:hdfs:hdfs:drwxr-xr-x

I notice there is an hdfs account created by Ambari, but what's the password for it? Should I use the hdfs account to create the directory?

-- Original --
From: Zhan Zhang <zzh...@hortonworks.com>
Send time: Sunday, Feb 8, 2015 4:11 AM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org; Cheng Lian <lian.cs@gmail.com>
Subject: Re: Can't access remote Hive table from spark

Yes. You need to create xiaobogu under /user and give the right permission to xiaobogu.

Thanks.
Zhan Zhang

On Feb 7, 2015, at 8:15 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

Hi Zhan Zhang,

With the pre-built version 1.2.0 of Spark against the YARN cluster installed by Ambari 1.7.0, I get the following errors:

[xiaobogu@lix1 spark]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/02/08 00:11:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/08 00:11:54 INFO client.RMProxy: Connecting to ResourceManager at lix1.bh.com/192.168.100.3:8050
15/02/08 00:11:56 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
15/02/08 00:11:57 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
15/02/08 00:11:57 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/02/08 00:11:57 INFO yarn.Client: Setting up container launch context for our AM
15/02/08 00:11:57 INFO yarn.Client: Preparing resources for our AM container
15/02/08 00:11:58 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=xiaobogu, access=WRITE, inode=/user:hdfs:hdfs:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6515)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6497)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6449)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4251)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2
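A minimal sketch of the usual way around this, assuming a stock Ambari layout: the hdfs service account normally has no login password at all, so instead of logging in as hdfs you run the filesystem commands as hdfs through sudo from a root shell, then hand the new directory over to xiaobogu:

    sudo -u hdfs hdfs dfs -mkdir /user/xiaobogu
    sudo -u hdfs hdfs dfs -chown xiaobogu:xiaobogu /user/xiaobogu

Changing or setting a password for the hdfs account is not required for this.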
Re: Can't access remote Hive table from spark
Hi Lian,

Will the latest 0.14.0 version of Hive, which is installed by Ambari 1.7.0 by default, be supported by the next release of Spark?

Regards,

-- Original --
From: Cheng Lian <lian.cs@gmail.com>
Send time: Friday, Feb 6, 2015 9:02 AM
To: guxiaobo1...@qq.com; user@spark.apache.org
Subject: Re: Can't access remote Hive table from spark

Please note that Spark 1.2.0 only supports Hive 0.13.1 or 0.12.0; no other versions are supported.

Best,
Cheng

On 1/25/15 12:18 AM, guxiaobo1982 wrote:
[quoted post snipped; see the full message under "Can't access remote Hive table from spark" below]
Re: Can't access remote Hive table from spark
From: Zhan Zhang
Send time: Friday, Feb 6, 2015 2:55 PM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org; Cheng Lian <lian.cs@gmail.com>
Subject: Re: Can't access remote Hive table from spark

I'm not sure about Spark standalone mode, but on Spark-on-YARN it should work. You can check the following link:

http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/

Thanks.
Zhan Zhang

On Feb 5, 2015, at 5:02 PM, Cheng Lian <lian.cs@gmail.com> wrote:

Please note that Spark 1.2.0 only supports Hive 0.13.1 or 0.12.0; no other versions are supported.

Best,
Cheng

On 1/25/15 12:18 AM, guxiaobo1982 wrote:
[quoted post snipped; see the full message under "Can't access remote Hive table from spark" below]
Can we execute create table and load data commands against Hive inside HiveContext?
Hi,

I am playing with the following example code:

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class SparkTest {
    public static void main(String[] args) {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";
        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);
        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");
        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        //List<Row> rows = sqlCtx.sql("show tables").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");
        sc.close();
    }
}

With the create table and load data commands commented out, the query command executes successfully, but I get ClassNotFoundExceptions (with different error messages) if either of these two commands is executed through the HiveContext.

The create table command causes the following:

Exception in thread "main" org.apache.spark.sql.execution.QueryExecutionException: FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
    at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
    at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.api.java.JavaSchemaRDD.<init>(JavaSchemaRDD.scala:42)
    at org.apache.spark.sql.hive.api.java.JavaHiveContext.sql(JavaHiveContext.scala:37)
    at com.blackhorse.SparkTest.main(SparkTest.java:24)
[delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
[delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - Shutdown hook called

The load data command causes the following:

Exception in thread "main" org.apache.spark.sql.execution.QueryExecutionException: FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
    at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
    at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.api.java.JavaSchemaRDD.<init>(JavaSchemaRDD.scala:42)
    at org.apache.spark.sql.hive.api.java.JavaHiveContext.sql(JavaHiveContext.scala:37)
    at com.blackhorse.SparkTest.main(SparkTest.java:25)
[delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - Shutdown hook called
[delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
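Both missing classes (ATSHook and SQLStdConfOnlyAuthorizerFactory) first appeared in Hive 0.14, while Spark 1.2.0 bundles Hive 0.13.1 client jars, so Hive 0.14-only settings copied from the Ambari cluster into the hive-site.xml that Spark reads cannot be loaded. A possible workaround, offered here as an untested sketch that assumes Ambari's Hive 0.14 defaults are the source of these settings, is to neutralize them in the copy of hive-site.xml given to Spark:

<!-- Assumed workaround: clear Hive 0.14-only hooks in Spark's copy of hive-site.xml -->
<property>
  <name>hive.exec.pre.hooks</name>
  <value></value>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value></value>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value></value>
</property>
<!-- Fall back to the Hive 0.13-era authorization provider -->
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
</property>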
how to specify hive connection options for HiveContext
Hi,

I know two options, one for spark-submit and one for spark-shell, but how do I set the Hive connection options for programs running inside Eclipse?

Regards,
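A minimal sketch of one route from an IDE, with two assumptions that are mine rather than from this thread: that adding the directory containing hive-site.xml (e.g. /etc/hive/conf) to the Eclipse project's build path lets the HiveContext load it as a classpath resource, and that issuing the options as SET statements before the first table access may work because the metastore client is created lazily. The metastore URI below is hypothetical; copy the real value from the cluster's hive-site.xml.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class EclipseHiveTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("Eclipse Hive test")
                .setMaster("spark://lix1.bh.com:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaHiveContext sqlCtx = new JavaHiveContext(sc);
        // Hypothetical URI: take the real value from /etc/hive/conf/hive-site.xml.
        // This must run before the first query touches a table.
        sqlCtx.sql("SET hive.metastore.uris=thrift://lix1.bh.com:9083");
        System.out.println(sqlCtx.sql("show tables").collect());
        sc.close();
    }
}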
Re: Can't access remote Hive table from spark
One friend told me that I should add the hive-site.xml file to the --files option of the spark-submit command, but how can I run and debug my program inside Eclipse?

-- Original --
From: guxiaobo1982 <guxiaobo1...@qq.com>
Send time: Sunday, Feb 1, 2015 4:18 PM
To: Jörn Franke <jornfra...@gmail.com>
Subject: Re: Can't access remote Hive table from spark

I am sorry, I forgot to say that I have created the table manually.

On Feb 1, 2015, at 4:14 PM, Jörn Franke <jornfra...@gmail.com> wrote:

You commented out the line which is supposed to create the table.

On Jan 25, 2015 at 09:20, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
[quoted post snipped; see the full message under "Can't access remote Hive table from spark" below]
spark-shell can't import the default hive-site.xml options properly
Hi,

In order to let a local spark-shell connect to a remote Spark standalone cluster and access the Hive tables there, I must put the hive-site.xml file into the local Spark installation's conf path, but spark-shell can't even import the default settings there. I found two errors:

<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>5s</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1800s</value>
</property>

spark-shell tries to read "5s" and "1800s" as integers; they must be changed to 5 and 1800 for spark-shell to work. It's suggested this be fixed in future versions.
Re: RE: Can't access remote Hive table from spark
Hi Skanda,

How do you set up your SPARK_CLASSPATH? I added the following line to my SPARK_HOME/conf/spark-env.sh, and still got the same error:

export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/etc/hive/conf

-- Original --
From: Skanda Prasad <skanda.ganapa...@gmail.com>
Send time: Monday, Jan 26, 2015 7:41 AM
To: guxiaobo1...@qq.com; user@spark.apache.org
Subject: RE: Can't access remote Hive table from spark

This happened to me as well; putting hive-site.xml inside conf doesn't seem to work. Instead I added /etc/hive/conf to SPARK_CLASSPATH and it worked. You can try this approach.

-Skanda

From: guxiaobo1982
Sent: 25-01-2015 13:50
To: user@spark.apache.org
Subject: Can't access remote Hive table from spark
[quoted post snipped; see the full message under "Can't access remote Hive table from spark" below]
Re: RE: Can't access remote Hive table from spark
The following line does not work either:

export SPARK_CLASSPATH=/etc/hive/conf

-- Original --
From: guxiaobo1982 <guxiaobo1...@qq.com>
Send time: Sunday, Feb 1, 2015 2:15 PM
To: Skanda Prasad <skanda.ganapa...@gmail.com>; user@spark.apache.org
Cc: 徐涛 <77044...@qq.com>
Subject: Re: RE: Can't access remote Hive table from spark
[quoted exchange snipped; it repeats the previous message and the original post, both of which appear in full elsewhere in this digest]
Can't access remote Hive table from spark
Hi,

I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.7.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class SparkTest {
    public static void main(String[] args) {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";
        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);
        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");
        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");
        sc.close();
    }
}

Exception in thread "main" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)
    at org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:253)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)
    at org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:253)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:143)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:138)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
    at
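A small diagnostic sketch, assuming the usual cause of this symptom (without hive-site.xml on its classpath, HiveContext silently starts a fresh, empty local metastore instead of connecting to the cluster's one): list what the context can actually see. An empty listing would confirm the wrong metastore is in use.

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class MetastoreCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MetastoreCheck")
                .setMaster("spark://lix1.bh.com:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaHiveContext sqlCtx = new JavaHiveContext(sc);
        // If this prints zero tables, the context is using its own blank local
        // metastore rather than the cluster metastore that holds src.
        List<Row> tables = sqlCtx.sql("show tables").collect();
        System.out.println("I can see " + tables.size() + " tables");
        sc.close();
    }
}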
How to create distributed matrices from hive tables.
Hi,

We have large datasets already in a format suitable for a Spark MLlib matrix, but they are pre-computed by Hive and stored inside Hive. My question is: can we create a distributed matrix such as IndexedRowMatrix directly from Hive tables, avoiding reading the data out of Hive and feeding it into an empty matrix?

Regards
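A minimal sketch of the direct route, assuming Spark 1.2-era APIs and an illustrative table named matrix_rows whose layout (a BIGINT row index followed by DOUBLE value columns) is an assumption for this example, not something from the thread. The Hive query result is mapped straight into IndexedRow records, so the matrix stays distributed and no local copy is ever materialized:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.linalg.distributed.IndexedRow;
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class HiveToMatrix {
    public static IndexedRowMatrix fromHive(JavaSparkContext sc) {
        JavaHiveContext sqlCtx = new JavaHiveContext(sc);
        // First column is the row index; the rest are that row's values.
        JavaSchemaRDD rows = sqlCtx.sql("SELECT id, v0, v1, v2 FROM matrix_rows");
        JavaRDD<IndexedRow> indexedRows = rows.map(new Function<Row, IndexedRow>() {
            public IndexedRow call(Row r) {
                double[] values = new double[r.length() - 1];
                for (int i = 0; i < values.length; i++) {
                    values[i] = r.getDouble(i + 1);
                }
                return new IndexedRow(r.getLong(0), Vectors.dense(values));
            }
        });
        // Wrap the distributed RDD directly; the rows never leave the cluster.
        return new IndexedRowMatrix(indexedRows.rdd());
    }
}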
How to get the master URL at runtime inside driver program?
Hi,

Driver programs submitted by the spark-submit script are given the Spark master URL at runtime, but how can the main method get that URL when creating the SparkConf object?

Regards,
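A small sketch of the usual pattern, assuming the program is launched through spark-submit: the script passes the master URL in as the spark.master property, so the code simply omits setMaster() and can read the effective value back from the configuration afterwards:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ShowMaster {
    public static void main(String[] args) {
        // No setMaster() here: spark-submit supplies spark.master at launch time.
        SparkConf conf = new SparkConf().setAppName("ShowMaster");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Read the effective master URL back at runtime.
        String master = sc.getConf().get("spark.master");
        System.out.println("Running against master: " + master);
        sc.close();
    }
}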
Is cluster mode supported by the submit command for standalone clusters?
Hi,

The submitting applications guide at http://spark.apache.org/docs/latest/submitting-applications.html says:

"Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications."

But it is followed by this example; is this an error, and is cluster mode supported for standalone clusters?

# Run on a Spark Standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
Re: Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue lib/spark-examples-1.2.0-hadoop2.6.0.jar 10

I got the same error with the above command; I think I am missing the jar containing the Jackson classes.

-- Original --
From: Sean Owen <so...@cloudera.com>
Send time: Sunday, Dec 28, 2014 3:08 AM
To: guxiaobo1...@qq.com
Cc: user <user@spark.apache.org>
Subject: Re: Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0

The problem is a conflict in the version of Jackson used in your cluster versus what you run. I would start by taking off things like the assembly jar from your classpath. Try the userClassPathFirst option as well to avoid using the Jackson in your Hadoop distribution.

[quoted post snipped; see the full message under "Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0" below]
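A hedged sketch of one concrete way to act on that, under two assumptions of mine: that the failure happens in the local spark-submit JVM (the trace originates in YarnClientImpl during submission), and that the Hadoop installation ships the codehaus Jackson 1.9 jars. The jar paths below are guesses at an Ambari/HDP layout; adjust them to wherever your Hadoop lib directory actually is.

# Put Hadoop's own Jackson jars on the spark-submit classpath (paths assumed).
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/usr/hdp/current/hadoop-client/lib/jackson-core-asl-1.9.13.jar:/usr/hdp/current/hadoop-client/lib/jackson-mapper-asl-1.9.13.jar
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster \
  --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 \
  --queue thequeue lib/spark-examples-1.2.0-hadoop2.6.0.jar 10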
Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0
Hi,

I built the 1.2.0 version of Spark against single-node Hadoop 2.6.0 installed by ambari 1.7.0. The ./bin/run-example SparkPi 10 command executes on my local Mac 10.9.5 and on the CentOS virtual machine that hosts Hadoop, but I can't run the SparkPi example inside YARN; it seems there's something wrong with the classpaths:

export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue --jars spark-assembly-1.2.0-hadoop2.6.0.jar,spark-1.2.0-yarn-shuffle.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar,datanucleus-api-jdo-3.2.6.jar lib/spark-examples-1.2.0-hadoop2.6.0.jar 10
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/12/10 15:38:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/10 15:39:00 INFO impl.TimelineClientImpl: Timeline service address: http://lix1.bh.com:8188/ws/v1/timeline/
Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:57)
    at org.apache.hadoop.yarn.util.timeline.TimelineUtils.<clinit>(TimelineUtils.java:47)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:166)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:65)
    at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:501)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.deser.std.StdDeserializer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 28 more

[xiaobogu@lix1 spark-1.2.0-bin-2.6.0]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue lib/spark-examples-1.2.0-hadoop2.6.0.jar 10
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/12/10 15:39:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/10 15:39:51 INFO impl.TimelineClientImpl: Timeline service address: http://lix1.bh.com:8188/ws/v1/timeline/
Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at
Re: How to build Spark against the latest
The following command works:

./make-distribution.sh --tgz -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests

-- Original --
From: guxiaobo1982 <guxiaobo1...@qq.com>
Send time: Thursday, Dec 25, 2014 3:58 PM
To: guxiaobo1...@qq.com; Ted Yu <yuzhih...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: How to build Spark against the latest
[quoted thread snipped; the earlier messages of this thread appear in full below]
Re: How to build Spark against the latest
Hi Ted,

The referenced command works, but where can I get the deployable binaries?

Xiaobo Gu

-- Original --
From: Ted Yu <yuzhih...@gmail.com>
Send time: Wednesday, Dec 24, 2014 12:09 PM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org
Subject: Re: How to build Spark against the latest

See http://search-hadoop.com/m/JW1q5Cew0j

On Tue, Dec 23, 2014 at 8:00 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
[quoted question snipped; see the full message under "How to build Spark against the latest" below]
Re: How to build Spark against the latest
What options should I use when running the make-distribution.sh script? I tried

./make-distribution.sh --hadoop.version 2.6.0 --with-yarn -with-hive --with-tachyon --tgz

but nothing came out.

Regards

-- Original --
From: guxiaobo1982 <guxiaobo1...@qq.com>
Send time: Wednesday, Dec 24, 2014 6:52 PM
To: Ted Yu <yuzhih...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: How to build Spark against the latest
[quoted thread snipped; the earlier messages of this thread appear in full elsewhere in this digest]
How to build Spark against the latest
Hi,

The official pom.xml file only has a profile for Hadoop version 2.4 as the latest version, but I installed Hadoop version 2.6.0 with Ambari. How can I build Spark against it: just using mvn -Dhadoop.version=2.6.0, or by making a corresponding profile for it?

Regards,
Xiaobo
Re: What about implementing various hypothesis tests for LogisticRegression in MLlib
Hi Xiangrui,

You can refer to "An Introduction to Statistical Learning with Applications in R"; there are many standard hypothesis tests to run for linear regression and logistic regression, and they should be implemented first. Then we will list some other tests, which are also important when using logistic regression to build scorecards.

Xiaobo Gu

-- Original --
From: Xiangrui Meng <men...@gmail.com>
Send time: Wednesday, Aug 20, 2014 2:18 PM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org
Subject: Re: What about implementing various hypothesis tests for LogisticRegression in MLlib

We implemented chi-squared tests in v1.1: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166 and we will add more after v1.1. Feedback on which tests should come first would be greatly appreciated.

-Xiangrui

On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

Hi,

From the documentation I think only the model fitting part is implemented; what about the various hypothesis tests and performance indexes used to evaluate the model fit?

Regards,
Xiaobo Gu
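For reference, the chi-squared tests Xiangrui mentions can be called as below; a minimal sketch against the Spark 1.1+ MLlib API, where the observed and expected counts are made-up numbers purely for illustration:

import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.stat.Statistics;
import org.apache.spark.mllib.stat.test.ChiSqTestResult;

public class ChiSqExample {
    public static void main(String[] args) {
        // Pearson goodness-of-fit test: observed counts against expected counts.
        ChiSqTestResult result = Statistics.chiSqTest(
                Vectors.dense(43.0, 31.0, 26.0),   // observed (made-up)
                Vectors.dense(40.0, 33.0, 27.0));  // expected (made-up)
        // The result carries the statistic, degrees of freedom, and p-value.
        System.out.println(result);
    }
}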
What about implementing various hypothesis tests for Logistic Regression in MLlib
Hi,

From the documentation I think only the model fitting part is implemented; what about the various hypothesis tests and performance indexes used to evaluate the model fit?

Regards,
Xiaobo Gu
What's the best practice to deploy spark on Big SMP servers?
Hi,

We have a big SMP server (with 128 GB RAM and 32 CPU cores) to run small-scale analytical workloads. What's the best practice for deploying a standalone Spark on this server to achieve good performance? How many worker instances should be configured, and how much RAM and how many CPU cores should be allocated to each instance?

Regards,
Xiaobo Gu
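As a starting point only (the sizing below is an assumption about a reasonable split for a 128 GB / 32-core box, not advice from this thread), one common pattern is to run several moderately sized workers instead of one huge JVM, to keep garbage-collection pauses manageable. In conf/spark-env.sh that would look like:

# Hypothetical sizing for a single 128 GB, 32-core SMP server; tune to the workload.
SPARK_WORKER_INSTANCES=4   # four worker JVMs on the one machine
SPARK_WORKER_CORES=8       # 4 workers x 8 cores = all 32 cores
SPARK_WORKER_MEMORY=28g    # 4 x 28 GB = 112 GB, leaving ~16 GB for the OS and daemons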
Where Can I find the full documentation for Spark SQL?
Hi,

I want to know the full list of functions, syntax, and features that Spark SQL supports; is there any documentation?

Regards,
Xiaobo Gu
Re: Where Can I find the full documentation for Spark SQL?
The API only says this:

public JavaSchemaRDD sql(String sqlQuery)
Executes a query expressed in SQL, returning the result as a JavaSchemaRDD.

But what kind of sqlQuery can we execute? Is there any more documentation?

Xiaobo Gu

-- Original --
From: Gianluca Privitera <gianluca.privite...@studio.unibo.it>
Date: Jun 26, 2014
To: user@spark.apache.org
Subject: Re: Where Can I find the full documentation for Spark SQL?

You can find something in the API, nothing more than that I think for now.

Gianluca

On 25 Jun 2014, at 23:36, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

Hi,

I want to know the full list of functions, syntax, and features that Spark SQL supports; is there any documentation?

Regards,
Xiaobo Gu