Re: Can't access remote Hive table from spark

2015-02-11 Thread guxiaobo1982
Hi Zhan,


Yes, I found there is an hdfs account, which was created by Ambari, but what is the password for this account, and how can I log in under it?
Can I just change the password for the hdfs account?


Regards,






-- Original --
From: Zhan Zhang <zzh...@hortonworks.com>
Send time: Thursday, Feb 12, 2015 2:00 AM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org <user@spark.apache.org>; Cheng Lian <lian.cs@gmail.com>
Subject: Re: Can't access remote Hive table from spark



You need to use the right HDFS account, e.g., hdfs, to create the directory and assign permissions.
 
 Thanks.
 
 
 Zhan Zhang
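
A note on the password question: on an Ambari-managed cluster the hdfs user is normally a passwordless service account, so rather than logging in as hdfs you can run the commands through sudo. A minimal sketch, assuming root (or sudo) access on a cluster node:

# Create the home directory as the HDFS superuser, then hand it over to xiaobogu.
sudo -u hdfs hdfs dfs -mkdir -p /user/xiaobogu
sudo -u hdfs hdfs dfs -chown xiaobogu /user/xiaobogu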
On Feb 11, 2015, at 4:34 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
 
Hi Zhan,
My single-node Hadoop cluster was installed by Ambari 1.7.0. I tried to create the /user/xiaobogu directory in HDFS, but it failed with both user xiaobogu and root:
 
 
[xiaobogu@lix1 current]$ hadoop dfs -mkdir /user/xiaobogu
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

mkdir: Permission denied: user=xiaobogu, access=WRITE, inode=/user:hdfs:hdfs:drwxr-xr-x

[root@lix1 bin]# hadoop dfs -mkdir /user/xiaobogu
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

mkdir: Permission denied: user=root, access=WRITE, inode=/user:hdfs:hdfs:drwxr-xr-x
  
 
I notice there is an hdfs account created by Ambari, but what is the password for it? Should I use the hdfs account to create the directory?
  
 
 
  
 
 
 
 -- Original --
From: Zhan Zhang <zzh...@hortonworks.com>
Send time: Sunday, Feb 8, 2015 4:11 AM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org <user@spark.apache.org>; Cheng Lian <lian.cs@gmail.com>
Subject: Re: Can't access remote Hive table from spark
 
 
 
Yes. You need to create xiaobogu under /user and grant the right permissions to xiaobogu.
 
 Thanks.
 
 
 Zhan Zhang
 
On Feb 7, 2015, at 8:15 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
 
  Hi Zhan Zhang,
 
 
With the pre-built version 1.2.0 of Spark running against the YARN cluster installed by Ambari 1.7.0, I get the following errors:
  
[xiaobogu@lix1 spark]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster --num-executors 3 --driver-memory 512m \
  --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
 

 
 
Spark assembly has been built with Hive, including Datanucleus jars on classpath
 
15/02/08 00:11:53 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
 
15/02/08 00:11:54 INFO client.RMProxy: Connecting to ResourceManager at 
lix1.bh.com/192.168.100.3:8050
 
15/02/08 00:11:56 INFO yarn.Client: Requesting a new application from cluster 
with 1 NodeManagers
 
15/02/08 00:11:57 INFO yarn.Client: Verifying our application has not requested 
more than the maximum memory capability of the cluster (4096 MB per container)
 
15/02/08 00:11:57 INFO yarn.Client: Will allocate AM container, with 896 MB 
memory including 384 MB overhead
 
15/02/08 00:11:57 INFO yarn.Client: Setting up container launch context for our 
AM
 
15/02/08 00:11:57 INFO yarn.Client: Preparing resources for our AM container
 
15/02/08 00:11:58 WARN hdfs.BlockReaderLocal: The short-circuit local reads 
feature cannot be used because libhadoop cannot be loaded.
 
Exception in thread main org.apache.hadoop.security.AccessControlException: 
Permission denied: user=xiaobogu, access=WRITE, 
inode=/user:hdfs:hdfs:drwxr-xr-x
 
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6515)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6497)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6449)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4251)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
 
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
 
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
 
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2

Re: Can't access remote Hive table from spark

2015-02-08 Thread guxiaobo1982
Hi Lian,
Will the latest 0.14.0 version of Hive, which is installed by Ambari 1.7.0 by default, be supported by the next release of Spark?


Regards,




-- Original --
From: Cheng Lian <lian.cs@gmail.com>
Send time: Friday, Feb 6, 2015 9:02 AM
To: guxiaobo1...@qq.com; user@spark.apache.org <user@spark.apache.org>

Subject: Re: Can't access remote Hive table from spark



  
Please note that Spark 1.2.0 only supports Hive 0.13.1 or 0.12.0; no other versions are supported.
   
Best,
 Cheng
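
For reference, Hive support in Spark 1.2 is a build-time choice: the -Phive profile builds against Hive 0.13.1, and the 1.2 build documentation also mentions a -Phive-0.12.0 profile for 0.12.0 bindings. A sketch of a distribution build along those lines, reusing the same Hadoop flags that appear in the make-distribution.sh command quoted elsewhere in these threads:

# Hive 0.13.1 bindings (the default for -Phive in Spark 1.2):
./make-distribution.sh --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests
# For Hive 0.12.0 bindings, the 1.2 docs mention adding -Phive-0.12.0 as well.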
   
On 1/25/15 12:18 AM, guxiaobo1982 wrote:
   



Hi,
I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.17.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:
   
 

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class SparkTest {

    public static void main(String[] args)
    {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);

        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}
 

 
 
Exception in thread main 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src
 
 at   
org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
 
 at   
org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
 
 at   
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)
 
 at 
org.apache.spark.sql.hive.HiveContext$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$super$lookupRelation(HiveContext.scala:253)
 
 at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
 
 at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
 
 at   scala.Option.getOrElse(Option.scala:120)
 
 at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)
 
 at   
org.apache.spark.sql.hive.HiveContext$anon$2.lookupRelation(HiveContext.scala:253)
 
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:143)
 
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:138)
 
 at   
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
 
 at   
org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$4.apply(TreeNode.scala:162)
 
 at   scala.collection.Iterator$anon$11.next(Iterator.scala:328)
 
 at   scala.collection.Iterator$class.foreach(Iterator.scala:727)
 
 at   scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 
 at   
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
 
 at   
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
 
 at   
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
 
 at   
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
 
 at   scala.collection.AbstractIterator.to(Iterator.scala:1157)
 
 at   
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
 
 at   
scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157

Re: Can't access remote Hive table from spark

2015-02-07 Thread guxiaobo1982
-- Original --
From: Zhan Zhang <zzh...@hortonworks.com>
Send time: Friday, Feb 6, 2015 2:55 PM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org <user@spark.apache.org>; Cheng Lian <lian.cs@gmail.com>
Subject: Re: Can't access remote Hive table from spark



Not sure about Spark standalone mode, but with Spark on YARN it should work. You can check the following link:
 
  http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/
 
 
 Thanks.
 
 
 Zhan Zhang
 
On Feb 5, 2015, at 5:02 PM, Cheng Lian <lian.cs@gmail.com> wrote:
 

Please note that Spark 1.2.0 only supports Hive 0.13.1 or 0.12.0; no other versions are supported.
 
Best,
 Cheng
 
On 1/25/15 12:18 AM, guxiaobo1982 wrote:
 
 
  
 
Hi,
I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.17.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:
 
 
  
public class SparkTest {

    public static void main(String[] args)
    {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);

        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}
 

 
 
Exception in thread main 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src
 
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
 
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
 
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)
 
at 
org.apache.spark.sql.hive.HiveContext$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$super$lookupRelation(HiveContext.scala:253)
 
at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
 
at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
 
at scala.Option.getOrElse(Option.scala:120)
 
at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)
 
at 
org.apache.spark.sql.hive.HiveContext$anon$2.lookupRelation(HiveContext.scala:253)
 
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:143)
 
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:138)
 
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
 
at 
org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$4.apply(TreeNode.scala:162)
 
at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
 
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
 
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
 
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
 
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
 
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
 
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
 
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
 
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
 
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
 
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)
 
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)
 
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
 
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)
 
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137)
 
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:61)
 
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:59)
 
at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
 
at scala.collection.immutable.List.foldLeft(List.scala:84)
 
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:59)
 
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply

Can we execute create table and load data commands against Hive inside HiveContext?

2015-02-05 Thread guxiaobo1982
Hi,


I am playing with the following example code:


 
public class SparkTest {

    public static void main(String[] args) {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);

        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        //List<Row> rows = sqlCtx.sql("show tables").collect();

        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}

With the create table and load data commands commented out, the query command executes successfully, but I get ClassNotFoundExceptions (with different error messages) if these two commands are executed inside the HiveContext.

The create table command causes the following:




  

Exception in thread main 
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Hive Internal 
Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)

at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)

at 
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)

at 
org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)

at 
org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)

at 
org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)

at 
org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)

at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)

at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)

at 
org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)

at 
org.apache.spark.sql.api.java.JavaSchemaRDD.init(JavaSchemaRDD.scala:42)

at 
org.apache.spark.sql.hive.api.java.JavaHiveContext.sql(JavaHiveContext.scala:37)

at com.blackhorse.SparkTest.main(SparkTest.java:24)

[delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook 
called

 

[delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - 
Shutdown hook called

The load data command causes the following:







Exception in thread main 
org.apache.spark.sql.execution.QueryExecutionException: FAILED: 
RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassNotFoundException: 
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory

at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)

at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)

at 
org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)

at 
org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)

at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)

at 
org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)

at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)

at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)

at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)

at org.apache.spark.sql.api.java.JavaSchemaRDD.init(JavaSchemaRDD.scala:42)

at 
org.apache.spark.sql.hive.api.java.JavaHiveContext.sql(JavaHiveContext.scala:37)

at com.blackhorse.SparkTest.main(SparkTest.java:25)

[delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - 
Shutdown hook called

 

[delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook 
called
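
Both missing classes appear to come from the Hive 0.14 configuration rather than from anything Spark ships: ATSHook is the YARN Application Timeline Server execution hook and SQLStdConfOnlyAuthorizerFactory is the Hive 0.14 SQL-standard authorization factory, and neither is on the classpath of Spark 1.2's bundled Hive 0.13.1 bindings. A hedged sketch of the kind of entries to neutralize in the copy of hive-site.xml that Spark reads; the property names are assumptions based on a typical Ambari-generated configuration, so verify them against your own file:

<!-- Disable the execution hook that needs the ATSHook class (an assumed Ambari setting). -->
<property>
  <name>hive.exec.pre.hooks</name>
  <value></value>
</property>
<!-- Fall back to the classic authorization provider that does exist in Hive 0.13. -->
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
</property>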

how to specify hive connection options for HiveContext

2015-02-02 Thread guxiaobo1982
Hi,


I know two options, one for spark-submit and the other for spark-shell, but how do I set them for programs running inside Eclipse?


Regards,
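
One way to do this from an IDE, sketched here as an assumption rather than a tested recipe: either add the directory containing hive-site.xml to the Eclipse run configuration's classpath, or set the metastore location on the context in code before issuing queries. The class name, host, and port below are placeholders (9083 is only the common Thrift metastore default):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveFromEclipse {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("HiveFromEclipse")
                .setMaster("spark://lix1.bh.com:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Point Spark SQL at the remote metastore instead of relying on a local hive-site.xml.
        // The URI is an assumption; adjust it to wherever the Hive metastore Thrift service runs.
        HiveContext hiveCtx = new HiveContext(sc.sc());
        hiveCtx.setConf("hive.metastore.uris", "thrift://lix1.bh.com:9083");

        // Simple smoke test: count the tables the metastore knows about.
        System.out.println(hiveCtx.sql("show tables").count());
        sc.stop();
    }
}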

Re: Can't access remote Hive table from spark

2015-02-01 Thread guxiaobo1982
A friend told me that I should add the hive-site.xml file to the --files option of the spark-submit command, but how can I run and debug my program inside Eclipse?
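
For the spark-submit route mentioned above, a hedged sketch; the hive-site.xml path assumes the usual Ambari layout under /etc/hive/conf, and the application jar path is a placeholder:

./bin/spark-submit --class com.blackhorse.SparkTest \
  --master spark://lix1.bh.com:7077 \
  --files /etc/hive/conf/hive-site.xml \
  /path/to/your-app.jar

The --files option only applies to spark-submit, so for Eclipse the usual workaround is adding the directory that contains hive-site.xml to the run configuration's classpath, the same idea as the /etc/hive/conf SPARK_CLASSPATH suggestion further down in this thread.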






-- Original --
From: guxiaobo1982 <guxiaobo1...@qq.com>
Send time: Sunday, Feb 1, 2015 4:18 PM
To: Jörn Franke <jornfra...@gmail.com>

Subject: Re: Can't access remote Hive table from spark



I am sorry, I forgot to say that I have created the table manually.


On Feb 1, 2015, at 4:14 PM, Jörn Franke <jornfra...@gmail.com> wrote:



You commented out the line which is supposed to create the table.
On 25 Jan 2015 at 09:20, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
Hi,
I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.17.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:


 
public class SparkTest {

    public static void main(String[] args)
    {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);

        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}




Exception in thread main 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src

at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)

at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)

at 
org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:253)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)

at scala.Option.getOrElse(Option.scala:120)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)

at 
org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:253)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:143)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:138)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162)

at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

at scala.collection.AbstractIterator.to(Iterator.scala:1157)

at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)

at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61

spark-shell can't import the default hive-site.xml options probably.

2015-02-01 Thread guxiaobo1982
Hi,


In order to let a local spark-shell connect to a remote Spark standalone cluster and access Hive tables there, I must put the hive-site.xml file into the local Spark installation's conf path, but spark-shell can't even import the default settings there. I found two errors:
 
<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>5s</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1800s</value>
</property>

spark-shell tries to read 5s and 1800s as integers; they must be changed to 5 and 1800 to make spark-shell work. I suggest this be fixed in future versions.

Re: RE: Can't access remote Hive table from spark

2015-01-31 Thread guxiaobo1982
Hi Skanda,


How did you set up your SPARK_CLASSPATH?


I added the following line to my SPARK_HOME/conf/spark-env.sh, and still got the same error.
 
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/etc/hive/conf





-- Original --
From: Skanda Prasad <skanda.ganapa...@gmail.com>
Send time: Monday, Jan 26, 2015 7:41 AM
To: guxiaobo1...@qq.com; user@spark.apache.org <user@spark.apache.org>

Subject: RE: Can't access remote Hive table from spark



This happened to me as well, putting hive-site.xml inside conf doesn't seem to 
work. Instead I added /etc/hive/conf to SPARK_CLASSPATH and it worked. You can 
try this approach.

-Skanda


From: guxiaobo1982
Sent: 25-01-2015 13:50
To: user@spark.apache.org
Subject: Can't access remote Hive table from spark


Hi,
I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.17.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:


 
public class SparkTest {

    public static void main(String[] args)
    {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);

        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}




Exception in thread main 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src

at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)

at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)

at 
org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:253)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)

at scala.Option.getOrElse(Option.scala:120)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)

at 
org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:253)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:143)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:138)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162)

at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

at scala.collection.AbstractIterator.to(Iterator.scala:1157)

at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)

at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137

Re: RE: Can't access remote Hive table from spark

2015-01-31 Thread guxiaobo1982
The following line does not work either:
export SPARK_CLASSPATH=/etc/hive/conf




-- Original --
From: guxiaobo1982 <guxiaobo1...@qq.com>
Send time: Sunday, Feb 1, 2015 2:15 PM
To: Skanda Prasad <skanda.ganapa...@gmail.com>; user@spark.apache.org <user@spark.apache.org>
Cc: 徐涛 <77044...@qq.com>
Subject: Re: RE: Can't access remote Hive table from spark



Hi Skanda,


How did you set up your SPARK_CLASSPATH?


I added the following line to my SPARK_HOME/conf/spark-env.sh, and still got the same error.
 
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/etc/hive/conf





-- Original --
From: Skanda Prasad <skanda.ganapa...@gmail.com>
Send time: Monday, Jan 26, 2015 7:41 AM
To: guxiaobo1...@qq.com; user@spark.apache.org <user@spark.apache.org>

Subject: RE: Can't access remote Hive table from spark



This happened to me as well, putting hive-site.xml inside conf doesn't seem to 
work. Instead I added /etc/hive/conf to SPARK_CLASSPATH and it worked. You can 
try this approach.

-Skanda


From: guxiaobo1982
Sent: 25-01-2015 13:50
To: user@spark.apache.org
Subject: Can't access remote Hive table from spark


Hi,
I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.17.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:


 
public class SparkTest {

    public static void main(String[] args)
    {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);

        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}




Exception in thread main 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src

at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)

at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)

at 
org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:253)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)

at scala.Option.getOrElse(Option.scala:120)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)

at 
org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:253)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:143)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:138)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162)

at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

at scala.collection.AbstractIterator.to(Iterator.scala:1157)

at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)

at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191

Can't access remote Hive table from spark

2015-01-25 Thread guxiaobo1982
Hi,
I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.17.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:


 
public class SparkTest {

    public static void main(String[] args)
    {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);

        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}




Exception in thread main 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src

at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)

at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)

at 
org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:253)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)

at scala.Option.getOrElse(Option.scala:120)

at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)

at 
org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:253)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:143)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:138)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162)

at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

at scala.collection.AbstractIterator.to(Iterator.scala:1157)

at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)

at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)

at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)

at scala.collection.immutable.List.foldLeft(List.scala:84)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)

at scala.collection.immutable.List.foreach(List.scala:318)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)

at 

How to create distributed matrixes from hive tables.

2015-01-18 Thread guxiaobo1982
Hi,


We have large datasets in a format suitable for a Spark MLlib matrix, but they are pre-computed by Hive and stored inside Hive. My question is: can we create a distributed matrix such as IndexedRowMatrix directly from Hive tables, avoiding reading the data out of Hive and feeding it into an empty matrix?


Regards
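
As far as I know there is no Hive-specific constructor for the MLlib distributed matrices, but a HiveContext query result can be mapped straight into an RDD of IndexedRow and wrapped without collecting anything to the driver. A sketch only; the table name matrix_rows and its (row_id, c0, c1, c2) layout are hypothetical:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.linalg.distributed.IndexedRow;
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class HiveToIndexedRowMatrix {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("HiveToIndexedRowMatrix"));
        JavaHiveContext sqlCtx = new JavaHiveContext(sc);

        // Hypothetical layout: one matrix row per record, stored as
        // (row_id BIGINT, c0 DOUBLE, c1 DOUBLE, c2 DOUBLE).
        JavaRDD<IndexedRow> indexedRows = sqlCtx
                .sql("SELECT row_id, c0, c1, c2 FROM matrix_rows")
                .map(new Function<Row, IndexedRow>() {
                    public IndexedRow call(Row r) {
                        double[] values = {r.getDouble(1), r.getDouble(2), r.getDouble(3)};
                        return new IndexedRow(r.getLong(0), Vectors.dense(values));
                    }
                });

        // The distributed matrix wraps the RDD directly; nothing is collected to the driver.
        IndexedRowMatrix matrix = new IndexedRowMatrix(indexedRows.rdd());
        System.out.println(matrix.numRows() + " x " + matrix.numCols());
        sc.stop();
    }
}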

How to get the master URL at runtime inside driver program?

2015-01-17 Thread guxiaobo1982
Hi,


Driver programs submitted by the spark-submit script will get the runtime Spark master URL, but how does the program get the URL inside the main method when creating the SparkConf object?


Regards,
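
A minimal sketch of the usual pattern: leave setMaster() out of the code so spark-submit's --master value is picked up from the configuration it injects, then read it back from the running context. The class name is a placeholder:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MasterAtRuntime {
    public static void main(String[] args) {
        // No setMaster() here: spark-submit supplies spark.master at launch time.
        SparkConf conf = new SparkConf().setAppName("MasterAtRuntime");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the master URL back once the context is up.
        String master = sc.sc().master();
        System.out.println("Running against master: " + master);
        sc.stop();
    }
}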

Is cluster mode supported by the submit command for standalone clusters?

2015-01-17 Thread guxiaobo1982
Hi,


The submitting applications guide in 
http://spark.apache.org/docs/latest/submitting-applications.html says:


Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications.




But an example follows it; is this an error? And is cluster mode supported for standalone clusters?


# Run on a Spark Standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

Re: Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0

2014-12-29 Thread guxiaobo1982
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster \
  --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 \
  --queue thequeue lib/spark-examples-1.2.0-hadoop2.6.0.jar 10


I got the same error with the above command; I think I missed the jar containing the Jackson libraries.
 




-- Original --
From: Sean Owen <so...@cloudera.com>
Send time: Sunday, Dec 28, 2014 3:08 AM
To: guxiaobo1...@qq.com
Cc: user <user@spark.apache.org>
Subject: Re: Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0




The problem is a conflict between the version of Jackson used in your cluster and the one you run with. I would start by taking things like the assembly jar off your classpath. Try the userClassPathFirst option as well to avoid using the Jackson in your Hadoop distribution.
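
A hedged sketch of what that could look like on the command line; the userClassPathFirst property names changed across 1.x releases, so treat the ones below as assumptions to check against your version's configuration docs:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster \
  --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 \
  --queue thequeue \
  --conf spark.yarn.user.classpath.first=true \
  --conf spark.files.userClassPathFirst=true \
  lib/spark-examples-1.2.0-hadoop2.6.0.jar 10

Note that the --jars list with the assembly and datanucleus jars from the original command is dropped here, as suggested above.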
Hi, I built the 1.2.0 version of Spark against the single-node Hadoop 2.6.0 installed by Ambari 1.7.0. The ./bin/run-example SparkPi 10 command executes on my local Mac 10.9.5 and on the CentOS virtual machine that hosts Hadoop, but I can't run the SparkPi example inside YARN; it seems there is something wrong with the classpaths:


 
export HADOOP_CONF_DIR=/etc/hadoop/conf
 


 
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master 
yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g 
--executor-cores 1 --queue thequeue --jars 
spark-assembly-1.2.0-hadoop2.6.0.jar,spark-1.2.0-yarn-shuffle.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar,datanucleus-api-jdo-3.2.6.jar
 lib/spark-examples-1.2.0-hadoop2.6.0.jar 10

Spark assembly has been built with Hive, including Datanucleus jars on classpath

14/12/10 15:38:59 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

14/12/10 15:39:00 INFO impl.TimelineClientImpl: Timeline service address: 
http://lix1.bh.com:8188/ws/v1/timeline/

Exception in thread main java.lang.NoClassDefFoundError: 
org/codehaus/jackson/map/deser/std/StdDeserializer

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at 
org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:57)

at 
org.apache.hadoop.yarn.util.timeline.TimelineUtils.clinit(TimelineUtils.java:47)

at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:166)

at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)

at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:65)

at 
org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:501)

at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)

at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)

at org.apache.spark.deploy.yarn.Client.main(Client.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.lang.ClassNotFoundException: 
org.codehaus.jackson.map.deser.std.StdDeserializer

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

... 28 more

[xiaobogu@lix1 spark-1.2.0-bin-2.6.0]$ ./bin/spark-submit --class 
org.apache.spark.examples.SparkPi 

Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0

2014-12-26 Thread guxiaobo1982
Hi, I built the 1.2.0 version of Spark against the single-node Hadoop 2.6.0 installed by Ambari 1.7.0. The ./bin/run-example SparkPi 10 command executes on my local Mac 10.9.5 and on the CentOS virtual machine that hosts Hadoop, but I can't run the SparkPi example inside YARN; it seems there is something wrong with the classpaths:


 
export HADOOP_CONF_DIR=/etc/hadoop/conf
 


 
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master 
yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g 
--executor-cores 1 --queue thequeue --jars 
spark-assembly-1.2.0-hadoop2.6.0.jar,spark-1.2.0-yarn-shuffle.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar,datanucleus-api-jdo-3.2.6.jar
 lib/spark-examples-1.2.0-hadoop2.6.0.jar 10

Spark assembly has been built with Hive, including Datanucleus jars on classpath

14/12/10 15:38:59 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

14/12/10 15:39:00 INFO impl.TimelineClientImpl: Timeline service address: 
http://lix1.bh.com:8188/ws/v1/timeline/

Exception in thread main java.lang.NoClassDefFoundError: 
org/codehaus/jackson/map/deser/std/StdDeserializer

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at 
org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:57)

at 
org.apache.hadoop.yarn.util.timeline.TimelineUtils.clinit(TimelineUtils.java:47)

at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:166)

at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)

at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:65)

at 
org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:501)

at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)

at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)

at org.apache.spark.deploy.yarn.Client.main(Client.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.lang.ClassNotFoundException: 
org.codehaus.jackson.map.deser.std.StdDeserializer

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

... 28 more

[xiaobogu@lix1 spark-1.2.0-bin-2.6.0]$ ./bin/spark-submit --class 
org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 
--driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue 
lib/spark-examples-1.2.0-hadoop2.6.0.jar 10

Spark assembly has been built with Hive, including Datanucleus jars on classpath

14/12/10 15:39:49 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

14/12/10 15:39:51 INFO impl.TimelineClientImpl: Timeline service address: 
http://lix1.bh.com:8188/ws/v1/timeline/

Exception in thread main java.lang.NoClassDefFoundError: 
org/codehaus/jackson/map/deser/std/StdDeserializer

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at 

Re: How to build Spark against the latest

2014-12-25 Thread guxiaobo1982
The following command works
 
./make-distribution.sh --tgz  -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 
-Dhadoop.version=2.6.0 -Phive -DskipTests



-- Original --
From: guxiaobo1982 <guxiaobo1...@qq.com>
Send time: Thursday, Dec 25, 2014 3:58 PM
To: guxiaobo1...@qq.com; Ted Yu <yuzhih...@gmail.com>
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: Re: How to build Spark against the latest





What options should I use when running the make-distribution.sh script?


I tried ./make-distribution.sh --hadoop.version 2.6.0 --with-yarn -with-hive --with-tachyon --tgz, but nothing came out.


Regards



-- Original --
From: guxiaobo1982 <guxiaobo1...@qq.com>
Send time: Wednesday, Dec 24, 2014 6:52 PM
To: Ted Yu <yuzhih...@gmail.com>
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: Re: How to build Spark against the latest



Hi Ted,
The referenced command works, but where can I get the deployable binaries?


Xiaobo Gu








-- Original --
From: Ted Yu <yuzhih...@gmail.com>
Send time: Wednesday, Dec 24, 2014 12:09 PM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: Re: How to build Spark against the latest



See http://search-hadoop.com/m/JW1q5Cew0j

On Tue, Dec 23, 2014 at 8:00 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
Hi,
The official pom.xml file only has a profile for Hadoop version 2.4 as the latest version, but I installed Hadoop version 2.6.0 with Ambari. How can I build Spark against it: just using mvn -Dhadoop.version=2.6.0, or by making a corresponding profile for it?


Regards,


Xiaobo

Re: How to build Spark against the latest

2014-12-24 Thread guxiaobo1982
Hi Ted,
The referenced command works, but where can I get the deployable binaries?


Xiaobo Gu








-- Original --
From: Ted Yu <yuzhih...@gmail.com>
Send time: Wednesday, Dec 24, 2014 12:09 PM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: Re: How to build Spark against the latest



See http://search-hadoop.com/m/JW1q5Cew0j

On Tue, Dec 23, 2014 at 8:00 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
Hi,
The official pom.xml file only has a profile for Hadoop version 2.4 as the latest version, but I installed Hadoop version 2.6.0 with Ambari. How can I build Spark against it: just using mvn -Dhadoop.version=2.6.0, or by making a corresponding profile for it?


Regards,


Xiaobo

Re: How to build Spark against the latest

2014-12-24 Thread guxiaobo1982
What options should I use when running the make-distribution.sh script?


I tried ./make-distribution.sh --hadoop.version 2.6.0 --with-yarn -with-hive --with-tachyon --tgz, but nothing came out.


Regards



-- Original --
From: guxiaobo1982 <guxiaobo1...@qq.com>
Send time: Wednesday, Dec 24, 2014 6:52 PM
To: Ted Yu <yuzhih...@gmail.com>
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: Re: How to build Spark against the latest



Hi Ted,
The referenced command works, but where can I get the deployable binaries?


Xiaobo Gu








-- Original --
From: Ted Yu <yuzhih...@gmail.com>
Send time: Wednesday, Dec 24, 2014 12:09 PM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: Re: How to build Spark against the latest



See http://search-hadoop.com/m/JW1q5Cew0j

On Tue, Dec 23, 2014 at 8:00 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
Hi,
The official pom.xml file only has a profile for Hadoop version 2.4 as the latest version, but I installed Hadoop version 2.6.0 with Ambari. How can I build Spark against it: just using mvn -Dhadoop.version=2.6.0, or by making a corresponding profile for it?


Regards,


Xiaobo

How to build Spark against the latest

2014-12-23 Thread guxiaobo1982
Hi,
The official pom.xml file only has a profile for Hadoop version 2.4 as the latest version, but I installed Hadoop version 2.6.0 with Ambari. How can I build Spark against it: just using mvn -Dhadoop.version=2.6.0, or by making a corresponding profile for it?


Regards,


Xiaobo

Re: What about implementing various hypothesis test for LogisticRegression in MLlib

2014-08-22 Thread guxiaobo1982
Hi Xiangrui,


You can refer to An Introduction to Statistical Learning with Applications in R; there are many standard hypothesis tests regarding linear regression and logistic regression, and they should be implemented first. Then we will list some other tests, which are also important when using logistic regression to build scorecards.


Xiaobo Gu




-- Original --
From: Xiangrui Meng <men...@gmail.com>
Send time: Wednesday, Aug 20, 2014 2:18 PM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org <user@spark.apache.org>
Subject: Re: What about implementing various hypothesis test for LogisticRegression in MLlib



We implemented chi-squared tests in v1.1:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166
and we will add more after v1.1. Feedback on which tests should come
first would be greatly appreciated. -Xiangrui
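
For reference, a minimal Java sketch of calling the v1.1 chi-squared test mentioned above; the observed counts are made-up numbers:

import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.stat.Statistics;
import org.apache.spark.mllib.stat.test.ChiSqTestResult;

public class ChiSqExample {
    public static void main(String[] args) {
        // Goodness-of-fit test of observed counts against a uniform expected distribution.
        ChiSqTestResult result = Statistics.chiSqTest(Vectors.dense(52.0, 48.0, 61.0, 39.0));
        System.out.println("statistic = " + result.statistic()
                + ", df = " + result.degreesOfFreedom()
                + ", p-value = " + result.pValue());
    }
}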

On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
 Hi,

From the documentation I think only the model-fitting part is implemented; what about the various hypothesis tests and performance indexes used to evaluate the model fit?

 Regards,

 Xiaobo Gu


What about implementing various hypothesis test for Logistic Regression in MLlib

2014-08-19 Thread guxiaobo1982
Hi,

From the documentation I think only the model-fitting part is implemented; what about the various hypothesis tests and performance indexes used to evaluate the model fit?


Regards,


Xiaobo Gu

What's the best practice to deploy spark on Big SMP servers?

2014-06-26 Thread guxiaobo1982
Hi,


We have a big SMP server (with 128 GB of RAM and 32 CPU cores) to run small-scale analytical workloads. What is the best practice for deploying a standalone Spark on the server to achieve good performance?


How many worker instances should be configured, and how much RAM and how many CPU cores should be allocated to each instance?






Regards,


Xiaobo Gu
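
One common pattern on a large single host is to run several standalone workers rather than one giant JVM, so that each worker stays a manageable size. A hypothetical spark-env.sh sketch; the split below is only an illustration, not a recommendation from this thread:

# Four workers of 8 cores / 28 GB each on the one 32-core, 128 GB host,
# leaving headroom for the OS, the master, and the driver.
export SPARK_WORKER_INSTANCES=4
export SPARK_WORKER_CORES=8
export SPARK_WORKER_MEMORY=28g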

Where Can I find the full documentation for Spark SQL?

2014-06-25 Thread guxiaobo1982
Hi,


I want to know the full list of functions, syntax, and features that Spark SQL supports; is there any documentation?




Regards,


Xiaobo Gu

Re: Where Can I find the full documentation for Spark SQL?

2014-06-25 Thread guxiaobo1982
The API only says this:


public JavaSchemaRDD sql(String sqlQuery): Executes a query expressed in SQL, returning the result as a JavaSchemaRDD.


But what kind of sqlQuery can we execute? Is there any more documentation?


Xiaobo Gu




-- Original --
From: Gianluca Privitera <gianluca.privite...@studio.unibo.it>
Date: Jun 26, 2014
To: user@spark.apache.org <user@spark.apache.org>

Subject: Re: Where Can I find the full documentation for Spark SQL?



You can find something in the API, nothing more than that I think for now.

Gianluca 

On 25 Jun 2014, at 23:36, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

 Hi,
 
I want to know the full list of functions, syntax, and features that Spark SQL supports; is there any documentation?
 
 
 Regards,
 
 Xiaobo Gu
