Re: Strange behavior of spark-shell while accessing hdfs

2014-11-11 Thread hmxxyy
Thanks guys for the info.

I have to use YARN to access a Kerberos cluster.
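A minimal sketch of that setup, assuming a valid Kerberos ticket has already been obtained with kinit (the principal and conf path below are placeholders, not from this thread):

kinit myuser@EXAMPLE.COM
HADOOP_CONF_DIR=/path/to/hadoop/conf ./bin/spark-shell --master yarn-client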






Re: spark-shell exception while running in YARN mode

2014-11-11 Thread hmxxyy
The Pi example gives the same error in yarn mode:

HADOOP_CONF_DIR=/home/gs/conf/current ./spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  ../examples/target/spark-examples_2.10-1.2.0-SNAPSHOT.jar

What could be wrong here?
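One way to dig further, assuming the application was at least accepted by YARN, is to pull the application logs; <appId> is a placeholder for the id printed at submit time:

yarn logs -applicationId <appId>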








Re: disable log4j for spark-shell

2014-11-10 Thread hmxxyy
I tried --driver-java-options and SPARK_JAVA_OPTS; neither of them worked.

I had to change the default log4j-defaults.properties and rebuild.
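For reference, a sketch of a route that should not require a rebuild: spark-shell places the conf directory ahead of the assembly jar on the classpath, so a conf/log4j.properties overrides the baked-in defaults:

cp conf/log4j.properties.template conf/log4j.properties
# then edit the copy so the root logger line reads:
#   log4j.rootCategory=WARN, console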






Re: disable log4j for spark-shell

2014-11-10 Thread hmxxyy
Even after changing
core/src/main/resources/org/apache/spark/log4j-defaults.properties to WARN
and rebuilding, the log level is still INFO.

Any other suggestions?
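One thing worth trying from inside the shell itself, sketched here against the log4j 1.x API that Spark bundles, is raising the level programmatically once the REPL is up:

import org.apache.log4j.{Level, Logger}
Logger.getRootLogger.setLevel(Level.WARN)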






Re: disable log4j for spark-shell

2014-11-10 Thread hmxxyy
Some console messages:

14/11/10 20:04:33 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:46713
14/11/10 20:04:33 INFO util.Utils: Successfully started service 'HTTP file
server' on port 46713.
14/11/10 20:04:34 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/10 20:04:34 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
14/11/10 20:04:34 INFO util.Utils: Successfully started service 'SparkUI' on
port 4040.
14/11/10 20:04:34 INFO netty.NettyBlockTransferService: Server created on
46997
14/11/10 20:04:34 INFO storage.BlockManagerMaster: Trying to register
BlockManager
14/11/10 20:04:34 INFO storage.BlockManagerMasterActor: Registering block
manager localhost:46997 with 265.0 MB RAM, BlockManagerId(driver,
localhost, 46997)
14/11/10 20:04:35 INFO storage.BlockManagerMaster: Registered BlockManager

and log4j-defaults.properties looks like:

cat core/src/main/resources/org/apache/spark/log4j-defaults.properties
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=WARN

Any suggestions?
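One possibility is that a previously built assembly jar still carries the old defaults file, so the rebuilt core module never takes effect. A sketch that sidesteps the baked-in defaults by forcing an explicit config file (the path is a placeholder):

./bin/spark-shell --driver-java-options "-Dlog4j.configuration=file:/home/user/log4j.properties"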






Strange behavior of spark-shell while accessing hdfs

2014-11-10 Thread hmxxyy
I am trying spark-shell on a single host and am seeing some strange behavior.

If I run bin/spark-shell without connecting to a master, it can access an HDFS
file on a remote cluster with Kerberos authentication:

scala> val textFile = sc.textFile("hdfs://*.*.*.*:8020/user/lih/drill_test/test.csv")
scala> textFile.count()
res0: Long = 9

However, if I start the master and a slave on the same host, launch
bin/spark-shell --master spark://*.*.*.*:7077
and run the same commands:

scala> textFile.count()
14/11/11 05:00:23 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
stgace-launcher06.diy.corp.ne1.yahoo.com): java.io.IOException: Failed on
local exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot
authenticate via:[TOKEN, KERBEROS]; Host Details : local host is:
*.*.*.*.com/98.138.236.95; destination host is: *.*.*.*:8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1375)
at org.apache.hadoop.ipc.Client.call(Client.java:1324)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:225)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1165)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1155)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1145)
at
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:268)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:235)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:228)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1318)
at
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:293)
at
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:289)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:289)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
at
org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:195)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot
authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:657)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:621)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1423)
at org.apache.hadoop.ipc.Client.call(Client.java:1342)
... 38 more
Caused by: 

Spark 1.1.0 with Hadoop 2.5.0

2014-10-06 Thread hmxxyy
Does Spark 1.1.0 work with Hadoop 2.5.0?

The maven build instructions only list profile options up to Hadoop 2.4.

Anybody ever made it work?

I am trying to run spark-sql with Hive 0.12 on top of Hadoop 2.5.0 but can't
make it work.
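For reference, a build sketch that should cover 2.5.x; the hadoop-2.4 profile is reused for later 2.x releases with an explicit hadoop.version (flags assume the Spark 1.1 source layout):

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -Phive -DskipTests clean package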






Re: How to make ./bin/spark-sql work with hive?

2014-10-03 Thread Hmxxyy
No, it is hive 0.12.4.

Let me try your suggestion. It is an existing hive db. I am using the original 
hive-site.xml as is.

Sent from my iPhone

 On Oct 3, 2014, at 5:02 PM, Edwin Chiu edwin.c...@manage.com wrote:
 
 Are you using hive 0.13?
 
 Switching back to HadoopDefaultAuthenticator in your hive-site.xml is worth a
 shot:
 
 <property>
   <name>hive.security.authenticator.manager</name>
   <!-- <value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value> -->
   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator</value>
 </property>
 
 
 
 - Edwin
 
 On Fri, Oct 3, 2014 at 4:25 PM, Li HM hmx...@gmail.com wrote:
 If I don't have that jar, I am getting the following error:
 
 Exception in thread "main" java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassNotFoundException: 
 org.apache.hcatalog.security.HdfsAuthorizationProvider
  at 
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:286)
  at 
 org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:116)
  at 
 org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassNotFoundException: 
 org.apache.hcatalog.security.HdfsAuthorizationProvider
  at 
 org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:342)
  at 
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:280)
  ... 9 more
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hcatalog.security.HdfsAuthorizationProvider
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:266)
  at 
 org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:335)
  ... 10 more
 
 On Fri, Oct 3, 2014 at 3:27 PM, Michael Armbrust mich...@databricks.com 
 wrote:
 Why are you including hcatalog-core.jar? That is probably causing the
 issues.
 
 On Fri, Oct 3, 2014 at 3:03 PM, Li HM hmx...@gmail.com wrote:
 This is my SPARK_CLASSPATH after cleanup:
 SPARK_CLASSPATH=/home/test/lib/hcatalog-core.jar:$SPARK_CLASSPATH
 
 Now "use mydb" works.
 
 But "show tables" and "select * from test" still give an exception:
 
 spark-sql> show tables;
 OK
 java.io.IOException: java.io.IOException: Cannot create an instance of 
 InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in 
 mapredWork!
at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:551)
at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
at 
 org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:272)
at 
 org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
at 
 org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
at 
 org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:38)
at 
 org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
at 
 org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
at 
 org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
 at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:103)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:98)
at 
 org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:58)
at 
 org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at