Yep - Burak's answer should work. FWIW, the line in the stack trace that confirms this is "Failed to load class for data source: avro".
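For reference, the working call should look something like this (an untested sketch, reusing the package coordinates and file path from the report below):

  matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0

  # inside the R shell, pass the fully qualified source name instead of the short name "avro"
  > df <- read.df(sqlContext, "file:///home/matmsh/myfile.avro", "com.databricks.spark.avro")
  > head(df)  # quick sanity check: prints the first few records as an R data.frame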
Thanks,
Shivaram

On Sat, Jun 13, 2015 at 6:13 PM, Burak Yavuz <brk...@gmail.com> wrote:
> Hi,
> Not sure if this is it, but could you please try
> "com.databricks.spark.avro" instead of just "avro"?
>
> Thanks,
> Burak
>
> On Jun 13, 2015 9:55 AM, "Shing Hing Man" <mat...@yahoo.com.invalid> wrote:
>
>> Hi,
>> I am trying to read an avro file in SparkR (in Spark 1.4.0).
>>
>> I started R using the following:
>>
>> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>>
>> Inside the R shell, when I issue the following,
>>
>> > read.df(sqlContext, "file:///home/matmsh/myfile.avro", "avro")
>>
>> I get the following exception:
>>
>> Caused by: java.lang.RuntimeException: Failed to load class for data source: avro
>>
>> Below is the stack trace.
>>
>> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>>
>> R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
>> Copyright (C) 2015 The R Foundation for Statistical Computing
>> Platform: x86_64-suse-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>> Natural language support but running in an English locale
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> Launching java with spark-submit command /home/matmsh/installed/spark/bin/spark-submit "--packages" "com.databricks:spark-avro_2.10:1.0.0" "sparkr-shell" /tmp/RtmpoT7FrF/backend_port464e1e2fb16a
>> Ivy Default Cache set to: /home/matmsh/.ivy2/cache
>> The jars for the packages stored in: /home/matmsh/.ivy2/jars
>> :: loading settings :: url = jar:file:/home/matmsh/installed/sparks/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
>> com.databricks#spark-avro_2.10 added as a dependency
>> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>>     confs: [default]
>>     found com.databricks#spark-avro_2.10;1.0.0 in list
>>     found org.apache.avro#avro;1.7.6 in local-m2-cache
>>     found org.codehaus.jackson#jackson-core-asl;1.9.13 in list
>>     found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in list
>>     found com.thoughtworks.paranamer#paranamer;2.3 in list
>>     found org.xerial.snappy#snappy-java;1.0.5 in list
>>     found org.apache.commons#commons-compress;1.4.1 in list
>>     found org.tukaani#xz;1.0 in list
>>     found org.slf4j#slf4j-api;1.6.4 in list
>> :: resolution report :: resolve 421ms :: artifacts dl 16ms
>>     :: modules in use:
>>     com.databricks#spark-avro_2.10;1.0.0 from list in [default]
>>     com.thoughtworks.paranamer#paranamer;2.3 from list in [default]
>>     org.apache.avro#avro;1.7.6 from local-m2-cache in [default]
>>     org.apache.commons#commons-compress;1.4.1 from list in [default]
>>     org.codehaus.jackson#jackson-core-asl;1.9.13 from list in [default]
>>     org.codehaus.jackson#jackson-mapper-asl;1.9.13 from list in [default]
>>     org.slf4j#slf4j-api;1.6.4 from list in [default]
>>     org.tukaani#xz;1.0 from list in [default]
>>     org.xerial.snappy#snappy-java;1.0.5 from list in [default]
>>     ---------------------------------------------------------------------
>>     |                  |            modules            ||   artifacts   |
>>     |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>>     ---------------------------------------------------------------------
>>     |      default     |   9   |   0   |   0   |   0   ||   9   |   0   |
>>     ---------------------------------------------------------------------
>> :: retrieving :: org.apache.spark#spark-submit-parent
>>     confs: [default]
>>     0 artifacts copied, 9 already retrieved (0kB/9ms)
>> 15/06/13 17:37:42 INFO spark.SparkContext: Running Spark version 1.4.0
>> 15/06/13 17:37:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 15/06/13 17:37:42 WARN util.Utils: Your hostname, gauss resolves to a loopback address: 127.0.0.1; using 192.168.0.10 instead (on interface enp3s0)
>> 15/06/13 17:37:42 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing view acls to: matmsh
>> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing modify acls to: matmsh
>> 15/06/13 17:37:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(matmsh); users with modify permissions: Set(matmsh)
>> 15/06/13 17:37:43 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> 15/06/13 17:37:43 INFO Remoting: Starting remoting
>> 15/06/13 17:37:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.0.10:46219]
>> 15/06/13 17:37:43 INFO util.Utils: Successfully started service 'sparkDriver' on port 46219.
>> 15/06/13 17:37:43 INFO spark.SparkEnv: Registering MapOutputTracker
>> 15/06/13 17:37:43 INFO spark.SparkEnv: Registering BlockManagerMaster
>> 15/06/13 17:37:43 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-c8661016-d922-4ad3-a171-7b0f719c40a2/blockmgr-e79853e5-e046-4b13-a3ba-0b4c46831461
>> 15/06/13 17:37:43 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
>> 15/06/13 17:37:43 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-c8661016-d922-4ad3-a171-7b0f719c40a2/httpd-0f11e45e-08fe-40b1-8bf9-21de1dd472b7
>> 15/06/13 17:37:43 INFO spark.HttpServer: Starting HTTP Server
>> 15/06/13 17:37:43 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 15/06/13 17:37:43 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:48507
>> 15/06/13 17:37:43 INFO util.Utils: Successfully started service 'HTTP file server' on port 48507.
>> 15/06/13 17:37:43 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>> 15/06/13 17:37:43 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 15/06/13 17:37:43 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
>> 15/06/13 17:37:43 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
>> 15/06/13 17:37:43 INFO ui.SparkUI: Started SparkUI at http://192.168.0.10:4040
>> 15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/com.databricks_spark-avro_2.10-1.0.0.jar at http://192.168.0.10:48507/jars/com.databricks_spark-avro_2.10-1.0.0.jar with timestamp 1434213463626
>> 15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.apache.avro_avro-1.7.6.jar at http://192.168.0.10:48507/jars/org.apache.avro_avro-1.7.6.jar with timestamp 1434213463627
>> 15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar at http://192.168.0.10:48507/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar with timestamp 1434213463627
>> 15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar at http://192.168.0.10:48507/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar with timestamp 1434213463628
>> 15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar at http://192.168.0.10:48507/jars/com.thoughtworks.paranamer_paranamer-2.3.jar with timestamp 1434213463628
>> 15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar at http://192.168.0.10:48507/jars/org.xerial.snappy_snappy-java-1.0.5.jar with timestamp 1434213463630
>> 15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar at http://192.168.0.10:48507/jars/org.apache.commons_commons-compress-1.4.1.jar with timestamp 1434213463630
>> 15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.slf4j_slf4j-api-1.6.4.jar at http://192.168.0.10:48507/jars/org.slf4j_slf4j-api-1.6.4.jar with timestamp 1434213463630
>> 15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.tukaani_xz-1.0.jar at http://192.168.0.10:48507/jars/org.tukaani_xz-1.0.jar with timestamp 1434213463630
>> 15/06/13 17:37:43 INFO executor.Executor: Starting executor ID driver on host localhost
>> 15/06/13 17:37:43 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55381.
>> 15/06/13 17:37:43 INFO netty.NettyBlockTransferService: Server created on 55381
>> 15/06/13 17:37:43 INFO storage.BlockManagerMaster: Trying to register BlockManager
>> 15/06/13 17:37:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:55381 with 265.4 MB RAM, BlockManagerId(driver, localhost, 55381)
>> 15/06/13 17:37:43 INFO storage.BlockManagerMaster: Registered BlockManager
>>
>> Welcome to SparkR!
>> Spark context is available as sc, SQL context is available as sqlContext
>>
>> > read.df(sqlContext, "file:///home/matmsh/myfile.avro", "avro")
>> 15/06/13 17:38:53 ERROR r.RBackendHandler: load on 1 failed
>> java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
>>     at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74)
>>     at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36)
>>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
>>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.RuntimeException: Failed to load class for data source: avro
>>     at scala.sys.package$.error(package.scala:27)
>>     at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:216)
>>     at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)
>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
>>     at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
>>     ... 25 more
>> Error: returnStatus == 0 is not TRUE
>>
>> Thanks in advance for any assistance!
>> Shing
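P.S. For anyone who finds this thread later: the relevant frame in the trace is ResolvedDataSource.lookupDataSource (ddl.scala:216). As far as I recall, in Spark 1.4 that lookup only resolves the built-in sources (json, parquet, jdbc) by short name; an external package such as spark-avro has to be named by its fully qualified package, which is why "com.databricks.spark.avro" works while plain "avro" does not.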