Re: How to read avro in SparkR
Burak's suggestion works:

read.df(sqlContext, "file:///home/matmsh/myfile.avro", "com.databricks.spark.avro")

Maybe the above should be added to the SparkR (R on Spark) - Spark 1.4.0 documentation on spark.apache.org.

Thanks!
Shing

On Sunday, 14 June 2015, 3:07, Shivaram Venkataraman wrote:

Yep - Burak's answer should work. FWIW the error message from the stack trace that shows this is the line "Failed to load class for data source: avro"

Thanks
Shivaram

On Sat, Jun 13, 2015 at 6:13 PM, Burak Yavuz wrote:

Hi,
Not sure if this is it, but could you please try "com.databricks.spark.avro" instead of just "avro".

Thanks,
Burak

On Jun 13, 2015 9:55 AM, "Shing Hing Man" wrote:

Hi,
I am trying to read an avro file in SparkR (in Spark 1.4.0).

I started R using the following.
matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0

Inside the R shell, when I issue the following,

> read.df(sqlContext, "file:///home/matmsh/myfile.avro","avro")

I get the following exception.
Caused by: java.lang.RuntimeException: Failed to load class for data source: avro

Below is the stack trace.

matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Launching java with spark-submit command /home/matmsh/installed/spark/bin/spark-submit "--packages" "com.databricks:spark-avro_2.10:1.0.0" "sparkr-shell" /tmp/RtmpoT7FrF/backend_port464e1e2fb16a
Ivy Default Cache set to: /home/matmsh/.ivy2/cache
The jars for the packages stored in: /home/matmsh/.ivy2/jars
:: loading settings :: url = jar:file:/home/matmsh/installed/sparks/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found com.databricks#spark-avro_2.10;1.0.0 in list
	found org.apache.avro#avro;1.7.6 in local-m2-cache
	found org.codehaus.jackson#jackson-core-asl;1.9.13 in list
	found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in list
	found com.thoughtworks.paranamer#paranamer;2.3 in list
	found org.xerial.snappy#snappy-java;1.0.5 in list
	found org.apache.commons#commons-compress;1.4.1 in list
	found org.tukaani#xz;1.0 in list
	found org.slf4j#slf4j-api;1.6.4 in list
:: resolution report :: resolve 421ms :: artifacts dl 16ms
	:: modules in use:
	com.databricks#spark-avro_2.10;1.0.0 from list in [default]
	com.thoughtworks.paranamer#paranamer;2.3 from list in [default]
	org.apache.avro#avro;1.7.6 from local-m2-cache in [default]
	org.apache.commons#commons-compress;1.4.1 from list in [default]
	org.codehaus.jackson#jackson-core-asl;1.9.13 from list in [default]
	org.codehaus.jackson#jackson-mapper-asl;1.9.13 from list in [default]
	org.slf4j#slf4j-api;1.6.4 from list in [default]
	org.tukaani#xz;1.0 from list in [default]
	org.xerial.snappy#snappy-java;1.0.5 from list in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   9   |   0   |   0   |   0   ||   9   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 9 already retrieved (0kB/9ms)
15/06/13 17:37:42 INFO spark.SparkContext: Running Spark version 1.4.0
15/06/13 17:37:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/13 17:37:42 WARN util.Utils: Your hostname, gauss resolves to a loopback address: 127.0.0.1; using 192.168.0.10 instead (on interface enp3s0)
15/06/13 17:37:42 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/06/13 17:37:42 INFO spark.SecurityManager: Changing view acls to: matmsh
15/06/13 17:37:42 INFO spark.SecurityManager: Changing modify acls to: matmsh
15/06/13 17:
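For reference, the whole working sequence from this thread can be sketched in one place. This is a minimal sketch against the Spark 1.4.0 SparkR API; the file path is the one used in the thread, and `head`/`printSchema` calls are added only for illustration.

```r
# Launch SparkR with the spark-avro package on the classpath:
#   sparkR --packages com.databricks:spark-avro_2.10:1.0.0

# Inside the R shell, pass the fully qualified data source class to read.df.
# The short name "avro" is not registered as a data source in Spark 1.4,
# which is what produces:
#   java.lang.RuntimeException: Failed to load class for data source: avro
df <- read.df(sqlContext,
              "file:///home/matmsh/myfile.avro",
              source = "com.databricks.spark.avro")

head(df)         # inspect the first rows of the DataFrame
printSchema(df)  # schema derived from the Avro schema
```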
Re: How to read avro in SparkR
Yep - Burak's answer should work. FWIW the error message from the stack trace that shows this is the line "Failed to load class for data source: avro"

Thanks
Shivaram

On Sat, Jun 13, 2015 at 6:13 PM, Burak Yavuz wrote:
> Hi,
> Not sure if this is it, but could you please try
> "com.databricks.spark.avro" instead of just "avro".
>
> Thanks,
> Burak
>
> On Jun 13, 2015 9:55 AM, "Shing Hing Man" wrote:
>
>> Hi,
>> I am trying to read an avro file in SparkR (in Spark 1.4.0).
>>
>> I started R using the following.
>> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>>
>> Inside the R shell, when I issue the following,
>>
>> > read.df(sqlContext, "file:///home/matmsh/myfile.avro","avro")
>>
>> I get the following exception.
>> Caused by: java.lang.RuntimeException: Failed to load class for data
>> source: avro
>>
>> Below is the stack trace.
>>
>> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>>
>> R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
>> Copyright (C) 2015 The R Foundation for Statistical Computing
>> Platform: x86_64-suse-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>> Natural language support but running in an English locale
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> Launching java with spark-submit command
>> /home/matmsh/installed/spark/bin/spark-submit "--packages"
>> "com.databricks:spark-avro_2.10:1.0.0" "sparkr-shell"
>> /tmp/RtmpoT7FrF/backend_port464e1e2fb16a
>> Ivy Default Cache set to: /home/matmsh/.ivy2/cache
>> The jars for the packages stored in: /home/matmsh/.ivy2/jars
>> :: loading settings :: url =
>> jar:file:/home/matmsh/installed/sparks/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
>> com.databricks#spark-avro_2.10 added as a dependency
>> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>> confs: [default]
>> found com.databricks#spark-avro_2.10;1.0.0 in list
>> found org.apache.avro#avro;1.7.6 in local-m2-cache
>> found org.codehaus.jackson#jackson-core-asl;1.9.13 in list
>> found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in list
>> found com.thoughtworks.paranamer#paranamer;2.3 in list
>> found org.xerial.snappy#snappy-java;1.0.5 in list
>> found org.apache.commons#commons-compress;1.4.1 in list
>> found org.tukaani#xz;1.0 in list
>> found org.slf4j#slf4j-api;1.6.4 in list
>> :: resolution report :: resolve 421ms :: artifacts dl 16ms
>> :: modules in use:
>> com.databricks#spark-avro_2.10;1.0.0 from list in [default]
>> com.thoughtworks.paranamer#paranamer;2.3 from list in [default]
>> org.apache.avro#avro;1.7.6 from local-m2-cache in [default]
>> org.apache.commons#commons-compress;1.4.1 from list in [default]
>> org.codehaus.jackson#jackson-core-asl;1.9.13 from list in [default]
>> org.codehaus.jackson#jackson-mapper-asl;1.9.13 from list in [default]
>> org.slf4j#slf4j-api;1.6.4 from list in [default]
>> org.tukaani#xz;1.0 from list in [default]
>> org.xerial.snappy#snappy-java;1.0.5 from list in [default]
>> ---------------------------------------------------------------------
>> |                  |            modules            ||   artifacts   |
>> |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>> ---------------------------------------------------------------------
>> |      default     |   9   |   0   |   0   |   0   ||   9   |   0   |
>> ---------------------------------------------------------------------
>> :: retrieving :: org.apache.spark#spark-submit-parent
>> confs: [default]
>> 0 artifacts copied, 9 already retrieved (0kB/9ms)
>> 15/06/13 17:37:42 INFO spark.SparkContext: Running Spark version 1.4.0
>> 15/06/13 17:37:42 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 15/06/13 17:37:42 WARN util.Utils: Your hostname, gauss resolves to a
>> loopback address: 127.0.0.1; using 192.168.0.10 instead (on interface
>> enp3s0)
>> 15/06/13 17:37:42 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind
>> to another address
>> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing view acls to:
>> matmsh
>> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing modify acls to:
>> matmsh
>> 15/06/13 17:37:42 INFO spark.SecurityManager: SecurityManager:
>> authentication disabled; ui acls disabled; users with view permissions:
>> Set(matmsh); users with modify permissions: Set(matmsh)
>> 15/06/13 17:37:43 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> 15/06/13 17:37:43 INFO Remoting: Starting remoting
>> 15/06/13 17:37:43 INF
Re: How to read avro in SparkR
Hi,
Not sure if this is it, but could you please try "com.databricks.spark.avro" instead of just "avro".

Thanks,
Burak

On Jun 13, 2015 9:55 AM, "Shing Hing Man" wrote:

> Hi,
> I am trying to read an avro file in SparkR (in Spark 1.4.0).
>
> I started R using the following.
> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>
> Inside the R shell, when I issue the following,
>
> > read.df(sqlContext, "file:///home/matmsh/myfile.avro","avro")
>
> I get the following exception.
> Caused by: java.lang.RuntimeException: Failed to load class for data
> source: avro
>
> Below is the stack trace.
>
> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>
> R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
> Copyright (C) 2015 The R Foundation for Statistical Computing
> Platform: x86_64-suse-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
> Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
> Launching java with spark-submit command
> /home/matmsh/installed/spark/bin/spark-submit "--packages"
> "com.databricks:spark-avro_2.10:1.0.0" "sparkr-shell"
> /tmp/RtmpoT7FrF/backend_port464e1e2fb16a
> Ivy Default Cache set to: /home/matmsh/.ivy2/cache
> The jars for the packages stored in: /home/matmsh/.ivy2/jars
> :: loading settings :: url =
> jar:file:/home/matmsh/installed/sparks/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
> com.databricks#spark-avro_2.10 added as a dependency
> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
> confs: [default]
> found com.databricks#spark-avro_2.10;1.0.0 in list
> found org.apache.avro#avro;1.7.6 in local-m2-cache
> found org.codehaus.jackson#jackson-core-asl;1.9.13 in list
> found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in list
> found com.thoughtworks.paranamer#paranamer;2.3 in list
> found org.xerial.snappy#snappy-java;1.0.5 in list
> found org.apache.commons#commons-compress;1.4.1 in list
> found org.tukaani#xz;1.0 in list
> found org.slf4j#slf4j-api;1.6.4 in list
> :: resolution report :: resolve 421ms :: artifacts dl 16ms
> :: modules in use:
> com.databricks#spark-avro_2.10;1.0.0 from list in [default]
> com.thoughtworks.paranamer#paranamer;2.3 from list in [default]
> org.apache.avro#avro;1.7.6 from local-m2-cache in [default]
> org.apache.commons#commons-compress;1.4.1 from list in [default]
> org.codehaus.jackson#jackson-core-asl;1.9.13 from list in [default]
> org.codehaus.jackson#jackson-mapper-asl;1.9.13 from list in [default]
> org.slf4j#slf4j-api;1.6.4 from list in [default]
> org.tukaani#xz;1.0 from list in [default]
> org.xerial.snappy#snappy-java;1.0.5 from list in [default]
> ---------------------------------------------------------------------
> |                  |            modules            ||   artifacts   |
> |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
> ---------------------------------------------------------------------
> |      default     |   9   |   0   |   0   |   0   ||   9   |   0   |
> ---------------------------------------------------------------------
> :: retrieving :: org.apache.spark#spark-submit-parent
> confs: [default]
> 0 artifacts copied, 9 already retrieved (0kB/9ms)
> 15/06/13 17:37:42 INFO spark.SparkContext: Running Spark version 1.4.0
> 15/06/13 17:37:42 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 15/06/13 17:37:42 WARN util.Utils: Your hostname, gauss resolves to a
> loopback address: 127.0.0.1; using 192.168.0.10 instead (on interface
> enp3s0)
> 15/06/13 17:37:42 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind
> to another address
> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing view acls to: matmsh
> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing modify acls to:
> matmsh
> 15/06/13 17:37:42 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(matmsh); users with modify permissions: Set(matmsh)
> 15/06/13 17:37:43 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/06/13 17:37:43 INFO Remoting: Starting remoting
> 15/06/13 17:37:43 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkDriver@192.168.0.10:46219]
> 15/06/13 17:37:43 INFO util.Utils: Successfully started service
> 'sparkDriver' on port 46219.
> 15/06/13 17:37:43 INFO spark.SparkEnv: Registering MapOutputTracker
> 15/06/13 17:37:43 INFO spark.SparkEnv: Registering BlockManagerMaster