Re: How to read avro in SparkR

2015-06-14 Thread Shing Hing Man
Burak's suggestion works:

read.df(sqlContext, "file:///home/matmsh/myfile.avro", "com.databricks.spark.avro")
Maybe this should be added to the "SparkR (R on Spark) - Spark 1.4.0 Documentation" page on spark.apache.org.
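For reference, the fix in a minimal end-to-end session (a sketch only; printSchema and head are just there to inspect the result):

  # Launch SparkR with the spark-avro package on the classpath:
  #   sparkR --packages com.databricks:spark-avro_2.10:1.0.0

  # The sparkR shell creates sqlContext automatically.
  # The data source must be named by its fully qualified class.
  df <- read.df(sqlContext,
                "file:///home/matmsh/myfile.avro",
                "com.databricks.spark.avro")

  printSchema(df)  # schema Spark SQL derived from the Avro schema
  head(df)         # first few rows as a local R data frame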


Thanks!
Shing



On Sunday, 14 June 2015, 3:07, Shivaram Venkataraman wrote:

Yep - Burak's answer should work. FWIW, the error message from the stack trace that shows this is the line "Failed to load class for data source: avro".

Thanks
Shivaram

On Sat, Jun 13, 2015 at 6:13 PM, Burak Yavuz wrote:

Hi,
Not sure if this is it, but could you please try "com.databricks.spark.avro" instead of just "avro".

Thanks,
Burak

On Jun 13, 2015 9:55 AM, "Shing Hing Man" wrote:

Hi,
I am trying to read an avro file in SparkR (in Spark 1.4.0).

I started R using the following.
matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0

Inside the R shell, when I issue the following,

> read.df(sqlContext, "file:///home/matmsh/myfile.avro","avro")

I get the following exception.
Caused by: java.lang.RuntimeException: Failed to load class for data source: avro

Re: How to read avro in SparkR

2015-06-13 Thread Shivaram Venkataraman
Yep - Burak's answer should work. FWIW, the error message from the stack trace that shows this is the line "Failed to load class for data source: avro".

Thanks
Shivaram

On Sat, Jun 13, 2015 at 6:13 PM, Burak Yavuz  wrote:

> Hi,
> Not sure if this is it, but could you please try
> "com.databricks.spark.avro" instead of just "avro".
>
> Thanks,
> Burak
> On Jun 13, 2015 9:55 AM, "Shing Hing Man" 
> wrote:
>
>> Hi,
>>   I am trying to read an avro file in SparkR (in Spark 1.4.0).
>>
>> I started R using the following.
>> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>>
>> Inside the R shell, when I issue the following,
>>
>> > read.df(sqlContext, "file:///home/matmsh/myfile.avro","avro")
>>
>> I get the following exception.
>> Caused by: java.lang.RuntimeException: Failed to load class for data
>> source: avro
>>
>> Below is the stack trace.
>>
>>
>> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>>
>> R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
>> Copyright (C) 2015 The R Foundation for Statistical Computing
>> Platform: x86_64-suse-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>>  Natural language support but running in an English locale
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> Launching java with spark-submit command
>> /home/matmsh/installed/spark/bin/spark-submit "--packages"
>> "com.databricks:spark-avro_2.10:1.0.0" "sparkr-shell"
>> /tmp/RtmpoT7FrF/backend_port464e1e2fb16a
>> Ivy Default Cache set to: /home/matmsh/.ivy2/cache
>> The jars for the packages stored in: /home/matmsh/.ivy2/jars
>> :: loading settings :: url =
>> jar:file:/home/matmsh/installed/sparks/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
>> com.databricks#spark-avro_2.10 added as a dependency
>> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>>  confs: [default]
>>  found com.databricks#spark-avro_2.10;1.0.0 in list
>>  found org.apache.avro#avro;1.7.6 in local-m2-cache
>>  found org.codehaus.jackson#jackson-core-asl;1.9.13 in list
>>  found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in list
>>  found com.thoughtworks.paranamer#paranamer;2.3 in list
>>  found org.xerial.snappy#snappy-java;1.0.5 in list
>>  found org.apache.commons#commons-compress;1.4.1 in list
>>  found org.tukaani#xz;1.0 in list
>>  found org.slf4j#slf4j-api;1.6.4 in list
>> :: resolution report :: resolve 421ms :: artifacts dl 16ms
>>  :: modules in use:
>>  com.databricks#spark-avro_2.10;1.0.0 from list in [default]
>>  com.thoughtworks.paranamer#paranamer;2.3 from list in [default]
>>  org.apache.avro#avro;1.7.6 from local-m2-cache in [default]
>>  org.apache.commons#commons-compress;1.4.1 from list in [default]
>>  org.codehaus.jackson#jackson-core-asl;1.9.13 from list in [default]
>>  org.codehaus.jackson#jackson-mapper-asl;1.9.13 from list in [default]
>>  org.slf4j#slf4j-api;1.6.4 from list in [default]
>>  org.tukaani#xz;1.0 from list in [default]
>>  org.xerial.snappy#snappy-java;1.0.5 from list in [default]
>>  ---------------------------------------------------------------------
>>  |                  |            modules            ||   artifacts   |
>>  |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>>  ---------------------------------------------------------------------
>>  |      default     |   9   |   0   |   0   |   0   ||   9   |   0   |
>>  ---------------------------------------------------------------------
>> :: retrieving :: org.apache.spark#spark-submit-parent
>>  confs: [default]
>>  0 artifacts copied, 9 already retrieved (0kB/9ms)
>> 15/06/13 17:37:42 INFO spark.SparkContext: Running Spark version 1.4.0
>> 15/06/13 17:37:42 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 15/06/13 17:37:42 WARN util.Utils: Your hostname, gauss resolves to a
>> loopback address: 127.0.0.1; using 192.168.0.10 instead (on interface
>> enp3s0)
>> 15/06/13 17:37:42 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind
>> to another address
>> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing view acls to:
>> matmsh
>> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing modify acls to:
>> matmsh
>> 15/06/13 17:37:42 INFO spark.SecurityManager: SecurityManager:
>> authentication disabled; ui acls disabled; users with view permissions:
>> Set(matmsh); users with modify permissions: Set(matmsh)
>> 15/06/13 17:37:43 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> 15/06/13 17:37:43 INFO Remoting: Starting remoting
>> 15/06/13 17:37:43 INF

Re: How to read avro in SparkR

2015-06-13 Thread Burak Yavuz
Hi,
Not sure if this is it, but could you please try
"com.databricks.spark.avro" instead of just "avro".

Thanks,
Burak
On Jun 13, 2015 9:55 AM, "Shing Hing Man"  wrote:

> Hi,
>   I am trying to read an avro file in SparkR (in Spark 1.4.0).
>
> I started R using the following.
> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>
> Inside the R shell, when I issue the following,
>
> > read.df(sqlContext, "file:///home/matmsh/myfile.avro","avro")
>
> I get the following exception.
> Caused by: java.lang.RuntimeException: Failed to load class for data
> source: avro
>
> Below is the stack trace.
>
>
> matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
>
> R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
> Copyright (C) 2015 The R Foundation for Statistical Computing
> Platform: x86_64-suse-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>  Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
> Launching java with spark-submit command
> /home/matmsh/installed/spark/bin/spark-submit "--packages"
> "com.databricks:spark-avro_2.10:1.0.0" "sparkr-shell"
> /tmp/RtmpoT7FrF/backend_port464e1e2fb16a
> Ivy Default Cache set to: /home/matmsh/.ivy2/cache
> The jars for the packages stored in: /home/matmsh/.ivy2/jars
> :: loading settings :: url =
> jar:file:/home/matmsh/installed/sparks/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
> com.databricks#spark-avro_2.10 added as a dependency
> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>  confs: [default]
>  found com.databricks#spark-avro_2.10;1.0.0 in list
>  found org.apache.avro#avro;1.7.6 in local-m2-cache
>  found org.codehaus.jackson#jackson-core-asl;1.9.13 in list
>  found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in list
>  found com.thoughtworks.paranamer#paranamer;2.3 in list
>  found org.xerial.snappy#snappy-java;1.0.5 in list
>  found org.apache.commons#commons-compress;1.4.1 in list
>  found org.tukaani#xz;1.0 in list
>  found org.slf4j#slf4j-api;1.6.4 in list
> :: resolution report :: resolve 421ms :: artifacts dl 16ms
>  :: modules in use:
>  com.databricks#spark-avro_2.10;1.0.0 from list in [default]
>  com.thoughtworks.paranamer#paranamer;2.3 from list in [default]
>  org.apache.avro#avro;1.7.6 from local-m2-cache in [default]
>  org.apache.commons#commons-compress;1.4.1 from list in [default]
>  org.codehaus.jackson#jackson-core-asl;1.9.13 from list in [default]
>  org.codehaus.jackson#jackson-mapper-asl;1.9.13 from list in [default]
>  org.slf4j#slf4j-api;1.6.4 from list in [default]
>  org.tukaani#xz;1.0 from list in [default]
>  org.xerial.snappy#snappy-java;1.0.5 from list in [default]
>  ---------------------------------------------------------------------
>  |                  |            modules            ||   artifacts   |
>  |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>  ---------------------------------------------------------------------
>  |      default     |   9   |   0   |   0   |   0   ||   9   |   0   |
>  ---------------------------------------------------------------------
> :: retrieving :: org.apache.spark#spark-submit-parent
>  confs: [default]
>  0 artifacts copied, 9 already retrieved (0kB/9ms)
> 15/06/13 17:37:42 INFO spark.SparkContext: Running Spark version 1.4.0
> 15/06/13 17:37:42 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 15/06/13 17:37:42 WARN util.Utils: Your hostname, gauss resolves to a
> loopback address: 127.0.0.1; using 192.168.0.10 instead (on interface
> enp3s0)
> 15/06/13 17:37:42 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind
> to another address
> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing view acls to: matmsh
> 15/06/13 17:37:42 INFO spark.SecurityManager: Changing modify acls to:
> matmsh
> 15/06/13 17:37:42 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(matmsh); users with modify permissions: Set(matmsh)
> 15/06/13 17:37:43 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/06/13 17:37:43 INFO Remoting: Starting remoting
> 15/06/13 17:37:43 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkDriver@192.168.0.10:46219]
> 15/06/13 17:37:43 INFO util.Utils: Successfully started service
> 'sparkDriver' on port 46219.
> 15/06/13 17:37:43 INFO spark.SparkEnv: Registering MapOutputTracker
> 15/06/13 17:37:43 INFO spark.SparkEnv: Registering BlockManagerMaster