Re: SparkR Supported Types - Please add bigint

2015-07-23 Thread Exie
Interestingly, after more digging, df.printSchema() in raw Spark shows the columns as a long, not a bigint:

root
 |-- localEventDtTm: timestamp (nullable = true)
 |-- asset: string (nullable = true)
 |-- assetCategory: string (nullable = true)
 |-- assetType: string (nullable = true)
 |-- event:
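
A minimal sketch (in Scala, with a hypothetical JSON path) of the check described above. Spark's JSON schema inference maps integral numbers to LongType, which printSchema renders as "long", while the equivalent SQL type name is BIGINT:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("schema-check"))
    val hiveContext = new HiveContext(sc)

    // Infer a schema from JSON and print it; integral fields show up as "long".
    val df = hiveContext.read.json("s3n://myBucket/events.json") // hypothetical path
    df.printSchema()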

SparkR Supported Types - Please add bigint

2015-07-23 Thread Exie
Hi Folks,

Using Spark to read in JSON files and detect the schema, it gives me a dataframe with a bigint field. R then fails to import the dataframe as it can't convert the type:

head(mydf)
Error in as.data.frame.default(x[[i]], optional = TRUE) :
  cannot coerce class jobj to a data.frame
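
One possible workaround (a sketch, not taken from the thread): cast the inferred bigint/long column to a type R can coerce, such as double, before handing the dataframe to SparkR. The column name "eventCount" below is hypothetical:

    import org.apache.spark.sql.functions.col

    // df is the JSON-derived DataFrame; cast the 64-bit column down to
    // double so the R-side coercion can succeed.
    val fixed = df.withColumn("eventCount", col("eventCount").cast("double"))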

Re: s3 bucket access/read file

2015-06-30 Thread Exie
Not sure if this helps, but the options I set are slightly different:

val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3n.awsAccessKeyId", key)
hadoopConf.set("fs.s3n.awsSecretAccessKey", secret)

Try setting them for s3n as opposed to just s3. Good luck!
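
The same settings as a self-contained sketch; the bucket, file, and credentials are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("s3n-read"))

    // s3n:// paths are handled by NativeS3FileSystem, which reads these two keys.
    val hadoopConf = sc.hadoopConfiguration
    hadoopConf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    hadoopConf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    val lines = sc.textFile("s3n://myBucket/myFile.txt") // hypothetical bucket/file
    println(lines.count())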

Re: Spark 1.4.0: read.df() causes excessive IO

2015-06-30 Thread Exie
Just to add to this, here's some more info:

val myDF = hiveContext.read.parquet("s3n://myBucket/myPath/")

Produces these...

2015-07-01 03:25:50,450 INFO [pool-14-thread-4] (org.apache.hadoop.fs.s3native.NativeS3FileSystem) - Opening 's3n://myBucket/myPath/part-r-00339.parquet' for reading

That
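
One avenue worth checking (an assumption on my part, not something confirmed in the thread): the Parquet source merges the schemas of all part-files by default, which requires opening each file's footer. If every part-file shares one schema, disabling the merge may reduce the opens:

    // Sketch: the "mergeSchema" option is an assumption here; verify it is
    // honored by the Parquet data source in your Spark version.
    val myDF = hiveContext.read
      .option("mergeSchema", "false")
      .parquet("s3n://myBucket/myPath/")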

Re: Spark run errors on Raspberry Pi

2015-06-30 Thread Exie
FWIW, I had some trouble getting Spark running on a Pi. My core problem was using snappy for compression, as it comes as a pre-made binary for i386 and I couldn't find one for ARM. To work around it, there was an option to use LZO instead, and then everything worked. Off the top of my head, it was
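
For anyone hitting the same wall, the relevant setting is spark.io.compression.codec. A sketch follows; note that Spark's bundled pure-Java alternative is the LZF codec ("lzf"), which is likely what "LZO" above refers to:

    import org.apache.spark.{SparkConf, SparkContext}

    // Avoid snappy, whose bundled native library is x86-only, by switching
    // internal compression to the pure-Java LZF codec.
    val conf = new SparkConf()
      .setAppName("pi-job")
      .set("spark.io.compression.codec", "lzf")
    val sc = new SparkContext(conf)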

1.4.0

2015-06-30 Thread Exie
So I was delighted with Spark 1.3.1 using Parquet 1.6.0, which would partition data into folders. I set up some parquet data partitioned by date. This enabled us to reference a single day/month/year, minimizing how much data was scanned, eg: val myDataFrame =
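
The preview cuts off, but the layout described works roughly like this (a sketch; bucket, folder, and partition values are hypothetical). With data stored under year=/month=/day= folders, pointing the reader at a single leaf folder scans only that day's files:

    // Read one day's partition; only files under this folder are touched.
    val oneDay = hiveContext.read
      .parquet("s3n://myBucket/events/year=2015/month=06/day=29")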

Spark 1.4.0: read.df() causes excessive IO

2015-06-29 Thread Exie
Hi Folks,

I just stepped up from 1.3.1 to 1.4.0; the most notable difference for me so far is the data frame reader/writer.

Previously:

val myData = hiveContext.load("s3n://someBucket/somePath/", "parquet")

Now:

val myData = hiveContext.read.parquet("s3n://someBucket/somePath")

Using the original
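
Side by side, the two forms (paths as in the post, quotes restored):

    // Spark 1.3.x: generic load() with an explicit source name
    val before = hiveContext.load("s3n://someBucket/somePath/", "parquet")

    // Spark 1.4.0: the DataFrameReader API
    val after = hiveContext.read.parquet("s3n://someBucket/somePath")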

Spark 1.3.0 - 1.3.1 produces java.lang.NoSuchFieldError: NO_FILTER

2015-05-14 Thread Exie
Hello Bright Sparks, I was using Spark 1.3.0 to push data out to Parquet files. They had been working great: super fast, and an easy way to persist data frames. However, I just swapped out Spark 1.3.0 and picked up the tarball for 1.3.1. I unzipped it, copied my config over, and then went to read
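
This particular error usually points at mixed Parquet jars on the classpath: NO_FILTER is a static field added to ParquetMetadataConverter in Parquet 1.6.0, so code compiled against 1.6.0 throws NoSuchFieldError when an older parquet-hadoop jar is picked up first. A quick diagnostic sketch for the spark-shell:

    // Print which jar the converter class is loaded from; if it is not the
    // Parquet 1.6.0 jar shipped with Spark, something older is shadowing it.
    val loc = Class.forName("parquet.format.converter.ParquetMetadataConverter")
      .getProtectionDomain.getCodeSource.getLocation
    println(loc)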