Interestingly, after more digging, df.printSchema() in raw Spark shows the
column as a long, not a bigint.
root
|-- localEventDtTm: timestamp (nullable = true)
|-- asset: string (nullable = true)
|-- assetCategory: string (nullable = true)
|-- assetType: string (nullable = true)
|-- event:
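For what it's worth, long and bigint appear to be two names for the same Spark
SQL type; a quick, hedged way to check in the shell (assuming Spark 1.3+):

import org.apache.spark.sql.types.LongType
// "long" is the name printSchema uses (typeName); "bigint" is the SQL/DDL
// name (simpleString). Both refer to the same underlying LongType.
LongType.typeName      // res: String = long
LongType.simpleString  // res: String = bigint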
Hi Folks,
Using Spark to read in JSON files and detect the schema, it gives me a
dataframe with a bigint field. R then fails to import the dataframe as it
can't convert the type.
head(mydf)
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class jobj to a data.frame
Not sure if this helps, but the options I set are slightly different:
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3n.awsAccessKeyId", key)
hadoopConf.set("fs.s3n.awsSecretAccessKey", secret)
Try setting them to s3n as opposed to just s3
Good luck!
Just to add to this, here's some more info:
val myDF = hiveContext.read.parquet("s3n://myBucket/myPath/")
Produces these...
2015-07-01 03:25:50,450 INFO [pool-14-thread-4]
(org.apache.hadoop.fs.s3native.NativeS3FileSystem) - Opening
's3n://myBucket/myPath/part-r-00339.parquet' for reading
That
FWIW, I had some trouble getting Spark running on a Pi.
My core problem was using snappy for compression, as it comes as a pre-built
binary for i386 and I couldn't find one for ARM.
To work around it there was an option to use LZO instead, and then everything
worked.
Off the top of my head, it was
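The exact key isn't confirmed above, but a hedged sketch of the two likely
candidates (the setting names and values here are assumptions, not something
stated in the original post):

import org.apache.spark.{SparkConf, SparkContext}

// Guess 1: switch the internal (shuffle/broadcast) codec away from snappy,
// which needs a native library. In Spark 1.x the pure-JVM alternatives are
// "lzf" and "lz4"; this has to be set before the SparkContext is created.
val conf = new SparkConf()
  .setAppName("pi-test")
  .set("spark.io.compression.codec", "lzf")
val sc = new SparkContext(conf)

// Guess 2: if the snappy dependency came from Parquet output instead, Parquet
// has its own knob, which does accept lzo:
//   spark.sql.parquet.compression.codec = uncompressed | snappy | gzip | lzo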
I was delighted that Spark 1.3.1 using Parquet 1.6.0 would partition data
into folders. So I set up some parquet data partitioned by date. This enabled
us to reference a single day/month/year, minimizing how much data was scanned.
e.g.:
val myDataFrame =
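To illustrate the layout described above, a rough sketch with made-up bucket,
path, and column names (the write side uses the 1.4 writer API that comes up
later in this thread; on 1.3.1 the year=/month=/day= folders can also be laid
out by hand and picked up by partition discovery on read):

// Illustrative only: assume eventsDF is a DataFrame with year/month/day columns.
eventsDF.write
  .partitionBy("year", "month", "day")
  .parquet("s3n://myBucket/eventsByDate/")

// On read, a filter on the partition columns prunes to the matching folders,
// so referencing a single day/month/year only scans that slice of the data.
val oneDay = hiveContext.read.parquet("s3n://myBucket/eventsByDate/")
  .filter("year = 2015 AND month = 7 AND day = 1")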
Hi Folks,
I just stepped up from 1.3.1 to 1.4.0; the most notable difference for me so
far is the data frame reader/writer. Previously:
val myData = hiveContext.load("s3n://someBucket/somePath/", "parquet")
Now:
val myData = hiveContext.read.parquet("s3n://someBucket/somePath")
Using the original
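For what it's worth, the writer side moved the same way (paths here are just
placeholders):

// 1.3.x style (deprecated in 1.4):
myData.save("s3n://someBucket/somePath/", "parquet")
// 1.4.x style:
myData.write.parquet("s3n://someBucket/somePath/")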
Hello Bright Sparks,
I was using Spark 1.3.0 to push data out to Parquet files. They have been
working great: super fast and an easy way to persist data frames, etc.
However, I just swapped out Spark 1.3.0 and picked up the tarball for 1.3.1.
I unzipped it, copied my config over and then went to read