Spark 1.3 SQL Type Parser Changes?

2015-03-10 Thread Nitay Joffe
In Spark 1.2 I used to be able to do this: scala> org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<int:bigint>") res30: org.apache.spark.sql.catalyst.types.DataType = StructType(List(StructField(int,LongType,true))) That is, the name of a column can be a keyword like int. This is no

Re: Spark S3 Performance

2014-11-24 Thread Nitay Joffe
: 14/11/20 15:39:45 [Executor task launch worker-1 ] INFO HadoopRDD: Input split: s3n://mybucket/myfile:335544320+67108864 On Nov 22, 2014 7:23 AM, Nitay Joffe ni...@actioniq.co wrote: Err I meant #1 :) - Nitay Founder CTO On Sat, Nov 22, 2014 at 10:20 AM, Nitay Joffe ni...@actioniq.co

Re: Spark S3 Performance

2014-11-22 Thread Nitay Joffe
Anyone have any thoughts on this? Trying to understand especially #2 if it's a legit bug or something I'm doing wrong. - Nitay Founder CTO On Thu, Nov 20, 2014 at 11:54 AM, Nitay Joffe ni...@actioniq.co wrote: I have a simple S3 job to read a text file and do a line count. Specifically I'm

Re: Spark S3 Performance

2014-11-22 Thread Nitay Joffe
Err I meant #1 :) - Nitay Founder CTO On Sat, Nov 22, 2014 at 10:20 AM, Nitay Joffe ni...@actioniq.co wrote: Anyone have any thoughts on this? Trying to understand especially #2 if it's a legit bug or something I'm doing wrong. - Nitay Founder CTO On Thu, Nov 20, 2014 at 11:54 AM

Spark S3 Performance

2014-11-20 Thread Nitay Joffe
I have a simple S3 job to read a text file and do a line count. Specifically I'm doing *sc.textFile("s3n://mybucket/myfile").count*. The file is about 1.2GB. My setup is a standalone Spark cluster with 4 workers, each with 2 cores / 16GB RAM. I'm using branch-1.2 code built against Hadoop 2.4 (though
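
For reference, the line-count job described in this post can be sketched as a small standalone application. This is only a rough sketch under assumptions: the object name S3LineCount and the timing code are illustrative and do not come from the original message, the bucket and file names are the placeholders used in the post, S3 credentials are assumed to be configured already in the Hadoop configuration, and the master URL is assumed to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object; the original post ran the one-liner directly.
object S3LineCount {
  def main(args: Array[String]): Unit = {
    // The master URL is not set here; it is assumed to come from spark-submit.
    val conf = new SparkConf().setAppName("S3LineCount")
    val sc = new SparkContext(conf)
    try {
      // Placeholder path taken from the post.
      val path = "s3n://mybucket/myfile"

      // Time the read plus the count, since count() is the action that
      // triggers the actual S3 reads.
      val start = System.nanoTime()
      val lines = sc.textFile(path).count()
      val seconds = (System.nanoTime() - start) / 1e9

      println(f"lines=$lines elapsed=$seconds%.1f s")
    } finally {
      // Always release cluster resources, even if the job fails.
      sc.stop()
    }
  }
}
```

Measuring wall-clock time around the action gives a single number to compare against a plain download of the same file, which is the comparison the thread appears to be about.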