In Spark 1.2 I used to be able to do this:
scala> org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<int:bigint>")
res30: org.apache.spark.sql.catalyst.types.DataType =
StructType(List(StructField(int,LongType,true)))
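For illustration, the struct string above can be split into fields with plain string handling, which is why the field name is treated as an opaque token rather than a keyword. This is only a sketch of the flat case, not Spark's actual parser; `HiveTypeSketch` and `Field` are hypothetical names:

```scala
// Hypothetical sketch (NOT Spark's real parser): pull name/type pairs out of
// a flat Hive-style struct string such as "struct<int:bigint>".
// Names are arbitrary strings here, so a keyword like "int" is accepted.
// Nested structs (commas inside the field type) are not handled.
object HiveTypeSketch {
  case class Field(name: String, hiveType: String)

  def parseStruct(s: String): List[Field] = {
    require(s.startsWith("struct<") && s.endsWith(">"), s"not a struct type: $s")
    val inner = s.stripPrefix("struct<").stripSuffix(">")
    inner.split(",").toList.map { part =>
      // split on the first ':' only, so the type part may itself contain ':'
      val Array(name, tpe) = part.split(":", 2)
      Field(name, tpe)
    }
  }
}
```

For example, `HiveTypeSketch.parseStruct("struct<int:bigint>")` yields a single field named `int` of Hive type `bigint`, matching the `StructField(int,LongType,true)` result above.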
That is, the name of a column can be a keyword like int. This is no
14/11/20 15:39:45 [Executor task launch worker-1] INFO HadoopRDD: Input
split: s3n://mybucket/myfile:335544320+67108864
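As an aside, the "offset+length" suffix in that split line is just byte arithmetic: 335544320 bytes is 320 MiB into the file and 67108864 bytes is a 64 MiB chunk (64 MiB being the usual s3n block-size default, an assumption here). A small sketch decoding it; `SplitMath` is a hypothetical name:

```scala
// Sketch: decode the "offset+length" suffix of a HadoopRDD "Input split"
// log line, e.g. "335544320+67108864" -> (offset 320 MiB, length 64 MiB).
object SplitMath {
  def decodeSplit(spec: String): (Long, Long) = {
    // "+" is a regex metacharacter, so it must be escaped for split
    val Array(off, len) = spec.split("\\+").map(_.trim.toLong)
    (off, len)
  }
}
```

So each task in the log is reading one 64 MiB slice of the ~1.2GB file.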
On Nov 22, 2014 7:23 AM, Nitay Joffe ni...@actioniq.co wrote:
Err I meant #1 :)
- Nitay
Founder CTO
On Sat, Nov 22, 2014 at 10:20 AM, Nitay Joffe ni...@actioniq.co wrote:
Anyone have any thoughts on this? Trying to understand especially #2 if
it's a legit bug or something I'm doing wrong.
- Nitay
Founder CTO
On Thu, Nov 20, 2014 at 11:54 AM, Nitay Joffe ni...@actioniq.co wrote:
I have a simple S3 job to read a text file and do a line count.
Specifically I'm doing sc.textFile("s3n://mybucket/myfile").count. The
file is about 1.2GB. My setup is a standalone Spark cluster with 4 workers,
each with 2 cores / 16GB RAM. I'm using branch-1.2 code built against
Hadoop 2.4 (though