Re: Importing csv files into Hive ORC target table

2016-02-18 Thread Alex Dzhagriev
Hi Mich,

Try to use a regexp to parse your string instead of the split.

Thanks,
Alex.

On Thu, Feb 18, 2016 at 6:35 PM, Mich Talebzadeh <mich.talebza...@cloudtechnologypartners.co.uk> wrote:

> thanks,
>
> I have an issue here.
>
> define rdd to read the CSV file
>
> scala> var csv =
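A minimal sketch of the regexp suggestion, in plain Scala with no Spark dependency. It assumes the trouble with split is quoted fields that contain commas, which the snippet above does not confirm. The object name CsvLineRegex, the five-field layout (taken from the five columns of the staging table) and the sample row are illustrative assumptions, not data from the thread.

```scala
object CsvLineRegex {
  // One field: a double-quoted run (commas allowed inside) or an
  // unquoted run containing neither commas nor quotes.
  private val field = """\s*(?:"([^"]*)"|([^,"]*))\s*"""

  // Assumption: exactly five comma-separated fields per line.
  private val line = ("^" + Seq.fill(5)(field).mkString(",") + "$").r

  // Returns the five fields, or None when the line does not fit the pattern.
  def parse(s: String): Option[Vector[String]] =
    line.findFirstMatchIn(s).map { m =>
      // Each field owns two capture groups: odd = quoted, even = unquoted.
      (1 to 10 by 2).map { i =>
        Option(m.group(i)).getOrElse(m.group(i + 1)).trim
      }.toVector
    }
}
```

With a hypothetical row such as 360,10/02/2014,"1,200.00",240.00,"1,440.00", a plain split(",") yields seven pieces, while CsvLineRegex.parse keeps the quoted amounts whole and returns five fields. In an RDD pipeline the same function could be applied with flatMap so that lines that do not match are dropped rather than failing the job.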

Importing csv files into Hive ORC target table

2016-02-18 Thread Mich Talebzadeh
> thanks,
>
> I have an issue here.
>
> define rdd to read the CSV file
>
> scala> var csv = sc.textFile("/data/stg/table2")
> csv: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[69] at textFile at <console>:27
>
> I then get rid of the header
>
> scala> val csv2 =

Re: Importing csv files into Hive ORC target table

2016-02-17 Thread Alex Dzhagriev
Hi Mich,

You can use data frames (http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes) to achieve that.

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
var rdd = sc.textFile("/data/stg/table2")
//...
// perform your business logic, cleanups, etc.
//...

Importing csv files into Hive ORC target table

2016-02-17 Thread Mich Talebzadeh
Hi,

We put csv files that are zipped using bzip into a staging area on HDFS.

In Hive an external table is created as below:

DROP TABLE IF EXISTS stg_t2;
CREATE EXTERNAL TABLE stg_t2 (
 INVOICENUMBER string
,PAYMENTDATE string
,NET string
,VAT string
,TOTAL string
)
COMMENT 'from csv file