Hi All,
In the above scenario, if the field delimiter is Hive's default, then Spark
is able to parse the data as expected; hence I believe this is a bug.
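
For reference, Hive's default field delimiter is the control character
\u0001 (Ctrl-A), not a comma, which is consistent with the behaviour
described below: a comma-delimited line split on the default delimiter
stays in one field. A minimal plain-Scala sketch (the sample row mirrors
the data below):

    // Hive's LazySimpleSerDe default field delimiter is \u0001 (Ctrl-A).
    val hiveDefault = "\u0001"
    val line = "1,1,1" // a row as written to HDFS

    line.split(hiveDefault).length // 1 - the whole line stays in column a
    line.split(",").length         // 3 - the comma recovers all columns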
Regards,
Shiva Achari
On Tue, Apr 5, 2016 at 8:15 PM, Shiva Achari wrote:
> Hi,
>
> I have created a hive external table stored as textfile partitioned by
> event_date Date.
>
> How do we specify a custom field delimiter (such as a comma) when reading
> a Hive table in Spark?
>
> The environment is
>
> 1. Spark 1.5.0 - cdh5.5.1, Scala version 2.10.4 (Java
> HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
> 2. Hive 1.1, CDH 5.5.1
>
> scala script
>
> sqlContext.setConf("hive.exec.dynamic.partition", "true")
> sqlContext.setConf("hive.exec.dynamic.partition.mode",
> "nonstrict")
>
> val distData = sc.parallelize(Array((1, 1, 1), (2, 2, 2), (3, 3,
> 3))).toDF
> val distData_1 = distData.withColumn("event_date", current_date())
> distData_1: org.apache.spark.sql.DataFrame = [_1: int, _2: int,
> _3: int, event_date: date]
>
> scala> distData_1.show
> +---+---+---+----------+
> | _1| _2| _3|event_date|
> +---+---+---+----------+
> |  1|  1|  1|2016-03-25|
> |  2|  2|  2|2016-03-25|
> |  3|  3|  3|2016-03-25|
> +---+---+---+----------+
>
>
> distData_1.write.mode("append").partitionBy("event_date").saveAsTable("part_table")
>
>
> scala> sqlContext.sql("select * from part_table").show
> +-----+----+----+----------+
> |    a|   b|   c|event_date|
> +-----+----+----+----------+
> |1,1,1|null|null|2016-03-25|
> |2,2,2|null|null|2016-03-25|
> |3,3,3|null|null|2016-03-25|
> +-----+----+----+----------+
>
>
>
> Hive table
>
> create external table part_table (a String, b int, c bigint)
> partitioned by (event_date Date)
> row format delimited fields terminated by ','
> stored as textfile LOCATION "/user/hdfs/hive/part_table";
>
> select * from part_table shows
> | part_table.a | part_table.b | part_table.c | part_table.event_date |
> | 1            | 1            | 1            | 2016-03-25            |
> | 2            | 2            | 2            | 2016-03-25            |
> | 3            | 3            | 3            | 2016-03-25            |
>
>
> Looking at the hdfs
>
>
> The path has 2 part files
> /user/hdfs/hive/part_table/event_date=2016-03-25
> part-0
> part-1
>
> part-0 content
> 1,1,1
> part-1 content
> 2,2,2
> 3,3,3
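>
> Since the part files themselves are comma-delimited, one possible
> workaround is to bypass the Hive SerDe and parse the partition files
> directly in the Spark shell. A sketch, assuming the path and schema
> from the DDL above:
>
>     // Read the raw text files for one partition and split on the comma.
>     val raw = sc.textFile("/user/hdfs/hive/part_table/event_date=2016-03-25")
>     val parsed = raw.map(_.split(","))
>       .map { case Array(a, b, c) => (a, b.toInt, c.toLong) }
>       .toDF("a", "b", "c")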
>
>
> P.S. If we store the table as ORC, Spark writes and reads the data as
> expected.
>
>