I am just starting to use Parquet, and I have hit a big problem. My table is created with:

    CREATE TABLE IF NOT EXISTS demand_parquet (
        account int,
        site int,
        carrier int,
        cluster string,
        os string,
        token string,
        idfa string,
        dpid_sha1 string,
        bid_requests int)
    PARTITIONED BY (date string)
    CLUSTERED BY (dpid_sha1) INTO 256 BUCKETS
    STORED AS PARQUET;
I insert with this statement:

    FROM demand_carrier
    INSERT OVERWRITE TABLE sfr_demand_carrier_parquet PARTITION (date)
    SELECT date, account, site, carrier, cluster, os, token, idfa, dpid_sha1, bid_requests;

Everything seems to work fine (my Hadoop job completes), but on the Hive console output I see a never-ending log (over 12 hours and counting). If I stop Hive during this logging phase, the Parquet file is destroyed. I assume I must be doing something fundamentally wrong. Presumably no one else has this excess-logging issue, yet it seems there is no way of turning the logging off without recompiling Parquet. What am I missing?

    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Dictionary is on
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Validation is off
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.InternalParquetRecordWriter: Flushing mem store to file. allocated memory: 47,605,338
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [account] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [site] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [carrier] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [cluster] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [os] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [token] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [idfa] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    .......
    Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.InternalParquetRecordWriter: Flushing mem store to file. allocated memory: 47,605,338
    Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [account] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [site] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [carrier] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [cluster] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [os] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
    Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [token] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []

In the individual reduce logs I see:

    Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 23B for [account] INT32: 1 values, 6B raw, 6B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
    Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 27B for [site] INT32: 1 values, 10B raw, 10B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
    Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 27B for [carrier] INT32: 1 values, 10B raw, 10B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
    Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 30B for [cluster] BINARY: 1 values, 13B raw, 13B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
    Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 30B for [os] BINARY: 1 values, 13B raw, 13B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
    Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 34B for [token] BINARY: 1 values, 17B raw, 17B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
    Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 38B for [idfa] BINARY: 1 values, 21B raw, 21B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
    Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 27B for [dpid_sha1] BINARY: 1 values, 10B raw, 10B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
    Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 23B for [bid_requests] INT32: 1 values, 6B raw, 6B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
    Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
    Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
    Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
    Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
    Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
    Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Dictionary is on
    Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Validation is off
    Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
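The only non-recompile workaround I can see is to go at java.util.logging directly: the timestamp-prefixed format above looks like Parquet's own JUL console handler rather than Hive's log4j output, which would explain why Hive's logger settings never touch it. Below is a minimal, untested sketch of that idea. It assumes the logger hierarchy is rooted at "parquet" (which the class names in the output suggest) and that a class named parquet.Log installs the handler from a static initializer (an assumption about this build; the class name may differ), and it would have to run inside the same JVM as the writer.

    import java.util.logging.Handler;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    public final class QuietParquetLogs {

        /** Raise the "parquet" logger and its handlers to SEVERE so the
         *  per-column INFO records are dropped. */
        public static void silence() {
            try {
                // Assumption: parquet.Log's static initializer installs the
                // console handler; load the class first so the handler exists
                // before we adjust levels.
                Class.forName("parquet.Log");
            } catch (ClassNotFoundException e) {
                // Different package layout in this build; fall through and
                // adjust whatever handlers are already registered.
            }
            Logger parquetLogger = Logger.getLogger("parquet");
            parquetLogger.setLevel(Level.SEVERE);
            for (Handler handler : parquetLogger.getHandlers()) {
                handler.setLevel(Level.SEVERE);
            }
        }

        public static void main(String[] args) {
            silence();
        }
    }

A logging.properties file passed with -Djava.util.logging.config.file (for example via HADOOP_CLIENT_OPTS) would be the declarative alternative, but if the level is set programmatically in that static initializer, as the "can't turn it off without recompiling" behaviour suggests, only a call made after the class has loaded, like the one above, will stick.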
