Thanks for the counting solution! Zheng, I've uploaded the S3 log parser to HIVE-693.

Among other things, I also noticed a Hive bug today: when using Hive in server mode (via Python) to import 400 different partitions one after another, the datanode started reporting "too many open files" errors in its logs. Analysis showed that Hive does not close its connections to the datanode at all when doing loads like this:

LOAD DATA LOCAL INPATH '/home/neith/xshairlogs//mapped-2009-04-24.gz' OVERWRITE INTO TABLE shairlogs PARTITION (pdate='2009-04-24')

(I don't know whether this also happens in CLI mode -- running 300 commands like that is a bit too tedious just to see if it can be reproduced. :)

bye
andraz

> From: Zheng Shao <zsh...@gmail.com>
> Subject: Re: counting different regexes in a single pass
> Date: Mon, 27 Jul 2009 20:46:24 GMT
>
> Hi Andraz,
>
> I just opened a JIRA for the AWS S3 log format. Can you attach a patch file to:
> https://issues.apache.org/jira/browse/HIVE-693 ?
>
> For your question, I think the approach suggested by David Lerman should work fine.

-- 
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
mail: and...@zemanta.com
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
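PS: for anyone trying to reproduce the repeated-load scenario above, here is a minimal sketch of how I generate one LOAD DATA statement per daily partition. The `load_statements` helper, the date range, and the base path are illustrative (the table name and path shape come from my example above); each statement would then be sent to the Hive server through whatever client you use, which is not shown here.

```python
from datetime import date, timedelta

def load_statements(start, end, base="/home/neith/xshairlogs"):
    """Yield one LOAD DATA statement per day in [start, end] (inclusive)."""
    day = start
    while day <= end:
        d = day.isoformat()  # e.g. '2009-04-24', used as partition value
        yield (f"LOAD DATA LOCAL INPATH '{base}/mapped-{d}.gz' "
               f"OVERWRITE INTO TABLE shairlogs PARTITION (pdate='{d}')")
        day += timedelta(days=1)

# One statement per April 2009 day; executing a few hundred of these
# back-to-back against the Hive server triggers the leak described above.
stmts = list(load_statements(date(2009, 4, 1), date(2009, 4, 30)))
print(len(stmts))
```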
Among other things, I also noticed a Hive bug today: when using hive in server mode (via python) to import 400 different partitions one after another, datanode started reporting "too many open files" errors in its logs. The analysis showed that Hive is not closing connections to datanode at all when doing loads like this: LOAD DATA LOCAL INPATH '/home/neith/xshairlogs//mapped-2009-04-24.gz' OVERWRITE INTO TABLE shairlogs PARTITION (pdate='2009-04-24') [don't know if that happens also in CLI mode - doing 300 commands like that is a bit too tedious to see if it can be reproduced :] bye andraz >From Zheng Shao <zsh...@gmail.com> Subject Re: counting different regexes in a single pass Date Mon, 27 Jul 2009 20:46:24 GMT Hi Andraz, I just opened a JIRA for AWS S3 log format. Can you attach a patch file to: https://issues.apache.org/jira/browse/HIVE-693 ? For your question, I think the approach suggested by David Lerman should work fine. -- Andraz Tori, CTO Zemanta Ltd, New York, London, Ljubljana www.zemanta.com mail: and...@zemanta.com tel: +386 41 515 767 twitter: andraz, skype: minmax_test