Re: How to skip malformed records while loading data

2011-08-24 Thread Ashutosh Chauhan
One possibility is to filter out the NULLs with something like the following
(a comparison such as id != NULL always evaluates to NULL in HiveQL, so you
need IS NOT NULL):

hive> select * from tb where id is not null and pref is not null and zip is not null;

This is not the most efficient approach, but it will work.
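
If you need the cleaned rows more than once, a rough sketch is to materialize
them once with CTAS (the table name tb_clean is only an illustrative choice):

hive> create table tb_clean as
      select * from tb
      where id is not null and pref is not null and zip is not null;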

2011/8/18 XieXianshan xi...@cn.fujitsu.com

 Hi everyone,

 Is there an option to ignore malformed records while loading data
 into a Hive table?
 Or an option to ignore bad rows when querying the data?

 For instance:
 1. Specify a row format explicitly for a new table.
 hive> create table tb (id int, pref string, zip string) row format
 delimited fields terminated by ',' lines terminated by '\n';

 2. Load data into the table from a CSV file that contains bad records.
 hive> load data local inpath 'data.csv' overwrite into table tb;

 The data.csv might look like:
 32,aaa,422
 --Blank line
 33:bbb:423 --Invalid field delimiter ':'
 aa,ccc,424 --Non-integer value 'aa'

 3. Select data
 hive> select * from tb;
 OK
 32 aaa 422
 NULL NULL NULL
 NULL NULL NULL
 NULL ccc 424
 Time taken: 0.196 seconds

 I have tried setting mapred.skip.map.max.skip.records, but it does not seem
 to work.

 Thanks in advance.

 Regards,
 Xie

 --
 Best Regards
 Xie Xianshan
 --
 Xie Xianshan
 Dept.IV of Technology and Development
 Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
 No. 6 Wenzhu Road, Nanjing, China
 PostCode: 210012
 PHONE: +86+25-86630566-8522
 FUJITSU INTERNAL: 7998-8522
 MAIL: xi...@cn.fujitsu.com
 --




How to skip malformed records while loading data

2011-08-18 Thread XieXianshan
Hi everyone,

Is there an option to ignore malformed records while loading data
into a Hive table?
Or an option to ignore bad rows when querying the data?

For instance:
1. Specify a row format explicitly for a new table.
hive> create table tb (id int, pref string, zip string) row format
delimited fields terminated by ',' lines terminated by '\n';

2. Load data into the table from a CSV file that contains bad records.
hive> load data local inpath 'data.csv' overwrite into table tb;

The data.csv might look like:
32,aaa,422
--Blank line
33:bbb:423 --Invalid field delimiter ':'
aa,ccc,424 --Non-integer value 'aa'

3. Select data
hive> select * from tb;
OK
32 aaa 422
NULL NULL NULL
NULL NULL NULL
NULL ccc 424
Time taken: 0.196 seconds

I have tried setting mapred.skip.map.max.skip.records, but it does not seem
to work.
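
For reference, this is roughly how such a property is set from the Hive CLI
(the value 1 is only illustrative):

hive> set mapred.skip.map.max.skip.records=1;
hive> select * from tb;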

Thanks in advance.

Regards,
Xie

-- 
Best Regards
Xie Xianshan
--
Xie Xianshan
Dept.IV of Technology and Development
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, China
PostCode: 210012
PHONE: +86+25-86630566-8522
FUJITSU INTERNAL: 7998-8522
MAIL: xi...@cn.fujitsu.com
--
This communication is for use by the intended recipient(s) only and may
contain information that is privileged, confidential and exempt from
disclosure under applicable law. If you are not an intended recipient of
this communication, you are hereby notified that any dissemination,
distribution or copying hereof is strictly prohibited.  If you have
received this communication in error, please notify me by reply e-mail,
permanently delete this communication from your system, and destroy any
hard copies you may have printed.


