Re: How to skip the malformatted records while loading data
One possibility is to filter out the NULLs at query time, something like the following:

    hive> select * from tb where id is not null or pref is not null or zip is not null;

The test has to be written with IS NOT NULL: in Hive, as in standard SQL, a comparison such as id != NULL evaluates to NULL rather than true, so it would match no rows. With "or", only rows whose fields are all NULL are dropped; use "and" instead to drop every row that has any NULL field. This is not the most efficient approach, but it will work.

On 2011/8/18, XieXianshan <xi...@cn.fujitsu.com> wrote:

> Hi, everyone,
>
> Is there an option to ignore malformed records while loading data into a
> Hive table? Or an option to ignore bad rows while querying data?
>
> For instance:
>
> 1. Specify a row format explicitly for a new table.
>
>     hive> create table tb (id int, pref string, zip string) row format delimited fields terminated by ',' lines terminated by '\n';
>
> 2. Load data into the table from a CSV file that contains bad records.
>
>     hive> load data local inpath 'data.csv' overwrite into table tb;
>
>    The data.csv might look like:
>
>     32,aaa,422
>                     -- blank line
>     33:bbb:423      -- invalid field delimiter ':'
>     aa,ccc,424      -- non-int value 'aa'
>
> 3. Select data.
>
>     hive> select * from tb;
>     OK
>     32      aaa     422
>     NULL    NULL    NULL
>     NULL    NULL    NULL
>     NULL    ccc     424
>     Time taken: 0.196 seconds
>
> I have tried setting mapred.skip.map.max.skip.records, but it does not
> seem to work.
>
> Thanks in advance.
>
> Regards,
> Xie
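An alternative to filtering at query time is to drop the bad lines while loading, by going through a staging table. The following is only a sketch under stated assumptions, not a built-in Hive option: the staging table name tb_raw is hypothetical, the data is assumed to contain no \001 (Ctrl-A) characters so that each whole line lands in the single string column, and the pattern assumes id is a non-negative integer that fits in an int.

```sql
-- Hypothetical staging table: one string column holding each raw line.
-- With the default field delimiter (\001), a comma-separated line is not split.
create table tb_raw (line string);

load data local inpath 'data.csv' overwrite into table tb_raw;

-- Copy only the lines that look like "<digits>,<text>,<text>" into the typed table.
-- Blank lines, lines using ':' as the delimiter, and lines with a non-numeric id
-- do not match the pattern and are skipped.
insert overwrite table tb
select cast(split(line, ',')[0] as int),
       split(line, ',')[1],
       split(line, ',')[2]
from tb_raw
where line rlike '^[0-9]+,[^,]*,[^,]*$';
```

With the sample data.csv from the question, only the line 32,aaa,422 matches the pattern, so tb would end up with that single row. The cost is one extra pass over the data and a second copy of the raw file in the warehouse.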
How to skip the malformatted records while loading data
Hi, everyone,

Is there an option to ignore malformed records while loading data into a Hive table? Or an option to ignore bad rows while querying data?

For instance:

1. Specify a row format explicitly for a new table.

    hive> create table tb (id int, pref string, zip string) row format delimited fields terminated by ',' lines terminated by '\n';

2. Load data into the table from a CSV file that contains bad records.

    hive> load data local inpath 'data.csv' overwrite into table tb;

   The data.csv might look like:

    32,aaa,422
                    -- blank line
    33:bbb:423      -- invalid field delimiter ':'
    aa,ccc,424      -- non-int value 'aa'

3. Select data.

    hive> select * from tb;
    OK
    32      aaa     422
    NULL    NULL    NULL
    NULL    NULL    NULL
    NULL    ccc     424
    Time taken: 0.196 seconds

I have tried setting mapred.skip.map.max.skip.records, but it does not seem to work.

Thanks in advance.

Regards,
Xie

--
Xie Xianshan
Dept. IV of Technology and Development
Nanjing Fujitsu Nanda Software Tech. Co., Ltd. (FNST)
MAIL: xi...@cn.fujitsu.com