Re: Map Join Problems
Hi Peter,

There are other parameters that work with hive.auto.convert.join. Please check "hive.hashtable.max.memory.usage" and "hive.smalltable.filesize". They are fairly self-explanatory, but if you need more details you can check the auto map join JIRA ticket: https://issues.apache.org/jira/browse/HIVE-1642.

On Tue, May 28, 2013 at 11:17 AM, Peter Chu wrote:
> Using Hive 0.8.1 on an Amazon EMR Hadoop job.
>
> Some problems with using mapjoin:
>
> 1) Exceeded memory: I got the errors below. I then removed mapjoin from the
> query and instead set hive.auto.convert.join=true, thinking that would let
> Hive decide when a map join is suitable. The job runs much farther, but then
> hits a similar error toward the end.
>
> 2) I then tried mapjoin in the same query as before and set
> hive.mapjoin.localtask.max.memory.usage=3; same exact error.
>
> My question: are there any other settings I can use to increase mapjoin
> memory or hashtable size? Or are there better options?
>
> 2013-05-25 11:37:39 Starting to launch local task to process map join; maximum memory = 932118528
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 2013-05-25 11:37:57 Processing rows: 20 Hashtable size: 19 Memory usage: 776687416 rate: 0.833
> 2013-05-25 11:38:00 Processing rows: 215031 Hashtable size: 215031 Memory usage: 813018320 rate: 0.872
> 2013-05-25 11:38:00 Dump the hashtable into file: file:/tmp/hadoop/hive_2013-05-25_23-37-37_320_2027014861824847272/-local-10006/HashTable-Stage-6/MapJoin-bu-21--.hashtable
> Execution failed with exit status: 2
> Obtaining error information
>
> Task failed!
> Task ID: Stage-10
>
> Logs:
>
> 3) Please look at the four error examples below. My other question: across
> all the runs with that mapjoin error there is a pattern. The mapjoin is done
> on bu, in the job just before the error, and every error happens just 4 rows
> shy of the full bu table in the mapjoin. I find this too much of a
> coincidence; can someone please offer some insight?
>
> #1:
>
> 215035 Rows loaded to hdfs://10.190.182.26:9000/mnt/var/lib/hive_081/tmp/scratch/hive_2013-05-25_23-35-49_100_9059150281675034748/-ext-1
> MapReduce Jobs Launched:
> Job 0: Map: 18 Reduce: 13 Accumulative CPU: 139.31 sec HDFS Read: 1226414348 HDFS Write: 2179 SUCCESS
> Job 1: Map: 9 Accumulative CPU: 54.1 sec HDFS Read: 687306237 HDFS Write: 695722636 SUCCESS
> Job 2: Map: 16 Accumulative CPU: 89.09 sec HDFS Read: 695838641 HDFS Write: 703096594 SUCCESS
> Total MapReduce CPU Time Spent: 4 minutes 42 seconds 500 msec
> OK
> Time taken: 108.206 seconds
> OK
> Time taken: 0.013 seconds
> Total MapReduce jobs = 3
> Execution log at: /tmp/hadoop/hadoop_20130525233737_8911fdca-6536-45bb-aac3-19b92b3de99c.log
> 2013-05-25 11:37:39 Starting to launch local task to process map join; maximum memory = 932118528
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 2013-05-25 11:37:57 Processing rows: 20 Hashtable size: 19 Memory usage: 776687416 rate: 0.833
> 2013-05-25 11:38:00 Processing rows: 215031 Hashtable size: 215031 Memory usage: 813018320 rate: 0.872
> 2013-05-25 11:38:00 Dump the hashtable into file: file:/tmp/hadoop/hive_2013-05-25_23-37-37_320_2027014861824847272/-local-10006/HashTable-Stage-6/MapJoin-bu-21--.hashtable
> Execution failed with exit status: 2
> Obtaining error information
>
> Task failed!
> Task ID: Stage-10
>
> Logs:
>
> #2:
>
> Table default.badurls stats: [num_partitions: 0, num_files: 18, num_rows: 0, total_size: 701922144, raw_data_size: 0]
> 214618 Rows loaded to hdfs://10.46.205.55:9000/mnt/var/lib/hive_081/tmp/scratch/hive_2013-05-25_23-12-54_513_8781300101638774300/-ext-1
> MapReduce Jobs Launched:
> Job 0: Map: 21 Reduce: 13 Accumulative CPU: 142.11 sec HDFS Read: 1225164183 HDFS Write: 2179 SUCCESS
> Job 1: Map: 9 Accumulative CPU: 53.25 sec HDFS Read: 686157231 HDFS Write: 694562725 SUCCESS
> Job 2: Map: 18 Accumulative CPU: 92.26 sec HDFS Read: 694650326 HDFS Write: 701922144 SUCCESS
> Total MapReduce CPU Time Spent: 4 minutes 47 seconds 620 msec
> OK
> Time taken: 104.744 seconds
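[Editor's example] The settings named in the reply are session-settable from the Hive CLI. The values below are illustrative assumptions only, not numbers from this thread (defaults in Hive of this era were roughly 25 MB for hive.smalltable.filesize and 0.90 for hive.hashtable.max.memory.usage):

```sql
-- Let Hive decide when to convert a common join to a map join:
SET hive.auto.convert.join=true;

-- Largest file size (bytes) at which a table is still considered "small"
-- enough for the automatic map join conversion (value illustrative):
SET hive.smalltable.filesize=50000000;

-- Fraction of the local task's heap that the in-memory hashtable may
-- consume before the local task aborts (value illustrative):
SET hive.hashtable.max.memory.usage=0.9;
```

Note also that the "maximum memory = 932118528" figure in the log is the heap of the client-side local task that builds the hashtable, so raising the client JVM heap (commonly done via the HADOOP_HEAPSIZE environment variable, though that detail is an assumption rather than something confirmed in the thread) is another possible lever.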
Map Join Problems
Using Hive 0.8.1 on an Amazon EMR Hadoop job. Some problems with using mapjoin:

1) Exceeded memory: I got the errors below. I then removed mapjoin from the query and instead set hive.auto.convert.join=true, thinking that would let Hive decide when a map join is suitable. The job runs much farther, but then hits a similar error toward the end.

2) I then tried mapjoin in the same query as before and set hive.mapjoin.localtask.max.memory.usage=3; same exact error.

My question: are there any other settings I can use to increase mapjoin memory or hashtable size? Or are there better options?

2013-05-25 11:37:39 Starting to launch local task to process map join; maximum memory = 932118528
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2013-05-25 11:37:57 Processing rows: 20 Hashtable size: 19 Memory usage: 776687416 rate: 0.833
2013-05-25 11:38:00 Processing rows: 215031 Hashtable size: 215031 Memory usage: 813018320 rate: 0.872
2013-05-25 11:38:00 Dump the hashtable into file: file:/tmp/hadoop/hive_2013-05-25_23-37-37_320_2027014861824847272/-local-10006/HashTable-Stage-6/MapJoin-bu-21--.hashtable
Execution failed with exit status: 2
Obtaining error information

Task failed!
Task ID: Stage-10

Logs:

3) Please look at the four error examples below.
My other question: across all the runs with that mapjoin error there is a pattern. The mapjoin is done on bu, in the job just before the error, and every error happens just 4 rows shy of the full bu table in the mapjoin. I find this too much of a coincidence; can someone please offer some insight?

#1:

215035 Rows loaded to hdfs://10.190.182.26:9000/mnt/var/lib/hive_081/tmp/scratch/hive_2013-05-25_23-35-49_100_9059150281675034748/-ext-1
MapReduce Jobs Launched:
Job 0: Map: 18 Reduce: 13 Accumulative CPU: 139.31 sec HDFS Read: 1226414348 HDFS Write: 2179 SUCCESS
Job 1: Map: 9 Accumulative CPU: 54.1 sec HDFS Read: 687306237 HDFS Write: 695722636 SUCCESS
Job 2: Map: 16 Accumulative CPU: 89.09 sec HDFS Read: 695838641 HDFS Write: 703096594 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 42 seconds 500 msec
OK
Time taken: 108.206 seconds
OK
Time taken: 0.013 seconds
Total MapReduce jobs = 3
Execution log at: /tmp/hadoop/hadoop_20130525233737_8911fdca-6536-45bb-aac3-19b92b3de99c.log
2013-05-25 11:37:39 Starting to launch local task to process map join; maximum memory = 932118528
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2013-05-25 11:37:57 Processing rows: 20 Hashtable size: 19 Memory usage: 776687416 rate: 0.833
2013-05-25 11:38:00 Processing rows: 215031 Hashtable size: 215031 Memory usage: 813018320 rate: 0.872
2013-05-25 11:38:00 Dump the hashtable into file: file:/tmp/hadoop/hive_2013-05-25_23-37-37_320_2027014861824847272/-local-10006/HashTable-Stage-6/MapJoin-bu-21--.hashtable
Execution failed with exit status: 2
Obtaining error information

Task failed!
Task ID: Stage-10

Logs:

#2:

Table default.badurls stats: [num_partitions: 0, num_files: 18, num_rows: 0, total_size: 701922144, raw_data_size: 0]
214618 Rows loaded to hdfs://10.46.205.55:9000/mnt/var/lib/hive_081/tmp/scratch/hive_2013-05-25_23-12-54_513_8781300101638774300/-ext-1
MapReduce Jobs Launched:
Job 0: Map: 21 Reduce: 13 Accumulative CPU: 142.11 sec HDFS Read: 1225164183 HDFS Write: 2179 SUCCESS
Job 1: Map: 9 Accumulative CPU: 53.25 sec HDFS Read: 686157231 HDFS Write: 694562725 SUCCESS
Job 2: Map: 18 Accumulative CPU: 92.26 sec HDFS Read: 694650326 HDFS Write: 701922144 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 47 seconds 620 msec
OK
Time taken: 104.744 seconds
OK
Time taken: 0.013 seconds
Total MapReduce jobs = 3
Execution log at: /tmp/hadoop/hadoop_20130525231414_3cc50fdd-7e7a-4bcf-baab-b465804c6e49.log
2013-05-25 11:14:41 Starting to launch local task to process map join; maximum memory = 932118528
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder
Wrapping an analytical function
Hi there,

I am trying the following:

select case
         when ta_end_datetime_berekenen = 'Y'
         then lead(ta_update_datetime) over (
                partition by dn_waarde_van, dn_waarde_tot
                order by ta_update_datetime )
         else ea_end_datetime
       end ea_end_datetime,
       ta_insert_datetime,
       ta_update_datetime
from tmp_wtdh_bestedingsklasse_10_s2_stap2

However, when I try that, I get the following error:

NoViableAltException(86@[129:7: ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )?])
FAILED: ParseException line 1:175 missing KW_END at 'over' near ')' in selection target
line 1:254 cannot recognize input near 'else' 'ea_end_datetime' 'end' in selection target

Would I be correct in assuming that it's not possible to wrap an analytic function in another function? This is with Hive 0.11.

Any feedback is appreciated!

Kind regards,
Frans Drijver
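[Editor's example] A common workaround for this class of parse error (an assumption on my part, not something confirmed in this thread) is to compute the windowed value in a subquery and apply the CASE in the outer query, so the analytic function is never nested inside another expression. The alias next_ta_update_datetime is hypothetical:

```sql
SELECT CASE
         WHEN ta_end_datetime_berekenen = 'Y' THEN next_ta_update_datetime
         ELSE ea_end_datetime
       END AS ea_end_datetime,
       ta_insert_datetime,
       ta_update_datetime
FROM (
  SELECT ta_end_datetime_berekenen,
         ea_end_datetime,
         ta_insert_datetime,
         ta_update_datetime,
         -- hypothetical alias for the windowed value
         LEAD(ta_update_datetime) OVER (
           PARTITION BY dn_waarde_van, dn_waarde_tot
           ORDER BY ta_update_datetime
         ) AS next_ta_update_datetime
  FROM tmp_wtdh_bestedingsklasse_10_s2_stap2
) t;
```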
Re: Partitioning confusion
If you have a 250GB file, then how did it become 2.5TB? If you cannot write a MapReduce job to write your data into specific partitions, the other way around would be to load the entire data into a temporary table, then load data from it into the partitioned table, and then drop the temporary table.

The approach would be:

1) Create a temporary table:

CREATE TABLE temp_employees (name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING>);

2) Load data into this table:

LOAD DATA LOCAL INPATH '/home/satish/data/employees/input/employees-country.txt' INTO TABLE temp_employees;

3) Create the partitioned table:

CREATE TABLE employees (name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING>) PARTITIONED BY (country STRING, state STRING);

4) Load data from the temporary table into the partitioned table:

4a) If partitions already exist, be careful with a blanket query or it will overwrite data in existing partitions:

INSERT OVERWRITE TABLE employees PARTITION (country='USA', state='IL')
SELECT * FROM temp_employees WHERE address.country = 'USA' AND address.state = 'IL';

You will need to do this for each partition.

4b) If your partitioned table is currently empty, you can just enable dynamic partitioning and Hive will take care of adding the correct data to the individual partitions:

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT OVERWRITE TABLE employees PARTITION (country, state)
SELECT name, salary, subordinates, deductions, address, address.country, address.state FROM temp_employees;

5) Drop the temporary table.

With the above approach your data will be replicated at 2x the replication factor for some time, but once you drop the temporary table it will be back to the normal HDFS replication-factor size.
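[Editor's example] After loading, it is worth sanity-checking that rows actually landed in separate partitions. A minimal sketch, using the table name from this thread (the queries assume the partitioned employees table has been populated):

```sql
-- List the partitions Hive created:
SHOW PARTITIONS employees;

-- Count rows per partition; a predicate on the partition columns lets
-- Hive prune the scan to the country=USA/state=IL directory only:
SELECT country, state, COUNT(*) FROM employees GROUP BY country, state;
SELECT COUNT(*) FROM employees WHERE country = 'USA' AND state = 'IL';
```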
It's been some time since I worked on Hive, so others may have better approaches as well; wait for someone to correct me before going further :)

On Mon, May 27, 2013 at 2:13 PM, Sai Sai wrote:
> Nitin
> I am still confused. From the data below, should the file which sits in the
> folder country=USA/state=IL have only the rows where country=USA and
> state=IL, or will it have rows of other countries also?
> The reason I ask is that if we have a 250GB file and would like to create 10
> partitions, that would end up as 2.5TB * 3 = 7.5TB. Is this expected?
> Thanks
> S
>
> From: Nitin Pawar
> To: user@hive.apache.org; Sai Sai
> Sent: Monday, 27 May 2013 2:08 PM
> Subject: Re: Partitioning confusion
>
> When you specify the LOAD DATA query with a specific partition, it will put
> the entire data into that partition.
>
> On Mon, May 27, 2013 at 1:08 PM, Sai Sai wrote:
>
> After creating a partition for a country (USA) and state (IL), when we go to
> the HDFS site to look at the partition in the browser we are seeing all the
> records for all the countries and states rather than just for the partition
> created for USA and IL, as given below. Is this correct behavior?
>
> Here are my commands:
>
> CREATE TABLE employees (name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING>) PARTITIONED BY (country STRING, state STRING);
>
> LOAD DATA LOCAL INPATH '/home/satish/data/employees/input/employees-country.txt' INTO TABLE employees PARTITION (country='USA', state='IL');
>
> Here is my original data file, where I have data for a few countries such as USA, INDIA, UK, AUS:
>
> John Doe10.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600USA
> Mary Smith8.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601USA
> Todd Jones7.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700USA
> Bill King6.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100USA
> Boss Man20.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500USA
> Fred Finance15.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500USA
> Stacy Accountant6.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563USA
> John Doe 210.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600INDIA
> Mary Smith 28.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601INDIA
> Todd Jones 27.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700AUSTRALIA
> Bill King 26.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100AUSTRALIA
> Boss Man2 20.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500UK
> Fred Finance 215.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500UK
> Stacy Accountant 26.0Federal Taxes.
Re: Partitioning confusion
Nitin

I am still confused. From the data below, should the file which sits in the folder country=USA/state=IL have only the rows where country=USA and state=IL, or will it have rows of other countries also? The reason I ask is that if we have a 250GB file and would like to create 10 partitions, that would end up as 2.5TB * 3 = 7.5TB. Is this expected?
Thanks
S

From: Nitin Pawar
To: user@hive.apache.org; Sai Sai
Sent: Monday, 27 May 2013 2:08 PM
Subject: Re: Partitioning confusion

When you specify the LOAD DATA query with a specific partition, it will put the entire data into that partition.

On Mon, May 27, 2013 at 1:08 PM, Sai Sai wrote:
>
> After creating a partition for a country (USA) and state (IL), when we go to
> the HDFS site to look at the partition in the browser we are seeing all the
> records for all the countries and states rather than just for the partition
> created for USA and IL, as given below. Is this correct behavior?
>
> Here are my commands:
>
> CREATE TABLE employees (name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING>) PARTITIONED BY (country STRING, state STRING);
>
> LOAD DATA LOCAL INPATH '/home/satish/data/employees/input/employees-country.txt' INTO TABLE employees PARTITION (country='USA', state='IL');
>
> Here is my original data file, where I have data for a few countries such as USA, INDIA, UK, AUS:
>
> John Doe10.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600USA
> Mary Smith8.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601USA
> Todd Jones7.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700USA
> Bill King6.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100USA
> Boss Man20.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500USA
> Fred Finance15.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500USA
> Stacy Accountant6.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563USA
> John Doe 210.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600INDIA
> Mary Smith 28.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601INDIA
> Todd Jones 27.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700AUSTRALIA
> Bill King 26.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100AUSTRALIA
> Boss Man2 20.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500UK
> Fred Finance 215.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500UK
> Stacy Accountant 26.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563UK
>
> Now when I navigate to:
> Contents of directory /user/hive/warehouse/db1.db/employees/country=USA/state=IL
>
> I see all the records and was wondering if it should have only USA & IL records.
> Please help.

-- Nitin Pawar
Re: Partitioning confusion
When you specify the LOAD DATA query with a specific partition, it will put the entire data into that partition.

On Mon, May 27, 2013 at 1:08 PM, Sai Sai wrote:
>
> After creating a partition for a country (USA) and state (IL), when we go to
> the HDFS site to look at the partition in the browser we are seeing all the
> records for all the countries and states rather than just for the partition
> created for USA and IL, as given below. Is this correct behavior?
>
> Here are my commands:
>
> CREATE TABLE employees (name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING>) PARTITIONED BY (country STRING, state STRING);
>
> LOAD DATA LOCAL INPATH '/home/satish/data/employees/input/employees-country.txt' INTO TABLE employees PARTITION (country='USA', state='IL');
>
> Here is my original data file, where I have data for a few countries such as USA, INDIA, UK, AUS:
>
> John Doe10.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600USA
> Mary Smith8.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601USA
> Todd Jones7.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700USA
> Bill King6.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100USA
> Boss Man20.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500USA
> Fred Finance15.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500USA
> Stacy Accountant6.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563USA
> John Doe 210.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600INDIA
> Mary Smith 28.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601INDIA
> Todd Jones 27.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700AUSTRALIA
> Bill King 26.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100AUSTRALIA
> Boss Man2 20.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500UK
> Fred Finance 215.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500UK
> Stacy Accountant 26.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563UK
>
> Now when I navigate to:
> Contents of directory /user/hive/warehouse/db1.db/employees/country=USA/state=IL
>
> I see all the records and was wondering if it should have only USA & IL records.
> Please help.

-- Nitin Pawar
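[Editor's example] This behavior follows from LOAD DATA being essentially a file move into the partition directory, with no parsing or filtering of rows. To place only matching rows in a partition, the data has to go through a query. A sketch of the static-partition variant, assuming a staging table temp_employees with the same columns and country/state inside the address struct (both assumptions; the thread does not define such a table):

```sql
-- LOAD DATA moves the file verbatim; filtering requires INSERT ... SELECT.
INSERT OVERWRITE TABLE employees PARTITION (country='USA', state='IL')
SELECT name, salary, subordinates, deductions, address
FROM temp_employees
WHERE address.country = 'USA' AND address.state = 'IL';
```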
Re: Partitioning confusion
After creating a partition for a country (USA) and state (IL), when we go to the HDFS site to look at the partition in the browser we are seeing all the records for all the countries and states rather than just for the partition created for USA and IL, as given below. Is this correct behavior?

Here are my commands:

CREATE TABLE employees (name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING>) PARTITIONED BY (country STRING, state STRING);

LOAD DATA LOCAL INPATH '/home/satish/data/employees/input/employees-country.txt' INTO TABLE employees PARTITION (country='USA', state='IL');

Here is my original data file, where I have data for a few countries such as USA, INDIA, UK, AUS:

John Doe10.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600USA
Mary Smith8.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601USA
Todd Jones7.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700USA
Bill King6.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100USA
Boss Man20.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500USA
Fred Finance15.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500USA
Stacy Accountant6.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563USA
John Doe 210.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600INDIA
Mary Smith 28.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601INDIA
Todd Jones 27.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700AUSTRALIA
Bill King 26.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100AUSTRALIA
Boss Man2 20.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500UK
Fred Finance 215.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500UK
Stacy Accountant 26.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563UK

Now when I navigate to:
Contents of directory /user/hive/warehouse/db1.db/employees/country=USA/state=IL

I see all the records and was wondering if it should have only USA & IL records.
Please help.
Re: Issue with Json tuple lateral view
Thanks Navis

From: Navis류승우
To: user@hive.apache.org; Sai Sai
Sent: Monday, 27 May 2013 12:15 PM
Subject: Re: Issue with Json tuple lateral view

Removing the last ',' in the second row would make the result as you expected. I can't tell whether it's a bug or not.

2013/5/27 Sai Sai:
>
> Here is the JSON data that I load:
>
> { "blogID" : "FJY26J1333", "date" : "2012-04-01", "name" : "vpxnksu", "comment" : "good stuff", "contact" : { "email" : "vpxn...@gmail.com", "website" : "vpxnksu.wordpress.com" } }
> { "blogID" : "VSAUMDFGSD", "date" : "2012-04-01", "name" : "yhftrcx", "comment" : "another comment",}
>
> Here are the Hive commands:
>
> CREATE EXTERNAL TABLE json_comments(value STRING) LOCATION '/user/json-comments';
>
> LOAD DATA LOCAL INPATH '/home/satish/data/inputSai/json-comments.txt' OVERWRITE INTO TABLE json_comments;
>
> SELECT b.blogID, c.email FROM json_comments a LATERAL VIEW json_tuple(a.value, 'blogID', 'contact') b AS blogID, contact LATERAL VIEW json_tuple(b.contact, 'email', 'website') c AS email, website;
>
> Here are the results of the MapReduce job:
>
> blogid email
> FJY26J1333 vpxn...@gmail.com
> NULL NULL
>
> My question is why the 2nd row is coming up as NULL values; I was expecting the results to be like this:
>
> blogid email
> FJY26J1333 vpxn...@gmail.com
> VSAUMDFGSD NULL
>
> Any input is appreciated in understanding this.
> Thanks
> S
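[Editor's note] To make Navis's fix concrete: the trailing comma before the closing brace makes the second record invalid JSON, so json_tuple cannot parse that row and returns NULL for every requested field. The corrected second record would be:

```json
{ "blogID" : "VSAUMDFGSD", "date" : "2012-04-01", "name" : "yhftrcx", "comment" : "another comment" }
```

With that change the row parses, so blogID comes back as VSAUMDFGSD while email is still NULL (the record genuinely has no contact field), matching the expected output in the question.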