[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM
[ https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-11540:
    Fix Version/s: 2.0.0

> Too many delta files during Compaction - OOM
> --------------------------------------------
>                  Key: HIVE-11540
>                  URL: https://issues.apache.org/jira/browse/HIVE-11540
>              Project: Hive
>           Issue Type: Bug
>           Components: Transactions
>     Affects Versions: 1.0.0
>             Reporter: Nivin Mathew
>             Assignee: Eugene Koifman
>               Labels: TODOC1.3
>              Fix For: 1.3.0, 2.0.0
>
>          Attachments: HIVE-11540.3.patch, HIVE-11540.4.patch, HIVE-11540.6.patch, HIVE-11540.patch
>
> Hello,
> I am streaming weblogs to Kafka and then to Flume 1.6 using a Hive sink, averaging 20 million records a day. I have 5 compactors running at various intervals (30m/5m/5s); no matter what interval I set, the compactors seem to run out of memory cleaning up a couple thousand delta files, and they ultimately fall behind compacting/cleaning delta files. Any suggestions on what I can do to improve performance? Or can Hive streaming not handle this kind of load?
> I used this post as reference: http://henning.kropponline.de/2015/05/19/hivesink-for-flume/
> {noformat}
> 2015-08-12 15:05:01,197 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Direct buffer memory
> Max block location exceeded for split: CompactorInputSplit{base: hdfs://Dev01HWNameService/user/hive/warehouse/weblogs.db/dt=15-08-12/base_1056406, bucket: 0, length: 6493042, deltas: [delta_1056407_1056408, delta_1056409_1056410, delta_1056411_1056412, delta_1056413_1056414, delta_1056415_1056416, delta_1056417_1056418, … , delta_1074039_1074040, delta_1074041_1074042, delta_1074043_1074044, delta_1074045_1074046, delta_1074047_1074048, delta_1074049_1074050, delta_1074051_1074052]} splitsize: 8772 maxsize: 10
> 2015-08-12 15:34:25,271 INFO [upladevhwd04v.researchnow.com-18]: mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:3
> 2015-08-12 15:34:25,367 INFO [upladevhwd04v.researchnow.com-18]: mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_1439397150426_0068
> 2015-08-12 15:34:25,603 INFO [upladevhwd04v.researchnow.com-18]: impl.YarnClientImpl (YarnClientImpl.java:submitApplication(274)) - Submitted application application_1439397150426_0068
> 2015-08-12 15:34:25,610 INFO [upladevhwd04v.researchnow.com-18]: mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://upladevhwd02v.researchnow.com:8088/proxy/application_1439397150426_0068/
> 2015-08-12 15:34:25,611 INFO [upladevhwd04v.researchnow.com-18]: mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: job_1439397150426_0068
> 2015-08-12 15:34:30,170 INFO [Thread-7]: compactor.Initiator (Initiator.java:run(88)) - Checking to see if we should compact weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:33,756 INFO [upladevhwd04v.researchnow.com-18]: mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_1439397150426_0068 running in uber mode : false
> 2015-08-12 15:34:33,757 INFO [upladevhwd04v.researchnow.com-18]: mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 0% reduce 0%
> 2015-08-12 15:34:35,147 INFO [Thread-7]: compactor.Initiator (Initiator.java:run(88)) - Checking to see if we should compact weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:40,155 INFO [Thread-7]: compactor.Initiator (Initiator.java:run(88)) - Checking to see if we should compact weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:45,184 INFO [Thread-7]: compactor.Initiator (Initiator.java:run(88)) - Checking to see if we should compact weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:50,201 INFO [Thread-7]: compactor.Initiator (Initiator.java:run(88)) - Checking to see if we should compact weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:55,256 INFO [Thread-7]: compactor.Initiator (Initiator.java:run(88)) - Checking to see if we should compact weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:35:00,205 INFO [Thread-7]: compactor.Initiator (Initiator.java:run(88)) - Checking to see if we should compact weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:35:02,975 INFO [upladevhwd04v.researchnow.com-18]: mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 33% reduce 0%
> 2015-08-12 15:35:02,982 INFO [upladevhwd04v.researchnow.com-18]: mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : attempt_1439397150426_0068_m_00_0, Status : FAILED
> 2015-08-12 15:35:03,000 INFO [upladevhwd04v.researchnow.com-18]: mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : attempt_1439397150426_0068_m_01_0, Status : FAILED
> 2015-08-12 15:35:04,008
> {noformat}
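The flood of delta directories in the CompactorInputSplit above is the heart of the problem: each streaming commit leaves a small delta_<minTxn>_<maxTxn> directory, and the compactor must open all of them at once. As an illustrative sketch only (this is not Hive's code), the transaction range those names encode can be pulled out like this:

```python
import re

def delta_txn_range(names):
    """Return (delta count, lowest txn, highest txn) for delta_* dir names."""
    spans = [tuple(map(int, re.match(r"delta_(\d+)_(\d+)", n).groups()))
             for n in names]
    return len(spans), min(lo for lo, _ in spans), max(hi for _, hi in spans)

# First and last deltas visible in the log excerpt above:
deltas = ["delta_1056407_1056408", "delta_1056409_1056410",
          "delta_1074049_1074050", "delta_1074051_1074052"]
print(delta_txn_range(deltas))  # → (4, 1056407, 1074052)
```

Applied to the full listing, the span 1056407 to 1074052 covers roughly 17,600 transactions at two per delta, i.e. on the order of the "couple thousand delta files" the reporter describes backing up in a single partition.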
[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM
[ https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-11540:
    Labels: TODOC1.3  (was: TODOC2.0)
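For readers hitting the same wall: the patch for this issue caps how many delta files a single compaction job takes on, and the TODOC labels above track documenting that knob. A hedged sketch of hive-site.xml settings relevant to this report follows; property names and defaults should be verified against the 1.3.0/2.0.0 release documentation, and the values shown are illustrative, not recommendations.

```xml
<!-- Sketch only; confirm names and defaults against the target release. -->
<property>
  <name>hive.compactor.max.num.delta</name>
  <value>500</value>
  <!-- Cap on delta files per compaction job (introduced by HIVE-11540);
       larger backlogs are worked off over multiple passes. -->
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>4</value>
  <!-- Compactor worker threads on this metastore; 0 disables compaction. -->
</property>
<property>
  <name>hive.compactor.check.interval</name>
  <value>300</value>
  <!-- Seconds between Initiator checks; the 5-second loop visible in the
       log above is unusually aggressive for this volume. -->
</property>
```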
[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM
[ https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-11540:
    Attachment: HIVE-11540.6.patch
[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM
[ https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-11540:
    Labels: TODOC2.0  (was: )
[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM
[ https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-11540:
    Summary: Too many delta files during Compaction - OOM  (was: Too many delta files during Compaction)
[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM
[ https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-11540:
    Attachment: HIVE-11540.4.patch