[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM

2015-11-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11540:
--

Fix Version/s: 2.0.0

> Too many delta files during Compaction - OOM
> 
>
> Key: HIVE-11540
> URL: https://issues.apache.org/jira/browse/HIVE-11540
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Nivin Mathew
>Assignee: Eugene Koifman
>  Labels: TODOC1.3
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11540.3.patch, HIVE-11540.4.patch, 
> HIVE-11540.6.patch, HIVE-11540.patch
>
>
> Hello,
> I am streaming weblogs to Kafka and then to Flume 1.6 using a Hive sink, with 
> an average of 20 million records a day. I have 5 compactors running at 
> various intervals (30m/5m/5s); no matter what interval I set, the compactors 
> seem to run out of memory cleaning up a couple thousand delta files and 
> ultimately fall behind compacting/cleaning delta files. Any suggestions on 
> what I can do to improve performance? Or can Hive streaming not handle this kind of load?
> I used this post as reference: 
> http://henning.kropponline.de/2015/05/19/hivesink-for-flume/
> {noformat}
> 2015-08-12 15:05:01,197 FATAL [main] org.apache.hadoop.mapred.YarnChild: 
> Error running child : java.lang.OutOfMemoryError: Direct buffer memory
> Max block location exceeded for split: CompactorInputSplit{base: 
> hdfs://Dev01HWNameService/user/hive/warehouse/weblogs.db/dt=15-08-12/base_1056406,
>  bucket: 0, length: 6493042, deltas: [delta_1056407_1056408, 
> delta_1056409_1056410, delta_1056411_1056412, delta_1056413_1056414, 
> delta_1056415_1056416, delta_1056417_1056418,…
> , delta_1074039_1074040, delta_1074041_1074042, delta_1074043_1074044, 
> delta_1074045_1074046, delta_1074047_1074048, delta_1074049_1074050, 
> delta_1074051_1074052]} splitsize: 8772 maxsize: 10
> 2015-08-12 15:34:25,271 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of 
> splits:3
> 2015-08-12 15:34:25,367 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting 
> tokens for job: job_1439397150426_0068
> 2015-08-12 15:34:25,603 INFO  [upladevhwd04v.researchnow.com-18]: 
> impl.YarnClientImpl (YarnClientImpl.java:submitApplication(274)) - Submitted 
> application application_1439397150426_0068
> 2015-08-12 15:34:25,610 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:submit(1294)) - The url to track the job: 
> http://upladevhwd02v.researchnow.com:8088/proxy/application_1439397150426_0068/
> 2015-08-12 15:34:25,611 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: 
> job_1439397150426_0068
> 2015-08-12 15:34:30,170 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:33,756 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job 
> job_1439397150426_0068 running in uber mode : false
> 2015-08-12 15:34:33,757 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 0% reduce 0%
> 2015-08-12 15:34:35,147 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:40,155 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:45,184 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:50,201 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:55,256 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:35:00,205 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:35:02,975 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 33% reduce 0%
> 2015-08-12 15:35:02,982 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : 
> attempt_1439397150426_0068_m_00_0, Status : FAILED
> 2015-08-12 15:35:03,000 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : 
> attempt_1439397150426_0068_m_01_0, Status : FAILED
> 2015-08-12 15:35:04,008 
> {noformat}

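The log above shows a single compaction split spanning roughly 8,800 delta directories (delta_1056407_1056408 through delta_1074051_1074052). As a rough illustration of why that ends in `OutOfMemoryError: Direct buffer memory`, the sketch below estimates direct-memory demand under the assumption that each open delta reader pins one fixed-size direct buffer; real ORC readers may allocate several buffers per stream, so this is a lower bound, not Hive's actual accounting:

```python
# Illustrative estimate (not from the ticket): direct-memory demand when a
# compactor opens every delta file in a split at once. Assumes one fixed-size
# direct buffer per open delta reader; actual ORC readers can use more.
def direct_buffer_demand_mb(num_deltas, buffer_kb=256):
    """Rough direct-buffer demand in MB for reading num_deltas concurrently."""
    return num_deltas * buffer_kb / 1024

# ~8,800 deltas at a hypothetical 256 KB buffer each -> ~2.2 GB of direct
# memory, easily past a typical -XX:MaxDirectMemorySize for a map task.
demand = direct_buffer_demand_mb(8_800)
```

This is why the eventual fix bounds how many delta files a single compaction job will process at once, rather than opening an unbounded backlog; see the patch and release notes on the ticket for the exact mechanism.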
[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM

2015-10-26 Thread Lefty Leverenz (JIRA)


Lefty Leverenz updated HIVE-11540:
--
Labels: TODOC1.3  (was: TODOC2.0)


[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM

2015-10-24 Thread Eugene Koifman (JIRA)


Eugene Koifman updated HIVE-11540:
--
Attachment: HIVE-11540.6.patch


[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM

2015-10-24 Thread Lefty Leverenz (JIRA)


Lefty Leverenz updated HIVE-11540:
--
Labels: TODOC2.0  (was: )


[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM

2015-10-23 Thread Eugene Koifman (JIRA)


Eugene Koifman updated HIVE-11540:
--
Summary: Too many delta files during Compaction - OOM  (was: Too many delta 
files during Compaction)


[jira] [Updated] (HIVE-11540) Too many delta files during Compaction - OOM

2015-10-23 Thread Eugene Koifman (JIRA)


Eugene Koifman updated HIVE-11540:
--
Attachment: HIVE-11540.4.patch
