[jira] [Created] (MAPREDUCE-2614) Allow append arbitrary text at the end of generated query in DBOutputFormat class

2011-06-22 Thread Jarek Jarcec Cecho (JIRA)
Allow append arbitrary text at the end of generated query in DBOutputFormat 
class
-

 Key: MAPREDUCE-2614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2614
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Jarek Jarcec Cecho
Priority: Minor


It would be wonderful if the DBOutputFormat class allowed arbitrary text to be 
appended to the end of the generated query. This would be useful, for example, 
with a MySQL database to specify the ON DUPLICATE KEY UPDATE ... part of the query.
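
A minimal sketch of what such a hook might look like, assuming a simplified 
stand-in for the query builder. The class name QueryAppendSketch, the 
appendText parameter, and the table and column names are hypothetical; this is 
not the actual DBOutputFormat code.

```java
// Sketch only: a simplified INSERT builder with an optional trailing clause.
// QueryAppendSketch and appendText are hypothetical names for illustration.
class QueryAppendSketch {

  static String constructQuery(String table, String[] fieldNames, String appendText) {
    if (fieldNames == null || fieldNames.length == 0) {
      throw new IllegalArgumentException("Field names may not be null or empty");
    }
    StringBuilder query = new StringBuilder("INSERT INTO ").append(table).append(" (");
    for (int i = 0; i < fieldNames.length; i++) {
      if (i > 0) {
        query.append(",");
      }
      query.append(fieldNames[i]);
    }
    query.append(") VALUES (");
    // One JDBC placeholder per column.
    for (int i = 0; i < fieldNames.length; i++) {
      if (i > 0) {
        query.append(",");
      }
      query.append("?");
    }
    query.append(")");
    // Append the user-supplied text only when it is non-blank.
    if (appendText != null && !appendText.trim().isEmpty()) {
      query.append(" ").append(appendText.trim());
    }
    return query.toString();
  }

  public static void main(String[] args) {
    System.out.println(constructQuery("hits", new String[] {"url", "cnt"},
        "ON DUPLICATE KEY UPDATE cnt = cnt + VALUES(cnt)"));
  }
}
```

Passing null or a blank string leaves the generated INSERT unchanged, so 
existing callers would see no difference.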

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-2187) map tasks timeout during sorting

2011-06-22 Thread Anupam Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anupam Seth updated MAPREDUCE-2187:
---

Attachment: MAPREDUCE-2187-branch-MR-279.patch

 map tasks timeout during sorting
 

 Key: MAPREDUCE-2187
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2187
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2
Reporter: Gianmarco De Francisci Morales
Assignee: Anupam Seth
 Attachments: MAPREDUCE-2187-20-security.patch, 
 MAPREDUCE-2187-22.patch, MAPREDUCE-2187-branch-MR-279.patch, 
 MAPREDUCE-2187-trunk.patch


 During the execution of a large job, the map tasks time out:
 {code}
 INFO mapred.JobClient: Task Id : attempt_201010290414_60974_m_57_1, 
 Status : FAILED
 Task attempt_201010290414_60974_m_57_1 failed to report status for 609 
 seconds. Killing!
 {code}
 The bug is that the mapper has already finished and, according to the logs, 
 the timeout occurs during the merge-sort phase.
 The intermediate data generated by the map task is quite large, which I 
 think is the cause.
 The logs show that the merge sort had been running for 10 minutes when the 
 task was killed.
 I think mapred.Merger should call Reporter.progress() somewhere.
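
 The idea of reporting progress from inside a long merge can be sketched as 
 follows. This is a simplified two-way merge of sorted int arrays, not the 
 actual mapred.Merger code; the Progressable interface is declared locally as 
 a stand-in, and MergeProgressSketch and reportEvery are hypothetical names.

```java
// Sketch only: a two-way merge that pings a progress callback periodically,
// so a long-running merge would not look like a hung task.
class MergeProgressSketch {

  /** Local stand-in for a progress reporter; declared here so the sketch compiles alone. */
  interface Progressable {
    void progress();
  }

  static int[] merge(int[] left, int[] right, Progressable reporter, int reportEvery) {
    if (reportEvery <= 0) {
      throw new IllegalArgumentException("reportEvery must be positive");
    }
    int[] out = new int[left.length + right.length];
    int i = 0;
    int j = 0;
    int k = 0;
    while (i < left.length || j < right.length) {
      // Take from the left input when the right is exhausted or left is smaller or equal.
      if (j >= right.length || (i < left.length && left[i] <= right[j])) {
        out[k++] = left[i++];
      } else {
        out[k++] = right[j++];
      }
      // Report liveness once every reportEvery merged records.
      if (k % reportEvery == 0) {
        reporter.progress();
      }
    }
    return out;
  }

  public static void main(String[] args) {
    final int[] calls = {0};
    int[] merged = merge(new int[] {1, 3, 5}, new int[] {2, 4, 6},
        () -> calls[0]++, 2);
    System.out.println("merged " + merged.length + " records, progress calls: " + calls[0]);
  }
}
```

 Reporting every N records rather than on every record keeps the callback 
 overhead small; the right interval is a tuning choice.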





[jira] [Updated] (MAPREDUCE-2614) Allow append arbitrary text at the end of generated query in DBOutputFormat class

2011-06-22 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated MAPREDUCE-2614:
--

Release Note: Added a new configuration property to DBConfiguration and its 
propagation to DBOutputFormat
  Status: Patch Available  (was: Open)

I've added a new configuration property to DBConfiguration called 
OUTPUT_QUERY_APPEND_PROPERTY. If this property is not empty, then its content 
is appended to the end of the query generated in class DBOutputFormat 
(constructQuery method). I've also fixed both tests around these two classes.
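
The "append only if not empty" behavior might look roughly like the sketch 
below. It uses java.util.Properties as a stand-in for the job configuration; 
the property key string and the AppendPropertySketch class are hypothetical, 
since the actual key value is not given here.

```java
// Sketch only: reads an optional suffix from configuration and appends it
// to an already generated query when the value is non-blank.
class AppendPropertySketch {

  // Hypothetical key; the real value behind the constant is not stated in this thread.
  static final String OUTPUT_QUERY_APPEND_PROPERTY = "example.jdbc.output.query.append";

  static String applyAppend(java.util.Properties conf, String baseQuery) {
    String extra = conf.getProperty(OUTPUT_QUERY_APPEND_PROPERTY, "");
    if (extra.trim().isEmpty()) {
      // Property unset or blank: leave the generated query untouched.
      return baseQuery;
    }
    return baseQuery + " " + extra.trim();
  }

  public static void main(String[] args) {
    java.util.Properties conf = new java.util.Properties();
    conf.setProperty(OUTPUT_QUERY_APPEND_PROPERTY, "ON DUPLICATE KEY UPDATE cnt = VALUES(cnt)");
    System.out.println(applyAppend(conf, "INSERT INTO hits (url,cnt) VALUES (?,?)"));
  }
}
```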

 Allow append arbitrary text at the end of generated query in DBOutputFormat 
 class
 -

 Key: MAPREDUCE-2614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2614
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Jarek Jarcec Cecho
Priority: Minor






[jira] [Updated] (MAPREDUCE-2614) Allow append arbitrary text at the end of generated query in DBOutputFormat class

2011-06-22 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated MAPREDUCE-2614:
--

Attachment: MAPREDUCE-2614.patch

 Allow append arbitrary text at the end of generated query in DBOutputFormat 
 class
 -

 Key: MAPREDUCE-2614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2614
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Jarek Jarcec Cecho
Priority: Minor
 Attachments: MAPREDUCE-2614.patch







[jira] [Commented] (MAPREDUCE-2614) Allow append arbitrary text at the end of generated query in DBOutputFormat class

2011-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053346#comment-13053346
 ] 

Hadoop QA commented on MAPREDUCE-2614:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483446/MAPREDUCE-2614.patch
  against trunk revision 1138301.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestMRCLI
  org.apache.hadoop.fs.TestFileSystem
  org.apache.hadoop.mapreduce.lib.db.TestDBJob

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/411//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/411//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/411//console

This message is automatically generated.

 Allow append arbitrary text at the end of generated query in DBOutputFormat 
 class
 -

 Key: MAPREDUCE-2614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2614
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Jarek Jarcec Cecho
Priority: Minor
 Attachments: MAPREDUCE-2614.patch







[jira] [Created] (MAPREDUCE-2615) MR 279: KillJob should go through AM whenever possible

2011-06-22 Thread Siddharth Seth (JIRA)
MR 279: KillJob should go through AM whenever possible
--

 Key: MAPREDUCE-2615
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2615
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: 0.23.0


KillJob currently goes directly to the RM - which effectively causes the AM and 
tasks to be killed via a signal. History information is not recorded in this 
case.





[jira] [Updated] (MAPREDUCE-2615) MR 279: KillJob should go through AM whenever possible

2011-06-22 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-2615:
--

Attachment: MR2615.patch

The kill now goes via the AM if it is available.
The patch also sets #maps, #reduces and finishTime in JobSummaryLog for killed jobs.
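
The "AM first, RM as fallback" routing can be sketched as below. All names 
here (KillJobSketch, KillTarget, isReachable, fake) are hypothetical and only 
illustrate the control flow; they are not the MR-279 client API.

```java
// Sketch only: prefer asking the AM to kill the job, fall back to the RM.
class KillJobSketch {

  /** Hypothetical handle to a component that can kill a job (AM or RM). */
  interface KillTarget {
    boolean isReachable();
    void kill(String jobId);
  }

  /** Returns "AM" or "RM" to indicate which path handled the kill. */
  static String killJob(String jobId, KillTarget appMaster, KillTarget resourceManager) {
    if (appMaster != null && appMaster.isReachable()) {
      // Going through the AM gives it a chance to record history before exiting.
      appMaster.kill(jobId);
      return "AM";
    }
    // AM not available: fall back to the direct RM kill.
    resourceManager.kill(jobId);
    return "RM";
  }

  /** Test double that records which target was asked to kill which job. */
  static KillTarget fake(final String name, final boolean reachable, final StringBuilder log) {
    return new KillTarget() {
      public boolean isReachable() {
        return reachable;
      }
      public void kill(String jobId) {
        log.append(name).append(":").append(jobId).append(";");
      }
    };
  }

  public static void main(String[] args) {
    StringBuilder log = new StringBuilder();
    String path = killJob("job_1", fake("AM", false, log), fake("RM", true, log));
    System.out.println("killed via " + path + ", log=" + log);
  }
}
```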

 MR 279: KillJob should go through AM whenever possible
 --

 Key: MAPREDUCE-2615
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2615
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: 0.23.0

 Attachments: MR2615.patch







[jira] [Assigned] (MAPREDUCE-2560) Support specification of codecs by name

2011-06-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned MAPREDUCE-2560:
-

Assignee: Arun Ramakrishnan  (was: Anthony Urso)

 Support specification of codecs by name
 ---

 Key: MAPREDUCE-2560
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2560
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Tom White
Assignee: Arun Ramakrishnan
  Labels: newbie

 By changing the code to take advantage of HADOOP-7323, it will be possible to 
 specify compression codecs in configuration by name (e.g. 'gzip'), not only 
 by classname, although that will still be supported, of course (e.g. 
 'org.apache.hadoop.io.compress.GzipCodec').
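
 A rough sketch of accepting either form, assuming a small hand-written alias 
 table. The lookup in HADOOP-7323 may work differently; CodecNameSketch and 
 resolveCodecClass are hypothetical names, and the bzip2 entry is an added 
 illustration beyond the gzip example above.

```java
// Sketch only: resolve a codec setting that is either a short name or a class name.
class CodecNameSketch {

  static String resolveCodecClass(String nameOrClass) {
    if (nameOrClass == null || nameOrClass.trim().isEmpty()) {
      throw new IllegalArgumentException("Codec name may not be empty");
    }
    String value = nameOrClass.trim();
    // Anything containing a dot is treated as a fully qualified class name.
    if (value.indexOf('.') >= 0) {
      return value;
    }
    // Illustrative alias table; not taken from the Hadoop source.
    java.util.Map<String, String> aliases = new java.util.HashMap<String, String>();
    aliases.put("gzip", "org.apache.hadoop.io.compress.GzipCodec");
    aliases.put("bzip2", "org.apache.hadoop.io.compress.BZip2Codec");
    String resolved = aliases.get(value.toLowerCase(java.util.Locale.ROOT));
    if (resolved == null) {
      throw new IllegalArgumentException("Unknown codec name: " + value);
    }
    return resolved;
  }

  public static void main(String[] args) {
    System.out.println(resolveCodecClass("gzip"));
    System.out.println(resolveCodecClass("org.apache.hadoop.io.compress.GzipCodec"));
  }
}
```

 Passing class names through unchanged keeps existing configurations working.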





[jira] [Assigned] (MAPREDUCE-2560) Support specification of codecs by name

2011-06-22 Thread Anthony Urso (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Urso reassigned MAPREDUCE-2560:
---

Assignee: Anthony Urso

 Support specification of codecs by name
 ---

 Key: MAPREDUCE-2560
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2560
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Tom White
Assignee: Anthony Urso
  Labels: newbie






[jira] [Resolved] (MAPREDUCE-2615) MR 279: KillJob should go through AM whenever possible

2011-06-22 Thread Luke Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lu resolved MAPREDUCE-2615.


Tags: mrv2
  Resolution: Fixed
Hadoop Flags: [Reviewed]

+1. Committed to MR-279. Thanks Sidd!

 MR 279: KillJob should go through AM whenever possible
 --

 Key: MAPREDUCE-2615
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2615
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: 0.23.0

 Attachments: MR2615.patch







[jira] [Commented] (MAPREDUCE-2602) Allow setting of end-of-record delimiter for TextInputFormat (for the old API)

2011-06-22 Thread Ahmed Radwan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053561#comment-13053561
 ] 

Ahmed Radwan commented on MAPREDUCE-2602:
-

The Hadoop-QA test failures above are not related to the submitted patch.

 Allow setting of end-of-record delimiter for TextInputFormat (for the old API)
 --

 Key: MAPREDUCE-2602
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2602
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Attachments: MAPREDUCE-2602.patch


 Since there are users who are still using the old MR API, it will be useful 
 to modify the org.apache.hadoop.mapred.LineRecordReader and 
 org.apache.hadoop.mapred.TextInputFormat to be able to use custom 
 (user-specified) end-of-record delimiters. This will make use of the 
 LineReader improvement introduced in HADOOP-7096 that enables the LineReader 
 to break lines at user-specified delimiters. 
 Note: MAPREDUCE-2254 already added this improvement to the new API (but not 
 the old API).
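
 To illustrate what breaking input at a user-specified delimiter means, here 
 is a plain-Java sketch that splits an in-memory string. It does not use 
 LineReader or LineRecordReader; RecordDelimiterSketch and splitRecords are 
 hypothetical names.

```java
// Sketch only: split text into records at a custom delimiter instead of newlines.
class RecordDelimiterSketch {

  static String[] splitRecords(String text, String delimiter) {
    if (delimiter == null || delimiter.isEmpty()) {
      throw new IllegalArgumentException("Delimiter may not be empty");
    }
    java.util.List<String> records = new java.util.ArrayList<String>();
    int start = 0;
    while (start < text.length()) {
      int end = text.indexOf(delimiter, start);
      if (end < 0) {
        // No further delimiter: the remainder is the last record.
        records.add(text.substring(start));
        break;
      }
      records.add(text.substring(start, end));
      start = end + delimiter.length();
    }
    return records.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // Newlines inside a record are preserved because ";;" is the delimiter.
    for (String record : splitRecords("r1\nline2;;r2;;", ";;")) {
      System.out.println("[" + record + "]");
    }
  }
}
```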





[jira] [Created] (MAPREDUCE-2616) [Gridmix] Input data compression emulation might not work as expected with data reuse

2011-06-22 Thread Amar Kamat (JIRA)
[Gridmix] Input data compression emulation might not work as expected with data 
reuse
-

 Key: MAPREDUCE-2616
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2616
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.23.0


Currently, all the Gridmix input data files are located at 
gridmix-io-dir/input (gridmix-io-dir is expected as a CLI parameter). When 
compression emulation is enabled, Gridmix will check for compressed files 
(based on suffixes) in the input folder. Gridmix will bail out if there are no 
compressed input files. If the input folder contains a mix of compressed and 
uncompressed input files, then Gridmix might end up using uncompressed files, 
resulting in no emulation.
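
The suffix-based check described above, together with one possible way of 
avoiding the mixed-folder case, can be sketched as follows. This is not 
Gridmix code: CompressedInputSketch, selectCompressed and the ".gz" suffix 
list are assumptions for illustration only.

```java
// Sketch only: keep just the compressed inputs and bail out when there are none.
class CompressedInputSketch {

  // Assumed suffix list; the suffixes Gridmix actually checks are not stated here.
  static final String[] COMPRESSED_SUFFIXES = {".gz"};

  static boolean hasCompressedSuffix(String fileName) {
    for (String suffix : COMPRESSED_SUFFIXES) {
      if (fileName.endsWith(suffix)) {
        return true;
      }
    }
    return false;
  }

  static java.util.List<String> selectCompressed(java.util.List<String> fileNames) {
    java.util.List<String> selected = new java.util.ArrayList<String>();
    for (String name : fileNames) {
      // Uncompressed files are skipped so they cannot be picked up by accident.
      if (hasCompressedSuffix(name)) {
        selected.add(name);
      }
    }
    if (selected.isEmpty()) {
      throw new IllegalStateException("No compressed input files found");
    }
    return selected;
  }

  public static void main(String[] args) {
    System.out.println(selectCompressed(
        java.util.Arrays.asList("part-0.gz", "part-1", "part-2.gz")));
  }
}
```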
