[jira] [Commented] (MAPREDUCE-6234) TestHighRamJob fails due to the change in MAPREDUCE-5785
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313534#comment-14313534 ] Hadoop QA commented on MAPREDUCE-6234: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697632/MAPREDUCE-6234.003.patch against trunk revision 260b5e3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-gridmix. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5179//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5179//console This message is automatically generated. > TestHighRamJob fails due to the change in MAPREDUCE-5785 > > > Key: MAPREDUCE-6234 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/gridmix, mrv2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, > MAPREDUCE-6234.003.patch > > > TestHighRamJob fails by this. > {code} > --- > T E S T S > --- > Running org.apache.hadoop.mapred.gridmix.TestHighRamJob > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< > FAILURE! 
- in org.apache.hadoop.mapred.gridmix.TestHighRamJob > testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) > Time elapsed: 1.102 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6249) Streaming task will not untar tgz uploaded with -archives
Liu Xiao created MAPREDUCE-6249:
---
Summary: Streaming task will not untar tgz uploaded with -archives
Key: MAPREDUCE-6249
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 2.5.2
Environment: hadoop-2.5.2 hadoop-streaming-2.5.2.jar
Reporter: Liu Xiao

While writing a Hadoop streaming task, I used -archives to upload a tgz from the local machine to the task working directory, but it was not untarred as the documentation says. I've searched a lot without any luck.

Here is the streaming job start command, with hadoop-2.5.2:

hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
-files mapper.sh -archives /home/hadoop/tmp/test.tgz#test \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-input "/test/test.txt" \
-output "/res/" \
-mapper "sh mapper.sh" \
-reducer "cat"

and "mapper.sh":

cat > /dev/null
ls -l test
exit 0

In "test.tgz" there are two files, "test.1.txt" and "test.2.txt":

echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcvf test.tgz test.1.txt test.2.txt

The output from the above task:

lrwxrwxrwx 1 hadoop hadoop 71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz

But the desired output would be something like:

-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt

So why has test.tgz not been untarred automatically as the documentation says, or is there another way to get the tgz untarred?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313469#comment-14313469 ] Hadoop QA commented on MAPREDUCE-6174: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697564/MAPREDUCE-6174.v1.txt against trunk revision af08425. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//console This message is automatically generated. > Combine common stream code into parent class for InMemoryMapOutput and > OnDiskMapOutput. 
> --- > > Key: MAPREDUCE-6174 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.0.0, 2.6.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: MAPREDUCE-6174.v1.txt > > > Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing > similar things with regards to IFile streams. > In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are > different from 3rd-party implementations, this JIRA will make them subclass a > common class (see > https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6248) Persist DistCp job id in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313414#comment-14313414 ] Jing Zhao commented on MAPREDUCE-6248: -- yes, actually that will be even better! I will upload a patch for this later. > Persist DistCp job id in the staging directory > -- > > Key: MAPREDUCE-6248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently the DistCp is acting as a tool and the corresponding MapReduce Job > is created and used inside of its {{execute}} method. It is thus difficult > for external services to query its progress and counters. It may be helpful > to persist the job id into a file inside its staging directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6248) Persist DistCp job id in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313411#comment-14313411 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-6248: Why not have a public API in DistCp and use that programmatically instead of persisting IDs into files and then reading them? > Persist DistCp job id in the staging directory > -- > > Key: MAPREDUCE-6248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently the DistCp is acting as a tool and the corresponding MapReduce Job > is created and used inside of its {{execute}} method. It is thus difficult > for external services to query its progress and counters. It may be helpful > to persist the job id into a file inside its staging directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
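For illustration only, the idea in the description can be sketched as writing the job id to a well-known file under the staging directory, which an external service can later read back. This is a local-filesystem stand-in with a hypothetical file name ("_distcp_jobid"); the real DistCp staging directory lives on HDFS and would be accessed through Hadoop's FileSystem API rather than java.nio.file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch: persist the MapReduce job id to a well-known file under
// the staging directory so an external service can poll progress/counters
// by id. File name and paths are hypothetical, not actual DistCp behavior.
public class JobIdPersister {
    static final String JOB_ID_FILE = "_distcp_jobid";

    public static void persistJobId(Path stagingDir, String jobId) throws IOException {
        Files.createDirectories(stagingDir);
        Files.write(stagingDir.resolve(JOB_ID_FILE), jobId.getBytes(StandardCharsets.UTF_8));
    }

    public static String readJobId(Path stagingDir) throws IOException {
        return new String(Files.readAllBytes(stagingDir.resolve(JOB_ID_FILE)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging");
        persistJobId(dir, "job_1423000000000_0001");
        System.out.println(readJobId(dir));
    }
}
```

Vinod's alternative, a public API returning the Job directly, avoids the file round-trip entirely; the file-based approach mainly helps services that cannot hold a reference to the running DistCp instance.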
[jira] [Updated] (MAPREDUCE-6234) TestHighRamJob fails due to the change in MAPREDUCE-5785
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-6234: Summary: TestHighRamJob fails due to the change in MAPREDUCE-5785 (was: MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml) > TestHighRamJob fails due to the change in MAPREDUCE-5785 > > > Key: MAPREDUCE-6234 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/gridmix, mrv2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, > MAPREDUCE-6234.003.patch > > > TestHighRamJob fails by this. > {code} > --- > T E S T S > --- > Running org.apache.hadoop.mapred.gridmix.TestHighRamJob > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< > FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob > testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) > Time elapsed: 1.102 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-6234: Attachment: MAPREDUCE-6234.003.patch 003 fixes test failure without changing the value of DEFAULT_*_MEMORY_MB. > MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml > > > Key: MAPREDUCE-6234 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/gridmix, mrv2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, > MAPREDUCE-6234.003.patch > > > TestHighRamJob fails by this. > {code} > --- > T E S T S > --- > Running org.apache.hadoop.mapred.gridmix.TestHighRamJob > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< > FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob > testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) > Time elapsed: 1.102 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6248) Persist DistCp job id in the staging directory
Jing Zhao created MAPREDUCE-6248: Summary: Persist DistCp job id in the staging directory Key: MAPREDUCE-6248 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Reporter: Jing Zhao Assignee: Jing Zhao Currently the DistCp is acting as a tool and the corresponding MapReduce Job is created and used inside of its {{execute}} method. It is thus difficult for external services to query its progress and counters. It may be helpful to persist the job id into a file inside its staging directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313278#comment-14313278 ] Hadoop QA commented on MAPREDUCE-6246: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697540/MAPREDUCE-6246.patch against trunk revision af08425. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5177//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5177//console This message is automatically generated. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
> -
>
> Key: MAPREDUCE-6246
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1, mrv2
> Affects Versions: 2.4.1
> Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
> Platform: xSeries, pSeries
> Browser: Firefox, IE
> Security Settings: No Security, Flat file, LDAP, PAM
> File System: HDFS, GPFS FPO
> Reporter: ramtin
> Assignee: ramtin
> Labels: DB2, mapreduce
> Fix For: 2.4.1
>
> Attachments: MAPREDUCE-6246.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> DBOutputFormat is used for writing the output of MapReduce jobs to the database, and when used with DB2 JDBC drivers it fails with the following error:
> com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127)
> In the DBOutputFormat class, the constructQuery method generates the "INSERT INTO" statement with a semicolon (";") at the end.
> The semicolon is the ANSI SQL-92 statement terminator character, but this feature is disabled (OFF) by default in IBM DB2, although it can be turned ON with -t (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
> However, some products are already built on top of the default setting (OFF), so turning this feature ON would make them error-prone.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
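The fix direction described in the report (drop the trailing semicolon and leave statement termination to the driver/session) can be sketched with a simplified, hypothetical rendition of constructQuery. This is not the actual Hadoop source, just an illustration:

```java
// Simplified sketch of a constructQuery that omits the trailing ";" the
// report identifies as the DB2 incompatibility (SQLCODE=-104). Hypothetical
// stand-in for DBOutputFormat.constructQuery, not the real implementation.
public class Db2SafeQueryBuilder {
    public static String constructQuery(String table, String[] fieldNames) {
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table).append(" (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(fieldNames[i]);
            if (i != fieldNames.length - 1) {
                query.append(",");
            }
        }
        query.append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append("?");
            if (i != fieldNames.length - 1) {
                query.append(",");
            }
        }
        query.append(")"); // previously ");" — the extra ";" is what DB2 rejects by default
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(constructQuery("SHORT_URLS", new String[] {"URL", "COUNT"}));
    }
}
```

With columns URL and COUNT, this produces INSERT INTO SHORT_URLS (URL,COUNT) VALUES (?,?), which is valid for drivers whether or not they accept a terminator.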
[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313167#comment-14313167 ] Karthik Kambatla commented on MAPREDUCE-6223: - Patch looks mostly good to me. Nit: I would leave the test for negative values, but update the asserts to reflect the expected behavior. > TestJobConf#testNegativeValueForTaskVmem failures > - > > Key: MAPREDUCE-6223 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Gera Shegalov >Assignee: Varun Saxena > Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch, > MAPREDUCE-6223.003.patch > > > {code} > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<< > FAILURE! - in org.apache.hadoop.conf.TestJobConf > testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf) Time > elapsed: 0.089 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313148#comment-14313148 ] Karthik Kambatla commented on MAPREDUCE-6234: - Thanks for working on this, folks. As you might see in the description of the config, it is kind of hard to pick a single value for DEFAULT_MAP_MEMORY_MB, and the most appropriate value seemed 1024 since we fall back to that value. I like Gera's proposal of adding a helper method to get the default value; however, I wonder if that would just translate to calling {{JobConf#getMemoryRequired}} on the default conf. > MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml > > > Key: MAPREDUCE-6234 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/gridmix, mrv2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch > > > TestHighRamJob fails by this. > {code} > --- > T E S T S > --- > Running org.apache.hadoop.mapred.gridmix.TestHighRamJob > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< > FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob > testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) > Time elapsed: 1.102 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
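The helper-method proposal discussed above can be sketched as a fallback resolver: use the configured value when it is present and positive, otherwise the 1024 MB the comment mentions (the failing assertion expected 1024 but saw the unset sentinel -1). The names here are illustrative, not the actual MRJobConfig/JobConf API:

```java
// Hedged sketch of a "get the effective default" helper. A non-positive
// configured value (e.g. the -1 seen in the failing assertion) is treated
// as "not set" and resolves to the 1024 MB fallback.
public class MemoryDefaults {
    public static final int FALLBACK_MEMORY_MB = 1024;

    public static int getMemoryRequired(int configuredMb) {
        return configuredMb > 0 ? configuredMb : FALLBACK_MEMORY_MB;
    }

    public static void main(String[] args) {
        System.out.println(getMemoryRequired(-1));
    }
}
```

Karthik's point stands either way: such a helper may end up equivalent to calling {{JobConf#getMemoryRequired}} on a default-constructed conf.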
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313146#comment-14313146 ] Hudson commented on MAPREDUCE-6237: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2050 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2050/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. 
Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per the customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
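The fix direction discussed in the description can be sketched as giving each reader its own connection from a factory instead of sharing one cached connection, so closing one reader cannot break another. ConnectionFactory and the Object-typed "connection" below are deliberate stand-ins for the real JDBC types (DriverManager.getConnection / java.sql.Connection); this is an illustration of the pattern, not the patched Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one connection per reader, obtained from a factory.
public class PerReaderConnections {
    interface ConnectionFactory {
        Object newConnection(); // stand-in for java.sql.Connection
    }

    static class RecordReader {
        final Object connection;

        RecordReader(ConnectionFactory factory) {
            // Each reader opens a dedicated connection; it would close it
            // in its own close() without affecting other readers.
            this.connection = factory.newConnection();
        }
    }

    public static List<RecordReader> createReaders(ConnectionFactory factory, int n) {
        List<RecordReader> readers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            readers.add(new RecordReader(factory));
        }
        return readers;
    }

    public static void main(String[] args) {
        System.out.println(createReaders(Object::new, 2).size());
    }
}
```

If connection-creation cost matters, the factory is also the natural place to plug in a connection pool, as the description suggests.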
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313139#comment-14313139 ] Hudson commented on MAPREDUCE-6237: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #2031 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2031/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? 
> We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per the customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated MAPREDUCE-6242: Status: Patch Available (was: Open) > Progress report log is incredibly excessive in application master > - > > Key: MAPREDUCE-6242 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster >Affects Versions: 2.4.0 >Reporter: Jian Fang >Assignee: Varun Saxena > Attachments: MAPREDUCE-6242.001.patch > > > We saw incredibly excessive logs in application master for a long running one > with many task attempts. The log write rate is around 1MB/sec in some cases. > Most of the log entries were from the progress report such as the following > ones. > 2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.15605757 > 2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.4108217 > 2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_02_0 is : 0.06634143 > 2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.6506 > 2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_01_0 is : 0.21723115 > Looks like the report interval is controlled by a hard-coded variable > PROGRESS_INTERVAL as 3 seconds in class org.apache.hadoop.mapred.Task. We > should allow users to set the appropriate progress interval for their > applications. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
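The suggestion above (replace the hard-coded PROGRESS_INTERVAL with a user-settable value) can be sketched as a small configuration lookup with the current 3-second behavior as the fallback. The property key below is illustrative, not necessarily the key chosen in the committed patch:

```java
import java.util.Collections;
import java.util.Map;

// Sketch of a configurable progress-report interval with the existing
// 3000 ms hard-coded value from org.apache.hadoop.mapred.Task as default.
public class ProgressReportInterval {
    // Hypothetical configuration key.
    static final String PROGRESS_INTERVAL_KEY = "mapreduce.task.progress-report.interval";
    static final long DEFAULT_PROGRESS_INTERVAL_MS = 3000L;

    // Returns the configured interval, falling back to the default when the
    // key is absent, non-numeric, or non-positive.
    public static long getProgressInterval(Map<String, String> conf) {
        String value = conf.get(PROGRESS_INTERVAL_KEY);
        if (value == null) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
        try {
            long ms = Long.parseLong(value.trim());
            return ms > 0 ? ms : DEFAULT_PROGRESS_INTERVAL_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
    }

    public static void main(String[] args) {
        System.out.println(getProgressInterval(Collections.<String, String>emptyMap()));
    }
}
```

A job logging at ~1 MB/sec could then raise the interval (say to 60000 ms) without a code change, while unconfigured jobs keep today's behavior.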
[jira] [Updated] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated MAPREDUCE-6242: Status: Open (was: Patch Available) > Progress report log is incredibly excessive in application master > - > > Key: MAPREDUCE-6242 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster >Affects Versions: 2.4.0 >Reporter: Jian Fang >Assignee: Varun Saxena > Attachments: MAPREDUCE-6242.001.patch > > > We saw incredibly excessive logs in application master for a long running one > with many task attempts. The log write rate is around 1MB/sec in some cases. > Most of the log entries were from the progress report such as the following > ones. > 2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.15605757 > 2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.4108217 > 2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_02_0 is : 0.06634143 > 2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.6506 > 2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_01_0 is : 0.21723115 > Looks like the report interval is controlled by a hard-coded variable > PROGRESS_INTERVAL as 3 seconds in class org.apache.hadoop.mapred.Task. We > should allow users to set the appropriate progress interval for their > applications. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313096#comment-14313096 ] Hudson commented on MAPREDUCE-6237: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/100/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. 
Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
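The fix direction described in the comment above — a fresh connection per DBRecordReader instead of a shared cached one — can be sketched as follows. This is a minimal illustration with hypothetical class names (`FakeConnection`, `ReaderSketch`), not the actual Hadoop code from the patch:

```java
// A stand-in for java.sql.Connection, so the sketch runs without a database.
class FakeConnection {
    private boolean closed = false;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

// Each reader owns its own connection, so closing one reader cannot
// invalidate the connection a sibling reader is still using.
class ReaderSketch implements AutoCloseable {
    // One fresh connection per reader instead of a shared cached instance.
    private final FakeConnection conn = new FakeConnection();

    boolean connectionUsable() { return !conn.isClosed(); }

    @Override
    public void close() { conn.close(); }
}

public class Main {
    public static void main(String[] args) {
        ReaderSketch first = new ReaderSketch();
        ReaderSketch second = new ReaderSketch();
        first.close(); // the failing scenario: the first reader finishes and closes
        // With per-reader connections, the second reader is unaffected:
        System.out.println(second.connectionUsable()); // true
    }
}
```

With the shared-connection behavior reported in the bug, `first.close()` would have closed the single cached connection out from under `second`, which is exactly the SQL failure the customer observed.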
[jira] [Commented] (MAPREDUCE-4413) MR lib dir contains jdiff (which is gpl)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313044#comment-14313044 ] Hudson commented on MAPREDUCE-4413: --- SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7053/]) MAPREDUCE-4413. MR lib dir contains jdiff (which is gpl) (Nemon Lou via aw) (aw: rev aab459c904bf2007c5b230af8c058793935faf89) * hadoop-mapreduce-project/CHANGES.txt * hadoop-assemblies/src/main/resources/assemblies/hadoop-mapreduce-dist.xml > MR lib dir contains jdiff (which is gpl) > > > Key: MAPREDUCE-4413 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4413 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Nemon Lou >Priority: Critical > Fix For: 3.0.0 > > Attachments: MAPREDUCE-4413.patch, MAPREDUCE-4413.patch > > > A tarball built from trunk contains the following: > ./share/hadoop/mapreduce/lib/jdiff-1.0.9.jar > jdiff is gplv2, we need to exclude it from the build artifact. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313047#comment-14313047 ] Hudson commented on MAPREDUCE-6237: --- SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7053/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? 
> We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4413) MR lib dir contains jdiff (which is gpl)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4413: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) +1 committed to trunk. Thanks! > MR lib dir contains jdiff (which is gpl) > > > Key: MAPREDUCE-4413 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4413 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Nemon Lou >Priority: Critical > Fix For: 3.0.0 > > Attachments: MAPREDUCE-4413.patch, MAPREDUCE-4413.patch > > > A tarball built from trunk contains the following: > ./share/hadoop/mapreduce/lib/jdiff-1.0.9.jar > jdiff is gplv2, we need to exclude it from the build artifact. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312951#comment-14312951 ] Hudson commented on MAPREDUCE-6237: --- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #99 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/99/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. 
Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-207: --- Status: Open (was: Patch Available) Cancelling patch, as it no longer applies. > Computing Input Splits on the MR Cluster > > > Key: MAPREDUCE-207 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-207 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: applicationmaster, mrv2 >Reporter: Philip Zeyliger >Assignee: Gera Shegalov > Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, > MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, > MAPREDUCE-207.v07.patch > > > Instead of computing the input splits as part of job submission, Hadoop could > have a separate "job task type" that computes the input splits, therefore > allowing that computation to happen on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated MAPREDUCE-6174: -- Attachment: MAPREDUCE-6174.v1.txt [~jira.shegalov], I have uploaded a patch for this issue. Would you please have a look? > Combine common stream code into parent class for InMemoryMapOutput and > OnDiskMapOutput. > --- > > Key: MAPREDUCE-6174 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.0.0, 2.6.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: MAPREDUCE-6174.v1.txt > > > Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing > similar things with regards to IFile streams. > In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are > different from 3rd-party implementations, this JIRA will make them subclass a > common class (see > https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
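The refactoring proposed in MAPREDUCE-6174 — hoisting the shared IFile stream handling into a common parent of InMemoryMapOutput and OnDiskMapOutput — might look roughly like the following. All names here are invented for illustration; this is not the code from the attached patch:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// The shared stream-copy logic lives once in the parent class; each
// subclass only decides where the bytes land.
abstract class MapOutputSketch {
    final long copyShuffleData(byte[] data) throws IOException {
        try (OutputStream out = openDestination()) {
            out.write(data); // common code path for both flavors
        }
        return data.length;
    }

    abstract OutputStream openDestination() throws IOException;
}

// In-memory flavor: bytes go to a heap buffer.
class InMemorySketch extends MapOutputSketch {
    final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    @Override OutputStream openDestination() { return buf; }
}
// An on-disk flavor would override openDestination() with a FileOutputStream;
// third-party MapOutput implementations would simply not extend this class.

public class Main {
    public static void main(String[] args) throws IOException {
        InMemorySketch m = new InMemorySketch();
        long n = m.copyShuffleData(new byte[]{1, 2, 3});
        System.out.println(n + " bytes copied, " + m.buf.size() + " buffered");
    }
}
```

This also makes the distinction the JIRA mentions explicit: the two built-in outputs share a superclass that third-party implementations do not.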
[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated MAPREDUCE-6174: -- Status: Patch Available (was: Open) > Combine common stream code into parent class for InMemoryMapOutput and > OnDiskMapOutput. > --- > > Key: MAPREDUCE-6174 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 2.6.0, 3.0.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: MAPREDUCE-6174.v1.txt > > > Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing > similar things with regards to IFile streams. > In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are > different from 3rd-party implementations, this JIRA will make them subclass a > common class (see > https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312817#comment-14312817 ] Hudson commented on MAPREDUCE-6237: --- FAILURE: Integrated in Hadoop-Yarn-trunk #833 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/833/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? 
> We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312791#comment-14312791 ] Akira AJISAKA commented on MAPREDUCE-6223: -- Thanks [~varun_saxena] for updating the patch. +1 pending [~kasha]'s review. The findbugs warnings look unrelated to the patch. > TestJobConf#testNegativeValueForTaskVmem failures > - > > Key: MAPREDUCE-6223 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Gera Shegalov >Assignee: Varun Saxena > Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch, > MAPREDUCE-6223.003.patch > > > {code} > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<< > FAILURE! - in org.apache.hadoop.conf.TestJobConf > testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf) Time > elapsed: 0.089 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
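The failing assertion above (`expected:<1024> but was:<-1>`) pins down the semantics under test: a negative configured task-memory value should fall back to a default rather than be returned verbatim. A hypothetical helper capturing that contract (not the real JobConf code) would be:

```java
public class Main {
    static final long DEFAULT_MAP_MB = 1024; // the value the test expects back

    // Hypothetical sketch of the behavior TestJobConf asserts: a negative
    // configured value is treated as "unset" and replaced by the default,
    // instead of leaking -1 to callers.
    static long memoryForMapTask(long configuredMb) {
        return configuredMb < 0 ? DEFAULT_MAP_MB : configuredMb;
    }

    public static void main(String[] args) {
        System.out.println(memoryForMapTask(-500)); // 1024, not -500
        System.out.println(memoryForMapTask(2048)); // 2048, positive values pass through
    }
}
```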
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312745#comment-14312745 ] Kannan Rajah commented on MAPREDUCE-6237: - Created MAPREDUCE-6247 to track connection pooling. > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6247) Use DBCP connection pooling in DBInputFormat
Kannan Rajah created MAPREDUCE-6247: --- Summary: Use DBCP connection pooling in DBInputFormat Key: MAPREDUCE-6247 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6247 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.6.0, 2.5.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Priority: Minor As part of MAPREDUCE-6237, we removed caching of DB connection. [~jira.shegalov] and [~ozawa] suggested that we use DBCP connection pooling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
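The pooling idea behind MAPREDUCE-6247 — readers borrow a connection and return it instead of closing it, so connections are reused without being shared concurrently — can be illustrated with a tiny generic pool. This is a sketch of the concept only, not the Apache Commons DBCP API that the JIRA proposes to use:

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Minimal borrow/release pool: an idle connection is handed back out
// before a new one is created.
class Pool<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    int created = 0; // how many connections were actually opened

    Pool(Supplier<T> factory) { this.factory = factory; }

    synchronized T borrow() {
        if (idle.isEmpty()) { created++; return factory.get(); }
        return idle.pop();
    }

    // Readers call release() instead of close(), so the connection survives.
    synchronized void release(T conn) { idle.push(conn); }
}

public class Main {
    public static void main(String[] args) {
        Pool<Object> pool = new Pool<>(Object::new); // Object stands in for a Connection
        Object c1 = pool.borrow();
        pool.release(c1);
        Object c2 = pool.borrow(); // reuses c1 rather than opening a second connection
        System.out.println(c1 == c2);     // true
        System.out.println(pool.created); // 1
    }
}
```

This keeps the performance benefit that motivated the original connection caching, while avoiding the shared-close bug fixed in MAPREDUCE-6237.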
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: DB2, mapreduce > Fix For: 2.4.1 > > Attachments: MAPREDUCE-6246.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > DBoutputformat is used for writing output of mapreduce jobs to the database > and when used with db2 jdbc drivers it fails with following error > com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, > SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, > DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at > com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
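The fix described in MAPREDUCE-6246 can be sketched as a query builder that omits the trailing semicolon for DB2, mirroring what `DBOutputFormat.constructQuery` produces. The product-name check is the technique the reporter describes borrowing from DBInputFormat's per-database SELECT handling; the method below is an illustrative stand-in, not the patch itself:

```java
public class Main {
    // Build "INSERT INTO table (f1,f2,...) VALUES (?,?,...)", appending the
    // statement terminator only for databases that tolerate it. DB2 rejects
    // the trailing ';' by default (SQLCODE=-104, SQLSTATE=42601).
    static String constructQuery(String table, String[] fieldNames, String productName) {
        StringBuilder q = new StringBuilder("INSERT INTO ").append(table).append(" (");
        q.append(String.join(",", fieldNames)).append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            q.append(i == 0 ? "?" : ",?");
        }
        q.append(")");
        if (!productName.toUpperCase().startsWith("DB2")) {
            q.append(";");
        }
        return q.toString();
    }

    public static void main(String[] args) {
        // DB2 driver reports a product name like "DB2/LINUXX8664": no terminator.
        System.out.println(constructQuery("t", new String[]{"DATE", "COUNT"}, "DB2/LINUXX8664"));
        // Other databases keep the existing behavior.
        System.out.println(constructQuery("t", new String[]{"DATE", "COUNT"}, "MySQL"));
    }
}
```

In the real patch the product name would come from the JDBC connection's metadata; checking it at query-construction time leaves the default-OFF DB2 setting untouched, which is the compatibility concern the description raises.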
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. 
was: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) > DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: DB2, mapreduce > Fix For: 2.4.1 > > Attachments: MAPREDUCE-6246.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > DBoutputformat is used for writing output of mapreduce jobs to the database > and when used with db2 jdbc drivers it fails with following error > com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, > SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, > DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at > com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). 
> But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) was: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: DB2, mapreduce > Fix For: 2.4.1 > > Attachments: MAPREDUCE-6246.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > DBoutputformat is used for writing output of mapreduce jobs to the database > and when used with db2 jdbc drivers it fails with following error > com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, > SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, > DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at > com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Attachment: MAPREDUCE-6246.patch > DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: DB2, mapreduce > Fix For: 2.4.1 > > Attachments: MAPREDUCE-6246.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Fix Version/s: 2.4.1 Labels: DB2 mapreduce (was: ) Status: Patch Available (was: Open) I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" query without semicolon(";"). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. > DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: mapreduce, DB2 > Fix For: 2.4.1 > > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6244) Hadoop examples when run without an argument, gives ERROR instead of just usage info
[ https://issues.apache.org/jira/browse/MAPREDUCE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312642#comment-14312642 ] Tsuyoshi OZAWA commented on MAPREDUCE-6244: --- Cancelling for the previous comment. > Hadoop examples when run without an argument, gives ERROR instead of just > usage info > > > Key: MAPREDUCE-6244 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6244 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.23.0, trunk-win, 2.6.0 >Reporter: Robert Justice >Assignee: Abhishek Kapoor >Priority: Minor > Attachments: HADOOP-8834.patch, HADOOP-8834.patch > > > Hadoop sort example should not give an ERROR and only should display usage > when run with no parameters. > {code} > $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar sort > ERROR: Wrong number of parameters: 0 instead of 2. > sort [-m ] [-r ] [-inFormat ] [-outFormat > ] [-outKey ] [-outValue class>] [-totalOrder ] > Generic options supported are > -conf specify an application configuration file > -D use value for given property > -fs specify a namenode > -jt specify a job tracker > -files specify comma separated files to be > copied to the map reduce cluster > -libjars specify comma separated jar files > to include in the classpath. > -archives specify comma separated > archives to be unarchived on the compute machines. > The general command line syntax is > bin/hadoop command [genericOptions] [commandOptions] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6244) Hadoop examples when run without an argument, gives ERROR instead of just usage info
[ https://issues.apache.org/jira/browse/MAPREDUCE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6244: -- Status: Open (was: Patch Available) > Hadoop examples when run without an argument, gives ERROR instead of just > usage info > > > Key: MAPREDUCE-6244 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6244 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.6.0, 0.23.0, trunk-win >Reporter: Robert Justice >Assignee: Abhishek Kapoor >Priority: Minor > Attachments: HADOOP-8834.patch, HADOOP-8834.patch > > > Hadoop sort example should not give an ERROR and only should display usage > when run with no parameters. > {code} > $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar sort > ERROR: Wrong number of parameters: 0 instead of 2. > sort [-m ] [-r ] [-inFormat ] [-outFormat > ] [-outKey ] [-outValue class>] [-totalOrder ] > Generic options supported are > -conf specify an application configuration file > -D use value for given property > -fs specify a namenode > -jt specify a job tracker > -files specify comma separated files to be > copied to the map reduce cluster > -libjars specify comma separated jar files > to include in the classpath. > -archives specify comma separated > archives to be unarchived on the compute machines. > The general command line syntax is > bin/hadoop command [genericOptions] [commandOptions] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
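The requested behavior can be sketched as below. This is a hedged illustration only (class and method names are hypothetical, not the actual Sort example code): when the argument count is wrong, print the usage text rather than an "ERROR:" line, and still exit non-zero.

```java
// Hypothetical sketch: validate argument count and return usage text instead
// of an error message, matching the improvement requested in MAPREDUCE-6244.
public class UsageSketch {
    static String check(String[] args) {
        if (args.length != 2) {
            // Usage only; no "ERROR: Wrong number of parameters" prefix.
            return "Usage: sort [-m <maps>] [-r <reduces>] <input> <output>";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(check(new String[0]));
    }
}
```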
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312636#comment-14312636 ] Tsuyoshi OZAWA commented on MAPREDUCE-6237: --- Committed this to trunk, branch-2, and branch-2.6. Thanks [~rkannan82] for your contribution and thanks [~jira.shegalov] for your review. [~rkannan82], BTW, do you mind creating following JIRA to use thread pool based on Gera's suggestion? > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. 
When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
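The shape of the fix discussed above can be sketched as follows. All names here are hypothetical (this is not the committed patch): a stub factory stands in for `DriverManager.getConnection` so the per-reader-connection idea is runnable without a database.

```java
// Illustrative sketch: give each record reader its own connection from a
// factory, so one reader calling close() on its connection cannot break a
// second reader, unlike the old code that cached one shared connection.
import java.util.concurrent.atomic.AtomicInteger;

public class PerReaderConnections {
    interface ConnectionFactory {
        Object newConnection(); // stands in for java.sql.Connection
    }

    static class Reader {
        final Object connection;

        Reader(ConnectionFactory factory) {
            // A fresh connection per reader, instead of a shared cached one.
            this.connection = factory.newConnection();
        }
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        ConnectionFactory factory = () -> "conn-" + opened.incrementAndGet();
        Reader first = new Reader(factory);
        Reader second = new Reader(factory);
        // Each reader holds a distinct connection object.
        System.out.println(first.connection + " " + second.connection);
    }
}
```

If connection setup cost matters, the factory could be backed by a connection pool instead, which is the thread-pool/pooling follow-up suggested in the comments.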
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(";"). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Fix Version/s: 2.6.1 > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(;;)) at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(;) at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(;;)) at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. > I changed the current DBOutputFormat class by checking the product name from > connection object to see if it is DB2 then generates "INSERT INTO" command > without semicolon(;). > This technique is already used in DBInputFormat class for generating > different "SELECT" statements for Oracle and MySQL databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(;;)) at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. > I changed the current DBOutputFormat class by checking the product name from > connection object to see if it is DB2 then generates "INSERT INTO" command > without semicolon(;). > This technique is already used in DBInputFormat class for generating > different "SELECT" statements for Oracle and MySQL databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(";"). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. > I changed the current DBOutputFormat class by checking the product name from > connection object to see if it is DB2 then generates "INSERT INTO" command > without semicolon(";"). > This technique is already used in DBInputFormat class for generating > different "SELECT" statements for Oracle and MySQL databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
ramtin created MAPREDUCE-6246: - Summary: DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 Key: MAPREDUCE-6246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 2.4.1 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x Platform: xSeries, pSeries Browser: Firefox, IE Security Settings: No Security, Flat file, LDAP, PAM File System: HDFS, GPFS FPO Reporter: ramtin Assignee: ramtin In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(;) at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-5381) Support graceful decommission of tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved MAPREDUCE-5381. Resolution: Won't Fix Hardly any development is happening in 1.x now. I am closing this in favor of YARN's YARN-914. Please reopen if need be. > Support graceful decommission of tasktracker > > > Key: MAPREDUCE-5381 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5381 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1 >Affects Versions: 1.2.0 >Reporter: Luke Lu >Assignee: Binglin Chang > Attachments: MAPREDUCE-5381-graceful-decomm.v1.patch > > > When TTs are decommissioned for non-fault reasons (capacity change etc.), > it's desirable to minimize the impact to running jobs. > Currently if a TT is decommissioned, all running tasks on the TT need to be > rescheduled on other TTs. Further more, for finished map tasks, if their map > output are not fetched by the reducers of the job, these map tasks will need > to be rerun as well. > We propose to introduce a mechanism to optionally gracefully decommission a > tasktracker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Summary: Multiple mappers with DBInputFormat don't work because of reusing conections (was: DBRecordReader is not thread safe) > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Affects Version/s: 2.6.0 Hadoop Flags: Reviewed > DBRecordReader is not thread safe > - > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312590#comment-14312590 ] Tsuyoshi OZAWA commented on MAPREDUCE-6237: --- +1, findbugs look not related to your patch. I'll commit this to branch-2 and trunk shortly. > DBRecordReader is not thread safe > - > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312583#comment-14312583 ] Kannan Rajah commented on MAPREDUCE-6237:
-
[~ozawa] Is the patch alright? Anything else I need to do to get this committed?
[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312122#comment-14312122 ] kumar ranganathan commented on MAPREDUCE-5903:
--
I am also facing the same exception when enabling LDAP for Windows Active Directory in hadoop-2.6.0.

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
>
> Key: MAPREDUCE-5903
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.4.0
> Environment: hadoop: 2.4.0.2.1.2.0
> Reporter: Victor Kim
> Priority: Critical
> Labels: shuffle
>
> I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers. Kerberos is enabled, and I have hdfs, yarn, and mapred principals/keytabs. The ResourceManager and NodeManagers run under the yarn user, using the yarn Kerberos principal.
> Use case 1: WordCount, submit the job using the yarn UGI (i.e. the superuser, the one having a Kerberos principal on all boxes). Result: job completes successfully.
> Use case 2: WordCount, submit the job using LDAP user impersonation via the yarn UGI. Result: map tasks complete successfully, but the reduce task fails with a ShuffleError caused by java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, without YARN (with JT&TT).
> I found a similar issue with Kerberos auth involved here: https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as resolved, which is not the case when Kerberos Authentication is enabled.
> The exception trace from the YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
> at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
> at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)