[jira] [Commented] (MAPREDUCE-6234) TestHighRamJob fails due to the change in MAPREDUCE-5785
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313534#comment-14313534 ] Hadoop QA commented on MAPREDUCE-6234: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697632/MAPREDUCE-6234.003.patch against trunk revision 260b5e3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-gridmix. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5179//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5179//console This message is automatically generated. > TestHighRamJob fails due to the change in MAPREDUCE-5785 > > > Key: MAPREDUCE-6234 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/gridmix, mrv2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, > MAPREDUCE-6234.003.patch > > > TestHighRamJob fails by this. > {code} > --- > T E S T S > --- > Running org.apache.hadoop.mapred.gridmix.TestHighRamJob > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< > FAILURE! 
- in org.apache.hadoop.mapred.gridmix.TestHighRamJob > testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) > Time elapsed: 1.102 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6249) Streaming task will not untar tgz uploaded with -archives
Liu Xiao created MAPREDUCE-6249:
---
Summary: Streaming task will not untar tgz uploaded with -archives
Key: MAPREDUCE-6249
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 2.5.2
Environment: hadoop-2.5.2 hadoop-streaming-2.5.2.jar
Reporter: Liu Xiao

While writing a Hadoop streaming task, I used -archives to upload a tgz from the local machine to the task working directory, but it was not untarred as the documentation says. I've searched a lot without any luck.

Here is the streaming job start command, with hadoop-2.5.2:

hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
-files mapper.sh -archives /home/hadoop/tmp/test.tgz#test \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-input "/test/test.txt" \
-output "/res/" \
-mapper "sh mapper.sh" \
-reducer "cat"

and "mapper.sh":

cat > /dev/null
ls -l test
exit 0

In "test.tgz" there are two files, "test.1.txt" and "test.2.txt":

echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcvf test.tgz test.1.txt test.2.txt

The output from the above task:

lrwxrwxrwx 1 hadoop hadoop 71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz

But the desired output would be something like:

-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt

So why has test.tgz not been untarred automatically as the documentation says, or is there another way to get the tgz untarred?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313469#comment-14313469 ] Hadoop QA commented on MAPREDUCE-6174: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697564/MAPREDUCE-6174.v1.txt against trunk revision af08425. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//console This message is automatically generated. > Combine common stream code into parent class for InMemoryMapOutput and > OnDiskMapOutput. 
> --- > > Key: MAPREDUCE-6174 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.0.0, 2.6.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: MAPREDUCE-6174.v1.txt > > > Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing > similar things with regards to IFile streams. > In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are > different from 3rd-party implementations, this JIRA will make them subclass a > common class (see > https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6248) Persist DistCp job id in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313414#comment-14313414 ] Jing Zhao commented on MAPREDUCE-6248: -- yes, actually that will be even better! I will upload a patch for this later. > Persist DistCp job id in the staging directory > -- > > Key: MAPREDUCE-6248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently the DistCp is acting as a tool and the corresponding MapReduce Job > is created and used inside of its {{execute}} method. It is thus difficult > for external services to query its progress and counters. It may be helpful > to persist the job id into a file inside its staging directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6248) Persist DistCp job id in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313411#comment-14313411 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-6248: Why not have a public API in DistCp and use that programmatically instead of persisting IDs into files and then reading them? > Persist DistCp job id in the staging directory > -- > > Key: MAPREDUCE-6248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently the DistCp is acting as a tool and the corresponding MapReduce Job > is created and used inside of its {{execute}} method. It is thus difficult > for external services to query its progress and counters. It may be helpful > to persist the job id into a file inside its staging directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
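For illustration only, the idea in the description can be sketched as writing the job id to a well-known file under the staging directory, which an external service can later read back. This is a local-filesystem stand-in with a hypothetical file name ("_distcp_jobid"); the real DistCp staging directory lives on HDFS and would be accessed through Hadoop's FileSystem API rather than java.nio.file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch: persist the MapReduce job id to a well-known file under
// the staging directory so an external service can poll progress/counters
// by id. File name and paths are hypothetical, not actual DistCp behavior.
public class JobIdPersister {
    static final String JOB_ID_FILE = "_distcp_jobid";

    public static void persistJobId(Path stagingDir, String jobId) throws IOException {
        Files.createDirectories(stagingDir);
        Files.write(stagingDir.resolve(JOB_ID_FILE), jobId.getBytes(StandardCharsets.UTF_8));
    }

    public static String readJobId(Path stagingDir) throws IOException {
        return new String(Files.readAllBytes(stagingDir.resolve(JOB_ID_FILE)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging");
        persistJobId(dir, "job_1423000000000_0001");
        System.out.println(readJobId(dir));
    }
}
```

Vinod's alternative, a public API returning the Job directly, avoids the file round-trip entirely; the file-based approach mainly helps services that cannot hold a reference to the running DistCp instance.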
[jira] [Updated] (MAPREDUCE-6234) TestHighRamJob fails due to the change in MAPREDUCE-5785
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-6234: Summary: TestHighRamJob fails due to the change in MAPREDUCE-5785 (was: MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml) > TestHighRamJob fails due to the change in MAPREDUCE-5785 > > > Key: MAPREDUCE-6234 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/gridmix, mrv2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, > MAPREDUCE-6234.003.patch > > > TestHighRamJob fails by this. > {code} > --- > T E S T S > --- > Running org.apache.hadoop.mapred.gridmix.TestHighRamJob > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< > FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob > testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) > Time elapsed: 1.102 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated MAPREDUCE-6234: Attachment: MAPREDUCE-6234.003.patch 003 fixes test failure without changing the value of DEFAULT_*_MEMORY_MB. > MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml > > > Key: MAPREDUCE-6234 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/gridmix, mrv2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, > MAPREDUCE-6234.003.patch > > > TestHighRamJob fails by this. > {code} > --- > T E S T S > --- > Running org.apache.hadoop.mapred.gridmix.TestHighRamJob > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< > FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob > testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) > Time elapsed: 1.102 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6248) Persist DistCp job id in the staging directory
Jing Zhao created MAPREDUCE-6248: Summary: Persist DistCp job id in the staging directory Key: MAPREDUCE-6248 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Reporter: Jing Zhao Assignee: Jing Zhao Currently the DistCp is acting as a tool and the corresponding MapReduce Job is created and used inside of its {{execute}} method. It is thus difficult for external services to query its progress and counters. It may be helpful to persist the job id into a file inside its staging directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313278#comment-14313278 ] Hadoop QA commented on MAPREDUCE-6246: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697540/MAPREDUCE-6246.patch against trunk revision af08425. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5177//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5177//console This message is automatically generated. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
> -
>
> Key: MAPREDUCE-6246
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1, mrv2
> Affects Versions: 2.4.1
> Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
> Platform: xSeries, pSeries
> Browser: Firefox, IE
> Security Settings: No Security, Flat file, LDAP, PAM
> File System: HDFS, GPFS FPO
> Reporter: ramtin
> Assignee: ramtin
> Labels: DB2, mapreduce
> Fix For: 2.4.1
>
> Attachments: MAPREDUCE-6246.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> DBOutputFormat is used for writing the output of MapReduce jobs to the database, and when used with DB2 JDBC drivers it fails with the following error:
> com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127)
> In the DBOutputFormat class, the constructQuery method generates the "INSERT INTO" statement with a semicolon (";") at the end.
> The semicolon is the ANSI SQL-92 statement terminator character, but this feature is disabled (OFF) by default in IBM DB2, although it can be turned ON with -t (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
> However, some products are already built on top of the default setting (OFF), so turning this feature ON would make them error-prone.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
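The fix direction described in the report (drop the trailing semicolon and leave statement termination to the driver/session) can be sketched with a simplified, hypothetical rendition of constructQuery. This is not the actual Hadoop source, just an illustration:

```java
// Simplified sketch of a constructQuery that omits the trailing ";" the
// report identifies as the DB2 incompatibility (SQLCODE=-104). Hypothetical
// stand-in for DBOutputFormat.constructQuery, not the real implementation.
public class Db2SafeQueryBuilder {
    public static String constructQuery(String table, String[] fieldNames) {
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table).append(" (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(fieldNames[i]);
            if (i != fieldNames.length - 1) {
                query.append(",");
            }
        }
        query.append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append("?");
            if (i != fieldNames.length - 1) {
                query.append(",");
            }
        }
        query.append(")"); // previously ");" — the extra ";" is what DB2 rejects by default
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(constructQuery("SHORT_URLS", new String[] {"URL", "COUNT"}));
    }
}
```

With columns URL and COUNT, this produces INSERT INTO SHORT_URLS (URL,COUNT) VALUES (?,?), which is valid for drivers whether or not they accept a terminator.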
[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313167#comment-14313167 ] Karthik Kambatla commented on MAPREDUCE-6223: - Patch looks mostly good to me. Nit: I would leave the test for negative values, but update the asserts to reflect the expected behavior. > TestJobConf#testNegativeValueForTaskVmem failures > - > > Key: MAPREDUCE-6223 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Gera Shegalov >Assignee: Varun Saxena > Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch, > MAPREDUCE-6223.003.patch > > > {code} > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<< > FAILURE! - in org.apache.hadoop.conf.TestJobConf > testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf) Time > elapsed: 0.089 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313148#comment-14313148 ] Karthik Kambatla commented on MAPREDUCE-6234: - Thanks for working on this, folks. As you might see in the description of the config, it is kind of hard to pick a single value for DEFAULT_MAP_MEMORY_MB, and the most appropriate value seemed 1024 since we fall back to that value. I like Gera's proposal of adding a helper method to get the default value; however, I wonder if that would just translate to calling {{JobConf#getMemoryRequired}} on the default conf. > MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml > > > Key: MAPREDUCE-6234 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/gridmix, mrv2 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch > > > TestHighRamJob fails by this. > {code} > --- > T E S T S > --- > Running org.apache.hadoop.mapred.gridmix.TestHighRamJob > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< > FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob > testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) > Time elapsed: 1.102 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
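The helper-method proposal discussed above can be sketched as a fallback resolver: use the configured value when it is present and positive, otherwise the 1024 MB the comment mentions (the failing assertion expected 1024 but saw the unset sentinel -1). The names here are illustrative, not the actual MRJobConfig/JobConf API:

```java
// Hedged sketch of a "get the effective default" helper. A non-positive
// configured value (e.g. the -1 seen in the failing assertion) is treated
// as "not set" and resolves to the 1024 MB fallback.
public class MemoryDefaults {
    public static final int FALLBACK_MEMORY_MB = 1024;

    public static int getMemoryRequired(int configuredMb) {
        return configuredMb > 0 ? configuredMb : FALLBACK_MEMORY_MB;
    }

    public static void main(String[] args) {
        System.out.println(getMemoryRequired(-1));
    }
}
```

Karthik's point stands either way: such a helper may end up equivalent to calling {{JobConf#getMemoryRequired}} on a default-constructed conf.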
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313146#comment-14313146 ] Hudson commented on MAPREDUCE-6237: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2050 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2050/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. 
Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per the customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
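The fix direction discussed in the description can be sketched as giving each reader its own connection from a factory instead of sharing one cached connection, so closing one reader cannot break another. ConnectionFactory and the Object-typed "connection" below are deliberate stand-ins for the real JDBC types (DriverManager.getConnection / java.sql.Connection); this is an illustration of the pattern, not the patched Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one connection per reader, obtained from a factory.
public class PerReaderConnections {
    interface ConnectionFactory {
        Object newConnection(); // stand-in for java.sql.Connection
    }

    static class RecordReader {
        final Object connection;

        RecordReader(ConnectionFactory factory) {
            // Each reader opens a dedicated connection; it would close it
            // in its own close() without affecting other readers.
            this.connection = factory.newConnection();
        }
    }

    public static List<RecordReader> createReaders(ConnectionFactory factory, int n) {
        List<RecordReader> readers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            readers.add(new RecordReader(factory));
        }
        return readers;
    }

    public static void main(String[] args) {
        System.out.println(createReaders(Object::new, 2).size());
    }
}
```

If connection-creation cost matters, the factory is also the natural place to plug in a connection pool, as the description suggests.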
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313139#comment-14313139 ] Hudson commented on MAPREDUCE-6237: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #2031 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2031/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? 
> We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per the customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated MAPREDUCE-6242: Status: Patch Available (was: Open) > Progress report log is incredibly excessive in application master > - > > Key: MAPREDUCE-6242 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster >Affects Versions: 2.4.0 >Reporter: Jian Fang >Assignee: Varun Saxena > Attachments: MAPREDUCE-6242.001.patch > > > We saw incredibly excessive logs in application master for a long running one > with many task attempts. The log write rate is around 1MB/sec in some cases. > Most of the log entries were from the progress report such as the following > ones. > 2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.15605757 > 2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.4108217 > 2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_02_0 is : 0.06634143 > 2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.6506 > 2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_01_0 is : 0.21723115 > Looks like the report interval is controlled by a hard-coded variable > PROGRESS_INTERVAL as 3 seconds in class org.apache.hadoop.mapred.Task. We > should allow users to set the appropriate progress interval for their > applications. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
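The suggestion above (replace the hard-coded PROGRESS_INTERVAL with a user-settable value) can be sketched as a small configuration lookup with the current 3-second behavior as the fallback. The property key below is illustrative, not necessarily the key chosen in the committed patch:

```java
import java.util.Collections;
import java.util.Map;

// Sketch of a configurable progress-report interval with the existing
// 3000 ms hard-coded value from org.apache.hadoop.mapred.Task as default.
public class ProgressReportInterval {
    // Hypothetical configuration key.
    static final String PROGRESS_INTERVAL_KEY = "mapreduce.task.progress-report.interval";
    static final long DEFAULT_PROGRESS_INTERVAL_MS = 3000L;

    // Returns the configured interval, falling back to the default when the
    // key is absent, non-numeric, or non-positive.
    public static long getProgressInterval(Map<String, String> conf) {
        String value = conf.get(PROGRESS_INTERVAL_KEY);
        if (value == null) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
        try {
            long ms = Long.parseLong(value.trim());
            return ms > 0 ? ms : DEFAULT_PROGRESS_INTERVAL_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
    }

    public static void main(String[] args) {
        System.out.println(getProgressInterval(Collections.<String, String>emptyMap()));
    }
}
```

A job logging at ~1 MB/sec could then raise the interval (say to 60000 ms) without a code change, while unconfigured jobs keep today's behavior.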
[jira] [Updated] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated MAPREDUCE-6242: Status: Open (was: Patch Available) > Progress report log is incredibly excessive in application master > - > > Key: MAPREDUCE-6242 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster >Affects Versions: 2.4.0 >Reporter: Jian Fang >Assignee: Varun Saxena > Attachments: MAPREDUCE-6242.001.patch > > > We saw incredibly excessive logs in application master for a long running one > with many task attempts. The log write rate is around 1MB/sec in some cases. > Most of the log entries were from the progress report such as the following > ones. > 2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.15605757 > 2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.4108217 > 2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_02_0 is : 0.06634143 > 2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_00_0 is : 0.6506 > 2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1422985365246_0001_m_01_0 is : 0.21723115 > Looks like the report interval is controlled by a hard-coded variable > PROGRESS_INTERVAL as 3 seconds in class org.apache.hadoop.mapred.Task. We > should allow users to set the appropriate progress interval for their > applications. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313096#comment-14313096 ] Hudson commented on MAPREDUCE-6237: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/100/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. 
Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
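The fix direction described in the comment above — a fresh connection per DBRecordReader instead of a shared cached one — can be sketched as follows. This is a minimal illustration with hypothetical class names (`FakeConnection`, `ReaderSketch`), not the actual Hadoop code from the patch:

```java
// A stand-in for java.sql.Connection, so the sketch runs without a database.
class FakeConnection {
    private boolean closed = false;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

// Each reader owns its own connection, so closing one reader cannot
// invalidate the connection a sibling reader is still using.
class ReaderSketch implements AutoCloseable {
    // One fresh connection per reader instead of a shared cached instance.
    private final FakeConnection conn = new FakeConnection();

    boolean connectionUsable() { return !conn.isClosed(); }

    @Override
    public void close() { conn.close(); }
}

public class Main {
    public static void main(String[] args) {
        ReaderSketch first = new ReaderSketch();
        ReaderSketch second = new ReaderSketch();
        first.close(); // the failing scenario: the first reader finishes and closes
        // With per-reader connections, the second reader is unaffected:
        System.out.println(second.connectionUsable()); // true
    }
}
```

With the shared-connection behavior reported in the bug, `first.close()` would have closed the single cached connection out from under `second`, which is exactly the SQL failure the customer observed.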
[jira] [Commented] (MAPREDUCE-4413) MR lib dir contains jdiff (which is gpl)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313044#comment-14313044 ] Hudson commented on MAPREDUCE-4413: --- SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7053/]) MAPREDUCE-4413. MR lib dir contains jdiff (which is gpl) (Nemon Lou via aw) (aw: rev aab459c904bf2007c5b230af8c058793935faf89) * hadoop-mapreduce-project/CHANGES.txt * hadoop-assemblies/src/main/resources/assemblies/hadoop-mapreduce-dist.xml > MR lib dir contains jdiff (which is gpl) > > > Key: MAPREDUCE-4413 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4413 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Nemon Lou >Priority: Critical > Fix For: 3.0.0 > > Attachments: MAPREDUCE-4413.patch, MAPREDUCE-4413.patch > > > A tarball built from trunk contains the following: > ./share/hadoop/mapreduce/lib/jdiff-1.0.9.jar > jdiff is gplv2, we need to exclude it from the build artifact. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313047#comment-14313047 ] Hudson commented on MAPREDUCE-6237: --- SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7053/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? 
> We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4413) MR lib dir contains jdiff (which is gpl)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4413: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) +1 committed to trunk. Thanks! > MR lib dir contains jdiff (which is gpl) > > > Key: MAPREDUCE-4413 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4413 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Nemon Lou >Priority: Critical > Fix For: 3.0.0 > > Attachments: MAPREDUCE-4413.patch, MAPREDUCE-4413.patch > > > A tarball built from trunk contains the following: > ./share/hadoop/mapreduce/lib/jdiff-1.0.9.jar > jdiff is gplv2, we need to exclude it from the build artifact. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312951#comment-14312951 ] Hudson commented on MAPREDUCE-6237: --- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #99 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/99/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. 
Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-207: --- Status: Open (was: Patch Available) Cancelling patch, as it no longer applies. > Computing Input Splits on the MR Cluster > > > Key: MAPREDUCE-207 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-207 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: applicationmaster, mrv2 >Reporter: Philip Zeyliger >Assignee: Gera Shegalov > Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, > MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, > MAPREDUCE-207.v07.patch > > > Instead of computing the input splits as part of job submission, Hadoop could > have a separate "job task type" that computes the input splits, therefore > allowing that computation to happen on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated MAPREDUCE-6174: -- Attachment: MAPREDUCE-6174.v1.txt [~jira.shegalov], I have uploaded a patch for this issue. Would you please have a look? > Combine common stream code into parent class for InMemoryMapOutput and > OnDiskMapOutput. > --- > > Key: MAPREDUCE-6174 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.0.0, 2.6.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: MAPREDUCE-6174.v1.txt > > > Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing > similar things with regards to IFile streams. > In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are > different from 3rd-party implementations, this JIRA will make them subclass a > common class (see > https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
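The refactoring proposed in MAPREDUCE-6174 — hoisting the shared IFile stream handling into a common parent of InMemoryMapOutput and OnDiskMapOutput — might look roughly like the following. All names here are invented for illustration; this is not the code from the attached patch:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// The shared stream-copy logic lives once in the parent class; each
// subclass only decides where the bytes land.
abstract class MapOutputSketch {
    final long copyShuffleData(byte[] data) throws IOException {
        try (OutputStream out = openDestination()) {
            out.write(data); // common code path for both flavors
        }
        return data.length;
    }

    abstract OutputStream openDestination() throws IOException;
}

// In-memory flavor: bytes go to a heap buffer.
class InMemorySketch extends MapOutputSketch {
    final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    @Override OutputStream openDestination() { return buf; }
}
// An on-disk flavor would override openDestination() with a FileOutputStream;
// third-party MapOutput implementations would simply not extend this class.

public class Main {
    public static void main(String[] args) throws IOException {
        InMemorySketch m = new InMemorySketch();
        long n = m.copyShuffleData(new byte[]{1, 2, 3});
        System.out.println(n + " bytes copied, " + m.buf.size() + " buffered");
    }
}
```

This also makes the distinction the JIRA mentions explicit: the two built-in outputs share a superclass that third-party implementations do not.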
[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated MAPREDUCE-6174: -- Status: Patch Available (was: Open) > Combine common stream code into parent class for InMemoryMapOutput and > OnDiskMapOutput. > --- > > Key: MAPREDUCE-6174 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 2.6.0, 3.0.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: MAPREDUCE-6174.v1.txt > > > Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing > similar things with regards to IFile streams. > In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are > different from 3rd-party implementations, this JIRA will make them subclass a > common class (see > https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312817#comment-14312817 ] Hudson commented on MAPREDUCE-6237: --- FAILURE: Integrated in Hadoop-Yarn-trunk #833 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/833/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? 
> We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312791#comment-14312791 ] Akira AJISAKA commented on MAPREDUCE-6223: -- Thanks [~varun_saxena] for updating the patch. +1 pending [~kasha]'s review. The findbugs warnings look unrelated to the patch. > TestJobConf#testNegativeValueForTaskVmem failures > - > > Key: MAPREDUCE-6223 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Gera Shegalov >Assignee: Varun Saxena > Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch, > MAPREDUCE-6223.003.patch > > > {code} > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<< > FAILURE! - in org.apache.hadoop.conf.TestJobConf > testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf) Time > elapsed: 0.089 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
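The failing assertion above (`expected:<1024> but was:<-1>`) pins down the semantics under test: a negative configured task-memory value should fall back to a default rather than be returned verbatim. A hypothetical helper capturing that contract (not the real JobConf code) would be:

```java
public class Main {
    static final long DEFAULT_MAP_MB = 1024; // the value the test expects back

    // Hypothetical sketch of the behavior TestJobConf asserts: a negative
    // configured value is treated as "unset" and replaced by the default,
    // instead of leaking -1 to callers.
    static long memoryForMapTask(long configuredMb) {
        return configuredMb < 0 ? DEFAULT_MAP_MB : configuredMb;
    }

    public static void main(String[] args) {
        System.out.println(memoryForMapTask(-500)); // 1024, not -500
        System.out.println(memoryForMapTask(2048)); // 2048, positive values pass through
    }
}
```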
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312745#comment-14312745 ] Kannan Rajah commented on MAPREDUCE-6237: - Created MAPREDUCE-6247 to track connection pooling. > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6247) Use DBCP connection pooling in DBInputFormat
Kannan Rajah created MAPREDUCE-6247: --- Summary: Use DBCP connection pooling in DBInputFormat Key: MAPREDUCE-6247 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6247 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.6.0, 2.5.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Priority: Minor As part of MAPREDUCE-6237, we removed caching of DB connection. [~jira.shegalov] and [~ozawa] suggested that we use DBCP connection pooling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
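The pooling idea behind MAPREDUCE-6247 — readers borrow a connection and return it instead of closing it, so connections are reused without being shared concurrently — can be illustrated with a tiny generic pool. This is a sketch of the concept only, not the Apache Commons DBCP API that the JIRA proposes to use:

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Minimal borrow/release pool: an idle connection is handed back out
// before a new one is created.
class Pool<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    int created = 0; // how many connections were actually opened

    Pool(Supplier<T> factory) { this.factory = factory; }

    synchronized T borrow() {
        if (idle.isEmpty()) { created++; return factory.get(); }
        return idle.pop();
    }

    // Readers call release() instead of close(), so the connection survives.
    synchronized void release(T conn) { idle.push(conn); }
}

public class Main {
    public static void main(String[] args) {
        Pool<Object> pool = new Pool<>(Object::new); // Object stands in for a Connection
        Object c1 = pool.borrow();
        pool.release(c1);
        Object c2 = pool.borrow(); // reuses c1 rather than opening a second connection
        System.out.println(c1 == c2);     // true
        System.out.println(pool.created); // 1
    }
}
```

This keeps the performance benefit that motivated the original connection caching, while avoiding the shared-close bug fixed in MAPREDUCE-6237.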
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: DB2, mapreduce > Fix For: 2.4.1 > > Attachments: MAPREDUCE-6246.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > DBoutputformat is used for writing output of mapreduce jobs to the database > and when used with db2 jdbc drivers it fails with following error > com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, > SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, > DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at > com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
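The fix described in MAPREDUCE-6246 can be sketched as a query builder that omits the trailing semicolon for DB2, mirroring what `DBOutputFormat.constructQuery` produces. The product-name check is the technique the reporter describes borrowing from DBInputFormat's per-database SELECT handling; the method below is an illustrative stand-in, not the patch itself:

```java
public class Main {
    // Build "INSERT INTO table (f1,f2,...) VALUES (?,?,...)", appending the
    // statement terminator only for databases that tolerate it. DB2 rejects
    // the trailing ';' by default (SQLCODE=-104, SQLSTATE=42601).
    static String constructQuery(String table, String[] fieldNames, String productName) {
        StringBuilder q = new StringBuilder("INSERT INTO ").append(table).append(" (");
        q.append(String.join(",", fieldNames)).append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            q.append(i == 0 ? "?" : ",?");
        }
        q.append(")");
        if (!productName.toUpperCase().startsWith("DB2")) {
            q.append(";");
        }
        return q.toString();
    }

    public static void main(String[] args) {
        // DB2 driver reports a product name like "DB2/LINUXX8664": no terminator.
        System.out.println(constructQuery("t", new String[]{"DATE", "COUNT"}, "DB2/LINUXX8664"));
        // Other databases keep the existing behavior.
        System.out.println(constructQuery("t", new String[]{"DATE", "COUNT"}, "MySQL"));
    }
}
```

In the real patch the product name would come from the JDBC connection's metadata; checking it at query-construction time leaves the default-OFF DB2 setting untouched, which is the compatibility concern the description raises.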
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. 
was: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) > DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: DB2, mapreduce > Fix For: 2.4.1 > > Attachments: MAPREDUCE-6246.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > DBoutputformat is used for writing output of mapreduce jobs to the database > and when used with db2 jdbc drivers it fails with following error > com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, > SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, > DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at > com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). 
> But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) was: DBoutputformat is used for writing output of mapreduce jobs to the database and when used with db2 jdbc drivers it fails with following error com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: DB2, mapreduce > Fix For: 2.4.1 > > Attachments: MAPREDUCE-6246.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > DBoutputformat is used for writing output of mapreduce jobs to the database > and when used with db2 jdbc drivers it fails with following error > com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, > SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, > DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at > com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Attachment: MAPREDUCE-6246.patch > DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: DB2, mapreduce > Fix For: 2.4.1 > > Attachments: MAPREDUCE-6246.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Fix Version/s: 2.4.1 Labels: DB2 mapreduce (was: ) Status: Patch Available (was: Open) I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" query without semicolon(";"). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. > DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Labels: mapreduce, DB2 > Fix For: 2.4.1 > > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6244) Hadoop examples when run without an argument, gives ERROR instead of just usage info
[ https://issues.apache.org/jira/browse/MAPREDUCE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312642#comment-14312642 ] Tsuyoshi OZAWA commented on MAPREDUCE-6244: --- Cancelling for the previous comment. > Hadoop examples when run without an argument, gives ERROR instead of just > usage info > > > Key: MAPREDUCE-6244 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6244 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.23.0, trunk-win, 2.6.0 >Reporter: Robert Justice >Assignee: Abhishek Kapoor >Priority: Minor > Attachments: HADOOP-8834.patch, HADOOP-8834.patch > > > Hadoop sort example should not give an ERROR and only should display usage > when run with no parameters. > {code} > $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar sort > ERROR: Wrong number of parameters: 0 instead of 2. > sort [-m ] [-r ] [-inFormat ] [-outFormat > ] [-outKey ] [-outValue class>] [-totalOrder ] > Generic options supported are > -conf specify an application configuration file > -D use value for given property > -fs specify a namenode > -jt specify a job tracker > -files specify comma separated files to be > copied to the map reduce cluster > -libjars specify comma separated jar files > to include in the classpath. > -archives specify comma separated > archives to be unarchived on the compute machines. > The general command line syntax is > bin/hadoop command [genericOptions] [commandOptions] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6244) Hadoop examples when run without an argument, gives ERROR instead of just usage info
[ https://issues.apache.org/jira/browse/MAPREDUCE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6244: -- Status: Open (was: Patch Available) > Hadoop examples when run without an argument, gives ERROR instead of just > usage info > > > Key: MAPREDUCE-6244 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6244 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.6.0, 0.23.0, trunk-win >Reporter: Robert Justice >Assignee: Abhishek Kapoor >Priority: Minor > Attachments: HADOOP-8834.patch, HADOOP-8834.patch > > > Hadoop sort example should not give an ERROR and only should display usage > when run with no parameters. > {code} > $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar sort > ERROR: Wrong number of parameters: 0 instead of 2. > sort [-m ] [-r ] [-inFormat ] [-outFormat > ] [-outKey ] [-outValue class>] [-totalOrder ] > Generic options supported are > -conf specify an application configuration file > -D use value for given property > -fs specify a namenode > -jt specify a job tracker > -files specify comma separated files to be > copied to the map reduce cluster > -libjars specify comma separated jar files > to include in the classpath. > -archives specify comma separated > archives to be unarchived on the compute machines. > The general command line syntax is > bin/hadoop command [genericOptions] [commandOptions] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
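The requested behavior can be sketched as below. This is a hedged illustration only (class and method names are hypothetical, not the actual Sort example code): when the argument count is wrong, print the usage text rather than an "ERROR:" line, and still exit non-zero.

```java
// Hypothetical sketch: validate argument count and return usage text instead
// of an error message, matching the improvement requested in MAPREDUCE-6244.
public class UsageSketch {
    static String check(String[] args) {
        if (args.length != 2) {
            // Usage only; no "ERROR: Wrong number of parameters" prefix.
            return "Usage: sort [-m <maps>] [-r <reduces>] <input> <output>";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(check(new String[0]));
    }
}
```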
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312636#comment-14312636 ] Tsuyoshi OZAWA commented on MAPREDUCE-6237: --- Committed this to trunk, branch-2, and branch-2.6. Thanks [~rkannan82] for your contribution and thanks [~jira.shegalov] for your review. [~rkannan82], BTW, do you mind creating following JIRA to use thread pool based on Gera's suggestion? > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. 
When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fails. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
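The shape of the fix discussed above can be sketched as follows. All names here are hypothetical (this is not the committed patch): a stub factory stands in for `DriverManager.getConnection` so the per-reader-connection idea is runnable without a database.

```java
// Illustrative sketch: give each record reader its own connection from a
// factory, so one reader calling close() on its connection cannot break a
// second reader, unlike the old code that cached one shared connection.
import java.util.concurrent.atomic.AtomicInteger;

public class PerReaderConnections {
    interface ConnectionFactory {
        Object newConnection(); // stands in for java.sql.Connection
    }

    static class Reader {
        final Object connection;

        Reader(ConnectionFactory factory) {
            // A fresh connection per reader, instead of a shared cached one.
            this.connection = factory.newConnection();
        }
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        ConnectionFactory factory = () -> "conn-" + opened.incrementAndGet();
        Reader first = new Reader(factory);
        Reader second = new Reader(factory);
        // Each reader holds a distinct connection object.
        System.out.println(first.connection + " " + second.connection);
    }
}
```

If connection setup cost matters, the factory could be backed by a connection pool instead, which is the thread-pool/pooling follow-up suggested in the comments.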
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(";"). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Fix Version/s: 2.6.1 > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Fix For: 2.6.1 > > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(;;)) at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(;) at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(;;)) at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. > I changed the current DBOutputFormat class by checking the product name from > connection object to see if it is DB2 then generates "INSERT INTO" command > without semicolon(;). > This technique is already used in DBInputFormat class for generating > different "SELECT" statements for Oracle and MySQL databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(;;)) at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. > I changed the current DBOutputFormat class by checking the product name from > connection object to see if it is DB2 then generates "INSERT INTO" command > without semicolon(;). > This technique is already used in DBInputFormat class for generating > different "SELECT" statements for Oracle and MySQL databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(";"). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. was: In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(";") at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. 
> DBOutputFormat.java appending extra semicolon to query which is incompatible > with DB2 > - > > Key: MAPREDUCE-6246 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.4.1 > Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x > Platform: xSeries, pSeries > Browser: Firefox, IE > Security Settings: No Security, Flat file, LDAP, PAM > File System: HDFS, GPFS FPO >Reporter: ramtin >Assignee: ramtin > Original Estimate: 24h > Remaining Estimate: 24h > > In DBOutputFormat class there is constructQuery method that generates "INSERT > INTO" statement with semicolon(";") at the end. > Semicolon is ANSI SQL-92 standard character for a statement terminator but > this feature is disabled(OFF) as a default settings in IBM DB2. > Although by using -t we can turn it ON for db2. > (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). > But there are some products that already built on top of this default > setting (OFF) so by turning ON this feature make them error prone. > I changed the current DBOutputFormat class by checking the product name from > connection object to see if it is DB2 then generates "INSERT INTO" command > without semicolon(";"). > This technique is already used in DBInputFormat class for generating > different "SELECT" statements for Oracle and MySQL databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
ramtin created MAPREDUCE-6246: - Summary: DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 Key: MAPREDUCE-6246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 2.4.1 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x Platform: xSeries, pSeries Browser: Firefox, IE Security Settings: No Security, Flat file, LDAP, PAM File System: HDFS, GPFS FPO Reporter: ramtin Assignee: ramtin In DBOutputFormat class there is constructQuery method that generates "INSERT INTO" statement with semicolon(;) at the end. Semicolon is ANSI SQL-92 standard character for a statement terminator but this feature is disabled(OFF) as a default settings in IBM DB2. Although by using -t we can turn it ON for db2. (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). But there are some products that already built on top of this default setting (OFF) so by turning ON this feature make them error prone. I changed the current DBOutputFormat class by checking the product name from connection object to see if it is DB2 then generates "INSERT INTO" command without semicolon(;). This technique is already used in DBInputFormat class for generating different "SELECT" statements for Oracle and MySQL databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-5381) Support graceful decommission of tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved MAPREDUCE-5381. Resolution: Won't Fix Hardly any development is happening in 1.x now. I am closing this in favor of YARN's YARN-914. Please reopen if need be. > Support graceful decommission of tasktracker > > > Key: MAPREDUCE-5381 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5381 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1 >Affects Versions: 1.2.0 >Reporter: Luke Lu >Assignee: Binglin Chang > Attachments: MAPREDUCE-5381-graceful-decomm.v1.patch > > > When TTs are decommissioned for non-fault reasons (capacity change etc.), > it's desirable to minimize the impact to running jobs. > Currently if a TT is decommissioned, all running tasks on the TT need to be > rescheduled on other TTs. Further more, for finished map tasks, if their map > output are not fetched by the reducers of the job, these map tasks will need > to be rerun as well. > We propose to introduce a mechanism to optionally gracefully decommission a > tasktracker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Summary: Multiple mappers with DBInputFormat don't work because of reusing conections (was: DBRecordReader is not thread safe) > Multiple mappers with DBInputFormat don't work because of reusing conections > > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Affects Version/s: 2.6.0 Hadoop Flags: Reviewed > DBRecordReader is not thread safe > - > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0, 2.6.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312590#comment-14312590 ] Tsuyoshi OZAWA commented on MAPREDUCE-6237: --- +1, findbugs look not related to your patch. I'll commit this to branch-2 and trunk shortly. > DBRecordReader is not thread safe > - > > Key: MAPREDUCE-6237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.5.0 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: mapreduce-6237.patch, mapreduce-6237.patch, > mapreduce-6237.patch > > > DBInputFormat.createDBRecorder is reusing JDBC connections across instances > of DBRecordReader. This is not a good idea. We should be creating separate > connection. If performance is a concern, then we should be using connection > pooling instead. > I looked at DBOutputFormat.getRecordReader. It actually creates a new > Connection object for each DBRecordReader. So can we just change > DBInputFormat to create new Connection every time? The connection reuse code > was added as part of connection leak bug in MAPREDUCE-1443. Any reason for > caching the connection? > We observed this issue in a customer setup where they were reading data from > MySQL using Pig. As per customer, the query is returning two records which > causes Pig to create two instances of DBRecordReader. These two instances are > sharing the database connection instance. The first DBRecordReader runs to > extract the first record from MySQL just fine, but then closes the shared > connection instance. When the second DBRecordReader runs, it tries to execute > a query to retrieve the second record on the closed shared connection > instance, which fail. If we set > mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312583#comment-14312583 ] Kannan Rajah commented on MAPREDUCE-6237:
-
[~ozawa] Is the patch alright? Anything else I need to do to get this committed?
[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312122#comment-14312122 ] kumar ranganathan commented on MAPREDUCE-5903:
--
I am also facing the same exception when enabling LDAP for Windows Active Directory in hadoop-2.6.0.

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
>
> Key: MAPREDUCE-5903
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.4.0
> Environment: hadoop: 2.4.0.2.1.2.0
> Reporter: Victor Kim
> Priority: Critical
> Labels: shuffle
>
> I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers. Kerberos is enabled, and I have hdfs, yarn, and mapred principals/keytabs. The ResourceManager and NodeManagers run under the yarn user, using the yarn Kerberos principal.
> Use case 1: WordCount, submit the job using the yarn UGI (i.e. the superuser, the one having a Kerberos principal on all boxes). Result: job completes successfully.
> Use case 2: WordCount, submit the job using LDAP user impersonation via the yarn UGI. Result: map tasks complete successfully, but the reduce task fails with a ShuffleError caused by java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, without YARN (with JT&TT).
> I found a similar issue with Kerberos auth involved here: https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as resolved, which is not the case when Kerberos Authentication is enabled.
> The exception trace from the YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
> at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
> at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)