[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-06-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14035988#comment-14035988
 ] 

Karthik Kambatla commented on MAPREDUCE-5896:
-

Looks good to me. +1. Will commit this later today. 

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896-2.patch, 
 MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-06-11 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027977#comment-14027977
 ] 

Sandy Ryza commented on MAPREDUCE-5896:
---

bq. Can we make InputSplitLocationInfo extend InputSplit? It doesn't make sense 
for any class to implement only InputSplitLocationInfo without implementing 
InputSplit.
Will do.

bq. Nothing to do with this patch. It is unfortunate that mapreduce.InputSplit 
doesn't implement mapred.InputSplit. Would it be easy to fix it?
Not entirely sure the reasoning there, but as this stuff can have binary 
compatibility implications in mysterious ways, I'd rather not touch it if we 
don't need to.

bq. The following two constants should probably be in SplitLocationInfo?
They're only used in FileSplit and not in SplitLocationInfo - is there utility 
in moving them away from where they're used?  I'd like to avoid adding these 
constants to the API because, when we include additional storage types, each 
SplitLocationInfo could end up as a union of storage types - needing to add a 
ON_DISK_AND_IN_FLASH_AND_IN_MEMORY would be ugly.

bq. Instead of assigning ON_DISK by default, would it make sense to set it post 
null-check after the loop for checking if it is in memory.
Any advantage to this?  It would add extra code, an extra branch, and I don't 
think be particularly more readable.

bq. Do you think it would make sense to include the string corresponding to the 
location in SplitLocationInfo itself?
Will do.

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-06-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028710#comment-14028710
 ] 

Hadoop QA commented on MAPREDUCE-5896:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12649975/MAPREDUCE-5896-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4650//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4650//console

This message is automatically generated.

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896-2.patch, 
 MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-06-10 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026541#comment-14026541
 ] 

Tom White commented on MAPREDUCE-5896:
--

+1 for the API changes.

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-06-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027022#comment-14027022
 ] 

Karthik Kambatla commented on MAPREDUCE-5896:
-

Review comments:
# Can we make InputSplitLocationInfo extend InputSplit? It doesn't make sense 
for any class to implement only InputSplitLocationInfo without implementing 
InputSplit.
# I am uncomfortable having to depend on index matching between 
InputSplit#getLocations and InputSplit#getLocationInfo. Do you think it would 
make sense to include the string corresponding to the location in 
SplitLocationInfo itself? We could deprecate InputSplit#getLocations(). Users 
are to be expected to use getLocationInfos instead.
# Nothing to do with this patch. It is unfortunate that mapreduce.InputSplit 
doesn't implement mapred.InputSplit. Would it be easy to fix it?
# Nit: The following two constants should probably be in SplitLocationInfo? 
{code}
  private static final SplitLocationInfo ON_DISK = new SplitLocationInfo(false);
  private static final SplitLocationInfo IN_MEMORY = new 
SplitLocationInfo(true);
{code}
# Nit: Instead of assigning ON_DISK by default, would it make sense to set it 
post null-check after the loop for checking if it is in memory. 
{code}
   for (int i = 0; i  hosts.length; i++) {
 hostInfos[i] = ON_DISK;
 // because N will be tiny, scanning is probably faster than a HashSet
 for (String inMemoryHost : inMemoryHosts) {
{code}

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-05-28 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011391#comment-14011391
 ] 

Sandy Ryza commented on MAPREDUCE-5896:
---

Updated patch sets all the new APIs to Evolving, fixes the typo that Tom 
noticed, includes cached hosts in mapred.FileInputFormat split generation, and 
adds tests.

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-05-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011441#comment-14011441
 ] 

Hadoop QA commented on MAPREDUCE-5896:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12647178/MAPREDUCE-5896-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4630//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4630//console

This message is automatically generated.

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-05-27 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009664#comment-14009664
 ] 

Tom White commented on MAPREDUCE-5896:
--

@Evolving would mean that it could change incompatibly between 2.4.0 and 2.5.0 
say, but not between 2.4.0 and 2.4.1 - and that might be the right trade off 
here. Hopefully SplitLocationInfo would not need to be changed incompatibly, 
but it's hard to know for sure until it's being used.

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-05-26 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008903#comment-14008903
 ] 

Tom White commented on MAPREDUCE-5896:
--

From a compatibility point of view I think the changes are OK, although 
SplitLocationInfo should not (yet) be marked as @Stable. Also, 
mapred.FileSplit has a new getSplitLocationInfo() method, but it should be 
getLocationInfo(). Add an @Override annotation to catch this.

How would the split calculation in FileInputFormat change to use hosts with a 
cached copy of a particular block? It would be worth creating a trial 
implementation to test that the changes are sufficient. 

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-05-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008925#comment-14008925
 ] 

Hadoop QA commented on MAPREDUCE-5896:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646096/MAPREDUCE-5896.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4620//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4620//console

This message is automatically generated.

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-05-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009055#comment-14009055
 ] 

Sandy Ryza commented on MAPREDUCE-5896:
---

What should the criteria be for marking this stable?  I'd like to start using 
this in downstream projects (Spark, and I believe Tez could benefit as well) as 
soon as possible, and an Evolving annotation would prevent this.

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-05-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004424#comment-14004424
 ] 

Sandy Ryza commented on MAPREDUCE-5896:
---

Given HDFS's plans for hierarchical storage management, I think it would be 
good to make this extensible to handle storage mediums beyond memory.  I talked 
this over with [~andrew.wang] and [~atm] and we think the right interface would 
be something like a SplitLocationInfo class, with isInMemory() and isOnDisk() 
methods.  We can later add isInFlash() and possibly even getDisk() to return 
which disk the data is on. 

InputSplits would have a SplitLocationInfo[] getLocationInfo() method that 
returns info about how the data is stored on each host returned by 
getLocations().

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-05-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005194#comment-14005194
 ] 

Sandy Ryza commented on MAPREDUCE-5896:
---

Uploaded a POC patch.  I'll add some tests if others think the APIs make sense.

 Allow InputSplits to indicate which locations have the block cached in memory
 -

 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5896.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)